IDL Programming > Concepts > String Operations > Comparing Strings

Comparing Strings

IDL provides several different mechanisms for performing string comparisons. In addition to the EQ operator, the STRCMP, STRMATCH, and STREGEX functions can all be used for string comparisons.

Case-Insensitive Comparisons of the First N Characters

The STRCMP function simplifies case-insensitive comparisons, and comparisons of only the first N characters of two strings. The STRCMP function uses the following syntax:

Result = STRCMP( String1, String2 [, N] )

where String1 and String2 are the strings to be compared, and N is the number of characters from the beginning of the string to compare.

Using the EQ operator to compare the first 3 characters of the strings “Moose” and “mOO” requires the following steps:

A = 'Moose'

B = 'mOO'

 

C=STRMID(A,0,3)

 

IF (STRLOWCASE(C) EQ STRLOWCASE(B)) THEN PRINT, "It's a match!"

Using the EQ operator for this case-insensitive comparison of the first 3 characters requires the STRMID function to extract the first 3 characters, and the STRLOWCASE (or STRUPCASE) function to change the case.

The STRCMP function could be used to simplify this comparison:

A='Moose'

B='mOO'

 

IF (STRCMP(A,B,3, /FOLD_CASE) EQ 1) THEN PRINT, "It's a match!"

The optional N argument of the STRCMP function allows us to easily specify how many characters to compare (from the beginning of the input strings), and the FOLD_CASE keyword specifies a case-insensitive search. If N is omitted, the full strings are compared.

String Comparisons Using Wildcards

The STRMATCH function can be used to compare a search string containing wildcard characters to another string. It is similar in function to the way the standard UNIX command shell processes file wildcard characters.

The STRMATCH function uses the following syntax:

Result = STRMATCH( String, SearchString )

where String is the string in which to search for SearchString.

SearchString can contain the following wildcard characters:

Wildcard Characters used by STRMATCH

Wildcard Character

Description

*

Matches any string, including an empty string.

?

Matches any single character.

[...]

Matches any one of the enclosed characters. A pair of characters separated by "-" matches any character lexically between the pair, inclusive. If the first character following the opening [ is a !, any character not enclosed is matched. To prevent one of these characters from acting as a wildcard, it can be quoted by preceding it with a backslash character (e.g. "\*" matches the asterisk character). Quoting any other character (including \ itself) is equivalent to the character (e.g. "\a" is the same as "a").

The following examples demonstrate various uses of wildcard matching:

Example 1: Find all 4-letter words in a string array that begin with “f” or “F” and end with “t” or “T”:

str = ['foot', 'Feet', 'fate', 'FAST', 'ferret', 'fort']

PRINT, str[WHERE(STRMATCH(str, 'f??t', /FOLD_CASE) EQ 1)]

This results in:

foot Feet FAST fort

Example 2: Find words of any length that begin with “f” and end with “t”:

str = ['foot', 'Feet', 'fate', 'FAST', 'ferret', 'fort']

PRINT, str[WHERE(STRMATCH(str, 'f*t', /FOLD_CASE) EQ 1)]

This results in:

foot Feet FAST ferret fort

Example 3: Find 4-letter words beginning with “f” and ending with “t”, with any combination of “o” and “e” in between:

str = ['foot', 'Feet', 'fate', 'FAST', 'ferret', 'fort']

PRINT, str[WHERE(STRMATCH(str, 'f[eo][eo]t', /FOLD_CASE) EQ 1)]

This results in:

foot Feet

Example 4: Find all words beginning with “f” and ending with “t” whose second character is not the letter “o”:

str = ['foot', 'Feet', 'fate', 'FAST', 'ferret', 'fort']

PRINT, str[WHERE(STRMATCH(str, 'f[!o]*t', /FOLD_CASE) EQ 1)]

This results in:

Feet FAST ferret

Complex Comparisons Using Regular Expressions

A more difficult search than the one above would be to find words of any length beginning with “f” and ending with “t” without the letter “o” in between. This would be difficult to accomplish with STRMATCH, but could be easily accomplished using the STREGEX function:

str = ['foot', 'Feet', 'fate', 'FAST', 'ferret', 'fort']

PRINT, STREGEX(str, '^f[^o]*t$', /EXTRACT, /FOLD_CASE)

This statement results in:

Feet FAST ferret

Note the following about this example:

Regular expressions are somewhat more difficult to use than simple wildcard matching (which is why the UNIX shell does matching) but in exchange offers unparalleled expressive power.

For more on the STREGEX function, see STREGEX , and for an introduction to regular expressions, see Learning About Regular Expressions.