Overview
 
A “comparator” is a parameter used in scripting commands which compares one value to another. For example:
If AreaCode = '416' Output 'Toronto'
In this example, a comparison is being made between the variable named AreaCode and the literal '416'. The equals sign is the “comparator”.
Now consider this command:
If AreaCode = '514' Region = 'Montreal'
In this case, the first equals sign is a comparator because it is comparing two values. The second equals sign is not a comparator; it is actually the Equals command, which assigns a value to a variable.
 
Types of Comparators
 
Parse-O-Matic Scripting supports several types of comparators:
 
————————— —————————————————————————————————————————————————————————
Type      What It Does
————————— —————————————————————————————————————————————————————————
Literal   Compares values character by character
Numerical Compares the arithmetic values of real or integer numbers
Length    Compares the length of one value with a number
Pattern   Compares a value against a pattern
————————— —————————————————————————————————————————————————————————
 
These are explained below in more detail.
 
Literal Comparators
 
Here is a list of the literal comparators:
 
—————————— ————————————————————— ————————————
Comparator Meaning               Comments
—————————— ————————————————————— ————————————
=          Identical
<>         Not identical
>          Higher                See Note #1
>=         Higher, or identical  See Note #1
<          Lower                 See Note #1
<=         Lower, or identical   See Note #1
^          Contains
~          Does not contain
Is         Basically the same    See Note #2
Longer     Length is longer
Shorter    Length is shorter
SameLen    Length is the same
—————————— ————————————————————— ————————————
Note #1: Depends on sort order. For a discussion of what this means, refer to the section “Literal Comparisons and Sort Order”.

Note #2: The two values are considered basically the same if they contain the same text, regardless of upper or lower case and of any surrounding whitespace. Thus 'CHESHIRE CAT ' is considered the same as 'Cheshire Cat'.
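
The rule in Note #2 can be modelled outside Parse-O-Matic. The following Python sketch is only an illustration, not Parse-O-Matic code: the function name basically_same is hypothetical, and it assumes that "surrounding whitespace" means leading and trailing whitespace only.

```python
def basically_same(left: str, right: str) -> bool:
    """Rough model of the Is comparator described in Note #2.

    Assumption: only leading/trailing whitespace and letter case are ignored;
    whitespace inside the text still counts.
    """
    return left.strip().lower() == right.strip().lower()


print(basically_same("CHESHIRE CAT ", "Cheshire Cat"))  # True
print(basically_same("Cheshire Cat", "Cheshire Dog"))   # False
```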
 
Examples
 
With some restrictions (discussed later), literal comparators work on both numeric and alphabetic data. Here are some examples of literal comparisons that are true:
'ABC' <> 'ABCD'         '333' <> '444'
'ABC' <= 'ABCD'         '333' <= '444'
'ABC' < 'ABCD'          '333' < '444'
'ABC' Shorter 'ABCD'    '333' SameLen '444'
'ABC' >= 'ABC'          'ABC' <> 'CDE'
'ABC' <= 'ABC'          'ABC' <= 'CDE'
'ABC' = 'ABC'           'ABC' < 'CDE'
'ABC' SameLen 'ABC'     'ABC' SameLen 'CDE'
'ABC' ^ 'AB'            'ABC' ~ 'CD'
'ABC' ^ 'ABC'           'ABC' ^ 'C'
 
Note especially the ^ (contains) and ~ (does not contain) comparators. These are extremely useful when analyzing data.
 
Literal Comparisons and Sort Order
 
Some of the literal comparators compare text according to 'PC-ASCII sort order'. For plain English text, this works fine. However, if your text contains diacritical (accented) characters, you should be aware that some comparisons will not work correctly. For example, the 'o-circumflex' character (ô) appears in the PC-ASCII character set after the PC-ASCII value for 'Z', so text containing it compares as higher than text containing only unaccented letters.
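
To see why accented characters sort unexpectedly, it can help to look at the character codes. The Python sketch below is an illustration only: it assumes that "PC-ASCII" corresponds to IBM code page 437 (Python's cp437 codec), and the helper names are hypothetical.

```python
def pc_ascii_code(ch: str) -> int:
    """Return the code of a single character, assuming PC-ASCII is code page 437."""
    return ch.encode("cp437")[0]


def pc_ascii_sorted(words):
    """Sort words by their code page 437 byte values (a model of literal sort order)."""
    return sorted(words, key=lambda w: w.encode("cp437"))


for ch in ("A", "Z", "a", "z", "\u00f4"):   # \u00f4 is o-circumflex
    print(ch, pc_ascii_code(ch))

# 'c\u00f4te' lands after 'cozy' because o-circumflex (147) is above 'z' (122).
print(pc_ascii_sorted(["c\u00f4te", "cozy", "cote"]))
```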
 
Numerical Comparators
 
Here is a list of the numerical comparators:
 
————————— —————————————————————————————————————————————————————————
Comparator Meaning
————————— —————————————————————————————————————————————————————————
#=         Equal
#<>        Not equal
#>         Greater than
#>=        Greater than, or equal
#<         Less than
#<=        Less than, or equal
————————— —————————————————————————————————————————————————————————
 
Numerical comparators avoid the problem of sort order. For a discussion of this, see the section “Numeric Comparisons and Sort Order”.
 
Examples
 
Here are some examples of numeric comparisons (written variously with and without surrounding quotes) that are true:

345 #<> 567     '1.23' #<> '9.87'
345 #<= 567     '1.23' #<= '9.87'
567 #> 345      9.87 #> '1.23'
'3' #< '6.2'
The last example compares an integer ('3') with a real number ('6.2'). The numeric comparators automatically check if one of the numbers contains a decimal point.

In that case, the comparison is performed in 'real number' mode, which imposes the same accuracy restrictions as those imposed by the CalcReal command. This might create a problem if you are comparing a decimal number with a large integer, but this is rarely a cause for worry, since most data analysis tends to compare similar types of numbers.
 
Numeric Comparisons and Sort Order
 
You can get unintended results when you use literal comparators on numbers. For example, this does not work as you might expect at first glance:
count = count+
If count >= 2 OutEnd count
You might expect this to output any number greater than or equal to '2', but in fact you will get a different result, because the comparison is a literal (text) comparison. In the example above, single-digit values from '2' upward are greater than or equal to '2', but '10' (which starts with '1') is considered less, as is evident when you sort several numbers alphabetically:
1 10 100 11 15 2 20 200 3 30
As you can see, the values 1, 10, 100, 11 and 15 come before '2' when sorted alphabetically.

To compare numbers, you should use the numerical comparators. The correct way to code the previous example is as follows:
count = count+
If count #>= 2 OutEnd count
Written in this way, numbers greater than or equal to 2 will be sent to the output file.
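
The difference between the two kinds of comparison can be reproduced in any language that compares strings character by character. The following Python sketch is an illustration, not Parse-O-Matic code; the function names are hypothetical stand-ins for the >= and #>= comparators.

```python
def literal_at_least(value: str, threshold: str) -> bool:
    """Text comparison, character by character (models the literal >= comparator)."""
    return value >= threshold


def numeric_at_least(value: str, threshold: str) -> bool:
    """Arithmetic comparison (models the numerical #>= comparator)."""
    return float(value) >= float(threshold)


counts = [str(n) for n in range(1, 13)]  # '1' through '12'

print([c for c in counts if literal_at_least(c, "2")])
# ['2', '3', '4', '5', '6', '7', '8', '9']

print([c for c in counts if numeric_at_least(c, "2")])
# ['2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']
```

The literal version silently drops '10', '11' and '12', because each of them starts with '1'.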
 
Length Comparators
 
Here is a list of the length comparators:
————————— ———————————————————
Comparator Meaning
————————— ———————————————————
Len=       Equal
Len<>      Not equal
Len>       Greater than
Len>=      Greater than, or equal
Len<       Less than
Len<=      Less than, or equal
————————— ———————————————————
The length of the value on the left side of the comparator is compared with a number on the right side of the comparator. For example:
If $OutData Len= 0 NullLine = 'Yes'
Of course, you could accomplish the same thing with this command:
If $OutData = '' NullLine = 'Yes'
However, in most cases the length comparisons will save you some coding because you will not have to use the Len command to obtain a variable for comparison.
 
Comparing Patterns
 
The Matches comparator compares a value against a pattern that uses “regular expression” syntax (described in the next section). For example:
If MyVar Matches 'c[aou]t' GotMatch = 'Yes'
This will set the variable GotMatch to 'Yes' if MyVar is 'cat', 'cot' or 'cut' (case is ignored).
The pattern must be the second item in the comparison.

In order for the comparison to be “true”, the item being compared to the pattern must match the pattern precisely. The Matches comparator does not look for substrings.

If you want to allow a substring to match, use the Comprises comparator. For example:
If MyVar Comprises 'c[ao]t' GotMatch = 'Yes'
This will set GotMatch to 'Yes' if MyVar includes either 'cat' or 'cot' anywhere within it. Thus, the strings 'He had a cat' and 'He had a cot' both Comprise the pattern, as do the strings 'cat', 'cot', 'Cat', 'scatter' and so on.
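
The distinction between Matches and Comprises resembles the difference between a whole-string match and a substring search in other regular expression tools. The Python sketch below is an analogy only: the functions matches and comprises are hypothetical stand-ins, and Python's re module is assumed to treat these simple patterns the same way Parse-O-Matic does.

```python
import re


def matches(value: str, pattern: str) -> bool:
    """Whole-value match, ignoring case (models the Matches comparator)."""
    return re.fullmatch(pattern, value, re.IGNORECASE) is not None


def comprises(value: str, pattern: str) -> bool:
    """Match anywhere in the value, ignoring case (models the Comprises comparator)."""
    return re.search(pattern, value, re.IGNORECASE) is not None


print(matches("Cut", "c[aou]t"))             # True
print(matches("He had a cat", "c[aou]t"))    # False: extra text around 'cat'
print(comprises("He had a cat", "c[ao]t"))   # True
print(comprises("scatter", "c[ao]t"))        # True: 'cat' is inside 'scatter'
print(comprises("cut", "c[ao]t"))            # False: 'u' is not in the set
```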
 
Regular Expressions
 
A “Regular Expression” is a sequence of characters where certain characters have a special meaning and are not matched literally. For example, a period will match any character (including the period), while the dollar sign ($) matches the end of the line of text.

In the following list, the letters x, y and z stand in for any character.

^xxx      Match a sequence of characters at the start of a line
xxx$      Match a sequence of characters at the end of a line
x.y       Match any single character (between 'x' and 'y' in this example)
[xz]      Match a set of characters ('x' and 'z' in this example)
[x-z]     Match a range of characters (this example covers 'x' to 'z')
x*        Match zero or more occurrences of the preceding character
[xyz]*    Match zero or more occurrences from the preceding set
[x-z]*    Match zero or more occurrences from the preceding range
[^xyz]    Match any character but the ones specified
[^x-z]    Match any character but the ones in the specified range

The backslash (\) character has a special meaning in regular expressions:
\x        Means “take the next character literally”
          For example: \[ means the actual [ character rather
          than the start of a set or range
\t        Means “a tab character” (ASCII character 9)
 
Basic Regular Expressions

Here are some examples of matches:
C.t       Match Cat, Cot, Cut, Cxt, C3t etc.
C[aou]t   Match Cat, Cot, Cut only
B..d      Match Bird, Bred, Bead etc.
^Dog      Match Dog only if it is at the beginning of a line
Moose$    Match Moose only if it is at the end of a line
Pa*d      Match Pd, Pad, Paad, Paaad etc.

Using the Asterisk
 
The last example given above uses the * character to indicate zero, one or more occurrences of a particular character, in this case the letter 'a'. Incidentally, this is different from the way the Windows operating system uses the * wildcard character. In Windows, the * wildcard matches any sequence of characters.

In regular expressions, however, the asterisk is specific about what you are looking for. That is why 'Pa*d' would not match 'Parsed'; the asterisk means “match zero or more of the preceding character specification”. If you actually want to search for 'Pa' followed by one or more letters and then 'd', the correct syntax is:
Pa[a-z][a-z]*d
This means that we want to match 'Pa', then a letter in the range from 'a' to 'z', then some number (including zero) of characters in the 'a' to 'z' range, and finally the letter 'd'. The character string 'Parsed' would meet these criteria, as would 'Paid' and 'Packed'. The string 'Pad' would not, because it has no letter between 'Pa' and the final 'd'.
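
The two patterns discussed above can be checked with any regular expression engine that supports the same basic syntax. The following Python sketch is an illustration using Python's re module, not Parse-O-Matic itself; whole_match is a hypothetical helper, and the comparison here is case-sensitive.

```python
import re


def whole_match(pattern: str, text: str) -> bool:
    """True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


for text in ("Pd", "Pad", "Paaad", "Parsed"):
    print("Pa*d", text, whole_match("Pa*d", text))
# Pd, Pad and Paaad match; Parsed does not.

for text in ("Parsed", "Paid", "Packed", "Pad"):
    print("Pa[a-z][a-z]*d", text, whole_match("Pa[a-z][a-z]*d", text))
# Parsed, Paid and Packed match; Pad does not.
```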
 
 
Advanced Regular Expressions
 
Here are some more complicated examples of regular expressions:
C[^ou]t         Matches Cat, Cxt and so on, but not Cot or Cut
C[ao]*t         Matches Ct, Cat, Caat, Cot, Coot, Cooot, Coat, Coaoat etc.
[0-9][0-9]*     Matches numbers such as 0, 1, 01, 10, 25, 0990, 9999 etc.
-[0-9][0-9]*    Matches negative numbers such as -0, -1, -19, -12345 etc.
In the last example, [0-9] is specified twice to ensure that at least one digit is found. Bear in mind that the * character means “zero or more occurrences”. If you had only specified '-[0-9]*' you would get a spurious match within the string 'Hello -there' since the '-' character is indeed found, followed by zero occurrences of the digits 0 through 9.

You can create fairly complex patterns using regular expressions. Consider this example:
\$[0-9][0-9]*\.[0-9][0-9]
This would match dollar amounts with two decimal places, such as $0.00, $03.23, $3.14, $9.99, $1234.56 and so on.
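
The spurious match and the dollar-amount pattern can both be tried out in Python's re module. This is an illustration only: it assumes these patterns behave the same way in Parse-O-Matic, and found_in and is_exactly are hypothetical helper names.

```python
import re


def found_in(pattern: str, text: str) -> bool:
    """True if the pattern matches somewhere inside the text."""
    return re.search(pattern, text) is not None


def is_exactly(pattern: str, text: str) -> bool:
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# With only '-[0-9]*', the lone '-' in 'Hello -there' counts as a match.
print(found_in(r"-[0-9]*", "Hello -there"))        # True (spurious)
print(found_in(r"-[0-9][0-9]*", "Hello -there"))   # False
print(found_in(r"-[0-9][0-9]*", "Balance: -19"))   # True

dollars = r"\$[0-9][0-9]*\.[0-9][0-9]"
for text in ("$0.00", "$1234.56", "$3.1", "$.50"):
    print(text, is_exactly(dollars, text))
# $0.00 and $1234.56 match; $3.1 and $.50 do not.
```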

(This page is part of the online user manual for Parse-O-Matic.  Parse-O-Matic is a programmable parsing tool that can extract, manipulate, convert or mine existing data sources and turn them into importable data.  For more information on Parse-O-Matic products and conversion services, please visit www.ParseOMatic.com)