Overview
 
A “decapsulator” is a command parameter that defines a search for where a string of characters either begins or ends.

If that definition was not particularly helpful, it is because decapsulators cannot be fully described by a single sentence. But we encourage you to read through this section, because decapsulators are very important in Parse-O-Matic Scripting. Here is the reason why:
 
Decapsulators let a single Parse-O-Matic Scripting command accomplish what might take dozens of commands in a standard programming language.
The underlying concept is this: when analyzing data, the part you are interested in (the “field”) is typically surrounded (“encapsulated”) by some kind of distinctive text. A decapsulator looks for the distinctive text on either side of the data you want and thus helps you extract the field.

Sometimes the “distinctive text” appears more than once in the data you are examining. Decapsulators can handle this situation.

Sometimes one edge of the field is the beginning or end of the data you are examining, so there is no “distinctive text” to look for. Decapsulators can handle this situation, too.
 
Quick Reference
 
Here are some sample decapsulators:
 
-------  ------------------------------  -------------------------
Sample   “From” Decapsulator Meaning     “To” Decapsulator Meaning
-------  ------------------------------  -------------------------
'23'     From column 23 onwards          Up to column 23
'AB'     After first occurrence of 'AB'  Before first 'AB'
'1*CD'   After first occurrence of 'CD'  Before first 'CD'
'5*EF'   After fifth occurrence of 'EF'  Before fifth 'EF'
'<*GH'   After first occurrence of 'GH'  Before first 'GH'
'>*IJ'   After last occurrence of 'IJ'   Before last 'IJ'
''       From left edge of data          Up to right edge of data
'-2'     Two columns in from the right   Same
-------  ------------------------------  -------------------------
Each of these techniques is explained below in more detail.
 
A Simple Example
 
Here is an example of how decapsulators work. Consider the following commands.
 
SourceVar = 'AAABBBCCC'
ResultVar = Parse SourceVar '3*A' '1*C'
The second command means “Set ResultVar to everything between the third occurrence of 'A' and the first occurrence of 'C'.” In other words, ResultVar will end up containing 'BBB'.
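For readers who think in a general-purpose language, the following Python sketch models what the command above does. It is not Parse-O-Matic code and makes no claim about how the engine is implemented; the helper names find_nth and parse_between are made up for illustration, and the sketch assumes that occurrences are counted from left to right.

```python
def find_nth(text, target, n):
    """Return the index of the n-th occurrence (1-based) of target, or -1."""
    pos = -1
    for _ in range(n):
        # Assumption: the search resumes one character after the last match.
        pos = text.find(target, pos + 1)
        if pos == -1:
            return -1
    return pos


def parse_between(text, from_text, from_n, to_text, to_n):
    """Model of Parse with two occurrence-number decapsulators."""
    start = find_nth(text, from_text, from_n)
    end = find_nth(text, to_text, to_n)
    if start == -1 or end == -1:
        return ''
    # The searched-for text itself is left out of the result.
    return text[start + len(from_text):end]


# Equivalent of: ResultVar = Parse SourceVar '3*A' '1*C'
print(parse_between('AAABBBCCC', 'A', 3, 'C', 1))  # BBB
```

Even this simplified model needs two helper functions, which gives a sense of how much work a single decapsulator pair replaces.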
 
Why Decapsulators are Necessary
 
When analyzing data, the fields you are interested in are sometimes arranged in tidy columns, but not always. Quite frequently, a field will start after some kind of delimiter, as in the following example.
SourceVar = 'Mouse,Gazelle,Mouse,Elephant'
Here the fields are separated by commas, a commonly used data format known as CSV (Comma Separated Values).

Extracting, say, the second item from free-form data is rather awkward if you are using a standard programming language. Fortunately, Parse-O-Matic Scripting has been developed with precisely this kind of situation in mind.

Using decapsulators, the Parse command lets you extract the “Nth” item. For example, to extract the third item in the free-form example above, you could use this command:
ResultVar = Parse SourceVar '2*,' '3*,'
This means “Set the variable ResultVar by looking in SourceVar and taking everything between the second comma and the third comma.” ResultVar would thus be set to 'Mouse'.
The name 'Trixie' contains an 'x', so it would be broken down into two scanterms ('Tri' and 'ie'). You should always choose a scanlist delimiter that does not appear in the list of scanterms.
 
Introduction to Occurrence Numbers
 
Let's have another look at that last command.
ResultVar = Parse SourceVar '2*,' '3*,'
The first decapsulator (i.e. the '2*,' part) is the “From” specification. The second decapsulator (i.e. the '3*,' part) is the “To” specification. It is interpreted as follows:
3   means “the third occurrence”
*   marks the end of the occurrence number
,   is the text you are looking for
Decapsulators can be used to find more than a single character. Let's say that (for some odd reason) a variable named xyz has been set such that each field is separated with a pair of X's, as in the following example.
 
xyz = 'mouseXXgazelleXXmouseXXelephant'
 
You can extract the third item with this command:
abc = Parse xyz '2*XX' '3*XX'

abc   Variable to set
xyz   Variable to search
2     “From” occurrence number
XX    “From” text being sought (in '2*XX')
3     “To” occurrence number
XX    “To” text being sought (in '3*XX')
 
This command sets the variable abc to 'mouse', since it is found between the second and third occurrences of XX.
 
Sample Application
 
The Parse command is particularly useful for extracting information from CSV (Comma Separated Value) files. Here is an example of a CSV file:
"Mouse","Gazelle","Mouse","Elephant"
"Dog","Giraffe","Elk","Mongoose"
"Monkey","Snake","Caribou","Trout"
These fields could be extracted with this series of commands:
field1 = Parse $OutData '1*"' '2*"'
field2 = Parse $OutData '3*"' '4*"'
field3 = Parse $OutData '5*"' '6*"'
field4 = Parse $OutData '7*"' '8*"'
For the first line of the input file, field1 is set to 'Mouse', field2 is set to 'Gazelle', and so on.
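The pairing of quote numbers (1 and 2, 3 and 4, and so on) can be checked with a small Python model. This is only an illustration, not part of Parse-O-Matic: the functions find_nth and between are hypothetical helpers, and the three sample lines are hard-coded rather than read from a file.

```python
def find_nth(text, target, n):
    """Return the index of the n-th occurrence (1-based) of target, or -1."""
    pos = -1
    for _ in range(n):
        pos = text.find(target, pos + 1)
        if pos == -1:
            return -1
    return pos


def between(text, target, from_n, to_n):
    """Text between the from_n-th and to_n-th occurrence of target."""
    start = find_nth(text, target, from_n)
    end = find_nth(text, target, to_n)
    if start == -1 or end == -1:
        return ''
    return text[start + len(target):end]


lines = [
    '"Mouse","Gazelle","Mouse","Elephant"',
    '"Dog","Giraffe","Elk","Mongoose"',
    '"Monkey","Snake","Caribou","Trout"',
]

# Field k sits between quote number 2k-1 and quote number 2k.
rows = [[between(line, '"', 2 * k - 1, 2 * k) for k in range(1, 5)]
        for line in lines]

for row in rows:
    print(row)
```

The first row printed is ['Mouse', 'Gazelle', 'Mouse', 'Elephant'], matching the values described above for field1 through field4.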
 
Occurrence Number Syntax
 
Occurrence numbers must be larger than zero. The following lines are not valid Parse commands:
field1 = Parse $OutData '0*"' '2*"'    ; "From" decapsulator is zero
field2 = Parse $OutData '-1*"' '2*"'   ; "From" decapsulator is negative
The occurrence number must always be followed by an asterisk (the * character). This makes it possible for the search text itself to be a number. Consider the following example (the meaning of which would be unclear without the asterisk):
MyVar = Parse 'xxx2yyy2zzz2' '1*2' '2*2'
This sets MyVar to the text occurring between the first '2' and the second '2'. In other words, MyVar is set to 'yyy'.
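The role of the asterisk can be illustrated with a Python sketch that splits a decapsulator into its occurrence number and its search text. This is a simplified model written for this explanation, not the engine's own parser: split_decapsulator and find_nth are hypothetical names, and the sketch deliberately ignores the other decapsulator forms covered in the sections that follow.

```python
import re


def split_decapsulator(spec):
    """Split 'N*text' into (N, text). Anything else is treated as plain text."""
    match = re.fullmatch(r'(-?\d+)\*(.*)', spec, flags=re.DOTALL)
    if match is None:
        return 1, spec  # assumption: a plain string means "first occurrence"
    count = int(match.group(1))
    if count < 1:
        raise ValueError('occurrence number must be larger than zero')
    return count, match.group(2)


def find_nth(text, target, n):
    """Return the index of the n-th occurrence (1-based) of target, or -1."""
    pos = -1
    for _ in range(n):
        pos = text.find(target, pos + 1)
        if pos == -1:
            return -1
    return pos


source = 'xxx2yyy2zzz2'
from_n, from_text = split_decapsulator('1*2')   # (1, '2')
to_n, to_text = split_decapsulator('2*2')       # (2, '2')
start = find_nth(source, from_text, from_n) + len(from_text)
end = find_nth(source, to_text, to_n)
print(source[start:end])  # yyy
```

Because everything before the asterisk is the count and everything after it is the search text, '1*2' is unambiguous even though both parts are digits.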
 
Finding the First and Last Occurrence
 
A decapsulator can refer to “the last occurrence”:
xyz = Parse 'AaaBAbbBAccB' '>*A' '>*B'
In both decapsulators, the > symbol means “the last occurrence”. Thus, the command means, “Set the xyz variable to everything between the last A and the last B”, and the xyz variable is set to 'cc'.
You can also use the < character to mean “the first occurrence”, though this is somewhat redundant, since the following commands are equivalent:
abc =Parse 'AaaBAbbBAccB' '<*A' '<*B'
abc =Parse 'AaaBAbbBAccB' '1*A' '1*B'
abc =Parse 'AaaBAbbBAccB' 'A'   'B'
All three commands would set the abc variable to 'aa'.
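In a general-purpose language, “first” and “last” correspond to searching from the left and from the right. The Python sketch below models just these forms; locate and parse are hypothetical names, and only the '<*', '>*', '1*' and plain forms are handled.

```python
def locate(text, spec):
    """Return (index, search_text) for '<*x', '1*x', '>*x' or plain 'x'."""
    if spec.startswith('>*'):
        target = spec[2:]
        return text.rfind(target), target   # last occurrence
    if spec.startswith(('<*', '1*')):
        target = spec[2:]
        return text.find(target), target    # first occurrence
    return text.find(spec), spec            # plain string: first occurrence


def parse(text, from_spec, to_spec):
    """Text between the two located strings, excluding the strings themselves."""
    start, from_text = locate(text, from_spec)
    end, to_text = locate(text, to_spec)
    if start == -1 or end == -1:
        return ''
    return text[start + len(from_text):end]


data = 'AaaBAbbBAccB'
print(parse(data, '>*A', '>*B'))  # cc
print(parse(data, '<*A', '<*B'))  # aa
print(parse(data, '1*A', '1*B'))  # aa
print(parse(data, 'A', 'B'))      # aa
```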
 
Finding the Next Occurrence
 
When using occurrence numbers for certain kinds of data, you will often find that the “To” occurrence number is 1 (one) more than the “From” occurrence number. Consider this example:
xyz = 'AB,CD,EF,GH'
Field1 = Parse xyz '' '1*,'
Field2 = Parse xyz '1*,' '2*,'
Field3 = Parse xyz '2*,' '3*,'
For Field3 you are extracting everything between the second and third comma. It can become tiresome to write code like this, always adding one to the “From” occurrence number. Fortunately, you can use the “next occurrence” symbol '@*' in the “To” decapsulator:
xyz = 'AB,CD,EF,GH'
abc = Parse xyz '2*,' '@*,'
This will set the “From” position to the second comma, and the “To” position to the comma after that (i.e. the third one), so abc is set to 'EF'. The '@*' symbol means “Look for the To text starting immediately after the From text”.

Note: The “next occurrence” symbol (@*) can only be used in the “To” decapsulator.
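The “next occurrence” idea amounts to starting the second search where the first one ended. Here is a minimal Python model of that behaviour; parse_next and find_nth are hypothetical names, and the sketch covers only the case where both decapsulators look for the same separator.

```python
def find_nth(text, target, n):
    """Return the index of the n-th occurrence (1-based) of target, or -1."""
    pos = -1
    for _ in range(n):
        pos = text.find(target, pos + 1)
        if pos == -1:
            return -1
    return pos


def parse_next(text, sep, from_n):
    """Model of: Parse text 'N*sep' '@*sep'."""
    start = find_nth(text, sep, from_n)
    if start == -1:
        return ''
    begin = start + len(sep)
    # '@*' : look for the "To" text starting right after the "From" text.
    end = text.find(sep, begin)
    if end == -1:
        return ''
    return text[begin:end]


xyz = 'AB,CD,EF,GH'
print(parse_next(xyz, ',', 2))  # EF
print(parse_next(xyz, ',', 1))  # CD
```

Note that asking for the field after the third comma returns an empty string in this model, because no fourth comma follows it; the last field needs a different “To” decapsulator.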
 
Positional Decapsulators
 
Note: Positional decapsulators imply that operations proceed from or to the exact character position indicated, regardless of the control settings.

You can specify a number to indicate the “From” or “To” character position. In this mode, the Parse command behaves exactly like the Cols command. Thus, the following two commands accomplish the same thing:
xyz = Parse MyVar '10' '20'
xyz = Cols MyVar '10' '20'
As such, this is not particularly helpful. However, you can combine positional decapsulators with other types of decapsulators, as in this example:

MyVar = 'ABCD/abcd/'
abc = Parse MyVar '3' '1*/'
 
This will set the variable abc to 'CD'.
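The mixed case can be modelled in Python as follows. This is an illustrative sketch only: parse_from_column is a hypothetical name, columns are taken to be 1-based as in the example, and the character at the stated column is included while the searched-for text is not.

```python
def parse_from_column(text, from_col, to_text):
    """Model of: Parse text 'from_col' '1*to_text'."""
    start = from_col - 1            # 1-based column to 0-based index
    end = text.find(to_text)        # first occurrence of the "To" text
    if end == -1:
        return ''
    return text[start:end]


# Equivalent of: abc = Parse MyVar '3' '1*/'
print(parse_from_column('ABCD/abcd/', 3, '/'))  # CD
```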
 
Negative Positional Decapsulators
 
You can also count backwards from the right edge of the data. Consider this example:
MyVar = 'ABCDEFG'
xyz = Parse MyVar '-3' '-2'
This will set the variable xyz to 'EF'. (The last character in a variable is represented by position '-1'.)
 
Using Positional Decapsulators Safely
 
You need to be careful when you use positional decapsulators. If, for example, you use a negative positional decapsulator, and you end up referring to a character before the beginning of the string, it isn't clear to the Parse-O-Matic engine what you “meant” by that. (In all likelihood, you didn't mean anything; these situations sometimes arise if you have not considered all possible variations in format of the input data.)

For the reason just noted, and others that will become evident as you write scripts: if there is a chance that a positional decapsulator will refer to a character position of zero or less, or if it might refer to a position beyond the end of the data, your script should check the length of the data before trying the command.
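The kind of length check recommended here can be sketched in Python. The function safe_cols is a hypothetical helper, not a Parse-O-Matic command, and returning None for an out-of-range request is a choice made for this illustration; it says nothing about what the engine itself does in that situation.

```python
def safe_cols(text, from_col, to_col):
    """Return columns from_col..to_col (inclusive), or None if out of range.

    Positive columns count from the left (1 = first character);
    negative columns count from the right (-1 = last character).
    """
    def to_index(col):
        if col > 0:
            index = col - 1
        elif col < 0:
            index = len(text) + col
        else:
            return None                 # column zero is never valid
        return index if 0 <= index < len(text) else None

    start, end = to_index(from_col), to_index(to_col)
    if start is None or end is None or start > end:
        return None
    return text[start:end + 1]


print(safe_cols('ABCDEFG', -3, -2))  # EF
print(safe_cols('AB', -3, -2))       # None: the data is too short
print(safe_cols('ABCDEFG', 10, 20))  # None: beyond the end of the data
```

In a script, the equivalent test is to compare the length of the variable with the positions you intend to use before issuing the command.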
 
The Plain Decapsulator
 
The occurrence number is not always needed. Either the “From” or “To” decapsulator can be represented as a plain (non-numeric) string, as in the following example.
OldVar = 'zzzABChelloXYZzzz'
NewVar = Parse OldVar 'ABC' 'XYZ'
This would set the variable named NewVar to 'hello' since it means:
1. Copy from the character following the first 'ABC'
2. Copy up to the character preceding the first 'XYZ'
This is, of course, equivalent to the following command, which uses occurrence numbers:
NewVar = Parse OldVar '1*ABC' '1*XYZ'
In general, it is best to explicitly give occurrence numbers, unless you know that the format of the data is not going to change.
 
Unsuccessful Searches
 
When a command that uses decapsulators does not find the search text, it does as little as possible. For example, if a Parse command does not find the encapsulating text, it sets the variable to a null (''). Here are two examples:
abc = Parse 'ABCDEFGHIJ' '1*K' '1*J' ; There is no 'K'
abc = Parse 'ABCDEFGHIJ' '1*A' '1*X' ; There is no 'X'
To illustrate this principle further: if the Overlay command does not find the search text, it does nothing at all, as in the following example.
abc = 'ABCDEFGHIJ' ; Set a variable
Overlay abc 'K' 'LMNOP' ; There is no 'K', so nothing is done
If the “From” position comes after the “To” position, the Parse-O-Matic engine will display an error message, then terminate further processing. For example:
abc = Parse 'ABCDEFGHIJ' '1*J' '1*A' ; 'J' comes after 'A'
This kind of failure typically happens if the data contains an odd arrangement of text that you had not foreseen. In such a case, it would not be reasonable for processing to continue; you need to be warned about departures from what your script implies you expected.
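The two outcomes described in this section (an empty result when the text is not found, and a hard stop when the positions are out of order) can be modelled in Python. This is a sketch under stated assumptions: parse and find_nth are hypothetical names, and a Python exception stands in for the engine's error message and termination.

```python
def find_nth(text, target, n):
    """Return the index of the n-th occurrence (1-based) of target, or -1."""
    pos = -1
    for _ in range(n):
        pos = text.find(target, pos + 1)
        if pos == -1:
            return -1
    return pos


def parse(text, from_text, from_n, to_text, to_n):
    start = find_nth(text, from_text, from_n)
    end = find_nth(text, to_text, to_n)
    if start == -1 or end == -1:
        return ''                       # search text missing: null result
    if start > end:
        # Stand-in for "display an error message, then terminate".
        raise ValueError('"From" text was found after "To" text')
    return text[start + len(from_text):end]


print(repr(parse('ABCDEFGHIJ', 'K', 1, 'J', 1)))  # '' (there is no 'K')
print(repr(parse('ABCDEFGHIJ', 'A', 1, 'X', 1)))  # '' (there is no 'X')

try:
    parse('ABCDEFGHIJ', 'J', 1, 'A', 1)
except ValueError as error:
    print('stopped:', error)
```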
 
The Control Setting
 
Commands that use decapsulators typically have a “control setting” that allows you to adjust the way the command is performed. A few examples follow.

The Parse command's control setting tells Parse whether to include or exclude the surrounding (i.e. searched-for) text. By default, the surrounding text is excluded (unless the decapsulator is positional). However, if you want to include it, you can add 'Include' at the end of the Parse command, as in this example:
xyz = Parse 'aXcaYcaZc' '2*a' '2*c' 'Include'
This tells the command to give you everything between the second 'a' and the second 'c', including the 'a' and 'c'. In other words, this sets the variable xyz to 'aYc'.

You can also set the Control specification to 'Exclude', though since this is the default setting for Parse, it isn't necessary. Here is an example:
xyz = Parse 'a1ca2ca3c' '2*a' '2*c' 'Exclude'
This sets the variable xyz to '2'.

You can specify several control settings at once, separated by spaces. By default, the Parse command's control setting is 'Exclude MatchCase' but you could set this to (for example) 'Include IgnoreCase'.
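The effect of the control settings can be pictured with the Python model below. It is only a sketch: parse and find_nth are hypothetical names, the exact behaviour of 'IgnoreCase' is assumed to be a simple case-insensitive search, and the lower-casing trick used here is adequate for plain ASCII data only.

```python
def find_nth(text, target, n):
    """Return the index of the n-th occurrence (1-based) of target, or -1."""
    pos = -1
    for _ in range(n):
        pos = text.find(target, pos + 1)
        if pos == -1:
            return -1
    return pos


def parse(text, from_text, from_n, to_text, to_n, control='Exclude MatchCase'):
    settings = control.lower().split()      # several settings, space-separated
    haystack, f, t = text, from_text, to_text
    if 'ignorecase' in settings:
        # Assumption: compare lower-cased copies (safe for ASCII data).
        haystack, f, t = text.lower(), from_text.lower(), to_text.lower()
    start = find_nth(haystack, f, from_n)
    end = find_nth(haystack, t, to_n)
    if start == -1 or end == -1:
        return ''
    if 'include' in settings:
        return text[start:end + len(to_text)]   # keep the surrounding text
    return text[start + len(from_text):end]     # drop the surrounding text


print(parse('aXcaYcaZc', 'a', 2, 'c', 2, 'Include'))             # aYc
print(parse('a1ca2ca3c', 'a', 2, 'c', 2, 'Exclude'))             # 2
print(parse('aXcAYCaZc', 'a', 2, 'c', 2, 'Include IgnoreCase'))  # AYC
```

The last line shows the two settings working together: with case ignored, the second 'a' is the upper-case 'A', and 'Include' keeps both delimiters in the result.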
 
The Null Decapsulator
 
Here is a helpful variation of the “From” decapsulator:
'' means “Start from the first character in the value being analyzed”
A similar variation can be used with the “To” decapsulator:
'' means “End with the last character in the value being analyzed”
If you use the null ('') decapsulator for “From” or “To”, the “found” value (the first character for “From”, or the last character for “To”) will always be included (see the section “Overlapping Decapsulators” for an exception to this rule). Here is an example:
xyz = Parse 'ABCABCABC' '' '2*C'
This sets the variable xyz to 'ABCAB'. The “From” value (i.e. the first character) is not excluded. However, when Parse finds the “To” value (i.e. the second occurrence of the letter C) it is excluded. If you want to include the second 'C', you should write the command this way:
xyz = Parse 'ABCABCABC' '' '2*C' 'Include'
Incidentally, the following two commands accomplish the same thing:
xyz = Parse 'ABCD' '' ''
xyz = 'ABCD'
They are equivalent because the Parse command means “Set the variable xyz with everything between (and including) the first character and the last character”.
 
Why Null Decapsulators Work Differently
 
It may not be immediately obvious why decapsulator-enabled commands treat the null ('') decapsulator differently. The examples given here are very simple, and not representative of real-world applications.

In day-to-day usage, though, you will frequently find it helpful to be able to specify a command that says, “Give me everything from the beginning of the line to just before such-and-such” or “Give me everything from such-and-such a point until the end of the line.”

For example, here is a command that means “Give me everything from just after the dollar sign, to the end of the line”:
xyz = Parse 'Please give me $199.00' '1*$' ''
This sets xyz to '199.00'. If you want to include the dollar sign, write the command this way:
xyz = Parse 'Please give me $199.00' '1*$' '' 'Include'
In this example, the 'Include' control setting affects the way the “From” decapsulator works, since it is using an occurrence number. The null decapsulator is not affected.
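One way to picture the null decapsulator is as “the edge of the data” rather than as a search. The Python sketch below models this; it is an illustration with hypothetical names (parse, find_nth), in which None stands for the null decapsulator and a (text, number) pair stands for an occurrence-number decapsulator.

```python
def find_nth(text, target, n):
    """Return the index of the n-th occurrence (1-based) of target, or -1."""
    pos = -1
    for _ in range(n):
        pos = text.find(target, pos + 1)
        if pos == -1:
            return -1
    return pos


def parse(text, frm, to, include=False):
    """frm / to: None for the null decapsulator, else a (text, n) pair."""
    if frm is None:
        start = 0                               # left edge, always kept
    else:
        target, n = frm
        pos = find_nth(text, target, n)
        if pos == -1:
            return ''
        start = pos if include else pos + len(target)
    if to is None:
        end = len(text)                         # right edge, always kept
    else:
        target, n = to
        pos = find_nth(text, target, n)
        if pos == -1:
            return ''
        end = pos + len(target) if include else pos
    return text[start:end]


print(parse('ABCABCABC', None, ('C', 2)))                       # ABCAB
print(parse('ABCABCABC', None, ('C', 2), include=True))         # ABCABC
print(parse('Please give me $199.00', ('$', 1), None))          # 199.00
print(parse('Please give me $199.00', ('$', 1), None, True))    # $199.00
print(parse('ABCD', None, None))                                # ABCD
```

In this model the include flag only ever changes the side that performed a search, which mirrors the statement that the null decapsulator is not affected by the control setting.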
 
Overlapping Decapsulators
 
Earlier, it was mentioned that the text found by the null decapsulator is “always included” and is not affected by the 'Exclude' control setting. There is an exception to this: if the null decapsulator's “found text” is contained in the text found by the other decapsulator, it can be affected. For example:
xyz = Parse 'ABCDEFABCDEF' '' '1*AB' 'Exclude'
This command means “Give me everything between the first character and the first occurrence of AB”. Since the two items overlap (i.e. the first 'AB' includes the first character), the first character does indeed get excluded. As a result, the xyz variable is set to an empty string ('').

Here is another example.
xyz = Parse 'ABCDEFABCDEF' '>*F' '' 'Exclude'
This command means “Give me everything between the last occurrence of F and the last character”. Both decapsulators refer to the same character (i.e. the final 'F'), so it is excluded. As a result, the xyz variable is set to an empty string ('').

Note: In some circumstances, the FindPosn command is not affected by this exception. It will do its best to make sense of your request if the decapsulators overlap and one of them is a null decapsulator.
 
Parsing Empty Fields
 
Consider the following command, which is operating on CSV (Comma Separated Value) data.
xyz = Parse ',,,JOHN,SMITH' '2*,' '3*,'
There is nothing between the second and third comma, so the xyz variable is set to '' (an empty string).
Now consider this command:
xyz = Parse ',,,JOHN,SMITH' '' ','
You are asking for everything from the first character to the first comma (which also happens to be the first character). Obviously, there is nothing “between” the two characters, so the xyz variable would be set to '' (an empty string). This may be what you wanted, but whenever you are dealing with a field at the beginning or end of data, and there is a chance the field might be empty, it is a good idea to test your script to make sure that it does what you expect.

(This page is part of the online user manual for Parse-O-Matic.  Parse-O-Matic is a programmable parsing tool that can extract, manipulate, convert or mine existing data sources and turn them into importable data.  For more information on Parse-O-Matic products and conversion services, please visit www.ParseOMatic.com)