The third argument of sub and gsub serves as a replacement pattern that may use some specially interpreted characters to stand for all or part of the matched string.
Detailed rules for forming and interpreting regular expressions and replacement patterns are given below. Like many such constructions, however, they are often better learned by example; tables of examples follow the formal rule statements.
A regular expression may contain any character except "newline". The specially interpreted metacharacters are:
. * + ? [ ] ( ) | \ ^ $

In formal terms, the syntax for a regular expression e0 is given by:

e3:         literal
            charclass
            ^
            $
            .
            ( e0 )
literal:    non-metacharacter
            \ metacharacter
charclass:  [ class-string ]
e2:         e3
            e2 repeater
repeater:   *
            +
            ?
e1:         e2
            e1 e2
e0:         e1
            e0 | e1

A literal matches one character, either itself (if not a metacharacter) or the metacharacter that follows \.
A charclass matches any character in class-string, with exceptions for two characters that are specially interpreted:
- If the first character is ^, the charclass matches any character (except newline) not in class-string.
- Any substring c1-c2 matches all characters in the range from c1 to c2, in the standard Unicode sort ordering (which, for ASCII characters, coincides with the ASCII sort ordering).
For example, '[a-z]' matches all lower-case letters, and '[^a-zA-Z_]' matches all characters except letters and underscore. (It is optional to put a \ before a metacharacter in a class-string, except before - and ] and before ^ at the beginning of the string.)
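The two character-class rules can be tried out in a short sketch. It uses Python's re module as a stand-in for AMPL's matcher, which is an assumption made only for illustration: the class syntax coincides for these cases, but Python's negated classes also match newline, an edge case not exercised here. The sample strings are likewise made up.

```python
import re

# Assumption: Python's re module stands in for AMPL's matcher.
lower = re.compile(r"[a-z]")          # range c1-c2: all lower-case letters
not_ident = re.compile(r"[^a-zA-Z_]") # leading ^: anything but letters and _

# Collect every single-character match in two sample strings.
lower_hits = lower.findall("AMPL book")
other_hits = not_ident.findall("x_1 = y2")

print(lower_hits)
print(other_hits)
```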
The following characters match in special ways:
- . matches any character other than newline.
- ^ matches the beginning of a string.
- $ matches the end of a string.
The repeater operators match some number of instances of the preceding regular expression e2:
- e2* matches zero or more instances of e2.
- e2+ matches one or more instances of e2.
- e2? matches zero or one instance of e2.
A concatenated regular expression, e1 e2, matches a match to e1 followed by a match to e2.
An alternative regular expression, e0 | e1, matches either a match to e0 or a match to e1.
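As a rough check of the repeater, concatenation and alternation rules, the sketch below tests whole strings against a few patterns. Python's re module is again only a stand-in for AMPL's matcher, and matches_whole is a hypothetical helper, not an AMPL function.

```python
import re

def matches_whole(pattern, text):
    """Return True if the entire text is a match to pattern.

    Hypothetical helper; Python's re is assumed to agree with AMPL here.
    """
    return re.fullmatch(pattern, text) is not None

# * allows zero instances, + requires at least one, ? allows at most one.
print(matches_whole(r"b.*k", "bk"))    # zero characters between b and k
print(matches_whole(r"b.+k", "bk"))    # + needs at least one character
print(matches_whole(r"b.?k", "book"))  # ? permits only one character

# Alternation accepts a match to either side; parentheses group it.
print(matches_whole(r"b.k|b..k", "back"))
print(matches_whole(r"a(b.k|b..k)", "aback"))
```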
Within a given string, a regular expression may match more than one substring. In such a case the longest match, roughly speaking, is taken. More precisely, a match to any part of a regular expression extends as far as possible without preventing a match to the remainder of the regular expression.
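The "as far as possible" rule can be seen with a repeater in a sketch like the following. It assumes that Python's re module behaves like AMPL's matcher for repeaters. The two may differ in how they choose among alternatives, so alternation is deliberately left out. The sample text is made up.

```python
import re

text = "book and breaks"

# .* extends as far as it can, then gives back the final "s" so that
# the trailing k can still match.
longest = re.search(r"b.*k", text).group(0)

# [a-z]* cannot cross the space, so the match ends at the first word.
first_word = re.search(r"b[a-z]*k", text).group(0)

print(longest)
print(first_word)
```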
Regular expressions are shown quoted, as they would appear when given as arguments to sub, gsub or match. As AMPL string constants, they may be delimited by a pair of single quotes (') or double quotes ("). Within a string delimited by single quotes, a single quote is represented by two single quotes, and similarly for double quotes.
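The quoting rule is mechanical enough to sketch. The function below, ampl_quote, is a hypothetical helper written in Python (it is not part of AMPL) that produces an AMPL string constant from raw text by doubling any occurrence of the chosen delimiter.

```python
def ampl_quote(text, delimiter="'"):
    """Write text as an AMPL string constant (hypothetical helper).

    The delimiter may be a single or a double quote; an occurrence of
    the delimiter inside the text is represented by two of them.
    """
    if delimiter not in ("'", '"'):
        raise ValueError("delimiter must be a single or double quote")
    return delimiter + text.replace(delimiter, delimiter * 2) + delimiter

print(ampl_quote("AMPL's syntax"))       # single-quoted form
print(ampl_quote("AMPL's syntax", '"'))  # double-quoted form
```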
Regular expression    Matches
'AMPL book'           AMPL book
'AMPL''s syntax'      AMPL's syntax
"AMPL's syntax"       AMPL's syntax
'book \(\$62\.50\)'   book ($62.50)
'[ampl]'              a or m or p or l
'^sa'                 sa at start of salsa
'sa$'                 sa at end of salsa
'b..k'                book or back or beak or b23k etc.
'b.*k'                bk or b3k or book or break etc.
'b[a-z]*k'            bk or book or break etc.
'b.+k'                b3k or book or break etc.
'b[aeiou]+k'          buk or book or beak etc.
'b.?k'                bk or bak or b3k or b_k etc.
'b.k|b..k'            bak or b3k or back or b32k etc.
'a(b.k|b..k)'         ab3k or aback etc.
Substitution                                              Result
sub('replacement','e','X')                                'rXplacement'
gsub('replacement','e','X')                               'rXplacXmXnt'
gsub('replacement','e','')                                'rplacmnt'
gsub('replacement','([ae])','{&}')                        'r{e}pl{a}c{e}m{e}nt'
sub('replacement','e([a-z]*)e([a-z]*)e','e{\2}e{\1}e')    're{m}e{plac}ent'
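The substitution rows can be reproduced outside AMPL with a small sketch. The functions ampl_sub and ampl_gsub are hypothetical Python helpers, not AMPL's implementation. They assume that & in a replacement pattern stands for the whole match and that \1, \2 stand for the parenthesized parts, as the rows suggest. Any escaping rule for a literal & is not handled.

```python
import re

def _to_python_replacement(replacement):
    # Assumption: '&' means "the whole match"; Python spells that \g<0>.
    # Group references such as \1 and \2 are written the same way in both.
    return replacement.replace("&", r"\g<0>")

def ampl_sub(text, pattern, replacement):
    """Replace the first match only (hypothetical analogue of sub)."""
    return re.sub(pattern, _to_python_replacement(replacement), text, count=1)

def ampl_gsub(text, pattern, replacement):
    """Replace every match (hypothetical analogue of gsub)."""
    return re.sub(pattern, _to_python_replacement(replacement), text)

print(ampl_sub("replacement", "e", "X"))
print(ampl_gsub("replacement", "([ae])", "{&}"))
print(ampl_sub("replacement", r"e([a-z]*)e([a-z]*)e", r"e{\2}e{\1}e"))
```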