Draft Writeup of April 22, 1996

CHARACTER STRINGS

Set members like "coils" and "BEEF" are treated by AMPL as strings of characters. Option values and filenames are also treated as strings in AMPL. New string functions, operators and conventions greatly expand the power and flexibility of the AMPL language in specifying strings.

This writeup first considers general-purpose string manipulation features, including the new string concatenation operator and a variety of new string functions, and then the use of string expressions in AMPL commands. Next we describe additional new functions and conventions for converting numbers to strings and strings to numbers, and for converting between characters and character codes. The final section describes some changes that make the display of strings more natural and flexible.

The following table summarizes the concatenation operator and all of the functions for string handling, in the order in which they are introduced below. Arguments beginning with s are strings, those beginning with i are integers, and those beginning with e are arbitrary numeric or symbolic expressions.

Syntax                Return type   Return value
s1 & s2               string        concatenation of s1 and s2
length(s1)            number        number of characters in s1
match(s1,s2)          number        first position where s2 matches a substring in s1 (or 0 if it never appears)
substr(s1,i2)         string        substring of s1 beginning at the position given by i2 and extending to the end of s1
substr(s1,i2,i3)      string        substring of s1 beginning at the position given by i2 and having a length of i3
sub(s1,s2,s3)         string        result of substituting s3 for the first match of s2 in s1
gsub(s1,s2,s3)        string        result of substituting s3 for all matches of s2 in s1
sprintf(s1,e2,...)    string        formatted string, the same as would be written by printf s1,e2,...
num(s1)               number        decimal number represented by s1
num0(s1)              number        decimal number represented by s1, ignoring extraneous characters
char(i1)              string        string of one character whose Unicode value is i1
ichar(s1)             number        Unicode value of the first character in s1

String functions and operators

The concatenation operator & takes two strings as operands, and returns a string consisting of the left operand followed by the right operand. For example, given the sets NUTR and FOOD defined by diet.mod and diet2.dat (Figures 2-1 and 2-3), you could use concatenation to define a set NUTR_FOOD whose members represent nutrient-food pairs:

	ampl: model diet.mod;
	ampl: data diet2.dat;

	ampl: display NUTR, FOOD;
	set NUTR := A B1 B2 C NA CAL;
	set FOOD := BEEF CHK FISH HAM MCH MTL SPG TUR;

	ampl: set NUTR_FOOD := setof {i in NUTR, j in FOOD} i & "_" & j;

	ampl: display NUTR_FOOD;
	set NUTR_FOOD :=
	A_BEEF     B1_BEEF    B2_BEEF    C_BEEF     NA_BEEF    CAL_BEEF
	A_CHK      B1_CHK     B2_CHK     C_CHK      NA_CHK     CAL_CHK
	A_FISH     B1_FISH    B2_FISH    C_FISH     NA_FISH    CAL_FISH
	A_HAM      B1_HAM     B2_HAM     C_HAM      NA_HAM     CAL_HAM
	A_MCH      B1_MCH     B2_MCH     C_MCH      NA_MCH     CAL_MCH
	A_MTL      B1_MTL     B2_MTL     C_MTL      NA_MTL     CAL_MTL
	A_SPG      B1_SPG     B2_SPG     C_SPG      NA_SPG     CAL_SPG
	A_TUR      B1_TUR     B2_TUR     C_TUR      NA_TUR     CAL_TUR;
This is not a set that you would normally want to define, but it might be useful if you have to read data in which strings like "B2_BEEF" appear (see the example below).

The length function takes a string as argument and returns the number of characters in it. The match function takes two string arguments, and returns the first position where the second appears as a substring in the first -- or zero if the second never appears as a substring in the first. For example:

	ampl: display {j in FOOD} (length(j), match(j,"T"));

	:    length(j) match(j, 'T')    :=
	BEEF      4           0
	CHK       3           0
	FISH      4           0
	HAM       3           0
	MCH       3           0
	MTL       3           2
	SPG       3           0
	TUR       3           1
	;
The substr function takes a string and one or two integers as arguments. It returns a substring of the first argument that begins at the position given by the second argument; it has the length given by the third argument, or extends to the end of the string if no third argument is given. For instance:
	ampl: display {j in FOOD} (substr(j,1,2), substr(j,3));

	:    substr(j, 1, 2) substr(j, 3)    :=
	BEEF   BE              EF
	CHK    CH              K
	FISH   FI              SH
	HAM    HA              M
	MCH    MC              H
	MTL    MT              L
	SPG    SP              G
	TUR    TU              R
	;
An empty string is returned if the second argument is greater than the length of the first argument, or if the third argument is less than 1.
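
The two empty-string cases can be checked directly at the prompt. The following is a minimal sketch that uses only literal strings; the brackets in the format string are our own addition, there only to make an empty result visible.

```ampl
# Start position 5 is past the end of the 4-character string "BEEF".
printf "[%s]\n", substr("BEEF", 5);

# A requested length of 0 is less than 1.
printf "[%s]\n", substr("BEEF", 2, 0);
```

By the rule just stated, both commands should print an empty pair of brackets.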

As an example of the use of several of these functions, suppose that you want to use the model from diet.mod and to supply the nutrition amount data in a table like this:

	param: NUTR_FOOD: amt_nutr :=

	          A_BEEF      60
	          B1_BEEF     10
	          CAL_BEEF   295
	          CAL_CHK    770  ...
Then in addition to the declarations for the parameter amt used in the model,

	set NUTR;
	set FOOD;
	param amt {NUTR,FOOD} >= 0;
you would declare a set and a parameter to hold the data from the "nonstandard" table:

	set NUTR_FOOD;
	param amt_nutr {NUTR_FOOD} >= 0;
To use the model, you need to write an assignment of some kind to get the data from set NUTR_FOOD and parameter amt_nutr into sets NUTR and FOOD and parameter amt. One solution is to extract the sets first, and then convert the parameters:

	set NUTR := setof {ij in NUTR_FOOD} substr(ij,1,match(ij,"_")-1);
	set FOOD := setof {ij in NUTR_FOOD} substr(ij,match(ij,"_")+1);

	param amt {i in NUTR, j in FOOD} := amt_nutr[i & "_" & j];
As an alternative, you can extract the sets and parameters together, by use of an AMPL script such as the following:
	param iNUTR symbolic;
	param jFOOD symbolic;
	param upos > 0;

	let NUTR := {};
	let FOOD := {};
	
	for {ij in NUTR_FOOD} {
	   let upos := match(ij,"_");
	   let iNUTR := substr(ij,1,upos-1);
	   let jFOOD := substr(ij,upos+1);

	   let NUTR := NUTR union {iNUTR};
	   let FOOD := FOOD union {jFOOD};
	   let amt[iNUTR,jFOOD] := amt_nutr[ij];
	}
Under either alternative, errors such as a missing "_" in a member of NUTR_FOOD are eventually signaled by error messages.
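
If you would rather catch a malformed member as soon as the data are read, one possibility is an explicit check statement placed after the declaration of NUTR_FOOD. This is a sketch of our own and not part of diet.mod; it assumes that every valid member has at least one character on each side of the underscore.

```ampl
# Require an underscore that is neither the first nor the last character
# of each member of NUTR_FOOD.
check {ij in NUTR_FOOD}:
   match(ij,"_") > 1 and match(ij,"_") < length(ij);
```

A member such as "B2BEEF" gives match(ij,"_") = 0 and so should fail the check.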

For completeness, AMPL also provides two functions to make substitutions in a string. Both sub and gsub take three strings as arguments. Both return the string that results when the third argument is substituted for the second in the first; sub substitutes only for the first occurrence of the second argument, while gsub substitutes for all occurrences. For example, to replace each underscore in the membership of set NUTR_FOOD above with two hyphens,

	let NUTR_FOOD := setof {ij in NUTR_FOOD} sub(ij, '_', '--');

	ampl: display NUTR_FOOD;
	set NUTR_FOOD :=
	A--BEEF    B1--BEEF   B2--BEEF   C--BEEF    NA--BEEF   CAL--BEEF
	A--CHK     B1--CHK    B2--CHK    C--CHK     NA--CHK    CAL--CHK
	A--FISH    B1--FISH   B2--FISH   C--FISH    NA--FISH   CAL--FISH
	A--HAM     B1--HAM    B2--HAM    C--HAM     NA--HAM    CAL--HAM
	A--MCH     B1--MCH    B2--MCH    C--MCH     NA--MCH    CAL--MCH
	A--MTL     B1--MTL    B2--MTL    C--MTL     NA--MTL    CAL--MTL
	A--SPG     B1--SPG    B2--SPG    C--SPG     NA--SPG    CAL--SPG
	A--TUR     B1--TUR    B2--TUR    C--TUR     NA--TUR    CAL--TUR;
If the second argument has no occurrences in the first, then sub or gsub returns the first argument unchanged.
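
The difference between sub and gsub shows up only when the second argument occurs more than once. Here is a small sketch that uses a hypothetical string "B2_BEEF_RAW", which is not a member of any set in the diet data:

```ampl
# sub replaces only the first underscore.
printf "%s\n", sub("B2_BEEF_RAW", "_", "--");

# gsub replaces every underscore.
printf "%s\n", gsub("B2_BEEF_RAW", "_", "--");
```

Following the description above, the first command should print B2--BEEF_RAW and the second B2--BEEF--RAW.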

In match, sub and gsub, the second argument is actually taken to represent a "regular expression"; if it contains certain special characters, it is interpreted as a pattern that may match many sub-strings. The pattern "^B[0-9]+_", for example, matches any sub-string consisting of a B followed by one or more digits and then an underscore, and occurring at the beginning of a string. To use this feature, see the separate description of regular expression rules.
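
As a brief illustration, the pattern quoted above can be tried on literal strings. This sketch assumes only the behavior described in this section; the strings are taken from the NUTR_FOOD example.

```ampl
# The pattern is anchored at the beginning, so a match can only start at position 1.
printf "%d\n", match("B2_BEEF", "^B[0-9]+_");

# "CAL_BEEF" does not begin with B followed by digits, so match should return 0.
printf "%d\n", match("CAL_BEEF", "^B[0-9]+_");

# Remove the matched prefix by substituting an empty string for it.
printf "%s\n", sub("B2_BEEF", "^B[0-9]+_", "");
```

The last command should leave just the food name BEEF.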


String expressions in AMPL commands

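String-valued expressions may appear, enclosed in parentheses, in several AMPL contexts that previously required literal strings. Those illustrated in this section are the filename in a data command, the filename following the redirection operator >, and the value assigned in an option command.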

Here are two examples. The following script solves diet.mod with a series of different data files dietA.dat, dietB.dat, dietC.dat, . . . and saves the solution to files dietA.out, dietB.out, dietC.out, . . . :
	model diet.mod;
	set CASES := {"A","B","C"};

	for {j in CASES} {
	   reset data;
	   data ("diet" & j & ".dat");

	   solve;
	   display Buy >("diet" & j & ".out");
	}
The following script solves the same problem four times, each using a different pairing of the directives primal and dual with the directives primalopt and dualopt:
	model sched.mod;
	data sched.dat;

	option solver cplex;
	set DIR1 := {"primal","dual"};
	set DIR2 := {"primalopt","dualopt"};

	for {i in DIR1, j in DIR2} {
	   option cplex_options (i & " " & j);
	   solve;
	}
See the next section for examples that generate consecutive numbers as parts of filenames and directives.


Number-to-string conversions

When you tell AMPL to exhibit a numerical value, you are invoking a number-to-string conversion of some kind. This conversion can be triggered and regulated by a variety of options, commands and functions, several of them new.

Options ending in _precision and _round enable you to control number-to-string conversion in specific contexts. In general terms, their values are interpreted as follows:

context_precision n

n > 0: round to n digits of precision.
n = 0: show in full precision: the shortest decimal string representation that, when correctly rounded back to the computer's internal representation, yields the original numerical value

context_round n

n > 0: round to n digits after the decimal point
n = 0: round to integer
n < 0: round to -n digits before the decimal point

The different possibilities for context are:

Options                                     Context affected
csvdisplay_precision, csvdisplay_round      numbers from the obscure _display and csvdisplay commands
display_precision, display_round            numbers from the display command
expand_precision, expand_round              numbers from the new expand command
MD_precision                                numbers written in debug output produced by the obscure -M and -D command-line switches
objective_precision                         optimal objective from the solve command
output_precision                            numbers written by the solve command to the file that will be read by a solver
print_precision, print_round                numbers from the print command
solution_precision, solution_round          values of variables and dual variables returned by the solve or solution command

The _round variant (if any) takes precedence when it has a numerical value; otherwise the _precision variant is used.
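
The interaction of the two kinds of option can be tried with the display command. The sketch below is illustrative only; it assumes that setting display_round to an empty string makes its value non-numerical, which we believe is also its default setting.

```ampl
# Show 2/3 rounded to 3 digits of precision.
option display_precision 3;
display 2/3;

# A numerical display_round takes precedence: 2 digits after the decimal point.
option display_round 2;
display 2/3;

# Make display_round non-numerical again, so that display_precision governs.
option display_round '';
display 2/3;
```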

The printf command provides more precise control over number-to-string conversions. Its syntax is

printf indexing(opt)   format-string,   expression-list(opt)   redirection(opt)   ;
Items marked (opt) are optional. Members of the expression-list that evaluate to numbers are converted to strings according to instructions encoded into the format-string. The final result of this formatting is a character string that is sent to a file specified by the optional redirection or else by default to standard output. A guide to format strings is provided on the separate printf rules page (and on pages 328-329 of the AMPL book). Format strings in AMPL have mostly the same interpretation as in the C programming language. The most notable exceptions are AMPL's interpretation of %.g or %.0g to specify full precision (as explained earlier in this section), and the introduction of %q and %Q to produce quoted strings (as described under "Display formats for strings" below).

The new sprintf function also specifies a format-string and an expression-list:

sprintf ( format-string,   expression-list(opt) )
This function performs conversions in the same way as printf, except that the resulting formatted string is not sent to an output device, but instead becomes the function's return value.
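
Because the result is an ordinary string, it can be assigned to a symbolic parameter or used wherever a string expression is allowed. A minimal sketch follows; the parameter fname and the zero-padded pattern diet%03d.dat are hypothetical names chosen for illustration.

```ampl
# Hold the formatted string in a symbolic parameter.
param fname symbolic;

# %03d pads the integer with zeros to a width of 3, as in C.
let fname := sprintf("diet%03d.dat", 7);
display fname;
```

With these assumptions, fname should hold the string diet007.dat.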

Numbers are also converted automatically to strings when they appear as arguments to the string concatenation operator &. For example, for a multi-week model you can create a set of generically-named periods WEEK1, WEEK2, and so forth, by declaring:

	param T integer > 1;
	set WEEKS ordered := setof {t in 1..T} "WEEK" & t;
Many useful applications of this feature occur in scripts that (as in our previous examples) process a series of files or set an option in several ways. For example, the following script solves diet.mod with a series of different data files diet1.dat, diet2.dat, diet3.dat, . . . and saves the solution to files diet1.out, diet2.out, diet3.out, . . . :
	model diet.mod;
	param Nfiles integer := 3;

	for {j in 1..Nfiles} {
	   reset data;
	   data ("diet" & j & ".dat");

	   solve;
	   display Buy >("diet" & j & ".out");
	}
The following script solves the same problem three times, each using a different value for the solver's branch directive:
	model multmip3.mod;
	data multmip3.dat;

	option solver cplex;

	for {i in -1 .. 1} {
	   option cplex_options ("branch " & i);
	   solve;
	}
Numeric operands to & are always converted to full precision (or equivalently, to %.0g format) as previously defined. The conversion thus produces the expected results for concatenation of numerical constants and of indices that run over sets of integers or constants, as in our examples. Full precision conversion of computed fractional values may surprise you, however. The following variation on the preceding example would seem to solve the problem for values 0.1, 0.2, 0.3, and 0.4 of directive linesearch_tolerance, saving the listings from solve in files nltrans0.1 through nltrans0.4:

	model nltransd.mod;
	data nltrans.dat;

	for {i in 0.1 .. 0.4 by 0.1} {
	   option reset_initial_guesses 1;
	   option minos_options ("linesearch_tolerance=" & i);

	   solve > ("nltrans" & i);
	}
The actual files created are
	nltrans0.1
	nltrans0.2
	nltrans0.30000000000000004
	nltrans0.4
Because 0.1 cannot be stored exactly in the computer's internal base-2 representation, the third member of the set 0.1 .. 0.4 by 0.1 in the for loop is slightly different from 0.3 in "full" precision. There is no easy way to predict this behavior, but you can prevent it by specifying the conversion explicitly. In our example, you could replace "nltrans" & i with sprintf("nltrans%3.1f",i).
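
The two conversions can be compared side by side without solving anything. This is only a sketch: it prints the names that the loop would generate instead of creating the files.

```ampl
for {i in 0.1 .. 0.4 by 0.1} {
   # Left: automatic full-precision conversion by the & operator.
   # Right: explicit conversion to one digit after the decimal point.
   printf "%s   %s\n", ("nltrans" & i), sprintf("nltrans%3.1f", i);
}
```

On the third pass the left-hand name should show the long full-precision form, while the right-hand name should be nltrans0.3.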


String-to-number conversions

AMPL does not automatically convert strings to numbers; if an expression or command uses a string where a number is expected, it is rejected with an error message. You can specify conversions from strings to numbers explicitly, however, by use of the new num and num0 functions. Both take a string as an argument, and return the numerical value expressed by the string. Thus, for example, suppose the strings declared by

	param costcode {j in FOOD} symbolic;
are "cost codes" that contain the cost per unit in the leading 4 characters: "3.19BE", "2.59CH", and so forth. The expression substr(costcode[j],1,4) * Buy[j] is rejected, because a string appears where a number is expected, as the left operand of *; but
	num(substr(costcode[j],1,4)) * Buy[j] 
is correct.

The num function accepts an argument that evaluates to a valid AMPL numerical constant written as a string, such as "-12.34" or " 1.2e+6". Leading and trailing white space -- any combination of spaces, tabs and newlines -- is ignored. The use of num with any other argument is flagged as an error.

The num0 function is more forgiving. It extracts the longest leading substring of its argument that would be acceptable to num, returns the corresponding numerical value, and disregards the rest of the argument. If no leading substring can be converted, a value of zero is returned. Thus num0 accepts " 12.34kg" or "1.2e+4?", for example, returning the numbers 12.34 and 12000. Our sample expression above, with costcode[j] having the form "3.19BE", can be written more concisely as

	num0(costcode[j]) * Buy[j] 
If the cost codes are instead of the form "BE3.19", however, then num0 returns 0 for all of them.
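
The following sketch contrasts the two forms of cost code on literal strings. It assumes, for the second form, that the prefix is always exactly two characters long, so that the number starts at position 3.

```ampl
# Leading number followed by extra characters: num0 keeps the number.
printf "%g\n", num0("3.19BE");

# No leading number: num0 returns 0.
printf "%g\n", num0("BE3.19");

# For codes of the form "BE3.19", skip the two-character prefix and use num.
printf "%g\n", num(substr("BE3.19", 3));
```

By the rules above, the three commands should print 3.19, 0 and 3.19.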


Character-code conversions

Two complementary functions convert between characters and integer character codes.

The char function takes a nonnegative integer as an argument, and returns a string of length 1 comprising the corresponding Unicode character. (There is no distinct character data type in AMPL.) Unicode is a superset of ASCII, so char(65) returns "A", and display char(7) sends a "control-G" to the AMPL output stream (where it may provoke a "beep" or other alert sound).

The ichar function is the inverse of char. It takes a string as argument, and returns the integer Unicode value of the string's first character. Thus ichar("A") and ichar("AT&T") both return 65.
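
Used together, the two functions can generate a run of consecutive letters, for instance as case labels in a script. In this sketch the set name LETTERS is hypothetical.

```ampl
# Both commands should print 65, since only the first character is examined.
printf "%d\n", ichar("A");
printf "%d\n", ichar("AT&T");

# Build the letters A, B, C from three consecutive character codes.
set LETTERS := setof {k in 0..2} char(ichar("A") + k);
display LETTERS;
```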


Display formats for strings

A string is normally displayed (by display, print or printf) as simply the list of characters that comprise it, without surrounding delimiters of any kind. The same holds when a string is passed to any function. This is a change from some of the earlier versions of AMPL, in which the surrounding quotes were sometimes shown or passed.

Strings still do need to be delimited by quotes (' or ") when you write them as input to AMPL, so that they can be parsed correctly. The only exception is in data statements (Chapter 9 of the AMPL book) where for convenience the quotes may be omitted from strings that contain only letters, digits, and underscores and that do not represent numbers.

For the command printf or the function sprintf, the conversion specification %s in the format string requests the unquoted contents of the string. Two new specifiers have been added to produce quoted strings: %q shows quotes if and only if they would be required in a data statement, while %Q shows quotes around all strings.
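
To see the three specifiers side by side, you might try something like the following sketch. The string "B2 BEEF" is a hypothetical example containing a space, which would have to be quoted in a data statement.

```ampl
# Only letters: %q should omit the quotes, %Q should show them.
printf "%s %q %Q\n", "BEEF", "BEEF", "BEEF";

# Contains a space: both %q and %Q should show quotes.
printf "%s %q %Q\n", "B2 BEEF", "B2 BEEF", "B2 BEEF";
```

Which quote character AMPL chooses is not stated in this writeup, so the exact output is best checked at the prompt.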






LAST MODIFIED 22 APRIL 1996 BY 4er.