Data Frame

class DataFrame

DataFrame objects are used to communicate data to and from AMPL entities. An object of this class can be used to do the following tasks:

  • Assign values to AMPL entities (once the DataFrame is populated, use the AMPL.setData to assign its values to the modelling entities in its columns)

  • Get values from AMPL, decoupling the values from the AMPL entities they originate from the getData functions in Variables, Parameters, Constraints and Objectives

Creation

A DataFrame object can be created in two ways.

  • Create a scheleton via DataFrame([amplEntities]), in which the indexing columns are automatically detected depending on the AMPL entities used as headers.

  • Create a scheleton via DataFrame(int numberOfIndexColumns, header1, ..., headern), where arbitrary objects can be used as column headers (they will be converted to their string representation when communicating with the AMPL interpreter).

Adding data

Populating a DataFrame object can be done adding row by row to a pre-existing scheleton via DataFrame.addRow, setting whole columns of a pre-existing scheleton via DataFrame.setColumn or adding columns (including indexing columns) via DataFrame.addColumn. The function DataFrame.setMatrix provides a way to populate a DataFrame with values from MATLAB n-dimensional matrices and cell arrays.

Modifying a DataFrame object can be done via DataFrame.setColumn or, item by item, via DataFrame.setValue.

Accessing data

Accessing data in a DataFrame can be done row by row using DataFrame.getRow or by column via DataFrame.getColumn, or by using MATLAB own indexing procedures.

Access the dataframe by row by using MATLAB indexing:

  • df(index1, ..., indexn) (where index1, ..., indexn represent an indexing tuple for the current dataframe) is equivalent to calling DataFrame.getRow with the same parameters

  • df(startindex:endindex) where startindex and endiindex are numerical indices of rows (position) return a cell array composed of all the rows in the range

Access the dataframe by column using table-like syntax:

  • df.columnname is equivalent to call df.getColumn(columnname)

Conversion to and from MATLAB tables

MATLAB tables are a very powerful tool for handling data, and are very similar to dataframes. In built routines are provided for converting a DataFrame to and from MATLAB tables.

To create a DataFrame from a MATLAB tables, the static function fromTable can be used:

` table t; df =  DataFrame.fromTable(t); `

Whilst the opposite functionality is provided by the function toTable:

` t = df.toTable(); `

DataFrame

classmethod DataFrame.DataFrame()

Syntax

df = DataFrame([amplEntities])

df = DataFrame(int numberOfIndexColumns, header1, ..., headern)

Description

df = DataFrame([amplEntities]) creates a skeleton structure, where the indexing columns are automatically inferred depending on the AMPL entities used as headers

df = DataFrame(numberOfIndexColumns, header1, ..., headern) creates a scheleton structure, where arbitrary objects can be used as column headers (they will be converted to their string representation when communicating with the AMPL interpreter).

Input Arguments

numberOfIndexColumns The number of columns to be considered as indices in the table this object represents. Can be zero in case of DataFrames representing scalar data.

amplEntities A vector of AMPL entities (obtained via one of the functions at AMPL.getEntity). Sets should be listed first, as the have to be the first columns of the table.

header1,...,headern A list of header to be used as column headers. These names will specify how the data will be then bound to AMPL entities when using AMPL.setData.

Output Arguments

df

A DataFrame object that can be used for assigning and manipulating data.

Example 1

Create a dataframe with two indexing columns starting from AMPL entities and populate them row by row:

ampl.eval('set PROD; set COLOUR; param price{PROD, COLOUR};');
PROD = ampl.getSet('PROD');
COLOUR = ampl.getSet('COLOUR');
price = ampl.getParameter('price');

df = DataFrame(PROD, COLOUR, price);
df.addRow('shirts', 'red', 5);
df.addRow('shirts', 'blue',  4.5);
df.addRow('skirts', 'blue',  5.5);
df

gives:

df =
PROD    COLOUR  |  price
shirts  red     |  5.0
shirts  blue    |  4.5
skirts  blue    |  5.5

Example 2

Create a dataframe with one indexing column and two data columns, start adding data row by row, then add a whole column with data

df = DataFrame(1, 'PROD', 'price');
df.addRow('shirts', 5);
df.addRow('skirts', 4.5);
df.addColumn('cost', [4.5 5])
df

gives:

df =
PROD    |  price  cost
shirts  |  5.0    4.5
skirts  |  4.5    5.0

addColumn

classmethod DataFrame.addColumn()

Syntax

addColumn(headerName)

addColumn(amplEntity)

addColumn(headerName, [values])

addColumn(amplEntity, [values])

Description

addColumn(headerName) add a new empty column with the specified header to the dataframe.

addColumn(amplEntity) add a new empty column with the specified header as the name of the AMPL entity to the dataframe

addColumn(headerName, [values]) add a column with the specified header to the DataFrame, containing the listed values. If the added column is an indexing column, the indices will be populated with the cartesian product of all the members of the columns already populated.

Input Arguments

headerName The name of the column to be added

amplEntity An AMPL entity, its name will be used for the new column

values A vector of values to be used in the column. It has to have the same cardinality as the DataFrame (accessible via DataFrame.getNumRows) if the column is not an indexing column. Otherwise, the cardinality of the table will be modified to the cartesian product of the defined sets.

Example 1

Create a dataframe with one indexing columns and add an empty column later. Populate then the column via DataFrame.setColumn. row by row

df = DataFrame(1, 'PROD');
df.setColumn('PROD', {'shirts'; 'skirts'});
df.addColumn('price')
df.setColumn('price', {6.5; 5.5})
df

gives:

PROD    |  price
shirts  |  6.5
skirts  |  5.5

Example 2

Create a dataframe with three indexing columns, and populate them.

df = DataFrame(3, 'ORIGIN', 'DEST', 'MODE');
routes = {'London', 'New York'; 'London', 'Milan'}
modes = {'air', 'normal'};



df.setColumn('ORIGIN', routes(:,1));
df.setColumn('DEST' , routes(:,2));
df.setColumn('MODE', modes)
df

gives:

df =
ORIGIN  DEST      MODE    |
London  New York  air     |
London  Milan     normal  |

then to add a column with travelling times (ordered as shown), it is sufficient to add:

df.addColumn('time', [7 15]);
df

will list:

df =
ORIGIN  DEST      MODE    |  time
London  New York  air     |  7.0
London  Milan     normal  |  15.0

addRow

classmethod DataFrame.addRow()

Syntax

addRow({colValues})

Description

addRow({colValues}) adds a row to the DataFrame

Input Arguments

{colValues} A cell array of values, of cardinality DataFrame.getNumCols

Example

Construct a DataFrame row by row:

ampl.eval('set PROD; set COLOUR; param price{PROD, COLOUR};');
PROD = ampl.getSet('PROD');
COLOUR = ampl.getSet('COLOUR');
price = ampl.getParameter('price');

df = DataFrame(PROD, COLOUR, price);
df.addRow('shirts', 'red', 5);
df.addRow('shirts', 'blue',  4.5);
df.addRow('skirts', 'blue',  5.5);
df

gives:

df =
PROD    COLOUR  |  price
shirts  red     |  5.0
shirts  blue    |  4.5
skirts  blue    |  5.5

setColumn

classmethod DataFrame.setColumn()

Syntax

setColumn(headerName, [values])

setColumn(amplEntity, [values])

Description

setColumn(headerName, [values]) sets the values of the column specified by headername to the given vector of values. Cannot be used to specify values of indexing columns if not the first one.

setColumn(amplEntity, [values]) sets the values of the column inferred by the AMPL entity to the given vector of values. Cannot be used to specify values of indexing columns if not the first one.

Input Arguments

headerName The name of the column to be added

amplEntity An AMPL entity, its name will be used for the new column

values A vector of values to be used in the column. It has to have the same cardinality as the DataFrame (accessible via DataFrame.getNumRows).

Example

Create a dataframe with two columns and populate them.

df = DataFrame(1, 'PROD', 'price');
df.setColumn('PROD', {'shirts', 'skirts'})
df.setColumn('price', [6.5, 5.5])

gives:

PROD    |  price
shirts  |  6.5
skirts  |  5.5

getNumCols

classmethod DataFrame.getNumCols()

Syntax

nc = getNumCols

Description

getNumCols() gets the number of columns in the current DataFrame. Note that MATLAB’s own size function can be used as well.

Input Arguments

nc The number of columns actually defined in the DataFrame

Example

Create a dataframe and keep adding columns

df = DataFrame(2, 'PROD', 'COLOUR');
df.getNumCols
df.addColumn('price');
df.getNumCols
size(df)

gives:

ans =
     2
ans =
     3
ans =
  0     3

getNumRows

classmethod DataFrame.getNumRows()

Syntax

nr = getNumRows

Description

getNumRows() gets the number of rows in the current DataFrame. Note that MATLAB’s own size function can be used as well.

Input Arguments

nr The number of rows in the dataframe

Example

Create a dataframe by cartesian product and explore the number of rows

df = DataFrame(3, 'ORIGIN', 'DEST', 'MODE');
hubs = {'London', 'New York', 'Delhi'};
df.setColumn('ORIGIN', hubs);
df.setColumn('DEST', hubs);
df.setColumn('MODE', {'air', 'normal', 'slow'})
df.getNumRows
size(df)

gives:

ans =
  3
ans =
  3   3

getNumIndices

classmethod DataFrame.getNumIndices()

Syntax

ni = getNumIndices

Description

getNumIndices() gets the number of indexing columns in the current DataFrame

Input Arguments

ni The number of indexing columns in the dataframe

Example

Show the number of indexing columns of a DataFrame

df = DataFrame(2, 'index1', 'index2');
df.getNumIndices

gives:

ans =
  2

getRow

classmethod DataFrame.getRow()

Syntax

[row] = getRow({rowindices})

[row] = getRowByIndex(index)

Description

[row] = getRow({rowindices}) gets the row indicated by rowindices

[row] = getRowByIndex(index) gets the row specified by the zero based index specified

Input Arguments

rowindices A cell array of indices values, of cardinality DataFrame.getNumIndices

index The zero-based integer index of the wanted row

Output Arguments

row A cell array containing an horizontal “slice” of the DataFrame

Example

Extract rows by specifying the indices and by their ordinal number in the table

df = DataFrame(3, 'ORIGIN', 'DEST', 'MODE', 'time');
origins = {'London'; 'New York'; 'Milan'; 'London'; 'New York'; 'Milan'}
destinations = {'New York'; 'Milan'; 'London'; 'New York'; 'Milan'; 'London'}
df.setColumn('ORIGIN', origins);
df.setColumn('DEST', destinations);
df.setColumn('MODE', {'air';'air';'air'; 'normal';'normal';'normal'})
df.setColumn('time', [6,7,2,60,60,20]);
% The two statements below refer to the same row
row1 = df.getRow('London', 'New York', 'air');
row2 = df.getRowByIndex(1);

row1(4)
row2(4)

gives:

ans =
  6
ans =
  6

getColumn

classmethod DataFrame.getColumn()

Syntax

[column] = getColumn(columnHeader)

[columnvalues] = getColumnAsDoubles(columnHeader)

[columnstrings] = getColumnAsStrings(columnHeader)

Description

[column] = getColumn(columnHeader) gets the specified column as a java array of objects

[columnvalues] = getColumnAsDoubles(columnHeader) gets the specified column as a matlab vector of numbers

[columnstrings] = getColumnAsStrings(columnHeader) gets the specified column as a vector of strings

Input Arguments

columnHeader The header of the column to be extracted

Output Arguments

column A cell array containing the values of the column

columnvalues A vector containing the values of the column

columnstrings A cell array containing the values of the column

Example 1

Extract a column from a dataframe

df = DataFrame(3, 'ORIGIN', 'DEST', 'MODE', 'time');
origins = {'London'; 'New York'; 'Milan'; 'London'; 'New York'; 'Milan'};
destinations = {'New York'; 'Milan'; 'London'; 'New York'; 'Milan'; 'London'};
df.setColumn('ORIGIN', origins);
df.setColumn('DEST', destinations);
df.setColumn('MODE', {'air';'air';'air'; 'normal';'normal';'normal'})
df.setColumn('time', [6,7,2,60,60,20]);

column=df.getColumnAsDoubles('time')

gives:

column =
  6
  7
  2
 60
 60
 20

Example 2

Extract values from an AMPL parameter as doubles, using a DataFrame as intermediary

ampl = AMPL
ampl.eval('param P{i in 1..4} := i;');
P = ampl.getParameter('P');
df = P.getValues;
Pvalues = df.getColumnAsDoubles('val');
Pvalues

gives:

Pvalues =

     1
     2
     3
     4

setValue

classmethod DataFrame.setValue()

Syntax

setValue(columnHeader, {rowIndex}, value))

Description

setValue(columnHeader, {rowIndex}, value)) sets the value of the specified column and row

Input Arguments

columnHeader The header of the column to be assigned

{rowIndex} A cell array containing the index of the row to be assigned

value The value to be assigned

Example

Modify a DataFrame a column from a table

df = DataFrame(1, 'PROD', 'price');
df.addRow('shirts', 5);
df.addRow('skirts', 4.5);
df.addColumn('cost', [4.5, 5]);
df.setValue('shirts', 'price', 10)

gives:

df =
PROD    |  price  cost
shirts  |  10.0   4.5
skirts  |  4.5    5.0

setMatrix

classmethod DataFrame.setMatrix()

Syntax

setMatrix(value, {indices1, ..., indicesn})

Description

setMatrix(matrix, {indices1, ..., indicesn}) assigns all the values of a MATLAB n-dimensional matrix to this DataFrame, together with the indices values.

Input Arguments

matrix The matrix containing the values. It can be an n-dimensional cell array containing strings too. Note that, in the current version, matrices to up to six dimensions are supported.

{indices1, ... indicesn} A cell array containing n vectors where n is the number of dimensions of the matrix. Each vector must have the same size of the matrix in that dimension.

Example 1: One dimension

Create a one dimensional matrix (an array) and assign it

% Create a vector of four random numbers
A = randn(4,1);
% Create a vector of indices
A1 = {'a1' , 'a2', 'a3' , 'a4'};
% Create a dataframe with 1 index
df = DataFrame(1, 'A1', 'Values');
% Assign the vector
df.setMatrix(A, A1)
% Print the value
df

gives:

df =
A1  |  Values
a1  |  0.3187652398589808
a2  |  -1.3076882963052734
a3  |  -0.43359202230568356
a4  |  0.3426244665386499

Example 2: Two dimensions and assignment to an AMPL parameter

Create a two dimensional matrix and assign it to an AMPL parameter using AMPL.setData. The binding between the DataFrame and the parameter to which the values are assigned is achieved via the name of the only data column in the DataFrame.

% Create a 4x3 matrix
A = [1 2 3; 4 5 6; 7 8 9; 10 11 12];
% First index (cardinality = number of rows
A1 = {'a1' ; 'a2'; 'a3' ; 'a4'};
% Second index (cardinality = number of columns)
A2 = {1; 2; 3};
% Create a dataframe with two idexing columns and
% a data column called TheParameter
df = DataFrame(2, 'A1', 'A2', 'TheParameter');
% Assign the matrix to the dataframe
df.setMatrix(A, A1, A2)

% Create an AMPL instance
ampl = AMPL;
% Create two sets and a parameter. The name of the parameter
% has to be the same as the column in our dataframe.
% Sets name can differ, since we are not assigning them using
% AMPL.setData
ampl.eval('set S1; set S2; param TheParameter{S1, S2};');
% Set the data for sets S1 and S2
ampl.getSet('S1').setValues(A1);
ampl.getSet('S2').setValues(A2);
ampl.setData(df);
ampl.display('TheParameter')

gives:

TheParameter :=
   a1 1    1
   a1 2    2
   a1 3    3
   a2 1    4
   a2 2    5
   a2 3    6
   a3 1    7
   a3 2    8
   a3 3    9
   a4 1   10
   a4 2   11
   a4 3   12
   ;

Example 3: Symbolic parameters assignments

This Example shows how to assign a two-dimensional cell array containing strings to a parameter.

% Create a 2-d cell array
cellArray = {'r1c1', 'r1c2', 'r1c3'; 'r2c1', 'r2c2', 'r2c3'; 'r3c1', 'r3c2','r3c3'; 'r4c1', 'r4c2', 'r4c3'}
% assign it to the previous dataframe
df.setMatrix(cellArray, A1, A2)
% display the dataframe
df

gives:

df =
A1  A2   |  TheParameter
a1  1.0  |  r1c1
a1  2.0  |  r1c2
a1  3.0  |  r1c3
a2  1.0  |  r2c1
a2  2.0  |  r2c2
a2  3.0  |  r2c3
a3  1.0  |  r3c1
a3  2.0  |  r3c2
a3  3.0  |  r3c3
a4  1.0  |  r4c1
a4  2.0  |  r4c2
a4  3.0  |  r4c3

Example 4: Higher dimensional matrices

This Example shows how to assign a 4-d matrix to a DataFrame. The same logic applies to higher dimensional cell arrays.

% Create a 4-d matrix
B = randn(2,2,2,2);
% Create 4 sets of indices of cardinality 2 and of various types
% (both strings and numbers)
B1 = {'a1', 4};
B2 = {'1', '4'};
B3 = {'c1' ,'2'};
B4 = {4, '4'};
% Create a dataframe with 4 index columns
df = DataFrame(4, 'B1', 'B2', 'B3', 'B4', 'Value')
% Assign the valuees to the dataframe
df.setMatrix(B, B1, B2, B3, B4)
% Display the contents
df

gives:

df =
B1   B2  B3  B4   |  Value
a1   1   c1  4.0  |  -1.0688704581680317
a1   1   c1  4    |  -0.10224244608549089
a1   1   2   4.0  |  0.32519053945619786
a1   1   2   4    |  -0.8648799173244565
a1   4   c1  4.0  |  -2.9442841619948963
a1   4   c1  4    |  0.3192067391655018
a1   4   2   4.0  |  1.370298540095228
a1   4   2   4    |  -0.1648790192090383
4.0  1   c1  4.0  |  -0.8094986944248755
4.0  1   c1  4    |  -0.24144704160735794
4.0  1   2   4.0  |  -0.7549283191697034
4.0  1   2   4    |  -0.03005129619626856
4.0  4   c1  4.0  |  1.4383802928150984
4.0  4   c1  4    |  0.31285859663742843
4.0  4   2   4.0  |  -1.711516418853698
4.0  4   2   4    |  0.6277072875287265