Data Frame¶
- class DataFrame¶
DataFrame objects are used to communicate data to and from AMPL entities. An object of this class can be used to do the following tasks:
Assign values to AMPL entities (once the DataFrame is populated, use the
AMPL.setData
to assign its values to the modelling entities in its columns)Get values from AMPL, decoupling the values from the AMPL entities they originate from the getData functions in Variables, Parameters, Constraints and Objectives
Creation¶
A DataFrame object can be created in two ways.
Create a scheleton via
DataFrame([amplEntities])
, in which the indexing columns are automatically detected depending on the AMPL entities used as headers.Create a scheleton via
DataFrame(int numberOfIndexColumns, header1, ..., headern)
, where arbitrary objects can be used as column headers (they will be converted to their string representation when communicating with the AMPL interpreter).
Adding data¶
Populating a DataFrame object can be done adding row by row to a pre-existing scheleton via DataFrame.addRow
, setting whole columns
of a pre-existing scheleton via DataFrame.setColumn
or adding columns (including indexing columns) via DataFrame.addColumn
.
The function DataFrame.setMatrix
provides a way to populate a DataFrame with values from MATLAB n-dimensional matrices and cell arrays.
Modifying a DataFrame object can be done via DataFrame.setColumn
or, item by item, via DataFrame.setValue
.
Accessing data¶
Accessing data in a DataFrame can be done row by row using DataFrame.getRow
or by column via DataFrame.getColumn
, or by using
MATLAB own indexing procedures.
Access the dataframe by row by using MATLAB indexing:
df(index1, ..., indexn)
(whereindex1, ..., indexn
represent an indexing tuple for the current dataframe) is equivalent to callingDataFrame.getRow
with the same parametersdf(startindex:endindex)
wherestartindex
andendiindex
are numerical indices of rows (position) return a cell array composed of all the rows in the range
Access the dataframe by column using table-like syntax:
df.columnname
is equivalent to call df.getColumn(columnname)
Conversion to and from MATLAB tables¶
MATLAB tables are a very powerful tool for handling data, and are very similar to dataframes. In built routines are provided for converting a DataFrame to and from MATLAB tables.
To create a DataFrame from a MATLAB tables, the static function fromTable
can be used:
`
table t;
df = DataFrame.fromTable(t);
`
Whilst the opposite functionality is provided by the function toTable
:
`
t = df.toTable();
`
DataFrame¶
- classmethod DataFrame.DataFrame()¶
Syntax
df = DataFrame([amplEntities])
df = DataFrame(int numberOfIndexColumns, header1, ..., headern)
Description
df = DataFrame([amplEntities])
creates a skeleton structure, where the indexing columns are automatically inferred depending on the AMPL entities used as headers
df = DataFrame(numberOfIndexColumns, header1, ..., headern)
creates a scheleton structure, where arbitrary objects can be used as column headers (they will be converted to their string representation when communicating with the AMPL interpreter).
Input Arguments
numberOfIndexColumns
The number of columns to be considered as indices in the table this object represents. Can be zero in case of DataFrames representing scalar data.
amplEntities
A vector of AMPL entities (obtained via one of the functions at AMPL.getEntity
). Sets should be listed first, as the have to be the first columns of the table.
header1,...,headern
A list of header to be used as column headers. These names will specify how the data will be then bound to AMPL entities when using AMPL.setData
.
Output Arguments
df
A DataFrame object that can be used for assigning and manipulating data.
Example 1
Create a dataframe with two indexing columns starting from AMPL entities and populate them row by row:
ampl.eval('set PROD; set COLOUR; param price{PROD, COLOUR};');
PROD = ampl.getSet('PROD');
COLOUR = ampl.getSet('COLOUR');
price = ampl.getParameter('price');
df = DataFrame(PROD, COLOUR, price);
df.addRow('shirts', 'red', 5);
df.addRow('shirts', 'blue', 4.5);
df.addRow('skirts', 'blue', 5.5);
df
gives:
df =
PROD COLOUR | price
shirts red | 5.0
shirts blue | 4.5
skirts blue | 5.5
Example 2
Create a dataframe with one indexing column and two data columns, start adding data row by row, then add a whole column with data
df = DataFrame(1, 'PROD', 'price');
df.addRow('shirts', 5);
df.addRow('skirts', 4.5);
df.addColumn('cost', [4.5 5])
df
gives:
df =
PROD | price cost
shirts | 5.0 4.5
skirts | 4.5 5.0
addColumn¶
- classmethod DataFrame.addColumn()¶
Syntax
addColumn(headerName)
addColumn(amplEntity)
addColumn(headerName, [values])
addColumn(amplEntity, [values])
Description
addColumn(headerName)
add a new empty column with the specified header to the dataframe.
addColumn(amplEntity)
add a new empty column with the specified header as the name of the AMPL entity to the dataframe
addColumn(headerName, [values])
add a column with the specified header to the DataFrame, containing the listed values.
If the added column is an indexing column, the indices will be populated with the cartesian product of all the members of the columns already populated.
Input Arguments
headerName
The name of the column to be added
amplEntity
An AMPL entity, its name will be used for the new column
values
A vector of values to be used in the column. It has to have the same cardinality as the DataFrame (accessible via DataFrame.getNumRows
) if
the column is not an indexing column. Otherwise, the cardinality of the table will be modified to the cartesian product of the defined sets.
Example 1
Create a dataframe with one indexing columns and add an empty column later. Populate then the column via DataFrame.setColumn
.
row by row
df = DataFrame(1, 'PROD');
df.setColumn('PROD', {'shirts'; 'skirts'});
df.addColumn('price')
df.setColumn('price', {6.5; 5.5})
df
gives:
PROD | price
shirts | 6.5
skirts | 5.5
Example 2
Create a dataframe with three indexing columns, and populate them.
df = DataFrame(3, 'ORIGIN', 'DEST', 'MODE');
routes = {'London', 'New York'; 'London', 'Milan'}
modes = {'air', 'normal'};
df.setColumn('ORIGIN', routes(:,1));
df.setColumn('DEST' , routes(:,2));
df.setColumn('MODE', modes)
df
gives:
df =
ORIGIN DEST MODE |
London New York air |
London Milan normal |
then to add a column with travelling times (ordered as shown), it is sufficient to add:
df.addColumn('time', [7 15]);
df
will list:
df =
ORIGIN DEST MODE | time
London New York air | 7.0
London Milan normal | 15.0
addRow¶
- classmethod DataFrame.addRow()¶
Syntax
addRow({colValues})
Description
addRow({colValues})
adds a row to the DataFrame
Input Arguments
{colValues}
A cell array of values, of cardinality DataFrame.getNumCols
Example
Construct a DataFrame row by row:
ampl.eval('set PROD; set COLOUR; param price{PROD, COLOUR};');
PROD = ampl.getSet('PROD');
COLOUR = ampl.getSet('COLOUR');
price = ampl.getParameter('price');
df = DataFrame(PROD, COLOUR, price);
df.addRow('shirts', 'red', 5);
df.addRow('shirts', 'blue', 4.5);
df.addRow('skirts', 'blue', 5.5);
df
gives:
df =
PROD COLOUR | price
shirts red | 5.0
shirts blue | 4.5
skirts blue | 5.5
setColumn¶
- classmethod DataFrame.setColumn()¶
Syntax
setColumn(headerName, [values])
setColumn(amplEntity, [values])
Description
setColumn(headerName, [values])
sets the values of the column specified by headername to the given vector of values. Cannot be used to specify values of indexing columns if not the first one.
setColumn(amplEntity, [values])
sets the values of the column inferred by the AMPL entity to the given vector of values. Cannot be used to specify values of indexing columns if not the first one.
Input Arguments
headerName
The name of the column to be added
amplEntity
An AMPL entity, its name will be used for the new column
values
A vector of values to be used in the column. It has to have the same cardinality as the DataFrame (accessible via DataFrame.getNumRows
).
Example
Create a dataframe with two columns and populate them.
df = DataFrame(1, 'PROD', 'price');
df.setColumn('PROD', {'shirts', 'skirts'})
df.setColumn('price', [6.5, 5.5])
gives:
PROD | price
shirts | 6.5
skirts | 5.5
getNumCols¶
- classmethod DataFrame.getNumCols()¶
Syntax
nc = getNumCols
Description
getNumCols()
gets the number of columns in the current DataFrame.
Note that MATLAB’s own size function can be used as well.
Input Arguments
nc
The number of columns actually defined in the DataFrame
Example
Create a dataframe and keep adding columns
df = DataFrame(2, 'PROD', 'COLOUR');
df.getNumCols
df.addColumn('price');
df.getNumCols
size(df)
gives:
ans =
2
ans =
3
ans =
0 3
getNumRows¶
- classmethod DataFrame.getNumRows()¶
Syntax
nr = getNumRows
Description
getNumRows()
gets the number of rows in the current DataFrame.
Note that MATLAB’s own size function can be used as well.
Input Arguments
nr
The number of rows in the dataframe
Example
Create a dataframe by cartesian product and explore the number of rows
df = DataFrame(3, 'ORIGIN', 'DEST', 'MODE');
hubs = {'London', 'New York', 'Delhi'};
df.setColumn('ORIGIN', hubs);
df.setColumn('DEST', hubs);
df.setColumn('MODE', {'air', 'normal', 'slow'})
df.getNumRows
size(df)
gives:
ans =
3
ans =
3 3
getNumIndices¶
- classmethod DataFrame.getNumIndices()¶
Syntax
ni = getNumIndices
Description
getNumIndices()
gets the number of indexing columns in the current DataFrame
Input Arguments
ni
The number of indexing columns in the dataframe
Example
Show the number of indexing columns of a DataFrame
df = DataFrame(2, 'index1', 'index2');
df.getNumIndices
gives:
ans =
2
getRow¶
- classmethod DataFrame.getRow()¶
Syntax
[row] = getRow({rowindices})
[row] = getRowByIndex(index)
Description
[row] = getRow({rowindices})
gets the row indicated by rowindices
[row] = getRowByIndex(index)
gets the row specified by the zero based index specified
Input Arguments
rowindices
A cell array of indices values, of cardinality DataFrame.getNumIndices
index
The zero-based integer index of the wanted row
Output Arguments
row
A cell array containing an horizontal “slice” of the DataFrame
Example
Extract rows by specifying the indices and by their ordinal number in the table
df = DataFrame(3, 'ORIGIN', 'DEST', 'MODE', 'time');
origins = {'London'; 'New York'; 'Milan'; 'London'; 'New York'; 'Milan'}
destinations = {'New York'; 'Milan'; 'London'; 'New York'; 'Milan'; 'London'}
df.setColumn('ORIGIN', origins);
df.setColumn('DEST', destinations);
df.setColumn('MODE', {'air';'air';'air'; 'normal';'normal';'normal'})
df.setColumn('time', [6,7,2,60,60,20]);
% The two statements below refer to the same row
row1 = df.getRow('London', 'New York', 'air');
row2 = df.getRowByIndex(1);
row1(4)
row2(4)
gives:
ans =
6
ans =
6
getColumn¶
- classmethod DataFrame.getColumn()¶
Syntax
[column] = getColumn(columnHeader)
[columnvalues] = getColumnAsDoubles(columnHeader)
[columnstrings] = getColumnAsStrings(columnHeader)
Description
[column] = getColumn(columnHeader)
gets the specified column as a java array of objects
[columnvalues] = getColumnAsDoubles(columnHeader)
gets the specified column as a matlab vector of numbers
[columnstrings] = getColumnAsStrings(columnHeader)
gets the specified column as a vector of strings
Input Arguments
columnHeader
The header of the column to be extracted
Output Arguments
column
A cell array containing the values of the column
columnvalues
A vector containing the values of the column
columnstrings
A cell array containing the values of the column
Example 1
Extract a column from a dataframe
df = DataFrame(3, 'ORIGIN', 'DEST', 'MODE', 'time');
origins = {'London'; 'New York'; 'Milan'; 'London'; 'New York'; 'Milan'};
destinations = {'New York'; 'Milan'; 'London'; 'New York'; 'Milan'; 'London'};
df.setColumn('ORIGIN', origins);
df.setColumn('DEST', destinations);
df.setColumn('MODE', {'air';'air';'air'; 'normal';'normal';'normal'})
df.setColumn('time', [6,7,2,60,60,20]);
column=df.getColumnAsDoubles('time')
gives:
column =
6
7
2
60
60
20
Example 2
Extract values from an AMPL parameter as doubles, using a DataFrame as intermediary
ampl = AMPL
ampl.eval('param P{i in 1..4} := i;');
P = ampl.getParameter('P');
df = P.getValues;
Pvalues = df.getColumnAsDoubles('val');
Pvalues
gives:
Pvalues =
1
2
3
4
setValue¶
- classmethod DataFrame.setValue()¶
Syntax
setValue(columnHeader, {rowIndex}, value))
Description
setValue(columnHeader, {rowIndex}, value))
sets the value of the specified column and row
Input Arguments
columnHeader
The header of the column to be assigned
{rowIndex}
A cell array containing the index of the row to be assigned
value
The value to be assigned
Example
Modify a DataFrame a column from a table
df = DataFrame(1, 'PROD', 'price');
df.addRow('shirts', 5);
df.addRow('skirts', 4.5);
df.addColumn('cost', [4.5, 5]);
df.setValue('shirts', 'price', 10)
gives:
df =
PROD | price cost
shirts | 10.0 4.5
skirts | 4.5 5.0
setMatrix¶
- classmethod DataFrame.setMatrix()¶
Syntax
setMatrix(value, {indices1, ..., indicesn})
Description
setMatrix(matrix, {indices1, ..., indicesn})
assigns all the values of a MATLAB n-dimensional matrix to this DataFrame,
together with the indices values.
Input Arguments
matrix
The matrix containing the values. It can be an n-dimensional cell array containing strings too.
Note that, in the current version, matrices to up to six dimensions are supported.
{indices1, ... indicesn}
A cell array containing n vectors where n is the number of dimensions of the matrix.
Each vector must have the same size of the matrix
in that dimension.
Example 1: One dimension
Create a one dimensional matrix (an array) and assign it
% Create a vector of four random numbers
A = randn(4,1);
% Create a vector of indices
A1 = {'a1' , 'a2', 'a3' , 'a4'};
% Create a dataframe with 1 index
df = DataFrame(1, 'A1', 'Values');
% Assign the vector
df.setMatrix(A, A1)
% Print the value
df
gives:
df =
A1 | Values
a1 | 0.3187652398589808
a2 | -1.3076882963052734
a3 | -0.43359202230568356
a4 | 0.3426244665386499
Example 2: Two dimensions and assignment to an AMPL parameter
Create a two dimensional matrix and assign it to an AMPL parameter using AMPL.setData
.
The binding between the DataFrame and the parameter to which the values are assigned is
achieved via the name of the only data column in the DataFrame.
% Create a 4x3 matrix
A = [1 2 3; 4 5 6; 7 8 9; 10 11 12];
% First index (cardinality = number of rows
A1 = {'a1' ; 'a2'; 'a3' ; 'a4'};
% Second index (cardinality = number of columns)
A2 = {1; 2; 3};
% Create a dataframe with two idexing columns and
% a data column called TheParameter
df = DataFrame(2, 'A1', 'A2', 'TheParameter');
% Assign the matrix to the dataframe
df.setMatrix(A, A1, A2)
% Create an AMPL instance
ampl = AMPL;
% Create two sets and a parameter. The name of the parameter
% has to be the same as the column in our dataframe.
% Sets name can differ, since we are not assigning them using
% AMPL.setData
ampl.eval('set S1; set S2; param TheParameter{S1, S2};');
% Set the data for sets S1 and S2
ampl.getSet('S1').setValues(A1);
ampl.getSet('S2').setValues(A2);
ampl.setData(df);
ampl.display('TheParameter')
gives:
TheParameter :=
a1 1 1
a1 2 2
a1 3 3
a2 1 4
a2 2 5
a2 3 6
a3 1 7
a3 2 8
a3 3 9
a4 1 10
a4 2 11
a4 3 12
;
Example 3: Symbolic parameters assignments
This Example shows how to assign a two-dimensional cell array containing strings to a parameter.
% Create a 2-d cell array
cellArray = {'r1c1', 'r1c2', 'r1c3'; 'r2c1', 'r2c2', 'r2c3'; 'r3c1', 'r3c2','r3c3'; 'r4c1', 'r4c2', 'r4c3'}
% assign it to the previous dataframe
df.setMatrix(cellArray, A1, A2)
% display the dataframe
df
gives:
df =
A1 A2 | TheParameter
a1 1.0 | r1c1
a1 2.0 | r1c2
a1 3.0 | r1c3
a2 1.0 | r2c1
a2 2.0 | r2c2
a2 3.0 | r2c3
a3 1.0 | r3c1
a3 2.0 | r3c2
a3 3.0 | r3c3
a4 1.0 | r4c1
a4 2.0 | r4c2
a4 3.0 | r4c3
Example 4: Higher dimensional matrices
This Example shows how to assign a 4-d matrix to a DataFrame. The same logic applies to higher dimensional cell arrays.
% Create a 4-d matrix
B = randn(2,2,2,2);
% Create 4 sets of indices of cardinality 2 and of various types
% (both strings and numbers)
B1 = {'a1', 4};
B2 = {'1', '4'};
B3 = {'c1' ,'2'};
B4 = {4, '4'};
% Create a dataframe with 4 index columns
df = DataFrame(4, 'B1', 'B2', 'B3', 'B4', 'Value')
% Assign the valuees to the dataframe
df.setMatrix(B, B1, B2, B3, B4)
% Display the contents
df
gives:
df =
B1 B2 B3 B4 | Value
a1 1 c1 4.0 | -1.0688704581680317
a1 1 c1 4 | -0.10224244608549089
a1 1 2 4.0 | 0.32519053945619786
a1 1 2 4 | -0.8648799173244565
a1 4 c1 4.0 | -2.9442841619948963
a1 4 c1 4 | 0.3192067391655018
a1 4 2 4.0 | 1.370298540095228
a1 4 2 4 | -0.1648790192090383
4.0 1 c1 4.0 | -0.8094986944248755
4.0 1 c1 4 | -0.24144704160735794
4.0 1 2 4.0 | -0.7549283191697034
4.0 1 2 4 | -0.03005129619626856
4.0 4 c1 4.0 | 1.4383802928150984
4.0 4 c1 4 | 0.31285859663742843
4.0 4 2 4.0 | -1.711516418853698
4.0 4 2 4 | 0.6277072875287265