.. _secMATLABrefDataFrame: Data Frame ========== .. mat:class:: DataFrame DataFrame objects are used to communicate data to and from AMPL entities. An object of this class can be used to do the following tasks: * Assign values to AMPL entities (once the DataFrame is populated, use the :mat:meth:`AMPL.setData` to assign its values to the modelling entities in its columns) * Get values from AMPL, decoupling the values from the AMPL entities they originate from the getData functions in Variables, Parameters, Constraints and Objectives Creation -------- A DataFrame object can be created in two ways. * Create a scheleton via ``DataFrame([amplEntities])``, in which the indexing columns are automatically detected depending on the AMPL entities used as headers. * Create a scheleton via ``DataFrame(int numberOfIndexColumns, header1, ..., headern)``, where arbitrary objects can be used as column headers (they will be converted to their string representation when communicating with the AMPL interpreter). Adding data ----------- Populating a DataFrame object can be done adding row by row to a pre-existing scheleton via :mat:meth:`DataFrame.addRow`, setting whole columns of a pre-existing scheleton via :mat:meth:`DataFrame.setColumn` or adding columns (including indexing columns) via :mat:meth:`DataFrame.addColumn`. The function :mat:meth:`DataFrame.setMatrix` provides a way to populate a DataFrame with values from MATLAB n-dimensional matrices and cell arrays. Modifying a DataFrame object can be done via :mat:meth:`DataFrame.setColumn` or, item by item, via :mat:meth:`DataFrame.setValue`. Accessing data -------------- Accessing data in a DataFrame can be done row by row using :mat:meth:`DataFrame.getRow` or by column via :mat:meth:`DataFrame.getColumn`, or by using MATLAB own indexing procedures. Access the dataframe by row by using MATLAB indexing: * ``df(index1, ..., indexn)`` (where ``index1, ..., indexn`` represent an indexing tuple for the current dataframe) is equivalent to calling :mat:meth:`DataFrame.getRow` with the same parameters * ``df(startindex:endindex)`` where ``startindex`` and ``endiindex`` are numerical indices of rows (position) return a cell array composed of all the rows in the range Access the dataframe by column using table-like syntax: * ``df.columnname`` is equivalent to call df.getColumn(columnname) Conversion to and from MATLAB tables ------------------------------------ MATLAB tables are a very powerful tool for handling data, and are very similar to dataframes. In built routines are provided for converting a DataFrame to and from MATLAB tables. To create a DataFrame from a MATLAB tables, the static function ``fromTable`` can be used: ``` table t; df = DataFrame.fromTable(t); ``` Whilst the opposite functionality is provided by the function ``toTable``: ``` t = df.toTable(); ``` DataFrame --------- .. mat:classmethod:: DataFrame.DataFrame() **Syntax** ``df = DataFrame([amplEntities])`` ``df = DataFrame(int numberOfIndexColumns, header1, ..., headern)`` **Description** ``df = DataFrame([amplEntities])`` creates a skeleton structure, where the indexing columns are automatically inferred depending on the AMPL entities used as headers ``df = DataFrame(numberOfIndexColumns, header1, ..., headern)`` creates a scheleton structure, where arbitrary objects can be used as column headers (they will be converted to their string representation when communicating with the AMPL interpreter). **Input Arguments** ``numberOfIndexColumns`` The number of columns to be considered as indices in the table this object represents. Can be zero in case of DataFrames representing scalar data. ``amplEntities`` A vector of AMPL entities (obtained via one of the functions at :mat:meth:`AMPL.getEntity`). Sets should be listed first, as the have to be the first columns of the table. ``header1,...,headern`` A list of header to be used as column headers. These names will specify how the data will be then bound to AMPL entities when using :mat:meth:`AMPL.setData`. **Output Arguments** ``df`` A DataFrame object that can be used for assigning and manipulating data. **Example** 1 Create a dataframe with two indexing columns starting from AMPL entities and populate them row by row: .. code-block:: matlab ampl.eval('set PROD; set COLOUR; param price{PROD, COLOUR};'); PROD = ampl.getSet('PROD'); COLOUR = ampl.getSet('COLOUR'); price = ampl.getParameter('price'); df = DataFrame(PROD, COLOUR, price); df.addRow('shirts', 'red', 5); df.addRow('shirts', 'blue', 4.5); df.addRow('skirts', 'blue', 5.5); df gives:: df = PROD COLOUR | price shirts red | 5.0 shirts blue | 4.5 skirts blue | 5.5 **Example** 2 Create a dataframe with one indexing column and two data columns, start adding data row by row, then add a whole column with data .. code-block:: matlab df = DataFrame(1, 'PROD', 'price'); df.addRow('shirts', 5); df.addRow('skirts', 4.5); df.addColumn('cost', [4.5 5]) df gives:: df = PROD | price cost shirts | 5.0 4.5 skirts | 4.5 5.0 addColumn --------- .. mat:classmethod:: DataFrame.addColumn() **Syntax** ``addColumn(headerName)`` ``addColumn(amplEntity)`` ``addColumn(headerName, [values])`` ``addColumn(amplEntity, [values])`` **Description** ``addColumn(headerName)`` add a new empty column with the specified header to the dataframe. ``addColumn(amplEntity)`` add a new empty column with the specified header as the name of the AMPL entity to the dataframe ``addColumn(headerName, [values])`` add a column with the specified header to the DataFrame, containing the listed values. If the added column is an indexing column, the indices will be populated with the cartesian product of all the members of the columns already populated. **Input Arguments** ``headerName`` The name of the column to be added ``amplEntity`` An AMPL entity, its name will be used for the new column ``values`` A vector of values to be used in the column. It has to have the same cardinality as the DataFrame (accessible via :mat:meth:`DataFrame.getNumRows`) if the column is not an indexing column. Otherwise, the cardinality of the table will be modified to the cartesian product of the defined sets. **Example** 1 Create a dataframe with one indexing columns and add an empty column later. Populate then the column via :mat:meth:`DataFrame.setColumn`. row by row .. code-block:: matlab df = DataFrame(1, 'PROD'); df.setColumn('PROD', {'shirts'; 'skirts'}); df.addColumn('price') df.setColumn('price', {6.5; 5.5}) df gives:: PROD | price shirts | 6.5 skirts | 5.5 **Example** 2 Create a dataframe with three indexing columns, and populate them. .. code-block:: matlab df = DataFrame(3, 'ORIGIN', 'DEST', 'MODE'); routes = {'London', 'New York'; 'London', 'Milan'} modes = {'air', 'normal'}; df.setColumn('ORIGIN', routes(:,1)); df.setColumn('DEST' , routes(:,2)); df.setColumn('MODE', modes) df gives:: df = ORIGIN DEST MODE | London New York air | London Milan normal | then to add a column with travelling times (ordered as shown), it is sufficient to add: .. code-block:: matlab df.addColumn('time', [7 15]); df will list:: df = ORIGIN DEST MODE | time London New York air | 7.0 London Milan normal | 15.0 addRow ------ .. mat:classmethod:: DataFrame.addRow() **Syntax** ``addRow({colValues})`` **Description** ``addRow({colValues})`` adds a row to the DataFrame **Input Arguments** ``{colValues}`` A cell array of values, of cardinality :mat:meth:`DataFrame.getNumCols` **Example** Construct a DataFrame row by row: .. code-block:: matlab ampl.eval('set PROD; set COLOUR; param price{PROD, COLOUR};'); PROD = ampl.getSet('PROD'); COLOUR = ampl.getSet('COLOUR'); price = ampl.getParameter('price'); df = DataFrame(PROD, COLOUR, price); df.addRow('shirts', 'red', 5); df.addRow('shirts', 'blue', 4.5); df.addRow('skirts', 'blue', 5.5); df gives:: df = PROD COLOUR | price shirts red | 5.0 shirts blue | 4.5 skirts blue | 5.5 setColumn --------- .. mat:classmethod:: DataFrame.setColumn() **Syntax** ``setColumn(headerName, [values])`` ``setColumn(amplEntity, [values])`` **Description** ``setColumn(headerName, [values])`` sets the values of the column specified by headername to the given vector of values. Cannot be used to specify values of indexing columns if not the first one. ``setColumn(amplEntity, [values])`` sets the values of the column inferred by the AMPL entity to the given vector of values. Cannot be used to specify values of indexing columns if not the first one. **Input Arguments** ``headerName`` The name of the column to be added ``amplEntity`` An AMPL entity, its name will be used for the new column ``values`` A vector of values to be used in the column. It has to have the same cardinality as the DataFrame (accessible via :mat:meth:`DataFrame.getNumRows`). **Example** Create a dataframe with two columns and populate them. .. code-block:: matlab df = DataFrame(1, 'PROD', 'price'); df.setColumn('PROD', {'shirts', 'skirts'}) df.setColumn('price', [6.5, 5.5]) gives:: PROD | price shirts | 6.5 skirts | 5.5 getNumCols ---------- .. mat:classmethod:: DataFrame.getNumCols() **Syntax** nc = ``getNumCols`` **Description** ``getNumCols()`` gets the number of columns in the current DataFrame. Note that MATLAB's own size function can be used as well. **Input Arguments** ``nc`` The number of columns actually defined in the DataFrame **Example** Create a dataframe and keep adding columns .. code-block:: matlab df = DataFrame(2, 'PROD', 'COLOUR'); df.getNumCols df.addColumn('price'); df.getNumCols size(df) gives:: ans = 2 ans = 3 ans = 0 3 getNumRows ---------- .. mat:classmethod:: DataFrame.getNumRows() **Syntax** nr = ``getNumRows`` **Description** ``getNumRows()`` gets the number of rows in the current DataFrame. Note that MATLAB's own size function can be used as well. **Input Arguments** ``nr`` The number of rows in the dataframe **Example** Create a dataframe by cartesian product and explore the number of rows .. code-block:: matlab df = DataFrame(3, 'ORIGIN', 'DEST', 'MODE'); hubs = {'London', 'New York', 'Delhi'}; df.setColumn('ORIGIN', hubs); df.setColumn('DEST', hubs); df.setColumn('MODE', {'air', 'normal', 'slow'}) df.getNumRows size(df) gives:: ans = 3 ans = 3 3 getNumIndices ------------- .. mat:classmethod:: DataFrame.getNumIndices() **Syntax** ni = ``getNumIndices`` **Description** ``getNumIndices()`` gets the number of indexing columns in the current DataFrame **Input Arguments** ``ni`` The number of indexing columns in the dataframe **Example** Show the number of indexing columns of a DataFrame .. code-block:: matlab df = DataFrame(2, 'index1', 'index2'); df.getNumIndices gives:: ans = 2 getRow ------ .. mat:classmethod:: DataFrame.getRow() **Syntax** [row] = ``getRow({rowindices})`` [row] = ``getRowByIndex(index)`` **Description** [row] = ``getRow({rowindices})`` gets the row indicated by rowindices [row] = ``getRowByIndex(index)`` gets the row specified by the zero based index specified **Input Arguments** ``rowindices`` A cell array of indices values, of cardinality :mat:meth:`DataFrame.getNumIndices` ``index`` The zero-based integer index of the wanted row **Output Arguments** ``row`` A cell array containing an horizontal "slice" of the DataFrame **Example** Extract rows by specifying the indices and by their ordinal number in the table .. code-block:: matlab df = DataFrame(3, 'ORIGIN', 'DEST', 'MODE', 'time'); origins = {'London'; 'New York'; 'Milan'; 'London'; 'New York'; 'Milan'} destinations = {'New York'; 'Milan'; 'London'; 'New York'; 'Milan'; 'London'} df.setColumn('ORIGIN', origins); df.setColumn('DEST', destinations); df.setColumn('MODE', {'air';'air';'air'; 'normal';'normal';'normal'}) df.setColumn('time', [6,7,2,60,60,20]); % The two statements below refer to the same row row1 = df.getRow('London', 'New York', 'air'); row2 = df.getRowByIndex(1); row1(4) row2(4) gives:: ans = 6 ans = 6 getColumn --------- .. mat:classmethod:: DataFrame.getColumn() **Syntax** [column] = ``getColumn(columnHeader)`` [columnvalues] = ``getColumnAsDoubles(columnHeader)`` [columnstrings] = ``getColumnAsStrings(columnHeader)`` **Description** [column] = ``getColumn(columnHeader)`` gets the specified column as a java array of objects [columnvalues] = ``getColumnAsDoubles(columnHeader)`` gets the specified column as a matlab vector of numbers [columnstrings] = ``getColumnAsStrings(columnHeader)`` gets the specified column as a vector of strings **Input Arguments** ``columnHeader`` The header of the column to be extracted **Output Arguments** ``column`` A cell array containing the values of the column ``columnvalues`` A vector containing the values of the column ``columnstrings`` A cell array containing the values of the column **Example** 1 Extract a column from a dataframe .. code-block:: matlab df = DataFrame(3, 'ORIGIN', 'DEST', 'MODE', 'time'); origins = {'London'; 'New York'; 'Milan'; 'London'; 'New York'; 'Milan'}; destinations = {'New York'; 'Milan'; 'London'; 'New York'; 'Milan'; 'London'}; df.setColumn('ORIGIN', origins); df.setColumn('DEST', destinations); df.setColumn('MODE', {'air';'air';'air'; 'normal';'normal';'normal'}) df.setColumn('time', [6,7,2,60,60,20]); column=df.getColumnAsDoubles('time') gives:: column = 6 7 2 60 60 20 **Example** 2 Extract values from an AMPL parameter as doubles, using a DataFrame as intermediary .. code-block:: matlab ampl = AMPL ampl.eval('param P{i in 1..4} := i;'); P = ampl.getParameter('P'); df = P.getValues; Pvalues = df.getColumnAsDoubles('val'); Pvalues gives:: Pvalues = 1 2 3 4 setValue -------- .. mat:classmethod:: DataFrame.setValue() **Syntax** ``setValue(columnHeader, {rowIndex}, value))`` **Description** ``setValue(columnHeader, {rowIndex}, value))`` sets the value of the specified column and row **Input Arguments** ``columnHeader`` The header of the column to be assigned ``{rowIndex}`` A cell array containing the index of the row to be assigned ``value`` The value to be assigned **Example** Modify a DataFrame a column from a table .. code-block:: matlab df = DataFrame(1, 'PROD', 'price'); df.addRow('shirts', 5); df.addRow('skirts', 4.5); df.addColumn('cost', [4.5, 5]); df.setValue('shirts', 'price', 10) gives:: df = PROD | price cost shirts | 10.0 4.5 skirts | 4.5 5.0 setMatrix --------- .. mat:classmethod:: DataFrame.setMatrix() **Syntax** ``setMatrix(value, {indices1, ..., indicesn})`` **Description** ``setMatrix(matrix, {indices1, ..., indicesn})`` assigns all the values of a MATLAB n-dimensional matrix to this DataFrame, together with the indices values. **Input Arguments** ``matrix`` The matrix containing the values. It can be an n-dimensional cell array containing strings too. Note that, in the current version, matrices to up to six dimensions are supported. ``{indices1, ... indicesn}`` A cell array containing *n* vectors where *n* is the number of dimensions of the matrix. Each vector must have the same size of the ``matrix`` in that dimension. **Example** 1: One dimension Create a one dimensional matrix (an array) and assign it .. code-block:: matlab % Create a vector of four random numbers A = randn(4,1); % Create a vector of indices A1 = {'a1' , 'a2', 'a3' , 'a4'}; % Create a dataframe with 1 index df = DataFrame(1, 'A1', 'Values'); % Assign the vector df.setMatrix(A, A1) % Print the value df gives:: df = A1 | Values a1 | 0.3187652398589808 a2 | -1.3076882963052734 a3 | -0.43359202230568356 a4 | 0.3426244665386499 **Example** 2: Two dimensions and assignment to an AMPL parameter Create a two dimensional matrix and assign it to an AMPL parameter using :mat:meth:`AMPL.setData`. The binding between the DataFrame and the parameter to which the values are assigned is achieved via the name of the only data column in the DataFrame. .. code-block:: matlab % Create a 4x3 matrix A = [1 2 3; 4 5 6; 7 8 9; 10 11 12]; % First index (cardinality = number of rows A1 = {'a1' ; 'a2'; 'a3' ; 'a4'}; % Second index (cardinality = number of columns) A2 = {1; 2; 3}; % Create a dataframe with two idexing columns and % a data column called TheParameter df = DataFrame(2, 'A1', 'A2', 'TheParameter'); % Assign the matrix to the dataframe df.setMatrix(A, A1, A2) % Create an AMPL instance ampl = AMPL; % Create two sets and a parameter. The name of the parameter % has to be the same as the column in our dataframe. % Sets name can differ, since we are not assigning them using % AMPL.setData ampl.eval('set S1; set S2; param TheParameter{S1, S2};'); % Set the data for sets S1 and S2 ampl.getSet('S1').setValues(A1); ampl.getSet('S2').setValues(A2); ampl.setData(df); ampl.display('TheParameter') gives:: TheParameter := a1 1 1 a1 2 2 a1 3 3 a2 1 4 a2 2 5 a2 3 6 a3 1 7 a3 2 8 a3 3 9 a4 1 10 a4 2 11 a4 3 12 ; **Example** 3: Symbolic parameters assignments This **Example** shows how to assign a two-dimensional cell array containing strings to a parameter. .. code-block:: matlab % Create a 2-d cell array cellArray = {'r1c1', 'r1c2', 'r1c3'; 'r2c1', 'r2c2', 'r2c3'; 'r3c1', 'r3c2','r3c3'; 'r4c1', 'r4c2', 'r4c3'} % assign it to the previous dataframe df.setMatrix(cellArray, A1, A2) % display the dataframe df gives:: df = A1 A2 | TheParameter a1 1.0 | r1c1 a1 2.0 | r1c2 a1 3.0 | r1c3 a2 1.0 | r2c1 a2 2.0 | r2c2 a2 3.0 | r2c3 a3 1.0 | r3c1 a3 2.0 | r3c2 a3 3.0 | r3c3 a4 1.0 | r4c1 a4 2.0 | r4c2 a4 3.0 | r4c3 **Example** 4: Higher dimensional matrices This **Example** shows how to assign a 4-d matrix to a DataFrame. The same logic applies to higher dimensional cell arrays. .. code-block:: matlab % Create a 4-d matrix B = randn(2,2,2,2); % Create 4 sets of indices of cardinality 2 and of various types % (both strings and numbers) B1 = {'a1', 4}; B2 = {'1', '4'}; B3 = {'c1' ,'2'}; B4 = {4, '4'}; % Create a dataframe with 4 index columns df = DataFrame(4, 'B1', 'B2', 'B3', 'B4', 'Value') % Assign the valuees to the dataframe df.setMatrix(B, B1, B2, B3, B4) % Display the contents df gives:: df = B1 B2 B3 B4 | Value a1 1 c1 4.0 | -1.0688704581680317 a1 1 c1 4 | -0.10224244608549089 a1 1 2 4.0 | 0.32519053945619786 a1 1 2 4 | -0.8648799173244565 a1 4 c1 4.0 | -2.9442841619948963 a1 4 c1 4 | 0.3192067391655018 a1 4 2 4.0 | 1.370298540095228 a1 4 2 4 | -0.1648790192090383 4.0 1 c1 4.0 | -0.8094986944248755 4.0 1 c1 4 | -0.24144704160735794 4.0 1 2 4.0 | -0.7549283191697034 4.0 1 2 4 | -0.03005129619626856 4.0 4 c1 4.0 | 1.4383802928150984 4.0 4 c1 4 | 0.31285859663742843 4.0 4 2 4.0 | -1.711516418853698 4.0 4 2 4 | 0.6277072875287265