{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "WHPFGk_7OkHg", "tags": [] }, "source": [ "```{index} single: application; regression\n", "```\n", "```{index} pandas dataframe\n", "```\n", "```{index} single: solver; highs\n", "```\n", "\n", "# Wine quality prediction with L1 regression" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "1LZZDfmaOkHo", "outputId": "00288bfb-14c0-4c0c-99d5-e62c163bd369", "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using default Community Edition License for Colab. Get yours at: https://ampl.com/ce\n", "Licensed to AMPL Community Edition License for the AMPL Model Colaboratory (https://colab.ampl.com).\n" ] } ], "source": [ "# install dependencies and select solver\n", "%pip install -q amplpy pandas matplotlib\n", "\n", "SOLVER = \"highs\"\n", "\n", "from amplpy import AMPL, ampl_notebook\n", "\n", "ampl = ampl_notebook(\n", "    modules=[\"highs\"],  # modules to install\n", "    license_uuid=\"default\",  # license to use\n", ")  # instantiate AMPL object and register magics" ] }, { "cell_type": "markdown", "metadata": { "id": "7zCMJVsQOkHr", "tags": [] }, "source": [ "## Problem description\n", "\n", "Regression is the task of fitting a model to data. If things go well, the model might provide useful predictions in response to new data. This notebook shows how linear programming and least absolute deviation (LAD) regression can be used to create a linear model for predicting wine quality based on physical and chemical properties. 
The example uses a well-known data set from the machine learning community.\n", "\n", "Physical, chemical, and sensory quality properties were collected for a large number of red and white wines produced in Portugal and then donated to the UCI machine learning repository (Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. (2009). Wine Quality. UCI Machine Learning Repository). The following cell reads the data for red wines directly from the UCI machine learning repository.\n", "\n", "Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. (2009). Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4), 547-553. https://doi.org/10.1016/j.dss.2009.05.016\n", "\n", "Let us first download the data." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 441 }, "id": "IwrjDMunOkHs", "outputId": "da10886e-d2ed-4450-828e-80b37876f255", "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "
| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.700 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.99780 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.880 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.99680 | 3.20 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.760 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.99700 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.280 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.99800 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.700 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.99780 | 3.51 | 0.56 | 9.4 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1594 | 6.2 | 0.600 | 0.08 | 2.0 | 0.090 | 32.0 | 44.0 | 0.99490 | 3.45 | 0.58 | 10.5 | 5 |
| 1595 | 5.9 | 0.550 | 0.10 | 2.2 | 0.062 | 39.0 | 51.0 | 0.99512 | 3.52 | 0.76 | 11.2 | 6 |
| 1596 | 6.3 | 0.510 | 0.13 | 2.3 | 0.076 | 29.0 | 40.0 | 0.99574 | 3.42 | 0.75 | 11.0 | 6 |
| 1597 | 5.9 | 0.645 | 0.12 | 2.0 | 0.075 | 32.0 | 44.0 | 0.99547 | 3.57 | 0.71 | 10.2 | 5 |
| 1598 | 6.0 | 0.310 | 0.47 | 3.6 | 0.067 | 18.0 | 42.0 | 0.99549 | 3.39 | 0.66 | 11.0 | 6 |
1599 rows × 12 columns
| | volatile acidity | density | alcohol | quality |
|---|---|---|---|---|
| volatile acidity | 1.000000 | 0.022026 | -0.202288 | -0.390558 |
| density | 0.022026 | 1.000000 | -0.496180 | -0.174919 |
| alcohol | -0.202288 | -0.496180 | 1.000000 | 0.476166 |
| quality | -0.390558 | -0.174919 | 0.476166 | 1.000000 |
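
The symmetric table with a unit diagonal appears to be a pairwise correlation matrix of the kind produced by pandas' `DataFrame.corr`. The following is a minimal sketch of that computation, assuming only the ten rows visible in the data preview above (rows 0-4 and 1594-1598) rather than the full 1599-row data set, so its coefficients will not match the values in the table. The names `sample` and `corr` are illustrative and do not come from the notebook.

```python
import pandas as pd

# Ten rows copied from the data preview (rows 0-4 and 1594-1598).
# Assumption: this is only a small sample; the full data set has 1599 rows,
# so the coefficients computed here differ from the full-data table.
sample = pd.DataFrame(
    {
        "volatile acidity": [0.700, 0.880, 0.760, 0.280, 0.700,
                             0.600, 0.550, 0.510, 0.645, 0.310],
        "density": [0.99780, 0.99680, 0.99700, 0.99800, 0.99780,
                    0.99490, 0.99512, 0.99574, 0.99547, 0.99549],
        "alcohol": [9.4, 9.8, 9.8, 9.8, 9.4, 10.5, 11.2, 11.0, 10.2, 11.0],
        "quality": [5, 5, 5, 6, 5, 5, 6, 6, 5, 6],
    }
)

# Pearson correlation between every pair of columns (the pandas default).
corr = sample.corr()
print(corr.round(3))
```

Each entry lies between -1 and 1, the diagonal is 1 because every column is perfectly correlated with itself, and the matrix is symmetric, which is why the table lists each off-diagonal value twice.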