Bilevel Optimization Introduction#


Description: A gentle introduction to bilevel optimization, showing how to reformulate a bilevel problem as a single-level problem using KKT conditions and complementarity, illustrated with a simple Stackelberg model in amplpy and the MP interface

Tags: educational, bilevel, complementarity, amplpy, gurobi, knitro, stackelberg

Notebook author: Eduardo Salazar <eduardo@ampl.com>

Model author: Eduardo Salazar <eduardo@ampl.com>

References:

  1. A Gentle and Incomplete Introduction to Bilevel Optimization: https://opus4.kobv.de/opus4-trr154/files/392/bilevel-optimization.pdf

# Install dependencies
%pip install -q amplpy
# Google Colab & Kaggle integration
from amplpy import AMPL, ampl_notebook

ampl = ampl_notebook(
    modules=["gurobi", "knitro"],  # modules to install
    license_uuid="default",  # license to use
)  # instantiate AMPL object and register magics

What is a Bilevel Program?#

A bilevel program (or bilevel optimization problem) is an optimization problem where one optimization problem is nested inside another. It has a hierarchical structure with two decision-makers:

  • Upper level (Leader): Makes decisions first

  • Lower level (Follower): Observes the leader’s decision, then optimizes their own objective

General Structure#

(1)#\[\begin{align} \min_{x} \quad & F(x, y^*) && \text{(Leader's objective)} \\ \text{s.t.} \quad & G(x, y^*) \leq 0 && \text{(Leader's constraints)} \\ & y^* \in \arg\min_{y} \{f(x,y) : g(x,y) \leq 0\} && \text{(Follower's problem)} \end{align}\]

Key point: The leader’s objective depends on \(y^*\), which is the optimal solution to the follower’s problem (which itself depends on \(x\)).

Why “Bilevel”?#

The decision-making happens in two levels:

  • Level 1 (Upper): Leader chooses \(x\)

  • Level 2 (Lower): Given \(x\), follower solves their optimization to find \(y^*\)

  • Back to Level 1: Leader evaluates \(F(x, y^*)\) and picks the best \(x\)

The leader must anticipate the follower’s rational response.

Bilevel problems are notoriously difficult, so there’s no single “best” method. The approach depends on the problem structure. In this notebook we will solve bilevel problems through the KKT Reformulation → Single-Level MPEC approach.
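In general terms, and assuming the lower level is convex and satisfies a constraint qualification, this reformulation replaces the follower’s problem in (1) with its optimality conditions:

\[\begin{align} \min_{x, y, \lambda} \quad & F(x, y) \\ \text{s.t.} \quad & G(x, y) \leq 0 \\ & \nabla_y f(x, y) + \lambda^\top \nabla_y g(x, y) = 0 && \text{(Stationarity)} \\ & g(x, y) \leq 0, \quad \lambda \geq 0 && \text{(Primal and dual feasibility)} \\ & \lambda^\top g(x, y) = 0 && \text{(Complementarity)} \end{align}\]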

Toy example#

Say we have the following bilevel problem:

(2)#\[\begin{align} \min_{x} \quad & x^2 + (y - 10)^2 && \text{(Leader's objective)} \\ \text{s.t.} \quad & y \in \arg\min_{y} \{ (y - x)^2 : y \geq 0 \} && \text{(Follower's problem)} \end{align}\]

Leader’s Objective: \(x^2 + (y - 10)^2\)

The leader has conflicting goals:

  • \(x^2\) term: Prefers \(x\) close to 0 (keep their own action small)

  • \((y - 10)^2\) term: Prefers \(y\) close to 10 (wants follower to respond with \(y=10\))

Follower’s Objective: \((y - x)^2\)

The follower wants to match the leader:

  • Minimizing \((y - x)^2\) means preferring \(y=x\)

  • But constrained by \( y \geq 0\) (can’t go negative)

The Strategic Tension#

This creates a Stackelberg (leader-follower) game:

  1. Leader moves first, choosing \(x\)

  2. Follower observes \(x\), then solves their optimization problem

  3. Leader anticipates the follower’s optimal response when making their choice

The leader can’t directly control \(y\), but can influence it through their choice of \(x\).
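Before any reformulation, we can make this nesting concrete with a brute-force sketch (not the method used in this notebook): grid over the leader’s \(x\), plug in the follower’s closed-form best response \(y^*(x) = \max(x, 0)\) (derived in the case analysis later on), and pick the \(x\) that minimizes the leader’s objective.

# Brute-force sketch: enumerate x, use the follower's best response
import numpy as np

xs = np.linspace(-5, 15, 2001)       # candidate leader decisions
ys = np.maximum(xs, 0.0)             # follower's best response y*(x) = max(x, 0)
F = xs**2 + (ys - 10.0)**2           # leader's objective given that response

best = np.argmin(F)
print(f"x* = {xs[best]:.3f}, y* = {ys[best]:.3f}, F* = {F[best]:.3f}")
# Expect: x* = 5, y* = 5, F* = 50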

The Karush-Kuhn-Tucker (KKT) conditions are necessary conditions for optimality in constrained optimization. They generalize the idea that “at a minimum, the gradient equals zero” to problems with constraints.

To solve via the KKT/MPEC approach, we proceed in three steps:

Step 1: Write the Lower-Level Problem. For a given \(x\), the follower solves:

(3)#\[\begin{align} \min_y \quad (y - x)^2 \\ \text{s.t.} \quad y \geq 0 \end{align}\]

Step 2: Derive KKT Conditions for Lower Level.

The Lagrangian for the lower level is: \(L(y, \lambda) = (y - x)^2 + \lambda(-y)\)

where \(\lambda\) is the dual variable for the constraint \(y \geq 0\) (written as \(-y \leq 0\)).

KKT Conditions:

  1. Stationarity: \(2(y - x) - \lambda = 0\). At the optimum, the gradient of the objective is a linear combination of the gradients of the active constraints. You can’t improve the objective without violating a constraint.

  2. Primal feasibility: \(y \geq 0\). The solution must satisfy all constraints (obvious but necessary to state).

  3. Dual feasibility: \(\lambda \geq 0\). Dual variables for inequality constraints must be non-negative. This reflects that constraints can only “push” the solution in one direction.

  4. Complementary slackness: \(\lambda \cdot y = 0\). For each inequality constraint, either the constraint is active OR the dual variable is zero, or both. Inactive constraints don’t affect the solution.

Step 3: Single-Level Reformulation

Replace the lower-level problem with its KKT conditions:

(4)#\[\begin{align} \min_{x,y,\lambda} \quad x^2 + (y - 10)^2 \\ \text{s.t.} \quad 2(y - x) - \lambda = 0 \\ y \geq 0 \\ \lambda \geq 0 \\ \lambda y = 0 \end{align}\]

Important Notes

The complementarity constraint \(\lambda y = 0\) makes this an MPEC (Mathematical Program with Equilibrium Constraints).

This is non-convex and violates standard constraint qualifications. However, with specialized solvers and AMPL’s MP interface, we can solve this problem directly.

%%writefile stackelberg.mod
# Simple Stackelberg Game - Bilevel MPEC Formulation
#
# Upper level (Leader): min x^2 + (y - 10)^2
# Lower level (Follower): min (y - x)^2  s.t. y >= 0

# Upper level variables (leader's decisions)
var x;  # Leader's choice (unrestricted)

# Lower level primal variables (follower's decisions)
var y;  # Follower's choice (note: no bound here, it's in the complementarity)

# Lower level dual variables (KKT multipliers)
var lambda;  # Multiplier (note: no bound here either)

# Upper level objective: Leader's cost
minimize leader_cost:
    x^2 + (y - 10)^2;

# Lower level KKT conditions:

# 1. Stationarity: gradient of Lagrangian = 0
subject to stationarity:
    2 * (y - x) - lambda = 0;

# 2. Primal feasibility & complementarity combined
subject to complementarity:
    lambda >= 0 complements y >= 0;
Writing stackelberg.mod
# Load the model from a file
ampl.read("stackelberg.mod")
# Set the solver to Gurobi
ampl.option["solver"] = "gurobi"
ampl.option["gurobi_options"] = "outlev=0"

# Solve the problem
ampl.solve()

# Get the solve status
solve_result = ampl.get_value("solve_result")
print(f"Solve result: {solve_result}")

# Retrieve solution values
x_val = ampl.get_variable("x").value()
y_val = ampl.get_variable("y").value()
lambda_val = ampl.get_variable("lambda").value()
obj_val = ampl.get_objective("leader_cost").value()

print(f"\nSolution:")
print(f"x (leader) = {x_val:.6f}")
print(f"y (follower) = {y_val:.6f}")
print(f"lambda (multiplier) = {lambda_val:.6f}")
print(f"Objective value = {obj_val:.6f}")

# Verify complementarity condition
comp_violation = abs(lambda_val * y_val)
print(f"\nComplementarity violation: {comp_violation:.2e}")
WARNING:amplpy:
	Nonsquare complementarity system:
		2 complementarities including 1 equations
		3 variables
Gurobi 12.0.3:   tech:outlev = 0
Gurobi 12.0.3: optimal solution; objective 50
7 simplex iterations
3 branching nodes
Solve result: solved

Solution:
x (leader) = 5.000000
y (follower) = 5.000000
lambda (multiplier) = 0.000000
Objective value = 50.000000

Complementarity violation: 0.00e+00

Note that we solved the MPEC, a nonconvex quadratic problem, using Gurobi. AMPL’s MP driver automatically translated the complementarity constraint into indicator constraints that Gurobi can handle. We can also solve with Knitro, which supports complementarity constraints natively; both solvers should return the same solution.

# Set the solver to Knitro
ampl.option["solver"] = "knitro"
ampl.option["knitro_options"] = "outlev=0"

# Solve the problem
ampl.solve()

# Get the solve status
solve_result = ampl.get_value("solve_result")
print(f"Solve result: {solve_result}")

# Retrieve solution values
x_val = ampl.get_variable("x").value()
y_val = ampl.get_variable("y").value()
lambda_val = ampl.get_variable("lambda").value()
obj_val = ampl.get_objective("leader_cost").value()

print(f"\nSolution:")
print(f"x (leader) = {x_val:.6f}")
print(f"y (follower) = {y_val:.6f}")
print(f"lambda (multiplier) = {lambda_val:.6f}")
print(f"Objective value = {obj_val:.6f}")

# Verify complementarity condition
comp_violation = abs(lambda_val * y_val)
print(f"\nComplementarity violation: {comp_violation:.2e}")
Artelys Knitro 15.0.1: outlev=0
Knitro 15.0.1: Locally optimal or satisfactory solution.
objective 49.99999992; feasibility error 1.58e-08
4 iterations; 7 function evaluations

suffix feaserror OUT;
suffix opterror OUT;
suffix numfcevals OUT;
suffix numiters OUT;
Solve result: solved

Solution:
x (leader) = 5.000000
y (follower) = 5.000000
lambda (multiplier) = 0.000000
Objective value = 50.000000

Complementarity violation: 7.92e-08

Understanding Complementarity: Simple Stackelberg Example#

The complements operator in AMPL encodes a special type of constraint that’s fundamental to bilevel optimization. Let’s understand it through this simple example.


What Does lambda >= 0 complements y >= 0 Mean?#

This single line encodes three conditions simultaneously:

  1. \(\lambda \geq 0\) (dual variable is non-negative)

  2. \(y \geq 0\) (primal variable satisfies lower bound)

  3. \(\lambda \cdot y = 0\) (complementarity: at least one must equal zero)

In words: Either \(\lambda = 0\) OR \(y = 0\) (or both).
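A tiny plain-Python helper (hypothetical, not part of amplpy) makes the three conditions explicit:

def satisfies_complementarity(lam: float, y: float, tol: float = 1e-8) -> bool:
    # lambda >= 0, y >= 0, and lambda * y = 0 (up to tolerance)
    return lam >= -tol and y >= -tol and abs(lam * y) <= tol

print(satisfies_complementarity(0.0, 5.0))   # True: interior case
print(satisfies_complementarity(3.0, 0.0))   # True: boundary case
print(satisfies_complementarity(2.0, 1.0))   # False: both positive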


The Three Possible Cases#

The complementarity constraint defines three mutually exclusive scenarios:

| Case | \(\lambda\) | \(y\) | Interpretation |
|------|-------------|-------|----------------|
| 1. Interior | \(= 0\) | \(> 0\) | Follower’s constraint is inactive; \(y\) is chosen freely |
| 2. Boundary | \(> 0\) | \(= 0\) | Follower’s constraint is binding; forced to \(y = 0\) |
| 3. Degenerate | \(= 0\) | \(= 0\) | Both at boundary (rare, but allowed) |

Invalid: Both \(\lambda > 0\) AND \(y > 0\) (violates complementarity)


Why Complementarity Represents “Either/Or” Logic#

The lower level problem is:

\[ \min_{y} \quad (y - x)^2 \quad \text{s.t.} \quad y \geq 0 \]

The KKT conditions for this problem are:

  1. Stationarity: \(\frac{\partial}{\partial y}[(y-x)^2 + \lambda \cdot (-y)] = 2(y - x) - \lambda = 0\)

  2. Primal feasibility: \(y \geq 0\)

  3. Dual feasibility: \(\lambda \geq 0\)

  4. Complementarity: \(\lambda \cdot y = 0\)

The complementarity condition captures the following logic:

IF the constraint y ≥ 0 is INACTIVE (y > 0):
    THEN the multiplier λ = 0 (constraint has no shadow value)

IF the constraint y ≥ 0 is ACTIVE (y = 0):
    THEN the multiplier λ may be positive (the constraint is preventing further reduction)

This is standard Lagrange multiplier theory: a constraint’s multiplier is positive only when the constraint is binding.
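For this particular follower problem, the KKT system can be solved in closed form, which we can check numerically (a sketch, using the best response derived in the case analysis below):

def follower_kkt(x: float):
    # Best response: project the unconstrained minimizer y = x onto y >= 0
    y = max(x, 0.0)
    # From stationarity 2(y - x) - lam = 0, with y = 0 when x < 0
    lam = max(-2.0 * x, 0.0)
    return y, lam

for x in (-3.0, 0.0, 5.0):
    y, lam = follower_kkt(x)
    stationarity = 2 * (y - x) - lam    # should be 0
    comp = lam * y                      # should be 0
    print(f"x={x:+.1f}: y={y:.1f}, lambda={lam:.1f}, "
          f"stationarity={stationarity:.1e}, comp={comp:.1e}")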


Solving This Problem Analytically#

Let’s solve by considering both cases:

Case 1: Interior Solution (\(y > 0\), so \(\lambda = 0\))#

From stationarity:

\[2(y - x) - \lambda = 0 \implies 2(y - x) = 0 \implies y = x\]

For this to be valid, we need \(y = x > 0\).

Substituting into the upper level:

\[\min_{x} \quad x^2 + (x - 10)^2 = x^2 + x^2 - 20x + 100 = 2x^2 - 20x + 100\]

Taking the derivative:

\[\frac{d}{dx}[2x^2 - 20x + 100] = 4x - 20 = 0 \implies x^* = 5\]

Therefore: \(x^* = 5\), \(y^* = 5\), \(\lambda^* = 0\)

Check feasibility: \(y^* = 5 > 0\)

Case 2: Boundary Solution (\(y = 0\), so \(\lambda > 0\))#

If \(y = 0\), from stationarity:

\[2(0 - x) - \lambda = 0 \implies \lambda = -2x\]

For dual feasibility, we need \(\lambda \geq 0\), which requires \(x \leq 0\).

Substituting \(y = 0\) into the upper level:

\[\min_{x \leq 0} \quad x^2 + (0 - 10)^2 = x^2 + 100\]

This is minimized at \(x = 0\) (the unconstrained minimizer of \(x^2 + 100\), which lies on the boundary of the feasible region \(x \leq 0\)).

Therefore: \(x^* = 0\), \(y^* = 0\), \(\lambda^* = 0\)

Objective value: \(0^2 + (0-10)^2 = 100\)

Comparison:#

  • Case 1 (interior): Objective = \(5^2 + (5-10)^2 = 25 + 25 = 50\) ✓ Better!

  • Case 2 (boundary): Objective = \(0^2 + (0-10)^2 = 100\)

Optimal solution: \(x^* = 5\), \(y^* = 5\), \(\lambda^* = 0\) (interior solution)
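A quick symbolic confirmation of the interior case, assuming sympy is available:

import sympy as sp

x = sp.symbols("x", real=True)
leader = x**2 + (x - 10)**2              # upper-level objective with y = x substituted
x_star = sp.solve(sp.diff(leader, x), x)[0]
print(x_star, leader.subs(x, x_star))    # 5 50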


Economic Interpretation: Leader-Follower Game#

The Story:

  • Leader (upper level) chooses \(x\), trying to minimize \(x^2 + (y-10)^2\)

  • Follower (lower level) observes \(x\), then chooses \(y\) to minimize \((y-x)^2\) subject to \(y \geq 0\)

What happens:

  1. Leader prefers \(y = 10\) (makes \((y-10)^2 = 0\))

  2. Follower prefers \(y = x\) (makes \((y-x)^2 = 0\))

  3. These preferences conflict!

Stackelberg Equilibrium:

  • Leader anticipates follower will choose \(y = \max(x, 0)\)

  • Leader chooses \(x = 5\), knowing follower will respond with \(y = 5\)

  • This balances leader’s competing objectives: keeping \(x\) small and getting \(y\) close to 10

Why \(\lambda = 0\)? The constraint \(y \geq 0\) is not binding at the optimum (\(y^* = 5 > 0\)), so its shadow value is zero.


Why We Need the complements Operator#

What if we tried to write this without complements?

# WRONG - Don't do this!
subject to complementarity_wrong:
    lambda * y = 0;

Problems:

  1. This is a bilinear constraint (product of two variables)

  2. Creates a non-convex feasible region

  3. Standard NLP solvers will struggle

  4. May converge to incorrect solutions

With complements:

# ✓ CORRECT
subject to complementarity:
    lambda >= 0 complements y >= 0;

AMPL recognizes this as a complementarity constraint and:

  • Passes it to MPEC-capable solvers (like Knitro)

  • Reformulates the “either/or” logic automatically (e.g., as indicator constraints) for solvers like Gurobi

  • Guarantees the solution satisfies \(\lambda \cdot y = 0\)
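To see the constraint as AMPL stores it before the driver’s reformulation, you can ask AMPL to expand it (shown here through amplpy; the indicator translation itself happens inside the MP driver):

# Display the complementarity constraint as AMPL will send it to the driver
ampl.eval("expand complementarity;")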


Visualizing the Feasible Region#

The complementarity constraint creates a non-convex feasible region:

     λ
     ↑
     |
     |  ✓ Feasible
     |  (λ > 0, y = 0)
     |
     |_______________→ y
        ✓ Feasible
        (λ = 0, y > 0)

The feasible set is the union of two rays:

  1. The \(\lambda\)-axis: \(\{(\lambda, y) : \lambda \geq 0, y = 0\}\)

  2. The \(y\)-axis: \(\{(\lambda, y) : \lambda = 0, y \geq 0\}\)

The union of these two rays is not convex — this is why bilevel problems are hard!
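A two-line check of this non-convexity (a sketch): take one feasible point from each ray and test their midpoint.

p1 = (2.0, 0.0)                 # on the lambda-axis: feasible
p2 = (0.0, 2.0)                 # on the y-axis: feasible
mid = ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)
print(mid, mid[0] * mid[1])     # (1.0, 1.0): product 1.0 != 0, so infeasible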


Connection to the MPEC Framework#

This simple model demonstrates the MPEC (Mathematical Program with Equilibrium Constraints) structure:

MPEC Components:

| Component | In This Model |
|-----------|---------------|
| Upper level objective | \(\min \, x^2 + (y-10)^2\) |
| Upper level variables | \(x\) |
| Lower level primal variables | \(y\) |
| Lower level dual variables | \(\lambda\) |
| Primal feasibility | Implicit in the `y >= 0` side of the complementarity |
| Stationarity | \(2(y-x) - \lambda = 0\) |
| Dual feasibility | Implicit in the `lambda >= 0` side of the complementarity |
| Complementarity | `lambda >= 0 complements y >= 0` |

Every bilevel problem whose lower level is convex and satisfies a constraint qualification can be transformed into this MPEC structure by:

  1. Writing the lower level’s KKT conditions

  2. Replacing the lower level problem with these conditions

  3. Using complements for the complementarity conditions
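As a hypothetical illustration of this recipe (not part of the model above), suppose the follower also faced an upper bound \(y \leq 5\). The reformulation simply gains a second multiplier and a second complementarity pair:

# Hypothetical extension of stackelberg.mod with a second follower constraint
var x;            # leader's choice
var y;            # follower's choice
var lambda_lo;    # multiplier for y >= 0
var lambda_hi;    # multiplier for y <= 5

minimize leader_cost: x^2 + (y - 10)^2;

# Stationarity: d/dy [ (y - x)^2 - lambda_lo*y + lambda_hi*(y - 5) ] = 0
subject to stationarity:
    2*(y - x) - lambda_lo + lambda_hi = 0;

# One complementarity pair per lower-level inequality
subject to comp_lo: lambda_lo >= 0 complements y >= 0;
subject to comp_hi: lambda_hi >= 0 complements 5 - y >= 0;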


Key Takeaways#

  1. complements encodes three conditions in one line:

    • Both inequalities must hold

    • At least one must hold with equality

    • This means: \(\lambda \cdot y = 0\)

  2. Complementarity represents regime-switching:

    • Either the constraint is binding (\(y = 0\)) OR it’s not (\(\lambda = 0\))

    • Never both positive simultaneously

  3. This is why bilevel is non-convex:

    • The “either/or” logic creates a union of convex sets

    • Standard convex optimization techniques don’t work


This simple Stackelberg game illustrates all the key concepts you’ll see in more complex bilevel problems.