Convex Optimization

In this notebook we cover the fundamentals of Convex Optimization. We also examine a simple practical application of convex optimization: image in-painting using the CVXPY package in Python.

Introduction

Convex Optimization is a subfield of mathematical optimization that studies the problem of minimizing convex functions over convex sets. Many optimization methods can be cast as convex optimization. Although it is instrumental in Artificial Intelligence, Convex Optimization is a general technique that is not limited to Artificial Intelligence; it has applications in various fields, such as information and communication systems, circuit design, portfolio optimization (stock exchange), and many others. This notebook covers the fundamental theoretical concepts of optimization and convex optimization and shows some simple Python examples of how to use this technique.

To define Convex Optimization, we must first look at the definitions of optimization and convex functions.

What is Optimization?

Mathematical optimization is the selection of the best element, with regard to some criterion, from a set of available alternatives. Note that the word optimization is used in many different contexts. For example, Compiler Optimization in programming-language implementation and compiler theory is very different from what we will talk about here. In this notebook, whenever you see optimization, it means "Mathematical Optimization."

Many definitions try to formalize Mathematical Optimization. Here, we present one of the most widely used notations.

$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \leq 0, \quad i=1,\cdots,m \\ & g_i(x) = 0, \quad i=1,\cdots,p \end{aligned}$$

In which $x \in \mathbb{R}^n$ is a vector variable to be chosen, $f_0$ is the objective function to be minimized, $f_1,\cdots,f_m$ are the inequality constraint functions, and $g_1,\cdots,g_p$ are the equality constraint functions.
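To make the standard form concrete, here is a small Python sketch of a made-up instance in $\mathbb{R}^2$ (the objective `f0`, the single inequality, and the single equality below are invented for illustration), together with a feasibility check:

```python
import numpy as np

# Hypothetical instance in R^2 (all functions are made up for illustration):
#   minimize    f0(x) = x1^2 + x2^2
#   subject to  f1(x) = x1 + x2 - 1 <= 0
#               g1(x) = x1 - x2      = 0
f0 = lambda x: x[0] ** 2 + x[1] ** 2
ineqs = [lambda x: x[0] + x[1] - 1]   # f_i(x) <= 0
eqs = [lambda x: x[0] - x[1]]         # g_i(x) == 0

def is_feasible(x, tol=1e-9):
    """Check the constraints of the standard form at a point x."""
    return all(f(x) <= tol for f in ineqs) and all(abs(g(x)) <= tol for g in eqs)

feasible = is_feasible(np.array([0.3, 0.3]))    # both constraints hold
infeasible = is_feasible(np.array([1.0, 0.5]))  # violates both constraints
```

A point is feasible exactly when every inequality and equality constraint is satisfied; the optimization then searches the feasible set for the point minimizing `f0`.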

There are many variations of this notation, but they can easily be transformed into the one presented above. For example, the problem of maximizing a function $f_0(x)$ can be transformed into the problem of minimizing $-f_0(x)$.
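A quick numerical sanity check of this equivalence, using a made-up concave objective with its maximum at $x = 2$:

```python
import numpy as np

# Made-up concave objective with a unique maximum at x = 2.
f0 = lambda x: -(x - 2.0) ** 2 + 3.0

xs = np.linspace(-5.0, 5.0, 10001)  # grid with step 0.001

# The maximizer of f0 and the minimizer of -f0 are the same point.
x_max = xs[np.argmax(f0(xs))]
x_min = xs[np.argmin(-f0(xs))]
```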

There are two aspects in optimization problems:

  • Finding good models
  • Solving the problem

Each of these two aspects is as important as the other one. If you don't have a good model, solving it will not help you solve the real-world problem.

A model is a mapping from the real-world high-level description of the problem to the mathematical notations. For example, in circuit design problems, x can represent the specifications of the actual design, like the placement of each component and other technical information. Constraints come from the physical limitations of the manufacturing process and performance requirements, and last, but not least, the objective function can be a combination of cost, weight, power, area, etc. We should define all of these aspects mathematically in order to have a good model.

There are multiple methods to solve the problem. Convex Optimization focuses on methods of solving specific but prevalent types of optimization problems. More on that later.

What do we mean by Convex?

If you remember High School geometry, a convex polygon is a polygon in which no line segment between two points on the boundary ever goes outside the polygon.

Convex functions and convex sets follow the same intuition. A function $f: D \to \mathbb{R}$, defined on a convex set $D$, is called convex if and only if the following condition holds:

  • For all $0\leq \theta \leq 1$ and all $x_1 , x_2 \in D$: $f(\theta x_1 + (1-\theta) x_2) \leq \theta f(x_1) + (1-\theta) f(x_2)$.

Also, a function is Strictly Convex if and only if the following condition holds:

  • For all $0 < \theta < 1$ and all $x_1 , x_2 \in D$ with $x_1 \neq x_2$: $f(\theta x_1 + (1-\theta) x_2) < \theta f(x_1) + (1-\theta) f(x_2)$.

Assume we have two points $(x_1 , f(x_1))$ and $(x_2 , f(x_2))$ and we connect them with a straight line. If the function $f$ is convex, then every point of the graph of $f$ between $x_1$ and $x_2$ must lie on or below this line.
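The defining inequality is easy to probe numerically. The sketch below is a randomized check (not a proof): it samples the inequality for $f(x) = x^2$, which is convex, and for $\sin x$, which is not convex on a large interval:

```python
import numpy as np

rng = np.random.default_rng(0)

def jensen_holds(f, n_trials=1000, lo=-10.0, hi=10.0, tol=1e-9):
    """Randomly sample the convexity inequality; False means a violation was found."""
    for _ in range(n_trials):
        x1, x2 = rng.uniform(lo, hi, size=2)
        theta = rng.uniform()
        lhs = f(theta * x1 + (1.0 - theta) * x2)
        rhs = theta * f(x1) + (1.0 - theta) * f(x2)
        if lhs > rhs + tol:
            return False
    return True

convex_ok = jensen_holds(lambda x: x ** 2)  # x^2 is convex: no violations
nonconvex_ok = jensen_holds(np.sin)         # sin is not convex on [-10, 10]
```

Passing such a check does not prove convexity, but a single violation does disprove it, which is what happens for $\sin x$.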

A convex set is a subset of a real vector space (for example, $\mathbb{R}^n$) that, given any two points in it, contains the whole line segment that joins them.

Convex Optimization Problem

The most used notation for the Convex Optimization problem is $$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \leq 0, \quad i=1,\cdots,m \\ & A x = b \end{aligned}$$

In which $x \in \mathbb{R}^n$ and $f_0 ,\cdots, f_m$ are convex. Also, we must note that the equality constraints are linear and cannot be arbitrary functions. To express the linear equations, we use matrix notation. To impose $p$ different equality constraints, we multiply a $p \times n$ matrix $A$ by the $n \times 1$ column vector variable $x$; therefore, the right-hand side $b$ of the equation is a $p \times 1$ column vector.

Convex functions have many useful properties that make reaching the solution easier; you can find some of these properties on Wikipedia. They lead to some crucial properties of convex optimization problems:

  • Every local minimum is a global minimum.
  • The optimal set is convex.
  • If the objective function is strictly convex, then the problem has at most one optimal point.

These properties lead to methods that can numerically solve convex optimization problems in polynomial time.
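The first property is what makes simple iterative methods reliable on convex problems. As a sketch, a hand-rolled gradient descent on the made-up convex function $f(x) = (x - 3)^2$ reaches the same minimum from very different starting points:

```python
# Convex objective f(x) = (x - 3)**2 with gradient f'(x) = 2 (x - 3).
grad = lambda x: 2.0 * (x - 3.0)

def gradient_descent(x0, lr=0.1, steps=200):
    """Plain gradient descent; on a convex function every start converges."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Very different starting points all reach the same global minimum x = 3.
minima = [gradient_descent(x0) for x0 in (-50.0, 0.0, 7.5, 100.0)]
```

For a non-convex function, the same procedure can get stuck in different local minima depending on the starting point.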

Because such efficient methods exist, we usually try to formulate optimization problems as convex ones. Some problems can easily be transformed into this form, but others require some tricks. Some of the more common tricks are listed here:

  • Change of variables
  • Approximation of the true objective function and constraints
  • Relaxation: ignoring some constraints that are hard to handle

Convex Optimization Applications

Many optimization methods are special cases of convex optimization or can be reduced to convex optimization with some tricks; you may be familiar with some of these concepts. This shows how powerful convex optimization is.

Moreover, Convex Optimization is used in many different scientific fields and areas.

Practical Example using CVXPY

CVXPY is a Python library initially developed at Stanford University. It is a domain-specific language embedded in Python that allows the programmer to easily define an optimization model and solve it using Convex Optimization techniques.

In this notebook, we examine the in-painting problem. In this problem, the program receives a corrupted image. Some pixel values of this corrupted image are missing, and the program should try to guess these missing values to get a clear image.

First, we install the required packages using pip. Note that installing and downloading CVXPY may take a little longer than NumPy and Matplotlib.

In [ ]:
!pip install numpy
!pip install matplotlib
!pip install cvxpy

Consider the following image of a cat. We corrupt it by keeping about 30% of its pixels and discarding the rest. Then we try to develop a simple algorithm that takes the corrupted image as input and in-paints it.

In [1]:
import matplotlib.pyplot as plt
import numpy as np

# Load the images.
u_orig = plt.imread("./images/cat512color.png")
rows, cols, colors = u_orig.shape

# known is 1 if the pixel is known,
# 0 if the pixel was corrupted.
# The known matrix is initialized randomly.
known = np.zeros((rows, cols, colors))
for i in range(rows):
    for j in range(cols):
        if np.random.random() > 0.7:
            for k in range(colors):
                known[i, j, k] = 1        
u_corr = known * u_orig

# Display the images.
%matplotlib inline
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].imshow(u_orig)  # color image: a cmap argument would be ignored
ax[0].set_title("Original Image")
ax[0].axis('off')
ax[1].imshow(u_corr);
ax[1].set_title("Corrupted Image")
ax[1].axis('off');

For the in-painting, we must find a way to guess the values of the missing pixels. Let us denote the pixel array by $P_{i,j}$. We can choose many different objective functions; here we use the $\ell_2$ total variation and try to minimize it. By minimizing the total variation, we make each missing pixel have the minimum possible distance from its neighbors.

$$ \text{total\_variation}(P)=\sum_{i=1}^{m-1} \sum_{j=1}^{n-1}\left\|\left[\begin{array}{l}P_{i+1, j}-P_{i j} \\ P_{i, j+1}-P_{i j}\end{array}\right]\right\|_{2} $$
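To connect the formula to code, here is an illustrative plain-NumPy reimplementation of the same $\ell_2$ total variation for concrete arrays (CVXPY's `cp.tv` provides it for expression variables):

```python
import numpy as np

def total_variation(P):
    """l2 total variation of a 2-D array P, term by term as in the formula."""
    dx = P[1:, :-1] - P[:-1, :-1]  # P[i+1, j] - P[i, j]
    dy = P[:-1, 1:] - P[:-1, :-1]  # P[i, j+1] - P[i, j]
    return np.sqrt(dx ** 2 + dy ** 2).sum()

tv_flat = total_variation(np.zeros((5, 5)))        # constant image: tv = 0
tv_step = total_variation(np.array([[0.0, 1.0],
                                    [1.0, 1.0]]))  # single term: sqrt(2)
```

A constant image has zero total variation, and sharp jumps between neighbors increase it, which is why minimizing it fills missing pixels smoothly.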

In [2]:
import cvxpy as cp


variables = []
constraints = []
for i in range(colors): # colors == 3
    U = cp.Variable(shape=(rows, cols))
    variables.append(U)
    constraints.append(cp.multiply(known[:, :, i], U) == cp.multiply(known[:, :, i], u_corr[:, :, i]))

objective_function = cp.Minimize(cp.tv(*variables)) # tv is total variation
problem = cp.Problem(objective_function, constraints)
problem.solve(verbose=True, solver=cp.SCS)
print("optimal objective value: {}".format(problem.value))
===============================================================================
                                     CVXPY                                     
                                    v1.1.15                                    
===============================================================================
(CVXPY) Oct 11 10:09:38 PM: Your problem has 786432 variables, 3 constraints, and 0 parameters.
(CVXPY) Oct 11 10:09:38 PM: It is compliant with the following grammars: DCP, DQCP
(CVXPY) Oct 11 10:09:38 PM: (If you need to solve this problem multiple times, but with different data, consider using parameters.)
(CVXPY) Oct 11 10:09:38 PM: CVXPY will first compile your problem; then, it will invoke a numerical solver to obtain a solution.
-------------------------------------------------------------------------------
                                  Compilation                                  
-------------------------------------------------------------------------------
(CVXPY) Oct 11 10:09:38 PM: Compiling problem (target solver=SCS).
(CVXPY) Oct 11 10:09:38 PM: Reduction chain: Dcp2Cone -> CvxAttr2Constr -> ConeMatrixStuffing -> SCS
(CVXPY) Oct 11 10:09:38 PM: Applying reduction Dcp2Cone
(CVXPY) Oct 11 10:09:38 PM: Applying reduction CvxAttr2Constr
(CVXPY) Oct 11 10:09:38 PM: Applying reduction ConeMatrixStuffing
(CVXPY) Oct 11 10:09:39 PM: Applying reduction SCS
(CVXPY) Oct 11 10:09:40 PM: Finished problem compilation (took 2.565e+00 seconds).
-------------------------------------------------------------------------------
                                Numerical solver                               
-------------------------------------------------------------------------------
(CVXPY) Oct 11 10:09:40 PM: Invoking solver SCS  to obtain a solution.
WARN: A->p (column pointers) not strictly increasing, column 523264 empty
WARN: A->p (column pointers) not strictly increasing, column 785408 empty
WARN: A->p (column pointers) not strictly increasing, column 1047552 empty
----------------------------------------------------------------------------
	SCS v2.1.4 - Splitting Conic Solver
	(c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 3629464
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 0, rho_x = 1.00e-03
Variables n = 1047553, constraints m = 2614279
Cones:	primal zero / dual free vars: 786432
	soc vars: 1827847, soc blks: 261121
WARN: aa_init returned NULL, no acceleration applied.
Setup time: 5.95e+00s
----------------------------------------------------------------------------
 Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
     0| 1.46e+22  1.91e+22  1.00e+00 -8.40e+26  4.96e+25  6.86e+25  3.84e-01 
   100| 8.54e-04  6.45e-04  2.34e-04  1.15e+04  1.15e+04  9.72e-13  1.94e+01 
   200| 2.22e-04  7.24e-05  4.20e-05  1.15e+04  1.15e+04  1.05e-11  3.96e+01 
   300| 9.41e-05  2.22e-05  1.38e-05  1.15e+04  1.15e+04  9.66e-13  5.87e+01 
----------------------------------------------------------------------------
Status: Solved
Timing: Solve time: 5.87e+01s
	Lin-sys: nnz in L factor: 35250282, avg solve time: 1.21e-01s
	Cones: avg projection time: 3.96e-03s
	Acceleration: avg step time: 2.69e-08s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 1.1102e-16, dist(y, K*) = 4.4409e-16, s'y/|s||y| = 6.5124e-18
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 9.4109e-05
dual res:   |A'y + c|_2 / (1 + |c|_2) = 2.2237e-05
rel gap:    |c'x + b'y| / (1 + |c'x| + |b'y|) = 1.3835e-05
----------------------------------------------------------------------------
c'x = 11483.1306, -b'y = 11483.4483
============================================================================
-------------------------------------------------------------------------------
                                    Summary                                    
-------------------------------------------------------------------------------
(CVXPY) Oct 11 10:10:45 PM: Problem status: optimal
(CVXPY) Oct 11 10:10:45 PM: Optimal value: 1.148e+04
(CVXPY) Oct 11 10:10:45 PM: Compilation took 2.565e+00 seconds
(CVXPY) Oct 11 10:10:45 PM: Solver (including time spent in interface) took 6.482e+01 seconds
optimal objective value: 11483.560370852501

Now let's see the final result and compare it to the original image.

In [3]:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
%matplotlib inline

rec_arr = np.zeros((rows, cols, colors))
for i in range(colors):
    rec_arr[:, :, i] = variables[i].value
rec_arr = np.clip(rec_arr, 0, 1)

fig, ax = plt.subplots(1, 2,figsize=(10, 5))
ax[0].imshow(rec_arr)
ax[0].set_title("In-Painted Image")
ax[0].axis('off')

img_diff = np.clip(10 * np.abs(u_orig - rec_arr), 0, 1)
ax[1].imshow(img_diff)
ax[1].set_title("Difference Image")
ax[1].axis('off')
Out[3]:
(-0.5, 511.5, 511.5, -0.5)

The in-painted image looks almost identical to the original one. Although it differs from the original in many small ways and minor artifacts are visible (for example, in the cat's whiskers), the result is quite acceptable for roughly ten lines of code.


Authors
Amirmahdi Namjoo