author: niplav, created: 2022-10-19, modified: 2022-11-25, language: english, status: in progress, importance: 2, confidence: likely
Solutions to the textbook “Introduction to Optimization”.
See here.
Some standard numpy code:
import math
import numpy as np

def givec(x, c=10):
    # diagonal matrix with entries c**(i/(n-1)) for i=0..n-1 (condition number c)
    n=len(x)
    iexp=np.array(range(0,n))
    cdiag=c**(iexp/((n-1)*np.ones(n)))
    C=np.diag(cdiag)
    return C
def fsq(x, c=10):
    C=givec(x, c=c)
    return x.T@C@x

def fhole(x, c=10, a=0.1):
    retval=fsq(x, c=c)
    return retval/(a**2+retval)
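For illustration (this snippet is mine, not part of the exercise), evaluating both functions at a sample point:

x=np.array([1.0, 1.0])
print(fsq(x))    # 1*1 + 10*1 = 11.0
print(fhole(x))  # 11/(0.1**2 + 11) ≈ 0.99909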
The derivative of $f_{\text{sq}}=x^{\top}Cx$
is $(C+C^{\top})x$
(as per identity 81 from the Matrix
Cookbook).
Which is excellent, since identity 2.3 of Maths for Intelligent
Systems (Toussaint 2022) gives
$\frac{\partial}{\partial x} x^{\top}Cx=x^{\top}C \frac{\partial}{\partial x}x+x^{\top} C^{\top} \frac{\partial}{\partial x} x$,
which evaluates to $x^{\top}C+x^{\top}C^{\top}$, or, transposed into a
column vector, $C^{\top}x+Cx=(C^{\top}+C)x$, and is therefore the same
as above (and as identity 97 from the Matrix Cookbook).
So then
def fsqgrad(x, c=10):
    C=givec(x, c=c)
    return (C.T+C)@x
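As a sanity check (my addition, not part of the exercise), the analytic gradient can be compared against central finite differences; numgrad below is a hypothetical helper, not from the book:

def numgrad(f, x, eps=1e-6):
    # central finite-difference approximation of the gradient
    g=np.zeros(len(x))
    for i in range(len(x)):
        e=np.zeros(len(x))
        e[i]=eps
        g[i]=(f(x+e)-f(x-e))/(2*eps)
    return g

x=np.array([1.0, 1.0])
print(fsqgrad(x))        # [ 2. 20.]
print(numgrad(fsq, x))   # should agree up to ~1e-6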
Since $f_{\text{sq}}$ and $f_{\text{hole}}$ return a scalar, one can
use a simple quotient rule:
$\nabla f_{\text{hole}}(x)=\frac{\nabla f_{\text{sq}}(x) \cdot (a^2+f_{\text{sq}}(x))-f_{\text{sq}}(x) \cdot \nabla f_{\text{sq}}(x)}{(a^2+f_{\text{sq}}(x))^2}$
The implementation is then straightforward:
def fholegrad(x, c=10, a=0.1):
    retval_fsq=fsq(x, c=c)
    retval_fsqgrad=fsqgrad(x, c=c)
    return (retval_fsqgrad*(a**2+retval_fsq)-retval_fsq*retval_fsqgrad)/(a**2+retval_fsq)**2
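(The numerator cancels down to $a^2 \cdot \nabla f_{\text{sq}}(x)$, but the code keeps the quotient-rule form.) The same finite-difference comparison as above, reusing the hypothetical numgrad helper:

x=np.array([1.0, 1.0])
print(fholegrad(x))       # ≈ [1.65e-04 1.65e-03]
print(numgrad(fhole, x))  # should roughly agree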
The function can be pretty directly translated from the pseudocode in the book to Python (no surprise here):
def graddesc(f, fgrad, alpha=0.001, theta=10e-20, x0=[1,1], c=10, a=0.1):
    x=np.array(x0)
    i=0
    prevcost=-math.inf
    curcost=f(x)
    while abs(prevcost-curcost)>theta:
        print("#iteration: ", i)
        print("current cost: ", curcost)
        prevcost=curcost
        x=x-alpha*fgrad(x)
        curcost=f(x)
        i=i+1
    print("solution: ", x)
    print("iterations: ", i)
    print("cost: ", f(x))
    return x
Executing graddesc with fsq/fsqgrad and with fhole/fholegrad takes ~10k
iterations to find a local minimum of fsq, and ~185k iterations for
fhole. But it finds the minimum:
print("fsq:")
graddesc(fsq, fsqgrad)
print("fhole:")
graddesc(fhole, fholegrad)
fsq:
solution: [4.98354923e-09 1.65118900e-84]
iterations: 9549
cost: 2.4835762963261225e-17
fhole:
solution: [ 4.13240000e-11 -7.98149419e-19]
iterations: 184068
cost: 1.7076729729684466e-19
def backtrack_graddesc(f, fgrad, alpha=0.05, theta=10e-15, x0=[1,1], c=10, a=0.1, rho_ls=0.01, rho_plus=1.2, rho_minus=0.5, delta_max=math.inf):
    x=np.array(x0)
    i=0
    prevcost=-math.inf
    curcost=f(x)
    delta=-fgrad(x)/np.linalg.norm(fgrad(x))
    while np.linalg.norm(alpha*delta)>=theta:
        # normalized steepest-descent direction, recomputed each iteration
        delta=-fgrad(x)/np.linalg.norm(fgrad(x))
        # backtracking line search (sufficient-decrease condition with rho_ls)
        while f(x+alpha*delta)>f(x)+rho_ls*fgrad(x)@(alpha*delta).T:
            alpha=rho_minus*alpha
        prevcost=curcost
        x=x+alpha*delta
        alpha=min(rho_plus*alpha, delta_max)
        curcost=f(x)
        i=i+1
    return x
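The original doesn't show outputs for this one; a usage sketch (my call, mirroring the graddesc run above) would be:

print("fsq:")
print(backtrack_graddesc(fsq, fsqgrad))
print("fhole:")
print(backtrack_graddesc(fhole, fholegrad))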
First we compute the derivative of the objective in $\underset{β}{\text{argmin}}||y-Xβ||^2+λ||β||^2$:
(since $b=\mathbf{0}$ and $c=0$). So the negative gradient is
$x_0=\left [ \matrix{1 \cr 1} \right ]$, so
Then
The derivative of that is $-5+18 \cdot α$. Setting it to zero gives $α=\frac{5}{18}$.
The next $x$ then is (going by the values we computed) $\left [ \matrix{0.\overline{4} \cr 0.7\overline{2}} \right ]$.
The new gradient then is
Now computing $β$:
which is smaller than zero, so our new $β$ is $0$.
Then $δ$ is $\left [ \matrix{-0.\overline{4} & -0.36\overline{1}} \right ]$.
We now go through the whole exercise a second time:
Then the derivative is approximately $-0.6559+0.5255 \cdot α$, and the
new $α$ is the solution of setting that to zero, namely $\approx 1.248$.
Then the new $x$ is $\approx \left [ \matrix{-0.1103 & 0.2715} \right ]$.
The new $g$ is $\approx \left [ \matrix{0.11029 & -0.1358} \right ]$,
and $β\approx 0.0933$, which this time is greater than zero, so we can
keep it. The new $δ$ therefore is $\approx \left [ \matrix{0.0688 & -0.1694} \right ]$.
…And we're done with the second round. I'm not going to do this for
$x_0=(-1,2)$, in case you were wondering: I hope I've shown that I can
do it, and it's tedious to type out.
Intuitively, this makes sense: We walked all the way to the minimum in the direction of the current gradient, so the next direction should not go "back" in the direction where we came from, or further than we needed to go.
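To cross-check hand computations like the above, here is a minimal sketch of conjugate gradient descent with a Polak–Ribière-style $β$ clipped at zero (as in the computation above) and a crude grid line search. Since I'm not restating the exercise's objective here, it just runs on fsq; conjgrad and its parameters are my naming, not the book's:

def conjgrad(f, fgrad, x0=[1,1], iters=10):
    x=np.array(x0, dtype=float)
    g=-fgrad(x)         # negative gradient
    delta=g.copy()      # initial search direction: steepest descent
    for i in range(iters):
        if g@g < 1e-12:  # (near-)stationary point reached
            break
        # crude line search: minimize f(x+alpha*delta) over a coarse grid
        alphas=np.linspace(0, 2, 2001)
        alpha=alphas[np.argmin([f(x+a*delta) for a in alphas])]
        x=x+alpha*delta
        gnew=-fgrad(x)
        beta=max(0, gnew@(gnew-g)/(g@g))  # Polak-Ribiere, reset if negative
        delta=gnew+beta*delta
        g=gnew
    return x

print(conjgrad(fsq, fsqgrad))  # should end up close to [0, 0]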
Not gonna show you the sketch, of course. (I also don't have time right now to learn manim for that, though it would be cool).
The solution is $\frac{-π}{2}$.
The solution is $\left [ \matrix{-\frac{\sqrt{2}}{2} \cr -\frac{\sqrt{2}}{2}} \right ]$.
Since the gradient is "pulling" to the bottom left, the best one can do
is to set $x_1$ to zero. So $x=\left [ \matrix{-\frac{\sqrt{2}}{2} \cr -\frac{\sqrt{2}}{2}} \right ]$.
The second constraint can be pictured the following way: on the $x_2$
axis there is a parabola, forming a parabolic cylinder, which is zero
where $x_2$ is zero and grows symmetrically in both directions. At
$x_2≥1$ and $x_2≤-1$ this cylinder is greater than $x_1$, which we want
to avoid. So we know that $x_2=-1$.
But what is $x_1$? It should be as small as possible. Plugging in
$x_1=0$ tells us that this is the best we can do.