author: niplav, created: 2022-10-19, modified: 2022-12-20, language: english, status: in progress, importance: 2, confidence: likely
Solutions to the textbook “Maths for Intelligent Systems”.
Let me start with an example: We have three real-valued quantities $x$, $g$ and $f$ which depend on each other. Specifically, $f(x,g)=3x+2g$ and $g(x)=2x$.

Question: What is the “derivative of $f$ w.r.t. $x$”?
Intuitively, I'd say that $\frac{\partial}{\partial x}f(x,g)=3$. But then I notice that $g$ is allegedly a “real-valued quantity”: what is that supposed to mean? Is it not a function?
Alas, plugging $g$ into $f$ gives $f(x)=3x+2(2x)=7x$ and $\frac{d}{dx}f(x)=3+4=7$.
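Both answers make sense: the first is the partial derivative, the second the total derivative, and the chain rule connects them:

$$\frac{df}{dx}=\frac{\partial f}{\partial x}+\frac{\partial f}{\partial g} \cdot \frac{dg}{dx}=3+2 \cdot 2=7$$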
I… I don't know what the skew matrix is :-/, and Wikipedia isn't very helpful (I don't think it's the skew-Hermitian matrix or the skew-symmetric matrix or the skew-Hamiltonian matrix).
Writing code: This I can do.
using Random, LinearAlgebra

function gradient_check(x, f, df)
    n = length(x)
    d = length(f(x))
    ε = 1e-6                # 10^-6 would throw a DomainError for an Int base
    J = zeros(d, n)
    for i in 1:n
        unit = zeros(n)     # i-th standard basis vector
        unit[i] = 1
        # central difference approximates the i-th column of the Jacobian
        J[:,i] = (f(x + ε*unit) - f(x - ε*unit)) / (2*ε)
    end
    return norm(J - df(x), Inf) < 1e-4
end
julia> A=rand(Float64, (10, 15))
julia> f(x)=A*x
julia> df(x)=A
julia> x=randn(15)
15-element Vector{Float64}:
1.536516645971545
1.0136394994998532
-0.09863977762813898
1.3510191388362935
0.84503226122143
0.09296670831415606
-1.5390337565597376
1.4679194319980104
-0.7085023577127753
-0.10676335224166593
-0.8686753109089055
1.2912744597257453
0.7364123079861109
0.5736005534388826
0.5332386427039576
julia> gradient_check(x, f, df)
true
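This works because a linear map is its own Jacobian: $f(x)=Ax$ means $f_i(x)=\sum_j A_{ij}x_j$, so $\frac{\partial f_i}{\partial x_j}=A_{ij}$ and hence $df(x)=A$.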
And now the cooler $f$:
julia> f(x)=transpose(x)*x
f (generic function with 1 method)
julia> df(x)=2*transpose(x)
df (generic function with 1 method)
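Here $f(x)=x^Tx=\sum_i x_i^2$, so $\frac{\partial f}{\partial x_i}=2x_i$, and since the Jacobian of a scalar-valued function is a row vector, $df(x)=2x^T$. Running gradient_check(x, f, df) on this pair should again return true.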
The derivative of $σ(W_0 \times x_0)$ w.r.t. $x_0$, using the chain rule and the derivative of $\frac{dσ}{dx}=σ'$, is $σ'(W_0 \times x_0) \times W_0$ (here $σ'$ is applied elementwise, and the resulting vector is read as a diagonal matrix).

Applying this again for $W_1 \times σ(W_0 \times x_0)$, we get $W_1 \times σ'(W_0 \times x_0) \times W_0$.

Again: $\frac{d}{d x_0} σ(W_1 \times σ(W_0 \times x_0))=σ'(W_1 \times σ(W_0 \times x_0)) \times W_1 \times σ'(W_0 \times x_0) \times W_0$.

And finally:

$\frac{d}{d x_0} W_2 \times σ(W_1 \times σ(W_0 \times x_0))=W_2 \times σ'(W_1 \times σ(W_0 \times x_0)) \times W_1 \times σ'(W_0 \times x_0) \times W_0$.
Then the general formula for computing $\frac{d f}{d x_0}$ is $W_{m-1} \times \prod_{l=0}^{m-2} σ'(z_{l+1}) \times W_l$, where $m$ is the number of matrices, $z_1=W_0 \times x_0$ and $z_{l+1}=W_l \times σ(z_l)$ are the pre-activations, and $\prod$ is left matrix multiplication (each new factor multiplies from the left, so higher $l$ stands further to the left).
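As a sanity check of that formula, here's a minimal sketch, assuming $σ=\tanh$ as a concrete activation and made-up layer sizes; the weight scale, the shapes, and the helper σ′ are my own choices, and gradient_check is reused from above:

using LinearAlgebra    # for Diagonal

σ(x) = tanh(x)
σ′(x) = 1 - tanh(x)^2          # derivative of tanh

# small random weights keep σ away from saturation; sizes are arbitrary
W0, W1, W2 = 0.1*randn(20, 15), 0.1*randn(20, 20), 0.1*randn(10, 20)

f(x) = W2 * σ.(W1 * σ.(W0 * x))

function df(x)
    z1 = W0 * x                # pre-activation of the first layer
    z2 = W1 * σ.(z1)           # pre-activation of the second layer
    # W_2 × diag(σ'(z_2)) × W_1 × diag(σ'(z_1)) × W_0
    W2 * Diagonal(σ′.(z2)) * W1 * Diagonal(σ′.(z1)) * W0
end

x0 = randn(15)
gradient_check(x0, f, df)      # should return true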