author: niplav, created: 2022-10-19, modified: 2022-12-20, language: english, status: in progress, importance: 2, confidence: likely
Solutions to exercises from the textbook “Maths for Intelligent Systems” by Marc Toussaint.
Let me start with an example: We have three real-valued quantities $x, g$ and $f$ which depend on each other. Specifically, $f(x,g)=3x+2g$ and $g(x)=2x$.
Question: What is the “derivative of $f$ w.r.t. $x$”?
Intuitively, I'd say that $\frac{\partial f}{\partial x}=3$. But then I notice that $f$ is allegedly a "real-valued quantity": what is that supposed to mean? Is it not a function?
Alas, plugging $g$ into $f$ gives $f(x)=3x+2 \cdot 2x=7x$ and $\frac{\partial f}{\partial x}=7$.
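The tension disappears once the two readings are distinguished: the partial derivative holds $g$ fixed, the total derivative does not:

$$\frac{\partial f}{\partial x}=3, \qquad \frac{\mathrm{d} f}{\mathrm{d} x}=\frac{\partial f}{\partial x}+\frac{\partial f}{\partial g} \frac{\mathrm{d} g}{\mathrm{d} x}=3+2 \cdot 2=7.$$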
I… I don't know what the skew matrix is :-/, and Wikipedia isn't very helpful (I don't think it's the skew-Hermitian matrix or the skew-symmetric matrix or the skew-Hamiltonian matrix).
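My best guess (an assumption on my part, not something the text settles): in robotics material, the skew matrix of a vector $a \in \mathbb{R}^3$ usually means the cross-product matrix, i.e. the particular skew-symmetric matrix with $\mathrm{skew}(a)\,b = a \times b$. A minimal sketch of that reading:

using LinearAlgebra

# Cross-product matrix of a 3-vector: skew(a) * b == a × b for every b.
# (Assumption: this is what "skew matrix" refers to here.)
skew(a) = [ 0     -a[3]   a[2];
            a[3]   0     -a[1];
           -a[2]   a[1]   0   ]

a, b = randn(3), randn(3)
skew(a) * b ≈ cross(a, b)   # true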
Writing code: This I can do.
using Random, LinearAlgebra
function gradient_check(x, f, df)
    n = length(x)
    d = length(f(x))
    ε = 1e-6            # finite-difference step size
    J = zeros(d, n)     # numerical Jacobian, filled column by column
    for i in 1:n
        unit = zeros(n) # i-th unit vector
        unit[i] = 1
        # central difference approximation of the i-th column of the Jacobian
        J[:,i] .= (f(x + ε*unit) - f(x - ε*unit)) / (2*ε)
    end
    # compare against the analytic Jacobian in the maximum norm
    return norm(J - df(x), Inf) < 1e-4
end
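Column by column, the loop computes the central-difference approximation

$$J_{:,i} \approx \frac{f(x+\varepsilon e_i)-f(x-\varepsilon e_i)}{2 \varepsilon},$$

where $e_i$ is the $i$-th unit vector and $\varepsilon=10^{-6}$.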
julia> A=rand(Float64, (10, 15))
julia> f(x)=A*x
julia> df(x)=A
julia> x=randn(15)
15-element Vector{Float64}:
1.536516645971545
1.0136394994998532
-0.09863977762813898
1.3510191388362935
0.84503226122143
0.09296670831415606
-1.5390337565597376
1.4679194319980104
-0.7085023577127753
-0.10676335224166593
-0.8686753109089055
1.2912744597257453
0.7364123079861109
0.5736005534388826
0.5332386427039576
julia> gradient_check(x, f, df)
true
And now the cooler $f(x)=x^\top x$:
julia> f(x)=transpose(x)*x
f (generic function with 1 method)
julia> df(x)=2*transpose(x)
df (generic function with 1 method)
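The analytic derivative used here follows from writing out the sum:

$$\frac{\partial}{\partial x_j}\, x^\top x=\frac{\partial}{\partial x_j} \sum_i x_i^2=2 x_j \qquad \Rightarrow \qquad \frac{\partial}{\partial x}\, x^\top x=2 x^\top.$$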
The derivative of the innermost function, using the chain rule and the derivative of its argument, gives the first Jacobian in the product.
Applying this again for the next function in the composition, we get its Jacobian multiplied onto that from the left.
Again: one more Jacobian, multiplied from the left.
And finally: the Jacobian of the outermost function, also from the left.
Then the formula for computing $\frac{\partial f}{\partial x}$ is $J_n \cdot J_{n-1} \cdots J_2 \cdot J_1$, where $n$ is the number of matrices, and $\cdot$ is left matrix multiplication (each new Jacobian is multiplied onto the running product from the left).
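A sketch of that accumulation (the concrete composition, the names h1, h2, h3 and the matrices B and C are made up for illustration; the exercise's actual functions may differ):

# Hypothetical composition f(x) = h3(h2(h1(x))), for illustration only:
# two matrices and an elementwise tanh in the middle.
using LinearAlgebra

B = randn(4, 6)
C = randn(2, 4)

h1(x) = B * x        # Jacobian: B
h2(z) = tanh.(z)     # Jacobian: Diagonal(1 .- tanh.(z).^2)
h3(z) = C * z        # Jacobian: C

f(x) = h3(h2(h1(x)))

function dfdx(x)
    z1 = h1(x)
    J = B                                   # J1
    J = Diagonal(1 .- tanh.(z1).^2) * J     # J2 * J1
    J = C * J                               # J3 * (J2 * J1)
    return J
end

x = randn(6)
gradient_check(x, f, dfdx)   # should return true with the checker above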