Score Matching overview
This page introduces the Julia package ScoreMatching.
This page comes from a single Julia file: 01-overview.jl.
You can access the source code for such Julia documentation using the 'Edit on GitHub' link in the top right. You can view the corresponding notebook in nbviewer here: 01-overview.ipynb, or open it in binder here: 01-overview.ipynb.
Setup
Packages needed here.
using MIRTjim: jim, prompt
using Distributions: Distribution, Normal, MixtureModel, logpdf, pdf
using Distributions: Gamma, Uniform
import Distributions: logpdf, pdf
import ForwardDiff
using LinearAlgebra: tr, norm
using LaTeXStrings
using Random: shuffle, seed!; seed!(0)
using StatsBase: mean, std
using Optim: optimize, BFGS, Fminbox
import Optim: minimizer
import Flux
using Flux: Chain, Dense, Adam
import Plots
using Plots: Plot, plot, plot!, scatter, histogram, quiver!, default, gui
using Plots.PlotMeasures: px
using InteractiveUtils: versioninfo
default(label="", markerstrokecolor=:auto)
The following line is helpful when running this file as a script; this way it will prompt the user to hit a key after each figure is displayed.
isinteractive() ? jim(:prompt, true) : prompt(:prompt);
Overview
Given $T$ IID training data samples $\bm{x}_1, …, \bm{x}_T ∈ \mathbb{R}^N$, we often want to find the parameters $\bm{θ}$ of a model distribution $p(\bm{x}; \bm{θ})$ that "best fit" the data.
Maximum-likelihood estimation is impractical for complicated models where the normalizing constant is intractable.
Hyvärinen 2005 proposed an alternative called score matching that circumvents needing to find the normalizing constant by using the score function of the model distribution, defined as $\bm{s}(\bm{x}; \bm{θ}) = \nabla_{\bm{x}} \log p(\bm{x}; \bm{θ}).$
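To see concretely why the score function sidesteps the normalizing constant, here is a small added check (not part of the original example; the names are illustrative): differentiating an unnormalized Gaussian log-density gives the same score as differentiating the normalized logpdf, because $\nabla_{\bm{x}} \log Z(\bm{θ}) = 0$.
import ForwardDiff
using Distributions: Normal, logpdf
μdemo, σdemo = 2.0, 1.5
logp_unnorm(x) = -(x - μdemo)^2 / (2 * σdemo^2) # log p up to an additive constant (no log Z term)
score_unnorm(x) = ForwardDiff.derivative(logp_unnorm, x)
score_exact(x) = ForwardDiff.derivative(z -> logpdf(Normal(μdemo, σdemo), z), x)
score_unnorm(3.0) ≈ score_exact(3.0) # true: both equal -(x - μ)/σ²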
Score functions
Before describing score matching methods, we first illustrate what a score function looks like.
Consider the (improper) model $p(\bm{x}; \bm{θ}) = \frac{1}{Z(\bm{θ})} \mathrm{e}^{-β |x_2 - x_1|^α}$ where here there are two parameters $\bm{θ} = (β, α)$, for $β > 0$ and $α > 1$. The score function for this model is
\[\bm{s}(\bm{x}; \bm{θ}) = \nabla_{\bm{x}} \log p(\bm{x}; \bm{θ}) = - β \nabla_{\bm{x}} |x_2 - x_1|^α = α β \begin{bmatrix} 1 \\ -1 \end{bmatrix} |x_2 - x_1|^{α-1} \mathrm{sign}(x_2 - x_1).\]
This example is related to generalized Gaussian image priors, and, for $α=1$, is related to total variation (TV) regularization.
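As a quick sanity check on this formula (an added sketch with illustrative parameter values, not part of the original example), one can compare the analytic score to an automatic-differentiation gradient of the unnormalized log-density:
import ForwardDiff
αtv, βtv = 1.5, 1.0 # hypothetical values with α > 1
logp_tv(x) = -βtv * abs(x[2] - x[1])^αtv # log p(x;θ) up to -log Z(θ)
score_tv(x) = αtv * βtv * [1, -1] * abs(x[2] - x[1])^(αtv - 1) * sign(x[2] - x[1])
xtv = [0.3, -0.7]
ForwardDiff.gradient(logp_tv, xtv) ≈ score_tv(xtv) # true (away from the x₁ = x₂ ridge)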
Here is a visualization of this 'TV' pdf and the score functions.
The quiver plot shows that the score function describes directions of ascent of the prior.
function do_quiver!(p::Plot, x, y, dx, dy; thresh=0.02, scale=0.15)
    tmp = d -> maximum(abs, filter(!isnan, d))
    dmax = max(tmp(dx), tmp(dy))
    ix = 5:11:length(x)
    iy = 5:11:length(y)
    x, y = x .+ 0*y', 0*x .+ y' # form coordinate grids
    x = x[ix,iy]
    y = y[ix,iy]
    dx = dx[ix,iy] / dmax * scale
    dy = dy[ix,iy] / dmax * scale
    good = @. (abs(dx) > thresh) | (abs(dy) > thresh) # keep only non-negligible arrows
    x = x[good]
    y = y[good]
    dx = dx[good]
    dy = dy[good]
    return quiver!(p, x, y, quiver=(dx,dy);
        arrow = Plots.arrow(:open, :head, 0.001, 0.001),
        aspect_ratio = 1,
        title = "TV score quiver",
        color = :red,
    )
end;
if !@isdefined(ptv)
α = 1.01 # fairly close to TV
β = 1
x1 = range(-1, 1, 101) * 2
x2 = range(-1, 1, 101) * 2
tv_pdf2 = @. exp(-β * abs(x2' - x1)^α) # ignoring partition constant
ptv0 = jim(x1, x2, tv_pdf2; title = "'TV' pdf", clim = (0, 1),
color=:cividis, xlabel = L"x_1", ylabel = L"x_2", prompt=false,
)
tv_score1 = @. β * abs(x2' - x1)^(α-1) * sign(x2' - x1) # (the α ≈ 1.01 factor is omitted here)
ptv1 = jim(x1, x2, tv_score1; title = "TV score₁", prompt=false,
color=:cividis, xlabel = L"x_1", ylabel = L"x_2", clim = (-1,1) .* 1.2,
)
tv_score2 = @. -β * abs(x2' - x1)^(α-1) * sign(x2' - x1)
ptv2 = jim(x1, x2, tv_score2; title = "TV score₂", prompt=false,
color=:cividis, xlabel = L"x_1", ylabel = L"x_2", clim = (-1,1) .* 1.2,
)
ptvq = do_quiver!(deepcopy(ptv0), x1, x2, tv_score1, tv_score2)
ptv = plot(ptv0, ptv1, ptvq, ptv2)
# Plots.savefig("score-tv.pdf")
end
prompt()
Score matching
The idea behind the score matching approach to model fitting is
\[\hat{\bm{θ}} = \arg \min_{\bm{θ}} J(\bm{θ}) ,\qquad J(\bm{θ}) = \frac{1}{T} ∑_{t=1}^T \frac{1}{2} \| \bm{s}(\bm{x}_t; \bm{θ}) - \bm{s}(\bm{x}_t) \|^2\]
where $\bm{s}(\bm{x}; \bm{θ}) = \nabla_{\bm{x}} \log p(\bm{x}; \bm{θ})$ is the score function of the model distribution, and $\bm{s}(\bm{x}) = \nabla_{\bm{x}} \log p(\bm{x})$ is the score function of the (typically unknown) data distribution.
Vincent, 2011 calls this approach explicit score matching (ESM).
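For intuition, here is an added toy example (not from the original): if the model is $\mathcal{N}(m, 1)$ and the data distribution is $\mathcal{N}(0, 1)$, then the two score functions are $-(x - m)$ and $-x$, so their difference is the constant $m$ and the ESM cost is exactly $m^2/2$ for any set of sample points.
using Random: MersenneTwister
xs = randn(MersenneTwister(0), 1000) # stand-in "training samples" (separate RNG, so the global stream is untouched)
J_esm_toy(m) = sum(abs2, -(xs .- m) - (-xs)) / (2 * length(xs)) # empirical ESM cost for the N(m,1) model
J_esm_toy(0.5) # 0.125 = 0.5²/2
Minimizing over $m$ recovers $m = 0$, the mean of the data distribution, without ever evaluating a normalizing constant.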
1D example
We begin by illustrating the score function for a simple $\mathcal{N}(8, 3)$ distribution.
Some convenience methods
logpdf(d::Distribution) = x -> logpdf(d, x)
pdf(d::Distribution) = x -> pdf(d, x)
derivative(f::Function) = x -> ForwardDiff.derivative(f, x)
gradient(f::Function) = x -> ForwardDiff.gradient(f, x)
# hessian(f::Function) = x -> ForwardDiff.hessian(f, x)
score(d::Distribution) = derivative(logpdf(d))
score_deriv(d::Distribution) = derivative(score(d)); # scalar x only
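A quick added check of these helpers (not in the original): the autodiff score of a Gaussian matches the analytic score $-(x-μ)/σ^2$, and its derivative matches $-1/σ^2$.
score(Normal(8, 3))(10) ≈ -(10 - 8) / 3^2 # true
score_deriv(Normal(8, 3))(10) ≈ -1 / 3^2 # true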
gauss_μ = 8
gauss_σ = 3
gauss_disn = Expr(:call, :Normal, gauss_μ, gauss_σ) # Expr form so plot labels can show "Normal(8, 3)" via interpolation
gauss_dist = eval(gauss_disn)
xaxis = (L"x", (-1,1).*5gauss_σ .+ gauss_μ, (-3:3)*gauss_σ .+ gauss_μ)
left_margin = 20px; bottom_margin = 10px
pgp = plot(pdf(gauss_dist); label="$gauss_disn pdf", color = :blue,
left_margin, bottom_margin,
xaxis, yaxis = (L"p(x)", (0, 0.15), (0:3)*0.05), size=(600,200),
)
# Plots.savefig(pgp, "gauss-pdf.pdf")
prompt()
ylabel_score1 = L"s(x) = \frac{\mathrm{d}}{\mathrm{d}x} \, \log \ p(x)"
pgs = plot(derivative(logpdf(gauss_dist)); color=:red, xaxis, size=(600,200),
label = "$gauss_disn score function",
yaxis = (ylabel_score1, (-2,2), -3:3), left_margin, bottom_margin,
)
# Plots.savefig(pgs, "gauss-score.pdf")
prompt()
Same plots for a Gaussian mixture model (GMM)
# mix = MixtureModel(Normal, [(2,1), (8,3), (16,2)], [0.3, 0.4, 0.3]) # (alternative 3-component mixture)
mix = MixtureModel(Normal, [(3,1), (13,3)], [0.4, 0.6])
xaxis = (L"x", (-4,24), [0, 3, 13, 20])
pmp = plot(pdf(mix); label="Gaussian mixture pdf", color = :blue,
left_margin, bottom_margin, xaxis, size=(600,200),
yaxis = (L"p(x)", (0, 0.17), (0:3)*0.05),
)
# Plots.savefig(pmp, "mix-pdf.pdf")
prompt()
pms = plot(derivative(logpdf(mix)); color=:red, xaxis, size=(600,200),
label = "GMM score function",
yaxis = (ylabel_score1, (-5,5), -4:2:4), left_margin, bottom_margin,
)
# Plots.savefig(pms, "mix-score.pdf")
prompt()
Illustration
For didactic purposes, we illustrate explicit score matching (ESM) by fitting a Gaussian mixture model to samples from a Gamma distribution.
Generate training data
if !@isdefined(data)
T = 100
gamma_k = 8 # shape
gamma_θ = 1 # scale
gamma_mode = gamma_k > 1 ? (gamma_k - 1) * gamma_θ : 0
gamma_mean = gamma_k * gamma_θ
gamma_std = sqrt(gamma_k) * gamma_θ
data_disn = Expr(:call, :Gamma, gamma_k, gamma_θ)
data_dis = eval(data_disn)
data_score = derivative(logpdf(data_dis))
data = Float32.(rand(data_dis, T))
xlims = (-1, 25)
xticks = [0, floor(Int, minimum(data)), gamma_mode, gamma_mean, ceil(Int, maximum(data))]
xticks = sort(xticks) # ticks that span the data range
pfd = scatter(data, zeros(T); xlims, xticks, color=:black)
plot!(pfd, pdf(data_dis); label="$data_disn pdf",
color = :black, xlabel = L"x", ylabel = L"p(x)")
psd = plot(data_score; color=:black,
label = "$(data_disn.args[1]) score function",
xaxis=(L"x", (1,20), xticks), ylims = (-3, 5), yticks=[0,4],
)
psdn = deepcopy(psd)
tmp = score(Normal(mean(data), std(data)))
plot!(psdn, tmp; label = "Normal score function", line=:dash, color=:black)
ph = histogram(data; linecolor=:blue,
xlabel=L"x", size=(600,300), yaxis=("count", (0,15), 0:5:15),
bins=-1:0.5:25, xlims, xticks, label="data histogram")
# Plots.savefig(ph, "gamma-data.pdf")
plot!(ph, x -> T*0.5 * pdf(data_dis)(x); # scale pdf by T × bin width (0.5) to match histogram counts
color=:black, label="$data_disn Distribution")
# Plots.savefig(ph, "gamma-fit.pdf")
end
if false # plots for a talk
pdt = plot(pdf(data_dis); label="$data_disn pdf", color = :blue,
left_margin, bottom_margin,
xaxis = (L"x", xlims, xticks), ylabel = L"p(x)", size=(600,200))
pst = deepcopy(psd)
tmp = score(Normal(gamma_mean, gamma_std))
plot!(pst, tmp; label = "Normal score function", size=(600,200), xlims,
line=:dash, color=:magenta, left_margin, bottom_margin,
ylabel=ylabel_score1)
# Plots.savefig(pdt, "gamma-pdf.pdf")
# Plots.savefig(pst, "gamma-score.pdf")
end
To perform unconstrained minimization over the weights of a $D$-component mixture, the following mapping from $\mathbb{R}^{D-1}$ to the probability simplex in $\mathbb{R}^D$ is helpful. It is the inverse of the additive log-ratio transform and is closely related to the softmax function.
function map_r_s(y::AbstractVector; scale::Real = 1.0)
    y = scale * [y; 0]
    y .-= maximum(y) # for numerical stability
    p = exp.(y)
    return p / sum(p)
end
map_r_s(y::Real...) = map_r_s([y...])
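A quick added check that the output indeed lies on the probability simplex (non-negative entries that sum to one):
psimplex = map_r_s([0.3, -1.2])
all(≥(0), psimplex), sum(psimplex) ≈ 1 # (true, true)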
y1 = range(-1,1,101) * 9
y2 = range(-1,1,101) * 9
tmp = map_r_s.(y1, y2')
pj = jim(y1, y2, tmp; title="Simplex parameterization", nrow=1)
Define model distribution
nmix = 3 # how many Gaussians in the mixture model
function model(θ ;
    σmin::Real = 1,
    σmax::Real = 19,
)
    mu = θ[1:nmix]
    sig = θ[nmix .+ (1:nmix)]
    any(<(σmin), sig) && error("too small σ")
    any(>(σmax), sig) && error("too big σ $sig")
    # sig = σmin .+ exp.(sig) # ensure σ > 0
    # sig = @. σmin + (σmax - σmin) * (tanh(sig/2) + 1) / 2 # "constraints"
    p = map_r_s(θ[2nmix .+ (1:(nmix-1))]) # mixture weights via the simplex map
    tmp = [(μ,σ) for (μ,σ) in zip(mu, sig)]
    mix = MixtureModel(Normal, tmp, p)
    return mix
end;
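For example (an added illustration), the parameter vector packs the component means, then the standard deviations, then the $D-1$ values that map_r_s converts to mixture weights:
model([0.0, 5, 10, 2, 2, 2, 0, 0]) # means 0, 5, 10; all σ = 2; equal weights 1/3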
Define explicit score-matching cost function
function cost_esm2(x::AbstractVector{<:Real}, θ)
    model_score = score(model(θ))
    return (0.5/T) * sum(abs2, model_score.(x) - data_score.(x))
end;
Minimize this explicit score-matching cost function:
β = 0e-4 # optional small regularizer (zero here) to make the cost coercive
cost_esm1 = (θ) -> cost_esm2(data, θ) + β * 0.5 * norm(θ)^2;
Initial crude guess of mixture model parameters
θ0 = Float64[5, 7, 9, 1.5, 1.5, 1.5, 0, 0]; # crude guess for the Gamma data: [3 means, 3 standard deviations, 2 simplex logits]
Plot data pdf and initial model pdf
pf = deepcopy(pfd)
plot!(pf, pdf(model(θ0)), label = "Initial Gaussian mixture", color=:blue)
prompt()
Check descent and non-convexity
if false
tmp = gradient(cost_esm1)(θ0)
a = range(0, 9, 101)
h = a -> cost_esm1(θ0 - a * tmp)
plot(a, log.(h.(a)))
end
Explicit score matching (ESM) (impractical)
if !@isdefined(θesm)
lower = [fill(0, nmix); fill(1.0, nmix); fill(-Inf, nmix-1)]
upper = [fill(Inf, nmix); fill(Inf, nmix); fill(Inf, nmix-1)]
opt_esm = optimize(cost_esm1, lower, upper, θ0, Fminbox(BFGS());
autodiff = :forward)
# opt_esm = optimize(cost_esm1, θ0, BFGS(); autodiff = :forward) # unconstrained
θesm = minimizer(opt_esm)
end;
plot!(pf, pdf(model(θesm)), label = "ESM Gaussian mixture", color=:green)
prompt()
Plot the data score and model score functions to see how well they match. The largest mismatch is in the tails of the distribution where there are few (if any) data points.
ps = deepcopy(psd)
plot!(ps, score(model(θesm)); label = "ESM score function", color=:green)
prompt()
Maximum-likelihood estimation
This toy example is simple enough that we can apply ML estimation to it directly. In fact, ML estimation is a seemingly more practical optimization problem than score matching in this case.
As expected, ML estimation leads to a lower negative log-likelihood.
negloglike(θ) = (-1/T) * sum(logpdf(model(θ)), data)
opt_ml = optimize(negloglike, lower, upper, θ0, Fminbox(BFGS()); autodiff = :forward)
θml = minimizer(opt_ml)
negloglike.([θml, θesm, θ0])
3-element Vector{Float64}:
2.28225368245204
2.3430638561944686
2.620547866486972
Curiously, ML estimation here leads to much worse fits to the pdf than score matching, even though both optimizers started from the same initial guess. Perhaps the landscape of the log-likelihood is less well-behaved than that of the SM cost.
plot!(pf, pdf(model(θml)), label = "ML Gaussian mixture", color=:magenta)
plot!(ps, score(model(θml)), label = "ML score function", color=:magenta)
plot(pf, ps)
prompt()
Implicit score matching (ISM) (more practical)
The above ESM fitting process used score(data_dis), the score function of the data distribution, which is unknown in practical situations.
Hyvärinen 2005 derived the following more practical cost function that is independent of the unknown data score function:
\[J_{\mathrm{ISM}}(\bm{θ}) = \frac{1}{T} ∑_{t=1}^T ∑_{i=1}^N \left( ∂_i s_i(\bm{x}_t; \bm{θ}) + \frac{1}{2} | s_i(\bm{x}_t; \bm{θ}) |^2 \right),\]
ignoring a constant that is independent of $\bm{θ}$, where
\[∂_i s_i(\bm{x}; \bm{θ}) = \frac{∂}{∂ x_i} s_i(\bm{x}; \bm{θ}) = \frac{∂^2}{∂ x_i^2} \log p(\bm{x}; \bm{θ}).\]
(For large models this version is still a bit impractical because it depends on the diagonal elements of the Hessian of the log prior. Subsequent pages deal with that issue.)
Vincent, 2011 calls this approach implicit score matching (ISM).
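For a vector-valued model, $∑_{i=1}^N ∂_i s_i(\bm{x}; \bm{θ})$ is the divergence of the score, i.e., the trace of its Jacobian (equivalently the Laplacian of $\log p$). Here is a small added sketch of one way to compute it with automatic differentiation, using a hypothetical score function:
import ForwardDiff
using LinearAlgebra: tr
s_gauss(x) = -x # score of a standard normal N(0, I), as a stand-in example
xtest = [0.5, -1.0, 2.0]
tr(ForwardDiff.jacobian(s_gauss, xtest)) # -3.0 = ∑ᵢ ∂ᵢ sᵢ(x) for this score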
Implicit score-matching cost function
function cost_ism2(x::AbstractVector{<:Real}, θ)
    tmp = model(θ)
    model_score = score(tmp)
    return (1/T) * (sum(score_deriv(tmp), x) +
        0.5 * sum(abs2 ∘ model_score, x))
end;
Minimize this implicit score-matching cost function:
if !@isdefined(θism)
cost_ism1 = (θ) -> cost_ism2(data, θ)
opt_ism = optimize(cost_ism1, lower, upper, θ0, Fminbox(BFGS()); autodiff = :forward)
##opt_ism = optimize(cost_ism1, θ0, BFGS(); autodiff = :forward)
θism = minimizer(opt_ism)
cost_ism1.([θism, θesm, θml])
end;
plot!(pf, pdf(model(θism)), label = "ISM Gaussian mixture", color=:cyan)
plot!(ps, score(model(θism)), label = "ISM score function", color=:cyan)
pfs = plot(pf, ps)
prompt()
Curiously, the supposedly equivalent ISM cost function works much worse here. Like the ML estimate, the first two $σ$ values are stuck at the lower limit. Could it be local extrema? More investigation is needed!
Ideally (as $T → ∞$), the ESM and ISM cost functions should differ by a constant independent of $θ$. For the small, finite $T$ here, that difference varies with $θ$:
tmp = [θ0, θesm, θml, θism]
cost_esm1.(tmp) - cost_ism1.(tmp)
4-element Vector{Float64}:
0.08616722371394606
0.08596904043777506
0.24967328261006547
0.26172629070742814
Regularized score matching (RSM)
Kingma & LeCun, 2010 reported some instability of ISM and suggested a regularized version corresponding to the following (practical) cost function:
\[J_{\mathrm{RSM}}(\bm{θ}) = J_{\mathrm{ISM}}(\bm{θ}) + λ R(\bm{θ}) ,\quad R(\bm{θ}) = \frac{1}{T} ∑_{t=1}^T ∑_{i=1}^N | ∂_i s_i(\bm{x}_t; \bm{θ}) |^2.\]
Regularized score matching (RSM) cost function
function cost_rsm2(x::AbstractVector{<:Real}, θ, λ)
    mod = model(θ)
    model_score = score(mod)
    tmp = score_deriv(mod).(x)
    R = sum(abs2, tmp)
    J_ism = sum(tmp) + 0.5 * sum(abs2 ∘ model_score, x)
    return (1/T) * (J_ism + λ * R)
end;
Minimize this RSM cost function:
λ = 4e-1
cost_rsm1 = (θ) -> cost_rsm2(data, θ, λ)
if !@isdefined(θrsm)
opt_rsm = optimize(cost_rsm1, lower, upper, θ0, Fminbox(BFGS());
autodiff = :forward)
θrsm = minimizer(opt_rsm)
cost_rsm1.([θrsm, θ0, θism, θesm, θml])
end
5-element Vector{Float64}:
-0.11336964204446329
0.07895853368771694
-0.06530672522847489
-0.0745676328254189
-0.07460264962142038
plot!(pf, pdf(model(θism)), label = "RSM Gaussian mixture", color=:red)
plot!(ps, score(model(θism)), label = "RSM score function", color=:red)
pfs = plot(pf, ps)
prompt()
Sadly, the regularized score matching (RSM) approach did not help much here. Increasing $λ$ led to errors from optimize.
Denoising score matching (DSM)
Vincent, 2011 proposed a practical approach called denoising score matching (DSM) that matches the model score function to the score function of a Parzen density estimate of the form
\[q_{σ}(\bm{x}) = \frac{1}{T} ∑_{t=1}^T g_{σ}(\bm{x} - \bm{x}_t)\]
where $g_{σ}$ denotes a Gaussian distribution $\mathcal{N}(\bm{0}, σ^2 \bm{I})$.
Statistically, this approach is equivalent (in expectation) to adding noise to the training data and then applying the ESM approach with $q_{σ}$ as the target density. The DSM cost function is
\[J_{\mathrm{DSM}}(\bm{θ}) = \frac{1}{T} ∑_{t=1}^T \mathbb{E}_{\bm{z} ∼ g_{σ}}\left[ \frac{1}{2} \left\| \bm{s}(\bm{x}_t + \bm{z}; \bm{θ}) + \frac{\bm{z}}{σ^2} \right\|_2^2 \right].\]
Here the target $-\bm{z}/σ^2$ is simply the score of the Gaussian kernel: $\nabla_{\tilde{\bm{x}}} \log g_{σ}(\tilde{\bm{x}} - \bm{x}_t) = -(\tilde{\bm{x}} - \bm{x}_t)/σ^2$, evaluated at $\tilde{\bm{x}} = \bm{x}_t + \bm{z}$. A benefit of this approach is that it does not require differentiating the model score function w.r.t. $\bm{x}$.
The inner expectation over $g_{σ}$ is typically analytically intractable, so in practice we replace it with a sample mean where we draw $M ≥ 1$ values of $\bm{z}$ per training sample, leading to the following practical cost function
\[J_{\mathrm{DSM}, \, M}(\bm{θ}) = \frac{1}{T} ∑_{t=1}^T \frac{1}{M} ∑_{m=1}^M \frac{1}{2} \left\| \bm{s}(\bm{x}_t + \bm{z}_{t,m}; \bm{θ}) + \frac{\bm{z}_{t,m}}{σ^2} \right\|_2^2,\]
where the noise samples $\bm{z}_{t,m}$ are IID.
The next code blocks investigate this DSM approach for somewhat arbitrary choices of $M$ and $σ$.
seed!(0)
M = 9
σdsm = 1.0
zdsm = σdsm * randn(T, M);
Define the denoising score-matching cost function, where the input data has size $(T,)$ and the input z has size $(T,M)$.
function cost_dsm2(data::AbstractVector{<:Real}, z::AbstractArray{<:Real}, θ)
    model_score = score(model(θ))
    tmp = model_score.(data .+ z) # (T,M) # add noise to data
    return (0.5/T/M) * sum(abs2, tmp + z ./ σdsm^2)
end;
if !@isdefined(θdsm)
cost_dsm1 = (θ) -> cost_dsm2(data, zdsm, θ) # + β * 0.5 * norm(θ)^2;
opt_dsm = optimize(cost_dsm1, lower, upper, θ0, Fminbox(BFGS());
autodiff = :forward)
θdsm = minimizer(opt_dsm)
end;
plot!(pf, pdf(model(θdsm)); label = "DSM Gaussian mixture", color=:orange)
plot!(ps, score(model(θdsm)); label = "DSM score function", color=:orange)
pfs = plot(pf, ps)
At least for this case, DSM worked better than ML, ISM and RSM.
prompt()
Noise-conditional score-matching (NCSM)
Above we used a single noise value for DSM. Contemporary methods use a range of noise values with noise-conditional score models, e.g., Song et al. ICLR 2021.
A representative formulation uses a $σ^2$-weighted expectation like the following:
\[J_{\mathrm{NCSM}}(\bm{θ}) ≜ \mathbb{E}_{σ ∼ f(σ)}\left[ σ^2 J_{\mathrm{DSM}}(\bm{θ}, σ) \right],\]
where
\[J_{\mathrm{DSM}}(\bm{θ}, σ) ≜ \frac{1}{T} ∑_{t=1}^T \mathbb{E}_{\bm{z} ∼ g_{σ}}\left[ \frac{1}{2} \left\| \bm{s}(\bm{x}_t + \bm{z}; \bm{θ}, σ) + \frac{\bm{z}}{σ^2} \right\|_2^2 \right],\]
for some distribution $f(σ)$ of noise levels.
Here we use a multi-layer perceptron (MLP) to model the 1D noise-conditional score function $\bm{s}(\bm{x}; \bm{θ}, σ)$. We use a residual approach with data normalization and the baseline score of a standard normal distribution.
function make_nnmodel(
    data; # 1D training data; used only for the normalization statistics below
    nweight = [2, 8, 16, 8, 1], # 2 inputs: [x, σ] for NCSM
    μdata = mean(data),
    σdata = std(data),
)
    layers = [Dense(nweight[i-1], nweight[i], Flux.gelu) for i in 2:length(nweight)]
    # Change of variables y ≜ (x - μdata) / σ(x + z)
    pick1(x) = transpose(x[1,:]) # extract 1D data as a row vector
    pick2(x) = transpose(x[2,:]) # extract σ as a row vector
    quad(σ) = sqrt(σ^2 + σdata^2) # σ(x + z) (quadrature)
    scale1(x) = [(pick1(x) .- μdata) ./ quad.(pick2(x)); pick2(x)] # standardize
    # The baseline score function here is just -y; use in a "ResNet" way:
    tmp1 = Flux.Parallel(.-, Chain(layers...), pick1)
    scale2(σ) = quad.(σ) / σdata^2 # "un-standardize" for final score w.r.t. x
    tmp2 = Flux.Parallel(.*, tmp1, scale2 ∘ pick2)
    return Chain(scale1, tmp2)
end;
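A small added shape check (with placeholder data and a separate RNG so the training below is unaffected): the network maps a (2, batch) array of [x, σ] pairs to a (1, batch) array of score estimates.
using Random: MersenneTwister
nn_demo = make_nnmodel(randn(MersenneTwister(0), Float32, 100)) # placeholder 1D data, only for this check
size(nn_demo(rand(MersenneTwister(1), Float32, 2, 5))) # (1, 5)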
Function to make data pairs suitable for NN training. Each use of this function's output is akin to $M$ epochs of data. Use shuffle in case we use mini-batches later.
function dsm_data(
    M::Int = 9,
    σdist = Uniform(0.2,2.0),
)
    σdsm = Float32.(rand(σdist, T, M))
    z = σdsm .* randn(Float32, T, M)
    tmp1 = shuffle(data) .+ z # (T,M)
    tmp2 = transpose([vec(tmp1) vec(σdsm)]) # (2, T*M)
    return (tmp2, -transpose(vec(z ./ σdsm.^2)))
end
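An added shape check, left disabled (following the if false pattern used elsewhere on this page) so it does not consume random draws before training: each call returns a (2, T*M) input array of [x_t + z, σ] pairs and a (1, T*M) target array of $-z/σ^2$ values.
if false # optional added shape check
    xy_demo = dsm_data(2) # M = 2 noise realizations per sample
    size(xy_demo[1]), size(xy_demo[2]) # ((2, 200), (1, 200)) for T = 100
end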
if !@isdefined(nnmodel)
nnmodel = make_nnmodel(data;)
# σ^2-weighted MSE loss:
loss3(model, x, y) = mean(abs2, (model(x) - y) .* transpose(x[2,:])) / 2
iters = 2^9
dataset = [dsm_data() for i in 1:iters]
state1 = Flux.setup(Adam(), nnmodel)
@info "begin train"
@time Flux.train!(loss3, nnmodel, dataset, state1)
@info "end train"
end
nnscore = x -> nnmodel([x, 0.4])[1] # σ near the lower end of the σdist range
tmp = deepcopy(ps)
plot!(tmp, nnscore; label = "NN score function", color=:blue)
The noise-conditional NN score model worked OK. Training "coarse to fine" might work better than the Uniform approach used above.
prompt()
if false # look at the set of NN score functions
tmp = deepcopy(psd)
for s in 0.1:0.2:1.6
plot!(tmp, x -> nnmodel([x, s])[1]; label = "NN score function $s")
end
gui()
end
Plots of data distribution with various added noise levels
σlist = [0.0005, 0.05, 0.1, 0.5, 1, 5]
nσ = length(σlist)
pl = Array{Any}(undef, nσ)
for (i, σ) in enumerate(σlist)
local mix = MixtureModel(Normal, [(d,σ) for d in data])
xm = range(0, 20, step = min(σ/5, 0.1))
local tmp = pdf(mix).(xm)
pl[i] = plot(xm, tmp; color = :red,
xaxis = (L"x", (0, 20), [0, gamma_mean, 18]),
yaxis = (L"q_σ(x)", [0,]), title = "σ = $σ",
)
# plot!(pl, xm, tmp / maximum(tmp), label = "σ = $σ")
end
plot(pl...)
# Plots.savefig("data-qsig.pdf")
prompt()
Reproducibility
This page was generated with the following version of Julia:
using InteractiveUtils: versioninfo
io = IOBuffer(); versioninfo(io); split(String(take!(io)), '\n')
11-element Vector{SubString{String}}:
"Julia Version 1.11.6"
"Commit 9615af0f269 (2025-07-09 12:58 UTC)"
"Build Info:"
" Official https://julialang.org/ release"
"Platform Info:"
" OS: Linux (x86_64-linux-gnu)"
" CPU: 4 × AMD EPYC 7763 64-Core Processor"
" WORD_SIZE: 64"
" LLVM: libLLVM-16.0.6 (ORCJIT, znver3)"
"Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)"
""
And with the following package versions:
import Pkg; Pkg.status()
Status `~/work/ScoreMatching.jl/ScoreMatching.jl/docs/Project.toml`
[31c24e10] Distributions v0.25.120
[e30172f5] Documenter v1.14.1
[587475ba] Flux v0.16.5
[f6369f11] ForwardDiff v1.0.1
[b964fa9f] LaTeXStrings v1.4.0
[98b081ad] Literate v2.20.1
[170b2178] MIRTjim v0.25.0
[429524aa] Optim v1.13.2
[91a5bcdd] Plots v1.40.18
[0d6d0290] ScoreMatching v0.0.1 `~/work/ScoreMatching.jl/ScoreMatching.jl`
[2913bbd2] StatsBase v0.34.6
[1986cc42] Unitful v1.24.0
[b77e0a4c] InteractiveUtils v1.11.0
[37e2e46d] LinearAlgebra v1.11.0
[9a3f8284] Random v1.11.0
This page was generated using Literate.jl.