Video foreground/background separation
This example illustrates video foreground/background separation via robust PCA using the Julia language. For simplicity, the method here assumes a static camera. For video from a freely moving camera, see Moore et al., 2019.
This page comes from a single Julia file: foreback.jl.
You can access the source code for such Julia documentation using the 'Edit on GitHub' link in the top right. You can view the corresponding notebook in nbviewer here: foreback.ipynb, or open it in binder here: foreback.ipynb.
Setup
Add the Julia packages used in this demo. Change false to true in the following code block if you are using any of the following packages for the first time.
if false
import Pkg
Pkg.add([
"ColorTypes"
"ColorVectorSpace"
"Downloads"
"InteractiveUtils"
"LaTeXStrings"
"LinearAlgebra"
"LinearMapsAA"
"MIRT"
"MIRTjim"
"Plots"
"Statistics"
"VideoIO"
])
end
Tell Julia to use the following packages. Run Pkg.add() in the preceding code block first, if needed.
using ColorTypes: RGB, N0f8
using ColorVectorSpace
using Downloads: download
using InteractiveUtils: versioninfo
using LaTeXStrings
using LinearAlgebra: Diagonal, I, norm, svd, svdvals
using LinearMapsAA: LinearMapAA, redim
using MIRT: pogm_restart
using MIRTjim: jim, prompt
using Plots: default, gui, plot, savefig
using Plots: gif, @animate, Plots
using Statistics: mean
using VideoIO
default(); default(markerstrokecolor=:auto, label = "", markersize=6,
legendfontsize = 8) # todo: https://github.com/JuliaPlots/Plots.jl/issues/4621
The following line is helpful when running this file as a script; it prompts the user to hit a key after each image is displayed.
isinteractive() && prompt(:prompt);
Load video data
Load raw data
if !@isdefined(y1)
url = "https://github.com/JeffFessler/book-la-data/raw/main/data/bmc-12/111-240-320-100.mp4"
tmp = download(url)
y1 = VideoIO.load(tmp) # 100 frames of size (240,320)
if !isinteractive() # downsample for github cloud
y1 = map(y -> y[1:2:end,1:2:end], (@view y1[2:2:end]))
end
end;
Convert to array
yf = map(y -> 1f0*permutedims(y, (2,1)), y1) # 100 frames of size (320,240)
Y3 = stack(yf) # (nx,ny,nf)
(nx, ny, nf) = size(Y3)
py = jim([yf[1], yf[end], yf[end]-yf[1]];
nrow = 1, size = (600, 200),
title="Frame 001 | Frame $nf | Difference")
Cost function
Encoding operator A = [I I] for L+S, because we stack X = [L;;;S].
tmp = LinearMapAA(I(nx*ny*nf);
odim=(nx,ny,nf), idim=(nx,ny,nf), T=Float32, prop=(;name="I"))
tmp = kron([1 1], tmp)
A = redim(tmp; odim=(nx,ny,nf), idim=(nx,ny,nf,2)) # "squeeze" odim
unstack(X, i) = selectdim(X, ndims(X), i)
Lpart = X -> unstack(X, 1) # extract "L" from X
Spart = X -> unstack(X, 2) # extract "S" from X
nucnorm(L::AbstractMatrix) = sum(svdvals(L)) # nuclear norm
nucnorm(L::AbstractArray) = nucnorm(reshape(L, :, nf)); # (nx*ny, nf) for L
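As a quick sanity check (a hypothetical snippet, not part of the original demo), applying A to a stacked array X = [L;;;S] should simply sum the two components:
Xtest = randn(Float32, nx, ny, nf, 2) # hypothetical test array
@assert A * Xtest ≈ Lpart(Xtest) + Spart(Xtest) # A acts like [I I]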
The robust PCA optimization cost function is:
\[Ψ(\mathbf{L},\mathbf{S}) = \frac{1}{2} \| \mathbf{L} + \mathbf{S} - \mathbf{Y} \|_{\mathrm{F}}^2 + α \| \mathbf{L} \|_* + β \| \mathrm{vec}(\mathbf{S}) \|_1\]
or equivalently
\[Ψ(\mathbf{X}) = \frac{1}{2} \left\| [\mathbf{I} \ \mathbf{I}] \mathbf{X} - \mathbf{Y} \right\|_{\mathrm{F}}^2 + α \| \mathbf{X}_1 \|_* + β \| \mathrm{vec}(\mathbf{X}_2) \|_1\]
where $\mathbf{Y}$ is the original data matrix.
robust_pca_cost(Y, X, α::Real, β::Real) =
0.5 * norm( A * X - Y )^2 + α * nucnorm(Lpart(X)) + β * norm(Spart(X), 1);
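As a hypothetical check (not in the original demo) of this implementation, initializing with L = Y and S = 0 zeroes the data-fit and ℓ1 terms, so the cost should reduce to α times the nuclear norm of Y:
Ytest = randn(Float32, nx, ny, nf) # hypothetical test data
X0test = stack([Ytest, zeros(Float32, nx, ny, nf)])
robust_pca_cost(Ytest, X0test, 1, 1) ≈ nucnorm(Ytest) # expect true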
Proximal algorithm helpers. Soft thresholding, the proximal operator for the ℓ1 term:
soft(z,t) = sign(z) * max(abs(z) - t, 0);
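For intuition, here are a few illustrative evaluations (hypothetical values, not from the original demo) of this shrinkage rule:
soft(3.0, 1.0) # 2.0: the magnitude shrinks by the threshold
soft(0.5, 1.0) # 0.0: values within the threshold are zeroed
soft(-2.0, 1.0) # -1.0: the sign is preserved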
Singular value soft thresholding (SVST) function, the proximal operator for the nuclear norm term:
function SVST(X, beta)
shape = size(X)
X = reshape(X, :, shape[end]) # unfold
U,s,V = svd(X)
sthresh = @. soft(s, beta)
jj = findall(>(0), sthresh)
out = U[:,jj] * Diagonal(sthresh[jj]) * V[:,jj]'
return reshape(out, shape)
end;
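As another hypothetical check (not in the original demo), the singular values of SVST(X, β) should be the soft-thresholded singular values of X, because SVST applies the scalar shrinkage rule to each singular value:
Mtest = randn(Float32, 20, 9) # hypothetical test matrix
svdvals(SVST(Mtest, 0.5)) ≈ max.(svdvals(Mtest) .- 0.5, 0) # expect true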
Algorithm
Proximal gradient methods for minimizing the robust PCA cost function.
The proximal optimized gradient method (POGM) with adaptive restart converges faster than FISTA while requiring very similar computation per iteration. POGM does not require any algorithm tuning parameter, making it easier to use than ADMM in many practical composite optimization problems.
function robust_pca(Y;
L = Y,
S = zeros(size(Y)),
α = 1,
β = 1,
mom = :pogm,
Fcost::Function = X -> robust_pca_cost(Y, X, α, β),
# fun = (iter, xk, yk, is_restart) -> (),
fun = mom === :fgm ?
(iter, xk, yk, is_restart) -> (yk, Fcost(yk), is_restart) :
(iter, xk, yk, is_restart) -> (xk, Fcost(xk), is_restart),
kwargs..., # for pogm_restart
)
X0 = stack([L, S])
f_grad = X -> A' * (A * X - Y) # gradient of smooth term
f_L = 2 # Lipschitz constant of f_grad
g_prox = (X, c) -> stack([SVST(Lpart(X), c * α), soft.(Spart(X), c * β)])
Xhat, out = pogm_restart(X0, Fcost, f_grad, f_L; g_prox, fun, mom, kwargs...)
return Xhat, out
end;
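As an optional experiment (a hedged sketch, not part of the original demo), one could compare momentum options on a single color channel. This assumes that pogm_restart also accepts mom = :fpgm for FISTA/FPGM-style momentum; consult the MIRT documentation for the supported options.
if false # change to true to run this optional comparison
    Yr = map(y -> getfield(y, :r), Y3) # red channel only
    _, out_pogm = robust_pca(Yr; α = 30, β = 0.1, mom = :pogm, niter = 10)
    _, out_fpgm = robust_pca(Yr; α = 30, β = 0.1, mom = :fpgm, niter = 10) # assumes :fpgm is supported
    plot([o[2] for o in out_pogm]; label = "POGM", xlabel = "iteration", ylabel = "cost")
    Plots.plot!([o[2] for o in out_fpgm]; label = "FPGM (FISTA)")
end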
Run algorithm
Apply robust PCA to each RGB color channel separately for simplicity, then reassemble.
channels = [:r :g :b]
if !@isdefined(Xpogm)
α = 30
β = 0.1
niter = 10
Xc = Array{Any}(undef, 3)
out = Array{Any}(undef, 3)
for (i, c) in enumerate(channels) # separate color channels
@info "channel $c"
Yc = map(y -> getfield(y, c), Y3);
Xc[i], out[i] = robust_pca(Yc; α, β, mom = :pogm, niter)
end
Xpogm = map(RGB{Float32}, Xc...) # reassemble colors
end;
[ Info: channel r
[ Info: channel g
[ Info: channel b
Results
Extract low-rank (background) and sparse (foreground) components:
Lpogm = Lpart(Xpogm)
Spogm = Spart(Xpogm)
iz = nf * 81 ÷ 100
tmp = stack([Y3[:,:,iz], Lpogm[:,:,iz], Spogm[:,:,iz]])
jim(:line3type, :white)
pf = jim(tmp; nrow=1, size=(700, 250),
# xaxis = false, yaxis = false, # book
title="Original frame $iz | low-rank background | sparse foreground")
# savefig(pf, "foreback-81.pdf")
Cost function: plot the overall cost (summed over all 3 color channels) versus iteration.
tmp = sum(outc -> [o[2] for o in outc], out) # sum costs over the 3 color channels
pc = plot(0:niter, tmp; marker = :circle, label="POGM", widen = true,
xaxis=("Iteration", (0,niter), 0:2:niter), ylabel="Cost function")
prompt()
Alternatives
Explore simpler methods:
- average of each color channel
- first SVD component of each color channel
if !@isdefined(Xsvd3)
Xmean = Vector{Matrix{Float32}}(undef, 3)
Xsvd3 = Vector{Matrix{Float32}}(undef, 3)
refold = v -> reshape(v, nx, ny)
for (i, c) in enumerate(channels) # separate color channels
@info "channel $c"
tmp_ = map(y -> getfield(y, c), Y3) # (nx,ny,nf)
Xsvd3[i] = refold(svd(reshape(tmp_, :, nf)).U[:,1]) # first component
Xmean[i] = refold(mean(tmp_, dims=3))
end
Xmean = map(RGB{Float32}, Xmean...) # reassemble colors
L1 = Lpogm[:,:,1]
extrema(norm.(Lpogm .- Xmean))
end
(0.0031556103607409546, 0.08418844585069493)
In this case the temporal average of the video sequence is quite close to the low-rank component, because this video is so simple. The benefits of robust PCA would be more apparent with more complicated videos, e.g., ones with illumination changes. Even here, scaling the difference image reveals undesirable traces of the sparse (foreground) component in the simple average.
pm = jim(
jim(Xmean, "Average"),
jim(L1, L"L_1"),
jim(9 * abs.(L1 - Xmean), "9 × |L_1 - average|"),
# jim(L1 .== Xmean),
)
Examining the first SVD component of each color channel takes a bit more work. SVD components have unit norm, but RGB values should lie in the [0,1] range, and the SVD has a sign ambiguity. Even after correcting for those issues, the SVD version has a visibly different color tint.
ps1 = jim(Xsvd3; nrow=1, title="Xsvd before corrections", size=(600,200))
Correct for SVD sign ambiguity
Xsvd = map(x -> x / sign(mean(x)), Xsvd3)
ps2 = jim(Xsvd; nrow=1, title="Xsvd after sign correction", size=(600,200))
Correct for scaling
svdmax = maximum(maximum, Xsvd)
Xsvd = map(x -> x / svdmax, Xsvd)
Xsvd = map(RGB{Float32}, Xsvd...) # reassemble colors
ps = jim(
jim(yf[1], "First frame"),
jim(Lpogm[:,:,1], L"L_1"),
jim(Xsvd, "Xsvd"),
layout = (1,3),
size = (600,200),
)
Explore a rank-1 approximation of each color channel (the helper below supports a general rank r).
function lr_channel(arr::Array, r::Int=1)
Xlr = Vector{Array{Float32}}(undef, 3)
for (i, c) in enumerate(channels) # separate color channels
@info "lr_channel $c"
tmp = map(y -> getfield(y, c), arr) # (nx,ny,nf)
tmp = svd(reshape(tmp, :, nf))
tmp = tmp.U[:,1:r] * Diagonal(tmp.S[1:r]) * tmp.Vt[1:r,:]
tmp = reshape(tmp, nx, ny, :)
Xlr[i] = tmp
end
return Xlr
end
if !@isdefined(Xr1)
Xr1 = lr_channel(Y3)
Xr1 = map(RGB{Float32}, Xr1...) # reassemble colors
jim(Xr1)
end;
[ Info: lr_channel r
[ Info: lr_channel g
[ Info: lr_channel b
Examine the residual between the rank-1 approximation and the original video sequence. The residual has considerable sparse structure, rather than looking like random noise, suggesting that a robust PCA approach with a low-rank + sparse signal model is more suitable.
pr = jim(Y3 - Xr1)
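One way to quantify that sparse structure (a hypothetical sketch, not in the original demo) is to measure what fraction of the pixels holds most of the residual energy; a small fraction indicates a spatially concentrated, sparse residual:
res = norm.(Y3 - Xr1) # per-pixel residual magnitude
energy = sort(vec(res .^ 2); rev = true)
frac95 = findfirst(≥(0.95 * sum(energy)), cumsum(energy)) / length(energy) # fraction of pixels holding 95% of the residual energy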
Animate videos
anim1 = @animate for it in 1:nf
tmp = stack([Y3[:,:,it], Lpogm[:,:,it], Spogm[:,:,it]])
jim(tmp; nrow=1, title="Original | Low-rank | Sparse",
xlabel = "Frame $it", size=(600, 250))
# gui()
end
gif(anim1; fps = 6)
Reproducibility
This page was generated with the following version of Julia:
using InteractiveUtils: versioninfo
io = IOBuffer(); versioninfo(io); split(String(take!(io)), '\n')
11-element Vector{SubString{String}}:
"Julia Version 1.11.1"
"Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)"
"Build Info:"
" Official https://julialang.org/ release"
"Platform Info:"
" OS: Linux (x86_64-linux-gnu)"
" CPU: 4 × AMD EPYC 7763 64-Core Processor"
" WORD_SIZE: 64"
" LLVM: libLLVM-16.0.6 (ORCJIT, znver3)"
"Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)"
""
And with the following package versions:
import Pkg; Pkg.status()
Status `~/work/book-la-demo/book-la-demo/docs/Project.toml`
[6e4b80f9] BenchmarkTools v1.5.0
[aaaa29a8] Clustering v0.15.7
[35d6a980] ColorSchemes v3.27.1
⌅ [3da002f7] ColorTypes v0.11.5
⌃ [c3611d14] ColorVectorSpace v0.10.0
[717857b8] DSP v0.7.10
[72c85766] Demos v0.1.0 `~/work/book-la-demo/book-la-demo`
[e30172f5] Documenter v1.7.0
[4f61f5a4] FFTViews v0.3.2
[7a1cc6ca] FFTW v1.8.0
[587475ba] Flux v0.14.25
[a09fc81d] ImageCore v0.10.4
[71a99df6] ImagePhantoms v0.8.1
[b964fa9f] LaTeXStrings v1.4.0
[7031d0ef] LazyGrids v1.0.0
[599c1a8e] LinearMapsAA v0.12.0
[98b081ad] Literate v2.20.1
[7035ae7a] MIRT v0.18.2
[170b2178] MIRTjim v0.25.0
[eb30cadb] MLDatasets v0.7.18
[efe261a4] NFFT v0.13.5
[6ef6ca0d] NMF v1.0.3
[15e1cf62] NPZ v0.4.3
[0b1bfda6] OneHotArrays v0.2.5
[429524aa] Optim v1.10.0
[91a5bcdd] Plots v1.40.9
[f27b6e38] Polynomials v4.0.11
[2913bbd2] StatsBase v0.34.3
[d6d074c3] VideoIO v1.1.0
[b77e0a4c] InteractiveUtils v1.11.0
[37e2e46d] LinearAlgebra v1.11.0
[44cfe95a] Pkg v1.11.0
[9a3f8284] Random v1.11.0
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`
This page was generated using Literate.jl.