Sparse spectral clustering (SSC)
This example illustrates sparse spectral clustering (SSC), solved with the proximal optimized gradient method (POGM), applied to simulated data and (todo) hand-written digits.
Original version by Javier Salazar Cavazos.
This page comes from a single Julia file: ssc.jl.
You can access the source code for such Julia documentation using the 'Edit on GitHub' link in the top right. You can view the corresponding notebook in nbviewer here: ssc.ipynb, or open it in binder here: ssc.ipynb.
Setup
Add the Julia packages used in this demo. Change `false` to `true` in the following code block if you are using any of the following packages for the first time.
if false
import Pkg
Pkg.add([
"Clustering"
"InteractiveUtils"
"LaTeXStrings"
"LinearAlgebra"
"MIRT"
"MIRTjim"
"MLDatasets"
"Plots"
"Random"
])
end
Tell Julia to use the following packages. Run `Pkg.add()` in the preceding code block first, if needed.
using Clustering: kmeans
using InteractiveUtils: versioninfo
using LinearAlgebra: Diagonal, eigen, I, opnorm
using MIRT: pogm_restart
using MIRTjim: jim, prompt
using Plots: default, gui, palette, plot, plot!, scatter, scatter!
using Random: randperm, seed!
default(); default(markersize=5, markerstrokecolor=:auto, label="")
The following line is helpful when running this file as a script; that way it prompts the user to hit a key after each figure is displayed.
isinteractive() ? jim(:prompt, true) : prompt(:draw);
Synthetic data
Generate synthetic data points in ℝ² that lie along K = 2 subspaces spanned by (1,1) and (1,-1).
seed!(3) # fix random generation for better debugging
xval = -9.5:1:9.5 # x locations before adding noise
N = length(xval)
σ = 0.5
x1 = xval + σ * randn(N) # noisy data that lies on union of subspaces
x2 = xval + σ * randn(N)
y1 = 1 * x1 .+ σ * randn(N) # y=x and y=-x are the 2 subspaces
y2 = -1 * x2 .+ σ * randn(N)
data = [ [x1';y1'] [x2';y2'] ] # gathering data to one matrix
clusters = [1*ones(N,1); 2*ones(N,1)]; # true cluster labels, for comparison later
permuteOrder = randperm(2N) # randomly permute the data points to avoid any ordering bias
data = data[:, permuteOrder]
clusters = clusters[permuteOrder]
plot(aspect_ratio = 1, size = (550, 500))
plot!(xval, 1 .* xval) # plot subspaces and data points
plot!(xval, -1 .* xval)
scatter!(x1, y1, color=1)
scatter!(x2, y2, color=2)
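As a quick sanity check (a hypothetical addition, not in the original script), the columns of `data` are the data points, so the matrix should be 2 × 2N; these columns play the role of Y in the cost function below.

# hypothetical sanity check: each column of `data` is one point in ℝ²
@assert size(data) == (2, 2N)
@assert length(clusters) == 2N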
POGM for SSC
Solve the SSC problem with the self-representation cost function
``\arg\min_C \frac{1}{2} ‖Y - Y (M ⊙ C)‖_F^2 + λ ‖C‖_{1,1}``
where M is a mask matrix that is unity everywhere except for zeros along the diagonal. The mask forces each column of Y to be represented as a (sparse) linear combination of the other columns of Y, and the ℓ₁ regularizer encourages sparsity of C.
POGM is an optimal accelerated method for convex composite cost functions.
- https://doi.org/10.1007/s10957-018-1287-4
- https://doi.org/10.1137/16m108104x
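For monitoring how far POGM decreases the objective, one could evaluate the composite cost directly. Here is a minimal sketch of such a helper (hypothetical, not part of the original demo), matching the cost written above:

using LinearAlgebra: norm

# hypothetical helper: evaluate the SSC cost ½‖Y − Y(M ⊙ C)‖²_F + λ‖C‖_{1,1}
ssc_cost(C, Y, M, λ) = 0.5 * norm(Y - Y * (M .* C))^2 + λ * norm(vec(C), 1)

Once `A`, `M`, and `z_lam` are defined below, one could call `ssc_cost(A, data, M, z_lam)` to report the final cost value.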
z_lam = 0.001 # regularization parameter for sparsity term
Lf = opnorm(data,2)^2 # Lipschitz constant for ∇f: smooth term of obj function
npoint = size(data,2) # total # of points
x0 = zeros(npoint, npoint) # initialize solution
M = 1 .- Diagonal(ones(npoint)) # mask to force diag(C)=0
40×40 Matrix{Float64}:
0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 … 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 … 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
⋮ ⋮ ⋱ ⋮
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 … 1.0 1.0 0.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0
The gradient of the smooth term is -Y'(Y - YC); its diagonal elements must be forced to zero at each step, via the mask M.
grad = x -> M .* (-1*data' * (data - data*x)) # gradient of smooth term
soft = (x,t) -> sign(x) * max(abs(x) - t, 0) # soft threshold at t
g_prox = (z,c) -> soft.(z, c * z_lam) # proximal operator for c * z_lam * ‖x‖₁
niter = 1000
A, _ = pogm_restart(x0, x->0, grad, Lf ; g_prox, niter) # POGM method
jim(A, "A")
Spectral clustering
Cluster via a spectral method; see:
- https://doi.org/10.1109/JSTSP.2018.2867446
- https://doi.org/10.1109/TPAMI.2013.57
W = transpose(abs.(A)) + abs.(A) # Weight matrix, force Hermitian
jim(W, "W")
D = vec(sum(W, dims=2)) # degree matrix of graph
D = D .^ (-1/2)
D = Diagonal(D) # normalized symmetric Laplacian formula
L = I - D * W * D
jim(L, "L")
E = eigen(L) # eigendecomposition; we really only need the eigenvectors
LinearAlgebra.Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
40-element Vector{Float64}:
-9.71445146547012e-17
0.06112101364991481
0.9202994484406659
0.9568116269376992
0.9852162858216016
0.9886201157528227
0.9937203703444983
0.9963553694456465
0.997148763398926
0.9982645957708101
⋮
1.1096778685266817
1.1104577168607206
1.1195119152994244
1.125673850892635
1.1277911531508884
1.1377966135287103
1.1429153680248443
1.2951510341125019
1.370835497118999
vectors:
40×40 Matrix{Float64}:
0.262313 0.257203 0.101777 … -0.0781241 -0.233005 -0.274181
0.21634 0.227126 -0.107173 -0.165156 -0.0401929 -0.0378697
0.190253 0.200731 -0.196159 0.0188781 0.0798783 0.0891692
0.191667 0.202619 -0.194261 0.0124655 0.0776446 0.0848699
0.136296 0.133431 -0.0796271 0.114869 0.147155 0.190354
0.119264 0.117862 0.0354104 … 0.0253161 0.155109 0.196964
0.105326 0.101003 -0.0297658 0.0718794 0.123543 0.166509
0.0777727 0.074132 0.13793 0.0265844 0.12726 0.161925
0.0692666 0.0527311 0.253549 0.00428898 0.104934 0.172347
0.0289793 0.0258633 0.219914 -0.151688 0.0696589 0.0935106
⋮ ⋱
0.0634652 -0.0565571 0.216141 -0.0562887 -0.153035 0.117821
0.0812732 -0.081869 0.107397 0.0174493 -0.177143 0.114656
0.0830172 -0.0786558 0.0649302 0.0563871 -0.167372 0.119701
0.101982 -0.101452 -0.0547956 0.0601086 -0.161605 0.111172
0.114586 -0.122118 0.00308471 … 0.0552411 -0.217562 0.127193
0.193082 -0.201003 -0.196423 0.0499534 -0.0989817 0.0775557
0.206192 -0.216726 -0.112042 -0.0317989 -0.0555372 0.0507629
0.234188 -0.240877 -0.0832709 -0.273439 0.15469 -0.0820361
0.273093 -0.262176 0.195442 0.256807 0.318525 -0.312216
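The eigenvalues of the symmetric normalized Laplacian of a nonnegative graph lie in [0, 2], and the smallest is zero; the output above is consistent with this (the tiny negative value is round-off). A hypothetical one-line check:

extrema(E.values) # expect a range within [0, 2]; the smallest eigenvalue is ≈ 0 up to round-off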
For K = 2 subspaces we pick the bottom K eigenvectors of L (those with the smallest eigenvalues).
eigenVectors = E.vectors[:, 1:2]
K = 2
2
There are K subspaces, so we look for K clusters among the rows of the eigenvector matrix.
results = kmeans(eigenVectors', K)
assign = results.assignments; # store assignments
seriescolor = palette([:orange, :skyblue], 2)
p4 = scatter(eigenVectors[:,1], eigenVectors[:,2],
title="Spectral Embedding Plot",
marker_z = clusters;
seriescolor,
)
Plot the ground truth on the left:
p1 = scatter(data[1,:], data[2,:];
aspect_ratio = 1, size = (550, 450),
xlims = (-11,11), ylims = (-11,11),
marker_z = clusters,
seriescolor,
title = "Truth",
)
Plot the SSC clustering results on the right:
p2 = scatter(data[1,:], data[2,:];
aspect_ratio = 1, size = (550, 450),
xlims = (-11,11), ylims = (-11,11),
marker_z = assign,
seriescolor,
title = "SSC (POGM)",
)
p12 = plot(p1, p2, layout = (1, 2), size=(1100, 450))
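The two panels should agree up to an arbitrary swap of the two cluster labels, because k-means assigns labels in no particular order. Here is a hypothetical way to quantify the agreement for K = 2 (the names `truth` and `agreement` are illustrative, not from the original):

# hypothetical agreement check: k-means labels 1/2 may be swapped relative to truth,
# so take the better of the two possible label mappings
truth = vec(clusters)
agreement = max(count(assign .== truth), count((3 .- assign) .== truth)) / length(truth)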
Reproducibility
This page was generated with the following version of Julia:
using InteractiveUtils: versioninfo
io = IOBuffer(); versioninfo(io); split(String(take!(io)), '\n')
11-element Vector{SubString{String}}:
"Julia Version 1.11.5"
"Commit 760b2e5b739 (2025-04-14 06:53 UTC)"
"Build Info:"
" Official https://julialang.org/ release"
"Platform Info:"
" OS: Linux (x86_64-linux-gnu)"
" CPU: 4 × AMD EPYC 7763 64-Core Processor"
" WORD_SIZE: 64"
" LLVM: libLLVM-16.0.6 (ORCJIT, znver3)"
"Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)"
""
And with the following package versions
import Pkg; Pkg.status()
Status `~/work/book-la-demo/book-la-demo/docs/Project.toml`
[6e4b80f9] BenchmarkTools v1.6.0
[aaaa29a8] Clustering v0.15.8
[35d6a980] ColorSchemes v3.29.0
⌅ [3da002f7] ColorTypes v0.11.5
⌃ [c3611d14] ColorVectorSpace v0.10.0
[717857b8] DSP v0.8.4
[72c85766] Demos v0.1.0 `~/work/book-la-demo/book-la-demo`
[e30172f5] Documenter v1.13.0
[4f61f5a4] FFTViews v0.3.2
[7a1cc6ca] FFTW v1.9.0
[587475ba] Flux v0.16.4
[a09fc81d] ImageCore v0.10.5
[71a99df6] ImagePhantoms v0.8.1
[b964fa9f] LaTeXStrings v1.4.0
[7031d0ef] LazyGrids v1.1.0
[599c1a8e] LinearMapsAA v0.12.0
[98b081ad] Literate v2.20.1
[7035ae7a] MIRT v0.18.2
[170b2178] MIRTjim v0.25.0
[eb30cadb] MLDatasets v0.7.18
[efe261a4] NFFT v0.13.7
[6ef6ca0d] NMF v1.0.3
[15e1cf62] NPZ v0.4.3
[0b1bfda6] OneHotArrays v0.2.10
[429524aa] Optim v1.13.2
[91a5bcdd] Plots v1.40.14
[f27b6e38] Polynomials v4.0.20
[2913bbd2] StatsBase v0.34.5
[d6d074c3] VideoIO v1.1.1
[b77e0a4c] InteractiveUtils v1.11.0
[37e2e46d] LinearAlgebra v1.11.0
[44cfe95a] Pkg v1.11.0
[9a3f8284] Random v1.11.0
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`
This page was generated using Literate.jl.