Vector outer product
This example illustrates different ways of computing vector outer products using the Julia language.
This page comes from a single Julia file: outer.jl
.
You can access the source code for such Julia documentation using the 'Edit on GitHub' link in the top right. You can view the corresponding notebook in nbviewer here: outer.ipynb
, or open it in binder here: outer.ipynb
.
Setup
Add the Julia packages used in this demo. Change false
to true
in the following code block if you are using any of the following packages for the first time.
if false
import Pkg
Pkg.add([
"BenchmarkTools"
"InteractiveUtils"
"LazyGrids"
"LinearAlgebra"
])
end
Tell Julia to use the following packages. Run Pkg.add()
in the preceding code block first, if needed.
using BenchmarkTools: @benchmark
using InteractiveUtils: versioninfo
using LazyGrids: btime
using LinearAlgebra: mul!
Overview of outer products
The outer product between two vectors x
and y
is simply x * y'
.
This demo explores a couple ways of coding that operation.
We write each method as a function because the most reliable way to benchmark different methods is to use functions.
We are interested in the computation time, not the time spent allocating memory, so we use mutating mul!
versions of all methods.
The built-in method:
f0!(out, x, y) = mul!(out, x, y');
Hand-coded double loop
function f1!(out, x, y)
for j in 1:length(y)
@simd for i in 1:length(x)
@inbounds out[i,j] = x[i] * conj(y[j])
end
end
return out
end
f1! (generic function with 1 method)
column times scalar
function f2!(out, x, y)
for j in 1:length(y)
@inbounds @. (@view out[:,j]) = x * conj(y[j])
end
return out
end
f2! (generic function with 1 method)
Using threads across columns
function f3!(out, x, y)
Threads.@threads for j in 1:length(y)
@inbounds @. (@view out[:,j]) = x * conj(y[j])
end
return out
end
f3! (generic function with 1 method)
Data for timing tests
M, N = 2^9, 2^10
T = ComplexF32
x = rand(T, M)
y = rand(T, N)
out = Matrix{T}(undef, M, N)
out2 = Matrix{T}(undef, M, N);
Verify the methods are equivalent:
@assert f0!(out,x,y) ≈ f1!(out2,x,y) # why is ≈ needed here?!
@assert f1!(out,x,y) == f2!(out2,x,y)
@assert f1!(out,x,y) == f3!(out2,x,y)
Benchmark the methods
The results will depend on the computer used, of course.
x*y'
t = @benchmark f0!($out, $x, $y)
timeu = t -> btime(t, unit=:μs)
t0 = timeu(t);
double loop
t = @benchmark f1!($out, $x, $y)
timeu = t -> btime(t, unit=:μs)
t1 = timeu(t);
column times scalar
t = @benchmark f2!($out, $x, $y)
t2 = timeu(t);
threads
t = @benchmark f3!($out, $x, $y)
t3 = timeu(t);
Result summary:
["built-in" t0; "double" t1; "column" t2; "thread" t3]
4×2 Matrix{String}:
"built-in" "time=262.4μs mem=0 alloc=0"
"double" "time=646.6μs mem=0 alloc=0"
"column" "time=147.3μs mem=0 alloc=0"
"thread" "time=149.6μs mem=608 alloc=7"
Remarks
With Julia 1.9 a 2017 iMac with 8 threads (4.2 GHz Quad-Core Intel i7), the results are
- "time=235.7μs mem=0 alloc=0"
- "time=153.9μs mem=0 alloc=0"
- "time=153.0μs mem=0 alloc=0"
- "time=54.1μs mem=4096 alloc=43"
Interestingly, the hand coded loop is faster than the built-in mul!
for x * y'
.
The results in github's cloud may differ, because it uses different CPUs and typically only one thread.
Reproducibility
This page was generated with the following version of Julia:
using InteractiveUtils: versioninfo
io = IOBuffer(); versioninfo(io); split(String(take!(io)), '\n')
11-element Vector{SubString{String}}:
"Julia Version 1.11.1"
"Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)"
"Build Info:"
" Official https://julialang.org/ release"
"Platform Info:"
" OS: Linux (x86_64-linux-gnu)"
" CPU: 4 × AMD EPYC 7763 64-Core Processor"
" WORD_SIZE: 64"
" LLVM: libLLVM-16.0.6 (ORCJIT, znver3)"
"Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)"
""
And with the following package versions
import Pkg; Pkg.status()
Status `~/work/book-la-demo/book-la-demo/docs/Project.toml`
[6e4b80f9] BenchmarkTools v1.5.0
[aaaa29a8] Clustering v0.15.7
[35d6a980] ColorSchemes v3.27.1
⌅ [3da002f7] ColorTypes v0.11.5
⌃ [c3611d14] ColorVectorSpace v0.10.0
[717857b8] DSP v0.7.10
[72c85766] Demos v0.1.0 `~/work/book-la-demo/book-la-demo`
[e30172f5] Documenter v1.7.0
[4f61f5a4] FFTViews v0.3.2
[7a1cc6ca] FFTW v1.8.0
[587475ba] Flux v0.14.25
[a09fc81d] ImageCore v0.10.4
[71a99df6] ImagePhantoms v0.8.1
[b964fa9f] LaTeXStrings v1.4.0
[7031d0ef] LazyGrids v1.0.0
[599c1a8e] LinearMapsAA v0.12.0
[98b081ad] Literate v2.20.1
[7035ae7a] MIRT v0.18.2
[170b2178] MIRTjim v0.25.0
[eb30cadb] MLDatasets v0.7.18
[efe261a4] NFFT v0.13.5
[6ef6ca0d] NMF v1.0.3
[15e1cf62] NPZ v0.4.3
[0b1bfda6] OneHotArrays v0.2.5
[429524aa] Optim v1.10.0
[91a5bcdd] Plots v1.40.9
[f27b6e38] Polynomials v4.0.11
[2913bbd2] StatsBase v0.34.3
[d6d074c3] VideoIO v1.1.0
[b77e0a4c] InteractiveUtils v1.11.0
[37e2e46d] LinearAlgebra v1.11.0
[44cfe95a] Pkg v1.11.0
[9a3f8284] Random v1.11.0
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`
This page was generated using Literate.jl.