Vector outer product

This example illustrates different ways of computing vector outer products using the Julia language.

This page comes from a single Julia file: outer.jl.

You can access the source code for such Julia documentation using the 'Edit on GitHub' link in the top right. You can view the corresponding notebook in nbviewer here: outer.ipynb, or open it in binder here: outer.ipynb.

Setup

Add the Julia packages used in this demo. Change false to true in the following code block if you are using any of the following packages for the first time.

if false
    import Pkg
    Pkg.add([
        "BenchmarkTools"
        "InteractiveUtils"
        "LazyGrids"
        "LinearAlgebra"
    ])
end

Tell Julia to use the following packages. Run Pkg.add() in the preceding code block first, if needed.

using BenchmarkTools: @benchmark
using InteractiveUtils: versioninfo
using LazyGrids: btime
using LinearAlgebra: mul!

Overview of outer products

The outer product between two vectors x and y is simply x * y'.

This demo explores a couple ways of coding that operation.

We write each method as a function because the most reliable way to benchmark different methods is to use functions.

We are interested in the computation time, not the time spent allocating memory, so we use mutating mul! versions of all methods.

The built-in method:

f0!(out, x, y) = mul!(out, x, y');

Hand-coded double loop

function f1!(out, x, y)
    for j in 1:length(y)
        @simd for i in 1:length(x)
            @inbounds out[i,j] = x[i] * conj(y[j])
        end
    end
    return out
end
f1! (generic function with 1 method)

column times scalar

function f2!(out, x, y)
    for j in 1:length(y)
        @inbounds @. (@view out[:,j]) = x * conj(y[j])
    end
    return out
end
f2! (generic function with 1 method)

Using threads across columns

function f3!(out, x, y)
    Threads.@threads for j in 1:length(y)
        @inbounds @. (@view out[:,j]) = x * conj(y[j])
    end
    return out
end
f3! (generic function with 1 method)

Data for timing tests

M, N = 2^9, 2^10
T = ComplexF32
x = rand(T, M)
y = rand(T, N)
out = Matrix{T}(undef, M, N)
out2 = Matrix{T}(undef, M, N);

Verify the methods are equivalent:

@assert f0!(out,x,y) ≈ f1!(out2,x,y) # why is ≈ needed here?!
@assert f1!(out,x,y) == f2!(out2,x,y)
@assert f1!(out,x,y) == f3!(out2,x,y)

Benchmark the methods

The results will depend on the computer used, of course.

x*y'

t = @benchmark f0!($out, $x, $y)
timeu = t -> btime(t, unit=:μs)
t0 = timeu(t);

double loop

t = @benchmark f1!($out, $x, $y)
timeu = t -> btime(t, unit=:μs)
t1 = timeu(t);

column times scalar

t = @benchmark f2!($out, $x, $y)
t2 = timeu(t);

threads

t = @benchmark f3!($out, $x, $y)
t3 = timeu(t);

Result summary:

["built-in" t0; "double" t1; "column" t2; "thread" t3]
4×2 Matrix{String}:
 "built-in"  "time=267.7μs mem=0 alloc=0"
 "double"    "time=647.4μs mem=0 alloc=0"
 "column"    "time=147.7μs mem=0 alloc=0"
 "thread"    "time=150.7μs mem=608 alloc=7"

Remarks

With Julia 1.9 a 2017 iMac with 8 threads (4.2 GHz Quad-Core Intel i7), the results are

  • "time=235.7μs mem=0 alloc=0"
  • "time=153.9μs mem=0 alloc=0"
  • "time=153.0μs mem=0 alloc=0"
  • "time=54.1μs mem=4096 alloc=43"

Interestingly, the hand coded loop is faster than the built-in mul! for x * y'.

The results in github's cloud may differ, because it uses different CPUs and typically only one thread.

Reproducibility

This page was generated with the following version of Julia:

using InteractiveUtils: versioninfo
io = IOBuffer(); versioninfo(io); split(String(take!(io)), '\n')
11-element Vector{SubString{String}}:
 "Julia Version 1.11.2"
 "Commit 5e9a32e7af2 (2024-12-01 20:02 UTC)"
 "Build Info:"
 "  Official https://julialang.org/ release"
 "Platform Info:"
 "  OS: Linux (x86_64-linux-gnu)"
 "  CPU: 4 × AMD EPYC 7763 64-Core Processor"
 "  WORD_SIZE: 64"
 "  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)"
 "Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)"
 ""

And with the following package versions

import Pkg; Pkg.status()
Status `~/work/book-la-demo/book-la-demo/docs/Project.toml`
  [6e4b80f9] BenchmarkTools v1.5.0
  [aaaa29a8] Clustering v0.15.7
  [35d6a980] ColorSchemes v3.27.1
⌅ [3da002f7] ColorTypes v0.11.5
⌃ [c3611d14] ColorVectorSpace v0.10.0
⌅ [717857b8] DSP v0.7.10
  [72c85766] Demos v0.1.0 `~/work/book-la-demo/book-la-demo`
  [e30172f5] Documenter v1.8.0
  [4f61f5a4] FFTViews v0.3.2
  [7a1cc6ca] FFTW v1.8.0
  [587475ba] Flux v0.15.2
  [a09fc81d] ImageCore v0.10.5
  [71a99df6] ImagePhantoms v0.8.1
  [b964fa9f] LaTeXStrings v1.4.0
  [7031d0ef] LazyGrids v1.0.0
  [599c1a8e] LinearMapsAA v0.12.0
  [98b081ad] Literate v2.20.1
  [7035ae7a] MIRT v0.18.2
  [170b2178] MIRTjim v0.25.0
  [eb30cadb] MLDatasets v0.7.18
  [efe261a4] NFFT v0.13.5
  [6ef6ca0d] NMF v1.0.3
  [15e1cf62] NPZ v0.4.3
  [0b1bfda6] OneHotArrays v0.2.6
  [429524aa] Optim v1.10.0
  [91a5bcdd] Plots v1.40.9
  [f27b6e38] Polynomials v4.0.12
  [2913bbd2] StatsBase v0.34.4
  [d6d074c3] VideoIO v1.1.1
  [b77e0a4c] InteractiveUtils v1.11.0
  [37e2e46d] LinearAlgebra v1.11.0
  [44cfe95a] Pkg v1.11.0
  [9a3f8284] Random v1.11.0
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`

This page was generated using Literate.jl.