Vector dot product

This example illustrates different ways of computing vector dot products using the Julia language.

This page comes from a single Julia file: dot.jl.

Setup

Add the Julia packages used in this demo. Change false to true in the following code block if you are using any of these packages for the first time.

if false
    import Pkg
    Pkg.add([
        "BenchmarkTools"
        "InteractiveUtils"
        "LazyGrids"
        "LinearAlgebra"
    ])
end

Tell Julia to use the following packages. Run Pkg.add() in the preceding code block first, if needed.

using BenchmarkTools: @benchmark
using InteractiveUtils: versioninfo
using LazyGrids: btime
using LinearAlgebra: dot

Overview of dot products

The dot product between two vectors is such a basic operation in linear algebra that, of course, Julia has a built-in function dot for it.

In practice one should simply call that dot method.

This demo explores other ways of coding the dot product, to illustrate, in a simple setting, techniques for writing efficient code.

We write each method as a function because the most reliable way to benchmark different methods is to use functions.
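
For instance (a minimal sketch with hypothetical vectors a and b, separate from the timing results below), code that reads non-constant global variables is type unstable and benchmarks poorly, whereas the same computation passed through function arguments does not have that problem:

a = rand(10); b = rand(10) # hypothetical small test vectors
g() = sum(a[i] * b[i] for i in eachindex(a)) # reads untyped globals: slower
h(v, w) = sum(v[i] * w[i] for i in eachindex(v)) # arguments have concrete types: faster
# @benchmark g()       # would include global-variable overhead
# @benchmark h($a, $b) # the $ interpolation times only the computation itself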

The built-in dot method:

f1(x,y) = dot(y,x);

An equivalent method using the adjoint operator '

The dot product can be written y' * x or, equivalently, *(y', x). Checking @which *(y', x) verifies that these forms all call dot.

f2(x,y) = y'x;
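
The @which check mentioned above can be done in the REPL as follows (a sketch; @which comes from InteractiveUtils, and the exact method displayed may vary with the Julia version):

using InteractiveUtils: @which
u = rand(ComplexF32, 4); v = rand(ComplexF32, 4) # hypothetical small test vectors
@which v' * u   # reports the LinearAlgebra method for (adjoint vector) * vector
@which *(v', u) # the same method, which forwards to dot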

Using sum with vector conjugate

This is suboptimal because it must allocate memory for the temporary arrays conj(y) and conj(y) .* x.

f3(x,y) = sum(conj(y) .* x); # allocates temporary arrays
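
One way to see those temporary allocations directly (a sketch using the @allocated macro from Base, with hypothetical small vectors) is:

u = rand(ComplexF32, 1000); v = rand(ComplexF32, 1000) # hypothetical test data
f3(u, v)            # call once so compilation is not measured
@allocated f3(u, v) # nonzero: bytes for conj(v) and the elementwise product
@allocated f1(u, v) # expected to be 0, consistent with the benchmarks below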

Using zip and sum with a function argument

This approach avoids the needless allocation.

f4(x,y) = sum(z -> z[1] * conj(z[2]), zip(x,y));
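
An equivalent allocation-free formulation (a sketch, not benchmarked below) applies mapreduce to both vectors directly; it maps the two-argument function over corresponding elements of x and y and sums the results, just like the zip version above:

f4b(x,y) = mapreduce((xi,yi) -> xi * conj(yi), +, x, y); # hypothetical variant of f4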

A basic for loop like one would write in a low-level language

function f5(x,y)
    accum = zero(promote_type(eltype(x), eltype(y)))
    for i in 1:length(x)
        accum += x[i] * conj(y[i])
    end
    return accum
end;

An advanced for loop that checks the lengths once up front, then skips per-element bounds checking via @inbounds and uses SIMD operations

function f6(x,y)
    accum = zero(promote_type(eltype(x), eltype(y)))
    @boundscheck length(x) == length(y) || throw(DimensionMismatch("incompatible lengths"))
    @simd for i in 1:length(x)
        @inbounds accum += x[i] * conj(y[i])
    end
    return accum
end;

The Julia fallback method (from source code as of v1.8.1)

This is the code used for general AbstractArray types.

function f7(x,y)
    accum = zero(promote_type(eltype(x), eltype(y)))
    @boundscheck length(x) == length(y) || throw(DimensionMismatch("incompatible lengths"))
    for (ix,iy) in zip(eachindex(x), eachindex(y))
        @inbounds accum += x[ix] * conj(y[iy]) # same as dot(y[iy], x[ix])
    end
    return accum
end;

Data for timing tests

N = 2^16; x = rand(ComplexF32, N); y = rand(ComplexF32, N)
65536-element Vector{ComplexF32}:
  0.14947855f0 + 0.021087527f0im
   0.5866981f0 + 0.76371014f0im
   0.7822265f0 + 0.6409512f0im
 0.099886596f0 + 0.37353855f0im
 0.035921574f0 + 0.9762154f0im
   0.1824488f0 + 0.34989107f0im
   0.5055843f0 + 0.3756556f0im
   0.5917391f0 + 0.5952088f0im
  0.38995582f0 + 0.31110466f0im
   0.7519869f0 + 0.40369594f0im
               ⋮
  0.53670263f0 + 0.09900993f0im
  0.95800006f0 + 0.77594686f0im
   0.8010514f0 + 0.478539f0im
  0.74888307f0 + 0.67354673f0im
 0.027859032f0 + 0.5965502f0im
   0.9826501f0 + 0.08740562f0im
  0.14444572f0 + 0.9518465f0im
   0.8514663f0 + 0.2091958f0im
  0.20708722f0 + 0.49118364f0im

Verify the methods are equivalent

@assert f1(x,y) == f2(x,y) ≈ f3(x,y) ≈ f4(x,y) ≈ f5(x,y) ≈ f6(x,y) ≈ f7(x,y)

Benchmark the methods

The results will depend on the computer used, of course.

dot(y,x)

t = @benchmark f1($x,$y)
timeu = t -> btime(t, unit=:μs)
timeu(t)
"time=13.0μs mem=0 alloc=0"

y'x

t = @benchmark f2($x,$y)
timeu(t)
"time=13.0μs mem=0 alloc=0"

sum with conj()

t = @benchmark f3($x,$y)
timeu(t)
"time=124.1μs mem=1048672 alloc=4"

zip sum

t = @benchmark f4($x,$y)
timeu(t)
"time=68.0μs mem=0 alloc=0"

basic loop

t = @benchmark f5($x,$y)
timeu(t)
"time=61.3μs mem=0 alloc=0"

fancy loop with @inbounds & @simd

t = @benchmark f6($x,$y)
timeu(t)
"time=61.3μs mem=0 alloc=0"

zip accum loop

t = @benchmark f7($x,$y)
timeu(t)
"time=61.0μs mem=0 alloc=0"

Remarks

The built-in dot method is the fastest. Behind the scenes it calls a highly optimized BLAS routine (BLAS.dotc for the complex vectors used here), which uses CPU-specific single-instruction, multiple-data (SIMD) instructions to perform, say, 4 multiplies in a single instruction. Thus the basic loop is several times slower than dot().
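
One can spot-check this dispatch for the complex data used here (a sketch, assuming the 5-argument BLAS.dotc wrapper provided by LinearAlgebra):

using LinearAlgebra: BLAS
dot(y, x) ≈ BLAS.dotc(N, y, 1, x, 1) # dotc is the conjugating complex dot-product kernel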

Sometimes we can speed up code by adding the @inbounds macro, which promises the Julia compiler that array indexing operations like x[i] are within bounds.

Depending on the CPU, using @simd and @inbounds can lead to speeds close to that of dot.

The promote_type function ensures that the accumulator has a type that can represent products of elements from both arguments.
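
For example (a small sketch of how promote_type behaves):

promote_type(Float32, Float64)    # Float64
promote_type(Float64, ComplexF32) # ComplexF64
zero(promote_type(ComplexF32, ComplexF32)) # 0.0f0 + 0.0f0im, the accumulator used in the loops above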

Reproducibility

This page was generated with the following version of Julia:

using InteractiveUtils: versioninfo
io = IOBuffer(); versioninfo(io); split(String(take!(io)), '\n')
12-element Vector{SubString{String}}:
 "Julia Version 1.10.1"
 "Commit 7790d6f0641 (2024-02-13 20:41 UTC)"
 "Build Info:"
 "  Official https://julialang.org/ release"
 "Platform Info:"
 "  OS: Linux (x86_64-linux-gnu)"
 "  CPU: 4 × AMD EPYC 7763 64-Core Processor"
 "  WORD_SIZE: 64"
 "  LIBM: libopenlibm"
 "  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)"
 "Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)"
 ""

And with the following package versions:

import Pkg; Pkg.status()
Status `~/work/book-la-demo/book-la-demo/docs/Project.toml`
  [6e4b80f9] BenchmarkTools v1.5.0
  [aaaa29a8] Clustering v0.15.7
  [35d6a980] ColorSchemes v3.24.0
  [3da002f7] ColorTypes v0.11.4
⌅ [c3611d14] ColorVectorSpace v0.9.10
  [717857b8] DSP v0.7.9
  [72c85766] Demos v0.1.0 `~/work/book-la-demo/book-la-demo`
  [e30172f5] Documenter v1.2.1
  [4f61f5a4] FFTViews v0.3.2
  [7a1cc6ca] FFTW v1.8.0
  [587475ba] Flux v0.14.12
⌅ [a09fc81d] ImageCore v0.9.4
  [71a99df6] ImagePhantoms v0.7.2
  [b964fa9f] LaTeXStrings v1.3.1
  [7031d0ef] LazyGrids v0.5.0
  [599c1a8e] LinearMapsAA v0.11.0
  [98b081ad] Literate v2.16.1
  [7035ae7a] MIRT v0.17.0
  [170b2178] MIRTjim v0.23.0
  [eb30cadb] MLDatasets v0.7.14
  [efe261a4] NFFT v0.13.3
  [6ef6ca0d] NMF v1.0.2
  [15e1cf62] NPZ v0.4.3
  [0b1bfda6] OneHotArrays v0.2.5
  [429524aa] Optim v1.9.2
  [91a5bcdd] Plots v1.40.1
  [f27b6e38] Polynomials v4.0.6
  [2913bbd2] StatsBase v0.34.2
  [d6d074c3] VideoIO v1.0.9
  [b77e0a4c] InteractiveUtils
  [37e2e46d] LinearAlgebra
  [44cfe95a] Pkg v1.10.0
  [9a3f8284] Random
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated`

This page was generated using Literate.jl.