Vector dot product
This example illustrates different ways of computing vector dot products using the Julia language.
This page comes from a single Julia file: dot.jl. You can access the source code for such Julia documentation using the 'Edit on GitHub' link in the top right. You can view the corresponding notebook in nbviewer here: dot.ipynb, or open it in binder here: dot.ipynb.
Setup
Add the Julia packages used in this demo. Change false to true in the following code block if you are using any of the following packages for the first time.
if false
    import Pkg
    Pkg.add([
        "BenchmarkTools",
        "InteractiveUtils",
        "LazyGrids",
        "LinearAlgebra",
    ])
end
Tell Julia to use the following packages. Run Pkg.add() in the preceding code block first, if needed.
using BenchmarkTools: @benchmark
using InteractiveUtils: versioninfo
using LazyGrids: btime
using LinearAlgebra: dot
Overview of dot products
The dot product between two vectors is such a basic method in linear algebra that, of course, Julia has a built-in function dot for it. In practice one should simply call that dot method.
This demo explores other ways of coding the dot product, to illustrate, in a simple setting, techniques for writing efficient code.
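Before examining the alternatives, note Julia's convention: dot conjugates its first argument, i.e., dot(y, x) == sum(conj(y) .* x). A quick check with a tiny complex vector:

```julia
using LinearAlgebra: dot

# Julia's dot conjugates its FIRST argument:
# dot(y, x) == sum(conj(y) .* x)
a = [im, 0]
b = [1, 0]
@assert dot(a, b) == -im    # conj(im) * 1
@assert dot(b, a) == im     # conjugation on the other side
```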
We write each method as a function because the most reliable way to benchmark different methods is to use functions.
The built-in dot method:
f1(x,y) = dot(y,x);
An equivalent method using the adjoint '
It can be written y' * x or *(y', x). By checking @which *(y', x), one can verify that these forms all call dot.
f2(x,y) = y'x;
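As a sanity check, y' is a lazy Adjoint wrapper (not a copied array), and the product y' * x agrees with dot(y, x); a minimal sketch:

```julia
using LinearAlgebra

y = ComplexF32[1+im, 2-im, 0+3im]
x = ComplexF32[2, 1+im, -im]

@assert y' isa Adjoint       # y' is a lazy wrapper, not a copy
@assert y' * x == dot(y, x)  # *(y', x) dispatches to dot
```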
Using sum with a vector conjugate
This is suboptimal because it must allocate memory for conj(y).
f3(x,y) = sum(conj(y) .* x); # must allocate "conj(y)"
Using zip and sum with a function argument
This approach avoids the needless allocation.
f4(x,y) = sum(z -> z[1] * conj(z[2]), zip(x,y));
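Each z produced by zip(x,y) is the tuple (x[i], y[i]), so the conjugate lands on the y component, matching dot(y, x). A quick check on a small example (f4 repeated here for self-containment):

```julia
using LinearAlgebra: dot

# each z from zip(x, y) is the tuple (x[i], y[i]),
# so conj applies to the y component, matching dot(y, x)
f4(x,y) = sum(z -> z[1] * conj(z[2]), zip(x,y))

x = ComplexF32[1+2im, 3-im]
y = ComplexF32[2-im, 1+im]
@assert f4(x, y) ≈ dot(y, x)
```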
A basic for loop like one would write in a low-level language
function f5(x,y)
    accum = zero(promote_type(eltype(x), eltype(y)))
    for i in 1:length(x)
        accum += x[i] * conj(y[i])
    end
    return accum
end;
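A quick correctness check of the loop against the built-in, using small integer-valued data so the result is exact (f5 repeated here for self-containment):

```julia
using LinearAlgebra: dot

function f5(x,y)
    accum = zero(promote_type(eltype(x), eltype(y)))
    for i in 1:length(x)
        accum += x[i] * conj(y[i])
    end
    return accum
end

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
@assert f5(x, y) == dot(y, x) == 32.0   # 4 + 10 + 18
```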
An advanced for loop that checks the lengths up front and then uses @inbounds and SIMD operations
function f6(x,y)
    accum = zero(promote_type(eltype(x), eltype(y)))
    @boundscheck length(x) == length(y) || throw("incompatible")
    @simd for i in 1:length(x)
        @inbounds accum += x[i] * conj(y[i])
    end
    return accum
end;
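With the up-front check, mismatched lengths raise an error instead of silently reading out of bounds; a small sketch (f6 repeated here for self-containment):

```julia
function f6(x,y)
    accum = zero(promote_type(eltype(x), eltype(y)))
    @boundscheck length(x) == length(y) || throw("incompatible")
    @simd for i in 1:length(x)
        @inbounds accum += x[i] * conj(y[i])
    end
    return accum
end

# mismatched lengths are caught by the up-front check
caught = try
    f6(rand(3), rand(4))
    false
catch
    true
end
@assert caught
```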
The Julia fallback method (from the source code as of v1.8.1)
This code is what is used for general AbstractArray types.
function f7(x,y)
    accum = zero(promote_type(eltype(x), eltype(y)))
    @boundscheck length(x) == length(y) || throw("incompatible")
    for (ix,iy) in zip(eachindex(x), eachindex(y))
        @inbounds accum += x[ix] * conj(y[iy]) # same as dot(y[iy], x[ix])
    end
    return accum
end;
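Because it uses eachindex and zip rather than 1:length(x), this fallback works for any AbstractArray, including ranges; a small sketch (f7 repeated here for self-containment):

```julia
using LinearAlgebra: dot

function f7(x,y)
    accum = zero(promote_type(eltype(x), eltype(y)))
    @boundscheck length(x) == length(y) || throw("incompatible")
    for (ix,iy) in zip(eachindex(x), eachindex(y))
        @inbounds accum += x[ix] * conj(y[iy])
    end
    return accum
end

x = 1:4                    # a range is an AbstractArray too
y = [1.0, 0.0, 2.0, 0.0]
@assert f7(x, y) == dot(y, x) == 7.0   # 1*1 + 2*3
```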
Data for timing tests
N = 2^16; x = rand(ComplexF32, N); y = rand(ComplexF32, N)
65536-element Vector{ComplexF32}:
0.07220274f0 + 0.12917209f0im
0.08736408f0 + 0.3162822f0im
0.3744228f0 + 0.7122413f0im
0.88951254f0 + 0.17039335f0im
0.1378656f0 + 0.6426035f0im
0.08065158f0 + 0.35964632f0im
0.18257457f0 + 0.9571821f0im
0.08358139f0 + 0.5920267f0im
0.3477599f0 + 0.26871932f0im
0.43255037f0 + 0.6908962f0im
⋮
0.80309445f0 + 0.83026934f0im
0.22034061f0 + 0.48742265f0im
0.10962021f0 + 0.64986295f0im
0.07547808f0 + 0.8553651f0im
0.94202656f0 + 0.22452813f0im
0.72354704f0 + 0.86859953f0im
0.74631196f0 + 0.14737839f0im
0.5308883f0 + 0.9761387f0im
0.5186516f0 + 0.7543823f0im
Verify the methods are equivalent
@assert f1(x,y) == f2(x,y) ≈ f3(x,y) ≈ f4(x,y) ≈ f5(x,y) ≈ f6(x,y) ≈ f7(x,y)
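Exact equality == is used only between f1 and f2 because both reduce to the same underlying dot call; the other methods accumulate in a different order, and floating-point addition is not associative, so ≈ is the appropriate comparison. A minimal illustration of the non-associativity:

```julia
# floating-point addition is not associative,
# so reordered accumulations agree only approximately
a, b, c = 1f0, -1f0, 1f-8
@assert (a + b) + c != a + (b + c)   # 1f-8 vs 0f0
@assert (a + b) + c ≈ 1f-8
```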
Benchmark the methods
The results will depend on the computer used, of course.
dot(y,x)
t = @benchmark f1($x,$y)
timeu = t -> btime(t, unit=:μs)
timeu(t)
"time=13.0μs mem=0 alloc=0"
y'x
t = @benchmark f2($x,$y)
timeu(t)
"time=13.0μs mem=0 alloc=0"
sum with conj()
t = @benchmark f3($x,$y)
timeu(t)
"time=105.3μs mem=1048704 alloc=6"
zip sum
t = @benchmark f4($x,$y)
timeu(t)
"time=67.9μs mem=0 alloc=0"
basic loop
t = @benchmark f5($x,$y)
timeu(t)
"time=61.0μs mem=0 alloc=0"
fancy loop with @inbounds & @simd
t = @benchmark f6($x,$y)
timeu(t)
"time=61.1μs mem=0 alloc=0"
zip accum loop
t = @benchmark f7($x,$y)
timeu(t)
"time=60.9μs mem=0 alloc=0"
Remarks
The built-in dot method is the fastest. Behind the scenes it calls BLAS.dot, which is highly optimized because it uses CPU-specific assembly code based on single-instruction, multiple-data (SIMD) instructions to perform, say, 4 multiplies in a single instruction. Thus the basic loop is several times slower than dot().
Sometimes we can speed up code by promising the Julia compiler that array indexing operations like x[i] are valid, by adding the @inbounds macro.
Depending on the CPU, using @simd and @inbounds can lead to speeds close to that of dot.
The promote_type function ensures that the accumulator uses the wider of the two argument element types.
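For example, a quick check of the promotion behavior used by the loop-based methods above:

```julia
# promote_type picks a type that can represent both element types,
# so the accumulator never loses precision relative to the inputs
@assert promote_type(Float32, Float64) == Float64
@assert promote_type(ComplexF32, Float64) == ComplexF64
```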
Reproducibility
This page was generated with the following version of Julia:
using InteractiveUtils: versioninfo
io = IOBuffer(); versioninfo(io); split(String(take!(io)), '\n')
11-element Vector{SubString{String}}:
"Julia Version 1.11.2"
"Commit 5e9a32e7af2 (2024-12-01 20:02 UTC)"
"Build Info:"
" Official https://julialang.org/ release"
"Platform Info:"
" OS: Linux (x86_64-linux-gnu)"
" CPU: 4 × AMD EPYC 7763 64-Core Processor"
" WORD_SIZE: 64"
" LLVM: libLLVM-16.0.6 (ORCJIT, znver3)"
"Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)"
""
And with the following package versions:
import Pkg; Pkg.status()
Status `~/work/book-la-demo/book-la-demo/docs/Project.toml`
[6e4b80f9] BenchmarkTools v1.5.0
[aaaa29a8] Clustering v0.15.7
[35d6a980] ColorSchemes v3.27.1
⌅ [3da002f7] ColorTypes v0.11.5
⌃ [c3611d14] ColorVectorSpace v0.10.0
⌅ [717857b8] DSP v0.7.10
[72c85766] Demos v0.1.0 `~/work/book-la-demo/book-la-demo`
[e30172f5] Documenter v1.8.0
[4f61f5a4] FFTViews v0.3.2
[7a1cc6ca] FFTW v1.8.0
[587475ba] Flux v0.15.2
[a09fc81d] ImageCore v0.10.5
[71a99df6] ImagePhantoms v0.8.1
[b964fa9f] LaTeXStrings v1.4.0
[7031d0ef] LazyGrids v1.0.0
[599c1a8e] LinearMapsAA v0.12.0
[98b081ad] Literate v2.20.1
[7035ae7a] MIRT v0.18.2
[170b2178] MIRTjim v0.25.0
[eb30cadb] MLDatasets v0.7.18
[efe261a4] NFFT v0.13.5
[6ef6ca0d] NMF v1.0.3
[15e1cf62] NPZ v0.4.3
[0b1bfda6] OneHotArrays v0.2.6
[429524aa] Optim v1.10.0
[91a5bcdd] Plots v1.40.9
[f27b6e38] Polynomials v4.0.12
[2913bbd2] StatsBase v0.34.4
[d6d074c3] VideoIO v1.1.1
[b77e0a4c] InteractiveUtils v1.11.0
[37e2e46d] LinearAlgebra v1.11.0
[44cfe95a] Pkg v1.11.0
[9a3f8284] Random v1.11.0
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`
This page was generated using Literate.jl.