# SketchySVD.jl

Fast, memory-efficient streaming SVD for Julia.

SketchySVD.jl provides state-of-the-art algorithms for computing low-rank approximations of large matrices using randomized sketching techniques. Based on the SIAM paper by Tropp et al. (2019), this package enables:
- **Streaming SVD**: Process data that doesn't fit in memory
- **Randomized SVD**: Typically 10–50× faster than standard SVD with minimal accuracy loss
- **Multiple sketching methods**: Gaussian, Sparse, and SSRFT reduction maps
- **Online learning**: Track evolving low-rank structure in non-stationary data
## Why SketchySVD?

### 🚀 Fast
Randomized algorithms are orders of magnitude faster than standard SVD for large matrices:
```julia
# Standard SVD: ~30 seconds for a 10000×5000 matrix
U, S, V = svd(A)

# Randomized SVD: ~2 seconds with comparable accuracy
U, S, V = rsvd(A, 50)
```

### 💾 Memory Efficient
Process matrices larger than available RAM by streaming:
```julia
# Matrix: 100 GB, RAM: 16 GB → no problem!
sketchy = init_sketchy(100_000, 50_000, 100; ReduxMap=:sparse)
for j in 1:n
    x = load_column_from_disk(j)
    increment!(sketchy, x, j)
end
U, S, V = finalize(sketchy)
```

### 🎯 Accurate
Theoretical guarantees ensure controlled approximation error:
```julia
# Built-in error estimation
sketchy = init_sketchy(m, n, r; estimate_error=true)
# ... process data ...
U, S, V, est_err = finalize(sketchy)
println("Estimated relative error: $est_err")
```

### ⚡ Flexible
Choose the right algorithm for your problem:
- **Gaussian**: Best accuracy, general matrices
- **Sparse**: Fastest for large-scale problems
- **SSRFT**: Optimal for structured/periodic data
## Quick Start

### Installation
Install from the Julia package registry:
```julia
using Pkg
Pkg.add("SketchySVD")
```

Or install the development version:

```julia
Pkg.add(url="https://github.com/username/SketchySVD.jl")
```

### Your First Computation
Compute a rank-50 approximation in three lines:
```julia
using SketchySVD, LinearAlgebra

A = randn(5000, 2000)  # your matrix
U, S, V = rsvd(A, 50)  # randomized SVD
println("Approximation error: ", norm(A - U*Diagonal(S)*V') / norm(A))
```

### Streaming Large Data
Process data that doesn't fit in memory:
```julia
sketchy = init_sketchy(m, n, r)    # initialize
for j in 1:n
    x = load_column(j)             # get data
    increment!(sketchy, x, j)      # update sketch
end
U, S, V = finalize(sketchy)        # extract SVD
```

## Documentation Overview
### For New Users

- **Getting Started**: Installation, basic usage, parameter selection
- **Examples**: Complete examples for common applications
- **API Reference**: Function documentation
### For Advanced Users

- **Mathematical Theory**: Sketching framework and theoretical guarantees
- **Sketching Algorithms**: Detailed algorithm walkthrough
- **Dimension Reduction Maps**: Comparison of Gaussian, Sparse, and SSRFT
- **Randomized SVD**: Batch processing algorithms
## Key Algorithms

### Randomized SVD (Batch)

For matrices that fit in memory, use `rsvd()`:
```julia
U, S, V = rsvd(A, r)                 # basic usage
U, S, V = rsvd(A, r; p=10)           # with oversampling
U, S, V = rsvd(A, r; q=2)            # with power iterations
U, S, V = rsvd(A, r; redux=:sparse)  # sparse sketching
```

**Use when:** the full matrix is available, batch processing, quick prototyping.
### Sketchy SVD (Streaming)

For streaming data, use `init_sketchy()` + `increment!()` + `finalize()`:
```julia
sketchy = init_sketchy(m, n, r; ReduxMap=:sparse, γ=0.99)
for j in 1:n
    increment!(sketchy, x_j, j)
end
U, S, V = finalize(sketchy)
```

**Use when:** data arrives incrementally, memory is constrained, online learning.
## Usage Examples

### Example 1: Random Matrix Decomposition

See `scripts/random_matrix.jl` for a complete example that decomposes a random matrix and compares the results with the exact SVD.
```julia
using SketchySVD
using LinearAlgebra
using Test

# Generate a test matrix
m, n, r = 1000, 500, 50
A = randn(m, n)

# Initialize and compute
sketchy = init_sketchy(m=m, n=n, r=r; method=:ssrft)
full_increment!(sketchy, A)
finalize!(sketchy)

# Compare with the exact SVD
U_exact, Σ_exact, V_exact = svd(A)
error = norm(Σ_exact[1:r] - sketchy.Σ) / norm(Σ_exact[1:r])
println("Relative error in singular values: ", error)
```

### Example 2: Viscous Burgers Equation
See `scripts/burgers.jl` for an application to fluid dynamics data.
```julia
using SketchySVD
using CSV, DataFrames

# Load data
data = CSV.read("scripts/data/viscous_burgers1d_states.csv", DataFrame)
A = Matrix(data)
m, n = size(A)

# Compute the sketchy SVD
r = 10
sketchy = init_sketchy(m=m, n=n, r=r; method=:gauss)
full_increment!(sketchy, A)
finalize!(sketchy)

# Analyze dominant modes
println("Top 5 singular values: ", sketchy.Σ[1:5])
```

## Sketching Methods
### Gaussian Random Projection (`:gauss`)
Projects data using Gaussian random matrices. Provides good accuracy with moderate computational cost.
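To make the idea concrete, here is a minimal, self-contained sketch of Gaussian-projection randomized SVD in plain `LinearAlgebra`, following Halko et al. (2011). The helper `gauss_rsvd` is written only for illustration and is not this package's implementation:

```julia
using LinearAlgebra, Random

# Minimal randomized SVD via Gaussian random projection (Halko et al., 2011):
# sketch the range of A, orthonormalize, then solve a small SVD.
function gauss_rsvd(A::AbstractMatrix, r::Int; p::Int=10)
    m, n = size(A)
    Ω = randn(n, r + p)          # Gaussian test matrix (r + p oversampled columns)
    Q = Matrix(qr(A * Ω).Q)      # orthonormal basis for the sketched range of A
    F = svd(Q' * A)              # SVD of the small (r+p)×n projected matrix
    return (Q * F.U)[:, 1:r], F.S[1:r], F.V[:, 1:r]
end

Random.seed!(0)
A = randn(200, 5) * randn(5, 100)   # exactly rank 5
U, S, V = gauss_rsvd(A, 5)
println(norm(A - U * Diagonal(S) * V') / norm(A))  # near machine precision
```

Because `A` has exact rank 5 and the sketch uses 15 random directions, the approximation error is essentially zero; for general matrices the error is governed by the decay of the trailing singular values.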
### Sparse Random Projection (`:sparse`)
Uses sparse random matrices for projection. Faster than Gaussian for very large matrices with similar accuracy.
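As a package-independent illustration of the idea, a sparse sign matrix with a handful of random ±1 entries per column can stand in for the dense Gaussian test matrix. The helper `sparse_test_matrix` below is hypothetical, written only for this sketch:

```julia
using LinearAlgebra, SparseArrays, Random

# Sparse sign test matrix: ζ random ±1 entries per column, scaled by 1/√ζ,
# so that only ζ·k values need to be stored and applied.
function sparse_test_matrix(n::Int, k::Int; ζ::Int=8)
    Is, Js, Vs = Int[], Int[], Float64[]
    for j in 1:k, i in randperm(n)[1:ζ]      # ζ distinct row indices per column
        push!(Is, i); push!(Js, j)
        push!(Vs, (rand(Bool) ? 1.0 : -1.0) / sqrt(ζ))
    end
    return sparse(Is, Js, Vs, n, k)
end

Random.seed!(0)
A = randn(500, 5) * randn(5, 300)            # exactly rank 5
Ω = sparse_test_matrix(300, 15)              # 15-column sparse sketch
Q = Matrix(qr(Matrix(A * Ω)).Q)              # orthonormal basis for range(AΩ)
println(norm(A - Q * (Q' * A)) / norm(A))    # near zero: the range of A is captured
```

Applying `Ω` costs time proportional to its nonzeros rather than to `n·k`, which is why sparse maps win for very large problems.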
### Subsampled Randomized Fourier Transform (`:ssrft`)
Leverages FFT for fast random projections. Most efficient for structured matrices.
## Performance Tips

- **Choose an appropriate rank**: set `r` to capture the desired accuracy while minimizing computation
- **Adjust oversampling**: increase for better accuracy, decrease for speed
- **Select the optimal method**:
  - use `:ssrft` for structured/large matrices
  - use `:sparse` for very large data
  - use `:gauss` for general matrices
- **Streaming data**: use incremental updates rather than batch processing when data arrives over time
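The oversampling/power-iteration trade-off can be seen with a small stdlib-only experiment; `range_basis` and its parameters `k` and `q` below are local to this sketch, not package API:

```julia
using LinearAlgebra, Random

# Power iterations (q) sharpen the sketch when singular values decay slowly.
range_basis(A, k; q=0) = begin
    Y = A * randn(size(A, 2), k)         # k-column Gaussian sketch
    for _ in 1:q
        Y = A * (A' * Y)                 # one power iteration: Y ← (AA')Y
    end
    Matrix(qr(Y).Q)                      # orthonormal basis for the sketch
end

Random.seed!(1)
m, n, r = 300, 200, 10
U0 = Matrix(qr(randn(m, n)).Q)
V0 = Matrix(qr(randn(n, n)).Q)
A = U0 * Diagonal((1:n) .^ -0.5) * V0'   # slowly decaying spectrum

for q in (0, 2)
    Q = range_basis(A, r + 10; q=q)      # rank r with oversampling 10
    println("q = $q: relative error ", norm(A - Q * (Q' * A)) / norm(A))
end
```

With a slowly decaying spectrum, two power iterations visibly reduce the error at the cost of two extra passes over the data; on matrices with a sharp spectral gap, `q=0` is usually sufficient.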
## Testing
Run the test suite:
```julia
using Pkg
Pkg.test("SketchySVD")
```

Or run individual examples:

```julia
include("scripts/random_matrix.jl")
include("scripts/burgers.jl")
```

## Project Structure
```
SketchySVD.jl/
├── src/
│   ├── SketchySVD.jl        # Main module
│   ├── sketchy.jl           # Core data structures
│   ├── increment.jl         # Incremental update functions
│   ├── finalize.jl          # Finalization and SVD computation
│   └── redux/
│       ├── dimredux.jl      # Dimension reduction interface
│       ├── gauss.jl         # Gaussian sketching
│       ├── sparse.jl        # Sparse sketching
│       └── ssrft.jl         # SSRFT sketching
├── scripts/
│   ├── random_matrix.jl     # Example: random matrix SVD
│   ├── burgers.jl           # Example: Burgers equation
│   └── data/
│       └── viscous_burgers1d_states.csv
└── Project.toml
```

## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## References
- J. A. Tropp, A. Yurtsever, M. Udell, and V. Cevher, “Streaming Low-Rank Matrix Approximation with an Application to Scientific Simulation,” SIAM J. Sci. Comput., vol. 41, no. 4, pp. A2430–A2463, Jan. 2019, doi: 10.1137/18M1201068.
- N. Halko, P. G. Martinsson, and J. A. Tropp, “Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions,” SIAM Rev., vol. 53, no. 2, pp. 217–288, Jan. 2011, doi: 10.1137/090771806.
## Contact
Tomoki Koike (tkoike@gatech.edu)