Mathematical Background
Introduction to Sketching
Sketching is a dimensionality reduction technique that uses random projections to compress large matrices while preserving their essential structure. The key idea is to multiply the large matrix by random test matrices, producing much smaller "sketches" that capture the important features of the original data.
The Streaming Low-Rank Approximation Problem
Given a matrix $A \in \mathbb{R}^{m \times n}$ that arrives as a stream of columns $\{a_1, a_2, \ldots, a_n\}$, we want to compute a rank-$r$ approximation:
\[A \approx U \Sigma V^T\]
where:
- $U \in \mathbb{R}^{m \times r}$ contains left singular vectors
- $\Sigma \in \mathbb{R}^{r \times r}$ is diagonal with singular values
- $V \in \mathbb{R}^{n \times r}$ contains right singular vectors
The challenge is to compute this approximation incrementally as columns arrive, without storing the entire matrix $A$.
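For reference, the batch (offline) solution is a truncated SVD of the fully assembled matrix. A minimal NumPy sketch of that baseline, with illustrative sizes not taken from [1]:

```python
import numpy as np

def truncated_svd(A, r):
    """Rank-r truncated SVD: the offline baseline that streaming
    sketching approximates without ever storing A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :r], np.diag(s[:r]), Vt[:r, :].T

# Illustrative example: best rank-5 approximation of a 1000 x 200 matrix.
A = np.random.randn(1000, 200)
U, Sigma, V = truncated_svd(A, r=5)
A_r = U @ Sigma @ V.T  # ||A - A_r||_F is minimal among rank-5 matrices
```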
Sketch-and-Solve Framework
The SketchySVD algorithm [1] maintains four sketches:
- Range sketch $Y \in \mathbb{R}^{m \times k}$: Captures the column space of $A$
- Corange sketch $X \in \mathbb{R}^{k \times n}$: Captures the row space of $A$
- Core sketch $Z \in \mathbb{R}^{s \times s}$: Captures the action of $A$ between the column and row subspaces, yielding a tighter approximation
- Error sketch $E \in \mathbb{R}^{q \times n}$: Estimates approximation error (optional)
These sketches are updated incrementally as each column $a_j$ arrives:
\[\begin{aligned} X_{:,j} &\leftarrow X_{:,j} + \Xi a_j \\ Y &\leftarrow Y + a_j \cdot (\Omega^T)_{j,:} \\ Z &\leftarrow Z + (\Phi a_j) \cdot (\Psi^T)_{j,:} \\ E_{:,j} &\leftarrow E_{:,j} + \Theta a_j \end{aligned}\]
where $\Xi \in \mathbb{R}^{k \times m}$, $\Omega \in \mathbb{R}^{k \times n}$, $\Phi \in \mathbb{R}^{s \times m}$, and $\Psi \in \mathbb{R}^{s \times n}$ are random test matrices (dimension reduction maps), and $\Theta \in \mathbb{R}^{q \times m}$ is a Gaussian test matrix for error estimation.
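A minimal NumPy sketch of these linear updates, assuming dense Gaussian test matrices (the simplest choice; structured maps reduce cost) and illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 200          # ambient dimensions
k, s, q = 21, 43, 10      # sketch sizes (k <= s <= min(m, n))

# Dense Gaussian dimension reduction maps (shapes forced by the updates above).
Xi    = rng.standard_normal((k, m))   # corange map
Omega = rng.standard_normal((k, n))   # range map
Phi   = rng.standard_normal((s, m))   # core map, left
Psi   = rng.standard_normal((s, n))   # core map, right
Theta = rng.standard_normal((q, m))   # Gaussian error-estimation map

X = np.zeros((k, n)); Y = np.zeros((m, k))
Z = np.zeros((s, s)); E = np.zeros((q, n))

def process_column(j, a):
    """Linear update of all four sketches when column a_j arrives."""
    X[:, j] += Xi @ a
    Y[:]    += np.outer(a, Omega[:, j])       # a_j * (Omega^T)_{j,:}
    Z[:]    += np.outer(Phi @ a, Psi[:, j])   # (Phi a_j) * (Psi^T)_{j,:}
    E[:, j] += Theta @ a
```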
Dimension Selection
The sketching dimensions are chosen based on the target rank $r$:
Without storage budget (Equation 5.3 in [1]):
\[\begin{aligned} k &= 4r + \alpha \\ s &= 2k + \alpha \\ q &= m \end{aligned}\]
With storage budget $T$ (Equation 5.5-5.6 in [1]):
\[\begin{aligned} k &= \left\lfloor \frac{\sqrt{(m+n+4\alpha)^2 + 16(T-\alpha^2)} - (m+n+4\alpha)}{8} \right\rfloor \\ s &= \left\lfloor \sqrt{T - k(m+n)} \right\rfloor \end{aligned}\]
where $\alpha = 1$ for real matrices and $\alpha = 0$ for complex matrices.
The constraint $k \leq s \leq \min\{m,n\}$ must be satisfied.
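The selection rules translate directly into code. A sketch, assuming the conventions above; the function name and example numbers are illustrative:

```python
import math

def sketch_dims(m, n, r=None, T=None, alpha=1):
    """Pick sketch sizes k, s; alpha = 1 for real data, 0 for complex."""
    if T is None:                      # no storage budget
        k = 4 * r + alpha
        s = 2 * k + alpha
    else:                              # storage budget of T scalars
        b = m + n + 4 * alpha
        k = math.floor((math.sqrt(b**2 + 16 * (T - alpha**2)) - b) / 8)
        s = math.isqrt(T - k * (m + n))
    assert k <= s <= min(m, n), "need k <= s <= min(m, n)"
    return k, s

print(sketch_dims(1000, 200, r=5))       # -> (21, 43)
print(sketch_dims(1000, 200, T=60_000))  # -> (43, 91)
```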
Forgetting Factors
For time-varying data, SketchySVD supports exponential forgetting through parameters $\eta$ (forgetting factor) and $\nu$ (scaling factor):
\[\begin{aligned} X_{:,j} &\leftarrow \eta X_{:,j} + \nu \Xi a_j \\ Y &\leftarrow \eta Y + \nu a_j \cdot (\Omega^T)_{j,:} \\ Z &\leftarrow \eta Z + \nu (\Phi a_j) \cdot (\Psi^T)_{j,:} \end{aligned}\]
Typically:
- $\eta = 1, \nu = 1$: Standard incremental update (stationary data)
- $\eta < 1, \nu = 1$: Exponential forgetting (non-stationary data)
- $\eta = 0, \nu = 1$: Sliding window
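Reusing the names from the update snippet above, the forgetting variant is a small modification (a hypothetical helper, shown for the three main sketches):

```python
def process_column_forgetting(j, a, eta=1.0, nu=1.0):
    """Sketch update with forgetting factor eta and scaling factor nu."""
    X[:, j] = eta * X[:, j] + nu * (Xi @ a)
    Y[:]    = eta * Y + nu * np.outer(a, Omega[:, j])
    Z[:]    = eta * Z + nu * np.outer(Phi @ a, Psi[:, j])
```

Note that $\eta$ rescales the entire $Y$ and $Z$ sketches at every step, so the contribution of an old column decays geometrically with the number of subsequent updates.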
Finalization Step
After all columns have been processed, the SVD is recovered from the sketches:
Orthonormalize sketches:
\[Q_Y = \text{orth}(Y), \quad Q_X = \text{orth}(X^T)\]
Form core matrix, using Moore-Penrose pseudoinverses since $\Phi Q_Y$ and $\Psi Q_X$ are tall $s \times k$ matrices:
\[C = (\Phi Q_Y)^{\dagger} \, Z \, \left((\Psi Q_X)^T\right)^{\dagger}\]
Compute SVD of core:
\[C = \tilde{U} \tilde{\Sigma} \tilde{V}^T\]
Recover final approximation:
\[U = Q_Y \tilde{U}_{:,1:r}, \quad \Sigma = \tilde{\Sigma}_{1:r,1:r}, \quad V = Q_X \tilde{V}_{:,1:r}\]
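A compact NumPy version of the finalization, assuming the sketches and maps from the earlier snippet; the pseudoinverses are applied through least-squares solves rather than formed explicitly:

```python
def finalize(Y, X, Z, Phi, Psi, r):
    """Recover a rank-r SVD (U, Sigma, V) from the sketches."""
    Qy, _ = np.linalg.qr(Y)      # orth(Y):   m x k
    Qx, _ = np.linalg.qr(X.T)    # orth(X^T): n x k
    # C = (Phi Qy)^+ Z ((Psi Qx)^T)^+, via two least-squares solves
    C = np.linalg.lstsq(Phi @ Qy, Z, rcond=None)[0]        # k x s
    C = np.linalg.lstsq(Psi @ Qx, C.T, rcond=None)[0].T    # k x k
    Ut, st, Vtt = np.linalg.svd(C)
    return Qy @ Ut[:, :r], np.diag(st[:r]), Qx @ Vtt[:r, :].T
```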
Error Estimation
When $E$ is maintained, the Frobenius norm error can be estimated as:
\[\|A - U\Sigma V^T\|_F \approx \frac{\|E - \Theta \hat{A}\|_F}{\sqrt{\beta q}}\]
where $\hat{A} = Q_Y C Q_X^T$ is the reconstruction and $\beta = 1$ for real matrices, $\beta = 2$ for complex matrices.
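A direct transcription of the estimator, under the same assumptions and names as the snippets above:

```python
def estimate_error(E, Theta, Qy, C, Qx, beta=1.0):
    """A posteriori estimate of ||A - A_hat||_F from the error sketch."""
    q = Theta.shape[0]
    A_hat = Qy @ C @ Qx.T                    # reconstruction Q_Y C Q_X^T
    return np.linalg.norm(E - Theta @ A_hat, 'fro') / np.sqrt(beta * q)
```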
Spectral Decay Analysis
When computing the scree plot, SketchySVD estimates the relative residual energy remaining beyond each rank:
\[\text{scree}(r) = \frac{\epsilon^2 + \sum_{i=r+1}^{s} \sigma_i^2(\hat{A})}{\|A\|_F^2}\]
where $\epsilon$ is the estimated approximation error and $\sigma_i(\hat{A})$ are the singular values of the approximation.
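Continuing in NumPy, a sketch of the scree computation under this formula; note that $\|A\|_F^2$ is available in a stream, since it is just the running sum $\sum_j \|a_j\|_2^2$:

```python
def scree_curve(sigma_hat, eps, normA_sq):
    """scree(r) for r = 0..s, given singular values of A_hat, the
    estimated error eps, and the (streamed) value of ||A||_F^2."""
    v = sigma_hat**2
    tail = np.concatenate([np.cumsum(v[::-1])[::-1], [0.0]])  # sum_{i>r} sigma_i^2
    return (eps**2 + tail) / normA_sq
```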
Theoretical Guarantees
Under appropriate conditions [1], the SketchySVD approximation satisfies:
\[\mathbb{E}\left[\|A - U_r\Sigma_r V_r^T\|_F\right] \leq \tau_{r+1}(A) + 2 \left[ \frac{s-\alpha}{s-k-\alpha} \cdot \min_{\rho < k-\alpha} \frac{k+\rho-\alpha}{k-\rho-\alpha} \cdot \tau_{\rho+1}^2(A) \right]^{1/2}\]
where
\[\tau_{r+1}^2(A) = \sum_{j > r} \sigma_j^2(A)\]
is the rank-$r$ tail energy. The bound tightens as the sketch dimensions $k$ and $s$ increase.
The algorithm has:
- Memory complexity: $O(k(m+n) + s^2 + qn)$
- Time complexity: $O(kmn + smn)$ for dense test matrices, less for structured ones
- Update complexity: $O(km + sm)$ per column