Singular Value Decomposition (SVD) for Recommendation

SVD decomposes a user-item rating matrix into latent user and item factors, enabling predictions of missing ratings and powering collaborative filtering recommendation systems.


SVD Decomposition

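SVD factorizes a matrix M as M = U Σ V^T. Each row of U is the latent factor vector of one user, each row of V is the latent factor vector of one item, and the diagonal matrix Σ holds the singular values, which measure the importance of each latent factor. Keeping only the top-k singular values gives the best rank-k approximation of M in the least-squares (Frobenius norm) sense.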
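Written out, the factorization and its rank-k truncation look as follows. The symbols m (number of users), n (number of items), r, and \hat{M}_k are notation introduced here for illustration only.

```latex
M = U \Sigma V^{T}, \qquad
U \in \mathbb{R}^{m \times r},\quad
\Sigma = \operatorname{diag}(\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r \ge 0),\quad
V \in \mathbb{R}^{n \times r},\quad
r = \min(m, n)

\hat{M}_k = U_k \Sigma_k V_k^{T} = \sum_{f=1}^{k} \sigma_f \, u_f v_f^{T},
\qquad
\hat{M}_k[i, j] = \sum_{f=1}^{k} \sigma_f \, U_{if} \, V_{jf}
```

The last expression is the predicted rating of user i for item j: a weighted dot product of the user's and the item's first k latent factors.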

Low-Rank Approximation

<pre><code class="language-python">import numpy as np

# Toy user-item rating matrix (rows = users, columns = items, 0 = missing)
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 4, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
    [0, 1, 5, 4],
], dtype=float)

# Untruncated (thin) SVD: sigma holds all singular values in descending order
U, sigma, Vt = np.linalg.svd(R, full_matrices=False)

# Keep the top-2 latent factors
k = 2
R_approx = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]

print("Approximated matrix (rank-2):")
print(R_approx.round(2))</code></pre>
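The "best rank-k approximation" claim can be checked numerically. By the Eckart-Young theorem, the Frobenius-norm error of the rank-k truncation equals the square root of the sum of the squared singular values that were dropped. The sketch below verifies this on the same toy matrix; the helper name `rank_k_approx` and the `errors` dictionary are illustrative names, not part of any library.

```python
import numpy as np

# Same toy rating matrix as above (0 = missing)
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 4, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
    [0, 1, 5, 4],
], dtype=float)

U, sigma, Vt = np.linalg.svd(R, full_matrices=False)


def rank_k_approx(U, sigma, Vt, k):
    """Return the rank-k truncation U_k diag(sigma_k) Vt_k."""
    # Multiplying the columns of U_k by sigma_k is the same as U_k @ diag(sigma_k)
    return (U[:, :k] * sigma[:k]) @ Vt[:k, :]


errors = {}
for k in range(1, len(sigma) + 1):
    R_k = rank_k_approx(U, sigma, Vt, k)
    frob_error = np.linalg.norm(R - R_k, ord="fro")
    dropped = np.sqrt(np.sum(sigma[k:] ** 2))
    errors[k] = (frob_error, dropped)
    print(f"k={k}: ||R - R_k||_F = {frob_error:.4f}, "
          f"sqrt(sum of dropped sigma^2) = {dropped:.4f}")
```

The two columns agree for every k, and the error shrinks as more factors are kept. Choosing k is therefore a trade-off between reconstruction error and model size.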

Truncated SVD for Sparse Matrices

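For large, sparse user-item matrices (typical in production), truncated SVD computes only the top-k singular values and vectors and works directly on the sparse matrix, without densifying it or computing the full decomposition. scikit-learn's TruncatedSVD uses a randomized SVD solver by default for speed.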
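The idea behind a randomized solver can be sketched in plain NumPy: project the matrix onto a few random directions, orthonormalize the result to get a small basis for its range, and run an exact SVD on the much smaller projected matrix. This is an illustration of the general random-projection technique, not scikit-learn's actual implementation; the function name `randomized_svd_sketch` and the parameters `n_oversamples` and `n_power_iters` are hypothetical names chosen for this example.

```python
import numpy as np


def randomized_svd_sketch(A, k, n_oversamples=10, n_power_iters=2, seed=0):
    """Approximate the top-k singular triplets of A via random projection."""
    rng = np.random.default_rng(seed)
    n_cols = A.shape[1]
    # Use a few extra probe vectors for accuracy, capped at the column count
    n_probe = min(k + n_oversamples, n_cols)

    # 1. Sample the range of A with a Gaussian test matrix
    Omega = rng.standard_normal((n_cols, n_probe))
    Y = A @ Omega

    # 2. Power iterations sharpen the basis when singular values decay slowly
    for _ in range(n_power_iters):
        Y, _ = np.linalg.qr(Y)
        Y = A @ (A.T @ Y)

    # 3. Orthonormal basis Q for the sampled range
    Q, _ = np.linalg.qr(Y)

    # 4. Exact SVD of the small projected matrix B = Q^T A
    B = Q.T @ A
    U_small, sigma, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], sigma[:k], Vt[:k, :]


# Toy rating matrix from the earlier example (0 = missing)
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 4, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
    [0, 1, 5, 4],
], dtype=float)

U_r, sigma_r, Vt_r = randomized_svd_sketch(R, k=2)
R_rand = (U_r * sigma_r) @ Vt_r

# Reference: exact SVD truncated to the same rank
U_e, sigma_e, Vt_e = np.linalg.svd(R, full_matrices=False)
R_exact = (U_e[:, :2] * sigma_e[:2]) @ Vt_e[:2, :]

print("Randomized singular values:", sigma_r.round(4))
print("Exact singular values:     ", sigma_e[:2].round(4))
```

The sketch only touches the matrix through products with it, which is why this family of methods suits sparse data. On a matrix this small the result matches the exact SVD; on large matrices it is an approximation whose quality depends on the oversampling and the number of power iterations.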

TruncatedSVD Usage

<pre><code class="language-python">from sklearn.decomposition import TruncatedSVD
from scipy.sparse import csr_matrix

# Sparse rating matrix (R is the dense toy matrix defined earlier)
R_sparse = csr_matrix(R)

svd = TruncatedSVD(n_components=2, random_state=42)
US_k = svd.fit_transform(R_sparse)  # user embeddings, already scaled: U_k @ diag(sigma_k)
Vt_k = svd.components_              # item embeddings, shape (k, n_items)
sigma_k = svd.singular_values_      # kept for inspection only

# Reconstruct and predict missing ratings.
# fit_transform returns U_k scaled by the singular values, so sigma_k
# must not be multiplied in a second time.
R_pred = US_k @ Vt_k

print("Predicted ratings:")
print(R_pred.round(2))</code></pre>

Making Recommendations

<pre><code class="language-python"># Recommend unrated items for user 0 with highest predicted ratings
user_id = 0
predicted_ratings = R_pred[user_id]

rated_items = np.where(R[user_id] > 0)[0]
unrated = [i for i in range(R.shape[1]) if i not in rated_items]

recommendations = sorted(unrated, key=lambda i: predicted_ratings[i], reverse=True)
print(f"Recommendations for user {user_id}: items {recommendations}")</code></pre>
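The same ranking step can be applied to every user at once by masking out already-rated items before sorting. The following is a minimal sketch under the assumption that predictions come from a rank-2 truncation of the toy matrix; `recommend_top_n` and its parameters are hypothetical names used only for illustration.

```python
import numpy as np

# Toy rating matrix from the earlier example (0 = missing)
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 4, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
    [0, 1, 5, 4],
], dtype=float)

# Rank-2 predictions, computed as in the low-rank approximation example
U, sigma, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_pred = (U[:, :k] * sigma[:k]) @ Vt[:k, :]


def recommend_top_n(pred, rated_mask, n=2):
    """Return, per user, up to n unrated item indices sorted by predicted score."""
    # Already-rated items get -inf so they can never be ranked first
    scores = np.where(rated_mask, -np.inf, pred)
    # Sort each row in descending order of score and keep the first n columns
    order = np.argsort(-scores, axis=1)[:, :n]
    # Drop masked items that slipped in because a user has fewer than n unrated items
    return [
        [int(i) for i in row if np.isfinite(scores[u, i])]
        for u, row in enumerate(order)
    ]


recs = recommend_top_n(R_pred, rated_mask=R > 0, n=2)
for user_id, items in enumerate(recs):
    print(f"User {user_id}: recommend items {items}")
```

Masking with negative infinity keeps the function a single vectorized sort, which matters once the matrix has many users. For very wide matrices, `np.argpartition` would avoid sorting every item, at the cost of a slightly longer function.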

Matrix Factorization Libraries

Production recommendation systems often rely on specialized libraries that fit regularized factorization models: Surprise targets explicit ratings, implicit targets implicit feedback such as clicks or purchases, and LightFM supports both feedback types.

Beyond Vanilla SVD

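Vanilla SVD cannot directly handle missing entries. Filling them with 0, as in the examples above, treats every unrated item as an actual rating of 0 and pulls the predictions toward zero. In practice, algorithms such as SVD++ and ALS (Alternating Least Squares) fit the factors only to the observed ratings, usually with regularization, and so handle sparsity correctly. Libraries like Surprise provide SVD, NMF, and KNN collaborative filtering with cross-validation support.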
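To make "fitting only the observed ratings" concrete, here is a minimal ALS sketch in NumPy. It alternates between solving a small ridge-regression problem for each user (items fixed) and for each item (users fixed), using only observed entries. This is an illustrative sketch, not the implementation used by Surprise, implicit, or LightFM; the function names `als` and `regularized_loss` and the hyperparameter values (`k=2`, `reg=0.1`, `n_iters=20`) are arbitrary assumptions made for this example.

```python
import numpy as np

# Toy rating matrix from the earlier examples
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 4, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
    [0, 1, 5, 4],
], dtype=float)
observed = R > 0  # assumption: 0 marks a missing rating, not a real score


def regularized_loss(R, observed, P, Q, reg):
    """Squared error on observed entries plus an L2 penalty on both factors."""
    err = (R - P @ Q.T)[observed]
    return np.sum(err ** 2) + reg * (np.sum(P ** 2) + np.sum(Q ** 2))


def als(R, observed, k=2, reg=0.1, n_iters=20, seed=0):
    """Alternating least squares restricted to observed ratings."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, k))  # item factors
    ridge = reg * np.eye(k)
    history = [regularized_loss(R, observed, P, Q, reg)]

    for _ in range(n_iters):
        # Fix item factors, solve a ridge regression for each user
        for u in range(n_users):
            idx = observed[u]
            Q_u = Q[idx]
            P[u] = np.linalg.solve(Q_u.T @ Q_u + ridge, Q_u.T @ R[u, idx])
        # Fix user factors, solve a ridge regression for each item
        for i in range(n_items):
            idx = observed[:, i]
            P_i = P[idx]
            Q[i] = np.linalg.solve(P_i.T @ P_i + ridge, P_i.T @ R[idx, i])
        history.append(regularized_loss(R, observed, P, Q, reg))
    return P, Q, history


P, Q, history = als(R, observed)
R_als = P @ Q.T

rmse_observed = np.sqrt(np.mean((R - R_als)[observed] ** 2))
print(f"Regularized loss: {history[0]:.3f} -> {history[-1]:.3f}")
print(f"RMSE on observed ratings: {rmse_observed:.3f}")
print("Predicted ratings (including previously missing entries):")
print(R_als.round(2))
```

Because each update exactly minimizes the regularized loss for one block of parameters, the loss never increases from one sweep to the next. Unlike the zero-filled SVD, the missing entries contribute nothing to the fit, so their predictions are not pulled toward zero by construction.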