Dimensionality Reduction Techniques in Data Science
Fashion-MNIST has 784 features per image (28×28 pixels). Most of those features are correlated or redundant. Dimensionality reduction is about finding a lower-dimensional representation that preserves what you actually care about, whether that’s variance, geodesic distance, or local neighborhood structure. PCA, Random Projections, Isomap, and t-SNE each define “what you care about” differently.
PCA
PCA finds the directions of maximum variance in the data and projects onto those. It’s linear, which means it won’t capture curved structure, but it’s fast and interpretable.
On Fashion-MNIST, the eigenvalue spectrum drops sharply after the first few components. The scree plot shows an elbow around the 50th component. In practice this means you can discard roughly 94% of the original dimensions and retain most of the variance. What’s left is a subspace that captures the dominant shape variations across clothing categories.
Visualising the first ten eigenvectors as images (the “eigenclothes,” if you want to be slightly ridiculous about it) gives you a sense of what the model learned: the leading components capture coarse silhouette and texture differences between categories, while later components pick up finer details.
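Both steps can be sketched with scikit-learn. The data below is a random stand-in for the real (n_samples, 784) matrix of flattened images, so the variance numbers here are not meaningful; the shapes and the reshape-to-28×28 trick are the point.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for Fashion-MNIST: in practice X would be the
# (n_samples, 784) array of flattened 28x28 images.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 784))

pca = PCA(n_components=50)
X_reduced = pca.fit_transform(X)

# Fraction of total variance the 50-component subspace retains.
retained = pca.explained_variance_ratio_.sum()

# Each principal component is a 784-vector; reshaping to 28x28 gives
# the "eigenclothes" images described above.
eigenimages = pca.components_.reshape(-1, 28, 28)
print(X_reduced.shape, eigenimages.shape, retained)
```

On the real data, plotting the cumulative sum of `explained_variance_ratio_` is the quickest way to locate the elbow.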
In classification experiments, both training and test error fell as the number of components increased, up to about 50 components. Past that, test error started climbing while training error continued falling. Classic overfitting. The model was picking up noise in the low-variance directions.
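The sweep itself is a few lines. This sketch uses synthetic classification data and logistic regression as an illustrative classifier (not the one used in the experiments above); on real data you would read off where test error bottoms out.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for Fashion-MNIST.
X, y = make_classification(n_samples=2000, n_features=200,
                           n_informative=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

errors = {}
for k in (5, 20, 50, 100):
    model = make_pipeline(PCA(n_components=k),
                          LogisticRegression(max_iter=1000))
    model.fit(X_tr, y_tr)
    errors[k] = (1 - model.score(X_tr, y_tr),   # training error
                 1 - model.score(X_te, y_te))   # test error
```

The overfitting signature is a widening gap between the two entries as k grows past the elbow.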
Random Projections
RP replaces PCA’s principled variance-maximising projection with a matrix whose entries are drawn at random (usually from a Gaussian). This sounds like it should be terrible. The Johnson-Lindenstrauss lemma says it isn’t: for a large enough target dimension, pairwise distances are approximately preserved with high probability.
At low dimensions, RP loses to PCA on classification error. At higher dimensions (50+), the gap closes. For a Fashion-MNIST application where you’re projecting to 100 dimensions anyway, RP becomes competitive while being dramatically faster to compute. No eigendecomposition, no covariance matrix. Just multiply by a random matrix.
The tradeoff is lack of control. PCA’s projection is deterministic and ordered by variance; you can inspect what each component captures. RP’s projection is random and uninterpretable. For anything that requires understanding what the model learned, PCA wins. For pure performance at scale, RP is worth considering.
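A sketch of the whole RP story: scikit-learn even exposes the JL bound directly, so you can check what target dimension the lemma demands for a given distortion before projecting. The data is again a random stand-in for the image matrix.

```python
import numpy as np
from sklearn.metrics import euclidean_distances
from sklearn.random_projection import (GaussianRandomProjection,
                                       johnson_lindenstrauss_min_dim)

# Target dimension the JL lemma requires to keep pairwise distances
# within a factor (1 +/- eps), independent of the original dimension.
jl_dim = johnson_lindenstrauss_min_dim(n_samples=70000, eps=0.5)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 784))  # stand-in for Fashion-MNIST rows

# No covariance matrix, no eigendecomposition: just multiply by R.
rp = GaussianRandomProjection(n_components=100, random_state=0)
X_proj = rp.fit_transform(X)

# Ratio of projected to original pairwise distances; the closer these
# cluster around 1, the better distances survived the projection.
d_orig = euclidean_distances(X)
d_proj = euclidean_distances(X_proj)
mask = d_orig > 0
ratios = d_proj[mask] / d_orig[mask]
```

The spread of `ratios` shrinks as `n_components` grows, which is the JL guarantee made visible.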
Isomap
Isomap targets geodesic distances rather than Euclidean distances. The idea: if the data lies on a curved manifold in the high-dimensional space, Euclidean distances between distant points don’t reflect how far apart they actually are along the manifold. Isomap first constructs a k-nearest neighbors graph, then uses shortest-path distances in that graph as the geodesic approximation.
On Fashion-MNIST, Isomap produced a different pattern of classification error than PCA. For some categories (especially categories with strong non-linear variation in shape, like ankle boots vs. sneakers), Isomap did better than PCA at the same target dimension.
The k parameter matters a lot. Small k keeps the neighborhood graph local but risks disconnecting the graph in sparse regions. Large k incorporates distant neighbors and degrades the geodesic approximation. I found that k around 10-15 worked well here, but this needed tuning per category.
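The disconnection failure mode is cheap to check before running Isomap: build the k-NN graph yourself and count its connected components. More than one component means geodesic distances are undefined between some pairs. The data here is a random stand-in for the image matrix.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 784))  # stand-in for the image matrix

conn = {}
for k in (2, 5, 10, 15):
    g = kneighbors_graph(X, n_neighbors=k)
    g = g + g.T  # symmetrize: treat neighbor relations as undirected
    n_comp, _ = connected_components(g, directed=False)
    conn[k] = n_comp
```

Since the edge set for a larger k contains the edge set for a smaller k, the component count can only fall as k grows; the smallest k that yields a single component is a sensible lower bound for tuning.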
Isomap is significantly slower than PCA for large datasets. The all-pairs shortest-path computation scales roughly as O(n² log n) in the number of points. For Fashion-MNIST at 70,000 examples, you’re waiting.
t-SNE
t-SNE is not a general-purpose dimensionality reduction method. It’s a visualization tool, specifically designed for projecting to 2 or 3 dimensions while preserving local neighborhood structure. Using it as input to a classifier is a bad idea (the embedding is not deterministic, doesn’t preserve global structure, and can’t embed new points without rerunning the algorithm).
For visualization, though, it’s genuinely useful. The 2D t-SNE embedding of Fashion-MNIST produces reasonably clean clusters for most categories: t-shirts, trousers, and bags separate clearly; sandals, sneakers, and ankle boots bleed into each other (which makes sense given how similar they are visually).
Perplexity controls the effective neighborhood size. Low perplexity (5-10) focuses on very local structure and produces many small clusters. High perplexity (50+) incorporates more global structure. The sweet spot for Fashion-MNIST was around 30-50. Below 10, the visualization fractures into too many micro-clusters to be useful.
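A minimal sketch of the embedding call, on random stand-in features (e.g. PCA-reduced images rather than raw pixels, which is a common pre-step for t-SNE):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))  # stand-in for PCA-reduced images

# perplexity sets the effective neighborhood size; 30 sits in the
# range that worked well above. init="pca" and a fixed random_state
# make reruns comparable, but the embedding still cannot be applied
# to new points without rerunning the whole algorithm.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
X_2d = tsne.fit_transform(X)
```

Note that `perplexity` must be smaller than the number of samples, which is easy to trip over on small subsets.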
The overlap between footwear categories in the t-SNE plot is worth paying attention to. A classifier that achieves 90% on the full dataset might be doing much worse specifically on boot/sneaker discrimination. t-SNE can surface that kind of failure mode visually before you run the numbers.
Which method for what
PCA for fast baseline dimensionality reduction where interpretability matters. Random Projections when speed matters and you’re projecting to a dimension high enough that the Johnson-Lindenstrauss bound kicks in. Isomap when you have evidence of curved manifold structure and can afford the compute. t-SNE for visualization only, never as a preprocessing step for classification.
The “which method performs best” framing is less useful than “which method’s assumptions match my data.” Fashion-MNIST is close enough to linear (most of the variance is captured by coarse shape differences) that PCA holds up well. On data with genuine non-linear structure, Isomap and UMAP (which I didn’t test here) would likely pull ahead.