M-space Coloring
TL;DR
- Compress: Train a parametric AutoEncoder to map each row of the eigenvector matrix \(Z \in \mathbb{R}^{N \times K}\) (one \(K\)-dimensional embedding per data point) to a low-dimensional "Mood Space" (M-space, \(\mathbb{R}^3\)).
- Preserve Structure: Minimize the Eigenvector Consistency Loss, so that the eigenvectors computed from the compressed 3D points match the original high-dimensional eigenvectors.
- Maximize Contrast: Use a Repulsion Loss to push the embedding to fill the RGB cube, maximizing color contrast and avoiding overcrowding.
- Color: Map the 3D coordinates to RGB values.
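The final step, mapping 3D coordinates to RGB, can be sketched as a per-axis rescaling into the unit cube. This is a minimal PyTorch sketch under an assumption: each axis is min-max normalized independently. `mspace_to_rgb` is a hypothetical helper, not part of ncut_pytorch, and the library may normalize differently.

```python
import torch


def mspace_to_rgb(points: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Map (N, 3) M-space coordinates to RGB values in [0, 1].

    Hypothetical helper. Assumption: each axis is min-max rescaled
    independently; the real mspace_color may do something else.
    """
    if points.ndim != 2 or points.shape[1] != 3:
        raise ValueError(f"expected shape (N, 3), got {tuple(points.shape)}")
    lo = points.min(dim=0, keepdim=True).values   # per-axis minimum
    hi = points.max(dim=0, keepdim=True).values   # per-axis maximum
    rgb = (points - lo) / (hi - lo).clamp_min(eps)  # rescale each axis to [0, 1]
    return rgb.clamp(0.0, 1.0)


# Usage: three M-space points become three RGB triples.
pts = torch.tensor([[0.0, 2.0, -1.0],
                    [1.0, 4.0, 0.0],
                    [0.5, 3.0, 1.0]])
colors = mspace_to_rgb(pts)                              # floats in [0, 1]
uint8_colors = (colors * 255).round().to(torch.uint8)    # e.g. for image overlays
```

Because training already pushes the embedding to fill the unit cube, this rescaling mainly stretches each channel to its full range.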
M-space Coloring is a parametric dimensionality reduction method designed specifically for visualizing spectral embeddings. It is intended as a replacement for t-SNE or UMAP when the goal is coloring, offering better preservation of global structure and smoother, more consistent color transitions.
Why M-space?
While t-SNE and UMAP are excellent for visualizing local structure, they have some limitations when used for coloring:
- Global Structure Distortion: They can distort the global geometry of the data; for example, distances between clusters in the output are not reliable.
- Clumping: They often produce clusters with empty space in between, which wastes the available color space.
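M-space addresses both issues by training a neural network (an AutoEncoder) that directly optimizes for spectral consistency, which preserves global structure, together with a repulsion term that spreads the points across the color space.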
How M-space Works
The Core Idea
We want to find a mapping \(f: \mathbb{R}^K \to \mathbb{R}^3\) such that running Normalized Cut (NCut) on the 3D points yields the same eigenvectors as the original data (more precisely, eigenvectors spanning the same subspace, since sign flips and rotations within that subspace do not change the clustering).
When this holds, the clustering structure is preserved in the visualization: if two points are in the same cluster in the high-dimensional space, they will be geometrically close in the 3D M-space.
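The core idea has two pieces: an encoder \(f\) that maps each \(K\)-dimensional row of \(Z\) to 3D, and an NCut eigensolver applied to the resulting 3D points to obtain \(\hat{Z}\). The PyTorch sketch below assumes a dense Gaussian (RBF) affinity with bandwidth `sigma` and the symmetric normalized form \(D^{-1/2} W D^{-1/2}\); `ncut_eigenvectors` and `Encoder` are hypothetical names for illustration, not ncut_pytorch's implementation.

```python
import torch
import torch.nn as nn


def ncut_eigenvectors(x: torch.Tensor, k: int, sigma: float = 1.0) -> torch.Tensor:
    """Top-k NCut eigenvectors of points x with shape (N, d).

    Assumption: dense Gaussian affinity, symmetric normalization.
    """
    dist2 = (x[:, None, :] - x[None, :, :]).pow(2).sum(dim=-1)  # (N, N) squared distances
    w = torch.exp(-dist2 / (2 * sigma ** 2))                     # affinity matrix W
    d_inv_sqrt = w.sum(dim=1).rsqrt()                            # diagonal of D^{-1/2}
    m = d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]            # D^{-1/2} W D^{-1/2}
    _, eigvecs = torch.linalg.eigh(m)                            # eigenvalues ascending
    return eigvecs[:, -k:]                                       # k largest -> (N, k)


class Encoder(nn.Module):
    """Hypothetical encoder f: R^K -> R^3 (the compressing half of the AutoEncoder)."""

    def __init__(self, k: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(k, hidden), nn.GELU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),  # Sigmoid keeps outputs in the unit cube
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)


torch.manual_seed(0)
N, K = 200, 8
Z = ncut_eigenvectors(torch.randn(N, 16), K, sigma=4.0)  # stand-in for the original eigenvectors
f = Encoder(K)
Y = f(Z)                        # (N, 3) M-space coordinates
Z_hat = ncut_eigenvectors(Y, K)  # eigenvectors of the compressed points
```

The dense \(N \times N\) eigendecomposition costs \(O(N^3)\), which is one reason the consistency loss is expensive. Note also that gradients through `torch.linalg.eigh` become numerically unstable when eigenvalues are nearly repeated, so a real training loop needs care there.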
The Loss Functions
The model is trained with a combination of losses:
- Eigenvector Consistency Loss (The "Soul" of M-space)
  We compute the eigenvectors of the compressed 3D points (call them \(\hat{Z}\)) and compare them to the original eigenvectors \(Z\):
  \[ \mathcal{L}_{eig} = \| Z Z^\top - \hat{Z} \hat{Z}^\top \|_1 \]
  Because \(Z Z^\top\) is unchanged by sign flips or rotations of the eigenvectors, this loss preserves the subspace spanned by the eigenvectors rather than each individual vector. It is computationally intensive, since the eigenvectors of the compressed points must be recomputed during training, but it keeps the global structure intact.
- Repulsion Loss (The "Body" of M-space)
  To make full use of the color space (the RGB cube), we want the data to spread out. M-space samples random points in the 3D grid and pushes the data apart wherever it clumps too much.
  This acts like a "gas" that fills its container (the unit cube), ensuring high contrast in the final coloring.
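Both losses can be sketched in a few lines of PyTorch. The consistency loss follows \(\mathcal{L}_{eig}\) directly (entrywise L1, averaged rather than summed, which only rescales it). The repulsion term is a swapped-in simplification: a generic pairwise Gaussian repulsion between embedded points, not the grid-sampling scheme described above, and it assumes the points are already confined to the unit cube (for example by a sigmoid output layer). `repulsion_loss` and its `bandwidth` parameter are hypothetical.

```python
import torch


def eig_consistency_loss(z: torch.Tensor, z_hat: torch.Tensor) -> torch.Tensor:
    """Entrywise L1 distance between projectors Z Z^T and Z_hat Z_hat^T.

    Inputs are (N, K). The projectors are unchanged by sign flips or
    rotations of the eigenvectors, so this compares subspaces.
    """
    return (z @ z.T - z_hat @ z_hat.T).abs().mean()  # mean instead of sum: a scaling choice


def repulsion_loss(y: torch.Tensor, bandwidth: float = 0.1) -> torch.Tensor:
    """Hypothetical pairwise repulsion for (N, 3) points in the unit cube.

    Large when points crowd together, small when they spread out.
    """
    n = y.shape[0]
    dist2 = (y[:, None, :] - y[None, :, :]).pow(2).sum(dim=-1)  # (N, N) squared distances
    kernel = torch.exp(-dist2 / (2 * bandwidth ** 2))
    off_diag = kernel.sum() - n        # remove self-pairs, each contributes exactly 1
    return off_diag / (n * (n - 1))    # average over distinct pairs


torch.manual_seed(0)
q, _ = torch.linalg.qr(torch.randn(100, 4))       # orthonormal columns
rotation, _ = torch.linalg.qr(torch.randn(4, 4))  # random orthogonal matrix
same_subspace = eig_consistency_loss(q, q @ rotation)  # near zero: same span
clumped = 0.5 + 0.01 * torch.randn(100, 3)             # all points near the cube center
spread = torch.rand(100, 3)                            # points scattered through the cube
```

In training, the two terms would be combined as `eig_consistency_loss(Z, Z_hat) + lam * repulsion_loss(Y)`, where `lam` is a hypothetical weight to tune against contrast versus structure preservation.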
Comparison with t-SNE / UMAP
| Feature | t-SNE / UMAP | M-space |
|---|---|---|
| Objective | Preserve local distances (neighbor probabilities) | Preserve spectral clustering structure (eigenvectors) |
| Global Structure | Can be distorted | Strongly preserved |
| Color Space Usage | Can form tight, separated islands | Fills the space (via Repulsion Loss) |
Usage
You can use mspace_color directly from ncut_pytorch.color.
If pytorch-lightning is not installed, M-space falls back to a pure PyTorch trainer and emits a one-time install hint. Install pytorch-lightning~=2.0 to use the Lightning trainer path and its optional logger integrations.
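To check ahead of time which trainer path to expect, you can test whether the `pytorch_lightning` module is importable. This standard-library sketch assumes the fallback is triggered simply by the import being unavailable; `lightning_installed` is a hypothetical helper, and it checks importability only, not that the installed version satisfies ~=2.0.

```python
import importlib.util


def lightning_installed() -> bool:
    """True if the pytorch_lightning package can be imported here (hypothetical helper)."""
    return importlib.util.find_spec("pytorch_lightning") is not None


if lightning_installed():
    print("Lightning trainer path expected")
else:
    # Install with: pip install "pytorch-lightning~=2.0"
    print("Pure PyTorch fallback trainer expected")
```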