Feature Alignment
TL;DR
Problem: Feature spaces from different models or layers are incompatible.
Solution: Convert absolute features to relative affinity features (\(A_{ij} = \text{similarity}(x_i, x_j)\)).
Key Insight: Use RBF Auto-scaling to dynamically tune \(\sigma\) so the affinity matrix has a constant mean density, making the alignment robust to scaling and outliers.
Motivation
Deep learning models (e.g., CLIP, DINO, Stable Diffusion) learn robust visual concepts, but represent them in incompatible embedding spaces. Even within the same model, different layers operate in distinct coordinate systems. Direct comparison (e.g., \(L_2\) distance) between these spaces is not meaningful: the dimensions often differ (768 vs. 1024, for instance), and even when they match, the axes of one space have no defined correspondence to the axes of the other.
Feature Alignment maps these disjoint representations into a common space where semantic correspondence is preserved. This enables cross-model visualization, layer-wise analysis, and valid distance computation.
Inspired by Representational Similarity Analysis (RSA), we leverage the insight that while absolute coordinates vary, the relative geometry of concepts remains consistent.
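To make the auto-scaling idea from the TL;DR concrete, here is a minimal sketch of an RBF affinity \(A_{ij} = \exp(-\lVert x_i - x_j \rVert^2 / 2\sigma^2)\) whose \(\sigma\) is tuned by bisection until the mean of the matrix reaches a target value. The function name `autoscaled_rbf_affinity`, the `target_mean` default of 0.3, and the bisection strategy are assumptions made for illustration; this is not the library's `rbf_affinity` implementation, which may differ.

```python
import math
import torch

def autoscaled_rbf_affinity(features, target_mean=0.3, n_iter=50):
    """Illustrative sketch (hypothetical helper, not the library API).

    Builds an RBF affinity matrix and picks sigma by bisection so that
    the mean affinity is approximately `target_mean`.
    Assumes the rows of `features` are not all identical.
    """
    # Pairwise squared Euclidean distances, shape (n, n).
    d2 = torch.cdist(features, features).pow(2)

    # Bracket sigma relative to the typical distance, so the search
    # does not depend on the overall scale of the features.
    typical = d2.mean().sqrt().item()
    log_lo = math.log(1e-3 * typical)
    log_hi = math.log(1e3 * typical)

    for _ in range(n_iter):
        log_mid = 0.5 * (log_lo + log_hi)
        sigma = math.exp(log_mid)
        mean_affinity = torch.exp(-d2 / (2 * sigma**2)).mean().item()
        # Mean affinity increases monotonically with sigma.
        if mean_affinity < target_mean:
            log_lo = log_mid
        else:
            log_hi = log_mid

    sigma = math.exp(0.5 * (log_lo + log_hi))
    return torch.exp(-d2 / (2 * sigma**2))
```

Because \(\sigma\) is chosen relative to the observed distances, multiplying all features by a constant leaves the resulting affinity matrix essentially unchanged, which is the scale robustness the TL;DR refers to.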
Quick Start
```python
import torch
from ncut_pytorch.utils.math import rbf_affinity

# Features from two different sources (e.g., different models/layers)
features_A = torch.randn(100, 768)
features_B = torch.randn(100, 1024)

# Transform to aligned relative space
# The resulting matrices encode the relative geometry of each set
aligned_A = rbf_affinity(features_A)  # Shape: (100, 100)
aligned_B = rbf_affinity(features_B)  # Shape: (100, 100)

# Now valid to compare aligned_A and aligned_B using standard metrics
```
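One possible "standard metric" for the final comparison step is an RSA-style Pearson correlation over the off-diagonal entries of the two affinity matrices. The sketch below is an assumption about how such a comparison could be done, and `affinity_correlation` is a hypothetical helper rather than part of the library. It requires that both matrices come from the same samples in the same order, so that entry \((i, j)\) refers to the same pair in both.

```python
import torch

def affinity_correlation(aff_a, aff_b):
    """Pearson correlation between the off-diagonal entries of two
    (n, n) affinity matrices (hypothetical helper for illustration)."""
    n = aff_a.shape[0]
    # Upper triangle without the diagonal: the diagonal is self-similarity
    # and carries no information about relative geometry.
    rows, cols = torch.triu_indices(n, n, offset=1)
    a = aff_a[rows, cols]
    b = aff_b[rows, cols]
    # Center both vectors, then take the normalized dot product.
    a = a - a.mean()
    b = b - b.mean()
    return (a @ b / (a.norm() * b.norm())).item()
```

With the Quick Start variables, this would be called as `affinity_correlation(aligned_A, aligned_B)`. Values near 1 suggest the two feature sets arrange the samples similarly. The score is unaffected by a positive rescaling or constant shift of either matrix.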
Results
Alignment reveals consistent semantic structures across models.
Case Study: SAM vs. DINO
SAM (Segment Anything Model) focuses on boundaries and edges. DINO captures high-level semantic parts.
| Model | Before Alignment | After Alignment |
|---|---|---|
| SAM | ![]() | ![]() |
| DINO v3 | ![]() | ![]() |
Observe how facial regions (hair, eyes, nose) map to consistent colors after alignment, despite the models' differing original representations.



