
Feature Alignment

TL;DR

Problem: Feature spaces from different models or layers are incompatible.

Solution: Convert absolute features to relative affinity features (\(A_{ij} = \text{similarity}(x_i, x_j)\)).

Key Insight: Use RBF Auto-scaling to dynamically tune \(\sigma\) so the affinity matrix has a constant mean density, making the alignment robust to scaling and outliers.

Motivation

Deep learning models (e.g., CLIP, DINO, Stable Diffusion) learn robust visual concepts, but represent them in incompatible embedding spaces. Even within the same model, different layers operate in distinct coordinate systems. Direct comparison (e.g., \(L_2\) distance) between these spaces is undefined when their dimensionalities differ, and not meaningful even when they match, because the axes of each space carry no shared meaning.

Feature Alignment maps these disjoint representations into a common space where semantic correspondence is preserved. This enables cross-model visualization, layer-wise analysis, and valid distance computation.

Inspired by Representational Similarity Analysis (RSA), we leverage the insight that while absolute coordinates vary, the relative geometry of concepts remains consistent.
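The auto-scaling idea from the TL;DR can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the `ncut_pytorch` implementation: the helper name `autoscaled_rbf_affinity`, the target mean of 0.3, and the bisection search are hypothetical choices, and "constant mean density" is interpreted here as a fixed mean affinity value. NumPy is used only to keep the sketch dependency-light; the same steps carry over to torch tensors.

```python
import numpy as np


def pairwise_sq_dists(x: np.ndarray) -> np.ndarray:
    """Squared Euclidean distance between every pair of rows, shape (n, n)."""
    sq_norms = (x ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    return np.maximum(d2, 0.0)  # clamp small negatives caused by round-off


def autoscaled_rbf_affinity(x, target_mean=0.3, n_iter=60):
    """Hypothetical helper: RBF affinity whose mean entry is close to target_mean.

    Returns (affinity, sigma). Assumes the rows of x are not all identical
    and that 1/n < target_mean < 1 (the diagonal alone contributes 1/n).
    """
    d2 = pairwise_sq_dists(x)
    n = d2.shape[0]
    off_diag = d2[~np.eye(n, dtype=bool)]
    scale = np.sqrt(np.median(off_diag))  # typical pairwise distance

    # The mean affinity increases monotonically with sigma, so bisect on
    # sigma in log space, with bounds expressed relative to the data scale.
    lo, hi = 1e-3 * scale, 1e3 * scale
    for _ in range(n_iter):
        sigma = np.sqrt(lo * hi)
        mean_affinity = np.exp(-d2 / (2.0 * sigma ** 2)).mean()
        if mean_affinity < target_mean:
            lo = sigma  # too sparse: widen the kernel
        else:
            hi = sigma  # too dense: narrow the kernel
    sigma = np.sqrt(lo * hi)
    return np.exp(-d2 / (2.0 * sigma ** 2)), float(sigma)


rng = np.random.default_rng(0)
features = rng.normal(size=(50, 16))
affinity, sigma = autoscaled_rbf_affinity(features)
# Rescaling the inputs rescales sigma with them, leaving the affinity unchanged.
affinity_scaled, sigma_scaled = autoscaled_rbf_affinity(10.0 * features)
```

Because the search bounds are tied to the median pairwise distance, multiplying the features by a constant shifts \(\sigma\) by the same factor, which is one way to read the claim that the affinity is robust to scaling.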

Quick Start

from ncut_pytorch.utils.math import rbf_affinity
import torch

# Features from two different sources (e.g., different models/layers).
# Row i of each tensor is assumed to describe the same sample; random placeholders here.
features_A = torch.randn(100, 768) 
features_B = torch.randn(100, 1024)

# Transform to aligned relative space
# The resulting matrices encode the relative geometry of each set
aligned_A = rbf_affinity(features_A) # Shape: (100, 100)
aligned_B = rbf_affinity(features_B) # Shape: (100, 100)

# Both matrices are (100, 100), so aligned_A and aligned_B can now be compared using standard metrics
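
One possible "standard metric" is the RSA-style correlation between the off-diagonal entries of the two affinity matrices. The sketch below is an illustration, not part of `ncut_pytorch`: `affinity_similarity` and `rbf_affinity_median` are hypothetical helpers, the median-based \(\sigma\) is a simple stand-in for the library's auto-scaling, and the inputs are NumPy arrays (torch tensors would first need converting, e.g. with `.cpu().numpy()`). It assumes row i of both feature sets describes the same sample.

```python
import numpy as np


def rbf_affinity_median(x: np.ndarray) -> np.ndarray:
    """Stand-in RBF affinity with sigma^2 set to the median squared distance."""
    sq_norms = (x ** 2).sum(axis=1)
    d2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T), 0.0)
    sigma2 = np.median(d2[~np.eye(len(x), dtype=bool)])
    return np.exp(-d2 / (2.0 * sigma2))


def affinity_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between the off-diagonal entries of two (n, n) matrices."""
    if a.shape != b.shape or a.ndim != 2 or a.shape[0] != a.shape[1]:
        raise ValueError("expected two square matrices of the same shape")
    upper = np.triu_indices(a.shape[0], k=1)  # each pair once, diagonal excluded
    return float(np.corrcoef(a[upper], b[upper])[0, 1])


rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 8))  # shared structure behind both "models"
# Two random linear maps into spaces of different width (768 and 1024).
features_a = latent @ rng.normal(size=(8, 768)) / np.sqrt(768)
features_b = latent @ rng.normal(size=(8, 1024)) / np.sqrt(1024)
features_c = rng.normal(size=(100, 1024))  # unrelated samples, for contrast

aff_a = rbf_affinity_median(features_a)
related = affinity_similarity(aff_a, rbf_affinity_median(features_b))
unrelated = affinity_similarity(aff_a, rbf_affinity_median(features_c))
print(f"related: {related:.3f}, unrelated: {unrelated:.3f}")
```

The synthetic features share a low-dimensional latent structure, so their affinity matrices should correlate strongly even though the raw spaces have different widths, while the unrelated set should score much lower.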

Results

Alignment reveals consistent semantic structures across models.

Case Study: SAM vs. DINO

SAM (Segment Anything Model) focuses on boundaries and edges. DINO captures high-level semantic parts.

| Model | Before Alignment | After Alignment |
| --- | --- | --- |
| SAM | (image: before) | (image: after) |
| DINO v3 | (image: before) | (image: after) |

Observe how facial regions (hair, eyes, nose) map to consistent colors after alignment, despite the models' differing original representations.