Tutorial 2: Discrete NCut
This tutorial explores the Discrete Normalized Cut (NCut) approach using the K-Way NCut algorithm. Unlike standard NCut, which provides continuous eigenvector representations, Discrete NCut explicitly partitions data into distinct categories. By converting continuous eigenvectors into discrete cluster labels, this method facilitates the segmentation of images into semantically meaningful regions.
Quick Start
The following example demonstrates how to perform discrete segmentation using the NcutDinov3Predictor.
from ncut_pytorch.predictor import NcutDinov3Predictor
from PIL import Image
# Initialize the predictor with a specific model configuration
predictor = NcutDinov3Predictor(model_cfg="dinov3_vitl16")
predictor = predictor.to('cuda')
# Load the input image as a one-element list
# Note: to process several images together, extend the list:
# images = [Image.open("view_0.jpg"), Image.open("view_1.jpg")]
images = [Image.open("example.jpg")]
predictor.set_images(images)
# Generate segmentation masks with a specified number of segments
segments = predictor.generate(n_segment=20)
# Create a colored visualization of the segmentation (with borders)
color = predictor.color_discrete(segments, draw_border=True)
output_image = color[0]
# Save the result
output_image.save("segments.jpg")
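To give a feel for what a colored visualization with borders involves, here is a minimal NumPy sketch. It is not the library's color_discrete implementation: the helper name colorize_labels, the random palette, and the assumption that a segmentation is an (H, W) integer label map are all hypothetical choices made for illustration.

```python
import numpy as np

def colorize_labels(labels, draw_border=True, seed=0):
    """Map an (H, W) integer label map to an (H, W, 3) uint8 color image."""
    rng = np.random.default_rng(seed)
    # One random RGB color per label (hypothetical palette choice)
    palette = rng.integers(0, 256, size=(labels.max() + 1, 3), dtype=np.uint8)
    image = palette[labels]  # fancy indexing returns a fresh array
    if draw_border:
        border = np.zeros(labels.shape, dtype=bool)
        # A pixel is on a border if its right or lower neighbor has another label
        border[:, :-1] |= labels[:, :-1] != labels[:, 1:]
        border[:-1, :] |= labels[:-1, :] != labels[1:, :]
        image[border] = 0  # paint borders black
    return image

# Toy label map: left half is segment 0, right half is segment 1
labels = np.zeros((4, 6), dtype=int)
labels[:, 3:] = 1
image = colorize_labels(labels)
```

In this toy case the only border is the column where segment 0 meets segment 1.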
Understanding K-way NCut Segmentation
The visualizations below illustrate the results of applying K-way NCut to features extracted from a DINOv3 (ViT-H/16+) model. The layout presents the original image, the discrete NCut assignments, and the clustering centroids.
The choice of \(K\) (the number of clusters) significantly impacts the segmentation granularity:
- Larger \(K\): Results in finer segmentation, capturing more detail but potentially introducing noise or over-segmenting coherent objects.
- Smaller \(K\): Produces coarser segmentation, merging distinct areas into broader regions.
- Positional Encoding: You may observe background segmentation patterns; these are often artifacts of the DINO architecture's positional encoding.
Use the tabs below to observe how the segmentation evolves with different values of \(K\).
The Role of K
As demonstrated above, selecting an appropriate \(K\) is a trade-off between detail and interpretability. An optimal \(K\) yields segmentation maps that align closely with perceptually coherent regions or objects, avoiding both the over-segmentation of textures and the under-segmentation of distinct structural elements.
Intermediate Outputs and Implementation Details
For a deeper understanding of the process, we can examine the intermediate outputs, specifically the transition from continuous eigenvectors to discrete clusters.
import torch
from ncut_pytorch import Ncut, kway_ncut
# 1. Example features: shape (n, d)
features = torch.rand(1960, 768)
# 2. Compute continuous eigenvectors from NCut, shape (n, k)
# These vectors represent the continuous partitioning of the graph
eigvecs = Ncut(n_eig=20).fit_transform(features) # (1960, 20)
# 3. Align for discretization-friendly basis
# K-way NCut rotates/transforms the eigenvectors to be more axis-aligned
kway_eigvecs = kway_ncut(eigvecs)
# 4. Cluster assignment and (axis-wise) centroids
# Row-wise argmax: assign each node to the channel with its largest aligned response
cluster_assignment = kway_eigvecs.argmax(1)
# Column-wise argmax: for each channel, the index of the node with the strongest response
cluster_centroids = kway_eigvecs.argmax(0)
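To make the two argmax calls concrete, the following sketch applies them to a small hand-written matrix. The values are made up for illustration and NumPy stands in for PyTorch; the axis semantics are the same.

```python
import numpy as np

# Hypothetical aligned responses for 5 nodes (rows) and 3 channels (columns)
aligned = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.7, 0.2],
    [0.0, 0.3, 0.9],
    [0.2, 0.1, 0.6],
])

# Row-wise argmax: the channel with the strongest response labels each node
assignment = aligned.argmax(axis=1)

# Column-wise argmax: the node that responds most strongly in each channel
centroids = aligned.argmax(axis=0)

# One-hot form of the assignment, shape (5, 3)
onehot = np.eye(aligned.shape[1], dtype=int)[assignment]
```

Here `assignment` holds one cluster label per node, while `centroids` holds one node index per channel, so the two arrays have different lengths whenever the number of nodes differs from the number of channels.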
Visualization: Before vs. After K-way Alignment
The panels below compare the raw eigenvectors from the standard NCut algorithm with the axis-aligned projection channels obtained after K-way alignment.
- Before K-way (NCut Eigenvectors): The eigenvectors show smooth, continuous variations. Early eigenvectors typically capture low-frequency global structures (often nearly constant), while later ones capture higher-frequency details.
- After K-way (Aligned Channels): Applying K-way alignment transforms these vectors into more axis-aligned, unimodal representations. Each channel tends to highlight a specific cluster (e.g., a face or a background region), making the results significantly sharper and easier to discretize.
Before K-way (NCut eigenvectors)
In theory, the first eigenvector (first row) is nearly constant; later rows carry higher-frequency information.
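The near-constant first eigenvector can be checked on a toy graph. The sketch below uses the textbook dense construction (Gaussian affinity, symmetric normalization, full eigendecomposition). It is an assumption-laden illustration, not how ncut_pytorch computes eigenvectors internally; the random features and the kernel bandwidth are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.standard_normal((40, 8))  # toy features, shape (n, d)

# Gaussian affinity from pairwise squared distances
sq = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / sq.mean())

# Symmetric normalization D^{-1/2} W D^{-1/2}
d = W.sum(axis=1)
A = W / np.sqrt(np.outer(d, d))

# eigh returns eigenvalues in ascending order; flip to put the largest first
vals, vecs = np.linalg.eigh(A)
vals, vecs = vals[::-1], vecs[:, ::-1]

# Undo the D^{1/2} scaling to obtain the NCut (generalized) eigenvectors
eigvecs = vecs / np.sqrt(d)[:, None]

first = eigvecs[:, 0]
relative_spread = first.std() / abs(first.mean())
print(vals[0], relative_spread)
```

With an exact eigendecomposition the leading eigenvalue is 1 and the first eigenvector is constant up to floating-point error. With approximate solvers or real image features it is only nearly constant.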
After K-way (K-way projection channels, k=11)
These are the 11 channel responses before one-hot encoding; after alignment, the eigenvectors become more axis-aligned (unimodal).
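One common way to obtain such an axis-aligned basis is to search for an orthogonal rotation that brings the row-normalized eigenvectors as close as possible to a one-hot matrix, alternating between discretization and an SVD-based rotation update. The sketch below follows that general recipe in NumPy. Whether kway_ncut uses this exact procedure is an assumption; the function name axis_align and the toy data are hypothetical.

```python
import numpy as np

def axis_align(eigvecs, n_iter=50, tol=1e-8):
    """Rotate row-normalized eigenvectors toward a one-hot (axis-aligned) basis."""
    n, k = eigvecs.shape
    # Row-normalize so every node lies on the unit sphere
    X = eigvecs / np.linalg.norm(eigvecs, axis=1, keepdims=True)

    # Initialize R from k rows that are as mutually orthogonal as possible
    R = np.zeros((k, k))
    R[:, 0] = X[0]
    overlap = np.zeros(n)
    for j in range(1, k):
        overlap += np.abs(X @ R[:, j - 1])
        R[:, j] = X[np.argmin(overlap)]

    prev = -np.inf
    for _ in range(n_iter):
        # Discretize: one-hot of the strongest channel per node
        labels = (X @ R).argmax(axis=1)
        onehot = np.eye(k)[labels]
        # Best orthogonal rotation toward the current one-hot target
        U, S, Vt = np.linalg.svd(onehot.T @ X)
        if S.sum() - prev < tol:
            break
        prev = S.sum()
        R = Vt.T @ U.T
    return X @ R, R

# Toy data: noisy one-hot indicators for 3 clusters, hidden by a random rotation
rng = np.random.default_rng(0)
truth = np.repeat(np.arange(3), 10)
Z = np.eye(3)[truth] + 0.05 * rng.standard_normal((30, 3))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))

aligned, R = axis_align(Z @ Q)
labels = aligned.argmax(axis=1)
```

On this toy input the rotated columns each peak on a single cluster, so the row-wise argmax recovers the three groups up to a relabeling of the clusters.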