
Clustering and Layout

HiMaLAYAS uses hierarchical clustering to organize matrix rows and columns into contiguous regions of related observations. Clusters are defined by cutting the dendrogram at a user-defined depth (distance threshold).
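
To make the idea of cutting a dendrogram at a distance threshold concrete, here is a minimal sketch that uses SciPy directly rather than the HiMaLAYAS API. The synthetic matrix and the two threshold values are assumptions chosen for illustration; the sketch is not a description of HiMaLAYAS internals.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
# Synthetic data (assumption): three well-separated groups of 10 rows each.
matrix = np.vstack([
    rng.normal(loc=center, scale=0.1, size=(10, 4))
    for center in (0.0, 5.0, 10.0)
])

# Build the dendrogram once, then cut it at two different distance thresholds.
Z = linkage(matrix, method="ward", metric="euclidean")
labels_low = fcluster(Z, t=5.0, criterion="distance")     # low cut
labels_high = fcluster(Z, t=100.0, criterion="distance")  # high cut

n_low = len(np.unique(labels_low))
n_high = len(np.unique(labels_high))
print(n_low, n_high)  # the lower threshold yields more clusters
```

In this SciPy sketch the threshold is on the same scale as the merge distances in the linkage matrix, so suitable values depend on the data, the metric, and the linkage method.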

Signature

Analysis.cluster(
    linkage_method: str = "ward",
    linkage_metric: str = "euclidean",
    linkage_threshold: float = 0.7,
    *,
    optimal_ordering: bool = False,
    min_cluster_size: int = 1,
) -> Analysis

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| linkage_method | str | "ward" | Hierarchical linkage method. Common: ward, average, complete, single. |
| linkage_metric | str | "euclidean" | Distance metric. Common: euclidean, correlation, cosine, cityblock. |
| linkage_threshold | float | 0.7 | Dendrogram cut threshold (depth). Lower gives more clusters, higher gives fewer. |
| optimal_ordering | bool | False | Enables optimal leaf ordering in linkage output. Often improves visual ordering, but can be slower. |
| min_cluster_size | int | 1 | Merge small clusters upward until size is met. Values <= 1 disable. |
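
One way to picture "merge small clusters upward" is sketched below with SciPy's tree utilities. The helper `cut_with_min_size` is hypothetical and not part of HiMaLAYAS. It assumes one plausible reading of the rule: a dendrogram node above the threshold is split only if both of its children hold at least `min_size` leaves, otherwise the whole node is kept as one cluster. The actual HiMaLAYAS behavior may differ in detail.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree


def cut_with_min_size(Z, threshold, min_size):
    """Cut linkage matrix Z at `threshold`, keeping undersized groups with their parent.

    Hypothetical helper for illustration only; not the HiMaLAYAS implementation.
    Returns a list of clusters, each a sorted list of leaf (row) indices.
    """
    clusters = []
    stack = [to_tree(Z)]
    while stack:
        node = stack.pop()
        splittable = (
            not node.is_leaf()
            and node.dist > threshold
            # Split only if neither child would be smaller than min_size.
            and min(node.left.count, node.right.count) >= min_size
        )
        if splittable:
            stack.extend([node.left, node.right])
        else:
            # pre_order() returns the ids of all leaves under this node.
            clusters.append(sorted(node.pre_order()))
    return clusters


rng = np.random.default_rng(0)
# Synthetic data (assumption): two groups of 10 rows and one small group of 2 rows.
matrix = np.vstack([
    rng.normal(loc=0.0, scale=0.05, size=(10, 4)),
    rng.normal(loc=5.0, scale=0.05, size=(10, 4)),
    rng.normal(loc=7.0, scale=0.05, size=(2, 4)),
])
Z = linkage(matrix, method="ward", metric="euclidean")

plain = cut_with_min_size(Z, threshold=3.0, min_size=1)   # no merging
merged = cut_with_min_size(Z, threshold=3.0, min_size=5)  # small group absorbed
print(sorted(len(c) for c in plain), sorted(len(c) for c in merged))
```

With `min_size=1` the two-row group stays a cluster of its own. With `min_size=5` it is folded into the parent node it shares with its nearest neighbor group, so cluster boundaries still follow the dendrogram.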

Example

analysis = Analysis(matrix, annotations).cluster(
    linkage_method="ward",
    linkage_metric="euclidean",
    linkage_threshold=16,
    min_cluster_size=30,
)

After clustering, cluster assignments are attached as:

| Attribute | Type | Description |
| --- | --- | --- |
| analysis.clusters | Clusters | Dendrogram, per-label cluster IDs, and cluster membership mappings. |

Notes

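  • Any method or metric supported by SciPy linkage is valid (see the SciPy linkage docs). Note that SciPy defines the centroid, median, and ward methods correctly only for the Euclidean metric.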
  • When optimal_ordering=False, HiMaLAYAS uses fastcluster if installed and otherwise falls back to SciPy linkage.
  • min_cluster_size preserves hierarchy by merging undersized clusters into their parent cluster.
  • linkage_method, linkage_metric, and optimal_ordering are reused by Analysis.finalize(col_cluster=True) for optional column ordering.
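
The backend selection described in the notes can be sketched as follows. The helper `compute_linkage` is hypothetical and only illustrates the pattern of preferring fastcluster when it is installed. It assumes SciPy is used whenever `optimal_ordering=True`, since `optimal_ordering` is an option of SciPy's `linkage`. This is not the HiMaLAYAS source.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list
from scipy.cluster.hierarchy import linkage as scipy_linkage

try:
    # fastcluster provides a linkage() function with a SciPy-compatible output.
    from fastcluster import linkage as fast_linkage
except ImportError:
    fast_linkage = None


def compute_linkage(matrix, method="ward", metric="euclidean", optimal_ordering=False):
    """Hypothetical helper: use fastcluster when available, else SciPy."""
    if not optimal_ordering and fast_linkage is not None:
        return fast_linkage(matrix, method=method, metric=metric)
    return scipy_linkage(
        matrix, method=method, metric=metric, optimal_ordering=optimal_ordering
    )


rng = np.random.default_rng(0)
matrix = rng.normal(size=(20, 5))  # synthetic data (assumption)

Z = compute_linkage(matrix)
Z_ordered = compute_linkage(matrix, optimal_ordering=True)

# Leaf order as it would appear along the dendrogram, left to right.
order = leaves_list(Z_ordered)
print(Z.shape, order[:5])
```

Both calls return a linkage matrix with one row per merge. Only the leaf order derived from it can differ when optimal ordering is enabled.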