Zoom and Non-Biological Workflows

This page collects two advanced patterns used in the example notebooks.

Cluster Zoom Analysis

A common pattern is to zoom into one cluster, re-run clustering and enrichment, and then plot a higher-resolution view. The same pattern also supports a union of multiple clusters.

  1. Subset the results to one cluster (results.subset(...)) or multiple clusters (results.subset_clusters(...)).
  2. Rebind annotations to the subset matrix.
  3. Cut the subset dendrogram at a lower depth.
  4. Re-run enrichment with a background (parent) matrix.

Repeat the analysis at different dendrogram depths to explore depth-dependent enrichment.

Subset reruns are often much smaller than the full matrix, so annotation terms shrink with them. Even so, keep the default annotation term-size floor (min_term_size=2) unless you have a specific exploratory reason to change it.
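What the floor does is simple to state: terms annotating fewer items than the threshold are dropped before enrichment. A plain-Python sketch of that behavior (the helper name and dict layout here are illustrative, not HiMaLAYAS API):

```python
# Illustrative sketch of a term-size floor; not the HiMaLAYAS implementation.
def apply_term_size_floor(term_to_items, min_term_size=2):
    """Keep only annotation terms with at least `min_term_size` members."""
    return {
        term: items
        for term, items in term_to_items.items()
        if len(items) >= min_term_size
    }

terms = {"curry": ["r_001", "r_003"], "garnish": ["r_002"]}
filtered = apply_term_size_floor(terms)  # the singleton term "garnish" is dropped
```

Lowering the floor to 1 keeps singleton terms, which can only ever overlap a cluster in one item and rarely produce meaningful enrichment.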

For API details on subset(...) and subset_clusters(...), see Results and Filtering.

def run_zoom_analysis(
    *,
    results,
    cluster_id,
    annotations,
    linkage_threshold,
    min_cluster_size=6,
    min_overlap=2,
    fdr_scope="global",
    qval_cutoff=0.05,
):
    """Recluster a single cluster, re-run enrichment, and return zoomed results."""
    zoom_view = results.subset(cluster=cluster_id)
    zoom_matrix = zoom_view.matrix
    zoom_annotations = annotations.rebind(zoom_matrix)
    zoom_analysis = (
        Analysis(zoom_matrix, zoom_annotations)
        .cluster(
            linkage_method="ward",
            linkage_metric="euclidean",
            linkage_threshold=linkage_threshold,
            min_cluster_size=min_cluster_size,
        )
        .enrich(min_overlap=min_overlap, background=results.matrix)
        .finalize(col_cluster=True, fdr_scope=fdr_scope)
    )
    zoom_results = zoom_analysis.results
    zoom_results_sig = zoom_results.filter(f"qval <= {qval_cutoff}")
    return zoom_matrix, zoom_results, zoom_results_sig

Why background=results.matrix? It keeps the enrichment universe fixed so zoomed p-values remain comparable to the full analysis. By default, subset reruns use the subset matrix as the local enrichment universe; pass the parent matrix explicitly via background to anchor enrichment to the parent universe. q-values may still differ between master and subset reruns because FDR correction depends on the tested hypothesis family.
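This page does not state which test statistic HiMaLAYAS uses; assuming a one-sided hypergeometric test, as is common for cluster enrichment, a small sketch shows why the choice of universe matters (the function name and all sizes below are illustrative):

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """One-sided hypergeometric tail P(X >= k): the probability that a
    cluster of n items overlaps a K-member term in at least k items when
    items are drawn from a universe of size N."""
    return sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)
    ) / comb(N, n)

# Same overlap (5 of 10 cluster items hit a 12-member term), two universes:
# the subset's local universe (40 items) vs. the fixed parent universe (200).
p_local = hypergeom_pvalue(5, 10, 12, 40)
p_parent = hypergeom_pvalue(5, 10, 12, 200)
```

The same overlap yields a far smaller p-value against the parent universe, where the term is rare relative to the whole matrix. Anchoring the universe with background=results.matrix is what keeps those p-values on the same scale across the master and zoomed runs.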

Single-Cluster Example

Choose a cluster and a tighter cut, then run the zoom:

example_cluster = int(results.clusters.unique_clusters[0])
zoom_threshold = 6
FDR_SCOPE = "global"

zoom_matrix, zoom_results, zoom_results_sig = run_zoom_analysis(
    results=results,
    cluster_id=example_cluster,
    annotations=annotations,
    linkage_threshold=zoom_threshold,
    fdr_scope=FDR_SCOPE,
)

Then plot the zoomed matrix with the same Plotter pipeline:

plotter = (
    Plotter(zoom_results)
    .plot_dendrogram()
    .plot_matrix(cmap="RdBu_r", center=0)
    .plot_cluster_labels(rank_by="q", label_mode="top_term", label_fields=("label", "q", "fe"))
)

plotter.show()

Multi-Cluster Example (Union View)

Use the same zoom pipeline, but subset a union of parent clusters:

cluster_ids = [2, 7, 9]
zoom_threshold = 6
FDR_SCOPE = "global"

zoom_view = results.subset_clusters(clusters=cluster_ids)
zoom_matrix = zoom_view.matrix
zoom_annotations = annotations.rebind(zoom_matrix)
zoom_analysis = (
    Analysis(zoom_matrix, zoom_annotations)
    .cluster(
        linkage_method="ward",
        linkage_metric="euclidean",
        linkage_threshold=zoom_threshold,
        min_cluster_size=6,
    )
    .enrich(min_overlap=2, background=results.matrix)
    .finalize(col_cluster=True, fdr_scope=FDR_SCOPE)
)

zoom_results = zoom_analysis.results
zoom_results_sig = zoom_results.filter("qval <= 0.05")

Then plot with the same Plotter pipeline used above.

For a cluster-level summary of the zoomed result, see Condensed Dendrogram.

Non-Biological Example (Recipes)

HiMaLAYAS supports biological and non-biological domains. The recipe example builds an ingredient-by-recipe matrix and annotates clusters by country of origin using a worldwide recipe dataset.

Key steps:

  • Clean and merge near-duplicate ingredient tokens.
  • Build a sparse binary matrix.
  • Filter out low-frequency ingredients and recipes with very few ingredients.
  • Map countries to recipe IDs and run enrichment.

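The first three steps can be sketched in plain Python (the canonical-token map, helper names, and thresholds are illustrative; HiMaLAYAS's own preprocessing is not shown on this page):

```python
# Illustrative sketch of the cleaning and matrix-building steps; not the
# HiMaLAYAS implementation.
CANONICAL = {"tomatoes": "tomato", "scallions": "green onion"}

def clean_token(token):
    """Lower-case, trim, and merge near-duplicate ingredient tokens."""
    token = token.strip().lower()
    return CANONICAL.get(token, token)

def build_binary_matrix(recipes, min_ingredient_count=2, min_recipe_size=2):
    """Return a {recipe_id: {ingredient: 1}} binary matrix, dropping
    ingredients seen in fewer than `min_ingredient_count` recipes and
    recipes left with fewer than `min_recipe_size` ingredients."""
    cleaned = {rid: {clean_token(t) for t in toks} for rid, toks in recipes.items()}
    counts = {}
    for toks in cleaned.values():
        for t in toks:
            counts[t] = counts.get(t, 0) + 1
    keep = {t for t, c in counts.items() if c >= min_ingredient_count}
    return {
        rid: {t: 1 for t in toks & keep}
        for rid, toks in cleaned.items()
        if len(toks & keep) >= min_recipe_size
    }
```

The resulting rows would then feed the ingredient-by-recipe matrix and the country-to-recipe mapping used below.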
country_to_recipes = {
    "India": ["r_001", "r_003"],
    "Nigeria": ["r_002"],
    "Mexico": ["r_004", "r_005"],
}

matrix = Matrix(ingredient_matrix)
annotations = Annotations(country_to_recipes, matrix)

analysis = (
    Analysis(matrix, annotations)
    .cluster(
        linkage_method="ward",
        linkage_metric="euclidean",
        linkage_threshold=7.5,
        min_cluster_size=15,
    )
    .enrich(min_overlap=2)
    .finalize(col_cluster=True, fdr_scope="global")
)

results = analysis.results
results_sig = results.filter("qval <= 0.05")
cluster_labels = results_sig.cluster_labels(rank_by="q", label_mode="top_term")