Zoom and Non-Biological Workflows
This page collects two advanced patterns used in the example notebooks.
Cluster Zoom Analysis
A common pattern is to zoom into one cluster, re-run clustering and enrichment, and then plot a higher-resolution view. The same pattern also supports a union of multiple clusters.
- Subset the results to one cluster (results.subset(...)) or multiple clusters (results.subset_clusters(...)).
- Rebind annotations to the subset matrix.
- Cut the subset dendrogram at a lower depth.
- Re-run enrichment with a background (parent) matrix.
Repeat the analysis at different dendrogram depths to explore depth-dependent enrichment.
Subset reruns are often smaller than the full matrix. Keep the default annotation term-size floor (min_term_size=2) unless you have a specific exploratory reason to change it.
For API details on subset(...) and subset_clusters(...), see Results and Filtering.
def run_zoom_analysis(
    *,
    results,
    cluster_id,
    annotations,
    linkage_threshold,
    min_cluster_size=6,
    min_overlap=2,
    fdr_scope="global",
    qval_cutoff=0.05,
):
    """Recluster a single cluster, re-run enrichment, and return zoomed results."""
    # Restrict the parent results to one cluster and rebind annotations to it.
    zoom_view = results.subset(cluster=cluster_id)
    zoom_matrix = zoom_view.matrix
    zoom_annotations = annotations.rebind(zoom_matrix)
    zoom_analysis = (
        Analysis(zoom_matrix, zoom_annotations)
        # Cut the subset dendrogram at the (lower) zoom threshold.
        .cluster(
            linkage_method="ward",
            linkage_metric="euclidean",
            linkage_threshold=linkage_threshold,
            min_cluster_size=min_cluster_size,
        )
        # Use the parent matrix as the enrichment background.
        .enrich(min_overlap=min_overlap, background=results.matrix)
        .finalize(col_cluster=True, fdr_scope=fdr_scope)
    )
    zoom_results = zoom_analysis.results
    # Keep only terms at or below the q-value cutoff.
    zoom_results_sig = zoom_results.filter(f"qval <= {qval_cutoff}")
    return zoom_matrix, zoom_results, zoom_results_sig
Why background=results.matrix? It keeps the enrichment universe fixed so zoomed p-values remain comparable to the full analysis.
By default, subset reruns use the subset matrix as the local enrichment universe; pass the parent matrix explicitly via background to anchor enrichment to the parent universe.
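The effect of the enrichment universe can be illustrated with a small stand-alone calculation. The sketch below assumes a one-sided hypergeometric over-representation test, a common choice for this kind of enrichment; this page does not state which test the library uses, and all counts are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n rows without replacement from a universe
    of N rows, K of which carry the annotation term."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical counts: a 20-row subcluster in which 8 rows carry the term.
n_cluster, k_overlap = 20, 8

# Local universe: the 60-row zoomed subset, where 15 rows carry the term.
p_local = hypergeom_sf(k_overlap, N=60, K=15, n=n_cluster)

# Parent universe: the 1000-row full matrix, where 30 rows carry the term.
p_parent = hypergeom_sf(k_overlap, N=1000, K=30, n=n_cluster)

print(f"local universe:  p = {p_local:.3g}")
print(f"parent universe: p = {p_parent:.3g}")
```

With these numbers the same overlap looks far more surprising against the parent universe, because the term is rare there but common inside the zoomed subset. The two settings answer different questions, which is why the choice of background should be made explicitly.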
q-values may still differ between the parent analysis and subset reruns because FDR correction depends on the family of hypotheses tested.
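To see why the hypothesis family matters, the following sketch applies a Benjamini-Hochberg adjustment to the same three p-values inside a small and a large family. Benjamini-Hochberg is assumed here for illustration only; the page does not say which FDR procedure the library applies, and the p-values are made up.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the p-value rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        qvals[idx] = running_min
    return qvals


# Three hypothetical p-values shared by both families.
shared = [0.001, 0.01, 0.04]

# Small family: only the shared tests (e.g. a subset rerun).
q_small = bh_qvalues(shared)

# Large family: the shared tests plus seven unremarkable ones.
q_large = bh_qvalues(shared + [0.2, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95])[:3]

for p, qs, ql in zip(shared, q_small, q_large):
    print(f"p={p:<6} q(small family)={qs:.4f}  q(large family)={ql:.4f}")
```

The third test passes a 0.05 cutoff in the small family but not in the large one, even though its p-value is identical in both.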
Single-Cluster Example
Choose a cluster and a tighter cut, then run the zoom:
example_cluster = int(results.clusters.unique_clusters[0])
zoom_threshold = 6
FDR_SCOPE = "global"
zoom_matrix, zoom_results, zoom_results_sig = run_zoom_analysis(
    results=results,
    cluster_id=example_cluster,
    annotations=annotations,
    linkage_threshold=zoom_threshold,
    fdr_scope=FDR_SCOPE,
)
Then plot the zoomed matrix with the same Plotter pipeline:
plotter = (
    Plotter(zoom_results)
    .plot_dendrogram()
    .plot_matrix(cmap="RdBu_r", center=0)
    .plot_cluster_labels(rank_by="q", label_mode="top_term", label_fields=("label", "q", "fe"))
)
plotter.show()
Multi-Cluster Example (Union View)
Use the same zoom pipeline, but subset a union of parent clusters:
cluster_ids = [2, 7, 9]
zoom_threshold = 6
FDR_SCOPE = "global"
zoom_view = results.subset_clusters(clusters=cluster_ids)
zoom_matrix = zoom_view.matrix
zoom_annotations = annotations.rebind(zoom_matrix)
zoom_analysis = (
    Analysis(zoom_matrix, zoom_annotations)
    .cluster(
        linkage_method="ward",
        linkage_metric="euclidean",
        linkage_threshold=zoom_threshold,
        min_cluster_size=6,
    )
    .enrich(min_overlap=2, background=results.matrix)
    .finalize(col_cluster=True, fdr_scope=FDR_SCOPE)
)
zoom_results = zoom_analysis.results
zoom_results_sig = zoom_results.filter("qval <= 0.05")
Then plot with the same Plotter pipeline used above.
For a cluster-level summary of the zoomed result, see Condensed Dendrogram.
Non-Biological Example (Recipes)
HiMaLAYAS supports biological and non-biological domains. The recipe example builds an ingredient-by-recipe matrix and annotates clusters by country of origin using a worldwide recipe dataset.
Key steps:
- Clean and merge near-duplicate ingredient tokens.
- Build a sparse binary matrix.
- Filter low-frequency ingredients and very small recipes.
- Map countries to recipe IDs and run enrichment.
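The first three steps can be sketched with the standard library alone. Everything in this block is hypothetical: the toy recipes, the alias table, the thresholds, and the order of the two filters are illustrative assumptions rather than the notebook's actual preprocessing. Rows are taken to be recipes and columns ingredients, on the assumption that the annotations map countries to recipe IDs.

```python
from collections import Counter

# Hypothetical toy data: recipe ID -> raw ingredient tokens.
raw_recipes = {
    "r_001": ["Cumin ", "onions", "tomato"],
    "r_002": ["onion", "tomatoes", "chili"],
    "r_003": ["cumin", "Onion", "rice"],
    "r_004": ["saffron"],
}

# Hypothetical alias table used to merge near-duplicate tokens.
ALIASES = {"onions": "onion", "tomatoes": "tomato"}

# Illustrative thresholds (assumptions, not the notebook's values).
MIN_INGREDIENT_COUNT = 2  # drop ingredients used by fewer recipes than this
MIN_RECIPE_SIZE = 2       # drop recipes with fewer remaining ingredients


def clean(token):
    """Normalize case and whitespace, then merge known aliases."""
    token = token.strip().lower()
    return ALIASES.get(token, token)


# Step 1: clean tokens; a set per recipe is a simple sparse representation.
cleaned = {rid: {clean(t) for t in toks} for rid, toks in raw_recipes.items()}

# Step 3a: filter low-frequency ingredients.
counts = Counter(ing for ings in cleaned.values() for ing in ings)
kept_ingredients = sorted(i for i, c in counts.items() if c >= MIN_INGREDIENT_COUNT)

# Step 3b: filter recipes that are too small after ingredient filtering.
kept = set(kept_ingredients)
filtered = {rid: ings & kept for rid, ings in cleaned.items()}
filtered = {rid: ings for rid, ings in filtered.items() if len(ings) >= MIN_RECIPE_SIZE}

# Step 2: materialize the binary matrix (rows: recipes, columns: ingredients).
recipe_ids = sorted(filtered)
binary_rows = [
    [int(ing in filtered[rid]) for ing in kept_ingredients] for rid in recipe_ids
]

print(kept_ingredients)
for rid, row in zip(recipe_ids, binary_rows):
    print(rid, row)
```

For a real dataset the per-recipe sets would more likely be converted to a labeled table or a sparse array before clustering; plain lists are used here only to keep the sketch dependency-free.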
country_to_recipes = {
    "India": ["r_001", "r_003"],
    "Nigeria": ["r_002"],
    "Mexico": ["r_004", "r_005"],
}
matrix = Matrix(ingredient_matrix)
annotations = Annotations(country_to_recipes, matrix)
analysis = (
    Analysis(matrix, annotations)
    .cluster(
        linkage_method="ward",
        linkage_metric="euclidean",
        linkage_threshold=7.5,
        min_cluster_size=15,
    )
    .enrich(min_overlap=2)
    .finalize(col_cluster=True, fdr_scope="global")
)
results = analysis.results
results_sig = results.filter("qval <= 0.05")
cluster_labels = results_sig.cluster_labels(rank_by="q", label_mode="top_term")