Results and Filtering

Results holds the enrichment table and attached context (matrix, clusters, layout). It is passed to Plotter for visualization.

Common Attributes

Attribute	Type	Description
`results.df`	`pd.DataFrame`	Enrichment table (`cluster`, `term`, `k`, `K`, `n`, `N`, `pval`, and optional `qval`).
`results.method`	`str`	Method identifier for the result object (for example, `"hypergeom"` after enrichment or `"subset"` after `results.subset(...)` / `results.subset_clusters(...)`).
`results.params`	`dict[str, Any]`	Analysis metadata attached to results (for example, `linkage_threshold` when available).
`results.matrix`	`Matrix | None`	Matrix attached to the result object, useful for zoom workflows and background reuse.
`results.clusters`	`Clusters | None`	Cluster assignments and dendrogram metadata attached to the result object.
`results.clusters.unique_clusters`	`np.ndarray`	Sorted cluster ids present in the result context (when clusters are attached).
`results.clusters.cluster_sizes`	`dict[int, int]`	Mapping from cluster id to cluster size (when clusters are attached).
`results.clusters.cluster_to_labels`	`dict[int, set[Any]]`	Mapping from cluster id to member labels (when clusters are attached).
`results.clusters.label_to_cluster`	`dict[Any, int]`	Mapping from label to cluster id (when clusters are attached).
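
The three cluster mappings above describe the same assignment from different directions. The following is a minimal sketch in plain Python of how they relate, assuming only that each label belongs to exactly one cluster. The toy labels and variable names are illustrative and are not taken from HiMaLAYAS.

```python
from collections import defaultdict

# Toy assignment of labels to cluster ids (hypothetical data).
label_to_cluster = {"geneA": 2, "geneB": 1, "geneC": 2, "geneD": 3, "geneE": 1}

# Invert the mapping: cluster id -> set of member labels.
grouped = defaultdict(set)
for label, cluster_id in label_to_cluster.items():
    grouped[cluster_id].add(label)
cluster_to_labels = dict(grouped)

# Cluster sizes follow directly from the member sets.
cluster_sizes = {cid: len(members) for cid, members in cluster_to_labels.items()}

# Sorted cluster ids, analogous to unique_clusters.
unique_clusters = sorted(cluster_to_labels)

print(unique_clusters)
print(cluster_sizes)
```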

Common Methods

Results.filter(expr: str, **kwargs: Any) -> Results
Results.subset(cluster: int) -> Results
Results.subset_clusters(clusters: Iterable[int]) -> Results
Results.with_qvalues(pval_col: str = "pval", qval_col: str = "qval") -> Results
Results.cluster_layout() -> ClusterLayout
Results.cluster_spans() -> list[tuple[int, int, int]]
Results.cluster_labels(
    *,
    rank_by: str = "p",
    label_mode: str = "top_term",
    max_words: int = 6,
) -> pd.DataFrame

Method	Description
`results.filter(...)`	Returns a new `Results` filtered by a query expression on `results.df`.
`results.subset(...)`	Returns a single-cluster view for zoom workflows (with subset matrix attached).
`results.subset_clusters(...)`	Returns a multi-cluster view for zoom workflows by taking the union of selected cluster labels.
`results.with_qvalues(...)`	Returns a new `Results` with BH-FDR q-values added to `results.df`.
`results.cluster_layout()`	Returns the attached plotting layout (required by `Plotter`).
`results.cluster_spans()`	Returns contiguous cluster spans in dendrogram order.
`results.cluster_labels(...)`	Builds one label per cluster for inspection or export.

`filter`

Results.filter(expr: str, **kwargs: Any) -> Results

Returns a new Results filtered by a query expression on results.df.

Parameter	Type	Default	Description
`expr`	`str`	required	`pandas.DataFrame.query` expression applied to `results.df`.
`**kwargs`	`Any`	`{}`	Additional keyword arguments forwarded to `DataFrame.query`.
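
Because `expr` is a `pandas.DataFrame.query` expression, it can be tried out on a plain DataFrame first. The sketch below uses a toy table with the documented column names and passes `local_dict`, a keyword that `DataFrame.query` forwards to `pandas.eval`, to supply the value behind `@min_overlap`. The data and the `min_overlap` name are made up for illustration. Whether `Results.filter` resolves `@name` references from the caller's scope is not stated here, so the example passes the value explicitly.

```python
import pandas as pd

# Toy enrichment table using the column names documented for results.df.
df = pd.DataFrame(
    {
        "cluster": [1, 1, 2, 2],
        "term": ["T1", "T2", "T1", "T3"],
        "k": [5, 2, 4, 1],
        "pval": [0.001, 0.030, 0.004, 0.400],
        "qval": [0.004, 0.040, 0.008, 0.400],
    }
)

# Combine conditions with `and`; `@min_overlap` is looked up in local_dict.
expr = "qval <= 0.05 and k >= @min_overlap"
filtered = df.query(expr, local_dict={"min_overlap": 3})

print(filtered[["cluster", "term", "k", "qval"]])
```

With a `Results` object, the same expression string would be passed as `results.filter(expr, local_dict={...})`, assuming the keyword is forwarded as the parameter table describes.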

`subset`

Results.subset(cluster: int) -> Results

Returns a single-cluster view for zoom workflows (with subset matrix attached).

Parameter	Type	Default	Description
`cluster`	`int`	required	Cluster id to subset. Returns a new `Results` view with a subset matrix attached.

`subset_clusters`

Results.subset_clusters(clusters: Iterable[int]) -> Results

Returns a multi-cluster view for zoom workflows by taking the union of selected cluster labels.

Parameter	Type	Default	Description
`clusters`	`Iterable[int]`	required	Cluster ids to subset. Returns one combined `Results` view with a subset matrix attached.
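
The "union of selected cluster labels" can be pictured with plain sets. This is a rough sketch under stated assumptions, not the HiMaLAYAS implementation: the matrix is modeled as a dict of rows keyed by label, and `union_labels` is a hypothetical helper.

```python
# Hypothetical cluster membership and a toy matrix keyed by row label.
cluster_to_labels = {2: {"geneA", "geneC"}, 7: {"geneD"}, 9: {"geneB", "geneE"}}
matrix_rows = {
    "geneA": [0.1, 0.9],
    "geneB": [0.4, 0.2],
    "geneC": [0.3, 0.8],
    "geneD": [0.7, 0.5],
    "geneE": [0.6, 0.1],
}


def union_labels(cluster_to_labels, clusters):
    """Collect every label that belongs to any of the requested clusters."""
    selected = set()
    for cluster_id in clusters:
        selected |= cluster_to_labels[cluster_id]
    return selected


selected = union_labels(cluster_to_labels, [2, 9])

# Keep only rows whose label is in the union, preserving the original row order.
subset_rows = {label: row for label, row in matrix_rows.items() if label in selected}

print(sorted(selected))
print(list(subset_rows))
```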

`with_qvalues`

Results.with_qvalues(pval_col: str = "pval", qval_col: str = "qval") -> Results

Returns a new Results with BH-FDR q-values added to results.df.

Parameter	Type	Default	Description
`pval_col`	`str`	`"pval"`	Source p-value column used for BH-FDR correction.
`qval_col`	`str`	`"qval"`	Output q-value column name.
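
For reference, the Benjamini-Hochberg (BH) adjustment can be written in a few lines of plain Python. This sketch applies the standard BH step-up procedure to one flat list of p-values. Whether `with_qvalues` corrects across the whole table or within groups is not stated here, so treat the scope as an assumption. `bh_qvalues` is an illustrative name.

```python
def bh_qvalues(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        qvals[idx] = running_min
    return qvals


pvals = [0.01, 0.04, 0.03, 0.20]
qvals = bh_qvalues(pvals)
print([round(q, 4) for q in qvals])  # [0.04, 0.0533, 0.0533, 0.2]
```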

`cluster_layout` and `cluster_spans`

Results.cluster_layout() -> ClusterLayout
Results.cluster_spans() -> list[tuple[int, int, int]]

results.cluster_layout() returns the attached plotting layout (required by Plotter). results.cluster_spans() returns contiguous cluster spans in dendrogram order.
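
To illustrate what "contiguous cluster spans in dendrogram order" means, the sketch below groups a sequence of per-leaf cluster ids into runs. The signature only states `tuple[int, int, int]`, so the tuple layout used here, `(cluster_id, start, end)` with an inclusive `end`, is an assumption. `contiguous_spans` is a hypothetical helper.

```python
from itertools import groupby


def contiguous_spans(leaf_clusters):
    """Group consecutive equal cluster ids into (cluster_id, start, end) runs.

    `end` is inclusive; this tuple layout is assumed for illustration.
    """
    spans = []
    position = 0
    for cluster_id, run in groupby(leaf_clusters):
        length = sum(1 for _ in run)
        spans.append((cluster_id, position, position + length - 1))
        position += length
    return spans


# Cluster id of each leaf, listed in dendrogram order (toy data).
leaf_clusters = [3, 3, 3, 1, 1, 2, 2, 2, 2]
spans = contiguous_spans(leaf_clusters)
print(spans)  # [(3, 0, 2), (1, 3, 4), (2, 5, 8)]
```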

`cluster_labels`

Results.cluster_labels(
    *,
    rank_by: str = "p",
    label_mode: str = "top_term",
    max_words: int = 6,
) -> pd.DataFrame

Builds one label per cluster for inspection or export.

Parameter	Type	Default	Description
`rank_by`	`str`	`"p"`	Ranking statistic used to pick representative terms. Must be `"p"` or `"q"`; any other value raises `ValueError`.
`label_mode`	`str`	`"top_term"`	One of `"top_term"` or `"compressed"`.
`max_words`	`int`	`6`	Maximum words for compressed labels.

Behavior details:

Uses canonical input columns term and cluster; term_name is used as an optional display fallback.
Returns one row per cluster with columns ["cluster", "label", "pval", "qval", "score", "n", "term"].
score is the statistic selected by rank_by (pval for "p", qval for "q").
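Both label_mode="top_term" and label_mode="compressed" require the selected score column to be present in results.df (pval for rank_by="p", qval for rank_by="q"). Because qval is optional, rank_by="q" may require calling results.with_qvalues() first.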
In label_mode="compressed", HiMaLAYAS uses NLTK normalization when available and falls back to regex tokenization otherwise.
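
The "top_term" idea can be sketched without the library: for each cluster, keep the row with the smallest score. The code below is an illustrative approximation only. It assumes the display label prefers `term_name` when present and otherwise uses `term`, and it ignores tie-breaking, which is not described here. `top_term_labels` is a hypothetical helper.

```python
# Toy enrichment rows (hypothetical data).
rows = [
    {"cluster": 1, "term": "T1", "term_name": "DNA repair", "pval": 0.001},
    {"cluster": 1, "term": "T2", "term_name": "cell cycle", "pval": 0.030},
    {"cluster": 2, "term": "T3", "term_name": None, "pval": 0.004},
    {"cluster": 2, "term": "T1", "term_name": "DNA repair", "pval": 0.200},
]


def top_term_labels(rows, score_col="pval"):
    """Pick the lowest-scoring row per cluster and build one label from it."""
    best = {}
    for row in rows:
        current = best.get(row["cluster"])
        if current is None or row[score_col] < current[score_col]:
            best[row["cluster"]] = row
    return [
        {
            "cluster": cluster_id,
            # Assumed display rule: term_name if available, else the term id.
            "label": row["term_name"] or row["term"],
            "score": row[score_col],
            "term": row["term"],
        }
        for cluster_id, row in sorted(best.items())
    ]


labels = top_term_labels(rows)
for entry in labels:
    print(entry)
```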

Results.cluster_labels(...) is an optional post hoc utility for inspection, export, or external workflows. You do not need to pass its output into Plotter.plot_cluster_labels(...) or plot_dendrogram_condensed(...); both generate labels internally from the attached Results.

Examples

Filter to significant annotation terms:

results_sig = results.filter("qval <= 0.05")

Subset to a single cluster (for zoom analysis):

zoom_view = results.subset(cluster=7)
zoom_matrix = zoom_view.matrix

Subset to multiple clusters (union view):

zoom_view_multi = results.subset_clusters(clusters=[2, 7, 9])
zoom_matrix_multi = zoom_view_multi.matrix

Build optional cluster labels for inspection or export:

cluster_labels = results.cluster_labels(rank_by="q", label_mode="top_term")
compressed_labels = results.cluster_labels(rank_by="p", label_mode="compressed", max_words=5)

display(cluster_labels[["cluster", "label", "pval", "qval", "score", "n", "term"]].head())

Inspect cluster membership and sizes:

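# Mapping from cluster id to cluster size.
display(results.clusters.cluster_sizes)
# Pick the smallest cluster id and show up to ten of its member labels.
example_cluster = int(sorted(results.clusters.cluster_sizes)[0])
display(sorted(results.clusters.cluster_to_labels[example_cluster])[:10])
# Look up the cluster id assigned to one matrix label.
example_label = sorted(results.matrix.labels)[0]
display(results.clusters.label_to_cluster[example_label])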

Key Columns in `results.df`

cluster: Cluster id.
term: Term id.
k: Overlap between cluster and term.
K: Term size in background.
n: Cluster size.
N: Background size.
pval: Hypergeometric p-value.
qval: Adjusted p-value used for significance filtering (if present).
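
As a worked illustration of how `k`, `K`, `n`, and `N` combine, the sketch below computes an upper-tail hypergeometric p-value, P(X >= k), using only the standard library. The upper-tail (over-representation) form is the usual choice for enrichment but is an assumption here, since the column is described only as a hypergeometric p-value. `hypergeom_upper_tail` is an illustrative name.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when drawing n items from N, of which K are annotated."""
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probability of every overlap at least as large as k.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Background of 10 labels, 4 annotated with the term, cluster of 3, overlap of 2.
pval = hypergeom_upper_tail(k=2, K=4, n=3, N=10)
print(round(pval, 4))  # 0.3333
```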
