Matrix Input
A Matrix wraps a numeric pandas.DataFrame and validates labels and values. This is the primary input for clustering and plotting.
Signature
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `df` | `pd.DataFrame` | required | Numeric matrix with labeled rows. Row labels become the matrix label universe. |
| `axis` | `str` | `"rows"` | Orientation of labels. Use `"rows"` for standard usage. Reserved for column-labeled use cases. |
Common Attributes
| Attribute | Type | Description |
|---|---|---|
| `matrix.df` | `pd.DataFrame` | Stored matrix table (defensive copy of the input DataFrame). |
| `matrix.values` | `np.ndarray` | Numeric matrix values as a NumPy array. |
| `matrix.labels` | `np.ndarray` | Row labels used as the matrix label universe for clustering and enrichment. |
| `matrix.axis` | `str` | Label orientation metadata (`"rows"` by default). |
Example
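```python
import pandas as pd
from himalayas import Matrix

# Load a tab-separated table; the first column supplies the row labels.
DF = pd.read_csv("data/gi_pcc_sampled.tsv", sep="\t", index_col=0)
matrix = Matrix(DF)

print(matrix.values.shape)  # Dimensions of the numeric array.
print(matrix.labels[:5])  # First five row labels.
```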
Example (Stacked / Long Matrix Table)
If your matrix is stored as three columns (row_id, col_id, value), pivot it first:
```python
import pandas as pd
from himalayas import Matrix

stacked_df = pd.DataFrame(
    {
        "row_id": ["GENE_A", "GENE_A", "GENE_B", "GENE_B", "GENE_C"],
        "col_id": ["GENE_A", "GENE_B", "GENE_A", "GENE_C", "GENE_C"],
        "value": [1.00, 0.42, 0.42, -0.30, 1.00],
    }
)

matrix_df = stacked_df.pivot_table(
    index="row_id",
    columns="col_id",
    values="value",
    aggfunc="mean",  # If duplicates exist for the same row/col pair, average them.
)

# Build a square matrix over the full label universe.
labels = sorted(set(matrix_df.index).union(matrix_df.columns))
DF = matrix_df.reindex(index=labels, columns=labels)

# Choose how to handle missing row/col pairs for your data.
DF = DF.fillna(0.0)
matrix = Matrix(DF)
```
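Filling with `0.0` treats every unobserved pair as zero. For a symmetric measure such as a correlation, one alternative is to mirror values from the transposed table before filling. The sketch below uses only pandas and NumPy; the table reproduces the pivoted example data, and the decision to mirror is an illustrative assumption, not HiMaLAYAS behavior.

```python
import numpy as np
import pandas as pd

labels = ["GENE_A", "GENE_B", "GENE_C"]

# Square table in which (GENE_B, GENE_C) was observed
# but (GENE_C, GENE_B) was not.
square_df = pd.DataFrame(
    [
        [1.00, 0.42, np.nan],
        [0.42, np.nan, -0.30],
        [np.nan, np.nan, 1.00],
    ],
    index=labels,
    columns=labels,
)

# Fill each missing cell from its mirrored cell, where one exists.
mirrored_df = square_df.combine_first(square_df.T)

# Pairs missing in both directions are still NaN; fill them explicitly.
filled_df = mirrored_df.fillna(0.0)
```

In this sketch the diagonal entry for `GENE_B` is missing in both directions and ends up as `0.0`; whether that is appropriate depends on the measure being stored.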
Common Errors
- `Matrix must have at least one row and one column` if the DataFrame is empty.
- `Matrix labels must be unique` if the index has duplicates.
- `Matrix values must be numeric` if any entries are non-numeric.
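The conditions behind these messages can be checked with plain pandas before constructing a `Matrix`. The helper below is a hypothetical sketch: `find_input_problems` is not part of HiMaLAYAS, and its checks may not match the library's own validation exactly.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def find_input_problems(df: pd.DataFrame) -> list[str]:
    """Return human-readable problems found in a candidate matrix table."""
    problems = []
    # An empty table has zero rows or zero columns.
    if df.shape[0] == 0 or df.shape[1] == 0:
        problems.append("empty table")
    # Duplicate row labels would make the label universe ambiguous.
    if df.index.has_duplicates:
        problems.append("duplicate row labels")
    # Collect columns whose dtype is not numeric (for example, object).
    non_numeric = [
        str(col) for col, dtype in df.dtypes.items() if not is_numeric_dtype(dtype)
    ]
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    return problems


# Illustrative table with a repeated row label and a text column.
bad_df = pd.DataFrame(
    {"s1": [0.1, 0.2, 0.3], "s2": ["x", "y", "z"]},
    index=["GENE_A", "GENE_A", "GENE_B"],
)
print(find_input_problems(bad_df))
```

Returning a list rather than raising on the first problem makes it easier to report every issue in one pass.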
Notes
- Missing values should be handled before creating a `Matrix`.
- Matrix symmetry and normalization are user-defined; HiMaLAYAS does not enforce them.
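Since missing values and symmetry are left to the user, it can help to inspect both before building a `Matrix`. The following is a minimal sketch using pandas and NumPy; the example table and the tolerance are illustrative assumptions.

```python
import numpy as np
import pandas as pd

labels = ["GENE_A", "GENE_B", "GENE_C"]
table = pd.DataFrame(
    [
        [1.00, 0.42, 0.10],
        [0.42, 1.00, np.nan],
        [0.10, -0.30, 1.00],
    ],
    index=labels,
    columns=labels,
)

# Count missing cells so they can be handled deliberately.
n_missing = int(table.isna().sum().sum())

# Symmetry is only meaningful when rows and columns share the same labels.
is_square = table.index.equals(table.columns)

# equal_nan=True lets two NaN cells compare equal. Here the NaN at
# (GENE_B, GENE_C) faces -0.30 at (GENE_C, GENE_B), so the check fails.
is_symmetric = is_square and bool(
    np.allclose(table.to_numpy(), table.to_numpy().T, atol=1e-8, equal_nan=True)
)

print(n_missing, is_square, is_symmetric)
```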