PAMM: pathway-aware masked representation learning for interpretable multi-cancer prediction
Date
Publisher
BRAC University
Citation
Abstract
This thesis introduces PAMM, a new paradigm for interpretable multi-cancer prediction
based on Pathway-Aware Masked Representation Learning. To tackle the
‘Small n, Large p’ challenge of transcriptomics, we apply a rigorous seven-stage
preprocessing pipeline (including Log2 transformation, ANOVA filtering,
Lasso regularization, and Recursive Feature Elimination) that reduces the original
high-noise set of 57,750 genes in Breast, Lung, GBM, and HC samples to a high-signal
feature set. The architecture goes beyond conventional black-box deep learning
by incorporating biologically relevant priors from the KEGG 2021 Human library into
a self-supervised masking scheme. In contrast to stochastic masking, the pretraining
phase of PAMM uses Pathway-Aware Masking, in which complete sets of functionally
related genes are zeroed out, requiring the model to reconstruct the missing biological
units and learn complex inter-pathway relationships. The latent representations of
the model are optimized with Optuna, statistical robustness is verified over
twenty independent runs, and the latent representation is further interpreted
with single-sample Gene Set Enrichment Analysis (ssGSEA). The resulting visualizations
of mean pathway activity indicate that PAMM captures distinct,
clinically relevant biological signatures for each cancer type. By bridging the gap
between high-dimensional self-supervised learning and functional biology, PAMM
provides a clear and highly precise diagnostic platform for precision oncology.
In addition to closed-set multi-cancer classification, PAMM is explicitly designed
for open-set recognition. By combining softmax confidence with latent-space distances
from class centroids, the framework can reject samples that do not conform to any known
cancer manifold. This allows unknown or non-cancerous gene expression patterns to be
identified reliably, which is essential for real-world clinical deployment, where
previously unseen conditions are the norm. This dual capability
sets PAMM apart from traditional classifiers and ensures both the accuracy
and the safety of the diagnosis.
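
As a rough illustration of the two mechanisms named in the abstract, the Python sketch below (using NumPy) zeroes whole gene sets at once, as in pathway-aware masking, and rejects samples by combining softmax confidence with the distance to the predicted class centroid. It is not the thesis implementation. The function names pathway_mask and open_set_predict, the toy pathway dictionary, the Euclidean distance, and the thresholds min_conf and max_dist are all assumptions made for illustration; the actual pipeline would draw gene sets from the KEGG 2021 Human library.

```python
import numpy as np


def pathway_mask(x, pathways, n_masked, rng):
    """Zero every gene of n_masked randomly chosen pathways (hypothetical helper).

    x: (samples, genes) expression matrix.
    pathways: dict mapping a pathway name to a list of gene column indices.
    Returns the masked copy and a boolean vector marking the zeroed columns.
    """
    names = sorted(pathways)
    chosen = rng.choice(len(names), size=n_masked, replace=False)
    masked_cols = np.zeros(x.shape[1], dtype=bool)
    for i in chosen:
        # Mask the complete gene set, not individual random genes.
        masked_cols[pathways[names[i]]] = True
    x_masked = x.copy()
    x_masked[:, masked_cols] = 0.0
    return x_masked, masked_cols


def open_set_predict(probs, latents, centroids, min_conf, max_dist):
    """Return the predicted class per sample, or -1 for rejected samples.

    A sample is accepted only if its softmax confidence is high enough AND its
    latent vector lies close enough to the centroid of its predicted class.
    """
    pred = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    dist = np.linalg.norm(latents - centroids[pred], axis=1)
    accept = (conf >= min_conf) & (dist <= max_dist)
    return np.where(accept, pred, -1)


# Toy data: 4 samples, 6 genes, 3 made-up pathways (assumed for illustration).
rng = np.random.default_rng(0)
x = rng.random((4, 6)) + 1.0  # strictly positive, so zeros come only from masking
pathways = {"pathway_a": [0, 1], "pathway_b": [2, 3, 4], "pathway_c": [5]}
x_masked, masked_cols = pathway_mask(x, pathways, n_masked=1, rng=rng)

# Toy open-set check: 3 samples, 2 known classes, 2-dimensional latent space.
probs = np.array([[0.90, 0.10], [0.55, 0.45], [0.95, 0.05]])
latents = np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 5.0]])
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])
labels = open_set_predict(probs, latents, centroids, min_conf=0.8, max_dist=1.0)
print(labels.tolist())  # -> [0, -1, -1]
```

In the toy example, the second sample is rejected for low confidence and the third for lying far from its predicted centroid, even though its softmax confidence is high. During pretraining, a reconstruction loss would typically be evaluated on the columns marked in masked_cols.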
LC Subject Headings
Description
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 66-69).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2026.
Publisher Link
Type
Thesis