Generalized bridge pipeline for multimodal gene regulatory network discovery

Abstract

This thesis presents a generalized computational pipeline that builds multimodal gene regulatory networks (GRNs) by combining transcriptomic data from RNA sequencing with functional dependency data from CRISPR knockout screens. Traditional GRN methods rely on expression data alone, which cannot detect causal or functionally significant interactions. To address this, we constructed a contrastive bridge model that embeds both datasets in a shared 128-dimensional latent space. We used a Maximum Mean Discrepancy (MMD) loss together with a diversity-preserving loss so that the modalities are aligned without distorting meaningful biological variation. Using these embeddings, we built multimodal GRNs that combine evidence from multiple sources. To capture both statistical and functional relationships, we combined Spearman co-expression correlations, GENIE3 random forest importance scores, CRISPR dependency support, and cosine similarity of embedding vectors into a single edge-weight expression. When applied to both hematopoietic and lung cell data, the bridge-fused networks were structurally stable and introduced new cross-modal interactions (bridge-fused vs. GENIE3). Top hub genes in these networks had negative mean CRISPR dependency scores, which indicates important functional roles, and Gene Ontology enrichment analysis showed significant representation of immune activation and metabolic processes. These findings suggest that the bridge pipeline yields biologically meaningful, consistent, and interpretable GRNs. Overall, this is a generalizable, data-driven framework for integrating heterogeneous genomic datasets, and it can be applied to identify significant regulators and potential therapeutic targets in a broad variety of biological settings.
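
To illustrate the alignment term mentioned in the abstract, the following is a minimal sketch of a squared MMD estimate between two sets of embeddings. It is not taken from the thesis: the Gaussian kernel, the fixed bandwidth, the biased estimator, and the toy 4-dimensional vectors (the thesis uses a 128-dimensional space) are all assumptions made for illustration, and the names `rbf_kernel` and `mmd_squared` are hypothetical.

```python
import math
import random


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors.

    The kernel choice and bandwidth are assumptions, not from the thesis.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples of embeddings.

    xs, ys: lists of vectors (one vector per gene or sample).
    A small value means the two sets look alike under the kernel.
    """
    def mean_kernel(a, b):
        total = sum(rbf_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4  # toy size; the abstract describes a 128-dimensional space

    # Hypothetical stand-ins for RNA-seq and CRISPR embeddings.
    rna = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(30)]
    crispr_aligned = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(30)]
    crispr_shifted = [[rng.gauss(3.0, 1.0) for _ in range(dim)] for _ in range(30)]

    print("aligned modalities:", mmd_squared(rna, crispr_aligned))
    print("shifted modalities:", mmd_squared(rna, crispr_shifted))
```

In a training setting, a quantity like this would be minimized so that the two modalities occupy the same region of the latent space; the shifted sample should give a noticeably larger value than the aligned one.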

Description

Cataloged from PDF version of thesis.
Includes bibliographical references (pages 47-50).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2025.

Type

Thesis