
ProteoKnight: phage virion protein classification with CNN and uncertainty quantification

Abstract

microbial ecosystems. This has led to their increased use in several research areas, such as bacterial genome engineering, phage therapy, disease diagnostics, and viral host identification. The structure of phages is made up of proteins called phage virion proteins (PVPs). Classifying these proteins is important for genomic research, which in turn helps us understand the complex interactions between phages and their hosts in the context of developing antibacterial drugs. Computational strategies are increasingly replacing tedious traditional procedures for annotating phage protein sequences acquired through high-throughput sequencing. Among these techniques, deep learning approaches demonstrate improved classification performance. Such approaches require specialized sequence encodings so that the model can perceive the distinctive features of protein sequences. Numerous approaches have been examined and assessed, and novel methods continue to emerge to optimize the task in terms of resource utilization and prediction accuracy. The objective of our work, ProteoKnight, is to explore and develop a unique encoding technique for phage proteins and to demonstrate its effectiveness through classification. We use the time-separated PVP dataset introduced in [47]. Furthermore, this study addresses the lack of research on uncertainty analysis by exploring uncertainty in binary PVP classification using the Monte Carlo Dropout (MCD) method. The experimental findings demonstrate the effectiveness of our strategy for binary classification, which achieves a prediction accuracy of 90.2%. However, the accuracy for multi-class classification remains suboptimal. Our uncertainty analysis also reveals that prediction confidence varies with class and sequence length for the proposed classification approach.

Description

Cataloged from PDF version of thesis.
Includes bibliographical references (pages 51-54).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.

Type

Thesis