Identifying the best metrics to find the best quality clusters of genes from gene expression data

Abstract

Microarray data is used to create groups of similar genes based on their phenotypic attributes. Information extracted from these groups of genes can be applied to pathway analysis, disease prediction, target identification in drug design, and many other important applications in biology. However, choosing a distance metric to measure the similarity among genes has always been a great challenge. In our work, we studied sixteen combinations of distance and linkage metrics and tried to find groups of similar genes based on their expression level by building a phylogenetic tree. Furthermore, to validate our findings, we evaluated the output of the same trials on three different datasets. Our work suggests that the Maximum distance metric combined with the Average linkage metric gives the best quality when grouping similar genes together by building a phylogenetic tree.
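
The abstract does not describe the implementation, so the following is only a minimal sketch of one distance-linkage combination it names: Maximum (Chebyshev) distance with Average linkage, applied bottom-up to gene expression profiles. The function names, the toy expression matrix, and the choice to stop at a fixed number of groups `k` are assumptions made for illustration and are not taken from the thesis.

```python
from itertools import combinations


def chebyshev(a, b):
    """Maximum (Chebyshev) distance: the largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def average_linkage(profiles, k):
    """Agglomerative clustering that merges the two closest clusters until k remain.

    Returns a sorted list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(profiles))]

    def dist(c1, c2):
        # Average linkage: mean of all pairwise distances between members.
        total = sum(chebyshev(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Pick the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical toy expression matrix: rows are genes, columns are samples.
expression = [
    [1.0, 1.1, 0.9],
    [1.2, 1.0, 1.0],
    [5.0, 5.2, 4.9],
    [5.1, 5.0, 5.3],
]

groups = average_linkage(expression, k=2)
print(groups)
```

Recording every merge instead of stopping at `k` would yield the full tree; swapping `chebyshev` or the body of `dist` for another function gives the other distance-linkage combinations one might compare.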

Description

Cataloged from PDF version of thesis.
Includes bibliographical references (pages 38-40).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2019.

Type

Thesis