Gene classification and pattern analysis using data mining and machine learning techniques

Abstract

Gene classification and pattern extraction from gene sequence data are essential to understanding the features of different gene sequences. Over the past few years, the field of gene expression data analysis has grown from purely data-centric to integrative, aiming to complement microarray analysis with data and knowledge from diverse available sources. Such analysis has since been applied in various scientific fields, including the discovery of new drugs, the identification of protein-coding genes by analyzing and separating exons from the main sequence, and phenotype prediction based on gene expression. This paper presents an application of gene classification from gene sequence data using data mining and machine learning techniques. The main goal of our research is to compare different machine learning approaches by execution time and overall efficiency, testing them on different microarray gene sequence datasets to determine the best approach for gene classification. Eight machine learning techniques have been tested on eleven gene expression datasets, and the results are compared and then improved using a feature selection method. Moreover, we perform pattern analysis on some gene expression datasets using the output of a J48 decision tree after applying feature selection.
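
The comparison described in the abstract (classification accuracy and execution time, with and without feature selection) can be illustrated with a minimal sketch. This is not the thesis's implementation: the synthetic data, the nearest-centroid classifier, the mean-difference gene ranking, and every function name below are assumptions chosen only to keep the example self-contained. The thesis itself evaluates eight techniques on eleven real datasets and uses a J48 decision tree for pattern analysis.

```python
import random
import time
from statistics import mean

def make_dataset(n_samples=60, n_genes=200, n_informative=5, seed=0):
    """Synthetic stand-in for a microarray dataset (rows: samples, columns: genes)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 3.0 * label  # only the first few genes carry class signal
        X.append(row)
        y.append(label)
    return X, y

def select_features(X, y, k):
    """Rank genes by absolute difference of class means; return the top k indices."""
    scores = []
    for g in range(len(X[0])):
        m0 = mean(row[g] for row, lab in zip(X, y) if lab == 0)
        m1 = mean(row[g] for row, lab in zip(X, y) if lab == 1)
        scores.append((abs(m1 - m0), g))
    return sorted(g for _, g in sorted(scores, reverse=True)[:k])

def project(X, genes):
    """Keep only the selected gene columns."""
    return [[row[g] for g in genes] for row in X]

def nearest_centroid_fit(X, y):
    """Compute one mean expression profile (centroid) per class."""
    centroids = {}
    for lab in set(y):
        rows = [row for row, l in zip(X, y) if l == lab]
        centroids[lab] = [mean(col) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))

def evaluate(X, y, k=None, n_train=40):
    """Return (accuracy, seconds) on a held-out split; k=None uses all genes."""
    X_tr, y_tr, X_te, y_te = X[:n_train], y[:n_train], X[n_train:], y[n_train:]
    start = time.perf_counter()
    if k is not None:
        # Select genes on the training split only, to avoid leaking test labels.
        genes = select_features(X_tr, y_tr, k)
        X_tr, X_te = project(X_tr, genes), project(X_te, genes)
    model = nearest_centroid_fit(X_tr, y_tr)
    preds = [nearest_centroid_predict(model, row) for row in X_te]
    elapsed = time.perf_counter() - start
    accuracy = sum(p == t for p, t in zip(preds, y_te)) / len(y_te)
    return accuracy, elapsed

X, y = make_dataset()
for k in (None, 10):
    acc, secs = evaluate(X, y, k)
    print(f"features={'all' if k is None else k}: accuracy={acc:.2f}, time={secs:.4f}s")
```

In a real study, the same harness would be repeated for each classifier and each dataset, ideally with cross-validation rather than a single split, so that accuracy and timing figures are comparable across methods.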

Description

Cataloged from PDF version of thesis.
Includes bibliographical references (pages 23-25).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2021.

Type

Thesis