Early and late fusion ensemble methods for predicting bug severity from bug report
Publisher
BRAC University
Abstract
Accurate prediction of software bug severity is essential for optimizing resource allocation,
enhancing bug triaging, and improving project management within the
software development lifecycle. This study introduces a robust methodology for
predicting bug severity by leveraging textual data from bug reports, employing advanced
natural language processing (NLP) techniques and machine learning models.
We evaluate several approaches, including Word2Vec with XGBoost (68% accuracy,
64% precision, 68% recall), TF-IDF with Logistic Regression/SVM (77% F1 score),
DistilBERT (73% accuracy, 70% F1 score), and DistilRoBERTa (76% accuracy,
73% F1 score), each demonstrating strengths in capturing semantic and contextual
nuances of bug descriptions. To further improve performance, we propose a
fusion-based ensemble learning framework, combining early fusion (integrating TF-IDF,
Word2Vec, and transformer embeddings into a unified feature vector) and late
fusion (aggregating predictions from independently trained models). The hybrid
Ensemble Fusion model achieves the highest performance, with an accuracy of 79%
and an F1 score of 76%, generalizing well across diverse bug severity levels, including
challenging short and long durations. Our methodology encompasses rigorous
data preprocessing, feature engineering, and techniques to mitigate class imbalance,
utilizing a comprehensive dataset of bug reports with rich textual and metadata
attributes. The results underscore the efficacy of integrating diverse feature representations
and model predictions, providing a scalable, robust, and actionable
solution for predicting bug severity, ultimately enhancing software development efficiency
and reliability.
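
To make the distinction between the two fusion strategies concrete, the following is a minimal Python sketch, not the implementation used in the thesis. It assumes that early fusion means concatenating the per-representation feature vectors of one bug report, and that late fusion means a weighted average of the class probabilities produced by independently trained models (soft voting). The abstract does not state the aggregation rule, so the averaging is an assumption, and the function names and severity labels are hypothetical.

```python
from typing import List, Optional, Sequence


def early_fusion(feature_blocks: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the feature vectors of one bug report (for example
    TF-IDF, Word2Vec, and transformer embeddings) into a single vector
    that one downstream classifier would be trained on."""
    fused: List[float] = []
    for block in feature_blocks:
        fused.extend(block)
    return fused


def late_fusion(
    prob_vectors: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Combine class-probability vectors from independently trained models
    by weighted averaging (assumed rule; the source does not specify one)."""
    if not prob_vectors:
        raise ValueError("need at least one probability vector")
    if weights is None:
        weights = [1.0] * len(prob_vectors)  # equal weight per model
    if len(weights) != len(prob_vectors):
        raise ValueError("one weight per model is required")
    n_classes = len(prob_vectors[0])
    if any(len(p) != n_classes for p in prob_vectors):
        raise ValueError("all models must score the same classes")
    total = float(sum(weights))
    return [
        sum(w * p[c] for w, p in zip(weights, prob_vectors)) / total
        for c in range(n_classes)
    ]


def predict_label(probs: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label with the highest fused probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best]
```

In this sketch, early fusion yields one long vector per report, so a single classifier sees all representations at once, while late fusion leaves each model untouched and only merges their outputs. A hybrid scheme of the kind the abstract describes could pass the probabilities of an early-fusion classifier to late_fusion as one more entry alongside the individual models.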
Description
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 47-49).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2025.
Type
Thesis