Early and late fusion ensemble methods for predicting bug severity from bug report

Abstract

Accurate prediction of software bug severity is essential for allocating resources, triaging bugs, and managing projects across the software development lifecycle. This study introduces a methodology for predicting bug severity from the textual data in bug reports, using natural language processing (NLP) techniques and machine learning models. We evaluate several approaches: Word2Vec with XGBoost (68% accuracy, 64% precision, 68% recall), TF-IDF with Logistic Regression/SVM (77% F1 score), DistilBERT (73% accuracy, 70% F1 score), and DistilRoBERTa (76% accuracy, 73% F1 score), each of which captures different semantic and contextual nuances of bug descriptions. To further improve performance, we propose a fusion-based ensemble learning framework that combines early fusion (integrating TF-IDF, Word2Vec, and transformer embeddings into a unified feature vector) with late fusion (aggregating predictions from independently trained models). The hybrid Ensemble Fusion model achieves an accuracy of 79% and an F1 score of 76%, and generalizes well across diverse bug severity levels, including challenging short and long durations. Our methodology encompasses data preprocessing, feature engineering, and techniques to mitigate class imbalance, and it uses a dataset of bug reports with rich textual and metadata attributes. The results show that integrating diverse feature representations and model predictions provides a scalable and actionable approach to predicting bug severity, which in turn supports more efficient and reliable software development.
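
The two fusion strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the label set, the toy vectors, the function names, and the choice of weighted probability averaging for late fusion are all assumptions made for illustration, since the abstract does not state the aggregation rule.

```python
from typing import List, Optional, Sequence

# Hypothetical severity labels; the thesis's actual label set is not given here.
SEVERITIES = ["minor", "major", "critical"]


def early_fusion(*feature_vectors: Sequence[float]) -> List[float]:
    """Early fusion: concatenate per-representation vectors into one feature vector."""
    fused: List[float] = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def late_fusion(
    prob_rows: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Late fusion (assumed rule): weighted average of class probabilities
    produced by independently trained models."""
    if weights is None:
        weights = [1.0] * len(prob_rows)
    total = sum(weights)
    n_classes = len(prob_rows[0])
    return [
        sum(w * row[c] for w, row in zip(weights, prob_rows)) / total
        for c in range(n_classes)
    ]


def predict_label(probs: Sequence[float]) -> str:
    """Return the label with the highest fused probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return SEVERITIES[best]


# Toy stand-ins for TF-IDF, Word2Vec, and transformer embeddings of one report.
tfidf_vec = [0.0, 0.7, 0.1]
w2v_vec = [0.2, -0.4]
transformer_vec = [0.5, 0.1, -0.3, 0.9]
fused = early_fusion(tfidf_vec, w2v_vec, transformer_vec)

# Toy class probabilities from three separately trained models for the same report.
model_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
]
avg = late_fusion(model_probs)
label = predict_label(avg)
```

In a hybrid setup of the kind the abstract describes, a classifier trained on the fused vector could itself be one of the models whose probabilities enter the late-fusion average.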

Description

Cataloged from PDF version of thesis.
Includes bibliographical references (pages 47-49).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2025.

Type

Thesis