Designing a hybrid system for automated scientific reviewing combining NLP models with human-in-the-loop feedback mechanisms
Date
Publisher
BRAC University
Citation
Abstract
The growing volume of academic manuscript submissions strains the traditional
peer-review paradigm: reviewers are overburdened, inequities and biases increase,
and high-quality, timely feedback is still required. This paper introduces SKY,
a human-AI hybrid system that automates parts of the scientific review process
by combining state-of-the-art natural language processing (NLP) models with a
human-in-the-loop (HITL) framework that balances machine intelligence and human
knowledge. The architecture uses large language models (LLMs) such as Mistral-7B
and Qwen2.5-7B, fine-tuned with QLoRA, to perform domain-specific assessments
and thereby improve epistemological accuracy. SKY acts as an orchestrator over
four functionally independent worker modules: SKY-ORI (Originality and Impact),
SKY-MDR (Methodology and Rigor), SKY-PRC (Presentation and Clarity), and SKY-RQM
(Review Quality and Meta-Review). Each module produces a systematic evaluation
on a 0–5 scale, together with a confidence rating and a rationale. A
confidence-based process controls routing decisions: outputs with a confidence
score ≥ 0.85 are forwarded automatically, outputs with scores between 0.60 and
0.85 enter the active-learning review system, and outputs with scores < 0.60
are evaluated by human reviewers. Empirical evaluation on the PeerRead and
ACL-OCL datasets shows an overall accuracy of 82.3%, a Cohen’s κ of 0.56, which
is above the 0.43 agreement level reported between human reviewers in the
literature, and a 42% reduction in the time required to conduct a review.
Lastly, the HITL framework addresses inherent weaknesses of LLMs, such as
hallucinations, limited processing of visual content, and a lack of
methodological critique, by providing human moderation of AI-generated
suggestions and by refining the system through expert critique.
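
The confidence-based routing rule described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the names `Route` and `route_by_confidence` are hypothetical, and the handling of the boundary values is an assumption (the abstract states ≥ 0.85 and < 0.60 explicitly, so the middle band is taken here as 0.60 ≤ score < 0.85).

```python
from enum import Enum


class Route(Enum):
    """Possible destinations for a module output (hypothetical names)."""
    AUTO_FORWARD = "auto_forward"        # confidence >= 0.85
    ACTIVE_LEARNING = "active_learning"  # 0.60 <= confidence < 0.85 (assumed bounds)
    HUMAN_REVIEW = "human_review"        # confidence < 0.60


def route_by_confidence(confidence: float,
                        high: float = 0.85,
                        low: float = 0.60) -> Route:
    """Map a confidence score in [0, 1] to a routing decision.

    The thresholds default to the values quoted in the abstract.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= high:
        return Route.AUTO_FORWARD
    if confidence >= low:
        return Route.ACTIVE_LEARNING
    return Route.HUMAN_REVIEW


if __name__ == "__main__":
    # Illustrative scores only; these are not results from the thesis.
    for score in (0.92, 0.70, 0.41):
        print(score, route_by_confidence(score).value)
```

Keeping the thresholds as parameters rather than constants would let them be tuned without changing the routing logic, although the abstract does not say whether the thesis does so.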
Description
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 67-69).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2026.
Publisher Link
Type
Thesis