Designing a hybrid system for automated scientific reviewing combining NLP models with human-in-the-loop feedback mechanisms

The growing volume of academic manuscript submissions strains the traditional peer-review paradigm: reviewers are overburdened, inequities and biases increase, and high-quality, timely feedback is still expected. This thesis introduces SKY, a human-AI hybrid system that automates parts of the scientific review process by combining state-of-the-art natural language processing (NLP) models with a human-in-the-loop (HITL) framework that balances machine intelligence and human expertise. The architecture uses large language models (LLMs) such as Mistral-7B and Qwen2.5-7B, fine-tuned with QLoRA, to perform domain-specific assessments, thereby improving epistemological accuracy. SKY acts as an orchestrator over four functionally independent worker modules, namely SKY-ORI (Originality and Impact), SKY-MDR (Methodology and Rigor), SKY-PRC (Presentation and Clarity), and SKY-RQM (Review Quality and Meta-Review), each of which provides a systematic evaluation on a 0–5 scale together with a confidence rating and a rationale. A confidence-based process governs routing decisions: outputs with a confidence score ≥ 0.85 are forwarded automatically, those between 0.60 and 0.85 enter the active-learning review system, and those with a score < 0.60 are evaluated by human reviewers.
Empirical evaluation on the PeerRead and ACL-OCL datasets shows an overall accuracy of 82.3% and a Cohen’s κ of 0.56, which exceeds the agreement level of 0.43 reported between human reviewers in the literature, together with a 42% reduction in the time required to conduct a review. Lastly, the HITL framework addresses inherent weaknesses of LLMs, such as hallucinations, limited visual content processing, and lack of methodological critique, by providing human moderation of AI-generated suggestions coupled with system refinement through expert critique.
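
The confidence-based routing described in the abstract can be illustrated with a minimal sketch. Only the two thresholds (0.85 and 0.60), the 0–5 score scale, and the module names come from the abstract; the `WorkerOutput` and `Route` types, the `route` function, and the choice to send a confidence of exactly 0.60 to active learning are assumptions made for illustration and are not taken from the thesis implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    """Hypothetical names for the three routing destinations."""
    AUTO_FORWARD = "auto_forward"
    ACTIVE_LEARNING = "active_learning"
    HUMAN_REVIEW = "human_review"


@dataclass(frozen=True)
class WorkerOutput:
    """Hypothetical container for one worker module's evaluation."""
    module: str        # e.g. "SKY-ORI", "SKY-MDR", "SKY-PRC", "SKY-RQM"
    score: float       # aspect score on the 0-5 scale
    confidence: float  # confidence rating in [0, 1]
    rationale: str


# Thresholds as stated in the abstract.
AUTO_THRESHOLD = 0.85
HUMAN_THRESHOLD = 0.60


def route(output: WorkerOutput) -> Route:
    """Pick a destination for a worker output based on its confidence."""
    if not 0.0 <= output.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {output.confidence}")
    if output.confidence >= AUTO_THRESHOLD:
        return Route.AUTO_FORWARD
    # Assumption: the lower boundary (exactly 0.60) goes to active learning,
    # since the abstract sends only scores strictly below 0.60 to humans.
    if output.confidence >= HUMAN_THRESHOLD:
        return Route.ACTIVE_LEARNING
    return Route.HUMAN_REVIEW


if __name__ == "__main__":
    example = WorkerOutput("SKY-MDR", score=3.5, confidence=0.72,
                           rationale="Baselines are described only briefly.")
    print(route(example).value)
```

In this sketch each worker output is routed independently; whether SKY routes per module or per manuscript is not stated in the abstract.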

Keywords

Aspect score prediction, Automated essay scoring, Automated scientific reviewing, Large Language Models, Text summarization

LC Subject Headings

Natural language generation (Computer science); Text processing (Computer science); Educational tests and measurements--Data processing.

Description

Cataloged from PDF version of thesis.
Includes bibliographical references (pages 67-69).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2026.

Department

Department of Computer Science and Engineering

Type

Thesis

Collections

Thesis (Bachelor of Science in Computer Science)
