Automated species identification in camera trap images for wildlife conservation
Date
Publisher
BRAC University
Citation
Abstract
Wildlife conservation involves protecting, preserving, and managing wildlife species
and their habitats. Given today’s rapid pace of human development, climate change,
and unsustainable practices, the need for wildlife conservation has grown.
Despite significant progress in species identification using deep-learning models, challenges remain in detecting small animals in low-contrast
trap images because of limited feature extraction capabilities. This thesis presents a
novel end-to-end framework that integrates a shifted-window-based local self-attention
mechanism with enhanced feature fusion in an object detection head, and incorporates a multimodal large language model (MLLM), to address these limitations. The proposed architecture comprises a Swin-BiFPN backbone integrated into a Faster R-CNN
detection network, coupled with a visual semantic extraction module driven by the
LLaVA v1.5 (13B) multimodal large language model. The detection framework,
capable of extracting crucial features in challenging trap images, demonstrates consistently high detection performance and robust generalization. Furthermore, the visual
semantic extraction module provides zero-shot detection capability as well as valuable insights and emergent cues about the animal’s behavior, further supporting
conservation efforts. The MLLM evaluation was conducted using both traditional
NLP metrics (precision, recall, F1, and SBERT similarity) and subjective scoring by
LLM-based judges (GPT-4.1 and GROK 3.0), across five MLLMs, demonstrating the
model’s strong performance in visual description generation. The proposed framework improves detection accuracy for small animals and low-contrast trap images
while also demonstrating zero-shot detection capability through the MLLM.
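As an illustration of the traditional NLP metrics named above, the following is a minimal sketch of token-overlap precision, recall, and F1 between a generated description and a reference description. The abstract does not state how these metrics were computed in the thesis, so the whitespace tokenization, the lowercasing, and the function name `token_prf` are assumptions made for illustration only.

```python
from collections import Counter


def token_prf(candidate: str, reference: str) -> tuple[float, float, float]:
    """Token-overlap precision, recall, and F1 (illustrative, hypothetical helper).

    Assumption: tokens are lowercased, whitespace-separated words. The
    thesis may use a different tokenization or matching scheme.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = token_prf("a deer standing in grass",
                        "a deer walking in tall grass")
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Overlap scores of this kind reward shared wording only. An embedding-based measure such as the SBERT similarity mentioned above is typically paired with them to capture paraphrases that share meaning but not tokens.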
LC Subject Headings
Description
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 51-53).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2025.
Publisher Link
Type
Thesis