A comprehensive model for advanced road scene understanding: YOLO-CNN fusion for accurate road segmentation and object detection in varied conditions
Date
Publisher
BRAC University
Citation
Abstract
A critical component of autonomous driving is road scene understanding, which
includes accurate object recognition and precise road segmentation. Most existing
models do not accurately represent the roads of Bangladesh because they are mostly
trained and tested on data from the organized roads found in developed countries.
This study tackles the challenging problem of road perception in the context of
autonomous driving in Bangladesh by proposing a YOLO-CNN fusion model that
takes into account the country’s diverse weather patterns and unstructured road
layouts. For object detection, we evaluated several architectures,
namely YOLOv8, YOLOv9, YOLOv10, and YOLOv11; for road
segmentation, we tested the U-Net and ResU-Net models. We
gathered a dataset of 2,495 road images from different locations and environments,
including highways, villages, foggy and rainy conditions, and nighttime scenes.
The experiments showed that the Medium model of YOLOv11 was
the most accurate of the YOLO versions tested at object detection, with an accuracy of
79.3%. Likewise, U-Net outperformed ResU-Net in terms of accuracy and IoU, scoring an IoU
of 80.69% for road areas, indicating a closer match to actual road areas. The
best-performing object detection and segmentation models were then combined to
create a comprehensive road scene understanding system. The findings show that the
fusion of YOLOv11 and U-Net greatly improves object detection and segmentation
in road environments, which makes it relevant for self-driving car
applications. The system was also implemented as a web-based prototype
so that users can upload images and view the detection and segmentation results
visually. Future work will focus on reducing
processing requirements for use on low-resource devices, adding depth perception with
LiDAR, and expanding the dataset for better performance across different driving
environments.
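As an illustration of the IoU metric cited above, the following is a minimal sketch of how Intersection over Union can be computed for a predicted road mask against a ground-truth mask. It is not taken from the thesis: the function name `road_iou`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for this example.

```python
def road_iou(pred, truth):
    """Intersection over Union of two binary masks of equal shape.

    Masks are nested lists of 0/1 values, where 1 marks a road pixel.
    (Hypothetical helper for illustration; not the thesis implementation.)
    """
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1  # pixel is road in both masks
            if p or t:
                union += 1         # pixel is road in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy 3x3 masks: 4 road pixels overlap, 6 are road in at least one mask.
predicted = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 1]]
ground_truth = [[0, 1, 1],
                [0, 1, 0],
                [0, 1, 1]]

score = road_iou(predicted, ground_truth)
print(f"Road IoU: {score:.4f}")
```

A score closer to 1 means the predicted road region overlaps the annotated region more closely, which is the sense in which a higher IoU indicates a closer match to actual road areas.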
Keywords
LC Subject Headings
Description
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 51-53).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2025.
Publisher Link
Type
Thesis