A comprehensive model for advanced road scene understanding: YOLO-CNN fusion for accurate road segmentation and object detection in varied conditions

Citation

Abstract

Road scene understanding, which comprises accurate object recognition and precise road segmentation, is a critical component of autonomous driving. Most existing models do not accurately represent the roads of Bangladesh, as they are mostly trained and tested on data from the organized roads of developed countries. This study tackles the challenging problem of road perception for autonomous driving in Bangladesh by proposing a YOLO-CNN fusion model that takes into account the country’s diverse weather patterns and unstructured road layouts. For object detection, we evaluated several architectures, namely YOLOv8, YOLOv9, YOLOv10, and YOLOv11; for road segmentation, we tested the U-Net and ResU-Net models. We gathered a dataset of 2,495 road images from different locations and environments, including highways, villages, foggy and rainy conditions, and nighttime scenes. The experiments showed that the Medium model of YOLOv11 was the most accurate of all the YOLO versions at object detection, with an accuracy of 79.3%. Likewise, U-Net outperformed ResU-Net in accuracy and achieved an IoU score of 80.69% for road areas, indicating a closer match to the actual road areas. The best-performing detection and segmentation models were then combined to create a comprehensive road scene understanding system. The findings show that the fusion of YOLOv11 and U-Net greatly enhances object detection and segmentation in road environments, which makes it relevant for self-driving car applications. The system was additionally implemented as a web-based prototype so that users can upload images and view the detection and segmentation results visually.
Future work will emphasize reducing processing requirements for use on low-resource devices, adding depth perception with LiDAR, and widening the dataset for better performance across different driving environments.
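
The abstract reports an IoU (intersection over union) score for road areas. As a minimal sketch of how such a score can be computed for one binary road mask, and not the thesis's own evaluation code, the function below counts overlapping road pixels and divides by the pixels marked as road in either mask. The name `binary_iou`, the nested-list mask format, and the toy masks are assumptions made for illustration.

```python
def binary_iou(pred_mask, true_mask):
    """Intersection over union for two binary masks of equal shape.

    Masks are nested lists (rows of 0/1 values); 1 marks a road pixel.
    This input format is an assumption for illustration only.
    """
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            p, t = bool(p), bool(t)
            intersection += p and t  # pixel is road in both masks
            union += p or t          # pixel is road in at least one mask
    if union == 0:
        # Neither mask contains road pixels; treat as perfect agreement.
        return 1.0
    return intersection / union


# Hypothetical 3x3 masks: 4 pixels overlap, 6 pixels are road in either mask.
predicted = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 1]]
ground_truth = [[0, 1, 1],
                [0, 1, 0],
                [0, 1, 1]]

print(round(binary_iou(predicted, ground_truth), 4))  # prints 0.6667
```

A dataset-level score would typically aggregate this over all test images; how the thesis aggregates it is not stated in the abstract.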

Description

Cataloged from PDF version of thesis.
Includes bibliographical references (pages 51-53).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2025.

Publisher Link

Type

Thesis