Enhanced and optimised indoor object detection using YOLO Models

Citation

Fahad, Nafiz (2025) Enhanced and optimised indoor object detection using YOLO Models. Masters thesis, Multimedia University.

Full text not available from this repository.
Official URL: http://erep.mmu.edu.my/

Abstract

Object detection from digital images is a central goal of computer vision. Previous studies neglected preprocessing steps such as resizing, custom resizing, rescaling, and normalisation, as well as balanced augmentation, multi-level and hybrid feature extraction, feature selection, and advanced YOLO models, and they relied on limited datasets. The first experiment combined the YOLOv5s model with Grad-CAM on the Indoor object dataset from Kaggle (1,012 training and 107 test images covering 10 classes) and applied optimisations such as pruning and quantisation. It achieved an overall mAP@0.5 of 0.457, with the “cabinet door” class scoring 0.807. Grad-CAM visualisations made the predictions easier to interpret, although rare classes such as “pole” and “opened door” performed worse. The second experiment tested several YOLOv8 variants on the same dataset; an optimised YOLOv8x model with improved preprocessing reached a training mAP@0.50 of 0.537 and a test mAP@0.50 of 0.480. The third experiment applied YOLOv11n to the TUT Indoor dataset of 2,213 images across seven indoor classes, with preprocessing and augmentation; features were extracted with ResNet18, Grad-CAM++, HOG, and LBP and then selected using PCA. This model reached mAP@0.50 values of 0.948 (training), 0.921 (validation), and 0.896 (testing). The fourth experiment used a primary dataset of 12,747 images of seven object classes captured with mobile phones. The images were resized, annotated using Grounding DINO, augmented, and re-split before training a YOLOv12m model, which performed well. Finally, the fifth experiment used the TUT Indoor dataset with resizing, hybrid augmentation using Stable Diffusion, and combined feature extraction and selection. YOLOv12m was trained with three optimisers; Adamax performed best, giving stable results at a learning rate of 0.000909 and a confidence threshold of 0.10.
This approach achieved the highest test scores: across all classes, a test mAP@0.5 of 0.987, a test mAP@0.5–0.95 of 0.883, and a test fitness of 0.893. Per class, “screen” reached a precision of 0.992 and a recall of 0.979, “fire-extinguisher” a precision of 0.985 and a recall of 0.992, and “clock” a precision of 0.974 and a recall of 0.992. These results confirm that the approach detects indoor objects accurately.
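The abstract reports a "fitness" score without defining it. A minimal sketch follows, assuming (the abstract does not say so) that the thesis used the default fitness definition from YOLOv5 and Ultralytics YOLO, a weighted sum with weights 0.0, 0.0, 0.1, and 0.9 on precision, recall, mAP@0.5, and mAP@0.5–0.95. Under that assumption the reported all-class figures are mutually consistent: 0.1 × 0.987 + 0.9 × 0.883 = 0.8934, which rounds to the reported 0.893. The function name `fitness` and the constant `FITNESS_WEIGHTS` are illustrative, not taken from the thesis.

```python
# Minimal sketch: recompute a YOLO-style "fitness" score from summary metrics.
# ASSUMPTION: the default YOLOv5 / Ultralytics weighting is used, i.e.
#   fitness = 0.0 * P + 0.0 * R + 0.1 * mAP@0.5 + 0.9 * mAP@0.5:0.95
# The abstract does not state which definition the thesis used.

# Weights for (precision, recall, mAP@0.5, mAP@0.5:0.95).
FITNESS_WEIGHTS = (0.0, 0.0, 0.1, 0.9)


def fitness(map50: float, map50_95: float,
            precision: float = 0.0, recall: float = 0.0) -> float:
    """Weighted sum of detection metrics under the assumed default weights.

    Precision and recall carry zero weight, so they default to 0.0 and
    do not change the result.
    """
    metrics = (precision, recall, map50, map50_95)
    return sum(w * m for w, m in zip(FITNESS_WEIGHTS, metrics))


if __name__ == "__main__":
    # All-class test mAP values reported in the abstract for the fifth experiment.
    score = fitness(map50=0.987, map50_95=0.883)
    print(f"{score:.3f}")  # prints 0.893
```

Because mAP@0.5–0.95 dominates this weighting, a model that only localises boxes loosely (high mAP@0.5, lower mAP@0.5–0.95) would score noticeably lower on fitness than on mAP@0.5.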

Item Type: Thesis (Masters)
Additional Information: Call No.: TA1634 .N34 2025
Uncontrolled Keywords: Computer vision
Subjects: T Technology > TA Engineering (General). Civil engineering (General) > TA1501-1820 Applied optics. Photonics
Divisions: Faculty of Information Science and Technology (FIST)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 05 Feb 2026 08:59
Last Modified: 05 Feb 2026 08:59
URI: http://shdl.mmu.edu.my/id/eprint/15201
