Human object detection and classification system based on thermal cameras using the YOLOv11 object detection model

Authors

  • Muhammad Irsyaad Nurrahman, Universitas Pertahanan Republik Indonesia, Bogor, Indonesia
  • Bagus Hendra Saputra, Universitas Pertahanan Republik Indonesia, Bogor, Indonesia
  • H. A. Danang Rimbawa, Universitas Pertahanan Republik Indonesia, Bogor, Indonesia

DOI:

https://doi.org/10.35335/int.jo.emod.v20i2.188

Keywords:

Computer Vision, Deep Learning, Human Detection, Object Detection, Thermal Camera, YOLOv11

Abstract

Strategic institutions such as military campuses, defense research centers, and government facilities face increasingly complex security challenges, particularly in low-visibility environments where manual patrol capacity is limited. Conventional surveillance systems often perform poorly in the dark because they depend heavily on visible light. This research therefore proposes a human detection and classification system that integrates thermal cameras with the YOLOv11 object detection model. Thermal cameras capture the heat radiation emitted by objects, which enables effective imaging under low-light and completely dark conditions. The proposed system combines thermal imaging with the real-time detection capability of YOLOv11 to identify and classify humans automatically. The study follows the Research and Development (R&D) method, comprising dataset collection, image annotation, data augmentation, data preprocessing, model training, and system evaluation. The dataset consists of thermal images expanded with augmentation techniques such as cropping, rotation, brightness adjustment, and blurring to improve model robustness. Model performance was evaluated using accuracy, precision, recall, F1-score, and confusion matrix analysis. Experimental results show that the proposed system achieved an average accuracy of 86.36%, with 85.98% under completely dark conditions and 86.75% under dim-light conditions, indicating that the model can detect and classify humans in low-visibility environments. These findings suggest that integrating thermal cameras with YOLOv11 can contribute to intelligent security systems that improve surveillance efficiency while reducing dependence on manual monitoring.
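The evaluation metrics named in the abstract can all be derived from a confusion matrix. The following is a minimal sketch, not the authors' evaluation code: it assumes an image-level binary decision ("human" versus "no human") so that true negatives are defined, and the names `BinaryCounts` and `classification_metrics` as well as the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryCounts:
    """Confusion-matrix counts for an assumed binary 'human' / 'no human' decision."""
    tp: int  # humans present and detected
    fp: int  # detection reported, no human present
    fn: int  # human present but missed
    tn: int  # no human present, none reported


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: BinaryCounts) -> dict:
    """Compute accuracy, precision, recall, and F1-score from the counts."""
    total = c.tp + c.fp + c.fn + c.tn
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; these are not the paper's results.
    example = BinaryCounts(tp=80, fp=10, fn=20, tn=90)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Computing the same metrics separately for the completely dark and dim-light test subsets would give per-condition figures of the kind reported above.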

Published

2026-05-30

How to Cite

Nurrahman, M. I., Saputra, B. H., & Rimbawa, H. A. D. (2026). Human object detection and classification system based on thermal cameras using the YOLOv11 object detection model. International Journal of Enterprise Modelling, 20(2), 190–201. https://doi.org/10.35335/int.jo.emod.v20i2.188