A Study on Comparative Analysis of Object Detection-based Real-Time Wildfire and Class B Fire Detection Performance

Article information

Int J Fire Sci Eng. 2024;38(2):1-11
Publication date (electronic) : 2024 June 30
doi : https://doi.org/10.7731/KIFSE.12dac838
1Department of Fire and Disaster Prevention Engineering, Semyung University, Choongbuk 27136, Republic of Korea
2Department of Fire and Disaster Prevention, Semyung University, Choongbuk 27136, Republic of Korea
Corresponding Author, TEL: +82-43-649-1695, FAX: +82-43-649-1787, E-Mail: hyjung@semyung.ac.kr
Received 2024 April 17; Revised 2024 April 30; Accepted 2024 May 1.

Abstract

Wildfires cause human casualties, property damage, and significant damage to ecosystems, so prompt fire detection and response are essential. In this study, training and evaluation experiments were performed on the YOLOv3, Faster R-CNN, and Cascade R-CNN models to analyze their real-time wildfire detection performance. The performance of each model was evaluated using precision, recall, mAP 50, mAP 50-95, model parameters (MB), and frames per second (FPS). The experiment results showed that YOLOv3 had slightly lower overall performance than the other models, but the difference was not significant in terms of mAP 50. YOLOv3 had the highest inference speed (48.7 FPS) and an overall mAP 50 of 0.832. By comparison, Faster R-CNN achieved an overall mAP 50 of 0.835, and Cascade R-CNN achieved the highest at 0.840; however, both models showed relatively low inference speeds (36.2 and 26.3 FPS, respectively). Therefore, the YOLOv3 model is deemed appropriate for real-time fire detection owing to its high inference speed and the insignificant difference in wildfire detection performance in terms of mAP 50. Future wildfire detection models are expected to improve by optimizing the network structures (e.g., FPN and Head) of 1-Stage models such as YOLOv3.

1. Introduction

Wildfires cause serious human casualties and property damage, as well as massive ecosystem impacts. Recent climate change has increased the frequency of dry weather, raising the risk of wildfires [1]. Therefore, there is an urgent need to develop systems for rapidly detecting and responding to wildfires [2]. Recently, advances in deep learning technology have enabled active prevention of and response to disasters such as fires [3], earthquakes [4], and industrial accidents [5], and research on real-time wildfire detection using deep learning-based object detection technology has attracted attention [6]. This technology can be effectively applied to wildfire detection because it can detect fires by identifying flame and smoke objects in images and estimating their locations [7].

Previous studies on wildfire detection using deep learning-based object detection technology proposed various methods to improve wildfire detection performance. Object detection models are primarily divided into 1-Stage (e.g., RetinaNet and you only look once (YOLO)) [8,9], 2-Stage (e.g., Fast R-CNN and Faster R-CNN) [10], and Multi-Stage (e.g., Cascade R-CNN) methods [11]. Each method has advantages and disadvantages. First, the 1-Stage method is suitable for rapid detection owing to its small number of parameters and high inference speed, but its detection accuracy is relatively low. The 2-Stage method, despite its larger number of parameters and lower inference speed, achieves higher detection accuracy than the 1-Stage method. The Multi-Stage method, which adds an even larger number of parameters to improve detection accuracy at the cost of inference speed, has primarily been used in areas requiring precise detection owing to its superior accuracy. Several studies have been conducted to improve wildfire detection performance using these object detection models.

First, Zhao et al. [12] improved detection performance for small fires by developing Fire-YOLO, which combines EfficientNet and YOLOv3. This method exhibited more accurate fire detection and a higher inference speed than conventional YOLOv3 and Faster R-CNN, and improved small-fire detection as well. Meanwhile, Shen et al. [13] configured a deformable vision transformer in the backbone and proposed a new FireViT model based on the YOLOv8 head. The proposed model had a lighter structure than conventional wildfire detection models and, owing to its high accuracy, was evaluated as an object detection-based model suitable for fire detection. These studies, however, focused on detecting fire and smoke events, and analyses that consider clouds and fog, which can be mistaken for class B fires, are limited. In addition, few studies comprehensively compare the accuracy, speed, and other performance indicators of basic 1-Stage, 2-Stage, and Multi-Stage models. Against this backdrop, the purpose of this study is to analyze wildfire detection performance on images that include clouds and fog, which can be mistaken for both class B fires and wildfires, and to comprehensively analyze the object detection-based wildfire detection performance of the 1-Stage, 2-Stage, and Multi-Stage methods. These analyses are intended to help minimize the damage caused by wildfires by improving the efficiency of real-time wildfire detection systems and presenting an optimal, practically applicable model.

2. Real-Time Object Detection Model

In this study, YOLOv3, Faster R-CNN, and Cascade R-CNN were selected from various object detection methods for real-time wildfire detection, and their performance was compared and analyzed. These models are representative of the 1-Stage, 2-Stage, and Multi-Stage methods, respectively. The aim was to evaluate the efficiency of fire detection using the different detection methods and to examine their advantages and disadvantages.

2.1 YOLOv3

The YOLO model is available in several versions, with versions after YOLOv5 providing improved accuracy and speed. The latest YOLO models, however, are optimized for specific hardware or software environments. In contrast, YOLOv3 has gained popularity owing to its high compatibility with various software and hardware environments, relatively high accuracy, and high inference speed. For these reasons, YOLOv3 was selected for this study. YOLOv3, a representative object detection model of the 1-Stage method, offers a high inference speed and relatively high accuracy. The YOLO series detects objects at high speed by processing the entire image with a single neural network. Figure 1 shows the structure of YOLOv3, which primarily consists of a backbone, a feature pyramid network (FPN), and a head. The backbone is the convolutional neural network (CNN)-based Darknet-53, which contains 53 convolution layers and performs efficient feature extraction using skip connections. The FPN corresponds to YOLOv3's neck; it combines feature maps from multiple layers to effectively detect objects of various sizes. Specifically, it uses upsampling to combine the lower layers' detailed spatial information with the upper layers' semantic information, enabling the detection of objects ranging from small to large. Finally, the head predicts the class and location of objects using the feature maps combined in the FPN. The prediction results consist of bounding box coordinates, object presence probability, and class probabilities. This allows for the rapid and accurate detection of objects of varying sizes [14,15].

Figure 1.

YOLOv3 architecture.
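To make the FPN's fusion step concrete, the following is a minimal PyTorch sketch of a single YOLOv3-style neck step, not the authors' implementation; the channel counts and feature-map sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNFusion(nn.Module):
    """Minimal YOLOv3-style neck step: upsample the deep (semantic) map
    and concatenate it with the shallow (spatially detailed) map."""
    def __init__(self, deep_ch=1024, shallow_ch=512, out_ch=256):
        super().__init__()
        self.reduce = nn.Conv2d(deep_ch, out_ch, kernel_size=1)  # 1x1 conv to shrink channels
        self.fuse = nn.Conv2d(out_ch + shallow_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, deep, shallow):
        x = self.reduce(deep)
        x = F.interpolate(x, scale_factor=2, mode="nearest")  # upsample to shallow resolution
        x = torch.cat([x, shallow], dim=1)                    # merge semantic + spatial detail
        return self.fuse(x)

# Toy usage: a 13x13 deep map fused with a 26x26 shallow map.
deep = torch.randn(1, 1024, 13, 13)
shallow = torch.randn(1, 512, 26, 26)
print(FPNFusion()(deep, shallow).shape)  # torch.Size([1, 256, 26, 26])
```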

2.2 Faster R-CNN

Faster R-CNN, a 2-Stage object detection model, achieves high accuracy in detecting various objects. The model includes a region proposal network (RPN) and detects objects based on the regions it proposes. Figure 2 shows the structure of Faster R-CNN, which consists of a backbone, the RPN, ROI pooling, and a head.

Figure 2.

Faster R-CNN architecture.

Faster R-CNN's backbone uses ResNet to extract features from input images and convert them into feature maps. The RPN then proposes object regions based on these feature maps. The RPN consists of small networks that output the object presence probability and bounding box predictions at each location using a sliding window. ROI pooling converts the variously sized object regions proposed by the RPN into fixed-size feature maps, resulting in a consistent input size. This operation is similar to max pooling: each region is divided into a fixed grid, and the maximum value is extracted from each cell. Finally, the head predicts the class and bounding box of each object using fully connected (FC) layers, with the fixed-size feature maps converted through ROI pooling as inputs. This design gives Faster R-CNN high accuracy, and it is particularly effective in detecting complex or small objects. Its complex network structure, however, makes it slower than the 1-Stage method, which may limit its use in real-time wildfire detection [16].
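As a hedged illustration of this backbone-RPN-ROI pooling-head flow, torchvision's reference Faster R-CNN (ResNet-50 with FPN) can be run as follows; this is not necessarily the configuration trained in this study.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Hedged sketch using torchvision's reference Faster R-CNN, not the exact
# model trained in this study. weights="DEFAULT" assumes torchvision >= 0.13;
# older versions (e.g., the 0.12 paired with PyTorch 1.11) use pretrained=True.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Dummy 3-channel image scaled to [0, 1]; replace with a real wildfire frame.
image = torch.rand(3, 800, 800)
with torch.no_grad():
    prediction = model([image])[0]  # the model takes a list of images

# Each prediction holds bounding boxes, class labels, and confidence scores.
print(prediction["boxes"].shape, prediction["scores"][:5])
```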

2.3 Cascade R-CNN

Cascade R-CNN, a representative model of the Multi-Stage method, consists of several stages of networks that provide high detection accuracy. Each stage uses the Faster R-CNN structure and performs more precise detection by receiving the previous stage's output. Figure 3 shows the structure of Cascade R-CNN, which detects objects using multiple stages of detectors based on the feature maps extracted from the backbone. The model uses three detection stages based on the Faster R-CNN structure: the first stage detects initial object candidates, and the second and third stages refine these detections for greater accuracy, ultimately allowing for high-accuracy object detection. The main advantage of Cascade R-CNN is its high detection accuracy, which is particularly effective for object detection in complex scenes. Because each stage's network performs progressively more precise detection, the detection of small or complex objects improves. However, using detectors in multiple stages increases computational cost and decreases speed [17].

Figure 3.

Cascade R-CNN architecture.
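As a hedged illustration, Cascade R-CNN is available as a reference implementation in the MMDetection toolbox; the config and checkpoint paths below are placeholders, not the files used in this study.

```python
# Hedged sketch assuming the MMDetection toolbox is installed; paths are
# placeholders standing in for a real Cascade R-CNN config and checkpoint.
from mmdet.apis import init_detector, inference_detector

config_file = "configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.py"  # placeholder
checkpoint_file = "checkpoints/cascade_rcnn_r50_fpn_1x_coco.pth"      # placeholder

model = init_detector(config_file, checkpoint_file, device="cuda:0")
result = inference_detector(model, "wildfire_frame.jpg")  # stage-refined detections
```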

In this study, model training and validation experiments were performed based on wildfire data using YOLOv3, Faster R-CNN, and Cascade R-CNN to improve the efficiency of wildfire detection systems and present the optimal model.

3. Experiment

3.1 Wildfire detection model development process

Figure 4 shows the wildfire detection model development process for this study. The details are as follows:

Figure 4.

Wildfire detection model development process.

• A development environment was constructed based on Python and PyTorch to develop wildfire detection models.

• Image data containing various wildfire-related objects (flames, fire smoke, chimney smoke, clouds, and fog) were collected, and data preprocessing and labeling were performed.

• Training and validation experiments were performed for wildfire detection models based on YOLOv3, Faster R-CNN, and Cascade R-CNN.

• Model evaluation and inference were performed using a test dataset to analyze wildfire and class B fire detection capabilities. Based on the analysis results, a model favorable for wildfire detection was proposed.

3.2 Experimental setup

Model training and validation were performed in the hardware and software environment listed in Table 1 using the hyperparameter settings provided. Experiments were run on a machine with a 36-core 2.1 GHz CPU, 96 GB of RAM, and an Nvidia V100 GPU. In this environment, object detection model training was performed using CUDA 11.6, CUDNN 8.2.0, Python 3.8.0, and PyTorch 1.11.0. For the hyperparameter settings, the optimizer was set to AdamW, the learning rate to 0.01, and the number of epochs to 100 for all models to allow a fair comparison. The batch size was set to 20 to make full use of the available GPU memory. Under the same optimization conditions, each model's optimal performance was derived, and the performance of the object detection models was compared and analyzed.

Table 1. Hyperparameter and Hardware Settings of Network Training

Division       | Name          | Value
Hardware       | CPU           | 36-core, 2.1 GHz
               | GPU           | Nvidia V100
               | RAM           | 96 GB
Software       | CUDA          | 11.6
               | CUDNN         | 8.2.0
               | Python        | 3.8.0
               | PyTorch       | 1.11.0
Hyperparameter | Optimizer     | AdamW
               | Epoch         | 100
               | Learning rate | 0.01
               | Batch size    | 20
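As a minimal illustration of these optimization settings (AdamW, a learning rate of 0.01, 100 epochs, and a batch size of 20), the following PyTorch sketch uses a stand-in model and dataset; it is not the authors' training script, and a real detector would use its own loss functions.

```python
import torch

# Stand-ins for a detector and a labeled dataset; placeholders only.
model = torch.nn.Linear(10, 5)
dataset = torch.utils.data.TensorDataset(torch.randn(100, 10), torch.randn(100, 5))
loader = torch.utils.data.DataLoader(dataset, batch_size=20, shuffle=True)

# The stated optimization settings: AdamW with learning rate 0.01.
optimizer = torch.optim.AdamW(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()  # detection models use their own losses

for epoch in range(100):      # the stated 100 training epochs
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```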

3.3 Data collection

Data for this study were collected from the "large-scale artificial intelligence database for local safety disaster (wildfire) prevention" provided by AI-Hub (https://www.aihub.or.kr/). The collected data comprised five classes, including fine flame and smoke images, large-scale flame and smoke images, and cloud, fog, and chimney smoke images captured at high altitudes. A total of 15,000 images were secured, with 3,000 images per class. Of these, 10,500 images (70%) were used as training data, 3,000 images (20%) as validation data, and 1,500 images (10%) as test data. Figure 5 shows examples of the collected wildfire images.

Figure 5.

Examples of collected wildfire images.
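The 70/20/10 split described above can be reproduced with a sketch such as the following; the file names and the fixed seed are placeholders, and a per-class stratified split would preserve the 3,000-images-per-class balance.

```python
import random

# Hedged sketch of the 70/20/10 split; the file names are placeholders
# standing in for the 15,000 labeled AI-Hub images.
all_images = [f"img_{i:05d}.jpg" for i in range(15_000)]
random.seed(42)            # fixed seed is an assumption for reproducibility
random.shuffle(all_images)

n = len(all_images)
train = all_images[:int(0.7 * n)]            # 10,500 training images
val = all_images[int(0.7 * n):int(0.9 * n)]  # 3,000 validation images
test = all_images[int(0.9 * n):]             # 1,500 test images
print(len(train), len(val), len(test))       # 10500 3000 1500
```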

3.4 Model training performance evaluation indicator

The model's performance was evaluated using the confusion matrix for classifiers shown in Table 2. The matrix shows the relationship between the actual class and the class predicted by the model. A true positive (TP) is a case in which an actual "true" is correctly predicted as "true" by the model, and a false negative (FN) is a case in which an actual "true" is incorrectly predicted as "false." A false positive (FP) is a case in which an actual "false" is incorrectly predicted as "true," and a true negative (TN) is a case in which an actual "false" is correctly predicted as "false." Eqs. (1)-(4) give the calculation formulas for precision, recall, average precision (AP), and mean average precision (mAP) based on these quantities.

Table 2. Confusion Matrix for Classifier

Division     | Predicted True | Predicted False
Actual True  | True positive  | False negative
Actual False | False positive | True negative

(1) $\mathrm{Precision} = \dfrac{TP}{TP + FP}$
(2) $\mathrm{Recall} = \dfrac{TP}{TP + FN}$
(3) $AP = \int_{0}^{1} p(r)\,dr$
(4) $mAP = \dfrac{1}{C}\sum_{i=1}^{C} AP_{i}$

In Eq. (3), p(r) represents the maximum precision for a given recall. C in Eq. (4) represents the total number of classes.
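The formulas above translate directly into code. The following sketch computes precision and recall from confusion counts and averages per-class AP values into mAP; the AP values shown are illustrative placeholders, since computing AP itself (Eq. (3)) requires ranked detections and IoU matching.

```python
def precision(tp, fp):
    """Eq. (1): fraction of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Eq. (2): fraction of actual positives that are detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def mean_average_precision(per_class_ap):
    """Eq. (4): mAP is the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)

# Illustrative counts and per-class AP values for five classes (placeholders).
print(precision(80, 20), recall(80, 10))               # 0.8 0.888...
print(mean_average_precision([0.68, 0.83, 0.90, 0.92, 0.83]))  # 0.832
```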

These calculation formulas were used to assess and analyze the overall performance of the object detection-based wildfire detection models proposed in this study.

4. Experiment Results

4.1 Model training and validation experiment results

Figure 6 shows the training and validation results for the object detection-based wildfire detection models. The performance of the Cascade R-CNN, Faster R-CNN, and YOLOv3 models varies with the number of epochs. In the mAP 50-95 results, Cascade R-CNN and Faster R-CNN maintained high performance, while YOLOv3 showed relatively low performance, indicating that YOLOv3's precise detection performance can degrade at high IoU thresholds. In the mAP 50 results, Cascade R-CNN had the highest accuracy, followed by Faster R-CNN and YOLOv3, but YOLOv3's performance converged toward that of the other models as the number of epochs increased, indicating that YOLOv3 is also suitable for wildfire detection. To evaluate actual wildfire detection performance based on these results, a new test dataset was used to evaluate the wildfire detection models and run inference.

Figure 6.

Object detection-based wildfire detection model validation results.

4.2 Model evaluation and inference results

Table 3 lists the test dataset-based performance evaluation results for the wildfire detection models. Precision, recall, mAP 50, mAP 50-95, model parameters (MB), and frames per second (FPS) were used as performance indicators for each model (YOLOv3, Faster R-CNN, and Cascade R-CNN). Overall, the YOLOv3 model showed lower performance than Faster R-CNN and Cascade R-CNN in terms of precision, recall, and mAP 50-95. However, it exhibited relatively high performance in terms of mAP 50, with an average of 0.832 across all classes, and the difference from the Faster R-CNN and Cascade R-CNN models was not significant. The YOLOv3 model also demonstrated a very high inference speed of 48.7 FPS with a model size of 61.9 MB, a significant advantage for real-time fire detection systems: its lightweight structure and high processing speed make it well suited to rapid fire detection. Overall, the Faster R-CNN and Cascade R-CNN models performed better, but their inference speeds were slower than that of YOLOv3. In particular, Cascade R-CNN performed the best but had the lowest inference speed (26.3 FPS). Although the detection performance of the YOLOv3 model may be low for small flame and smoke events, its wildfire detection performance in terms of mAP 50 does not differ significantly, and its inference speed of more than 40 FPS makes it suitable for real-time cameras such as CCTV. Therefore, YOLOv3 is considered effective at detecting most fires.

Table 3. Wildfire Detection Model Evaluation Results

Model         | Class         | Precision | Recall | F1 Score | mAP 50 | mAP 50-95 | Parameters (MB) | FPS
YOLOv3        | Smoke         | 0.271 | 0.382 | 0.317 | 0.677 | 0.180 | 61.9 | 48.7
              | Flame         | 0.334 | 0.447 | 0.383 | 0.832 | 0.182 |      |
              | Cloud         | 0.536 | 0.631 | 0.579 | 0.903 | 0.619 |      |
              | Mist          | 0.510 | 0.587 | 0.546 | 0.915 | 0.549 |      |
              | Chimney smoke | 0.362 | 0.467 | 0.408 | 0.831 | 0.251 |      |
              | Total         | 0.403 | 0.503 | 0.447 | 0.832 | 0.356 |      |
Faster R-CNN  | Smoke         | 0.402 | 0.464 | 0.431 | 0.723 | 0.397 | 41.3 | 36.2
              | Flame         | 0.463 | 0.497 | 0.479 | 0.725 | 0.522 |      |
              | Cloud         | 0.828 | 0.860 | 0.843 | 0.949 | 0.897 |      |
              | Mist          | 0.829 | 0.857 | 0.843 | 0.965 | 0.907 |      |
              | Chimney smoke | 0.481 | 0.542 | 0.509 | 0.811 | 0.515 |      |
              | Total         | 0.601 | 0.644 | 0.621 | 0.835 | 0.648 |      |
Cascade R-CNN | Smoke         | 0.410 | 0.472 | 0.439 | 0.716 | 0.428 | 69.1 | 26.3
              | Flame         | 0.478 | 0.515 | 0.496 | 0.727 | 0.549 |      |
              | Cloud         | 0.857 | 0.882 | 0.869 | 0.962 | 0.903 |      |
              | Mist          | 0.858 | 0.875 | 0.867 | 0.966 | 0.913 |      |
              | Chimney smoke | 0.516 | 0.575 | 0.544 | 0.831 | 0.537 |      |
              | Total         | 0.624 | 0.664 | 0.643 | 0.840 | 0.666 |      |
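FPS figures such as those in Table 3 are typically obtained by timing repeated forward passes. The sketch below is a rough illustration for any PyTorch module that accepts a batched image tensor; the warm-up count, input size, and number of runs are assumptions rather than the settings used in this study.

```python
import time
import torch

def measure_fps(model, image_size=(3, 640, 640), n_warmup=10, n_runs=100):
    """Rough FPS estimate from repeated single-image forward passes.
    Warm-up count, input size, and run count are illustrative choices."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.rand(1, *image_size, device=device)
    with torch.no_grad():
        for _ in range(n_warmup):          # warm up kernels and caches
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()       # wait for queued GPU work
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return n_runs / (time.perf_counter() - start)
```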

Figure 7 shows the wildfire detection image inference results for the YOLOv3, Faster R-CNN, and Cascade R-CNN models. Overall, the YOLOv3 model showed lower performance than the other models, but it correctly detected the majority of fire events. For the same images, the detection scores of Faster R-CNN and Cascade R-CNN were generally higher than those of YOLOv3, but YOLOv3 detected all fire events, albeit with lower scores, rather than producing false detections. This is consistent with the insignificant differences in mAP 50 among the three models. In addition, Faster R-CNN and Cascade R-CNN could not detect small flame objects in flame events, whereas YOLOv3 detected them with a score of 0.87. YOLOv3 generally exhibited higher flame detection performance, even though its performance can degrade when detecting fluid objects such as fire smoke, chimney smoke, fog, and clouds. Therefore, it is believed that YOLOv3 can sufficiently detect general smoke objects with fluid characteristics.

Figure 7.

Wildfire detection model inference results.

In conclusion, while the YOLOv3 model may have slightly lower overall performance, there was no significant difference in wildfire detection performance in terms of mAP 50. YOLOv3 outperformed Faster R-CNN and Cascade R-CNN for clear fire events, such as the flame class, and its very high inference speed makes it suitable for real-time fire detection. Therefore, when developing wildfire detection models for rapid wildfire detection systems in the future, it will be important to improve performance by optimizing the network structure and hyperparameters of 1-Stage models such as YOLOv3.

5. Conclusions

This study compared the performance of various object detection models for real-time wildfire detection. To this end, experiments were performed with YOLOv3 (a 1-Stage model), Faster R-CNN (a 2-Stage model), and Cascade R-CNN (a Multi-Stage model). The performance of each model was evaluated using various indicators, including precision, recall, mAP 50, mAP 50-95, and FPS. In the experiment results, the YOLOv3 model showed slightly lower overall performance than Faster R-CNN and Cascade R-CNN. YOLOv3's mAP 50 was 83.2%, slightly lower than Faster R-CNN's 83.5% and Cascade R-CNN's 84.0%. However, YOLOv3 exhibited high detection performance for clear flame events, and its overall mAP 50 showed no significant difference from the other models. On the other hand, YOLOv3's detection performance was slightly reduced when detecting fluid objects such as smoke, chimney smoke, and mist. Nevertheless, the YOLOv3 model achieved a much higher speed (48.7 FPS) than Faster R-CNN (36.2 FPS) and Cascade R-CNN (26.3 FPS). This is a significant advantage for real-time fire detection systems and is particularly useful in situations requiring rapid response. Therefore, the YOLOv3 model is deemed suitable for real-time fire detection owing to its very high inference speed and the insignificant difference in wildfire detection performance in terms of mAP 50, although its overall performance may be somewhat lower. If future research improves model performance by optimizing the network structure and hyperparameters of 1-Stage models such as YOLOv3, such models will be applicable to systems requiring both high wildfire detection accuracy and rapid detection.

Notes

Author Contributions

Conceptualization, S.C. and H.G.; methodology, J.C.; software, S.C.; validation, S.C. and H.J.; formal analysis, S.C.; investigation, H.G.; resources, H.G.; data curation, J.C.; writing—original draft preparation, S.C.; writing—review and editing, S.C.; visualization, H.J.; supervision, H.J.; project administration, H.J.; funding acquisition, H.J. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Acknowledgements

This study was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF: 2021R1G1A1014385).

References

1. Gould C. F., Heft-Neal S., Johnson M., Aguilera J., Burke M., Nadeau K.. Health Effects of Wildfire Smoke Exposure. Annual Review of Medicine 75:277–292. 2024;https://doi.org/10.1146/annurev-med-052422-020909.
2. Lee J. H., Jeong K. S., Jung H. Y.. Development of a Forest Fire Detection System Using a Drone-based Convolutional Neural Network Model. International Journal of Fire Science and Engineering 37(2):30–40. 2023;https://doi.org/10.7731/KIFSE.26686d3f.
3. Kwon H. J., Lee B. H., Jung H. Y.. Research on Improving the Performance of YOLO-Based Object Detection Models for Smoke and Flames from Different Materials. Journal of the Korean Institute of Electrical and Electronic Material Engineers 37(3):261–273. 2024;https://doi.org/10.4313/JKEM.2024.37.3.4.
4. Choi S. G., Lee B. H., Kim J. K., Jung H. Y.. Deep-Learning-Based Seismic-Signal P-Wave First-Arrival Picking Detection Using Spectrogram Images. Electronics 13(1)2024;https://doi.org/10.3390/electronics13010229.
5. Jung H. Y., Choi S. G., Lee B. H.. Rotor Fault Diagnosis Method Using CNN-Based Transfer Learning with 2D Sound Spectrogram Analysis. Electronics 12(3)2023;https://doi.org/10.3390/electronics12030480.
6. Maillard S., Khan M. S., Cramer A., Sancar E. K.. Wildfire and Smoke Detection Using YOLO-NAS. 2024 IEEE 3rd International Conference on Computing and Machine Intelligence, IEEE 2024;https://doi.org/10.1109/ICMI60790.2024.10585773.
7. Goncalves L. A. O., Ghali R., Akhloufi M. A.. YOLO-Based Models for Smoke and Wildfire Detection in Ground and Aerial Images. Fire 7(4)2024;https://doi.org/10.3390/fire7040140.
8. Redmon J., Farhadi A.. YOLOv3: An Incremental Improvement. arXiv preprint. arXiv 1804.02767. 2018;https://doi.org/10.48550/arXiv.1804.02767.
9. Lin T. Y., Goyal P., Girshick R., He K., Dollar P.. Focal Loss for Dense Object Detection. Proceedings of the IEEE International Conference on Computer Vision 2018;https://doi.org/10.48550/arXiv.1708.02002.
10. Ren S., He K., Girshick R., Sun J.. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems 2015.
11. Cai Z., Vasconcelos N.. Cascade R-CNN: Delving into High Quality Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018;https://doi.org/10.48550/arXiv.1712.00726.
12. Zhao L., Zhi L., Zhao C., Zheng W.. Fire-YOLO: A Small Target Object Detection Method for Fire Inspection. Sustainability 14(9)2022;https://doi.org/10.3390/su14094930.
13. Shen P., Sun N., Hu K., Ye X., Wang P., Xia Q., Wei C.. FireViT: An Adaptive Lightweight Backbone Network for Fire Detection. Forests 14(11)2023;https://doi.org/10.3390/f14112158.
14. Huang Y. Q., Zheng J. C., Sun S. D., Yang C. F., Liu J.. Optimized YOLOv3 Algorithm and Its Application in Traffic Flow Detections. Applied Sciences 10(9)2020;https://doi.org/10.3390/app10093079.
15. Mostofa M., Ferdous S. N., Riggan B. S., Nasrabadi N. M.. Joint-SRVDNet: Joint Super Resolution and Vehicle Detection Network. IEEE Access 8:82306–82319. 2020;https://doi.org/10.1109/ACCESS.2020.2990870.
16. Ren S., He K., Girshick R., Sun J.. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6):1137–1149. 2017;https://doi.org/10.1109/TPAMI.2016.2577031.
17. Cai Z., Vasconcelos N.. Cascade R-CNN: High Quality Object Detection and Instance Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(5):1483–1498. 2021;https://doi.org/10.1109/TPAMI.2019.2956516.

