The use of the Isolation Forest model for anomaly detection in measurement data
DOI:
https://doi.org/10.30837/ITSSI.2024.27.236
Keywords:
uncertainty; anomaly detection; measurement; metrology; data processing; machine learning algorithms; statistical methods.
Abstract
The subject of the research is the Isolation Forest model, an efficient tool for detecting anomalies and outliers in measurement data, applicable in fields where high accuracy and reliability of measurements are important. The goal of the study is to apply the Isolation Forest model to identify unusual or anomalous patterns that differ from the typical patterns in the output data. This is achieved by building an ensemble of randomized decision trees: anomalous points are separated from the rest of the data in fewer splits than normal ones. The task of the research is to detect outliers in data obtained with a Coriolis flowmeter during preparation for international comparisons on the state primary standard for mass and volume flow rate of fluid and for mass and volume of fluid flowing through a pipeline. Data collected during metrological studies are processed by the model, which identifies anomalous or outlying values that may indicate systematic or random measurement errors. The model enables quick detection of even small deviations in the data, helping to maintain high accuracy and reliability of measurement results. The main statistical methods for detecting outliers, which are distribution-independent, are Grubbs' criterion, the interquartile range rule, and the standard deviation rule. They are sensitive to sample size but are simple and easy to interpret. The Isolation Forest model also has limitations: it can be resource-intensive on large datasets, and it requires careful parameter tuning to achieve optimal results. The results of the research include an assessment of the Isolation Forest model's effectiveness by comparison with traditional outlier detection methods.
Comparing the results of different approaches to the same task is an effective way to evaluate the model's performance. Conclusion. The article outlines prospects for further research in this direction. Future work will focus on developing methods for detecting anomalies in measurement data and on improving the accuracy and reliability of measurement results, with potential applications in science and industry.
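The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the readings are synthetic (30 values near 100.0 plus one injected outlier), all function names are hypothetical, and the isolation forest is a simplified one-dimensional version of the algorithm of Liu, Ting and Zhou (2008) that grows every tree on the full sample instead of a random subsample. Grubbs' criterion is left out because it needs critical values of Student's t-distribution, which the Python standard library does not provide.

```python
import math
import random
import statistics

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup
    among n points; used to normalise isolation depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(sample, rng, depth, limit):
    """Grow one random isolation tree over 1-D values."""
    if len(sample) <= 1 or depth >= limit:
        return ("leaf", len(sample))
    lo, hi = min(sample), max(sample)
    if lo == hi:  # only duplicates left, cannot split further
        return ("leaf", len(sample))
    split = rng.uniform(lo, hi)
    left = [v for v in sample if v < split]
    right = [v for v in sample if v >= split]
    return ("node", split,
            build_tree(left, rng, depth + 1, limit),
            build_tree(right, rng, depth + 1, limit))

def path_length(x, tree, depth=0):
    """Number of splits needed to reach the leaf that contains x."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, split, left, right = tree
    return path_length(x, left if x < split else right, depth + 1)

def isolation_scores(data, n_trees=100, seed=0):
    """Anomaly score in (0, 1]; values closer to 1 are more anomalous."""
    rng = random.Random(seed)
    limit = math.ceil(math.log2(len(data)))
    trees = [build_tree(data, rng, 0, limit) for _ in range(n_trees)]
    norm = c_factor(len(data))
    return [2.0 ** (-statistics.fmean(path_length(x, t) for t in trees) / norm)
            for x in data]

def iqr_flags(data, k=1.5):
    """Interquartile range rule: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [not (lo <= v <= hi) for v in data]

def sigma_flags(data, k=3.0):
    """Standard deviation rule: flag values more than k sample SDs from the mean."""
    mean, sd = statistics.fmean(data), statistics.stdev(data)
    return [abs(v - mean) > k * sd for v in data]

# Synthetic readings (assumption, not data from the study):
# 30 values within +/-0.05 of 100.0 and one injected outlier at index 30.
readings = [100.0 + 0.01 * ((7 * i) % 11 - 5) for i in range(30)] + [101.0]

if __name__ == "__main__":
    scores = isolation_scores(readings)
    most_anomalous = max(range(len(readings)), key=scores.__getitem__)
    print("highest isolation score at index", most_anomalous)
    print("IQR rule flags:", [i for i, f in enumerate(iqr_flags(readings)) if f])
    print("3-sigma rule flags:", [i for i, f in enumerate(sigma_flags(readings)) if f])
```

The isolation score is a ranking rather than a decision, so a threshold (or an expected share of anomalies) still has to be chosen, which is one of the tuning parameters the abstract refers to. The number of trees and the depth limit are the others in this sketch.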
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.