MONITORING DATA AGGREGATION OF DYNAMIC SYSTEMS USING INFORMATION TECHNOLOGIES
DOI: https://doi.org/10.30837/ITSSI.2023.23.123
Keywords: data dimensionality reduction; deep learning; autoencoders
Abstract
The subject matter of the article is models, methods, and information technologies for monitoring data aggregation. The goal of the article is to determine the best deep learning model for reducing the dimensionality of dynamic systems monitoring data. The following tasks were solved: analysis of existing dimensionality reduction approaches; description of the general architecture of vanilla and variational autoencoders; development of custom autoencoder architectures; development of software for training and testing the autoencoders; and an experimental study of their reconstruction quality on the dimensionality reduction problem. The following models and methods were used: data processing and preparation, and data dimensionality reduction. The software was developed in Python, with scikit-learn, Pandas, PyTorch, NumPy, argparse, and other auxiliary libraries. Results obtained: the work presents a classification of dimensionality reduction models and methods, together with overviews of vanilla and variational autoencoders that cover the models, their properties, their loss functions, and their application to dimensionality reduction. Custom autoencoder architectures were designed, including visual representations of each architecture and descriptions of each component. Software for training and testing the autoencoders was developed; the dynamic system monitoring data set and its preprocessing steps were described, along with the metric for evaluating model quality and the configuration and training of the autoencoders. Conclusions: the vanilla autoencoder recovers the data considerably better than the variational one. Since the two architectures are identical apart from the specifics of each autoencoder type, this indicates that the vanilla autoencoder compresses the data more effectively, retaining more of the informative variables for later recovery from the bottleneck. Moreover, by training with different bottleneck sizes, one can determine the size at which the data is recovered best, that is, at which the most important variables are preserved. Overall, the autoencoders handle the dimensionality reduction task effectively: the reconstruction quality metric shows that they recover the data with an error on the order of the third to fourth decimal place. In conclusion, the vanilla autoencoder is the best deep learning model for aggregating monitoring data of dynamic systems.
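To make the approach concrete, the following is a minimal sketch of a vanilla autoencoder in PyTorch. The layer widths, activation function, and class name VanillaAutoencoder are illustrative assumptions; the paper's exact architecture, presented with visual representations in the full text, may differ.

    # A minimal sketch of a vanilla autoencoder, assuming the monitoring
    # data is a flat feature vector; widths and activations are illustrative.
    import torch
    import torch.nn as nn

    class VanillaAutoencoder(nn.Module):
        def __init__(self, n_features: int, bottleneck: int):
            super().__init__()
            # Encoder: compress the input down to the bottleneck representation.
            self.encoder = nn.Sequential(
                nn.Linear(n_features, 64),
                nn.ReLU(),
                nn.Linear(64, bottleneck),
            )
            # Decoder: reconstruct the input from the bottleneck.
            self.decoder = nn.Sequential(
                nn.Linear(bottleneck, 64),
                nn.ReLU(),
                nn.Linear(64, n_features),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.decoder(self.encoder(x))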
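The variational autoencoder differs in that its encoder outputs the parameters of a latent Gaussian distribution, and its loss combines a reconstruction term with a Kullback-Leibler divergence against a standard normal prior. A sketch of this standard loss follows; the use of mean squared error for reconstruction and the equal weighting of the two terms are assumptions, as the paper's exact loss configuration appears only in the full text.

    # A sketch of the standard VAE loss: reconstruction error plus the
    # closed-form KL divergence between N(mu, sigma^2) and N(0, I).
    import torch
    import torch.nn.functional as F

    def vae_loss(x_hat, x, mu, logvar):
        # Reconstruction term: how well the decoder recovers the input.
        recon = F.mse_loss(x_hat, x, reduction="sum")
        # KL term for a diagonal Gaussian posterior vs. a standard normal prior.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + kl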
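The conclusion about selecting the bottleneck size suggests a simple experimental loop: train the same architecture with several bottleneck sizes and compare reconstruction errors. The sketch below illustrates this, reusing the hypothetical VanillaAutoencoder above; the optimizer (Adam), learning rate, epoch count, and candidate sizes are assumptions, not the paper's reported configuration.

    # Illustrative sweep over bottleneck sizes for a preprocessed feature
    # matrix X (a torch.FloatTensor); hyperparameters are assumptions.
    import torch

    def reconstruction_mse(model, X):
        model.eval()
        with torch.no_grad():
            return torch.mean((model(X) - X) ** 2).item()

    def sweep_bottlenecks(X, sizes=(2, 4, 8, 16), epochs=100, lr=1e-3):
        results = {}
        for size in sizes:
            model = VanillaAutoencoder(X.shape[1], size)
            opt = torch.optim.Adam(model.parameters(), lr=lr)
            for _ in range(epochs):
                opt.zero_grad()
                loss = torch.mean((model(X) - X) ** 2)
                loss.backward()
                opt.step()
            # The size with the lowest reconstruction error preserves the
            # most informative variables for recovery from the bottleneck.
            results[size] = reconstruction_mse(model, X)
        return results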
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.