LENGTHENING THE DATA SERIES BY VALUES OF SIMILAR DATA SERIES SAMPLES
DOI:
https://doi.org/10.24025/2306-4412.3.2021.244266
Keywords:
time series, regression, data replenishment, machine learning, sklearn, insufficient data, hydrology
Abstract
Insufficient data substantially affects the choice of approaches and methods for analyzing data series, as well as the quality of the results obtained. The authors therefore consider the development of approaches and models for lengthening data series to be relevant. The main task of this work is to describe and implement a technology for data series lengthening. The technology uses the values of similar data series, represented by the same indicators, as features for lengthening a given data series. The work describes a scheme for identifying similar data series. According to this scheme, the most similar data series are those with the smallest distance to, and the strongest direct (positive) correlation with, the series to be lengthened. For lengthening the series, the work considers seven models: linear regression; sum of weighted values for a group of similar series; average weighted values for a group of similar series, with a correction to the average value of the series being lengthened; random forest; k-nearest neighbors; support vector regression; and gradient boosting. The computational experiment was carried out on series of water level values recorded at hydrological stations located on water bodies of the Dnieper River basin. The data series of post 79545, located on the Sluch River at Novograd-Volynsky, Zhytomyr region, is lengthened by one year, i.e. the length of the series increases by 365 values. The most similar series were found to be those of posts 79555 and 79694, which have the lowest calculated distances and correlation coefficients greater than 0.75.
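The similarity scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not name the distance measure, so Euclidean distance is assumed here; using 0.75 as a cut-off is also an assumption (the abstract reports it only as a value the selected series exceeded); the function name `rank_similar_series` and the synthetic data are hypothetical and are not the authors' code or measurements.

```python
import numpy as np


def rank_similar_series(target, candidates, min_corr=0.75):
    """Rank candidate series by similarity to the target series.

    target: 1-D array, the series to be lengthened.
    candidates: dict mapping a label to a 1-D array of the same length,
        covering the same period as `target`.
    Returns (label, distance, correlation) tuples, smallest distance first,
    keeping only candidates whose Pearson correlation is >= min_corr.
    """
    target = np.asarray(target, dtype=float)
    ranked = []
    for label, values in candidates.items():
        values = np.asarray(values, dtype=float)
        # Euclidean distance is an assumption; the abstract only says "distance".
        distance = float(np.linalg.norm(target - values))
        correlation = float(np.corrcoef(target, values)[0, 1])
        if correlation >= min_corr:  # keep strong direct correlation only
            ranked.append((label, distance, correlation))
    return sorted(ranked, key=lambda item: item[1])


# Synthetic one-year series (not real hydrological data).
rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, 6 * np.pi, 365))
target = base + 0.05 * rng.normal(size=365)
candidates = {
    "close": base + 0.05 * rng.normal(size=365),
    "noisy": base + 0.5 * rng.normal(size=365),
    "inverse": -base,  # strong but inverse correlation, should be rejected
}
ranking = rank_similar_series(target, candidates)
for label, distance, correlation in ranking:
    print(f"{label}: distance={distance:.2f}, correlation={correlation:.2f}")
```

Sorting by distance after filtering on correlation is one way to combine the two criteria; a joint score would be an equally reasonable design.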
The best lengthening results are obtained with two models: the sum of weighted values for a group of similar series, and the average weighted values for a group of similar series with a correction to the average value of the series being lengthened. In future research, the authors plan to use these results to develop and analyze methods for replenishing missing values in time series.
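The abstract does not give formulas for the two best-performing models, so the following Python sketch shows only one plausible reading. It assumes weights proportional to each similar series' correlation with the target, and a correction that re-centres the weighted deviations of the similar series on the target's historical mean. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def correlation_weights(target_hist, similar_hist):
    """Weights proportional to correlation with the target (an assumption)."""
    corr = np.array(
        [np.corrcoef(target_hist, column)[0, 1] for column in similar_hist.T]
    )
    return corr / corr.sum()


def weighted_sum(similar_new, weights):
    """Sum of weighted values of the similar series for the new period."""
    return similar_new @ weights


def mean_corrected_weighted_average(target_hist, similar_hist, similar_new, weights):
    """Weighted average of deviations from each similar series' historical mean,
    shifted to the historical mean of the target series (assumed correction)."""
    deviations = similar_new - similar_hist.mean(axis=0)
    return target_hist.mean() + deviations @ weights


# Synthetic daily data: two years of shared history, one year to be filled in.
days = np.arange(3 * 365)
season = np.sin(2 * np.pi * days / 365)
target = 3.0 + season  # the last year is treated as unknown
similar = np.column_stack([5.0 + season, 2.0 + 0.8 * season])

hist, new = slice(0, 730), slice(730, None)
weights = correlation_weights(target[hist], similar[hist])
plain = weighted_sum(similar[new], weights)
corrected = mean_corrected_weighted_average(
    target[hist], similar[hist], similar[new], weights
)
print(plain.mean(), corrected.mean())
```

In this synthetic setup the plain weighted sum inherits the level of the similar series, while the corrected variant is re-centred on the historical mean of the target series, which illustrates why such a correction may matter when the posts record different typical water levels.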
Copyright (c) 2021 Анастасія Батурінець, Світлана Антоненко

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
The authors who publish in this journal agree to the following terms:
The authors reserve the right to authorship of their work and give the journal the right to first publish this work under the terms of the Creative Commons Attribution License CC BY-NC, which allows other persons to freely distribute published work with a mandatory reference to authors of the original work and the first publication of the work in this journal.
Authors have the right to conclude separate additional agreements for the non-exclusive distribution of the paper in the form in which it was published by this journal (for example, posting work in electronic repository or publishing as part of a monograph), provided that the link to the first publication in this journal is maintained.
The journal policy allows and encourages authors to post on the Internet (for example, in repositories of institutions or on personal websites) the manuscript of work, both before the submission of this manuscript to the editorial staff, and during its editorial work, as it contributes to the emergence of productive scientific discussion and positively affects the efficiency and dynamics of published work citation (see The Effect of Open Access).