APPLICATION OF THE NARX NEURAL NETWORK FOR PREDICTING A ONE-DIMENSIONAL TIME SERIES

Modern network traffic has a complex structure and an uneven rate of packet arrival for service by network devices. Predicting network traffic remains an important task, as forecast data provide the information needed to manage network flows. Numerous studies of actually measured data confirm that such traffic is non-stationary and has a multicomponent structure. This paper presents modeling with the Nonlinear Autoregressive with Exogenous inputs (NARX) algorithm for predicting network traffic datasets. NARX is one of the models that can be used to represent nonlinear systems, especially when modeling time series datasets; it belongs to the class of dynamic recurrent networks whose feedback connections enclose several layers of the network. An artificial neural network (ANN) was developed, trained and tested using the Levenberg-Marquardt (LM) learning algorithm. The initial data for the prediction are the actually measured network traffic, expressed as a packet rate. Training stopped after 18 epochs, and the smallest validation mean squared error (MSE) was obtained at epoch 12. The regression coefficients R between the ANN outputs and the targets for training, validation and testing were 0.97743, 0.9638 and 0.94907, respectively, with an overall regression value of 0.97134, indicating close agreement between outputs and targets on all subsets. The experimental results (MSE, R) demonstrate the method's ability to accurately estimate and predict network traffic.


Introduction
The continuing evolution of packet-switched infocommunication networks has caused a sharp increase in the volume of data associated with information flows from various human activities.
Today, topical areas in the Republic of Kazakhstan (RK) include data center management, cloud and cognitive technologies, and IT security. Projects in electronic document management and electronic government are under way, and the main investments go into the development of cognitive technologies. Artificial intelligence systems collect various data, improve through machine learning, and provide information based on previous requests or statements. All these factors contribute to the growth of network traffic, and with the implementation of the Internet of Things concept, data volumes in the cognitive infocommunication network will keep increasing. Analysts in the Republic of Kazakhstan report that, with the Internet of Things taken into account, network traffic volumes will grow astronomically. Therefore, network management, including management based on predicted future data, is necessary for making the right decisions. The main task of time series analysis is to identify and quantify the components of a complex structure: the presence or absence of a trend, periodicity and a random component. Identifying a nonlinear function and predicting it is an urgent task.
Practice has shown that measured traffic time series are non-stationary: they contain a trend, seasonal and cyclical fluctuations, and noise.
Predictive data provide the information needed to manage information flows in the network and make it possible to prevent packet loss through proactive control.
This work is relevant because predicting non-stationary time series in the traditional way requires additional research and preprocessing of the initial data, whereas artificial neural network models do not require such additional information or preprocessing. Analyzing network traffic is an important step in developing successful proactive congestion control schemes and in distinguishing normal packets from malicious ones. These schemes aim to avoid network congestion by allocating network resources according to the predicted traffic. The predictability of network traffic is an important benefit in many areas, such as dynamic bandwidth allocation, network security, network planning and predictive congestion control [1].
In addition, detecting and preventing abuse on the network is becoming very difficult due to the growing volume of traffic and the complexity of networks. Anomaly detection is one of the measures for countering network abuse: significant deviations from normal traffic behavior can be used to detect an attack. The efficiency of anomaly detection is directly related to the accuracy of traffic prediction [2].

Literature review and problem statement
In [3], the authors note that, according to forecasting experts, more than a hundred forecasting methods already exist, which raises the problem of choosing methods that give adequate forecasts for the processes or systems under study. To obtain predictive data, they used a neural network to identify complex patterns in a time series.
The authors of [4] propose several recurrent neural network (RNN) architectures (standard RNNs, Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU)) for predicting network traffic. They analyze the performance of these models on three important network traffic prediction problems: volume prediction, packet protocol prediction and packet distribution prediction. As a result, they solved the volume prediction problem on publicly available datasets such as those of the GEANT and Abilene networks. The RNN architectures also show promising results in all three areas of network traffic prediction, significantly outperforming standard statistical prediction models.
A variety of deep learning models have been proposed for time series forecasting. Recurrent neural networks (RNNs) such as LSTM and GRU have made outstanding advances in sequential modeling. These models have become the obvious choice for time series modeling and a promising tool for handling irregular time series data: RNNs can adapt to efficiently exploit missing-value patterns, time intervals and complex temporal relationships in irregular univariate and multivariate time series [5,6].
Irregular time series data are becoming more common with the growing number of multi-sensor systems and the continued use of unstructured manual data recording. Irregular data and the resulting missing values severely limit the ability to analyze and model data for classification and forecasting tasks. Traditional methods for processing time series data often introduce bias and make strong assumptions about the underlying data-generating process, which can lead to poor model predictions [7].
The work [8] provides a comprehensive overview of IoT traffic forecasting models based on classical time series methods and artificial neural networks. Real network traces are used to predict IoT traffic. The experimental results show that prediction models based on LSTM and feedforward neural networks (FNN) are highly sensitive and can therefore serve as better time-sequence prediction models than traditional traffic prediction methods.
A neural network approach was also used in [9]. Several neural network (NN) methods were proposed for predicting the traffic matrix (TM) from three points of view: directly predicting the total TM, predicting each origin-destination (OD) flow separately, and predicting the overall TM combined with the correction of key elements. In addition to prediction accuracy, the methods are evaluated in terms of traffic management performance and prediction time. The experimental results show that methods based on recurrent neural networks (RNNs) can provide better accuracy than methods using convolutional neural networks (CNNs) and deep belief networks (DBNs). Predicting each OD flow with RNN models can further improve prediction accuracy and traffic management performance, but it takes more time to predict each OD flow sequence. In contrast, overall TM prediction combined with key element correction offers a trade-off between traffic control results and prediction overhead, which makes it more suitable for dynamic traffic control.
In [10], the authors study the use of echo state networks (ESN) for predicting network traffic. An ESN is a reservoir computing model whose internal state consists of a pool of randomly connected neurons, the reservoir. The experiments compare ESNs with other network traffic prediction methods on two datasets from a well-known public network trace repository. The authors show that ESNs require significantly shorter training and prediction times while achieving similar accuracy.
In [11], the authors provided a methodology for predicting future power consumption using nonlinear autoregressive (NAR) neural networks and nonlinear autoregressive neural networks with exogenous inputs (NARX). The results show that both NAR and NARX networks are suitable for energy prediction and that exogenous data can help improve prediction accuracy. Each model has its own advantages and disadvantages: NAR networks are simpler, whereas NARX networks can use additional information that may improve the forecast.
In [12], IoT traffic time series prediction models based on closed-loop NARX neural networks with multistep-ahead prediction were proposed. The forecasts were evaluated using accuracy metrics such as the mean squared error (MSE), the sum of squared errors (SSE), the mean absolute error (MAE) and the mean absolute percentage error (MAPE).
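The four accuracy metrics named above can be computed as in the following sketch. It is illustrative only: the function name forecast_metrics is hypothetical, and skipping zero actual values in MAPE is an assumption on our part (MAPE is undefined when an actual value is zero, which can happen for packet counts in quiet intervals).

```python
import math

def forecast_metrics(actual, predicted):
    """Return MSE, SSE, MAE and MAPE (in %) for paired actual/predicted values.

    Hypothetical helper for illustration; not taken from the cited works.
    """
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    sse = sum(e * e for e in errors)            # sum of squared errors
    mae = sum(abs(e) for e in errors) / n       # mean absolute error
    # MAPE divides by the actual value, so zero actuals are skipped (assumption).
    ratios = [abs(e / a) for e, a in zip(errors, actual) if a != 0]
    mape = 100.0 * sum(ratios) / len(ratios) if ratios else math.nan
    return {"MSE": sse / n, "SSE": sse, "MAE": mae, "MAPE": mape}

# Hypothetical packet counts per interval and their forecasts.
print(forecast_metrics([100, 200, 300], [110, 190, 300]))
```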
The authors of [13,14] consider NARX one of the best neural network methods, noting, for example, its much faster convergence.
In [15], an automated time series forecasting system based on neural networks is implemented. Taken together, these works suggest that it is worthwhile to study time series built from actually measured data.
In light of this review, [16] showed that the NARX method learns network traffic efficiently and achieves acceptable forecast accuracy. For this reason, nonlinear autoregressive networks with exogenous inputs (NARX) were chosen; they augment multivariate time series with external information to improve forecasting performance.

The aim and objectives of the study
The aim of the work is to predict an actually measured time series. From a practical point of view, forecasting the actually measured time series (packet intensity) helps identify traffic behavior when designing devices for managing information flows.
To achieve the aim, the following objectives were set:
- substantiation of the choice of a forecasting model based on a neural network;
- description of the architecture and modeling of the neural network;
- building a forecast of future values of the time series.

Materials and methods of the research
The original data were obtained with the Wireshark packet sniffer. Over five hours, 278,557 packets were captured, including 158 TCP (Transmission Control Protocol), 493 ARP (Address Resolution Protocol), 25,733 MPEG (Moving Picture Experts Group) and 250,242 UDP (User Datagram Protocol) packets, among others.
A time series was created from the collected data. It represents the intensity of User Datagram Protocol (UDP) packets, computed as the number of UDP packets in each 10-second interval. UDP is used to transmit real-time traffic and does not guarantee packet delivery, so it is important to take its structure into account when forecasting. A fragment of the measured network traffic is shown in Fig. 1.
In the course of the network's evolutionary development, the volumes of transmitted information, the forms of its presentation, the methods of transmission and storage, the number of sources and consumers, the distribution among users, and the requirements for timeliness and reliability (quality) all change.
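The construction of the UDP series described above (counting packets in consecutive 10-second intervals) can be sketched as follows. This assumes the UDP packet timestamps, in seconds, have already been exported from the Wireshark capture; the export step and the function name packet_rate_series are hypothetical, not the authors' code.

```python
from collections import Counter

def packet_rate_series(timestamps, bin_seconds=10.0):
    """Count packets per fixed-width interval, starting at the first packet.

    Hypothetical helper: timestamps are packet arrival times in seconds.
    """
    if not timestamps:
        return []
    t0 = min(timestamps)
    # Map every packet to the index of the interval it falls into.
    counts = Counter(int((t - t0) // bin_seconds) for t in timestamps)
    n_bins = max(counts) + 1
    # Intervals with no packets must appear as zeros, not be skipped.
    return [counts.get(k, 0) for k in range(n_bins)]

# Hypothetical capture times in seconds.
print(packet_rate_series([0.0, 3.2, 9.9, 10.0, 27.5]))
```

Keeping empty intervals as explicit zeros matters for forecasting: dropping them would silently shift later samples in time.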
Modern studies of real traffic show that it is necessary to take into account the «explosive» (bursty) nature of traffic, which pulsates over a wide range of time scales and is associated with Triple Play and Quadruple Play service bundles, among others. This fractal property significantly affects the performance of network devices because of the non-uniform packet arrival rate [17].
It follows that the measured network traffic is non-stationary, and classical forecasting methods are not suitable for such series without additional preprocessing. Nonlinear identification with prediction is possible using neural network forecasting; neural networks and deep learning cope well with such tasks.
A non-stationary series usually has a trend (tendency) caused by non-random factors in the processes that the series represents. A time series is called non-stationary if its characteristics (mean value, variance and autocorrelation function) depend on time [18].
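The definition above suggests a quick informal check: split the series into equal segments and compare their means and variances; if they drift, the series is likely non-stationary. The sketch below also shows first-order differencing, the usual way of removing a trend before applying classical methods, together with its inverse. It is a heuristic illustration with hypothetical function names, not a formal stationarity test such as the augmented Dickey-Fuller test.

```python
def segment_stats(series, n_segments):
    """Mean and variance of consecutive, equal-length segments (hypothetical helper)."""
    size = len(series) // n_segments
    stats = []
    for k in range(n_segments):
        seg = series[k * size:(k + 1) * size]
        mean = sum(seg) / size
        var = sum((v - mean) ** 2 for v in seg) / size
        stats.append((mean, var))
    return stats

def difference(series, lag=1):
    """Differencing: d[t] = y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def undifference(first_values, diffs, lag=1):
    """Invert difference() given the first `lag` original values."""
    out = list(first_values)
    for d in diffs:
        out.append(out[-lag] + d)   # y[t] = y[t - lag] + d[t]
    return out

# A series with a linear trend: segment means drift, the differenced series is constant.
y = [2 * t for t in range(12)]
print(segment_stats(y, 3))
print(difference(y))
```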
Network traffic prediction is important in network management, in particular for:
- detecting network congestion;
- data flow control.
In these cases, the choice of the model for predicting network traffic matters. Practice shows that modern network traffic is non-stationary. A functioning telecommunication network must be ready to serve a huge number of devices in accordance with the concepts of the Internet of Things (IoT) and the Industrial Internet of Things (IIoT), which involve data exchange among sensors, actuators, controllers and human-machine interfaces.
When forecasting a time series, one assumes that its past values contain information about its future behavior. The more strongly the future depends on the past, the better the forecast that can be built; this is the key premise. The most common methods today are time extrapolation methods, including autoregressive forecasting models, exponential smoothing models and neural network forecasting methods.
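To make the autoregressive idea concrete, the sketch below fits the simplest autoregressive model, an AR(1) with intercept, y(t) = c + φ·y(t−1), by ordinary least squares and produces an iterated multi-step forecast. It is a minimal illustration under that AR(1) assumption with hypothetical function names; the study itself uses a NARX network, not this model.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    x = series[:-1]   # lagged values y[t-1]
    y = series[1:]    # current values y[t]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    return my - phi * mx, phi

def forecast_ar1(series, c, phi, steps):
    """Iterated forecast: each prediction is fed back as the next lagged value."""
    preds, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        preds.append(last)
    return preds

# Noise-free data generated by y[t] = 2 + 0.5 * y[t-1], starting from 10.
series = [10.0]
for _ in range(5):
    series.append(2.0 + 0.5 * series[-1])
c, phi = fit_ar1(series)
print(c, phi, forecast_ar1(series, c, phi, 3))
```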
In the present work, the forecast is made with an artificial neural network (ANN), owing to the universal approximation and prediction capabilities of such networks.
In 1943, W. McCulloch and W. Pitts introduced the concept of an ANN and proposed a formal model of an artificial neuron.
In 1949, D. Hebb described the basic principles of network learning and proposed the first rule for training neurons [19]. Today, ANNs are used as models and tools for approximating multidimensional functions, predicting and diagnosing processes, associative search, finding patterns in data arrays, adaptive control, statistical analysis, and pattern identification and recognition.

2. Description of the architecture and modeling of the neural network
ANNs are mathematical models (with software or hardware implementations) built on the principles of organization and functioning of biological neural networks, i.e. networks of nerve cells of a living organism.
An ANN is a set of interconnected artificial neurons. This work uses one of the two main types of ANN architecture, a network with forward signal propagation, namely the NARX nonlinear autoregressive network, in which nonlinear terms are introduced into the autoregressive difference equation so that dynamic nonlinear systems can be implemented with neural networks. A network of this type is multilayer, with forward signal transmission and output feedback, and its input signal is passed through a vector of time delays.
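The vectors of time delays (tapped delay lines) described above can be illustrated by building the NARX training patterns explicitly: each pattern contains the last n_y outputs and the last n_x exogenous inputs, and the target is the current output. The delay orders and the function name narx_regressors are hypothetical, since the text does not state the delays used.

```python
def narx_regressors(y, x, ny, nx):
    """Build NARX training pairs from output series y and exogenous series x.

    Each row is [y(t-1), ..., y(t-ny), x(t-1), ..., x(t-nx)]; the target is y(t).
    Hypothetical helper for illustration.
    """
    start = max(ny, nx)          # first t for which all delayed values exist
    rows, targets = [], []
    for t in range(start, len(y)):
        past_y = [y[t - k] for k in range(1, ny + 1)]
        past_x = [x[t - k] for k in range(1, nx + 1)]
        rows.append(past_y + past_x)
        targets.append(y[t])
    return rows, targets

rows, targets = narx_regressors([10, 11, 12, 13], [0, 1, 2, 3], ny=2, nx=1)
print(rows, targets)
```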
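The work uses the parallel (closed-loop) version of the architecture (Fig. 3), in which the network's own past outputs are fed back to its input.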
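The difference between the series-parallel (open-loop) form, in which measured past outputs are fed to the network, and the parallel (closed-loop) form, in which the network's own predictions are fed back, can be shown with a toy one-step model. In MATLAB terms, as we understand the toolbox, a NARX network is typically trained in open-loop form and then converted with closeloop. The sketch below uses a hypothetical, slightly wrong linear model in place of a trained network; it only illustrates how closed-loop errors can accumulate.

```python
def simulate(step, y_true, x, ny, closed_loop):
    """Run a one-step model step(past_y, past_x) over a sequence.

    Open loop (series-parallel): past outputs are the measured values y_true.
    Closed loop (parallel): past outputs are the model's own earlier predictions.
    The first ny measured values serve as initial conditions in both modes.
    """
    y_hat = list(y_true[:ny])
    for t in range(ny, len(y_true)):
        source = y_hat if closed_loop else y_true
        past_y = [source[t - k] for k in range(1, ny + 1)]
        y_hat.append(step(past_y, [x[t - 1]]))
    return y_hat[ny:]

# Hypothetical "true" system y(t) = 0.8*y(t-1) + x(t-1) with zero input.
x = [0.0] * 4
y_true = [1.0]
for t in range(1, 4):
    y_true.append(0.8 * y_true[-1] + x[t - 1])

def model(past_y, past_x):
    # Hypothetical imperfect model: coefficient 0.9 instead of 0.8.
    return 0.9 * past_y[0] + past_x[0]

open_loop = simulate(model, y_true, x, ny=1, closed_loop=False)
closed_loop = simulate(model, y_true, x, ny=1, closed_loop=True)
print(open_loop)    # one-step-ahead predictions
print(closed_loop)  # free-run predictions; the error grows with the horizon
```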
The most common area of application of NARX networks is forecasting systems, that is, predicting the values of the output signal of the system under study based on the results of previous measurements.
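One of the simplest dynamic neurons is the Hopfield neuron. The NARX nonlinear autoregressive network uses a modified Hopfield neuron whose state is determined by a longer history of past values. The dynamics of the NARX model can be described as

$$y(t) = F\big(y(t-1), \dots, y(t-n_y),\; x(t-1), \dots, x(t-n_x)\big), \qquad y(t-k) = q^{-k}\,y(t),$$

where F is some nonlinear function of its arguments that is approximated during training; x(t) and y(t) are the exogenous input and the output; n_x and n_y are the input and output delay orders; and q^{-1} is the unit-delay operator.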
The NARX network is a multilayer feedforward network with output feedback, in which the output is passed back to the input through a time-delay vector. A sigmoid function is used as the activation function.
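A single NARX prediction step with one hidden layer and one output neuron can be sketched as below. The hyperbolic tangent is assumed as the sigmoid activation (the text only says "sigmoid"), the linear output neuron is also an assumption, and the random weights merely stand in for trained ones; the hidden layer size of 52 follows the network described in Section 3.

```python
import math
import random

def narx_forward(regressor, W1, b1, w2, b2):
    """One-step NARX output: tanh hidden layer followed by a linear output neuron.

    regressor: [y(t-1), ..., y(t-ny), x(t-1), ..., x(t-nx)]
    W1, b1: hidden-layer weights (one row per neuron) and biases
    w2, b2: output-neuron weights and bias
    """
    hidden = [math.tanh(sum(w * r for w, r in zip(row, regressor)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

# Hypothetical sizes: 4 delayed values in, 52 hidden neurons, 1 output.
rng = random.Random(0)
n_in, n_hidden = 4, 52
W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
print(narx_forward([0.3, 0.1, 1.0, 0.0], W1, b1, w2, 0.0))
```

In closed-loop operation, the value returned here would be shifted into the output delay line and used as y(t-1) at the next step.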
Each network layer transforms its input feature space into another space, possibly of a different dimension. This nonlinear transformation continues until the classes become linearly separable by the neurons of the output layer. The hidden layers, i.e. all layers except the input and output layers, give the network the ability to model nonlinear phenomena.
Fig. 3. NARX network architecture

3. Building a forecast of future values of a time series
The neural network was trained with a nonlinear optimization method, the Levenberg-Marquardt algorithm, which minimizes the sum of squared errors; the mean squared error (MSE) is used to evaluate the performance of the neural network.
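The Levenberg-Marquardt update can be illustrated on a small curve-fitting problem rather than on the network itself: at each step it solves (JᵀJ + μI)δ = −Jᵀr, decreasing the damping μ after a successful step and increasing it otherwise. The sketch is restricted to two parameters so the linear solve can be written out by hand. The parameter names mu, mu_dec, mu_inc and mu_max and their values are modeled on the training parameters of MATLAB's trainlm as we understand them, and the exponential test model is purely hypothetical. For a neural network, J would be the Jacobian of the errors with respect to all weights and biases.

```python
import math

def levenberg_marquardt(residuals, jacobian, params, mu=1e-3, mu_dec=0.1,
                        mu_inc=10.0, mu_max=1e10, max_iter=200, tol=1e-12):
    """Minimize sum(r_i(p)**2) over a 2-parameter vector p with LM steps."""
    p = list(params)
    sse = sum(r * r for r in residuals(p))
    for _ in range(max_iter):
        r, J = residuals(p), jacobian(p)
        # Gauss-Newton terms: A = J^T J (2x2, symmetric) and g = J^T r.
        a11 = sum(row[0] * row[0] for row in J)
        a12 = sum(row[0] * row[1] for row in J)
        a22 = sum(row[1] * row[1] for row in J)
        g1 = sum(row[0] * ri for row, ri in zip(J, r))
        g2 = sum(row[1] * ri for row, ri in zip(J, r))
        while True:
            # Solve (A + mu*I) delta = -g with Cramer's rule.
            m11, m22 = a11 + mu, a22 + mu
            det = m11 * m22 - a12 * a12
            d1 = (-g1 * m22 + g2 * a12) / det
            d2 = (-g2 * m11 + g1 * a12) / det
            trial = [p[0] + d1, p[1] + d2]
            trial_sse = sum(v * v for v in residuals(trial))
            if trial_sse < sse:          # success: accept, move toward Gauss-Newton
                p, sse, mu = trial, trial_sse, mu * mu_dec
                break
            mu *= mu_inc                 # failure: damp more, toward gradient descent
            if mu > mu_max:
                return p, sse
        if abs(d1) + abs(d2) < tol:
            break
    return p, sse

# Hypothetical noise-free data from y = 2 * exp(0.3 * t).
ts = [0.5 * i for i in range(10)]
ys = [2.0 * math.exp(0.3 * t) for t in ts]

def residuals(p):
    return [p[0] * math.exp(p[1] * t) - y for t, y in zip(ts, ys)]

def jacobian(p):
    # d r_i / d a = exp(b t_i),  d r_i / d b = a t_i exp(b t_i)
    return [[math.exp(p[1] * t), p[0] * t * math.exp(p[1] * t)] for t in ts]

(a, b), sse = levenberg_marquardt(residuals, jacobian, [1.0, 0.1])
print(a, b, sse)
```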
The input (source data) and target vectors are randomly divided into three sets:
- 70 % of the vectors are used for training;
- 15 % are used for validation, to check the reliability of the results and avoid overtraining;
- 15 % are used for independent testing of the network.
Fig. 4 shows the nntraintool window, which provides various plots describing the training process and the training quality of the NARX network. The top of the nntraintool window shows the model of the nonlinear autoregressive neural network with exogenous inputs. In this case, the forecast is based on previous values of the studied data and on exogenous input signals. The network has one hidden layer with 52 neurons and one output layer with one neuron.
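The random 70/15/15 division of the samples can be sketched as follows. It resembles what MATLAB's default dividerand division does, as we understand it, but the function below is a hypothetical stand-in with an assumed fixed seed for reproducibility.

```python
import random

def divide_random(n, train=0.70, val=0.15, seed=0):
    """Randomly split sample indices 0..n-1 into training, validation and test sets.

    Hypothetical helper; the test set receives whatever remains after rounding.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(train * n)
    n_val = round(val * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

tr, va, te = divide_random(100)
print(len(tr), len(va), len(te))
```

Random division mixes time steps across the three subsets; dividing the series into contiguous blocks is an alternative when strict separation of past and future samples is required.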

Fig. 4. NARX network learning outcomes
This learning works by unfolding the recurrent neural network in time (backpropagation through time): the network is represented as a multilayer perceptron with a large number of layers, each layer corresponding to a past time step.
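Unfolding in time can be illustrated on the smallest possible recurrent model, a single tanh unit h(t) = tanh(w·h(t−1) + u·x(t)) with squared-error loss. The sketch unrolls the recurrence forward, then walks the unrolled layers backward to accumulate the gradients of the shared weights w and u. The model, data and function name are hypothetical and much simpler than the NARX network; the usage lines check the result against a finite-difference estimate.

```python
import math

def bptt_gradients(w, u, xs, ds, h0=0.0):
    """Loss and gradients of L = 0.5*sum((h_t - d_t)**2), h_t = tanh(w*h_{t-1} + u*x_t)."""
    hs = [h0]
    for x in xs:                         # forward pass: unroll the recurrence in time
        hs.append(math.tanh(w * hs[-1] + u * x))
    loss = 0.5 * sum((h - d) ** 2 for h, d in zip(hs[1:], ds))
    dw = du = 0.0
    g_h = 0.0                            # dL/dh_t arriving from later time steps
    for t in range(len(xs), 0, -1):      # backward pass over the unrolled "layers"
        g_h += hs[t] - ds[t - 1]         # local error at step t
        g_a = g_h * (1.0 - hs[t] ** 2)   # through tanh
        dw += g_a * hs[t - 1]            # shared weight w is reused at every step
        du += g_a * xs[t - 1]
        g_h = g_a * w                    # pass gradient back to h_{t-1}
    return loss, dw, du

xs = [0.5, -1.0, 0.3, 0.8]
ds = [0.1, -0.2, 0.0, 0.4]
loss, dw, du = bptt_gradients(0.7, -0.4, xs, ds)
eps = 1e-6
num_dw = (bptt_gradients(0.7 + eps, -0.4, xs, ds)[0]
          - bptt_gradients(0.7 - eps, -0.4, xs, ds)[0]) / (2 * eps)
print(dw, num_dw)  # analytic and numerical gradients should agree closely
```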
Training of the studied network was completed in 18 epochs.
Clicking the Performance button produces a plot of the mean squared error (MSE) versus the number of epochs (Fig. 5). Towards the end of the learning process, the validation error starts to rise. The green circle in the plot marks the epoch with the best validation performance: training stopped at epoch 18, after the validation error had failed to improve for six consecutive epochs following epoch 12. The validation and test errors, shown by the green and red lines, behave similarly. Over most of training, the MSE decreases with the number of epochs for the training, validation and test data, with only small changes in slope. The best validation performance is an MSE of 572.6426, reached at epoch 12.
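The stopping rule described above (keep the weights from the epoch with the lowest validation error and stop once the validation error has failed to improve for a fixed number of consecutive epochs) can be sketched as follows. The limit of six failures is an assumption that matches MATLAB's default max_fail as we understand it and is consistent with the best epoch 12 and final epoch 18 reported here; the validation curve in the usage lines is hypothetical, not the measured one.

```python
def early_stopping(val_errors, max_fail=6):
    """Return (best_epoch, stop_epoch) for validation errors indexed by epoch (0 = initial)."""
    best_epoch, best, fails = 0, val_errors[0], 0
    for epoch in range(1, len(val_errors)):
        if val_errors[epoch] < best:
            best_epoch, best, fails = epoch, val_errors[epoch], 0   # new best: reset counter
        else:
            fails += 1
            if fails >= max_fail:
                return best_epoch, epoch
    return best_epoch, len(val_errors) - 1

# Hypothetical validation curve: improves until epoch 12, then worsens.
val = [1000 - 30 * e for e in range(13)] + [700 + 5 * k for k in range(1, 9)]
print(early_stopping(val))
```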
Launching and training the ANN produced the following informative plots:
- train: the training error. Each of the 18 epochs (cycles) combines a forward activation pass with backpropagation, during which the ANN weights are adjusted to reduce the total mean squared error; the training MSE keeps decreasing as the number of epochs grows, and training lasted 18 epochs;
- validation: after epoch 12 the validation error no longer improves, so the best performance is taken from the epoch with the smallest validation error (the circled point);
- test: the error on the independent test subset;
- best: the best performance.
Fig. 6 shows the ANN training state plots, which display further parameters of the training progress. At epoch 18, the gradient equals 447.8859. The network stops training because its generalization no longer improves.
A histogram of the output errors of the nonlinear autoregressive network with exogenous inputs (NARX) is shown in Fig. 7.
For a detailed analysis of the approximation quality, a regression analysis of the network outputs was performed: a linear regression of the network outputs on the targets was constructed for each of the three subsets and for the entire dataset, and for each result the correlation coefficient R was calculated and plotted (Fig. 8). When approximating dependencies, the quality of the network is determined by the differences between the known values of the function and the values approximated by the network. Fig. 9 shows the approximation functions in the MATLAB Neural Network Toolbox.
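Each regression plot summarizes a least-squares line output ≈ m·target + b together with the correlation coefficient R. A minimal computation of these quantities is sketched below; the function name and the sample values are hypothetical.

```python
import math

def regression_r(targets, outputs):
    """Fit outputs ≈ m*targets + b and return (m, b, R), R being Pearson's correlation."""
    n = len(targets)
    mt, mo = sum(targets) / n, sum(outputs) / n
    stt = sum((t - mt) ** 2 for t in targets)
    soo = sum((o - mo) ** 2 for o in outputs)
    sto = sum((t - mt) * (o - mo) for t, o in zip(targets, outputs))
    m = sto / stt
    return m, mo - m * mt, sto / math.sqrt(stt * soo)

# Hypothetical targets (measured packet counts) and network outputs.
print(regression_r([120, 135, 150, 160], [118, 137, 149, 163]))
```

A perfect fit gives m = 1, b = 0 and R = 1; R alone can be close to 1 even when the slope or offset is biased, which is why the plots show the fitted line as well.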
Fig. 10 shows the error autocorrelation function.
Fig. 11 shows the input-error cross-correlation function, which illustrates how the errors correlate with the input sequence x(t). All correlations lie within the confidence limits around zero.
In [16], after several experiments with three architectures, the authors found that the NARX model gives the best result with a data split of 189:31:94 samples (60 %:10 %:30 %), for which the training performance value was < 0.90. The overall performance of the NARX model is above 0.90, with values between 0.86 and 0.90. In other words, the NARX architecture can be used as a model for predicting time series datasets.
In addition, the MSE and r values of the NARX model in [16] are 0.006717 and 0.90764, respectively. Thus, the NARX model with the 189:31:94 (60 %:10 %:30 %) configuration can be used as an alternative model for predicting daily network traffic, because the MSE obtained is very low and r is high.
In Figs. 9 and 10, the autocorrelation function has only one non-zero value, at zero lag, where it equals the mean squared error. This indicates that the forecast model is adequate: the forecast errors are uncorrelated with each other, i.e. they behave like white noise. All other values of the autocorrelation function lie within the approximately 95 % confidence interval around zero. Otherwise, the network would have to be retrained to improve the forecast.
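The error autocorrelation check can be sketched as follows: compute the autocorrelation of the forecast errors, normalize it by its zero-lag value (the MSE, since the mean is not subtracted here) and compare the remaining lags with the approximate 95 % white-noise bound ±1.96/√N. This large-sample bound is a standard approximation and an assumption on our part; MATLAB's error autocorrelation plot may compute its confidence limits differently. The function name and the error sequence are hypothetical.

```python
import math

def error_autocorrelation(errors, max_lag):
    """Autocorrelation c_k = (1/N) * sum_t e_t * e_{t+k} for k = 0..max_lag.

    The mean is not subtracted, so c_0 equals the mean squared error.
    Returns (c, r, bound): raw values, values normalized by c_0, and the
    approximate 95 % white-noise bound 1.96/sqrt(N) for the normalized values.
    """
    n = len(errors)
    c = [sum(errors[t] * errors[t + k] for t in range(n - k)) / n
         for k in range(max_lag + 1)]
    r = [ck / c[0] for ck in c]
    return c, r, 1.96 / math.sqrt(n)

# Hypothetical, strongly alternating errors: clearly not white noise.
errors = [1.0, -1.0] * 50
c, r, bound = error_autocorrelation(errors, max_lag=2)
suspicious = [k for k in range(1, len(r)) if abs(r[k]) > bound]
print(r, bound, suspicious)   # lags outside the bound suggest retraining
```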
The cross-correlation plot shows how the errors correlate with the input sequence. For an ideal forecasting model, all correlations would be zero; in this case, all correlations lie within the confidence limits around zero.
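Similarly, the input-error cross-correlation can be computed for positive and negative lags and normalized so that perfectly correlated sequences give 1 at zero lag. The normalization, the function name and the sample sequences are assumptions for illustration; a well-fitted model should keep all values within the same approximate bound ±1.96/√N.

```python
import math

def input_error_xcorr(x, e, max_lag):
    """Normalized cross-correlation between input x and errors e.

    Value at lag k is (1/N) * sum_t x[t] * e[t+k] over valid t, divided by
    sqrt(mean(x**2) * mean(e**2)); lags run from -max_lag to max_lag.
    """
    n = len(x)
    scale = math.sqrt((sum(v * v for v in x) / n) * (sum(v * v for v in e) / n))
    result = {}
    for k in range(-max_lag, max_lag + 1):
        total = sum(x[t] * e[t + k] for t in range(n) if 0 <= t + k < n)
        result[k] = total / n / scale
    return result

# Hypothetical input and error sequences of equal length.
x = [0.2, -0.1, 0.4, 0.0, -0.3, 0.1]
e = [0.05, -0.02, 0.01, 0.03, -0.04, 0.00]
bound = 1.96 / math.sqrt(len(x))
xc = input_error_xcorr(x, e, max_lag=2)
print({k: round(v, 3) for k, v in xc.items()}, bound)
```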
The prediction was assessed using the error autocorrelation, the input-error cross-correlation and the time-series response. For an accurate forecast model, the error autocorrelation function should have only one non-zero value, at zero lag, which means that the forecast errors are completely uncorrelated with each other [16]. The research results show that the NARX network with exogenous inputs provides satisfactory forecast quality, but this model is relevant only for long-term network forecasting.

Conclusions
1. The choice of a neural network forecasting model is primarily due to the fact that the measured traffic has an uneven intensity, so the series is non-stationary. Forecasting such series with traditional methods requires first making the series stationary by differencing.
2. The NARX model is a dynamic neural architecture commonly used to model nonlinear dynamic systems. When applied to time series forecasting, the NARX network is designed as a feedforward time-delay neural network.
3. The mean squared error (MSE), i.e. the average squared difference between the network outputs and the target values, reached its best validation value of 572.6429 at epoch 12. The regression values R measure the correlation between the network outputs and the targets. The errors are small, indicating close agreement between the values predicted by the NARX network and the experimentally measured values. The NARX predictions show that the packet intensity is modeled accurately by the artificial neural network, which can therefore serve as a useful tool for predicting packet intensity.

Fig. 2. A fragment of program code for predicting the source series

Fig. 5. Dependence of the mean squared error on the number of epochs