DEVELOPMENT OF A SYSTEM FOR THE DETECTION OF CYBER ATTACKS BASED ON THE CLUSTERING AND FORMATION OF REFERENCE DEVIATIONS OF ATTRIBUTES

V. Lakhno

An adaptive system for cyber attack detection, based on improved algorithms for splitting the feature space into clusters, was developed. The recognition procedure was improved by combining clustering with the simultaneous formation of verifying admissible deviations for the attributes of anomalies and cyber attacks. In contrast to the existing ones, the proposed modifications of the algorithm for splitting the feature space into clusters make it possible to form the reference tolerances simultaneously when processing complex attributes of recognition objects (RO). As a result, at every step of training an adaptive system of recognition (ASR), the verifying admissible deviations can be changed for all attributes of anomalies and cyber attacks at once. The proposed algorithms prevent the absorption of one RO class of basic attributes of anomalies and cyber attacks by another class. Predicate expressions for an ASR capable of self-learning were obtained. The proposed algorithms were verified on simulation models in MatLab and Simulink. The simulation showed that the proposed algorithms for the clustering of RO attributes make it possible to obtain effective learning matrices for ASR as a part of intelligent systems for cyber attack detection.


Introduction
The active expansion of computer technologies, in particular in critically important information systems (CIIS), is accompanied by the emergence of new threats to cyber security (CS). One way to enhance the CS of CIIS is to use intelligent systems (and technologies) for the detection of cyber attacks (ISDA). Given the constant complication of cyber attack scenarios, ISDA must have the characteristics of adaptive systems, that is, the ability to deliberately modify the algorithm for detecting anomalies and cyber attacks by using methods for clustering the attributes of recognition objects (RO), as well as machine intelligent technologies of learning (MITL). This makes it relevant to improve the existing algorithms, and to develop new ones, for the clustering of RO attributes, as well as the applied adaptive subsystems as a part of ISDA.

Literature review and problem statement
Information that is accepted as the basis for building the clusters in adaptive systems of recognition (ASR) of cyber attacks was explored in many studies, for example, in the form of complex attributes of RO in CIIS [1,2]. These studies were mainly theoretical. As indicators or metrics [3] for building the classifiers, the authors investigated threshold values of parameters of the input and output traffic [4], unpredicted addresses of packets [5], attributes of requests to databases (DB) [6,7], etc. These articles do not take into account the possibility of parallel formation of reference deviations for the features of anomalies and cyber attacks, which increases the time of RO analysis in ASR (or ISDA) [8]. For complex targeted attacks, information attributes may be quite fuzzy [9,10], which hinders building effective recognition algorithms.
In papers [11,12], it was assumed that to enhance effectiveness of recognition, it is expedient to split the set of values of each indicator into disjoint groups by certain rules. This task can be solved by using the methods and models for cluster analysis [13,14]. However, these studies have not been brought to hardware or software implementation.
By using an information condition of functional effectiveness (ICFE) of ASR learning [15,16], it is possible to implement adaptive algorithms for the clustering of RO attributes in ISDA.
As was shown in articles [17,18], in case the RO attributes glossary is unchanged, it is possible to improve effectiveness of ASR learning. These studies do not take into account the possibility of increasing the degree of intersection of the RO classes.
Thus, given the potential of ISDA application, an important task is to improve the algorithms for clustering and formation of reference deviations of the RO attributes for the timely detection of anomalies and cyber attacks in CIIS.

The aim and tasks of research
The aim of the present research is to develop an algorithm for the partition of the feature space (FS) into clusters in the process of recognition of cyber attacks in the systems of cyber protection.
To achieve the aim of the study, the following tasks are to be solved:
- to improve algorithms for the clustering of attributes of anomalies and cyber attacks and for the simultaneous formation of verifying admissible deviations in the intelligent systems of cyber attack detection;
- to conduct simulation in order to test and verify the adequacy of the proposed algorithms.

Algorithms for the clustering of attributes and the formation of verifying admissible deviations in the intelligent systems of cyber attack detection
Splitting FS and further clustering, for any RO class $CT_m^0$, in accordance with [19,20], was carried out by transforming FS to a hyper-spherical form. Since the main stage of clustering when splitting FS into groups is an increase in the radius of the container (RC), $cr_m$, at every step of ASR (or ISDA) learning, a recurrent expression for $cr_m$ can be used.
In the process of ASR learning, we make an assumption about fuzzy compactness of the implementations of binary learning matrices (BLM) [16,21,22], obtained at the stage of splitting FS into relevant RO classes. The fuzzy partition $RC^{|M|}$ includes the elements that can be attributed to fuzzy RO classes, for example, when it is difficult to distinguish a DoS attack from a DDoS attack [4,16].
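The growth of a hyper-spherical container over binary implementations can be illustrated with a short Python sketch. It is not the authors' recurrent expression, which did not survive in this text; it assumes that the distance is the Hamming (code) distance and that the radius grows by a fixed `step` per learning step. The names `hamming`, `grow_container`, and the sample vectors are hypothetical.

```python
from typing import List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Code distance: number of positions where two binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(x != y for x, y in zip(a, b))


def grow_container(center: Sequence[int],
                   vectors: List[Sequence[int]],
                   max_radius: int,
                   step: int = 1) -> List[Tuple[int, List[int]]]:
    """Record which vectors fall inside the container as its radius grows.

    Assumed recurrence (illustrative only): cr(l) = cr(l - 1) + step.
    """
    history = []
    radius = 0
    while radius <= max_radius:
        members = [i for i, v in enumerate(vectors)
                   if hamming(center, v) <= radius]
        history.append((radius, members))
        radius += step
    return history


# Hypothetical reference vector and implementations of one RO class.
center = [1, 1, 0, 0, 1]
vectors = [
    [1, 1, 0, 0, 1],  # distance 0 from the center
    [1, 0, 0, 0, 1],  # distance 1
    [0, 0, 1, 0, 1],  # distance 3
]
history = grow_container(center, vectors, max_radius=3)
for radius, members in history:
    print(radius, members)
```

Each printed row lists the indices of the implementations covered at that radius, so one can see at which learning step a given vector is absorbed by the container.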
The rules of ASR learning, according to [1,2,14,23,24], are built on an iterative search for the maximum boundary value of the information condition of functional effectiveness (ICFE), expression (2), where $CE_m$ is the ICFE of ASR learning to recognize RO that belong to class $C_m^0$; $IS_k$ is the permissible range of values of the k-th informative attribute of RO; $IS_{CE}$ is the permissible range of ICFE in the course of ASR learning.
The following constraints are imposed on expression (2): the container radius $cr$ is bounded in terms of the code distance $ct_a \oplus ct_b$ between reference vectors, taken among all RO classes; RO are described by binary learning matrices (BLM) [21,22,23]. We accepted that $ct_a$ and $ct_b$ are the reference vectors of RO classes, obtained, in particular, from the KDD Cup 1999 Data [2,5,7].
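The relation between container radii and the distance between reference vectors can be sketched as follows. The exact form of the constraint is not recoverable from this text, so the sketch assumes a common reading: each radius must be strictly smaller than the code distance (bitwise XOR count) between the two reference vectors. The majority-vote construction of a reference vector and all names (`reference_vector`, `code_distance`, `radii_admissible`) are likewise assumptions made for illustration.

```python
from typing import List, Sequence


def reference_vector(blm: List[Sequence[int]]) -> List[int]:
    """Majority vote over the columns of a binary learning matrix.

    Rows are implementations, columns are attributes. Ties resolve to 1
    (an arbitrary choice made for this sketch).
    """
    n = len(blm)
    return [1 if 2 * sum(col) >= n else 0 for col in zip(*blm)]


def code_distance(ct_a: Sequence[int], ct_b: Sequence[int]) -> int:
    """Number of ones in ct_a XOR ct_b."""
    return sum(x ^ y for x, y in zip(ct_a, ct_b))


def radii_admissible(cr_a: int, cr_b: int,
                     ct_a: Sequence[int], ct_b: Sequence[int]) -> bool:
    """Assumed constraint: both radii are below the inter-center distance."""
    d = code_distance(ct_a, ct_b)
    return cr_a < d and cr_b < d


# Hypothetical binary learning matrices of two RO classes.
blm_a = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0]]
blm_b = [[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 0]]

ct_a = reference_vector(blm_a)
ct_b = reference_vector(blm_b)
print(ct_a, ct_b, code_distance(ct_a, ct_b))
print(radii_admissible(2, 3, ct_a, ct_b))
print(radii_admissible(4, 1, ct_a, ct_b))
```

A check of this kind is one way to keep the container of one class from swallowing the center of its neighbour.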
The ASR learning procedure is given in the form of a predicate expression, where $cr'_a$ and $cr'_b$ are the optimal radii of containers $C_a^0$ and $C_b^0$, respectively.
To reduce the number of cycles during a learning procedure, the sets of input signals (factors) that influence ASR were determined. These sets correlate with the dimensionality of the vector of ASR testing parameters $is = \langle is_1, \ldots, is_k, \ldots, is_{RS} \rangle$ in the course of recognition of attack templates.
ASR (or ISDA) learning is an iterative procedure of searching for the global maximum of ICFE [2,5,8,20,24] over the container parameters $cr$ and $ct$ of the classes $CT$, where $IS_{CE}$ is the permissible range of the ICFE indicator $CE$ and $IS_{cr}$ is the permissible range of the RC magnitude $cr$. The algorithm of RO classification is functional under restrictions expressed in terms of $\xi$.
For better visualization, the stages of splitting the FS of RO into clusters in ASR are presented in Table 1.
As a criterion for the optimization of parameters during ASR learning, we used statistical information measures for two-alternative decisions [18,25,26], based on a modified entropy indicator, as well as the Kullback-Leibler divergence (for three hypotheses) [27].
Table 1. Stages of splitting FS into clusters (columns: Stage, Action, Description)
The step counter (SC) of changing the verifying admissible deviations (VAD) $ca_i$ by RO features is set to zero: $l := 0$.
We developed an algorithm that performs parallel formation of reference tolerances during the analysis of attributes of anomalies and cyber attacks that are difficult to explain [1,7,16,18]. With parallel formation of the VAD $\{ca_{K,i}\}$, the VAD can be changed for all attributes simultaneously at every step of learning. In the course of learning, the algorithm also updates the optimal parameters of containers for the recognition classes $CT_m^0$. The stages of splitting the FS of RO into clusters are presented in Table 2.
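Parallel formation of VAD can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the text: the tolerance is a symmetric band of `delta_pct` percent around the per-attribute mean of a base class, an attribute is coded as 1 when it falls inside the band, and the quality of a given tolerance is scored by a simple stand-in (share of ones for the base class minus share of ones for the other class) rather than by the ICFE. All function names and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def column_means(samples: List[Sequence[float]]) -> List[float]:
    """Per-attribute mean over the implementations of a class."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def binarize(samples: List[Sequence[float]],
             means: Sequence[float],
             delta_pct: float) -> List[List[int]]:
    """Build a binary learning matrix with ONE tolerance for ALL attributes."""
    return [[1 if abs(x - mu) <= abs(mu) * delta_pct / 100.0 else 0
             for x, mu in zip(row, means)]
            for row in samples]


def share_of_ones(blm: List[List[int]]) -> float:
    total = sum(len(row) for row in blm)
    return sum(sum(row) for row in blm) / total


def parallel_vad_search(base: List[Sequence[float]],
                        other: List[Sequence[float]],
                        deltas: Sequence[float]) -> Tuple[float, float]:
    """Change the tolerance for all attributes at once and keep the best step."""
    means = column_means(base)
    best_delta, best_score = deltas[0], float("-inf")
    for delta in deltas:
        score = (share_of_ones(binarize(base, means, delta))
                 - share_of_ones(binarize(other, means, delta)))
        if score > best_score:  # stand-in criterion, not the ICFE
            best_delta, best_score = delta, score
    return best_delta, best_score


# Hypothetical measurements: rows are implementations, columns are attributes.
base = [[100.0, 50.0], [110.0, 55.0], [90.0, 45.0]]
other = [[130.0, 70.0], [125.0, 65.0]]

best = parallel_vad_search(base, other, deltas=[5, 10, 30])
print(best)
```

Because one value of the tolerance is applied to every attribute at each step, the search is a single loop over tolerance values instead of a nested loop per attribute, which is the practical appeal of the parallel scheme.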

Final stage (table continuation): adding results to a knowledge base (KB). End of algorithm operation.
Input data for ASR are an array of learning samples, obtained based on data from Tables 1, 2, as well as the results of [10,16], where kl is the number of the learning matrix for the RO class; implementation is the number of the implementation in BLM [10,16]; j is the number of the recognition attribute for RO. To assess ASR effectiveness and the optimality of the defined VAD for RO classes of ISDA, the Pareto method was used.
Stages 6.1-6.5:
6.1 The counter of recognition classes is set to zero: m := 0
6.2 The counter is incremented: m := m + 1
6.3 The counter of RC change steps is set to zero: cr := 0
6.4 The counter is incremented: cr := cr + 1
6.5 Calculation of the current ICFE (cf. stage 6.4, Table 1)
The values of optimal RC cr, taking into consideration additional hypotheses for the examined simulation models of ASR learning, are given in Table 3.
Table 3. Values of optimal RC cr for the examined simulation models of ASR learning
As was shown by data analysis, for the simulation models (Fig. 1-3) the quasi-optimal value of the VAD parameter $ca_{n,i}$ is 8-16 % at the maximum value $CE_{max} = 6.16$.
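The counter logic of stages 6.1 to 6.5 can be sketched as a nested loop in Python. This is an illustration under stated assumptions, not the authors' implementation: the ICFE formula is not given in this text, so `criterion` is a stand-in (share of own implementations inside the container minus share of foreign ones), and the class names, centers, and `max_radius` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


def criterion(center, own, foreign, radius) -> float:
    """Stand-in for the ICFE: own coverage minus foreign coverage."""
    d1 = sum(hamming(center, v) <= radius for v in own) / len(own)
    beta = sum(hamming(center, v) <= radius for v in foreign) / len(foreign)
    return d1 - beta


def learn_radii(classes: Dict[str, List[Sequence[int]]],
                centers: Dict[str, Sequence[int]],
                max_radius: int) -> Dict[str, Tuple[int, float]]:
    names = list(classes)
    best: Dict[str, Tuple[int, float]] = {}
    m = 0                                   # 6.1: class counter := 0
    while m < len(names):
        m += 1                              # 6.2: m := m + 1
        name = names[m - 1]
        foreign = [v for other in names if other != name
                   for v in classes[other]]
        cr = 0                              # 6.3: radius step counter := 0
        best_cr, best_ce = 0, float("-inf")
        while cr < max_radius:
            cr += 1                         # 6.4: cr := cr + 1
            ce = criterion(centers[name], classes[name], foreign, cr)  # 6.5
            if ce > best_ce:                # keep the smallest best radius
                best_cr, best_ce = cr, ce
        best[name] = (best_cr, best_ce)
    return best


# Hypothetical binary learning matrices for two RO classes.
classes = {
    "normal": [[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 1, 1]],
    "dos":    [[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 0], [1, 1, 1, 1, 0, 0]],
}
centers = {"normal": [0, 0, 0, 0, 0, 1], "dos": [1, 1, 1, 1, 1, 0]}

result = learn_radii(classes, centers, max_radius=5)
print(result)
```

With the strict comparison in the inner loop, ties are resolved in favour of the smaller radius; a different tie rule would be an equally valid choice for a sketch of this kind.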
Thus, the simulation experiment showed that the proposed algorithms for the clustering of RO attributes make it possible to obtain efficient learning matrices for ASR as a part of ISDA.

Discussion of results of testing the algorithms and prospects of further research
The scientific and practical results of the research were implemented as software applications in ASR and adaptive expert systems (AES) of cyber protection deployed at the state enterprise "Design and engineering technological bureau of automation of control systems on railway transport of Ukraine" of the Ministry of Infrastructure of Ukraine, as well as in the information security services of computing centers at industrial and transportation enterprises in the cities of Kyiv, Dnipro and Chernihiv.
The proposed algorithms differ from the existing ones by the possibility of simultaneous formation of reference tolerances in the course of analysis of complex attributes of anomalies and cyber attacks. This allows changing VAD for all attributes simultaneously during the procedure of training the existing and promising ISDA. The improved algorithms are also focused on the possibility of processing a large amount of specialized data during procedures of the recognition and analysis of various types of attributes of anomalies and targeted cyber attacks in CIIS.
The effectiveness of the proposed algorithms depends on the number of informative attributes used for the formation of BLM. In addition, the efficiency of the algorithms is determined by the input data for ASR or AES, formed at each step of clustering. When the number of attributes is small, the effect of using the modified algorithm is negligible.
The results presented are a continuation of the research, results of which were described earlier in articles [10,18,23]. The prospects of further research include the enlargement of attributes knowledge base and the formation of BLM of ASR.

Conclusions
1. We proposed refinements to the algorithm of splitting the feature space into clusters in the course of the procedure for the recognition of anomalies and cyber attacks. The refined algorithm differs from the existing ones by the simultaneous formation of reference tolerances during analysis of complex RO attributes, and allows VAD to be changed for all attributes simultaneously at every step of learning. The proposed refinements prevent the absorption of one RO class of basic attributes of anomalies and cyber attacks by another class. Predicate expressions were obtained for an ASR that is capable of self-learning.
2. We examined the devised algorithms on simulation models in MatLab. The simulation showed that the proposed algorithms for the clustering of RO attributes make it possible to obtain effective learning matrices for ASR as a part of ISDA.