DEVELOPMENT OF A METHOD FOR TRAINING ARTIFICIAL NEURAL NETWORKS FOR INTELLIGENT DECISION SUPPORT SYSTEMS

We developed a method of training artificial neural networks for intelligent decision support systems. A distinctive feature of the proposed method is that not only the synaptic weights of an artificial neural network are trained, but also the type and parameters of the membership function. If a given quality of functioning cannot be achieved by training the parameters of the artificial neural network, its architecture is trained as well. The choice of architecture, type and parameters of the membership function is based on the computing resources of the device and takes into account the type and amount of information arriving at the input of the artificial neural network. Another distinctive feature of the developed method is that processing the input data does not require storing previously calculated data. The proposed method was developed because intelligent decision support systems need artificial neural networks that can process more information while making unambiguous decisions. According to the results of the study, this training method provides on average 10–18 % higher efficiency of training artificial neural networks and does not accumulate training errors. The method makes it possible to train artificial neural networks by adjusting both their parameters and their architecture and to determine effective measures for improving their efficiency. It reduces the use of computing resources of decision support systems, supports the development of measures to improve the efficiency of training artificial neural networks, and increases the efficiency of information processing in artificial neural networks.


Introduction
Decision support systems (DSS) have become the basis for solving information and computation problems both in everyday life and in highly specialized (special-purpose) tasks. DSS are actively used to process large data sets, providing information support to the decision-making process of decision-makers. Existing DSS are based on artificial intelligence methods [1–11].
The creation of intelligent DSS was a further development of classical DSS; their main tool is evolving artificial neural networks (ANN). An intelligent decision-maker support system (iDMSS) is an interactive computer system designed to support decision-making on poorly structured and unstructured problems in various fields of activity, based on models and procedures for data and knowledge processing built on artificial intelligence technologies.
Evolving ANN have versatile approximating properties and operate stably under nonlinearity, a priori uncertainty, stochasticity and chaos, and various kinds of disturbances and interference.
Despite their successful application to a wide range of data mining problems, these systems have a number of disadvantages. The most significant shortcomings are the following:
- the complexity of selecting the system architecture. As a rule, a model based on the principles of computational intelligence has a fixed architecture; in the context of ANN, this means that the neural network has a fixed number of neurons and connections, so adapting the system to new incoming data that differ from previous data may be problematic;
- the formation of «dead» neurons in the layers during ANN functioning;
- batch and multi-epoch training requires considerable time; such systems are not suited to online operation with a fairly high arrival rate of new data;
- many existing computational intelligence systems can neither determine the evolving rules by which the system develops nor present the results of their work in terms of natural language.
Thus, the task of developing new methods (approaches, techniques) for training ANN, which will solve these difficulties, is relevant.

Literature review and problem statement
In [3], the properties of the ANN used in predicting air pollutant concentrations were analyzed. The use of an extreme learning machine is proposed, which provides high generalization performance at extremely high training rates. The disadvantages of the approach include the accumulation of ANN errors during calculations, the impossibility of choosing the parameters and type of the membership function, and the formation of «dead» neurons during training.
The work [4] presents the modeling of bank capital management adequacy, based on trend forecasting models. A multilayer perceptron is used for the calculations. Training of this perceptron is limited to the synaptic weights, and only those of activated neurons; no other training mechanisms are presented in the study.
The work [5] presents an operational approach to spatial analysis in the maritime sector to quantify and map related ecosystem services. The disadvantages of this method include the impossibility of flexibly adjusting (adapting) the evaluation models when indicators are added (excluded) or their parameters (consistency and significance) are changed. ANN training is again limited to classical training of the weights of active neurons.
The work [6] presents a machine learning model for the automatic identification of requests and the provision of information support services exchanged between members of an Internet community. The model is designed to process a large number of messages from users of social networks. Its disadvantages are the lack of mechanisms for assessing the adequacy of decisions and high computational complexity. Training is limited to the synaptic weights of the ANN.
The work [7] demonstrates the use of ANN to detect cardiac arrhythmias and other heart diseases. The backpropagation algorithm is used as a method of ANN training. The disadvantage of this approach is the limited training of only synaptic weights, without training the type and parameters of the membership function.
In [8], it is proposed to use ANN to detect avalanches. The backpropagation algorithm is used as a method of ANN training. The disadvantage of this approach is the limited training of only synaptic weights, without training the type and parameters of the membership function.
The work [9] presents the use of ANN to identify anomaly detection problems in home authorization systems. The «winner-take-all» algorithm is used as a method of Kohonen ANN training. The disadvantages of this approach are the accumulation of errors due to the presence of inactivated and dead neurons in the training process, the limited training of only synaptic weights and the need to store previously calculated data.
The work [10] presents the use of ANN to detect abnormalities in a human electroencephalogram. Fine-tuning of ANN parameters is used as the training method. The disadvantages of this approach are the accumulation of errors in the training process and the limitation of training to synaptic weights only, without training the type and parameters of the membership function.
The work [12] presents the use of machine learning methods, namely ANN and genetic algorithms. A genetic algorithm is used as the method of ANN training. The disadvantage of this approach is the limitation of training to synaptic weights only, without training the type and parameters of the membership function.
The work [13] presents the use of machine learning methods, namely ANN and the differential search method. In the course of the study, a hybrid method of ANN training was developed, based on the combination of backpropagation and differential search algorithms. The disadvantage of this approach is the limitation of training to synaptic weights only, without training the type and parameters of the membership function.
In [14], the methods of ANN training were developed using a combined approximation of the response surface, which provides the smallest training and forecasting errors. The disadvantage of this method is the accumulation of training errors and the inability to change the ANN architecture during training.
The work [15] shows the use of ANN to evaluate the efficiency of a unit using the previous time series of its performance. SBM (Slacks-Based Measure) and DEA (Data Envelopment Analysis) models are used for ANN training. The disadvantages of this approach are the limited choice of network architecture and the training of synaptic weights only.
The work [16] describes the use of ANN to evaluate geomechanical properties. The backpropagation algorithm is used as a method of ANN training. The characteristics of the backpropagation algorithm are improved by increasing the training sample. The disadvantages of this approach are the limited training of only synaptic weights, without training the type and parameters of the membership function.
The work [17] presents the use of ANN to assess traffic intensity. The backpropagation algorithm is used as the method of ANN training. Its performance is improved using skip connections between layers, so that each layer learns only a residual function relative to the output of the previous layer. The disadvantage of this approach is the limitation of training to synaptic weights only, without training the type and parameters of the membership function.
The analysis of scientific works [1–17] showed that well-known training methods are used for artificial neural networks. These methods usually focus on training either the synaptic weights or the membership functions. The use of known algorithms (methods, techniques) for training artificial neural networks, even with improved characteristics, does not meet the existing and future requirements for them, namely:
- increasing the amount of information that artificial neural networks can process;
- increasing the reliability of the decisions made by intelligent decision support systems;
- increasing the speed of adapting the architecture and parameters of artificial neural networks to emerging tasks;
- increasing the efficiency of systems with limited computing capabilities, such as on-board aircraft observers, by improving evaluation accuracy;
- preventing deadlocks in the training of artificial neural networks;
- ensuring the predictability of the training process;
- ensuring the unambiguity of the decisions made by intelligent decision support systems;
- enabling the calculation of large data sets within a single epoch, without saving previous calculations.

The aim and objectives of the study
The aim of the study is to develop a method for training artificial neural networks for intelligent decision support systems, which allows processing more information while making unambiguous decisions.
To achieve the aim, the following objectives were set:
- to develop an algorithm for training artificial neural networks for intelligent decision support systems;
- to evaluate the effectiveness of training artificial neural networks experimentally.

Materials and methods
During the study, the general provisions of artificial intelligence theory were used to solve the problem of training artificial neural networks in intelligent decision support systems; artificial intelligence theory is thus the basis of this study. The study used an advanced genetic algorithm and evolving artificial neural networks. Simulations were performed in the MathCad 14 software (USA) on an Intel Core i3 PC (USA).
The Kohonen network [2, 18–24] refers to self-organizing networks. This means that such networks do not receive the desired output signal for an input training vector; as a result of training, the network divides the input signals into classes, thus forming topological maps.
It should be noted that the Kohonen self-organizing map implements the mapping of the input space of dimension n into the output space of dimension m.
Let us consider the self-organizing map architecture in more detail. The network input receives an n-dimensional input signal. The network contains a single layer of m neurons that form a rectangular lattice on the plane.
Neurons are characterized by their location in the network. Each neuron in the Kohonen layer is connected to each input of the zero (input) layer by direct connections and to all other neurons by cross connections. In the training process, neighboring neurons affect each other more strongly than those located further away. It is the lateral connections in the network that excite some neurons and inhibit others.
Each neuron of the Kohonen layer forms a weighted sum of the input signals, $s_j = \sum_{i=1}^{n} w_{ij} x_i$. If the synapses are excitatory, then $w_{ij} > 0$; if the synapses are inhibitory, then $w_{ij} < 0$.
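To make the summation concrete, the sketch below computes these weighted sums for a small layer; the array shapes, names and toy values are our own illustration, not taken from the source.

```python
import numpy as np

def kohonen_layer_sums(x, W):
    """Weighted sums s_j = sum_i w_ij * x_i for every neuron of the Kohonen layer.

    x : (n,) input vector; W : (m, n) matrix of synaptic weights, where
    w_ij > 0 models an excitatory synapse and w_ij < 0 an inhibitory one.
    """
    return W @ x  # one weighted sum per neuron

# Toy usage: 3 inputs, 4 neurons with mixed excitatory/inhibitory synapses.
rng = np.random.default_rng(0)
W = rng.uniform(-1.0, 1.0, size=(4, 3))
x = rng.uniform(0.0, 1.0, size=3)
print(kohonen_layer_sums(x, W))  # 4 weighted sums, one per neuron
```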
Given the above, the classic training procedure of the Kohonen network is to adjust the synaptic weights, without taking into account other network training opportunities, such as the type and parameters of the membership function and network architecture.

Results of research on the development of a method for training artificial neural networks

Fig. 2 shows the proposed algorithm for training an artificial neural network. The improvement of this training algorithm lies in improving procedures 2, 3 and 6 of the previously developed method of training artificial neural networks [2, 18, 32].

1. Development of an algorithm for training artificial neural networks
Briefly, the main steps of the proposed method are:
Step 1. Initialization of the initial values of synaptic weights.
Step 2. Determination of neuronal weights.
Step 3. Correction of neuronal weights and determination of neighborhood function.
Step 4. Formation of the first cluster.
Step 5. Checking the threshold value.
Step 6. Checking the architecture's capabilities to process the amount of information coming to its input.
Step 7. Evolution of the system architecture.
We describe steps 2, 3 and 6 in detail. The essence of the improvement is genetic-competitive training, supplemented by the introduction of various strategies for genetic optimization of the weights of «dead» neurons located in the output layer of the network. The type of uncertainty of the training sample is also taken into account (the approach is detailed in [32]). The proposed stochastic optimization reduces the number of training epochs the Kohonen network needs to reach a given maximum value of the vector quantization error; in addition, when constructing centroids, uncertainty coefficients (total uncertainty, partial uncertainty, full awareness) are taken into account when selecting the initial values of cluster centers.
Before starting the Kohonen network training algorithm, the input vectors are pre-normalized [33, 34]: $\bar{x} = x / \| x \|$, where $\| \cdot \|$ is the Euclidean norm.
The Kohonen network training algorithm can be described as the following sequence of steps.
Step 1. Input of source data. At this stage, the initial values of the synaptic weights $w_{ij} = 0$ are initialized.
One commonly used initialization method is to assign synaptic weights values equal to randomly selected vectors from multiple observations.
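A minimal sketch of these two preparatory operations, assuming unit-norm scaling and sample-based initialization as described above (function names and array shapes are our own):

```python
import numpy as np

def normalize_inputs(X):
    """Scale every input vector to unit Euclidean length before training."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0.0, 1.0, norms)  # guard all-zero vectors

def init_weights_from_samples(X, m, rng):
    """Initialize the m weight vectors with randomly chosen observations,
    the common initialization mentioned above."""
    idx = rng.choice(len(X), size=m, replace=False)
    return X[idx].copy()

rng = np.random.default_rng(1)
X = normalize_inputs(rng.normal(size=(100, 5)))
W = init_weights_from_samples(X, m=6, rng=rng)
print(W.shape)  # (6, 5)
```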
Step 2. Determination of neuronal weights. At this stage, the normalized signal vector $\bar{x}$ is fed to the input of the system and the weight vector (neuron) closest to $\bar{x}$ is selected, i.e., the vector for which the Euclidean distance to $\bar{x}$ is the smallest:

$(i^*, j^*) = \arg\min_{(i,j)} \| \bar{x} - w_{ij} \|$.

The following substeps are performed.

2.1. Setting the parameters of the Kohonen network (the size of the original lattice I × J, the number of training epochs T ≥ 1, the initial width of the neuron neighborhood $\sigma_0$, the coefficients τ, $\kappa_0$, η). At this stage, the current neighborhood width is also calculated:

$\sigma(t) = \sigma_0 \exp(-t / \tau)$, (3)

with $\sigma(t) = \sigma_0$ for T = 1.
Step 3. Correction of neuronal weights and determination of neighborhood function.
We denote by $w_{ij}$ the weight vector of the neuron with coordinates (i, j) on the output lattice of the Kohonen network (i is the row number, j is the column number). The training process is aimed at minimizing the half-sum of the squared distances between the input vectors $x_1, \ldots, x_K$ and the weight vectors of the neurons closest to them, i.e., the vector quantization error [24]:

$E(w_{11}, \ldots, w_{IJ}) = \frac{1}{2} \sum_{k=1}^{K} \sum_{i,j} F_{ij}(x_k) \| x_k - w_{ij} \|^2$,

where $F_{ij}(x_k)$ is the characteristic function equal to one if the neuron (i, j) is the closest to $x_k$ and to zero otherwise. Using the gradient descent method, we obtain the following formula for updating the weight vectors:

$w_{ij}(t+1) = w_{ij}(t) + \kappa F_{ij}(x_k)(x_k - w_{ij}(t))$, (5)

where κ is some positive constant or function with the range of values [0, 1] setting the training rate. Note that in (5), only the part of the training sample with the smallest discrepancy with $w_{ij}$ is used to update each particular weight vector $w_{ij}$. In other words, a vector is modified if and only if it is the closest to the training vector $x_k$ in the given metric space. The vector $w_{ij}$ is corrected by a value directly proportional to the difference between the input vector $x_k$ and the weight vector $w_{ij}$. Thus, the output lattice neurons compete for the right to be selected as the candidate closest to the input vector $x_k$; the neuron satisfying this requirement is called the winning neuron, with coordinates $(i_k^*, j_k^*)$. Note that when the vectors $w_{ij}$ and $x_k$ are normalized, minimizing $E(w_{11}, \ldots, w_{IJ})$ is equivalent to maximizing the sum of their scalar products, $\sum_{k} \sum_{i,j} F_{ij}(x_k) \langle w_{ij}, x_k \rangle$.

To reduce competition between neurons, a rule is introduced that allows updating not only the weights of the winning neuron, but also those of the other neurons in its vicinity. To this end, the previously introduced characteristic function $F_{ij}(x_k)$ is replaced by an exponential Gaussian function whose value reflects the decaying dependence of the change in neuronal weights on the distance, in output-lattice coordinates, between a given neuron and the winning neuron. The closer a neuron is to the winning neuron, the larger the multiplicative factor of its weight update. The parameter σ is called the effective neighborhood width [35–37] of the winning neuron and can be interpreted as the current value of the winning neuron's neighborhood radius. A feature of the Kohonen network training algorithm is the decrease of σ over time according to (3): $\sigma(t) = \sigma_0 \exp(-t / \tau)$.

Here, the parameter $\sigma_0$ sets the initial value of the neighborhood radius of the winning neuron, which is usually set to $\sqrt{I^2 + J^2}$. The parameter τ is selected so that in the last training epoch, the minimum number of neuronal weight vectors, or only the vector of the winning neuron, is updated; thus, $\tau = T / \ln(\sigma_0)$. The training rate factor is chosen so that in the initial epochs of the algorithm, the weight vectors of most neurons are updated at the highest rate; then, as the number of epochs grows and the neighborhood width narrows, fewer and fewer neuronal vectors are modified, and at a lower rate. This technique allows building clusters whose elements are first adapted to the general characteristics of the approximated set and then refine its individual features. The most common representatives of such a decreasing dependence are the functions $\kappa(t) = \kappa_0 (t+1)^{-1}$ and $\kappa(t) = \kappa_0 \exp\{-\eta t\}$ [35–37].
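The sketch below condenses the classical training step just described: winner selection by Euclidean distance, a Gaussian neighborhood around the winner, and exponentially decaying neighborhood width σ(t) and training rate κ(t). All parameter values are illustrative assumptions, not the source's settings.

```python
import numpy as np

def som_epoch(X, W, coords, t, sigma0, tau, kappa0, eta):
    """One epoch of classical Kohonen training: for each input vector, find
    the winning neuron by Euclidean distance and pull every neuron toward
    the input, scaled by a Gaussian neighborhood factor that narrows as the
    epoch number t grows."""
    sigma_t = sigma0 * np.exp(-t / tau)        # current neighborhood width
    kappa_t = kappa0 * np.exp(-eta * t)        # current training-rate factor
    for x in X:
        winner = np.argmin(np.linalg.norm(W - x, axis=1))
        d2 = np.sum((coords - coords[winner]) ** 2, axis=1)  # lattice distances
        h = np.exp(-d2 / (2.0 * sigma_t ** 2))               # Gaussian neighborhood
        W = W + kappa_t * h[:, None] * (x - W)               # gradient-descent step
    return W

# A 4 x 4 output lattice trained on 2-D data (all settings illustrative).
I, J = 4, 4
coords = np.array([(i, j) for i in range(I) for j in range(J)], dtype=float)
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
W = X[rng.choice(len(X), I * J, replace=False)].copy()
T, sigma0 = 30, np.sqrt(I ** 2 + J ** 2)
for t in range(T):
    W = som_epoch(X, W, coords, t, sigma0, tau=T / np.log(sigma0),
                  kappa0=0.5, eta=0.05)
```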
The essence of improving procedure 3 is to use the Kohonen network competitive training algorithm, supplemented by genetic operators:

3.1. Initialization of the current set of active neurons $V^+ := \varnothing$.

3.4. Determination of the coordinates of the winning neuron for the vector $x_k$: $(i_k^*, j_k^*) = \arg\min_{(i,j)} \| x_k - w_{ij} \|$.

3.5. Go to the next step 4 if the condition $t \geq T$ is met; otherwise, go to step 2.

3.7. Calculation of the activation threshold of the remaining neurons with coordinates (i, j).

Step 6. Checking the architecture's capability to process the amount of information coming to its input, with optimization of the artificial neural network architecture.
After the modification of weights when presenting a training vector (step 3.2.8) or after performing several epochs of competitive training (step 3.3), stochastic optimization is additionally applied. This optimization is based on a genetic algorithm for stochastic optimization of the weights of certain neurons of the output lattice of the Kohonen map. To this end, the weights of each neuron lying at the edge of the current winning neuron's neighborhood and never activated are represented as a sequence of genes acting as the minimum unit for the input argument of the crossover operator ($O_1$). As a result of this operator, a pair of new chromosomes is formed, in which randomly selected regions of the chromosomes' genes are exchanged. Each gene is a set of bits that can be considered a separate component of the vector associated with the corresponding winning neuron or one of the neurons in its vicinity. When the mutation operator ($O_2$) is used, a pair of randomly selected bits is permuted within one gene; when the inversion operator ($O_3$) is used, the value of a randomly selected bit is inverted. Both of these operators apply only to part of the mantissa of the 64-bit «objective gene».
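A minimal sketch of the three bit-level operators $O_1$, $O_2$, $O_3$, assuming the «objective gene» is the IEEE 754 bit pattern of a 64-bit weight and that only the 20 low-order mantissa bits may be touched (the exact bit region is our assumption; the source says only «part of the mantissa»):

```python
import random
import struct

MANTISSA_BITS = 20  # assumed width of the low-order mantissa region the
                    # operators may touch (the 52-bit mantissa occupies
                    # bits 0-51 of an IEEE 754 double)

def _to_bits(w):
    return struct.unpack("<Q", struct.pack("<d", w))[0]

def _to_float(b):
    return struct.unpack("<d", struct.pack("<Q", b))[0]

def crossover(w1, w2, rng=random):
    """O1: exchange a randomly chosen low-order bit region between two genes."""
    cut = rng.randrange(1, MANTISSA_BITS)
    mask = (1 << cut) - 1
    b1, b2 = _to_bits(w1), _to_bits(w2)
    return (_to_float((b1 & ~mask) | (b2 & mask)),
            _to_float((b2 & ~mask) | (b1 & mask)))

def mutate(w, rng=random):
    """O2: permute a randomly selected pair of bits within one gene."""
    b = _to_bits(w)
    i, j = rng.sample(range(MANTISSA_BITS), 2)
    bi, bj = (b >> i) & 1, (b >> j) & 1
    if bi != bj:                       # swapping equal bits changes nothing
        b ^= (1 << i) | (1 << j)
    return _to_float(b)

def invert(w, rng=random):
    """O3: invert the value of one randomly selected mantissa bit."""
    return _to_float(_to_bits(w) ^ (1 << rng.randrange(MANTISSA_BITS)))
```

Restricting the operators to the low mantissa keeps the perturbation of a weight small, which matches the intended role of these operators as a local stochastic search around trained values.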
To simulate the process of neuronal evolution, several methods of generating offspring have been developed. The first approach, $A_1$, applies the operators of crossover, mutation or inversion to arbitrary neurons currently close to the winning neuron. The second approach, $A_2$, is based on geometric considerations about the mutual position of neurons on the output lattice relative to the winning neuron: since the weight vector of each neuron equidistant from the winning neuron is modified with the same coefficient by the selected training algorithm, the crossover operator is applied to such neurons. The third approach, $A_3$, involves applying the crossover operator only to the most adapted individuals, while the mutation or inversion operators are used for the remaining neurons. Here, the inverse of the mean deviation obtained when the neuron with weight $w_{ij}$ recognized elements of the training sample in step 3.2.8 was chosen as the fitness function. In addition, several strategies were developed for selecting the offspring generation method: the fixed choice $G_1$ of a certain approach; the sequential or random selection among all approaches, $G_2$; and the choice $G_3$ based on the roulette mechanism from [32]. In the $G_3$ strategy, if the offspring of neurons generated with the chosen approach are more adapted than their predecessors, the probability of choosing this approach in the future increases compared to the other approaches; otherwise, it decreases. Each of the approaches $A_1$, $A_2$, $A_3$ is used to create two daughter Kohonen maps, in which only neurons from the neighborhood of the winning neuron of the parent Kohonen map can be changed. These approaches can be described as a sequence of actions in which only step 3 differs.

Step 6.1. $V := \varnothing$.

Step 6.2. Selection of the next neuron with coordinates (i′, j′) from the neighborhood of the winning neuron.

Step 6.3. Application of the crossover operator, or twice the mutation or inversion operator, to the neuron with coordinates (i′, j′), where 0 ≤ K ≤ 1 is a constant determining the relative number of neurons to which only the mutation or inversion operator should be applied.
Step 6.4. Adding two new neurons generated by the $O_{i^*}$ operator, one to each of the two daughter Kohonen maps, keeping each of these neurons at position (i′, j′) on the output lattice.
Step 6.5. Adding the coordinates (i′, j′) to the set: $V := V \cup \{(i′, j′)\}$.

Step 6.6. Go to step 6.2 if $V \neq V^{\dagger}$, i.e., while not all neighborhood neurons have been processed; otherwise, stop.

Each of the strategies $G_1$, $G_2$, $G_3$ manipulates the choice of the approaches $A_1$, $A_2$, $A_3$. The $G_1$ strategy is the most primitive of those analyzed and consists in choosing one of the three approaches $A_1$, $A_2$, $A_3$ in step 6.1 of the genetic-competitive training algorithm of the Kohonen network.
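The adaptive strategy $G_3$ can be sketched as a roulette wheel whose sector sizes are updated from offspring fitness; the multiplicative re-weighting factors below are our assumptions, since the mechanism of [32] is not reproduced here.

```python
import random

class RouletteStrategy:
    """Sketch of the G3 strategy: an approach whose offspring prove more
    adapted than their parents gets a larger roulette sector next time."""

    def __init__(self, approaches=("A1", "A2", "A3")):
        self.weights = dict.fromkeys(approaches, 1.0)

    def pick(self, rng=random):
        """Roulette-wheel selection proportional to the current weights."""
        r = rng.uniform(0.0, sum(self.weights.values()))
        acc = 0.0
        for approach, w in self.weights.items():
            acc += w
            if r <= acc:
                return approach
        return approach  # floating-point edge case: return the last approach

    def feedback(self, approach, offspring_better):
        """Grow or shrink the sector of the chosen approach (assumed factors)."""
        factor = 1.2 if offspring_better else 0.8
        self.weights[approach] = max(self.weights[approach] * factor, 1e-3)

strategy = RouletteStrategy()
a = strategy.pick()                      # e.g. "A2"
strategy.feedback(a, offspring_better=True)
```

Under this scheme, $G_1$ corresponds to freezing the wheel at a single approach, and $G_2$ to keeping all sectors equal.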
Thus, if there are N observations and m clusters with centroids $c_j$, all the memberships and the adjusted coordinates of the centroids are calculated according to the relations

$u_j(x_k) = \frac{\| x_k - c_j \|^{-2/(\beta - 1)}}{\sum_{l=1}^{m} \| x_k - c_l \|^{-2/(\beta - 1)}}, \quad c_j = \frac{\sum_{k=1}^{N} u_j^{\beta}(x_k) x_k}{\sum_{k=1}^{N} u_j^{\beta}(x_k)}, \quad (9)$

where β > 1 is the fuzzifier. The system of equations (9) is essentially a batch information processing algorithm: when a new observation x(N+1) is received, all the calculations must be performed again. With a sufficiently high data flow rate, this approach may be ineffective.
To this end, recurrent procedures that do not require storing previously used data should be developed. Such recurrent procedures can be implemented on the basis of a two-layer adaptive neuro-fuzzy network with the following architecture.
The first hidden layer of the network is formed by ordinary Kohonen neurons $N_j^K$ connected by lateral connections, through which the competition process is realized. The output layer of the network, formed by the nodes $N_j^u$, is designed to calculate the level of membership of each observation x(k) in each j-th cluster, j = 1, 2, 3, …, m. To adjust the cluster centroids, a recurrent self-learning procedure is used, which has the form

$c_j(k+1) = c_j(k) + \eta(k) u_j^{\beta - 1}(k+1) (x(k+1) - c_j(k))$, (10)

where η(k) is the training-rate parameter. It is easy to see that expression (10) is a WTM («winner takes more») self-learning rule with the narrowing neighborhood function $\eta(k) u_j^{\beta - 1}(k+1)$.

Steps 4, 5 and 7 are described in detail in previous studies [18, 32], and steps 2, 3 and 6 are described by expressions (1) to (10), respectively.
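A hedged sketch of such a recurrent procedure, using FCM-style memberships and an assumed decaying step η(k) = 1/(k + 1): each new observation updates the centroids without revisiting any of the previously processed data.

```python
import numpy as np

def memberships(x, c, beta):
    """FCM-style membership of x in every cluster (inverse-distance rule)."""
    d = np.maximum(np.linalg.norm(c - x, axis=1), 1e-12)
    p = d ** (-2.0 / (beta - 1.0))
    return p / p.sum()

def recurrent_centroid_update(c, x_new, u, beta, k):
    """Online counterpart of the batch relations: the new observation pulls
    each centroid toward itself, weighted by its membership raised to
    beta - 1, so earlier observations never need to be stored."""
    eta = 1.0 / (k + 1)  # assumed decaying step size
    return c + eta * (u[:, None] ** (beta - 1.0)) * (x_new - c)

rng = np.random.default_rng(3)
c = rng.normal(size=(3, 2))  # m = 3 centroids in a 2-D feature space
for k, x in enumerate(rng.normal(size=(500, 2))):
    u = memberships(x, c, beta=2.0)
    c = recurrent_centroid_update(c, x, u, beta=2.0, k=k)
```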

2. Experimental evaluation of the effectiveness of training artificial neural networks
Simulation of the proposed method was performed in the MathCad 14 software environment. For the experiment, we used a two-dimensional artificially generated data set with different degrees of uncertainty (for convenience of calculation, 1,500 observations of each type of uncertainty). Thus, the generated data set consists of data:
- that are fully reliable (full awareness of the object state), i.e., the intelligence data are complete and reliable;
- with partial uncertainty (missing information on an individual assessment indicator);
- with complete uncertainty (no information about the monitored object).
The data set contained 15 clusters with different levels of overlap. The data sample contained 4,500 observations. The data were submitted for processing in sequential mode.
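For reproducibility, a sketch of how such a data set can be generated is given below; the cluster spread and placement and the encoding of missing values as NaN are our assumptions for illustration.

```python
import numpy as np

def make_uncertain_dataset(n_clusters=15, per_type=1500, rng=None):
    """2-D clustered data in three uncertainty regimes: fully reliable
    observations, partial uncertainty (one feature missing) and complete
    uncertainty (both features missing)."""
    if rng is None:
        rng = np.random.default_rng(4)
    centers = rng.uniform(0.0, 10.0, size=(n_clusters, 2))
    n = 3 * per_type
    labels = rng.integers(0, n_clusters, size=n)
    X = centers[labels] + rng.normal(scale=0.8, size=(n, 2))  # overlapping clusters
    rows = np.arange(per_type, 2 * per_type)                  # partial uncertainty
    X[rows, rng.integers(0, 2, size=per_type)] = np.nan       # drop one indicator
    X[2 * per_type:] = np.nan                                 # complete uncertainty
    return X, labels

X, labels = make_uncertain_dataset()  # 4,500 observations, 15 clusters
```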
To compare the quality of clustering, we used fuzzy c-means (FCM), a system based on the evolving fuzzy clustering method (EFCM) with different threshold parameter values, an EFCM system with the training method from [18, 32], K-means++ and K-medoids (Partitioning Around Medoids).
For the next experiment, a data set describing the characteristics of the monitored object was used. Before clustering, the observational features were normalized to the interval [0, 1].
The Xie-Beni index was used as the criterion for assessing clustering quality. Table 1 presents the comparative results of clustering. The study of the developed method showed that this training method provides on average 11–15 % higher efficiency of training artificial neural networks and does not accumulate training errors (Table 1).
These results can be seen in the last rows of Table 1 as the difference in the Xie-Beni index.
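For reference, a common formulation of the Xie-Beni index is sketched below (lower values mean better clustering); the fuzzifier β and the exact weighting follow the standard definition and may differ in detail from the implementation used in the study.

```python
import numpy as np

def xie_beni(X, C, U, beta=2.0):
    """Xie-Beni index: membership-weighted compactness divided by N times
    the minimum squared separation between centroids.

    X : (N, d) observations; C : (m, d) centroids; U : (N, m) memberships.
    """
    d2 = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) ** 2  # (N, m)
    compactness = np.sum((U ** beta) * d2)
    m = len(C)
    sep = min(np.sum((C[i] - C[j]) ** 2)
              for i in range(m) for j in range(m) if i != j)
    return compactness / (len(X) * sep)
```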

Discussion of the results of the development of artificial neural network training method
The advantages of this method are achieved by performing a sequence of additional procedures, namely 2, 3, 6, shown in Fig. 2.
Thus, not only the synaptic weights are adjusted, but also the architecture of the artificial neural network, which results in fewer training epochs.
The main advantages of the proposed method are as follows:
- it enables processing more information, owing to the reduced number of epochs of the improved training algorithm;
- increased reliability of the results and no accumulation of errors during the training of artificial neural networks, achieved through the improved procedure for correcting the synaptic weights and the architecture of the artificial neural network;
- a wide scope of use (decision support systems);
- simple mathematical calculations;
- prevention of the «overtraining» phenomenon by adjusting the network architecture;
- no need to save the results of previous calculations.

The disadvantages of the proposed method include:
- the need to use high-speed genetic algorithms to adjust the synaptic weights;
- lower estimation accuracy on an individual evaluation parameter;
- loss of accuracy of the results during the readjustment of the artificial neural network architecture.

This method will allow:
- training artificial neural networks;
- making the adjustments necessary to control an aircraft during in-air refueling and to ensure the successful connection of the rod with the refueling cone;
- identifying effective measures to increase the efficiency of training artificial neural networks through the improved procedure for adjusting the synaptic weights and the network architecture;
- reducing the use of computing resources of decision support systems;
- developing measures aimed at improving the efficiency of training artificial neural networks;
- increasing the efficiency of information processing in artificial neural networks.
The limitations of this study include:
- the need for reliable and complete training and test samples;
- the need to take into account the time for collecting, processing and summarizing information;
- the need to take into account the computational capabilities of the hardware in the software implementation of the method, in order to ensure a given degree of training efficiency.
The method is proposed for use in intelligent decision support systems that adjust the angular velocities of an aircraft based on the specified and current angular velocities and stabilize the set roll angle by forming the set value of the angular velocity.
It is also proposed to use the method:
- in the contour of the specified angular velocity (for example, the roll angular velocity contour) when synthesizing an algorithm for stabilizing a given angle;
- when compensating for mismatches at the final stage of aircraft docking by forming a given angular pitch velocity in the aircraft state observer.
Areas of further research should be aimed at reducing computational costs when processing various types of data in special-purpose systems.

Conclusions
1. An algorithm for the method of training artificial neural networks for intelligent decision support systems has been developed.
The improvement of information processing efficiency and the reduction of evaluation errors are achieved by:
- training not only the synaptic weights of the artificial neural network, but also the type and parameters of the membership function;
- training the architecture of the artificial neural network;
- calculating the data within one epoch, without having to store previous calculations; this reduces the processing time, as there is no need to access the database;
- avoiding the accumulation of errors during training as a result of processing the information received at the input of the artificial neural network.
2. The application of the proposed method is shown on the example of clustering data on a monitored object. This example showed an increase in the efficiency of artificial neural networks at the level of 10–18 % according to the Xie-Beni index, as well as an increase in the efficiency of information processing owing to the additional procedures of training artificial neural networks.