Development of a Methodology for Training Artificial Neural Networks for Intelligent Decision Support Systems

A method of training artificial neural networks for intelligent decision support systems is developed. A distinctive feature of the proposed method is that it trains not only the synaptic weights of the artificial neural network, but also the type and parameters of the membership function. If the specified quality of functioning cannot be reached by tuning the parameters of the artificial neural network, its architecture is trained as well. The architecture, type and parameters of the membership function are chosen with regard to the computing resources of the tool and the type and amount of information supplied to the input of the artificial neural network. Owing to the proposed methodology, no training errors accumulate while the information fed to the input of the artificial neural network is processed. Another distinctive feature of the developed method is that no previously calculated data need to be stored. The development of the proposed methodology is motivated by the need to train artificial neural networks for intelligent decision support systems so as to process more information while keeping the uniqueness of the decisions made. According to the results of the study, the proposed training method provides on average 10–18 % higher training efficiency of artificial neural networks and does not accumulate errors during training. The method makes it possible to train artificial neural networks through the learning of both parameters and architecture, and to identify effective measures for improving the efficiency of artificial neural networks.
This methodology will allow reducing the use of computing resources of decision support systems, developing measures aimed at improving the efficiency of training artificial neural networks, and increasing the efficiency of information processing in artificial neural networks.


Introduction
Decision support systems (DSS) are actively used in all areas of human life. They have become especially widespread in the processing of large data sets, providing information support to decision-makers in the decision-making process.
The creation of intelligent DSS was a natural continuation of the widespread use of classic DSS. Intelligent DSS provide information support for all production processes and services of enterprises (organizations, institutions), including product design, manufacturing and marketing, financial and economic analysis, planning, personnel management, marketing, support for product creation (operation, repair) and prospective planning. These intelligent DSS have also been widely used for specific military tasks, namely [1, 2]:
- planning the deployment and operation of communication and data transmission systems;
- automation of troops and weapons control;
- collecting, processing and generalizing intelligence about the status of intelligence objects, and others.
The main tool for solving computational and other problems in modern intelligent DSS is evolving artificial neural networks (ANN).
The prospect of using evolving ANN is due to the fact that ANN that do not have the ability to evolve do not meet the requirements for data processing efficiency and training capabilities. Evolving ANN have universal approximation properties and fuzzy inference capabilities, making them widely used for solving various problems of data mining, identification, emulation, forecasting, intelligent management, etc. They provide stable performance in conditions of nonlinearity, uncertainty, stochasticity and randomness, various kinds of disturbances and interferences.
Despite their successful application to address a wide range of data mining tasks, these systems have several drawbacks associated with their use.
The most significant disadvantages are as follows:
- complexity of choosing the system architecture. Generally, a model based on the principles of computational intelligence has a fixed architecture; in the context of ANN, the neural network has a fixed number of neurons and connections. Adapting the system to new incoming data that differ from the previous data may therefore be problematic;
- batch and multi-epoch training require considerable time resources. Such systems are not adapted to operate online with a sufficiently high rate of arrival of new data to be processed;
- many of the existing computational intelligence systems can neither determine the evolving rules by which the system develops nor represent the results of their work in terms of natural language.
Thus, the urgent task is to develop new ANN training methods that overcome these difficulties.

Literature review and problem statement
In [3], an analysis of the properties of ANN used to predict the concentration of air pollutants is carried out. The work emphasizes that ANN have a low convergence rate and can get stuck in a local minimum. It is suggested to use an extreme learning machine for ANN, which provides high generalization efficiency at an extremely high training speed. The disadvantages of this approach include the accumulation of ANN errors during the calculations and the inability to select the parameters and type of the membership function.
In [4], modeling of the adequacy of banking capital management in Ukraine is presented. This modeling is based on trend forecasting models, and a multilayer perceptron is used for the calculations. The training of this perceptron is limited to training the synaptic weights only.
In [5], an operational approach for spatial analysis in the maritime industry is presented to quantify and display related ecosystem services. This approach covers the three-dimensionality of the marine environment, considering separately all marine areas (sea surface, water column and seabed). In fact, the method builds 3-dimensional models of the sea by estimating and displaying each of the three marine domains by adopting representative metrics. The disadvantages of this method include the impossibility of flexible adjustment (adaptation) of estimation models while adding (excluding) indicators and changing their parameters (compatibility and significance of indicators).
The work [6] presents a machine learning model for the automatic identification of requests and provision of information support services that are exchanged between members of the Internet community. This model is designed to handle a large number of messages from social network users. The disadvantages of this model are the lack of mechanisms to evaluate the adequacy of decisions made and high computational complexity.
In [7], the use of ANN for the detection of heart rhythm abnormalities and other heart diseases is presented. The error backpropagation algorithm is used as the ANN training method. The disadvantage of this approach is that only synaptic weights are learned, without learning the type and parameters of the membership function.
In [8], the use of ANN for avalanche detection is presented. The error backpropagation algorithm is likewise used as the ANN training method, with the same disadvantage: only synaptic weights are learned, without learning the type and parameters of the membership function.
The work [9] presents the use of ANN for anomaly detection in home authorization systems. The «winner-takes-all» method is used to train the Kohonen ANN. The disadvantages of this approach are the accumulation of errors in the learning process, the learning of only synaptic weights without learning the type and parameters of the membership function, as well as the need to store previously calculated data.
In [10], the use of ANN for anomaly detection in human encephalograms is presented. The ANN training method is fine-tuning of the ANN parameters. The disadvantages of this approach are the accumulation of errors in the learning process and the learning of only synaptic weights, without learning the type and parameters of the membership function.
In [12], the use of machine learning methods, namely ANN and genetic algorithms, is presented. A genetic algorithm is used as a method of ANN training. The disadvantage of this approach is the limited learning of only synaptic weights, without learning the type and parameters of the membership function.
In [13], the use of machine learning methods, namely ANN and differential search method, is presented. During the research, a hybrid method of ANN training is developed, based on the use of the algorithm of backpropagation and differential search. The disadvantage of this approach is the limited learning of only synaptic weights, without learning the type and parameters of the membership function.
In [14], ANN training methods using a combined approximation of the response surface, which provides the smallest learning and forecasting errors, are developed. The disadvantages of this method are the accumulation of errors during training and the inability to change the ANN architecture during training.
The work [15] describes the use of ANN to evaluate the performance of the unit using the previous time series of its performance. SBM (Stochastic Block Model) and DEA (Data Envelopment Analysis) models are used for ANN training. The disadvantages of this approach are the limitations in the choice of network architecture and training of only synaptic weights.
In [16], the use of ANN for the estimation of geomechanical properties is presented. The error backpropagation algorithm is used as the ANN training method, and its characteristics are improved by increasing the training sample. The disadvantage of this approach is that only synaptic weights are learned, without learning the type and parameters of the membership function.
The work [17] describes the use of ANN for the estimation of traffic intensity. The error backpropagation algorithm is used as the ANN training method. Its characteristics are improved by using skip connections between layers, so that each layer fits only a residual function over the results of the previous layer. The disadvantage of this approach is that only synaptic weights are learned, without learning the type and parameters of the membership function.
The analysis of scientific works [1–17] showed that well-known training methods are used to train artificial neural networks. These methods tend to focus on the training of synaptic weights or membership functions. The use of known algorithms (methods, techniques) for training artificial neural networks, even with improved characteristics, does not meet the following requirements:
- increasing the amount of information that can be processed by artificial neural networks;
- increasing the reliability of decision making by intelligent decision support systems;
- increasing the speed of adaptation of the architecture and parameters of artificial neural networks in accordance with the emerging tasks;
- prevention of deadlock situations during the training of artificial neural networks;
- ensuring the predictability of the learning process of artificial neural networks;
- ensuring the uniqueness of decisions made by intelligent decision support systems;
- ensuring that large data sets are computed in one epoch without storing previous calculations.

The aim and objectives of the study
The aim of the study is to develop a methodology for training artificial neural networks for intelligent decision support systems, which allows processing more information, with the uniqueness of decisions made.
To achieve this goal, the following objectives were set:
- to develop an algorithm for training artificial neural networks for intelligent decision support systems;
- to give an example of practical application of the proposed methodology.

Materials and methods of the research
The Kohonen network [2] belongs to the class of self-organizing networks. This means that it is not given a desired output signal for each input learning vector; as a result of training, the network divides the input signals into classes, thus forming topological maps.
It should be mentioned that T. Kohonen's self-organizing map displays the input space of dimension n into the output space of dimension m.
The self-organizing map has a very simple architecture with direct information transfer. In addition to the zero (receptor) layer, it contains a single layer of neurons, which is very often called the Kohonen layer [2].
Let us take a closer look at the self-organizing map architecture. The n-dimensional input signal arrives at the network input. The network contains a single layer of m neurons that form a rectangular array on the plane.
Neurons are characterized by their location in the network. Each Kohonen layer neuron is connected to each input of the zero (input) layer by direct connections, as well as to all other neurons by cross connections. Fig. 1 shows a 1D Kohonen map. In the course of learning, neighboring neurons influence each other more strongly than those located further away. It is the lateral connections in the network that provide excitation of some neurons and inhibition of others.
Each neuron of the Kohonen layer generates a weighted sum of its input signals:

$y_j = \sum_{i=1}^{n} w_{ij} x_i, \quad j = 1, 2, \dots, m.$

If a synapse is excitatory (accelerating), then $w_{ij} > 0$; if it is inhibitory, then $w_{ij} < 0$.
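As an illustration, the weighted sum computed by every Kohonen-layer neuron can be sketched as follows (a minimal sketch; the dimensions and random values are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, m_neurons = 4, 3

# Synaptic weight matrix w[i, j]: connection from input i to neuron j.
# Positive weights act as excitatory (accelerating) synapses,
# negative weights as inhibitory ones.
w = rng.normal(size=(n_inputs, m_neurons))

x = rng.normal(size=n_inputs)  # input signal vector
s = w.T @ x                    # weighted sum for each of the m neurons

print(s.shape)  # prints (3,): one weighted sum per Kohonen neuron
```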

Development of an algorithm for training artificial neural networks for intelligent decision support systems
In addition to training the synaptic weights, the proposed algorithm provides:
- learning the architecture of artificial neural networks, depending on the amount of input information (number of layers, number of hidden layers, number of connections between neurons in a layer and between layers);
- learning the type and parameters of the membership function.
The algorithm consists of the following steps.
Step 1. The initial step is to initialize the initial values of synaptic weights.
Step 2. Determination of neuron weights.
Step 3. Correction of neuron weights and determination of neighborhood function.
Step 4. Formation of the first cluster.
Step 5. Verification of the threshold value.
Step 6. Checking the architecture's ability to handle the amount of information entering its input.
Step 7. Evolution of the system architecture.
The Kohonen network training algorithm itself can be described as a sequence of steps.
Step 1: initialize the initial values of the synaptic weights $w_{ij} = 0$. A commonly used initialization method is to assign to the synaptic weights the values of randomly selected vectors from the set of observations. In general, there are three main ways to initialize the initial weights:
- initialization by random values, when all weights are given small random values;
- initialization by examples, when the values of randomly selected examples from the training sample are taken as initial values;
- linear initialization. In this case, the weights are initialized with the values of linearly ordered vectors along the linear subspace spanned by the two principal eigenvectors of the original data set.
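The three initialization strategies can be sketched in code as follows (a sketch over a synthetic sample; the dimensions and value ranges are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 10                          # input dimension, number of neurons
data = rng.normal(size=(200, n))      # synthetic training sample

# 1. Initialization by random values: all weights get small random values.
w_random = rng.uniform(-0.1, 0.1, size=(m, n))

# 2. Initialization by examples: weights copied from randomly selected
#    observations of the training sample.
w_examples = data[rng.choice(len(data), size=m, replace=False)]

# 3. Linear initialization: weights placed along the linear subspace
#    spanned by the two principal eigenvectors of the data set.
cov = np.cov(data, rowvar=False)
_, eigvecs = np.linalg.eigh(cov)      # eigenvectors in ascending order
pc = eigvecs[:, -2:]                  # two leading principal directions
coeffs = rng.uniform(-1.0, 1.0, size=(m, 2))
w_linear = data.mean(axis=0) + coeffs @ pc.T
```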
Step 2: a normalized signal vector $x$ is fed to the system input, and the weight vector (neuron) closest to $x$ is selected, that is, the vector for which the Euclidean distance to $x$ is the smallest:

$j^{*} = \arg\min_j \|x - w_j\|.$

Since the vectors are normalized, this expression can also be written in the following form:

$j^{*} = \arg\max_j w_j^{T} x.$

Step 3: correction (adjustment) of the vector of synaptic weights $w_j$ according to the rule

$w_j(k+1) = w_j(k) + \eta(k)\, h(r_i, r_j)\,\bigl(x(k) - w_j(k)\bigr),$ (4)

where $r_i$ is the vector that determines the position of the $i$-th neuron in the lattice, $r_j$ is the vector that determines the position of the $j$-th neuron in the lattice, and $h(\cdot)$ is the neighborhood function. Obviously, for the neuron-winner $h = 1$. Most often, the Gaussian

$h(r_i, r_j) = \exp\!\left(-\frac{\|r_i - r_j\|^2}{2\sigma^2}\right)$

is chosen as the neighborhood function; in this case, the network output signals are defined by the same Gaussian weighting. Then, steps 2 and 3 are repeated until the network values stabilize with the specified accuracy.

The meaning of the tuning by expression (4) is to minimize the difference between the input vector $x(k)$ and the vector of synaptic weights of the neuron-winner. In other words, in the process of tuning, this algorithm «brings» the vector of synaptic weights of the neuron-winner closer to the current input image $x(k)$: the difference $x(k) - w_j(k)$ is scaled by the value $\eta(k)$, which sets the learning rate. Thus, it can be said that training is reduced to rotating the vector of the neuron weights in the direction of the input vector without significantly changing its length.

The work [18] proposed an original online evolving fuzzy clustering method (EFCM) based on a probabilistic approach. The main parameter that ultimately determines the final result is the radius of the formed clusters; it is selected from empirical considerations and ultimately determines the number of possible classes. Despite the effectiveness of probabilistic fuzzy clustering (FC) algorithms, their «weakness» is the «hard» restriction

$\sum_{j=1}^{m} u_j(k) = 1$ for each observation $x(k)$, $j = 1, 2, \dots, m.$
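Steps 2 and 3 above (winner selection and neighborhood-weighted correction) can be sketched as follows; the 1-D lattice, the learning-rate decay and the parameter values are illustrative assumptions:

```python
import numpy as np

def som_step(w, x, k, eta0=0.5, sigma=1.0):
    """One self-organizing map update: w is the (m, n) weight matrix,
    x the normalized input vector, k the iteration number."""
    # Step 2: the winner is the neuron whose weight vector is closest to x.
    winner = int(np.argmin(np.linalg.norm(w - x, axis=1)))

    # Step 3: Gaussian neighborhood over a 1-D lattice of neuron positions.
    r = np.arange(len(w))
    h = np.exp(-((r - winner) ** 2) / (2.0 * sigma ** 2))  # h = 1 for winner

    eta = eta0 / (1.0 + k)  # decaying learning rate (illustrative choice)
    w = w + eta * h[:, None] * (x - w)
    return w, winner
```

Repeating this step over the sample rotates each winning weight vector toward the current input image, as described above.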
This disadvantage is absent in the so-called possibilistic fuzzy clustering approach (PCM) [17], associated with the optimization of the objective function

$E = \sum_{k=1}^{N}\sum_{j=1}^{m} u_j^{\beta}(k)\,\|x(k) - c_j\|^2 + \sum_{j=1}^{m}\mu_j \sum_{k=1}^{N}\bigl(1 - u_j(k)\bigr)^{\beta},$ (8)

where $c_j$, $j = 1, 2, \dots, m$, is the center of gravity (centroid) of the $j$-th cluster, which is calculated during data processing; $\beta > 1$ is the fuzzification parameter («fuzzifier»), which determines the «blurring» of boundaries between clusters and is usually set to $\beta = 2$; $\mu_j > 0$ is the scalar parameter that determines the distance at which the membership level takes the value 0.5: if $\|x(k) - c_j\|^2 = \mu_j$, then $u_j(k) = 0.5$. However, in this approach an observation may formally belong to all classes at once, that is, be equidistant from all centroids while actually belonging to none of the clusters. Using the possibilistic approach leads to the evolving fuzzy clustering method (EFCM), which is conveniently written in the form of a sequence of steps.
Step 4: upon receipt of the observation $x(1)$, the first cluster with the centroid $c_1 = x(1)$ is formed.
Step 5: upon receipt of the next observation $x(2)$, the following condition is checked:

$\|x(2) - c_1\|^2 \le \Delta,$

where the threshold $\Delta$ is set a priori.

If this condition is fulfilled, the observation $x(2)$ does not form a new center of gravity, meaning that it belongs to the first cluster with the membership level

$u_1(2) = \left(1 + \left(\frac{\|x(2) - c_1\|^2}{\mu_1}\right)^{\frac{1}{\beta-1}}\right)^{-1},$

and the centroid is corrected according to the WTA («winner-takes-all») Kohonen self-learning rule [18]:

$c_1(k+1) = c_1(k) + \eta(k)\bigl(x(k) - c_1(k)\bigr).$

If, on the contrary, the inequality

$\|x(2) - c_1\|^2 > \Delta$ (14)

holds, then a second cluster with a centroid

$c_2 = x(2)$ (15)

is formed. In this case, the membership levels $u_2(1)$ and $u_1(2)$ must be calculated by the formulas below.
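The evolving decision of step 5 can be sketched as a single update routine (a sketch; the learning-rate value is an assumption):

```python
import numpy as np

def efcm_observe(centroids, x, delta, eta=0.5):
    """Process one observation: if the squared distance to the nearest
    centroid is within the a priori threshold delta, the winning centroid
    is pulled toward x by the WTA rule; otherwise the observation founds
    a new cluster with itself as the centroid."""
    d2 = np.sum((centroids - x) ** 2, axis=1)
    j = int(np.argmin(d2))
    if d2[j] <= delta:
        centroids[j] = centroids[j] + eta * (x - centroids[j])  # WTA rule
    else:
        centroids = np.vstack([centroids, x])  # new cluster, c = x
    return centroids
```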
Step 6: Checking the architecture's ability to handle the amount of information entering its input.
Therefore, if there are $N$ observations and $m$ clusters with centroids $c_j$, all the membership levels and the adjusted coordinates of the centroids, obtained as a result of minimizing (8) with respect to all evaluated parameters, are calculated according to the relations

$u_j(k) = \left(1 + \left(\frac{\|x(k) - c_j\|^2}{\mu_j}\right)^{\frac{1}{\beta-1}}\right)^{-1},\qquad c_j = \frac{\sum_{k=1}^{N} u_j^{\beta}(k)\,x(k)}{\sum_{k=1}^{N} u_j^{\beta}(k)}.$ (16)
The system of equations (16) is essentially a batch algorithm for processing information, so that upon receipt of the observation $x(N+1)$ all calculations must be performed anew. It is clear that at a sufficiently high data rate this approach may be ineffective.
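The batch character of relations (16) can be illustrated by the following sketch, in which the closed-form membership and centroid formulas are alternated over the whole sample (the parameter values and iteration count are assumptions):

```python
import numpy as np

def pcm_batch(data, centroids, mu=1.0, beta=2.0, iters=20):
    """Batch possibilistic update: every iteration recomputes the
    membership levels of all N observations and then the centroids,
    so each newly arriving observation forces a full recomputation."""
    for _ in range(iters):
        # squared distances, shape (N, m)
        d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        u = 1.0 / (1.0 + (d2 / mu) ** (1.0 / (beta - 1.0)))
        ub = u ** beta
        centroids = (ub.T @ data) / ub.sum(axis=0)[:, None]
    return centroids, u
```

At a high data rate, repeating this full pass for every new observation is precisely what makes the batch scheme ineffective online.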
For this purpose, it is necessary to develop recurrent procedures that do not require storing previously processed data. These recurrent procedures can be implemented on the basis of a two-layer adaptive neuro-fuzzy network with the following architecture.
The first hidden layer of the network is formed by ordinary Kohonen neurons $N_j^K$ connected by lateral connections through which the competition process takes place. The output layer of the network, formed by the nodes $N_j^u$, is designed to calculate the membership levels $u_j(k)$ of each observation $x(k)$ in each $j$-th cluster, $j = 1, 2, \dots, m$. To tune the cluster centroids, a recurrent self-learning procedure is used, which has the form [10]

$c_j(k+1) = c_j(k) + \eta(k)\,u_j^{\beta}(k)\bigl(x(k) - c_j(k)\bigr).$ (17)

It is easy to see that for the neuron-winner, whose membership level is equal to one, expression (17) coincides with the Kohonen WTA self-learning rule.

Step 7. Evolution of the system architecture.
The process of system evolution, like the previous one, begins with a single Kohonen neuron that specifies the coordinates of the first centroid $c_1$. The next neuron is added to the network when condition (14) holds, and a neuron with the centroid $c_2(k) = x(k)$ is formed. It should be mentioned that since the data in Kohonen neural networks are pre-normalized to the unit hypersphere, inequality (14), which determines the need to introduce new neurons into the network, takes the form

$\|x(k) - c_j(k)\|^2 = 2\bigl(1 - c_j^{T}(k)\,x(k)\bigr) > \Delta$ (20)

or

$c_j^{T}(k)\,x(k) < 1 - \frac{\Delta}{2}.$ (21)

Thus, the build-up of the architecture occurs as a result of the constant monitoring of inequalities (20) and (21): a new neuron is introduced whenever they are satisfied.
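Taken together, the recurrent self-learning procedure (17) can be sketched as follows; the possibilistic membership formula follows the definitions above, while the learning-rate and parameter values are illustrative assumptions:

```python
import numpy as np

def membership(x, c, mu=1.0, beta=2.0):
    """Possibilistic membership of observation x in the cluster with
    centroid c; equals 0.5 when the squared distance equals mu."""
    d2 = float(np.sum((x - c) ** 2))
    return 1.0 / (1.0 + (d2 / mu) ** (1.0 / (beta - 1.0)))

def recurrent_update(centroids, x, eta=0.1, mu=1.0, beta=2.0):
    """One step of the recurrent self-learning procedure (17): every
    centroid is corrected online, observation by observation, in
    proportion to its fuzzified membership level, so no previously
    processed observations need to be stored."""
    for j in range(len(centroids)):
        u = membership(x, centroids[j], mu, beta)
        centroids[j] = centroids[j] + eta * (u ** beta) * (x - centroids[j])
    return centroids
```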
Using the possibilistic approach, it is advisable to implement another «branch of evolution»: if at some point in time it turns out that the membership levels of the observation $x(k)$ in all existing clusters do not exceed some threshold value, a new cluster is formed. The popular Xie-Beni index [10] can be used to estimate the quality of fuzzy clustering [14]. For a fixed sample of $N$ observations, the index has the form

$XB = \frac{\sum_{j=1}^{m}\sum_{k=1}^{N} u_j^{\beta}(k)\,\|x(k) - c_j\|^2}{N \min_{j \ne l}\|c_j - c_l\|^2}.$

For online processing, this index, like the cluster centroids, can also be calculated recurrently:

$XB(k+1) = \frac{S(k+1)}{(k+1)\min_{j \ne l}\|c_j - c_l\|^2},\qquad S(k+1) = S(k) + \sum_{j=1}^{m} u_j^{\beta}(k+1)\,\|x(k+1) - c_j\|^2.$ (23)

Adding expression (23) to the learning procedure (17) allows additional control over the number of clusters formed by the system. Thus, introducing a third threshold $d$ and checking the condition

$XB(k) \le d$ (26)

at every step makes it possible to stop the process of neuron build-up in the event of a violation of inequality (26).
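For reference, the fixed-sample form of the Xie-Beni index can be sketched as follows (a sketch; lower values indicate more compact, better-separated clusters):

```python
import numpy as np

def xie_beni(data, centroids, u, beta=2.0):
    """Xie-Beni index for a fixed sample of N observations: cluster
    compactness divided by N times the minimal squared distance
    between centroids."""
    n = len(data)
    compact = sum(
        float(np.sum((u[:, j] ** beta) * np.sum((data - c) ** 2, axis=1)))
        for j, c in enumerate(centroids)
    )
    sep = min(
        float(np.sum((centroids[j] - centroids[l]) ** 2))
        for j in range(len(centroids))
        for l in range(len(centroids))
        if j != l
    )
    return compact / (n * sep)
```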

Example of the practical use of the proposed method
The method of training artificial neural networks for intelligent decision support systems is proposed. The proposed methodology is simulated in the MathCad 14 software environment.
A two-dimensional artificially generated dataset was used for the experiment. It contained 15 clusters with different levels of overlap. The data sample contained 5,000 observations. Data were submitted for sequential processing.
Consider the example of setting up cluster centers for the evolving Kohonen neural network to solve clustering problems.
In the first stage of setting up the centers, three situations are possible.
1. The distance between the new observation and the center of the nearest existing cluster satisfies the correction condition. In this case, the cluster moves its center towards the new observation. It should be mentioned that the magnitude of this movement depends on the parameter η(k): when a new observation arrives, the cluster center shifts toward that observation. In this example, the parameter η(k) = 0.5.
2. The distance between the new observation and the centers of all existing clusters exceeds the threshold. In this case, a new cluster is created, and the new observation becomes the center of the new cluster.
3. The new observation falls within existing clusters. In this case, the observation belongs to those clusters, and the centers of these clusters do not change.
To compare the quality of clustering, FCM and a system based on the evolving fuzzy clustering method (EFCM) with different threshold parameter values were used.
The Xie-Beni index was used as the criterion for assessing the quality of clustering. Table 1 presents the comparative results of clustering.
For the next experiment, a data sample describing the characteristics of the monitoring object was used (Table 2). The parameters of the analyzed systems, as well as the number of identified clusters, are also given in Table 2. The Xie-Beni (XB) index was used to evaluate the quality of the systems. It should be mentioned that the proposed system showed a better PC (partition coefficient) result than the EFCM and a better running time than the FCM. The proposed system and FCM identified three fuzzy clusters.
The research of the developed methodology showed that the proposed training method provides on average 10–18 % higher training efficiency of artificial neural networks and does not accumulate errors during training (Tables 1, 2). These results can be seen in the last rows of Tables 1 and 2 as the difference of the Xie-Beni index.

Discussion of the results on the development of methods for training artificial neural networks
The advantages of this method are achieved by performing a series of additional procedures, namely procedures 3–8 shown in Fig. 2.
Thus, not only synaptic weights, but also other parameters of artificial neural networks are adjusted, which provides more accurate tuning of artificial neural networks.
The main advantages of the proposed methodology are:
- processing more information owing to the improved learning algorithm;
- increased reliability of the obtained results due to the absence of learning error accumulation during the training of artificial neural networks, which is achieved by adjusting the parameters and architecture of the artificial neural network;
- wide scope of application (decision support systems);
- simplicity of mathematical calculations;
- prevention of «overtraining» by adjusting the network architecture;
- no need to store the results of previous calculations.
The disadvantages of the proposed methodology include:
- loss of information in the estimation (forecasting) due to the construction of the membership function. This loss can be reduced by choosing the type of membership function and its parameters in the practical implementation of the proposed methodology in decision support systems; the choice of the membership function depends on the computing resources of a particular electronic computing device;
- lower accuracy of estimation on a separate parameter of state estimation;
- loss of accuracy of results during the reorganization of the artificial neural network architecture.
The method will allow:
- training artificial neural networks;
- identifying effective measures to improve the efficiency of artificial neural networks;
- increasing the efficiency of artificial neural networks by learning the parameters and architecture of networks;
- reducing the use of computing resources of decision support systems;
- developing measures aimed at improving the efficiency of training artificial neural networks;
- increasing the efficiency of information processing in artificial neural networks.
The areas of further research should be aimed at reducing the computational cost of processing different data types in special-purpose systems.

Conclusions
1. The algorithm of training artificial neural networks for intelligent decision support systems is developed.
Improving the efficiency of information processing and reducing the estimation error are achieved by:
- training not only the synaptic weights of the artificial neural network, but also the type and parameters of the membership function;
- training the architecture of artificial neural networks;
- calculating data in one epoch without having to store previous calculations. This reduces the information processing time, since there is no need to access the database;
- the absence of accumulation of training errors as a result of processing the information coming to the input of artificial neural networks.
2. An example of applying the proposed methodology to the clustering of a monitoring object is given. The example showed an increase in the efficiency of artificial neural networks at the level of 10–18 % by the Xie-Beni index, as well as an increase in the efficiency of information processing through the use of additional training procedures for artificial neural networks.