Development of Complex Methodology of Processing Heterogeneous Data in Intelligent Decision Support Systems

A complex methodology for processing heterogeneous data in intelligent decision support systems is developed. Its purpose is to increase the efficiency of processing heterogeneous data in such systems. The methodology consists of the following interrelated procedures: a model for storing heterogeneous data; an algorithm for synchronizing heterogeneous data; an algorithm for separating heterogeneous data; and an algorithm for indexing heterogeneous data. The model for storing heterogeneous intelligence data, which underlies the methodology, differs from existing ones in its use of templates of intelligence objects and templates of intelligence object parameters. These templates make it possible to store both unstructured heterogeneous intelligence data and structured intelligence data according to a defined pattern, which reduces data access time. Within the storage model, algorithms for synchronizing, separating and indexing heterogeneous intelligence data are developed. The technique was developed in response to the need to process various types of information in intelligent decision support systems more efficiently at acceptable computational complexity. The proposed method increases the efficiency of intelligent decision support systems through integrated processing of the data circulating in them: the efficiency of information processing in decision support systems increases by 16 to 20 %, depending on the amount of information about the monitoring object.


Introduction
… databases with the possibility of their graphical analysis and visualization [1,2].
Processing different types of data from various sources requires a large number of computational operations under strict constraints on computation time.
This drives the search for new scientific approaches to processing different types of information in order to increase the efficiency of information retrieval systems.
One way to increase the efficiency of information retrieval systems is to increase the efficiency of decision support systems (DSS). DSS became especially widespread in the processing of large data sets, where they provide information support to decision-makers.
The creation of intelligent DSS was a natural continuation of the widespread use of classical DSS. Intelligent DSS provide information support for all production processes and services of enterprises (organizations, institutions).
Intelligent DSS are also widely used to solve specific military tasks, for example [1,2]:
- planning the deployment and operation of communication and data transmission systems;
- automation of command and control of troops and weapons;
- collection, processing and generalization of intelligence information on the state of intelligence objects, etc.
When decisions that correspond to the operational environment are developed and made in automated control systems (ACS), the key role is played by the information and knowledge that justify them to the decision-maker (DM). This information and knowledge must meet the requirements of completeness, reliability, adequacy and consistency.
These requirements necessitate the use of today's highly developed software and hardware and of new information technologies, namely DSS. Extensive and effective use of these tools at control points at different levels is becoming a vital component of making informed decisions in ACS.
In summary, it is necessary to solve an urgent scientific problem: the development of a comprehensive methodology for processing various types of data in intelligent decision support systems.

Literature review and problem statement
In [3], known methods of processing various types of information are analyzed. It was found that researchers identify a number of problems associated with extracting data for further analytical processing:
- data in sources are, as a rule, presented in various formats, encodings and forms, so solving analytical problems requires a uniform, universal format supported by both the data warehouse and the analytical applications;
- excessively detailed data contained in the sources must be cleaned and generalized, and the methods and algorithms designed for this purpose are often more complex than the analysis algorithms themselves;
- information processing and distribution methods are not used in an integrated way.
In [4], a generalized metric is developed for the problem of analyzing multidimensional data with different types of features. The essence of the proposed metric is that clustering, classification and association algorithms can be built on it using classical processing methods.
The metric is designed to assess the proximity of objects with given features by reducing it to scalar numerical values. This reduces the problem to the classical numerical form and makes it possible, in principle, to apply known methods and algorithms. However, this metric does not function effectively when computing resources are insufficient.
In [5], an approach is proposed for in-depth analysis of various types of data affecting the energy efficiency of buildings, based on representing the hierarchy of factors as a multidimensional cube with different levels of abstraction. This approach builds a multi-level description of the object, but it does not take into account the uncertainty about the state of the monitoring object, which prevents a full assessment of its condition. The approach also assumes that sufficient computing resources are available.
In [6], an approach to processing various types of data obtained from an unmanned aerial vehicle, implemented in the GRASS GIS software environment, is presented. It is based on three-dimensional raster image processing methods with subsequent reduction of their redundancy. However, this approach is intended only for processing graphical information and does not take into account the type of uncertainty about the state of the monitoring object.
In [7], the spatial and temporal degradation caused by soil erosion is described using the erosion potential method. This method is based on analytical processing of various types of data on the factors influencing the erosion process. It is characterized by high reliability, lower computational complexity and simplicity, and is suitable for use in GIS. However, it is intended only for processing various types of cartographic information when sufficient computing resources are available, which limits its scope.
In [8], a method of binary classification of social network users based on logistic regression is given. This method makes it possible to process a variety of information about social network users. However, it requires significant computing resources and does not take into account the uncertainty about the state of the monitoring object (in this case, social network users).
In [9], the problem of processing information from heterogeneous technical monitoring devices is considered. As a possible solution, a generalized information processing method is proposed, based on clustering territorially combined monitoring information sources and on a frame model of the knowledge base for identifying monitoring objects. The clustering technique is based on the Lance–Williams hierarchical agglomerative procedure with the Ward metric. The frame model of the knowledge base is built with object-oriented modeling tools. The disadvantages of this generalized methodology are that it does not take into account the relative importance of events and cannot operate under a shortage of computing resources. It also cannot redistribute computing resources between elements to increase the efficiency of information processing.
In [10], a method is developed for processing different types of acoustic information from different sources to identify the level of information security of unmanned autonomous objects. It is based on a two-layer neural network with sigmoid hidden neurons. However, this method is intended only for acoustic information, requires significant computing resources and does not take into account the degree of uncertainty about the state of the monitoring object.
In [11], a method is proposed for determining the type of signal modulation by analyzing various signal parameters with a convolutional neural network. The method is highly effective, but it can be used only for radio monitoring problems, requires significant computing resources and does not take into account the degree of uncertainty about the state of the monitoring object.
In [12], a signal identification method for unmanned aerial vehicles is proposed. It is based on an artificial neural network and uses a knowledge base of radio wave propagation that takes geographical coordinates into account. Its disadvantages are that it is limited to radio monitoring problems, requires significant computing resources and does not take into account the degree of uncertainty about the state of the monitoring object.
In [13], a methodology for correcting general geometric and topological errors in geographic information systems is developed. It corrects errors that occur when different types of data in geographic information systems are converted from analog to digital form. However, it is not intended for types of information other than geometric, requires significant computing resources and does not take into account the degree of uncertainty about the state of the monitoring object.
In [14], an approach is proposed for converting aerial photographs and satellite images into a digital landscape model in geographic information systems, based on the results of geomorphological, geobotanical, reclamation, erosion and other surveys. However, this approach does not take into account other types of information circulating in geographic information systems or the uncertainty about the state of the monitoring object.
In [15], an intelligent system for processing various types of data circulating in geographic information systems is proposed. It is designed to solve geological exploration problems in geographic information systems. The essence of this approach is that the compared components of the target mineral system are converted into a set of maps, which leads to automatic updating of the map. However, this intelligent system is designed only for geological exploration problems and requires significant computing resources.
In [16], an approach to monitoring the state of the power grid using a geographic information system is proposed. Its essence is that information about the state of the grid is integrated on the basis of its comprehensive processing. However, this approach is designed solely to monitor the state of the grid, requires significant computing resources and does not take into account the uncertainty about the state of the monitoring object.
In [17], an integrated GIS platform architecture is proposed that is designed to meet the requirements of processing and analyzing real-time spatio-temporal data using cloud computing. This platform does not take into account the relative importance of the events that have to be analyzed and further processed, and it requires significant computing resources.
In [18], an approach to using GIS to process and present geotechnical data in formats useful to engineers, planners and land management professionals is described. This approach significantly reduces the time needed to process the data circulating in the GIS. However, it is intended exclusively for land management and does not take into account the importance of the information circulating in the system. It also requires significant computing resources and cannot distribute information to improve the efficiency of its processing.
The work [19] considers the use of GIS for processing various types of spatial data to optimize urban planning. This approach increases the energy efficiency of urban planning, optimizes transport links and supports the formulation of city development strategies. It requires significant computing resources and does not take into account the degree of uncertainty of the data about the monitoring object.
In [20], an approach to processing various data types on the basis of GIS is presented. Its essence is to analyze the energy efficiency of buildings by technical, economic, environmental and social criteria. Energy balance and spatial analysis were performed with a geographic information system. This approach requires significant computing resources and complete information about the state of the monitoring object.
The work [21] presents an approach to processing various data on the state of a communication channel. Its essence is to assess the state of the communication channel by three indicators with different dimensions. This approach allows an integrated assessment of the channel state using fuzzy logic. It requires complete information about the state of the channel and accumulates evaluation error during operation.
The analysis showed that the known methods (techniques):
- have high computational complexity;
- do not take into account the degree of awareness of the state of the monitoring object;
- do not index heterogeneous data;
- do not synchronize heterogeneous data;
- do not allow complex processing and distribution of information about the state of the monitoring object.
Therefore, it is necessary to develop a comprehensive methodology for processing various types of data in intelligent decision support systems that can effectively process and distribute large data sets under uncertainty and a shortage of computing resources.

The aim and objectives of the study
The aim of the study is to develop a comprehensive methodology for processing different types of data in intelligent decision support systems under conditions of heterogeneous data and uncertainty about the state of the monitoring object.
To achieve the aim, the following objectives were set:
- to develop a model for storing heterogeneous data in accordance with the data lake concept;
- to develop an algorithm for synchronizing heterogeneous data;
- to develop an algorithm for separating heterogeneous data;
- to develop an algorithm for indexing heterogeneous data.

Development of a model for storing heterogeneous data
The comprehensive methodology for processing different types of data in intelligent decision support systems under conditions of heterogeneous data and uncertainty about the state of the monitoring object consists of the following interrelated procedures:
- a model for storing heterogeneous data according to the data lake concept;
- an algorithm for synchronizing heterogeneous data;
- an algorithm for separating heterogeneous data;
- an algorithm for indexing heterogeneous data.
Owing to the diversity of the information processed in a decision support system (geographical, textual, video, photo), these types of intelligence information are combined under the general concept of data.
In computer science and information technology, data are defined as:
- a reinterpretable representation of information in a formalized form suitable for transmission or processing [23–25];
- the forms of information presentation that information systems and their users deal with.
Thus, in decision support systems the problem is complicated by the heterogeneity of data. Based on an analysis of the processes that occur during data processing, a classification of the data types used in decision support systems is proposed. Table 1 shows the relationship between the types of intelligence information and the corresponding classes of data types used in decision support systems.
By the location of data sources, data come either from concentrated sources (a single source of information) or from geographically distributed sources. The field concerned with collecting and processing data from multiple sources is called data fusion.
Accordingly, the data were classified by various criteria used in processing in decision support systems. The developed classification is presented in Fig. 1.
Depending on the data structure, data can be divided into 3 main classes: structured, unstructured and semi-structured.
Structured data are usually stored in databases, where they are already available and processed in a fixed format. Unstructured data, in contrast, are data of unknown structure, and extracting useful information from them involves a number of processing difficulties. Typical examples of unstructured data are collections that combine plain text files, images and video files. Semi-structured data combine properties of structured and unstructured data. The growing number of data sources inevitably increases the volume of data, which gave rise to the concept of "big data". Big data are characterized by volume, velocity (speed of arrival), variety and variability.
To solve this problem, data storage based on the "data lake" principle is proposed. The idea of a data lake is to store raw, unstructured or semi-structured data in their original format until they are needed. This allows users to request smaller, more up-to-date and more flexible data sets, so query time can be reduced compared with working in a data store, data warehouse or relational database. As a result of a system analysis of the data collection process and data structure, a model for storing heterogeneous intelligence data in accordance with the data lake concept is proposed.
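To make the three structural classes concrete, the following sketch labels an incoming payload. The rules are assumptions for illustration only, not the paper's classification procedure: a record whose keys exactly match a known fixed schema is treated as structured, a parsable JSON document with arbitrary keys as semi-structured, and anything else (free text, image or video bytes) as unstructured. The schema `KNOWN_SCHEMA` and the function name are hypothetical.

```python
import json

# Hypothetical fixed schema of a structured record (illustration only)
KNOWN_SCHEMA = {"object_id", "timestamp", "value"}


def classify_structure(payload):
    """Return 'structured', 'semi-structured' or 'unstructured' for a payload."""
    # A mapping whose keys exactly match the fixed schema fits a relational table
    if isinstance(payload, dict) and set(payload) == KNOWN_SCHEMA:
        return "structured"
    # Other in-memory mappings or lists are self-describing but schema-free
    if isinstance(payload, (dict, list)):
        return "semi-structured"
    if isinstance(payload, (str, bytes)):
        try:
            parsed = json.loads(payload)
        except ValueError:
            # Covers both invalid JSON and bytes that are not valid text
            return "unstructured"
        if isinstance(parsed, (dict, list)):
            return "semi-structured"
    return "unstructured"
```

In a data lake all three kinds would be kept in their original form; a label like this would only decide how a payload is read later, not whether it is stored.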

Each data source can have many parameters with different types of intelligence data $M_i$, defined according to the "Parameter Template". The structure of such a source of intelligence data $ds$ is presented as follows:
$$ds = \{p_1, p_2, \ldots, p_n\},$$
where $p_i$ is the $i$-th parameter of the source.
The model for storing heterogeneous intelligence data, which underlies the methodology, differs from existing ones in its use of templates of intelligence objects and templates of intelligence object parameters. The templates allow distributed storage of both unstructured heterogeneous intelligence data and structured intelligence data according to a specific pattern, which reduces data access time.
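A minimal sketch of the storage model as Python data classes, assuming the hierarchy implied by the indexing scheme later in the paper (object template, object, source, parameter, file). All class and field names are hypothetical. The point illustrated is that the templates fix which parameters an object may carry, while the raw values themselves are stored unchanged, in the data lake spirit.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ParameterTemplate:
    """One parameter of an intelligence object (hypothetical fields)."""
    name: str
    data_type: str          # e.g. "number", "text", "image"
    unit: str = ""


@dataclass
class ObjectTemplate:
    """Template of an intelligence object: the parameters it may have."""
    template_id: str
    parameters: dict[str, ParameterTemplate]


@dataclass
class DataSource:
    """A data source holding, per parameter, a list of raw (t, v) records."""
    source_id: str
    records: dict[str, list[tuple[float, Any]]] = field(default_factory=dict)


@dataclass
class IntelligenceObject:
    object_id: str
    template: ObjectTemplate
    sources: dict[str, DataSource] = field(default_factory=dict)

    def add_record(self, source_id: str, param: str, t: float, v: Any) -> None:
        # Only parameters declared in the object template are accepted;
        # the value itself is stored as-is (structured or unstructured)
        if param not in self.template.parameters:
            raise KeyError(f"{param!r} not in template {self.template.template_id!r}")
        source = self.sources.setdefault(source_id, DataSource(source_id))
        source.records.setdefault(param, []).append((t, v))
```

Rejecting undeclared parameters at write time is one way to read "storing according to a defined pattern"; the paper does not say how template violations are handled.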
In order to increase the efficiency of synchronization of heterogeneous data, an algorithm for synchronization of heterogeneous data was developed.

Development of an algorithm for synchronizing heterogeneous data
The algorithm for synchronizing heterogeneous data is shown in Fig. 3.
1. A set of intelligence data P is obtained. At this stage, the uncertainty about the information sources is taken into account according to expressions (10). […]
Each item in the resulting list L is a block of intelligence data in JSONObject format. Data on an intelligence object can be heterogeneous, which in some situations greatly complicates further analysis. Therefore, to simplify the processing of heterogeneous intelligence data, this paper proposes an algorithm for separating heterogeneous intelligence data based on the parameter template.
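Only the first step of the synchronization algorithm survives in the text, so the following is a hypothetical reading rather than the authors' procedure. It assumes that synchronization means merging the time-ordered record streams of several sources into one list L, each block tagged with a per-source uncertainty value in [0, 1] standing in for the paper's uncertainty expressions. Plain dictionaries stand in for JSONObject blocks; the function name and field names are assumptions.

```python
import heapq


def synchronize(streams, uncertainty):
    """Merge per-source streams into one time-ordered list L of JSON-like blocks.

    streams:     dict source_id -> list of (t, payload) pairs sorted by t
    uncertainty: dict source_id -> assumed uncertainty of the source in [0, 1]
    """
    # Tag every record with its source so the merged stream keeps provenance
    tagged = [
        [(t, source_id, payload) for t, payload in records]
        for source_id, records in streams.items()
    ]
    L = []
    # heapq.merge interleaves the already-sorted streams by time t
    for t, source_id, payload in heapq.merge(*tagged, key=lambda rec: rec[0]):
        L.append({
            "t": t,
            "source": source_id,
            # A source with no known uncertainty is treated as maximally uncertain
            "uncertainty": uncertainty.get(source_id, 1.0),
            "payload": payload,
        })
    return L
```

Merging sorted streams instead of re-sorting everything keeps the cost at O(N log S) for N records from S sources, which matters when new blocks arrive continuously.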

Development of an algorithm for separating heterogeneous data
The algorithm for separating heterogeneous data is shown in Fig. 4.
1. The list L (the result of the heterogeneous intelligence data synchronization algorithm) is obtained.
2. An empty list R of JSONObject items is created to hold data about the intelligence object according to the "Parameter Template".
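The separation algorithm breaks off after its second step. The sketch below assumes, following Conclusion 3, that separation picks out of each block in L the fields named by the parameter template (the selection parameters) and collects them in R together with their unit, time and source, keeping processed and raw values alike. The template format (parameter name mapped to unit) and all field names are hypothetical.

```python
def separate(L, parameter_template):
    """Split synchronized blocks L into per-parameter entries R.

    parameter_template: dict param_name -> unit (hypothetical format)
    Each block in L is a dict with 't', 'source' and a 'payload' dict.
    """
    R = []  # step 2: empty list for data about the intelligence object
    for block in L:
        payload = block["payload"]
        if not isinstance(payload, dict):
            continue  # an opaque payload has no named parameters to select
        for name, unit in parameter_template.items():
            if name in payload:
                R.append({
                    "parameter": name,
                    "unit": unit,
                    "t": block["t"],
                    "source": block["source"],
                    "value": payload[name],  # processed or raw, kept as-is
                })
    return R
```

Fields that are not in the template (such as `noise` in the test data) are simply not selected; whether the original algorithm discards or archives them is not stated.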

Development of an algorithm for indexing heterogeneous data
To increase the efficiency of collecting and storing heterogeneous data, an algorithm for indexing heterogeneous data is proposed within the heterogeneous data storage model. An efficient storage structure for heterogeneous data in data lakes provides convenient and fast access to the storage for batch data processing.
Each file in the storage of heterogeneous intelligence data is a pair $\langle t, v \rangle$, where $t$ is the generation time of the intelligence data and $v$ is its value. Fig. 5 shows a file indexed in the storage according to the principle of indexing heterogeneous intelligence data in the data warehouse.
The elements of the proposed indexing scheme for heterogeneous intelligence data are as follows:
1. The_root_to_store_heterogeneous_data is the root directory for storing heterogeneous data.
2. ot_IDn is the identifier of the n-th object template.
3. od_IDm is the identifier of the m-th unique object.
4. ds_IDk is the identifier of the k-th unique source of this object.
5. p_IDl is the identifier of the l-th unique parameter of the source.
6. f_IDp is the identifier of the p-th unique file of the parameter.
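The directory hierarchy above can be sketched with `pathlib`. Naming the file after the generation time t follows the paper; the exact filename format (a fixed-width UTC timestamp with a `.json` suffix, t given in Unix seconds) and the JSON encoding of the pair (t, v) are assumptions, as are the function names.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def index_path(root, ot_id, od_id, ds_id, p_id, t):
    """Build root/ot_ID/od_ID/ds_ID/p_ID/<t>.json for a block generated at time t."""
    # Fixed-width UTC timestamp, so file names sort in generation order
    stamp = datetime.fromtimestamp(t, tz=timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    return Path(root) / ot_id / od_id / ds_id / p_id / f"{stamp}.json"


def store(root, ot_id, od_id, ds_id, p_id, t, v):
    """Write the pair (t, v) to its indexed location and return the path.

    v must be JSON-serializable in this sketch.
    """
    path = index_path(root, ot_id, od_id, ds_id, p_id, t)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"t": t, "v": v}))
    return path
```

Because the timestamp has a fixed width, sorting the file names of one parameter directory also sorts its blocks by generation time, which suits batch processing over a time window.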
To reduce data processing time, each file is given a name that contains the generation time t of its data block.
Let us model the complex methodology of processing data of different types according to the procedure algorithms in Fig. 2–5 and expressions (1)–(3). The proposed technique for processing data of various types in intelligent decision support systems is modeled in the Umbrello UML Modeller software environment (for processing different data types) and in MathCad 14 (for estimating computational complexity).
The Unified Modeling Language (UML) is the standard notation for modeling large information systems based on the object-oriented methodology. One of the available tools is Umbrello UML Modeller, which meets two key requirements: it is free and cross-platform. This free software application is designed for building UML diagrams and supports all of their standard types. In Umbrello UML Modeller, a frame knowledge base for identifying monitoring objects was built as a UML class diagram, in which each frame is represented as a class with its own attributes and procedures.
To estimate the computational complexity, the method was modeled to determine the number of computing operations it requires.
Initial data for assessing the condition of the monitoring object using the proposed method:
- the number of sources of information about the condition of the monitoring object is 3 (radio monitoring, remote sensing devices and unmanned aerial vehicles);
- the number of information features that determine the state of the monitoring object is 13 (affiliation, type of organizational formation, priority, minimum width on the front, maximum width on the front, number of personnel, minimum depth on the flank, maximum depth on the flank, total number of personnel, number of pieces of armaments and military equipment, number of types of armaments and military equipment, number of communication devices, type of communication devices);
- the options for organizational and staff formations are company, battalion and brigade.
For the experiment, 8 sample frames were used: "Monitoring object", "Unit", "Operational and tactical grouping of troops", "Brigade", "Battalion", "Company", "Detected object" and "Selected cluster". Each sample frame except "Monitoring object" and "Unit" has corresponding instance frames that contain information about specific units. The resulting diagram has a clear hierarchical structure and three types of relationships between frames: generalization, association and dependency.
The "Unit" frame is linked to the "Monitoring object" frame by a generalization relationship and is its descendant. The "Operational and tactical grouping of troops", "Brigade", "Battalion" and "Company" frames are linked to the "Unit" frame in the same way and are its descendants. When the knowledge base is filled, instance frames are created with a structure similar to these frames, but with slots filled according to the information about specific units.
The "Detected object" frame is a generalized form that is filled at monitoring points (posts) and contains information about the class, type, coordinates and number of objects (in the case of a group object). This frame is linked by an association relationship to the "Selected cluster" frame. The slots of the "Selected cluster" frame are filled as a result of clustering territorially combined monitoring information sources, the implementation method of which is given above. The slot values are determined by the attached procedures centerKoordFinding (finding the coordinates of the center of the selected cluster) and clusterSizeFinding (finding the size of the selected cluster). The resulting set of selected clusters, represented by a set of instance frames of the "Selected cluster" type, is passed to the procedure of identifying monitoring objects. During identification, the frames corresponding to the selected clusters are compared with the reference frames contained in the knowledge base.
The result is a conclusion about the cluster correspondence to a specific unit (or identification of a new monitoring object) and its current state.
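The frame hierarchy and its attached procedures can be sketched as Python classes, with subclassing playing the role of the generalization relationship and a list of members playing the role of the association between "Selected cluster" and "Detected object". The slot names and the formulas inside `centerKoordFinding` and `clusterSizeFinding` are assumptions, since the paper names these procedures but does not define them: here the center is taken as the mean of member coordinates and the size as the largest member-to-center distance.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MonitoringObject:
    name: str


@dataclass
class Unit(MonitoringObject):          # generalization: a Unit is a Monitoring object
    personnel: int = 0


@dataclass
class OperationalTacticalGrouping(Unit):
    pass


@dataclass
class Brigade(Unit):
    pass


@dataclass
class Battalion(Unit):
    pass


@dataclass
class Company(Unit):
    pass


@dataclass
class DetectedObject:
    """Filled at monitoring posts: class, type, coordinates and count."""
    obj_class: str
    obj_type: str
    x: float
    y: float
    count: int = 1


@dataclass
class SelectedCluster:
    members: list[DetectedObject] = field(default_factory=list)  # association

    def centerKoordFinding(self):
        # Assumed: arithmetic mean of the member coordinates
        n = len(self.members)
        return (sum(m.x for m in self.members) / n,
                sum(m.y for m in self.members) / n)

    def clusterSizeFinding(self):
        # Assumed: largest distance from a member to the cluster center
        cx, cy = self.centerKoordFinding()
        return max(math.hypot(m.x - cx, m.y - cy) for m in self.members)
```

Identification would then compare the slots of each `SelectedCluster` instance with reference frames such as `Battalion`; that matching rule is not reproduced here.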
The procedure therefore consists of the following stages:
- primary processing of information from monitoring information sources and filling of frames of the "Detected object" type;
- clustering of the identified monitoring information sources and filling of frames of the "Selected cluster" type;
- identification of the selected clusters and formation of conclusions.
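The clustering stage refers back to the clustering of territorially combined monitoring sources described for [9], which uses the Lance–Williams hierarchical agglomerative procedure with the Ward metric. Assuming the same procedure is used here, a minimal, unoptimized sketch of that technique for point coordinates follows. It starts from squared Euclidean distances between single points and updates them with the Ward coefficients of the Lance–Williams formula; stopping at a requested number of clusters, and the function name, are assumptions.

```python
from itertools import combinations


def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward linkage via the Lance-Williams update.

    points: list of coordinate tuples; returns clusters as sorted index lists.
    """
    def key(a, b):
        return (a, b) if a < b else (b, a)

    clusters = {i: [i] for i in range(len(points))}
    # Initial dissimilarity between singletons: squared Euclidean distance
    dist = {
        (i, j): sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        for i, j in combinations(range(len(points)), 2)
    }
    next_id = len(points)
    while len(clusters) > max(n_clusters, 1):
        # Merge the pair of clusters with the smallest Ward dissimilarity
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        n_i, n_j = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        new_dist = {}
        for k, members in clusters.items():
            n_k = len(members)
            # Lance-Williams with Ward coefficients:
            # d(k, i+j) = [(n_i+n_k) d(k,i) + (n_j+n_k) d(k,j) - n_k d(i,j)] / (n_i+n_j+n_k)
            new_dist[key(k, next_id)] = (
                (n_i + n_k) * dist[key(k, i)]
                + (n_j + n_k) * dist[key(k, j)]
                - n_k * d_ij
            ) / (n_i + n_j + n_k)
        # Drop every distance that involves one of the two merged clusters
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        dist.update(new_dist)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())
```

This version is O(n³) and is meant only to show the update rule; for real data a library implementation of Ward linkage would be the usual choice.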
Based on these initial data, the distances between the points given in Table 1 are obtained. Fig. 6 shows graphical estimates of the data processing efficiency of the proposed method by the criterion of the number of computing operations. The developed complex technique was compared with the known techniques presented in [9,10,12], denoted as 1, 2 and 3, respectively.
The dependencies in Fig. 6 show that the proposed technique increases the efficiency of information processing (reduces the number of computing operations) by 16 to 20 %, depending on the amount of information about the monitoring object. This gain is explained by combining the consideration of different types of data on the state of the monitoring object with the procedures for synchronizing, separating and indexing heterogeneous data.
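The text does not state how the 16–20 % figure is computed. Assuming it is the relative reduction in computing operations compared with a known method, the arithmetic is (N_known − N_proposed) / N_known × 100 %. The operation counts in the example below are invented for illustration and are not the paper's data.

```python
def relative_gain(n_known, n_proposed):
    """Percentage reduction in computing operations relative to a known method."""
    if n_known <= 0:
        raise ValueError("n_known must be positive")
    return 100.0 * (n_known - n_proposed) / n_known


# Hypothetical operation counts, for illustration only
print(relative_gain(1000, 820))  # 18.0
```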
The proposed technique differs from known ones in that it:
- allows high-quality processing of large arrays of different types of data of a numerical and quantitative nature;
- has lower computational complexity due to the redistribution of computing operations;
- takes into account the degree of awareness about the condition of the monitoring object;
- allows complex processing and distribution of information about the state of the monitoring object.
This method increases the efficiency of processing different types of data by 16–20 %. Its advantages are:
- the possibility of rational allocation of information system resources to respond to different events, taking into account their relative importance;
- minimization of the total time required to respond to each new circumstance;
- limiting the degree of human participation in the management cycle of complex processing resources, together with automatic selection of the option for forming scenarios for solving monitoring tasks.
The limitations of this method include the need for communication channels with high reliability of information transmission and minimal delay. This is due to the need to process information in close to real-time mode and high requirements for the reliability of information circulating in special-purpose decision support systems.
The disadvantages of this method include the need to process large data sets to determine the state of the monitoring object.
The practical significance of the developed technique lies in a significant increase in the efficiency of integrated data processing in automated control systems.
The methodology is proposed for use in developing software for heterogeneous data processing and decision-making in ACS DSS of artillery units, special-purpose geographic information systems, ACS DSS of aviation and air defense, and ACS DSS of logistics of the Armed Forces of Ukraine.
This research continues earlier work by the authors on methodological principles for improving the efficiency of data processing in special-purpose geographic information systems [26,27]. Further research should aim at reducing computing costs when processing various types of data in special-purpose information systems.

Conclusions
1. A model for storing heterogeneous data in accordance with the data lake concept is developed. The model, which underlies the methodology, differs from existing ones in its use of templates of intelligence objects and templates of intelligence object parameters. These templates allow distributed storage of both unstructured heterogeneous intelligence data and structured intelligence data according to a certain scheme, which reduces data access time.
2. An algorithm for synchronizing heterogeneous data is developed. It takes into account the type of uncertainty about information sources, does not require editing database volumes and does not require accounting for changes in objects in a separate storage.
3. An algorithm for separating heterogeneous data is developed that separates both processed and unprocessed data with different units of measurement and origin. The separation is performed over a set of selection parameters that characterize the intelligence object.
4. An algorithm for indexing heterogeneous data is developed that indexes heterogeneous data (processed and unprocessed) for batch processing.
Taken together, these scientific results constitute a comprehensive methodology for processing different types of data in intelligent decision support systems. The methodology increases the efficiency of processing different types of information (reduces computational complexity) while taking into account uncertainty about the state of the monitoring object, and it allows complex processing and distribution of information on the condition of the monitoring object.
Fig. 6. Results of evaluating the effectiveness of the proposed algorithm