IMPROVED ALGORITHM FOR MATCHED-PAIRS SELECTION OF INFORMATIVE FEATURES IN THE PROBLEMS OF RECOGNITION OF COMPLEX SYSTEM STATES

Volodymyr Osypenko, Doctor of Technical Sciences, Professor*, E-mail: vvo7@ukr.net
Borys Zlotenko, Doctor of Technical Sciences, Professor, Head of Department*, E-mail: zlotenco@ukr.net
Tetiana Kulik, Doctor of Technical Sciences, Associate Professor*, E-mail: t-81@ukr.net
Svitlana Demishonkova, PhD, Associate Professor*, E-mail: mashuk2007@ukr.net
Oleh Synyuk, Doctor of Technical Sciences, Professor**, E-mail: synoleg@ukr.net
Volodymyr Onofriichuk, PhD**, E-mail: volodymyronofriychuck@gmail.com
Svitlana Smutko, PhD, Associate Professor**, E-mail: svsmutko@gmail.com
*Department of Computer Engineering and Electromechanics, Kyiv National University of Technologies and Design, Nemirovycha-Danchenka str., 2, Kyiv, Ukraine, 01011
**Department of Machines and Apparatuses, Electromechanical and Power Systems, Khmelnytskyi National University, Instytutska str., 11, Khmelnytskyi, Ukraine, 29016

The problem of computer diagnostics of complex systems is one of the non-trivial tasks of modern information technology. Such systems are, for example, computer networks and automatic and/or automated control systems for complex technological objects, including those related to complex problems of environmental protection, biology, etc. In pattern recognition, one of the major problems is forming subspaces of informative features, which only in the «ensemble» allow diagnosing the states of such systems with a high degree of reliability. An effective approach to solving this problem based on the principles of inductive modeling of complex systems is proposed. The quality criterion for recognizing classes of patterns is formulated, which also makes it possible to evaluate the quality of the constructed ensemble of informative features.
As an example, the problem of constructing an ensemble of informative features represented by a binary code based on the data of an experiment to determine the hazard levels of some plant protection products is considered. Real primary data on plant protection products used in practice were applied to recognize the effect of certain characteristics on the so-called integrated «hazard indicator». Comparative numerical estimates of the effectiveness of the proposed approach are given. In this case, there can be a fivefold gain in the amount of computations for a relatively small number of input features equal to 5 compared to the known algorithms of the class considered in the paper. It is shown that, from a practical point of view, the described algorithm has advantages over the known algorithms with brute-force search of feature subspaces in pattern recognition problems.


Introduction
The problem of computer diagnostics of complex systems is one of the non-trivial tasks of modern information technology. Such systems include computer networks, complex technical architectures of information processing systems (primarily their hardware part), automatic (automated) control systems for complex technological objects, complex electromechanical systems, etc. Systems related to complex problems of environmental protection, medicine, biology, and agriculture are also complex systems that need constant diagnostics. This area includes a wide scientific and applied direction of pattern recognition, where one of the major problems is forming subspaces of informative features. Only in the «ensemble» do these features allow diagnosing the states of such systems with a high degree of reliability. However, real complex technical (technological) systems can be described by a large number of characteristics (parameters), which, in turn, may have a hierarchical structure. That is, one parameter may consist of several subparameters, which may have different effects on the state or behavior of the object under study.
In such cases, it becomes necessary to work with multidimensional spaces of parameters (features, in terms of pattern recognition). Current, predicted, or even critical states of modern complex systems can be described by multidimensional feature spaces $x_i \in X$, $i = 1, 2, \ldots, n$. At the same time, the effectiveness of recognition (diagnosis) of a particular state may not necessarily depend on the entire available a priori set of features. Only a certain part of it can be decisive: the subspace of informative features for a given task, the so-called «ensemble» of informative features $x_i^* \in X^* \subset X$, $i = 1, 2, \ldots, n$. This means that precisely such features and precisely in such a «composition» make it possible to best recognize a situation that has developed in a complex technological object.
The solution to any recognition problem is directly related to the problem of finding a relevant feature system. Obviously, this is due to the huge variety of applied areas, which differ significantly in nature (physical, material, informational, biological, economic, etc.) and require the use of modern methods and tools of pattern recognition theory.
It is known that an exhaustive search over all feature subspaces of an available input set requires a fairly large amount of computations. For example, let the input set have 50 features and let each candidate feature subspace contain at most 5 of them. Then the number of options that need to be created and tested in order to select the optimal «ensemble» of informative features reaches 2,369,935. This, of course, requires numerous additional mathematical operations, so applying such an exhaustive search method in operational control problems of complex technological systems can be quite complicated and inconvenient.
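The count quoted above can be checked directly as the sum of binomial coefficients over subset sizes 1 to 5. The following is a minimal sketch; the function name `count_feature_subsets` is introduced here for illustration only.

```python
from math import comb


def count_feature_subsets(n_features: int, max_size: int) -> int:
    """Number of non-empty feature subsets containing at most max_size features."""
    return sum(comb(n_features, k) for k in range(1, max_size + 1))


# 50 input features, subspaces of at most 5 features
print(count_feature_subsets(50, 5))  # 2369935
```

The dominant term is the number of 5-feature subsets, so the count grows quickly as either the input set or the allowed subspace size increases.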
Therefore, it is obvious that the problem of constructing (or selecting) feature subspaces in real applied pattern recognition problems remains very relevant today.

Literature review and problem statement
The effectiveness of solving a pattern recognition problem is usually evaluated by special quality (accuracy) criteria of recognition on a test data sample. A measure of feature informativeness can be a value that quantifies the ability of such a feature to recognize classes of patterns $k_r \in K$, where $K$ is the number of clusters specified or formed in the process of recognition.
There are many approaches to assessing the informativeness of features, both in the practical and theoretical plane of pattern recognition theory. For example, [1] presents an approach to selecting a subset of features using a genetic algorithm. The feasibility of this approach for selecting a subset in the automated design of neural networks for pattern classification and knowledge discovery is demonstrated. A sequential scheme for selecting factors was applied. In [2], an attempt was made to apply self-organization methods to the problem of constructing a subset of features. However, the basic principles of computer self-organization of models in the construction of features are not applied. The paper [3] presents statistical criteria for assessing the informativeness of features of radiation sources of telecommunications networks and systems during their recognition. In [4], a rather wide set of statistical criteria for the features described by real numbers is presented. Evaluation here is also performed sequentially by the brute-force search for primary features, which, according to the authors, makes it possible to determine priorities of features and highlight the most informative ones. The paper [5] presents a semantic and [6] statistical approach to reducing spaces of input features to subspaces of their informative subsets. The works [7][8][9] are to some extent encyclopedic publications on pattern recognition in many areas in this powerful direction. Of course, the problems of constructing subsets of informative features for solving recognition problems are also covered.
It should be noted that these approaches can be quite effective with a small number of input features. For example, combinatorial or similar methods of search for all possible combinations of ensembles do well with the number of input features n ≤ 20. However, in some practical recognition problems, in particular and especially with binary descriptions, this number reaches hundreds and even thousands with limited capabilities of computer systems. The authors of [10] proposed an algorithm for selecting an ensemble of features, which applies the basic principles of the self-organization theory. This expanded the possibilities of selecting an ensemble of informative features compared to exhaustive combinatorial search, but since the advent of the algorithm described in [10], the complexity of problems, of course, has increased significantly.
Thus, further development of tools to reduce the amount and time of computation is important in the general field of pattern recognition theory.

The aim and objectives of the study
The aim of the study is to develop an improved algorithm for constructing an «ensemble» of informative binary features using the basic principles of inductive modeling of complex systems for pattern recognition problems.
To achieve the aim, the following objectives were set:
- to develop an improved algorithm for matched-pairs selection of informative features with step-by-step multi-row «selection» of intermediate results;
- to develop a criterion for assessing the quality of formed subspaces of informative features for specific problems;
- to conduct an experimental interpretation of the algorithm for forming «ensembles» of informative features to confirm its efficiency.

Research materials and methods
The research is based on a methodology that can be formulated as inductive modeling of complex systems (IMCS) based on input experimental data with interference. This methodology, in addition to many other applications, is aimed at solving pattern recognition problems in various fields and, particularly, in the field of innovative design of complex systems. The proposed algorithm uses the IMCS principles, in particular, the architecture of multi-row algorithms of the Group Method of Data Handling (GMDH) [11,12], which distinguishes it from the approaches reviewed above.
As is known [12], the IMCS methodology is based on three fundamental principles borrowed from different scientific fields, which together form a holistic system of provisions. These principles can be formulated as follows:
1) the principle of heuristic self-organization, i.e. search for many candidate models and selection of the best ones by appropriately constructed so-called external model selection criteria («selection hypothesis»);
2) the principle of external complement, i.e. the need to use «fresh information» in order to objectively verify models according to special criteria of regularity (accuracy);
3) ...
Although the roots of such methods for solving recognition problems date back to the seventies and eighties of the twentieth century, for example [15], as of today they have been sufficiently developed in both theoretical and applied aspects. For example, the work [16] can be considered an encyclopedic collection of basic GMDH algorithms, including those that use schemes of multi-row inductive modeling algorithms. Most of these algorithms introduce the so-called structural-parametric identification of models of complex systems with automatic selection of subsets of informative parameters. The works [17,18] also apply the principles of computer inductive modeling for clustering problems, including those that operate with large (several hundred, for example) dimensions of input feature spaces. However, the approaches presented in these works do not apply matching of features.
In general, practical applications of the IMCS methodology, in particular GMDH, have shown its effectiveness in various areas for problems with high levels of interference [12]. In this paper, this powerful direction of computer modeling in terms of using a multi-row architecture to build computational algorithms has also found direct application.
The computer experiment used the SELECT computer program developed by the authors, which implements a multi-row algorithm for matched-pairs selection of informative features. Some source materials for the experimental study of the proposed algorithm are taken from open sources, namely the statistical yearbook of the Food and Agriculture Organization of the United Nations (FAO), and were subsequently processed (binarized).

1. Multi-row algorithm for matched-pairs selection of informative features
Classically, the decomposition of a general pattern recognition problem includes the following subtasks: 1) generating a set of primary features; 2) selection of a subset (subspace) of informative features; 3) construction of a decision rule or classifier; 4) assessment of recognition quality (usually on examination data samples).
In general, the problem of selecting informative features for further synthesis of decision (recognition) rules can be formulated as follows.
Let a sample of input (a priori) data be given, where $X = \{x_i^j\}$, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m$, is the array of values of input features of the object or process under study; $i$ is the index of a feature in the set, and $j$ is the index of a pattern (image, instance) in the given sample. It is necessary to select from the original array of features $X$ the combination of features $X^*$ that provides a minimum of a given evaluation criterion of the constructed set of features.

1.1. Criteria used
The proposed algorithm uses the so-called criterion «number of resolved disputes», which was formulated in [9], as the main one. This criterion allows distinguishing patterns at the information level. Table 1 shows two classes of patterns $R_1$ and $R_2$. Note that a set of images that can be divided into (or in which one can objectively highlight) more than two classes can be reduced to a set with two classes. To do this, the first class $R$ includes all images of some class, and the so-called «non-R class» $\bar{R}$ includes all other images of the original set. Recognition of class $R$ is carried out as if against the background of the «non-R class» $\bar{R}$.

Table 1. Illustration of the optimization criterion «number of resolved disputes» (columns: Patterns, Features)

Here $\omega_i$ (or $\omega_j$) is the i-th (or j-th) image vector of the sample set; $x_k$ is the k-th component (feature) of this vector; $\omega_i \in R$, $\omega_j \in \bar{R}$. The «dispute resolution matrix» for the feature $x_k$ is the following matrix:

where $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m$; $i \neq j$; $k = 1, 2, \ldots, K$.
In this example, for the features $x_1$, $x_2$, $x_3$, we have the following matrices:

The multiplicity of dispute resolution is the value (denoted as $q_{\min}$) corresponding to the minimum term $a_{i,j}$ in (3).
The criterion «number of resolved disputes» in this case requires choosing the matrix (and hence the feature), where [8]:

$$q_{\min} \to \max. \quad (5)$$

That is, for some primary feature $x_i$, the dispute resolution matrix with greater $q_{\min}$ is better, which follows from natural prerequisites for solving recognition problems in the presence of interference.
As an additional criterion, the following one is expedient:

$$N(q_{\min}) \to \min, \quad (6)$$

which requires choosing the dispute resolution matrix for which the number of elements corresponding to the multiplicity $q_{\min}$ is minimal. That is, it is actually a system of criteria for selecting the best options of feature subspaces:

$$q_{\min} \to \max, \quad N(q_{\min}) \to \min. \quad (7)$$

In this example, from the constructed matrices for features $x_1$, $x_2$, $x_3$, it can be seen that the primary features do not allow distinguishing patterns of class $R_1$ from patterns of class $R_2$. This is explained by the fact that their «dispute resolution» matrices (3) contain zero terms, i.e. unresolved disputes. The results can be improved by «overlapping» (a kind of element-by-element summation) matrices (3) by two, three, etc. In this case, the following options are possible:

The choice, obviously, will be in favor of the ensemble $\{x_1, x_3\}$. In addition, the number of features in this ensemble needed to successfully recognize images of the two classes $R_1$ and $R_2$ is less ($n^* = 2$) than in the ensemble $\{x_1, x_2, x_3\}$, for which the values of the criteria are the same. That is, the chosen ensemble allows solving the problem with a smaller number of features, where $n^*$ corresponds to the optimal number of features in the informative ensemble $\{X^*\}$ for this problem.
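The criteria above can be illustrated with a short sketch. It assumes, since the element rule of the dispute resolution matrix is not reproduced in this text, that an element equals 1 when the feature takes different values on an image of $R$ and an image of $\bar{R}$, and 0 otherwise. The data and all function names are hypothetical, and the exhaustive loop over subsets is only the brute-force baseline, not the matched-pairs algorithm.

```python
from itertools import combinations


def dispute_matrix(class_r, class_not_r, k):
    """a[i][j] = 1 if feature k differs between image i of R and image j of non-R
    (assumed rule; the original matrix definition is not reproduced here)."""
    return [[int(a[k] != b[k]) for b in class_not_r] for a in class_r]


def overlap(matrices):
    """Element-by-element sum of equally shaped matrices."""
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*matrices)]


def criteria(matrix):
    """Return (q_min, N): the smallest element and the number of its occurrences."""
    flat = [v for row in matrix for v in row]
    q_min = min(flat)
    return q_min, flat.count(q_min)


def best_ensemble(class_r, class_not_r, size):
    """Brute-force baseline: best subset of the given size by q_min -> max, N -> min."""
    n = len(class_r[0])
    mats = [dispute_matrix(class_r, class_not_r, k) for k in range(n)]

    def score(subset):
        q_min, count = criteria(overlap([mats[k] for k in subset]))
        return (-q_min, count)

    return min(combinations(range(n), size), key=score)


# Hypothetical binary images with three features (indices 0, 1, 2)
R = [(1, 0, 0), (0, 1, 1)]
NOT_R = [(1, 0, 1), (0, 1, 0)]

print(criteria(dispute_matrix(R, NOT_R, 0)))  # (0, 2): unresolved disputes remain
print(best_ensemble(R, NOT_R, 2))             # (0, 2)
```

On this toy data no single feature resolves every dispute, while the overlapped matrix of the first and third features has no zero elements.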

1.2. Description of the multi-row matched-pairs feature selection algorithm
Unlike the algorithms with brute-force search of dispute resolution matrices mentioned above, this algorithm performs their matching (search). Note that as a result of the algorithm, several matrices can be constructed for which the values of criterion (7) are equal. In such a rare case, the matrix is chosen for which the number of elements equal to the next value above $q_{\min}$ is minimal. The algorithm consists of the following blocks.
A1: rejecting obviously non-informative features. Non-informative features are those having the same value in all images, i.e. such that

$$x_i^1 = x_i^2 = \ldots = x_i^m, \quad i = 1, 2, \ldots, n, \quad (8)$$

where $x_i^j$ is the i-th feature of the j-th image, $m$ is the total number of images in the original set, and $n$ is the input number of features.

A2: rejecting images with the same feature vectors as non-informative in advance.
A3: constructing dispute resolution matrices for primary features $x_i$, $i = 1, 2, \ldots, \tilde{n}$, using rule (4), where $\tilde{n}$ takes into account possible exclusions of features in block A2.

A8: selection of the final ensemble of features. Under the conditions of the last two blocks, the selection stop rule is set: on the last selection row, only the single best pair of features in terms of (5), (6) is chosen rather than the $F$ best pairs. Given block A6, it is possible to express the ensemble in terms of the input feature space. Thus, the constructed and selected ensemble $X^*$ allows distinguishing images of class $R_k$ from all other images, i.e. those of class $\bar{R}_k$.
Operations A3-A8 are repeated as many times as there are classes specified by the experts in the input set of images.
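The two preprocessing blocks A1 and A2 are simple enough to sketch. The code below is only an illustration under stated assumptions: images are tuples of binary values, class labels are ignored, and the function names are hypothetical. The intermediate blocks of the algorithm are not covered.

```python
def drop_constant_features(images):
    """Block A1: keep only the feature columns that take more than one value."""
    n = len(images[0])
    keep = [i for i in range(n) if len({img[i] for img in images}) > 1]
    reduced = [tuple(img[i] for i in keep) for img in images]
    return reduced, keep


def drop_duplicate_images(images):
    """Block A2: keep the first occurrence of each distinct feature vector."""
    return list(dict.fromkeys(images))


# Hypothetical input: feature 0 is constant, images 0 and 2 coincide
images = [(1, 0, 1), (1, 1, 1), (1, 0, 1), (1, 1, 0)]
reduced, kept = drop_constant_features(images)
print(kept)                            # [1, 2]
print(drop_duplicate_images(reduced))  # [(0, 1), (1, 1), (1, 0)]
```

Returning the list of kept column indices makes it possible to map a selected ensemble back to the numbering of the input feature space.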

2. Quality assessment of the selected ensemble of informative features
The informativeness of the obtained ensemble of informative features is assessed by the minimum of a functional reflecting the accuracy of object recognition in the test sample. To decide whether the control sample $\omega_1^k$ belongs to a certain class $R_k$, a decision rule is built on the training part of the sample in a perfect disjunctive normal form, where $\varphi(\omega_i^{X_k})$ is the conjunction built for the selected ensemble $X_k$ for the images $\omega_i \in R_k \subset \Omega$.
Then in the case of correct recognition of the k-th class, we have:

The criterion of recognition accuracy for the k-th class is written:

where $D_k^*$ is the negation of the left part of (10), built on the set $R_k^* \subset \Omega_k$ for the class $R_k^*$. Therefore, the functional (11) displays incorrect recognition for the selected ensemble.
The functional that displays correct recognition, and, therefore, characterizes the quality of the constructed ensemble of features, is as follows:

To illustrate the effectiveness of the quality criterion of the selected ensemble of informative features, which will display correct recognition, input binary data on physicochemical and toxic properties of substances can be used (Table 2).

Table 2. Input data on physicochemical and toxic properties of substances (binarized)
Columns: No.; $R_i$; physicochemical properties: MW, $X_{f1}$; MP, $X_{f2}$; WS, $X_{f3}$; VL, $X_{f4}$

Table 1 shows that the pattern $\omega_2$ does not differ from $\omega_3$, and $\omega_4$ from $\omega_6$, so the functions $D_1$ and $D_2$ contain not three but two conjunctions. Thus, the system of rules (13) can recognize objects belonging to different classes by applying the already constructed ensemble of informative features $\{x_1, x_3\}$.
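A decision rule in disjunctive normal form over a selected ensemble can be sketched as follows. The sketch assumes that each training image of a class contributes one conjunction over the ensemble features, so the rule fires when the projection of a control image onto the ensemble coincides with the projection of some training image. The data and function names are hypothetical.

```python
def build_rule(training_images, ensemble):
    """Set of distinct conjunctions of a class: projections of its training images
    onto the ensemble. Identical projections collapse into one conjunction."""
    return {tuple(img[i] for i in ensemble) for img in training_images}


def recognizes(rule, image, ensemble):
    """True if at least one conjunction of the rule holds for the image."""
    return tuple(image[i] for i in ensemble) in rule


ENSEMBLE = (0, 2)                        # hypothetical ensemble: first and third features
CLASS_1 = [(1, 0, 0), (0, 1, 1), (1, 1, 0)]
CLASS_2 = [(1, 0, 1), (0, 1, 0)]

D1 = build_rule(CLASS_1, ENSEMBLE)       # two conjunctions from three images
D2 = build_rule(CLASS_2, ENSEMBLE)

control = (0, 0, 1)                      # image not present in the training part
print(recognizes(D1, control, ENSEMBLE), recognizes(D2, control, ENSEMBLE))  # True False
```

Because two training images of the first class coincide on the ensemble, its rule contains two conjunctions rather than three, which is the same effect the text describes for $D_1$ and $D_2$.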

3. Experimental application of the algorithm for selecting ensembles of informative features
An experiment to determine the hazard levels of some plant protection products on the basis of primary measurement data of the studied environment was considered. Measurement data come from special sensors through communication channels to the computer to recognize the impact of certain characteristics (features x i ∈X, i = 1,2,...,n) on the integrated «hazard indicator» W, which can be described in the feature space {X}. Here the task is not to consider the purely technical side of the experiment, but to apply the above algorithm for selecting informative factors-features. Such indicators can be obtained within an automated environmental monitoring system to study those that most affect the value of W. This emphasizes that the described algorithm can be applied not only in the «technical» field, but also in other areas of research, such as in environmental studies or in health research with specified input data.
Input a priori information (pre-measured values of factors) is quite cumbersome, so Table 2 shows only a fragment of it with already binarized data. In Table 2, the following abbreviations for the properties of the substance are adopted: MW is molecular weight, MP is melting point, WS is water solubility, VL is volatility, LD$_{50}$ is the median lethal dose for white rats, and AF is the accumulation factor.
The numbers 1, 2, ..., 29, 30 in Table 2 indicate the gradation numbers of the six properties with 5 levels for each. For example, for WS (water solubility), the number 11 ($x_{11}$) corresponds to the range of (0.01-0.02) g/l, 12 ($x_{12}$) to (0.03-0.04) g/l, etc., and 15 ($x_{15}$) to (0.09-0.10) g/l. The values of the other features were obtained in the same way, each property having its own ranges and units.
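The gradation scheme can be sketched as a mapping from a measured value to five binary features. Only the first, second, and last water solubility ranges are stated in the text; the two middle ranges below are assumed by analogy, and the function name `binarize` is hypothetical.

```python
def binarize(value, ranges, first_index):
    """Return {feature_number: 0 or 1}, with 1 for the range that contains the value."""
    return {first_index + k: int(lo <= value <= hi) for k, (lo, hi) in enumerate(ranges)}


# Water solubility gradations x11..x15 in g/l; the ranges for x13 and x14 are assumed
WS_RANGES = [(0.01, 0.02), (0.03, 0.04), (0.05, 0.06), (0.07, 0.08), (0.09, 0.10)]

codes = binarize(0.035, WS_RANGES, first_index=11)
print(codes)  # {11: 0, 12: 1, 13: 0, 14: 0, 15: 0}
```

A value that falls between two listed ranges yields all zeros in this sketch; how such values were handled in the original data preparation is not stated.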
Based on the results of the synthesis of the subsystem of features, a certain conclusion can be made for ecological and technical environmental monitoring from the standpoint of minimizing the negative impact of a particular product W (e.g., a pesticide in agriculture) (Table 3).

Table 3. Results of selecting informative features for five classes of levels W
(physicochemical properties: MW, MP, WS, VL; toxic properties: LD$_{50}$, AF)

| $R_k$ | MW, $X_{f1}$ | MP, $X_{f2}$ | WS, $X_{f3}$ | VL, $X_{f4}$ | LD$_{50}$, $X_{f5}$ | AF, $X_{f6}$ |
| $R_1$ | $x_2$ | $x_7$ | $x_{11}$ | $x_{16}$ | $x_{25}$ | $x_{26}$ |
| $R_2$ | $x_2$ | $x_6$ | $x_{11}$ | $x_{20}$ | $x_{24}$ | $x_{26}$ |
| $R_3$ | $x_4$ | $x_8$ | $x_{11}$ | $x_{16}$ | $x_{24}$ | $x_{27}$ |
| $R_4$ | $x_4$ | $x_7$ | $x_{12}$ | $x_{16}$ | $x_{24}$ | $x_{28}$ |
| $R_5$ | $x_4$ | $x_7$ | $x_{13}$ | $x_{16}$ | $x_{22}$ | $x_{28}$ |

Table 3 shows that the most hazardous substances of class $R_1$ have high toxicity ($x_{25}$) and pronounced accumulation properties ($x_{26}$). Substances of class $R_5$, on the contrary, have low toxicity ($x_{22}$) and low accumulation ability ($x_{28}$).

Discussion of the results of using the improved algorithm for matched-pairs selection of informative features
The described algorithm has advantages over known algorithms with brute-force search of feature subspaces in pattern recognition problems with large differences of input features represented by a binary code. This can be illustrated by the following example.
Suppose we have a problem of relatively low dimension, where the number of recognition classes k = 2, the number of features n = 16, the number of objects (images) l = 16 and by one computational procedure we mean the computation of one element of the matrix (3).
With the exhaustive combinatorial search of all possible options, the number of operations $Q_1$ is given by expression (16); with multi-row brute-force search [9], the number of operations $Q_2$ is given by expression (17); and with the improved multi-row matched-pairs feature selection algorithm, the number of operations $Q_3$ is given by expression (18). Comparing the improved algorithm for matched-pairs selection of informative features with the other algorithms aimed at solving the same problem shows the following. With the exhaustive combinatorial search of all options of structures of feature subsets, for the given values of the input parameters k, n and l, it is necessary to search through the number of combinations represented by expression (16), which is $17 \cdot 10^8$ operations. The algorithm with multi-row sequential matching of features, under the same conditions and achieving a similar result, performs this task in $6 \cdot 10^5$ operations. That is, it is much faster, and the gain is $S = Q_1 / Q_2 \approx 5$ times. The application of the improved multi-row matched-pairs feature selection algorithm to find the desired result according to expression (18) is estimated at about $1.2 \cdot 10^5$ operations. This gives a gain of $S = Q_1 / Q_3 \approx 130$ times in the amount of computations.
Such results are due to, firstly, matching of features and, secondly, the principle of multi-row selection of ensembles, inherent in the inductive approach to computer modeling of complex systems.
It should also be noted that these estimates for the second and third methods of selecting the resulting subset of features show the upper limits of the number of operations. In fact, these values may even be significantly lower. This is because the quality of ensembles of features is assessed on each selection row. The required set of informative features can be achieved earlier than would be required with the exhaustive combinatorial search of all options and their evaluation by the same criteria (6)-(8).
Although no loss of features from the primary information base during the multi-row procedure was found in test experiments, such a possibility exists and requires additional research in the future.

Conclusions
1. One of the approaches to solving the general problem of constructing subspaces of informative features presented by a binary code in the problems of recognition of complex system states is proposed. The algorithm uses the feature matching procedure, which indicates its effectiveness and makes it possible to significantly reduce the amount of computations.
2. Quality criteria for recognizing classes, $q_{\min} \to \max$ and $N(q_{\min}) \to \min$, are formulated, which also make it possible to assess the quality of the constructed ensemble of informative features in the system of criteria $q_{\min}$, $N(q_{\min})$.
3. An example of using the proposed algorithm to solve a specific practical problem of selecting an ensemble of informative features with an assessment of the effectiveness of such an ensemble in the examination sample is given. From a practical point of view, the described algorithm has advantages over known algorithms with brute-force search of feature subspaces in pattern recognition problems. This is shown for a relatively small problem (number of recognition classes k = 2, number of features n = 16, number of objects (images) l = 16). From this, we can conclude that the efficiency of such an algorithm will increase with increasing dimension of the feature space.

Introduction
The quadratic assignment problem (QAP) is a well-known problem in which a set of facilities is allocated to a set of locations in such a way that the cost is a function of the distance and flow between the facilities. In this problem, costs are also associated with a facility being placed at a certain location. The objective is to minimize the total cost of assigning each facility to a location, as given in [1,2].
The QAP has application in wiring a computer backboard, in designing a hospital layout and in the dartboard