DEVELOPMENT OF A COMBINED IMAGE RECOGNITION MODEL

The object of research is the processes of identification and classification of objects in computer vision tasks. Currently, the best image recognition results are demonstrated by artificial neural networks. However, training a neural network is an ill-conditioned problem. Ill-conditioning means that even a large data set can carry only a small amount of information about the problem being solved. Therefore, the training data play a key role in the synthesis of the parameters of a specific mathematical model of a neural network. Selecting a representative training set is one of the most difficult tasks in machine learning and is not always possible in practice.
The new combined model of image recognition proposed in the paper, which uses the theory of non-force interaction, has the following key features:
- designed to handle large amounts of data;
- selects useful information from an arbitrary stream;
- allows new objects to be added naturally;
- tolerant of errors and allows the behavior of the system to be reprogrammed quickly.
Compared to existing analogues, the recognition accuracy of the proposed model was higher than that of the known recognition methods in all experimental studies. The average recognition accuracy of the proposed model was 71.3 %, versus 59.9 % for local binary patterns, 65.2 % for principal component analysis and 65.6 % for linear discriminant analysis. Such recognition accuracy, combined with the low computational complexity, makes this method acceptable for use in systems operating in conditions close to real time. This approach also makes it possible to control the recognition accuracy. This is achieved by adjusting the number of sectors of the histograms of local binary patterns used in the description of images and the number of image fragments used at the classification stage by the introformation approach. The number of image fragments largely determines the classification time, since the matching of the system actions has to be calculated pairwise in each of the possible directions.


Introduction
Every year, the volume of information grows as processes that were previously performed manually are formalized and then handled algorithmically. A person receives 80 % of information through vision, so systems for automatic image processing are in demand. One of the key concepts in automatic processing is object recognition, which has been an area of active work over the past twenty years. When algorithms perform recognition at the level of a human expert, automation speeds up data processing systems and increases their efficiency.
Despite significant success, image recognition only in some areas can match or exceed the cognitive function of human perception in terms of the quality of the result. In general, the problem of image recognition is still not completely solved.
There are various methods for image recognition: potential functions, Bayesian networks, Markov networks, artificial neural networks, various types of associative memory, and so on. A study of the image recognition problem has shown that the methods in use do not fully take into account the features of graphic objects, chief among which is the small amount of a priori data on the reference descriptions of the objects to be recognized. Therefore, it is important to develop a universal model that, given a small training set, assigns the object of recognition to a certain class of objects with high probability.

The object of research and its technological audit
The object of research is the processes of identification and classification of objects in computer vision tasks. Currently, the best image recognition results are shown by stochastic models, namely a subclass of artificial neural networks: convolutional neural networks. However, training a neural network is an ill-conditioned problem. Ill-conditioning means that even a large data set can carry only a small amount of information about the problem being solved. Therefore, the training data play a key role in the synthesis of the parameters of a specific mathematical model of a neural network. Selecting a representative training set is one of the most difficult tasks in machine learning and is not always possible in practice. So there is a need to develop a universal method that, given a small training set, assigns the object of recognition to a certain class of objects with a high degree of probability.
TECHNOLOGY AUDIT AND PRODUCTION RESERVES - № 3/2(47), 2019 ISSN 2226-3780

The aim and objectives of research
The aim of research is to increase the efficiency of image processing in solving technical problems through the use of new approaches. To achieve this aim, the following objectives must be met:
1. Analyze the developed methods for computational complexity in solving the practical problem of image processing.
2. Develop a combined method for solving the problem of image processing.
3. Justify the need and advantages of using the new combined image processing method.

Research of existing solutions of the problem
In the past few years, significant progress has been made in the field of object recognition under small variations in lighting, scene, etc. [1], but reliable recognition under more extreme changes has remained unattainable.
The task of object recognition with the help of classifiers is traditionally divided into two parts: the selection of key features and the classification by these features [2]. Feature extraction is carried out due to a priori information, as a result of which structural invariance is achieved [3].
Statistical methods [4] are used for image analysis, but only in cases where high recognition accuracy is not required. The input to the system can be not only image pixels but also a low-level image representation, for example wavelet coefficients. The support vector machine is a universal approximator that can reduce the error on the training set to zero, which in turn gives grounds to expect a low generalization error on the test set. However, practice has shown that the generalization error of networks with a large number of layers (deep architecture) is lower than that of networks with a small number of layers (shallow architecture) [5]. Classical perceptrons, RBF networks and SVM have a small number of layers, and training the network becomes quite complex as the number of layers grows.
In most cases, projection directions that are orthogonal over the entire range of their class are chosen for research. The principal component analysis (PCA) method is based on a linear projection of the image space onto a low-dimensional feature space. The algorithm relies on fundamental statistical characteristics: the mean (mathematical expectation) and the covariance matrix. For each object, its principal components are calculated. The recognition process consists in comparing the principal components of an unknown object with the components of the remaining objects. However, this approach maximizes the overall scatter across all classes. The PCA projection is well suited for reconstruction from a low-dimensional basis, but it is not necessarily optimal from a discriminant point of view. The main disadvantage is the high demands on the conditions under which the images are taken. Images should be obtained under similar lighting conditions and from the same angle, and high-quality preprocessing should bring the image to standard conditions (scale, rotation, centering, brightness equalization, background clipping). Distortions and other intra-class variations are undesirable.
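The PCA-based recognition described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the function names, the use of power iteration to find only the first principal direction, and the toy two-dimensional feature vectors are all assumptions made for illustration.

```python
import math

def mean_vector(rows):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def covariance(rows, mean):
    """Sample covariance matrix of the feature vectors."""
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    d = len(mean)
    return [[sum(c[i] * c[j] for c in centered) / (len(rows) - 1)
             for j in range(d)] for i in range(d)]

def leading_component(cov, iterations=200):
    """First principal direction via power iteration (unit length)."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate data: no variance to capture
            break
        v = [x / norm for x in w]
    return v

def project(x, mean, component):
    """Coordinate of x along the principal direction."""
    return sum((xi - mi) * ci for xi, mi, ci in zip(x, mean, component))

def recognise(x, train, labels, mean, component):
    """Label of the training object whose projection is closest to that of x."""
    z = project(x, mean, component)
    distances = [abs(z - project(t, mean, component)) for t in train]
    return labels[distances.index(min(distances))]

# Toy, made-up feature vectors for two classes (hypothetical data).
train = [[1.0, 2.0], [2.0, 4.1], [8.0, 16.0], [9.0, 18.2]]
labels = ["a", "a", "b", "b"]
mean = mean_vector(train)
component = leading_component(covariance(train, mean))
print(recognise([8.5, 17.0], train, labels, mean, component))
print(recognise([1.5, 3.0], train, labels, mean, component))
```

Because the projection is chosen to capture total variance rather than class separation, a change in lighting that dominates the variance would also dominate the first component, which is the weakness noted above.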
Linear discriminant is a «classical» method for pattern recognition, in which the characteristics of an object for recognition are obtained using polar quantization of the form.
Neural network structures give rather good results due to their non-linear structure. The disadvantages of networks include the rapid growth of their size when they are tuned to a large number of invariant transformations, the use of neurons with a threshold activation function, and the need for special tuning algorithms [6].
Convolutional neural networks (CNN) belong to multilayer neural networks and are a modernized multilayer perceptron [7]. A CNN consists of two parts: one part is responsible for feature extraction, and the other is responsible for classification. Features are extracted by alternating convolution layers (C-layers) and subsampling layers (S-layers). This is one of the variants of alternating layers of complex and simple cells, which first appeared in the neocognitron. Neurons in these layers are organized into planes. Each neuron perceives information from the previous layer through its own receptive field, which is a group of neurons connected to the current neuron through adjustable weights. A two-layer perceptron acts as the classifier.
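The alternation of a convolution layer and a subsampling layer can be sketched as follows. This is an illustrative example only: the hand-written kernel, the choice of max pooling for the S-layer, and the toy image are assumptions, and a real network would learn its kernels during training.

```python
def convolve2d_valid(image, kernel):
    """C-layer: slide the kernel over the image without padding.
    Each output value depends only on its receptive field."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def subsample_max(fmap, size=2):
    """S-layer: keep the maximum of each non-overlapping size x size block
    (max pooling is one common choice; it is assumed here)."""
    return [[max(fmap[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

# Toy image: dark left part, bright right part.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
kernel = [[-1, 1]]  # responds to a dark-to-bright step from left to right

feature_map = convolve2d_valid(image, kernel)  # 5 rows x 4 columns
pooled = subsample_max(feature_map)            # 2 rows x 2 columns
print(feature_map[0])
print(pooled)
```

The pooled map still marks the edge but at half the resolution, which is how the S-layer reduces sensitivity to small shifts of the feature.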
The results of experimental studies of classification methods are described in [8, 9]. In the course of that work, a process was proposed that includes the following classification methods:
- naive Bayesian classifier (NB);
- k-nearest neighbors method (k-NN);
- decision tree (DT);
- support vector machine (SVM);
- neural network (ANN);
- linear discriminant analysis (LDA).
The results of the study are summarized in Table 1 [8, 9].
Analysis of the data presented in Table 1 shows that the neural network structure (ANN) has the lowest error rate, but neural networks have several disadvantages:
- most ANN design approaches are heuristic and often do not lead to unambiguous solutions;
- to build a model of an object based on an ANN, it is necessary to configure the internal elements and the connections between them;
- preparing a training set is hampered by the difficulty of finding a sufficient number of training examples;
- the significant time cost of training does not allow ANN to be used in real-time systems;
- the behavior of a trained ANN can't be unambiguously predicted, which increases the risk of using ANN to control expensive technical objects.
One of the key tasks in the implementation of neural network structures is the search for the optimal ratio of parameters and their characteristics in each specific case. To solve such a problem effectively, a wide range of methods, algorithms and synthesis techniques is needed, differing in the amount of computation, quality of results, time to search for solutions and methods of data presentation.
Thus, the results of the literature review allow the conclusion that such approaches and methods remain reliable means of recognition, but that it is impossible to obtain the best results for various applied tasks using only these approaches [10]. Therefore, it is necessary to use combined approaches and methods that focus on minimizing hardware costs when implementing computer vision systems.

Methods of research
The recognition task is a relation Z between I_in and I_out, where I_in is information about a certain set of objects divided into a finite number of classes; this information is presented as precedents, and only part of it (the training and control samples) is known for Z; I_out is the class labels (indexes).
Object recognition can be divided into stages:
1) selection of the observation area;
2) coding of the found area for further recognition steps;
3) comparison and construction of classification models;
4) calculation of the probability of belonging to a particular class;
5) summary of the results.
The first three stages are preparatory stages of image recognition, during which a model suitable for recognition by the chosen method is formed. Such methods are not considered in this work.
This paper discusses the combined model of image recognition based on the theory of non-force interaction. The fundamental concept of the theory of non-force interaction is the concept of information. In the generally accepted interpretation, information is understood as facts, data and knowledge obtained and transmitted in the process of interaction with other subjects. Another fundamental concept is the Vip-interpretation of motion, in which motion is represented as a set of displacements along or against the direction of motion, each with a speed equal to the speed of light in a vacuum and with a probability determined by the introformation in the content of the moving objects.
Elements of the theory of non-force interaction are used here for the first time as an apparatus for image recognition. So far, the theory of non-force interaction has mainly been used to solve the following problems:
- access to databases based on the analysis of natural language phonemes;
- assessment of investment proposals;
- assessment and prediction of the impact of harmful substances in water resources on the health of the population;
- forecasting the results of sporting events.
The image model for recognition by the introformation approach in this work is based on the voting approach and elements of the local binary pattern method. The conceptual basis for voting in computer vision is the Hough transform, and the theoretical basis is set theory, methods for calculating estimates, and statistical analysis. An important feature of a voting system is the ability to obtain the resulting list of fragments or features on which a decision is based, as well as the ability to stop recognition at an early stage once certain characteristics are reached, which saves computation.
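The two building blocks named above, local binary patterns and voting with early stopping, can be sketched as follows. This is a simplified illustration under stated assumptions, not the model evaluated in this paper: the basic 3 × 3 LBP operator, the grouping of the 256 codes into `bins` equal sectors, the strict-majority stopping rule and all function names are hypothetical choices.

```python
from collections import Counter

# Clockwise 8-neighbourhood offsets, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, r, c):
    """Basic LBP: set a bit for every neighbour that is >= the centre pixel."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img, bins=8):
    """Histogram of the LBP codes of all interior pixels.
    The 256 possible codes are grouped into `bins` equal sectors (assumed)."""
    hist = [0] * bins
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c) * bins // 256] += 1
    return hist

def vote(fragment_labels, stop_share=0.5):
    """Count one vote per image fragment and stop as soon as one class
    holds more than `stop_share` of all fragments.
    Returns the winning label and the number of fragments examined."""
    if not fragment_labels:
        raise ValueError("at least one fragment is required")
    total = len(fragment_labels)
    votes = Counter()
    for used, label in enumerate(fragment_labels, start=1):
        votes[label] += 1
        if votes[label] > stop_share * total:
            return label, used
    return votes.most_common(1)[0][0], total

patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
print(lbp_code(patch, 1, 1))     # neighbours 9, 7 and 8 are >= 6
print(lbp_histogram(patch))
print(vote(["A", "B", "A", "A", "B", "A", "B"]))
```

In this sketch, fewer histogram sectors give a shorter description and faster matching at the cost of detail, and the early stop skips the remaining fragments once the outcome can no longer change.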
From the known probabilities of the manifestations (actions) of the system (3), (4), its certainty (5) is calculated with respect to these manifestations. From the certainty of the system (5), its awareness is calculated. Then the total increase in the certainty of the action of the system, over all actions on the system, is calculated. This uses the introformation representation of the change in impulse of objects (7) or the introformation representation of the change in kinetic energy (8). The impulse corresponds to the magnitude of the influence of an image fragment on the reaction. Based on (7) or (8), the increase in the system awareness is calculated. Then the new certainty of the system action (11) and the new awareness of the system action (12) are calculated. According to physical laws, and on the basis of the new values of certainty (11) and awareness (12), the probability of the system action (13) is calculated. The obtained probabilities (13) should then be generalized.
In the general case, constructing a model for a specific object requires assigning this object to a certain class of objects according to the results of measurements of the input and output signals. This paper presents a generalized combined model of image recognition in which image processing is a flow that passes sequentially through the stages of preprocessing, image description, mapping and classification. The combined model assumes that the transition from one phase of image processing to the next occurs only after the full and successful completion of the previous phase; transitions backward, skips forward and overlapping of phases do not occur.
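The chain from probability to certainty to awareness and back to probability can be sketched in code. The functional forms used below are an assumption, since the equations themselves are not reproduced in this text: they are one self-consistent choice in which certainty is d = (2p - 1) / (2·sqrt(p(1 - p))), awareness is i = sqrt(d² + 1), and the probability is recovered as p = 0.5 + d / (2i). The function names and the additive treatment of the influences are likewise hypothetical.

```python
import math

def certainty(p):
    """Assumed form: d = (2p - 1) / (2 * sqrt(p * (1 - p))), for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return (2.0 * p - 1.0) / (2.0 * math.sqrt(p * (1.0 - p)))

def awareness(d):
    """Assumed form: i = sqrt(d^2 + 1)."""
    return math.sqrt(d * d + 1.0)

def probability(d):
    """Assumed inverse mapping: p = 0.5 + d / (2 * i)."""
    return 0.5 + d / (2.0 * awareness(d))

def update(p_prior, increments):
    """Add the total certainty increment contributed by all influences
    (for example, one per image fragment) and map the result back to a
    probability. Summing the increments is an assumption of this sketch."""
    d_new = certainty(p_prior) + sum(increments)
    return probability(d_new)

# A neutral prior (p = 0.5) has zero certainty and unit awareness.
print(certainty(0.5), awareness(certainty(0.5)))
# Two fragments that both support the action raise its probability.
print(update(0.5, [0.5, 0.25]))
```

Under these assumed forms the mapping is invertible, so `probability(certainty(p))` returns `p`, and positive increments always move the probability towards 1 without reaching it.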

Research results
Testing of the combined model of image recognition was carried out on four different data sets:
- a set of images of small objects, ALOI (Amsterdam Library of Object Images) [11]. This collection contains 1000 classes of images of small objects with 24 instances per class. Within each class, the viewing angle, the illumination angle and the illumination color vary systematically. All images have a fixed size of 192 × 144 pixels, in grayscale, in PNG format;
- a set of images of handwritten characters compiled by T. de Campos at Microsoft Research India (The Chars74K dataset) [12]. This set contains 62 classes of characters (0-9, A-Z, a-z) with 55 instances per class. Character images do not have fixed sizes and are not centered within the image. All images are strictly black and white, 1200 × 900 pixels, in PNG format;
- a set of face images from the AT&T Laboratories Cambridge (ORL Database of Faces) [13]. This set contains images of 40 people with 10 images per person. The pictures were taken at different times, with changes in lighting, facial expressions and details. All pictures were taken against a dark uniform background with the subjects in a strictly upright frontal position. All files have a fixed size of 92 × 112 pixels, in grayscale, in PGM format;
- a set of images from the CDI Set (Celebrities Data Images) [14]. From this set, 20 different people were used with 20 images per person. The pictures were taken at different times, with changes in lighting, background, shooting angle, details, facial expressions and head position. The files have various sizes, are in full color, and are in JPG format.
The studies were conducted on Apple mobile devices running the iOS 7.1.2 mobile operating system, using the Xcode 5.1.1 SDK (stable release 5B1008). To study the reliability of image recognition by the proposed combined model, several experiments were conducted on the data sets described above.
These experiments determined the dependence of the recognition accuracy on the settings of the method for describing images and their key fragments, the dependence of the recognition accuracy on the amount of input data, and the dependence of the processing time on the settings of the image description method. Sets of input data were prepared accordingly. The minimum size of the images used in the experiments was 92 × 112 pixels, the maximum 2000 × 3000 pixels.
The research showed that the proposed combined model of image recognition by the introformation method, based on local binary patterns with voting, can be used on mobile devices in conditions close to real time. This approach showed high reliability of recognition (from 60 % to 95 % depending on the set of input data), with the ability to control the recognition accuracy and time cost by adjusting the description of objects.

Compared with existing analogues, the proposed approach showed, for the most part, the best training and recognition times on large volumes of input data (Table 2). The average classification accuracy of the proposed approach was 71.3 % (LBP - 59.9 %; PCA - 65.2 %; LDA - 65.6 %). This, combined with the time cost of training and recognition, makes the approach acceptable for use on mobile devices in systems operating in conditions close to real time.

SWOT analysis of research results
Strengths. Studies have shown that the proposed combined model of image recognition supports not only image processing but also training and retraining in near real-time mode on mobile devices with limited resources, and does so better than known approaches to image recognition. The proposed combined model also makes it possible to control the reliability and time cost of the recognition process by reducing the feature space.
Weaknesses. The proposed model shows the highest reliability of recognition only in cases when the recognition object is «known» (handwritten symbol, faces, etc.). As part of this work, only the recognition model was studied and developed. Issues of classification and preprocessing in the framework of this work were not considered.
Opportunities. The promise of the proposed combined model is ensured by low computational complexity and high reliability of recognition. This makes it possible not only to work on devices with limited hardware resources (such as mobile devices), but also to build complex computing systems for processing Big Data, which is a promising direction today. Since the elements of the theory of non-force interaction that underlie the proposed combined model have proven themselves in other data processing tasks, the model can be adapted to other areas of recognition: sound, behavior, various kinds of events and their consequences, etc.
Threats. One of the main factors affecting the success of recognition with high confidence is the conditions under which information is collected and the methods used to prepare and process the data. Moreover, the accuracy of recognition varies over a significant range depending on the conditions of data collection. The research results were obtained on data sets collected under conditions close to ideal. Therefore, when working with real data, it may be necessary to additionally apply image preprocessing methods and approaches aimed at:
- compensation of distortions caused by camera vibration;
- reduction of the noise that is introduced by the camera and depends on the type and size of its sensor, as well as on the shooting conditions;
- equalization of image brightness, etc.

Conclusions
1. Studies show that the theory of non-force interaction, in contrast to the existing classification methods, has the following key features:
- designed to handle large amounts of data;
- selects useful information from an arbitrary stream;
- allows new objects to be added naturally;
- tolerant of errors and allows the behavior of the system to be reprogrammed quickly.
2. Based on the theory of non-force interaction, with the description of objects using local binary patterns and the voting approach, a new model for recognizing graphical objects in software systems was formed. The peculiarity of this model lies in the combination of a non-linear structure, which is inherent in neural network structures (theory of non-force interaction), and low computational complexity. This allows software systems based on this model to operate in near real-time modes with the limited resources of mobile devices.
3. A model built on the basis of the proposed image recognition method makes it possible to control the recognition accuracy. This is achieved by adjusting the number of sectors of the histograms of local binary patterns used in the description of images and the number of image fragments used at the classification stage by the introformation approach. The number of image fragments largely determines the classification time, since the matching of the system actions has to be calculated pairwise in each of the possible directions.