OPTIMIZATION OF KNOWLEDGE BASES ON THE BASIS OF FUZZY RELATIONS BY THE CRITERIA “ACCURACY – COMPLEXITY”

A method is proposed for optimizing fuzzy classification knowledge bases by the criteria “inference accuracy – complexity”. A relational fuzzy model that corresponds to the fuzzy classification knowledge base is developed. The matrix of fuzzy relations, in the form of one-dimensional projections “input terms – output classes”, is a simplified representation of the system of classification rules. The problem of knowledge base optimization is reduced to the problem of min-max clustering and comes down to selecting partition matrices “inputs – output” that provide the required or extreme levels of inference accuracy and number of rules. In relational models, the optimal choice of the number of output terms remains an open question. The selection of output classes, input terms and rules is reduced to a problem of discrete optimization of algorithm reliability indicators, which is solved by the gradient method. The number and location of hyperboxes are determined by the relation matrix, and the sizes of hyperboxes are defined by tuning the triangular membership functions. The number of input and output terms in the partition matrices may be selected both in the offline mode and by adaptive adding/removing of terms. Known methods of min-max clustering apply heuristic procedures for selecting the number of rules (classes). The proposed method generates variants of fuzzy knowledge bases in accordance with the formalized procedures of reliability analysis and synthesis of algorithmic processes. This resolves a general problem of min-max clustering methods: minimizing the number of input terms without losing inference accuracy. The transition to the relational fuzzy model simplifies the tuning of knowledge bases for both assigned and unknown output classes.


Introduction
The tuning of expert fuzzy knowledge bases involves either the closest approximation to experimental data at a given level of complexity or the maximum simplification without losing inference accuracy [1]. The number of output terms, or output classes [2], determines the quality of a fuzzy classification knowledge base. The optimization of such a knowledge base implies either a search for the minimum inference error under a limit on the complexity of the model (the number of input terms, output classes, and rules) or a search for the minimum number of rules (classes) at an assigned level of accuracy. A transition to the relational model simplifies the design process by presenting the rules in the form of a matrix of fuzzy relations "input terms – output classes" [1]. In this case, a multi-dimensional matrix of relations $R(X)$ is presented in the form of projections $R_1(x_1), \dots, R_n(x_n)$ [3]. The number of input and output terms is set in advance, and the tuning of the model implies selection of the elements of the relation matrix [4, 5]. However, relational models leave open the problem of the optimal choice of the number of output classes. At the same time, the optimization of a fuzzy knowledge base is a fuzzy clustering task [6]: it requires partitioning the space of input variables into a number of classes that provides the required or extreme levels of inference accuracy and number of rules.

Literature review and problem statement
Methods of relational clustering, which partition objects by similarity measures, are limited to an assigned number of classes [6, 7]. If the number of classes is unknown, methods of min-max clustering are used, which generate easily interpretable rules in the form of hyperboxes [8]. Hyperboxes are trained using support vector machines (SVM) [9, 10] through expansion/contraction. A balance between inference accuracy and the number of rules (classes) is achieved by merging/splitting hyperboxes. To restore nonlinear boundaries between classes and avoid excessive coverage density, the learning mode in min-max neural networks must reduce the number of hyperboxes without compromising the recognition capacity [11, 12]. There remains the problem of adapting the maximum size of the hyperbox, which determines how many rules can be generated. Class overlapping and classification errors make this parameter very important. If the value of this parameter is small, unnecessary hyperboxes (classes) are formed [13].
A general problem of min-max clustering methods is the selection of the number of output classes and the minimization of the number of input terms without compromising inference accuracy. A method for optimizing the output classes of a fuzzy knowledge base was proposed in [14, 15]. In contrast to the heuristic procedures of rule (class) selection [8–13], the generation of fuzzy knowledge bases is reduced to a problem of discrete optimization of algorithm reliability indicators [14, 15]. The gradient method was used for the selection of output classes. The number of classes is defined in the offline mode [14]. Class boundaries are refined by adaptively adding/removing classes in arrangement vectors [15]. For the current output classes, interval rules are generated by solving the problem of inverse logical inference [2]. This solves the problem of control and adaptation of the hyperbox size [16]. The structure of the model is determined by the parameters of interval rules, which are connected to the coordinates of the maximum of a membership function.

A method is proposed for optimizing fuzzy classification knowledge bases by the criteria "accuracy – complexity", which simplifies the tuning process through a transition to a relational model. The knowledge base optimization problem is reduced to a min-max clustering problem. The essence of the method is the selection of partition matrices "inputs – output" that provide the required or extreme levels of inference accuracy and number of rules. Keywords: optimization of fuzzy knowledge bases, min-max clustering, fuzzy relational models
This paper proposes a method for the optimization of output classes and input terms of a fuzzy knowledge base. If the number of terms is set in advance, the problem of min-max clustering may be solved by a relational partition of the space of input variables [1]. The number and location of hyperboxes are determined by the matrix of relations [17], and the sizes of hyperboxes are determined by adjusting the triangular membership functions [1]. The optimization of a relational fuzzy knowledge base then lies in the selection of partition matrices "inputs – output" that provide the required or extreme levels of inference accuracy and number of rules. Following [14, 15], the number of input and output terms in the partition matrices may be selected both in the offline mode and by adaptive adding/removing of terms.

The aim and tasks of the study
The aim of the present work is to develop an approach to the optimization design of relational fuzzy knowledge bases by the criteria "inference accuracy – complexity". This approach should simplify the tuning of knowledge bases based on fuzzy relations for both assigned and unknown output classes.
To achieve the set goal, the following tasks were to be solved:
- development of a relational fuzzy model that matches a fuzzy classification knowledge base;
- development of a method for the optimization of a knowledge base on the basis of fuzzy relations in the offline and online modes.

1. Fuzzy relational model
Consider an object of the form $y=f(x_1,\dots,x_n)$ with $n$ inputs $X=(x_1,\dots,x_n)$ and output $y$, for which the relation "inputs – output" may be represented as a system (1) of fuzzy classification IF-THEN rules [2]. In this system, $a_i^{jp}$ is the fuzzy term for the evaluation of variable $x_i$ in row $jp$, $j=\overline{1,m}$, $p=\overline{1,z_j}$; $d_j$ is the fuzzy term for the evaluation of variable $y$; $z_j$ is the number of rules in class $d_j$; $m$ is the number of terms of the output variable. Let $\{c_{i1},\dots,c_{ik_i}\}$ be the set of input terms for the evaluation of variable $x_i$, $i=\overline{1,n}$. The input terms of all variables are jointly designated $C_I$, $I=\overline{1,N}$, where $N=k_1+\dots+k_n$.
Then a system of one-dimensional matrices of fuzzy relations $R_i$, $i=\overline{1,n}$, corresponds to the fuzzy knowledge base (1); this system is equivalent to a multi-dimensional matrix $R$. The dependence "inputs – output" is described using the extended compositional rule of inference (2) [1], which operates on the vectors of membership degrees of variables $x_i$ and $y$ to terms $c_{il}$, $i=\overline{1,n}$, and $d_j$, $j=\overline{1,m}$, respectively. From relation (2) follows the system of fuzzy logical equations (3), which connects the membership functions of the fuzzy input and output terms. Relation (3) defines the fuzzy model (4) of the object, in which $\Psi_r$ is the vector of parameters of fuzzy relations, which includes the vectors of lower and upper bounds, as well as the vectors of coordinates of the maximum, of the triangular membership functions of fuzzy terms $C_I$ and $d_j$; $f$ is the operator of the connection "inputs – output", which corresponds to formula (3).
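The inference scheme of this section can be illustrated with a minimal sketch. It assumes the common max-min form of the compositional rule, in which each input's membership vector is composed with its relation matrix $R_i$ and the results are intersected by `min` across inputs; this form, the function names `triangular` and `infer`, and all numeric values are illustrative assumptions rather than the paper's exact equations (2) and (3).

```python
import numpy as np

def triangular(x, lower, peak, upper):
    """Degree of membership of x in a triangular term (lower < peak < upper)."""
    if x <= lower or x >= upper:
        return 0.0
    if x <= peak:
        return (x - lower) / (peak - lower)
    return (upper - x) / (upper - peak)

def infer(x, terms, relations):
    """Assumed max-min relational inference.

    x         -- list of n input values
    terms     -- terms[i] is a list of (lower, peak, upper) for the k_i terms of x_i
    relations -- relations[i] is a (k_i, m) array of relation degrees
    Returns the vector of m membership degrees of the output classes.
    """
    per_input = []
    for x_i, term_params, r_i in zip(x, terms, relations):
        # Membership degrees of x_i in its k_i input terms.
        mu = np.array([triangular(x_i, *p) for p in term_params])
        # Max-min composition of the membership vector with R_i.
        per_input.append(np.max(np.minimum(mu[:, None], r_i), axis=0))
    # Intersection (min) of the one-dimensional projections.
    return np.min(per_input, axis=0)

# Hypothetical model: two inputs, two terms each, two output classes.
terms = [
    [(-10.0, 0.0, 10.0), (0.0, 10.0, 20.0)],
    [(-10.0, 0.0, 10.0), (0.0, 10.0, 20.0)],
]
relations = [
    np.array([[1.0, 0.1], [0.2, 0.9]]),
    np.array([[0.8, 0.3], [0.1, 1.0]]),
]
degrees = infer([2.0, 7.0], terms, relations)
predicted_class = int(np.argmax(degrees))
```

In this toy setting the class with the highest resulting degree is taken as the decision; the tuning parameters are the entries of `relations` and the bounds and peaks in `terms`.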

2. Problems on the optimization of knowledge base based on fuzzy relations
For a fuzzy knowledge base (1), the interrelation between the root mean square error and the number of rules depends on the number and bounds of output classes. The problem of optimizing a fuzzy knowledge base (1) is therefore reduced to the problem of min-max clustering and lies in selecting a partition matrix $R$ that provides the required or extreme levels of inference accuracy and number of rules.
Let the training sample be given as $P$ pairs of "inputs – output" experimental data. Optimization of the number of input terms and output classes is carried out in the offline mode. In this case, the preliminary boundaries of classes $d_j$ are assigned by an expert.
We evaluate the complexity of the fuzzy model (4) by the number of rules $Z(N, m, R)$ associated with relation matrix $R$, and the quality of the fuzzy model (4) by the root mean square error $E(N, m, R)$ over the training sample. The problem of selecting the optimal number of input terms and output classes may then be formulated in the direct and dual statements.
Direct statement. Find the number of input terms $N$, the number of output classes $m$ and the fuzzy partition matrix $R$ that provide the minimum number of rules for a permissible inference error:

$$Z(N, m, R) \to \min, \qquad E(N, m, R) \le \bar{E},$$

where $\bar{E}$ is the maximum permissible root mean square error.
Dual statement. Find the number of input terms $N$, the number of output classes $m$ and the fuzzy partition matrix $R$ that provide the minimum inference error for the assigned number of rules:

$$E(N, m, R) \to \min, \qquad Z(N, m, R) \le \bar{Z},$$

where $\bar{Z}$ is the maximum permissible number of rules.
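The two statements can be illustrated on a finite list of already evaluated model variants. The sketch below is only an illustration: the `Candidate` record, the helper names, the averaging convention in `rmse` (mean of squared residuals under the root) and the candidate values are assumptions made for the example, not data from the paper.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str     # label of a model variant (hypothetical)
    rules: int    # number of rules Z
    error: float  # root mean square error E

def rmse(targets, predictions):
    """Root mean square error over P pairs (assumed convention: mean under the root)."""
    total = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    return math.sqrt(total / len(targets))

def solve_direct(candidates, max_error):
    """Direct statement: fewest rules among variants with E <= max_error."""
    feasible = [c for c in candidates if c.error <= max_error]
    return min(feasible, key=lambda c: (c.rules, c.error)) if feasible else None

def solve_dual(candidates, max_rules):
    """Dual statement: lowest error among variants with Z <= max_rules."""
    feasible = [c for c in candidates if c.rules <= max_rules]
    return min(feasible, key=lambda c: (c.error, c.rules)) if feasible else None

# Made-up variants used only to show the two selection rules.
candidates = [
    Candidate("A", rules=10, error=0.90),
    Candidate("B", rules=20, error=0.45),
    Candidate("C", rules=28, error=0.40),
    Candidate("D", rules=35, error=0.30),
]
direct_choice = solve_direct(candidates, max_error=0.5)
dual_choice = solve_dual(candidates, max_rules=30)
```

With these toy values the direct statement picks the variant with the fewest rules that still meets the error bound, while the dual statement picks the most accurate variant within the rule budget, so the two answers generally differ.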
Optimization of the boundaries of output classes is performed in the online mode. In this case, the partition is refined by adaptive adding/removing of terms.
We introduce a limitation on the volume of the relation matrix as follows:

$$k_i \le \bar{k}_i, \quad i=\overline{1,n}; \qquad m \le \bar{m},$$

where $\bar{k}_i$ and $\bar{m}$ are the maximum numbers of input terms and output classes. Let $\mathbf{U}=(u_I)$ and $\mathbf{V}=(v_J)$ be the vectors of arrangement of input terms and output classes, where $u_I=1\,(0)$ or $v_J=1\,(0)$ corresponds to the addition (removal) of term $C_I$ or $d_J$, respectively.
We evaluate the complexity of the fuzzy model (4) by the number of rules $Z(\mathbf{U}, \mathbf{V}, R)$ associated with relation matrix $R$, and the quality of the fuzzy model (4) by the root mean square error $E(\mathbf{U}, \mathbf{V}, R)$. The problem of selecting the optimum boundaries of output classes may then be formulated in the direct and dual statements.
Direct statement. Find the vectors of arrangement of input terms $\mathbf{U}$ and output classes $\mathbf{V}$ and the fuzzy partition matrix $R$ for which, under the limitation on the knowledge base volume, $Z(\mathbf{U}, \mathbf{V}, R) \to \min$ and $E(\mathbf{U}, \mathbf{V}, R) \le \bar{E}$.
Dual statement. Find the vectors of arrangement of input terms $\mathbf{U}$ and output classes $\mathbf{V}$ and the fuzzy partition matrix $R$ for which, under the limitation on the knowledge base volume, $E(\mathbf{U}, \mathbf{V}, R) \to \min$ and $Z(\mathbf{U}, \mathbf{V}, R) \le \bar{Z}$.
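One way to picture the arrangement vectors is as binary masks over a relation matrix allocated at its maximum size: rows correspond to input terms $C_I$, columns to output classes $d_J$, and only rows and columns with a flag of 1 take part in the model. The sketch below assumes this reading; the helper names `active_submatrix` and `set_term` and the matrix values are hypothetical.

```python
import numpy as np

def active_submatrix(relation, u, v):
    """Keep the rows with u_I = 1 and the columns with v_J = 1."""
    rows = np.asarray(u, dtype=bool)
    cols = np.asarray(v, dtype=bool)
    return relation[np.ix_(rows, cols)]

def set_term(vector, index, present):
    """Return a copy of an arrangement vector with one term added or removed."""
    updated = list(vector)
    updated[index] = 1 if present else 0
    return updated

# Relation matrix at the maximum allowed size (3 input terms, 2 classes).
full_relation = np.array([[0.9, 0.1],
                          [0.4, 0.6],
                          [0.2, 0.8]])
u = [1, 0, 1]  # the second input term is currently removed
v = [1, 1]     # both output classes are present

reduced = active_submatrix(full_relation, u, v)

# Adaptive step: add the second input term back.
u_extended = set_term(u, 1, True)
extended = active_submatrix(full_relation, u_extended, v)
```

Under this representation, adding or removing a term never reallocates the matrix; it only flips one flag, which keeps each online adjustment cheap to evaluate and to undo.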

3. Method for the optimization of relational fuzzy knowledge base
The values of the controlling variables are selected by the gradient method, which was proposed in [14] for solving problems of discrete optimization of a fuzzy knowledge base. This method implies a coordinate-wise ascent along the surface of the objective function in the direction of the gradient. The algorithms for solving the optimization problems have a unified structure consisting of two iteration sections [14]. In the first section, the first permissible solution is determined by successively adding the terms with the highest gradients; in the second, the found solution is improved by decreasing the complexity of the model. For the current output classes, fuzzy relations are tuned by the methods proposed in [2].
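The two-section structure described above can be sketched for the direct statement as follows. This is a simplified illustration under stated assumptions, not the algorithm of [14]: a model is reduced to a tuple of term counts, `evaluate` is a hypothetical callback returning the pair (number of rules, error) for a tuned model, and `toy_evaluate` is a made-up stand-in used only to make the example runnable.

```python
def neighbours(config, step, limits):
    """Yield configurations that differ from config by `step` in one coordinate."""
    for i, value in enumerate(config):
        new_value = value + step
        if 1 <= new_value <= limits[i]:
            yield config[:i] + (new_value,) + config[i + 1:]

def optimize_direct(start, limits, evaluate, max_error):
    """Two-section greedy search: add terms until E <= max_error, then prune."""
    current = tuple(start)
    rules, error = evaluate(current)

    # Section 1: add the term with the highest gradient dE / dZ.
    while error > max_error:
        best = None
        for cand in neighbours(current, +1, limits):
            cand_rules, cand_error = evaluate(cand)
            # Guard against a zero or negative rule increment in toy models.
            gradient = (error - cand_error) / max(cand_rules - rules, 1)
            if best is None or gradient > best[0]:
                best = (gradient, cand, cand_rules, cand_error)
        if best is None:
            return None  # term limits reached without a permissible model
        _, current, rules, error = best

    # Section 2: remove terms while the model stays permissible.
    while True:
        best = None
        for cand in neighbours(current, -1, limits):
            cand_rules, cand_error = evaluate(cand)
            saved = rules - cand_rules
            if cand_error <= max_error and saved > 0:
                if best is None or saved > best[0]:
                    best = (saved, cand, cand_rules, cand_error)
        if best is None:
            return current, rules, error
        _, current, rules, error = best

def toy_evaluate(config):
    """Made-up stand-in for tuning a model with (k1, k2, m) terms."""
    k1, k2, m = config
    return k1 * k2 + m, 1.0 / k1 + 0.5 / k2 + 0.2 / m

result = optimize_direct((1, 1, 1), (5, 5, 5), toy_evaluate, max_error=0.6)
```

In a real setting each call to `evaluate` would involve tuning the fuzzy relations for the candidate structure, so the number of evaluated neighbours per step is the main cost driver of the search.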

3.1. Algorithms of the optimization under offline mode
Gradients $\gamma_{x_i}(k_i)$ and $\gamma_y(m)$ are defined as the ratio of the infallibility increment $\Delta E(k_i+1, \Psi_r)$ or $\Delta E(m+1, \Psi_r)$ to the increment in the number of rules $\Delta Z(k_i+1, \Psi_r)$ or $\Delta Z(m+1, \Psi_r)$ when the number of input or output terms in the partition matrices is increased:

$$\gamma_{x_i}(k_i)=\frac{\Delta E(k_i+1,\Psi_r)}{\Delta Z(k_i+1,\Psi_r)}, \qquad \gamma_y(m)=\frac{\Delta E(m+1,\Psi_r)}{\Delta Z(m+1,\Psi_r)}.$$

The solution vector obtained at the $t$-th step of the optimization algorithm is designated $\Psi^{(t)}$.
The algorithm for solving the problem in the direct statement is performed in the following sequence:
1. Set the zero-option of the fuzzy model: …
3. Identify gradients $\gamma_{x_i}$ and $\gamma_y$ relative to solution $\Psi^{(t)}$. Find the coordinate for which $\gamma=\max\{\gamma_{x_i}, \gamma_y\}$; $t:=t+1$. For vector $\Psi^{(t)}$, assign: … Proceed to step 2.
4. Decrease the complexity of model $\Psi^{(t)}$ by decreasing the number of input or output terms while maintaining the permissible inference accuracy. Check conditions (5) and (6) for the reduced models. If conditions (5) and (6) are not fulfilled for any coordinate, consider vector $\Psi^{(t)}$ the result of solving the problem; otherwise proceed to step 5.
5. For the coordinates that satisfy conditions (5) and (6), find the magnitude $\Delta Z$ by which the number of rules will decrease. Find the coordinate for which: … ; $t:=t+1$. For vector $\Psi^{(t)}$, assign: … Proceed to step 4.
The algorithm for solving the problem in the dual statement is performed in the following sequence:
1. Set the zero-option of the fuzzy model: … proceed to step 3; otherwise, to step 4.
3. This step coincides with step 3 of the algorithm for solving the problem in the direct statement. Proceed to step 2.
4. Decrease the complexity of model $\Psi^{(t)}$, so that it enters the region of permissible solutions, by reducing the number of input or output terms. Check conditions (7) and (8) for the reduced models. If at least one of conditions (7) or (8) is fulfilled, select among the permissible solutions the model that provides the lower inference error; otherwise proceed to step 5.
5. For the coordinates that do not satisfy limitations (7) and (8), find the increment in inference error $\Delta E$. Find the coordinate for which: … ; $t:=t+1$. For vector $\Psi^{(t)}$, assign: … Proceed to step 4.

3.2. Algorithms of optimization under the online mode
Gradients are defined as the ratio of the infallibility increment $\Delta E(u_I=1, \Psi_r)$ or $\Delta E(v_J=1, \Psi_r)$ to the increment in the number of rules $\Delta Z(u_I=1, \Psi_r)$ or $\Delta Z(v_J=1, \Psi_r)$ as a result of adding the input or output term $C_I$ or $d_J$. The solution vector obtained at the $t$-th step of the optimization algorithm is designated $\Psi^{(t)}$.
The algorithm for solving the problem in the direct statement is performed in the following sequence:
1. Assign the zero-option of the fuzzy model: … proceed to step 3; otherwise, to step 4.
3. For the models where $u_I^{(t)}=0$ and $v_J^{(t)}=0$, add an input or output term as follows: … relative to solution $\Psi^{(t)}$. Find the term for which $\gamma=\max\{\gamma_x^{L}, \gamma_y^{M}\}$, where: … ; $t:=t+1$. For vector $\Psi^{(t)}$, assign: …
4. Improve model $\Psi^{(t)}$ by attaining the required level of inference accuracy with fewer terms. For the models in which $u_I^{(t)}=1$ and $v_J^{(t)}=1$, decrease the complexity by reducing the number of terms in the following way. For the inputs and outputs, find the sets of terms $Q_x^{(t)}$ and $Q_y^{(t)}$ for which conditions (9) and (10) are fulfilled. If $Q_x^{(t)}$ and $Q_y^{(t)}$ are empty sets, consider vector $\Psi^{(t)}$ the result of solving the problem; otherwise proceed to step 5.
5. For terms $C_I \in Q_x^{(t)}$ and $d_J \in Q_y^{(t)}$, which satisfy conditions (9) and (10), find the magnitude $\Delta Z$ by which the number of rules decreases. Find the term for which: … ; $t:=t+1$. For vector $\Psi^{(t)}$, assign: …
The algorithm for solving the problem in the dual statement is performed in the following sequence:
1. Set the zero-option of the fuzzy model: … proceed to step 3; otherwise, to step 4.
3. This step coincides with step 3 of the algorithm for solving the problem in the direct statement. Proceed to step 2.
4. Decrease the complexity of model $\Psi^{(t)}$ so that it enters the region of permissible solutions. For the models in which $u_I^{(t)}=1$ and $v_J^{(t)}=1$, decrease the number of terms in the following way. For the inputs and outputs, find the sets of terms $Q_x^{(t)}$ and $Q_y^{(t)}$ for which conditions (11) and (12) are satisfied. If at least one of conditions (11) or (12) is not met, choose among the permissible solutions the model that provides the lower inference error; otherwise proceed to step 5.
5. For the terms that satisfy conditions (11) and (12), find the magnitude $\Delta E$ by which the inference error increases. Find the term for which: … ; $t:=t+1$. For vector $\Psi^{(t)}$, assign: …

Results of computer experiment
For the reference model [15, 16], the number of terms is limited as follows: … The task was to transform the expert zero-option of the knowledge base into a variant that provides $Z \to \min$ and $E \le 0.5$ in the direct statement, and $E \to \min$ and $Z \le 30$ in the dual statement. The results of solving the optimization problems are listed in Table 1, where each iteration represents the results of designing model $\Psi^{(t)}$ for the current numbers of terms $k_i^{(t)}$ and $m^{(t)}$, followed by the term arrangement vectors $\mathbf{U}^{(t)}$ and $\mathbf{V}^{(t)}$.
The first acceptable solution of the direct problem is obtained at step 4 by successively adding the terms with the highest gradients: term $c_{24}$ at step 2, … Model $\Psi^{(4)}$ remains the solution of the direct problem. Decreasing the complexity leads to model $\Psi^{(5)}$ leaving the region of permissible solutions. A further increase in the number of terms in model $\Psi^{(6)}$ decreases the inference error by $\Delta E=0.0248$ while increasing the number of rules by $\Delta Z=1$.
The solution of the dual problem was continued by adding the terms with the highest gradients: term $c_{23}$ at step 6, … Model $\Psi^{(8)}$ remains the solution of the dual problem. A further increase in the number of terms leads to model $\Psi^{(9)}$ leaving the region of permissible solutions, and decreasing the complexity in model $\Psi^{(10)}$ increases the inference error by $\Delta E=0.0305$ while decreasing the number of rules by $\Delta Z=1$.
The matrices of fuzzy relations in the solutions of the direct and dual problems (Tables 2, 3) are associated with the fuzzy rule bases presented in Tables 4, 5. The results of structural and parametric tuning of models $\Psi^{(4)}$ and $\Psi^{(8)}$ are shown in Figs. 1 and 2.

Discussion of results of assessing the complexity of tuning algorithms for a fuzzy classification knowledge base
The proposed method, like the methods in [14, 15], formalizes the improving transformations of an expert fuzzy knowledge base. The controlling variables are the numbers of input terms, output classes and rules. Improving transformations make it possible to formalize the generation of fuzzy knowledge base variants with a subsequent selection by the criteria of accuracy and costs or by the complexity of the tuning process.
Assume that the number of rules (classes) is limited and the number of input terms is unknown. Then the number of tuning parameters for the fuzzy classification knowledge base is $2nZ+2m$ for two-parameter membership functions [2] or for the upper and lower boundaries of interval rules [8–13]. Assume that, in addition to the number of output classes and rules, the number of input terms is also limited. Then relation matrices are embedded into the antecedents of fuzzy rules, and the number of tuning parameters of a relational fuzzy knowledge base is $ZNm+2N+2m$ [4, 5]. If the number of rules (classes) is subject to minimization, a limit is set on the numbers of input terms $N_T$ and output terms $M_T$ whose linguistic modification provides the required inference accuracy [14–16]. The number of tuning parameters of the rule generator based on the matrix of fuzzy relations is $N_T M_T+2N_T+2M_T$. An inverse inference for $m$ output terms requires the solution of $Z$ optimization problems with $2n$ variables for the upper and lower boundaries of the intervals [16].
Compared with [2, 4, 5, 8–16], the proposed method decreases the number of tuning parameters to $Nm+2N+2m$ for the partition matrices and the upper and lower boundaries of triangular membership functions. The shortcoming of the method is the necessity of obtaining the linguistic IF-THEN rules that are associated with a fuzzy partition matrix.
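The parameter counts quoted above can be compared numerically. The sketch below simply evaluates the four expressions from the text; the function names and the sample sizes ($n=2$, $N=9$, $m=5$, $Z=30$, $N_T=9$, $M_T=5$) are illustrative assumptions and are not taken from the experiment.

```python
def params_rule_base(n, Z, m):
    """Rule base with two-parameter functions or interval rules: 2nZ + 2m."""
    return 2 * n * Z + 2 * m

def params_relational_rules(Z, N, m):
    """Relation matrices embedded into rule antecedents: ZNm + 2N + 2m."""
    return Z * N * m + 2 * N + 2 * m

def params_rule_generator(N_T, M_T):
    """Rule generator based on the relation matrix: N_T*M_T + 2N_T + 2M_T."""
    return N_T * M_T + 2 * N_T + 2 * M_T

def params_partition(N, m):
    """Partition matrices with triangular term bounds: Nm + 2N + 2m."""
    return N * m + 2 * N + 2 * m

# Illustrative sizes only.
n, N, m, Z = 2, 9, 5, 30
counts = {
    "rule base": params_rule_base(n, Z, m),
    "relational rules": params_relational_rules(Z, N, m),
    "rule generator": params_rule_generator(9, 5),
    "partition matrices": params_partition(N, m),
}
```

For these sample sizes the counts are 130, 1378, 73 and 73 respectively, which shows how the relation-based count does not grow with the number of rules $Z$.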

Conclusions
1. Models and methods were developed for the optimization design of fuzzy classification knowledge bases by the criteria "inference accuracy – complexity". A fuzzy relational model, which corresponds to a fuzzy classification knowledge base, was proposed. The problem of optimizing a fuzzy knowledge base is reduced to the problem of min-max clustering and comes down to selecting partition matrices "inputs – output" that provide the required or extreme levels of accuracy and number of rules.
2. The selection of output classes and input terms is reduced to a problem of discrete optimization of algorithm reliability indicators, which was solved by the gradient method. This resolves a general problem of min-max clustering methods related to the selection of the number of output classes and the minimization of the number of input terms without losing inference accuracy. The number and location of hyperboxes are determined by the relation matrix "input terms – output classes", and the sizes of hyperboxes are defined by tuning the triangular membership functions. The number of input and output terms in the partition matrices may be selected both in the offline mode and by adaptive adding/removing of terms. The transition to the relational fuzzy model simplifies the tuning of knowledge bases for both assigned and unknown output classes.

Fig. 2. Results of parametric tuning for solving: a – direct problem; b – dual problem

Table 1
Calculation of optimization problems

Table 2
Matrix of fuzzy relations for a direct problem

Table 4
IF-THEN rules for a direct problem

Table 5
IF-THEN rules for a dual problem