MULTI-STEP RECURRENT ALGORITHM TO MAXIMIZE THE CRITERIA OF CORRENTROPY

Copyright © 2021, O. Rudenko, O. Bezsonov, V. Borysenko, T. Borysenko, S. Lyashenko


Introduction
Many of the tasks related to information processing (identification, management, forecasting, classification, filtering, etc.) [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15] are reduced to constructing and analyzing a model of the following form:

y(k) = θ*ᵀx(k) + ξ(k), (1)

where θ* = (θ₁*, θ₂*, …, θ_N*)ᵀ is the N×1 vector of the desired parameters and ξ(k) is an interference. Identification and filtering tasks, for example, consist in determining (estimating) the vector of parameters θ* included in equation (1). To this end, one minimizes a functional chosen in advance, whose minimization produces the required solution. The type of functional depends on the interference distribution. The most widely used, quadratic functional yields an asymptotically optimal estimate of the vector θ* with minimal variance in the class of unbiased estimates under normally distributed interference, that is, ξ(k) ∼ N(0, σ_ξ²). If the interference distribution differs from the normal one, the least-squares method (LSM) estimate is unstable. This instability of the LSM estimate in the presence of non-Gaussian interference motivated the development of an alternative, robust estimation approach in statistics, whose purpose is to eliminate the influence of interference.
It should be noted that having information about the interference ξ belonging to a certain class of distributions makes the task much easier. In this case, a maximum likelihood estimate (M-estimate) can be derived by minimizing the criterion given by the negative logarithm of the interference distribution density [1]. If such information is not available, a non-quadratic criterion must be applied to estimate the vector of parameters θ*; this ensures that the estimate is robust. One such criterion is the criterion of maximum correntropy [2,3].
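The robustness argument above can be made concrete with a small numerical sketch. Everything in the following block (the scalar model, the outlier magnitude, the kernel width σ=1.0, and the grid search) is our own illustration, not taken from the paper: a single impulsive outlier pulls the quadratic (LS) estimate away from the true parameter, while the maximizer of the empirical correntropy barely moves.

```python
import numpy as np

# Illustrative sketch (values are our own, not from the paper): estimate a
# scalar parameter under one impulsive outlier, comparing the quadratic
# criterion with the maximum-correntropy criterion (sigma = 1.0 assumed).

def quadratic_loss(theta, d, x):
    """Mean squared identification error (to be minimized)."""
    e = d - theta * x
    return np.mean(e ** 2)

def correntropy(theta, d, x, sigma=1.0):
    """Empirical correntropy of the error (to be maximized)."""
    e = d - theta * x
    return np.mean(np.exp(-e ** 2 / (2.0 * sigma ** 2)))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
x[0] = 1.0
d = 2.0 * x + 0.1 * rng.normal(size=200)   # true parameter: 2.0
d[0] += 50.0                               # one impulsive outlier

grid = np.linspace(0.0, 4.0, 2001)
theta_ls = grid[np.argmin([quadratic_loss(t, d, x) for t in grid])]
theta_mc = grid[np.argmax([correntropy(t, d, x) for t in grid])]
# theta_ls is pulled away from 2.0 by the outlier; theta_mc stays close.
print(round(theta_ls, 2), round(theta_mc, 2))
```

The Gaussian kernel assigns the outlier an exponentially small weight, which is exactly the localization property the text appeals to.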
Existing algorithms that maximize this criterion are various modifications of gradient procedures, characterized by a low convergence rate, which makes them ineffective in estimating non-stationary parameters. It is therefore a relevant task to develop algorithms that would provide high-quality, robust estimation of non-stationary parameters while demonstrating a greater convergence speed.

Literature review and problem statement
Estimation based on the maximization of the criterion of maximum correntropy developed the ideas of robust evaluation through the use of non-quadratic criteria. One such criterion, in particular, is the modular one, whose application leads to sign algorithms. It is shown in [2][3][4][5][6] that they are quite effective in the presence of pulse interference. Thus, [2,3] studied the efficacy of the affine projection sign algorithm; an affine projection sign algorithm with a variable gain factor was used in [4].
It should be noted, however, that the sign algorithms, while ensuring the robustness of the resulting estimate, demonstrate a low convergence rate. Attempts to speed up such algorithms, undertaken in [5,6], require a considerable amount of additional information and increase their computational complexity.
The classic robust criteria, proposed by Huber [7] and Hampel [8], are combinations of quadratic and modular functionals. As shown in the cited works, this combination provides optimal estimates for the Gaussian distribution as well as robustness to distributions with heavy tails (outliers). It should be noted, however, that the effectiveness of the resulting robust estimates depends significantly on the many parameters used in these criteria. Although there are some recommendations for choosing these parameters, in most cases they are chosen based on the experience of the researcher [9]. Some practical recommendations for the choice of the functional's parameters in robust neural network training have been developed in [10][11][12]. The more general problem of robust estimation in the presence of interference with asymmetrical distributions was investigated in [13]. However, the issue of choosing the parameters of the functional remains open.
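To make the quadratic-plus-modular combination tangible, here is a minimal sketch of the Huber criterion mentioned above: quadratic for small errors and modular (linear) for large ones. The threshold delta is exactly the kind of tuning parameter whose choice the text discusses; delta = 1.345 is a common textbook default, not a value from the paper.

```python
import numpy as np

# Minimal sketch of the Huber loss: 0.5*e^2 for |e| <= delta,
# delta*(|e| - 0.5*delta) otherwise (delta = 1.345 is an assumed default).

def huber(e, delta=1.345):
    e = np.abs(np.asarray(e, dtype=float))
    return np.where(e <= delta, 0.5 * e ** 2, delta * (e - 0.5 * delta))

# A small error is penalized quadratically, a large one only linearly:
print(huber([0.5, 10.0]))
```

The linear growth for large |e| is what bounds the influence of any single outlier on the estimate.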
A simpler approach to building combined functionals from quadratic and modular components, free of the above disadvantage, was developed in [14][15][16][17][18]. The criterion proposed in [14] was used in the cited works to solve the identification problem in the presence of pulse interference. The stability of the normalized algorithm was studied in [15]; the convergence of the algorithm was discussed in [17], where the task of choosing the optimal values of the algorithm parameters was also solved.
The minimum fourth-power criterion was proposed in [18]. Increasing the convergence rate of the algorithm minimizing this criterion by using an optimally tuned step parameter was studied in [19,20]. In order to ensure the robustness and stability of the algorithm, it was proposed in [21] to use a variable step parameter that takes into consideration the energy of the error (in the least-squares sense). In [22], it was proposed to modify the least fourth-power algorithm based on a quasi-Newtonian procedure. Finally, work [23] considered the implementation of this algorithm using quantum computing. A combined estimation criterion to speed up the identification process, combining the quadratic and fourth-power criteria, is proposed in [24]. In [25], this approach was used to speed up the identification process in the presence of pulse interference. The properties of the adaptive algorithm minimizing such a combined criterion were studied in [26].
A combined criterion consisting of a fourth-power and a modular criterion was proposed in [27,28]. These works established the asymptotic and non-asymptotic properties of the identification algorithm and investigated the effect of the mixing parameter value on the properties of the estimates.
The least mean excess (kurtosis) criterion was introduced in [29,30], yielding a fairly simple identification algorithm. As shown in the cited works, this algorithm is resistant to a wide range of noises (pulse, uniformly distributed, and Gaussian). In order to increase the speed of the evaluation process under this criterion, a kernel recursive least-squares method is proposed in [31] and a kernel affine projection algorithm in [32]. These algorithms are modifications of the LSM-type and recurrent LSM-type algorithms that use a kernel representation [33].
Our analysis of the above works has revealed that implementing the algorithms reported in them involves problems in choosing the parameters included in the minimized functionals. In addition, theoretical studies of the convergence of these algorithms require simplifying assumptions that are rather far from practice. Therefore, the effectiveness of these methods depends significantly on the experience of the researcher.
Another widely adopted approach is based on the information characteristics of signals, particularly entropy. The functional used in this case is an explicit functional of the probability density function (PDF) and includes all the higher-order statistical properties contained in the PDF. Because entropy measures the average uncertainty contained in a given PDF, minimizing it reduces the error. The concept of information-theoretic learning, which uses Rényi quadratic entropy as a criterion, was introduced in [34,35]; a non-parametric estimate of this entropy, based on Parzen windows with Gaussian kernels, is defined directly from the data samples. The cited works showed that, when using Rényi entropy, training minimizes the Rényi distance between the conditional probability density functions of the desired and actual output signals for the specified input signals.
Numerous studies [36][37][38][39][40][41][42][43][44] have shown that an informational approach is very effective under non-Gaussian measurement noise. At the same time, the criterion used should take into consideration the statistics of the error signal not only of the second but also of higher orders.
Correntropy was introduced in [36,37] as a generalized similarity measure whose maximization underlies the production of sufficiently effective robust algorithms. A gradient algorithm was used in [38][39][40][41][42] to maximize this functional, while the recurrent LSM (RLSM) algorithm was used in [43]. It should be noted that, due to their low convergence rate, both algorithms are quite ineffective in estimating non-stationary parameters. Two approaches are used to estimate non-stationary parameters effectively. The first is based on modifying RLSM by applying an information weighting parameter. The other approach uses some additional signal information from a series of previous cycles; it is implemented in the algorithms of current regression analysis (CRA). However, this approach is designed to minimize a quadratic functional, which does not ensure the robustness of estimates. In addition, existing CRA algorithms are inconvenient in real time as they require, similar to LSM, recalculating the inverse observation matrix at each evaluation cycle.
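The gradient procedure referred to above can be sketched as follows. This is a hedged illustration, not the algorithm of any specific cited work: each update is an LMS-like step scaled by the Gaussian kernel weight exp(−e²/(2σ²)), so impulsive errors contribute almost nothing; the step size mu and kernel width sigma are our own illustrative choices.

```python
import numpy as np

# Hedged sketch of a stochastic-gradient maximum-correntropy (MCC) update:
# gradient ascent on exp(-e^2/(2 sigma^2)); mu and sigma are assumed values.

def mcc_step(theta, x, d, mu=0.1, sigma=1.0):
    """One stochastic gradient-ascent step on the correntropy of the error."""
    e = d - x @ theta
    w = np.exp(-e ** 2 / (2.0 * sigma ** 2))   # robustifying weight
    return theta + mu * w * (e / sigma ** 2) * x

rng = np.random.default_rng(1)
true_theta = np.array([1.0, -2.0, 0.5])
theta = np.zeros(3)
for _ in range(5000):
    x = rng.normal(size=3)
    noise = 0.05 * rng.standard_t(df=1.5)      # heavy-tailed interference
    theta = mcc_step(theta, x, x @ true_theta + noise)

print(np.round(theta, 2))   # estimate near true_theta
```

Note how slowly such a procedure converges relative to the data rate; this is precisely the low convergence rate that motivates the multi-step recurrent algorithm of this paper.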
In this regard, it is important to derive the recurrent ratios describing the CRA method for maximizing the correntropy criterion and to study their properties. The recurrence, which excludes the matrix inversion operation, makes them convenient for real-time implementation, while the use of a correntropy criterion ensures the robustness of the estimates.

The aim and objectives of the study
The aim of this study is to derive the recurrent ratios of multi-step algorithms maximizing the criterion of correntropy. This would make it possible to obtain and adjust robust estimates on-line, as information about the studied process is acquired.
To accomplish the aim, the following tasks have been set:
- to derive the recurrent ratios describing a generalized modified multi-step algorithm that maximizes a correntropy criterion;
- to obtain analytical estimates of the convergence rate of multi-step algorithms maximizing the criterion of correntropy;
- to investigate the established estimation regime under the examined conditions;
- to model the process of evaluating the parameters of a linear object using multi-step algorithms.

Using correntropy as a similarity measure
Correntropy, defined as a localized measure of similarity, has proven very effective at obtaining robust estimates because it is less sensitive to outliers.
For two random variables X and Y, correntropy is defined as

V_σ(X, Y) = E[k_σ(X − Y)],

where E[·] is the symbol of mathematical expectation; k_σ(·) is a shift-invariant Mercer kernel; σ is the kernel width.
The Gaussian kernels are the most widely used in the calculation of correntropy; they are determined from the following formula:

k_σ(x) = (1/(√(2π)σ)) exp(−x²/(2σ²)). (2)

In the tasks of identification, filtering, etc., the functional used is the correntropy between the required output signal d_i and the output (real) signal of the model y_i. Using Gaussian kernels, the optimized functional takes the form

J_n = (1/n) Σ_{i=1}^{n} k_σ(e_i),

where e_i = d_i − y_i is the identification (filtering) error. Using the Taylor series expansion of the Gaussian kernel makes it possible to write correntropy as

k_σ(e) = (1/(√(2π)σ)) Σ_{k=0}^{∞} ((−1)^k / (2^k k! σ^{2k})) e^{2k}.

The last expression includes all the even-order moments of the random variable X − Y.

In work [43], in order to eliminate pulse interference, it was proposed to use a recurrent method of weighted least squares (RLSM) minimizing the criterion

J_n = Σ_{i=1}^{n} λ^{n−i} k_σ(e_i), (4)

where 0 ≤ λ < 1 is the weighting factor. To derive the calculation formula for the matrix P_{n+1}, an approximation was used that leads to an update (5) in which P_n changes through a term of the form P_n x_n x_nᵀ P_n. One can see from (5) that the given algorithm is a modification of the weighted RLSM.
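The Taylor expansion above can be checked numerically. The sketch below (our own illustration) confirms that the Gaussian kernel of the error equals a weighted sum of all even error powers, which is why the correntropy criterion implicitly uses every even-order error moment.

```python
import math

# Numerical check of the series k(e) = (1/(sqrt(2*pi)*sigma)) *
# sum_k (-1)^k e^(2k) / (2^k k! sigma^(2k)) against the closed form.

def gauss_kernel(e, sigma=1.0):
    return math.exp(-e ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def kernel_series(e, sigma=1.0, terms=30):
    """Partial sum of the even-power expansion of the Gaussian kernel."""
    s = sum((-1) ** k * e ** (2 * k) / (2 ** k * math.factorial(k) * sigma ** (2 * k))
            for k in range(terms))
    return s / (math.sqrt(2.0 * math.pi) * sigma)

e = 1.3
print(abs(gauss_kernel(e) - kernel_series(e)) < 1e-12)   # True
```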

Building recurrent ratios for a modified multi-step algorithm that maximizes correntropy
A current regression analysis algorithm of the form (6) for maximizing correntropy was studied in [44]; in (6), L is the memory of the algorithm. As already noted, algorithms such as recurrent current regression analysis (RCRA), that is, modifications of RLSM that use a finite number of observations, are promising for evaluating non-stationary parameters. However, while using all observations in LSM leads to an unbiased estimate, that is, the accumulation of information acts as a filter, no such filtering of interference is possible when a finite number of observations is used. Therefore, in order to give the algorithm additional filtering properties, one can use the idea of exponential smoothing, employing a mechanism of weighing the information in the algorithm.
A modified recurrent CRA algorithm can be obtained similarly to the weighted LSM, which has a mechanism for assigning larger weights to newly incoming information.
Denote the number of steps used in the construction of an estimate as L (L ≥ N). Including in the estimate a weighting matrix of dimensionality L×L, where 0 < λ ≤ 1, modifies (6) accordingly. A feature of the algorithms with L = const is that the matrix and observation vectors used in constructing the estimate are formed anew at each estimation step: they include information about newly received measurements and exclude information about the oldest ones. Depending on how these matrices and vectors are built (whether new information is added first and then the outdated is excluded, or the outdated is excluded first and then the new is added), two forms of evaluation are possible.
Receiving new information (adding a new measurement) results in a calculation of the estimate which, similarly to (8), can be written in recurrent form. Consider the modification of the current regression analysis algorithm used to maximize correntropy; introducing suitable designations for the weighted observation matrices, one arrives at expressions (14), (15). Applying the matrix inversion lemma to (14), (15), one can obtain, as already noted, two forms of calculations: one first accumulates information (includes the newly received signal x_{n+1}) and then discards the outdated information (the excluded signal x_{n−L+1}); the other proceeds in the reverse order.
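The two rank-one steps produced by the matrix inversion lemma can be sketched directly. The block below (our own illustration, using the Sherman-Morrison identity) shows that adding a new observation is a rank-one update of the inverse information matrix and discarding the oldest one is a rank-one downdate; neither requires a full matrix inversion.

```python
import numpy as np

# Sherman-Morrison update/downdate of P = (X^T X)^(-1): O(N^2) per step
# instead of a full O(N^3) inversion at each cycle of the sliding window.

def sm_update(P, x):
    """Inverse of (P^-1 + x x^T), given P."""
    Px = P @ x
    return P - np.outer(Px, Px) / (1.0 + x @ Px)

def sm_downdate(P, x):
    """Inverse of (P^-1 - x x^T), given P."""
    Px = P @ x
    return P + np.outer(Px, Px) / (1.0 - x @ Px)

rng = np.random.default_rng(2)
X = rng.normal(size=(12, 4))                  # 12 observations, N = 4
P = np.linalg.inv(X.T @ X)                    # inverse information matrix
x_new, x_old = rng.normal(size=4), X[0]

P2 = sm_downdate(sm_update(P, x_new), x_old)  # add new, then drop oldest
X2 = np.vstack([X[1:], x_new])                # the same sliding window directly
print(np.allclose(P2, np.linalg.inv(X2.T @ X2)))  # True
```

Performing the downdate before the update gives the second form of calculation mentioned in the text; algebraically the results coincide.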
Specifically, the refinement of the estimates and the calculation of the matrix when the outdated information is discarded are performed according to formulas (16), (17), and the ratios describing the accumulation of information take the form (18), (19); both updates involve terms of the form P x xᵀ P, as in RLSM. Thus, the recurrent evaluation algorithm, obtained by first excluding outdated information and then adding new information, is described by ratios (16) to (19).
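As a concrete, hedged illustration of this kind of sliding-window weighted estimator, the sketch below refits a window of the last L observations with exponential weights λ^(L−1−i) and a Gaussian (correntropy-style) weight on each error. It does not reproduce ratios (16) to (19); the function name, the iteratively reweighted solver, and all numeric values are our own illustrative choices.

```python
import numpy as np

# Hedged sketch: exponentially weighted, correntropy-weighted LS over one
# window of L observations (lambda, sigma, sweeps are assumed values).

def window_estimate(X, d, lam=0.95, sigma=1.0, sweeps=3):
    """Iteratively reweighted LS over a window of L observations."""
    L, N = X.shape
    w_time = lam ** np.arange(L - 1, -1, -1)   # newer samples weigh more
    theta = np.zeros(N)
    for _ in range(sweeps):                    # reweighting iterations
        e = d - X @ theta
        w = w_time * np.exp(-e ** 2 / (2.0 * sigma ** 2))
        Xw = X * w[:, None]
        theta = np.linalg.solve(Xw.T @ X, Xw.T @ d)
    return theta

rng = np.random.default_rng(3)
true_theta = np.array([1.5, -0.7])
X = rng.normal(size=(40, 2))
d = X @ true_theta + 0.05 * rng.normal(size=40)
d[5] += 30.0                                   # impulsive outlier in the window

theta_hat = window_estimate(X, d)
print(np.round(theta_hat, 2))
```

The Gaussian error weight suppresses the outlier inside the window, while the exponential factor implements the information-weighing mechanism described above.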

Algorithm convergence study
To determine the conditions for the convergence of algorithm (16) to (19), we shall introduce a Lyapunov function [44]. Subtracting c* from both parts of (18) and considering (10) to (12), we write down the algorithm in terms of the identification errors, where I is the N×N identity matrix.
On the other hand, taking into consideration (17), ratio (21) can be rewritten in terms of the matrices P_{n|L} and P_{n−L}. Similarly, writing down (18) in terms of the estimation errors and considering (19), we obtain a relation involving P_{n|L}, x_n, and x_{n−L}. Combining it with (23) and applying formulas (14), (15), we evaluate the increment of the Lyapunov function; to calculate the third term in the brackets, auxiliary designations are introduced. For the algorithm to converge, condition (30) must hold; that is, the expression in the right-hand part of (30) should be negative. Because λ ≤ 1, the first term is non-positive. For the second term to be negative, condition (31) must be met. For the first term of inequality (31), taking into consideration the introduced designations, we obtain expressions (32), (33). It should be noted that, since the Cauchy-Bunyakovsky inequality is satisfied, relation (34) also holds; given that (32) and (34) are met, A ≥ 0 and B ≥ 0. Considering the designations in (33) and (35), condition (31) can be written as (36). One can see from (36) that, in order to ensure the convergence of the algorithm, it is required that A and B exceed unity.
On the other hand, consider the difference B − A. Substituting expressions (33) and (35) for A and B, we find that both the numerator and the denominator of the resulting expression are non-negative, so B − A ≥ 0. Thus, to meet the convergence condition (37), or inequality (36), it is necessary that A ≥ 1. Consider the properties of the matrix P⁻¹, which is part of the Lyapunov function; its positive definiteness is required for the algorithm to converge. Consider the step-by-step change of P⁻¹ according to (15).
If the matrix P⁻¹ is positive definite at the n-th step, it remains so at the next one. Therefore, the Lyapunov function, if these conditions are met, is non-negative and bounded; that is, the identification error decreases as time increases. Consequently, the magnitude of the estimation error ||θ̃_{n+1}||² decreases over time.

Exploring the established regime
Finally, consider the established (steady-state) mode, in which the estimate is no longer adjusted, that is, c_{n+1} = c_n. Consider the singular values of the matrices P_{n+1|L} and P_{n−1|L}. As is known, for the maximum and minimum singular values σ_max, σ_min and the eigenvalues μ of a square matrix A, the relation σ_min ≤ |μ| ≤ σ_max holds. Since the matrices P_{n+1|L} and P_{n−1|L} are square and their maximum singular values σ_max are no less than their maximum eigenvalues μ_max, the corresponding bound on the steady-state relation follows, and the cross terms are bounded by the Cauchy-Bunyakovsky inequality. Because, as shown above, the eigenvalues of the observation matrix are nonzero, in the steady state c_{n|L} = c*; that is, the estimate obtained using the algorithm in question is unbiased.

Simulation of the process of evaluating the parameters of a linear object using multi-step algorithms
Two experiments were conducted. The first considered the task of building a stationary model, described by equation (1), with fixed parameters, using Gaussian kernels (2) with σ=1 for different values of the algorithm memory L. Fig. 1, a shows the simulation results for λ=1; Fig. 1, b shows those for λ=0.5. In the second experiment, it was assumed that in model (1) the parameters θ₃*, θ₅*, θ₈* changed according to a sinusoidal law, while the rest of the parameters remained the same as in the first experiment. The results of this experiment are shown in Fig. 2.
The above results demonstrate that when estimating the stationary parameters of model (1), it is advisable to increase the memory of the multi-step algorithm (bringing it closer to RLSM). If non-stationary parameters are estimated, one should choose a memory whose magnitude differs minimally from the dimensionality of the object (in our case, N=10 and L_opt=10).
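The memory-depth trade-off stated above can be reproduced with a simplified stand-in: a plain sliding-window LS fit (our own illustration, without the correntropy weighting, with our own scalar model and window sizes). A long memory averages away a sinusoidal parameter drift, while a short memory tracks it.

```python
import numpy as np

# Sketch of the trade-off: short memory tracks a sinusoidally varying
# parameter better; long memory is better only for constant parameters.

def sliding_ls(x, d, L):
    """Scalar sliding-window LS estimate at each step n (memory L)."""
    est = np.zeros(len(d))
    for n in range(len(d)):
        lo = max(0, n - L + 1)
        xs, ds = x[lo:n + 1], d[lo:n + 1]
        est[n] = (xs @ ds) / (xs @ xs)
    return est

rng = np.random.default_rng(4)
n = np.arange(2000)
x = rng.normal(size=2000)
theta_t = 1.0 + 0.5 * np.sin(2 * np.pi * n / 400)   # slowly varying parameter
d = theta_t * x + 0.1 * rng.normal(size=2000)

err_short = np.mean((sliding_ls(x, d, 20)[200:] - theta_t[200:]) ** 2)
err_long = np.mean((sliding_ls(x, d, 400)[200:] - theta_t[200:]) ** 2)
print(err_short < err_long)   # short memory tracks the drift better
```

With a constant theta_t the comparison reverses, matching the recommendation to increase L in the stationary case.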

Discussion of results of studying the convergence of a multi-step algorithm of correntropy maximization
The research presented in this paper continues and develops the earlier studies described in [17,28,44]. While [17,28] considered the issues related to obtaining robust estimates based on the minimization of combined functionals, [44] addressed the task of robust training of the Adaline neural network. The results reported in those works were applied here to study the properties of a generalized multi-step algorithm that maximizes the criterion of correntropy.
The developed recurrent ratios, describing a generalized modified multi-step algorithm to maximize the criterion of correntropy (16) to (19), make it possible to evaluate unknown parameters on-line as information becomes available.
According to the above formulae, the implementation of a given algorithm does not cause difficulties. Note that the initial values for the observation matrix should be selected in a similar way to RLSM.
The use of Lyapunov functions has made it possible to define the convergence condition for the multi-step algorithm, determined by expression (36). Based on the given analytical estimates of the convergence of the multi-step CRA algorithms, we derived expressions for selecting the optimal values of the algorithm parameters that ensure its maximum convergence rate; the resulting formulae indicate how the value of the estimation error depends on these parameters. It should be noted that the implementation of the resulting recurrent algorithm, described by ratios (16) to (19), does not cause difficulties; it is similar to the implementation of RLSM.
Although all theoretical results have been obtained for the case of stationary parameter estimation, the simulation results show the effectiveness of applying the maximum correntropy functional to the identification of linear non-stationary objects (experiment 2).
Here is what one needs to consider when applying this algorithm in practice. Based on our results, when estimating stationary parameters of model (1), it is advisable to increase the memory L of the multi-step CRA algorithm; that brings it closer to RLSM, which is optimal for the stationary case. If non-stationary parameters are estimated, the simulation shows that, given L ≥ N, one should select the memory L whose magnitude differs minimally from the dimensionality of the object N.
In addition, it should be noted that the estimates obtained in this work depend on the parameters used in the algorithm: σ (kernel width), 0 < λ ≤ 1 (information weighting parameter), and L = const (algorithm memory); the issue of selecting their values remains open. Therefore, when applying this algorithm in practice, one should rely on estimates of these parameters. The estimates derived from (40) to (42) allow the researcher to pre-evaluate the limits of the algorithm and the effectiveness of its application in solving practical tasks.
A limitation of our study is that it considers only the regular case (no interference), although this makes it possible to determine the limits of the algorithm. It would therefore seem appropriate to extend the approach to the case involving interference, to obtain appropriate statistical estimates.
In addition, continued research into the dynamic properties of this algorithm is of undoubted interest. It could make it possible to assess the effectiveness of the multi-step algorithm in evaluating the parameters of a non-stationary object (1) in the presence and absence of information about the type of non-stationarity.

Conclusions
1. The recurrent ratios have been derived that describe the generalized modified multi-step algorithm (CRA) maximizing the criterion of correntropy. These ratios make it possible to evaluate the unknown parameters of an examined object on-line, as information becomes available, and ensure that the estimates are robust. According to the derived formulae, the implementation of the algorithm does not cause difficulties. The initial values for the observation matrix should be selected in a similar way to RLSM.
2. The application of Lyapunov functions has made it possible to determine the condition for the convergence of the multi-step algorithm, described by an expression reduced to verifying the ratio between the quantities A and B. Based on the obtained analytical estimates of the convergence of multi-step CRA algorithms, we defined expressions for selecting the optimal values of the algorithm parameters, ensuring its maximum convergence rate. According to the derived formulae, the value of the estimation error depends on the eigenvalues of the observation matrix, so it is necessary to know these values.
3. Our study of the established estimation regime under the considered conditions showed that, because the eigenvalues of the observation matrix I − λP_{n+1|L}P⁻¹_{n−1|L} are nonzero, the estimates obtained are unbiased, that is, c_{n|L} = c*.
4. We have performed the simulation of the process of evaluating the stationary and non-stationary parameters of a linear object for different choices of the depth of the algorithm's memory. The simulation results form the basis for selecting the parameters of the algorithm when it is implemented. Based on the analysis of the simulation results, the following conclusions can be drawn. First, the maximum correntropy functional is effective enough for identifying linear stationary and non-stationary objects (experiment 2). Second, the choice of the depth of the algorithm's memory L = const (L ≥ N) differs for identifying stationary and non-stationary objects: in the first case, the memory needs to be increased; in the second case, reduced to a value minimally different from the dimensionality of the object N.