IDENTIFICATION OF NON-STATIONARY OBJECTS WITH NON-GAUSSIAN INTERFERENCE

Received date 04.09.2019

Introduction
The identification problem is not only of interest in itself, but is also an integral part of the general optimization problem. Many problems of management, forecasting, pattern recognition, etc., are associated with the construction of a model of the following form:

$$y(k) = \theta^{*T} x(k) + \xi(k), \quad (1)$$

where $y(k)$ is the observed output signal; $x(k)$ is the vector of input signals; $\theta^*$ is the vector of unknowns $M \times 1$; $\xi(k)$ is interference. Such problems reduce to minimizing some pre-selected quality functional (identification criterion). However, the identification problem is significantly complicated if the parameters $\theta^*$ change (drift) over time, i.e. $\theta^*(k) = \mathrm{var}$. The quadratic functional most widely used in practice leads to various identification algorithms that allow obtaining estimates of the sought vector $\theta^*$ under normal interference distributions, i.e. $\xi(k) \sim N(0, \sigma_\xi^2)$. Most of the available identification methods rely on strict and difficult-to-test conditions associated with the hypothesis of normality of the interference distribution law and justified by references to the central limit theorem. As is known, the normal distribution law describes interference present in measurements carried out under absolute stability of measurement conditions, whereas Laplace's law, which has longer "tails", describes interference occurring under maximum instability of conditions. Accordingly, identification algorithms in the case of Gaussian interference are based on the least squares (LS) method, and in the case of interference distributed according to Laplace's law, they are based on the least absolute deviations (LAD) method. Both of these methods are optimal under their own conditions, yet the solutions obtained with their help may differ greatly. Furthermore, since in practice these extreme cases are very rarely realized, neither Gauss's law nor Laplace's law usually holds.
In this regard, it seems very relevant to develop an approach to robust estimation of non-stationary parameters using some combined functional, which allows combining LS and LAD.
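The contrast between LS and LAD described above can be illustrated on the simplest possible problem, estimating a constant from repeated measurements. The sketch below is only an illustration under that simplifying assumption (it is not the identification algorithm studied in this paper); the function names and the sample values are hypothetical.

```python
import statistics

def ls_location(samples):
    """LS estimate of a constant: the value minimizing sum((y - c)**2) is the mean."""
    return statistics.fmean(samples)

def lad_location(samples):
    """LAD estimate of a constant: a value minimizing sum(|y - c|) is the median."""
    return statistics.median(samples)

if __name__ == "__main__":
    # Five measurements near 1.0 and one gross outlier (hypothetical data).
    samples = [0.9, 1.1, 1.0, 0.95, 1.05, 50.0]
    print("LS  (mean):  ", ls_location(samples))
    print("LAD (median):", lad_location(samples))
```

A single outlier shifts the LS estimate far from the bulk of the data, while the LAD estimate stays close to it, which is the motivation for combining the two criteria.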

Literature review and problem statement
To estimate non-stationary parameters, modified LS algorithms (with a sliding window, exponential smoothing, etc.), the Kaczmarz algorithm proposed in [1] and its modifications, dynamic adaptation algorithms, etc. are commonly used. In particular, the Kaczmarz algorithm has been studied in sufficient detail in [2-6]. Modifications of this algorithm aim at increasing computational stability and improving dynamic properties. Thus, in [2], the modified (regularized) Kaczmarz algorithm was studied; in [3], its multi-step modification was considered; in [4], weighting of estimates to increase the speed of this algorithm was proposed. In [5], analytical expressions for asymptotic and non-asymptotic estimates were obtained, and expressions for the optimal values of the relaxation parameter of the Kaczmarz algorithm providing its maximum convergence rate were determined. In [6], a randomized version of the Kaczmarz method for consistent, overdetermined linear systems was proposed, and it was proved that it converges in expectation at an exponential rate. In [7,8], the efficiency of the Kaczmarz algorithm in estimating non-stationary parameters described by the first-order Markov model was studied.
It should be noted that both the algorithm and all the modifications mentioned are based on the use of a quadratic identification (estimation) criterion, i.e., they are varieties of the least squares (LS) method. Although LS is the optimal estimation method under Gaussian interference, it is not robust under non-Gaussian interference. This is because in this case the objective function can grow without bound and outliers can become the dominant measurements, distorting the estimate of the real model. To ensure robustness, the objective function is therefore modified to limit the influence of the largest measurements. The main consequence of this is generally a lower convergence rate of the optimization algorithms, because distinguishing between outliers and useful measurements at the initial stage is very difficult. As a result, some useful measurements may be filtered out together with the outliers, leading to a decrease in the convergence rate. In the most difficult case, small but biased measurements shift the minimum of the objective function.
If it is known that the interference ξ belongs to a certain class of distributions, then by minimizing the optimal criterion, which is the negative logarithm of the interference distribution density, the maximum likelihood estimate (M-estimate) can be obtained. If there is no such information, then to estimate the desired parameter vector θ, one should apply some non-quadratic criterion that ensures the robustness of the resulting estimate. One such criterion is the modular criterion, which leads to a sign algorithm. Application of this criterion to the problem of object identification with impulse interference was considered in [9-12]. Thus, [9] studied the efficiency of the affine projection algorithm, and [10] used the variable-gain affine projection algorithm. It should be noted that sign algorithms, while providing robustness of the obtained estimates, have a low convergence rate. Therefore, in order to accelerate the estimation process, a normalized sign identification algorithm was proposed and studied in [11]. In [12], an easy-to-implement algorithm is studied, which uses the RMS error and the estimated interference power to correct the step size. The theoretical results on its stationary behavior, obtained for the case of Gaussian input signals, are in good agreement with the experimental results. Similarly, [13] considers the Kaczmarz algorithm with a variable gain depending on the squared cross-correlation between the squared output error and the adaptive model output and shows its effectiveness in solving some noise reduction problems.
There are a fairly large number of functionals that provide robust M-estimates. The most common are the combined functionals proposed by Huber [14,15] and Hampel [16,17]. They consist of a quadratic functional that ensures the optimality of estimates for the Gaussian distribution and a modular one that allows obtaining a more robust estimate for distributions with heavy "tails" (outliers).
These functionals (ρ) and their influence functions (ψ) are piecewise functions of the estimation error e.
It should be noted that M-estimates are usually described by setting an influence function rather than a minimized functional.
The Huber function ψ is monotonic, and the Hampel function ψ is nonmonotonic. As noted in [18], with heavy-tailed distributions, the use of nonmonotonic ψ functions improves the estimation results.
The effectiveness of these functionals depends on how well the constants a, b, c and d included in them are chosen, which determine the degree of interference immunity. In the above studies, it is recommended to choose the value of a from the interval $[\sigma, 2\sigma]$, where σ is the standard deviation of observation x, and to set the values of b, c and d to 1.5, 3.5, and 8, respectively.
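For reference, the influence functions discussed above can be sketched in their standard textbook forms: the Huber ψ is linear up to the threshold a and constant beyond it, and the three-part Hampel ψ is linear up to b, constant up to c, decreases linearly to zero at d, and vanishes afterwards. The code below is a minimal sketch assuming these standard forms; the exact scaling used in [14-17] may differ, and the function names are hypothetical.

```python
import math

def huber_psi(e, a):
    """Standard Huber influence function: e for |e| <= a, a*sign(e) otherwise."""
    if abs(e) <= a:
        return e
    return math.copysign(a, e)

def hampel_psi(e, b=1.5, c=3.5, d=8.0):
    """Standard three-part redescending Hampel influence function.

    Defaults follow the values b=1.5, c=3.5, d=8 quoted in the text.
    """
    r = abs(e)
    if r <= b:
        return e                                        # quadratic zone
    if r <= c:
        return math.copysign(b, e)                      # modular (constant) zone
    if r <= d:
        return math.copysign(b * (d - r) / (d - c), e)  # descending zone
    return 0.0                                          # gross outliers are rejected

if __name__ == "__main__":
    for err in (0.5, 2.0, 5.75, 10.0):
        print(err, huber_psi(err, a=1.0), hampel_psi(err))
```

The Huber function never decreases, whereas the Hampel function returns to zero for large errors, which is the monotonic versus nonmonotonic distinction made above.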
A modification of the Huber estimates is the Mallows estimates [19]. Along with weighting of the residuals ε, weighting of the factors is performed, which makes it possible to reduce the influence of points that stand out sharply in the space of independent variables. To find the Mallows estimates, it is necessary to solve ρ+1 equations.
Because the modular criterion provides an estimate that is less sensitive to the tails of the interference distribution than the LS estimate, a variety of the modular criterion, the A. Forsythe functional [20], is of interest:

$$F[e(k)] = |e(k)|^{\lambda}, \quad (4)$$

where $0 < \lambda < 2$. Forsythe estimates (4) are close to Huber ones, yet lack such a convincing theoretical justification. It is empirically shown that λ=1.5 is acceptable. For λ=2, the Forsythe estimates coincide with the LS estimates, and for λ=1 we obtain the least absolute deviations (LAD) method, which minimizes the functional

$$F[e(k)] = |e(k)|. \quad (5)$$

To obtain Merrill-Schweppe estimates, the functional of [20] is used. As follows from the above formulas and as noted above, the effectiveness of the obtained robust estimates substantially depends on the numerous parameters used in the criteria and selected based on the researcher's experience.
The practical application of the considered functionals for solving the identification problem was considered in many works. In particular, in [21][22][23][24] the robust approach was applied to the identification of nonlinear systems. For this purpose, radial basis function networks [21,22], evolving networks [23], and evolving radial basis function networks [24] were used. Learning of these networks was carried out on the basis of minimizing the robust functionals considered above.
Another approach to obtaining robust estimates devoid of this drawback is the use of a combined criterion.
A combined estimation criterion that accelerates the identification process by combining a quadratic criterion with a fourth-degree criterion was proposed and studied in [25] and further developed in [26-30]. In [26], the stability of the algorithm under Gaussian input signals was investigated. In [27], normalization of the least mean fourth algorithm was proposed, which protects the algorithm from divergence when the input signal power increases, and an approximate stability boundary of this algorithm was obtained. In [28], the problem of stability of the adaptive least mean algorithm was considered, and normalization of the update term of the weight vector using the fourth order in the regressor and the second order in the estimation error was proposed. This improves the stability of the algorithm as the variance of the input signal grows and for different types of input signal distribution. In [29], the problem of increasing the stability of the least mean algorithm was also studied in the context of adaptive interference reduction, and it was shown under what conditions the algorithm minimizing the fourth-degree criterion is superior to the Kaczmarz algorithm. In [30], the mean-square convergence of the least mean fourth algorithm for various cases, including non-Gaussian interference distributions, was analyzed. However, the analysis assumes the presence of a reference zero-mean Gaussian signal, which is not always possible.
In [31], the fourth degree criterion was replaced with the least absolute deviations criterion, which made it possible to ensure the robustness of the obtained estimates under impulse interference conditions. The normalized modification of the identification algorithm considered in [32] was studied in [33,34], where the presence of impulse interference was also taken into account.
In [35], the use of an adaptive combination of two normalized filters to obtain robust estimates in the identification problem was studied. It should be noted that this criterion proved to be very effective and much easier to implement in the identification procedure.

The aim and objectives of the study
The aim of the work is to study the convergence of gradient algorithms of identification of non-stationary parameters described by the first-order Markov model under non-Gaussian interference and to determine parameters of the algorithms ensuring their maximum convergence rate.
To achieve the aim, the following objectives were set:
- to obtain analytical estimates of mean and mean-square convergence of the gradient minimization algorithm of the combined functional;
- to determine the maximum attainable (asymptotic) values of parameter estimation errors and identification errors in the considered conditions.

Obtaining analytical estimates of convergence of robust identification procedure
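Note that to ensure robustness of the obtained estimates, it is quite effective to use the combined learning functional (6) of [31,32], which mixes the quadratic and modular criteria. In it, the error is formed from the output signal of the object and the output signal of the model, the latter being computed with the vector of estimated parameters $N \times 1$; $\gamma$ is the parameter affecting the convergence rate of the algorithm; $\lambda \in [0,1]$ is the mixing parameter.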
When using the combined criterion (6), gradient minimization yields the procedure (7). This procedure combines the properties of LS with those of LAD: for $\lambda = 1$, (7) reduces to the LS algorithm, and for $\lambda = 0$, to the LAD algorithm (5). It thereby allows dealing with impulse interference. By varying the parameter λ, one can change the properties of the algorithm.
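One step of such a gradient procedure can be sketched as follows. This is a minimal sketch under explicit assumptions, not a reproduction of the paper's formulas (6) and (7): the combined functional is assumed to be scaled as λe²/2 + (1-λ)|e|, so that its gradient term is λe + (1-λ)sign(e), and the function name `combined_step` is hypothetical.

```python
def sign(v):
    """Sign function returning -1, 0 or 1."""
    return (v > 0) - (v < 0)

def combined_step(theta, x, y, gamma, lam):
    """One gradient step for the assumed functional lam*e**2/2 + (1-lam)*|e|.

    theta : current parameter estimates (list of floats)
    x     : input vector at this step
    y     : observed output of the object
    gamma : step size, lam : mixing parameter in [0, 1]
    Returns the updated estimates and the error before the update.
    """
    y_hat = sum(t * xi for t, xi in zip(theta, x))   # model output
    e = y - y_hat                                    # identification error
    g = lam * e + (1.0 - lam) * sign(e)              # LS part plus sign (LAD) part
    return [t + gamma * g * xi for t, xi in zip(theta, x)], e

if __name__ == "__main__":
    theta, e = combined_step([0.0, 0.0], [1.0, 2.0], 3.0, gamma=0.1, lam=0.8)
    print(theta, e)
```

With `lam=1.0` the update depends on the error linearly (gradient LS), and with `lam=0.0` only on its sign (sign algorithm), so a large outlier moves the estimates by a bounded amount in the second case.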
To obtain analytical estimates in the non-stationary case, it was assumed, as in [7,8], that the non-stationary parameters of an object can be represented by the first-order Markov model (8). An assumption is also made about the distribution of the components of the estimation error vector; the mean of this distribution is expressed through the Gaussian error function.
In view of (8), the expression for $e(k)$ can be rewritten in terms of the parameter estimation error. In the resulting relations, $M$ is the symbol of mathematical expectation and $\|\cdot\|$ is the Euclidean norm. Consider the convergence of the procedure (7). To this end, we introduce a Lyapunov function and its increment (13). From (13) it can be seen that, since $\gamma > 0$, the increment of the Lyapunov function is negative only under a restriction on $\gamma$. Thus, the convergence condition of the procedure (7) will hold if the parameter $\gamma$ satisfies the corresponding inequality. The optimal value of the parameter $\gamma$ is determined from the equation obtained by differentiating (13) with respect to $\gamma$ and equating the derivative to zero. We now examine the statistical properties of the learning procedure (7) in the presence of measurement interference.
Suppose that the interference is not correlated with the useful signals. Rewriting (7) in terms of the learning errors, we obtain (12).
Consider the expectation of the estimation error. Averaging both sides of (12), we obtain (17). It is easy to see that the resulting expression involves $R_{xx}$, the correlation matrix of the input signal. We consider in detail the expression containing the sign of the error, in which $\sigma_e$ is the RMS value of the error $e(k)$. The expression (14) was obtained using Price's theorem [37], according to which, for two Gaussian random variables $x$ and $y$ with zero expectations, the following is true:

$$M\{x \, \mathrm{sign}(y)\} = \sqrt{\frac{2}{\pi}} \, \frac{M\{xy\}}{\sigma_y},$$

where $\sigma_y$ is the RMS value of $y$. Taking into account the properties of the interference, the condition for convergence of the procedure (7) in the mean follows. Consider the Lyapunov function. Taking the mathematical expectation of both sides of (13), we obtain (23).
Consider each term on the right-hand side of (23), taking into account the statistical properties of the signals and interference. The term containing the sign of the error is analyzed by analogy with (14), and the remaining terms are obtained similarly. Substituting the expressions (19), (24)-(28) into (23) gives the relation (29). It follows from (29) that the procedure (7) will converge in mean-square (the increment of the Lyapunov function will be negative) if the corresponding condition on $\gamma$ holds.
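The identity from Price's theorem used in this derivation, $M\{x\,\mathrm{sign}(y)\} = \sqrt{2/\pi}\, M\{xy\}/\sigma_y$ for zero-mean jointly Gaussian $x$ and $y$, can be checked numerically. The following Monte Carlo sketch is only a sanity check with hypothetical parameter values (correlation `rho`, standard deviation `sigma_y`, sample size, seed); it is not part of the derivation.

```python
import math
import random

def sign(v):
    """Sign function returning -1, 0 or 1."""
    return (v > 0) - (v < 0)

def price_check(rho, sigma_y, n=200_000, seed=1):
    """Compare a sample mean of x*sign(y) with sqrt(2/pi)*E[xy]/sigma_y.

    x has unit variance, y has standard deviation sigma_y,
    and their correlation coefficient is rho, so E[xy] = rho*sigma_y.
    """
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        y = sigma_y * z1
        x = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        acc += x * sign(y)
    empirical = acc / n
    predicted = math.sqrt(2.0 / math.pi) * (rho * sigma_y) / sigma_y
    return empirical, predicted

if __name__ == "__main__":
    print(price_check(rho=0.6, sigma_y=2.0))
```

The two returned values should agree up to Monte Carlo error, which shrinks roughly as the inverse square root of `n`.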

Determination of maximum attainable (asymptotic) values of parameter estimation errors and identification errors
The above relation (29) allows obtaining an expression for the asymptotic estimation error. Substitution of (30) into (11) yields an expression for the asymptotic identification error.
The relations obtained give non-asymptotic and asymptotic estimates for the gradient least squares algorithm ($\lambda = 1$) and the gradient least absolute deviations algorithm ($\lambda = 0$). Thus, the condition of mean-square convergence (29) of the gradient least squares algorithm reduces to an inequality in $\gamma$, $\lambda$, $\beta$ and $\sigma_x$, from which we get the inequality for the parameter $\gamma$ that coincides with the result obtained in [38]. Moreover, the asymptotic error is determined by a relation from which it follows that the parameter $\gamma$ should be selected as variable and tend to zero with increasing $k$, that is, satisfy the Dvoretzky conditions [39].
For the gradient least absolute deviations algorithm, obtained from (30) with $\lambda = 0$, the inequality for $\gamma$ takes the corresponding form. Here the parameter $\gamma$ should also be selected as variable and tend to zero with increasing $k$, i.e., satisfy the Dvoretzky conditions. We note that in [23,24] the question of choosing the variable mixing parameter λ, that is, $\lambda(k)$, was considered.
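For orientation, step-size requirements of this kind are commonly written in the stochastic-approximation form below. This is a sketch of the usual sufficient conditions, assumed here for illustration; the exact statement in [39] is more general and is not reproduced, and $\gamma_0$ is a hypothetical positive constant.

```latex
\gamma(k) > 0, \qquad
\sum_{k=1}^{\infty} \gamma(k) = \infty, \qquad
\sum_{k=1}^{\infty} \gamma^{2}(k) < \infty,
\qquad \text{for example } \gamma(k) = \frac{\gamma_0}{k}.
```

The first sum guarantees that the algorithm can reach the solution from any initial point, while the second ensures that the influence of interference on the estimates dies out.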

Simulation of capabilities of investigated algorithms
For the experimental study of the capabilities of the algorithm (7), identification of a linear object (FIR filter [25,26]) described by equation (1) was performed. When testing the robustness of the algorithms, independent Gaussian interference with a much larger amplitude was added to the output signal of the object to simulate "outliers" in the system (impulse interference). An example of such interference is shown in Fig. 1. The simulation results are presented in Fig. 2-5, where graphs of the model parameters adjustment are shown on the left and identification error changes on the right. Fig. 2 reflects the result of identification of a linear system with λ=1 in (7) and no measurement interference. Fig. 3-5 show the results of identification with impulse interference using the algorithm (7) for λ=1, 0.8, 0.6, respectively. When λ=0, i.e., when using the sign algorithm, the identification process diverges.
Fig. 2-5. Identification results of the algorithm (7): a - result of parameters adjustment; b - identification error
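An experiment of this kind can be sketched as follows. All numerical settings here are hypothetical, since the original simulation parameters are not available in the text: the true FIR coefficients, step size, noise levels, impulse probability and drift variance are illustrative only. The drift is modeled as a random walk (one simple instance of a first-order Markov model), and the update assumes the functional scaled as λe²/2 + (1-λ)|e|.

```python
import random

def sign(v):
    """Sign function returning -1, 0 or 1."""
    return (v > 0) - (v < 0)

def simulate(lam, gamma=0.05, steps=2000, theta0=(1.0, -0.5, 0.25),
             noise_sigma=0.0, impulse_prob=0.0, impulse_sigma=0.0,
             drift_sigma=0.0, seed=0):
    """Identify a (possibly drifting) FIR filter; return the final squared parameter error."""
    rng = random.Random(seed)
    theta_true = list(theta0)          # hypothetical true coefficients
    n = len(theta_true)
    theta = [0.0] * n                  # initial estimates
    window = [0.0] * n                 # tapped delay line of the input
    for _ in range(steps):
        window = [rng.gauss(0.0, 1.0)] + window[:-1]
        # Random-walk drift of the true parameters (assumed Markov model).
        theta_true = [t + rng.gauss(0.0, drift_sigma) for t in theta_true]
        # Background Gaussian noise plus occasional large-amplitude Gaussian impulses.
        xi = rng.gauss(0.0, noise_sigma)
        if rng.random() < impulse_prob:
            xi += rng.gauss(0.0, impulse_sigma)
        y = sum(t * x for t, x in zip(theta_true, window)) + xi
        e = y - sum(t * x for t, x in zip(theta, window))
        g = lam * e + (1.0 - lam) * sign(e)
        theta = [t + gamma * g * x for t, x in zip(theta, window)]
    return sum((a - b) ** 2 for a, b in zip(theta_true, theta))

if __name__ == "__main__":
    for lam in (1.0, 0.8, 0.6):
        err = simulate(lam, noise_sigma=0.1, impulse_prob=0.05,
                       impulse_sigma=10.0, drift_sigma=0.01)
        print(f"lambda={lam}: final squared parameter error = {err:.4f}")
```

Running the script for several values of `lam` and several seeds gives a rough picture of how the mixing parameter trades convergence speed against sensitivity to impulses; a single run is not representative.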

Discussion of the results of the study of the identification algorithm of a non-stationary object with non-Gaussian interference
The studies presented in the paper are a continuation and development of previous studies described in [5][6][7]. In these works, rigorous results were obtained for the Kaczmarz and Nagumo-Noda algorithms in the identification of non-stationary objects under Gaussian interference. These results were used to study the properties of the gradient algorithm of identification of a non-stationary object with non-Gaussian measurement interference.
As the research results showed, the use of the combined functional for object identification under non-Gaussian interference allows obtaining robust estimates. Conditions of convergence of the gradient algorithm of identification of a non-stationary object with non-Gaussian measurement interference determined by the expression (30) were obtained. In addition, the relation for the maximum attainable (asymptotic) estimation error is derived as the formula (31). As shown in the paper, the obtained relations yield the non-asymptotic and asymptotic estimates for the gradient least squares algorithm (λ=1) and gradient least absolute deviations algorithm (λ=0). The experimental results shown in Fig. 1-5 indicate the effectiveness of the developed approach and feasibility of using the combined functional in solving practical problems.
The obtained estimates are fairly general and depend both on the degree of object non-stationarity and on the statistical characteristics of useful signals and interference. In addition, expressions for the asymptotic values of the parameter estimation error and asymptotic accuracy of identification are determined. Since these expressions contain a number of unknown parameters (the variances of the signals $\sigma_x^2$ and interference $\sigma_\xi^2$ and the degree of object non-stationarity $\sigma_s^2$), estimates of these parameters should be used for their practical application. Thus, in on-line identification, it is possible to apply any recurrent procedure for estimating them and use the resulting estimates to refine the parameters included in the algorithms. In off-line identification, the result obtained after all calculations should be corrected. In addition, the asymptotic values of the estimation error and identification accuracy depend on the choice of the mixing parameter $\lambda \in [0,1]$. It should be noted that the estimates obtained in this paper allow the researcher, when solving practical problems, to preliminarily evaluate the capabilities and efficiency of this algorithm. However, the question of the optimal choice of the value of the mixing parameter $\lambda \in [0,1]$ remains open. Therefore, it seems important to conduct research in the direction of: 1) studying the effectiveness of the developed approach in identifying non-stationary objects using a model different from the first-order Markov model to describe non-stationarity; 2) establishing the dependence of the speed of the identification algorithm on the degree of non-stationarity of the investigated object; 3) developing recommendations for choosing the optimal value of the mixing parameter λ or rules for its correction; 4) studying the effectiveness of such an approach in identifying objects under conditions of not only impulse interference, but also interference having, for example, asymmetric distributions.
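As one example of a recurrent procedure for the unknown variances mentioned above, an exponentially weighted estimate can be updated at every step. This is a minimal sketch of one possible choice, not the procedure used by the authors; the function name and the forgetting factor `alpha` are hypothetical, and the signal is assumed to have zero mean.

```python
def update_variance(prev, sample, alpha=0.05):
    """One step of an exponentially weighted estimate of E[sample**2].

    prev   : previous variance estimate
    sample : new zero-mean observation (e.g. an input sample or a residual)
    alpha  : forgetting factor in (0, 1]; larger values track changes faster
    """
    return (1.0 - alpha) * prev + alpha * sample * sample

if __name__ == "__main__":
    v = 0.0
    for s in (1.0, -2.0, 0.5, 1.5, -1.0):
        v = update_variance(v, s)
    print(v)
```

The same recursion can be applied to the input signal and to the identification error, and the resulting estimates substituted into the expressions for the step size.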

Conclusions
1. Conditions are determined and analytical estimates of the mean and mean-square convergence rate of the gradient algorithm of identification of the parameters described by the first-order Markov model with non-Gaussian measurement interference are obtained.
2. Non-asymptotic and asymptotic estimates of estimation accuracy and identification error values are obtained, which allow determining the maximum attainable properties of the algorithm. These estimates are quite general and depend both on the degree of object non-stationarity $\sigma_s^2$ and on the statistical characteristics of useful signals and interference, as well as on the choice of the mixing parameter λ. It is intuitively clear that the efficiency of choosing this parameter depends on the characteristics of the problem to be solved. Therefore, it is not possible to develop general recommendations regarding the choice. However, it seems advisable to develop recommendations for choosing this parameter for a number of distribution classes.