ESTIMATING PARAMETERS OF LINEAR REGRESSION WITH AN EXPONENTIAL POWER DISTRIBUTION OF ERRORS BY USING A POLYNOMIAL MAXIMIZATION METHOD

At the beginning of the 19th century, C. F. Gauss published several fundamental works that introduced, among other things, a series of important statistical concepts such as the least squares method and maximum likelihood, and substantiated the "normal" distribution. This distribution law, commonly referred to as Gaussian, is by far the most widely used probabilistic model. However, owing to a series of theoretical limitations, it does not describe the full variety of statistics encountered in practice. The axiomatic assumptions underlying Gauss's approach were relaxed at the beginning of the 20th century in work [1], where a more general law for the distribution of errors was justified, termed the exponential power distribution (EPD). This probabilistic model is better suited to describing a larger number of phenomena observed in reality. Its versatility is evidenced by the fact that the Laplace, Gaussian, and uniform distributions are special cases of EPD at certain values of its form parameter. Therefore, various literature sources [2‒5] use alternative designations such as the "generalized normal distribution", "generalized Gaussian distribution", "generalized error distribution", "Subbotin distribution", etc. In addition, the family of exponential power distributions includes more complex multimodal [6] and asymmetric [7, 8] modifications.

Copyright © 2021, S. Zabolotnii, V. Chotunov, A. Chepynoha, O. Tkachenko

The properties and practical applications of EPD are addressed in many studies. In particular, works [2, 3, 9, 10] consider methods of finding point estimates of EPD parameters, and work [11] interval estimates. Paper [12] derived the EPD characteristic function in the form of a Taylor series, and [5] investigated the higher-order statistics of this distribution.
The relevance of the problem under study is due to the fact that EPD is used as a probabilistic model in solving a large number of practical tasks. In particular, it is used to describe the error of measuring instruments and measurement results [13, 14]; random noise and interference in the processing of radio [15], video [16], and audio [17] signals; medical [18], biological [19], and economic [20] statistics; risk management [21], etc. Special features of approximating real data and testing statistical hypotheses on the adequacy of an EPD model are discussed in [22, 23]. Methods of generating random sequences for statistical modeling are tackled in works [24–26]. Such widespread EPD application gave rise to a series of specialized software packages for computer statistical simulation. Among them, we note the library "normalp" [27] for R, a high-level programming language that is among the most common with statisticians; it contains a set of functions for the statistical treatment of data drawn from EPD.

Literature review and problem statement
It is known that the least squares (LS) method is one of the most popular approaches to regression data analysis. This is because the ordinary least squares (OLS) method yields a solution in closed form. It requires no additional a priori information about the probabilistic properties of the error model and, when a series of conditions imposed by the Gauss-Markov theorem is met, is the best linear unbiased estimator. If, in addition, the regression errors are normally distributed, OLS estimates attain the minimum variance among unbiased estimators [28]. However, in real conditions the data distribution may deviate significantly from the Gaussian law, which substantially impairs the accuracy of OLS estimates, in particular in the presence of individual remote observations (outliers) [29].
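The closed-form OLS solution mentioned above can be sketched as follows; this is an illustrative example (the sample size, regressor ranges, and parameter values are assumptions, not taken from the paper):

```python
import numpy as np

# Sketch of the closed-form OLS solution theta_hat = (X^T X)^{-1} X^T y
# for the linear model y = X @ theta + errors; all values are illustrative.
rng = np.random.default_rng(0)
N, Q = 50, 3
X = np.column_stack([np.ones(N), rng.uniform(0, 10, size=(N, Q - 1))])
theta_true = np.array([1.0, 3.0, 1.0])
y = X @ theta_true + rng.normal(0, 1, N)

# Solve the normal equations X^T X theta = X^T y
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_ols)
```

A defining property of this estimator is that the residuals are orthogonal to the columns of X, which holds regardless of the error distribution.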
One technique to improve the accuracy of regression parameter estimates for non-Gaussian data is to replace the quadratic loss function with another one. This idea is not new: back in 1793 Laplace proposed an evaluation method based on least absolute deviations (LAD), known as the regression estimator in the L1 space [30]. Such an estimator, compared to OLS, is more efficient for leptokurtic error distributions whose tails are heavier than those of the Gaussian law. A generalization of this approach is the construction of regression estimators in the Lp space, advanced in works [20, 31–33]. Another approach that ensures robustness is the use of nonparametric sign [34] or quantile [29] regression estimates, which can be successfully applied to both heavy-tailed and asymmetric distributions. Each of the above loss functions has corresponding distribution models under which maximum likelihood estimation (MLE) is equivalent to minimizing that loss. In particular, the OLS estimator corresponds to MLE when the error distribution is normal; the LAD estimator is equivalent to MLE for the Laplace distribution; and the Lp estimator corresponds to MLE when regression errors are described by EPD [35].
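The family of Lp estimators described above can be sketched with a generic numerical minimizer; the helper name `lp_fit` and all data values are hypothetical, and a direct solver is used only to cross-check the p=2 (OLS) case:

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of an L_p regression estimator: minimize sum |y - X b|^p.
# p = 2 reproduces OLS; p = 1 gives the LAD estimator.
def lp_fit(X, y, p, b0=None):
    b0 = np.zeros(X.shape[1]) if b0 is None else b0
    loss = lambda b: np.sum(np.abs(y - X @ b) ** p)
    return minimize(loss, b0, method="Nelder-Mead",
                    options={"xatol": 1e-8, "fatol": 1e-8,
                             "maxiter": 20000}).x

rng = np.random.default_rng(1)
N = 60
X = np.column_stack([np.ones(N), rng.uniform(-5, 5, N)])
y = X @ np.array([2.0, 0.5]) + rng.laplace(0, 1, N)

b2 = lp_fit(X, y, p=2)                            # numerical OLS
b2_exact = np.linalg.lstsq(X, y, rcond=None)[0]   # closed-form OLS
print(b2, b2_exact)
```

For p near 1, the loss is non-smooth at zero residuals, which is why a derivative-free method is used in this sketch.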
Obviously, the key to using MLE is the a priori determination of the type of probability distribution. Note that, in addition to the EPD family, many other types of symmetric distributions are used in regression analysis: elliptical laws (logistic, Cauchy, Student's t-distribution) [36–38] and various Gaussian-type distributions [39, 40]. In addition, specially developed models of symmetric distributions make it possible to vary the kurtosis coefficient and the heaviness of the tails of regression errors [41, 42]. The task is generally complicated by the fact that, in addition to selecting an adequate error distribution model, it is necessary to jointly estimate its parameters, whose uncertainty significantly affects the accuracy of the estimates of the informative regression parameters. Under the assumption that model errors do not depend on the predictors, an adaptive approach [43] is often used, which involves a series of sequential steps. At the first stage, the parameters of the deterministic component of the model are estimated by a simple method that requires no additional information about the probability distribution of the data. After removing the deterministic component, the type of the random component of the model is identified and estimates of its parameters are found. At the third stage, refined (adaptive) estimates of the parameters of the deterministic component are found, taking into consideration the a posteriori properties of the errors.
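The three-stage adaptive scheme can be sketched as follows; the pilot estimator, the use of sample excess kurtosis for identification, and all data values are illustrative assumptions, and the third (refinement) stage is only indicated:

```python
import numpy as np
from scipy import stats

# Sketch of the adaptive scheme: (1) a simple pilot fit, (2) identification
# of the error distribution from the residuals, (3) a refined fit using the
# a posteriori error properties (indicated only). Values are illustrative.
rng = np.random.default_rng(4)
N = 300
X = np.column_stack([np.ones(N), rng.uniform(0, 10, N)])
y = X @ np.array([1.0, 3.0]) + rng.laplace(0, 1, N)

theta_pilot = np.linalg.lstsq(X, y, rcond=None)[0]  # stage 1: OLS pilot fit
resid = y - X @ theta_pilot                         # stage 2: residuals
excess_kurtosis = stats.kurtosis(resid)             # flags heavy tails
print(excess_kurtosis > 0)  # Laplace errors: positive excess kurtosis
# stage 3 would re-estimate theta using the identified error properties
```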
The adaptive approach can be based not only on likelihood maximization. An alternative is to use higher-order statistics, such as moments or cumulants, for the probabilistic description of regression models [44–47]. The application of this description in combination with the adaptive approach to estimation is the basis of the second-order modification of the least squares method [48]. Different variants of this estimator demonstrate their effectiveness only for asymmetric distributions, using as additional information the a posteriori estimates of the skewness and kurtosis coefficients of the regression residuals [49–51].
The principle of adaptive estimation that takes into consideration higher-order statistics, which underlies the second-order least squares (SLS) estimator, is similar to the idea of using the polynomial maximization method (PMM) to estimate regression parameters [52]. Paper [53] shows that, when power transformations are used as the basis functions, the PMM estimate of power S=2 demonstrates accuracy similar to that of the SLS estimates. A further common property is that, for symmetric data distributions, SLS (like PMM at S=2) degenerates into OLS. Therefore, it cannot be used to estimate regression parameters with symmetric errors of the EPD type [53, 54].
We also note a certain conceptual similarity of PMM to MLE: both methods maximize certain sample statistics in the vicinity of the true values of the estimated parameters. However, in contrast to MLE, PMM forms such statistics not from the distribution density but from a simpler probabilistic description in the form of a finite number of moments or cumulants. PMM has been successfully used to estimate the location parameter of symmetric [55–57] and asymmetric [58] distributions, to determine the change point (disorder) of the properties of random sequences in the a posteriori problem statement [59], and so on. In particular, works [60, 61] consider the use of the cubic (polynomial power S=3) modification of PMM to solve the problem of adaptive estimation of the scalar parameter of experimental data drawn from EPD. Their results suggest that PMM estimates may have significantly lower variances compared both to nonparametric estimates (median, mean, and mid-range) and to maximum likelihood estimates. However, since the linear multifactor regression model and the principles of estimating the components of its vector parameter are more complex than, and differ from, those discussed in the cited works, the effectiveness of the polynomial maximization method for such a task remains an open issue.

The aim and objectives of the study
The purpose of this study is to synthesize algorithms for the polynomial estimation of linear regression parameters and to analyze their accuracy depending on the properties of regression errors following an exponential power distribution. This would make it possible to make a reasoned choice (taking into consideration the specificity of real problems) between the proposed solution and approaches based on classical methods.
To accomplish the aim, the following tasks have been set:
- to apply the polynomial maximization method to synthesize algorithms for the adaptive estimation of linear regression parameters;
- to analyze the theoretical accuracy of PMM estimates depending on the values of the regression error parameters;
- to carry out, by statistical modeling, a comparative analysis of the effectiveness (by the criterion of the variance of estimates) of PMM estimates with respect to OLS and MLE.

Using a polynomial maximization method to evaluate regression parameters
Suppose there is a linear multifactor regression model of N observations that describes the dependence of the values of the target variable Y on the regressors X:

y_v = f(x_v, θ) + ξ_v, v = 1, N, (1)

where the deterministic component is the linear function

f(x_v, θ) = a_0 + a_1 x_1v + … + a_(Q-1) x_(Q-1)v (2)

with the vector parameter θ={a_0, a_1, …, a_(Q-1)}, whose components are subject to estimation. It is assumed that the regression errors ξ_v represent a sequence of independent and identically distributed random variables that are adequately described by the exponential power distribution

p(ξ) = β / (2 σ_β Γ(1/β)) exp(−|ξ/σ_β|^β), (3)

whose parameters are the scale σ_β and the form β, which may be a priori unknown.
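A minimal sketch of model (1)–(3) can be set up with scipy, whose `gennorm` distribution uses the same density family β/(2Γ(1/β))·exp(−|x|^β) up to the scale parameter; the regressor ranges and parameter values below are illustrative assumptions:

```python
import numpy as np
from scipy import stats

# Sketch of the regression model with EPD errors; values are illustrative.
rng = np.random.default_rng(2)
N, beta, sigma = 200, 3.0, 1.0
X = np.column_stack([np.ones(N), rng.uniform(0, 10, (N, 2))])
theta = np.array([1.0, 3.0, 1.0])

# EPD (generalized normal) errors with form beta and scale sigma
xi = stats.gennorm.rvs(beta, scale=sigma, size=N, random_state=rng)
y = X @ theta + xi

# beta=2 recovers the Gaussian special case: variance Gamma(3/2)/Gamma(1/2)
print(stats.gennorm.var(2.0))  # → 0.5
```

Setting β=1 yields the Laplace law and β→∞ approaches the uniform law, matching the special cases named in the introduction.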
To solve the stated regression analysis problem, one can use the PMM modification for the statistical estimation of vector parameters from non-identically distributed data [52]. It is based on the property that a functional in the form of a stochastic polynomial of order S attains its maximum in the vicinity of the true value of the estimated parameter a. As the basis functions of polynomial (4), one can use the power transformations φ_i(y_v) = y_v^i, i = 1, S.
It is assumed that for the random variables y_v there exists a sequence of initial moments α_i of the corresponding order, which are twice differentiable with respect to the parameter a.
By analogy with MLE, the estimate of the parameter a can be derived by solving the equation

Σ_(i=1..S) Σ_(v=1..N) k_iv(a) [y_v^i − α_iv(a)] = 0 at a = â. (6)

The optimal coefficients k_iv that maximize functional (4) and minimize the variance of the estimates (for the corresponding polynomial order S) are derived by solving the following system of linear algebraic equations:

Σ_(j=1..S) k_jv(a) [α_(i+j)v(a) − α_iv(a) α_jv(a)] = dα_iv(a)/da, i = 1, S. (7)


To find estimates of the vector parameter θ={a_0, a_1, …, a_(Q-1)}, it is necessary to use Q polynomials L_SN^(p), one for each component a_p of the vector parameter. In this case, every p-th stochastic polynomial L_SN^(p), as a function of the parameter a_p at known values of the other components of the vector, also has a maximum in the vicinity of the true value of this parameter at N→∞. Thus, the desired estimates can be found by solving a system of Q equations of form (8). It should be noted that when polynomials of power S≥2 are used, finding the PMM estimates of a vector parameter in most cases requires numerical methods for solving systems of nonlinear equations. Our work employs an approach based on the Newton-Raphson iterative numerical procedure. This numerical method is often used in similar situations when PMM is applied to estimate linear regression parameters [62, 63]. It is based on the principle of linearization: the left-hand side of each nonlinear equation of system (8) is expanded into a Taylor series in the neighborhood of the true value of the vector θ. Keeping only the first two terms of the series, one obtains the linear system in matrix form

θ^(k+1) = θ^(k) + J_S^(−1)(θ^(k)) F_S(Y/θ^(k)), (9)

which can be used for the iterative search of the estimates.
To derive system (9), it is necessary to calculate the column matrix F_S(Y/θ^(k)), composed of the left-hand sides of the nonlinear equations of system (8), and the square matrix J_S(θ^(k)), whose elements (10) describe the amount of extracted information. To start the iterative procedure, some initial approximation θ^(1) is assumed, for which "rough" estimates found by a simpler evaluation method can be selected [52].
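The Newton-Raphson iteration of form (9) can be sketched generically; the toy system of estimating equations below and its Jacobian are placeholders standing in for (8) and (10), not the paper's actual expressions:

```python
import numpy as np

# Sketch of the Newton-Raphson update used to solve a nonlinear system
# F(theta) = 0; F and jac below are illustrative placeholders.
def newton_raphson(F, jac, theta0, tol=1e-10, max_iter=50):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(jac(theta), F(theta))
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Toy system with root (1, 2): t0^2 + t1 = 3, t0 + t1^2 = 5
F = lambda t: np.array([t[0]**2 + t[1] - 3.0, t[0] + t[1]**2 - 5.0])
jac = lambda t: np.array([[2*t[0], 1.0], [1.0, 2*t[1]]])
root = newton_raphson(F, jac, [0.5, 1.5])
print(root)
```

As in the paper's scheme, the procedure needs a reasonable initial approximation; a poor starting point can send Newton iterations to a different root or cause divergence.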

Finding PMM estimates for the linear multifactorial regression parameters
Works [53, 54] show that, for symmetric distributions of the regression residuals, the PMM estimates derived using polynomials of orders S=1 and S=2 are equivalent to OLS estimates. Therefore, we consider finding the PMM estimates based on polynomials of order S=3. The first 6 initial moments necessary to build the system of equations (6) are expressed in (11) through the central moments of EPD (12); for the symmetric density (3) the odd-order central moments vanish, while the even-order ones take the form μ_r = σ_β^r Γ((r+1)/β) / Γ(1/β). To find the optimal coefficients that minimize the variance of the PMM estimates at S=3, expressions (13) for the derivatives of the first 3 moments are also required. Solving the system of equations (7) analytically by Cramer's rule (taking into consideration expressions (11) and (13)) yields the optimal coefficients (14). Substituting coefficients (14) into (8), after certain transformations, we obtain the system of equations (15) for finding the estimates. It is obvious that the PMM estimates can only be found through a numerical solution of this system, in particular using the Newton-Raphson iterative procedure (9), for which it is necessary to calculate the elements (16), (17). As an initial approximation, OLS estimates can be used, since their formation does not require additional information about the properties of the regression residuals.
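The even-order EPD central moments quoted above can be computed and cross-checked against scipy's `gennorm` distribution; the helper name `epd_moment` is illustrative:

```python
import numpy as np
from math import gamma
from scipy import stats

# Even central moments of EPD (odd moments vanish by symmetry):
# mu_r = sigma^r * Gamma((r+1)/beta) / Gamma(1/beta)
def epd_moment(r, beta, sigma=1.0):
    if r % 2:
        return 0.0
    return sigma**r * gamma((r + 1) / beta) / gamma(1 / beta)

# Cross-check mu_2 against scipy's gennorm variance for several shapes
for beta in (1.0, 2.0, 5.0):
    assert np.isclose(epd_moment(2, beta), stats.gennorm.var(beta))

print([round(epd_moment(r, 2.0), 4) for r in (2, 4, 6)])  # → [0.5, 0.75, 1.875]
```

At β=2 these values are the familiar Gaussian moments of variance 1/2, which is the sanity check for the formula.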

Accuracy of PMM estimates of the linear multifactorial regression parameters
Since, at S=1, the PMM estimates of the components of the vector parameter θ of linear regression model (1) are equivalent to OLS estimates, their accuracy in this case coincides. The covariance matrix containing the variances of such estimates takes the form

V_OLS = μ_2 (X^T X)^(−1), (18)

where μ_2 is the variance of the regression errors. It is known that such linear estimates are optimal (by the criterion of minimum variance) only when the errors of the regression model follow a Gaussian distribution, which is a special case of EPD (3) at β=2.
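The covariance matrix of form (18) can be sketched numerically, with μ_2 taken from the EPD moment formula; the design matrix and parameter values are illustrative assumptions:

```python
import numpy as np
from math import gamma

# Sketch of the OLS covariance matrix mu_2 * (X^T X)^{-1} with EPD errors.
rng = np.random.default_rng(3)
N, beta, sigma = 100, 3.0, 1.0
X = np.column_stack([np.ones(N), rng.uniform(0, 10, (N, 2))])

mu2 = sigma**2 * gamma(3 / beta) / gamma(1 / beta)  # EPD error variance
V_ols = mu2 * np.linalg.inv(X.T @ X)

# Diagonal entries are the asymptotic variances of the OLS estimates
print(np.diag(V_ols))
```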
For the analytical calculation of the variances of the parameter estimates obtained by the maximum likelihood method, we use the Fisher information matrix calculated for regression models with EPD [64]. Taking into consideration (18), the corresponding covariance matrix can be represented in form (19). To derive analytical expressions for the variances of the PMM estimates of the components of the vector parameter θ, we use the matrix J_S(θ) of the amount of extracted information obtained when applying stochastic polynomials of order S, consisting of elements (10). It has been proven in [52] that, as the order of the stochastic polynomial increases, the variance of PMM estimates decreases, while the amount of extracted information at S→∞ tends to the Fisher information. In this case, the variances of the PMM estimates of the components of the vector parameter θ in the asymptotic case (at N→∞) are the elements of the main diagonal of the covariance matrix, which is the inverse of J_S(θ).
Using expressions (12) for the EPD moments and (14) for the optimal coefficients, it can be shown that the elements of the covariance matrix of the PMM estimates at S=3 differ from the elements of the matrix (18) of linear OLS estimates by the coefficient (20). It is worth noting the versatility of expressions (20) and (19): they are equivalent to the analogous formulae describing the ratio of the variances of the MLE and PMM estimates to the variance of the linear estimate (arithmetic mean) of the EPD location parameter [61].
The analysis of these dependences, shown in Fig. 1, reveals that for leptokurtic (sharp-topped) distributions at β<2 the relative efficiency of MLE can be significantly higher than that of the other methods. In the case of the Gaussian distribution (β=2), the accuracy of all three methods (MLE, PMM, and OLS) is the same. For form parameter values β>2, over a sufficiently wide range, the theoretical efficiency of MLE and PMM hardly differs but can significantly exceed that of OLS. For essentially platykurtic (flat-topped) distributions with large β, the relative efficiency of MLE again increases slightly relative to PMM.
Note that the variance reduction coefficient (20) can be represented not only as a function of the EPD form parameter but also as a dependence (21) on the values of the central moments μ_r or the dimensionless cumulant coefficients γ_r. This representation makes it possible to estimate the efficiency of PMM estimates more simply, depending on the degree of non-Gaussianity of the regression errors [54].

Statistical modeling by Monte Carlo method
Based on statistical modeling by the Monte Carlo method, we experimentally verified the theoretical results of our analysis of the effectiveness of various methods for estimating the parameters of a linear regression model with EPD errors. Similar to works [60, 61], the implementation employed the R language with the "normalp" package. This package contains a set of functions for generating random values from EPD, as well as for the statistical estimation of linear regression parameters with EPD errors by the maximum likelihood method [27].
Since the theoretical values of the variance ratio coefficients (20) and (19) are the same for all components of the vector parameter θ, we used as their quantitative empirical analogs the geometric mean (22) of the component-wise ratios of the empirical variances. It should be noted that statistical modeling was carried out for two fundamentally different situations with respect to the availability of a priori information on the characteristics of the random component of the regression model. In the first (idealized, from a practical point of view) case, the values of the parameters of EPD (3) were considered a priori known and were used in finding the PMM estimates. This information also made it possible to calculate, based on (12), the values of the even-order moments up to the 6th order required to find the PMM estimates at power S=3. Obviously, in real conditions such a priori information is practically absent. Therefore, in the second case, an adaptive approach is used. As mentioned above, it is based on the hypothesis that the distribution of the random component of the regression model is similar to that of the regression residuals obtained using linear estimation methods, in particular OLS. Thus, instead of the a priori parameter values necessary for the use of both MLE and PMM, their a posteriori estimates found from the OLS regression residuals can be used. This approach is asymptotically approximate, and its accuracy improves as the sample size N increases.
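The empirical efficiency measure of form (22) can be sketched as a geometric mean of component-wise variance ratios; the variance arrays below are hypothetical placeholders, not values from the paper's tables:

```python
import numpy as np

# Sketch of the empirical analog of the variance ratio coefficient:
# g_hat = (prod_q Var(a_q_est) / Var(a_q_ref))^(1/Q)
def geometric_ratio(var_est, var_ref):
    r = np.asarray(var_est, dtype=float) / np.asarray(var_ref, dtype=float)
    return float(np.exp(np.mean(np.log(r))))

var_pmm = [0.008, 0.0009, 0.0011]  # hypothetical Monte Carlo variances
var_ols = [0.010, 0.0010, 0.0012]
print(round(geometric_ratio(var_pmm, var_ols), 3))  # → 0.871
```

A value below 1 indicates that, averaged over the Q components, the first estimator has lower variance than the reference one.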
For the statistical modeling of parameter estimation, the deterministic component of regression model (2) was a two-factor linear dependence with the parameter vector θ={1, 3, 1}, whose components were considered unknown and subject to estimation. Tables 1, 2 give a set of statistical modeling results for different values of the EPD form parameter β=1, 2, 3, 5, 10 and sample sizes N=20, 50, 100, 200, obtained from M=10^4 repeated experiments.

Experiment with real data on regression dependences with EPD errors
To illustrate the feasibility of the proposed approach, we use an example involving real data, considered in [27, 64], which also investigates the use of EPD as a model of the errors of linear regression. That set, taken from [65], contains data on box office revenue, in millions of dollars (Gross), and the number of home videos sold, in thousands (Video), for a sample of 30 movies. The task is to determine the dependence of the number of videos sold on box office revenue, which is adequately described by linear regression (Fig. 3, a).
The parameters of this regression model were estimated by OLS and, using the functionality of the "normalp" package [27], by the maximum likelihood method with visualization of the results (Fig. 3, b). The analysis of the theoretical and experimental values of the variance ratio coefficients shows a certain agreement between the analytical calculations and the results obtained through statistical modeling. The difference is due to the fact that the expressions describing the variances of the MLE and PMM estimates were obtained for the asymptotic case (N→∞). Data from Table 1 confirm that, as the sample size increases, the discrepancy between the theoretical and experimental data decreases.
The experimental results also confirm the thesis mentioned above that the accuracy of the MLE and PMM estimates is significantly influenced by the presence or absence of a priori information on the properties of the error model: for MLE, the parameters of EPD; for PMM, the even-order central moments up to the 6th order. Data from Table 2 reflect the important fact that, in the absence of a priori information about the properties of EPD, for platykurtic (flat-topped) distributions (β>2) the efficiency of PMM estimates may be higher than that of the MLE estimates. This can be explained by the fact that, under these conditions, the impact of the uncertainty of the EPD form parameter on MLE is more significant than the impact of the uncertainty of the central moments on PMM. At the same time, the relative decrease in the variance of PMM estimates (compared to MLE) depends both on the form parameter and on the sample size and is especially significant for small samples. Such results are especially important from a practical point of view, since in most real situations a priori information about the true values of the parameters of the random component of the regression model is absent.
Based on the set of our results, the following conclusions can be drawn about the relative effectiveness of PMM (in the absence of a priori information about the values of the regression error parameters):
1. When the distribution of regression errors is close to Gaussian (β=2), the most effective estimates (both in terms of accuracy and ease of implementation) are those of the least squares method.
2. For leptokurtic (sharp-topped) distributions with heavy tails, the maximum likelihood method is generally more effective. For small sample sizes, the efficiency boundary lies near β=1.3 and smoothly increases (with growth in N) to an asymptotic value of β=1.7.
3. For platykurtic (flat-topped) distributions, over a sufficiently wide range of values of the form parameter β, the best efficiency is demonstrated by the polynomial maximization method. For small sample sizes, the lower efficiency boundary (between OLS and PMM) begins at approximately β=3 and asymptotically tends (with growth in N) to β=2.
4. For large values of the form parameter (β>6, which corresponds to probability distributions close to the uniform law) and large sample sizes, the maximum likelihood method again becomes more effective. The boundary separating MLE and PMM depends even more strongly on the sample size, but the difference in their accuracy (with respect to the variance of estimates) is only a few percent.
It should be noted that the current work is an integral part of the study on the feasibility of using the polynomial maximization method to find parameter estimates for statistical data with EPD. Its results correlate to a large degree with the conclusions drawn in [60, 61], which consider the simpler task of estimating the EPD location parameter using PMM.
In general, the results reported here make it possible to choose the most effective among the analyzed methods for estimating linear regression parameters depending on the sample size and the properties of the regression errors, in case they differ from the Gaussian idealization but are symmetric and adequately described by the EPD model.
The next stage of our research may address modifying the proposed approach to estimate the parameters of nonlinear regression, as well as estimating the parameters of autoregressive models.

Conclusions
1. The algorithm for finding adaptive estimates of linear regression parameters synthesized using the polynomial maximization method is less resource-intensive than the estimation algorithm based on the maximum likelihood method. The simplification is due to the fact that, instead of identifying and evaluating the parameters of the distribution of regression errors, it uses the computationally simpler estimates of the central moments up to the 6th order.
2. The theoretical accuracy (in terms of the variance of estimates) of the polynomial maximization method is close to the accuracy of the maximum likelihood method and can significantly exceed the accuracy of the least squares method. All three methods are characterized by the same accuracy only if the regression errors follow the Gaussian distribution (as a special case of EPD).
3. The results of our statistical modeling indicate that, in the practically important situation where a priori information about the values of the regression error parameters is absent (for platykurtic EPD), the PMM estimates may demonstrate the lowest relative variance. At the same time, the maximum possible increase in accuracy is up to 60 % with respect to OLS, and up to 10 % with respect to MLE.