Development of Modern Models and Methods of the Theory of Statistical Hypothesis Testing

Typical problems of the theory of statistical hypothesis testing are considered. All these problems belong to the same object area and are formulated in a single system of axioms and assumptions using a common linguistic thesaurus. However, each of these problems is currently solved by a different approach, with its own solution method developed for it. In this regard, the work proposes a unified methodological approach for formulating and solving these problems. The mathematical basis of the approach is the theory of continuous linear programming (CLP), which generalizes the known mathematical apparatus of linear programming to the continuous case. The mathematical apparatus of CLP allows passing from a two-point description of the solution of the problem in the form {0; 1} to a continuous one on the segment [0; 1]. Theorems justifying the solution of problems in terms of CLP are proved. The problems of testing a simple hypothesis against several equivalent or unequal alternatives are considered. To solve all these problems, a continuous function is introduced that specifies a randomized decision rule leading to continuous linear programming models. As a result, it becomes possible to expand the range of analytically solved problems of the theory of statistical hypothesis testing. In particular, these include the problem of making a decision based on the maximum power criterion with a fixed type I error probability or with a constraint on the average risk, and the problem of testing a simple hypothesis against several alternatives for given type II error probabilities. A method for solving problems of statistical hypothesis testing for the case when more than one observed controlled parameter is used to identify the state is proposed.


Introduction
Methods for solving numerous problems of identifying the state of objects and making decisions stem from the emergence and development of an important area of mathematical statistics, the theory of statistical hypothesis testing. As is known, a statistical hypothesis is any statement about the form or properties of the distribution of experimentally observed random variables. The mathematical theory of statistical hypothesis testing was created in the 1930s and was significantly developed in a number of fundamental works [1][2][3]. At the same time, the mathematical formulation and solution of various practical problems of statistical hypothesis testing do not have a single methodological basis and remain an art. This circumstance is a consequence of the insufficient methodological basis of the theory of statistical hypothesis testing, the development of which has received virtually no attention. The lack of a universal approach to solving various problems of this theory leaves its mathematical framework incomplete. This seriously complicates the search for and development of approaches to new challenges arising from the needs of practice. Thus, the problem of developing a general universal approach to solving problems of the theory of statistical hypothesis testing is urgent. Using examples of specific problems of the theory of statistical hypothesis testing, we consider the traditional techniques for solving them.

Literature review and problem statement
The works [1][2][3] consider the classical problem of testing a simple hypothesis $H_0$ against a simple alternative $H_1$.
When solving such hypothesis testing problems, it is necessary to construct a statistical criterion that allows deciding, on the basis of observation results, to what degree they agree with the hypothesis $H_0$. Usually, such a criterion is constructed using a critical region: when some function of the observations (or the observations themselves) falls into it, the hypothesis $H_0$ is rejected. The problem is to select this region in the best way.
Let the observed parameter $X \in \Omega$ be a random variable with the distribution density $f_0(x/H_0)$ if the hypothesis $H_0$ is true, and $f_1(x/H_1)$ if the alternative hypothesis $H_1$ is true.
Let us introduce the critical region $w \subset \Omega$: when the observed parameter $X$ falls into it, the hypothesis $H_0$ is rejected (i.e., the hypothesis $H_1$ is accepted). Then the probability of rejecting the hypothesis $H_0$ when it is true, called the criterion significance level (type I error probability), is determined by the expression:
$$\alpha = \int_{w} f_0(x/H_0)\,dx. \tag{1}$$
In this case, the probability of accepting the hypothesis $H_0$ when the hypothesis $H_1$ is true,
$$\beta = \int_{\Omega \setminus w} f_1(x/H_1)\,dx, \tag{2}$$
is called the type II error probability. Thus, the values of the type I and II error probabilities depend on how the critical region $w$ is chosen. It is clear that expanding the region $w$ increases $\alpha$ but decreases $\beta$, while narrowing the critical region has the opposite effect. It is natural to pose the problem of determining the critical region that is the best in some chosen sense.
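The trade-off between the two error probabilities can be checked numerically. The sketch below is only an illustration under assumptions that are not part of the original text: both densities are taken to be normal with unit variance (means 0 under the main hypothesis and 1 under the alternative), and the critical region is assumed to be one-sided, w = {x > t}. The function names are hypothetical.

```python
import math


def normal_cdf(x, mean=0.0, std=1.0):
    """Cumulative distribution function of N(mean, std^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def error_probabilities(t, mean0=0.0, mean1=1.0, std=1.0):
    """Type I and II error probabilities for the critical region w = {x > t}.

    The normal densities and the one-sided shape of w are illustrative
    assumptions, not taken from the paper.
    """
    alpha = 1.0 - normal_cdf(t, mean0, std)  # P(X in w | H0 is true)
    beta = normal_cdf(t, mean1, std)         # P(X not in w | H1 is true)
    return alpha, beta


if __name__ == "__main__":
    # Lowering t expands w: alpha grows while beta shrinks.
    for t in (2.0, 1.5, 1.0, 0.5, 0.0):
        alpha, beta = error_probabilities(t)
        print(f"t={t:.1f}  alpha={alpha:.4f}  beta={beta:.4f}")
```

Under these assumptions, moving the threshold t to the left expands the critical region, so the printed type I error probability grows while the type II error probability falls.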
Usually, the problem of testing a simple hypothesis $H_0$ against a simple alternative $H_1$ comes down to finding a critical region $w$ for which the probability
$$1 - \beta = \int_{w} f_1(x/H_1)\,dx \tag{3}$$
of rejecting the hypothesis $H_0$ when the hypothesis $H_1$ is true (criterion power) takes the highest value, provided that the type I error probability is equal to the set value $\alpha$. This decision criterion is called the Neyman-Pearson criterion [1][2][3].
Along with the Neyman-Pearson criterion, the so-called Bayesian criteria based on average risk calculation are used in the statistical decision theory [4,5]. Let it be known a priori that the hypotheses $H_0$ and $H_1$ are true with probabilities $P(H_0)$ and $P(H_1)$, respectively, $P(H_0) + P(H_1) = 1$. Then the average risk, depending on the choice of the critical region $w$, is determined as follows:
$$R(w) = r_0 P(H_0) \int_{w} f_0(x/H_0)\,dx + r_1 P(H_1) \int_{\Omega \setminus w} f_1(x/H_1)\,dx, \tag{4}$$
where $r_0$, $r_1$ are some quantitative cost (risk) estimates of type I and II errors, respectively. Now the problem of making a decision according to the Bayesian minimum average risk criterion is to choose a critical region for the observed parameter $X$ at which the function (4) takes a minimum value.
A natural generalization of the given formulations of the classical problems of the theory of statistical hypothesis testing consists in abandoning the discrete nature of the criteria (acceptance or rejection of the hypothesis). A more general formulation of the hypothesis testing problem is as follows.
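The decision rule $A(x)$ is introduced: if the observed parameter has the value $x$, then the hypothesis $H_0$ is rejected with the probability $A(x)$. Such a rule is called randomized. The function $A(x)$ obviously must satisfy the condition:
$$0 \le A(x) \le 1, \quad x \in \Omega. \tag{5}$$
Then the criterion power is written as:
$$1 - \beta = \int_{\Omega} A(x) f_1(x/H_1)\,dx, \tag{6}$$
the significance level is:
$$\alpha = \int_{\Omega} A(x) f_0(x/H_0)\,dx, \tag{7}$$
and the average risk function takes the form:
$$R(A) = r_0 P(H_0) \int_{\Omega} A(x) f_0(x/H_0)\,dx + r_1 P(H_1) \int_{\Omega} \left(1 - A(x)\right) f_1(x/H_1)\,dx = r_1 P(H_1) + \int_{\Omega} A(x)\left[r_0 P(H_0) f_0(x/H_0) - r_1 P(H_1) f_1(x/H_1)\right]dx. \tag{8}$$
The relations obtained allow formulating the following typical problems of choosing the optimal randomized decision rule.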
Problem 1. Find the function $A^*(x)$ that minimizes the functional (8) and satisfies the condition (5). In this case, the criterion determines the minimum average risk.
Problem 2. Find the function $A^*(x)$ that maximizes the functional (6) and satisfies the constraints (5), (7). In this case, the criterion power is maximized at a given level of significance.
Problem 3. Find the function $A^*(x)$ that maximizes the functional (6), satisfies the condition (5) and the relation $R(A) \le R_0$, where $R(A)$ is the average risk (8) and $R_0$ is the allowable level of average risk. In this case, the criterion power is maximized for a given risk.
Problems 1 and 2 are the usual two-alternative problems of testing a simple hypothesis in a traditional formulation [6]. However, the optimal decision rule is sought here in a class that is more general than usual. Problem 3 is not considered by the classical theory of statistical decisions.
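To make the role of a randomized rule more tangible, the following sketch evaluates the significance level, the power and the average risk of a given rule by numerical integration. It is an illustration under stated assumptions rather than the authors' procedure: the densities are assumed normal with unit variance and means 0 and 1, the average risk is assumed to be the prior-weighted sum of the two error costs, and all function names and default parameter values are hypothetical. The example also shows that all three quantities depend linearly on the rule, which is what later allows a linear programming treatment.

```python
import math


def normal_pdf(x, mean):
    """Density of N(mean, 1); the normal model is an illustrative assumption."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def integrate(g, lo=-10.0, hi=11.0, n=4000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


def functionals(A, p0=0.5, r0=1.0, r1=1.0):
    """Return (significance, power, average risk) of the randomized rule A.

    A(x) is the probability of rejecting H0 at x. The risk is assumed to be
    r0*P(H0)*(type I error) + r1*P(H1)*(type II error).
    """
    p1 = 1.0 - p0
    size = integrate(lambda x: A(x) * normal_pdf(x, 0.0))
    power = integrate(lambda x: A(x) * normal_pdf(x, 1.0))
    risk = r0 * p0 * size + r1 * p1 * (1.0 - power)
    return size, power, risk


def threshold_rule(x):
    """Nonrandomized rule: reject H0 whenever x > 0.5."""
    return 1.0 if x > 0.5 else 0.0


def coin_rule(x):
    """Fully randomized rule: reject H0 with probability 0.5 for any x."""
    return 0.5


def mixed_rule(x):
    """Equal-weight mixture of the two rules above."""
    return 0.5 * threshold_rule(x) + 0.5 * coin_rule(x)


if __name__ == "__main__":
    for name, rule in (("threshold", threshold_rule),
                       ("coin", coin_rule),
                       ("mixed", mixed_rule)):
        size, power, risk = functionals(rule)
        print(f"{name:9s} size={size:.4f} power={power:.4f} risk={risk:.4f}")
```

The values printed for the mixed rule are the averages of the values for the two rules it mixes, since each functional is linear in the rule.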
Attempts to formulate some problems of statistical hypothesis testing using a randomized criterion are known. For example, in [7] the problem of maximizing the criterion power, provided that the type I error probability does not exceed a given value, is formulated as follows. For a given sample of observations $X^{(n)}$, a function $\varphi(X^{(n)})$, called critical, is introduced, which determines the probability of rejecting the hypothesis $H_0$. However, that work gives no indication of how to find such a function. More examples of using randomized decision rules to formulate various problems of statistical hypothesis testing can be given. Some of these works propose an approximate solution, in which discretizing the decision space reduces the problem to applying a nonrandomized rule [8,9]; in the rest, the consideration is limited to a correct formulation [10][11][12][13]. However, there is no general approach to solving the problems arising here. This circumstance is a consequence of the specific nature of the mathematical models of problems of type (5)-(8). By nature, these are isoperimetric problems of the calculus of variations. But this mathematical apparatus cannot be used here, since the desired function enters the integrands of the optimized functional and of the constraints linearly. As a result, the Euler equation does not contain the desired function and cannot be solved for it.
Thus, the use of randomized decision rules allows a uniform formulation of traditional problems of statistical hypothesis testing, reducing each of them to solving the corresponding optimization problem. The need to improve the methodological basis of the theory of statistical hypothesis testing is emphasized in [14,15]. At the same time, it is important to extend the idea of randomization to more complex problems arising with several equivalent or unequal alternatives.
These circumstances motivate further research on the problem of statistical hypothesis testing.

The aim and objectives of the study
The aim of the study is to develop a general approach to the formulation of problems of statistical hypothesis testing using randomized decision rules. This allows making a reasonable choice of the method for solving optimization problems.
To achieve the aim, the following objectives were set:
- to develop a general mathematical model of the problems of the theory of statistical hypothesis testing using a randomized criterion;
- to develop a mathematical apparatus for optimizing the randomized criterion, adapted for solving traditional problems of statistical hypothesis testing;
- to develop mathematical models and methods for solving an extended set of problems of the theory of statistical hypothesis testing.

Development of a general approach to the formulation of problems of the theory of statistical hypothesis testing
Problems 1-3 formulated in terms of (5)-(8) fit into a uniform mathematical model defined as follows: find the function $y(x)$ that maximizes (minimizes) the functional (9) and satisfies the conditions (10), (11).
A fundamental feature of the obtained mathematical models of the typical problems of the theory of statistical hypothesis testing is the need to find a continuous function that satisfies the system of integral constraints. Solving such a problem requires a special mathematical apparatus.

Development of a general method for solving problems of the theory of statistical hypothesis testing
The resulting problem (9)-(11) is a particular case of the general problem of continuous linear programming (CLP) [16].
CLP is a mathematical discipline dealing with the theory and methods of solving constrained optimization problems on a set of continuous functions. In this case, the optimized (objective) function of the problem and the constraints on it are described by integrals (Riemann or Stieltjes) that are linear with respect to the desired function of the continuous argument. Thus, CLP is a continuous generalization of conventional linear programming [17,18]. To solve the CLP problem, a special mathematical apparatus has been developed [16], which provides an iterative solution. However, in some simple special cases, this solution is achieved directly.
After introducing auxiliary functions, the problem (9)-(11) is simplified to the form: find the function $A^*(x)$ that maximizes (minimizes) the functional (12) and satisfies the conditions (13), (14).
First, we solve an even simpler problem: find the function $A^*(x)$ that maximizes the functional (12) and satisfies only the condition (14). To solve this problem, we use Theorem 1.
Theorem 1. The solution of the problem (12), (14) is given by the relation (15).
Set the function $c(x)$ as follows:
$$c(x) = r_0 P(H_0) f_0(x/H_0) - r_1 P(H_1) f_1(x/H_1).$$
In this case, the functional (12) takes the form:
$$\int_{\Omega} A(x)\left[r_0 P(H_0) f_0(x/H_0) - r_1 P(H_1) f_1(x/H_1)\right]dx,$$
coinciding, up to a constant, with the average risk function (8), and the function $A(x)$, in accordance with (14), (15), takes the meaning of the rule for rejecting the hypothesis $H_0$ in problem 1. In accordance with Theorem 1, the function $A^*(x)$ that minimizes the average risk is determined by the inequality (15). Thus, the average risk is minimized when $A^*(x) = 1$ wherever $c(x) < 0$. In this case, the inequality holds:
$$\frac{f_1(x/H_1)}{f_0(x/H_0)} > \frac{r_0 P(H_0)}{r_1 P(H_1)}. \tag{17}$$
The obtained solution coincides with the classical Bayesian decision rule, according to which the hypothesis $H_0$ is rejected if the inequality (17) is satisfied (the likelihood ratio exceeds a given threshold).
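The equivalence between the sign of the weight function and the likelihood ratio threshold can be checked pointwise. The sketch below assumes, purely for illustration, normal densities with unit variance and means 0 and 1, the prior P(H0) = 0.7 and the costs r0 = 1, r1 = 2; the weight function is assumed to be the difference of the cost-weighted densities, and all names are hypothetical.

```python
import math

P0, R0_COST, R1_COST = 0.7, 1.0, 2.0   # assumed prior P(H0) and error costs
P1 = 1.0 - P0


def normal_pdf(x, mean):
    """Density of N(mean, 1); the normal model is an illustrative assumption."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def weight(x):
    """Assumed weight c(x) = r0*P(H0)*f0(x) - r1*P(H1)*f1(x)."""
    return R0_COST * P0 * normal_pdf(x, 0.0) - R1_COST * P1 * normal_pdf(x, 1.0)


def sign_rule(x):
    """Reject H0 (return 1.0) exactly where rejecting lowers the risk."""
    return 1.0 if weight(x) < 0.0 else 0.0


def likelihood_ratio_rule(x):
    """Reject H0 when the likelihood ratio exceeds the Bayesian threshold."""
    ratio = normal_pdf(x, 1.0) / normal_pdf(x, 0.0)
    threshold = (R0_COST * P0) / (R1_COST * P1)
    return 1.0 if ratio > threshold else 0.0


if __name__ == "__main__":
    grid = [k / 100.0 for k in range(-500, 501)]
    same = all(sign_rule(x) == likelihood_ratio_rule(x) for x in grid)
    print("rules agree on the grid:", same)
```

For these assumed values the two rules switch from acceptance to rejection at the same point, slightly above x = 0.65.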
Let us now return to the original problem: find the function $A^*(x)$ that maximizes the functional (12) and satisfies the conditions (13), (14). The solution of the problem is determined by Theorem 2.
Theorem 2. Introduce the subset (18), where $\lambda$ is the root of the equation (19). Then the solution of the problem (12)-(14) is given by the relation (20).
Let us now give an interpretation of the obtained solution of the problem (12)-(14) in standard terms of the theory of statistical hypothesis testing. We take $f_1(x/H_1)$ as the weight function of the functional (12), and $f_0(x/H_0)$ and $\alpha$ as the weight function and the right-hand side of the constraint (13) (relations (21)). Then the relations (12) and (13) take the form of (6) and (7), respectively, and problem (12)-(14) coincides with problem 2 (maximizing the criterion power at a given level of significance). Now, taking into account (21), we define the meaning of the decision rule (20). In these terms:
$$A^*(x) = \begin{cases} 1, & f_1(x/H_1) > \lambda f_0(x/H_0), \\ 0, & f_1(x/H_1) \le \lambda f_0(x/H_0), \end{cases}$$
where $\lambda$ is the root of the equation:
$$\int_{\{x:\ f_1(x/H_1) > \lambda f_0(x/H_0)\}} f_0(x/H_0)\,dx = \alpha. \tag{22}$$
In accordance with (20)-(22), the hypothesis $H_0$ should be rejected if the likelihood ratio exceeds a given threshold. In this case, the maximum value of the criterion power is achieved for a given level of significance. This decision coincides with the Neyman-Pearson decision.
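The threshold on the likelihood ratio can be found numerically. The following sketch is a minimal illustration and not the iterative CLP procedure of [16]: it assumes normal densities with unit variance and means 0 and 1, approximates the integrals by the midpoint rule on a finite grid, and finds the threshold by bisection on its logarithm. All names and grid parameters are hypothetical.

```python
import math


def normal_pdf(x, mean):
    """Density of N(mean, 1); the normal model is an illustrative assumption."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def size_and_power(lam, lo=-10.0, hi=11.0, n=4000):
    """Significance and power of the rule 'reject H0 where f1(x) > lam*f0(x)'.

    Both integrals are approximated by the midpoint rule on [lo, hi].
    """
    h = (hi - lo) / n
    size = 0.0
    power = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        f0 = normal_pdf(x, 0.0)
        f1 = normal_pdf(x, 1.0)
        if f1 > lam * f0:
            size += f0 * h
            power += f1 * h
    return size, power


def neyman_pearson_threshold(alpha, iters=60):
    """Bisection on log(lam); the size does not increase as lam grows."""
    lo, hi = math.log(1e-6), math.log(1e6)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if size_and_power(math.exp(mid))[0] > alpha:
            lo = mid   # the rule rejects too often: raise the threshold
        else:
            hi = mid   # the size constraint holds: try a lower threshold
    return math.exp(hi)


if __name__ == "__main__":
    lam = neyman_pearson_threshold(0.05)
    size, power = size_and_power(lam)
    print(f"lambda={lam:.3f}  size={size:.4f}  power={power:.4f}")
```

Because the grid is finite, the attained size matches the requested level only up to the probability mass of one grid cell; a finer grid tightens the match.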
Theorem 2 also gives an optimal solution to problem 3. For this, the relations in (12)-(14) must be specified so that the problem takes the form: find the function $A^*(x)$ that maximizes the criterion power (6) and satisfies the condition (5) and the constraint on the average risk (8). The solution of this problem is then given by the rule (20) defined by Theorem 2.
Thus, it is shown that the traditional problems of the theory of statistical hypothesis testing can be formulated uniformly and solved by continuous linear programming methods.
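A numerical illustration of problem 3 can follow the same pattern. The sketch below rests on several assumptions that are not stated in the text: normal densities with unit variance and means 0 and 1, equal priors and unit costs, an average risk written as the risk of never rejecting plus the integral of the rule times a weight function, and a rule of the form "reject where f1(x) > mu*c(x)" with a nonnegative multiplier mu found by bisection. It is a sketch of one way to apply a Lagrange-multiplier threshold rule, not a reproduction of the authors' formulas.

```python
import math

P0, R0_COST, R1_COST = 0.5, 1.0, 1.0   # assumed prior P(H0) and error costs
P1 = 1.0 - P0


def normal_pdf(x, mean):
    """Density of N(mean, 1); the normal model is an illustrative assumption."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def power_and_risk(mu, lo=-10.0, hi=11.0, n=4000):
    """Power and average risk of the rule 'reject H0 where f1(x) > mu*c(x)'.

    c(x) = r0*P(H0)*f0(x) - r1*P(H1)*f1(x) is the assumed risk weight.
    """
    h = (hi - lo) / n
    power = 0.0
    risk = R1_COST * P1          # risk of the rule that never rejects H0
    for k in range(n):
        x = lo + (k + 0.5) * h
        f0 = normal_pdf(x, 0.0)
        f1 = normal_pdf(x, 1.0)
        c = R0_COST * P0 * f0 - R1_COST * P1 * f1
        if f1 > mu * c:
            power += f1 * h
            risk += c * h
    return power, risk


def max_power_multiplier(risk_bound, mu_max=1e6, iters=60):
    """Smallest mu whose rule keeps the risk within risk_bound (bisection)."""
    if power_and_risk(0.0)[1] <= risk_bound:
        return 0.0               # rejecting everywhere is already admissible
    if power_and_risk(mu_max)[1] > risk_bound:
        raise ValueError("risk bound is below the attainable minimum risk")
    lo, hi = 0.0, mu_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if power_and_risk(mid)[1] > risk_bound:
            lo = mid             # risk too high: weight the constraint more
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    mu = max_power_multiplier(0.35)
    power, risk = power_and_risk(mu)
    print(f"mu={mu:.3f}  power={power:.4f}  risk={risk:.4f}")
```

With the assumed values, the admissible risk 0.35 lies between the Bayesian minimum risk and the risk of always rejecting, so the constraint is active and the resulting power exceeds that of the minimum-risk rule.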

Generalization of the proposed method for solving the problem of statistical hypothesis testing
The proposed mathematical apparatus made it possible to significantly expand the set of solved complex problems of the theory of statistical hypothesis testing. This, in particular, concerns the problem of testing one main hypothesis against several alternatives.
We formulate a mathematical model of such a problem. Let $x$ be a random variable with a distribution density $f(\theta, x)$, $x \in \Omega$, depending on a discrete parameter that can take one of the values $\theta \in \{\theta_0, \theta_1, \ldots, \theta_m\}$. The problem is to find a rule according to which, on the basis of the observation $x$, the main hypothesis $H_0$, which states that the obtained measurement belongs to the distribution $f(\theta_0, x)$, is accepted or rejected. The decision is made using the randomized decision rule $0 \le B(x) \le 1$, which sets the probability of rejecting the hypothesis $H_0$ when the observed value $x$ is realized.
Acceptance (rejection) of the main hypothesis is accompanied by the possibility of errors. If the hypothesis $H_0$ is rejected when it is true, then this is the type I error.
On the other hand, if the hypothesis $H_0$ is accepted in a situation where the hypothesis $H_i$, $i = 1, 2, \ldots, m$, is actually true, then this is the $i$-th type II error. By analogy with the two-alternative case, in a multi-alternative situation, the concept of criterion power is introduced as the probability of rejecting the hypothesis $H_0$ when any of the alternative hypotheses is true. The above relations make it possible to expand the set of problems of statistical hypothesis testing solved by continuous linear programming methods.
Problem 4. Making a decision according to the criterion of minimum average risk: find the function $B^*(x)$ that minimizes the average risk functional.
Problem 5. Making a decision according to the criterion of maximum power for a fixed unconditional type I error probability: find the function $B^*(x)$ that maximizes the criterion power functional and satisfies the constraint on the type I error probability.
Problem 6. Making a decision according to the criterion of maximum power with a limited average risk: find the function $B^*(x)$ that maximizes the criterion power functional and satisfies the constraint on the average risk.
Problem 7. Testing the hypothesis $H_0$ against $m$ simple alternatives $H_i$, $i = 1, 2, \ldots, m$, according to the criterion of the minimum type I error probability for given conditional type II error probabilities: find the function $B^*(x)$ that minimizes the type I error probability functional and satisfies the constraints on the conditional type II error probabilities.
In all these problems, the rule must also satisfy $0 \le B(x) \le 1$. The resulting optimization models belong to the class of continuous linear programming problems with a two-sided constraint on the values of the plan function, the methods for solving which are considered in [11].
At the same time, problems 4, 5, and 6 are the simplest problems of continuous linear programming and do not differ in structure from problems 1, 2, 3 discussed above.
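As an illustration of why problem 4 has the same structure as problem 1, the sketch below builds a minimum-risk rule for one main hypothesis and two alternatives. Everything specific in it is an assumption made for the example: the densities are normal with unit variance, the means, priors and costs are arbitrary, and the average risk is assumed to be the prior-weighted sum of the type I cost and the costs of the individual type II errors. Under that assumption the risk is again the integral of the rule times a single weight function, so the rule is chosen pointwise by the sign of that weight.

```python
import math

MEANS = (0.0, -1.5, 2.0)     # assumed theta_0 and two alternatives
PRIORS = (0.5, 0.25, 0.25)   # assumed P(H0), P(H1), P(H2)
COSTS = (1.0, 1.0, 3.0)      # assumed cost of type I and of each type II error


def normal_pdf(x, mean):
    """Density of N(mean, 1); the normal model is an illustrative assumption."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def reject_weight(x):
    """Change of the risk integrand when H0 is rejected at the point x."""
    f = [normal_pdf(x, m) for m in MEANS]
    saved = sum(COSTS[i] * PRIORS[i] * f[i] for i in range(1, len(MEANS)))
    return COSTS[0] * PRIORS[0] * f[0] - saved


def optimal_rule(x):
    """Reject H0 (return 1.0) wherever rejection lowers the average risk."""
    return 1.0 if reject_weight(x) < 0.0 else 0.0


def average_risk(B, lo=-12.0, hi=12.0, n=4000):
    """Average risk of the rule B, midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    never_reject = sum(COSTS[i] * PRIORS[i] for i in range(1, len(MEANS)))
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += B(x) * reject_weight(x) * h
    return never_reject + total


if __name__ == "__main__":
    print("never reject :", round(average_risk(lambda x: 0.0), 4))
    print("always reject:", round(average_risk(lambda x: 1.0), 4))
    print("sign rule    :", round(average_risk(optimal_rule), 4))
```

The pointwise rule cannot do worse than either trivial rule, because it adds to the risk only the negative part of the weight function.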
Problem 7 is more complex: it takes into account possible differences in the risk of different type II errors, which arise if the main hypothesis is accepted in a situation where $H_i$, $i = 1, 2, \ldots, m$, is actually true. This is important, since in practice it often matters which of the alternative hypotheses is actually true when the error is made. However, this problem also fits into the general mathematical model of canonical problems of continuous linear programming.
Finally, consider a natural generalization of the problem of testing the main hypothesis against several alternatives in the case when all alternatives are equivalent.
Let, as before, $x$ be a random variable with the distribution density $f(\theta, x)$, $x \in \Omega$ (or a multivariate density vector), depending on a discrete parameter that can take one of the values $\theta \in \{\theta_1, \theta_2, \ldots, \theta_m\}$. Acceptance or rejection of the hypothesis $H_i$, $i = 1, 2, \ldots, m$, is accompanied by errors. If the hypothesis $H_i$ is rejected when it is true, then this is the $i$-th type I error. The number of such errors is $m$. On the other hand, if the hypothesis $H_i$ is accepted in a situation where $H_j$, $j = 1, 2, \ldots, m$, $j \ne i$, is actually true, this is the type II error of type $(ij)$. In this situation, there may be $m^2 - m$ such errors.
If $f(x/H_i)$ is the distribution density of the values of the parameter $x$, provided that the hypothesis $H_i$, $i = 1, 2, \ldots, m$, is true, then the conditional probability of making a correct decision regarding the hypothesis $H_i$, the conditional probability of the $i$-th type I error, $i = 1, 2, \ldots, m$, and the conditional probability of the $(ij)$-th type II error, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, m$, $j \ne i$, can be calculated.
Let us now define the set $g_{ij}$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, m$, of numbers characterizing the risk associated with the occurrence of the corresponding type II errors. Then the average risk value is determined through these quantities. The obtained optimization model belongs to the class of distribution problems of continuous linear programming, the algorithms for solving which are also considered in [11].
Thus, it is shown that various problems of statistical hypothesis testing regarding the state of an object can be formulated uniformly and solved by continuous linear programming methods.
The completion of the considered cycle of problems is the extension of the described methods to the case when more than one controlled indicator is used to identify the state. The proposed method for solving this problem is as follows.
Let there be a set $x_1, x_2, \ldots, x_n$ of controlled indicators, for each of which the function $f(x_k/H_i)$, the distribution density of the values of the indicator $x_k$ provided that the hypothesis $H_i$ is true, is given, $k = 1, 2, \ldots, n$, $i = 1, 2, \ldots, m$. The problem can be solved iteratively, for example, as follows.
At the first iteration, the indicator x 1 is selected and the problem of statistical hypothesis testing is solved in accordance with mathematical models

Discussion of the results of developing a general method for solving problems of statistical hypothesis testing
A review of the typical problems of statistical hypothesis testing showed that the known traditional methods for solving them depend substantially, and in different ways, on the structure, nature and characteristics of each problem. The lack of a uniform technology, a common method for solving various problems of statistical hypothesis testing, determines the theoretical and practical usefulness of developing a universal method for solving them. The result obtained improves the methodological basis of the theory of statistical hypothesis testing, expanding the axiomatic foundation of this theory. The development of a uniform approach to the formulation of typical problems in the theory of statistical hypothesis testing makes it possible to significantly extend the list of such problems, enhancing the practical usefulness of the classical theory. The main result of the study is a method of uniform formulation and solution of various problems of statistical hypothesis testing. The proposed method is based on the mathematical apparatus of continuous linear programming (CLP), which is a non-trivial generalization of the mathematical apparatus of linear programming obtained by passing from a discrete space to a continuous one. As a result, the range of solved optimization problems of the theory of statistical hypothesis testing is significantly expanded: it includes the problems of testing one main hypothesis against one or several alternative hypotheses, as well as the problems of testing several equivalent alternative hypotheses. In this case, solutions are sought not in a discrete, but in a more complete continuous class of mathematical descriptions. Theorems defining solutions to all typical problems of statistical hypothesis testing are formulated and proved.
The use of the proposed randomized criteria and the mathematical apparatus of CLP significantly expands the range of problems in the theory of statistical hypothesis testing, which can be uniformly solved analytically.
A possible direction for further research is the development of the proposed method for an important case of uncertainty of initial data, when they are given indistinctly [19] or inaccurately [20]. In this case, the approaches proposed in [21,22] may be useful.

Conclusions
1. Using universal randomized rules, the models of all typical problems of statistical hypothesis testing are formulated uniformly in terms of a modern optimization apparatus (CLP).
2. In terms of continuous linear programming, the technology for the analytical solution of typical problems in the theory of statistical hypothesis testing is developed. In this case, the problems of the theory of statistical hypothesis testing are reduced to a uniform mathematical scheme for optimizing the integral linear functional in the presence of integral linear constraints.
3. The theory and methods of continuous linear programming make it possible to expand the class of analytically solved problems in the theory of statistical hypothesis testing. In this case, randomized decision rules are used that ensure the transition from discrete two-alternative solutions to a continuous description of the solution in the form of functions that specify the probability of decision making.
The proposed method for solving problems of the theory of statistical hypothesis testing generalizes the known solution methods for the case when more than one independent indicator is used to identify the state of an object.