DEVELOPMENT OF A DATA ACQUISITION METHOD TO TRAIN NEURAL NETWORKS TO DIAGNOSE GAS TURBINE ENGINES AND GAS PUMPING UNITS

The application of neural networks is one of the promising ways to improve the efficiency of diagnosing aviation gas turbine engines and gas pumping units. Before such a network can operate, it must be trained on predefined training sets. These data should fully characterize operation of the object in a wide range of operating modes and at various technical states of the diagnosed assemblies. In addition, a similar data set is needed to monitor the quality of the neural network training. To train the network to recognize faults of one type, a set of 20‒200 or more training examples is required. Obtaining such information in operation or in full-scale tests is a rather long or costly process. A method for acquisition of training and control data sets is proposed. The sets are intended for training static neural networks that recognize single and multiple faults of the elements of the air-gas channels of gas turbine engines and gas pumping units. The method yields sets of working process parameters that describe operation of objects at various technical states of the air-gas channel, the effect of measurement errors, and object functioning in a wide range of modes and external conditions. For gas pumping units, composition of the pumped gas is additionally taken into account. To obtain the required parameters, a mathematical model of the working process of the object of the second level of complexity was used. The sets characterize operable objects and objects with significant malfunctions in the compressor and turbine spools, in the combustion chamber and, for a gas pumping unit, in its supercharger. Two variants of forming the sets were considered: using the measured parameters of the working process, and using deviations of the measured parameters from their reference values together with the parameters used as regime parameters in the mathematical model of the working process.
For the second variant, the expediency of including the regime parameters in the sets was checked. It has been shown that regime parameters can be excluded from the data sets in some cases.


Introduction
Automated analysis of operating parameters implemented within a computer diagnostic system is one of the ways to reduce the workload on experts and improve the quality and efficiency of diagnosing gas turbine engines (GTE) and gas pumping units (GPU). One of the promising methods for determining the technical state (TS) of an object is its diagnosis by neural networks [1,2].
The neural network diagnostic analysis of operational information can result in: -assigning the controlled object to one of the classes of technical states (for example, operable engines or engines having troubles in their compressor or turbine units);
For neural network diagnosis to work, the network must first be trained on previously prepared examples. Neural networks tend to overfit (overtrain). An overfitted network describes the training set precisely but describes data not included in this set poorly. To address this problem, a method of two [3-5] or three [2] data sets can be used.
In the more general three-set method, the network is trained on the first (training) set. Once the required accuracy of TS recognition is reached, training is stopped, the second (control) set is fed to the trained network, and the correctness of its recognition is evaluated. If the accuracy obtained for this set is much worse than that obtained for the training set, the network is considered overfitted. The network structure should then be changed and the network trained again. Thus, the control set is actually included in the training loop. Therefore, once the required accuracy on the control set is achieved, the network must be checked again on the third (test) set. This set should not be used more than once.
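The three-set procedure described above can be sketched as a small selection loop. This is only an illustration: the names `select_network`, `train`, `accuracy` and the threshold `max_gap` are hypothetical and do not come from the cited works, and the criterion for "much worse" accuracy is an assumption.

```python
def select_network(candidates, train, accuracy, train_set, control_set,
                   test_set, max_gap=0.05):
    """Return (network, test_accuracy) for the first candidate structure whose
    control-set accuracy is within max_gap of its training-set accuracy.

    train(structure, data) -> network and accuracy(network, data) -> float
    are user-supplied callables (hypothetical interface).
    """
    for structure in candidates:
        network = train(structure, train_set)
        gap = accuracy(network, train_set) - accuracy(network, control_set)
        if gap <= max_gap:
            # The test set is evaluated exactly once, after the structure
            # has been selected with the help of the control set.
            return network, accuracy(network, test_set)
    raise RuntimeError("no candidate structure passed the control check")
```

Because the control set steers the choice of structure, only the final test-set figure is an unbiased estimate of recognition accuracy.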
Information contained in the sets should represent, with sufficient completeness, all types of technical states of the objects under consideration (an operable object or an object having the malfunction types in question) and, if necessary, the conditions and modes of object operation.
Operational information or mathematical modeling results are used as a source for such data sets.
It should be noted that in order to train the network to recognize each TS class, 20 to 200 or more calculation points are needed. Each such point includes diagnostic information (measured operating parameters) for one of the possible combinations of characteristics of the main GTE units in their normal or malfunction state. Acquiring such information in operation is a rather long process, given the low frequency of serious malfunctions and, even more so, of their combinations. Acquiring such information in full-scale tests is rather costly.
The results of numerical experiments using a mathematical model of the operation process (MMOP) may therefore be the only realistic source of the main information content. Information on operation of GTE/GPU with actual faults in the air-gas path, collected in full-scale modeling or in operation, can only be used to form a test set.

Literature review and problem statement
Two approaches are used to form sets. The first involves collecting information on operation of intact engines and engines having serious faults in the air-gas path. A number of papers suggest obtaining such data from experiments on engine test rigs, with malfunctions introduced artificially into the engine measurement system [6] or into its air-gas path [7]. The disadvantage of this approach to forming a training set is the high cost of the work and the need for engine test rigs and for an engine in which faults are introduced.
It was proposed in [8] to form neural networks based on the data obtained at the beginning of operation of a new GTE. Subsequently, such a network is used as a standard for the intact state. The disadvantage of this method is that the neural network can only establish the fact of operability or malfunction of the object as a whole.
Work [9] addresses the creation of neural networks for predicting the gas temperature behind the turbine of an aviation GTE. Information on operation of an operable engine was used for training the network. This approach is effective for identifying simple malfunctions but, as in the previous case, it does not provide diagnostics of complex technical objects "to the assembly depth".
The second approach involves application of mathematical modeling methods to obtain the required amount of information.
A method for acquisition of training and control data sets is considered in [10]. The disadvantage of this method is the use of a simplified linearized model, which calculates variation of the measured parameters as a function of variation of the parameters of the object TS. Changes in conditions and operating modes of the GTE are not taken into account, which significantly narrows the range of diagnostic regimes.
In [3], methods of mathematical modeling are also used to obtain the necessary data. However, the method of conducting numerical experiments is practically not described.
Mathematical models considered in [4] can be used to evaluate the GTE TS and generate training data, and a sufficiently detailed description of a method for preparing the network using two sets is given in [5]. The formation of training sets is treated in these studies only very briefly.
In [11], mathematical models that can be used to evaluate the GTE TS and generate training data are also considered. The formation of training sets is practically not addressed in this study.
A review and detailed description of various MMOPs that can be used to obtain the required data sets are given in [12]. However, the organization of numerical experiments is not considered there.
As the analysis shows, these studies contain an incomplete, fragmentary description of the method for acquiring the necessary data sets. Besides, some studies introduce significant simplifications and assumptions into the method itself. In most of the listed studies, the influence of measurement errors is left unaddressed. It can also be noted that the main works in the field of artificial intelligence relate to diagnostics of aviation GTE and steam turbines; GPU diagnostics is rarely considered. One way of solving these problems is to acquire the data needed for preparing the diagnostic neural network by mathematical modeling. An appropriate method would make it possible to take into account changes in the technical state of the object, the modes and conditions of its operation, and the effect of parameter measurement errors.

The aim and objectives of the study
The study objective was to develop a method for conducting numerical experiments to obtain training and control data sets to be used in training static neural networks for diagnosing the air-gas path of gas turbine engines and gas pumping units.
In developing the method for conducting numerical experiments, the following tasks had to be solved:
- develop an algorithm that takes into account changes of the object technical state in experiments;
- develop an algorithm that takes into account changes of operating modes and external conditions of GTE/GPU operation in experiments;
- develop an algorithm that takes into account the influence of parameter measurement errors in experiments;
- develop an algorithm that takes into account arbitrary chemical composition of the working medium in the GPU supercharger and of its fuel in experiments;
- combine the developed algorithms into a single method for conducting numerical experiments.

Method for acquisition of a data set for training the neural network to diagnose the GTE/GPU air-gas channel

1. General characteristics of the data set
All of the aforementioned sets are matrices. Each row of such a matrix (calculation point, training example) is a set of data characterizing operation of a particular object in a given mode under given external conditions. The calculation point includes two vectors:
- a vector input to the neural network (measured parameters of the work process or their deviations from the standard values);
- a vector of expected outputs of the neural network (markers indicating to which class/classes this point belongs, or parameters numerically characterizing the object TS).
When forming the datasets, it is advisable to use a nonlinear MMOP of the second level of sophistication [13,14], which uses formal description of characteristics of the main elements of the GTE/GPU air-gas path (compressor, combustion chamber, turbine, etc.).
In the course of the experiment, the vector of regime parameters of the model, $R$, is applied to the input of the engine MMOP with a predetermined TS of its assemblies. The required parameters are recorded at the model output. The measured parameters themselves, or their relative, $\overline{\Delta P}$, or absolute, $\Delta P$, diagnostic deviations (DD), can be used as the diagnostic parameters in the network training:

$$\overline{\Delta P}_i = \frac{P_i - P_i^S}{P_i^S}, \qquad (1)$$

$$\Delta P_i = P_i - P_i^S, \qquad (2)$$

where $P_i$, $P_i^S$ are the values of the i-th parameter for the diagnosed and standard GTE, respectively, in the same mode and under the same operating conditions. The vectors composed of these parameters are calculated using the MMOP:

$$P = F(R, \Delta a), \qquad P^S = F(R, 0), \qquad (3)$$

where $\Delta a$ is the set (vector) of the MMOP parameters determining the difference of characteristics of the air-gas path elements of the simulated object from the standard ones; $F(*)$ is the object MMOP.
If DDs are used, all or part of the regime parameters, $R$, can also be included in the set. The j-th regime parameter should be included in the set if it correlates significantly with the obtained DDs.
When the parameter values themselves are used instead of the DDs, all recorded regime parameters, $R$, must be included in the set.
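A minimal sketch of how a network input vector could be assembled from model outputs, assuming the usual definitions of the deviations (the absolute DD is the difference between the diagnosed and standard values, the relative DD is that difference divided by the standard value). The function names are illustrative and not taken from the paper.

```python
def absolute_dd(measured, standard):
    """Absolute deviation of each parameter from its standard value."""
    return [p - ps for p, ps in zip(measured, standard)]


def relative_dd(measured, standard):
    """Deviation of each parameter divided by its standard value."""
    return [(p - ps) / ps for p, ps in zip(measured, standard)]


def build_input_vector(measured, standard, regime=(), relative=True):
    """Network input: the DDs, optionally followed by regime parameters."""
    if relative:
        dd = relative_dd(measured, standard)
    else:
        dd = absolute_dd(measured, standard)
    return dd + list(regime)
```

Passing an empty `regime` corresponds to the case in which the regime parameters are excluded from the set.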

2. Accounting for technical state of the object
To obtain parameters of an engine with a changed air-gas path, the object model, $F(*)$, should allow one to correct the functional characteristics of the elements of this path. One of the methods of such correction consists in scaling the characteristics of the assemblies [13,14]. For example, in order to obtain an individual functional characteristic of a turbine, this method applies scaling dependences in which $A_T$ is the turbine flow parameter; $\eta_T^*$ and $\pi_T^*$ are the efficiency factor and the stagnation pressure ratio of the turbine; $\lambda_T$ is the reduced circumferential velocity of the turbine; and $A_T^0$, $\eta_T^{*0}$ are the standard functional dependences (Fig. 1).
The value of the scale $\Delta a$ for the k-th class of TS depends on the malfunction considered. For its simulation, it is proposed in [1] to use normal and uniform distributions. It is indicated in [4,10] that the uniform distribution gives a better representation, within a class, of objects with varying degrees of fault manifestation. In addition, this distribution provides more data in the areas most difficult for classification, at the boundaries of classes. Taking this into account, it can be assumed that the main factor has a uniform distribution and varies within the limits $\Delta_{min}^k$, $\Delta_{max}^k$. Then the value of this factor for the k-th class is

$$\Delta a_m^k = f_{uni}(\Delta_{min}^k, \Delta_{max}^k),$$

where $f_{uni}$ is the random-number generator obeying the uniform law with parameters $\Delta_{min}^k$, $\Delta_{max}^k$. As a rule, in the case of a malfunction in the air-gas path, the changes in the values of the selected pair of factors are significantly correlated. Taking this into account, it was assumed that the auxiliary scale has a statistical dependence on the main one.
For a unidirectional variation of the main and auxiliary scales (for example, the change of the efficiency factor and of the flow rate through the compressor when it is fouled), values of the auxiliary scales may be given by dependence (6), in which $f_{norm}$ is the generator of random numbers obeying the normal law (the first parameter is the mathematical expectation, the second is the standard deviation); $K$ is a constant; $\Delta a_m^k$ is the current value of the main scale.
In the case of opposite change of the main and auxiliary scales (for example, when turbine blades burn out or melt, reduction of the efficiency factor is accompanied by an increase in the flow of the working medium through the turbine), values of the auxiliary scales are found from dependence (7). The value of the parameter $K$ in dependences (6), (7) depends on the TS class in question. For example, according to the data given in [15], for various flaws in the compressor (distorted blade or air-gas path geometry, increased roughness), the ratio of the auxiliary scale to the main one is close to 1 but may vary in the range of 0.6 to 1.4, depending on the acting damaging factors. Proceeding from this, when the normal distribution is used in dependences (6), (7), values of $K$ in the range of 4 to 8 can be recommended.
If the network must be trained to recognize TS when two or more malfunctions occur simultaneously, the scale values are determined similarly, taking into account the selected TS classes.
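The sampling of a main/auxiliary scale pair can be sketched as follows. The exact form of dependences (6), (7) is not recoverable from the text, so the sketch assumes that the auxiliary scale is drawn from a normal law whose mean is the main scale and whose standard deviation is the magnitude of the main scale divided by $K$, with the sign reversed for opposite changes. The function name `sample_scales` and its parameters are hypothetical.

```python
import random


def sample_scales(d_min, d_max, k=6.0, opposite=False, rng=random):
    """Draw one (main, auxiliary) pair of scale deviations for a TS class."""
    # Main scale: uniform law within the class limits.
    main = rng.uniform(d_min, d_max)
    # ASSUMPTION: mean = main scale, standard deviation = |main| / K.
    aux = rng.gauss(main, abs(main) / k)
    if opposite:
        # ASSUMPTION: opposite change is modelled by reversing the sign.
        aux = -aux
    return main, aux


if __name__ == "__main__":
    generator = random.Random(42)
    # Limits quoted in the paper for a fouled supercharger flow scale.
    print(sample_scales(-0.072, -0.03, k=6.0, rng=generator))
```

Passing a seeded `random.Random` instance makes a set reproducible, and a different seed yields an independent set.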

3. Accounting for external conditions and operating mode
To recognize TS adequately, the neural network must be trained on data obtained for the conditions and operating modes at which diagnosis will be made. Operation of the object in all diagnostic modes should be represented equally. Then the value of the j-th regime parameter of the model will be

$$R_j = f_{uni}(R_j^{min}, R_j^{max}), \qquad (8)$$

where $R_j^{min}$, $R_j^{max}$ are the minimum and maximum values of the j-th regime parameter in diagnostic modes.
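A short sketch of drawing one vector of regime parameters, assuming each parameter is sampled independently from a uniform law between its minimum and maximum diagnostic values. The ranges are those quoted later in the paper for the aviation GTE; the dictionary keys and the function name are illustrative.

```python
import random

# Ranges quoted in the paper for take-off and initial climb of the GTE.
GTE_REGIME_RANGES = {
    "H_m": (-100.0, 2500.0),       # barometric flight height, m
    "Mach": (0.0, 0.5),            # Mach number
    "T_in_K": (238.0, 313.0),      # total inlet temperature, K
    "humidity": (0.3, 1.0),        # relative air humidity
    "n_LP_rpm": (3280.0, 4220.0),  # fan rotor speed, rpm
}


def sample_regime(ranges, rng):
    """Draw every regime parameter uniformly within its own limits."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


if __name__ == "__main__":
    print(sample_regime(GTE_REGIME_RANGES, random.Random(0)))
```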

4. Accounting for parameter measurement errors
Measurement errors, including gross errors, are the last factor that can be taken into account when forming sets for training neural networks.
In the dependences used to obtain the parameters $P$, $P^S$ included in formulas (1) and (2), $P$, $P^S$ are the values of the diagnosed and standard GTE parameters, respectively, containing the measurement error; $\Delta R_j^{max}$, $\Delta P_i^{max}$ are the maximum errors of measurement of the j-th regime and the i-th diagnostic parameter, respectively; $n_r$, $n_p$ are the numbers of regime and diagnostic parameters, respectively. It was assumed in dependences (9), (11) that the measurement errors do not have a systematic component and are distributed according to the normal law.
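One possible way to superimpose such errors on simulated values is sketched below. The text states only that the errors are normally distributed with no systematic component; mapping the maximum error to three standard deviations is an assumption of this sketch, as are the function and parameter names.

```python
import random


def add_measurement_noise(values, max_errors, rng, sigmas=3.0):
    """Add zero-mean normal noise to each value.

    ASSUMPTION: the maximum measurement error of a parameter corresponds
    to `sigmas` standard deviations of the normal law.
    """
    return [v + rng.gauss(0.0, e / sigmas) for v, e in zip(values, max_errors)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(add_measurement_noise([288.0, 101.3], [1.5, 0.2], rng))
```

The same function can be applied to the regime parameters before they are passed to the standard model and to the diagnostic parameters at the model output.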
If the goal is to train the network to detect gross measurement errors, two classes must be created. The first class, in which gross errors are absent, represents all possible combinations of TS, conditions and operating modes. Each point of the second class contains several randomly chosen parameters into which a gross measurement error exceeding $\Delta R_j^{max}$ or $\Delta P_i^{max}$ is introduced.
The concrete values of the coefficients in the corresponding dependences and the direction of change of the parameter (the sign of the function $f_{uni}$ in (14), (16)) depend on the characteristics of the measurement system and the error in question.
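The construction of a point of the second class can be illustrated as follows. Since the dependences for gross errors are not reproduced in the text, everything specific here is an assumption: the number of corrupted parameters `n_faulty`, the multipliers `k_min` and `k_max` (both greater than 1, so that the shift exceeds the maximum error), and the random choice of sign.

```python
import random


def inject_gross_errors(values, max_errors, rng, n_faulty=2,
                        k_min=2.0, k_max=5.0):
    """Return a corrupted copy of `values` and the indices that were changed.

    Each chosen parameter is shifted by a uniformly drawn multiple
    (k_min..k_max, hypothetical coefficients) of its maximum error.
    """
    corrupted = list(values)
    faulty = rng.sample(range(len(values)), n_faulty)
    for i in faulty:
        sign = rng.choice((-1.0, 1.0))
        corrupted[i] += sign * rng.uniform(k_min, k_max) * max_errors[i]
    return corrupted, sorted(faulty)
```

In practice the sign and the multipliers would be fixed by the properties of the measurement system rather than drawn symmetrically.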

5. The scheme of numerical experiment
The scheme of carrying out the described numerical experiment for obtaining one calculation point of a set belonging to the k-th class is given in Fig. 2.

Fig. 2. The diagram of numerical experiment
To obtain the required amount of data, the experiment must be repeated many times with different initial values of the (pseudo)random number generators.
For some combinations of the set values of regime parameters and parameters of the object's TS, the MMOP (function $F(*)$ in dependence (3)) may be unable to calculate the required mode because the operating point of one of the blade spools of the object falls outside its functional characteristics embedded in the model. In this case, the model stops with an error message. The results of such an attempt must be discarded and a new attempt made with new initial values of the (pseudo)random number generators.
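The repeat-and-discard logic can be sketched as a loop around the model call. The names `ModelError`, `generate_points`, `model` and `sample_inputs` are hypothetical stand-ins; the real MMOP and its failure signalling are not described in enough detail to reproduce. Here a failed attempt is simply followed by a fresh draw from the same generator, which yields new random inputs.

```python
import random


class ModelError(Exception):
    """Raised by the model when the operating point cannot be computed."""


def generate_points(model, sample_inputs, n_points, seed=0, max_attempts=100000):
    """Collect n_points successful model runs, discarding failed attempts."""
    rng = random.Random(seed)
    points, attempts = [], 0
    while len(points) < n_points:
        attempts += 1
        if attempts > max_attempts:
            raise RuntimeError("too many failed model runs")
        regime, scales = sample_inputs(rng)
        try:
            params = model(regime, scales)
        except ModelError:
            continue  # discard this attempt and draw new random inputs
        points.append((regime, scales, params))
    return points
```

Using a different `seed` gives an independent set, for example a control set alongside the training set.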

6. Accounting for peculiarities of a gas pumping unit diagnostics
A gas pumping unit consists of two main parts: a gas turbine unit and a supercharger. The gas turbine unit is, in fact, a conventional turboshaft drive. The supercharger is a centrifugal compressor that compresses and pumps natural gas, which is also used as fuel for the drive.
Natural gas is extracted from different gas fields and its composition can vary considerably. As a result, the lower calorific value of the fuel and the enthalpy, entropy and specific heat of the working medium of the drive (combustion products) and of the supercharger (pumped gas) vary in a wide range, as does, accordingly, the operation process of the GPU in general. Based on data given in [16], a possible range of gas composition variation was determined (Table 1). If the gas is purified from sulfur compounds, the percentage of hydrogen sulfide in it can be taken as zero.
When a data set is formed, the main components of the gas (methane, ethane, propane, carbon dioxide, nitrogen) are taken into account. The content of the l-th component (other than methane) is determined within the specified boundaries (Table 1).
The content of methane in the gas is then found from the balance of the remaining components, where $Y$ are weight or volume concentrations of the corresponding gas constituents.
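A sketch of drawing one gas composition under the assumption that each non-methane component is sampled uniformly within its bounds and methane makes up the remainder, with concentrations expressed as fractions that sum to 1. The numerical bounds below are placeholders for illustration only and are not the values of Table 1.

```python
import random

# PLACEHOLDER bounds (fractions); the actual limits are given in Table 1.
EXAMPLE_BOUNDS = {
    "ethane": (0.0, 0.08),
    "propane": (0.0, 0.03),
    "carbon_dioxide": (0.0, 0.02),
    "nitrogen": (0.0, 0.05),
}


def sample_gas_composition(bounds, rng):
    """Draw non-methane fractions uniformly; methane is the balance."""
    composition = {name: rng.uniform(low, high)
                   for name, (low, high) in bounds.items()}
    methane = 1.0 - sum(composition.values())
    if methane < 0.0:
        raise ValueError("component bounds leave no room for methane")
    composition["methane"] = methane
    return composition
```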

1. Forming a dataset for an aviation GTE
To implement the proposed approach, an MMOP close in its characteristics to the PS-90A engine [13,17] was used (a two-rotor bypass engine with mixing of flows, bypass ratio 5, thrust 155 kN).
Diagnosis was made at take-off and initial climb. The values of the regime parameters, $R$, of the MMOP were in the following ranges: barometric flight height, $H$: (-100)...2,500 m above sea level; Mach number, $M$: 0...0.5; total temperature at the engine inlet: 238...313 K; relative air humidity, $\varphi$: 0.3...1; rotational speed of the fan rotor, $n_{LP}$: 3,280...4,220 rpm (nominal and take-off modes).
Data sets were formed from relative DDs (dependence (1)) of parameters measured on the engine in operation, including the high-pressure rotor speed, $n_{HP}$ (see the caption of Fig. 3). The simulated TS classes are listed in Table 2. The values of the vector components were determined on the basis of data in studies [10,11] and expert estimates.
When determining the values of $\Delta a_a^k$ (dependences (6), (7)), it was assumed that $K$=6 for all TS classes. At this value of $K$, the ratio of the auxiliary scale to the main scale, for the normal distribution in dependences (6), (7), can be estimated to vary approximately within 0.5-1.5.
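The quoted interval can be checked with a short calculation. It relies on two assumptions that the text does not state explicitly: that the standard deviation of the normal law in dependences (6), (7) equals $|\Delta a_m^k|/K$, and that practically all draws lie within three standard deviations of the mean. Under these assumptions

$$\frac{\Delta a_a^k}{\Delta a_m^k} \in \left[1 - \frac{3}{K},\; 1 + \frac{3}{K}\right] = [0.5,\; 1.5] \quad \text{for } K = 6.$$

With the same assumptions, the recommended range $K$ = 4...8 corresponds to ratio intervals from $[0.25,\; 1.75]$ to $[0.625,\; 1.375]$.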
An example of the training set is shown in Fig. 3. The control set can be obtained in a similar way with other initial states of the (pseudo)random number generators. The influence of measurement errors was not taken into account in the numerical experiment.
Table 2 List and characteristics of classes of the engine technical state

2. Forming the data set for the gas-pumping unit
To implement the proposed approach, an MMOP close in its characteristics to the gas pumping unit GPU-Ts-6.3/56M-1.45 was used [16]. The unit consists of the D-336 drive (low- and high-pressure rotors and a power turbine rotor, with a rated power of 6.3 MW) and the N-196 supercharger.
In addition to the malfunctions of compressors, combustion chambers and turbines described in Table 2, one more diagnosed fault was added: fouling of the supercharger. The scale of the natural gas flow through the supercharger unit, $\Delta a_{GS}$, was taken as the main TS parameter of this assembly in the MMOP; the range $\Delta a_{GS}$ = (-0.03)...0.017 was taken for the intact supercharger and (-0.03)...(-0.072) for the fouled one. The scale of the efficiency factor of the supercharger unit, $\Delta a_{\eta S}$, was taken as an auxiliary parameter.
Eight main classes of TS were simulated: a faultless engine and troubles in the low- or high-pressure compressor, the combustion chamber, the high- or low-pressure turbine, the power turbine, and the supercharger. Rotational speed of the power turbine, $n_{PT}$, was selected as a regime parameter. Diagnosis was carried out in high modes close to the nominal. Values of the regime parameters of the MMOP were in the following ranges:
- pressure at the gas turbine unit inlet: 73.3...110.6 kPa;
- inlet total temperature, $T_{in}^*$: 223...323 K;
- rotational speed of the power turbine, $n_{PT}$: 7,850...8,300 rpm;
- total pressure at the inlet to the supercharger: 4,000...8,000 kPa;
- total temperature at the inlet to the supercharger: 223...333 K;
- gas flow rate through the supercharger, $Q$: 7,000,000...15,000,000 Nm³/day.
Simulation was carried out for two cases: gas composition known and unknown. In the latter case, when calculating standard values of the operating parameters, $P^S$, it was assumed that the gas consisted of pure methane. Thus, the numerical experiment has resulted in two data sets (Fig. 4). Each set point included the values of absolute DD (dependence (2)) of the following parameters:
- rotation speeds of the low-pressure rotor, $\Delta n_{LP}$ (%), and the high-pressure rotor, $\Delta n_{HP}$ (%);
- total pressure behind the compressor, $\Delta P_C^*$ (kPa);
- temperature behind the low-pressure turbine, $\Delta T_{LT}^*$ (K);
- fuel consumption, $\Delta G_F$ (kg/s);
- total pressure, $\Delta P_S^*$ (kPa), and total temperature, $\Delta T_S^*$ (K), behind the supercharger.

Based on the data shown in Fig. 3, a qualitative analysis of the capability to recognize the selected TS classes has been carried out. Direction and degree of DD deviation of classes 2-6 (points 51-300) relative to the class with no troubles (points 1-50) were analyzed. The analysis results are given in Table 3. Signs in Table 3 denote direction and degree of DD deviation corresponding to each TS class relative to the intact class. Signs ↑, ↓ denote upward or downward DD shift, respectively. Double signs ↑↑, ↓↓ indicate a significant degree of DD shift and single signs ↑, ↓ are used for an insignificant shift. As can be seen from the data presented, all selected TS classes are well separated in the multidimensional DD space and can be recognized.
Table 3. Direction and degree of DD deviation in faulty GTE
To analyze the influence of conditions and operating modes on the DD, the pairwise correlation factors between the obtained DD values and the regime parameters were calculated (Table 4). The values obtained for the correlation factors were less than 0.07. This indicates that when DDs are used to diagnose the engine, the regime parameters can be excluded from the sets.
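The screening of regime parameters by correlation can be sketched as follows. The sketch assumes Pearson's pairwise correlation coefficient and a user-chosen threshold; the paper reports the observed coefficients but does not state a formal threshold, so `threshold` and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regime_params_to_keep(regime_columns, dd_columns, threshold):
    """Keep a regime parameter if it correlates with at least one DD."""
    keep = []
    for name, values in regime_columns.items():
        strongest = max(abs(pearson(values, dd)) for dd in dd_columns.values())
        if strongest >= threshold:
            keep.append(name)
    return keep
```

Here each column is the sequence of values of one regime parameter or one DD over all calculation points of the set.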

2. Discussion of the results obtained for GPU
As can be seen from the results obtained, when gas composition is known, the diagnostic deviations react well to the change in the TS of spools (Fig. 4, b). When gas composition is unknown (Fig. 4, a), this factor significantly increases the spread of the DD values and can significantly impede TS recognition.
Based on the data shown in Fig. 4, b, a qualitative analysis of the ability to recognize the selected TS classes was performed. Direction and degree of DD shift of classes 2-8 (points 51-400) relative to the intact class (points 1-50) were analyzed.
The analysis results are given in Table 5. As can be seen from the data presented, all selected TS classes are well separated in the multidimensional space of DD and can be recognized.
Table 5. Direction and degree of DD deviation in faulty GPUs
As shown above, parameters characterizing operating conditions of the gas turbine drive had no significant effect on the obtained DD values. However, the data sets for the GPU may also include the lower heat of fuel combustion, $H_u$, the flow of the gas pumped through the supercharger, $Q$, and the operating parameter of the model (power turbine rotational speed, $n_{PT}$). To estimate the influence of these factors on the DD, the factors of correlation between these parameters and the obtained DD values were calculated. The calculation results are given in Table 6.

Table 6. Values of factors of correlation between the lower heat of gas combustion, the flow of gas pumped through the supercharger and the diagnostic deviations (gas composition unknown)
For the variant in which the gas composition is known, all correlation factors are less than 0.2.
As can be seen from the data presented, the effect of the $n_{PT}$ parameter is small and can be neglected in some cases. It is expedient to include the $H_u$ and $Q$ parameters in the data sets when gas composition is unknown.

3. Discussion: the method application problems and prospects
The presented material provides a sufficiently complete description of the method for obtaining training and control data sets intended for training static neural networks for diagnosing GTE and GPU. The described method makes it possible to form data sets that simulate information obtained in the course of long-term operation of a fleet of similar objects. Although the paper deals only with training of neural networks, the information obtained can also be used in developing diagnostic methods based on other approaches.
The described method can be further improved by introducing concrete malfunctions (e.g. blade fouling, wear of labyrinth seals, improper adjustment of the compressor guide vanes, etc.) instead of generalized failures (malfunction of the compressor, turbine, etc.). In this case, an opportunity appears to switch from diagnosing "to the depth of assembly" to diagnosing "to the depth of malfunction". To do this, it is necessary to clarify the list of malfunctions and the values of parameters characterizing these malfunctions (Table 1) as well as the form and parameters of dependences (4)-(7).
In conclusion, it should be noted that the main difficulty in using the proposed method lies in developing and identifying its basic element, the mathematical model of the operation process of the second level of sophistication. To create it, one needs, at minimum, two-dimensional functional characteristics of all blade spools of the object. Besides, developing such a model is laborious. For example, the C++ code of the GPU MMOP used in the study comprised about 2,000-2,300 lines, and its development and identification against the data of a real GPU took about six months. On the other hand, this work has given the developers a powerful, versatile, multi-purpose research tool.

Conclusions
1. An algorithm was developed that makes it possible to generate data describing operation of an object with malfunctions in the compressor and turbine spools, combustion chamber and supercharger. Its special feature is the use of scalable two-dimensional functional characteristics of the object blade spools in the mathematical model of its operating process. This, in turn, makes it possible to obtain a continuous, balanced description of the operation process of an object with any technical state of the spools of its air-gas path. It was shown that all technical state classes obtained by simulation are well separated in the multidimensional space of diagnostic deviations and can be recognized.
2. An algorithm was developed that makes it possible to vary regime parameters of the mathematical model of the object working process. Analysis of the data obtained in the numerical experiment showed that when deviations of the measured parameter values from the standard ones are used as diagnostic information, the regime parameters have practically no effect on the result of GTE diagnosis and need not be included in the data sets for training the neural network.
3. An algorithm was developed that models the effect of ordinary and gross errors in measuring the parameters of the object operation process on the results obtained in a numerical experiment. The developed algorithm provides data for preparing diagnostic neural networks that work steadily even in the presence of ordinary and gross errors in measuring the parameters of the operation process.
4. An algorithm was developed that takes into account arbitrary chemical composition of the working medium in the GPU supercharger, and the effect of this factor on the diagnostic process has been studied. Diagnostic situations were considered in which the chemical composition of the pumped gas is known and unknown. It was shown that when gas composition is unknown, the lower heat of combustion of the fuel gas and the flow of the gas pumped through the supercharger must be included in the sought sets.
5. A method for conducting a numerical experiment to obtain training and control sets for training static neural networks diagnosing the air-gas path of gas turbine engines and gas pumping units has been developed. The developed method makes it possible to form data sets of the required volume that characterize classes with both single and multiple faults of the air-gas path at different stages of their formation. Besides, these sets simulate the results of measuring the working process parameters under different conditions and operating modes, as well as the parameter measurement errors.

Fig. 1. Example of standard functional dependences of the flow capacity, $A_T^0$, and the efficiency factor, $\eta_T^{*0}$, on the pressure ratio, $\pi_T^*$, at different values of the reduced circumferential velocity, $\lambda_T$, for a turbine of an aviation gas turbine engine

Fig. 3. Values of relative DDs in a set designed to train the network to recognize 6 classes of TS of an aviation GTE (Table 2). Points 1-50 belong to the first class, points 51-100 to the second class, and so on. Relative DDs of the following parameters are shown: rotational speed of the high-pressure rotor, $\overline{\Delta n}_{HP}$; total pressure behind the fan, $\overline{\Delta P}_F^*$; total pressure,

Fig. 4. Values of absolute DD in the set designed to train the network to recognize 8 classes of GPU TS. Points 1-50 belong to the first class, 51-100 to the second class, etc. Gas composition is unknown (a), gas composition is known (b). DDs of the following parameters are shown: rotation speeds of the low-pressure rotor, $\Delta n_{LP}$ (%), and the high-pressure rotor, $\Delta n_{HP}$ (%); total pressure behind the compressor, $\Delta P_C^*$ (kPa); temperature behind the low-pressure turbine, $\Delta T_{LT}^*$ (K); fuel consumption, $\Delta G_F$ (kg/s); total pressure, $\Delta P_S^*$ (kPa), and total temperature, $\Delta T_S^*$ (K), behind the supercharger

Table 1
Characteristics of chemical composition of natural gas