DEVELOPMENT OF UNIFIED MATHEMATICAL MODEL OF PROGRAMMING MODULES OBFUSCATION PROCESS BASED ON GRAPHIC EVALUATION AND REVIEW METHOD

6 advanced capabilities while providing a variety of security services in general. One of the methods of protecting the code of a software product is the obfuscation process [3], which provides the services of information security and privacy. Thus, obfuscation is an important component of providing practical security services. The software obfuscation process consists of subprocesses that may or may not be used depending on the business process, total runtime, and level of protection provided. Creating models for each particular case is a costly process that requires unification. In this regard, the urgent task is to develop an approach based on the unification of the mathematical formalization of the software protection process to assess the probabilistic


Introduction
The widespread use of computer systems increases the value and role of software. This determines the role of malicious cyber attacks aimed at discrediting software products and reducing the security of computer systems in general.
In accordance with the requirements of laws and regulations [1,2], software (computer programs) is one of the objects requiring protection at all stages of development and operation. At the same time, software obfuscation at the development stage can become one of the key processes for providing security, privacy, authentication and integrity services. This is primarily due to the feasibility of applying obfuscation algorithms and procedures at the very early stages of code development and implementation, as well as to the runtime characteristics of the programming modules obfuscation procedures.

Literature review and problem statement
The increasing number of attacks aimed at reducing the security of computer systems is in some cases associated with the emerging gaps between the development of theory and the practice of applying theoretical results, as well as with imperfect mathematical models that do not meet the increased requirements of practitioners.
The review of the literature [4][5][6][7][8][9][10][11][12] showed that, for a number of special cases, final results that take possible limitations into account are presented. So, in [4][5][6], comprehensive results are obtained for the case when a random process determining transitions from one state to another is formalized by the exponential distribution law. This simplifies the solution of problems, but introduces an a priori error in the descriptive part of the model when non-Markov systems are formalized.
More complex models based on the principles of decomposition of complex algorithms and private-level architectures are presented in [7][8][9][10][11][12]. However, the problem of obtaining final relations for calculating probability-time characteristics is insufficiently studied for cases when transitions between states are described by distribution laws that are more complex than exponential and lack signs of "markovianity".
It should be noted that in addition to substantiating the mathematical apparatus for solving the problem, the choice of means of mathematical formalization is important.
Analysis of approaches to network stochastic modeling showed their great diversity (based on Petri nets [13], finite discrete automata [14], PERT networks [15], etc.). However, the main drawback of these modeling approaches is their limited practical application due to the lack of predictability properties.
In [16,17], GERT (Graphical Evaluation and Review Technique) models of complex technical systems and processes are presented. However, the introduction of the assumption of an exponential distribution law as the main law characterizing the process of transition from state to state significantly reduces their theoretical and practical value.
In [17], a situation is considered where the semi-Markov process is the main iterative process of obfuscation. However, it is shown that the problem of finding the distribution law for semi-Markov models of large dimension is solved with an error of about 15 %, and the final distribution is discrete. It is also shown that these models are not suitable for solving problems that require accurate knowledge of the distribution function or density.
The review of the literature [13][14][15][16][17] showed that existing modeling methods have both advantages and disadvantages. It should be borne in mind that the obfuscation process has not been described by these models.
Thus, the review showed that a number of mathematical models of complex algorithms and processes of software protection are formalized in terms of graph theory. It is often assumed that the functioning of the system as a whole can be described by one distribution law. In this case, possible options for applying the distribution laws and their parameters during the transition from state to state are not taken into account. This contradiction can be resolved by the mathematical formalization of processes using GERT structures. At the same time, finding the probability distribution of transitions from state to state in the process of software protection, as well as the final result in the form of a distribution law with the found parameters, is of theoretical and practical interest.
It should be noted that the study of complex GERT networks is difficult due to the high computational requirements of stochastic modeling approaches. At the same time, the problem of developing simplified unified GERT models has not been studied enough. Thus, there is a need to develop a unified model to formalize the programming modules obfuscation process in order to eliminate this drawback.

The aim and objectives of the study
The aim of the study is to develop a unified GERT model of programming modules obfuscation. This will make it possible to achieve the unification of the model in conditions of modifying the GERT network.
To achieve the aim, the following objectives were set:
- synthesis of a set of algorithms of the programming modules obfuscation/deobfuscation process;
- development of a GERT model of the programming modules obfuscation process based on algorithms;
- study of a unified GERT model with a modified number of nodes.

Synthesis of a set of algorithms of the programming modules obfuscation/deobfuscation process
The decompiled bytecode of existing software products using programming modules obfuscation processes for their protection is investigated. On the basis of the studies, algorithms of programming modules obfuscation are developed (Fig. 1).
In accordance with the presented algorithms, the process of the source code obfuscation can be described by the following steps.
Step 1. Initial state. There is raw source code in a high-level language that runs on a virtual machine (Java, CLI, etc.). Because compiled code is difficult to modify, the modification and obfuscation of the code are performed on the "raw" (source) code. Because this work uses a combined approach of two independent obfuscation methods, the first step can be Step 2, Step 4, or Step 5.
Step 2. The source code runs through the obfuscation method, based on the modification of string literals. The modified source code will not make sense without the reverse algorithm of Step 3.
Step 3. Because the string literal conversion algorithm is symmetric, the inverse conversion function is added to the source code. The inverse conversion function is an integral complement to Step 2; however, the point at which it is added does not matter.
Step 4. The source code runs through the obfuscation method based on the "untangling" of structures containing Boolean operations (it performs operations opposite to simplification). This method has no dependencies and can be executed at any step, or skipped, depending on the combining rule.
Step 5. The source code runs through the obfuscation method based on obfuscation of identifier names. This method has no dependencies and can be executed at any step, or skipped, depending on the combining rule. This method has a number of options, which allow obfuscating local variables, global variables, functions, and classes.
Step 6. As a result of Steps 2-5, the source code is ready for the compilation process. At the same time, to eliminate side effects, it is necessary to make sure that the compiler does not perform premature code optimization. Also, to reduce the readability of the code and complicate the debugging process, debugging information must be disabled. For example, for languages based on the Java virtual machine, compiling with the "-g:none" option of javac omits debugging information from the class files, and the "-Xint" option of the java launcher runs the program in interpreted-only mode, which switches off Just-In-Time compilation and the code optimization it performs.
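Steps 2-4 can be illustrated with a minimal sketch. The concrete transformations below (a single-byte XOR for string literals and a De Morgan expansion for a Boolean expression) are assumptions chosen only for illustration; the study does not specify which conversions are used, and the names `KEY`, `xor_transform`, `encode_literal`, `decode_literal`, and `expanded_condition` are hypothetical.

```python
from itertools import product

KEY = 0x5A  # hypothetical single-byte key, chosen only for illustration


def xor_transform(data: bytes, key: int = KEY) -> bytes:
    """Symmetric transform: applying it twice returns the original bytes."""
    return bytes(b ^ key for b in data)


def encode_literal(text: str) -> bytes:
    """Step 2: replace a readable string literal with its encoded form."""
    return xor_transform(text.encode("utf-8"))


def decode_literal(blob: bytes) -> str:
    """Step 3: inverse function that would be embedded in the protected code."""
    return xor_transform(blob).decode("utf-8")


def simple_condition(a: bool, b: bool) -> bool:
    """Readable form of a Boolean expression."""
    return a and b


def expanded_condition(a: bool, b: bool) -> bool:
    """Step 4: the same predicate in a longer, less readable form
    (De Morgan's law plus a redundant tautology)."""
    return not ((not a) or (not b)) and (a or not a)


def equivalent(f, g, arity: int = 2) -> bool:
    """Check over the full truth table that two predicates agree."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=arity))


secret = "license-check"
blob = encode_literal(secret)       # what would be stored in the source
restored = decode_literal(blob)     # what the embedded decoder recovers
same_logic = equivalent(simple_condition, expanded_condition)
```

Because the transform is its own inverse, the decoder of Step 3 can be added at any point of the pipeline, which matches the remark that its order of addition does not matter.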
In accordance with the above, a general algorithm of programming module obfuscation is developed (Fig. 2), as well as a GERT network of programming modules obfuscation and deobfuscation processes (Fig. 3).
In Fig. 3 and the corresponding Table 1, based on the developed algorithms of Fig. 1, transitions between states are formulated that characterize:
- (1, 2): processing the source code by modifying (encoding) string literals and embedding in the source code a function that decodes the modified string literals into the initial state "on the fly";
- (2, 3): after successfully encoding string literals, we perform the process of Boolean operations obfuscation and verify the conversion success of Boolean operations;
- (3, 4): after successful obfuscation of Boolean operations, we perform the process of identifier names obfuscation and verify the conversion success of identifier names;
- (4, 5): after successful obfuscation of identifier names, we go to the final state, ready for compilation;
- (1, 3): depending on the business scenario, we skip the source code processing by encoding string literals and perform the process of Boolean operations obfuscation;
- (1, 4): depending on the business scenario, we skip the source code processing by encoding string literals and by obfuscation of Boolean operations, and perform the process of identifier names obfuscation;
- (2, 4): depending on the business scenario, we skip the source code processing by Boolean operations obfuscation and perform the process of identifier names obfuscation;
- (3, 5): depending on the business scenario, we skip the source code processing by obfuscation of identifier names;
- (2, 5): depending on the business scenario, we skip the source code processing by obfuscation of Boolean operations and identifier names;
- (2, 1): an error occurred in the process of encoding by modifying string literals and adding a decoding function. We return to the initial state in order to repeat the obfuscation attempt;
- (3, 1): an error occurred in the process of verification of the conversion success of Boolean operations. We return to the initial state in order to repeat the obfuscation attempt, since the reason for the error is unknown: it may be a problem of Boolean operations obfuscation or may be caused by the modification of string literals in the previous steps;
- (4, 1): an error occurred in the process of verification of the conversion success of identifier names. We return to the initial state in order to repeat the obfuscation attempt, since the reason for the error is unknown: it may be a problem of identifier names obfuscation, a problem of Boolean operations obfuscation, or may be caused by the modification of string literals in the previous steps.
Thus, a set of algorithms of programming modules obfuscation and deobfuscation is synthesized, which made it possible to comprehensively describe these processes at the upper strategic level of formalization.
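The transitions listed above can be written down as a small directed graph, which makes it easy to enumerate the loop-free obfuscation scenarios from the source (state 1) to the sink (state 5). The following is a minimal sketch; the arc lists are taken from the list above, while the helper names `successors` and `forward_paths` are hypothetical.

```python
# Arcs of the five-state network as listed in the text (source = 1, sink = 5).
FORWARD = [(1, 2), (2, 3), (3, 4), (4, 5),            # full chain
           (1, 3), (1, 4), (2, 4), (3, 5), (2, 5)]    # scenarios with skipped steps
RETRY = [(2, 1), (3, 1), (4, 1)]                      # error, return to the start

SOURCE, SINK = 1, 5


def successors(arcs):
    """Build an adjacency map {node: [next nodes]} from a list of arcs."""
    adj = {}
    for i, j in arcs:
        adj.setdefault(i, []).append(j)
    return adj


def forward_paths(adj, node, sink):
    """Enumerate all loop-free paths from node to sink (depth-first)."""
    if node == sink:
        return [[sink]]
    paths = []
    for nxt in sorted(adj.get(node, [])):
        for tail in forward_paths(adj, nxt, sink):
            paths.append([node] + tail)
    return paths


paths = forward_paths(successors(FORWARD), SOURCE, SINK)
for p in paths:
    print(" -> ".join(map(str, p)))
```

Only the forward arcs are traversed here, so the recursion always terminates; the retry arcs form the loops that the GERT model accounts for through its topological equation.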

Development of a GERT model of the programming modules obfuscation process based on algorithms
The studies showed [18,19] that the general algorithm of programming modules obfuscation has a number of specific iterations that greatly complicate the overall process of its mathematical formalization. Therefore, it seems appropriate to divide this process into a number of subprocesses. For mathematical modeling of obfuscation and deobfuscation processes, network stochastic models are the most flexible and useful. A special case of the stochastic model is a GERT network. This is largely due to the availability of a mathematical apparatus for finding a continuous probability density function of the transition time of the GERT network. However, this is only possible provided that the set of distributions that can characterize individual arcs of the model includes known distributions. These are: discrete, binomial, Poisson, geometric, negative binomial, uniform, exponential, gamma and normal.
In addition, it is possible to find and use continuous arbitrary distributions. It is shown in [17] that the probability density function of the transition time of the GERT network is determined by expression (1), where $W_E(s)$ is the equivalent transfer function of the GERT network and $s$ is a real variable.
From the topological equation [17], a relation follows in which $A_i$ is the number of loops of order $i$ that do not include the network sink.
In a number of practically important cases, distributions must be obtained in the form of mathematical expressions. Such problems include the study of algorithms of programming code obfuscation and deobfuscation. This problem reduces to finding the probability density function of the random transition time on the basis of the developed GERT network. Note that the continuous probability density function of the transition time of the GERT network, $\varphi(x)$, is assumed to be determined by expression (1).
The W function of the transition between states $i$ and $j$ is determined by a formula in which $\zeta_{ij}(x)$ is the probability density of the transition between states $i$ and $j$, and $p_{ij}$ is the probability of transition from state $i$ to state $j$.
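For orientation, the W function in the standard GERT formulation is the product of the transition probability and the moment generating function of the transition time. The form below is that textbook definition written with the symbols used in this section; the integration over nonnegative durations is an assumption, and the expression is not a transcription of the authors' own formula.

```latex
W_{ij}(s) = p_{ij}\, M_{ij}(s), \qquad
M_{ij}(s) = \int_{0}^{\infty} e^{s x}\, \zeta_{ij}(x)\, dx
```

At $s = 0$ the moment generating function equals 1, so $W_{ij}(0) = p_{ij}$.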
In the study, we adopt the hypothesis that using the gamma distribution as the key one for describing probabilistic transitions from state to state will make it possible to achieve unification of the model of the programming modules obfuscation process. The unification means that reducing or increasing the number of obfuscation operations changes the modeling results only slightly. It is expected that a decrease in the number of nodes will slightly decrease the variance and expectation, and an increase will, accordingly, slightly increase them. However, there is a restriction on changing the model: its structural architecture (for example, the degree of connectivity of the nodes) should remain unchanged.
Thus, in the considered GERT network of programming modules obfuscation and deobfuscation processes, the probability density functions of the transitions are defined by the gamma distribution (4) with variable coefficients $k$ and $\theta$, and the resulting W function is given by expression (5). The characteristics of the branches considered in the GERT model and their distribution parameters are presented in Table 1.
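As a reference point, the gamma density and the corresponding W function can be written out explicitly. The sketch below assumes the shape-scale parameterization, with $k$ as the shape and $\theta$ as the scale, which is the usual reading of these two symbols; it is the textbook form rather than a transcription of the authors' expressions.

```latex
\zeta_{ij}(x) = \frac{x^{k_{ij}-1}\, e^{-x/\theta_{ij}}}{\Gamma(k_{ij})\, \theta_{ij}^{\,k_{ij}}}, \quad x > 0,
\qquad
W_{ij}(s) = p_{ij}\, (1 - \theta_{ij} s)^{-k_{ij}}, \quad s < \frac{1}{\theta_{ij}}
```

Under this parameterization, a single transition has expectation $k_{ij}\theta_{ij}$ and variance $k_{ij}\theta_{ij}^{2}$, and the exponential law is the special case $k_{ij} = 1$.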
As can be seen from the expression (4), the mathematical formalization of the resulting equivalent moment function seems to be a cumbersome expression. In this regard, the problem arises of a generalized mathematical formalization of the resulting expressions for calculating equivalent transfer functions.
Substituting the values of expressions (6)-(19) into the resulting W function (5) yields a formula of impressive size. For its "normalization", we introduce the following replacement, in which $p_t$ represents the product of the W functions describing successful and unsuccessful execution of the algorithms described by the variables of expressions (6), and which also uses a list (array) of coefficients of the corresponding generating function of transition moments.
Thus, the resulting expression for calculating equivalent transfer functions can be described in terms of this replacement. Using the probability density function (4), we obtain the probability density graph shown in Fig. 4, with the transition probabilities fixed beforehand.
The network can be built so that if in some state $i$ it is possible to start one of several subsequent operations, then the probabilities $p_{ij}$ of starting any of these operations form a complete group of incompatible events:
$$\sum_{j} p_{ij} = 1. \quad (22)$$
In this case, the probability of running the entire network from source to sink is 1.
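The complete-group condition and its consequence can be checked numerically. The sketch below uses hypothetical transition probabilities (they are not the values of Table 1) and solves the first-step equations $h_i = \sum_j p_{ij} h_j$ with $h_5 = 1$ exactly, using rational arithmetic; the function names are assumptions introduced for illustration.

```python
from fractions import Fraction as F

# Hypothetical transition probabilities (NOT the values of Table 1).
P = {
    1: {2: F(6, 10), 3: F(2, 10), 4: F(2, 10)},
    2: {3: F(5, 10), 4: F(2, 10), 5: F(1, 10), 1: F(2, 10)},
    3: {4: F(6, 10), 5: F(2, 10), 1: F(2, 10)},
    4: {5: F(9, 10), 1: F(1, 10)},
}
SINK = 5


def is_complete_group(P):
    """Outgoing probabilities of every node must sum to exactly 1."""
    return all(sum(out.values()) == 1 for out in P.values())


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        head = M[col][col]
        M[col] = [v / head for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def reach_probability(P, sink):
    """Probability of eventually reaching the sink from each non-sink node."""
    nodes = sorted(P)
    A = [[(F(1) if i == j else F(0)) - P[i].get(j, F(0)) for j in nodes]
         for i in nodes]
    b = [P[i].get(sink, F(0)) for i in nodes]
    return dict(zip(nodes, solve(A, b)))


reach = reach_probability(P, SINK)
print(is_complete_group(P), reach[1])
```

With every node satisfying the condition and the sink reachable from every state, the retry loops delay completion but do not prevent it, so the computed probability for the source is exactly 1.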
To show that all nodes satisfy condition (22), we calculate the probability of completing the entire process, $p_E$, by a formula in which $M_E(s)$ is the generating function and $f_E(x)$ are the distribution densities.
Table 1. Characteristics of transitions between the states of the GERT network of programming modules obfuscation and deobfuscation processes
Fig. 4 shows the density graph of the runtime of the entire programming module obfuscation and deobfuscation process, taking into account variations in the values of $k$ and $\theta$. Integrating the probability density function, we obtain a distribution function, the graph of which is shown in Fig. 5. The expectation and variance of the obtained functions are also calculated, and the results are given in Table 2.
Thus, as part of the study, a unified GERT model of the programming modules obfuscation process is developed. This model differs from the known ones by the paradigm of using the mathematical apparatus of the gamma distribution as the key one at all stages of modeling the obfuscation process. This made it possible to achieve model unification under the conditions of GERT network modification.
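The expectation and variance of the total runtime can be cross-checked without symbolic differentiation of the equivalent transfer function. The sketch below swaps in a first-step analysis: with independent gamma-distributed arc times, the first and second moments of the time to reach the sink satisfy two linear systems. All probabilities and the parameters `K` and `TH` are hypothetical and are not the values of Table 1; `traversal_moments` is an assumed helper name.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        head = M[col][col]
        M[col] = [v / head for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def traversal_moments(arcs, sink):
    """arcs maps (i, j) -> (p, k, theta). Returns {node: (mean, variance)}
    of the time needed to reach the sink from each non-sink node."""
    nodes = sorted({i for i, _ in arcs})
    idx = {node: r for r, node in enumerate(nodes)}
    n = len(nodes)
    A = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    b1 = [0.0] * n
    for (i, j), (p, k, th) in arcs.items():
        if j != sink:
            A[idx[i]][idx[j]] -= p
        b1[idx[i]] += p * k * th                 # E[X] = k * theta
    m = solve(A, b1)
    mean = {node: m[idx[node]] for node in nodes}
    mean[sink] = 0.0
    b2 = [0.0] * n
    for (i, j), (p, k, th) in arcs.items():
        # E[X^2] = k (k + 1) theta^2 for a gamma-distributed arc time
        b2[idx[i]] += p * (k * (k + 1) * th ** 2 + 2 * k * th * mean[j])
    q = solve(A, b2)
    return {node: (mean[node], q[idx[node]] - mean[node] ** 2) for node in nodes}


# Sanity check on a loop-free chain: moments of independent arcs simply add up.
serial = {(1, 2): (1.0, 2.0, 3.0), (2, 3): (1.0, 1.0, 4.0)}
serial_mean, serial_var = traversal_moments(serial, sink=3)[1]

# Five-state network with hypothetical probabilities and common k, theta.
K, TH = 2.0, 1.5
network = {
    (1, 2): (0.6, K, TH), (1, 3): (0.2, K, TH), (1, 4): (0.2, K, TH),
    (2, 3): (0.5, K, TH), (2, 4): (0.2, K, TH), (2, 5): (0.1, K, TH),
    (2, 1): (0.2, K, TH), (3, 4): (0.6, K, TH), (3, 5): (0.2, K, TH),
    (3, 1): (0.2, K, TH), (4, 5): (0.9, K, TH), (4, 1): (0.1, K, TH),
}
net_mean, net_var = traversal_moments(network, sink=5)[1]
print(serial_mean, serial_var, net_mean, net_var)
```

For networks in which exactly one outgoing arc is taken at each node, these linear systems describe the same quantities as the first two derivatives of the normalized equivalent transfer function at $s = 0$, so the sketch can serve as an independent check of values such as those in Table 2.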

Study of the unified GERT model with a modified number of nodes
The developed process of obfuscation and deobfuscation of programming modules consists of 5 nodes. Consider the behavior of the system when the number of nodes changes.
The developed GERT networks of programming modules obfuscation and deobfuscation processes with a changed number of nodes are presented in Fig. 6, 7.
When changing the number of nodes, the following factors were taken into account:
- the degree of connectivity of the nodes of the new process is comparable to the original one;
- changing the process complexity leads to a change in the number of elements of the array of coefficients k, while the values of the first and last elements of the array k are identical to the original ones;
- changing the process complexity leads to a change in the number of elements in the array of coefficients θ, while the values of the first and last elements of the array θ are identical to the original ones.
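The constraint on the coefficient arrays can be illustrated with a short sketch. The study states only that the first and last elements of the arrays k and θ are preserved when the number of nodes changes; the linear interpolation of the interior elements below is purely an assumption made for illustration, as are the sample values and the name `resize_keep_ends`.

```python
def resize_keep_ends(values, n):
    """Return n values whose first and last entries equal those of `values`.
    Interior entries are linearly interpolated (an illustrative choice only)."""
    if n < 2 or len(values) < 2:
        raise ValueError("need at least two values and n >= 2")
    last = len(values) - 1
    out = []
    for t in range(n):
        pos = t * last / (n - 1)        # position in the original index space
        lo = min(int(pos), last - 1)    # left neighbour of that position
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[lo + 1] * frac)
    return out


k_original = [2.0, 3.0, 5.0, 4.0]          # hypothetical coefficients
k_reduced = resize_keep_ends(k_original, 3)
k_extended = resize_keep_ends(k_original, 5)
print(k_reduced, k_extended)
```

In practice the interior coefficients would be set by the same expert procedure as the original ones; the sketch only shows how the endpoint constraint can be enforced mechanically.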
The generated tables of characteristics of the branches considered in the GERT model and of the distribution parameters are presented in Tables 3 and 4. Using expression (20), the resulting expressions for calculating equivalent transfer functions are described for each modified network.
For each new process, the probability of implementing the entire process, $p_E$, was calculated and is equal to 1. The values of expectation and variance were also calculated and are presented in Table 5.
Table 3. Characteristics of transitions between the states of the GERT network of programming modules obfuscation and deobfuscation processes with a reduced number of operating obfuscation functions
Based on the obtained equivalent transfer functions, the density graphs of the runtime of the entire process of obfuscation and deobfuscation of the programming module were constructed taking into account variations in the variables k and θ. These graphs, as well as the corresponding graphs of the distribution function, are presented in Fig. 8, 9.
The results of the study showed that for the developed mathematical model, when another obfuscation process is added, the variance increases by 12 %, and when one is removed from the system, it decreases by up to 13 %. The expectation changes exponentially. So, when removing the node, the expectation decreases by 9 %, and when increasing by 1 node, the expectation increases by 26 %. This shows the insignificance of changes in the studied characteristics under the conditions of model modification and confirms the hypothesis of model unification when using the mathematical apparatus of gamma distribution as the main one. These results allow the developer to predict the behavior of the programming modules protection system in terms of runtime.

Discussion of the results of the study of the developed GERT model of the programming modules obfuscation process
A set of algorithms of programming modules obfuscation is synthesized (Fig. 1), which differs from the known ones by taking into account the variability of data types. The synthesis made it possible to present the obfuscation process as a whole (Fig. 2), as well as to formalize the obfuscation process in a convenient form for subsequent use in the developed GERT models (Fig. 3).
As part of the study, a unified GERT model of the programming modules obfuscation process is developed. This model differs from the known ones by the paradigm of using the mathematical apparatus of gamma distribution (4) as a key one at all stages of modeling the obfuscation process. This made it possible to achieve model unification under the conditions of GERT network modification (Fig. 4, 5). Unification allows adapting the modeling process to a possible complication of the structure and algorithms of obfuscation processes.
The results of the study showed that for the developed mathematical model, when another obfuscation process is added, the runtime variance increases by 12 %, and when one is removed from the system, it decreases by up to 13 % (Fig. 8, 9). The runtime expectation changes exponentially. So, when removing the node, the expectation decreases by 9 %, and when increasing by 1 node, the expectation increases by 26 %. This shows the insignificance of changes in the studied characteristics under the conditions of model modification and confirms the hypothesis of model unification when using the mathematical apparatus of gamma distribution as the main one. These results allow the developer to predict the behavior of the programming modules protection system in terms of runtime. This reduces the time needed to decide on the feasibility of the obfuscation process when using flexible methodologies.
The studies show that the developed mathematical model is appropriate for mathematical modeling of systems that are formalized by at least four states. A decrease in the number of states leads to linearization of the process, for which the use of stochastic approaches to mathematical modeling leads to a deterioration in the accuracy of the results. Also, an additional decrease in the number of states leads to a decrease in the security of programming modules (Fig. 8, 9). So, Fig. 6 shows the four-state model described by six transitions (Table 3). Reducing the number of states by 1 will lead to a system having 3 transitions. The recommended maximum number of states is 9 nodes. A further increase in the number of nodes leads to an excessive complication of the mathematical model, while the trends of varying the process runtime expectation and variance remain.
At this stage, the coefficients k, θ are selected empirically using expert knowledge. So, the input data are obtained as a result of an experiment conducted by a group of expert developers of NixSolutions secure software. Further, this limitation can be eliminated by calculating these coefficients for specific data protection algorithms. The elimination of these restrictions is associated with the direction of further research, which should be focused on the development of the procedure for adapting these coefficients to various business processes of programming modules obfuscation.
The development of this study consists in the design of a methodology for calculating the gamma distribution and its adaptation for the practical implementation of data protection algorithms. However, difficulties may arise associated with the existing limitations of stochastic modeling approaches.