CONSTRUCTING A MODEL FOR THE DYNAMIC EVALUATION OF VULNERABILITY IN SOFTWARE BASED ON PUBLIC SOURCES



Introduction
When distributing and deploying Internet of Things (IoT) devices and applications, companies often neglect software (SW) security issues. In the threat model of IoT devices, one can distinguish the application layer of the device itself and the security of its system software components and application packages. This space of technologies and products is quite attractive to attackers, since devices can have access to personal and corporate data, infrastructure, and additional devices within the network perimeter.
The main sources of information security (IS) problems are the use of outdated code, the lack of monitoring of the security of third-party components, and the lack of timely updating and of enterprise processes that ensure a secure development life cycle (SDLC [1]). The use of open-source components provides additional opportunities for finding and exploiting vulnerabilities, not only for information security specialists but also for attackers. Increased risks are caused by vulnerabilities found in source code packages that are no longer supported by their authors.
Another problem is the interaction between the components in a system, namely the existence of software dependencies. The use of a vulnerable library in a program makes the latter unprotected from the actions of an attacker.
It should be borne in mind that each vulnerability carries an information security threat and a potential attack vector that can be used by attackers. A large number of vulnerabilities create high risks for enterprises and corporations. Thus, there is a growing need for rapid decision-making and taking actions to enhance information security in order to reduce the identified risks.
Negative consequences of the implementation of a threat through a vulnerability carry strategic, financial, and reputational risks. It is essential to conduct vulnerability analysis to assess and mitigate risks to an organization. The complexity of systems and the growing level of threats call for a thorough analysis of each vulnerability, its interaction with other components in the system, the possibility of its exploitation, and the identification of consequences in the event of a successful attack. The situation is complicated by the large number of vulnerabilities in various product modules, as well as the large number of supported devices. This leads to difficulties with timely updating and testing and to the need to carefully select the most critical vulnerabilities for fixing. Since the availability of an exploit raises the threat level, a large number of papers are devoted to predicting the emergence of exploitation programs. The following studies address this problem.
In study [8], the authors use machine learning, the official Common Platform Enumeration dictionary [9], and the vulnerability management data repository [10] to try to predict the emergence of an exploitation program for a vulnerability within the coming year. However, the model is constructed using linear regression algorithms and relies only on already published exploits and popular products, ignoring vulnerability trends in information security. Thus, the model presented in [8] requires constant retraining and data updating.
Article [11] also used general information from the vulnerability database [10] and proposed a method for predicting the appearance of exploits. The main drawback of the article is the lack of sufficient data and information regarding each vulnerability. The forecast accuracy is about 80 %. The authors of [11] point out that the main problems for the obtained model are the change of information over time and the insufficient number of identified vulnerability features.
One direction for enhancing such methods is the use of attack graphs. Attack graphs are used both for vulnerability analysis and testing and for threat and risk evaluation. In paper [12], the NVD database and the CVSS vector were used to plot attack graphs and generate conditions and privileges for attackers using a multilayer perceptron. Research [13] provides examples of different models based on attack graphs, resource graphs, and network graphs. However, the authors do not take into consideration the problems and metrics concerning the existence and availability of vulnerability patches. An essential drawback is the high computational complexity. Moreover, attack graphs are most commonly used to calculate risk for corporate networks rather than for single computing devices.
In work [12], the authors plot an attack graph based on [10] to analyze the security of systems. Computational models were obtained using a rule base and a multilayer perceptron. A feature of the study is that it takes user privileges into consideration, but a complete and accurate forecast requires more rules than the authors proposed.
The study closest to the present task is [14], which presents a risk evaluation model that takes time into account. Study [14] is based on the exploitation program database [15], the vulnerability database [16], the Panjer algorithm, and the theory of actuarial risk calculations. The main drawback of the system in [14] is that it does not take into consideration the interest of the information security community, and all characteristics are considered independent.
Thus, we can conclude that there is currently no unified formalized approach to the analysis, evaluation, and ranking of vulnerabilities. The methods under consideration do not take into account the rate of change of the input parameters and the main characteristics of vulnerabilities, which leads to the irrelevance of the obtained calculations and a distortion of the resulting IS risk values. The problem of analyzing and ranking vulnerabilities affects the processes of calculating and eliminating risks and the release of a high-quality product to the market. As a result, this leads to an inappropriate use of the company's resources.
According to the considered methodologies, there is a problem of the complexity of determining the significance of vulnerability impact taking into consideration the context not only of the technical environment but also of the industry sector. That is why it is justified to study ways of enhancing the accuracy of accounting for the criticality of a vulnerability for the product. In addition, the process of vulnerability analysis, evaluation of the impact on a system, and risk analysis has a number of shortcomings and widespread problems that affect the priority of vulnerability remediation and the release of updates, namely:
– a short period of time for analysis and decision-making;
– the workload of experts in information security units (in many cases there are no such units in an organization, and their functions are performed by units for testing and ensuring the functional quality of software);
– a lack of comprehensive information regarding a vulnerability;
– a lack of up-to-date software documentation;
– insufficient resources for the timely elimination of a vulnerability;
– a wide attack surface that makes it difficult to identify vulnerable components.
The spread and severity of a vulnerability do not always correlate with its actual exploitability. Sometimes a minor bug in software can cause more harm than a critical vulnerability, especially if that error can be exploited in a chain of several similar bugs. This leads to increased security risks, as well as to the expenditure of resources on more complex threat models.
In this regard, there is a need to create a system for the automatic and dynamic evaluation of vulnerabilities in software. Thus, the problems in the field of vulnerability analysis indicate that research devoted to solving the problem of dynamic vulnerability evaluation, which helps to prioritize risk and error management, is relevant.

Literature review and problem statement
The most widespread vulnerability evaluation system is the Common Vulnerability Scoring System (CVSS) [2]. It is an open and free industry standard for assessing the severity of computer system security vulnerabilities. This method is used by MITRE [3] to assess and report detected vulnerabilities. Most companies use the ready-made data of [3] to assess risks for existing vulnerabilities and use [2] for internal ones. The main disadvantages of this approach are the lack of context consideration and the fact that the final evaluation does not reflect the real complexity of the vulnerabilities. In practice, vulnerabilities with high risk scores are exploited much less frequently than vulnerabilities with low scores and a more accessible attack vector.
Paper [4] focuses on eliminating the shortcomings of [2]. The study is based on recalculating CVSS metrics using principal component analysis [5] and changing the variance of vulnerability risk scores. However, this system is of little use under real conditions, since it has no knowledge about vulnerabilities beyond the vector of CVSS values and therefore repeats the main shortcomings of [2].
Study [6] provides a dynamic security evaluation system with improved accuracy compared to [2]. The authors note that the evaluation dynamics depend on the CVSS temporal metrics [2], as well as on the likelihood of vulnerability exploitation. In work [7], the authors developed a vulnerability analysis system based on machine learning (a decision tree) in order to prioritize the release of patches. The inputs to the model are CVSS metrics and asset characteristics based on the CVSS base metrics.
Studies [4–7], when solving the problem of prioritizing vulnerabilities, highlight the existence and availability of an exploit as the main characteristic for increasing the threat level.

The present study builds on the set of vulnerability features formalized in [17], which is split into three subsets. X_i is the set of strategic vulnerability characteristics that do not depend on the target computer system and can be obtained from publicly accessible databases. The process of searching for, defining, and extracting these characteristics from open sources (NVDDataProcessing()) is presented in research [17]. The result of the NVDDataProcessing() function is a vector of dimensionality M_x = 5 with the following parameters:
– the unique vulnerability number (Id);
– the CVSS base metrics (internal vulnerability features that are unchangeable in the user environment, consisting of the impact (CVSS_impact) and exploitability (CVSS_exploitability) scores);
– a brief description of the vulnerability (Descr);
– the vulnerability type (V_type), which is a mapping of the Common Weakness Enumeration (CWE);
– the type of vulnerability threat (V_threat).

Y_i is the feature subset containing detailed information about the vulnerability that can be obtained from open sources (mainly resources such as forums, detailed reports, social networks, etc.). Some of its components depend on the moment of observation:

Y_i = {Refs, Expl(t), Patch(t), R(t), Src(t), Trend(t)}.

The algorithm of the PublicFeatureExtraction() function is presented in detail in paper [17]. The result of the PublicFeatureExtraction() function is a vector M_y of dimensionality 6 with the following parameters:
– the number of references to and primary sources of the vulnerability (Refs);
– the availability of an exploitation program (Expl(t));
– the public availability of patches and updates (Patch(t));
– the root cause (R(t));
– the availability of open source code (Src(t));
– the degree of trends and discussion of the vulnerability in social networks (Trend(t)) [18].

Z_i is the feature subset that reflects the analysis of a particular computer system and the final values of X_i and Y_i [17]. The scheme of extraction of the basic features is shown in Fig. 1.
Such ways of accounting for the criticality of a vulnerability for a product should also ensure risk reduction at different levels of expert competence and amounts of source data available for analysis, and should increase the efficiency of the software development and management process.

The aim and objectives of the study
The aim of the research is to develop a technology and model for the automated dynamic evaluation of the impact of a software vulnerability on the final product, taking into consideration the context of the target system. This will improve the quality of the final product by reducing information security risks. The use of the model will minimize the time needed for vulnerability analysis and for decision-making to remove shortcomings.
To accomplish the aim, the following tasks have been set:
– to determine the specifics, principles of operation, and amounts of information taken into consideration in the calculation of IS risks and in vulnerability management;
– to determine the main characteristics of the method of dynamic calculation of vulnerability impact on software and the degree of their significance;
– to formalize a mathematical model for assessing vulnerability impact based on publicly accessible sources and the relevance of the obtained information;
– to test the accuracy of the developed mathematical model on typical vulnerabilities from publicly accessible sources and databases.

1. Description of the general architecture of the system for evaluating the vulnerability impact on the product
The general provisions of the characteristic extraction process, the architecture, and the vulnerability evaluation process are the subject of research in [17]. The research in [17] resulted in obtaining and formalizing a set of vulnerability features. According to that presentation, the original set of characteristics is represented in the form of the three subsets X_i, Y_i, and Z_i. In Fig. 1, the units that are not taken into consideration in this study are marked in gray, the characteristics of the X_i set are shown in green, and the parameters that refer to Y_i and their derivatives are indicated in blue. The purpose of the study is to analyze the parameters and construct a mathematical model that makes it possible to obtain a dynamic evaluation on the sets of features X_i and Y_i.
The value of the vulnerability evaluation E(CVE_i)(t) is determined by the output value of the dynamic multifactor model, to the input of which the parameters of the feature sets X_i and Y_i are fed:

E(CVE_i)(t) = Evaluation(X_i, Y_i(t)),

where the quality of the model is measured by the set of metrics M: Mean Absolute Percentage Error (MAPE), Mean Squared Error (MSE) and its root (RMSE), and the coefficient of determination (R^2), all computed against Ê(CVE_i), the value obtained when the vulnerability impact was analyzed by experts.
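As an illustration, the accuracy metrics above can be computed directly from paired expert and model scores. This is a minimal sketch assuming the standard textbook definitions of MAPE, MSE, RMSE, and R^2; the function name is illustrative and not part of the model in [17].

```python
import math

def model_accuracy(y_expert, y_model):
    """Compare model output E(CVE_i)(t) with expert scores using
    MAPE, MSE, RMSE and the coefficient of determination R^2."""
    n = len(y_expert)
    # MAPE assumes no expert score is zero
    mape = 100.0 * sum(abs((e - m) / e) for e, m in zip(y_expert, y_model)) / n
    mse = sum((e - m) ** 2 for e, m in zip(y_expert, y_model)) / n
    rmse = math.sqrt(mse)
    mean_e = sum(y_expert) / n
    ss_tot = sum((e - mean_e) ** 2 for e in y_expert)
    r2 = 1.0 - (mse * n) / ss_tot
    return mape, mse, rmse, r2
```

A perfect model yields MAPE = 0, RMSE = 0, and R^2 = 1; expert scores of zero would need guarding before the MAPE division.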
The search for parametric and structural optimization of the model is carried out based on the various approaches described in the section on multifactor modeling methods below.

2. Source data processing and analysis of parameters to form a mathematical model for vulnerability impact evaluation
A significant part of previous studies [17] was aimed at determining the main features and characteristics of vulnerabilities in order to form the evaluation method. To substantiate the accepted factors affecting the overall evaluation, it is necessary to analyze the input parameters.
In the course of earlier research [17], a dataset on vulnerabilities was formed based on the database [10]. The results obtained in [17] are the input data for the current study. The dataset D contains 42835 vulnerabilities, each of which is represented as a characteristic vector of M=20 features. The vulnerabilities used to form the dataset were published between 2016 and 2019.
The next step was to represent all values of the vector in numerical form. Since the features V_type, V_threat, and R(t) are categorical (nominal) and represented by string values, one-hot encoding is applied to them: each value is converted into a vector of 0s and 1s depending on the presence of each possible category. Id is ordinal data and is not used as a feature in training, so it is converted to an interval type and normalized.
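A minimal sketch of the one-hot step for a categorical column such as V_type (the category names in the usage example are hypothetical):

```python
def one_hot(values):
    """One-hot encode a categorical feature: each distinct category
    becomes a 0/1 position in the output vector."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories
```

For example, encoding ["overflow", "xss", "overflow"] produces two binary columns, one per category, with exactly one 1 per row.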
Thus, a dataset matrix of the form D = [CVE_1(t), …, CVE_N(t)]^T was obtained, where N = 42835. Also, based on the vectors X_i and Y_i, the following features were additionally separated: the date a vulnerability was published (Date_published) and the date it was modified (Date_modified); each value of the CVSS vector, represented as a separate feature; and the groups of vulnerability types (CWE_group).
The dimensionality of the dataset matrix is N × M, where N = 42835 is the number of vulnerabilities and M is the number of features after encoding. Before drawing conclusions about each vulnerability, the initial dataset D must be analyzed, and the relationships, correlations, and other regularities between the presented characteristics must be determined.
To study the relationships between the features and determine linear dependence in matrix D, the matrix of Pearson correlation coefficients was calculated. Based on an analysis of the obtained values of the correlation matrix, the most significant parameters were selected. The next step in determining the main relationships between the characteristics was to build associative rules using the Apriori algorithm [19]. This algorithm makes it possible to find the most common sets of feature values, extract rules from them, and obtain a vector of the main factors affecting the objective function. Based on this vector, the essential CVE_i variables are selected.
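The Apriori idea can be sketched for pairwise rules in pure Python. The feature-value items below are hypothetical examples, and the default thresholds match the support and confidence minima reported later; a real run would use a library implementation over the full dataset.

```python
from itertools import combinations

def apriori_rules(transactions, min_support=0.15, min_confidence=0.8):
    """Toy Apriori: mine frequent feature-value pairs and extract
    association rules (antecedent -> consequent) with their support
    and confidence values."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {i for t in transactions for i in t}
    rules = []
    for a, b in combinations(sorted(items), 2):
        s = support(frozenset({a, b}))
        if s < min_support:
            continue  # prune infrequent pairs
        for ante, cons in ((a, b), (b, a)):
            conf = s / support(frozenset({ante}))
            if conf >= min_confidence:
                rules.append((ante, cons, round(s, 2), round(conf, 2)))
    return rules
```

On a toy set where every transaction with an exploit also has low attack complexity, the rule "Expl=yes -> AC=low" is extracted with confidence 1.0.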

3. Generation of a dataset to train the model
The next step is to form the training and test datasets for the further search for a mathematical model of the vulnerability impact evaluation E(CVE_i)(t). The dataset matrix D described above was taken as the source set.
The main problem for further research is that the dataset D is not marked. To solve the problem of automatic evaluation, a qualitatively and evenly marked set of input data is needed. Marking the full volume with the help of expert evaluation is a very time-consuming and resource-intensive task. During data analysis, it was noticed that the initial set contains duplicated vectors that differ only in the temporal characteristics (Trend, Date_published, Date_modified). The number of unique vectors after removing such duplicates is 9528, which greatly simplifies the task of marking. Next, the characteristic vectors are standardized by removing the mean value and scaling to unit variance.
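The standardization step (zero mean, unit variance per characteristic) can be sketched as follows; the zero-variance guard is an assumption added for constant columns.

```python
import math

def standardize(column):
    """Standardize one characteristic: subtract the mean and scale to
    unit variance (z-score)."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n
    std = math.sqrt(var) or 1.0  # guard against zero variance
    return [(x - mean) / std for x in column]
```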
A subset of vulnerabilities was selected from the initial dataset and evaluated by invited experts in the area of information security. Based on the expert evaluation, the expected values of vulnerability impact for the selected subset were determined.
Given that the number of features and the size of the dataset are still quite large, and that marking the data through expert evaluation requires a significant amount of time, cluster analysis of the vulnerabilities is necessary. This analysis makes it possible to assess the distribution of vulnerabilities, scale the expert-marked dataset, and transfer the evaluation to the values that fall into the nearest cluster. To do this, it was proposed to use t-distributed Stochastic Neighbor Embedding (t-SNE). Since the original dataset is large, a uniformly distributed sample of 1000 vectors (T_i) was formed for visualization. This number of vectors is sufficient to display all possible groups and clusters of vulnerabilities. The next step is to mark 10 % of the vulnerabilities in each set T_i.
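Transferring expert scores to nearby unmarked vectors can be approximated by nearest-labeled-neighbor propagation. This is a deliberately simplified stand-in for the t-SNE-plus-clustering procedure described above, with hypothetical vectors and scores.

```python
def propagate_labels(vectors, labeled):
    """Spread expert scores to unlabeled vectors: each unlabeled vector
    receives the score of its nearest labeled vector (squared Euclidean
    distance). `labeled` maps a vector index to its expert score."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    scores = {}
    for i, v in enumerate(vectors):
        if i in labeled:
            scores[i] = labeled[i]
        else:
            j = min(labeled, key=lambda k: dist2(v, vectors[k]))
            scores[i] = labeled[j]
    return scores
```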

Analysis of methods of multifactor modeling
When searching for a suitable method for estimating and forming a model taking into consideration the above data and parameters, the following approaches and technologies for multifactor modeling were studied. The search for an effective evaluation method was carried out based on experimental studies and subsequent analysis of results.

1. Neuro-fuzzy systems
In paper [20], a fuzzy model of the Evaluation() function was constructed: membership functions were determined and a rule base was formed based on expert evaluation. According to [20], the size of the characteristic vector of linguistic variables is len(T_i)=8, and the number of term-sets is len(T_i^j)=34. In order to take most possible cases into consideration, the size of the required rule base should be ≈82,000 rules. Generating so many rules manually is a time-consuming and inefficient task, especially since the characteristic vector CVE_i here consists of M=20 features.
Given the specifics of the processed data, a natural solution is to automate the generation of fuzzy rules using a fuzzy neural network, namely an adaptive neuro-fuzzy inference system (ANFIS). It is a five-layer feedforward neural network that implements the Takagi-Sugeno fuzzy inference system.
In this case, it was proposed to construct a simple Takagi-Sugeno controller with 20 inputs and 1 output. Thus, the Evaluation() function was represented by a fuzzy rule inference procedure.

2. Construction of a multi-factor model of vulnerability evaluation
Several algorithms used to construct a model for vulnerability impact evaluation taking into consideration dynamic variables are represented below.
Linear regression. The use of such a basic algorithm as multiple linear regression makes it possible not only to find a connection between the input variables of the vulnerability vector CVE_i(t) from the dataset D and the output vulnerability impact evaluation obtained by experts (Ê(CVE_i)), but also to plot the best-fit line for predicting the evaluation E(CVE_i)(t).
To apply the multiple regression method, the Evaluation() function is sought in the class of linear models; the desired form of the multifactor model is derived experimentally according to the best error-analysis value.
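For a single feature, the least-squares fit underlying multiple regression reduces to closed-form slope and intercept; a sketch with hypothetical data values:

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: the best-fit line
    y = a*x + b minimizing the squared error, i.e. the one-variable
    case of the multiple regression used for Evaluation()."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx
```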
Polynomial regression. In order to overcome the lack of accuracy of the linear model, the option of applying polynomial regression is considered. To do this, the model E(CVE_i)(t) is sought in the class of polynomial models of the 2nd and higher degrees.

Multilayer perceptron. To construct a multidimensional impact evaluation function Evaluation() on the declared dataset D and the input characteristic vector CVE_i(t), a feedforward neural network was used. The high connectivity of the network and the nonlinear activation function of each neuron of a multilayer perceptron ensure computing power, and the hidden layers allow the network to extract the most important features from the input characteristic vector. When constructing a network model, the following architecture parameters are varied:
– the activation function of the hidden layers (logistic sigmoid, hyperbolic tangent, rectified linear unit (ReLU));
– the optimizer (quasi-Newton methods, stochastic gradient descent, adaptive moment estimation (Adam));
– the network structure (the number of hidden layers and the number of perceptrons in each layer).
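The feedforward computation of such a perceptron, with ReLU on the hidden layers and a linear output, can be sketched as follows (the tiny weights in the usage example are illustrative, not the trained network):

```python
def mlp_forward(x, layers):
    """Forward pass of a feedforward network. `layers` is a list of
    (weights, biases) pairs; ReLU is applied after every layer except
    the last (linear output)."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:  # hidden layers: ReLU activation
            x = [max(0.0, v) for v in x]
    return x
```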
Decision tree. When searching for the most accurate multidimensional Evaluation() model, the algorithm for constructing decision trees was considered. One of the advantages of using decision trees is data systematization and structuring when solving a problem. Thus, decision-making is performed analytically, and the output variable is built based on inductive rules.
The tree structure is optimized according to the Minimal Cost-Complexity Pruning criterion: the optimal value of the regularization constant (α) is determined by constructing a sequence of trees with increasing α and selecting the tree with the minimal error on the test sample.
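Minimal cost-complexity pruning selects, for a given α, the subtree minimizing R_α(T) = R(T) + α·|leaves(T)|. A sketch over a hypothetical pruning sequence (the error and leaf counts are made up for illustration):

```python
def best_pruned_tree(trees, alpha):
    """trees: list of (error, n_leaves) pairs for a pruning sequence.
    Return the index of the subtree minimizing R_alpha(T)."""
    return min(range(len(trees)),
               key=lambda i: trees[i][0] + alpha * trees[i][1])
```

With no penalty (α = 0) the largest tree wins; as α grows, smaller subtrees become preferable.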
Random forest. Using the decision tree algorithm with a large number of nodes and great depth on the training sample yields a flexible model that tends to overfit by fitting nodes to the training dataset. As an alternative to using a tree with limited depth or determining the optimal value of the regularization constant, it is possible to use an ensemble of decision trees – a random forest.

Table 1 gives the results of a comparison of the main existing solutions regarding vulnerability analysis and management and the calculation of information security risks.
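The bootstrapping-and-averaging mechanism of a random forest can be sketched as follows; the per-tree models in the usage example are placeholder callables rather than real decision trees.

```python
import random

def bootstrap_sample(data, rng):
    """Draw a sample of len(data) items with replacement, as used to
    train each tree of the forest."""
    return [rng.choice(data) for _ in data]

def forest_predict(trees, x):
    """A random forest regressor averages its trees' predictions."""
    return sum(t(x) for t in trees) / len(trees)
```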

1. Analysis of automated systems for calculating information security risks and vulnerability management
Existing systems for vulnerability management and calculation of information security risks are in most cases commercial solutions. Their key features are proprietary risk evaluation systems and integration with CI/CD software. The main shortcomings include configuration and management complexity, the non-transparency of the operation mechanism and of the vulnerability database resources used, and the lack of consideration of the degree of trends and mentions of vulnerabilities in social networks.

Fig. 2 shows the results of the correlation analysis of vulnerability characteristics based on NVD [10] for the dataset D and the CVE_i(t) characteristics extracted from the analysis of open sources.

2. The result of analyzing and determining parameters for the formation of a mathematical model of vulnerability impact evaluation
According to the correlation matrix in Fig. 2, the following can be noted. Sufficiently high correlation factors are observed between:
– the vulnerability publication date (Date_published) and the modification date (Date_modified) – 0.75;
– the basic components of the CVSS evaluation, namely CVSS_base, CVSS_impact, and CVSS_exploitability;
– the constituent variables of the CVSS vector (confidentiality (C), integrity (I), availability (A)) and the basic vulnerability evaluation (Base);
– the constituent variables of the CVSS vector (C, I, A) and the metric of the impact of a successfully exploited vulnerability (Impact).
This matrix shows the degree of interrelation of the selected characteristics. The total sample size is 42835 vulnerabilities. The significance of the correlation factors is determined by the degree of confidence of the values presented in the cells of Table 2.
By applying the Apriori algorithm, support and confidence values for the rules were obtained. The total number of rules at a minimum support of 0.15 and a minimum confidence of 0.8 is 1325. The results of constructing and selecting the main associative rules with maximum support and confidence values are shown in Table 2. The first column shows the most significant features, the second their values and the constructed rules.
The first category includes the rules associated with the CVSS vector. It can be noted that the existence of a vulnerability exploit is associated with low attack complexity (AC=low), as well as with the network attack vector (AV=network). Some features (for example, the existence of patches (Patch) and the trend (Trend) [18]) remain independent according to the obtained rules.
The second category is more interesting regarding analysis of the relationships between the features. It makes it possible to see the most pronounced relationships between the values of some characteristics, which not only reflect the state of the CVSS vector but also contain information from other sources.

3. Results of construction of a model for vulnerability impact evaluation taking into consideration dynamic characteristics
Neuro-fuzzy systems. The experiment was conducted using ready-made implementations in Python as well as in the MATLAB environment, namely the Fuzzy Logic Toolbox (anfis) and the interactive Neuro-Fuzzy Designer tool. Using the full set of CVE_i(t) variables and membership functions, both implementations were unable to handle the large amount of computation, resulting in memory shortages and subsequent errors during the network learning phase. The experiment was conducted on average hardware with an 8th-generation Intel Core i7 processor and 16 GB of RAM. Reducing the characteristics of the vulnerability vector based on high correlation, as well as reducing the number of membership function parameters, did not improve the results. Table 3 gives the results of the experiment on constructing a model based on ANFIS with different sets of input characteristics.
According to the results obtained in Table 3, it can be concluded that this technology for a given dataset is not effective and cannot be applied to achieve the intended purpose.
Decision tree. When constructing a tree without limiting the parameters that control its size (the maximum depth and the minimum number of samples required to split an internal node), the tree depth is 21 with a correspondingly large number of leaves. Based on an analysis of the resulting tree, we can conclude that its structure is overly complicated. The results of comparing the accuracy of the Evaluation() function and the regularization constant (α) used to optimize the structure of the decision tree are presented in Fig. 3.

Fig. 3. Comparison of prediction accuracy and the regularization factor alpha (α) for the training and test datasets
The greatest prediction accuracy after optimization is R 2 =0.86 at α=1.6. At the same time, the number of leaves is 24, and the tree depth is 8.
Random forest. Since the existing sample is neither large nor complete enough, training uses bootstrap sampling of objects. The forest size is 40 trees. The measured accuracy of the random forest exceeds the accuracy of a single tree. The results of random forest testing are given in Table 4.
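The bootstrapped 40-tree forest can be sketched as follows. The data are synthetic stand-ins for the vulnerability sample, so the exact scores are illustrative; only the forest-versus-tree comparison mirrors the paper's observation.

```python
# Sketch: a 40-tree random forest with bootstrap sampling, compared against a
# single unconstrained decision tree. Synthetic regression data is an assumption.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

forest = RandomForestRegressor(n_estimators=40, bootstrap=True,
                               random_state=1).fit(X_tr, y_tr)
single = DecisionTreeRegressor(random_state=1).fit(X_tr, y_tr)

# Averaging over bootstrapped trees reduces variance on the test split.
print(forest.score(X_te, y_te), single.score(X_te, y_te))
```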
Linear regression. The model constructed using the full set of characteristics CVE_i(t) on the dataset (D) yields a determination factor (R²) of 0.895. Taking into consideration the data obtained from correlation analysis (Fig. 2, Table 2), parametric and structural identification of the model is performed. After analyzing the obtained model, discarding the insignificant vulnerability coefficients and parameters (R(t), CWE_group, AV, C, S, J), and performing parametric identification of the model on the original dataset, the determination factor is R²=0.897. The decrease in the number of parameters does not lead to significant improvements in the model.
Polynomial regression. The multifactorial model for vulnerability impact evaluation uses a second-degree polynomial, since the use of higher degrees (3 or more) leads to a large number of outliers in the formed sample.
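The comparison between the linear and degree-2 polynomial structures can be sketched as below. The synthetic data and scores are illustrative assumptions, not the paper's results; the construction simply shows how the two model structures are fitted and compared on a held-out split.

```python
# Sketch: linear regression vs. a degree-2 polynomial model on the same split.
# Data are synthetic; only the methodology mirrors the paper's comparison.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_regression(n_samples=300, n_features=10, noise=8.0, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

linear = LinearRegression().fit(X_tr, y_tr)
# Degree-2 polynomial features feeding an ordinary least-squares fit.
poly = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                     LinearRegression()).fit(X_tr, y_tr)

print(linear.score(X_te, y_te), poly.score(X_te, y_te))  # R^2 on test data
```

The many cross-terms of the polynomial expansion increase variance on a small sample, which is consistent with the overfitting the paper later reports for this structure.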
Multilayer perceptron. Although adaptive moment estimation (Adam) is recommended for large datasets (1,000 samples or more), this optimizer proved to be the best on the given sample. ReLU turned out to be the best activation function for the hidden layers. A multilayer perceptron with three hidden layers of 30, 30, and 20 neurons, respectively, was experimentally chosen as the architecture.
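The selected architecture can be reproduced with scikit-learn's MLPRegressor. This is a minimal sketch: the synthetic data, the standardization of the target, and max_iter are assumptions; only the layer sizes (30, 30, 20), ReLU activation, and Adam optimizer come from the text.

```python
# Sketch of the chosen architecture: MLP with hidden layers (30, 30, 20),
# ReLU activations, and the Adam optimizer. Data are synthetic stand-ins.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=3)
y = (y - y.mean()) / y.std()  # standardize the target to ease convergence
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(30, 30, 20), activation="relu",
                 solver="adam", max_iter=2000, random_state=3),
).fit(X_tr, y_tr)

print(mlp.score(X_te, y_te))  # R^2 on the held-out split
```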
The obtained accuracy metrics for predicting with various structures of the multifactorial model of vulnerability impact evaluation are given in Table 4. The corresponding accuracy metrics on the training sample are shown in Table 5.
Based on the data given in Tables 4 and 5, it can be seen that some model structures lead to overfitting. These include the models constructed using the polynomial regression, random forest, and decision tree algorithms (the last with the lowest degree of overfitting).
A visualization of the results of testing the model structures and the spread of values on the test dataset is shown in Fig. 4. Blue dots indicate the initial test scores of vulnerabilities; red dots are the results of the expert evaluation.

4. Verification of the operation accuracy of the model of dynamic evaluation of vulnerability impact on actual data
The method was tested using the Python implementation (scikit-learn library) and the MATLAB toolbox (Fuzzy Logic Toolbox for experimenting with neuro-fuzzy logic).
The purpose of the experiment is to use the constructed model of vulnerability impact to visually track the change in vulnerability evaluation scores based on the changing values of the assigned characteristics over a given period. The vulnerabilities were randomly selected from the NVD database [10] for the experiment. As one can see in Fig. 6, both vulnerabilities initially have fairly low scores due to the lack of full information right after publication. The resulting value shows a significant increase as additional data on the vulnerability arrive over time. Further fluctuations in the score value are driven by the Trend variable.

Discussion of the results of the development of a model for dynamic evaluation of vulnerability impact
When analyzing the parameters and studying the correlations in Fig. 3, the following conclusions can be drawn. The existence of exploitation programs (Exploit) has a very weak correlation with the CVSS evaluation. At the same time, it is necessary to note the relationship between Exploit and the vulnerability cause at the root code level (Rootcause). No relationship between the vulnerability trend in search engines (Trend) and the existence of patches (Patch) was detected.
Consequently, vulnerabilities of these types will have a higher priority when developing patch recommendations.
Our analysis and the constructed set of associative rules in Table 2 make it possible to perform evaluation and generate data samples for the test and training sets, as well as to further search for a multifactor evaluation model. From the analysis of the input characteristics of CVE_i (6), it can be concluded that all characteristics are significant, the statistical data are poorly structured, and the dependences between the characteristics are nonlinear. Based on the results in Fig. 3, it can be concluded that the existence of a trend (Trend) for a vulnerability is an independent quantity.
Based on the data in Table 4, it can be seen that, compared to linear regression, the root mean square deviation of the polynomial regression increased and the R² indicator decreased. Thus, the polynomial regression algorithm did not improve the accuracy of vulnerability impact evaluation.
As one can see in Fig. 6, the model based on a multilayer perceptron is the most flexible and sensitive to changes in the input data. This model also has some of the best prediction indicators on the test sample after the linear regression model, and it is the least overfitted. The accuracy of the obtained model is 0.889 (88.9 %).
Despite the highest prediction accuracy (89.6 %), the linear model is the least sensitive to changes in the input data.
The remaining models (decision tree and random forest), despite their high sensitivity, are overfitted and show poor results.
Therefore, it is recommended to use the vulnerability impact evaluation model based on a multilayer perceptron. The main difference of the resulting model is its focus on dynamic vulnerability risk evaluation, taking into consideration overall vulnerability popularity, namely trends. When training the model, the characteristics of vulnerabilities and their corresponding expert evaluation scores were used, which in turn also take into consideration the time intervals from the publication date and the date of vulnerability modification.
The proposed model for vulnerability impact evaluation differs from existing ones in that, under conditions of limited resources and data volumes, it provides a unified and formalized vulnerability evaluation that takes into consideration current trends in the field of information security. The resulting evaluation can significantly reduce the time of basic vulnerability analysis and of further decisions regarding its detailed study, the possibility of exploitation, or the order of applying patches.
A limitation of the resulting mathematical model is that it does not take into consideration the fading of impact value from the publication date: the constructed model implies that relevance remains the same over time, whereas in practice, the longer this interval, the less threat a vulnerability poses. Vulnerabilities older than 5-10 years are rarely considered during analysis, since it is assumed that the vulnerable software has received an update or that its source code was modified as the software functionality changed. This limitation can be eliminated by the following steps: updating the source data of expert evaluation and re-training the model (adding to the training sample vulnerabilities published earlier, for example, 5-7 years ago); applying an additional coefficient reflecting how far the current point in time is from the publication date and the date of the last modification of vulnerability parameters (taking into consideration the value of trends).
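The second proposed correction, an age-dependent coefficient, could take the form of an exponential decay. The sketch below is an assumption for illustration: the function name, the half-life of 5 years, and the exponential shape are not taken from the paper.

```python
# Illustrative sketch of an age-decay coefficient for a vulnerability score.
# The exponential form and the 5-year half-life are assumptions, not the
# paper's formula.
import math

def aged_score(base_score: float, years_since_publication: float,
               half_life_years: float = 5.0) -> float:
    """Discount the model's impact score as the vulnerability ages."""
    decay = math.exp(-math.log(2) * years_since_publication / half_life_years)
    return base_score * decay

print(aged_score(8.0, 0.0))   # fresh vulnerability: full score
print(aged_score(8.0, 5.0))   # one half-life old: score halved
```

A trend spike in the input data would still raise the pre-decay score, so popularity and ageing act as opposing factors, which matches the paper's intent.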
Subsequently, it is planned to enhance quality by the following means: supplementing the existing set of characteristics with a cross-binary control-flow graph of the target system and the result of its analysis; expanding popularity metrics through analysis of social networks and information security forums.

Conclusions
1. The specificity and operating principles of existing automated systems and methodologies for calculating information security risks show that the process of prioritizing the elimination of vulnerabilities during software development or use relies on an insufficient set of required data sources. Most systems are limited to the single NVD vulnerability database, and their vulnerability evaluation metrics are closed and proprietary. The characteristics that are not taken into consideration in the evaluation were identified: separate time intervals, publication dates, the modification process, as well as the degree of vulnerability significance in the information space.
2. The main characteristics that affect the trends in changing the degree of patch prioritization were identified, namely: the existence of an exploitation program, IS trends, and certain types of vulnerabilities. Correlation analysis of all CVSS parameters, trends, and vulnerability types was carried out, and the main relationships for dynamic calculation of vulnerability impact on software were found. A significant parameter is consideration of the latest tendencies (trends) in the publication and discussion of vulnerabilities, since their existence significantly increases the risk degree and the patch importance. Vulnerabilities of the injection type with a remote network access vector should be given the highest priority, since vulnerabilities of this type are most often exploited.
3. A model of dynamic vulnerability evaluation based on open sources (the NVD database, the CVSS vector, vulnerability trends, the existence of patches and exploits) was developed in the form of a multilayer perceptron with a prediction accuracy of 88.9 %. A distinctive feature of the obtained model is that it takes into consideration changes in parameters over time intervals, as well as the degree of vulnerability popularity in a specific period.
4. The model of vulnerability impact evaluation was tested on actual data, namely a vulnerability set from the NVD database. The developed model makes it possible to reduce the time of vulnerability analysis from several hours to several minutes and to fully automate the process of prioritizing patches and decision-making, thus standardizing the work of experts.