IMPROVEMENT OF THE METHOD FOR ASSESSING THE LEVEL OF SPEECH INFORMATION SECURITY

Assessment of the level of speech information protection from leakage through acoustic and vibration channels is carried out according to international and national standards and in compliance with regulatory documents. To assess the security level, regulatory documents in many countries prescribe the use of the signal-to-noise ratio. However, this method has a number of significant shortcomings that make it impossible to determine the actual level of security. An improved objective evaluation method is proposed, based on determining the coefficient of residual intelligibility for a test signal after its recovery by methods of mathematical analysis (adaptive filtering, correlation and spectral analyses, wavelet transformation, etc.). The coefficient of residual intelligibility is determined for each word included in a short phrase used as the test signal. The frequency of use of phonemes in the Ukrainian language was analyzed. It was shown that, given the definition of the term "allophone" and the number of native speakers, the total number of allophones can be assumed to tend to infinity. To reduce the computational complexity, a formalized approach is proposed, based on a simplified linguistic model: a phoneme (a letter), a diphone (two letters), and a triphone (three letters). Text documents can be used as a source of information. Analytical dependences are proposed for calculating the coefficient of residual speech intelligibility and its components: the coefficient of the frequency of use of allophones in the words of the Ukrainian language and the coefficient of the importance of allophone recognition for word recognition. The interrelations between the SPC (speech privacy class) and word intelligibility W are shown. On this basis, a scale is proposed for the objective estimation of the degree of speech information privacy on the boundary of the controlled zone by the criterion of residual speech intelligibility.


Introduction
Determining the level of speech information security is one of the main problems for the systems of technical information protection and the security services of an enterprise/organization (hereinafter referred to as the object). Confidence in its security makes free communication possible when discussing issues that are critical to the economic, technological and innovative policies of the object or that contain state and commercial secrets.
Despite all the variety of protection objects (their purpose, functioning peculiarities, number of staff), protection of speech information is provided by passive and active systems. However, to obtain confidence in the quality of protection, it is necessary to conduct specialized research, which would establish if the actual protection level meets the set requirements.
A significant number of methods currently exist for assessing the level of speech signal security: the Speech Intelligibility Index, the Speech Transmission Index, the Articulation Index, the Speech Privacy Class, the signal-to-noise ratio, and others. However, as the analysis shows, these methods have one common drawback: they do not take into consideration the capabilities of an intruder to process an intercepted signal in order to clean it of extraneous noise.
Modern methods for digital processing of phonograms (spectral-correlation analysis, wavelet transformation, adaptive filtering and others) make it possible to restore the linguistic component of the speech message (test signal) even at significant levels of noise interference (SNR≤−10 dBA). As a result, the assessment of speech information security obtained by these methods is incorrect, and the possibility of interception and recognition of confidential speech information by an intruder is underestimated.
Thus, there is an urgent need to develop an improved method for assessing speech information privacy at the sites of information activities. The method should be independent of the structure and the principles of operation of the systems of technical protection of speech information with limited access and should take into consideration modern methods of digital processing of phonograms.

Literature review and problem statement
All current systems for assessing the level of speech information security can be divided into the following groups: control of the energy parameters of a speech signal; control of the frequency spectrum; control of speech signal intelligibility. Control of the energy parameters of a speech signal on the boundary of the controlled zone in Ukraine is performed in accordance with [1,2]. The estimation criterion is the signal-to-noise ratio (SNR), which is measured in the frequency range from 100 Hz to 10 kHz in ⅓-octave bands. Calculation and comparison of the derived values are performed according to [3].
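The band-wise SNR measurement described above can be sketched as follows. This is a minimal illustration, not the procedure of [1–3]: the list of nominal ⅓-octave centre frequencies is the standard nominal series, while the signal and noise levels are hypothetical values chosen only to show the arithmetic.

```python
# Nominal 1/3-octave band centre frequencies from 100 Hz to 10 kHz (21 bands).
THIRD_OCTAVE_CENTRES_HZ = [
    100, 125, 160, 200, 250, 315, 400, 500, 630, 800, 1000,
    1250, 1600, 2000, 2500, 3150, 4000, 5000, 6300, 8000, 10000,
]


def band_snr_db(signal_levels_db, noise_levels_db):
    """Return the per-band SNR in dB as signal level minus noise level."""
    if len(signal_levels_db) != len(noise_levels_db):
        raise ValueError("signal and noise must have the same number of bands")
    return [s - n for s, n in zip(signal_levels_db, noise_levels_db)]


# Hypothetical levels in dB, one value per band (not data from the paper).
signal = [60.0 - 0.5 * i for i in range(len(THIRD_OCTAVE_CENTRES_HZ))]
noise = [65.0] * len(THIRD_OCTAVE_CENTRES_HZ)

snr = band_snr_db(signal, noise)
for freq, value in zip(THIRD_OCTAVE_CENTRES_HZ, snr):
    print(f"{freq:>6} Hz: {value:+.1f} dB")
```

The range from 100 Hz to 10 kHz gives 21 bands, which agrees with the band counts used later in the text for ⅓-octave calculations.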
A similar method for assessing speech information privacy for closed spaces is used in Canada. Measurements are carried out in accordance with ASTM E2638 [4]. The evaluation criteria are the magnitudes of Speech Privacy Class (SPC) and uniform-weighted signal-to-noise ratio (SNRuni32).
Speech signal intelligibility is assessed by the criteria W (word speech intelligibility, WSI) [12] and AI (Articulation Index) [6,12,15]. The analysis shows that not all of the above methods can be applied to assessing the level of threat of speech information interception by an intruder. Thus, the methods for assessing speech signal privacy that are based on studying the frequency spectrum and determining the STI, RASTI, STIPA, STITEL and %ALcons indices make it possible to establish the information integrity level. At the same time, this level will be almost the same for legal listeners and for an intruder (illegal listener). The difference will be caused by factors independent of the listener, such as the sound insulation of the premises, the parameters of the intruder's equipment and others.
Other methods, despite a significant difference in the principles of assessing the level of speech signal privacy and research technologies, have many similarities. The most essential of them is the assessment of the impact of noise interference on a speech signal on the boundary of the controlled area.
However, all of the above methods share one general drawback: they do not take into account the capability of an intruder to conduct specialized processing of an intercepted signal in order to clear it of outside noise. Such clearing methods include, first of all, wavelet transformation, phonetic-correlation analysis, and adaptive and neural network filtering. As shown in paper [14], this leads to a significant increase in the level of speech signal recognition and, accordingly, to a critical increase in the values of such indices as SII, W (WSI), AI, SPC, SNR and SNRuni32.
The use of energy parameters has a series of important technological advantages: ease of research, availability of laboratory equipment and evidence of the obtained results. At the same time, this method has a significant drawback: assessing the level of speech information privacy by the SNR means finding the ratio of the root mean square levels of the signal and noise intensity. In relation to an interference signal, which is usually white noise (or its "clones", such as "pink", "blue" and similar noises), this approach is justified. In relation to a speech signal, this approach is inadmissible, even when the frequency spectrum is divided into ⅓-octave bands.
When replacing a speech signal having a random form with its root mean square levels, even when using the division of frequency spectrum by ⅓-octave bands, there is a significant divergence between peak values and root mean square level [13,14].
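The divergence between peak and root mean square levels can be illustrated with a crest factor calculation. The sketch below is only an illustration: the sampling rate, the 200 Hz tone and the 20 % duty cycle are assumptions, and the gated tone is a crude stand-in for an intermittent, speech-like signal, not a speech model.

```python
import math


def rms(samples):
    """Root mean square value of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def crest_factor_db(samples):
    """Peak-to-RMS ratio in dB."""
    peak = max(abs(x) for x in samples)
    return 20.0 * math.log10(peak / rms(samples))


FS = 8000  # sampling rate in Hz (assumed)
N = FS     # one second of signal

# Steady tone with a constant envelope.
tone = [math.sin(2 * math.pi * 200 * k / FS) for k in range(N)]

# The same tone, active only during 20 % of the time (assumed duty cycle).
burst = [x if (k % 2000) < 400 else 0.0 for k, x in enumerate(tone)]

print(f"steady tone: {crest_factor_db(tone):.2f} dB")
print(f"gated tone:  {crest_factor_db(burst):.2f} dB")
```

The steady tone has a crest factor of about 3 dB, while the gated tone reaches about 10 dB with the same peak value, so an RMS level alone understates how far the peaks of an intermittent signal rise above the noise.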
In [4], there is an attempt to fix this drawback by introducing the SPC and SNRuni32 coefficients.
SPC is determined from expression (1), where $LD_{(avg)}$ is the average value of sound intensity, measured at some points of the closed premises (distant premises, hereinafter referred to as DP), calculated with respect to the average value for the control points (CP) of outer space (beyond the controlled zone); the second term of the expression is the level of natural interference in the CP.
According to [4], the CPs are chosen to give the maximum benefit to an intruder, that is, in places with the worst sound insulation, without taking into consideration whether an intruder could actually install speech interception devices there. According to [4], the distance from the outside of the enclosing structure of a DP (for example, walls, windows, doors, etc.) to the CP is 0.25 m. This makes it possible to increase the level of the speech signal in the CP by 11-12 dB in comparison with measurements at a distance of 1 m.
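The order of magnitude of this gain can be checked with the inverse-distance law. This is a rough sketch under a strong assumption: the enclosing structure is treated as a point source radiating into free space, which is not stated in [4] and holds only approximately near a real wall or window.

```python
import math


def level_gain_db(d_far_m, d_near_m):
    """Level increase in dB when moving from d_far_m to d_near_m (1/r law)."""
    return 20.0 * math.log10(d_far_m / d_near_m)


gain = level_gain_db(1.0, 0.25)
print(f"{gain:.2f} dB")
```

Under this assumption the gain is about 12 dB, which is consistent with the upper end of the 11-12 dB range quoted above.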
The SNRuni32 coefficient, according to [4], is determined as the difference of levels of a speech signal in the CP, expression (2). By neglecting the values for which the difference is less than 32 dB, $L_{TS}(f)$ can be found from expression (3), where $L_{SP}(f)$ is the sound intensity measured at some points in the closed premises (controlled zone).
Then (2) takes the form (4). On transition to averages over ⅓-octave bands, expression (5) is obtained. Taking (1) into consideration gives (6) and, hence, (7). Expression (7) shows the relationship between the SPC and SNRuni32 coefficients.
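One way to write out the chain of steps described above is sketched below. It assumes that (1) has the additive form used in ASTM E2638, $SPC = LD_{(avg)} + L_{b(avg)}$. The symbol names $L_b(f)$ (natural interference level in the CP) and $LD(f)$ (level difference between the closed premises and the CP) are introduced here for illustration only, and the equation numbers follow the order in which the steps are mentioned in the text.

```latex
\begin{align}
SNR(f) &= L_{TS}(f) - L_b(f), \tag{2}\\
L_{TS}(f) &= L_{SP}(f) - LD(f), \tag{3}\\
SNR(f) &= L_{SP}(f) - LD(f) - L_b(f), \tag{4}\\
SNR_{uni32} &= L_{SP(avg)} - LD_{(avg)} - L_{b(avg)}, \tag{5}\\
SNR_{uni32} &= L_{SP(avg)} - SPC, \tag{6}\\
SPC &= L_{SP(avg)} - SNR_{uni32}. \tag{7}
\end{align}
```

In this reading, for a fixed average speech level in the premises, a lower SNRuni32 in the CP corresponds directly to a higher SPC.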
Papers [20,21] present the values of intelligibility threshold in the rooms for SNRuni32, depending on the necessary level of speech information privacy. Three levels were considered: "intelligibility in free space", "intelligibility in the premises" and "intelligibility of the fact of talk", which correspond to values -11 dB, -16 dB and -22 dB [21]. The features of designing security systems based on the coefficient were shown and the scale of conformity of coefficient values to the level of the object security (its category) and frequency of intelligibility of words (short phrases) within the normative period of time was introduced. The shortcomings of this paper include the limited coefficient SPC (linguistic peculiarities of speech information and the influence of noise interference were not taken into account), and limitation of the proposed scale only to coefficient SPC.
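The three threshold values quoted from [21] can be arranged as a simple lookup. The sketch below assumes that a level counts as reached when SNRuni32 is greater than or equal to its threshold; this comparison rule and the function name `reached_levels` are assumptions made for illustration, not part of [20,21].

```python
# Threshold values of SNRuni32 in dB, as quoted from [21] in the text.
THRESHOLDS_DB = [
    (-11.0, "intelligibility in free space"),
    (-16.0, "intelligibility in the premises"),
    (-22.0, "intelligibility of the fact of talk"),
]


def reached_levels(snr_uni32_db):
    """Return the labels whose threshold is met or exceeded (assumed rule)."""
    return [label for limit, label in THRESHOLDS_DB if snr_uni32_db >= limit]


for value in (-10.0, -18.0, -25.0):
    print(value, reached_levels(value))
```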
Thus, if we accept for $L_{SP(avg)}$ the typical value of 64 dB (according to [12,22], "quiet speech"), SPC takes a form that corresponds to the level of "speech security".
Paper [15] deals with the analysis of the features of using the AI and SII coefficients, and article [6] contains recommendations on their application. It was shown that the SII coefficient is an improved AI and that it is intended to determine the level of intelligibility of speech information. The scope of application includes bioacoustics, occupational safety and protection of places where people are present (offices, dwellings, public places and others), as well as speech information privacy (estimation of the level of speech information intelligibility on the boundary of the controlled area). The calculation method makes it possible to explore and calculate the SII coefficient under the influence of acoustic noise of different types and intensity on a speech signal. In addition, [15] and [6] impose restrictions on the signal/noise ratio, at which the component values are taken to be zero (for ⅓-octave bands or for specialized areas). This leads to the possibility of incorrect evaluation of information availability in the band (area). On the other hand, when a signal is filtered at significant levels of noise interference, the spectrum is significantly distorted, so that the introduced distortions can be mistaken for the information signal.
Paper [16] explores the use of the AI coefficient for different structures and intensities of noise interference and considers their influence on the possibility of signal recovery: the spectral composition and the signal levels in the bands. However, the paper does not take into consideration the linguistic component of the information signal.
The specific drawback was considered in [17]. The conducted studies revealed the possibility of substitution of vowels with consonants. This occurs due to the appearance of spectrum distortion in the process of the interaction of a speech signal with noise interference. The paper considered the characteristics of noise interference and estimated the magnitude of their influence on a speech signal. The studies were conducted with the involvement of presenters who are native speakers of English. The impact of noise on the adequacy of the speech signal perception for other languages was not explored in the paper.
The studies of the influence of interferences of combined noise on speech transmission were performed in [18]. The studies were performed based on the model of open public space. Sound fields for dominant noises were implied using the typical model of a city square, surrounded by houses. The traffic noise and two types of construction noises, corresponding to stationary and pulsed noises, were selected as background noises. Listening tests were conducted on a group of adults, and the speech transmission quality was evaluated according to the criteria of the difficulty of presenter's speech recognition, as well as intelligibility indicators. However, instrumental parameters of speech signals and noise were not controlled in this case, only subjective assessment of the possibility of presenter's speech recognition was carried out.
In article [19], the evaluation of the speech intelligibility level was conducted by the indicator of short-term objective intelligibility (STOI). In the studies, the speech that was mixed with white noise at low values of the SNR coefficient was supplemented by the ideal binary mask (IBM). The STOI was used for prediction of intelligibility of both loud and frequency-weighted speeches. Native speakers of English (British) participated in the studies. The possibility of restoring the signals at a significant influence of noise interference, sufficient for its recognition by instrumental methods, was shown. Other types of noise interference and their influence on objective and subjective characteristics of a speech signal were not considered in the paper.
In studies [24–26], it is stated that at least two formants must be recognized for confident recognition of a phoneme. This is due to the complex spectral composition of phonemes, caused by the peculiarities of their formation in the vocal tract of a person. Paper [24] contains the results of the formation of phoneme [a] by allophone transformations from short-term fragments under conditions of low sound intensity (whisper). The studies showed that only 70-80 % of short-term fragments are able to form the corresponding phoneme with an assigned reliability of identification. This is caused by the existence in short-term fragments of a significant number of additional non-stationary sections that do not bear any information about the corresponding phoneme.
Article [25] presents research into the phenomenon of formant splitting, when a group of sub-formants emerges in place of one simple (scalar) formant. This phenomenon is characteristic of sibilants and hushing consonants. Sounds [ ] have an even more complex spectral composition. These sounds have not two, but three characteristic frequencies: the first basic nasal formant Fp1(n), the first basic oral formant Fp1(m) and the second constant formant Fp2. This leads to the appearance of additional spectral components. For example, according to [25], for sound [н], apart from the specified Fp1(n), Fp1(m) and Fp2, the additional spectral composition is determined from the dependence Fp2+n·Fp1(n), where n={1, 2, 3, 4}. Sound [л] in the Ukrainian language is characterized by additional combinatory formants that occur at frequencies close to Fp2−Fp1 and Fp2+n·Fp1, where n={1, 2, 3, 4, 5}.
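The dependences for the additional spectral components are simple to evaluate numerically. In the sketch below the formant frequencies are hypothetical values chosen only to show the arithmetic; they are not measurements from [25].

```python
def additional_components_hz(fp2_hz, fp1_hz, n_values):
    """Return the frequencies Fp2 + n*Fp1 for each n in n_values."""
    return [fp2_hz + n * fp1_hz for n in n_values]


# Hypothetical formant values in Hz (illustration only).
FP1_HZ = 250.0
FP2_HZ = 1200.0

# Sound [н]: Fp2 + n*Fp1(n), n = 1..4.
components_n = additional_components_hz(FP2_HZ, FP1_HZ, range(1, 5))

# Sound [л]: the difference Fp2 - Fp1 plus Fp2 + n*Fp1, n = 1..5.
components_l = [FP2_HZ - FP1_HZ] + additional_components_hz(
    FP2_HZ, FP1_HZ, range(1, 6)
)

print(components_n)
print(components_l)
```

With these values the components for [н] fall between 1,450 Hz and 2,200 Hz, that is, they spread the information of one sound over several neighbouring frequency bands.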
In addition, it is necessary to take into account the fall of signal intensity by 3 dB per octave that is characteristic of most phonemes, which increases the impact of noise interference [12,22].
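The effect of such a spectral fall on the band-wise SNR can be sketched as follows. The reference level of 60 dB at 500 Hz and the flat 55 dB noise are hypothetical values; only the slope of 3 dB per octave comes from the text.

```python
import math


def level_with_tilt_db(level_ref_db, f_ref_hz, f_hz, slope_db_per_octave=-3.0):
    """Level at f_hz for a spectrum that changes by a fixed slope per octave."""
    octaves = math.log2(f_hz / f_ref_hz)
    return level_ref_db + slope_db_per_octave * octaves


NOISE_DB = 55.0  # flat noise level (assumed)

for freq in (500.0, 1000.0, 2000.0, 4000.0):
    speech_db = level_with_tilt_db(60.0, 500.0, freq)
    print(f"{freq:6.0f} Hz: speech {speech_db:.1f} dB, SNR {speech_db - NOISE_DB:+.1f} dB")
```

In this example the SNR changes sign between 1 kHz and 4 kHz although the noise level is constant, which illustrates why the upper bands are affected first.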
Thus, the analysis shows the lack of a scientifically substantiated method for assessing the level of speech information privacy that can take into account the use of modern methods for speech information recovery by an intruder. The method should take into consideration objective (instrumental) and subjective (linguistic) characteristics of a speech signal and assess the privacy level depending on the category of information, the requirements for the selected premises and the temporal parameters of observation.

The aim and objectives of the study
The aim of this study is to improve the method for assessing the level of speech information privacy, under conditions of the possibility of using modern methods and technologies of noise interference filtration by an intruder.
To achieve the aim, the following tasks were set:
- to analyze and compare the methods for assessing the level of speech information privacy used in Ukraine, the EU, the USA and Canada;
- to substantiate the need for improving the procedure for determining the level of speech information privacy based on the coefficient (index) of speech intelligibility by taking into consideration the features of allophones and introducing the coefficient of residual speech intelligibility;
- to study the possibility of using diphones and triphones as a simplified linguistic model of allophones and to analyze the frequency of their use in the Ukrainian language;
- to analyze the impact of the quality of recognition of the basic tone frequency and formants on the reliability of assessment of the level of speech information privacy;
- to propose a scale of objective evaluation of the degree of speech information privacy on the boundary of the controlled area by the criterion of residual speech intelligibility.

Analysis of methods for assessing the level of speech signals security
A comparison of the SPC coefficient (according to [4,21]), the SNR coefficient (according to [12,22]) in the definition accepted in the post-Soviet territory, and the coefficient of word intelligibility of speech W [22] is shown in Table 1.
Thus, the coefficient of word intelligibility of speech W makes it possible to link the methods for evaluation of speech information privacy, based on the principles of measuring the sound intensity, such as [4,22].
It should be noted that the SPC sets up the requirements for DP at the design phase of safety systems -sound insulation systems. The value of the coefficient essentially depends on the maximal level of probable average value of speech sound intensity in the DP. In case of inspection of premises and determining the actual level of information privacy, the SPC is determined based on the SNRuni32 coefficient, according to (7).
More information about the effectiveness of security systems is given by the AI and SII coefficients. It should be noted that the SII, according to [6], is a formal substitution for the AI articulation coefficient, which was made in 1997. However, [4] was put into operation in Canada in 2010, where the use of the AI coefficient was resumed.
At significant levels of noise interference (SNR=−10…−22 dB) in spectral ⅓-octave bands, a signal of interference starts to increase the levels of formants of the second and third groups for most phonemes. The level of formants of the fourth and the following groups is usually hardly comprehensible at SNR≈0 dB. Thus, for example, during forensic phonoscopic examinations [23,24], a mandatory condition for determining the authenticity of a presenter is to ensure the minimal influence of noise interference on the presenter's speech. This is only possible at SNR≥0 dB (3…5 dB).
The main advantage of SII in comparison with SPC and SNR is that it takes into consideration the informativeness of the spectral ⅓-octave bands. That is, what is essential in the calculations is not the signal/noise ratio, but rather the presence of the formants of the phoneme in a specific frequency band. According to [6], SII is determined as:

$$SII = \sum_{i=1}^{n} I_i A_i, \quad (8)$$

where n is the number of bands over which the speech intelligibility coefficient is calculated (usually n=6 or 7 when using octave bands, or n=18 or 21 when using ⅓-octave bands); $I_i$ is the coefficient of information weight in band i; $A_i$ is the coefficient of information recognition quality in band i. The coefficient of information weight in the band, I, is determined for each language and depends on the likelihood of appearance of sound formants in a particular octave or ⅓-octave band and on the existence of additional spectral components [24,25]. The coefficient of information recognition quality in the band, A, reflects the likelihood of correct recognition of a formant or its additional spectral components in the studied octave or ⅓-octave band and, according to [6], is determined by expression (9), where $\Delta L$ is the difference between the peak level of a speech signal and the effective noise level. That is, $\Delta L$ differs from the SNR by a summand of 15 dB, which, according to [6], ensures the transition from the peak signal level in the band to its root mean square (effective) level and further to the SNR coefficient that is typical in Ukraine.
Coefficients I and A can also be determined in another way, since they are standardized for many languages. Thus, in [12,22], these coefficients are specified for the Russian language: the weight coefficient of the band $k_i$ and the formant speech parameter in the band $\Delta A_i$.
Analysis of the coefficient of information recognition quality shows that, according to (9), SII is determined in a dynamic range of 30 dB, shifted to the right by 15 dB. This indicates that it is appropriate to use the SII coefficient to assess the level of speech information privacy at SNR≥−15 dB, that is, in conformity with the "Very high speech security" level in Table 1.
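The behaviour of the band coefficient A over its 30 dB dynamic range can be sketched numerically. The sketch assumes that (9) has the clipped linear form A = (SNR + 15)/30, limited to the range from 0 to 1, which matches the 30 dB range and the 15 dB shift described above, and that SII is the sum of the products I·A over the bands. The band weights and SNR values are hypothetical.

```python
def band_audibility(snr_db):
    """A = (SNR + 15) / 30, limited to 0..1 (assumed form of expression (9))."""
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))


def sii(weights, snr_db_per_band):
    """Sum of I_i * A_i over all bands."""
    if len(weights) != len(snr_db_per_band):
        raise ValueError("one SNR value is required per band weight")
    return sum(w * band_audibility(s) for w, s in zip(weights, snr_db_per_band))


# Hypothetical band weights (they sum to 1) and SNR values for 6 octave bands.
weights = [0.05, 0.10, 0.15, 0.25, 0.25, 0.20]
snrs = [-20.0, -15.0, 0.0, 15.0, 20.0, -30.0]

print(f"SII = {sii(weights, snrs):.3f}")
```

Every band with SNR at or below −15 dB contributes nothing to the sum, whatever its weight, which is why the coefficient stops responding to changes in the signal once the noise exceeds that limit.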
However, the coefficient has a considerable drawback. According to (9), it is inappropriate to use SII in automated systems for assessing the level of speech information privacy at a considerable noise level (SNR<−15 dB). In papers [12,22], it was shown that at W=20 % for octave bands 5 and 6 (with geometric mean frequencies of 2,000 Hz and 4,000 Hz), the SNR coefficient is 18.2 dB and 24.7 dB. Measurements were performed using white noise as interference. The weight coefficient $k_i$ of the specified bands was found to be 0.30 and 0.26, respectively. According to [12,22], these values are the highest, that is, more than half of all formants are concentrated in these bands.

Improvement of the method for determining the level of speech information privacy by introducing the coefficient of residual speech intelligibility
Determining the level of speech information privacy in systems that use means of setting active interference should include a preparatory stage. The main requirements for it are:
- the phonograms of specialized articulation tables, formed on the basis of short phrases in common use and according to the professional orientation of the studied protection object (including terminology, slang, proper names, abbreviations, etc.), are used as test signals;
- the studied phonogram is obtained according to the typical technology for determining the level of speech information privacy based on the SNR coefficient (or any of the SPC, AI, SII, SNRuni32 coefficients);
- the studied phonograms are cleared by modern methods and technologies of noise interference filtration.
Assessment of the level of speech information privacy based on the cleared phonogram can be performed by the following methods:
1. Linguistic examination of oral speech.
2. Determining the SII coefficient based on the correlation analysis of the original test signal and the "cleared" signal.
3. Determining the coefficient of residual speech intelligibility.
The first method implies the involvement of a specially trained auditor (listener). Its advantage is that the level of speech information privacy is determined by a direct method, listening. The shortcomings of the method are subjectivity and considerable labor intensity.
The second method is based on the preparation of the specialized database for each articulation table. The database specifies the coefficients of information weight for the bands for each phoneme in each of the words that are included in the articulation table, based on which the phonogram of a test-signal was recorded. The base is formed relying on analysis of a test-signal (phonogram) at the preparatory stage. During the base generation, it is necessary to take into consideration the peculiarities of the speech, professional focus of the staff of an object, homonyms and others. Thus, for example, the word "compass", depending on the professional orientation, can change the stress -"кòмпас" or "компàс". It is possible to use the words-homonyms "на бèрезі" and "на берèзі", and so on. This causes a change in the frequency of basic formants, their splitting, appearance of additional formants [24,25].
The disadvantages of this method are:
- significant overstatement of the level of speech information privacy due to the use of the correlation method and databases;
- existence of residual noise interference at frequencies of the upper octaves;
- distortion of the spectrum of formants at frequencies of the upper octaves.
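The correlation step of the second method can be illustrated with a plain correlation coefficient between the original test signal and the cleared one. This is a minimal sketch, not the procedure used in the study: the signals are synthetic sinusoids, and the "residual" component stands in for whatever interference remains after filtering.

```python
import math


def correlation_coefficient(x, y):
    """Pearson correlation of two equally long sample sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


N = 200
# Synthetic "original" test signal and a residual left after clearing (assumed).
original = [math.sin(2 * math.pi * 5 * k / N) for k in range(N)]
residual = [0.5 * math.sin(2 * math.pi * 17 * k / N) for k in range(N)]
cleared = [s + r for s, r in zip(original, residual)]

r_value = correlation_coefficient(original, cleared)
print(round(r_value, 3))
```

A residual with half the amplitude of the signal lowers the correlation to about 0.89 in this example.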

Method 3.
It is proposed to introduce a new criterion for assessing the level of speech information privacy, the RII (residual intelligibility index) $S_R$, determined from expression (10), where m is the total number of allophones in a word; $R_a$ is the coefficient of the importance of allophone recognition for recognition of a word (phrase, sentence); H is the coefficient of the frequency of use of this allophone in speech; n is the number of ⅓-octave bands over which the coefficient is calculated (n=18 or 21); I is the coefficient of information weight in the band for allophone recognition; A is the coefficient of information recognition quality in the band.
Fig. 1 shows an example of research in the MATLAB environment (version R2015a) into the influence of noise interference (white noise) at SNR=−10 dBA on the word "ship" from the expression "coast guard ship". Fig. 1, a shows the phonograms of the original (Original), noisy (Signal & Noises) and residual (Residual Intelligibility) signals. Their spectral analysis is shown in Fig. 1, b. As Fig. 1, b shows, at SNR=−10 dBA, the frequency of the basic tone is unambiguously determined for the noisy signal and the residual signal. For the residual signal, cleared from interference, it is possible to recognize formants Fp1, Fp2 and Fp3 by using correlation analysis to split the words into allophones and correlation analysis of the spectrum of allophones by "falls".
Fig. 1. Studying the influence of noise interference (white noise) at SNR=−10 dBA on the word "ship" in the MATLAB environment: a – noise interference filtration; b – spectral analysis on the long-term spectrum
The linguistic component considers the peculiarities of formation and use of the language in the state (region), dialects, filler words, slang, etc.
That is, this component considers the factors that affect the formation of words from allophones: the importance of allophone recognition by a listener for understanding the essence of the word, and the frequency of its use in speech. The importance of these factors is as follows:
- the duration of sounding of the formed phoneme in an equal-tempo text is normally 50-60 % of the time of its sounding (depending on the type of phoneme: vowel/consonant, sonorant/noisy, etc.). That is, the process is discrete in time and in the readiness of a phoneme for analysis. Given that the length of a phoneme in a word is 0.1-0.25 s, the time for the formation of the tangent in the frequency-temporal distribution for the formed phoneme is insufficient under a significant influence of noise interference. This decreases the likelihood of reliable recognition;
- the process of forming a word from allophones is continuous by nature. This follows from the concept of an allophone as a realization of a phoneme, its variant, caused by the specific phonetic surroundings in the word and by the stress in the word and in the sentence. Thus, the process of allophone formation actually begins and ends within the structure of the previous or the following allophone (or a pause between the words), so that allophones mutually overlap. As a result, the period of time available to analyze the formation of the tangent to the formants significantly increases. In this case, the use of the accumulation method can significantly reduce the influence of noise interference and, accordingly, increase the probability of reliable recognition of allophones and of the word in general;
- the different frequency of use of allophones in words makes it possible to significantly increase the level of allophone recognition in a word.
Given the definition of the term "allophone" and the number of native speakers, we can assume that the total number of allophones tends to infinity.
Therefore, there is a need to formalize the approach based on a simplified linguistic model: a phoneme (a letter), a diphone (two letters) and a triphone (three letters). Text documents are used as an information source [27–32].
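The simplified model (letter, letter pair, letter triple) lends itself to direct counting over text. The sketch below is an illustration of that idea rather than the procedure of [27–32]: the function name `ngram_frequencies` is hypothetical, n-grams are counted inside words only, and the sample phrase is far too short for real statistics.

```python
from collections import Counter


def ngram_frequencies(text, n):
    """Relative frequency of letter n-grams counted inside words only."""
    counts = Counter()
    for word in text.lower().split():
        letters = "".join(ch for ch in word if ch.isalpha())
        for k in range(len(letters) - n + 1):
            counts[letters[k:k + n]] += 1
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


# Illustrative input only: "coast guard ship" in Ukrainian.
sample = "корабель берегової охорони"

phonemes = ngram_frequencies(sample, 1)   # letters as a model of phonemes
diphones = ngram_frequencies(sample, 2)
triphones = ngram_frequencies(sample, 3)

print(sorted(phonemes.items(), key=lambda kv: -kv[1])[:3])
print(sorted(diphones.items(), key=lambda kv: -kv[1])[:3])
```

Counting inside words is a design choice made for this sketch; whether n-grams should also span word boundaries depends on how the allophone base is meant to model continuous speech.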
An increase in recognition reliability is possible in the transition from analysis of frequency of using phonemes to frequency of using allophones. For the word "ship", the frequency of using phonemes [к], [о] and [р] can be found from Fig. 2 or [28,29,32] to be 4.01; 8.23 and 4.12, respectively.
Conducted studies of diphones show that for the Ukrainian language in the texts with the total number of letters of 276 thousand, phoneme [a] occurs 27,318 times. Table 2 shows the data on the frequency of using some diphones with background [a].
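The relative frequency that follows from these counts can be checked directly. The total of 276 thousand letters is a rounded figure from the text, so the result is approximate.

```python
TOTAL_LETTERS = 276_000  # "276 thousand" letters (rounded value from the text)
COUNT_A = 27_318         # occurrences of [a]

freq_a = COUNT_A / TOTAL_LETTERS
print(f"{100 * freq_a:.2f} %")
```

The result is close to 9.9 %, that is, about 0.099 in conditional units.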
As can be seen from the research results shown in Table 2, a large number of diphones, and accordingly the allophones associated with them, have a low probability of occurring in the studied phonogram.
Analysis of the frequency of use of triphones formed on the basis of vowel [o] shows that they can be divided into four groups: 1. Triphones that have the highest frequency of use, more than 2 %. These triphones include [
At the same time, it is necessary to take into consideration that, for phonemes that are close in the way they are formed, the formant frequencies can change due to the destructive influence of noise interference or of filtration means and/or phonogram processing.
At the same time, out of 1,444 triphones formed on the basis of vowel phoneme [o], only 117 are actually used, that is, have a non-zero frequency of use.
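The share of triphones that actually occur can be computed from these two numbers. The sketch also notes that 1,444 equals 38 squared, which is consistent with 38 possible symbols on each side of the central [o]; this reading of the figure is an assumption, not something stated in the text.

```python
NEIGHBOUR_SYMBOLS = 38  # assumed number of possible left and right neighbours
USED_TRIPHONES = 117    # triphones with non-zero frequency (from the text)

possible_triphones = NEIGHBOUR_SYMBOLS ** 2
share_used = USED_TRIPHONES / possible_triphones

print(possible_triphones)
print(f"{100 * share_used:.1f} %")
```

Only about 8 % of the formally possible triphones occur in practice, which keeps the size of a triphone-based allophone base manageable.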
Thus, the advantage of the RII coefficient over the SII coefficient lies in the transition from phonemes to allophones, which have an envelope form that is more resistant to outside influences and which take into consideration the frequency-temporal distribution of information in the formants. The distribution, in general, depends on the features of speech formation of both a particular presenter and the speech of the given region (state). This allows substantial recovery of speech information even at critical values of the signal/noise ratio (SNR≤−15 dB).
Analysis of (10) shows that the RII coefficient is the average of residual speech intelligibility for this test signal under specific noise conditions. This makes it possible to neutralize the influence of separate significant deviations of intelligibility by words when long-term changes are taken into account.
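A coefficient with this structure could be computed as sketched below. The sketch assumes that (10) averages, over the m allophones of a word, the product of $R_a$, H and the band sum of I·A; this form is inferred from the definitions of the symbols and from the statement that RII is an average, and it is not confirmed by the source. All numeric values are hypothetical, and two bands are used instead of 18 or 21 to keep the example short.

```python
def band_sum(info_weights, quality):
    """Sum of I_i * A_i over the bands of one allophone."""
    if len(info_weights) != len(quality):
        raise ValueError("I and A must have the same number of bands")
    return sum(i * a for i, a in zip(info_weights, quality))


def rii(allophones):
    """Mean over allophones of R_a * H * sum(I * A) (assumed form of (10))."""
    if not allophones:
        raise ValueError("at least one allophone is required")
    total = 0.0
    for item in allophones:
        total += item["R_a"] * item["H"] * band_sum(item["I"], item["A"])
    return total / len(allophones)


# Hypothetical word of two allophones, each described over two bands.
word = [
    {"R_a": 1.0, "H": 0.8, "I": [0.5, 0.5], "A": [1.0, 0.5]},
    {"R_a": 0.5, "H": 0.4, "I": [0.5, 0.5], "A": [0.0, 1.0]},
]

print(f"RII = {rii(word):.2f}")
```

Because the result is a mean, a single poorly recognized allophone lowers the value for the word without driving it to zero.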
The technology of determining the RII coefficient involves the development of a specialized allophone base, which can take into consideration not only the characteristics of a particular speech, but also the dialect peculiarities of the speech of regional presenters and the influence of other speech. Such a base will be much larger than the base developed by the second method considered above. However, its advantages will be universality (independence of a certain articulation table) and the possibility of more distinct recognition of diphones, affricates, sibilants, hushing consonants and others.

Analysis of the frequency of use of diphones and triphones (linguistic models of allophones) in the Ukrainian language
Objective evaluation of the degree of speech information privacy on the boundary of the controlled area should be based on the likelihood that an intruder recognizes speech information in the intercepted signal with a reliability sufficient to recover a certain volume of information. Given that an allophone is accepted in (10) as a speech unit, it is logical to link the evaluation scale to the averaged level of allophone recognition and, accordingly, of word recognition, that is, to use a dual ordinate scale. This approach is correct if an expert (or a decision support system) has the originals of the test-signals, by which it is possible to establish the correspondence of a certain group of allophones to a particular word. The argument of the dependence is the ordinal number of an allophone in a word of the test-signal.
For the noisy and noise-cleaned signals in Fig. 1, only the section of the basic tone frequency (Fp0) gives high values of the coefficients of information weight and of the quality of information recognition in the band. In the other sections, the value of the coefficient of the quality of information recognition will be minimal, which leads to a false assessment of information privacy.
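The effect can be illustrated with a band-weighted sum of the kind commonly used for intelligibility indices, in which each band contributes the product of its information weight and its recognition quality. Whether dependence (9) has exactly this form is an assumption, and the band weights and quality values below are invented for illustration only.

```python
def band_weighted_index(weights, quality):
    """Sum over bands of (information weight) x (recognition quality).

    Assumption: a generic band-weighted index; it is not claimed to be
    identical to dependence (9) of the paper.
    """
    if len(weights) != len(quality):
        raise ValueError("weights and quality must cover the same bands")
    return sum(w * a for w, a in zip(weights, quality))


# Hypothetical values: the first band stands for the basic tone section,
# the remaining three for the formant sections.
weights = [0.05, 0.25, 0.40, 0.30]
quality = [0.6, 0.0, 0.0, 0.0]   # only the basic tone band is recognized

index = band_weighted_index(weights, quality)
contributions = [w * a for w, a in zip(weights, quality)]
```

With these values the index is determined by the basic tone band alone, so the formant bands, which carry most of the weight, do not influence the estimate.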
Taking the linguistic component into account through correlation analysis of words, in order to split them into allophones, in combination with correlation analysis of the frequency spectrum, makes it possible to recognize the formants (Fp1, Fp2, Fp3 and Fp4) and, therefore, significantly lowers the assessed level of speech information privacy.
The research results (Table 2, Fig. 2, 3) make it possible to determine the values of the coefficient of frequency of use of a given allophone in speech (H) and of the coefficient of importance of allophone recognition for word recognition ($R_a$) in dependence (10).
According to the conducted research (Table 2, Fig. 2, 3), the maximum frequency of use of phonemes, diphones and triphones does not exceed 10 % (0.1 cond. u.). Thus, for the coefficient of frequency of use of an allophone (H) in a word of the test-signal, it is possible to introduce dependence (11), where h is the frequency of use of the triphone (cond. u.).
Dependence (11) is approximated by expression (12). Dependences (11) and (12) are designed both for manual calculation and for use in decision support systems of automated systems for assessing the level of speech information privacy.

Analysis of the influence of the quality of recognition of the basic tone frequency and the formants on the quality of assessment of the level of speech information security
The quality of information recognition in the bands of the frequency spectrum of a phoneme (a word) essentially depends on the level of destructive influence of noise interference and on the distortion of the frequency spectrum caused by digital processing methods (Fig. 1). The simplest means of determining the recognition quality is the spectral-correlation method.
However, at significant levels of destructive impact (at SNR≤−10 dBA), its use is ineffective. The solution to the problem is to use allophones and their main components: the frequency of the basic tone (Fp0) and the formants (Fp1, Fp2, Fp3 and Fp4). In this case, it is in fact necessary to assess the quality of recognition of the envelopes and their reciprocal placement on the frequency spectrum. At significant influences, which make it impossible to determine the envelopes correctly (Fig. 1), the use of correlation coefficients is incorrect.
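The spectral-correlation comparison mentioned above, and its degradation under strong interference, can be sketched as follows. The sketch assumes that the method amounts to a Pearson correlation between the magnitude spectra of the reference and the processed signals; the signals are synthetic single tones, and the function names and interference amplitudes are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitudes of the one-sided DFT (naive O(n^2) implementation)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def spectral_correlation(reference, processed):
    """Correlation between the magnitude spectra of two signals."""
    return pearson(magnitude_spectrum(reference), magnitude_spectrum(processed))


N = 64
# Synthetic test tone and a narrow-band interference (both hypothetical).
reference = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
interference = [math.sin(2 * math.pi * 13 * t / N) for t in range(N)]
mild = [s + 0.1 * i for s, i in zip(reference, interference)]
heavy = [s + 3.0 * i for s, i in zip(reference, interference)]

r_mild = spectral_correlation(reference, mild)
r_heavy = spectral_correlation(reference, heavy)
```

With weak interference the coefficient stays close to one, while with interference that dominates the signal it drops sharply, which illustrates why the correlation coefficient alone stops being informative at low SNR.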
To assess the quality of recognition of the basic tone and the formants, it is proposed to use the coefficient of importance of allophone recognition for word recognition, $R_a$, which, based on (10) and Fig. 1, is determined by dependences (13) and (14), where $k$ is the number of formants in an allophone (normally $k = 3 \ldots 4$; it is determined for an allophone from the test-signal (Fig. 1)); $S_j$ is the weight coefficient of a formant in allophone recognition; $F_j$ is the average level of the allophone formant signal in the test-signal ($F_0$ is the average level of the basic tone signal of an allophone in the test-signal); $L_n^j$ is the average noise level on the formant section; $L_R^j$ is the average level of the residual noise and of the noise caused by spectrum distortion during mathematical treatment (filtration, wavelet-transformation, etc.).

The scale of objective evaluation of the level of speech information privacy
Taking into consideration (10) to (14), it is proposed to introduce the scale of objective evaluation of the degree of speech information privacy on the boundary of the controlled zone by the criterion of residual speech intelligibility. When forming the scale, the provisions used in [4] (Table 1) and correlated with the requirements of [1][2][3] were taken into consideration.
When developing the scale, it was taken into consideration that, at an average speech tempo, the average presenter pronounces 100-120 words/min. Taking into account that a word on average consists of 2-4 allophones, we obtain 200-480 allophones/min. Under such initial conditions, and based on Table 1, it is possible to establish the scale of objective evaluation of the degree of speech information privacy on the boundary of the controlled zone by the criterion of residual speech intelligibility (Table 3). Table 3 implies a uniform law of distribution of recognized words within the set period.
Table 3. Scale of objective assessment of the level of speech information privacy on the boundary of the controlled area by the criterion of residual speech intelligibility
Assessment of the degree of speech information privacy by the frequency of allophone recognition implies their uniform distribution in groups of 2-3 allophones, which makes it possible to recover certain words of a test-signal with a certain reliability.
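The rate estimate used when developing the scale can be checked with a short calculation. The sketch only multiplies the bounds given in the text (100-120 words/min and 2-4 allophones per word); the function name is hypothetical.

```python
def allophone_rate(words_per_min, allophones_per_word):
    """Lower and upper bound of the allophone rate, in allophones/min.

    Both arguments are (low, high) pairs; the bounds are multiplied
    pairwise, lowest with lowest and highest with highest.
    """
    low = words_per_min[0] * allophones_per_word[0]
    high = words_per_min[1] * allophones_per_word[1]
    return low, high


rate_low, rate_high = allophone_rate((100, 120), (2, 4))
# Converted to allophones per second for the same bounds.
per_second = (rate_low / 60, rate_high / 60)
```

The bounds reproduce the 200-480 allophones/min stated above, which corresponds to roughly 3 to 8 allophones per second.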
In Table 3, not only were the requirements for the methods of assessing the level of speech information protection from leakage through acoustic and vibration channels established, but the distribution of premises by information category was also introduced.

Discussion of the results of research into the improved method for assessment of speech information privacy
The proposed mathematical dependencies (10)-(14) create the theoretical base for the method of determining the level of speech information privacy based on the residual intelligibility index (RII). The implementation of the method will make it possible:
- to improve the procedures and technologies of assessment of speech information privacy operating in Ukraine;
- to take into account the modern state of the methods of digital processing of phonograms;
- to enhance the reliability of the assessment of its security level.
At the same time, the improved scale of objective evaluation of the degree of speech information privacy on the boundary of the controlled zone by the criterion of residual speech intelligibility (Table 3) was suggested, which allows:
- a transition from the two-level assessment of speech information security that operates in Ukraine to a four-level assessment, which will enable constructing a security system depending on the object category and optimizing material costs;
- establishing the requirements for signal levels in systems that set active acoustic and vibration interference in accordance with the object category;
- decreasing the signal level of generators of acoustic and vibration interference for the levels "Standard speech privacy" (SSP) and "Standard speech security" (SSS), which makes it possible to significantly improve the working conditions of employees.
Thus, the obtained results indicate that the set goal was achieved. In addition, the following can be noted:
1) The conducted analysis of the existing methods for assessing the level of information security showed the existence of three dominant methods: SNR (Ukraine and the states of the post-Soviet space), SII (the USA and European countries) and SPC (Canada). However, each of these methods has its own peculiarities, which do not make it possible to compare the results of their application transparently. It was proposed to use the criterion of word intelligibility of speech (W), introduced in [12,22], as an auxiliary criterion. The main comparison criterion was the requirements for the level of speech information security and the possibility of interception of a part of the information by an intruder over a certain period. This allowed formalizing the approaches to assessing the level of speech information privacy by different methods and unifying the requirements for the systems of speech information security (Table 1).
2) The use of complex methods that combine subjective (word intelligibility of speech) and objective (instrumental research into the frequency spectrum) methods for assessing the level of speech information privacy is the most promising. Such methods were called objective evaluation methods. The American method based on determining the speech intelligibility index (SII) is the most common. However, analysis of its use showed its limited applicability: the capabilities of modern methods for digital processing of phonograms, such as wavelet-transformation, adaptive filtration, spectral and correlation analyses, etc., are not taken into account. Fig. 1, b shows the impact of the noise interference with the ratio SNR=−10 dBA on the test-signal and the spectrum distortion after the filtration procedure. As Fig. 1 and (9) show, when using the SII coefficient, a rather optimistic assessment of the security level will be obtained. Analysis of the signal "Signal & Noises" indicates that only on the spectrum section up to 300 Hz (the section of the basic tone) does the value of the coefficient of quality of information recognition in the band (A) differ from 0. According to (9), for the section 50…300 Hz we obtain a value that corresponds to the level "High speech security"; the coefficient of information weight in the band is determined according to [12].
However, after filtration using wavelet-transformation (signal "Residual Intelligibility"), the value obtained is an order of magnitude higher and corresponds to the level "Standard speech privacy".
The ways of solving the specified problem involve taking into account the linguistic component of speech information and the resistance of allophones to noise interference and spectrum distortion. It was proposed to introduce the RII (residual intelligibility index) and the procedure of its calculation (10).
3) The use of diphones and triphones as a simplified linguistic model of allophones was proposed, and analysis of the frequency of their use was performed (Fig. 2, 3, Table 2). The analysis was conducted for specialized texts in the areas of "Information security", "Finance" and "Criminology". Texts with a total volume of more than 20 Mb in text format were analyzed [32]. The results of the analysis were compared with the results of other authors [28][29][30]. While the results were generally the same, some divergences were identified, which can be attributed to the peculiarities of professionally oriented texts.
We proposed to divide the triphones into 4 groups, depending on the usage frequency, and derived dependences (11) and (12) for calculating the coefficient of allophone usage frequency as a component of (10).
4) The analysis of the quality of recognition of the basic tone frequency (Fp0) and of the formants (Fp1, Fp2, Fp3 and Fp4) was performed to determine the quality of assessment of the level of speech information security. The dependences for calculating the coefficient of importance of allophone recognition for word recognition were proposed for signals with noise (13) and for residual intelligibility signals (14).
5) The scale of objective assessment of the degree of speech information privacy on the boundary of the controlled zone by the criterion of residual speech intelligibility was proposed (Table 3). The requirements for the level of speech information privacy and the possibility of interception of its part by an intruder for a certain period were taken into account.
Thus, the results of the theoretical and applied research suggest that the proposed method can be used to assess the level of privacy of restricted-access speech information at the information activity sites of Ukraine. The method takes into consideration the structure and the principles of operation of modern systems of speech information security and can be integrated into the current procedure of attestation of separated premises. Taking into consideration the capabilities of modern methods of digital processing of phonograms (wavelet-transformation, spectral-correlation analysis, adaptive filtration and others), together with the use of articulation tables, ensures the reliability of the assessment of the level of speech information security.

1. A review of the methods for assessing the level of speech information privacy used in Ukraine, the EU, the USA, and Canada was carried out and their comparative analysis was performed. This allowed formalizing the approaches to the assessment of the level of speech information security by different methods and unifying the requirements for the systems of speech information privacy. The main criterion for comparison was the requirements for the level of speech information security and the possibility of interception of a part of the information by an intruder over a certain period.
2. We substantiated the necessity of improving the methods for determining the level of speech information security based on the speech intelligibility coefficient (index) by taking into consideration the peculiarities of allophones and introducing the coefficient of residual speech intelligibility. This approach makes it possible to enhance the reliability of the assessment of the level of speech information security when phonograms undergo digital processing: noise filtration based on wavelet-transformation, spectral and correlation analysis, application of adaptive filtration, etc.
3. The use of diphones and triphones as a simplified linguistic model of allophones was proposed. This allowed formalizing the structure of allophones and analyzing the frequency of their usage in the Ukrainian language. The triphones were divided into 4 groups, depending on the usage frequency, and analytical dependences for calculating the coefficient of frequency of usage of triphones (allophones) were derived for the problems of technical protection of information.
4. The analysis of the destructive influence of noise interference and of distortions of spectral composition, which arise when using the methods of digital processing of phonograms, was performed to determine the quality of recognition of the frequency spectrum of the basic tone and the formants. Their influence on the level of speech information security was estimated, and analytical dependences for the coefficient of importance of allophone recognition for the recognition of words (phrases, sentences) were proposed for signals with noise and for residual intelligibility signals.
5. The scale of objective estimation of the degree of speech information security on the boundary of the controlled zone by the criteria of residual speech intelligibility and the possibility of interception of its parts by an intruder for a certain period was proposed. Its usage makes it possible to improve the quality of assessment of the level of speech information protection from leakage through acoustic and vibration channels, taking into consideration the information categories.