Devising a New Filtration Method and Proof of Self-Similarity of Electromyograms

The main attention is paid to the analysis of electromyogram (EMG) signals using Poincaré plots (PP). It was established that the shapes of the plots are related to the diagnoses of patients. To study the fractal dimensionality of the PP, the method of counting the coverage figures was used. The PP filtration was carried out with the help of Haar wavelets. The self-similarity of Poincaré plots for the studied electromyograms was established, and the law of scaling was found to hold in a fairly wide range of coverage figure sizes. Thus, the entire Poincaré plot is statistically similar to its own parts. The fractal dimensionalities of the PP of the studied electromyograms lie in the range from 1.36 to 1.48. This, together with the Hurst exponents of the Poincaré plots, which exceed the critical value of 0.5, indicates the relative stability of the sequences. The filtration algorithm proposed in this research involves only two simple stages: 1. Conversion of the input data matrix for the PP using the Jacobi rotation. 2. Decimation of both columns of the resulting matrix (the so-called "lazy wavelet transformation", or double downsampling). The algorithm is simple to program and requires less machine time than existing filters for the PP. Filtered Poincaré plots have several advantages over unfiltered ones: they do not contain redundant points and allow direct visualization of the short-term and long-term variability of a signal. In addition, filtered PPs retain both the shape of their prototypes and their fractal dimensionality and variability descriptors. The detected features of electromyograms of healthy patients, with their characteristic low-frequency signal fluctuations, can be used to make clinical decisions.


Introduction
Electromyograms (EMG) are recordings of electrical signals from muscles and nerves that control muscles [1]. Electromyogram plots provide a high level of diagnostic information [2]. Modern electromyography uses computer mathematical systems to process EMG [3].
At first glance, real records look like complicated and chaotic fluctuations. These sequences are characterized by considerable variability, the essence of which is the change in the statistical characteristics of a signal during its acquisition [4].
Generally accepted means of processing may not take into consideration the subtle features that vary in time. Meanwhile, these missed details may contain important diagnostic signs, so a correct and thorough reading of electromyograms is a serious problem in therapy. This problem is also common for all complex time sequences [5].
Poincaré plots (PP) are a special tool that ensures acceleration and visualization of the results of analysis of variability of time series [4][5][6][7][8][9]. The PP displays the complete EMG record on one 2D plot. A classic PP is a diagram of scattering of sequence terms relative to their predecessors, that is, a kind of delay map with a single offset (the so-called lag) [4].
Poincaré plots are widely used to process medical signals [5][6][7][8][9]. They are primarily a means of preliminary visualization; however, they can also provide quantitative results for assessing variability. The authors of [8] proposed the basic numerical descriptors for Poincaré plots: two standard deviations (SD1 and SD2) and their ratio SD1/SD2. We determine them as in [9]:

$$SD1 = \sqrt{2}\cdot SD(a), \quad SD2 = \sqrt{2}\cdot SD(b), \quad a_n = \frac{s_n - s_{n-1}}{2}, \quad b_n = \frac{s_n + s_{n-1}}{2}, \qquad (1)$$

where $s_n$ are the terms of the time sequence and $2 \le n \le N$.
Paper [8] is also an example of using Poincaré plots to process biomedical signals. It explores the effectiveness of the PP in processing and observing changes in the heart rhythm caused by changes in the load.
Article [9] is a further example of the use of PP in the processing of medical signals. It presents methods for filtering Poincaré plots and compares them with the discrete Fourier transform; the comparison showed the advantages of the PP method.
The above papers [2][3][4][5][6][7][8][9] show the importance of using Poincaré plots in processing and presenting medical signals and demonstrate their value in reflecting non-linear aspects of a data sequence. At the same time, methods of Poincaré plot filtering have only recently begun to develop [9] and are sometimes overly complex [10][11][12] for clinicians. Filtration, which is accompanied by data sifting (downsampling), raises the question of the possible fractality (statistical self-similarity) of Poincaré plots. However, this problem remains practically unexplored.

The aim and objectives of research
The purpose of this research is to identify the self-similarity of Poincaré plots for electromyograms by checking whether the scaling law holds for them. This enables the separation of data of healthy patients and patients with myopathy or neuropathy.
To achieve the set aim, the following tasks were set:
- to explore classic Poincaré plots and their variability descriptors for electromyograms;
- to study modified (filtered) Poincaré plots and their variability descriptors for electromyograms;
- to check whether the law of scaling holds for the PP of electromyograms and to identify the ranges of scales in which it holds.

1. Research data
The data from the PhysioNet portal were used in the research. To obtain all electromyograms, concentric needle electrodes (25 mm), placed in the patient's tibial muscle, were used. The patient gently flexed the foot against some resistance and then relaxed it [1].
The sampling rate for all records was 4 kHz. The signal magnitude is given in mV. Table 1 shows additional information contained in the files [1].

Table 1. Brief information about a patient [1].

The first descriptor, SD1, determines the short-term variability of the time series; the second, SD2, determines the long-term variability. Accordingly, SD1 can be called the high-frequency variability indicator and SD2 the low-frequency one. Their ratio assesses the effect of random factors on a signal [8]. The method of Poincaré plots is widely used in the examination of heart rate variability [5,6,8,9]. However, its scope is much wider [7].
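The descriptors discussed above can be computed directly from a sequence. The following is a minimal sketch, not the authors' Maple code: it assumes that SD1 and SD2 are the standard deviations of the half-differences and half-sums of successive terms scaled by √2, that the sample (n-1) standard deviation is used, and the function name `poincare_descriptors` is hypothetical.

```python
import math
import statistics

def poincare_descriptors(s):
    """Return (SD1, SD2, SD1/SD2) for a lag-1 Poincaré plot of sequence s."""
    # Half-differences and half-sums of successive terms (assumed form of the descriptors).
    a = [(s[n] - s[n - 1]) / 2 for n in range(1, len(s))]
    b = [(s[n] + s[n - 1]) / 2 for n in range(1, len(s))]
    sd1 = math.sqrt(2) * statistics.stdev(a)  # short-term (high-frequency) variability
    sd2 = math.sqrt(2) * statistics.stdev(b)  # long-term (low-frequency) variability
    return sd1, sd2, sd1 / sd2

# Synthetic, slowly varying test signal (not EMG data).
s = [math.sin(0.01 * n) for n in range(5000)]
sd1, sd2, ratio = poincare_descriptors(s)
print(sd1, sd2, ratio)  # the ratio is small for such a smooth signal
```

For a smooth signal the differences of neighbouring samples are tiny compared with their sums, so SD1/SD2 is small; adding noise to the signal raises the ratio.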
Variability of most medical and physiological signals, including electromyograms, is one of the manifestations of the dynamic equilibrium of living organisms with their environment, the so-called homeostasis. Digital analysis methods, one of which is the Poincaré plot method, look especially promising within the concepts of the Internet of Things (in particular, smart medical sensors) and personalized medicine, which indicates the relevance of the corresponding research.

Literature analysis and problem statement
Paper [2] describes techniques for the analysis of electromyogram data, namely detection, processing, and classification. Electromyogram signals received from muscles require modern and advanced methods for their detection, decomposition, processing, and classification. The paper illustrates various methodologies and algorithms for analyzing EMG signals in order to provide effective and efficient ways to understand a signal and its nature. Certain hardware implementations using EMG, focusing on applications related to hand prosthetics, grip recognition, and human-computer interaction, are also highlighted. There is also a comparative study showing the effectiveness of various methods for analyzing EMG signals. However, the purpose of that research was to review the information on EMG without detailed attention to the processing of this type of signal.
Research [3] describes the methods for computer processing of electroneuromyography signals using the system of Maple computer mathematics. It also contains the results of frequency and statistical analyses of electroneuromyography records for a healthy person and a patient with myopathy. However, this work does not answer questions about the self-similarity of electromyograms.
Article [5] uses the method of the dynamic density delay map to visualize the behavior of complex systems. Animations based on this method visualize dynamic properties of complex systems that are not visible in plots of time series or in standard Poincaré plots. However, this method for processing and visualization may not take into consideration the subtle features that vary in time.
A new descriptor, called the complex correlation measure, was proposed in research [6] for quantifying the temporal aspect of Poincaré plots. The authors claim that this descriptor is effective in distinguishing arrhythmia and congestive heart failure from a normal heart rate. However, the authors did not study the effectiveness of this descriptor for processing electromyogram signals.
In paper [7], the authors show the use of Poincaré plots to process biomedical signals. The paper itself is observational, and its purpose is to show the possibilities of using PP in the processing and visualization of medical signals using the standard descriptors SD1 and SD2.

The Jacobi orthogonal rotation matrix (4) diagonalizes the aforementioned matrix (3). The angle of rotation θ follows from the obvious condition (5), whose right-hand part is a diagonal matrix of the eigenvalues of matrix (3). The angle of rotation can then be determined from the condition that the non-diagonal elements in (5) are zero, which leads to the well-known expression (6) [16]. Since vector-column α differs from β only by the first and last terms, α² and β² are very close to each other, especially for long sequences (N ≫ 1). That is why real time series satisfy the strong inequality (7). Satisfaction of strong inequality (7) is the main assumption for the use of filtering by Haar wavelets, which is described below.
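The expressions referred to in this derivation can be written out explicitly. The block below is a reconstruction sketch rather than the authors' exact notation: it assumes the unnormalized (Gram) form of the covariance matrix, one particular sign convention for the rotation matrix, and the symbols $C$, $J$, $\lambda_1$, $\lambda_2$, which are introduced here only for illustration; α and β denote the first and second columns of the data matrix (2).

```latex
% Assumed notation: \alpha = (s_1,\dots,s_{N-1})^T, \beta = (s_2,\dots,s_N)^T,
% \alpha^2 = \alpha\cdot\alpha, \beta^2 = \beta\cdot\beta.
\begin{align}
C &= \begin{pmatrix} \alpha^2 & \alpha\cdot\beta \\ \alpha\cdot\beta & \beta^2 \end{pmatrix}
  \tag{3} \\
J(\theta) &= \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
  \tag{4} \\
J^{T} C J &= \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}
  \tag{5}
\end{align}
% The off-diagonal element of J^T C J must vanish:
\begin{align}
\tfrac{1}{2}\,(\alpha^2 - \beta^2)\sin 2\theta + (\alpha\cdot\beta)\cos 2\theta = 0
\;\;\Longrightarrow\;\;
\tan 2\theta &= \frac{2\,\alpha\cdot\beta}{\beta^2 - \alpha^2}
  \tag{6}
\end{align}
% The columns share all terms except s_1 and s_N:
\begin{align}
\alpha^2 - \beta^2 = s_1^2 - s_N^2
\;\;\Longrightarrow\;\;
\left|\alpha^2 - \beta^2\right| &\ll 2\left|\alpha\cdot\beta\right|
  \tag{7} \\
\theta \to \pm\frac{\pi}{4}, \qquad
J\!\left(-\frac{\pi}{4}\right) &= \frac{1}{\sqrt{2}}
  \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}
  \tag{8}
\end{align}
```

With this assumed convention, multiplying a row $(s_{n-1}, s_n)$ of the data matrix by the simplified matrix gives the scaled sum $(s_{n-1}+s_n)/\sqrt{2}$ in the first column and the scaled difference $(s_n-s_{n-1})/\sqrt{2}$ in the second, which matches the description of the first column as the low-frequency filter and the second as the high-frequency filter.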
If (7) is satisfied, θ → ±π/4, which determines an orthonormalized basis whose two vectors are approximately collinear with and normal to the identity line, respectively; these are the main directions for the majority of real Poincaré plots. The Jacobi rotation matrix (4) within the Principal Component Analysis method then takes, for a typical PP, the simplified form (8). A few comments should be made:
1) the vector-columns of (8) can have opposite directions, so one can use four versions of the Jacobi matrix, which differ in the algebraic signs of their columns;
2) the orthogonal matrix (8) contains columns that are right singular vectors of the data matrix (2);
3) if condition (7) is not met, it is possible to use matrix (4) instead of (8). Expression (1) must in this case be replaced with the more general expression (9).

Complete records were studied, as opposed to [3][4][5][6][7], which used signal fragments. Thus, the signal length varied from 50860 to 147858 samples.
Maple 2020 (Canada) [13] was used for the computer processing of all data. The authors of [14] explained the conversion of binary files from PhysioNet to Maple. Statistics of shortened EMG were described in [3]. Their real probability densities were quite far from the standard Gaussian ones.

2. The principal component method and filtration by Haar wavelets
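Each Poincaré plot represents a realization of a time sequence in a 2D space, that is, on a projective plane.
Take a time sequence of length N: $\{s_n\}_{n=1}^{N}$. Next, it is possible to select all possible pairs of sequential terms: $(s_{n-1}, s_n)$, where $2 \le n \le N$. Each of these ordered pairs is a vector that determines a point of the Poincaré plot. Taken as rows, these pairs form the data matrix

$$\begin{pmatrix} s_1 & s_2 \\ s_2 & s_3 \\ \vdots & \vdots \\ s_{N-1} & s_N \end{pmatrix}. \qquad (2)$$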
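The construction of the plot points from a sequence can be illustrated with a few lines of code. This is a minimal sketch; the function name `poincare_points` and the optional `lag` parameter are hypothetical and are not taken from the original work, which uses a single lag.

```python
def poincare_points(s, lag=1):
    """Return the rows (s[n-lag], s[n]) that form the Poincaré data matrix."""
    return [(s[n - lag], s[n]) for n in range(lag, len(s))]

points = poincare_points([0.1, 0.4, 0.3, 0.7])
print(points)  # [(0.1, 0.4), (0.4, 0.3), (0.3, 0.7)]
```

Every interior term of the sequence appears in two neighbouring rows, once as the second coordinate and once as the first.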
Columns of the data matrix are usually quite strongly correlated. The reason for this correlation is obvious: the columns of the data matrix (2) are taken from the same series by a single time delay (lag). The size of this lag is determined by the sampling rate.
The method of Principal Component Analysis (PCA) can be formulated in four equivalent ways [15], one of which guarantees zero correlation of the columns of the new data matrix in the orthonormalized basis of principal components. The first principal component (basis vector) assigns the direction of the maximum standard deviation on the plot of scattering of columns-vectors in a two-dimensional projective plane [15]. This scattering plot is actually a Poincaré plot. The second principal component is normal to the first.
The data matrix (2) can be depicted as a 2D scattering plot, marking the data of the first column on the horizontal axis and those of the second one on the vertical axis. Each row of the data matrix (2) sets the two coordinates of a separate point of the PP (Fig. 1). The points of the plot are closely grouped around the so-called identity line, which is the bisector of the projective plane. A point belongs to this line provided that $s_n - s_{n-1} = 0$. For points above this line the difference is positive, and it is negative for points below the line.
The orientation of the principal components depends on the structure of the data matrix and can be arbitrary for arbitrary time series. However, the question arises: are the principal components equally arbitrarily oriented for the PP data matrix (2), which has two strongly correlated vector-columns?
It is possible to show that the principal components for any PP can be found using the Jacobi eigenvalue algorithm [16]. Let the vector-columns of the data matrix (2) be denoted α and β. Then the corresponding covariance matrix (3) is symmetrical and has a dimensionality of 2×2.

The columns of the rotation matrix (8) are proportional to the Haar digital filters with multiplication coefficient 2: the low-frequency filter (first column) and the high-frequency filter (second column) [17,18]. Thus, converting the data matrix (2) into the basis of principal components amounts to filtering by Haar digital filters if the normalization factor is equal to 2, provided, of course, that strong inequality (7) is met.
Filtering requires an additional step after the Jacobi rotation, namely excluding all even rows or all odd rows from the converted data matrix. Such an operation is known as decimation, or double downsampling, or the "lazy wavelet transformation" (LWT), so called because it does not require complex mathematics.
The data matrix (2) was transformed by the simplified form (8) of the Jacobi rotation matrix (4), that is, projected onto the basis of principal components, taking the angle to be close enough to θ = −π/4. The resulting matrix (10) is the data matrix in the basis of principal components, provided that inequality (7) is satisfied.
The standard deviations of the columns of matrix (10) are given by descriptors (1). Thus, descriptors (1) and (9) give the standard deviations along both principal component vectors when inequality (7) is or is not satisfied, respectively. Expression (10) also clearly indicates the need for the LWT when filtering. Almost every term of the sequence is represented twice, in two rows of matrix (10); the Hankel type of the data matrix is the main reason for this doubling. The LWT eliminates this redundancy. In addition, the columns of (10) after the LWT can be considered as Haar wavelet coefficients describing the low-frequency and high-frequency ranges, respectively.
On the other hand, matrix (10) without the LWT describes the same PP as matrix (2), but in the new basis of principal components. Its column scattering plot is the same PP, rotated clockwise by the angle π/4. After the LWT, we get a filtered PP with half the points of the initial set.
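The two stages described in this section (rotation, then decimation) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' Maple implementation: it assumes the simplified rotation in the form (1/√2)·[[1, −1], [1, 1]], so that the first output column holds scaled sums and the second scaled differences; the function name `filter_poincare` and the `offset` parameter are hypothetical.

```python
import math

def filter_poincare(s, offset=0):
    """Rotate lag-1 pairs of s by the simplified Jacobi matrix, then decimate.

    offset=0 keeps the 1st, 3rd, 5th, ... rows; offset=1 keeps the others.
    """
    r = 1.0 / math.sqrt(2.0)
    # Stage 1: rotation. First column: scaled sums (low frequencies);
    # second column: scaled differences (high frequencies).
    rotated = [((s[n - 1] + s[n]) * r, (s[n] - s[n - 1]) * r)
               for n in range(1, len(s))]
    # Stage 2: lazy wavelet transform, i.e. keep every other row.
    return rotated[offset::2]

filtered = filter_poincare([1.0, 3.0, 5.0, 9.0])
print(filtered)  # two rows, built from the disjoint pairs (1, 3) and (5, 9)
```

After decimation each term of the sequence contributes to exactly one row, so the rows coincide with orthonormal Haar approximation and detail coefficients of non-overlapping pairs.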

1. Classic Poincaré plots and variability descriptors
Classic, that is, unfiltered, Poincaré plots were constructed for all database records [1] (Fig. 1). These plots are scattering plots for the two columns of the source data matrices (2). The columns contain the numeric sequences [1] taken with a single time lag, the magnitude of which is set by the sampling rate (4 kHz). Each row of the data matrix (2) contains the two coordinates of a separate point of the PP. Data matrices were obtained as described in paragraph 4.2.

The points are united into clouds around the identity line, which is the diagonal of the plane of the plot [5][6][7][8]. The standard descriptors are the standard deviations of the points normal to this line (SD1) and along it (SD2) [8]. The results of calculating the descriptors by expression (1) are given in Table 2. As one can see, the shapes of the clouds of the Poincaré plots are different; by resemblance, they can be conventionally called "comet", "ellipse", and "jet plane". It is worth noting that different scales of scattering along the identity line are observed for each PP, regardless of these subjective names.

Electronic copy available at: https://ssrn.com/abstract=3920442
It is possible that the shapes and descriptors of classic PP can provide qualitative signs for practical diagnostics. The values of the descriptors increase in the series "Healthy", "Myopathy", "Neuropathy". Thus, the average variability of the signals also increases in the same order. The SD1/SD2 ratio estimates randomness in time sequences [7]. The obtained results confirm the conclusions of [3,7] concerning the minimum randomness in the EMG of a healthy patient. In addition, this ratio can be useful for clinical sorting into "norm" and "pathology".

2. Filtered Poincaré plots and descriptors
The two known shortcomings of the classic PP are the following:
1. Each classic PP contains roughly twice as many points as necessary. Almost every term of the sequence $(s_n)$ is represented there twice: it appears in the pair $(s_{n-1}, s_n)$ and then in the next pair $(s_n, s_{n+1})$. The only exceptions are the first and last terms. Thus, any classic PP contains redundant data.
2. A classic PP provides only implicit visualization of the low-frequency and high-frequency parts of a signal. Criticism of this type of Poincaré plot is presented in [5,6].
Haar digital filtering divides the signal into two halves. Each half belongs to its own frequency range: either low-frequency or high-frequency. The scattering plot of the high-frequency half against the low-frequency half visualizes the sifted (decimated) data matrix (10), which contains the low-frequency and high-frequency halves of the signal in the first and second columns, respectively.
Such scattering plots were obtained for all records (Fig. 2). Below, they are referred to as filtered PP. The scattering plots in Fig. 1, 2 have similar shapes, which resemble a "comet", an "ellipse", and a "jet plane", although in a different orientation. At the same time, filtered PPs have half as many points. That is, filtered PPs (Fig. 2) retain the shapes of their classic prototypes from Fig. 1 and look like their fractal parts. Why does this happen? Is it an accident or a regularity?
The variability descriptors for filtered PP are shown in Table 3. They agree well with those of the corresponding classic PP. Thus, filtered PPs retain not only the shapes but also the variability descriptors.
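The retention of the descriptors under decimation can be checked numerically. The sketch below uses a synthetic signal (a slow oscillation plus Gaussian noise) as a stand-in for an EMG record; it is not PhysioNet data, and the signal parameters are arbitrary assumptions chosen only for illustration. The descriptors are taken as the sample standard deviations of the scaled differences and sums of successive terms.

```python
import math
import random
import statistics

random.seed(0)
# Synthetic stand-in for an EMG record (assumption, not real data).
s = [math.sin(0.05 * n) + 0.2 * random.gauss(0.0, 1.0) for n in range(20000)]

r = 1.0 / math.sqrt(2.0)
low = [(s[n - 1] + s[n]) * r for n in range(1, len(s))]   # along the identity line
high = [(s[n] - s[n - 1]) * r for n in range(1, len(s))]  # normal to the identity line

# Classic plot: all rows. Filtered plot: every other row.
sd1_classic, sd2_classic = statistics.stdev(high), statistics.stdev(low)
sd1_filtered, sd2_filtered = statistics.stdev(high[::2]), statistics.stdev(low[::2])

print(sd1_classic, sd1_filtered)
print(sd2_classic, sd2_filtered)
```

For this synthetic signal the filtered values differ from the classic ones only by sampling fluctuations, which illustrates (but does not prove) the agreement reported for the EMG records.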
In addition, filtered PPs ensure explicit visualization of the signal variability, which results from the conversion of the classic PP by the method of Principal Component Analysis. Now, short-term (high-frequency) variability is associated with the vertical scattering of points, while long-term (low-frequency) variability is associated with their horizontal scattering.

Fig. 1, 2 were created with Maple 2020. In addition, the rotation matrices for all three sequences were obtained using the Maple software package "PCA" [13]. For example, the Jacobi rotation matrix (4) calculated for a healthy patient agrees very well with matrix (8). The absolute deviations between the other two calculated rotation matrices and (8) do not exceed 5·10⁻⁶. The key inequalities (7) are very well met for all three EMG. The correlation factors between the columns of the source data matrices (2) were predictably high: 0.8981, 0.6630, and 0.7306 (in Table 1). However, they practically vanish after conversion to the axes of principal components and do not exceed 10⁻⁷. These results confirm that the identity line is almost always collinear with the vector of the first principal component, as discussed earlier.
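The kind of numerical check described above can be sketched without a PCA package. The code below is an assumption-laden illustration, not the Maple "PCA" computation: it uses a synthetic zero-mean signal instead of EMG data, the unnormalized Gram matrix of the two columns, and the sign convention in which the rotation matrix is [[cos θ, sin θ], [−sin θ, cos θ]]; the helper names `pearson` and `jacobi_angle` are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def jacobi_angle(alpha, beta):
    """Angle diagonalizing the 2x2 Gram matrix of the columns, plus the ratio
    |alpha^2 - beta^2| / (2 |alpha.beta|), which is << 1 under the strong inequality."""
    a2 = sum(a * a for a in alpha)
    b2 = sum(b * b for b in beta)
    ab = sum(a * b for a, b in zip(alpha, beta))
    # This branch of atan2 puts the larger eigenvalue in the first column.
    theta = 0.5 * math.atan2(-2.0 * ab, a2 - b2)
    return theta, abs(a2 - b2) / (2.0 * abs(ab))

random.seed(1)
# Synthetic, zero-mean stand-in for an EMG record (assumption, not real data).
s = [math.sin(0.05 * n) + 0.2 * random.gauss(0.0, 1.0) for n in range(20000)]
alpha, beta = s[:-1], s[1:]

theta, ratio = jacobi_angle(alpha, beta)
c, sn = math.cos(theta), math.sin(theta)
# Columns of the rotated data matrix X J with J = [[c, sn], [-sn, c]].
pc1 = [a * c - b * sn for a, b in zip(alpha, beta)]
pc2 = [a * sn + b * c for a, b in zip(alpha, beta)]

r_before = pearson(alpha, beta)
r_after = pearson(pc1, pc2)
print(theta, ratio, r_before, r_after)
```

For this synthetic series the angle comes out very close to −π/4 and the column correlation drops from a high value to nearly zero after rotation, which mirrors the behaviour reported for the three EMG records.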

3. Self-similarity of Poincaré plots for electromyograms
The similarity of the classic Poincaré plots (Fig. 1) and their corresponding filtered fragments (Fig. 2) is a sign, but not a proof, of the fractal nature of the PP. The scaling law is the main proof of self-similarity, provided that it describes the filtered PP. This law implies the following relation for fractal structures [19,20]:

$$N(a) \propto a^{-d}, \qquad (12)$$

where N(a) is the number of coverage figures, a is their characteristic linear size (scale), and d is the fractal dimensionality of the structure. The coverage figures, in this case two-dimensional, should provide full coverage of the Poincaré plot or its fragment. The law of scaling (12) determines a linear dependence between ln(N) and ln(a). The slope of the line, taken with the opposite sign, determines the fractal dimensionality d [20].

Fig. 3 shows the plots of ln(N) versus ln(a) for all PP. The numbers of coverage figures and the scales ($a_i$) were obtained according to the procedure given in [19]. The good linearity of the dependences in Fig. 3 shows that the law of scaling (12) holds in the specified scale ranges for the PP under study. The ranges were a = (0.025-0.6) mV for the healthy patient and the patient with myopathy and a = (0.15-2.0) mV for the patient with neuropathy. These ranges are determined from the approximate areas occupied by each PP in Fig. 2.

Table 4 gives the fractal dimensionalities, their standard deviations (that is, estimates of the accuracy of calculation of the fractal dimensionalities), and the adjusted coefficients of determination for the trend lines in Fig. 3. The Hurst exponent (H) is obtained from the well-known formula H = 2 − d. Here it is advisable to cite the authors of paper [21]: "The assumption of statistical self-affinity implies a linear relationship between the fractal dimensionality and the Hurst exponential factor and thus connects these two phenomena".
The adjusted coefficients of determination (R²) are close to unity, which indicates almost perfect compliance with the scaling law (12) for the studied PP. Thus, the self-similarity of Poincaré plots for EMG is established.
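The way a fractal dimensionality is read off a log-log plot can be illustrated with a simple box-counting sketch. The covering procedure of [19] is not reproduced in this text, so the code below swaps in plain square-grid box counting as an assumption; the function names and the test point set are hypothetical, and the example set is a straight segment (expected dimensionality close to 1) rather than an EMG plot.

```python
import math

def box_count(points, a):
    """Number of grid squares of side a that contain at least one point."""
    return len({(math.floor(x / a), math.floor(y / a)) for x, y in points})

def fractal_dimension(points, scales):
    """Minus the least-squares slope of ln N(a) versus ln a."""
    xs = [math.log(a) for a in scales]
    ys = [math.log(box_count(points, a)) for a in scales]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope

# Test set: a densely sampled straight segment.
segment = [(t / 10000, t / 10000) for t in range(10001)]
d = fractal_dimension(segment, [0.1, 0.05, 0.025, 0.0125])
hurst = 2.0 - d  # relation between H and d quoted in the text
print(d, hurst)
```

Applied to the relation H = 2 − d, the fractal dimensionalities reported for the studied plots, from 1.36 to 1.48, correspond to Hurst exponents from 0.64 down to 0.52.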

Discussion of the results of studying Poincaré plots for electromyograms
The filtration method proposed in this work differs from those proposed in [9][10][11][12]. The algorithm [10][11][12] includes the following steps:
1) singular value decomposition of the data matrix (2), with finding its singular vectors and singular values;
2) decomposition of the rank-two matrix (2) into the sum of two rank-one matrices using the Eckart-Young theorem;
3) antidiagonal averaging (hankelization) of both rank-one matrices;
4) decimation (LWT) of both parts of the restored sequence.
The filtration technique proposed in this paper includes only two simple stages: Jacobi rotation and subsequent decimation (LWT). Thus, there is no longer any need to know either the method of singular value decomposition of the data matrix or its eigenvalues and eigenvectors, which is a significant advantage over the methods of [9][10][11][12].
The self-similarity of the PP for EMG was demonstrated in this paper by means of the proposed filtration method and by verifying that the scaling law holds. Filtered Poincaré plots retain the shape, descriptor values, and fractal dimensionality under the proposed Haar filtering process.
The fractal structure of signals is not new for diagnosis in medicine. The law of fractal scaling of cardio intervals is known from the dissertation [22]. Later, the fractal properties of heart rate variability were discussed in many articles, such as [23][24][25][26][27]. The Higuchi fractal dimensionality of Poincaré plots for cardio intervals may vary, for example, due to the physiological activity of a patient [26] or depending on the diagnosis [23,25]. In contrast, the fractal dimensionalities of Poincaré plots for electromyograms (Table 4) do not appear sensitive to the diagnosis.
One of the generally accepted methods of fractal analysis is detrended fluctuation analysis (DFA). A specific DFA metric, the so-called alpha-1 exponent, is strongly correlated with the SD1/SD2 ratio for Poincaré plots. The authors of [27] report a high positive correlation of about 0.78. These ratios (Tables 2, 3) clearly differ between healthy and sick patients in the EMG records, in agreement with the results of [3,7]. The values of the ratio are clearly higher for the pathological cases.
In addition, the authors tested the preliminary results of [3], which stated that only the data of a healthy patient show peaks in the power spectra belonging to the low-frequency range. Power spectra of the full EMG records were computed using Maple 2020 tools. They are generally consistent with the spectra of [3]. However, the main peak for a healthy patient shifts to the range of 16-22 Hz, as opposed to the 5-10 Hz range reported in [3].
The existence of low-frequency oscillations in the signal of a healthy patient can be used for pre-recognition of pathology along with the shape of PP and the ratio of descriptors.
The method of PP filtration by Haar wavelets proposed in this work is limited by condition (7), which in practice is almost always well met, especially for sufficiently long time series. The method can easily be extended to other medical signals, that is, it is not specific to EMG.
However, for short time sequences, condition (7) may not be met well enough. In this case, it is possible to devise a similar filtering method using expressions (9), but then, of course, one can no longer speak of Haar digital filters.

Conclusions
1. Classic Poincaré plots (PP) for electromyograms (EMG) are self-similar. The law of scaling holds well in a fairly wide range of scales. The fractal dimensionalities (in the range 1.36-1.48) and Hurst exponents (≥0.52) of the PP for the EMG indicate the relative stability of the sequences. A simple way of filtering classic PP by Haar wavelets was designed; the filtered plots retain the shape, fractal dimensionality, and variability descriptors of the classic PP. This invariance is the result of self-similarity.
2. Modified (filtered) PPs have two advantages over the classic ones. Firstly, filtered PPs are not redundant. Secondly, filtered PPs allow clear visualization and direct evaluation of two types of variability: low-frequency and high-frequency.
3. The detected signs of the EMG of healthy patients, namely a specific PP shape and a low ratio of variability descriptors (≤0.25) together with low-frequency signal oscillations (in the range of 5-25 Hz), can be used to make clinical decisions.