DEVELOPMENT AND ANALYSIS OF THE NEW HASHING ALGORITHM BASED ON BLOCK CIPHER

This paper proposes the new hash algorithm HBC-256 (Hash based on Block Cipher), built on the symmetric block cipher CF (Compression Function). The algorithm is based on the wide-pipe construction, a modified version of the Merkle-Damgard construction. To transform the block cipher CF into a one-way compression function, the Davies-Meyer scheme is used, which, according to research results, is recognized as a strong and secure scheme for constructing hash functions based on block ciphers. The symmetric block cipher algorithm CF consists of three transformations (Stage-1, Stage-2, and Stage-3), which include modulo-2 addition, circular shift, and substitution boxes (four-bit S-boxes). The four substitution boxes are selected from the "golden" set of S-boxes, which have ideal cryptographic properties. The HBC-256 scheme is designed to strike an effective balance between computational speed and protection against preimage attacks. The CF algorithm uses an AES-like primitive as an internal transformation. The hash image was tested for randomness using the NIST (National Institute of Standards and Technology) statistical test suite, and the results were examined for the presence of an avalanche effect in both the CF encryption algorithm and the HBC-256 hash algorithm itself. The resistance of HBC-256 to near-collisions has been tested in practice. Since classical block cipher key expansion algorithms slow down the hash function, the proposed algorithm is adapted for hardware and software implementation by applying parallel computing. The developed hashing algorithm offers considerable freedom in selecting the sizes of the input blocks and the output hash digest. This makes it possible to create an almost universal hashing algorithm and use it in any cryptographic protocols and electronic digital signature algorithms.


Introduction
The rapid development of electronic devices, communications, and Internet technologies in recent decades has made almost instantaneous exchange of personal and collective data possible. Adversaries can relatively easily obtain huge amounts of confidential data through access to electronic sensors, computers, mobile terminals, and various social networks. This raises security issues in the use and transmission of data. Among the most important components of information security are encryption and hashing, the most widely used cryptographic methods for ensuring the confidentiality, integrity, and availability of data.
Hashing was originally used to check the integrity of messages but has now become widespread in computer science and programming to optimize critical data operations. The field of application of the hashing mechanism is extremely wide.
Modern secure hash algorithms are crucial for the integrity of data and confirmation of the authorship of information during its transmission and storage in infocommunication systems and general-purpose networks. Hash functions are used to perform authentication, verify the integrity of information, and protect data and files, including, in some cases, the detection of malicious software and much more. Hash functions compress inputs of arbitrary volume into short fixed-length values, which is why algorithms that can operate with concise values are very popular in the modern world of digital technologies. The hash mechanism is also used to reduce the time required to generate and verify a signature, as well as to reduce its length.
Hashing is also a fundamental transformation for blockchain technology, applied in areas such as financial transactions, user identification, and cybersecurity technologies. A blockchain is a connected chain of records called blocks. Each block contains its own hash value, the hash value of the previous block, and a timestamp, which together prevent an attacker from making changes to the data [1, 2].
The very first hash function was built around the DES (Data Encryption Standard) block cipher. Since then, many new hash functions have been developed using new constructions and design approaches. Conventionally, hash function constructions can be divided into three categories: hash functions based on block ciphers, hash functions based on arithmetic functions, and special hash functions.
Designed hash functions must be subject to rigorous security checks. When designing an efficient hash function based on block ciphers, it is recommended to use well-studied cryptographic transformations and constructions that allow their subsequent software, firmware, and hardware implementations. The intensive development of information technology capabilities, including computing power, contributes to the emergence of new and modification of existing attacks, which requires constant development and updating of protection systems.
Thus, the area of research under consideration is relevant. A comprehensive study of the block cipher components used in the development of hash functions, as well as their relevance to modern technologies, is necessary and requires continuous and breakthrough scientific research.

Literature review and problem statement
As is known, standards for the IT industry should be harmonized with international technical regulations, as our country is integrating into the global economy. In the area of information security, each state strives to develop its own national standards in the field of cryptography.
In 2015, the US federal standard FIPS 202, SHA-3 (the Keccak hash function), a variable-bit-length hashing algorithm, was approved and published. Keccak is based on the Sponge (cryptographic sponge) construction [3]. SHA-3 is one of the most widely used hash functions. The scientific community continues to study the strength of its latest version, since earlier members of the SHA family were broken or had vulnerabilities. The SHA-3 hashing process consists of two phases: absorbing and squeezing. In the first phase, each message block of a fixed length of r bits is added to the current state matrix and 24 rounds of the underlying function f are performed. In the second phase, the state matrix is truncated to the desired hash digest length by iteratively executing the function f.
Japan has the standard JIS X 5057-2:2003 (ISO/IEC 10118-2:2000) "Information technology. Security techniques. Hash-functions. Part 2: Hash-functions using an n-bit block cipher" [4]. The hash function of this standard is suitable for environments where an n-bit block cipher algorithm is already implemented. Since 2018, SHA-1 has been used as a standard JIS hash function.
In 2016, China approved the standard "GB/T 32905-2016 Information security technology. SM3 cryptographic hash algorithm". In 2017, SM3 was standardized by the International Organization for Standardization (ISO) [5, 6].
South Korea uses its own LSH hashing standard, developed in 2014. LSH is one of the cryptographic algorithms approved by the Korean Cryptographic Module Verification Program. The advantage of this algorithm is that it more than doubles the performance of the international standards (SHA-2/3) in various software environments. LSH is still protected from known hash attacks. LSH is collision-resistant for q < 2^(n/2) and has preimage resistance and second-preimage resistance for q < 2^n in the ideal cipher model, where q is the number of queries to the LSH construction [7].
The interstate standard GOST 34.11-2018 "Information technology. Cryptographic information protection. Hash function", prepared on the basis of the standard GOST R 34.11-2012 ("Streebog"), has been put into effect in the Russian Federation. The algorithm calculates a hash function with an input data block size of 512 bits and a hash code size of 256 or 512 bits. It uses a compression function based on three transformations: a nonlinear bijective transformation, a byte permutation, and a linear transformation (SPL) [8]. This standard has been adopted in Armenia, Kyrgyzstan, the Republic of Kazakhstan, and Tajikistan.
Since 2020, the new standard "STB 34.101.77-2020 Information technologies and security. Cryptographic algorithms based on the sponge function" has been in effect in Belarus [9]. The cryptographic hashing algorithm of this standard is based on the cryptographic sponge function.
In Republic of Kazakhstan, foreign cryptographic algorithms and standards are currently used in the existing electronic data protection systems. Since this poses a security risk, the creation of a domestic hashing algorithm to control the integrity of confidential information is an urgent task for our country. This work is targeted at the development of domestic information security systems and the creation of software and hardware packages for their practical use.
To date, standards for hash functions and cryptographic hashing algorithms have been adopted in many foreign countries, including the United States, Japan, China, Ukraine, South Korea, etc.
Republic of Kazakhstan uses international standards and mainly foreign hardware and software. The creation of domestic algorithms for cryptographic information protection, including hashing algorithms, is an urgent and necessary task.
The development of cryptographic primitives makes progress, and hash functions are used in many applications and on various platforms, which forces us to place high demands on their strength. In this regard, a lot of research is being carried out in the field of developing new and modifying existing hash algorithms.
The work [10] shows several second-preimage (pseudo-preimage) and collision attacks on the cryptographic hash functions Kupyna-256 and Kupyna-512. Since Kupyna uses the wide-pipe construction, it is difficult to mount a pseudo-preimage attack on it. The authors argue that there have not been many cryptanalytic studies of Kupyna. They demonstrate all known attacks on it and their qualitative and quantitative indicators. In addition, the paper emphasizes that the modular constant-addition operation provides additional resistance to the meet-in-the-middle attack.
In [11], a lightweight one-way cryptographic hash algorithm, LOCHA, was developed to create a hash digest of a fixed and relatively small length for power-constrained wireless networks. The focus is on making the algorithm light enough that nodes in networks such as WSNs (Wireless Sensor Networks) can run it with low power consumption. The use of simple mathematical operations such as the remainder of division (mod), modular addition, and two substitution tables of 97 and 67 primes ensures high performance in obtaining a 96-bit hash digest. Despite the simplicity of implementation, this algorithm is not limited in scope, as LOCHA has proven to be more secure than other strong hashing algorithms such as MD5 and SHA-1. However, over time, the reliability of such hash functions can decrease due to the small and static size of their hash digest.
The work [12] proposes a hash function model with scalable output. The model is based on an artificial neural network (ANN) trained to mimic the chaotic behavior of the Mackey-Glass time series. This hashing method can be used to check data integrity and generate a digital signature. This makes it possible to create cryptographic services according to user requirements and time constraints due to output scalability. The authors confirm that changing the ANN architecture, that is, adding neurons to the output layer or removing them, makes it possible to obtain hash digests of the desired length. The results of three independent tests confirm that the hashing algorithm on ANN satisfies all the requirements for a hash function that creates short-term hash digests.
The paper [13] considers a hashing algorithm, determined by a timestamp, for the secure distribution of data between vehicles. The proposed algorithm fulfills all the basic properties such as preimage resistance, collision resistance of a one-way hash function without a key.
One method to make cryptographic hash functions more resistant to future attacks is to combine hash functions. The work [14] analyzes hash combiners, such as the XOR combiner, the concatenation combiner, and Hash-Twice, which combine two or more hash functions. The paper presents several approaches for combining two or more hash functions that do not provide n-bit preimage resistance. Several attacks are described under which the second-preimage resistance of combined hash functions using concatenation and cascade methods of two n-bit hash functions falls short of n-bit security. The upper security bound for the indicated hash combiners is also determined, based on the best-known generic preimage and second-preimage attacks. The updated security status of the above hash combiners, following the authors' new results, is presented in tabular form. This shows that the security of most combiners is not as high as expected. As a result, given the basic security requirements, these combiners of two or more n-bit hash functions do not provide greater, sometimes not even n-bit, security. Therefore, the development of a single n-bit ideal hash function is still considered relevant.
An extended overview of the current state of security of hash functions is presented in the paper [15]. The work highlights the existing models and security aspects in the development of a compression function through a modular approach, which refers to the creation of a hash function based on a block cipher or permutation. This paper, which presents modern scientific views and the process of modular design, substantiates its relevance and demonstrates the key points in the development. The authors pose open problems of modular design and present ways to solve them.
The paper [16] describes a hash function developed on the basis of a block cipher. Using the Davies-Meyer mode, the authors built a new hash function, analyzed its collision resistance, and presented approaches for using k-fold hash input lengths. The developed hash function inherits the properties of a random oracle to a high degree. The developed double-length hash functions (DLHF) can be used on devices of limited size, since the block cipher used provides O(2^128) security.
The work [17] describes hashing modes (schemes) used to transform block ciphers into a compression function. The AES (Advanced Encryption Standard) block cipher is considered as a compression function, various hashing modes are investigated, several preimage attacks are carried out, and it is described in detail how to reduce the complexity of attacks by applying key neutral bits.
One of the significant problems in cryptography is ensuring resistance to multicollisions. This problem arose from the birthday attack, the initial answer to which was to double the length of the resulting hash value. This solution turned out to be inadequate given the available computing resources and the time constraints on hashing. Increasing the internal state width in the Wide Pipe design obviously degrades the performance of computing resources. In 2010, the Fast Wide Pipe modification [18] was proposed, which doubled the computational speed compared to Wide Pipe. Each internal state value is divided into two halves: the first half is fed to the input of the compression function, and the second is added to the result of the same iteration. However, this scheme requires additional computer memory, so research in this direction continues.
The hashing algorithms considered in [3-9] are the state standards developed by different countries in the field of IT technologies. Each state seeks to create its own reliable cryptographic standards, including those for hash functions. Kazakhstan has neither its own cryptographic standard nor its own standard for hashing data. Therefore, for Kazakhstan, the issue of creating its own hashing standard, which determines the algorithm and procedure for calculating the hash function of the transmitted information, is relevant. In this regard, comprehensive studies of existing hashing algorithms are being carried out, and work is underway to develop a domestic reliable hashing algorithm. The new hashing algorithm proposed in this paper can become a candidate for the state standard.
The papers [10-18] present the effective methods and structures of hash functions developed to date. Hashing algorithms for various purposes have been studied in detail, including lightweight hashing algorithms and hashing algorithms for blockchain technology. The results of various cryptographic attacks, including finding the first preimage and searching for collisions of the first and second kind, are analyzed. In these papers, various techniques have been applied to improve hashing performance. The difference of the algorithm proposed by us is that, to increase speed, we apply a nonlinear cryptographic primitive with a special principle twice in one round. In addition, when calculating each new byte, linear and nonlinear functions are performed alternately. This approach to building a hash function is not considered in other works and is characterized by increased computational performance without compromising the security of a hash function built on block ciphers.

The aim and objectives of the study
The aim of this work is to develop a fast and reliable hash function based on a symmetric block cipher algorithm, as well as to study and evaluate its reliability using cryptanalysis methods.
To achieve this aim, the following objectives were set:
- to develop a symmetric block encryption algorithm;
- to develop a hash algorithm that meets the basic requirements for cryptographic hash functions and provides high performance and flexibility in hardware-software implementation;
- to conduct a study and evaluate the reliability of the developed hash algorithm by methods of statistical and cryptographic analysis;
- to implement the developed hash algorithm in hardware and software.

Materials and methods
It is worth noting that designing a good hash algorithm is more difficult than designing a symmetric encryption algorithm. A cryptographic hash function is a mathematical algorithm that converts an arbitrary array of data into a fixed-length string [19]. The main requirement for cryptographic hash functions is that for any message represented in binary form, the value of the hash digest must be quickly and efficiently calculated. Besides, a high-quality hash function should have a number of properties [20,21]. The most convenient and popular hashing method involves dividing a message into blocks of a fixed length, after which these blocks are iteratively processed.
Currently, the most popular and security-oriented approach is to build hash functions based on block ciphers. In this approach, a block cipher is taken as the compression function, with two inputs representing a message block and a key [22]. The work [23] presents 64 possible PGV (Preneel, Govaerts, and Vandewalle) schemes for constructing hash functions based on an n-bit block cipher, where n is the block length in bits. Of the 20 collision-resistant PGV schemes, the most commonly used is the Davies-Meyer scheme:

y_i = E_{M_i}(y_{i-1}) ⊕ y_{i-1},

where y_{i-1} and M_i are the inputs of the compression function f, E is the block cipher, and y_i is its output [24]. To develop our HBC-256 hash algorithm, we use the Davies-Meyer scheme.
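The feed-forward structure of the Davies-Meyer mode can be sketched as follows. The block cipher below is a stand-in toy 64-bit permutation (an assumption for illustration, not the paper's CF cipher); only the chaining shape y_i = E_{M_i}(y_{i-1}) ⊕ y_{i-1} is taken from the text.

```python
# Sketch of the Davies-Meyer mode: h_i = E(key = M_i, h_{i-1}) XOR h_{i-1}.
# toy_cipher is a stand-in block cipher, NOT the CF algorithm of this paper.

MASK64 = (1 << 64) - 1

def toy_cipher(key: int, block: int) -> int:
    """Stand-in 64-bit block cipher (bijective in `block` for a fixed key)."""
    state = block
    for r in range(4):
        state = (state ^ key) & MASK64
        state = ((state << 7) | (state >> 57)) & MASK64       # rotate left by 7
        state = (state * 0x9E3779B97F4A7C15 + r) & MASK64     # odd-constant mix
    return state

def davies_meyer(message_blocks, iv: int) -> int:
    """Chain 64-bit message blocks into a 64-bit value, Davies-Meyer style."""
    h = iv
    for m in message_blocks:
        h = toy_cipher(m, h) ^ h   # feed-forward XOR makes the step one-way
    return h
```

The feed-forward XOR is what prevents simply decrypting backwards: even with the message block (the key) known, inverting one step requires finding a fixed-point-style relation rather than a single decryption.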
Currently, the most widespread assessment of the cryptographic strength of hashing and encryption algorithms is based on the methods of linear and differential cryptanalysis. The differential cryptanalysis technique tracks the change in the difference between the output bits depending on the change in the input bits at each round of transformation. It should be noted that the presence of the "avalanche effect" in an algorithm is a necessary condition for ensuring resistance to differential cryptanalysis [25, 26].
The following two criteria are usually used to analyze the avalanche effect:
- the avalanche criterion;
- the strict avalanche criterion.
The avalanche criterion requires that, on average, 50 % of the bits of the output sequence change when any single bit of the input sequence is changed, whereas the strict avalanche criterion requires that each particular bit of the output sequence changes with probability 1/2 when each particular bit of the input sequence is changed. They are estimated by the corresponding probabilities: k_i, the probability of changing half of the bits in the output sequence when the i-th input bit is changed, from which the avalanche parameter ε_α is computed, where i is the number of the modified bit in the input sequence; and k^s_{i,j}, the probability of changing the j-th bit of the output sequence when the i-th input bit is changed, compared to the output value for the unchanged input, where j is the number of the analyzed bit in the output sequence.
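The estimation of the probabilities k_i described above can be sketched as follows. The 16-bit mixing function f below is a stand-in (an assumption, not the CF cipher); the procedure of flipping each input bit and counting flipped output bits is what the criteria measure.

```python
# Sketch: estimating avalanche probabilities k_i for a bit-transformation f.
# f is a stand-in 16-bit mixer, NOT the CF cipher; only the measurement
# procedure matters here.

N = 16  # toy block width in bits

def f(x: int) -> int:
    """Stand-in 16-bit mixing function."""
    x = (x * 0x9E37 + 0x79B9) & 0xFFFF
    x ^= x >> 7
    x = ((x << 5) | (x >> 11)) & 0xFFFF
    return x

def avalanche_probabilities(x0: int):
    """k_i: fraction of output bits flipped when input bit i is flipped."""
    y0 = f(x0)
    ks = []
    for i in range(N):
        y = f(x0 ^ (1 << i))                  # flip input bit i
        flipped = bin(y ^ y0).count("1")      # Hamming distance of outputs
        ks.append(flipped / N)
    return ks

ks = avalanche_probabilities(0x1234)
# Under the avalanche criterion, every k_i should be close to 0.5.
```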
The hash digest h(M) for any message M of arbitrary length must satisfy the properties of pseudo-randomness. This is one of the main requirements for hashing algorithms: it should be difficult to distinguish a hash-based pseudo-random number generator from a true random number generator. For a hash digest to be considered random and unpredictable, at a minimum it must have no period, and the various bit combinations of a given length must be distributed evenly over its entire length. This requirement can be statistically interpreted as the complexity of the law governing the generation of the pseudo-random sequence by the hashing algorithm [27-29].
A hash function h is said to be collision-resistant if it is computationally infeasible to find any two inputs that map to the same hash digest for the given hash function. Collision attacks are carried out to find two different messages M_1 and M_2 with the same hash digests, h(M_1) = h(M_2). In the classical attack, unlike a preimage attack, the cryptanalyst does not deliberately select the hash value.
A hash function is said to be near-collision resistant if it is computationally difficult to find any two messages M_1 and M_2 such that their hash digests h(M_1) and h(M_2) differ by only a few bits for the given hash function [30].
To study the reliability and performance of the developed algorithm, we used its software and hardware-software implementation, written in C++ in the Qt Creator 4.15.2 integrated development environment with the Qt library version 5.15, as well as a software package for statistical analysis developed at the Institute of Information and Computational Technologies of the Committee of Science of the Ministry of Education and Science of the Republic of Kazakhstan.
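The near-collision notion above reduces to a Hamming-distance check on digests; a minimal sketch (the eps threshold and helper names are illustrative, not from the paper):

```python
# Sketch: an eps-near-collision check. Two equal-length digests form an
# eps-near-collision when their Hamming distance is at most eps bits.

def hamming_distance(h1: bytes, h2: bytes) -> int:
    """Number of differing bits between two equal-length digests."""
    assert len(h1) == len(h2)
    return sum(bin(a ^ b).count("1") for a, b in zip(h1, h2))

def is_near_collision(h1: bytes, h2: bytes, eps: int) -> bool:
    return hamming_distance(h1, h2) <= eps

# Example: two 256-bit digests differing in a single bit.
d1 = bytes(32)                 # all-zero 256-bit digest
d2 = bytes([1]) + bytes(31)    # same digest with one low bit flipped
```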

1. Development of the new encryption algorithm
1. 1. Encryption algorithm scheme
The CF encryption algorithm belongs to the class of symmetric block ciphers with a block and key length of 128 bits. The algorithm uses both linear (modulo-2 addition, cyclic left shifts) and nonlinear (four substitution S-boxes) transformations. The cipher structure is a variant of a substitution-permutation network (SP-network) with four rounds (R_1 = 4) [32]. One round of encryption consists of three transformations, called Stage-1, Stage-2, and Stage-3, and is shown in Fig. 1.
The values of the input text A(a_0, a_1, a_2, …, a_15) are written as the 4×4 square matrix A:

a_00 a_01 a_02 a_03
a_10 a_11 a_12 a_13
a_20 a_21 a_22 a_23
a_30 a_31 a_32 a_33

Stage-1 transformation. This transformation, which consists of two steps, is used to obtain from the given matrix A a new matrix of the same size.
Step 1. The intermediate value c_ij of the matrix A is calculated by adding the element a_ij modulo 2 to the remaining three elements of the i-th row and the three elements of the j-th column.
Step 2. At this step, the intermediate value c_ij passes through the substitution S-box (the SBOX procedure), and the result is stored in the same place as the new value of the matrix A.
The Stage-1 transformation, consisting of Steps 1 and 2, can be written as:

c_ij = a_i0 ⊕ a_i1 ⊕ a_i2 ⊕ a_i3 ⊕ (⊕Σ_{k≠i} a_kj),  a_ij = SBOX(c_ij),

where c_ij is the intermediate value of the matrix A, SBOX is the substitution S-box, and ⊕Σ denotes the sum of terms modulo 2.
The principle of SBOX operation is shown in Fig. 2. The input is one byte a_ij of the matrix A, which has the binary representation a_ij = (b_7 b_6 b_5 b_4 b_3 b_2 b_1 b_0)_2.
The S-boxes perform the replacement at the nibble (quadbit) level: the left nibble t_1 = b_7 b_6 b_5 b_4 and the right nibble t_0 = b_3 b_2 b_1 b_0 (written in binary). According to the table, p_1 = S_i(t_1) and p_0 = S_j(t_0) are determined; the indices i and j of the matrix element correspond to the numbering of the S-boxes. The nibbles obtained through the i-th and j-th S-boxes are then combined into a byte with the nibbles swapped, i.e. p_1 is stored in the right nibble and p_0 in the left. The resulting byte is sent to the output: a_ij = (q_7 q_6 q_5 q_4 q_3 q_2 q_1 q_0)_2.
Stage-2 transformation. This transformation consists of two operations: a cyclic shift and XOR. The elements of the matrix A obtained in Stage-1 are stored as a one-dimensional array (a_00, a_01, a_02, a_03, a_10, a_11, a_12, a_13, a_20, a_21, a_22, a_23, a_30, a_31, a_32, a_33). All the elements of the array are treated as bytes, and their bit representations are combined using the concatenation operator: W = a_00‖a_01‖a_02‖a_03‖a_10‖a_11‖…
Stage-3 transformation. This transformation is similar to the Stage-1 transformation: the same two-step procedure is performed on the matrix A. The difference is that the elements of the matrix are calculated from bottom to top and from right to left.
This transformation, consisting of Steps 1 and 2, is written similarly to the previous one, with the elements processed in the reverse order. At the end of each round, the values obtained after the Stage-3 transformation are summed modulo 2 with the round key values.
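The Stage-1 step described above can be sketched as follows. The 4-bit S-box tables here are placeholders (an assumption, NOT the paper's "golden" S-boxes), and the in-place, row-major traversal order is assumed from the phrase "stored in the same place".

```python
# Sketch of Stage-1 on a 4x4 byte matrix: c_ij = a_ij XOR (other 3 bytes of
# row i) XOR (other 3 bytes of column j), then the nibble-swapping S-box
# substitution. SBOXES are placeholder bijections, not the real ones.

SBOXES = [[(x * 7 + s) % 16 for x in range(16)] for s in range(4)]

def sbox_byte(i: int, j: int, byte: int) -> int:
    """Substitute both nibbles via S-boxes i and j, then swap the nibbles."""
    t1, t0 = byte >> 4, byte & 0x0F        # left and right nibble
    p1, p0 = SBOXES[i][t1], SBOXES[j][t0]
    return (p0 << 4) | p1                  # p1 goes right, p0 goes left

def stage1(a):
    """One Stage-1 pass over matrix a (4 lists of 4 bytes), in place."""
    out = [row[:] for row in a]
    for i in range(4):
        for j in range(4):
            c = 0
            for k in range(4):
                c ^= out[i][k]             # whole row i (a_ij counted once)
            for k in range(4):
                if k != i:
                    c ^= out[k][j]         # column j without a_ij
            out[i][j] = sbox_byte(i, j, c & 0xFF)
    return out
```

Note that XORing the whole row plus the column without a_ij yields exactly "a_ij plus three row elements plus three column elements", since a_ij appears once in its own row.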

1. 2. Round key schedule algorithm
This section discusses the CFKey algorithm for deriving round keys from a master key K(k_0, k_1, k_2, …, k_15) with a length of 16 bytes. We assume that the master key K is the round key K_0. The total number of round keys equals the number of rounds R_1 of the encryption algorithm. The values of the round key K_0(k_0, k_1, k_2, …, k_15) are stored in a 4×4 matrix A in the following form:

a_00 a_01 a_02 a_03     k_0  k_1  k_2  k_3
a_10 a_11 a_12 a_13  =  k_4  k_5  k_6  k_7
a_20 a_21 a_22 a_23     k_8  k_9  k_10 k_11
a_30 a_31 a_32 a_33     k_12 k_13 k_14 k_15.  (3)

The CFKey key schedule algorithm consists of the StageKey-1, StageKey-2, and StageKey-3 transformations. The round key schedule algorithm is shown schematically in Fig. 3.
Note that the CFKey algorithm is functionally very similar to the CF algorithm: the StageKey-1 and StageKey-3 transformations are completely identical to the Stage-1 and Stage-3 transformations, respectively. The difference lies in StageKey-2: this transformation consists of only one operation, a one-bit cyclic left shift. There is no XOR operation in StageKey-2.
The CFKey algorithm is repeated R_2 = 8 times, after which the resulting 16-byte value is added modulo 2 (XOR) to the round key K_{i-1} to form the next round key K_i, where i = 1, …, R_1.
To enhance collision resistance, we use the Davies-Meyer scheme, in which the CF output is summed (XOR operation) with the result h_{i-1} of the previous hashing iteration; h_i is the result of the i-th iteration of the Davies-Meyer compression function. This scheme is used in hash algorithms based on block ciphers and acts as a one-way compression function.
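The shape of the round-key derivation loop (R_2 = 8 inner iterations, then an XOR with the previous round key) can be sketched structurally. Only the one-bit rotate of StageKey-2 and the loop counts are taken from the text; the rest of the round function below is a stand-in mixer, not the real StageKey-1/3.

```python
# Structural sketch of the CFKey schedule. stagekey_round is a placeholder
# for StageKey-1/2/3; only the loop structure follows the description.

R1, R2 = 4, 8                      # rounds of CF / inner CFKey iterations
MASK128 = (1 << 128) - 1

def stagekey_round(state: int) -> int:
    """Placeholder round: the 1-bit left rotate is StageKey-2, the
    multiply-add is a stand-in for StageKey-1/3 mixing."""
    state = ((state << 1) | (state >> 127)) & MASK128
    state = (state * 0x2545F4914F6CDD1D + 0x1B) & MASK128
    return state

def cfkey_schedule(master_key: int):
    """Derive round keys K_1..K_R1 from K_0 = master key."""
    keys = [master_key & MASK128]
    for _ in range(R1):
        state = keys[-1]
        for _ in range(R2):
            state = stagekey_round(state)
        keys.append(state ^ keys[-1])   # XOR with the previous round key
    return keys
```

Because each K_i depends only on K_{i-1}, all round keys can be precomputed once per message block, which is what makes the schedule friendly to the parallel hardware implementation described later.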

h = h^1 ‖ h^2 ‖ h^3. (4)
Formula (4) can be represented as Table 2.

Table 2. Byte permutation (x are byte positions, starting from 0)

After processing the last block M_{t-1}, the final hash digest h of length 256 bits is determined from the obtained 384-bit hash value using the ComF (Compression Function).
The order of padding. The HBC-256 algorithm iteratively processes 384-bit blocks of the input message M. If the length of M is a multiple of 384, one more 384-bit block is appended to M, consisting of zero bits except for the first and last bits, which are equal to one. If the length of M is not a multiple of 384, M is padded so that its length becomes a multiple of 384. Suppose the length of the input message M is l bits and is not a multiple of 384. We append the bit "1" to the end of M, then s zero bits, where s is the smallest non-negative integer such that l + 2 + s ≡ 0 (mod 384), and finally the last bit "1".
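The padding rule above can be sketched directly (the bit-list representation is an illustration; the real implementation works on bytes):

```python
# Sketch of the HBC-256 padding rule: append a "1" bit, s zero bits, and a
# final "1" bit so that the total length is a multiple of 384. A message
# whose length is already a multiple of 384 gains one full 1 0...0 1 block.

BLOCK = 384

def pad_bits(bits):
    """bits: list of 0/1 values. Returns the padded bit list."""
    l = len(bits)
    s = (-(l + 2)) % BLOCK    # zeros so that l + 2 + s = 0 (mod 384)
    return bits + [1] + [0] * s + [1]
```

For an empty message this yields exactly one 384-bit block whose first and last bits are one, matching the multiple-of-384 case in the text.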
The order of division into parts. For hashing, the padded message M‖Pad(M) is divided into t blocks of 384 bits each: M‖Pad(M) = M_0‖M_1‖M_2‖…‖M_{t-1}. Pad is short for "padding".
The hashing process is performed iteratively according to the scheme in Fig. 1 with input blocks M_r of length 384 bits, r = 0, 1, …, t-1.
Obtaining a hash digest. The final hash digest is determined through the ComF procedures. In our case, the values of the first and second blocks are taken as the final hash digest, whose length is 256 bits.
Table 3 presents the complexity of attacks for each of the three problems at probability p = 0.5. Here, k is the minimum number of distinct hashed inputs required for the attack, and N is the number of possible hash digests for the given digest length.
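The kind of k-versus-N estimate behind such a table can be sketched with the standard birthday-bound approximation (this reproduces the generic collision bound, not the paper's specific table values):

```python
# Sketch of the birthday-bound estimate: for success probability p, finding
# a collision needs roughly k = sqrt(2 * N * ln(1/(1-p))) hashed messages
# out of N = 2^n possible digests; for p = 0.5 this is ~1.1774 * sqrt(N).

import math

def birthday_k(n_bits: int, p: float = 0.5) -> float:
    """Approximate number of messages for a collision with probability p."""
    N = 2.0 ** n_bits
    return math.sqrt(2.0 * N * math.log(1.0 / (1.0 - p)))

k256 = birthday_k(256)   # about 1.1774 * 2^128 for a 256-bit digest
```

This is why a 256-bit digest is said to offer roughly 128-bit collision resistance, while preimage and second-preimage attacks remain at the ~2^256 level.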

3. 1. Assessment of the "avalanche effect" of the hash algorithm
The analysis of the propagation and realization of the avalanche effect after the 1st, 2nd, and 4th rounds was carried out according to the CF encryption algorithm scheme. The results after the 4th round are presented in Table 3. As an example, the 128-bit message M_0 = 0xcc156c4ce024d5113d680d7cce6d8b2 was selected for the analysis. For 1 ≤ i ≤ 128, 128 plaintexts M_i differing from M_0 in one bit were generated as follows: M_i = M_0 ⊕ (1 << (i-1)). After applying CF to these 129 messages M_i (i = 0, 1, …, 128), the corresponding 128-bit ciphertexts C_i were obtained. Then the probabilities k_i (i = 1, 2, …, 128) between the ciphertext C_0 and the remaining 128 ciphertexts were calculated. Table 4 gives the calculated probabilities k_i.
Next, we consider the avalanche effect of the HBC-256 hashing algorithm. The value M_0 = 0^384 is taken as the 384-bit original message. To analyze the avalanche effect of the HBC-256 algorithm, 384-bit messages M_i differing from M_0 in one bit were generated in the same way. Table 5 shows the dynamics of the statistical indicators of the avalanche parameter ε_α depending on the number of hashing rounds.

Table 5. Statistical indicators of the avalanche parameter ε_α of the hash algorithm

3. 2. Statistical analysis of the algorithm
To assess the randomness, we checked the hash digests using the software package "Automated system for the selection of statistical tests by D. Knuth and graphic tests", which implements a set of statistical tests [35]. For this, 60 files of different formats were selected, each of which contained from 20 to 1,000 KB of information. Data on files for analysis are presented in Table 6.
After processing each file with the HBC-256 algorithm, a corresponding 256-bit hash digest was obtained. Graphical and evaluation statistical tests were applied to the resulting 60 files with hash digest sequences. In graphical tests, the statistical properties of hash digests are displayed as graphical dependencies; in evaluation tests, the statistical properties are determined by numerical characteristics. Based on these data, a conclusion is made about whether a test is passed. Fig. 6, 7 show the number of files that successfully passed the graphical and evaluation tests.
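As an illustration of the kind of evaluation test applied to a digest sequence, the simplest NIST-style check, the monobit (frequency) test, can be sketched as follows; this is only one test of the suite used above, not the full package.

```python
# Sketch of the NIST SP 800-22 frequency (monobit) test: checks that ones
# and zeros in a bit sequence are roughly balanced.

import math

def monobit_p_value(bits):
    """p-value of the monobit test for a 0/1 sequence (large => random-looking)."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)     # +1 for one, -1 for zero
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

balanced = [0, 1] * 128    # perfectly balanced 256-bit sequence
skewed = [1] * 256         # constant sequence, clearly non-random
```

A sequence passes at significance level 0.01 when the p-value exceeds 0.01; a digest failing even this test would be a strong sign of statistical bias.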


3. 3. Near-collision resistance
To check the degree of resistance to near collisions, the Hamming distances between pairs of hash digests were analyzed (Table 6 lists the plaintext files used in testing the hashing algorithm).

4. 1. Software implementation
The ISL_HASH 1.0 program is designed to obtain a hash image of data of arbitrary length. The input data is the content of any file stored on an external storage medium, or text entered through the on-screen form. The output is displayed on the screen and can be saved as a "*.hash" file. The program is implemented in C++ and requires no pre-installation to run. The resulting hash image is displayed as hexadecimal numbers. Fig. 8 shows the working window of the ISL_HASH 1.0 data hashing program, in which the "2015-856.pdf" file is hashed using the HBC-256 algorithm.
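The near-collision check described above reduces to computing pairwise Hamming distances between hash digests. A minimal sketch follows, with SHA-256 standing in for HBC-256; the message set is illustrative.

```python
# Sketch of the near-collision check: compute Hamming distances between
# all pairs of digests and flag any pair closer than a threshold.
# SHA-256 stands in for HBC-256 here (illustration only).
import hashlib
from itertools import combinations

NEAR_COLLISION_BITS = 16  # threshold for a near collision [37]

def digest_bits(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big")

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

digests = [digest_bits(bytes([i])) for i in range(64)]
distances = [hamming(a, b) for a, b in combinations(digests, 2)]
near = [d for d in distances if d <= NEAR_COLLISION_BITS]
print(min(distances), max(distances), len(near))
```

For a well-behaved 256-bit hash, the pairwise distances cluster near 128 bits, far above the 16-bit near-collision threshold, which mirrors the 108-148 bit range reported for HBC-256.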
The following are the main technical characteristics of the ISL_HASH 1.0 data hashing program:
– Program type: 32-bit GUI application.

4. 2. 1. Hardware-software platform and implementation technology choice
The MYIR Z-turn development board was chosen for the implementation. This board is equipped with a Xilinx Zynq XC7Z020 system-on-chip (hereinafter SoC), a high-speed USB OTG interface chip, 1 GB of RAM, and a 16 MB NAND flash memory chip.
The SoC includes:
– an Artix-7 field-programmable gate array (hereinafter FPGA);
– a microprocessor with a Cortex-A9 core.
The program code of the HBC-256 hashing algorithm for the Cortex processor was written in the C programming language with inline assembly.
The FPGA design was implemented in the VHDL hardware description language.

4. 2. 2. Working principle of the Product
The Cortex processor is designed to implement the functions of interacting with a PC, supporting the USB interface, and controlling the FPGA, on which the hardware-software implementation of the HBC-256 algorithm is performed.
Power supply and data exchange with the PC are carried out via the USB interface. On start-up, a connection to the PC is established in Mass Storage Device (MSD) mode. Fast RAM is used as the drive for storing data, with a 512 MB area allocated for this purpose. The Cortex processor continuously scans this memory area, organized under the FAT file system, for new files. After detecting a new file copied by the operating system, the processor sends its data blocks to the FPGA over the internal AXI bus using Direct Memory Access (DMA). Having received the next data block, the FPGA performs the transformation in accordance with the description of the HBC-256 algorithm. When the transfer of blocks is complete, the processor reads the result of the hash algorithm from the FPGA and creates, in the data storage area, a new file whose name matches the name of the source file with the additional extension ".hash".
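The driver loop described above (detect a file, stream it to the hashing engine in 384-bit blocks, then write a report with debug information) can be mimicked by the host-side sketch below. Here hashlib.sha256 stands in for the FPGA implementation of HBC-256, and in-memory streams replace the real file system.

```python
# Host-side sketch of the driver loop: stream a file to a hashing
# engine in fixed-size blocks, then emit a report with debug info.
# hashlib.sha256 stands in for the FPGA implementation of HBC-256.
import hashlib
import io
import time

BLOCK_SIZE = 48  # one 384-bit input block

def hash_stream(stream, out):
    engine = hashlib.sha256()          # stand-in for the HBC-256 core
    total, blocks = 0, 0
    start = time.perf_counter()
    while chunk := stream.read(BLOCK_SIZE):
        engine.update(chunk)           # one "DMA transfer" per block
        total += len(chunk)
        blocks += 1
    elapsed = time.perf_counter() - start
    out.write(f"digest: {engine.hexdigest()}\n")
    out.write(f"size: {total} bytes, blocks: {blocks}\n")
    out.write(f"time: {elapsed:.6f} s\n")
    return total, blocks

data = io.BytesIO(b"\x00" * 100)       # a 100-byte "file"
report = io.StringIO()
total, blocks = hash_stream(data, report)
print(total, blocks)  # 100 bytes are consumed as 3 blocks (48 + 48 + 4)
```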
Additional debugging information is also written to this file: the size of the source file, the number of blocks, the time of the hashing operation, and the speed of the transformation.

4. 2. 3. Debug board resource statistics
The Cortex processor operates at 667 MHz and the FPGA at 150 MHz. The total power consumption of the board is about 0.3 W. The design occupies 2,370 FPGA logic cells and takes 32 clock cycles to transform one 384-bit block (Table 7).
Table 7 also shows the hashing speed for 5 files of different sizes.

6. Discussion of the hash algorithm

6. 1. Discussion of the developed symmetric block encryption algorithm
The peculiarity of using block ciphers in hashing algorithms is that the security of the hashing algorithm depends directly on the cryptographic strength of the underlying cipher: with a strong block cipher, the security of the hashing algorithm is guaranteed. One disadvantage of this approach, however, is reduced speed, because during hashing the round keys are regenerated iteratively for each portion of data, i.e. round encryption keys are produced continuously. The developed CF encryption algorithm uses byte-wise data processing, which improves its performance. Alternating linear and non-linear transformations within one round (addition of matrix elements modulo two and the substitution table, S-box) provides the diffusion property of the cipher. The proposed algorithm is characterized by the following features:
1. The number of rounds of the block cipher is reduced, without prejudice to its cryptographic strength, in order to increase the performance of the developed hashing algorithm. The non-linearity of the transformations at Stage-1 and Stage-3 is ensured by executing the S-box twice in one round.
2. The structure of the algorithm provides for the simultaneous execution of several 128-bit CF compression functions, which, using parallel computing, also speeds up the hashing process. In this paper, we consider the case k=3, i.e. three 128-bit blocks of hashable data are processed simultaneously. As the technical characteristics of processors improve, the number of simultaneously processed blocks can be increased.
3. Instead of traditional 8-bit S-boxes, four 4-bit S-boxes are used. This approach gives the algorithm another advantage: depending on the arrangement of the matrix elements, identical input values take on different values at the output.
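Feature 3 can be sketched as follows. The four 4-bit permutation tables below are illustrative placeholders (the first is the well-known PRESENT S-box), NOT the "golden" S-boxes selected in [33], and the position-dependent selection rule is likewise an assumption made only for this sketch.

```python
# Sketch: substituting a byte via two 4-bit S-boxes (high and low
# nibble), where the pair of S-boxes used depends on the byte
# position. Tables are placeholders, NOT the "golden" set from [33].
S = [
    [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2],
    [0x6, 0xB, 0x5, 0x4, 0x2, 0xE, 0x7, 0xA, 0x9, 0xD, 0xF, 0xC, 0x3, 0x1, 0x0, 0x8],
    [0xB, 0xF, 0x7, 0x2, 0x6, 0xC, 0x5, 0x0, 0x8, 0xD, 0xA, 0xE, 0x3, 0x9, 0x1, 0x4],
    [0x1, 0xA, 0x4, 0xC, 0x6, 0xF, 0x3, 0x9, 0x0, 0xD, 0xB, 0x7, 0x5, 0xE, 0x8, 0x2],
]

def sub_byte(x: int, pos: int) -> int:
    """Substitute one byte; which pair of S-boxes is used depends on
    the byte position, so the same input byte at different positions
    maps to different outputs."""
    hi, lo = x >> 4, x & 0xF
    s_hi = S[(2 * pos) % 4]
    s_lo = S[(2 * pos + 1) % 4]
    return (s_hi[hi] << 4) | s_lo[lo]

# The same byte substituted at two positions gives different results.
print(hex(sub_byte(0x53, 0)), hex(sub_byte(0x53, 1)))
```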

6. 2. Discussion of the developed hash algorithm
Typically, the design of a hash function uses the wide-pipe construction to resist multicollision attacks. The essence of this construction is to increase the size of the internal state, which makes the search for multiple collisions resource-intensive. However, this scheme requires additional memory. In the proposed algorithm, this shortcoming is mitigated by the fact that in one hashing cycle the CF algorithm is executed k=3 times for different m_j, j = 0, 1, 2. The general scheme of the developed HBC-256 hash algorithm is illustrated in Fig. 4. Accordingly, the length of the intermediate hash image w is 128×3 = 384 bits. By adjusting the parameter k, performance can be improved. The flexibility for optimization, the possibility of parallel computing in a hardware implementation, and the achievement of an optimal balance between resources and performance should also be noted.

6. 3. Discussion of the HBC-256 hash algorithm security study
When evaluating the security of any hash function, three problems are examined [36]: finding a preimage, finding a second preimage, and finding a collision. In relation to the HBC-256 algorithm, these problems are specified by the following parameters. The length of the HBC-256 hash digest is n=256 bits, so the number of all possible hash digests is N=2^256. For each of the three problems, we define k as the minimum number of hash computations required to succeed with probability p=0.5. Table 3 presents the complexity of the attack for each problem.
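The textbook cost estimates behind such a table can be reproduced directly: for a single target, success probability 0.5 requires about N·ln 2 evaluations, while the birthday bound for collisions gives about sqrt(2N·ln 2). These are the standard estimates, not values taken verbatim from the paper's table.

```python
# Sketch of the attack-cost estimates for N = 2**256 possible digests:
# number of evaluations k needed to succeed with probability p = 0.5.
import math

N = 2 ** 256

# Preimage / second preimage: p = 1 - (1 - 1/N)**k = 0.5  =>  k ~ N*ln2
k_preimage = N * math.log(2)

# Collision (birthday bound): p ~ 1 - exp(-k**2 / (2N)) = 0.5
#   =>  k ~ sqrt(2 * N * ln2), roughly 1.18 * 2**128
k_collision = math.sqrt(2 * N * math.log(2))

print(math.log2(k_preimage))   # about 255.47, i.e. on the order of 2**256
print(math.log2(k_collision))  # about 128.24, i.e. on the order of 2**128
```

So for n = 256, (second) preimage attacks cost on the order of 2^256 evaluations and collision search on the order of 2^128, which is the reason the wide internal state is kept larger than the digest.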
First, we discuss the analysis of the avalanche effect of the CF encryption algorithm. As is known, the avalanche parameter varies in the range from 0 to 1, inclusive; the closer its value is to zero, the stronger the avalanche effect of the encryption algorithm. The experiment (Table 4) showed that 98.5 % of the k_i values (probabilities) for the considered rounds lie in the interval (0.41; 0.59). The average of all changes is 49.93 %; therefore, changing one bit of the input yields about 50 % changes in the output. The analysis showed that the average values of the avalanche parameter ε_α for rounds 1, 2, and 4 are 0.074, 0.071, and 0.073, respectively. The algorithm's avalanche effect is high even after the first round of encryption. For the purity of the experiment, the avalanche criterion was also applied after the 8th, 16th, and 24th rounds of encryption, which confirmed the necessary degree of propagation of the avalanche effect of the CF algorithm.
Next, we consider the analysis of the avalanche effect of the HBC-256 hashing algorithm. We examined the hashing results after the 1st, 2nd, 4th, 8th, and 12th rounds. After the first round, the average avalanche parameter of 0.66 was the worst result. However, owing to the strong avalanche effect of the CF algorithm, an acceptable level of bit diffusion is observed starting from the 2nd round of hashing. As can be seen in Table 5, the hash function after the 1st round does not provide the required degree of the avalanche effect: depending on the location of the changed bit, its values lie in the interval (0.594; 0.724), which is far from 0. But after the 2nd and subsequent rounds, the statistical indicators take almost identical values, i.e. the range of their deviation from each other is very narrow.
The statistical indicators of the avalanche parameter ε_α of the HBC-256 algorithm given in Table 5 yield positive results in evaluating the effectiveness of the algorithm. From Fig. 5, we can conclude that a change in one bit of the input data changes about 50 % of the bits of the 384-bit hash state. Therefore, the obtained results k_i are positive and, accordingly, the HBC-256 hash algorithm meets the requirements of the avalanche criterion.
Next, we consider the results of graphical and evaluation statistical tests in Fig. 6, 7. During the study, depending on the type of file, different results were obtained for different tests. From the evaluation of the results, it can be argued that the resulting hash digests are statistically secure. Thus, the HBC-256 hashing algorithm under consideration has good statistical properties.
Here we discuss the results of the analysis of near-collision resistance. The analysis established that the number of pairs of hash digests with a Hamming distance between 108 and 148 is almost 99 % of all possible pairs. For a near collision, the Hamming distance between two hash digests should be small, namely up to 16 bits [37]. Hence, the hash digests are protected from near-collision attacks, and, according to the results of the analysis, the HBC-256 algorithm is resistant to attacks based on near collisions.

6. 4. Discussion of software and hardware-software implementation of the developed hashing algorithm

6. 4. 1. Software implementation
To conduct a comparative analysis of the results of the developed HBC-256 hashing algorithm, the following two hashing algorithms based on block ciphers were considered:
1) The GOST R 34.11-2012 Streebog cryptographic hash function, which in 2013 was adopted as a state standard of the Russian Federation. For the analysis, the variant of the algorithm with a 256-bit hash image was chosen.
2) The MGR cryptographic hash function proposed by the Indian scientists Khushboo Bussi, Dhananjoy Dey, and others [38]. According to the authors, this hash function is a modification of the Streebog algorithm, in which an AES-like block cipher is used as the compression function.
A comparative analysis of the Streebog, MGR, and HBC-256 algorithms was carried out in terms of their efficiency. All time measurements were performed on a PC with an Intel(R) Core i7-8700 processor at 2.90 GHz and 4 GB of RAM. From Table 8, it can be seen that the software implementation of the HBC-256 algorithm showed the best performance compared to the Streebog and MGR algorithms.

6. 4. 2. Hardware-software implementation (Product)
From Tables 7, 8 it can be seen that the hardware-software implementation of the HBC-256 algorithm showed considerably better performance than our software implementation.
The hardware-software implementation can compete with analogs performing hash transformations under existing algorithms: in terms of execution speed and the number of FPGA resources, the Product is commensurate with or surpasses them. At the same time, since a commercially available rather than custom-made development board was chosen as the hardware platform, several parameters of the Product can still be improved, namely:
– reducing the overall dimensions;
– replacing the SoC with a less capable one (less than 10 % of the FPGA resources are used, and one Cortex processor core is disabled), which would reduce power consumption;
– optimizing and parallelizing the hardware implementation to achieve an optimal balance of FPGA resources and performance;
– increasing the amount of RAM to allow processing of larger files.

6. 5. Limitations and further theoretical and practical studies of the hash algorithm
In practical use, the developed hashing algorithm is not subject to significant restrictions. The results obtained during the assessment of reliability and speed showed that the developed hash function fully complies with the main requirements. It was noted that, taking into account modern technological capabilities, the length of the hash digest can be increased. With regard to the amount of hashed information, the parameter k indicated in Fig. 4 should be taken into account: the amount of information to be hashed should be more than 16(k-1) bytes. In our case, for k=3, the amount of hashed information must be at least 32 bytes; otherwise, the round keys of the last part are not used. Research in this direction will be continued.
A theoretical study of the four 4-bit S-boxes presented in the paper is required. We have considered the first four S-boxes with good cryptographic properties indicated in [33]. Future studies will analyze the influence of the selected four S-boxes on each other since the question of the independence of the choice of S-boxes remains open. Since the S-box is the only cryptographic primitive that provides non-linearity in the algorithm, it should not have any weaknesses.
In future research, the security of the developed hashing algorithm should be analyzed at a deeper level. It is planned to carry out a number of cryptographic attacks, as well as differential and linear cryptanalysis. The results of these studies will be used to improve the proposed hashing algorithm.

Conclusions
1. As is commonly known, hash functions are built according to an iterative scheme with a number of transformations performed at each step. The transformations include a compressing function, the role of which can be performed by a block cipher. To implement such a scheme, the authors developed a new CF algorithm. Theoretical and experimental tests have shown that the algorithm fully complies with the basic cryptographic requirements. It is assumed that the study of the cryptographic strength of the CF encryption algorithm will be continued in subsequent works.
2. In this paper, we propose a security-oriented hash algorithm HBC-256 based on the CF block cipher. The compression function is based on a Merkle-Damgard construct using a wide-pipe modification that is not susceptible to length-extension attacks. In order to turn the block cipher CF into a one-way compression function, the Davies-Meyer scheme is applied. The scheme of the algorithm is built in such a way as to increase performance through parallel computing by manipulating the parameter k, the number of parts, from 3 to 8, depending on the amount of hashed data. The next stage of work will be a further study of the reliability of the proposed HBC-256 algorithm using other cryptanalysis methods and collision search attacks.
3. The hash digest was tested for randomness using the NIST statistical test suite. From the results obtained, it was found that the binary sequence generated by the proposed algorithm is close to random. The results were also examined for the presence of an avalanche effect in the CF encryption algorithm and in the HBC-256 hash algorithm itself. Based on the tests and studies carried out, it has been established that the CF encryption algorithm, and therefore the hashing algorithm itself, provides a good avalanche effect. The paper presents the statistical indicators of the avalanche parameter ε_α, which show acceptable results. The resistance of HBC-256 to near collisions has been practically tested; thus, the HBC-256 algorithm is resistant to attacks based on near collisions. The reliability of the algorithm against various other attacks is currently being studied.
4. The structure of the algorithm makes it possible to increase its performance in hardware-software implementation. In addition, the algorithm can be efficiently implemented in hardware.