Design and Implementation of a Distributed System Using an Orchestrator Based on the Data Flow Paradigm

The object of this research is distributed systems controlled by an orchestrator based on the data flow control paradigm, as well as microservice management methods. One of the most problematic aspects of modern distributed systems is the choice of a method for controlling the logic of microservices and the processes of interaction between them. The existing concepts of microservice orchestration and choreography do not allow the load to be used and distributed evenly throughout the system, which is primarily due to the heterogeneous nature of the distributed environment.

As part of the research, a concept of hybrid orchestration based on the data flow control paradigm is proposed. In this approach, the orchestrator is used only to initiate a «wave» of calculations over the tree of microservices, while the microservices themselves are responsible for the further calculation and dissemination of data. Unlike other approaches, it combines the stronger qualities of orchestration: simple and understandable system management at each stage of calculation, and coordinated microservice actions. The use of a specialized hybrid orchestrator also eliminates one of the main drawbacks of classical orchestration, reducing the responsibility and the computational burden placed on the orchestrator of the distributed system and on its computation nodes. In an experiment with a distributed system driven by an orchestrator based on the data flow control paradigm, a several-fold decrease in the load on the orchestrator was achieved. This makes it possible to build a distributed system from microcontrollers such as the ESP8266 and ESP32, or single-board computers such as the Raspberry Pi. Such devices can act not only as the orchestrator, but also as data-flow nodes.
At the same time, the data flow control paradigm allows the load on the system to be distributed evenly and efficiently, because the input data of the system are presented in the form of a computational graph in which each node is a separate microservice.


Introduction
The data flow control paradigm appeared in the early 1970s and has found application in many aspects of distributed systems. Within this paradigm, a computational problem takes the form of a directed graph, where the nodes represent operations on the data and the connections between the nodes show the incoming and outgoing points for calculations. Meanwhile, the architecture of modern microservice systems has reached the level where the number of services amounts to several tens or even hundreds of microservices [1]. For such systems, it is customary to apply more intelligent approaches when setting up the coordinated interaction of components, as well as when controlling them [2-4].
In the case of data flow control systems [5], as well as systems based on the data flow paradigm [6, 7], the issue of orchestration or control of microservices becomes even more relevant. Indeed, the operation of the entire system depends on how communication and interaction between the components are carried out. Convoluted communication in a system with many components slows down the system itself, as well as the development and debugging of such a system. This paper presents an implementation of an orchestrator based on the data flow control paradigm, as well as a way to build a microservice system using this orchestrator. Thus, the object of research is distributed systems controlled by an orchestrator based on the data flow control paradigm, as well as microservice management methods, and the aim of research is to create an orchestrator based on the data flow control paradigm for a distributed system.

Methods of research
To achieve this aim, a study of existing solutions for microservice orchestration using the data flow paradigm is conducted. Among the analyzed solutions, it is worth highlighting the following:
- Netflix Conductor [8]. It uses a central management node to manage system services. All calculation operations are tied to the control node, including interaction with the database.
- ZeeBe [9]. It uses a message bus to communicate with services; the data (or status) of the system is stored on the nodes of the system itself and, if necessary, is replicated to several services.
- Uber Cadence [10]. It also uses a message bus for inter-service communication but, unlike ZeeBe, it has centralized data storage, as in Netflix Conductor, which reduces system fault tolerance.

The use of a message bus makes the calculation process in the system uncontrollable, so this approach was rejected in favor of a centralized control node. The scheme of operation of a system with a centralized node is shown in Fig. 1.
The obvious drawback of such a system is that too many operations are concentrated on the orchestrator: it plays the role of a kind of gateway that not only controls the calculation process but also passes the entire data stream from the microservices through itself. To solve this problem of reducing the load on the orchestrator while keeping the data stream controlled, a model of orchestration is proposed that makes it possible to reduce the load and transfer part of it to other services. This model of orchestration is presented in Fig. 2.
As shown in Fig. 2, the orchestrator takes part in the calculation process only at the initialization and result-receiving stages, while each microservice performs one operation of receiving and one of sending data. The peculiarity of this model is that, while preserving the centralized service management approach and the ability to monitor the state of the computational task, it reduces the load on the control node, making the system more balanced in terms of load.

Research results and discussion
As part of the study of the operation of a system with an orchestrator based on the data flow control paradigm, two types of services are created: 1) the orchestrator; 2) the service worker.
The orchestrator is designed to control data flows in a distributed system, and also acts as a gateway between the user and the system.
The service worker acts as a wrapper over the business logic of the system and adds the functionality for interacting with the orchestrator and other service workers.
The UML class diagram of the designed system can be seen in Fig. 3.
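The class diagram itself is given in Fig. 3 and is not reproduced here. As a rough JavaScript sketch of the two service types (class and method names are illustrative assumptions, not the actual classes of the diagram), the structure might look like:

```javascript
// Hypothetical skeleton of the two service types; names are
// illustrative assumptions, not the actual classes of Fig. 3.
class Orchestrator {
  constructor() {
    // computing-session key -> final result reported by a worker
    this.sessions = new Map();
  }
  // Accept a computational task (a directed graph) from the user,
  // assign workers to the nodes and dispatch the configuration
  // to the «leaves» of the calculation tree.
  execute(graphConfig) { /* ... */ }
  // Endpoint behind POST /result/:key - store the session result.
  result(key, value) {
    this.sessions.set(key, value);
  }
}

class ServiceWorker {
  constructor(businessLogic) {
    this.businessLogic = businessLogic; // wrapped business function
  }
  // Endpoint behind POST /execute: compute once all inputs are
  // present, then forward the result downstream.
  execute(inputs) {
    return this.businessLogic(inputs);
  }
}
```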
Each service was launched in a separate Docker container. The structure of the system configuration file is shown in Fig. 4. According to the configuration file of the system with the orchestrator based on the data flow control paradigm, four services took part in the experiment: one service orchestrator and three service workers.
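The actual container configuration is given in Fig. 4; purely as a hypothetical sketch (image names, service names and ports are assumptions, not taken from the paper), a docker-compose layout for one orchestrator and three workers could look like:

```yaml
# Hypothetical layout: one orchestrator and three service workers,
# each in its own container. Names and ports are illustrative only.
version: "3"
services:
  orchestrator:
    image: dataflow-orchestrator
    ports:
      - "3000:3000"
  worker-1:
    image: dataflow-worker
  worker-2:
    image: dataflow-worker
  worker-3:
    image: dataflow-worker
```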
This configuration file corresponds to the following system structure (Fig. 5). The system works as follows:
1. The user makes a POST /execute request to the orchestrator and transmits the configuration file of the computational task (in the form of a directed graph).
2. As soon as the orchestrator receives the configuration, it acts according to the following protocol: 1) the orchestrator assigns a separate service to each computing node in the configuration file according to the round-robin principle; 2) the orchestrator calculates which nodes (services) of the system already have all the input data needed to start the calculation (in fact, the orchestrator searches for the «leaves» of the calculation tree) and distributes the configuration file to them using a POST /execute request.
3. As soon as a service worker receives the configuration, it acts according to the following protocol: 1) the service worker checks that all incoming data is available; if not, the service waits, and if all the data for the calculation is already available, it performs the calculation; 2) the service worker writes the calculation result into the configuration file and distributes it, according to the configuration file, using a POST /execute request. If the next node after this microservice is the orchestrator itself, then instead of a POST /execute request, a POST /result/:key request is executed, where key is the request parameter containing the key of the computing session.
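The orchestrator-side steps above can be sketched in JavaScript. The graph shape used here ({ nodes, edges } with a per-node result field) is an assumption made for illustration; the paper's actual configuration format is shown in Fig. 6.

```javascript
// Sketch of the orchestrator's «leaf» search: a node is ready to
// start when every one of its inputs is already computed. The
// graph shape { nodes: [...], edges: [{from, to}] } is an
// assumption, not the paper's actual configuration format.
function findReadyNodes(graph) {
  // Nodes that still wait for at least one uncomputed input.
  const hasPendingInput = new Set(
    graph.edges
      .filter(
        (e) =>
          graph.nodes.find((n) => n.id === e.from).result === undefined
      )
      .map((e) => e.to)
  );
  return graph.nodes.filter(
    (n) => n.result === undefined && !hasPendingInput.has(n.id)
  );
}

// Round-robin assignment of worker services to graph nodes.
function assignServices(graph, services) {
  graph.nodes.forEach((node, i) => {
    node.service = services[i % services.length];
  });
}
```

On a graph with edges a → c and b → c, the ready nodes are a and b: they have no pending inputs, so the orchestrator would dispatch the configuration to their assigned services first.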
The machine on which all tests were performed has the following configuration:
- CPU: 3.4 GHz × 6;
- RAM: 8 GB.

An example of a configuration file for a computational task in JSON format is shown in Fig. 6.
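Fig. 6 itself is not reproduced here; a hypothetical configuration of the same kind (all field names, node ids and service addresses are assumptions for illustration) might look like:

```json
{
  "key": "session-1",
  "nodes": [
    { "id": "add", "service": "http://worker-1:3000", "result": null },
    { "id": "mul", "service": "http://worker-2:3000", "result": null }
  ],
  "edges": [
    { "from": "add", "to": "mul" },
    { "from": "mul", "to": "orchestrator" }
  ]
}
```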
The resource consumption of the system described above is shown in Table 1. Taking into account the results of the load and resource consumption tests, it is possible to conclude that the created orchestrator can be used on Internet of Things devices.

Conclusions
As a result of the research, a system with an orchestrator based on the data flow control paradigm is designed and implemented. This system has several advantages over similar orchestration systems [8-10]: 1) relatively small size and low resource consumption; 2) uniform distribution of load across the system; 3) a controlled data flow; 4) non-blocking calculation operations (several independent tasks can be performed simultaneously). The system was tested on prototype code implemented in JavaScript on Node.js, an environment that allows JavaScript code to be executed and to interact with the file system.
The following stages of work are planned to address the current shortcomings: 1) the lack of service discovery (to eliminate the need for manual configuration of services); 2) the lack of fault tolerance services; 3) the lack of a gateway service that would allow traffic between the user and the system to be encrypted.
In general, the work showed that an orchestrator for managing data flows can be implemented quickly and efficiently in a scripting language environment such as JavaScript. This makes it possible to use this approach to build an orchestrator for low-power computing systems, for example, Internet of Things devices.