DEVELOPMENT OF INFORMATION TECHNOLOGY OF TASKS DISTRIBUTION FOR GRID-SYSTEMS USING THE GRASS SIMULATION ENVIRONMENT



Introduction
GRID is a geographically distributed infrastructure built on a set of heterogeneous network resources (clusters, servers, individual PCs) and used to solve scientific tasks that require large computing power. GRID is a coordinated, standardized and open environment that enables optimal distribution of computing resources among the tasks arriving at the system input. GRID-systems are divided into three types: voluntary, scientific and commercial. Each type has its own advantages and disadvantages and is characterized by a large number of solutions, implementations and forms of organizing computations. One of the relevant objectives is the development of application software whose main function is to provide the user of a GRID-system with a convenient interface between the application and the required computing resources.

Analysis of scientific literature and the problem statement
There are many methods and algorithms for distributing tasks among computing resources. A large niche here belongs to programs that distribute tasks among resources: task schedulers (brokers). The main objective of a broker is to construct a distribution plan that meets the requirements of the task suppliers. When a distribution plan is built, the parameter values of the objective function are optimized, which reduces the run time of the tasks arriving in the system and thus leads to more efficient use of computing resources.
Often, when tasks are distributed, the first suitable resource is selected without any optimization [1-4]. Other algorithms do not take into account the peculiarities of an environment with non-dedicated resources: the dynamics of computing resource utilization are not monitored, and the competition among task suppliers is not considered [5, 6]. In [6], criteria for slot selection are given on the basis of utility functions specified by the task supplier. However, even here the optimization is performed only at the level of selecting the best available computational resources. Many algorithms do not support advance reservation and rely on the priority of tasks in a queue [7, 8]. However, the use of advance reservation [1-3, 9, 10] shortens the execution time of the task pool, which in turn can reduce the downtime of computational resources in the system.
The planning methods described in [4, 9-13] use a number of criteria (the cost of using resources, the load level of computing resources, the use of nearby resources for related objectives in a task). The studies [14-17] suggest planning methods that focus on the preferences of task suppliers, administrators of virtual organizations or suppliers of computing resources.
An optimization criterion that takes into account the interests of users is entered into the resource request format. This criterion is used to search for alternatives for executing tasks, i.e., the owners of computing resources are able to manage the workload of their computer systems. A policy of partitioning tasks into separate subtasks is also implemented; the subtasks can run on different computing resources, which improves the efficiency of the system load.
In [18], computing resources for the tasks entering the system are selected by means of a logical-probabilistic algorithm. The proposed algorithm is focused on multi-level planning of tasks under specified quality criteria (cost, lead time, reliability index). Planning is carried out in four steps based on a mechanism that regulates the demand for and supply of resources.
In [19], a classification of incoming tasks is proposed, which can be used as a superstructure for the task scheduler (broker). Using the proposed superstructure increases the efficiency of the use of computing resources by optimizing their distribution.
There are also methods [14, 15] that allow predetermined selection criteria to be entered (e.g., via the JSDL language), but they are characterized by linear time complexity, i.e., their running time depends on the number of computational resources available at the current planning time.
The analysis shows that the disadvantage of existing GRID-systems is the use of a single broker, which is oriented toward a certain class of objectives and applies one distribution policy to all incoming tasks. However, the GRID is primarily a heterogeneous system [20], and, in addition, the tasks arriving for execution may themselves be heterogeneous. Most of today's planners do not take into account the cost of computing resources, which also leads to inefficient use of resources.

The purpose and objectives of the study
The purpose of this work is to develop an information technology for distributing tasks among computing resources in a heterogeneous GRID-system.
To achieve this goal, the following tasks must be solved:
- to develop a mathematical model of the representation of tasks and computational resources in GRID-systems;
- to develop a method for selecting the best distribution option;
- to carry out a series of experiments on the distribution of tasks on the basis of the proposed method in the GRASS simulation modeling environment.

The use of simulation modeling for the distribution of tasks in GRID-systems
Let us justify the expediency of the proposed GRASS (GRID Advanced Simulation System) simulation environment for studying the application of different task planning strategies.
There are many modeling environments for GRID-systems. The most popular of them are SimGrid, GridSim, MicroGrid, ChicSim, OptorSim and Alea [21, 22]. A comparative analysis shows that most of them are specialized, have structural limitations and lack accessible open versions. In addition, working with such systems requires knowledge of specific programming languages.
Creating a GRID-system requires investments that are not always available, so simulation modeling is a cheaper option. Simulation modeling is based on building a mathematical model that can be used to study the properties of the real system.
The GRASS system [23] is a computer model of a GRID-system that makes it possible to develop a framework for distributed computing systems, taking into account the problems they solve, and to explore its properties. This model makes it possible:
- to reproduce in time the elementary events occurring in the system while preserving the logic of their interaction;
- to carry out a series of numerical experiments in which the modeling results are collected, analyzed and compared with the actual behavior of the object.
Tasks in a GRID-system do not run upon arrival but at a certain time of day. Therefore, the task and resource pools of a real GRID-system can be fed into the GRASS environment to simulate the distribution and obtain the best distribution plan, which can then be offered to the real system. The best plan not only minimizes the total execution time of the task pool but also increases the efficiency of the use of the computing resources of the GRID-system. However, a problem arises here: the GRASS environment simulates the operation of a real GRID-system, so its result becomes available only after a certain time, when the distribution plan may no longer be relevant. This problem can be solved in two ways:
- running the GRASS environment in parallel on several processors (according to the number of distribution policies available in the environment);
- introducing time scaling into the modeling process.
The first solution can reduce the modeling time, since the first plan received is the best in time for the given task pool. However, this solution still takes a long time and cannot always be applied in practice. The second solution, based on introducing a time factor, reduces the distribution time (a transition from hours to seconds). After the plans for all distribution methods have been received, the reverse transition from seconds to hours is made to obtain the real modeling time, on the basis of which the best distribution plan is analyzed and selected.
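The time scaling described above can be illustrated with a minimal sketch. The scale factor (one simulated second per modeled hour), the function names and the sample completion times are assumptions made for illustration only; they are not taken from GRASS.

```python
# Hypothetical illustration of time scaling; all names and values are assumed.
SCALE = 3600.0  # assumed factor: one simulated second stands for one real hour


def to_simulation_seconds(real_hours: float, scale: float = SCALE) -> float:
    """Compress a real duration given in hours into simulation seconds."""
    return real_hours * 3600.0 / scale


def to_real_hours(simulation_seconds: float, scale: float = SCALE) -> float:
    """Expand a simulated duration back into real hours."""
    return simulation_seconds * scale / 3600.0


# Made-up completion times of three policies, in simulation seconds.
simulated = {"FCFS": 12.0, "HPF": 10.5, "Backfill": 9.0}

# Transition back from seconds to hours, then pick the shortest real time.
real = {name: to_real_hours(t) for name, t in simulated.items()}
best = min(real, key=real.get)
print(best, real[best])
```

Because the scaling is linear, the ranking of the policies is the same before and after the reverse transition; the conversion matters only for reporting real modeling time.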

Technologies of tasks distribution using GRASS environment
A GRID-system is a complex object comprising a set of interconnected heterogeneous resources whose operation is subject to a number of predetermined rules. One of the main tasks of this system is to coordinate the distribution of resources for solving the submitted tasks. The distribution model of a GRID-system can be constructed on the basis of two sets, a set of computing resources $R$ and a set of tasks $Z$, together with a distribution policy $q$, i.e.,

$$G = \{R, Z, q\}.$$

Tasks arriving at the GRID-system form a stream $\{Z_i,\ i = 1, 2, \ldots, M\}$, where $i$ is the serial number of the task and $M$ is the number of tasks. Each task includes a number of parameters required to launch it in the environment (1):

$$Z_i = \{ar_i, os_i, pc_i, ps_i, ms_i, dc_i, pr_i\}, \qquad (1)$$

where $ar_i$ is the architecture; $os_i$ is the operating system; $pc_i$ is the processor count; $ps_i$ is the processor speed; $ms_i$ is the memory size; $dc_i$ is the disk capacity; $pr_i$ is the priority.
Any task is a package of objectives $\{z_a\}$, $z_a \in Z$, combined by a specific theme. Each objective is a separate executable program. A task can be started only when the requested resources have been selected for all of its objectives.
Resources also form a set $\{R_j,\ j = 1, 2, \ldots, N\}$, where $j$ is the number of the computing resource and $N$ is the number of resources in the GRID-system. Any computing resource joining the GRID-system is described by a number of features represented by a tuple (2):

$$R_j = \{ar_j^r, os_j^r, pc_j^r, ps_j^r, ms_j^r, dc_j^r\}. \qquad (2)$$

In different distribution systems the input tuples (1) and (2) may differ insignificantly, because the systems evolve and, in the course of working with them, additions related to the requirements of task suppliers become necessary.
Task schedulers (brokers), which are responsible for establishing the schedule of use of computing resources, occupy an important place in GRID-systems. More than twenty well-known schedulers for GRID-systems currently exist: Portable Batch System (PBS), Sun Grid Engine (SGE), TORQUE, Condor, LoadLeveler, MAUI Scheduler, etc. None of these schedulers provides the user with a universally efficient task distribution method. They allow only one simple algorithm to be used, and it can be started only if all the tasks and resources of the system are given in a predetermined common description format.
Currently, there is no universal task scheduler, since the tasks in GRID-systems are heterogeneous. Therefore, additional parameters should be taken into account during distribution to increase the efficiency of the use of the computing resources of the system. For example, analyzing the connectivity of the objectives in a task makes it possible to reduce the execution time of the task pool by eliminating the time losses caused by data exchange between separate objectives of the task.
Let us extend the model of the GRID-system by introducing additional parameters into tuples (1) and (2), which will reduce the downtime of computing resources in the GRID-system (3), (4):

$$Z_i = \{ar_i, os_i, pc_i, ps_i, ms_i, dc_i, pr_i, ca_i, rt_i\}, \qquad (3)$$

$$R_j = \{ar_j^r, os_j^r, pc_j^r, ps_j^r, ms_j^r, dc_j^r, bw_j^r, d_j^r\}, \qquad (4)$$

where $ca_i$ (coefficient of association) is the coefficient of connectivity of the objectives in the task; $rt_i$ (run time) is the execution time of the task (the time during which the resource is used); $bw_j^r$ (bandwidth) is the total bandwidth from the broker to the resource, taking into account the current network condition; $d_j^r$ (delay) is the total packet transmission delay under the current network condition.
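The extended tuples (3) and (4) map naturally onto record types. The following is a minimal sketch, not GRASS code: the field names follow the symbols in the text, while the units, the sample values and the matching rule in `satisfies` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Task tuple (3); units in the comments are assumed."""
    ar: str    # architecture
    os: str    # operating system
    pc: int    # processor count
    ps: float  # processor speed, GHz (assumed unit)
    ms: int    # memory size, MB (assumed unit)
    dc: int    # disk capacity, GB (assumed unit)
    pr: int    # priority
    ca: float  # coefficient of association of the objectives
    rt: float  # run time, hours (assumed unit)


@dataclass(frozen=True)
class Resource:
    """Resource tuple (4); units in the comments are assumed."""
    ar: str
    os: str
    pc: int
    ps: float
    ms: int
    dc: int
    bw: float  # bandwidth from broker to resource, Mbit/s (assumed unit)
    d: float   # packet transmission delay, ms (assumed unit)


def satisfies(resource: Resource, task: Task) -> bool:
    """Assumed rule: architecture and OS match exactly, capacities cover demand."""
    return (resource.ar == task.ar and resource.os == task.os
            and resource.pc >= task.pc and resource.ps >= task.ps
            and resource.ms >= task.ms and resource.dc >= task.dc)


task = Task("x86_64", "Linux", 4, 2.0, 4096, 20, 1, 0.8, 1.5)
node = Resource("x86_64", "Linux", 8, 2.4, 8192, 100, 1000.0, 2.0)
small = Resource("x86_64", "Linux", 2, 2.4, 8192, 100, 1000.0, 2.0)
print(satisfies(node, task), satisfies(small, task))
```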
Let us also introduce into the model, instead of a single distribution policy, a set of distribution methods (5), where mn is the algorithm for distributing applications among computing resources and lp is the list of input parameters taken into account during distribution. As mentioned above, the disadvantage of existing GRID-systems is the use of a single broker focused on a certain class of objectives. The proposed GRASS environment currently supports several distribution algorithms: First-Come First-Served (FCFS), Last In First Out (LIFO), Highest Priority First (HPF), a linear programming method (Simplex), a method of distribution to a free resource (Smart), a backfill method (Backfill), and a method of distributing tasks among computing resources based on network traffic (Backfill mod). Backfill mod is an upgrade of the backfill method which, unlike the original, distributes the tasks arriving at the GRID-system depending on the associations between the objectives in each task.
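The three simplest policies differ only in the order in which queued tasks are served. The sketch below illustrates that difference; the tuple layout, the convention that a larger number means higher priority, and the tie-breaking rule are assumptions, and the code is not taken from GRASS.

```python
from typing import Callable, Dict, List, Tuple

# A queued task is (arrival_index, priority); larger priority is assumed
# to mean a more urgent task.
QueuedTask = Tuple[int, int]


def fcfs(queue: List[QueuedTask]) -> List[QueuedTask]:
    """First-Come First-Served: earliest arrival first."""
    return sorted(queue, key=lambda t: t[0])


def lifo(queue: List[QueuedTask]) -> List[QueuedTask]:
    """Last In First Out: latest arrival first."""
    return sorted(queue, key=lambda t: t[0], reverse=True)


def hpf(queue: List[QueuedTask]) -> List[QueuedTask]:
    """Highest Priority First; ties are broken by arrival order (assumed)."""
    return sorted(queue, key=lambda t: (-t[1], t[0]))


# A policy table keyed by method name, so a policy can be chosen by a string.
POLICIES: Dict[str, Callable[[List[QueuedTask]], List[QueuedTask]]] = {
    "FCFS": fcfs,
    "LIFO": lifo,
    "HPF": hpf,
}

queue = [(0, 1), (1, 5), (2, 3)]
for name, order in POLICIES.items():
    print(name, order(queue))
```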
To select an effective distribution plan, a method for selecting the best distribution plan has been developed and implemented in the GRASS modeling environment. The method requires:
- forming pools of tasks and resources for different classes of tasks (either generated in the GRASS environment using the specified generators or taken from real GRID-systems);
- selecting a distribution policy (method);
- running the simulation in the GRASS environment;
- analyzing the log files for all distribution policies (methods);
- selecting the best distribution plan on the basis of the accepted rules and restrictions.
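The selection procedure can be sketched as a loop that simulates each policy and compares the outcomes. This is a simplified illustration under stated assumptions: resources are identical, each task occupies one resource for its run time, the "simulation" is greedy list scheduling rather than the GRASS engine, and the selection rule (shortest total time, then lowest downtime) is assumed.

```python
import heapq
from typing import Dict, List, Tuple


def simulate(run_times: List[float], resources: int) -> Tuple[float, float]:
    """Greedy list scheduling: each task goes to the resource that frees first.

    Returns (total execution time of the pool, downtime percentage).
    """
    free_at = [0.0] * resources
    heapq.heapify(free_at)
    for run_time in run_times:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + run_time)
    makespan = max(free_at)
    if makespan == 0.0:  # empty pool: nothing ran, nothing idled
        return 0.0, 0.0
    downtime = 100.0 * (1.0 - sum(run_times) / (makespan * resources))
    return makespan, downtime


def select_best(plans: Dict[str, List[float]], resources: int) -> str:
    """Pick the policy with the shortest total time, then the lowest downtime."""
    results = {name: simulate(order, resources) for name, order in plans.items()}
    return min(results, key=lambda name: results[name])


pool = [1.0, 2.0, 3.0, 4.0]                 # made-up run times in arrival order
plans = {"FCFS": pool, "LIFO": pool[::-1]}  # task order produced by each policy
best = select_best(plans, resources=2)
print(best, simulate(plans[best], 2))
```

Even on this toy pool the order of service changes both the total time and the idle share of the resources, which is the effect the selection method exploits.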
The task distribution policies offered in the GRASS environment can provide a gain in time and reduce the downtime of computing resources for different input pools of tasks and resources, so accurate limits must be set for access.
Let us consider the operation of the information technology of task distribution [24] in a heterogeneous system that uses the GRASS simulation modeling environment (Fig. 1). The proposed technique involves the following stages.
Stage 1. Setting the experiment parameters. To run an experiment, the configuration file plugins.xml must be prepared; it describes the correspondence between the module names in the system and the names of the files and libraries, as well as the paths to the configuration files [25]. The plugins.xml file also allows parameters to be passed to the modules at load time; these can be used to set the operation mode or to initialize internal values for tasks, computational resources and distribution methods.
The result of this stage is that the configuration file (plugins.xml) is read and decoded and the main system plugins are loaded to start the distribution process.
Stage 2. Selecting a distribution policy. Tasks are distributed after the tasks_count parameter of the AlgorythmLoader plugin has been checked. Once the number of tasks in the queue exceeds the value set in tasks_count, the distribution algorithm (5) defined in the DistributionAlgorythm parameter is called. Resources for each task in the queue are then selected according to the chosen distribution method [26]. The task distribution module, which is part of the GRASS environment, contains a number of methods that emulate different distribution algorithms, each of which uses its own subset of the parameters (3) and (4). The FCFS and LIFO methods do not use additional parameters, as they are the simplest policies and serve as a baseline for comparison with the other methods.
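The threshold check on tasks_count can be sketched as follows. The class below is a hypothetical stand-in, not the AlgorythmLoader plugin itself; in particular, handing over the whole batch and emptying the local buffer is a simplifying assumption, since in GRASS tasks stay in the queue until they finish.

```python
from collections import deque
from typing import Callable, Deque, List


class AlgorithmLoaderSketch:
    """Hypothetical illustration of the tasks_count trigger; not GRASS code."""

    def __init__(self, tasks_count: int,
                 distribute: Callable[[List[str]], None]) -> None:
        self.tasks_count = tasks_count  # threshold read from the configuration
        self.distribute = distribute    # the selected distribution algorithm
        self.queue: Deque[str] = deque()

    def submit(self, task_id: str) -> None:
        self.queue.append(task_id)
        # The algorithm is called once the queue length exceeds the threshold.
        if len(self.queue) > self.tasks_count:
            batch = list(self.queue)
            self.queue.clear()          # simplification, see the lead-in
            self.distribute(batch)


batches: List[List[str]] = []
loader = AlgorithmLoaderSketch(tasks_count=2, distribute=batches.append)
for task_id in ["t1", "t2", "t3", "t4"]:
    loader.submit(task_id)
print(batches, list(loader.queue))
```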
Stage 3. Uploading information about computing resources and tasks into the GRASS modeling environment.
A task arriving at the system is a package of objectives, each of which is a separate executable program. A processor can execute only one task at any moment, and a task can be started only when computing resources have been selected for all of its objectives. Any task entering the GRASS environment can be divided into two components: the characteristics (parameters) of the task and the task itself (an .exe file, input data files, databases, etc.).
Stage 4. Formation of additional parameters for the most efficient distribution. All tasks arriving at the system are placed in the task queue (link 5, Fig. 1), and in parallel the information about each task (tuple (3)) is transferred to the tuple convolution module and the association analysis module (link 6, Fig. 1). The tuple convolution module computes generalized evaluation criteria for each task, which makes it possible to manage the distribution of tasks among computing resources more efficiently and shows what share of a resource the task occupies while running [27]. At the same time, information about the resources available in the system is transferred (link 7, Fig. 1). The output of the tuple convolution module is a list of resources to which each task can be distributed.
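The text does not give the convolution formula, so the following sketch only illustrates the idea of a generalized criterion. It assumes that the criterion is the mean ratio of demand to capacity over the numeric fields shared by tuples (3) and (4), and that candidate resources are listed from the least occupied to the most occupied; both choices are hypothetical.

```python
from typing import Dict, List

FIELDS = ("pc", "ps", "ms", "dc")  # numeric fields shared by tuples (3) and (4)


def occupied_share(task: Dict[str, float], resource: Dict[str, float]) -> float:
    """Assumed convolution: mean demand-to-capacity ratio over FIELDS."""
    return sum(task[f] / resource[f] for f in FIELDS) / len(FIELDS)


def candidate_resources(task: Dict[str, float],
                        resources: Dict[str, Dict[str, float]]) -> List[str]:
    """Names of resources that cover every demand, least occupied first."""
    feasible = [name for name, res in resources.items()
                if all(task[f] <= res[f] for f in FIELDS)]
    return sorted(feasible, key=lambda name: occupied_share(task, resources[name]))


task = {"pc": 2, "ps": 2.0, "ms": 4, "dc": 10}
resources = {
    "r1": {"pc": 4, "ps": 2.0, "ms": 8, "dc": 20},
    "r2": {"pc": 8, "ps": 4.0, "ms": 16, "dc": 40},
    "r3": {"pc": 1, "ps": 2.0, "ms": 8, "dc": 20},  # too few processors
}
print(candidate_resources(task, resources))
```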
The information then arrives at the association analysis module (link 8, Fig. 1), where the objectives in the task are analyzed. If the objectives in the task have high connectivity (information must be exchanged between them during execution) or the task requires the transfer of a large amount of input and output data, the method checkQueueStrict() is called, which compares the requirements of the task with the available computing resources.
If computing resources for running the current task are available in the system, the task remains in the queue with the status Waiting, and appropriate computing resources are selected for it. If such resources are not found, the task receives the status Cancelled and leaves the queue, i.e., this status indicates that the task cannot be run in this configuration.
For such highly connected tasks, computing resources are selected so as to reduce the data transfer time between the objectives of the task.
If the objectives in the task are not interconnected, the method checkQueueBase() is called; like checkQueueStrict(), it selects computing resources, but it distributes the individual objectives of the task to different computing resources. The task remains in the queue with the status Waiting until it starts running on a computing resource.
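The branching between the two checks can be sketched as follows. The function names mirror checkQueueStrict() and checkQueueBase(), but the bodies are assumptions: the connectivity threshold, the strict rule (all objectives must fit on one resource, so that no data crosses the network) and the base rule (objectives may be spread over several resources) are illustrative and are not taken from GRASS.

```python
from enum import Enum
from typing import List


class Status(Enum):
    WAITING = "Waiting"
    CANCELLED = "Cancelled"


CA_THRESHOLD = 0.5  # assumed cut-off for "high connectivity"


def check_queue_strict(demand: List[int], free: List[int]) -> Status:
    """Assumed strict rule: one resource must host all objectives together."""
    total = sum(demand)
    return Status.WAITING if any(cap >= total for cap in free) else Status.CANCELLED


def check_queue_base(demand: List[int], free: List[int]) -> Status:
    """Assumed base rule: objectives may be placed on different resources."""
    remaining = sorted(free, reverse=True)
    for need in sorted(demand, reverse=True):
        for i, cap in enumerate(remaining):
            if cap >= need:
                remaining[i] = cap - need  # first resource that fits
                break
        else:
            return Status.CANCELLED        # no resource fits this objective
    return Status.WAITING


def check_task(ca: float, demand: List[int], free: List[int]) -> Status:
    """Route the task by its coefficient of association ca."""
    if ca >= CA_THRESHOLD:
        return check_queue_strict(demand, free)
    return check_queue_base(demand, free)


# Two objectives needing 2 processors each; two resources with 3 free processors.
print(check_task(0.9, [2, 2], [3, 3]), check_task(0.1, [2, 2], [3, 3]))
```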
Stage 5. The task distribution module (broker) receives the information needed to decide which computing resource or resources each task should be distributed to so that its execution is optimal in terms of both hardware characteristics and performance. For distribution, the broker must receive the following information:
- information on the resources (4) currently present in the system (link 9, Fig. 1);
- information on the resources that are currently reserved (link 10, Fig. 1) and the time of their release;
- information from the database about previous runs of the task (link 11, Fig. 1);
- the distribution policy (method) (link 12, Fig. 1);
- information about the connectivity of the objectives in the task (link 13, Fig. 1).
The distribution is performed on the basis of the received data. The result of the module's work is the plan (link 14, Fig. 1), which assigns to each task a computing resource that is suitable according to the requirements mentioned above. Any task entering the GRASS environment will be distributed; the only possible exception is when no suitable computing resources are found in the system.
Stage 6. Sending the task to the resource or resources identified in the distribution plan. To do this, the unit in charge of transportation queries the task queue (link 15, Fig. 1), selects a specific task according to the distribution plan and assigns it the status Running. Then the task itself (its second component) is sent to the selected resource to be started (link 16, Fig. 1). Since an action on resources has taken place (in this case a resource has been selected for executing a task), the system must be notified about it (link 17, Fig. 1). The reservation unit informs the system that the resource cannot be used for another start until the task has finished (link 18, Fig. 1). When another action on resources occurs (for example, a resource is released), the reservation unit receives information about the event (link 17, Fig. 1), notifies the system that the resource can again be used for distribution (link 18, Fig. 1), and sends a message to the task queue to delete the task from the queue because its execution has ended (link 19, Fig. 1).
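The bookkeeping performed when a task starts and finishes can be sketched as a small state holder. The class and method names are hypothetical and the model is deliberately reduced to one task per resource; it is not the GRASS implementation.

```python
from typing import Dict, List, Optional


class ReservationSketch:
    """Hypothetical bookkeeping of busy resources; not GRASS code."""

    def __init__(self, resources: List[str]) -> None:
        # Maps each resource to the task occupying it (None when free).
        self.reserved_by: Dict[str, Optional[str]] = {r: None for r in resources}

    def available(self) -> List[str]:
        """Resources that may be used for the next distribution."""
        return [r for r, task in self.reserved_by.items() if task is None]

    def start(self, task: str, resource: str, status: Dict[str, str]) -> None:
        """Mark the resource busy and move the task to the Running status."""
        if self.reserved_by[resource] is not None:
            raise ValueError(f"{resource} is already reserved")
        self.reserved_by[resource] = task
        status[task] = "Running"

    def finish(self, resource: str, status: Dict[str, str]) -> None:
        """Release the resource and delete the finished task from the queue."""
        task = self.reserved_by[resource]
        self.reserved_by[resource] = None
        if task is not None:
            del status[task]


status = {"t1": "Waiting", "t2": "Waiting"}  # task queue with statuses
unit = ReservationSketch(["r1", "r2"])
unit.start("t1", "r1", status)
busy_view = unit.available()                  # only r2 is free while t1 runs
unit.finish("r1", status)
print(busy_view, unit.available(), status)
```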
Stage 7. Collection of statistical information about the distribution. To obtain statistical information about the distribution for subsequent analysis, the module responsible for log files is started. In the GRASS environment, the log-keeping module (Logger) is responsible for collecting statistics. This module provides flexible, centralized logging for all plugins of the GRASS modeling environment.
The system has several types of log files, which record series of actions and make it possible to obtain information from the database quickly when necessary.
When tasks enter the system, they are automatically recorded in the queue.log file (link 3, Fig. 1), and when resources join (link 4, Fig. 1), they are recorded in the resources.log file. When a task starts on a resource or finishes, this event is recorded in the raspred.log file (link 20, Fig. 1). All records from the log files are sent to the system database (link 21, Fig. 1), from which the data can later be used to sample specified parameters or to display information graphically for a certain period of time (link 22, Fig. 1).
The modeling process continues as long as there are tasks in the queue.

Peculiarities of GRASS simulation system
The GRASS environment is an open-source project that implements a modular environment for GRID-systems [23]. It is cross-platform and implemented in C++ using the cross-platform Qt4 and Boost libraries.
The GRASS modeling environment has both a console and a graphical user interface that make it possible to observe the simulation in real time. The graphical interface gives the user more options: it displays and visualizes the statistics of the simulation environment and the status of the task queue and resources, and shows how tasks are executed by computing resources. In addition to interactive observation of the simulation, the GRASS environment can write a report for subsequent, more detailed analysis. The report is generated by scripts, which makes it easy to change both its appearance and its content.
The GRASS modeling environment has a modular structure: it consists of a core and dynamically loaded modules (plug-ins). Each module performs a highly specialized task, referring to the other modules of the system when necessary. The core provides the means of inter-module interaction and handles loading and configuration of the system. Each module has a unique string identifier (name or ID) and provides a set of interfaces for interacting with the other system components. Each interface of a module also has a name and can be implemented by any number of modules. Thus, to access any interface of a module, it is only necessary to know the name of the module and the name of the interface.
A GRASS module is a dynamic-link library (DLL) that implements a factory method creating an instance of the module class.
The module class implements the Framework::IPlugin interface, which makes it possible to work with any module uniformly, regardless of the peculiarities of its implementation.
A flexible modular system requires an easy way to change the set of plug-ins and their parameters without modifying the core or the module libraries. For this purpose, the GRASS environment uses a system of XML-based configuration files.
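The name-based lookup of interfaces can be illustrated with a minimal sketch. GRASS itself is written in C++; the Python classes below (Core, LoggerModule) and their methods are hypothetical and only show the idea of a core that creates modules through a factory and resolves an interface by the pair (module name, interface name).

```python
from typing import Any, Callable, Dict, List, Tuple


class Core:
    """Hypothetical core that resolves interfaces by name; not GRASS code."""

    def __init__(self) -> None:
        self._interfaces: Dict[Tuple[str, str], Any] = {}

    def load(self, module_name: str, factory: Callable[[], Any]) -> None:
        """Create the module through its factory and publish its interfaces."""
        module = factory()
        for interface_name, implementation in module.interfaces().items():
            self._interfaces[(module_name, interface_name)] = implementation

    def get_interface(self, module_name: str, interface_name: str) -> Any:
        """Only the two names are needed to reach an interface."""
        return self._interfaces[(module_name, interface_name)]


class LoggerModule:
    """Toy module exposing a single interface called 'log'."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def interfaces(self) -> Dict[str, Any]:
        return {"log": self.lines.append}


logger_module = LoggerModule()
core = Core()
core.load("Logger", lambda: logger_module)  # the lambda plays the factory role
core.get_interface("Logger", "log")("task t1 started")
print(logger_module.lines)
```

A caller that knows only the strings "Logger" and "log" can use the interface without depending on the module class, which is the decoupling described above.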
The developed software system is the basis for the continuation of research in the field of GRID-systems.

Discussion of the results of research of application of the proposed information technology
In the study of the information technology of task distribution, a series of experiments was conducted. Their aim was to show that it is more rational not to use a single broker for all tasks arriving at the system input but to select the best distribution method for each task class.
Two pools were formed in the GRASS environment: one of tasks and one of resources. The task pool includes 300 tasks, each described by tuple (3); the resource pool is formed in accordance with tuple (4) and includes 80 computing resources. Each task is characterized by the requirements specified by the task suppliers.
A simulation was conducted using different distribution policies. The simulation results are compared with each other, and on the basis of this analysis the best distribution for a particular task pool is chosen. Fig. 5 shows the run time of all GRASS environment methods for a particular task pool, and Fig. 6 shows the percentage of downtime of computing resources.

Fig. 5. Distribution time of task pool in the GRASS environment
All tasks arriving at the GRID-system can be divided into four classes, characterized by:
- a large amount of input data and a large amount of output data;
- a small amount of input data and a large amount of output data;
- a large amount of input data and a small amount of output data;
- a small amount of input data and a small amount of output data.
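The four classes can be expressed as a simple two-way split. In the sketch below the threshold separating "small" from "large" data volumes, the unit (megabytes) and the class labels are assumptions; the text does not specify them.

```python
def classify(input_mb: float, output_mb: float,
             threshold_mb: float = 1024.0) -> str:
    """Return one of four task classes; the threshold is an assumed value."""
    size_in = "large" if input_mb >= threshold_mb else "small"
    size_out = "large" if output_mb >= threshold_mb else "small"
    return f"{size_in}-input/{size_out}-output"


# Made-up data volumes covering all four classes.
samples = [(5000.0, 4000.0), (10.0, 4000.0), (5000.0, 10.0), (10.0, 10.0)]
classes = [classify(i, o) for i, o in samples]
print(classes)
```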
Table 1 and Fig. 7 show the simulation results for all methods of the GRASS environment according to this classification of tasks. The experiments revealed that the distribution results depend on the task class. At present there is no single distribution method that yields the best plan for any pool of tasks. However, when the characteristics of the GRID-system and of the input task stream are known, modeling can be carried out and a distribution method selected for a particular task pool.
The advantage of the GRASS simulation environment is that it runs an algorithm that analyzes the input stream of tasks and chooses the distribution method giving the best distribution for the specified requirements of the task supplier.

Conclusions
As a result of the research, methods of task distribution in heterogeneous GRID-systems were analyzed. The analysis leads to the conclusion that the schedulers existing on the market have a number of drawbacks, the main one being their focus on a specific class of tasks. This can be explained by the fact that these planners were originally intended for cluster systems, which were later included in the GRID infrastructure while tasks continued to be executed locally.
In the course of the research, a modification of the mathematical models representing tasks and resources in a GRID-system was proposed. Including a connectivity coefficient in the task representation model (Stage 4) makes it possible to select computing resources so as to minimize the time of data exchange between the objectives of a task. Including in the resource representation model values that account for the amount of traffic and the transmission delay on the channel makes it possible to reduce the execution time of the task pool, which increases the efficiency of the use of computational resources in the GRID-system.
Based on the mathematical model representing resources and tasks, a method for selecting the best distribution option was developed; it relies on the analysis of data obtained during the experiment. The proposed GRASS environment supports multiple distribution algorithms (FCFS, LIFO, HPF, Simplex, Smart, Backfill, Backfill mod), which makes it possible to simulate the distribution of a particular task pool under various distribution policies and to analyze the simulation results afterwards. Since the proposed system is modular, there are no obstacles to implementing and connecting new distribution algorithms.
In the course of the research, a series of experiments on the distribution of tasks among computing resources was conducted in the GRASS simulation modeling environment for different distribution policies. The results show a decrease in the execution time of the task pool by 24 % and an increase in the efficiency of the use of computing resources by 32 % for a number of the distribution policies (methods) implemented in the GRASS environment.

Fig. 6. Downtime percentage of computing resources in the GRASS environment

Fig. 7. Run time of GRASS environment methods depending on task classes

Table 1
Run time of GRASS environment methods for various classes of tasks