FORMING THE CLUSTERS OF LABOUR MIGRANTS BY THE DEGREE OF RISK OF HIV INFECTION

This research addresses the problem of dividing a group of migrant workers into subgroups according to the degree of risk of infection with the human immunodeficiency virus (HIV). A mathematical model of the clustering problem was developed, in which clustering is treated as the construction of a rule that maps the set of possible values of characteristics onto a set of clusters. The method of evolutionary clustering of objects was adapted to the selection of groups of migrant workers by constructing a fitness function that assigns each object to the cluster whose centre is nearest to it in Euclidean distance. The developed method was verified experimentally on the problem of defining subgroups of persons by socio-demographic characteristics within the group of migrant workers. As a result, the group of migrant workers was divided into three groups of clusters by degree of risk: a group of clusters with a relatively low risk, a group of clusters with moderate risk, and a group of clusters with high risk. Each group of clusters is homogeneous not only in the socio-demographic portraits of its representatives but also in the prevalence of behaviours that are risky with regard to HIV infection. The results of clustering this group with high risk by the k-means method and by the method of evolutionary clustering were compared using the values of a function equal to the total sum of the distances from objects to the centres of the clusters to which they belong. In the performed calculations, the evolutionary method yielded lower values of this function than the k-means method.


Introduction
Ukraine is currently at the concentrated stage of the HIV-infection/AIDS epidemic [1], in which the epidemic spreads mainly among specific population groups that are vulnerable to HIV infection [2], so the main measures against the epidemic are focused on the representatives of these groups. Given that the commonly recognized groups with high risk (GHR) of HIV infection are very diverse in structure [3], and considering the limited resource base, there is a need to develop targeted preventive interventions for their separate subgroups, which should be as homogeneous as possible in socio-demographic characteristics, based on a client-centered approach. Since this approach relies on measures that are attractive and comfortable for representatives of the target groups [4], an important condition for its implementation is to study the main socio-demographic characteristics of such subgroups of GHR and to determine the prevalence of behaviour that is risky in terms of HIV infection in each subgroup. This makes it possible to prioritize prevention programs for different subgroups of GHR and to design a program of action that is the most convenient for, and closest to, the representatives of GHR.
One of the leading GHR of HIV infection in Zakarpattia Oblast is external and internal labour migrants [5]. This group is numerous and diverse in composition; therefore, forming separate subgroups that are homogeneous in socio-demographic characteristics, and determining the degree of risk of HIV infection for each of these subgroups, is necessary and relevant.
The task of dividing the GHR into subgroups according to the socio-demographic portrait of their representatives can be represented mathematically as a clustering problem [6].

The problem of forming subgroups by the degree of risk of infection with the immunodeficiency virus within a group of labour migrants is considered as a clustering problem. A mathematical model of the clustering problem is constructed, and the method of evolutionary clustering of objects is adapted to the identification of groups of labour migrants. The group of labour migrants is partitioned into clusters according to the degree of risk of infection with the immunodeficiency virus.
Keywords: evolutionary clustering, group of labour migrants, risk of infection with the human immunodeficiency virus
method of solving the problems of clustering and proves its efficiency. The theoretical foundations of neural network methods are given in [12]. Such methods require considerable experience from researchers in setting the values of the main parameters, which makes them difficult to apply to practical tasks. Tree clustering methods [13] consist in constructing a clustering tree. Fuzzy clustering methods, the main one being the fuzzy c-means method [14], which is based on the apparatus of fuzzy sets, allow the same item to be assigned simultaneously to different clusters with different degrees of membership. The adaptation of the fuzzy c-means method given in [15] allows objects to be divided into clusters with regard to the degree to which each characteristic affects an object's membership in a cluster.
Heuristic clustering methods form a separate group. Thus, paper [16] presents a heuristic clustering method based on the analysis of matrices of differences. Paper [17] is devoted to the design of hybrid heuristics for solving clustering problems. The problem of clustering time series, and methods for solving it based on evaluating the degree of uncertainty of the parameters of autoregressive models, are given in [18]. Methods of automatic clustering based on the support vector method are the subject of papers [19, 20].
The accuracy of the solution of a clustering problem depends on the distribution of the objects in the domain under study, on the number of iterations needed to form and refine the clusters, and on other issues associated with the peculiarities of applying different methods.
Given the importance and urgency of the problem of determining the risks of HIV infection for certain groups of the population, it is expedient to design models and methods of clustering that would allow it to be solved efficiently.

The purpose and objectives of the study
The purpose of the research is to increase the efficiency of decision-making when determining clusters of migrant workers by the degree of risk of infection with the human immunodeficiency virus, through the development of models and methods of evolutionary clustering.
In accordance with the stated goal, the following tasks were set:
- to build a mathematical model of the clustering problem and adapt the method of evolutionary clustering of objects to define groups of migrant workers;
- to validate the designed method experimentally on the task of defining groups of people by socio-demographic characteristics in the GHR of migrant workers;
- to carry out a comparative analysis of the results of clustering the given GHR by different clustering methods.

Adaptation of the method of evolutionary clustering to the formation of groups of migrant workers
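Let us consider the problem of clustering objects in the following formulation. Let there be a set of objects $O = \{O_1, O_2, \ldots, O_n\}$, each described by a vector of feature values $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})$, where $x_{ij}$ is the value of the feature $X_j \in X$ for the corresponding object $O_i$. It is necessary to build a rule that maps the space of possible feature values $M(X)$ onto the set of clusters $\{1, 2, \ldots, K\}$. In other words, the problem of clustering is to define the indicator variable $s$ in the following way: $s(x_i) = k$ if $x_i$ belongs to the $k$-th of the $K$ regions of the space $M(X)$ that match the clusters.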
To solve the problem, we apply the method of evolutionary clustering EvoClast [8], which is based on the genetic algorithm and classically includes the following stages:
1. Determining the population of individuals $\Theta$, which are potential solutions to the problem of optimizing the objective function.
2. Carrying out the preliminary steps of the algorithm: determining the number of elements $l$ of the initial population, where $l \ll |\Theta|$; selecting the rule for normalizing the input data; choosing the methods of recombination, mutation and inversion and the corresponding probabilities.
3. For each element $\theta_i \in \Theta$, $i = \overline{1,l}$, computing the value of the objective function $f_i$.
We propose an algorithm for constructing the objective function to solve the problem formulated above.
Let us assume that all components of the vectors of characteristics of the objects, $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})$, $i = \overline{1,n}$, are numeric. Then, at the initial stage, we normalize the data by the formula

$$x_{ij} := \frac{x_{ij} - x_{j\min}}{x_{j\max} - x_{j\min}}. \tag{3}$$

After the conversion, the values of all data vectors lie in the unit hypercube $[0,1]^m$. Then we successively perform the following steps:
Step 1. Set the initial value of the fitness function: $F := 0$.
Step 2. For each element of the sample set $O$, successively perform the following steps. Let $i = 1$.
Step 3. Calculate the distances from the object $O_i$ to each of the $K$ clusters by the formula

$$d_k = \sqrt{\sum_{j=1}^{m} (x_{ij} - y_{kj})^2}, \quad k = \overline{1,K}, \tag{4}$$

where $(y_{k1}, y_{k2}, \ldots, y_{km})$ is the centre of the $k$-th cluster.
Step 4. Attribute the $i$-th object to the $q$-th cluster, where $q = \arg\min_{k = \overline{1,K}} d_k$.
Step 5. Adjust the value of the objective function by the rule $F := F + d_q$. Proceed to the next object: $i := i + 1$.
Step 6. If all the items from the set $O$ have been considered, that is, $i = n + 1$, then the calculation of the value of the function $F$ is completed; otherwise, proceed to Step 3.
The expression for the objective function can be written down in the following way:

$$F(y_{11}, \ldots, y_{Km}) = \sum_{i=1}^{n} \min_{k = \overline{1,K}} \sqrt{\sum_{j=1}^{m} (x_{ij} - y_{kj})^2}. \tag{5}$$
Then the problem of clustering is to find the minimum value of the objective function (5).
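The normalization (3) and Steps 1-6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `normalize` and `fitness`, the toy data, and the guard for constant features are introduced here only for the example.

```python
import math


def normalize(data):
    """Min-max scale every feature to [0, 1], following formula (3)."""
    m = len(data[0])
    lo = [min(row[j] for row in data) for j in range(m)]
    hi = [max(row[j] for row in data) for j in range(m)]
    # Assumption: a constant feature (hi == lo) is mapped to 0.0 to avoid
    # division by zero; the source text does not discuss this case.
    return [
        [(row[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
         for j in range(m)]
        for row in data
    ]


def fitness(centres, data):
    """Return (F, labels): the sum of distances from every object to its
    nearest cluster centre, and the index of that centre for each object."""
    total = 0.0                                        # Step 1: F := 0
    labels = []
    for x in data:                                     # Steps 2 and 6: visit every object
        dists = [math.dist(x, y) for y in centres]     # Step 3: Euclidean distances d_k
        q = min(range(len(centres)), key=dists.__getitem__)  # Step 4: nearest cluster
        labels.append(q)
        total += dists[q]                              # Step 5: F := F + d_q
    return total, labels


# Toy data (hypothetical): four objects with two numeric features.
toy = [[0, 10], [1, 10], [9, 30], [10, 30]]
scaled = normalize(toy)
F, labels = fitness([[0.0, 0.0], [1.0, 1.0]], scaled)
```

Returning the assignments together with F is a convenience of this sketch: the same pass that evaluates a candidate set of centres also yields the partition it induces.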

Experimental verification of the method of evolutionary clustering for the problem of defining subgroups of GHR (migrant workers)
To verify the modified method of evolutionary clustering experimentally, we studied the problem of defining subgroups of the GHR of migrant workers. The initial data are the results of a previously conducted sociological survey among labour migrants residing in the Zakarpattia Oblast of Ukraine, in which 561 representatives of the target group were polled.
The submitted questionnaires were analysed in the following areas: socio-demographic characteristics of the respondents (gender, age, residence location, education and marital status); the direction of labour migration (external labour migration to European countries and, separately, to the Russian Federation; internal labour migration to other regions of Ukraine); behavioural peculiarities (the practice of behaviour that is risky in terms of HIV infection).
Mathematically, the problem of selecting subgroups of GHR was presented as a clustering problem. To perform clustering, we used n = 561 questionnaires of migrant workers. The following socio-demographic characteristics were selected as the basis (m = 6): gender, age category, residence location, education, marital status and direction of labour migration. A snippet of the data is given in Table 1.
Each value of a qualitative characteristic was assigned a numeric value according to Table 2. After converting the qualitative characteristics into numerical equivalents and normalizing them by formula (3), we carried out clustering by the k-means algorithm and by the evolutionary algorithm described in the study for varying numbers of clusters ($K = \overline{3,8}$). For each partition, we calculated the value of the objective function F by formula (5). The results of the calculations are listed in Table 3 and in Fig. 1. According to the data from Table 3 and Fig. 1, the values of the objective function obtained by evolutionary clustering are lower than those obtained by the k-means method, which indicates that it is expedient to solve the clustering task for the group of labour migrants by the method of evolutionary clustering.
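The k-means side of such a comparison can be sketched as follows: a plain Lloyd-style k-means is run for K = 3, ..., 8 and the objective function (5) is evaluated on the resulting centres. Everything here is an assumption made for illustration: the data are synthetic random points rather than the survey data, and the function names `assign`, `objective` and `k_means` are hypothetical. Note that k-means itself minimizes the sum of squared distances, whereas (5) sums the unsquared distances, so k-means centres are not necessarily optimal for (5).

```python
import math
import random


def assign(data, centres):
    """Index of the nearest centre for every object."""
    return [min(range(len(centres)), key=lambda k: math.dist(x, centres[k]))
            for x in data]


def objective(data, centres):
    """Objective function (5): sum of distances to the nearest centre."""
    return sum(min(math.dist(x, y) for y in centres) for x in data)


def k_means(data, K, iterations=100, seed=0):
    """Lloyd-style k-means with centres initialised from K random objects."""
    rng = random.Random(seed)
    centres = [list(x) for x in rng.sample(data, K)]
    for _ in range(iterations):
        labels = assign(data, centres)
        new_centres = []
        for k in range(K):
            members = [x for x, q in zip(data, labels) if q == k]
            if members:
                # New centre: coordinate-wise mean of the cluster members.
                new_centres.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centres.append(centres[k])  # keep the centre of an empty cluster
        if new_centres == centres:              # converged
            break
        centres = new_centres
    return centres


# Synthetic stand-in for normalized data: 60 objects, m = 6 features in [0, 1).
rng = random.Random(1)
data = [[rng.random() for _ in range(6)] for _ in range(60)]
results = {K: objective(data, k_means(data, K)) for K in range(3, 9)}
```

The values in `results` could then be set beside the values of F reached by the evolutionary search for the same K, in the manner of Table 3.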
For the number of clusters K = 6, the results obtained by the method of evolutionary clustering are given in Table 4. Based on the obtained cluster centres and Table 2, it is possible to construct a socio-demographic portrait of the typical representative of each cluster. The division into clusters by socio-demographic characteristics of the representatives of the target group is presented in Table 5.
For further analysis and interpretation of the results of clustering, a system of characteristics $\{u_1, u_2, u_3, u_4, u_5, u_6, u_7\}$ that define the practice of HIV-infection risky behaviour was formed from the behavioural questionnaires of the migrant workers (Table 6). For each cluster, we calculated the percentage of its representatives (migrant workers) to whom these characteristics apply. The results are given in Table 7.
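The per-cluster percentages can be computed along the following lines. This is only a sketch: the function name `feature_shares`, the input layout (one cluster label and one set of observed features per respondent) and the four example respondents are hypothetical and do not reproduce the survey data.

```python
from collections import defaultdict


def feature_shares(labels, answers):
    """labels[i] is the cluster of respondent i; answers[i] is the set of
    behavioural features observed for that respondent, e.g. {"u1", "u3"}.
    Returns {cluster: {feature: percentage of the cluster with the feature}}."""
    counts = defaultdict(lambda: defaultdict(int))
    sizes = defaultdict(int)
    for q, feats in zip(labels, answers):
        sizes[q] += 1
        for u in feats:
            counts[q][u] += 1
    features = sorted({u for feats in answers for u in feats})
    return {q: {u: 100.0 * counts[q][u] / sizes[q] for u in features}
            for q in sizes}


# Hypothetical example: four respondents in two clusters.
labels = [1, 1, 2, 2]
answers = [{"u1"}, {"u1", "u2"}, set(), {"u5"}]
shares = feature_shares(labels, answers)
```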

Table 6 Descriptive characteristics of features that define the practice of HIV infection risky behavior of migrant workers
Table 7 Percentage of representatives of clusters for whom the characteristics that define the practice of HIV infection risky behavior are inherent
As presented in Table 7, HIV-infection risky behaviour is practiced by the smallest share of the representatives of the second and third clusters. Thus, only 13.2 % of the representatives of cluster 2 reported having casual sexual partners (feature $u_1$), which is the lowest indicator among all formed clusters. This cluster is also characterized by a low percentage of representatives who display the features of risky behaviour ($u_2$, $u_3$, $u_4$) and a relatively high proportion of representatives who display the features of the absence of risk of HIV infection ($u_5$-$u_7$). At the same time, the percentage of representatives of clusters 1 and 4 who display the features of risky behaviour ($u_1$-$u_4$) is much higher.
The analysis of the results allowed all migrant workers to be divided into three groups of clusters, in ascending order of the riskiness of their behaviour regarding HIV infection: Group 1, clusters with a relatively low risk of HIV infection; Group 2, clusters with moderate risk; and Group 3, clusters with high risk. As a result of the division, each group of clusters is homogeneous not only in the socio-demographic portraits of its representatives but also in the prevalence of the practice of HIV-infection risky behaviour. The division of clusters by socio-demographic characteristics into groups according to the degree of risk of HIV infection is given in Table 8.
As shown in Table 8, the group of clusters with high risk of infection includes the representatives of clusters 1 and 4, whose socio-demographic portrait is as follows: men aged 25-44 who have full secondary or higher education and are married. Within this group, the highest risk among rural residents is found for labour migrants who travel to Russia, and among the urban population for those who travel to the countries of Western Europe and to other regions of Ukraine. Representatives of clusters 2 and 3, which belong to the group with a relatively low risk of HIV infection, are residents of rural areas, married, with full secondary education. These are young (25-34 years of age) and mature (up to 45 years of age) women and men who travel to work in the countries of Western Europe.
Experimental verification of the adapted method of evolutionary clustering on the problem of forming clusters of migrant workers confirmed its efficiency. The advantage of the adapted method for this problem is its greater accuracy compared to the k-means method, as measured by the lower values of the objective function F.
Application of the proposed method made it possible to cluster the representatives of a group with a higher risk of HIV infection (migrant workers) more precisely by the principal socio-demographic characteristics and to form groups of clusters based on the degree of risk of HIV infection of their representatives. When planning preventive measures and interventions in programs against the HIV epidemic, this approach allows attention and resources to be concentrated on the most vulnerable subgroups of GHR, which are homogeneous in their socio-demographic characteristics. This is especially relevant given the limited financial and human resources for counteracting the HIV epidemic that are currently observed in Ukraine.
It is advisable to apply the adapted method of evolutionary clustering to the analysis of GHR in other regions of Ukraine, as well as to social groups of other kinds in general.

Conclusions
The problem of determining the risks of HIV infection for certain groups of the population was posed as a clustering problem. In the course of the study:
1. A mathematical model of the clustering problem was developed as the problem of building a rule that maps the set of possible values of characteristics onto a set of clusters, and the method of evolutionary clustering of objects was modified, which allowed objects to be divided into clusters using the evolutionary paradigm. The method was modified by constructing a fitness function in the form of an objective function that assigns each object to the cluster whose centre is nearest to it in Euclidean distance.
2. The modified method of evolutionary clustering was verified experimentally on the problem of defining subgroups by socio-demographic characteristics in the GHR of migrant workers, which allowed the representatives of this GHR to be divided into clusters according to the degree of risk of HIV infection. Three groups of clusters were formed by the degree of risk of HIV infection: a group of clusters with a relatively low risk, a group of clusters with moderate risk and a group of clusters with high risk. The representatives of this GHR whose socio-demographic features match the characteristics of clusters 1 and 4 (the group of clusters with high risk of HIV infection) potentially have a higher level of risk of infection and thus require focused prophylactic measures. The applied approach allows attention and resources to be concentrated on the most vulnerable category of persons in this GHR, which is the basis for improving the efficiency of combating HIV infection/AIDS among the representatives of this population group.
3. The results of clustering the given GHR by the k-means method and by the method of evolutionary clustering were compared. The comparison was performed using the values of the function F, which is the total sum of the distances from objects to the centres of the clusters to which they belong. A smaller value of this function corresponds to a better division into clusters. In the performed calculations, the evolutionary method yielded smaller values of F than the k-means method, which demonstrates its advantage for this problem.

4. With probabilities $p_i^l$ proportional to the values $f_i$, we select two individuals and carry out recombination to obtain two new individuals.
7. We form a new composition of the population $\Theta$ by deleting the individuals with the worst values of the objective function.
Set the value $K$, the number of clusters. Define $T = m \cdot K$, the number of elements in the vector-individual of the population. Then the objective function $F$ will depend on $T$ variables $(y_{11}, y_{12}, \ldots, y_{1m}, y_{21}, y_{22}, \ldots, y_{2m}, \ldots, y_{Km})$, where $(y_{k1}, y_{k2}, \ldots, y_{km})$ is the centre of the $k$-th cluster at the corresponding iteration ($k = \overline{1,K}$).
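The encoding of an individual as a vector of T = m · K centre coordinates, together with selection, recombination and the renewal of the population, can be illustrated by a deliberately simplified genetic search. This sketch is not EvoClast itself, and several of its choices are assumptions made here: selection weights of 1 / (1 + F) (because F is minimized), one-point recombination, mutation by re-drawing a gene uniformly from [0, 1), the omission of the inversion operator, and all parameter values and names (`evolve`, `pop_size`, `p_mut`). It also assumes normalized data and T ≥ 2.

```python
import math
import random


def fitness(individual, data, K):
    """Objective (5) for a flat vector holding K centres of m coordinates each."""
    m = len(data[0])
    centres = [individual[k * m:(k + 1) * m] for k in range(K)]
    return sum(min(math.dist(x, y) for y in centres) for x in data)


def evolve(data, K, pop_size=30, generations=200, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    T = len(data[0]) * K                     # T = m * K genes per individual
    # Initial population: random points of the unit hypercube.
    pop = [[rng.random() for _ in range(T)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind, data, K) for ind in pop]
        # Assumption: F is minimized, so a smaller F gets a larger weight.
        weights = [1.0 / (1.0 + s) for s in scores]
        a, b = rng.choices(pop, weights=weights, k=2)
        cut = rng.randrange(1, T)            # one-point recombination
        children = [a[:cut] + b[cut:], b[:cut] + a[cut:]]
        for child in children:               # mutation: re-draw some genes
            for g in range(T):
                if rng.random() < p_mut:
                    child[g] = rng.random()
        # New population: drop the individuals with the worst objective values.
        pop = sorted(pop + children, key=lambda ind: fitness(ind, data, K))[:pop_size]
    best = pop[0]
    return best, fitness(best, data, K)


# Toy normalized data (hypothetical): two visibly separated pairs of points.
data = [[0.0, 0.0], [0.1, 0.0], [0.9, 1.0], [1.0, 1.0]]
best, F = evolve(data, K=2)
```

Because the worst individuals are discarded after every generation, the best value of F in the population never increases from one generation to the next in this sketch.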

Fig. 1. Diagram of values of the objective function for the various methods of clustering

Feature: Description of the feature as a practice of HIV infection risky behavior
$u_1$: Existence of casual sexual partners over the last 12 months
$u_2$: Sexual relations with casual sex partners without the use of condoms in the region of work destination
$u_3$: Presence of past infections that are transmitted primarily sexually
$u_4$: Sexual relations with a casual sex partner without the use of condoms in the region of permanent residence
$u_5$: Absence of casual sex partners
$u_6$: Casual sexual relations with sex partners over the last 12 months, but sexual relations only with a condom
$u_7$: No sexual relations or all sexual relations only with a condom

The presence of characteristics $u_1$-$u_4$ indicates a high behavioural risk of HIV infection for the person. The same is indicated by the absence of features $u_5$-$u_7$.

Table 2
Correspondence of the characteristics of the socio-demographic portrait to their numeric equivalents

Table 1
Snippet of the socio-demographic characteristics of persons (migrant workers)

Table 3
Values of the objective function

Table 4
Results of clustering (centres of clusters)

Table 5
Clusters by the socio-demographic characteristics

Table 8
Grouping of persons (migrant workers) according to the degree of risk of HIV infection