VEHICLE ROUTING PROBLEM OPTIMIZATION WITH MACHINE LEARNING IN IMBALANCED CLASSIFICATION VEHICLE ROUTE DATA

The object of this research is a combinatorial optimization problem arising in the routing of goods delivery vehicles. The proposed method for solving this problem consists of several stages: data cleaning, data preprocessing, K-Nearest Neighbor (K-NN) classification, and a Capacitated Vehicle Routing Problem (CVRP) model. The results show that a machine learning approach can support the solution of combinatorial optimization problems, especially in generating vehicle route points and delivery capacity. Vehicle routes are determined from latitude and longitude points. This research builds a framework and implements it in a multi-class optimization model to reduce the overfitting and misclassification caused by imbalanced multi-class data, which arise from the number of ‘nodes’ on vehicle routes. The general purpose of the model is to gain an understanding of the mechanism of the problem so that imbalanced vehicle route data can be classified based on Jalur Nugraha Ekakurir (JNE) delivery routes. The model can then be used to determine vehicle routes subject to a capacity limit on the number of shipments. The machine learning and vehicle routing problem models were tested with K values of 11, 13, and 15, giving accuracies of 57.3265 % for K=11, 57.3265 % for K=13, and 81.8645 % for K=15. Among the odd K values tested, K=15 performs best, with 81.8645 % compared to K=11 and K=13. The developed capacitated vehicle routing problem model has an accuracy of 93.80 %, and the time series achieves an average precision of 93.31 % with a recall of 93.80 %. The results obtained can be useful in developing a more modern model, Capacitated Vehicle Routing Problem with


Introduction
Problems in industry and distribution services that are classified as combinatorial optimization include determining vehicle routes with the Vehicle Routing Problem [1]. The Vehicle Routing Problem is one of the most important issues in distribution transportation systems [2]. It can be defined as the problem of finding the optimal route from a central point to the destination points [3]. Vehicle routing is also one of the most important approaches adopted to improve the efficiency and growth of distribution systems [4]. It is an important problem in which a fleet of vehicles must deliver the requested goods to clients at minimal total cost [5].
Manufactured products require good distribution in order to reach customers, either directly from manufacturers to customers or indirectly through distributors [6]. Distribution of goods can also occur from distributors to customers. Whatever form distribution takes for a company, the most important thing is that it incurs the lowest possible cost [7]. Optimizing the route determination process in the Vehicle Routing Problem is a challenge that has attracted the attention of a number of researchers. Since it was first introduced, the Vehicle Routing Problem, which emerged from the traveling salesman problem, has remained the most structured and rigorous transportation model [4].
One study presents an electric vehicle routing problem that depends on the recharging time of the vehicle. The approach is divided into two stages: the first finds the best path and the second optimizes the route [8].
One method that can be used is the Vehicle Routing Problem (VRP). A vehicle can be identified by its color and license number for security, odd-even traffic control, and electronic payment systems for tolls and parking [9]. The VRP provides optimal routing so that the resulting distance or time is the shortest or fastest, making fuel use more efficient [10].
Vehicle Routing Problem optimization is usually combined with Machine Learning to improve machine learning algorithms. This research applies Vehicle Routing Problem optimization and Machine Learning to a multi-class imbalanced node data problem [11].
Machine learning is used for predictive analytics on the vehicle routing problem to obtain high-quality performance and meaningful information for all [12]. In one study, among supervised learning algorithms, K-Nearest Neighbor gave the best classification performance, at 90 % [13].
In classification, a dataset is said to be imbalanced when one class has a smaller amount of data than the other classes. The class with a larger amount of data is called the majority class, while the class with a small amount of data is called the minority class [14]. Imbalance in vehicle route classification has become a challenge and has attracted the attention of a number of researchers.
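The majority/minority definition above can be made concrete with a short sketch. The label values and the `class_distribution` helper are hypothetical and serve only to illustrate how an imbalance ratio might be computed; they are not taken from the JNE dataset.

```python
from collections import Counter

def class_distribution(labels):
    """Return (counts, majority_class, minority_class, imbalance_ratio)."""
    counts = Counter(labels)
    majority, n_major = max(counts.items(), key=lambda kv: kv[1])
    minority, n_minor = min(counts.items(), key=lambda kv: kv[1])
    # Ratio of the largest class size to the smallest class size
    return counts, majority, minority, n_major / n_minor

# Hypothetical route labels, for illustration only
labels = ["A"] * 8 + ["B"] * 3 + ["C"] * 1
counts, majority, minority, ratio = class_distribution(labels)
print(majority, minority, ratio)
```

A ratio well above 1 signals that a classifier may favor the majority class unless the imbalance is handled.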
In addition, by applying the principles of machine learning theory, this study contributes to the development of a vehicle routing problem model in which vehicle routes are classified based on capacity. Unlike purely combinatorial methods, this method can introduce different processes to generate a capacity model of the vehicle routing problem with machine learning.

Literature review and problem statement
ML approaches can be divided into three categories: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning methods come with input (training) data that include the desired target (response) variable [15]. However, further in-depth observations are still needed, which motivates studies aimed at deepening the data training process using vehicle route distance as input data. In [16] the data are «labeled», and the authors propose that machine learning models be used to predict the values of decision variables and the branching values of fractional variables. Supervised learning infers a general function that maps inputs to outputs based solely on training data. This function is then evaluated to predict outputs for new, unseen inputs. In contrast, unsupervised learning tries to «automatically» discover patterns or associations in unlabeled data (there is no predefined output as part of the input data) [17]. Labeling the data is therefore a priority in the early stages of classification in this study, because it makes it easier to train and test the data on which the machine learning classification is built.
The K-nearest neighbor (KNN) algorithm is one of the most popular data mining approaches. Its main premise is that if all K nearest neighbors of a point in the training set belong to the same category, the point can be assumed to share their properties and qualities [18]. By using raw data features, KNN avoids difficult equation-solving procedures and focuses on correlation. The performance of the prediction results can be determined by applying KNN to the training set [19]. Exploring the value of K is therefore a priority in testing the capacity model of the vehicle routing problem, since the value of K tested determines the accuracy of the vehicle route visualization.
This method works by applying the most similar past examples to new data. The Euclidean distance, for example, can be used to determine how similar two records are. Once the neighbors have been found, their average can be used to make a summary prediction [19]. Distance testing is a focus of this study: latitude and longitude are determined first, and the Euclidean distance is then computed to produce the next route point in the delivery of goods.
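The neighbor search described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the coordinates, the labels `zone_a` and `zone_b`, and the function names are hypothetical. It shows both uses mentioned in the text, a majority vote for classification and an average for a numeric prediction.

```python
import math
from collections import Counter

def k_nearest(train_points, query, k):
    """Indices of the k training points closest to query (Euclidean distance)."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    return order[:k]

def knn_classify(train_points, train_labels, query, k):
    """Majority label among the k nearest neighbors."""
    votes = Counter(train_labels[i] for i in k_nearest(train_points, query, k))
    return votes.most_common(1)[0][0]

def knn_average(train_points, train_values, query, k):
    """Mean target value of the k nearest neighbors (KNN regression)."""
    idx = k_nearest(train_points, query, k)
    return sum(train_values[i] for i in idx) / len(idx)

# Hypothetical (latitude, longitude) points with made-up labels and values
points = [(3.58, 98.66), (3.59, 98.67), (3.60, 98.68), (3.70, 98.80), (3.71, 98.81)]
labels = ["zone_a", "zone_a", "zone_a", "zone_b", "zone_b"]
values = [10, 20, 30, 100, 110]
query = (3.585, 98.665)

predicted_label = knn_classify(points, labels, query, k=3)
predicted_value = knn_average(points, values, query, k=3)
```

In the study the tested values are K = 11, 13, and 15; a small K is used here only because the toy dataset has five points.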
As noted in [20], the core premise of the K-nearest neighbor algorithm is that if all of a point's K closest neighbors in the training set belong to the same category, the point may be assumed to have the same traits and qualities. The study in [21] proposes an efficient classification approach that combines opposition-based learning and k-nearest neighbor for feature selection and classification. The proposed algorithm aims to overcome the drawback of data imbalance. Machine learning with the K-Nearest Neighbor method still has problems in classifying unlabeled data, so in this study the initial stage determines the labels and the latitude and longitude distances. The testing stage also requires evaluation of the K value so that the resulting distances match the initial vehicle route point, or depot.
The KNN regression model operates by determining the distance between a new observation and all the observations present in the training data. The most common distance metric is the Euclidean distance [22]. However, the distance matrix is still a concern because testing produces distance points from each route record. The Euclidean distance matrix is one of the tests that can be implemented, and it supports visualization of vehicle route data.
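A Euclidean distance matrix over route points might be built as follows. This is a sketch under the assumption that coordinates are treated as planar, which is only an approximation for latitude and longitude; the `distance_matrix` name and the sample points are illustrative.

```python
import math

def distance_matrix(points):
    """Symmetric matrix of pairwise Euclidean distances between points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)]
            for i in range(n)]

# Illustrative planar points; index 0 plays the role of the depot
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
d = distance_matrix(points)
for row in d:
    print(row)
```

Each entry d[i][j] can then serve as the distance between node i and node j in a routing model.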
In addition, by applying the principles of machine learning theory, this research contributes to the development of a capacity model of the vehicle routing problem with machine learning, which classifies vehicle routes based on delivery capacity. The customized phases of the research (beginning, middle, and end) have shown promising results in improving the classification process. Unlike the standard vehicle routing problem method, this model determines vehicle routes in three stages: the initial stage determines latitude and longitude; the middle stage determines the route point label and capacity using the K value in K-Nearest Neighbor; and the final stage performs visualization with the vehicle routing problem capacity model with machine learning.

The aim and objectives of the study
The aim of this research is to optimize the combinatorial vehicle routing problem using a machine learning approach. This optimization will improve the performance and accuracy of the vehicle routing problem model.
To achieve this aim, the following objectives are accomplished:
- to build a vehicle routing problem model with machine learning;
- to evaluate the performance of the machine learning approach in order to find the optimal value of K;
- to visualize the modelling of the vehicle routing problem by determining the maximum number of goods to be transported in order to better route the vehicles.

Materials and Methods
The theory of the combinatorial Vehicle Routing Problem is currently being developed for various research purposes, but there is still no available reference on applying a Vehicle Routing Problem model with a Machine Learning approach. This model aims to solve the problem of vehicle routes in the delivery of goods based on the capacity of goods transported from the depot to the destination locations. Taking inspiration from the concept of CVRP, an effort is made to adapt it to find the best, most efficient, and most effective Vehicle Routing Problem model. The inspiration for this theory is illustrated in Fig. 1.
The methodology is tested to obtain an optimization model for imbalanced vehicle route data using the Vehicle Routing Problem and Machine Learning, so that an optimization model for computer science analysis is built. Data collection used JNE vehicle routes for the delivery of goods in the city of Medan in 2023, with several variables for determining the vehicle route. This research conducted tests using the values K = 11, K = 13, and K = 15:
- datasets.
This study uses data on JNE vehicle routes for delivering goods in the city of Medan in 2023, with several variables for determining vehicle routes. The selected dataset includes the recipient's name, recipient's address, sub-district, latitude, and longitude. A detailed description of this dataset can be seen in Table 1. The design of the vehicle routing problem model with machine learning follows the core structure of the algorithm. The steps involved are as follows:
Step 0: determine all latitude and longitude distance data by (1).
Step 1: assign all distances based on the delivery depots according to the delivery distance with (1)-(3).
Step 2: calculate each data point by (4)-(8).
The KNN regression model operates by determining the distance between a new observation and all the observations present in the training data. The most common distance metric is the Euclidean distance [22]:

$$ d(x_i, x_j) = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}, \qquad (1) $$

where $d(x_i, x_j)$ is the distance, $p$ is the number of data attributes, $x_{ik}$ is attribute $k$ of data point $i$, and $x_{jk}$ is attribute $k$ of cluster center $j$. The Euclidean distance measures the degree of similarity between data points;
- vehicle routing problem.
A CVRP can be formulated as an integer linear programming model. The total distance of the routes, with all customer demands met, should be minimized. The binary variable $x_{ijk}$ has a value of 1 if the arc from node $i$ to node $j$ is in the optimal route and is driven by vehicle $k$ [23]:

$$ x_{ijk} \in \{0, 1\}, \quad \forall k \in \{1, \dots, p\}, \; i, j \in \{1, \dots, n\}. \qquad (2) $$

This integer linear programming formulation is used to determine the depot, or coordinate point, of the vehicle route.
There is no travel from a node to itself:

$$ x_{iik} = 0, \quad \forall k \in \{1, \dots, p\}, \; i \in \{1, \dots, n\}. \qquad (3) $$

Formula (3) excludes arcs that lead from a node back to itself.
The parameter $d_{ij}$ describes the distance from node $i$ to node $j$. There are $n$ nodes (depot = 1) and $p$ vehicles. The objective function can be formulated as follows:

$$ \min \sum_{k=1}^{p} \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij} x_{ijk}. \qquad (4) $$

Formula (4) accumulates the distance from the depot or node to each node to be reached.
Every node should be entered and left once (except for the depot) and by the same vehicle. The depot should be left and entered once by each vehicle. $q_j$ describes the demand of customer $j$ and $Q$ is the capacity of the vehicles. The sum of the demands of all customers that vehicle $k$ serves should not exceed the capacity of vehicle $k$. The constraints are as follows:
- vehicle leaves node that it enters. Ensure that the number of times a vehicle enters a node is equal to the number of times it leaves that node:

$$ \sum_{i=1}^{n} x_{ijk} = \sum_{i=1}^{n} x_{jik}, \quad \forall j \in \{1, \dots, n\}, \; k \in \{1, \dots, p\}; \qquad (5) $$

- ensure that every node is entered once:

$$ \sum_{k=1}^{p} \sum_{i=1}^{n} x_{ijk} = 1, \quad \forall j \in \{2, \dots, n\}. \qquad (6) $$

Together with constraint (5), this ensures that every node is entered only once and is left by the same vehicle;
- every vehicle leaves the depot:

$$ \sum_{j=2}^{n} x_{1jk} = 1, \quad \forall k \in \{1, \dots, p\}. \qquad (7) $$

Together with constraint (5), this implies that every vehicle arrives again at the depot;
- capacity constraint. Respect the capacity of the vehicles. Note that all vehicles have the same capacity:

$$ \sum_{i=1}^{n} \sum_{j=2}^{n} q_j x_{ijk} \le Q, \quad \forall k \in \{1, \dots, p\}. \qquad (8) $$

The above constraints are formulated in the Common Constraints and Variables section of the Capacitated Vehicle Routing Problem library.
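The constraints described above can be illustrated with a small feasibility check on a candidate solution. This is a sketch rather than a solver: the function names, the toy distance matrix, and the demands are hypothetical, and the depot is indexed 0 here (Python convention) instead of 1 as in the text. Each route is a list of nodes that starts and ends at the depot.

```python
def route_distance(route, dist):
    """Sum of arc distances along one route."""
    return sum(dist[a][b] for a, b in zip(route, route[1:]))

def check_cvrp(routes, dist, demand, capacity, depot=0):
    """Return the total distance if all constraints hold, else raise ValueError."""
    visited = []
    for route in routes:
        # Every vehicle leaves the depot and arrives back at it
        if len(route) < 3 or route[0] != depot or route[-1] != depot:
            raise ValueError("each vehicle must leave and re-enter the depot")
        customers = route[1:-1]
        if depot in customers:
            raise ValueError("the depot may only start and end a route")
        # No travel from a node to itself
        if any(a == b for a, b in zip(route, route[1:])):
            raise ValueError("no travel from a node to itself")
        # Capacity constraint, same capacity for all vehicles
        if sum(demand[c] for c in customers) > capacity:
            raise ValueError("vehicle capacity exceeded")
        visited.extend(customers)
    # Every customer node is entered exactly once, by exactly one vehicle
    expected = [i for i in range(len(dist)) if i != depot]
    if sorted(visited) != expected:
        raise ValueError("every customer must be visited exactly once")
    # Objective: total distance over all vehicles
    return sum(route_distance(r, dist) for r in routes)

# Toy instance, for illustration only
dist = [[0, 2, 3, 4],
        [2, 0, 1, 5],
        [3, 1, 0, 2],
        [4, 5, 2, 0]]
demand = [0, 3, 4, 5]
capacity = 8
total = check_cvrp([[0, 1, 2, 0], [0, 3, 0]], dist, demand, capacity)
```

A single route serving all three customers would carry a demand of 12 and is rejected by the capacity check.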

Results of Vehicle Routing Problem Optimization with Machine Learning

1. Adapting the Concept of Model Vehicle Routing Problem with Machine Learning
This section presents the research results together with a comprehensive discussion. The results are presented in figures, graphs, and tables to make them easier for the reader to understand [24, 25]. The discussion is organized in several subsections. The research flowchart is shown in Fig. 1.
The research flow chart in Fig. 1 gives a general description of the research to be carried out. The flow chart is explained as follows:
1. Data collection: collecting data on the routes of goods distribution vehicles for the last 3 years.
2. Data preprocessing: data cleaning, which is the process of identifying and correcting or removing errors, inconsistencies, and inaccuracies in a dataset.
3. Model building: building an optimization model for imbalanced vehicle route data using the Vehicle Routing Problem approach with Machine Learning based on the specified parameters.
4. Model evaluation: evaluating the model using evaluation metrics such as accuracy, precision, recall, and F1-score.
5. Result analysis: analyzing the results of testing and model evaluation to determine the effectiveness of using the Vehicle Routing Problem with Machine Learning in optimizing imbalanced data.
6. Obtaining the optimization results for imbalanced vehicle route data.
7. Obtaining a novel model from the use of the Vehicle Routing Problem and Machine Learning.
The research methodology is shown in Fig. 2.
The research flow consists of several stages, each with a goal that must be achieved to meet its milestone so that the research can continue. The stages of the research flow are as follows:
- data understanding. Data understanding is carried out while collecting vehicle route data, analyzing the data, and evaluating data quality in order to become familiar with the data and gain initial insight. The data source used in this research is the daily route data of JNE goods distribution vehicles up to the point where the goods reach the recipient;
- data preprocessing. Data preprocessing is the initial process that transforms the input data into a suitable format that is ready for processing. Examples of preprocessing operations include merging, transforming, reducing, and discretizing;
- modeling machine learning with the vehicle routing problem. Modeling is carried out according to the chosen modeling technique, which is applied to the preprocessed dataset to address the appropriate needs. This research uses classification techniques, applying the capacitated vehicle routing problem with machine learning.
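The evaluation metrics named in the flow above (accuracy, precision, recall, and F1-score) can be computed for a multi-class problem as sketched below. Macro-averaging over classes is an assumption made for this illustration, since the averaging scheme is not stated in the text; the labels and the `classification_metrics` name are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 over all classes."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Hypothetical true and predicted route labels
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "c"]
metrics = classification_metrics(y_true, y_pred)
```

With imbalanced classes, macro-averaging gives each class equal weight, so poor performance on a minority class is not hidden by the majority class.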

2. Evaluating the performance of Vehicle Routing Problem with Machine Learning
In determining vehicle routes with the vehicle routing problem capacity model, data on goods distribution routes are used with 21 labels, 2 variables, and 1 output. At this stage, data understanding is carried out while collecting and analyzing vehicle route data.
The researcher starts by collecting the initial data, then cleans the data and combines them into one dataset. The collected data are examined, and this stage provides an analytical foundation for the study by identifying potential problems in the data. The data in this study were obtained from the JNE freight forwarding dataset, with 5 attributes and 3000 records per day. These data can be seen in Table 2. The data visualization based on longitude and K value is shown in Fig. 3.
Fig. 3 shows the processed data after latitude and longitude have been determined: the result of data cleaning, followed by the Google Maps-based location data of the areas to which goods will be sent, visualized by the latitude and longitude of the delivery points.

3. Visualization in modelling the capacitated vehicle routing problem with machine learning
Before visualization, data preprocessing is carried out: an initial process that converts the input data into a format that is ready for processing. Preprocessing prepares raw data for transformation into the data format needed by users, and it is the last step before the algorithm model is created. In this research, the preprocessing techniques used are cleansing, data aggregation, data checking, and checking for missing values. The data generated from the JNE dataset still contain imbalanced data that would interfere with the vehicle route classification process, so preprocessing is needed to filter and clean them. The following are the results of the preprocessing process. The visualization for K = 11 is shown in Fig. 4, for K = 13 in Fig. 5, and for K = 15 in Fig. 6.
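The preprocessing techniques listed above (cleansing, missing-value checks, and aggregation) could be sketched as follows. The field names mirror the five dataset attributes mentioned in the text, but their exact spelling, the sample records, and the helper names are assumptions made for illustration.

```python
from collections import Counter

# Assumed field names for the five attributes of a delivery record
REQUIRED = ("name", "address", "sub_district", "latitude", "longitude")

def clean_records(records):
    """Drop records with missing fields, invalid coordinates, or duplicates."""
    cleaned, seen = [], set()
    for rec in records:
        # Missing-value check
        if any(rec.get(field) in (None, "") for field in REQUIRED):
            continue
        # Data check: coordinates must be numeric and within valid ranges
        try:
            lat, lon = float(rec["latitude"]), float(rec["longitude"])
        except (TypeError, ValueError):
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue
        # Cleansing: skip exact duplicates of an already accepted record
        key = (rec["name"], rec["address"], lat, lon)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({**rec, "latitude": lat, "longitude": lon})
    return cleaned

def count_by_sub_district(records):
    """Aggregation: number of deliveries per sub-district."""
    return Counter(rec["sub_district"] for rec in records)

# Hypothetical records, for illustration only
raw = [
    {"name": "R1", "address": "Street 1", "sub_district": "X", "latitude": "3.59", "longitude": "98.67"},
    {"name": "R1", "address": "Street 1", "sub_district": "X", "latitude": "3.59", "longitude": "98.67"},
    {"name": "R2", "address": "Street 2", "sub_district": "Y", "latitude": None, "longitude": "98.70"},
    {"name": "R3", "address": "Street 3", "sub_district": "Y", "latitude": "abc", "longitude": "98.71"},
    {"name": "R4", "address": "Street 4", "sub_district": "Y", "latitude": "3.61", "longitude": "98.72"},
]
cleaned = clean_records(raw)
per_district = count_by_sub_district(cleaned)
```

The per-district counts also give a quick view of how imbalanced the route labels are after cleaning.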
The test results are shown in Table 3.
Based on the test results in Table 3

Discussion of the results of vehicle routing problem optimization with machine learning in imbalanced classification of vehicle route data
The study results come from optimizing the route data of JNE goods delivery vehicles in the Vehicle Routing Problem with a Machine Learning approach in several stages: data cleaning, data understanding, and the Vehicle Routing Problem model with Machine Learning.
This study found that modelling with a machine learning approach in the classification of imbalanced vehicle route data can produce latitude and longitude points, as shown in Fig. 3. K-Nearest Neighbor modelling classifies locations by sub-district area. The K points from the classification results serve as the points that determine the vehicle route from the depot for distributing goods. In Fig. 3, the classified areas are visualized so that the delivery route can be seen to distribute goods based on the closest distance.
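Routing by closest distance under a capacity limit can be illustrated with a greedy nearest-neighbor heuristic. This heuristic is a swapped-in illustration and is not claimed to be the procedure used in the study; the coordinates, demands, and the `greedy_routes` name are hypothetical.

```python
import math

def greedy_routes(depot, stops, demand, capacity):
    """Build routes by repeatedly visiting the closest unserved stop that still fits."""
    unserved = set(range(len(stops)))
    routes = []
    while unserved:
        position, load, route = depot, 0, []
        while True:
            # Stops whose demand still fits in the current vehicle
            candidates = [i for i in unserved if load + demand[i] <= capacity]
            if not candidates:
                break
            nxt = min(candidates, key=lambda i: math.dist(position, stops[i]))
            route.append(nxt)
            load += demand[nxt]
            position = stops[nxt]
            unserved.remove(nxt)
        if not route:
            raise ValueError("a stop's demand exceeds the vehicle capacity")
        routes.append(route)  # the vehicle returns to the depot after this route
    return routes

# Illustrative planar coordinates and demands
depot = (0.0, 0.0)
stops = [(1.0, 0.0), (2.0, 0.0), (0.0, 5.0), (0.0, 6.0)]
demand = [2, 2, 2, 2]
routes = greedy_routes(depot, stops, demand, capacity=4)
```

A greedy construction of this kind gives a feasible starting point but does not guarantee the minimum total distance.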
The test results of the machine learning approach, shown in Table 3, indicate a good classification of the data. This research demonstrates a combinatorial approach to the vehicle routing problem of goods distribution. Further research and experimentation will be needed to explore the combinatorial problem of vehicle routing. Overall, the results were obtained by optimizing the vehicle routing problem with machine learning. Further study and comparison with existing research can provide more comprehensive insight into the potential advantages and limitations of the Vehicle Routing Problem in optimizing the capacity of goods transported with a machine learning approach for different types of data. This research is also relevant to [3], which presents a method to determine the optimal Capacitated Vehicle Routing Problem using classification techniques. In this research, the machine learning model is used in two ways: to determine the vehicle route points and to classify vehicle route data based on carrying capacity. It is also related to the research in [4] on modeling machine learning with vehicle routing problems. This research differs from previous research in that the authors model the classification of vehicle route data with machine learning based on the capacity of vehicle routes from the depot within the vehicle routing problem. The Capacitated Vehicle Routing Problem model with Machine Learning has an accuracy of 0.94, a precision of 0.93, and a recall of 0.98, whereas previous research reports only 0.80 [5].
The limitation of this research is that the tests use only goods distribution data based on the recipient's address in the form of distance data, not real-time location data from the recipient's maps. To address this shortcoming, future work is expected to examine other types of datasets, such as real-time vehicle route data and other datasets that can be further developed.

Conclusions
1. The proposed model allows vehicle route data to be classified based on the delivery carrying capacity at the delivery depot point. This differs from existing models, which determine vehicle routes based only on the delivery location points. In addition, the dynamic concept of the model is applied based on the goods carrying capacity parameters.
2. Testing the vehicle routing problem model with machine learning gave accuracies of 57.3265 % for K = 11, 57.3265 % for K = 13, and 81.8645 % for K = 15. Based on these test results, K = 15 has the best accuracy. To determine the effectiveness of the proposed method, the model was also evaluated overall, and the test results show an average accuracy of 93.80 %, precision of 93.31 %, and recall of 93.80 %.
3. Data visualization of the model is presented using 3000 daily delivery vehicle route records by testing odd K values in the K-Nearest Neighbor classification. In this classification, K = 15 has an accuracy of 81.8645 %.

Fig. 1. Research Flow Chart
The visualization results in Fig. 4-6 are based on the equation steps of (1)-(8):
- (1) determines the distance based on latitude and longitude;
- (2) determines the distance based on the depot;
- (3) determines the transport capacity of goods delivery vehicle routes.

Table 1
Data Collection of Vehicle Route Data

Table 2
Vehicle Route Dataset

Table 3
Testing Results Model K Value With VRP