Use of Big Data for Actualization of Approaches to Road Accident Analysis

Deaths and injuries among road users are one of the biggest problems holding back the development of society and socio-economic progress. The price of human life is too high to neglect even the slightest opportunity to save it. Therefore, the object of research is the huge amount of information that modern society generates, known under the general concept of Big Data. For highways and streets, Big Data means arrays of information about the road and street network, the design decisions applied to it, its operational status, traffic conditions, the interaction of pedestrian and traffic flows, and the like.
The study used Big Data from road owners, suppliers of cartographic and navigation systems, intelligent transportation systems and law enforcement. For each Big Data source, the methods of collection and processing, the scope, the degree of selectivity and the accuracy of the measurements are evaluated.
The results confirm that the main indicator characterizing the influence of road conditions, the technical condition of the car and psycho-physiological factors on the driver is speed, both of individual vehicles and of traffic flows, over a certain period of time and on a selected road section. The proposed approach relies on the fact that speeds can be established with a high degree of reliability from Big Data, in a form suitable for machine processing. Big Data is not just a source of information: it makes it possible to track trends, assess risks and make forecasts.
The obtained results indicate that Big Data can and should be used to describe traffic conditions and analyze the behavior of road users, in particular to better understand the interaction of factors in the occurrence of road traffic accidents (RTAs) and, as far as possible, to prevent emergencies and/or reduce the severity of the consequences of traffic accidents. Thus, Big Data can be used to update the current approaches to determining the concentration of traffic accidents and the existing methods for assessing the impact of road conditions on road safety.


Introduction
According to statistics [1] published by the World Health Organization, in 2016, one person died on roads and streets every 24 seconds in traffic accidents.
The analysis of traffic accidents on a global scale [1] indicates the following (Fig. 1):
- road traffic injury is the 8th of the 10 most common causes of death;
- road traffic injury is now also the leading cause of death for children and young adults aged 5 to 29 years;
- 54 % of those killed are vulnerable road users (28 % are motorcyclists, 23 % are pedestrians, 3 % are cyclists);
- high-income countries, in which 40 % of the world's vehicles are registered, account for 7 % of all road traffic deaths;
- low-income countries, in which 1 % of the world's vehicles are registered, account for 13 % of all road traffic deaths.

ISSN 2664-9969
In addition, according to the estimates given in [2], almost 90 % of all deaths due to traffic accidents occur in middle- and low-income countries.
A comparison of the data (Fig. 2) on the number of traffic accidents, for example, in Ukraine and in the Member States of the European Union (Fig. 3) shows that in recent years the number of traffic accidents has not only stopped decreasing, but in some years has even tended to grow.
The onset of such «stabilization» may indicate the need for qualitative transformations and changes designed to improve road safety. Reducing the risk of traffic accidents requires determination and informed decision-making by the senior leadership of the state, industry, non-governmental and international organizations. It also requires the cooperation and interaction of specialists and experts, including road designers, vehicle designers, law enforcement agencies, medical professionals, journalists, scientists and educators, public organizations, opinion leaders and individual road users. Powerful information campaigns are important for raising awareness of this issue and for motivating officials and individuals to take appropriate measures and comply with applicable laws, as is the adoption of new regulations or the amendment of those that have proved ineffective.
In addition, the approaches to the analysis of traffic accidents on roads and streets, and the methods for assessing the impact of road conditions, the technical condition of the car and the psycho-physiological state of the driver on road safety, need updating. Therefore, research aimed at studying the masses of information about roads and traffic flows, tracking trends, assessing risks and making forecasts, as well as gaining new knowledge about traffic accidents that will be used to improve road safety, is relevant.
Electronic copy available at: https://ssrn.com/abstract=3681353

The object of research and its technological audit
The object of research is the huge amount of information that modern society generates, known under the general concept of Big Data. For highways and streets, Big Data means arrays of information about the road and street network, the design decisions applied to it, its operational status, traffic conditions, the interaction of pedestrian and traffic flows, and the like. The acquired knowledge will be used to analyze the traffic accident rate and to determine the places of concentration of traffic accidents (the so-called black spots), in order to create new approaches or improve existing ones.
Currently, the generally accepted approaches to determining the concentration of traffic accidents worldwide come down to a comparative analysis based on the road traffic accident rate, that is, the number of traffic accidents in which people were killed or injured over a certain period of time. The severity of traffic accidents is measured by the number of deaths per 100 victims. In addition, relative indicators are used, namely the number of road deaths per 100 thousand citizens (risk) or per 10 thousand vehicles (transport risk).
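The relative indicators named above are simple ratios, and a minimal sketch of their calculation may help when comparing regions. The function name, the dictionary keys and the input figures below are illustrative assumptions and are not taken from the study; in particular, «victims» is assumed here to mean the killed plus the injured.

```python
def relative_indicators(deaths, injured, population, vehicles):
    """Return the relative accident indicators for one region and one period.

    All inputs are plain counts. Names and structure are hypothetical.
    """
    victims = deaths + injured  # assumption: victims = killed + injured
    return {
        # severity: deaths per 100 victims
        "severity_per_100_victims": 100.0 * deaths / victims,
        # risk: deaths per 100 thousand citizens
        "risk_per_100k_citizens": 100_000.0 * deaths / population,
        # transport risk: deaths per 10 thousand vehicles
        "transport_risk_per_10k_vehicles": 10_000.0 * deaths / vehicles,
    }


# Made-up figures for illustration only, not data from the study.
example = relative_indicators(
    deaths=50, injured=450, population=1_000_000, vehicles=250_000
)
for name, value in example.items():
    print(f"{name}: {value:.1f}")
```

Because all three indicators are normalized, they can be compared between regions of different size, but none of them says where on the network the accidents occur.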
In addition, mainly in the post-Soviet space, a method of assessing the impact of road conditions on road safety based on the methodology of traffic accident rates [6] is used. The degree of danger of a particular road section is characterized by a total accident rate equal to the product of individual coefficients, each of which takes into account the influence of an individual road element or traffic characteristic. Such elements and characteristics, as a rule, include:
- traffic intensity;
- the number of lanes;
- carriageway width;
- shoulder width;
- longitudinal slopes;
- radii of horizontal curves;
- visibility of the road in plan and in longitudinal profile;
- carriageway width of bridges relative to the carriageway width of the road;
- lengths of straight sections;
- types of at-grade intersections or junctions and the visibility of vehicles at them;
- distance from buildings to the carriageway;
- evenness of the pavement, the coefficient of adhesion, and the like.
The results of determining the traffic accident rates are presented in the form of line graphs.
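The multiplicative structure of the total accident rate can be sketched as follows. This is only an illustration of the product described above: the coefficient names and values are hypothetical and are not taken from the methodology [6], which defines its own tables of coefficients.

```python
from math import prod


def total_accident_rate(coefficients):
    """Multiply the individual coefficients of one road section.

    coefficients: mapping of element name -> individual coefficient.
    """
    values = list(coefficients.values())
    if not values:
        raise ValueError("at least one coefficient is required")
    return prod(values)


# Hypothetical coefficients for one road section (illustration only).
section = {
    "traffic_intensity": 1.2,
    "number_of_lanes": 1.0,
    "shoulder_width": 1.5,
    "curve_radius": 2.0,
}
print(round(total_accident_rate(section), 2))
```

A consequence of the product form is that a single unfavourable element scales the whole rate, while a coefficient of 1.0 leaves it unchanged.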
One of the main weaknesses is that most current approaches are reactive: they are applied in response to an increase in the traffic accident rate and in the severity of the consequences of traffic accidents on a particular road or street section. Moreover, even a proactive (preventive) approach basically boils down to analyzing the applied design decisions in order to give preference to those whose use has been shown by experience to have a positive impact on road safety.
Given that the presence of a knowingly dangerous element or design decision on a road or street section is not always the cause of a traffic accident, a multivariate analysis is needed of the factors that have a decisive influence on the occurrence of an emergency and of a traffic accident as its catastrophic consequence.

The aim and objectives of research
The aim of research is to create a new indicator or index with which it is possible to assess the level of road safety, or to update the existing dependencies.
To achieve this aim, it is necessary to complete the following objectives:
1. Search for and determine the factors that have a decisive influence on the occurrence of an emergency.
2. Develop a scientific method for collecting and analyzing data.

Research of existing solutions of the problem
Researchers have always paid great attention to the theoretical foundations and modeling of the movement of vehicles and traffic flows, for example, in [7,8]. It should be noted that a significant part of such studies considers road safety rather one-sidedly, from the point of view of maximum traffic flow efficiency: in the shortest time, along the shortest route and at the highest possible speed.
Research [9] analyzes GPS tracker readings to study the driving behavior and driving style of 27 drivers on a 10 km long section of the SS106 road in Southern Italy. Three types of driving behavior are proposed, based on three characteristic speed ranges. It was found that driving behavior becomes safer with age and growing experience; that is, young people with little driving experience are more prone to dangerous driving in terms of exceeding the permitted speed. At the same time, older drivers sometimes prefer low-speed driving, which can also endanger road users.
Research [10] has a similar goal but is based on interviews with 139 drivers of different age groups about their propensity to speed and to make impulsive decisions in general. Among its findings, current approaches to improving road safety, such as education and penalties, are not effective for «adventure seeker» drivers. For this category, some form of age-related restriction should be applied, such as a differentiated approach to issuing driver's licenses.
According to the generalized research results [11,12], the main cause of traffic accidents is the human factor (Fig. 4), which may correspond to the dry wording in a traffic accident report: «the driver did not take into account the traffic situation and did not choose a safe speed».
Studies [13] conducted in Denmark on 291 traffic accidents that occurred over more than 10 years show the following distribution of the factors that caused them (Table 1).
The research [14] is based on a huge database of continuous 3-year observations of more than 3,500 drivers using DVRs and sensors. It concludes that driver distraction by extraneous stimuli or actions (Table 2), especially the use of personal electronic devices, caused 618 (68.3 %) of 905 traffic accidents with injuries and/or property damage. Thus, eliminating driver distraction factors could prevent 4 out of the 11 million traffic accidents that take place in the USA annually.
Many years of extensive scientific research [15] indicate that the successful implementation of the latest driver assistance systems, dynamic traffic management systems (intelligent transport systems, ITS) or improvements to road structures largely depends on how far road users can or want to accept these innovations. This requires knowledge about the performance and behavior of people in complex and dynamic environments, since road users are the final consumers of ITS. At all stages of ITS development (from conceptual design to large-scale implementation), empirical research is necessary to study issues such as driver behavior, human-machine interaction, workload and acceptability.
These findings are consistent with the Vision Zero philosophy, which has become widespread in many countries and holds that by creating a safe transport system it is possible, if not to prevent traffic accidents, then at least to compensate for inevitable human errors so that they do not lead to death or serious injury.
The studies [16] are based on statistics on traffic accidents that occurred in Poland between 2010 and 2016 and are aimed at developing a mathematical model for predicting the number of traffic accidents on a road network. The broader goal is, as far as possible, to prevent emergencies and/or reduce the severity of the consequences of traffic accidents.

Methods of research
To study Big Data, a comparison method was used. For this purpose, Big Data was classified by source of origin, by the number of parameters that describe the road, driver and driving conditions, and by scope and frequency of updates. Special attention was paid to the possibility of checking the reliability of some Big Data sources with the help of others.

Research results
An analysis of the sources of origin (Table 3) shows that a single global source of Big Data on traffic conditions and traffic flows does not exist and, evidently, cannot exist. Therefore, to obtain a holistic view of the interaction of all related factors and their influence on the occurrence of emergency situations, mechanisms for consolidating the accumulated data need to be found or developed, so that the data are as suitable as possible for making managerial decisions to improve traffic safety.
An analysis of the distribution of speeds along the road, combined with an analysis of the traffic conditions that caused them, can be the key to establishing the causes of emergencies. This takes into account that a driver's behavior derives from temperament, time spent driving, driving conditions and vehicle characteristics, and manifests itself in the manner of control and the choice of driving speed.
In addition, driver behavior modeling [8] makes it possible to determine a hypothetical driver comfort zone (Fig. 5), bounded by the curves of acceptable acceleration (when the driver presses the accelerator pedal) and deceleration (when the driver presses the brake pedal). The origin of the graph in Fig. 5 represents a potential danger or obstacle, and the axes show the distance to this obstacle and the speed of approach to it, respectively. Normal braking, as a rule, produces a deceleration of the order of 0.3 g, and sharp braking about 0.6 g.
Drivers, taking care of their own comfort (and that of their passengers), as a rule seek to remain in the comfort zone for as long as possible, avoiding sudden accelerations and situations that may require sharp braking.

Table 3 (fragment; only the following rows are recoverable)
Meteorological services: global data
Weather stations: partial data
Insurance companies: partial data
Legend: global data - data collected from the entire road network; partial data - data collected from a separate road or for a limited range of cases; fragmented data - data collected about an individual person or group of persons, or data whose receipt depends on the desire of an individual; no data.
Fig. 5. Hypothetical driver comfort zone [15]
TECHNOLOGY AUDIT AND PRODUCTION RESERVES - № 3/2(53), 2020

Thus, when analyzing Big Data, a decrease in speed should be treated as evidence of the onset of a pre-emergency situation, and all traffic accidents should be taken into account, since, as noted in [1], speed is one of the main risk factors for road traffic injuries: it not only increases the risk of a traffic accident but also aggravates its consequences. Moreover, deliberately neglecting any of the factors, even to simplify or facilitate the analysis, can lead to erroneous conclusions (the so-called survivorship bias), which is unacceptable.
In order to verify the above facts and assumptions, this study analyzes the Big Data of one road in Ukraine as an example. The highway M-06 Kyiv-Chop (towards Budapest via the cities of Lviv, Mukachevo and Uzhhorod) was chosen for the following reasons:
- it is the longest road in Ukraine (more than 900 km including approaches, bypasses and road junctions) and an important transport corridor of national and international significance;
- it passes through 3 of the 4 climatic zones of Ukraine (northern, central and mountainous);
- in terms of its aggregate characteristics, it is not a high-speed road according to the European classification [4], so the analysis may be useful for researchers, since traffic accidents on precisely this type of road (rural roads, non-motorway) have serious consequences in the EU (54 % of all deaths);
- as of 01.01.2020, 19 of the 59 places of traffic accident concentration registered across Ukraine are on this road [3].
6.1. Sections with stable speeds. A navigation system defines the travel time along a route as the sum of the times needed to pass the individual sections that make up the route. The calculations are based on observations collected from the navigation devices of vehicles that have travelled along these sections. In addition, the initial estimate is continuously adjusted using real-time data on speed and traffic jams.
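The summation described above can be sketched in a few lines. This is a simplified illustration under the assumption that each section is described only by its length and one representative speed; it is not the algorithm of any particular navigation provider, and the function name and the sample route are hypothetical.

```python
def route_duration_s(sections):
    """Sum the travel time over sections given as (length_m, speed_kmh)."""
    total = 0.0
    for length_m, speed_kmh in sections:
        if speed_kmh <= 0:
            raise ValueError("speed must be positive")
        speed_ms = speed_kmh / 3.6  # km/h -> m/s
        total += length_m / speed_ms
    return total


# Made-up route: 1 km at 36 km/h followed by 2 km at 72 km/h.
route = [(1000.0, 36.0), (2000.0, 72.0)]
print(f"{route_duration_s(route):.0f} s")
```

The real-time adjustment mentioned in the text would correspond to replacing the stored speed of a section with a current observation before the sum is recomputed.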
TomTom was selected in this study as an example of a Big Data source of the navigation system provider type. The company operates with data from more than 600 million connected devices in 77 countries (11 million data records daily). The data are accurate to within 10 meters and are updated every 30 seconds. False data and data on unusual behavior are filtered out. The data are summarized quarterly.
The results of dividing the M-06 highway into sections with stable speeds are shown in Fig. 6.
Fig. 6. Speed distribution on the M-06 highway from km 14+080 to km 831+711 (the scale division of the kilometer axis depends on the density of key points): a - on the section from km 14+080 to km 296+000; b - on the section from km 296+000 to km 561+000; c - on the section from km 561+000 to km 831+711. Legend: flow speed; speed limit; accident blackspots as of 01.01.2020

According to the data source, the entire road is divided into 612 conventional sections with an average length of 1335 m (from 15 to 7624 m), within each of which the free-flow speed, defined as the mid-range, is stable (from 17 to 115 km/h). The average speed difference between neighboring sections is 5.8 km/h (from 1 to 45 km/h).
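The figures above can be cross-checked against the chainage limits of the analyzed road (km 14+080 to km 831+711). The sketch below parses chainage labels of the form used in this paper and compares the resulting length with the product of the section count and the average section length. The helper name is an assumption introduced for illustration.

```python
def chainage_to_m(label):
    """Convert a chainage label such as 'km 14+080' to metres."""
    km, metres = label.replace("km", "").strip().split("+")
    return int(km) * 1000 + int(metres)


# Length of the analyzed road from its chainage limits.
road_length_m = chainage_to_m("km 831+711") - chainage_to_m("km 14+080")

# Length implied by 612 sections with an average length of 1335 m.
implied_length_m = 612 * 1335

relative_gap = abs(road_length_m - implied_length_m) / road_length_m
print(road_length_m, implied_length_m, f"{100 * relative_gap:.2f} %")
```

The two lengths agree to within about 0.1 %, which is consistent with the average section length being a rounded value.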
An analysis of the speed distribution (Fig. 6) confirms the tendency of drivers to exceed the established speed limits, especially on road sections passing through settlements. Only in the highlands of the Carpathians do the geometric parameters of the road prevent drivers (except in settlements) from accelerating beyond the established limits.
The speed distribution on individual road sections can also be obtained from the data collected by weigh-in-motion (WIM) systems, for example at km 24+130 (WIM1, only towards Kyiv) and km 54+336 (WIM2, towards Kyiv and towards Chop). Over approximately the same period as that covered by the TomTom data (Fig. 6), the 1st quarter of 2020, 1,691,993 vehicles passed through these 2 weigh-in-motion sites (Fig. 7).
A comparison of the mid-ranges for the TomTom and WIM data is given in Table 4. The difference in speeds between adjacent lanes (Fig. 8-10) averages 20 to 30 km/h, which in turn can be an additional factor in the occurrence of an emergency during maneuvers (overtaking, changing lanes, etc.).
6.2. Accident blackspots. By combining data on the distribution of speeds with other data on the road and the traffic accident rate, it is possible to trace individual dependencies and to establish which combination of hazardous factors has a decisive influence on the occurrence of each individual traffic accident or on the formation of places of concentration of traffic accidents (Fig. 11).
Most traffic accidents evidently occur precisely where the speed drops, and the consequences are more severe the higher the flow speed or the greater the difference between the actual and permitted speeds.
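One way to test the observation above on section-level data is to mark the sections where the speed falls relative to the preceding section and then count the accidents located in them. The sketch below is a hedged illustration: the 10 km/h threshold, the function names and the sample data are assumptions made for the example and do not come from the study.

```python
def speed_drops(sections, min_drop_kmh=10.0):
    """Return indices of sections slower than the previous one.

    sections: list of (start_m, end_m, speed_kmh), ordered along the road.
    min_drop_kmh: assumed threshold for a noticeable drop.
    """
    drops = []
    for i in range(1, len(sections)):
        if sections[i - 1][2] - sections[i][2] >= min_drop_kmh:
            drops.append(i)
    return drops


def accidents_per_section(sections, accident_positions_m):
    """Count accidents whose position falls in each [start_m, end_m)."""
    counts = [0] * len(sections)
    for pos in accident_positions_m:
        for i, (start_m, end_m, _speed) in enumerate(sections):
            if start_m <= pos < end_m:
                counts[i] += 1
                break
    return counts


# Made-up sections and accident positions, for illustration only.
sections = [(0, 1000, 90.0), (1000, 1500, 60.0),
            (1500, 3000, 85.0), (3000, 3400, 50.0)]
accidents = [1100, 1450, 2000, 3100, 3399]

drops = speed_drops(sections)
counts = accidents_per_section(sections, accidents)
share_in_drops = sum(counts[i] for i in drops) / len(accidents)
print(drops, counts, share_in_drops)
```

On real data the share would need to be compared with the share of road length (or traffic volume) that the speed-drop sections represent, otherwise long sections would dominate the count.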
To understand the reasons that force drivers to change their speed and confirm or refute the distribution of speeds (Fig. 6), it is possible to use the data from the GPS trackers of individual cars (Fig. 12). This study used anonymous data on the movement of 36 cars belonging to enterprises subordinate to Ukravtodor, recorded between December 2019 and March 2020. During this period, they covered a distance of 178,574 km. The value of such data primarily lies in the fact that they are collected from cars driven by professional drivers who knew the traffic conditions on each route well. In addition, the drivers did not know that they were participating in the experiment and, therefore, knowingly or unconsciously, could not affect the quality of the data collected.
Left turns lead the list of accident-prone places on this section of the M-06 highway (Fig. 13), and the traffic accident rate at them is proportional to the traffic intensity (Table 5) and speed (Fig. 11).
Traffic speeds on the approaches to at-grade pedestrian crossings (Fig. 13) indicate that traffic management measures alone (road signs with a yellow border, road markings 1.14.3 and rumble strips) cannot reduce speed to an acceptable 50 km/h. Even if a driver reacts in time to the appearance of a pedestrian and brakes before the crossing, they risk a rear-end collision caused by a less attentive driver (Tables 1, 2).
In addition, emergencies occur more often in places of so-called «increased social activity». For example, on the section from km 89+000 to km 106+000 of the M-06 highway there is a high concentration of spontaneous roadside trade. Since sellers usually set up improvised stalls on the shoulder and on the safety barrier, drivers slow down to look at the goods and, if they decide to buy, brake and stop within the carriageway, often unexpectedly for other road users.
Figure caption (fragment): ... to km 290+000; b - on the section from km 290+000 to km 560+000; c - on the section from km 560+000 to km 831+711
The analysis of records on the movement of individual cars can be somewhat simplified (Fig. 14) if only dangerous accelerations/decelerations (Fig. 5) are recorded.
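Such a simplification can be sketched as a filter over a GPS track that keeps only the moments at which the mean acceleration between two consecutive samples reaches a chosen level. The default threshold of 0.6 g corresponds to the sharp braking level mentioned earlier in this paper; treating it as the cut-off for a «dangerous» event, as well as the function name and the sample track, are assumptions made for illustration.

```python
G = 9.81  # standard gravity, m/s^2


def dangerous_events(track, threshold_g=0.6):
    """Return (time_s, accel_g) where |acceleration| reaches threshold_g.

    track: list of (time_s, speed_kmh) samples ordered by time.
    accel_g is negative for braking and positive for acceleration.
    """
    events = []
    for (t0, v0), (t1, v1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order samples
        accel_g = ((v1 - v0) / 3.6) / dt / G  # km/h per s -> m/s^2 -> g
        if abs(accel_g) >= threshold_g:
            events.append((t1, accel_g))
    return events


# Made-up one-second samples: a sudden drop from 88 to 63 km/h.
track = [(0, 90.0), (1, 88.0), (2, 63.0), (3, 62.0)]
events = dangerous_events(track)
print(events)
```

Only the event records, together with their coordinates, then need to be stored and mapped, which reduces the volume of data compared with keeping the full track.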
Examples of recorded points of dangerous accelerations and decelerations on the M-06 highway are shown in Fig. 15-18.
For visual and geospatial analysis, Big Data can also be superimposed on a cartographic base (Fig. 19, 20). The distribution of dangerous curves (R<500 m), descents and ascents (steeper than 40 ‰), as well as individual traffic accidents, makes it possible to trace how the probability of emergency situations depends on the geometric parameters of the road (Fig. 16). Thus, Big Data can be used to evaluate road design decisions in terms of traffic safety and to evaluate the effectiveness of traffic management measures, including traffic calming.
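The geometric criteria mentioned above (curve radius below 500 m, gradient steeper than 40) can be expressed as a simple rule that labels each road element before it is placed on the map. In this sketch the gradient is assumed to be given in per mille, so that 40 corresponds to 4 %, and a signed gradient is assumed to distinguish ascents from descents. The function name and labels are hypothetical.

```python
def geometry_hazards(radius_m=None, grade_permille=0.0,
                     max_radius_m=500.0, max_grade_permille=40.0):
    """List the geometric hazard labels that apply to one road element.

    radius_m: curve radius, or None for a straight element.
    grade_permille: signed gradient (+ ascent, - descent), assumed per mille.
    """
    hazards = []
    if radius_m is not None and radius_m < max_radius_m:
        hazards.append("dangerous curve")
    if grade_permille > max_grade_permille:
        hazards.append("steep ascent")
    elif grade_permille < -max_grade_permille:
        hazards.append("steep descent")
    return hazards


# Made-up elements, for illustration only.
print(geometry_hazards(radius_m=300.0, grade_permille=-55.0))
print(geometry_hazards(radius_m=None, grade_permille=10.0))
```

Elements with one or more labels can then be overlaid with accident locations to see whether accidents cluster where hazards combine.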
A comparison of traffic accident locations (Fig. 20) recorded by National Police inspectors and by users of Waze, a free social navigation application for mobile devices, once again highlights the need to record spatial coordinates in the traffic accident record card, as well as the need for a separate field in this card indicating the direction of movement of the vehicles. This will significantly facilitate machine data processing, since emergency situations can differ by direction of movement not only on category Ia and Ib roads.
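What a machine-readable record with these two additions might look like is sketched below. The field names, the validation rules and the sample values are assumptions made for illustration; they do not reproduce the actual record card used by the National Police.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AccidentRecord:
    """Hypothetical accident record with coordinates and direction."""
    road: str
    chainage_m: int
    latitude: float   # decimal degrees
    longitude: float  # decimal degrees
    direction: str    # e.g. "towards Kyiv" or "towards Chop"

    def __post_init__(self):
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude out of range")
        if not self.direction:
            raise ValueError("direction must be given")


def count_by_direction(records):
    """Count records per direction of movement."""
    return Counter(r.direction for r in records)


# Made-up records; the coordinates are placeholders, not real locations.
records = [
    AccidentRecord("M-06", 89_500, 50.0, 30.0, "towards Chop"),
    AccidentRecord("M-06", 89_700, 50.0, 30.0, "towards Kyiv"),
    AccidentRecord("M-06", 90_100, 50.0, 30.0, "towards Chop"),
]
print(count_by_direction(records))
```

With the direction stored explicitly, accidents on the two carriageways or directions of one road can be analyzed separately without manual interpretation of the report text.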
To sum up, a traffic accident is the tragic end of an emergency in whose occurrence the human factor is decisive, and the higher the speed, the higher the price of a driver's mistake. Hence, the absence of traffic accidents on a particular road section is not a reason to refuse to analyze the traffic conditions on it.
Distributing traffic accidents by kilometer and counting only accidents with victims, instead of specifying the exact accident locations and counting all types of traffic accidents, was justified at earlier stages of development, when sufficient computing power, mechanisms and tools for analysis and research were lacking.

SWOT-analysis of research results
Strengths. The proposed approach makes it possible to involve in traffic accident analysis all the information available to society on roads, streets, road users and traffic conditions. This makes it possible to consider the smallest aspects and the whole range of factors that cause an emergency, in order to prevent its occurrence or reduce the severity of its consequences.
Weaknesses. It should be noted that the study is limited to making individual generalizations, determining the direction of scientific inquiry and laying the groundwork for subsequent research on developing new, or improving existing, models, methods and algorithms for traffic accident analysis.
Opportunities. The proposed approach avoids the conscious or subconscious influence of road users on the results of observations, which is often impossible to achieve in surveys or simulations of emergency situations.
Threats. Access to Big Data is often limited by its owners. In addition, it is not always possible to influence the data collection algorithms in order to adjust them or supplement them with new indicators.

Conclusions
1. Big Data confirm that the human factor has a decisive influence on the occurrence of an emergency, and that vehicle speed is the determining factor in the severity of the consequences of a traffic accident.
2. The current collection methods, composition and content of Big Data are sufficient for a comparative analysis and for forming general conclusions. The study found that speed is a significant criterion for assessing traffic safety. A sharp change in speed (most often braking) is an indicator of the onset of an emergency, which is confirmed by the distribution of places of traffic accident concentration. Traffic accidents have serious consequences precisely on road sections where high speeds are observed or where the road infrastructure «provokes» drivers to exceed the permitted speed. It was also established which design and technical solutions used on the roads lead to more traffic accidents and which contribute to traffic calming. It has been proven that setting speed limits by road signs alone is ineffective, as is the use of signs with a yellow border.
In addition, given the many elements that form the «human factor» or «traffic conditions», certain generalizations cannot be avoided when forming an indicator or index for evaluating the level of road safety. A useful outcome of further research would be the establishment of critical levels of exposure to each of these elements, as well as an algorithm that allows prediction and timely warning of a dangerous combination of conditionally safe factors.