Correlations in Spatiotemporal Headway Dynamics of Road Traffic Using Extremely Accurate Microscopic Empirical Data

As we recently showed using empirical data [1], the headway between two consecutive human-driven vehicles develops in a characteristic way. Building on this, the main goal of the present work is to investigate correlations of the change in temporal headway over two subsequent road segments, and we find strongly correlated behaviour for increasing temporal headways. This is expected to considerably improve short-term prediction algorithms for conventional road users. For this purpose, a stationary infrared-based sensor system was developed and mounted on reflector posts next to an urban street over a distance of about 50 m. Owing to its accuracy, we are able to resolve vehicle following times down to 25 milliseconds and to determine speeds more precisely. In 45 hours of measurement the system detected over 20,000 passing vehicles.


Introduction and Related Work
Safety, energy efficiency, and comfort are three of the most important issues related to modern and upcoming road traffic [2]. In the course of ever increasing automation, the energy consumption per vehicle will further continue to decrease [3]. A system that will consist only of connected and automated vehicles (CAVs) is comparatively easy to handle with regard to these points, since individual driving behaviour can be electronically coordinated either by an external source or by the vehicles themselves. Hence, different situations become predictable leading to an efficient, crash-free use of the road network. In this way, impending traffic disruptions can be counteracted in advance through targeted, collective action using vehicle-to-vehicle (V2V) [4] or vehicle-to-infrastructure (V2I) [5] communication. Even an increase in traffic flow and road capacity is expected due to a higher influence of automated vehicles as simulations show [6][7][8].
Collective Dynamics 5, A81:1-17 (2020)
Before the phase of fully automated traffic arrives in the distant future, however, there will be the much more challenging phase of mixed traffic, in which automated and human-driven vehicles share the road. These new circumstances call for novel approaches to intelligent transportation systems (ITS) that ultimately aim at reducing individual travel time [9]. Urban situations in particular, which require frequent interaction between machines and human beings, will be difficult to handle. Thus, it becomes necessary to understand the behaviour of conventional road users and to make it as predictable as possible for upcoming technologies. In order to capture different traffic scenarios involving human-driven vehicles, the on-board sensors of an automated vehicle can be supplemented by V2I sensor systems, especially if the direct view is obstructed by obstacles such as buildings or other vehicles. The collected data yield short-term predictions and thus enable the CAV to operate efficiently while disturbing the ongoing traffic as little as possible [1]. It is particularly important to minimize the risk of any dangerous situations, including crashes. If a situation develops contrary to the calculated scenario, the driving strategy of the automated vehicle must of course change appropriately. Even continual changes may become necessary to meet the required conditions in terms of safety, energy efficiency, and comfort. The actual prediction can be realized by using different motion models [10], taking into account real-time information on the locations and speeds of the relevant road users. With the help of a dedicated algorithm, the CAV can plan and finally follow the most promising trajectory [2,11]. However, it should be noted that the success of such a driving manoeuvre depends crucially on the quality of the model used, as this is seen as a key feature of robotic architectures [12].
In this paper we provide microscopic empirical data which form the basis for a realistic prediction model. The use of the Pearson correlation coefficient in the context of vehicle-vehicle interaction over subsequent road sections leads to a quantitative estimation of human driving behaviour. The recorded traffic data should help to select the right model approach and to calibrate it correctly. To the best of the authors' knowledge, there is no comparable work that deals with microscopic correlations regarding vehicle-following behaviour and temporal headways. However, there are publications that deal only with static observables such as the distribution of headways (e.g. [1,13]). In the mid-1970s, Cowan was the first to come up with an empirically confirmed result [14]. His so-called M3-distribution is still well known in transportation science today. On the other hand, there are approaches analyzing spatiotemporal patterns in traffic and their correlation within an entire road network [15,16]. For these purposes microscopic models are usually used, mostly based on classic models like cellular automata (CA) [17] or car-following models [18], in order to finally obtain macroscopic observables. However, this does not mean that correct microscopic results can also be derived by this procedure, as no classical model is able to reproduce real-world vehicle-following behaviour in an appropriate way [19,20].
Figure 1: Exemplary application of the results in this work. T-intersection with common road traffic on its major street. A connected and automated vehicle (red arrow) is approaching from below and tries to merge into the ongoing traffic (blue marked headway). The question is how the marked headway has developed when reaching the intersection point. Due to the view obstructed by the buildings (hatched boxes), an infrastructure sensor system is required to provide the headway data from the major street to the CAV. [1]

Application
The actual application of this work is illustrated in Fig. 1. A connected and automated vehicle (red arrow) is approaching a T-intersection without traffic lights. As the CAV is driving on the minor road, it must give way to conventional traffic on the major road before turning right. Thanks to the infrastructure sensors installed along the major road, the CAV is informed about the microscopic traffic situation before it reaches the intersection area. A large set of traffic data previously collected at this location allows a statistical analysis of certain patterns in human driving behaviour. Based on these historical data, it is possible to make a statistical short-term prediction for the vehicles on the major road, considering the current vehicle configuration. The CAV can therefore adjust its driving strategy in advance in such a way that it can merge into a suitable gap (headway) in the ongoing traffic, preferably without stopping at all. Nevertheless, it has to take into account possible changes of the target gap during its approach to the intersection and has to continuously adapt its speed to the circumstances that arise. Even stopping may be necessary if the merging manoeuvre cannot be completed safely, i.e. if a crash cannot be excluded with near certainty. This strategy is expected to reduce stopping manoeuvres compared to conventional road users, but it cannot avoid them entirely. All in all, this technology yields great advantages not only in terms of safety, but also in terms of energy efficiency and passenger comfort. The present paper firstly deals with the collection of the necessary traffic data and secondly provides a statistical analysis using correlations with regard to the spatiotemporal prediction of headways between consecutive human-driven vehicles.

Measurement
With the aim of obtaining extremely accurate microscopic data, the measuring system from our previous work [1] has been improved by replacing the infrared-based sensors with even more reliable lidar sensors. In this new hardware configuration the current sensor value can be read by the microcontroller at least every 25 ms and is then saved together with a time stamp to the internal flash memory. Besides the higher sampling rate, the new lidar sensors also deliver a much better detection rate and, at the same time, a lower susceptibility to external interference. Another difference compared to the previous work is that only three of the four available sensors are needed for the current application. Three sensors yield two sectors, between which the correlation in question is investigated, which keeps things as simple as possible. In [1] we were mainly interested in the change of the temporal headway $\Delta\tau$ over different distances, where the consideration of several sectors is more expedient. The measurement took place on an urban single-lane road in Duisburg, Germany, with a speed limit of 50 km/h, where passing vehicles are not influenced by, e.g., traffic lights, bus stops, speed bumps, or speed cameras. As a consequence, all road users are able to choose their personal driving speed freely, taking into account the speed limit and the current traffic situation. During the four days of measurement in September and October 2019, the weather conditions were similar, with dry roads and a mix of cloudy and sunny periods. A total of exactly $N = 20{,}018$ vehicles were detected within 45 hours of measurement, which enables statistical data processing afterwards. The traffic on the road in question is dominated by cars. Only a small proportion of the passing vehicles are trucks and buses. Please refer to [1] for the traffic volume by individual vehicle class, since another section of the same road was studied there.

Measuring System
Three measuring units, each including one lidar sensor, make up the measuring system. As Figs. 2 and 3 show, they have been attached to reflector posts at positions $P_1$, $P_2$, and $P_3$. The distance between any two reflector posts $i$ and $j$ at positions $P_i$ and $P_j$ is called $\Delta s_{ij}$. In the present study the distances between adjacent reflector posts are about the same: $\Delta s_{12} \approx \Delta s_{23} \approx 23.5$ m.
Table 1: Calibration process driving at a defined vehicle speed $v^{(c)} = 52$ km/h. The average transit time between adjacent reflector posts $i$ and $j$ at positions $P_i$ and $P_j$ (see Fig. 2) is called $\overline{\Delta t}_{ij}$ and the corresponding standard deviation $\sigma^{(c)}_{ij}$, respectively. According to Eq. 1 the specified sector length $\Delta s_{ij}$ is finally calculated.
To avoid the detection of passing vehicles from the opposite lane, the sensors are attached at a certain height and aligned down to the lane of interest (see Fig. 3). All three measuring units are identical and run independently of each other during the whole measurement. Since we are investigating a single-lane road, it is not necessary to consider the exact lateral position of passing vehicles; only the longitudinal position is relevant here. During the evaluation, however, the recorded data of the individual units are merged. For this purpose it is necessary to synchronize the system times of all units involved before the actual measurement takes place. This has been achieved by using three extremely accurate clock modules, which must first be set to the same system time. After the measurement has been completed, the clock modules are checked electronically for the exact amount of their temporal deviation. As this deviation is linear in time, the resulting systematic error can be eliminated in retrospect. To ensure that the flowing traffic behaves as usual, the measuring units are not recognizable to passing drivers, because the lidar sensors are mounted on the back side of the reflector posts. Every time a vehicle passes one of the relevant positions $P_i$ on the lane of interest, a characteristic signal drop occurs. This is due to the fact that the lidar sensors in this setup are actually used for distance measurements.
The internal electronics determine the time delay between the transmission of an infrared laser signal and its reception after reflecting off a target. Together with the known speed of light, this delay is translated into a distance. A low voltage output corresponds to a short distance (a passing car, compare Fig. 3 (a)) and a high voltage output to a long distance (the road, compare Fig. 3 (b)). Please refer to Fig. 4 for a typical example of the signal shape.
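The detection principle described above can be illustrated with a minimal Python sketch. It is not the firmware of the measuring units: the function names, the threshold, and the sample readings are hypothetical, and the sketch only assumes that a passage shows up as a run of distance samples below a threshold, whose first time stamp gives the signal start time and whose length gives the signal duration.

```python
from dataclasses import dataclass

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_time_of_flight(round_trip_s: float) -> float:
    """Convert a round-trip time of flight into a one-way distance in metres."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


@dataclass
class Passage:
    start_ms: int      # time stamp of the first sample below the threshold
    duration_ms: int   # time span during which the sensor saw the vehicle


def detect_passages(samples, threshold_m):
    """Find signal drops in a list of (time_ms, distance_m) samples.

    A passage starts at the first sample whose distance falls below
    threshold_m and ends at the first later sample at or above it.
    A drop that is still open at the end of the data is discarded.
    """
    passages = []
    start = None
    for time_ms, distance_m in samples:
        below = distance_m < threshold_m
        if below and start is None:
            start = time_ms
        elif not below and start is not None:
            passages.append(Passage(start, time_ms - start))
            start = None
    return passages


# Hypothetical readings every 25 ms (NOT measured data):
# the road is seen at about 6 m, a passing vehicle at about 2 m.
readings = [(0, 6.0), (25, 6.0), (50, 2.1), (75, 2.0),
            (100, 2.2), (125, 6.0), (150, 6.0)]
print(detect_passages(readings, threshold_m=4.0))
```

Because the sensor is sampled every 25 ms, both the start time and the duration obtained this way are only resolved to that step size.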

Calibration Method
After the measuring system has been installed and put into operation, it has to be calibrated. This step is necessary for two reasons: first, to determine the exact distance between the reflector posts, which yields the exact speed of passing vehicles, and second, to estimate the measurement error. To this end, the road segment was passed multiple times with a car driving at a GPS-confirmed calibration speed of $v^{(c)} = 52$ km/h. To make sure that $v^{(c)}$ is both constant and reproducible, the car's cruise control was activated during the calibration process. As a result, the average transit times $\overline{\Delta t}_{12}$ and $\overline{\Delta t}_{23}$ are obtained, and the corresponding standard deviations $\sigma^{(c)}_{12}$ and $\sigma^{(c)}_{23}$ serve as a measure of the measurement error. Additionally, the small resulting values of the standard deviation confirm that the constant speed set by the cruise control of the calibration vehicle can be regarded as reproducible. A total of ten calibration runs were carried out per measurement day. A summary of all corresponding parameters can be seen in Tab. 1.
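As a rough illustration of the calibration arithmetic, the following Python sketch converts a set of transit times into a sector length and a spread. The transit times are invented for the example and are not the values of Tab. 1; only the calibration speed of 52 km/h is taken from the text, and the helper name `sector_length_m` is hypothetical.

```python
from statistics import fmean, stdev

CALIBRATION_SPEED_KMH = 52.0  # v^(c) as stated in the text


def sector_length_m(transit_times_s, speed_kmh=CALIBRATION_SPEED_KMH):
    """Return (sector length in m, standard deviation of transit times in s)."""
    speed_m_per_s = speed_kmh / 3.6
    mean_transit_s = fmean(transit_times_s)
    return speed_m_per_s * mean_transit_s, stdev(transit_times_s)


# Hypothetical transit times of ten calibration runs in seconds (NOT measured data).
runs_s = [1.625, 1.650, 1.600, 1.625, 1.650, 1.625, 1.600, 1.625, 1.650, 1.625]
length_m, sigma_s = sector_length_m(runs_s)
print(f"sector length: {length_m:.2f} m, sigma: {sigma_s * 1000:.1f} ms")
```

With these invented numbers the sector length comes out close to the 23.5 m quoted for the real setup, and the spread of the transit times is of the order of the 25 ms sampling interval.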

Data Processing
From the averaged transit times $\overline{\Delta t}_{ij}$ of the calibration runs and the calibration speed $v^{(c)}$, the sector length is calculated:
$$\Delta s_{ij} = v^{(c)} \cdot \overline{\Delta t}_{ij} \tag{1}$$
To finally obtain the average speed $v^n_{ij}$ (short: $v$) of a particular vehicle $n$ within the sector $\Delta s_{ij}$ using the corresponding transit time $\Delta t^n_{ij}$, we write:
$$v^n_{ij} = \frac{\Delta s_{ij}}{\Delta t^n_{ij}} \tag{2}$$
The vehicle following time $T^n_i$ (short: $T$) for two consecutive vehicles $n$ and $n + 1$ is calculated as follows:
$$T^n_i = t^{n+1}_i - t^n_i \tag{3}$$
where $t^n_i$ describes the signal start time of vehicle $n$ at position $P_i$ (see Fig. 2). The temporal headway $\tau^n_i$ (short: $\tau$) is derived from this by subtracting the corresponding signal duration $d^n_i$ (short: $d$) of the preceding vehicle $n$:
$$\tau^n_i = T^n_i - d^n_i \tag{4}$$
Consequently, Eq. 4 describes the temporal headway between vehicle $n$ and the following vehicle $n + 1$. In Fig. 4 the difference between $T$ and $\tau$ is illustrated with an exemplary signal sequence. Finally, the change in temporal headway $\Delta\tau^n_{ij}$ (short: $\Delta\tau$) between two consecutive vehicles $n$ and $n + 1$ over the defined distance $\Delta s_{ij}$ is considered:
$$\Delta\tau^n_{ij} = \tau^n_j - \tau^n_i \tag{5}$$
Hence, a positive value of $\Delta\tau$ corresponds to an increase, whereas a negative one corresponds to a decrease in temporal headway $\tau$. Last but not least, the correlation between the changes in temporal headway $\Delta\tau^n_{ij}$ within the adjacent road sectors $\Delta s_{12}$ and $\Delta s_{23}$ is presented. It has been empirically shown [1] that $\Delta\tau^n_{ij}$ is sufficiently normally distributed to justify the following approach. The Pearson correlation coefficient $c$ (first found by Francis Galton in 1888 [21]) is a statistical quantity measuring the linear correlation between two series $\boldsymbol{\Delta\tau}_{12}$ and $\boldsymbol{\Delta\tau}_{23}$ of length $N$. For a detailed mathematical view please refer to [22]. From these series the following $N$ value pairs are formed:
$$\{\Delta\tau^n_{12},\ \Delta\tau^n_{23}\}, \quad n = 1, \dots, N \tag{6}$$
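The chain from raw signal times to headway changes can be sketched in a few lines of Python. The sketch assumes that the following time is the difference of consecutive signal start times and that the headway is this following time minus the signal duration of the preceding vehicle, as described in the text. All function names and numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def sector_speed_kmh(sector_length_m, t_entry_ms, t_exit_ms):
    """Average speed of one vehicle between two posts, in km/h."""
    transit_s = (t_exit_ms - t_entry_ms) / 1000.0
    return sector_length_m / transit_s * 3.6


def temporal_headways(starts_ms, durations_ms):
    """Temporal headways at one position.

    starts_ms[n] is the signal start time of vehicle n, durations_ms[n]
    its signal duration. Element n of the result is the gap between the
    end of vehicle n's signal and the start of vehicle n + 1's signal.
    """
    headways = []
    for n in range(len(starts_ms) - 1):
        following_time = starts_ms[n + 1] - starts_ms[n]   # following time T
        headways.append(following_time - durations_ms[n])  # headway tau
    return headways


def headway_changes(tau_i, tau_j):
    """Change in temporal headway between an upstream and a downstream position."""
    return [later - earlier for earlier, later in zip(tau_i, tau_j)]


# Hypothetical signal data in ms for three vehicles at P1, P2, P3 (NOT measured data).
starts = {1: [0, 2000, 5000], 2: [1650, 3700, 6600], 3: [3300, 5425, 8200]}
durations = {1: [300, 325, 300], 2: [300, 325, 300], 3: [300, 325, 300]}

tau = {i: temporal_headways(starts[i], durations[i]) for i in (1, 2, 3)}
d_tau_12 = headway_changes(tau[1], tau[2])
d_tau_23 = headway_changes(tau[2], tau[3])
print(tau, d_tau_12, d_tau_23)
```

In this toy data the first gap grows in both sectors and the second gap shrinks in both sectors, which is the kind of same-sign behaviour that a positive correlation coefficient summarizes.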
However, in order to apply the Pearson correlation coefficient correctly and to obtain values between $c = -1$ (perfectly anti-correlated behaviour) and $c = 1$ (perfectly correlated behaviour), it was first necessary to normalize the data series $\boldsymbol{\Delta\tau}_{12}$ and $\boldsymbol{\Delta\tau}_{23}$ with their elements $\Delta\tau^n_{12}$ and $\Delta\tau^n_{23}$, respectively:
$$\Delta\tau^{n*}_{ij} = \frac{\Delta\tau^n_{ij} - \overline{\boldsymbol{\Delta\tau}}_{ij}}{\sigma(\boldsymbol{\Delta\tau}_{ij})} \tag{7}$$
where $\overline{\boldsymbol{\Delta\tau}}_{ij}$ describes the mean and $\sigma(\boldsymbol{\Delta\tau}_{ij})$ the standard deviation of the series $\boldsymbol{\Delta\tau}_{ij}$. To finally calculate the Pearson correlation coefficient between the normalized series $\boldsymbol{\Delta\tau}^*_{12}$ and $\boldsymbol{\Delta\tau}^*_{23}$ (each of length $N$), the following formula is applied:
$$c = c(\boldsymbol{\Delta\tau}^*_{12}, \boldsymbol{\Delta\tau}^*_{23}) = \frac{\boldsymbol{\Delta\tau}^*_{12} \cdot \boldsymbol{\Delta\tau}^*_{23}}{N} \tag{8}$$
Please note that $N + 1$ detected vehicles lead to $N$ temporal headways $\tau$ to be included in the calculation, since the last vehicle $n = N + 1$ has no successor.
For a headway-resolved analysis, the $N$ value pairs are divided into subsets (Eq. 9) according to intervals $[\tau^-(k), \tau^+(k)]$ of the temporal headway (Eq. 10). Each of these contains $N_k$ elements. Obviously, $N > N_k$ holds for the dataset used. If we now want to calculate the Pearson coefficient $c$ for the headway values $\tau = \tau_1$ within a particular interval $k$, Eq. 8 changes to
$$c(k) = \frac{\boldsymbol{\Delta\tau}^*_{12}(k) \cdot \boldsymbol{\Delta\tau}^*_{23}(k)}{N_k} \tag{11}$$
where $(k)$ denotes the restriction of the series to interval $k$. In other words, the data are first sorted according to the temporal headways $\tau = \tau_1$ detected at position $P_1$. All those vehicles whose temporal headways $\tau_1$ fall within the relevant interval from Eq. 10 are used for further processing. The change in their temporal headway $\Delta\tau_{12}$ in sector $\Delta s_{12}$ is then compared with that in sector $\Delta s_{23}$ ($\Delta\tau_{23}$, see Fig. 2). From this, the Pearson correlation coefficient is finally calculated as shown in Eq. 11. All in all, this method leads to a much more accurate prediction of the most probable headway behaviour.
Another method to show the relationship between the two series $\boldsymbol{\Delta\tau}_{12}$ and $\boldsymbol{\Delta\tau}_{23}$ of length $N$ (or between subsets of these according to Eq. 9) is to plot them in a common scatter diagram. In this way, the (linear) correlation between two variables can be visualized. This is achieved by entering all $N$ pairs of elements $\{\Delta\tau^n_{12}, \Delta\tau^n_{23}\}$, or the $N_k$ pairs whose headway lies in $[\tau^-(k), \tau^+(k)]$, as points in a common diagram using Cartesian coordinates. It is important to note that, despite the same symmetric shape of the marginal distributions, the scatter diagram can vary in appearance depending solely on the correlation coefficient $c$. For values around $c = 0$ (no or very little linear correlation) the set of dots in the diagram looks circular, whereas for increasing positive (or negative) values of $c$ the set of dots becomes more and more elliptical. The difference is that for correlated data ($c > 0$) the slope of the main axis of this ellipse is positive and for anti-correlated data ($c < 0$) it is negative.
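The interval-wise evaluation can be sketched as follows in Python. This is an illustrative sketch rather than the authors' evaluation code: the interval boundaries, the minimum number of pairs per interval, and the data are hypothetical, and the standard deviation is taken with normalization by the number of elements so that the mean product of the normalized series is exactly the Pearson coefficient.

```python
from statistics import fmean, pstdev


def standardize(series):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean, sigma = fmean(series), pstdev(series)
    return [(value - mean) / sigma for value in series]


def pearson(x, y):
    """Pearson coefficient as the mean product of the two standardized series."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


def pearson_per_interval(tau_1, d_tau_12, d_tau_23, edges_ms):
    """Pearson coefficient of (d_tau_12, d_tau_23) per headway interval.

    edges_ms are hypothetical interval boundaries; interval k covers
    edges_ms[k] <= tau_1 < edges_ms[k + 1]. Intervals with fewer than
    three pairs or with zero variance yield None.
    """
    result = []
    for lower, upper in zip(edges_ms, edges_ms[1:]):
        pairs = [(a, b) for t, a, b in zip(tau_1, d_tau_12, d_tau_23)
                 if lower <= t < upper]
        xs = [a for a, _ in pairs]
        ys = [b for _, b in pairs]
        if len(pairs) < 3 or pstdev(xs) == 0 or pstdev(ys) == 0:
            result.append(None)
        else:
            result.append(pearson(xs, ys))
    return result


# Hypothetical values in ms (NOT measured data): headway at P1 and its
# change in the first and in the second sector for eight vehicle pairs.
tau_1 = [600, 700, 800, 900, 3100, 3200, 3300, 3400]
d_tau_12 = [25, -25, 50, -50, 100, -100, 200, -200]
d_tau_23 = [-25, -25, 50, 25, 125, -75, 175, -225]

coefficients = pearson_per_interval(tau_1, d_tau_12, d_tau_23, [500, 1000, 3000, 3500])
print(coefficients)
```

The toy data are constructed so that the interval of short headways yields a weak correlation and the interval of long headways a strong one; the empty middle interval is reported as `None` instead of raising an error.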

Results
The following part is split into two subsections. Besides the empirically determined speed and temporal headway distributions in Sec. 4.1, the main result of this work is presented in Sec. 4.2. The latter includes the visualization of the correlated $\Delta\tau$ data using scatter diagrams. Additionally, a sample of the underlying symmetrical marginal distributions is illustrated. Last but not least, the development of the respective Pearson correlation coefficient for different temporal headways $\tau$ is shown and quantified using a non-linear fit function.

Speed and Headway Distribution
In order to give the reader further information about the properties of the recorded vehicle data, the empirical speed distribution (Fig. 5(a)) and the temporal headway distribution (Fig. 5(b)) are shown first. Since three measuring units are installed along the street, two average speeds $v$ per vehicle are available, one for each road sector $\Delta s_{12}$ and $\Delta s_{23}$. For more details regarding the averaging process, please refer to Sec. 3.3. As the two individual speed statistics do not deviate significantly from each other, they are displayed in a common histogram. The same applies to the temporal headway $\tau$. Because this observable is measured at three positions $P_1$, $P_2$, and $P_3$, there are even three values available for each passing vehicle contributing to the respective histogram.

Correlations
This section is devoted to the empirically measured correlation between the changes in temporal headway $\Delta\tau_{12}$ and $\Delta\tau_{23}$ within the two adjacent road sectors $\Delta s_{12}$ and $\Delta s_{23}$ (see Fig. 2; for the formal definition of $\Delta\tau$ please refer to Eq. 5). As explained in Sec. 3.3, the vehicle data are first divided into the intervals from Eq. 10 according to different temporal headways $\tau$ (measured at $P_1$). Subsequently, the associated changes in temporal headway $\Delta\tau_{12}$ and $\Delta\tau_{23}$ are plotted against each other in a scatter diagram (Fig. 6). This is done for the first nine $\tau$-intervals. Based on this visualization, an increasing trend of the correlation between the two variables $\Delta\tau_{12}$ and $\Delta\tau_{23}$ can be seen with increasing $\tau$. This is also confirmed by the calculated Pearson coefficient in the top left corner of each scatter plot.
From this correlation analysis, it can be concluded that a small temporal headway $\tau$ between successive vehicles tends to remain almost constant, but is subject to individual, more or less disordered fluctuations. This may be due to the fact that the closer a vehicle approaches the vehicle in front, the less it is able to choose its personal speed. As a consequence, the vehicle will adopt the speed of its preceding vehicle without regard to the precise headway, i.e. there will be a continuous switching between slight enlargement and reduction of $\tau$, which finally leads to a small correlation coefficient $c$. This is also in line with the three-phase traffic theory [23], in which the temporal headway (time gap) is constantly changing to a certain extent. For larger headways $\tau$ (large $c$), where there is no disturbing vehicle directly ahead, another effect is observed. The driver is now able to maintain the selected state of motion over greater distances without constantly switching between slight acceleration and deceleration. As a result, a systematic change in temporal headway takes place, i.e. the gap becomes systematically larger or smaller. This in turn corresponds to a high positive correlation $c$ between the two values of $\Delta\tau_{12}$ and $\Delta\tau_{23}$, which in this case are predominantly either both positive or both negative.
Figure 6: To get an idea about the related error in measurement of all nine shown diagrams, a black (25 ms × 25 ms)-box named "Accuracy" has been added to the bottom right one. The actual accuracy of the microcontroller is around 1 ms. However, due to the fact that the sensor can only be read out every 25 ms, the error box must be adjusted accordingly. The associated Pearson correlation coefficient $c$ is displayed in the top left corner of each plot. As one can already see here, $c$ shows an increasing trend with increasing temporal headway $\tau$. In order to quantify this fact, the related fit function $c(\tau)$ can be seen in Fig. 8. The axis labeling refers to all nine plots. All times are given in milliseconds (ms).
At this point the associated marginal distributions of $\Delta\tau_{12}$ and $\Delta\tau_{23}$ for the first three scatter plots from Fig. 6 (columnwise) are presented to give the reader an idea of their shape. They are shown in Fig. 7. The best probability density function (PDF) to fit these histograms turned out to be the generalized Student's t-distribution (dashed blue line), which takes into account the heavy tails of the underlying data. For comparison, the normal distribution is fitted to the data as well (red line). Please refer to App. A for the exact mathematical form of the PDFs. All related parameters can be found in Tab. 2.
Table 2: Properties of the histograms shown in Fig. 7. The underlying data has been analyzed for its mean $\mu$, standard deviation $\sigma$, skewness $\gamma$, and kurtosis $\kappa$, where $\mu$ and $\sigma$ are also the fit parameters of the respective normal distribution. To differentiate, the fit parameters of the generalized Student's t-distribution are written with a bar on top: $\bar{\mu}$, $\bar{\sigma}$, and $\bar{\nu}$.
Figure 7: Marginal distributions (columnwise) of the first three scatter plots in Fig. 6. Each plot shows the empirically determined histogram of the change in temporal headway $\Delta\tau_{12}$ (first row of the plots) and $\Delta\tau_{23}$ (second row of the plots) within the first sector $\Delta s_{12}$ and second sector $\Delta s_{23}$, respectively. The best probability density function (PDF) to fit the underlying histograms turned out to be the generalized Student's t-distribution (plotted in dashed blue). For comparison, the normal distribution is additionally fitted to the data in red. All related statistical parameters are indicated in Tab. 2.
[...] process (see Tab. 1). As expected, the width of the $\Delta\tau$-distributions (standard deviation $\sigma$) in Fig. 7 increases with the temporal headway $\tau$, which can probably be explained by the greater scope for action of the respective vehicle. In addition to the scatter plots in Fig. 6, it is also of interest to quantify the development of the Pearson correlation coefficient $c$ for different headways $\tau$. This is achieved by calculating $c$ for the first thirty intervals of $\tau$ from Eq. 10. A fit function of the form
$$c(\tau) = -a \cdot \exp(-b \cdot \tau) + c_0 \tag{12}$$
has been applied to this data, where $a$ and $b$ are curvature parameters and $c_0$ is the asymptotic parameter (upper boundary). The assumption $c(\tau = 0) = 0$ implies $a = c_0$ and leads to the final, simplified expression
$$c(\tau) = c_0 \cdot \left(1 - \exp(-b \cdot \tau)\right) \tag{13}$$
The resulting plot is shown in Fig. 8. The underlying data points clearly exhibit an increasing, asymptotic behaviour, which justifies the use of the fit function according to Eq. 13. Based on the present data, the value of $\tau$ from which the correlation can be regarded as constant and saturated lies somewhere between 3000 ms and 5000 ms. This information enables a statistical short-term prediction of the spatiotemporal development of the temporal headway $\tau$. By implementing an appropriate algorithm using the current vehicle configuration of a major street, an automated vehicle can adapt its speed in advance in order to merge into the ongoing traffic as efficiently as possible (see Fig. 1).
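A saturating fit of this kind can be sketched without any fitting library. The sketch below uses the simplified form in which the curve starts at zero and approaches an upper boundary c0 with rate b. It does not reproduce the fitting procedure used for Fig. 8: it simply scans candidate values of b and, for each of them, uses the closed-form least-squares solution for c0. The data are synthetic, and all names and parameter values are hypothetical.

```python
from math import exp


def saturation_model(tau_ms, c0, b):
    """Saturating curve c(tau) = c0 * (1 - exp(-b * tau))."""
    return c0 * (1.0 - exp(-b * tau_ms))


def fit_saturation(taus_ms, coefficients, b_candidates):
    """Least-squares fit of the saturating curve by scanning b.

    For a fixed b the model is linear in c0, so the best c0 has the
    closed form sum(c * g) / sum(g * g) with g = 1 - exp(-b * tau).
    Returns the (c0, b) pair with the smallest squared error.
    """
    best = None
    for b in b_candidates:
        g = [1.0 - exp(-b * tau) for tau in taus_ms]
        c0 = sum(c * gi for c, gi in zip(coefficients, g)) / sum(gi * gi for gi in g)
        error = sum((c - c0 * gi) ** 2 for c, gi in zip(coefficients, g))
        if best is None or error < best[0]:
            best = (error, c0, b)
    return best[1], best[2]


# Synthetic, noise-free points generated from the model itself (NOT measured data).
true_c0, true_b = 0.9, 0.001  # hypothetical parameters, b in 1/ms
taus = [250 + 500 * k for k in range(10)]
values = [saturation_model(tau, true_c0, true_b) for tau in taus]

grid = [k * 1e-4 for k in range(1, 51)]  # candidate b from 0.0001 to 0.0050 per ms
c0_fit, b_fit = fit_saturation(taus, values, grid)
print(f"c0 = {c0_fit:.3f}, b = {b_fit:.4f} per ms")
```

The resolution of b is limited by the candidate grid; a dedicated non-linear least-squares routine would refine both parameters simultaneously and also provide uncertainty estimates.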

Conclusion and Outlook
In the coming decades, road traffic will be increasingly dominated by automated vehicles. In order to enable the smoothest possible interaction between them and conventional vehicles, a number of conditions must be fulfilled. One of the most important points in this context is a good estimation of the behaviour of human-controlled vehicles through associated algorithms. Especially for lane changing and merging manoeuvres at intersections (see Fig. 1), short-term predictions for conventional vehicles become indispensable with regard to a fluent traffic flow on the one hand, and individual safety, energy, and comfort conditions on the other hand. Therefore, this paper deals with a statistical analysis of the headway behaviour between consecutive human-driven vehicles. As already shown in our latest paper [1], it can be confirmed once again that the width of the distribution of the change in temporal headway $\Delta\tau$ increases with a growing temporal headway $\tau$ between consecutive vehicles (see Fig. 7). In the present work, this value was measured over a fixed distance of about 23.5 m. The probability density function (PDF) that best matches the $\Delta\tau$ data is a generalized Student's t-distribution, since it takes into account the heavy tails of the underlying histogram. For a simple mathematical treatment, however, even a normal distribution may be justified to approximate the data satisfactorily. When the road is split into two adjacent sectors of the same length, in each of which $\Delta\tau$ is measured, it is possible to gain even more information about the headway behaviour of two consecutive vehicles. This is done by determining the Pearson correlation coefficient $c$ between the respective $\Delta\tau$ values for different temporal headways $\tau$. In this case, the empirical distributions of $\Delta\tau$ from Fig. 7 serve as the respective marginal distributions of the related scatter diagrams shown in Fig. 6.
Finally, the development of the Pearson coefficient $c$ itself has been quantified for different temporal headways $\tau$ (see Fig. 8). It turned out that there is a strongly positively correlated and asymptotic behaviour for large headways ($\tau > 5000$ ms). Between $\tau = 500$ ms and $\tau = 5000$ ms the correlation coefficient rises sharply from $c = 0.30$ to $c = 0.92$ and reaches its upper boundary value around there. This characteristic behaviour justifies the use of an exponential fit function as described in Eq. 13 with a boundary value of $c = c_0 = 0.943$. Due to the high correlation values, these results can be used to obtain short-term predictions and to answer the question of how the investigated headway $\tau$ most likely develops in a hypothetical third sector. Because of the increasing correlation coefficient, the reliability of this forecast increases with the headway between the vehicles considered. Using this information, an automated vehicle can adapt its driving strategy in order to merge as smoothly as possible into the ongoing traffic. Nevertheless, it must always be prepared to react adequately to unforeseen events and, if necessary, revise its chosen parameters, as all underlying predictions are of statistical origin.