Lane Formation Beyond Intuition: Towards an Automated Characterization of Lanes in Counter-flows

Pedestrian behavioural dynamics have been increasingly investigated by means of (semi-)automated computing techniques for almost two decades, exploiting advances in computing power, sensor accuracy and availability, and computer vision algorithms. This has led to a general consensus on the existence of significant differences between unidirectional and bidirectional flows of pedestrians, where the phenomenon of lane formation seems to play a major role. The collective behaviour of lane formation emerges in conditions of variable density through a self-organisation dynamic, in which pedestrians are induced to follow preceding persons to avoid or minimize conflict situations. Although the formation of lanes is a well-known phenomenon in this field of study, there is still a lack of methods offering an (even semi-)automatic identification and a quantitative characterization. In this context, the paper proposes an unsupervised learning approach for the automatic detection of lanes in multi-directional pedestrian flows, based on the DBSCAN clustering algorithm. The reliability of the approach is evaluated through an inter-rater agreement test between the results achieved by a human coder and by the algorithm.


Introduction & Literature Review
Pedestrian dynamics have been increasingly investigated by means of (semi-)automated computing techniques for almost two decades, exploiting advances in computing power, expressiveness of languages and models, sensor accuracy and availability, and computer vision. We are witnessing a transition from time-consuming manual counting and/or post-processing tasks (e.g. [1], [2]) to computer-supported analyses involving automated tracking algorithms such as [3], characterization of the pedestrian population (e.g. group identification [4]), and identification of typical trajectories for the definition of origin/destination matrices [5].
A relevant example of the useful outcomes of this kind of analysis is represented by [6], which discusses in depth the physics of bi-directional flows of pedestrians in corridor settings: in particular, it is quantitatively observed that the dynamics significantly differ for uni- and bi-directional flows at least for densities between 1 and 2 ped/m², in contrast with previous studies. While this experiment brings relevant findings about bi-directional dynamics, it does not fully explain how the lane formation phenomenon emerges, how it can converge to a stable state, and whether the dynamics differ at a very microscopic level depending on, e.g., the lane width. The dynamism of the phenomenon, on the other hand, increases the difficulty of defining tools or formulas for its analysis. Counter-flow movements thus represent one of the situations that can be fruitfully investigated further by means of computer-supported analyses.
Currently, besides intuitive characterizations of the lane formation phenomenon, some mathematical formulations for the aggregated analysis of lane formation have already been defined in the literature [7]. A well-known criterion is the order parameter [8], which is computed by superimposing a discrete grid on the observed environment and aggregating the number of pedestrians moving in each direction for each row of the grid (the overall direction of movement is assumed to be parallel to the x-axis of the grid). Values of this observable close to 0 indicate overall chaotic dynamics in the analysed time window. Conversely, an order parameter equal to 1 means that the dynamics are perfectly ordered and each row of the grid only contains pedestrians moving in the same direction. This metric, however, is of limited applicability, since it requires discretizing the analysed environment and assumes that lanes and flows are perfectly aligned with the corridor.
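To make the row-based aggregation concrete, the following is a minimal sketch of an order parameter of this kind. The per-row squared imbalance used here is one common form and is an assumption of this sketch; the exact definition in [8] may differ. The function name, the `(y, direction)` input format and the `row_height` argument are hypothetical.

```python
from collections import defaultdict


def order_parameter(pedestrians, row_height):
    """Row-based order parameter in [0, 1] (assumed form, see lead-in).

    pedestrians: iterable of (y, direction) pairs, where y is the coordinate
    orthogonal to the corridor axis and direction is +1 or -1 along the x-axis.
    row_height: height of each grid row, in the same unit as y.
    """
    counts = defaultdict(lambda: [0, 0])  # row index -> [n_forward, n_backward]
    for y, direction in pedestrians:
        row = int(y // row_height)
        counts[row][0 if direction > 0 else 1] += 1

    total = sum(f + b for f, b in counts.values())
    if total == 0:
        return 0.0

    # Each pedestrian contributes the squared directional imbalance of its row.
    weighted = sum((f + b) * ((f - b) / (f + b)) ** 2 for f, b in counts.values())
    return weighted / total


# Two rows, each containing pedestrians moving in a single direction.
ordered = [(0.2, 1), (0.3, 1), (1.2, -1), (1.4, -1)]
# Two rows, each containing one pedestrian per direction.
mixed = [(0.2, 1), (0.3, -1), (1.2, 1), (1.4, -1)]
print(order_parameter(ordered, 1.0), order_parameter(mixed, 1.0))
```

The sketch also shows the limitation noted above: the result depends on the chosen row height and on the flows being aligned with the x-axis.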
Another kind of analysis of the lane formation phenomenon is based on the notion of rotation, or turbulence of trajectories within a given time interval [9]. This aggregate observable increases its value with the number of changes in direction of tracked pedestrians and it can be considered an aggregated indicator of the quantity of head-on conflicts in the observation.
These mathematical formulations are already able to quantitatively describe the dynamics observed in controlled situations (e.g. bidirectional flows in straight corridors), but they provide aggregated indicators that are not very informative about the number of lanes and their relevant features, such as width, number of included pedestrians, and duration of the phenomenon. This kind of more detailed characterization could be employed as an additional element for the validation of simulation models.
A more generally applicable approach, based on a technique from the machine learning area, has been proposed in [10]. In that paper, the authors describe experimental observations and analyses employing a clustering algorithm to support the identification of pedestrian lanes in the video. The approach is a simple customization of the well-known DBSCAN algorithm [11] (Density-Based Spatial Clustering of Applications with Noise) and it aggregates the instantaneous information about the position and velocity of pedestrians to form the clusters. Although the results are preliminary, the adoption of an unsupervised machine learning technique seems particularly suited to this kind of problem. Moreover, a similar approach providing an automatic characterization of the flows in the scene is proposed in [5]. The described algorithm is capable of aggregating positions and velocity vectors of pedestrians and identifying origins and destinations of the main pedestrian flows in the scene, but the scope of the analysis is not microscopic enough to characterize the possible pedestrian lanes.
The present paper builds on these results, trying to provide both a general method for the analysis of the lane formation phenomenon and an approach to evaluate its effectiveness. To estimate the precision and reliability of the proposed algorithm for the automated characterization of lanes, we use the video and tracking results collected during the execution of controlled experiments focused on pedestrian counter-flow dynamics described in [12]. More precisely, we tested the level of inter-rater agreement between the results achieved by the automated tool and by an expert human coder through a series of Cohen's Kappa statistical analyses [13].
The paper is organised as follows. Section 2 describes the methodology and the clustering algorithm, with details on the two steps of the process; Section 3 presents an overview of the results and describes the reliability test procedure. The paper concludes with final remarks and future works.

A Clustering Algorithm to Characterize Lane Formation
In the vein of [10], we propose a novel clustering-based approach able to identify lanes in arbitrary settings. The algorithm is based on a hierarchical two-step application of DBSCAN, with distance metrics and respective parameters specifically tailored to this problem. The aim is to obtain clusters that are in tune with the intuitive conception of the lane formation phenomenon.
We briefly introduce the main concepts of DBSCAN to help the reader grasp the proposed approach. The algorithm identifies an arbitrary number of clusters, of any shape (even concave), in the dataset, identifying "dense" and well separated zones according to three elements: (i) the assumed distance metric $d(\vec{a}, \vec{b})$ between points; (ii) a threshold $\theta$ for the maximum distance between points that can be considered neighbours and therefore members of the same cluster; (iii) the minimum size of a relevant cluster, $minPts$. While the first element relates to the choice of a function to compute the distance between a pair of points, the other two are the actual parameters of the algorithm for the final identification of clusters. To compute the output, DBSCAN assigns a label to each point specifying whether the point is of type noise, border or core. The first label is assigned to points which do not have enough neighbours (fewer than $minPts$) and which are not neighbours of a core point. A point is labelled as border if it is a neighbour of a core point but does not itself have enough neighbours. A point is labelled as core if it has at least $minPts$ neighbours and, thus, defines a dense area of the dataset. Finally, according to DBSCAN a cluster is composed of a set of connected neighbouring core points, plus their neighbouring border points.
In the customization of DBSCAN proposed in this paper, the characterization of the lane formation phenomenon is computed by means of a hierarchical approach, in which two distinct distance metrics are applied with different thresholds; for the context of application, we assume that the $minPts$ parameter is shared by the two procedures. In particular, this parameter is set to 3 to allow the characterization of a lane in a situation in which three persons walk in a river-like pattern, which we consider the simplest case of a lane.
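The labelling rules above can be sketched in a few lines of plain Python. This is a minimal, quadratic-time illustration of standard DBSCAN with a pluggable distance function, not the implementation used for the experiments. It assumes, as in the usual formulation, that a point counts as its own neighbour; the names `dbscan`, `theta` and `min_pts` are chosen here for illustration.

```python
def dbscan(points, distance, theta, min_pts):
    """Minimal DBSCAN with a custom distance function.

    Returns (labels, kinds): labels[i] is the cluster id of point i (-1 for
    noise) and kinds[i] is one of "core", "border" or "noise".
    Assumption: the neighbourhood of a point includes the point itself.
    """
    n = len(points)
    # Neighbourhood of i: every j whose distance from i is at most theta.
    neigh = [[j for j in range(n) if distance(points[i], points[j]) <= theta]
             for i in range(n)]
    core = [len(nb) >= min_pts for nb in neigh]

    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        # Grow a new cluster from an unassigned core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            for k in neigh[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    if core[k]:  # only core points propagate the cluster
                        stack.append(k)
        cluster += 1

    kinds = ["core" if core[i] else ("border" if labels[i] != -1 else "noise")
             for i in range(n)]
    return labels, kinds


# Three pedestrians in a row (1 m apart) and one isolated pedestrian.
labels, kinds = dbscan([0.0, 1.0, 2.0, 10.0], lambda a, b: abs(a - b), 1.0, 3)
print(labels)  # [0, 0, 0, -1]
print(kinds)   # ['border', 'core', 'border', 'noise']
```

With `min_pts = 3`, three pedestrians in a row form a cluster even when only adjacent ones are within the threshold: the middle one is core and the two ends are border points, which matches the "simplest case of a lane" described above.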
An intuitive description of the two steps of the overall algorithm is shown in Figure 1. In the first step, velocity vectors from the raw input data are used to identify large clusters associated with the main directions of flow, using a distance function which accounts for the angular distance between vectors. A second step of DBSCAN is then performed on the output clusters, considering the global average velocity of pedestrians within the cluster and the coordinates of pedestrians to finally characterize the lanes. In this second step the distance metric is more complicated, since it must consider additional information; it will be formally described in Sect. 2.2, but it basically considers pedestrians' positions with respect to the positions of neighbouring members of the same flow-cluster.
The algorithm works on almost instantaneous data, potentially allowing its implementation in real-time systems: by means of a moving average, mean positions and velocity vectors of pedestrians over short time windows (less than half a second) are calculated. The two sequential procedures are discussed in the following subsections.

Recognition of main directions of movement
Predominant directions of flow are recognized to identify the potential counter-flow situation as well as to achieve relevant information for the second step of the procedure, where the average velocity of pedestrians walking in the same direction is relevant for the computation of distance.
This process takes as input the set of velocity vectors $\vec{v}_i$ of the pedestrians observed in the current frame and analyses the differences in their orientation. The magnitude of the velocity vector does not influence the result in this case: the emergence of locally higher densities in counter-flow situations, in fact, can lead to appreciable differences among the speeds of pedestrians, and then to errors in the clustering process. To avoid this bias, the distance metric for a pair of velocity vectors $(\vec{v}_a, \vec{v}_b)$ is defined as:

$$d_v(\vec{v}_a, \vec{v}_b) = \arccos\left(\frac{\vec{v}_a \cdot \vec{v}_b}{\|\vec{v}_a\|\,\|\vec{v}_b\|}\right)$$

where the numerator denotes the dot product between the two vectors. This distance describes the inner (i.e., minimum) angle between $\vec{v}_a$ and $\vec{v}_b$, and a single threshold $\theta_v \in [0, 180]$ degrees is introduced to calibrate the range of the neighbourhood (i.e. below this angular distance, the pedestrians are considered members of the same major flow); in the experiments presented in the next section, $\theta_v$ is set to about 50 degrees. While this could be considered a rather large value, two considerations must be introduced. First, given the hierarchical structure of the process, we prefer a looser definition of neighbourhood at this stage, refining the output in the next step. Second, a thorough calibration of the parameters of the algorithm is still ongoing work, and the results presented later refer to a calibration that was chosen because it generated visually stable and plausible results (a fact that is also supported by the Kappa analysis shown in Figure 3). As exemplified in Figure 1, this process produces output clusters that describe pedestrians headed towards the same direction of movement.
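As an illustration of the first-step metric, the sketch below computes the inner angle between two 2D velocity vectors in degrees. The function name and the handling of zero vectors (raising an error) are assumptions made here; the text does not specify how stationary pedestrians are treated.

```python
import math


def angular_distance(v_a, v_b):
    """Inner angle, in degrees within [0, 180], between two 2D velocity vectors."""
    dot = v_a[0] * v_b[0] + v_a[1] * v_b[1]
    norm = math.hypot(v_a[0], v_a[1]) * math.hypot(v_b[0], v_b[1])
    if norm == 0.0:
        # Assumption: a zero velocity has no direction, so the angle is undefined.
        raise ValueError("angular distance is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


# The speed does not matter, only the heading does.
slow, fast = (0.5, 0.0), (1.6, 0.0)
opposite = (-1.2, 0.1)
print(angular_distance(slow, fast) <= 50.0)      # same major flow
print(angular_distance(slow, opposite) <= 50.0)  # different major flow
```

Because the dot product is normalised by both magnitudes, a slow pedestrian in a dense area and a fast one in free flow are at distance 0 whenever they share the same heading, which is the bias the metric is meant to avoid.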

Characterization of lane formation
The second step works sequentially on the individual clusters identified by the previous step, performing density-based clustering using only their points. It takes as input the actual positions of pedestrians $\vec{p}_i$ and the average velocity vector of the clustered pedestrians $\vec{v}^*$, which is used for the computation of the distance. The aim is the final identification of pedestrian lanes in the given frame. This objective is pursued through a specifically tailored distance metric, which composes clusters of pedestrians who are not only relatively close, but also arranged in a way that can be associated with a form of queueing. For this purpose, a function able to differentiate distances according to the movement direction of the pedestrians is required. We therefore configure a distance function $d_p(\vec{p}_a, \vec{p}_b)$ for a pair of points whose values are asymmetric with respect to the axes of the representation of the analysed environment; the rationale is that this measure grows more substantially along one axis of the 2-dimensional space (the y-axis by assumption). The function is expressed in terms of $x_{ab}$ and $y_{ab}$, the x- and y-coordinates of the relative position of pedestrian $b$ with respect to $a$ (the evaluated point), after being rotated according to the average velocity $\vec{v}^*$ of the pedestrians clustered in the first step, and of a calibration parameter $\rho$. The vector rotation is performed to align pedestrian positions with the average direction of movement, along which the distance grows more slowly in order to aggregate positions describing queueing pedestrians. Formally, it is described by the following equation:

$$(x_{ab}, y_{ab}) = \mathrm{rot}\big(\vec{p}_b - \vec{p}_a,\ \mathrm{angle}(\vec{v}^*, (1,0))\big)$$

with $\mathrm{rot}$ being the rotation operator, $\mathrm{angle}$ the function computing the counter-clockwise angle between two vectors, and $(1,0)$ a unit vector used to align the movement direction $\vec{v}^*$ with the x-axis, by rotating neighbour points accordingly.
The functioning of this part of the algorithm is also graphically exemplified in Figure 2. The function defines the neighbourhood of a point as an ellipse whose long side is aligned with the average direction of movement $\vec{v}^*$: this allows pedestrians that are walking in a river-like formation to be considered members of the same lane although there is a certain distance between them, whereas if they walk in a line-abreast formation the same distance might be considered more relevant. The dimensions of the ellipse are given by the threshold $\theta_p$ and the parameter $\rho$, which acts as a multiplier to define the proportion of the long side. In this way the algorithm provides clusters whose shape follows the direction $\vec{v}^*$ and which aggregate points referring to queueing pedestrians.
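The following sketch shows one distance function that is consistent with the description above: the relative position is rotated so that the mean flow direction lies along the x-axis, and the along-flow component is then divided by a multiplier before taking the Euclidean norm. The exact formula is an assumption of this sketch and not necessarily the one adopted in the paper; the names `lane_distance` and `rho` are hypothetical, and `rho` is assumed to be greater than 1.

```python
import math


def lane_distance(p_a, p_b, v_mean, rho):
    """Anisotropic distance between two positions (assumed form, see lead-in).

    p_a, p_b: (x, y) positions; p_a is the evaluated point.
    v_mean:   non-zero average velocity of the flow cluster from the first step.
    rho:      multiplier (> 1) stretching the neighbourhood along the flow.
    """
    # Counter-clockwise angle that brings v_mean onto the unit vector (1, 0).
    angle = -math.atan2(v_mean[1], v_mean[0])
    dx, dy = p_b[0] - p_a[0], p_b[1] - p_a[1]
    # Rotate the relative position so that the flow direction is the x-axis.
    x = dx * math.cos(angle) - dy * math.sin(angle)
    y = dx * math.sin(angle) + dy * math.cos(angle)
    # Assumption: shrinking the along-flow offset by rho makes the set
    # {distance <= theta} an ellipse with semi-axes rho * theta and theta.
    return math.hypot(x / rho, y)


flow = (0.0, 1.3)        # cluster moving "up" in the original frame
me = (0.0, 0.0)
ahead = (0.0, 2.0)       # 2 m ahead, river-like arrangement
abreast = (2.0, 0.0)     # 2 m to the side, line-abreast arrangement
print(lane_distance(me, ahead, flow, rho=2.0))    # about 1.0
print(lane_distance(me, abreast, flow, rho=2.0))  # about 2.0
```

Under this assumed form, two pedestrians 2 m apart are closer in the metric when one follows the other than when they walk side by side, so a single threshold can accept the former and reject the latter.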

Reliability Test between the Clustering Algorithm and a Human Coder
To test the reliability (i.e. internal validity) of the above described tool for the automated characterization of lanes, we use the video and tracking results collected during the execution of controlled experiments focused on pedestrian counter-flow dynamics, thoroughly described in [12]. We focused on one of the performed experiments, based on the analysis of fully balanced bidirectional pedestrian flows. Then, we executed a cross-checking analysis between the results achieved through a preliminary calibration of the algorithm and the results achieved by a human coder, expert in the field of pedestrian crowd dynamics. Although the formation of lanes is a well-known phenomenon in this field of study, there is still a lack of a formal definition for its quantitative characterization. As a general guideline, we started by drafting a common-sense definition of the lane formation phenomenon for the sake of the human rater analysis, as follows: "a lane is a group of three or more pedestrians, walking in the same direction with a river-like spatial arrangement, avoiding collision with counter flows".
Then, the human coder was asked to analyse both the video images of the experiments and a video clip showing the positions and IDs of pedestrians (see Fig. 3(a) and (b)): the video clip was prepared with the positions of the tracked pedestrians participating in the above described experiments, annotated with a numerical identifier to allow the human coder to identify pedestrians walking in lanes. To facilitate the manual annotation of lane IDs, the coder was asked to analyse the images of the two synchronised videos starting from the bottom-right part of the screen. To evaluate all these indicators more thoroughly, the coder was also asked to rewind the video and take the necessary time to classify pedestrian lanes. According to this methodology, the video of one experimental procedure was analysed by annotating the situation every 30 frames and by describing the lane formation phenomenon observed in the last second. Pedestrians were classified with a unique numerical identifier considering: their condition of "walking out of any lane" or "walking in a lane" (i.e. gross classification); their belonging to a certain lane (i.e. granular classification).
In more detail, the coder was asked to annotate the following information for each second of the video: (i) time of the video; (ii) numerical identifier of the pedestrian; (iii) numerical identifier of the lane. The conventional value -1 was assigned to identify the condition of "walking out of any lane", while values greater than or equal to zero were used to assign the ID of the lane the pedestrian belonged to in that time frame.
The described data analysis procedure had the objective of cross-checking the results about lane formation phenomena achieved through the clustering algorithm against those achieved with the support of a human coder. To measure the level of reliability of the tool, we tested the level of inter-rater agreement between the two methods by means of a series of Cohen's Kappa statistical analyses [13]. The Kappa statistic measures the level of inter-rater agreement between two coders in classifying a certain object/subject by using categorical variables. It has a maximum value of 1 when agreement is perfect, and a value of 0 when agreement is no better than chance. Other values can be roughly interpreted as: < 0.20 poor; < 0.40 fair; < 0.60 moderate; < 0.80 good; higher values very good agreement. Results (see Figure 4) showed a high agreement on average between the two independent coding methods in both gross classification (K = 0.837, SD 0.144) and granular classification (K = 0.841, SD 0.139), confirming the consistency of results and empirically corroborating the reliability of the clustering algorithm.
As introduced by other works in the literature (e.g., [9,14]), we analysed the observed pedestrian counter-flow situations as subdivided into three phases (see Figure 4): (i) lane generation (from the first frame in which pedestrians are separated according to their movement direction, to the moment in which the two flows physically interact and lanes start to emerge); (ii) lane fingering (bidirectional flow characterised by the consolidation of the emerged lanes); (iii) lane dissolution (lanes are being dissolved). In light of this subdivision, the Cohen's Kappa results have been further analysed: the level of inter-rater agreement is lower in the lane generation and lane dissolution phases (see Figure 4), due to the genuinely more ambiguous nature of the observed phenomenon in these phases. Moreover, the trend within the central time window, although always over the moderate agreement threshold (and almost always above the good one), is not stable, but rather reflects transient turbulences in which the number of lanes and their position in the corridor were changing. This seems reasonable, since these transient situations represent a sort of phase change between more stable configurations, which the human coder can foresee, also being able to rewind and look back at the past; this situation is analogous to that of automated trackers taking a global perspective on the analysed video, compared to those based on a limited-temporal-locality assumption [15].
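For reference, the Kappa statistic for two coders can be computed as in the sketch below, which uses the standard definition (observed agreement corrected by the agreement expected by chance). The label lists are invented for illustration and are not data from the experiment. For the granular case, the sketch assumes that lane identifiers of the two coders have already been matched, a step that is not detailed here.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from the marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used the same single category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations for six pedestrians in one time frame:
# -1 means "walking out of any lane", values >= 0 are lane IDs.
human = [-1, -1, 0, 0, 1, 1]
algorithm = [-1, 0, 0, 0, 1, 1]

granular = cohen_kappa(human, algorithm)
gross = cohen_kappa([h >= 0 for h in human], [a >= 0 for a in algorithm])
print(granular, gross)
```

The gross classification is obtained from the same annotations by collapsing all lane IDs into a single "walking in a lane" category, so both analyses can be run on one annotation file.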

Conclusions
The paper presented an unsupervised learning approach for the automatic detection of lanes in multidirectional pedestrian flows, based on the DBSCAN clustering algorithm. We also presented a method for the evaluation of its reliability, employing an inter-rater agreement test between the results achieved by a human coder and by the algorithm. The achieved results are promising and they will support a further calibration of the overall workflow, which will also consider additional datasets from the literature and potentially multiple human coders.
Even though there are many future works in this line of research (first of all, the impact of the presence of groups on the overall phenomenon, which is considered a potentially relevant factor [16]), the central goal is to achieve a characterisation of lanes throughout the whole analysed video and time frame, whereas the current approach only considers relatively small time windows. The basic idea for aggregating current results into a global description of identified lanes is to connect local lanes within different time windows whenever the Jaccard similarity between them (considering them as sets of pedestrian identifiers) is greater than a certain threshold. This will allow us to have a more comprehensive characterisation of a lane, granting the possibility to evaluate its persistence in time, average cardinality, length, and potentially other aggregated measures.
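The planned linking of local lanes across time windows could be sketched as follows. The sketch uses the Jaccard index (similarity, i.e. one minus the Jaccard distance), so that lanes sharing most of their pedestrians are connected. Function names, the example sets and the threshold value are hypothetical, since this step is future work and has not been calibrated.

```python
def jaccard_similarity(lane_a, lane_b):
    """Jaccard index of two lanes seen as sets of pedestrian identifiers."""
    a, b = set(lane_a), set(lane_b)
    union = a | b
    # Convention assumed here: two empty sets are considered identical.
    return len(a & b) / len(union) if union else 1.0


def link_lanes(window_prev, window_next, threshold):
    """Pairs (i, j) of lanes in consecutive windows with similarity above threshold."""
    return [(i, j)
            for i, lane_i in enumerate(window_prev)
            for j, lane_j in enumerate(window_next)
            if jaccard_similarity(lane_i, lane_j) > threshold]


# Lanes detected in two consecutive time windows (sets of pedestrian IDs).
window_t0 = [{1, 2, 3}, {4, 5, 6}]
window_t1 = [{2, 3, 7}, {4, 5, 6, 8}]
print(link_lanes(window_t0, window_t1, threshold=0.4))  # [(0, 0), (1, 1)]
```

Chaining such pairs over all windows would yield, for each global lane, the sequence of local lanes composing it, from which persistence in time and average cardinality could be derived.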