A Method for Joint Estimation of Homogeneous Model Parameters and Heterogeneous Desired Speeds

One of the main strengths of microscopic pedestrian simulation models is the ability to explicitly represent the heterogeneity of the pedestrian population. Most pedestrian populations are heterogeneous with respect to the desired speed, and the outputs of microscopic models are naturally sensitive to the desired speed; it has a direct effect on the flow and travel time, thus strongly affecting results that are of interest when applying pedestrian simulation models in practice. An inaccurate desired speed distribution will in most cases lead to inaccurate simulation results. In this paper we propose a method to estimate the desired speed distribution by treating the desired speeds as model parameters to be adjusted in the calibration together with other model parameters. This leads to an optimization problem that is computationally costly to solve for large data sets. We propose a heuristic method to solve this optimization problem by decomposing the original problem into simpler parts that are solved separately. We demonstrate the method on trajectory data from Stockholm central station and analyze the results to conclude that the method is able to produce a plausible desired speed distribution under slightly congested conditions.


Introduction
Microscopic simulation is a powerful tool to evaluate or compare infrastructure designs or control strategies. One of its strengths is the ability to explicitly represent the heterogeneity of the pedestrian population in the model. Most pedestrian populations are heterogeneous with respect to the desired speed, that is, the speed that a pedestrian strives to keep but is often unable to keep due to surrounding pedestrians.
The outputs of microscopic models are sensitive to the value of the desired speed; it has a direct effect on the flow and the travel time in most scenarios, thus strongly affecting results that are of interest when applying pedestrian simulation models in practice. An inaccurate desired speed distribution will in most cases lead to inaccurate simulation results.
The desired speed is often not directly observable, since in any situation with significant congestion most pedestrians are unable to keep their desired speed. A proxy for the desired speed is the free flow speed, that is, the speed at which pedestrians walk in the absence of any interactions with other pedestrians. The free flow speed is directly measurable, and in most microscopic models the free flow speed of an individual pedestrian is equal to its desired speed. However, since the population present when free flow occurs may have a different desired speed distribution than the population present when congestion occurs, observations of the free flow speed distribution may provide an inaccurate estimate of the desired speed distribution during the congested conditions of interest. This may occur even if the same individuals are present during both free flow and congested conditions, due to variations of individual desired speeds over time.
Investigations of pedestrian speeds have been performed since at least the fifties, when controlled experiments were performed and observations in the London Underground were made, resulting in an estimated free flow speed of 1.6 m/s [1]. A modern investigation of a similar kind was performed in Hong Kong, reporting a free flow speed of around 1.3 m/s, but with significant variations between walking areas with differing characteristics [2]. Numerous similar studies have been performed, see [3] for an overview. A common reference for the free flow speed is [4], which reports a value of 1.34 m/s; this also happens to be the average of the values reported by the studies reviewed by [3]. Measurements of free speed distributions date back to at least the seventies, through observations of low density conditions [5]. More recently, the free speed distribution was estimated in controlled experiments, also by only considering low density observations [6], [7]. These desired speed estimates were improved in [3], which corrects for the fact that the data are censored: an estimate based only on free pedestrians in partly congested traffic will be biased, since pedestrians with a high desired speed have a larger probability of being constrained. However, this method depends on a classification of observed pedestrians as either constrained or freely walking. For vehicular traffic this classification can be circumvented by considering observations partly censored [8], but this method is hard to apply to pedestrian traffic due to the lack of clearly defined lanes.
In this paper we propose a method to estimate the desired speed distribution by treating the desired speeds as model parameters to be adjusted in the calibration together with other model parameters. The method is based on the calibration methods previously applied in [9]-[11] to calibrate the Social Force Model (SFM) [12]. Here, too, we demonstrate the proposed method by calibrating the SFM, but both the proposed method and the previously applied methods can also be used to calibrate similar models.
The optimization problem of the proposed method is similar to the one in [9]: simulations are performed for each observed pedestrian, while letting the surrounding pedestrians move exactly according to the observations. The deviations of the simulated trajectories from the observed ones are used to define the objective of a minimization problem with the model parameters as decision variables.
In [9] the desired speed of each pedestrian is set to the maximum observed instantaneous speed of that pedestrian, while in [10] it is set to a certain percentile of the observed instantaneous speeds of the pedestrian. As noted in [9], this works well for low density conditions. However, the desired speed is underestimated when fast pedestrians start to get delayed. A biased estimate of the desired speed may also lead to biased estimates of the other parameters, due to interdependence between parameters; a desired speed that is too small may, for example, be partly compensated for by lowering the relaxation time.
The method in [11], on the other hand, treats the desired speeds as calibration parameters, adjusting them together with the parameters that are common to all pedestrians. This results in an optimization problem with dimension proportional to the number of observed pedestrians, which dramatically increases the solution time with increasing size of the data set. This was not a problem in [11], due to the use of a data set from controlled experiments with a relatively small number of subjects. However, for naturalistic data sets with thousands of pedestrians the computational cost becomes problematic.
We propose a heuristic solution approach to this optimization problem that decomposes the problem into one problem for the model parameters that are assumed to be constant over the population, here called homogeneous parameters, and a set of one-dimensional problems, one for the desired speed of each pedestrian. These problems are solved alternately until the improvement is negligible. In this way, instead of having a problem with dimension proportional to the number of observed pedestrians, we get a number of one-dimensional problems proportional to the number of pedestrians and a problem with dimension equal to the number of homogeneous parameters. This implies that the method is feasible for large data sets for which the optimization problems would be prohibitively costly to solve directly. Intuitively, this decomposition is possible since the optimal desired speeds are not too strongly dependent on the homogeneous parameters, and the optimal homogeneous parameters are only slightly affected by each desired speed.

Method
This paper presents a method to jointly estimate homogeneous parameters and heterogeneous desired speeds. The method is expected to be applicable for most microscopic models with continuous space representation, but for concreteness and since the exact definition of the desired speed depends on the model considered, a specific version of the SFM is considered.

Simulation model
The simulation model considered in this study is based on the version of the SFM presented by [9]. The acceleration $\ddot{x}_i$ of agent $i$ is given by a sum of forces dependent on the surroundings,

$$\ddot{x}_i = \frac{\vec{v}_i^{\,0} - \dot{x}_i}{\tau} + \sum_{j \neq i} w(\varphi_{ij})\, f_{ij},$$

where the desired velocity $\vec{v}_i^{\,0}$, with magnitude equal to the desired speed $v_i^0$, is given by some route choice model or as input data; $w$ is a weight depending on $\varphi_{ij}$, the angle between the direction of motion of the affected agent $i$ and the direction toward the affecting agent $j$. When applying the model in simulations, a stochastic term is usually included in addition to the systematic terms above; however, only the systematic effect is calibrated, in line with [9]. A force from static obstacles would normally also have to be included; however, the observed area does not include any obstacles, see section 3.

The social force $f_{ij}$ exerted on agent $i$ by agent $j$ is given as the gradient of a potential of the form

$$V(b_{ij}) = A B\, e^{-b_{ij}/B},$$

with the distance measure $b_{ij} = b(d_{ij}, \Delta t)$ of [9], where $B$ is the range scale and $A$ the strength of the force, $\Delta t$ is the anticipation time, and $d_{ij}$ the relative position of the affecting agent. In total, this model contains five parameters: the relaxation time $\tau$, the social force strength $A$, the social force range scale $B$, the anticipation time $\Delta t$, and the desired speed $v_i^0$.
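As a minimal sketch of how a social force can be obtained as the gradient of a potential, the following assumes the elliptical specification of [9], that is, $V(b) = A B e^{-b/B}$ with $2b = \sqrt{(\lVert d \rVert + \lVert d - y \rVert)^2 - \lVert y \rVert^2}$ and $y = (v_j - v_i)\Delta t$. The sign convention $d = x_i - x_j$, the omission of the angular dependence, the numerical differentiation, and all function names and parameter values are assumptions made for illustration, not the implementation used in the study.

```python
import numpy as np

def elliptical_b(d, rel_vel, anticipation_time):
    """Semi-minor axis b of the ellipse, assuming d = x_i - x_j and rel_vel = v_j - v_i."""
    y = rel_vel * anticipation_time
    s = np.linalg.norm(d) + np.linalg.norm(d - y)
    # s >= ||y|| by the triangle inequality; max() only guards against rounding.
    return 0.5 * np.sqrt(max(s * s - float(np.dot(y, y)), 0.0))

def potential(d, rel_vel, strength, range_scale, anticipation_time):
    """Assumed potential V(b) = A * B * exp(-b / B)."""
    b = elliptical_b(d, rel_vel, anticipation_time)
    return strength * range_scale * np.exp(-b / range_scale)

def social_force(d, rel_vel, strength, range_scale, anticipation_time, h=1e-6):
    """Force on agent i as the negative gradient of V with respect to d (central differences)."""
    d = np.asarray(d, dtype=float)
    rel_vel = np.asarray(rel_vel, dtype=float)
    grad = np.zeros(2)
    for k in range(2):
        step = np.zeros(2)
        step[k] = h
        v_plus = potential(d + step, rel_vel, strength, range_scale, anticipation_time)
        v_minus = potential(d - step, rel_vel, strength, range_scale, anticipation_time)
        grad[k] = (v_plus - v_minus) / (2.0 * h)
    return -grad

# Illustrative values only: agent i is 1 m to the right of agent j.
force_static = social_force([1.0, 0.0], [0.0, 0.0],
                            strength=2.0, range_scale=0.5, anticipation_time=1.0)
# Same geometry, but agent j now approaches agent i with relative speed 0.5 m/s.
force_approaching = social_force([1.0, 0.0], [0.5, 0.0],
                                 strength=2.0, range_scale=0.5, anticipation_time=1.0)
```

With zero relative velocity the potential reduces to a circular one, so the force points straight away from the other agent; with an approaching agent the anticipation term shrinks $b$ and the repulsion grows.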

The calibration problem
For application of microscopic pedestrian simulation models for predictive purposes, the goal of the calibration is in general to find parameter values that result in a model that can predict traffic under conditions and environments that are similar, but not identical, to some observed reference situation. To achieve this the parameters are adjusted such that the output of the model becomes sufficiently similar to the observed reference traffic. We call this the calibration of the model, and this is the focus of the present study, while the subsequent test of the predictive power of the model through comparison with independent data, that is the validation, is not considered.
As noted above, the model has five parameters that correspond to various properties and preferences of the simulated pedestrians. However, some, or all, of these properties and preferences may vary over the population, so in principle we would like to estimate the multivariate distribution of the parameters over the population. This is, however, an immense task requiring large amounts of data. We will here undertake the simpler task of estimating the distribution of only the desired speed, under the assumption that the remaining parameters are homogeneous, that is, all agents have the same value of these parameters. An important observation is that the distribution of the desired speed under the assumption of homogeneous remaining parameters is not necessarily the same as the marginal distribution of the desired speed when all parameters vary over the population.
We formulate the calibration problem as an optimization problem, minimizing some error function that quantifies the difference between the model output and the reference data,

$$\min_{\theta,\, v_1^0, \ldots, v_N^0} \; \sum_{i=1}^{N} e_i(\theta, v_i^0), \tag{5}$$

where $N$ is the number of trajectories in the data set, $e_i$ is an error function that quantifies the difference between the simulated trajectory of agent $i$ and the observed trajectory of the corresponding pedestrian, and $\theta$ is the set of homogeneous parameters, that is, $\theta = (\tau, A, B, \Delta t)$. A significant difficulty here is that the dimensionality of the solution space of problem (5) is $N + 4$, that is, the dimensionality is proportional to the number of observed trajectories. For a modest number of observed trajectories problem (5) is tractable, but as the number of trajectories in the data set increases the problem quickly becomes computationally too costly to solve directly.

Proceedings from the 9th International Conference on Pedestrian and Evacuation Dynamics (PED2018) Lund, Sweden - August 21-23, 2018

Objective function
The error function $e_i$ quantifies the fit of individual simulated trajectories to the observed data. Many versions have been used in the literature; here we consider the integrated Euclidean distance between the observed trajectory and the trajectory obtained by simulating an agent with the same initial conditions and environment as the observed pedestrian. That is, the agent is simulated in the presence of agents moving exactly according to the observed trajectories. This is similar to the approach taken by e.g. [9], [10]. Furthermore, it is assumed that the desired destination of the agent is the end of the observed trajectory. The simulation is executed for a certain time $T$, and is then restarted with the agent reset to a position on the observed trajectory. This is repeated $K$ times, to avoid promoting parameter values that steer the agent back toward the observed trajectory from a position deviating from it.
The error function thus becomes

$$e_i(\theta, v_i^0) = \sum_{k=0}^{K} \int_{t_k}^{t_k + T} \left\lVert \hat{x}_i(\theta, v_i^0; t) - x_i(t) \right\rVert \, dt, \tag{6}$$

where $\hat{x}_i(\theta, v_i^0; t)$ and $x_i(t)$ are the simulated and observed positions at time $t$, respectively, and $t_k$, $k = 0, 1, \ldots, K$, are the starting times for each of the short simulations.
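The restart scheme described above can be sketched as follows. This is a toy illustration under stated assumptions: the observed trajectory is sampled at a fixed time step, the integral is approximated by the trapezoidal rule, and the hypothetical `constant_speed_simulator` stands in for the SFM by walking straight toward the end of the observed trajectory at a fixed speed. None of these names come from the original study.

```python
import numpy as np

def trajectory_error(simulate, observed, dt, restart_every):
    """Sum over restarts of the time-integrated Euclidean distance.

    simulate(start_index, n_steps) must return an array of shape
    (n_steps + 1, 2) whose first row is observed[start_index].
    """
    n = len(observed)
    total = 0.0
    for start in range(0, n - 1, restart_every):
        n_steps = min(restart_every, n - 1 - start)
        sim = simulate(start, n_steps)
        obs = observed[start:start + n_steps + 1]
        dist = np.linalg.norm(sim - obs, axis=1)
        # Trapezoidal rule for the integral of the distance over this short run.
        total += dt * (dist[:-1] + dist[1:]).sum() / 2.0
    return total

def constant_speed_simulator(observed, speed, dt):
    """Hypothetical stand-in for the SFM: walk toward the final observed point."""
    goal = observed[-1]
    def simulate(start_index, n_steps):
        pos = observed[start_index].astype(float)
        path = [pos.copy()]
        for _ in range(n_steps):
            direction = goal - pos
            norm = np.linalg.norm(direction)
            if norm > 0.0:
                pos = pos + speed * dt * direction / norm
            path.append(pos.copy())
        return np.array(path)
    return simulate

# Observed pedestrian walking along the x-axis at 1.3 m/s for 5 s.
dt = 0.1
times = np.linspace(0.0, 5.0, 51)
observed = np.column_stack([1.3 * times, np.zeros_like(times)])

err_match = trajectory_error(constant_speed_simulator(observed, 1.3, dt), observed, dt, 10)
err_slow = trajectory_error(constant_speed_simulator(observed, 1.0, dt), observed, dt, 10)
```

In this toy case an agent that is 0.3 m/s too slow falls behind linearly during each one-second run, so the error accumulates per restart instead of growing without bound along the whole trajectory.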

Optimization method
As mentioned above, problem (5) is computationally too costly to solve directly for large data sets. This is due to the combination of the large dimensionality of the solution space and an objective function that is very unlikely to be convex, likely to have multiple local minima, and likely to be non-smooth and even discontinuous at some points. We therefore propose a method similar to the coordinate descent class of optimization methods, see e.g. [13], to solve the problem. The proposed method can be summarized as:
0. Obtain an initial estimate of the desired speed distribution by heterogeneous calibration.
1. Minimize the sum of the error functions with respect to the homogeneous parameters $\theta$, keeping the desired speeds at the values obtained in the previous step.
2. Separately minimize each error function $e_i$ with respect to $v_i^0$, keeping the homogeneous parameters at the values obtained in the previous step.
3. Go to step 1 if the improvement in the error is above some threshold.
In the initial step, step 0, the problems

$$\min_{\theta,\, v_i^0} \; e_i(\theta, v_i^0), \quad i = 1, 2, \ldots, N, \tag{7}$$

are solved separately. This gives an initial estimate of the desired speed distribution. The values of $\theta$, on the other hand, are highly uncertain, since most trajectories separately contain too little information to obtain meaningful values of all the parameters. We also use the solutions of (7) to remove from further use in the calibration procedure any trajectory with a value of $e_i^* = \min_{\theta,\, v_i^0} e_i(\theta, v_i^0)$ above a threshold corresponding to an average deviation of 0.1 m from the observed trajectory, since such trajectories are likely to be strongly affected by factors external to the model. If included, these trajectories could promote values of the parameters that compensate for such external effects. The threshold was chosen rather high in order to sort out only strongly deviating trajectories.

In step 1, the problem

$$\min_{\theta} \; \sum_{i=1}^{N} e_i(\theta, v_i^0) \tag{8}$$

is solved, with $v_i^0$ according to the result of the previous iteration of step 2 (or step 0 if it is the first iteration). This problem has a computationally costly objective function, requiring the simulation of each of the $N$ agents representing the observed pedestrians, and the solution space has a dimensionality equal to the number of homogeneous parameters: four in the case of the model used as an example here. This makes problem (8) a costly problem, especially for more complex (realistic) models with more parameters. However, the calculation of the objective is suitable for parallelization since it is a sum. The result of solving problem (8) is a set of values for the homogeneous parameters $\theta$.
Step 2 consists of solving the set of one-dimensional problems

$$\min_{v_i^0} \; e_i(\theta, v_i^0), \quad i = 1, 2, \ldots, N, \tag{9}$$

where the homogeneous parameters $\theta$ have the values obtained from step 1. These problems are one-dimensional and can be solved in parallel, and are thus computationally cheap compared to problem (8). If treated carefully, it is likely that the global minimum of each problem can be found.
Since the objective function of the problem for the homogeneous parameters is likely to contain discontinuities (when a small shift in the value of a parameter causes the agent to pass another agent on the opposite side compared to without the parameter shift), a derivative-free optimization algorithm is preferable. In the demonstration of the method we apply a genetic algorithm in line with [9]-[11], since this can also handle the existence of a large set of local minima.
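To illustrate why a population-based, derivative-free search suits a discontinuous objective, the following is a minimal real-coded genetic algorithm. It is a generic sketch and not the configuration used in the study or in [9]-[11]: the tournament selection, blend crossover, Gaussian mutation, elitism, population size, and the toy objective with a jump are all assumptions chosen for brevity.

```python
import numpy as np

def genetic_minimize(objective, lower, upper, pop_size=40, generations=60,
                     mutation_scale=0.1, seed=0):
    """Minimize objective over a box using only function evaluations."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    pop = rng.uniform(lower, upper, size=(pop_size, lower.size))
    history = []  # best objective value found in each generation
    for _ in range(generations):
        fitness = np.array([objective(x) for x in pop])
        best = pop[np.argmin(fitness)].copy()
        history.append(float(fitness.min()))
        # Tournament selection: each parent is the better of two random individuals.
        a = rng.integers(pop_size, size=(2, pop_size))
        b = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((fitness[a] < fitness[b])[..., None], pop[a], pop[b])
        # Blend crossover between the two parent sets, then Gaussian mutation.
        w = rng.uniform(size=(pop_size, 1))
        children = w * parents[0] + (1.0 - w) * parents[1]
        children += rng.normal(scale=mutation_scale * (upper - lower),
                               size=children.shape)
        pop = np.clip(children, lower, upper)
        pop[0] = best  # elitism: the best individual always survives
    fitness = np.array([objective(x) for x in pop])
    history.append(float(fitness.min()))
    return pop[np.argmin(fitness)], history

def toy_objective(x):
    """Smooth bowl plus a jump at x[0] = 0.6, mimicking a discontinuity."""
    return float(np.sum((x - 0.3) ** 2) + (0.5 if x[0] > 0.6 else 0.0))

best_x, history = genetic_minimize(toy_objective, [0.0, 0.0], [1.0, 1.0])
```

Because no gradient is evaluated, the jump in the toy objective does not disturb the search, and elitism guarantees that the best value found never gets worse between generations.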
The advantage of the proposed method over trying to directly solve problem (5) is that it reduces the hard problem (5) to the much simpler problems (7), (8) and (9). However, there is no guarantee that the method will find the global minimum of problem (5), so careful analysis of the results is required to check that the results are reasonable.
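The alternating procedure of steps 0 to 3 can be sketched on a toy problem as follows. Everything here is an assumption made for illustration: the quadratic `toy_error` replaces the simulation-based error function, there is a single scalar homogeneous parameter instead of four, exhaustive grid search replaces the genetic algorithm, and the removal of strongly deviating trajectories after step 0 is left out.

```python
import numpy as np

# Hypothetical per-trajectory optima; each trajectory "prefers" a different theta.
preferred_speed = np.array([1.0, 1.3, 1.6])
preferred_theta = np.array([0.4, 0.5, 0.6])

def toy_error(i, theta, speed):
    """Toy stand-in for the error of trajectory i, with weak coupling between parameters."""
    dv = speed - preferred_speed[i]
    dth = theta - preferred_theta[i]
    return dv * dv + dth * dth + 0.5 * dv * dth

def alternate(error, n_agents, theta_grid, speed_grid, tol=1e-9, max_iter=20):
    # Step 0: calibrate every trajectory separately; keep only the desired speeds.
    speeds = np.empty(n_agents)
    for i in range(n_agents):
        table = np.array([[error(i, th, v) for v in speed_grid] for th in theta_grid])
        _, col = np.unravel_index(np.argmin(table), table.shape)
        speeds[i] = speed_grid[col]
    history = []
    previous = np.inf
    for _ in range(max_iter):
        # Step 1: one problem for the homogeneous parameter, desired speeds fixed.
        totals = [sum(error(i, th, speeds[i]) for i in range(n_agents))
                  for th in theta_grid]
        theta = theta_grid[int(np.argmin(totals))]
        # Step 2: one one-dimensional problem per trajectory, theta fixed.
        for i in range(n_agents):
            speeds[i] = speed_grid[int(np.argmin([error(i, theta, v)
                                                  for v in speed_grid]))]
        total = sum(error(i, theta, speeds[i]) for i in range(n_agents))
        history.append(total)
        # Step 3: stop when the improvement is negligible.
        if previous - total <= tol:
            break
        previous = total
    return theta, speeds, history

theta_grid = np.linspace(0.0, 1.0, 21)
speed_grid = np.linspace(0.5, 2.0, 31)
theta, speeds, history = alternate(toy_error, 3, theta_grid, speed_grid)
```

Each step minimizes the total error over one block of variables while the other block is held fixed, so the total error cannot increase from one iteration to the next; as with the proposed method, this says nothing about reaching the global minimum.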

Case
We now demonstrate the proposed method on trajectory data collected at Stockholm central station during the afternoon peak through manual annotation of video recordings [14]. The annotation was made by estimating the center of mass of each pedestrian as the point halfway between their feet when the feet were maximally separated or together. This annotation method almost completely removes the swaying problem encountered when tracking the heads of the pedestrians, but is slightly more labor intensive.
The observed area is approximately four by six meters, located in the middle of a wide passage, with dominating flows in the direction along the longer sides of the observed area. There are no fixed obstacles in, or directly adjacent to, the observed area.

Results
The results of the calibration in terms of the optimal values of the homogeneous parameters are given in table 1, together with corresponding values from two similar studies, and the resulting desired speed distribution is presented in figure 1. The estimated parameters do not deviate strongly from the results of the previous studies. The anticipation time estimated here is somewhat high compared to the other studies, and it seems to be compensated for by a lower value of the range scale of the social force. The mean of the desired speed distribution is 1.25 m/s, its support is between 0.46 m/s and 3.0 m/s, and its standard deviation is 0.29 m/s.

Analysis
Since there is no proof that the proposed method converges to the global solution of problem (5), some analysis of the solution is provided here. In figure 2 the solution progress of the genetic algorithm used to solve problem (8) is displayed for the first, second, fifth and last iterations of the proposed method. As can be seen, 50 generations seem sufficient for the genetic algorithm, and the progress after the second iteration of the procedure is negligible. To the right in figure 2 the sensitivity of the objective functions to perturbations in each of the homogeneous parameters around the best found solution is presented. The objective is clearly sensitive to perturbations in all parameters except the anticipation time. That the data contain limited information on the anticipation time is expected due to the size of the observed area. If two agents are walking toward each other, each at a speed of say 1.3 m/s, and the anticipation time is 2 s, the agents will start reacting to each other at a distance approximately equal to the length of the observed area.
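As a rough check of the last argument, assuming that the distance at which two agents start reacting is approximately their closing speed multiplied by the anticipation time:

$$d \approx (v_1 + v_2)\,\Delta t = (1.3 + 1.3)\ \mathrm{m/s} \times 2\ \mathrm{s} = 5.2\ \mathrm{m}.$$

This is close to the approximately six meter length of the observed area, so a head-on interaction with this anticipation time would begin near the border of the area and only a small part of it would influence the observed trajectories.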
The clear increase of the objective in the direction of each homogeneous parameter is an indication that the procedure may indeed have found the optimum, but it is far from certain. Also, even though the increase is clear it is rather small, indicating either that the data contain only limited information on the parameters, or that the parameters really should be heterogeneous over the population. In the case of strong heterogeneity over the population, the found solution would be a compromise, and a shift in either direction would improve the fit for some trajectories and worsen it for others, thus giving a relatively flat objective.
To investigate the interdependence between the desired speed and the homogeneous parameters, the relative change in the solutions of a sample of the problems (9) in response to changes in each of the homogeneous parameters from the best found solution is presented in figure 3. Note that the presented relative change in the desired speed is the absolute relative change. This shows that the estimated desired speed is indeed dependent on the values of the homogeneous parameters, at least for some of the trajectories, even though this dependence is rather weak. The dependence is clearly stronger for the relaxation time and the range scale of the social force than for the social force strength and the anticipation time. This seems reasonable, since an agent with a high value of the desired speed will tend to perform smaller evasive maneuvers than an agent with a low value of the desired speed, due to the stronger desired force for a given directional change due to an interaction with another agent. A similar effect is obtained by decreasing the relaxation time or the range scale of the force, so it is reasonable to expect an interdependence between the desired speed and these two parameters.

Discussion and conclusions
We conclude that the method is able to estimate a plausible desired speed distribution in slightly congested conditions, while further studies are needed to evaluate the accuracy and robustness of the estimation of the remaining, homogeneous, parameters. It is also worth noting that the proposed method relies heavily on the use of individual trajectories, and it is hard to see any version of the method that does not; the method thus has the drawbacks of any trajectory-based method. There is a risk that observed pedestrians close to the border of the observed area are affected by pedestrians outside the observed area. However, this risk is reduced by the removal of trajectories deviating too strongly after the solution of eq. 7. An important topic for future research is to verify the method against synthetic data and to test up to how high densities it is capable of estimating the desired speed distribution.