Modeling Trajectory-level Behaviors using Time Varying Pedestrian Movement Dynamics

We present a novel interactive multi-agent simulation algorithm to model pedestrian movement dynamics. We use statistical techniques to compute the movement patterns and motion dynamics from 2D trajectories extracted from crowd videos. Our formulation extracts the dynamic behavior features of real-world agents and uses them to learn movement characteristics on the fly. The learned behaviors are used to generate plausible trajectories of virtual agents as well as for long-term pedestrian trajectory prediction. Our approach can be integrated with any trajectory extraction method, including manual tracking, sensors, and online tracking methods. We highlight the benefits of our approach on many indoor and outdoor scenarios with noisy, sparsely sampled trajectories in terms of trajectory prediction and data-driven pedestrian simulation.


Introduction
The modeling of pedestrian movement dynamics has received considerable attention in multiple fields, including computer-aided design, urban planning, robotics, and evacuation planning. In many of these applications, the goal is to generate trajectories and behaviors of virtual pedestrians that are similar to those observed of humans in real-world environments. The most common approaches used to model pedestrian and crowd movement are based on agent-based models that treat individuals as autonomous agents who can perceive the environment to make independent decisions about their behavior or movement. Agent-based methods have been well studied in different fields for decades and various formulations have been proposed for global and local navigation. However, current approaches are unable to simulate the dynamic nature, variety, and subtle aspects of real-world pedestrian motions.
Advances in sensor (e.g., camera) technologies have made it possible to easily capture videos of pedestrian and crowd motion. Such videos are widely available on the Internet (e.g., YouTube). It is possible to use computer vision methods to extract trajectories of pedestrians from these videos. These trajectories correspond to the location of each pedestrian on the walking plane as a time-dependent function. There is considerable interest in utilizing these real-world trajectories to learn pedestrian behaviors and use them for different applications. In particular, there is a new class of algorithms, called data-driven pedestrian [1][2][3][4] or crowd simulation, in which such real-world trajectories are used for simulating the pedestrians in a synthetic environment.
However, current techniques available to extract trajectories from videos have many limitations and the use of such data-driven methods for pedestrian simulation is therefore restricted. The accuracy of video tracking methods varies with the number of pedestrians and crowd density as well as the video resolution and the illumination conditions. As a result, use of these methods is currently limited to isolated pedestrians or sparse crowds. Many behavior learning algorithms require a high number of training videos to learn the movement patterns offline, and typically extract a fixed set of parameters that are used as a global characterization of pedestrian behavior or trajectories; thus, they may not be suited to capturing the dynamic nature or time-varying behaviors of pedestrians that are observed in real-world scenes.
Main Results: In this paper, we present statistical algorithms to learn the characteristics of pedestrian movement from trajectories extracted from real videos. These characteristics are used to compute collision-free trajectories of virtual pedestrians whose movement patterns resemble those of pedestrians in the original video. Our approach is automatic and interactive and captures the dynamically changing movement behaviors of real pedestrians. We demonstrate its applications for many data-driven crowd simulations, where we can easily add hundreds of virtual pedestrians, generate dense crowds, and change the environment or the situation.
Our goal is to develop robust techniques that can account for noise in trajectory datasets and extract high-level characteristics of time-varying pedestrian movement dynamics (TVPMD). TVPMD is the descriptor of the movements of pedestrians at each point in time, as opposed to a descriptor averaged over a long sequence of inputs. Our techniques include automatic and interactive approaches to extract the movement patterns and motion dynamics of pedestrians. There is a large collection of crowd videos available on the Internet and our statistical algorithms make it easier to extract a large library of dynamic movement patterns for different real-world situations.
We present a fast method to learn the characteristics of pedestrian movement dynamics from 2D trajectories extracted from a single video on the fly. First, our formulation models the group of pedestrians or the crowd as a complex system with non-linear dynamics and computes the most likely state of each pedestrian from noisy trajectory data using Bayesian inference. We then compute the TVPMD, the high-level characteristics of the pedestrians, based on the estimated states. Our approach can perform all these computations in tens of milliseconds. The TVPMD characteristics consist of three components that correspond to movement patterns and motion dynamics: the entry point or starting position of each individual human in the environment, the movement flow used to estimate its preferred velocity or intermediate goal position, and the local collision avoidance technique. We show that these three components can generate pedestrian movements that are similar to those observed in the original videos and that they can also be used in slightly varying environments. Furthermore, we present algorithms to combine these characteristics with other agent-based models, which allows us to adapt their movement to a new environment or a new situation. We also use these movement dynamics to improve long-term pedestrian trajectory prediction (Fig. 1).
We have implemented our approach and evaluated the benefits of pedestrian movement dynamics on several indoor and outdoor scenarios. The original videos of these scenes have tens of real-world pedestrians, and we are able to reliably compute pedestrian movement dynamics at interactive rates on a desktop PC. Using our approach, we are able to demonstrate up to 12% improvement in long-term pedestrian prediction and an improvement of about 0.35 seconds in feature computation time for data-driven pedestrian simulation algorithms.
For the rest of the paper, we use the terms "pedestrian", "agent", and "real-world agent" interchangeably to indicate a real person in a video. We also use the terms "virtual agent", "virtual pedestrian", and "user-controlled agent" interchangeably to indicate a simulated person.
The rest of the paper is organized as follows. Section 2 provides an overview of related work in pedestrian movement dynamics, data-driven crowd simulation, and behavior learning. We introduce the terminology and present our interactive pedestrian dynamics learning algorithm in Section 3. In Section 4, we highlight the benefits of our approach for adaptive data-driven crowd simulation and pedestrian trajectory prediction. We describe our implementation and highlight the performance on different benchmarks in Section 5.

Related Work
In this section, we give a broad survey of related work, including multi-agent simulation, data-driven crowd movement dynamics, and behavior learning.

Pedestrian Trajectory Simulation
Some of the commonly used techniques for simulating crowd behaviors and trajectory computation are based on agent-based models, including rule-based methods, which use a set of behavioral rules to guide the behavior of each pedestrian [5,6]. Another group of algorithms includes force-based methods that model interactions among pedestrians using attraction or repulsion forces [7], and velocity-based [8,9] and vision-based approaches [10], which are useful for collision-free local navigation. Another set of algorithms is based on continuum techniques, which compute fields for pedestrians to follow based on continuum flows [11] or fluid models [12]. Many extensions and experiments have also been proposed to understand pedestrian flows [13][14][15] and density relationships [16,17].

Data-driven Crowd Movement Dynamics
Data-driven methods use real-world motion specifications or captured data to generate the trajectory or behavior of each pedestrian. At a broad level, prior work in data-driven methods can be classified into offline methods (which involve preprocessing) and interactive algorithms.
Offline Methods: There is a large body of work in computer graphics and animation that captures human motion data and performs motion synthesis based on Motion Patches [18] and extensions that can be used to generate a dense crowd with multiple interacting human-like characters [19]. These methods can also model close interactions between pedestrians. Other data-driven methods emphasize generating trajectory-level behaviors using video data or recorded trajectories. For example, data extracted using semi-automatic trackers are used to generate group behaviors [20]. The virtual scenes can be populated by copying and pasting small pieces of real-world crowd data [21] or by using efficient data structures to represent sequences of motion in a large database for motion retrieval [22]. A different class of data-driven algorithms uses real-world crowd data to learn or optimize the motion-model parameters for agent-based simulation algorithms. This class includes a density-based measure [23] and a similarity-based entropy metric [24] to learn or evaluate parameters from a given scenario; offline optimization methods that use real-world trajectory data to compute the best parameters for simulated motion models [25,26]; and a data-driven framework that analyzes the quality of and anomalies in crowd simulations by comparing them to given training data [27].
Interactive Methods: There is a large body of computer vision literature on real-time pedestrian tracking from videos, and these methods can be used to generate 2D agent trajectories for data-driven simulation [28,29]. However, current methods are limited to generating 2D trajectories and cannot handle any changes in the environment or simulate different trajectory behaviors from those observed in the videos. There is a large body of work on interactive editing of crowd trajectories and animations [30,31]. These methods can be directly used on extracted trajectories or on the full body motion of animated characters to generate plausible pedestrian movement simulations for different environments.

Pedestrian Prediction
Prior work in pedestrian prediction [32,33] makes simple assumptions on pedestrian movement, such as the use of constant velocity or constant acceleration motion models. In order to improve the accuracy and deal with medium-to-high density crowds, more accurate motion models and interaction rules have been used. Bruce et al. [34] and Gong et al. [35] predict pedestrians' motions by estimating their destinations. Liao et al. [36] obtain a Voronoi graph from the environment and predict a pedestrian's motion along the edges. Luber et al. [37] track pedestrians using a Kalman filter-based tracker along with Helbing's social force model. Mehran et al. [38] apply the social force model to detect people's abnormal behaviors from videos. Pellegrini et al. [39] use an energy function to build up a goal-directed short-term collision-avoidance motion model. Bera et al. [40] improve pedestrian prediction and tracking accuracy by using reciprocal velocity obstacles and hybrid motion models. Yamaguchi et al. [41] use an agent-based behavioral model called ATTR and learn additional social and personal properties from the behavioral priors, such as grouping information and destination information, to perform pedestrian tracking and prediction. Fulgenzi et al. [42] use a probabilistic velocity-obstacle approach, combined with the dynamic occupancy grid. This method assumes obstacles have constant linear velocity.

Video-Based Crowd Analysis
There is extensive work in computer vision, multimedia, and robotics that analyzes the behavior and movement patterns in crowd videos, as surveyed in [43,44], where the main objectives include human behavior understanding and recognition and crowd activity recognition for detecting abnormal behaviors [45,46]. Many of these methods use a large number of training videos to learn the patterns offline [47,48]. Other methods utilize motion models to learn crowd behaviors [49,50] or machine learning methods [51,52]. In contrast, our goal is to develop improved techniques for interactive data-driven crowd simulation.

Time-Varying Pedestrian Movement Dynamics
In this section, we present our interactive algorithm that learns time-varying pedestrian dynamics from real-world, 2D pedestrian trajectories. We assume that these trajectories are extracted from observations using standard tracking algorithms.

Pedestrian State
We first define specific terminology used in the paper. We use the term pedestrian to refer to independent individuals or agents in the crowd. We use the notion of state to specify the trajectory and behavior characteristics of each pedestrian. The components used to define a state govern the fidelity and realism of the resulting crowd simulation. Because the input to our algorithm consists of 2D position trajectories, our state vector consists of the information that describes the pedestrian's movements on a 2D plane. We use the symbol $\mathbf{x} \in \mathbb{R}^6$ to refer to a pedestrian's state:
$$\mathbf{x} = [\mathbf{p} \;\; \mathbf{v}^{c} \;\; \mathbf{v}^{pref}]^{T},$$
where $\mathbf{p}$ is the pedestrian's position, $\mathbf{v}^{c}$ is its current velocity, and $\mathbf{v}^{pref}$ is the preferred velocity on a 2D plane. The preferred velocity is the optimal velocity that a pedestrian would take to achieve its intermediate goal if there were no other pedestrians or obstacles in the scene. In practice, $\mathbf{v}^{pref}$ tends to be different from $\mathbf{v}^{c}$ for a given pedestrian. We use the symbol $S$ to denote the current state of the environment, which corresponds to the states of all other pedestrians and the current positions of the obstacles in the scene.
The state of the crowd, which consists of individual pedestrians, is the union of the states of all pedestrians, $X = \bigcup_i \mathbf{x}_i$, where subscript $i$ denotes the $i$-th pedestrian. Our state formulation does not include any full body or gesture information. Moreover, we do not explicitly model or capture pairwise interactions between pedestrians. However, the difference between $\mathbf{v}^{pref}$ and $\mathbf{v}^{c}$ provides partial information about the local interactions between a pedestrian and the rest of the environment.
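The state definition above can be pictured as a small data structure. The following is only an illustrative sketch: the class name `PedestrianState`, the method names, and the sample values are assumptions made for this example and do not come from our implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec2 = Tuple[float, float]


@dataclass
class PedestrianState:
    """State x = [p, v_c, v_pref] of one pedestrian on the 2D walking plane."""
    p: Vec2        # position
    v_c: Vec2      # current velocity
    v_pref: Vec2   # preferred velocity

    def as_vector(self) -> List[float]:
        # Flatten to the six-dimensional state vector.
        return [*self.p, *self.v_c, *self.v_pref]

    def deviation(self) -> Vec2:
        # v_pref - v_c: a partial cue about local interactions.
        return (self.v_pref[0] - self.v_c[0], self.v_pref[1] - self.v_c[1])


# The crowd state X is the collection of per-pedestrian states
# (hypothetical sample values).
crowd = [
    PedestrianState(p=(0.0, 0.0), v_c=(1.0, 0.25), v_pref=(1.25, 0.0)),
    PedestrianState(p=(2.0, 1.0), v_c=(-1.0, 0.0), v_pref=(-1.0, 0.0)),
]
```

A zero deviation, as for the second pedestrian, would suggest that the pedestrian is currently moving as intended, without being deflected by neighbors or obstacles.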

Pedestrian Movement Dynamics
Pedestrian dynamics consist of those factors that govern pedestrians' trajectory behaviors, i.e., the factors that change the state of the pedestrians. We model pedestrian dynamics using three components: starting position or entry point, movement flow, and the local collision-free navigation rule. Formally, we represent the characteristics of these dynamics for each pedestrian with a vector-valued function, $f()$, with an initial value determined by the function, $E()$:
$$\mathbf{x}_{t+1} = f(t, \mathbf{x}_t) = [\,P(\mathbf{v}^{c}) \;\; I(\mathbf{x}_t, S) \;\; G(t, \mathbf{x}_t, S)\,]^{T}, \qquad \mathbf{p}_{t_0} = E(t_0).$$
For each pedestrian in the crowd, the function $G : \mathbb{R} \times \mathbb{R}^6 \times \mathcal{S} \to \mathbb{R}^2$ maps time $t$, the current state of the pedestrian $\mathbf{x} \in X$, and the current state of the simulation environment $S \in \mathcal{S}$ to a preferred velocity $\mathbf{v}^{pref}$. Function $I : \mathbb{R}^6 \times \mathcal{S} \to \mathbb{R}^2$ computes the interactions with other pedestrians and obstacles in the environment and is used to compute the collision-free current velocity $\mathbf{v}^{c}$ for local navigation. The function $P : \mathbb{R}^2 \to \mathbb{R}^2$ computes the position, given $\mathbf{v}^{c}$; $E : \mathbb{R} \to \mathbb{R}^2$ computes the initial position for time $t_0$, which is the time at which a particular pedestrian enters the environment. The three components of the pedestrian dynamics (entry point, movement flow, and local collision-free navigation) can be mapped to the functions $E()$, $G()$, and $I()$, respectively. We learn $E()$ and $G()$ from the 2D trajectory data. The local collision-free navigation rule $I()$ can be chosen by the data-driven algorithm. We refer to our interactive method as learning time-varying pedestrian movement dynamics (TVPMD).
Fig. 1 gives an overview of our approach, including computation of TVPMD and using that computation for crowd simulation. The input to our method consists of the trajectories extracted from a sensor. The trajectories are time-series observations of the positions of each pedestrian in a 2D plane. The output TVPMD consists of entry point distributions and movement flows learned from the trajectory data. Notably, our approach is interactive and operates based on current and recent states; in other words, it does not require future knowledge of an entire data sequence and does not have to re-perform offline training steps whenever new real-world pedestrian trajectory data is acquired or generated. As a result, our approach can effectively capture local and/or individual variations and the characteristics of time-varying trajectory behaviors. We use TVPMD for data-driven crowd simulation in Section 4.
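To make the roles of $E()$, $G()$, $I()$, and $P()$ concrete, the sketch below composes them into one update of a single pedestrian. It is a toy illustration under stated assumptions: $G$ is replaced by a hypothetical goal-seeking rule, $I$ by a placeholder that returns the preferred velocity unchanged (no neighbors), $E$ by a fixed entry point, and $P$ by Euler integration that also takes the previous position and the time step as arguments. The learned versions of $E()$ and $G()$ are described in the following subsections.

```python
import math


def E(t0):
    # Hypothetical entry point: a fixed position instead of a learned distribution.
    return (0.0, 0.0)


def G(t, state, env):
    # Hypothetical movement flow: head toward env["goal"] at env["speed"].
    (px, py), _, _ = state
    dx, dy = env["goal"][0] - px, env["goal"][1] - py
    dist = math.hypot(dx, dy)
    if dist < 1e-9:
        return (0.0, 0.0)
    return (env["speed"] * dx / dist, env["speed"] * dy / dist)


def I(state, env):
    # Placeholder local navigation: with no neighbors or obstacles,
    # the collision-free velocity equals the preferred velocity.
    _, _, v_pref = state
    return v_pref


def P(p, v_c, dt):
    # Position update by Euler integration of the current velocity.
    return (p[0] + v_c[0] * dt, p[1] + v_c[1] * dt)


def f(t, state, env, dt):
    # One step of the dynamics: preferred velocity, then collision-free
    # velocity, then the new position.
    p, v_c, _ = state
    v_pref = G(t, state, env)
    v_c = I((p, v_c, v_pref), env)
    return (P(p, v_c, dt), v_c, v_pref)


env = {"goal": (5.0, 0.0), "speed": 1.25}
dt = 0.25
state = (E(0.0), (0.0, 0.0), (0.0, 0.0))
for k in range(8):
    state = f(k * dt, state, env, dt)
```

After eight steps of 0.25 s at 1.25 m/s, the toy pedestrian has covered half of the 5 m to its goal.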

State Estimation
The trajectories extracted from a real-world video tend to be noisy and may have incomplete tracks [53]; thus, we use the Bayesian-inference technique to compensate for any errors and to compute the state of each pedestrian.
At each time-step, the observation of a pedestrian computed by a tracking algorithm is the position of each pedestrian on a 2D plane, denoted as $\mathbf{z}_t \in \mathbb{R}^2$. The observation function $h()$ provides the observation $\mathbf{z}_t$ of each pedestrian's true state $\mathbf{x}_t$ with sensor error $\mathbf{r} \in \mathbb{R}^2$, which is assumed to follow a zero-mean Gaussian distribution with covariance $\Sigma_r$:
$$\mathbf{z}_t = h(\mathbf{x}_t) + \mathbf{r}, \qquad \mathbf{r} \sim \mathcal{N}(0, \Sigma_r).$$
$h()$ can be replaced with any tracking algorithm or synthetic algorithm that provides the trajectory of each pedestrian.
The state-transition model $f()$ is an approximation of true real-world crowd dynamics with prediction error $\mathbf{q} \in \mathbb{R}^6$, which is represented as a zero-mean Gaussian distribution with covariance $\Sigma_q$:
$$\mathbf{x}_{t+1} = f(\mathbf{x}_t) + \mathbf{q}, \qquad \mathbf{q} \sim \mathcal{N}(0, \Sigma_q).$$
We can use any local navigation algorithm or motion model for function $f()$, such as social forces, Boids, or velocity obstacles. The motion model computes the local collision-free paths for the pedestrians in the scene.
We use an Ensemble Kalman Filter (EnKF) and Expectation Maximization (EM) with the observation model $h()$ and the state transition model $f()$ to estimate the most likely state $\mathbf{x}$ of each pedestrian. EnKF uses an ensemble of discrete samples assumed to follow a Gaussian distribution to represent the distribution of the potential states. EnKF is able to provide state estimation for a non-linear state-transition model. During the prediction step, EnKF predicts the next state based on the transition model and $\Sigma_q$. When a new observation is available, $\Sigma_q$ is updated based on the difference between the observation and the prediction, which is used to compute the state of the pedestrian. In addition, we run the EM step to compute the covariance matrix $\Sigma_q$ to maximize the likelihood of the state estimation.
EM for state estimation: Expectation Maximization (EM) is an iterative process that maximizes the likelihood of the latent variable [54]. The EM process repeats the E step, which computes the expected value for $\Sigma_q$ (in our case by using EnKF), and the M step, which computes the distribution with the value of $\Sigma_q$ computed during the previous E step. By using EM, the likelihood of the state estimation being accurate with respect to the given observation data can be maximized. It is performed by maximizing the expected log-likelihood ($ll$) of covariance matrix $\Sigma_q$, which, for the zero-mean Gaussian prediction error defined above and up to an additive constant, is
$$ll(\Sigma_q) = \mathbb{E}\left[-\frac{1}{2}\sum_{t}\left(\log|\Sigma_q| + (\mathbf{x}_{t+1} - f(\mathbf{x}_t))^{T}\,\Sigma_q^{-1}\,(\mathbf{x}_{t+1} - f(\mathbf{x}_t))\right)\right].$$
We can estimate this value by finding the average error for each sample in the ensemble at each timestep for each agent.
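The predict and update cycle of the EnKF can be sketched in a few lines. The code below is a simplified illustration and not our implementation: it assumes that $h()$ simply returns the position (the first two state entries), uses fixed isotropic noise levels `q_std` and `r_std` in place of full covariance matrices, omits the EM update of $\Sigma_q$, and uses a hypothetical toy `motion_model` in which the current velocity follows the preferred velocity.

```python
import random


def enkf_step(ensemble, z, f, q_std, r_std, rng):
    """One EnKF predict/update cycle; z observes the first two state entries."""
    # Predict: push every sample through f() and add process noise q.
    pred = [[v + rng.gauss(0.0, q_std) for v in f(x)] for x in ensemble]
    N, n = len(pred), len(pred[0])
    mean = [sum(x[k] for x in pred) / N for k in range(n)]
    dev = [[x[k] - mean[k] for k in range(n)] for x in pred]
    # Sample covariance between every state entry and the observed position.
    PHt = [[sum(d[k] * d[j] for d in dev) / (N - 1) for j in range(2)]
           for k in range(n)]
    # Innovation covariance H P H^T + Sigma_r (2x2), inverted in closed form.
    s00, s01 = PHt[0][0] + r_std ** 2, PHt[0][1]
    s10, s11 = PHt[1][0], PHt[1][1] + r_std ** 2
    det = s00 * s11 - s01 * s10
    Sinv = [[s11 / det, -s01 / det], [-s10 / det, s00 / det]]
    K = [[PHt[k][0] * Sinv[0][j] + PHt[k][1] * Sinv[1][j] for j in range(2)]
         for k in range(n)]
    # Update: correct each sample with a perturbed copy of the observation.
    updated = []
    for x in pred:
        innov = [z[j] + rng.gauss(0.0, r_std) - x[j] for j in range(2)]
        updated.append([x[k] + K[k][0] * innov[0] + K[k][1] * innov[1]
                        for k in range(n)])
    return updated


DT = 0.1


def motion_model(x):
    # Toy f(): no neighbors, so the current velocity follows v_pref.
    px, py, vcx, vcy, vpx, vpy = x
    return [px + vcx * DT, py + vcy * DT, vpx, vpy, vpx, vpy]


rng = random.Random(0)
ensemble = [[rng.gauss(0.0, 0.5) for _ in range(6)] for _ in range(50)]
truth = [0.0, 0.0]
for _ in range(50):
    truth = [truth[0] + 1.0 * DT, truth[1]]          # walks along +x at 1 m/s
    z = [truth[0] + rng.gauss(0.0, 0.1), truth[1] + rng.gauss(0.0, 0.1)]
    ensemble = enkf_step(ensemble, z, motion_model, 0.05, 0.1, rng)
estimate = [sum(x[k] for x in ensemble) / len(ensemble) for k in range(6)]
```

The ensemble mean `estimate` plays the role of the most likely state; its last two entries are the estimated preferred velocity, which is never observed directly and is inferred through the ensemble covariance.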

Dynamic Movement Flow Learning
We compute the movement features, which are used as descriptors for local pedestrian movement. These movement features are grouped together and form a cluster of a movement flow.
Movement Feature: The movement features describe the characteristics of the trajectory behavior at a certain position at time frame $t$. The characteristics include the movement of the agent during the past $w$ frames, which we call the time window, and the intended direction of the movement (preferred velocity) at this position.
The movement feature vector is represented as a six-dimensional vector:
$$\mathbf{b} = [\mathbf{p} \;\; \mathbf{v}^{avg} \;\; \mathbf{v}^{pref}]^{T},$$
where $\mathbf{p}$, $\mathbf{v}^{avg}$, and $\mathbf{v}^{pref}$ are each two-dimensional vectors representing the current position, average velocity during the past $w$ frames, and estimated preferred velocity computed as part of state estimation, respectively. $\mathbf{v}^{avg}$ can be computed from $(\mathbf{p}_t - \mathbf{p}_{t - w \cdot dt}) / (w \cdot dt)$, where $dt$ is the time-step.
The duration of the time window, $w$, can be set based on the characteristics of a scene. Small time windows are good at capturing details in dynamically changing scenes with many rapid velocity changes, which are caused by some pedestrians moving quickly. Larger time windows, which tend to smooth out abrupt changes in motion, are more suitable for scenes that have little change in pedestrians' movement. For our results, we used 0.5 to 1 second of frames to set the value of $w$.
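As a minimal sketch of the feature computation, the function below builds the six-dimensional movement feature from a list of tracked positions. The function name, the argument layout, and the sample track are assumptions made for this example; the preferred velocity is taken as given from state estimation.

```python
def movement_feature(track, v_pref, w, dt):
    """Return b = [p, v_avg, v_pref] for the latest frame of one track.

    track  : list of (x, y) positions, one per frame, newest last
    v_pref : preferred velocity from state estimation
    w      : time window in frames
    dt     : time-step in seconds per frame
    """
    if len(track) <= w:
        raise ValueError("need at least w + 1 positions")
    p_t, p_old = track[-1], track[-1 - w]
    # Average velocity over the past w frames.
    v_avg = ((p_t[0] - p_old[0]) / (w * dt), (p_t[1] - p_old[1]) / (w * dt))
    return (p_t[0], p_t[1], v_avg[0], v_avg[1], v_pref[0], v_pref[1])


# Hypothetical track: 0.5 m per frame along +x at 4 frames per second,
# with a one-second time window (w = 4).
track = [(0.5 * k, 0.0) for k in range(6)]
b = movement_feature(track, v_pref=(2.0, 0.0), w=4, dt=0.25)
```

Here the pedestrian covers 2 m during the one-second window, so the average velocity is 2 m/s along the x axis.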

Movement Flow Clustering
At every $w$ steps, we compute new behavior features for each agent in the scene using Eq. 6. We group similar features (average velocities and preferred velocities) and find the $K$ most common behavior patterns, which we call movement flow clusters. We use recently observed behavior features to learn the time-varying movement flow.
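We use the k-means data clustering algorithm to classify these features into $K$ movement flow clusters. $K$ and $N_f$ are user-defined values that represent the total number of the clusters and the total number of collected behavior features, respectively, and $K \le N_f$. A set of movement-flow clusters $B = \{B_1, B_2, ..., B_K\}$ is computed as follows:
$$B = \operatorname*{argmin}_{B} \sum_{k=1}^{K} \sum_{\mathbf{b}_i \in B_k} dist(\mathbf{b}_i, \mu_k),$$
where $\mathbf{b}_i$ is a movement feature vector, $\mu_k$ is the centroid of each flow cluster, and $dist(\mathbf{b}_i, \mu_k)$ is a distance measure between the arguments. K-means clustering is an iterative algorithm that first assigns the cluster membership (i.e., which cluster the points belong to) for each data point, which is the dynamics feature $\mathbf{b}_i$:
$$C_i = \operatorname*{argmin}_{k} \; dist(\mathbf{b}_i, \mu_k).$$
It then updates the centroid of each cluster,
$$\mu_k = \frac{1}{|B_k|} \sum_{\mathbf{b}_i \in B_k} \mathbf{b}_i, \qquad B_k = \{\mathbf{b}_i \mid C_i = k\},$$
and repeats both steps until there is no change in $\mu_k$. In our case, the distance between two feature vectors is computed as
$$dist(\mathbf{b}_i, \mathbf{b}_j) = c_1 \|\mathbf{p}_i - \mathbf{p}_j\| + c_2 \|(\mathbf{p}_i - \mathbf{v}^{avg}_i\, w\, dt) - (\mathbf{p}_j - \mathbf{v}^{avg}_j\, w\, dt)\| + c_3 \|(\mathbf{p}_i + \mathbf{v}^{pref}_i\, w\, dt) - (\mathbf{p}_j + \mathbf{v}^{pref}_j\, w\, dt)\|,$$
which corresponds to the weighted sum of the distances among three points: current positions, previous positions, and estimated future positions (which are extrapolated using $\mathbf{v}^{pref}$), with $c_1$, $c_2$, and $c_3$ as the weight values. Comparing the distance between the positions rather than mixing the points and the vectors eliminates the need to normalize or standardize the data. Each movement-flow cluster contains adjacent features that have similar average velocities and preferred velocities (see Fig. 2 (c)).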
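The clustering step can be sketched as follows. This is an illustrative version under several assumptions that the text does not fix: the weights default to $c_1 = c_2 = c_3 = 1$, previous and future positions are obtained by moving one second (`horizon`) backward along the average velocity and forward along the preferred velocity, the centroids are initialized by a farthest-point rule, and empty clusters keep their previous centroid.

```python
import math


def dist(bi, bj, c=(1.0, 1.0, 1.0), horizon=1.0):
    """Weighted sum of distances between current, previous, and future positions."""
    def points(b):
        px, py, ax, ay, fx, fy = b
        return ((px, py),                                # current position
                (px - ax * horizon, py - ay * horizon),  # previous position
                (px + fx * horizon, py + fy * horizon))  # extrapolated position
    return sum(ck * math.dist(u, v)
               for ck, u, v in zip(c, points(bi), points(bj)))


def kmeans_flows(features, K, max_iter=100):
    # Farthest-point initialization (an assumption of this sketch).
    centroids = [features[0]]
    while len(centroids) < K:
        centroids.append(max(
            features, key=lambda b: min(dist(b, m) for m in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each feature joins its closest centroid.
        new_labels = [min(range(K), key=lambda k: dist(b, centroids[k]))
                      for b in features]
        if new_labels == labels:
            break                                        # memberships are stable
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(K):
            members = [b for b, l in zip(features, labels) if l == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


# Two hypothetical flows: one walking in +x near the origin,
# one walking in -x near (10, 10).
features = [(0.1 * i, 0.0, 1.0, 0.0, 1.0, 0.0) for i in range(5)]
features += [(10.0, 10.0 + 0.1 * i, -1.0, 0.0, -1.0, 0.0) for i in range(5)]
labels, centroids = kmeans_flows(features, K=2)
```

Because `dist` compares positions only, two features at the same location but with opposite walking directions still end up far apart, which is what separates opposing flows that share a corridor.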

Entry-Points Learning
Entry points are a component of pedestrian dynamics we want to learn to estimate when real pedestrians enter the scene. These starting positions and timings for each agent are very important and govern their overall trajectory. We use a multivariate Gaussian mixture model to learn the time-varying distribution of entry points, which will be used as the initial position $\mathbf{x}_0$ for a newly added pedestrian in a data-driven crowd simulation. We define $E()$ as the function that provides a position sampled from the learned distributions. For a non-spherical distribution, a Gaussian distribution is preferred; the distribution of entry points, which are scattered near the scene's boundary and often correspond to long elliptical regions, is frequently non-spherical (see Fig. 2 (b)).
We assume that the distribution of entry points, $e$, from which the function $E()$ samples, is a mixture of $J$ components and that each of the components is a multivariate Gaussian distribution of a two-dimensional random variable, $\mathbf{p}$, with a set of parameters $\Theta = \{\alpha_1, ..., \alpha_J, \theta_1, ..., \theta_J\}$:
$$e(\mathbf{p} \mid \Theta) = \sum_{j=1}^{J} \alpha_j\, e_j(\mathbf{p} \mid \theta_j).$$
Each component $e_j$ is a Gaussian distribution given by the parameters $\theta_j = (\mu_j, \Sigma_j)$, where $\mu_j$ is the mean of the component $j$ and $\Sigma_j$ is a $2 \times 2$ covariance matrix. $\alpha_j$ is a mixture weight, which is the probability that a point $\mathbf{p}$ belongs to the component $j$. $\alpha_j \in [0, 1]$ for all $j$, and the $\alpha_j$ are constrained to sum to 1 ($\sum_{j=1}^{J} \alpha_j = 1$). From an initial guess of the parameters $\theta_j$, we perform EM to learn these parameters $\theta_j = (\mu_j, \Sigma_j)$ from the given entry points collected from the real pedestrian trajectories. The entry point distribution is updated whenever we have a new observation of a pedestrian entering near the boundary of the scene (i.e., the starting position of a trajectory). We use only the recent $N_e$ observations of entry positions from trajectories and discard old observations. A large value for $N_e$ can capture the global distribution of entry points, whereas a smaller value for $N_e$ can better capture the dynamic changes of the distribution. Although we update the model frequently, we can exploit the locality in distributions because the new distribution is evolving from the previous distribution. We use the previous parameters and choose cluster $j$, which satisfies $\operatorname{argmin}_j \|\mathbf{p} - \mu_j\|$, as our initial guess for the new distributions.
EM for Gaussian Mixture Model (GMM): The E step computes the membership weights of each observed point for all components, and the M step uses the updated membership weights and data to update the parameters $\alpha_j$ and $\theta_j$. E and M steps are iteratively processed until the log-likelihood ($ll(\Theta)$) of the mixture model converges:
$$ll(\Theta) = \sum_{l=1}^{L} \log\left(\sum_{j=1}^{J} \alpha_j\, e_j(\mathbf{z}_0^{l} \mid \theta_j)\right),$$
where $L$ is the number of observation points, which corresponds to a set of observed entry points $Z_0 = \{\mathbf{z}_0^{l} \mid l = 1, ..., L\}$.
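The EM iteration for a two-dimensional mixture can be written compactly. The sketch below is a generic textbook EM for a GMM with full $2 \times 2$ covariances, not our exact implementation; the small regularizer `reg`, the convergence tolerance, and the synthetic entry regions (two elongated regions near hypothetical scene boundaries) are assumptions of this example.

```python
import math
import random


def gauss2(p, mu, S):
    # Density of a 2D Gaussian with mean mu and covariance S = [[a, b], [b, c]].
    dx, dy = p[0] - mu[0], p[1] - mu[1]
    a, b, c = S[0][0], S[0][1], S[1][1]
    det = a * c - b * b
    maha = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * maha) / (2.0 * math.pi * math.sqrt(det))


def em_gmm(points, alphas, mus, sigmas, max_iter=200, tol=1e-6, reg=1e-6):
    """Run EM from the given initial guess; returns parameters and iterations."""
    alphas, mus, sigmas = list(alphas), list(mus), list(sigmas)
    J, L = len(alphas), len(points)
    prev_ll = None
    for it in range(1, max_iter + 1):
        # E step: membership weight of every point for every component.
        resp, ll = [], 0.0
        for p in points:
            w = [alphas[j] * gauss2(p, mus[j], sigmas[j]) for j in range(J)]
            s = max(sum(w), 1e-300)
            ll += math.log(s)
            resp.append([wj / s for wj in w])
        if prev_ll is not None and abs(ll - prev_ll) < tol:
            break                                # log-likelihood has converged
        prev_ll = ll
        # M step: re-estimate weight, mean, and covariance of each component.
        for j in range(J):
            nj = max(sum(r[j] for r in resp), 1e-12)
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            sxx = sum(r[j] * (p[0] - mx) ** 2 for r, p in zip(resp, points)) / nj
            syy = sum(r[j] * (p[1] - my) ** 2 for r, p in zip(resp, points)) / nj
            sxy = sum(r[j] * (p[0] - mx) * (p[1] - my)
                      for r, p in zip(resp, points)) / nj
            alphas[j] = nj / L
            mus[j] = (mx, my)
            sigmas[j] = [[sxx + reg, sxy], [sxy, syy + reg]]
    return alphas, mus, sigmas, it


# Synthetic entry points: a tall region near x = 0 and a wide region near y = 0.
rng = random.Random(1)
points = [(rng.gauss(0.0, 0.2), rng.gauss(5.0, 1.0)) for _ in range(150)]
points += [(rng.gauss(5.0, 1.0), rng.gauss(0.0, 0.2)) for _ in range(50)]
identity = [[1.0, 0.0], [0.0, 1.0]]
alphas, mus, sigmas, n_iter = em_gmm(
    points, [0.5, 0.5], [(1.0, 4.0), (4.0, 1.0)], [identity, identity])
```

Sampling an entry position for $E()$ then amounts to picking a component with probability $\alpha_j$ and drawing from its Gaussian.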

GMM for Entry Point Learning
In Sec. 3, we discussed the time window $w$ used for TVPMD learning. We can leverage the range of local prediction based on the value of the time window $w$. As $w$ gets larger, the estimation captures more global characteristics of the data. In this section, we present results from an experiment conducted with synthetic data to highlight the comparisons between local and global estimation.
In our experiment, synthetic data is generated from two multivariate Gaussian distributions, $g_1$ and $g_2$, which are mixed with weights $\eta$ and $1 - \eta$, respectively. We sample 20 points for each frame (10 total frames) while decreasing the value of $\eta$ gradually from 0.8 to 0.2. The sampled points are shown in Fig. 3.
We use GMM+EM to learn the parameters $\mu_j$, $\Sigma_j$, and $\eta$. We compare three cases. The first case learns a mixture model using all the points. This model may best describe the global distribution observed during all the frames. The second case learns a mixture model using all past data. For example, at frame $t$, the model takes into account all points generated from frame 1 to frame $t$. The final case uses time window $w = 3$. In other words, the mixture model is learned from the local data sampled from the three most recent frames. We refer to these models as global, accumulative, and local methods, respectively. Fig. 4 shows the comparisons between these methods. The figure shows the approximated $\eta$ and $1 - \eta$ for $g_1$ and $g_2$, respectively. As shown in the figure, the global model does not perform as well as the accumulative or local models when data changes over time. The accumulative model gradually converges to the global result. The local model generates a noisy approximation due to a smaller number of samples, but generally gives a good approximation of the ground truth data.
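The flavor of this comparison can be reproduced with a much simplified sketch. The code below is not the experiment reported in Fig. 4: it assumes two unit-covariance components with known, well-separated means (`MU1`, `MU2`, hypothetical values) and runs EM on the mixing weight $\eta$ only, whereas the experiment above also learns $\mu_j$ and $\Sigma_j$.

```python
import math
import random

MU1, MU2 = (0.0, 0.0), (5.0, 5.0)   # assumed means of g1 and g2 (unit covariance)


def density(p, mu):
    return math.exp(-0.5 * ((p[0] - mu[0]) ** 2 + (p[1] - mu[1]) ** 2)) / (2.0 * math.pi)


def estimate_eta(points, eta=0.5, iters=50):
    # EM restricted to the mixing weight of g1.
    for _ in range(iters):
        resp = []
        for p in points:
            a, b = eta * density(p, MU1), (1.0 - eta) * density(p, MU2)
            resp.append(a / (a + b) if a + b > 0.0 else 0.5)
        eta = sum(resp) / len(resp)
    return eta


def flatten(frames):
    return [p for frame in frames for p in frame]


# 10 frames of 20 points each; eta decreases linearly from 0.8 to 0.2.
rng = random.Random(0)
frames, true_eta = [], []
for t in range(10):
    eta_t = 0.8 - 0.6 * t / 9
    true_eta.append(eta_t)
    frame = []
    for _ in range(20):
        mu = MU1 if rng.random() < eta_t else MU2
        frame.append((rng.gauss(mu[0], 1.0), rng.gauss(mu[1], 1.0)))
    frames.append(frame)

w = 3
global_eta = estimate_eta(flatten(frames))
accumulative = [estimate_eta(flatten(frames[:t + 1])) for t in range(10)]
local = [estimate_eta(flatten(frames[max(0, t - w + 1):t + 1])) for t in range(10)]
```

By construction, the accumulative estimate at the last frame coincides with the global one, while the local estimate follows the most recent frames only.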

Performance improvement:
In Sec. 3.5, we discussed how we use the previous parameters as an initial guess for updating the distributions. When we update the entry point estimation with new observations, we can use previously learned distributions (i.e., entry point distributions at frame $t - w$) as priors. For example, we can assign each new data point $\mathbf{x}$ to the closest cluster $j$: $\operatorname{argmin}_j \|\mathbf{x} - \mu_j\|$. Since entry points tend to have similar distributions and previously observed entry points are stored during the time window, having such prior information provides a very good initial estimate for the model and also improves the performance. In our experiment, the average number of EM iterations decreased by more than a factor of 3 when we used the prior instead of random initialization.
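The nearest-cluster assignment used to seed the update can be sketched as below. The function name `warm_start` and the sample points are hypothetical; the sketch only produces initial mixture weights and means, which would then be refined by EM, and it keeps the previous mean for a component that receives no new point.

```python
import math


def warm_start(new_points, prev_mus):
    """Seed a GMM update by assigning each point to the closest previous mean."""
    J = len(prev_mus)
    groups = [[] for _ in range(J)]
    for p in new_points:
        j = min(range(J), key=lambda k: math.dist(p, prev_mus[k]))
        groups[j].append(p)
    alphas, mus = [], []
    for j, g in enumerate(groups):
        alphas.append(len(g) / len(new_points))
        if g:
            mus.append((sum(p[0] for p in g) / len(g),
                        sum(p[1] for p in g) / len(g)))
        else:
            mus.append(prev_mus[j])      # no new point: keep the previous mean
    return alphas, mus


prev_mus = [(0.0, 5.0), (5.0, 0.0)]
new_points = [(0.5, 4.0), (0.0, 6.0), (1.0, 5.0), (5.0, 1.0)]
alphas0, mus0 = warm_start(new_points, prev_mus)
```

In this toy input, three of the four new entry points fall next to the first previous mean, so the initial guess already reflects the shifted proportions before EM starts.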

Applications
In this section, we use our TVPMD algorithm for data-driven crowd simulation and improved long-term pedestrian prediction.

Data-driven Crowd Simulation
The first part of this section gives details of our algorithm used to compute TVPMD. In the second part, we use TVPMD to compute the state of the virtual crowd based on local collision avoidance and situational trajectory adaption methods. For a detailed discussion of data-driven crowd simulation, we refer our readers to [55].

Asymmetric behavior
The TVPMD computation allows user interaction with the agents in the scene by adding obstacles during the simulation. Furthermore, our method can be extended to allow users to directly control any pedestrian in the scene. In this case, we need to model the asymmetric behavior for the user-controlled pedestrian. Asymmetric behavior modeling of agents has been studied and different modeling techniques have been proposed based on social forces and reciprocal velocity obstacles [56][57][58]. We use a similar approach for modeling collision avoidance between a user-controlled pedestrian and the rest of the virtual pedestrians in the scene. Currently, we use a velocity-based local navigation technique called ORCA [8] for local navigation. It computes the current velocity $\mathbf{v}^{c}$ from $\mathbf{v}^{pref}$ and the set of collision-free ORCA constraints. This algorithm assumes symmetric behavior, which means that all the pedestrians have equal responsibility for avoiding a collision. However, we impose 100% of the collision-avoidance responsibility on the user-controlled agent in the simulation when it has an impending collision with other pedestrians or obstacles.
In this case, the ORCA algorithm can be slightly modified to handle both symmetric and asymmetric behaviors. Each ORCA constraint is a linear constraint computed using velocity obstacles. Given two pedestrians, $A$ and $O$, we compute the minimum vector $\mathbf{u}$ of the change in relative velocity needed to avoid collision. Formally, the ORCA constraint on $A$'s velocity induced by $O$ is given as
$$ORCA_{A|O} = \{\mathbf{v} \mid (\mathbf{v} - (\mathbf{v}_A + d\,\mathbf{u})) \cdot \hat{\mathbf{u}} \ge 0\},$$
where $\mathbf{v}_A$ is $A$'s current velocity, $\hat{\mathbf{u}}$ is the normalized vector $\mathbf{u}$, and $d$ is a constant value that determines the minimum change of the velocity in the direction of $\hat{\mathbf{u}}$. Normally, when we deal with only virtual pedestrians in a scene, we set $d = 1/2$. This implies that each pedestrian shares the responsibility for avoiding collisions equally. When it comes to collision avoidance with a user-controlled pedestrian, we set $d = 1$, which means the user-controlled pedestrian $A$ is solely responsible for collision-free navigation with respect to the rest of the environment. As a result, the user-controlled pedestrian $A$ moves further away to avoid collision. If $A$ has multiple neighboring pedestrians, each neighboring pedestrian will result in a separate ORCA constraint while computing the new velocity for $A$. Local navigation is performed by computing the new current velocity for $A$ ($\mathbf{v}^{c}$) that is closest to its preferred velocity ($\mathbf{v}^{pref}$), while satisfying all ORCA constraints:
$$\mathbf{v}^{c} = \operatorname*{argmin}_{\mathbf{v} \in \bigcap_{O} ORCA_{A|O}} \|\mathbf{v} - \mathbf{v}^{pref}\|,$$
where $\mathbf{v}^{c}$ and $\mathbf{v}^{pref}$ are the new velocity and preferred velocity, respectively.
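The effect of the responsibility factor $d$ can be seen in a single-constraint sketch. The code below is illustrative only: it assumes that the vector $\mathbf{u}$ has already been computed from the velocity obstacle, and it handles one half-plane by direct projection, whereas multiple neighbors require solving a small linear program over all constraints, as done in the ORCA library [8]. Function names and sample values are hypothetical.

```python
import math


def orca_halfplane(v_a, u, d):
    """Half-plane {v : (v - (v_a + d*u)) . u_hat >= 0}, as (point, normal)."""
    norm = math.hypot(u[0], u[1])
    u_hat = (u[0] / norm, u[1] / norm)
    point = (v_a[0] + d * u[0], v_a[1] + d * u[1])
    return point, u_hat


def closest_velocity(v_pref, point, u_hat):
    """Velocity closest to v_pref that satisfies a single ORCA constraint."""
    s = (v_pref[0] - point[0]) * u_hat[0] + (v_pref[1] - point[1]) * u_hat[1]
    if s >= 0.0:
        return v_pref                    # preferred velocity is already feasible
    # Otherwise project v_pref onto the boundary line of the half-plane.
    return (v_pref[0] - s * u_hat[0], v_pref[1] - s * u_hat[1])


v_a = (1.0, 0.0)        # A's current velocity
v_pref = (1.0, 0.0)     # A would like to keep walking straight
u = (0.0, 0.4)          # assumed minimal change of relative velocity

shared = closest_velocity(v_pref, *orca_halfplane(v_a, u, d=0.5))
sole = closest_velocity(v_pref, *orca_halfplane(v_a, u, d=1.0))
```

With $d = 1/2$, $A$ deviates by half of $\mathbf{u}$ and relies on the other pedestrian for the rest; with $d = 1$, $A$ takes the full deviation, which is the behavior imposed on the user-controlled pedestrian.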

Long-term Pedestrian Prediction
A key aspect in any real-time prediction algorithm is estimating the motion of the pedestrian in a crowd. In this section, we give an overview of a real-time algorithm that learns movement flows from real-world 2D pedestrian trajectories that are extracted from video.
Our approach involves no pre-computation or pre-learning, and can be combined with any real-time pedestrian tracker. For a detailed discussion of the pedestrian prediction approach and further results, we refer our readers to [59]. Fig. 6 gives an overview of our approach, including computation of movement flows and their use in pedestrian prediction. The input to our method consists of a live or streaming crowd video. We extract the initial set of trajectories using an online particle-filter based pedestrian tracker. These trajectories are time-series observations of the positions of each pedestrian in the crowd. The various components used in our algorithm are shown in the figure. The output is the predicted state of each agent that is based on learning the local and global pedestrian motion patterns (Figure 7).

Results
In this section, we describe the implementation of our method and highlight its performance on different scenarios. Our system runs at interactive rates on a desktop machine with a 3.4 GHz Intel i7 processor and 8 GB RAM. For state estimation, we use velocity-based reasoning as the state transition model, f(). For collision-avoidance computation, we use a publicly available library [8], and we use different real pedestrian tracking datasets corresponding to indoor and outdoor environments as the input for the TVPMD computation algorithm. These datasets are generated using manual tracking, an online multiple-person tracker, a KLT tracker, synthetic data, and 3D range sensor tracking [23,51,60]. Tab. 2 presents more details on these datasets along with the number of tracked pedestrians and the number of virtual pedestrians in the data-driven simulation.
Our algorithms compute collision-free trajectories for the virtual pedestrians.
The TVPMD is able to capture the movement patterns and motion dynamics from the extracted trajectories. We have demonstrated the benefit of our pedestrian dynamics learning algorithm on several challenging benchmarks, including structured and unstructured benchmarks. Furthermore, we demonstrate its benefits on different scenarios: robustness to noisy and varying sensor data (ATC Mall and Train Station scenarios), interactive computation (Black Friday Shopping Mall scenario), handling of structured environments (Marathon scenario), adaptation to a changing situation (Explosion scenario), and high-density simulation (Train Station scenario).
We have also applied it to the 2D trajectories generated from different crowd videos and compared the prediction accuracy with the ground truth data, which was also generated using a pedestrian tracker. The underlying crowd videos have different pedestrian densities, corresponding to low (i.e., fewer than 1 pedestrian per square meter) and medium (1-2 pedestrians per square meter). We highlight the datasets, their crowd characteristics, and the prediction accuracy of different real-time algorithms for short-term and long-term prediction in Tab. 1. We include comparisons to constant velocity (ConstVelocity) and a Kalman filter. Finally, we also compare the accuracy with the Bayesian reciprocal velocity obstacle (BRVO) algorithm [61], which computes a more individualized motion model for estimating local movement patterns.
Table 1 Crowd Scene Benchmarks: We highlight many attributes of these crowd videos, including density and the number of tracked pedestrians. We use the following abbreviations for some characteristics of the underlying scene: Background Variations (BV), Partial Occlusion (PO), and Illumination Changes (IC). We highlight the results for short-term prediction (1 sec) and long-term prediction (5 sec). We notice that our TVPMD algorithm results in higher accuracy for long-term prediction and dense scenarios. For more details and results we point the readers to [59].
Figure 8 (a) A frame from a video of pedestrians (from Figure 2) in a street with extracted trajectories (shown in red); (b) Our simulation algorithm computes collision-free trajectories of virtual pedestrians (shown in blue) in the 3D virtual environment, which have the same movement flows as the extracted trajectories (red).
Table 2 Performance of TVPMD on a single core for different scenarios. We highlight the number of real and virtual pedestrians, the number of static obstacles, the number of frames of extracted trajectories, and the time (in seconds) spent in different stages of our algorithm. DDS is the Data-Driven Scene computation time (in seconds), which includes the time for computing the additional collision avoidance constraints when virtual agents are introduced and other simulation overheads.
It is important to predict the trajectory over a longer time horizon. Our approach is able to perform long-term prediction (5-6 seconds) and exhibits much higher accuracy than prior methods (see Tab. 1). We notice that our algorithm results in higher accuracy for long-term prediction and dense scenarios. We use a simple prediction metric to evaluate the accuracy of both long-term and short-term prediction. The average human stride length is about 0.8 meters [62]. A prediction is counted as successful when the estimated mean error between the prediction result and the ground truth value at that time instant is less than this stride length. We define prediction accuracy as the ratio of the number of "successful" predictions to the total number of tracked pedestrians in a scene. We use our algorithm for long-term and short-term prediction across a large number of datasets, highlighted in Tab. 1.
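The accuracy metric described in this section can be sketched in a few lines of Python. This is a hedged illustration rather than the evaluation code used for the reported results: it assumes one predicted and one ground-truth 2D position (in meters) per tracked pedestrian at the evaluated time instant, measures the error as the Euclidean distance between them, and counts a pedestrian with no prediction as a failure. The names STRIDE_LENGTH_M and prediction_accuracy are hypothetical.

```python
import math

# Average human stride length in meters, used as the success threshold.
STRIDE_LENGTH_M = 0.8


def prediction_accuracy(predicted, ground_truth, threshold=STRIDE_LENGTH_M):
    """Fraction of tracked pedestrians whose prediction error is below threshold.

    predicted, ground_truth: dicts mapping a pedestrian id to an (x, y)
    position in meters at the evaluated time instant (an assumption of
    this sketch). Pedestrians without a prediction count as failures.
    """
    if not ground_truth:
        raise ValueError("no tracked pedestrians to evaluate")
    successes = 0
    for pid, (tx, ty) in ground_truth.items():
        if pid not in predicted:
            continue
        px, py = predicted[pid]
        if math.hypot(px - tx, py - ty) < threshold:
            successes += 1
    return successes / len(ground_truth)
```

For example, with three tracked pedestrians whose errors are 0.5 m, 1.0 m, and 0.1 m, two predictions fall below the 0.8 m threshold and the accuracy is 2/3.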

Conclusions, Limitations, and Future Work
We present statistical algorithms to learn the characteristics of pedestrian movement from trajectories extracted from real videos. These characteristics are used to compute collision-free trajectories of virtual pedestrians whose movement patterns resemble those of pedestrians in the original video. Our approach is automatic and interactive and captures the dynamically changing movement behaviors of real pedestrians. We demonstrate its applications for many data-driven crowd simulations, where we can easily add hundreds of virtual pedestrians, generate dense crowds, and change the environment or the situation.
Limitations: The performance of our learning algorithm is governed by the accuracy of the input trajectories. Current algorithms for automatic pedestrian tracking can only handle low-to-medium density crowds. Our learning algorithm makes some assumptions about sensor error and statistical distributions and is only useful for capturing the characteristics of local pedestrian dynamics for each pedestrian, whereas offline learning methods can compute many global characteristics. We consider only characteristics such as movement flows and entry points to compute the trajectories of virtual pedestrians and do not consider other aspects of pedestrian behaviors or states, full-body actions, or the interactions among pedestrians. Our approach only computes the trajectories, so it must be combined with techniques that can generate plausible animation and rendering. For example, we use an off-the-shelf rendering toolkit, which may result in some motion artifacts. Finally, even though in theory our algorithm can generate very dense simulations, it is limited by the underlying motion model: only collision-free movement without physical interaction is possible, because we do not model these other aspects of pedestrian behavior.
Future Work: There are many avenues for future work. In addition to overcoming the limitations of our work, our interactive TVPMD can also be combined with other data-driven crowd simulation algorithms and offline behavior learning methods. We would like to combine the pedestrian dynamics characteristics with other techniques that can model complex crowd behaviors or multi-character motion synthesis techniques [18,19].

Figure 1 Pedestrian Movement Dynamics Computation: Our method takes extracted trajectories of real-world pedestrians as input. We use Bayesian inference to estimate the most likely state of each pedestrian. Based on the estimated state, we learn time-varying behavior patterns. These behavior patterns are used as underlying rules for data-driven simulation and for pedestrian prediction, and can also be combined with other multi-agent simulation algorithms. Our approach can perform all these computations in tens of milliseconds.

Figure 2 Pedestrian Movement Dynamics Learning: A one-frame example: (a) input consists of pedestrian trajectories (green) from the video; (b) probabilistic distributions of entry points at one frame computed using the Gaussian Mixture Model (shown as elliptical regions); and (c) movement flows grouped by the characteristics of pedestrian dynamics, in which each grouping is represented by the same color.

Figure 3 Synthetic data points are sampled from $g_1$ and $g_2$ with varying weights for each frame. Red dots are points sampled from $g_1$ and blue dots are points sampled from $g_2$. The x and y axes refer to the respective x and y coordinates in the image space.

Figure 4 Estimated weight of cluster 1 ($g_1$) and cluster 2 ($g_2$) during ten frames. We show the ground truth (black) and the global (red), accumulative (green), and local (blue) estimates, where the x-axis refers to the time-step (sec) and the y-axis refers to the weight. The global model does not perform as well as the accumulative or local model when the data changes over time. The accumulative model gradually converges to the global result. The local model generates a noisy approximation due to the smaller number of samples, but generally gives a good approximation to the ground truth distribution.

Figure 5 Manko Benchmark: We highlight the benefits of entry point and movement flow learning. (a) A frame from a Manko video, which shows different flows corresponding to lane formation (shown with white arrows); (b) and (c) We compute collision-free trajectories of 18 virtual pedestrians (shown in green) along with the extracted trajectories of 42 real pedestrians (shown in red) from the Manko scenario. For (b), we use random entry points (dashed yellow circles) for virtual pedestrians and goal positions on the opposite side of the street. White circles highlight the virtual pedestrians who are following the same movement flow as neighboring real pedestrians. For (c), we use TVPMD (entry point distribution and movement flow learning) to generate virtual pedestrians' movements. The virtual agents follow the lane formation, as observed in the original video.

Figure 6 Pedestrian Prediction: We highlight various components of our real pedestrian path prediction algorithm. Our approach computes the movement flow from real-time 2D trajectory data and uses it to improve prediction accuracy.

Figure 7 Pedestrian Prediction Results: We demonstrate the improved accuracy of our pedestrian path prediction algorithm using TVPMD over prior real-time prediction algorithms (BRVO, Const Vel) and compare them with the ground truth (in yellow). We observe up to 12% improvement in accuracy.