Oppilatio+ - A data and cognitive science based approach to analyze pedestrian flows in networks

Public transport services are a widespread and environmentally friendly option for mobility. In the majority of cases, passengers of public transport services have to walk from a subway, train, or bus station to their desired travel destination. In an urban environment with a network of narrow streets, this can lead to crowd congestion during rush hour, because passengers tend to arrive in waves. In order to monitor and analyze such crowding behavior, city planners, crowd managers, and organizers of public events must ascertain which routes these pedestrians will take from the respective station to their destination. The Oppilatio+ approach is suitable for solving this problem. It is an easy-to-apply approach to predict way-finding behavior with a minimal set of information. The necessary data includes the schedule of incoming transport vehicles at the stations and the time-stamped count of pedestrians at the respective destinations. Under these conditions, the Oppilatio+ approach is suitable for estimating the distribution of pedestrians on all possible walkways between stations and destinations. This information helps crowd control experts to recognize weak spots in the infrastructure and helps event organizers to ensure an undisturbed arrival at their event. We validated our approach using two field experiments. The first one was a field study on a public event, and the second one was a case study for a large Swiss train station.


Introduction
The number of people who live in cities will reach five billion in 2030 [1]. This urban growth increases the significance of public transport services (e.g. trains, subways, or buses). Unlike private transport (e.g. bicycles or cars), public transport does not take passengers directly to their target destination. They arrive at train, subway, or bus stations and have to walk from the station to their actual travel destination (e.g. an office, their home, or an event). In urban and narrow street networks, this may lead to crowd congestion if many people arrive at the same time. Further bottlenecks are the narrow corridors in the stations themselves, at which all passengers arrive and depart. Such bottlenecks mainly occur during rush hour or if large public events take place (e.g. city festivals). This might result in critical situations. To prevent such crowd congestion, city planners and crowd managers require information about the distribution of pedestrians in the street network. One way to obtain this information is to employ pedestrian dynamics simulations. These simulations help to predict human movement behavior. A closer description of this popular approach is given in Sec. 2. Unfortunately, the usage of crowd simulations is quite complex. Proper use requires basic knowledge in applied computer science, precise data about all boundary conditions of the scenario (e.g. number of visitors), and background knowledge about pedestrian dynamics (e.g. for the specification of input parameters). In many cases, crowd managers, civil engineers, and city planners lack such background knowledge. Furthermore, it can be quite difficult to acquire valid data regarding the boundary conditions, especially in the context of complex scenarios like an urban environment. Another possibility to obtain relevant data would be an extensive video observation of all access routes. However, this method is expensive and difficult to execute due to legal regulations related to data privacy [2]. Apart from data privacy issues, it can be time-consuming and complex to analyze large amounts of video data in the field of pedestrian dynamics, though support tools exist [3].
We developed the Oppilatio+ approach to enable crowd managers, civil engineers, and city planners to monitor and analyze pedestrian streams in an urban environment. They benefit from our approach, since it provides valuable data regarding the behavior of incoming passengers. The information can be used to detect problematic parts of the infrastructure, so that these can be fixed in the future. Furthermore, it is possible to detect crowd congestion before it actually occurs: if the Oppilatio+ approach is applied "on the run" (e.g. during the course of an event), frequently used routes can be detected at an early stage, before these paths get overcrowded.
The presented Oppilatio+ method is based on the estimation of human way-finding behavior using well-grounded heuristics. It extends and improves our previous work in this research field [4]. In contrast to crowd simulations, no background knowledge about pedestrian dynamics is required, and the input data for our method can be easily collected. The necessary sets of data are illustrated in Fig. 1:
1. Arrival times of public transport services at the stations
2. Accessible routes from the stations to the destinations
3. Time-stamped count of incoming pedestrians at the destinations
The time schedules of the public transport services at the stations can be easily obtained from the local transport operators. The network layout of the scenario under investigation can be generated from openly licensed geodatabases. Time-stamped counts of incoming pedestrians can be acquired by manual counting, but many automatic counting methods exist as well. If the entries of the destinations are monitored by cameras, video analysis tools can be applied to quantify the pedestrian inflow [5]. Another alternative is light barriers at the entryways: every pedestrian who passes through the entry interrupts the light beam and thus triggers the counting system [6]. If the destination in question is a public event, time-stamped entrance tickets can be used to quantify the inflow. Based on these data sets, it is possible to calculate the most likely route for each incoming pedestrian $p_i$ by applying the algorithms given in Sec. 3.

Related Work
Current approaches to obtain information about the routing behavior of humans are based on pedestrian dynamics simulations. These complex simulations are modeled by three independent but interacting layers: strategic, tactical, and operational [7]. An overview of this three-layer approach can be seen in Fig. 2. The selection of destination(s) is done on the strategic level. This layer determines which targets are visited in which order during the simulation. For example, a person may have to decide whether to go to work directly or to visit a bakery first in order to get some morning coffee. Many pedestrian simulators use an origin-destination matrix approach for the strategic level, but there are also more complex models [8]. The tactical level determines which route a human takes to reach a given target: if a person wants to visit the bakery, tactical models determine which route she will follow to get from her current position to the bakery. Different tactical models exist, from simple shortest path algorithms [9] to models based on psychological and cognitive findings [10]. The actual walking behavior is simulated on the operational level. This layer models the stepwise movement of the pedestrians, so the operational level has to guarantee that a person follows the route provided by the tactical model in a realistic manner. For example, if the pedestrian walks to the bakery, the calculations of the operational model have to ensure that she walks with realistic speed and does not collide with other pedestrians or obstacles. One widespread operational model is the Social Force Model [11].
Figure 2 The three layers from Hoogendoorn [7]: the destination is chosen on the strategic layer, while the route to the destination is chosen on the tactical layer. The actual walking behavior is executed on the operational layer.
Pedestrian dynamics can be simulated on three different scales [12]: macroscopic, mesoscopic, and microscopic. Models on the macroscopic scale ensure fast simulations but have a low spatial resolution. These approaches reduce the simulation scenario to a graph network and describe pedestrians as cumulated and flowing densities [13,14]. In contrast, mesoscopic approaches model pedestrians as singular and discrete simulation objects. A common way to formulate mesoscopic models is cellular automata [15], which reduce the scenario to a regular grid and simulate the movement of pedestrians cell-wise [16]. Models on the mesoscopic scale require more computational effort, but their spatial resolution is higher, limited only by the size of their unit cells [17]. Another possibility is to employ models on the microscopic scale, which describe pedestrians as individual and discrete objects as well. These approaches have the highest spatial resolution, since the pedestrians are simulated in continuous space [11]. Unfortunately, microscopic models have a high computational demand.
Additionally, there are two types of hybrid models. The first type combines pedestrian models of different spatial resolutions [18]. This type of hybrid approach is used to reduce the computational costs of a simulation. Thus, critical areas of the scenario (e.g. bottlenecks) can be simulated with a high spatial resolution, while less critical parts (e.g. open areas) can be computed using less costly models [12]. An interesting example of a combination of simulation models from different scales is the bidirectional coupling of a network flow model with a cellular automaton by Borrmann et al. [19]. The second type of hybrid model couples pedestrian dynamics simulations with approaches from other research fields [20,21].

Figure: panels showing the macroscopic scale, the mesoscopic scale, and the microscopic scale.
Unlike pedestrian dynamics simulations, the Oppilatio+ approach combines confirmed knowledge from cognitive sciences with data-based knowledge to describe the routing behavior of humans. Compared to classic pedestrian dynamics simulations, the Oppilatio+ method is a good alternative due to its low computational costs for realistic large scenarios and its easy applicability. The downside of the approach is the lack of fine-grained modeling of pedestrian behavior with respect to trajectories and interactions.

Methodology of the Oppilatio+ approach

Overview of the methodology
The Oppilatio+ approach aims at determining the route of individual persons based solely on the network entry time (arrival at station) and the network exit time (arrival at final destination). It comprises several individual steps to calculate the most likely route chosen by a pedestrian. At first, the method considers the time at which a pedestrian was registered at the destination. Thus, we have the information where and when the journey of a pedestrian ended. Based on this information, the most likely station and the most likely arrival time at that station are calculated (see Sec. 3.2). In the next steps, the route a pedestrian has chosen must be determined. Since we know the starting and ending time as well as the starting station and the destination, we can reduce the number of possible routes for this pedestrian. Since minimal and maximal velocities are known from the literature [22], we can discard every route that is too long or too short to have been chosen (see Sec. 3.3). If, after the network reduction, there are still several possible routes left, the movement of the pedestrian is calculated edge- and node-wise. The pedestrian under consideration starts at the station node at her assumed starting time. If multiple outgoing edges exist, the pedestrian chooses one according to a rating function based on cognitive sciences. The rating function contains different aspects which influence the human decision process for navigation: the preference for beeline orientation (see Sec. 3.4), the tendency to limit the number of direction changes (see Sec. 3.5), the preference for longest legs (see Sec. 3.6), the usage of shortest paths to the destination (see Sec. 3.7), as well as the so-called "herding behavior" (see Sec. 3.8). After a decision for one outgoing edge has been made, an ideal velocity is assigned to this pedestrian. The velocity is assigned in such a way that the pedestrian reaches the destination at the time she was detected there. For this, the remaining total path length to the destination has to be estimated. This procedure is explained in Sec. 3.3. As soon as the next node is entered, a new network reduction and decision process starts. This continues iteratively until the pedestrian reaches her destination. Fig. 4 shows this iterative process. The same procedure is executed for all pedestrians.
Figure 4 Overview of the methodology of Oppilatio+ on a network graph scenario from the point of view of a single pedestrian. Diagram labels: calculate starting node; reduce network graph; make decision process; enter chosen edge; enter subsequent node; repeat until destination node.

Allocation of pedestrians' arrival times at the stations
In a first step, we have to determine when and where an incoming pedestrian $p_i$ started to walk from the station to her destination. Thanks to the time-stamped counting at the destinations, we know when (at an ending time $t_{\Theta,i}$) and where (at a destination $\Theta$) pedestrians finished their journey, but we have no further knowledge about when (at a starting time $\tau_i$) and where (at a station $\Upsilon_s$) they started. The Oppilatio+ approach can derive this information by approximation. Initially, we use the arrival time $t_{\Theta,i}$ at the destination $\Theta$ to determine the time $\tau_i$ at which a visitor $p_i$ started at a public transport station $\Upsilon_s$. Since the people arrive by public transport, the set of possible starting places is limited to the stations $\Upsilon_s$ in the scenario. Each station $\Upsilon_s$ has its own schedule, based on the arrival times $\tau_{s,k}$ of the public transport vehicles. Therefore, the set of possible starting times $\tau_i$ is limited by the number of public transport arrivals at this station. Consequently, the arrival times $\tau_{s,k}$ of all stations determine the maximal number of possible starting conditions for each pedestrian. We assume that pedestrians walk directly from their station to their destination. Thus, the network between one starting station $\Upsilon_s$ and the pedestrian's destination $\Theta$ has to be an acyclic graph. However, the whole network between all stations and destinations is allowed to be cyclic.
A longest path $d_{\max}$ and a shortest path $d_{\min}$ can be determined for each destination. Thus, we can calculate a minimal walking duration $\Delta t_{\min} = d_{\min}/v_{\max}$ and a maximal walking duration $\Delta t_{\max} = d_{\max}/v_{\min}$ for each destination. The maximal and minimal velocities depend on different external factors, e.g. on the kind of scenario, the age structure, the current density [22], or the composition of the flow [23]. In many cases, this information is not available. As an approximation, we provide an overview of the average velocities for different typical scenarios in Tab. 5. An interval of $\pm 1.5\sigma$ around the mean covers roughly 87 percent of all velocities under a normal distribution, which is sufficient for our approach. Consequently, we recommend using an interval of $\pm 1.5\sigma$ around the average velocity to determine $v_{\min}$ and $v_{\max}$.
If a pedestrian $p_i$ arrives at the destination at $t_{\Theta,i}$, she must have left the station during a period $\tau_i \in \Delta D_i = [\,t_{\Theta,i} - \Delta t_{\max},\; t_{\Theta,i} - \Delta t_{\min}\,]$. Thus, the starting times of the pedestrians $p_i$ can only correspond to the arrival times of the public transport vehicles during this time interval. If multiple transport vehicles arrive at the different stations during the time interval $\Delta D_i$, it is not possible to determine a precise starting time $\tau_i$. In this case, we assume a normal distribution (Eq. 1) as a probability distribution to distinguish between the multiple possible starting times at the different stations. The expected value $\mu_i = t_{\Theta,i} - \frac{1}{2}(\Delta t_{\max} + \Delta t_{\min})$ is the midpoint of the time interval $\Delta D_i$. The spread of the normal distribution is given by the standard deviation. If we assume that our interval $\Delta D_i$ includes about 95 percent of all possible values, we can determine the standard deviation as $\sigma = \frac{1}{4}(\Delta t_{\max} - \Delta t_{\min})$. In the next step, we determine the probability $\Psi_{i,k}$ that the arrival time $\tau_k$ of a public transport vehicle is chosen as the starting time of pedestrian $p_i$ at the station (Eq. 2). The parameter $\tau_j$ with $j = 1 \dots J$ corresponds to all possible arrival times of the public transport vehicles at the different stations according to their timetables. If multiple starting times $\tau_i$ are possible, one starting time $\tau_i = \tau_k$ is chosen randomly, depending on its probability $\Psi_{i,k}$. As an optional parameter, it is possible to add the maximal capacity of a public transport vehicle. If the maximal capacity is reached, no more pedestrians can be assigned to this vehicle, and the pedestrian is assigned to another transport service arrival time. If no other services are available, the pedestrian is discarded from the calculation.
Fig. 5 shows an example of this procedure. A pedestrian was detected at the coffee shop destination at 12:48. Based on the maximal walking time $\Delta t_{\max} = 19$ min and the minimal walking time $\Delta t_{\min} = 8$ min, we know that $p_i$ arrived at the station in the time interval $\Delta D_i$ between 12:29 and 12:40. Each of the three given stations has its own schedule. According to these schedules, the pedestrian arrived at the starting station either at 12:30 or 12:40 (westernmost station) or at 12:33 (northernmost station). The final selection of such a starting configuration is calculated by Eq. 1 and 2. The determined station $\Upsilon_s$ and the pedestrian's starting time $\tau_i$ are assigned to pedestrian $p_i$.
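The allocation step can be illustrated with a minimal Python sketch. It assumes that the probability of each candidate vehicle arrival is the normal density at that time, normalized over all candidates inside the interval; the exact form of Eq. 1 and Eq. 2 is not reproduced here, so this weighting is an assumption. The function names, the minute-based time representation, and the extra 12:20 arrival are hypothetical, and stations and vehicle capacities are ignored for brevity.

```python
import math
import random


def start_time_probabilities(t_arrival, dt_min, dt_max, vehicle_times):
    """Return {vehicle arrival time: probability} for one pedestrian.

    All times are minutes since midnight. Assumes dt_max > dt_min.
    The normalized normal-density weighting is an assumption of this sketch.
    """
    window_start = t_arrival - dt_max
    window_end = t_arrival - dt_min
    mu = t_arrival - 0.5 * (dt_max + dt_min)   # midpoint of the interval
    sigma = 0.25 * (dt_max - dt_min)           # interval spans about 95 percent
    candidates = [t for t in vehicle_times if window_start <= t <= window_end]
    if not candidates:
        return {}                              # pedestrian cannot be assigned
    weights = [math.exp(-0.5 * ((t - mu) / sigma) ** 2) for t in candidates]
    total = sum(weights)
    return {t: w / total for t, w in zip(candidates, weights)}


def sample_start_time(probabilities, rng):
    """Draw one starting time at random according to its probability."""
    times = list(probabilities)
    return rng.choices(times, weights=[probabilities[t] for t in times])[0]


def to_minutes(hhmm):
    hours, minutes = hhmm.split(":")
    return 60 * int(hours) + int(minutes)


# Example from the text: detected at 12:48, walking time between 8 and 19 min.
arrivals = [to_minutes(t) for t in ("12:20", "12:30", "12:33", "12:40")]
probs = start_time_probabilities(to_minutes("12:48"), 8, 19, arrivals)
chosen = sample_start_time(probs, random.Random(0))
```

Under these assumptions, the 12:20 arrival falls outside the interval and is ignored, and the 12:33 arrival receives the largest weight because it lies closest to the midpoint of the interval.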

Decision process and network reduction
In the next step, we estimate the most likely path a pedestrian $p_i$ has chosen from her assigned station to her destination. A path is one possible sequence of nodes and edges from the current node or edge of a pedestrian to her destination. We use information from given data and from cognitive findings for this calculation. In Sec. 3.2, we calculated the starting position and starting time of a pedestrian $p_i$ for our scenario network. The pedestrian $p_i$ starts at the node of her starting station and has to decide on one of the outgoing edges. Based on the velocity $v_i$ (calculated by Eq. 6) and the length $l$ of the chosen edge, it is possible to calculate the time $\tau_i + l/v_i$ at which the pedestrian reaches the next node. At this position and point in time, the next decision process for the routing of $p_i$ is executed. This continues for all subsequent nodes and edges until pedestrian $p_i$ reaches the destination: as soon as a pedestrian enters a node, a new decision process starts to determine the subsequent edge.
Figure 6 An example of a routing process for a pedestrian $p_i$, who walks from a bus station to a café. The pedestrian has chosen the path with the solid line.
Normally, there are several pedestrians inside the network at the same time. Thus, we need an execution sequence to handle the order of the decision processes of these pedestrians. We use a priority queue to solve this issue [24]. Whenever $p_i$ decides on an outgoing edge, the pedestrian is put into the queue. The priority in the queue is given by the time at which a pedestrian will enter the subsequent node. The decision process for the pedestrian with the lowest entering time is executed first. In this way, all pedestrians on the network are considered simultaneously.
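The execution order described above can be sketched with Python's standard `heapq` module. The event tuples `(entering time, pedestrian id, node)`, the callback `next_event`, and the toy route are hypothetical and serve only to show how the pedestrian with the lowest entering time is processed first.

```python
import heapq


def run_decisions(initial_events, next_event):
    """Process decision events in order of the time a node is entered.

    initial_events: iterable of (time, pedestrian_id, node) tuples.
    next_event: callable returning the follow-up (time, pedestrian_id, node)
                tuple, or None once the destination has been reached.
    Returns the events in the order in which they were processed.
    """
    queue = list(initial_events)
    heapq.heapify(queue)
    processed = []
    while queue:
        time, pid, node = heapq.heappop(queue)   # lowest entering time first
        processed.append((time, pid, node))
        upcoming = next_event(time, pid, node)
        if upcoming is not None:
            heapq.heappush(queue, upcoming)
    return processed


def toy_next_event(time, pid, node):
    """Hypothetical stand-in for the decision process: everybody walks
    A -> B -> C; pedestrian 0 needs 2 minutes per edge, all others 5."""
    route = {"A": "B", "B": "C"}
    if node not in route:
        return None
    step = 2 if pid == 0 else 5
    return (time + step, pid, route[node])


order = run_decisions([(0, 1, "A"), (1, 0, "A")], toy_next_event)
times = [event[0] for event in order]
```

In this toy run, pedestrian 0 starts later but overtakes pedestrian 1, and the queue still yields all events in non-decreasing time order.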
The selection of routes in Oppilatio+ is based on different aspects: a data-based network reduction plus the human preference for beeline-directed walking, few direction changes, longest legs, the shortest path, and following other people [25]. The data-based network reduction reduces the number of possible paths for a pedestrian $p_i$ based on the arrival time $t_{\Theta,i}$ at the destination. In the Oppilatio+ approach, a routing decision process for a pedestrian is executed if the pedestrian is at an intersection node $e_m$ at a time $t$. Since the pedestrian has to reach the destination at $t_{\Theta,i}$, all paths that cannot fulfill this requirement can be excluded. Thus, all paths $\lambda_l$ with a total length $d_l > v_{\max} \cdot (t_{\Theta,i} - t)$ or $d_l < v_{\min} \cdot (t_{\Theta,i} - t)$ are excluded for pedestrian $p_i$. In some cases, this reduction is sufficient to ensure that only one outgoing edge from the node $e_m$ is left for the decision process. Thus, pedestrian $p_i$ has to choose this outgoing edge.
A further decision process is needed if multiple outgoing edges exist after the network reduction. This procedure determines which outgoing edge of node $e_m$ is chosen by pedestrian $p_i$. Various aspects of human cognitive behavior have to be considered. According to cognitive sciences [26], this navigation and routing behavior is a complex process. However, for estimation purposes, we assume that a significant simplification is possible. Therefore, we introduce a rating system to rate the attractiveness of the different outgoing edges $s_m$. The rating system is based on different routing approaches from cognitive sciences [25]. The probability $q_m$ that pedestrian $p_i$ chooses the outgoing edge $s_m$ is weighted by its rating value. At each node, pedestrians choose their next edge based on the total rating value $\xi_m(o_m, \omega^*_m, l_m, \lambda_m, \zeta_{m,i})$. Parameter $\alpha(o_m)$ describes human beeline orientation (see Sec. 3.4), parameter $\beta(\omega^*_m)$ the preference for few direction changes (see Sec. 3.5), and parameter $\gamma(l_m)$ the preference of pedestrians for routes with longest legs (see Sec. 3.6). The parameter $\delta(\lambda_m)$ describes the higher probability that pedestrians will choose a short path (see Sec. 3.7), and parameter $\varepsilon(\zeta_{m,i})$ represents the herding behavior of pedestrians (see Sec. 3.8). The weighting between the parameters $\alpha(o_m)$, $\beta(\omega^*_m)$, $\gamma(l_m)$, and $\delta(\lambda_m)$ is taken from the experimental studies of Kneidl et al. [25,27]. Unfortunately, no weighting for the herding behavior $\varepsilon(\zeta_{m,i})$ was studied in these experiments. Thus, we executed a field experiment to determine this weighting factor for public events (see Sec. 4.1). The probability of choosing an outgoing edge $s_m$ is given by $q_m$. According to this probability, the pedestrians are assigned to one outgoing edge.
If a pedestrian $p_i$ is assigned to an outgoing edge $s_m$, the pedestrian's optimal velocity $v_{i,m}$ has to be calculated for this outgoing edge. Thanks to our data, we know the point in time at which a pedestrian $p_i$ will reach her destination $\Theta$. Additionally, we know the remaining lengths of all paths from this node $e_m$ to the destination node $\Theta$. Therefore, we have to assign a velocity to this pedestrian that ensures that she reaches the destination at the measured point in time $t_{\Theta,i}$. Consequently, the optimal velocity $v_i$ for a pedestrian $p_i$ is based on the remaining time to reach the destination at $t_{\Theta,i}$ and on the time $t_{m,i}$ at which the pedestrian entered node $e_m$. For each possible route from the current node $e_m$ to the destination node $\Theta$, it is possible to determine the optimal velocity. This is done by adding the lengths of all edges of the path from the current node $e_m$ to the destination node, which results in the total length $d_k$ of this single path $k$. We obtain the optimal velocity for this path if we divide this length by the time the pedestrian has left to reach her destination (Eq. 6):
$v_{i,k} = \dfrac{d_k}{t_{\Theta,i} - t_{m,i}}$
If we repeat this calculation for all $K$ possible paths from this node to the destination node, we get the whole set $(v_{i,1}, v_{i,2}, \dots, v_{i,k}, \dots, v_{i,K-1}, v_{i,K})$ of valid velocities for pedestrian $p_i$. Each of these velocities represents the optimal velocity for one of the possible paths from node $e_m$ to the final node $\Theta$. To preserve the greatest possible quantity of valid velocities, we assign the median velocity to the pedestrian $p_i$. As soon as $p_i$ starts a new decision process at a node, a new optimal velocity has to be calculated to ensure that the pedestrian reaches the destination $\Theta$ at the measured arrival time $t_{\Theta,i}$.
Fig. 6 illustrates the whole procedure of the Oppilatio+ approach. In our example in Sec. 3.2, we calculated at which station and at which time the pedestrian started. In the following, we show how the most likely route taken by $p_i$ is determined. From the pedestrian's starting station $\Upsilon$ to the first subsequent node 1, no decision process has to be executed since only one outgoing edge is available. In total, there are two possible paths from this node to the destination ([$\Upsilon$-1-2-3-5-$\Theta$] and [$\Upsilon$-1-2-4-6-$\Theta$]). According to the optimal velocity $\tilde{v}_{i,m}$, the pedestrian reaches the first subsequent node at 12:31 and the second one at 12:35. At node 2, multiple outgoing edges are available. Thus, a decision process is necessary to determine which route should be taken by pedestrian $p_i$. In this case, the edge leading to node 3 is chosen. A new optimal velocity has to be calculated based on Eq. 6. The subsequent nodes 3 and 5 do not involve further decision processes, since only one outgoing edge exists for each node. Due to the optimal velocity, pedestrian $p_i$ reaches the destination at 12:48, which is consistent with the measured arrival time.
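The network reduction and the velocity assignment can be sketched in a few lines of Python. The sketch works on a plain list of remaining path lengths rather than on a graph, and the function names, units (meters and seconds), and numbers are hypothetical.

```python
from statistics import median


def feasible_paths(path_lengths, t_now, t_arrival, v_min, v_max):
    """Keep only path lengths that can be covered in the remaining time
    with a velocity between v_min and v_max (the network reduction)."""
    remaining = t_arrival - t_now
    if remaining <= 0:
        raise ValueError("the arrival time must lie after the current time")
    return [d for d in path_lengths
            if v_min * remaining <= d <= v_max * remaining]


def assigned_velocity(path_lengths, t_now, t_arrival):
    """Median of the optimal velocities d_k / (t_arrival - t_now)."""
    remaining = t_arrival - t_now
    if remaining <= 0:
        raise ValueError("the arrival time must lie after the current time")
    return median(d / remaining for d in path_lengths)


# Hypothetical node: four remaining paths (meters), 900 s left until arrival.
lengths = [900.0, 1100.0, 2500.0, 300.0]
kept = feasible_paths(lengths, t_now=0.0, t_arrival=900.0, v_min=0.8, v_max=1.6)
velocity = assigned_velocity(kept, t_now=0.0, t_arrival=900.0)
```

With these numbers, only paths between 720 m and 1440 m remain feasible, and the assigned velocity is the median of the two optimal velocities of the remaining paths.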

Preference of beeline orientation
One important factor of attractiveness is the influence of beeline orientation. This means that pedestrians prefer routes which run close along the beeline from their position to their destination $\Theta$ [25,28,29]. We include this aspect in our rating by a factor $\alpha(o_m)$ to describe the preference for beeline-oriented outgoing edges. Next, we calculate the mean deviation $o_m$ from the beeline for each edge $s_m$ (see Fig. 7). The beeline from the node $e_m$ to the destination is given by the beeline vector $\Gamma_m = \Theta - e_m$. Since the node $e_m$ is located at the beginning of edge $s_m$, we can calculate the mean deviation of this edge $s_m$ from the beeline. We have to scale the deviation $\nu_m$ by the length of its edge to obtain the normalized beeline orientation. We calculate the average beeline orientation of all $M$ outgoing edges to compare these values between all outgoing edges. According to a field experiment by Kneidl [27], 71.2% of all routes chosen by the participants were beeline-oriented and 28.1% were not [25]. The difference $\Delta p_\alpha$ between these values describes the percentage of pedestrians who prefer the beeline orientation. Therefore, we limit the influence of beeline orientation by $\Delta_\alpha = \pm 0.5 \cdot (71.24 - 28.11)\% = \pm 21.6\%$. These limits define the conditions of our rating function $R(x_1, x_2, x_3)$, from which we obtain the rating for the beeline orientation.
Figure 8 The angle $\omega_m$ describes the preference of humans to avoid direction changes, and the length of the outgoing edge $s_m$ describes the preference for long and straight lines.

Preference for few direction changes
Another factor influencing the routing behavior of pedestrians is direction changes. Humans prefer to keep their current walking direction. Thus, they try to avoid direction changes [27,30] (see Fig. 8). This behavior might help humans to prevent disorientation [25]. We use the angle $\omega_m$ between the incoming edge $s_{m-1}$ and an outgoing edge $s_m$ to determine the heading of this outgoing edge. A small angle $\omega_m$ makes it more likely for a pedestrian $p_i$ to choose this outgoing edge, since large angles mean a significant change in the current walking direction. A direction change occurs if $\omega_m \geq \omega_0 = \pi/18$. If the angle is smaller, pedestrians will not recognize any divergence from their current heading [25]. Consequently, the angle $\omega^*_m$ perceived by a pedestrian can be described with the Heaviside step function $H(x)$, a unit step function. The parameter $\omega_m$ is calculated by the scalar product of the neighboring edges. For each outgoing edge of a node $e_m$, we calculate the rating (see Eq. 10) to describe the influence of direction changes on the decision process. The parameter $\omega^*_\varnothing$ is the averaged angle of all outgoing edges from node $e_m$. The rating parameter $p_\beta$ is based on an experiment executed by Kneidl et al. [25,27]. 73.2% of all routes selected by the participants had few direction changes, whereas 26.8% involved many direction changes. Therefore, the influence of direction changes is limited by $\Delta_\beta = \pm 0.5 \cdot (73.20 - 25.49)\% = \pm 23.9\%$.
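A minimal Python sketch of the perceived angle follows. It assumes that the perceived angle equals the geometric angle gated by the step function, i.e. $\omega^*_m = \omega_m \cdot H(\omega_m - \omega_0)$ with $H(0) = 1$; this gating form, the function names, and the 2D vector representation of edges are assumptions of the sketch, not taken from the equations of the paper.

```python
import math

OMEGA_0 = math.pi / 18  # threshold below which no direction change is perceived


def heaviside(x):
    """Unit step function, using the convention H(0) = 1."""
    return 1.0 if x >= 0 else 0.0


def turning_angle(incoming, outgoing):
    """Angle between two 2D edge direction vectors via the scalar product."""
    dot = incoming[0] * outgoing[0] + incoming[1] * outgoing[1]
    norm = math.hypot(*incoming) * math.hypot(*outgoing)
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding errors
    return math.acos(cosine)


def perceived_angle(omega):
    """Assumed form: angles below OMEGA_0 are perceived as 'straight on'."""
    return omega * heaviside(omega - OMEGA_0)


straight = perceived_angle(turning_angle((1.0, 0.0), (1.0, 0.05)))
right_turn = perceived_angle(turning_angle((1.0, 0.0), (0.0, 1.0)))
```

In this sketch, a slight bend of about three degrees is perceived as no direction change, whereas a right-angle turn keeps its full angle of $\pi/2$.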

Preference for longest leg
Another important aspect of human navigation behavior is the preference to walk on long and straight lines [25,29]. Thus, for the decision process of a pedestrian, all outgoing edges have to be compared by their length (see Fig. 8). Since each edge of the street network is a straight connection between two nodes, the length of an outgoing edge represents the minimum distance that a pedestrian can walk without any interruption. Therefore, a greater length increases the probability that $p_i$ chooses this outgoing edge. The length $l_m$ of an edge is simply its Euclidean length $|s_m|$. As in our previous procedure, we normalize $l_m$ by the average length $l_\varnothing$ of all outgoing edges of node $e_m$. The influence of the edge length, $\Delta_\gamma = \pm 0.5 \cdot (62.09 - 36.03)\% = \pm 13.0\%$, is taken from [25,27]. This study shows that 62.09% of all routes chosen by the participants started with a long line, while 36.03% did not. These limits determine the rating function for the edge length.
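The influence limits quoted so far all follow the same arithmetic, half the difference between the two observed shares. The short Python check below recomputes them from the percentages given in the text; the helper name and dictionary keys are hypothetical.

```python
def influence_limit(share_with, share_without):
    """Half the difference between two observed shares, in percent."""
    return 0.5 * (share_with - share_without)


limits = {
    "beeline": influence_limit(71.24, 28.11),            # about 21.6
    "direction_changes": influence_limit(73.20, 25.49),  # about 23.9
    "longest_leg": influence_limit(62.09, 36.03),        # about 13.0
}
rounded = {name: round(value, 1) for name, value in limits.items()}
```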

Preference for the shortest path
Current studies show that humans have an intrinsic tendency to minimize their energy consumption while they are walking, e.g. by avoiding unnecessarily high velocities [31,32]. Another possibility to save energy is to take short routes to the destination (see Fig. 9). Experiments by Kneidl et al. [25,27] showed that 89.54% of all routes chosen by the participants were almost as short as the shortest path. Here, a path was considered "almost as short" if it was not more than 10% longer than the shortest path. Only 10.46% of all route choices were longer. This results in a limiting value of $\Delta_\delta = \pm 0.5 \cdot (89.54 - 10.46)\% = \pm 39.54\%$. Pedestrians who are familiar with the scenario (e.g. locals, or nonlocal visitors who can rely on technical navigation support) tend to choose the shortest possible path [33]. We included this in the decision process by calculating, for each outgoing edge, the length $\lambda_m$ of the shortest path from the current node $e_m$ to the destination.
There are different algorithms that can serve to solve this problem, for example the well-known Dijkstra algorithm [9] or the Bellman-Ford approach [34,35]. The parameter $\lambda_\varnothing$ equals the average length of the shortest paths over all outgoing edges. These values determine the rating function for the shortest path.
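As an illustration, the shortest-path lengths per outgoing edge can be obtained with a single Dijkstra run from the destination on the reversed graph. The sketch below assumes a directed, acyclic station-to-destination network stored as a nested dictionary, and it assumes that $\lambda_m$ includes the length of the outgoing edge itself; the node names follow the example of Fig. 6, while all edge lengths are hypothetical.

```python
import heapq


def shortest_distances_to(graph, target):
    """Dijkstra on the reversed graph: shortest distance from each node to target.

    graph: {node: {successor: edge_length}} with non-negative lengths.
    """
    reverse = {}
    for node, successors in graph.items():
        for successor, length in successors.items():
            reverse.setdefault(successor, {})[node] = length
    dist = {target: 0.0}
    queue = [(0.0, target)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # outdated queue entry
        for predecessor, length in reverse.get(node, {}).items():
            candidate = d + length
            if candidate < dist.get(predecessor, float("inf")):
                dist[predecessor] = candidate
                heapq.heappush(queue, (candidate, predecessor))
    return dist


def shortest_via_each_edge(graph, node, target):
    """Shortest path length to target when leaving node via each outgoing edge."""
    dist = shortest_distances_to(graph, target)
    return {successor: length + dist[successor]
            for successor, length in graph[node].items() if successor in dist}


# Topology of the Fig. 6 example; lengths in meters are made up.
network = {
    "station": {"1": 50}, "1": {"2": 100}, "2": {"3": 150, "4": 200},
    "3": {"5": 120}, "5": {"dest": 80}, "4": {"6": 100}, "6": {"dest": 150},
    "dest": {},
}
lambdas = shortest_via_each_edge(network, "2", "dest")
lambda_mean = sum(lambdas.values()) / len(lambdas)
```

Running Dijkstra once from the destination avoids a separate search per outgoing edge, which matters when the decision process is repeated at every node for every pedestrian.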

Preference for density dependency
Additionally, the navigation behavior of humans is influenced by the surrounding density of pedestrians. Humans have a preference to imitate the behavior of other people, e.g. to make the same route choices as other pedestrians [36,37]. Especially people with poor knowledge of their surroundings tend to copy the route choices of other people. Thus, a high pedestrian density on an edge increases this so-called "herding behavior". A suitable description of this behavior was given by Schadschneider et al. [38], who applied the established ant algorithm from Dorigo et al. [39] to pedestrian dynamics. In the scope of this approach, the influence of other humans on the route choice is valid only if these people are visible to the pedestrian $p_i$. In contrast, if the density is too high, this decreases the attractiveness of a route, and pedestrians start to avoid such edges: streets that are too crowded ($\rho \geq 0.5$ Ped/m²) affect the tactical behavior of pedestrians [22]. Thus, an algorithm that aims to cover density has to model both aspects. The edge $s_m$ runs linearly between $e_m$ and $e_{m+1}$. Thus, every pedestrian on $s_m$ can see all the other pedestrians on this edge. This means that a pedestrian $p_j$ is visible on $s_m$ for the time period during which she walks along this edge. The velocity $v_{j,k}$ equals the walking velocity of a pedestrian $p_j$ on edge $s_k$, and $\tau_j$ is the pedestrian's starting time at the station. The number of all visible pedestrians on this edge determines the perceived density of this edge $s_m$ for a pedestrian $p_i$. The parameter $b_m$ describes the width of an edge $s_m$. We assume that the width $b_m$ of an edge is approximately constant along its whole length $|s_m|$. This is a reasonable assumption if we consider straight streets or well-developed pedestrian walkways. However, if the width $b_m$ varies significantly, we recommend dividing such an edge into two edges with different widths.
The number of all pedestrians p j that are visible for a pedestrian p i at an edge s m at the time t m,i is given by: Parameter t m,i is the moment a pedestrian p i would enter the intersection e m .At this time, the pedestrian p i has to decide which edge to choose next.Therefore, the local density at this moment influences the decision making process.This point in time can be calculated by We use the established parabolic relation by Greenshield [40] to model density-dependent behavior.The Greenshield approach is based on the fundamental diagram of transportation sciences.It describes the dependency between traffic flow and the local density.Our scoring system is based on this approach to model the contrary density behavior: The parameter ρ max = 5.4 Ped/m 2 describes the amount of density at which the crowd flow stops for unidirectional pedestrian movements [22].This value corresponds to the highest possible density on an edge.most pedestrians are walking in the same direction (e.g.pedestrian flows to a public event), a unidirectional flow is given.Otherwise, bidirectional flows should be considered.The fundamental relation between density and velocity for bidirectional flow is an often discussed research question [23,[41][42][43][44].The research results show differences between the fundamental diagram of uni-and bidirectional pedestrian flow [43].Furthermore, the fundamental relation for bidirectional flows depends on the relative strength of these two interacting flows [23].As a result, no definite maximal density can be given for the case of bidirectional flows.Consequently, we use the maximal density of unidirectional flows as an approximation for bidirectional flows.
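The density terms described above can be illustrated with a short sketch. It counts the pedestrians present on an edge at the decision time, converts that count into a perceived density using the edge area (width times length), and evaluates a parabolic Greenshield-type score. The function names and the normalization of the parabola to a peak value of 1 at ρ max /2 are assumptions made for this illustration; the exact scoring function of the approach is not reproduced here.

```python
RHO_MAX = 5.4  # Ped/m^2, maximal density for unidirectional flow [22]


def stay_interval(entry_time, edge_length, velocity):
    """Time interval during which a pedestrian is on (and visible on) an edge."""
    return (entry_time, entry_time + edge_length / velocity)


def visible_count(decision_time, intervals):
    """Number of pedestrians on the edge at the decision time."""
    return sum(1 for t_in, t_out in intervals if t_in <= decision_time <= t_out)


def perceived_density(count, width, length):
    """Pedestrians per square meter on an edge of constant width."""
    return count / (width * length)


def greenshield_score(rho, rho_max=RHO_MAX):
    """Parabolic score: 0 for an empty edge and at rho_max, 1 at rho_max / 2.

    Assumption: the parabola is normalized to a peak value of 1.
    """
    rho = min(max(rho, 0.0), rho_max)
    x = rho / rho_max
    return 4.0 * x * (1.0 - x)


# Illustrative values: three pedestrians enter a 120 m long, 4 m wide edge.
intervals = [
    stay_interval(0.0, 120.0, 1.2),
    stay_interval(30.0, 120.0, 1.0),
    stay_interval(200.0, 120.0, 1.5),
]
count = visible_count(60.0, intervals)
rho = perceived_density(count, width=4.0, length=120.0)
score = greenshield_score(rho)
```

The parabola captures both effects named in the text: a moderately populated edge scores higher than an empty one, while the score falls again as the density approaches ρ max .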
In a next step, the rating of each route is compared to the average rating value ζ ∅,i of all outgoing edges of node e m : The factor ζ m,i = ±0.93determines the influence of the density dependencies and is based on our experimental observations (see Sec. 4.1).

Field study: A public event
The Oppilatio + method was validated on the basis of a local open air music festival in the summer of 2015. This annual event took place in the metropolitan region of Munich and was visited by approximately 5000 persons [45]. The largest share of visitors was under the age of 30. Most of them arrived by public transport. They disembarked at the nearby subway station and walked from there to the entry area of the music festival. We tracked 733 visitors on their way from the station to the actual event site to verify the routing suggestions calculated by Oppilatio + . This field experiment was executed by student assistants, who followed visitor groups to record their trajectories with GPS-capable mobile phones. We tracked these pedestrians in 79 GPS records over the whole duration of the event. The sizes of the tracked groups varied between 2 and 41 persons. In 60 of the 79 records, the group size was smaller than 10 persons. To obtain relevant paths only, we discarded all tracks with routes that were chosen by fewer than one percent of all recorded visitors. As a consequence, we discarded 8 tracks with a total of 35 visitors. The 71 valid tracks are visualized in Fig. 10. Under open sky conditions, the GPS devices in mobile phones have an average accuracy error between 1 and 5 meters [46,47]. In our field study, nearby buildings may have had a negative impact on the accuracy of our collected GPS data. These deviations are the reason why some of the recorded trajectories pass through buildings although these facilities were inaccessible to the festival visitors (see Fig. 10).
Based on the recorded trajectories, we built a network graph as a test scenario for Oppilatio + . On this graph, we determined the probability that visitors will use a specific edge s m on their way from the station to the event site. These probabilities were compared with the probabilities calculated by Oppilatio + . In the results table (Tab. 2), edges with the same distribution of pedestrians are combined into edge sequences (see Tab. 1). This was only done if the edges were connected to each other and there were neither forks nor junctions between the edges of an edge sequence. Under these conditions, every pedestrian who enters one edge of an edge sequence has to visit every other edge of this edge sequence. For example, the scenario layout forces every pedestrian who enters edge 03 to also enter edge 05, since there are no route choices between these two edges. Thus, these two edges were combined into one edge sequence for Tab. 2, since they will be visited by the same pedestrians. Tab. 1 lists all non-trivial edge sequences in this scenario. Edges 01 and 20 are trivial, since they contain all pedestrians of the scenario.
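The preprocessing of the tracked routes described above can be sketched as follows. The track format (a route given as a tuple of edge identifiers plus a group size), the function names, and the choice to compute the edge probabilities over the remaining valid tracks are assumptions for this illustration.

```python
from collections import defaultdict


def filter_rare_routes(tracks, min_share=0.01):
    """Drop tracks whose route was chosen by fewer than min_share of all visitors.

    tracks is a list of (route, group_size); a route is a tuple of edge ids.
    """
    total = sum(size for _, size in tracks)
    visitors_per_route = defaultdict(int)
    for route, size in tracks:
        visitors_per_route[route] += size
    return [
        (route, size)
        for route, size in tracks
        if visitors_per_route[route] / total >= min_share
    ]


def edge_probabilities(tracks):
    """Share of tracked visitors who used each edge."""
    total = sum(size for _, size in tracks)
    visitors_per_edge = defaultdict(int)
    for route, size in tracks:
        for edge in set(route):  # count each edge once per track
            visitors_per_edge[edge] += size
    return {edge: count / total for edge, count in visitors_per_edge.items()}


# Illustrative tracks (edge ids and group sizes are made up for this sketch).
example_tracks = [
    (("01", "03", "05", "20"), 100),
    (("01", "03", "05", "20"), 50),
    (("01", "02", "20"), 49),
    (("01", "04", "20"), 1),  # 0.5 % of all visitors, will be discarded
]
valid_tracks = filter_rare_routes(example_tracks)
probabilities = edge_probabilities(valid_tracks)
```

In this example, edges 03 and 05 receive identical probabilities because every route that contains one also contains the other, which is the property used to merge them into one edge sequence.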
A total of 5991 visitors were time-stamped when they entered the public event. Following our approach, we were able to assign 4387 of these visitors to one of the incoming subways of the nearby subway station (see Sec. 3.2). According to the velocity distribution of leisure travel [22], we assumed an average velocity of 0.99 m s −1 with a maximal divergence of ±1.5σ = 0.39 m s −1 . All pathways in the scenario were sufficiently wide, so no crowd congestions could occur. Thus, we assumed a large width of 25 m. The results from Oppilatio + for each herding parameter were averaged over 500 calculation runs (see Tab. 2).
We developed a procedure to determine the best-fitting herding parameter according to our calculation results. In a first step, the differences between the field data D {m} and the calculated results E {m} have to be determined for all edge sequences {m}: We defined a grading parameter E ∆ ε to compare the concordance of the data and the calculation results for different herding factors ∆ ε (see Tab. 2). Grading parameter E ∆ ε is the cumulated and weighted divergence of all edge sequences. We weighted the impact of ∆ {m} for each edge sequence {m} by the total length of this sequence (see Tab. 1). This ensures that long edge sequences contribute more to the grading parameter E ∆ ε than short ones. A low E ∆ ε value corresponds to a low divergence between data and calculated values.
The results are shown in Fig. 11. Red crosses represent all data differences ∆ {m} of individual edge sequences {m}. Black triangles are the cumulated and weighted data divergences E ∆ ε from all edge sequences. These divergences are shown for each chosen herding factor ∆ ε . We determined these divergences for various herding factors ∆ ε and detected a minimum for ∆ ε = 0.93. Thus, we assume that the herding behavior was the main influencing factor for the route choices of the arriving visitors. Such a strong herding behavior is consistent with the observations of the event organizers during our field study [45].
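The velocity assumption above implies σ = 0.39 / 1.5 = 0.26 m s −1 , so that velocities lie between 0.60 m s −1 and 1.38 m s −1 . A minimal sketch of drawing velocities from such a bounded distribution is given below. The use of a normal distribution with rejection of values outside the bounds, and the function names, are assumptions for this illustration.

```python
import random


def sigma_from_divergence(max_divergence, k=1.5):
    """Standard deviation implied by a maximal divergence of +/- k * sigma."""
    return max_divergence / k


def sample_velocity(mean, max_divergence, k=1.5, rng=random):
    """Draw a walking velocity, rejecting values outside mean +/- max_divergence.

    Assumption: velocities are normally distributed within the bounds.
    """
    sigma = sigma_from_divergence(max_divergence, k)
    while True:
        velocity = rng.gauss(mean, sigma)
        if abs(velocity - mean) <= max_divergence:
            return velocity


# Leisure traffic as used for the public event: 0.99 m/s +/- 0.39 m/s.
generator = random.Random(42)
velocities = [sample_velocity(0.99, 0.39, rng=generator) for _ in range(1000)]
```

About 87 % of normally distributed draws fall within ±1.5σ, so the rejection loop terminates quickly.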
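The grading procedure can be sketched as follows. Because the exact definitions are not reproduced here, two assumptions are made and should be read as such: the divergence of an edge sequence is taken as the absolute difference between field data and calculated share, and the length-weighted sum is normalized by the total length of all edge sequences. The function names and the numbers in the example are illustrative only.

```python
def divergences(field, calculated):
    """Absolute difference between field data and calculation per edge sequence."""
    return {m: abs(field[m] - calculated[m]) for m in field}


def grading_parameter(field, calculated, lengths):
    """Length-weighted, cumulated divergence over all edge sequences.

    Assumption: the weighted sum is normalized by the total length.
    """
    delta = divergences(field, calculated)
    total_length = sum(lengths[m] for m in delta)
    return sum(lengths[m] * delta[m] for m in delta) / total_length


def best_herding_factor(field, results_by_factor, lengths):
    """Herding factor whose calculation results diverge least from the field data."""
    return min(
        results_by_factor,
        key=lambda factor: grading_parameter(field, results_by_factor[factor], lengths),
    )


# Illustrative shares of pedestrians per edge sequence (made-up numbers).
field_data = {1: 0.60, 2: 0.40}
sequence_lengths = {1: 100.0, 2: 300.0}
results = {
    0.00: {1: 0.45, 2: 0.55},
    0.50: {1: 0.52, 2: 0.48},
    0.93: {1: 0.59, 2: 0.41},
}
best = best_herding_factor(field_data, results, sequence_lengths)
```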
During the peak hours, most visitors walked along the edge sequences {1}−{5}−{10} to the event site. Thus, the Oppilatio + approach was able to determine the edge sequences which were visited by the largest share of pedestrians. Larger differences between the field study data and the pedestrian distributions calculated by Oppilatio + could be observed on the edge sequences {4} and {8}. In both cases, the calculated values were higher than the data measured during the field study. Edge sequence {8} is surrounded by two large building complexes. Thus, this route seems darker and is less visible and attractive than edge sequence {9}. We overestimated the total attractiveness of this sequence, since our approach does not consider the attractiveness of illumination. The overestimation of edge sequence {4} can be explained by the influence of herding. In reality, the frequently used edge sequence {1} is visible from edge sequence {2}. Thus, the herding on this route may have influenced many pedestrians to choose edge sequence {3} instead of {4}. Our method considers herding on neighboring edges only, so this influence was not taken into account by the Oppilatio + approach.
Another important aspect is the scattering of calculation results. Since we use a probability-based approach, calculation runs of the same scenario can lead to different results. A small scattering corresponds to a small difference between individual calculation runs. A suitable parameter to quantify the amount of scattering is the standard deviation σ {m} . In our case, the standard deviation σ {m} for an edge sequence {m} describes the difference between its individual calculation runs x i and the averaged value x of all 500 calculation runs: We normalize σ {m} by its expected value x{m} to make the standard deviations of different edge sequences {m} comparable to each other. Doing so, we obtain the coefficient of variation V {m} : Following Eq. 24, we introduce a rating parameter V ∆ ε , which describes the cumulated and weighted scattering of all edge sequences for a specific herding factor ∆ ε : The results for V ∆ ε and V {m} according to the herding factor ∆ ε are shown in Fig. 12. Red crosses represent the scattering V {m} of individual edge sequences {m}, and the black triangles are the cumulated and weighted scattering V ∆ ε from all edge sequences. The cumulated and weighted coefficient of variation V ∆ ε increased from less than 2% for no herding (∆ ε = 0.00) to more than 20% in the case of full herding (∆ ε = 0.99). Thus, it is reasonable to assume that a larger herding factor ∆ ε increases the scattering of the calculation results. If we assume a causal relationship between the herding factor and the coefficient of variation, we can explain it by the high instability of herding behavior. If a small number of people choose a specific route at the beginning, following pedestrians will more likely choose the same route. Thus, a small change in the beginning can have a large impact on the overall pedestrian distribution of the whole scenario.
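The scattering measures described above can be sketched as follows. The use of the population standard deviation (division by the number of runs), the normalization of the length-weighted sum by the total length, and the function names are assumptions for this illustration.

```python
import statistics


def coefficient_of_variation(runs):
    """Standard deviation of the calculation runs divided by their mean.

    Assumption: population standard deviation; the mean must be non-zero.
    """
    mean = statistics.fmean(runs)
    return statistics.pstdev(runs) / mean


def weighted_scattering(runs_by_sequence, lengths):
    """Length-weighted coefficient of variation over all edge sequences."""
    total_length = sum(lengths[m] for m in runs_by_sequence)
    weighted_sum = sum(
        lengths[m] * coefficient_of_variation(runs)
        for m, runs in runs_by_sequence.items()
    )
    return weighted_sum / total_length


# Illustrative results of repeated runs for two edge sequences (made-up numbers).
example_runs = {1: [0.40, 0.60], 2: [0.50, 0.50]}
example_lengths = {1: 100.0, 2: 300.0}
scattering = weighted_scattering(example_runs, example_lengths)
```

Because each coefficient of variation is dimensionless, edge sequences with very different pedestrian shares can be combined into a single value per herding factor.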

Case study: A large railway station
We carried out a case study in cooperation with Swiss Federal Railways (SBB), Switzerland's national railway company. This case study is based on data collected at the railway station "Zürich Stadelhofen". The station is heavily frequented, so crowd congestions arise quite often during rush hour. It is necessary to gain a better understanding of passenger flows inside this station to prevent such inconveniences. Thus, we applied the Oppilatio + method to predict the distribution of flows. This is more challenging than the public event described in Sec. 4.1. In the railway station scenario, multiple origins and destinations must be considered (see Fig. 13). In contrast to the public event, the pedestrians move over two different floors connected by staircases. The lower walking velocity of pedestrians on staircases [22] was modeled by extending the corresponding edge length. All data for the case study was collected during a working week in January 2016 in the morning rush hour from 7:00 AM to 10:00 AM. Therefore, the pedestrian traffic consists mainly of commuters. Commuters are typically well aware of the route they want to take, since they tend to take the same route every day. Consequently, we assumed that commuters are not influenced by herding behavior and used a herding factor of ∆ ε = 0.00.
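The staircase treatment can be illustrated with a short sketch: an edge is lengthened so that walking it at the regular velocity takes as long as walking the real staircase at the reduced velocity. The function name and the stair velocity used in the example are hypothetical values chosen for illustration, not the values used in the case study.

```python
def effective_length(length, flat_velocity, stair_velocity):
    """Edge length that reproduces the stair travel time at the flat velocity."""
    return length * flat_velocity / stair_velocity


# Illustrative values: a 10 m staircase, 1.5 m/s on level ground,
# and a hypothetical 0.75 m/s on stairs.
stair_edge = effective_length(10.0, flat_velocity=1.5, stair_velocity=0.75)

# The travel time is preserved: 20 m at 1.5 m/s equals 10 m at 0.75 m/s.
time_on_extended_edge = stair_edge / 1.5
time_on_real_stairs = 10.0 / 0.75
```

This keeps a single velocity per pedestrian while still accounting for slower movement on stairs.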
The commuters arrive by train on a station platform and walk from this platform (red nodes in Fig. 13) to one of the different exit doors (blue nodes in Fig. 13) to leave the railway station. All exit doors were equipped with automatic counting devices to keep track of the in- and outflow of pedestrians separately. Consequently, time-stamped data for every passenger leaving this facility is available. We used these measurements as input data for the Oppilatio + approach. Additionally, field data was available from counting devices at different places inside the railway station (numbers with gray backgrounds in Fig. 13). We used these data to compare the arising local pedestrian flows with the densities calculated by our approach. For the calculation, we used an average velocity of 1.49 m s −1 with a maximal divergence of ±1.5σ = 0.39 m s −1 to describe the velocity distribution of commuter traffic [22]. The walkway width within the station was assumed to be 3 m. The results for each day were averaged over five calculation runs.
During the morning rush hours of the observed working week, between 17000 and 20000 pedestrians were counted each day at the exit doors of the station. Approximately three quarters of the measured pedestrians were leaving and one quarter were entering the railway station. Only pedestrians leaving the station can originate from one of the arriving trains. In contrast, incoming pedestrians cannot originate from trains and therefore must be considered disruptive noise in our measured field data (see Tab. 3). This problem was much larger in the evening rush hour from 4:00 PM to 7:00 PM. In this time period, approximately 23000 people entered or left the railway station. Unfortunately, two thirds of these pedestrians were entering the station and therefore did not originate from any train. In this case, the disruptive noise is much larger than the actual measured data. Therefore, only the morning rush hour was used for this case study. Another disruptive factor is problems with the counting devices themselves. For example, the data from counting station 4 seems incomplete, since approximately 500 persons were measured at the outflow, but only a few pedestrians at the inflow of this edge. Furthermore, the network we use as a scenario for our approach is always a simplification of reality. For example, 10 OUT and 11 IN are the same directed edge in our scenario, and therefore the same number of people should be counted at the corresponding counting devices. If we look at the counted field data, we recognize that the counting devices at the two ends of this edge counted different numbers of pedestrians. This means that some measured pedestrians entered this edge from node 7 , but did not reach node 9 . A possible explanation is that some pedestrians changed their walking direction and turned around while on this edge. In our opinion, the largest uncertainty factor is the allocation of pedestrians to the different arriving trains (see Sec. 3.2). During the rush hour of each day, 104 trains arrived, which means that the average time gap between two incoming trains was less than two minutes. Consequently, a possible train could be assigned to almost every measured pedestrian, although many pedestrians originated from different locations (e.g. from another entrance of the station). This results in an overestimation of the pedestrian flow, since many pedestrians just crossed the railway station and therefore were only counted at a few counting devices. If many trains arrive at the same time and many different origins exist, it is more difficult to obtain significant results.
Nevertheless, the Oppilatio + approach produced some meaningful results. For example, our approach detected the most crowded directed edge 2 OUT for all five observed days, as well as two of the five most crowded edges. So even in such a more complex scenario, our approach is able to detect potential crowd congestions. Tab. 4 shows the comparison between the available data at the counting devices and the crowd flow on the corresponding edges calculated by our method. The numbers Π are the locations of the counting devices shown in Fig. 13. At the numbered locations, the number of pedestrians was measured for both directions. Overall, measured and calculated data agree well. However, due to the complexity of this scenario, some edges are largely overestimated by our method. The most problematic one is counting device 4 IN . Our approach estimated that approximately a fifth of all passengers passed this edge, although the devices detected only a few persons.
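The train allocation problem discussed above can be made concrete with a small sketch. With 104 trains in three hours, the mean time gap is 10800 s / 104 ≈ 104 s. The feasibility check below is an illustration of the general idea only and does not reproduce the allocation procedure of Sec. 3.2: a train is treated as a possible origin if a pedestrian walking a given path length at a velocity between v min and v max would arrive at the exit at the measured time. The function names, the path length, and the train times are hypothetical.

```python
def mean_headway(duration_seconds, train_count):
    """Average time gap between two arriving trains in seconds."""
    return duration_seconds / train_count


def candidate_trains(detection_time, arrivals, path_length, v_min, v_max):
    """Arrival times of trains from which a detected pedestrian could originate.

    All times in seconds, path_length in meters, velocities in m/s.
    """
    earliest_start = detection_time - path_length / v_min  # slowest walker
    latest_start = detection_time - path_length / v_max    # fastest walker
    return [t for t in arrivals if earliest_start <= t <= latest_start]


# Commuter velocities from the text: 1.49 m/s +/- 0.39 m/s.
V_MIN, V_MAX = 1.49 - 0.39, 1.49 + 0.39

headway = mean_headway(3 * 3600, 104)
arrivals = [0.0, 104.0, 208.0]  # hypothetical arrival times
matches = candidate_trains(230.0, arrivals, path_length=150.0, v_min=V_MIN, v_max=V_MAX)
```

In a real station, several platforms and paths of different lengths lead to each exit, so the feasible time windows of the individual paths overlap and a candidate train exists for nearly every detection, including pedestrians who did not arrive by train.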

Conclusion and Outlook
We presented a new method to detect and analyze the walking behavior of pedestrians. The Oppilatio + approach is suitable for estimating route choices based on a very limited set of easily collectible data. To achieve this, a grading factor based on cognitive insights is used to calculate routes preferred by human beings. More precisely, the rating uses the tendency of pedestrians to prefer beeline-oriented routes with few direction changes and longest legs. Furthermore, we included the pedestrians' preference for routes with a low total length and the tendency to follow the same routes as other pedestrians. The calibration of these cognitive parameters is based on experimental findings from a large experiment by Kneidl et al. [25,27]. Additionally, we validated our results with field data from a local music festival and conducted a case study for a large Swiss train station. Thus, we conclude that Oppilatio + is a useful method to obtain information about the behavior of human crowds. Because it is easy to apply, it is well-suited for crowd managers, city planners, and event organizers analyzing the flow of human crowds.
Because it is an approximation method based on sparse data, the method will not produce error-free estimations. Differences between the actual behavior and the estimated behavior are likely to occur. However, the data captured in our field studies show that the error is acceptable and the outcomes can be used as a basis for decision making. Oppilatio + is a good estimation method to support crowd management, but needs specific boundary conditions to function properly. One important drawback compared to pedestrian dynamics simulations is the necessity of collecting data at the destinations. This information is essential for the usage of Oppilatio + . Furthermore, the application field is mainly restricted to scenarios with a clocked inflow of pedestrians (e.g. shuttle buses, trains, subways). A continuous inflow of pedestrians produces too much noise for our approximation method.
A benefit of our approach is the low computational time compared to pedestrian dynamics simulations. Our approach can calculate the public-event scenario with approximately 5000 visitors in less than 30 seconds (see Sec. 4.1) and the complex railway-station scenario with 13000 passengers in a few minutes (see Sec. 4.2) on a recent standard personal computer. The parameter weighting of α(o m ), β (ω * m ), γ(l m ) and δ (λ m ) is based on a field study. Test subjects of these experiments were students of a local university [25,27]. Thus, it is possible that the determined weighting parameters are biased. Further research studies are necessary to make a more general statement about these parameter values. The herding parameter ε(ζ m,i ) was determined via empirical data (see Sec. 4.1) for a public event. Unfortunately, the current literature provides no information about the influence of the herding factor on human navigation. Thus, we propose Tab. 5 with assumed herding factors for different scenarios, based on our personal experience. More validated research about the influence of herding in the context of crowd dynamics is urgently needed.
In further research, various extensions are planned for the Oppilatio + method. One main issue concerns the layout input: at the current state, event organizers have to define possible walking paths from stations to destinations manually. Thus, we are planning an integration with a network design approach. This method optimizes a given network scenario based on insights from cognitive science. In this way, it is possible to determine all routes that are Pareto-optimal regarding their attractiveness in the scope of human routing behavior. The network scenario will be read out automatically from open geodatabases. In this way, experts who use the Oppilatio + method will obtain a set of routes most likely used by the pedestrians. These routes can be used as an optimized scenario input.
Based on our field study, we recognized that the inclusion of local illumination would improve the results of the Oppilatio + approach. Unfortunately, the collection of illumination data is quite complicated under real-life conditions. Furthermore, the aspect of density-dependent deceleration is currently not covered. This means that the velocity, which is assigned to a pedestrian as she enters an edge, is independent of the current pedestrian density. This velocity is calculated based on the paths which are available for this pedestrian (see Sec. 3.3). Therefore, we can ensure that each pedestrian will reach her destination at the correct time. If we introduce a density-dependent velocity, we cannot guarantee that all pedestrians will reach the destination at the time they were measured. Consequently, we decided not to include a density-dependent velocity. However, we will work on this issue in future publications.
In future research, we will consider the problem of background noise produced by undetected pedestrians. In the Oppilatio + approach, only pedestrians who enter the destinations are detected and treated in the calculation. Pedestrians who have other destinations are not considered. Nevertheless, these undetected persons have an influence on the measured pedestrians in the real world. We can consider this influence by adding random density noise to the edges of our network. However, large field studies are necessary to determine the correct noise value for different scenarios. For example, the noise value for the field study on the festival was approximately zero, since nearly all pedestrians were visitors of this event. In contrast, the noise has a much larger influence on a scenario which contains shopping streets. Future field studies should address this research question.
Another, more technical issue is the integration of automatic counting devices. If a scenario is analyzed after all data was recorded (e.g. in the field studies in Sec. 4), this is not an issue, since all counting data is available. However, it is an issue if our method is applied simultaneously to the data measurement (e.g. during a public event) for the sake of a "live" analysis of incoming crowds. In these cases, new counting data has to be added to the Oppilatio + method continuously.
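The proposed density noise could look like the following sketch. It is only one possible reading of this future extension: the uniform distribution of the noise, the capping at ρ max , and the function name are assumptions, and the noise level would have to be calibrated by the field studies mentioned above.

```python
import random

RHO_MAX = 5.4  # Ped/m^2, maximal density for unidirectional flow [22]


def add_density_noise(densities, noise_level, rho_max=RHO_MAX, rng=random):
    """Add a random background density to every edge.

    densities maps an edge id to the density caused by detected pedestrians.
    Assumption: the noise is uniformly distributed in [0, noise_level].
    """
    noisy = {}
    for edge, rho in densities.items():
        background = rng.uniform(0.0, noise_level)
        noisy[edge] = min(rho + background, rho_max)
    return noisy


# Illustrative densities in Ped/m^2 (made-up numbers).
detected = {"01": 0.30, "02": 0.05, "03": 0.00}
festival = add_density_noise(detected, noise_level=0.0)  # nearly no other pedestrians
shopping = add_density_noise(detected, noise_level=0.4, rng=random.Random(1))
```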

Figure 1
Figure 1 Based on easily accessible data, Oppilatio + calculates the distribution of pedestrians on a network of walkways.

Figure 3
Figure 3 Multiscale view on pedestrian dynamic simulations: the macroscopic, mesoscopic and microscopic scale

Figure 5
Figure 5 An example of how the station ϒ s and the starting time τ i are determined for a pedestrian p i . The H-symbols in this scenario represent stations of the public transport services.

Figure 7
Figure 7 Beeline derivation for the edge s m between the node e m and its subsequent node e m+1

Figure 9
Figure 9 The shortest path leads from e m over s m , s m+1 and s m+2 to the destination Θ. The edge colors visualize the density on these edges. The density ρ m is compared to the average density of all outgoing edges from e m .

Figure 10
Figure 10 Layout of the researched field study with a visualization of the tracked GPS-data.The number of the directed edges corresponds to the edge sequences in Tab. 1

Figure 11
Figure 11 Data divergence of ∆ {m} and E ∆ ε according to the herding factors ∆ ε .

Figure 12
Figure 12 Scattering V {m} and V ∆ ε of the calculation runs according to the herding factor ∆ ε .

Figure 13
Figure 13 Network of the railway station scenario. Pedestrians walk from station platforms (red nodes) to the different exits of the railway station (blue nodes). Red edges represent staircases between the two floors of this station. Measured pedestrian flow was counted at different counting locations Π (numbers with gray background).

Table 5
Velocities from Weidmann [22] and assumed herding factors

Scenario | Average velocity [22] | Herding factor
Commercial Traffic | 1.45 m s −1 to 1.61 m s −1 | 0.00 − 0.05
Commuter Traffic | 1.34 m s −1 to 1.49 m s −1 | 0.00 − 0.15
Shopping Traffic | 1.04 m s −1 to 1.16 m s −1 | 0.25 − 0.40
Public Event | 0.99 m s −1 to 1.10 m s −1 | 0.80 − 0.95

Table 1
All non-trivial edge sequences {m} and their lengths (see Fig. 10)

Table 3
Overview of the number of pedestrians detected at the exits of the railway station over the observed time period and the number of pedestrians assigned to an arriving train by Oppilatio + .

Table 4
Percentage results from the case study compared with the results from Oppilatio + . The percentage share refers to Π D for the measured data and Π O for the calculation results from Oppilatio + (see Tab. 3). Numbers Π correspond to the locations of counting devices shown in Fig. 13.