CHAPTER 3

A Closer Look at Scoring

Kai Nagel, Benjamin Kickhöfer, Andreas Horni and David Charypar

3.1 Good Plans and Bad Plans, Score and Utility

As outlined in Section 1.4 and by Figures 1.1 and 1.4, MATSim is based on a co-evolutionary

algorithm: Each individual agent learns by maintaining multiple plans, which are scored by executing

them in the mobsim, selected according to the score and sometimes modified. In somewhat more

detail, the iterative process contains the following elements:

mobsim The mobility simulation takes one “selected” plan per agent and executes it in a synthetic

reality. This may also be called network loading.

scoring The actual performance of the plan in the synthetic reality is taken to compute each

executed plan’s score.

replanning consists of several steps:

1. If an agent has more plans than the maximum number of plans (a configuration

parameter), then plans are removed according to a (configurable) plan selector (choice

set reduction, plans removal).

2. For some agents, a plan is copied, modified and then selected for the next iteration

(choice set extension, innovation).

3. All other agents choose between their plans (choice).
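
The three replanning steps can be sketched in a few lines of Python. This is a minimal illustration, not MATSim's implementation: the plan representation (a dict with a "score" entry), the constants MAX_PLANS and INNOVATION_SHARE, the worst-score removal rule, the logit-type choice, and the `mutate` callback are all assumptions made for the example.

```python
import math
import random

MAX_PLANS = 5            # assumed value of the "maximum number of plans" parameter
INNOVATION_SHARE = 0.1   # assumed probability that an agent innovates in an iteration


def remove_worst(plans, max_plans=MAX_PLANS):
    """Step 1 (choice set reduction): keep only the max_plans best-scoring plans."""
    ranked = sorted(plans, key=lambda plan: plan["score"], reverse=True)
    return ranked[:max_plans]


def choose_logit(plans, rng, scale=1.0):
    """Step 3 (choice): draw a plan with probability proportional to exp(scale * score)."""
    best = max(plan["score"] for plan in plans)
    # Subtracting the best score keeps exp() from overflowing; probabilities are unchanged.
    weights = [math.exp(scale * (plan["score"] - best)) for plan in plans]
    return rng.choices(plans, weights=weights, k=1)[0]


def replan(plans, selected, mutate, rng, innovation_share=INNOVATION_SHARE):
    """Return (new choice set, plan selected for the next iteration).

    All plans passed in are assumed to carry a numeric score already.
    """
    plans = remove_worst(plans)
    if rng.random() < innovation_share:
        # Step 2 (choice set extension): copy, modify, and select the new plan.
        new_plan = mutate(dict(selected))
        new_plan["score"] = None  # unknown until the mobsim has executed the plan
        return plans + [new_plan], new_plan
    return plans, choose_logit(plans, rng)
```

With a seeded generator such as `random.Random(0)`, repeated calls to `replan` give a reproducible sequence of choice sets, which is convenient when experimenting with different removal or choice rules.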

An agent’s plans in a given iteration may be considered the agent’s choice set in that iteration. As a

result, steps 1 and 2 of replanning modify the choice set, while step 3 implements the actual choice

between options. Choice is typically based on the score; higher score plans are more likely to be

selected. This is discussed in more detail in Chapters 47 and 49. For the time being, note that the three

steps of replanning must cooperate for the approach to work: the plans removal step should remove

“bad” plans, the innovation step should generate “good” plans, and the choice should, in general,

select good plans. Here, “good” means “able to obtain a high score in the mobsim/scoring”. Fortunately,

due to its evolutionary concept, the approach is fairly robust: the innovation step does not

always have to generate good solutions; it is sufficient if some of the solutions are good and lead to

a high score.

How to cite this book chapter:

Nagel, K, Kickhöfer, B, Horni, A and Charypar, D. 2016. A Closer Look at Scoring. In: Horni, A, Nagel, K

and Axhausen, K W. (eds.) The Multi-Agent Transport Simulation MATSim, Pp. 23–34. London: Ubiquity

Press. DOI: http://dx.doi.org/10.5334/baw.3. License: CC-BY 4.0

With this, it is clear that scoring is a central element of MATSim. Only solutions obtaining a high

score will be selected by the agent and survive the plans removal step. Thus, the scoring function

needs to be “correct” for a given scenario, meaning, more or less, that plans “performing well”

obtain a higher score than plans that “do not perform well”. Whether a performance is good or

not, is decided, in the end, by travelers living in a region: some may prefer a congested car trip,

others may prefer a crowded, but affordable, trip by public transit, while others may prefer using

the bicycle, even in bad weather.

The typical way to bridge this gap is to use econometric utility functions, for example, from

random utility models (e.g., Ben-Akiva and Lerman, 1985; Train, 2003) for the score. However, in

AI (Artificial Intelligence), utility functions may also be used in a more general way: for example,

the score that each individual agent (or the system as a whole) wants to, or should, optimize (Russell

and Norvig, 2010). For these reasons, the terms “score” and “utility” are normally interchangeable

in the MATSim context. Because we will need the concept of a marginal utility, this chapter will mostly

speak of ’utility’; it would be a bit unusual to talk about a ’marginal score’.

The user can configure numerous parameters to specify the scoring function. When users are

ready to extend MATSim in the next part of the book, they will also learn how to plug in their own

customized scoring function.

However, because MATSim is based on complete day plans, the application of choice models

for parts of day plans only (for example, mode choice) is not straightforward, as detailed in

Section 97.4.4. Because of the absence of complete-day utility functions in the literature, MATSim has

started with the so-called Charypar-Nagel scoring or utility function (Section 3.2). This scoring

function was, at times, modified, extended, or replaced for specific investigations (Section 3.5).

Readily applicable estimates for a full-day utility function are not yet available, as discussed in

Section 97.4.4.

3.2 The Current Charypar-Nagel Utility Function

3.2.1 Mathematical Form

The first, and still basic, MATSim scoring function was formulated by Charypar and Nagel (2005),

loosely based on the Vickrey model for road congestion, as described by Vickrey (1969) and Arnott

et al. (1993). Originally, this formulation was established for departure time choice. However, all

studies performed so far indicate that the MATSim function is also appropriate for modeling further

choice dimensions. It is, however, almost certainly not appropriate for activity dropping and

activity addition (see Section 3.3).

Basic Function For the basic function, the utility of a plan $S_{plan}$ is computed as the sum of all activity

utilities $S_{act,q}$ plus the sum of all travel (dis)utilities $S_{trav,mode(q)}$:

$$
S_{plan} = \sum_{q=0}^{N-1} S_{act,q} + \sum_{q=0}^{N-1} S_{trav,mode(q)} \tag{3.1}
$$

with $N$ as the number of activities. Trip $q$ is the trip that follows activity $q$. For scoring, the last

activity is merged with the first activity to produce an equal number of trips and activities.

Activities The utility of an activity $q$ is calculated as follows (see also Charypar and Nagel, 2005,

p. 377ff):

$$
S_{act,q} = S_{dur,q} + S_{wait,q} + S_{late.ar,q} + S_{early.dp,q} + S_{short.dur,q}. \tag{3.2}
$$

The individual contributions are defined as follows:

• The expression

$$
S_{dur,q} = \beta_{dur} \cdot t_{typ,q} \cdot \ln(t_{dur,q} / t_{0,q}) \tag{3.3}
$$

is the utility of performing activity $q$, where opening times of activity locations are taken into

account. $t_{dur,q}$ is the performed activity duration, $t_{typ,q}$ is the typical duration of the activity,

$\beta_{dur}$ is related to the marginal utility of activity

duration (or marginal utility of time as a resource, the same for all activities; see Section 3.2.4),

and $t_{0,q}$ is the duration when utility starts to be positive.

• The expression

$$
S_{wait,q} = \beta_{wait} \cdot t_{wait,q}
$$

denotes the utility of waiting time spent, for example, in front of a still-closed store; $\beta_{wait}$ is the so-called

direct (see Section 3.2.4) marginal utility of time spent waiting; and $t_{wait,q}$ is the waiting time.

We recommend leaving $\beta_{wait}$ at zero; also see Section 3.2.5.

• The expression

$$
S_{late.ar,q} =
\begin{cases}
\beta_{late.ar} \cdot (t_{start,q} - t_{latest.ar,q}) & \text{if } t_{start,q} > t_{latest.ar,q} \\
0 & \text{else}
\end{cases}
$$

specifies the late arrival penalty, where $t_{start,q}$ is the starting time of activity $q$ and $t_{latest.ar,q}$ is the

latest possible penalty-free activity starting time (for example, the starting time of the office

core hours, or the starting time of an opera or theater performance).

• The expression

$$
S_{early.dp,q} =
\begin{cases}
\beta_{early.dp} \cdot (t_{earliest.dp,q} - t_{end,q}) & \text{if } t_{end,q} < t_{earliest.dp,q} \\
0 & \text{else}
\end{cases}
$$

defines the penalty for not staying long enough, where $t_{end,q}$ is the ending time of activity $q$ and

$t_{earliest.dp,q}$ is the earliest possible end time of activity $q$. We normally recommend leaving $\beta_{early.dp}$

at zero, except if really good data about this effect is available.

• The expression

$$
S_{short.dur,q} =
\begin{cases}
\beta_{short.dur} \cdot (t_{short.dur,q} - t_{dur,q}) & \text{if } t_{dur,q} < t_{short.dur,q} \\
0 & \text{else}
\end{cases}
$$

is the penalty for a ’too short’ activity, where $t_{short.dur,q}$ is the shortest possible activity duration.

We normally recommend leaving $\beta_{short.dur}$ at zero, except if really good data about this effect is

available.
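
The activity terms can be combined into one small function. The following is a minimal sketch under stated assumptions, not the MATSim code: all times are in hours, the default β values are the ones listed as current defaults in Section 3.4, opening times are ignored, the early-departure term is read as a penalty on the time by which the activity ends before the earliest departure time, and the function and argument names are hypothetical.

```python
import math


def activity_utility(t_dur, t_typ, t_0, *, t_wait=0.0,
                     t_start=None, t_latest_ar=None,
                     t_end=None, t_earliest_dp=None, t_short_dur=None,
                     beta_dur=6.0, beta_wait=0.0, beta_late_ar=-18.0,
                     beta_early_dp=0.0, beta_short_dur=0.0):
    """Utility of one activity; times in hours, betas in utils/h, t_dur > 0."""
    s_dur = beta_dur * t_typ * math.log(t_dur / t_0)
    s_wait = beta_wait * t_wait

    s_late_ar = 0.0
    if t_start is not None and t_latest_ar is not None and t_start > t_latest_ar:
        s_late_ar = beta_late_ar * (t_start - t_latest_ar)

    s_early_dp = 0.0
    if t_end is not None and t_earliest_dp is not None and t_end < t_earliest_dp:
        s_early_dp = beta_early_dp * (t_earliest_dp - t_end)

    s_short_dur = 0.0
    if t_short_dur is not None and t_dur < t_short_dur:
        s_short_dur = beta_short_dur * (t_short_dur - t_dur)

    return s_dur + s_wait + s_late_ar + s_early_dp + s_short_dur


# Work activity with a typical duration of 8 h; t_0 follows Equation (3.7) with prio = 1.
T_0_WORK = 8.0 * math.exp(-10.0 / 8.0)
on_time = activity_utility(8.0, 8.0, T_0_WORK)
half_hour_late = activity_utility(8.0, 8.0, T_0_WORK, t_start=9.5, t_latest_ar=9.0)
```

In this example the on-time activity yields 60 utils, and arriving half an hour after the latest penalty-free start costs 0.5 h · 18 utils/h = 9 utils.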

The config syntax (config version v2) is approximately

...

</parameterset>

...

</module>

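Travel Travel disutility for a leg $q$ is given as

$$
\begin{aligned}
S_{trav,q} = {} & C_{mode(q)} + \beta_{trav,mode(q)} \cdot t_{trav,q} + \beta_m \cdot \Delta m_q \\
& + (\beta_{d,mode(q)} + \beta_m \cdot \gamma_{d,mode(q)}) \cdot d_{trav,q} + \beta_{transfer} \cdot x_{transfer,q}
\end{aligned} \tag{3.4}
$$

where: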

• $C_{mode(q)}$ is a mode-specific constant.

• $\beta_{trav,mode(q)}$ is the direct (see Section 3.2.4) marginal utility of time spent traveling by mode.

Since MATSim uses and scores 24-hour episodes, this is in addition to the marginal utility of

time as a resource (again, see Section 3.2.4).

• $t_{trav,q}$ is the travel time between activity locations $q$ and $q+1$.

• $\beta_m$ is the marginal utility of money (normally positive).

• $\Delta m_q$ is the change in monetary budget caused by fares or tolls for the complete leg (normally

negative or zero).

• $\beta_{d,mode(q)}$ is the marginal utility of distance (normally negative or zero).

• $\gamma_{d,mode(q)}$ is the mode-specific monetary distance rate (normally negative or zero).

• $d_{trav,q}$ is the distance traveled between activity locations $q$ and $q+1$.

• $\beta_{transfer}$ is the public transport transfer penalty (normally negative).

• $x_{transfer,q}$ is a 0/1 variable signaling whether a transfer occurred between the previous and

current leg.
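
Equation (3.4) and the plan sum of Equation (3.1) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, travel time is assumed to be in hours, and distance is assumed to be in kilometers with a monetary distance rate in monetary units per kilometer (units chosen for readability of the example).

```python
def travel_utility(t_trav, d_trav, *, c_mode=0.0, beta_trav=-6.0, beta_m=1.0,
                   delta_m=0.0, beta_d=0.0, gamma_d=0.0,
                   beta_transfer=0.0, transfer=False):
    """Direct utility of one leg following Equation (3.4)."""
    x_transfer = 1.0 if transfer else 0.0
    return (c_mode
            + beta_trav * t_trav                    # direct disutility of travel time
            + beta_m * delta_m                      # fares and tolls (delta_m <= 0)
            + (beta_d + beta_m * gamma_d) * d_trav  # direct and monetary distance terms
            + beta_transfer * x_transfer)           # transfer penalty


def plan_score(activity_utilities, travel_utilities):
    """Equation (3.1): sum of all activity utilities plus all travel (dis)utilities."""
    return sum(activity_utilities) + sum(travel_utilities)


# Hypothetical car leg: 30 minutes, 10 km, toll of 2 monetary units,
# distance cost of 0.2 monetary units per km.
car_leg = travel_utility(0.5, 10.0, delta_m=-2.0, gamma_d=-0.2)
```

The example leg scores -3 utils for travel time, -2 for the toll and -2 for distance cost, that is, -7 utils in total.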

The config syntax (config version v2) is approximately

</parameterset>

...

</module>

Equation (3.4) is the direct utility contribution of travel; see Section 3.2.4 for the full indirect

utility as well as the relation to the VTTS (Value of Travel Time Savings), and Chapter 51 for a more

general discussion.

Note that distance contributes to disutility in two ways. First, it is included in a direct manner

via $\beta_{d,mode(q)}$, which is normal for modes involving physical effort, like walking or cycling. Second,

distance is also included monetarily via $\beta_m \cdot \gamma_{d,mode(q)}$, which is normal for car or pt mode, where

monetary costs increase depending on distance.

3.2.2 Illustration

Figure 3.1 illustrates the scoring function. Time runs from left to right. The example shows part of

an executed schedule, with home, work, and lunch activities, connected by a car leg and a walk leg.

Activities are scored with concave functions, modeling decreasing returns to spending more time

at the same activity. Travel, in contrast, is modeled with downward sloping straight lines, where

the slope may differ for different modes of transport and there may be an initial offset (alternative-specific

constant). Note the delay between arrival at the workplace and workplace opening time,

reflected in no score accumulation during that period. Agents accumulate those scores over a day,

reflected in the bottom graph.

When one assumes all other things (particularly travel times) are equal, then agents maximize

their score when activity durations are such that all activities have the same slope (= the same

marginal utility; red lines). This follows from basic economic theory (cf. Section 51.2), but can also

be seen intuitively; if red lines did not all have the same slope, the agent could gain by extending

those activities with steeper slope at the expense of others. Clearly, this holds only when all other

things remain constant, particularly travel times.
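
The equal-slope condition can be checked numerically. For the logarithmic form, the marginal utility of activity $q$ is $\beta_{dur} \cdot t_{typ,q} / t_{dur,q}$, so with a fixed time budget all slopes are equal when each duration is proportional to its typical duration. The sketch below assumes hypothetical typical durations and a budget of 22.5 hours; the function names are illustrative.

```python
import math


def optimal_durations(typical_durations, time_budget):
    """Durations that equalize beta_dur * t_typ / t_dur under a fixed time budget."""
    scale = time_budget / sum(typical_durations)
    return [scale * t_typ for t_typ in typical_durations]


def marginal_utility(t_dur, t_typ, beta_dur=6.0):
    """Derivative of beta_dur * t_typ * ln(t_dur / t_0) with respect to t_dur."""
    return beta_dur * t_typ / t_dur


def total_utility(durations, typical_durations, beta_dur=6.0):
    """Sum of S_dur over activities; t_0 is set to 1 h because it only shifts the sum."""
    return sum(beta_dur * t_typ * math.log(t_dur)
               for t_dur, t_typ in zip(durations, typical_durations))


typical = [12.0, 8.0, 1.0]   # home, work, lunch (hypothetical values, in hours)
best = optimal_durations(typical, time_budget=22.5)   # 24 h minus 1.5 h of travel
slopes = [marginal_utility(t, t_typ) for t, t_typ in zip(best, typical)]
```

All three slopes come out at 5.6 utils/h, and moving half an hour from work to home (or in any other direction) lowers `total_utility`, which is the intuitive argument made with the red lines.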

3.2.3 The “Wrapping Around” of the Utility Function

The MATSim mobsim typically starts at midnight and runs until all plans have reached their final

activity. By itself, the mobsim is not limited to a day. However, as already stated in Section 3.2.1,

the standard scoring function assumes that plans “wrap around” to 24-hour days. Thus, the last

activity is merged with the first into one activity. For example, if the first activity ends at 7 am and

the last activity starts at 11 pm, then it is assumed that this is the same activity, with a duration of

eight hours.

Figure 3.1: Illustration of the scoring function. TOP: Individual contributions of activities and

legs. BOTTOM: Score accumulation over a day. (Axes: score and accum. score over time; activities: @home,

@workplace, @lunch; legs: car, walk; marker: workplace opening time.)

Note that scoring the two activities separately would lead to a different result, because of the

nonlinear (logarithmic) form of the utility of performing. For example,

$\ln(1) + \ln(7) = \ln(7) \neq \ln(1+7) = \ln(8)$.
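
The wrap-around arithmetic is easy to reproduce. The sketch below uses a hypothetical helper, `wrapped_duration`, that takes the end time of the first activity and the start time of the last activity, both in hours counted from the first midnight.

```python
import math


def wrapped_duration(first_end_h, last_start_h, day_h=24.0):
    """Duration of the merged first/last activity, in hours."""
    return first_end_h + (day_h - last_start_h)


merged = wrapped_duration(7.0, 23.0)        # 7 h after midnight plus 1 h before it
separate = math.log(1.0) + math.log(7.0)    # scoring the two pieces on their own
joint = math.log(merged)                    # scoring the merged activity
```

Here `merged` is 8.0 hours, and `separate` is smaller than `joint`, which is why the two pieces must be merged before the logarithm is applied.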

3.2.4 MATSim Scoring, Opportunity Cost of Time, and the VTTS

As a result of the wrap-around concept, travel receives, beyond the typically negative direct

marginal utility $\beta_{trav,mode}$, an additional implicit penalty from the marginal utility of time as a

resource: If travel time could be reduced by $\Delta t_{trav}$, the person would not only gain from avoiding

$\beta_{trav} \cdot \Delta t_{trav}$, but also from additional time for activities (effect of the opportunity cost of time).

The (total) marginal utility of travel time savings is thus:

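$$
mUTTS = -\frac{\partial}{\partial t_{trav}} S_{trav} + \frac{\partial}{\partial t_{dur}} S_{dur},
$$

which is

$$
mUTTS = -\beta_{trav} + \beta_{dur} \cdot \frac{t_{typ,q}}{t_{dur,q}} \tag{3.5}
$$

and at the typical duration of an activity

$$
mUTTS \,\Big|_{t_{dur,q} = t_{typ,q}} = -\beta_{trav} + \beta_{dur},
$$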

where it can be imagined that $q$ is the activity immediately following the trip (cf. Section 51.2). The

marginal utility of travel time savings, mUTTS, can thus be defined as the indirect effect on the

overall time budget, corrected by an offset $\beta_{trav}$ that denotes how much better, or worse, it is to

spend that time traveling, rather than “doing nothing”.¹ To differentiate $\beta_{trav}$ from the indirect

effect, it is sometimes called the direct marginal utility of time spent traveling.

The marginal utility of travel time savings can be transformed to the more common VTTS

(Value of Travel Time Savings) by dividing it by the marginal utility of money, βm:

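$$
VTTS = \frac{mUTTS}{\beta_m} = \frac{-\beta_{trav} + \beta_{dur} \cdot \frac{t_{typ,q}}{t_{dur,q}}}{\beta_m}
$$

and at the typical duration of an activity

$$
VTTS \,\Big|_{t_{dur,q} = t_{typ,q}} = \frac{mUTTS}{\beta_m} \,\bigg|_{t_{dur,q} = t_{typ,q}} = \frac{-\beta_{trav} + \beta_{dur}}{\beta_m}.
$$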

This is important for calibration of the utility function.
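
As a quick numeric check of Equation (3.5) and the VTTS, the following sketch evaluates both for the default parameter values given in Section 3.4 ($\beta_{trav} = -6$, $\beta_{dur} = 6$, $\beta_m = 1$). The function names and the 8-hour activity are assumptions made for illustration.

```python
def mutts(beta_trav, beta_dur, t_typ, t_dur):
    """Marginal utility of travel time savings, Equation (3.5), in utils/h."""
    return -beta_trav + beta_dur * t_typ / t_dur


def vtts(beta_trav, beta_dur, beta_m, t_typ, t_dur):
    """Value of travel time savings in monetary units per hour."""
    return mutts(beta_trav, beta_dur, t_typ, t_dur) / beta_m


# Following activity performed at its typical duration of 8 h.
at_typical = vtts(-6.0, 6.0, 1.0, t_typ=8.0, t_dur=8.0)
# Same activity squeezed to 6 h: time is scarcer, so saved travel time is worth more.
squeezed = vtts(-6.0, 6.0, 1.0, t_typ=8.0, t_dur=6.0)
```

With these values the VTTS is 12 monetary units per hour at the typical duration and 14 when the activity is shortened to 6 hours, illustrating that the opportunity cost of time depends on how time-pressed the agent is.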

3.2.5 The Resulting Modeling of Schedule Delay Costs

Arriving Early In the same way as the marginal utility of travel time savings is not only given

by $-\beta_{trav}$, but instead by $-\beta_{trav} + \beta_{dur} \cdot t_{typ,q} / t_{dur,q}$, the marginal utility of waiting time savings is given

by $mUWTS = -\beta_{wait} + \beta_{dur} \cdot t_{typ,q} / t_{dur,q}$: Even when the direct marginal utility of waiting, $\beta_{wait}$, equals

zero, “doing nothing” still eats into the overall time budget and thus incurs the same opportunity

cost of time as traveling does. Intuitively, one can imagine that one must leave the previous

activity earlier to have a longer waiting time, thus reducing the score of the previous activity.

Thus, as long as one cannot estimate $\beta_{wait}$ separately from $\beta_{dur}$, we recommend leaving $\beta_{wait}$ at

zero.

¹ This is an approximate statement; in the full theory, the reference marginal utility is not given by “doing nothing”,

but by a Lagrange multiplier related to the constraint that a day has 24 hours; again, cf. Section 51.2.

Arriving Late Arriving late incurs a marginal utility of βlate, typically negative. Here, no additional

opportunity cost of time is involved. Intuitively, arriving later implies having left the previous

activity later. That is: the current activity is shortened by the same amount that the previous activity

was extended, leaving the overall score unaffected (cf. Section 51.2).

Vickrey Parameters As a result, the Vickrey parameters $\alpha$ (marginal penalty for arriving

early), $\beta$ (marginal penalty for traveling) and $\gamma$ (marginal penalty for arriving late) (as defined

by Arnott et al., 1990) are consistent with the following equations:

$$
\begin{aligned}
-\beta_{wait} + \beta_{dur} \cdot \frac{t_{typ,q}}{t_{dur,q}} &= \alpha \\
-\beta_{trav} + \beta_{dur} \cdot \frac{t_{typ,q}}{t_{dur,q}} &= \beta \\
-\beta_{late} &= \gamma.
\end{aligned} \tag{3.6}
$$
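
The mapping in Equation (3.6) can be evaluated for any parameter set. The sketch below is only an illustration with a hypothetical function name; it plugs in the default values from Section 3.4 and assumes the activity is performed at its typical duration.

```python
def vickrey_parameters(beta_wait, beta_trav, beta_late, beta_dur, t_typ, t_dur):
    """Return (alpha, beta, gamma) as defined by Equation (3.6)."""
    opportunity_cost = beta_dur * t_typ / t_dur   # marginal utility of time as a resource
    alpha = -beta_wait + opportunity_cost         # marginal penalty for arriving early
    beta = -beta_trav + opportunity_cost          # marginal penalty for traveling
    gamma = -beta_late                            # marginal penalty for arriving late
    return alpha, beta, gamma


alpha, beta, gamma = vickrey_parameters(
    beta_wait=0.0, beta_trav=-6.0, beta_late=-18.0, beta_dur=6.0, t_typ=8.0, t_dur=8.0)
```

For the default values this gives 6, 12 and 18 utils/h for early arrival, travel and late arrival, respectively.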

3.3 Implementation Details

This section summarizes the current implementation of the default MATSim scoring function. The

section can be skipped if the reader understands that what has been summarized up to this point

is not the full story.

3.3.1 Zero Utility Duration

The duration when an activity’s utility is exactly zero is computed by the somewhat cryptic

expression

$$
t_{0,q} := t_{typ,q} \cdot \exp\left( \frac{-10\,\mathrm{h}}{t_{typ,q} \cdot prio} \right), \tag{3.7}
$$

where prio is a configurable parameter. This is designed so that all activities with the same value of

prio obtain, at their typical duration, i.e., when $t_{dur,q} = t_{typ,q}$, the same utility value of $10\,\mathrm{h} \cdot \beta_{dur} / prio$, with

the idea that this makes them equally likely to be dropped in a time shortage situation (Charypar

and Nagel, 2005).² However, this does not work as intended, since activities receiving this utility

value from a short duration have a larger utility accumulation per time unit than others and are thus

dropped later. In consequence, without additional constraints, the “home” activity gets dropped

first, which is clearly not plausible. See Section 97.4 for a discussion of alternatives. In the meantime,

the recommendations are:

• Do not set the priority value in the config away from its default value.

• Recognize that the current MATSim default scoring/utility function is not suitable for activity

dropping.

² Starting from Equation (3.3) and inserting Equation (3.7), one obtains

$$
\begin{aligned}
S_{dur,q} \,\Big|_{t_{dur,q} = t_{typ,q}}
&= \beta_{dur} \cdot t_{typ,q} \cdot \ln\left( \frac{t_{typ,q}}{t_{typ,q} \cdot \exp\left(-10\,\mathrm{h} / (t_{typ,q} \cdot prio)\right)} \right) \\
&= \beta_{dur} \cdot t_{typ,q} \cdot \ln \exp\left(10\,\mathrm{h} / (t_{typ,q} \cdot prio)\right)
= 10\,\mathrm{h} \cdot \beta_{dur} / prio,
\end{aligned}
$$

which is indeed the same for all activities with the same value of prio.
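
Both the intended property and the problem with it can be reproduced numerically. The sketch below evaluates Equation (3.7) and Equation (3.3) for a long and a short activity; the function names are hypothetical, times are in hours, and prio = 1 and $\beta_{dur}$ = 6 utils/h are assumed values.

```python
import math


def zero_utility_duration(t_typ, prio=1.0):
    """Equation (3.7): duration (in hours) at which the activity utility is zero."""
    return t_typ * math.exp(-10.0 / (t_typ * prio))


def s_dur(t_dur, t_typ, beta_dur=6.0, prio=1.0):
    """Equation (3.3) with t_0 taken from Equation (3.7)."""
    return beta_dur * t_typ * math.log(t_dur / zero_utility_duration(t_typ, prio))


home = s_dur(12.0, 12.0)    # long activity at its typical duration
lunch = s_dur(0.5, 0.5)     # short activity at its typical duration

home_per_hour = home / 12.0
lunch_per_hour = lunch / 0.5
```

Both activities obtain 60 utils at their typical duration, but the short one accumulates 120 utils per hour against 5 utils per hour for the long one. Under time pressure it therefore pays to cut the long activity first, which is the implausible behavior described above.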

3.3.2 Negative Durations

In MATSim, somewhat oddly, it is possible to have activities with negative durations. This can happen

because of the “wrap-around” mechanism, where the last activity of a plan is stitched together

with the first activity of the plan, and only that merged activity is scored (cf. Section 3.2.3). In this

situation, it can happen that an agent arrives at the last activity of the plan at a later 24-hour-time

than when the first activity ended. For example, an agent could stay at home until 3 am (end of

first activity), then go through her daily plan including a very late party, and return home at 6 am

the next morning (Figure 3.2). In this case, the duration of the wrap-around home activity would

be minus three hours. Originally, a score of zero was assigned to these negative duration activities.

However, the adaptive agents quickly found out that they could use this to their advantage: expanding

this negative duration without a penalty would lead to more time elsewhere, which the agent

could use to accumulate score. For an adaptive algorithm, a penalty for such plans needs to be defined

so that it guides the adaptation back into the feasible region. The penalty must increase with increasing

negative duration. It also needs to be more strongly negative than any score value for a

positive activity duration. The latter is, however, impossible to achieve with a logarithmic form,

which tends to $-\infty$ as $t_{dur,q}$ approaches zero from above. The current approach is to take the slope

of the expression $\beta_{dur} \cdot t_{typ,q} \cdot \ln(t_{dur,q} / t_{0,q})$ when it crosses zero, and extend this towards minus

infinity (Figure 3.3).
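
One possible reading of this linear extension is sketched below. It is an assumption, not a transcription of the MATSim code: the logarithmic form is used for durations of at least $t_{0,q}$, and below that the tangent at the zero crossing, with slope $\beta_{dur} \cdot t_{typ,q} / t_{0,q}$, is continued towards negative durations. The function name and the example values are hypothetical.

```python
import math


def s_dur_extended(t_dur, t_typ, t_0, beta_dur=6.0):
    """Utility of performing, linearly extended below the zero crossing at t_0."""
    if t_dur >= t_0:
        return beta_dur * t_typ * math.log(t_dur / t_0)
    # Slope of beta_dur * t_typ * ln(t / t_0) at t = t_0, where the utility is zero.
    slope_at_zero = beta_dur * t_typ / t_0
    return slope_at_zero * (t_dur - t_0)


T_TYP_HOME = 12.0
T_0_HOME = T_TYP_HOME * math.exp(-10.0 / T_TYP_HOME)   # Equation (3.7) with prio = 1

at_zero_crossing = s_dur_extended(T_0_HOME, T_TYP_HOME, T_0_HOME)
minus_one_hour = s_dur_extended(-1.0, T_TYP_HOME, T_0_HOME)
minus_three_hours = s_dur_extended(-3.0, T_TYP_HOME, T_0_HOME)
```

Under this reading the score is continuous at the zero crossing and becomes steadily more negative as the negative duration grows, so an adaptive agent always gains by moving back towards feasible durations.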

Figure 3.2: Illustration of wrap-around scoring. TOP: Normal situation. BOTTOM: Situation

where the final activity starts at a later time of day than when the first activity ended, resulting in

negative duration. (Panel labels. TOP: a normal schedule between 00:00 and 24:00; home activity ends at 09:00

and starts again at 21:00, giving 12 h at home. BOTTOM: a very long schedule; home activity ends at 03:00

and starts again at 06:00 on the next day, giving −3 h at home.)

Figure 3.3: Extending the slope when the utility function crosses the zero line to negative

durations. (Axes: score over activity duration.)

First and Last Activity not the Same Clearly, the wrap-around approach fails if the first and last

activity are not the same. The present code does not look at locations, but gives a warning and

problematic results if they are of different types.

3.3.3 Score Averaging

The score $S$ that is computed according to the rules given in this chapter is not assigned directly to

the plan; rather, it is exponentially smoothed according to

$$
S_k = \alpha S + (1 - \alpha) S_{k-1}, \tag{3.8}
$$

where $S_k$ is the newly memorized score, $S_{k-1}$ is the previously memorized score, $S$ is the score

obtained from the plan’s execution in the mobsim, and $\alpha$ is a “learning” or “blending” parameter.

The default value of $\alpha$ is one; it can be configured by the line

in the config file.

Non-executed plans just keep their score.
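
The smoothing rule of Equation (3.8) is a one-liner. The sketch below uses a hypothetical function name and treats a plan without a memorized score (represented as None, an assumption of this example) as taking over the new score directly.

```python
def smooth_score(previous, new_score, alpha=1.0):
    """Equation (3.8): blend the newly obtained score into the memorized score."""
    if previous is None:          # plan has never been scored before
        return new_score
    return alpha * new_score + (1.0 - alpha) * previous


overwrite = smooth_score(10.0, 20.0)             # alpha = 1: the new score replaces the old
blended = smooth_score(10.0, 20.0, alpha=0.5)    # alpha = 0.5: midpoint of old and new
```

With the default blending parameter of one, the memorized score is simply overwritten; smaller values dampen fluctuations between iterations.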

3.3.4 Forcing Scores to Converge

For many situations, both practical and theoretical (see Section 47.3.2.2), it is desirable that each

plan’s score converges to its expectation value. Equation (3.8) will not achieve that; it just dampens

the fluctuations. A well-known approach to force convergence to the expectation value is MSA

(Method of Successive Averages):

$$
S_m = \frac{1}{m} S + \frac{m-1}{m} S_{m-1}. \tag{3.9}
$$

This resembles Equation (3.8), with two important differences: (1) The fixed blending parameter

$\alpha$ is now replaced by a variable $1/m$, and (2) $m$ is not the iteration number but counts how often a

plan was executed and thus scored. This is necessary in MATSim since a plan is not executed and

scored in every iteration.
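
A short sketch shows why Equation (3.9) converges to the expectation value: the memorized score is the running mean of all scores the plan has obtained so far. The function names are hypothetical, and the list of scores stands for the executions of a single plan only.

```python
def msa_update(previous, new_score, m):
    """Equation (3.9); m counts executions of this plan, starting at 1."""
    if m == 1:
        return new_score
    return new_score / m + (m - 1) / m * previous


def msa_scores(executed_scores):
    """Memorized score after each execution of one plan."""
    memorized, history = None, []
    for m, score in enumerate(executed_scores, start=1):
        memorized = msa_update(memorized, score, m)
        history.append(memorized)
    return history


history = msa_scores([10.0, 14.0, 12.0])
```

After the three executions the memorized score equals the arithmetic mean of the three scores, 12. Iterations in which the plan is not executed do not advance m.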

This behavior can be switched on by the following config option:

This is plausibly used together with innovation switched off (Section 4.5.3), meaning that MSA

operates on a fixed set of plans.

3.4 Typical Scoring Function Parameters and their Calibration

The current MATSim default values are

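$$
\begin{aligned}
\beta_m &= 1 \ \text{utils/monetary unit} \\
\beta_{dur} &= 6 \ \text{utils/h} \\
\beta_{trav,mode(q)} &= -6 \ \text{utils/h} \\
\beta_{wait} &= 0 \ \text{utils/h} \\
\beta_{short.dur} &= 0 \ \text{utils/h} \\
\beta_{late.ar} &= -18 \ \text{utils/h} \\
\beta_{early.dp} &= 0 \ \text{utils/h}.
\end{aligned} \tag{3.10}
$$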

They are very loosely based on the Vickrey bottleneck model (e.g., Arnott et al., 1990).

An additional insight is that, in many of the systems that we model, traveling does not seem to be

less convenient than “doing nothing”. Thus, the direct marginal utility of traveling, βtrav, is close to

zero and sometimes even positive (see, e.g., Redmond and Mokhtarian, 2001; Pawlak et al., 2011).

Based on this, a possible approach to calibration is as follows:³

1. Set $\beta_m$ ≡ marginalUtilityOfMoney to whatever is the prefactor of your monetary term in your

mode choice logit model.

If you do not have a mode choice logit model, set to 1.0. (This is the default.)

This is normally a positive value (since having more money normally increases utility).

2. Set $\beta_{dur}$ ≡ performing to whatever the prefactor of car travel time is in your mode choice

model, while changing that parameter’s sign from its typical − to a +.

If you do not have a mode choice logit model, set to +6.0. (This is the default.)

This is normally a positive value (since performing an activity for more time normally

increases utility).

3. Set $\beta_{tt,car}$ ≡ marginalUtilityOfTraveling... to 0.0.

It is important to understand this: Even if this value is set to zero, traveling by car will be implicitly

punished by the opportunity cost of time: If you are traveling by car, you cannot perform

an activity; thus, you are (marginally and approximately) losing $\beta_{dur}$. See Section 3.2.4.

4. Set all other marginal utilities of travel time by mode relative to the car value.

For example, if your logit model says something like

$\ldots - 6/\mathrm{h} \cdot tt_{car} - 7/\mathrm{h} \cdot tt_{pt} \ldots$,

then

$\beta_{dur} = 6$, $\beta_{tt,car} = 0$, and $\beta_{tt,pt} = -1$.

If you do not have a mode choice logit model, set all $\beta_{tt,mode}$ ≡ marginalUtilityOfTraveling...

values to zero (i.e., same as car).

5. Set distance cost rates monetaryDistanceRate... to plausible values, if you have them.

Note that this needs to be negative: distance consumes money at a certain rate.

6. Use the alternative-specific constants $C_{mode}$ ≡ constant to calibrate your modal split.

(This is, however, not completely simple; one must run iterations and look at the result;

especially for modes with small shares, one needs to have innovation switched off early

enough near the end of the iterations.)

³ Different groups have different systems; this one is typical for VSP, although it uses ideas from Michael Balmer.
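
Steps 1 to 4 of this recipe amount to a small transformation of logit coefficients. The sketch below is a hypothetical helper, not part of MATSim: it takes travel time coefficients per mode (in utils/h, typically negative) and returns a plain dictionary whose keys merely echo the parameter names mentioned in the steps.

```python
def calibrate_from_logit(tt_coefficients, money_coefficient=1.0, reference_mode="car"):
    """Derive scoring parameters from mode choice logit coefficients.

    tt_coefficients maps each mode to its travel time coefficient in utils/h.
    """
    # Step 2: the car travel time coefficient, with flipped sign, becomes beta_dur.
    beta_dur = -tt_coefficients[reference_mode]
    # Steps 3 and 4: direct marginal utilities of traveling relative to the car value,
    # so the reference mode ends up at zero.
    beta_tt = {mode: coeff + beta_dur for mode, coeff in tt_coefficients.items()}
    return {
        "marginalUtilityOfMoney": money_coefficient,   # step 1
        "performing": beta_dur,
        "marginalUtilityOfTraveling": beta_tt,
    }


params = calibrate_from_logit({"car": -6.0, "pt": -7.0})
```

For the example coefficients of step 4 this returns performing = 6, a car value of 0 and a pt value of -1. Distance cost rates and alternative-specific constants (steps 5 and 6) are left out because they require data and iteration rather than a formula.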

If you end up having your modal split right, but its distance distribution wrong, you probably

need to look at different mode speeds. In our experience, adjusting the mode speeds works better for this

than adjusting the $\beta_{tt,mode}$.

Calibrating schedule-based public transport (see Chapter 16) goes beyond what can be provided

here.

3.5 Applications and Extensions

The default scoring function has been applied and extended for various purposes. Thus, the historical

development is accompanied by various conceptual and technical modifications leading to

the current utility function described above. This also means that the reported parameter settings

in the literature are an indication, not a direct recommendation.

Important applications for large scenarios are described in Chapter 52.

Special utility functions have been developed for car sharing (see Chapter 22), social contacts and

joint trips (see Chapter 28), parking (see Chapter 13), road pricing (see Chapter 15) and destination

innovation (see Chapter 27), also describing facility loading scoring and inclusion of random error

terms.

Future topics, available on an experimental basis, are: a full-blown utility function estimation

(Section 97.4.4), inclusion of agent-specific preferences (Section 97.4.5) and application of

alternative utility function forms (Section 97.4).