This version is available at https://doi.org/10.14279/depositonce-8266

The final authenticated version is available online at https://doi.org/10.1007/978-3-540-77949-0_1.

Bazzan, A. L. C.; Oliveira, D. d.; Klügl, F.; Nagel, K. (2008). To Adapt or Not to Adapt – Consequences of

Adapting Driver and Traffic Light Agents. AAMAS 2005-2007: Adaptive Agents and Multi-Agent Systems

III. Adaptation and Multi-Agent Learning, 1–14. https://doi.org/10.1007/978-3-540-77949-0_1

Accepted manuscript (Postprint) | Conference paper

To Adapt or Not to Adapt – Consequences of

Adapting Driver and Traﬃc Light Agents

Ana L.C. Bazzan1, Denise de Oliveira1, Franziska Klügl2, and Kai Nagel3

1Instituto de Informática, UFRGS

Caixa Postal 15064, 91.501-970 Porto Alegre, RS, Brazil

{bazzan,edenise}@inf.ufrgs.br

2Dep. of Artificial Intelligence, University of Würzburg

Am Hubland, 97074 Würzburg, Germany

[email protected]

3Inst. for Land and Sea Transport Systems, TU Berlin

Salzufer 17–19, 10587 Berlin, Germany

[email protected]

Abstract. One way to cope with the increasing traﬃc demand is to in-

tegrate standard solutions with more intelligent control measures. How-

ever, the result of possible interferences between intelligent control or

information provision tools and other components of the overall traﬃc

system is not easily predictable. This paper discusses the eﬀects of inte-

grating co-adaptive decision-making regarding route choices (by drivers)

and control measures (by traﬃc lights). The motivation behind this is

that optimization of traﬃc light control is starting to be integrated with

navigation support for drivers. We use microscopic, agent-based modelling

and simulation, in contrast to classical network analysis, as

this work focuses on the eﬀect of local adaptation. In a scenario that

exhibits features comparable to real-world networks, we evaluate diﬀer-

ent types of adaptation by drivers and by traﬃc lights, based on local

perceptions. In order to compare the performance, we have also used a

global level optimization method based on genetic algorithms.

1 Introduction

Urban mobility is one of the key topics in modern societies. Especially in medium

to big cities, the urban space has to be adapted to cope with the increasing needs

of transportation. In transportation engineering, the expression of the transport

needs is called demand. This demand (in terms of volume of vehicles, pedestrians,

freight, etc.) is commonly used to evaluate transport supply. This is the

expression of the capacity of transportation infrastructures and modes. Supply

is expressed in terms of infrastructure (capacity), service (frequency), and other

characteristics of the network. The increasing demand of transport needs we ob-

serve nowadays has to be accommodated either with increasing supply (e.g. road

capacity), or with a better use of the existing infrastructure. Since an expan-

sion of the capacity is not always socially or economically attainable or feasible,

transportation and traﬃc engineering seek to optimize the management of both

supply and demand using concepts and techniques from intelligent transporta-

tion systems (ITS). These refer to the application of modern technologies in the

operation and control of transportation systems [12].

From the side of supply, several measures have been adopted in the last years,

such as congestion charging in urban areas (London), restriction of traﬃc in

the historical centre (Rome, Paris, Amsterdam), and alternation of vehicles allowed to

circulate on a given day (São Paulo, Mexico City).

From the point of view of the demand, several attempts exist not only to di-

vert trips both spatially as well as temporally, but also to distribute the demand

within the available infrastructure. In this context, it is now commonly recog-

nized that the human actor has to be brought into the loop. With the amount

of information that we have nowadays, it is almost impossible to disregard the

inﬂuence of real-time information systems over the decision-making process of

the individuals.

Hence, within the project “Large Scale Agent-based Traﬃc Simulation for

Predicting Traﬃc Conditions”, our long term goal is to tackle a complex problem

like traﬃc from the point of view of information science. This project seeks

to integrate microscopic modelling tools developed by the authors for traﬃc

and transportation control and management. These range from traﬃc signal

optimization [1], binary route choice, and eﬀect of information on commuters

[4], to microscopic modelling of physical movement [7].

An important milestone in the project is to propose a methodology to inte-

grate complex behavioral models of human travellers reacting to traﬃc patterns,

and control measures, focusing on distributed and decentralized methods. Clas-

sically, this is done via network analysis. Using this technique, it is assumed that

individual road users seek to optimize their individual costs regarding the trips

they make by selecting the “best” route among the ones they have experienced

or have been informed about. This is the basis of the well known traﬃc network

analysis based on Wardrop’s equilibrium principle [17]. This method predicts a

long term average state of the network. However, since it assumes steady state

network supply and demand conditions, this equilibrium-based method cannot,

in most cases, cope with the dynamics of the modern transportation systems.

Moreover, it is deﬁnitely not adequate for answering questions related to what

happens in the network within a given day, as both the variability in the de-

mand and the available capacity of the network tend to be high. Just think

about changing weather conditions from day to day and within a single day!

In summary, as equilibrium-based concepts overlook this variability, it seems

obvious that they are not adequate for microscopic modelling and simulation.

Therefore, the general aim of this paper is to investigate what happens when

diﬀerent actors adapt, each having its own goal. The objective of local traﬃc

control is obviously to ﬁnd a control scheme that minimizes queues in a spatially

limited area (e.g. around a traﬃc light). The objective of drivers is normally to

minimize their individual travel time – at least in commuting situations. Finally,

from the point of view of the whole system, the goal is to ensure reasonable

travel times for all users, which can be highly conﬂicting with some individual

utilities (a social dilemma). This is a well-known issue: for instance, Tumer and

Wolpert [15] have shown that there is no general approach to deal with this

complex question of collectives.

Speciﬁcally, this paper investigates which strategy is the best for drivers (e.g.

adaptation or greedy actions). Similarly, traﬃc lights can act greedily or simply

carry out a “well-designed” signal plan. At which volume of local traﬃc does

decentralized control of Traﬃc Lights start to pay oﬀ? Does isolated, single-

agent reinforcement learning make sense in dynamic traﬃc scenarios? What

happens when many drivers adapt concurrently? These are hot topics not only

in traffic research, but also in more general multi-agent research, as they refer to

co-adaptation.

In this paper we depart from binary route choice scenarios and use a more

realistic one, that shows features such as: heterogeneity of origin-destination

pairs, heterogeneous capacity, and agents knowing about a set of routes between

their origins and destinations. To the best of our knowledge, the question of what

happens when drivers and traﬃc lights co-adapt in a complex route scenario has

not been tackled so far.

In the next section we review these and related issues. In section 3 we describe

the approach and the scenario. Section 4 discusses the results, while section 5

presents the concluding remarks.

2 Background: Supply and Demand in Traﬃc Engineering

Learning and adaptation are important issues in multiagent systems. Here, we

concentrate on pieces of related work which either deal with adaptation in traffic

scenarios directly or report on closely related scenarios.

2.1 Management of Traﬃc Demand

Given its complexity, the area of traﬃc simulation and control has been tackled

by many branches of applied and pure sciences, such as mathematics, physics,

computer science, engineering, geography, and architecture. Therefore, several

tools exist that target only a part of the overall problem. For example, sim-

ulation tools in particular are quite old (1970s) and stable. On the side of de-

mand forecasting, the arguably most used computational method is the so-called

4-step-process [11]. It consists of: trip generation, destination choice, mode

choice, and route assignment. Route assignment includes route choice and a very

basic traﬃc ﬂow simulation that may lead to a Nash Equilibrium. Over the years,

the 4-step-process has been improved in many ways, mainly by (i) combining

the first three steps into a single, traveller-oriented framework (activity-based

demand generation (ABDG)) and by (ii) replacing traditional route assignment

by so-called dynamic traﬃc assignment (DTA). Still, in the actual implementa-

tions, all travellers’ information gets lost in the connection between ABDG and

DTA, making realistic agent-based modelling at the DTA-level diﬃcult.

Another related problem is the estimation of the overall state of the com-

plete traﬃc network from partial sensor data. Although many schemes exist for

incident detection, there are only few applications of large scale traﬃc state es-

timation. One exception is www.autobahn.nrw.de. It uses a traﬃc microsimula-

tion to extrapolate between sensor locations, and it applies intelligent methods

combining the current state with historical data in order to make short-term

predictions. However, the travellers themselves are very simple: They do not

know their destinations, let alone the remainder of their daily plan. This was

a necessary simpliﬁcation to make the approach work for simulating the real

infrastructure. However, for evaluating the eﬀects of travellers’ ﬂexible decision

making, it is necessary to overcome this simpliﬁcation for integrating additional

information about dynamic decision-making context.

A true integration of these and other approaches is still missing. Agent tech-

nology oﬀers the appropriate basis for this. However, until now agent-based sim-

ulations with a scale required for the simulation of real-world traﬃc networks

have not been developed.

2.2 Real-Time Optimization of Traﬃc Lights

Signalized intersections are controlled by signal-timing plans (we use signal plan

for short) which are implemented at traﬃc lights. A signal plan is a unique set

of timing parameters comprising the cycle length L (the length of time for the

complete sequence of the phase changes), and the split (the division of the cycle

length among the various movements or phases). The criterion for obtaining

the optimum signal timing at a single intersection is that it should lead to the

minimum overall delay at the intersection. Several plans are normally required

for an intersection to deal with changes in traﬃc volume. Alternatively, in a

traﬃc-responsive system, at least one signal plan must be pre-deﬁned in order

to be changed on the ﬂy.

In [1], a MAS based approach is described in which each traﬃc light is mod-

elled as an agent, each having a set of pre-deﬁned signal plans to coordinate

with neighbours. Diﬀerent signal plans can be selected in order to coordinate

in a given traﬃc direction. This approach uses techniques of evolutionary game

theory. However, payoﬀ matrices (or at least the utilities and preferences of the

agents) are required. These ﬁgures have to be explicitly formalized by the de-

signer of the system.

In [10], groups of traﬃc lights were considered and a technique from dis-

tributed constraint optimization was used, namely cooperative mediation. How-

ever, this mediation was not decentralized: group mediators communicate their

decisions to the mediated agents in their groups and these agents just carry

out the tasks. Also, the mediation process may take long in highly constrained

scenarios, having a negative impact on the coordination mechanism.

Also a decentralized, swarm-based model of task allocation was developed in

[9], in which the dynamic group formation without mediation combines the ad-

vantages of decentralization via swarm intelligence and dynamic group formation.

Regarding the use of reinforcement learning for traﬃc control, some applica-

tions are reported. Camponogara and Kraus [2] have studied a simple scenario

with only two intersections, using stochastic game theory and reinforcement learning.

Their results with this approach were better than a best-effort (greedy) policy, a

random policy, and also better than Q-learning [18]. In [8] a set of techniques was

tried in order to improve the learning ability of the agents in a simple scenario.

The performance of reinforcement learning approaches such as Q-learning and Prioritized

Sweeping in non-stationary environments is compared in [13]. Co-learning

is discussed in [19] (detailed here in Section 2.3).

Finally, a reservation-based system [3] is also reported but it is only slightly

related to the topics here because it does not include conventional traﬃc lights.

2.3 The Need for Integration

Up to now, only few attempts exist to integrate supply and demand in a single

model. We review three of them here.

Learning Based Approach. A paper by [19] describes the use of reinforce-

ment learning by the traﬃc light controllers (agents) in order to minimize the

overall waiting time of vehicles in a small grid. Additionally, agents learn a value

function which estimates the expected waiting times of single vehicles given dif-

ferent settings of traﬃc lights. One interesting issue tackled in this research is

that a kind of co-learning is considered: value functions are learned not only by

the traﬃc lights, but also by the vehicles which thus can compute policies to

select optimal routes to the respective destinations. The ideas and results pre-

sented in that paper are interesting. However, it makes strong assumptions that

may hinder its use in the real world: the kind of communication and knowledge

or, more appropriately, communication for knowledge formation has high costs.

Traffic light controllers are supposed to know each vehicle's destination in order to

compute expected waiting times for each. Given the current technology, this is

a quite strong assumption. Secondly, it seems that traffic lights can shift from

red to green and vice versa at each time step of the simulation. Third, there is no

account of what drivers learn from their own local experiences only.

What if they just react to (few) past experiences? Finally, drivers being

autonomous, it is not completely obvious that they will use the best policy com-

puted by the traﬃc light and not by themselves. Therefore, in the present paper,

we depart from these assumptions regarding communication and knowledge the

actors must have about each other.

Game Theoretic Approach. In [16] a two-level, three-player game is dis-

cussed that integrates traﬃc control and traﬃc assignment, i.e. both, the con-

trol of Traﬃc Lights and the route choices by drivers are considered. Complete

information is assumed, which means that all players (including the population

of drivers) have to be aware of the movements of others. Although the paper

reports interesting conclusions regarding e.g. the utility of cooperation among

the players, this is probably valid only in that simple scenario. Besides, the as-

sumption that drivers always follow their shortest routes is diﬃcult to justify

in a real-world application. In the present paper, we want to depart from both,

the two-route scenario and the assumption that traﬃc management centres are

in charge of the control of Traﬃc Lights. Rather, we follow a trend of decen-

tralization, in which each traﬃc light is able to sense its environment and react

accordingly and autonomously, without having its actions computed by a central

manager, as is the case in [16]. Moreover, it is questionable whether the same

mechanism can be used in more complex scenarios, as claimed. The reason for

this is that when the network is composed of tens of links, the number

of routes increases and so does the complexity of the route choice, given that it

is then no longer trivial to compute the network and user equilibria.

Methodologies. Liu and colleagues [6] describe a modelling approach that

integrates microsimulation of individual trip-makers’ decisions and individual

vehicle movements across the network. Moreover their focus is on the description

of the methodology that integrates both demand and supply dynamics, so that

the applications are only brieﬂy described and not many options for the operation

and control of Traﬃc Lights are reported. One scenario described deals with

a simple network with four possible routes and two control policies. One of

them can roughly be described as greedy, while the other is ﬁxed signal plan

based. In the present paper, we do not explore the methodological issues as in

[6] but, rather, investigate in more detail particular issues of the integration

and interaction between actors from the supply and demand side.

3 Co-adaptation in an ITS Framework

Figure 1 shows a scheme of our approach based on the interaction between supply

and demand. This framework was developed using the agent-based simulation

environment SeSAm [5] for testing the eﬀects of adaptation of diﬀerent elements

of the supply and demand. The testbed consists of sub-modules for speciﬁcation

and generation of the network and the agents – traﬃc lights and drivers. Cur-

rently the approach generates the network (grid or any other topology), supports

the creation of traﬃc light control algorithms as well as signal plans, the creation

of routes (route library), and the algorithms for route choice. The movement of

vehicles is queue-based.

The basic scenario we use is a typical commuting scenario where drivers re-

peatedly select a route to go from an origin to a destination. As mentioned

before, we want to go beyond simple two-route or binary choice scenarios; we

deal with route choice in a network with a variety of possible routes. Thus, it

captures desirable properties of real-world scenarios.

We use a grid with 36 nodes connected by one-way links, as depicted in Figure 2. Drivers can turn in two directions at each crossing. Although it is apparently simple, this kind of scenario is realistic and, from the point of view of route choice and equilibrium computation, it is also a very complex one, as the number of possible routes between two locations is high.

[Figure 1: block diagram. Box labels: network modelling and generation; modelling of traffic lights; learning mechanisms; signal plans and generation; drivers definition; adaptation mechanism; trip generation; route library; route choice; OPTIMIZATION OF CONTROL; MANAGEMENT OF OPERATION; (DYNAMIC) TRAFFIC ASSIGNMENT; TRAVELER INFORMATION SYSTEM.]

Fig. 1. Elements of Co-Adaptation in an ITS Framework

In contrast to simple two-route scenarios, it is possible to set arbitrary origins (O) and destinations (D) in this grid. For every driver agent, its origin and destination are randomly selected according to probabilities given for the links. To render the scenario more realistic, neither the distribution of O-D combinations nor the capacity of links is homogeneous. On average, 60% of the road users have the same destination, namely the link labelled E4E5, which can be thought of as something like a main business area. Each of the other links has a 1.7% probability of being a destination. Origins are nearly equally distributed in the grid, with three exceptions (three “main residential areas”): links B5B4, E1D1, and C2B2 have probabilities of approximately 3%, 4%, and 5%, respectively, of being an origin. Each of the remaining links has a probability of 1.5%. Regarding capacity, all links can hold up to 15 vehicles, except those located in the so-called “main street”, which can hold up to 45 (one can think of them as having more lanes). This main street is formed by the links between nodes B3 to E3, E4, and E5.
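The origin and destination sampling described above can be sketched as weighted random draws. This is only an illustration under stated assumptions: the orientation of the unnamed links is hypothetical, the helper names (`make_links`, `destination_weights`, `origin_weights`, `sample_od`) are not from the original testbed, and the 40% of destinations not assigned to E4E5 is spread evenly over the other links, which approximates rather than reproduces the quoted per-link figure.

```python
import random

COLS = "ABCDEF"
# Links named in the text; their orientation is taken from the link name.
NAMED = {"E4E5", "B5B4", "E1D1", "C2B2"}


def make_links():
    """Enumerate the 60 one-way links of a 6x6 grid.

    Assumption: unnamed links point east or towards the higher row number.
    """
    links = []
    for c in range(6):
        for r in range(1, 7):
            here = f"{COLS[c]}{r}"
            neighbours = []
            if c < 5:
                neighbours.append(f"{COLS[c + 1]}{r}")
            if r < 6:
                neighbours.append(f"{COLS[c]}{r + 1}")
            for other in neighbours:
                forward, backward = here + other, other + here
                links.append(backward if backward in NAMED else forward)
    return links


def destination_weights(links, main="E4E5", main_share=0.60):
    """60% for the main destination; the rest spread evenly (assumption)."""
    rest = (1.0 - main_share) / (len(links) - 1)
    return [main_share if link == main else rest for link in links]


def origin_weights(links):
    """Relative origin weights in percent; random.choices normalizes them."""
    special = {"B5B4": 3.0, "E1D1": 4.0, "C2B2": 5.0}
    return [special.get(link, 1.5) for link in links]


def sample_od(links, rng):
    """Draw one origin link and one destination link independently."""
    origin = rng.choices(links, weights=origin_weights(links), k=1)[0]
    destination = rng.choices(links, weights=destination_weights(links), k=1)[0]
    return origin, destination
```

Calling `sample_od(make_links(), random.Random(42))` returns one origin-destination pair of link names. The sketch does not exclude the case in which origin and destination coincide, since the text does not say how that case is handled.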

The control is performed via decentralized Traﬃc Lights. These are located in

each node. Each of the Traﬃc Lights has a signal plan which, by default, divides

the overall cycle time – in the experiments 40 time steps – 50-50% between the

two phases. One phase corresponds to assigning green to one direction, either

north/south or east/west.

The actions of the Traffic Lights consist of running the default plan or

prioritizing one phase. The particular strategies are:

i. ﬁxed: always keep the default signal plan

ii. greedy: allow more green time for the direction with higher current occupancy

iii. use single agent Q-learning
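The fixed and greedy strategies can be illustrated with a small sketch. The text does not state how much extra green time the greedy strategy grants, so the `shift` parameter below is a hypothetical choice, as are the function names; only the cycle of 40 time steps and the default 50-50 split come from the description above.

```python
CYCLE = 40  # cycle length in time steps, as used in the experiments


def fixed_plan():
    """Default signal plan: the cycle is split 50-50 between the two phases."""
    return (CYCLE // 2, CYCLE // 2)


def greedy_plan(occ_ns, occ_ew, shift=10):
    """Give more green time to the direction with the higher occupancy.

    Returns (green north/south, green east/west). The size of `shift` is an
    assumption made for this sketch.
    """
    base = CYCLE // 2
    if occ_ns > occ_ew:
        return (base + shift, base - shift)
    if occ_ew > occ_ns:
        return (base - shift, base + shift)
    return (base, base)
```

With these assumptions, `greedy_plan(0.8, 0.2)` favours the north/south phase, while equal occupancies fall back to the default plan.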

[Figure 2: grid diagram; columns labelled A to F, links annotated with green times.]

Fig. 2. 6x6 grid showing the main destination (E4E5), the three main origins (B5B4, E1D1, C2B2), and the “main street” (darker line). Numbers at the links represent the green times for the particular direction (determined by global optimization).

Regarding the demand, the main actor is the simulated driver. The simulation can generate any number of them; in the experiments we used 400, 500, 600, and 700 driver agents. Every driver is assigned to a randomly selected origin-destination pair. Initially it is informed about only a given number of routes. The

experiments presented next were performed with each agent knowing ﬁve routes.

These route options are diﬀerent for each driver and were generated using an

algorithm that computes the shortest path (one route) and the shortest path via

arbitrary detours (the other four). We note that, due to topological constraints,

it was not always possible to generate five routes for each driver, for example

when origin and destination are too close. Thus, in a few cases drivers

know fewer than five routes, but at least one. Drivers can use three strategies to

select a route (before departure):

i. random selection

ii. greedy: always select the route with best average travel time so far

iii. probabilistic: for each route, the average travel time perceived so far is

used to compute a probability of selecting that route again.

The actual movement of the driver agents through the net is queue-based.
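The three route-selection strategies listed above can be sketched as follows. The text does not give the formula that turns average travel times into selection probabilities; the sketch assumes a weight inversely proportional to the average travel time, and it assumes that every known route already has at least one recorded travel time. The function name and the `history` structure are hypothetical.

```python
import random
from statistics import mean


def choose_route(history, strategy, rng):
    """Pick a route before departure.

    history: dict mapping route id -> list of travel times experienced so far
             (assumed non-empty for every route).
    strategy: "random", "greedy", or "probabilistic".
    """
    routes = list(history)
    if strategy == "random":
        return rng.choice(routes)
    averages = {route: mean(history[route]) for route in routes}
    if strategy == "greedy":
        # Route with the best (lowest) average travel time so far.
        return min(routes, key=averages.get)
    if strategy == "probabilistic":
        # Assumption: selection weight is the inverse of the average time.
        weights = [1.0 / averages[route] for route in routes]
        return rng.choices(routes, weights=weights, k=1)[0]
    raise ValueError(f"unknown strategy: {strategy}")
```

For example, with `history = {"r1": [100, 110], "r2": [90, 95], "r3": [150]}`, the greedy strategy always returns `"r2"`, whereas the probabilistic strategy still selects the slower routes from time to time.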

4 Results and Discussion

4.1 Metrics and Parameters

In order to evaluate the experiments, travel time (for drivers) and occupation

(for links) were measured. We discuss here only the mean travel time over the

last 5 trips (henceforward attl5t) and travel time in a single trip. All experiments

were repeated 20 times.
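As a small illustration of the metric, attl5t can be computed from per-driver trip logs as below. The sketch assumes that the mean is first taken over each driver's last five trips and then averaged over all drivers; the text does not spell out the order of aggregation, and the function name is hypothetical.

```python
from statistics import mean


def attl5t(trips_per_driver):
    """Average travel time over the last 5 trips, averaged over drivers.

    trips_per_driver: list with one list of travel times per driver.
    """
    per_driver = [mean(trips[-5:]) for trips in trips_per_driver]
    return mean(per_driver)
```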

The following parameters were used: time out for the simulation of one trip

(tout) equal to 300 when the number of drivers is 400 or 500; 400 when there are

600 drivers; and 500 when there are 700 drivers.

The percentage of drivers who adapt is either 0 or 100 (in this case all act

greedily) but any value can be used; percentage of Traﬃc Lights that act greedily

is either 0 or 100; a link is considered jammed if its occupancy is over 50%; cycle

length for signal plans is 40 seconds.

For the Q-learning, there is an experimentation phase of 10×tout, the learning

rate is α=0.1 and the discount rate is λ=0.9.

4.2 Global Optimization

For the sake of comparison, we show the results of a centralized approach be-

fore we continue with the main focus of the paper on local (co-)adaptation ap-

proaches. We use a centralized and heuristic optimization method in order to

compute the optimal split of the cycle time between two traﬃc directions at each

intersection.

This centralized optimization was performed using the DAVINCI (Developing

Agent-based simulations Via INtelligent CalIbration) Calibration Toolkit for

SeSAm, which is a general-purpose calibration and optimization tool for simulation.

Although DAVINCI provides several global search strategies such as

genetic algorithms (GA), simulated annealing, or gradient-based search, here we

have used a standard GA only, with fitness-proportional selection.
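Fitness-proportional (roulette-wheel) selection can be sketched generically as follows. This is not DAVINCI's implementation. Since the objective here is to minimize travel time, the sketch assumes that the fitness of a candidate is the inverse of its average travel time; that transformation and the function names are assumptions.

```python
import random


def roulette_select(population, fitness, rng):
    """Return one individual with probability proportional to its fitness."""
    total = sum(fitness)
    pick = rng.uniform(0.0, total)
    running = 0.0
    for individual, fit in zip(population, fitness):
        running += fit
        if pick <= running:
            return individual
    return population[-1]  # guard against floating-point round-off


def fitness_from_travel_times(travel_times):
    """Assumed transformation: lower travel time means higher fitness."""
    return [1.0 / t for t in travel_times]
```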

The input parameters for the GA are the default split values for each of the

36 traﬃc light agents (see next). The optimization objective is to minimize the

average travel time over all drivers in a scenario with 400 drivers, where all

drivers have only one route (the shortest path).

For a cycle length of 40 seconds, we have set seven possible values for the

split at each intersection: 5/35, 10/30, 15/25, 20/20, ..., 35/5. Using four bits to

codify each of these splits, for each of the 36 intersections, this leads to 144 bits

for each GA string. We have allowed the GA to run for 100 generations.
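The encoding can be illustrated by decoding a 144-bit string into 36 splits. Four bits give 16 codes but only seven splits are allowed; the text does not say how the surplus codes are handled, so the sketch below assumes a simple modulo mapping. The names `SPLITS` and `decode` are hypothetical.

```python
CYCLE = 40
# The seven admissible splits: 5/35, 10/30, ..., 35/5.
SPLITS = [(green, CYCLE - green) for green in range(5, CYCLE, 5)]
BITS_PER_LIGHT = 4
NUM_LIGHTS = 36


def decode(bits):
    """Turn a 144-character string of '0'/'1' into 36 (green, green) splits."""
    if len(bits) != BITS_PER_LIGHT * NUM_LIGHTS:
        raise ValueError("expected a 144-bit string")
    plans = []
    for start in range(0, len(bits), BITS_PER_LIGHT):
        code = int(bits[start:start + BITS_PER_LIGHT], 2)
        # Assumption: codes 7..15 wrap around onto the seven valid splits.
        plans.append(SPLITS[code % len(SPLITS)])
    return plans
```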

The resulting optimized splits can be seen in Figure 2: numbers depicted close

to the respective links indicate how much green time the link receives in the best

solution found by the GA. Using these optimized splits, the average travel time

of drivers is 105. This value can be used as a benchmark to assess the utility of

adapting drivers and Traﬃc Lights in a decentralized way.

4.3 Drivers and Traﬃc-Lights Learning in a Decentralized Way

In this section we discuss the simulations and results collected when drivers

and Traﬃc Lights co-adapt using diﬀerent strategies, as given in Section 3. As

a measure of performance, we use the attl5t defined previously (Section 4.1).

The results are summarized in Table 1. For all scenarios described in this subsection,

400 drivers were used. As said, all experiments were repeated 20 times. Standard

deviations are not higher than 4% of the mean value given here.

Table 1. Average Travel Time Last 5 Trips (attl5t) for 400 drivers, under different conditions

Type of Simulation                                 Average Travel Time Last 5 Trips
greedy drivers / fixed traffic lights              100
probabilistic drivers / fixed traffic lights       149
greedy drivers / greedy traffic lights             106
probabilistic drivers / greedy traffic lights      143
greedy drivers / Qlearning traffic lights          233
probabilistic drivers / Qlearning traffic lights   280

Greedy or Probabilistic Drivers; Fixed Traﬃc Lights. In the case of

probabilistic drivers, the attl5t is 149 time units, while it is 100 if drivers

act greedily. The higher travel time is the price paid for the experimentation

that drivers continue doing, even though the optimal policy was achieved long

before (remember that the attl5t is computed only over the last 5 trips). The

greedy action is of course much better after the optimal policy was learned.

In the beginning of a simulation run, when experimentation does pay oﬀ, the

probabilistic driver performs better.

Notice that the travel time with greedy drivers (100) is slightly better than the one found by the

heuristic optimization tool described before, which was 105. In summary, greedy

actions by the drivers work because they tend to select the routes with the short-

est path and this normally distributes drivers more evenly than the case where

drivers take longer routes.

Greedy or Probabilistic Drivers; Greedy Traﬃc Lights. When Traﬃc

Lights also act greedily we can see that this does not automatically improve the

outcome (in comparison with the case in which Traﬃc Lights are ﬁxed): the attl5t

is 106. This happens because the degree of freedom of Traﬃc Lights’ actions is

low, as actions are highly constrained. For example, acting greedily can be highly

sub-optimal when, for instance, traffic light A serves direction D1 (thus keeping

D2 with red light) but the downstream flow of D1 is already jammed. In this

case, the light might indeed provide green for vehicles on D1 but these cannot

move due to the downstream jam. Worse, a jam may appear on the previously

un-jammed D2 too, due to the small share of green time. This explains why

acting greedily at Traﬃc Lights is not necessarily a good policy. The travel time

of 106, when compared to the travel time found by the centralized optimization

tool (105), is of course similar. This is not surprising because the decentralized

strategy does exactly the same as the centralized optimizer, namely drivers use

their best route and Traﬃc Lights optimize greedily.

Q-Learning Traffic Lights. We expected Q-learning to perform badly because it is already known that it does not perform well in noisy and non-stationary traffic scenarios [13]. In order to test this, we have implemented a Q-learning mechanism in the traffic lights. Available actions are: to open the phase serving either one direction (e.g. D1), or the other (D2). The states are the combination of abstract states in both approaching links, i.e. {D1 jammed, D1 not jammed} × {D2 jammed, D2 not jammed}.
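The state abstraction and the learning step can be sketched as standard tabular Q-learning. The learning rate (0.1), the discount rate (0.9), and the 50% jam threshold are the values given in Section 4.1; the reward signal is not specified in the text, so the caller has to supply one (for instance, a penalty per jammed approach, which is purely an assumption). Names such as `abstract_state` and `q_update` are hypothetical.

```python
ALPHA = 0.1     # learning rate from Section 4.1
DISCOUNT = 0.9  # discount rate from Section 4.1
ACTIONS = ("green_D1", "green_D2")


def abstract_state(occ_d1, occ_d2, threshold=0.5):
    """Map link occupancies to (D1 jammed?, D2 jammed?)."""
    return (occ_d1 > threshold, occ_d2 > threshold)


def q_update(q, state, action, reward, next_state):
    """One tabular Q-learning step; unseen entries default to 0."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + DISCOUNT * best_next - old)
    return q[(state, action)]


def best_action(q, state):
    """Greedy action with respect to the current Q-table."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
```

With four abstract states and two actions, the table has only eight entries, which is consistent with the remark that the weak results stem from non-stationarity rather than from the state discretization.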

The low performance of Q-learning in traﬃc scenarios is due basically to the

fact that the environment is non-stationary, not due to the poor discretization of

states. Convergence is not achieved before the environment changes again, and

thus Traﬃc Lights remain in the experimentation phase.

4.4 Scenarios with More Drivers

For more than 400 drivers, we only investigate the cases of greedy drivers / fixed

Traﬃc Lights versus the scenario in which both drivers and Traﬃc Lights act

greedily. This was done in order to test whether or not increasing volume of

traﬃc (due to increasing number of drivers in the network) would cause greedy

Traﬃc Lights to perform better. This is expected to be the case since once the

number of drivers increases, greedy actions by the drivers alone do not bring

much gain; some kind of control in the Traffic Lights is expected to be helpful in

case of high occupancy of the network. Notice that 400, 500, 600 and 700 drivers

mean an average occupancy of ≈40%, 47%, 59%, and 72% per link respectively.

In Table 2 the attl5t for these numbers of drivers is shown. The case for

400 drivers was discussed above. With 600 or more drivers, the attl5t is lower

when Traﬃc Lights also act greedily. In the case of 700 drivers, the improvement

in travel time (411 versus 380) is about 8%. Thus, the greedy traﬃc lights are

successful in keeping the occupancy of links lower, resulting in a reduction of

travel times.

Table 2. Average Travel Time Last 5 Trips for Different Number of Drivers and Different Adaptation Schemes

                                          Average Travel Time Last 5 Trips
                                          Nb. of Drivers
Type of Simulation                        400   500   600   700
greedy drivers / fixed traffic lights     100   136   227   411
greedy drivers / greedy traffic lights    106   139   215   380
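The relative differences between the two rows of Table 2 can be checked with a few lines of arithmetic. The values are copied from the table; the variable and function names are only illustrative.

```python
# attl5t values from Table 2, keyed by number of drivers.
FIXED_LIGHTS = {400: 100, 500: 136, 600: 227, 700: 411}
GREEDY_LIGHTS = {400: 106, 500: 139, 600: 215, 700: 380}


def relative_change(num_drivers):
    """Change in percent when the lights switch from fixed to greedy.

    Negative values mean that greedy traffic lights reduce travel time.
    """
    fixed = FIXED_LIGHTS[num_drivers]
    greedy = GREEDY_LIGHTS[num_drivers]
    return (greedy - fixed) / fixed * 100.0


for n in sorted(FIXED_LIGHTS):
    print(n, round(relative_change(n), 1))
```

The sign of the change flips between 500 and 600 drivers, and for 700 drivers the reduction is roughly 7.5%, in line with the figure of about 8% quoted in the text.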

4.5 Overall Discussion

In the experiments presented, one can see that diﬀerent strategies concerning the

adaptivity of drivers, as well as of Traﬃc Lights have distinct results in diﬀerent

settings. We summarize here the main conclusions.

For the 6×6 network depicted, increasing the link capacity from 15 to 20 would lead to the same travel time levels that we have achieved without this increase in capacity, i.e. by substituting a better use of the available infrastructure for the capacity increase. This is important because increasing network capacity is not always economically feasible, so that other measures must be taken. Diverting people by giving them information has only a limited effect. Thus the idea is to use the control infrastructure in a more intelligent way. Therefore, we have explored the capability of the Traffic Lights to cope with the increasing demand.

Regarding travel time, it was shown that the strategies implemented in the

Traﬃc Lights pay oﬀ in several cases, especially when the demand increases. We

have also measured the number of drivers who arrive before time t_out. This is not

shown here but, to give a general idea of the ﬁgures, bad performance (around

75% arrived) was seen only when the drivers adapt probabilistically. The general

trend is that when the traﬃc lights also adapt, the performance increases, for

all metrics used.

Regarding the use of Q-learning, as noted, single-agent learning (i.e. each agent learns in isolation using Q-learning) is far from optimal here due to the non-stationary nature of the scenario. This is especially true for links located close to the main destination and the main street: they tend to be part of every driver's trip, so the pattern of vehicle volume changes dramatically. A possible solution is to use collaborative Traffic Lights. In this case, traffic light A would at least ask/sense traffic light B downstream whether or not it shall act greedily. This, however, leads to a cascade of dependencies among the Traffic Lights. In the worst case, everybody has to consider everybody's state. Even if this is done in a centralized way (which is far from desirable), the number of state-action pairs prevents the use of multiagent Q-learning in its standard formulation.
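To give a rough sense of this scaling problem, the sketch below contrasts the table size of independent learners with that of a single learner over the joint state and joint action, next to the standard one-step Q-learning update [18]. All numbers are illustrative assumptions: 36 traffic lights assumes one light per node of the 6×6 network, and 3 states and 3 actions per light, as well as the learning rate and discount factor, are placeholders rather than the values used in the experiments.

```python
def independent_q_entries(n_agents, n_states, n_actions):
    """Total Q-table entries when every agent learns on its own local state."""
    return n_agents * n_states * n_actions


def joint_q_entries(n_agents, n_states, n_actions):
    """Q-table entries for one learner over the joint state and joint action."""
    return (n_states ** n_agents) * (n_actions ** n_agents)


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update on a dict-based Q-table.

    alpha (learning rate) and gamma (discount factor) are assumed values.
    Missing table entries are treated as 0.0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


if __name__ == "__main__":
    n_lights, n_states, n_actions = 36, 3, 3  # hypothetical sizes
    print("independent learners:",
          independent_q_entries(n_lights, n_states, n_actions))
    print("joint learner:",
          joint_q_entries(n_lights, n_states, n_actions))
```

The independent case grows linearly with the number of traffic lights, whereas the joint table grows exponentially, which is why the standard multiagent formulation becomes impractical even for a modest grid.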

5 Conclusion

Several studies and approaches exist for modelling travellers’ decision-making.

In commuting scenarios in particular, probabilistic adaptation in order to max-

imize private utilities is one of those approaches. However, there is hardly any

attempt to study what happens when both the driver and the traﬃc light use

some evolutionary mechanism in the same scenario or environment, especially if

no central control exists. In this case, co-adaptation happens in a decentralized

fashion. This is an important issue because, although ITS have reached a high

technical standard, the reaction of drivers to these systems is largely unknown. In

general, the optimization measures carried out in the traﬃc network both aﬀect

and are aﬀected by drivers’ reactions to them. This leads to a feedback loop that

has received little attention to date. In the present paper we have investigated

this loop by means of a prototype tool constructed in an agent-based simulation

environment. This tool has modules to cope with the demand and the supply

sides, as well as to implement the ITS modules and the algorithms for learning,

adaptation, etc.

Results show an improvement regarding travel time and occupancy (thus, both

the demand and supply side) when all actors co-evolve, especially in large-scale

situations e.g. involving hundreds of drivers. This was compared with situations

in which either only drivers or only Traﬃc Lights evolve, in diﬀerent scenarios,

and with a centralized optimization method.

This work can be extended in many directions. First, we are already working to integrate the tools developed independently by the authors for supply and demand, namely ITSUMO [14] and MATSim (http://www.matsim.org/), which are simulators with far more capabilities than the prototype described here and allow the modeling of even more realistic scenarios. For instance, drivers' trips can be described in MATSim in a richer way, including activities that compose a trip such as dropping children at school, shopping, etc. The results are not expected to differ in the general trends, though, unless en-route adaptation is added.

Therefore, a second extension relates to the implementation of en-route adap-

tation of drivers in reaction to the perception of jammed links.

Finally, another extension is the use of heuristics for multiagent reinforcement

learning in order to improve its performance. This is not trivial as it is known

that reinforcement learning for non-stationary environments is a hard problem,

especially when several agents are involved. In this context we also want to test

a scenario where drivers and traﬃc lights learn taking turns.

Acknowledgments

The authors would like to thank CAPES (Brazil) and DAAD (Germany) for

their support to the joint, bilateral project “Large Scale Agent-based Traﬃc

Simulation for Predicting Traﬃc Conditions”. Ana Bazzan is partially supported

by CNPq and Alexander von Humboldt Stiftung; Denise de Oliveira is supported

by CAPES.

References

1. Bazzan, A.L.C.: A distributed approach for coordination of traﬃc signal agents.

Autonomous Agents and Multiagent Systems 10(1), 131–164 (2005)

2. Camponogara, E., Kraus, J.W.: Distributed learning agents in urban traﬃc control.

In: Moura-Pires, F., Abreu, S. (eds.) EPIA 2003. LNCS (LNAI), vol. 2902, pp. 324–

335. Springer, Heidelberg (2003)

3. Dresner, K., Stone, P.: Multiagent traﬃc management: A reservation-based inter-

section control mechanism. In: Jennings, N., Sierra, C., Sonenberg, L., Tambe,

M. (eds.) The Third International Joint Conference on Autonomous Agents and

Multiagent Systems, pp. 530–537. IEEE Computer Society, Los Alamitos (2004)

4. Klügl, F., Bazzan, A.L.C.: Route decision behaviour in a commuting scenario.

Journal of Artiﬁcial Societies and Social Simulation 7(1) (2004)

5. Klügl, F., Herrler, R., Oechslein, C.: From simulated to real environments: How

to use SeSAm for software development. In: Schillo, M., Klusch, M., Müller, J.,

Tianfield, H. (eds.) Multiagent System Technologies. LNCS (LNAI), vol. 2831, pp.

13–24. Springer, Heidelberg (2003)

6. Liu, R., Van Vliet, D., Watling, D.: Microsimulation models incorporating both

demand and supply dynamics. Transportation Research Part A: Policy and Prac-

tice 40(2), 125–150 (2006)

7. Nagel, K., Schreckenberg, M.: A cellular automaton model for freeway traﬃc. Jour-

nal de Physique I 2, 2221 (1992)

8. Nunes, L., Oliveira, E.C.: Learning from multiple sources. In: Jennings, N., Sierra,

C., Sonenberg, L., Tambe, M. (eds.) AAMAS. Proceedings of the 3rd International

Joint Conference on Autonomous Agents and Multi Agent Systems, vol. 3, pp.

1106–1113. IEEE Computer Society, Los Alamitos (2004)

9. Oliveira, D., Bazzan, A.L.C.: Traﬃc lights control with adaptive group formation

based on swarm intelligence. In: Dorigo, M., Gambardella, L.M., Birattari, M.,

Martinoli, A., Poli, R., Stützle, T. (eds.) ANTS 2006. LNCS, vol. 4150, pp. 520–

521. Springer, Heidelberg (2006)

10. Oliveira, D., Bazzan, A.L.C., Lesser, V.: Using cooperative mediation to coordinate

traﬃc lights: a case study. In: AAMAS. Proceedings of the 4th International Joint

Conference on Autonomous Agents and Multi Agent Systems, pp. 463–470. IEEE

Computer Society, Los Alamitos (2005)

11. Ortúzar, J., Willumsen, L.G.: Modelling Transport, 3rd edn. John Wiley & Sons,

Chichester (2001)

12. Roess, R.P., Prassas, E.S., McShane, W.R.: Traﬃc Engineering. Prentice Hall,

Englewood Cliﬀs (2004)

13. Silva, B.C.d., Basso, E.W., Bazzan, A.L.C., Engel, P.M.: Dealing with non-

stationary environments using context detection. In: Cohen, W.W., Moore, A.

(eds.) ICML. Proceedings of the 23rd International Conference on Machine Learn-

ing, pp. 217–224. ACM Press, New York (2006)

14. Silva, B.C.d., Junges, R., Oliveira, D., Bazzan, A.L.C.: ITSUMO: an intelligent

transportation system for urban mobility. In: Stone, P., Weiss, G. (eds.) AAMAS

2006 - Demonstration Track. Proceedings of the 5th International Joint Conference

on Autonomous Agents and Multiagent Systems, pp. 1471–1472. ACM Press, New

York (2006)

15. Tumer, K., Wolpert, D.: A survey of collectives. In: Tumer, K., Wolpert, D. (eds.)

Collectives and the Design of Complex Systems, pp. 1–42. Springer, Heidelberg

(2004)

16. van Zuylen, H.J., Taale, H.: Urban networks with ring roads: a two-level, three

player game. In: TRB. Proc. of the 83rd Annual Meeting of the Transportation

Research Board (January 2004)

17. Wardrop, J.G.: Some theoretical aspects of road traﬃc research. In: Proceedings

of the Institute of Civil Engineers, vol. 2, pp. 325–378 (1952)

18. Watkins, C.J.C.H., Dayan, P.: Q-learning. Machine Learning 8(3), 279–292 (1992)

19. Wiering, M.: Multi-agent reinforcement learning for traﬃc light control. In: ICML

2000. Proceedings of the Seventeenth International Conference on Machine Learn-

ing, pp. 1151–1158 (2000)