scieee Science in your language
[en] (orig)
Forschungsberichte
der Fakultät IV – Elektrotechnik und Informatik
Bayesian inference for
motion control and planning
Marc Toussaint
Bericht-Nr. 2007 - 22
ISSN 1436 - 9915
Bayesian inference for motion control and planning
Marc Toussaint
Machine Learning group, TU Berlin
Franklinstr. 28/29, FR 6-9, 10587 Berlin, Germany
December 21, 2007
Abstract. Bayesian motion control and planning is based on the idea of fusing motion ob-
jectives (constraints, goals, priors, etc) using probabilistic inference techniques in a way simi-
lar to Bayesian sensor fusing. This approach seems promising for tackling two fundamental
problems in robotic control and planning: (1) Bayesian inference methods are an ideal candi-
date for fusing many sources of information or constraints usually employed in the sensor
processing context. Complex motion is characterised by such a multitude of concurrent con-
straints and tasks and the Bayesian approach provides a solution of which classical solutions
(e.g., prioritised inverse kinematics) are a special case. (2) In the future we will require plan-
ning methods that are not based on representing the system state as one high-dimensional
state variable but rather cope with structured state representations (distributed, hierarchical,
hybrid discrete-continuous) that more directly reflect and exploit the natural structure of the
environment. Probabilistic inference offers methods that can in principle handle such repre-
sentations. Our approach will, for the first time, allow to transfer these methods to the realm
of motion control and planning.
The first part of this technical report will review standard optimal (motion rate or dynamic)
control from an optimisation perspective and then derive Bayesian versions of the classical
solutions. The new control laws show that motion control can be framed as an inference prob-
lem in an appropriately formulated probabilistic model. In the second part, by extending the
probabilistic models to be Markovian models of the whole trajectory, we show that probabilis-
tic inference methods (belief propagation) yield solutions to motion planning problems. This
approach computes a posterior distribution over trajectories and control signals conditioned
on goals and constraints.
1 Introduction
Planning in high-dimensional robotic systems is a fundamental and critical problem when
the goal are more autonomous, problem-solving robots in natural environments. Many
behaviours that we would like robots to exhibit, such as autonomously reasoning about
tools to manipulate objects, considering different paths to a goal, and generally coping with
unexperienced situations, cannot be based on purely reactive behaviour and require some
kind of reasoning. Real-world robotic environments are highly structured. The scalability
of planning and reasoning methods to cope with complex problems in such environments
crucially depends on exploiting this structure.
We propose a new approach to motion control and planning in robotics based on proba-
1
bilistic inference. The method uses structured probabilistic models to represent the scenario
and efficient inference techniques (belief propagation) to solve planning problems. The
idea is that Bayesian methods are ideal candidates to find solutions to problems combining
multiple criteria or sources of information and hence are a standard tool for sensor pro-
cessing and fusion. Similarly, we investigate Bayesian methods to solve a fusion problem
on the motor level. Further, probabilistic inference techniques can be applied to structured
representations and hence are a promising approach for coping with models that reflect the
structuredness of natural environments.
In contrast most approaches to planning in robotics try to directly cope with the high-
dimensional state space, using smart heuristics to find paths in this space, e.g., using prob-
abilistic road maps, or Rapidly Exploring Random Trees (Kavraki, Svestka, Latombe, &
Overmars 1996; Kuffner & LaValle 2000). Although these approaches provide practical
solutions to many real-world applications (such as foot step planning), the fundamental
question of scalability remains unsolved when the full high-dimensional state space has to
be considered. In particular, structural knowledge about the system is not exploited. What
we aim for is a new approach to planning based on a factored representation of state, i.e.,
based on models of the environment and the robot that use multiple variables to describe
the current state. These variables might represent different “parts” of the state (such as
different body parts, different objects, or different constraints), but they may also provide
different levels of abstractions of the current state (“situations”) as in hierarchical models.
Both fields, Reinforcement Learning (RL) and Probabilistic inference, developed techniques
to address structured models. For instance, in the field of RL, they include work on Fac-
tored Markov Decision Processes (Boutilier et al. 1995; Koller & Parr 1999; Guestrin et al.
2003; Theocharous et al. 2004; Kveton & Hauskrecht 2005), abstractions (Hauskrecht et al.
1998), and relational models of the environment (Zettlemoyer et al. 2005). On the other
hand, recent advances in inference techniques show how structure can be exploited both
for exact inference as well as making efficient approximations. Examples are message pass-
ing algorithms (loopy BP, Expectation Propagation), variational approaches, approximate
belief representations (particles, Assumed Density Filtering, Boyen-Koller) and arithmetic
compilation (see, e.g., Minka 2001; Murphy 2002; Chavira et al. 2006).
Applied to the realm of planning, the latter inference techniques would provide a funda-
mentally new perspective on decomposing planning processes. The idea of using proba-
bilistic inference for planning in Markov Decision Processes is not new. Ng, Parr, & Koller
(1999) use inference techniques to efficiently compute the policy gradient for gradient-
based policy search. From a complexity point of view, the equivalence between inference
and planning is well-known (see, e.g., Littman et al. 2001). Bui, Venkatesh, & West (2002)
have used inference on Abstract Hidden Markov Models for policy recognition, i.e., for
reasoning about executed behaviours, but do not address the problem of computing op-
timal policies from such inference. Attias (2003) proposed a framework which suggests
a straight-forward way to translate the problem of planning to a problem of inference: A
Markovian state-action model is assumed, which is conditioned on a start state x0and a
goal state xT. However, this approach does not compute an optimal solution in the sense of
maximising an expected future reward and the total time Thas to be fixed. Raiko & Tornio
(2005) introduced the same idea independently as optimistic inference control. Verma &
Rao (2006) used inference to compute plans (considering the maximal probable explana-
tion (MPE) instead of the MAP action sequence) but again the total time has to be fixed and
the plan is not optimal in the expected return sense. Recently, (Toussaint & Storkey 2006;
Toussaint, Harmeling, & Storkey 2006) developed a framework which exactly translates the
problem of computing an optimal policy in a Markov Decision Process to a likelihood max-
imisation problem. The latter is then solved using inference in the E-step and a standard
policy update in the M-step. Also the work by Wiegerinck, van den Broek, & Kappen (2006)
is related to translating the problem of planning to a free energy minimisation problem.
2
Section 2 will briefly review classical control laws as a precursor to the reformulation in
Bayesian terms presented in section 3. Once formulated this way it is straight-forward to
extend the approach to probabilistically reason about whole trajectories. We will do this in
section 4 and thereby derive algorithms for Bayesian motion planning. We conclude this
report with a discussion in section 6.
2 Preliminaries: the loss function of classical control laws
In this section we will first review classical control laws as minimisation of a simple loss
function. Since this loss function has a Bayesian interpretation it will be straight-forward to
develop also a Bayesian view on these control laws. The Bayesian inference approach can
then be generalised to what we actually aim for: motion planning in temporal probabilistic
models, as discussed in section 4.
Let xRdand qRn. Given x, consider the problem of finding qthat minimises
L=||xJq||2
C-1+||q||2
W2hTWq (1)
where ||q||2
W=qTWq denotes a norm and Cand Ware symmetric positive definite matri-
ces. This loss function can be interpreted as follows: The first term “measures” how well a
constraint x=Jq is fulfilled relative to a covariance matrix C, the second term “measures”
q= 0 with metric W, the third term “measures” the scalar product between qand hw.r.t.
W.
The solution can easily be found by taking the derivative
L
q =2(xJq)TC-1J+ 2qTW2hTW
L
q = 0 q= (JTC-1J+W)-1(JTC-1x+Wh)
Let us collect some identities which will allow us to rewrite this solution in several forms.
The Woodbury identity
(JTC-1J+W)-1JTC-1=W-1JT(JW-1JT+C)-1,
holds for any positive definite Cand W. Further we have the identity
In(JTC-1J+W)-1JTC-1J= (JTC-1J+W)-1W . (2)
We define the pseudo-inverse of Jw.r.t. Was
J]
W=W-1JT(JW-1JT)-1
and similar quantity as
J\
C= (JC-1JT)-1JTC-1.
Using these identities and definitions we can rewrite the solution in several forms:
q= (JTC-1J+W)-1(JTC-1x+Wh)(3)
= (JTC-1J+W)-1JTC-1x+ [In(JTC-1J+W)-1JTC-1J]h
= (JTC-1J+W)-1JTC-1(xJh) + h
3
=W-1JT(JW-1JT+C)-1x+ [InW-1JT(JW-1JT+C)-1J]h(4)
=W-1JT(JW-1JT+C)-1(xJh) + h .
This also allows us to properly derive the following limits:
C0 : q=J]
Wx+ (InJ]
WJ)h=J]
W(xJh) + h(5)
W0 : q=J\
Cx+ (InJ\
CJ)h=J\
C(xJh) + h
W=λIn:q=JT(JJT+λC)-1x+ [InJT(JJT+λC)-1J]h(6)
C=σId:q= (JTJ+σW)-1JTx+ [In(JTJ+σW)-1JTJ]h
These limits can be interpreted as follows. C0: we need to fulfil the constraint x=Jq
exactly. C=σId: we use a standard squared error measure for xJq.W0: we do not
care about the norm ||q||2
W(i.e., no regularisation); but interestingly, the cost term hTWq has
a nullspace effect also in this limit. W=λIn: we use a standard ridge as regulariser.
The simple loss function (1) has many applications, as we discuss in the following.
2.1 Ridge regression
In ridge regression, when we have dsamples of n-dimensional inputs and 1D outputs, we
have a minimisation problem
L=||yXβ||2+λ||β||2
with a input data matrix XRd×n, an output data vector yRdand a regressor βRn.
The first term measures the standard squared error (with uniform output covariance C=
Id), the second is a regulariser (or stabiliser) as introduced by Tikhonov. The special form
λ||β||2of the regulariser is called ridge. Replacing the previous notation with q β,x y,
J X,C Id,W λIn,h 0, equation (6) tells us the solution
β=XT(XXT+λId)-1y .
In the Bayesian interpretation of ridge regression, the ridge λ||β||2defines a prior exp{−1
2λ||β||2}
over the regressor β. The above equation gives the MAP β. Since ridge regression has a
Bayesian interpretation, also standard motion rate control, as discussed shortly, will have a
Bayesian interpretation.
2.2 Redundant motion rate control
Consider a robot with nDoFs and a d-dimensional endeffector (d<n). The current joint
state is q. In a given state we can compute the end-effector Jacobian Jand we are given
a joint space potential H(q). We would like to compute joint velocities ˙qwhich fulfil the
task constraint ˙x=J˙qwhile minimising the absolute joint velocity || ˙q||2
Wand following the
negative gradient h=W-1H(q). The solution is readily given by equation (5) when
replacing the previous notation q ˙q,x ˙x:
˙q=J]
W˙x(InJ]
WJ)W-1H(q).(7)
2.3 Redundant dynamic control
Consider a robot with dynamics ¨q=M-1(u+F), where Mis some generalised mass matrix,
Fsubsumes external (also Coriolis and gravitational) forces, and uis the n-dimensional
4
torque control signal. We want to compute a control signal uwhich generates an accelera-
tion such that a general task constraint ¨x=J¨q+˙
J˙qremains fulfilled while also minimising
the absolute norm ||u||2
Wof the control. Reformulating the constraint as ¨x˙
J˙qJM-1(u+
F)=0, replacing the previous notation as q u,x ¨x˙
J˙qJM-1F,J JM-1, and
taking the strict constraint limit C 0we can consult equation (5) and get the solution
u=T]
W(¨x˙
J˙qTF)+(InT]
WT)h , with T=JM-1,(8)
which, for h= 0, is identical to Theorem 1 in (Peters et al. 2005). Peters et al. discuss in
detail important special cases of this control scheme.
2.4 Multiple variables and prioritisation
The equations trivially extend to the case when we have multiple task variables x1, .., xm
with xidi-dimensional, Jacobians J1, .., Jmfor each task variable and, we want to minimise
L=X
i
||xiJiq||2
C-1
i
+||q||2
W+ 2hTWq .
This is the same as combining all task variables to a single variable x= (x1, .., xm)(d-
dimensional with d=Pidi) and defining the covariance matrix C=diag(C1, .., Cm)to be
the block matrix with sub-matrices Ci. For instance, equation (3) decomposes into
˙q= (X
i
JT
iC-1
iJi+W)-1[X
k
JT
kC-1
k˙xi+Wh].
Classical prioritised inverse kinematics (Nakamura, Hanafusa, & Yoshikawa 1987; Baer-
locher & Boulic 2004) addresses the multiple variable case from a different perspective. It
assumes that the task variables x1, .., xmare given strict priority (w.l.o.g. following our in-
dexing). The control law is based on standard motion rate control but iteratively projects
each desired task rate ˙xiin the remaining nullspace of all higher level control signals. Ini-
tialising the nullspace projection with N0=Iand δq0= 0, the control law is defined by
iterating for i= 1, .., m
¯
Ji=JiNi-1(9)
˙qi= ˙qi-1+¯
J]
i( ˙xiJi˙qi-1)(10)
Ni=Ni-1¯
J]
i¯
Ji.(11)
We call ¯
Jia nullspace Jacobian which has the property that ¯
J]
iprojects to changes in qthat
do not change control variables x1,..,i-1with higher priority. Also an additional nullspace
movement hin the remaining nullspace of all control signals can be included when defining
the final control law as
b˙q= ˙qm+Nmh . (12)
In effect, the first task rate ˙x1is guaranteed to be fulfilled exactly (up to 1st order). The
second ˙x2is guaranteed to be fulfilled “as best as possible” given that ˙x1must be fulfilled,
et cetera. Handling many control signals (e.g., the over-determined case Pdi> n) is prob-
lematic with this control law since the nullspace Jacobians ¯
Jiwill become singular (with
rank < di) and the computation of the pseudo-inverse needs extra regularisation.
This prioritised inverse kinematics can be retrieved from the general equation (3) when
iteratively taking the limit Ci0starting with i= 1 up to i=m. This cascaded limit
can for instance be generated when defining Ci=miIdiand simultaneously taking the
limit 0. We could not find an elegant proof for the equality between this limit and the
above explained prioritised IK, but we confirmed this equivalence numerically for random
matrices.
5
(a)
qt
xt
qt-1
(b)
ut
˙qt
˙xt
˙qt-1
Figure 1: Graphical models of Bayesian control, (a) for motion rate control, (b) for dynamic
control.
3 Bayesian motion control
The previous section derived basic classical control laws from a single loss function. We
will now show that these control laws can also be understood in a Bayesian view: optimal
control means to “fuse different objectives” in a Bayesian way similar to how it is often
done for sensor fusion. However, here the objectives are given by the possible transition in
joint space and the desired endeffector position after a step. Indeed, the MAP solutions in
this Bayesian approach will exactly reproduce the classical control laws as a special case.
However, they generalise them in some respects (e.g., by the kind of control noise that can
be taken into account) and they will lead the way to temporal reasoning about control:
Bayesian motion planning.
In the following we will make use of some algebraic identities for Gaussians which are
summarised in the footnote.1
3.1 Redundant kinematic control
Consider random variables qt-1,qt,xtand their joint distribution according to Figure 1(a)
and
P(qt|qt-1) = N(qt|qt-1+h, W-1)
P(xt|qt) = N(xt|φ(qt), C)
1We define the Gaussian over xwith mean aand covariance matrix Aas the function
N(x|a, A) = 1
|2πA|1/2exp{− 1
2(xa)TA-1(xa)}
with property N(x|a, A) = N(a|x, A). We also define the canonical representation
N[x|a, A] = exp{− 1
2aTA-1a}
|2πA-1|1/2exp{− 1
2xTA x +xTa}
with properties
N[x|a, A] = N(x|A-1a, A-1),N(x|a, A) = N[x|A-1a, A-1]
N[x|a, A]N[x|b, B] = N[x|a+b, A +B]N(A-1a|B-1b, A-1+B-1)
N(F x +f|a, A) = 1
|F|N(x|F-1(af), F -1AF -T)
N[F x +f|a, A] = 1
|F|N[x|FT(aAf), F TAF ]
6
Here φ(q)is the task kinematics with Jacobian J=φ
q . We want to compute the posterior
over qtgiven the previous state qt-1and the constraint xt,
P(qt|xt, qt-1)P(xt|qt)P(qt|qt-1)
=N(xt|φ(qt), C)N(qt|qt-1+h, W -1)
N(xt|φ(qt-1) + J(qtqt-1), C)N(qt|qt-1+h, W -1)
=N(Jqt|Jqt-1+xtφ(qt-1), C)N(qt|qt-1+h, W -1)
N[qt|JTC-1Jqt-1+JTC-1(xtφ(qt-1)), JTC-1J]N[qt|W qt-1+W h, W]
N[qt|(JTC-1J+W)qt-1+JTC-1(xtφ(qt-1)) Wh, JTC-1J+W]
=N(qt|a, A)(13)
A= (JTC-1J+W)-1, a =qt-1+A[JTC-1(xtφ(qt-1))+Wh]
qMAP
t=qt-1+ (JTC-1J+W)-1[JTC-1(xtφ(qt-1))+Wh]
C0
=qt-1+J]
W(xtφ(qt-1)) + (InJ]
WJ)h
In the third line we use the linear approximation to the kinematics (as in the extended
Kalman filter). The fifth line switches to canonical form of the Gaussians and the sixth line
uses the rule for products of Gaussians (see the footnote). The MAP solution is the same as
equation (3) and hence the limit C0also equals (5).
3.2 Redundant dynamic control
Consider the model in Figure 1(b) with
P(ut) = N(ut|h, W-1)
P( ˙qt|˙qt-1, ut) = N( ˙qt|˙qt-1+M-1(ut+F), Q)
P( ˙xt|˙qt) = N( ˙xt|J˙qt, C)
We compute the posterior on utconditioned on a desired motion ˙xtand the previous mo-
tion ˙qt-1,
P(ut|˙xt,˙qt-1)X
˙qt
P( ˙xt|˙qt)P( ˙qt|˙qt-1, ut)P(ut)
=X
˙qt
N( ˙xt|J˙qt, C)N( ˙qt|˙qt-1+M-1(ut+F), Q)N(ut|h, W-1)
=X
˙qt
N[ ˙qt|JTC-1˙xt, JTC-1J]N[ ˙qt|Q-1˙qt-1+Q-1M-1(ut+F), Q-1]N(ut|h, W-1)
=Nh˙qt-1+M-1(ut+F)|JTA˙xt, JTAJiN[ut|Wh, W], A = (JQJT+C)-1
Nhut|M-TJTA( ˙xtJ˙qt-1JM-1F), M-TJTAJM-1iN[ut|Wh, W]
Nhut|TTA( ˙xtJ˙qt-1TF) + Wh, TTAT +Wi, T =JM-1
=N(ut|b, B)(14)
B= (TTAT +W)-1, b =B[TTA( ˙xtJ˙qt-1TF) + Wh]
uMAP
t= (TTAT +W)-1[TTA( ˙xtJ˙qt-1TF) + Wh](15)
Q0,C0
=T]
W( ˙xtJ˙qt-1TF)+(InT]
WT)h
The last line shows that the classical dynamical law (8) is a special case of this Bayesian
7
setup for Q0and C0. The solution (15) generalises this to account for control noise
Q6= 0 and potentially soft task constraints C6= 0.
3.3 Summary
We derived two new control laws in the previous sections, Bayesian motion rate control
(BMC)
qMAP
t=qt-1+ (JTC-1J+W)-1[JTC-1(xtφ(qt-1))+Wh](16)
=qt-1+W-1JT(JW -1JT+C)-1(xtφ(qt-1)) + [InW-1JT(JW -1JT+C)-1J]h ,
and Bayesian dynamic control (BDC)
uMAP
t= (TTAT +W)-1[TTA( ˙xtJ˙qt-1TF) + Wh](17)
=W-1TT(TW-1TT+A-1)-1( ˙xtJ˙qt-1TF)+[InW-1TT(TW-1TT+A-1)-1T]h ,
with A= (JQJT+C)-1and T=JM-1. In the seconds lines we used the matrix identities
to rewrite both equations to yield only inversions of d-dimensional matrices rather than
n-dimensional ones. These two control laws have the classical motion rate control (7) and
dynamic control (8) as limits,
lim
C0qMAP
t=qt-1+J]
W(xtφ(qt-1)) + (InJ]
WJ)h
lim
Q0
C0
uMAP
t=T]
W( ˙xtJ˙qt-1TF)+(InT]
WT)h .
We can now summarize some correspondences between the classical and the Bayesian
views in the following table
classical view Bayesian view
metric Wof the pseudo-inverse
J]
W=W-1JT(JW-1JT)-1
covariance W1of the transition prior
N(qt+1 |qt+h, W1)
nullspace motion (IJ]
WJ)hasymmetry hof the transition prior
N(qt+1 |qt+h, W1)
regulariser in the singularity-robust
˜
J]
W=W-1JT(JW-1JT+kId)-1
covariance Cof the task coupling
N(xt|φ(qt), C)
computation of an optimal motion mini-
mizing the loss (1)
computation of the motion posterior
As discussed in (Peters et al. 2005), special choices of regularisation Win the dynamic con-
trol, e.g., W=M-1,W=M-2, or W=Incorrespond to special classical control strategies,
e.g., Khatib’s operational space control for W=M-1. Below we will also briefly discuss
the close relation of Bayesian motion rate control to singularity-robust inverse kinematics.
However, the Bayesian control laws not only computes the MAP control signals but also
variance for the control signal in equations (13) and (14). Ordinarily these might not be of
much practical relevance. However, they can be exploited when including other criteria
for action selection. For instance, when the variance (as computed by (14)) along a cer-
tain dimension of the control signal uis very large this means that we do not require large
precision in the control along this dimension.
3.3.1 Comparison to singularity-robust inverse kinematics
The Bayesian control laws are closely related to classical approaches to deal with singu-
larities (where JJTis not invertible). One typically considers a singularity-robust inverse
8
(a) (b)
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0 10 20 30 40 50 60 70 80 90 100
variable errors
time step
e_1 BMC
e_1 pIK
e_2 BMC
e_2 pIK
e_3 BMC
e_3 pIK
e_4 BMC
e_4 pIK
E BMC
E pIK
(c)
0
0.5
1
1.5
2
2.5
3
0 10 20 30 40 50 60 70 80 90 100
trajectory q-length
time step
L BMC
L pIK
(d)
0
0.2
0.4
0.6
0.8
1
0 10 20 30 40 50 60 70 80 90 100
variable errors
time step
e_1 BMC
e_1 pIK
e_2 BMC
e_2 pIK
e_3 BMC
e_3 pIK
e_4 BMC
e_4 pIK
E BMC
E pIK
Figure 2: Snake benchmark with four concurrent constraints. (b) Errors ei=|xix
i|in
each control variable (thin lines) and the total error Piei(think lines) during the movement
using prioritised inverse kinematic (pIK, dashed) and a soft BMC (solid). (c) Accumulative
trajectory length in q-space over the movement. (d) Errors during a movement towards an
unreachable target.
(Nakamura & Hanafusa 1986)
˜
J]
W=W-1JT(JW-1JT+kId)-1,
which is the standard pseudo-inverse with an additional regularisation. The scale factor k
of the regularisation can heuristically be tuned to be non-zero in the neighbourhood of a
singularity and zero when sufficiently far away from singular configurations. For instance,
a common heuristic is
k=(k0(1 w/w0)2w < w0
0ww0
where wis some manipulability measure (e.g., pdet(JJT)or the minimum singular value
of J), and w0is a fixed threshold.
Clearly, this approach is equivalent to the Bayesian MAP solution when we reinterpret C
as a regularisation of the pseudo-inverse. Note that the covariance Cwas initially be inter-
preted as determining the standard deviation from the task constraint we are willing to ac-
cept. We may conclude that the Bayesian approach applied to control in a single time-slice
largely reproduces classical solution techniques in a different interpretation. However, the
Bayesian approach generalises also to reasoning over many time slices, which we discuss
in the next section and will lead to a Bayesian planning technique.
3.4 Basic illustration of Bayesian motion control
As these limits show, in standard situations Bayesian control will yield results very similar
to the classical control laws. One can also think of the Bayesian laws as the classical laws
with certain additional regularisers as long as we are far from singularities the two behave
very similar. In practise we will see a difference only in extreme cases. We illustrate this
here on a simple snake benchmark proposed by (Baerlocher & Boulic 2004). The problem
is to control a simulated planar snake configuration composed of 20 segments under four
constraints: (1) the centre of gravity should always remain vertically aligned with the foot,
(2) the goal for the first segment (foot) is to be oriented with 30from the ground, (3) the
positional goal for the last segment (finger) is at (.5,0,1), (4) the orientational goal for the
last segment (finger) is to be oriented parallel to the ground. Figure 2(a) shows the snake in
a valid target configuration. (Baerlocher & Boulic 2004) solve the problem using prioritised
IK with priority as listed above. We solve the problem by defining four variables x1, .., x4
according to the constraints above. For all task variables xiwe define that we want to
follow a target that linearly interpolates from the start position x0
i(straight upward posture)
9
to the goal position x
i,
xi,t =t
Tx0
i+Tt
Tx
i.
We investigate the following cases:
Reproducing hierarchical prioritisation We first assumed rather tight and prioritised
precisions C-1
i=νiIwith ν1:4 = (106,105,104,103). As a result, the joint trajectories gener-
ated with BMC and the classical prioritised inverse kinematics (9-12) are virtually indistin-
guishable: the max norm ||q1:Tq0
1:T||between the two trajectories is <0.01.
Smoother trajectories with BMC Assume that we require high task precision only at
the final time step T. An efficient way to realise this with BMC is to start with rather
soft precisions ν2:4 = 1 at t= 0 and then slowly increasing them to the desired precision
ν2:4 = (105,104,103)at t=T. As an exception we always keep the precision of the balance
constraint high, ν1= 106. The trajectory generated by BMC is now quite different from the
previous one: it is much smoother. For instance, the integrated length of the joint trajectory
L=Pt|qtqt-1|is L= 2.31 for BMC and L= 2.96 for pIK. Nevertheless, all target
variables meet the final goal constraint with high precision. Figure 2(b) shows the errors
ei=|xix
i|in the control variables and the total error E=Pieiduring the movement
for both approaches, and Figure 2(c) also plots the accumulative length L.
Conflicting and Infeasible constraints Assume we want to generate trajectories where
the snake curvature is minimal, as measured by a fifth variable x5=Pn
i=1 q2
iand a target
x
5= 0. This curvature constraint is in conflict with most other constraints. As a result, the
prioritised IK numerically breaks down when adding this fifth variable without further reg-
ularisation. In contrast, BMC (with constant ν1:5 = (106,101,101,101,100)) yields a smooth
movement which eventually fulfils the targets for x1:4 but additionally realises a final cur-
vature e5= 1.57494 much lower than without this additional constraint (e5= 3.11297).
In another case, assume we set a target (1,0, .5) for the last segment (finger) which is ac-
tually out of reach. BMC (with constant ν1:4 = (106,101,101,101)) yields smooth and but
balanced postures where necessarily the error e2increases. As can be seen in Figure 2(d),
classical pIK drastically diverges as soon as the hard constraint in finger position becomes
infeasible (at t= 75).
Illustration on a Humanoid reaching task Figure 6 shows the posture BMC creates for
a more challenging problem. A simulated humanoid figure (39 DoFs) is to reach a target
point with the right finger under a big obstacle standing on one foot without loosing bal-
ance. In particular collisions between head or arm and the obstacle become critical. We
introduced three control variables for the finger tip position, the centre of gravity, and a
global collision cost. The naive application of prioritised IK fails since the Jacobian of the
collision cost is highly non-stationary and most of the time singular (zero, in fact). Only
additional heuristics to switch on and off the collision constraint at appropriate times can
repair the classical approach. BMC (with ν1:3 = (103,103,106)) robustly generates a move-
ment from an upright initialisation to the posture seen in Figure 6.
10
x6
q1q2q3q4q5q6
x1x2x3x4x5
q0
Figure 3: The same graphical model as for redundant motion control (Figure 1(a)) but for
multiple time slices.
4 Bayesian motion planning
In the previous section we developed a Bayesian inference view on classical control laws.
One may consider them as mere reformulations of well-known classical laws including
some regularisations. In particular, in these ‘single time slice’ models one is typically only
interested in the MAP solutions and the additional covariances we derived do not seem of
much practical relevance.
This changes when we move on from single time slice models of the immediate control to
time extended models of the whole trajectory. The probabilistic inference approach natu-
rally extends to inference over time and will eventually become a planning method. For
instance, we discussed inference in Figure 1(a). What if we now do inference in Figure 3?
The resulting posterior over q1:6 will in some sense describe the set of likely trajectories
starting in q0that are consistent with the constraint x6.
The inference techniques in such temporal models typically have the flavour of forward-
backward algorithms, similar to the well-known Baum-Welch in HMMs. A modern and
very flexible description of such inference techniques is in terms of belief propagation or
message passing algorithms. These also extend to more complex temporal models with
many random variables in one time slice. In most realistic cases exact inference is infea-
sible because the shape of the probability distributions would be very complex (not part
of a simple parametric family of distributions). Again, belief propagation is a convenient
framework to handle approximations systematically by maintaining approximate belief
representations.
In this section we introduce Bayesian motion planning. Technically, this mainly amounts
to deriving the necessary messages for belief propagation in a factor graph describing the
problem. Therefore we start with a brief introduction to factor graphs.
4.1 Probabilistic models to represent planning problems
Factor graphs (Kschischang, Frey, & Loeliger 2001) are structured probability distributions
in the form
P(X1:n) = Y
C
fC(XC),
where C {X1, .., Xn}indexes different subsets of variables. Pictorially, such a structural
property of the probability distribution is illustrated with a (bi-partite) graph (e.g., see Fig-
ure 4) where the random variables Xiare one type of nodes (circles), and the factors fiare
of another type (black boxes).
Clearly, Bayesian Networks (which are distributions structured in the form P(X1:n) =
QiP(Xi|Xparents(i))) and also Dynamic Bayesian Networks as well as undirected graph-
ical models (e.g., Markov Random fields) fall into this category. Belief propagation on
11
µj0C
µjC
µSj
µCi
fS
Xj0
Xj
Xi
fC
Figure 4: Part of a factor graph. Circle nodes represent random variables Xi, black boxes
represent factor functionals fi=fi(XCi)where Ci {X1, .., Xn}are subsets of vari-
ables, the graph as a whole represents the structure of the joint probability distribution
P(X1:n) = Qifi(XCi). Some messages illustrate the defining equations (18) and (19) of
belief propagation.
factor graphs is defined by the following recursive equations for messages from factors to
random variables and messages from random variables to factors:
µCi(Xi) = X
XC\Xi
fC(XC)Y
jC\i
µjC(Xj) = 1
µiC(Xi)X
XC\Xi
bC(XC)(18)
µjC(Xj) = Y
{S:jS}\C
µSj(Xj) = 1
µCj(Xj)bj(Xj)(19)
with
bj(Xj) = Y
S:jS
µSj(Xj), bC(XC) = fC(XC)Y
jC
µjC(Xj).(20)
To clarify this notation, assume C={X1, X2, X3}, then PC\X2PX1PX3 and QjC\2
Qj∈{1,3}”. When a factor depends only on two variables, C={Xi, Xj}, we can simplify
the equations by directly passing a message from jto i,
µji(Xi) = X
Xj
fC(Xi, Xj)Y
{S:jS}\C
µSj(Xj) = X
Xj
fC(Xi, Xj)bj(Xj)
µij(Xj).(21)
For chains and trees it is straight-forward to show that belief propagation eventually com-
putes a belief biequal to the correct posterior marginal PXj:j6=iP(X1:n)(by comparison
with standard variable elimination). For loopy graphs there is no order to resolve these
recursive equations but BP can still be applied iteratively and in practise often is but
convergence is not guaranteed and if it converges then only to a certain (Bethe) approxi-
mation of the true marginals (Yedidia, Freeman, & Weiss 2001). Further discussion of BP is
beyond the scope of this paper, please refer to (Yedidia, Freeman, & Weiss 2001; Murphy
2002; Minka 2001) for more details. The given background should be sufficient to derive
BP algorithms for the motion planning scenarios we discuss in the following.
4.2 Motion planning under multiple constraints
Consider the model in Figure 5 with
P(qt|qt-1) = N(qt|qt-1, W-1)
P(xi,t |qt) = N(xi,t |φi(qt), Ci).
The choice of these factors should be apparent from the analogy with the single time slice
models and their relation to classical redundant motion rate control (section 2.2). All depen-
dencies are pair-wise and the DBN amounts to a simple factor graph with only pair-wise
12
x4,1
x3,2x3,T
x4,T
µqt-1qtµqt+1qt
qT
q2
q1
x2,2x2,T
x1,2
x1,1x1,T
x2,1
x3,1
x4,2
Figure 5: Factor graph for motion planning under multiple constraints. All coupling are
pair-wise so we omit indicating the factor as black boxes.
factors. We represent the current belief over qtas a Gaussian b(qt) = N(qt|ˆ
bt, Bt). Using
equations (20) and (21) we get the following recursive system of equations,
µqt-1qt(qt) = Rqt-1P(qt|qt-1)µx1:m,t-1qt-1(qt-1)µqt-2qt-1(qt-1) = N(qt|at, At),
At=W-1+ (A-1
t-1+Rt-1)-1, at= (A-1
t-1+Rt-1)-1(A-1
t-1at-1+rt-1)
µqt+1qt(qt) = Rqt+1P(qt+1|qt)µx1:m,t+1qt+1(qt+1)µqt+2qt+1(qt+1) = N(qt|zt, Zt),
Zt=W-1+ (Z-1
t+1+Rt+1)-1, zt= (Z-1
t+1+Rt+1)-1(Z-1
t+1zt+1+rt+1)
µx1:m,tqt(qt) =
m
Y
i=1
P(xi,t |qt)N(qt|R-1
trt, R-1
t)(22)
rt=Rtˆq+X
i
JT
iC-1
i(xi,t φi(ˆq)) , Rt=X
i
JT
iC-1
iJi
b(qt) = µqt-1qtµqt+1qtµx1:m,tqt=N(qt|ˆ
bt, Bt)(23)
B-1
t=A-1
t+Z-1
t+Rt,ˆ
bt=Bt[A-1
tat+Z-1
tzt+rt]
Here, in equation (22), we used a linearisation of each φiat ˆq. As with all the algorithms we
will derive, these equations tell us how to compute messages but they are recursive and we
need to specify the order in which we update beliefs with these equations. Here we apply
the following scheme:
(i) Initialise all b(qt)to be the uniform density (we use precision matrices), and b(q0) =
N(q0,1010).
(ii) Update all b(qt)iterating forward t= 1, .., T using equation (23). Linearise at ˆq=
hbold(qt)i, except for the first iteration, where we linearise at ˆq=µtt-1qt.
(iii) Update all b(qt)iterating backward for t=T1, .., 1using equation (23). Linearise at
ˆq=hbold(qt)i.
(iv) Iterate (ii) and (iii). (The points of linearisation will refine.)
Before exemplifying the approach, let us summarise some properties of this Bayesian kind
of motion planning.
(1) Clearly, BMC (equation (16)) is a special case of this motion planning for T= 1 and
if not iterating the belief propagation. Iterating BP makes a difference since the point ˆq
of linearisation of kinematics moves towards the final mean ˆ
b1and thus the result of BP
converges to an exact step rather than the usual 1st order approximation.
(2) This form of Bayesian motion planning allows us to continuously adjust the tightnesses
C-1
iof the control variables depending on time. For instance we can impose that a desired
13
Figure 6: Solution to a complex reaching movement under balance and collision con-
straints. The target for the right finger is illustrated as a black dot. Views from 3 different
perspectives.
fwd-bwd iterations kR|˙q|dt E =RPiei,t dt
½ (reactive controller) 13.0124 0.0873
1 7.73616 0.1366
7.70018 0.0071
2 7.68302 0.0027
5 7.65795 0.0013
10 7.62888 0.0012
Table 1: Trajectory length and control errors after different number of forward-backward
iterations of extended BMC. k=½ means a single forward pass and corresponds to the
reactive forward controller using the single time slice BMC. k= 1 additionally includes a
single backward smoothing.
control trajectory xi(t)is to be followed tightly in the early stage of the movement while
only loosely in a later stage.
(3) This form of Bayesian motion planning solves a generalisation of redundant movement
planning: Redundant planning means that a future goal is only determined by an endef-
fector constraint xT=x
Trather than a state space goal qT=q
T. By adjusting the tightness
(precision C-1) of a control variable to be 0for t= 1, .., T-1but tight in the last time step
t=Twe can reproduce the redundant planning problem. Our approach generalises this
in that we can have an arbitrary number of control variables, and can adjust the control
tightness e.g. to introduce intermediate goals or intermediate feasible ranges.
(4) Even when we require tightness in the controls over the full trajectory, the extended
BMC solves a planning problem in the remaining nullspace.
4.2.1 Illustration on a Humanoid reaching task
We again consider the problem illustrated in Figure 6 of reaching under an obstacle while
keeping balance. This time the desired motion is not defined by reactive dynamical sys-
tems but rather by trajectories xi,1:Tfor each control variable xi. We defined these to be
linear interpolations from the start state to the target with T= 100, while keeping the pre-
cisions ν1:3 = (103,103,106)constant over time. Table 1 displays the trajectory length and
control errors after some forward-backward iterations. Note that the first line k=½ (only
one forward pass) corresponds to the reactive application of the single time-slice BMC we
discussed in the previous section. For instance, if we fix the total computational cost to 3
times that of a simple forward controller (k= 1½ iterations) we find an improvement of
40.8% w.r.t. the trajectory length and 91.9% w.r.t. control errors of extended BMC versus
the forward controller. The reason for these improvements are that the forward controller
14
µt
qx
µt
xx0
µt
qq0
x3
q1
x1
q2
x2
q3
Figure 7: The factor graph for the decomposed planning scenario.
chooses a non-efficient path which first moves straight towards the obstacle and later need
longer motions to circumvent the obstacle. In contrast, the probabilistic smoothing of ex-
tended BMC leads to early nullspace movements (leaning to the right) which make the
later circumvention of the obstacle more efficient.
4.3 Decomposed planning
Consider the model in Figure 7 with factors
f1(qt+1, qt) = N(qt+1|qt, W-1)(24)
f2(qt) = N(qt|0,1.0)
f3(xt+1, xt) = N(xt+1|xt,0.1)
f4(xt, qt) = N(xt|φ(qt),0.001)
f5(xT) = N(xT|x
T,0.001)
The 1st factor is the usual transition prior in the joint state. The 2nd factor implies a prior
P(qt)which limits the joint range for simplicity we use again a Gaussian potential (q= 0
indicates the joint centres). The 3rd factor expresses a prior about the endeffector move-
ments since we do not have a strong prior about this, we assume a weak potential with
standard deviation of endeffector movements of 0.1. The 4th factor is the usual coupling
between joint state and endeffector. Generally, in factor graphs conditioning a variable can
be expressed as including a Kronecker factor. Hence, the 5th factor represents the goal
constraint, conditioning the target of the endeffector to be close to x
T.
This model is different to the previously discussed one in two respects: We condition only
on the final task state xT, and we included a transition prior also within the task space.
The reason we investigate this graph is because it allows for a qualitative new approach
to decompose planning. Li, Todorov, & Pan (2004, Todorov & Li (2004) have proposed an
algorithm for hierarchical planning which first computes and optimal trajectory only in the
task space and then in a second stage, constraint on this task space trajectory, computes
an optimal trajectory in joint space. Clearly, this approach is limited since the task space
trajectory was computed without any concern whether this task space trajectory might lead
to high costs when realised in joint space.
In the factor graph 7 we can follow a very similar approach to hierarchically decompose
planning but guarantee a more global optimality. It is straight forward to derive all nec-
essary messages for belief propagation from equation (20). The algorithm we propose is
given by the following message passing scheme:
(i) Initialise all beliefs uniformly, except for q0,x0and xT.
(ii) Update the task space beliefs
b(xt) = µxt-1xtµt
xt+1xtµt
qtxt,
15
(a)
(b)
Figure 8: (a) The belief over the endeffector trajectory after the first stage of inference
neglecting the coupling to the joint state. (b) The belief over the endeffector trajectory
after coupling to the joint state; also the MAP joint configurations for each time step are
displayed.
first iterating forward for t= 1, .., T , then iterating backward for t=T-1, .., 1. This
will yield a preliminary belief over possible trajectories in task space.
(iii) Update the q-space beliefs
b(qt) = µqt-1qtµqt+1qtµt
xtqt,
first iterating forward for t= 1, .., T , then iterating backward for t=T-1, .., 1. This ex-
actly as described in the previous section, using local linearisations of the kinematics
at ˆqt=hb(qt)i. This generates a belief over possible trajectories in q-space.
(iv) Iterate steps (ii) and (iii).
Conceptually, the most interesting aspect is that in step (ii) we do not compute a single
task space trajectory, but rather represent the whole variety of possible task space trajectories as
given by the beliefs. The coupling to the q-space then narrows down this variety according
to the prior in q-space. Iterating steps (ii) and (iii) means to propagate up (µt
qtxt) and
down (µt
xtqt) messages between the x-level and the q-level until coherence between both
levels is achieved.
4.3.1 Illustration on a planar arm
We would first like to illustrate the approach on a simple 3-link planar arm described by the
joint state qR3and the endeffector position xR2. We are given the initial configuration
q0= (.5, .5, .5), the endeffector target x
T= (1.5, .5) and T= 20.
Figure 8(a) displays the preliminary belief over endeffector states after the first stage of
inference (step (ii)). We find that at this stage, the belief over the endeffector trajectory is
simply a straight line with quite a large Gaussian variance associated with each via-point.
This “Gaussian worm” can be interpreted as the range of possible endeffector trajectories
neglecting the coupling to any other variables or constraints. All subsequent inference
steps will further refine and narrow down this initial belief by fusing it with messages
from the q-space. Figure 8(b) displays the belief over endeffector trajectories after a cycle
of inference steps (ii), (iii), (ii), i.e., the coupling to the joint state has been accounted for.
16
Figure 9: A humanoid upper body with n= 13 hinge joints. The hip is fixed, the right hand
is used as endeffector.
(a)
(b)
Figure 10: Results of probabilistic inference planning with an humanoid upper body.
Reaching to a target without obstacles, displayed from two different perspectives. We see
the MAP joint configuration and the Gaussian endeffector distribution (indicated as ellip-
soid) at different intermediate time steps. The optimal trajectory in joint space leads to a
curve trajectory in endeffector space.
Also the MAP joint configuration is displayed at each time step. We find that the MAP
endeffector trajectory is not anymore a straight line. The reason is that the constraints we
induced via the prior joint transition probability favours small steps in joint space.
4.3.2 Illustration with a humanoid upper body
As another illustration, consider the n= 13 joint humanoid upper body displayed in Figure
9. We take the right hand as the endeffector and plan a target reaching trajectory (of length
T= 50) to a goal in the upper left working domain of the robot.
Figures 10(a&b) display the result after 2 iterations of the inference steps (1-4), which pro-
vided sufficient convergence. The figures display the maximum a posteriori joint config-
uration (MAP, the maximum of the posterior joint state distribution) and the variance of
the endeffector distribution at different time steps. As we can see, the MAP endeffector
trajectory is not a straight line. We can give a more quantitative measure of the quality of
the trajectory: We compare the MAP joint trajectory computed via probabilistic inference
with the joint trajectory that results from a standard redundant control approach. More
precisely, the redundant control approach first presumes a straight endeffector line equally
divided in T= 50 segments and then used equation (7) for control.
We define a global trajectory cost using the q-space metric W,
C(q1,..,T ) =
t-1
X
t=1
||qt+1qt||W.
17
trajectory cost C(q1,..,T )
forward controller 11.19
MAP trajectory 8.14
Table 2: Global cost of joint space trajectories.
q2q3
c2
c1
x3
q1
x1x2
Figure 11: The factor graph including collision constraints.
Table 2 displays the trajectory costs for the trajectories computed via the forward controller
and the MAP trajectory computed via probabilistic inference. The MAP trajectory is clearly
more efficient in terms of this cost. This stems from the fact that equation (24) imposes a
prior transition likelihood f1(qt+1qt)N(q, W -1)which reflects exactly this metric in joint
space. The curve of the endeffector trajectory is a result of this efficiency.
4.3.3 Coupling with collision constraints
Coupling extra constraints into the system is straight-forward. To demonstrate this we
augment the model with a collision risk variable ct. In the experiment, we will consider
collisions of the endeffector with a table. Figure 11 displays the factor graph augmented
with this variable. Note that we decided to couple this variable not to a single endeffector
position xtbut rather to an endeffector transition given by a tuple (xt, xt+1). This turned out
to be more robust and accurate, particularly in view of the time discretisation used.
We define the risk variable with a non-linear sigmoidal function as follows. Let zt, ytR
be z- and y-coordinate of the endeffector position xtR3, let zbe the table height. Then
c(xt+1, xt) =
(0if zt, zt+1> z+δor zt, zt+1< zδ
ψ(yt+yt+1)otherwise
with δ=.02. Thus, the risk is zero if xtand xt+1are both either above or below the table,
and a sigmoid depending on the y-distance to the table corner otherwise. We used ψ(y) =
11
1+exp(3y). Just as for the kinematics, this risk function defines a coupling factor
f6(ct, xt+1, xt)N(c(xt+1, xt), .001) .
Importantly, we also impose a prior on the risk variable, effectively constraining it to be
low, by including the factor
f7(ct)N(0, .1) .
18
(a)
(b)
Figure 12: Reaching to a target above (a) the table and below (b) the table whilst avoiding
collision.
The other factors f1, .., f5remain the same as in the previous experiment, and together with
f6and f7define the factor graph in Figure 11. Since f6is not pair-wise but depends on three
variables, we maintain beliefs also for this factor. We apply the following message passing
scheme: 11, we choose
(i) Initialise all beliefs uniform, except for q0,x0,xT, and the collision variables ct.
(ii) Update all task space beliefs b(xt)iterating forward (t= 1, .., T) and backward (t=
T-1, .., 1).
(iii) Update all beliefs b(ct, xt, xt+1)for the f6factors for t= 1, .., T-1.
(iv) Update all task space beliefs b(xt)iterating forward (t= 1, .., T) and backward (t=
T-1, .., 1).
(v) Update all q-space beliefs b(qt)iterating forward (t= 1, .., T ) and backward (t=
T-1, .., 1).
(vi) Iterate steps (ii)-(v)
In the first iteration, step (ii) will compute a preliminary task space belief neglecting the
collision constraint, since the f6-beliefs are still uniform. In step (iv) the collision constraint
is then coupled into the task space belief.
Figures 12(a&b) display the result after two iterations of this message passing scheme for
T= 30. In the first case (Figure (a)) the target endeffector position is slightly above the
table and the generated movement avoids the obstacle. In the second case, the target is only
slightly displaced but now below the table. Here, the result is a rather straight trajectory. A
standard forward controller that follows a gradient of a superimposed target potential and
obstacle avoidance potential would typically get trapped under the table in the first case
19
ˆqt-3
ut
ˆqt
ˆqt-2
ut-2ut-1
ˆqt-1
˙xt
Figure 13: Dynamic control in the case of delayed feedback. Here, ˆqt= (qt,˙qt)subsumes
positions and velocities.
so we cannot present a quantitative comparison here. Also, the local target potential of a
reactive controller would hardly distinguish between the first and the second case.
The experiments ran on a simple Laptop with a 1.1GHz Centrino Mobile processor. The
first experiment (T= 50, without constraints, k= 2 inference sweeps) takes 3.56 seconds,
the second experiment (T= 50, with constraints, k= 2 sweeps) takes 3.97 seconds.
5 Online application issues
5.1 Handling delayed feedback
In realistic cases the sensor feedback on the precise robot configuration (qtand ˙qt) is de-
layed. Thus, for instance in the case of torque control, the direct Bayesian dynamic control
law (17) is not applicable. However, it is straight-forward to explicitly include the delay
in the probabilistic model and then apply Bayesian inference as before. Figure 13 is an ex-
ample for representing delayed feedback in the model here the feedback is delayed for 2
time steps. The Bayesian control law is then given by computing P(ut|ˆqt-3, ut-1, ut-2,˙xt). In
some sense, this approach naturally combines a probabilistic state estimation of ˆqt-1with
the ordinary Bayesian dynamic control.
5.2 Online plan update
Assume we have applied Bayesian motion planning to compute a posterior over q1:Tcon-
ditioned on q0and xT. Further assume that we want to use motion rate control to follow
this trajectory but the control is imprecise and might lead to perturbations form the MAP
sequence qMAP
1:T.
In the beginning, for t= 0, we may directly sent qMAP
1as the motion rate control. However,
for t= 1 we have additional sensor information on the actual state q1. To guarantee op-
timality we have to recompute the motion posterior conditioned on the new observation
q1.
Let us (unrealistically) assume that we have done exact inference without using approxi-
mations or linearisations. In this case, to compute the updated posterior motion over q2:T
we only have to recompute the forward messages! The backward messages remain un-
changed. This is a basic fact of inference over Markov chains. Further, for motion rate
control at time t= 1 we are actually not interested in the exact posterior over the whole
motion but only at the correct posterior over q2. Hence, we only need to recompute a single
20
x1x2
q
x3
0
T
T
0
0
T
Figure 14: Illustration of BMP in analogy to Bayesian sensor fusion: On m= 3 low-
dimensional control variables xiwe have desired trajectories and associated tolerances.
Bayesian motion planning takes these as “input” and “fuses” them to a posterior distribu-
tion over possible joint space trajectories.
message µq1q2. To conclude, in the (unrealistic) case of exact inference there is no need for
replanning because all the initially computed backward messages remain valid. This is fun-
damentally different from classical planning which only computes a single trajectory the
core of this difference is that inference computes full distributions which remain valid also
when perturbed from the MAP path (at least the backward messages remain valid).
In the realistic case of approximate inference using approximate belief representations (e.g.
Gaussians) and linearising the kinematics locally, the previous statement only holds ap-
proximately. We still compute full distributions instead of single trajectories which natu-
rally lends to handling also perturbations from the MAP path. However, the backward
messages do depend on the forward messages (and thereby on the start state conditioning)
because of the iterated belief updates (e.g., the point of linearisation of the kinematics de-
pends also on the forward messages). Hence, ideally we would have to recompute the full
posterior when we get the new observation of q1by performing a full forward-backward
propagation of messages. However, we may expect that many of the approximations (e.g.,
points of linearisations) hardly change during this update. In practise, it is reasonable to
only update the first forward message µq1q2as in the case of exact inference; and in certain
intervals of time update all forward and backward messages to account for the perturba-
tions. In conclusion, we might need some form of replanning occasionally when we expect
that the perturbations accumulated so large that the approximations (linearisations) we
have made to compute messages do not hold anymore.
6 Discussion
Bayesian motion control and planning is based on the idea of fusing motion objectives (con-
straints, goals, priors, etc) in a way similar to Bayesian sensor fusing, which is illustrated
again in Figure 14. We formulate probabilistic models which represent these objectives,
condition on constraints and goals, and use probabilistic inference to compute a posterior
distribution over the possible trajectories and control signals. This distribution is a repre-
sentation of the variety of likely trajectories and the corresponding control signals given
that constraints and goals must be met.
We have shown in section 3 that classical redundant motion rate control and dynamic con-
trol can be derived as special cases of Bayesian motion control when analogous proba-
bilistic models are formulated. These models are one-time-slice models. By extending the
approach to multiple time slices, i.e. inference over whole motion trajectories, we derived
new motion planning algorithms. Since inference techniques exist also for factored (dis-
tributed) state representations we gained some freedom in formulating the probabilistic
models. As examples we derived algorithms in the cases of multiple constraints and for
the case of decomposed planning in task and joint space.
An important aspect often discussed in the context of robotic planning is locality. Many
21
classical approaches like RRTs (Kuffner & LaValle 2000; Kuffner, Nishiwaki, Kagami, In-
aba, & Inoue 2003) try to tackle global planning problems, for instance where one first
has to move away from a goal before moving towards the goal becomes possible. Theo-
retically, planning based on exact inference would also generate globally correct posterior
distributions about all possible trajectories and thereby perform global planning. For dis-
crete MDPs this is feasible and has been demonstrated (Toussaint, Harmeling, & Storkey
2006). However, in almost any realistic robotics case exact inference is infeasible since this
would require to represent very complex distributions (beliefs) which are not part of a fea-
sible parametric family of distributions. For instance, in the algorithms we derived for BMP
we assumed Gaussian belief representations and had to use local linearisations to stay in
this simple family of belief representations. If we had tried exact inference in domains with
collision constraints, the exact beliefs would have had very complex, discontinuous forms.
Hence, in particular the Gaussian belief approximations effectively introduce a kind of “lo-
cality” since the set of likely trajectories is assumed close to some mean trajectory. If we
would use other kinds of belief representations, e.g. sample based representations (parti-
cle filters) or mixture of Gaussians, the planning would have a “more global character”.
In conclusion, the inference approach is certainly not miraculously solving the problem of
global planning. It is very much a matter of which approximations and belief representa-
tions are chosen which determine how global this approach is.
Acknowledgements
MT is grateful to Honda RI Europe for their hospitality as a guest scientist. MT also acknowledges
support by the German Research Society (DFG), Emmy Noether fellowship TO 409/1-3.
References
Attias, H. (2003). Planning by probabilistic inference. In C. M. Bishop & B. J. Frey (Eds.), Proc. of the
9th Int. Workshop on Artificial Intelligence and Statistics.
Baerlocher, P. & R. Boulic (2004). An inverse kinematic architecture enforcing an arbitrary number
of strict priority levels. The Visual Computer.
Boutilier, C., R. Dearden, & M. Goldszmidt (1995). Exploiting structure in policy construction. In
Proc. of the 14th Int. Joint Conf. on Artificial Intelligence (IJCAI 1995), pp. 1104–1111.
Bui, H., S. Venkatesh, & G. West (2002). Policy recognition in the abstract hidden markov models.
Journal of Artificial Intelligence Research 17, 451–499.
Chavira, M., A. Darwiche, & M. Jaeger (2006). Compiling relational bayesian networks for exact
inference. International Journal of Approximate Reasoning 42, 4–20.
Guestrin, C., D. Koller, R. Parr, & S. Venkataraman (2003). Efficient solution algorithms for factored
MDPs. Journal of Artificial Intelligence Research (JAIR) 19, 399–468.
Hauskrecht, M., N. Meuleau, L. P. Kaelbling, T. Dean, & C. Boutilier (1998). Hierarchical solution of
Markov decision processes using macro-actions. In Proc. of Uncertainty in Artificial Intelligence
(UAI 1998), pp. 220–229.
Kavraki, L., P. Svestka, J. Latombe, & M. Overmars (1996). Probabilistic roadmaps for path plan-
ning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation 12,
566–580.
Koller, D. & R. Parr (1999). Computing factored value functions for policies in structured MDPs.
In Proc. of the 16th Int. Joint Conf. on Artificial Intelligence (IJCAI 1999), pp. 1332–1339.
Kschischang, Frey, & Loeliger (2001). Factor graphs and the sum-product algorithm. IEEE Transac-
tions on Information Theory 47.
Kuffner, J., K. Nishiwaki, S. Kagami, M. Inaba, & H. Inoue (2003). Motion planning for humanoid
robots. In Proc. 20th Int. Symp. Robotics Research (ISRR’03).
22
Kuffner, J. J. & S. M. LaValle (2000). RRT-connect: An efficient approach to single-query path plan-
ning. In Proc. of IEEE Int’l Conf. on Robotics and Automation.
Kveton, B. & M. Hauskrecht (2005). An MCMC approach to solving hybrid factored MDPs. In Proc.
of the 19th Int. Joint Conf. on Artificial Intelligence (IJCAI 2005).
Li, W., E. Todorov, & X. Pan (2004). Hierarchical optimal control of redundant biomechanical sys-
tems. In 26th Annual Int. Conf. of the IEEE Engineering in Medicine and Biology Society.
Littman, M. L., S. M. Majercik, & T. Pitassi (2001). Stochastic boolean satisfiability. Journal of Auto-
mated Reasoning 27(3), 251–296.
Minka, T. (2001). A family of algorithms for approximate bayesian inference. PhD thesis, MIT.
Murphy, K. (2002). Dynamic bayesian networks: Representation, inference and learning. PhD The-
sis, UC Berkeley, Computer Science Division.
Nakamura, Y. & H. Hanafusa (1986). Inverse kinematic solutions with singularity robustness for
robot manipulator control. Journal of Dynamic Systems, Measurement and Control 108.
Nakamura, Y., H. Hanafusa, & T. Yoshikawa (1987). Task-priority based redundancy control of
robot manipulators. International Journal of Robotics Research 6.
Ng, A. Y., R. Parr, & D. Koller (1999). Policy search via density estimation. In Advances in Neural
Information Processing Systems 12, pp. 1022–1028.
Peters, J., M. Mistry, F. E. Udwadia, R. Cory, J. Nakanishi, & S. Schaal (2005). A unifying framework
for the control of robotics systems. In IEEE Int. Conf. on Intelligent Robots and Systems (IROS
2005), pp. 1824–1831.
Raiko, T. & M. Tornio (2005). Learning nonlinear state-space models for control. In Proc. of Int. Joint
Conf. on Neural Networks (IJCNN 2005).
Theocharous, G., K. Murphy, & L. Kaelbling (2004). Representing hierarchical POMDPs as DBNs
for multi-scale robot localization. In Intl. Conf. on Robotics and Automation (ICRA 2004).
Todorov, E. & W. Li (2004). Hierarchical optimal feedback control of redundant systems. Advances
in Computational Motor Control IV, Extended Abstract.
Toussaint, M. & C. Goerick (2007). Probabilistic inference for structured planning in robotics. In Int
Conf on Intelligent Robots and Systems (IROS 2007).
Toussaint, M., S. Harmeling, & A. Storkey (2006). Probabilistic inference for solving (PO)MDPs.
Research Report EDI-INF-RR-0934, University of Edinburgh, School of Informatics.
Toussaint, M. & A. Storkey (2006). Probabilistic inference for solving discrete and continuous state
Markov Decision Processes. In 23nd Int. Conf. on Machine Learning (ICML 2006).
Verma, D. & R. P. N. Rao (2006). Goal-based imitation as probabilistic inference over graphical
models. In Advances in Neural Information Processing Systems 18 (NIPS 2005).
Wiegerinck, W., B. van den Broek, & H. Kappen (2006). Stochastic optimal control in continuous
space-time multi-agent systems. In Proc. of Conf. on Uncertainty in Artificial Intelligence (UAI
2006).
Yedidia, J., W. Freeman, & Y. Weiss (2001). Understanding belief propagation and its generaliza-
tions.
Zettlemoyer, L., H. Pasula, & L. P. Kaelbling (2005). Learning planning rules in noisy stochastic
worlds. In Proc. of the Twentieth National Conference on Artificial Intelligence (AAAI-05).
23