forms”, such as Yet Another Robot Platform (YARP)
(Metta et al., 2006; Fitzpatrick et al., 2008), Robot
Operating System (ROS) (Quigley et al., 2009), and
Microsoft Robotics Studio (MSRS) (Jackson, 2007),
have gained widespread popularity. Not only do these middleware solutions abstract away the details of sensors and actuators, but they also offer simple network communication from virtually any language on Mac, Windows, or Linux. Robots can be controlled with relative ease by one or more distributed applications running on a cluster. By providing hardware abstraction,
YARP, ROS, and MSRS have drastically improved
the efficiency with which experimental robots can be
programmed. In the process of developing behaviors,
we would do well to follow the example set by these
projects, and develop modular, reusable behavioral
components around abstract interfaces.
Rodney Brooks successfully built autonomous behaviors incrementally from modular components with his Subsumption Architecture
(Brooks, 1991). His embodied “Critters” were pre-
dominantly simple mobile robots and they operated
with considerable autonomy in real-world settings. In
this paper we introduce a modular behavioral frame-
work for humanoids and other complex robots, which
is in many ways similar to the Subsumption Architec-
ture.
The Subsumption Architecture is based on asynchronous networks of Finite State Machines (FSMs), and one of its defining characteristics is that it does not maintain a world model. Instead, sensors are
connected directly to actuators via the FSM network.
Brooks argues that the world is its own best model,
and the claim is well demonstrated in the domain of
mobile robots. However, we are interested in devel-
oping manipulation behaviors for humanoids, and this
poses a different set of problems than does the control
of a mobile robot.
Consider for a moment the relationship between
the sensory and action spaces of mobile robots and
humanoids respectively. Mobile robots have a few
controllable degrees of freedom (DOF) and are confined to move on a planar surface. They typically carry a number of cameras or range-finding sensors,
arranged radially about the robot and facing outward.
Such a sensor array gives a natural representation of
obstacles and free space around the robot, and be-
havioral primitives can therefore be designed conve-
niently in that same planar space.
A humanoid, on the other hand, has a very large
number of controllable DOF, and operates in 3D
space, where an object has 6 DOF. Still, it has a sensory system similar to that of the mobile robot: an array of cameras or range finders, which captures 2D and 3D projections of the state of the high-dimensional humanoid-world system. Compared to a mobile robot, the humanoid is quite information-poor with respect to the size of its action space.
It is for this reason that, in contrast to the Subsumption Architecture, we have built MoBeE around
a parsimonious, egocentric, kinematic model of the
robot/world system. The model provides a Cartesian
operational space, in which we can define task-relevant states, state changes, cost/objective functions,
rewards, and the like. By computing forward kinematics and maintaining a geometric representation of the 3D robot/world system, we can define a useful and general state machine that does not arise naturally from the “raw” sensory data.
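To make this concrete, the sketch below shows, under very simplified assumptions, how a kinematic model can map joint angles into an egocentric Cartesian frame and derive a discrete, task-relevant state from the resulting geometry rather than from the raw sensor stream. The names (KinematicModel, task_state, and so on) are hypothetical illustrations, not MoBeE's actual interfaces, and the toy arm is a two-link planar chain rather than a full humanoid.

```python
# Illustrative sketch only: a toy egocentric kinematic model that maps joint
# angles to operational space and reads a discrete, task-relevant state off
# the resulting geometry. Names are hypothetical, not MoBeE's API.
import numpy as np

def rot_z(theta):
    """Homogeneous transform: rotation about the z axis by theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0, 0.0],
                     [s,  c, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

def trans_x(length):
    """Homogeneous transform: translation along the local x axis."""
    T = np.eye(4)
    T[0, 3] = length
    return T

class KinematicModel:
    """Toy kinematic chain: revolute joints about z, rigid links along x."""

    def __init__(self, link_lengths):
        self.link_lengths = link_lengths

    def forward(self, joint_angles):
        """Forward kinematics: end-effector position in the egocentric frame."""
        T = np.eye(4)
        for theta, length in zip(joint_angles, self.link_lengths):
            T = T @ rot_z(theta) @ trans_x(length)
        return T[:3, 3]

    def task_state(self, joint_angles, object_position, reach_radius=0.05):
        """A discrete, task-relevant state computed from the model geometry,
        not from the raw sensory data."""
        distance = np.linalg.norm(self.forward(joint_angles)
                                  - np.asarray(object_position, dtype=float))
        return "NEAR_OBJECT" if distance < reach_radius else "FAR_FROM_OBJECT"

# Example: a two-link arm and a target point, both in the robot's root frame.
model = KinematicModel([0.3, 0.25])
print(model.forward([0.4, -0.2]))                      # Cartesian position
print(model.task_state([0.4, -0.2], [0.5, 0.1, 0.0]))  # discrete state
```

MoBeE's actual model is of course richer, maintaining a full geometric representation of the robot/world system, but the principle is the same: the operational-space state machine is computed from the model rather than from the sensors directly.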
In addition to providing a task space for behav-
iors, the kinematic model is the center of our hub-
and-spokes behavioral architecture (figure 1). We de-
compose behavior into three abstract tasks that corre-
spond to key objectives in Computer Vision, Motion
Planning, and Feedback Control. The Sensor processes sensory data (visual data in the experiments presented here) and reports the world state; the Agent plans actions that are temporally extended and may or may not be feasible; and the Controller reacts to particular world states or state changes, suppressing commands from the Agent and issuing its own commands, for example to avoid danger. Our implementation is similar to the Subsumption Architecture in that MoBeE tightly integrates planning and control, which greatly facilitates the development of autonomous, adaptive behaviors.
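As a rough illustration of this decomposition, the sketch below expresses the three roles as abstract interfaces arranged around the shared model, with a single synchronous pass through the loop shown for clarity. The class names mirror the text, but the method signatures are assumptions for illustration only; in MoBeE the modules are separate components that can even run on different hardware (figure 1).

```python
# Hypothetical interfaces illustrating the hub-and-spokes decomposition;
# these are not MoBeE's actual classes or method signatures.
from abc import ABC, abstractmethod

class Sensor(ABC):
    """Processes sensory data and reports the world state to the model (hub)."""
    @abstractmethod
    def report_world_state(self, model): ...

class Agent(ABC):
    """Plans temporally extended actions, which may or may not be feasible."""
    @abstractmethod
    def plan(self, model): ...

class Controller(ABC):
    """Reacts to particular states or state changes: passes a command through,
    or suppresses it and issues a protective command of its own."""
    @abstractmethod
    def filter_command(self, model, proposed_command): ...

def behave(model, sensor, agent, controller, actuate):
    """One pass through the loop, shown synchronously for clarity."""
    sensor.report_world_state(model)           # update the shared model
    command = agent.plan(model)                # propose an action
    safe_command = controller.filter_command(model, command)
    actuate(safe_command)                      # send the possibly overridden command
```

Because the three roles interact only through the shared model, a Sensor, Agent, or Controller can be swapped out or compared against an alternative without changing the others.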
In contrast to the Subsumption Architecture, however, the hub-and-spokes model of MoBeE allows us
to easily combine, compare, and contrast different
behavioral modules, even running them on different
hardware, all within the same software framework.
In the following two sections we describe our behav-
ioral decomposition in some detail, according to the
requirements listed at the beginning of this section.
To paraphrase these, the robot must be able to “see”
and “act”.
1.1 To See
When humans “see” an object on the table, it is not really the same behavior as when they see a face, a painting, or a page of text. Seeing to facilitate reaches
and grasps implies that we can recognize objects of
interest in images and that we can use the visual in-
formation to build representations of our surround-
ings, which facilitate motion planning. For the pur-
poses of the work presented here, “seeing” will be
considered in terms of two tasks, identifying objects