I presented a novel online POMDP algorithm which
performs at least twice as well as the other algorithms
on a particular grid-world problem. The basic
algorithm with mean-as-threshold belief-state com-
pression always collected the most items (ic). How-
ever, because it takes more than twice (for h = 3) or
three times (for h = 4) as long as OUCEF to select
the next action, its effectiveness is significantly below
OUCEF’s (w.r.t. ic/s). OUCEF’s nominal performance
(ic) is comparable with that of the other algorithms
over the four experiment parameter combinations.
The effectiveness of the OUCEF algorithm is due
to (i) unifying the branches due to nondeterministic
observations by collecting all belief-nodes at the ends
of these branches into one set B, and then (ii) selecting
the state most representative of B, by calculating the
expected values of the features of the states in B.
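The two steps above can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the representation of states as tuples of numeric features, the weighting of each belief-node by a branch probability, and the choice of the state nearest (in Euclidean distance) to the expected feature vector as "most representative" are all assumptions made for illustration, as are the function names.

```python
from math import dist  # Euclidean distance, Python 3.8+


def unify(belief_nodes):
    """Step (i): merge the belief-nodes at the branch ends into one set B.

    belief_nodes is a list of (weight, belief) pairs, where belief maps a
    state (a tuple of numeric features) to its probability. The weights
    are assumed here to be the probabilities of the observation branches.
    """
    B = {}
    for weight, belief in belief_nodes:
        for state, prob in belief.items():
            B[state] = B.get(state, 0.0) + weight * prob
    total = sum(B.values())
    return {state: prob / total for state, prob in B.items()}


def expected_features(B):
    """Expected value of each feature over the states in B."""
    n_features = len(next(iter(B)))
    return tuple(
        sum(prob * state[i] for state, prob in B.items())
        for i in range(n_features)
    )


def most_representative(B):
    """Step (ii): pick the state in B closest to the expected features.

    Nearest-neighbour selection is an assumption of this sketch.
    """
    mean = expected_features(B)
    return min(B, key=lambda state: dist(state, mean))


# Hypothetical example: two observation branches over (x, y) positions.
nodes = [
    (0.5, {(0, 0): 0.5, (2, 0): 0.5}),
    (0.5, {(2, 0): 1.0}),
]
B = unify(nodes)
representative = most_representative(B)
```

In this toy example B holds two states, and the selected state is the one whose features lie nearest the probability-weighted mean of B.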
The aspect of this work most in need of attention is
validation of the approach on different benchmark problems.
It might be the case that the OUCEF algorithm
is well suited to the kind of grid-world problems pre-
sented here, but to few other problems. Or it might
be suited to many kinds of problems. This paper is,
however, a first step in introducing and testing the al-
gorithm. At the very least, the new ideas presented
here might lead other researchers to new insights in
their own online POMDP algorithms. A theoretical analysis
of the optimality of OUCEF is also required and
could lead to interesting insights.
