$\left(\frac{2\,d_m - d(x, x_i)}{2\,d_m}\right)^p$. Parameter $p$ is a positive integer and we have observed experimentally that $p = 4$ is satisfactory. With $p = 4$, the value of the spring variable varies from $1/16$ (furthest points) to $1$ (nearest points). Then, we move point $x$ proportionally to each spring variable in the direction of the vector $x - x_i$: the closer the points are, the more the algorithm spaces them apart. Moreover, the proportionality factor decreases over time until it reaches a threshold value, after which it remains constant.
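To make this step concrete, here is a minimal Python sketch of one spreading iteration, assuming the spring force only acts on pairs of points closer than $d_m$ and that the proportionality factor (step) is decreased from one iteration to the next by the caller; the function name and the exact update scheme are illustrative assumptions, not the authors' code.

import numpy as np

def spreading_step(points, d_m, step, p=4):
    """One iteration of the 'Spreading the points' step (sketch).

    points : (n, dim) array of points in the unit hypercube
    d_m    : distance threshold under which two points repel each other
    step   : current proportionality factor, assumed to decrease over time
             until a threshold value and then stay constant
    p      : exponent of the spring variable (p = 4 in the paper)
    """
    n = len(points)
    displacement = np.zeros_like(points)
    for j in range(n):
        for i in range(n):
            if i == j:
                continue
            diff = points[j] - points[i]            # direction of x - x_i
            dist = np.linalg.norm(diff)
            if dist == 0.0 or dist >= d_m:
                continue
            # spring variable: 1 for the nearest points, (1/2)**p = 1/16 at dist = d_m
            spring = ((2.0 * d_m - dist) / (2.0 * d_m)) ** p
            displacement[j] += step * spring * diff / dist
    return points + displacement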
This process is similar to the minimax approach and tends to push the points outside of the unit cube. The problem of generating a low dispersion sequence then amounts to minimizing a criterion under box constraints. We also enforce these constraints: the cube's borders repel points toward the cube's center, with an intensity that depends on the distance between the points and the borders.
Applying Box Constraints. In order to keep the points inside the hypercube, we apply a repulsive force to points near the borders. A point is detected as near a border when one of its coordinates is below $\frac{d_m}{2} + \varepsilon_m$ or above $1 - \frac{d_m}{2} - \varepsilon_m$: these values are the coordinates of the extremal points of a Sukharev grid, with a tolerance $\varepsilon_m = \frac{d_m}{4}$. The intensity of this force has the same properties as the forces used in the step called Spreading the points. Globally, we perform a local dispersion minimization which becomes global as this process is iterated.
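The sketch below illustrates one possible implementation of this border repulsion, under the assumption that the force acts coordinate-wise toward the center of the cube and follows the same power law as the spreading forces; the exact intensity profile is our assumption.

import numpy as np

def apply_box_constraints(points, d_m, step, p=4):
    """Repel points that come too close to the borders of the unit cube (sketch).

    A coordinate is flagged as 'near a border' when it is below
    d_m/2 + eps_m or above 1 - d_m/2 - eps_m, with eps_m = d_m/4.
    The force pushes the point toward the cube's center, with an intensity
    growing as the point approaches the border (power law assumed).
    """
    eps_m = d_m / 4.0
    lower = d_m / 2.0 + eps_m
    upper = 1.0 - d_m / 2.0 - eps_m
    pushed = points.copy()
    for j in range(points.shape[0]):
        for k in range(points.shape[1]):
            c = points[j, k]
            if c < lower:
                pushed[j, k] += step * ((lower - c) / lower) ** p
            elif c > upper:
                pushed[j, k] -= step * ((c - upper) / (1.0 - upper)) ** p
    return np.clip(pushed, 0.0, 1.0)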
Avoiding Local Minimum Configurations. Iteratively applying the two previous steps can lead to local minima and oscillations: a large number of points can become aligned on an edge. The repulsive forces push these points in the same direction with the same intensity, while the tendency of the step called Spreading the points to push points outside the hypercube cancels the action of the repulsive forces; the points then oscillate. In order to avoid the development of such local minimum configurations, after a few iterations we randomly select a point on each border in each dimension and change its coordinate along that dimension to a random value near the middle of the cube.
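The following sketch shows one way to implement this perturbation; interpreting "near the middle of the cube" as a value drawn uniformly in [0.4, 0.6] is our own assumption.

import numpy as np

def break_border_alignments(points, d_m, rng):
    """Perturb one randomly chosen border point per dimension and per border (sketch).

    For each dimension, one point lying near the lower border and one lying
    near the upper border are selected at random; their coordinate along that
    dimension is reset to a random value near the middle of the cube.
    """
    eps_m = d_m / 4.0
    lower = d_m / 2.0 + eps_m
    upper = 1.0 - d_m / 2.0 - eps_m
    for k in range(points.shape[1]):
        for mask in (points[:, k] < lower, points[:, k] > upper):
            candidates = np.flatnonzero(mask)
            if candidates.size > 0:
                j = rng.choice(candidates)
                points[j, k] = rng.uniform(0.4, 0.6)   # "near the middle" (assumed range)
    return points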
Stopping Criteria. Different stopping criteria can be used to end the iterations: a maximum number of iterations, the stabilization of the dispersion, or a minimum threshold on the average displacement of the points during one iteration. This point still requires further exploration.
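For illustration, a possible main loop combining the three sketches above is given below, using two of the mentioned stopping criteria (a maximum number of iterations and a threshold on the average displacement per iteration); the initial step size, its decay schedule and the perturbation period are assumptions chosen only to make the example runnable.

import numpy as np

def generate_low_dispersion(n, dim, d_m, max_iter=1000, tol=1e-4, seed=0):
    """Iterate spreading, box constraints and border perturbation (sketch)."""
    rng = np.random.default_rng(seed)
    points = rng.random((n, dim))               # random initial sequence
    step = 0.1 * d_m                            # initial proportionality factor (assumed)
    floor = 0.01 * d_m                          # value below which step stays constant (assumed)
    for it in range(max_iter):
        old = points.copy()
        points = spreading_step(points, d_m, step)
        points = apply_box_constraints(points, d_m, step)
        if (it + 1) % 10 == 0:                  # "after a few iterations" (assumed period)
            points = break_border_alignments(points, d_m, rng)
        step = max(0.99 * step, floor)          # decreasing, then constant
        avg_move = np.linalg.norm(points - old, axis=1).mean()
        if avg_move < tol:                      # average displacement threshold
            break
    return points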
6 CONCLUSIONS
In this paper, we illustrate experimentally the theoretical result established by (Gandar et al., 2009), showing that dispersion is probably a pertinent criterion for generating samples for classification, and we address the question of generating the best low dispersion samples. We provide a fairly simple algorithm able to minimize the dispersion of a fixed size sequence. In experimental design, grids are usually used to select sets of parameters for experiments. However, using grids imposes hard limits on the number of parameters that can be explored (often fewer than 6). We believe that being able to efficiently generate low dispersion sequences can help in this context, since the number of points can be set to any value in any dimension (obviously, the number of points has to remain realistic with respect to the dimension). In active learning, the learning algorithm has to select which training point will be used (implying that its label is asked for, which has a cost). Most of the time, the training points pre-exist, but it happens that one can ask for any point in the space. In that particular case, given a limited budget for labels (which determines the maximum number of training points), the proposed algorithm could be directly applied. Concerning the selection task, our algorithm can be adapted to select an existing point near the ideal position. This is our future work.
REFERENCES
Gandar, B., Loosli, G., and Deffuant, G. (2009). How to optimize sample in active learning: Dispersion, an optimum criterion for classification? In European Conference of the European Network for Business and Industrial Statistics (ENBIS).
Johnson, M., Moore, L., and Ylvisaker, D. (1990). Minimax and maximin distance designs. Journal of Statistical Planning and Inference, 26(2):131–148.
Lindemann, S. and LaValle, S. (2004). Incrementally
Reducing Dispersion by Increasing Voronoi Bias in
RRTs. In IEEE International Conference on Robotics
and Automation.
Niederreiter, H. (1992). Random Number Generation and
Quasi-Monte Carlo Methods. Society for Industrial
and Applied Mathematics.
Sergent, M., Phan Tan Luu, R., and Elguero, J. (1997). Sta-
tistical Analysis of Solvent Scales. Anales de Quim-
ica, 93(Part. 1):3–6.
Teytaud, O., Gelly, S., and Mary, J. (2007). Active learning in regression, with application to stochastic dynamic programming. In Proceedings of the International Conference on Informatics in Control, Automation and Robotics.