Constraint-handling for Optimization with Support Vector Surrogate Models
A Novel Decoder Approach

Jörg Bremer and Michael Sonnenschein
Department of Computing Science, University of Oldenburg, Uhlhornsweg, Oldenburg, Germany
Keywords:
Smart Grid, Soft Computing, Evolutionary Optimization, Constraint Modeling, SVDD.
Abstract:
A new application for support vector machines is their use for meta-modeling feasible regions in constrained
optimization problems. We here describe a solution for the still unsolved problem of a standardized integration
of such models into (evolutionary) optimization algorithms with the help of a new decoder based approach.
This goal is achieved by constructing a mapping function that maps the whole unconstrained domain of a given
problem to the region of feasible solutions with the help of the support vector model. The applicability to
real world problems is demonstrated using the load balancing problem from the smart grid domain.
1 INTRODUCTION
A popular class of commonly used heuristics for solv-
ing hard optimization problems is known as evolu-
tionary search methods. These methods usually work
with candidate solutions that encode each parameter
within an allowed interval between a lower and an up-
per limit and try to improve them within these bounds.
Thus, all solutions are defined in a d-dimensional hy-
percube. Nevertheless, due to additional constraints,
not all of these solutions are usually feasible. Effec-
tively solving real world optimization problems often
suffers from the presence of constraints that have to
be obeyed when exploring alternative solutions. Evo-
lutionary algorithms have been widely noticed due to
their potential for solving complex (discontinuous or
non-differentiable) numerical functions. However, full success in the field of nonlinear programming problems is still missing, because constraint handling has not been integrated in a systematic way (Michalewicz and Schoenauer, 1996; Kramer, 2010).
For constraint-handling techniques, the gen-
eral constrained continuous nonlinear programming
(NLP) problem is often used as problem formulation:
Find $x \in \mathbb{R}^d$ that optimizes $f(x)$ subject to a set of constraints:

$$\begin{aligned}
\text{equalities:} \quad & g_i(x) = 0, \quad 1 \le i \le m\\
\text{inequalities:} \quad & g_j(x) \le 0, \quad 1 \le j \le n.
\end{aligned} \qquad (1)$$
Real world problems often additionally face nonlinear constraints or constraints that are not given in an explicit formulation. One example of a constraint that is not explicitly given is a simulation model that devalues given solutions as not feasible, judged by simulation runs.
We are going to focus (but not restrict ourselves) on
the latter type.
In general, the set of constraints defines an ar-
bitrary shaped region within the search space (the
hypercube defined by parameter bounds) that con-
tains all feasible solutions. Taking into account non-
linear constraints, the NLP is generally intractable
(Michalewicz and Schoenauer, 1996). Evolutionary algorithms approximately solve nonlinear optimization very efficiently. Nevertheless, surprisingly little effort has been put into the integration of constraint handling techniques (cf. (Kramer, 2010)). Standard constraint-handling techniques are, for example, the introduction of a penalty for infeasible solutions, the separation of objectives and constraints, which transforms a given optimization problem into an unconstrained many-objective one, or decoder approaches that give an algorithm hints on how to construct feasible solutions by imposing a relationship between feasibility and decoded solutions.
At the same time, support vector machines and
related approaches have been shown to have excel-
lent performance when trained as classifiers for mul-
tiple purposes, especially real world problems. As a
use case related to describing the region where some
given data resides in, (Tax and Duin, 1999) devel-
oped the support vector domain description (SVDD)
as a one-class support vector classification approach
that is capable of learning the region that is defined
by some given training data and has therefore been
harnessed for example by (Bremer et al., 2011) as a
model for the feasible region. So far, this model only allows for checking the feasibility of an already given solution after the fact.
What we will now add to these two worlds is a
new decoder approach for integrating constraints that
are modeled by a support vector classifier into evolu-
tionary optimization in a standardized way. The basic
idea is to construct a mapping from the original, un-
constrained domain of the problem (the hypercube) to
the feasible space. In this way, the problem is transformed into an unconstrained one. All that we need for
constructing this mapping is the set of support vectors
together with the associated weights from the model.
After a brief review of constraint handling tech-
niques and black-box modelling with support vector
approaches, we introduce the underlying model and
describe the construction of our mapping approach in
detail. We present results from several test scenar-
ios with artificial optimization problems and conclude
with results from applying our method to the load bal-
ancing problem in smart grid scenarios.
2 RELATED WORK
Several techniques for handling constraints are
known. Nevertheless, many are concerned with spe-
cial cases of NLP or require a priori knowledge of the
problem structure for proper adaption (Michalewicz
and Schoenauer, 1996). We will briefly discuss some
prominent representatives of such techniques. A good
overview can for instance be found in (Coello Coello,
2002) or, more recently, in (Kramer, 2010).
2.1 Constraint-handling
We will start with an overview on some currently
used conventional constraint handling techniques fol-
lowed by an introduction to the upcoming black-box
approach of meta-modelling constraints.
Penalty. A widely used and long-established approach for constraint handling is the introduction of a penalty into the objective function that devalues all solutions that violate some constraint. In this way, the problem is transformed into an unconstrained one. Most commonly used are exterior penalties that draw outside solutions towards the feasible region, in contrast to interior ones that keep solutions inside but require starting with a feasible solution (Coello Coello, 2002).
Separation of Objectives and Constraints. Con-
straints or aggregations of constraints may be treated
as separate objectives. This leads to a transformation into an (unconstrained) many-objective problem. Such approaches have some computational disadvantages from determining Pareto optimality or may lack the ability (in the case of a disjoint region) to escape a sub-region (Coello Coello, 2002). Moreover, a functional description of constraints must be known here in advance. Such a description is not readily available when using surrogate models that hide the original relations and only model the original behaviour.
Solution Repair. Some combinatorial optimization problems allow for an easy repair of infeasible solutions. In this case, it has been shown that repairing infeasible solutions often outperforms other approaches (Liepins and Vose, 1990). This approach is closely
related to the decoder based approaches.
Decoder. In order to give an algorithm hints on how to construct a solution, so-called decoders impose a relationship between feasibility and decoded solutions. For example, (Koziel and Michalewicz, 1999) proposed a homomorphous mapping between an n-dimensional hypercube and the feasible region in order to transform the problem into a topologically equivalent one that is easier to handle, albeit with a need for extra parameters that have to be found empirically and with some extra computational effort. In contrast, we will see later how a similar approach can be automatically derived from a given support vector description. Earlier approaches used, e.g., Riemann mapping (Kim, 1998).
2.2 Meta-modelling of Constraints
A relatively new constraint handling technique is the
use of meta-models for black-box optimization sce-
narios with no explicitly given constraint boundaries.
Such a model allows for efficiently checking feasi-
bility and may allow for a search for the constraint
boundary between a feasible and an infeasible solu-
tion in case a repair of a mutation is needed.
Various classification or regression methods might
be harnessed for creating such models for the bound-
ary (Kramer, 2010). There are two main reasons for
using such an approach:
1. Substituting the computational costs of constraint evaluation with a comparatively cheap check against the model in case of computationally hard constraint evaluation (e.g. in technical simulations).
2. Efficient communication in distributed environ-
ments and for spreading one’s own set of alter-
native solutions to other actors in a distributed op-
timization scenario (e.g. in multi agent systems).
ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence
92
An example from the smart grid (the prospective fu-
ture electricity grid) domain for the latter case that has
been realized by an SVDD approach can be found in
(Bremer et al., 2011). Besides, this domain serves as
example for scenarios with (at least partly) unknown
functional relationships of the constraints. The feasi-
ble region can sometimes only be derived without full knowledge of hidden variables or intrinsic relations that determine the operability of an electric device and therewith the feasible region. This is due
to the fact that the feasibility of a solution is derived
from a technical simulation of the energy resources
in order to gain a set of alternative feasible solutions
(Bremer et al., 2010). The authors therefore have their
model learned by SVDD from a set of operable (fea-
sible) examples; (Blank et al., 2011) used a two-class
SVM for learning operation point and bias (regard-
ing allowed voltage and current bands) of a line in a
power grid for easier classifying a grid state as fea-
sible or not. In both cases, at the time of searching
for the optimum, the only available information is the
model, i.e. a set of support vectors and associated
weights. Information about the original constraints is no longer available in such scenarios.
2.3 Load Balancing in Smart Grids
For our real world use case we will briefly review load
balancing approaches. Within the framework of to-
day’s (centralized) operation planning for power sta-
tions, different heuristics are harnessed. Short-term
scheduling of different generators assigns (in its clas-
sical interpretation) discrete-time-varying production
levels to energy generators for a given planning hori-
zon (Pereira et al., 2008). It is known to be an NP-
hard problem (Guan et al., 2003). Determining an
exact global optimum is, in practice, only possible ex post due to uncertainties and forecast errors. Additionally, it is hard to exchange oper-
ational constraints in case of a changed setting (e.g. a
new composition of energy resources) of the genera-
tion system.
Coordinating a pool of distributed generators and
consumers with the intent to provide a certain aggre-
gated load schedule for active power has some ob-
jective similarities to controlling a virtual power plant
(VPP). Within the smart grid domain the volatile char-
acter of such a group of distributed energy resources
(DER) has additionally to be taken into account. On
an abstract level, approaches for controlling groups of
distributed devices can be roughly divided into cen-
tralized and distributed scheduling algorithms.
Centralized approaches have long dominated the discussion (Tröschel and Appelrath, 2009) and
are discussed in the context of static pools of DER
with drawbacks and restrictions regarding scalabil-
ity and particularly flexibility. Recently, distributed
approaches gained more and more importance. Dif-
ferent works proposed hierarchical and decentralized
architectures based on multi-agent systems and mar-
ket based computing (Kok et al., 2008). Newer ap-
proaches try to establish self-organization between
actors within the grid (Kamper and Esser, 2009; Mi-
hailescu et al., 2011; Ramchurn et al., 2011).
In load balancing scenarios, a scheduling algo-
rithm (centralized or distributed) must know for each
participating energy resource which load schedules
are actually operable (satisfy all constraints) and
which are not. Each energy resource has to restrict its
possible operations due to several constraints. These
can be distinguished into hard constraints (usually
technically rooted, e.g. minimum and/or maximum
power input or output) and soft constraints (often eco-
nomically or ecologically rooted, e.g. personal pref-
erences like noise pollution in the evening).
When determining an optimal partition of the
schedule for load distribution, an alternative sched-
ule is taken from each DER’s search space of indi-
vidual operable schedules (individual feasible region)
in order to assemble a desired aggregate load sched-
ule. The feasible region of each DER might be effi-
ciently encoded by SVDD. The model of each DER is
so far submitted to a scheduling instance and used for
checking the feasibility of solutions. The question of how to directly integrate these models into an arbitrary optimization algorithm has not yet been answered.
3 MAPPING ALGORITHM
We will now describe the integration of a SVDD
based black-box model into an arbitrary evolution-
ary optimization algorithm with proper and effective
constraint handling. The integration will not rely on
any knowledge on the set of constraints as these are
hidden by the model. We here propose handling the
constraints in a different way. When learning the re-
gion that contains all feasible solutions by harnessing
SVDD, the topological traits of this region are cap-
tured and encoded in a set of support vectors and a
weight vector. We will harness this implicitly inher-
ent information to transform the original parameter
hypercube to resemble this region.
3.1 SVDD-Model for Feasible Regions
As a prerequisite for our mapping, we assume that the
feasible region of an optimization problem has been
Constraint-handlingforOptimizationwithSupportVectorSurrogateModels-ANovelDecoderApproach
93
encoded by SVDD as e.g. described in (Bremer et al.,
2011). We will briefly describe this approach before
deriving our new utilization method. Given a set of
data samples $x_i \in \mathcal{X}$, the inherent structure of the region where the data resides in is derived as follows: after mapping the data to a high-dimensional feature space, the smallest sphere enclosing the images is determined. When mapping the sphere back to data space, its pre-image forms a contour (not necessarily connected) enclosing the data sample.

This task is achieved by determining a mapping $\Phi: \mathcal{X} \subseteq \mathbb{R}^d \to \mathcal{H},\ x \mapsto \Phi(x)$ such that all data from a sample from a region $\mathcal{X}$ is mapped to a minimal hypersphere in some high-dimensional space $\mathcal{H}$. The minimal sphere with radius $R$ and center $a$ in $\mathcal{H}$ that encloses $\{\Phi(x_i)\}_N$ can be derived from minimizing $\|\Phi(x_i) - a\|^2 \le R^2 + \xi_i$, with $\|\cdot\|$ as the Euclidean norm and slack variables $\xi_i \ge 0$ for soft constraints.
After introducing Lagrangian multipliers and further relaxing to the Wolfe dual form, the well known Mercer's theorem (cf. e.g. (Schölkopf et al., 1999)) may be used for calculating dot products in $\mathcal{H}$ by means of a kernel in data space: $\Phi(x_i) \cdot \Phi(x_j) = k(x_i, x_j)$. In order to gain a smoother adaption, it is known to be advantageous to use a Gaussian kernel: $k_G(x_i, x_j) = e^{-\frac{1}{2\sigma^2}\|x_i - x_j\|^2}$ (Ben-Hur et al., 2001).
Putting it all together, the equation that has to be maximized in order to determine the desired sphere is:

$$W(\beta) = \sum_i k(x_i, x_i)\,\beta_i - \sum_{i,j} \beta_i \beta_j\, k(x_i, x_j). \qquad (2)$$
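For illustration, the following is a minimal sketch of how this dual might be solved numerically with an off-the-shelf solver. The conditions $\sum_i \beta_i = 1$ and $0 \le \beta_i \le C$ are the usual soft SVDD dual constraints; the use of SciPy's SLSQP routine, the parameter values, and all names are our assumptions and not the authors' implementation.

```python
# Sketch only: solving the SVDD dual (Eq. 2) with scipy's SLSQP.
# The constraints sum(beta) = 1 and 0 <= beta_i <= C are the usual soft
# SVDD conditions; sigma and C are hypothetical example values.
import numpy as np
from scipy.optimize import minimize

def gram_matrix(X, sigma):
    """Gaussian kernel Gram matrix k_G(x_i, x_j)."""
    d = X[:, None, :] - X[None, :, :]
    return np.exp(-np.sum(d ** 2, axis=2) / (2.0 * sigma ** 2))

def fit_svdd(X, sigma=0.1, C=1.0):
    """Maximize W(beta) (Eq. 2) and return support vectors and their weights."""
    K = gram_matrix(X, sigma)
    n = len(X)

    def neg_W(beta):                      # minimize -W(beta)
        return -(np.diag(K) @ beta - beta @ K @ beta)

    res = minimize(neg_W, np.full(n, 1.0 / n), method="SLSQP",
                   bounds=[(0.0, C)] * n,
                   constraints=[{"type": "eq", "fun": lambda b: b.sum() - 1.0}])
    sv = res.x > 1e-6                     # support vectors carry non-zero weight
    return X[sv], res.x[sv]               # (S, w) as used in the remainder
```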
With $k = k_G$ we get two main results: the center $a = \sum_i \beta_i \Phi(x_i)$ of the sphere in terms of an expansion into $\mathcal{H}$, and a function $R: \mathbb{R}^d \to \mathbb{R}$ that allows us to determine the distance of the image of an arbitrary point from $a \in \mathcal{H}$, calculated in $\mathbb{R}^d$ by:

$$R^2(x) = 1 - 2\sum_i \beta_i k_G(x_i, x) + \sum_{i,j} \beta_i \beta_j k_G(x_i, x_j). \qquad (3)$$
Because all support vectors are mapped right onto the surface of the sphere, the sphere radius $R_S$ can easily be determined as the distance of an arbitrary support vector from the center $a$. Thus the feasible region can now be modeled as $\mathcal{F} = \{x \in \mathbb{R}^d \,|\, R(x) \le R_S\} \supseteq \mathcal{X}$.
So far, such models have for example been used
for efficiently communicating the feasible region of
controllable energy resources in smart grid scenarios
(Bremer et al., 2011). Only the comparably small
set of support vectors together with a reduced ver-
sion of vector β that contains the non-zero weight val-
ues (denoted w) for the support vectors has to be sub-
mitted. The model might then be used as a black-
box that abstracts from any explicitly given form of
constraints and allows for an easy and efficient deci-
sion on whether a given solution is feasible or not.
Moreover, as the radius function (Eq. 3) maps to $\mathbb{R}$, it allows for a conclusion about how far away a solution is from feasibility. Nevertheless, a systematic constraint-handling during optimization is not induced in this way. In the following, we will develop a more sophisticated way of integrating such SVDD surrogate models into optimization.
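For illustration, a minimal sketch of the resulting black-box feasibility test $\mathcal{F} = \{x \,|\, R(x) \le R_S\}$, assuming the model is given only as the support vectors $S$ and the reduced weight vector $w$; the NumPy realization and names are ours.

```python
# Sketch of the distance function R^2(x) (Eq. 3) and the feasibility test
# F = {x | R(x) <= R_S}, given only support vectors S and weights w.
import numpy as np

def radius2(x, S, w, sigma):
    """Squared kernel-space distance of the image of x from the sphere center."""
    k_x = np.exp(-np.sum((S - x) ** 2, axis=1) / (2.0 * sigma ** 2))
    K = np.exp(-np.sum((S[:, None] - S[None, :]) ** 2, axis=2) / (2.0 * sigma ** 2))
    return 1.0 - 2.0 * (w @ k_x) + w @ K @ w

def is_feasible(x, S, w, sigma):
    """Any support vector lies on the sphere surface and therefore yields R_S."""
    return radius2(x, S, w, sigma) <= radius2(S[0], S, w, sigma)
```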
3.2 The Decoder Approach
Let F denote the feasible region within the parameter
domain of some given optimization problem bounded
by an associated set of constraints. We do not make
any assumptions on the constraints. It is known that pre-processing the data by scaling it to $[0,1]^d$ leads to better adaption (Juszczak et al., 2002). According to (Bremer et al., 2011), some energy domain problems require a rescaling of the domain to $[0,1]^d$ for easier handling, too. For this reason, we consider optimization problems with scaled domains and denote with $\mathcal{F}_{[0,1]}$ the likewise scaled region of feasible solutions. We want to construct a mapping

$$\gamma: [0,1]^d \to \mathcal{F}_{[0,1]} \subseteq [0,1]^d;\quad x \mapsto \gamma(x) \qquad (4)$$
that is able to map the unit hypercube $[0,1]^d$ onto the d-dimensional region of feasible solutions. We achieve this mapping as a composition of three functions: $\gamma = \Phi_\ell^{-1} \circ \Gamma_a \circ \hat{\Phi}_\ell$. Instead of trying to find a direct mapping to $\mathcal{F}_{[0,1]}$ we go through the kernel space. The commutative diagram (Eq. 5) sketches the idea. We start with an arbitrary point $x \in [0,1]^d$ from the unconstrained d-dimensional hypercube and map it to an $\ell$-dimensional manifold in kernel space that is spanned by the images of the $\ell$ support vectors. After drawing the mapped point to the sphere in order to pull it into the image of the feasible region, we search the pre-image of the modified image to get a point from $\mathcal{F}_{[0,1]}$.

$$\begin{array}{ccc}
x \in [0,1]^d & \xrightarrow{\ \hat{\Phi}_\ell\ } & \hat{\Psi}_x \in \mathcal{H}^{(\ell)}\\
\downarrow{\gamma} & & \downarrow{\Gamma_a}\\
\mathcal{F}_{[0,1]} \subseteq [0,1]^d & \xleftarrow{\ \Phi_\ell^{-1}\ } & \tilde{\Psi}_x \in \mathcal{H}^{(\ell)}
\end{array} \qquad (5)$$
3.2.1 Mapping to the SV-induced Subspace $\mathcal{H}^{(\ell)}$ with an Empirical Kernel Map
We will now have a closer look at the respective steps of this procedure. Let

$$\Phi_\ell: \mathbb{R}^d \to \mathbb{R}^\ell,\quad x \mapsto k(\cdot,x)\big|_{\{s_1,\dots,s_\ell\}} = \big(k(s_1,x), \dots, k(s_\ell,x)\big) \qquad (6)$$
ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence
94
be the empirical kernel map w.r.t. the set of support vectors $\{s_1,\dots,s_\ell\}$. If $\Phi_\ell$ is modified to

$$\hat{\Phi}_\ell: \mathbb{R}^d \to \mathcal{H}^{(\ell)},\quad x \mapsto K^{-\frac{1}{2}} \big(k(s_1,x), \dots, k(s_\ell,x)\big) \qquad (7)$$

with $K_{ij} = k(s_i, s_j)$ the kernel Gram matrix, then Eq. 7 maps points $x$, $y$ from input space to $\mathbb{R}^\ell$ such that $k(x,y) = \hat{\Phi}_\ell(x) \cdot \hat{\Phi}_\ell(y)$ (cf. (Schölkopf et al., 1999)).
With $\hat{\Phi}_\ell$ we are able to map arbitrary points from $[0,1]^d$ to some $\ell$-dimensional space $\mathcal{H}^{(\ell)}$ that contains a lower dimensional projection of the sphere. Again, points from $\mathcal{F}_{[0,1]}$ are mapped into or onto the projected sphere; outside points go outside the sphere and must be moved in $\mathcal{H}^{(\ell)}$ towards the center in order to draw them into the image of the feasible region.
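A sketch of the modified empirical kernel map of Eq. 7 might look as follows; computing $K^{-1/2}$ via an eigendecomposition of the Gram matrix (with clipping of near-zero eigenvalues for numerical stability) is our choice and not prescribed by the paper.

```python
# Sketch of the modified empirical kernel map (Eq. 7): x -> K^(-1/2) k_S(x).
import numpy as np

def empirical_kernel_map(S, sigma):
    """Return a function mapping x into the SV-induced subspace H^(l)."""
    K = np.exp(-np.sum((S[:, None] - S[None, :]) ** 2, axis=2) / (2.0 * sigma ** 2))
    vals, vecs = np.linalg.eigh(K)                 # Gram matrix is symmetric PSD
    vals = np.clip(vals, 1e-12, None)              # guard against numerical noise
    K_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T

    def phi_hat(x):
        k_x = np.exp(-np.sum((S - x) ** 2, axis=1) / (2.0 * sigma ** 2))
        return K_inv_sqrt @ k_x
    return phi_hat
```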
3.2.2 Re-adjustment in Kernel Space
In general, in kernel space $\mathcal{H}$ the image of the region is represented as a hypersphere $\mathcal{S}$ with center $a$ and radius $R_S$ (Eq. 3). Points outside this hypersphere are not images of points from $\mathcal{X}$, i.e. in our case, points from $\mathcal{F}_{[0,1]}$ are mapped (by $\Phi$) into the sphere or onto its surface (support vectors), while points from outside $\mathcal{F}_{[0,1]}$ are mapped outside the sphere. Actually, using a Gaussian kernel, $\Phi$ maps each point onto an $n$-dimensional manifold (with sample size $n$) embedded into the infinite dimensional $\mathcal{H}$. In principle, the same holds true for the lower dimensional embedding spanned by the $\ell$ mapped support vectors and the $\ell$-dimensional projection of the hypersphere therein.

We now want to pull points from outside the feasible region into that region. As we rather have a description of the image of the region, we draw images of outside points into the image of the region, i.e. into the hypersphere; more precisely, into its $\ell$-dimensional projection. For this purpose we use

$$\tilde{\Psi}_x = \Gamma_a(\hat{\Psi}_x) = \hat{\Psi}_x + \mu \cdot (a - \hat{\Psi}_x) \cdot \frac{R_x - R_S}{R_x} \qquad (8)$$

to transform the image $\hat{\Psi}_x$ produced in step 1) into $\tilde{\Psi}_x \in \hat{\Phi}_\ell(\mathcal{F}_{[0,1]})$ by drawing $\hat{\Psi}_x$ into the sphere. Alternatively, the simpler version

$$\tilde{\Psi}_x = a + (\hat{\Psi}_x - a) \cdot \frac{R_S}{R_x} \qquad (9)$$

may be used for drawing $\hat{\Psi}_x$ just onto the sphere, but then without having to estimate the parameter $\mu \in [1, R_x]$. Parameter $\mu$ allows us to control how far a point is drawn into the sphere ($\mu = 1$ is equivalent to Eq. 9, $\mu = R_x$ draws each point onto the center). In this way, each image is re-adjusted proportionally to its original distance from the sphere and drawn in the direction of the center.
Points from the interior are also moved under mapping $\gamma$ in order to compensate for additional points coming from the exterior. In this way, the whole unit hypercube is literally squeezed to the form of the feasible region without too large an increase in density at the boundary. Though, if the feasible region is very small compared with the hypercube, the density at the boundary increases (depending on the choice of $\mu$). On the other hand, the likelihood of an optimum being at the boundary increases likewise. So, this might be a desired effect.
After this procedure we have $\tilde{\Psi}_x$, which is the image of a point from $\mathcal{F}_{[0,1]}$ in terms of a modified weight vector $\tilde{w}^{\Gamma_a}$.
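A sketch of the re-adjustment $\Gamma_a$ (Eq. 8) is given below; representing the center $a$ in $\mathcal{H}^{(\ell)}$ as the weighted sum of the mapped support vectors follows from the expansion $a = \sum_i \beta_i \Phi(x_i)$, while the function name and the default $\mu$ are ours.

```python
# Sketch of the re-adjustment step Gamma_a (Eq. 8) in the SV-induced subspace.
import numpy as np

def draw_into_sphere(psi_x, S, w, phi_hat, mu=1.0):
    """Re-adjust an image proportionally to its distance from the sphere."""
    Phi_S = np.array([phi_hat(s) for s in S])      # images of the support vectors
    a = w @ Phi_S                                  # image of the center a
    R_S = np.linalg.norm(Phi_S[0] - a)             # SVs lie on the sphere surface
    R_x = np.linalg.norm(psi_x - a)
    # mu = 1 reduces to Eq. 9; images outside the sphere are drawn inwards,
    # interior images are pushed outwards, as discussed in the text.
    return psi_x + mu * (a - psi_x) * (R_x - R_S) / R_x
```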
3.2.3 Finding an Approximate Pre-image
As a last step, we will have to find the pre-image of $\tilde{\Psi}_x$ in order to finally get the wanted mapping to $\mathcal{F}_{[0,1]}$. A major problem in determining the pre-image of a point from kernel space is that not every point from the span of $\Phi$ is the image of a mapped data point (Schölkopf et al., 1999). As we use a Gaussian kernel, none of our points from kernel space can be related to an exact pre-image, except for trivial expansions with only one term (Kwok and Tsang, 2004). For this reason, we will look for an approximate pre-image whose image lies closest to the given image, using an iterative procedure after (Mika et al., 1999). In our case (Gaussian kernel), we iterate $x^*$ to find the point closest to the pre-image and define the approximation $\Phi_\ell^{-1}$ by the equation

$$x_{n+1} = \frac{\sum_{i=1}^{\ell} \tilde{w}^{\Gamma_a}_i\, e^{-\|s_i - x_n\|^2 / 2\sigma^2}\, s_i}{\sum_{i=1}^{\ell} \tilde{w}^{\Gamma_a}_i\, e^{-\|s_i - x_n\|^2 / 2\sigma^2}}. \qquad (10)$$
As an initial guess for $x_0$ we take the original point $x$ and iterate it towards $\mathcal{F}_{[0,1]}$. As this procedure is sensitive to the choice of the starting point, it is important to have $x$ as a fixed starting point in order to ensure determinism of the mapping. Empirically, $x$ has turned out to be a useful guess.

Finally, we have achieved our goal of mapping an arbitrary point from $[0,1]^d$ into the region of feasible solutions, described merely by a given set of support vectors and associated weights: $x_n$ is the sought-after image of $x$ under mapping $\gamma$ that lies in $\mathcal{F}_{[0,1]}$.

We will now use this method for a decoder approach that transforms constrained optimization problems into unconstrained ones by automatically constructing the mapping $\gamma$ from an SVDD model of the feasible region that has been learned from a set of feasible example solutions.
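A compact sketch of the assembled decoder $\gamma = \Phi_\ell^{-1} \circ \Gamma_a \circ \hat{\Phi}_\ell$, reusing the sketches above. Recovering the modified weight vector $\tilde{w}^{\Gamma_a}$ by solving for the expansion of the re-adjusted image in the basis of the mapped support vectors, as well as the tolerance and iteration limit of the fixed-point loop, are our own choices, not taken from the paper.

```python
# Sketch of the approximate pre-image iteration (Eq. 10) and the full decoder gamma.
import numpy as np

def pre_image(w_tilde, S, sigma, x0, tol=1e-8, max_iter=200):
    """Fixed-point iteration of Eq. 10, deterministically started at x0 = x."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        g = w_tilde * np.exp(-np.sum((S - x) ** 2, axis=1) / (2.0 * sigma ** 2))
        x_next = (g @ S) / g.sum()
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    return x

def make_gamma(S, w, sigma, mu=1.0):
    """Build gamma = Phi_l^{-1} o Gamma_a o Phi_hat_l from the SVDD model (S, w)."""
    phi_hat = empirical_kernel_map(S, sigma)              # step 1 (Eq. 7)
    Phi_S = np.array([phi_hat(s) for s in S]).T           # SV images as columns

    def gamma(x):
        psi = phi_hat(x)                                   # map into H^(l)
        psi_tilde = draw_into_sphere(psi, S, w, phi_hat, mu)   # step 2 (Eq. 8)
        w_tilde = np.linalg.solve(Phi_S, psi_tilde)        # expansion in the SV basis
        return pre_image(w_tilde, S, sigma, x0=x)          # step 3 (Eq. 10)
    return gamma
```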
Constraint-handlingforOptimizationwithSupportVectorSurrogateModels-ANovelDecoderApproach
95
Figure 1: 1(a): Sample from an artificial double banana shaped region. 1(b): Re-sampling the feasible region by mapping random points from $[0,1]^2$. 1(c): Marked optima of the six-hump camel back objective function within the used domain.
4 EXPERIMENTS
We present evaluation results of our decoder method,
starting with several theoretical test cases and results
from the smart grid load balancing problem.
4.1 General Test Cases
We started testing our method with several artificially
constrained optimization problems and test functions.
We consider optimization problems as described in
section 1 and use the above described procedure as
constraint handling technique, i.e. we transform prob-
lem Eq. 1 into an unconstrained optimization problem
by applying mapping $\gamma$:

$$\text{optimize } f(\gamma(x)), \quad \text{s.t. } x \in [0,1]^d. \qquad (11)$$
Of course, the restriction to the unit hypercube still entails a box constraint, but as such constraints are easily handled by almost all algorithm implementations, this is not a serious obstacle. In the case of evolutionary algorithms, the method can be easily applied by defining the neighborhood in $[0,1]^d$ and searching the whole unit hypercube, while evaluating a solution $x$ at the position $\gamma(x)$. Note that mapping $\gamma(x)$ generates a feasible solution regardless of the choice of $x$. Therefore, optimization may always start with an arbitrary (randomly chosen) $x \in [0,1]^d$ without having to find a feasible start solution first.
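As a usage sketch of Eq. 11: any box-constrained optimizer searches $[0,1]^d$, while candidates are evaluated at the decoded position $\gamma(x)$. SciPy's differential evolution is used here merely as a stand-in for the PSO and ABC algorithms of the experiments.

```python
# Sketch: plugging the decoder gamma into a box-constrained optimizer (Eq. 11).
import numpy as np
from scipy.optimize import differential_evolution

def optimize_with_decoder(f, gamma, d, seed=0):
    """Search the unit hypercube; every evaluated candidate gamma(u) is feasible."""
    res = differential_evolution(lambda u: f(gamma(np.asarray(u))),
                                 bounds=[(0.0, 1.0)] * d, seed=seed)
    x_feasible = gamma(res.x)
    return x_feasible, f(x_feasible)
```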
For some first tests, we generated random samples
from toy regions and used them as training sets for
SVDD. The retrieved support vectors and weights are
taken as a model for feasible region F
[0,1]
as it would
be determined by constraints in real world problems.
Figure 1(a) shows an example with a 2-dimensional
double banana set. With these models, we constructed
our mapping γ. As a first test, a set of 1 million
equally distributed points has been randomly picked
from [0,1]
2
and mapped. Figure 1(b) shows the result
with mapped points depicted in green.
Figure 2: 2(a): Randomly generated double ring data set as representation of a second toy region. Again, the orange line denotes the learned boundary that encloses the feasible space. 2(b): Mapping a mesh with grid size 0.02 into the learned region of a double ring data set.
Next, we applied standard particle swarm opti-
mization (PSO) (Kennedy and Eberhart, 1995) and
standard artificial bee colony (ABC) optimization
(Karaboga and Basturk, 2007) in order to find optima
of several standard test objective functions. For this
purpose, both algorithms have been equipped with
mapping $\gamma$ in order to evaluate solutions at the corresponding position inside $\mathcal{F}_{[0,1]}$, while the topology
of the neighbourhood that both algorithms operate
on is defined as the whole unit hypercube. Figure
1(c) shows an example of found optima for the above
sketched setting (both succeeded equally good). In
this case, the well known Six-hump camel back func-
tion
f
CB
(x
1
,x
2
) = (4 2.1x
2
1
+
x
4
1
3
)x
2
1
+ x
1
x
2
+ (4 + 4x
2
2
)x
2
2
(12)
has been used with the domain 1.9 x
1
1.9, 1.1 x
2
1.1 scaled to [0,1]
2
.
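For reference, a plain transcription of Eq. 12 together with the scaling of the stated domain to the unit square (our transcription, not taken from the authors' code):

```python
# The six-hump camel back function (Eq. 12) and its rescaling to [0,1]^2.
def camel_back(x1, x2):
    return (4 - 2.1 * x1**2 + x1**4 / 3) * x1**2 + x1 * x2 + (-4 + 4 * x2**2) * x2**2

def camel_back_unit(u1, u2):
    """Evaluate on [0,1]^2 after rescaling to -1.9 <= x1 <= 1.9, -1.1 <= x2 <= 1.1."""
    return camel_back(-1.9 + 3.8 * u1, -1.1 + 2.2 * u2)
```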
As a second test case, double ring data sets (Fig-
ure 2(a)) have been generated. The contour plot in
the background shows as objective function the well
ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence
96
known Shubert function (Michalewicz, 1996). For the
depicted configuration, different almost equally good
local optima are situated at distant positions at the boundary of the feasible region. Never-
theless, all algorithms equipped with mapping γ suc-
ceeded in finding optima inside (or at the boundary
of) the feasible region.
As a next step, we compared how fast a solution converges with a mapped objective function. We compared the performance, i.e. the speed of convergence, with exterior penalty approaches. Such a constraint handling approach adds penalty values to solutions outside the feasible region. According to (Richardson et al., 1989), a penalty function that reflects the distance from the feasible region is supposed to lead to better performance. Therefore, we have chosen the distance function of the SVDD (Eq. 3) as the penalty that attracts an outside solution towards the feasible region. As we do not have any information on the original constraints, it is in general not possible to model penalties based on the number of violated constraints or on any individual constraint.
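A sketch of this penalty baseline, reusing the radius function of Eq. 3 (the `radius2` sketch from Section 3.1) as exterior penalty; the penalty weight `lam` is a hypothetical value, as the paper does not state how the penalty was scaled.

```python
# Sketch of the exterior penalty baseline based on the SVDD distance (Eq. 3).
def penalized_objective(f, S, w, sigma, lam=100.0):
    """Wrap f so that candidates outside the learned region are devalued."""
    R_S2 = radius2(S[0], S, w, sigma)               # squared sphere radius
    def f_pen(x):
        excess = max(0.0, radius2(x, S, w, sigma) - R_S2)
        return f(x) + lam * excess                  # penalty grows with distance
    return f_pen
```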
Figure 3: Speed of convergence for solving the scenario
depicted in Figure 2 by a PSO. Top: PSO with mapping γ;
bottom: PSO with exterior penalty relative to the distance
from the sphere center.
Figure 3 shows the result of 1 million test runs on
a scenario with double rings and Shubert as objective.
The PSO algorithm converges faster with mapping
than with penalty. The whole swarm operates com-
pletely inside the feasible region from the beginning
when using a mapped objective function while re-
taining normal swarm behaviour in $[0,1]^d$. The same holds true for the ABC case.
Tables 1 to 3 show further results. Table 1 fo-
cusses on the population size of swarm based ap-
proaches. Table 2 shows further results (the lower
the better) on various combinations of algorithms and
Table 1: PSO and ABC and their absolute fitnesses (lower
is better) after n iterations for the Shubert function as test
function and double banana as constraint region.
ALGORITHM n PENALTY MAPPING
PSO 5 70.912 ± 89.714 -13.360±0.256
10 24.734 ± 68.949 -13.435±0.278
25 5.923 ± 47.904 -13.446±0.28
50 0.013 ± 41.368 -13.477±0.286
ABC 5 67.189 ± 89.783 -13.310±0.17
10 22.128 ± 64.272 -13.367±0.137
25 -2.897 ± 35.965 -13.496±0.175
50 -10.236 ± 15.765 -13.596±0.153
Table 2: Results for different objective functions and Algo-
rithms. A double banana set has been used for all objectives.
ABC: 87.98 and PSO: 83.22 percent were valid solutions
for the penalty case; mapping: 100 percent.
OBJECTIVE ALG. n PENALTY MAPPING
SHUBERT PSO 5 -1.63 ± 38.95 -13.3 ± 0.41
SHUBERT ABC 5 -11.59 ± 8.35 -13.6 ± 0.35
SHUBERT PSO 50 -13.99 ± 0.06 -13.87 ± 0.13
SHUBERT ABC 50 -13.93 ± 0.08 -13.89 ± 0.01
BRANIN PSO 5 135.39 ± 80.35 36.36 ± 0.91
BRANIN ABC 5 46.91 ± 33.78 36.15 ± 0.19
ZAKHAROV PSO 5 1.54 ± 14.9 0.39 ± 0.08
ZAKHAROV ABC 5 0.51 ± 3.9 0.38 ± 0.0
SHEKELS FOXHOLES PSO 5 12.68 ± 0.01 12.67 ± 0.01
SHEKELS FOXHOLES ABC 5 12.67 ± 0.01 12.67 ± 0.0
BOHACHEVSKY 2 PSO 5 0.93 ± 11.21 0.26 ± 0.1
BOHACHEVSKY 2 ABC 5 0.4 ± 2.76 0.24 ± 0.01
BOHACHEVSKY 2 PSO 50 0.24 ± 0.0 0.24 ± 0.0
BOHACHEVSKY 2 ABC 50 0.24 ± 0.01 0.24 ± 0.0
HIMMELBLAU PSO 5 179.84 ± 30.05 137.6 ± 1.96
HIMMELBLAU ABC 5 145.08 ± 14.38 136.03 ± 1.1
Table 3: Results for different objective functions and Al-
gorithms. This time, the double ring has been used for all
objectives. ABC: 91.71 and PSO: 98.4 percent were valid
for the penalty case; mapping 100 percent.
OBJECTIVE ALG. n PENALTY MAPPING
SHUBERT PSO 5 -13.61 ± 9.02 -14.07 ± 0.75
SHUBERT ABC 5 -14.14 ± 0.49 -14.25 ± 0.22
SHUBERT PSO 50 -14.43 ± 0.0 -14.43 ± 0.01
SHUBERT ABC 50 -14.43 ± 0.0 -14.42 ± 0.01
BRANIN PSO 5 43.61 ± 40.24 33.14 ± 0.17
BRANIN ABC 5 33.56 ± 1.73 33.12 ± 0.02
ZAKHAROV PSO 5 0.18 ± 0.15 0.14 ± 0.01
ZAKHAROV ABC 5 0.18 ± 0.08 0.14 ± 0.0
SHEKELS FOXHOLES PSO 5 12.67 ± 0.01 12.67 ± 0.0
SHEKELS FOXHOLES ABC 5 12.67 ± 0.0 12.67 ± 0.0
BOHACHEVSKY 2 PSO 5 0.43 ± 0.22 0.32 ± 0.05
BOHACHEVSKY 2 ABC 5 0.38 ± 0.11 0.28 ± 0.02
BOHACHEVSKY 2 PSO 50 0.27 ± 0.0 0.27 ± 0.0
BOHACHEVSKY 2 ABC 50 0.28 ± 0.01 0.27 ± 0.0
HIMMELBLAU PSO 5 127.11 ± 19.04 121.84 ± 0.31
HIMMELBLAU ABC 5 123.1 ± 1.69 121.74 ± 0.06
further objective functions for the case of the double banana dataset. Table 3 shows the respective results for the ring data set. Stated are the absolute fitnesses achieved in each case, so the ratio between mapping and penalty is what should be compared. A major drawback of penalty approaches is the fact that they do not always converge to a feasible solution. Whereas the mapping method always delivered feasible solutions, the penalty approach failed in up to 17 percent of the test runs.
Constraint-handlingforOptimizationwithSupportVectorSurrogateModels-ANovelDecoderApproach
97
Nevertheless, the inaccuracy inherent in the model from learning the region still remains an inaccuracy for the mapping. But the same holds true for all approaches that are based on such a surrogate model, including the penalty approach. However, we observed that $\gamma$ performs better at sharp edges than the decision boundary of Eq. 3 (cf. Figure 1(b)). Figure 2(b) shows the result of mapping a regular mesh from $[0,1]^2$ onto the double rings. The mapped mesh shows how points from different parts of the feasible region become neighbours under $\gamma$ by bypassing the infeasible region inside the rings.
Figure 4: A typical result for a higher dimensional test case;
dashed: penalty approach.
Figure 4 shows some results for test runs on an 8-
dimensional problem with a stretched ellipse as fea-
sible region and Himmelblau (Himmelblau, 1972) as
objective function. We compared artificial bee colony algorithms with 5 individuals for the mapping case and 200 individuals for the penalty case. Nevertheless, the mapping approach performs better, and some penalty runs still converged to an infeasible solution, underlining the superiority of the mapping approach.
4.2 The Smart Grid Use Case
Next, we applied our method to the following real
world problem from the smart grid domain: An indi-
vidual schedule has to be determined for each mem-
ber of a pool of micro-co-generation (CHP) plants
such that the aggregated electric load schedule of all
plants resembles a given (probably demanded by mar-
ket) target schedule in an optimal way. For the sake
of simplicity, we will consider optimality as a close as
possible adaption of the aggregated (sum of individ-
ual loads) schedule to the requested on. Optimality
usually refers to additional local (individual cost) as
well as to global (e.g. environmental impact) objec-
tives. When determining an optimal partition of the
schedule for load distribution, exactly one alternative
schedule is taken from each generators search space
of individual operable schedules in order to assemble
the desired aggregate schedule.
Therefore, the optimization problem is: find a combination of schedules (one from each DER, with $\mathcal{X}_i$ as the set of possible choices) that resembles the target schedule $l_T$ as closely as possible, i.e. minimize the Euclidean distance between the aggregated and the target schedule:

$$\Big\|\sum_i x_i - l_T\Big\| \to \min, \quad \text{s.t. } x_i \in \mathcal{X}_i. \qquad (13)$$
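A sketch of how Eq. 13 can be expressed on top of per-DER decoders $\gamma_i$: each DER contributes one decoded (and hence operable) schedule, and the optimizer works on the concatenation of the unit-hypercube variables. The flat encoding and all names are illustrative assumptions.

```python
# Sketch of the load balancing objective (Eq. 13) built on per-DER decoders.
import numpy as np

def balancing_objective(u, decoders, l_target, d):
    """u concatenates one [0,1]^d vector per DER; decoders[i] is that DER's gamma_i."""
    schedules = [decoders[i](u[i * d:(i + 1) * d]) for i in range(len(decoders))]
    return np.linalg.norm(np.sum(schedules, axis=0) - l_target)
```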
Of course, each generator has individual constraints
such as time varying buffer charging, power ranges,
minimum ON/ OFF times, etc. Thus, we simu-
lated individual plants. For our simulations, we used
simulation models of modulating CHP-plants (com-
bined heat and power generator capable of varying the
power level) with the following specification: Min. /
max. electrical power: 1.3 / 4.7 kW, min. / max.
thermal power: 4 / 12.5 kW; after shutting down, the
device has to stay off for at least 2 hours.
Figure 5: Relationship of electrical and thermal power for
an EcoPower CHP; modified after (Thomas, 2007).
The relationship between electrical (active) power
and thermal power was modeled after Figure 5. In
order to gain enough degrees of freedom for vary-
ing active power, each CHP is equipped with an 800
litre thermal buffer store. Thermal energy consump-
tion is modeled and simulated by a model of a de-
tached house with its several heat losses (the heater is supposed to keep the indoor temperature at a constant level) and randomized warm water drawing for
gaining more diversity among the devices.
For each simulated household, we implemented
an agent capable of simulating the CHP (and sur-
roundings and auxiliary devices) on a meso-scale
level with energy flows among different model parts
but no technical details. All simulations have so far
been done with a time resolution of 15 minutes for different forecast horizons, although our method is indifferent to any such time constraints. We have
run several test series with each CHP randomly ini-
tialized with different buffer charging levels, temper-
atures and water drawing profiles. The feasible spaces
of individual CHP had been encoded with the SVDD
approach. These support vector models have then
been used for the search for optimal schedules: with
a penalty approach on the one hand and with the pro-
posed mapping on the other. Figure 6 shows a typical
result. We used a covariance matrix adaptation evolution strategy (CMA-ES) approach (Hansen, 2006) for
finding combinations of schedules that best resemble
the dashed target schedule in the top chart (Figures
ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence
98
6(a) and 6(b)). Both seem to yield equally good results, but looking at the individual loads (middle) and the temperatures (bottom) reveals that the penalty approach easily gets stuck at an (at least partly) infeasible solution, whereas the mapping approach succeeds with feasible solutions. This effect is amplified with the number of plants and therefore with the number of penalties used. Moreover, the mapping approach converges faster most of the time, as Figure 6(c) shows for this specific example.
Considering the complexity, additional computational costs are entailed for solution evaluation. The cost of step 2, which grows quadratically with the number of support vectors $\ell$, is decisive, together with the number of iterations necessary for finding the pre-image in step 3. Empirically, during our experiments we observed, for instance, a mean number of $6.75 \pm 0.3$ iterations for the case of the 2-dimensional double banana and $36.3 \pm 26.4$ for the case of a stretched 8-dimensional ellipse in order to reach convergence with $10^{-8}$ accuracy. Additionally, this number reduces in the course of optimization as soon as the evolution approaches the feasible space. On the other hand, fewer function evaluations are necessary with our decoder approach, because we never evaluate infeasible solutions and we do not have to check feasibility during optimization. Both effects put the additional computational costs into perspective.
5 CONCLUSIONS
Many real world optimization problems face constraints that restrict the search space to an arbitrarily (nonlinearly) shaped, possibly disjoint region that contains the feasible solutions. Conventional constraint handling techniques often require the set of constraints to be known a priori and are hardly applicable to black-box models of feasible regions. Although penalties may be used with such models, the task of correctly tuning the objective with these additional losses remains error prone due to the nature of the original constraints, which are no longer known at optimization time.
We proposed a new constraint handling technique
for support vector modeled search spaces and demon-
strated its applicability and usefulness with the help
of theoretical test problems as well as for a real world
optimization problem taken from the smart grid do-
main. The major benefit of this approach is the univer-
sal applicability for problem transformation, solution
repair and standardized integration in arbitrary evo-
lutionary algorithms by constructing a modified ob-
jective function and treating the whole unconstrained
domain as valid for search. So far, we have restricted
Figure 6: Results from a smart grid load balancing scenario with a standard penalty constraint-handling technique on the left (6(a)) and the mapping based approach on the right (6(b)); 6(c) compares the speed of convergence.
ourselves to problems scaled to $[0,1]^d$. Further tests will show whether this limitation should be kept or whether arbitrary domains perform equally well.
ACKNOWLEDGEMENTS
The Lower Saxony research network 'Smart Nord' acknowledges the support of the Lower Saxony Ministry of Science and Culture through the Niedersächsisches Vorab grant programme (grant ZN 2764).
Constraint-handlingforOptimizationwithSupportVectorSurrogateModels-ANovelDecoderApproach
99
REFERENCES
Ben-Hur, A., Siegelmann, H. T., Horn, D., and Vapnik, V.
(2001). Support vector clustering. Journal of Machine
Learning Research, 2:125–137.
Blank, M., Gerwinn, S., Krause, O., and Lehnhoff, S.
(2011). Support vector machines for an efficient rep-
resentation of voltage band constraints. In Innovative
Smart Grid Technologies. IEEE PES.
Bremer, J., Rapp, B., and Sonnenschein, M. (2010). Sup-
port vector based encoding of distributed energy re-
sources’ feasible load spaces. In IEEE PES Confer-
ence on Innovative Smart Grid Technologies Europe,
Chalmers Lindholmen, Gothenburg, Sweden.
Bremer, J., Rapp, B., and Sonnenschein, M. (2011). Encod-
ing distributed search spaces for virtual power plants.
In IEEE Symposium Series in Computational Intelli-
gence 2011 (SSCI 2011), Paris, France.
Coello Coello, C. A. (2002). Theoretical and numerical
constraint-handling techniques used with evolutionary
algorithms: a survey of the state of the art. Com-
puter Methods in Applied Mechanics and Engineer-
ing, 191(11-12):1245–1287.
Guan, X., Zhai, Q., and Papalexopoulos, A. (2003). Op-
timization based methods for unit commitment: La-
grangian relaxation versus general mixed integer pro-
gramming. volume 2, page 1100 Vol. 2.
Hansen, N. (2006). The CMA evolution strategy: a compar-
ing review. In Lozano, J., Larranaga, P., Inza, I., and
Bengoetxea, E., editors, Towards a new evolutionary
computation. Advances on estimation of distribution
algorithms, pages 75–102. Springer.
Himmelblau, D. (1972). Applied nonlinear programming.
McGraw-Hill.
Juszczak, P., Tax, D., and Duin, R. P. W. (2002). Feature
scaling in support vector data description. In Depret-
tere, E., Belloum, A., Heijnsdijk, J., and van der Stap-
pen, F., editors, Proc. ASCI 2002, 8th Annual Conf.
of the Advanced School for Computing and Imaging,
pages 95–102.
Kamper, A. and Esser, A. (2009). Strategies for decen-
tralised balancing power. In A. Lewis, S. Mostaghim,
M. R., editor, Biologically-inspired Optimisation
Methods: Parallel Algorithms, Systems and Applica-
tions, number 210 in Studies in Computational Intel-
ligence, pages 261–289. Springer, Berlin, Heidelberg.
Karaboga, D. and Basturk, B. (2007). A powerful and
efficient algorithm for numerical function optimiza-
tion: artificial bee colony (ABC) algorithm. Journal
of Global Optimization, 39(3):459–471.
Kennedy, J. and Eberhart, R. (1995). Particle swarm op-
timization. In Neural Networks, 1995. Proceedings.,
IEEE International Conference on, volume 4, pages
1942–1948 vol.4. IEEE.
Kim, D. G. (1998). Riemann mapping based constraint han-
dling for evolutionary search. In SAC, pages 379–385.
Kok, K., Derzsi, Z., Gordijn, J., Hommelberg, M., Warmer,
C., Kamphuis, R., and Akkermans, H. (2008). Agent-
based electricity balancing with distributed energy re-
sources, a multiperspective case study. Hawaii Inter-
national Conference on System Sciences, 0:173.
Koziel, S. and Michalewicz, Z. (1999). Evolutionary al-
gorithms, homomorphous mappings, and constrained
parameter optimization. Evol. Comput., 7:19–44.
Kramer, O. (2010). A review of constraint-handling tech-
niques for evolution strategies. Appl. Comp. Intell.
Soft Comput., 2010:3:1–3:19.
Kwok, J. and Tsang, I. (2004). The pre-image problem in
kernel methods. Neural Networks, IEEE Transactions
on, 15(6):1517–1525.
Liepins, G. E. and Vose, M. D. (1990). Representational is-
sues in genetic optimization. Journal of Experimental
and Theoretical Artificial Intelligence, 2.
Michalewicz, Z. (1996). Genetic algorithms + data struc-
tures = evolution programs (3rd ed.). Springer-Verlag,
London, UK.
Michalewicz, Z. and Schoenauer, M. (1996). Evolution-
ary algorithms for constrained parameter optimization
problems. Evol. Comput., 4:1–32.
Mihailescu, R.-C., Vasirani, M., and Ossowski, S. (2011).
Dynamic coalition adaptation for efficient agent-based
virtual power plants. In Proceedings of the 9th
German conference on Multiagent system technolo-
gies, MATES’11, pages 101–112, Berlin, Heidelberg.
Springer-Verlag.
Mika, S., Schölkopf, B., Smola, A., Müller, K. R., Scholz, M., and Rätsch, G. (1999). Kernel PCA and de-
noising in feature spaces. In Proceedings of the 1998
conference on Advances in neural information pro-
cessing systems II, pages 536–542, Cambridge, MA,
USA. MIT Press.
Pereira, J., Viana, A., Lucus, B., and Matos, M. (2008). A
meta-heuristic approach to the unit commitment prob-
lem under network constraints. International Journal
of Energy Sector Management, 2(3):449–467.
Ramchurn, S. D., Vytelingum, P., Rogers, A., and Jennings,
N. R. (2011). Agent-based control for decentralised
demand side management in the smart grid. In AA-
MAS, pages 5–12.
Richardson, J. T., Palmer, M. R., Liepins, G. E., and
Hilliard, M. R. (1989). Some guidelines for genetic
algorithms with penalty functions. In Proceedings
of the 3rd International Conference on Genetic Al-
gorithms, pages 191–197, San Francisco, CA, USA.
Morgan Kaufmann Publishers Inc.
Schölkopf, B., Mika, S., Burges, C., Knirsch, P., Müller, K.-R., Rätsch, G., and Smola, A. (1999). Input space vs.
feature space in kernel-based methods. IEEE Trans-
actions on Neural Networks, 10(5):1000–1017.
Tax, D. M. J. and Duin, R. P. W. (1999). Data domain
description using support vectors. In ESANN, pages
251–256.
Thomas, B. (2007). Mini-Blockheizkraftwerke: Grundla-
gen, Gerätetechnik, Betriebsdaten. Vogel Buchverlag.
Tröschel, M. and Appelrath, H.-J. (2009). Towards reac-
tive scheduling for large-scale virtual power plants. In
Braubach, L., van der Hoek, W., Petta, P., and Pokahr,
A., editors, MATES, volume 5774 of Lecture Notes in
Computer Science, pages 141–152. Springer.
ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence
100