LINEAR PROGRAMMING FOR DATABASE ENVIRONMENT
Akira Kawaguchi and Jose Alfredo Perez
Department of Computer Sciences, The City College of New York, New York, New York 10031, U.S.A.
Keywords:
Linear programming, simplex method, revised simplex method, database, stored procedure.
Abstract:
Solving large-scale optimization problems requires an integration of data-analysis and data-manipulation capa-
bilities. Nevertheless, little attempt has been made to facilitate general linear programming solvers for database
environments. Dozens of sophisticated tools and software libraries that implement linear programming model
can be found. But, there is no database-embedded linear programming tool seamlessly and transparently uti-
lized for database processing. The focus of this study is to fill out this kind of technical gap of data analysis
and data manipulation, in the event of solving large-scale linear programming problems for the applications
built on the database environment. Specifically, this paper studies the representation of the linear program-
ming model in relational structures and the computational method to solve the linear programming problems.
Foundations for and preliminary experimental results of this study are presented.
1 INTRODUCTION
Linear programming is a powerful technique for deal-
ing with the problem of allocating limited resources
among competing activities, as well as other prob-
lems having a similar mathematical formation (Win-
ston, 1994; Richard, 1991; Walsh, 1985). It has be-
come an important field of optimization in the areas
of science and engineering. It has become a standard
tool of great importance for numerous business and
industrial organizations. In particular, large-scale lin-
ear programming problems arise in practical applica-
tions such as logistics for large spare-parts inventory,
revenue management and dynamic pricing, finance,
transportation and routing, network design, and chip
design (Hillier and Lieberman, 2001).
While these problems inevitably involve the anal-
ysis of a large amount of data, there has been rel-
atively little work in the context of database pro-
cessing. Little attempt has been made to facilitate
data-driven analysis with data-oriented techniques. In
today’s marketplace, dozens of sophisticated tools
and software libraries that implement linear program-
ming model can be found. Nevertheless, these prod-
ucts do not work with database systems seamlessly.
They rather require additional software layers built
on top of databases to extract and transfer data in the
databases. The focus of our study gathered here is
to fill out this kind of technical gap of data analysis
and data manipulation, in the event of solving large-
scale linear programming problems for the applica-
tions built on the database environment.
In mathematics, linear programming problems are
optimization problems in which the objective func-
tion to characterize an optimality of a problem and
the constraints to express specific conditions for that
problem are all linear (Hillier and Lieberman, 2001;
Thomas H. Cormen and Stein, 2001). Two fam-
ilies of solution methods, so-called simplex meth-
ods (Dantzig, 1963) and interior-point methods (Kar-
markar, 1984), are in wide use and available as com-
puter programs today. Both methods progressively
improve series of trial solutions by visiting edges of
the feasible boundary or the points within the interior
of feasible region, until a solution is reached to satisfy
the conditions for an optimum. In fact, it is known
that large problem instances render even the best of
codes nearly unusable (Winston, 1994). Furthermore,
the program libraries available today are found out-
side the standard database environment, thus mandat-
186
Kawaguchi A. and Alfredo Perez J. (2007).
LINEAR PROGRAMMING FOR DATABASE ENVIRONMENT.
In Proceedings of the Fourth International Conference on Informatics in Control, Automation and Robotics, pages 186-191
DOI: 10.5220/0001652701860191
Copyright
c
SciTePress
ing the use of a special interface to interact with these
tools for linear programming computations.
This paper presents a detailed account on the
methodology and technical issues to realize a gen-
eral linear programming method for the relational
database environment. In general, relational database
is not for matrix operations for solving linear pro-
gramming problems. Indeed, realizing matrix oper-
ations on top of the standard relational structure is
non-trivial. In this paper, implementation techniques
and key issues for this development are studied ex-
tensively, and a model suitable to capture the dynam-
ics of linear programming computations is incorpo-
rated into the aimed development, by way of realiz-
ing a set of procedural interfaces that enables a stan-
dard database language to define problems within a
database and to derive optimal solutions for those
problems without requiring users to write detailed
program statements.
Specifically, we develop a set of ready to use
stored procedures to solve general linear program-
ming problems. A stored procedure is a group of
SQL statements, precompiled and physically stored
within a database (Gulutzan and Pelzer, 1999; Gu-
lutzan, 2007). It forms a logical unit to encapsulate
a set of database operations, defined with applica-
tion program interface to perform a particular task,
thereby having complex logic run inside the database
via a stored procedure. The exact implementation of
a stored procedure varies from one database to an-
other, but is supported by most major database ven-
dors. To this end, we will show an implementation
using MySQL open-source database system.
As a summary, contributions of this paper are
threefold: First, we present a detailed account on
the methodology and technical issues to integrate a
general linear programming method into relational
databases. Second, we present the development as
forms of stored procedures for today’s representative
database systems. Third, we present an experimental
performance study based on a comprehensive system
that implements all these concepts. This is to inves-
tigate applications of business decision problems to a
database environment for practical use.
2 FUNDAMENTALS
Consider the matrix notation expressed in the set of
equations (1) below. The standard form of the lin-
ear programming problem is to maximize an objective
function Z = c
T
x, subject to the functional constraints
of Ax b and non-negativity constraints of x 0,
with 0 in this case being the n-dimensional zero col-
umn vector. A coefficient matrix A and column vec-
tors c, b, and x are defined in the obvious manner such
that each component of the column vector Ax is less
than or equal to the corresponding component of the
column vector b.
x =
x
1
x
2
.
.
.
x
n
, c =
c
1
c
2
.
.
.
c
n
, b =
b
1
b
2
.
.
.
b
m
,
0 =
0
0
.
.
.
0
, A =
a
11
a
12
· · · a
1n
a
21
a
22
· · · a
2n
.
.
.
.
.
.
.
.
.
a
m1
a
m2
· · · a
mn
(1)
The goal is to find an optimal solution, that is, the
most favorable values of the objective function among
feasible ones for which all the constraints are satis-
fied. The simplex method (Dantzig, 1963) is an alge-
braic iterative procedure where each round of compu-
tation involves solving a system of equations to ob-
tain a new trial solution for the optimality test. The
simplex method relies on the mathematical property
that the objective function’s maximum must occur on
a corner of the space bounded by the constraints of
the feasible region.
To apply the simplex method, linear programming
problems must be converted into augmented form, by
introducing non-negative slack variables to replace
non-equalities with equalities in the constraints. The
problem can then be rewritten on the following form:
x
s
=
x
n+1
x
n+2
.
.
.
x
n+m
, [A, I]
x
x
s
= b,
x
x
s
0,
1 c
T
0
0 A I
Z
x
x
s
=
0
b
(2)
In equations (2) above, x 0, a column vector of
slack variables x
s
0, and I is the m× m identity ma-
trix. Following the convention, the variables set to
zero by the simplex method are called nonbasic vari-
ables and the others are called basic variables. If all
of the basic variables are non-negative, the solution
is called a basic feasible solution. Two basic feasible
solutions are adjacent if all but one of their nonba-
sic variables are the same. The spirit of the simplex
method utilizes a rule for generating from any given
basic feasible solution a new one differing from the
old in respect of just one variable.
Thus, moving from the current basic feasible so-
lution to an adjacent one involves switching one vari-
LINEAR PROGRAMMING FOR DATABASE ENVIRONMENT
187
able from nonbasic to basic and vice versa for one
other variable. This movement involves replacing one
nonbasic variable (called entering basic variable) by
a new one (called leaving basic variable) and iden-
tifying the new basic feasible solution. The simplex
algorithm is summarized as follows:
Simplex Method:
1. Initialization: transform the given problem into an
augmented form, and select original variables to
be the nonbasic variables (i.e., x = 0), and slack
variable to be the basic variables (i.e., x
s
= b).
2. Optimality test: rewrite the objective function by
shifting all the nonbasic variables to the right-
hand side, and see if the sign of the coefficient of
every nonbasic variable is positive, in which case
the solution is optimal.
3. Iterative Step:
3.1 Selecting an entering variable: as the nonba-
sic variable whose coefficient is largest in the
rewritten objective function used in the opti-
mality test.
3.2 Selecting a leaving variable: as the basic vari-
able that reaches zero first when the entering
basic variable is increased, that is, the basic
variable with the smallest upper bound.
3.3 Compute a new basic feasible solution: by ap-
plying the Gauss-Jordan method of elimination,
and apply the above optimality test.
2.1 Revised Simplex Method
The computation of the simplex method can be im-
proved by reducing the number of arithmetic opera-
tions as well as the amount of round-off errors gen-
erated from these operations (Hillier and Lieberman,
2001; Richard, 1991; Walsh, 1985). Notice that n
nonbasic variables from among the n + m elements of
x
T
, x
s
T
T
are always set to zero. Thus, eliminating
these n variables by equating them to zero leaves a set
of m equations in m unknowns of the basic variables.
The spirit of the revised simplex method (Hillier and
Lieberman, 2001; Winston, 1994) is to preserve the
only pieces of information relevant at each iteration,
which consist of the coefficients of the nonbasic vari-
ables in the objective function, the coefficients of the
entering basic variable in the other equations, and the
right-hand side of the equations.
Specifically, consider the equations (3) below.
The revised method attempts to derive a basic (square)
matrix B of size m × m by eliminating the columns
corresponding to coefficients of nonbasic variables
from [A, I] in equations (2). Furthermore, let c
B
T
be
the vector obtained by eliminating the coefficients of
nonbasic variables from
c
T
, 0
T
T
and reordering the
elements to match the order of the basic variables.
Then, the values of the basic variables become B
1
b
and Z = c
B
T
B
1
b. The equations (2) become equiv-
alent with equations (3) after any iteration of the sim-
plex method.
B =
B
11
B
12
· · · B
1m
B
21
B
22
· · · B
2m
.
.
.
.
.
.
.
.
.
B
m1
B
m2
· · · B
mm
,
1 c
B
T
B
1
A c
T
c
B
T
B
1
0 B
1
A B
1
Z
x
x
s
=
c
B
T
B
1
b
B
1
b
(3)
This means that only B
1
needs to be derived to be
able to calculate all the numbers used in the simplex
method from the original parameters of A, b, c
B
providing efficiency and numerical stability.
2.2 Relational Representation
A relational model provides a single way to repre-
sent data as a two-dimensional table or a relation. An
n-ary relation being a subset of the Cartesian prod-
uct of n domains has a collection of rows called tu-
ples. Implementing the simplex and revised simplex
methods must locate the exact position of the values
for the equations and variables of the linear program-
ming problem to be solved. However, the position
of the tuples in the table is not relevant in the rela-
tional model. Tuple ordering and matrix handling are
beyond the standard relational features, and these are
the important issues addressed to implement the lin-
ear programming solver using the simplex method.
Simplex calculations are most conveniently per-
formed with the help of a series of tables known as
simplex tableaux (Dantzig, 1963; Hillier and Lieber-
man, 2001). A simplex tableaux is a table that con-
tains all the information necessary to move from one
iteration to another while performing the simplex
method. Let x
B
be a column vector of m basic vari-
ables obtained by eliminating the nonbasic variables
from x and x
s
. Then, the initial tableaux can be ex-
pressed as,
Z 1 c
T
0 0
x
B
0 A I b
(4)
The algebraic treatment based on the revised sim-
plex method (Hillier and Lieberman, 2001; William
ICINCO 2007 - International Conference on Informatics in Control, Automation and Robotics
188
H. Press and Flannery, 2002) derives the values at any
iteration of the simplex method as,
Z 1 c
B
T
B
1
A c
T
c
B
T
B
1
c
B
T
B
1
b
x
B
0 B
1
A B
1
B
1
b
(5)
For the matrices expressed (4) and (5), the first two
column elements do not need to be stored into per-
sistent memory. Thus, the simplex tableaux can be a
table using the rest of the three column elements in
the relational model. Specifically, a linear program-
ming problem in the augmented form (equations (2))
can be seen as a relation:
tableaux(id, x
1
, x
2
, · · · x
n
, rhs)
Every variable of the constraints and the objective
function becomes an attribute of the relation, together
with the right hand that becomes the rhs column
on the table. The id column serves a key that can
uniquely determine every variable of the constraints
and of the objective function. When the LP problem
in augmented form is inputted into the database, ev-
ery constraint is identified by a unique integer value
going from 1 to n in the id column, where n is the
number of constraints for the problem plus the objec-
tive function. Thus by applying relational operations,
it is feasible to know the position of every constraint
and variable for a linear programming problem, and
to proceed with the matrix operations necessary to im-
plement the simplex algorithm.
3 SYSTEM DEVELOPMENT
The availability of real-time databases capable of ac-
cepting and solving linear programming problems
helps us examine the effectiveness and practical us-
ability in integrating linear programming tools into
database environment. Towards this end, a general
linear programming solver is developed on top of
the de facto standard database environment, with the
combination of a PHP application for the front-end
and a MySQL application for the back-end. Note that
the implementation of this linear programming solver
is strictly within the database technology, not relying
on any outside programming language.
The PHP front-end enables the user to input the
number of variables and number of constraints of the
linear programming problem to solve. With these
values, it generates a dynamic Web interface to ac-
cept the values of the objective function and the val-
ues of the constraining equations. The Web inter-
face also allows the user to upload a file in a MPS
(Mathematical Programming System) format that de-
fines a linear programming problem. The MPS file
format serves a standard for describing and archiving
linear programming and mixed integer programming
problems (Organization, 2007). A special program is
built to convert MPS data format into SQL statements
for populating an linear programming instance—the
main objective of this development is to obtain bench-
mark performances for large-scale linear program-
ming problems.
The MySQL back-end performs iterative compu-
tations by the use of a set of stored procedures pre-
compiled and integrated into the database structure.
The system encapsulates an API for processing a sim-
plex method that requires the execution of several
SQL queries to produce a solution.
The output of the system is shown in Figure 1,
in which each table represents the tableaux contain-
ing the values resulted at each iteration. The system
presents successive transformations and optimal solu-
tion if it exists.
Figure 1: Simplex method iterations and optimal solution.
3.1 Stored Procedure Implementation
Stored procedures can have direct accesses to the data
in the database, and run their steps directly and en-
tirely within the database. The complex logic runs
inside the database engine, thus faster in processing
requests because numerous context switches and a
great deal of network traffic can be eliminated. The
database system only needs to send the final results
back to the user, doing away with the overhead of
communicating potentially large amounts of interim
LINEAR PROGRAMMING FOR DATABASE ENVIRONMENT
189
Table 1: Experimental set and measured execution time.
name m n nonzeros optimal value time standard deviation
ADLITTLE 57 97 465 2.2549496316E+05 1 min. 25 sec. 2.78 sec.
AFIRO 28 32 88 -464.7531428596 35 sec. 1.67 sec.
BLEND 75 83 521 -3.0812149846E+01 1 min. 5 sec. 3.20 sec.
BRANDY 221 249 2150 1.5185098965E+03 2 min. 50 sec. 4.25 sec.
data back and forth (Gulutzan, 2007; Gulutzan and
Pelzer, 1999).
Stored procedures are supported by most DBMSs,
but there is a fair amount of variation in their syn-
tax and capabilities even their internal effects are al-
most invisible. Our development uses MySQL ver-
sion 5.0.22 at the time of this paper writing (as for
MySQL version 5, stored procedures are supported).
The next code listing is the stored procedure used to
create the table to store the linear programming prob-
lem to be solved by the application. The first part
of the stored procedure consists of the prototype of
the function and the declaration of the variables to be
used in the procedure.
DELIMITER $$
DROP PROCEDURE IF EXISTS
‘lpsolver‘.‘createTable‘ $$
CREATE PROCEDURE ‘lpsolver‘.‘createTable‘
(constraints INT, variables INT)
BEGIN
DECLARE i INT;
DECLARE jiterator VARCHAR(50);
DECLARE statement VARCHAR(1000);
DROP TABLE IF EXISTS tableaux;
Because of the dynamic nature of the calculations
for solving linear programming problems, our stored
procedure relies on the extensive use of prepared SQL
statements. In the next code block, the SQL statement
to create a table is generated on the fly, based on the
number of variables and constraints of the problem to
solve. The generated procedure is then passed to the
database for execution.
SET statement = ’CREATE TABLE
tableaux(id INT(5) PRIMARY KEY, ’;
SET i = 1;
table_loop:LOOP
IF i > constraints + variables + 1 THEN
LEAVE table_loop;
END IF;
SET jiterator = CONCAT(’j’,i);
SET statement = CONCAT(statement,
jiterator);
SET statement = CONCAT(statement,
DOUBLE DEFAULT 0’);
IF i <= constraints + variables THEN
SET statement = CONCAT(statement, ’, ’);
END IF;
SET i = i + 1;
END LOOP table_loop;
SET statement = CONCAT(statement, ’)’);
SET @sql_call = statement;
PREPARE s1 FROM @sql_call;
EXECUTE s1;
DEALLOCATE PREPARE s1;
END $$
DELIMITER ;
3.2 Experimental Results
To see the effectiveness of the implementation, vari-
ous linear programming problems were selected from
commonly available Netlib linear programming li-
brary (Organization, 2007). As one case, see Table 1
for a sufficiently large problem set. The values m and
n indicate the size, m × n, of the coefficient matrix A
in equations (1), or equivalently, m is the number of
constraints and n is the number of decision variables.
All experiments were performed by an Intel 586
based standalone machine with 1.2 GHz CPU and 512
MB memory that running MySQL 5.0.22. The data
values were extracted from Netlib MPS files to pop-
ulate the problems into the database prior to run the
simplex method. The time measured does not include
this data preparation process, but only the execution
of the stored procedure to produce a solution. The
time listed in the following table is the average of ten
executions of each problem. The set of results shown
in Table 1 are based on the implementation of the re-
vised simplex method contained in the stored proce-
dures.
One limiting factor is the fact that MySQL allows
to have up to 1000 columns on a table. Given that
our implementation is based on mapping of a simplex
tableaux into a database relation, the number of vari-
ables plus the number of constraints cannot exceed the
number of columns allowed for a MySQL table. This
prohibited us from testing the problems in the Netlib
library that exceed the column size of 1000. Finally,
we observed one problem when trying to find optimal
solutions for larger problems with higher numbers of
columns, variables and zero elements. The computa-
tion never came to an end, indicating that the prob-
lem had become unbounded, which can be attributed
to the tableaux becoming ill-conditioned as a con-
sequence of truncation errors resulted from repeated
matrix operations.
ICINCO 2007 - International Conference on Informatics in Control, Automation and Robotics
190
4 RELATED WORK
A vast amount of effort for the establishment of the-
ory and practice is observed today. Certain spe-
cial cases of linear programming, such as network
flow problems and multi-commodity flow problems
are considered important enough to have generated
much research on specialized algorithms for their so-
lution (Winston, 1994; Thomas H. Cormen and Stein,
2001; Hillier and Lieberman, 2001). A number of
algorithms for other types of optimization problems
work by solving linear programming problems as sub-
problems. Historically, ideas from linear program-
ming have inspired many of the central concepts of
optimization theory, such as duality, decomposition,
and the importance of convexity and its generaliza-
tions (Hillier and Lieberman, 2001).
There are approaches considered to fit a linear pro-
gramming model, such as integer programming and
nonlinear programming (Alexander, 1998; Richard,
1991; Hillier and Lieberman, 2001). But, our re-
search focuses on the area of iterative methods for
solving linear systems. Some of the most significant
contributions and the chain of contributions building
on each other are summarized in (Saad and van der
Vorst, 2000), especially a survey of the transition
from simplex methods to interior-point methods is
presented in (Wang, 99). In terms of implementa-
tion techniques, the work of (Morgan, 1976; Shamir,
1987) provided us with introductory sources for ref-
erence. There are online materials such as (Optimiza-
tion Technology Center and Laboratory, 2007; Orga-
nization, 2007) to help us understand the details and
plan for experimental design.
5 CONCLUSION
The subject of this research is to respond a lack of
database tools for solving a linear programming prob-
lem defined within a database.
We described the aim and approach for integrat-
ing a linear programing method into today’s database
system, with our goal in mind to establish a seamless
and transparent interface between them. As demon-
strated, this is feasible by the use of stored procedures,
the emerging database programming standard that al-
lows for complex logic to be embedded as an API in
the database, thus simplifying data management and
enhancing overall performance.
We implemented a general linear programming
solver on top of the PHP and MySQL software layers.
The experiments with several benchmark problems
extracted from the Netlib library showed its correct
optimal solutions and basic performance measures.
The contents of this paper are the first dissemina-
tion of our upcoming series of publications—we plan
to extend this research into several directions. First,
the limiting factor of MySQLs table column size for
the implementation needs to be addressed. Also, the
code must be optimized to reduce the overall execu-
tion time. Second, more experiments must be done
to collect additional performance measures. Further-
more, other commercial databases such as Oracle
should be included for comparative study, and non-
linear and other optimization methods should also be
explored.
REFERENCES
Alexander, S. (1998). Theory of Linear and Integer Pro-
gramming. John Wiley & Sons, New York, NY.
Dantzig, G. B. (1963). Linear Programming and Exten-
sions. Princeton University Press, Princeton, N.J.
Gulutzan, P. (2007). MySQL 5.0 New Features: Stored Pro-
cedures. MySQL AB, http://www.mysql.com.
Gulutzan, P. and Pelzer, T. (1999). SQL-99 Complete, Re-
ally. CMP Books.
Hillier, F. S. and Lieberman, G. J. (2001). Introduction to
Operations Research. McGraw-Hill, 8th edition.
Karmarkar, N. K. (1984). A new polynomial-time algorithm
for linear programming and extensions. Combinator-
ica, 4:373–395.
Morgan, S. S. (1976). A comparison of simplex method
algorithms. Master’s thesis, University of Florida.
Optimization Technology Center, N. U. and Laboratory,
A. N. (2007). The linear programming frequently
asked questions.
Organization, T. N. (2007). The netlib repository at utk and
ornl.
Richard, B. D. (1991). Introduction To Linear Program-
ming: Applications and Extensions. Marcel Dekker,
New York, NY.
Saad, Y. and van der Vorst, H. (2000). Iterative solution of
linear systems in the 20-th century. JCAM.
Shamir, R. (1987). The efficiency of the simplex method: a
survey. Manage. Sci., 33(3):301–334.
Thomas H. Cormen, Charles E. Leiserson, R. L. R. and
Stein, C. (2001). Introduction to Algorithms, Chap-
ter29: Linear Programming. MIT Press and McGraw-
Hill, 2nd edition.
Walsh, G. R. (1985). An Introduction to Linear Program-
ming. John Wiley & Sons, New York, NY.
Wang, X. (99). From simplex methods to interior-point
methods: A brief survey on linear programming al-
gorithms.
William H. Press, Saul A. Teukolsky, W. T. V. and Flannery,
B. P. (200 2). Numerical Recipes in C++: The Art of
Scientific Computing. Cambridge University.
Winston, W. L. (1994). Operations Research, Applications
and Algorithms. Duxbury Press.
LINEAR PROGRAMMING FOR DATABASE ENVIRONMENT
191