LINEAR PROGRAMMING FOR DATABASE ENVIRONMENT

Akira Kawaguchi and Jose Alfredo Perez

Department of Computer Sciences, The City College of New York, New York, New York 10031, U.S.A.

Keywords:

Linear programming, simplex method, revised simplex method, database, stored procedure.

Abstract:

Solving large-scale optimization problems requires an integration of data-analysis and data-manipulation capa-

bilities. Nevertheless, little attempt has been made to facilitate general linear programming solvers for database

environments. Dozens of sophisticated tools and software libraries that implement linear programming model

can be found. But, there is no database-embedded linear programming tool seamlessly and transparently uti-

lized for database processing. The focus of this study is to ﬁll out this kind of technical gap of data analysis

and data manipulation, in the event of solving large-scale linear programming problems for the applications

built on the database environment. Speciﬁcally, this paper studies the representation of the linear program-

ming model in relational structures and the computational method to solve the linear programming problems.

Foundations for and preliminary experimental results of this study are presented.

1 INTRODUCTION

Linear programming is a powerful technique for deal-

ing with the problem of allocating limited resources

among competing activities, as well as other prob-

lems having a similar mathematical formation (Win-

ston, 1994; Richard, 1991; Walsh, 1985). It has be-

come an important ﬁeld of optimization in the areas

of science and engineering. It has become a standard

tool of great importance for numerous business and

industrial organizations. In particular, large-scale lin-

ear programming problems arise in practical applica-

tions such as logistics for large spare-parts inventory,

revenue management and dynamic pricing, ﬁnance,

transportation and routing, network design, and chip

design (Hillier and Lieberman, 2001).

While these problems inevitably involve the anal-

ysis of a large amount of data, there has been rel-

atively little work in the context of database pro-

cessing. Little attempt has been made to facilitate

data-driven analysis with data-oriented techniques. In

today’s marketplace, dozens of sophisticated tools

and software libraries that implement linear program-

ming model can be found. Nevertheless, these prod-

ucts do not work with database systems seamlessly.

They rather require additional software layers built

on top of databases to extract and transfer data in the

databases. The focus of our study gathered here is

to ﬁll out this kind of technical gap of data analysis

and data manipulation, in the event of solving large-

scale linear programming problems for the applica-

tions built on the database environment.

In mathematics, linear programming problems are

optimization problems in which the objective func-

tion to characterize an optimality of a problem and

the constraints to express speciﬁc conditions for that

problem are all linear (Hillier and Lieberman, 2001;

Thomas H. Cormen and Stein, 2001). Two fam-

ilies of solution methods, so-called simplex meth-

ods (Dantzig, 1963) and interior-point methods (Kar-

markar, 1984), are in wide use and available as com-

puter programs today. Both methods progressively

improve series of trial solutions by visiting edges of

the feasible boundary or the points within the interior

of feasible region, until a solution is reached to satisfy

the conditions for an optimum. In fact, it is known

that large problem instances render even the best of

codes nearly unusable (Winston, 1994). Furthermore,

the program libraries available today are found out-

side the standard database environment, thus mandat-

186

Kawaguchi A. and Alfredo Perez J. (2007).

LINEAR PROGRAMMING FOR DATABASE ENVIRONMENT.

In Proceedings of the Fourth International Conference on Informatics in Control, Automation and Robotics, pages 186-191

DOI: 10.5220/0001652701860191

 SciTePress

ing the use of a special interface to interact with these

tools for linear programming computations.

This paper presents a detailed account on the

methodology and technical issues to realize a gen-

eral linear programming method for the relational

database environment. In general, relational database

is not for matrix operations for solving linear pro-

gramming problems. Indeed, realizing matrix oper-

ations on top of the standard relational structure is

non-trivial. In this paper, implementation techniques

and key issues for this development are studied ex-

tensively, and a model suitable to capture the dynam-

ics of linear programming computations is incorpo-

rated into the aimed development, by way of realiz-

ing a set of procedural interfaces that enables a stan-

dard database language to deﬁne problems within a

database and to derive optimal solutions for those

problems without requiring users to write detailed

program statements.

Speciﬁcally, we develop a set of ready to use

stored procedures to solve general linear program-

ming problems. A stored procedure is a group of

SQL statements, precompiled and physically stored

within a database (Gulutzan and Pelzer, 1999; Gu-

lutzan, 2007). It forms a logical unit to encapsulate

a set of database operations, deﬁned with applica-

tion program interface to perform a particular task,

thereby having complex logic run inside the database

via a stored procedure. The exact implementation of

a stored procedure varies from one database to an-

other, but is supported by most major database ven-

dors. To this end, we will show an implementation

using MySQL open-source database system.

As a summary, contributions of this paper are

threefold: First, we present a detailed account on

the methodology and technical issues to integrate a

general linear programming method into relational

databases. Second, we present the development as

forms of stored procedures for today’s representative

database systems. Third, we present an experimental

performance study based on a comprehensive system

that implements all these concepts. This is to inves-

tigate applications of business decision problems to a

database environment for practical use.

2 FUNDAMENTALS

Consider the matrix notation expressed in the set of

equations (1) below. The standard form of the lin-

ear programming problem is to maximize an objective

function Z = c

x, subject to the functional constraints

of Ax ≤ b and non-negativity constraints of x ≥ 0,

with 0 in this case being the n-dimensional zero col-

umn vector. A coefﬁcient matrix A and column vec-

tors c, b, and x are deﬁned in the obvious manner such

that each component of the column vector Ax is less

than or equal to the corresponding component of the

column vector b.

x =













, c =













, b =













0 =













, A =







· · · a







(1)

The goal is to ﬁnd an optimal solution, that is, the

most favorable values of the objective function among

feasible ones for which all the constraints are satis-

ﬁed. The simplex method (Dantzig, 1963) is an alge-

braic iterative procedure where each round of compu-

tation involves solving a system of equations to ob-

tain a new trial solution for the optimality test. The

simplex method relies on the mathematical property

that the objective function’s maximum must occur on

a corner of the space bounded by the constraints of

the feasible region.

To apply the simplex method, linear programming

problems must be converted into augmented form, by

introducing non-negative slack variables to replace

non-equalities with equalities in the constraints. The

problem can then be rewritten on the following form:







n+1

n+2

n+m







, [A, I]





= b,





≥ 0,



1 −c

0 A I















(2)

In equations (2) above, x ≥ 0, a column vector of

slack variables x

≥ 0, and I is the m× m identity ma-

trix. Following the convention, the variables set to

zero by the simplex method are called nonbasic vari-

ables and the others are called basic variables. If all

of the basic variables are non-negative, the solution

is called a basic feasible solution. Two basic feasible

solutions are adjacent if all but one of their nonba-

sic variables are the same. The spirit of the simplex

method utilizes a rule for generating from any given

basic feasible solution a new one differing from the

old in respect of just one variable.

Thus, moving from the current basic feasible so-

lution to an adjacent one involves switching one vari-

LINEAR PROGRAMMING FOR DATABASE ENVIRONMENT

187

able from nonbasic to basic and vice versa for one

other variable. This movement involves replacing one

nonbasic variable (called entering basic variable) by

a new one (called leaving basic variable) and iden-

tifying the new basic feasible solution. The simplex

algorithm is summarized as follows:

Simplex Method:

1. Initialization: transform the given problem into an

augmented form, and select original variables to

be the nonbasic variables (i.e., x = 0), and slack

variable to be the basic variables (i.e., x

= b).

2. Optimality test: rewrite the objective function by

shifting all the nonbasic variables to the right-

hand side, and see if the sign of the coefﬁcient of

every nonbasic variable is positive, in which case

the solution is optimal.

3. Iterative Step:

3.1 Selecting an entering variable: as the nonba-

sic variable whose coefﬁcient is largest in the

rewritten objective function used in the opti-

mality test.

3.2 Selecting a leaving variable: as the basic vari-

able that reaches zero ﬁrst when the entering

basic variable is increased, that is, the basic

variable with the smallest upper bound.

3.3 Compute a new basic feasible solution: by ap-

plying the Gauss-Jordan method of elimination,

and apply the above optimality test.

2.1 Revised Simplex Method

The computation of the simplex method can be im-

proved by reducing the number of arithmetic opera-

tions as well as the amount of round-off errors gen-

erated from these operations (Hillier and Lieberman,

2001; Richard, 1991; Walsh, 1985). Notice that n

nonbasic variables from among the n + m elements of



, x



are always set to zero. Thus, eliminating

these n variables by equating them to zero leaves a set

of m equations in m unknowns of the basic variables.

The spirit of the revised simplex method (Hillier and

Lieberman, 2001; Winston, 1994) is to preserve the

only pieces of information relevant at each iteration,

which consist of the coefﬁcients of the nonbasic vari-

ables in the objective function, the coefﬁcients of the

entering basic variable in the other equations, and the

right-hand side of the equations.

Speciﬁcally, consider the equations (3) below.

The revised method attempts to derive a basic (square)

matrix B of size m × m by eliminating the columns

corresponding to coefﬁcients of nonbasic variables

from [A, I] in equations (2). Furthermore, let c

the vector obtained by eliminating the coefﬁcients of

nonbasic variables from



, 0



and reordering the

elements to match the order of the basic variables.

Then, the values of the basic variables become B

−1

and Z = c

−1

b. The equations (2) become equiv-

alent with equations (3) after any iteration of the sim-

plex method.

B =







· · · B









1 c

−1

A − c

−1

0 B

−1

A B

−1













−1



(3)

This means that only B

−1

needs to be derived to be

able to calculate all the numbers used in the simplex

method from the original parameters of A, b, c

—

providing efﬁciency and numerical stability.

2.2 Relational Representation

A relational model provides a single way to repre-

sent data as a two-dimensional table or a relation. An

n-ary relation being a subset of the Cartesian prod-

uct of n domains has a collection of rows called tu-

ples. Implementing the simplex and revised simplex

methods must locate the exact position of the values

for the equations and variables of the linear program-

ming problem to be solved. However, the position

of the tuples in the table is not relevant in the rela-

tional model. Tuple ordering and matrix handling are

beyond the standard relational features, and these are

the important issues addressed to implement the lin-

ear programming solver using the simplex method.

Simplex calculations are most conveniently per-

formed with the help of a series of tables known as

simplex tableaux (Dantzig, 1963; Hillier and Lieber-

man, 2001). A simplex tableaux is a table that con-

tains all the information necessary to move from one

iteration to another while performing the simplex

method. Let x

be a column vector of m basic vari-

ables obtained by eliminating the nonbasic variables

from x and x

. Then, the initial tableaux can be ex-

pressed as,



Z 1 −c

0 0

0 A I b



(4)

The algebraic treatment based on the revised sim-

plex method (Hillier and Lieberman, 2001; William

ICINCO 2007 - International Conference on Informatics in Control, Automation and Robotics

188

H. Press and Flannery, 2002) derives the values at any

iteration of the simplex method as,



Z 1 c

−1

A − c

−1

0 B

−1

A B

−1



(5)

For the matrices expressed (4) and (5), the ﬁrst two

column elements do not need to be stored into per-

sistent memory. Thus, the simplex tableaux can be a

table using the rest of the three column elements in

the relational model. Speciﬁcally, a linear program-

ming problem in the augmented form (equations (2))

can be seen as a relation:

tableaux(id, x

, x

, · · · x

, rhs)

Every variable of the constraints and the objective

function becomes an attribute of the relation, together

with the right hand that becomes the rhs column

on the table. The id column serves a key that can

uniquely determine every variable of the constraints

and of the objective function. When the LP problem

in augmented form is inputted into the database, ev-

ery constraint is identiﬁed by a unique integer value

going from 1 to n in the id column, where n is the

number of constraints for the problem plus the objec-

tive function. Thus by applying relational operations,

it is feasible to know the position of every constraint

and variable for a linear programming problem, and

to proceed with the matrix operations necessary to im-

plement the simplex algorithm.

3 SYSTEM DEVELOPMENT

The availability of real-time databases capable of ac-

cepting and solving linear programming problems

helps us examine the effectiveness and practical us-

ability in integrating linear programming tools into

database environment. Towards this end, a general

linear programming solver is developed on top of

the de facto standard database environment, with the

combination of a PHP application for the front-end

and a MySQL application for the back-end. Note that

the implementation of this linear programming solver

is strictly within the database technology, not relying

on any outside programming language.

The PHP front-end enables the user to input the

number of variables and number of constraints of the

linear programming problem to solve. With these

values, it generates a dynamic Web interface to ac-

cept the values of the objective function and the val-

ues of the constraining equations. The Web inter-

face also allows the user to upload a ﬁle in a MPS

(Mathematical Programming System) format that de-

ﬁnes a linear programming problem. The MPS ﬁle

format serves a standard for describing and archiving

linear programming and mixed integer programming

problems (Organization, 2007). A special program is

built to convert MPS data format into SQL statements

for populating an linear programming instance—the

main objective of this development is to obtain bench-

mark performances for large-scale linear program-

ming problems.

The MySQL back-end performs iterative compu-

tations by the use of a set of stored procedures pre-

compiled and integrated into the database structure.

The system encapsulates an API for processing a sim-

plex method that requires the execution of several

SQL queries to produce a solution.

The output of the system is shown in Figure 1,

in which each table represents the tableaux contain-

ing the values resulted at each iteration. The system

presents successive transformations and optimal solu-

tion if it exists.

Figure 1: Simplex method iterations and optimal solution.

3.1 Stored Procedure Implementation

Stored procedures can have direct accesses to the data

in the database, and run their steps directly and en-

tirely within the database. The complex logic runs

inside the database engine, thus faster in processing

requests because numerous context switches and a

great deal of network trafﬁc can be eliminated. The

database system only needs to send the ﬁnal results

back to the user, doing away with the overhead of

communicating potentially large amounts of interim

LINEAR PROGRAMMING FOR DATABASE ENVIRONMENT

189

Table 1: Experimental set and measured execution time.

name m n nonzeros optimal value time standard deviation

ADLITTLE 57 97 465 2.2549496316E+05 1 min. 25 sec. 2.78 sec.

AFIRO 28 32 88 -464.7531428596 35 sec. 1.67 sec.

BLEND 75 83 521 -3.0812149846E+01 1 min. 5 sec. 3.20 sec.

BRANDY 221 249 2150 1.5185098965E+03 2 min. 50 sec. 4.25 sec.

data back and forth (Gulutzan, 2007; Gulutzan and

Pelzer, 1999).

Stored procedures are supported by most DBMSs,

but there is a fair amount of variation in their syn-

tax and capabilities even their internal effects are al-

most invisible. Our development uses MySQL ver-

sion 5.0.22 at the time of this paper writing (as for

MySQL version 5, stored procedures are supported).

The next code listing is the stored procedure used to

create the table to store the linear programming prob-

lem to be solved by the application. The ﬁrst part

of the stored procedure consists of the prototype of

the function and the declaration of the variables to be

used in the procedure.

DELIMITER $$

DROP PROCEDURE IF EXISTS

‘lpsolver‘.‘createTable‘ $$

CREATE PROCEDURE ‘lpsolver‘.‘createTable‘

(constraints INT, variables INT)

BEGIN

DECLARE i INT;

DECLARE jiterator VARCHAR(50);

DECLARE statement VARCHAR(1000);

DROP TABLE IF EXISTS tableaux;

Because of the dynamic nature of the calculations

for solving linear programming problems, our stored

procedure relies on the extensive use of prepared SQL

statements. In the next code block, the SQL statement

to create a table is generated on the ﬂy, based on the

number of variables and constraints of the problem to

solve. The generated procedure is then passed to the

database for execution.

SET statement = ’CREATE TABLE

tableaux(id INT(5) PRIMARY KEY, ’;

SET i = 1;

table_loop:LOOP

IF i > constraints + variables + 1 THEN

LEAVE table_loop;

END IF;

SET jiterator = CONCAT(’j’,i);

SET statement = CONCAT(statement,

jiterator);

SET statement = CONCAT(statement,

’ DOUBLE DEFAULT 0’);

IF i <= constraints + variables THEN

SET statement = CONCAT(statement, ’, ’);

END IF;

SET i = i + 1;

END LOOP table_loop;

SET statement = CONCAT(statement, ’)’);

SET @sql_call = statement;

PREPARE s1 FROM @sql_call;

EXECUTE s1;

DEALLOCATE PREPARE s1;

END $$

DELIMITER ;

3.2 Experimental Results

To see the effectiveness of the implementation, vari-

ous linear programming problems were selected from

commonly available Netlib linear programming li-

brary (Organization, 2007). As one case, see Table 1

for a sufﬁciently large problem set. The values m and

n indicate the size, m × n, of the coefﬁcient matrix A

in equations (1), or equivalently, m is the number of

constraints and n is the number of decision variables.

All experiments were performed by an Intel 586

based standalone machine with 1.2 GHz CPU and 512

MB memory that running MySQL 5.0.22. The data

values were extracted from Netlib MPS ﬁles to pop-

ulate the problems into the database prior to run the

simplex method. The time measured does not include

this data preparation process, but only the execution

of the stored procedure to produce a solution. The

time listed in the following table is the average of ten

executions of each problem. The set of results shown

in Table 1 are based on the implementation of the re-

vised simplex method contained in the stored proce-

dures.

One limiting factor is the fact that MySQL allows

to have up to 1000 columns on a table. Given that

our implementation is based on mapping of a simplex

tableaux into a database relation, the number of vari-

ables plus the number of constraints cannot exceed the

number of columns allowed for a MySQL table. This

prohibited us from testing the problems in the Netlib

library that exceed the column size of 1000. Finally,

we observed one problem when trying to ﬁnd optimal

solutions for larger problems with higher numbers of

columns, variables and zero elements. The computa-

tion never came to an end, indicating that the prob-

lem had become unbounded, which can be attributed

to the tableaux becoming ill-conditioned as a con-

sequence of truncation errors resulted from repeated

matrix operations.

ICINCO 2007 - International Conference on Informatics in Control, Automation and Robotics

190

4 RELATED WORK

A vast amount of effort for the establishment of the-

ory and practice is observed today. Certain spe-

cial cases of linear programming, such as network

ﬂow problems and multi-commodity ﬂow problems

are considered important enough to have generated

much research on specialized algorithms for their so-

lution (Winston, 1994; Thomas H. Cormen and Stein,

2001; Hillier and Lieberman, 2001). A number of

algorithms for other types of optimization problems

work by solving linear programming problems as sub-

problems. Historically, ideas from linear program-

ming have inspired many of the central concepts of

optimization theory, such as duality, decomposition,

and the importance of convexity and its generaliza-

tions (Hillier and Lieberman, 2001).

There are approaches considered to ﬁt a linear pro-

gramming model, such as integer programming and

nonlinear programming (Alexander, 1998; Richard,

1991; Hillier and Lieberman, 2001). But, our re-

search focuses on the area of iterative methods for

solving linear systems. Some of the most signiﬁcant

contributions and the chain of contributions building

on each other are summarized in (Saad and van der

Vorst, 2000), especially a survey of the transition

from simplex methods to interior-point methods is

presented in (Wang, 99). In terms of implementa-

tion techniques, the work of (Morgan, 1976; Shamir,

1987) provided us with introductory sources for ref-

erence. There are online materials such as (Optimiza-

tion Technology Center and Laboratory, 2007; Orga-

nization, 2007) to help us understand the details and

plan for experimental design.

5 CONCLUSION

The subject of this research is to respond a lack of

database tools for solving a linear programming prob-

lem deﬁned within a database.

We described the aim and approach for integrat-

ing a linear programing method into today’s database

system, with our goal in mind to establish a seamless

and transparent interface between them. As demon-

strated, this is feasible by the use of stored procedures,

the emerging database programming standard that al-

lows for complex logic to be embedded as an API in

the database, thus simplifying data management and

enhancing overall performance.

We implemented a general linear programming

solver on top of the PHP and MySQL software layers.

The experiments with several benchmark problems

extracted from the Netlib library showed its correct

optimal solutions and basic performance measures.

The contents of this paper are the ﬁrst dissemina-

tion of our upcoming series of publications—we plan

to extend this research into several directions. First,

the limiting factor of MySQL’s table column size for

the implementation needs to be addressed. Also, the

code must be optimized to reduce the overall execu-

tion time. Second, more experiments must be done

to collect additional performance measures. Further-

more, other commercial databases such as Oracle

should be included for comparative study, and non-

linear and other optimization methods should also be

explored.

REFERENCES

Alexander, S. (1998). Theory of Linear and Integer Pro-

gramming. John Wiley & Sons, New York, NY.

Dantzig, G. B. (1963). Linear Programming and Exten-

sions. Princeton University Press, Princeton, N.J.

Gulutzan, P. (2007). MySQL 5.0 New Features: Stored Pro-

cedures. MySQL AB, http://www.mysql.com.

Gulutzan, P. and Pelzer, T. (1999). SQL-99 Complete, Re-

ally. CMP Books.

Hillier, F. S. and Lieberman, G. J. (2001). Introduction to

Operations Research. McGraw-Hill, 8th edition.

Karmarkar, N. K. (1984). A new polynomial-time algorithm

for linear programming and extensions. Combinator-

ica, 4:373–395.

Morgan, S. S. (1976). A comparison of simplex method

algorithms. Master’s thesis, University of Florida.

Optimization Technology Center, N. U. and Laboratory,

A. N. (2007). The linear programming frequently

asked questions.

Organization, T. N. (2007). The netlib repository at utk and

ornl.

Richard, B. D. (1991). Introduction To Linear Program-

ming: Applications and Extensions. Marcel Dekker,

New York, NY.

Saad, Y. and van der Vorst, H. (2000). Iterative solution of

linear systems in the 20-th century. JCAM.

Shamir, R. (1987). The efﬁciency of the simplex method: a

survey. Manage. Sci., 33(3):301–334.

Thomas H. Cormen, Charles E. Leiserson, R. L. R. and

Stein, C. (2001). Introduction to Algorithms, Chap-

ter29: Linear Programming. MIT Press and McGraw-

Hill, 2nd edition.

Walsh, G. R. (1985). An Introduction to Linear Program-

ming. John Wiley & Sons, New York, NY.

Wang, X. (99). From simplex methods to interior-point

methods: A brief survey on linear programming al-

gorithms.

William H. Press, Saul A. Teukolsky, W. T. V. and Flannery,

B. P. (200 2). Numerical Recipes in C++: The Art of

Scientiﬁc Computing. Cambridge University.

Winston, W. L. (1994). Operations Research, Applications

and Algorithms. Duxbury Press.

LINEAR PROGRAMMING FOR DATABASE ENVIRONMENT

191