Implementation of Heuristic Principles for Index Recommendations

using Java in the MariaDB Database

Radityo P. Wibowo

, Risa Perdana Sujanawati

, Faizal Mahananto

and Prasasti Karunia Farista

Ananto

Information System Departement, Faculty of Information and Communication Technology, Institut Teknologi Sepuluh

Nopember (ITS)

Keywords:

Index, Database Performance, Java, MariaDB.

Abstract:

Almost all ﬁelds in life today use database and demanding the most optimal performance from a database.

Therefore, a database must have the ability to optimize performance that can be obtained one of them by

choosing an optimal index because the index plays an important role in determining the performance of queries

that are executed against the system. The right index can increase speed in data processing. According to

previous research, Index can improve database performance, namely the number of transactions per minute

which increases three times and the cost is reduced by about 50% to 90%. That way index selection is a

dilemma for database users. The output of the following research is a Java-based software that can be used in

the MariaDB Database to obtain recommended indexes based on heuristic principles that have been used by

previous researchers. The following software can evaluate database performance based on response time and

Queries per Second or QPS. The recommendations from the following research can improve the performance

of the Monitordom database by 43.65% and the Adventureworks 2012 database to 2.77%.

1 INTRODUCTION

The database is an integral part of our daily lives that

we use unconsciously (Connolly and Begg, 2005).

Database performance is often an important measure-

ment related to user satisfaction. The performance of

a database depends on the logical structure, physical

structure and design of the database. So that the better

the logical structure, physical structure and database

design, the better the performance of the database

(Osman and Knottenbelt, 2012). Almost all ﬁelds

in life today have used databases and demanded the

most optimal performance from a database. There-

fore, a database must have the ability to optimize per-

formance that can be obtained, one of which is by se-

lecting an optimal index.

Index is an orderly arrangement that is logically

used to sort rows in a table (Connolly and Begg,

2005). index plays an important role in determin-

ing the performance of queries executed against the

system. (Chaudhuri et al., 2004). The right index

can increase speed in data processing (Ameri et al.,

2015). Misunderstanding often occurs, the more in-

dexes the better, where it can increase database main-

tenance such as storage and does not provide optimal

database performance results (Winand, 2012). The

more indexes the more storage will be used, and this

will have a negative effect on the performance of the

database. The solution to overcome this is to choose

the optimal index as proven in a database performance

measured by transaction events per minute, can in-

crease up to three times and can reduce the cost of

a workload of 50% to 90% (Pedrozo and Vaz, 2014)

(Valentin et al., 2000), so the index selection with a

certain method needs to be done to get optimal per-

formance improvement.

There are several index selection methods accord-

ing to previous research, one of which is by using an

algorithm developed based on the heuristic method

and using the na

ıve probability method in the algo-

rithm (S. Choenni, 1993) (Chaudhuri and Narasayya,

1997) (Chaudhuri et al., 2004). Almost all research

for index recommendations is to use the heuristic

method principle by developing an algorithm that is

oriented towards optimal results.

Some DBMS have provided a default feature

for index recommendations including SQL server

(Chaudhuri et al., 2004). Various studies on index

optimization with several other DBMS cases have

also been developed such as AISIO for PostgreSQL

Wibowo, R., Sujanawati, R., Mahananto, F. and Ananto, P.

Implementation of Heuristic Principles for Index Recommendations using Java in the MariaDB Database.

DOI: 10.5220/0009906200920097

In Proceedings of the International Conferences on Information System and Technology (CONRIST 2019), pages 92-97

ISBN: 978-989-758-453-4

 2020 by SCITEPRESS – Science and Technology Publications, Lda. All r ights reserved

8.4 (Pedrozo and Vaz, 2014), other recommendation

index studies make MySQL 5.5 a trial and imple-

mentation. Even no exception for NoSQL databases

such as MongoDB has become the object of research

for index recommendations using mining algorithms

(Ameri et al., 2015). However, open source such

as Maria DB does not yet have a default feature in-

dex recommendation and there has not been much re-

search on recommendations with the MariaDB case

study. One way to overcome the problem of index se-

lection on MariaDB can be developed by implement-

ing heuristic principles using Java.

Java is an object-based programming language

(Hermawan, 2004). Java is also multiplatform which

can be interpreted for various operating systems so

that java will be used by various operating system

users. Java has a pattern matcher method such as Reg-

ular Expression which can ﬁlter the SQL you want to

optimize.

This research is intended to make a software as

a java-based index recommendation by implementing

a heuristic method that has an index selection algo-

rithm as previously done ((Chaudhuri et al., 2004)but

what differs in the following research is in the index

selection process using the General log or can be in

the form of query as input and provide an output in

the form of a recommended column name as an index

as a result of the selection process carried out in the

Maria DB database.

2 BACKGROUND

2.1 Index Selection

Logical and description, which is designed to meet

the information needs of an organization (Connolly

and Begg, 2005). Meanwhile, to control the database

users can use a DBMS that is the Database Manage-

ment System. (Connolly and Begg, 2005). According

to Conolly, Database has three design phases namely

Conceptual, Logical, and Physical. While the tools of

this research are in the scope of the Physical Design

phase where the output is intended to help the phys-

ical design database process. Performance measure-

ment from the database can be done by assessing sev-

eral aspects according to winand, as follows (Winand,

2012): Data Volume, System Load, and Response

Time and Throughput. In this study will use measure-

ments in the form of response time and throughput

which can be interpreted as Query per Second (QPS)

that calculate from average query execution in itera-

tion of 60 seconds.

Index is a data structure that organizes data

records in storage to optimize certain types of re-

trieval operations (Raghu and Gehrke, 2004). The

right index can increase speed in data processing

(Ameri et al., 2015). The basis of the algorithm in

the following research is the AISIO heuristic principle

(Pedrozo and Vaz, 2014) where in the selection of the

index has four main processes but there are some de-

tails that are still not very clear, such as when making

single and multi-level-index candidates so that they

can use some other research references such as the in-

dex selection process on SQL server (Chaudhuri and

Narasayya, 1997) is by doing permutations for col-

umn names, and obtaining heuristics that are in ac-

cordance with MariaDB and applied in the following

research:

1. Parsing and Filter Queries. To get the initial in-

dex candidates in the form of the names of the

columns contained in the input as did all research

on the previous index recommendations.

2. Retrieving previous indexes. The following is an

addition to this study because in previous stud-

ies the existing index was rarely considered, even

though the index could be an optimal index.

3. Identifying index Conﬁguration. After getting an

index candidate in the form of a column name

the next is to make an index conﬁguration or in

the form of a set of indexes based on a combina-

tion of column names so that a multi-level-index

will appear like the main heuristic and making this

conﬁguration adheres to the details of making the

conﬁguration set in research on sql server (Chaud-

huri and Narasayya, 1997) which found that the

optimal combination is with the value j = 2, the

combination is concerned with sequences so that

it is done by permutation.

4. Conﬁguration cost evaluation Cost evaluation is

an evaluation activity with measurements in the

form of QPS or Response time which is per-

formed to ﬁnd the best performance of all index

conﬁgurations prior to enumeration as in previous

studies (Chaudhuri and Narasayya, 1997).

5. Conﬁguration enumeration. After getting a set of

index conﬁgurations with the best performance,

the enumeration was done as in the research for

sql server (Chaudhuri and Narasayya, 1997), but

the selection of the best enumeration candidates

was not based on the most ﬁt storage but used

the selection factor by selecting the candidates

who most often appeared in the workload based

on other studies (Ameri, 2016). This selection

factor in other references is considered a selec-

tivity factor calculated using the na

ıve probabil-

Implementation of Heuristic Principles for Index Recommendations using Java in the MariaDB Database

ity (Choenni, Blanken and Chang, 1993). To ob-

tain better correlation probabilities this research

enhances the selectivity factor using na

ıve bayes

probability.

6. Generate recommended index query After get-

ting the selected index results, the recommenda-

tion will be generated to be made to create an in-

dex query that is ready to be implemented in the

database.

2.2 Data Sample

The data used in the following research is Monitor-

dom and Adventureworks 2012 which represent two

different database types, namely a database with a

simple table structure condition and an existing index

only as a foreign key, and other database conditions

that have a complex table structure with a number of

tables of 70 and more than one supporting index for

each.

3 METHODOLOGY

Adopting previous heuristic algorithm method of this

research is in Figure 1. Index Recommendation and

more detail explanation is conducted in this section

3.1 Settings

When making the initial encoding settings done with

java and using the library in the form of keneksi

owned by MariaDB with java which can be obtained

on the ofﬁcial MariaDB website openly (MariaDB,

2017). Next is to prepare the page for log and query

input because the database admin can only choose one

of these inputs.

The main purpose of this coding is to be able to

connect to the database and determine the parsing

steps to be carried out and starting from here is the

application of the heuristic algorithm (Chaudhuri and

Narasayya, 1997) from previous studies to the process

of selecting selectivity factors.

space

Figure 1: Index Recommendation

3.2 Parsing and Taking Index

Beforehand

The process of retrieving column and table names

when parsing is done using the JSqlParser library that

is obtained openly on the GitHub forum (Binus uni-

versity, 2016). Meanwhile, to display the previous

index, a query is called which calls the index from

informatics schema from the database.

3.3 Create Conﬁguration Index and

Selecting the Performance of the

Best Conﬁguration

ow to make conﬁguration is to combine several col-

umn names that appear and the possibility of the index

that occurs per conﬁguration can be calculated with

the following equation (1). Number of Conﬁguration

(Raghu and Gehrke, 2004):

= 1

(n! − j!)

(1)

Where n is the number of columns that are index

candidates and j is the number of different columns

that you want to index. Due to the combination of

index conﬁgurations that can be very numerous, as

in the study of chauduri (Chaudhuri and Narasayya,

1997) that the most effective number of column com-

binations in an index is 2 so j = 2. Cost Evalua-

tion itself is calculated based on Response time and

throughput in the form of Queries per Second.

3.4 Conﬁguration Enumeration

The enumeration conﬁguration that is dividing the in-

dex candidates from the index conﬁguration into units

CONRIST 2019 - International Conferences on Information System and Technology

and selecting candidate units using the na

ıve bayes

method as in equation (2). Selectivity factor:

S f (I j|S) =

P(S|I j)P(I j)

P(S)

(2)

Where S is the number of query statements con-

tained in the workload and Ij is the probability of

the index candidate appearing in the workload against

other index candidates. Index candidates with the

highest Sf will be the index to be recommended

(Ameri, 2016). But beforehand it was determined in

advance the type of cluster or non-clustered it.

3.5 Create Index Recommendation

Query

Last is to display the index recommendation in the

form of a query that has been made based on the re-

sults of the index recommendation. In addition to

displaying the recommended index, this section will

show the results of increasing the percentage of per-

formance by calculating in equation (3). QPS Eval-

uation and equation (4). Response time Evaluation.

The two assessments of improvement are different be-

cause for QPS the more the better and the less the Re-

sponse time the better.

QPS =

QPS(rec) − QP(initial)

QPS(initial)x100%

(3)

Responsetime(RT ) =

RT (rec)− RT (initial)

RT (initial)x100%

(4)

4 RESULT AND DISCUSSION

4.1 Implementation Environment

Implementation and result of this research depended

on the environment since performance was inﬂu-

enced by hardware. This research was developed

and implemented in the environment type Sony

SVF14319SGB, Processor Intel Core i5, RAM 8GB,

Hard Disk Drive 1000G. And software environment

in this research is MariaDB 10.1.19 as Database and

NetBeans IDE 8.2 for development process of appli-

cation.

4.2 Monitordom Result

The index recommendation system uses response

time, each query executed is iterated 50 times to get

the stability of the average response time. Stability is

tested by testing several times with the same environ-

mental conditions. With 50 iterations, the system can

provide consistent recommendations. Result of re-

sponse time increase in average up to 30.84%. Ressult

from monitordom in Figure 2 shows using this rec-

ommendation index, the responsetime decrease from

average 0.020 to 0.015 second.

Queries in Monitordom are executed in only one

table using similar pattern that is using type ‘SE-

LECT’ and operator like ‘GROUP BY’, ‘ORDER

BY’, ‘DISTINC’, and ‘COUNT’. Result of Query

Persecond (QPS) in this database sample give signiﬁ-

cant number increase after recommendation as shows

in Figure 3.

Figure 2: Responsetime result from Monitordom

Figure 3: QPS Result Monitordom.

4.3 Adventureworks2013 Result

Testing conducted with same method as database

sample before in Adventureworks2012 resulting in-

crease on average response time 1.26% and detail

each test shows in Figure 2. Tests also carried out us-

ing QPS measurements performance increase on av-

erage is 9% in queries that executed in Adventure-

Works2012 that detailed in Figure 5. The queries is

quite different from Monitordom because it has more

complex operator and executed in seven tables that

representing sales and marketing activities. The in-

signiﬁcant increase even decrease performance is due

to AdventureWorks is a database that has an index that

is in accordance with their needs. This is also evi-

denced in the AdventureWorks 2012 Data Dictionary

documentation (sourceforge, 2009) regarding which

indexes are already owned. Performance also found

decreases in one of the response time test because

the new index created does not actually help search

Implementation of Heuristic Principles for Index Recommendations using Java in the MariaDB Database

but burdens database work because the Adventure-

works2012 database already has an index with col-

umn names as recommended by the following soft-

ware.

Figure 4: Responsetime result from AdventureWorks

Figure 5: QPS Result AdventureWorks.

5 CONCLUSIONS

This section explains what the conclusion that was

made from this research and author’s suggestion for

further research. After conducting the implementa-

tion and testing of the research ”Implementation of

Heuristic Principles for Index Recommendations us-

ing Java on MariaDB” using the exact environment

explained, it can be concluded that performance of a

database according to response time using the index

of recommendations from the following studies can

increase by around 1.26% in database that already in-

dexed before and up to 30.84% in database that has

not indexed before. according to QPS can increase

by around 9% in database that already indexed before

and up to four times faster in database that has not

indexed before.

From the two sample databases used for imple-

mentation, it can be concluded that the selection

of indexes with the following heuristic principles

works better on databases that do not yet have in-

dexes supporting operational activities and query type

‘SELECT’ that used operator like ‘’ORDER BY’,

‘GROUB BY’, ‘DISTINC’ and ‘COUNT’.

This study also found that the measurement of the

response value of a query that was executed had the

best value for 50 times. The results of these values

are taken from the results of the experiment for sev-

eral times to get consistent recommendations, so that

with consistent recommendations it can be said that

the average response time has been stable.

Author’s suggestions that can be done for further

research are develop an application by adding features

such as combining QPS and response time assess-

ments for example by using linear regression equa-

tions or other metrics assessments. Pay attention to

the database structure such as the storage engine used

in order to determine recommendations in more detail

such as the type of index cluster or non-cluster.

Implement other algorithms in the selection of in-

dex recommendations such as Genetic algorithm to

get better performance. This research could be ref-

erence to perform index selection algorithms imple-

mentation in other programming languages or other

database objects.

Algorithm improvement also might improve by

applying the conﬁguration of the table name not only

conﬁguring column name in the step of creating can-

didate of index recommendations.

REFERENCES

Ameri, P. (2016). On a self-tuning index recommendation

approach for databases. In 2016 IEEE 32nd Inter-

national Conference on Data Engineering Workshops

(ICDEW), pages 201–205. IEEE.

Ameri, P., Meyer, J., and Streit, A. (2015). On a new ap-

proach to the index selection problem using mining

algorithms. In 2015 IEEE International Conference

on Big Data (Big Data), pages 2801–2810. IEEE.

Binus university, W. (2016). Pengertian methode, class dan

objek dalam oop.

Chaudhuri, S., Datar, M., and Narasayya, V. (2004). Index

selection for databases: A hardness study and a princi-

pled heuristic solution. IEEE Transactions on Knowl-

edge and Data Engineering, 16(11):1313–1323.

Chaudhuri, S. and Narasayya, V. R. (1997). An efﬁ-

cient, cost-driven index selection tool for microsoft sql

server. In VLDB, volume 97, pages 146–155. Citeseer.

Connolly, T. M. and Begg, C. E. (2005). Database systems:

a practical approach to design, implementation, and

management. Pearson Education.

Hermawan, B. (2004). Menguasai java 2 & object oriented

programming. Yogyakarta: Penerbit Andi.

MariaDB (2017). Mariadb connector/j 2.0.3 stable.

CONRIST 2019 - International Conferences on Information System and Technology

Osman, R. and Knottenbelt, W. J. (2012). Database sys-

tem performance evaluation models: A survey. Per-

formance evaluation, 69(10):471–493.

Pedrozo, W. G. and Vaz, M. S. M. G. (2014). A tool for au-

tomatic index selection in database management sys-

tems. In 2014 International Symposium on Computer,

Consumer and Control, pages 1061–1064. IEEE.

Raghu, R. and Gehrke, J. (2004). Sistem manajemen

database edisi 3.

S. Choenni, H. B. o. T. C. (1993). Index selection relational

database.

sourceforge (2009). Adventureworks database for mysql.

Valentin, G., Zuliani, M., Zilio, D. C., Lohman, G., and

Skelley, A. (2000). Db2 advisor: An optimizer smart

enough to recommend its own indexes. In Proceed-

ings of 16th International Conference on Data Engi-

neering (Cat. No. 00CB37073), pages 101–110. IEEE.

Winand, M. (2012). SQL performance explained: ev-

erything developers need to know about SQL perfor-

mance;[covers all major SQL databases]. M. Winand.

Implementation of Heuristic Principles for Index Recommendations using Java in the MariaDB Database