Technical Aspects of XML Format – Case Study

Differences between Saving Data into Element and Attribute

Ondřej Bureš

Faculty of Informatics and Management, University of Hradec Králové, Rokitanského 62, Hradec Králové, Czech Republic

Keywords: XML, Data, Java, PHP, Visual Basic.

Abstract: XML technology is used for data transmission on a daily basis. When huge files are processed, every millisecond spent per node matters. Summed over the whole file, these milliseconds can extend the procedure by minutes or even tens of minutes, which makes it inefficient in terms of both time and cost. The goal of this study is to contrast saving data into elements with saving data into attributes of XML files and to compare the results. To obtain a broad overview, three applications written in different programming languages were tested and their results were compared.

1 INTRODUCTION

XML data transmission consists of several steps. It starts with creating the data, which is mostly exported directly from a relational database. The speed of this step is limited only by the actual database settings. The next step is sending or downloading the data file from the exporter to the importer. The time spent on this step is determined by the connection speed and therefore by the network settings. At the final destination, the file is processed by an XML parser and then either stored in a database again or passed directly to the frontend of a web portal.

Let us focus on data parsing. Assuming that the XML is valid (Grijzenhout and Marx, 2013), it can contain a practically unlimited number of nodes, so a single millisecond per node can add up to a considerable delay during file processing. For a file with 60 thousand nodes, one millisecond per node adds up to a one-minute delay (60 000 × 1 ms = 60 s). With a couple of files of this size, simple arithmetic shows how large the total parsing delay would become.
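The estimate above is simple multiplication, and the sketch below only makes it explicit. It assumes the extra cost is exactly the same for every node and every file; the class and method names are hypothetical.

```java
// Hypothetical helper: estimates the cumulative parsing delay caused by a
// fixed extra cost per node, assuming the cost is identical for every node
// and every file.
final class DelayEstimate {

    static double delaySeconds(long nodesPerFile, double msPerNode, int files) {
        double totalMs = nodesPerFile * msPerNode * files;
        return totalMs / 1000.0;
    }

    public static void main(String[] args) {
        // 60 000 nodes, 1 ms each, a single file
        System.out.println(delaySeconds(60_000, 1.0, 1) + " s");
        // the same per-node overhead across five such files
        System.out.println(delaySeconds(60_000, 1.0, 5) + " s");
    }
}
```

The program prints 60.0 s and 300.0 s: five files of this size already add five minutes.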

This raises a question: what if parsing elements and parsing attributes take different amounts of time, causing a delay when a large XML file containing hundreds of thousands or even millions of nodes is read? Finding such a difference between the two ways of saving data could save considerable resources for companies in which XML data transmission is one of the key processes.

2 TESTING CONDITIONS

To test our assumption, we need to set up an environment with equal conditions; only then can the results be objective. Applications today are written in many different programming languages. Ideally we would test them all, but in this phase of the study we settle for a sample of the most popular and most widely used languages.

2.1 Programming Languages

The TIOBE programming community publishes monthly statistics ranking the most popular programming languages (TIOBE Software BV, 2014). These statistics are based on the number of skilled engineers worldwide, courses and third-party vendors. The ratings are calculated using popular search engines and websites such as Google, Bing, Yahoo!, Wikipedia and many others.

The index is a ranking neither of the best programming languages nor of the languages in which the most lines of code have been written. Its main purpose is to give programmers an overview with which they can check whether their skills are still up to date. Despite this, we use the index to determine which languages are most commonly used for creating applications.


Bureš O.
Technical Aspects of XML Format – Case Study - Differences between Saving Data into Element and Attribute.
DOI: 10.5220/0005154503540359
In Proceedings of the International Conference on Knowledge Management and Information Sharing (KMIS-2014), pages 354-359
ISBN: 978-989-758-050-5
© 2014 SCITEPRESS (Science and Technology Publications, Lda.)

Table 1: TIOBE index for June 2014 (http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html).

#    Language             Rating
1    C                    16.191%
2    Java                 16.113%
3    Objective-C          10.934%
4    C++                   6.425%
5    C#                    3.944%
6    (Visual) Basic        3.736%
7    PHP                   2.848%
8    Python                2.710%
9    JavaScript            2.000%
10   Visual Basic .NET     1.914%

Of the programming languages listed above, this study tests applications written in three: Java, Visual Basic and PHP. Unfortunately, C and its derivatives are not included, but for the needs of this case study this is not critical.

Each programming language has different requirements for running locally on a machine with the Windows operating system.

To compile and run a Java program, the Java Development Kit (JDK) must be installed. This package is available directly from the Oracle website.

Visual Basic requires the Internet Information Services server (IIS), which is usually part of the default Windows installation. The user only needs to enable this service as a Windows feature in the Control Panel.

PHP requires an Apache server to be installed. Several packages avoid the whole process of installing and configuring the server manually; one of them is EasyPHP, which was used during the tests.

2.2 Testing XML File

To obtain comparable values, elements and attributes must be given the same starting conditions. First, the files used for testing elements and for testing attributes must be the same size; only then do we avoid time differences caused by loading the file.

The next important factor is string length: we must avoid time differences caused by loading strings of different lengths.

We achieve both objectives by using a single file that contains both elements and attributes, where each row holds the same value in the element content and in the attribute:

<item param="32781.00">32781.00</item>

The values are generated randomly to avoid any caching effects. To keep the file fully standard-conforming, we also include a proper XML declaration at the beginning of the file:

<?xml version="1.0" encoding="utf-8" ?>
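A test file of this shape can be produced with a short generator. The sketch below is an assumption about how such a file might be created, not the author's script: it writes a root element named root (matching the /root/item paths used by the test applications) and stores each random value, formatted to two decimal places, both as the element content and in the param attribute. The class name, the file name export_100k.xml, the value range and the seed are all hypothetical.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Random;

// Hypothetical generator for a test file with identical element and attribute values.
final class TestFileGenerator {

    // Builds one record holding the same value in the element and in the attribute.
    static String item(double value) {
        String v = String.format(Locale.ROOT, "%.2f", value);
        return "<item param=\"" + v + "\">" + v + "</item>";
    }

    static void write(Path target, int records, long seed) throws IOException {
        Random random = new Random(seed);
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(target, StandardCharsets.UTF_8))) {
            out.println("<?xml version=\"1.0\" encoding=\"utf-8\" ?>");
            out.println("<root>");
            for (int i = 0; i < records; i++) {
                // random value in [0, 100000); the range is an illustrative choice
                out.println(item(random.nextDouble() * 100_000));
            }
            out.println("</root>");
        }
    }

    public static void main(String[] args) throws IOException {
        write(Paths.get("export_100k.xml"), 100_000, 42L);
    }
}
```

Writing one record per line also keeps values of equal length in both positions, which is the property the comparison relies on.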

Next, we want to find out whether file size matters during the whole process. Therefore, three different files were created: the first contains one hundred thousand records, the second five hundred thousand and the third one million records.

Table 2: File sizes.

Records      File size (bytes)
100 000           3 898 783
500 000          19 206 579
1 000 000        38 495 351

2.3 Testing Environment

Finally, all three languages were tested on the same hardware; only then can the applications be compared with each other. The computer used has a 2 GHz dual-core Intel processor and 2 GB of memory and runs Windows 7 Professional.

3 TESTING APPLICATIONS

All programming languages selected for testing the differences between parsing elements and attributes include an XML parser by default, so there is no need to install any additional libraries.

3.1 Application in PHP

In its default settings, PHP limits script memory to only 128 megabytes, which is not enough for processing a file containing one million records. Therefore, we raise this limit to 1024 megabytes with the designated call directly inside the application:

ini_set("memory_limit", "1024M");

TechnicalAspectsofXMLFormat-CaseStudy-DifferencesbetweenSavingDataintoElementandAttribute

355

PHP has a built-in XML extension called SimpleXML, and the source code for processing elements in the file is as follows:

$source = simplexml_load_file('export_1m.xml');
$items = $source->xpath("/root/item");
$array = array();
$i = 0;
foreach ($items as $item) {
    // casting the element to string reads its text content
    $array[$i] = (string)$item;
    $i++;
}

Every value is stored in an array to make sure that the node is actually processed. The script for parsing data from the param attribute looks very similar:

$source = simplexml_load_file('export_1m.xml');
$items = $source->xpath("/root/item");
$array = array();
$i = 0;
foreach ($items as $item) {
    // array access on the element reads the param attribute
    $array[$i] = (string)$item['param'];
    $i++;
}

The start and end times of the application are tracked with the microtime() function, which is called before loading the XML file and after processing the last node.

3.2 Application in Java

According to the TIOBE index, Java was the most popular language worldwide until 2012. By default, Java used 256 megabytes of heap memory in our environment, which is sufficient for only around 150 thousand records. Therefore, the memory limit also has to be raised to 1024 megabytes with the -Xmx parameter so that memory does not limit our tests.
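Whether the -Xmx setting took effect can be checked from inside the program. A minimal sketch, assuming it is launched as, for example, java -Xmx1024m MaxHeapCheck (the class name is hypothetical). Runtime.maxMemory() reports the maximum amount of memory the JVM will attempt to use.

```java
// Hypothetical check of the heap limit configured with -Xmx.
final class MaxHeapCheck {

    static long maxHeapMegabytes() {
        // maxMemory() returns Long.MAX_VALUE if there is no inherent limit
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Maximum heap: " + maxHeapMegabytes() + " MB");
    }
}
```

With -Xmx1024m the printed value should be close to 1024; it can be somewhat lower, because the exact figure depends on the JVM and the garbage collector in use.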

For XML processing, Java offers by default the XPath API (javax.xml.xpath), which has been part of the standard Java platform since version 5. Thanks to Java's popularity, a wide variety of libraries that extend and improve work with XML files is also available from the community.

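The source code of the application processing elements is as follows:

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

XPath xpath = XPathFactory.newInstance().newXPath();
// select the text nodes of all item elements
NodeList nodes = (NodeList) xpath.evaluate("/root/item/text()",
        new InputSource("export_1m.xml"),
        XPathConstants.NODESET);
int size = nodes.getLength();
String[] valueArr = new String[size];
for (int i = 0; i < size; i++) {
    valueArr[i] = nodes.item(i).getNodeValue();
}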

With a slight modification, we get the source code of the application that parses the attributes of the given XML file:

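import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

XPath xpath = XPathFactory.newInstance().newXPath();
// select the param attribute nodes of all item elements
NodeList nodes = (NodeList) xpath.evaluate("/root/item/@param",
        new InputSource("export_1m.xml"),
        XPathConstants.NODESET);
int size = nodes.getLength();
String[] valueArr = new String[size];
for (int i = 0; i < size; i++) {
    // for an attribute node, getNodeValue() returns the attribute value
    valueArr[i] = nodes.item(i).getNodeValue();
}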

In this case, the running time of the application is measured with System.nanoTime(), which is again called twice: once before the file is loaded and the nodes are parsed, and once after the whole process is finished.
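The measurement can be sketched as a self-contained program. This is an illustration rather than the author's exact test code: instead of reading export_1m.xml, it parses a small inline document with the same structure, and the class and method names are hypothetical. It applies the two XPath expressions used in the Java listings (/root/item/text() and /root/item/@param) and brackets each evaluation with System.nanoTime().

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical timing harness mirroring the element and attribute listings.
final class XPathTiming {

    // Evaluates an XPath expression against an XML string and collects node values.
    static String[] extract(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression,
                new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        String[] values = new String[nodes.getLength()];
        for (int i = 0; i < values.length; i++) {
            values[i] = nodes.item(i).getNodeValue();
        }
        return values;
    }

    // Wraps one extraction in two System.nanoTime() calls; returns milliseconds.
    static double timeMillis(String xml, String expression) throws XPathExpressionException {
        long start = System.nanoTime();
        extract(xml, expression);
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\" ?>\n"
                + "<root>\n"
                + "<item param=\"32781.00\">32781.00</item>\n"
                + "<item param=\"104.50\">104.50</item>\n"
                + "</root>";
        System.out.printf("elements:   %.3f ms%n", timeMillis(xml, "/root/item/text()"));
        System.out.printf("attributes: %.3f ms%n", timeMillis(xml, "/root/item/@param"));
    }
}
```

On such a tiny input the timings mostly reflect JVM start-up and class loading, so they are not comparable to the tables in the results section. For a real measurement, the inline string would be replaced by the generated test file.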

3.3 Application in Visual Basic

The youngest of the tested languages is Visual Basic, developed by Microsoft. According to the TIOBE index, this language has been losing popularity since 2010.

Unlike the two languages mentioned above, Visual Basic has no memory limit by default, so no initial configuration changes are needed. The application used for parsing elements has the following source code:

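' xml is the XML object of the test application
' (its declaration is not part of the original listing)
xml.LoadXmlFile("export_1m.xml")
Dim item = xml.FirstChild()
While item IsNot Nothing
    ' read the text content of the element
    Dim value As String = item.Content
    item = item.NextSibling()
End While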

Again, with a slight modification, we get an application that parses attributes from the XML file:

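' xml is the XML object of the test application
' (its declaration is not part of the original listing)
xml.LoadXmlFile("export_1m.xml")
Dim item = xml.FirstChild()
While item IsNot Nothing
    ' read the value of the param attribute
    Dim value As String = item.GetAttrValue("param")
    item = item.NextSibling()
End While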

KMIS2014-InternationalConferenceonKnowledgeManagementandInformationSharing

356

In Visual Basic, the time required for code execution is tracked with Now().Ticks (one tick is 100 nanoseconds).
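The three applications read timers with different units. A small sketch of converting them to the milliseconds reported in the results, assuming the PHP application used microtime(true), which returns seconds as a float (without the argument, microtime() returns a string). System.nanoTime() counts nanoseconds and a .NET tick is 100 nanoseconds. The helper names are hypothetical.

```java
// Hypothetical helpers converting raw timer differences of the three
// test applications into milliseconds.
final class TimerUnits {

    // Java: difference of two System.nanoTime() readings, in nanoseconds
    static double nanosToMillis(long nanos) {
        return nanos / 1_000_000.0;
    }

    // PHP: difference of two microtime(true) readings, in seconds
    static double secondsToMillis(double seconds) {
        return seconds * 1000.0;
    }

    // Visual Basic (.NET): difference of two Ticks values; 1 tick = 100 ns
    static double ticksToMillis(long ticks) {
        return ticks / 10_000.0;
    }

    public static void main(String[] args) {
        System.out.println(nanosToMillis(397_700_000L));
        System.out.println(secondsToMillis(0.3977));
        System.out.println(ticksToMillis(3_977_000L));
    }
}
```

All three calls in main describe the same interval of roughly 397.7 ms; floating-point rounding may make the printed digits differ slightly.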

4 RESULTS

When the applications were run in testing mode, each returned a different time on every attempt to process the XML file. For this reason, every application was run 10 times for each case, i.e. once set of runs for parsing elements and one for parsing attributes.
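Repeating a measurement and averaging it can be sketched as follows, assuming the reported values are plain averages of all 10 runs (the paper does not say whether any warm-up runs were discarded). The class name and the placeholder workload are hypothetical; in a real test, the Runnable would perform one parse of the XML file.

```java
// Hypothetical harness: runs a task several times and returns the average
// elapsed wall-clock time in milliseconds.
final class RepeatedMeasurement {

    static double averageMillis(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        double totalMs = 0;
        for (int run = 0; run < runs; run++) {
            long start = System.nanoTime();
            task.run();
            totalMs += (System.nanoTime() - start) / 1_000_000.0;
        }
        return totalMs / runs;
    }

    public static void main(String[] args) {
        // placeholder workload standing in for one parse of the test file
        Runnable parse = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        };
        System.out.printf("average of 10 runs: %.3f ms%n", averageMillis(parse, 10));
    }
}
```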

4.1 Results of PHP Application

The first language already gave interesting results. Values are in milliseconds:

Table 3: Values collected from the PHP application.

Records      Element (ms)   Attribute (ms)
100 000           397.7            468.3
500 000          1900.4           2278.5
1 000 000        3800.0           4633.6

At first glance, processing one hundred thousand elements is about 70 milliseconds faster than processing the same number of attributes. For a better overview, the data collected for all three file sizes were normalized to one hundred thousand nodes and plotted in a graph.
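The normalization amounts to rescaling each measured time to a common basis of 100 000 nodes. A minimal sketch applied to Table 3; the class and method names are hypothetical.

```java
// Hypothetical helper: rescales a measured time to a basis of 100 000 nodes.
final class Normalization {

    static final int BASIS = 100_000;

    static double per100k(double measuredMs, int nodes) {
        return measuredMs * BASIS / nodes;
    }

    public static void main(String[] args) {
        int[] nodes = {100_000, 500_000, 1_000_000};
        double[] element = {397.7, 1900.4, 3800.0};     // Table 3
        double[] attribute = {468.3, 2278.5, 4633.6};   // Table 3
        for (int i = 0; i < nodes.length; i++) {
            System.out.printf("%d nodes: element %.2f ms, attribute %.2f ms%n",
                    nodes[i], per100k(element[i], nodes[i]), per100k(attribute[i], nodes[i]));
        }
    }
}
```

For Table 3 this gives roughly 380 to 398 ms per 100 000 elements and 456 to 468 ms per 100 000 attributes.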

Figure 1: Graph of values collected from the PHP application. The y axis shows the duration (ms) of processing 100 000 nodes; the x axis shows the number of nodes in the tested file.

In PHP, the normalized results are approximately the same regardless of the number of nodes in the file, but processing an element is faster than processing an attribute. The table shows that for one million nodes the difference is 833.6 milliseconds, almost one second.

4.2 Results of Java Application

Testing in Java produced surprising results. First, let us look at the table of averaged values.

Table 4: Values collected from the Java application.

Records      Element (ms)   Attribute (ms)
100 000          2211.3           2346.8
500 000         30044.2          30362
1 000 000      114047.3         111374.8

The time required for processing one million records is extreme: almost two minutes. Even processing 100 thousand nodes takes much longer than in PHP (about 5.6 times longer for elements: 2211.3 ms versus 397.7 ms).

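This is caused by the garbage collector being invoked constantly, even when the memory limit is set higher than the default. If we tested only about two thousand nodes, we would get almost the same time per node as for one hundred thousand, but with a greater number of nodes the time grows much faster than the node count: five times as many nodes (500 000) take about 13.6 times longer, and doubling again to one million takes about 3.8 times longer.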
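One way to quantify how fast the time grows is to estimate the exponent b in t ∝ n^b from two neighbouring rows of Table 4, using b = ln(t2/t1) / ln(n2/n1). This estimate is an added illustration, not part of the original analysis: b = 1 would mean linear growth and b = 2 quadratic growth. The class and method names are hypothetical.

```java
// Hypothetical helper: estimates the growth exponent b in t ~ n^b from
// two measurements (n1, t1) and (n2, t2).
final class GrowthExponent {

    static double exponent(double n1, double t1, double n2, double t2) {
        return Math.log(t2 / t1) / Math.log(n2 / n1);
    }

    public static void main(String[] args) {
        // Table 4, element column (milliseconds)
        System.out.printf("100k -> 500k: b = %.2f%n",
                exponent(100_000, 2211.3, 500_000, 30044.2));
        System.out.printf("500k -> 1M:   b = %.2f%n",
                exponent(500_000, 30044.2, 1_000_000, 114047.3));
    }
}
```

For the element column this gives b ≈ 1.62 and b ≈ 1.92. The time therefore grows considerably faster than the number of nodes.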

As mentioned when introducing the Java application, the popularity of this language means that there are alternatives for XML parsing. One of the most frequently recommended is the VTD-XML library. To be fair to all programming languages and to obtain the best and most reliable results, we also ran the tests using this library, with the following results:

Table 5: Values collected from the Java application using the VTD-XML library.

Records      Element (ms)   Attribute (ms)
100 000           460.9            455.7
500 000          1377.2           1395.9
1 000 000        2795.2           2555.1

The processing time improved considerably. Again, we normalized the collected values to 100 thousand nodes and plotted them in a graph for a better overview.

Interestingly, the per-node processing time drops markedly between 100 thousand and 500 thousand nodes (from 460.9 to 275.44 ms per 100 thousand elements). This suggests that the Java application spends significant time just opening and loading the file. This overhead is roughly constant, so the more nodes the file has, the smaller its share per node.
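The constant-overhead explanation can be made concrete with a simple model, t(n) = c + k·n, where c is the fixed cost of opening and loading the file and k is the cost per node. This linear model is an assumption added for illustration; fitting it to two rows of Table 5 (element column) gives a rough estimate of c. The class and method names are hypothetical.

```java
// Hypothetical two-point fit of the linear model t(n) = c + k * n.
final class OverheadFit {

    // Returns {c, k}: fixed overhead in ms and cost per node in ms.
    static double[] fit(double n1, double t1, double n2, double t2) {
        double k = (t2 - t1) / (n2 - n1);
        double c = t1 - k * n1;
        return new double[] {c, k};
    }

    public static void main(String[] args) {
        // Table 5, element column: 100 000 nodes -> 460.9 ms, 500 000 nodes -> 1377.2 ms
        double[] ck = fit(100_000, 460.9, 500_000, 1377.2);
        System.out.printf("fixed overhead c = %.1f ms, per-node cost k = %.5f ms%n", ck[0], ck[1]);
    }
}
```

The fit gives about 232 ms of fixed overhead. The 1 000 000-node row (2795.2 ms) lies above the roughly 2523 ms this line predicts, so the model is only a rough approximation.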

Unlike in the PHP application, in Java there is no large difference between the time required for processing elements and for processing attributes. At 500 thousand nodes, the times are almost the same for both (275.44 milliseconds per hundred thousand elements and 279.18 milliseconds per hundred thousand attributes).

Figure 2: Graph of values collected from the Java application using the VTD-XML library. The y axis shows the duration (ms) of processing 100 000 nodes; the x axis shows the number of nodes in the tested file.

4.3 Results of Visual Basic Application

This programming language also produced notable results.

Table 6: Values collected from the Visual Basic application.

Records      Element (ms)   Attribute (ms)
100 000          1161.7           1684.2
500 000          5820.0           8216.0
1 000 000       10398.6          15570.0

Even before looking at the graph, we can see that processing XML in Visual Basic is slower than in PHP and in Java with the VTD-XML library, although still considerably faster than in Java with its default XPath API. Now let us look at the graph.

Figure 3: Graph of values collected from the Visual Basic application. The y axis shows the duration (ms) of processing 100 000 nodes; the x axis shows the number of nodes in the tested file.

Here it is obvious that processing an element is much faster than processing an attribute, regardless of the number of nodes in the file. As in the Java application, the normalized values generally decrease as the number of nodes in the XML file grows. This again suggests that the Visual Basic application also needs some time to open and load the whole file, although the effect is less pronounced than in the Java application.

5 CONCLUSION

As we found, there is almost no difference in Java and only a minor difference in PHP in favour of elements. With these two languages, we need not worry much about whether to save data into elements or attributes.

On the other hand, the time difference in Visual Basic is obvious. Processing attributes takes much more time than processing elements: the difference amounts to around one third of the attribute processing time regardless of the node count. This can lead to a large delay when processing a big file or many smaller files.
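The "around one third" figure can be checked against Table 6 by expressing the difference as a share of the attribute processing time. A minimal sketch; the class and method names are hypothetical.

```java
// Hypothetical helper: share of the attribute processing time that would be
// saved if the same data were read from elements instead.
final class AttributeOverhead {

    static double savedShare(double elementMs, double attributeMs) {
        return (attributeMs - elementMs) / attributeMs;
    }

    public static void main(String[] args) {
        int[] nodes = {100_000, 500_000, 1_000_000};
        double[] element = {1161.7, 5820.0, 10398.6};     // Table 6
        double[] attribute = {1684.2, 8216.0, 15570.0};   // Table 6
        for (int i = 0; i < nodes.length; i++) {
            System.out.printf("%d nodes: %.1f %%%n", nodes[i],
                    100 * savedShare(element[i], attribute[i]));
        }
    }
}
```

For the three file sizes the share is about 31.0 %, 29.2 % and 33.2 %.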

If we were exporters and were completely sure which language the data importer uses, we could take the per-language results of this study into consideration. However, since our data importers are very likely to use different technologies, it is best to provide them with most of the data saved in elements. This can lead to faster data processing and therefore to better cooperation between companies, at least in the area of data transmission.

It would be very interesting to carry out similar research for other languages as well, at least for the most popular variants of the C language, to find out whether the results in Visual Basic are just an anomaly.

ACKNOWLEDGEMENT

This work was supported by the project No. CZ.1.07/2.2.00/28.0327 Innovation and support of doctoral study program (INDOP), financed from EU and Czech Republic funds.

REFERENCES

Bureš, O. (2014) ‘Comparing suggested approaches for XML design with current situation in Czech Republic (case study)’, Proceedings of the 23rd International Business Information Management Association 2014, pp. 481-487.

Department of Interior of Czech Republic (2009) The methodology for creating XML schemas in information systems of public administration. Available at: http://www.mvcr.cz/clanek/metodika-tvorby-xml-schemat-v-oblasti-informacnich-systemu-verejne-spravy.aspx

Grijzenhout, S. and Marx, M. (2013) ‘The quality of the XML Web’, Journal of Web Semantics, 19, pp. 59-68.

Ogbuji, U. (2004) ‘Principles of XML design: When to use elements versus attributes’. Available at: http://www.ibm.com/developerworks/xml/library/x-eleatt/index.html

TIOBE Software BV (2014) TIOBE Index for June 2014. Available at: http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html (Accessed: 20 June 2014).

Walmsley, P. (2012) Definitive XML Schema. Prentice Hall PTR.

TechnicalAspectsofXMLFormat-CaseStudy-DifferencesbetweenSavingDataintoElementandAttribute

359