AGGREGATED ACCOUNTING OF MEMORY USAGE IN JAVA

Paul Bouché

Nokia Siemens Networks, An den Treptowers 1, Berlin, Germany

Martin von Löwis

Hasso-Plattner-Institute, Potsdam, Germany

Peter Tröger

Humboldt University, Berlin, Germany

Keywords:

Software engineering, Algorithms and data structures, Software testing and maintenance.

Abstract:

Profiling of application memory consumption typically involves a trade-off between overhead and accuracy. We present a new approach for memory usage accounting which has a comparatively low overhead and still provides meaningful results. Our approach considers the structure of modern applications by introducing the notion of memory accounts where application modules get “charged” for memory allocations. We have applied this approach to Java application servers and discuss important implementation aspects as well as experimental results of our prototype.

1 INTRODUCTION

Memory consumption is often a bottleneck of large

object-oriented applications. System operators can

often merely observe the total amount of memory

consumed and have to cope with ever-increasing de-

mand for more main memory. Various layers of ab-

stractions, such as communication middleware, XML

processing, and persistency layers, contribute to the

system’s intricacy from a memory management point

of view: users of upper layers often have no knowl-

edge what amount of memory is allocated by what

speciﬁc operation. We have primarily experimented

with Java application servers, which are a common

source of these issues today. Thus, we will draw our

examples from that domain, although we believe that

the results are valid for any Java application, and can

be applied to other object-oriented systems as well.

In order to deal with the complexity of the appli-

cations, analysis of resource consumption is an im-

portant issue. Performance indices must be related to

speciﬁc parts of the applications in order to identify

relevant points for optimization. The corresponding tools are typically called profilers; memory profilers are one particular category of such tools.

Many memory profilers today have one major flaw: they struggle to correlate a specific memory allocation with the responsible piece of source code. In

cases where memory proﬁlers are able to report such

information, they usually cause a very high runtime

and memory overhead due to the continuous storage

of stack trace information for each object allocation.

Large software is usually organized into modules.

A module encapsulates a set of tasks sharing a com-

mon goal such as SOAP message processing, servlet

containment or business logic implementation. In

Java, modules can be identiﬁed and structured with

varying degrees of abstraction, e.g. on the class level,

the package level, or the Java archive (jar) level. At-

tributing resource allocation costs to the correct mod-

ule of an application is in all cases an important task,

because the allocation of one object may cause subse-

quent allocations of other objects.

Furthermore, calling a method a() in module A may cause the execution of another method b() in module B. Both methods might cause object allocation and therefore memory allocation to be accounted. It might be “unfair” to attribute the cost to the initial call since it originated in another module.

As an example, consider an application server


Bouché P., von Löwis M. and Tröger P. (2009).

AGGREGATED ACCOUNTING OF MEMORY USAGE IN JAVA.

In Proceedings of the 4th International Conference on Software and Data Technologies, pages 177-185

DOI: 10.5220/0002253701770185

 SciTePress

hosting a Web service implementation. This appli-

cation uses a third party library for XML processing.

If all memory allocation costs that are caused by the

XML processing are charged to the server implemen-

tation, the usage of a different XML processing li-

brary might wrongly indicate a better or worse per-

formance of the server implementation. A misuse of

the XML parser library implementation leading to a

memory leak should not be accounted to the library

itself, but to the originating source of the allocation

request.

Therefore we deﬁned the principal-agent rela-

tionship as one module “asking” another module di-

rectly or indirectly to perform a memory allocation,

whereas traditional proﬁlers capture the full stack

trace for an allocation which contains information

about principal-agent relationships. We introduce a

novel scheme to perform the relevant accounting of

memory allocation capturing only principal-agent re-

lationships without unnecessary information and un-

necessary effort spent.

The rest of this paper is structured as follows. In

the next section we give a short explanation about

the problem at hand. The following section 3 intro-

duces the principal-agent relationship. In section 4

we present the concept of the memory account in de-

tail and give reasons for its necessity and advantages.

Section 5 covers the implementation strategy of our

memory proﬁler. It is followed by section 6 where an

experimental evaluation using standard benchmarks

of the proﬁler is portrayed. The paper concludes with

a discussion of related work, section 7, and closing

remarks in the last section 8. Source code examples

can be found in the appendix.

2 STATEMENT OF THE

PROBLEM

To perform an analysis of memory consumption (i.e.

memory proﬁling) in an object-oriented language, it

is necessary to keep track of object creation and de-

struction, and possibly also to keep track of how ob-

ject references pass through the system. Memory pro-

ﬁling can focus on various aspects, such as frequency

of allocations, redundant allocations, etc.

We focus on “garbage” objects, i.e. objects that are no longer used. In Java, many of these objects will be automatically released by the garbage

collector. Unfortunately, the garbage collector can-

not determine whether objects are unused, but only

whether they are unreferenced. A common pitfall in

Java and similar systems is that objects remain ref-

erenced even though the software developer believes

that the last reference to the object should have been

released. Even in cases where such references get

released eventually, they may consume a signiﬁcant

amount of memory over some period of time.

Our objective is to detect such cases and to help

developers and operators to adjust the system appro-

priately. For this analysis we have to determine three

pieces of information:

• How many objects of what type are still allocated?

• Why had the objects been allocated originally?

• Why are they still referenced?

From this list we only support the ﬁrst two aspects.

We expect that users study the total number of objects

per type, and the amount of memory that these objects

consume, and then compare the numbers with their

expectations. If they ﬁnd that there are more objects

of a certain type than they had expected, they will next

need to ﬁnd out where they came from. Once they

have found out why the objects got allocated in the

ﬁrst place, they can then study why they had not been

released.

It is important to note that the ﬁrst two aspects

in the above list can be represented in an aggregate

manner. For the total number of objects and the to-

tal amount of memory the approach to aggregation is

obvious. For the second question, we found a way of

computing an aggregated number. For the last ques-

tion, aggregated answers are typically not possible: to

ﬁnd out why a speciﬁc object is still referenced, one

needs to ﬁnd the speciﬁc container object (or objects)

that still holds a reference. There are various debugging techniques available to find such “backwards references”; this issue is outside the scope of our research.

3 PRINCIPALS AND AGENTS

To answer the second question, various proﬁling tools

record the complete stack trace at the point of object

allocation (Pauw et al., 1999; Dmitriev, 2003; Pearce

et al., 2006), making it easier to investigate the con-

ditions under which the allocation had originally oc-

curred, even after the methods performing the alloca-

tion have already completed. Of course, recording the

stack trace does not allow one to replay the full system

state at the point of allocation, as access to various

global and instance variables may have contributed

to the parameters of the object allocation; these data

might have changed at a later replay. The fact that

tools often record the call stack indicates an impor-

tant aspect of the problem: To understand an object’s

role, it is often sufﬁcient to know the place in the code

ICSOFT 2009 - 4th International Conference on Software and Data Technologies


where it was created; access to the fields of the object at the point of creation is often not necessary. At the same time, analysis of the existing tools demonstrates that mere recording of the code line containing the new-expression is considered insufficient: developers need to inspect the call stack at allocation time to see which of the callers “actually” caused the allocation to happen.

Definition 1. An agent module is a module that performs the allocation of an object on behalf of another module, the principal module.

Within an application server the principal may be

the server implementation and the agent may be a cer-

tain servlet implementation. The servlet in turn may

be a principal for the XML processing library.

An agent which allocated some object for some

principal might itself act as the principal with respect

to another agent module, in the context of the allo-

cation of another object, so the relationship between

principals and agents is deﬁned in the context of the

allocation of a single object.

Notice that the principal-agent relationship is not

necessarily an instance of an immediate caller-callee

relationship. Instead, there might be several interme-

diary modules which delegate the object creation to

another, until eventually agent code is invoked.

4 MEMORY ACCOUNTS

The principal-agent relationship, as described in the

last section, denotes a situation where one module

commissions another module to perform some task

and the commissioned module allocates additional

memory to perform this task. In order to align mem-

ory consumption to modules, we deﬁne:

Deﬁnition 2. A module is accounted if it is marked

for memory proﬁling during a program run.

Deﬁnition 3. A memory account represents the

amount of all memory allocated within an accounted

module. An object is allocated within a module if the

module is accounted and no other intermediary mod-

ule is accounted.
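Definition 3 can be read as a lookup along the call stack: starting at the allocating module, the first accounted module encountered is the one whose account is charged. The following is a minimal sketch of that reading, not the profiler's actual code; the class name AccountResolution, the method chargedModule and the module names are hypothetical and chosen for illustration only.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Illustrative reading of Definition 3; all names here are hypothetical. */
final class AccountResolution {
    private AccountResolution() {}

    /**
     * Finds the accounted module that is closest to the allocation.
     *
     * @param stackTopFirst modules on the call stack, allocating module first
     * @param accounted     modules marked for memory profiling
     * @return the nearest accounted module, or empty if none is on the stack
     */
    static Optional<String> chargedModule(List<String> stackTopFirst, Set<String> accounted) {
        for (String module : stackTopFirst) {
            if (accounted.contains(module)) {
                return Optional.of(module); // no accounted module lies in between
            }
        }
        return Optional.empty(); // no accounted module on the stack
    }

    public static void main(String[] args) {
        // The XML parser allocates on behalf of the servlet; only the servlet
        // and the server core are accounted in this made-up configuration.
        List<String> stack = List.of("xml-parser", "servlet", "servlet-container", "server-core");
        System.out.println(chargedModule(stack, Set.of("servlet", "server-core")));
    }
}
```

In this example the allocation made inside the unaccounted XML parser is attributed to the servlet, which acts as the principal.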

Imagine the scenario of a Web service request pro-

cessing within a Java EE application server. A SOAP

request is received on a TCP network socket. The

contained message payload is extracted and passed

to the servlet container. The container determines

the responsible servlet implementation and relays the

SOAP request to an instance of this servlet. In order to provide a Java representation of the incoming package, it calls the currently registered XML processor for parsing the message’s contents. The servlet container itself and the application server networking stack also process the message’s XML information, since relevant SOAP header entries might need to be considered. Typical examples are security or routing information.

Footnote: See section 7 for details.

In this particular example, several principal-agent

relationships can be identiﬁed. In all cases, the XML

parser allocated Java objects for the representation of

XML data. The incoming package triggers memory

allocation in the application server, which itself indi-

rectly triggers memory allocation by the servlet im-

plementation.

Different levels of abstraction of memory ac-

counts can be chosen for this example. The network

core and the servlet container can be mapped to one

memory account. Thus, allocations of the whole ap-

plication server implementation and a servlet imple-

mentation can be cleanly separated. This is especially

useful if one wants to detect possible memory leaks in

a servlet implementation, regardless of the XML pro-

cessing library or the application server. Each time

the execution enters the XML processing, all allocations should be attributed to its corresponding memory account. When the execution returns, allocations need to be charged to the previously active memory account, which is determined by the execution context.

In the Java virtual machine, all calls are syn-

chronous unless an exception occurs. In the ideal

case, all memory account states are maintained on a

stack in sync with the Java execution stack. When a

call leaves one module, the memory account needs to

be saved and when the call returns it needs to be re-

stored. Exceptions need to be handled appropriately

as their processing may cause object allocation, e.g.

the exception object itself.

When objects are deallocated by the garbage col-

lector of the Java runtime, the corresponding memory

account needs to be refunded. Therefore, every Java

object must be mappable to the memory account its

allocation cost was charged to.

The next section describes how the basic idea of

memory accounts can be implemented in a Java run-

time environment.

5 IMPLEMENTATION

We have named our memory proﬁler ASGMemProf as

it was created in the context of the Adaptive Service

Grid (ASG) project.


ASGMemProf is based on the notion of Java pack-

ages and classes. These are the primary mechanisms

for modularization of Java code; code that is from a

single author or which fulﬁlls a single function is of-

ten concentrated into a single class or package. As

a consequence, by attributing memory allocation to

classes and packages, we can typically identify the

“culprit” for a memory allocation: the software func-

tion that caused the object to be allocated.

Possibly contrary to intuition, it is not necessary

to record the exact line of code within the principal

module that caused the memory allocation. When

the developer ﬁnds that a certain package has caused

the allocation of a number of objects of class X, the

developer will often know what part of the package

(class/method/line) caused the allocation. Only when

a class has many instances that are allocated in many

places (e.g. String objects), might the developer want

to know where exactly the object has been allocated.

However, such objects often form a part of a larger

structure, so that the developer would want to track

the root object of the structure instead.

Consequently, a memory account is identified by a package name or by a package pattern. We implemented pattern matching which supports a wildcard (*), allowing the definition of different memory accounts for different subpackages. A single package name without a wildcard, e.g. a.b, will create a memory account for a.b and will attribute all costs to classes within a.b. The pattern a.b.* will match all classes in this package (direct match) and all subpackages (wildcard match). Therefore, Definition 3 means that an allocation is charged to the memory account whose package pattern matches the package of the method closest to the allocation site in the allocation’s stack trace. Before profiling, memory accounts need to be defined by appropriate package names or package name patterns.
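The matching rules described above (a plain package name matches only that package, a trailing wildcard additionally matches all subpackages) can be sketched as follows. This is an illustration under those stated rules only; the class PackagePattern is a hypothetical name and not part of ASGMemProf.

```java
/** Hypothetical helper illustrating direct and wildcard package matching. */
final class PackagePattern {
    private final String base;      // package name without a trailing ".*"
    private final boolean wildcard; // true if the pattern ended in ".*"

    PackagePattern(String pattern) {
        this.wildcard = pattern.endsWith(".*");
        this.base = wildcard ? pattern.substring(0, pattern.length() - 2) : pattern;
    }

    /** Returns true if classes of the given package belong to this pattern. */
    boolean matches(String packageName) {
        if (packageName.equals(base)) {
            return true; // direct match
        }
        // Wildcard match: only real subpackages, so "a.bc" does not match "a.b.*".
        return wildcard && packageName.startsWith(base + ".");
    }

    public static void main(String[] args) {
        PackagePattern exact = new PackagePattern("a.b");
        PackagePattern withSubpackages = new PackagePattern("a.b.*");
        System.out.println(exact.matches("a.b.c"));           // subpackage, no wildcard
        System.out.println(withSubpackages.matches("a.b.c")); // subpackage, wildcard
    }
}
```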

As a general implementation strategy we have employed on-the-fly code rewriting (also known as dynamic bytecode instrumentation) based on the Java Virtual Machine Tool Interface (JVM TI, http://java.sun.com/javase/6/docs/platform/jvmti/jvmti.html) and the Java Instrumentation API (http://java.sun.com/javase/6/docs/api/java/lang/instrument/package-summary.html). As a Java class is loaded, it is determined automatically whether and where its bytecode needs to be modified in order to employ our memory accounting.

As we have previously mentioned, the currently active memory account needs to be maintained in sync with the execution stack. For our scheme, only those methods are of interest which reside in (sub-)packages for which a memory account has been defined.

The current memory account is stored in a thread

local variable. Upon method entry it is saved into

an added local variable, updated with the method’s

memory account and restored upon exit. The memory

account a method “belongs to” is determined at load

time, identiﬁed via an integer ID and added as a con-

stant to the method’s class. Methods outside of the de-

ﬁned memory accounts do not alter the current mem-

ory account. Therefore, allocations will be charged

correctly to the memory account closest to the top of

the execution stack.
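The save, update and restore steps described above can be written out by hand to show what an instrumented method might look like at source level. This is a sketch only: the real profiler rewrites bytecode, the classes CurrentAccount and InstrumentedExample and the account IDs are hypothetical, and the use of try/finally to restore the account when an exception propagates is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Holds the currently active memory account of each thread (IDs are hypothetical). */
final class CurrentAccount {
    static final int NONE = 0; // assumed ID meaning "no account active"
    private static final ThreadLocal<Integer> CURRENT = ThreadLocal.withInitial(() -> NONE);

    private CurrentAccount() {}

    static int get() { return CURRENT.get(); }

    /** Activates accountId and returns the previously active account. */
    static int enter(int accountId) {
        int saved = CURRENT.get();
        CURRENT.set(accountId);
        return saved;
    }

    /** Restores the account that was active before the matching enter(). */
    static void exit(int saved) { CURRENT.set(saved); }
}

/** Hand-written stand-in for methods after instrumentation. */
final class InstrumentedExample {
    static final int SERVLET_ACCOUNT = 1;
    static final int XML_ACCOUNT = 2;

    /** Records which account each simulated allocation was charged to. */
    static final List<Integer> chargedTo = new ArrayList<>();

    static void servletMethod() {
        int saved = CurrentAccount.enter(SERVLET_ACCOUNT); // kept in an added local variable
        try {
            allocate();  // charged to the servlet account
            xmlMethod(); // switches to the XML account and back
            allocate();  // charged to the servlet account again
        } finally {
            CurrentAccount.exit(saved); // restored on normal and exceptional exit
        }
    }

    static void xmlMethod() {
        int saved = CurrentAccount.enter(XML_ACCOUNT);
        try {
            helperOutsideAnyAccount();
        } finally {
            CurrentAccount.exit(saved);
        }
    }

    /** Not instrumented: leaves the current account untouched. */
    static void helperOutsideAnyAccount() { allocate(); }

    /** Simulated allocation hook: notes the account that is active right now. */
    static void allocate() { chargedTo.add(CurrentAccount.get()); }

    public static void main(String[] args) {
        servletMethod();
        System.out.println(chargedTo);
    }
}
```

The allocation made in the helper is charged to the XML account because that account is the one closest to the top of the execution stack at that moment.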

We have implemented our own thread local storage which uses the thread ID (added in Java 1.5.0) as an index into an array of ThreadInfo objects. The size of that array is static and currently 1,000,000. This should be sufficient even though Sun’s JVM assigns thread IDs as consecutive numbers.

This implementation strategy avoids the expensive

capture of a stack trace upon each allocation event.

The combination of a thread local variable and lo-

cal variables implicitly constitutes a stack where the

thread local variable always shows the top element of

the stack and the elements below are entailed within

the normal execution stack frames of the VM.
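The thread-ID-indexed storage described above might look roughly like the following sketch. Only the idea of indexing a fixed-size array of ThreadInfo objects by thread ID comes from the text; the field currentAccount, the class ThreadInfoTable, the lazy slot creation and the bounds check are assumptions made for illustration.

```java
/** Per-thread profiler state; the single field shown here is an assumption. */
final class ThreadInfo {
    int currentAccount; // ID of the currently active memory account
}

/** Sketch of a thread-local store indexed by thread ID instead of java.lang.ThreadLocal. */
final class ThreadInfoTable {
    private final ThreadInfo[] slots;

    ThreadInfoTable(int capacity) {
        this.slots = new ThreadInfo[capacity];
    }

    /** Returns the slot of the calling thread, creating it on first use. */
    ThreadInfo forCurrentThread() {
        long id = Thread.currentThread().getId();
        if (id >= slots.length) {
            // With consecutively assigned IDs this happens once enough threads were created.
            throw new IllegalStateException("thread ID " + id + " exceeds table capacity");
        }
        int index = (int) id;
        ThreadInfo info = slots[index];
        if (info == null) {
            // Each thread only ever touches its own slot, so no locking is used here.
            info = new ThreadInfo();
            slots[index] = info;
        }
        return info;
    }

    public static void main(String[] args) {
        ThreadInfoTable table = new ThreadInfoTable(1_000_000);
        table.forCurrentThread().currentAccount = 1;
        System.out.println(table.forCurrentThread().currentAccount);
    }
}
```

A plain array lookup avoids the hashing performed by java.lang.ThreadLocal, at the price of a fixed upper bound on thread IDs.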

Concerning the allocation and deallocation events

the following aspects need to be considered: measur-

ing object size, accounting object allocation, account-

ing object deallocation and association with the cor-

rect memory account.

Measuring Object Size. We define an object’s allocation cost as the object’s own size, in contrast to the size of the object graph it may refer to; i.e. the object graph is only implicitly considered as the sum of all monitored objects. The size of an object can be measured based on its class definition. The Java Instrumentation API offers a method which is implemented this way, but it needs an object as parameter. It would be desirable to know the size of an object based on its class before an instance is ever created. The implementation of getObjectSize() iterates over all fields and adds up the size based on the field’s type. Since a call to that method upon every object allocation is very costly, the result is cached in a weak hash map.
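The idea of deriving a size from the class definition and caching it per class can be sketched with reflection. This is not the Instrumentation API and not the profiler's code: the class ObjectSizeEstimator is hypothetical, the header and reference sizes are assumed constants, and alignment padding is ignored, so the numbers are estimates only.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical per-class size estimate with a weak cache. */
final class ObjectSizeEstimator {
    static final long HEADER_BYTES = 8;    // assumed object header size
    static final long REFERENCE_BYTES = 4; // assumed size of a reference field

    // Weak keys: a cached entry does not keep its class from being unloaded.
    private static final Map<Class<?>, Long> CACHE =
            Collections.synchronizedMap(new WeakHashMap<>());

    private ObjectSizeEstimator() {}

    /** Sums the sizes of all instance fields, including inherited ones. */
    static long sizeOf(Class<?> type) {
        Long cached = CACHE.get(type);
        if (cached != null) {
            return cached;
        }
        long size = HEADER_BYTES;
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (!Modifier.isStatic(field.getModifiers())) {
                    size += fieldBytes(field.getType());
                }
            }
        }
        CACHE.put(type, size);
        return size;
    }

    private static long fieldBytes(Class<?> fieldType) {
        if (fieldType == long.class || fieldType == double.class) return 8;
        if (fieldType == int.class || fieldType == float.class) return 4;
        if (fieldType == short.class || fieldType == char.class) return 2;
        if (fieldType == byte.class || fieldType == boolean.class) return 1;
        return REFERENCE_BYTES; // any object or array reference
    }
}

/** Sample class used only to exercise the estimator. */
final class SamplePoint {
    int x;
    long y;
    Object label;
    static int instances; // static fields are not part of an instance
}
```

Under the assumed constants, SamplePoint is estimated at 8 + 4 + 8 + 4 = 24 bytes, and the second lookup for the same class is served from the cache.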

Accounting Object Allocation. Object allocations are tracked by instrumenting the constructor of java.lang.Object with a global guard condition which, when true, relays control to a static method in the profiler along with the this reference. This method, trackAllocs(Object), records object allocations. This idea came from the documentation of the JVM TI reference. The Object constructor will be called for all created Java objects, including objects created via reflection and native code, but excluding array objects and java.lang.Class objects. Therefore, each newarray bytecode of all loaded classes will be instrumented with a call to a corresponding profiler method, passing along the array object reference, the array’s length and the array’s content type.

Accounting Object Deallocation. There are several options for accounting object deallocation: (1) finalizers, (2) usage of reference objects from java.lang.ref and (3) JVM TI object tagging.

Using finalizers for object deallocation accounting means adding or instrumenting the finalize() method for every class. This approach was tested and degraded the performance of the JVM drastically. Since our implementation is in pure Java, we decided on reference objects. Two types of references can be used for accounting object deallocation: weak and phantom references. Reference objects can be registered with a reference queue. They need to be cleared, i.e. their referent needs to be set to null. When the garbage collector reclaims a garbage object, the weak and soft references referring to it, if any, will be cleared automatically and added to a reference queue, if any. Phantom references are not cleared automatically and are only enqueued when the object’s finalizers have been run. In our opinion, weak references are least intrusive into the garbage collector and do not prevent the garbage collection of an object, although in the case of reviving finalizers (which are rare), they are less accurate than phantom references. Nevertheless, we decided to use weak references.
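A minimal sketch of deallocation accounting with weak references and a reference queue follows. It tracks a single byte counter instead of per-account data, and the names DeallocationTracker, Charge, trackAllocation and drain are hypothetical; only the use of java.lang.ref.WeakReference and ReferenceQueue reflects the approach described above.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical tracker that refunds charged bytes once objects are reclaimed. */
final class DeallocationTracker {

    /** Weak reference that remembers how many bytes were charged for its referent. */
    static final class Charge extends WeakReference<Object> {
        final long bytes;

        Charge(Object referent, long bytes, ReferenceQueue<Object> queue) {
            super(referent, queue);
            this.bytes = bytes;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // The Charge objects themselves must stay strongly reachable until they are
    // enqueued; otherwise they could be collected together with their referents.
    private final Set<Charge> pending = ConcurrentHashMap.newKeySet();
    private final AtomicLong liveBytes = new AtomicLong();

    /** Charges the given number of bytes and starts watching the object. */
    Charge trackAllocation(Object obj, long bytes) {
        Charge charge = new Charge(obj, bytes, queue);
        pending.add(charge);
        liveBytes.addAndGet(bytes);
        return charge;
    }

    /** Refunds all charges whose references were enqueued; returns how many. */
    int drain() {
        int refunded = 0;
        Charge charge;
        while ((charge = (Charge) queue.poll()) != null) {
            if (pending.remove(charge)) {
                liveBytes.addAndGet(-charge.bytes);
                refunded++;
            }
        }
        return refunded;
    }

    long liveBytes() {
        return liveBytes.get();
    }
}
```

In a running system the garbage collector enqueues the Charge objects; for a deterministic demonstration, Reference.enqueue() can be called by hand before drain().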

Association with the correct Memory Account.

The accounting methods will use the currently active

memory account stored in the thread local variable of

the currently executing thread as the account to which

the costs are to be attributed. Allocation cost data

for each class, thread and memory account need to

be maintained. We have implemented this mainly by

using two or three layers of weak hashmaps. They are

weak in the sense that the value reference is a weak

reference. This is important, for example, in the case of threads: if a thread terminates, the profiler must not prevent the thread object from being reclaimed.

ASGMemProf can be set to periodically dump the

collected data to disk (snapshot). The snapshot is in

parsable text form so as to allow easier post process-

ing. We employed a non-blocking scheme to create

a snapshot. A shutdown hook ensures a ﬁnal data

dump.

For a more detailed explanation of the profiler implementation please refer to (Bouché, 2007).

5.1 Detecting Memory Leaks

Before each snapshot, a full GC ensures the clearance of garbage objects and the refunding of the corresponding memory accounts. Although we focused our work on Java EE server-side components such as Servlets or EJBs, ASGMemProf can be used to profile Java SE applications as well.

Servlets or EJBs do not have a classic main

method, but several entry points. Adding to that is the

more complex life cycle of these components. There-

fore, it is harder to deﬁne a point in time when all used

memory for a given task should have been released.

If it has not been, this is a strong hint at a memory

leak. On the other hand Java SE applications have

one main method and it is reasonable to say that when

that method has ﬁnished all used memory should be

freed or at least be eligible for reclamation. Yet, static

variables complicate this as well.

In any case we detect memory leaks by analyzing

the difference between two or more proﬁling snap-

shots. Currently this is done manually. For this to

work memory accounts for different subsystems or

areas of interest have to be carefully deﬁned. Each

account will list the live and total number of cre-

ated objects for each class optionally grouped by the

method(s) causing the allocation. Usually a memory

leak is indicated by a rising number of live objects

over time. A time correlation of events in the applica-

tion and the time stamp of the snapshot is necessary.

For example, the execution of a certain Servlet may

cause a memory leak by misuse of another subsys-

tem or third party library leaving the memory account

of the Servlet engine clean, but causing a sustained

raised number in another memory account.
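The snapshot comparison is done manually in our work; the following sketch merely illustrates how the comparison of live-object counts between two snapshots of one memory account could be automated. The class SnapshotDiff, the map-based snapshot representation and the class names in the example are assumptions, not the profiler's snapshot format.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical comparison of live-object counts from two snapshots of one account. */
final class SnapshotDiff {
    private SnapshotDiff() {}

    /**
     * Returns, per class name, by how much the number of live objects grew
     * between the earlier and the later snapshot. Classes whose count stayed
     * the same or shrank are omitted.
     */
    static Map<String, Long> growth(Map<String, Long> earlier, Map<String, Long> later) {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, Long> entry : later.entrySet()) {
            long before = earlier.getOrDefault(entry.getKey(), 0L);
            long delta = entry.getValue() - before;
            if (delta > 0) {
                result.put(entry.getKey(), delta);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> first = Map.of("Session", 10L, "Buffer", 5L);
        Map<String, Long> second = Map.of("Session", 25L, "Buffer", 5L, "Node", 3L);
        System.out.println(growth(first, second));
    }
}
```

Growth between two snapshots is only a hint; as noted above, a leak shows up as a count that keeps rising over several snapshots and must be correlated with events in the application.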

As a practical application we needed a feasi-

ble memory proﬁler for the ASG execution platform

which is implemented in Java EE. Under certain con-

ditions there was an out of memory error, i.e. we had

a memory leak. Employing available memory profilers to find the leak was practically impossible, however, because either the slowdown made the system unresponsive or the amount of data collected reached several gigabytes. Certainly the great size of the ASG platform contributed to this.

We profiled the ASG execution platform with ASGMemProf and, though there was a significant slowdown, it was bearable and the amount of profiling data was greatly reduced through the employed aggregation. After analyzing the profiling snapshots we were finally able to identify the memory leak. It was caused by a server management subsystem incorrectly using the API of the data abstraction library Hibernate (http://www.hibernate.org/).


6 DISCUSSION AND

EVALUATION

It is important to quantify the memory and runtime

overhead of a proﬁler experimentally in order to pro-

vide a basis for a prediction of the overhead it incurs

on average. This is also a measure of quality for a pro-

ﬁler implementation. We present the overhead mea-

surements of our implementation in this section.

6.1 Runtime Overhead Measurements

To measure the additional runtime an application requires when instrumented, we used two benchmark suites. The DaCapo (Blackburn et al., 2006) benchmark suite v2006-10-MR2 was used to measure Java SE performance and SpecJBB 2005 v1.07 was used to measure server-side performance.

All tests ran on Sun’s Java HotSpot VM version

1.6.0-b105 in mixed, client mode. The computer

hardware was a 2 GHz AMD Athlon processor with

1 GB of RAM running Microsoft Windows XP with

Service Pack 2. The test results were computed from

an average of three consecutive runs for each benchmark. The DaCapo benchmarks were executed with input size small, and SpecJBB 2005 ran warehouses one through four, each measured for 240 seconds. The results can be found in table 1.

The table shows how long each benchmark took without the profiler (plain) and with the profiler applied. As a comparison, we additionally measured the performance of the JFluid profiler (Dmitriev, 2003), whose technology has become part of the NetBeans Java IDE, which we used in version 6.5. This profiler works with technology similar to our profiler, though it takes a stack sample upon each allocation event. In order to be fair we disabled this feature. For the SpecJBB2005 benchmark ASGMemProf was set to create a memory account for the package pattern spec.*; for the DaCapo benchmarks, allocations were charged to an overall memory account.

For each of the proﬁlers the table shows two

columns: the time the benchmark took to complete

and the incurred overhead given as a factor of the orig-

inal execution time. For SpecJBB 2005 the overhead

was computed by dividing the original throughput by

the proﬁled throughput.
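The two ways of computing the overhead factor can be written down directly: for runtimes the profiled time is divided by the plain time, for throughput the plain value is divided by the profiled value. The sketch below applies both to two rows of table 1; the class and method names are hypothetical, and small deviations from the published factors in the last digit may stem from rounding or from averaging over runs.

```java
/** Hypothetical helper computing overhead factors as described in the text. */
final class OverheadFactor {
    private OverheadFactor() {}

    /** Runtime-based factor: profiled execution time relative to the plain time. */
    static double fromRuntime(double plainMs, double profiledMs) {
        return profiledMs / plainMs;
    }

    /** Throughput-based factor: plain throughput divided by profiled throughput. */
    static double fromThroughput(double plainBops, double profiledBops) {
        return plainBops / profiledBops;
    }

    public static void main(String[] args) {
        // antlr with ASGMemProf: 813 ms plain, 1406 ms profiled (table 1)
        System.out.printf("antlr: %.4f%n", fromRuntime(813, 1406));
        // warehouse 1 with ASGMemProf: 4895 bops plain, 608 bops profiled (table 1)
        System.out.printf("warehouse 1: %.4f%n", fromThroughput(4895, 608));
    }
}
```

Both definitions yield 1.0 for a profiler without any overhead, which makes runtime and throughput results comparable in one column.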

Standard Performance Evaluation Corporation, SPECjbb2005 (Java Server Benchmark), http://www.spec.org/jbb2005/

Sun Microsystems Inc. and NetBeans contributors, http://www.netbeans.org/kb/index.html

6.1.1 Discussion

Generally speaking, the overhead incurred by our current implementation makes it feasible only for development use, not for production use. An acceptable overhead for production use is cited in the literature to be approximately 0.3, i.e. a factor of 1.3 (Pauw et al., 1999), or less, which our profiler clearly does not deliver. But in comparison with a competing profiler implementation the results are more than encouraging.

The advantage of our implementation is clearly

visible. In all cases ASGMemProf incurs overhead

less than or equal to JFluid. In most cases ASGMem-

Prof is twice as fast as JFluid.

The benchmark antlr incurs the least overhead, 1.72, which is getting into the region of acceptable overhead (1.3). This rather low overhead is due to the fact that antlr does not create many objects and those which are created live until termination. Therefore, much less time is spent in the profiling methods. In particular, the rather expensive slowdown of the GC via weak references is reduced due to its inactivity.

We have done preliminary tests of a worst-case scenario in an application which continually creates a lot of objects with a very short lifetime. Creating a weak reference object for each allocated object already degrades the performance significantly, not to speak of the time needed to account object deallocation. The GC must treat WeakReference objects specially, and apparently the implementation in Sun’s JVM for this is only of average quality. All our attempts to further reduce the profiling overhead failed because most overhead is incurred when creating and tracking WeakReferences, which is VM implementation dependent.

This can clearly be seen in the benchmark jython, where a lot of short-lived objects are created over a sustained amount of time. Hence, the expensive operations of weak reference creation, maintenance and notification upon referent reclamation occur often. A preliminary test with the mtrt benchmark of the SpecJVM98 suite showed a slowdown factor of 25 for ASGMemProf and 60 for JFluid. This is obviously not feasible even for development circumstances.

SpecJBB 2005 models typical server side Java

behavior by emulating users accessing a rather big

in-memory database inserting, updating and deleting

records. The database is implemented as binary ob-

ject graphs. Here the negative GC behavioral in-

ﬂuence of weak references comes into play as well.

They still perform a lot better than the native implementation of JFluid via JVM TI object tagging, but the overhead is unacceptable for production use of the profiler. Yet, the advantage of our implementation strategy is clearly visible in SpecJBB 2005 as well.

Table 1: Runtime Overhead of ASGMemProf vs. JFluid/NetBeans.

DaCapo          plain (ms)  ASGMemProf (ms)   factor  JFluid/NetBeans (ms)   factor
antlr                  813             1406     1.72                  2750     3.38
bloat                 2609            25813     9.89                 49703    19.05
chart                 2812            27062     9.63                 60844    21.63
eclipse              10531            40171     3.81                 67938     6.45
fop                   1218             7688     6.31                 26484    21.74
hsqldb                3406            10531     3.09                 13641     4.00
jython                 532             8860    16.65                  9375    17.62
luindex                781             5047     6.46                  5422     6.94
lusearch              2938            12625     4.29                 27985     9.52
pmd                    500             1172     2.34                  8703    17.40
xalan                 3343            17078     5.10                 26594     7.95

SpecJBB 2005        (bops)           (bops)   factor                (bops)   factor
warehouse 1           4895              608     8.05                   294    16.64
warehouse 2           4854              615     7.89                   336    14.46
warehouse 3           4833              545     8.86                   237    20.39
warehouse 4           4754              520     9.14                   274    17.35

Weak reference or phantom reference objects still

seem to degrade the performance drastically under

heavy load. Yet, they remain the only viable option to

do exact memory proﬁling apart from a native JVM

TI agent implementation. Maybe another mechanism

to track object deallocation in a standard manner for

the Java platform needs to be found.

One optimization remains. Short-running, frequently called methods cause a lot of unnecessary runtime overhead, especially if the calling code is from within the same memory account. This could be optimized by using static code analysis and only updating the account when necessary. Additionally, only those methods that (could) cause object allocation need to be instrumented. Determining this at load time is not trivial.

6.2 Memory Overhead Measurements

In order to measure the space overhead our proﬁler in-

curs we recorded the peak value of the Private Bytes

performance indicator of the VM process during a

benchmark suite run on the same hardware as previ-

ously mentioned. The reported result is an average of

three runs for each benchmark. The results are listed

in table 2.

Table 2 shows two columns each for ASGMemProf and JFluid/NetBeans and one column for the benchmark without the profiler. The peak private byte size is given in kilobytes. The overhead factor is determined as a factor of the plain value.

6.2.1 Discussion

The memory overhead does not vary as much as the

runtime overhead.

In SpecJBB2005 both profilers incur practically the same amount of overhead. The main factor contributing to the space overhead is the reference object that has to be allocated for each newly allocated object in order to track its deallocation. These reference objects are allocated on the Java heap and, in the current implementation, weigh 40 bytes each. If an application creates a lot of small objects, the overhead will be very high.
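The source of this overhead can be seen in a minimal sketch of deallocation tracking with weak references. It assumes a hypothetical TrackingRef subclass that remembers which account counter was charged and by how much; the class and method names are illustrative and do not reproduce the ASGMemProf implementation. One such reference object exists per tracked object, which is the per-object cost discussed above.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** One tracking reference per profiled object (hypothetical layout). */
final class TrackingRef extends WeakReference<Object> {
    final AtomicLong accountBytes;   // counter of the account that was charged
    final long size;                 // number of bytes charged at allocation time

    TrackingRef(Object referent, ReferenceQueue<Object> queue,
                AtomicLong accountBytes, long size) {
        super(referent, queue);
        this.accountBytes = accountBytes;
        this.size = size;
    }
}

final class DeallocationTracker {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // The reference objects must stay strongly reachable, otherwise they
    // could be collected themselves and would never be enqueued.
    private final Set<TrackingRef> live = ConcurrentHashMap.newKeySet();

    /** Called after an allocation: charge the account and start tracking. */
    TrackingRef allocated(Object obj, AtomicLong accountBytes, long size) {
        accountBytes.addAndGet(size);
        TrackingRef ref = new TrackingRef(obj, queue, accountBytes, size);
        live.add(ref);
        return ref;
    }

    /** Drains the queue and credits the accounts of collected objects. */
    int processCollected() {
        int processed = 0;
        for (Reference<?> r; (r = queue.poll()) != null; processed++) {
            TrackingRef t = (TrackingRef) r;
            t.accountBytes.addAndGet(-t.size);
            live.remove(t);
        }
        return processed;
    }

    int trackedCount() { return live.size(); }
}

final class TrackingDemo {
    public static void main(String[] args) {
        AtomicLong account = new AtomicLong();
        DeallocationTracker tracker = new DeallocationTracker();
        Object obj = new Object();
        TrackingRef ref = tracker.allocated(obj, account, 16);
        System.out.println("after allocation: " + account.get());
        // Stand-in for the garbage collector, which would enqueue the reference
        // on its own once obj becomes unreachable. Calling enqueue() directly
        // only keeps this demo deterministic.
        ref.enqueue();
        tracker.processCollected();
        System.out.println("after collection: " + account.get());
    }
}
```

With a reference object of about 40 bytes, as stated above, tracking an application object that is itself smaller than 40 bytes more than doubles the heap space needed for it.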

For the DaCapo benchmarks, JFluid/NetBeans produces a lower space overhead than ASGMemProf, though the difference is relatively small. This is probably because the NetBeans profiler is implemented in native code and, instead of weak reference objects, uses the tag mechanism of the JVM TI for deallocation accounting, which presumably requires less space than a weak reference object.

7 RELATED WORK

Profiling of Java applications is an ongoing research topic. Additionally, several commercial and open source profilers are available. Several publications address the measurement of CPU usage and execution time of single methods. Among them are ProfBuilder (Cooper et al., 1998), JaViz (Kazi et al., 2000), JInsight (Sevitsky et al., 2001), JFluid (Dmitriev, 2003), J-Seal2 (Binder et al., 2001), JSpy/JPaX (Goldberg


Table 2: Space Overhead of ASGMemProf vs. JFluid/NetBeans.

benchmark     plain (KB)   ASGMemProf (KB)   (factor)   JFluid/NetBeans (KB)   (factor)
SpecJBB2005   242,420      549,212           2.26       599,696                2.47
DaCapo        183,456      356,416           1.94       225,792                1.23

and Havelund, 2003), JBOLT (Brear et al., 2003), JPMT (Harkema et al., 2003), Twilight/Aksum (Seragiotto and Fahringer, 2005), JP Tool (Binder and Hulaas, 2006) and eDragon/JIS (Carrera et al., 2003). These works research, discuss and evaluate different concepts and implementations for measuring the CPU time usage of methods, threads and whole modules. Some employ bytecode instrumentation (JPMT, Twilight/Aksum, ProfBuilder, J-Seal2, JP Tool) and others use the provided profiling interface functions.

Much less attention has been given to memory profiling. J-Seal2, which is concerned with accounting and enforcing restrictions on mobile code execution environments, has a memory profiling subsystem. Its implementation also uses memory accounts and employs bytecode instrumentation. The techniques for associating context information with an allocation are similar. Yet, a memory account there is not based on modules, but on predefined execution environment restrictions which are valid for a whole application. We employ a more refined model; J-Seal2 does not capture principal-agent relationships. A more recent publication from the same line of research is JP Tool, where an expensive sampling of the stack is likewise avoided, in this case by extending a method's signature with a reference to the memory accounting object. Binder notes that this technique cannot be applied to Java core classes, whereas our rewriting scheme allows profiling of core classes as well.
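The signature extension mentioned for JP Tool can be pictured with a small hand-written sketch. It is not output of JP Tool or J-Seal2; the Accounting class, the method names and the size estimates are assumptions made only for illustration. The point is that the accounting object travels along the call chain as an extra parameter, so no stack inspection is needed to find out whom to charge.

```java
/** Hypothetical accounting object passed along with every call. */
final class Accounting {
    long allocatedBytes;
    void charge(long size) { allocatedBytes += size; }
}

/** Code as the programmer wrote it. */
final class Original {
    static int[] makeBuffer(int n) {
        return new int[n];
    }
    static int[][] makePair(int n) {
        return new int[][] { makeBuffer(n), makeBuffer(n) };
    }
}

/** The same code after an imagined rewriting step. */
final class Rewritten {
    static int[] makeBuffer(int n, Accounting acc) {
        acc.charge(16 + 4L * n);      // assumed estimate: 16-byte header + 4 bytes per int
        return new int[n];
    }
    static int[][] makePair(int n, Accounting acc) {
        acc.charge(16 + 2 * 8L);      // assumed estimate: header + two 8-byte references
        // The extra argument is simply forwarded to every callee.
        return new int[][] { makeBuffer(n, acc), makeBuffer(n, acc) };
    }
}

final class SignatureDemo {
    public static void main(String[] args) {
        Accounting acc = new Accounting();
        Rewritten.makePair(10, acc);
        System.out.println("charged bytes: " + acc.allocatedBytes);
    }
}
```

Every caller of a rewritten method has to be rewritten as well, because the parameter list has changed.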

The JFluid profiler, which has been integrated into the NetBeans development environment, employs memory accounting techniques similar to those we have used. The implementation uses weak references for object deallocation notification as well. Results are aggregated into a calling context tree (CCT) (Ammons et al., 1997). This aggregation technique is very common among profilers. While information for memory accounts can be extracted from a CCT, doing so means that unnecessary stack trace data has been collected and effort expended. The NetBeans profiler contains a very interesting technique for detecting memory leaks: the object generation metric for a class, which is the number of different ages among all objects of that class. An object's age is the number of garbage collections it has survived.
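The generation metric can be stated as a short computation. The sketch below assumes a hypothetical snapshot of live objects, each described by its class name and its age; how a profiler obtains such a snapshot is not shown, and the names LiveObject and GenerationMetric are illustrative only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical snapshot entry: class name and number of collections survived. */
final class LiveObject {
    final String className;
    final int age;
    LiveObject(String className, int age) {
        this.className = className;
        this.age = age;
    }
}

final class GenerationMetric {
    /** Returns, per class, the number of distinct ages among its live objects. */
    static Map<String, Integer> generationsPerClass(List<LiveObject> snapshot) {
        Map<String, Set<Integer>> agesByClass = new TreeMap<>();
        for (LiveObject o : snapshot) {
            agesByClass.computeIfAbsent(o.className, k -> new HashSet<>()).add(o.age);
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> e : agesByClass.entrySet()) {
            result.put(e.getKey(), e.getValue().size());
        }
        return result;
    }

    public static void main(String[] args) {
        List<LiveObject> snapshot = Arrays.asList(
                // Created once at startup: all instances share one age.
                new LiveObject("Config", 7),
                new LiveObject("Config", 7),
                new LiveObject("Config", 7),
                // Created continuously and never released: many different ages.
                new LiveObject("CacheEntry", 1),
                new LiveObject("CacheEntry", 2),
                new LiveObject("CacheEntry", 3),
                new LiveObject("CacheEntry", 4),
                new LiveObject("CacheEntry", 5));
        // prints {CacheEntry=5, Config=1}
        System.out.println(generationsPerClass(snapshot));
    }
}
```

A class whose generation count keeps growing over time, like the CacheEntry class in this made-up snapshot, is a candidate for a leak, since its instances are created continuously and survive.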

DJProf (Pearce et al., 2006) is a profiler based on aspect-oriented programming (AOP). AspectJ (Kiczales et al., 2001) is employed to perform the bytecode instrumentation (the defined aspects are woven into the code). It uses phantom references to capture object deallocation. The goal was to test the suitability of AOP for profiler implementations. The authors briefly discuss where to attribute allocation costs, using the example of a constructor that allocates other objects. They take the same direction as we do, but do not generalize it into the notion of a memory account.

There are several commercial tools which allow memory profiling, such as YourKit, JProbe, JProfiler, Borland OptimizeIt, Intel VTune, IBM Rational Quantify and Wily Introscope.

Furthermore, attempts are being made to reduce the overhead of exact profiling with sampling techniques at the cost of accuracy. Several works discuss this issue (Arnold and Ryder, 2001; Factor et al., 2004; Ammons et al., 1997; Dmitriev, 2003). Arnold and Ryder describe a scheme for reducing the overhead cost of instrumented code and present data showing that recording only every 10th event still yields an accuracy of 98%. This is something we can investigate for ASGMemProf in the future. Yet, a sampling technique is not feasible for finding memory leaks.
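Recording only every n-th event can be sketched with a simple countdown. This is not the framework of Arnold and Ryder, only a minimal illustration of the counting idea; the class names and the synthetic allocation sizes are assumptions, and the run makes no attempt to reproduce the 98% figure.

```java
/** Records every interval-th allocation event and scales the result up. */
final class SampledAllocationCounter {
    private final int interval;
    private int countdown;
    private long sampledBytes;
    private long sampledEvents;

    SampledAllocationCounter(int interval) {
        if (interval < 1) throw new IllegalArgumentException("interval must be >= 1");
        this.interval = interval;
        this.countdown = interval;
    }

    /** Called at every allocation; does real work only once per interval. */
    void onAllocation(long size) {
        if (--countdown > 0) return;     // cheap path: event is skipped
        countdown = interval;
        sampledBytes += size;
        sampledEvents++;
    }

    /** Estimate of the total allocated bytes, extrapolated from the samples. */
    long estimatedBytes() { return sampledBytes * interval; }

    long sampledEvents() { return sampledEvents; }
}

final class SamplingDemo {
    /** Synthetic allocation sizes between 16 and 64 bytes (made up for the demo). */
    static long syntheticSize(int i) { return 16 + (i % 7) * 8L; }

    public static void main(String[] args) {
        SampledAllocationCounter sampled = new SampledAllocationCounter(10);
        long exact = 0;
        for (int i = 0; i < 100_000; i++) {
            long size = syntheticSize(i);
            exact += size;               // what an exact profiler would record
            sampled.onAllocation(size);
        }
        System.out.println("exact bytes:     " + exact);
        System.out.println("estimated bytes: " + sampled.estimatedBytes());
        System.out.println("events recorded: " + sampled.sampledEvents());
    }
}
```

The estimate approximates the allocation volume, but the skipped objects are never registered individually, so their deallocation cannot be tracked. This is why sampling does not help with leak detection.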

8 CONCLUSIONS AND FUTURE WORK

We have presented a novel model for memory profiling built on two concepts: the principal-agent relationship and the memory account. These concepts attempt to reduce the overhead of exact memory profiling by performing a sensible aggregation of data. In particular, even though each individual object is accounted for, we can avoid computing and preserving the stack trace that led to the allocation of a specific object.

Our approach inherently avoids taking a stack sample for each allocation and therefore delivers less information than traditional memory profilers. As we have explained, this is in effect not a loss but a gain, although it constitutes less accuracy in terms of the amount of information. Optionally, the profiler can be set to record the top stack frame for the allocation at no additional cost.

While initial results obtained from the approach


are promising, end-user experience from a variety of

applications still needs to be obtained and studied, to

ﬁnd out whether the presented approach is practical

for analyzing memory consumption. Analyzing the

memory and run-time overhead is feasible, as we have

demonstrated above. Analyzing the value of our tools

to developers is more difﬁcult; based on past publica-

tions in this ﬁeld, we expect that any report on utility

and viability of this approach will remain anecdotal.

We envision two application areas for this approach: development and operations. Our applications of the tool have primarily been in the field of development, helping the developer to find memory leaks in the application so that the code can be improved.

In operations, the application of the approach

would be different. For example, the operator might

apply memory accounting to different services run-

ning in the same service container, and then take ser-

vice management decisions based on the amount of

memory used by each service (e.g. to migrate a ser-

vice with high memory consumption to a different

machine). As another example, the approach could

be used for the self-policing of application containers:

the container could enforce an upper limit on memory

consumption, and let allocations from a principal fail

if the principal’s memory account is overdrawn.
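The self-policing idea can be sketched as a memory account with an upper limit. This is an illustration of the envisioned use, not a feature of the described profiler; the class names, the exception type and the way the container would call charge and release are all assumptions.

```java
/** Hypothetical signal that a principal has exceeded its memory budget. */
final class AccountOverdrawnException extends RuntimeException {
    AccountOverdrawnException(String message) { super(message); }
}

/** Hypothetical memory account with an upper limit enforced by the container. */
final class LimitedMemoryAccount {
    private final String principal;
    private final long limitBytes;
    private long usedBytes;

    LimitedMemoryAccount(String principal, long limitBytes) {
        this.principal = principal;
        this.limitBytes = limitBytes;
    }

    /** Called before an allocation is performed on behalf of the principal. */
    synchronized void charge(long size) {
        if (size > limitBytes - usedBytes) {
            throw new AccountOverdrawnException(
                    principal + " would exceed its limit of " + limitBytes + " bytes");
        }
        usedBytes += size;
    }

    /** Called when a previously charged object has been reclaimed. */
    synchronized void release(long size) {
        usedBytes = Math.max(0, usedBytes - size);
    }

    synchronized long used() { return usedBytes; }
}

final class QuotaDemo {
    public static void main(String[] args) {
        LimitedMemoryAccount account = new LimitedMemoryAccount("order-service", 1024);
        account.charge(800);
        try {
            account.charge(400);         // would overdraw the account
        } catch (AccountOverdrawnException e) {
            System.out.println("allocation refused: " + e.getMessage());
        }
        account.release(300);            // some objects were reclaimed
        account.charge(400);             // now fits within the limit
        System.out.println("bytes in use: " + account.used());
    }
}
```

An operator could read the same counters without enforcing a limit, for example to decide which service to migrate to a different machine.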

Further implementation details can be found in (Bouché, 2007). The profiler is available at sourceforge.net.

REFERENCES

Ammons, G., Ball, T., and Larus, J. R. (1997). Exploiting hardware performance counters with flow and context sensitive profiling. In PLDI ’97: Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation, pages 85–96, New York, NY, USA. ACM Press.

Arnold, M. and Ryder, B. G. (2001). A framework for reducing the cost of instrumented code. In SIGPLAN Conference on Programming Language Design and Implementation, pages 168–179.

Binder, W. and Hulaas, J. (2006). Exact and portable profiling for the JVM using bytecode instruction counting. Volume 164, pages 45–64.

Binder, W., Hulaas, J. G., and Villazon, A. (2001). Portable resource control in Java. In Proceedings of the 16th ACM SIGPLAN conference on Object oriented programming, systems, languages, and applications, pages 139–155. ACM Press.

Blackburn, S. M., Garner, R., and Hoffmann, C. (2006). The DaCapo benchmarks: Java benchmarking development and analysis. In OOPSLA ’06, pages 169–190, New York, NY, USA. ACM.

Bouché, P. (2007). A comparative study of J2EE profiling approaches for usage within ASG. Master’s thesis, Hasso-Plattner-Institute for IT-Systems Engineering of the University of Potsdam.

Brear, D. J., Weise, T., Wiffen, T., Yeung, K. C., Bennett, S. A. M., and Kelly, P. H. J. (2003). Search strategies for Java bottleneck location by dynamic instrumentation. In Software, IEE Proceedings, volume 150, issue 4, pages 235–241.

Carrera, D., Guitart, J., Torres, J., Ayguade, E., and Labarta, J. (2003). Complete instrumentation requirements for performance analysis of web based technologies. In Performance Analysis of Systems and Software, 2003. ISPASS. 2003 IEEE International Symposium on, pages 166–175.

Cooper, B., Lee, H., and Zorn, B. (1998). ProfBuilder: A package for rapidly building Java execution profilers.

Dmitriev, M. (2003). Design of JFluid: A profiling technology and tool based on dynamic bytecode instrumentation. Technical report, Sun Microsystems Inc.

Factor, M., Schuster, A., and Shagin, K. (2004). Instrumentation of standard libraries in object-oriented languages: the twin class hierarchy approach. In OOPSLA ’04: Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, pages 288–300, New York, NY, USA. ACM Press.

Goldberg, A. and Havelund, K. (2003). Instrumentation of Java bytecode for runtime analysis.

Harkema, M., Quartel, D., van der Mei, R., and Gijsen, B. (2003). JPMT: A Java performance monitoring tool. Technical Report TR-CTIT-03-25, Centre for Telematics and Information Technology, University of Twente, Enschede.

Kazi, I. H., Jose, D. P., Ben-Hamida, B., Hescott, C. J., Kwok, C., Konstan, J., Lilja, D. J., and Yew, P.-C. (2000). JaViz: A client/server Java profiling tool.

Kiczales, G., Hilsdale, E., Hugunin, J., Kersten, M., Palm, J., and Griswold, W. G. (2001). An overview of AspectJ. In ECOOP ’01: Proceedings of the 15th European Conference on Object-Oriented Programming, pages 327–353, London, UK. Springer-Verlag.

Pauw, W. D., Jensen, E., and Konuru, R. (1999). Jinsight, a visual tool for optimizing and understanding Java programs. IBM Corporation, Research Division. http://www.alphaWorks.ibm.com/tech/jinsight/. URL last visited: 23 May 2009.

Pearce, D. J., Webster, M., Berry, R., and Kelly, P. H. J. (2006). Profiling with AspectJ. In Software: Practice and Experience. John Wiley & Sons, Ltd.

Seragiotto, C. and Fahringer, T. (2005). Analysis of distributed Java applications using dynamic instrumentation. In IEEE International Conference on Cluster Computing (Cluster 2005).

Sevitsky, G., de Pauw, W., and Konuru, R. (2001). An information exploration tool for performance analysis of Java programs. In TOOLS ’01: Proceedings of the Technology of Object-Oriented Languages and Systems, page 85, Washington, DC, USA. IEEE Computer Society.
