A STATISTICAL APPROACH FOR IDENTIFYING MEMORY LEAKS IN CLOUD APPLICATIONS

Vladimir Šor and Satish Narayana Srirama

Institute of Computer Science, University of Tartu, J. Liivi 2, Tartu, Estonia

Keywords:

Troubleshooting, Java Virtual Machine, Byte code instrumentation, Cloud computing, Tools.

Abstract:

This position paper describes an attempt to automate the statistical approach to memory leak detection in Java applications. The proposed system extends the basic statistical memory leak detection method with further intelligence to pinpoint the source of the memory leak in the source code. As the method adds only a small runtime overhead, it is designed to be used in production systems and will help detect memory leaks in production environments regardless of the source of the leak. The architecture of the proposed approach is intended for use in cloud applications.

1 INTRODUCTION

Memory leaks can be a major problem in distributed applications, degrading their performance, even if they run on platforms with automatic memory management such as the Java Virtual Machine. Finding memory leaks has been covered by much research, and there are several tools and methodologies for finding them. However, these tools are incorporated in profilers and are designed for use in the development phase. On one hand, this is perfectly justified, as memory leaks are bugs in software and finding bugs is work for developers. For this to happen, it is expected that such bugs are found in the test environment, or at least that they can be reproduced or simulated there.

On the other hand, the configuration of the production environment differs from that of the test environment (e.g. integrations with real systems), the uptime of the system is much longer, and the usage patterns that real users generate are much richer than teams of Q/A engineers and analysts could think of. All this means that more untested situations can arise in production, and these may result in memory leaks. A memory leak, even in a modern JVM, will inevitably result in the need to restart the JVM.

The use of clustering and cloud computing (Armbrust et al., 2009) reduces the impact of such restarts for the end user, who may not even notice anything, but for operations this is still a problem. Moreover, cloud computing and virtualization bring additional uncertainty, because the configuration of the physical hardware on which the application is actually running is not known. Thus, a memory leak that only occurs in production under very specific circumstances (often hard to specify) can be very hard to find and fix in development and test environments.

This is the area we think we can improve by developing a solution that uses an efficient statistical algorithm to detect memory leaks in advance, imposes low overhead on the production system and helps track down the source of the leak. The rest of the paper is organized as follows. Section 2 addresses the related work in detail. Section 3 describes the statistical method we propose. Section 4 discusses how cloud computing benefits from such a method. Section 5 presents a preliminary analysis of the approach. Section 6 concludes the paper with future research directions.

2 RELATED WORK

Memory leaks have been studied extensively in industry and in the research community, and currently there are several approaches for finding memory leaks in Java applications.

The first option is offline memory dump analysis. Offline in this context means that the memory dump is taken and then analyzed outside of the running JVM. A memory dump can either be requested from the live JVM (during this procedure JVM execution is stopped) or be generated automatically by the JVM when an out of memory condition occurs. There are several algorithms for analyzing memory dumps to detect possibly leaking objects. For example, (Maxwell, 2010) shows the use of graph mining algorithms for this purpose. Eclipse Memory Analyzer, or MAT (The Eclipse Foundation, 2010), is an example of production quality heap dump analysis software which is freely available. However, such offline analysis has several problems: heap dumps can be expensive to acquire in a production environment (because generating a dump file requires freezing the application) and heap dump files can be very big (up to several gigabytes, depending on the memory configuration). Because of the file size it can be hard to run analysis software on a regular development machine. Another drawback is the static nature of the memory dump: there is no information about where the objects were allocated, so finding the code responsible for the memory leak is a separate task from just finding the leaked objects.

Šor V. and Narayana Srirama S.

A STATISTICAL APPROACH FOR IDENTIFYING MEMORY LEAKS IN CLOUD APPLICATIONS.

DOI: 10.5220/0003389906230628

In Proceedings of the 1st International Conference on Cloud Computing and Services Science (CLOSER-2011), pages 623-628

ISBN: 978-989-8425-52-2

© 2011 SCITEPRESS (Science and Technology Publications, Lda.)

Another approach is to monitor certain collection classes for unlimited growth. This approach relies on bytecode instrumentation, and one possible solution is described in (Xu and Rountev, 2008). This technique is also used in several Application Performance Monitoring (APM) suites, for example CA Wily Introscope LeakHunter (CA Wily Introscope, 2010) and AppDynamics (AppDynamics, 2010). Both APM suites add some intelligence to ease finding the cause of the memory leak. Unfortunately, there is no information about the exact algorithms used in them. Also, the mentioned APM suites are targeted at Java Enterprise applications and are not applicable to, for example, desktop GUI applications.

As an alternative to direct bytecode instrumentation, aspect-oriented instrumentation may be used to gather the metrics needed for memory leak detection. The FindLeaks tool uses AspectJ pointcuts for this purpose, analyzing references between objects to find leaking objects together with their allocation sites (Chen and Chen, 2007). In that paper only GUI applications were used for testing.

Profilers are often used in development for finding memory leaks. Different profilers allow gathering different metrics that may help find memory leaks. For example, the profiler of the NetBeans IDE can obtain object age, which a human operator can then use to apply the statistical method (Sedlacek, 2010). This data can be collected during object allocation profiling. The major disadvantage of profilers is the need for a qualified and experienced operator who can find the actual leaking code. An inexperienced developer, given a profiler, the fact of a memory leak and a reasonably big code base, is unlikely to succeed in this process.

In addition to different instrumentation and byte code modification techniques, there are several research projects that apply statistical methods: Cork (Jump and McKinley, 2007) analyzes unnecessary references between objects, and SWAT (Chilimbi and Hauswirth, 2004) analyzes stale objects. Cork implements a statistical memory leak detection algorithm directly in the virtual machine by integrating the method into the garbage collector itself. Cork achieves a very small performance penalty of only 2% and good results in memory leak detection (Jump and McKinley, 2007). The only problem is that it is implemented as a module in the Jikes Research Virtual Machine (RVM) (The Jikes RVM Project, 2010), which makes it usable mostly in the research community, as the industry is not very keen on adopting a research VM.

The biggest disadvantage of these methods is the need for a qualified human operator to analyze the gathered data and find the actual place in the source code responsible for the memory leak. We think that this manual decision and search process can also be automated.

3 STATISTICAL APPROACH TO MEMORY LEAK DETECTION

Based on the review of related work, we noted that there is still room for an automated end-to-end memory leak detection solution that would work on the HotSpot or OpenJDK Java Virtual Machines, would assist the developer as much as possible by pinpointing both the allocation and reference points of leaking objects, and would do so in distributed and cloud environments with a performance penalty small enough to be usable in production systems. A similar idea of memory leak detection with the statistical method is described in (Formanek and Sporar, 2006) as an example application of dynamic Java byte code instrumentation. However, to our knowledge it has so far not been implemented end-to-end in any profiler or described in scientific publications.

There are several challenges in implementing this approach using standard tools:

• Gathering the data with low overhead at runtime, as the number of objects created while an application runs is huge (for example, the SPECjvm2008 benchmark, which we used to test the performance impact, created 877 958 317 objects during its 2 hour run).

• Applying the statistical method in real time to detect classes suspected of leaking.

• Applying dynamic byte code instrumentation to find the allocation sites and, most importantly, the referencing objects (as the objects referencing leaking ones, rather than those instantiating them, are the actual sources of the leaks).

• Presenting the findings in a user friendly way.

The basis of the statistical method is the generational hypothesis, or infant mortality of objects, which is described in (Sun Microsystems Inc., 2003). The generational hypothesis states that most objects become unreachable very soon after creation, or in other words, they "die" young. This means that the memory these objects occupy can be freed early by the garbage collector. Objects that stay reachable, or alive, survive several garbage collection cycles.

At the other extreme, there are a number of objects that are created during initialization and start up of the application and stay alive until the end of execution (e.g., application main classes, static caches, etc.); see Figure 1. Leaking objects, on the other hand, are created time after time and never freed (thus the name, leak).

From these observations one can conclude that if we measure the age of an object as the number of garbage collection cycles (or generations) it has survived, then by analyzing how the live instances of a class are distributed over different generations we can evaluate whether its objects follow the generational hypothesis. If the number of generations in which instances of a class are present keeps growing, this means that the application allocates objects of that type and does not free them, which suggests a memory leak.
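The decision rule in the paragraph above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the class name `GenerationSpread`, the monotonic-growth test and the threshold value are all hypothetical, and the paper leaves the actual threshold open.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

/** Hypothetical helper illustrating the "growing number of generations" rule. */
final class GenerationSpread {

    /** Number of distinct GC generations in which live instances of one class were created. */
    static int spread(List<Long> creationGenerations) {
        return new HashSet<>(creationGenerations).size();
    }

    /**
     * Assumed rule: a class is suspected when its spread never shrinks between
     * consecutive samples, has grown overall, and now exceeds the threshold.
     */
    static boolean isSuspect(List<Integer> spreadsOverTime, int threshold) {
        if (spreadsOverTime.size() < 2) {
            return false; // a single sample cannot show growth
        }
        for (int i = 1; i < spreadsOverTime.size(); i++) {
            if (spreadsOverTime.get(i) < spreadsOverTime.get(i - 1)) {
                return false; // old instances were freed, so the class behaves normally
            }
        }
        int first = spreadsOverTime.get(0);
        int last = spreadsOverTime.get(spreadsOverTime.size() - 1);
        return last > first && last > threshold;
    }

    public static void main(String[] args) {
        // Live instances created in generations 1, 1, 2 and 5 occupy three generations.
        System.out.println(spread(Arrays.asList(1L, 1L, 2L, 5L)));
        // Spread sampled four times; the threshold of 10 is an arbitrary example value.
        System.out.println(isSuspect(Arrays.asList(3, 5, 8, 12), 10));
    }
}
```

A class holding a long-lived cache would show a constant spread and not be flagged, while a class whose instances accumulate in every new generation would.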

Figure 1: Distribution of survived bytes over generations (Sun Microsystems Inc., 2003).

The generational hypothesis is also the basis for modern generational garbage collectors, which use it to divide the heap into regions that hold objects of different ages: eden, tenured and permanent. Having different regions also leads to different kinds of collection: minor (collection in the eden space only, which takes place when there is no more space in eden for new objects) and major (collection of both the eden and tenured spaces, which takes place if collecting the eden space has not freed enough space for the new allocation). Objects that have already survived some number of collections are moved from the eden to the tenured space (Sun Microsystems Inc., 2003). So, as the garbage collector has to keep track of object ages anyway to perform its work, the best place to collect the data for the statistical method would be the garbage collector itself. Unfortunately, this is currently not possible, as there is no interface in the OpenJDK or HotSpot virtual machines that would expose information about the age of objects to external code.

In the following sections we describe our ideas and the tests we have conducted on how to gather the required data and implement the statistical method.

3.1 Automated Statistical Sampling

To implement the statistical method we have to monitor object ages with a small performance penalty. As described in the previous section, we designate the creation time of an object with the garbage collector cycle counter. In section 2 we mentioned that the NetBeans profiler records the garbage collection cycle when profiling object allocations. A reasonable question arises: why not use it? As the NetBeans profiler is a general purpose profiler, it collects far more information than is needed for our task, and because of this it incurs much more overhead in terms of both memory and processor time.

So, to monitor the age of an object we need information about garbage collector activity. As the garbage collector is an internal process of the JVM and does not have any public API accessible from Java code, we have to use the Java Virtual Machine Tool Interface (JVMTI) to get this data. JVMTI provides a native interface for adding hooks to the JVM's internal events, as well as functions for communicating with memory management, stack frames, and many more (Sun Microsystems Inc., 2006). The functionality we require is the hooks for garbage collection events:

void JNICALL GarbageCollectionStart(jvmtiEnv *jvmti_env);

void JNICALL GarbageCollectionFinish(jvmtiEnv *jvmti_env);

As the simplest solution, we created a native agent that uses JVMTI tagging to assign ages to objects. In JVMTI, tags are marker values of type long that can be attached to objects and classes using JVMTI functions. Afterward, it is possible to iterate over the heap using JVMTI functions, filtering objects by their tags. Whether the tag is used as a plain value or as a pointer to some richer data structure depends on the usage. As tags are kept in native memory rather than on the Java heap, their use does not affect the Java heap, which is a very desirable property for the case we are addressing.

To set tags we used a naive agent that instruments the java.lang.Object class to tag every created object with the value of the current garbage collector generation, i.e. the time of creation of the object. At the time of writing, the agent is also capable of outputting, upon request, a histogram of the distribution of classes over generations.
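To make the histogram concrete, the following pure Java model shows one way such a structure could look. It is a sketch under assumptions: the real agent is native and reads JVMTI tags while iterating over the heap, whereas here the hypothetical `record` method simply receives a class name and the tag value (the generation of creation), and the output format is invented for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model of the "classes over generations" histogram. */
final class AgeHistogram {
    // class name -> (generation of creation -> number of live instances)
    private final Map<String, Map<Long, Integer>> counts = new TreeMap<>();

    /** Records one live object; the tag stands in for the JVMTI tag set at allocation. */
    void record(String className, long tag) {
        counts.computeIfAbsent(className, k -> new TreeMap<>())
              .merge(tag, 1, Integer::sum);
    }

    /** Live instances of a class that were created in the given generation. */
    int liveInstances(String className, long generation) {
        Map<Long, Integer> perGeneration = counts.get(className);
        if (perGeneration == null) {
            return 0;
        }
        Integer n = perGeneration.get(generation);
        return n == null ? 0 : n;
    }

    /** Number of different generations in which live instances of the class are present. */
    int generationsOccupied(String className) {
        Map<Long, Integer> perGeneration = counts.get(className);
        return perGeneration == null ? 0 : perGeneration.size();
    }

    /** One line per class, for example "Leaky: gen1=1 gen2=2". */
    String render() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Map<Long, Integer>> byClass : counts.entrySet()) {
            out.append(byClass.getKey()).append(':');
            for (Map.Entry<Long, Integer> byGen : byClass.getValue().entrySet()) {
                out.append(" gen").append(byGen.getKey()).append('=').append(byGen.getValue());
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        AgeHistogram h = new AgeHistogram();
        h.record("Leaky", 1L);
        h.record("Leaky", 2L);
        h.record("Leaky", 2L);
        System.out.print(h.render());
    }
}
```

The value returned by `generationsOccupied` corresponds to the per-class number of generations that the statistical method would compare against a threshold.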

As the next step of our research we will change the agent so that, for any given class, it outputs the number of different generations in which live instances of this class are present. Based on this output, the statistical method will be applied to detect classes of leaking objects. Defining an effective threshold value for this number requires a fair amount of testing on different types of applications.

4 APPLICATION OF THE METHOD IN CLOUD COMPUTING

Troubleshooting applications, even on a single system, can be a demanding task because several factors must be considered and correct conclusions have to be drawn from a large number of available monitoring metrics in order to fix the problem. Distributed computing as such increases the complexity of troubleshooting, because several machines can now participate in a single transaction. Cloud computing, with its elastic nature and the possibility of scaling to a very large number of virtual machines, further increases the troubleshooting complexity. Memory leak detection using the statistical method is only a part of our broader vision of a distributed troubleshooting tool. In the final picture, memory leak detection will be one agent among different troubleshooting agents that can be dynamically engaged or disengaged to detect different kinds of problems.

The following subsections explain how the distributed troubleshooting tool can be applied in the cloud computing domain and how cloud applications could benefit from it.

4.1 Deployment Scenario Architecture

Cloud computing is a style of computing in which, typically, resources scalable on demand are provided "as a service (aaS)" over the Internet to users who need not have knowledge of, expertise in, or control over the cloud infrastructure that supports them. Cloud services can be provisioned at the infrastructure level (IaaS), the platform level (PaaS) or the software level (SaaS). A cloud computing platform dynamically provisions, configures, reconfigures, and de-provisions servers as requested. This ensures the elasticity of the systems deployed in the cloud. Our distributed troubleshooting solution will operate at the IaaS level, meaning that the solution is aware of the different virtual servers and will monitor separate JVM instances.

To implement the automated statistical method, that is, an end-to-end memory leak troubleshooting solution for the cloud computing domain, we propose monitoring agents and analysis modules. The architecture of the proposed solution is shown in Figure 2. Each JVM under inspection will have one agent installed, which will sample the object age distribution. The coordination server, or dashboard, collects data from the agents and applies the statistical method to find candidate classes whose instances are suspected of leaking. After the suspected classes are identified, adaptive introspection takes place: the agent instruments only the suspected classes and monitors the allocation sites and incoming references of the selected instances. The collected data is sent back to the coordination server to be analyzed and presented to the end user.
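The interaction between agents and the coordination server might be modelled roughly as below. This is a sketch only: the paper does not specify a protocol, so the `report` and `classesToInstrument` methods, the JVM identifiers and the simple "grew since the last sample and exceeds a threshold" rule are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical coordination server that aggregates samples from many agents. */
final class CoordinationServer {
    private final int threshold;
    // JVM id -> class name -> last reported number of occupied generations
    private final Map<String, Map<String, Integer>> lastSpread = new HashMap<>();
    // class name -> JVMs in which the class is currently suspected
    private final Map<String, Set<String>> suspects = new TreeMap<>();

    CoordinationServer(int threshold) {
        this.threshold = threshold;
    }

    /** Called for every sample an agent sends for one class. */
    void report(String jvmId, String className, int spread) {
        Map<String, Integer> perJvm = lastSpread.computeIfAbsent(jvmId, k -> new HashMap<>());
        Integer previous = perJvm.put(className, spread);
        boolean growing = previous != null && spread > previous;
        if (growing && spread > threshold) {
            suspects.computeIfAbsent(className, k -> new TreeSet<>()).add(jvmId);
        }
    }

    /** Classes the agent in the given JVM should instrument (adaptive introspection). */
    Set<String> classesToInstrument(String jvmId) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : suspects.entrySet()) {
            if (entry.getValue().contains(jvmId)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CoordinationServer server = new CoordinationServer(5); // example threshold
        server.report("jvm-1", "Leaky", 4);
        server.report("jvm-1", "Leaky", 7);
        System.out.println(server.classesToInstrument("jvm-1"));
    }
}
```

Keeping the per-JVM state on the server rather than in the agents matches the stated goal of doing the analysis outside the inspected JVM.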

Figure 2: Proposed solution architecture.

The dashboard is an important part of the cloud deployment, as it aggregates the data from many cloud instances. The coordination server and its activities can also be presented on the dashboard. Agents and the dashboard will use auto-discovery to automatically register new instances and remove shut down ones. Moreover, as only standard APIs are used in the approach, it will be simple to deploy our solution with regular shell scripts, which belong to the usual tool set for configuring cloud instances.

4.2 Troubleshooting in Large Scale Data Analysis Applications

Commercial APM suites are targeted at enterprise usage, mostly concentrating on web-based applications. Our approach is also suitable for scientific computing and large scale data analysis applications, which are becoming more common on the cloud. One example is MapReduce (Dean and Ghemawat, 2004), a programming model and distributed computing framework that is widely used for large scale data analysis on the cloud. It was first developed by Google to process the very large amounts of raw data that it has to deal with on a daily basis, e.g., indexed Internet documents and web request logs, which grow every day. Google uses MapReduce to process data across hundreds of thousands of commodity computers. However, Google's implementation of MapReduce is proprietary. Apache Hadoop is an open source implementation of MapReduce written in Java. Apart from MapReduce, Hadoop also provides the Hadoop Distributed File System (HDFS) to reliably store data across hundreds of computers. Apache Hadoop is in active development and is used both commercially and in the research community.

As part of other projects at our institute, we are also interested in migrating several of our scientific computing and enterprise applications to the cloud. In particular, we are interested in establishing private clouds at universities and migrating scientific computing applications to hybrid clouds. In the SciCloud project (Srirama et al., 2010) we are mainly studying how to reduce scientific computing algorithms to the MapReduce framework so that they can efficiently use cloud computing resources. From this analysis we observed that most scientific applications consume huge amounts of resources and time, so detecting memory leaks in the algorithms is essential. We presume that the statistical method for detecting memory leaks will be applicable in this domain, and our future research addresses this scope in particular.

5 PRELIMINARY ANALYSIS OF THE APPROACH

After implementing the simple counting agent described in section 3.1, we measured its performance impact using the SPECjvm2008 benchmark (Standard Performance Evaluation Corporation, 2008), run without and with our age counting agent, which resulted in composite results of 11.72 ops/m and 5.48 ops/m respectively. Detailed results of the benchmark are shown in Figure 3.

Benchmarking was done on a laptop computer with the Java HotSpot Client VM (build 17.1-b03) bundled with the Java SE Runtime Environment (build 1.6.0_22-b04), on an Intel Core 2 CPU at 1.83 GHz with 3 GB of RAM, under Windows XP. However, the actual value of the benchmark result is not important, as we are interested only in the amount of performance degradation, which is roughly 50%.
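As a worked check of this figure, take 11.72 ops/m as the composite score without the agent and 5.48 ops/m as the score with it (the only reading consistent with a degradation). The relative slowdown then comes out slightly above one half:

```latex
\[
\text{degradation} \;=\; 1 - \frac{5.48}{11.72} \;\approx\; 1 - 0.468 \;=\; 0.532 \;\approx\; 53\%
\]
```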

Figure 3: SPECjvm2008 results.

The current naive implementation of the sampling method is subject to further improvement and optimization.

To assess the applicability of the method, we used our tool to find a known memory leak in the HtmlUnit library. Finding that leak manually had taken us one whole day. Our tool found the same leak in a matter of minutes. We were successful in discovering the type of the leaking objects as well as the place where the leaking objects are added to the HashMap.

6 CONCLUSIONS AND FUTURE WORK

The research and experiments done so far show that automating the statistical method is promising both in terms of memory leak detection and in terms of using the described method in cloud computing environments. We are actively working on different aspects of the current proof of concept implementation.

Further work includes the implementation of adaptive introspection, i.e., the collection of additional data such as allocation and reference places in the source code. Adaptive introspection will be achieved using dynamic byte code modification with the help of the JVMTI function RetransformClasses. The main benefit of dynamic bytecode modification is that during regular runtime the code does not have any instrumentation overhead whatsoever. We instrument only the classes we are interested in, and after we have gathered enough information to present to the user we remove the instrumentation. Work done for JFluid (Dmitriev, 2003) shows that limiting the profiling code in this way yields good results in terms of performance overhead.
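The "instrument only what is suspected" idea can be illustrated at the Java level with the standard `java.lang.instrument` API, which is a counterpart of the native JVMTI RetransformClasses function rather than the function itself. The sketch below is an assumption-laden illustration: the class `SuspectOnlyTransformer` and its methods are hypothetical, and it deliberately leaves the bytecode untouched, only recording which classes a real agent would rewrite.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical transformer that reacts only to classes currently under suspicion. */
final class SuspectOnlyTransformer implements ClassFileTransformer {
    // Internal class names (slash separated) that should be instrumented.
    private final Set<String> suspects = ConcurrentHashMap.newKeySet();
    // Classes this transformer was actually asked to handle, kept for inspection.
    private final Set<String> visited = ConcurrentHashMap.newKeySet();

    void suspect(String internalName) {
        suspects.add(internalName);
    }

    void clear(String internalName) {
        suspects.remove(internalName);
    }

    Set<String> visited() {
        return visited;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        if (className == null || !suspects.contains(className)) {
            return null; // null tells the JVM to keep the class bytes unchanged
        }
        visited.add(className);
        // A real agent would rewrite classfileBuffer here to record allocation
        // sites and incoming references. This sketch makes no bytecode changes.
        return null;
    }

    public static void main(String[] args) {
        SuspectOnlyTransformer transformer = new SuspectOnlyTransformer();
        transformer.suspect("com/example/Leaky"); // hypothetical class name
        transformer.transform(null, "com/example/Leaky", null, null, new byte[0]);
        transformer.transform(null, "java/lang/String", null, null, new byte[0]);
        System.out.println(transformer.visited());
    }
}
```

In a Java agent, such a transformer would typically be registered with `Instrumentation.addTransformer(transformer, true)` and triggered with `Instrumentation.retransformClasses(...)` for the suspected classes. Removing the transformer and retransforming again restores the original code, which is the property that keeps regular runtime free of instrumentation overhead.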

As described in section 3.1, the Java agent calculates the distribution of live instances of a class over different generations and can provide the gathered information for analysis. The analysis will be performed outside of the JVM under inspection. This design is made with distributed and cloud applications in mind, so that one analysis dashboard analyzes several JVMs and gives an overview to the end user. Performing the analysis outside of the host JVM will also use fewer of its computing resources. Such cooperation of modules and their adaptation achieves what we call intelligence, in the sense of profiling only what is needed and when it is needed.

When the class whose instances are leaking is found and the source of the leak is presented to the user, the only thing left for the user is to wait until the JVM runs out of heap memory. By collecting data on the rate of creation of leaking objects and the size of these objects, it is possible to forecast when an out of memory error might occur. This information can be valuable for the operations team, telling them how to react and when to be ready to react to the out of memory error. In the case of web applications, the reaction might be adding nodes (or cloud instances) to the cluster.
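Such a forecast could, in its simplest form, be a linear extrapolation. The sketch below assumes a constant leak rate, a fixed object size and a known amount of free heap. These simplifications and the name `OomForecast` are ours and are not taken from the paper.

```java
/** Hypothetical linear forecast of the time left before the heap is exhausted. */
final class OomForecast {

    /**
     * Seconds until the free heap is consumed, assuming leaked objects are created
     * at a constant rate and never collected. Returns positive infinity when the
     * measured leak rate is zero or negative.
     */
    static double secondsUntilExhaustion(long freeHeapBytes,
                                         double leakedObjectsPerSecond,
                                         long bytesPerObject) {
        double leakedBytesPerSecond = leakedObjectsPerSecond * bytesPerObject;
        if (leakedBytesPerSecond <= 0) {
            return Double.POSITIVE_INFINITY;
        }
        return freeHeapBytes / leakedBytesPerSecond;
    }

    public static void main(String[] args) {
        long freeHeap = 512L * 1024 * 1024; // example: 512 MiB still free
        double seconds = secondsUntilExhaustion(freeHeap, 100.0, 1024L);
        System.out.println("Estimated seconds until out of memory: " + seconds);
    }
}
```

A production forecast would need to account for garbage collection of non-leaking objects and for varying load, so this estimate is at best a rough lower-effort indicator for the operations team.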

Finding the alarming ratio, or threshold, for the

count of live objects across different generations is

another topic for further testing and analysis.

ACKNOWLEDGEMENTS

This research is part of the EUREKA Project 4989 (SITIO) co-funded by the European Regional Development Funds via Enterprise Estonia.

REFERENCES

AppDynamics (2010). AppDynamics home page. http://www.appdynamics.com/.

Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R. H., Konwinski, A., Lee, G., Patterson, D. A., Rabkin, A., Stoica, I., and Zaharia, M. (2009). Above the clouds: A Berkeley view of cloud computing. Technical report UCB/EECS-2009-28, University of California.

CA Wily Introscope (2010). http://www.ca.com/us/application-management.aspx.

Chen, K. and Chen, J.-B. (2007). Aspect-based instrumentation for locating memory leaks in Java programs. In Computer Software and Applications Conference, 2007. COMPSAC 2007. 31st Annual International, volume 2, pages 23-28.

Chilimbi, T. M. and Hauswirth, M. (2004). Low-overhead memory leak detection using adaptive statistical profiling. In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 156-164.

Dean, J. and Ghemawat, S. (2004). MapReduce: Simplified data processing on large clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation.

Dmitriev, M. (2003). Design of JFluid: A profiling technology and tool based on dynamic bytecode instrumentation. Technical report, Sun Microsystems Laboratories.

Formanek, I. and Sporar, G. (2006). Dynamic bytecode instrumentation. Dr. Dobb's Journal, Online.

Jump, M. and McKinley, K. S. (2007). Cork: dynamic memory leak detection for garbage-collected languages. In Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '07, pages 31-38, New York, NY, USA. ACM.

Maxwell, E. K. (2010). Graph mining algorithms for memory leak diagnosis and biological database clustering. Master's thesis, Virginia Polytechnic Institute and State University.

Sedlacek, J. (2010). Uncovering memory leaks using NetBeans profiler. http://netbeans.org/kb/articles/nb-profiler-uncoveringleaks_pt1.html.

Srirama, S. N., Batrashev, O., and Vainikko, E. (2010). SciCloud: Scientific Computing on the Cloud. In The 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2010), page 579.

Standard Performance Evaluation Corporation (2008). SPECjvm2008. http://www.spec.org/jvm2008/. Verified in Nov. 2010.

Sun Microsystems Inc. (2003). Tuning Garbage Collection with the 5.0 Java Virtual Machine.

Sun Microsystems Inc. (2006). JVM Tool Interface. Online.

The Eclipse Foundation (2010). Memory Analyzer. Online.

The Jikes RVM Project (2010). The Jikes RVM project. Online.

Xu, G. and Rountev, A. (2008). Precise memory leak detection for Java software using container profiling. In ICSE '08. ACM/IEEE 30th International Conference on Software Engineering, 2008, pages 151-160.
