when evaluating a scanner. True positive means that
a scanner reported an existing error. Security errors
are typically spread over several lines, so the question
is whether a scanner reports the right location. We
distinguish the following cases:
True positive+: Correct location
The scanner has reported an error at the same
source code line where this error had been
documented, assuming that we know the exact
location of the error.
True positive: Unspecified location
The scanner has correctly reported a specific
error. The exact location in the source code is
unknown or has not been specified. We typically
know a line range if the error is contained in a
(bad) function.
True positive–: Incorrect location
The scanner has correctly reported an error,
but given a wrong location, which is close
enough to be counted as a true positive.
In the Juliet test suite, we have errors for which we
know either the exact source code line or a range of
source code lines, i.e., we have a list of True positives+
and True positives. A scanner should report as
many True positives as possible. It is better to have a
report of an error with an incorrect location than no
report of that error at all, i.e., a True positive- reported
by a scanner is much better than a false negative.
We can only count a true positive as a True
positive+ when we ourselves know the exact location
in the test suite.
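To make these cases concrete, the following sketch illustrates how a reported finding could be classified against a documented error location. The type names and the line tolerance are hypothetical assumptions chosen for illustration; they are not taken from our evaluation setup.

    enum Verdict { TRUE_POSITIVE_PLUS, TRUE_POSITIVE, TRUE_POSITIVE_MINUS, NO_MATCH }

    final class FindingClassifier {
        // Assumed tolerance (in lines) for counting a near miss as True positive-.
        private static final int LOCATION_TOLERANCE = 3;

        // exactLine is the documented error line (null if unknown);
        // rangeStart/rangeEnd delimit the bad function (null if unknown).
        static Verdict classify(int reportedLine, Integer exactLine,
                                Integer rangeStart, Integer rangeEnd) {
            if (exactLine != null) {
                if (reportedLine == exactLine) {
                    return Verdict.TRUE_POSITIVE_PLUS;      // correct location
                }
                if (Math.abs(reportedLine - exactLine) <= LOCATION_TOLERANCE) {
                    return Verdict.TRUE_POSITIVE_MINUS;     // incorrect but nearby location
                }
                return Verdict.NO_MATCH;
            }
            if (rangeStart != null && rangeEnd != null
                    && reportedLine >= rangeStart && reportedLine <= rangeEnd) {
                return Verdict.TRUE_POSITIVE;               // correct error, unspecified location
            }
            return Verdict.NO_MATCH;
        }
    }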
4.5 Security Model
We have used a security model in which CWE entries
were combined into more abstract categories. For
example, a category “Buffer Overflow” represents
different CWE entries that describe types of buffer
overflows. We have used these categories, following
(Center for Assured Software 2011), to generate a
security model as part of a more general software
quality model (Ploesch et al. 2008). This security
model helps to interpret the scanner results. For
example, with the model we can determine a scanner’s
weaknesses or strengths in one or more areas.
This information can be used to choose a particular
scanner for a software project. Thus, if a scanner is
weak in identifying authentication problems but is
strong in other areas, this scanner can be used for
projects that do not have to deal with authentication.
Alternatively, an additional scanner with strong results
for authentication problems can be used. In general, the security
model helps to analyze the detected errors in more detail.
Another advantage of a security model is that it
provides a connection between generic descriptions
of software security attributes and specific software
analysis approaches (Wagner et al. 2012). This allows
us to automatically detect security differences
between different systems or subsystems as well as
security improvements over time in a software system.
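As an illustration, the following sketch shows how individual CWE entries could be mapped to the more abstract categories of such a security model. The category names and CWE assignments shown here are an illustrative subset only and do not reproduce the complete mapping of (Center for Assured Software 2011).

    import java.util.Map;

    final class SecurityModel {
        // Example mapping of CWE identifiers to abstract categories (illustrative subset).
        static final Map<Integer, String> CWE_TO_CATEGORY = Map.of(
            121, "Buffer Overflow",   // CWE-121: Stack-based Buffer Overflow
            122, "Buffer Overflow",   // CWE-122: Heap-based Buffer Overflow
            89,  "Injection",         // CWE-89: SQL Injection
            287, "Authentication"     // CWE-287: Improper Authentication
        );

        static String categoryFor(int cweId) {
            return CWE_TO_CATEGORY.getOrDefault(cweId, "Uncategorized");
        }
    }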
5 RESULTS
The Juliet Test Suite consists of a Java and a C/C++
test suite. We will start with Java. Subsequently, the
results for C/C++ will be presented. Finally, an
overall discussion of the findings will follow.
5.1 Java
In the Java test suite we had far more true positives
with an unspecified location (True Positive) than true
positives with a conclusive location (True Positive+),
which were determined with the additional ‘SAMATE’
files. Consequently, the scanners can only deliver a
few conclusive locations. Figure 1 contrasts the
percentage of errors with a conclusive location, i.e.,
True Positives+, with errors with an unspecified location,
i.e., True Positives. Figure 2 shows the distribution
of the test cases by the security model. We can
see that the numbers of test cases are not balanced
across the categories. As a result, scanners that find
many issues in categories with many test cases score
better in this comparison than other scanners. Notably,
the category “Buffer Overflow” has no test
cases at all. This should not come as a surprise, as the
Java Runtime Environment prevents buffer over-
flows in Java programs. Figure 3 shows an over-
view of all errors that were detected by the different
scanners. The entry Juliet on top shows the actual
number of documented errors in the test suite. We
can see that for a small percentage the exact location
of the error is known, but for most errors this is not
the case. FindBugs has apparently detected the
most errors, followed by Jlint and PMD. However,
PMD has found more exact locations than Jlint and
FindBugs. As Figure 3 shows, PMD’s numbers of True
Positives+, True Positives, True Positives- and
Wrong Positives are higher than those of Jlint
and FindBugs. Thus, it can be said that PMD has
located errors most accurately with regard to the test
suite. A deeper analysis of the results has shown that
the GDS rule set used within PMD was responsible
for PMD’s results. Without this rule set, PMD would
not have found any errors in the test suite. Nevertheless,
the overall results were poor. The scanners