AUTOMATED THREAT IDENTIFICATION FOR UML

George Yee, Xingli Xie and Shikharesh Majumdar

Dept. of Systems and Computer Engineering, Carleton University, Colonel By Drive, Ottawa, Canada

Keywords: Software threat identification, Software threat modeling, UML, Expert systems, Secure software development.

Abstract: In tandem with the growing importance of software in modern society is the increasing number of

threats to software. Building software systems that are resistant to these threats is one of the greatest

challenges in information technology. Threat identification methods for secure software development can be

found in the literature. However, none of these methods has involved automatic threat identification based

on analyzing UML models. Such an automated approach should offer benefits in terms of speed and

accuracy when compared to manual methods, and at the same time be widely applicable due to the ubiquity

of UML. This paper addresses this shortcoming by proposing an automated threat identification method

based on parsing UML diagrams.

1 INTRODUCTION

Today, software systems are involved in almost

every aspect of our lives. From electrical power

generation, to telecommunications, to air travel,

software is essential. Unfortunately, software is

threatened by security problems that are getting

worse by the day. Building software systems that are

resistant to the growing number of threats against

them is one of the greatest challenges in information

technology (Yee, 2006).

Threat identification or threat modeling

methodologies have been proposed by researchers

for the development of secure software. Among

them are: Secure System Engineering Methodology

(Salter et al., 1998), threat modeling methodologies

based on Data Flow Diagrams (Howard & Lipner,

2006; Swiderski & Snyder, 2004; Saitta et al., 2005;

Microsoft, n.d.-A), Microsoft’s Threat Analysis and

Modeling (TAM) methodology (Ingalsbe et al.,

2008; Microsoft, n.d.-B), quantitative risk

analysis in threat modeling (PTA Technologies, n.d.;

Howard & LeBlanc, 2003), and expressing threat

scenarios in UML diagrams (Wang et al., 2007;

Object Management Group, n.d.-A).

Automated tools for threat modeling have also

been developed for the threat modeling

methodologies mentioned above. For example, a

tool for threat modeling by Microsoft automates the

threat modeling methodology by Swiderski &

Snyder (2004). This tool provides a user interface to

collect background information required for threat

modeling and generates the threat model by

modeling the software in data flow diagrams.

Another automated tool supports the TAM approach,

developed by the Microsoft Application and

Consulting Engineering team (Ingalsbe et al., 2008;

Microsoft, n.d.-B). This tool defines application

architecture in a set of components, service roles and

calls.

Current threat identification methodologies,

such as those mentioned above, exhibit two gaps.

One gap is that none of the methodologies has

proposed a threat identification process based on

analyzing UML models. The other gap is that there

is no automated threat identification method based

on parsing UML diagrams. Existing approaches to

software threat modeling rely on the developers to

draw Data Flow Diagrams, attack graphs or other

forms to express the architectural and data flow

information of the system. The use of UML for

analyzing threats and risks to a system is preferred

since i) UML is a widely used modeling language in

software engineering, and ii) software developers

who are modeling the system in UML may not be

familiar with attack graphs, or other forms that are

used in the security domain.

This paper describes preliminary research that

aims to fill the two gaps stated above. It looks at

deriving threats based on analyzing existing UML

diagrams, and shows how to automatically generate

threats to the software by using an expert system in conjunction with threat information from the UML. Thus, this work aims to combine the benefits of automation (speed and accuracy) with the ubiquity of UML.

Yee G., Xie X. and Majumdar S. (2010). AUTOMATED THREAT IDENTIFICATION FOR UML. In Proceedings of the International Conference on Security and Cryptography, pages 521-527. DOI: 10.5220/0002996005210527. SciTePress.

The objectives of this paper are to a) propose an

automated threat identification method based on

analyzing existing UML diagrams, and b) apply the

method to the UML model of an example web

service. This paper is organized as follows. Section

2 presents our approach for automated threat

identification. Section 3 presents a prototype implementation and demonstrates it on

example UML diagrams. Section 4 discusses some

issues, and Section 5 presents conclusions and plans

for future research.

2 PROPOSED APPROACH

The proposed threat identification approach consists

of two phases, namely i) gathering relevant system

information, and ii) processing the UML diagrams to

identify threats. These phases are described as

follows.

Phase 1: Gather Relevant System Information

Identifying threats to a software system requires

certain information regarding the system’s design

and deployment. This information can be

categorized as a) assets that should be protected, b)

software dependencies, and c) security assumptions.

Assets are resources that the system must

protect from incorrect or unauthorized use

(Swiderski & Snyder, 2004). For example, the

common assets we are familiar with are business

equipment in an office that should not be stolen by

thieves, and sensitive data for a business that should

not be disclosed to its competitors. Physical assets

are easier to identify than abstract assets, such as the

company’s reputation. Different assets usually

require different forms of protection. For example,

money should not be lost or stolen, price data should

not be modifiable by an adversary, and a web service

should be available at all hours.

Software usually has dependencies. It runs on

operating systems and hardware. It might use

databases, a web server, or a framework (e.g. .NET

framework). Such dependencies are important for

determining the existence of threats.

Security assumptions specify the features of the

system or its environment that must be true for the

system to remain secure. Defining the security

assumptions is important for the proper

identification of threats. For example, suppose that

the system relies on the underlying operating system

to protect encryption keys. A security assumption,

then, is that the operating system will protect the

keys (Howard & Lipner, 2006). For Microsoft

Windows XP, this assumption would be true if you

store the keys using the data protection API

(DPAPI). However, in the case of Linux (as of the

2.6 kernel), this assumption is incorrect, and leads to

new threats that put the keys at risk.

Phase 2: Process the UML and Identify the Threats Using an Expert System

This phase is accomplished in 2 steps. In step 1, the

information needed to identify threats is extracted

from existing UML deployment and sequence

diagrams (available from normal UML-based

software development) in the form of Prolog facts.

In step 2, the threats are identified using an expert

system in conjunction with the facts from step 1.

Step 1 – Threat Information Extraction: The

extraction technique used here has been applied in

the literature for UML model checking and UML

quality assessment (Pap et al., 2001; Chimiak-Opoka

et al., 2008). Most UML modeling tools allow

machine processing of UML by expressing the UML

model in XML Metadata Interchange (XMI) form

(Object Management Group, n.d.-B). In our

approach we first export the UML model in XMI

format, in order to enable automatic machine

processing. The XMI file is then imported into a

logic programming language, e.g. SWI-Prolog with

its provided package, the SGML/XML parser (SWI-

Prolog, n.d.). The imported UML model (in XMI

form) is processed and the information needed, such

as the nodes, the instances, the interactive messages,

all their relations, and so on, is extracted in terms of

Prolog facts for the threat identification in step 2.
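As a rough illustration of step 1, the following sketch loads an XMI file with the SWI-Prolog SGML/XML parser and asserts one Prolog fact per message element. The tag name message, the attribute names xmi:id and name, and the fact shape message(Id, Name) are assumptions made for illustration only; the actual names depend on the XMI dialect produced by the modeling tool, and this is not the extraction code of the prototype.

```prolog
:- use_module(library(sgml)).   % load_structure/3
:- use_module(library(lists)).  % member/2, memberchk/2

:- dynamic message/2.

% descendant(+Content, -Element): enumerate, depth first, every
% element/3 node in a parsed DOM. Text nodes are skipped.
descendant(Content, Element) :-
    member(Node, Content),
    Node = element(_, _, Children),
    (   Element = Node
    ;   descendant(Children, Element)
    ).

% extract_messages(+XmiFile): assert one message(Id, Name) fact for
% each element tagged message that carries both attributes.
% The tag and attribute names are assumptions about the export format.
extract_messages(XmiFile) :-
    load_structure(XmiFile, DOM, [dialect(xml), space(remove)]),
    retractall(message(_, _)),
    forall(( descendant(DOM, element(message, Attrs, _)),
             memberchk('xmi:id'=Id, Attrs),
             memberchk(name=Name, Attrs)
           ),
           assertz(message(Id, Name))).
```

Walking the element/3 terms directly keeps the sketch independent of any particular query library; nodes, instances and their relations could be extracted with further clauses of the same shape.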

Step 2 – Threat Identification: The fact set

obtained from step 1, together with the relevant

system information gathered in Phase 1, forms the

working data of our expert system for threat

identification. The associated knowledge base

contains a set of threat identification rules. The

inference engine is backward chaining (i.e. works

backwards from goals), as provided by SWI-Prolog.

This expert system analyzes the fact set and relevant

system information, and generates threats to the

software system based on the threat identification

rules in the knowledge base. Figure 1 illustrates

Phase 2.

A knowledge base for an expert system is a

declarative representation of the expertise, usually in

the form of rules. These rules are often written in the

“IF THEN” format. A threat scenario explains how

the system can be compromised. The knowledge base is constructed by defining threat scenarios and formulating them into rules.

SECRYPT 2010 - International Conference on Security and Cryptography

Figure 1: Process flow of Phase 2.

3 PROTOTYPE IMPLEMENTATION AND DEMONSTRATION

We begin this Section with a prototype

implementation of our approach, using Prolog to

construct the expert system. The UML model was

obtained using the MagicDraw UML Modeling

Tool, version 16.0 (No Magic, n.d.) (note: in

practice, the UML model would already exist,

created as part of normal development); the expert

system was constructed using SWI-Prolog, version

5.6.63 by Jan Wielemaker (SWI-Prolog, n.d.). The

implementation consists of a user interface that

allows the user to interact with the expert system, a

procedure to process the UML model and extract the

Prolog fact set, a set of threat modeling rules

(knowledge base), and an analysis process that runs

the rules on the system information and outputs the

threats to the system. We used the backward chaining built into SWI-Prolog as our goal-driven inference engine.

Graphical User Interface: SWI-Prolog offers a user

interface package called XPCE (SWI-Prolog, n.d.).

This toolkit is object-oriented and offers different

user interface classes that can be easily instantiated

and organized into a recognizable and usable GUI.

Extracting Prolog Facts from UML: This

procedure will take an XMI file exported from the

UML modeling tool as input and extract the Prolog

facts. The processing flow and some sample code

are shown in Figure 2. In this Figure, the displayed

code processes the UML deployment and sequence

diagrams.

Rules for the Expert System: We investigated four

threat scenarios and formulated rules from them.

Figure 2: Processing flow for extracting Prolog facts from

the XMI form of the UML model.

The scenarios are: i) a Trojan horse threat scenario,

ii) a SQL threat scenario, iii) a Man-In-The-Middle

(MITM) threat scenario, and iv) a Denial of Service

(DoS) threat scenario. We could in principle have

more threat scenarios but four were deemed

sufficient to demonstrate our approach.

For example, consider the Trojan horse threat

scenario. Suppose in our UML model a message m

is sent to object obj. Message m contains sensitive

data that is not allowed to be modified by or

disclosed to an adversary. Suppose there exists a

Trojan horse in our system. Then a threat exists,

namely that the sensitive data may be modified by or

disclosed to an adversary. The rule for this threat

scenario can be written in IF THEN format as

follows:

IF
    object obj and
    message m which has destination obj and
    m contains sensitive data which should not be modified by or disclosed to an adversary and
    there is a possibility that a Trojan horse is installed in the system
THEN
    threat exists for the sensitive data in m to be modified by or disclosed to the adversary
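A rule of this kind might be expressed in Prolog roughly as follows. This is a minimal sketch, not the rule from the prototype's knowledge base: the fact shapes message/3, composed_of/2, asset/2 and possible/1, the object name order_form, and the pairing of message 627_1500 with order data are all assumptions chosen for illustration. Only the five-argument form of threat follows the query used in this paper.

```prolog
% Hypothetical fact of the kind step 1 might extract:
% message(Id, DestinationObject, Payload).
message('627_1500', order_form, order_data).

% Hypothetical relevant system information (Phase 1).
composed_of(order_data, credit_card_number).
composed_of(order_data, total_payment).
asset(credit_card_number, no_disclosure).
asset(total_payment, no_modification).
possible(trojan_horse).

% IF a message is sent to some object, the message carries a protected
% asset, and a Trojan horse may be installed, THEN a threat exists at
% that message. The memo field is left empty for this scenario.
threat(Location, Asset, Protection, trojan_horse_attack, '') :-
    message(Location, _Object, Payload),
    composed_of(Payload, Asset),
    asset(Asset, Protection),
    possible(trojan_horse).
```

With these facts loaded, the goal threat(L, A, P, T, M) has two solutions, one for credit_card_number with no_disclosure and one for total_payment with no_modification. Backward chaining starts from the threat goal and works back through the conditions in the rule body.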


Threat Identification and Output of the Results

After all the facts are extracted and processed from

the XMI and saved, the inference engine performs

the threat identification by backward chaining

reasoning, based on the rules, the facts we have from

the XMI (UML model), and the relevant system

information. The threat identification is executed

with the following query:

findall([Location, Asset, Required_Protection, Threat, Memo],
        threat(Location, Asset, Required_Protection, Threat, Memo), A).

This query will identify all the threats along with

their locations, as determined by the rule set, and

produce a threat table in Microsoft Excel format

(using the SWI2EXCEL module (SWI-Prolog,

n.d.)). The threat table is further described below.
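The collection and output step can be sketched as follows. Because the interface of the SWI2EXCEL module is not shown here, this sketch writes a plain comma-separated file instead, which a spreadsheet program can open; that substitution, the predicate name write_threat_table/1, and the two stub threat/5 facts standing in for the rule set are assumptions for illustration.

```prolog
:- use_module(library(lists)).  % member/2

% Stub facts standing in for the results of the rule set.
threat('627_1500', credit_card_number, no_disclosure, trojan_horse_attack, '').
threat('627_1500', total_payment, no_modification, trojan_horse_attack, '').

% write_threat_table(+File): collect every threat with findall/3 and
% write a header row followed by one comma-separated row per threat.
write_threat_table(File) :-
    findall([Location, Asset, Protection, Threat, Memo],
            threat(Location, Asset, Protection, Threat, Memo),
            Rows),
    setup_call_cleanup(
        open(File, write, Out),
        (   format(Out, "Location,Asset,Required Protection,Threat,Memo~n", []),
            forall(member(Row, Rows),
                   (   atomic_list_concat(Row, ',', Line),
                       format(Out, "~w~n", [Line])
                   ))
        ),
        close(Out)).
```

The sketch does not quote fields, so a memo that itself contains a comma would need escaping before the file could be read back reliably.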

Next, we demonstrate our prototype by applying

it to the pre-existing UML model of a web store

service (Figures 3 and 4) from Yee (2007). The

service is hosted on a server and makes use of two

other web services, an accounting service and an

online payment service. The sequence diagram

(shown as 3 component parts in Figure 4) depicts a

successful order placement.

Figure 3: Deployment diagram for the web store service.

Phase 1: Gather Relevant Information on the Web Service

By examining the system architecture with the

development team, we collect the following

information for the purpose of threat identification:

1. We are to identify threats for a successful order placement.

2. The assets associated with a successful order placement are credit card number and total payment. The credit card number should not be disclosed to adversaries and the total payment should not be modifiable by adversaries.

3. Trojan horses may be present in the service platform.

4. The order data is stored in the order database and in the accounting database. The receivable is stored in the accounting database.

5. We assume that the communication paths and the databases are not protected.

6. The order data is composed of customer name, credit card number, and total payment. The receivable consists of order number and total payment. The credit card information includes the customer name and credit card number.

Figure 4(a): Beginning component sequence diagram for the web store service (successful order).

Figure 4(b): Middle component sequence diagram for the web store service (successful order).

Figure 4(c): Last component sequence diagram for the web store service (successful order).

We next code this information in the file

relevant_information.pl, which will be loaded when

running the expert system.
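The six items above might be coded along the following lines. The contents of the actual relevant_information.pl are not reproduced in this paper, so every predicate name here (scenario/1, asset/2, possible/1, stored_in/2, unprotected/1, composed_of/2, carries_asset/2) is an assumption; the sketch only shows that each item maps naturally onto a few Prolog facts.

```prolog
% relevant_information.pl (sketch; all predicate names are assumptions)

% 1. Scenario under analysis.
scenario(successful_order_placement).

% 2. Assets and the protection each one requires.
asset(credit_card_number, no_disclosure).
asset(total_payment, no_modification).

% 3. Trojan horses may be present in the service platform.
possible(trojan_horse).

% 4. Where the data is stored.
stored_in(order_data, order_database).
stored_in(order_data, accounting_database).
stored_in(receivable, accounting_database).

% 5. Security assumptions: paths and databases are not protected.
unprotected(communication_paths).
unprotected(databases).

% 6. Composition of the data items, one fact per part.
composed_of(order_data, customer_name).
composed_of(order_data, credit_card_number).
composed_of(order_data, total_payment).
composed_of(receivable, order_number).
composed_of(receivable, total_payment).
composed_of(credit_card_information, customer_name).
composed_of(credit_card_information, credit_card_number).

% carries_asset(?Data, ?Asset): Data includes a part that is a
% protected asset.
carries_asset(Data, Asset) :-
    composed_of(Data, Asset),
    asset(Asset, _).
```

For example, the goal carries_asset(receivable, A) succeeds only with A = total_payment, since order_number is not listed as an asset.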

Phase 2: Process the UML and Identify Threats Using an Expert System

Steps 1 and 2 of Phase 2 (see Section 2) are

executed, identifying the threats shown in Table 1.

Table 1: Threat table, showing the threat identification results from the demonstration.

Location  Asset               Required Protection  Threat               Memo
627_1500  credit_card_number  no_disclosure        trojan_horse_attack
627_1500  total_payment       no_modification      trojan_horse_attack
664_1510  credit_card_number  no_disclosure        trojan_horse_attack
664_1510  total_payment       no_modification      trojan_horse_attack
799_1521  credit_card_number  no_disclosure        trojan_horse_attack
799_1521  total_payment       no_modification      trojan_horse_attack
6645_307  credit_card_number  no_disclosure        trojan_horse_attack
4234_325  credit_card_number  no_disclosure        trojan_horse_attack
4234_325  total_payment       no_modification      trojan_horse_attack
4558_343  total_payment       no_modification      trojan_horse_attack
8160_334  credit_card_number  no_disclosure        trojan_horse_attack
8160_334  total_payment       no_modification      trojan_horse_attack
9453_352  total_payment       no_modification      trojan_horse_attack
664_1510  credit_card_number  no_disclosure        sql_attack           order data stored in order database
664_1510  total_payment       no_modification      sql_attack           order data stored in order database
8160_334  credit_card_number  no_disclosure        sql_attack           order data stored in accounting database
9453_352  credit_card_number  no_disclosure        sql_attack           order data stored in accounting database
8160_334  total_payment       no_modification      sql_attack           order data stored in accounting database
9453_352  total_payment       no_modification      sql_attack           order data stored in accounting database
8160_334  total_payment       no_modification      sql_attack           receivable stored in accounting database
9453_352  total_payment       no_modification      sql_attack           receivable stored in accounting database
116_1087  credit_card_number  no_disclosure        MITM                 Through message 4234_325
116_1087  total_payment       no_modification      MITM                 Through message 4234_325
116_1087  total_payment       no_modification      MITM                 Through message 4558_343
258_1013  credit_card_number  no_disclosure        MITM                 Through message 664_1510
258_1013  total_payment       no_modification      MITM                 Through message 664_1510
533_1053  credit_card_number  no_disclosure        MITM                 Through message 8160_334
533_1053  total_payment       no_modification      MITM                 Through message 8160_334
533_1053  total_payment       no_modification      MITM                 Through message 9453_352
1657_990                      availability         DoS
763_1070                      availability         DoS
289_1104                      availability         DoS

Our prototype and demonstration give rise to the

following observations:

• UML model diagrams can be exported in XMI format using the MagicDraw 16.0 UML modeling tool and loaded into SWI-Prolog.

• XMI files exported by different UML modeling tools are slightly different, which means that it may be necessary to write different parsing code for XMI from different tools.

• Our automated approach appears to parse UML fairly efficiently, but we did not do any quantitative studies to confirm efficiency.

4 SOME PRACTICAL ISSUES

UML has been regarded as an informal or semi-

formal modeling language (Glinz, 2000). In

industrial settings, UML is widely used mainly

because it facilitates communication between

humans through visual means. When UML is used

for machine processing in automated processes (as

in this approach) some issues need to be considered,

as follows.

Missing Information

Certain details used in threat identification may not

be captured by UML, and thereby impact the results

of our approach. These include, for example, how a

system is protected for physical safety, what other

applications are running and the risks they pose, who

can access the system and how the system is

accessed. UML also lacks the ability to model

certain external entities and users that may be

critical to threat analysis, e.g. the role of an Internet

service provider. Some information may have been

omitted from the UML model of the system, either

because it was “too obvious” to be included in the

model or because it was considered only relevant to

security and not part of the UML model. One way to

solve this problem is to collect more detailed

relevant system information for the missing or

omitted information. Also, the system model should

be more detailed in order to include enough

information for the automated threat identification.

Vague, Inconsistent, or Informal Information

The visualization capability and the informality of

UML provide more flexibility when modeling

software, but at the same time they cause problems

in automatic model processing. This is why UML is

often criticized for its vague semantics,

inconsistency and ambiguity. For example, in the

demonstration web store service, the message “order

data” can also be expressed as “order information”

and both terms mean the same to a human.

However, it is difficult for a machine to know that

they should be considered the same when it

processes the model automatically.

A realistic knowledge base can be developed by

a group of experts, as part of commercializing our

approach. However, building a knowledge base for

threat identification is still a huge task and the

following issues need to be considered.


Always Changing

The threat landscape is always changing, with new

vulnerabilities coming into play and existing

vulnerabilities subject to new kinds of threats. Thus,

it is difficult to build a complete set of rules for the

knowledge base. But one benefit is obvious - the

knowledge base can contain the threat identification

expertise of many experts, which can be

advantageous for development teams that lack this

expertise.

Need to Understand the Fact Set

The knowledge base relies heavily on understanding

the system model. The problems of vagueness or

missing information when modeling the system in

UML (as discussed above) may be solved either by

a) putting more detail in the UML to facilitate

construction of the knowledge base for threat

identification, or b) building a larger knowledge base

containing additional rules sufficient to understand

the problems caused by UML. In the latter case, the

knowledge base will not only contain the threat

identification rules, but also provide for reasoning

capability to cope with the deficiencies of UML

models.
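One simple form that option b) could take is a synonym table that maps labels found in the UML model to the canonical names used by the threat identification rules. The following is a sketch under that assumption; the predicates synonym/2 and canonical/2 are hypothetical and are not part of the prototype.

```prolog
% synonym(?Label, ?Canonical): hypothetical table of labels that should
% be treated as the same data item by the threat rules.
synonym(order_information, order_data).

% canonical(+Label, -Canonical): map a label to its canonical name.
% A label without a synonym entry stands for itself.
canonical(Label, Canonical) :-
    (   synonym(Label, Known)
    ->  Canonical = Known
    ;   Canonical = Label
    ).
```

A threat rule would then call canonical/2 on each message label before looking up the assets it carries, so that the goal canonical(order_information, X) yields X = order_data while canonical(order_data, X) leaves the label unchanged. Such a table still has to be maintained by hand, which is part of the cost of this option.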

5 CONCLUSIONS AND FUTURE WORK

This work potentially fills two gaps: the lack of a threat identification methodology based on analyzing UML models, and the lack of automated approaches for threat identification based on UML.

The limitations of this work include the issues

discussed in Section 4. Due to these issues, the

approach is probably best applied in conjunction

with other techniques such as manual code

inspection and designer testing, so that the different

techniques can support one another in terms of the

threats found, providing for more robust results.

Plans for future research include: a) addressing

the issues mentioned above, b) trialling the approach

with software developers, including using it in

conjunction with code inspection and designer

testing, c) investigating other UML diagrams and

elements for use in threat identification, and d)

performing a scalability analysis.

REFERENCES

Chimiak-Opoka, J., Felderer, M., Lenz, C., & Lange, C.

(2008). Querying UML Models using OCL and

Prolog: A Performance Study. 2008 IEEE

International Conference on Software Testing

Verification and Validation Workshop (ICSTW’08),

Lillehammer Norway, pp. 81-88, April.

Glinz, M. (2000). Problems and Deficiencies of UML as a

Requirements Specification Language. In Proceedings

of the 10th International Workshop on Software

Specification and Design (IWSSD-00), San Diego,

USA, pp. 11-22, November.

Howard, M. & LeBlanc, D. (2003). Writing Secure Code.

Microsoft Press, 2nd edition.

Howard, M. & Lipner, S. (2006). The Security

Development Lifecycle: SDL: A Process for

Developing Demonstrably More Secure Software.

Microsoft Press.

Ingalsbe, J.A., Kunimatsu, L., Baeten, T., & Mead, N.R.

(2008). Threat Modeling: Diving into the Deep End.

IEEE Software, Volume 25, Issue 1, pp. 28-

34, January-February.

Microsoft (n.d.-A). Microsoft’s Threat Modeling Tool.

Available as of July 31, 2009 at:

http://www.microsoft.com/downloads/details.aspx?FamilyID=62830f95-0e61-4f87-88a6-e7c663444ac1&displaylang=en.

Microsoft (n.d.-B). Microsoft Threat Analysis and

Modeling v2.1.2. Available as of July 31, 2009 at:

http://www.microsoft.com/downloads/details.aspx?FamilyId=59888078-9DAF-4E96-B7D1-944703479451&displaylang=en

No Magic (n.d.). MagicDraw UML 16.0. Available as of

July 31, 2009 at: http://www.nomagic.com/

Object Management Group (n.d.-A). UML. Available as

of July 31, 2009 at: http://www.omg.org/

Object Management Group (n.d.-B). XMI. Available as of

July 31, 2009 at:

http://www.omg.org/technology/xml/index.htm.

Pap, Z., Majzik, I., & Pataricza, A. (2001). Checking

General Safety Criteria on UML Statecharts. In

Lecture Notes in Computer Science, Vol. 2187, pp. 46-

55, Springer-Verlag.

PTA Technologies (n.d.). Practical Threat Analysis.

Available as of July 31, 2009 at:

http://www.ptatechnologies.com/

Saitta, P., Larcom, B., & Eddington, M. (2005). Trike v.1

Methodology Document [Draft], July 13. Available as

of July 31, 2009 at:

http://www.octotrike.org/papers/Trike_v1_Methodology_Document-draft.pdf.

Salter, C., Saydjari, O.S., Schneier, B., Wallner, J. (1998).

Toward a Secure System Engineering Methodology.

In Proceedings of New Security Paradigms Workshop,

Charlottesville, VA, USA, pp. 2-10, September.

Swiderski, F. & Snyder, W. (2004). Threat modeling.

Microsoft Press.

SWI-Prolog (n.d.). SWI-Prolog. Available as of July 31,

2009 at: http://www.swi-prolog.org/

Wang, L., Wong, E., & Xu, D. (2007). A Threat Model

Driven Approach for Security Testing. In Proceedings

of the third IEEE Computer Society International


Workshop on Software Engineering for Secure

Systems (SESS), Minneapolis, MN, USA, pp. 10-16,

May.

Yee, G. (2006). Recent research in secure software. NRC

Institute for Information Technology, National

Research Council Canada, NRCC# 48478, NPArC#

8914119, March. Available as of July 29, 2009 at:

http://nparc.cisti-icist.nrc-cnrc.gc.ca/npsi/ctrl?action=shwart&index=an&req=8914119&lang=en

Yee, G. (2007). Visual Analysis of Privacy Risks in Web

Services. In Proceedings of the IEEE International

Conference on Web Services 2007 (ICWS 2007), Salt

Lake City, UT, USA, pp. 671-678, July.
