A New Suite of Metrics for Object-Oriented Software

Cara Stein¹, Letha Etzkorn¹, Glenn Cox¹, Phillip Farrington², Sampson Gholston²,
Dawn Utley², and Julie Fortune²

¹ Computer Science Department
² Industrial and Systems Engineering and Engineering Management Department
University of Alabama in Huntsville, Huntsville, AL 35899 USA
Software practitioners need to be able to assess the quality of their software,
and metrics provide an automated way to do that. Traditional software metrics
count aspects of code related to its syntax. In contrast, semantic metrics,
introduced by Etzkorn and Delugach, count things related to the meaning of
software in its domain. Because semantic metrics do not depend on the
structure of the code, they can be calculated from requirements and design
documents before implementation. This paper introduces and validates a suite
of new semantic metrics. These metrics measure aspects of the quality of
object-oriented software and can be computed from source code or from design
specifications.
1 Introduction
Software practitioners need ways to assess the quality of their software. Assessment
may be performed with an eye toward reuse, process improvement, or fault-proneness
of the software. Metrics provide an automated way to assess software quality.
Traditional software metrics count aspects of code related to its syntax. In contrast,
semantic metrics, introduced by Etzkorn and Delugach [1], count things related to the
meaning of the software in its domain. As Etzkorn, Gholston, and Hughes pointed out
[2], syntactic metrics are subject to distortion due to programmer style. One
advantage of semantic metrics is that coding style is less likely to skew the metric
values. For example, using the lines of code metric, a classic syntactic metric, the
C++ code samples in Figs. 1 and 2 result in very different metric values, even though
they do exactly the same thing. Although the lines of code metric is scorned by many,
Curtis and Carleton [3] pointed out that most complexity metrics actually measure size
and perform no better than lines of code.
if (balance < withdrawal) {
    bounce = true;
}
else {
    bounce = false;
}
Fig. 1. Code sample 1
bounce = (balance < withdrawal) ? true : false;
Fig. 2. Code sample 2
If we count physical lines of source code, as recommended by [4], the first sample
contains six lines of code, whereas the second contains only one. Using these values,
one might conclude that the first sample accomplished more or was more complex,
when in reality they both accomplish the same task. Moreover, the first code sample
is easier for a non-C++ expert to understand, so it may be easier to maintain.
In contrast, semantic metrics are concerned only with the ideas involved. In this
example, semantic metrics would pick out the concepts of balance, withdrawal, and
bounce in the banking domain, without regard to the way the code was written. Thus
differences in programmer style and even programming language used will not impact
semantic measures of the system’s size or complexity.
Furthermore, because semantic metrics do not depend on the structure of the code,
they can be calculated from requirements and design documents before the code has
been written [2]. This allows quality to be assessed earlier in the development
process, when changes are cheaper to make.
2 Validating Metrics
Kitchenham, Pfleeger, and Fenton [5] proposed a framework for validating
measurements of software. First, they identified necessary concepts for measurement:
Entities – things we want to know more about
Attributes – properties we want to measure
Units – mapping between an attribute and a number system (e.g., inches)
Scale types – nominal, ordinal, interval, or ratio
Thus, a metric value should measure an attribute of an entity using some unit.
Units are valid for ratio and interval data and can be adapted to work with ordinal data
[5].
The four aspects of validity for a measure are:
Attribute validity: the entity being studied has the attribute.
Unit validity: the unit used is appropriate for the attribute.
Instrumental validity: the underlying model is valid and the measurement
instrument is calibrated.
Protocol validity: the measurement is taken in a way that is consistent,
unambiguous, and prevents problems such as double counting [5].
Generally, there are direct measures and indirect measures. For direct measures,
the following theoretical properties should apply:
There exist at least two different entities that have different values for the attribute.
The measurement must work in a way that makes sense with respect to human
understanding of the attribute.
If the attribute is part of a valid measurement, any of its units can be used.
It is possible for different entities to have the same attribute value [5].
For indirect measures, once the model on which they are based is shown to be
correct, they must also make appropriate use of units and scales, and they must not
have discontinuities such as division by zero [5].
More specifically, Briand, Morasca, and Basili [6] proposed a set of criteria for
different types of software metrics.
Complexity:
Nonnegativity: the value is never negative
Null Value: the value is zero if there are no relationships between parts of the
entity being measured
Symmetry: the direction of relationships does not impact the value
Module Monotonicity: the complexity of a system is greater than or equal to the
sum of the complexities of any set of disjoint modules
Disjoint Module Additivity: the complexity of a system of modules with no
elements in common is the sum of the complexities of the modules [6]
Cohesion:
Nonnegativity: the value is never negative
Normalization: the value always falls within a specified range, so that systems of
different sizes can be compared
Null Value: the value is zero if there are no relationships within a module or
system
Monotonicity: the value never decreases as a result of adding relationships within a
module
Cohesive Modules: the value never increases as a result of merging unrelated
modules [6]
Coupling:
Nonnegativity: the value is never negative
Null Value: the value is zero if there are no relationships between modules in a
system
Monotonicity: the value never decreases as a result of adding relationships between
modules
Merging of Disjoint Modules: the value never increases as a result of merging
disjoint modules [6]
Two other aspects of software that can be measured are overlap of classes or
member functions, and key module status. Key module metrics indicate the relative
importance of a class to a system or of a function to a class. We propose the
following properties for valid measures of overlap and key module status.
Overlap:
Nonnegativity: the value is never negative
Null Value: the value is zero if there is no common functionality between modules
Monotonicity: adding functionality to a module never decreases the value
Merging Modules: the value never increases as a result of merging modules with
no common functionality
Key Module Status:
Nonnegativity: the value is never negative
Null Value: the value is zero if the module performs no functionality
Monotonicity: adding functionality to a module never decreases the value
Merging Modules: if two modules are merged, the resulting module has a value at
least as large as each of the values for the two modules before they were merged.
3 Proposed Semantic Metrics
Most of the original semantic metrics defined by Etzkorn and Delugach [1] rely on a
knowledge base that has a strong conceptual graph structure. Conceptual graphs are
one format for expressing information and the relationships between concepts.
However, not all knowledge bases are built using conceptual graphs.
New semantic metrics that do not depend upon a particular knowledge base
structure are needed. This work is especially needed in the area of cohesion, because
all of the current semantic cohesion metrics rely heavily on a conceptual graph
structure in the knowledge base. The metrics proposed in this paper can be used with
any knowledge base that can associate ideas with classes and class members.
To define the proposed metrics, we need a few background definitions. Let $\{C_1, C_2, \ldots, C_m\}$ be the set of classes in a system. Let $\{M_{a1}, M_{a2}, \ldots, M_{an}\}$ be the set of the $n$ member functions of class $C_a$. Let an idea be a concept or keyword in the knowledge base. For each member function $M_{ai}$, let $I_{ai}$ be the set of all ideas associated with that function. For each class $C_a$, let $I_a$ be the set of all ideas associated with that class. Let $\sim$ be a relation on the set of ideas in the knowledge base, such that $a \sim b$ if there exists a conceptual relation from $a$ to $b$, a conceptual relation from $b$ to $a$, an inference relation from $a$ to $b$, or an inference relation from $b$ to $a$.
All of the metrics proposed in this paper are indirect measurements based on the direct measurement that results from counting concepts and keywords belonging to a class or member function. That is, given a listing of all of the ideas in a knowledge base and a list of the classes and members with which each idea is associated, the sets $\{I_1, I_2, \ldots, I_m\}$ and $\{I_{a1}, I_{a2}, \ldots, I_{an}\}$ are generated. The cardinality of each of these sets is the measure of semantic mass of the corresponding class or member function.
To evaluate this measure using the criteria suggested by Kitchenham, Pfleeger, and
Fenton [5], first we must identify what we are measuring. Here, the entities are classes
and member functions, the attribute is semantic mass, the unit is the idea, and the
scale of the values is ratio. The measure meets the four aspects suggested in [5] for
valid measures as follows:
Attribute validity: because semantic mass is a measure of how many ideas are
associated with a class or member, clearly the entities of class and member have
the attribute of semantic mass.
Unit validity: idea is an appropriate unit for semantic mass.
Instrumental validity: the instrument is valid if the knowledge base associates a
class or member with the ideas within the domain that relate to the functionality of
the class or member.
Protocol validity: the measurement as described above using set theory is
unambiguous, consistent, and prevents double counting.
Based on this measurement, we define new semantic metrics that are calculated for
each class.
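To make the base measurement just described concrete, the following C++ sketch shows one possible representation of a class as the idea sets of its member functions plus the idea set of the class as a whole. It is only an illustration of the set-theoretic definitions: the type names and the banking-domain ideas are our own assumptions, not part of any particular knowledge base tool. The sketches accompanying the metric definitions below assume this same representation, repeating the small amount of shared scaffolding so that each is self-contained.

#include <cstddef>
#include <iostream>
#include <set>
#include <string>
#include <vector>

// An "idea" is a concept or keyword from the knowledge base,
// represented here simply as a string label.
using IdeaSet = std::set<std::string>;

// A class C_a is modeled by its own idea set I_a and the idea sets
// I_a1, ..., I_an of its n member functions.
struct ClassIdeas {
    std::string name;
    IdeaSet classIdeas;                 // I_a
    std::vector<IdeaSet> memberIdeas;   // I_a1, ..., I_an
};

// Semantic mass is the cardinality of an idea set.
std::size_t semanticMass(const IdeaSet& ideas) { return ideas.size(); }

int main() {
    ClassIdeas account{"Account",
                       {"balance", "withdrawal", "bounce"},
                       {{"balance", "withdrawal"}, {"bounce"}}};
    std::cout << "class mass: " << semanticMass(account.classIdeas) << "\n";
    for (const IdeaSet& m : account.memberIdeas)
        std::cout << "member mass: " << semanticMass(m) << "\n";
}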
3.1 Cohesion Metrics
Cohesion is the degree to which the functionality of a software module all goes
together or works toward a common goal. If a class is cohesive, there will be some
commonality of concepts and keywords among its functions. For purposes of the
Briand, Morasca, and Basili criteria for cohesion metrics [6], we define a relationship
between modules to be an idea common to both modules.
LDM (Logical Disparity of Members). For each pair of member functions in the
class, count all of the ideas that are not shared. That is, count the ideas belonging to
one function in the pair but not the other and divide that by the number of unique
ideas belonging to either function in the pair. Add those up and divide by the number
of pairs of functions. (This is a measure of lack of cohesion.) LDM for class $C_a$ is computed by equation 1.
\[
x_{ij} = \frac{\left| I_{ai} \,\triangle\, I_{aj} \right|}{\left| I_{ai} \cup I_{aj} \right|}, \quad \text{or } 0 \text{ if } I_{ai} \cup I_{aj} = \emptyset;
\]
\[
\mathrm{LDM} = \frac{\sum_{i=1}^{n} \sum_{j=i+1}^{n} x_{ij}}{n(n-1)/2}, \quad \text{or } 0 \text{ if } n < 2. \tag{1}
\]

where $\triangle$ denotes symmetric difference: the ideas belonging to one function in the pair but not the other.
Evaluation: The value of this metric is never negative and always falls in the range
[0,1]. Because this is a measure of lack of cohesion, the usual criterion of having a
value of zero when there are no shared ideas is inappropriate. Instead, LDM has a
value of zero when there are no unshared ideas. LDM also has a value of zero in the
trivial case where no member functions have any associated ideas, but we feel this is
appropriate given the definition of the metric: if no functions have any ideas, there is
no logical disparity among them. Similarly, because LDM is a measure of lack of
cohesion, its value never increases as a result of adding related modules within a
class, and its value never decreases as a result of merging unrelated modules.
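As a minimal sketch of the LDM calculation, assuming the representation shown earlier and reading equation 1 as an average over unordered pairs of member functions, the following C++ function sums the pairwise symmetric-difference ratios and divides by the number of pairs.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <set>
#include <string>
#include <vector>

using IdeaSet = std::set<std::string>;

// LDM: average, over all pairs of member functions, of
// |symmetric difference| / |union| of their idea sets (Eq. 1).
double ldm(const std::vector<IdeaSet>& members) {
    const std::size_t n = members.size();
    if (n < 2) return 0.0;                      // guard from Eq. 1
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            IdeaSet diff, uni;
            std::set_symmetric_difference(
                members[i].begin(), members[i].end(),
                members[j].begin(), members[j].end(),
                std::inserter(diff, diff.begin()));
            std::set_union(members[i].begin(), members[i].end(),
                           members[j].begin(), members[j].end(),
                           std::inserter(uni, uni.begin()));
            if (!uni.empty())                   // x_ij = 0 when both sets are empty
                sum += static_cast<double>(diff.size()) / uni.size();
        }
    }
    return sum / (n * (n - 1) / 2.0);           // divide by the number of pairs
}

int main() {
    std::vector<IdeaSet> members{{"balance", "withdrawal"},
                                 {"balance", "bounce"},
                                 {"interest"}};
    std::cout << "LDM = " << ldm(members) << "\n";  // ~0.889: largely disparate
}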
PSI (percentage of shared ideas). PSI is the number of ideas shared by at least two member functions of a class, divided by the number of ideas associated with any member function in the class. PSI for class $C_a$ is computed by equation 2.

\[
\mathrm{PSI} = \frac{\left| \{\, x \mid \exists\, i \neq j : x \in I_{ai} \cap I_{aj} \,\} \right|}{\left| \{\, y \mid \exists\, k : y \in I_{ak} \,\} \right|} \quad \text{for } 1 \le i, j, k \le n, \text{ or } 0 \text{ if no ideas are associated with any function of the class.} \tag{2}
\]
Evaluation: The value is never negative and always falls in the range [0,1]. The value
is zero if there are no common ideas between any member functions in the class. The
value never decreases as a result of adding common ideas between functions, and the
value never increases as a result of merging unrelated functions.
PUI (percentage of universal ideas). PUI is the number of ideas shared by all member functions of a class, divided by the number of ideas associated with any member function in the class. PUI for class $C_a$ is computed by equation 3.

\[
\mathrm{PUI} = \frac{\left| \{\, x \mid \forall\, i : x \in I_{ai} \,\} \right|}{\left| \{\, y \mid \exists\, k : y \in I_{ak} \,\} \right|} \quad \text{for } 1 \le i, k \le n, \text{ or } 0 \text{ if no ideas are associated with any function of the class.} \tag{3}
\]
Evaluation: The value is never negative and always falls in the range [0,1]. As with
PSI, PUI also fulfills the properties of null value, monotonicity and cohesive modules.
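PSI and PUI differ only in the occurrence threshold, so a single sketch can compute both by counting, for each idea, how many member functions it appears in. This is again an illustrative sketch under the same assumed representation, not a reference implementation.

#include <cstddef>
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

using IdeaSet = std::set<std::string>;

// PSI: ideas appearing in at least two member functions, over ideas
// appearing in at least one (Eq. 2).  PUI: ideas appearing in every
// member function, over ideas appearing in at least one (Eq. 3).
void psiPui(const std::vector<IdeaSet>& members, double& psi, double& pui) {
    std::map<std::string, std::size_t> occurrences;  // idea -> #functions using it
    for (const IdeaSet& m : members)
        for (const std::string& idea : m)
            ++occurrences[idea];
    std::size_t any = occurrences.size(), shared = 0, universal = 0;
    for (const auto& entry : occurrences) {
        if (entry.second >= 2) ++shared;
        if (entry.second == members.size()) ++universal;
    }
    psi = any ? static_cast<double>(shared) / any : 0.0;    // 0 if no ideas at all
    pui = any ? static_cast<double>(universal) / any : 0.0;
}

int main() {
    std::vector<IdeaSet> members{{"balance", "withdrawal"},
                                 {"balance", "bounce"},
                                 {"balance"}};
    double psi = 0.0, pui = 0.0;
    psiPui(members, psi, pui);
    std::cout << "PSI = " << psi << ", PUI = " << pui << "\n";  // 0.333, 0.333
}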
LORM2b (logical relatedness of methods 2b). LORM2b is a variation on Etzkorn and Delugach's LORM2 [1]. LORM2b performs the same calculation as LORM2 except that LORM2b counts keywords as well as concepts. LORM2b for class $C_a$ is computed by equation 4.

\[
x_{ij} = \frac{2 \left| I_{ai} \cap I_{aj} \right|}{\left| I_{ai} \right| + \left| I_{aj} \right|}, \quad \text{or } 0 \text{ if } I_{ai} \cup I_{aj} = \emptyset;
\]
\[
\mathrm{LORM2b} = \frac{\sum_{i=1}^{n} \sum_{j=i+1}^{n} x_{ij}}{n(n-1)/2}, \quad \text{or } 0 \text{ if } n < 2. \tag{4}
\]
Evaluation: The value is never negative and always falls in the range [0,1]. The value
is zero if no ideas are held in common between any functions. The value never decreases as a result of adding common ideas to functions, nor does it increase as a result of merging unrelated functions.
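A sketch of LORM2b under the same assumptions; it differs from the LDM sketch only in the pairwise term, which rewards shared ideas (equation 4) instead of penalizing unshared ones.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <set>
#include <string>
#include <vector>

using IdeaSet = std::set<std::string>;

// LORM2b: average over unordered pairs of member functions of
// 2 * |shared ideas| / (|I_ai| + |I_aj|), per Eq. 4.
double lorm2b(const std::vector<IdeaSet>& members) {
    const std::size_t n = members.size();
    if (n < 2) return 0.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j) {
            IdeaSet common;
            std::set_intersection(members[i].begin(), members[i].end(),
                                  members[j].begin(), members[j].end(),
                                  std::inserter(common, common.begin()));
            std::size_t total = members[i].size() + members[j].size();
            if (total > 0)                      // x_ij = 0 when both sets are empty
                sum += 2.0 * common.size() / total;
        }
    return sum / (n * (n - 1) / 2.0);
}

int main() {
    std::vector<IdeaSet> members{{"balance", "withdrawal"},
                                 {"balance", "bounce"}};
    std::cout << "LORM2b = " << lorm2b(members) << "\n";  // 2*1/4 = 0.5
}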
3.2 Overlap Metrics
Class overlap metrics measure the degree of overlap between the functionality of two
classes. If two classes have many ideas in common, one can assume that there is
overlap in their functionality.
PCRC (proportion of closely related classes). PCRC is the number of other classes with which this class shares over half of its ideas, divided by the number of other classes in the system. PCRC for class $C_a$ is computed by equation 5.

\[
\mathrm{PCRC} = \frac{\left| \left\{\, C_i \;:\; |\{ x \mid x \in I_i \wedge x \in I_a \}| > |I_a|/2 \,\right\} \right|}{m - 1} \quad \text{for } 1 \le i \le m,\; i \neq a, \text{ or } 0 \text{ if } m = 1. \tag{5}
\]
Evaluation: The value is never negative, and it is zero if there are no common ideas between the class being evaluated and other classes. If functionality is added, it will either add one or more classes to the set of $C_i$ above or it will do nothing to the set. Therefore, adding functionality can never decrease the value of PCRC. Also, the value never increases as a result of merging disjoint functions. However, PCRC does not meet the merging modules property. For example, suppose a system consists of classes $C_a$, $C_x$, and $C_y$ such that $I_a = \{i_1, i_2, i_3, i_4, i_5, i_6, i_7\}$, $I_x = \{i_1, i_2, i_3\}$, and $I_y = \{i_4, i_5, i_6\}$. Then $\mathrm{PCRC}(C_a) = 0$ because no other class shares over half of $C_a$'s ideas. However, if disjoint classes $C_x$ and $C_y$ are merged, the resulting class does share over half of $C_a$'s ideas, so $\mathrm{PCRC}(C_a) = 1$.
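The following sketch of PCRC (equation 5) uses the three-class counterexample from the evaluation above as its test data; the class and idea names are taken directly from that example.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <set>
#include <string>
#include <vector>

using IdeaSet = std::set<std::string>;

// PCRC for class a: fraction of the other classes that share more than
// half of class a's ideas (Eq. 5).
double pcrc(const std::vector<IdeaSet>& classes, std::size_t a) {
    const std::size_t m = classes.size();
    if (m < 2) return 0.0;                      // guard against a one-class system
    std::size_t close = 0;
    for (std::size_t i = 0; i < m; ++i) {
        if (i == a) continue;
        IdeaSet common;
        std::set_intersection(classes[a].begin(), classes[a].end(),
                              classes[i].begin(), classes[i].end(),
                              std::inserter(common, common.begin()));
        if (2 * common.size() > classes[a].size())  // shares over half of I_a
            ++close;
    }
    return static_cast<double>(close) / (m - 1);
}

int main() {
    // The counterexample from the text: C_x and C_y each share 3 of C_a's
    // 7 ideas, so PCRC(C_a) = 0; merging C_x and C_y would raise it to 1.
    std::vector<IdeaSet> classes{
        {"i1", "i2", "i3", "i4", "i5", "i6", "i7"},  // C_a
        {"i1", "i2", "i3"},                          // C_x
        {"i4", "i5", "i6"}};                         // C_y
    std::cout << "PCRC(C_a) = " << pcrc(classes, 0) << "\n";  // 0
}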
APISOC (average proportion of ideas shared with other classes). For each other class in the system, take the number of ideas shared with that class divided by the total number of unique ideas in the pair; APISOC is the average of these proportions. APISOC for class $C_a$ is computed by equation 6.

\[
x_{i} = \frac{\left| I_{a} \cap I_{i} \right|}{\left| I_{a} \cup I_{i} \right|}, \quad \text{or } 0 \text{ if } I_{a} \cup I_{i} = \emptyset;
\]
\[
\mathrm{APISOC} = \frac{\left( \sum_{i=1}^{m} x_{i} \right) - 1}{m - 1}, \quad \text{or } 0 \text{ if } m < 2 \text{ or } |I_a| = 0. \tag{6}
\]
Evaluation: The subtraction of one in the numerator may appear to introduce the
possibility of a negative value, but the one being subtracted is the overlap of the class
with itself. If this value is not one, it can only be zero, in which case the whole value
of APISOC is zero. Therefore, the value of APISOC is never negative. Similarly, if a
class has no common functionality with any other classes in the system, its value will
be 1-1=0. APISOC also meets the requirements of monotonicity and merging
modules.
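A sketch of APISOC (equation 6) under the same assumed representation. Summing the overlap of $C_a$ with every class, including itself, and then subtracting 1 mirrors the form of the equation; the guard on $|I_a| = 0$ prevents the negative value discussed in the evaluation.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <set>
#include <string>
#include <vector>

using IdeaSet = std::set<std::string>;

// Overlap between two idea sets: |intersection| / |union|.
double overlap(const IdeaSet& a, const IdeaSet& b) {
    IdeaSet common, all;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(common, common.begin()));
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(all, all.begin()));
    return all.empty() ? 0.0 : static_cast<double>(common.size()) / all.size();
}

// APISOC for class a (Eq. 6): sum the overlaps with every class, subtract
// the self-overlap of 1, and average over the other m-1 classes.
double apisoc(const std::vector<IdeaSet>& classes, std::size_t a) {
    const std::size_t m = classes.size();
    if (m < 2 || classes[a].empty()) return 0.0;    // guards from Eq. 6
    double sum = 0.0;
    for (const IdeaSet& c : classes) sum += overlap(classes[a], c);
    return (sum - 1.0) / (m - 1);
}

int main() {
    std::vector<IdeaSet> classes{{"i1", "i2", "i3", "i4"},
                                 {"i1", "i2"},
                                 {"i9"}};
    std::cout << "APISOC = " << apisoc(classes, 0) << "\n";  // (1+0.5+0-1)/2 = 0.25
}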
3.3 Complexity Metrics
The more concepts and keywords a class contains, the more complex its role is with
respect to the application domain.
CDC2 (class domain complexity 2). CDC2 is a variation on Etzkorn and Delugach's CDC [1]. CDC2 performs the same calculation as CDC except that CDC2 includes keywords as well as concepts and conceptual relations. CDC2 for class $C_a$ is computed by equation 7.

\[
\mathrm{CDC2} = \sum_{i \in I_a} \left( 1 + \left| \{\, x \in I_a \mid x \sim i \wedge x \neq i \,\} \right| \right) w_i \tag{7}
\]

where $w_i$ is a weighting factor for the complexity of an idea on the following scale: 1.0 = complex, 0.5 = average, and 0.25 = simple.
Evaluation: The value is never negative, and the direction of relationships does not
impact the value because ~ is symmetric by definition. If the CDC2 calculation is
performed on a system level rather than a class level, its value is at least the sum of
the CDC2 values for disjoint classes. However, the value is not zero unless a class has
no ideas associated with it, so CDC2 does not meet the Null Value criterion. If CDC2
is calculated on a system level rather than a class level, it also fails to have the property of Disjoint Module Additivity, because disjoint classes $C_a$ and $C_b$ could have ideas $x$ and $y$ such that $x \neq y$ but $x \sim y$. In that case, the relationship would increase the system-wide CDC2 value to more than the sum of the CDC2 values for the classes.
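A sketch of CDC2 (equation 7). Here the $\sim$ relation is assumed to be available as an explicit set of idea pairs and the complexity weights as a lookup table; how a real knowledge base supplies these is left open.

#include <cstddef>
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <utility>

using Idea = std::string;
using IdeaSet = std::set<Idea>;

// CDC2 for class a (Eq. 7): for each idea i in I_a, count 1 plus the number
// of other ideas in I_a related to i under ~, weighted by the idea's
// complexity (1.0 complex, 0.5 average, 0.25 simple).
double cdc2(const IdeaSet& classIdeas,
            const std::set<std::pair<Idea, Idea>>& related,  // the ~ relation
            const std::map<Idea, double>& weight) {          // w_i per idea
    auto tilde = [&](const Idea& x, const Idea& y) {
        // ~ is symmetric by definition: a relation in either direction counts.
        return related.count({x, y}) > 0 || related.count({y, x}) > 0;
    };
    double total = 0.0;
    for (const Idea& i : classIdeas) {
        std::size_t links = 0;
        for (const Idea& x : classIdeas)
            if (x != i && tilde(x, i)) ++links;
        total += (1.0 + links) * weight.at(i);  // every idea needs a weight entry
    }
    return total;
}

int main() {
    IdeaSet ideas{"balance", "withdrawal", "bounce"};
    std::set<std::pair<Idea, Idea>> related{{"balance", "withdrawal"}};
    std::map<Idea, double> weight{
        {"balance", 0.5}, {"withdrawal", 0.5}, {"bounce", 0.25}};
    std::cout << "CDC2 = " << cdc2(ideas, related, weight) << "\n";
    // (1+1)*0.5 + (1+1)*0.5 + (1+0)*0.25 = 2.25
}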
RCDC2 (relative class domain complexity 2). RCDC2 is a variation on Etzkorn and Delugach's RCDC [1]. RCDC2 performs the same calculation as RCDC except that RCDC2 includes keywords as well as concepts and conceptual relations. RCDC2 for class $C_a$ is computed by equation 8.

\[
\mathrm{RCDC2} = \frac{\mathrm{CDC2}(C_a)}{\max_{1 \le i \le m} \mathrm{CDC2}(C_i)}, \quad \text{or } 0 \text{ if all CDC2 values in the system are } 0. \tag{8}
\]
Evaluation: The value is never negative, and the direction of relationships does not
impact the value. However, none of the other criteria for complexity measures applies
to RCDC2, because it is a measure of relative complexity, not complexity.
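RCDC2 (equation 8) is a simple ratio once the CDC2 values are known, as the following sketch shows; the per-class CDC2 values are assumed to be precomputed.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

// RCDC2 (Eq. 8): a class's CDC2 value relative to the largest CDC2
// value of any class in the system.
double rcdc2(const std::vector<double>& cdc2Values, std::size_t a) {
    if (cdc2Values.empty()) return 0.0;
    double maxVal = *std::max_element(cdc2Values.begin(), cdc2Values.end());
    return maxVal > 0.0 ? cdc2Values[a] / maxVal : 0.0;  // 0 if all values are 0
}

int main() {
    std::vector<double> cdc2Values{2.25, 9.0, 4.5};  // precomputed CDC2 per class
    std::cout << "RCDC2(C_0) = " << rcdc2(cdc2Values, 0) << "\n";  // 0.25
}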
3.4 Key Class Metrics
If a class touches on or contains most of the ideas of the system, then that class must
perform considerable or important functionality. Such a class may be considered a
key class of the system.
KCF (key class factor). KCF is the number of ideas belonging to the class divided by the number of unique ideas belonging to any class in the system. KCF for class $C_a$ is computed by equation 9.

\[
\mathrm{KCF} = \frac{|I_a|}{\left| \{\, y \mid \exists\, k : y \in I_k \,\} \right|} \quad \text{for } 1 \le k \le m, \text{ or } 0 \text{ if no class has any ideas associated with it.} \tag{9}
\]
Evaluation: The value is never negative, and the value is zero for a class that performs
no functionality, because such a class would have no ideas associated with it. The
value could never decrease as a result of merging modules, because merging modules
would cause the numerator to increase or stay the same while not impacting the
denominator. KCF also satisfies the Merging Modules property, because the KCF of a
merged class is at least the KCF value of each of the classes before they were merged.
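A sketch of KCF (equation 9): pool the unique ideas of every class in the system, then take the class's share. It assumes the same representation as the earlier sketches.

#include <iostream>
#include <set>
#include <string>
#include <vector>

using IdeaSet = std::set<std::string>;

// KCF for class a (Eq. 9): the class's ideas as a fraction of all unique
// ideas belonging to any class in the system.
double kcf(const std::vector<IdeaSet>& classes, std::size_t a) {
    IdeaSet all;
    for (const IdeaSet& c : classes) all.insert(c.begin(), c.end());
    return all.empty() ? 0.0
                       : static_cast<double>(classes[a].size()) / all.size();
}

int main() {
    std::vector<IdeaSet> classes{{"i1", "i2", "i3"},
                                 {"i3", "i4"},
                                 {"i5"}};
    std::cout << "KCF(C_0) = " << kcf(classes, 0) << "\n";  // 3/5 = 0.6
}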
KCI2 (key class indicator 2). KCI2 is a variation on Etzkorn and Delugach's KCI [1]. KCI2 performs the same calculation as KCI except that KCI2 includes keywords as well as concepts and conceptual relations in the calculation. KCI2 for class $C_a$ is computed by equation 10.

\[
\mathrm{KCI2} = \begin{cases} 1 & \text{if } \mathrm{RCDC2}(C_a) \ge 0.75, \\ 0 & \text{otherwise.} \end{cases} \tag{10}
\]
Evaluation: KCI2 is a nominal measure, so most validation criteria do not apply.
However, it does meet the Nonnegativity and Symmetry properties of key module
measures.
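KCI2 (equation 10) reduces to a threshold test on a precomputed RCDC2 value, as in this brief sketch.

#include <iostream>

// KCI2 (Eq. 10): flags a class as key when its relative domain
// complexity reaches the 0.75 threshold.
int kci2(double rcdc2Value) { return rcdc2Value >= 0.75 ? 1 : 0; }

int main() {
    std::cout << "KCI2 = " << kci2(0.8) << "\n";  // 1: a key class
}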
4 Conclusion
Semantic metrics provide a mechanism for assessing the quality of software in the
design or implementation phases. While assessing software in the implementation
phase is important, being able to assess earlier provides added value in the ability to
correct mistakes or potential problems earlier in the software development lifecycle,
when changes are less expensive to make. Whereas most of the previous semantic
metrics required a conceptual graph-based knowledge base, the proposed set of
metrics can be calculated using any knowledge base that associates classes with ideas.
Thus, the proposed set of metrics can be computed using a broader range of
knowledge bases than could be used with previously existing semantic metrics.
5 Acknowledgements
The research in this paper was partially supported by NASA grants NAG5-12725 and
NCC8-200.
References
1. Etzkorn, L., Delugach, H.: Towards a Semantic Metrics Suite for Object-Oriented Design.
Proceedings of the 34th International Conference on Technology of Object-Oriented
Languages and Systems (2000) 71-80.
2. Etzkorn, L., Gholston, S., Hughes, W.: A Semantic Entropy Metric. Journal of Software
Maintenance and Evolution, Vol. 14, No. 4 (July/August 2002) 293-310.
3. Curtis, B., Carleton, A.: Seven ± Two Software Measurement Conundrums. Proceedings of the 2nd International Metrics Symposium (1994) 96-105.
4. Park, R.: Software Size Measurement: A Framework for Counting Source Statements.
Technical Report SEI-92-TR-20. Software Engineering Institute, Pittsburgh (1992) 136-137.
5. Kitchenham, B., Pfleeger, S., Fenton, N.: Towards a Framework for Software Measurement
Validation. IEEE Transactions on Software Engineering, Vol. 21, No. 12 (Dec. 1995) 929-944.
6. Briand, L., Morasca, S., Basili, V.: Property-Based Software Engineering Measurement.
IEEE Transactions on Software Engineering, Vol. 22, No. 1 (Jan. 1996) 68-86.