UCs are classified into the high, medium or low match
categories to generate classified traceability matrices.
Table 5A shows the matrices computed using the cosine
similarity measure and the LSI technique. For the cosine
similarity method, the high and medium matches for UC1
agree with the experts’ trace. The relationships of UC2,
UC3 and UC4 with CS7 fall in the medium, low and medium
match categories respectively, but the experts’ trace
suggests that these are the strongest relations for these
UCs. Relations for UC5 can be ignored as it does not have
significantly matching CSs as per the experts’ trace. The
matches indicated by the traceability matrix for the LSI
approach are valid for UC1, UC2 and UC3. Relations for
UC4 and UC5 can be ignored owing to the reasons cited
earlier. The classified traceability matrices generated
using LDA and CTM are shown in Table 5B. The trace
measure indicated by the LDA approach validates the
assertions of the experts’ trace for UC1 only. The other
links retrieved by LDA are incorrect because of
inappropriate representation in the bag of words. As for
the CTM approach, UC1, UC2 and UC3 are observed to have
acceptable matches.
Table 5: Classified traceability matrices for the CosSim, LSI, LDA and CTM approaches.
UC   CS
A    CosSim                             LSI
     High   Medium   Low                High     Medium   Low
1    4,10   14,17    7,21,13,9,12       4,10     12,5     11,13,14,9,17
2    -      7        13,11,9,16,2       -        -        5,7,13,9,19
3    -      -        7,11,13,17,16      -        -        7,13,5,19,9
4    -      7        9,13,11,20,16      -        -        13,12,9,11,5
5    14     4,10,17  21,12,9,7,13       4,10,14  17       21,12,11,5,13
B LDA CTM
High Medium Low High Medium Low
1 10 4,14,17,21 19,12,15,11,7 10,4 14,17,21 19,12,15,5,11
2 15,3,2,12,11 15,5,7,3,12
3 10,19,2,3,7 7,5,15,11,3
4 19,2,11,4,10 15,19,10,4,11
5 10,4,14,17 21,19,12,15,11 10,4,14,17 21,12,15,19,5
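The classification step behind these matrices can be illustrated with a minimal sketch. The cut-off values, the function names and the sample RI values below are all hypothetical; the thresholds actually used to separate high, medium and low matches are not stated here, so this only shows the general shape of the procedure.

```python
from typing import Dict, List, Optional

# Hypothetical cut-offs: the values used in the experiments are not given.
HIGH_CUTOFF = 0.6
MEDIUM_CUTOFF = 0.3
LOW_CUTOFF = 0.1


def classify(ri: float) -> Optional[str]:
    """Map one RI value to a match category, or None if it is too weak."""
    if ri >= HIGH_CUTOFF:
        return "High"
    if ri >= MEDIUM_CUTOFF:
        return "Medium"
    if ri >= LOW_CUTOFF:
        return "Low"
    return None


def classify_matrix(
    ri_matrix: Dict[str, Dict[str, float]]
) -> Dict[str, Dict[str, List[str]]]:
    """Turn a UC x CS matrix of RI values into a classified matrix."""
    classified: Dict[str, Dict[str, List[str]]] = {}
    for uc, row in ri_matrix.items():
        buckets: Dict[str, List[str]] = {"High": [], "Medium": [], "Low": []}
        # List the strongest links first within each category.
        for cs, ri in sorted(row.items(), key=lambda kv: kv[1], reverse=True):
            category = classify(ri)
            if category is not None:
                buckets[category].append(cs)
        classified[uc] = buckets
    return classified


# Illustrative RI values only; these are not taken from the experiments.
sample = {"UC1": {"CS4": 0.82, "CS10": 0.75, "CS14": 0.45, "CS7": 0.15, "CS2": 0.02}}
result = classify_matrix(sample)
print(result["UC1"])
```

With these assumed cut-offs, CS2 falls below the low threshold and is left out of the classified row altogether.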
4.3 Analysis of Experimental Results
The four approaches were evaluated on two case studies.
The precision and recall shown in Table 6 were calculated
without applying a threshold on RI values.
Table 6: The precision and recall table.
            Precision (%)        Recall (%)
Approach    Case I   Case II     Case I   Case II
CosSim      73       72          94       90
LSI         55       48          71       63
LDA         50       40          65       50
CTM         50       40          65       50
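For reference, precision and recall of a set of retrieved links against the experts’ trace can be computed as in the sketch below. It uses the standard definitions (correct retrieved links over all retrieved links, and over all expert links); the function name and the sample links are hypothetical and are not data from the case studies.

```python
from typing import Set, Tuple

Link = Tuple[str, str]  # (use case, class specification)


def precision_recall(retrieved: Set[Link], relevant: Set[Link]) -> Tuple[float, float]:
    """Return (precision %, recall %) of retrieved links against expert links."""
    correct = retrieved & relevant
    precision = 100.0 * len(correct) / len(retrieved) if retrieved else 0.0
    recall = 100.0 * len(correct) / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative links only.
retrieved_links = {("UC1", "CS4"), ("UC1", "CS10"), ("UC1", "CS14"), ("UC2", "CS13")}
expert_links = {("UC1", "CS4"), ("UC1", "CS10"), ("UC1", "CS14"), ("UC2", "CS7")}

precision, recall = precision_recall(retrieved_links, expert_links)
print(f"precision={precision:.0f}% recall={recall:.0f}%")
```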
For a given pair of documents, the RI computation
using the cosine measure involves all unique words in
both documents. However, it provides a relatively small
reduction in description length and reveals little
inter- or intra-document statistical structure. LSI
performs “noise reduction” by excluding term combinations
that occur less frequently in the given document
collection from the LSI subspace used to calculate RI.
The approaches that use LDA and CTM for computing the RI
are confined to the bags of words that they generate
after semantic analysis. The recall offered by the latter
two methods is poorer than that of the cosine similarity
and LSI based approaches because dominant words from
certain documents fail to appear in the bags of words.
This could be the reason why the strengths of the
traceability links differ when different approaches are
used.
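The statement that the cosine measure involves all unique words of both documents can be made concrete with a short sketch. It assumes raw term counts as vector weights and a simple lowercase tokenizer; the weighting and preprocessing used in the experiments may differ, and the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Count every word in the text (assumed tokenizer: lowercase alphanumerics)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors."""
    # Only shared terms contribute to the dot product, but every unique
    # word of each document contributes to that document's norm.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative texts only, standing in for a use case and a class specification.
uc_text = "customer places order"
cs_text = "order service validates customer order"
ri = cosine_similarity(term_vector(uc_text), term_vector(cs_text))
print(round(ri, 4))
```

Because no dimensionality reduction takes place, the vectors grow with the vocabulary, which is consistent with the small reduction in description length noted above.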
In general, the CTM approach scores over the LDA
approach in that the words collected under one bag in
CTM are not confined to a particular document, so the
inter-document relationship is conveyed by the bag of
words as well. The LDA approach was found to extract
words in a more document-specific manner, and hence
words with low frequency but high importance in some
documents did not appear in the bag. However, the two
approaches yielded the same precision and recall in the
experiments conducted.
The best traceability scheme suggested by the
experimental results is thus ‘document correlation
using cosine similarity considering connected words’.
However, a great deal of its accuracy is attributable to
the emphasis on connected words. When the same
procedure was carried out ignoring such words, the
recall was very poor and the results were erratic.
4.4 Advantages and Disadvantages
One of the most notable advantages of these approaches
is that they partially automate the task of concept
identification and relationship extraction, reducing the
burden on the user of building and maintaining trace
information. Further, all of these approaches can be
adopted during any phase of the project’s lifecycle.
Also, as the relationships are quantified, we can
differentiate between strongly related and weakly related
documents. This aids impact analysis and helps find the
minimal set of design specifications that cover a given
set of requirements. Moreover, the emphasis given to
connected words by all four approaches is advantageous
in many technical project domains. Also, the techniques
are independent of programming language and paradigm,
thus offering more flexibility and automation capability.
Further, relationship extraction using the LDA and CTM
bags of words greatly reduces the description length of a
document and also
ICEIS 2008 - International Conference on Enterprise Information Systems