2 LSA FOR SUMMARIZATION
Text summarization with LSA is performed in four main steps: (1) constructing the word-by-sentence matrix, (2) applying the Singular Value Decomposition (SVD), (3) Dimension Reduction (DR), and finally (4) sentence selection for summarization. In the following, we shed more light on each of these steps.
1. Constructing the Word-by-sentence Matrix. If there are m terms and n sentences in the document, then the word-by-sentence matrix A will be an m × n matrix. The element $a_{ij}$ of this matrix is defined as $a_{ij} = L_{ij} \times G_{ij}$, where $a_{ij}$ is the element of word $w_i$ in sentence $s_j$, $L_{ij}$ is the local weight for term $w_i$ in sentence $s_j$, and $G_{ij}$ is the global weight for term $w_i$ in the whole document. The weighting scheme suggested by the LSA+TRM algorithm, hereafter referred to as the Original Weighting Technique (OWT), is given as follows:

$$L_{ij} = \log\left(1 + \frac{t_{ij}}{n_j}\right), \quad (1)$$
$$G_{ij} = 1 - E_i, \quad (2)$$
$$E_i = -\frac{1}{\log N} \sum_{j=1}^{N} t_{ij} \times \log t_{ij}, \quad (3)$$
where $t_{ij}$ is the frequency of word $w_i$ in sentence $s_j$, $n_j$ is the number of words in sentence $s_j$, $E_i$ is the normalized entropy of word $w_i$, and N is the total number of sentences in the document. Steinberger et al. (Steinberger et al., 2007) studied the influence of different weighting schemes on the summarization performance. They observed that the best performing local weight was the binary weight and the best performing global weight was the entropy weight, as follows:
$$L_{ij} = \begin{cases} 1 & \text{if } w_i \text{ occurs in sentence } s_j \\ 0 & \text{otherwise,} \end{cases} \quad (4)$$
$$G_{ij} = 1 - \frac{\sum_{j} P_{ij} \times \log P_{ij}}{\log N}, \quad (5)$$
where $P_{ij} = t_{ij}/g_i$ and $g_i$ is the total number of times that term $w_i$ occurs in the whole document D. We refer to this technique as the Modified Weighting Technique (MWT). A comparative study between the two weighting schemes was conducted to examine their effect on the summarization performance. The results show that the MWT gives its best performance under high dimension reduction as well as under low dimension reduction. We will focus on using the MWT, since we implement the LSA to gain its benefits in the dimension reduction phase. An illustrative sketch of this step is given below.
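For concreteness, the following minimal NumPy sketch constructs A under the MWT scheme, applying Eqs. (4) and (5) exactly as written above. The function name and the assumed inputs (sentences as lists of tokens, vocab mapping each term to a row index) are illustrative choices, not part of the original algorithm.

import numpy as np

def build_matrix_mwt(sentences, vocab):
    """Word-by-sentence matrix A under MWT: binary local weight (Eq. 4)
    times entropy-based global weight (Eq. 5)."""
    m, n = len(vocab), len(sentences)
    t = np.zeros((m, n))                          # raw term frequencies t_ij
    for j, sent in enumerate(sentences):
        for word in sent:
            if word in vocab:
                t[vocab[word], j] += 1
    L = (t > 0).astype(float)                     # binary local weight L_ij
    g = t.sum(axis=1, keepdims=True)              # g_i: occurrences of w_i in D
    P = np.divide(t, g, out=np.zeros_like(t), where=g > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)
    G = 1.0 - plogp.sum(axis=1) / np.log(n)       # global weight per term (Eq. 5)
    return L * G[:, None]                         # a_ij = L_ij * G_ij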
2. Applying the SVD. The SVD of an m × n matrix A is defined as $A = U \Sigma V^{T}$, where U is an m × n column-orthonormal matrix, $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$ is an n × n diagonal matrix whose diagonal elements are non-negative singular values sorted in descending order, and V is an n × n orthonormal matrix. The SVD can capture interrelationships among terms, so that terms and sentences can be clustered on a "semantic" basis rather than on the basis of words only; a sketch follows this step.
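As an illustration, a thin SVD of the word-by-sentence matrix is a single library call in NumPy; the sketch below reuses the hypothetical build_matrix_mwt from the previous step.

A = build_matrix_mwt(sentences, vocab)        # hypothetical helper from step 1
# Thin SVD: A = U @ np.diag(s) @ Vt; NumPy returns the singular values in s
# already sorted in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Each row of Vt weights the n sentences along one latent "semantic" topic.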
3. Dimension Reduction. After applying the SVD, the dimensionality of the matrices is reduced to the r most important dimensions. The initial Dimension Reduction ratio (DRr) is applied to the singular value matrix $\Sigma$. The DRr reflects the number of LSA dimensions, or topics, to be included. If too few dimensions are selected, the summary might lose important topics; however, selecting too many dimensions implies including less important topics, which act as noise and negatively affect the performance. Dimension reduction can be useful not only for reasons of computational efficiency, but also because it can improve the performance of the system. In most cases the DRr is selected manually, which we refer to as Manual Dimensionality Reduction (MDR); see the sketch after this step.
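Continuing the same sketch, a manually chosen ratio (MDR) translates into a rank-r truncation of the SVD factors; the value of drr below is purely illustrative.

drr = 0.4                                     # illustrative DRr, chosen manually (MDR)
r = max(1, int(round(drr * len(s))))          # number of LSA dimensions to keep
U_r, s_r, Vt_r = U[:, :r], s[:r], Vt[:r, :]   # rank-r truncation of the SVD factors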
4. Sentence Selection for Summarization. In this step the summary is generated by selecting a set of sentences from the original document. The LSA+TRM uses the text relationships map (TRM) for sentence selection by reconstructing the semantic matrix A′ using the new reduced dimensions. The similarity between each pair of sentences is calculated using the cosine similarity between their semantic representations. If the similarity exceeds a certain threshold, $Sim_{th}$, the pair of sentences is considered semantically related and the two sentences are linked. The significance of a sentence is measured by counting the number of valid links it has. Yeh et al. (Yeh et al., 2005) use $Sim_{th} = 1.5 \times N$ to decide whether a link should be considered a valid semantic link. A global bushy path is then established by arranging the k bushiest sentences in the order in which they appear in the original document. Finally, a designated number of sentences is selected from the global bushy path to generate the summary. A sketch of this step is given below.
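The sketch below assembles the TRM selection under the same assumptions; the function name is ours, and sim_th and k are passed in because the source specifies only how they are used, not how they are chosen.

def trm_select(U_r, s_r, Vt_r, k, sim_th):
    """Select the k bushiest sentences from the reduced semantic space."""
    A_prime = U_r @ np.diag(s_r) @ Vt_r       # reconstructed semantic matrix A'
    cols = A_prime.T                          # one semantic vector per sentence
    norms = np.linalg.norm(cols, axis=1, keepdims=True)
    unit = np.divide(cols, norms, out=np.zeros_like(cols), where=norms > 0)
    sim = unit @ unit.T                       # pairwise cosine similarities
    np.fill_diagonal(sim, 0.0)                # a sentence is not linked to itself
    links = (sim > sim_th).sum(axis=1)        # number of valid links per sentence
    bushiest = np.argsort(-links)[:k]         # the k sentences with the most links
    return sorted(bushiest)                   # document order: the global bushy path

Sorting the selected indices at the end restores the original document order, mirroring how the global bushy path arranges the bushiest sentences.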
Two main drawbacks of the LSA+TRM are:
1. Determining the DRr manually is data dependent, and is conducted based on an experimental evaluation of ratios from 0.1 × N to 0.9 × N. However, datasets usually have different documents with dif-