purpose of these methods is to reduce the matrix
dimensionality, the resulting model will be more
efficient.
6 CONCLUSIONS AND FUTURE WORK
In this paper we present a linear programming model
for the problem of extractive summarization. We
represent the document as a sentence-term matrix whose
entries contain term count values and view this matrix
as a set of intersecting hyperplanes. Every possible
summary of a document is represented as an
intersection of two or more hyperplanes, and one
additional constraint limits the number of terms used
in a summary. We consider a summary to be the best if
it preserves term frequency, and in this case the
summarization problem translates into the problem of
finding a point on a convex polytope (defined by linear
inequalities) that is closest to the hyperplane
describing overall term frequencies in the document.
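The representation summarized above can be illustrated with a minimal Python sketch. It is not the implementation proposed in this paper: the whitespace tokenizer, the function names, the per-sentence weights in [0, 1], and the L1 distance between normalized term frequencies are all assumptions made for illustration. In particular, the distance used here only compares candidate summaries and is not the objective of the linear program itself.

```python
from collections import Counter


def sentence_term_matrix(sentences):
    """Return (terms, matrix); matrix[i][j] counts term j in sentence i.

    Tokenization by lowercasing and whitespace splitting is an assumption.
    """
    tokenized = [s.lower().split() for s in sentences]
    terms = sorted({t for toks in tokenized for t in toks})
    counts = [Counter(toks) for toks in tokenized]
    matrix = [[c[t] for t in terms] for c in counts]
    return terms, matrix


def summary_term_counts(matrix, weights):
    """Term counts of a (possibly fractional) summary: sum_i weights[i] * row_i."""
    n_terms = len(matrix[0])
    return [sum(w * row[j] for w, row in zip(weights, matrix))
            for j in range(n_terms)]


def is_feasible(matrix, weights, max_terms):
    """Check 0 <= weight <= 1 for every sentence and the term-budget constraint."""
    in_range = all(0.0 <= w <= 1.0 for w in weights)
    total = sum(summary_term_counts(matrix, weights))
    return in_range and total <= max_terms


def frequency_distance(matrix, weights):
    """L1 distance between normalized document and summary term frequencies.

    Hypothetical scoring function, used here only to compare candidates.
    """
    doc = summary_term_counts(matrix, [1.0] * len(matrix))
    summ = summary_term_counts(matrix, weights)
    doc_total, summ_total = sum(doc), sum(summ)
    if summ_total == 0:
        return float("inf")  # an empty summary preserves nothing
    return sum(abs(d / doc_total - s / summ_total) for d, s in zip(doc, summ))


# Toy document with three sentences (illustrative data only).
sentences = ["the cat sat", "the dog sat", "a bird flew away"]
terms, matrix = sentence_term_matrix(sentences)
candidate = [1.0, 0.0, 1.0]  # keep sentences 0 and 2
print(is_feasible(matrix, candidate, max_terms=7),
      frequency_distance(matrix, candidate))
```

A smaller distance indicates that the candidate's term distribution is closer to that of the whole document; selecting every sentence gives a distance of zero but usually violates the term budget.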
Linear programming problems can be solved in
polynomial time (Karmarkar, 1984; Khachiyan, 1996).
Numerous solver packages are available, such as
lp_solve (Berkelaar, 1999) and the GNU Linear
Programming Kit (Makhorin, 2000). In future research,
we plan to implement and test our approach in both
unsupervised and supervised learning settings. We
would also like to extend our model to query-based
summarization by adapting the distance function, and
to apply our text representation model to text mining
tasks such as text clustering and text categorization.
ACKNOWLEDGEMENTS
The authors thank Ruvim Lipyansky for ideas that led
to development of their approach.
REFERENCES
Alfonseca, E. and Rodriguez, P. (2003). Generating ex-
tracts with genetic algorithms. In Proceedings of the
2003 European Conference on Information Retrieval
(ECIR’2003), pages 511–519.
Berkelaar, M. (1999). lp-solve free software.
http://lpsolve.sourceforge.net/5.5/.
Filatova, E. (2004). Event-based extractive summarization.
In Proceedings of ACL Workshop on Summarization,
pages 104–111.
Gillick, D. and Favre, B. (2009). A Scalable Global Model
for Summarization. In Proceedings of the NAACL
HLT Workshop on Integer Linear Programming for
Natural Language Processing, pages 10–18.
Hassel, M. and Sjobergh, J. (2006). Towards holistic sum-
marization: Selecting summaries, not sentences. In
Proceedings of LREC - International Conference on
Language Resources and Evaluation.
Nishikawa, H., Hasegawa, T., Matsuo, Y., and Kikui,
G. (2010). Opinion Summarization with Integer Linear
Programming Formulation for Sentence Extraction
and Ordering. In Coling 2010: Poster Volume,
pages 910–918.
Karmarkar, N. (1984). New polynomial-time algorithm for
linear programming. Combinatorica, 4:373–395.
Khachiyan, L. G. (1996). Rounding of polytopes in the real
number model of computation. Mathematics of Oper-
ations Research, 21:307–320.
Khuller, S., Moss, A., and Naor, J. S. (1999). The budgeted
maximum coverage problem. Information Processing
Letters, 70(1):39–45.
Litvak, M., Last, M., and Friedman, M. (2010). A new ap-
proach to improving multilingual summarization us-
ing a Genetic Algorithm. In ACL ’10: Proceedings of
the 48th Annual Meeting of the Association for Com-
putational Linguistics, pages 927–936.
Liu, D., Wang, Y., Liu, C., and Wang, Z. (2006). Multiple
Documents Summarization Based on Genetic Algo-
rithm. In Fuzzy Systems and Knowledge Discovery,
volume 4223 of Lecture Notes in Computer Science,
pages 355–364.
Makhorin, A. O. (2000). GNU Linear Programming Kit.
http://www.gnu.org/software/glpk/.
Makino, T., Takamura, H., and Okumura, M. (2011). Bal-
anced coverage of aspects for text summarization. In
TAC ’11: Proceedings of Text Analysis Conference.
Mani, I. and Maybury, M. (1999). Advances in Automatic
Text Summarization. MIT Press, Cambridge, MA.
Ouyang, Y., Li, W., Li, S., and Lu, Q. (2011). Applying
regression models to query-focused multi-document
summarization. Information Processing and Manage-
ment, 47:227–237.
Salton, G., Yang, C., and Wong, A. (1975). A vector-space
model for information retrieval. Communications of
the ACM, 18.
Takamura, H. and Okumura, M. (2009). Text summariza-
tion model based on maximum coverage problem and
its variant. In EACL ’09: Proceedings of the 12th Con-
ference of the European Chapter of the Association for
Computational Linguistics, pages 781–789.
Woodsend, K. and Lapata, M. (2010). Automatic Genera-
tion of Story Highlights. In ACL ’10: Proceedings of
the 48th Annual Meeting of the Association for Com-
putational Linguistics, pages 565–574.
KDIR 2012 - International Conference on Knowledge Discovery and Information Retrieval