A positive sign (+) indicates an improvement,
and a negative sign (-) indicates a degradation. In
Table 3, the performance of the MOGAs is about 9%,
16%, 8%, 12%, and 13% higher than that of CRF, Manifold
Ranking, NetSum, QCS, and SVM, respectively.
In Table 4, we compare the performances of
MOGAs using different similarity measures (Cosine,
Euclidean, and NGD) to test the effectiveness of the
NGD-based dissimilarity measure. The results show that
MOGAs with NGD perform better than those with the Cosine and
Euclidean measures.
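As an illustration of the NGD-based dissimilarity compared above, the following is a minimal sketch rather than the implementation used in the experiments. It assumes that term frequencies are counted over the sentences of a document (not over Web search hits) and that the dissimilarity of two sentences is the mean NGD over all of their term pairs. The function names `ngd` and `sentence_dissimilarity` and the fallback values for terms that never co-occur are hypothetical choices made for this example.

```python
import math
from itertools import product


def ngd(term_a, term_b, sentences):
    """Normalized Google distance between two terms.

    Assumption: counts are taken over `sentences`, a list of sets of terms,
    so N is the number of sentences in the document.
    """
    if term_a == term_b:
        return 0.0
    n = len(sentences)
    f_a = sum(1 for s in sentences if term_a in s)
    f_b = sum(1 for s in sentences if term_b in s)
    f_ab = sum(1 for s in sentences if term_a in s and term_b in s)
    if f_a == 0 or f_b == 0 or f_ab == 0:
        # Assumption: terms that never co-occur are treated as maximally distant.
        return 1.0
    log_a, log_b, log_ab = math.log(f_a), math.log(f_b), math.log(f_ab)
    denominator = math.log(n) - min(log_a, log_b)
    if denominator == 0.0:
        # Both terms occur in every sentence, so they are indistinguishable.
        return 0.0
    return (max(log_a, log_b) - log_ab) / denominator


def sentence_dissimilarity(sent_i, sent_j, sentences):
    """Mean NGD over all term pairs of two sentences (assumed aggregation)."""
    if not sent_i or not sent_j:
        return 1.0  # assumption: an empty sentence is maximally dissimilar
    pairs = list(product(sent_i, sent_j))
    return sum(ngd(a, b, sentences) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    corpus = [{"a", "b"}, {"a", "b"}, {"c"}, {"a", "c"}]
    print(ngd("a", "b", corpus))
    print(sentence_dissimilarity(corpus[0], corpus[2], corpus))
```

Unlike the Cosine and Euclidean measures, this distance depends on how often terms co-occur, so two short sentences with no words in common can still be judged close.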
4 CONCLUSIONS
We have presented an automatic document
summarization method that uses sentence clustering based on
two MOGAs, NSGA-II and SPEA2, to improve
summarization performance. These MOGAs, with
the CH and DB indexes, are compared with several
existing summarization methods on the open
DUC01 and DUC02 datasets. Since conventional
document similarity measures are not suitable for
computing the similarity between sentences, the
normalized Google distance (NGD) is used.
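For readers unfamiliar with the CH (Calinski-Harabasz) and DB (Davies-Bouldin) indexes, the sketch below computes both in their standard textbook form for a given clustering. It is an illustration under stated assumptions, not the formulation used in this work: points are plain numeric vectors, distances are Euclidean rather than NGD-based, and the helper names `centroid`, `dist`, `ch_index`, and `db_index` are hypothetical. A higher CH value and a lower DB value both indicate compact, well-separated clusters.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def dist(a, b):
    """Euclidean distance (assumption: the paper's measure may differ)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ch_index(clusters):
    """Calinski-Harabasz index: between-cluster over within-cluster dispersion.

    Requires at least two clusters and more points than clusters.
    """
    all_points = [p for c in clusters for p in c]
    n, k = len(all_points), len(clusters)
    overall = centroid(all_points)
    between = sum(len(c) * dist(centroid(c), overall) ** 2 for c in clusters)
    within = sum(dist(p, centroid(c)) ** 2 for c in clusters for p in c)
    return (between / (k - 1)) / (within / (n - k))


def db_index(clusters):
    """Davies-Bouldin index: mean worst-case ratio of scatter to separation."""
    cents = [centroid(c) for c in clusters]
    scatter = [sum(dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k


if __name__ == "__main__":
    tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
    loose = [[(0, 0), (10, 0)], [(0, 1), (10, 1)]]
    print(ch_index(tight), db_index(tight))
    print(ch_index(loose), db_index(loose))
```

Because one index is maximized and the other minimized, a multi-objective search can treat them as two separate criteria instead of collapsing them into a single weighted score.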
Although MOGAs are not new as a
methodology, they have proved to be
good clustering algorithms, and they have not
yet been studied for document summarization.
We tested them against five
summarization methods on two datasets
(DUC01 with 147 and DUC02 with 567 documents) to
demonstrate their performance.
NSGA-II and SPEA2 showed higher
summarization performance than the other methods:
their performance is about 9%,
16%, 8%, 12%, and 13% higher than that of CRF, Manifold
Ranking, NetSum, QCS, and SVM, respectively.
In the near future, we will apply semantic
analysis to sentence similarity to reduce the
redundancy problem. We will also test a wider variety of cluster
indices as objective functions to
improve the clustering performance.
ACKNOWLEDGEMENTS
This research was supported by the Basic Science
Research Program through the National Research
Foundation of Korea (NRF) funded by the Ministry
of Education, Science and Technology (No. 2012-0002004)
and partially supported by the second
stage of the Brain Korea 21 Project in 2012.
REFERENCES
Shen, D., Sun, J. T., Li, H., Yang, Q., and Chen, Z., 2007.
Document summarization using conditional random
fields. In Proceedings of IJCAI. 2862-2867.
Lee, J. S., Choi, L. C., and Park, S. C., 2011. Multi-
objective genetic algorithms, NSGA-II and SPEA2,
for document clustering. Communications in
Computer and Information Science. 257:219-227.
Song, W., and Park, S. C., 2009. Genetic algorithm for
text clustering based on latent semantic indexing.
Computers and Mathematics with Applications.
57:1901-1907.
Song, W., and Park, S. C., 2010. Latent semantic analysis
for vector space expansion and fuzzy logic-based
genetic clustering. Knowledge and Information
Systems. 22:347-369.
Censor, Y., 1977. Pareto optimality in multiobjective
problems. Applied Mathematics and Optimization.
4:41-59.
Konak, A., Coit, D. W., and Smith, A. E., 2006. Multi-objective
optimization using genetic algorithms: A
tutorial. Reliability Engineering and System Safety.
91:992-1007.
Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T., 2002.
A fast elitist multiobjective genetic algorithm: NSGA-II.
IEEE Transactions on Evolutionary Computation.
6(2):182-197.
Zitzler, E., Laumanns, M., and Thiele, L., 2002. SPEA2:
Improving the strength pareto evolutionary algorithm.
Proceedings of the EUROGEN.
Cilibrasi, R. L., Vitányi, P. M. B., 2007. The Google
similarity distance. IEEE Transactions on Knowledge
and Data Engineering. 19:370-383.
Aliguliyev, R. M., 2009. A new sentence similarity
measure and sentence based extractive technique for
automatic summarization. Expert Systems with
Applications. 36 (4):7764-7772.
Calinski, T., Harabasz, J., 1974. A dendrite method for
cluster analysis. Communications in Statistics.
Davies, D. L., Bouldin, D. W., 1979. A cluster separation
measure. IEEE Transactions on Pattern Analysis and
Machine Intelligence.
Pavan, M., Pelillo, M., 2007. Dominant sets and pairwise
clustering. IEEE Transactions on Pattern Analysis and
Machine Intelligence. 29:167-172.
Fragoudis, D., Meretakis, D., and Likothanassis, S., 2005.
Best terms: an efficient feature-selection algorithm for
text categorization. Knowledge and Information
Systems.
Lin, C. Y., Hovy, E. H., 2003. Automatic evaluation of
summaries using N-gram co-occurrence statistics. In
Proceedings of the NAACL on HLT 2003. 1:71-78.
Wan, X., Yang, J., and Xiao, J., 2007. Manifold-ranking
based topic-focused multi-document summarization.
In Proceedings of the 20th international joint
conference on artificial intelligence. 2903-2908.
Svore, K. M., Vanderwende, L., and Burges, C. J. C., 2007.
Enhancing single-document summarization by
Generation of Non-redundant Summary based on Sentence Clustering Algorithms of NSGA-II and SPEA2