framework that covers accuracy and performance to
achieve the best outcome.