framework that covers accuracy and performance to
achieve the best outcome.