Chang, C.-H. and Lui, S.-C. (2001). IEPAD: Information extraction based on pattern discovery. In Proceedings of the 10th International Conference on World Wide Web, WWW ’01, pages 681–688, New York, NY, USA. ACM.
Cohen, W. W., Hurst, M., and Jensen, L. S. (2002). A flexible learning system for wrapping tables and lists in HTML documents. In WWW, WWW ’02, pages 232–241, New York, NY, USA. ACM.
Crescenzi, V., Mecca, G., and Merialdo, P. (2001). RoadRunner: Towards automatic data extraction from large web sites. In Proceedings of the 27th International Conference on Very Large Data Bases, VLDB ’01, pages 109–118, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Gusfield, D. (1997). Algorithms on strings, trees, and sequences: Computer science and computational biology. Cambridge University Press, New York, NY, USA.
Han, W., Buttler, D., and Pu, C. (2001). Wrapping web data into XML. SIGMOD Rec., 30(3):33–38.
Hao, Q., Cai, R., Pang, Y., and Zhang, L. (2011). From one tree to a forest: A unified solution for structured web data extraction. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’11, pages 775–784, New York, NY, USA. ACM.
Hsu, C.-N. and Dung, M.-T. (1998). Generating finite-state transducers for semi-structured data extraction from the web. Information Systems, 23(8):521–538.
Kushmerick, N. (2000). Wrapper induction: Efficiency and expressiveness. Artificial Intelligence, 118(1–2):15–68.
Kushmerick, N., Weld, D. S., and Doorenbos, R. B. (1997). Wrapper induction for information extraction. In Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI ’97), pages 729–737.
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady, 10(8):707–710.
Liu, B., Grossman, R., and Zhai, Y. (2003). Mining data records in web pages. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, pages 601–606, New York, NY, USA. ACM.
Muslea, I., Minton, S., and Knoblock, C. (1999). A hierarchical approach to wrapper induction. In Proceedings of the Third Annual Conference on Autonomous Agents, AGENTS ’99, pages 190–197, New York, NY, USA. ACM.
Myllymaki, J. and Jackson, J. (2002). Robust web data extraction with XML path expressions. Technical Report, May.
Pinto, D., McCallum, A., Wei, X., and Croft, W. B. (2003). Table extraction using conditional random fields. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’03, pages 235–242, New York, NY, USA. ACM Press.
Sahuguet, A. and Azavant, F. (1999). Building light-weight wrappers for legacy web data-sources using W4F. In Proceedings of the 25th International Conference on Very Large Data Bases, VLDB ’99, pages 738–741, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Sugibuchi, T. and Tanaka, Y. (2005). Interactive web-wrapper construction for extracting relational information from web documents. In Special Interest Tracks and Posters of the 14th International Conference on World Wide Web, WWW ’05, pages 968–969, New York, NY, USA. ACM Press.
Varun, S. (2011). Siloseer: A visual content extraction system.
Wang, Y. and Hu, J. (2002). A machine learning based approach for table detection on the web. In Proceedings of the Eleventh International Conference on World Wide Web, 9.
Wong, T.-L. and Lam, W. (2010). Learning to adapt web information extraction knowledge and discovering new attributes via a Bayesian approach. IEEE Transactions on Knowledge and Data Engineering, 22(4):523–536.
Zhai, Y. and Liu, B. (2005). Web data extraction based on partial tree alignment. In Proceedings of the 14th International Conference on World Wide Web, WWW ’05, pages 76–85, New York, NY, USA. ACM.
Zhu, J., Nie, Z., Wen, J.-R., Zhang, B., and Ma, W.-Y. (2005). 2D conditional random fields for web information extraction. In ICML ’05, pages 1044–1051, New York, NY, USA. ACM Press.
Zhu, J., Nie, Z., Wen, J.-R., Zhang, B., and Ma, W.-Y. (2006). Simultaneous record detection and attribute labeling in web data extraction. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’06, pages 494–503, New York, NY, USA. ACM Press.
KDIR 2014 - International Conference on Knowledge Discovery and Information Retrieval