Authors:
Alex Tacuri (1); Sergio Firmenich (1, 2); Gustavo Rossi (2, 1) and Alejandro Fernandez (1)
Affiliations:
(1) LIFIA, CIC, Facultad de Informática, UNLP, Argentina
(2) CONICET, Argentina
Keyword(s):
Web Scraping, Web Browser Extensions.
Abstract:
Web browser extensions are the preferred method for end-users to modify existing web applications (and the browser itself) to fulfill unanticipated requirements. Some extensions improve existing websites using online data, combining techniques such as mashups and augmentation. When no APIs are available to obtain that data, extension developers resort to scraping. Scraping is frequently implemented with hard-coded DOM references, which makes the code fragile. It becomes even more difficult when a scraping pipeline spans several websites (i.e., the scraped result combines elements from various websites). Reusing such scraping code across different browser extensions is challenging, if not impossible. We propose a data service layer for browser extensions that encapsulates site-specific search and scraping logic and exposes object-oriented search APIs. The data service layer includes a visual programming environment for specifying data searches and creating object models, which are then exposed as a programmatic API. Using this data service layer, developers are shielded from the complexity of data search, retrieval, scraping, and composition.
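The architecture summarized above can be illustrated with a minimal TypeScript sketch. It assumes a hypothetical book domain, and every name in it (`Book`, `SiteProvider`, `StubProvider`, `BookDataService`, the `*.example` sites) is illustrative only, not the authors' API. Each provider hides one site's search and scraping logic. Here a stub that filters in-memory rows stands in for real page fetching and DOM extraction. The service queries all providers and composes fragments that describe the same item into one object, so extension code only sees `search()` and typed results.

```typescript
// Hypothetical object model exposed to extension developers.
interface Book {
  title: string;
  price?: number;
  rating?: number;
  sources: string[];
}

// A partial record scraped from a single site (title is the merge key).
type BookFragment = Partial<Book> & { title: string };

// Each provider encapsulates one site's search and scraping logic.
interface SiteProvider {
  readonly site: string;
  search(query: string): Promise<BookFragment[]>;
}

// Hypothetical stub: filters in-memory rows instead of fetching and
// querying a real page's DOM, which a real extension would do here.
class StubProvider implements SiteProvider {
  readonly site: string;
  private readonly rows: BookFragment[];

  constructor(site: string, rows: BookFragment[]) {
    this.site = site;
    this.rows = rows;
  }

  async search(query: string): Promise<BookFragment[]> {
    const q = query.toLowerCase();
    return this.rows.filter((r) => r.title.toLowerCase().includes(q));
  }
}

// The data service layer: callers never see sites, selectors, or merging.
class BookDataService {
  private readonly providers: SiteProvider[];

  constructor(providers: SiteProvider[]) {
    this.providers = providers;
  }

  async search(query: string): Promise<Book[]> {
    // Query all sites concurrently; Promise.all keeps provider order.
    const perSite = await Promise.all(this.providers.map((p) => p.search(query)));
    const merged = new Map<string, Book>();
    perSite.forEach((fragments, i) => {
      const site = this.providers[i].site;
      for (const f of fragments) {
        // Compose fragments from different sites that share a title.
        const key = f.title.toLowerCase();
        const book: Book = merged.get(key) ?? { title: f.title, sources: [] };
        if (f.price !== undefined) book.price = f.price;
        if (f.rating !== undefined) book.rating = f.rating;
        if (!book.sources.includes(site)) book.sources.push(site);
        merged.set(key, book);
      }
    });
    return Array.from(merged.values());
  }
}

// Usage: one call composes data scraped (here, stubbed) from two sites.
const service = new BookDataService([
  new StubProvider("store.example", [{ title: "Dune", price: 9.99 }]),
  new StubProvider("reviews.example", [
    { title: "dune", rating: 4.5 },
    { title: "Emma", rating: 4.0 },
  ]),
]);

service.search("dune").then((books) => console.log(books));
```

Matching by lowercased title is a deliberately naive composition rule chosen for brevity. A real service layer would need a more robust way to decide that records from different sites describe the same object.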