Authors:
Rainer Schnell and Sarah Redlich
Affiliation:
University of Duisburg-Essen, Research Methodology Group, Forsthausweg 2, 47057 Duisburg, Germany
Keyword(s):
Administrative Data, Web Data, Mortality, Undercoverage, Big Data, Obituaries.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Sensor Networks; Signal Processing; Soft Computing
Abstract:
Since access to real-world data is often tedious, web scraping has gained popularity. One health research example is the monitoring of mortality rates. We compare data from local online death notices and print-media obituaries with administrative mortality data. The web scraping process and its problems are described. The resulting estimates of death rates and demographic characteristics of the deceased differ statistically from known population values. The scraped data yielded a sample that is more male and older and contains fewer foreign nationals. Therefore, at this time, using web-scraped data instead of administrative data cannot be recommended for estimating death rates in Germany.