Web Scraping Online Newspaper Death Notices for the Estimation of the Local Number of Deaths
Rainer Schnell, Sarah Redlich
2019
Abstract
Because access to real-world data is often tedious, web scraping has gained popularity. One health research application is the monitoring of mortality rates. We compare the results of local online death notices and print-media obituaries with administrative mortality data. The web scraping process and its problems are described. The resulting estimates of death rates and demographic characteristics of the deceased are statistically different from known population values. Scraped data resulted in a sample that is more male and older and contains fewer foreign nationals. Therefore, using web-scraped data instead of administrative data cannot be recommended for the estimation of death rates in Germany at this time.
Paper Citation
in Harvard Style
Schnell R. and Redlich S. (2019). Web Scraping Online Newspaper Death Notices for the Estimation of the Local Number of Deaths. In Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - Volume 5: HEALTHINF; ISBN 978-989-758-353-7, SciTePress, pages 319-325. DOI: 10.5220/0007382603190325
in BibTeX Style
@conference{healthinf19,
author={Rainer Schnell and Sarah Redlich},
title={Web Scraping Online Newspaper Death Notices for the Estimation of the Local Number of Deaths},
booktitle={Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - Volume 5: HEALTHINF},
year={2019},
pages={319-325},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007382603190325},
isbn={978-989-758-353-7},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - Volume 5: HEALTHINF
TI - Web Scraping Online Newspaper Death Notices for the Estimation of the Local Number of Deaths
SN - 978-989-758-353-7
AU - Schnell R.
AU - Redlich S.
PY - 2019
SP - 319
EP - 325
DO - 10.5220/0007382603190325
PB - SciTePress