discovery of information that enables us to gain valuable
insights from Big Data. AI claims to be in a position
to provide information consumers with useful insights.
There is no doubt that data mining, machine learning,
deep learning, etc. are essential in data science.
Their methods and tools produce a lot of useful data,
which, in turn, lead to the expected insights. However,
focusing exclusively on AI would restrict us to the
insights that we can gain from the results of AI tools:
the capabilities of the available AI tools would
determine the range of insights.
By viewing data science as the discipline of information
discovery performed according to the requirements
of information consumers and managed by data scientists,
we bring humans into the loop of technology
design and use. The user-centred approach shifts the
focus from a purely data-driven approach towards a
problem-driven approach. We see practical data science
as the result of a cooperative effort of a discovery
team that represents domain and technical knowledge
and is managed by the data scientist. In this
article we sketched the profession of the data scientist
from this perspective, which conceives of the data
scientist as a master of the information discovery
lifecycle rather than as a data mining expert or the like.
Data scientists do not construct models for all
kinds of analysis tools and reasoning systems, but they
should be clearly aware of the information those systems
are presumed to produce. They should constantly
check the results for false positives, in particular
when personal data are involved and/or expectations
of the outcome of the data analysis are high. The data
scientist is a scientist. She or he should take care that
information is put to good use. We can expect that she
or he assumes ethical responsibility and makes sure
that the information provided is reliable. The “everything
is possible” attitude that shines through in many Big
Data discussions is dangerous and raises false
expectations. The polls during the recent elections in the USA
showed that sometimes even the whole statistical and
data mining machinery of an entire country can fail
quite embarrassingly. Data scientists need to develop
a sound sense of plausibility that helps them to
raise doubts and to prompt a closer look when the
results of data analysis seem questionable to them.
Finally, we advocate a stronger focus in data science
education on text mining, information visualization,
the ethical aspects raised by Big Data, and the
management of information discovery.