6 CONCLUSION
As shown throughout this document, data analytics is a
promising and important field that is still growing and
being adopted by many companies around the globe. The paper
Power to the People! (Spruit & Jagesar, 2016)
represented a starting point for bringing the power of
KD, and of technology, to people who are not experts in the
area but who have other qualities that can help them
extract information as well as (or sometimes
better than) data analysts or scientists. This study followed
the same line of research, focusing on applied data
science, and proved to be significant: the good results
obtained during the evaluation of the guideline indicate
that it is of value for domain experts who want to start
exploring data in a simpler and more structured way.
Additionally, in answer to the research question
presented earlier in this study: first, the original
CRISP-DM was found to be indifferent
to the type of professional who follows it,
when in reality the type of user conducting the
analysis, together with the type of analytical
project and the data available, should determine how to
pursue an activity and which tasks to actually perform.
Thus, an adaptation of CRISP-DM was proposed that
aligns the objectives of the framework with what is
believed to be important for domain experts
(based on the interviews, data quality assessment, and
literature review). Only the activities (and
their inner tasks) that would add value to the
analysis and, at the same time, would be feasible
given the mentioned constraints were
suggested for domain experts to follow. Second,
regarding the Data Preparation phase, one cannot
prepare any data without first defining a project context
and going through the Data Understanding phase. It
was not possible to focus only on the Data Preparation
task without providing domain experts with the means and
the goals for preparing the data. Thus, to facilitate the
Data Preparation phase, the Business Understanding
and Data Understanding phases had to be addressed
and simplified as well. Third, as mentioned earlier in
this study, Data Preparation is considered to be even
more time-consuming and complicated than DM itself.
How to pursue this activity depends, most of
the time, on the project at hand and the information
available. Thus, in order to facilitate it, the goals of this
phase had to be limited to making the dataset
simpler and smaller, instead of fixing and cleaning every
possible issue, given domain experts’ time and
technical constraints. Additionally, based on the
difficulties mentioned by domain experts during the
interviews and the quality of the data that they would
be dealing with, some activities within the Data
Preparation phase were highlighted, such as Data
Integration and Data Construction, with the aim of
allowing those professionals to prepare the data
without spending more time than required on
this task. Therefore, Data Preparation for domain
experts such as healthcare professionals should not
aim to create a perfect dataset, but
rather a simpler and smaller one for further
exploration.
REFERENCES
Baraldi, A. N., & Enders, C. K. (2010). An introduction to
modern missing data analyses. Journal of School
Psychology, 48(1), 5–37.
Bibal, A., & Frénay, B. (2016). Interpretability of Machine
Learning Models and Representations. ESANN
European Symposium on Artificial Neural Networks,
27–29.
Brinkkemper, S. (1996). Method engineering: Engineering of
information systems development methods and tools.
Information and Software Technology, 38(4), 275–280.
Cao, L. (2012). Actionable knowledge discovery and
delivery. Wiley Interdisciplinary Reviews: Data Mining
and Knowledge Discovery, 2, 149–163.
Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz,
T., Shearer, C., & Wirth, R. (2000). CRISP-DM 1.0.
CRISP-DM Consortium.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From
Data Mining to Knowledge Discovery in Databases. AI
Magazine, 17(3), 37–54.
Francis, J., Johnston, M., Robertson, C., Glidewell, L.,
Entwistle, V., Eccles, M., & Grimshaw, J. (2010). What
is an adequate sample size? Operationalising data
saturation for theory-based interview studies. Psychology
and Health, 25(10), 1229–1245.
Linden, A., Vashisth, S., Sicular, S., Idoine, C., Krensky, P.,
& Hare, J. (2017). Magic Quadrant for Data Science
Platforms. G00301536. Gartner.
Luo, Q. (2008). Advancing Knowledge Discovery and Data
Mining. First International Workshop on Knowledge
Discovery and Data Mining (WKDD 2008), 7–9.
Raghupathi, W., & Raghupathi, V. (2014). Big data analytics
in healthcare: promise and potential. Health Information
Science & Systems, 2(1), 3.
Rozanski, N., & Woods, E. (2005). Software Systems
Architecture: Working with Stakeholders using
Viewpoints and Perspectives. Addison-Wesley.
Spruit, M., & Jagesar, R. (2016). Power to the People! - Meta-
Algorithmic Modelling in Applied Data Science. Proc. of
the 8th Int. Joint Conf. on Knowledge Discovery,
Knowledge Engineering and Knowledge Management,
1, 400–406.
Tsai, C.-W., Lai, C.-F., Chao, H.-C., & Vasilakos, A. V.
(2015). Big data analytics: a survey. Journal of Big Data,
2(1), 21.