6 CONCLUSIONS
In this work we presented a case study on the health-
care domain. Using the Hepatitis dataset, we showed
how these data can be modeled and explored in a
multi-dimensional model to promote decision sup-
port. We also discussed the use of a multi-dimen-
sional data mining algorithm to mine this model.
Results over the Hepatitis dataset show that it is
possible to mine these data and find interesting
relations between dimensions. However, due to the nature
and distribution of these data, the interesting patterns
found have very low support, and therefore further
analysis is needed. Our analysis of the discovered
association rules concluded that the examination
results present in the Hepatitis dataset, explored as
described, cannot predict the fibrosis state, mainly
due to the very low supports.
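To make the role of support concrete, the following minimal sketch computes
the support and confidence of a single rule over a toy set of examinations.
The item names (ALB=low, GPT=high, fibrosis=F4, ...) and the ten transactions
are invented for illustration only and are not taken from the Hepatitis
dataset; the helper functions are likewise hypothetical.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: List[Transaction],
               antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0.0:
        return 0.0
    return support(transactions, antecedent | consequent) / base


# Toy data (assumption): one examination per transaction, 10 in total.
exams: List[Transaction] = (
    [frozenset({"ALB=low", "fibrosis=F4"})]
    + [frozenset({"GPT=high", "fibrosis=F1"})] * 5
    + [frozenset({"GPT=normal", "fibrosis=F0"})] * 4
)

rule_if = frozenset({"ALB=low"})
rule_then = frozenset({"fibrosis=F4"})

rule_support = support(exams, rule_if | rule_then)
rule_confidence = confidence(exams, rule_if, rule_then)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

In this toy case the rule always holds when its antecedent occurs, yet it
covers a single examination out of ten, which is the kind of situation in
which a rule looks interesting but is too weakly supported to be used for
prediction.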
As future work, and in order to overcome the
difficulties of this dataset, other paths must be taken. One
of the problems comes from the lack of data and their
quality. In the Hepatitis dataset, more than 30% of
patients never underwent a biopsy (undiagnosed), and
for more than 75% of examinations there is no
information about an active biopsy. Understanding why
these patients have not undergone a biopsy requires
domain knowledge, and may help to partition the data
and improve the results. In line with the above, this
dataset contains a very low number of instances for
each type and stage of hepatitis. There is a need for
the integration and analysis of more data in this domain.
Different approaches may also lead to better outcomes,
such as infrequent pattern mining (Zhou and Yau, 2007),
for finding rare patterns, or sequential and temporal
pattern mining, for the analysis of the evolution of
the disease.
An important step should also be the discovery of
structured patterns. Instead of considering one exam
at a time, we can, for example, aggregate the data per
pair (patient, biopsy) and, using the same algorithm,
find frequent examination results that are common to
some type of hepatitis or that lead to some fibrosis
state. These structured patterns can also be used as
training data, in a further step, to improve classifica-
tion results, and therefore improve the prediction of
new hepatitis cases.
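The aggregation per pair (patient, biopsy) described above can be sketched as
follows. This is only an illustration under stated assumptions: the record
layout, the item names and the thresholds are hypothetical, and the
brute-force itemset counting stands in for the multi-dimensional algorithm
used in this work, which it does not reproduce.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical records: (patient_id, biopsy_id, exam_result, fibrosis_stage).
Record = Tuple[str, str, str, str]

records: List[Record] = [
    ("p1", "b1", "GPT=high", "F1"),
    ("p1", "b1", "ALB=low", "F1"),
    ("p2", "b2", "GPT=high", "F1"),
    ("p2", "b2", "ALB=low", "F1"),
    ("p3", "b3", "GPT=high", "F0"),
    ("p3", "b3", "ALB=normal", "F0"),
]


def aggregate(rows: List[Record]) -> Dict[Tuple[str, str], Set[str]]:
    """Group all examination results of one (patient, biopsy) pair together."""
    baskets: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for patient, biopsy, result, fibrosis in rows:
        key = (patient, biopsy)
        baskets[key].add(result)
        baskets[key].add(f"fibrosis={fibrosis}")
    return dict(baskets)


def frequent_itemsets(baskets: Dict[Tuple[str, str], Set[str]],
                      min_support: float,
                      max_size: int = 3) -> Dict[Tuple[str, ...], float]:
    """Brute-force count of itemsets up to `max_size`, kept if frequent."""
    counts: Counter = Counter()
    for items in baskets.values():
        ordered = sorted(items)  # canonical order so each itemset has one key
        for size in range(1, max_size + 1):
            for combo in combinations(ordered, size):
                counts[combo] += 1
    total = len(baskets)
    return {combo: n / total
            for combo, n in counts.items()
            if n / total >= min_support}


baskets = aggregate(records)
patterns = frequent_itemsets(baskets, min_support=0.6)
for itemset, sup in sorted(patterns.items()):
    print(itemset, round(sup, 2))
```

Each basket now describes a whole biopsy rather than a single exam, so an
itemset that combines several examination results with a fibrosis stage
corresponds to one structured pattern. The enumeration is exponential in the
basket size and is only suitable for small examples.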
ACKNOWLEDGEMENTS
This work is partially supported by FCT – Fundação
para a Ciência e a Tecnologia, under research
project D2PM (PTDC/EIA-EIA/110074/2009) and
PhD grant SFRH/BD/64108/2009.
REFERENCES
Crestana-Jensen, V. and Soparkar, N. (2000). Frequent
itemset counting across multiple tables. In Proc. of
the 4th Pacific-Asia Conf. on Knowl. Discovery and
Data Mining, pages 49–61, London. Springer.
Dehaspe, L. and Raedt, L. D. (1997). Mining association
rules in multiple relations. In ILP 97: Proc. of the 7th
Intern. Workshop on Inductive Logic Programming,
pages 125–132, London, UK. Springer.
Džeroski, S. (2003). Multi-relational data mining: an
introduction. SIGKDD Explor. Newsl., 5(1):1–16.
Frawley, W. J., Piatetsky-Shapiro, G., and Matheus, C. J.
(1992). Knowledge discovery in databases: an
overview. AI Mag., 13(3):57–70.
Kaur, H. and Wasan, S. (2006). Empirical study on applica-
tions of data mining techniques in healthcare. Journal
of Computer Science, 2(2):194–200.
Koh, H. and Tan, G. (2005). Data mining applications in
healthcare. Journal of Healthcare Information Man-
agement, 19(2):64–71.
Ng, E. K. K., Fu, A. W.-C., and Wang, K. (2002). Mining
association rules from stars. In ICDM 02: Proc. of
the 2002 IEEE Intern. Conf. on Data Mining, pages
322–329, Japan. IEEE.
Pizzi, L., Ribeiro, M., and Vieira, M. (2005). Analysis
of hepatitis dataset using multirelational association
rules. In ECML/PKDD 2005 Discovery Challenge,
Porto, Portugal.
Silva, A. and Antunes, C. (2010). Pattern mining on stars
with fp-growth. In MDAI 10: Proc. of the 7th In-
tern. Conf. on Modeling Decisions for Artificial Intel-
ligence, pages 175–186, Perpignan, France. Springer.
Silva, A. and Antunes, C. (2012). Finding patterns in large
star schemas at the right aggregation level. In Proc. of
the 9th Intern. Conf. on Modeling Decisions for Arti-
ficial Intelligence, pages 329–340, Spain. Springer.
Srikant, R. (1996). Fast algorithms for mining association
rules and sequential patterns. PhD thesis, University
of Wisconsin, Madison. Supervisor-Jeffrey Naughton.
Watanabe, T., Susuki, E., Yokoi, H., and Takabayashi, K.
(2003). Application of prototypelines to chronic hep-
atitis data. In ECML/PKDD 2003 Discovery Chal-
lenge, Cavtat, Croatia.
Xu, L.-J. and Xie, K.-L. (2006). A novel algorithm for fre-
quent itemset mining in data warehouses. Journal of
Zhejiang University - Science A, 7(2):216–224.
Zhou, L. and Yau, S. (2007). Efficient association
rule mining among both frequent and infrequent
items. Computers and Mathematics with Applications,
54(6):737–749.
ICEIS 2014 - 16th International Conference on Enterprise Information Systems