Analysis approach to the output of Semi-MultiCons
for this dataset is shown in Figure 10, where the
assigned cluster for each task is represented in color.
Using the Jaccard index to compare true classes and
assigned clusters for the 474 tasks, an accuracy of
82% was obtained. It should be noted that these
initial results were obtained without tuning the
parameters of each step of the Semi-MultiCons
approach. In a second step, the Semi-MultiCons
approach was applied to a dataset of 303 064 error
tasks containing all error tasks raised by the Proration
module between January 2019 and September 2019
for a medium-sized airline customer. Due to the size
of the dataset, only partial information was available for
supervised validation of the results. However,
assuming the clustering result is correct, the estimated
proportion of similar tasks is 39.5%. With an estimated
average manual correction time of more than one
minute per task, identifying similar tasks so that their
anomalies can be corrected simultaneously may save up to
2 000 hours of manual correction activity for these
303 064 tasks.
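The two figures above can be illustrated with a minimal sketch. It assumes that the Jaccard index is computed in its pair-counting form (pairs of tasks grouped together in both labelings, divided by pairs grouped together in at least one), which may differ from the exact variant used in the experiments. It also assumes exactly one minute of manual correction per task and that the correction time of every similar task is saved. The function names pair_jaccard and estimated_hours_saved are hypothetical and introduced for illustration only.

```python
from itertools import combinations


def pair_jaccard(true_classes, clusters):
    """Pair-counting Jaccard index between two labelings of the same items.

    Assumption: this is the pair-counting variant; the source does not
    state which variant of the index was used.
    """
    if len(true_classes) != len(clusters):
        raise ValueError("labelings must have the same length")
    both = 0      # pairs together in the true classes AND in the clusters
    only_one = 0  # pairs together in exactly one of the two labelings
    for i, j in combinations(range(len(true_classes)), 2):
        same_class = true_classes[i] == true_classes[j]
        same_cluster = clusters[i] == clusters[j]
        if same_class and same_cluster:
            both += 1
        elif same_class or same_cluster:
            only_one += 1
    total = both + only_one
    # If no pair is grouped in either labeling, treat them as identical.
    return both / total if total else 1.0


def estimated_hours_saved(n_tasks, similar_rate, minutes_per_task):
    """Hours of manual correction covered by the tasks flagged as similar.

    Assumption: the manual time of every similar task is fully saved.
    """
    return n_tasks * similar_rate * minutes_per_task / 60.0


# Figures from the text: 303 064 tasks, 39.5% similar, one minute per task
# (the one-minute value is an assumed round figure).
hours = estimated_hours_saved(303_064, 0.395, 1.0)
print(f"{hours:.0f} hours")
```

Under these assumptions the estimate is consistent with the order of magnitude of 2 000 hours reported above. The quadratic pair enumeration is acceptable for a few hundred tasks such as the 474 annotated ones; for the full dataset, a contingency-table formulation would be preferable.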
These results have also shown the need to
specialize semi-supervised approaches so that they
take into account both the heterogeneous internal and
external information available as input, i.e., data
and prior knowledge, and the application
objectives from the perspective of the classes that are
to be distinguished: the potentially overlapping
properties of classes in the data space, a hierarchical
structure of application classes, the availability of
prior knowledge such as data partially annotated with
application classes, the complex processing of logs of
sequential correction actions requiring deep learning
techniques, etc. Examples of recent applications with
similar considerations in the domains of ontology
matching and document classification can be found in
(Boeva et al., 2018) and (Ippolito and Júnior, 2016).
ACKNOWLEDGMENTS
This project was carried out as part of the IDEX
UCA JEDI MC2 joint project between Amadeus and the
Université Côte d'Azur. This work has been
supported by the French government, through the
UCA JEDI Investments in the Future project managed
by the National Research Agency (ANR) with the
reference number ANR-15-IDEX-01.
REFERENCES
Agovic A., Banerjee A. Semi-supervised Clustering. In
Data Clustering: Algorithms and Applications, Chapter
20, pp. 505-534, 2013, Chapman & Hall.
Al-Najdi A., Pasquier N., Precioso F. Frequent Closed
Patterns Based Multiple Consensus Clustering. In
ICAISC'2016 International Conference on Artificial
Intelligence and Soft Computing, pp. 14-26, June 2016,
LNCS 9693, Springer.
Al-Najdi A., Pasquier N., Precioso F. Using Frequent
Closed Pattern Mining to Solve a Consensus Clustering
Problem. In SEKE'2016 International Conference on
Software Engineering & Knowledge Engineering, pp.
454-461, July 2016, KSI Research Inc. SEKE'2016
Third Place Award.
Al-Najdi A., Pasquier N., Precioso F. Multiple Consensuses
Clustering by Iterative Merging/Splitting of Clustering
Patterns. In MLDM'2016 International Conference on
Machine Learning and Data Mining, pp. 790-804, July
2016, LNAI 9729, Springer.
Al-Najdi A., Pasquier N., Precioso F. Using Frequent
Closed Itemsets to Solve the Consensus Clustering
Problem. In International Journal of Software
Engineering and Knowledge Engineering,
26(10):1379-1397, December 2016, World Scientific.
Boeva V., Angelova M., Lavesson N., Rosander O.,
Tsiporkova E. Evolutionary Clustering Techniques for
Expertise Mining Scenarios. In ICAART'2018
International Conference on Agents and Artificial
Intelligence, pp. 523-530, January 2018, SciTePress.
Boongoen T., Iam-On N. Cluster Ensembles: A Survey of
Approaches with Recent Extensions and Applications.
In Computer Science Review, vol. 28, pp. 1-25, 2018.
Dalton L., Ballarin V., Brun M. Clustering Algorithms: On
Learning, Validation, Performance, and Applications to
Genomics. In Current Genomics, 10(6):430-445, 2009,
Bentham Science Publisher.
Fahad A., Alshatri N., Tari Z., Alamri A., Khalil I., Zomaya
A., Foufou S., Bouras A. A Survey of Clustering
Algorithms for Big Data: Taxonomy and Empirical
Analysis. In IEEE Transactions on Emerging Topics in
Computing, 2(3):267-279, September 2014, IEEE
Computer Society.
Färber I., Günnemann S., Kriegel H.-P., Kröger P., Müller
E., Schubert E., Zimek A. On Using Class-Labels in
Evaluation of Clusterings. In KDD MultiClust
International Workshop on Discovering, Summarizing
and Using Multiple Clusterings, 2010.
Ghosh J., Acharya A. A Survey of Consensus Clustering.
In Handbook of Cluster Analysis, Chapter 22, pp. 497-518,
2016, Chapman and Hall/CRC.
Grira I., Crucianu M., Boujemaa N. Unsupervised and
Semi-supervised Clustering. A Brief Survey. In A
Review of Machine Learning Techniques for
Processing Multimedia Content, vol. 1, pp. 9-16, 2005.
Halkidi M., Batistakis Y., Vazirgiannis M. On Clustering
Validation Techniques. In Journal of Intelligent
Information Systems, vol. 17, pp. 107-145, 2001,
Springer.