Authors:
Siham Eddamiri 1; El Moukhtar Zemmouri 1 and Asmaa Benghabrit 2
Affiliations:
1 LM2I Laboratory, Moulay Ismail University, ENSAM, Meknes, Morocco; 2 LMAID Laboratory, Mohammed V University, ENSMR, Rabat, Morocco
Keyword(s):
Machine Learning, Linked Data, RDF, Clustering, Word2vec, Doc2vec, K-means.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Clustering and Classification Methods; Computational Intelligence; Evolutionary Computing; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Mining Text and Semi-Structured Data; Soft Computing; Symbolic Systems
Abstract:
With the growing amount of Linked Data on the Web over the past decade, the machine learning community has become increasingly interested in bringing this type of data into the fold. However, while both Linked Data and Machine Learning have seen explosive growth in popularity, the literature has paid relatively little attention to combining the two. The best way to bring these two fields together is to focus on RDF data. After a thorough overview of the machine learning pipeline on RDF data, the paper presents an unsupervised feature extraction technique named Walks and two language modeling approaches, namely Word2vec and Doc2vec. To adapt the RDF graph to the clustering mechanism, we first apply the Walks technique to several sequences of entities, combining it with the Word2vec approach. However, applying the Doc2vec approach to a set of walks gives better results on two different datasets.
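As an illustration of the walk-extraction step, the following is a minimal sketch of how sequences could be drawn from an RDF graph. It is not the authors' implementation: the abstract does not specify the walk strategy, so uniform random walks over outgoing edges are an assumption, and the toy triples, the function names (build_adjacency, random_walks) and the parameters (depth, n_walks, seed) are all hypothetical.

```python
import random
from collections import defaultdict

# Toy RDF-like triples (subject, predicate, object).
# All identifiers are hypothetical and only serve the illustration.
TRIPLES = [
    ("ex:Alice", "ex:worksAt", "ex:LabA"),
    ("ex:Bob", "ex:worksAt", "ex:LabA"),
    ("ex:LabA", "ex:locatedIn", "ex:Meknes"),
    ("ex:Carol", "ex:worksAt", "ex:LabB"),
    ("ex:LabB", "ex:locatedIn", "ex:Rabat"),
]


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return adjacency


def random_walks(adjacency, start, depth, n_walks, seed=0):
    """Generate n_walks walks of at most `depth` hops starting at `start`.

    Each walk is a list of tokens: entity, predicate, entity, ...
    Assumption: the next edge is chosen uniformly at random.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(n_walks):
        walk = [start]
        node = start
        for _ in range(depth):
            edges = adjacency.get(node)
            if not edges:  # dead end: no outgoing edges
                break
            predicate, node = rng.choice(edges)
            walk.extend([predicate, node])
        walks.append(walk)
    return walks


adjacency = build_adjacency(TRIPLES)
walks_by_entity = {
    entity: random_walks(adjacency, entity, depth=2, n_walks=3)
    for entity in ("ex:Alice", "ex:Bob", "ex:Carol")
}
```

Under this sketch, each walk plays the role of a sentence whose tokens could be fed to a Word2vec model, while the set of walks starting at one entity could be treated as a single document for a Doc2vec model. The resulting entity vectors would then be passed to K-means for clustering.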