Table 7: Evaluation results for the Link Prediction task on the Mushroom dataset (Entities=8487, Relations=23,
Triples={153057, 9525, 9564, 18942} for training, validation, tuning and test sets): Mean Reciprocal Rank (MRR), Mean
Rank (MR), Hits@1, Hits@3, Hits@5, Hits@10, and Accuracy (equivalent to Hits@1 on predicting the relation has class).

Model            MRR     MR       Hits@1   Hits@3   Hits@5   Hits@10   Accuracy
TransE           0.565   472.32   0.466    0.643    0.682    0.718     53.1%
HEXTRATO (H4)
  k = 8          0.717   2.054    0.553    0.856    0.955    0.993     88.6%
  k = 16         0.763   1.856    0.619    0.892    0.961    0.994     89.3%
  k = 32         0.804   1.712    0.683    0.914    0.964    0.994     90.7%
  k = 64         0.814   1.688    0.703    0.915    0.965    0.996     95.3%
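The ranking metrics reported in Table 7 can be computed from the rank assigned to the correct entity for each test triple. The following is a minimal sketch under that assumption; the function name and the example ranks are hypothetical and are not taken from the paper's evaluation code.

```python
def ranking_metrics(ranks, ks=(1, 3, 5, 10)):
    """Compute MRR, MR and Hits@k from 1-based ranks of the correct entity."""
    n = len(ranks)
    if n == 0:
        raise ValueError("ranks must not be empty")
    metrics = {
        # Mean Reciprocal Rank: average of 1/rank (higher is better).
        "MRR": sum(1.0 / r for r in ranks) / n,
        # Mean Rank: average rank (lower is better).
        "MR": sum(ranks) / n,
    }
    for k in ks:
        # Hits@k: fraction of test triples ranked within the top k.
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Hypothetical ranks for four test triples (illustrative only).
example_ranks = [1, 2, 4, 10]
example_metrics = ranking_metrics(example_ranks)
```

Because MR averages raw ranks, a few badly ranked triples can dominate it, whereas MRR and Hits@k are bounded between 0 and 1.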
version of Freebase, on a publicly available dataset,
and on two domain-specific datasets show that
HEXTRATO outperforms previous state-of-the-art
methods in the link prediction task when using
categorised entities. Some of the directions in which this
work can be extended include:
TransE-like extended models. Learning embedding
representations from more structured knowledge
sources can benefit from their inherent enriched
metadata. HEXTRATO is a constraint-based method that
extends TransE in order to obtain an initial baseline
for the evaluation task when dealing with
domain-specific categorised datasets. We plan to evaluate our
method coupled with more complex embedding
models derived from TransE.
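For reference, TransE scores a triple (h, r, t) by how closely the translated head h + r lands on the tail t. The sketch below shows only that baseline scoring function; it does not include the HEXTRATO constraints, and the function name and toy vectors are illustrative assumptions.

```python
import math


def transe_score(head, relation, tail, norm=2):
    """TransE dissimilarity ||h + r - t||; lower means more plausible."""
    diffs = [h + r - t for h, r, t in zip(head, relation, tail)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    return math.sqrt(sum(d * d for d in diffs))


# Toy 2-dimensional embeddings (hypothetical values).
h = [1.0, 0.0]
r = [0.0, 1.0]
t_true = [1.0, 1.0]     # h + r equals t exactly, so the score is 0
t_corrupt = [4.0, 5.0]  # a corrupted tail lies further away

score_true = transe_score(h, r, t_true)
score_corrupt = transe_score(h, r, t_corrupt)
```

Models derived from TransE generally keep this translation idea and change the space in which h, r and t are compared.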
Many-to-many relationships. Normalising N:N
relations can make an embedding model more flexible.
However, it adds an additional level of complexity
in terms of learning semantically related entities.
Although preliminary experiments did not show an effective
improvement over previously applied constraints, we
believe further investigation can determine whether
more specific conditions can lead our model to
better results.
Activation functions. More complex embedding
models deal with projection matrices and rely on
simple linear neural networks. We plan to investigate
whether coupling ontology-based constraints with
non-linear activation functions, such as
ReLU, Sigmoid, or Tanh, can improve
embedding model performance on domain-specific datasets.
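To make the three activation functions concrete, the sketch below defines them and applies one element-wise to the translated head and to the tail before measuring their distance. This placement of the activation is purely an assumption made for illustration; the paper leaves the actual design to future work, and all names below are hypothetical.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Candidate non-linearities named in the text.
ACTIVATIONS = {"relu": relu, "sigmoid": sigmoid, "tanh": math.tanh}


def activated_score(head, relation, tail, activation="tanh"):
    """Assumed variant: L2 distance between f(h + r) and f(t)."""
    f = ACTIVATIONS[activation]
    return math.sqrt(
        sum((f(h + r) - f(t)) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Toy vectors (hypothetical): h + r equals t, so every variant scores 0.
toy_score = activated_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], "sigmoid")
```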
Hybrid approaches. Distinct sets of relation
embeddings may be learnt more effectively
by distinct approaches. Combining different methods
into a hybrid approach, in which each relation can be
represented by a distinct embedding model, may improve
on state-of-the-art results and yield models that
are more flexible in learning the distinct types of
relationships between entities within a dataset.
Unseen entities. A fundamental assumption of
any machine learning model is
that the resulting model can generalise.
Embedding models are weak in this respect:
validation and test sets must be built
from entities and relations that appear at least once in
the training set. We plan to investigate how
embedding models coupled with ontology-based constraints
can be used to learn low-dimensional embedding representations
for unseen entities during the validation, tuning and
test steps.
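The restriction described above, that evaluation triples may only use entities and relations seen during training, can be checked with a simple filter. This is a minimal sketch assuming triples are (head, relation, tail) tuples; the function name and the toy triples are hypothetical.

```python
def split_seen_unseen(train_triples, eval_triples):
    """Separate evaluation triples by whether all their symbols occur in training."""
    entities, relations = set(), set()
    for head, relation, tail in train_triples:
        entities.update((head, tail))
        relations.add(relation)

    seen, unseen = [], []
    for head, relation, tail in eval_triples:
        known = head in entities and tail in entities and relation in relations
        (seen if known else unseen).append((head, relation, tail))
    return seen, unseen


# Toy triples (illustrative identifiers only).
train = [("m1", "has_class", "edible"), ("m2", "has_class", "poisonous")]
evaluation = [("m1", "has_class", "poisonous"), ("m3", "has_class", "edible")]

seen_triples, unseen_triples = split_seen_unseen(train, evaluation)
```

Triples in the second list are the ones a standard embedding model cannot score, since no vector was learnt for at least one of their symbols.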
REFERENCES
Bengio, Y., Larochelle, H., and Vincent, P. (2005).
Non-local manifold Parzen windows. In Weiss, Y.,
Schölkopf, B., and Platt, J., editors, Advances in
Neural Information Processing Systems 18 (NIPS’05),
Cambridge, MA. MIT Press.
Bollacker, K., Evans, C., Paritosh, P., Sturge, T., and Taylor,
J. (2008). Freebase: A collaboratively created graph
database for structuring human knowledge. In
Proceedings of the 2008 ACM SIGMOD International
Conference on Management of Data, SIGMOD ’08, pages
1247–1250, New York, NY, USA. ACM.
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., and
Yakhnenko, O. (2013). Translating embeddings for
modeling multi-relational data. In Burges, C. J. C.,
Bottou, L., Welling, M., Ghahramani, Z., and
Weinberger, K. Q., editors, Advances in Neural Information
Processing Systems 26, pages 2787–2795. Curran
Associates, Inc.
Bordes, A., Weston, J., Collobert, R., and Bengio, Y.
(2011). Learning structured embeddings of
knowledge bases. In Conference on Artificial Intelligence.
Dong, X., Gabrilovich, E., Heitz, G., Horn, W., Lao, N.,
Murphy, K., Strohmann, T., Sun, S., and Zhang, W.
(2014). Knowledge vault: A web-scale approach to
probabilistic knowledge fusion. In Proceedings of
the 20th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, KDD ’14,
pages 601–610, New York, NY, USA. ACM.
Fellbaum, C., editor (1998). WordNet: an electronic lexical
database. MIT Press.
Gardner, M. and Mitchell, T. (2015). Efficient and
expressive knowledge base completion using subgraph
feature extraction. In Proceedings of the 2015
Conference on Empirical Methods in Natural Language
Processing, pages 1488–1498. Association for
Computational Linguistics.
KDIR 2018 - 10th International Conference on Knowledge Discovery and Information Retrieval