Author: Sébastien Ferré (homepage)
This page provides information for the reproducibility of our experiments on the use of Concepts of Nearest Neighbours (CNN) to the task of knowledge graph completion, aka. link prediction in knowledge graphs.
Given a knowledge graph whose nodes are entities, and whose edges are labelled with relations, the CNNs of an entity e are clusters of similar entities in the graph where, for each cluster, the similarity is formally represented by a Concept of Nearest Neighbours (CNN). Such a concept is composed of an intent and an extent. The intent is a conjunctive query Q (i.e. a graph pattern with a distinguished node) that characterizes what the entities in the cluster have in common with entity e. The extent is the set of answers of query Q, i.e. all entities that are at least as similar as entities in the cluster. Such concepts define a symbolic form of distance between entities, hence the analogy with k-nearest neighbours. The larger the concept intent or the smaller the concept extent, the closer (or the more similar) entities in the cluster are.
Technical details about the definition and computation of CNNs can be found in the following publications:
In this page, we consider the use of CNNs for the task of link prediction in knowledge graphs. The task consists in predicting a missing entity of a triple (head, relation, tail), i.e. either predicting the missing tail given the subject and relation, or predicting the head given the object and relation.
The source code of our implementation is available as part of the
BitBucket
repository of SEWELIS. The main relevant code is found in
branch dev
in source files src/cnn.ml
(computation of CNNs) and src/cnn_expe.ml
(experiments
based on CNNs). For the simplicity of reusing the program on other
datasets or with different parameters, we here provide
an executable binary for Linux64 Fedora 25
system (it should work on any Linux64 distribution).
Usage of the program (run ./cnn_expe --help
for more details).
./cnn_expe -train file.rdf [-valid file.rdf] -test file.ttl [-inv] [-maxprio integer] [-timeout float] [-v]
The train and valid RDF files are treated alike (no validation
phase) and can be in various formats (RDF/XML, Ntriples,
Turtle). The test file must be in Turtle format. The
flag -inv
asks to predict the heads of triples in
addition to the tails. The maxprio
option enables to
specify a maximum depth, and determines how far from the entity the
knowledge graph should be explored. It must be a positive integer
(default: no limit). The timeout is in seconds (default 1s), and is
split in two between CNN computation and prediction. The
flag -v
is for verbose more, which allows to get
explanations for the predictions.
We have experimented on 5 datasets, four classical ones, and a new one.
A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston and O. Yakhnenko, Translating embeddings for modeling multi-relational data, in: Advances in neural information processing systems, 2013, pp. 2787–2795.Here is an RDF version formatted for our program.
T. Dettmers, P. Minervini, P. Stenetorp and S. Riedel, Convolutional 2D Knowledge Graph Embeddings, in: Conf. Artificial Intelligence (AAAI), S.A. McIlraith and K.Q. Weinberger, eds, AAAI Press, 2018, pp. 1811–1818.Here is an RDF version formatted for our program.
A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston and O. Yakhnenko, Translating embeddings for modeling multi-relational data, in: Advances in neural information processing systems, 2013, pp. 2787–2795.Here is an RDF version formatted for our program.
Toutanova, K., Chen, D.: Observed versus latent features for knowledge base and text inference. In: Work. Continuous Vector Space Models and their Compositionality. pp. 57--66 (2015)Here is an RDF version formatted for our program.
dataset | depth | timeout(s) | Hits@1 | Hits@3 | Hits@10 | MRR | output log |
---|---|---|---|---|---|---|---|
WN18 | 3 | 1+1 | 0.967 | 0.970 | 0.972 | 0.969 | Download |
WN18RR | 3 | 1+1 | 0.444 | 0.481 | 0.519 | 0.469 | Download |
FB15k | 3 | 1.5+0.5 | 0.827 | 0.861 | 0.890 | 0.849 | Download |
FB15k-237 | 3 | 1.5+0.5 | 0.222 | 0.322 | 0.446 | 0.296 | Download |
Mondial | 3 | 1+1 | 0.409 | 0.511 | 0.645 | 0.485 | Download |