ArabRelat: Arabic Relation Extraction using Distant Supervision

Reham Mohamed, Nagwa M. El-Makky and Khaled Nagi

Department of Computer and Systems Engineering, Alexandria University, Alexandria, Egypt

Keywords:

Relation Extraction, Linked Data, DBpedia.

Abstract:

Relation Extraction is an important preprocessing task for a number of text mining applications, including Information Retrieval, Question Answering and Ontology building. In this paper, we propose a novel Arabic relation extraction method that leverages linguistic features of the Arabic language in Web data to infer relations between entities. Due to the lack of labeled Arabic corpora, we adopt the idea of distant supervision, where DBpedia, a large database of semantic relations extracted from Wikipedia, is used along with a large unlabeled text corpus to build the training data. We extract the sentences from the unlabeled text corpus and tag them using the corresponding DBpedia relations. Finally, we build a relation classifier using this data, which predicts the relation type of new instances. Our experimental results show that the system reaches an F-measure of 70% in detecting relations.

1 INTRODUCTION

Relation Extraction (RE) is the task of extracting semantic relations between entities from plain text. RE is one of the important tasks in computational linguistics and is considered a preprocessing task for a number of applications, such as information retrieval (Waitelonis and Sack, 2012; Hsu et al., 2012), question answering (Unger et al., 2012; Yahya et al., 2012) and ontology building (Gupta et al., 2014). Although there are several linked data resources for different languages, Arabic resources are still very limited. Therefore, there is a great need for automated methods that extract relations from Arabic text to enrich Arabic linked data. While several relation extraction systems have been proposed for English, Arabic RE systems are still very limited due to the lack of tagged corpora and the challenges of the Arabic language.

Among the challenges of the Arabic language is that Arabic is highly inflectional and derivational, which makes its morphological analysis a complex task. It is inflectional in that each word consists of a root and zero or more affixes (prefix, infix, suffix), and derivational in that Arabic words are derived from root verbs of three or four characters. Arabic is also characterized by diacritical marks (short vowels): the same word with different diacritics can express different meanings, and because diacritics are usually omitted, ambiguity increases greatly. In addition, the absence of capital letters in Arabic is an obstacle to accurate named entity recognition. Existing attempts at building Arabic RE systems rely on small tagged corpora and are limited to a set of relations and specific domains. Examples of these challenges are shown in Figure 1.

In this paper, we introduce ArabRelat, an Arabic relation extraction system that adopts distantly supervised learning. In distant supervision, a large database of semantic relations is used along with a corpus of unlabeled Web data, such as Wikipedia. The corpus is used to extract sentences that contain the entities of each relation. These sentences, tagged with the corresponding relation types, are used to build the training data. Finally, the system uses this data to train a relation classifier that predicts the relation type of new instances. Several features are extracted from the Arabic sentences to build the classifier. Among these features, we extract a set of rich Arabic-specific features that characterize relations in the Arabic language.

We evaluate the system using Arabic DBpedia as the database of semantic relations and Wikipedia as the untagged corpus. Our results show that the system achieves a 70% F-measure for extracting 97 relation types, which suggests its applicability to general relation extraction.

Our main contributions can be summarized as follows:

1. Building an Arabic relation extraction system using distantly supervised learning.

2. Constructing a relation classifier that predicts the relation type of unseen instances.

3. Introducing new Arabic-specific features that characterize relations in the Arabic language.

Mohamed, R., El-Makky, N. and Nagi, K.

ArabRelat: Arabic Relation Extraction using Distant Supervision.

In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2015) - Volume 2: KEOD, pages 410-417

ISBN: 978-989-758-158-8

The rest of the paper is organized as follows: Section 2 reviews work related to our system. Section 3 describes the Arabic corpora used to construct the training data. Section 4 details the system architecture. Section 5 describes the features used to build the relation classifier. In Section 6, we present the results of the system evaluation. Finally, Section 7 concludes the paper.



[Figure 1 panels: (a) Parsing: one Arabic word glossed as “and”, “open”, “they”, “it”; (b) Root: جرخ; (c) Gloss: “grandfather”, “seriousness”, “be generous”.]

Figure 1: Examples of the challenges of the Arabic language. (a) One Arabic word that is split into four English words. (b) An example of a three-character root of an Arabic word. (c) Diacritics can change the meaning of the same written word.

2 RELATED WORK

Several approaches to automatic relation extraction have been proposed in the literature. These approaches can be classified into supervised, unsupervised and distantly supervised techniques. (Snow et al., 2004) built a relation classifier that makes a binary decision of whether two nouns are related by a hypernym (is-a) relation. Given a training set of text containing hypernym pairs, the algorithm automatically extracts useful dependency paths and applies them to new corpora to identify novel pairs. (Banko et al., 2007) built a domain-independent system for discovering relations in text that scales to the diversity and size of the Web corpus. The system uses self-learning: given a small corpus sample, it outputs a classifier that labels candidate extractions as “trustworthy” or not, without hand-tagged data.

(Mintz et al., 2009) introduced the idea of distant supervision. That work uses Freebase to provide distant supervision for relation extraction, based on the assumption that any sentence containing a pair of entities that participate in a known Freebase relation is likely to express that relation in some way. (Nguyen and Moschitti, 2011) proposed a joint model combining distantly supervised data with manually annotated data from ACE. Their system shows good accuracy for extracting 52 relation types, which suggests the applicability of distant supervision to general RE. In (Fan et al., 2014), distantly supervised relation extraction was solved as a matrix completion problem. (Yao et al., 2012) use an unsupervised approach to handle the problem of polysemy, where the same pattern can have several meanings. Their approach employs local and global features to induce pattern senses by clustering feature representations of pattern contexts.

All of the previous systems were built for English. Few systems have been proposed for Arabic, and all of them depend on small tagged corpora. (Alsaif and Markert, 2011) presented an algorithm to identify explicit discourse connectives and the relations they signal in Arabic text. They annotated news articles from the Arabic Penn Treebank to build their system. (Kambhatla, 2006) built a minority voting scheme among a committee of classifiers to enhance the recall of the relation classifier. This system was trained and tested using the datasets of the ACE 2004 relation extraction task for English, Arabic and Chinese (NIST, 2003).

On the other hand, some systems have studied the linguistic features of the Arabic language and used these rich features to extract useful information. For example, (Diab et al., 2008) used rich Arabic morphological features to predict the semantic roles in Arabic text. (Green and Manning, 2010) studied Arabic linguistic features to achieve better parsing of Arabic text. To the best of our knowledge, our paper is the first work to exploit rich Arabic linguistic features to extract relations without using any tagged data.

3 ARABRELAT CORPORA

In this section, we describe the Arabic corpora that we use to build our system.

3.1 Arabic Wikipedia

Wikipedia is one of the most commonly used resources in computational linguistics. Its appeal stems from its large size, its diversity and its being always up to date. The Arabic version of Wikipedia has over 350,000 articles and is currently the 21st largest edition of Wikipedia (https://en.wikipedia.org/wiki/Arabic_Wikipedia).

[Figure 2 components: DBpedia; Wikipedia; IR Module (Document Retrieval, Sentence Extractor); Features Extractor; Relation Classifier (instance, classifier, voting); output: Relation (name) or Not relation.]

Figure 2: System Architecture.

3.2 Arabic DBpedia

DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. It is one of the largest linked data resources. The structured data are extracted from the infoboxes of Wikipedia pages, which are given as parameter-value pairs. However, the names of these parameters are inconsistent, since the same parameter can be expressed using different names, such as birthplace and placeofbirth. Therefore, the DBpedia Mapping Language has been developed to map these properties to an ontology. DBpedia is offered in many languages. An Arabic DBpedia release is still unavailable on the official DBpedia website, but some unofficial dumps are available.

We use DBpedia as the database of semantic relations, along with Wikipedia as the unlabeled text corpus, to build our distantly supervised training data.

4 SYSTEM ARCHITECTURE

The system architecture is shown in Figure 2. The goal is to build a system that extracts new relations from a large text corpus, such as Wikipedia, such that the relations are domain independent and the system does not rely on previously tagged data. Therefore, we adopt the idea of distant supervision, where a large database of linked data is used to build a training set from its relations and their entities. For this purpose, we use DBpedia to obtain the training relations and extract the corresponding sentences from Wikipedia to build the training set.

The system can be divided into two stages: information retrieval and relation classification. First, the DBpedia relations and Wikipedia pages are fed into an Information Retrieval (IR) module. The main function of the IR module is to retrieve the pages that are semantically related to each relation. We use Wikipedia as an ontology of concepts to build an inverted index of Wikipedia terms. Then, we convert each relation with its entities into a vector of concepts to retrieve the most relevant pages. The IR module then extracts the sentences that contain the entities of the relation from the retrieved pages. The extracted sentences are tagged according to the relation type. The sentences are then passed to a features extraction module, which extracts the different features using a morphological analyzer and a dependency parser.

In the second stage, we build a relation classifier that predicts the relation type between two unseen entities. We use the training data constructed in the previous stage to train an SVM classifier, which classifies a Wikipedia sentence containing the two entities into a relation type after extracting its features. To make the system more robust, we classify all the sentences that include the two entities in question. Then, we use a voting scheme that selects the relation class with the highest total confidence over these sentences. In the next subsections, we explain each module in detail.

4.1 IR Module

As an initial step, we extract the sentences from Wikipedia that correspond to a DBpedia relation. These sentences are used as training data tagged with the relation type; this process is referred to as distant supervision. Due to the high ambiguity of the Arabic language, where one word may have several meanings, we do not match a sentence to a relation directly. Instead, we use a semantic information retrieval (IR) module to retrieve the documents, i.e., Wikipedia pages, that are relevant to the relation. Then we extract the sentences from the retrieved documents. This way, we help ensure that the extracted sentences are semantically related to the relation.

KEOD 2015 - 7th International Conference on Knowledge Engineering and Ontology Development

The details of the IR module are illustrated in Figure 3. We use Wikipedia as an ontology of concepts, where each Wikipedia page represents a concept (Gabrilovich and Markovitch, 2007). Then, we build an inverted index for all Wikipedia terms, such that each term is represented as a weighted vector of concepts. When a query enters the system, it is parsed into terms. Finally, the query is interpreted as a single weighted vector of concepts, derived from the vectors of its terms using cosine similarity. The concepts (pages) with the highest weights in the final vector are the most relevant to the query.

[Figure 3 components: Wikipedia ontology; weighted inverted index mapping Term 1 ... Term i ... Term N to a weighted list of concepts; Semantic Interpreter mapping a query to a weighted vector of concepts.]

Figure 3: Details of the Information Retrieval (IR) module.
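The semantic interpretation step can be illustrated with a small Python sketch in the style of explicit semantic analysis (Gabrilovich and Markovitch, 2007). This is not the authors' Lucene-based implementation: the TF-IDF weighting, the summation of term vectors into one query vector, and the helper names `build_inverted_index` and `interpret` are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(pages):
    """Map each term to a weighted vector of concepts (page titles).

    `pages` maps a page title (the concept) to its list of tokens.
    TF-IDF weighting is an assumption made for this sketch.
    """
    n_pages = len(pages)
    doc_freq = Counter()
    for tokens in pages.values():
        doc_freq.update(set(tokens))

    index = defaultdict(dict)
    for title, tokens in pages.items():
        for term, tf in Counter(tokens).items():
            idf = math.log(n_pages / doc_freq[term])
            if idf > 0:  # terms found in every page carry no signal
                index[term][title] = tf * idf
    return index


def interpret(query_tokens, index, top_k=3):
    """Interpret a query as one weighted concept vector and rank the concepts."""
    concept_vector = Counter()
    for term in query_tokens:
        # Add up the concept vectors of the query terms (ESA-style).
        concept_vector.update(index.get(term, {}))
    return [title for title, _ in concept_vector.most_common(top_k)]
```

For example, with three toy pages titled "Cairo", "Nile" and "Paris", the query `["cairo", "egypt"]` ranks the "Cairo" page first, followed by "Nile", while "Paris" receives no weight.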

The query here is the DBpedia relation represented by its two entities. For each relation, we use the IR module to get the most relevant Wikipedia pages. Then, we split each page into sentences at punctuation marks. The sentences that contain the two entities are used to construct the training data.
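The sentence extraction and distant labeling step might look roughly like the following Python sketch. The set of sentence-final punctuation marks and the plain substring matching of entity names are assumptions; `split_sentences` and `label_sentences` are hypothetical helper names.

```python
import re

# Split after Latin or Arabic sentence-final punctuation (".", "!", "?",
# the Arabic question mark U+061F and full stop U+06D4). The exact set of
# marks is an assumption; the text only says "punctuation marks".
SENTENCE_END = re.compile(r"(?<=[.!?\u061F\u06D4])\s+")


def split_sentences(page_text):
    """Split a page into non-empty sentences."""
    return [s.strip() for s in SENTENCE_END.split(page_text) if s.strip()]


def label_sentences(relation, pages):
    """Tag every sentence that mentions both entities with the relation type.

    `relation` is an (entity1, entity2, relation_type) triple from DBpedia;
    `pages` are the page texts returned by the IR module for that relation.
    """
    entity1, entity2, relation_type = relation
    examples = []
    for page in pages:
        for sentence in split_sentences(page):
            # Substring matching is a simplification: real Arabic text would
            # first need clitic segmentation and orthographic normalization.
            if entity1 in sentence and entity2 in sentence:
                examples.append((sentence, entity1, entity2, relation_type))
    return examples
```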

4.2 Features Extraction

Several features are extracted from the Arabic sentences, which can be classified into three types: (a) lexical features, including part-of-speech tags, entity types, etc.; (b) syntactic features, including the syntactic path between the two entities, sentence voice, etc.; and (c) Arabic-specific features, which are special features that characterize the Arabic language. More details about the feature types are given in Section 5.

Once the relevant sentences are retrieved by the IR module, each sentence is converted into a vector of features and tagged with the corresponding relation type. These tagged feature vectors form the final training data, which is used to train the relation classifier.

4.3 Relation Classifier

The goal of the relation classifier is to extract relation triplets (entity₁, entity₂, relation type) from unseen text with high confidence. The constructed training data is used to train an SVM classifier, which classifies one sentence into a relation type. For any two unseen entities, we first extract all the relevant sentences that contain the two entities using the IR module. Then, we classify each sentence into a relation type using the SVM classifier. Finally, we use a voting scheme that selects the most confident relation type for these two entities: it calculates the total confidence of each relation type predicted by the SVM classifier and selects the relation type with the highest confidence.

Since we are mainly concerned with the false positive rate of the overall system, we only report a relation type between the two entities if its confidence value is larger than a confidence threshold α. Otherwise, the system does not detect a relation. Although this conservative strategy misses some true relations, it reduces the number of false positives among the detected relations.
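The voting scheme with the confidence threshold can be sketched as follows. `vote` is a hypothetical helper, not the authors' code. Because the text does not say how the total confidence is scaled before it is compared with α, the sketch assumes it is divided by the number of classified sentences so that it lies in [0, 1]. The default `alpha=0.6` mirrors the value chosen in Section 6.3.2.

```python
from collections import defaultdict


def vote(predictions, alpha=0.6):
    """Aggregate per-sentence predictions for one entity pair.

    `predictions` is a list of (relation_type, confidence) pairs, one per
    sentence containing both entities. Summed confidences are normalized by
    the number of sentences (an assumption). Returns the winning relation
    type, or None when its score does not exceed alpha.
    """
    if not predictions:
        return None
    totals = defaultdict(float)
    for relation_type, confidence in predictions:
        totals[relation_type] += confidence
    best = max(totals, key=totals.get)
    score = totals[best] / len(predictions)
    return best if score > alpha else None
```

With three sentences predicting birthPlace (0.9), birthPlace (0.8) and deathPlace (0.4), birthPlace wins the vote but its normalized score of about 0.57 falls below the default threshold, so no relation is reported. This reflects the trade-off described above of giving up recall to avoid false positives.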

5 FEATURES

Table 1 summarizes the features extracted by the ArabRelat system.

Table 1: Features extracted by ArabRelat system.

Lexical Features
  Number of words between the two entities.
  Named-entity type of first entity.
  Named-entity type of second entity.
  POS tag of first entity.
  POS tag of second entity.
  POS tags of k words before the first entity.
  POS tags of k words after the second entity.

Syntactic Features
  Syntactic path between the two entities.
  Sentence voice, values are: passive or active.
  isNegated: a flag for negated relations.

Arabic-specific Features
  Structural word order, values are: SVO or VSO.
  Number matching with the first entity, values are: singular, plural or N/A.
  Number matching with the second entity, values are: singular, plural or N/A.
  Gender matching with the first entity, values are: feminine, masculine or N/A.
  Gender matching with the second entity, values are: feminine, masculine or N/A.
  Verb mood, values are: subjunctive, jussive or N/A.

5.1 Lexical Features

The lexical features describe the two entities and the words around them. This type of features includes:

• Number of words between the two entities.
• Part-of-speech tags of the two entities.
• Named-entity types of the two entities.
• Part-of-speech tags of the words between the two entities.
• Part-of-speech tags of k words before the first entity.
• Part-of-speech tags of k words after the second entity.

Here, k is a parameter; in our system, we set k = 3. These features have been used in previous work, such as (Mintz et al., 2009) and (Yao et al., 2012). The lexical features characterize the sentence under the assumption that sentences with the same relation type exhibit similar lexical features.
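The lexical features listed in Section 5.1 can be computed from tagged tokens roughly as in this Python sketch. The input format (parallel tag lists and half-open token spans for the entities), the rule that a multi-token entity takes the tags of its last token, and the name `lexical_features` are all assumptions.

```python
def lexical_features(pos_tags, ner_tags, e1, e2, k=3):
    """Extract the lexical features for one sentence.

    `pos_tags` and `ner_tags` are parallel per-token lists (e.g. produced by
    a morphological analyzer). `e1` and `e2` are (start, end) token spans,
    end exclusive, with e1 preceding e2. A multi-token entity takes the tags
    of its last token, which is a simplifying assumption.
    """
    (s1, t1), (s2, t2) = e1, e2
    return {
        "num_words_between": s2 - t1,
        "pos_e1": pos_tags[t1 - 1],
        "pos_e2": pos_tags[t2 - 1],
        "ner_e1": ner_tags[t1 - 1],
        "ner_e2": ner_tags[t2 - 1],
        "pos_between": tuple(pos_tags[t1:s2]),
        "pos_before_e1": tuple(pos_tags[max(0, s1 - k):s1]),
        "pos_after_e2": tuple(pos_tags[t2:t2 + k]),
    }
```

As a hypothetical example, the sentence "ولد نجيب محفوظ في القاهرة عام 1911" (Naguib Mahfouz was born in Cairo in 1911) could be tagged VBN NNP NNP IN DTNNP NN CD, with the entities at spans (1, 3) and (4, 5). This yields one word between the entities, tagged IN.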

5.2 Syntactic Features

Syntactic parsing is the process of analyzing plain text into linguistic units (e.g., words) connected by directed links. We use the syntactic path between the two entities, which consists of the sequence of head words on the directed path between them. To get the syntactic path, we use the Stanford Arabic parser (Manning et al., 2014). An example is shown in Figure 4.
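The path computation can be sketched in Python for a tree stored as an array of head indices. The head-array encoding, the inclusion of both entity words in the path, and the helper name `syntactic_path` are assumptions; the parser's actual output format is not reproduced here.

```python
def syntactic_path(words, heads, i, j):
    """Return the words on the directed path between tokens i and j.

    `heads[t]` is the index of token t's head, or -1 for the root. The path
    climbs from i to the lowest common ancestor of i and j, then descends
    to j.
    """
    def ancestors(t):
        chain = [t]
        while heads[t] != -1:
            t = heads[t]
            chain.append(t)
        return chain

    up_i, up_j = ancestors(i), ancestors(j)
    on_j_side = set(up_j)
    common = next(t for t in up_i if t in on_j_side)  # lowest common ancestor
    up = up_i[: up_i.index(common) + 1]
    down = reversed(up_j[: up_j.index(common)])
    return [words[t] for t in up] + [words[t] for t in down]
```

For a toy tree over the English glosses "born Naguib Mahfouz in Cairo", with heads [-1, 0, 1, 0, 3], the path from "Naguib" to "Cairo" is Naguib, born, in, Cairo.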

We also use the voice of the sentence (active or passive) as a relation feature, since similar relations usually appear in different sentences with the same voice. For example, the relation “was born in” usually appears in passive voice, while the relation “traveled to” usually appears in active voice, even if the entities have the same types in both relations.

We also add a feature that indicates whether the relation is negated. If a negation particle appears among the words between the two entities, this may indicate that there is no relation between these two entities.

[Figure 4 parse-tree node labels: VBN, NP, PP, NNP, DTNNP, IN, NP, NNP.]

Figure 4: An example of an Arabic sentence parsed using the Stanford parser.
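A minimal Python sketch of the isNegated flag is shown below. The particle list (لا lā, لم lam, لن lan, ما mā, ليس laysa) is an assumption, since the text does not give its list. The sketch works on whitespace tokens, so it misses particles attached to clitics (e.g. "ولم") and inflected forms such as "ليست". It can also misfire on "ما", which can also be a relative pronoun.

```python
# Common Arabic negation particles; this list is an illustrative assumption.
NEGATION_PARTICLES = {"لا", "لم", "لن", "ما", "ليس"}


def is_negated(tokens, e1_end, e2_start):
    """Flag a relation as negated if a negation particle lies between the entities.

    `e1_end` is the (exclusive) end index of the first entity and `e2_start`
    the start index of the second entity.
    """
    return any(tok in NEGATION_PARTICLES for tok in tokens[e1_end:e2_start])
```

For "محمد لم يسافر إلى باريس" (Muhammad did not travel to Paris), the particle لم between the two entities sets the flag.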

5.3 Arabic-specific Features

The Arabic language is one of the Semitic languages, which have unique characteristics that differ from those of English. Therefore, we exploit some of the rich morphological features of Arabic, which can be used to address the challenges of the Arabic language and to better discriminate between relation types. Among the Arabic-specific features are:

5.3.1 The Structural Word Order

Arabic sentences differ in the syntactic order of their words. Sentences can be classified into two types: subject-verb-object (SVO) sentences and verb-subject-object (VSO) sentences. Relations of the same type usually appear in the same form. Therefore, we use the sentence type as a feature in our relation classifier.
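One possible way to derive the word-order feature is shown in this Python sketch. It finds the first verb and its subject in a dependency parse. The tag convention (verb tags starting with "VB"), the "nsubj" label, the "N/A" value for clauses without a verb or subject, and the name `word_order` are all assumptions for illustration.

```python
def word_order(pos_tags, heads, labels):
    """Return "SVO", "VSO" or "N/A" for a sentence.

    `heads[t]` is the head index of token t (-1 for the root) and
    `labels[t]` its dependency label. The first verb and its subject
    decide the order.
    """
    for v, tag in enumerate(pos_tags):
        if tag.startswith("VB"):
            subjects = [t for t, h in enumerate(heads) if h == v and labels[t] == "nsubj"]
            if not subjects:
                return "N/A"
            return "SVO" if subjects[0] < v else "VSO"
    return "N/A"
```

For "سافر محمد إلى باريس" (literally "traveled Muhammad to Paris"), the subject follows the verb, so the sentence is classified as VSO.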

5.3.2 Number Matching

Arabic verbs carry number information: the verb form differs based on the number of its subject (singular or plural). Therefore, one of the rich features that can characterize a relation is the number matching between the relation verb (i.e., the verb between the two entities) and the two entities. Using this property, we extract two more features:

• Number matching between the verb of the relation (if any) and the first entity.
• Number matching between the verb of the relation (if any) and the second entity.

5.3.3 Gender Matching

Arabic verbs are also marked for gender; for example, in the present tense, the first letter of the verb differs according to the gender of its subject. As most relations are characterized by a verb that describes the relation, we argue that the matching pattern between the verb gender and the gender of the entities could indicate the relation between these entities. Therefore, we add two more features:

• Gender matching between the verb of the relation (if any) and the first entity.
• Gender matching between the verb of the relation (if any) and the second entity.
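Both the number and the gender features compare an attribute of the relation verb with the same attribute of an entity. The following Python sketch therefore shares one helper between them. It assumes the analyzer output has already been mapped to dicts with optional "number" and "gender" keys, which is not MADAMIRA's actual format. It also assumes that a mismatch, a missing verb or an unknown value is encoded as "N/A", because Table 1 lists only the matched values and N/A. `agreement_feature` and `matching_features` are hypothetical names.

```python
def agreement_feature(verb_value, entity_value):
    """Return the shared value if verb and entity agree, else "N/A"."""
    if verb_value is None or entity_value is None or verb_value != entity_value:
        return "N/A"
    return verb_value


def matching_features(verb, entity1, entity2):
    """Build the four number/gender matching features.

    `verb`, `entity1` and `entity2` are dicts with optional "number" and
    "gender" keys; `verb` is None when the relation has no verb.
    """
    verb = verb or {}
    features = {}
    for attr in ("number", "gender"):
        for name, entity in (("e1", entity1), ("e2", entity2)):
            features[f"{attr}_match_{name}"] = agreement_feature(verb.get(attr), entity.get(attr))
    return features
```

For a feminine singular verb such as "ولدت" (she was born), a feminine singular first entity yields "singular" and "feminine". A masculine second entity yields "N/A" for gender.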

5.3.4 Verb Mood

Arabic verbs also differ according to their mood. Arabic verb moods include the subjunctive and the jussive. Relation verbs usually appear in one mood according to the relation type. Therefore, the verb mood is also added as a discriminative relation feature.

6 PERFORMANCE EVALUATION

6.1 Implementation

We use the Apache Lucene library (https://lucene.apache.org/) for the IR module. We use the Lucene indexer to build the inverted index over Wikipedia pages and the searcher to get the most relevant pages for a relation query. To build the classifier, we use the SVM implementation of the Weka library (Hall et al., 2009).

For feature extraction, we use the Stanford parser (Manning et al., 2014) to extract the syntactic path. The morphological analyzer MADAMIRA (Pasha et al., 2014) is used to extract all the other features. MADAMIRA provides a large set of rich morphological features of Arabic text, including stem, root, lemma, POS tags, gloss, case, mood, etc. The latest version of MADAMIRA also provides named-entity tagging, so we use it to extract both the morphological features and the named entities.

6.2 Datasets

We build the training data using DBpedia relations and sentences extracted from Wikipedia. We use a subset of the DBpedia dump consisting of 1358 relation instances, which correspond to 97 different relation classes. We divided the instances of each relation type equally into training and testing sets. For each relation instance, we extracted all the relevant sentences from Wikipedia. The training data contains 4915 sentences in total, and the test data contains 7500 sentences.
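The per-relation split can be sketched in Python as follows. The fixed random seed and the choice to put the extra instance of an odd-sized class into the training half are assumptions. `split_per_relation` is a hypothetical name.

```python
import random
from collections import defaultdict


def split_per_relation(instances, seed=0):
    """Split relation instances 50/50 into train and test, per relation type.

    `instances` are (entity1, entity2, relation_type) triples. Each relation
    type is shuffled with a fixed seed and cut in half.
    """
    by_type = defaultdict(list)
    for inst in instances:
        by_type[inst[2]].append(inst)

    rng = random.Random(seed)
    train, test = [], []
    for relation_type in sorted(by_type):
        group = by_type[relation_type]
        rng.shuffle(group)
        half = (len(group) + 1) // 2  # odd-sized classes favor training
        train.extend(group[:half])
        test.extend(group[half:])
    return train, test
```

Splitting at the level of relation instances, before sentences are extracted, keeps all sentences of one entity pair on the same side of the split.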

We also extracted some negative relations, which are unrelated entity pairs that appear in the same sentence. To build the negative relations, we used entity pairs that appear in one Wikipedia sentence but do not appear in any DBpedia relation. One might criticize this method because DBpedia is incomplete, which means that some negative relations may in fact express a relation. However, we argue that this will lead to a decrease in the true positive rate while maintaining a low false positive rate, which is our main concern in the system.
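The construction of negative relations might be sketched in Python as below. Treating pairs as unordered, using the label "NoRelation", and the name `negative_pairs` are assumptions for illustration.

```python
from itertools import combinations


def negative_pairs(sentence_entities, dbpedia_relations):
    """Collect co-occurring entity pairs that no DBpedia relation links.

    `sentence_entities` maps each sentence to the entities found in it;
    `dbpedia_relations` is a set of (entity1, entity2, relation_type)
    triples. Pairs are compared without regard to order.
    """
    related = {frozenset((e1, e2)) for e1, e2, _ in dbpedia_relations}
    negatives = []
    for sentence, entities in sentence_entities.items():
        for e1, e2 in combinations(sorted(set(entities)), 2):
            if frozenset((e1, e2)) not in related:
                negatives.append((sentence, e1, e2, "NoRelation"))
    return negatives
```

The toy test data illustrates the caveat discussed above. The pair (Naguib Mahfouz, Nobel Prize) is a real relation. Because it is missing from the toy knowledge base, the pair is still emitted as a negative.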

6.3 Evaluation Results

Two evaluation methodologies are used to evaluate the system. In the first method, half of the instances of each relation are used for training and the other half is held out for testing. In this method, we trust the automatic tagging produced by our system; thus, we call it the Trust method. In the second method, we use human evaluation, where a small subset of the test relations is tagged by an Arabic speaker and used to evaluate the system. We call this the Human evaluation method.

6.3.1 Trust Method

We compare the ArabRelat system against a baseline relation classifier. The baseline is the ArabRelat system restricted to a subset of the features, namely the lexical and syntactic features. This comparison shows the effect of adding the Arabic-specific features on the system accuracy. Table 2 shows the results of the system on the evaluation test set. The results show that while the baseline maintains good precision, its recall is very low. ArabRelat improves the precision by 15 percentage points over the baseline (0.74 vs. 0.59) and improves the recall by 46 percentage points (0.67 vs. 0.21).

Table 2: Results of the test dataset.

            Baseline   ArabRelat
Precision   0.59       0.74
Recall      0.21       0.67
F-measure   0.31       0.70

[Figure 5 plot: Precision (y-axis, 0.2 to 0.8) versus Confidence Threshold (α) (x-axis, 0 to 0.8).]

Figure 5: Effect of confidence threshold α on the system accuracy.
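The F-measure values in Table 2 match the balanced F1 score, the harmonic mean of precision and recall. The short Python check below recomputes them and the absolute gains from the table's precision and recall. F1 is an assumption here, because the text does not name the exact variant.

```python
def f_measure(precision, recall):
    """Balanced F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in Table 2.
baseline = {"precision": 0.59, "recall": 0.21}
arabrelat = {"precision": 0.74, "recall": 0.67}

for name, scores in (("Baseline", baseline), ("ArabRelat", arabrelat)):
    print(name, round(f_measure(scores["precision"], scores["recall"]), 2))

# Absolute gains, in percentage points.
print(round(100 * (arabrelat["precision"] - baseline["precision"])))  # prints 15
print(round(100 * (arabrelat["recall"] - baseline["recall"])))  # prints 46
```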

6.3.2 Effect of Confidence Threshold

Figure 5 shows the effect of the confidence threshold α on the system accuracy using the test data. As the value of α increases, the accuracy of the system increases until it reaches its optimal value at α = 0.6. For larger values of α, the accuracy decreases because very few instances survive the threshold, so the result becomes more sensitive to individual false positive errors. We set the default value of α to 0.6.
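A threshold sweep like the one in Figure 5 can be computed with a helper such as the one below. The scores in the usage example are hypothetical and chosen only to show the mechanics; they are not the paper's data. `precision_at_threshold` is a hypothetical name.

```python
def precision_at_threshold(decisions, alpha):
    """Precision over the entity pairs whose voting score exceeds alpha.

    `decisions` holds (score, is_correct) pairs, one per entity pair.
    Returns (precision, number_of_surviving_pairs); precision is None
    when no pair survives.
    """
    kept = [correct for score, correct in decisions if score > alpha]
    if not kept:
        return None, 0
    return sum(kept) / len(kept), len(kept)


# Hypothetical scores for illustration only (not the paper's data).
decisions = [(0.95, True), (0.85, False), (0.7, True), (0.65, True), (0.5, False), (0.3, False)]
for alpha in (0.0, 0.2, 0.4, 0.6, 0.8):
    precision, n = precision_at_threshold(decisions, alpha)
    print(f"alpha={alpha:.1f} kept={n} precision={precision}")
```

With these toy values, precision rises to 0.75 at α = 0.6. It then drops to 0.5 at α = 0.8, when only two pairs survive and a single false positive carries half the weight.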

6.3.3 Human Evaluation Method

Due to the lack of a public gold-standard Arabic relation dataset, we construct another test dataset tagged by an Arabic speaker. We extracted 100 sentences from the test dataset, and an Arabic native speaker tagged each sentence with a relation type. The speaker was given each sentence, the two entities to be tagged and a set of relation types. The task was to tag each sentence with a suitable relation type, or none if the sentence does not express a relation between the two entities.

Table 3: Human Evaluation Results.

            Baseline   ArabRelat
Precision   0.34       0.50
Recall      0.33       0.43
F-measure   0.34       0.46

We used a subset of 18 relation types. The results are shown in Table 3. The precision of the ArabRelat system is lower than in the Trust method, which we attribute to the small size of the dataset; however, it still outperforms the baseline.
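When the annotator can answer "none", precision and recall for relation extraction are usually computed so that "none" never counts as a correct detection. The Python sketch below shows one common way to do this. The paper does not spell out its exact formulas, so this convention and the name `evaluate` are assumptions.

```python
def evaluate(predicted, gold, none_label="none"):
    """Micro precision, recall and F1 over relation labels with a "none" class.

    Only predicted relations (not "none") count toward precision, and only
    gold relations count toward recall.
    """
    detected = sum(1 for p in predicted if p != none_label)
    relevant = sum(1 for g in gold if g != none_label)
    correct = sum(1 for p, g in zip(predicted, gold) if p == g != none_label)
    precision = correct / detected if detected else 0.0
    recall = correct / relevant if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with gold labels birthPlace, capital, none, deathPlace and predictions birthPlace, none, none, birthPlace, one of the two detected relations is correct (precision 0.5). One of the three gold relations is found (recall 1/3).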

7 CONCLUSION

In this paper, we propose a novel relation extraction system for the Arabic language. The system uses distantly supervised learning to build a relation classifier without the need for prior labeled data. We introduce new Arabic-specific features that characterize Arabic relations. Our experimental results on sentences extracted from Wikipedia show that the system achieves an overall F-measure of 70% for detecting 97 relation types.

REFERENCES

Alsaif, A. and Markert, K. (2011). Modelling discourse relations for Arabic. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.

Banko, M., Cafarella, M. J., Soderland, S., Broadhead, M., and Etzioni, O. (2007). Open information extraction from the web. In IJCAI.

Diab, M. T., Moschitti, A., and Pighin, D. (2008). Semantic role labeling systems for Arabic using kernel methods. In ACL.

Fan, M., Zhao, D., Zhou, Q., Liu, Z., Zheng, T. F., and Chang, E. Y. (2014). Distant supervision for relation extraction with matrix completion. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.

Gabrilovich, E. and Markovitch, S. (2007). Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In IJCAI.

Green, S. and Manning, C. D. (2010). Better Arabic parsing: Baselines, evaluations, and analysis. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 394–402. Association for Computational Linguistics.

Gupta, R., Halevy, A., Wang, X., Whang, S. E., and Wu, F. (2014). Biperpedia: An ontology for search applications. Proceedings of the VLDB Endowment.

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. (2009). The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1):10–18.

Hsu, I.-C., Lin, H.-Y., Yang, L. J., and Huang, D.-C. (2012). Using linked data for intelligent information retrieval. In Soft Computing and Intelligent Systems (SCIS) and 13th International Symposium on Advanced Intelligent Systems (ISIS), 2012 Joint 6th International Conference on. IEEE.

Kambhatla, N. (2006). Minority vote: at-least-n voting improves recall for extracting relations. In Proceedings of the COLING/ACL on Main Conference Poster Sessions. Association for Computational Linguistics.

Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S. J., and McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations.

Mintz, M., Bills, S., Snow, R., and Jurafsky, D. (2009). Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2. Association for Computational Linguistics.

Nguyen, T.-V. T. and Moschitti, A. (2011). Joint distant and direct supervision for relation extraction. In IJCNLP.

NIST, U. (2003). The ACE 2003 evaluation plan. US National Institute for Standards and Technology (NIST).

Pasha, A., Al-Badrashiny, M., Diab, M., El Kholy, A., Eskander, R., Habash, N., Pooleery, M., Rambow, O., and Roth, R. M. (2014). MADAMIRA: A fast, comprehensive tool for morphological analysis and disambiguation of Arabic. In Proceedings of the Language Resources and Evaluation Conference (LREC), Reykjavik, Iceland.

Snow, R., Jurafsky, D., and Ng, A. Y. (2004). Learning syntactic patterns for automatic hypernym discovery. Advances in Neural Information Processing Systems 17.

Unger, C., Bühmann, L., Lehmann, J., Ngonga Ngomo, A.-C., Gerber, D., and Cimiano, P. (2012). Template-based question answering over RDF data. In Proceedings of the 21st International Conference on World Wide Web, pages 639–648. ACM.

Waitelonis, J. and Sack, H. (2012). Towards exploratory video search using linked data. Multimedia Tools and Applications.

Yahya, M., Berberich, K., Elbassuoni, S., Ramanath, M., Tresp, V., and Weikum, G. (2012). Natural language questions for the web of data. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics.

Yao, L., Riedel, S., and McCallum, A. (2012). Unsupervised relation discovery with sense disambiguation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1. Association for Computational Linguistics.
