• use Semantic Role Labeling to pre-process a text instead of complex dependency parsing;
• compare relation extraction efficiency using a dialog coherency approach.
2 RELATED WORK
2.1 Open Information Extraction
The first solution formally introducing OIE was TextRunner (Banko, 2007). Its primary goal was to obviate manual adjustment of the extraction rules whenever the corpus or the target relation shifted. TextRunner made a single pass over the corpus, heuristically extracting relation triples centered around a verb phrase (VP) with a specific subject and direct object. It attempted to find the best arguments for that relation by applying additional heuristics, e.g., that neither the head nor the target consists solely of a pronoun. A Naive Bayes classifier, trained over a set of features of the extractions, served as an estimator of a confidence function, so that the system could provide calibrated confidence values. Compared to the last pre-OIE solution, KnowItAll, both extraction quality and performance improved significantly; moreover, in KnowItAll relations had to be specified upfront (Oren Etzioni, 2005). ReVerb (Fader, 2011) introduced additional syntactic and lexical constraints to limit TextRunner's incoherent and uninformative extractions when detecting distant relationships. For example, "The Obama administration is offering only modest greenhouse gas reduction targets at the conference." would yield the overspecified relation "is offering only modest greenhouse gas reduction targets at" between "The Obama administration" and "the conference".
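ReVerb's syntactic constraint can be sketched as a regular expression over part-of-speech tags: roughly, a relation phrase is one or more verbs, optionally followed by intervening words and a closing preposition or particle. The collapsed tag alphabet and pattern below are a simplified illustration, not ReVerb's exact implementation:

```python
import re

# Simplified sketch of ReVerb's syntactic constraint: a relation phrase
# should match V+(W*P)? over POS tags, where V covers verbs, W covers
# nouns/adjectives/adverbs/pronouns/determiners, and P covers prepositions
# and particles.
RELATION_PATTERN = re.compile(r"V+(W*P)?")

POS_TO_CLASS = {"VERB": "V", "AUX": "V", "NOUN": "W", "ADJ": "W",
                "ADV": "W", "PRON": "W", "DET": "W", "ADP": "P",
                "PART": "P"}

def is_valid_relation_phrase(pos_tags):
    """Collapse POS tags to the V/W/P alphabet and test the pattern."""
    collapsed = "".join(POS_TO_CLASS.get(tag, "X") for tag in pos_tags)
    return RELATION_PATTERN.fullmatch(collapsed) is not None

# "was born in" (AUX VERB ADP) is a well-formed relation phrase;
# "targets at" (NOUN ADP) lacks a verb and is rejected.
print(is_valid_relation_phrase(["AUX", "VERB", "ADP"]))
print(is_valid_relation_phrase(["NOUN", "ADP"]))
```

Note that the syntactic constraint alone still admits long phrases such as the one above; ReVerb additionally applies a lexical constraint to filter relation phrases that are too specific.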
Fast dependency parsers and their ability to produce a sentence Dependency Tree (DT) allowed construction of more sophisticated templates that further increased the precision and recall of extraction. OLLIE (M. Schmitz, 2012) defined "Open Relation Patterns", which, using a dependency tree, captured relations mediated by nouns and adjectives, not just verbs. OLLIE's processing began with seed tuples from ReVerb and used them to build a bootstrap training set, from which it learned open pattern templates that were applied to individual sentences at extraction time.
The latest advancement in OIE follows recent progress in language modeling. The Transformer architecture has led to a novel paradigm of Neural Open Information Extraction (NOIE) (Zhou, 2022). NOIE approaches extraction from two major directions: tagging and generation.
Tagging-based solutions annotate a sequence of tags corresponding to facts in the input sentence. Generation-based ones directly decode relations, relying on a Sequence2Sequence architecture. Both paradigms predict relationships auto-regressively, which means the current prediction relies on the previous output (Zhou, 2022). A skewed prediction is inherited and magnified in the later steps; as the number of steps grows, errors accumulate and may decrease performance.
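The compounding effect can be illustrated with a back-of-the-envelope calculation (the 0.95 per-step accuracy is an arbitrary assumption, and real decoding errors are not independent):

```python
# If each auto-regressive decoding step is correct with probability p and
# errors are (optimistically) assumed independent, the chance of an n-step
# extraction containing no inherited error decays exponentially as p ** n.
def all_steps_correct(p, n):
    return p ** n

for n in (1, 5, 10, 20):
    print(n, round(all_steps_correct(0.95, n), 3))
# at 95% per-step accuracy, 20 steps leave only ~36% fully correct outputs
```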
An exemplary graph-based approach (Yu, 2021) breaks the auto-regressive factorization by constructing a graph whose nodes are text spans and whose edges indicate that the connected spans belong to the same fact. The relationship-discovery task is then cast as maximal clique detection.
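The graph formulation can be sketched as follows; the spans and edges below are hand-written for illustration, standing in for the output of a trained span-pair classifier, and a plain Bron-Kerbosch enumeration replaces whatever clique solver the original system uses:

```python
# Nodes are text spans; an edge marks two spans predicted to belong to the
# same fact. Each extracted fact is then a maximal clique in this graph.
adj = {
    "Obama": {"was born in", "Hawaii", "served as", "president"},
    "was born in": {"Obama", "Hawaii"},
    "Hawaii": {"Obama", "was born in"},
    "served as": {"Obama", "president"},
    "president": {"Obama", "served as"},
}

def maximal_cliques(adj):
    """Bron-Kerbosch enumeration of all maximal cliques."""
    cliques = []
    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)
    expand(set(), set(adj), set())
    return cliques

# Recovers two facts: (Obama, was born in, Hawaii) and
# (Obama, served as, president), each as one maximal clique.
print(sorted(maximal_cliques(adj)))
```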
2.2 Semantic Role Labeling
Frame Semantics was originally introduced by Charles J. Fillmore (Charles, 1977) with the basic idea that one cannot understand the meaning of a single word without access to all the essential knowledge related to that word, namely, its semantic frame. The semantic frame is strictly associated with the word's meaning expressed in the sentence. Semantic Role Labeling (SRL) (D. Gildea, 2000) (D. Jurafsky, 2022) identifies and models the frame's structure. SRL takes a sentence and identifies verbs and their arguments. Then, it classifies the arguments by mapping them to roles relevant to the verb in that frame, such as agent, patient, instrument, or beneficiary. In other words, SRL tries to identify "Who, What, Where, When, With What, Why, How" for each frame. A state-of-the-art deep pre-trained SRL model (Peng Shi, 2019) detects a simplified frame structure: instead of an agent, a patient, or an instrument, it detects generic arguments of a verb: ARG0, ARG1, and others. The structure of the frame highly correlates with the dependency tree (DT) of the sentence (T. Shi and O. Irsoy, 2020), where the verb and the verb's arguments form constituents (noun phrases, NPs, and verb phrases, VPs). Moreover, it is possible to reduce the SRL task to a Dependency Parsing task (T. Shi and O. Irsoy, 2020). In addition, SRL offers an efficient approach to the decomposition of complex sentences, a problem initially solved by a dedicated trained classifier splitting a sentence into shorter utterances (Angeli, 2019).
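As a sketch of what such a model produces, the snippet below decodes a PropBank-style BIO tagging of one predicate into a frame; the tags are hand-written for illustration, standing in for a trained model's output:

```python
# One frame per predicate: the model emits a BIO tag per token, and
# contiguous B-/I- spans are grouped into labeled arguments (ARG0, ARG1, ...).
tokens = ["Mary", "sold", "the", "book", "to", "John"]
tags = ["B-ARG0", "B-V", "B-ARG1", "I-ARG1", "B-ARG2", "I-ARG2"]

def decode_frame(tokens, tags):
    """Group contiguous B-/I- tagged tokens into labeled argument spans."""
    frame, label, span = {}, None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if label:
                frame[label] = " ".join(span)
            label, span = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            span.append(token)
        else:  # "O" or an inconsistent tag closes the current span
            if label:
                frame[label] = " ".join(span)
            label, span = None, []
    if label:
        frame[label] = " ".join(span)
    return frame

print(decode_frame(tokens, tags))
# the frame answers "Who (ARG0) did What (V) to Whom (ARG1, ARG2)"
```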
2.3 Knowledge Graphs
Constructing an ontology from text is challenging due to the complexity of human language. Initial approach
ICPRAM 2023 - 12th International Conference on Pattern Recognition Applications and Methods