NER systems are often evaluated by means of precision, recall, and the balanced F_{β=1} score, which are defined as follows, where TP, FP, and FN are the numbers of true positives, false positives, and false negatives:

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F_{β} = ((β² + 1) · Recall · Precision) / (β² · Recall + Precision)
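As a concrete illustration (the function names are ours, not from the paper), these metrics can be computed as:

```python
def precision(tp, fp):
    # Fraction of predicted entities that are correct.
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of true entities that were found.
    return tp / (tp + fn)

def f_score(p, r, beta=1.0):
    # F_beta as defined above; beta = 1 weights precision and recall equally.
    return (beta**2 + 1) * r * p / (beta**2 * r + p)
```

For example, plugging in the precision and recall reported for our system (0.835 and 0.808) recovers the F_{β=1} score of 0.821.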
Next, we present our results and compare them with other methods that have been applied to the same or closely similar data sets; the comparison is shown in Table 2.
Table 2: Evaluation results of our system compared with other systems.

Method                      | Precision | Recall | F_{β=1}
Our system (two-layer SVMs) | 0.835     | 0.808  | 0.821
PosBIOTM-Ner                | 0.800     | 0.685  | 0.738
YamCha                      | 0.825     | 0.742  | 0.781
Rule-based tagger           | 0.843     | 0.718  | 0.775
HMM-based tagger            | 0.803     | 0.814  | 0.809
PowerBioNE                  | 0.820     | 0.832  | 0.826
Our best result is obtained with C = 1 and a [−left, right] context window size of [−2, 2]. The results of PosBIOTM-Ner, YamCha, the rule-based tagger, the HMM-based tagger, and PowerBioNE are taken from (Song et al., 2004), (Mitsumori et al., 2005), (Tamames, 2005), (Kinoshita et al., 2005), and (Zhou et al., 2005), respectively.
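Such a [−left, right] context window can be sketched as follows (an illustrative helper with hypothetical padding symbols, not the paper's implementation):

```python
def window_features(tokens, i, left=2, right=2):
    """Return the tokens in a [-left, right] window around position i,
    padding sentence boundaries with <s> / </s> markers (our convention)."""
    padded = ["<s>"] * left + tokens + ["</s>"] * right
    j = i + left  # position of token i in the padded sequence
    return padded[j - left : j + right + 1]
```

With a [−2, 2] window, each token is thus represented together with its two left and two right neighbors.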
PosBIOTM-Ner and YamCha are both based on the SVM algorithm and the BIO representation, as is our system. PosBIOTM-Ner makes use of edit distance as a significant contributing feature for its SVM, while YamCha uses an external lexicon composed of SWISS-PROT and TrEMBL data. Table 2 shows that our system, based on two-layer SVMs, outperforms both: its F_{β=1} score is 8.3 points higher than PosBIOTM-Ner's and 4.0 points higher than YamCha's. Our system also outperforms the rule-based and HMM-based taggers.
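The BIO representation shared by these systems labels each token as beginning (B), inside (I), or outside (O) an entity. A minimal sketch (the span format and helper are illustrative, not from the paper):

```python
def to_bio(tokens, entities):
    """Convert [start, end) token-index spans into BIO tags:
    B marks the first token of an entity, I the rest, O everything else."""
    tags = ["O"] * len(tokens)
    for start, end in entities:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags
```

This turns entity recognition into a per-token classification problem, which is what allows an SVM to be applied to it.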
Notice that our system performs slightly worse than PowerBioNE: we achieve 1.5 points higher precision but 2.4 points lower recall, leading to an F_{β=1} score 0.5 points lower. This is not surprising, because PowerBioNE is a complex system: an ensemble in which three classifiers, one SVM and two discriminative HMMs, are combined using a simple majority-voting strategy. In addition, PowerBioNE incorporates three post-processing modules to further improve performance. Zhou et al. (2005) report that the SVM classifier obtains higher precision while the HMM classifiers obtain higher recall, which we believe explains why our system has lower recall than PowerBioNE. Considering the simplicity of our approach, however, our system remains comparable to PowerBioNE.
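The majority-voting idea behind such an ensemble can be sketched as follows (an illustrative sketch of the general strategy, not PowerBioNE's actual code):

```python
from collections import Counter

def majority_vote(labels):
    # Return the most frequent label among the classifiers' votes.
    return Counter(labels).most_common(1)[0][0]

def ensemble_tag(predictions):
    """Combine per-token label sequences from several classifiers.
    predictions: one label sequence per classifier, all aligned to
    the same token sequence."""
    return [majority_vote(votes) for votes in zip(*predictions)]
```

With an odd number of classifiers, as in PowerBioNE's three-classifier ensemble, every token receives a strict majority label.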
In summary, our system outperforms most of the above systems and achieves a near state-of-the-art result, even though it is relatively simple and does not use any external lexical resources.
4 DISCUSSION
In the following sections, we first discuss our system in detail from several aspects, and then give a comprehensive error analysis.
4.1 Contributions of Different Features
The effect of each feature in the first layer of our system is shown in Table 3. The "token (base)" row shows the result when only the token feature is used in SVM learning. The other rows show the results when the token feature plus one other feature are used, i.e., "+orthographic" means the token feature plus the orthographic feature; "+POS" means the token feature plus the MedPost POS feature; "+bi-prefix/suffix" means the token feature plus the bi-prefix and bi-suffix features, and so on.
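A rough sketch of what these first-layer feature types look like (the feature names and exact definitions here are our illustrative assumptions, not the paper's specification):

```python
import re

def token_features(token):
    """Build a feature dict for one token: the token itself, simple
    orthographic features, and n-character prefix/suffix features."""
    feats = {"token": token.lower()}
    # Orthographic features: capitalization, digits, hyphens.
    feats["init_cap"] = token[:1].isupper()
    feats["all_caps"] = token.isupper()
    feats["has_digit"] = bool(re.search(r"\d", token))
    feats["has_hyphen"] = "-" in token
    # Bi-, tri-, and quad- prefix/suffix features.
    for n in (2, 3, 4):
        feats[f"prefix{n}"] = token[:n].lower()
        feats[f"suffix{n}"] = token[-n:].lower()
    return feats
```

Each feature dict would then be binarized into the sparse vector representation that the SVM consumes.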
Table 3: Contribution of each feature when added to the base feature.

Feature             | Precision | Recall | F_{β=1}
token (base)        | 0.714     | 0.491  | 0.581
+orthographic       | 0.729     | 0.709  | 0.719
+MedPost POS        | 0.717     | 0.541  | 0.616
+bi-prefix/suffix   | 0.760     | 0.615  | 0.680
+tri-prefix/suffix  | 0.750     | 0.571  | 0.648
+quad-prefix/suffix | 0.728     | 0.530  | 0.613
Table 3 shows that the orthographic feature and the bi-prefix/suffix feature are critical for our system, improving the F_{β=1} score by 13.8 and 9.9 points, respectively, over the base. Moreover, most of the features in Table 3 affect precision only slightly, while they have a large effect on recall.
Then, as mentioned above, we use MedPost for
POS tagging, which was trained on the MEDLINE
ICEIS 2007 - International Conference on Enterprise Information Systems