Domain Specific Grammar based Classification for Factoid Questions
Alaa Mohasseb (a), Mohamed Bader-El-Den (b) and Mihaela Cocea (c)
School of Computing, University of Portsmouth, Portsmouth, U.K.
(a) https://orcid.org/0000-0003-2671-2199, (b) https://orcid.org/0000-0002-1123-6823, (c) https://orcid.org/0000-0002-5617-1779
Keywords: Information Retrieval, Question Classification, Factoid Questions, Grammatical Features, Machine Learning.
Abstract:
The process of classifying questions in any question answering system is the first step in retrieving accurate answers. Factoid questions are considered the most challenging type of question to classify. In this paper, a framework is adapted for question categorization and classification. The framework consists of three main feature types: grammatical features, domain-specific features, and grammatical patterns. These features help in preserving and utilizing the structure of the questions. Machine learning algorithms were used for the classification process, and experimental results show that these features help in achieving a good level of accuracy compared with state-of-the-art approaches.
1 INTRODUCTION
The process of classifying questions in any question answering system is the first step in retrieving accurate answers. Question classification affects the answers produced by a question answering system, and most errors happen due to misclassification of the questions (Moldovan et al., 2003). The task of generating answers to users' questions is directly related to the type of question asked. Several question taxonomies have been proposed; the most popular classification taxonomy for factoid ('wh-') questions is Li and Roth's (Li and Roth, 2006), which consists of a set of six coarse-grained categories and fifty fine-grained ones. Developing an accurate question classifier requires using suitable question features, such as linguistic features (Le-Hong et al., 2015). In addition, 'wh-' questions are found to be more challenging to classify into proper semantic categories than other question types in question answering systems (Li et al., 2008).
In this paper, a grammar-based framework for question categorization and classification is adapted (Mohasseb et al., 2018). The framework is applied to question classification according to Li and Roth's (Li and Roth, 2006) categories of intent, using domain-specific grammatical features and patterns. These features have the advantage of preserving the grammatical structure of the question. The aim is to assess the influence of using the structure of the question and the domain-specific grammatical categories and features on the classification performance. To achieve this aim, the following objectives are defined:
1. Investigate the influence of using domain-specific
grammatical features on the classification perfor-
mance;
2. Compare the performance of different machine
learning algorithms for the classification of fac-
toid questions intent;
3. Investigate the classification performance in comparison with state-of-the-art approaches.
The rest of the paper is organised as follows. Sec-
tion 2 outlines the categorization of questions and the
previous work in question classification. Section 3
describes the proposed approach and the grammatical
features used. The experimental setup and results are
presented in Section 4, while the results are discussed
in Section 5. Finally, Section 6 concludes the paper
and outlines directions for future work.
2 BACKGROUND
In this section we review previous work on question
classification. Different proposed question categories
are outlined in Section 2.1, while Section 2.2 reviews
previous work on question classification methods.
2.1 Question Categories

Different question categories have been proposed. In (Kolomiyets and Moens, 2011) the authors classified questions into eight types: list, definition, factoid, causal, relationship, hypothetical, procedural, and confirmation questions. The authors in (Bu et al., 2010) proposed six question categories tailored to general QA, namely: fact, list, reason, solution, definition and navigation. Furthermore, in (Bullington et al., 2007) the authors classified questions into 11 categories: Advantage/Disadvantage, Cause and Effect, Comparison, Definition, Example, Explanation, Identification, List, Opinion, Rationale, and Significance. In (Mohasseb et al., 2018) questions were classified and labeled into six different categories: causal, choice, confirmation (yes-no questions), factoid (wh-questions), hypothetical and list. This classification is based on the question types asked by users and the answers given, and the categorization was motivated by the question types in English.
The most famous factoid question type taxonomy is that of Li and Roth (Li and Roth, 2006). Many researchers have focused on Li and Roth's classification of questions (Li et al., 2005; Metzler and Croft, 2005; Van-Tu and Anh-Cuong, 2016; Xu et al., 2016; Le-Hong et al., 2015; May and Steinberg, 2004; Zhang and Lee, 2003; Mishra et al., 2013; Huang et al., 2008; Nguyen et al., 2008; Nguyen and Nguyen, 2007; Li et al., 2008). Their two-layer taxonomy consists of six coarse-grained categories (Abbreviation, Entity, Description, Human, Location and Numeric value) and fifty fine-grained ones, e.g., Expression, Manner, Color and City.
2.2 Question Classification Methods
In this section, we review related work on question classification methods and machine learning algorithms using Li and Roth's question categories, which are adopted in this study as they form the most widely used question intent taxonomy in the literature.
Authors in (Li et al., 2005) used composite statistical and rule-based classifiers combined with multiple classifier combination methods; features such as dependency structures, WordNet synsets, bag-of-words, and bigrams were used with a number of kernel functions. In (Van-Tu and Anh-Cuong, 2016) a method was proposed that uses a feature selection algorithm to determine the appropriate features for different question types. Authors in (Metzler and Croft, 2005) proposed a statistical classifier based on SVM which learns question words by using prior knowledge about correlations between question words and types. Furthermore, an SVM-based approach for question classification was proposed in (Xu et al., 2016); the approach incorporates dependency relations and high-frequency words. In (Mishra et al., 2013) a question classification method was proposed using three different classifiers, k-Nearest Neighbour, Naive Bayes, and SVM, with features such as bag-of-words and bag-of-ngrams as well as a set of lexical, syntactic, and semantic features. Moreover, authors in (Zhang and Lee, 2003) used five machine learning algorithms, namely k-Nearest Neighbour, Naive Bayes, Decision Tree, Sparse Network of Winnows, and SVM, with two features: bag-of-words and bag-of-ngrams. In (Huang et al., 2008) the authors adapted Lesk's word sense disambiguation algorithm and optimized the depth of the hypernym feature, augmented with other standard features such as unigrams. Moreover, authors in (Le-Hong et al., 2015) proposed a compact feature set that uses typed dependencies as semantic features. A hierarchical classifier was designed in (May and Steinberg, 2004), with different classifiers such as SVM, MaxEnt, Naive Bayes and Decision Tree used for primary and secondary classification. In (Nguyen et al., 2008) the authors used unlabeled questions in combination with labeled questions for semi-supervised learning; in addition, tri-training was selected to improve the precision of the question classification task. A two-level hierarchical classifier for question classification, using supervised and semi-supervised learning, was proposed in (Nguyen and Nguyen, 2007). Finally, in (Li et al., 2008) the authors classified what-type questions by head noun tagging; different features, such as local syntactic features, semantic features and category dependency among adjacent nouns, were integrated with Conditional Random Fields to reduce the semantic ambiguities of the head noun.
3 PROPOSED APPROACH
3.1 Factoid Questions Grammatical Features

This analysis was first introduced in (Mohasseb et al., 2018). Wh-questions (factoid questions) have their own characteristics, features, and structure that help in the identification and classification process.

The main feature of a factoid question (wh-question) is the presence of question words: this kind of question starts with a question word such as What, Where, Why, Who, Whose, When or Which, e.g. "What did the only repealed amendment to the U.S. Constitution deal with?". In addition, such a question could start with question words that do not begin with "wh", such as how, how many, how often, how far, how much, how long and how old, e.g. "How long does it take light to reach the Earth from the Sun?". Moreover, the structure of this type of question could begin with a preposition followed by a question word, "P + QW", as in "In what year did Thatcher become prime minister?" or "At what age did Rossini stop writing opera?". Also, in many cases the question word can be found in the middle of the question, e.g. "The corpus callosum is in what part of the body?".

Most factoid questions are related to facts, current events, ideas and suggestions, and could formulate an advice question, e.g. "How do you make a paintball?". In addition, some factoid questions could contain two types of question words, for example "What does extended definition mean and how would one write a paper on it?". Furthermore, factoid questions could have any kind of information given as an answer or response.
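To make these structural cues concrete, the following is a minimal sketch (ours, not part of the original framework) of how the factoid question word and its position can be detected; the word lists mirror the question words named above.

import re

# Question words named above: 'wh-' words plus 'how' variants.
WH_WORDS = {"what", "where", "why", "who", "whose", "when", "which"}
HOW_VARIANTS = {"how many", "how often", "how far", "how much",
                "how long", "how old"}

def find_question_word(question):
    """Return (question_word, position) or None. Handles a leading
    question word, the 'P + QW' structure (preposition followed by a
    question word) and a question word in the middle of the question."""
    tokens = re.findall(r"[a-z]+", question.lower())
    for i, tok in enumerate(tokens):
        if tok in WH_WORDS:
            return tok, i
        if tok == "how":
            bigram = " ".join(tokens[i:i + 2])
            # Prefer the longer 'how X' variant when it matches.
            return (bigram if bigram in HOW_VARIANTS else "how"), i
    return None

print(find_question_word("In what year did Thatcher become prime minister?"))
# ('what', 1) -- the P + QW structure
print(find_question_word("The corpus callosum is in what part of the body?"))
# ('what', 5) -- question word in the middle of the question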
3.2 Question Classification Features
In a previous study (Mohasseb et al., 2018), a Grammar-based framework for Questions Categorization and Classification (GQCC) was proposed. In this study, the framework is applied to question classification according to Li and Roth's (Li and Roth, 2006) categories of intent. Three main feature types have been used for the analysis and classification of factoid questions: (1) grammatical features, (2) domain-specific features and (3) grammatical pattern features. These features help in transforming the questions into a new representation which has the advantage of preserving the grammatical structure of the question. The features are discussed in more detail in the following sections.
3.2.1 Grammatical Features
The main objective of the grammatical features (Mohasseb et al., 2018) is to transform the questions (by using the grammar) into a new representation as a series of grammatical terms, i.e. a grammatical pattern. The grammatical features consist of the seven major word classes in English, which are Verb (V), Noun (N), Determiner (D), Adjective (Adj), Adverb (Adv), Preposition (P) and Conjunction (Conj), in addition to the six main question words: How (QW_How), Who (QW_Who), When (QW_When), Where (QW_Where), What (QW_What) and Which (QW_Which). Some word classes have sub-classes, such as Common Nouns (CN), Proper Nouns (PN), Pronouns (Pron) and Numeral Nouns (NN) for Noun, and Action Verbs (AV), Linking Verbs (LV) and Auxiliary Verbs (AuxV) for Verb. In addition, the grammatical features distinguish singular terms (e.g. Common Noun - Other - Singular (CN_OS)) from plural terms (e.g. Common Noun - Other - Plural (CN_OP)).
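As an illustration of this representation, the sketch below tags the terms of a question with grammatical terminals using a tiny, purely illustrative lexicon; the framework's actual tag set (Mohasseb et al., 2014; Mohasseb et al., 2018) is far larger.

# Illustrative fragment of a grammatical tag lexicon (hypothetical);
# the framework's real tag set is much larger.
LEXICON = {
    "what": "QW_What", "where": "QW_Where", "who": "QW_Who",
    "is": "LV", "are": "LV",    # linking verbs
    "the": "D", "a": "D",       # determiners
    "of": "P", "in": "P",       # prepositions
    "killed": "AV",             # action verb
    "capital": "CN_OS",         # common noun, other, singular
}

def tag_terms(terms):
    """Map each extracted term to its grammatical terminal. Unknown
    terms fall back to CN_OS here; the framework instead tags them at
    their highest level of abstraction (see Section 3.2.2)."""
    return [LEXICON.get(t.lower(), "CN_OS") for t in terms]

print(tag_terms(["What", "is", "the", "capital", "of", "Italy"]))
# ['QW_What', 'LV', 'D', 'CN_OS', 'P', 'CN_OS'] -- 'Italy' becomes PN_G
# once the domain-specific features of Section 3.2.2 are consulted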
3.2.2 Domain Specific Grammatical Features
Domain-specific features (i.e. related to question answering) were identified, which correspond to topics (Mohasseb et al., 2018); these are listed in Table 1. Instead of further classifying the question into fine-grained classes, which are based on a large number of categories, we have used domain-specific features to determine the type of question. For example, question type ENTY consists of fine-grained categories such as religion, disease/medicine, event and product. These types can be identified using the following domain-specific grammatical features: religion = religious terms (PN_R), disease/medicine = health terms (CN_HLT and PN_HLT), product = products (PN_P), event = events (PN_E). Hence, the domain-specific grammatical features contain fewer categories than the fine-grained categories proposed by Li and Roth, but can still identify the different coarse categories.
Table 1: Domain Specific Grammatical Features.

Domain specific Features | Abbr.
Celebrities Name | PN_C
Entertainment | PN_Ent
Newspapers, Magazines, Documents, Books | PN_BDN
Events | PN_E
Companies Name | PN_CO
Geographical Areas | PN_G
Places and Buildings | PN_PB
Institutions, Associations, Clubs, Foundations and Organizations | PN_IOG
Brand Names | PN_BN
Software and Applications | PN_SA
Products | PN_P
History and News | PN_HN
Religious Terms | PN_R
Holidays, Days, Months | PN_HMD
Health Terms | PN_HLT
Science Terms | PN_S
Database and Servers | CN_DBS
Advice | CN_A
Entertainment | CN_Ent
History and News | CN_HN
Site, Website, URL | CN_SWU
Health Terms | CN_HLT
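In practice, each domain-specific feature can be backed by a word list. The following is a minimal sketch (with tiny, hypothetical gazetteers) of how a proper noun is mapped to one of the tags in Table 1:

# Tiny illustrative gazetteers; each tag from Table 1 would be backed
# by a much larger word list in a real implementation.
GAZETTEERS = {
    "PN_G": {"italy", "china", "rocky mountains"},    # geographical areas
    "PN_C": {"martin luther king", "jean nicolet"},   # celebrities
    "PN_HMD": {"january", "christmas"},               # holidays, days, months
}

def domain_tag(phrase):
    """Return the domain-specific tag of a (possibly compound) phrase,
    or None when no gazetteer matches."""
    key = phrase.lower()
    for tag, words in GAZETTEERS.items():
        if key in words:
            return tag
    return None

print(domain_tag("Rocky Mountains"))  # 'PN_G'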
3.2.3 Grammatical Patterns
The main objective of the question grammatical pattern is to help in the identification of the question type, since each factoid question type has a certain structure (grammatical pattern). For example, the question "Who killed Martin Luther King?", of type HUM, has the grammatical pattern QW_Who + AV + PN_C, while the question "What is the capital of Italy?", of type LOC, has the grammatical pattern QW_What + LV + D + CN_OS + P + PN_G. The different pattern representations help in distinguishing between the different factoid question types. A full description of how these features are used is provided in the following sections.
3.3 Question Classification Framework
The question classification framework takes into account the grammatical structure of the questions and combines grammatical features with domain-related information and grammatical patterns. The framework consists of three main phases: (1) Question Parsing and Tagging, (2) Pattern Formulation and (3) Question Classification. The following question from the Li and Roth datasets will be used to illustrate how these phases work: "Where are the Rocky Mountains?".

(1) Question Parsing and Tagging: this phase is mainly responsible for extracting the user's question terms. The system takes the question and parses it in order to tag each term with its term category. In this phase, parsing of keywords and phrases is done by first parsing compound words and then single words. In addition, term tagging is done by tagging each term with its grammar terminal; each term is tagged at its highest level of abstraction (domain specific).

For the given example, the question will be parsed and tagged as follows:

Question: "Where are the Rocky Mountains?"
Terms extracted: Where, are, the, Rocky Mountains

After parsing, each term in the question is tagged with one of the term categories using the tag-set that was proposed by (Mohasseb et al., 2014) and (Mohasseb et al., 2018). The final tagging will be:

Question Terms Tagging: Where = QW_Where, are = LV, the = D, Rocky Mountains = PN_G

(2) Pattern Formulation: in this phase, after parsing and tagging each term in the question, the pattern is formulated. This is done by matching the question with the most appropriate question pattern, to facilitate the classification process and the identification of the factoid question type in the next phase.

For the given example, the following pattern will be formulated:

Question Pattern: QW_Where + LV + D + PN_G

(3) Question Classification: this phase uses the patterns generated in Phase (2); its aim is to build a model for automatic classification. The classification follows the standard process for machine learning, which involves splitting the dataset into a training set and a testing set. The training set is used for building the model, and the test set is used to evaluate the performance of the model.

For the given example, the question will be classified to the following question type:

Question Type: LOC
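Putting the phases together, the following is a minimal sketch (ours) of Phases 1 and 2 for the worked example; the tags come from lexicons like those sketched earlier, and Phase 3 then maps the resulting pattern string to a coarse class.

def formulate_pattern(tagged_terms):
    """Phase 2: join the grammatical terminals of a parsed question
    into a single pattern string, the representation the classifier sees."""
    return " + ".join(tag for _, tag in tagged_terms)

# Phase 1 output for the worked example (compound words parsed first).
tagged = [("Where", "QW_Where"), ("are", "LV"),
          ("the", "D"), ("Rocky Mountains", "PN_G")]

print(formulate_pattern(tagged))
# 'QW_Where + LV + D + PN_G' -- Phase 3 classifies this pattern as LOC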
4 EXPERIMENTAL STUDY AND
RESULTS
In the experimental study we investigate the ability of machine learning classifiers to distinguish between different question types based on grammatical features and question patterns. Two machine learning algorithms were used for question classification: Support Vector Machine (SVM) and J48. We used 1000, 2000 and 3000 questions selected from the Li and Roth dataset (http://cogcomp.org/Data/QA/QC/). Their distribution is given in Table 2. Questions in the dataset are classified at two levels of granularity, coarse and fine; in this experiment the coarse categories have been used.
Table 2: Data distribution.
Question Type 1000 2000 3000
ABBR 18 30 45
DESC 211 419 655
ENTY 244 486 710
HUM 220 442 655
LOC 156 312 457
NUM 151 311 478
To assess the performance of the proposed features and the machine learning classifiers, two experiments have been conducted: (1) using our proposed features with the Weka software (http://www.cs.waikato.ac.nz/ml/weka/), and (2) using the features most common in previous works, such as n-grams, bag-of-words, the Snowball stemmer and a stop-word remover, with the Knime software (https://www.knime.com). The experiments were set up
using the typical 10-fold cross validation. The results
are presented in the next sub-section.
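A minimal sketch (ours) of this evaluation setup, assuming the questions have already been converted to pattern strings; scikit-learn's LinearSVC and DecisionTreeClassifier stand in for the SVM and J48 (C4.5) implementations used in the paper, and the tiny dataset below is purely illustrative.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Toy pattern strings, repeated so that 10-fold CV is well defined;
# in the experiments these would be the 1000/2000/3000 questions.
patterns = ["QW_Where + LV + D + PN_G", "QW_Who + AV + PN_C"] * 10
labels = ["LOC", "HUM"] * 10

for name, clf in [("SVM", LinearSVC()), ("J48", DecisionTreeClassifier())]:
    model = make_pipeline(
        CountVectorizer(token_pattern=r"\b\w+\b"),  # one feature per tag
        clf,
    )
    scores = cross_val_score(model, patterns, labels, cv=10)  # 10-fold CV
    print(name, scores.mean())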
4.1 Results
In this section we present and analyse the results of the machine learning algorithms for each of the three sets of questions. Table 3 shows the accuracy of the GQCC and n-gram based classifiers for the 1000, 2000 and 3000 questions. In the following sections the results are discussed in more detail.
Table 3: Accuracy of GQCC and n-gram based classifiers for 1000, 2000 and 3000 questions.

Questions | GQCC_SVM | n-gram_SVM | GQCC_J48 | n-gram_J48
1000 | 92.6% | 87% | 95.5% | 86.7%
2000 | 95.1% | 89.3% | 96.6% | 88.9%
3000 | 95.5% | 92.4% | 95.8% | 91.1%
4.1.1 1000 Questions
Table 4 presents the classification performance details (Precision, Recall and F-Measure) of the SVM and J48 classifiers using the proposed grammatical features, while Table 5 presents the classification performance details (Precision, Recall and F-Measure) of SVM and J48 using features such as n-grams, a punctuation eraser, a stop-word remover and the Snowball stemmer. Results show that, when the grammatical features were used, the Decision Tree (GQCC_J48) correctly identified (i.e. Recall) 95.5% of the questions and GQCC_SVM correctly identified 92.6% of the questions, while, when features such as n-grams and the Snowball stemmer were used, the Decision Tree (n-gram_J48) correctly identified 86.7% of the questions and n-gram_SVM correctly identified 87.3% of the questions.
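For reference, a minimal sketch (ours) of how the per-class precision, recall and f-measure reported in Tables 4-9 are derived from a model's predictions, using scikit-learn's classification_report; the gold labels and predictions below are purely illustrative.

from sklearn.metrics import classification_report

# Illustrative gold labels and predictions for a handful of questions.
y_true = ["LOC", "HUM", "ENTY", "LOC", "NUM", "DESC"]
y_pred = ["LOC", "HUM", "ENTY", "ENTY", "NUM", "DESC"]

# Per-class precision, recall and f1-score, as in Tables 4-9; the
# overall accuracy corresponds to the figures reported in Table 3.
print(classification_report(y_true, y_pred, zero_division=0))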
Comparing the performance of the classifiers when 1000 questions were evaluated, GQCC_J48 had a better performance than GQCC_SVM: GQCC_J48 has the highest precision, recall and f-measure for all the classes, while GQCC_SVM reaches the same precision (100%) as GQCC_J48 for class ABBR.

When comparing the performance of n-gram_J48 and n-gram_SVM, n-gram_SVM has better precision, recall and f-measure for the ABBR and NUM classes. In addition, both classifiers have identical precision, recall and f-measure for class HUM. For classes DESC and ENTY, n-gram_SVM has a recall of 100%, while n-gram_J48 has better precision and f-measure.

Furthermore, comparing the classification performance of GQCC_SVM and n-gram_SVM: GQCC_SVM has better precision and f-measure for classes DESC and ENTY, while n-gram_SVM has better recall (100%). For classes HUM and LOC, GQCC_SVM has better recall and f-measure, while n-gram_SVM has better precision. In addition, n-gram_SVM has 100% precision, recall and f-measure for class ABBR, and higher precision, recall and f-measure than GQCC_SVM for class NUM.

Comparing GQCC_J48 and n-gram_J48: GQCC_J48 has 100% precision, recall and f-measure for classes ABBR, DESC and HUM. For class ENTY, GQCC_J48 has higher recall and f-measure, while n-gram_J48 has higher precision. For class LOC, GQCC_J48 has better precision and f-measure, while n-gram_J48 has better recall.
Table 4: Performance of the classifiers using grammatical features (1000 questions).

Class | GQCC_SVM (P / R / F) | GQCC_J48 (P / R / F)
ABBR | 1.000 / 0.833 / 0.909 | 1.000 / 1.000 / 1.000
DESC | 0.995 / 0.995 / 0.995 | 1.000 / 1.000 / 1.000
ENTY | 0.845 / 0.893 / 0.869 | 0.873 / 0.955 / 0.912
HUM | 0.995 / 0.995 / 0.995 | 1.000 / 1.000 / 1.000
LOC | 0.848 / 0.821 / 0.834 | 0.936 / 0.840 / 0.885
NUM | 0.848 / 0.821 / 0.834 | 0.986 / 0.940 / 0.963

Table 5: Performance of the classifiers using n-gram features (1000 questions).

Class | n-gram_SVM (P / R / F) | n-gram_J48 (P / R / F)
ABBR | 1.000 / 1.000 / 1.000 | 1.000 / 0.800 / 0.889
DESC | 0.887 / 1.000 / 0.940 | 0.911 / 0.984 / 0.947
ENTY | 0.712 / 1.000 / 0.831 | 0.965 / 0.743 / 0.840
HUM | 1.000 / 0.788 / 0.881 | 1.000 / 0.788 / 0.881
LOC | 1.000 / 0.553 / 0.712 | 0.605 / 0.978 / 0.748
NUM | 1.000 / 0.933 / 0.966 | 0.953 / 0.911 / 0.932
4.1.2 2000 Questions
Table 6 presents the classification performance details (Precision, Recall and F-Measure) of the SVM and J48 classifiers using the proposed grammatical features, while Table 7 presents the classification performance details (Precision, Recall and F-Measure) of SVM and J48 using features such as n-grams, a punctuation eraser, a stop-word remover and the Snowball stemmer. Results show that, when the grammatical features were used, the Decision Tree (GQCC_J48) correctly identified (i.e. Recall) 96.6% of the questions and GQCC_SVM correctly identified 95.1% of the questions, while, when features such as n-grams and the Snowball stemmer were used, the Decision Tree (n-gram_J48) correctly identified 88.8% of the questions and n-gram_SVM correctly identified 89.3% of the questions.
When 2000 questions were evaluated, GQCC_J48 outperformed GQCC_SVM in terms of precision, recall and f-measure for the ABBR, ENTY, LOC and NUM classes, while for DESC and HUM both classifiers reached 100% on all three measures. For the n-gram based classifiers, n-gram_SVM and n-gram_J48 performed identically for class ABBR, and n-gram_SVM had better precision, recall and f-measure for class NUM, while for class ENTY n-gram_J48 had better precision and f-measure and n-gram_SVM had better recall. In addition, for class DESC n-gram_SVM had better recall while n-gram_J48 had better precision and f-measure. On the other hand, for class LOC, n-gram_SVM had better precision while n-gram_J48 had better recall and f-measure.

Comparing GQCC_SVM with n-gram_SVM: GQCC_SVM had better precision, recall and f-measure for classes ABBR, DESC and HUM, while n-gram_SVM had better precision, recall and f-measure for class NUM. Moreover, for class ENTY, GQCC_SVM has better precision and f-measure, while n-gram_SVM has better recall. On the other hand, for class LOC, GQCC_SVM has better recall and f-measure while n-gram_SVM has higher precision.

Comparing GQCC_J48 and n-gram_J48: GQCC_J48 has better performance in terms of precision, recall and f-measure for classes ABBR, DESC, HUM and NUM. For class ENTY, GQCC_J48 has higher recall and f-measure and n-gram_J48 has better precision. On the other hand, for class LOC, GQCC_J48 has better precision and f-measure while n-gram_J48 has higher recall.
Table 6: Performance of the classifiers using grammatical features (2000 questions).

Class | GQCC_SVM (P / R / F) | GQCC_J48 (P / R / F)
ABBR | 1.000 / 0.967 / 0.983 | 1.000 / 1.000 / 1.000
DESC | 1.000 / 1.000 / 1.000 | 1.000 / 1.000 / 1.000
ENTY | 0.915 / 0.903 / 0.909 | 0.954 / 0.936 / 0.945
HUM | 1.000 / 1.000 / 1.000 | 1.000 / 1.000 / 1.000
LOC | 0.859 / 0.901 / 0.879 | 0.881 / 0.929 / 0.905
NUM | 0.960 / 0.936 / 0.948 | 0.977 / 0.952 / 0.964

Table 7: Performance of the classifiers using n-gram features (2000 questions).

Class | n-gram_SVM (P / R / F) | n-gram_J48 (P / R / F)
ABBR | 1.000 / 0.889 / 0.941 | 1.000 / 0.889 / 0.941
DESC | 0.920 / 1.000 / 0.958 | 0.933 / 0.992 / 0.962
ENTY | 0.777 / 0.979 / 0.867 | 0.959 / 0.795 / 0.869
HUM | 0.973 / 0.827 / 0.894 | 0.973 / 0.820 / 0.890
LOC | 0.896 / 0.645 / 0.750 | 0.650 / 0.957 / 0.774
NUM | 0.978 / 0.957 / 0.967 | 0.977 / 0.925 / 0.950
4.1.3 3000 Questions
Table 8 presents the classification performance details (Precision, Recall and F-Measure) of the SVM and J48 classifiers using the proposed grammatical features, while Table 9 presents the classification performance details (Precision, Recall and F-Measure) of SVM and J48 using features such as n-grams, a punctuation eraser, a stop-word remover and the Snowball stemmer. Results show that, when the grammatical features were used, the Decision Tree (GQCC_J48) correctly identified (i.e. Recall) 95.8% of the questions and GQCC_SVM correctly identified 95.5% of the questions, while, when features such as n-grams and the Snowball stemmer were used, the Decision Tree (n-gram_J48) correctly identified 91.1% of the questions and n-gram_SVM correctly identified 92.4% of the questions.
When 3000 questions were evaluated, GQCC_J48 and GQCC_SVM had nearly similar performance: both classifiers had 100% precision, recall and f-measure for class ABBR, and similar recall for classes ENTY and HUM, although GQCC_J48 has higher precision and f-measure for these two classes. In addition, GQCC_SVM has marginally better performance for class LOC, while for classes DESC and NUM GQCC_SVM has equal or higher precision and GQCC_J48 has higher recall and f-measure.

Comparing the performance of n-gram_SVM and n-gram_J48: n-gram_SVM has higher performance for class NUM, while n-gram_J48 reaches 100% precision, recall and f-measure for class ABBR. For classes HUM and LOC, n-gram_SVM has better precision and n-gram_J48 has better recall and f-measure. In addition, for class DESC, n-gram_SVM has better recall and f-measure while n-gram_J48 has better precision, and for class ENTY n-gram_SVM has higher recall while n-gram_J48 has higher precision and f-measure.

Furthermore, comparing GQCC_SVM with n-gram_SVM: GQCC_SVM has higher precision, recall and f-measure for classes ABBR and HUM, while n-gram_SVM has higher precision, recall and f-measure for class NUM. In addition, for classes DESC and ENTY, GQCC_SVM has better precision and f-measure while n-gram_SVM has better recall. On the contrary, for class LOC, GQCC_SVM has better recall and f-measure while n-gram_SVM has better precision.

Comparing GQCC_J48 and n-gram_J48: GQCC_J48 has better performance for classes DESC and HUM, while both classifiers have identical precision, recall and f-measure (100%) for class ABBR. Moreover, for classes ENTY and NUM, GQCC_J48 has better recall and f-measure while n-gram_J48 has better precision, and for class LOC, n-gram_J48 has better recall and GQCC_J48 has better precision and f-measure.
Table 8: Performance of the classifiers using grammatical features (3000 questions).

Class | GQCC_SVM (P / R / F) | GQCC_J48 (P / R / F)
ABBR | 1.000 / 1.000 / 1.000 | 1.000 / 1.000 / 1.000
DESC | 0.998 / 0.998 / 0.998 | 0.998 / 1.000 / 0.999
ENTY | 0.917 / 0.920 / 0.918 | 0.937 / 0.920 / 0.928
HUM | 0.998 / 0.998 / 0.998 | 1.000 / 0.998 / 0.999
LOC | 0.861 / 0.908 / 0.884 | 0.859 / 0.904 / 0.881
NUM | 0.987 / 0.931 / 0.958 | 0.970 / 0.948 / 0.959

Table 9: Performance of the classifiers using n-gram features (3000 questions).

Class | n-gram_SVM (P / R / F) | n-gram_J48 (P / R / F)
ABBR | 1.000 / 0.909 / 0.952 | 1.000 / 1.000 / 1.000
DESC | 0.961 / 1.000 / 0.980 | 0.990 / 0.954 / 0.972
ENTY | 0.788 / 0.981 / 0.874 | 0.977 / 0.803 / 0.881
HUM | 0.995 / 0.866 / 0.925 | 0.924 / 0.934 / 0.929
LOC | 0.922 / 0.691 / 0.790 | 0.700 / 0.971 / 0.813
NUM | 0.991 / 0.942 / 0.966 | 0.985 / 0.916 / 0.949
5 DISCUSSION
These results indicate that, in terms of precision, recall and f-measure, GQCC_J48 and GQCC_SVM had the better performance when 1000, 2000 and 3000 questions were evaluated. For class type NUM, which consists of questions such as how many, how much and how long, n-gram_SVM performed marginally better than both GQCC classifiers on all three question sets, which indicates that n-gram based classifiers are more suitable for the identification of this type. GQCC performed better for all other classes (ABBR, DESC, LOC, ENTY and HUM): combining grammatical features and domain-specific grammatical features improved the classification of these types and enabled the machine learning algorithms to better differentiate between the class types, since questions in these classes contain terms related to company names, geographical areas, places and buildings, etc. (e.g. "What does IBM stand for?", "What is the name of the largest water conservancy project in China?", "Who was Jean Nicolet?").

Comparing our approach with the state-of-the-art methods shown in Table 10, the majority of previous works used SVM for the classification process; our experiments show that other classifiers, like J48, can achieve better performance and classification accuracy.
Table 10: Previous approaches performance.

Authors | Features | Algorithms | Acc.
(Li and Roth, 2006) | syntactical features | Sparse Network of Winnows (SNoW) | 92.5%
(Zhang and Lee, 2003) | bag-of-words features | SVM | 85.8%
(Huang et al., 2008) | head word features, unigrams and WordNet | linear SVM / maximum entropy | 89.2% / 89%
(Metzler and Croft, 2005) | syntactic and semantic features | SVM | 90.2%
(Li et al., 2008) | head noun tagging, syntactical and semantic features | SVM | 85.6%
(Nguyen et al., 2008) | question patterns and designed features | SVM | 95.2%
(Mishra et al., 2013) | semantic features with lexico-syntactic features | KNN, NB, SVM | 96.2%
(Van-Tu and Anh-Cuong, 2016) | question patterns | SVM | 95.2%
(May and Steinberg, 2004) | part-of-speech, parse signatures and WordNet | SVM, MaxEnt, NB, Decision Tree | 77.8%
(Xu et al., 2016) | Parts-of-Speech, Bi-Gram and named entities | SVM | 93.4%

The proposed hierarchical classifier in (Li and Roth, 2006) classified questions into fine-grained classes using the Sparse Network of Winnows (SNoW);
using only syntactical features, the approach achieved an accuracy of 92.5% for the coarse-grained classes. In (Zhang and Lee, 2003) bag-of-words features were used with different machine learning algorithms, among which SVM performed best, achieving an accuracy of 85.8% on the coarse-grained classes. Furthermore, in (Huang et al., 2008) head word features were used in addition to WordNet and unigrams; using linear SVM and maximum entropy models, the proposed approach achieved accuracies of 89.2% and 89%, respectively. In (Metzler and Croft, 2005) the statistical classifier is based on SVM and achieved an accuracy of 90.2% using the coarse-grained classes. In (Li et al., 2008) head noun tagging was combined with syntactical and semantic features; Conditional Random Fields (CRFs) and SVM were used for the classification process, and the model achieved an accuracy of 85.6%. In addition, in (Nguyen et al., 2008) the proposed method, based on question patterns and designed features, achieved an accuracy of 95.2% using SVM. In (Mishra et al., 2013) a combination of semantic features with lexico-syntactic features achieved an accuracy of 96.2% for coarse classification. The work in (Van-Tu and Anh-Cuong, 2016), based on a new type of features and question patterns, obtained an accuracy of 95.2% on the coarse-grained classes using SVM. Moreover, the hierarchical classifier in (May and Steinberg, 2004) achieved an accuracy of 77.8% using different classifiers such as SVM, MaxEnt, NB and Decision Tree. Finally, in (Xu et al., 2016) the proposed SVM-based approach achieved an accuracy of 93.4% using a Bi-Gram model and an SVM kernel function.
In conclusion, GQCC achieved better results than the previous approaches due to its ability to identify the different classes of factoid questions using domain-specific information, which facilitates the identification of domain categories, unlike previous works which used the additional fifty fine-grained categories.
6 CONCLUSION AND FUTURE
WORK
A framework has been adapted for question categorization and classification. The framework consists of three main feature types: grammatical features, domain-specific features and grammatical patterns. These features help in utilizing the structure of the questions. In addition, the performance of different machine learning algorithms (J48 and SVM) was investigated. The results show that our solution leads to a good performance in classifying questions compared with state-of-the-art approaches. As future work, we aim to investigate the impact of combining different features, like semantic, syntactic and lexical attributes, and compare the results. We are also planning to test other machine learning algorithms to classify the questions.
REFERENCES
Bu, F., Zhu, X., Hao, Y., and Zhu, X. (2010). Function-
based question classification for general QA. In Pro-
ceedings of the 2010 conference on empirical methods
in natural language processing, pages 1119–1128.
Association for Computational Linguistics.
Bullington, J., Endres, I., and Rahman, M. (2007). Open
ended question classification using support vector ma-
chines. MAICS 2007.
Huang, Z., Thint, M., and Qin, Z. (2008). Question clas-
sification using head words and their hypernyms. In
Proceedings of the Conference on Empirical Methods
in Natural Language Processing, pages 927–936. As-
sociation for Computational Linguistics.
Kolomiyets, O. and Moens, M.-F. (2011). A survey
on question answering technology from an infor-
mation retrieval perspective. Information Sciences,
181(24):5412–5434.
Le-Hong, P., Phan, X.-H., and Nguyen, T.-D. (2015). Us-
ing dependency analysis to improve question classifi-
cation. In Knowledge and Systems Engineering, pages
653–665. Springer.
Li, F., Zhang, X., Yuan, J., and Zhu, X. (2008). Classi-
fying what-type questions by head noun tagging. In
Proceedings of the 22nd International Conference on
Computational Linguistics-Volume 1, pages 481–488.
Association for Computational Linguistics.
Li, X., Huang, X.-J., and Wu, L.-D. (2005). Question clas-
sification using multiple classifiers. In Proceedings of
the 5th Workshop on Asian Language Resources and
First Symposium on Asian Language Resources Net-
work.
Li, X. and Roth, D. (2006). Learning question classifiers:
the role of semantic information. Natural Language
Engineering, 12(03):229–249.
May, R. and Steinberg, A. (2004). Building a question classifier for a TREC-style question answering system. The Stanford Natural Language Processing Group, Final Projects.
Metzler, D. and Croft, W. B. (2005). Analysis of statistical
question classification for fact-based questions. Infor-
mation Retrieval, 8(3):481–504.
Mishra, M., Mishra, V. K., and Sharma, H. (2013). Ques-
tion classification using semantic, syntactic and lexi-
cal features. International Journal of Web & Semantic
Technology, 4(3):39.
Mohasseb, A., Bader-El-Den, M., and Cocea, M. (2018).
Question categorization and classification using gram-
mar based approach. Information Processing & Man-
agement.
Mohasseb, A., El-Sayed, M., and Mahar, K. (2014). Auto-
mated identification of web queries using search type
patterns. In WEBIST (2), pages 295–304.
Moldovan, D., Paşca, M., Harabagiu, S., and Surdeanu, M.
(2003). Performance issues and error analysis in an
open-domain question answering system. ACM Trans-
actions on Information Systems (TOIS), 21(2):133–
154.
Nguyen, T. T. and Nguyen, L. M. (2007). Improving the ac-
curacy of question classification with machine learn-
ing. In Research, Innovation and Vision for the Fu-
ture, 2007 IEEE International Conference on, pages
234–241. IEEE.
Nguyen, T. T., Nguyen, L. M., and Shimazu, A. (2008).
Using semi-supervised learning for question classifi-
cation. In Information and Media Technologies, vol-
ume 3, pages 112–130. Information and Media Tech-
nologies Editorial Board.
Van-Tu, N. and Anh-Cuong, L. (2016). Improving ques-
tion classification by feature extraction and selection.
Indian Journal of Science and Technology, 9(17).
Xu, S., Cheng, G., and Kong, F. (2016). Research on ques-
tion classification for automatic question answering.
In Asian Language Processing (IALP), 2016 Interna-
tional Conference on, pages 218–221. IEEE.
Zhang, D. and Lee, W. S. (2003). Question classification
using support vector machines. In Proceedings of
the 26th annual international ACM SIGIR conference
on Research and development in information retrieval,
pages 26–32. ACM.