INFORMATION EXTRACTION FOR SUPPORTING A LEARNER’S
EFFORTS TO RECOGNIZE WHAT THE LEARNER DID NOT
UNDERSTAND
Naoki Isogai, Ryo Nishimura, Yasuhiko Watanabe and Yoshihiro Okada
Department of Media Informatics, Ryukoku University, Otsu, Shiga, Japan
Keywords:
Learning support system, Question making support, Q&A site, Information extraction, Support vector machine.
Abstract:
Asking a question is an essential method of learning. In particular, when problems in a learner's question are pointed out, the learner has a chance to recognize what he/she did not understand. We therefore intend to develop a learning support system which points out problems in a learner's questions and gives the learner a chance to recognize what he/she did not understand. In this study, we propose a method of extracting information from questions and their answers posted to Q&A sites for supporting a learner.
1 INTRODUCTION
Asking a question is an essential method of learning. In particular, when problems in a learner's question are pointed out, the learner has a chance to recognize what he/she did not understand. For example,
(Qst 1) kinou yometa webpage ni access dekimasen.
dou shitara ii deshouka? (I cannot access a web-
page which I could read yesterday. What should I
do?)
(Ans 1) URL wo misetekudasai. (Show URL.)
In this case, the questioner could not obtain a solution; however, he/she had a chance to understand the relation between a webpage and its URL. In this way, it is important for a learner to ask a question and receive indications of problems in the question. We therefore intend to develop a learning support system which points out problems in a learner's questions and gives the learner a chance to recognize what he/she did not understand. In order to develop this learning support system, it is necessary to investigate
a method of analyzing a learner's question and pointing out what the learner did not understand, and
a method of extracting information from questions and their answers posted to Q&A sites (websites where users answer each other's questions) for supporting a learner.
In this study, we are concerned with information ex-
traction from questions and answers posted to Q&A
sites.
The point is that our approach differs from question answering (Dumais 02), (Kiyota 02), query expansion (Matsuike 05), (Xu 96), and writing support systems (Hayashi 91), (Yamazaki 99).
Using the following examples, we discuss information for supporting a learner to recognize what he/she did not understand and to make better questions.
(Qst 2) PC wo kidou deki masen. dou shitara ii
deshouka? (I cannot start my PC. What should
I do?)
(Ans 2–1) OS ha nan desu ka? chanto shitsumon shi-
nai to, kotae raremasen. (Which OS? I cannot
make an answer unless you ask a question prop-
erly.)
(Ans 2–2) kidou disk wo tsukaeba, saikidou dekimasu. (You can restart your PC by using a boot disk.)
In (Ans 2–1), the answerer pointed out that the questioner did not describe important information (the OS type). The questioner had a chance to recognize that information about the OS type should be added to his/her question. The questioner probably knew such OS matters; if he/she had had a clue as to which information should be described in the question, he/she would have asked a question such as:
(Qst 2–a) windows XP no PC wo kidou dekimasen.
dou shitara ii deshouka? (I cannot start my win-
dows XP PC. What should I do?)
Incidentally, information which a questioner did not know might also be important for giving the questioner a learning chance, provided that it is easy to confirm. For example, even a questioner who did not know about the use of a boot disk can find, on reading (Ans 2–2), that there is a way of dealing with his/her problem by using one. However, if he/she had no boot disk, the solution described in (Ans 2–2) would be useless. It is not difficult to confirm whether he/she has a boot disk, and if he/she had none, he/she would have asked a question such as:
(Qst 2–b) PC wo kidou deki masen. dou shitara ii deshouka? kidou disk ha motte imasen. (I cannot start my PC. What should I do? I have no boot disk.)
As shown above, information which is easy to confirm is also important for recognizing what the learner did not understand. Such information could be instruments, environments, conditions, or the solutions themselves.
In this study, we propose a method of extracting information for supporting a learner to recognize what he/she did not understand, in other words,
clues as to which information should be described in his/her question, and
information which a questioner does not know but which is easy to confirm
from questions and answers posted on Q&A sites by using a support vector machine (SVM) (Kudoh 00). The point is that the information extracted by our method differs from information extracted for developing knowledge of Q&A systems (Watanabe 08), (Lin 02). In this study, we used questions and answers posted on Yahoo! chiebukuro, which was published by Yahoo! Japan via the National Institute of Informatics.
2 INFORMATION FOR
SUPPORTING A LEARNER TO
RECOGNIZE WHAT HE/SHE
DID NOT UNDERSTAND
In this study, we propose a method of extracting in-
formation for supporting a learner to recognize what
he/she did not understand from questions and their an-
swers posted on the Q&A site. Specifically, we use a support vector machine (SVM) and extract the following kinds of sentences:
important sentences from questions, and
sentences which include information for support-
ing a learner to recognize what he/she did not un-
derstand from answers.
We used the data of Yahoo! chiebukuro for developing experimental data and investigating features for SVM. The data of Yahoo! chiebukuro was published by Yahoo! Japan via the National Institute of Informatics in 2007 (http://research.nii.ac.jp/tdc/chiebukuro.html). This data consists of about 3.11 million questions and 13.47 million answers which were posted on Yahoo! chiebukuro from April 2004 to October 2005. The answers were classified into two types: best answers and normal answers. In this study, from about 470 thousand answers which were posted in the "PC and peripheral equipments" category, we extracted 2251 answers (1058 best and 1193 normal answers) which consist of fewer than four sentences. This is because, we think, it is easier to extract information for supporting a learner to recognize what he/she did not understand from such short answers than from longer answers.
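As a rough sketch of this selection step, the following Python code filters answers by sentence count. The record format, the naive sentence splitter based on Japanese sentence-final punctuation, and the function names are our own assumptions for illustration; they are not the exact procedure applied to the Yahoo! chiebukuro data.

import re

def split_sentences(text):
    # Naive splitter on Japanese sentence-final punctuation;
    # a morphological analyzer such as JUMAN could be used instead.
    return [s for s in re.split(r"(?<=[。？?])", text) if s.strip()]

def select_short_answers(answers, max_sentences=3):
    """Keep answers that consist of fewer than four sentences
    (hypothetical record format: {"text": ..., "is_best": ...})."""
    return [a for a in answers
            if len(split_sentences(a["text"])) <= max_sentences]

# Toy example with two hypothetical answer records:
answers = [
    {"text": "OSは何ですか？ちゃんと質問しないと答えられません。", "is_best": False},
    {"text": "一文目。二文目。三文目。四文目。", "is_best": True},
]
print(len(select_short_answers(answers)))  # -> 1: only the first answer is short enough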
Table 1 shows the results of this investigation. We
show below some examples of questions and their an-
swers which consist of less than four sentences.
(Qst 3) gazou no tokoro ga zenbu (aka, midori, ao) no kigou ni natte shimaun desu kedo, virus deshouka? (Is it a virus? Symbols (red, green, blue) are displayed instead of the image.)
mata dou shitara naose masuka? (And how can I fix it?)
(Ans 3) net jyou no gazou to iu koto deshouka? (An
image on the network?)
kono te no shitsumon wo suru toki ha saiteigen OS no jyouhou kurai ha irenaito kotaere masen. (You must describe at least the OS information when you ask this kind of question, or I cannot answer.)
(Ans 3) was a normal answer to (Qst 3). In this case, we determined that the important sentence of (Qst 3) is the first sentence (Is it a virus? Symbols (red, green, blue) are displayed instead of the image.). Also, we determined that the first sentence (An image on the network?) and the second sentence (You must describe at least the OS information when you ask this kind of question, or I cannot answer.) include clues as to which information should be described in the question. In (Ans 3), the answerer pointed out that the questioner did not describe important information (the OS type), and gave no solution.
(Qst 4) kinkyu nanode, oshiete kudasai. (It is urgent, so please help me.)
Table 1: Results of the investigation of questions and their answers posted on Yahoo! chiebukuro (category: PC and peripheral
equipments). A target sentence (type I) means a sentence including clues as to which information should be described in
his/her question. On the other hand, a target sentence (type II) means a sentence including information which a questioner
does not know but is easy to confirm.
text type        | # of texts | # of sentences | # of important sentences | # of target sentences (type I) | # of target sentences (type II)
question         | 2219       | 6216           | 2893                     | -                              | -
answer (best)    | 1058       | 2116           | -                        | 214                            | 649
answer (normal)  | 1193       | 2160           | -                        | 232                            | 332
ima sugu print shinakya ikenai mono ga arimasu. (There is something I must print right now.)
2ji made desu. (It is due by two o'clock.)
demo, color ink 2 shoku ga nakute koukan suruyou message ga demasu. (However, a message appears saying that two of the color inks have run out and should be replaced.)
mou sukoshi motsudarouto omotte itanode kaioki ha shite imasen. (I have no spare ink because I thought it would last a little longer.)
insatsu ha shirokuro desu. (I want to print the
matter in monochrome.)
nantoka color ink 2 shoku wo koukan sezuni in-
satsu suru urawaza wo shitteiru kata imasenka?
(Do any of you know how to print it without
exchanging the two colors of ink?)
printer no kishu ha epson no PM-A850 desu. (My
printer is epson PM-A850.)
ink ha kuro to, color ink 5 shoku ni cartridge ga wakareteimasu. (The cartridges are separated into black ink and five color inks.)
(Ans 4) printer no property ni “monochrome
insatsu” tte naidesuka? (Do you have
“monochrome print” in the property of the
printer?)
areba, sore wo shiji suru toka. (If there is, try selecting it.)
(Ans 4) was the best answer to (Qst 4). In this case,
we determined that the important sentence of (Qst 4)
is the seventh sentence (Do any of you know how to
print it without exchanging the two colors of ink?).
Also, we determined that the first sentence of (Ans 4)
(Do you have “monochrome print” in the property of
the printer?) includes information which a questioner
does not know but is easy to confirm.
3 FEATURES USED IN MACHINE
LEARNING ON YAHOO!
CHIEBUKURO
In this study, we conducted experiments on questions and their answers posted on Yahoo! chiebukuro to extract the following by using a support vector machine (SVM):
important sentences from questions, and
sentences including information for supporting a
learner to recognize what he/she did not under-
stand from answers.
Figure 1 shows the features S1–S16 used in machine learning (SVM) on Yahoo! chiebukuro. S1–S4 were extracted from the target sentence of the SVM-based extraction process, while S6–S8 were extracted from sentences other than the target sentence. S1–S8 were used in extracting sentences from both questions and answers, whereas S9–S16 were used only in extracting sentences from answers. S9–S11 were extracted from questions, S12–S14 were extracted from the important sentences in questions, and S15 and S16 were extracted from questions and their answers. These features were based on the results of the investigation in Section 2. In the experiments, we used JUMAN (JUMAN 05) for the morphological analysis.
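To make the feature design more concrete, the sketch below assembles, under simplifying assumptions, a few of the kinds of features listed in Figure 1 for one target sentence: word n-grams of the target sentence (S1–S3), its position and the text length (S4, S5), and the nouns shared by a question and its answer (S15, S16). Pre-tokenized input stands in for JUMAN's morphological analysis, and all function and feature names are hypothetical.

def ngrams(tokens, n):
    # Word n-grams joined into single feature strings.
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_features(sent_tokens, sent_index, all_sentences):
    """Simplified versions of features S1-S5 for one target sentence."""
    feats = {}
    for n, tag in ((1, "S1"), (2, "S2"), (3, "S3")):
        for g in ngrams(sent_tokens, n):
            feats[f"{tag}:{g}"] = 1
    feats["S4:num_sentences"] = len(all_sentences)      # sentences in the question/answer
    feats["S4:sentence_number"] = sent_index + 1        # position of the target sentence
    feats["S5:num_words"] = sum(len(s) for s in all_sentences)
    return feats

def shared_noun_features(question_nouns, answer_nouns):
    """Simplified versions of S15 and S16: nouns appearing in both question and answer."""
    shared = set(question_nouns) & set(answer_nouns)
    feats = {f"S15:{noun}": 1 for noun in shared}
    feats["S16:num_shared_nouns"] = len(shared)
    return feats

# Toy example with pre-tokenized sentences (JUMAN would supply tokens and POS tags):
question_sents = [["PC", "wo", "kidou", "dekimasen"], ["dou", "shitara", "ii", "deshouka"]]
answer_sents = [["kidou", "disk", "wo", "tsukaeba", "saikidou", "dekimasu"]]
feats = sentence_features(answer_sents[0], 0, answer_sents)
feats.update(shared_noun_features({"PC", "kidou"}, {"kidou", "disk", "saikidou"}))
print(feats["S16:num_shared_nouns"])  # -> 1 ("kidou" appears in both)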
4 EXPERIMENTAL RESULTS
In this section, we show the results of the following SVM-based experiments and the features that were effective in extracting information for supporting a learner to recognize what he/she did not understand.
Table 2: Results and effective features in Exp. 1, 2 and 3.
Exp.   | effective features            | accuracy | F-measure
Exp. 1 | S1, S2, S3, S4, S5, S6        | 86.04%   | 0.8443
Exp. 2 | S1, S4, S5, S9, S12, S15, S16 | 91.65%   | 0.4773
Exp. 3 | S1, S4, S5, S9, S12, S16      | 86.04%   | 0.6503
S1 word unigrams of the target sentence
S2 word bigrams of the target sentence
S3 word trigrams of the target sentence
S4 number of sentences in the question/answer and the sentence number of the target sentence
S5 number of words in the question/answer
S6 word unigrams of the non-target sentences and
relative position to the target sentence (be-
fore/after)
S7 word bigrams of the non-target sentences and rel-
ative position to the target sentence (before/after)
S8 word trigrams of the non-target sentences and rel-
ative position to the target sentence (before/after)
S9 word unigrams of the question
S10 word bigrams of the question
S11 word trigrams of the question
S12 word unigrams of the important sentence in the
question
S13 word bigrams of the important sentence in the
question
S14 word trigrams of the important sentence in the
question
S15 nouns which are found both in the question and
its answer
S16 number of nouns which are found both in the
question and its answer
Figure 1: The features used in machine learning (SVM) on
Yahoo! chiebukuro.
Exp. 1 extract important sentences from questions
posted on a Q&A site
Exp. 2 extract sentences including clues as to which
information should be described in a question
from answers posted on a Q&A site
Exp. 3 extract sentences including information which a questioner does not know but which is easy to confirm from answers posted on a Q&A site
We conducted Exp. 1, 2, and 3 using TinySVM (Kudoh 00) with a polynomial kernel (d = 2, c = 1). In these experiments, we used the 2219 questions and their 2251 answers in Table 1 as the experimental data.
All experimental results were obtained with 10-fold cross-validation. To calculate the accuracy and F-measure, the experimental data was manually tagged in preparation for the experiments.
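As a rough equivalent of this setup, the following sketch trains a degree-2 polynomial-kernel SVM and evaluates it with 10-fold cross-validation, reporting accuracy and F-measure. It uses scikit-learn rather than TinySVM, and the feature dictionaries and labels are toy placeholders, not the annotated Yahoo! chiebukuro data.

from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy placeholder data: in practice there would be one feature dictionary per
# candidate sentence (cf. the feature sketch above) and a manually assigned
# binary label (1 = target sentence, 0 = other sentence).
feature_dicts = [{"S4:sentence_number": i % 4 + 1, "S16:num_shared_nouns": i % 3}
                 for i in range(50)]
labels = [1 if i % 3 else 0 for i in range(50)]

# Degree-2 polynomial kernel with C = 1, roughly mirroring the reported setting.
model = make_pipeline(
    DictVectorizer(sparse=True),
    SVC(kernel="poly", degree=2, coef0=1.0, C=1.0),
)

# 10-fold cross-validation reporting accuracy and F-measure.
scores = cross_validate(model, feature_dicts, labels, cv=10,
                        scoring=("accuracy", "f1"))
print(scores["test_accuracy"].mean(), scores["test_f1"].mean())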
Table 2 shows the results and effective features in
Exp. 1, 2, and 3.
Finally, we discuss the features which were not selected as effective features in Exp. 2 and 3. In both Exp. 2 and 3, S6, S7, and S8 were not selected as effective features. These features were based on word n-grams of the non-target sentences in the SVM extraction process. This shows that sentences including information for supporting a learner to recognize what he/she did not understand can be extracted without using the non-target sentences. Furthermore, it may suggest that even if a user reads only the sentences which include such information and never reads the other sentences, he/she can still understand and use them.
REFERENCES
Dumais, Banko, Brill, Lin, and Ng: Web question answer-
ing: Is more always better?, ACM SIGIR 2002, 2002.
Kiyota, Kurohashi, and Kido: “Dialog Navigator” A Ques-
tion Answering System based on Large Text Knowl-
edge Base, COLING02, 2002.
Matsuike, Zettsu, Oyama, and Tanaka, Supporting the
Query Modification by Making Keyword Formula of
an Outline of Retrieval Result, IEICE DEWS2005,
1C-i9, 2005 (in Japanese).
Xu and Croft: Query expansion using local and global doc-
ument analysis, ACM SIGIR 1996, 1996.
Hayashi and Kikui: Design and Implementation of Rewrit-
ing Support Functions in a Japanese Text Revi-
sion System, Trans. of IPSJ, Vol.32, No.8, 1991 (in
Japanese).
Yamazaki, Yamamura, and Ohnishi: A Computer Aided
System for Writing in the way of Changable
Styles, IEICE technical report, NLC98-54, 1998 (in
Japanese).
Kudoh: TinySVM: Support Vector Machines, http://chasen.org/~taku/software/TinySVM/index.html, 2002.
Lin, Fernandes, Katz, Marton, and Tellex: Extracting an-
swers from the Web using knowledge annotation and
knowledge mining techniques, 11th TREC, 2002.
Watanabe, Nishimura, and Okada: A Question Answer Sys-
tem Based on Confirmed Knowledge Acquired from a
Mailing List, Internet Research, Vol.18, No.2, 2008.
Kurohashi and Kawahara: JUMAN Manual version 5.1,
Kyoto University, 2005 (in Japanese).