3. Instead of only one model answer (expected answer), a set of multiple answers can be provided for each question.
The remainder of the paper is organized as follows. Section 2 reviews the literature on automatic answer evaluation methods. Section 3 describes the proposed method for evaluating descriptive answers. Section 4 presents the experiments and results. Section 5 provides the conclusion and recommendations for future work.
2 RELATED WORK
Several techniques have been developed for the automatic evaluation of subjective answers; some of them are summarized below.
Assessment of Answers in Online Subjective Examination. Questions were classified into the following categories: Define, Describe/Illustrate, Differentiate/Distinguish, Discuss/Explain, Enumerate/List/Identify/Outline, Interpret, and Justify/Prove, with the answer considered at the sentence level. The paragraph indexing module receives a set of query words from the question processing module, which it uses to carry out information retrieval. For the answer, part-of-speech tagging (e.g., a Python POS tagger) and shallow parsing were performed to extract only the relevant words or phrases. Lexical resources such as WordNet (synonyms) were used to check correctness. Paraphrasing (synonym-based, lexical/structural-based, and alteration-based) was applied to focus on the intention of the answer. Semantic analysis was carried out using the WordNet dictionary, which determines the density of each word in a given sequence; if more than 50% of the words in a sentence matched, the sentence was considered correct. The overall performance of the system was found to be 70%. The major constraint of the system was that questions involving mathematical formulas, diagrams, and examples were not considered (Dhokrat et al., 2012).
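As a concrete illustration of this kind of WordNet-based sentence matching, the following Python sketch tags the words, expands the model answer with WordNet synonyms, and applies the 50% word-match rule. It assumes NLTK (with its tokenizer, tagger, and WordNet data installed); the helper names and the content-word filter are illustrative assumptions, not details taken from the original system.

# Sketch of WordNet-based sentence matching (assumption: NLTK is installed
# and its tokenizer, POS tagger, and WordNet corpora have been downloaded).
from nltk import word_tokenize, pos_tag
from nltk.corpus import wordnet as wn

def expand_with_synonyms(words):
    """Return the word set enlarged with WordNet synonyms (lemma names)."""
    expanded = set(words)
    for w in words:
        for syn in wn.synsets(w):
            expanded.update(l.name().lower() for l in syn.lemmas())
    return expanded

def sentence_matches(student_sentence, model_sentence, threshold=0.5):
    """Mark a sentence as correct if more than `threshold` of its content
    words appear in the model answer or among its WordNet synonyms."""
    content_tags = ("NN", "VB", "JJ", "RB")  # keep nouns, verbs, adjectives, adverbs
    student_words = [w.lower() for w, t in pos_tag(word_tokenize(student_sentence))
                     if t.startswith(content_tags)]
    model_words = [w.lower() for w, t in pos_tag(word_tokenize(model_sentence))
                   if t.startswith(content_tags)]
    if not student_words:
        return False
    model_vocab = expand_with_synonyms(model_words)
    matched = sum(1 for w in student_words if w in model_vocab)
    return matched / len(student_words) > threshold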
Artificial Intelligence-Based Online Descriptive Answer Verification System. The Answer Verifier Unit consisted of two independent modules: a Cosine Similarity module and the Text Gears Grammar API. The Text Gears grammar API allows the integration of language-processing methods; if the grammar is flawless, the API outputs 1, whereas if the sentence contains any errors, it outputs 0. The Result Set Unit comprised three attributes: grammar, keywords, and QST (Question Specific Terms). Keywords took values from 1 to 6, with 1 denoting excellent and 6 denoting poor. The grammar attribute took values between 0 and 1, with 1 denoting correct usage. Class values ranged from 1 to 9, with 1 being the best and 9 the worst. The two main components of the system were the Information Extraction module and the Weighing Module. The system's main strength was its use of cosine similarity to match keywords. FuzzyWuzzy, a Python module, was used to determine an answer's grade (Jagadamba et al., 2020).
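The two checks described in this summary can be sketched as follows in Python, assuming scikit-learn for the cosine similarity and the FuzzyWuzzy package for the fuzzy grade; the mapping of the fuzzy ratio onto a 1 (best) to 6 (worst) band is an illustrative assumption, not the scale reported by the authors.

# Sketch of the two checks: bag-of-words cosine similarity between the
# student and model answers, and a FuzzyWuzzy ratio mapped to a coarse grade.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from fuzzywuzzy import fuzz

def keyword_cosine_similarity(student_answer, model_answer):
    """Cosine similarity between bag-of-words vectors of the two answers."""
    vectors = CountVectorizer().fit_transform([student_answer, model_answer])
    return cosine_similarity(vectors)[0, 1]

def fuzzy_grade(student_answer, model_answer):
    """Map a FuzzyWuzzy token-set ratio (0-100) to a 1 (best) .. 6 (worst) band."""
    ratio = fuzz.token_set_ratio(student_answer, model_answer)
    return min(6, max(1, 6 - ratio // 20))  # ratio 100 -> 1, ratio below 20 -> 6

student = "A stack is a LIFO data structure supporting push and pop."
model = "A stack is a last-in first-out structure with push and pop operations."
print(keyword_cosine_similarity(student, model), fuzzy_grade(student, model))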
Machine Learning-Based Subjective Answer Evaluation. The system used WordNet, part-of-speech tagging, lemmatization, and tokenization of words and sentences to analyze subjective answers. Data from the scanned images were appropriately retrieved and organized. The examiner provides the input, which consists of the keywords and the model answer sets. Using machine learning techniques, sentences in the model answer were clustered according to the ontology concepts and combined with the ontology map. Once the words were fetched from the ontology, the words in the model answer were merged with the ontology concepts. The score for every keyword was determined by dividing the number of times each word appeared in the student's answer by the total number of words in the response (Bashir et al., 2021).
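The keyword score described here reduces to a simple frequency ratio; the short Python sketch below implements that ratio as stated, with the tokenization and the keyword list as assumed inputs.

# Sketch of the keyword score described above: occurrences of each keyword
# divided by the total number of words in the student's response.
import re

def keyword_scores(student_answer, keywords):
    words = re.findall(r"[a-z0-9']+", student_answer.lower())
    total = len(words) or 1  # guard against an empty answer
    return {kw: words.count(kw.lower()) / total for kw in keywords}

scores = keyword_scores(
    "An operating system manages hardware and software resources.",
    ["operating", "hardware", "scheduler"],
)
print(scores)  # {'operating': 0.125, 'hardware': 0.125, 'scheduler': 0.0}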
Evaluation of Descriptive Responses Using Semantic Relational Features. The model uses text patterns extracted from the responses to categorize the answers. A Naive Bayes classifier was used to classify the questions into factual, inductive, and analytical categories. Factual questions require retrieving facts from the question; these were identified using the interrogative categories who, where, when, how, what, or which. The focus of the answer was inferred by separating the question's phrase or tag from the question using named entity recognition or stemming. The answer categories included explanation, comparison, cause and effect, sequence, and problem and solution. Cosine similarity and the Jaccard metric were used to obtain the similarity score, and the total score is calculated by adding the similarity score and the number of keywords. Because students may express the answer in various ways, further improvements in vocabulary coverage are required. Additionally, grammatical analysis and fingerprinting can be used in the evaluation to examine the meaning conveyed in the responses (Nandini and Uma Maheswari, 2020).
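A minimal Python sketch of this scoring step is given below, combining cosine and Jaccard similarity with a keyword count; the equal weighting of the two similarity measures is an assumption made for illustration.

# Sketch of the scoring step described above: cosine and Jaccard similarities
# between the student and model answers, combined with a matched-keyword count.
from collections import Counter
import math

def _tokens(text):
    # Lowercase, split on whitespace, and strip simple punctuation.
    return [w.strip(".,;:!?()") for w in text.lower().split() if w.strip(".,;:!?()")]

def cosine_sim(a, b):
    ca, cb = Counter(_tokens(a)), Counter(_tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def jaccard_sim(a, b):
    sa, sb = set(_tokens(a)), set(_tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def total_score(student, model, keywords):
    # Similarity score (averaged here for illustration) plus keyword count.
    similarity = (cosine_sim(student, model) + jaccard_sim(student, model)) / 2
    matched_keywords = sum(1 for kw in keywords if kw.lower() in _tokens(student))
    return similarity + matched_keywords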
Automatic Answer Script Evaluation Using NLP. For measuring similarities, various techniques like cosine similarity, Jaccard similarity, bigram similarity, and synonym similarity were utilized. Another strategy involved multiplying the parameter value and