Authors: Takumi Sonoda and Takao Miura
Affiliation: HOSEI University, Japan
Keyword(s): Collocation, Co-occurrences, Feature Selection, Natural Language Processing.
Related Ontology Subjects/Areas/Topics: Artificial Intelligence and Decision Support Systems; Enterprise Information Systems; Human Factors; Human-Computer Interaction; Interface Design; Natural Language Interfaces to Intelligent Systems; Physiological Computing Systems
Abstract:
In this investigation, we discuss a computational approach to extracting collocations based on both data mining and statistical techniques. We extend n-grams consisting of independent words and count their frequencies after filtering on colligation. We then apply statistical filters to the candidates and compare these feature selection methods from statistical learning with one another. Five methods are evaluated: term frequency (TF), Pairwise Mutual Information (PMI), Dice Coefficient (DC), T-Score (TS), and Pairwise Log-Likelihood ratio (PLL). We found PMI, MC and TS the most effective in our experiments. Using these, we achieved 88 percent accuracy in collocation extraction.
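
To make the statistical filters concrete, the following is a minimal Python sketch of four of the measures named above (TF, PMI, DC, TS), applied to adjacent word pairs. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact formulas, so common textbook definitions are used, the function name `bigram_scores` is hypothetical, the colligation filter is omitted, and the token count stands in for the sample size N.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Score each adjacent word pair with TF, PMI, Dice and T-score.

    Assumption: common textbook definitions, with N taken as the
    number of tokens. The paper's exact formulas may differ.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (x, y), f_xy in bigrams.items():
        f_x, f_y = unigrams[x], unigrams[y]
        # Co-occurrence count expected if x and y were independent.
        expected = f_x * f_y / n
        scores[(x, y)] = {
            "tf": f_xy,
            "pmi": math.log2(f_xy / expected),
            "dice": 2 * f_xy / (f_x + f_y),
            "t_score": (f_xy - expected) / math.sqrt(f_xy),
        }
    return scores


if __name__ == "__main__":
    # Toy corpus, invented purely for illustration.
    toy = "strong tea strong tea weak tea strong coffee".split()
    for pair, s in sorted(bigram_scores(toy).items()):
        print(pair, {k: round(v, 3) for k, v in s.items()})
```

In a pipeline like the one the abstract describes, candidates would presumably be ranked or thresholded by one or more of these scores. The thresholds and the way the measures are combined are not stated here, so they are left out of the sketch.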