parsing techniques. The feature extraction achieved
98.85% F-measure, which is quite satisfactory.
A study of the features that contributed most to
the success of the classification task was conducted.
For both classifiers, the feature contribution shows the
importance of using more sophisticated, NLP-based
tools in this classification task. Additionally, the com-
plexity of the classification task was shown by the
analysis of the feature value variations between the
different readability levels.
In both scenarios, with five readability levels (A1
to C1) or with three levels (A, B or C), the classifiers
developed here achieved good results, with an
accuracy of 75.11% and 81.44%, respectively, and
most of their errors fall within one level of
the expected result. For comparison, the five-level
classifier performs well against the best classifier
of the LX-CEFR system (Branco et al., 2014) (section 2),
which reached a maximum accuracy of only 30% while
using just the average number of syllables per word in
the classification task. For evaluation purposes, the corpus
used for the classifiers presented here is the same as that used
by the LX-CEFR system, extended with 112 additional texts.
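As an illustration of the two measures discussed above (exact accuracy and errors within one level of the expected result), the following minimal sketch shows how they could be computed. The function names and the sample labels are hypothetical and are not taken from the system or corpus described here; the mapping from the five-level to the three-level scale assumes that A covers A1 and A2, B covers B1 and B2, and C covers C1.

```python
# Hypothetical helpers; the labels below are illustrative, not data from the paper.
LEVELS = ["A1", "A2", "B1", "B2", "C1"]  # five-level scale, ordered by difficulty


def accuracy(gold, predicted):
    """Fraction of texts whose predicted level equals the expected level."""
    hits = sum(g == p for g, p in zip(gold, predicted))
    return hits / len(gold)


def adjacent_accuracy(gold, predicted, levels=LEVELS):
    """Fraction of texts predicted at most one level away from the expected level."""
    rank = {level: i for i, level in enumerate(levels)}
    close = sum(abs(rank[g] - rank[p]) <= 1 for g, p in zip(gold, predicted))
    return close / len(gold)


def collapse(labels):
    """Assumed mapping of five-level labels (A1..C1) onto the three-level scale (A, B, C)."""
    return [label[0] for label in labels]


gold = ["A1", "A2", "B1", "B2", "C1", "B1"]
predicted = ["A1", "B1", "B1", "C1", "A2", "B2"]

exact = accuracy(gold, predicted)
within_one = adjacent_accuracy(gold, predicted)
three_level = accuracy(collapse(gold), collapse(predicted))
print(exact, within_one, three_level)
```

In this toy example the three-level accuracy is higher than the five-level one because some errors between neighbouring levels (e.g. B1 predicted as B2) disappear once the levels are merged.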
The system presented here has already been
made available to the general public through the web,
and it can easily be extended by adding new
features or metrics of interest to the task at hand. Given
the small size of the corpus annotated with
readability levels on the five-level
scale defined by QuaREPE (Grosso et al., 2011a), it
may prove useful to investigate unsupervised learning
techniques, i.e. techniques that do not depend on a
previously classified corpus, for example
cluster analysis, which groups a set
of objects into clusters according to their similarities.
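To make the idea of cluster analysis concrete, the sketch below groups texts by feature similarity with a plain k-means procedure. It is only an assumed illustration of the kind of technique suggested above, not a method evaluated in this work: the feature vectors are invented, the second feature (average words per sentence) is a hypothetical choice, and real features would normally be scaled before clustering.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: returns one cluster index per point."""
    # Assumption: deterministic start from the first k points, for reproducibility.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment


# Hypothetical feature vectors: (average syllables per word, average words per sentence).
texts = [
    (1.8, 8.0), (1.9, 9.0), (2.0, 8.5),     # shorter words and sentences
    (2.6, 21.0), (2.7, 23.0), (2.5, 22.0),  # longer words and sentences
]
clusters = kmeans(texts, k=2)
print(clusters)
```

The resulting clusters carry no readability label by themselves; relating them to the levels of the scale would still require some inspection or a small amount of annotated data.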
This work was supported by national funds through
Fundação para a Ciência e a Tecnologia (FCT) with
reference UID/CEC/50021/2013. The authors grate-
fully acknowledge the use of the corpus classified ac-
cording to the Framework for Teaching Portuguese
Abroad and provided by the Instituto Cam
