with documents intended for public dissemination.
For example, readability measurement is applied to
medical texts, both patient consent documents and
educational brochures for the general public.
Readability metrics are also used to evaluate the
quality of the writing style of draft educational
materials for primary and secondary schools
(López, 1982).
Different authors have contributed indexes of text
readability. In general, they express the complexity
of reading (and consequently of writing) as formulae
which are easy to calculate. Flesch was the pioneer
with his index for evaluating English-language
newspapers. He presented a formula expressing the
readability level in terms of the average number of
words per sentence and the average number of
syllables per word (Flesch, 1948). The original scale
was interpreted as follows: 100 points means an
“easy to read” text, 65 points represents a text
adequate for an average U.S. citizen and 0 points
implies a document which is extremely difficult to
understand.
Kincaid et al. (1981) adapted the Flesch index to
express the educational level required to read and
understand the text. This is particularly relevant for
evaluating the requirements of the WCAG guidelines
(W3C, 2008), because they refer to the secondary
education level as the upper threshold required of
users to understand content.
Gunning (1968) proposed another index in his
book on techniques of clear writing in English. It
uses the average number of words per sentence and
the number of so-called "hard words" (those which
are not used daily by people) as parameters for
calculating the readability factor. The result is the
minimum formal education level required to read the
text easily. Specific adaptations to different
languages have appeared. In the case of the Spanish
language, Spaulding (1951) presented the first
metric. Fernández-Huerta (1959) adapted the Flesch
formula to the Spanish language, and
López-Rodríguez contributed a series of readability
metrics.
There are two Flesch-Kincaid indexes: "reading
ease" and "educational level" (Kincaid et al., 1981).
The first is basically a formula to measure whether a
text is easy or difficult to read depending on the
number of syllables, words and sentences. The basic
premise is that more readable texts generally contain
less complex sentences and, consequently, fewer
words per sentence on average and fewer
over-elaborate words, with fewer syllables on
average.
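As an illustration of the raw quantities these indexes rely on, the following minimal sketch counts sentences, words and syllables in a text. The function names and the counting rules are assumptions made for this example only: sentences are split on ".", "!" and "?", and syllables are approximated by counting groups of consecutive vowels, a rough heuristic for English which would need a language-specific syllabifier for Spanish or for accurate results.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    # Every word is counted as having at least one syllable.
    return max(1, len(groups))


def basic_counts(text: str) -> dict:
    """Return the number of sentences, words and syllables in `text`."""
    # Split on runs of sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # A word is taken to be a run of letters or apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "sentences": len(sentences),
        "words": len(words),
        "syllables": sum(count_syllables(w) for w in words),
    }
```

For instance, `basic_counts("The cat sat. It was happy!")` counts two sentences and six words; the averages used by the indexes then follow by simple division.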
In general, most existing readability metrics are
based on counting the significant lexical and
syntactic elements which appear in the text
(syllables, words, sentences, etc.) and combining
these values with empirically obtained coefficients.
As a summary, Table 1 shows the exact calculation
formulae for the metrics used in this work.
Table 1: Readability metrics used in this work.
Author/year                Expression
Flesch (1948)              $206.835 - 84.6 \cdot \bar{n}_s - 1.015 \cdot \bar{n}_p$
Farr et al. (1951)         $1.599 \cdot p_1 - 1.015 \cdot \bar{n}_p - 31.517$
Gunning (1968)             $0.4 \cdot (\bar{n}_p + l)$
Smith and Kincaid (1970)   $\bar{n}_p + 9 \cdot \bar{n}_l$
Kincaid et al. (1981)      $0.39 \cdot \frac{n_p}{n_f} + 11.8 \cdot \frac{n_s}{n_p} - 15.59$
The meaning of the symbols which appear in these
formulas is the following:
$\bar{n}_s$: Average word length (average number of
syllables per word);
$\bar{n}_p$: Average sentence length (average number
of words per sentence);
$p_1$: Percentage of words in the text with only one
syllable;
$l$: Percentage of long words in the text (words with
three or more syllables);
$n_p$: Number of words in the text;
$n_f$: Number of sentences in the text;
$n_s$: Number of syllables in the text;
$\bar{n}_l$: Average word length in letters (average
number of letters per word);
$n_l$: Number of letters in the text;
$n_{pd}$: Number of different words in the text.
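To make the role of each symbol concrete, the sketch below evaluates the five expressions of Table 1 from raw counts. It is only an illustration under stated assumptions: the names `TextCounts` and `readability_metrics` are hypothetical, the Flesch constants are taken in their commonly published per-word form (206.835, 84.6 and 1.015), and the counts are assumed to have been obtained beforehand by whatever tokenisation the reader prefers.

```python
from dataclasses import dataclass


@dataclass
class TextCounts:
    """Raw counts for one text chunk (hypothetical container)."""
    words: int               # n_p
    sentences: int           # n_f
    syllables: int           # n_s
    letters: int             # n_l
    one_syllable_words: int  # numerator of p_1
    long_words: int          # words with three or more syllables (numerator of l)


def readability_metrics(c: TextCounts) -> dict:
    """Evaluate the five indexes of Table 1 for the given counts."""
    avg_sentence_len = c.words / c.sentences              # words per sentence
    avg_syllables = c.syllables / c.words                 # syllables per word
    avg_letters = c.letters / c.words                     # letters per word
    pct_one_syllable = 100.0 * c.one_syllable_words / c.words
    pct_long = 100.0 * c.long_words / c.words
    return {
        "flesch_1948": 206.835 - 84.6 * avg_syllables - 1.015 * avg_sentence_len,
        "farr_1951": 1.599 * pct_one_syllable - 1.015 * avg_sentence_len - 31.517,
        "gunning_1968": 0.4 * (avg_sentence_len + pct_long),
        "smith_kincaid_1970": avg_sentence_len + 9.0 * avg_letters,
        "kincaid_1981": 0.39 * avg_sentence_len + 11.8 * avg_syllables - 15.59,
    }
```

For a chunk of 100 words in 5 sentences with 150 syllables, 450 letters, 60 one-syllable words and 10 long words, this sketch gives a Flesch value of about 59.6 and a Kincaid grade level of about 9.9.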
These metrics are intended to evaluate the
content complexity of a text: for the first two
indexes, the higher the calculated value, the easier
the text is to understand. Conversely, low values in
the first two metrics and high values in the last three
suggest that the text is difficult to understand. In
most cases, the authors of these indexes recommend
applying the corresponding calculation not to the full
text but to chunks of between 100 and 200 words.
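The chunking recommendation can be sketched as follows. This is a simplified illustration, not a procedure prescribed by the authors of the indexes: the function name `split_into_chunks` is hypothetical, words are separated on whitespace, and sentence boundaries are ignored, so a chunk may start or end in the middle of a sentence. A short trailing chunk is merged into the previous one, so that with the default size every chunk holds between 100 and 199 words whenever the text has at least 100 words.

```python
from typing import List


def split_into_chunks(text: str, size: int = 100) -> List[str]:
    """Split `text` into consecutive chunks of at least `size` words."""
    words = text.split()
    # Cut the word list into consecutive slices of `size` words.
    chunks = [words[i:i + size] for i in range(0, len(words), size)]
    # Merge a short final slice into the previous one so that no chunk
    # (other than a single short text) falls below `size` words.
    if len(chunks) > 1 and len(chunks[-1]) < size:
        tail = chunks.pop()
        chunks[-1].extend(tail)
    return [" ".join(chunk) for chunk in chunks]
```

Each chunk can then be scored separately and the per-chunk values compared or averaged, depending on the purpose of the evaluation.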
ICEIS 2010 - 12th International Conference on Enterprise Information Systems