performing food chunk extraction from each individual text, while another person cross-referenced those manually obtained chunks with the ones obtained from FoodIE. Using this method, counts of true positives (TPs), false negatives (FNs) and false positives (FPs) were obtained, while the true negative category was deemed not applicable to the nature of the problem and its evaluation. Additionally, it was decided that a “partial (inconclusive)” category was necessary, as some of the food chunks were incomplete but nevertheless caught, and thus still carried significant information. This category encompasses all extracted food chunks that were caught but missed at least one token. An example would be “bell pepper”, where FoodIE would only catch “pepper”.
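The bookkeeping behind this four-way classification can be sketched as follows. The token-overlap test is an illustrative assumption (the paper's comparison was carried out by hand), and the chunk lists are invented for the example:

```python
def classify_chunks(gold, predicted):
    """Compare manually annotated food chunks with extracted ones.

    A prediction whose tokens exactly match a gold chunk counts as a TP;
    one that covers only some tokens of a gold chunk is "partial";
    unmatched gold chunks become FNs and unmatched predictions FPs.
    (Illustrative token-overlap matching, not the authors' actual tooling.)
    """
    tp, partial, fp = [], [], []
    matched_gold = set()
    for pred in predicted:
        pred_tokens = set(pred.lower().split())
        hit = None
        for i, g in enumerate(gold):
            g_tokens = set(g.lower().split())
            if i not in matched_gold and pred_tokens & g_tokens:
                hit = (i, g_tokens)
                break
        if hit is None:
            fp.append(pred)
        else:
            matched_gold.add(hit[0])
            if pred_tokens == hit[1]:
                tp.append(pred)
            else:
                partial.append(pred)  # caught, but missed at least one token
    fn = [g for i, g in enumerate(gold) if i not in matched_gold]
    return tp, partial, fn, fp

# The paper's example: FoodIE catches only "pepper" for "bell pepper",
# which lands in the partial (inconclusive) category rather than TP or FN.
tp, partial, fn, fp = classify_chunks(
    gold=["bell pepper", "olive oil"],
    predicted=["pepper", "olive oil"])
# partial == ["pepper"], tp == ["olive oil"]
```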
We would have liked to compare the results with the model presented in (Chen, 2017), but we were unable to obtain the model and corpus upon request. We provide a small example comparing FoodIE with drNER (Eftimov et al., 2017), in order to show that they extract food entities at different levels of granularity, so a fair comparison cannot be made.
While the evaluation was being carried out, we kept track of all the false negative instances and constructed a resource set that will improve the performance of FoodIE in future implementations.
4.1 Data
Firstly, a total of 200 recipes were processed and eval-
uated. The original 100 recipes, which were analyzed
and upon which the rule engine was built, were taken
into consideration, as well as 100 new recipes which
had not been analyzed beforehand. The recipes were
taken from two separate user-based sites, Allrecipes
(https://www.allrecipes.com/) and MyRecipes (https:
//www.myrecipes.com/), where there is no standard-
ized format for the recipe description. These sources were chosen to ensure that the linguistic constructs used in each written piece varied and followed no fixed pattern. The texts were chosen from a variety of topics, so as to provide further diversity.
Secondly, we selected 1,000 independently ob-
tained recipes from Allrecipes (Groves, 2013), which
is the largest food-focused social network, where everyone plays a part in helping cooks discover and share home cooking. We selected Allrecipes because there is no limitation as to who can post recipes, so there is variability in how users express themselves.
The recipes were selected from five recipe categories:
Appetizers and snacks, Breakfast and Lunch, Dessert,
Dinner, and Drinks. From each category, 200 recipes were included in the evaluation set.
The evaluation datasets, including the obtained
results, are publicly available at http://cs.ijs.si/
repository/FoodIE/FoodIE datasets.zip.
4.2 Results and Discussion
The results for TPs, FPs, and FNs of evaluating FoodIE on the dataset of 200 recipes are presented in Table 5. The group “Partial (Inconclusive)” was left out of these evaluations, as some would argue that these instances should be counted as TPs, while others would include them among the FNs. Some examples included here are “empty passion fruit juice”, “cinnamon” and “soda”, where the actual food entity chunks would be “passion fruit juice”, “cinnamon sticks” and “club soda”, respectively. These errors mostly stem from the dual nature of some words: a word may function both as a noun and a verb, or as an adjective and a verb, and for such words the tagger sometimes classifies the tokens incorrectly. In these examples, “empty” is tagged as an adjective, whereas in context it is in fact a verb. The same explanation holds for the other two examples. For these reasons, this category was simply omitted when the evaluation metrics were calculated. Moreover, even if these instances are grouped with either the TPs or the FNs, the results are not significantly affected.
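The claim that omitting the partial category is harmless can be checked by computing precision, recall and F1 under both bordering assumptions. The counts below are hypothetical placeholders for illustration, not the figures from Table 5:

```python
def prf(tp, fp, fn):
    """Standard precision, recall and F1 from TP/FP/FN counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for illustration only (not the paper's results):
# partials are few relative to clear-cut TPs, as in the evaluation.
tp, fp, fn, partial = 950, 30, 40, 20

as_tp = prf(tp + partial, fp, fn)    # partials counted as true positives
as_fn = prf(tp, fp, fn + partial)    # partials counted as false negatives
omitted = prf(tp, fp, fn)            # partials left out, as in the paper
# The omitted-category F1 lies between the two extremes, and the
# extremes themselves differ only marginally when partials are rare.
```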
Regarding the FN category (type II error), there
were some specific patterns that produced the most
instances. One very simple type of FN instance is where the author of the text refers to a specific food by its brand name, such as “allspice” or “Jägermeister”. These are difficult to catch if no additional information follows the brand name. However, if the user includes the general classification of the branded food, FoodIE will catch it; an example would be simply writing “Jägermeister liqueur”. Another instance of a type II
error is when the POS taggers give incorrect tags, as
was the case with some “Partial (Inconclusive)” in-
stances. An example of this is when the tagger misses
chunks such as “mint leaves” and “sweet glazes”,
where both “leaves” and “glazes” are incorrectly clas-
sified as verbs when in this context they should be
tagged as nouns. Another example would be when
the semantic tagger incorrectly classifies some token
within the given context, such as “date” being classified as a noun meaning a day of the year, as opposed to the fruit. Furthermore, there exist FNs
which are simply due to the rarity of the food, such
as “kefir”, “couscous” or “stevia”, the last one being
of immense importance to people suffering from dia-
betes, as it is a safe sugar substitute. Another category
of type II errors arises because some foods are often referred to by their colloquial name, such as “half-
ICPRAM 2019 - 8th International Conference on Pattern Recognition Applications and Methods