several methods that can be applied to the input
text. Using the knowledge of the atmospheric variable
features described in the ontology, the system identifies
the sentence format and then uses regular expressions
to recover the exact data.
The output of this stage is a set of tuples
(<Attribute, Value>) with the following
information:
• Attribute: A particular characteristic of an atmospheric
variable. For example, for temperature, the attributes
include "minimum", "maximum", or "frosts".
• Value: The possible values of an attribute. For example,
in the case of wind, the possible values of the attribute
"direction" are {"N"}, {"NE"}, {"SE"},
{"S"}, {"SW"}, {"W"} or {"NW"}.
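The extraction step described above can be illustrated with a minimal sketch. The patterns, the `PATTERNS` table, and the `extract_tuples` function are assumptions made for illustration; they are not the AEMIX rules, which are derived from the ontology and adapted to the forecaster's writing style.

```python
import re

# Hypothetical patterns, one list per atmospheric variable. In AEMIX the
# sentence formats come from the ontology; these are illustrative only.
PATTERNS = {
    "temperature": [
        ("minimum", re.compile(r"minimum(?: temperatures?)? of (-?\d+)")),
        ("maximum", re.compile(r"maximum(?: temperatures?)? of (-?\d+)")),
    ],
    "wind": [
        ("direction", re.compile(r"\b(NE|NW|SE|SW|N|S|E|W) wind\b")),
    ],
}


def extract_tuples(variable, sentence):
    """Return the (attribute, value) pairs found in one sentence."""
    tuples = []
    for attribute, pattern in PATTERNS.get(variable, []):
        match = pattern.search(sentence)
        if match:
            tuples.append((attribute, match.group(1)))
    return tuples


if __name__ == "__main__":
    print(extract_tuples("wind", "SW wind increasing in the afternoon"))
```

Keeping one pattern per attribute means that each match maps directly onto one <Attribute, Value> tuple.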
As previously mentioned, certain attributes of an
atmospheric variable cannot be verified. This is the
case for frosts: no observations referring to frosts
are available. In these cases, AEMIX therefore returns
an N/A (Not Applicable) value to signal that
the verification process cannot continue.
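A minimal sketch of this behaviour, assuming a hypothetical lookup set of unverifiable (variable, attribute) pairs; only the frosts case is taken from the text, and the function name is illustrative.

```python
NOT_APPLICABLE = "N/A"

# Assumption: unverifiable attributes are listed explicitly. The text
# names frosts as one such case; other entries would be added the same way.
UNVERIFIABLE = {("temperature", "frosts")}


def value_for_verification(variable, attribute, value):
    """Return the value to verify, or N/A when no observations exist."""
    if (variable, attribute) in UNVERIFIABLE:
        return NOT_APPLICABLE
    return value
```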
When the system has recovered all the data, it
transforms them into numerical information using a
set of direct rules defined in the ontology. When
an attribute is geographical, its value is either a set
of geographical points defining the affected area or
a reserved word such as "Rest", which refers to the
remaining stations in the region covered by the weather
forecast that have not yet been mentioned. When no
location is indicated, the forecast covers the whole
region.
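The handling of geographical values can be sketched as follows. This is an illustration under stated assumptions: stations are represented by identifiers, areas are already mapped to the stations they contain, and the function and parameter names are hypothetical.

```python
REST = "Rest"  # reserved word taken from the text


def resolve_stations(location, region_stations, area_stations, mentioned):
    """Map a location value to the set of station ids it covers.

    location        -- None (no location given), REST, or an area name
    region_stations -- all station ids in the forecast region
    area_stations   -- dict: area name -> station ids inside that area
    mentioned       -- station ids already covered earlier in the forecast
    """
    if location is None:
        # No location indicated: the forecast covers the whole region.
        return set(region_stations)
    if location == REST:
        # "Rest": every station in the region not mentioned so far.
        return set(region_stations) - set(mentioned)
    return set(area_stations[location])
```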
In summary, the system consults the ontology
through the semantic interface and obtains, for each
prediction, the set of atmospheric variables identified
in the forecast; for each variable, AEMIX extracts
a set of tuples (<Attribute, Value>). Finally, all
these data are stored in the database.
In the fourth and last stage ("Verification"), we
assume that both observational data and forecast data
are already stored in the database (see Figure 3).
These data are indexed by the valid date of the
forecast and of the observations, so the information
for each atmospheric variable can be cross-referenced
and compared separately.
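As a sketch of this date-based matching, the records can be grouped by valid date and atmospheric variable. The record layout (dictionaries with "date" and "variable" keys) is an assumption for illustration and does not reflect the actual AEMIX database schema.

```python
from collections import defaultdict


def pair_by_date(forecasts, observations):
    """Group forecast and observation records by (valid date, variable)."""
    pairs = defaultdict(lambda: {"forecast": [], "observed": []})
    for record in forecasts:
        pairs[(record["date"], record["variable"])]["forecast"].append(record)
    for record in observations:
        pairs[(record["date"], record["variable"])]["observed"].append(record)
    # Keep only the keys for which both kinds of data are available.
    return {
        key: group
        for key, group in pairs.items()
        if group["forecast"] and group["observed"]
    }
```

Each resulting group holds everything needed to verify one variable on one date, independently of the others.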
Regarding geographical data, if the prediction
affects the entire region, the forecast is compared
against the observational data of all the meteorological
stations. For example, when the forecast says that
there will be weak showers, these are taken to affect
the entire region by default. As a simplification, we
could say that if 0 to 2 mm/h of rainfall are collected
in most stations, the prediction was correct.
The verification process at the meteorological level is,
however, much more complicated, and a detailed
explanation is beyond the scope of this work.
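The simplified weak-showers check can be written as a short sketch. The 0 to 2 mm/h range comes from the text; reading "most stations" as a strict majority is an assumption, as are the function and parameter names. As stated above, the real meteorological verification is considerably more involved.

```python
def weak_showers_verified(rainfall_mm_h, low=0.0, high=2.0, majority=0.5):
    """Check the simplified rule for a 'weak showers' forecast.

    rainfall_mm_h -- one rainfall reading (mm/h) per observation station
    Returns True if more than `majority` of the stations recorded rainfall
    within [low, high], False otherwise, and None if there are no readings.
    """
    if not rainfall_mm_h:
        return None  # nothing to compare the forecast against
    hits = sum(1 for reading in rainfall_mm_h if low <= reading <= high)
    return hits / len(rainfall_mm_h) > majority
```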
If the forecast affects only one area (defined by
longitude and latitude) through the attribute "location",
only the observation stations situated within that
area are taken into account in the verification,
and the predicted atmospheric variable is compared
against those observations alone. In other words,
observations from an area are only ever checked
against the weather forecasts for that same area.
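A minimal sketch of this station filtering, assuming the affected area is approximated by a longitude/latitude bounding box. The text only states that an area is defined by longitude and latitude, so the box shape, the `Station` type, and the function name are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Station:
    name: str
    lon: float
    lat: float


def stations_in_area(stations, lon_min, lon_max, lat_min, lat_max):
    """Return the stations located inside the bounding box (inclusive)."""
    return [
        s
        for s in stations
        if lon_min <= s.lon <= lon_max and lat_min <= s.lat <= lat_max
    ]
```

Only the observations from the returned stations would then be compared against the forecast for that area.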
Hence, at the end of the process, the AEMIX
database holds all the detailed information about
the two types of data (predictions and observations),
related by date. Forecasters and meteorologists
will be able to use this information to carry
out data mining tasks and perform any desired
verifications.
4 PRELIMINARY RESULTS
To test the system, we used the information
provided by the website of the aforementioned
Spanish meteorological service (AEMET). We
downloaded both the weather forecasts and the
observational data, and we adapted both the ontology
and the extraction methods to AEMET's writing style
guide. These extraction methods are based on symbolic
pattern rules.
We used a sample of 2,828 weather forecasts
corresponding to one year of predictions for the region
of Galicia. The forecasts were classified into two types,
FPSP75 and FPSP85, corresponding respectively to
one-day and two-day forecasts. The corresponding
observation data from 58 observation stations were also
downloaded from the web, together with each station's
longitude and latitude. From the same source we obtained
datasets of temperature and precipitation for the same
year, for a total of 77,339 observation records.
Regarding the geographical areas, we identified
the expressions that forecasters use to describe the
main areas of the Galicia region, entered them into
the AEMIX database, and then linked them to the
corresponding geographical areas using a simple
graphical interface.
Although the experiments are still ongoing, AEMIX
with our latest enhancements has achieved very good
results (more than 90% of results retrieved correctly).
AEMIX: Semantic Verification of Weather Forecasts on the Web