AEMIX: Semantic Veriﬁcation of Weather Forecasts on the Web

Angel-Luis Garrido

, Mar

ıa G. Buey

, Gema Mu

noz

and Jos

e-Luis Casado-Rubio

IIS Department, University of Zaragoza, Zaragoza, Spain

Spanish Meteorological Service (AEMET), Madrid, Spain

Keywords:

Weather Forecast, Information Extraction, Ontologies, Automatic Veriﬁcation.

Abstract:

The main objectives of a meteorological service are the development, implementation and delivery of weather

forecasts. Weather predictions are broadcasted to society through different channels, i.e. newspaper, television,

radio, etc. Today, the use of the Web through personal computers and mobile devices stands out. The forecasts,

which can be presented in numerical format, in charts, or in written natural language, have a certain margin of

error. Providing automatic tools able to assess the precision of predictions allows to improve these forecasts,

quantify the degree of success depending on certain variables (geographic areas, weather conditions, time of

year, etc.), and focus future work on areas for improvement that increase such accuracy. Despite technological

advances, the task of verifying forecasts written in natural language is still performed manually by people in

many cases, which is expensive, time-consuming, and subjected to human errors. On the other hand, weather

forecasts usually follow several conventions in both structure and use of language, which, while not completely

formal, can be exploited to increase the quality of the veriﬁcation. In this paper, we describe a methodology

to quantify the accuracy of weather forecasts posted on the Web and based on natural language. This work

obtains relevant information from weather forecasts by using ontologies to capture and take advantage of the

structure and language conventions. This approach is implemented in a framework that allows to address

different types of predictions with minimal effort. Experimental results with real data are promising, and most

importantly, they allow direct use in a real meteorological service.

1 INTRODUCTION

Humankind has lived for years watchful to weather

and has tried to know how to predict atmospheric con-

ditions for a given location. Today we continue giving

to it a major importance, since many human activities

have a great relationship with meteorology: agricul-

ture, transports, or sport events. Any outdoor event

like a concert or parade, or just planning a leisure

weekend or holidays are also situations where a par-

ticular weather forecast can have a great inﬂuence at

the time of making decisions.

Therefore, an important aspect for both companies

and particulars is the broadcast of weather forecasts.

The ﬁrst daily forecasts were published in newspa-

pers in the 19th century, and the ﬁrst radio weather

forecasts were made in the beginning of the 20th cen-

tury. They were followed immediately by television

broadcasts. Therefore, as expected, the populariza-

tion of the Internet in the 90s and the increase of web

pages created a successful new form of dissemination

of weather forecasts.

In each country, the meteorological services cre-

ated their own websites where they publish predic-

tions. Some examples can be appreciated in Figure 1.

Soon, there were many commercial sites that also pro-

vided similar information. Today, with the growth of

mobile devices, hundreds of applications for consult-

ing weather forecasts have appeared for the different

mobile operating systems. These mobile applications

allow us to practically check in real time the evolu-

tion of the weather, and even tell us when an adverse

weather event is going to happen.

Forecasters make their own interpretation of the

mathematical models and create graphics, maps, and

texts in natural language to explain the weather condi-

tions of the atmosphere which may occur in the next

few hours or days. This interpretation is what we see

every day in our personal computers and our mobile

devices through Internet technologies. An important

task is to check the weather forecasts contrasting them

to the data coming from actual observations. This is

an interesting work, which can provide useful infor-

mation to meteorological services, such as:

• Empirical testing of the accuracy of forecasts.

280

Garrido, A-L., Buey, M., Muñoz, G. and Casado-Rubio, J-L.

AEMIX: Semantic Veriﬁcation of Weather Forecasts on the Web.

In Proceedings of the 12th International Conference on Web Information Systems and Technologies (WEBIST 2016) - Volume 2, pages 280-287

ISBN: 978-989-758-186-1

Figure 1: Meteorological Service’s web pages offering wording weather forecasts.

• Veriﬁcation of predictions made by automatic sys-

tems compared to those manually performed by

humans.

• Detection of frequent and signiﬁcant biases in pre-

dictions. Once they are identiﬁed, they can be an-

alyzed in order to try possible solutions.

The problem emerges when a forecast has been

published using natural language, because the veri-

ﬁcation is not trivial. It is necessary to transform

the worded forecast into veriﬁable numerical data,

and then compare these data to the actual observa-

tions from different meteorological stations situated

in the forecast area. These texts usually predict dif-

ferent atmospheric variables: temperature, precipita-

tion, cloudiness, among others. Accordingly, the type

of data that is necessary to extract from the forecast

varies. Furthermore, the content structure of predic-

tions can be quite different, depending on the person

(or the system software) who has written the forecast.

Fortunately, it is common to have a writing style guide

that helps to standardize weather forecast sentences.

This problem has been already studied from a mete-

orological point of view during the last years (Ma-

son, 1982; Murphy and Winkler, 1992; Jolliffe and

Stephenson, 2012).

Hence, we propose a semantic approach for the

development of tools to identify speciﬁc data (for

example, atmospheric variables and their properties)

from weather forecasts published on the Web in text

format in order to verify them. The process is guided

by the information stored into a special type of ontol-

ogy that contains the knowledge about the structure

and the language use of the weather forecast, as well

as references to pertinent extracting mechanisms. Fi-

nally, we also present an implementation of our ap-

proach, the AEMIX Project

, which have been devel-

oped in collaboration with the Spanish Meteorologi-

cal Service (AEMET). Figure 2 shows some captures

of the software.

This work is structured as follows. Section 2 ana-

lyzes the state of the art. Section 3 explains the differ-

ent steps of the proposed system. Section 4 studies the

preliminary results. Finally, we conclude the paper in

Section 5.

AEMIX stands for AEMet Information eXtraction.

AEMIX: Semantic Veriﬁcation of Weather Forecasts on the Web

281

Figure 2: Some screenshots of the AEMIX software.

2 RELATED WORK

The Semantic Web (Horrocks, 2008) is an extension

of the Web through standards by the World Wide

Web Consortium (W3C), and it promotes common

data formats and exchange protocols on the Web to

provide a common framework that allows data to be

shared and reused across application, enterprise, and

community boundaries (Berners-Lee et al., 2001).

The goal of the Semantic Web research is to allow

the vast range of the web accessible information and

services to be more effectively exploited by both hu-

mans and computer tools. In recent years, the inﬂu-

ence of the Semantic Web in the design of computer

programs, web services and applications in general

is increasing, as can be found at (Aquin et al., 2008;

Breslin et al., 2010; Garrido et al., 2011; Borobia

et al., 2014; Buey et al., 2014; Bobed et al., 2016).

In order to determine and verify weather predic-

tions, systems usually apply statistical calculations by

joining the data which come from the predictions and

observations. For instance, in (Murphy and Winkler,

1987), the authors proposed a general framework for

forecast veriﬁcation based on the joint distribution of

forecasts and observations. Verifying the accuracy of

weather predictions is an important task because the

better the prediction system is, the easier it is to antic-

ipate some critical situations (Mathiesen and Kleissl,

2011; He et al., 2009) .

Regarding weather issues, weather predictions are

the result of numerical and statistical methods and

techniques which try to anticipate the weather that is

going to affect an area. They usually take into ac-

count the actual weather observations and their past

trend. There exist some models such as the Numeri-

cal Weather Prediction (NWP), the Weather Research

and Forecasting (WRF) or the COSMO model (Done

et al., 2004; Baldauf et al., 2011) that allow to get

weather forecasts, and also there are many approaches

that focus on determining and verifying the actual ef-

fectiveness of them. This is an important task because

their improvements anticipate some critical situations

such as solar irradiance or ﬂood alerts (Murphy and

Winkler, 1987; Mathiesen and Kleissl, 2011; He et al.,

2009).

The results of these predictions are given as a set

of numerical data, so it might be difﬁcult for the gen-

eral public to understand them. In order to make these

predictions understandable by everybody, it is neces-

sary to transform them into a comprehensible format,

e.g. to translate them into a natural language format.

Many approaches have been developed to handle this

task automatically (Goldberg et al., 1994; Reiter et al.,

2005; Belz, 2008). Although there are many works

dedicated to mathematical and statistical veriﬁcation

of weather forecasts, to the best of our knowledge,

WEBIST 2016 - 12th International Conference on Web Information Systems and Technologies

282

there are no works dedicated to the veriﬁcation task

of worded weather forecasts.

3 THE AEMIX SYSTEM

In this section, we describe our proposed system,

called AEMIX, which identiﬁes and extracts informa-

tion contained in weather forecasts that are expressed

in a natural language format and downloaded from the

Web. This system aims at verifying that weather pre-

dictions match the real observation data in a speciﬁc

date. Figure 3 depicts the complete process which has

been divided into four stages.

The input of the system consists of a set of weather

forecasts, and a spreadsheet with the equivalent real

data from all the observation stations. After a prepro-

cess, which stores all this information in a database,

the system separates the plain text of the forecast into

fragments according to the atmospheric variable de-

scribed, and then each of these fragments is analyzed

with the aid of an ontology.

AEMIX extracts relevant data from those frag-

ments, and transforms them into equivalent numerical

data. Once the system has converted the forecast into

numerical data, it stores the information in a database,

and, ﬁnally, veriﬁes the results. This veriﬁcation is the

output of AEMIX.

We have used meteorological services style guides

to identify in each weather forecast phrases referred to

particular atmospheric variables. Each of these vari-

ables has an associated set of attributes and informa-

tion about the data required to be found. For exam-

ple, the data extracted from the precipitation might be

its typology (drizzle, rain, snow, or hail), its adjec-

tives and quantiﬁcation (weak, moderate, heavy, very

heavy, or torrential), and the temporal evolution of the

atmospheric phenomenon (persistent, frequent, inter-

mittent, etc.).

It is important to remark that there are certain at-

tributes of atmospheric variables that can not be veri-

ﬁed, because observations of them are sparse or non-

existent (for example, the presence of snow). The

three most relevant variables to verify are the temper-

ature, the precipitation and the wind.

As we mentioned before, the extraction process

is guided by an ontology containing the knowledge

about different meteorological variables, and how to

identify and extract them in the text of a forecast. Fol-

lowing, we explain the different stages of the process.

As we can see in Figure 3, in the very ﬁrst stage

(”’Pre-processor”) we perform a number of different

tasks which clean weather forecasts before the data

extraction. These forecasts are downloaded from the

web and they usually have a special format depending

on the meteorological service. The main components

in this stage are:

• Geographical Information: Area where the

weather forecast is valid. Namely, the name of

a region or a country or even a continent.

• Date and time: Date and time of the writing of

the forecast, either by human or by automatic sys-

tems.

• Range or Type of Forecast: Period of the future

which the prediction is valid for. Examples of

these ranges can be one day, two days, a day and

a half, etc.

• Weather Forecast Text: The plain text of the fore-

cast. It usually needs to be cleaned, because

it contains special characters used internally for

communication. These characters are worthless

data which hinder the work of extraction, so they

should be ﬁltered out.

This stage must be implemented with custom

programming according to the data provided by the

meteorological service web page. As soon as AEMIX

has retrieved all the required information, the system

stores it in a database in order to facilitate its access

together with observation data from a standarized

spreadsheet for the same date, also retrieved from the

corporate website of the meteorological service.

In the second stage (”Text Analyzer”), AEMIX

queries the ontology using a custom-made semantic

interface to obtain the text structure that it expects ac-

cording to the type of weather forecast (see Figure 3).

With this information: 1) AEMIX analyzes the fore-

cast text, 2) the system identiﬁes the different possible

meteorological variables, 3) it uses the most suitable

method for cutting off the texts, also according to the

information provided by the ontology, and 4), ﬁnally,

for each variable, the system returns a set of tuples

(<Variable, Sentences>) with the following infor-

mation:

• Variable: Type of atmospheric variable: temper-

ature, precipitation, storms, visibility, cloudiness

or wind.

• Sentences: Group of sentences which that atmo-

spheric variable refers to in a given forecast.

The third stage (”Data Extractor”) is in charge of

extracting the data from a weather forecast and con-

verting them to a numerical format (see Figure 3).

AEMIX: Semantic Veriﬁcation of Weather Forecasts on the Web

283

Figure 3: Stages of AEMIX system to extract numerical data from weather forecasts on the Web. on the Web. The system

aims at verifying them, by using patterns embedded in an ontology that drives the extraction process.

The input is the set of tuples returned by the previ-

ous stage, including: 1) the name of the atmospheric

variable, and 2) its related sentences in the text.

At this stage, AEMIX asks the ontology for the

most suitable methods in order to extract the required

data from the forecast. Therefore, the system owns

WEBIST 2016 - 12th International Conference on Web Information Systems and Technologies

284

several methods which can be applied on the input

text. Using the knowledge of the atmospheric variable

features depicted on the ontology, the system iden-

tiﬁes the sentence format, and consequently, it uses

regular expressions to recover the accurate data.

The output data of this stage are a set of tuples of

elements (<Attribute, Value>) with the following

information:

• Attribute: Particular characteristic of an atmo-

spheric variable. As an example, if we are refer-

ring to temperature, we can mention ”minimum”,

”maximum”, or ”frosts”.

• Value: Possible attribute values. For example, in

the case of wind, the possible values of the at-

tribute ”direction” are {”N”}, {”NE”}, {”SE”},

{”S”}, {”SW”}, {”W”} or {”NW”}.

As previously mentioned, there are certain at-

tributes of an atmospheric variable that can not be ver-

iﬁed. This is the case of frosts: there are not observa-

tions available referring to frosts. Therefore, on these

cases, AEMIX returns a N/A (Not Applicable) value

in order to know that there is no way to continue with

the veriﬁcation process.

When the system has recovered all the data, it

transforms them into numerical information using a

set of direct rules deﬁned in the ontology. Whenever

the attributes are geographical, the values are a set

of geographical points deﬁning the affected area, or

a reserved word, like ”Rest”, which refers to the rest

of the stations in the region included in the weather

forecast, and not mentioned yet. When no location is

indicated, it means that the forecast covers the whole

region.

Therefore, the system consults the ontology

through the semantic interface and obtains for each

prediction a set of atmospheric variables identiﬁed in

the forecast, and for each variable, AEMIX extracts

a set of tuples (<Attribute, Value>). Finally, all

these data are stored in the database.

In the fourth and last stage (”Veriﬁcation”), we

take for granted that we have stored in the database

both observational data and forecast data (see Fig-

ure 3). These data are indexed to the valid date of the

forecast and the observations, so the information for

each atmospheric variable can be crossed and com-

pared separately.

Regarding geographical data, if the prediction

affects the entire region, the forecast is compared

against observational data of all the meteorological

stations. For example, when the forecast says that

there will be weak showers, these will affect the en-

tire region by default. In an attempt to simplify, we

could say if 0 to 2 mm/h of rainfall are collected in

most stations, the prediction will have been correct.

But the veriﬁcation process at meteorological level is

much more complicated, and a detailed explanation is

out of the scope of this work.

If the forecast affects only an area (deﬁned by a

longitude and latitude) using the attribute ”location”,

only those observation stations situated within that

area are taken into account to make the veriﬁcation,

and the predicted atmospheric variable is compared

just against those observations. Therefore, the veriﬁ-

cation of the meteorological observations in an area

will be done against those weather forecasts in the

same area.

Hence, at the end of the process, the AEMIX sys-

tem will house in its database all the detailed infor-

mation about the two types of data (predictions and

observations) related by date. Forecasters and meteo-

rologists will be able to use this information to carry

out data mining actions and perform any desired ver-

iﬁcations.

4 PRELIMINARY RESULTS

To carry out the system testing, we used the informa-

tion provided by the website of the aforementioned

Spanish meteorological service (AEMET). We down-

loaded both the weather forecasts and observational

data, and we have adapted both the ontology and ex-

traction methods to AEMET’s writing style guide.

These extraction methods are based on symbolic pat-

tern rules.

We used a sample of 2,828 weather forecasts cor-

responding to one year of predictions over Galicia re-

gion. The forecasts were classiﬁed into two types:

FPSP75 and FPSP85, respectively corresponding to

one-day or two-day forecasts. Corresponding ob-

servation data from 58 observation stations was also

downloaded from the web, with the complete infor-

mation about longitude and latitude. We obtained

from the same source datasets of temperature and pre-

cipitation during the same year, for a total of 77,339

observation registers.

Regarding the geographical areas, we have iden-

tiﬁed the forecaster’s linguistic uses to describe the

main areas of the Galicia region, and then we intro-

duced them in the AEMIX database, and next we re-

lated them with the corresponding geographical areas

using an easy graphical interface.

Although the experiments are still ongoing, with

our new enhancements, AEMIX achieved very good

results (above 90% of correct results retrieved).

AEMIX: Semantic Veriﬁcation of Weather Forecasts on the Web

285

5 DISCUSSION

In this work, we have presented a new system, called

AEMIX, able to extract information from worded

weather forecasts downloaded from the Web. It aims

at verifying their accuracy by comparing the forecasts

against observation data from meteorological stations.

Weather forecasts are a special type of texts which

use a very speciﬁc language style. This work is usu-

ally carried out by hand by meteorologists, which is

expensive, time-consuming, and subjected to human

errors.

We have introduced a new approach where an on-

tology, which models the meteorological knowledge,

is in charge of guiding the automatic data extraction

from the Web to verify them. This ontology includes

information about the extraction methods to be

applied for each meteorological variable, and also

decides how to split the forecasts by the meteorolog-

ical variables. The architecture lets embed several

methods to perform the data extraction. The ﬁrst tests

seem to indicate that using semantics tools to guide

the extraction process improves the results obtained

by other approaches, and makes them usable in a real

environment.

Technical Considerations.

In this context, we deal with a particular type

of weather forecasts: those expressed in natural lan-

guage. We considered the use of machine learning

techniques, but in agreement with other studies, we

think it is not advised in these kind of contexts be-

cause it is the typical case of use of a text belonging

to a closed domain.

We have taken advantage of the Semantic Web

tools to design an ontology-driven approach which

improves the results of the classical approach. The

ontology is used as a tool to split the weather fore-

casts according to the atmospheric variables, in order

to use the most appropriate extraction methods. This

way of working avoids errors due to ambiguous lan-

guage.

Moreover, it allows us to integrate any extraction

method dynamically with the system, ranging from

custom made parsers based on easy rules (e.g., detect-

ing the presence of certain keywords), to more com-

plex ones (e.g., using complex rules or statistical ap-

proaches). Besides, this methodology based on plac-

ing the extraction parameters in an external ontology

facilitates the maintenance labors and the evolution of

the system because the changes, adjustments, and ex-

tensions can be done in an easier way.

Meteorological Aspects.

Regarding meteorological aspects, there are many

ways of using the information extracted from the fore-

casts. These could be some possibilities:

• To verify the level of accuracy. The data can be

segmented according to the forecaster, the geo-

graphical area, or a speciﬁc atmospheric variable,

to name a few.

• To study the use of the style guides by forecasters,

in order to check ambiguities in the predictions,

the frequency of use of some keywords, or the

geographical features which they mention. This

would allow to ﬁx common problems found in

worded forecasts, such as the heterogeneity of the

predictions (by different human forecasters), the

bad usage of the language, or the too low (or ex-

cessive) risk taken by forecasters when predicting

a difﬁcult meteorological situation.

• To verify data over time, in order to identify

particular trends over different time periods, and

to localize unconscious biases which can plague

the predictions.

Open Tasks.

Our next steps go ﬁrst to complete in detail the ex-

periments, establishing a baseline with current tech-

nology, and checking the improvement experienced

with our approach. From a meteorological point of

view, it would be interesting improve extraction meth-

ods to ensure the reliability of extraction, particularly

in forecasts where major meteorological phenomena

are involved. For instance, an extraordinary tempera-

ture increase, or a very heavy rainfall. With regard to

veriﬁcation, a very deep study of the results should be

made to draw conclusions from a purely meteorolog-

ical perspective, which will be the subject of further

works. Therefore, we do not intend with this work to

evaluate the accuracy of the work of meteorologists,

but offer an integrated solution in order to help to im-

prove the quality of weather forecasts in the future.

ACKNOWLEDGEMENTS

This research work has been supported by the CICYT

project TIN2013-46238-C4-4-R, and DGA-FSE. The

authors also wish to thank Dr. Eduardo Mena and

AEMET for their support.

WEBIST 2016 - 12th International Conference on Web Information Systems and Technologies

286

REFERENCES

Aquin, M. D., Motta, E., Sabou, M., Angeletou, S.,

Gridinoc, L., Lopez, V., and Guidi, D. (2008). To-

ward a new generation of semantic web applications.

Intelligent Systems, IEEE, 23(3):20–28.

Baldauf, M., Seifert, A., F

orstner, J., Majewski, D.,

Raschendorfer, M., and Reinhardt, T. (2011). Oper-

ational convective-scale numerical weather prediction

with the COSMO model: description and sensitivities.

Monthly Weather Review, 139(12):3887–3905.

Belz, A. (2008). Automatic generation of weather forecast

texts using comprehensive probabilistic generation-

space models. Natural Language Engineering,

14(04):431–455.

Berners-Lee, T., Hendler, J., Lassila, O., et al. (2001). The

semantic web. Scientiﬁc american, 284(5):28–37.

Bobed, C., Yus, R., Bobillo, F., Ilarri, S., Bernad, J., Mena,

E., Trillo-Lado, R., and Garrido,

A. L. (2016). Emerg-

ing semantic-based applications. In Semantic Web,

pages 39–83. Springer.

Borobia, J. R., Bobed, C., Garrido, A. L., and Mena, E.

(2014). SIWAM: Using social data to semantically

assess the difﬁculties in mountain activities. In 10th

International Conference on Web Information Systems

and Technologies (WEBIST’14), pages 41–48.

Breslin, J. G., O’Sullivan, D., Passant, A., and Vasiliu, L.

(2010). Semantic web computing in industry. Com-

puters in Industry, 61(8):729–741.

Buey, M. G., Garrido,

A. L., and Ilarri, S. (2014). An ap-

proach for automatic query expansion based on nlp

and semantics. In Database and Expert Systems Ap-

plications, pages 349–356. Springer.

Done, J., Davis, C. A., and Weisman, M. (2004). The

next generation of nwp: Explicit forecasts of convec-

tion using the weather research and forecasting (wrf)

model. Atmospheric Science Letters, 5(6):110–117.

Garrido, A., G

omez, O., Ilarri, S., and Mena, E. (2011).

NASS: News Annotation Semantic System. In 23rd

International Conference on Tools with Artiﬁcial In-

telligence (ICTAI 2011), pages 904–905. IEEE.

Goldberg, E., Driedger, N., and Kittredge, R. (1994). Using

natural-language processing to produce weather fore-

casts. IEEE Expert, 9(2):45–53.

He, Y., Wetterhall, F., Cloke, H., Pappenberger, F., Wil-

son, M., Freer, J., and McGregor, G. (2009). Tracking

the uncertainty in ﬂood alerts driven by grand ensem-

ble weather predictions. Meteorological Applications,

16(1):91–101.

Horrocks, I. (2008). Ontologies and the semantic web.

Communications of the ACM, 51(12):58–67.

Jolliffe, I. T. and Stephenson, D. B. (2012). Forecast veriﬁ-

cation: a practitioner’s guide in atmospheric science.

John Wiley & Sons.

Mason, I. (1982). A model for assessment of weather

forecasts. Australian Meteorological Magazine,

30(4):291–303.

Mathiesen, P. and Kleissl, J. (2011). Evaluation of numer-

ical weather prediction for intra-day solar forecast-

ing in the continental united states. Solar Energy,

85(5):967–977.

Murphy, A. H. and Winkler, R. L. (1987). A general frame-

work for forecast veriﬁcation. Monthly Weather Re-

view, 115(7):1330–1338.

Murphy, A. H. and Winkler, R. L. (1992). Diagnostic veriﬁ-

cation of probability forecasts. International Journal

of Forecasting, 7(4):435–455.

Reiter, E., Sripada, S., Hunter, J., Yu, J., and Davy,

I. (2005). Choosing words in computer-generated

weather forecasts. Artiﬁcial Intelligence, 167(1):137–

169.

AEMIX: Semantic Veriﬁcation of Weather Forecasts on the Web

287