The KMA (Korea Meteorological Administration)
operates a global forecasting system with 12 km
resolution based on the Unified Model (UM). The
UM is run twice a day (00 and 12 UTC), producing
forecasts from 6 hours to 66 hours at 3-hour
intervals. A total of 37 potential UM predictors
were employed in our work, including temperature,
humidity, wind speed and accumulated rainfall, as
shown in Table I.
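As a quick check of the forecast schedule described above, the lead times can be enumerated directly. The variable names are illustrative only and are not taken from the KMA system.

```python
# Lead times in hours: 6 h to 66 h inclusive, every 3 h.
lead_times = list(range(6, 67, 3))

# The model is run twice a day, at 00 and 12 UTC.
runs_utc = [0, 12]

n_lead_times = len(lead_times)
# Number of (run, lead time) pairs produced per day.
n_pairs_per_day = n_lead_times * len(runs_utc)

print(n_lead_times, n_pairs_per_day)
```

Each of the 37 predictors is therefore available at 21 lead times per run.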
2.3 MOS (Model Output Statistics)
Numerical weather prediction models contain
numerous parameterizations for physical processes
and numerical stability. Parameterizations are based
on physical laws, but typically contain parameters
whose values are not known precisely.
The MOS technique aims at correcting current
forecasts based on statistical information gathered
from past forecasts. In its most popular form, it is
based on a linear relation between the reference
variable that we want to predict and a set of model
predictors at a certain lead time. The MOS currently
used at KMA for short-term prediction of
temperature adopts the linear regression in
equation (1), which consists of a linear combination
of predictors (or predictor variables). The left-hand
side of equation (1) is the compensation amount
used to correct the forecast.
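$$\Delta T = a_1 x_1 + a_2 x_2 + \cdots + a_N x_N \tag{1}$$
where $x_i$, $i = 1, \ldots, N$, represents one of the
potential predictors in Table 1 and $a_i$ is its
regression coefficient.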
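A minimal sketch of how a linear MOS of this kind could be fitted by ordinary least squares. The data are synthetic, the sizes are arbitrary (the paper uses 37 predictors), and the names `X`, `delta_t`, `a_hat` and `corrected_forecast` are hypothetical. Adding the compensation to the raw forecast is an assumption about how the corrected forecast is formed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for past forecasts: rows are cases, columns are predictors.
n_samples, n_predictors = 200, 5
X = rng.normal(size=(n_samples, n_predictors))

# Synthetic "true" coefficients and the observed compensation amounts.
true_a = np.array([0.5, -1.0, 0.0, 2.0, 0.3])
delta_t = X @ true_a + 0.01 * rng.normal(size=n_samples)

# Least-squares estimate of the regression coefficients.
a_hat, *_ = np.linalg.lstsq(X, delta_t, rcond=None)


def corrected_forecast(raw_temperature, predictors, coeffs):
    """Assumed correction: raw forecast plus the linear compensation."""
    return raw_temperature + predictors @ coeffs


print(np.round(a_hat, 2))
```

In practice a separate set of coefficients would be fitted for each lead time, since the relation is defined at a certain lead time.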
As mentioned in the introduction, one problem
with this method is that a large number of predictor
variables must be included to construct a MOS
model that covers diverse situations. Such a model
may suffer from multicollinearity: when several
predictor variables in a regression model are highly
correlated, the coefficient estimates of the multiple
regression may change erratically in response to
small changes in the model or the data. The other
problem is that the linear model above cannot
represent non-linear relationships between the
temperature compensation and its predictor
variables.
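The multicollinearity problem can be illustrated with a small synthetic experiment. This is only a sketch: the two predictors, the noise levels and all variable names are assumptions chosen for the demonstration, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

# Two highly correlated predictors: x2 is almost a copy of x1.
x1 = rng.normal(size=n)
x2 = x1 + 1e-4 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=n)


def fit(X, y):
    """Ordinary least-squares coefficients."""
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs


a_original = fit(X, y)
# Refit after a small perturbation of the target values.
a_perturbed = fit(X, y + 0.1 * rng.normal(size=n))

cond = np.linalg.cond(X)                              # large when columns are nearly dependent
coef_shift = np.abs(a_original - a_perturbed).max()   # change in the coefficients
pred_shift = np.abs(X @ (a_original - a_perturbed)).max()  # change in the fitted values

print(cond, coef_shift, pred_shift)
```

The condition number of `X` is large, so the individual coefficients are poorly determined. Comparing `coef_shift` with `pred_shift` shows how much the coefficients move relative to the fitted values when the data change slightly.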
3 GENETIC PROGRAMMING BASED MODEL OUTPUT STATISTICS
In recent years, evolutionary optimization
techniques based on Darwinian principles have
become popular for solving complex NP-hard
problems. Genetic programming (GP) is an extension
of the genetic algorithm (GA) and can manipulate
variable-sized entities. The tree representation of
GP chromosomes, as compared with the string
representation typically used in GA, gives GP more
flexibility to encode solution representations for
many model design and optimization applications.
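To make the tree representation concrete, the sketch below encodes an expression as nested tuples and evaluates it recursively. The encoding, the function set and the predictor names are illustrative assumptions, not the representation used in the paper.

```python
import operator

# Assumed function set: binary arithmetic operators.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate a tree given a mapping from variable names to values."""
    if isinstance(tree, tuple):          # inner node: (function, left, right)
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):            # terminal: a predictor variable
        return env[tree]
    return tree                          # terminal: a numeric constant


# (temperature * 0.5) + (humidity - 2.0)
tree = ("+", ("*", "temperature", 0.5), ("-", "humidity", 2.0))
value = evaluate(tree, {"temperature": 10.0, "humidity": 3.0})
print(value)
```

Unlike a fixed-length string, such a tree can grow or shrink, which is what lets GP search over the structure of the expression as well as its parameters.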
The GP algorithm starts with an initial
population of randomly generated individuals.
These individuals, represented by trees, consist of
functions and terminals that are suitable for a
specific problem. GP builds new trees by repeatedly
selecting from a function set (the collection of items
that may appear as nodes in a tree) and stringing
them together. The termination criterion may include
a maximum number of generations to be run as well
as a problem-specific success predicate. Next, each
individual of the population is evaluated by a fitness
function that is defined by the programmer and
measures how well the individual solves the
problem. A new population is then created by
applying the genetic operators of reproduction,
crossover and mutation to individuals that are
selected according to their performance, and the
previous generation is replaced.
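The generational loop described above can be sketched as a small symbolic regression. Everything specific here is an assumption made for illustration: the function and terminal sets, tournament selection, elitism, the operator probabilities, the depth limit and the toy target are not taken from the paper.

```python
import operator
import random

FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["x", 1.0, 2.0]
MAX_DEPTH = 6  # assumed limit that keeps trees from growing without bound


def random_tree(rng, depth):
    """Grow a random tree: inner nodes are functions, leaves are terminals."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(FUNCTIONS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def depth(tree):
    if isinstance(tree, tuple):
        return 1 + max(depth(tree[1]), depth(tree[2]))
    return 0


def paths(tree, prefix=()):
    """Yield the path (sequence of child indices) to every node."""
    yield prefix
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from paths(tree[i], prefix + (i,))


def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def put(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = put(tree[path[0]], path[1:], new)
    return tuple(parts)


def crossover(rng, a, b):
    return put(a, rng.choice(list(paths(a))), get(b, rng.choice(list(paths(b)))))


def mutate(rng, tree):
    return put(tree, rng.choice(list(paths(tree))), random_tree(rng, 2))


def fitness(tree, samples):
    """Mean squared error over the samples; lower is better."""
    errors = [evaluate(tree, x) - y for x, y in samples]
    return sum(e * e for e in errors) / len(errors)


def tournament(rng, scored, k=3):
    """Pick the best of k randomly drawn individuals."""
    return min(rng.sample(scored, k), key=lambda s: s[0])[1]


def evolve(samples, pop_size=60, generations=20, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    best = None
    for _ in range(generations):             # maximum number of generations
        scored = [(fitness(t, samples), t) for t in population]
        gen_best = min(scored, key=lambda s: s[0])
        if best is None or gen_best[0] < best[0]:
            best = gen_best
        if best[0] == 0.0:                   # problem-specific success predicate
            break
        next_pop = [best[1]]                 # keep the best individual (elitism)
        while len(next_pop) < pop_size:
            parent = tournament(rng, scored)
            r = rng.random()
            if r < 0.8:
                child = crossover(rng, parent, tournament(rng, scored))
            elif r < 0.9:
                child = mutate(rng, parent)
            else:
                child = parent               # reproduction: copy unchanged
            next_pop.append(child if depth(child) <= MAX_DEPTH else parent)
        population = next_pop                # replace the previous generation
    return best


# Toy target: y = x^2 + x + 1 sampled on [-1, 1].
samples = [(x / 4, (x / 4) ** 2 + x / 4 + 1.0) for x in range(-4, 5)]
best_error, best_tree = evolve(samples)
print(best_error, best_tree)
```

For a MOS application the single input `x` would be replaced by the set of predictors, and the fitness would compare the compensated forecast with observations.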
The most commonly used form of crossover is
subtree crossover. Given two parents, subtree
crossover randomly (and independently) selects a
crossover point (a node) in each parent tree. Then, it
creates the offspring by replacing the subtree rooted
at the crossover point in a copy of the first parent
with a copy of the subtree rooted at the crossover
point in the second parent, as illustrated in Figure 2.
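Subtree crossover as described can be sketched on trees encoded as nested tuples. The tuple encoding, the path-based addressing of nodes and the helper names are assumptions for illustration.

```python
import random


def subtree_paths(tree, prefix=()):
    """Yield the path (sequence of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):          # inner node: (function, left, right)
        for i in (1, 2):
            yield from subtree_paths(tree[i], prefix + (i,))


def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(parts)


def subtree_crossover(parent1, parent2, rng):
    """Pick one crossover point in each parent independently and at random."""
    point1 = rng.choice(list(subtree_paths(parent1)))
    point2 = rng.choice(list(subtree_paths(parent2)))
    return replace_subtree(parent1, point1, get_subtree(parent2, point2))


parent1 = ("+", "x1", ("*", "x2", "x3"))
parent2 = ("-", ("*", "x4", "x5"), "x6")

# Fixed crossover points for a reproducible illustration:
# parent1's right subtree is replaced by parent2's left subtree.
child = replace_subtree(parent1, (2,), get_subtree(parent2, (1,)))
print(child)

# Randomly chosen crossover points.
random_child = subtree_crossover(parent1, parent2, random.Random(0))
```

Because tuples are immutable, both parents are left unchanged and the offspring is built from copies, as in the description above.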
The most commonly used form of mutation in GP
(which we will call subtree mutation) randomly
selects a mutation point in a tree and substitutes the
subtree rooted there with a randomly generated
subtree. This is illustrated in Figure 3.
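Subtree mutation can be sketched in the same style. The function and terminal sets, the depth of the generated subtree and the helper names are illustrative assumptions.

```python
import random

FUNCTIONS = ["+", "-", "*"]
TERMINALS = ["x1", "x2", "x3"]


def random_tree(rng, max_depth):
    """Grow a random subtree of at most max_depth levels."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    return (
        rng.choice(FUNCTIONS),
        random_tree(rng, max_depth - 1),
        random_tree(rng, max_depth - 1),
    )


def subtree_paths(tree, prefix=()):
    """Yield the path (sequence of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from subtree_paths(tree[i], prefix + (i,))


def replace_subtree(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(parts)


def subtree_mutation(tree, rng, max_depth=2):
    """Replace the subtree at a random mutation point with a random subtree."""
    point = rng.choice(list(subtree_paths(tree)))
    return replace_subtree(tree, point, random_tree(rng, max_depth)), point


original = ("+", "x1", ("*", "x2", "x3"))
mutant, point = subtree_mutation(original, random.Random(42))
print(point, mutant)
```

Only the subtree rooted at `point` is regenerated; the rest of the tree is copied unchanged. If the root is chosen, the whole tree is replaced.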
An example of GP MOS regression by the GP
tree is shown in Figure 4. Compared to the equation
(1) of the linear regression, a GP based MOS can
Figure 2: Crossover operation of GP.