ing job posting templates. A similar procedure can be found in papers by Macskassy (Macskassy and Provost, 2001). The key to his approach is that the user labels historical documents. These data then form a training corpus to which inductive algorithms are applied to build a text classifier.
In the work of Lavrenko (Lavrenko et al., 2000), a set of news stories is correlated with each trend. The goal is to learn a language model for each trend and use it for prediction. A language model captures the statistics of word usage patterns among the news stories in the training set. Once a language model has been learned for every trend, a stream of incoming news can be monitored, and for each story one can estimate which of the known trend models is most likely to have generated it.
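The trend-monitoring idea can be illustrated with one unigram language model per trend. The following is a minimal sketch under our own assumptions (whitespace tokenization, add-one smoothing, toy training stories); the names `train_trend_models` and `most_likely_trend` are hypothetical, and Lavrenko et al. (2000) may use different features and smoothing.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def train_trend_models(trend_news: dict[str, list[str]]) -> tuple[dict[str, Counter], set[str]]:
    """Count word occurrences separately for the training news of each trend."""
    models = {trend: Counter(tok for story in stories for tok in tokenize(story))
              for trend, stories in trend_news.items()}
    vocabulary = set().union(*models.values())  # all words seen in training
    return models, vocabulary

def log_likelihood(story: str, counts: Counter, vocab_size: int) -> float:
    """Log-probability of a story under an add-one smoothed unigram model."""
    total = sum(counts.values())
    return sum(math.log((counts[tok] + 1) / (total + vocab_size))
               for tok in tokenize(story))

def most_likely_trend(story: str, models: dict[str, Counter], vocabulary: set[str]) -> str:
    """Return the trend whose language model most likely generated the story."""
    # Include the story's unseen words in the vocabulary size so the smoothed
    # probabilities are normalized over the same word set for every trend.
    vocab_size = len(vocabulary | set(tokenize(story)))
    return max(models, key=lambda trend: log_likelihood(story, models[trend], vocab_size))

# Hypothetical toy training data: two trends with two stories each.
training = {
    "up": ["profit surged on strong sales", "net income rose sharply"],
    "down": ["net loss widened", "profit fell on weak sales"],
}
models, vocabulary = train_trend_models(training)
print(most_likely_trend("net income surged", models, vocabulary))  # up
```

Working in log space avoids numerical underflow when stories are long; the shared vocabulary size keeps the per-trend scores comparable.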
Both approaches differ from our investigation. One difference is that Lavrenko uses his models of trends and the corresponding news only for day trading. The weak point of this approach is that it is not clear how quickly the market responds to news releases. Lavrenko discusses this, but the problem is that the market response to each individual news story cannot be isolated. News stories build a context in which investors decide what to buy or sell; fresh news occurs in the context of older news and may have a different impact.
In (Kroha and Baeza-Yates, 2005), the relationship between properties of large sets of news and long-term market trends was investigated using bags of news for classification. In (Kroha et al., 2006), the method was improved so that news stories were separated from each other and a fine-grained classification was provided. The results obtained were of a new quality, but the problems of statistical classification methods remain. In the next section we present our new solution.
3 GRAMMAR FOR NEWS TEMPLATES
Some features that are important for classification are given by the sentence structure and not by the term frequency. In the example below, both news stories have the same term frequencies but completely different meanings.
Example 1:
News story 1: “XY company closed with a loss last year but this year will be closed with a profit”.
News story 2: “XY company closed with a profit last year but this year will be closed with a loss”.
(End of example)
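The claim can be checked mechanically: counting the words of the two stories in Example 1 gives identical term-frequency vectors, so any classifier that sees only word counts must put both stories into the same class. A minimal Python sketch, assuming lowercase whitespace tokenization:

```python
from collections import Counter

story_1 = "XY company closed with a loss last year but this year will be closed with a profit"
story_2 = "XY company closed with a profit last year but this year will be closed with a loss"

def term_frequencies(text: str) -> Counter:
    # Bag of words: word order is discarded, only counts remain.
    return Counter(text.lower().split())

print(term_frequencies(story_1) == term_frequencies(story_2))  # True
```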
Some grammatical constructions change the meaning of a sentence relative to the meaning that would be derived from its individual phrases.
Example 2:
“Lexmark’s net income rose 12% but the company warned that an uncertain economy and price competition could weigh on future results.”
(End of example)
Example 3:
“Lexmark’s net income rose 12% but it did not achieve the earnings expectations.”
(End of example)
To overcome the problem presented above, i.e., that two sentences may have the same term frequencies but completely different meanings, we have written a grammar describing grammatical constructions in English that usually give a sentence a positive or negative meaning.
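One construction shared by the examples above is contrast with "but": in Examples 1 to 3 the clause after "but" appears to decide the overall tone. A minimal sketch of such a rule follows, assuming a sentence is split at the first " but " and each clause is scored by counting hypothetical cue words; the cue sets and function names are illustrative assumptions, not the grammar described in this paper.

```python
import re

# Hypothetical cue words; a real grammar would rely on richer phrase lists.
POSITIVE_CUES = {"profit", "rose", "jumped", "surged", "gain"}
NEGATIVE_CUES = {"loss", "fell", "warned", "uncertain", "missed", "not"}

def clause_polarity(clause: str) -> int:
    """Return +1, -1 or 0 according to which cue words dominate the clause."""
    words = re.findall(r"[a-z]+", clause.lower())
    score = sum(w in POSITIVE_CUES for w in words) - sum(w in NEGATIVE_CUES for w in words)
    return (score > 0) - (score < 0)

def sentence_polarity(sentence: str) -> int:
    """Contrast rule: in 'A but B', a non-neutral B overrides A."""
    head, sep, tail = sentence.partition(" but ")
    if sep:
        tail_polarity = clause_polarity(tail)
        if tail_polarity != 0:
            return tail_polarity
    return clause_polarity(head)

example_3 = "Lexmark's net income rose 12% but it did not achieve the earnings expectations."
print(clause_polarity(example_3.partition(" but ")[0]), sentence_polarity(example_3))  # 1 -1
```

The first clause of Example 3 alone scores positive, while the contrast rule returns negative, which is the effect a term-frequency classifier cannot capture.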
We collected news stories that, in our opinion, can have an important influence on markets and divided them according to their content and structure into positive and negative ones.
In the positive news we found phrases such as new contracts, cost cutting, net rose, profit surged, jump in net profit, net doubled, earnings doubled, jump in sales, huge profit, income rose, strong sales, upgraded, swung to a profit.
In the negative news we found, e.g., accounting problems, decline in revenue, downgraded, drop in earnings, drop in net income, expectations down, net fell, net loss, net plunged, profit plunged, profit reduced, earnings decline, profits drop, slashed forecast, lowered forecast, prices tumbled.
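The phrase lists above already support a simple baseline. The sketch below is our own illustration rather than the method of this paper: it counts whole-phrase matches (case-insensitive, on word boundaries) from a subset of the listed phrases and labels a story by the majority. The names `classify_by_phrases` and `count_phrases` are hypothetical.

```python
import re

# A subset of the phrases found in positive and negative news (see the lists above).
POSITIVE_PHRASES = [
    "new contracts", "cost cutting", "net rose", "profit surged",
    "jump in net profit", "net doubled", "earnings doubled", "jump in sales",
    "huge profit", "income rose", "strong sales", "upgraded", "swung to a profit",
]
NEGATIVE_PHRASES = [
    "accounting problems", "decline in revenue", "downgraded", "drop in earnings",
    "drop in net income", "expectations down", "net fell", "net loss",
    "net plunged", "profit reduced", "earnings decline", "profits drop",
    "slashed forecast", "lowered forecast", "prices tumbled",
]

def count_phrases(text: str, phrases: list[str]) -> int:
    """Count whole-phrase occurrences, ignoring case."""
    return sum(len(re.findall(r"\b" + re.escape(p) + r"\b", text, flags=re.IGNORECASE))
               for p in phrases)

def classify_by_phrases(text: str) -> str:
    """Label a story by whichever phrase list matches more often."""
    positive = count_phrases(text, POSITIVE_PHRASES)
    negative = count_phrases(text, NEGATIVE_PHRASES)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "undecided"

print(classify_by_phrases("Net income rose 12% on strong sales."))  # positive
```

Neither story of Example 1 contains any of these phrases, so the sketch labels both "undecided". More generally, counting phrases cannot separate sentences that use the same phrases in a different arrangement, which is the limitation the grammar is meant to address.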
We also investigated the sentence structure in order to classify news such as “Dell’s profit fell 11% due to a tax charge, but operating earnings jumped 21%.” or “EBay said earnings rose 44% but narrowly missed Wall Street expectations, sending shares down 12% in after-hours trading.” We gave the parts of sentences that denote future events priority over those that denote past events, because this is how investors read news.
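The priority of future events over past ones can be sketched as a clause-level rule. Under our own assumptions, a sentence is split into clauses at commas, semicolons and "but"; a clause counts as future-oriented if it contains one of a few hypothetical marker words; and the summed polarity of the future-oriented clauses is used whenever it is non-zero, falling back to the remaining clauses otherwise. The marker and cue word sets are illustrative only and not taken from this paper.

```python
import re

# Hypothetical marker and cue words, chosen only for illustration.
FUTURE_MARKERS = {"will", "could", "expects", "expected", "forecast", "outlook", "warned"}
POSITIVE_CUES = {"rose", "jumped", "surged", "profit", "growth"}
NEGATIVE_CUES = {"fell", "loss", "missed", "weigh", "down", "uncertain"}

def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())

def polarity(words: list[str]) -> int:
    """Sign of (#positive cues - #negative cues) in a clause."""
    score = sum(w in POSITIVE_CUES for w in words) - sum(w in NEGATIVE_CUES for w in words)
    return (score > 0) - (score < 0)

def future_first_polarity(sentence: str) -> int:
    """Decide polarity from future-oriented clauses first, then from the rest."""
    clauses = [tokens(c) for c in re.split(r",|;|\bbut\b", sentence, flags=re.IGNORECASE)]
    future = [c for c in clauses if FUTURE_MARKERS.intersection(c)]
    other = [c for c in clauses if not FUTURE_MARKERS.intersection(c)]
    for group in (future, other):
        total = sum(polarity(c) for c in group)
        if total != 0:
            return (total > 0) - (total < 0)
    return 0

example_2 = ("Lexmark's net income rose 12% but the company warned that an uncertain "
             "economy and price competition could weigh on future results.")
print(future_first_polarity(example_2))  # -1
```

In Example 2 the past clause ("net income rose") is positive, but the warning about future results decides the outcome. The eBay sentence has no future marker, so its past clauses are summed instead.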
In general, we can observe recurring templates in news stories that can be used to associate news stories into groups and to estimate their meaning.
For example, we can distinguish the following
templates:
1. New big contract
2. Acquisitions
3. Income rose
4. Income fell
5. Bad past but good future
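Templates of this kind lend themselves to pattern matching. The sketch below assigns one hypothetical regular expression to each of the five templates and reports every template that matches a story; the patterns are our own rough approximations, not the grammar developed in this paper.

```python
import re

# Hypothetical patterns, one per template; rough approximations for illustration.
TEMPLATES = {
    "New big contract": r"\b(wins?|won|signs?|signed|awarded)\b.*\bcontracts?\b",
    "Acquisitions": r"\b(acquires?|acquired|acquisition|takeover|buys?|bought)\b",
    "Income rose": r"\b(net|income|profits?|earnings)\s+(rose|surged|jumped|doubled)\b",
    "Income fell": r"\b(net|income|profits?|earnings)\s+(fell|plunged|dropped|declined)\b",
    "Bad past but good future": r"\b(loss|fell|declined)\b.*\bbut\b.*\bwill\b.*\bprofit\b",
}

def match_templates(story: str) -> list[str]:
    """Return the names of all templates whose pattern occurs in the story."""
    return [name for name, pattern in TEMPLATES.items()
            if re.search(pattern, story, flags=re.IGNORECASE)]

print(match_templates("Dell's profit fell 11% due to a tax charge, but operating earnings jumped 21%."))
# ['Income rose', 'Income fell']
```

The Dell sentence matches both "Income rose" and "Income fell", which illustrates why the structure of a sentence, and not only the presence of a template, has to be taken into account.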
ICEIS 2007 - International Conference on Enterprise Information Systems
260