Unsupervised Discovery of Significant Candlestick Patterns for
Forecasting Security Price Movements
Karsten Martiny
Institute for Software Systems (STS), Hamburg University of Technology, Hamburg, Germany
Keywords:
Candlestick Patterns, Time Series Analysis, Market Forecasting, Pattern Discovery, Information Extraction,
Unsupervised Learning, Hierarchical Clustering.
Abstract:
Candlestick charts are a visually appealing method of presenting price movements of securities that was developed in Japan centuries ago. The depiction of movements as candlesticks tends to exhibit recognizable
patterns that allow for predicting future price movements. Common approaches to employing candlestick analysis in automatic systems rely on a manual a-priori specification of well-known patterns and infer prognoses
upon detection of such a pattern in the input data. A major drawback of this approach is that the performance
of such a system is limited by the quality and quantity of the predefined patterns. This paper describes a
novel method of automatically discovering significant candlestick patterns from a time series of price data,
thereby enabling an unsupervised machine-learning approach to predicting future price movements.
1 INTRODUCTION
Candlestick charts are a visually appealing method of
presenting price movements. The method of analyzing candlestick charts in order to predict price movements of a security was developed in Japan centuries ago. In fact, the use of candlesticks amongst Japanese rice traders dates back to the early 18th century; the method has thus been established for significantly longer than modern financial markets have existed. However, this technique remained unknown to the western world until the early 1990s, when it was first made accessible in (Nison, 1991). It has gained popularity amongst western analysts ever since.
Due to the increasing importance of machine-driven trading systems, some approaches (as explained below) have been developed to exploit candlestick patterns in algorithmic trading systems. However, previously proposed methods depend on an ini-
tial manual definition of significant candlestick pat-
terns. This paper describes an unsupervised learn-
ing method that is able to identify significant patterns
without any a-priori knowledge and can therefore be
used to develop an adaptive trading system.
This paper is structured as follows. Section 2
provides the preliminaries for analyzing candlestick
charts. Section 3 describes an unsupervised learning method for inferring significant candlestick patterns and shows how these results can be used to infer price movement prognoses.
Section 4 presents an evaluation of the proposed tech-
niques’ prediction performance. The paper finishes
with a summary in Section 5.
2 CANDLESTICK ANALYSIS
To create a candlestick chart, the key points of every
day are used to construct a candle. A day’s key points
consist of its opening, highest, lowest, and closing
price. These data sets are sometimes also referred to as OHLC-data and will subsequently form the actual machine input.
As depicted in Figure 1, the construction of a can-
dle works as follows: a rectangle is drawn between
opening and closing price (called the candle’s body),
a line is drawn from the body’s upper edge to the high-
est value (called upper shadow or upper wick) and,
accordingly, another line is drawn from the body's
lower edge to the lowest value (called lower shadow
or lower wick). Additionally, the candle’s body is col-
ored according to the situation: “rising days” (i.e.,
the closing price is above the opening price) are de-
picted with hollow white bodies, while “falling days”
are marked through solid black bodies.
Note that a candle does not necessarily have to ex-
hibit all of these features: since the opening or closing
values may coincide with a day’s high or low values,
Figure 1: Depiction of time intervals as candles: O = opening price, H = highest price, L = lowest price, C = closing price; if the closing price is above the opening price, the candle's body is white, otherwise it is colored black.
there can be candles with only one or even no shadow
at all. The candlestick’s main advantage is that it al-
lows for an instant perception of the market partici-
pants’ attitude. For instance, as explained in (Nison,
2003), large shadows indicate significant movements
in both directions and are therefore usually associated
with uncertainty. This could give hints that no direc-
tion is currently in favor or that a turning point is im-
minent.
While candlesticks are able to provide valuable in-
sight into a market’s situation, single candlesticks are
usually regarded as too fragile to allow for a progno-
sis of required reliability. Hence, instead of relying
only on a single candle’s shape, forecasts are based
upon constellations of successive candlesticks. These
so-called Candlestick Patterns usually consist of a se-
ries of three candlesticks with certain properties. In
addition to the candles’ shapes, their positions rela-
tive to each other, as well as the prevailing direction
of movement, are taken into consideration. For a thor-
ough guide to candlestick patterns see (Nison, 1991)
or (Morris, 2006).
In general, candlestick patterns can be classified
into two categories: reversal patterns indicate an im-
minent turning of the current movement’s direction
while continuation patterns confirm the current move-
ment. Opposed to other established analysis methods,
candlestick patterns only require a succinct time pe-
riod to form a characteristic pattern and thereby emit
a signal. Consequently, it is only natural to utilize
these patterns for short-term forecasts. In fact, dis-
tinguished patterns have the means to forecast the fol-
lowing day’s direction with a rather high certainty, but
by extending the forecast further into the future, its
reliability will quickly diminish. In (Morris, 2006),
statistical correlations between a forecast’s length and
its quality have been analyzed and it was shown that
a feasible hit ratio can be achieved for a maximum
of three days, while after a maximum of seven days
a pattern’s prognosis is hardly able to outperform a
random guess.
Due to their short-term nature, candlestick pat-
terns may be employed to facilitate strategies with
a rather high trading frequency. Also, since other analysis methods tend to forecast movements on a larger scale, they can be augmented with a candlestick pattern analysis to pinpoint the exact positions of turning points, thereby increasing the quality of the inferred predictions.
3 UNSUPERVISED DETECTION
OF CANDLESTICK PATTERNS
Previous research in the field of candlestick pat-
tern analysis aimed at an automatic detection of pre-
defined significant patterns. This task can be seen as
a form of stream-based query answering. Several ap-
proaches to inferring future price prognoses, based on querying streams for candlestick patterns and incorporating a variety of machine learning techniques, have been proposed: in (Chou et al., 1997), fuzzy membership functions are combined with induction trees to infer predictions; (Lee and Jo, 1999) employs a rule-based expert system to produce trading recommendations. (Ng et al., 2011) proposes to train a ra-
dial basis function neural network in order to derive
investment decisions. In (Lee et al., 2011) descrip-
tions of candlestick patterns through fuzzy linguis-
tic variables are combined with genetic algorithms in
order to obtain investment decisions. In (Lin et al.,
2011), an autonomous trading system is described,
which learns a trading strategy based on candlestick
patterns and other technical indicators through an
echo state network.
While these works differ in the proposed data min-
ing techniques, all of them share the same basic ap-
proach to the task: before any of these systems are
able to infer predictions, they rely on an initial defi-
nition of significant candlestick patterns provided by
a domain expert. These manual definitions are then
used to obtain training data sets. Though all of these works were able to show that their respective approaches yield valuable analysis results, their dependence on a domain expert bears a major drawback: the potential performance of all of these systems is strictly limited by the quality and quantity of the predefined set of significant patterns. In particular, these approaches are unable to discover any supplementary significant patterns that may be contained in training data sets but are unknown to the domain expert. This work describes an alternative approach to classifying candlestick patterns that facilitates an unsupervised learning task. Hence, this approach has the means of discovering all significant latent patterns contained in the training data set without having to rely on the (possibly limited) knowledge of a domain expert and thus enables the development of truly adaptive trading systems.
3.1 Domain Model
3.1.1 Evidence
As explained before, a particular candlestick is described through its OHLC-data and thus these values are used to form the observation space. In order to prevent the system from inferring dependencies on certain price levels, it is necessary to avoid absolute values. Instead of directly using the OHLC-data, the values are scaled such that they denote a relative change with respect to the opening value. In addition to resulting in universally applicable descriptions of candlesticks' shapes, this normalization has the convenient side effect that the evidence space is reduced because the opening value can be omitted. Consequently, the observation of a particular candlestick $CS_t$ for a time interval at $t$ is described by:
$$CS_t = \left(\frac{H}{O},\ \frac{L}{O},\ \frac{C}{O}\right) = (H_t,\ L_t,\ C_t) \tag{1}$$
Next to the included candlestick shapes within each pattern, it is also necessary to provide information about the candlesticks' positions relative to each other. This information is based on the midpoint $M_t$ of a candlestick:
$$M_t = \frac{H_t - L_t}{2} + L_t$$
Using these midpoints, the relative position of a candle $j$ with respect to a preceding candle $i$ is defined as the relative change $\Delta_{ij}$ of their respective midpoints:
$$\Delta_{ij} = \frac{M_j - M_i}{M_i} \tag{2}$$
A candlestick pattern comprises three consecutive candlesticks and therefore, by using Equations 1 and 2, a particular instance of the evidence $e_t^T$ formed by a complete candlestick pattern in a certain trend state $T$ (as explained below) is described by
$$e_t^T = \left(CS_{t-2},\ CS_{t-1},\ CS_t,\ \Delta_{t-2,t-1},\ \Delta_{t-1,t}\right) \tag{3}$$
To complete the description of a particular pattern
it is also required to provide information about the
pattern’s context (i.e., the current trend state). In or-
der to identify this trend state, a very short weighted
moving average (denoted by MA) is used. This indi-
cator calculates the average of the last three candles’
closing values. An upward movement is assumed if
this moving average has been strictly monotonically
increasing for at least two days before a pattern was
formed. Accordingly, a downward movement is as-
sumed for a strictly monotonically decreasing aver-
age. If neither condition is fulfilled the context does
not exhibit a clear trend. Thus the trend state $T_t$ for a certain pattern $CP_t$ is defined as
$$T_t = \begin{cases} 1 & \text{if } MA_{t-4} < MA_{t-3} < MA_{t-2} \\ -1 & \text{if } MA_{t-4} > MA_{t-3} > MA_{t-2} \\ 0 & \text{otherwise} \end{cases} \tag{4}$$
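As an illustrative sketch of this domain model (the paper prescribes no implementation; the Python code and all helper names below are assumptions of this sketch), the normalized features of Equation 1, the evidence vector of Equation 3, and the trend state of Equation 4 can be computed as follows. Note that MA is implemented here as a plain three-day mean of closing values, which is how the text defines it despite calling it a weighted moving average.

```python
import numpy as np

def candle_features(o, h, l, c):
    """Normalized candle shape (Eq. 1): relative change w.r.t. the open."""
    return (h / o, l / o, c / o)

def midpoint(h, l):
    """Midpoint M_t of a candle's full high-low range."""
    return (h - l) / 2.0 + l

def evidence(ohlc, t):
    """11-dimensional evidence vector e_t (Eq. 3) from an (N, 4) NumPy
    array with rows (open, high, low, close); t indexes the last candle."""
    feats = []
    for k in (t - 2, t - 1, t):
        o, h, l, c = ohlc[k]
        feats.extend(candle_features(o, h, l, c))
    m = [midpoint(ohlc[k][1], ohlc[k][2]) for k in (t - 2, t - 1, t)]
    feats.append((m[1] - m[0]) / m[0])    # Delta_{t-2,t-1} (Eq. 2)
    feats.append((m[2] - m[1]) / m[1])    # Delta_{t-1,t}
    return np.asarray(feats)

def trend_state(ohlc, t):
    """Trend state T_t (Eq. 4); ma(k) is the mean close of days k-2..k."""
    ma = lambda k: float(np.mean(ohlc[k - 2:k + 1, 3]))
    if ma(t - 4) < ma(t - 3) < ma(t - 2):
        return 1
    if ma(t - 4) > ma(t - 3) > ma(t - 2):
        return -1
    return 0
```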
3.1.2 Prognosis
The prognosis model employed in this work predicts
the direction of price movements for a short time pe-
riod. For a prediction period of p days into the future,
the prediction (also called hypothesis) $h_t^p$ is defined as$^1$
$$h_t^p = \operatorname{sgn}\left(\Delta_{t,\,t+p}\right) \tag{5}$$
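A training label can be derived directly from Equation 5; the following short sketch (helper names are, again, assumptions) returns the sign of the relative midpoint change $p$ days ahead, mapping sgn(0) to 1 as discussed in the footnote.

```python
def hypothesis(ohlc, t, p):
    """Label h_t^p (Eq. 5): sign of the relative midpoint change between
    t and t+p; sgn(0) is mapped to 1 (see footnote 1)."""
    mid = lambda k: (ohlc[k][1] - ohlc[k][2]) / 2.0 + ohlc[k][2]
    return 1 if (mid(t + p) - mid(t)) / mid(t) >= 0 else -1
```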
3.2 Pattern Detection
The central goal of this work is the automatic detec-
tion of significant patterns (i.e., patterns that tend to
indicate similar subsequent price movements) solely
from training data without any manually defined a-
priori information. While the task of inferring predic-
tions based on unknown pattern memberships resem-
bles the process of learning memberships of unknown
Gaussian mixture components, this domain bears the
additional challenge that the number of mixture com-
ponents (i.e., the number of significant patterns) is
unknown a-priori, too. Thus, conventional clustering procedures such as EM-clustering, SVM-clustering, or k-means clustering are not applicable for this task, as all of these methods require an a-priori definition of the number of clusters. Since the feature values of observed candle instances are distributed roughly evenly, density-based clustering methods are not applicable either: although they are able to cope with an unknown number of clusters, the feature distribution would result in a single large cluster (with the exception of occasional outliers), provided that the training set is sufficiently large. In order to cope with both the unknown shapes and numbers of significant patterns, this work uses a hierarchical clustering structure that allows querying for significant patterns based on characteristic properties.

¹ Strictly speaking, this definition would yield three different forecasting states because, next to rising and falling days, it may also happen that midpoint positions are exactly equal. However, since a pair of candles virtually never exhibits exactly the same midpoint values in practice, the hypothesis space is modeled in a binary way. In order to achieve an exhaustive definition, the forecast is implemented with the assumption $\operatorname{sgn}(0) = 1$. Due to its practical irrelevance, one could also keep the third, unneeded state or use other assumptions without impairing the results.

Figure 2: Schematic depiction of a hierarchical cluster structure (this type of graph is called a dendrogram).
3.2.1 Hierarchical Clustering
Hierarchical clustering methods construct an ordered
structure of clusters such that the distance according
to some distance measure between respective cluster
members increases with the hierarchical order of a
cluster. To illustrate this, Figure 2 depicts an example
of a hierarchical cluster structure: in a complete struc-
ture, the topmost cluster contains all elements of the
data set and its children are split such that they form
two clusters with a minimal distance between all re-
spective cluster elements. This work uses agglomera-
tive hierarchical clustering (“bottom-up clustering”),
as explained in (Gan et al., 2007), to construct the
required cluster hierarchy.
A useful distance measure to define similarities between different observations in this context is the Euclidean distance. Hence, the distance between two patterns $x$ and $y$ (of the form described in Equation 3, containing 11 properties) is defined as:
$$d(x, y) = \|x - y\|_2 = \sqrt{\sum_{i=1}^{11} (x_i - y_i)^2}$$
After merging two observations into a cluster, the fea-
ture values of the resulting cluster are set to the re-
spective average values of both merging candidates.
Applying agglomerative hierarchical clustering with
this distance measure to the training set of input
patterns results in a structure that provides informa-
tion about the similarity of different observations and
therefore can be used to identify recurring patterns.
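The paper does not name a clustering library; as one possible reading, SciPy's agglomerative linkage reproduces this setup. The 'median' linkage appears to match the merging rule stated above, since it sets a merged cluster's representative to the plain average of the two merging candidates' feature vectors; this is a sketch under that assumption, not the author's original implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def build_hierarchy(X):
    """Agglomerative bottom-up clustering of the (n, 11) evidence
    matrix X with Euclidean distances. 'median' linkage sets a merged
    cluster's features to the average of the two merging candidates."""
    return linkage(np.asarray(X), method="median", metric="euclidean")
```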
Once this structure has been constructed, it can
easily be used to query for significant patterns. To
identify significant patterns, two properties for each
cluster (i.e., each possible pattern) are defined: First,
it is necessary to ensure that a cluster indeed repre-
sents a recurring pattern and not only some random
effect. In order to ensure recurrence, a cluster must
contain at least a certain minimum number of mem-
bers (representing instances of a particular pattern).
This is denoted by the instance count $I_P$ of a pattern $P$ and is defined as
$$I_P = \#\text{ instances of pattern } P = \#\text{ elements in the corresponding cluster.}$$
Also, a pattern should exhibit a certain reliability $R_P$, i.e., a certain ratio of all cluster members needs to be succeeded by the same directional development. The reliability of a pattern $P$ is defined as
$$R_P = \frac{\max\,\#\text{ instances with the same outcome}}{I_P}.$$
This stresses the importance of considering the in-
stance count: each of the initial clusters only con-
tains a single instance and thus exhibits a reliability
of 100%. Without requiring a certain number of oc-
currences, all of these clusters would be considered
significant. By adjusting this parameter, the general-
ization capabilities of the system can be tuned: a low
value will lead to a large number of discovered distinct
patterns with only few instances each, while increas-
ing this value will result in fewer, but more general,
descriptions of significant patterns.
Based upon these properties, a pattern is considered significant if both values exceed certain thresholds (i.e., $I_P \geq I_{min}$ and $R_P \geq R_{min}$). The search for these patterns starts at the bottom of the hierarchical structure. As soon as a cluster is found that meets the specified criteria, it is added to the set of significant patterns and its parents are omitted from the search.
The rationale for this is as follows: if a pattern is already considered significant, the addition of further instances bears the danger of diluting the pattern's properties through additional, insignificant instances.
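A possible realization of this bottom-up search over a SciPy linkage matrix is sketched below (the function and parameter names are assumptions; `outcomes` holds the +1/−1 subsequent direction of each training observation). Merges in a linkage matrix are ordered by increasing distance, so iterating over its rows visits the hierarchy bottom-up; once a cluster qualifies, all of its ancestors are marked and skipped.

```python
import numpy as np

def significant_patterns(Z, outcomes, i_min=30, r_min=0.7):
    """Bottom-up search for significant clusters in a SciPy linkage
    matrix Z. Returns one set of leaf indices per significant pattern;
    ancestors of a qualifying cluster are skipped so that further
    merges cannot dilute the pattern."""
    outcomes = np.asarray(outcomes)
    n = len(outcomes)
    members = {i: {i} for i in range(n)}      # leaf clusters
    claimed = [False] * (n + len(Z))          # significant, or descendant thereof
    found = []
    for k, (a, b, _dist, _cnt) in enumerate(Z):   # merges in distance order
        node = n + k
        a, b = int(a), int(b)
        members[node] = members[a] | members[b]
        if claimed[a] or claimed[b]:
            claimed[node] = True              # parent of a found pattern: skip
            continue
        idx = np.fromiter(members[node], dtype=int)
        i_p = len(idx)                        # instance count I_P
        r_p = max(np.mean(outcomes[idx] == 1),
                  np.mean(outcomes[idx] == -1))   # reliability R_P
        if i_p >= i_min and r_p >= r_min:
            found.append(members[node])
            claimed[node] = True
    return found
```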
As a result of this procedure, one obtains an
emergent set of significant pattern descriptions. This
knowledge is elicited solely from latent information
of the training data and therefore transcends the lim-
itations imposed by relying on the knowledge of a
domain expert. Once the descriptions of significant
patterns have been obtained, they can be incorporated
in various analysis approaches in order to infer pre-
dictions of future price movements. To evaluate the
quality of detected patterns, a naive Bayesian classi-
fier is used in the following.
3.3 Inferring Prognoses from
Candlestick Patterns
Figure 3: Bayesian classifier for candlestick pattern matching: the evidence node E (shaded) denotes the actually observed pattern, P denotes the unknown pattern assignment, and H the inferred hypothesis (i.e., the predicted movement according to Equation 5).

Once the descriptions of significant candlestick patterns have been obtained through agglomerative hierarchical clustering, this information can be used with
a naive Bayesian classifier in order to infer predic-
tions regarding future price movements and thereby
evaluate the quality of the previously discovered pat-
terns. The Bayesian network used for this purpose is
depicted in Figure 3. The evidence node in this net-
work represents a random variable for a particular ob-
servation as described in Equation 3. This evidence
is first used to determine whether the observation re-
sembles an instance of some known significant pat-
tern. Thus, node P is a discrete node that indicates
the cluster membership of the observation. Note that
obviously not every observation can be assigned to a
known pattern. Hence, in addition to the set of known
patterns, an additional “absorbing pattern” is intro-
duced, which indicates that an observation is not an
instance of a known pattern. Once the pattern mem-
bership is determined, this information is used to pre-
dict the direction of future price movements.
The conditional probabilities P(e|p), denoting
that a particular observation e is due to a certain pat-
tern p, are modeled as Gaussians. Hence, the proba-
bility of this observation is a mixture of Gaussians:
$$P(e) = \sum_{p=1}^{n+1} P(p)\,P(e \mid p) = \sum_{p=1}^{n+1} w_p \cdot \mathcal{N}(\mu_p, \sigma_p^2)$$
Here, $w_p$ denotes the mixing weight for each component and can be determined by the prior $P(p)$, which in turn is given by the ratio of $p$'s number of occurrences to the size of the training set. Since both
the pattern membership and the associated forecast
are discrete variables, the corresponding conditional
probability p(h|p) is defined through a conditional
distribution table. The values of these probabilities
are acquired automatically as described in the follow-
ing.
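As an illustration (a sketch under the naive-Bayes independence assumption; the parameter names are assumptions, and the paper does not specify how the absorbing pattern's density is modeled), classification of a new observation can be implemented by scoring its log-likelihood under each pattern's Gaussian and adding the log-prior:

```python
import numpy as np
from scipy.stats import norm

def classify_and_forecast(e, mus, sigmas, priors, forecasts):
    """Assign an observation e (Eq. 3, length 11) to a pattern and emit
    that pattern's learned forecast. mus/sigmas are (n+1, 11) Gaussian
    parameters per pattern (last row: the 'absorbing pattern'), priors
    are relative cluster frequencies, and forecasts holds the majority
    direction (+1/-1) observed after each pattern in the training data."""
    # Naive-Bayes independence: the class likelihood factorizes over
    # the 11 feature dimensions, so per-dimension log-densities add up.
    log_lik = norm.logpdf(e, loc=mus, scale=sigmas).sum(axis=1)
    p = int(np.argmax(log_lik + np.log(priors)))
    return p, forecasts[p]
```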
4 EVALUATION
Table 1: Results for a one-day forecast with $I_{min} = 30$ and varying $R_{min}$ parameters. For each $R_{min}$, the number of identified patterns, the number of observations that could be matched to one of the patterns, the corresponding matching ratio, and the resulting success rate of the forecasts are listed. To compare these results to conventional approaches, the table also lists the forecasting results for applying the static rule descriptions and for using supervised learning, respectively.

method         R_min    # patterns   # matches   matching ratio   success rate
unsupervised    60%         162         6387          0.43            0.66
unsupervised    70%         103         3961          0.26            0.70
unsupervised    80%          70         2690          0.18            0.72
unsupervised    90%          26         1044          0.07            0.81
unsupervised   100%           3          105          0.01            0.90
supervised      N/A          27          989          0.07            0.67
static          N/A          27          202          0.01            0.53

In order to evaluate the performance of the pro-
posed techniques, a training scenario has been set up
with historical data from the Dow Jones Industrial
Average comprising OHLC-data from February 26,
1932 until October 12, 2011, thus containing 20000
pattern instances. This large data set has been se-
lected to ensure that the results are not influenced by
an overfitting to short-term effects. To cope with this
large amount of data, the system has been evaluated
with a rolling forecasting procedure (as described for
example in (Tsay, 2010)): initially, the first 5000 data
sets, including the subsequent directional development,
have been used to detect latent patterns and to esti-
mate the unknown parameters of the Bayesian clas-
sifier afterwards. After this initialization, the system
is used for inferring forecasts for the following 1000
data sets (the “rolling forecasting window”, this time
without subsequent developments). The inferred fore-
casts are then compared to the actual developments
so that a hit ratio of the prognoses can be determined.
Upon completing this test procedure, the tested data is
added to the training set, the system is retrained and
the rolling forecasting window is advanced to the next
1000 data sets. This procedure allows for both a long-term performance evaluation with the available data and an adaptation to possible additional latent patterns in the test sets.
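As a sketch of this procedure (with `train` and `forecast` standing in as hypothetical wrappers around the detection and classification steps of Section 3):

```python
def rolling_evaluation(data, outcomes, init=5000, window=1000):
    """Rolling forecasting procedure: train on everything seen so far,
    forecast the next block, score the hits, absorb the block, advance."""
    hits = total = 0
    start = init
    while start + window <= len(data):
        model = train(data[:start], outcomes[:start])       # hypothetical helper
        block = zip(data[start:start + window], outcomes[start:start + window])
        for x, y in block:
            hits += int(model.forecast(x) == y)             # hypothetical method
            total += 1
        start += window
    return hits / total
```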
Evaluations with varying minimum instance count
values have shown that requiring a minimum of 30
observed instances of any pattern in order to consider
it as significant provides a feasible compromise be-
tween an accurate fitting and appropriate generaliza-
tion capabilities. The corresponding results for a one-
day directional forecast with varying R
min
parameters
are listed in Table 1. A comparison of the used reliability thresholds and the corresponding success rates of the forecasts shows that the resulting success rate tracks the respective reliability parameter rather closely. Thus, by deciding upon a certain $R_{min}$, one can tune the success rate nearly at will.
However, there is a clear trade-off between success
rate and matching ratio: e.g., configuring the system
such that it yields a 90% success rate significantly de-
creases the matching ratio to 0.01. Consequently, a signal is generated only approximately twice per year and concerns only the movement of the following day. Obviously, such sparse, very short-term signals render this result virtually useless in practice, although its quality is outstanding. If one instead uses, for example, a reliability of 70% (i.e., intentionally accepting occasional false signals), the resulting success rate is still highly satisfactory while, on average, signals are generated more than once per week. Thus, assuming that a correct prediction yields profits, any trading strategy employing this parameter setting would achieve significantly higher overall gains.
To compare these results to conventional ap-
proaches, the table also lists the results of applying
the pattern descriptions from (Morris, 2006) to the
test data set (i.e., static pattern matching without any
adaptive component) as well as the results of using
these manual descriptions for training the network
(i.e., a supervised learning task). A comparison of
these results clearly shows that the proposed unsuper-
vised method is able to outperform conventional ap-
proaches with respect to both the number of detected
patterns and the resulting success rate.
Tests with forecasts of various durations were able
to confirm the statements from (Morris, 2006) regard-
ing the forecasting ability: a forecast three days into
the future with $R_{min} = 0.7$ resulted in a success rate
of roughly 60% while the success rate of a four-day
forecast was not able to exceed 50% significantly and
therefore can hardly be of any use in practice.
5 SUMMARY
For candlestick-based prognoses, well-defined
knowledge regarding significant patterns is required,
as discussed in Section 2. In accordance with
previous work, a probabilistic learning approach is
used to infer prognoses based on a set of known
significant patterns. Rather than merely using the
inference procedures for computing prognoses
based on predefined patterns, the learning process is
extended to an earlier stage of the data processing.
By pursuing the approach of unsupervised pattern
detection, a self-contained knowledge base can be
created which allows for truly adaptive forecasting
systems. Through the ability of directly identifying
latent patterns from training data, the resulting
system is able to cope with securities that may exhibit
varying behavior. Most importantly, the system’s
performance is independent of any manually spec-
ified knowledge and is therefore insusceptible to
any knowledge deficiencies in both quantity and
precision, which are virtually inevitable if a human
domain expert is tasked with specifying his expertise
in a machine-readable form. By replacing the manu-
ally specified pattern sets of the approaches discussed
in Section 3, the presented procedure of identifying
significant candlestick patterns can be combined with
various methods of inferring the actual forecasts
and thus can be used as an extension for existing
systems. Additionally, it is interesting to note that the
feature selection used by human analysts is obviously
a suitable choice for a machine-learning approach as
well.
REFERENCES
Chou, S., Hsu, H., Yang, C., and Lai, F. (1997). A stock
selection DSS combining AI and Technical Analysis.
Annals of Operations Research, 75:335–353.
Lee, C.-H. L., Liaw, Y.-C., and Hsu, L. (2011). Invest-
ment decision making by using fuzzy candlestick pat-
tern and genetic algorithm. In Fuzzy Systems (FUZZ),
2011 IEEE International Conference on, pages 2696–2701.
Lee, K. and Jo, G. (1999). Expert system for predicting
stock market timing using a candlestick chart. Expert
Systems with Applications, 16(4):357–364.
Lin, X., Yang, Z., and Song, Y. (2011). Intelligent stock
trading system based on improved technical analysis
and Echo State Network. Expert Systems with Appli-
cations, 38(9):11347–11354.
Gan, G., Ma, C., and Wu, J. (2007). Data Clustering: Theory, Algorithms, and Applications. SIAM, Society for Industrial and Applied Mathematics.
Morris, G. L. (2006). Candlestick Charting Explained:
Timeless Techniques for Trading Stocks and Futures.
McGraw-Hill Professional, 3rd edition.
Ng, W. W. Y., Liang, X.-L., Chan, P. P. K., and Yeung, D. S.
(2011). Stock investment decision support for Hong
Kong market using RBFNN based candlestick mod-
els. In Machine Learning and Cybernetics (ICMLC),
2011 International Conference on, volume 2, pages
538–543.
Nison, S. (1991). Japanese Candlestick Charting Tech-
niques: A Contemporary Guide to the Ancient Invest-
ment Techniques of the Far East. New York Institute
of Finance.
Nison, S. (2003). The Candlestick Course. Wiley & Sons.
Tsay, R. S. (2010). Analysis of Financial Time Series (Wi-
ley Series in Probability and Statistics). Wiley, 3rd
edition.
KDIR2012-InternationalConferenceonKnowledgeDiscoveryandInformationRetrieval
150