
organized hierarchically, which facilitates browsing into
relevant portions of the data archive.
5 CONCLUSIONS AND FUTURE WORK
We presented the GHSOM, a novel neural network
model based on the self-organizing map. The main
feature of this model is its capability of dynamically
adapting its architecture to the requirements of the
input space. Instead of having to specify the precise
number and arrangement of units in advance, the
network determines the number of units required for
representing the data at a certain accuracy level at
training time. This growth process is guided solely
by the desired granularity of data representation. As
opposed to other growing network architectures, the
GHSOM does not grow into a single large map, but
rather dynamically evolves into a hierarchical
structure of growing maps in order to represent the
data at each level of the hierarchy at a certain
granularity. This enables the creation of smaller
maps, which yields better cluster separation because
distinct clusters are placed on separate maps. It
further eases navigation and interpretation by
providing a better overview of huge data sets.
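The two growth decisions summarized above can be illustrated with a minimal sketch. It assumes the formulation commonly used in the GHSOM literature, in which growth is driven by the mean quantization error (MQE) of units; the function names, the toy data, and the mapping of the parameters `tau_m` (map-level threshold) and `tau_u` (unit-level threshold) onto these rules are assumptions made for illustration, not a reproduction of our implementation.

```python
import math
from statistics import fmean


def mean_quantization_error(vectors, weight):
    """Average Euclidean distance between a unit's weight vector
    and the input vectors mapped onto that unit."""
    if not vectors:
        return 0.0
    return fmean([math.dist(v, weight) for v in vectors])


def map_should_grow(unit_mqes, parent_mqe, tau_m):
    """Assumed horizontal-growth rule: keep adding units while the map's
    average unit error is at least tau_m times the parent unit's error."""
    return fmean(unit_mqes) >= tau_m * parent_mqe


def units_to_expand(unit_mqes, root_mqe, tau_u):
    """Assumed hierarchical-expansion rule: units whose error still exceeds
    tau_u times the error of the whole data set receive a child map."""
    return [i for i, q in enumerate(unit_mqes) if q > tau_u * root_mqe]


# Toy data: four 2-D inputs, represented at layer 0 by their mean vector.
data = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
root_mqe = mean_quantization_error(data, [1.0, 1.0])

# Hypothetical per-unit errors of a trained first-layer map.
unit_mqes = [0.9, 0.2, 0.1]

print(map_should_grow(unit_mqes, root_mqe, tau_m=0.5))  # False
print(units_to_expand(unit_mqes, root_mqe, tau_u=0.3))  # [0]
```

In this toy setting the first-layer map is already accurate enough to stop growing horizontally, while its first unit is still too coarse and would be expanded into a map on the next layer. Smaller threshold values lead to larger maps and deeper hierarchies, respectively.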
We demonstrated that both the self-organizing map
and the hierarchical feature map are highly useful
for helping the user to find his or her orientation
within the document space. The shortcoming of the
self-organizing map, however, is that all documents
are shown in one large map, so the borderlines
between clusters of related and clusters of unrelated
documents are sometimes hard to find. This is
especially the case if the user does not have
sufficient insight into the contents of the document
collection. The GHSOM overcomes this limitation
in that the clusters of documents are clearly visible
because of the architecture of the neural network.
The document space is separated into independent
maps along different layers in a hierarchy. The
similarity between documents is shown at a fine-grained
level in the maps of the lower layers of the
hierarchy, while the overall organizational principles
of the document archive are shown in the higher-layer
maps. Since such a hierarchical arrangement of
documents is the common way of organizing
conventional libraries, only a small intellectual
overhead is required of the user to find his or her
way through the document space.
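The layered organization described above can be pictured as a tree of maps through which a document vector is routed from the top layer downwards. The following is a minimal sketch under stated assumptions: the class `GrowingMap`, the function `locate`, and the toy weight vectors are hypothetical names and values chosen for illustration, and training is omitted entirely.

```python
import math
from dataclasses import dataclass, field


@dataclass
class GrowingMap:
    """One map in the hierarchy: a list of unit weight vectors plus
    optional child maps, keyed by the index of the expanded unit."""
    weights: list
    children: dict = field(default_factory=dict)

    def best_matching_unit(self, x):
        """Index of the unit whose weight vector is closest to x."""
        return min(range(len(self.weights)),
                   key=lambda i: math.dist(x, self.weights[i]))


def locate(root, x):
    """Route x through the hierarchy and return the unit index chosen
    on each layer, from the top-layer map down to the final map."""
    path, node = [], root
    while node is not None:
        i = node.best_matching_unit(x)
        path.append(i)
        node = node.children.get(i)  # None if this unit was not expanded
    return path


# Toy hierarchy: a two-unit top map whose second unit has a child map.
root = GrowingMap(weights=[[0.0, 0.0], [10.0, 10.0]])
root.children[1] = GrowingMap(weights=[[9.0, 9.0], [11.0, 11.0], [10.0, 12.0]])

print(locate(root, [10.8, 11.1]))  # [1, 1]
print(locate(root, [1.0, 0.0]))    # [0]
```

The returned path plays the role of a shelf mark in a library: the first index selects a broad topic on the top-layer map, and each further index narrows the selection on a lower-layer map.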
An important feature of the GHSOM is that training
time is greatly reduced, because only the number of
units needed for a given level of detail is trained.
The benefits of the proposed approach have been
demonstrated in a real-world application from the
text classification domain.
Our future work on the GHSOM includes fine-tuning
the basic algorithm and applying it to collections in
any language, provided that words can be identified
as primary tokens. This may require special
preprocessing steps for languages such as Chinese,
where word boundaries are not evident from the text.
In addition, we plan to develop a method for setting
the threshold values ($\tau_m$ and $\tau_u$)
automatically according to application requirements.
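The word-boundary issue mentioned above can be made concrete with a small sketch. It shows that whitespace splitting, which suffices for English, returns a Chinese sentence as a single token. The character-bigram function is only one commonly used language-independent fallback, named here as an assumption; it is not the preprocessing step we propose, and both helper names are hypothetical.

```python
def whitespace_tokens(text):
    """Split on whitespace; adequate when words are space-delimited."""
    return text.split()


def char_bigrams(text):
    """Overlapping two-character tokens, a simple fallback when no
    word segmenter is available (illustrative assumption only)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


english = "self organizing maps cluster documents"
chinese = "自组织映射聚类文档"

print(len(whitespace_tokens(english)))  # 5
print(len(whitespace_tokens(chinese)))  # 1
print(char_bigrams("自组织映射"))        # ['自组', '组织', '织映', '映射']
```

With a single token per document, every document vector would have just one non-zero term, so some form of segmentation has to precede indexing for such languages.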
ICEIS 2004 - ARTIFICIAL INTELLIGENCE AND DECISION SUPPORT SYSTEMS