[16]. Furthermore, the density estimate output by kMER can be written in terms of a mixture distribution where the kernel functions represent the component Gaussian densities with equal prior probabilities, providing a heteroscedastic, homogeneous mixture density model [16] whose log-likelihood function can be computed just as in the GTM case (see Eq. 1 above).
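For concreteness, under these assumptions the kMER mixture log-likelihood might be written as follows (a sketch in our own notation, not taken from [16]: v^mu denote the M data samples of dimension d, w_i the N kernel centers, and sigma_i the corresponding kernel radii; Eq. 1 gives the exact GTM counterpart):

\[
\mathcal{L} \;=\; \sum_{\mu=1}^{M} \ln \left[ \frac{1}{N} \sum_{i=1}^{N}
\frac{1}{(2\pi\sigma_i^{2})^{d/2}}
\exp\!\left( -\frac{\lVert \mathbf{v}^{\mu} - \mathbf{w}_i \rVert^{2}}{2\sigma_i^{2}} \right) \right] .
\]

The equal priors 1/N reflect the homogeneous character of the mixture, while the individual radii sigma_i make it heteroscedastic.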
3 Analysis
Once we have trained the SOM, a number of summaries of its structure are routinely
extracted and analyzed. In particular, here we consider Sammon’s projections, median
interneuron distances, and dataloads. We now review each of these basic tools in turn.
To visualize high-dimensional SOM structures, the use of Sammon's projection is customary. Sammon's map provides a useful global image: it computes all pairwise Euclidean distances among the SOM pointers and seeks a two-dimensional configuration that preserves those distances as closely as possible. Since pointer concentrations in data space thus tend to be maintained in the projected image, we can identify high-density regions directly on the projected SOM. Furthermore, by displaying the set of projections together with the connections between immediate neighbours, the degree of self-organization in the underlying SOM structure can be assessed intuitively in terms of the number of overcrossing connections.
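As an illustration, the following minimal Python/NumPy sketch computes a Sammon projection of the SOM pointers by plain gradient descent on Sammon's stress, starting from a PCA initialization (Sammon's original algorithm uses a pseudo-Newton update instead; function names, the learning rate, and the iteration count are ours and merely illustrative):

```python
import numpy as np

def sammon(X, n_iter=200, lr=0.1, eps=1e-9):
    """2D Sammon projection of the rows of X (the SOM pointers).

    Minimal sketch: PCA initialization followed by plain gradient descent
    on Sammon's stress E = (1/c) * sum_{i<j} (D*_ij - D_ij)^2 / D*_ij.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances in data space.
    Dstar = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)) + eps
    np.fill_diagonal(Dstar, 1.0)                 # avoid division by zero
    c = Dstar[np.triu_indices(n, 1)].sum()       # normalizing constant
    # PCA initialization: project onto the two leading principal axes.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y = Xc @ Vt[:2].T
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]
        D = np.sqrt((diff ** 2).sum(-1)) + eps   # distances in the 2D image
        np.fill_diagonal(D, 1.0)
        W = (Dstar - D) / (D * Dstar)            # per-pair gradient weight
        np.fill_diagonal(W, 0.0)
        grad = (-2.0 / c) * (W[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad                           # descend Sammon's stress
    return Y
```

Plotting the rows of the returned array and drawing line segments between units that are immediate lattice neighbours then gives the view described above, in which overcrossing segments signal poor organization.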
Interneuron distance or proximity information has also traditionally been used for cluster detection in the SOM literature. Inspection of pointer interdistances was pioneered by Ultsch, who defined the unified distance matrix (U-matrix) to visualize Euclidean distances between neuron weights in Kohonen's SOM. Here we consider the similar median interneuron distance (MID) matrix. Each MID entry is the median of the Euclidean distances between the corresponding pointer and all pointers belonging to a star-shaped, fixed-radius neighborhood, typically containing eight units. The median can be seen as a conservative choice; more radical options based on extremes can also be implemented. To facilitate the visualization of pointer concentrations, a linear transformation onto a 256-tone gray scale is standard (the interpretation here is that the lower the value, the darker the cell).
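A minimal sketch of the MID computation for a rectangular lattice follows, assuming the codebook is stored in row-major lattice order (names and shapes are illustrative, not taken from the original implementation):

```python
import numpy as np

def mid_matrix(pointers, rows, cols):
    """Median interneuron distance (MID) matrix for a rows x cols SOM.

    `pointers` holds the codebook vectors in row-major lattice order,
    shape (rows*cols, d).  Each entry is the median Euclidean distance
    from a unit's pointer to those of its (up to) eight lattice neighbours.
    """
    W = pointers.reshape(rows, cols, -1)
    mid = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue                     # skip the unit itself
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        dists.append(np.linalg.norm(W[r, c] - W[rr, cc]))
            mid[r, c] = np.median(dists)
    return mid

def to_gray(matrix):
    """Linear rescaling onto a 256-tone gray scale (0 = darkest cell)."""
    m, M = matrix.min(), matrix.max()
    return np.round(255 * (matrix - m) / (M - m + 1e-12)).astype(np.uint8)
```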
On the other hand, the number of data vectors projecting onto (won by) each unit, namely the neuron dataload, is the main quantity of interest for UDL monitoring purposes. Again, to easily visualize the dataload distribution over the map, a similar gray image is computed, namely the DL-matrix (note that, in this case, darker means higher). The main idea in UDL is that, in the truly equiprobabilistic case, each neuron would cover about the same proportion of data, that is, a (nearly) uniform DL-matrix should be obtained. Hence, training is stopped as soon as the first signs of having reached this state are noticed [7]. Note that we use the UDL stopping policy as a heuristic for the optimal value of the final adaptation radius in SOM-B and SOM-Cx.
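The dataload computation and a simplified uniformity test might look as follows; the tolerance-based check is only a stand-in for the actual UDL criterion of [7], and the function names are ours:

```python
import numpy as np

def dataloads(data, pointers):
    """Number of data vectors won by each unit (the neuron dataloads)."""
    # Winner = unit whose pointer is closest in Euclidean distance.
    d2 = ((data[:, None, :] - pointers[None, :, :]) ** 2).sum(-1)
    winners = d2.argmin(axis=1)
    return np.bincount(winners, minlength=pointers.shape[0])

def nearly_uniform(loads, tol=0.2):
    """Crude UDL-style check: every dataload within a relative tolerance
    of the mean load (the tolerance is illustrative, not the criterion
    used in [7])."""
    mean = loads.mean()
    return bool(np.all(np.abs(loads - mean) <= tol * mean))
```

Reshaping the loads to the lattice and displaying 255 - to_gray(loads) renders higher dataloads darker, matching the DL-matrix convention described above.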
The training strategy for cluster analysis is thus formally described as follows. First, train the SOM network until a (nearly) uniform DL-matrix is obtained and Sammon's projection shows a good level of organization. Then compute the MID and DL matrices associated with this map. We stress that we do not use the maps obtained by training all the way (which yield much worse results).
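Putting the pieces together, the strategy might be sketched as below, reusing the dataloads, nearly_uniform, mid_matrix, to_gray and sammon helpers from the sketches above; init_som, som_step, max_epochs and data are hypothetical stand-ins for the particular SOM variant and schedule actually used (SOM-B, SOM-Cx, etc.), and the organization of the Sammon projection is still assessed visually:

```python
# Hypothetical overall procedure (placeholders: init_som, som_step, max_epochs, data).
som = init_som(rows=20, cols=20, data=data)        # hypothetical initializer
for epoch in range(max_epochs):
    som_step(som, data)                            # one adaptation sweep
    loads = dataloads(data, som.pointers)
    if nearly_uniform(loads):                      # first signs of a uniform DL-matrix
        break                                      # stop: UDL heuristic

# Summaries of the stopped map, used for cluster detection:
mid = mid_matrix(som.pointers, som.rows, som.cols)
dl = dataloads(data, som.pointers).reshape(som.rows, som.cols)
proj = sammon(som.pointers)                        # inspect organization visually
# Display to_gray(mid) and 255 - to_gray(dl) as gray images of the lattice.
```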