(AHC) (Jain and Dubes, 1998; Grabmeier and
Rudolph, 2002) is applied to the species coordinates
generated by MCA. For example, AHC with the Ward
method and Euclidean distance, with four clusters
fixed in advance (to identify observations in the four
quadrants obtained with components C1 and C2),
gives the results shown in Figure 4. For each cluster,
1 through 4, the genera can be described as
follows:
• Cluster 1 - Located in the second and third
quadrants: Abies, Alnus, Pseudotsuga, Quercus,
Arbutus, Salix, Taxodium, Ulmus, Calocedrus,
Cedrus, Celtis, Chamaecyparis, Chiranthodendron,
Cupressus, Fraxinus, Juglans, Lagerstroemia,
Liquidambar, Magnolia, Acer, Pinus, Alnus,
Platanus, Podocarpus, Populus.
• Cluster 2 - Centered and spread through the
first, third, and fourth quadrants: Quercus,
Araucaria, Robinia, Rosa, Schinus, Tamarix,
Thuja, Washingtonia, Yucca, Acacia, Cedrus,
Cupressus, Erythrina, Eucalyptus, Acacia,
Ficus, Ginkgo, Grevillea, Jacaranda, Juniperus,
Ligustrum, Magnolia, Olea, Phoenix, Acer,
Pinus, Pittosporum.
• Cluster 3 - In the fourth quadrant: Quercus,
Schinus, Beaucarnea, Buddleia, Buxus,
Callistemon, Cassia, Casuarina, Acacia,
Eucalyptus, Juniperus, Acacia, Ligustrum,
Nerium, Pittosporum, Populus, Prosopis.
• Cluster 4 - In the first quadrant: Prunus, Citrus,
Crataegus, Cydonia, Elaeagnus, Eriobotrya,
Ficus, Malus, Morus, Musa and Persea.
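The clustering step described above can be sketched as follows. This is only a minimal illustration, assuming SciPy is used for the AHC; the paper does not state which software was used. The array `coords` is a hypothetical stand-in for the MCA species coordinates on C1 and C2 (synthetic points, not the study data).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical stand-in for the MCA species coordinates on C1 and C2:
# 10 synthetic points around one centre per quadrant (not the paper's data).
rng = np.random.default_rng(0)
centres = np.array([[1.0, 1.0], [-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]])
coords = np.vstack([c + 0.1 * rng.standard_normal((10, 2)) for c in centres])

# Agglomerative hierarchical clustering with Ward linkage on Euclidean distances.
Z = linkage(coords, method="ward", metric="euclidean")

# Cut the tree so that at most four flat clusters remain.
labels = fcluster(Z, t=4, criterion="maxclust")

for k in np.unique(labels):
    print(f"Cluster {k}: {np.sum(labels == k)} observations")
```

With real MCA output, `coords` would hold one row per species, and the resulting labels could then be cross-tabulated against genus to produce a listing like the one above.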
These results provide insight into the validity of
the approach presented in this work. For example,
fruit species are found in Cluster 4. Cluster 1
primarily includes large trees that are appropriate for
planting along streets and in medians and that tolerate
high levels of pollution. Cluster 3 includes non-fruit
evergreen species that can tolerate dryness, soil
salinity and mistreatment.
These results open the way for deeper data
explorations. For example, it would be interesting to
separate Cluster 2 into a group of trees and a group
of palms, and to evaluate the differences between
species in Cluster 1 and Cluster 2, because several
genera appear in both clusters. However, such an
evaluation should be done with care, considering the
diverse characteristics of Quercus and Pinus presented
in this study.
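As a small illustration of the overlap between Cluster 1 and Cluster 2, the genera listed for the two clusters can be compared directly. The sketch below simply intersects the two genus lists given above (with duplicates collapsed); the variable names are illustrative only.

```python
# Genera listed for Cluster 1 and Cluster 2 (duplicates collapsed by the sets).
cluster_1 = {
    "Abies", "Alnus", "Pseudotsuga", "Quercus", "Arbutus", "Salix",
    "Taxodium", "Ulmus", "Calocedrus", "Cedrus", "Celtis", "Chamaecyparis",
    "Chiranthodendron", "Cupressus", "Fraxinus", "Juglans", "Lagerstroemia",
    "Liquidambar", "Magnolia", "Acer", "Pinus", "Platanus", "Podocarpus",
    "Populus",
}
cluster_2 = {
    "Quercus", "Araucaria", "Robinia", "Rosa", "Schinus", "Tamarix", "Thuja",
    "Washingtonia", "Yucca", "Acacia", "Cedrus", "Cupressus", "Erythrina",
    "Eucalyptus", "Ficus", "Ginkgo", "Grevillea", "Jacaranda", "Juniperus",
    "Ligustrum", "Magnolia", "Olea", "Phoenix", "Acer", "Pinus",
    "Pittosporum",
}

# Genera with species assigned to both clusters.
shared = sorted(cluster_1 & cluster_2)
print(shared)
# ['Acer', 'Cedrus', 'Cupressus', 'Magnolia', 'Pinus', 'Quercus']
```

These six genera are the natural starting point for a species-level comparison between the two clusters.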
Figure 5: Application of hierarchical clustering with the Ward
method and Euclidean distance, for four clusters.
5 CONCLUSIONS
After considering different options for handling
missing values pertaining to tree species' tolerance
of and sensitivity to air pollution in Mexico City, and
given that it is not currently possible to obtain reliable
information from official sources, this paper presents
an option for estimating and imputing missing values
of pollution-related variables based on MCA.
The proposed approach has important advantages
because MCA is an adequate and well-proven
technique for treating categorical variables. The
approach to estimating missing information does not
require deleting information, and it does not add
spurious information to account for missing values.
In theory, the basic data structure is not modified by
this approach, for example, in terms of importance
or possible associations between individuals and
variables with non-missing information. However, it
would be worthwhile to apply other approaches to
estimating missing information and to explore the
resulting changes in data structure.
Further explorations with the imputed values
should be undertaken, for example by changing the
number of clusters generated and by evaluating the
results with the assistance of an expert in
arboriculture, particularly in the analysis of species
similarities within a genus. Nevertheless, the exploratory
results presented in this study agree with associated
variables such as genus and some important species
characteristics. These interpretations are, for the most
part, validated by the clustering results.