and clustering method. Figure 3 shows the ARI performance (%)
for each number of features and each clustering method, which
was used to select the best number of features for ACC
recognition. This study used the FCHA database as ACC data,
together with all implemented features from four domains
(statistical, time, frequency and time-frequency), presented
in Table 1.
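For readers unfamiliar with the metric, the following is a minimal sketch of how an Adjusted Rand Index (ARI) between ground-truth activity labels and cluster assignments can be computed. It uses the standard pair-counting formulation and is not taken from the framework's own code; the function name and the example labels are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, predicted_labels):
    """Adjusted Rand Index between two labelings of the same samples."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    # Contingency cells, row sums (true classes) and column sums (clusters).
    cells = Counter(zip(true_labels, predicted_labels))
    rows = Counter(true_labels)
    cols = Counter(predicted_labels)
    # Number of sample pairs that fall together in each cell, row and column.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0
    expected = sum_rows * sum_cols / total_pairs
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:
        # Degenerate case, e.g. both labelings use a single cluster.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


# Illustrative labels only (not data from the FCHA database).
truth = ["sit", "sit", "walk", "walk"]
clusters = [1, 1, 0, 0]
print(100 * adjusted_rand_index(truth, clusters))  # 100.0
```

The index is invariant to how the clusters are numbered, which is why it suits unsupervised methods; multiplying by 100 gives a percentage comparable to the values reported in Figure 3.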
Figure 3 suggests that the highest ARI percentages are obtained
with 4 to 7 features. For a more detailed analysis, Table 2
shows the ARI performance achieved by the different clustering
methods when the framework uses the best 4, 5, 6 and 7 feature
types (sets A, B, C and D, respectively) and when it uses all
features.
From Table 2, it can be concluded that set B is the best set
for K-means and Ward, with 89.97% ± 9.97% and 88.56% ± 11.72%,
respectively. Set C, on the other hand, showed the highest
performance for DBSCAN, with 80.43% ± 6.29%, while set D showed
the best performance for Affinity Propagation, with
81.19% ± 5.99%. For this reason, any choice among sets B, C and
D is acceptable. For K-means, Affinity Propagation, DBSCAN and
Ward, the clustering performance values are 84.54% ± 9.23%,
81.19% ± 5.99%, 79.84% ± 10.75% and 84.73% ± 9.00% for set D,
and 89.97% ± 9.97%, 76.85% ± 10.30%, 78.12% ± 10.95% and
88.56% ± 11.72% for set B. Overall, sets B and D showed similar
computing times (a difference of approximately 13 seconds) and
the best accuracy values. In the present study, set D, the set
of the best 7 features, was chosen for the framework.
After fixing the number of features at 7, the best group of
features was found through a histogram representing the
occurrences of each feature type. The 10 most used features in
each clustering method were pooled (some of them belonging to
the same feature type); the result is presented in Figure 4.
The histogram in Figure 4 suggests that the Forward Feature
Selection algorithm most frequently selected the Log Scale
Power Bandwidth, Root Mean Square, Total Energy,
Autocorrelation, Variance, Wavelet Coefficients and Mean for
HAR systems. These feature types are therefore the most used
and most promising features for the developed framework.
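The pooling step behind such a histogram can be sketched as follows. This is only an illustration under stated assumptions: the naming scheme (a feature type followed by "_" and a coefficient index), the function names and the toy input are all hypothetical, and none of the values below come from Figure 4.

```python
from collections import Counter


def feature_type(name):
    """Map a feature such as 'lspb_12' to its type 'lspb'.

    Assumes a hypothetical '<type>_<coefficient index>' naming scheme;
    names without a numeric suffix are returned unchanged.
    """
    base, _, suffix = name.rpartition("_")
    return base if base and suffix.isdigit() else name


def type_frequencies(selected_by_method, top_k=10):
    """Pool the top_k features of every clustering method and return
    the share (%) of each feature type among the pooled features."""
    pooled = []
    for ranked_features in selected_by_method.values():
        pooled.extend(ranked_features[:top_k])
    counts = Counter(feature_type(f) for f in pooled)
    total = sum(counts.values())
    return {t: 100.0 * c / total for t, c in counts.most_common()}


# Toy input: ranked features per clustering method (illustrative only).
selected = {
    "kmeans": ["lspb_1", "lspb_2", "mean"],
    "ward": ["lspb_1", "wavelet_0", "variance"],
}
print(type_frequencies(selected))
```

Counting by feature type rather than by individual coefficient is what lets a multi-coefficient feature accumulate occurrences across its coefficients.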
Furthermore, Figure 4 suggests that the Log Scale Power
Bandwidth occurs more frequently (over 20% of all occurrences)
than any other feature type (each below 20%). The Log Scale
Power Bandwidth algorithm involves complex stages and outputs
a large number of coefficients. The data from those 40 output
coefficients are complementary: one coefficient may be more
sensitive to a given activity because of how its behavior
varies over time, while another may better identify different
tasks. By using this feature type, all these coefficients are
used together, providing more information for activity
distinction than feature types with less information and fewer
coefficients.
Moreover, the Log Scale Power Bandwidth feature considers data
from the lower frequencies. A detailed analysis of this
frequency range suggested that it holds important information
for activity recognition in accelerometry. The information
located at low frequencies was preserved because the filtering
step was eliminated from the signal processing stage.
Therefore, no information was lost and the GA component was
maintained in the ACC data.
Figure 5 shows that, in contrast to other features, each Log
Scale Power Bandwidth coefficient provided an overall
distinction among all activities carried out by the volunteers.
The selection of this feature type by the Forward Feature
Selection as the best feature is therefore justified by its
greater ability for activity recognition.
Some difficulties reported in (Inês Prata Machado, 2014), such
as the hard discrimination between sitting and standing
positions and between walking and running activities, were
also identified in this work.
These difficulties were mitigated by the presence of the GA
component in the processed data and by the use of the Log Scale
Power Bandwidth and Wavelet coefficients as features. The
Horizon Plot in Figure 5 shows the variation over time of six
Log Scale Power Bandwidth coefficients, six Wavelet
coefficients and one coefficient each of the Autocorrelation,
Mean, Root Mean Square, Total Energy and Variance, all from the
x-axis component. Feature types such as Log Scale Power
Bandwidth and Wavelets are important for distinguishing the
standing and sitting positions, as well as for many other
tasks.
4.2 Hidden Markov Models Application
Transitions in the test set (the labels predicted by the
clustering algorithms) that have a low probability of
occurrence may be a consequence of cluster miscalculation. The
transition probabilities were estimated from the ground truth
(training data), and every transition with a low probability
of occurrence was avoided and replaced by a more likely one.
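The correction described above can be illustrated with a simplified sketch. The text does not state the probability threshold or the exact replacement rule, so both are assumptions here: a transition is treated as unlikely when its estimated probability falls below a hypothetical `threshold`, and the offending label is replaced by the most probable successor of the previous label. This is a first-order transition filter, not a full Hidden Markov Model decoder, and the function names and example sequences are illustrative.

```python
from collections import Counter, defaultdict


def transition_probabilities(labels):
    """Estimate P(next | current) from a ground-truth label sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(labels, labels[1:]):
        counts[current][following] += 1
    return {
        current: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for current, row in counts.items()
    }


def correct_transitions(predicted, probs, threshold=0.05):
    """Replace labels that are reached through an unlikely transition.

    Assumed rule: if P(label | previous) < threshold, substitute the
    most probable successor of the previous label in the training data.
    """
    if not predicted:
        return []
    corrected = [predicted[0]]
    for label in predicted[1:]:
        row = probs.get(corrected[-1], {})
        if row and row.get(label, 0.0) < threshold:
            label = max(row, key=row.get)
        corrected.append(label)
    return corrected


# Illustrative sequences only (not taken from the FCHA database).
train = ["sit"] * 5 + ["stand"] * 5 + ["walk"] * 5 + ["run"] * 5
probs = transition_probabilities(train)
noisy = ["sit", "sit", "run", "sit", "stand"]
print(correct_transitions(noisy, probs))
# ['sit', 'sit', 'sit', 'sit', 'stand']
```

In this toy case the isolated "run" label is reached through a sit-to-run transition never seen in training, so it is replaced, while the sit-to-stand transition, which does occur in the training sequence, is kept.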
The influence of the Hidden Markov Model application and the
resulting improvement (in %) are presented in Figure 6 and
Table 3. All implemented features were used in this study and
only the FCHA database was analyzed. The improvement values
shown in the