Muniandy and Dayou, 2011) extracted spectral
centroid, Shannon entropy, and Rényi entropy
features. They used a k-Nearest Neighbour (k-NN)
classifier to recognize nine frog species, achieving
only 75.6% accuracy. The wavelet transform was combined
with Mel-Frequency Cepstral Coefficients (MFCCs) in
(Colonna et al., 2012) to classify nine anuran
species using a k-NN classifier, whereas Xie et al. (Xie et
al., 2015) used perceptual wavelet packet
decomposition sub-band cepstral coefficients for
identifying ten frog species. All of these works
used wavelet analysis because it can achieve
higher accuracy than time-domain and
frequency-domain features. However, these schemes
are very demanding in both computational power
and buffer size.
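
The spectral centroid and the two entropy measures mentioned above can be computed from the normalized power spectrum of a frame. The following is a minimal sketch, not necessarily the formulation used in the cited work: the function names, the base-2 logarithm, the Rényi order `alpha=2`, and the naive DFT are all assumptions made for illustration.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum for bins 0..N/2 using a naive O(N^2) DFT (illustration only)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_centroid(power, sample_rate, frame_len):
    """Power-weighted mean frequency in Hz; bin k corresponds to k * sample_rate / frame_len."""
    total = sum(power)
    if total == 0:
        return 0.0
    return sum((k * sample_rate / frame_len) * p for k, p in enumerate(power)) / total


def _normalize(power):
    """Turn a power spectrum into a probability distribution over bins."""
    total = sum(power)
    if total == 0:
        return []
    return [p / total for p in power]


def shannon_entropy(power):
    """Shannon entropy (bits) of the normalized spectrum."""
    return -sum(q * math.log2(q) for q in _normalize(power) if q > 0)


def renyi_entropy(power, alpha=2.0):
    """Renyi entropy (bits) of order alpha; alpha must differ from 1."""
    if alpha == 1.0:
        raise ValueError("alpha = 1 is the Shannon limit; use shannon_entropy instead")
    probs = [q for q in _normalize(power) if q > 0]
    if not probs:
        return 0.0
    return math.log2(sum(q ** alpha for q in probs)) / (1.0 - alpha)
```

A pure tone concentrates its power in one bin, so both entropies are close to zero, whereas a flat spectrum maximizes them.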
Among the features mentioned, the MFCC feature set
has been widely used because of its strong
discriminative capability. The authors in
(Evangelista et al., 2014) computed a total of 64
MFCCs to recognize bird species using a Support
Vector Machine (SVM) classifier. In (Dong et al.,
2015), MFCC and a spectral ridge were applied to
24 bird sound samples, achieving an accuracy of
71%. A combination of MFCC and LPC was
adopted in (Yuan and Ramli, 2013) to classify eight
frog species using k-NN, with an identification rate
of 98%. In (Noda, Travieso and Sánchez-Rodríguez,
2016; Colonna et al., 2016), high
recognition rates were attained for the
classification of anuran species using MFCC and
other features such as energy. However, most of
these works are not well suited to WASNs because
they apply some kind of transformation and
extract a large set of features.
In such studies, it is common to prioritize
recognition accuracy even if it results in higher
energy and storage costs.
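
Several of the works cited above feed their feature vectors to a k-NN classifier. As a rough illustration of why such a pipeline can be costly on a sensor node, the sketch below shows a plain k-NN vote over stored training vectors. The function name `knn_classify`, the Euclidean distance, and the default `k=3` are assumptions for illustration and are not taken from any of the cited papers.

```python
import math
from collections import Counter


def knn_classify(query, training_set, k=3):
    """Return the majority label among the k training vectors closest to `query`.

    `training_set` is a list of (feature_vector, label) pairs. Every training
    vector must be kept in memory and compared against each query, so storage
    and computation grow with both the number of training samples and the
    feature dimension.
    """
    neighbours = sorted(training_set, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

With a large feature set per sample, each classification requires a full pass over the stored vectors, which matches the energy and storage concern raised above.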
Other studies developed automatic
recognition systems based on syllable features. The
idea behind this approach is to segment the sound
stream into syllable units and then derive features
from each syllable. Colonna et al. (Colonna et al., 2015)
applied ZCR and energy entropy to 896 syllables
from seven anuran species. Alternatively, Xie
et al. (Xie et al., 2016) introduced a method to
recognize twenty-four frog species using a set of
frequency-based features in addition to
syllable, Linear Predictive Coding (LPC), and
MFCC features, together with five different machine
learning algorithms. In (Noda, Travieso and
Sánchez-Rodríguez, 2016), a syllable feature was used in
conjunction with MFCC and LFCC features for the
classification of 199 anuran species using an SVM
classifier. The results showed that the proposed set
outperforms approaches that use only MFCC,
LFCC, or both. Also, in (Xie et al., 2015;
Xie et al., 2016), the authors showed that
syllable features provide higher classification
accuracy than MFCC under different levels
of noise contamination.
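
The two time-domain features mentioned above, ZCR and energy entropy, can be computed per syllable without any transform. The sketch below uses common textbook definitions, which may differ from those in (Colonna et al., 2015): the function names, the number of sub-blocks `n_blocks=8`, and the base-2 logarithm are assumptions.

```python
import math


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


def energy_entropy(samples, n_blocks=8):
    """Shannon entropy (bits) of the energy distribution across equal-length sub-blocks.

    Samples left over after splitting into n_blocks equal blocks are ignored.
    """
    block_len = len(samples) // n_blocks
    if block_len == 0:
        return 0.0
    energies = [
        sum(x * x for x in samples[i * block_len:(i + 1) * block_len])
        for i in range(n_blocks)
    ]
    total = sum(energies)
    if total == 0:
        return 0.0
    probs = [e / total for e in energies]
    return -sum(q * math.log2(q) for q in probs if q > 0)
```

A syllable whose energy is spread evenly over time yields a high energy entropy, while a short burst surrounded by silence yields a value near zero.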
Most of the schemes presented in the reviewed
studies did not address complexity or power
consumption to demonstrate the algorithms’ efficiency.
These works were conducted in the field of
species classification, whereas the recognition of different
types of animals has received little attention in the
literature. In addition, many current feature
extraction systems are designed for particular
applications, and hence are only appropriate for
identifying a particular type of species. Thus, it
becomes essential to develop an efficient recognition
scheme to identify different animal sounds.
3 GENERAL APPROACH FOR ACOUSTIC-BASED LOW-POWER SENSING
We consider the implementation of an object
recognition scheme in a WASN environment. The
network is composed of a set of smart microphone
nodes that are capable of sensing and processing audio
streams and scalar data. During set-up, the
end-user loads the sensors with the target
audio-feature descriptor. We assume that each sensor
node acquires a new acoustic signal periodically
(at a time interval t). The energy of
the recorded signal is then computed and compared
with a predefined threshold to generate a detection
decision. Upon detecting the presence of a new
object, the sensor node extracts a set of
representative feature descriptors that can uniquely
describe the object.
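
The energy-based detection step described above can be sketched as follows. This is a minimal illustration under stated assumptions: energy is taken as the mean squared amplitude of the frame, and the names `frame_energy` and `detect_event` as well as the threshold values are hypothetical.

```python
def frame_energy(samples):
    """Mean squared amplitude of one recorded frame (assumed energy definition)."""
    if not samples:
        return 0.0
    return sum(x * x for x in samples) / len(samples)


def detect_event(samples, threshold):
    """Detection decision: True when the frame energy exceeds the predefined threshold."""
    return frame_energy(samples) > threshold
```

Only frames for which `detect_event` returns `True` would proceed to feature extraction, so the node avoids that cost for silent or low-energy frames.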
In the proposed scheme, instead of flooding the
network with a large amount of possibly useless
data, a local classifier at each sensor
node first decides on the type of the
detected object based on its extracted feature
vector and the reference descriptor. When the target
object is detected, a notification is generated
and sent to the base station according to the end-user
requirements; it can be either a detection
notification or a vector of features.
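
The local decision and notification logic can be sketched as below. The classifier is not specified at this point in the text, so the sketch assumes a simple Euclidean distance test against the reference descriptor; the names `matches_target` and `build_notification`, the `max_distance` parameter, and the message layout are hypothetical.

```python
import math


def matches_target(features, reference, max_distance):
    """Assumed local classifier: accept when the feature vector is close to the reference."""
    return math.dist(features, reference) <= max_distance


def build_notification(node_id, features, reference, max_distance, send_features=False):
    """Return the message for the base station, or None when nothing should be sent.

    `send_features` models the end-user requirement: a bare detection
    notification (False) or the notification plus the feature vector (True).
    """
    if not matches_target(features, reference, max_distance):
        return None  # non-target objects generate no network traffic
    message = {"node": node_id, "event": "target_detected"}
    if send_features:
        message["features"] = list(features)
    return message
```

Because non-matching objects return `None`, the radio is used only when the target is recognized, which is the traffic-saving behaviour described above.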
Developing a scheme that meets
the constraints of WASNs is a major
challenge. In fact, the design of this scheme requires
SENSORNETS 2018 - 7th International Conference on Sensor Networks