
available, to check whether our results on the small sample carry over to the whole dataset. In the non-speech approach, features that provide uniform resolution across all frequency ranges might also yield better results than the MFCCs, which are designed around the logarithmic scale of the human auditory system, since the features are extracted from noise-like signals rather than speech.
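As a minimal sketch of this idea, the snippet below computes linear-frequency cepstral coefficients: the same log-then-DCT pipeline as MFCCs, but applied directly to the linear-frequency magnitude spectrum, so no mel-scale warping compresses the higher bands. The function name `lfcc` and all parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.fftpack import dct

def lfcc(signal, n_fft=512, hop=256, n_coeffs=13):
    """Linear-frequency cepstral coefficients: like MFCCs, but with
    uniform frequency resolution, which may suit noise-like signals
    better than the logarithmic mel scale."""
    # Frame the signal and apply a Hann window.
    window = np.hanning(n_fft)
    frames = np.array([signal[s:s + n_fft] * window
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # Magnitude spectrum on a linear frequency axis (no mel filterbank).
    mag = np.abs(np.fft.rfft(frames, axis=1))
    # Log compression, then DCT to decorrelate; keep the first n_coeffs.
    log_mag = np.log(mag + 1e-10)
    return dct(log_mag, type=2, axis=1, norm='ortho')[:, :n_coeffs]

rng = np.random.default_rng(0)
noise = rng.standard_normal(16000)  # 1 s of noise-like signal at 16 kHz
feats = lfcc(noise)
print(feats.shape)  # one 13-dimensional feature vector per frame
```

Replacing the mel filterbank with the raw linear-frequency spectrum is the only change relative to a standard MFCC front end; the framing, log compression, and DCT stages are identical.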
ACKNOWLEDGEMENTS
This research was supported by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the “2nd Call for H.F.R.I. Research Projects to support Faculty Members & Researchers” (Project Number: 3888).
ICPRAM 2024 - 13th International Conference on Pattern Recognition Applications and Methods