
7 CONCLUSION
We examined the influence of varying the number of
speakers and speech samples in privacy evaluation
with ASV. The experimental results demonstrated that
a very small subset of speakers, or of utterances per
speaker, might produce an unreliable EER that creates a
misleading impression of high privacy protection by
the evaluated VC model. We further showed that the
computation time needed for evaluation can be decreased
by 99% by reducing the number of speakers and samples
per speaker, while still preserving the reliability of
the results. We anticipate that our research can offer
guidance for conducting privacy evaluation in a way
that ensures the validity of the results and their
applicability to the greatest possible number of
scenarios. Experiments with further VC models and with
more challenging datasets would help establish how well
these findings generalize.
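The sensitivity of the EER to the size of the evaluation set can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the evaluation pipeline used in this work: the score generator `simulate_speaker`, its Gaussian parameters, and the helper `eer_spread` are hypothetical, and the EER is approximated by a plain threshold sweep over synthetic scores rather than over the output of an actual ASV system.

```python
import random
import statistics


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by using every observed score as a threshold."""
    best_gap, eer = float("inf"), 1.0
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: same-speaker trials scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trials at or above the threshold.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def simulate_speaker(rng, n_utts):
    """Hypothetical score generator (assumption): each speaker has its own
    gap between same-speaker and different-speaker score distributions."""
    shift = rng.gauss(1.0, 0.5)
    target = [rng.gauss(shift, 1.0) for _ in range(n_utts)]
    nontarget = [rng.gauss(0.0, 1.0) for _ in range(n_utts)]
    return target, nontarget


def eer_spread(n_speakers, n_utts, repeats=20, seed=0):
    """Mean and standard deviation of the EER over repeated random draws."""
    rng = random.Random(seed)
    eers = []
    for _ in range(repeats):
        target, nontarget = [], []
        for _ in range(n_speakers):
            t, n = simulate_speaker(rng, n_utts)
            target += t
            nontarget += n
        eers.append(equal_error_rate(target, nontarget))
    return statistics.mean(eers), statistics.stdev(eers)


if __name__ == "__main__":
    for n_speakers, n_utts in [(2, 5), (20, 10)]:
        mean, std = eer_spread(n_speakers, n_utts)
        print(f"{n_speakers} speakers x {n_utts} utterances: "
              f"EER mean={mean:.3f}, std={std:.3f}")
```

In this toy setting, the spread of the EER across repeated draws is expected to be noticeably larger for the small configuration than for the larger one, which mirrors the qualitative point made above. The absolute numbers depend entirely on the assumed score distributions and carry no meaning for real systems.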
ACKNOWLEDGEMENTS
This research has been partly funded by the Federal
Ministry of Education and Research of Germany in
the project Medinym and partly funded by the
Volkswagen Foundation in the project AnonymPrevent.
On the Effect of Dataset Size and Composition for Privacy Evaluation