VOICE FAKE DETECTION: MODERN TECHNIQUES AND APPLICATIONS FOR UKRAINIAN LANGUAGE
DOI:
https://doi.org/10.31891/2219-9365-2025-82-5
Keywords:
fake voice detection, speech synthesis, voice conversion, Ukrainian language, ASVspoof, datasets, evaluation metrics, EER, WEER, DSR
Abstract
The subject matter of this article is the detection of fake voices generated by text-to-speech (TTS) synthesis and voice conversion (VC) technologies, with a focus on their application to the Ukrainian language. The goal is to analyze modern datasets, competitions (ASVspoof, ADD Challenge), and detection algorithms in order to assess the feasibility of integrating Ukrainian data into international frameworks or developing a dedicated dataset. This approach addresses not only the shortage of Ukrainian-language recordings in widely used repositories, many of which are limited to English or Chinese, but also the unique phonetic structures, diverse accents, and morphological complexities inherent to Ukrainian. By comparing performance across multiple spoofing scenarios, researchers can more accurately quantify how language-specific features influence classification accuracy, ultimately informing more robust detection frameworks. The tasks addressed in the article are to examine existing datasets and their suitability for Ukrainian; to evaluate the performance of fake voice detection systems using Equal Error Rate (EER), Weighted EER (WEER), and Detection Success Rate (DSR); and to determine the better approach: expanding ASVspoof or creating a new resource. The methods used include systematic analysis, dataset comparison, and performance evaluation of modern synthesis systems such as ElevenLabs, Assembly AI, and Tacotron. The results show that adapting fake voice detection systems to the Ukrainian language enhances accuracy and robustness. Moreover, targeted inclusion of different regional dialects and speaker profiles emerges as a key factor in maintaining high DSR values. The findings highlight that advanced neural vocoders, which replicate fine-grained prosodic and timbral nuances, necessitate specialized countermeasures able to discern subtle synthetic artifacts.
Consequently, the study underscores the importance of iterative dataset refinement, periodic algorithmic updates, and cross-lingual benchmarking to sustain robust performance against evolving voice spoofing threats. Conclusions. The study confirms that integrating Ukrainian-language data into international datasets or developing a specialized dataset significantly improves detection reliability. The scientific novelty lies in: 1) the first systematic analysis of Ukrainian fake voice detection; 2) identification of key factors affecting detection performance; 3) recommendations for improving dataset structures and algorithm adaptation for Ukrainian speech.
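For readers unfamiliar with the headline metric, the following is a minimal sketch of how an Equal Error Rate is commonly computed from detector scores. It is not taken from the article: the function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for illustration. WEER and DSR are not defined in the abstract, so they are not sketched here.

```python
import numpy as np

def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for a detector whose scores are higher
    for bona fide speech (an assumed convention, not the article's)."""
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct score that was observed.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: bona fide trials scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER is read off where the two error rates are closest;
    # averaging them handles the case where they never cross exactly.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])

# Hypothetical scores, for illustration only.
eer, threshold = equal_error_rate([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.3, 0.7])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

A lower EER indicates a better detector. Evaluation toolkits typically interpolate between thresholds rather than choosing the nearest observed score, so values from this sketch may differ slightly from officially reported ones.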
License
Copyright (c) 2025 Іван ВИНОГРАДОВ

This work is licensed under a Creative Commons Attribution 4.0 International License.