AI powered voice synthesizer

V. Vanaja; Venkatesham Tunge; Nithin Kumar Kanagala; Harsha Vardhan Bhumandla; Shruti Kana

doi:10.30574/wjaets.2025.15.2.0590

V. Vanaja, Venkatesham Tunge, Nithin Kumar Kanagala *, Harsha Vardhan Bhumandla and Shruti Kana

Department of CSE (Data Science), ACE Engineering College, Hyderabad, Telangana, India.

Research Article

World Journal of Advanced Engineering Technology and Sciences, 2025, 15(02), 663-671

DOI url: https://doi.org/10.30574/wjaets.2025.15.2.0590

Publication history

Received on 22 March 2025; revised on 02 May 2025; accepted on 04 May 2025

Abstract

The AI Voice Synthesizer is a real-time, multilingual voice cloning system that uses deep learning to generate personalized speech with high naturalness and accuracy. Built on Coqui.ai's open-source XTTSv2 framework, it lets users synthesize speech in their own voice, or in any sampled voice, from just a few seconds of reference audio. The system derives a voice profile from that sample and uses it to generate natural-sounding speech in multiple languages, including languages the original speaker has never spoken, an advance for synthetic speech and human-computer interaction.

Traditional text-to-speech (TTS) systems often suffer from a robotic tone, a lack of personalization, limited language support, and high latency. In contrast, this project provides a lightweight, low-latency (<200 ms), user-friendly platform that supports cross-lingual, few-shot voice cloning. The system is modular, with four independent components: speaker embedding extraction, multilingual text processing, real-time speech synthesis, and a web-based front end. These components are integrated into a single workflow that is accessible to non-technical users while remaining scalable and customizable for developers and researchers.
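
The modular design described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and function names (`VoicePipeline`, `VoiceProfile`, `toy_embedding`, `toy_normalize`, `toy_synthesize`) are hypothetical, and the toy components are stand-ins that only show how the stages connect. The point of the sketch is that the speaker embedding is extracted once from a short reference sample and then reused for any target language, and that latency can be measured per request.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in type: a real speaker embedding would be a
# high-dimensional vector produced by a neural speaker encoder.
Embedding = Tuple[float, ...]


@dataclass(frozen=True)
class VoiceProfile:
    """Voice profile derived once from a short reference sample."""
    embedding: Embedding


@dataclass
class VoicePipeline:
    """Wires together three independent, swappable components."""
    extract_embedding: Callable[[Sequence[float]], Embedding]
    normalize_text: Callable[[str, str], str]
    synthesize: Callable[[str, str, Embedding], List[float]]

    def enroll(self, reference_audio: Sequence[float]) -> VoiceProfile:
        # Few-shot step: only a short sample is needed to build the profile.
        return VoiceProfile(self.extract_embedding(reference_audio))

    def speak(self, profile: VoiceProfile, text: str,
              language: str) -> Tuple[List[float], float]:
        # Returns the synthesized samples and the elapsed time in ms.
        start = time.perf_counter()
        clean = self.normalize_text(text, language)
        audio = self.synthesize(clean, language, profile.embedding)
        latency_ms = (time.perf_counter() - start) * 1000.0
        return audio, latency_ms


# Toy components (assumptions for illustration only, not real models).
def toy_embedding(samples: Sequence[float]) -> Embedding:
    # Summarize the sample by its mean and peak amplitude.
    return (sum(samples) / len(samples), max(abs(s) for s in samples))


def toy_normalize(text: str, language: str) -> str:
    # Collapse whitespace; real text processing is language-specific.
    return " ".join(text.split())


def toy_synthesize(text: str, language: str,
                   embedding: Embedding) -> List[float]:
    # Emit one sample per character, scaled by the profile's peak value.
    return [embedding[1]] * len(text)


pipeline = VoicePipeline(toy_embedding, toy_normalize, toy_synthesize)
profile = pipeline.enroll([0.0, 0.5, -1.0])

# The same profile is reused across target languages.
audio_es, latency_es = pipeline.speak(profile, "  hola   mundo ", "es")
audio_en, latency_en = pipeline.speak(profile, "hello world", "en")
print(len(audio_es), len(audio_en))
```

In a real deployment, the `synthesize` slot would wrap an actual model. With the Coqui TTS Python package, for example, XTTS v2 is typically loaded by name through `TTS.api.TTS` and called with a reference WAV path and a language code; how the authors wired this in is not stated in the abstract.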

Keywords

Real-Time Speech Synthesis; Few-Shot Text-To-Speech; Multilingual TTS; Coqui.AI Speaker Embedding; Personalized Synthetic Voice; Real-Time Voice Cloning System

Download Article PDF

https://wjaets.com/sites/default/files/fulltext_pdf/WJAETS-2025-0590.pdf

How to cite this article

V. Vanaja, Venkatesham Tunge, Nithin Kumar Kanagala, Harsha Vardhan Bhumandla and Shruti Kana. AI powered voice synthesizer. World Journal of Advanced Engineering Technology and Sciences, 2025, 15(02), 663-671. Article DOI: https://doi.org/10.30574/wjaets.2025.15.2.0590.
