World Journal of Advanced Engineering Technology and Sciences
International, Peer reviewed, Referred, Open access | ISSN Approved Journal

ISSN: 2582-8266 (Online)

A study on the application of deep learning in Vietnamese speech recognition

Van Khoi Nguyen *

Faculty of Electrical and Electronic Engineering, University of Transport and Communications, Hanoi, Vietnam.

Research Article

World Journal of Advanced Engineering Technology and Sciences, 2025, 15(02), 2894–2898

Article DOI: 10.30574/wjaets.2025.15.2.0877

DOI url: https://doi.org/10.30574/wjaets.2025.15.2.0877

Received on 20 April 2025; revised on 27 May 2025; accepted on 30 May 2025

Speech recognition has become increasingly important in various real-world applications. However, Vietnamese presents unique linguistic challenges such as tones, syllabic structures, and complex morphology, which make speech recognition for this language significantly different from that of languages like English. In this paper, we propose a deep learning approach that combines Convolutional Neural Networks (CNN) and Bidirectional Long Short-Term Memory (BiLSTM) networks to recognize Vietnamese speech using the VIVOS dataset. The CNN component is employed to extract spatial features from audio spectrograms, while the BiLSTM captures the bidirectional temporal dependencies in speech signals. Experimental results show that the proposed CNN-BiLSTM model achieves a competitive Word Error Rate (WER) of 14.7%. These results highlight the potential of deep learning techniques in effectively recognizing tonal languages such as Vietnamese.

Speech Recognition; Vietnamese; VIVOS; CNN; BiLSTM
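
The abstract reports a Word Error Rate (WER) of 14.7% but does not say how it was computed. As a minimal sketch, assuming the standard definition (substitutions + deletions + insertions, divided by the number of reference words) and whitespace tokenization, WER could be calculated as below. The function name `word_error_rate` and the sample sentences are illustrative and do not come from the paper. Because written Vietnamese separates syllables with spaces, whitespace tokenization yields syllable-level tokens here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumes the standard WER definition; the paper's exact scoring
    procedure (text normalization, tokenization) is not stated.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative transcript pair (not taken from VIVOS): one tone error
# in the last syllable counts as a single substitution out of five words.
wer = word_error_rate("tôi đang học tiếng việt", "tôi đang học tiếng viết")
print(f"WER = {wer:.1%}")
```

Under this definition a single wrong tone mark makes the whole token incorrect, which is one reason tonal errors weigh on WER for Vietnamese.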

https://wjaets.com/sites/default/files/fulltext_pdf/WJAETS-2025-0877.pdf

Van Khoi Nguyen. A study on the application of deep learning in Vietnamese speech recognition. World Journal of Advanced Engineering Technology and Sciences, 2025, 15(02), 2894–2898. Article DOI: https://doi.org/10.30574/wjaets.2025.15.2.0877.

Copyright © Author(s). All rights reserved. This article is published under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as appropriate credit is given to the original author(s) and source, a link to the license is provided, and any changes made are indicated.

