World Journal of Advanced Engineering Technology and Sciences
International, Peer-reviewed, Refereed, Open-access | ISSN Approved Journal


ISSN: 2582-8266 (Online)  || UGC Compliant Journal || Google Indexed || Impact Factor: 9.48 || Crossref DOI


Generating high-quality and diverse synthetic datasets with large language models: A survey


Abinandaraj Rajendran *

Raleigh, USA.

Review Article

World Journal of Advanced Engineering Technology and Sciences, 2025, 15(02), 1145-1149

Article DOI: 10.30574/wjaets.2025.15.2.0652

DOI url: https://doi.org/10.30574/wjaets.2025.15.2.0652

Received on 26 March 2025; revised on 03 May 2025; accepted on 06 May 2025

Large Language Models (LLMs) are increasingly used to generate synthetic datasets that overcome challenges in real-world data collection, including privacy risks, imbalance, and scarcity. This paper surveys recent developments in LLM-based synthetic data generation, emphasizing techniques that improve diversity, task alignment, and reliability, which are crucial factors in high-stakes domains such as predictive maintenance. We categorize state-of-the-art approaches into four methodological pillars: prompt engineering, multi-step generation pipelines, quality control through data curation, and rigorous evaluation methods. Structured generation workflows and controlled prompting strategies significantly enhance output coherence and domain relevance, while self-correction mechanisms and diversity-aware metrics contribute to higher dataset fidelity. Despite this progress, open challenges persist, including bias propagation, limited generalization across tasks and modalities, and the need for robust ethical safeguards. We outline promising future directions for advancing synthetic data generation using LLMs, such as integrating external knowledge, expanding to multilingual and multimodal settings, and fostering human-AI collaboration.
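
To make the curation and evaluation pillars concrete, the following is a minimal sketch, not taken from the survey itself. It assumes plain-text samples and uses two simple stand-ins chosen for illustration: an exact-duplicate filter for data curation and the distinct-n ratio (unique word n-grams divided by total n-grams) as one possible diversity-aware metric. The function names and sample sentences are hypothetical.

```python
def distinct_n(texts, n=2):
    """Return unique word n-grams / total word n-grams over all texts.

    Assumption: whitespace tokenization and lowercasing are good enough
    for a rough diversity signal. Returns 0.0 when there are no n-grams.
    """
    ngrams = []
    for text in texts:
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


def drop_exact_duplicates(texts):
    """Keep the first occurrence of each sample, comparing samples after
    case and whitespace normalization."""
    seen = set()
    kept = []
    for text in texts:
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            kept.append(text)
    return kept


# Hypothetical synthetic samples; the second repeats the first up to casing.
samples = [
    "disk latency rose before the drive failed",
    "Disk latency rose before the drive failed",
    "fan speed dropped after the firmware update",
]
curated = drop_exact_duplicates(samples)
```

In this toy case the filter removes the repeated sample, and the distinct-2 score of the curated set is higher than that of the raw set. A real pipeline would likely add near-duplicate detection and task-specific quality checks on top of this.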

Synthetic Data Generation; Large Language Models; Predictive Maintenance; Anomaly Detection; Disk Failure Prediction; Cloud Storage Systems

https://wjaets.com/sites/default/files/fulltext_pdf/WJAETS-2025-0652.pdf


Abinandaraj Rajendran. Generating high-quality and diverse synthetic datasets with large language models: A survey. World Journal of Advanced Engineering Technology and Sciences, 2025, 15(02), 1145-1149. Article DOI: https://doi.org/10.30574/wjaets.2025.15.2.0652.



Copyright © Author(s). All rights reserved. This article is published under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as appropriate credit is given to the original author(s) and source, a link to the license is provided, and any changes made are indicated.

