Western Governors University, Utah, USA.
World Journal of Advanced Engineering Technology and Sciences, 2026, 18(03), 207-214
Article DOI: 10.30574/wjaets.2026.18.3.0076
Received on 26 December 2025; revised on 18 February 2026; accepted on 21 February 2026
Cloud-native ETL has become a cornerstone of modern data architectures, enabling real-time analytics, scalable machine learning pipelines, and cost-efficient data processing. AWS Glue and Apache Spark together provide a strong foundation for building robust, serverless ETL frameworks. This review examines their capabilities in depth, covering architecture, tuning methods, and best practices. It also highlights experimental benchmarks, key optimization strategies, and emerging trends that are shaping the future of ETL. The findings suggest that, with the right design patterns and tuning, organizations can significantly improve performance while reducing both cost and operational complexity.
Cloud ETL; AWS Glue; Apache Spark; DataFrames; DynamicFrames; Partition Pruning; Predicate Pushdown; Parquet; Delta Lake; Data Lakehouse; Serverless Data Pipelines
Sarvesh Kumar Gupta. Cloud ETL optimization with AWS Glue and Spark. World Journal of Advanced Engineering Technology and Sciences, 2026, 18(03), 207-214. Article DOI: https://doi.org/10.30574/wjaets.2026.18.3.0076