University of Texas at Arlington, Texas, USA.
World Journal of Advanced Engineering Technology and Sciences, 2025, 16(02), 010–020
Article DOI: 10.30574/wjaets.2025.16.2.1252
Received on 26 June 2025; revised on 28 July 2025; accepted on 31 July 2025
Deep learning (DL) models have achieved state-of-the-art performance across numerous domains, including natural language processing, computer vision, and speech recognition. However, the transition from research to production, especially at large scale, presents formidable challenges. As model sizes balloon into billions of parameters and user demand grows rapidly, issues such as training time, inference latency, energy consumption, system reliability, and hardware constraints become significant obstacles. Efficiently scaling DL models is not just a matter of model architecture; it requires a multi-faceted approach encompassing algorithmic, infrastructural, and deployment-level strategies. Large-scale deployments must account for factors such as distributed training across heterogeneous hardware, maintaining inference throughput under real-time constraints, handling memory and communication bottlenecks, and ensuring deployment flexibility from cloud clusters to edge devices. The performance and cost-efficiency of DL systems at scale hinge upon techniques such as model and data parallelism, quantization, mixed-precision training, and sharded inference. Additionally, orchestration tools like Kubernetes, together with specialised inference runtimes such as TensorRT and NVIDIA Triton, are critical for automated, scalable deployment pipelines. This paper presents a deep technical analysis of the core challenges inherent in scaling DL models, examines modern solutions and their trade-offs, and proposes an integrated framework to address real-world deployment needs. By combining innovations at both the model level and system infrastructure level, the goal is to enable resilient, scalable, and production-grade AI deployments.
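To make one of the techniques named above concrete, the sketch below illustrates symmetric per-tensor int8 post-training quantization, a common way to shrink model memory and speed up inference. This is a minimal NumPy illustration of the general idea, not code from the paper; the function names are hypothetical.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: map float weights to int8 in [-127, 127]."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# Quantize a small random weight matrix and measure the reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
max_err = float(np.max(np.abs(w - w_hat)))  # bounded by scale / 2 (rounding error)
```

The same symmetric-scale idea underlies int8 calibration in runtimes such as TensorRT, though production systems typically add per-channel scales and calibration over activation statistics.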
Deep Learning Scalability; Large-Scale AI Deployment; Distributed Training; Inference Optimization; Model Parallelism
Ankush Jitendrakumar Tyagi. Scaling deep learning models: Challenges and solutions for large-scale deployments. World Journal of Advanced Engineering Technology and Sciences, 2025, 16(02), 010-020. Article DOI: https://doi.org/10.30574/wjaets.2025.16.2.1252.