World Journal of Advanced Engineering Technology and Sciences
International, Peer-reviewed, Refereed, Open Access | ISSN Approved Journal

ISSN: 2582-8266 (Online)

OPTIMIZING NVIDIA® GEFORCE RTX™ 5090 & AMD RX 9070 for machine learning and artificial intelligence workloads


Mohit Jain *, Adit Shah, Brahaspati Dev, Ram Kumar and Mathew Campisi

Department of Electrical and Computer Engineering, University of Illinois at Urbana Champaign, Illinois, USA.

Review Article

 

World Journal of Advanced Engineering Technology and Sciences, 2025, 17(03), 357–374

Article DOI: 10.30574/wjaets.2025.17.3.1563

DOI url: https://doi.org/10.30574/wjaets.2025.17.3.1563

Received on 06 November 2025; revised on 17 December 2025; accepted on 20 December 2025

As consumer-grade GPUs have rapidly evolved, efforts have emerged to deploy them for training and inference workloads traditionally handled by data-center hardware. This paper explores the optimization of two next-generation graphics processing units, the NVIDIA GeForce RTX 5090 and the AMD Radeon RX 9070, for the current generation of ML and AI applications. In a two-pronged architectural and empirical study, we examine the internal compute pipelines, tensor/matrix acceleration capabilities, memory hierarchies, and software ecosystems (CUDA/cuDNN/TensorRT versus ROCm/MIOpen/HIP) that influence ML performance. Convolutional networks, transformer models, diffusion architectures, and graph neural networks are evaluated under a common benchmarking protocol covering training throughput, inference latency, power consumption, precision scaling (FP32 to INT8), and bottleneck analysis. The experimental results show that the two cards have distinct performance profiles: the RTX 5090 leads in mixed-precision acceleration and kernel fusion, while the RX 9070 delivers strong throughput on BF16/INT8 workloads with high memory-bandwidth utilization. Platform-specific optimization strategies, including kernel tuning, compiler optimization, memory prefetching, gradient checkpointing, and multi-GPU scaling, are developed and evaluated. Two real-world performance-tuning case studies, transformer fine-tuning and diffusion-model inference, are also presented.
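Among the optimization strategies the abstract names, gradient checkpointing trades extra recomputation for a smaller activation-memory footprint. The paper's benchmark code is not shown here; the following is a minimal, hypothetical NumPy sketch of the idea on a toy chain of tanh layers (all names and sizes are illustrative assumptions):

```python
import numpy as np

# Toy chain: y = f_n(...f_1(x)) with f_i(v) = tanh(W_i @ v).
# Naive backprop stores every activation; checkpointing stores only
# every `segment`-th one and recomputes the rest during the backward
# pass, trading extra compute for less activation memory.

rng = np.random.default_rng(0)
n_layers, dim = 8, 4
Ws = [rng.standard_normal((dim, dim)) * 0.5 for _ in range(n_layers)]

def forward(x, Ws):
    acts = [x]                      # acts[i] is the input to layer i
    for W in Ws:
        acts.append(np.tanh(W @ acts[-1]))
    return acts                     # naive: O(n_layers) stored activations

def backward(acts, Ws, grad_out):
    g = grad_out
    for i in reversed(range(len(Ws))):
        pre = Ws[i] @ acts[i]
        g = Ws[i].T @ (g * (1.0 - np.tanh(pre) ** 2))
    return g                        # gradient w.r.t. the input x

def backward_checkpointed(x, Ws, grad_out, segment=4):
    # Forward pass: keep only activations at segment boundaries.
    ckpts, a = {0: x}, x
    for i, W in enumerate(Ws, start=1):
        a = np.tanh(W @ a)
        if i % segment == 0:
            ckpts[i] = a
    # Backward pass: recompute each segment from its checkpoint.
    g = grad_out
    for seg_end in range(len(Ws), 0, -segment):
        seg_start = seg_end - segment
        acts = [ckpts[seg_start]]
        for W in Ws[seg_start:seg_end]:
            acts.append(np.tanh(W @ acts[-1]))
        for i in reversed(range(segment)):
            pre = Ws[seg_start + i] @ acts[i]
            g = Ws[seg_start + i].T @ (g * (1.0 - np.tanh(pre) ** 2))
    return g

x = rng.standard_normal(dim)
g_naive = backward(forward(x, Ws), Ws, np.ones(dim))
g_ckpt = backward_checkpointed(x, Ws, np.ones(dim))
# Both paths yield the same input gradient, while the checkpointed
# version retained 3 boundary activations instead of all 9.
```

In practice frameworks expose this directly (e.g. activation-checkpointing utilities in deep learning libraries); the sketch only shows why the recomputed gradient matches the naive one.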
The findings highlight that hardware alone does not guarantee the best ML performance: effective optimization can deliver gains larger than raw compute differences alone. The paper provides a step-by-step roadmap for practitioners, researchers, and engineers who want to apply the RTX 5090 and RX 9070 to AI workloads, as well as a forward-looking view of unified GPU programming models and emerging precision formats.
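The precision-scaling axis of the study (FP32 down to INT8) refers to running models at reduced numeric precision. As a purely illustrative sketch, unrelated to the paper's actual benchmarks, symmetric per-tensor INT8 post-training quantization can be written in a few lines of NumPy:

```python
import numpy as np

# Symmetric per-tensor INT8 quantization: map float32 weights onto
# [-127, 127] with a single scale factor, then dequantize to measure
# the error the reduced precision introduces.

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0           # one scale per tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = float(np.abs(w - w_hat).max())
# Rounding bounds the per-weight error by half a quantization step.
```

Production INT8 pipelines add per-channel scales and calibration over activation statistics; this sketch only shows the core mapping and its error bound.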
 

Keywords: Deep learning compute efficiency; Tensor core mixed precision deep learning; Mixed precision training; Large model training GPU efficiency analysis; Deep learning optimization consumer GPUs; AI GPU benchmarking; FP8 acceleration; Low-precision inference

https://wjaets.com/sites/default/files/fulltext_pdf/WJAETS-2025-1563.pdf


Mohit Jain, Adit Shah, Brahaspati Dev, Ram Kumar and Mathew Campisi. OPTIMIZING NVIDIA® GEFORCE RTX™ 5090 & AMD RX 9070 for machine learning and artificial intelligence workloads. World Journal of Advanced Engineering Technology and Sciences, 2025, 17(03), 357–374. Article DOI: https://doi.org/10.30574/wjaets.2025.17.3.1563.



Copyright © Author(s). All rights reserved. This article is published under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as appropriate credit is given to the original author(s) and source, a link to the license is provided, and any changes made are indicated.

