World Journal of Advanced Engineering Technology and Sciences
International, Peer-reviewed, Refereed, Open access | ISSN Approved Journal

ISSN: 2582-8266 (Online)  || UGC Compliant Journal || Google Indexed || Impact Factor: 9.48 || Crossref DOI

ML-driven data engineering pipeline for health informatics

NISHANTH JOSEPH PAULRAJ *

Thermo Fisher Scientific, USA.

Review Article

World Journal of Advanced Engineering Technology and Sciences, 2025, 15(02), 765-773

Article DOI: 10.30574/wjaets.2025.15.2.0629

DOI url: https://doi.org/10.30574/wjaets.2025.15.2.0629

Received on 27 March 2025; revised on 03 May 2025; accepted on 06 May 2025

This article presents a comprehensive framework for implementing machine learning-driven data engineering pipelines in healthcare informatics. Healthcare data poses distinctive challenges, including high dimensionality, heterogeneity across sources, missing values, temporal dependencies, and strict privacy requirements. To address these challenges, the article proposes a four-layer architecture comprising data ingestion, data processing, ML modeling, and model management components. The pipeline uses Apache Spark and Delta Lake for robust data processing, modern ML frameworks for predictive modeling, and MLflow for model lifecycle management. The article demonstrates the practical application of this architecture through a sepsis risk prediction use case, highlighting how temporal patterns in clinical data can support early intervention. It also explores deep learning approaches for genomic data analysis and discusses critical implementation challenges, including data privacy, class imbalance, model explainability, and model drift. Throughout, it emphasizes best practices that balance technical performance with clinical utility and regulatory compliance, providing a roadmap for healthcare organizations seeking to implement scalable ML solutions.
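
The four layers named in the abstract can be pictured with a minimal, purely illustrative sketch. It is not the article's implementation: it uses only the Python standard library in place of Apache Spark, Delta Lake, and MLflow, and every name in it (`Vitals`, `ingest`, `process`, `score`, `ModelRegistry`, the heart-rate trend "model") is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Vitals:
    """One hourly observation for a patient; None marks a missing value."""
    hour: int
    heart_rate: Optional[float]


def ingest(raw_rows):
    """Ingestion layer: parse raw (hour, heart_rate) tuples into typed records."""
    return [Vitals(hour=h, heart_rate=hr) for h, hr in raw_rows]


def process(records):
    """Processing layer: sort by time and forward-fill missing heart rates."""
    filled, last = [], None
    for rec in sorted(records, key=lambda r: r.hour):
        if rec.heart_rate is not None:
            last = rec.heart_rate
        if last is not None:  # skip leading rows with nothing to carry forward
            filled.append(Vitals(rec.hour, last))
    return filled


def score(records, window=3):
    """Modeling layer (toy stand-in, not a clinical model):
    rise in heart rate across the last `window` readings."""
    recent = [r.heart_rate for r in records[-window:]]
    return recent[-1] - recent[0] if len(recent) >= 2 else 0.0


@dataclass
class ModelRegistry:
    """Model-management layer (toy stand-in): versioned scoring functions."""
    versions: dict = field(default_factory=dict)

    def register(self, name, fn):
        self.versions.setdefault(name, []).append(fn)
        return len(self.versions[name])  # 1-based version number

    def latest(self, name):
        return self.versions[name][-1]


# Hypothetical out-of-order readings with one missing value at hour 2.
registry = ModelRegistry()
registry.register("hr_trend", score)
rows = [(2, None), (0, 80.0), (1, 88.0), (3, 104.0)]
risk = registry.latest("hr_trend")(process(ingest(rows)))
```

In a pipeline of the kind the abstract describes, each function would presumably be replaced by its production counterpart (distributed processing for `process`, a trained model for `score`, a tracking service for `ModelRegistry`); the sketch only shows how the layers hand data to one another.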

Healthcare Data Engineering; Machine Learning Pipelines; Clinical Predictive Modeling; Model Lifecycle Management; Sepsis Prediction

https://wjaets.com/sites/default/files/fulltext_pdf/WJAETS-2025-0629.pdf

NISHANTH JOSEPH PAULRAJ. ML-driven data engineering pipeline for health informatics. World Journal of Advanced Engineering Technology and Sciences, 2025, 15(02), 765-773. Article DOI: https://doi.org/10.30574/wjaets.2025.15.2.0629.

Copyright © Author(s). All rights reserved. This article is published under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as appropriate credit is given to the original author(s) and source, a link to the license is provided, and any changes made are indicated.

