World Journal of Advanced Engineering Technology and Sciences

ISSN: 2582-8266 (Online)


Optimizing PyFlink for high-throughput machine learning: Streaming feature engineering in banking


SANDEEP PAMARTHI *

Principal Data Engineer, AI/ML Expert, CGI Inc.

Research Article
 
World Journal of Advanced Engineering Technology and Sciences, 2024, 13(02), 728-737
Article DOI: 10.30574/wjaets.2024.13.2.0549
DOI url: https://doi.org/10.30574/wjaets.2024.13.2.0549

Received on 30 September 2024; revised on 11 November 2024; accepted on 13 November 2024

Real-time feature engineering is the transformation of streaming data into features for machine learning models as events occur. This capability is critical in banking fraud detection, where flagging an anomalous transaction within seconds can prevent losses. Detection that lags by hours, or even minutes, is often too late: by the time an offline system flags a fraudulent transaction, the funds may already be gone. Fraud detection systems must therefore ingest transaction streams and compute features (e.g. recent transaction counts, spending velocity, geolocation patterns) continuously, so that models can score each transaction with sub-second latency. In this domain real-time data “beats” slow data: a “too-late” architecture built on batch processing (e.g. daily reports or warehouse analytics) increases risk and can lead to revenue loss and poor customer experience. For example, if credit card fraud is identified only at day’s end in a data lake, both the bank and the customer suffer avoidable damage. This urgency drives modern payment platforms to adopt streaming pipelines that analyze transactions immediately and catch fraud as it happens.
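The per-transaction features named above (recent transaction counts and spending velocity) can be illustrated with a minimal, framework-independent sketch. It is plain Python, not PyFlink, and it is not the paper's implementation. The `Transaction` fields, the `SlidingWindowFeatures` class, and the 60-second window are hypothetical choices made for illustration, and events are assumed to arrive in timestamp order for each card.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """Hypothetical event shape; field names are assumptions."""
    card_id: str
    timestamp: float  # event time in seconds
    amount: float


class SlidingWindowFeatures:
    """Keeps a per-card sliding window and derives features on each event.

    Assumes events for a given card arrive in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds: float = 60.0):
        self.window_seconds = window_seconds
        # card_id -> deque of (timestamp, amount) still inside the window
        self._events = defaultdict(deque)

    def update(self, txn: Transaction) -> dict:
        events = self._events[txn.card_id]
        events.append((txn.timestamp, txn.amount))

        # Evict events that have fallen out of the window ending at this event.
        cutoff = txn.timestamp - self.window_seconds
        while events and events[0][0] <= cutoff:
            events.popleft()

        total = sum(amount for _, amount in events)
        return {
            "txn_count": len(events),
            "amount_sum": total,
            # Spending velocity: amount spent per second over the window.
            "spend_per_second": total / self.window_seconds,
        }


if __name__ == "__main__":
    features = SlidingWindowFeatures(window_seconds=60.0)
    for txn in [
        Transaction("card-1", 0.0, 10.0),
        Transaction("card-1", 30.0, 20.0),
        Transaction("card-1", 70.0, 5.0),
    ]:
        print(txn.card_id, features.update(txn))
```

In a streaming engine the same logic would typically live in keyed state partitioned by card, with the engine handling out-of-order events; this sketch keeps everything in process memory to show only the feature computation.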
Another crucial application is underwriting decisioning for loans and credit. Here, streaming machine learning lets lenders assess credit risk and make approval decisions in real time rather than waiting on batch reports. By continuously updating features such as an applicant’s transaction history, cash-flow patterns, or credit utilization, banks can generate up-to-the-moment risk scores. This improves both decision accuracy and customer experience: applicants receive faster responses and more dynamic risk-based pricing. A lagging, batch-oriented underwriting process might approve a loan based on outdated data or miss warning signals that appear in the interim. In high-volume commercial banking (new credit requests, renewals, modifications), streaming ML ensures that risk assessments and credit decisions reflect the latest information, improving both fraud prevention (catching fraudulent loan applications) and credit risk management (declining or adjusting terms for risky accounts in near real time).
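As a rough illustration of continuously updated underwriting features, the sketch below maintains credit utilization and net cash flow for one applicant as events arrive. It is plain Python under stated assumptions, not the paper's method: the `ApplicantState` fields, the four event kinds, and the `apply_event` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class ApplicantState:
    """Hypothetical running state for one applicant."""
    credit_limit: float
    balance: float = 0.0   # outstanding revolving balance
    inflow: float = 0.0    # cumulative deposits
    outflow: float = 0.0   # cumulative withdrawals


def current_features(state: ApplicantState) -> dict:
    """Derive features from the running state."""
    utilization = (
        state.balance / state.credit_limit if state.credit_limit > 0 else 0.0
    )
    return {
        "credit_utilization": utilization,
        "net_cash_flow": state.inflow - state.outflow,
    }


def apply_event(state: ApplicantState, kind: str, amount: float) -> dict:
    """Update state in place for one event and return fresh features.

    The event kinds below are assumptions made for this sketch.
    """
    if kind == "charge":
        state.balance += amount
    elif kind == "payment":
        state.balance = max(0.0, state.balance - amount)
    elif kind == "deposit":
        state.inflow += amount
    elif kind == "withdrawal":
        state.outflow += amount
    else:
        raise ValueError(f"unknown event kind: {kind!r}")
    return current_features(state)


if __name__ == "__main__":
    state = ApplicantState(credit_limit=1000.0)
    for kind, amount in [
        ("charge", 250.0),
        ("deposit", 500.0),
        ("withdrawal", 200.0),
        ("payment", 50.0),
    ]:
        print(kind, apply_event(state, kind, amount))
```

Because each event updates the state incrementally, a risk model can be scored against the latest features at any moment instead of waiting for a batch refresh.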
Apache Flink, a distributed stream processing engine, has emerged as a leading platform for real-time analytics. PyFlink, Flink’s Python API, allows data scientists to build streaming pipelines in Python on Flink’s engine. This paper focuses on optimizing PyFlink for high-throughput ML, particularly streaming feature engineering for fraud detection and underwriting. We present benchmarking studies comparing PyFlink with alternative frameworks, discuss how streaming ML improves fraud prevention and underwriting decisions, and outline an end-to-end architecture with implementation considerations. The goal is to offer empirical insights and best practices for financial institutions seeking low-latency, high-throughput streaming ML solutions.

Keywords: Streaming Machine Learning; PyFlink; Fraud Detection; Underwriting; Feature Engineering; Real-Time Analytics; Financial Services; Apache Flink; Banking; Credit Risk Scoring

https://wjaets.com/sites/default/files/fulltext_pdf/WJAETS-2024-0549.pdf


SANDEEP PAMARTHI. Optimizing PyFlink for high-throughput machine learning: Streaming feature engineering in banking. World Journal of Advanced Engineering Technology and Sciences, 2024, 13(02), 728-737. Article DOI: https://doi.org/10.30574/wjaets.2024.13.2.0549



Copyright © Author(s). All rights reserved. This article is published under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as appropriate credit is given to the original author(s) and source, a link to the license is provided, and any changes made are indicated.

