A Generative Pattern Extraction Algorithm Tackling Ultra-Imbalanced Data in Financial Fraud Detection Systems

Fatima M and Al-Nuaimi

doi:10.70023/sahd/242502

Volume 2, Issue 4 • 2026

Original Research

Received: 2025-11-04

Accepted: 2025-12-10

Published: 2025-12-30

Abstract

Financial fraud detection increasingly depends on intelligent systems that can identify subtle, complex, and evolving fraudulent activity in large-scale transaction environments. In real-world financial data, fraudulent transactions typically make up less than 1 percent of total volume, which makes the learning problem extremely imbalanced. This imbalance is the central challenge for current research: traditional oversampling techniques, deep neural networks, and cost-sensitive learning approaches usually fail to retain the behavioral semantics of fraudulent patterns. Such methods often produce noisy or oversimplified minority samples, which leads to poor generalization, high false-negative rates, and lower detection reliability in practice. To address these issues, this study proposes the Generative Pattern Extraction Algorithm (GPEA), a sequence-generative-ensemble network designed to improve the representation and detection of minority fraud cases. The proposed system uses a Bi-LSTM encoder to learn latent motifs that capture how fraudulent behavior varies over time. A Conditional Generative Adversarial Network (CGAN) then generates realistic minority-class samples in the learned latent space. These enriched representations are passed to a LightGBM-XGBoost ensemble classifier, which distinguishes fraudulent from legitimate transactions more accurately. The significance of this technique is that it preserves fraud semantics, eases minority-class learning, and improves fraud detection in highly imbalanced scenarios. The results show that GPEA consistently outperforms deep learning baselines, ensemble approaches, and leading imbalance-handling algorithms.
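The final stage described above combines two gradient-boosted classifiers. The abstract does not state how the LightGBM and XGBoost outputs are merged, so the following is only a minimal sketch that assumes weighted soft voting over each model's predicted fraud probability. The function name `soft_vote`, the weight, the threshold, and the scores are all hypothetical illustrations, not values or methods taken from the paper.

```python
from typing import List, Sequence


def soft_vote(
    prob_a: Sequence[float],
    prob_b: Sequence[float],
    weight_a: float = 0.5,
    threshold: float = 0.5,
) -> List[int]:
    """Blend two models' fraud probabilities and threshold the result.

    prob_a and prob_b stand in for per-transaction fraud probabilities
    from two classifiers (for example LightGBM and XGBoost). The blending
    rule is an assumption; the source does not specify it.
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same transactions")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    labels = []
    for pa, pb in zip(prob_a, prob_b):
        # Weighted average of the two probabilities for this transaction.
        blended = weight_a * pa + (1.0 - weight_a) * pb
        # Label 1 means "flag as fraud", 0 means "treat as legitimate".
        labels.append(1 if blended >= threshold else 0)
    return labels


# Made-up scores for four transactions, for illustration only.
lightgbm_scores = [0.10, 0.80, 0.40, 0.95]
xgboost_scores = [0.05, 0.60, 0.70, 0.90]
predictions = soft_vote(lightgbm_scores, xgboost_scores)
```

In a highly imbalanced setting the threshold would normally be tuned on validation data rather than fixed at 0.5, since lowering it trades additional false alarms for fewer missed frauds.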
The proposed strategy decreases the false-negative rate by 20%, with reported figures of 14% for fraud detection accuracy, 12-08% for recall, 9-15% for F1-score, and 14% for overall accuracy. The model's overall detection rate of 96.3% shows that it can find fraud patterns that are rare but nonetheless highly consequential. Experimental findings on benchmark financial fraud data indicate that the algorithm is significantly superior to oversampling baselines and deep-learning models in minority-class recall, precision-recall AUC, MCC, and overall detection stability. The discussion also shows that GPEA not only classifies transactions more accurately but also produces fraud patterns that are easy to interpret. This makes it a useful addition to current financial risk analysis and fraud prevention systems.
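Several of the metrics named above (recall, F1-score, false-negative rate, MCC) can be derived directly from confusion-matrix counts. The sketch below shows why plain accuracy is uninformative when fraud is about 1% of the data. The function name `imbalance_metrics` and the counts are hypothetical and chosen only for illustration; they are not results from the paper, and precision-recall AUC is not covered because it needs ranked scores rather than counts.

```python
import math
from typing import Dict


def imbalance_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute metrics from confusion-matrix counts (fraud = positive class)."""

    def safe_div(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    recall = safe_div(tp, tp + fn)
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * recall, precision + recall)
    # Matthews correlation coefficient uses all four cells of the matrix.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "false_negative_rate": safe_div(fn, tp + fn),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical batch: 10,000 transactions, 100 of them fraudulent (1%).
# A model that labels every transaction as legitimate:
trivial = imbalance_metrics(tp=0, fp=0, tn=9900, fn=100)
# A model that catches 80 of the 100 frauds with 40 false alarms:
better = imbalance_metrics(tp=80, fp=40, tn=9860, fn=20)
```

The trivial model reaches 99% accuracy while missing every fraud (recall 0, false-negative rate 1, MCC 0), whereas the second model's accuracy is only marginally higher even though its recall is 0.8. This is why minority-class recall, F1-score, and MCC are the more informative figures under heavy imbalance.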

PatternIQ Mining (PIQM)
