DataSharpener

Case Study

Credit Card Fraud Detection Using Machine Learning

Building a fraud detector on a highly imbalanced credit card dataset (0.17% fraud) with Random Forest, focusing on recall, precision, and MCC—supported by exploratory analysis, model evaluation, and confusion matrix review.

Hero image: feature correlation heatmap

Summary

Key takeaways

A 0.17% fraud rate was tackled with a precision/recall focus rather than headline accuracy. A Random Forest baseline on the PCA features plus Amount and Time delivered strong MCC and precision. EDA confirmed the heavy imbalance and low feature correlations, a pattern well suited to an ensemble approach. Confusion matrix review shows only a couple of false positives and a small set of false negatives that will be the target of the next tuning round.

Overview

Project at a glance

Objective: detect rare fraud while keeping operations focused on true cases.
Dataset: 284,807 transactions with 492 fraud labels (~0.17%).
Stack: Python, Pandas, Scikit-learn, RandomForestClassifier.
Features: Time, Amount, PCA-transformed V1–V28; label: Class (0/1).

Process

Narrative walkthrough

Problem & Goal

Fraud losses are driven by a tiny fraction of transactions—just 492 of 284,807 records (~0.17%). The objective was to detect as many of these rare events as possible while avoiding alert fatigue for operations teams.

Success was defined beyond accuracy: precision to limit false alarms, recall to capture fraud, F1 to balance both, and Matthews Correlation Coefficient (MCC) to reflect performance on the imbalanced data.
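To make those definitions concrete, here is a minimal sketch that computes all four scores from raw confusion-matrix counts; the tp, fp, fn, tn arguments are placeholders for counts taken from any binary confusion matrix.

import math

def fraud_metrics(tp, fp, fn, tn):
    # Precision: share of flagged transactions that are truly fraud
    precision = tp / (tp + fp)
    # Recall: share of actual fraud that was flagged
    recall = tp / (tp + fn)
    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    # MCC: correlation between predictions and labels, robust to imbalance
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return precision, recall, f1, mcc

# fraud_metrics(77, 2, 21, 56861) reproduces the scores reported below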

Data & Exploration

The dataset includes Time (seconds from first transaction), Amount, and PCA-transformed features V1–V28, with Class as the fraud label. Quick EDA confirmed the extreme imbalance and showed that fraudulent transactions tend to have lower median amounts, consistent with card-testing behavior.
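A minimal sketch of those two checks, assuming the dataset is available locally as creditcard.csv (the standard Kaggle file name):

import pandas as pd

data = pd.read_csv("creditcard.csv")  # assumed local copy of the Kaggle dataset

# Class balance: roughly 99.83% legitimate vs 0.17% fraud
print(data["Class"].value_counts(normalize=True))

# Median transaction amount per class; fraud skews lower
print(data.groupby("Class")["Amount"].median())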

Correlation checks found low pairwise relationships across the PCA components, suggesting no single feature dominates the signal. That pattern supports ensemble approaches that aggregate weak signals rather than relying on one standout variable.
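Continuing from the data frame loaded above, one way to sketch that check is to look at the largest absolute off-diagonal correlation among the predictors:

import numpy as np

# Pairwise correlations among the predictor columns only
corr = data.drop(columns=["Class"]).corr()

# Mask the diagonal and report the largest absolute off-diagonal value
mask = ~np.eye(len(corr), dtype=bool)
print(corr.where(mask).abs().max().max())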

Modeling Approach

A straightforward 80/20 train/test split was applied, using all features as inputs and the Class label as the target. A RandomForestClassifier was chosen for its ability to capture nonlinear interactions and for its robustness on tabular data without heavy preprocessing.

This first iteration kept defaults and avoided aggressive class rebalancing to establish a clean baseline. The goal was to understand the natural signal-to-noise ratio before tuning thresholds or introducing weighted classes.

Evaluation

On the test split, the model achieved Accuracy 0.9996, Precision 0.9747, Recall 0.7857, F1 0.8701, and MCC 0.8749. These metrics emphasize balanced performance instead of headline accuracy, which is misleading on imbalanced data.

The confusion matrix shows TN ≈ 56,861, FP ≈ 2, FN ≈ 21, TP ≈ 77. Precision remains high, limiting false alerts; recall captures roughly 79% of fraud cases, highlighting where tuning should focus next.
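Those counts come directly from scikit-learn's confusion_matrix; a short sketch assuming y_test and preds from the notebook steps below:

from sklearn.metrics import confusion_matrix

# ravel() flattens the 2x2 matrix into (TN, FP, FN, TP) order
tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()
print(tn, fp, fn, tp)  # approx. 56861, 2, 21, 77 on this split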

Next Improvements

Raise recall by introducing class weights or balanced subsampling and by tuning probability thresholds per merchant segment or region.
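A sketch of both levers, reusing the split from the notebook steps below; the 0.3 threshold is purely illustrative and would be tuned per segment on validation data:

from sklearn.ensemble import RandomForestClassifier

# Reweight the minority class at the bootstrap level
model = RandomForestClassifier(class_weight="balanced_subsample", random_state=42)
model.fit(x_train, y_train)

# Trade some precision for recall by lowering the decision threshold
proba = model.predict_proba(x_test)[:, 1]
preds = (proba >= 0.3).astype(int)  # illustrative threshold, not tuned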

Experiment with anomaly-detection or one-class methods to reduce the remaining false negatives while keeping precision in a safe range.
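As one possible direction (an assumption, not the project's chosen method), an IsolationForest fitted on the training features, with contamination set near the known fraud rate:

from sklearn.ensemble import IsolationForest

# Unsupervised isolation forest; contamination ~ known fraud rate (assumed)
iso = IsolationForest(contamination=0.0017, random_state=42)
iso.fit(x_train)

# predict() returns -1 for anomalies, 1 for inliers; map to fraud labels
iso_preds = (iso.predict(x_test) == -1).astype(int)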

Results

Model performance

Accuracy: 0.9996
Precision: 0.9747
Recall: 0.7857
F1-Score: 0.8701
Matthews Corr. Coef.: 0.8749
Fraud Rate: ~0.17%

Notebook steps (Python)

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, matthews_corrcoef

# Load the Kaggle credit card fraud dataset (adjust the path to your copy)
data = pd.read_csv("creditcard.csv")

# All columns except the label are inputs: Time, Amount, V1-V28
X = data.drop(["Class"], axis=1)
y = data["Class"]

# Plain 80/20 split; random_state pins the split for reproducibility
x_train, x_test, y_train, y_test = train_test_split(X.values, y.values, test_size=0.2, random_state=42)

# Untuned Random Forest baseline: defaults, no class rebalancing
model = RandomForestClassifier()
model.fit(x_train, y_train)
preds = model.predict(x_test)

print(
    "Accuracy:", accuracy_score(y_test, preds),
    "Precision:", precision_score(y_test, preds),
    "Recall:", recall_score(y_test, preds),
    "F1:", f1_score(y_test, preds),
    "MCC:", matthews_corrcoef(y_test, preds),
)

Confusion Matrix

Where the model succeeds and misses

Interpretation

Strong true negative performance with only a couple of false positives. Remaining false negatives (≈21) are the next tuning target to lift recall without creating alert fatigue.

Next step

Need fraud analytics for your product?

Book a demo