Case Study
Credit Card Fraud Detection Using Machine Learning
Building a fraud detector on a highly imbalanced credit card dataset (0.17% fraud) with Random Forest, focusing on recall, precision, and MCC, supported by exploratory analysis, model evaluation, and confusion matrix review.

Summary
Key takeaways
A 0.17% fraud rate was tackled with a precision/recall focus to keep missed fraud (false negatives) low without flooding analysts with false alarms. A Random Forest baseline on PCA features plus Amount and Time delivered strong MCC and precision. EDA confirmed heavy imbalance and low feature correlations, a pattern well suited to an ensemble approach. Confusion matrix review shows only a couple of false positives and a small set of false negatives that the next tuning round will target.
Overview
Project at a glance
Objective: detect rare fraud while keeping operations focused on true cases.
Dataset: 284,807 transactions with 492 fraud labels (~0.17%).
Stack: Python, Pandas, Scikit-learn, RandomForestClassifier.
Features: Time, Amount, PCA-transformed V1–V28; label: Class (0/1).
Process
Narrative walkthrough
Problem & Goal
Fraud losses are driven by a tiny fraction of transactions—just 492 of 284,807 records (~0.17%). The objective was to detect as many of these rare events as possible while avoiding alert fatigue for operations teams.
Success was defined beyond accuracy: precision to limit false alarms, recall to capture fraud, F1 to balance both, and Matthews Correlation Coefficient (MCC) to reflect performance on the imbalanced data.
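For reference, MCC folds all four confusion-matrix cells into a single score between -1 and 1, which keeps it honest when one class is rare:

$\mathrm{MCC} = \dfrac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$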
Data & Exploration
The dataset includes Time (seconds from first transaction), Amount, and PCA-transformed features V1–V28, with Class as the fraud label. Quick EDA confirmed the extreme imbalance and showed that fraudulent transactions tend to have lower median amounts, consistent with card-testing behavior.
Correlation checks found low pairwise relationships across the PCA components, suggesting no single feature dominates the signal. That pattern supports ensemble approaches that aggregate weak signals rather than relying on one standout variable.
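A minimal sketch of these checks, assuming the standard Kaggle creditcard.csv layout (Time, V1–V28, Amount, Class):

import numpy as np
import pandas as pd

data = pd.read_csv("creditcard.csv")

# Class balance: fraud should land at roughly 0.17% of rows.
print(data["Class"].value_counts(normalize=True))

# Median transaction amount per class; fraud skews lower (card testing).
print(data.groupby("Class")["Amount"].median())

# Largest off-diagonal correlation among the PCA components.
corr = data.filter(regex=r"^V\d+$").corr().abs().to_numpy()
np.fill_diagonal(corr, 0.0)  # ignore self-correlation
print(corr.max())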
Modeling Approach
A straightforward 80/20 train/test split separated all features from the Class label. A RandomForestClassifier was chosen for its ability to capture nonlinear interactions and its robustness on tabular data without heavy preprocessing.
This first iteration kept defaults and avoided aggressive class rebalancing to establish a clean baseline. The goal was to understand the natural signal-to-noise ratio before tuning thresholds or introducing weighted classes.
Evaluation
On the test split, the model achieved Accuracy 0.9996, Precision 0.9747, Recall 0.7857, F1 0.8701, and MCC 0.8749. These metrics emphasize balanced performance instead of headline accuracy, which is misleading on imbalanced data.
The confusion matrix shows TN ≈ 56,861, FP ≈ 2, FN ≈ 21, TP ≈ 77. Precision remains high, limiting false alerts; recall captures roughly 79% of fraud cases, highlighting where tuning should focus next.
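Computing the matrix is a single scikit-learn call; this sketch assumes the fitted model and test split from the notebook in the Results section:

from sklearn.metrics import confusion_matrix

# Rows are true classes, columns predicted: [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")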
Next Improvements
Raise recall by introducing class weights or balanced subsampling and by tuning probability thresholds per merchant segment or region (see the sketch after this list).
Experiment with anomaly-detection or one-class methods to reduce the remaining false negatives while keeping precision in a safe range.
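A hedged sketch of both ideas, reusing the train/test split from the notebook below; the 0.3 threshold and the contamination rate are illustrative assumptions, not tuned values:

from sklearn.ensemble import RandomForestClassifier, IsolationForest

# Reweight classes so missed fraud costs more during training.
weighted = RandomForestClassifier(class_weight="balanced_subsample", random_state=42)
weighted.fit(x_train, y_train)

# Trade precision for recall by lowering the default 0.5 decision threshold.
proba = weighted.predict_proba(x_test)[:, 1]
preds_tuned = (proba >= 0.3).astype(int)  # 0.3 is illustrative, not tuned

# One-class alternative: flag outliers without using fraud labels at fit time.
iso = IsolationForest(contamination=0.0017, random_state=42)  # ~ observed fraud rate
iso.fit(x_train)
iso_flags = (iso.predict(x_test) == -1).astype(int)  # -1 marks anomalies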
Results
Model performance
Accuracy: 0.9996
Precision: 0.9747
Recall: 0.7857
F1-Score: 0.8701
Matthews Corr. Coef.: 0.8749
Fraud Rate: ~0.17%
Notebook steps (Python)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, matthews_corrcoef

# Load the dataset (path made relative for portability).
data = pd.read_csv("creditcard.csv")

# Features: Time, Amount, V1-V28; label: Class (0 = legitimate, 1 = fraud).
X = data.drop(["Class"], axis=1)
y = data["Class"]

# Plain 80/20 split, no rebalancing: establish a clean baseline first.
x_train, x_test, y_train, y_test = train_test_split(
    X.values, y.values, test_size=0.2, random_state=42
)

# Default Random Forest: captures nonlinear interactions with minimal preprocessing.
model = RandomForestClassifier()
model.fit(x_train, y_train)
preds = model.predict(x_test)

print("Accuracy: ", accuracy_score(y_test, preds))
print("Precision:", precision_score(y_test, preds))
print("Recall:   ", recall_score(y_test, preds))
print("F1:       ", f1_score(y_test, preds))
print("MCC:      ", matthews_corrcoef(y_test, preds))
Confusion Matrix
Where the model succeeds and misses

Interpretation
Strong true negative performance with only a couple of false positives. Remaining false negatives (≈21) are the next tuning target to lift recall without creating alert fatigue.