Transaction Enrichment & Brand Classification

End-to-end machine learning pipeline for merchant brand and industry classification and for banking transaction enrichment, with synthetic data generation, vectorization experiments, and an interactive Streamlit dashboard.

View the Project on GitHub d-daemon/transaction-enrichment-ml

Vectorization Methods

| Method | Pros | Cons | Best For |
|---|---|---|---|
| Word TF-IDF | Simple | Sensitive to typos | Clean data |
| Char n-gram TF-IDF | Robust, fast | Slightly opaque | Noisy merchant text ✅ |
| fastText | Semantic power | External dependency | Multilingual |
| Transformers | High accuracy | Heavy | R&D only |
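
The "sensitive to typos" versus "robust" contrast in the comparison above can be illustrated with a small standard-library sketch. It compares the Jaccard overlap of word tokens and of character 3–5-grams for a clean and a misspelled merchant string. The merchant strings and the helper names (`char_ngrams`, `word_tokens`, `jaccard`) are hypothetical and chosen only for illustration; they are not taken from the project.

```python
def char_ngrams(text, n_min=3, n_max=5):
    """Return the set of character n-grams of length n_min..n_max."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    }


def word_tokens(text):
    """Return the set of whitespace-separated, lowercased tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical clean and typo-ridden descriptors for the same merchant.
clean = "starbucks coffee"
noisy = "starbuks cofee"

word_sim = jaccard(word_tokens(clean), word_tokens(noisy))
char_sim = jaccard(char_ngrams(clean), char_ngrams(noisy))

print(f"word overlap: {word_sim:.2f}")
print(f"char n-gram overlap: {char_sim:.2f}")
```

Because every word in the noisy string contains a typo, the word-level overlap drops to zero, while the character n-grams still share fragments such as `sta` and `tar`. This is the intuition behind preferring character n-grams for noisy merchant text.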

Chosen Method: Character n-gram TF-IDF (n = 3–5) + Logistic Regression + isotonic calibration.
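
A minimal sketch of how such a pipeline might be assembled with scikit-learn is shown below. It is not the project's actual code: the toy training strings, the brand labels, the `char_wb` analyzer, `max_iter=1000`, and `cv=3` are all assumptions made for illustration. Only the 3–5 n-gram range, the logistic regression classifier, and the isotonic calibration come from the description above.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: noisy transaction descriptors and brand labels.
texts = [
    "STARBUCKS 0042 NYC", "SBUX STARBUCKS #118", "STARBUCKS COFFEE LDN", "STARBUCKS*MOBILE",
    "AMZN MKTP US*2K4", "AMAZON.COM*ORDER", "AMAZON PRIME 866", "AMZN DIGITAL SEATTLE",
    "UBER *TRIP HELP.UBER", "UBER EATS PENDING", "UBER BV AMSTERDAM", "UBER* RIDE 7731",
]
labels = ["starbucks"] * 4 + ["amazon"] * 4 + ["uber"] * 4

model = make_pipeline(
    # Character n-grams of length 3 to 5; "char_wb" (an assumption) keeps
    # n-grams inside word boundaries, "char" would span whitespace too.
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5), lowercase=True),
    # Isotonic calibration fitted on held-out folds of the training data.
    CalibratedClassifierCV(
        LogisticRegression(max_iter=1000),
        method="isotonic",
        cv=3,
    ),
)
model.fit(texts, labels)

# Calibrated class probabilities for an unseen, misspelled descriptor.
proba = model.predict_proba(["STARBUKS 0042 NYC"])
for brand, p in zip(model.classes_, proba[0]):
    print(f"{brand}: {p:.2f}")
```

Isotonic calibration is non-parametric and tends to overfit on small samples, so the twelve-row toy set here only demonstrates the wiring; a realistic setup would calibrate on a much larger held-out set. In this sketch the TF-IDF vocabulary is fitted on all training rows before the calibration folds are split, which is a simplification.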
