End-to-end machine learning pipeline for merchant brand and industry classification and banking transaction enrichment, with synthetic data generation, vectorization experiments, and an interactive Streamlit dashboard.
View the Project on GitHub d-daemon/transaction-enrichment-ml
| Method | Pros | Cons | Best For |
|---|---|---|---|
| Word TF-IDF | Simple | Sensitive to typos | Clean data |
| Char n-gram TF-IDF | Robust, fast | Slightly opaque | Noisy merchant text ✅ |
| fastText | Semantic power | External dependency | Multilingual |
| Transformers | High accuracy | Heavy | R&D only |

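
The "sensitive to typos" versus "robust" contrast in the table can be illustrated with a small pure-Python sketch. The merchant strings and the helper names `char_ngrams` and `jaccard` are hypothetical, chosen only for illustration. Jaccard overlap stands in for the similarity a TF-IDF model would see, so this is a simplification and not the project's actual featurization.

```python
def char_ngrams(text, n_min=3, n_max=5):
    """Return the set of character n-grams (lengths n_min..n_max) of the lowercased text."""
    s = text.lower()
    return {s[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(s) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical clean and noisy descriptors for the same merchant.
clean = "STARBUCKS COFFEE 1234"
noisy = "STARBUKS COFEE #1234"

# Word-level tokens: every token differs because of the typos and the '#'.
word_sim = jaccard(set(clean.lower().split()), set(noisy.lower().split()))

# Character n-grams: fragments such as "sta", "tar", "1234" still match.
char_sim = jaccard(char_ngrams(clean), char_ngrams(noisy))

print(f"word overlap: {word_sim:.2f}, char n-gram overlap: {char_sim:.2f}")
```

With these two strings the word-level overlap is zero, while the character n-gram overlap stays positive. That is the behavior the "Noisy merchant text" row relies on.
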
Chosen Method: Character n-gram TF-IDF (n = 3–5) + Logistic Regression + isotonic calibration.