Free downloadable datasets — CSV and Excel — designed for building POC analytics models, practicing data modeling, testing dashboard skills, and exploring messy real-world scenarios. No Kaggle toy sets. Data that actually looks like what you'd find at work.
Why It's Different
Structured around real business domains (Sales pipelines, Finance P&Ls, HR headcount), not abstract arrays. The kind of data you'd actually pull from a CRM, ERP, or HRIS export.
Each dataset comes in a clean version (model-ready) and a dirty version (duplicates, nulls, formatting inconsistencies, broken relationships) so you can practice the full workflow.
Fact and dimension tables separated, with a data dictionary included. Built specifically for practicing star schemas, DAX measures, and semantic layer design — not just pivot tables.
What's Coming
Six domain categories, each with multiple datasets in both clean and dirty variants, available as CSV and Excel.
Opportunity pipeline data, closed-won/lost deals, rep performance, quota attainment, and ARR/MRR breakdowns. Great for building CRM-style dashboards.
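For a feel of the metrics involved, here is a minimal Python sketch of quota attainment (closed-won amount divided by quota) and the standard MRR-to-ARR annualization (ARR = MRR × 12). The field names (`rep`, `stage`, `amount`), the `Closed Won` stage label, and every value are hypothetical, not the actual schema of the downloadable files.

```python
from collections import defaultdict

# Hypothetical opportunity rows; real column names in the dataset may differ.
opportunities = [
    {"rep": "Ana", "stage": "Closed Won", "amount": 30000},
    {"rep": "Ana", "stage": "Closed Lost", "amount": 12000},
    {"rep": "Ben", "stage": "Closed Won", "amount": 45000},
    {"rep": "Ben", "stage": "Closed Won", "amount": 15000},
]
quotas = {"Ana": 50000, "Ben": 50000}  # hypothetical per-rep quota


def quota_attainment(rows, quota_by_rep):
    """Return closed-won amount / quota for each rep (1.0 means 100%)."""
    won = defaultdict(float)
    for row in rows:
        if row["stage"] == "Closed Won":
            won[row["rep"]] += row["amount"]
    return {rep: won[rep] / quota for rep, quota in quota_by_rep.items()}


def arr_from_mrr(mrr):
    """Annualize monthly recurring revenue: ARR = MRR * 12."""
    return mrr * 12


attainment = quota_attainment(opportunities, quotas)
print(attainment)  # {'Ana': 0.6, 'Ben': 1.2}
print(arr_from_mrr(8500))  # 102000
```

Only closed-won deals count toward attainment here; lost and open opportunities are ignored, which is the usual convention but worth confirming against your own org's definition.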
General ledger transactions, budget vs. actuals, department spend, and multi-period P&L data. Modeled to mirror real ERP and accounting exports from NetSuite, SAP, or QuickBooks.
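Budget vs. actuals usually comes down to a variance per line. A minimal sketch, assuming a hypothetical department-to-amount mapping for a single period and the common convention variance = actual - budget, so a positive number on an expense line means overspend (some teams flip the sign).

```python
# Hypothetical department-level budget and actuals for one period.
budget = {"Marketing": 120000, "Engineering": 400000, "G&A": 80000}
actuals = {"Marketing": 135000, "Engineering": 380000, "G&A": 80000}


def budget_variance(budget, actuals):
    """Return {dept: (variance, variance_pct)} with variance = actual - budget.

    Departments missing from actuals are treated as zero spend; the
    percentage is None when the budget is zero.
    """
    result = {}
    for dept, planned in budget.items():
        actual = actuals.get(dept, 0)
        variance = actual - planned
        pct = variance / planned if planned else None
        result[dept] = (variance, pct)
    return result


for dept, (var, pct) in budget_variance(budget, actuals).items():
    pct_text = "n/a" if pct is None else f"{pct:+.1%}"
    print(f"{dept}: {var:+,} ({pct_text})")
```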
Campaign performance, lead generation funnels, MQL/SQL conversion rates, and channel attribution data. Structured for multi-touch attribution modeling.
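Multi-touch attribution has several variants; the simplest is linear, where each touchpoint on a converting path gets an equal share of the revenue. The sketch below implements only that linear model (not position-based or data-driven ones), and the journeys, channel names, and revenue figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical converted journeys: ordered channel touches plus closed revenue.
journeys = [
    (["Paid Search", "Email", "Webinar"], 9000),
    (["Organic", "Email"], 4000),
    (["Paid Search"], 2000),
]


def linear_attribution(paths):
    """Split each conversion's revenue equally across its touchpoints."""
    credit = defaultdict(float)
    for touches, revenue in paths:
        if not touches:
            continue  # no touchpoints, nothing to attribute
        share = revenue / len(touches)
        for channel in touches:
            credit[channel] += share
    return dict(credit)


print(linear_attribution(journeys))
```

Because every path's revenue is split completely, channel credits always sum to total attributed revenue, which is a handy sanity check when you swap in a different weighting.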
Headcount, turnover, time-to-hire, compensation bands, and DEI metrics. Built around real HRIS data structures like Workday or BambooHR.
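Turnover is commonly computed as separations divided by average headcount over the period. A minimal sketch, assuming average headcount is the simple mean of the start and end counts (averaging monthly snapshots is an equally common choice); the figures are hypothetical.

```python
def turnover_rate(separations, headcount_start, headcount_end):
    """Turnover = separations / average headcount for the period.

    Average headcount here is the simple mean of start and end headcount.
    """
    avg_headcount = (headcount_start + headcount_end) / 2
    if avg_headcount == 0:
        raise ValueError("average headcount is zero")
    return separations / avg_headcount


# Hypothetical quarter: 150 employees at start, 144 at end, 12 leavers.
rate = turnover_rate(12, 150, 144)
print(f"{rate:.1%}")
```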
Datasets intentionally loaded with real-world data quality issues: duplicate keys, inconsistent casing, broken date formats, orphaned foreign keys, and null-heavy columns.
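To make these issue types concrete, here is a minimal profiling sketch using only Python's standard library. The two inline CSV snippets, column names such as `opportunity_id` and `account_id`, the `profile` helper, and the assumption that valid dates are ISO `YYYY-MM-DD` are all hypothetical; none of them come from the actual files.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical dirty extracts; real files, columns, and date formats will differ.
FACT_CSV = """opportunity_id,account_id,stage,close_date
OPP-1,ACC-1,Closed Won,2024-03-01
OPP-2,ACC-2,closed won,03/15/2024
OPP-2,ACC-2,closed won,03/15/2024
OPP-3,ACC-9,Closed Lost,
"""
DIM_CSV = """account_id,account_name
ACC-1,Acme
ACC-2,Globex
"""


def profile(rows, dim_keys, key, fk, date_col, text_col, date_fmt="%Y-%m-%d"):
    """Report common dirty-data issues; assumes rows is a non-empty list of dicts."""
    if not rows:
        raise ValueError("no rows to profile")

    # Duplicate keys: the same primary key appearing more than once.
    counts = Counter(r[key] for r in rows)
    duplicates = sorted(k for k, n in counts.items() if n > 1)

    # Inconsistent casing: values equal once lowercased but spelled differently.
    spellings = {}
    for r in rows:
        spellings.setdefault(r[text_col].strip().lower(), set()).add(r[text_col])
    casing = {k: sorted(v) for k, v in spellings.items() if len(v) > 1}

    # Broken dates: non-blank values that do not parse with the expected format.
    bad_dates = []
    for r in rows:
        value = r[date_col].strip()
        if not value:
            continue  # blanks are counted as nulls below
        try:
            datetime.strptime(value, date_fmt)
        except ValueError:
            bad_dates.append(value)

    # Orphaned foreign keys: fact references with no matching dimension row.
    orphans = sorted({r[fk] for r in rows} - dim_keys)

    # Null-heavy columns: share of blank values per column.
    null_ratio = {c: sum(not r[c].strip() for r in rows) / len(rows) for c in rows[0]}

    return {"duplicates": duplicates, "casing": casing, "bad_dates": bad_dates,
            "orphans": orphans, "null_ratio": null_ratio}


fact = list(csv.DictReader(io.StringIO(FACT_CSV)))
accounts = {r["account_id"] for r in csv.DictReader(io.StringIO(DIM_CSV))}
report = profile(fact, accounts, "opportunity_id", "account_id", "close_date", "stage")
for check, result in report.items():
    print(check, result)
```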
Pre-separated fact and dimension tables designed for star schema modeling practice. Includes a data dictionary and suggested relationships, ideal for building models in Power BI and Microsoft Fabric.
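In a star schema, a measure is typically a sum over the fact table, sliced by attributes reached through many-to-one relationships to the dimensions. The plain-Python sketch below mimics that pattern so the mechanics are visible outside a BI tool; the table names (`dim_rep`, `dim_date`, `fact_opportunity`), keys, and the `total_by` helper are hypothetical, not the kit's actual layout.

```python
from collections import defaultdict

# Hypothetical star schema: one fact table keyed to two dimensions.
dim_rep = {1: {"rep_name": "Ana", "region": "East"},
           2: {"rep_name": "Ben", "region": "West"},
           3: {"rep_name": "Cy", "region": "East"}}
dim_date = {20240115: {"quarter": "2024-Q1"},
            20240420: {"quarter": "2024-Q2"}}
fact_opportunity = [
    {"rep_key": 1, "date_key": 20240115, "amount": 10000},
    {"rep_key": 2, "date_key": 20240115, "amount": 25000},
    {"rep_key": 3, "date_key": 20240420, "amount": 5000},
    {"rep_key": 1, "date_key": 20240420, "amount": 7000},
]


def total_by(fact, measure, lookups):
    """Sum a fact column grouped by attributes pulled from dimension tables.

    `lookups` is a list of (foreign_key_column, dimension_dict, attribute),
    mimicking many-to-one relationships from the fact to each dimension.
    """
    totals = defaultdict(float)
    for row in fact:
        group = tuple(dim[row[fk]][attr] for fk, dim, attr in lookups)
        totals[group] += row[measure]
    return dict(totals)


result = total_by(fact_opportunity, "amount",
                  [("rep_key", dim_rep, "region"), ("date_key", dim_date, "quarter")])
print(result)
```

An orphaned foreign key makes the dimension lookup raise `KeyError` here, which is the broken-relationship case the dirty variants are meant to surface.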
Roadmap
Sales & Revenue is live: 200-row opportunity fact table, 5 dimension tables, clean + dirty variants, and a full data dictionary.
Marketing and HR now live: a 400-row lead funnel, weekly campaign performance, 150-employee headcount snapshots, and hiring pipeline data, all with clean + dirty variants.
The full modeling kit launches alongside a searchable download hub with filters for domain, type (clean/dirty), and format (CSV/XLSX).
Community-contributed datasets, automated dataset generators, and a lightweight API for programmatic access — because analysts should be able to script their setups.