Coming Soon

Real-world datasets.
Built by a BI architect.

Free downloadable datasets — CSV and Excel — designed for building POC analytics models, practicing data modeling, testing dashboard skills, and exploring messy real-world scenarios. No Kaggle toy sets. Data that actually looks like what you'd find at work.

Datasets built for
how analysts actually work.

🏗️

Realistic Business Context

Structured around real business domains (Sales pipelines, Finance P&Ls, HR headcount), not abstract arrays. The kind of data you'd actually find in a CRM, ERP, or HRIS export.

🧹

Clean & Dirty Variants

Each dataset comes in a clean version (model-ready) and a dirty version (duplicates, nulls, formatting inconsistencies, broken relationships) so you can practice the full workflow.

📐

Modeling-Ready Structure

Fact and dimension tables separated, with a data dictionary included. Built specifically for practicing star schemas, DAX measures, and semantic layer design — not just pivot tables.

Dataset Categories

Six domain categories, each with multiple datasets in both clean and dirty variants, available as CSV and Excel.

Available
💼

Sales & Revenue

Opportunity pipeline data, closed-won/lost deals, rep performance, quota attainment, and ARR/MRR breakdowns. Great for building CRM-style dashboards.

Sales Pipeline · Rep Leaderboard · Revenue by Product · Win/Loss Analysis
Soon
📈

Finance & FP&A

General ledger transactions, budget vs. actuals, department spend, and multi-period P&L data. Modeled to mirror real ERP and accounting-system exports from NetSuite, SAP, or QuickBooks.

Budget vs. Actuals · GL Transactions · Department Spend · P&L Reports
Available
📣

Marketing & Demand Gen

Campaign performance, lead generation funnels, MQL/SQL conversion rates, and channel attribution data. Structured for multi-touch attribution modeling.

Campaign ROI · Lead Funnel · Channel Attribution · Email Performance
Available
👥

HR & People Analytics

Headcount, turnover, time-to-hire, compensation bands, and DEI metrics. Built around the data structures of real HRIS platforms like Workday or BambooHR.

Headcount Trends · Attrition Analysis · Hire Pipeline · Comp Bands
Soon
🧩

Dirty Data Challenge Sets

Datasets intentionally loaded with real-world data quality issues: duplicate keys, inconsistent casing, broken date formats, orphaned foreign keys, and null-heavy columns.

Duplicate Records · Broken Joins · Mixed Date Formats · Null Patterns
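
To give a feel for the practice these sets are meant for, here is a minimal profiling sketch using only the Python standard library. The sample rows, table names, and column names (opportunity_id, account_id, stage, close_date) are hypothetical stand-ins, not the actual dataset schema; swap in the names from the data dictionary.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sample rows; the real files and column names may differ.
OPPORTUNITIES_CSV = """opportunity_id,account_id,stage,close_date
1001,A1,Closed Won,2025-01-15
1002,A2,closed won,01/20/2025
1002,A2,closed won,01/20/2025
1003,A9,Closed Lost,
"""
ACCOUNTS_CSV = """account_id,account_name
A1,Acme
A2,Globex
"""


def read_rows(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def is_iso_date(value):
    """Return True if value parses as YYYY-MM-DD."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def profile(opps, accounts):
    """Count the data quality issues named in the card above."""
    key_counts = Counter(r["opportunity_id"] for r in opps)
    known_accounts = {r["account_id"] for r in accounts}

    # Group stage spellings by their lowercase form to spot casing drift.
    spellings = defaultdict(set)
    for r in opps:
        spellings[r["stage"].lower()].add(r["stage"])

    return {
        "duplicate_keys": sorted(k for k, n in key_counts.items() if n > 1),
        "orphaned_account_ids": sorted(
            {r["account_id"] for r in opps} - known_accounts
        ),
        "null_close_dates": sum(1 for r in opps if not r["close_date"]),
        "non_iso_dates": sum(
            1 for r in opps
            if r["close_date"] and not is_iso_date(r["close_date"])
        ),
        "inconsistent_casing": {
            k: sorted(v) for k, v in spellings.items() if len(v) > 1
        },
    }


report = profile(read_rows(OPPORTUNITIES_CSV), read_rows(ACCOUNTS_CSV))
print(report)
```

The same checks translate directly to Power Query steps or SQL, so the report works as a checklist of what to fix before modeling.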
Soon

Star Schema Kits

Pre-separated fact and dimension tables designed for star schema modeling practice. Includes a data dictionary and suggested relationships — ideal for Power BI and Fabric model building.

Fact + Dim Tables · Data Dictionary · Relationship Map · Sample Measures
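
As a rough illustration of what a fact-to-dimension relationship and a sample measure look like outside Power BI, the sketch below joins a tiny fact table to a dimension and computes two measures in plain Python. The table names (fact_opportunity, dim_rep), keys, and values are invented for illustration and are not taken from the kits.

```python
from collections import defaultdict

# Hypothetical dimension table, keyed by its surrogate key.
dim_rep = {
    1: {"rep_name": "Avery", "region": "East"},
    2: {"rep_name": "Blake", "region": "West"},
}

# Hypothetical fact table: one row per opportunity, pointing at dim_rep.
fact_opportunity = [
    {"opportunity_id": 1001, "rep_key": 1, "amount": 5000.0, "is_won": True},
    {"opportunity_id": 1002, "rep_key": 1, "amount": 3000.0, "is_won": False},
    {"opportunity_id": 1003, "rep_key": 2, "amount": 4000.0, "is_won": True},
]


def won_revenue_by_region(facts, reps):
    """Sum won amounts, sliced by an attribute that lives on the dimension."""
    totals = defaultdict(float)
    for row in facts:
        region = reps[row["rep_key"]]["region"]  # follow the relationship
        if row["is_won"]:
            totals[region] += row["amount"]
    return dict(totals)


def win_rate(facts):
    """Share of opportunities that were won."""
    return sum(1 for r in facts if r["is_won"]) / len(facts)


revenue = won_revenue_by_region(fact_opportunity, dim_rep)
rate = win_rate(fact_opportunity)
print(revenue, round(rate, 3))
```

In a semantic model the lookup on rep_key becomes a one-to-many relationship, and each function becomes a measure evaluated in whatever filter context the report supplies.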

Release Timeline

Live ✓

Initial Launch — Sales & Revenue

Sales & Revenue is live: 200-row opportunity fact table, 5 dimension tables, clean + dirty variants, and a full data dictionary.

Live ✓

Marketing & Demand Gen + HR & People Analytics

Both categories are now live: 400-row lead funnel, weekly campaign performance, 150-employee headcount snapshots, and hiring pipeline data, all with clean + dirty variants.

Q4 2026

Star Schema Kits + Download Hub

Full modeling kit launch alongside a searchable download hub — filter by domain, type (clean/dirty), and format (CSV/XLSX).

2027

Community Submissions & API Access

Community-contributed datasets, automated dataset generators, and a lightweight API for programmatic access — because analysts should be able to script their setups.