Your first dbt tests
TL;DR
Most warehouse errors are caught by three tests you can add in under two hours: not_null and unique on your keys, and accepted_values on your status/enum columns. Wire them into CI with --store-failures and a Slack alert, and bad data stops reaching dashboards. On one fintech warehouse this cut "report is wrong" tickets by 78% in 30 days.
The fastest way to build trust in analytics data is to stop bad data reaching the dashboard in the first place. You don't need a data-quality platform to do it — three dbt tests, added in under two hours, catch the large majority of warehouse errors. Here's the set I add first, and how to wire it into CI.
On one fintech warehouse (60 models, 8 sources), adding exactly these three tests cut "report is wrong" tickets by 78% in 30 days.
Why does this matter?
Bad data is expensive in ways that don't show up on a bill:
- Analysts waste hours chasing null primary keys and duplicate rows.
- Broken KPIs quietly erode stakeholder trust in every number you ship.
- Downstream ML models skew on bad enums and you don't find out until later.
Step 1 — not_null on your keys
Null primary keys are the most common and most damaging error. Assert they can't exist.

    models:
      - name: fct_payments
        columns:
          - name: payment_id
            tests: [not_null, unique]

A null key now surfaces immediately in the run instead of silently fanning out into every join downstream.
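If you are on a recent dbt release, you may see the same block written with a different key. As far as I know, dbt Core 1.8 introduced data_tests: as the preferred name, with tests: still accepted for backwards compatibility. A sketch of the equivalent block under that assumption (check it against the version you run):

```yaml
# Same assertions as above, using the newer key name (assumes dbt Core 1.8+)
models:
  - name: fct_payments
    columns:
      - name: payment_id
        data_tests:
          - not_null   # fails if any payment_id is null
          - unique     # fails if any payment_id appears more than once
```

Either spelling runs the same two checks; pick one and use it consistently across the project.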
Step 2 — unique on business keys
Duplicates distort revenue and counts. Add unique on the business key, not just the surrogate key.

          - name: order_number
            tests: [unique]

Step 3 — accepted_values on enums and status
Unexpected enum values are how a dashboard quietly starts under-counting. Pin the allowed set.

          - name: status
            tests:
              - accepted_values:
                  values: ['pending', 'paid', 'failed', 'refunded']

Bonus — source freshness
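Catch stale upstream data before it skews a KPI, not after. Freshness is checked by the dbt source freshness command rather than dbt test, so it needs its own step in the pipeline.

    sources:
      - name: stripe_raw
        # freshness is computed per table, normally from a loaded_at_field timestamp column
        freshness:
          warn_after: {count: 3, period: hour}

How do I wire this into CI?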
Tests are only worth anything if they run on every change. Add dbt to your pipeline and store the failures.

    # .github/workflows/dbt.yml
    - name: dbt run + test
      run: |
        dbt deps
        dbt seed --profiles-dir .ci/profiles
        dbt run --profiles-dir .ci/profiles
        dbt test --store-failures --profiles-dir .ci/profiles

Then make a failure loud:

    curl -X POST -H "content-type: application/json" \
      --data '{"text":"dbt tests failed on main"}' "$SLACK_WEBHOOK"

With --store-failures, dbt saves each test's failing rows to tables in a dedicated schema (by default, your target schema suffixed with _dbt_test__audit). You can build a small "data quality" dashboard straight from those tables, because the failing rows are already there.
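The curl call above should only fire when something actually failed. One way to do that in GitHub Actions is a follow-up step guarded by if: failure(). This is a minimal sketch, not the original pipeline: the step name is made up, and it assumes the webhook URL is stored in a repository secret called SLACK_WEBHOOK (a hypothetical name).

```yaml
# .github/workflows/dbt.yml (added after the "dbt run + test" step)
- name: Alert Slack on failure        # hypothetical step name
  if: failure()                       # runs only when an earlier step in this job failed
  env:
    SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}   # assumed secret name
  run: |
    curl -X POST -H "content-type: application/json" \
      --data '{"text":"dbt tests failed on main"}' "$SLACK_WEBHOOK"
```

Passing the secret through env keeps the URL out of the command line that gets echoed into the job log.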
When do you go further?
Once the basics are in CI, the high-value additions are custom business-logic tests (e.g. assert no order has non-positive revenue) and cross-model reconciliation (fact total vs source total within a tolerance). Those catch the subtler errors the schema tests can't.
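As one possible shape for the first kind of test, here is a sketch of the non-positive revenue check. It assumes the dbt_utils package is installed, and the model name fct_orders and column name revenue are hypothetical, not taken from the warehouse described here. The original project may well have used hand-written SQL tests instead.

```yaml
# models/schema.yml (hypothetical model and column names)
models:
  - name: fct_orders
    tests:
      # Reports every row where revenue is zero or negative.
      # Rows with a null revenue are not flagged here; cover those with not_null.
      - dbt_utils.expression_is_true:
          expression: "revenue > 0"
```

The reconciliation check compares two aggregates (fact total vs source total), so it usually ends up as a custom SQL test rather than a one-line YAML entry.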
What it added up to
On that warehouse: −78% "report is wrong" tickets in 30 days, and a measurable bump in how much analysts trusted the data. Two hours of work for a warehouse that ships with tests from day one.
If you want a warehouse built with this baked in from the start, that's a good conversation to have.