Articles

Notes from production

Specifics from real builds: tail latency, data-quality testing, and the governance that lets AI-assisted development into an enterprise. No think-piece filler.

RAG · 8 min

Stop your RAG system hallucinating

Most RAG hallucinations are retrieval failures, not generation failures. Diagnose which, ground answers in cited context, make the model abstain, and track faithfulness.

Read

AI architecture · 6 min

Self-hosting open models vs an API: where the cost actually crosses over

Self-hosting open-weight models beats API pricing at high steady throughput or under data-residency rules; APIs win for spiky, low-volume, or frontier-quality work.

Read

AI architecture · 7 min

Model routing: stop sending every request to your biggest model

Most LLM traffic doesn't need a frontier model. Route by rules, a classifier, or a cascade to cut spend several-fold without silently degrading quality.

Read
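
A minimal sketch of the cascade option, assuming two stand-in model functions and a simple acceptance check. None of these names come from the article; `small_model`, `large_model` and `not_unsure` are hypothetical placeholders for real model calls and a real quality gate. The idea is to try the cheapest tier first and escalate only when its answer fails the check, so quality drops are caught rather than silent.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    name: str
    generate: Callable[[str], str]      # hypothetical model call
    accept: Callable[[str, str], bool]  # hypothetical quality gate: (prompt, answer) -> ok?


def route(prompt: str, tiers: List[Tier]) -> Tuple[str, str]:
    """Try tiers cheapest-first and return (tier name, answer).

    The first answer that passes its tier's gate wins; the last tier's
    answer is always returned, since there is nothing left to escalate to.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    for index, tier in enumerate(tiers):
        answer = tier.generate(prompt)
        is_last = index == len(tiers) - 1
        if is_last or tier.accept(prompt, answer):
            return tier.name, answer
    raise AssertionError("unreachable: the last tier always returns")


# Stand-ins for illustration only: the "small model" gives up on long prompts.
def small_model(prompt: str) -> str:
    return "UNSURE" if len(prompt.split()) > 20 else f"small: {prompt}"


def large_model(prompt: str) -> str:
    return f"large: {prompt}"


def not_unsure(prompt: str, answer: str) -> bool:
    return answer != "UNSURE"


TIERS = [
    Tier("small", small_model, not_unsure),
    Tier("large", large_model, lambda prompt, answer: True),
]
```

In a real system the gate would more likely be a confidence score, a schema check, or a judge model. The structure stays the same: log which tier answered, so the routing split is something you can measure.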

AI architecture · 8 min

LLM-as-a-judge: evaluating LLM systems that actually scale

How to use LLM-as-a-judge to evaluate generative systems at scale: rubrics, golden sets, bias mitigation, human calibration with Cohen's kappa, and CI gates.

Read
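
For the human-calibration step, Cohen's kappa measures how often the judge and a human agree beyond what chance alone would produce. A minimal sketch in plain Python, assuming both raters labeled the same items in the same order. The label values and the helper name `cohens_kappa` are illustrative, not taken from the article.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    rate and p_e is the agreement expected by chance, computed from each
    rater's own label frequencies.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same non-empty set of items")

    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label throughout; kappa is
        # undefined (0/0), and this sketch reports full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


human = ["pass", "pass", "fail", "fail"]
judge = ["pass", "fail", "fail", "fail"]
kappa = cohens_kappa(human, judge)  # observed 0.75, chance 0.5, kappa 0.5
```

Raw agreement here is 75%, but half of that is expected by chance, which is why kappa is the more honest number to gate on.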

Enterprise AI · 8 min

Migrating off tag proliferation to branch/environment CI/CD in GitLab Ultimate

Replace tag-driven releases with a branch/environment promotion model in GitLab Ultimate: protected branches, approval gates, LDAP/AD identity, and build-once-promote pipelines.

Read

Enterprise AI · 7 min

Enterprise Claude Code without leaking your code

Run Claude Code via Amazon Bedrock with VPC endpoints so prompts and code stay in your AWS account: IAM-scoped, no public egress, no training on your data.

Read

Computer vision · 7 min

Cutting vision-LLM cost 70-90% with motion-gating

A cheap OpenCV motion-gating layer in front of a vision-language model cuts 24/7 surveillance API cost 70-90%, as built in my dvr_ai project.

Read

Data Engineering · 2 min

Your first dbt tests

Three low-effort dbt tests catch roughly 80% of warehouse errors. Add not_null, unique and accepted_values on your keys and enums, wire them into CI, and bad data stops reaching dashboards.

Read

Performance · 3 min

Cutting p95 latency without new hardware

A metric-driven playbook that routinely trims 40%+ off tail latency before you reach for a bigger instance — measure, relieve the DB, control concurrency, shed load.

Read

Working on one of these problems?

Book a 15-minute call — it's faster than reading all nine.

Book a call · See services