OvertimeLabs.ai — Articles

https://overtimelabs.ai/articles
Field notes from building production AI: RAG, agentic systems, performance, and enterprise Claude Code.
Language: en-GB

Stop your RAG system hallucinating
https://overtimelabs.ai/articles/stop-rag-hallucinating
Sat, 30 May 2026 00:00:00 GMT
RAG
Most RAG hallucinations are retrieval failures, not generation failures. Diagnose which, ground answers in cited context, make the model abstain, and track faithfulness.

Self-hosting open models vs an API: where the cost actually crosses over
https://overtimelabs.ai/articles/self-host-vs-api-cost
Sat, 30 May 2026 00:00:00 GMT
AI architecture
Self-hosting open-weight models beats API pricing at high steady throughput or under data-residency rules; APIs win for spiky, low-volume, or frontier-quality work.

Model routing: stop sending every request to your biggest model
https://overtimelabs.ai/articles/model-routing-cost
Sat, 30 May 2026 00:00:00 GMT
AI architecture
Most LLM traffic doesn't need a frontier model. Route by rules, a classifier, or a cascade to cut spend several-fold without silently degrading quality.

LLM-as-a-judge: evaluating LLM systems that actually scale
https://overtimelabs.ai/articles/llm-as-a-judge-evaluation
Sat, 30 May 2026 00:00:00 GMT
AI architecture
How to use LLM-as-a-judge to evaluate generative systems at scale: rubrics, golden sets, bias mitigation, human calibration with Cohen's kappa, and CI gates.

Migrating off tag proliferation to branch/environment CI/CD in GitLab Ultimate
https://overtimelabs.ai/articles/gitlab-branch-environment-cicd
Sat, 30 May 2026 00:00:00 GMT
Enterprise AI
Replace tag-driven releases with a branch/environment promotion model in GitLab Ultimate: protected branches, approval gates, LDAP/AD identity, and build-once-promote pipelines.

Enterprise Claude Code without leaking your code
https://overtimelabs.ai/articles/enterprise-claude-code-without-leaking-code
Sat, 30 May 2026 00:00:00 GMT
Enterprise AI
Run Claude Code via Amazon Bedrock with VPC endpoints so prompts and code stay in your AWS account: IAM-scoped, no public egress, no training on your data.

Cutting vision-LLM cost 70-90% with motion-gating
https://overtimelabs.ai/articles/cutting-vision-llm-cost-motion-gating
Sat, 30 May 2026 00:00:00 GMT
Computer vision
A cheap OpenCV motion-gating layer in front of a vision-language model cuts 24/7 surveillance API cost 70-90%, as built in my dvr_ai project.

Your first dbt tests
https://overtimelabs.ai/articles/your-first-dbt-tests
Tue, 29 Jul 2025 00:00:00 GMT
Data Engineering
Three low-effort dbt tests catch roughly 80% of warehouse errors. Add not_null, unique and accepted_values on your keys and enums, wire them into CI, and bad data stops reaching dashboards.

Cutting p95 latency without new hardware
https://overtimelabs.ai/articles/cutting-p95-latency
Tue, 29 Jul 2025 00:00:00 GMT
Performance
A metric-driven playbook that routinely trims 40%+ off tail latency before you reach for a bigger instance: measure, relieve the DB, control concurrency, shed load.