Challenge
Continuously analysing video with a vision-language model (VLM) is accurate, but sending every frame to the API is ruinously expensive.
Approach
An OpenCV motion-gating layer escalates only frames with detected motion to Gemini (primary), with Groq vision as the fallback. A ring buffer holds recent frames for evidence-clip extraction, and alerts go out via webhook.
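The gating idea can be sketched roughly as follows. This is a minimal illustration, not the project's code: frames are plain lists of grayscale pixel values so the example stays dependency-free, whereas an OpenCV pipeline would more likely do the differencing with calls such as `cv2.absdiff`. The names `MotionGate`, `motion_fraction`, and `analyse_with_fallback`, along with both threshold values, are assumptions made for the example. The `primary` and `fallback` arguments are hypothetical callables standing in for the Gemini and Groq clients.

```python
from typing import Any, Callable, Optional, Sequence

# Flattened 8-bit grayscale pixels; a stand-in for an OpenCV image (assumption).
Frame = Sequence[int]


def motion_fraction(prev: Frame, curr: Frame, pixel_delta: int = 25) -> float:
    """Return the fraction of pixels that changed by more than pixel_delta."""
    if len(prev) != len(curr) or not curr:
        raise ValueError("frames must be non-empty and the same size")
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_delta)
    return changed / len(curr)


class MotionGate:
    """Passes a frame only when it differs enough from the previous frame."""

    def __init__(self, min_fraction: float = 0.02) -> None:
        # min_fraction is an illustrative threshold, not a value from the project.
        self.min_fraction = min_fraction
        self._prev: Optional[Frame] = None

    def should_escalate(self, frame: Frame) -> bool:
        prev, self._prev = self._prev, frame
        if prev is None:
            return False  # the first frame has nothing to be compared against
        return motion_fraction(prev, frame) >= self.min_fraction


def analyse_with_fallback(
    frame: Frame,
    primary: Callable[[Frame], Any],
    fallback: Callable[[Frame], Any],
) -> Any:
    """Try the primary analyser; use the fallback if the primary raises."""
    try:
        return primary(frame)
    except Exception:
        return fallback(frame)
```

Only frames for which `should_escalate` returns `True` would be handed to `analyse_with_fallback`, which is where the API saving comes from: static footage never reaches the paid model.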
Results
- 70–90% reduction in VLM API cost via motion-gating
- Evidence-clip extraction + webhook alerts
- Clean modular services (stream, motion, analyse, clip, alert)
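As a rough sketch of how the evidence-clip step could work, the example below keeps recent frames in a fixed-size ring buffer and slices out the seconds leading up to an event. It assumes timestamped frames and uses the standard library's `collections.deque`. `FrameRingBuffer`, `build_alert_payload`, and every payload field name are hypothetical; the project's actual clip format and webhook schema are not described here.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Tuple

TimedFrame = Tuple[float, Any]  # (timestamp in seconds, frame data)


class FrameRingBuffer:
    """Keeps only the most recent `capacity` timestamped frames."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._frames: Deque[TimedFrame] = deque(maxlen=capacity)

    def push(self, timestamp: float, frame: Any) -> None:
        # When the deque is full, appending silently drops the oldest frame.
        self._frames.append((timestamp, frame))

    def clip(self, event_time: float, seconds_before: float) -> List[TimedFrame]:
        """Return buffered frames from the window ending at event_time."""
        start = event_time - seconds_before
        return [(t, f) for t, f in self._frames if start <= t <= event_time]


def build_alert_payload(event_time: float, label: str,
                        clip: List[TimedFrame]) -> Dict[str, Any]:
    """Assemble a JSON-serialisable alert body (field names are illustrative)."""
    return {
        "event_time": event_time,
        "label": label,
        "frame_count": len(clip),
        "clip_start": clip[0][0] if clip else None,
    }
```

The buffer's capacity bounds memory use regardless of how long the stream runs, at the cost of limiting how far back a clip can reach. Sending the payload to a webhook is left out of the sketch, since the endpoint and transport are not specified.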