docs(benchmarks): production v0.2 deploy metrics (4 cam × 3 consumer)

Real-world numbers from the production deploy on 2026-05-19:
- RTSP connections to cameras: 12 → 4 (−67%)
- NVDEC sessions: 8 → 4 (−50%)
- Camera bandwidth: 34 → 16 Mbps (−54%)
- PCIe D2H copies: 346 MB/s → ~0 (−100% via zero-copy CUDA IPC)
- Direct Frigate RTSP connections: 8 → 0 (−100%)

Plus live nvidia-smi metrics, what was preserved vs. what was not saved, and a projection table for other setups (8/16 cam × 2/3/4 consumer).

For promotional material: public-facing claims based on the measured deploy.
2026-05-19 19:07:16 +01:00
parent 98d1bb5296
commit 3779175737
1 changed files with 65 additions and 0 deletions
@@ -117,3 +117,68 @@ cd build && cmake -DBUILD_TESTING=ON ..  && cmake --build . && ctest -R stress -
 For production deployment measurements, see the integration guides:
 - [docs/integration.md](docs/integration.md) — cctv-processor C++ pipeline
 - [filter/README.md](filter/README.md) — FFmpeg demuxer (Frigate setup)
+
+---
+
+## Real-world production deployment (2026-05-19, v0.2.0)
+
+**Setup**: 4 Dahua IP cameras (HEVC main 1920×1080 / 2688×1520, 25 fps) → 3
+concurrent consumers on a single RTX 5090 host:
+- **Frigate** (two consumers): detect (ONNX D-FINE-S, 640×480) + record (full-res H.265 mp4)
+- **cctv-backend** (one consumer): custom C++ mosaic processor (composes a 4×grid → RTSP output for a TV)
+
+### Before → after (measured in production, identical workload)
+
+| Metric | Without cuframes | With cuframes v0.2 dual-input | Reduction |
+|---|---:|---:|---:|
+| **RTSP connections to cameras** | 12 (4 cam × 3 consumer) | **4** (publishers only) | **−67%** |
+| **NVDEC sessions** | ~8 (decode per consumer) | **4** (publishers only) | **−50%** |
+| **Camera-side bandwidth** | ~34 Mbps (main+main+sub per cam) | **~16 Mbps** (main per cam) | **−54%** |
+| **PCIe D2H copies (consumer side)** | ~346 MB/s (decoded frames → host) | **~0** (zero-copy CUDA IPC) | **−100%** |
+| **Frigate ffmpeg with direct RTSP** | 8 (detect+record × 4) | **0** (all via cuframes) | **−100%** |
+
+### Live nvidia-smi metrics on the running system
+
+```
+GPU SM:     4-5%   (compute: detector + cuframes consumers)
+GPU NVDEC:  2-4%   (expected 15-25% without cuframes)
+GPU NVENC:  0-1%
+VRAM:       4 publishers × ~1 GB ring buffers + consumer contexts
+```
+
+### Camera-side benefits
+
+Dahua/Hikvision cameras are typically capped at 4-5 concurrent RTSP streams.
+Before cuframes, the setup (4 cam × 3 RTSP) put each camera at **60-75% of the
+capacity** of its RTSP server. After: **20-25%**, leaving headroom for 2-3 additional
+consumers without replacing hardware.
+
+### What is **preserved** (important)
+
+- **Recording quality**: the record path via `cuframes_packets://` is a **passthrough**
+  (`-c:v copy`), the bit-exact original encoded stream from the camera. Frigate writes mp4
+  at the original's full resolution, with no re-encode.
+- **Latency**: <2 ms publisher → consumer (cuframes IPC) vs ~50-80 ms of RTSP setup
+  latency for each new consumer.
+- **Backward compatibility**: v0.2 publishers accept v1 subscribers
+  (frames-only), which allows a rolling upgrade.
+
+### Hardware-agnostic projection (for other setups)
+
+| If you have | Expected reduction |
+|---|---|
+| 16 cameras × 2 consumers | 32 → 16 NVDEC (−50%), 32 → 16 RTSP (−50%) |
+| 8 cameras × 3 consumers | 24 → 8 NVDEC (−67%), 24 → 8 RTSP (−67%) |
+| 4 cameras × 4 consumers (multi-AI pipeline) | 16 → 4 NVDEC (−75%), 16 → 4 RTSP (−75%) |
+
+The number of streams saved scales **linearly** with N (consumers per camera):
+cameras × (N − 1). The relative reduction is 1 − 1/N. v0.1 (frames
+only) saves NVDEC sessions; v0.2 (frames + packets) **additionally** saves
+RTSP connections for record/mux consumers.
+
+### What is **NOT** saved (honestly)
+
+- **Disk space**: recordings remain full-resolution H.265 mp4. Cuframes does not compress.
+- **Detector inference latency**: the ONNX/TensorRT detector runs on decoded
+  frames regardless of source. Cuframes only changes where the decode happens.
+- **Camera RTSP server CPU**: the camera still encodes the video itself. Cuframes
+  reduces **consumer-side** load, not producer-side.
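
The arithmetic behind the projection table and the camera-capacity figures in this diff can be sketched in a few lines of Python. This is a minimal illustration and not part of cuframes: `project`, `Projection` and `rtsp_utilization_pct` are hypothetical names, and the model assumes that without sharing every consumer opens its own stream (or decode session) per camera, while with sharing there is exactly one per camera.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Projection:
    before: int         # streams/sessions when every consumer connects on its own
    after: int          # one per camera once a single publisher fans out
    reduction_pct: int  # relative reduction, rounded to whole percent


def project(cameras: int, consumers_per_camera: int) -> Projection:
    """Assumed model: before = cameras * N, after = cameras."""
    before = cameras * consumers_per_camera
    after = cameras
    reduction_pct = round(100 * (before - after) / before)
    return Projection(before, after, reduction_pct)


def rtsp_utilization_pct(streams_per_camera: int, camera_stream_cap: int) -> int:
    """Share of a camera's concurrent-RTSP limit that is in use, in percent."""
    return round(100 * streams_per_camera / camera_stream_cap)


if __name__ == "__main__":
    # Rows of the projection table, plus the measured 4 x 3 setup.
    for cams, n in [(16, 2), (8, 3), (4, 4), (4, 3)]:
        p = project(cams, n)
        print(f"{cams} cam x {n} consumers: {p.before} -> {p.after} (-{p.reduction_pct}%)")
    # Camera RTSP server load for a cap of 4 or 5 concurrent streams.
    for cap in (4, 5):
        print(f"cap {cap}: {rtsp_utilization_pct(3, cap)}% -> {rtsp_utilization_pct(1, cap)}%")
```

Under these assumptions the script reproduces the table rows (for example 24 → 8 for 8 cameras × 3 consumers) and the 60-75% → 20-25% range for a cap of 4-5 streams. The measured NVDEC row (~8 → 4) is lower than the 12 → 4 this model gives for 4 × 3, so the model is best read as an upper bound for resources that not every consumer uses.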