Files
vf-cuda-grid/controller/cuda_grid_controller
gx d29f3f96e5 pipeline_monitor: + stall watchdog (mediamtx bytes-based detect)
Resilience improvement — раньше pipeline mог hung без exit (NVENC stuck,
output broken pipe), Docker restart policy не triggered. Никакой alert.

Now: poll mediamtx /v3/rtspsessions/list каждые N sec, track publish session
inboundBytes. Не растёт 3 polls (~9 sec) → emit MQTT 'pipeline_stalled' event
(через dispatcher.on_event = mqtt.publish_event). User / Home Assistant
automation решает что делать (restart container, notify).

Wired:
  pipeline_monitor.on_event = mqtt.publish_event  # __main__.py

Bytes started growing again → emit 'pipeline_unstalled'.

Alert single-shot: пока stalled flag set, no dup alerts. Reset когда
bytes counter растёт.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 10:03:41 +01:00
..