gx e76360dbc4 docs: user / developer / operations guides (RU+EN)
Complete documentation set for Phase 11b:

  docs/ru/user.md        - for the installation admin (motion-mode, PTZ,
                            templates.json, mqtt_overlays.json, ZMQ verbs)
  docs/ru/developer.md   - architecture (Cell / Layout / Decoration),
                            how to add a new Cell/Decoration, ABI shim,
                            algorithms (best-fit + asymmetric hysteresis)
  docs/ru/operations.md  - build (host + jammy + incremental bake),
                            deploy, logs/telemetry, troubleshooting
                            (broken pipe, MQTT-overlay, motion-mode)
  docs/en/*.md           - English version of all three
  README.md              - rewritten with an overview + links to docs/

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 22:02:47 +01:00


# cfc-grid — operations / deploy / troubleshooting
> Audience: anyone who builds, deploys, or monitors cfc-grid in production.
>
> If you're a user, see [user.md](user.md). For developer documentation,
> see [developer.md](developer.md).
## 1. Production setup (R9-88.23)
### 1.1 Stack
```
docker compose -f docker-compose.yml \
  -f cuda-grid/docker-compose.override.yml \
  -f cuframes-composer/docker-compose.override.yml \
  -f onvif/docker-compose.override.yml \
  up -d
```
The compose files live in `localhost-infra/hosts/R9-88.23/docker/cctv/`.
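
Later sections shorten this command to `docker compose ...`. As a rough sketch, and assuming the `...` stands for the same `-f` chain as in §1.1 (the document does not say so explicitly), the argument list could be assembled once and reused. `compose_argv` and `COMPOSE_FILES` are hypothetical names, not part of the repo:

```python
import shlex
from typing import List

# Compose files from §1.1, relative to the cctv/ compose directory.
COMPOSE_FILES = [
    "docker-compose.yml",
    "cuda-grid/docker-compose.override.yml",
    "cuframes-composer/docker-compose.override.yml",
    "onvif/docker-compose.override.yml",
]


def compose_argv(*action: str) -> List[str]:
    """Build the argv for `docker compose -f ... <action>` (hypothetical helper)."""
    argv = ["docker", "compose"]
    for path in COMPOSE_FILES:
        argv += ["-f", path]
    return argv + list(action)


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(compose_argv("up", "-d", "cfc-grid")))
```

The resulting list can be passed to `subprocess.run(..., cwd=<cctv directory>)` once the printed command looks right.
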
| Service | Image | Purpose |
|---|---|---|
| `cuframes-ipc-anchor` | `gx/cuframes:0.4` | Shared VMM IPC anchor for cuframes |
| `cuframes-pub-*` (parking/back_yard/front_yard/gate_lpr) | `gx/cuframes:0.4` | RTSP → cuframes per-camera publishers |
| `cuda-grid-mediamtx` | `bluenviron/mediamtx` | RTSP/HLS/WebRTC gateway |
| `cctv-mosquitto` | `eclipse-mosquitto` | MQTT broker (+bridge to 192.168.88.4) |
| **`cfc-grid`** | `gx/cuframes-composer:0.11b-step1` | Composer (main service) |
| `cfc-grid-ffmpeg` | `ffmpeg-vf-cuda-grid:phase4b-final` | H.264 pipe → RTSP push |
| `cfc-grid-watchdog` | `gx/cuda-grid-watchdog:0.4` | Restart cfc-grid on stuck inboundBytes |
| `cctv-onvif` | `gx/cctv-onvif:0.6` | ONVIF discovery + PTZ → ZMQ |
| `cctv-frigate` | `ghcr.io/blakeblackshear/frigate` | Object detection → MQTT events |
### 1.2 Frame flow
```
cuframes-pub-X ──VMM──┐
cuframes-pub-Y ──VMM──┼──→ cfc-grid (composer)
cuframes-pub-Z ──VMM──┘        │
                               │ H.264 NVENC
                               ↓ named pipe /tmp/cfc-pipe-dir/grid.h264
                           cfc-grid-ffmpeg (re-mux)
                               │ RTSP push
                           cuda-grid-mediamtx
                               rtsp://*/cfc-grid (TCP/UDP)
                               http://*:8888/cfc-grid (HLS)
                               http://*:8889/cfc-grid (WebRTC)
```
### 1.3 Networks
- Internal docker network: `cctv`
- External ports on R9-88.23:
  - `554/tcp` - RTSP (mediamtx)
  - `8888/tcp` - HLS (mediamtx)
  - `8889/tcp` - WebRTC (mediamtx)
  - `5599/tcp` - ZMQ composer control plane
  - `8085/tcp` - ONVIF SOAP (cctv-onvif)
  - `3702/udp` - WS-Discovery multicast (cctv-onvif)
## 2. Build
### 2.1 Local host build (Ubuntu 24.04, dev machine)
```bash
cd /home/claude/projects/cuframes-composer
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)
```
Artifacts land in `build/src/libcuframes_composer.so` and `build/examples/grid_record`.
**IMPORTANT:** the host binary (Ubuntu 24.04, glibc 2.39, libavformat60) is
**incompatible** with the runtime container (Ubuntu 22.04 jammy, glibc 2.35,
libavformat58), so use the jammy build from §2.2 for anything that goes into
the production image. See memory `incremental-ffmpeg-rebuild`.
### 2.2 Jammy build (for production image)
Uses the cached builder image `cuframes-composer-builder:cached`
(nvidia/cuda:12.4.1-devel on Ubuntu 22.04, plus apt dependencies):
```bash
cd /home/claude/projects/cuframes-composer
# If the builder image is not cached yet, create it once:
docker image inspect cuframes-composer-builder:cached >/dev/null 2>&1 || {
  docker run -d --name cfc-builder-tmp \
    nvidia/cuda:12.4.1-devel-ubuntu22.04 sleep 3600
  docker exec cfc-builder-tmp bash -c '
    apt-get update -qq && apt-get install -y -qq --no-install-recommends \
      build-essential cmake git pkg-config \
      libpng-dev libfreetype-dev \
      libzmq3-dev libjson-c-dev libmosquitto-dev \
      libavformat-dev libavcodec-dev libavutil-dev'
  docker commit cfc-builder-tmp cuframes-composer-builder:cached
  docker rm -f cfc-builder-tmp
}
# Actual build:
docker run --rm --gpus all -v "$PWD":/src -w /src/build-jammy \
  cuframes-composer-builder:cached \
  bash -c 'cmake -DCMAKE_BUILD_TYPE=Release .. && make -j$(nproc)'
```
Artifacts in `build-jammy/`.
### 2.3 Bake image (incremental — without `docker build`)
We don't use `docker build` here (a cache miss triggers a 4 GB CUDA pull). Instead:
```bash
# Drop any previous image with the target tag (errors are ignored if it is absent).
docker rmi -f gx/cuframes-composer:0.11b-step1 2>/dev/null
# Create a stopped container from the base image to copy files into.
CID=$(docker create gx/cuframes-composer:0.10)
# Overlay the fresh jammy-built binary, library, and configs.
docker cp build-jammy/examples/grid_record "$CID":/usr/local/bin/grid_record
docker cp build-jammy/src/libcuframes_composer.so.0.1.0 \
  "$CID":/usr/lib/x86_64-linux-gnu/libcuframes_composer.so.0
docker cp docker/templates.json "$CID":/opt/templates.json
docker cp docker/mqtt_overlays.json "$CID":/opt/mqtt_overlays.json
# Commit the container as the new image, setting entrypoint, default args, and env.
docker commit \
  --change 'ENTRYPOINT ["/usr/local/bin/grid_record"]' \
  --change 'CMD ["--help"]' \
  --change 'ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility,video' \
  "$CID" gx/cuframes-composer:0.11b-step1
docker rm "$CID"
```
This uses `gx/cuframes-composer:0.10` as the base (runtime dependencies are
already installed) and overlays the fresh artifacts on top. It is faster than
a full build and needs no network traffic.
### 2.4 Build ONVIF image
```bash
cd hosts/R9-88.23/docker/cctv/onvif
docker build -t gx/cctv-onvif:0.6 -f Dockerfile .
```
This is a lightweight Python image. If you change `server.py`, rebuild the
image with a bumped tag and update `docker-compose.override.yml` to match.
## 3. Deploy
### 3.1 Production (R9-88.23)
```bash
cd /home/claude/projects/localhost-infra/hosts/R9-88.23/docker/cctv
docker compose -f docker-compose.yml \
  -f cuda-grid/docker-compose.override.yml \
  -f cuframes-composer/docker-compose.override.yml \
  -f onvif/docker-compose.override.yml \
  up -d cfc-grid
```
Compose recreates the container automatically if the image tag changed in
`docker-compose.override.yml`.
### 3.2 Verify post-deploy
```bash
docker logs --tail 30 cfc-grid 2>&1 | grep -iE "loaded|template|pool|motion"
# Expect something like:
# [cfc/loader] /opt/templates.json: loaded 9 templates
# [cfc/composer] templates loaded: 9 (path='/opt/templates.json')
# [cfc/composer] pool+ 'cam-parking' (frigate=parking_overview prio=100)
# [cfc/composer] motion_mode=1 ttl=45000ms pool=4
# [cfc/composer] grow → template='tpl_3' active=3
```
### 3.3 Rollback
```bash
# The sed path is relative to the localhost-infra repo root.
sed -i 's|gx/cuframes-composer:0.11b-step1|gx/cuframes-composer:0.10|' \
  hosts/R9-88.23/docker/cctv/cuframes-composer/docker-compose.override.yml
docker compose ... up -d cfc-grid
```
## 4. Logs
### 4.1 Live tail
```bash
docker logs -f --tail 50 cfc-grid
docker logs -f --tail 30 cfc-grid-ffmpeg
docker logs -f --tail 30 cuda-grid-mediamtx
docker logs -f --tail 30 cctv-onvif
docker logs -f --tail 30 cctv-frigate
```
### 4.2 Telemetry patterns
| Marker | Meaning |
|---|---|
| `[grid_record] N kadrov, M IDR, X MB ...` | Composer is encoding normally; printed roughly every 50 frames (`kadrov` = frames) |
| `[cfc/composer] grow → template='X'` | New template applied immediately (layout grows) |
| `[cfc/composer] shrink → template='X'` | New template applied after hysteresis (layout shrinks) |
| `[cfc/composer] manual override 'X' до +60000ms` | Manual override from PTZ via ONVIF, held for 60 s (`до` = until) |
| `[cfc/composer] manual override expired, возврат в motion-mode` | Automatic return to motion-mode after 60 s (`возврат в` = return to) |
| `[cfc/mqtt-overlay/<id>] '<text>'` | MQTT overlay received and rendered new text |
| `[cfc/frigate] connected, subscribe 'frigate/events'` | Frigate subscriber connected |
### 4.3 When something breaks
| Symptom | Where to look |
|---|---|
| `src active=0 stale=0 dead=4` | `cuframes-pub-*` containers; check `docker ps` and network access to the cameras |
| `overlay 0 draw failed` | `cfc_overlay_text_rebuild_atlas`; usually an invalid font or text |
| RTSP stream not delivering | `cfc-grid-ffmpeg` logs; see §6.1 |
| TV / ONVIF client can't find the camera | `cctv-onvif` logs; check that WS-Discovery multicast works on the LAN |
## 5. Monitoring
### 5.1 MQTT health
`cfc-grid` publishes health to `cuda_grid/health/composer/cfc-grid`
every ~10 seconds:
```json
{
  "uptime_s": 3600,
  "frames_encoded": 90000,
  "fps_actual": 25.0,
  "bitrate_kbps": 6000,
  "src_active": 4,
  "src_stale": 0,
  "src_dead": 0,
  "idr_count": 1
}
```
```bash
PW=$(grep '^COMPOSER_MQTT_PASSWORD=' \
  /home/claude/projects/localhost-infra/hosts/R9-88.23/docker/cctv/.env | cut -d= -f2)
mosquitto_sub -h 192.168.88.23 -u composer -P "$PW" \
  -t 'cuda_grid/health/composer/cfc-grid' -v
```
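For alerting, a single health message can be turned into a list of problems. This is only an illustrative sketch: the field names come from the payload above, but the function name `health_problems` and the `min_fps` threshold are assumptions, not values taken from the project:

```python
import json
from typing import List

# Example payload copied from the health message shown above.
SAMPLE_HEALTH = json.dumps({
    "uptime_s": 3600, "frames_encoded": 90000, "fps_actual": 25.0,
    "bitrate_kbps": 6000, "src_active": 4, "src_stale": 0,
    "src_dead": 0, "idr_count": 1,
})


def health_problems(payload: str, min_fps: float = 20.0) -> List[str]:
    """Return human-readable problems found in one health message.

    min_fps is an illustrative threshold chosen for this sketch.
    """
    data = json.loads(payload)
    problems: List[str] = []
    if data.get("src_dead", 0) > 0:
        problems.append(f"{data['src_dead']} source(s) dead")
    if data.get("src_stale", 0) > 0:
        problems.append(f"{data['src_stale']} source(s) stale")
    if data.get("src_active", 0) == 0:
        problems.append("no active sources")
    if data.get("fps_actual", 0.0) < min_fps:
        problems.append(f"fps_actual {data.get('fps_actual')} below {min_fps}")
    return problems


if __name__ == "__main__":
    # An empty list means nothing looked wrong in this message.
    print(health_problems(SAMPLE_HEALTH))
```

Each line printed by `mosquitto_sub -v` starts with the topic, so strip everything up to the first space before passing the JSON part to the function.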
### 5.2 Watchdog
`cfc-grid-watchdog` is a separate service that monitors the mediamtx
`inboundBytes` counter for the `cfc-grid` path. After **30 seconds of silence**
it runs `docker restart cfc-grid`.
Watchdog logs:
```bash
docker logs --tail 30 cfc-grid-watchdog
```
When it triggers, it publishes to `cuda_grid/health/watchdog/cfc-grid`.
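
The watchdog's source is not shown in this document, so the following is only a sketch of the rule described above: treat the path as stalled once the byte counter has not changed for 30 seconds. The class name `StallDetector` is hypothetical; polling mediamtx and running `docker restart` are deliberately left out:

```python
from typing import Optional


class StallDetector:
    """Flags a stall when a byte counter has not changed for `timeout_s` seconds."""

    def __init__(self, timeout_s: float = 30.0) -> None:
        self.timeout_s = timeout_s
        self._last_bytes: Optional[int] = None
        self._last_change: Optional[float] = None

    def update(self, now: float, inbound_bytes: int) -> bool:
        """Record one sample; return True when a restart should be triggered."""
        if self._last_bytes is None or inbound_bytes != self._last_bytes:
            # First sample, or the counter moved (including a reset to a
            # smaller value after the publisher reconnects): restart the clock.
            self._last_bytes = inbound_bytes
            self._last_change = now
            return False
        return (now - self._last_change) >= self.timeout_s


if __name__ == "__main__":
    det = StallDetector(timeout_s=30)
    # (timestamp in seconds, inboundBytes) samples; the counter freezes at 200.
    for t, b in [(0, 100), (10, 200), (25, 200), (40, 200)]:
        print(t, b, det.update(t, b))
```

Passing the timestamp in explicitly keeps the logic testable without sleeping; a real loop would call `update(time.monotonic(), counter)` on every poll.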
## 6. Troubleshooting
### 6.1 RTSP not delivering / `cfc-grid-ffmpeg` "Broken pipe"
**Symptom:** `docker logs cfc-grid-ffmpeg` shows
`[out#0/rtsp] Task finished with error code: -32 (Broken pipe)`.
**Cause:** the composer runs with `--intra-refresh` (no IDR bursts), and
mediamtx tears down the RTSP publisher when it can't deliver a start frame to
a new client.
**Fix:**
- Restart the full pipeline:
  ```bash
  docker compose ... restart cfc-grid-ffmpeg cfc-grid cuda-grid-mediamtx
  ```
- If it keeps recurring, disable `--intra-refresh` in the compose override.
  The cost is IDR bursts in the bitrate, but the stream is more stable for
  downstream clients that disconnect and reconnect frequently.
### 6.2 ffmpeg doesn't receive frames from RTSP
**Symptom:** `ffmpeg -i rtsp://192.168.88.23:554/cfc-grid -frames:v 1 out.jpg`
hangs for 30+ seconds.
**Cause:** the composer writes H.264 without regular IDR frames
(intra-refresh). A new RTSP client waits for a keyframe before it can start
decoding, and ffmpeg in its default configuration doesn't wait long enough.
**Workaround:**
```bash
ffmpeg -rtsp_transport tcp \
  -analyzeduration 10000000 -probesize 10000000 \
  -i rtsp://192.168.88.23:554/cfc-grid \
  -frames:v 1 -y out.jpg
```
Or use HLS:
```bash
ffmpeg -i http://192.168.88.23:8888/cfc-grid/index.m3u8 \
  -frames:v 1 -y out.jpg
```
### 6.3 MQTT-overlay not updating
**Checklist:**
1. Is the bridge to the HA broker (192.168.88.4) working?
   ```bash
   docker logs cctv-mosquitto 2>&1 | grep -i 'bridge'
   ```
   Look for `Connecting bridge ha-bridge` followed by a connection confirmation.
2. Is the required topic in the bridge config?
   ```bash
   docker exec cctv-mosquitto grep 'topic.*in 0' /mosquitto/config/mosquitto.conf
   ```
   If the prefix is new, add `topic XXX/# in 0` and restart mosquitto.
3. Is the subscriber connected?
   ```bash
   docker logs cfc-grid 2>&1 | grep 'mqtt-overlay/<id>.*connected'
   ```
4. Test publish:
   ```bash
   mosquitto_pub -h 192.168.88.4 -t '<your topic>' -m 'test' -r
   ```
   The composer logs should then show `[cfc/mqtt-overlay/<id>] 'test'`.
### 6.4 Motion-mode not switching layouts
**Checklist:**
1. Is Frigate sending events? (`$PW` as set in §5.1.)
   ```bash
   mosquitto_sub -h 192.168.88.23 -u composer -P "$PW" \
     -t 'frigate/events' -C 3
   ```
2. Is the composer receiving events?
   ```bash
   docker logs cfc-grid 2>&1 | grep 'frigate.*started\|grow\|shrink'
   ```
3. Does the camera name match?
   `frigate=<name>` in `--source` must match `event.after.camera`.
4. Is the zone filter blocking the event?
   If `--source` has `zones=A:B:C`, check `current_zones` in the Frigate event.
   If it is empty or doesn't intersect with the configured zones, the pulse is discarded.
5. Has the TTL expired?
   The logs show `motion_ttl=45000` (45 s); if events arrive less often than
   that, the camera drops out of the active set.
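
Steps 3 and 4 can be reproduced offline against a captured event. The sketch below is not the composer's code; it only restates the two rules from the checklist. The function name `pulse_accepted` is hypothetical, and it assumes the zone list sits at `after.current_zones` in the decoded `frigate/events` payload:

```python
from typing import Any, Mapping


def pulse_accepted(frigate_name: str, zones: str, event: Mapping[str, Any]) -> bool:
    """Decide whether a Frigate event would count as a motion pulse.

    frigate_name: the `frigate=<name>` value from `--source`.
    zones: the `zones=A:B:C` value from `--source`, or "" when no filter is set.
    event: decoded `frigate/events` payload (assumed layout: event["after"]).
    """
    after = event.get("after") or {}
    if after.get("camera") != frigate_name:
        return False  # step 3: camera name mismatch
    if not zones:
        return True  # no zone filter configured
    wanted = set(zones.split(":"))
    current = set(after.get("current_zones") or [])
    # Step 4: an empty or disjoint zone list means the pulse is discarded.
    return bool(wanted & current)


if __name__ == "__main__":
    # Hypothetical event; the zone name "driveway" is made up for illustration.
    ev = {"after": {"camera": "parking_overview", "current_zones": ["driveway"]}}
    print(pulse_accepted("parking_overview", "driveway:gate", ev))
    print(pulse_accepted("parking_overview", "gate", ev))
```

Running a captured payload from step 1 through this function shows quickly whether the name or the zone filter is what drops the event.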
### 6.5 ONVIF PTZ presets empty in TV
**Cause:** the TV cached an old `GetPresets` response (Phase 9 names).
**Fix:** delete the camera in the TV client and add it again.
### 6.6 Templates loaded but motion-mode doesn't use the new ones
The composer reads the global registry `cfc::current_templates()` on every
frame, so changes made via `cfc_layout_load_file` (ZMQ or CLI) should be
picked up immediately. If they are not, check:
```bash
# Requires pyzmq on the machine running the check.
python3 -c "
import zmq, json
s = zmq.Context().socket(zmq.REQ)
s.connect('tcp://192.168.88.23:5599')
s.send_json({'cmd': 'list_layouts'})
print(json.dumps(s.recv_json(), indent=2, ensure_ascii=False))"
The `source` field shows the currently loaded path. If it reports the built-in
set (only `tpl_1` + `tpl_4`), the JSON didn't load (syntax error or wrong path).
## 7. Configs in repo
| What | Where |
|---|---|
| templates.json | `cuframes-composer/docker/templates.json` |
| mqtt_overlays.json | `cuframes-composer/docker/mqtt_overlays.json` |
| compose override | `localhost-infra/hosts/R9-88.23/docker/cctv/cuframes-composer/docker-compose.override.yml` |
| ONVIF config | `localhost-infra/.../onvif/onvif.yaml` |
| ONVIF server | `localhost-infra/.../onvif/server.py` |
| Mosquitto config | `localhost-infra/.../cctv/mosquitto/config/mosquitto.conf` |
| .env (passwords) | `localhost-infra/.../cctv/.env` (gitignored) |
After changing the compose override, `docker compose ... up -d cfc-grid`
recreates the container automatically.
## 8. Known limitations / TODO
- **`--intra-refresh` ↔ RTSP clients**: trade-off between bitrate and latency
  (see §6.1)
- **Watchdog covers only cfc-grid**: a zombie `cfc-grid-ffmpeg` is not
  detected directly; only a full restart helps
- **Hot-reload of mqtt_overlays.json**: there is no ZMQ verb for it
- **Per-overlay MQTT broker config**: everything goes through a single broker;
  using a foreign broker requires extending `MqttBrokerCfg` per item
## 9. See also
- [user.md](user.md) — composer configuration
- [developer.md](developer.md) — internals, adding modules
- `memory/host-and-project.md` — general R9-88.23 infra
- `memory/project_cfc-grid-deployed.md` — first prod deploy
- `memory/project_cfc-grid-cpp-refactor.md` — Phase 11b refactor
- `memory/incremental-ffmpeg-rebuild.md` — incremental docker recipe