vf-cuda-grid/docs/design.md


# Design: `vf-cuda-grid` — GPU-native video grid composer with control plane sidecar

**Repo (recommended name):** `gx/vf-cuda-grid`

Alternatives considered:
- `gx/vf_cuda_grid`: matches the FFmpeg filter naming, but a hyphen in the repo name is friendlier for URLs and the CLI. **Reject.**
- `gx/cuda-grid`: shorter, but hides that this is an FFmpeg filter (not a standalone tool). **Reject.**
- `gx/ffmpeg-cuda-grid`: an accurate description, but the `ffmpeg-` prefix suggests a fork of all of FFmpeg, whereas this is a filter patch. **Reject.**
- **`gx/vf-cuda-grid` ✅**: the `vf-` prefix follows the FFmpeg video-filter convention (as in `vf_scale_cuda`, `vf_overlay_cuda`), so the purpose is clear at a glance; the hyphen is repo-friendly.

The repo contains **all three components** (filter source + sidecar + docs/examples); it is a monorepo product. Splitting it into 2-3 repos is premature: the components are tightly coupled by protocol, and Phases 1-3 are easier to review together. Once the controller becomes multi-product (see §14, overlap with `gx/cctv#24`), it can be extracted into `gx/grid-controller` (sub-stable surface).

---

## 1. High-level architecture

```
┌──────────────────────────────────────────────────────────────────────────────┐
│                              HOST PROCESS (FFmpeg)                            │
│                                                                               │
│  cuframes://cam1 ─┐                                                            │
│  cuframes://cam2 ─┼─►┌──────────────────┐                                      │
│  cuframes://cam3 ─┤  │   vf_cuda_grid   │  ───►  scale_cuda? ──►  h264_nvenc  │
│  cuframes://cam4 ─┘  │   (instance #1)  │                            │        │
│                   ├─►│   target: TV-1   │                            ▼        │
│                   │  └────────▲─────────┘                       RTSP/SRT      │
│                   │           │ side data (overlays)                          │
│                   ├─►┌────────┴─────────┐                                      │
│                   │  │   vf_cuda_grid   │  ───►  h264_nvenc  ──►  RTSP        │
│                   │  │   (instance #2)  │                                      │
│                   │  │   target: TV-2   │                                      │
│                   │  └────────▲─────────┘                                      │
│                   │           │                                                │
│                   ├─►┌────────┴─────────┐                                      │
│                   │  │   vf_cuda_grid   │  ───►  h264_nvenc  ──►  WebRTC      │
│                   │  │   (instance #3)  │  (privacy-public)                    │
│                   │  └────────▲─────────┘                                      │
│                   │           │                                                │
│                   │  ┌────────┴────────┐                                       │
│                   │  │   zmq filter    │◄─── tcp://127.0.0.1:5555 (commands)  │
│                   │  └─────────────────┘                                       │
│                   │                                                            │
│   audio ──────────┴─►  amix / sidechaincompress (standard FFmpeg filters)     │
└────────────────────────────────────────────────────────────────────▲──────────┘
                                                                     │
                                                          process_command via zmq
                                                                     │
┌────────────────────────────────────────────────────────────────────┴──────────┐
│                       cuda-grid-controller (sidecar)                           │
│                                                                               │
│  ┌────────────────────┐   ┌────────────────────┐   ┌────────────────────┐    │
│  │ HTTP/REST + SSE    │   │ MQTT (paho)        │   │ ZeroMQ pub/sub     │    │
│  │ (FastAPI/aiohttp)  │   │ + HA Discovery     │   │ events outbound    │    │
│  └─────────┬──────────┘   └─────────┬──────────┘   └─────────▲──────────┘    │
│            └──────────────┬─────────┘                         │               │
│                           ▼                                   │               │
│                  ┌────────────────────┐                       │               │
│                  │  Command Router    │──────────────────────►│               │
│                  │  (idempotency,     │   serialised events   │               │
│                  │   conflict-resol)  │                       │               │
│                  └─────────┬──────────┘                       │               │
│                            ▼                                  │               │
│  ┌────────────────────┐  ┌─────────────────────┐  ┌──────────┴────────────┐  │
│  │ State Store        │  │ Layout Registry     │  │ Event Bus (internal)  │  │
│  │ (in-mem +          │  │ (global, versioned) │  │ (asyncio.Queue / Go   │  │
│  │  JSON snapshot)    │  │                     │  │  channel)             │  │
│  └────────────────────┘  └─────────────────────┘  └───────────────────────┘  │
│                            ▼                                                  │
│                  ┌────────────────────┐                                       │
│                  │ FFmpeg ZMQ Client  │──► tcp://127.0.0.1:5555 (commands)   │
│                  │  + Side-Data Pusher│──► AVFrame side-data injection       │
│                  └────────────────────┘    (via named pipe / shared mem)     │
└───────────────────────────────────────────────────────────────────────────────┘
```

**State ownership:**

| What | Owner | Persistence | Why |
|---|---|---|---|
| Active layout per instance | filter (in-process) | none | hot path, needs low latency |
| Layout registry (definitions) | controller | JSON snapshot + versioned in-mem | survives an FFmpeg restart, can be shared between filter instances |
| Camera roster per instance | controller | JSON | modified at runtime, pushed to the filter via command |
| Overlay state (live elements) | controller | in-mem | ephemeral, rebuilt from events |
| Audio routing state | controller | JSON | complex state machine, must be auditable |
| Statistics (fps, switches, errors) | controller (aggregated) | in-mem + Prometheus | observability |

**Who owns what:** the controller is the single source of truth for **declared intent** (what should be). The filter owns the **executing state** (what is running now). On divergence the controller reconciles; the filter does not.
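The reconcile direction can be illustrated with a minimal sketch. It assumes both sides are represented as plain dicts with hypothetical keys `layout` and `cells`; the function name `reconcile` is also an illustration, not part of the design.

```python
def reconcile(declared: dict, executing: dict) -> list:
    """Return (command, arg) pairs that bring the filter in line with declared intent.

    Hypothetical shapes: {"layout": "quad", "cells": {cell_idx: input_idx}}.
    """
    commands = []
    want = declared.get("layout")
    if want is not None and want != executing.get("layout"):
        commands.append(("set_layout", f"name={want}"))
    running_cells = executing.get("cells", {})
    for cell, inp in sorted(declared.get("cells", {}).items()):
        if running_cells.get(cell) != inp:
            commands.append(("bind_cell", f"cell={cell};camera={inp}"))
    return commands


if __name__ == "__main__":
    print(reconcile({"layout": "quad", "cells": {0: 2}},
                    {"layout": "single", "cells": {0: 0}}))
```

The filter never runs this logic; it only executes the resulting commands.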

**Repo structure:**

```
gx/vf-cuda-grid/
├── README.md
├── ROADMAP.md
├── LICENSE                    # LGPL (for the filter, inherited from FFmpeg) +
│                              # MIT (for the controller, separate LICENSE-controller)
├── CHANGELOG.md
├── filter/                    # vf_cuda_grid.c + CUDA kernels + FFmpeg patch
│   ├── vf_cuda_grid.c
│   ├── vf_cuda_grid_kernels.cu
│   ├── vf_cuda_grid_overlays.cu
│   ├── ffmpeg-7.1-vf_cuda_grid.patch
│   └── README.md              # how to apply the patch to ffmpeg-patched
├── controller/                # cuda-grid-controller
│   ├── pyproject.toml         # (or go.mod if Go, see §10)
│   ├── src/cuda_grid_controller/
│   │   ├── api/               # HTTP, MQTT, ZMQ entrypoints
│   │   ├── domain/            # layouts, overlays, state — pure
│   │   ├── adapters/          # ffmpeg-zmq, mqtt, ha-discovery
│   │   └── orchestration/     # audio rules, event handlers
│   └── tests/
├── schema/                    # JSON-schema for layouts, overlays, events
│   ├── layout.schema.json
│   ├── overlay.schema.json
│   └── event.schema.json
├── examples/
│   ├── frigate-bridge/        # Frigate MQTT events → grid commands
│   ├── cctv-processor-migrate/
│   └── home-assistant/
├── docs/
│   ├── architecture.md        # this document, refined
│   ├── filter-api.md
│   ├── controller-api.md
│   ├── layout-dsl.md
│   ├── overlay-protocol.md
│   └── audio-orchestration.md
└── deploy/
    ├── docker/
    └── systemd/
```

**Main architectural principle:** *the filter knows only about compositing pixels in the current frame; everything about "why this layout, whose events, which rules" lives in the controller*. This lets the filter stay small, fast, and upstreamable, while all business logic iterates in Python/Go without rebuilding FFmpeg.

---

## 2. Component design

### 2.1 `vf_cuda_grid` filter API

**Filter declaration:**

```c
const AVFilter ff_vf_cuda_grid = {
    .name          = "cuda_grid",
    .description   = NULL_IF_CONFIG_SMALL("GPU-native multi-input grid composer"),
    .priv_class    = &cuda_grid_class,
    .priv_size     = sizeof(CudaGridContext),
    // input pads are created at init time from the `inputs` option
    .flags         = AVFILTER_FLAG_DYNAMIC_INPUTS | AVFILTER_FLAG_HWDEVICE,
    .inputs        = NULL,                           // dynamic, see flags
    FILTER_OUTPUTS(cuda_grid_outputs),               // also sets nb_outputs
    FILTER_QUERY_FUNC(query_formats),                // AV_PIX_FMT_CUDA only
    .process_command = process_command,              // runtime control
};
```

**Filter options (CLI):**

```
cuda_grid=
  inputs=4:                          # number of inputs (as in hstack/xstack)
  layout=quad:                       # initial layout name (from the registry or inline)
  output_size=1920x1080:
  layout_file=/etc/grids.json:       # static layout registry
  instance_id=tv-living-room:        # unique ID of this instance in the graph
  zmq_addr=tcp://127.0.0.1:5555:     # address whose process_command messages to listen to
  privacy_profile=public:            # named profile that filters overlays
  fps=25                             # output fps (resampled if inputs differ)
```

**Multi-instance in one filter_complex:** each `cuda_grid` has an `instance_id`, and the controller addresses commands by it. Inputs are **shared** through FFmpeg's native `split` filter. `split` takes a single input, so each camera gets its own `split=3`:

```
[cam1]split=3[a1][a2][a3];[cam2]split=3[b1][b2][b3];
[cam3]split=3[c1][c2][c3];[cam4]split=3[d1][d2][d3];
[a1][b1][c1][d1]cuda_grid=inputs=4:instance_id=tv1:layout=quad[out1];
[a2][b2][c2][d2]cuda_grid=inputs=4:instance_id=tv2:layout=nine_grid[out2];
[a3][b3][c3][d3]cuda_grid=inputs=4:instance_id=public:privacy_profile=public:layout=quad[out3]
```

`split` already works with CUDA frames (by ref-count). The 4 cameras are decoded once and ref-shared across 3 instances.
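A graph of this shape can be generated rather than hand-written. The sketch below is an illustration only: `build_filter_complex` and the pad labels `c<i>i<j>` are hypothetical, and since `split` duplicates a single input, it emits one `split` per camera.

```python
def build_filter_complex(cameras: list, instances: dict) -> str:
    """cameras: input pad labels; instances: instance_id -> initial layout name."""
    ids = list(instances)
    parts = []
    # One split per camera, with one output per grid instance.
    for c, cam in enumerate(cameras):
        outs = "".join(f"[c{c}i{i}]" for i in range(len(ids)))
        parts.append(f"[{cam}]split={len(ids)}{outs}")
    # Each instance consumes one copy of every camera.
    for i, inst in enumerate(ids):
        ins = "".join(f"[c{c}i{i}]" for c in range(len(cameras)))
        parts.append(
            f"{ins}cuda_grid=inputs={len(cameras)}:instance_id={inst}"
            f":layout={instances[inst]}[out{i}]"
        )
    return ";".join(parts)


if __name__ == "__main__":
    print(build_filter_complex(["cam1", "cam2"], {"tv1": "quad", "tv2": "single"}))
```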

**`process_command` API (runtime):**

| Command | Args | Side effect |
|---|---|---|
| `set_layout` | `name=<layout_name>` | switch the instance's active layout |
| `set_layout_transition` | `name=<>;duration_ms=500;type=fade` | switch with a cross-fade |
| `bind_cell` | `cell=<n>;camera=<input_idx>` | per-cell camera→input mapping |
| `swap_cells` | `a=<n>;b=<m>` | swap two cells |
| `set_privacy_profile` | `profile=<name>` | switch overlay filtering |
| `set_overlay` | `id=<>;json=<...>` | add/replace overlay |
| `clear_overlay` | `id=<>` | delete overlay |
| `clear_all_overlays` | `cell=<n>?` | flush either all overlays or those of a specific cell |
| `set_text` | `id=<>;text=<utf8>` | quick shortcut for a text-overlay update |

Commands are delivered through the FFmpeg `zmq` filter in the filter graph (it routes by filter target). The filter implements `process_command` (FFmpeg signature: `int (*process_command)(AVFilterContext*, const char *cmd, const char *arg, char *res, int res_len, int flags)`).

**Limitation of FFmpeg `process_command`:** args are a single string. For complex overlay payloads we use a JSON string in `arg`, or **AVFrame side data** for bulk data (see §5).
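Because everything has to fit into one string, the controller needs a stable encoding for the `key=value;key=value` args from the table. A minimal sketch follows, assuming (this is not specified in the design) that the `json=` field always comes last so its payload may contain `;` and `=`; `encode_arg` and `parse_arg` are hypothetical helper names, and the sketch ignores any escaping the transport itself may require.

```python
import json


def encode_arg(fields: dict, payload=None) -> str:
    """Build the single `arg` string for process_command."""
    parts = [f"{k}={v}" for k, v in fields.items()]
    if payload is not None:
        # JSON goes last so the parser can take the rest of the string verbatim.
        parts.append("json=" + json.dumps(payload, separators=(",", ":")))
    return ";".join(parts)


def parse_arg(arg: str) -> dict:
    """Inverse of encode_arg; plain values stay strings."""
    out = {}
    rest = arg
    while rest:
        if rest.startswith("json="):
            out["json"] = json.loads(rest[len("json="):])
            break
        head, _, rest = rest.partition(";")
        key, _, value = head.partition("=")
        out[key] = value
    return out


if __name__ == "__main__":
    print(encode_arg({"id": "event_42"}, {"type": "rect", "cell_id": 0}))
```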

**Internal context:**

```c
typedef struct CudaGridContext {
    const AVClass *class;

    // config
    int nb_inputs;
    int output_w, output_h;
    char *instance_id;
    char *layout_file;
    char *zmq_addr;
    char *privacy_profile;

    // CUDA
    AVBufferRef *hw_device_ref;       // CUDA device
    AVBufferRef *hw_frames_ref;       // output frames pool
    CUcontext cu_ctx;
    CUstream cu_stream;

    // layouts
    LayoutRegistry *registry;          // shared (see §9)
    Layout *active_layout;             // current
    Layout *prev_layout;               // for cross-fade
    LayoutTransition transition;       // active transition state

    // per-cell camera binding (cell_idx → input_idx)
    int cell_to_input[MAX_CELLS];

    // overlays
    OverlayStore *overlays;            // per-instance state
    GpuOverlayCache *gpu_cache;        // pre-uploaded textures, glyph atlases

    // input frame queue (lock-step buffering, see §3)
    InputFrameQueue queues[MAX_INPUTS];
    int64_t last_pts;

    // statistics
    GridStats stats;
} CudaGridContext;
```

### 2.2 `cuda-grid-controller` sidecar modules

```
controller/src/cuda_grid_controller/
├── domain/
│   ├── layout.py             # Layout, Cell, LayoutRegistry — pure value-objects (pydantic)
│   ├── overlay.py            # Overlay primitives — pydantic discriminated union
│   ├── instance.py           # FilterInstance — state of one cuda_grid filter
│   ├── audio_state.py        # AudioRoutingState, DuckingRule
│   └── events.py             # Event taxonomy (see §7)
├── api/
│   ├── http/                 # FastAPI app, SSE endpoint
│   ├── mqtt/                 # paho-mqtt handlers + HA Discovery
│   ├── zmq/                  # asyncio pyzmq pub-socket (events out)
│   └── schemas/              # request/response models
├── adapters/
│   ├── ffmpeg_zmq_client.py  # sends process_command to FFmpeg
│   ├── side_data_pusher.py   # AVFrame side data via named pipe / unix socket
│   ├── ha_discovery.py       # MQTT discovery config publisher
│   └── frigate_bridge.py     # subscribe Frigate MQTT events → translate
├── orchestration/
│   ├── command_router.py     # routes commands from all sources into one pipeline
│   ├── conflict_resolver.py  # see §7: last-write-wins + per-source priority
│   ├── audio_orchestrator.py # state machine: domofon→duck→swap_screen→restore
│   └── overlay_lifecycle.py  # TTL, throttling, debouncing for charts/chats
├── stores/
│   ├── memory.py             # MemoryStateStore (default)
│   └── redis.py              # RedisStateStore (multi-controller HA, future)
└── cli.py                    # entrypoint, config loading
```

**Separation of responsibilities:**

- `domain/`: pure Python, no I/O, testable without mocks. Pydantic models with validation.
- `api/`: three entrypoints (HTTP, MQTT, ZMQ-pub); all normalize the incoming command into the internal `Command` DTO and put it into `command_router`.
- `orchestration/`: listens to events, applies rules, generates commands.
- `adapters/`: outbound side (to FFmpeg and to Frigate). Can be mocked in tests.

**Hot reload config:** the controller handles SIGHUP by re-reading the layout/camera JSON configs without a restart. In-memory state is not lost.

---

## 3. Composition algorithm

### 3.1 CUDA pipeline for one output frame

```
for each output frame (at target fps):
  1. collect N input frames (from the input queues, see lock-step below)
  2. allocate output AVFrame (NV12, CUDA) from the hw_frames_ref pool
  3. clear background (CUDA memset → background_color)
  4. for each visible cell c in active_layout:
       input_idx = cell_to_input[c.cell_idx]
       in_frame = input_frames[input_idx]
       src_rect, dst_rect = compute_rects(in_frame, c, output_w, output_h)

       if c.size == in_frame.size and c.no_scale:
           # fast-path: pure NV12 region memcpy (Y and UV planes separately)
           cuMemcpy2DAsync(...)
       else:
           # scale-blit: NPP resize (see §3.2) or custom bilinear kernel
           launch_scale_kernel(in_frame, out_frame, src_rect, dst_rect, stream)

       if c.border or motion_indication:
           launch_border_kernel(out_frame, c.rect, c.color, c.width, stream)

  5. apply overlays (see §5):
       sort overlays by z_order
       for each overlay o:
           if o.cell_filter and c not in matching cells: skip
           if privacy_profile_excludes(o): skip
           launch_overlay_blit_kernel(out_frame, o.gpu_texture, o.rect, o.alpha, stream)

  6. if transition.active:
       blend(prev_layout_frame, current_layout_frame, t)  # linear mix of two proxy buffers

  7. cuStreamSynchronize (or a semaphore with the next filter, see §3.4)
  8. push out_frame to the output
```

### 3.2 CUDA kernels (what to write, what to take off the shelf)

| Operation | Implementation | Why custom (if custom) |
|---|---|---|
| NV12 region memcpy (no-scale) | `cuMemcpy2DAsync` × 2 (Y + UV planes) | standard |
| Bilinear scale NV12 → NV12 | **NPP `nppiResize_8u_C1R`** for Y, UV separately | NPP is NVIDIA-provided and optimized; we do not write our own |
| Border drawing | custom kernel (rect outline) | minor, NPP is overkill |
| Background clear | `cuMemsetD8Async` per plane | standard |
| Alpha-blit RGBA texture → NV12 | **custom kernel** | RGBA→YUV conversion + alpha mix, NPP does not do it directly; ~80 lines of CUDA |
| Glyph blit (text) | custom kernel: 8-bit alpha mask + color | text is an alpha-only texture, blit with a color tint |
| Dim/darken area | custom kernel: multiply Y by factor | trivial |
| Linear blend (cross-fade) | custom kernel: `out = a*frame1 + (1-a)*frame2` | trivial |

**Principle:** where NPP exists we use NPP (NVIDIA has already optimized it). Where it does not, we write ~50-100-line kernels with unit tests on known fixtures.
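Unit tests on known fixtures need a CPU reference to compare kernel output against. The following is a minimal sketch of such a reference for the two trivial kernels in the table (linear blend and dim). The helper names are hypothetical, and the rounding mode (round half up, clamp to 0..255) is an assumption that the CUDA kernels would have to match.

```python
def blend_u8(a: int, b: int, alpha: float) -> int:
    """out = alpha*a + (1-alpha)*b on 8-bit samples, rounded and clamped."""
    v = alpha * a + (1.0 - alpha) * b
    return max(0, min(255, int(v + 0.5)))


def blend_plane(p1: bytes, p2: bytes, alpha: float) -> bytes:
    """Cross-fade two planes of equal size (Y or interleaved UV)."""
    if len(p1) != len(p2):
        raise ValueError("planes must have the same size")
    return bytes(blend_u8(x, y, alpha) for x, y in zip(p1, p2))


def dim_plane(luma: bytes, factor: float) -> bytes:
    """Dim: multiply every luma sample by factor (0..1)."""
    return bytes(blend_u8(y, 0, factor) for y in luma)


if __name__ == "__main__":
    print(list(blend_plane(bytes([0, 100, 255]), bytes([255, 100, 0]), 0.25)))
```

A kernel test can then download the GPU plane and compare it byte for byte with the reference, or within a tolerance of 1 if the rounding differs.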

### 3.3 Lock-step inputs: what if not all inputs have arrived

The FFmpeg filter API is pull-based: the filter's `activate` callback takes queued frames with `ff_inlink_consume_frame()`. With N inputs, however, frames can arrive **out of sync** (RTSP cameras are not synced).

**Strategy: adaptive PTS bucketing:**

1. Each input has a ring queue (size = `max_input_lag_frames`, default 4).
2. An output frame is generated at target_fps (e.g. 25 fps → every 40 ms).
3. For each output frame and each input, take the frame whose PTS is closest to `output_pts`, within `± tolerance` (default ±20 ms).
4. If an input has no fresh frame → **stale-frame policy** (configurable):
   - `keep_last` (default): render the last seen frame of that input
   - `black`: fill the cell with black + label "NO SIGNAL"
   - `freeze_and_warn`: keep_last + red border after `stale_timeout_ms`
5. If **all** inputs are stale → publish event `inputs_starved`; output continues with a black background.
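The selection step can be modelled in a few lines. This is a sketch of the policy only, not of the C implementation: `InputQueue` and `pick_frame` are hypothetical names, PTS values are assumed to be in milliseconds, and `freeze_and_warn` is treated like `keep_last` (the red border would be the compositor's job).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InputQueue:
    # (pts_ms, frame) pairs; maxlen mirrors max_input_lag_frames (default 4)
    frames: deque = field(default_factory=lambda: deque(maxlen=4))
    last: Optional[tuple] = None  # last (pts_ms, frame) handed to the compositor


def pick_frame(q: InputQueue, output_pts_ms: int, tolerance_ms: int = 20,
               policy: str = "keep_last"):
    """Return (frame, stale). frame is None when the cell should be filled black."""
    best = None
    for pts, frame in q.frames:
        dist = abs(pts - output_pts_ms)
        if dist <= tolerance_ms and (best is None or dist < abs(best[0] - output_pts_ms)):
            best = (pts, frame)
    if best is not None:
        q.last = best
        # Frames older than the one just used can no longer be selected.
        while q.frames and q.frames[0][0] < best[0]:
            q.frames.popleft()
        return best[1], False
    if policy == "black" or q.last is None:
        return None, True
    return q.last[1], True  # keep_last / freeze_and_warn reuse the last frame


if __name__ == "__main__":
    q = InputQueue()
    q.frames.extend([(0, "f0"), (38, "f1"), (80, "f2")])
    print(pick_frame(q, 40))
```

`inputs_starved` would be raised by the caller when every input reports `stale=True` for the same output frame.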

**Why pull-based works fine here:** the output PTS is our wall clock; inputs are the best available. This is a hard real-time scenario, not a lossless mux.

**Per-cell stale event:** the controller receives `cell_stale{cell=N, camera=cam2, last_seen_ms=3400}` → publishes to MQTT, and HA reacts with "camera 2 is not responding".

### 3.4 Async vs sync CUDA execution

- Each instance has its own `CUstream` (parallel composition across instances).
- Within one instance, all kernels run asynchronously on one stream; after a final `cuEventRecord`, the output AVFrame carries `AVCUDADeviceContextInternal::ready_event` (FFmpeg already supports this in `hwcontext_cuda`).
- The next filter (`scale_cuda`, `h264_nvenc`) does `cuStreamWaitEvent`: a true zero-copy GPU pipeline with no CPU stall.

---

## 4. Layout DSL

### 4.1 JSON Schema (layouts.json)

```json
{
  "version": "1",
  "schema": "https://git.goldix.org/gx/vf-cuda-grid/schema/layout.schema.json",
  "defaults": {
    "background": "#000000",
    "border": { "width": 2, "color": "#FFFFFF" },
    "label": { "enabled": true, "position": "top_left", "font_size": 16 }
  },
  "layouts": {
    "quad": {
      "title": "2×2 quad",
      "type": "predefined",
      "cells": [
        { "id": 0, "x": 0.0, "y": 0.0, "w": 0.5, "h": 0.5 },
        { "id": 1, "x": 0.5, "y": 0.0, "w": 0.5, "h": 0.5 },
        { "id": 2, "x": 0.0, "y": 0.5, "w": 0.5, "h": 0.5 },
        { "id": 3, "x": 0.5, "y": 0.5, "w": 0.5, "h": 0.5 }
      ]
    },
    "main_plus_preview": {
      "title": "Main + 3 previews",
      "type": "predefined",
      "cells": [
        { "id": 0, "x": 0.00, "y": 0.00, "w": 0.75, "h": 1.00, "role": "main" },
        { "id": 1, "x": 0.75, "y": 0.00, "w": 0.25, "h": 0.33, "role": "preview" },
        { "id": 2, "x": 0.75, "y": 0.33, "w": 0.25, "h": 0.33, "role": "preview" },
        { "id": 3, "x": 0.75, "y": 0.66, "w": 0.25, "h": 0.34, "role": "preview" }
      ]
    },
    "custom_3x4_with_dim": {
      "title": "Mixed",
      "type": "user",
      "cells": [
        { "id": 0, "x": 0.00, "y": 0.00, "w": 0.50, "h": 0.50,
          "z_index": 0, "cell_overlays": ["dim_if_no_motion"] },
        { "id": 1, "x": 0.50, "y": 0.00, "w": 0.50, "h": 0.50,
          "fit": "contain", "background": "#101010" }
      ]
    }
  },
  "default_layout": "quad",
  "instances": {
    "tv-living-room":   { "default_layout": "main_plus_preview", "privacy_profile": "private" },
    "tv-kitchen":       { "default_layout": "quad",              "privacy_profile": "private" },
    "public-stream":    { "default_layout": "quad",              "privacy_profile": "public"  }
  },
  "privacy_profiles": {
    "private": { "overlays_allow": ["*"] },
    "public":  { "overlays_deny": ["lpr_text", "face_name", "person_count"] }
  }
}
```
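How a profile from `privacy_profiles` could be evaluated against an overlay's `privacy_tag` is sketched below. The precedence rules are assumptions, since the schema does not spell them out: deny wins over allow, a missing `overlays_allow` means "allow everything", untagged overlays are always shown, and `*` is treated as a glob. `overlay_allowed` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def overlay_allowed(profile: dict, privacy_tag) -> bool:
    """Decide whether an overlay with the given tag is visible under a profile."""
    if privacy_tag is None:
        return True  # assumption: untagged overlays are not sensitive
    if any(fnmatchcase(privacy_tag, pat) for pat in profile.get("overlays_deny", [])):
        return False
    allow = profile.get("overlays_allow")
    if allow is None:
        return True
    return any(fnmatchcase(privacy_tag, pat) for pat in allow)


if __name__ == "__main__":
    public = {"overlays_deny": ["lpr_text", "face_name", "person_count"]}
    print(overlay_allowed(public, "lpr_text"), overlay_allowed(public, "frigate_bbox"))
```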

**Cell fields:**

- `id` (int): slot in the layout that the controller targets with `bind_cell` to assign a camera.
- `x,y,w,h` (float 0..1): normalized coordinates (consistent with the current GridComposer).
- `role` (string, optional): semantic hint (`main`, `preview`, `pip`). The controller uses it for auto-selection.
- `fit` (`cover` | `contain` | `stretch`, default `cover`): behavior on aspect mismatch.
- `background` (hex): fill of the cell outside the scaled frame (with `contain`).
- `z_index` (int): for overlapping layouts (PiP).
- `cell_overlays` (string[]): overlay templates applied automatically.
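The geometry implied by `x,y,w,h` and `fit` can be made concrete with a small sketch. It is an illustration under assumptions: `resolve_cell` and `fit_rects` are hypothetical names, cell rects are snapped to even pixels (an assumption motivated by NV12 chroma subsampling), and rects are `(x, y, w, h)` tuples in pixels.

```python
def resolve_cell(cell: dict, out_w: int, out_h: int) -> tuple:
    """Normalized cell -> pixel rect (x, y, w, h), snapped down to even pixels."""
    def even(v: float) -> int:
        return int(round(v)) & ~1
    return (even(cell["x"] * out_w), even(cell["y"] * out_h),
            even(cell["w"] * out_w), even(cell["h"] * out_h))


def fit_rects(src_w: int, src_h: int, dst: tuple, fit: str = "cover") -> tuple:
    """Return (src_rect, dst_rect) for cover / contain / stretch."""
    dx, dy, dw, dh = dst
    full_src = (0, 0, src_w, src_h)
    if fit == "stretch":
        return full_src, dst
    src_ar, dst_ar = src_w / src_h, dw / dh
    if fit == "contain":
        # Shrink the destination; the remainder shows the cell background.
        if src_ar > dst_ar:
            h = round(dw / src_ar)
            return full_src, (dx, dy + (dh - h) // 2, dw, h)
        w = round(dh * src_ar)
        return full_src, (dx + (dw - w) // 2, dy, w, dh)
    # cover: crop the source to the destination aspect ratio.
    if src_ar > dst_ar:
        w = round(src_h * dst_ar)
        return ((src_w - w) // 2, 0, w, src_h), dst
    h = round(src_w / dst_ar)
    return (0, (src_h - h) // 2, src_w, h), dst


if __name__ == "__main__":
    dst = resolve_cell({"x": 0.5, "y": 0.0, "w": 0.5, "h": 0.5}, 1920, 1080)
    print(dst, fit_rects(640, 480, dst, "contain"))
```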

**Camera binding** is **not in the layout**. The layout defines geometry; the camera→cell mapping is runtime state kept in the controller per instance. On `set_layout` the controller issues the `bind_cell` mapping automatically: if the layout has cells `[0,1,2,3]` and the camera roster is `[cam_front, cam_yard, cam_door, cam_garage]`, it binds by index. The user can override.
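Bind-by-index with user overrides might look like the sketch below. `auto_bind` is a hypothetical helper; leaving cells without a camera unbound (`None`) and ignoring overrides for cells that the layout does not have are assumptions.

```python
def auto_bind(cell_ids: list, roster: list, overrides=None) -> dict:
    """Return cell_id -> camera; cells beyond the roster stay unbound (None)."""
    binding = {
        cid: (roster[i] if i < len(roster) else None)
        for i, cid in enumerate(cell_ids)
    }
    for cid, cam in (overrides or {}).items():
        if cid in binding:  # overrides for unknown cells are ignored
            binding[cid] = cam
    return binding


if __name__ == "__main__":
    print(auto_bind([0, 1, 2, 3], ["cam_front", "cam_yard", "cam_door", "cam_garage"]))
```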

### 4.2 In-memory representation (filter side)

```c
typedef struct LayoutCell {
    int id;
    float x, y, w, h;
    int z_index;
    uint32_t bg_color;
    uint32_t border_color;
    int border_width;
    int fit_mode;                    // CELL_FIT_COVER/CONTAIN/STRETCH
    // resolved pixel rect (cached, invalidated when output_size changes)
    int px_x, px_y, px_w, px_h;
} LayoutCell;

typedef struct Layout {
    char name[64];
    int nb_cells;
    LayoutCell cells[MAX_CELLS];     // MAX_CELLS = 64 (16×4)
    uint64_t version;                // for cache invalidation
} Layout;

typedef struct LayoutRegistry {
    Layout *layouts;                 // dynamic array
    int nb_layouts;
    pthread_rwlock_t lock;           // rarely written, often read
    uint64_t global_version;
} LayoutRegistry;
```

**Layout registry is global** (see §9): one process, one registry, reused across filter instances. On `set_layout name=X` the filter looks the layout up in the registry and does not clone it; it takes a pointer (read-locked).

---

## 5. Overlay system

### 5.1 7 primitive types

```python
# domain/overlay.py (pydantic discriminated union)
from typing import Annotated, Literal

from pydantic import BaseModel, Field

class OverlayBase(BaseModel):
    id: str                          # unique, used for replace/delete
    instance_id: str | None = None   # None = broadcast to all instances
    cell_id: int | None = None       # None = on the canvas, int = relative to the cell
    z_index: int = 100
    alpha: float = 1.0               # 0..1
    ttl_ms: int | None = None        # auto-delete after this many ms
    privacy_tag: str | None = None   # for privacy_profile filtering
    visible: bool = True

class RectOverlay(OverlayBase):
    type: Literal["rect"] = "rect"
    x: float; y: float; w: float; h: float  # normalized
    color: str                        # #RRGGBBAA
    stroke_width: int = 0             # 0 = filled
    rounded_corners: int = 0          # px

class TextOverlay(OverlayBase):
    type: Literal["text"] = "text"
    text: str
    x: float; y: float
    font: str = "DejaVuSans"
    size: int = 16
    color: str = "#FFFFFF"
    background: str | None = None     # text bg box
    anchor: Literal["top-left","center","bottom-right",...] = "top-left"

class IconOverlay(OverlayBase):
    type: Literal["icon"] = "icon"
    icon: str                         # name from the preloaded sprite sheet
    x: float; y: float
    size: int = 32
    tint: str | None = None

class ImageOverlay(OverlayBase):
    type: Literal["image"] = "image"
    source: str                       # path or URL (preloaded on first use)
    x: float; y: float; w: float; h: float

class DimOverlay(OverlayBase):
    type: Literal["dim"] = "dim"
    x: float; y: float; w: float; h: float
    factor: float = 0.5               # 0..1, luma multiplier

class GraphOverlay(OverlayBase):
    type: Literal["graph"] = "graph"
    kind: Literal["line","bar","histogram","sparkline"]
    x: float; y: float; w: float; h: float
    data_source: str                  # symbolic ID; someone pushes the data
    refresh_rate_hz: float = 2.0
    style: dict                       # passthrough to the renderer

class ChatOverlay(OverlayBase):
    type: Literal["chat"] = "chat"
    x: float; y: float; w: float; h: float
    max_lines: int = 5
    line_ttl_ms: int = 8000
    font_size: int = 14
    # messages are pushed separately via add_chat_message

Overlay = Annotated[
    RectOverlay | TextOverlay | IconOverlay | ImageOverlay |
    DimOverlay | GraphOverlay | ChatOverlay,
    Field(discriminator="type")
]
```

### 5.2 Where overlay state lives

- **Declarative state (active overlays):** in the controller, `OverlayStore` (instance_id → list[Overlay]).
- **GPU texture cache:** in the filter, `GpuOverlayCache` (overlay_id → CUDA texture/array). Loaded lazily on first render.
- **Live data feed for graphs/chats:** the controller pumps data → renders on the CPU (cairo) with a rate limit → uploads a CUDA texture → notifies the filter.

### 5.3 How an overlay reaches the filter

**Two channels**, depending on the payload type:

1. **Mutable lightweight overlays (rect/text/icon/dim)** go through `process_command`:
   ```
   set_overlay  id=event_42  json={"type":"rect","cell_id":0,"x":0.1,...}
   ```
   The filter parses the JSON, caches it, and renders. Update: the same `set_overlay` with the same id. Delete: `clear_overlay id=event_42`.

2. **Heavy overlays (image/graph/chat, which need RGBA pixel buffers)** go through **AVFrame side data**:
   - The controller renders on the CPU (cairo) and uploads into shared GPU memory (cuMemAlloc'ed; the IPC handle is reused through a cuframes-style channel).
   - Side data type: `AV_FRAME_DATA_USER + offset` (custom; FFmpeg defines no user range in `AVFrameSideDataType`, so the patch would have to add this value). The payload contains overlay_id + a CUDA IPC handle to the texture + dirty_flag.
   - The filter dereferences the handle (once, then cached) and re-uploads when `dirty_flag=true`.

**Alternative to side data:** a plain Unix socket between controller and filter with a protocol of "here is the IPC handle to the new texture for overlay X". It is less coupled to FFmpeg's AVFrame and easier to debug. I recommend **this second option** for phase 1 and AVFrame side data for phase 2 (once detection bboxes from an upstream filter are wanted directly through side data from `vf_detect` or the Frigate bridge).
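The Unix-socket message could be as simple as a length-prefixed JSON frame. The wire format below is purely an assumption for illustration (4-byte big-endian length, JSON body, base64 handle); the design does not define one. The 64-byte handle size matches CUDA's `CU_IPC_HANDLE_SIZE`, and the function names are hypothetical.

```python
import base64
import json
import struct

IPC_HANDLE_SIZE = 64  # size of a CUDA IPC memory handle in bytes


def encode_texture_update(overlay_id: str, ipc_handle: bytes, version: int) -> bytes:
    """Frame a 'texture_updated' message: 4-byte length prefix + JSON body."""
    if len(ipc_handle) != IPC_HANDLE_SIZE:
        raise ValueError("CUDA IPC handle must be 64 bytes")
    body = json.dumps({
        "type": "texture_updated",
        "overlay": overlay_id,
        "handle": base64.b64encode(ipc_handle).decode("ascii"),
        "version": version,
    }).encode("utf-8")
    return struct.pack("!I", len(body)) + body


def decode_texture_update(buf: bytes) -> dict:
    """Inverse of encode_texture_update; returns the handle as raw bytes."""
    (length,) = struct.unpack_from("!I", buf)
    msg = json.loads(buf[4:4 + length].decode("utf-8"))
    msg["handle"] = base64.b64decode(msg["handle"])
    return msg


if __name__ == "__main__":
    frame = encode_texture_update("graph_cpu", bytes(64), 7)
    print(decode_texture_update(frame)["version"])
```

The filter side would parse the same frame in C and open the handle once per overlay, as described for the cache above.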

### 5.4 GPU rendering pipeline for graphs/chats

```
controller event-loop                 sidecar→filter channel              filter (GPU)
─────────────────────                ──────────────────────              ────────────
event → update_graph_data(g, value)
  ↓
graph_renderer.queue(g)
  ↓
[rate-limited — refresh_hz]
cairo.render(g.data) → RGBA buf
  ↓
cuda_upload(buf) → device texture
                                     side-channel:
  ↓                                  texture_updated(overlay=g, handle=H, version=V)
                                     ─────────────────────────────────────────►
                                                                                ↓
                                                                       gpu_cache[g] = H
                                                                       mark_dirty(g)
                                                                                ↓
                                                                       (next frame) blit with alpha
```

Critical detail: **chat and graph live outside the frame timeline**. They update at event rate (chat: when a message arrives; graph: on a one-second tick). On every video frame the filter just blits the latest cached texture. The CPU does no per-frame work.
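The rate limit between data updates and cairo renders can be sketched as a small throttle that coalesces updates. `RenderThrottle` is a hypothetical name; the caller passes in the clock value so the logic stays testable, and one throttle per overlay is assumed.

```python
class RenderThrottle:
    """Coalesce data updates; allow at most refresh_rate_hz renders per second."""

    def __init__(self, refresh_rate_hz: float):
        self.min_interval = 1.0 / refresh_rate_hz
        self.last_render = None
        self.dirty = False

    def push(self) -> None:
        """New data arrived; a render is needed at the next allowed slot."""
        self.dirty = True

    def should_render(self, now: float) -> bool:
        """True when there is new data and the minimum interval has passed."""
        if not self.dirty:
            return False
        if self.last_render is not None and now - self.last_render < self.min_interval:
            return False
        self.last_render = now
        self.dirty = False
        return True


if __name__ == "__main__":
    t = RenderThrottle(2.0)
    t.push()
    print(t.should_render(0.0), t.should_render(0.1))
```

With `refresh_rate_hz=2.0`, ten updates arriving within 100 ms result in a single render and a single texture upload.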

---

## 6. Side data + overlay producers

**Who produces the overlay payload:**

| Source | Path | Example |
|---|---|---|
| Controller (default) | HTTP/MQTT/ZMQ → command router → `set_overlay` | UI click "put a frame on cam3" |
| Frigate events bridge | `frigate_bridge.py` subscribes to MQTT `frigate/events` → translate → `set_overlay` | bbox on detection |
| External script | curl POST /overlay/add | crontab "show weather widget at 8am" |
| HA automation | MQTT publish | "when the doorbell rings, show overlay 'door'" |
| Upstream filter (future) | AVFrame side data `AV_FRAME_DATA_DETECTION_BBOXES` | if detection runs inside the FFmpeg graph |

**Frigate bridge, a concrete example:**

```python
# adapters/frigate_bridge.py
import json

async def on_frigate_event(payload):
    ev = json.loads(payload)
    if ev["type"] == "new":
        after = ev["after"]
        # box is treated as pixel corners [x1, y1, x2, y2]
        x1, y1, x2, y2 = after["box"]
        W, H = after["frame_width"], after["frame_height"]
        cell = camera_to_cell(after["camera"])
        await commands.set_overlay(RectOverlay(
            id=f"frigate_{after['id']}",
            cell_id=cell,
            x=x1 / W, y=y1 / H,
            w=(x2 - x1) / W, h=(y2 - y1) / H,
            color="#FF0000A0",
            stroke_width=3,
            ttl_ms=2000,
            privacy_tag="frigate_bbox",
        ))
        if after.get("plate"):
            await commands.set_overlay(TextOverlay(
                id=f"frigate_lpr_{after['id']}",
                cell_id=cell,
                text=after["plate"]["text"],
                x=x1 / W, y=y2 / H + 0.02,   # just below the box
                size=18, color="#FFFFFF", background="#000000A0",
                ttl_ms=3000,
                privacy_tag="lpr_text",   # ← hidden by privacy_profile=public
            ))
```

---

## 7. Control plane protocols

### 7.1 ZeroMQ flow

**Commands IN:**
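- The FFmpeg `zmq` filter binds to `tcp://127.0.0.1:5555` (REP socket).
- The controller is a REQ client → filter target → command.
- FFmpeg zmq filter format: `<target> <command> <arg>`, where target identifies the instance by its `instance_id` (see §2.1). Note that FFmpeg itself matches the target against the filter name or the filter-instance name, so the graph should name each instance accordingly, e.g. `cuda_grid@tv1`.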

**Events OUT:**
- The controller binds a PUB socket on `tcp://0.0.0.0:5556`.
- Topic prefix = `event/<category>/<instance>`. Subscribers filter by it.

### 7.2 MQTT topic taxonomy

```
cuda_grid/cmd/<instance_id>/layout/set                ← set_layout
cuda_grid/cmd/<instance_id>/layout/create             ← new definition
cuda_grid/cmd/<instance_id>/cell/<n>/bind             ← bind_cell
cuda_grid/cmd/<instance_id>/overlay/set               ← set_overlay (payload=Overlay JSON)
cuda_grid/cmd/<instance_id>/overlay/<id>/clear        ← clear
cuda_grid/cmd/<instance_id>/privacy/set               ← set_privacy_profile

cuda_grid/state/<instance_id>/layout                  ← retained: current layout
cuda_grid/state/<instance_id>/cells                   ← retained: cell→camera mapping
cuda_grid/state/<instance_id>/overlays/<id>           ← retained per overlay
cuda_grid/state/<instance_id>/fps                     ← stats, periodic

cuda_grid/event/<instance_id>/layout_switched         ← non-retained, fact-of-event
cuda_grid/event/<instance_id>/cell_camera_changed
cuda_grid/event/<instance_id>/fps_drop
cuda_grid/event/<instance_id>/overlay_added
cuda_grid/event/<instance_id>/overlay_expired
cuda_grid/event/<instance_id>/inputs_starved
cuda_grid/event/<instance_id>/cell_stale
cuda_grid/event/audio/ducked
cuda_grid/event/audio/restored

homeassistant/select/cuda_grid_<instance>_layout/config           ← HA discovery
homeassistant/sensor/cuda_grid_<instance>_fps/config
homeassistant/binary_sensor/cuda_grid_<instance>_input_alive/config
```
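
To show how the `cmd` branch of this taxonomy could be routed, here is a small sketch of a topic parser. The `Command` dataclass and the command name `create_layout` are assumptions made for illustration; the other names follow the arrows in the listing above.

```python
from dataclasses import dataclass
from typing import Optional

PREFIX = "cuda_grid"


@dataclass(frozen=True)
class Command:
    instance_id: str
    name: str                        # e.g. "set_layout", "bind_cell"
    cell: Optional[int] = None       # only for cell/<n>/bind
    overlay_id: Optional[str] = None # only for overlay/<id>/clear


def parse_cmd_topic(topic: str) -> Optional[Command]:
    """Map a cuda_grid/cmd/... topic to a Command, or None if it is not one."""
    parts = topic.split("/")
    if len(parts) < 4 or parts[0] != PREFIX or parts[1] != "cmd":
        return None
    instance, rest = parts[2], parts[3:]
    fixed = {
        ("layout", "set"): "set_layout",
        ("layout", "create"): "create_layout",   # hypothetical command name
        ("overlay", "set"): "set_overlay",
        ("privacy", "set"): "set_privacy_profile",
    }
    if tuple(rest) in fixed:
        return Command(instance, fixed[tuple(rest)])
    if len(rest) == 3 and rest[0] == "cell" and rest[2] == "bind":
        if rest[1].isascii() and rest[1].isdigit():
            return Command(instance, "bind_cell", cell=int(rest[1]))
        return None
    if len(rest) == 3 and rest[0] == "overlay" and rest[2] == "clear":
        return Command(instance, "clear_overlay", overlay_id=rest[1])
    return None
```

Returning `None` for anything outside the `cmd` branch lets the same MQTT callback ignore `state` and `event` topics without raising.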

**HA Discovery, a concrete example (layout selector):**

```json
{
  "name": "Living Room TV Layout",
  "unique_id": "cuda_grid_tv1_layout",
  "command_topic": "cuda_grid/cmd/tv1/layout/set",
  "state_topic":   "cuda_grid/state/tv1/layout",
  "options": ["single","quad","nine_grid","main_plus_preview","custom_3x4"],
  "device": {
    "identifiers": ["cuda_grid_tv1"],
    "name": "CUDA Grid: tv1",
    "manufacturer": "gx/vf-cuda-grid",
    "sw_version": "<runtime version>"
  }
}
```
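
The same payload can be generated per instance rather than written by hand. The sketch below assumes a hypothetical helper `layout_select_discovery`; the topic and field values mirror the taxonomy and the JSON example above.

```python
import json


def layout_select_discovery(instance: str, friendly_name: str,
                            layouts: list[str], sw_version: str) -> tuple[str, str]:
    """Return (discovery_topic, json_payload) for the layout selector entity."""
    topic = f"homeassistant/select/cuda_grid_{instance}_layout/config"
    payload = {
        "name": friendly_name,
        "unique_id": f"cuda_grid_{instance}_layout",
        "command_topic": f"cuda_grid/cmd/{instance}/layout/set",
        "state_topic": f"cuda_grid/state/{instance}/layout",
        "options": layouts,
        "device": {
            "identifiers": [f"cuda_grid_{instance}"],
            "name": f"CUDA Grid: {instance}",
            "manufacturer": "gx/vf-cuda-grid",
            "sw_version": sw_version,
        },
    }
    return topic, json.dumps(payload)
```

Discovery configs are commonly published as retained messages so that Home Assistant picks them up after its own restart; whether the controller does so is a deployment decision not fixed by this document.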

### 7.3 HTTP REST API

| Endpoint | Method | Body | Purpose |
|---|---|---|---|
| `/instances` | GET | — | list of filter instances + current state |
| `/instance/{id}` | GET | — | detailed state |
| `/instance/{id}/layout` | POST | `{name, transition?}` | set_layout |
| `/instance/{id}/cell/{n}/bind` | POST | `{camera_id}` | bind_cell |
| `/instance/{id}/privacy` | POST | `{profile}` | privacy switch |
| `/layouts` | GET | — | list of layout definitions |
| `/layouts` | POST | Layout JSON | create/update layout |
| `/layouts/{name}` | DELETE | — | delete a layout |
| `/instance/{id}/overlays` | GET | — | active overlays |
| `/instance/{id}/overlays` | POST | Overlay JSON | add/replace |
| `/instance/{id}/overlays/{oid}` | DELETE | — | remove |
| `/instance/{id}/chat/{oid}/message` | POST | `{text, color?}` | push a message into the chat overlay |
| `/audio/duck` | POST | `{source, duration_ms, ratio}` | manual ducking |
| `/events` | GET (SSE) | — | streaming events |
| `/health` | GET | — | liveness |
| `/metrics` | GET | — | Prometheus |

The OpenAPI schema is published automatically (FastAPI). Endpoints are versioned under `/v1/...` from the start; the paths in the table above are shown without that prefix.

### 7.4 Conflict resolution

**Sources can conflict** (HA sends `layout=quad`, an MQTT rule sends `layout=nine_grid` within a 50 ms interval). Policy:

1. **Single command router**: commands from every protocol go into one `asyncio.Queue` (or a Go channel), so they are serialized naturally.
2. **Idempotency key**: every command carries an optional `cmd_id` (UUID). Duplicates are dropped.
3. **Per-source priority** (configurable):
   ```
   priority:
     ha_automation: 100
     mqtt:          80
     http:          50
     zmq:           50
     frigate:       30
   ```
   Within the **same-tick window** (50 ms) the higher priority wins; the lower-priority command is discarded and a `command_overridden{by=ha_automation}` event is emitted.
4. **Locking through `set_priority_lock`**: a manual UI lock for N seconds, `POST /instance/tv1/lock {duration=60s}`; only the source holding the lock token may change the instance.
5. **Last-write-wins** for overlays with the same `id` (a natural consequence of replace semantics).
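
Rules 2 and 3 can be sketched as a batch function over already-queued commands. This is one possible reading of the "same-tick window": a tick opens with the first command for a given `(instance, command)` pair and lasts 50 ms. The `Cmd` type, the `resolve` function and the tie-break (equal priority: the later command wins) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_PRIORITY = {"ha_automation": 100, "mqtt": 80, "http": 50, "zmq": 50, "frigate": 30}


@dataclass(frozen=True)
class Cmd:
    source: str
    instance: str
    name: str
    arg: str
    t_ms: int                      # arrival time in milliseconds
    cmd_id: Optional[str] = None   # optional idempotency key


def resolve(commands, priority=DEFAULT_PRIORITY, window_ms=50):
    """Return (winners, overridden); overridden holds (loser, winning_source)."""
    seen_ids, unique = set(), []
    for c in sorted(commands, key=lambda c: c.t_ms):
        if c.cmd_id is not None:
            if c.cmd_id in seen_ids:
                continue               # duplicate cmd_id: drop silently
            seen_ids.add(c.cmd_id)
        unique.append(c)

    winners, overridden = [], []
    ticks = {}  # (instance, name) -> (tick_start_ms, index into winners)
    for c in unique:
        key = (c.instance, c.name)
        tick = ticks.get(key)
        if tick is None or c.t_ms - tick[0] >= window_ms:
            ticks[key] = (c.t_ms, len(winners))   # open a new tick
            winners.append(c)
            continue
        current = winners[tick[1]]
        if priority.get(c.source, 0) >= priority.get(current.source, 0):
            overridden.append((current, c.source))
            winners[tick[1]] = c
        else:
            overridden.append((c, current.source))
    return winners, overridden
```

Each entry in `overridden` corresponds to one `command_overridden` event. A live router cannot look ahead like this batch version does: it would have to hold a command for the length of the window before applying it, which adds up to 50 ms of latency to every layout switch.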

### 7.5 Event taxonomy (published externally)

| Event | When | Payload |
|---|---|---|
| `layout_switched` | after the layout is applied | `{from, to, reason, source}` |
| `cell_camera_changed` | bind_cell | `{cell, prev_camera, new_camera}` |
| `overlay_added` | set_overlay (new id) | `{id, type, instance, cell?}` |
| `overlay_updated` | set_overlay (existing) | `{id, type}` |
| `overlay_expired` | TTL fired | `{id, reason: "ttl"\|"manual"}` |
| `fps_drop` | output_fps < threshold | `{instance, current, expected, since_ms}` |
| `inputs_starved` | all inputs stale | `{instance, last_seen_per_input}` |
| `cell_stale` | one input stale | `{instance, cell, camera, age_ms}` |
| `audio_ducked` | ducking active | `{rule, source, duration_ms}` |
| `audio_restored` | ducking ended | `{rule}` |
| `command_overridden` | conflict-resolution dropped | `{cmd, source, by}` |
| `controller_started` | startup | `{version, instances}` |

All events are published to:
- ZeroMQ PUB `tcp://0.0.0.0:5556`, topic `event/<category>/<instance>` (same prefix scheme as in §7.1)
- MQTT `cuda_grid/event/...`
- HTTP `/events` SSE
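
A sketch of how one event could be framed for each of the three transports. The function names are hypothetical; the MQTT topic shape follows the per-instance entries in §7.2 (the global `audio/...` events are not covered), the SSE framing is the standard `event:`/`data:` pair, and the ZeroMQ topic is passed in by the caller.

```python
import json


def encode_payload(payload: dict) -> str:
    """Compact JSON, shared by all transports."""
    return json.dumps(payload, separators=(",", ":"))


def mqtt_topic(event: str, instance: str) -> str:
    """Per-instance MQTT event topic."""
    return f"cuda_grid/event/{instance}/{event}"


def sse_frame(event: str, payload: dict) -> str:
    """One Server-Sent Events frame: event line, data line, blank line."""
    return f"event: {event}\ndata: {encode_payload(payload)}\n\n"


def zmq_frames(topic: str, payload: dict) -> list[bytes]:
    """Two-part ZeroMQ message: the topic frame is what SUB sockets prefix-match."""
    return [topic.encode("utf-8"), encode_payload(payload).encode("utf-8")]
```

Keeping the topic in its own frame means subscribers can filter without parsing JSON; a publisher would pass the result to `socket.send_multipart(...)`.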

---

## 8. Audio orchestration

**Architectural stance:** vf_cuda_grid **does not touch audio itself**. Audio is handled by standard FFmpeg filters (`amix`, `sidechaincompress`, `volume`); the controller coordinates them **through the same `process_command` mechanism** used for video.

**State machine, "doorbell" example:**

```
states:
  idle:        music plays @ vol=1.0, no doorbell screen
  ringing:     duck music (vol=0.3), switch tv1 to main_plus_preview with camera_door as main,
               show icon "doorbell" on the canvas, audio from cam_door amplified
  ringing_acked: keep grid, music back to 0.8, doorbell acknowledged
  cooldown:    24s timer, then restore music+layout

events triggering:
  on(mqtt:doorbell/ringing): from {idle,cooldown} → ringing
  on(mqtt:doorbell/answered): from ringing → ringing_acked
  on(timer 30s): from ringing → ringing_acked
  on(timer 24s in cooldown): → idle

actions per state:
  ringing:
    - ffmpeg.cmd: "vol_music volume 0.3"
    - ffmpeg.cmd: "sc_music threshold 0.05"   # sidechain compression bumps
    - ffmpeg.cmd: "tv1_grid set_layout main_plus_preview"
    - ffmpeg.cmd: "tv1_grid bind_cell cell=0 camera=4"  # cam_door
    - ffmpeg.cmd: "tv1_grid set_overlay id=doorbell_icon json=..."
    - publish event audio_ducked{rule=doorbell}
    - publish event layout_switched
```

**Rule engine implementation:** a YAML config with states and triggers, parsed into the `transitions` library (a pure-Python state machine). Simple, testable, not over-engineered.

```yaml
# orchestration_rules.yaml
rules:
  doorbell:
    triggers:
      - on: mqtt
        topic: doorbell/state
        value: ringing

    states:
      ringing:
        enter:
          - ffmpeg_cmd: { target: vol_music, cmd: volume, arg: "0.3" }
          - ffmpeg_cmd: { target: tv1_grid, cmd: set_layout, arg: "main_plus_preview" }
          - overlay: { id: doorbell, type: icon, icon: bell, x: 0.45, y: 0.05, size: 64 }
        exit:
          - ffmpeg_cmd: { target: vol_music, cmd: volume, arg: "1.0" }
          - clear_overlay: doorbell
        timeout: 30s → ringing_acked

      ringing_acked:
        timeout: 24s → idle
```
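
The behaviour this YAML describes can be sketched with a tiny hand-rolled state machine. This is a standard-library stand-in for the `transitions` library, not its API, and it follows the YAML variant of the rule (30 s timeout `ringing` → `ringing_acked`, 24 s timeout `ringing_acked` → `idle`). Overlay actions are left out; trigger names such as `doorbell_ringing` are assumptions.

```python
TRANSITIONS = {
    ("idle", "doorbell_ringing"): "ringing",
    ("ringing", "doorbell_answered"): "ringing_acked",
    ("ringing", "timeout"): "ringing_acked",   # 30s in the YAML
    ("ringing_acked", "timeout"): "idle",      # 24s in the YAML
}
ENTER = {"ringing": ["vol_music volume 0.3", "tv1_grid set_layout main_plus_preview"]}
EXIT = {"ringing": ["vol_music volume 1.0"]}
TIMEOUT_S = {"ringing": 30, "ringing_acked": 24}


class DoorbellRule:
    def __init__(self, send):
        self.state = "idle"
        self._send = send   # callable that takes one FFmpeg command string

    def fire(self, trigger: str) -> bool:
        """Apply a trigger; return False if it is not valid in the current state."""
        nxt = TRANSITIONS.get((self.state, trigger))
        if nxt is None:
            return False
        for cmd in EXIT.get(self.state, []):   # exit actions of the old state
            self._send(cmd)
        self.state = nxt
        for cmd in ENTER.get(nxt, []):         # enter actions of the new state
            self._send(cmd)
        return True

    def timeout_s(self):
        """Seconds after which the caller should fire 'timeout', or None."""
        return TIMEOUT_S.get(self.state)
```

The caller owns the timer: after each successful `fire` it schedules a `timeout` trigger according to `timeout_s()` and cancels any previous one. Ignoring invalid triggers (returning `False`) makes repeated doorbell messages harmless while the rule is already ringing.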

---

## 9. Multi-instance behaviour

### 9.1 Shared inputs

Handled by FFmpeg's native `split` filter (see §2.1). Each `cuda_grid` instance receives its own reference to the same CUDA frame (refcount++). No GPU↔GPU copy takes place; only the reference count changes.

### 9.2 Layout registry: global or per-instance

**Recommendation: hybrid** (as in the requirements):

- **Global LayoutRegistry** in the FFmpeg process: one instance per FFmpeg process, owned by the first `cuda_grid` filter to initialize. Subsequent instances reference the same registry.
- **Active layout pointer + cell_to_input mapping + overlays**: **per-instance** state.
- **Privacy profile**: per-instance.

**Why a global registry:** layout definitions are read-mostly (created rarely, read often). Sharing saves memory: one nine_grid with 9 cells across 3 instances costs 9 × 3 = 27 structures unshared versus 9 shared.

**Sharing implementation:** on init the filter checks a process-wide singleton (atomic pointer + ref-count) and either creates it or attaches to it. When the last instance is destroyed, the registry is freed. When the controller hot-reloads the registry, it sends a `reload_registry` command that is broadcast to the registered instances, which apply it (read-write lock on the registry, atomic version bump).
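
The acquire/release life cycle is easier to see in a short model. The real filter is C and would use an atomic pointer plus a read-write lock; the Python sketch below substitutes a single mutex and only illustrates the semantics. All names (`acquire`, `release`, `reload`) are hypothetical.

```python
import threading


class LayoutRegistry:
    def __init__(self):
        self.layouts = {}   # layout name -> definition
        self.version = 0    # bumped on every reload


_lock = threading.Lock()
_registry = None
_refs = 0


def acquire() -> LayoutRegistry:
    """Called from filter init: create the singleton or attach to it."""
    global _registry, _refs
    with _lock:
        if _registry is None:
            _registry = LayoutRegistry()
        _refs += 1
        return _registry


def release() -> None:
    """Called from filter uninit: the last release frees the registry."""
    global _registry, _refs
    with _lock:
        if _refs == 0:
            raise RuntimeError("release() without matching acquire()")
        _refs -= 1
        if _refs == 0:
            _registry = None


def reload(layouts: dict) -> int:
    """Replace all definitions and bump the version; returns the new version."""
    with _lock:
        if _registry is None:
            raise RuntimeError("no registry to reload")
        _registry.layouts = dict(layouts)
        _registry.version += 1
        return _registry.version
```

Instances can compare the version they last applied with `registry.version` to detect that a reload happened, which is the role the atomic version bump plays in the description above.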

### 9.3 AVFrame ref-counting

The standard FFmpeg mechanism. After `split`, each instance gets an `AVFrame*` that references the same buffers, with the refcount incremented. `av_frame_free()` drops one reference; when the count reaches 0 the buffer returns to the frame pool. CUDA frames are reused from the pool (via `hw_frames_ctx`), so the physical GPU memory is not deallocated.

### 9.4 Independent output timing

Each instance has its own `target_fps` option. Pull-based: the filter framework itself calls `request_frame` on each output. If tv1 = 25fps and public = 30fps, the two outputs run at independent rates while sharing the same inputs.

---

## 10. Library choice for the controller

### Comparison

| Aspect | Python (FastAPI + asyncio) | Go (chi/fiber + nats) | Rust (axum + tokio) |
|---|---|---|---|
| Ecosystem MQTT/ZMQ | `paho-mqtt`, `pyzmq`, `aiomqtt`: all proven | `eclipse/paho.mqtt.golang`, `go-zeromq/zmq4`: proven | `rumqttc`, `zmq`: proven, but less mature |
| HA Discovery integration | Direct MQTT, nothing library-specific | Same | Same |
| Audio-orchestration rule engine | `transitions` library: excellent | `looplab/fsm`: fine | crate `statig`: fine, fewer docs |
| Cairo/skia for graph rendering | `pycairo`/`cairocffi`: solid | `gioui.org/cairo` is exotic, otherwise CGO | `cairo-rs`: ok, but more pain |
| FastAPI + Pydantic schemas | First-class, OpenAPI for free | manual schemas or `huma` (newer) | `utoipa` + axum-extra: works |
| Developer velocity | high (Python for the control plane is a proven pattern) | medium (more boilerplate) | low for fast iteration |
| Runtime performance | sufficient (control plane, not hot path) | good | excellent |
| Memory footprint | ~30-50MB | ~10-20MB | ~5-10MB |
| Familiarity in our ecosystem | high (paddleocr, frigate-cuframes: all Python) | low | low |
| Single static binary distribution | ✗ (Python deps) | ✓ | ✓ |

### Recommendation: **Python (FastAPI + asyncio + pydantic + paho-mqtt + pyzmq + transitions)**

**Why:**

1. **The hot path is not on the controller side**: pixel pushing happens in the filter (C/CUDA). The controller is a coordination layer, and Python is adequate for that. The 1-3 ms that Python adds when relaying an MQTT command to ZMQ is negligible against a 40 ms video frame period (25 fps).
2. **Cairo rendering for graphs/chats**: Python's `pycairo` is mature. In Go this is painful.
3. **The Frigate community is Python-native.** Bridge plugins, examples, and docs are much easier to adopt.
4. **Iteration speed matters more than runtime.** Audio rules, overlay logic, and conflict resolution are business logic that keeps changing. Python is quick to rewrite.
5. **A 30-50MB memory footprint** for the control plane is insignificant (Frigate itself uses ~500MB, NVDEC takes gigabytes of VRAM).

**Dependency stack:**

```
fastapi              # HTTP REST + auto OpenAPI
uvicorn              # ASGI server
pydantic >=2         # domain models, validation
pyzmq                # FFmpeg zmq client + events PUB
aiomqtt              # asyncio MQTT (thin wrapper over paho)
paho-mqtt            # fallback for HA discovery if needed
transitions          # state machines для audio orchestration
pycairo              # graph/chat rendering CPU-side
sse-starlette        # /events SSE
prometheus-client    # /metrics
structlog            # structured logs
typer                # CLI
pyyaml               # rule configs
```

**Distribution:** Docker image + PyPI package (`cuda-grid-controller`). systemd unit в `deploy/`.

**What would push us to Go:** 100+ filter instances with the controller becoming the bottleneck. Today 1-10 instances per host is realistic. If that becomes a problem, extract the controller (it is cleanly separated by domain) and rewrite it in Go without rewriting the filter. Reversible.

---

## 11. Phases of implementation

### PR-1 — MVP filter (fixed quad)

**Scope:**
- `filter/vf_cuda_grid.c` + minimal CUDA kernels (NV12 region memcpy only)
- Hardcoded quad layout (2×2), 4 inputs, fixed output 1920×1080
- No scaling — assumes inputs already at 960×540
- No overlays, no borders, no labels
- `process_command` skeleton (NOP)
- FFmpeg patch file for n7.1
- CLI usable: `ffmpeg -i cuframes://... [×4] -filter_complex cuda_grid=inputs=4 -c:v h264_nvenc out.mp4`
- Unit tests: kernel-level pixel-perfect tests against known fixtures
- Docker integration test

**Deliverable:** a working FFmpeg binary with the filter. Demo: 4 cam → quad mp4.
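
As a CPU-side reference for what the "NV12 region memcpy" kernel has to do (and a possible source of fixtures for the pixel-perfect tests), here is a sketch that composes four NV12 frames into a 2×2 canvas. It assumes tightly packed frames (pitch equals width), which real CUDA surfaces generally do not guarantee; the function names are illustrative.

```python
def blit_nv12(dst: bytearray, dst_w: int, dst_h: int,
              src: bytes, src_w: int, src_h: int, x: int, y: int) -> None:
    """Copy a whole NV12 frame into dst at (x, y); all sizes/offsets must be even."""
    if (x | y | src_w | src_h | dst_w | dst_h) & 1:
        raise ValueError("NV12 coordinates and sizes must be even")
    if x + src_w > dst_w or y + src_h > dst_h:
        raise ValueError("cell does not fit into the canvas")
    if len(src) != src_w * src_h * 3 // 2 or len(dst) != dst_w * dst_h * 3 // 2:
        raise ValueError("buffer size does not match NV12 dimensions")
    # Luma plane: one byte per pixel, src_h rows.
    for row in range(src_h):
        d = (y + row) * dst_w + x
        s = row * src_w
        dst[d:d + src_w] = src[s:s + src_w]
    # Chroma plane: interleaved U,V at half vertical resolution. A row is still
    # src_w bytes wide because each 2x2 pixel block contributes one (U, V) pair.
    dst_uv, src_uv = dst_w * dst_h, src_w * src_h
    for row in range(src_h // 2):
        d = dst_uv + (y // 2 + row) * dst_w + x
        s = src_uv + row * src_w
        dst[d:d + src_w] = src[s:s + src_w]


def compose_quad(frames, cell_w: int, cell_h: int) -> bytes:
    """Place four equally sized NV12 frames in row-major order on a 2x2 canvas."""
    if len(frames) != 4:
        raise ValueError("quad layout needs exactly 4 frames")
    out_w, out_h = 2 * cell_w, 2 * cell_h
    canvas = bytearray(out_w * out_h * 3 // 2)
    for i, frame in enumerate(frames):
        blit_nv12(canvas, out_w, out_h, frame, cell_w, cell_h,
                  (i % 2) * cell_w, (i // 2) * cell_h)
    return bytes(canvas)
```

For the PR-1 dimensions this would be called as `compose_quad(frames, 960, 540)` to produce a 1920×1080 canvas.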

### PR-2 — Dynamic layouts + per-cell scaling

**Scope:**
- LayoutRegistry, JSON layout loading (`layout_file=` option)
- Cell scaling — NPP integration (NV12 bilinear)
- Borders + background fill + `fit` modes (cover/contain/stretch)
- `process_command`: `set_layout`, `bind_cell`, `swap_cells`
- Cell-camera binding state
- All 9 predefined layouts (matching the current `grids.json`)
- Cross-fade transition (linear blend), opt-in via command
- Lock-step input policy: `keep_last` + `stale_timeout_ms`

**Deliverable:** layout switching from the CLI/FFmpeg `sendcmd`. Visual transitions.
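
The three `fit` modes from the scope list reduce to a small piece of geometry. The sketch below is an assumption about how they could be computed (the function name `fit_rect` and the centring policy are illustrative); it ignores the even-alignment that NV12 would additionally require.

```python
def fit_rect(src_w: int, src_h: int, cell_w: int, cell_h: int, mode: str):
    """Return (scaled_w, scaled_h, offset_x, offset_y) of the source in the cell.

    contain: whole image visible, letterboxed (offsets >= 0).
    cover:   cell fully filled, overflow cropped (offsets <= 0).
    stretch: aspect ratio ignored.
    """
    if mode == "stretch":
        return cell_w, cell_h, 0, 0
    sx, sy = cell_w / src_w, cell_h / src_h
    if mode == "contain":
        scale = min(sx, sy)
    elif mode == "cover":
        scale = max(sx, sy)
    else:
        raise ValueError(f"unknown fit mode: {mode}")
    w, h = round(src_w * scale), round(src_h * scale)
    # Centre the scaled image; negative offsets mean it is cropped symmetrically.
    return w, h, (cell_w - w) // 2, (cell_h - h) // 2
```

With `contain`, the area outside the returned rectangle is what the background fill covers; with `cover`, the negative offsets give the crop window to pass to the scaler.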

### PR-3 — Controller skeleton

**Scope:**
- `controller/` package: FastAPI app, MQTT subscribe/publish, ZMQ client (commands)+ pub (events)
- HA Discovery payloads for layout select + fps sensor + input_alive binary_sensor
- Command Router + Conflict Resolver (single-source-of-truth queue)
- State Store (in-memory + JSON snapshot)
- HTTP endpoints for `/layouts`, `/instance/.../layout`, `/instance/.../cell/.../bind`
- `/events` SSE
- ZMQ PUB events outbound
- Filter publishes events back: add `zmq_publish_event` on the filter side (a TX socket in parallel with the command RX socket), sending a simple tagged JSON line on the same socket
- Docker compose: filter + controller + mosquitto + HA mock
- Integration test: HTTP POST /layout/set → MQTT state retained → HA reflects

**Deliverable:** Production-grade control plane without overlays. Frigate-bridge `examples/`.

### PR-4 — Overlays: rect/text/icon basics

**Scope:**
- `OverlayStore` in controller (in-mem, MQTT retained sync)
- CUDA kernels: alpha-blit RGBA texture → NV12, glyph blit
- Text rendering: **CPU-precomputed glyph atlas** (cairo renders glyphs at startup → CUDA texture). No Pango call on every frame.
- Icon rendering: preloaded sprite sheet (configured iconpack)
- `process_command`: `set_overlay`, `clear_overlay` (JSON через `arg`)
- TTL handling (controller-side timers)
- Privacy profile filtering
- Frigate bridge sample: detection bbox → rect+text overlay with TTL

**Deliverable:** Frigate detections visible in the mosaic with privacy controls.

### PR-5 — Image/dim/graph/chat overlays

**Scope:**
- Image overlay: lazy-load by path, upload into the CUDA texture cache
- Dim overlay: simple luma-multiply kernel
- Graph overlay: cairo CPU rendering, IPC-handle channel controller↔filter
- Chat overlay: rolling message list, cairo rendering + fade-out per line
- Side-channel protocol: Unix socket for texture handles (see §5.3)
- HTTP endpoint `/instance/.../chat/.../message`
- HA Discovery: add a `sensor` for the current graph value
- Performance bench: 8 overlays @ 25fps on an RTX 5090, target <0.5ms compose time

**Deliverable:** The full overlay roadmap. Weather widget + LPR scroll + motion timeline in production.

### PR-6 — Audio orchestration

**Scope:**
- `orchestration/audio_orchestrator.py` — state machine engine (transitions library)
- YAML rule configs (`orchestration_rules.yaml`)
- FFmpeg ZMQ client is extended to audio targets (`amix`, `volume`, `sidechaincompress`)
- 3 example rules: doorbell, baby-cry detection, motion-night
- `audio_ducked` / `audio_restored` events
- HTTP `/audio/duck` for manual triggers
- E2E test: simulated MQTT doorbell event → music ducks + grid switches + overlay shown + restore

**Deliverable:** The complete control plane platform. Closes [gx/cctv#22](https://git.goldix.org/gx/cctv/issues/22) Phase 4.

---

## 12. Risks / open questions

| # | Risk | Mitigation / current stance |
|---|---|---|
| R1 | FFmpeg `process_command` arg is a single string; complex overlays need JSON. Is there a length limit? | FFmpeg AVOption parsing accepts strings up to ~4KB. Larger payloads go through the side channel (see §5.3). We will document the limit in the filter README. |
| R2 | Upstream FFmpeg: will `vf_cuda_grid` be accepted? | Upstream is not on the critical path. Once we reach v0.3 (mature) we submit a PR. Until then it stays an out-of-tree patch (as cuframes already does with `cuframesdec.c`). |
| R3 | NPP licensing: proprietary, CUDA-bundled | NPP ships with the CUDA Toolkit, which is required anyway. It does not violate the filter's LGPL (the filter is LGPL, NPP is dynamically linked). |
| R4 | Glyph atlas: which fonts ship? | DejaVu (free) preloaded; the user can override via config. Unicode: full BMP, plus CJK on request (lazy). |
| R5 | Multi-GPU is not supported in v0.1 (consistent with cuframes) | Documented. Multi-GPU = v0.3+. |
| R6 | Lock-step with widely differing input fps (cam@10fps + cam@30fps) | Tolerance window configurable per input. A slow camera gets `keep_last`. Documented as a known constraint. |
| R7 | Filter perf on a 16-cell layout @ 4K | Benchmark in PR-5. If it does not fit the frame budget, we recommend 1080p output for heavy compositions (4K cameras are already downscaled inside `cuda_grid`). |
| R8 | Cuframes-IPC + filter sharing: interaction with the frame pool lifecycle | Verify that the `hw_frames_ctx` from the cuframes demuxer is ref-shared correctly through `split` → filter. An integration test should surface this risk. |
| R9 | Controller restart drops live overlay state | MQTT retained messages for declared overlays + a JSON snapshot as local cache. A restart restores from both. Ephemeral TTL overlays are lost, which is acceptable. |
| R10 | Conflict-resolution priority is a UX problem (the user does not understand why their command was a no-op) | Every dropped command emits a `command_overridden{by=...}` event, visible in the UI/log. We will document the priority defaults. |
| R11 | Text/glyph rendering for CJK/RTL | Phase 1: LTR Latin/Cyrillic. CJK/RTL is an open question, deferred to v0.4 (after Phase 6). |
| R12 | Cross-fade between layouts with different aspect/size | Cross-fade is a pixel blend of two already-composed canvases of identical size (output_size is fixed per instance). Safe. |
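| R13 | Audio orchestration: race conditions between MQTT events | Single asyncio queue + state machine = serialized. The core `transitions` `Machine` does no locking of its own (the library ships `LockedMachine` for multi-threaded use), so every trigger must go through the single queue consumer. |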

**Open questions to discuss with the team:**

1. **CUDA min version.** Cuframes v0.1 targets CUDA 12+. Stick with that, or go down to 11.8 for wider compatibility? **Stance: CUDA 12+, explicitly declared.**
2. **MQTT vs NATS** for events? MQTT is universal (HA, Frigate); NATS is about performance. **Stance: MQTT primary (ecosystem fit); NATS is not needed now.**
3. **Layout DSL: JSON vs YAML?** The current cctv-processor uses JSON. **Stance: JSON for machine-generated/REST-API content, YAML for human-edited rules. One schema, two serialization frontends via pydantic.**
4. **Glyph atlas vs FreeType-on-CUDA?** **Stance: atlas (proven, simple). FreeType-on-CUDA is too exotic for v0.1.**
5. **License: LGPL vs MIT?** The filter inherits LGPL from FFmpeg (which is LGPL). The controller is a separate codebase, MIT. **Stance: dual, `filter/` LGPL-2.1+, `controller/` MIT.** Clearly documented in LICENSE and per-subdirectory LICENSE files.
6. **Frigate camera ID space.** Frigate camera names are strings, our `camera_id` is an int. **Stance: the controller keeps a name↔index mapping as an abstraction layer.**

---

## 13. Migration path for cctv-processor

**Current cctv-processor pipeline:**

```
cuframes → cv::Mat (GPU→host download) → GridComposer (CPU OpenCV) →
swscale → host→GPU upload (for Frigate detect) → h264_nvenc → RTSP-out
```

**After migrating to vf_cuda_grid:**

```
cuframes://cam[1..4] ─►
                       ┌─ split=2 ─► cuda_grid=instance_id=tv (layout=main_plus_preview) ─► h264_nvenc ─► RTSP-tv
                       └─        ─► (preview-only, not scaled for Frigate)
```

`cctv-processor` itself **disappears as a frame processor**. What remains:

- `GridManager` business logic (auto-switching on motion, priority switching, history): **moves into the controller** as orchestration rules (Python).
- `SnapshotManager`: becomes a separate small FFmpeg pipe that reads the same output (HD snapshot every 5 min).
- `EventSystem` internal bus: retired, replaced by ZMQ events.
- `cctv-processor` REST API endpoints: become endpoints in `cuda-grid-controller` (1:1 URL-scheme migration + redirects).
- `cameras.json`, `grids.json`, `analytics.json`: **stay**; the controller reads them and translates them into our `layouts.json` schema (compat converter).

### Migration steps

1. **PR-A in the cctv repo:** add a compat layer so that cctv-processor can pull composed frames from the vf_cuda_grid output via `cuframes_packets://`. Pure shadow mode: both systems run in parallel and the output is identical (compare-test).
2. **PR-B in localhost-infra:** docker-compose adds the `cuda-grid-controller` + `ffmpeg-mosaic` containers next to cctv-processor.
3. **Production cutover:** switch the TV stream from the cctv-processor RTSP to the ffmpeg-mosaic RTSP. cctv-processor stays on for snapshots + analytics (for a short time).
4. **PR-C:** snapshots move to a separate slim FFmpeg pipe.
5. **PR-D:** analytics (motion events) go from Frigate straight into the controller (Frigate bridge). cctv-processor decommissioned. Closes [gx/cctv#22](https://git.goldix.org/gx/cctv/issues/22) Phase 4.

### Compat converter `grids.json` → `layouts.json`

A simple Python script in `examples/cctv-processor-migrate/`. Field mapping:

| old `grids.json` | new `layouts.json` |
|---|---|
| `grid_templates.<name>.cells[].camera_id` | runtime state (per-instance cell binding), extracted separately |
| `cells[].x/y/width/height` | `cells[].x/y/w/h` (same 0..1 range) |
| `border/label` | moved into an overlay template or defaults |
| `motion_indication` | overlay-template `motion_indicator` (rect with alpha; the controller adds it on a motion event) |
| `transition_settings.duration_ms` | `set_layout_transition` arg |
| `default_grid` | `instances.<inst>.default_layout` |
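
A sketch of the core of such a converter, covering the geometry, binding and default-layout rows of the mapping. The exact shapes of both files are assumptions here: the input is taken to be `{"grid_templates": {name: {"cells": [...]}}, "default_grid": ...}` and the output layout is a guess at `layouts.json`; only the field names come from the table above.

```python
def convert_grids(grids: dict, instance: str):
    """Return (layouts_doc, bindings).

    layouts_doc: geometry-only layout definitions plus the instance default.
    bindings:    per-layout {cell_index: camera_id}, kept as runtime state.
    """
    layouts, bindings = {}, {}
    for name, tpl in grids.get("grid_templates", {}).items():
        cells, bound = [], {}
        for idx, cell in enumerate(tpl.get("cells", [])):
            # width/height are renamed to w/h; values stay in the 0..1 range.
            cells.append({"x": cell["x"], "y": cell["y"],
                          "w": cell["width"], "h": cell["height"]})
            if "camera_id" in cell:
                bound[idx] = cell["camera_id"]   # not part of the layout itself
        layouts[name] = {"cells": cells}
        bindings[name] = bound
    doc = {"layouts": layouts}
    if "default_grid" in grids:
        doc["instances"] = {instance: {"default_layout": grids["default_grid"]}}
    return doc, bindings
```

Returning the bindings separately keeps layout definitions reusable across instances; the controller would replay them as `bind_cell` commands after the layout is applied.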

---

## 14. Overlap with the `cctv-processor` MQTT plugin (gx/cctv#24)

**TL;DR: this is one and the same controller.** Not two.

### Overlap analysis

| Feature | cctv #24 MQTT plugin | cuda-grid-controller |
|---|---|---|
| MQTT subscribe for layout switch commands | ✓ | ✓ |
| HA Discovery for layout selector | ✓ | ✓ |
| MQTT state publishing | ✓ | ✓ |
| Publishing events externally | ✓ | ✓ |
| Internal event bus integration | EventSystem (C++ in-process) | ZMQ events |
| HTTP REST | partial | ✓ |
| Audio orchestration | — | ✓ |
| Overlay control | — | ✓ |

The overlap is 60-70%. More importantly, cctv-processor migrates to vf_cuda_grid (see §13). After the migration cctv-processor no longer needs its own MQTT plugin: that function already lives in `cuda-grid-controller`.

### Recommendation

1. **`gx/cctv#24`: do not build it standalone.** It becomes the epic "Adopt cuda-grid-controller MQTT/HA-Discovery for cctv-processor mosaic management", with a dependency on vf-cuda-grid PR-3.
2. **cuda-grid-controller becomes the canonical control plane** for grid-style video composition in our ecosystem. It is designed on the assumption that it may eventually drive not only vf_cuda_grid but other grid producers as well (any FFmpeg pipeline with standard ZMQ command targets). This is **already the right design**: the controller talks to the filter over a generic ZMQ protocol that is not specific to cuda_grid.
3. **Extract path:** once the controller matures (≥ Phase 3) and a second consumer appears (for example a GStreamer pipeline in another product), extract it into a separate repo, `gx/grid-controller`. Until then it lives in `gx/vf-cuda-grid/controller/`. A reversible move that does not require redesigning the API.

### What changes in cctv#24

The issue is reworded:

> ~~"Add MQTT plugin to cctv-processor"~~ →
> **"Migrate cctv-processor mosaic management to `cuda-grid-controller` (depends on `gx/vf-cuda-grid` PR-3)"**

Acceptance: cctv-processor is either decommissioned (§13 happy path), or its mosaic-control logic calls our controller's HTTP API (compatibility mode).

---

## Next steps (ordered, ready to start)

1. Create repo `gx/vf-cuda-grid` with a skeleton (README, ROADMAP, LICENSE-dual, empty `filter/`, `controller/`, `schema/`, `examples/`, `docs/`).
2. Open issue `gx/vf-cuda-grid#1` ("Design accepted") and paste this document as the issue body.
3. Update `gx/cctv#24`: reword it as "depends on vf-cuda-grid PR-3" and close the standalone scope.
4. Update the `gx/cuframes` ROADMAP entry "Future ideas → `vf_cuda_grid`": mark it as **moved to** the `gx/vf-cuda-grid` repo.
5. Start PR-1 (MVP filter, fixed quad) as a separate issue/branch.

---

**Relevant files reviewed for this design:**
- `/home/claude/projects/cctv/cpp/apps/cctv-processor/include/grid/GridComposer.h`
- `/home/claude/projects/cctv/cpp/apps/cctv-processor/include/grid/GridManager.h`
- `/home/claude/projects/cctv/cpp/apps/cctv-processor/config/grids.json`
- `/home/claude/projects/cuframes/ROADMAP.md` (section "Future ideas → `vf_cuda_grid`")
- `/home/claude/projects/cuframes/README.md`
- `/home/claude/projects/cuframes/filter/cuframesdec.c` (existing out-of-tree FFmpeg patch pattern, the model for ours)