gx 8a6afa53b3 initial: README + design document (architect-reviewed)

Design document (1124 lines) by ai-systems-architect, covering:
- High-level architecture (filter + sidecar + protocols)
- Component design + CUDA composition algorithm
- Layout DSL + dynamic creation
- Overlay system (7 types: rect/text/icon/image/dim/graph/chat)
- Control plane (ZMQ/MQTT/HTTP/HA Discovery, commands IN + events OUT)
- Audio orchestration (intercom/domofon ducking use case)
- Multi-instance behaviour (shared inputs, per-screen layout)
- Library choice: Python (FastAPI + asyncio)
- 6-phase implementation plan
- Migration path for cctv-processor (closes gx/cctv#22 Phase 4)
- Overlap analysis with gx/cctv#24 (superseded by cuda-grid-controller)

README: short description + use cases + architecture diagram + phase table.

Implementation starts once the design is ratified and the Phase 1 issue is opened.

2026-05-19 20:36:47 +01:00

Design: `vf-cuda-grid` — GPU-native video grid composer with control plane sidecar

Repo (recommended name): gx/vf-cuda-grid

Alternatives considered:

gx/vf_cuda_grid: matches FFmpeg filter naming, but a hyphen in the repo name is more convenient in URLs and on the CLI. Rejected.
gx/cuda-grid: shorter, but hides that this is an FFmpeg filter and not a standalone tool. Rejected.
gx/ffmpeg-cuda-grid: accurate, but the ffmpeg- prefix suggests a fork of all of FFmpeg, whereas this is a filter patch. Rejected.
gx/vf-cuda-grid ✅: the vf- prefix follows the FFmpeg video-filter convention (as in vf_scale_cuda, vf_overlay_cuda), so the purpose is clear at a glance; the hyphen is repo-friendly.

The repo contains all three components (filter source, sidecar, docs/examples), so it is a monorepo product. Splitting it into 2-3 repos is premature: the components are tightly coupled through the protocol, and Phases 1-3 are easier to review together. Once the controller becomes multi-product (see §14, overlap with gx/cctv#24), it can be extracted into gx/grid-controller (sub-stable surface).

1. High-level architecture

┌──────────────────────────────────────────────────────────────────────────────┐
│                              HOST PROCESS (FFmpeg)                            │
│                                                                               │
│  cuframes://cam1 ─┐                                                            │
│  cuframes://cam2 ─┼─►┌──────────────────┐                                      │
│  cuframes://cam3 ─┤  │   vf_cuda_grid   │  ───►  scale_cuda? ──►  h264_nvenc  │
│  cuframes://cam4 ─┘  │   (instance #1)  │                            │        │
│                   ├─►│   target: TV-1   │                            ▼        │
│                   │  └────────▲─────────┘                       RTSP/SRT      │
│                   │           │ side data (overlays)                          │
│                   ├─►┌────────┴─────────┐                                      │
│                   │  │   vf_cuda_grid   │  ───►  h264_nvenc  ──►  RTSP        │
│                   │  │   (instance #2)  │                                      │
│                   │  │   target: TV-2   │                                      │
│                   │  └────────▲─────────┘                                      │
│                   │           │                                                │
│                   ├─►┌────────┴─────────┐                                      │
│                   │  │   vf_cuda_grid   │  ───►  h264_nvenc  ──►  WebRTC      │
│                   │  │   (instance #3)  │  (privacy-public)                    │
│                   │  └────────▲─────────┘                                      │
│                   │           │                                                │
│                   │  ┌────────┴────────┐                                       │
│                   │  │   zmq filter    │◄─── tcp://127.0.0.1:5555 (commands)  │
│                   │  └─────────────────┘                                       │
│                   │                                                            │
│   audio ──────────┴─►  amix / sidechaincompress (standard FFmpeg filters)     │
└────────────────────────────────────────────────────────────────────▲──────────┘
                                                                     │
                                                          process_command via zmq
                                                                     │
┌────────────────────────────────────────────────────────────────────┴──────────┐
│                       cuda-grid-controller (sidecar)                           │
│                                                                               │
│  ┌────────────────────┐   ┌────────────────────┐   ┌────────────────────┐    │
│  │ HTTP/REST + SSE    │   │ MQTT (paho)        │   │ ZeroMQ pub/sub     │    │
│  │ (FastAPI/aiohttp)  │   │ + HA Discovery     │   │ events outbound    │    │
│  └─────────┬──────────┘   └─────────┬──────────┘   └─────────▲──────────┘    │
│            └──────────────┬─────────┘                         │               │
│                           ▼                                   │               │
│                  ┌────────────────────┐                       │               │
│                  │  Command Router    │──────────────────────►│               │
│                  │  (idempotency,     │   serialised events   │               │
│                  │   conflict-resol)  │                       │               │
│                  └─────────┬──────────┘                       │               │
│                            ▼                                  │               │
│  ┌────────────────────┐  ┌─────────────────────┐  ┌──────────┴────────────┐  │
│  │ State Store        │  │ Layout Registry     │  │ Event Bus (internal)  │  │
│  │ (in-mem +          │  │ (global, versioned) │  │ (asyncio.Queue / Go   │  │
│  │  JSON snapshot)    │  │                     │  │  channel)             │  │
│  └────────────────────┘  └─────────────────────┘  └───────────────────────┘  │
│                            ▼                                                  │
│                  ┌────────────────────┐                                       │
│                  │ FFmpeg ZMQ Client  │──► tcp://127.0.0.1:5555 (commands)   │
│                  │  + Side-Data Pusher│──► AVFrame side-data injection       │
│                  └────────────────────┘    (via named pipe / shared mem)     │
└───────────────────────────────────────────────────────────────────────────────┘

State ownership:

What	Owner	Persistence	Why
Active layout per instance	filter (in-process)	none	hot path, needs low latency
Layout registry (definitions)	controller	JSON snapshot + versioned in-mem	survives FFmpeg restarts, can be shared between filter instances
Camera roster per instance	controller	JSON	modified at runtime, pushed to the filter via commands
Overlay state (live elements)	controller	in-mem	ephemeral, rebuilt from events
Audio routing state	controller	JSON	complex state machine, must be auditable
Statistics (fps, switches, errors)	controller (aggregated)	in-mem + Prometheus	observability

Ownership rule: the controller is the single source of truth for declared intent (what should be running). The filter owns the executing state (what is actually running). When the two diverge, the controller reconciles; the filter never does.
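Here is a minimal sketch of that reconcile rule. It assumes the controller can read back each instance's executing state as a layout name plus a cell→input map. `InstanceState` and `reconcile` are hypothetical names. The output uses the process_command names and arg formats from §2.1.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InstanceState:
    """Layout name plus cell -> input-index binding of one cuda_grid instance (hypothetical model)."""
    layout: str
    cells: dict[int, int] = field(default_factory=dict)


def reconcile(desired: InstanceState, actual: InstanceState) -> list[tuple[str, str]]:
    """Return the (command, arg) pairs that move `actual` towards `desired`.

    Only the controller runs this; the filter just executes what it receives.
    """
    cmds: list[tuple[str, str]] = []
    if desired.layout != actual.layout:
        cmds.append(("set_layout", f"name={desired.layout}"))
    for cell, input_idx in sorted(desired.cells.items()):
        if actual.cells.get(cell) != input_idx:
            cmds.append(("bind_cell", f"cell={cell};camera={input_idx}"))
    return cmds
```

Running it periodically, and after every FFmpeg restart, converges the filter on the declared intent. The filter never needs to know why the state changed.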

Repo structure:

gx/vf-cuda-grid/
├── README.md
├── ROADMAP.md
├── LICENSE                    # LGPL for the filter (inherited from FFmpeg) +
│                              # MIT for the controller (separate LICENSE-controller)
├── CHANGELOG.md
├── filter/                    # vf_cuda_grid.c + CUDA kernels + FFmpeg patch
│   ├── vf_cuda_grid.c
│   ├── vf_cuda_grid_kernels.cu
│   ├── vf_cuda_grid_overlays.cu
│   ├── ffmpeg-7.1-vf_cuda_grid.patch
│   └── README.md              # how to apply the patch to ffmpeg-patched
├── controller/                # cuda-grid-controller
│   ├── pyproject.toml         # (or go.mod if Go, see §10)
│   ├── src/cuda_grid_controller/
│   │   ├── api/               # HTTP, MQTT, ZMQ entrypoints
│   │   ├── domain/            # layouts, overlays, state — pure
│   │   ├── adapters/          # ffmpeg-zmq, mqtt, ha-discovery
│   │   └── orchestration/     # audio rules, event handlers
│   └── tests/
├── schema/                    # JSON-schema для layouts, overlays, events
│   ├── layout.schema.json
│   ├── overlay.schema.json
│   └── event.schema.json
├── examples/
│   ├── frigate-bridge/        # Frigate MQTT events → grid commands
│   ├── cctv-processor-migrate/
│   └── home-assistant/
├── docs/
│   ├── architecture.md        # this document, refined
│   ├── filter-api.md
│   ├── controller-api.md
│   ├── layout-dsl.md
│   ├── overlay-protocol.md
│   └── audio-orchestration.md
└── deploy/
    ├── docker/
    └── systemd/

Core architectural principle: the filter only knows how to compose pixels into the current frame. Everything about why a layout is chosen, whose events matter and which rules apply lives in the controller. This keeps the filter small, fast and upstreamable, while all business logic can be iterated in Python/Go without rebuilding FFmpeg.

2. Component design

2.1 `vf_cuda_grid` filter API

Filter declaration:

const AVFilter ff_vf_cuda_grid = {
    .name            = "cuda_grid",
    .description     = NULL_IF_CONFIG_SMALL("GPU-native multi-input grid composer"),
    .priv_class      = &cuda_grid_class,
    .priv_size       = sizeof(CudaGridContext),
    .init            = init,                          // appends `inputs` pads via ff_append_inpad()
    .uninit          = uninit,
    .activate        = activate,                      // pull-based lock-step input handling (see §3.3)
    .flags           = AVFILTER_FLAG_DYNAMIC_INPUTS | AVFILTER_FLAG_HWDEVICE,
    .flags_internal  = FF_FILTER_FLAG_HWFRAME_AWARE,
    .inputs          = NULL,                          // dynamic: pads are created in init(), as in xstack
    FILTER_OUTPUTS(cuda_grid_outputs),                // also sets nb_outputs
    FILTER_QUERY_FUNC(query_formats),                 // AV_PIX_FMT_CUDA only
    .process_command = process_command,               // runtime control
};

Filter options (CLI):

cuda_grid=
  inputs=4:                          # number of inputs (as in hstack/xstack)
  layout=quad:                       # initial layout name (from the registry or inline)
  output_size=1920x1080:
  layout_file=/etc/grids.json:       # static layout registry
  instance_id=tv-living-room:        # unique ID of this instance in the graph
  zmq_addr=tcp://127.0.0.1:5555:     # zmq endpoint whose process_command messages apply here
                                     # (inside a real filtergraph, escape ':' in the value as '\:')
  privacy_profile=public:            # named profile that filters overlays
  fps=25                             # output fps (inputs are resampled if their rates differ)

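Multi-instance in one filter_complex: each cuda_grid has an instance_id, which the controller uses to address commands. The graph also names each instance `cuda_grid@<instance_id>`, because FFmpeg routes commands by filter instance name. Inputs are shared with FFmpeg's native split filter. split takes a single input, so the graph uses one split per camera with one output per instance:

[cam1]split=3[a1][a2][a3];
[cam2]split=3[b1][b2][b3];
[cam3]split=3[c1][c2][c3];
[cam4]split=3[d1][d2][d3];
[a1][b1][c1][d1]cuda_grid@tv1=inputs=4:instance_id=tv1:layout=quad[out1];
[a2][b2][c2][d2]cuda_grid@tv2=inputs=4:instance_id=tv2:layout=nine_grid[out2];
[a3][b3][c3][d3]cuda_grid@public=inputs=4:instance_id=public:privacy_profile=public:layout=quad[out3]

split already works with CUDA frames: it passes frame references, so nothing is copied. Each of the 4 cameras is decoded once and the frames are ref-shared across the 3 instances.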
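With more cameras or screens, generating the graph is easier than writing it by hand. Below is a sketch of such a generator. It assumes one split per camera and instances named `cuda_grid@<instance_id>` (FFmpeg's `filter@name` syntax), so that commands can target each one. `build_filter_complex` and the `<camera>_<n>` pad labels are hypothetical.

```python
def build_filter_complex(cameras: list[str], instances: dict[str, str]) -> str:
    """Build a filter_complex string (hypothetical helper).

    cameras:   input pad labels, e.g. ["cam1", "cam2"]
    instances: instance_id -> extra cuda_grid options, e.g. {"tv1": "layout=quad"}
    """
    n = len(instances)
    chains: list[str] = []
    # One split per camera: [cam1]split=3[cam1_0][cam1_1][cam1_2]
    for cam in cameras:
        outs = "".join(f"[{cam}_{i}]" for i in range(n))
        chains.append(f"[{cam}]split={n}{outs}")
    # One named cuda_grid per instance; instance #i consumes copy #i of every camera.
    for i, (inst, opts) in enumerate(instances.items()):
        ins = "".join(f"[{cam}_{i}]" for cam in cameras)
        extra = f":{opts}" if opts else ""
        chains.append(
            f"{ins}cuda_grid@{inst}=inputs={len(cameras)}:instance_id={inst}{extra}[out_{inst}]"
        )
    return ";".join(chains)
```

The controller could emit the same graph it later addresses. This keeps instance names and zmq targets in one place.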

process_command API (runtime):

Command	Args	Side effect
`set_layout`	`name=<layout_name>`	switch the instance's active layout
`set_layout_transition`	`name=<>;duration_ms=500;type=fade`	switch layouts with a cross-fade
`bind_cell`	`cell=<n>;camera=<input_idx>`	per-cell camera→input mapping
`swap_cells`	`a=<n>;b=<m>`	swap two cells
`set_privacy_profile`	`profile=<name>`	switch overlay filtering
`set_overlay`	`id=<>;json=<...>`	add/replace overlay
`clear_overlay`	`id=<>`	delete overlay
`clear_all_overlays`	`cell=<n>?`	flush all overlays, or only those of one cell
`set_text`	`id=<>;text=<utf8>`	fast shortcut for updating a text overlay

Commands reach the filter graph through FFmpeg's zmq filter, which routes each one by its target. The filter implements process_command (FFmpeg signature: int (*process_command)(AVFilterContext*, const char *cmd, const char *arg, char *res, int res_len, int flags)).

Limitation of FFmpeg's process_command: the argument is a single string. Complex overlay payloads are therefore sent as a JSON string inside arg. Bulk data goes through AVFrame side data (see §5).

Internal context:

typedef struct CudaGridContext {
    const AVClass *class;
    
    // config
    int nb_inputs;
    int output_w, output_h;
    char *instance_id;
    char *layout_file;
    char *zmq_addr;
    char *privacy_profile;
    
    // CUDA
    AVBufferRef *hw_device_ref;       // CUDA device
    AVBufferRef *hw_frames_ref;       // output frames pool
    CUcontext cu_ctx;
    CUstream cu_stream;
    
    // layouts
    LayoutRegistry *registry;          // shared (see §9)
    Layout *active_layout;             // current
    Layout *prev_layout;               // for cross-fade
    LayoutTransition transition;       // active transition state
    
    // per-cell camera binding (cell_idx → input_idx)
    int cell_to_input[MAX_CELLS];
    
    // overlays
    OverlayStore *overlays;            // per-instance state
    GpuOverlayCache *gpu_cache;        // pre-uploaded textures, glyph atlases
    
    // input frame queue (lock-step buffering, see §3.3)
    InputFrameQueue queues[MAX_INPUTS];
    int64_t last_pts;
    
    // statistics
    GridStats stats;
} CudaGridContext;

2.2 `cuda-grid-controller` sidecar: modules

controller/src/cuda_grid_controller/
├── domain/
│   ├── layout.py             # Layout, Cell, LayoutRegistry — pure value-objects (pydantic)
│   ├── overlay.py            # Overlay primitives — pydantic discriminated union
│   ├── instance.py           # FilterInstance — state of one cuda_grid filter
│   ├── audio_state.py        # AudioRoutingState, DuckingRule
│   └── events.py             # Event taxonomy (see §7)
├── api/
│   ├── http/                 # FastAPI app, SSE endpoint
│   ├── mqtt/                 # paho-mqtt handlers + HA Discovery
│   ├── zmq/                  # asyncio pyzmq pub-socket (events out)
│   └── schemas/              # request/response models
├── adapters/
│   ├── ffmpeg_zmq_client.py  # sends process_command messages to FFmpeg
│   ├── side_data_pusher.py   # AVFrame side data via named pipe / unix socket
│   ├── ha_discovery.py       # MQTT discovery config publisher
│   └── frigate_bridge.py     # subscribe Frigate MQTT events → translate
├── orchestration/
│   ├── command_router.py     # routes commands from all sources into one pipeline
│   ├── conflict_resolver.py  # see §7.4: last-write-wins + per-source priority
│   ├── audio_orchestrator.py # state machine: domofon→duck→swap_screen→restore
│   └── overlay_lifecycle.py  # TTL, throttling, debouncing for charts/chats
├── stores/
│   ├── memory.py             # MemoryStateStore (default)
│   └── redis.py              # RedisStateStore (multi-controller HA, future)
└── cli.py                    # entrypoint, config loading

Separation of responsibilities:

domain/: pure Python, no I/O, testable without mocks. Pydantic models with validation.
api/: three entrypoints (HTTP, MQTT, ZMQ). Each normalises an incoming command into the internal Command DTO and hands it to command_router.
orchestration/: listens to events, applies rules, emits commands.
adapters/: the outbound side (toward FFmpeg and Frigate); can be mocked in tests.

Config hot reload: on SIGHUP the controller re-reads the layout/camera JSON configs without a restart. In-memory state is preserved.

3. Composition algorithm

3.1 CUDA pipeline для одного output frame

for each output frame (at target fps):
  1. collect N input frames (from the input queues, see lock-step below)
  2. allocate output AVFrame (NV12, CUDA) из hw_frames_ref pool
  3. clear background (CUDA memset → background_color)
  4. for each visible cell c in active_layout:
       input_idx = cell_to_input[c.cell_idx]
       in_frame = input_frames[input_idx]
       src_rect, dst_rect = compute_rects(in_frame, c, output_w, output_h)
       
       if c.size == in_frame.size and c.no_scale:
           # fast-path: pure NV12 region memcpy (Y and UV planes separately)
           cuMemcpy2DAsync(...)
       else:
           # scale-blit: NPP nppiResize_* or a custom bilinear kernel
           launch_scale_kernel(in_frame, out_frame, src_rect, dst_rect, stream)
       
       if c.border or motion_indication:
           launch_border_kernel(out_frame, c.rect, c.color, c.width, stream)
  
  5. apply overlays (see §5):
       sort overlays by z_index
       for each overlay o:
           if o.cell_id is set and that cell is not in active_layout: skip
           if privacy_profile_excludes(o): skip
           launch_overlay_blit_kernel(out_frame, o.gpu_texture, o.rect, o.alpha, stream)

  6. if transition.active:
       blend(prev_layout_frame, current_layout_frame, t): linear mix of the two proxy buffers

  7. cuStreamSynchronize (or record a CUDA event that the next filter waits on, see §3.4)
  8. push out_frame to the output link
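Step 4's `compute_rects` is where a cell's `fit` mode (§4.1) takes effect. The CPU sketch of the geometry below assumes integer pixel rects. It also assumes a centred crop for cover and a centred letterbox for contain. The `Rect` type is hypothetical, and the even-pixel rounding NV12 needs is left out.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def compute_rects(src_w: int, src_h: int, cell: Rect, fit: str = "cover") -> tuple[Rect, Rect]:
    """Return (src_rect, dst_rect) for drawing a src_w x src_h frame into `cell`.

    cover:   fill the cell, cropping the source around its centre
    contain: show the whole source, letterboxed inside the cell
    stretch: ignore the aspect ratio
    """
    full_src = Rect(0, 0, src_w, src_h)
    if fit == "stretch":
        return full_src, cell
    if fit == "contain":
        scale = min(cell.w / src_w, cell.h / src_h)
        w, h = round(src_w * scale), round(src_h * scale)
        return full_src, Rect(cell.x + (cell.w - w) // 2, cell.y + (cell.h - h) // 2, w, h)
    if fit == "cover":
        scale = max(cell.w / src_w, cell.h / src_h)
        # The region of the source that maps exactly onto the whole cell.
        w, h = round(cell.w / scale), round(cell.h / scale)
        return Rect((src_w - w) // 2, (src_h - h) // 2, w, h), cell
    raise ValueError(f"unknown fit mode {fit!r}")
```

With contain, the uncovered part of the cell keeps the cell `background` from the layout.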

3.2 CUDA kernels (what to write, what to reuse)

Operation	Implementation	Why custom (if custom)
NV12 region memcpy (no-scale)	`cuMemcpy2DAsync` × 2 (Y + UV planes)	standard
Bilinear scale NV12 → NV12	NPP `nppiResize_8u_C1R` for Y, interleaved UV plane separately	NPP is NVIDIA-provided and optimised; not written in-house
Border drawing	custom kernel (rect outline)	trivial; NPP would be overkill
Background clear	`cuMemsetD8Async` for Y, `cuMemsetD16Async` for the interleaved UV plane	standard
Alpha-blit RGBA texture → NV12	custom kernel	RGBA→YUV conversion plus alpha mix, which NPP does not do directly; ~80 lines of CUDA
Glyph blit (text)	custom kernel: 8-bit alpha mask + color	text is an alpha-only texture, blitted with a color tint
Dim/darken area	custom kernel: multiply Y by factor	trivial
Linear blend (cross-fade)	custom kernel: `out = a*frame1 + (1-a)*frame2`	trivial

Principle: where NPP covers an operation, use NPP (NVIDIA has already optimised it). Everything else gets small 50-100-line kernels with unit tests against known fixtures.
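Custom kernels like the RGBA→NV12 alpha-blit are easiest to test against a CPU reference on known fixtures. The per-pixel sketch below assumes BT.709 limited-range output and straight (non-premultiplied) alpha. Both are assumptions, since this document does not fix the colorimetry. The function names are hypothetical.

```python
def rgb_to_yuv709(r: int, g: int, b: int) -> tuple[int, int, int]:
    """8-bit full-range RGB -> 8-bit limited-range BT.709 Y'CbCr (assumed output colorimetry)."""
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    cb = (b - y) / 1.8556
    cr = (r - y) / 1.5748
    return (
        round(16 + y * 219 / 255),
        round(128 + cb * 224 / 255),
        round(128 + cr * 224 / 255),
    )


def blend(dst: int, src: int, alpha: float) -> int:
    """Straight-alpha mix, out = a*src + (1-a)*dst (same form as the cross-fade)."""
    return round(alpha * src + (1.0 - alpha) * dst)


def blit_pixel(dst_y: int, dst_u: int, dst_v: int, rgba: tuple[int, int, int, int]) -> tuple[int, int, int]:
    """Blend one RGBA overlay pixel onto one (Y, U, V) sample triple.

    In NV12 one U/V pair is shared by a 2x2 block of Y samples; a real kernel
    would average the overlay over that block first. This reference ignores it.
    """
    r, g, b, a = rgba
    y, u, v = rgb_to_yuv709(r, g, b)
    alpha = a / 255
    return blend(dst_y, y, alpha), blend(dst_u, u, alpha), blend(dst_v, v, alpha)
```

Fixtures such as opaque white on black (235, 128, 128) or pure red (63, 102, 240) then double as expected values for the CUDA kernel's unit tests.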

3.3 Lock-step inputs: what if not all inputs have arrived

The FFmpeg filter API is pull-based: frames are taken with ff_inlink_consume_frame() from the filter's activate callback. With N inputs, however, frames arrive out of sync, because RTSP cameras are not synchronised.

Strategy: adaptive PTS bucketing.

Each input has a ring queue (size = max_input_lag_frames, default 4).
Output frames are generated at target_fps (e.g. 25 fps → one every 40 ms).
For each output frame and each input, take the frame whose PTS is closest to output_pts, within ± tolerance (default ±20 ms).
If an input has no fresh frame, the configurable stale-frame policy applies:
- keep_last (default): render the last frame seen on that input
- black: fill the cell with black and a "NO SIGNAL" label
- freeze_and_warn: keep_last, plus a red border after stale_timeout_ms
If all inputs are stale, publish an inputs_starved event; output continues on a black background.

Why pull-based works here: output PTS follows our own wall clock, and each input contributes the best frame available. This is a hard real-time scenario, not a lossless mux.

Per-cell stale event: the controller receives cell_stale{cell=N, camera=cam2, last_seen_ms=3400} and publishes it over MQTT. HA then reacts ("camera 2 is not responding").
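Here is a sketch of the per-input selection. It assumes millisecond PTS values pushed in increasing order and one queue per input. In the filter this logic would live in C inside the activate callback. `InputQueue` and `pick_frame` are hypothetical, and the freeze_and_warn red border is left to the caller.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class InputQueue:
    """Ring queue of (pts_ms, frame) for one input, plus the last frame shown (hypothetical)."""
    maxlen: int = 4                                  # max_input_lag_frames
    frames: deque = field(default_factory=deque)
    last: tuple[int, object] | None = None

    def push(self, pts_ms: int, frame: object) -> None:
        self.frames.append((pts_ms, frame))
        while len(self.frames) > self.maxlen:         # over the lag budget: drop the oldest
            self.frames.popleft()


def pick_frame(q: InputQueue, out_pts: int, tolerance_ms: int = 20,
               policy: str = "keep_last") -> tuple[object | None, bool]:
    """Return (frame, fresh) for one input at output timestamp out_pts."""
    in_window = [f for f in q.frames if abs(f[0] - out_pts) <= tolerance_ms]
    if in_window:
        best = min(in_window, key=lambda f: abs(f[0] - out_pts))
        # Frames up to and including the chosen one can never be picked again.
        while q.frames and q.frames[0][0] <= best[0]:
            q.frames.popleft()
        q.last = best
        return best[1], True
    if policy == "black" or q.last is None:
        return None, False                            # caller draws black + "NO SIGNAL"
    return q.last[1], False                           # keep_last / freeze_and_warn
```

A `fresh=False` result on every input for the same output frame is the inputs_starved condition.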

3.4 Async vs sync CUDA execution

Each instance has its own CUstream, so instances compose in parallel.
Within one instance, all kernels are queued asynchronously on that stream, and a final cuEventRecord marks the output AVFrame as ready. The design expects hwcontext_cuda to carry this event as AVCUDADeviceContextInternal::ready_event. FFmpeg's public AVCUDADeviceContext exposes only one `stream` per device, so this hook has to be confirmed in, or added by, the patch.
The next stage (scale_cuda, h264_nvenc) calls cuStreamWaitEvent. The result is a true zero-copy GPU pipeline with no CPU blocking.

4. Layout DSL

4.1 JSON Schema (layouts.json)

{
  "version": "1",
  "schema": "https://git.goldix.org/gx/vf-cuda-grid/schema/layout.schema.json",
  "defaults": {
    "background": "#000000",
    "border": { "width": 2, "color": "#FFFFFF" },
    "label": { "enabled": true, "position": "top_left", "font_size": 16 }
  },
  "layouts": {
    "quad": {
      "title": "2×2 quad",
      "type": "predefined",
      "cells": [
        { "id": 0, "x": 0.0, "y": 0.0, "w": 0.5, "h": 0.5 },
        { "id": 1, "x": 0.5, "y": 0.0, "w": 0.5, "h": 0.5 },
        { "id": 2, "x": 0.0, "y": 0.5, "w": 0.5, "h": 0.5 },
        { "id": 3, "x": 0.5, "y": 0.5, "w": 0.5, "h": 0.5 }
      ]
    },
    "main_plus_preview": {
      "title": "Main + 3 previews",
      "type": "predefined",
      "cells": [
        { "id": 0, "x": 0.00, "y": 0.00, "w": 0.75, "h": 1.00, "role": "main" },
        { "id": 1, "x": 0.75, "y": 0.00, "w": 0.25, "h": 0.33, "role": "preview" },
        { "id": 2, "x": 0.75, "y": 0.33, "w": 0.25, "h": 0.33, "role": "preview" },
        { "id": 3, "x": 0.75, "y": 0.66, "w": 0.25, "h": 0.34, "role": "preview" }
      ]
    },
    "custom_3x4_with_dim": {
      "title": "Mixed",
      "type": "user",
      "cells": [
        { "id": 0, "x": 0.00, "y": 0.00, "w": 0.50, "h": 0.50,
          "z_index": 0, "cell_overlays": ["dim_if_no_motion"] },
        { "id": 1, "x": 0.50, "y": 0.00, "w": 0.50, "h": 0.50,
          "fit": "contain", "background": "#101010" }
      ]
    }
  },
  "default_layout": "quad",
  "instances": {
    "tv-living-room":   { "default_layout": "main_plus_preview", "privacy_profile": "private" },
    "tv-kitchen":       { "default_layout": "quad",              "privacy_profile": "private" },
    "public-stream":    { "default_layout": "quad",              "privacy_profile": "public"  }
  },
  "privacy_profiles": {
    "private": { "overlays_allow": ["*"] },
    "public":  { "overlays_deny": ["lpr_text", "face_name", "person_count"] }
  }
}
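The JSON names `overlays_allow` and `overlays_deny` but does not specify how they combine. Below is one possible reading of the `privacy_profile_excludes` check from §3.1, written as its inverse, `overlay_allowed`. Its rules are all assumptions:
- untagged overlays always pass;
- deny beats allow;
- glob patterns such as `*` are supported;
- a profile without an allow list allows everything not denied.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def overlay_allowed(privacy_tag: str | None, profile: dict) -> bool:
    """Apply one privacy profile to an overlay's privacy_tag (assumed semantics)."""
    if privacy_tag is None:
        return True                                   # assumption: untagged overlays are public
    if any(fnmatchcase(privacy_tag, p) for p in profile.get("overlays_deny", [])):
        return False                                  # assumption: deny wins over allow
    allow = profile.get("overlays_allow")
    return allow is None or any(fnmatchcase(privacy_tag, p) for p in allow)
```

Under these rules, the Frigate LPR text (tag `lpr_text`) is hidden on the `public` instance. The bbox (tag `frigate_bbox`) stays visible.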

Cell fields:

id (int): the cell's slot in the layout; the controller's bind_cell assigns a camera to it.
x, y, w, h (float 0..1): normalised coordinates (consistent with the current GridComposer).
role (string, optional): semantic hint (main, preview, pip) that the controller uses for auto-selection.
fit (cover | contain | stretch, default cover): behaviour on aspect-ratio mismatch.
background (hex): fill for the part of the cell outside the scaled frame (with contain).
z_index (int): stacking order for overlapping layouts (PiP).
cell_overlays (string[]): overlay templates applied automatically.

Camera binding is not part of the layout. The layout defines geometry; the camera→cell mapping is runtime state kept in the controller per instance. On set_layout the controller issues bind_cell automatically. For example, if the layout has cells [0,1,2,3] and the camera roster is [cam_front, cam_yard, cam_door, cam_garage], cameras are bound by index. The user can override this.
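Here is a sketch of the default index binding. It assumes cells are taken in ascending `id` order and that cells beyond the roster stay unbound. User overrides are applied afterwards. `auto_bind` is a hypothetical helper. Role-aware placement (e.g. the main camera into the `main` cell) is not covered.

```python
from __future__ import annotations


def auto_bind(cell_ids: list[int], roster: list[str],
              overrides: dict[int, str] | None = None) -> dict[int, str | None]:
    """Bind cameras to cells by index, then apply user overrides (hypothetical helper).

    Cells beyond the roster stay unbound (None); extra cameras are left out.
    """
    binding = {
        cell: roster[i] if i < len(roster) else None
        for i, cell in enumerate(sorted(cell_ids))
    }
    for cell, camera in (overrides or {}).items():
        if cell in binding:              # ignore overrides for cells this layout lacks
            binding[cell] = camera
    return binding
```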

4.2 In-memory representation (filter side)

typedef struct LayoutCell {
    int id;
    float x, y, w, h;
    int z_index;
    uint32_t bg_color;
    uint32_t border_color;
    int border_width;
    int fit_mode;                    // CELL_FIT_COVER/CONTAIN/STRETCH
    // resolved pixel rect (cached, invalidated when output_size changes)
    int px_x, px_y, px_w, px_h;
} LayoutCell;

typedef struct Layout {
    char name[64];
    int nb_cells;
    LayoutCell cells[MAX_CELLS];     // MAX_CELLS = 64 (16×4)
    uint64_t version;                // for cache invalidation
} Layout;

typedef struct LayoutRegistry {
    Layout *layouts;                 // dynamic array
    int nb_layouts;
    pthread_rwlock_t lock;           // rarely written, often read
    uint64_t global_version;
} LayoutRegistry;

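The layout registry is global (see §9): one process, one registry, shared by all filter instances. For set_layout name=X the filter looks X up in the registry and takes a pointer under the read lock instead of cloning it. Because `layouts` is a dynamic array, that pointer stays valid only if entries are never moved or freed while in use (e.g. immutable, versioned layouts).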
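The cached `px_*` fields could be resolved from the normalised cell as follows, shown in Python for brevity. Rounding cell edges rather than sizes keeps adjacent cells gap-free. Snapping to even pixels is an assumption motivated by NV12's 4:2:0 layout, where one U/V pair covers a 2×2 luma block. `resolve_cell` is a hypothetical name.

```python
def resolve_cell(x: float, y: float, w: float, h: float,
                 out_w: int, out_h: int) -> tuple[int, int, int, int]:
    """Normalised cell -> (px_x, px_y, px_w, px_h), with edges snapped to even pixels (hypothetical)."""
    def even(v: float) -> int:
        return 2 * round(v / 2)

    x0, y0 = even(x * out_w), even(y * out_h)
    # Round the far edges, not the sizes, so neighbours share edges exactly.
    x1, y1 = even((x + w) * out_w), even((y + h) * out_h)
    x1, y1 = min(x1, out_w), min(y1, out_h)      # clamp against the canvas
    return x0, y0, x1 - x0, y1 - y0
```

For `main_plus_preview` at 1920x1080, this turns the 0.33/0.33/0.34 preview column into rows of 356, 356 and 368 pixels, with shared edges at y=356 and y=712.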

5. Overlay system

5.1 7 primitive types

# domain/overlay.py (pydantic discriminated union)
from typing import Annotated, Literal

from pydantic import BaseModel, Field

class OverlayBase(BaseModel):
    id: str                          # unique; used for replace/delete
    instance_id: str | None = None   # None = broadcast all instances
    cell_id: int | None = None       # None = whole canvas, int = relative to that cell
    z_index: int = 100
    alpha: float = 1.0               # 0..1
    ttl_ms: int | None = None        # auto-delete after this many ms
    privacy_tag: str | None = None   # used for privacy_profile filtering
    visible: bool = True

class RectOverlay(OverlayBase):
    type: Literal["rect"] = "rect"
    x: float; y: float; w: float; h: float  # normalized
    color: str                        # #RRGGBBAA
    stroke_width: int = 0             # 0 = filled
    rounded_corners: int = 0          # px

class TextOverlay(OverlayBase):
    type: Literal["text"] = "text"
    text: str
    x: float; y: float
    font: str = "DejaVuSans"
    size: int = 16
    color: str = "#FFFFFF"
    background: str | None = None     # text bg box
    anchor: Literal["top-left","center","bottom-right",...] = "top-left"

class IconOverlay(OverlayBase):
    type: Literal["icon"] = "icon"
    icon: str                         # name from the preloaded sprite sheet
    x: float; y: float
    size: int = 32
    tint: str | None = None

class ImageOverlay(OverlayBase):
    type: Literal["image"] = "image"
    source: str                       # path or URL (loaded on first use)
    x: float; y: float; w: float; h: float

class DimOverlay(OverlayBase):
    type: Literal["dim"] = "dim"
    x: float; y: float; w: float; h: float
    factor: float = 0.5               # 0..1, luma multiplier

class GraphOverlay(OverlayBase):
    type: Literal["graph"] = "graph"
    kind: Literal["line","bar","histogram","sparkline"]
    x: float; y: float; w: float; h: float
    data_source: str                  # symbolic ID of the feed that pushes data
    refresh_rate_hz: float = 2.0
    style: dict                       # passthrough в renderer

class ChatOverlay(OverlayBase):
    type: Literal["chat"] = "chat"
    x: float; y: float; w: float; h: float
    max_lines: int = 5
    line_ttl_ms: int = 8000
    font_size: int = 14
    # messages are pushed separately via add_chat_message

Overlay = Annotated[
    RectOverlay | TextOverlay | IconOverlay | ImageOverlay | 
    DimOverlay | GraphOverlay | ChatOverlay,
    Field(discriminator="type")
]

5.2 Where overlay state lives

Declarative state (active overlays): in the controller, as OverlayStore (instance_id → list[Overlay]).
GPU texture cache: in the filter, as GpuOverlayCache (overlay_id → CUDA texture/array), populated lazily on first render.
Live data feeds for graphs/chats: the controller pumps data, renders it on the CPU (cairo) under a rate limit, uploads a CUDA texture and notifies the filter.
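Overlay TTL (`ttl_ms`) and the overlay_expired event can be handled with a deadline heap in the controller's OverlayStore. Here is a sketch. It assumes overlays are plain dicts keyed by `id`, that re-sending an overlay restarts its TTL, and that the current time is passed in explicitly. The class layout is hypothetical.

```python
import heapq
import itertools


class OverlayStore:
    """instance_id -> {overlay_id: overlay}, with ttl_ms expiry (hypothetical sketch)."""

    def __init__(self) -> None:
        self._items: dict[str, dict[str, dict]] = {}
        self._deadlines: list[tuple] = []                 # heap of (deadline, seq, instance, oid)
        self._expiry: dict[tuple[str, str], int] = {}     # current deadline per live overlay
        self._seq = itertools.count()                     # tie-breaker for equal deadlines

    def set(self, instance_id: str, overlay: dict, now_ms: int) -> None:
        key = (instance_id, overlay["id"])
        self._items.setdefault(instance_id, {})[overlay["id"]] = overlay
        ttl = overlay.get("ttl_ms")
        if ttl is None:
            self._expiry.pop(key, None)                   # replacing without TTL cancels expiry
            return
        self._expiry[key] = now_ms + ttl
        heapq.heappush(self._deadlines, (now_ms + ttl, next(self._seq), *key))

    def expire(self, now_ms: int) -> list[tuple[str, str]]:
        """Drop overlays whose TTL has passed; return them for overlay_expired events."""
        expired = []
        while self._deadlines and self._deadlines[0][0] <= now_ms:
            deadline, _, inst, oid = heapq.heappop(self._deadlines)
            if self._expiry.get((inst, oid)) != deadline:
                continue                                  # stale heap entry: overlay was replaced
            del self._expiry[(inst, oid)]
            self._items[inst].pop(oid, None)
            expired.append((inst, oid))
        return expired

    def active(self, instance_id: str) -> list[dict]:
        return list(self._items.get(instance_id, {}).values())
```

The controller would call `expire(now_ms)` from a periodic tick fed by `time.monotonic()`. It then sends `clear_overlay` for each returned pair.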

5.3 How overlays reach the filter

Two channels, depending on the payload type:

Mutable lightweight overlays (rect/text/icon/dim) go through process_command:
```
set_overlay id=event_42;json={"type":"rect","cell_id":0,"x":0.1,...}
```
The filter parses the JSON, caches it and renders it. Update: the same set_overlay with the same id. Delete: clear_overlay id=event_42.
Heavy overlays (image/graph/chat, which need RGBA pixel buffers) go through AVFrame side data:
- The controller renders on the CPU (cairo) and uploads into shared GPU memory (allocated with cuMemAlloc; the IPC handle is shared over a cuframes-style channel).
- Side data type: a custom type, written here as AV_FRAME_DATA_USER + offset. Upstream FFmpeg defines no user range, so the patch must add one. The payload holds overlay_id, a CUDA IPC handle to the texture, and dirty_flag.
- The filter opens the handle once and caches it. When dirty_flag=true it refreshes the texture.

An alternative to side data is a plain Unix socket between controller and filter, with a protocol along the lines of "here is the IPC handle for overlay X's new texture". It is less coupled to FFmpeg's AVFrame and easier to debug. Recommendation: use the socket for Phase 1 and AVFrame side data for Phase 2. Phase 2 is when detection bboxes should come directly from an upstream filter via side data, e.g. from vf_detect or the Frigate bridge.
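For the Unix-socket variant, each message needs little more than the overlay id, a version and the 64-byte CUipcMemHandle. The filter would open that handle with cuIpcOpenMemHandle. Below is a framing sketch; the field layout, the 32-byte id limit and the length prefix are assumptions, not a fixed protocol.

```python
import struct

CU_IPC_HANDLE_SIZE = 64          # size of CUipcMemHandle in the CUDA driver API

# Hypothetical "texture_updated" body: overlay_id (32 bytes, NUL-padded UTF-8),
# version (u64), width, height, pitch (u32 each), then the raw IPC handle.
_HEADER = struct.Struct("<32sQIII")


def pack_texture_update(overlay_id: str, version: int, width: int, height: int,
                        pitch: int, ipc_handle: bytes) -> bytes:
    if len(ipc_handle) != CU_IPC_HANDLE_SIZE:
        raise ValueError("CUipcMemHandle must be 64 bytes")
    oid = overlay_id.encode()
    if len(oid) > 32:
        raise ValueError("overlay_id longer than 32 bytes")
    body = _HEADER.pack(oid, version, width, height, pitch) + ipc_handle
    return struct.pack("<I", len(body)) + body   # u32 length prefix for stream framing


def unpack_texture_update(msg: bytes) -> dict:
    (length,) = struct.unpack_from("<I", msg)
    body = msg[4:4 + length]
    oid, version, width, height, pitch = _HEADER.unpack_from(body)
    return {
        "overlay_id": oid.rstrip(b"\0").decode(),
        "version": version, "width": width, "height": height, "pitch": pitch,
        "ipc_handle": body[_HEADER.size:_HEADER.size + CU_IPC_HANDLE_SIZE],
    }
```

The version field lets the filter skip a re-open when it already holds the current texture. This plays the role of dirty_flag in the side-data variant.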

5.4 GPU rendering pipeline for graphs/chats

controller event-loop                 sidecar→filter channel              filter (GPU)
─────────────────────                ──────────────────────              ────────────
event → update_graph_data(g, value)
  ↓
graph_renderer.queue(g)               
  ↓
[rate-limited — refresh_hz]
cairo.render(g.data) → RGBA buf      
  ↓                                  
cuda_upload(buf) → device texture     
                                     side-channel:
  ↓                                  texture_updated(overlay=g, handle=H, version=V)
                                     ─────────────────────────────────────────►
                                                                                ↓
                                                                       gpu_cache[g] = H
                                                                       mark_dirty(g)
                                                                                ↓
                                                                       (next frame) blit with alpha

Critical detail: chat and graph overlays live outside the frame timeline. They update at the event rate (chat: when a message arrives; graph: on a once-per-second tick). On every video frame the filter simply blits the latest cached texture, so the CPU does no per-frame work.
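The "rate-limited, refresh_hz" step in the diagram can be a per-overlay dirty flag plus a minimum interval between renders. Here is a sketch. It assumes time is passed in as seconds (e.g. from `time.monotonic()`). `RefreshGate` is a hypothetical name.

```python
from __future__ import annotations


class RefreshGate:
    """Coalesce graph/chat updates to at most refresh_rate_hz renders (hypothetical).

    update() only marks the overlay dirty; poll(now) decides whether to re-render.
    Updates that arrive faster than the rate fold into the next render, never queue.
    """

    def __init__(self, refresh_rate_hz: float) -> None:
        self.min_interval = 1.0 / refresh_rate_hz
        self.last_render: float | None = None
        self.dirty = False

    def update(self) -> None:
        self.dirty = True

    def poll(self, now: float) -> bool:
        if not self.dirty:
            return False
        if self.last_render is not None and now - self.last_render < self.min_interval:
            return False
        self.last_render, self.dirty = now, False
        return True
```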

6. Side data + overlay producers

Who produces overlay payloads:

Source	Path	Example
Controller (default)	HTTP/MQTT/ZMQ → command router → `set_overlay`	UI click: "add a border to cam3"
Frigate events bridge	`frigate_bridge.py` subscribes to MQTT `frigate/events` → translate → `set_overlay`	bbox on detection
External script	`curl -X POST /v1/instance/{id}/overlays`	crontab: "show weather widget at 8am"
HA automation	MQTT publish	"on an intercom call, show overlay 'door'"
Upstream filter (future)	AVFrame side data `AV_FRAME_DATA_DETECTION_BBOXES`	if detection runs inside the FFmpeg graph

Frigate bridge, a concrete example:

# adapters/frigate_bridge.py
import json

# `commands`, camera_to_cell() and RectOverlay/TextOverlay come from the controller.
# Frigate reports `box` as [x_min, y_min, x_max, y_max] in pixels; frame_width/frame_height
# are the fields this design assumes for normalising them.
async def on_frigate_event(payload):
    ev = json.loads(payload)
    if ev["type"] != "new":
        return
    after = ev["after"]
    W, H = after["frame_width"], after["frame_height"]
    x1, y1, x2, y2 = after["box"]
    cell = camera_to_cell(after["camera"])
    await commands.set_overlay(RectOverlay(
        id=f"frigate_{after['id']}",
        cell_id=cell,
        x=x1 / W, y=y1 / H,
        w=(x2 - x1) / W, h=(y2 - y1) / H,     # box holds corners, not width/height
        color="#FF0000A0",
        stroke_width=3,
        ttl_ms=2000,
        privacy_tag="frigate_bbox",
    ))
    if after.get("plate"):
        await commands.set_overlay(TextOverlay(
            id=f"frigate_lpr_{after['id']}",
            cell_id=cell,
            text=after["plate"]["text"],
            x=x1 / W, y=y2 / H + 0.02,            # just below the bbox
            size=18, color="#FFFFFF", background="#000000A0",
            ttl_ms=3000,
            privacy_tag="lpr_text",               # ← hidden under privacy_profile=public
        ))

7. Control plane protocols

7.1 ZeroMQ flow

Commands IN:

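FFmpeg's zmq filter binds tcp://127.0.0.1:5555 as a REP socket (set through its bind_address option; the default is tcp://*:5555).
The controller is the REQ client and sends one command per request to a filter target.
Message format of the zmq filter: <target> <command> <arg>. FFmpeg matches target against each filter's instance name (or its filter type name, or all), not against a private option. Each cuda_grid is therefore named after its instance_id in the graph (e.g. cuda_grid@tv1), and the controller uses that name as the target.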
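On the controller side a REQ client is enough. Below is a sketch using pyzmq. It assumes instances are named `cuda_grid@<instance_id>` in the graph. It also assumes the reply starts with a numeric status (0 on success) followed by text; verify this against the zmq filter in the FFmpeg version in use. The helper names are hypothetical.

```python
def format_command(target: str, command: str, arg: str = "") -> str:
    """Build a zmq-filter message: `<target> <command> [<arg>]`."""
    if " " in target or " " in command:
        raise ValueError("target and command must not contain spaces")
    return f"{target} {command} {arg}" if arg else f"{target} {command}"


def parse_reply(reply: str) -> tuple[int, str]:
    """Split the reply into (status, text); assumes status 0 means success."""
    status, _, text = reply.partition(" ")
    return int(status), text


def send_command(endpoint: str, target: str, command: str, arg: str = "",
                 timeout_ms: int = 1000) -> tuple[int, str]:
    import zmq                          # pyzmq, imported lazily so the helpers above need no deps

    sock = zmq.Context.instance().socket(zmq.REQ)
    sock.setsockopt(zmq.RCVTIMEO, timeout_ms)   # recv raises zmq.Again on timeout
    sock.setsockopt(zmq.LINGER, 0)
    try:
        sock.connect(endpoint)
        sock.send_string(format_command(target, command, arg))
        return parse_reply(sock.recv_string())
    finally:
        sock.close()
```

Usage: `send_command("tcp://127.0.0.1:5555", "cuda_grid@tv1", "set_layout", "name=quad")`.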

Events OUT:

The controller binds a PUB socket on tcp://0.0.0.0:5556.
Topic prefix: event/<category>/<instance>; subscribers filter by prefix.

7.2 MQTT topic taxonomy

cuda_grid/cmd/<instance_id>/layout/set                ← set_layout
cuda_grid/cmd/<instance_id>/layout/create             ← new layout definition
cuda_grid/cmd/<instance_id>/cell/<n>/bind             ← bind_cell
cuda_grid/cmd/<instance_id>/overlay/set               ← set_overlay (payload=Overlay JSON)
cuda_grid/cmd/<instance_id>/overlay/<id>/clear        ← clear
cuda_grid/cmd/<instance_id>/privacy/set               ← set_privacy_profile

cuda_grid/state/<instance_id>/layout                  ← retained: current layout
cuda_grid/state/<instance_id>/cells                   ← retained: cell→camera mapping
cuda_grid/state/<instance_id>/overlays/<id>           ← retained per overlay
cuda_grid/state/<instance_id>/fps                     ← stats, periodic

cuda_grid/event/<instance_id>/layout_switched         ← non-retained, fact-of-event
cuda_grid/event/<instance_id>/cell_camera_changed
cuda_grid/event/<instance_id>/fps_drop
cuda_grid/event/<instance_id>/overlay_added
cuda_grid/event/<instance_id>/overlay_expired
cuda_grid/event/<instance_id>/inputs_starved
cuda_grid/event/<instance_id>/cell_stale
cuda_grid/event/audio/ducked
cuda_grid/event/audio/restored

homeassistant/select/cuda_grid_<instance>_layout/config           ← HA discovery
homeassistant/sensor/cuda_grid_<instance>_fps/config
homeassistant/binary_sensor/cuda_grid_<instance>_input_alive/config
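In api/mqtt, the command topics above can be mapped onto the internal Command DTO with a small routing table. Here is a sketch. It assumes the controller subscribes to `cuda_grid/cmd/#` and parses payloads separately. The dict shape and the `create_layout` command name are hypothetical.

```python
from __future__ import annotations

import re

# (topic pattern, internal command name); mirrors the cuda_grid/cmd/... topics.
_ROUTES = [
    (re.compile(r"^cuda_grid/cmd/(?P<instance>[^/]+)/layout/set$"), "set_layout"),
    (re.compile(r"^cuda_grid/cmd/(?P<instance>[^/]+)/layout/create$"), "create_layout"),
    (re.compile(r"^cuda_grid/cmd/(?P<instance>[^/]+)/cell/(?P<cell>\d+)/bind$"), "bind_cell"),
    (re.compile(r"^cuda_grid/cmd/(?P<instance>[^/]+)/overlay/set$"), "set_overlay"),
    (re.compile(r"^cuda_grid/cmd/(?P<instance>[^/]+)/overlay/(?P<id>[^/]+)/clear$"), "clear_overlay"),
    (re.compile(r"^cuda_grid/cmd/(?P<instance>[^/]+)/privacy/set$"), "set_privacy_profile"),
]


def parse_cmd_topic(topic: str) -> dict | None:
    """Map an MQTT command topic to a Command-DTO-like dict, or None if unknown."""
    for pattern, command in _ROUTES:
        m = pattern.match(topic)
        if m:
            fields = m.groupdict()
            if "cell" in fields:
                fields["cell"] = int(fields["cell"])
            return {"command": command, "source": "mqtt", **fields}
    return None
```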

HA Discovery, a concrete example (layout selector):

{
  "name": "Living Room TV Layout",
  "unique_id": "cuda_grid_tv1_layout",
  "command_topic": "cuda_grid/cmd/tv1/layout/set",
  "state_topic":   "cuda_grid/state/tv1/layout",
  "options": ["single","quad","nine_grid","main_plus_preview","custom_3x4"],
  "device": {
    "identifiers": ["cuda_grid_tv1"],
    "name": "CUDA Grid: tv1",
    "manufacturer": "gx/vf-cuda-grid",
    "sw_version": "<runtime version>"
  }
}
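This payload can be generated per instance instead of being written by hand. The sketch below is hypothetical in its function name, and its field set simply mirrors the example above.

```python
from __future__ import annotations

import json


def ha_layout_select(instance: str, layouts: list[str], version: str,
                     friendly_name: str | None = None) -> tuple[str, str]:
    """Return (discovery topic, JSON payload) for the layout `select` entity."""
    topic = f"homeassistant/select/cuda_grid_{instance}_layout/config"
    payload = {
        "name": friendly_name or f"{instance} Layout",
        "unique_id": f"cuda_grid_{instance}_layout",
        "command_topic": f"cuda_grid/cmd/{instance}/layout/set",
        "state_topic": f"cuda_grid/state/{instance}/layout",
        "options": list(layouts),
        "device": {
            "identifiers": [f"cuda_grid_{instance}"],
            "name": f"CUDA Grid: {instance}",
            "manufacturer": "gx/vf-cuda-grid",
            "sw_version": version,
        },
    }
    return topic, json.dumps(payload, separators=(",", ":"))
```

With aiomqtt, the result would be published as `await client.publish(topic, payload, retain=True)`. Retaining it lets Home Assistant rediscover the entity after either side restarts.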

7.3 HTTP REST API

Endpoint	Method	Body	Purpose
`/instances`	GET	n/a	list of filter instances + current state
`/instance/{id}`	GET	n/a	detailed state
`/instance/{id}/layout`	POST	`{name, transition?}`	set_layout
`/instance/{id}/cell/{n}/bind`	POST	`{camera_id}`	bind_cell
`/instance/{id}/privacy`	POST	`{profile}`	privacy profile switch
`/instance/{id}/lock`	POST	`{duration}`	manual priority lock (see §7.4)
`/layouts`	GET	n/a	list layout definitions
`/layouts`	POST	Layout JSON	create/update a layout
`/layouts/{name}`	DELETE	n/a	delete a layout
`/instance/{id}/overlays`	GET	n/a	active overlays
`/instance/{id}/overlays`	POST	Overlay JSON	add/replace an overlay
`/instance/{id}/overlays/{oid}`	DELETE	n/a	remove an overlay
`/instance/{id}/chat/{oid}/message`	POST	`{text, color?}`	push a message into a chat overlay
`/audio/duck`	POST	`{source, duration_ms, ratio}`	manual ducking
`/events`	GET (SSE)	n/a	event stream
`/health`	GET	n/a	liveness
`/metrics`	GET	n/a	Prometheus metrics

The OpenAPI schema is published automatically (FastAPI). Endpoints are versioned from day one: every path above is served under the /v1/ prefix (e.g. /v1/instances).
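Every protocol feeds the same command router (§7.4), so it helps to normalize HTTP requests and MQTT messages into one internal shape before queuing them. The sketch below rests on assumptions: the `Command` fields, the route tables and the bare-value MQTT payload convention are illustrative, not a fixed API.

```python
from __future__ import annotations

import re
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Command:
    instance: str
    action: str            # same verbs as the MQTT taxonomy: set_layout, bind_cell, ...
    args: dict
    source: str            # "http", "mqtt", "zmq", ... (feeds the per-source priority)
    cmd_id: str = field(default_factory=lambda: str(uuid.uuid4()))


HTTP_ROUTES = [
    (re.compile(r"^/v1/instance/(?P<instance>[^/]+)/layout$"), "set_layout"),
    (re.compile(r"^/v1/instance/(?P<instance>[^/]+)/cell/(?P<cell>\d+)/bind$"), "bind_cell"),
]
MQTT_ROUTES = [
    (re.compile(r"^cuda_grid/cmd/(?P<instance>[^/]+)/layout/set$"), "set_layout"),
    (re.compile(r"^cuda_grid/cmd/(?P<instance>[^/]+)/cell/(?P<cell>\d+)/bind$"), "bind_cell"),
]


def _route(routes, path: str) -> tuple[str, dict]:
    for pattern, action in routes:
        m = pattern.match(path)
        if m:
            return action, m.groupdict()
    raise LookupError(f"no route for {path!r}")


def _build(action: str, groups: dict, args: dict, source: str, cmd_id: str | None) -> Command:
    args = dict(args)
    if groups.get("cell") is not None:
        args["cell"] = int(groups["cell"])  # the cell index comes from the path/topic
    extra = {"cmd_id": cmd_id} if cmd_id else {}
    return Command(groups["instance"], action, args, source, **extra)


def from_http(path: str, body: dict, cmd_id: str | None = None) -> Command:
    action, groups = _route(HTTP_ROUTES, path)
    return _build(action, groups, body, "http", cmd_id)


def from_mqtt(topic: str, payload: str, cmd_id: str | None = None) -> Command:
    action, groups = _route(MQTT_ROUTES, topic)
    # Assumption: MQTT payloads are bare values (HA's `select` publishes the option text).
    args = {"name": payload} if action == "set_layout" else {"camera_id": int(payload)}
    return _build(action, groups, args, "mqtt", cmd_id)
```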

7.4 Conflict resolution

Sources can conflict (e.g. HA sends layout=quad while an MQTT rule sends layout=nine_grid within the same 50 ms). Policy:

Single command router: commands from all protocols go into one asyncio.Queue (or a Go channel), which serializes them naturally.
Idempotency key: every command carries an optional cmd_id (UUID); duplicates are dropped.
Per-source priority (configurable):
```yaml
priority:
  ha_automation: 100
  mqtt:          80
  http:          50
  zmq:           50
  frigate:       30
```
Within the same-tick window (50 ms) the higher priority wins; the lower-priority command is discarded and a command_overridden{by=ha_automation} event is emitted.
Locking via set_priority_lock: a manual UI lock for N seconds, e.g. POST /instance/tv1/lock {duration=60s}. While the lock is held, only the source holding the lock token can change the instance.
Last-write-wins for overlays with the same id (this follows naturally from replace semantics).
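The policy above can be modeled as a small synchronous arbiter that the router consults for each dequeued command. The sketch rests on these assumptions:
- Arbitration is keyed by (instance, action).
- Commands are applied immediately, and a later lower-priority command within 50 ms is rejected. The alternative, buffering every command for the whole window, adds up to 50 ms of latency.
- Equal priorities fall back to last-write-wins.
- Lock rejections report `by="lock"`.

All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

PRIORITY = {"ha_automation": 100, "mqtt": 80, "http": 50, "zmq": 50, "frigate": 30}
WINDOW_MS = 50


@dataclass
class Cmd:
    cmd_id: str
    source: str
    instance: str
    action: str
    arg: str
    lock_token: str | None = None


class Arbiter:
    def __init__(self, priority: dict[str, int] = PRIORITY, window_ms: int = WINDOW_MS):
        self.priority = priority
        self.window_ms = window_ms
        self.seen_ids: set[str] = set()  # idempotency; unbounded here, a TTL cache in practice
        self.last: dict[tuple[str, str], tuple[int, str, int]] = {}  # key -> (prio, source, t_ms)
        self.locks: dict[str, tuple[str, int]] = {}  # instance -> (token, expires_ms)
        self.events: list[dict] = []

    def lock(self, instance: str, token: str, duration_ms: int, now_ms: int) -> None:
        self.locks[instance] = (token, now_ms + duration_ms)

    def submit(self, cmd: Cmd, now_ms: int) -> bool:
        """Return True if the command should be applied."""
        if cmd.cmd_id in self.seen_ids:
            return False  # duplicate delivery: dropped silently
        self.seen_ids.add(cmd.cmd_id)

        held = self.locks.get(cmd.instance)
        if held and now_ms < held[1] and cmd.lock_token != held[0]:
            self._overridden(cmd, by="lock")
            return False

        prio = self.priority.get(cmd.source, 0)
        key = (cmd.instance, cmd.action)
        prev = self.last.get(key)
        if prev and now_ms - prev[2] < self.window_ms and prev[0] > prio:
            self._overridden(cmd, by=prev[1])  # a higher-priority source won this tick
            return False

        self.last[key] = (prio, cmd.source, now_ms)
        return True

    def _overridden(self, cmd: Cmd, by: str) -> None:
        self.events.append({"event": "command_overridden", "cmd": cmd.action,
                            "source": cmd.source, "by": by})
```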

7.5 Event taxonomy (published externally)

Event	When	Payload
`layout_switched`	after the layout is applied	`{from, to, reason, source}`
`cell_camera_changed`	bind_cell	`{cell, prev_camera, new_camera}`
`overlay_added`	set_overlay (new id)	`{id, type, instance, cell?}`
`overlay_updated`	set_overlay (existing id)	`{id, type}`
`overlay_expired`	TTL fired or manual clear	`{id, reason: "ttl" | "manual"}`
`fps_drop`	output_fps < threshold	`{instance, current, expected, since_ms}`
`inputs_starved`	all inputs stale	`{instance, last_seen_per_input}`
`cell_stale`	a single input stale	`{instance, cell, camera, age_ms}`
`audio_ducked`	ducking active	`{rule, source, duration_ms}`
`audio_restored`	ducking ended	`{rule}`
`command_overridden`	dropped by conflict resolution	`{cmd, source, by}`
`controller_started`	startup	`{version, instances}`

All events are published to:

ZeroMQ PUB tcp://0.0.0.0:5556, topic event/<category>/<instance> (same scheme as §7.1)
MQTT cuda_grid/event/...
HTTP /events (SSE)
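The sketch below fans one event out to the three transports. It assumes the ZMQ `<category>` segment is simply the event name. It covers instance-scoped events only, since the audio events use `cuda_grid/event/audio/...` instead. The function name is hypothetical.

```python
from __future__ import annotations

import json


def fan_out(name: str, instance: str, payload: dict) -> dict[str, object]:
    """Render one event for ZeroMQ, MQTT and SSE."""
    body = json.dumps(payload, separators=(",", ":"))  # compact JSON, no raw newlines
    return {
        # ZeroMQ PUB: multipart [topic, body]; SUB sockets match on the topic prefix.
        "zmq": [f"event/{name}/{instance}".encode(), body.encode()],
        # MQTT: non-retained fact-of-event topic from §7.2.
        "mqtt": (f"cuda_grid/event/{instance}/{name}", body),
        # SSE: named event, one data line, terminated by a blank line.
        "sse": f"event: {name}\ndata: {body}\n\n",
    }
```

With pyzmq, the ZMQ part is sent as `pub.send_multipart(out["zmq"])`. A subscriber selects events with `sub.setsockopt(zmq.SUBSCRIBE, b"event/layout_switched/")`. Because `json.dumps` escapes newlines inside strings, a single `data:` line per SSE frame is enough.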

8. Audio orchestration

Architectural stance: vf_cuda_grid does not touch audio itself. Audio is handled by standard FFmpeg filters (amix, sidechaincompress, volume), and the controller coordinates them through the same process_command mechanism as video.

State machine example: intercom ("domofon") doorbell:

states:
  idle:          music plays @ vol=1.0, no doorbell screen
  ringing:       duck music (vol=0.3), switch tv1 to main_plus_preview with camera_door as main,
                 show the "doorbell" icon on the canvas, amplify cam_door audio
  ringing_acked: keep grid, music back to 0.8, doorbell acknowledged
  cooldown:      24s timer, then restore music + layout

events triggering:
  on(mqtt:doorbell/ringing): from {idle,cooldown} → ringing
  on(mqtt:doorbell/answered): from ringing → ringing_acked
  on(timer 30s): from ringing → ringing_acked
  on(timer 24s in cooldown): → idle

actions per state:
  ringing:
    - ffmpeg.cmd: "vol_music volume 0.3"
    - ffmpeg.cmd: "sc_music threshold 0.05"   # lower sidechain threshold: compression engages sooner
    - ffmpeg.cmd: "tv1_grid set_layout main_plus_preview"
    - ffmpeg.cmd: "tv1_grid bind_cell cell=0 camera=4"  # cam_door
    - ffmpeg.cmd: "tv1_grid set_overlay id=doorbell_icon json=..."
    - publish event audio_ducked{rule=doorbell}
    - publish event layout_switched
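Here the doorbell machine is reduced to a transition table. It is a dependency-free stand-in for what the `transitions` library would provide, and is useful for unit-testing rules. Only the transitions listed above are modeled. The set_overlay action (whose JSON is elided above) and the event publications are left out. The class name, the trigger strings and the injected `send` callback are hypothetical.

```python
from __future__ import annotations

from typing import Callable

# (state, trigger) -> next state, transcribed from "events triggering" above.
TRANSITIONS: dict[tuple[str, str], str] = {
    ("idle", "mqtt:doorbell/ringing"): "ringing",
    ("cooldown", "mqtt:doorbell/ringing"): "ringing",
    ("ringing", "mqtt:doorbell/answered"): "ringing_acked",
    ("ringing", "timer:30s"): "ringing_acked",
    ("cooldown", "timer:24s"): "idle",
}

# ZMQ command strings sent on entering a state (the ffmpeg.cmd actions of "ringing").
ON_ENTER: dict[str, list[str]] = {
    "ringing": [
        "vol_music volume 0.3",
        "sc_music threshold 0.05",
        "tv1_grid set_layout main_plus_preview",
        "tv1_grid bind_cell cell=0 camera=4",
    ],
}


class DoorbellMachine:
    def __init__(self, send: Callable[[str], None], state: str = "idle"):
        self.state = state
        self.send = send  # e.g. a wrapper around the ZMQ REQ client

    def fire(self, trigger: str) -> bool:
        nxt = TRANSITIONS.get((self.state, trigger))
        if nxt is None:
            return False  # trigger not valid in the current state: ignored
        self.state = nxt
        for cmd in ON_ENTER.get(nxt, []):
            self.send(cmd)
        return True
```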

Rule engine implementation: a YAML config with states and triggers, parsed into the transitions library (a pure-Python state machine). Simple, testable, not over-engineered.

# orchestration_rules.yaml
rules:
  doorbell:
    triggers:
      - "on": mqtt          # quoted: YAML 1.1 loaders such as PyYAML read a bare `on` key as boolean true
        topic: doorbell/state
        value: ringing

    states:
      ringing:
        enter:
          - ffmpeg_cmd: { target: vol_music, cmd: volume, arg: "0.3" }
          - ffmpeg_cmd: { target: tv1_grid, cmd: set_layout, arg: "main_plus_preview" }
          - overlay: { id: doorbell, type: icon, icon: bell, x: 0.45, y: 0.05, size: 64 }
        exit:
          - ffmpeg_cmd: { target: vol_music, cmd: volume, arg: "1.0" }
          - clear_overlay: doorbell
        timeout: 30s → ringing_acked

      ringing_acked:
        timeout: 24s → idle
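The sketch below normalizes one rule dict as returned by `yaml.safe_load` (PyYAML). It has to handle two details. First, PyYAML follows YAML 1.1, where an unquoted `on` key loads as the boolean `True`. Second, the `timeout: 30s → ringing_acked` shorthand needs explicit parsing. The function names and the output shape are hypothetical.

```python
from __future__ import annotations

import re

_DURATION = re.compile(r"^\s*(\d+)\s*(ms|s|m)\s*$")
_UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000}


def parse_duration_ms(text: str) -> int:
    m = _DURATION.match(text)
    if not m:
        raise ValueError(f"bad duration: {text!r}")
    return int(m[1]) * _UNIT_MS[m[2]]


def parse_timeout(spec: str) -> tuple[int, str]:
    """"30s → ringing_acked" -> (30000, "ringing_acked")."""
    duration, sep, target = spec.partition("→")
    if not sep or not target.strip():
        raise ValueError(f"bad timeout: {spec!r}")
    return parse_duration_ms(duration), target.strip()


def load_rule(name: str, rule: dict) -> dict:
    triggers = []
    for t in rule.get("triggers", []):
        # Accept both the quoted "on" key and the boolean True that a bare `on:` becomes.
        source = t.get("on", t.get(True))
        triggers.append({"source": source, "topic": t.get("topic"), "value": t.get("value")})
    timeouts = {state: parse_timeout(body["timeout"])
                for state, body in rule.get("states", {}).items()
                if body and "timeout" in body}
    return {"name": name, "triggers": triggers, "timeouts": timeouts}
```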

9. Multi-instance behaviour

9.1 Shared inputs

Shared inputs go through FFmpeg's native split filter (see §2.1). Each cuda_grid instance receives its own reference to the same CUDA frame (refcount++). No GPU↔GPU copy takes place: only the reference count changes.

9.2 Layout registry: global or per-instance

Recommendation: hybrid (as in the requirements):

Global LayoutRegistry: one instance per FFmpeg process, owned by the first cuda_grid filter to initialize. Subsequent instances reference the same registry.
Active layout pointer, cell_to_input mapping and overlays: per-instance state.
Privacy profile: per-instance.

Why a global registry: layout definitions are read-mostly (created rarely, read often). Sharing saves memory. For a nine_grid layout with 9 cells, 3 instances need 9 × 3 = 27 cell structures without sharing, versus 9 with a shared registry.

Sharing mechanics: on init, the filter checks a process-wide singleton (atomic pointer + ref-count) and either creates it or attaches to it. When the last instance is destroyed, the registry is freed. When the controller hot-reloads the registry, it sends a reload_registry command that is broadcast to all registered instances. They apply it under a read-write lock on the registry, with an atomic version bump.
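The following Python model of the acquire/release/reload protocol pins down the semantics. It is not meant as the C implementation. One assumption is worth noting: a lone atomic pointer does not make first-time creation race-free. So the model, like a C version with a mutex would, serializes creation and refcounting under a lock. The standard library has no read-write lock, so a plain lock stands in for one. Names are hypothetical.

```python
from __future__ import annotations

import threading


class LayoutRegistry:
    """Process-wide, ref-counted layout registry (model of the C design)."""

    _lock = threading.Lock()              # guards creation and the refcount
    _instance: LayoutRegistry | None = None
    _refs = 0

    def __init__(self) -> None:
        self.version = 0
        self.layouts: dict[str, dict] = {}
        self._rw = threading.Lock()       # stand-in for a read-write lock

    @classmethod
    def acquire(cls) -> LayoutRegistry:   # called from each filter's init
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
            cls._refs += 1
            return cls._instance

    @classmethod
    def release(cls) -> None:             # called from each filter's uninit
        with cls._lock:
            cls._refs -= 1
            if cls._refs == 0:
                cls._instance = None      # last instance gone: free the registry

    def reload(self, layouts: dict[str, dict]) -> int:
        # Swap in the new table and bump the version; instances compare versions per frame.
        with self._rw:
            self.layouts = dict(layouts)
            self.version += 1
            return self.version

    def get(self, name: str) -> tuple[int, dict]:
        with self._rw:
            return self.version, self.layouts[name]
```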

9.3 AVFrame ref-counting

This is the standard FFmpeg mechanism. After split, each instance receives an AVFrame* whose underlying buffers have an incremented refcount. av_frame_free() decrements it, and at 0 the buffer returns to the frame pool. CUDA frames come from a pool (via hw_frames_ctx), so physical GPU memory is reused rather than deallocated.
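Below is a toy model of the reference counting described in 9.1 and 9.3, assuming a fixed-size pool. The real AVBufferPool allocates a new buffer when none is free. Class and surface names are hypothetical.

```python
from __future__ import annotations


class ToyFramePool:
    """Reference-counted surface pool, modeling AVBufferPool behind hw_frames_ctx."""

    def __init__(self, size: int):
        self.free = [f"surface{i}" for i in range(size)]  # preallocated GPU surfaces
        self.refs: dict[str, int] = {}

    def get(self) -> str:                    # decoder takes a surface for a new frame
        surface = self.free.pop()
        self.refs[surface] = 1
        return surface

    def ref(self, surface: str) -> str:      # av_frame_ref / av_frame_clone, e.g. inside split
        self.refs[surface] += 1
        return surface

    def unref(self, surface: str) -> None:   # av_frame_unref / av_frame_free
        self.refs[surface] -= 1
        if self.refs[surface] == 0:
            del self.refs[surface]
            self.free.append(surface)        # back to the pool, not freed on the GPU
```

For split=2 the lifecycle is: the decoder gets a surface (1 ref), and split clones it to both outputs (3). Split then drops its input reference (2). The surface returns to the pool only after both cuda_grid instances release their copies.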

9.4 Independent output timing

Each instance has its own target_fps option. Scheduling is pull-based: the filter framework itself calls request_frame on each output. If tv1 runs at 25 fps and public at 30 fps, the two rates are independent while the inputs are shared.
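The sketch below shows how two outputs with different target_fps sample the same input under the keep_last policy. It assumes a constant 25 fps input, and uses exact rational timestamps (as FFmpeg's AVRational time bases do) to avoid drift. The function name is hypothetical.

```python
from __future__ import annotations

from fractions import Fraction


def sample_inputs(out_fps: int, in_fps: int, seconds: int = 1) -> list[int]:
    """For each output tick, the index of the newest input frame (keep_last)."""
    picks = []
    for n in range(out_fps * seconds):
        t = Fraction(n, out_fps)      # exact output timestamp in seconds
        picks.append(int(t * in_fps))  # floor: latest input frame with pts <= t
    return picks
```

With a 25 fps input, a 25 fps output uses each input frame exactly once. A 30 fps output repeats one input frame in every six output ticks (5 repeats per second).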

10. Library choice for the controller

Comparison

Aspect	Python (FastAPI + asyncio)	Go (chi/fiber + nats)	Rust (axum + tokio)
Ecosystem MQTT/ZMQ	`paho-mqtt`, `pyzmq`, `aiomqtt`: all proven	`eclipse/paho.mqtt.golang`, `go-zeromq/zmq4`: proven	`rumqttc`, `zmq`: proven, but less mature
HA Discovery integration	plain MQTT, nothing language-specific	same	same
Audio-orchestration rule engine	`transitions` library: excellent	`looplab/fsm`: fine	crate `statig`: fine, thinner docs
Cairo/skia for graph rendering	`pycairo`/`cairocffi`: solid	`gioui.org/cairo` (exotic) or CGO	`cairo-rs`: OK, but more friction
FastAPI + Pydantic schemas	first-class, OpenAPI for free	manual schemas or `huma` (newer)	`utoipa` + axum-extra: works
Developer velocity	high: Python is a proven pattern for control planes	medium: more boilerplate	low for fast iteration
Runtime performance	sufficient (control plane, not hot path)	good	excellent
Memory footprint	~30-50MB	~10-20MB	~5-10MB
Familiarity in our ecosystem	high (paddleocr, frigate-cuframes are all Python)	low	low
Single static binary distribution	✗ (Python deps)	✓	✓

Recommendation: Python (FastAPI + asyncio + pydantic + paho-mqtt + pyzmq + transitions)

Why:

The hot path is not on the controller side: pixel pushing happens in the filter (C/CUDA). The controller is a coordination layer, and Python suits it well. Python adds 1-3 ms of latency to an MQTT→ZMQ command, which is negligible against the 40 ms frame interval at 25 fps.
Cairo rendering for graphs/chats: pycairo is mature in Python, whereas in Go this is painful.
The Frigate community is Python-native, so bridge plugins, examples and docs are much easier to adopt.
Iteration speed matters more than runtime speed. Audio rules, overlay logic and conflict resolution are business logic that keeps changing, and Python is quick to rewrite.
A 30-50MB memory footprint for the control plane is insignificant (Frigate itself uses ~500MB, and NVDEC takes gigabytes of VRAM).

Dependency stack:

fastapi              # HTTP REST + auto OpenAPI
uvicorn              # ASGI server
pydantic >=2         # domain models, validation
pyzmq                # FFmpeg zmq client + events PUB
aiomqtt              # asyncio MQTT (thin wrapper over paho)
paho-mqtt            # fallback for HA discovery if needed
transitions          # state machines для audio orchestration
pycairo              # graph/chat rendering CPU-side
sse-starlette        # /events SSE
prometheus-client    # /metrics
structlog            # structured logs
typer                # CLI
pyyaml               # rule configs

Distribution: Docker image + PyPI package (cuda-grid-controller), with a systemd unit in deploy/.

What would make us switch to Go: 100+ filter instances with the controller becoming a bottleneck. Realistically we expect 1-10 instances per host. If it does become a problem, the controller (cleanly separated by domain) can be extracted and rewritten in Go without touching the filter. The decision is reversible.

11. Phases of implementation

PR-1 — MVP filter (fixed quad)

Scope:

filter/vf_cuda_grid.c + minimal CUDA kernels (NV12 region memcpy only)
Hardcoded quad layout (2×2), 4 inputs, fixed output 1920×1080
No scaling — assumes inputs already at 960×540
No overlays, no borders, no labels
process_command skeleton (NOP)
FFmpeg patch file for n7.1
CLI usable: ffmpeg -i cuframes://... [×4] -filter_complex cuda_grid=inputs=4 -c:v h264_nvenc out.mp4
Unit tests: kernel-level pixel-perfect tests against known fixtures
Docker integration test

Deliverable: a working FFmpeg binary with the filter. Demo: 4 cams → quad mp4.
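PR-1's "NV12 region memcpy" reduces to two 2D copies per cell, one per plane, e.g. with `cudaMemcpy2DAsync(dst, dpitch, src, spitch, width, height, kind, stream)`. The sketch below covers the destination arithmetic. It assumes row-major cell numbering (0 = top-left, 3 = bottom-right) and a source frame of exactly 960×540, so each copy starts at source offset 0. Names are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class Copy2D(NamedTuple):
    """Destination side of one 2D copy; offsets and widths are in bytes."""
    dst_offset: int   # added to the plane's base pointer (frame->data[0] or data[1])
    width_bytes: int
    height: int       # rows


def quad_copies(cell: int, pitch_y: int, pitch_uv: int,
                cell_w: int = 960, cell_h: int = 540) -> tuple[Copy2D, Copy2D]:
    """Luma and chroma copies that place input `cell` into a 2x2 NV12 canvas."""
    if not 0 <= cell < 4:
        raise ValueError("the quad layout has cells 0..3")
    x = (cell % 2) * cell_w   # pixel column of the cell's top-left corner
    y = (cell // 2) * cell_h  # pixel row
    # Y plane: 1 byte per pixel at full resolution.
    luma = Copy2D(y * pitch_y + x, cell_w, cell_h)
    # UV plane: interleaved U/V at half resolution in both axes, i.e. 2 bytes per
    # 2 pixels horizontally (same byte width and x offset) and half as many rows.
    chroma = Copy2D((y // 2) * pitch_uv + x, cell_w, cell_h // 2)
    return luma, chroma
```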

PR-2 — Dynamic layouts + per-cell scaling

Scope:

LayoutRegistry, JSON layout loading (layout_file= option)
Cell scaling — NPP integration (NV12 bilinear)
Borders + background fill + fit modes (cover/contain/stretch)
process_command: set_layout, bind_cell, swap_cells
Cell-camera binding state
All 9 predefined layouts (matching the current grids.json)
Cross-fade transition (linear blend), opt-in via command
Lock-step input policy: keep_last + stale_timeout_ms

Deliverable: layout switching from the CLI / FFmpeg sendcmd, with visual transitions.
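The three fit modes can be expressed as rectangle math: which part of the source gets scaled (by NPP) into which part of the cell. This sketch assumes the cell has already been converted from normalized 0..1 coordinates to canvas pixels. It rounds offsets and sizes to even values because NV12 chroma is 2×2 subsampled. Names are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def _even(v: float) -> int:
    # NV12 chroma is 2x2 subsampled, so offsets and sizes must be even.
    return int(round(v / 2)) * 2


def fit(src_w: int, src_h: int, cell: Rect, mode: str) -> tuple[Rect, Rect]:
    """Return (source crop rect, destination rect on the canvas)."""
    if mode == "stretch":  # ignore aspect ratio
        return Rect(0, 0, src_w, src_h), cell
    sx, sy = cell.w / src_w, cell.h / src_h
    if mode == "contain":  # whole source visible, letterbox/pillarbox the rest
        s = min(sx, sy)
        w, h = _even(src_w * s), _even(src_h * s)
        dst = Rect(cell.x + _even((cell.w - w) / 2), cell.y + _even((cell.h - h) / 2), w, h)
        return Rect(0, 0, src_w, src_h), dst
    if mode == "cover":    # fill the cell, center-crop the overflow from the source
        s = max(sx, sy)
        cw, ch = _even(cell.w / s), _even(cell.h / s)
        return Rect(_even((src_w - cw) / 2), _even((src_h - ch) / 2), cw, ch), cell
    raise ValueError(f"unknown fit mode {mode!r}")
```

With `contain`, the uncovered part of the cell keeps the background fill.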

PR-3 — Controller skeleton

Scope:

controller/ package: FastAPI app, MQTT subscribe/publish, ZMQ client (commands) + PUB (events)
HA Discovery payloads for the layout select, fps sensor and input_alive binary_sensor
Command Router + Conflict Resolver (single-source-of-truth queue)
State Store (in-memory + JSON snapshot)
HTTP endpoints for /layouts, /instance/.../layout, /instance/.../cell/.../bind
/events SSE
ZMQ PUB events outbound
Filter publishes events back: add zmq_publish_event on the filter side (a TX socket alongside the command RX socket) that emits one tagged JSON line per event
Docker compose: filter + controller + mosquitto + HA mock
Integration test: HTTP POST /instance/{id}/layout → retained MQTT state → reflected in HA

Deliverable: a production-grade control plane without overlays, with Frigate bridge samples in examples/.

PR-4 — Overlays: rect/text/icon basics

Scope:

OverlayStore in controller (in-mem, MQTT retained sync)
CUDA kernels: alpha-blit RGBA texture → NV12, glyph blit
Text rendering: CPU-precomputed glyph atlas (cairo renders glyphs at startup → CUDA texture), not Pango on every frame.
Icon rendering: preloaded sprite sheet (configured iconpack)
process_command: set_overlay, clear_overlay (JSON passed via arg)
TTL handling (controller-side timers)
Privacy profile filtering
Frigate bridge sample: detection bbox → rect+text overlay with TTL

Deliverable: Frigate detections visible in the mosaic, with privacy controls.
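Below is a CPU reference for the alpha-blit kernel, of the kind the pixel-perfect tests from PR-1 could compare against. It assumes BT.709 limited-range output and straight (non-premultiplied) RGBA. Chroma is blended with the mean alpha of each 2×2 block, which is one common choice rather than the only one. Names are hypothetical.

```python
from __future__ import annotations


def rgb_to_yuv709(r: int, g: int, b: int) -> tuple[float, float, float]:
    """8-bit R'G'B' -> BT.709 limited-range Y'CbCr (Y 16..235, Cb/Cr 16..240)."""
    rf, gf, bf = r / 255, g / 255, b / 255
    y = 0.2126 * rf + 0.7152 * gf + 0.0722 * bf
    return 16 + 219 * y, 128 + 224 * (bf - y) / 1.8556, 128 + 224 * (rf - y) / 1.5748


def blend_block(bg_y: list[int], bg_uv: tuple[int, int],
                rgba: list[tuple[int, int, int, int]]) -> tuple[list[int], tuple[int, int]]:
    """Blend a 2x2 RGBA patch over one NV12 2x2 block: 4 luma samples + 1 shared UV pair."""
    out_y: list[int] = []
    cb_acc = cr_acc = a_acc = 0.0
    for y_bg, (r, g, b, a8) in zip(bg_y, rgba):
        a = a8 / 255
        oy, ocb, ocr = rgb_to_yuv709(r, g, b)
        out_y.append(round(y_bg * (1 - a) + oy * a))  # per-pixel luma blend
        cb_acc += ocb * a
        cr_acc += ocr * a
        a_acc += a
    if a_acc == 0:
        return out_y, bg_uv  # fully transparent block: chroma untouched
    # The UV pair is shared by the block: blend with the mean alpha and the
    # alpha-weighted mean overlay chroma.
    a_mean = a_acc / 4
    cb = bg_uv[0] * (1 - a_mean) + (cb_acc / a_acc) * a_mean
    cr = bg_uv[1] * (1 - a_mean) + (cr_acc / a_acc) * a_mean
    return out_y, (round(cb), round(cr))
```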

PR-5 — Image/dim/graph/chat overlays

Scope:

Image overlay: lazy-loaded by path, uploaded into a CUDA texture cache
Dim overlay: simple luma-multiply kernel
Graph overlay: cairo CPU rendering, IPC-handle channel controller↔filter
Chat overlay: rolling message list, cairo rendering + per-line fade-out
Side-channel protocol: Unix socket for texture handles (see §5.3)
HTTP endpoint /instance/.../chat/.../message
HA Discovery: add a sensor for the current graph value
Performance bench: 8 overlays @ 25fps on an RTX 5090, target <0.5 ms compose time

Deliverable: the full overlay roadmap, with the weather widget, LPR scroll and motion timeline in production.
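The chat overlay's controller-side state can be sketched as a bounded rolling list plus a per-line alpha that the cairo renderer would apply. The hold/fade timings, the defaults and the class names are hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class ChatLine:
    text: str
    color: str
    t_ms: int


class ChatOverlay:
    """Rolling message list with per-line fade-out."""

    def __init__(self, max_lines: int = 6, hold_ms: int = 8000, fade_ms: int = 2000):
        self.lines: deque[ChatLine] = deque(maxlen=max_lines)  # oldest line falls off when full
        self.hold_ms, self.fade_ms = hold_ms, fade_ms

    def push(self, text: str, now_ms: int, color: str = "#ffffff") -> None:
        self.lines.append(ChatLine(text, color, now_ms))

    def visible(self, now_ms: int) -> list[tuple[str, str, float]]:
        """(text, color, alpha) for the current frame; fully faded lines are dropped."""
        # Lines are appended in time order, so expired ones are always on the left.
        while self.lines and now_ms - self.lines[0].t_ms >= self.hold_ms + self.fade_ms:
            self.lines.popleft()
        out = []
        for line in self.lines:
            age = now_ms - line.t_ms
            alpha = 1.0 if age <= self.hold_ms else 1.0 - (age - self.hold_ms) / self.fade_ms
            out.append((line.text, line.color, alpha))
        return out
```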

PR-6 — Audio orchestration

Scope:

orchestration/audio_orchestrator.py — state machine engine (transitions library)
YAML rule configs (orchestration_rules.yaml)
The FFmpeg ZMQ client is extended to audio targets (amix, volume, sidechaincompress)
3 example rules: doorbell, baby-cry detection, motion-night
audio_ducked / audio_restored events
HTTP /audio/duck for manual triggers
E2E test: simulated MQTT doorbell event → music ducks + grid switches + overlay shown + restore

Deliverable: a complete control plane platform. Closes gx/cctv#22 Phase 4.

12. Risks / open questions

#	Risk	Mitigation / current stance
R1	FFmpeg `process_command` arg is a single string; complex overlays need JSON. Is there a length limit?	FFmpeg AVOption parsing accepts strings up to ~4KB. Larger payloads go through the side-channel (see §5.3). We will document the limit in the filter README.
R2	Upstream FFmpeg: will it accept `vf_cuda_grid`?	Upstream is not on the critical path. Once we reach v0.3 (mature), submit it upstream. Until then, ship an out-of-tree patch (as cuframes already does with `cuframesdec.c`).
R3	NPP licensing: proprietary, bundled with CUDA	NPP ships with the CUDA Toolkit, which is required anyway. It does not conflict with the filter's LGPL (the filter is LGPL, NPP is dynamically linked). Caveat: FFmpeg's own configure classifies libnpp as nonfree (it requires --enable-nonfree), which makes such builds non-redistributable; distribution plans must account for that.
R4	Glyph atlas: which fonts ship?	DejaVu (free) preloaded; users can override it via config. Unicode: full BMP, plus CJK on demand (lazy).
R5	Multi-GPU is not supported in v0.1 (consistent with cuframes)	Documented. Multi-GPU = v0.3+.
R6	Lock-step with inputs at very different fps (cam@10fps + cam@30fps)	Per-input tolerance window, configurable. The slow camera gets `keep_last`. Documented as a known constraint.
R7	Filter performance on a 16-cell layout @ 4K	Benchmark in PR-5. If it does not fit the frame budget, recommend 1080p output for heavy compositions (4K cameras are downscaled inside `cuda_grid` anyway).
R8	cuframes IPC + filter sharing: interaction with the frame pool lifecycle	Verify that the cuframes demuxer's `hw_frames_ctx` is correctly ref-shared through `split` into the filter. An integration test will surface this risk.
R9	Controller restart drops live overlay state	MQTT retained messages for declared overlays plus a JSON snapshot as local cache; a restart restores from both. Ephemeral TTL overlays are lost, which is acceptable.
R10	Conflict-resolution priority is a UX problem (users don't understand why their command is a no-op)	Every dropped command emits a `command_overridden{by=...}` event, visible in the UI and logs. We will document the priority defaults.
R11	Text/glyph rendering for CJK/RTL	Phase 1: LTR Latin/Cyrillic. CJK/RTL remains an open question, targeted at v0.4 (after Phase 6).
R12	Cross-fade between layouts with different aspect ratios/sizes	Cross-fade is a pixel blend of two already-composed canvases of the same size (output_size is fixed per instance). Safe.
R13	Audio orchestration: race conditions between MQTT events	A single asyncio queue feeding the state machine serializes events. The base `transitions.Machine` is not itself thread-safe; if it is ever driven from several threads, use `LockedMachine` from `transitions.extensions` (or `AsyncMachine` from `transitions.extensions.asyncio` for asyncio code).

Open questions to discuss with the team:

Minimum CUDA version. Cuframes v0.1 targets CUDA 12+. Stick with that, or also support 11.8 for wider compatibility? Stance: CUDA 12+, explicitly declared.
MQTT vs NATS for events? MQTT is universal (HA, Frigate); NATS wins on performance. Stance: MQTT primary (ecosystem fit); NATS is not needed now.
Layout DSL: JSON vs YAML? The current cctv-processor uses JSON. Stance: JSON for machine-generated and REST API payloads, YAML for human-edited rules. One schema, two serialization frontends via pydantic.
Glyph atlas vs FreeType-on-CUDA? Stance: atlas (proven, simple). FreeType-on-CUDA is too exotic for v0.1.
License: LGPL vs MIT? The filter inherits LGPL from FFmpeg. The controller is a separate codebase under MIT. Stance: dual licensing, with filter/ under LGPL-2.1+ and controller/ under MIT, clearly documented in the top-level LICENSE and per-subdirectory LICENSE files.
Frigate camera ID space. Frigate camera names are strings, while our camera_id is an int. Stance: the controller keeps a name↔index mapping as an abstraction layer.
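A minimal sketch of that mapping follows. It assumes indices are assigned append-only (never renumbered) and persisted with the State Store snapshot, so they stay stable across restarts. Names are hypothetical.

```python
from __future__ import annotations


class CameraIndex:
    """Stable Frigate camera name <-> integer camera_id mapping."""

    def __init__(self, names: list[str] | None = None):
        self._by_name: dict[str, int] = {}
        self._by_id: list[str] = []
        for name in names or []:
            self.id_for(name)

    def id_for(self, name: str) -> int:
        # Assign the next free index on first sight; existing cameras keep theirs.
        if name not in self._by_name:
            self._by_name[name] = len(self._by_id)
            self._by_id.append(name)
        return self._by_name[name]

    def name_for(self, camera_id: int) -> str:
        if not 0 <= camera_id < len(self._by_id):
            raise KeyError(camera_id)
        return self._by_id[camera_id]
```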

13. Migration path for cctv-processor

Current cctv-processor pipeline:

cuframes → cv::Mat (GPU→host download) → GridComposer (CPU OpenCV) →
swscale → host→GPU upload (for Frigate detect) → h264_nvenc → RTSP-out

After migrating to vf_cuda_grid:

cuframes://cam[1..4] ─►
                       ┌─ split=2 ─► cuda_grid=instance_id=tv (layout=main_plus_preview) ─► h264_nvenc ─► RTSP-tv
                       └─        ─► (preview only, not scaled for Frigate)

cctv-processor itself disappears as a frame processor. What remains:

GridManager business logic (motion-based auto-switching, priority switching, history) moves into the controller as orchestration rules (Python).
SnapshotManager becomes a separate small FFmpeg pipeline that reads the same output (an HD snapshot every 5 min).
The EventSystem internal bus is removed and replaced by ZMQ events.
cctv-processor's REST API endpoints become endpoints in cuda-grid-controller (1:1 URL-scheme migration plus redirects).
cameras.json, grids.json and analytics.json remain; the controller reads them and translates them into our layouts.json schema (compat converter).

Migration steps

PR-A in the cctv repo: add a compat layer so that cctv-processor can pull composed frames from the vf_cuda_grid output via cuframes_packets://. Pure shadow mode: both systems run in parallel, and a compare test checks that their output is identical.
PR-B in localhost-infra: docker-compose adds cuda-grid-controller and ffmpeg-mosaic containers alongside cctv-processor.
Production cutover: switch the TV stream from cctv-processor RTSP to ffmpeg-mosaic RTSP. cctv-processor stays for snapshots and analytics for a short time.
PR-C: snapshots move to a separate slim FFmpeg pipeline.
PR-D: analytics (motion events) flow from Frigate directly into the controller (Frigate bridge). cctv-processor is decommissioned. Closes gx/cctv#22 Phase 4.

Compat converter `grids.json` → `layouts.json`

A simple Python script in examples/cctv-processor-migrate/. Field mapping:

old `grids.json`	new `layouts.json`
`grid_templates.<name>.cells[].camera_id`	runtime state (per-instance cell binding), extracted separately
`cells[].x/y/width/height`	`cells[].x/y/w/h` (same 0..1 range)
`border/label`	an overlay template or defaults
`motion_indication`	overlay template `motion_indicator` (rect with alpha, added by the controller on a motion event)
`transition_settings.duration_ms`	`set_layout_transition` arg
`default_grid`	`instances.<inst>.default_layout`
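The converter's core can be sketched as below. The grids.json shape is inferred from the field table above, and layouts.json is assumed to have top-level `layouts` and `instances` keys. Both shapes are assumptions here, since the authoritative schema lives in §4.1. `border/label` and `motion_indication` are left out because they map to overlay templates rather than layout geometry. The function name is hypothetical.

```python
from __future__ import annotations


def convert(grids: dict, instance: str) -> tuple[dict, dict]:
    """Return (layouts_doc, runtime_state) from a cctv-processor grids.json dict."""
    layouts: dict[str, dict] = {}
    bindings: dict[str, dict[int, int]] = {}
    for name, tpl in grids.get("grid_templates", {}).items():
        cells = []
        for i, c in enumerate(tpl.get("cells", [])):
            # Geometry keeps the same normalized 0..1 range; only the keys are renamed.
            cells.append({"x": c["x"], "y": c["y"], "w": c["width"], "h": c["height"]})
            if "camera_id" in c:
                bindings.setdefault(name, {})[i] = c["camera_id"]  # runtime state, not layout
        layouts[name] = {"cells": cells}
    doc: dict = {"layouts": layouts}
    if "default_grid" in grids:
        doc["instances"] = {instance: {"default_layout": grids["default_grid"]}}
    runtime = {
        "cell_bindings": bindings,
        # Becomes the set_layout_transition arg rather than part of the layout file.
        "transition_ms": grids.get("transition_settings", {}).get("duration_ms"),
    }
    return doc, runtime
```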

14. Overlap with `cctv-processor` MQTT plugin (gx/cctv#24)

TL;DR: it is the same controller, not two.

Overlap analysis

Feature	cctv #24 MQTT plugin	cuda-grid-controller
MQTT subscribe for layout switch commands	✓	✓
HA Discovery for layout selector	✓	✓
MQTT state publishing	✓	✓
External event publishing	✓	✓
Internal event bus integration	EventSystem (C++ in-process)	ZMQ events
HTTP REST	partial	✓
Audio orchestration	✗	✓
Overlay control	✗	✓

The overlap is 60-70%. More importantly, cctv-processor is migrating to vf_cuda_grid (see §13). After the migration, cctv-processor no longer needs its own MQTT plugin: that functionality already lives in cuda-grid-controller.

What changes in cctv#24

The issue is reworded:

~~"Add MQTT plugin to cctv-processor"~~ → "Migrate cctv-processor mosaic management to cuda-grid-controller (depends on gx/vf-cuda-grid PR-3)"

Acceptance: cctv-processor is either decommissioned (§13 happy path), or its mosaic-control logic calls our controller's HTTP API (compatibility mode).

Next steps (ordered, ready to start)

Open issue gx/vf-cuda-grid#1 ("Design accepted") and paste this document as the issue body.
Create repo gx/vf-cuda-grid with a skeleton (README, ROADMAP, LICENSE-dual, empty filter/, controller/, schema/, examples/, docs/).
Update gx/cctv#24: reword it as "depends on vf-cuda-grid PR-3" and close it as a standalone scope.
Update the gx/cuframes ROADMAP entry "Future ideas → vf_cuda_grid": mark it as moved to the gx/vf-cuda-grid repo.
Start PR-1 (MVP filter, fixed quad) in a separate issue/branch.

Relevant files reviewed for this design:

/home/claude/projects/cctv/cpp/apps/cctv-processor/include/grid/GridComposer.h
/home/claude/projects/cctv/cpp/apps/cctv-processor/include/grid/GridManager.h
/home/claude/projects/cctv/cpp/apps/cctv-processor/config/grids.json
/home/claude/projects/cuframes/ROADMAP.md (section "Future ideas → vf_cuda_grid")
/home/claude/projects/cuframes/README.md
/home/claude/projects/cuframes/filter/cuframesdec.c (existing out-of-tree FFmpeg patch pattern; the model for ours)
