gx/ffmpeg-patched

WIP: libavfilter: vf_cuda_grid — Phase 1 MVP (fixed quad layout) #2

Draft

gx wants to merge 14 commits from n7.1-vf-cuda-grid into n7.1-cuframes

Author	SHA1	Message	Date
gx	636bd78854	vf_cuda_grid: overlay с invalid cell — silent skip вместо AVERROR Когда controller broadcast'ит overlay command ко всем cuda_grid instances (quad/single/mpp), overlay с cell=2 (gate_lpr cell в quad/mpp) попадает в single layout который имеет nb_cells=1 → overlay_pixel_rect возвращал AVERROR(EINVAL) → render_overlays propagated к cuda_grid_compose → "Error while filtering: Invalid argument" каждый frame → pipeline crash loop. Fix: return 1 (positive skip) вместо AVERROR. render_overlay_rect уже имеет `if (ret != 0) return ret < 0 ? ret : 0;` — positive skip правильно обрабатывается без error. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-21 07:08:23 +01:00
gx	a326ef146c	vf_cuda_grid: Phase 6 — process_command 'reload_icon <name>' Invalidate cached icon atlas by name. Next render автоматически re-reads PNG file с disk. Used controller'ом для periodic re-render dynamic overlays (charts/chats) — write new PNG → send reload_icon → filter подхватывает. Без этого pattern dynamic overlay невозможен — icon overlay cached forever after first load by icon_name. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 21:48:09 +01:00
gx	c5130cb15c	vf_cuda_grid: Phase 4b-4 — icon overlay (PNG/JPG decode + sprite blit) PNG/JPG decode через libavcodec (системный decoder), conversion в RGBA8 через swscale, upload в pitched CUDA buffer, blit через Alpha_Blit_RGBA_Y/UV (те же kernels что text). Icon cache — shared by name (одна и та же иконка для нескольких overlays = один GPU atlas). Lifecycle: load on first use, free в uninit. Path resolution: <icon_dir>/<name> + try .png/.jpg extensions: /var/lib/cuda-grid/icons (default) /opt/cuda-grid/icons (fallback) + abs path supported Options: icon_dir= — overrides default search paths Wire format: add_overlay <id> icon x=.. y=.. icon_name=domofon opacity=200 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-19 22:41:11 +01:00
gx	b88f966f83	vf_cuda_grid: URL-decode для string overlay fields (text, icon_name) Controller URL-encode'ит string values (text="hello world" → text=hello%20world) чтобы пройти sscanf("%s") который stops on whitespace. Filter inline decode'ит '%xx' obratno в bytes. Только для text + icon_name. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-19 22:32:38 +01:00
gx	4010461300	vf_cuda_grid: Phase 4b-3 — text overlay (freetype + RGBA atlas) CPU rasterization через freetype (DejaVu/Liberation auto-detect или font_file= option), upload pitched RGBA buffer to GPU, blit через Alpha_Blit_RGBA_Y/UV kernels (Phase 4b-2 уже had). Cache: per-overlay-id atlas с keys (text, font_size, r/g/b) — re-rasterize только при change. Cleanup при remove_overlay/clear_overlays/uninit. Options: font_file= — TTF path (default: search DejaVu/Liberation) font_size= — default size if overlay не указал свой Wire format: add_overlay <id> text x=.. y=.. text=hello font_size=24 r=255 g=255 b=255 opacity=200 Conditional на CONFIG_LIBFREETYPE — без него text overlays no-op (остальные типы работают как обычно). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-19 22:30:36 +01:00
gx	1e54f04e24	configure: add cuda_grid_filter_deps_any для cuda_nvcc/cuda_llvm Без этого filter compileд но .ptx step failед: configure не knows что filter требует nvcc/clang для .cu compile. Same pattern как scale_cuda. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-19 22:26:12 +01:00
gx	8ca590004b	vf_cuda_grid: Phase 4b-2 — CUDA kernels (alpha-blended overlays) Custom .cu kernels компилируются через ffbuild → .ptx, embedded в binary через bin2c pattern (как vf_scale_cuda). Loading через ff_cuda_load_module в config_output, kernel handles cached на life-of-filter. Kernels: Alpha_Fill_Y/UV — solid colour α-blend (rect, dim, border strips) Alpha_Blit_RGBA_Y/UV — blit RGBA atlas → NV12 (text/icon в 4b-3/4) Render side: render_strip_alpha — заменил render_strip_solid (alpha=255 ≡ solid) render_overlay_rect — opacity передаётся в kernel render_overlay_dim — Y=16/UV=neutral × dim.amount (затемнение region) Также: убран unused cuda_grid_config_input (-Werror), добавлены fields CUmodule + 4×CUfunction в Context; unload в uninit. cuMemsetD2D* НЕ доступны в FFmpeg's CudaFunctions wrapper, поэтому solid fill реализован через kernel — лишний overhead minimal (16×16 threads). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-19 22:24:01 +01:00
gx	9deaca7697	vf_cuda_grid: Phase 4b-1 — rect overlay primitives (solid fill, no alpha) Добавляет inner overlay state с mutex + process_command handler. Rendering filled/border rects через cuMemsetD2D8Async/D2D16Async — без custom kernel'а (Phase 4b-2 = alpha blend, требует .cu). Commands: add_overlay <id> rect cell=N x=.. y=.. w=.. h=.. r=.. g=.. b=.. thickness=.. opacity=.. remove_overlay <id> clear_overlays text/icon/dim — типы определены, render заглушен до Phase 4b-2/3/4. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-19 22:17:41 +01:00
gx	178fc5bb4e	vf_cuda_grid: Phase 2b — delegated scaling to upstream scale_npp После попытки in-filter NPP scaling обнаружено что nppiResize не имеет _C2R variant для NV12 UV interleaved (только C1R, C3R, C4R). Alternatives: - 2× nppiResize_8u_C1R с split/merge через intermediate buffers - custom CUDA kernel - treat UV pair как 16u (blending artifact на boundaries) Pragmatic decision: cuda_grid делает только composition (same-size memcpy), а scaling делегируется существующему scale_npp filter в filter chain: [0]scale_npp=1280:1080[s0]; [1]scale_npp=640:360[s1]; ... \ [s0][s1]...cuda_grid=layout=main_plus_preview Unix philosophy + leverages production-tested NPP code. Controller (Phase 3) auto-generates filter graph с scale_npp per input. Revert: - #include <nppi.h> - libnpp dependency в configure (cuda_grid_filter_deps="ffnvcodec") - nppiResize* calls в compose path Add: - Error message с примером scale_npp chain pattern - Doc в file header c filter graph пример Phase 2 = full deliverable (2a + 2b). Дальше Phase 3 controller.	2026-05-19 21:45:40 +01:00
gx	11f310061a	vf_cuda_grid: Phase 2b — NPP scaling для mixed-size inputs - Add libnpp dependency в configure (cuda_grid_filter_deps) - #include <nppi.h>, nppSetStream(s->cu_stream) перед resize batch - Smart copy-or-scale в compose path: - src.size == cell.size → cuMemcpy2DAsync (fast path, zero overhead) - else → nppiResizeSqrPixel_8u_C1R для Y + _C2R для NV12 UV interleaved - NPPI_INTER_LINEAR interpolation (bilinear — стандартный для video) - Destination pointer offset через explicit pointer arithmetic (NPP не имеет dst_offset параметра, нужно сместить pSrc указатель) Это unblock'ает mixed-size cameras (parking 1920x1080 + gate_lpr 2688x1520 в одном grid с main_plus_preview layout — big cell scaled до 1280x1080, small cells scaled до 640x360). Phase 2 complete (2a + 2b). Phase 3 будет controller sidecar.	2026-05-19 21:20:04 +01:00
gx	df476472e2	vf_cuda_grid: fix include avstring.h для av_asprintf	2026-05-19 20:58:29 +01:00
gx	6ee2f474c7	vf_cuda_grid: Phase 2a — layout templates + dynamic nb_inputs Layout templates (9): - single, dual_horizontal, dual_vertical - quad (default), main_plus_preview (1 big + 3 small) - six_grid (3x2), nine_grid (3x3), sixteen_grid (4x4) - panoramic Cells определены в normalized координатах (0.0-1.0), переводятся в pixels в config_output (× out_w/out_h). Alignment до chroma boundary (NV12 ÷ 2). Filter options: - layout=<name> (default quad) - out_w=<int> (default 1920) - out_h=<int> (default 1080) Dynamic inputs: - nb_inputs derived из layout (single=1, quad=4, nine_grid=9, sixteen_grid=16) - ff_append_inpad_free_name в init() для каждой cell - AVFILTER_FLAG_DYNAMIC_INPUTS на filter Phase 2a limitation: - Каждый input должен быть точно cell_px size (no scaling). - Phase 2b добавит NPP resize для mixed-size inputs.	2026-05-19 20:57:08 +01:00
gx	4313c3f30d	vf_cuda_grid: fix #include cuda_check.h + mixed decl warnings (-Werror)	2026-05-19 20:50:17 +01:00
gx	097ca81605	vf_cuda_grid: Phase 1 MVP — fixed quad layout, 4 CUDA inputs → 1 output Phase 1 deliverable (см. gx/vf-cuda-grid#1): - libavfilter/vf_cuda_grid.c (~270 LOC): multi-input filter, fixed 2×2 quad - 4 NV12 CUDA frames same size → 2W × 2H output frame - Composition: cuMemcpy2DAsync per Y + UV plane на каждый input - framesync для lock-step pull всех 4 inputs - Output hw_frames_ctx allocated from input device_ref - Build wiring: CONFIG_CUDA_GRID_FILTER → libavfilter/{Makefile,allfilters.c}, configure deps на ffnvcodec Limitations Phase 1: - All inputs must be same size (no scaling) - Quad layout hardcoded (no DSL, no runtime switching) - NV12 only (no RGBA/YUV420P) Phase 2: dynamic layouts + scaling. Phase 3: runtime control via process_command.	2026-05-19 20:47:00 +01:00