docs: launch drafts (Frigate discussion + FFmpeg-devel RFC + Show HN)

3 drafts for upstream visibility (Etap E, i.e. stage E), plus a README:
- docs/launch/frigate-integration-issue.md: Discussion on blakeblackshear/frigate
- docs/launch/ffmpeg-devel-rfc.md: RFC patch + cover letter for the ffmpeg-devel ML
- docs/launch/hn-show-post.md: Show HN draft (Etap F)
- docs/launch/README.md: order, checklist, pre-flight notes

See issue #3.
2026-05-19 02:04:42 +01:00
parent bcc1d29ae8
commit 612843bd39
4 changed files with 429 additions and 0 deletions
@@ -0,0 +1,47 @@
+# Launch drafts
+
+Drafts for outreach / launch. Everything here is **draft material**; review before sending.
+
+## Order (recommended)
+
+1. **`frigate-integration-issue.md`**: soft launch with a low risk of rejection; the target
+   audience is already complaining about the problem in 3 discussions. It may bring the first
+   early adopters and social proof for the next step.
+2. **`ffmpeg-devel-rfc.md`**: send once the Frigate discussion gets
+   positive engagement (even a single "+1, would use" comment counts as traction).
+   The FFmpeg-devel mailing list holds submissions to a high standard; prepare carefully.
+3. **`hn-show-post.md`**: last, once the RFC has either received its first
+   response or it is clear the list is staying silent. HN is an amplifier, not a starting line.
+
+## What's in each draft
+
+| File | Where | Format | When |
+|---|---|---|---|
+| [`frigate-integration-issue.md`](frigate-integration-issue.md) | github.com/blakeblackshear/frigate | Discussion (Ideas category) | Now |
+| [`ffmpeg-devel-rfc.md`](ffmpeg-devel-rfc.md) | `ffmpeg-devel@ffmpeg.org` | Patch + cover letter via `git send-email` | After Frigate engagement |
+| [`hn-show-post.md`](hn-show-post.md) | news.ycombinator.com | Show HN | Etap F (finale) |
+
+## What **not** to do
+
+- Don't publish everything on the same day: you cannot keep up with replies on all channels in parallel.
+- Don't publish on weekends, holidays, or during a major tech event (Apple keynote, GTC, etc.).
+- Don't use "AI", "battle-tested", "production-ready", or "enterprise" in the text: all three audiences (FFmpeg-devel, Frigate, HN) are allergic to marketing language.
+- Don't send the FFmpeg patch **without** a sign-off: automatic rejection.
+- Don't submit the HN post unless you can be online for the first 2 hours after publishing, or the ranking will die.
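
Review note: the marketing-word rule in the list above is easy to check mechanically before sending. Below is a minimal sketch, assuming the drafts are plain files on disk; the function name `lint_marketing_words` is hypothetical and the word list is simply the four words quoted in the list, matched as whole words and case-insensitively.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight lint: flag marketing words in launch drafts.
# The word list is taken from the "do not" list; extend it as needed.
lint_marketing_words() {
    # -n: line numbers, -i: ignore case, -w: whole words only, -E: extended regex.
    # Offending lines are printed to stdout.
    grep -n -i -w -E 'AI|battle-tested|production-ready|enterprise' "$@"
    case $? in
        0) return 1 ;;  # at least one hit: not clean
        1) return 0 ;;  # no hits: clean
        *) return 2 ;;  # grep itself failed (for example, a missing file)
    esac
}

# Example:
#   lint_marketing_words docs/launch/frigate-integration-issue.md
```

Expect hits in the review-notes sections, which quote these words on purpose, and on the phrase "separate AI processor" in the Frigate draft body; only the text that will actually be posted needs to come out clean.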
+
+## What to prepare before sending
+
+- [ ] Subscribe to ffmpeg-devel (https://ffmpeg.org/mailman/listinfo/ffmpeg-devel), otherwise you won't receive the replies
+- [ ] `git config --global` for send-email (see the steps in ffmpeg-devel-rfc.md)
+- [ ] Sign-off in the FFmpeg commit (`git commit --amend -s` if it's not there yet)
+- [ ] GitHub account for the Frigate discussion (if you don't have one already)
+- [ ] HN account with at least a few days of history: fresh accounts get shadow-banned automatically
+
+## After sending
+
+Watch for replies during the first week. All three channels are asynchronous, but the first **48 hours** are usually decisive.
+
+Where to check engagement status:
+- ffmpeg-devel: https://ffmpeg.org/pipermail/ffmpeg-devel/
+- Frigate discussion: shows up in the right-hand panel of the repo
+- HN: https://news.ycombinator.com/threads?id=YOURUSER
@@ -0,0 +1,160 @@
+# FFmpeg-devel RFC submission
+
+**Status:** DRAFT, review before sending.
+
+**Where:** `ffmpeg-devel@ffmpeg.org` (subscribe: https://ffmpeg.org/mailman/listinfo/ffmpeg-devel)
+
+**How:** the patch is generated with `git format-patch` and sent with `git send-email` together with a cover letter. FFmpeg **does not use** GitHub pull requests, only mailing-list patches.
+
+---
+
+## Submission steps
+
+```bash
+# 1. Configure git send-email (one time)
+git config --global sendemail.smtpserver smtp.gmail.com
+git config --global sendemail.smtpserverport 587
+git config --global sendemail.smtpencryption tls
+git config --global sendemail.smtpuser YOUR-EMAIL
+# password: via a git credential helper or the interactive prompt
+
+# 2. In the ffmpeg-patched fork, on the n7.1-cuframes branch:
+cd /path/to/ffmpeg-patched
+git log --oneline n7.1..n7.1-cuframes  # should list exactly one commit
+
+# 3. Generate the .patch files
+git format-patch -1 --cover-letter --subject-prefix='RFC PATCH' \
+    --output-directory=/tmp/cuframes-rfc \
+    n7.1..n7.1-cuframes
+
+# 4. Edit /tmp/cuframes-rfc/0000-cover-letter.patch:
+#    - Replace *** SUBJECT HERE *** with the subject line (see below)
+#    - Replace *** BLURB HERE *** with the cover-letter body (see below)
+
+# 5. Dry run
+git send-email --dry-run --to=ffmpeg-devel@ffmpeg.org /tmp/cuframes-rfc/*.patch
+
+# 6. Actual send
+git send-email --to=ffmpeg-devel@ffmpeg.org /tmp/cuframes-rfc/*.patch
+```
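
Review note: between steps 4 and 5 it may be worth checking the output directory mechanically. The sketch below assumes the layout produced by step 3 (a `0000-cover-letter.patch` plus numbered patches in one directory); the helper name `check_rfc_patches` is hypothetical. It only looks for the two template placeholders and for a `Signed-off-by:` trailer, nothing more.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check format-patch output before sending.
# Usage: check_rfc_patches /tmp/cuframes-rfc
check_rfc_patches() {
    local dir=$1 status=0 count=0 f
    local cover="$dir/0000-cover-letter.patch"

    if [ ! -f "$cover" ]; then
        echo "no cover letter in $dir" >&2
        return 1
    fi
    # The cover letter must no longer contain the template placeholders.
    if grep -q -F -e '*** SUBJECT HERE ***' -e '*** BLURB HERE ***' "$cover"; then
        echo "cover letter still has template placeholders" >&2
        status=1
    fi
    # Every real patch (0001-..., 0002-...) needs a Signed-off-by trailer.
    for f in "$dir"/0*.patch; do
        [ -f "$f" ] || continue
        [ "$f" = "$cover" ] && continue
        count=$((count + 1))
        if ! grep -q '^Signed-off-by: ' "$f"; then
            echo "missing Signed-off-by: $f" >&2
            status=1
        fi
    done
    if [ "$count" -eq 0 ]; then
        echo "no patches found in $dir" >&2
        status=1
    fi
    return "$status"
}
```

A zero exit status means only that these two checks passed; the dry run in step 5 is still needed.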
+
+## Subject line
+
+```
+[RFC PATCH 0/1] libavformat/cuframesdec: zero-copy CUDA frame ingest via IPC
+```
+
+## Cover-letter body
+
+```
+Hi all,
+
+This RFC adds a new demuxer "cuframes" to libavformat that ingests already-
+decoded video frames residing in CUDA device memory, produced by another
+process via the libcuframes IPC layer [1].
+
+# Why
+
+In multi-consumer GPU video pipelines (CCTV with multiple analytics
+services, multi-stream transcoding farms, ML inference + recording on the
+same source) every consumer typically runs its own NVDEC session. On 16
+cameras × 25 fps × N consumers this multiplies NVDEC sessions, OS
+context-switches and host<->device PCIe traffic for what is logically the
+same decoded frame.
+
+cuframes addresses this by letting one process decode (e.g. via FFmpeg's
+existing CUDA hwaccel) and publish the decoded frames into a small CUDA
+ring buffer; other processes import the buffer via cudaIpcOpenMemHandle
+and consume the same VRAM allocation without redecoding or copying.
+
+The libavformat demuxer in this RFC is the consumer side: it exposes the
+remote ring buffer as a regular AVFormat input source, so any downstream
+FFmpeg filter chain or muxer can use it transparently.
+
+# Scope of this patch
+
+  libavformat/cuframesdec.c   — new demuxer
+  libavformat/allformats.c    — registration
+  configure                   — --enable-libcuframes option
+
+The demuxer currently outputs NV12 frames via cudaMemcpy2DAsync to host
+memory (rawvideo path). A v0.2 follow-up is planned that emits frames
+directly as CUDA AVHWFramesContext (true zero-copy into a CUDA-aware
+filter chain) — see [2].
+
+# Out-of-tree library
+
+libcuframes (the producer side, the IPC handshake, the ring-buffer
+allocator) lives out-of-tree at [1], licensed LGPL-2.1+ to match FFmpeg.
+The demuxer links against libcuframes via pkg-config.
+
+This mirrors the model used by other libavformat plugins that wrap third-
+party libraries (libsmbclient, librist, libsrt, etc.).
+
+# Testing
+
+- Unit smoke tests in the libcuframes repo (1 publisher × 4 subscribers ×
+  2000 frames @ 120 fps — 0 torn frames, 0 gaps).
+- E2E test against a real RTSP IP camera (Dahua HEVC 1920×1080, 25 fps,
+  100/100 frames, avg_fps=25.03).
+- ~24h production deployment serving Frigate (object detection) and a
+  custom analytics pipeline from a single decoder, single NVDEC session.
+
+# Prior art and what this is not
+
+There is no in-tree mechanism for sharing decoded GPU frames between
+unrelated FFmpeg processes. Existing alternatives are:
+  - CUDA hwdownload + hwupload (defeats the purpose — round-trips via PCIe)
+  - DeepStream Gst-nvstreammux (NVIDIA, closed, GStreamer-only)
+  - Vendor-locked NVENC/NVDEC pooling helpers
+
+cuframes is intentionally minimal: ring buffer + handshake + IPC handles.
+No transcoding logic, no policy.
+
+# Limitations / known issues for review
+
+  - NVIDIA GPUs only (CUDA IPC is vendor-specific).
+  - Linux only (POSIX SHM + AF_UNIX sockets).
+  - Producer and consumer must share the same CUDA device (CUDA IPC limit).
+  - NV12 only in v0.1; other pixel formats are roadmap items.
+  - Driver ≥ 525, CUDA toolkit ≥ 12.0 (≥ 13.0 recommended).
+
+# Feedback wanted
+
+  1. Is the libavformat demuxer the right home for this, or would a
+     hwcontext_cuda extension + a thin demuxer be a better split?
+  2. Are folks open to an out-of-tree library dependency under
+     --enable-libcuframes, given the precedent of librist/libsrt?
+  3. Naming: "cuframes" vs "cudaipcframes" vs something else?
+
+Happy to iterate. Patch follows.
+
+[1] https://git.goldix.org/gx/cuframes  (LGPL-2.1+)
+[2] https://git.goldix.org/gx/cuframes/issues/2  (v0.2 zero-copy plan)
+
+Signed-off-by: <YOUR NAME> <YOUR EMAIL>
+```
+
+## Review notes
+
+- **Subject prefix `[RFC PATCH]`**: this is a design discussion, not "merge this now". If you get constructive feedback and make a revision, the next one is `[PATCH v2]`.
+- **Sign-off is mandatory**, otherwise the patch is rejected at the tooling level.
+- **Don't mention** "production-ready", "battle-tested", "30 days of uptime": the FFmpeg-devel list is **very** allergic to a marketing tone. Numbers are OK, epithets are not.
+- **Don't CC** maintainers uninvited; whoever is interested will reply. You can CC Timo Rothenpieler (CUDA hwaccel maintainer) to speed things up, but **only** after the first revision and only if the list stays silent.
+- Likely objections:
+  - "Why not Vulkan video?" Vulkan video has no cross-process sharing API comparable to CUDA IPC. Vulkan external memory works with DMA-BUF on Linux but requires DRM device sharing, which is also non-trivial and is material for a separate RFC.
+  - "Why a new demuxer, not a filter?" Because the producer already lives **outside** this FFmpeg process; a demuxer is where AVFormat reads from an external source. A filter pulls from an upstream AVStream, and here there is no upstream.
+
+## Alternative path: ffmpeg-user (lighter)
+
+If going straight to `-devel` with a patch feels too heavy, you can start with an **awareness email** to `ffmpeg-user@ffmpeg.org`:
+
+```
+Subject: ANNOUNCE: libcuframes — zero-copy CUDA frame sharing for FFmpeg pipelines
+
+[3 paragraphs: what / why / link to repo]
+
+A patch for libavformat will be sent to the -devel list after feedback from users.
+```
+
+This is a **soft launch**: lower risk of rejection and a better chance of finding early adopters who will later support the RFC. I recommend taking this step **first**.
@@ -0,0 +1,115 @@
+# Frigate integration issue
+
+**Status:** DRAFT, review before publishing.
+
+**Where:** https://github.com/blakeblackshear/frigate
+
+**Type:** GitHub **Discussion** (category: Ideas), **not** an Issue. Reason: this is a feature proposal, not a bug. Frigate actively uses discussions (see [#17033](https://github.com/blakeblackshear/frigate/discussions/17033), [#20191](https://github.com/blakeblackshear/frigate/discussions/20191), [#21559](https://github.com/blakeblackshear/frigate/discussions/21559); all three already complain about this problem).
+
+**Alternative:** reply in one of the existing discussions about NVDEC saturation. That may work better, since an audience has already gathered there.
+
+---
+
+## Title
+
+```
+[Ideas] Reduce NVDEC duplication on multi-consumer cameras via shared CUDA frame buffer (cuframes)
+```
+
+## Body
+
+```markdown
+## Problem
+
+When Frigate co-exists with other GPU-using video consumers on the same
+camera stream (separate AI processor, custom analytics, recording to a
+second NVR, etc.), each process opens its own NVDEC session and decodes
+the same H.264/HEVC stream independently. On 16+ cameras at 25 fps this
+becomes the bottleneck on consumer GPUs:
+
+- NVDEC sessions are limited (4 concurrent on RTX 30xx/40xx, more on
+  workstation cards). Decoder context creation / destruction is not free.
+- Each duplicate decode burns PCIe bandwidth pushing the same NV12 frame
+  to host memory (in setups that go through `hwdownload`).
+- Power draw and thermals scale with redundant decoding.
+
+Related discussions: #17033, #20191, #21559.
+
+## Existing workarounds
+
+- Use a single Frigate restream and have everything else pull from go2rtc: this works
+  for restreaming over TCP/UDP, but every downstream consumer still re-decodes.
+- DeepStream `nvstreammux` — solves it but is closed-source NVIDIA stack,
+  GStreamer-only, not co-installable with current Frigate ffmpeg pipeline.
+
+## Proposal: cuframes ingest source
+
+[cuframes](https://git.goldix.org/gx/cuframes) (LGPL-2.1+) is a small
+library that lets one process decode once into a CUDA ring buffer and any
+number of other processes import that buffer via CUDA IPC and consume
+**zero-copy** in VRAM.
+
+Concretely for Frigate this would mean a new ffmpeg input source like:
+
+```yaml
+cameras:
+  driveway:
+    ffmpeg:
+      inputs:
+        - path: cuframes://driveway
+          input_args: preset-cuframes
+          roles: [detect]
+```
+
+where a sentinel container (one per camera, ~5MB RAM, runs
+`cuframes-rtsp-source`) does the actual RTSP pull + NVDEC and Frigate
+attaches to that pre-decoded stream.
+
+## Working integration (early proof)
+
+I've been running this in production for ~24h: a single
+`cuframes-rtsp-source` container per camera serves both Frigate
+(detection role) **and** a separate C++ analytics pipeline from the same
+NVDEC session. Frigate gets pre-decoded NV12 frames; no detection or
+recording behaviour was changed.
+
+Integration guide with full docker-compose and a patched Frigate Dockerfile:
+https://git.goldix.org/gx/cuframes/src/branch/main/docs/integrations/frigate.md
+
+## What I'm asking for
+
+Not a PR yet — first I'd like maintainer / community input on:
+
+1. Would Frigate be open to **upstreaming** a `cuframes://` input source, or
+   should this stay a third-party patched Frigate image?
+2. If upstream — what's the preferred shape: new ffmpeg preset only
+   (zero core code changes), or a first-class `decoder: cuframes` option
+   in the Frigate config schema?
+3. The cuframes library currently requires `--ipc` and `--pid` namespace
+   sharing between producer and consumer containers. Frigate uses
+   `s6-overlay` which is incompatible with `--pid` share (s6 needs PID 1).
+   The current integration uses a small race-window workaround
+   ([troubleshooting #2](https://git.goldix.org/gx/cuframes/src/branch/main/docs/troubleshooting.md));
+   a cleaner solution requires either making s6 optional in the Frigate
+   image or moving the IPC handshake to a sidecar pattern.
+
+## Limitations of cuframes (full disclosure)
+
+- NVIDIA GPUs only.
+- Linux only.
+- Producer + consumer must share the same CUDA device.
+- NV12 frame format only in v0.1.
+- Requires patching FFmpeg with a small (~400 LOC) demuxer; an upstream
+  FFmpeg RFC is in flight separately.
+
+If this looks worth pursuing I'm happy to open a draft PR against a feature
+branch and iterate.
+```
+
+## Review notes
+
+- **Tone:** the Frigate maintainer (Blake) values specifics and production proof; without them any feature request lands in the backlog. We have production proof (24h+), which is a strong argument and is used directly.
+- **We don't promise upstream work unasked**: we ask via a discussion, not a PR. If Blake says "not our scope, stay third-party", that is OK; the integration guide is already valid as a standalone.
+- **Be transparent about the s6-overlay constraint**: it is a blocking issue for a clean upstream. Better to mention it up front than to hide it and get rejected after 2 weeks of review.
+- **Links to 3 existing discussions** show that the problem is confirmed by the community and is not just our own pain.
+- **Don't mention other AI systems** (ANPR, face recognition, etc.): Blake has said several times that Frigate's scope is detector and NVR, not a platform. Framing it as "cuframes solves your problem" works better than "cuframes will build an ecosystem".
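
Review note: point 3 of the draft body mentions `--ipc` and `--pid` namespace sharing without showing it. A minimal sketch of what that could look like with plain `docker run`, assuming the consumer joins the producer container's namespaces through Docker's `container:<name>` mode. The function name, container name and image are hypothetical, and the cuframes integration guide may wire this differently. The function only prints the command so it can be inspected before running.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print a docker command that starts a consumer inside
# the producer's IPC and PID namespaces. Nothing is executed.
#   $1 = producer container name (assumed already running)
#   $2 = consumer image
consumer_run_cmd() {
    local producer=$1 image=$2
    # --ipc=container:NAME and --pid=container:NAME join the namespaces of an
    # existing container; --gpus all exposes the NVIDIA GPUs to the consumer.
    printf 'docker run --rm --gpus all --ipc=container:%s --pid=container:%s %s\n' \
        "$producer" "$producer" "$image"
}

# Example:
#   consumer_run_cmd cuframes-driveway my-frigate-image
```

For `--ipc=container:...` to be accepted, the producer container has to be started with `--ipc=shareable`. Joining another container's PID namespace is also exactly what clashes with s6-overlay expecting to be PID 1, which is the constraint the draft raises.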
@@ -0,0 +1,107 @@
+# Show HN post (for Etap F, later)
+
+**Status:** DRAFT, not for publishing now. This file is a draft for Etap F (launch).
+
+**Where:** https://news.ycombinator.com/submit
+
+**When to publish:**
+- After the FFmpeg-devel RFC gets its first response (even a rejection counts as traction)
+- OR after the Frigate discussion gets 5+ upvotes / 3+ comments
+- OR if both stay silent for 2 weeks: publish anyway, the HN audience is more independent
+- **Time:** a weekday, 13:00-15:00 UTC (peak HN traffic from US morning + EU afternoon)
+- **Don't publish** on Friday evening, on weekends, or on the day of a major tech event (Apple keynote, GTC, etc.): the post drowns in the noise
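
Review note: the timing rule above (weekday, 13:00-15:00 UTC) can be turned into a small check. This is a sketch under that rule only; the function name `in_hn_window` is hypothetical, and it knows nothing about holidays or tech-event days, which still have to be checked by hand.

```bash
#!/usr/bin/env bash
# Hypothetical check: is a given UTC day/hour inside the posting window?
#   $1 = ISO day of week (1 = Monday ... 7 = Sunday)
#   $2 = hour of day, 00-23 (a leading zero is accepted)
in_hn_window() {
    local dow=$1
    local hour=$((10#$2))   # force base 10 so "08" and "09" are not read as octal
    [ "$dow" -ge 1 ] && [ "$dow" -le 5 ] && [ "$hour" -ge 13 ] && [ "$hour" -lt 15 ]
}

# Example with the current time:
#   in_hn_window "$(date -u +%u)" "$(date -u +%H)" && echo "inside the window"
```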
+
+---
+
+## Title
+
+Options (pick one):
+
+1. `Show HN: Cuframes – zero-copy sharing of decoded video frames between processes via CUDA IPC`
+2. `Show HN: Stop redecoding the same RTSP stream in every consumer`
+3. `Show HN: Cuframes – one NVDEC, many consumers, zero-copy in VRAM`
+
+I recommend **#2**: it states the problem in 9 words, and HN likes problem-first titles. #1 is also OK for HN's technical niche.
+
+## Body
+
+```markdown
+Hi HN,
+
+I run a homelab CCTV stack with 16 cameras feeding into Frigate (object
+detection), a custom C++ analytics service, and a recording NVR. All three
+were running NVDEC on the same RTSP streams. On an RTX 3060 this saturated
+the decoder slots and the consumer GPUs in my office burnt about 40W of
+redundant decoding when nothing interesting was happening.
+
+So I wrote a small library that lets one process decode the stream once
+into a CUDA ring buffer and the others import the same buffer via
+cudaIpcOpenMemHandle. Each decoded NV12 frame lands in VRAM exactly once, and every
+consumer reads it zero-copy.
+
+Repo (LGPL-2.1+): https://git.goldix.org/gx/cuframes
+
+What's in it:
+
+  - libcuframes — the producer/consumer C/C++ library
+  - cuframes-rtsp-source — standalone RTSP → cuframes bridge (one per cam)
+  - A small out-of-tree FFmpeg demuxer ("cuframes://") so downstream
+    consumers don't need to know they're consuming shared frames
+  - Reference docker-compose for the Frigate + custom-app setup
+  - 24h production deployment on the homelab, ~25 fps × 16 cameras × 3
+    consumers from a single NVDEC session
+
+What surprised me along the way:
+
+  - CUDA IPC handles are bound to the device that allocated them, not just
+    a CUDA context — both peers must be on the same GPU. (Documented;
+    bit out of the way in the Programming Guide §3.2.8.)
+  - Cross-container CUDA IPC needs both --ipc and --pid namespace share,
+    not just --ipc. The --pid requirement wasn't obvious from the error message
+    ("invalid device context" with no mention of /proc visibility).
+  - Frigate's s6-overlay is incompatible with --pid share because s6
+    insists on being PID 1. There's a documented race-window workaround
+    but it's the one rough edge.
+
+What it is not:
+
+  - Not a transcoding framework. No re-encoding, no filtering, no policy.
+  - Not multi-GPU (CUDA IPC is single-device).
+  - Not Windows / macOS / WSL2 / AMD.
+
+What's next:
+
+  - Upstream FFmpeg RFC for the demuxer (drafted, not sent yet — would
+    appreciate review of the RFC text first).
+  - v0.2 makes the FFmpeg path true zero-copy via AVHWFramesContext (no
+    cudaMemcpy2DAsync round-trip).
+
+Happy to answer questions. Especially interested in:
+
+  - Anyone running multi-consumer GPU video pipelines with a different
+    solution? Curious what tradeoffs you hit.
+  - Vulkan-video folks: is there an obvious cross-process sharing path
+    via VkExternalMemory + DMA-BUF that I'm missing? I went CUDA-only
+    because that's what worked first, but Vulkan would be vendor-neutral.
+
+— [your handle]
+```
+
+## Review notes
+
+- **HN format:** the first line is the hook (concrete problem, concrete numbers: "40W redundant decoding"). Do NOT open with "Hi everyone, today I'm excited to share..."
+- **No emoji**, no markdown headers (HN does not render markdown in the title area; the body is almost plain text too)
+- **Concrete numbers:** HN respects numbers. "40W", "24h", "25 fps × 16 cam × 3 consumer", "~400 LOC patch"
+- **"What it is not"** heads off commenters who would otherwise write "why don't you support Windows?". This is HN best practice
+- **Open questions at the bottom** drive discussion. Without them the first comment is "so what is this for?". With them it is "here's my experience with DeepStream"
+- **Avoid:** "battle-tested", "production-ready", "enterprise-grade", "10x faster than X"; the HN crowd specifically downvotes that
+- **Be ready** to reply actively for the **first 2 hours**: HN ranking depends heavily on engagement in the first hour. If you can't be online, don't publish
+- **If the author is not the main maintainer** of the repo, mention it in the first comment from your own account so it doesn't look like third-party PR
+
+## Alternative: r/selfhosted
+
+If HN feels too high-stakes, you can start with **r/selfhosted** (180k subs): the Frigate audience is there, so it is a direct fit. It is less brutal and an easier place to get early feedback.
+
+Title for reddit: `Reduced NVDEC saturation across Frigate + custom apps by sharing decoded frames over CUDA IPC - open-sourced the library`
+
+The reddit text should be shorter (the HN body is too long for reddit), but the idea is the same.