gx 612843bd39 docs: launch drafts (Frigate discussion + FFmpeg-devel RFC + Show HN)
3 drafts for upstream visibility (Etap E):
- docs/launch/frigate-integration-issue.md: Discussion on blakeblackshear/frigate
- docs/launch/ffmpeg-devel-rfc.md: RFC patch + cover letter for the ffmpeg-devel ML
- docs/launch/hn-show-post.md: Show HN draft (Etap F)
- docs/launch/README.md: ordering, checklist, pre-flight notes

See issue #3.
2026-05-19 02:04:42 +01:00


Show HN post (for Etap F, later)

Status: DRAFT. Do not publish yet. This file is a draft for Etap F (launch).

Where: https://news.ycombinator.com/submit

When to publish:

  • After the FFmpeg-devel RFC gets its first response (even a rejection counts as traction)
  • OR after the Frigate discussion gets 5+ upvotes / 3+ comments
  • OR if both stay silent for 2 weeks: publish anyway, the HN audience is more independent
  • Time: a weekday, 13:00-15:00 UTC (peak HN traffic from US morning + EU afternoon)
  • Not on a Friday evening, a weekend, or the day of a major tech event (Apple keynote, GTC, etc.): the post drowns in the noise
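
The publishing conditions above can be written down as a quick check. This is only a sketch: the function names are hypothetical, "5+ upvotes / 3+ comments" is read as "either one is enough", and tech-event days are not modelled, so they still need a manual look.

```python
from datetime import datetime, timezone


def launch_triggered(rfc_replied: bool, upvotes: int, comments: int,
                     days_silent: int) -> bool:
    """True once any of the three launch conditions holds."""
    # Assumption: either threshold counts as Frigate traction.
    frigate_traction = upvotes >= 5 or comments >= 3
    return rfc_replied or frigate_traction or days_silent >= 14


def in_posting_window(when: datetime) -> bool:
    """True on a weekday between 13:00 and 15:00 UTC.

    `when` must be timezone-aware; it is converted to UTC first.
    """
    utc = when.astimezone(timezone.utc)
    is_weekday = utc.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and 13 <= utc.hour < 15
```

Calling `in_posting_window(datetime.now(timezone.utc))` right before submitting is enough. The window ends at 15:00 UTC, so Friday evening is excluded automatically.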

Title

Options (pick one):

  1. Show HN: Cuframes - zero-copy sharing of decoded video frames between processes via CUDA IPC
  2. Show HN: Stop redecoding the same RTSP stream in every consumer
  3. Show HN: Cuframes - one NVDEC, many consumers, zero-copy in VRAM

I recommend #2: it describes the problem in nine words, and HN likes problem-first titles. #1 is also fine for the technical HN niche.

Body

Hi HN,

I run a homelab CCTV stack with 16 cameras feeding into Frigate (object
detection), a custom C++ analytics service, and a recording NVR. All three
were running NVDEC on the same RTSP streams. On an RTX 3060 this saturated
the decoder slots and the consumer GPUs in my office burnt about 40W of
redundant decoding when nothing interesting was happening.

So I wrote a small library that lets one process decode the stream once
into a CUDA ring buffer and the others import the same buffer via
cudaIpcOpenMemHandle. Each decoded NV12 frame lands in VRAM exactly once,
and every consumer reads it zero-copy.

Repo (LGPL-2.1+): https://git.goldix.org/gx/cuframes

What's in it:

  - libcuframes — the producer/consumer C/C++ library
  - cuframes-rtsp-source — standalone RTSP → cuframes bridge (one per cam)
  - A small out-of-tree FFmpeg demuxer ("cuframes://") so downstream
    consumers don't need to know they're consuming shared frames
  - Reference docker-compose for the Frigate + custom-app setup
  - 24h production deployment on the homelab, ~25 fps × 16 cameras × 3
    consumers from a single NVDEC session

What surprised me along the way:

  - CUDA IPC handles are bound to the device that allocated them, not just
    a CUDA context, so both peers must be on the same GPU. (Documented, but
    a bit out of the way in the Programming Guide §3.2.8.)
  - Cross-container CUDA IPC needs both the IPC and PID namespaces shared
    (--ipc and --pid), not just --ipc. The --pid requirement wasn't obvious
    from the error message ("invalid device context" with no mention of
    /proc visibility).
  - Frigate's s6-overlay is incompatible with --pid share because s6
    insists on being PID 1. There's a documented race-window workaround
    but it's the one rough edge.

What it is not:

  - Not a transcoding framework. No re-encoding, no filtering, no policy.
  - Not multi-GPU (CUDA IPC is single-device).
  - Not Windows / macOS / WSL2 / AMD.

What's next:

  - Upstream FFmpeg RFC for the demuxer (drafted, not sent yet — would
    appreciate review of the RFC text first).
  - v0.2 makes the FFmpeg path true zero-copy via AVHWFramesContext (no
    cudaMemcpy2DAsync round-trip).

Happy to answer questions. Especially interested in:

  - Anyone running multi-consumer GPU video pipelines with a different
    solution? Curious what tradeoffs you hit.
  - Vulkan-video folks: is there an obvious cross-process sharing path
    via VkExternalMemory + DMA-BUF that I'm missing? I went CUDA-only
    because that's what worked first, but Vulkan would be vendor-neutral.

— [your handle]

Notes for review

  • HN format: the first line is the hook (concrete problem, concrete numbers: "40W redundant decoding"). Do NOT open with "Hi everyone, today I'm excited to share..."
  • No emoji, no markdown headers (HN does not render markdown in the title area; the body is almost plain text too)
  • Concrete numbers: HN respects numbers. "40W", "24h", "25 fps × 16 cam × 3 consumer", "~400 LOC patch"
  • "What it is not" heads off the Vue Apologists who would otherwise write "why don't you support Windows?". This is an HN best practice
  • Open questions at the bottom drive discussion. Without them the first comment is "so what is this for?". With them it is "here's my experience with DeepStream"
  • Avoid: "battle-tested", "production-ready", "enterprise-grade", "10x faster than X". The HN crowd deliberately downvotes that kind of language
  • Be ready to reply actively for the first 2 hours: HN ranking depends heavily on engagement in the first hour. If you can't be online, don't post
  • If the author is not the repo's main maintainer, mention that in the first comment from your own account so it doesn't look like third-party PR
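
The throughput figures quoted in the post (25 fps × 16 cameras × 3 consumers) are worth sanity-checking before publishing, since HN readers will do the arithmetic. A minimal sketch, assuming that without sharing every consumer decodes every stream at the full frame rate (the function name is hypothetical):

```python
def decode_load(fps: int, cameras: int, consumers: int) -> tuple[int, int]:
    """Return (frames decoded per second with sharing, without sharing)."""
    shared = fps * cameras                 # each stream is decoded once
    unshared = fps * cameras * consumers   # each consumer decodes each stream
    return shared, unshared


shared, unshared = decode_load(fps=25, cameras=16, consumers=3)
print(f"shared: {shared} frames/s, unshared: {unshared} frames/s")
print(f"redundant decodes avoided: {unshared - shared} frames/s")
```

Under that assumption the shared setup decodes 400 frames/s instead of 1200, so two thirds of the decode work disappears. That matches the "decode once, read three times" framing of the post.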

Alternative: r/selfhosted

If HN feels too high-stakes, r/selfhosted (180k subs) can go first: that's where the Frigate audience is, a direct fit. It is less brutal, and early feedback is easier to get.

Title for reddit: Reduced NVDEC saturation across Frigate + custom apps by sharing decoded frames over CUDA IPC, open-sourced the library

The reddit text is shorter (the HN body is too long for reddit), but the idea is the same.