cuframes

Author	SHA1	Message	Date
gx	612843bd39	docs: launch drafts (Frigate discussion + FFmpeg-devel RFC + Show HN) Three drafts for upstream visibility (Stage E): - docs/launch/frigate-integration-issue.md: Discussion on blakeblackshear/frigate - docs/launch/ffmpeg-devel-rfc.md: RFC patch + cover letter for the ffmpeg-devel ML - docs/launch/hn-show-post.md: Show HN draft (Stage F) - docs/launch/README.md: ordering, checklist, pre-flight notes See issue #3.	2026-05-19 02:04:42 +01:00
gx	fbe1d18c39	docs: troubleshooting guide + production notes - docs/troubleshooting.md: 13 sections covering the real pitfalls we ran into: cudaIpcOpenEventHandle invalid device context (pid namespace), s6-overlay vs pid share, scale_cuda missing (cuda-llvm + stdbit.h glibc 2.36), libcuframes not found install paths, ffbuild/ missing source, GMP no working compiler (long-long reliability), zlib.net deprecated URL, RTSP/RTP UDP docker NAT, gitea actions Node version - docs/architecture.md: Appendix A "Production deployment notes" with real observations after a 24h+ run: what was confirmed, what we reworked, what we had not accounted for - docs/requirements.md: production deployment matrix + Docker namespace requirements table (cross-container CUDA IPC requires 5 conditions) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-19 00:37:13 +01:00
gx	12708618d4	docs: reference integrations + examples - docs/integrations/frigate.md: complete production-tested guide: Dockerfile, docker-compose, config.yml, troubleshooting (s6+pid, scale_cuda, hwaccel issues), build steps - docs/integrations/cctv-cpp.md: C++ pattern: IFrameSource interface + CuframesSource skeleton + CMake setup + runtime requirements - examples/frigate-compose/: reference compose stack (cuframes-pub + Frigate) with config.yml stub, .env.example, README - examples/python-consumer/: ctypes-based skeleton for AI/ML pipelines (until the native pybind11 bindings in v0.3) - docs/integration.md: now an index page that links to the specific guides The reorganization simplifies onboarding: the user picks a guide by integration type (Frigate/C++/Python/FFmpeg) and immediately sees real code. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 21:37:35 +01:00
gx	f10413580d	docs: cross-container CUDA IPC requires both --ipc and --pid namespace share A real test on 192.168.88.98 (1920x1080 HEVC, 25fps) showed that for separate consumer containers ipc=container:X is not enough: pid=container:X is also required, otherwise cudaIpcOpenEventHandle fails with invalid device context. The CUDA driver validates the IPC peer via /proc/<pid>/... E2E verified on a real camera: publisher (separate container) -> consumer (docker exec): 250 frames, 0 gaps publisher (separate container) -> consumer (separate container with pid+ipc): 200, 0 gaps Updated: - docs/integration.md compose snippet, verification, troubleshooting section - docker-compose.example.yml: added pid: container:cuframes-cam-test - README.md quickstart: added --pid to the subscriber docker run Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-15 06:37:09 +01:00
gx	44dab75e08	docs+docker: integration guide and runtime image for the Frigate/cctv stack docs/integration.md: detailed guide for integrating into an existing CCTV docker-compose: critical requirements (ipc=shareable/container, a common shared volume for the socket), example CuframesSource for cctv-processor, verification checklist, troubleshooting (timeout, ipc namespace mismatch, high latency). Noted: in v0.1 the Frigate decode cannot be removed without an FFmpeg patch; that is v0.2 scope. docker/Dockerfile.runtime: multi-stage build (devel → runtime), copies libcuframes.so + cuframes-rtsp-source + sub_count into /usr/local. The image is ~700 MB (vs ~7 GB for the dev image). Smoke test: the binaries start, ldd resolves all required libs. docker-compose.example.yml: reference docker-compose with the correct ipc mode and volume mounts, meant to be copied into your own projects. .dockerignore: excludes build/ and build-*/ from the COPY context. README updated: status v0.1 done, quickstart with a real docker run, link to the integration guide.	2026-05-14 23:47:56 +01:00
gx	dc478c7cda	docs: system requirements (hardware, software, build, Docker, k8s) docs/requirements.md (220 lines): - Hardware: NVIDIA GPU CC ≥7.5 (Turing+), Linux x86_64, VRAM/RAM/CPU minimum - Software host: kernel ≥5.4, driver ≥525/555, glibc ≥2.31, Ubuntu/Debian/RHEL - Build deps: CUDA Toolkit ≥12.0, GCC 11+, CMake 3.20+, FFmpeg 4.4+ - Docker: nvidia-container-toolkit, --gpus, --ipc=shareable, --shm-size=2gb - Cross-container CUDA IPC: variant A (--ipc=container:X), variant B (host), k8s via emptyDir + shareProcessNamespace - Out-of-scope: AMD/Intel/macOS/Windows/WSL2/Jetson/multi-GPU/multi-host - Quick-check commands (nvidia-smi, uname, ldd, df /dev/shm) - Tested matrix (Phase 0): RTX 5090, driver 595, CUDA 13.0.88, Ubuntu 24.04 README.md updated: - Short minimum vs recommended table - List of unsupported platforms - Links to all docs/ files (architecture, protocol, requirements, benchmarks)	2026-05-14 23:11:30 +01:00
gx	6608f5d2f6	docs(protocol): bit-exact wire protocol specification (R4) Closes the last RED flag from the arch review. What is covered (§ sections): 1. Resources & lifecycle (socket / shm / IPC handles cleanup, crash recovery) 2. Shared memory byte-by-byte layout (offsets, packing, atomics) 2.1 frame meta (64 bytes) 2.2 slot descriptor (192 bytes) 2.3 subscriber slot (128 bytes) 3. Unix socket TLV protocol (8 message types, framing) 4. State machines (subscriber-side, publisher-side per-subscriber) 5. ACK protocol with cudaEventRecord / cudaStreamWaitEvent 6. Versioning rules (proto_version vs lib_version, reserved fields) 7. Conformance test skeleton (offset checks, sizeof checks, handshake) 8. Open items for v0.2 (TLS, multi-format, ROCm) 9. Reference impl pointer (libcuframes/src/protocol.c, Phase 1) After the v0.2 release the wire protocol is frozen; breaking changes = bump proto_version. Before v0.2 it is experimental. Resolves all 4 points from arch review section R4: ✓ SHM layout (annotated struct + ASCII layout) ✓ Socket protocol (state machine + message framing) ✓ Versioning rules ✓ Lifecycle / cleanup (incl. CUDA IPC handle leak on crash) Ready for Step 2 (Phase 1 implementation).	2026-05-14 23:04:46 +01:00
gx	fe330ca279	arch: close open question §6.6 - events as default for cross-process sync See spike-v2 (commit `ad54305`) + arch review 2026-05-15. cudaStreamSynchronize-only works in practice on single-host single-GPU (0 torn in 4 PoC scenarios), but NVIDIA Programming Guide §3.2.8 gives no contractual guarantee. Switching to cudaIpcEventHandle_t as the default; stream-sync remains an optional fallback. Net: +20µs mean latency, 3× lower max latency (predictable tail), future-proof for multi-GPU.	2026-05-14 23:00:40 +01:00
gx	ad543054fc	spike-v2: validate sync semantics (R1/R2 architectural review) The architectural review (2026-05-15) pointed out that cudaStreamSynchronize-only on the producer side is not sufficient for cross-process visibility: NVIDIA Programming Guide §3.2.8 requires cudaIpcEventHandle_t. Phase 0 PoC v1 did not exercise this case because of cudaMemcpy, which has implicit barriers. spike-v2 reproduces the correct scenario: the consumer launches verify_kernel on a SEPARATE stream (real-world use case: PyTorch / OpenCV CUDA), and the pattern includes a row-based component to catch partial-frame tearing. Ran 4 scenarios × 1500/600 frames: A-fhd60 (stream sync, FHD@60): 0 torn, p99=267µs, max=14.7ms B-fhd60 (event sync, FHD@60): 0 torn, p99=344µs, max=5.2ms A-4k30 (stream sync, 4K@30): 0 torn, p99=606µs, max=4.4ms B-4k30 (event sync, 4K@30): 0 torn, p99=437µs, max=3.7ms All 4 showed 0 torn frames. R1 does not reproduce in practice on single-host single-GPU, but NVIDIA does not contractually guarantee this. Decision: events as default (R1/R2 resolved). Architecture.md §6.6 closed. Tradeoff: mean latency +20µs, max latency 3× lower (predictable tail) + future-proof for multi-GPU. Also Dockerfile.dev: CUDA updated to 13.0.3 (12.4 does not exist with devel-ubuntu24.04). Related to PR review: R1, R2, R3 (R3, R4 follow in later commits).	2026-05-14 23:00:13 +01:00
gx	c2c2a9751a	phase0: benchmark results - PASSED on RTX 5090 (Blackwell sm_120) Basic (1 producer × 1 consumer): p50=75µs p95=146µs p99=152µs (target was <5ms; we are 33× below it) 500 frames, 0 torn, 0 skipped, zero-copy verified Multi-consumer (1 × 3): p99 for all 3: 151-152µs (identical = proof of zero-copy without contention) 300 frames each, 0 torn, 0 skipped Acceptance criteria: GREEN. Moving on to Phase 1 (libcuframes API). Sync via cudaStreamSynchronize is sufficient for v0.1; CUDA IPC event handles overlap is deferred to v0.2. Raw measurement logs are saved in docs/measurements/phase0-consumer-*.log for verification (4 files from 2 scenarios). Also fixed an unused variable warning in pingpong_consumer.cu.	2026-05-14 22:02:49 +01:00
gx	c8ab4522f2	initial commit: design specification + repo scaffolding cuframes is an open-source FFmpeg plugin and runtime library for zero-copy sharing of decoded video frames between processes via CUDA IPC. Contents of the initial commit: - docs/architecture.md: full design spec (418 lines) with prior art, protocol design, API draft, phase plan, acceptance criteria - README.md: landing page with a description of the idea, components, quickstart tease, roadmap, links to community discussions confirming demand - CONTRIBUTING.md: guidelines, code style, commit message convention - CHANGELOG.md: Keep a Changelog format, Unreleased / 0.0.1 - LICENSE: LGPL-2.1+ (compatibility with FFmpeg) - .gitignore: build/CMake/Docker/Python/CUDA-specific Next steps (separate commits): - docker/Dockerfile.dev (CUDA 12.x dev environment) - tools/spike/ (Phase 0 PoC code for measuring CUDA IPC latency)	2026-05-14 21:17:34 +01:00

11 Commits