cuframes

Author	SHA1	Message	Date
gx	00fb3e9528	ci: preinstall node+git in the CUDA container (actions/checkout requires node) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 21:47:25 +01:00
gx	4a6a6f4a6c	ci: gitea Actions workflows (build, release) + README badges - .gitea/workflows/build.yml, on push/PR: * cmake build on the CUDA 12.4 devel image (Ubuntu 22.04 base) * compile-only smoke (no GPU needed): libcuframes.so + tools + examples * install-prefix layout verify (headers + libs in the correct paths) * filter/: clone FFmpeg n7.1 + apply patch + build a minimal patched ffmpeg, verify the cuframes demuxer is registered - .gitea/workflows/release.yml, on tag v: build the runtime Docker image, push to git.goldix.org/gx/cuframes:<version> * build the source tarball cuframes-<version>.tar.gz as an artifact - README.md badges: build status, release version, license Runner: gitea act_runner v0.4.1 on R9-88.23; the ubuntu-22.04 / ubuntu-24.04 labels are available via docker.gitea.com/runner-images. The CUDA devel image uses nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04 (already cached on the runner host). The stress test (requires a GPU) is intentionally NOT in CI because the runner has no GPU. Run it separately on a dev machine via ctest. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 21:43:55 +01:00
gx	12708618d4	docs: reference integrations + examples - docs/integrations/frigate.md: full production-tested guide covering Dockerfile, docker-compose, config.yml, troubleshooting (s6+pid, scale_cuda, hwaccel issues), build steps - docs/integrations/cctv-cpp.md: C++ pattern with IFrameSource interface + CuframesSource skeleton + CMake setup + runtime requirements - examples/frigate-compose/: reference compose stack (cuframes-pub + Frigate) with a config.yml stub, .env.example, README - examples/python-consumer/: ctypes-based skeleton for AI/ML pipelines (until the native pybind11 bindings in v0.3) - docs/integration.md: now an index page that links to the specific guides. The reorganization simplifies onboarding: the user picks a guide by integration type (Frigate/C++/Python/FFmpeg) and immediately sees real code. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 21:37:35 +01:00
gx	a3ba3a95b2	docs: ROADMAP + CHANGELOG v0.1.0 + BENCHMARKS - ROADMAP.md: structured v0.1✅ / v0.2📋 (encoded packet sharing + FFmpeg upstream PR + scale-cuda alt) / v0.3 (Python bindings, Jetson, multi-GPU) / v1.0 (stable ABI) - CHANGELOG.md: full v0.1.0 release notes — features, tested config, production deployment, known limitations - BENCHMARKS.md: measurements (stress 1×pub×4×sub, E2E real camera, prod multi-consumer 24h, VRAM cost per resolution, cuframes vs N×NVDEC) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> v0.1.0	2026-05-18 21:11:37 +01:00
gx	601806a5f8	build: add cmake install rules for libcuframes cmake --install now correctly places libcuframes.so/.a in lib/ and headers in include/cuframes/. Needed for downstream builders (FFmpeg patched build, deb packaging). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 12:52:16 +01:00
gx	99ab0e0524	Merge pull request 'feat(filter): FFmpeg 7.1 cuframes:// input demuxer (PoC v1)' (#1) from feat/ffmpeg-demuxer into main Reviewed-on: #1	2026-05-17 09:08:09 +01:00
gx	99df68f69c	feat(filter): FFmpeg 7.1 cuframes:// input demuxer Out-of-tree patch + sources for an FFmpeg demuxer that lets any FFmpeg-based consumer (Frigate, custom recorders, re-streamers) read "cuframes://<key>" as an ordinary URL, without its own NVDEC. Contents: - filter/cuframesdec.c: implementation (libavformat-style) - filter/ffmpeg-7.1-cuframes-demuxer.patch: patch for FFmpeg n7.1 (Makefile / allformats.c / configure) - filter/README.md: build instructions + CLI smoke test + Frigate plan v1 limitations (intentional): - NV12 only - GPU → CPU copy via cudaMemcpy2DAsync (zero-copy AVHWFramesContext is v2) CLI smoke test 2026-05-17 (host build of FFmpeg + libcuframes, publisher on camera 192.168.88.98, 1920x1080 HEVC 25fps): ffmpeg -f cuframes -i cuframes://cam-ff -c:v copy -f null - → frame=100 fps=25 q=-1.0 speed=1x ✓ → "cuframes: connected to 'cam-ff' — 1920x1080 NV12" Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 09:02:12 +01:00
gx	f10413580d	docs: cross-container CUDA IPC requires sharing both the --ipc and --pid namespaces A real test on 192.168.88.98 (1920x1080 HEVC, 25fps) showed that for separate consumer containers ipc=container:X is not enough: pid=container:X is also needed, otherwise cudaIpcOpenEventHandle fails with invalid device context. The CUDA driver validates the IPC peer via /proc/<pid>/... E2E verified on a real camera: publisher (separate container) -> consumer (docker exec): 250 frames, 0 gaps publisher (separate container) -> consumer (separate, with pid+ipc): 200, 0 gaps Updated: - docs/integration.md compose snippet, verification, troubleshooting section - docker-compose.example.yml: added pid: container:cuframes-cam-test - README.md quickstart: added --pid to the subscriber docker run Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-15 06:37:09 +01:00
gx	44dab75e08	docs+docker: integration guide and runtime image for the Frigate/cctv stack docs/integration.md: detailed guide for integrating into an existing CCTV docker-compose, covering critical requirements (ipc=shareable/container, a common shared volume for the socket), an example CuframesSource for cctv-processor, a verification checklist, troubleshooting (timeout, ipc namespace mismatch, high latency). Noted: in v0.1 frigate-decode cannot be removed without patching FFmpeg; that is v0.2 scope. docker/Dockerfile.runtime: multi-stage build (devel → runtime), copies libcuframes.so + cuframes-rtsp-source + sub_count into /usr/local. The image is ~700 MB (vs ~7 GB for the dev image). Smoke test: the binaries start, ldd resolves all required libs. docker-compose.example.yml: reference docker-compose with the correct ipc mode and volume mounts, for copying into your own projects. .dockerignore: excludes build/ and build-*/ from the COPY context. README updated: status v0.1 done, quickstart with a real docker run, link to the integration guide.	2026-05-14 23:47:56 +01:00
gx	a21812d3f6	tools+examples+test: end-to-end pipeline ready (Steps 9-10) cuframes-rtsp-source: standalone bridge between RTSP/file and cuframes IPC. Decodes on CUDA (nvdec), copies D2D into a pre-allocated pool (EXTERNAL ownership), publishes via cuframes. --realtime paces file input, --loop loops it. An alternative to the FFmpeg filter until v0.2 (the filter requires patching FFmpeg, which conflicts with Frigate's bundled build). examples/sub_count: reference subscriber on the raw C API that counts frames, tracks gaps, and exits cleanly on disconnect/timeout/SIGINT. test_stress (4 subscribers × 2000 frames @ 120fps): PASS on RTX 5090. 0 torn frames for all consumers (including 2 slow ones with a 5ms sleep). Smoke-tested: testsrc 25fps → cuframes-rtsp-source → cuframes IPC → sub_count (separate process) → 200/200 frames, 0 gaps, avg_fps=25.2.	2026-05-14 23:39:01 +01:00
gx	2530057507	hpp: C++ RAII wrapper (header-only, Step 7) A thin layer over the C API: - cuframes::Error: exception thrown on errors, code() for details - cuframes::Publisher: RAII wrapper for the publisher (LIBRARY + EXTERNAL constructors) - cuframes::Subscriber + cuframes::FrameRef: RAII frame with auto-release - cuframes::AsyncSubscriber: with std::function callbacks - cuframes::Frame: read-only view (for the callback) - cuframes::calc_frame_size(), now_ns(): utilities Smoke test (in dev container): $ g++ -std=c++17 ... -lcuframes -lcudart smoke.cpp $ ./smoke version: 0.1.0 FullHD NV12 frame: 3317760 bytes (pitch_y=2048, pitch_uv=2048)	2026-05-14 23:23:35 +01:00
gx	46c2b94939	libcuframes v0.1: producer + consumer (sync + async) + tests Implements Steps 3-6 of Phase 1 according to docs/protocol.md. libcuframes/src/: - internal.h (660 lines): shared structs (byte-exact protocol.md layout) + _Static_assert on offsets/sizes - utils.c: error strings, frame size calc, now_ns, key validation - protocol.c: TLV framing for the Unix socket with poll-based timeout - producer.c (~700 lines), Step 3: * LIBRARY mode: cudaMalloc pool, IPC handle export * EXTERNAL mode: register user-provided pointers * cudaIpcEventHandle_t for cross-process sync (R1/R2) * Unix socket accept thread, handshake state machine * Bit allocation 1..31, name collision check (Y5) * STRICT_WAIT policy: timeout with dead-subscriber eviction - consumer.c (~400 lines), Step 4: * Synchronous next() with poll-based wait * cudaStreamWaitEvent on the consumer stream (R1/R2) * Opaque cuframes_frame_t with accessor functions (Y6) * NEWEST_ONLY and STRICT_ORDER modes * ACK via atomic_fetch_or on the bitmap - consumer_async.c, Step 5: thread + callback wrapper over the sync API libcuframes/tests/: - test_pingpong.cu: single producer × single consumer, 200 frames @ 60fps, verified via a kernel on the consumer stream (the correct test for sync semantics, see spike-v2) - test_multi.cu: 1 producer × 3 consumers via fork() Build: - Top-level CMakeLists.txt with options - libcuframes/CMakeLists.txt: shared + static library, c_std_11 - Suppress -Waddress-of-packed-member (a known-safe warning on x86_64) Results (inside the cuframes-dev container, RTX 5090): - pingpong_basic PASS 4.5s 200 frames, 0 torn - multi_consumer PASS 4.1s 1 × 3 consumers, all PASS Phase 1 Step 6 done. Next: Step 7 (C++ wrapper), Step 9 (FFmpeg filter).	2026-05-14 23:21:30 +01:00
gx	dc478c7cda	docs: system requirements (hardware, software, build, Docker, k8s) docs/requirements.md (220 lines): - Hardware: NVIDIA GPU CC ≥7.5 (Turing+), Linux x86_64, VRAM/RAM/CPU minimum - Software host: kernel ≥5.4, driver ≥525/555, glibc ≥2.31, Ubuntu/Debian/RHEL - Build deps: CUDA Toolkit ≥12.0, GCC 11+, CMake 3.20+, FFmpeg 4.4+ - Docker: nvidia-container-toolkit, --gpus, --ipc=shareable, --shm-size=2gb - Cross-container CUDA IPC: variant A (--ipc=container:X), variant B (host), k8s via emptyDir + shareProcessNamespace - Out-of-scope: AMD/Intel/macOS/Windows/WSL2/Jetson/multi-GPU/multi-host - Quick-check commands (nvidia-smi, uname, ldd, df /dev/shm) - Tested matrix (Phase 0): RTX 5090, driver 595, CUDA 13.0.88, Ubuntu 24.04 README.md updated: - Short table of minimum vs recommended - List of unsupported platforms - Links to all docs/ files (architecture, protocol, requirements, benchmarks)	2026-05-14 23:11:30 +01:00
gx	6608f5d2f6	docs(protocol): bit-exact wire protocol specification (R4) Closes the last RED flag from the arch review. What is described (§-sections): 1. Resources & lifecycle (socket / shm / IPC handles cleanup, crash recovery) 2. Shared memory byte-by-byte layout (offsets, packing, atomics) 2.1 frame meta (64 bytes) 2.2 slot descriptor (192 bytes) 2.3 subscriber slot (128 bytes) 3. Unix socket TLV protocol (8 message types, framing) 4. State machines (subscriber-side, publisher-side per-subscriber) 5. ACK protocol with cudaEventRecord / cudaStreamWaitEvent 6. Versioning rules (proto_version vs lib_version, reserved fields) 7. Conformance test skeleton (offset checks, sizeof checks, handshake) 8. Open for v0.2 (TLS, multi-format, ROCm) 9. Reference impl pointer (libcuframes/src/protocol.c, Phase 1) After the v0.2 release the wire protocol is frozen; breaking changes = bump proto_version. Until v0.2 it is experimental. Resolves all 4 items from arch review section R4: ✓ SHM layout (annotated struct + ASCII layout) ✓ Socket protocol (state machine + message framing) ✓ Versioning rules ✓ Lifecycle / cleanup (incl. CUDA IPC handle leak on crash) Ready for Step 2 (Phase 1 implementation).	2026-05-14 23:04:46 +01:00
gx	98a60b7730	header v2: address arch review R3 + Y4/Y5/Y6/Y7/Y9 R3 (publisher API does not work with FFmpeg's hwframe pool): - Added an ownership_mode field: LIBRARY (default, the current API) or EXTERNAL. - New function publisher_create_external(cuda_ptrs[], ptr_count, frame_size) for the case where CUDA memory is allocated upstream (FFmpeg AVHWFramesContext). - New publish_external(cuda_ptr): publishes one of the pre-registered handles. - The FFmpeg filter is now zero-copy: the filter receives an AVFrame, the library already holds an IPC handle for that pointer (registered in create), and publish is an atomic seq bump. R1/R2 closure reflected in the API: - publish() now takes a cudaStream_t; the library does cudaEventRecord instead of a stream sync. - next() now takes a consumer_stream; the library does cudaStreamWaitEvent before returning the frame. Cross-process sync via cudaIpcEventHandle_t. Y6 (opaque frame via a handle, not a struct with _internal_*): - cuframes_frame_t became opaque (typedef struct, not defined). - Accessor functions: cuda_ptr, format, size, pitch_y, pitch_uv, seq, pts_ns. - ABI-stable when fields are added in minor releases. Y7 (redundant try_next): - Removed subscriber_try_next. next(.., timeout_ms=0) is non-blocking with CUFRAMES_ERR_WOULD_BLOCK. Y5 (consumer_name uniqueness): - Documented that a duplicate name → ALREADY_EXISTS. - Added CUFRAMES_ERR_TOO_MANY for the case of >32 subscribers. Y9 (pts_ns clock): - Documented that it is MONOTONIC on the publisher side; the consumer must sanity-check for an epoch reset on publisher restart. Also: - The meta block (cuframes_frame_meta_t) is no longer public; meta is available via accessors on the opaque frame. - _reserved[4] in configs for forward compat without breaking the ABI. - Added cuframes_protocol_version(): the wire protocol major is versioned separately from the lib version. Ready for Step 2 (docs/protocol.md + implementation).	2026-05-14 23:02:50 +01:00
gx	fe330ca279	arch: close open question §6.6: events as default for cross-process sync See spike-v2 (commit `ad54305`) + arch review 2026-05-15. cudaStreamSynchronize-only works in practice on single-host single-GPU (0 torn in 4 PoC scenarios), but NVIDIA Programming Guide §3.2.8 gives no contractual guarantee. Switching to cudaIpcEventHandle_t as the default; stream-sync remains an optional fallback. Net: +20µs mean latency, 3× lower max latency (predictable tail), future-proof for multi-GPU.	2026-05-14 23:00:40 +01:00
gx	ad543054fc	spike-v2: validate sync semantics (R1/R2 architectural review) The architectural review (2026-05-15) pointed out that cudaStreamSynchronize-only on the producer side is not sufficient for cross-process visibility: NVIDIA Programming Guide §3.2.8 requires cudaIpcEventHandle_t. Phase 0 PoC v1 did not test this case because of cudaMemcpy, which has implicit barriers. spike-v2 reproduces the correct scenario: the consumer launches verify_kernel on a SEPARATE stream (real-world use case: PyTorch / OpenCV CUDA), and the pattern includes a row-based component to catch partial-frame tearing. Run of 4 scenarios × 1500/600 frames: A-fhd60 (stream sync, FHD@60): 0 torn, p99=267µs, max=14.7ms B-fhd60 (event sync, FHD@60): 0 torn, p99=344µs, max=5.2ms A-4k30 (stream sync, 4K@30): 0 torn, p99=606µs, max=4.4ms B-4k30 (event sync, 4K@30): 0 torn, p99=437µs, max=3.7ms All 4 showed 0 torn frames. R1 does not actually reproduce on single-host single-GPU, but NVIDIA does not contractually guarantee this. Decision: events as default (R1/R2 resolved). Architecture.md §6.6 closed. Tradeoff: mean latency +20µs, max latency 3× lower (predictable tail) + future-proof for multi-GPU. Also Dockerfile.dev: CUDA updated to 13.0.3 (12.4 does not exist with devel-ubuntu24.04). Related to PR review: R1, R2, R3 (R3, R4 in the following commits).	2026-05-14 23:00:13 +01:00
gx	c2c2a9751a	phase0: benchmark results, PASSED on RTX 5090 (Blackwell sm_120) Basic (1 producer × 1 consumer): p50=75µs p95=146µs p99=152µs (target was <5ms; we are 33× below it) 500 frames, 0 torn, 0 skipped, zero-copy verified Multi-consumer (1 × 3): p99 for all 3: 151-152µs (identical = proof of zero-copy without contention) 300 frames each, 0 torn, 0 skipped Acceptance criteria: GREEN. Moving on to Phase 1 (libcuframes API). Sync via cudaStreamSynchronize is sufficient for v0.1; CUDA IPC event handles overlap is deferred to v0.2. Raw measurement logs are saved in docs/measurements/phase0-consumer-*.log for verification (4 files from 2 scenarios). Also fixed an unused variable warning in pingpong_consumer.cu.	2026-05-14 22:02:49 +01:00
gx	604cffb5e5	spike(phase0): minimal CUDA IPC ping-pong producer/consumer PoC to validate the concept before investing in Phase 1. Structure: - tools/spike/common.h: SharedHeader / SlotDescriptor / NV12 meta types - tools/spike/pingpong_producer.cu: allocates a CUDA pool, exports IPC handles to /dev/shm/cuframes-spike-<key>, simulates publishing frames with a monotonic pattern - tools/spike/pingpong_consumer.cu: opens the handles, reads frames, verifies contents (no torn frames), measures latency, prints a summary - tools/spike/CMakeLists.txt: sm_75/86/89/90/120 for RTX 5090 - tools/spike/bench.sh: basic / multi-consumer / stress scenarios - tools/spike/README.md: what / how / acceptance Intentional PoC simplifications (we do not go into Phase 1 without validation): - 2-slot ring (Phase 1 will have N) - POSIX shared memory + atomic seq (no Unix socket handshake) - cudaStreamSynchronize sync (the Phase 0 spike will check whether it is sufficient; the cudaIpcEventHandle_t alternative is deferred) - NV12 hardcoded (other formats in Phase 1) - Drop-oldest backpressure (no ACK protocol) Phase 0 acceptance: - p99 latency on RTX 5090 for FullHD < 5 ms - throughput ≥ 1 GB/s - multi-consumer (3) with comparable latency - cross-container works - 1-hour stress without VRAM/RAM leak If acceptance fails → design revision (sync via CUDA IPC events).	2026-05-14 21:20:39 +01:00
gx	6962bc3c7e	docker: dev environment with CUDA 12.4 + build tools Dockerfile.dev + docker-compose.dev.yml + docker/README.md. Base: nvidia/cuda:12.4.1-cudnn-devel-ubuntu24.04. The container includes: - CUDA toolkit (nvcc, headers, libs) - GCC 12, Clang + clang-format + clang-tidy - CMake + Ninja - FFmpeg dev headers (system 6.x), for linking while developing the filter - Python 3.12 + dev (for Phase 3 bindings) - Profiling/debug tools: valgrind, gdb, strace, ltrace docker-compose.dev.yml settings: - runtime: nvidia + --gpus all - ipc: shareable, for cross-container CUDA IPC (Phase 1+) - shm_size: 2gb, since the default 64 MB is not enough for frame buffers - SYS_PTRACE + seccomp:unconfined, for gdb/strace inside (dev-only) - bind mount of the repo root → /workspace - /run/cuframes for Unix sockets Usage is documented in docker/README.md. Production images (FFmpeg-with-plugin, Frigate drop-in) are separate work in Phase 4.	2026-05-14 21:18:32 +01:00
gx	c8ab4522f2	initial commit: design specification + repo scaffolding cuframes is an open-source FFmpeg plugin and runtime library for zero-copy sharing of decoded video frames between processes via CUDA IPC. Contents of the initial commit: - docs/architecture.md: full design spec (418 lines) with prior art, protocol design, API draft, phase plan, acceptance criteria - README.md: landing page with a description of the idea, the components, a quickstart tease, roadmap, links to community discussions confirming demand - CONTRIBUTING.md: guidelines, code style, commit message convention - CHANGELOG.md: Keep a Changelog format, Unreleased / 0.0.1 - LICENSE: LGPL-2.1+ (compatibility with FFmpeg) - .gitignore: build/CMake/Docker/Python/CUDA-specific Next steps (separate commits): - docker/Dockerfile.dev (CUDA 12.x dev environment) - tools/spike/ (Phase 0 PoC code for measuring CUDA IPC latency)	2026-05-14 21:17:34 +01:00
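
Commit 46c2b94939 mentions subscriber bit allocation in the range 1..31, ACK via atomic_fetch_or on a bitmap, and eviction of dead subscribers. The toy model below sketches how such a per-slot bitmap could behave. It is not the library's code: the class and method names are hypothetical, and a lock stands in for the C11 atomics the commit refers to.

```python
import threading


class AckBitmap:
    """Toy per-slot ACK bitmap; all names here are hypothetical."""

    def __init__(self):
        self._lock = threading.Lock()  # stand-in for C11 atomic operations
        self._active = 0               # bits of registered subscribers
        self._acked = 0                # bits of subscribers that released the slot

    def register(self):
        """Allocate the lowest free bit in 1..31, as in the commit message."""
        with self._lock:
            for bit in range(1, 32):
                if not self._active & (1 << bit):
                    self._active |= 1 << bit
                    return bit
        raise RuntimeError("too many subscribers")

    def ack(self, bit):
        """Emulate atomic_fetch_or: set the bit and return the previous bitmap."""
        with self._lock:
            previous = self._acked
            self._acked |= 1 << bit
            return previous

    def evict(self, bit):
        """Drop a dead subscriber so it no longer blocks slot reuse."""
        with self._lock:
            self._active &= ~(1 << bit)
            self._acked &= ~(1 << bit)

    def reusable(self):
        """The slot can be overwritten once every active subscriber has ACKed."""
        with self._lock:
            return (self._acked & self._active) == self._active


bitmap = AckBitmap()
fast, slow = bitmap.register(), bitmap.register()
bitmap.ack(fast)
print(bitmap.reusable())  # False: the slow subscriber has not ACKed yet
bitmap.evict(slow)
print(bitmap.reusable())  # True: only the fast subscriber remains
```

In this model a publisher with a STRICT_WAIT-style policy would wait until reusable() is true, and call evict() for a subscriber that times out.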

21 Commits
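
The smoke test in commit 2530057507 reports a FullHD NV12 frame of 3317760 bytes with pitch_y=2048 and pitch_uv=2048. The sketch below reproduces that arithmetic as a sanity check. The 256-byte pitch alignment and the function names are assumptions chosen to match the reported numbers; the real cuframes::calc_frame_size() may derive the pitch differently.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def nv12_frame_size(width, height, pitch_alignment=256):
    """Return (pitch_y, pitch_uv, total_bytes) for an NV12 frame.

    pitch_alignment=256 is an assumption, not taken from the library.
    """
    pitch_y = align_up(width, pitch_alignment)
    # The interleaved UV plane has width/2 samples of 2 bytes per row,
    # so its rows are as wide in bytes as the Y rows.
    pitch_uv = pitch_y
    y_bytes = pitch_y * height
    uv_bytes = pitch_uv * ((height + 1) // 2)  # chroma has half the rows
    return pitch_y, pitch_uv, y_bytes + uv_bytes


print(nv12_frame_size(1920, 1080))  # (2048, 2048, 3317760)
```

With these assumptions the Y plane takes 2048 × 1080 = 2211840 bytes and the UV plane 2048 × 540 = 1105920 bytes, which sums to the 3317760 bytes printed by the smoke test.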