cuframes

Author	SHA1	Message	Date
gx	655649f4d8	cmake: use PROJECT_SOURCE_DIR instead of CMAKE_SOURCE_DIR. When cuframes is built as a subproject of a parent CMake project (add_subdirectory), CMAKE_SOURCE_DIR points to the parent's root, not to cuframes. As a result, target_include_directories for cuframes received the wrong path and compilation failed with "fatal error: cuframes/cuframes.h: No such file or directory". PROJECT_SOURCE_DIR resolves to the directory of the project() call, so it always points to the cuframes root regardless of how the project is included. Standalone builds behave as before, since both paths are identical.	2026-06-03 04:27:24 +01:00
gx	4862247fe2	v0.4: VMM + POSIX FD, namespace decoupling (no pid share required). Replaces cudaMalloc + cudaIpcGetMemHandle with cuMemCreate (VMM) + cuMemExportToShareableHandle(POSIX_FILE_DESCRIPTOR). FDs are passed to the consumer via sendmsg(SCM_RIGHTS) during the handshake. Frigate (whose s6-overlay does not allow sharing the PID namespace) and any other consumer now work WITHOUT a pid namespace share; they need only a volume mount of the unix socket /run/cuframes and an IPC share for the /dev/shm header. Sync: cudaEventRecord + IPC events are replaced by cuStreamSynchronize in do_publish. The producer waits ~1 ms for the stream to flush, then does atomic_store(seq). The consumer reads seq with memory_order_acquire and copies DtoD without an event wait; HW coherence is guaranteed on a single GPU. ABI break (agreed with the user): magic 0xCC7C1DCC → 0xCC7C1DCE (old consumers fail cleanly); protocol V3 → V4; the libcuframes.so.0 SOVERSION stays, but .so.0.3.0 → .so.0.4.0; EXTERNAL ownership is removed (VMM requires cuMemCreate-allocated memory, and an arbitrary cudaMalloc pointer cannot be exported as a POSIX FD); cuframes-rtsp-source is switched to LIBRARY mode plus one D2D memcpy into the acquired slot (the overhead is small, since the publisher previously did the same D2D copy from the FFmpeg hwframe pool into the EXTERNAL pool anyway). Size: allocation granularity is 2 MB on the 5090, so NV12 1920×1080 (~3.1 MB) is rounded up to 4 MB; +1 MB per slot × 16 × 4 cameras = +64 MB VRAM. Acceptable. The packet ring (cuframes_packets://) is NOT affected: it is a separate SHM with its own magic and works as before. PoC + smoke tests in spike/: vmm_fd_pingpong/ (minimal cuMemCreate + FD round-trip); smoke_v04/ (full publisher + subscriber, 100/100 frames without pid share). Base image: Dockerfile.runtime → CUDA 12.4 (was 13.0), matching the prod pipeline and the Frigate base; otherwise libcudart conflicts at load. Compose stack (localhost-infra repo), parallel commit: removed pid: container:cuframes-pub-parking from subscribers; image tags: gx/cuframes:0.4, gx/cuda-grid-pipeline:phase8, gx/frigate:cuframes-v0.4. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 20:13:31 +01:00
gx	d646f5a4e4	v0.3.3: consumer post-sync verify even for v0.3 per-slot events. Bug: cudaEventRecord(event[slot]) overwrites the previous state on every publish. When the producer wraps the ring (~640 ms at ring=16), event[slot] is re-recorded for the new content. The consumer's pending cudaStreamWaitEvent is then satisfied by the new signal, so the consumer reads slot[slot_idx] believing it holds target_seq but actually gets the seq+ring_size content (stale-by-1-wrap drift). After 50k+ wraps in a long-running pipeline (9 h uptime) the drift accumulates: the output stream has 60-70% duplicate frames (vs 10% right after a restart). Symptom: the TV picture periodically freezes for 1-2 s. Encoder fps=25 is stable (content duplicates with the same PTS advance), but motion is choppy at a real 8-9 fps. Fix: unconditional post-sync verify (atomic re-read of slot.seq after the event wait). If a producer wrap occurred, slot.seq != target_seq, and the consumer continues to the new target_seq. Cheap (one atomic load); correctness > perf. Verified: after deploying with a fresh pipeline, an 18-second sample shows 4% duplicates (vs 8.4% with the same setup but without the fix). Proper v0.4 fix: a per-slot + per-publish event pool with a unique handle per cycle. The current v0.3.3 is a sufficient mitigation for the current production scale. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-24 20:27:00 +01:00
gx	656e36e9b0	v0.3.1: per-subscriber monitor thread, fix bitmap leak. Bug: handshake_subscriber assigned a bit and activated the slot but did NOT track client_fd. When a subscriber container exited, the socket was closed on the client side but the producer did not detect it, so the bit stayed set forever; after 32 connections: subscribe_create('cam-X'): too many subscribers (max 32). Symptom in production: every pipeline recreate accumulated 1 stale subscriber. After 4-5 recreate operations the publishers stopped accepting new pipelines → "too many subscribers" crash loop. Fix: after a successful handshake, spawn a detached pthread that monitors the socket via blocking recv(). recv() returns 0 (EOF) when the other side closes; the monitor then clears the bit (subscriber_bitmap &= ~(1<<bit)), sets state[bit] = 0, closes the fd, and exits. Cost: 1 thread per active subscriber. Max 32 threads, a small overhead. Threads are detached, so no join is needed. Stress test: 5x pipeline recreate without a single "too many subscribers" error. Previously: 2-3 recreates → bitmap overflow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-24 08:00:41 +01:00
gx	8c7abbc4e8	v0.3: per-slot CUDA events, closes the TOCTOU race without crutches. Protocol bump V2→V3: shm header gains cudaIpcEventHandle_t slot_event_handles[CUFRAMES_MAX_RING]; the producer creates ring_size events (instead of one global event); producer.do_publish records event[slot] (instead of pub->event); the consumer opens all slot events on subscribe; the consumer waits on event[slot_idx] specifically (instead of the global producer_event). Backward compat: the legacy pub->event is kept and ipc_event_handle is still exported, so v0.2 consumers see it and work the old way (with the post-sync verify hack from `517107d`); a v0.3 consumer auto-detects proto_version >= 3 and falls back to legacy if cudaIpcOpenEventHandle fails on a slot (graceful degradation). Effect (15-second sample on Phase 7 single-cam, motion): v0.1 production: dup runs 34.7%, max 14 frames (560 ms freeze); v0.2.1 fix: dup runs 10%, max 6, 0 back-jumps detected; v0.3 per-slot: dup runs 1.9%, max 5, 3 back-jumps (likely encoder static-content artifacts, not a real race). shm header size: 7424 → 8448 bytes (+1024 for slot_event_handles). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 09:23:53 +01:00
gx	517107d741	libcuframes: fix TOCTOU race in consumer slot read. Bug: the producer signals a single global cudaEvent for the whole ring (one per producer). The consumer waits on this event after slot_seq validation, but the event corresponds to the LAST published frame, not to slot[target_seq]. If the producer wraps the ring during the event wait (ring=6 = a 240 ms window), the slot already contains next-generation data and the consumer returns a torn/stale frame. Symptom in production: the video stream periodically shows a momentary back-jump: the camera OSD timestamp jitters and moving vehicles briefly teleport backwards. Cluster md5 analysis does NOT catch it (frame contents are still unique, just from the wrong epoch). Fix: post-sync verify. After cudaStreamWaitEvent / cudaEventSynchronize, re-check slots[slot_idx].seq == target_seq. If the producer has overwritten the slot, continue the outer loop with a new target_seq. This closes the race window between slot validation and the event sync return. Still open: downstream GPU access after frame fill (consumer side), during which the producer may wrap; mitigation: STRICT_WAIT policy in the publisher + ack discipline in the consumer (the cuframes_release_frame ack already works); a bigger ring size reduces wrap frequency (240 ms → 1.2 s at ring=30). Test: after deploying in cuda-grid-pipeline (Phase 7 single cam), the camera OSD clock no longer jitters (previously it jittered about every 16 s). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-21 22:27:39 +01:00
gx	98d1bb5296	release: v0.2.0, encoded packet ring. CHANGELOG: [Unreleased] → [0.2.0] - 2026-05-19; CMakeLists VERSION 0.1.0 → 0.2.0 (both root + libcuframes); CUFRAMES_VERSION_MINOR: 1 → 2 in include/cuframes/cuframes.h. See issue #2 (closed) + PR #4 (merged).	2026-05-19 17:49:14 +01:00
gx	fca07bf669	test+docs: packet ring stress test + Frigate dual-input guide (v0.2 Step 6). Tests: libcuframes/tests/test_packet_ring.c with 2 scenarios: (1) normal flow: 1 pub × 1 sub × 2000 packets, varied sizes, GOP=30, payload integrity check (seq in the first 8 bytes + pattern), PTS monotonicity, first KEY seq, no data errors; (2) slow consumer (10 ms delay): the publisher runs at 200 fps, the subscriber must detect OVERRUN and the library resyncs on a keyframe; verifies received > 10 even with a very slow consumer. libcuframes/tests/CMakeLists.txt: add_test packet_ring_basic. Docs: CHANGELOG.md: new [Unreleased] section with full v0.2 highlights and explicitly declared limitations (sub-stream, audio, codec change → v0.3); docs/integrations/frigate.md: new section "v0.2: dual-input (detect + record via a single RTSP)" with config example, requirements, trade-offs. Related: #2, PR #4. Step 6 (final) before removing draft status.	2026-05-19 17:08:17 +01:00
gx	4cb0321a6f	feat(api): public C API for the packet ring (v0.2 Step 3). Public functions in include/cuframes/cuframes.h: cuframes_publisher_enable_packets(opts) enables the ring on an existing publisher, with default sizing (64 slots, 8 MiB data, 2 MiB max); cuframes_publisher_set_codec_extradata(data, size) for the SPS/PPS bytes; cuframes_publisher_publish_packet(data, size, pts, dts, flags); cuframes_subscriber_enable_packets() opens the packet shm on the subscriber side; cuframes_subscriber_next_packet(pkt_out, timeout_ms) with 1 ms polling; cuframes_packet_data/size/pts/dts/flags/seq accessors; cuframes_subscriber_release_packet(); cuframes_subscriber_get_codec_params(). Internal: producer.c: struct cuframes_publisher extended (has_pkt_ring, max_packet_size, pkt_ring), cleanup in destroy(), and enable_packets() bumps proto_version=2 in the frames header; consumer.c: struct cuframes_subscriber extended (has_pkt_ring, pkt_ring, last_packet_seq, packet_obj), using a single-packet pattern (like frame_obj: busy flag, buffer reuse). enable_packets() starts from last_keyframe_seq-1 so that a late subscriber resyncs. On PACKET_OVERRUN the subscriber automatically resyncs to last_keyframe and returns ERR to the caller to signal the discontinuity. Related: #2, PR #4.	2026-05-19 16:27:05 +01:00
gx	bd7fd95fef	feat(libcuframes): packet ring buffer implementation (v0.2 Step 2). Implementation of the encoded packet ring per docs/protocol.md §10. Files: internal.h: cuframes_pkt_slot_t (64b packed), cuframes_pkt_header_t (0x1040 fixed header), cuframes_pkt_ring_t handle, constants for default sizing, packet flags, helper inline functions for slot/data pointer arithmetic; packet_ring.c (new, ~290 LOC): create/open/publish/read/destroy, stale recovery symmetric to the frames SHM (pid liveness check), seqlock pattern protecting the subscriber from an overrun mid-read (seq is re-checked after the copy), wraparound memcpy helpers for the variable-length data ring; utils.c: cuframes_internal_pkt_shm_name helper + strerror entries; cuframes.h: 4 new error codes (PACKET_OVERSIZED, NO_PACKET_RING, NO_CODEC_PARAMS, PACKET_OVERRUN); CMakeLists.txt: src/packet_ring.c added to sources. The API is internal (cuframes_internal_pkt_ring_*); the publicly exposed functions will come in Step 3 (cuframes.h API extension). Related: #2 (v0.2), PR #4 (draft).	2026-05-19 16:11:42 +01:00
gx	601806a5f8	build: add cmake install rules for libcuframes. cmake --install now correctly places libcuframes.so/.a in lib/ and headers in include/cuframes/. Needed for downstream builders (FFmpeg patched build, deb packaging). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 12:52:16 +01:00
gx	a21812d3f6	tools+examples+test: end-to-end pipeline ready (Steps 9-10). cuframes-rtsp-source: a standalone bridge between RTSP/file input and cuframes IPC. It decodes on CUDA (nvdec), copies D2D into a pre-allocated pool (EXTERNAL ownership), and publishes through cuframes. --realtime paces file input, --loop loops it. An alternative to the FFmpeg filter until v0.2 (the filter requires patching FFmpeg, which conflicts with Frigate's bundled build). examples/sub_count: a reference subscriber on the raw C API; it counts frames, tracks gaps, and exits cleanly on disconnect/timeout/SIGINT. test_stress (4 subscribers × 2000 frames @ 120fps): PASS on RTX 5090, with 0 torn frames for all consumers (including 2 slow ones with a 5 ms sleep). Smoke-tested: testsrc 25fps → cuframes-rtsp-source → cuframes IPC → sub_count (separate process) → 200/200 frames, 0 gaps, avg_fps=25.2.	2026-05-14 23:39:01 +01:00
gx	46c2b94939	libcuframes v0.1: producer + consumer (sync + async) + tests. Implements Steps 3-6 of Phase 1 according to docs/protocol.md. libcuframes/src/: internal.h (660 lines): shared structs (byte-exact protocol.md layout) + _Static_assert on offsets/sizes; utils.c: error strings, frame size calc, now_ns, key validation; protocol.c: TLV framing for the Unix socket with poll-based timeout; producer.c (~700 lines), Step 3: LIBRARY mode (cudaMalloc pool, IPC handle export), EXTERNAL mode (register user-provided pointers), cudaIpcEventHandle_t for cross-process sync (R1/R2), Unix socket accept thread, handshake state machine, bit allocation 1..31, name collision check (Y5), STRICT_WAIT policy (timeout with dead-subscriber eviction); consumer.c (~400 lines), Step 4: synchronous next() with poll-based wait, cudaStreamWaitEvent on the consumer stream (R1/R2), opaque cuframes_frame_t with accessor functions (Y6), NEWEST_ONLY and STRICT_ORDER modes, ACK via atomic_fetch_or on the bitmap; consumer_async.c, Step 5: thread + callback wrapper over the sync API. libcuframes/tests/: test_pingpong.cu: single producer × single consumer, 200 frames @ 60fps, verified via a kernel on the consumer stream (the correct test for sync semantics, see spike-v2); test_multi.cu: 1 producer × 3 consumers via fork(). Build: top-level CMakeLists.txt with options; libcuframes/CMakeLists.txt: shared + static library, c_std_11; -Waddress-of-packed-member suppressed (a known-safe warning on x86_64). Results (inside the cuframes-dev container, RTX 5090): pingpong_basic PASS 4.5s, 200 frames, 0 torn; multi_consumer PASS 4.1s, 1 × 3 consumers, all PASS. Phase 1 Step 6 done. Next: Step 7 (C++ wrapper), Step 9 (FFmpeg filter).	2026-05-14 23:21:30 +01:00
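
Note on the post-sync verify described in 517107d741, 8c7abbc4e8 and d646f5a4e4: the idea is to re-read the slot's sequence number after the synchronization wait and to reject the frame if the producer wrapped the ring in the meantime. Below is a minimal CPU-only sketch of that check in C11, not the library's code. The names slot_t, RING_SIZE, wait_fn, read_slot_verified and the two wait helpers are hypothetical; the callback stands in for cudaStreamWaitEvent / cudaEventSynchronize so the sketch runs without a GPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 16 /* hypothetical ring size */

typedef struct {
    _Atomic uint64_t seq; /* sequence number of the frame currently in the slot */
} slot_t;

/* Stand-in for the GPU-side wait (cudaStreamWaitEvent / cudaEventSynchronize). */
typedef void (*wait_fn)(void *ctx, unsigned slot_idx);

/* Returns true only if the slot holds target_seq both before AND after the wait.
 * A false result means the caller should pick a new target_seq and retry. */
static bool read_slot_verified(slot_t *ring, uint64_t target_seq,
                               wait_fn wait, void *ctx)
{
    assert(ring != NULL && wait != NULL);
    unsigned idx = (unsigned)(target_seq % RING_SIZE);

    /* Pre-check: the slot must currently contain the frame we want. */
    if (atomic_load_explicit(&ring[idx].seq, memory_order_acquire) != target_seq)
        return false;

    wait(ctx, idx); /* the producer may wrap the ring while we are blocked here */

    /* Post-sync verify: one atomic load detects an overwrite during the wait. */
    return atomic_load_explicit(&ring[idx].seq, memory_order_acquire) == target_seq;
}

/* Demo helper: a wait during which nothing happens. */
static void wait_quiet(void *ctx, unsigned slot_idx)
{
    (void)ctx;
    (void)slot_idx;
}

/* Demo helper: simulates the producer wrapping the ring during the wait,
 * so the slot now holds seq + RING_SIZE. */
static void wait_with_wrap(void *ctx, unsigned slot_idx)
{
    slot_t *ring = ctx;
    uint64_t old = atomic_load(&ring[slot_idx].seq);
    atomic_store(&ring[slot_idx].seq, old + RING_SIZE);
}
```

With wait_quiet the read is accepted; with wait_with_wrap the second load sees seq + RING_SIZE and the read is rejected, which corresponds to the "continue with a new target_seq" branch in the commit messages.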

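Note on the FD handshake in 4862247fe2: that commit passes the exported memory FDs to the consumer with sendmsg(SCM_RIGHTS). The following is a minimal sketch of the standard POSIX pattern for sending one file descriptor over a Unix socket, not the cuframes handshake itself. The helper names send_fd and recv_fd are hypothetical, and the one-byte payload is an arbitrary choice, since ancillary data has to accompany at least some regular data on a stream socket.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on failure. */
static int send_fd(int sock, int fd)
{
    char byte = 'F'; /* dummy payload that carries the ancillary data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    /* The union guarantees correct alignment of the control buffer. */
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    memset(&u, 0, sizeof u);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS; /* "the payload is a set of open FDs" */
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor. Returns the new FD (valid in the
 * receiving process) or -1 on failure. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET ||
        c->cmsg_type != SCM_RIGHTS || c->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof fd);
    return fd;
}
```

The kernel installs a duplicate of the descriptor in the receiving process, so the two sides need only the shared socket, which is why the commit can drop the pid namespace share.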
13 Commits
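
Note on the VRAM estimate in 4862247fe2: the "+64 MB" figure follows from rounding each slot up to the VMM allocation granularity. The arithmetic can be sketched as below, assuming a tightly packed NV12 frame (no row pitch padding) and taking the 2 MB granularity as 2 MiB; the function names are hypothetical, and in real code the granularity would be queried from the driver rather than hard-coded.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in a tightly packed NV12 frame: a full-resolution Y plane plus an
 * interleaved UV plane at half resolution in both dimensions. */
static size_t nv12_frame_bytes(size_t width, size_t height)
{
    return width * height * 3 / 2;
}

/* Round size up to the next multiple of granularity. */
static size_t round_up(size_t size, size_t granularity)
{
    assert(granularity > 0);
    return (size + granularity - 1) / granularity * granularity;
}

/* Extra bytes allocated across all slots because of the rounding. */
static size_t padding_total(size_t frame_bytes, size_t granularity,
                            size_t slots_per_ring, size_t cameras)
{
    size_t per_slot = round_up(frame_bytes, granularity) - frame_bytes;
    return per_slot * slots_per_ring * cameras;
}
```

For 1920×1080 this gives 3,110,400 bytes per frame, rounded up to 4,194,304 bytes, so each slot carries 1,083,904 bytes of padding; over 16 slots × 4 cameras that is 69,369,856 bytes, about 66 MiB, consistent with the commit's rounded figure of roughly 1 MB per slot.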