6 Commits

Author SHA1 Message Date
gx a7da4ea728 python: skeleton pybind11 bindings (issue #6 task #197)
Каркас Python-пакета `cuframes`:
- python/pyproject.toml — scikit-build-core конфиг
- python/CMakeLists.txt — pybind11 module через FetchContent
- python/src/_native.cpp — module entry, error таксономия,
  enum mirrors (PixelFormat, SubscriberMode), version
- python/cuframes/__init__.py — re-export публичного API
- python/tests/test_smoke.py — smoke tests без real subscribe
- python/README.md — статус + build instructions
- CMakeLists.txt — подключение python/ при BUILD_PYTHON_BINDINGS=ON

Реальный subscriber/frame wrapper в следующих коммитах
(tasks #198-#202).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-13 12:59:04 +01:00
gx 4862247fe2 v0.4: VMM + POSIX FD — namespace decoupling (no pid share required)
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 1m46s
build / ffmpeg filter patch (out-of-tree) (push) Failing after 1m30s
Заменяет cudaMalloc + cudaIpcGetMemHandle на cuMemCreate (VMM) +
cuMemExportToShareableHandle(POSIX_FILE_DESCRIPTOR). FDs передаются consumer'у
через sendmsg(SCM_RIGHTS) в handshake. Frigate (s6-overlay не даёт share PID)
и любой другой consumer работают БЕЗ pid namespace share — только volume mount
unix socket'a /run/cuframes и IPC share для /dev/shm header.

Sync: cudaEventRecord+IPC events → cuStreamSynchronize в do_publish.
Producer ждёт ~1 ms что stream flush'нулся, потом atomic_store(seq).
Consumer читает seq через memory_order_acquire и копирует DtoD без
event wait — HW coherence гарантирована на одном GPU.

ABI break (согласован с user'ом):
  - magic 0xCC7C1DCC → 0xCC7C1DCE (старые consumers fail cleanly)
  - protocol V3 → V4
  - libcuframes.so.0 SOVERSION остаётся, но .so.0.3.0 → .so.0.4.0
  - EXTERNAL ownership убран (VMM требует cuMemCreate-allocated memory,
    нельзя export'нуть произвольный cudaMalloc-pointer как POSIX FD)
  - cuframes-rtsp-source переведён на LIBRARY mode + один D2D memcpy
    в acquire'нутый slot (overhead малый — публишер всё равно делал такой
    D2D из FFmpeg hwframe pool в EXTERNAL pool раньше)

Размер: granularity 2 MB на 5090 → NV12 1920×1080 (~3.1 MB) округляется до
4 MB, +1 MB на slot × 16 × 4 камеры = +64 MB VRAM. Терпимо.

Packet ring (cuframes_packets://) НЕ затронут — отдельный SHM с своим
magic, работает как раньше.

PoC + smoke в spike/:
  - vmm_fd_pingpong/ — minimal cuMemCreate+FD round-trip
  - smoke_v04/ — full publisher+subscriber, 100/100 frames без pid share

Base image: Dockerfile.runtime → CUDA 12.4 (был 13.0). Matching prod
pipeline + Frigate base, иначе libcudart conflict при load.

Compose stack (localhost-infra repo) — параллельный commit:
  - убран pid: container:cuframes-pub-parking из subscribers
  - image теги: gx/cuframes:0.4, gx/cuda-grid-pipeline:phase8,
    gx/frigate:cuframes-v0.4

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 20:13:31 +01:00
gx 8c7abbc4e8 v0.3: per-slot CUDA events — закрывает TOCTOU race без crutches
release / build runtime Docker image (push) Failing after 1s
release / build source tarball (push) Successful in 5s
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 1m40s
build / ffmpeg filter patch (out-of-tree) (push) Successful in 1m22s
test-u4-runner / u4 runner smoke test (push) Has been cancelled
Protocol bump V2→V3:
  + shm header: cudaIpcEventHandle_t slot_event_handles[CUFRAMES_MAX_RING]
  + producer creates ring_size events (вместо одного global)
  + producer.do_publish records event[slot] (вместо pub->event)
  + consumer opens all slot events при subscribe
  + consumer waits event[slot_idx] specifically (вместо global producer_event)

Backward compat:
  - Legacy pub->event сохранён + ipc_event_handle export'ится — v0.2 consumers
    видят его и работают по-старому (с post-sync verify hack из 517107d).
  - v0.3 consumer auto-detects proto_version >= 3, fallback к legacy если
    cudaIpcOpenEventHandle на slot fail (graceful degradation).

Effect (15-sec sample на Phase 7 single-cam, motion):
  v0.1 production:  dup runs 34.7%, max 14 frames (560ms freeze)
  v0.2.1 fix:       dup runs 10%, max 6, 0 back-jumps detected
  v0.3 per-slot:    dup runs 1.9%, max 5, 3 back-jumps (likely encoder
                    static-content artifacts, not real race)

Размер shm header: 7424 → 8448 bytes (+1024 для slot_event_handles).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 09:23:53 +01:00
gx 98d1bb5296 release: v0.2.0 — encoded packet ring
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Failing after 3m3s
test-u4-runner / u4 runner smoke test (push) Successful in 1s
build / ffmpeg filter patch (out-of-tree) (push) Has been skipped
release / build runtime Docker image (push) Failing after 5m58s
release / build source tarball (push) Successful in 6m2s
- CHANGELOG: [Unreleased] → [0.2.0] — 2026-05-19
- CMakeLists VERSION 0.1.0 → 0.2.0 (both root + libcuframes)
- CUFRAMES_VERSION_MINOR: 1 → 2 в include/cuframes/cuframes.h

См. issue #2 (closed) + PR #4 (merged).
2026-05-19 17:49:14 +01:00
gx a21812d3f6 tools+examples+test: end-to-end pipeline ready (Steps 9-10)
cuframes-rtsp-source — standalone bridge между RTSP/file и cuframes IPC.
Декодирует на CUDA (nvdec), копирует D2D в pre-allocated pool (EXTERNAL
ownership), публикует через cuframes. --realtime для pacing файлового
ввода, --loop для зацикливания. Альтернатива FFmpeg-фильтра до v0.2
(filter требует patch FFmpeg, конфликтует с Frigate's bundled build).

examples/sub_count — reference subscriber на raw C API: counts frames,
trackit gaps, выходит clean при disconnect/timeout/SIGINT.

test_stress (4 subscribers × 2000 frames @ 120fps) — PASS на RTX 5090.
0 torn frames у всех consumers (включая 2 slow с 5ms sleep).

Smoke-проверено: testsrc 25fps → cuframes-rtsp-source → cuframes IPC
→ sub_count (отдельный процесс) → 200/200 frames, 0 gaps, avg_fps=25.2.
2026-05-14 23:39:01 +01:00
gx 46c2b94939 libcuframes v0.1: producer + consumer (sync + async) + tests
Implements Steps 3-6 of Phase 1 according to docs/protocol.md.

libcuframes/src/:
- internal.h     (660 lines) — shared structs (byte-exact protocol.md layout)
                                + _Static_assert на offsets/sizes
- utils.c        — error strings, frame size calc, now_ns, key validation
- protocol.c     — TLV framing для Unix socket с poll-based timeout
- producer.c     (~700 lines) — Step 3:
                    * LIBRARY mode: cudaMalloc pool, IPC handle export
                    * EXTERNAL mode: register user-provided pointers
                    * cudaIpcEventHandle_t для cross-process sync (R1/R2)
                    * Unix socket accept thread, handshake state machine
                    * Bit allocation 1..31, name collision check (Y5)
                    * STRICT_WAIT policy: timeout with dead-subscriber eviction
- consumer.c     (~400 lines) — Step 4:
                    * Synchronous next() with poll-based wait
                    * cudaStreamWaitEvent на consumer-stream (R1/R2)
                    * Opaque cuframes_frame_t с accessor functions (Y6)
                    * NEWEST_ONLY и STRICT_ORDER modes
                    * ACK via atomic_fetch_or на bitmap
- consumer_async.c — Step 5: thread + callback wrapper над sync API

libcuframes/tests/:
- test_pingpong.cu — single producer × single consumer, 200 frames @ 60fps,
                     verify через kernel-on-consumer-stream (правильный test
                     для sync semantics, см. spike-v2)
- test_multi.cu    — 1 producer × 3 consumers через fork()

Build:
- Top-level CMakeLists.txt с options
- libcuframes/CMakeLists.txt: shared + static library, c_std_11
- Suppress -Waddress-of-packed-member (известная безопасная warning x86_64)

Results (внутри cuframes-dev container, RTX 5090):
- pingpong_basic PASS  4.5s  200 frames, 0 torn
- multi_consumer PASS  4.1s  1 × 3 consumers, all PASS

Phase 1 Step 6 done. Дальше: Step 7 (C++ wrapper), Step 9 (FFmpeg filter).
2026-05-14 23:21:30 +01:00