Commit Graph

10 Commits

Author SHA1 Message Date
gx 656e36e9b0 v0.3.1: per-subscriber monitor thread — fix bitmap leak
release / build runtime Docker image (push) Failing after 0s
release / build source tarball (push) Successful in 4s
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 1m39s
build / ffmpeg filter patch (out-of-tree) (push) Successful in 1m32s
test-u4-runner / u4 runner smoke test (push) Has been cancelled
Bug: handshake_subscriber assigned a bit and activated the slot but did NOT track
client_fd. When a subscriber container exited, the socket closed on the client side
but the producer did not detect it, so the bit stayed set forever. After 32 connections:
subscribe_create('cam-X'): too many subscribers (max 32).

Symptom in production: each pipeline recreate accumulated 1 stale subscriber.
After 4-5 recreate operations the publishers stopped accepting new pipelines, ending in a
"too many subscribers" crash loop.

Fix: after a successful handshake, spawn a detached pthread that monitors the socket
via blocking recv(). recv() returns 0 (EOF) when the other side closes; the
monitor then clears the bit (subscriber_bitmap &= ~(1<<bit)), sets state[bit] = 0,
closes the fd, and exits.
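
The fix described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: subscriber_bitmap and state are hypothetical stand-ins for the producer's shared state, and spawn_monitor, monitor_main, and demo_disconnect_frees_bit are names invented for this example. The demo uses a socketpair in place of the real handshake socket.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-ins for the producer's shared state. */
static _Atomic uint32_t subscriber_bitmap;
static _Atomic int      state[32];

struct monitor_arg { int fd; unsigned bit; };

/* Blocks in recv() until the peer closes (0 = EOF) or the socket fails,
 * then frees the subscriber's bit and closes the fd. */
static void *monitor_main(void *p)
{
    struct monitor_arg a = *(struct monitor_arg *)p;
    free(p);
    char buf[64];
    for (;;) {
        ssize_t n = recv(a.fd, buf, sizeof buf, 0);
        if (n > 0) continue;                    /* discard stray bytes */
        if (n < 0 && errno == EINTR) continue;  /* interrupted, retry */
        break;                                  /* EOF or hard error */
    }
    atomic_fetch_and(&subscriber_bitmap, ~(UINT32_C(1) << a.bit));
    atomic_store(&state[a.bit], 0);
    close(a.fd);
    return NULL;
}

/* Starts a detached monitor thread for one subscriber; 0 on success. */
static int spawn_monitor(int client_fd, unsigned bit)
{
    assert(bit < 32);
    struct monitor_arg *a = malloc(sizeof *a);
    if (!a) return -1;
    a->fd = client_fd;
    a->bit = bit;

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_t tid;
    int rc = pthread_create(&tid, &attr, monitor_main, a);
    pthread_attr_destroy(&attr);
    if (rc != 0) { free(a); return -1; }
    return 0;
}

/* Demo: a "subscriber" on bit 3 disconnects; returns 1 once the bit is freed. */
static int demo_disconnect_frees_bit(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return 0;
    atomic_fetch_or(&subscriber_bitmap, UINT32_C(1) << 3);
    atomic_store(&state[3], 1);
    if (spawn_monitor(sv[0], 3) != 0) return 0;

    close(sv[1]);                               /* subscriber side goes away */
    struct timespec ts = { 0, 1000000 };        /* 1 ms */
    for (int i = 0; i < 2000; i++) {
        int bit_clear = !(atomic_load(&subscriber_bitmap) & (UINT32_C(1) << 3));
        if (bit_clear && atomic_load(&state[3]) == 0) return 1;
        nanosleep(&ts, NULL);
    }
    return 0;
}
```

Calling demo_disconnect_frees_bit() should return 1 once the monitor thread has seen EOF. A detached thread is used because, as noted in the commit, nothing ever needs to join it.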

Cost: 1 thread per active subscriber. Max 32 threads, a small
overhead. Threads are detached, no join needed.

Stress test: 5x pipeline recreate without a single "too many subscribers" error.
Before: 2-3 recreates → bitmap overflow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 08:00:41 +01:00
gx 8c7abbc4e8 v0.3: per-slot CUDA events, closes the TOCTOU race without hacks
release / build runtime Docker image (push) Failing after 1s
release / build source tarball (push) Successful in 5s
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 1m40s
build / ffmpeg filter patch (out-of-tree) (push) Successful in 1m22s
test-u4-runner / u4 runner smoke test (push) Has been cancelled
Protocol bump V2→V3:
  + shm header: cudaIpcEventHandle_t slot_event_handles[CUFRAMES_MAX_RING]
  + producer creates ring_size events (instead of one global event)
  + producer.do_publish records event[slot] (instead of pub->event)
  + consumer opens all slot events on subscribe
  + consumer waits on event[slot_idx] specifically (instead of the global producer_event)

Backward compat:
  - Legacy pub->event is kept and ipc_event_handle is still exported, so v0.2 consumers
    see it and work as before (with the post-sync verify hack from 517107d).
  - v0.3 consumer auto-detects proto_version >= 3 and falls back to legacy if
    cudaIpcOpenEventHandle on a slot fails (graceful degradation).

Effect (15-sec sample on Phase 7 single-cam, motion):
  v0.1 production:  dup runs 34.7%, max 14 frames (560ms freeze)
  v0.2.1 fix:       dup runs 10%, max 6, 0 back-jumps detected
  v0.3 per-slot:    dup runs 1.9%, max 5, 3 back-jumps (likely encoder
                    static-content artifacts, not real race)

shm header size: 7424 → 8448 bytes (+1024 for slot_event_handles).
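
The size arithmetic can be checked with a small sketch. The CUDA runtime defines cudaIpcEventHandle_t as an opaque 64-byte blob, so +1024 bytes would correspond to 16 handles if the array is the only addition and carries no padding. That value for CUFRAMES_MAX_RING is an inference from the numbers above, not something the commit states; ipc_event_handle_sketch_t, MAX_RING_SKETCH, and v3_header_bytes are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for cudaIpcEventHandle_t (an opaque 64-byte handle in the CUDA
 * runtime). Not the real type; used so the sketch builds without CUDA. */
typedef struct { char reserved[64]; } ipc_event_handle_sketch_t;

enum {
    V2_HEADER_BYTES = 7424,   /* from the commit message */
    V3_HEADER_BYTES = 8448,   /* from the commit message */
    /* Assumption: the growth is exactly one array of handles. */
    MAX_RING_SKETCH = (int)((V3_HEADER_BYTES - V2_HEADER_BYTES)
                            / sizeof(ipc_event_handle_sketch_t))
};

/* Compile-time check that the inferred array accounts for the +1024 bytes. */
static_assert(sizeof(ipc_event_handle_sketch_t[MAX_RING_SKETCH]) == 1024,
              "slot_event_handles should add exactly 1024 bytes");

/* Header size after appending the per-slot event handle array. */
static size_t v3_header_bytes(void)
{
    return (size_t)V2_HEADER_BYTES
         + (size_t)MAX_RING_SKETCH * sizeof(ipc_event_handle_sketch_t);
}
```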

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 09:23:53 +01:00
gx 517107d741 libcuframes: fix TOCTOU race in consumer slot read
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 1m34s
build / ffmpeg filter patch (out-of-tree) (push) Successful in 1m19s
release / build runtime Docker image (push) Failing after 1s
release / build source tarball (push) Successful in 4s
test-u4-runner / u4 runner smoke test (push) Has been cancelled
Bug: the producer signals **one global** cudaEvent for the whole ring (one per
producer). The consumer waits on this event after slot_seq validation, but the event
corresponds to the LAST published frame, not slot[target_seq]. If the
producer wraps the ring during the event wait (ring=6 = 240ms window), the slot
already contains next-generation data and the consumer returns a torn/stale frame.

Symptom in production: the video stream periodically shows a momentary back-jump:
the camera OSD timestamp jerks and moving vehicles briefly
teleport backwards. Cluster md5 analysis does NOT catch it (frame contents are still
unique, just from the wrong epoch).

Fix: post-sync verify. After cudaStreamWaitEvent / cudaEventSynchronize,
re-check slots[slot_idx].seq == target_seq. If the producer has overwritten the slot,
continue the outer loop with a new target_seq.
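
A CPU-only sketch of the post-sync verify loop, under stated assumptions: the slot layout (slot_t), the seq-to-slot mapping by modulo, the latest_seq counter, and the wait_fn callback standing in for cudaStreamWaitEvent / cudaEventSynchronize are all hypothetical. The demo simulates a producer that wraps the ring during the first wait.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 6   /* ring=6, as in the commit message */

typedef struct { _Atomic uint64_t seq; } slot_t;   /* hypothetical layout */

/* Stand-in for the CUDA event wait; may run while the producer publishes. */
typedef void (*wait_fn)(void *ctx);

/* Returns the seq of the frame that survived the post-sync check. */
static uint64_t acquire_frame(slot_t *slots, _Atomic uint64_t *latest_seq,
                              wait_fn wait_event, void *ctx)
{
    assert(slots && latest_seq && wait_event);
    for (;;) {
        uint64_t target = atomic_load(latest_seq);
        unsigned idx = (unsigned)(target % RING_SIZE);   /* assumed mapping */
        if (atomic_load(&slots[idx].seq) != target)
            continue;                        /* pre-check: slot not ready */
        wait_event(ctx);                     /* the race window is here */
        if (atomic_load(&slots[idx].seq) == target)
            return target;                   /* post-sync verify passed */
        /* Producer overwrote the slot during the wait: pick a new target. */
    }
}

struct demo_ctx { slot_t *slots; _Atomic uint64_t *latest; int calls; };

/* On the first wait only, publish a full ring later into the same slot. */
static void wrapping_wait(void *p)
{
    struct demo_ctx *c = p;
    if (c->calls++ == 0) {
        uint64_t next = atomic_load(c->latest) + RING_SIZE;
        atomic_store(&c->slots[next % RING_SIZE].seq, next);
        atomic_store(c->latest, next);
    }
}

static uint64_t demo_post_sync_verify(void)
{
    static slot_t slots[RING_SIZE];
    static _Atomic uint64_t latest;
    atomic_store(&slots[7 % RING_SIZE].seq, 7);
    atomic_store(&latest, 7);
    struct demo_ctx c = { slots, &latest, 0 };
    return acquire_frame(slots, &latest, wrapping_wait, &c);
}
```

In the demo the first attempt targets seq 7, the simulated producer overwrites that slot with seq 13 during the wait, and the retry returns 13 instead of a stale frame.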

Closes the race window between slot validation and event sync return. Still
open:
  - downstream GPU access after frame fill (consumer-side): the producer
    can wrap during it. Mitigation: STRICT_WAIT policy in the publisher
    + ack discipline in the consumer (the cuframes_release_frame ack already works).
  - a bigger ring size lowers wrap frequency (240ms → 1.2s at ring=30).
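
The wrap-window figures follow from ring size times frame interval. A tiny sketch, assuming a 40 ms frame interval (inferred from 240ms / 6 slots, i.e. about 25 fps; the commit does not state the frame rate) and a hypothetical helper name:

```c
#include <assert.h>

/* Time until the producer overwrites a slot it has just published. */
static unsigned wrap_window_ms(unsigned ring_size, unsigned frame_interval_ms)
{
    assert(ring_size > 0);
    return ring_size * frame_interval_ms;
}
```

With these assumptions, wrap_window_ms(6, 40) gives 240 and wrap_window_ms(30, 40) gives 1200, matching the 240ms → 1.2s figures above.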

Test: after deploy to cuda-grid-pipeline (Phase 7 single cam), the camera OSD
clock no longer jerks (previously it jerked every ~16 sec).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 22:27:39 +01:00
gx 98d1bb5296 release: v0.2.0 — encoded packet ring
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Failing after 3m3s
test-u4-runner / u4 runner smoke test (push) Successful in 1s
build / ffmpeg filter patch (out-of-tree) (push) Has been skipped
release / build runtime Docker image (push) Failing after 5m58s
release / build source tarball (push) Successful in 6m2s
- CHANGELOG: [Unreleased] → [0.2.0] — 2026-05-19
- CMakeLists VERSION 0.1.0 → 0.2.0 (both root + libcuframes)
- CUFRAMES_VERSION_MINOR: 1 → 2 in include/cuframes/cuframes.h

See issue #2 (closed) + PR #4 (merged).
2026-05-19 17:49:14 +01:00
gx fca07bf669 test+docs: packet ring stress test + Frigate dual-input guide (v0.2 Step 6)
build / cmake build (CUDA 12.4, Ubuntu 22.04) (pull_request) Failing after 3m43s
build / ffmpeg filter patch (out-of-tree) (pull_request) Has been skipped
Tests:
- libcuframes/tests/test_packet_ring.c, 2 scenarios:
  1) normal flow: 1 pub × 1 sub × 2000 packets, varied sizes, GOP=30,
     payload integrity check (seq in the first 8 bytes + pattern). Checks PTS
     monotonicity, first KEY seq, no data errors.
  2) slow consumer (10ms delay): publisher at 200 fps, subscriber must
     detect OVERRUN, library resyncs on a keyframe; verifies received >10
     even with a very slow consumer.
- libcuframes/tests/CMakeLists.txt: add_test packet_ring_basic.
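
The payload integrity check in scenario 1 might look roughly like this. It is a sketch, not the code from test_packet_ring.c: the byte pattern (low byte of seq + offset) and the function names are assumptions chosen for illustration; only "seq in the first 8 bytes + pattern" comes from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SEQ_BYTES 8   /* seq is stored in the first 8 bytes of the payload */

/* Publisher side: stamp seq, then fill the rest with a seq-derived pattern. */
static void fill_payload(uint8_t *buf, size_t size, uint64_t seq)
{
    assert(size >= SEQ_BYTES);
    memcpy(buf, &seq, SEQ_BYTES);
    for (size_t i = SEQ_BYTES; i < size; i++)
        buf[i] = (uint8_t)(seq + i);          /* hypothetical pattern */
}

/* Subscriber side: returns 1 if seq and every pattern byte match, else 0. */
static int verify_payload(const uint8_t *buf, size_t size, uint64_t expected_seq)
{
    uint64_t seq;
    if (size < SEQ_BYTES) return 0;
    memcpy(&seq, buf, SEQ_BYTES);
    if (seq != expected_seq) return 0;
    for (size_t i = SEQ_BYTES; i < size; i++)
        if (buf[i] != (uint8_t)(seq + i)) return 0;
    return 1;
}

/* Demo: an intact packet verifies. */
static int demo_roundtrip(void)
{
    uint8_t buf[256];
    fill_payload(buf, sizeof buf, 42);
    return verify_payload(buf, sizeof buf, 42);
}

/* Demo: a single flipped byte (as a torn read might produce) is caught. */
static int demo_corrupt(void)
{
    uint8_t buf[256];
    fill_payload(buf, sizeof buf, 42);
    buf[100] ^= 0xFF;
    return verify_payload(buf, sizeof buf, 42);
}
```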

Docs:
- CHANGELOG.md: new [Unreleased] section with full v0.2 highlights and
  explicitly declared limitations (sub-stream, audio, codec change → v0.3).
- docs/integrations/frigate.md: new section "v0.2: dual-input (detect +
  record over a single RTSP)" with config example, requirements, trade-offs.

Related: #2, PR #4. Step 6 (final) before removing draft status.
2026-05-19 17:08:17 +01:00
gx 4cb0321a6f feat(api): public C API for packet ring (v0.2 Step 3)
build / cmake build (CUDA 12.4, Ubuntu 22.04) (pull_request) Successful in 1m36s
build / ffmpeg filter patch (out-of-tree) (pull_request) Successful in 1m24s
Public functions in include/cuframes/cuframes.h:
- cuframes_publisher_enable_packets(opts): enables the ring on an
  existing publisher; default sizing (64 slots, 8MiB data, 2MiB max).
- cuframes_publisher_set_codec_extradata(data, size): SPS/PPS bytes.
- cuframes_publisher_publish_packet(data, size, pts, dts, flags)
- cuframes_subscriber_enable_packets(): opens the packet shm on the subscriber.
- cuframes_subscriber_next_packet(pkt_out, timeout_ms) with 1ms polling.
- cuframes_packet_data/size/pts/dts/flags/seq accessors.
- cuframes_subscriber_release_packet()
- cuframes_subscriber_get_codec_params()

Internal:
- producer.c: extended struct cuframes_publisher (has_pkt_ring,
  max_packet_size, pkt_ring); cleanup in destroy(); enable_packets()
  bumps proto_version=2 in the frames header.
- consumer.c: extended struct cuframes_subscriber (has_pkt_ring,
  pkt_ring, last_packet_seq, packet_obj); single-packet pattern (like
  frame_obj: busy flag, buffer reuse). enable_packets()
  starts from last_keyframe_seq-1 for late subscriber resync. On
  PACKET_OVERRUN it automatically resyncs to last_keyframe and returns
  ERR to the caller to signal a discontinuity.

Related: #2, PR #4.
2026-05-19 16:27:05 +01:00
gx bd7fd95fef feat(libcuframes): packet ring buffer implementation (v0.2 Step 2)
build / cmake build (CUDA 12.4, Ubuntu 22.04) (pull_request) Successful in 1m37s
build / ffmpeg filter patch (out-of-tree) (pull_request) Successful in 1m21s
Implementation of the encoded packet ring per docs/protocol.md §10.

Files:
- internal.h: cuframes_pkt_slot_t (64b packed), cuframes_pkt_header_t
  (0x1040 fixed header), cuframes_pkt_ring_t handle, constants for
  default sizing, packet flags, helper inline functions for slot/data
  pointer arithmetic.
- packet_ring.c (new, ~290 LOC): create/open/publish/read/destroy.
  Stale recovery mirrors the frames SHM (pid liveness check). Seqlock
  pattern protects the subscriber from overrun mid-read (post-check of seq
  after copy). Wraparound memcpy helpers for the variable-length data ring.
- utils.c: cuframes_internal_pkt_shm_name helper + strerror entries.
- cuframes.h: 4 new error codes (PACKET_OVERSIZED, NO_PACKET_RING,
  NO_CODEC_PARAMS, PACKET_OVERRUN).
- CMakeLists.txt: src/packet_ring.c added to sources.
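
A wraparound memcpy helper for a variable-length data ring could be sketched as below. ring_write, ring_read, and the demo are hypothetical names and do not reproduce the helpers in packet_ring.c; the sketch only shows the split-copy technique for a write or read that crosses the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy len bytes into a circular buffer of cap bytes starting at offset off,
 * splitting the copy in two when it crosses the end of the buffer. */
static void ring_write(uint8_t *ring, size_t cap, size_t off,
                       const void *src, size_t len)
{
    assert(cap > 0 && len <= cap);
    off %= cap;
    size_t first = cap - off;                 /* bytes until the end */
    if (first > len) first = len;
    memcpy(ring + off, src, first);
    memcpy(ring, (const uint8_t *)src + first, len - first);
}

/* Mirror of ring_write: copy len bytes out of the circular buffer. */
static void ring_read(const uint8_t *ring, size_t cap, size_t off,
                      void *dst, size_t len)
{
    assert(cap > 0 && len <= cap);
    off %= cap;
    size_t first = cap - off;
    if (first > len) first = len;
    memcpy(dst, ring + off, first);
    memcpy((uint8_t *)dst + first, ring, len - first);
}

/* Demo: a 6-byte write at offset 5 of an 8-byte ring wraps after 3 bytes. */
static int demo_wraparound(void)
{
    uint8_t ring[8] = {0};
    uint8_t out[6] = {0};
    ring_write(ring, sizeof ring, 5, "ABCDEF", 6);
    ring_read(ring, sizeof ring, 5, out, 6);
    return memcmp(out, "ABCDEF", 6) == 0 && ring[7] == 'C' && ring[0] == 'D';
}
```

In a seqlock-style reader, a copy like ring_read would be followed by a re-check of the slot's seq, so that data overwritten mid-copy is discarded rather than returned.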

The API is internal (cuframes_internal_pkt_ring_*); publicly exposed
functions will come in Step 3 (cuframes.h API extension).

Related: #2 (v0.2), PR #4 (draft).
2026-05-19 16:11:42 +01:00
gx 601806a5f8 build: add cmake install rules for libcuframes
cmake --install now correctly places libcuframes.so/.a in lib/ and
headers in include/cuframes/. Needed for downstream builders (FFmpeg
patched build, deb packaging).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 12:52:16 +01:00
gx a21812d3f6 tools+examples+test: end-to-end pipeline ready (Steps 9-10)
cuframes-rtsp-source: standalone bridge between RTSP/file and cuframes IPC.
Decodes on CUDA (nvdec), copies D2D into a pre-allocated pool (EXTERNAL
ownership), publishes via cuframes. --realtime paces file
input, --loop loops it. An alternative to the FFmpeg filter until v0.2
(the filter requires patching FFmpeg, which conflicts with Frigate's bundled build).

examples/sub_count: reference subscriber on the raw C API. Counts frames,
tracks gaps, exits cleanly on disconnect/timeout/SIGINT.

test_stress (4 subscribers × 2000 frames @ 120fps): PASS on RTX 5090.
0 torn frames for all consumers (including 2 slow ones with 5ms sleep).

Smoke-tested: testsrc 25fps → cuframes-rtsp-source → cuframes IPC
→ sub_count (separate process) → 200/200 frames, 0 gaps, avg_fps=25.2.
2026-05-14 23:39:01 +01:00
gx 46c2b94939 libcuframes v0.1: producer + consumer (sync + async) + tests
Implements Steps 3-6 of Phase 1 according to docs/protocol.md.

libcuframes/src/:
- internal.h     (660 lines) - shared structs (byte-exact protocol.md layout)
                                + _Static_assert on offsets/sizes
- utils.c        - error strings, frame size calc, now_ns, key validation
- protocol.c     - TLV framing for the Unix socket with poll-based timeout
- producer.c     (~700 lines) - Step 3:
                    * LIBRARY mode: cudaMalloc pool, IPC handle export
                    * EXTERNAL mode: register user-provided pointers
                    * cudaIpcEventHandle_t for cross-process sync (R1/R2)
                    * Unix socket accept thread, handshake state machine
                    * Bit allocation 1..31, name collision check (Y5)
                    * STRICT_WAIT policy: timeout with dead-subscriber eviction
- consumer.c     (~400 lines) - Step 4:
                    * Synchronous next() with poll-based wait
                    * cudaStreamWaitEvent on the consumer stream (R1/R2)
                    * Opaque cuframes_frame_t with accessor functions (Y6)
                    * NEWEST_ONLY and STRICT_ORDER modes
                    * ACK via atomic_fetch_or on the bitmap
- consumer_async.c - Step 5: thread + callback wrapper over the sync API
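
The bit allocation (1..31) and bitmap ACK mentioned above can be sketched with C11 atomics. This is an illustration under assumptions, not the producer.c / consumer.c code: alloc_bit, ack_frame, all_acked, and the lowest-free-bit policy are hypothetical; only the 1..31 range and the use of atomic_fetch_or for ACKs come from the commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Claim the lowest free bit in 1..31 with a CAS loop; -1 if none is free. */
static int alloc_bit(_Atomic uint32_t *bitmap)
{
    uint32_t cur = atomic_load(bitmap);
    for (;;) {
        int bit = -1;
        for (int i = 1; i <= 31; i++) {
            if (!(cur & (UINT32_C(1) << i))) { bit = i; break; }
        }
        if (bit < 0) return -1;
        if (atomic_compare_exchange_weak(bitmap, &cur,
                                         cur | (UINT32_C(1) << bit)))
            return bit;
        /* CAS failed: cur now holds the fresh value, so rescan. */
    }
}

/* Consumer side: mark this subscriber's bit as acknowledged for a frame. */
static void ack_frame(_Atomic uint32_t *ack_bitmap, int bit)
{
    assert(bit >= 1 && bit <= 31);
    atomic_fetch_or(ack_bitmap, UINT32_C(1) << bit);
}

/* Producer side: has every active subscriber acknowledged the frame? */
static int all_acked(uint32_t active, uint32_t acks)
{
    return (acks & active) == active;
}

/* Demo: two subscribers get bits 1 and 2; the frame is done after both ACK. */
static int demo_bits(void)
{
    _Atomic uint32_t subs = 0, acks = 0;
    int a = alloc_bit(&subs);
    int b = alloc_bit(&subs);
    ack_frame(&acks, a);
    int partial = all_acked(atomic_load(&subs), atomic_load(&acks));
    ack_frame(&acks, b);
    int full = all_acked(atomic_load(&subs), atomic_load(&acks));
    return a == 1 && b == 2 && !partial && full;
}

/* Demo: after 31 allocations the next one fails. */
static int demo_exhaust(void)
{
    _Atomic uint32_t subs = 0;
    for (int i = 0; i < 31; i++)
        if (alloc_bit(&subs) < 0) return 0;
    return alloc_bit(&subs) == -1;
}
```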

libcuframes/tests/:
- test_pingpong.cu - single producer × single consumer, 200 frames @ 60fps,
                     verified via kernel-on-consumer-stream (the right test
                     for sync semantics, see spike-v2)
- test_multi.cu    - 1 producer × 3 consumers via fork()

Build:
- Top-level CMakeLists.txt with options
- libcuframes/CMakeLists.txt: shared + static library, c_std_11
- Suppress -Waddress-of-packed-member (known-safe warning on x86_64)

Results (inside cuframes-dev container, RTX 5090):
- pingpong_basic PASS  4.5s  200 frames, 0 torn
- multi_consumer PASS  4.1s  1 × 3 consumers, all PASS

Phase 1 Step 6 done. Next: Step 7 (C++ wrapper), Step 9 (FFmpeg filter).
2026-05-14 23:21:30 +01:00