Commit Graph

53 Commits

Author SHA1 Message Date
gx 7f4bdfcaab Merge pull request 'Python bindings (pybind11) — Phase 0 v1' (#7) from feat/python-bindings into main
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Failing after 39s
build / ffmpeg filter patch (out-of-tree) (push) Has been skipped
2026-06-13 21:34:29 +01:00
gx afc2dd7fff python: DLPack + health stats + CUDA stream + docs (tasks #199-#202)
build / cmake build (CUDA 12.4, Ubuntu 22.04) (pull_request) Failing after 1m50s
build / ffmpeg filter patch (out-of-tree) (pull_request) Has been skipped
#199 DLPack export:
- frame.dlpack_y() / .dlpack_uv() — explicit multi-plane access для NV12
- frame.__dlpack__() / __dlpack_device__() — protocol для torch/cupy
- Capsule deleter правильно держит refcount на frame_keep_alive,
  releases shape/strides arrays. CUDA pointer принадлежит frame.

#200 Health/stats counters:
- frames_received, timeouts, errors — per-call counters
- last_seq, gap_count — proxy для drop count (NEWEST_ONLY mode)
- last_frame_pts_ns
- stats() — snapshot dict для MQTT health publish
- counted в pybind layer т.к. C API не expose'ит ring_occupancy

#201 Per-subscriber CUDA stream + thread-safety:
- consumer_stream kwarg в subscribe() — int (cudaStream_t pointer)
- subscriber.consumer_stream property
- Thread-safety contract в docstring CuframesSubscriber
- next_frame() передаёт consumer_stream_ в cuframes_subscriber_next

#202 Smoke test + docs:
- 10/10 pytest passed (расширен +2 теста на consumer_stream)
- docs/python.md (~250 строк): quick start, API reference, integration
  с PyTorch/CuPy, reconnect-loop pattern, per-stream usage,
  pitch alignment, thread-safety, error taxonomy, backpressure,
  Phase 0 limitations

Verify build + tests:
  cmake -B build-python -DBUILD_PYTHON_BINDINGS=ON
  cmake --build build-python -j
  pytest python/tests/ -v   # 10/10

Закрывает Phase 0 issue gx/cuframes#6.
Разблокирует goldix-smart-home/yolo-world-detector Phase 1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-13 21:33:21 +01:00
gx 5d1eaedb38 python: CuframesSubscriber + CuframesFrame wrapper (task #198)
Реализует subscriber-side wrapper над cuframes_subscriber_* и
cuframes_frame_* C API.

Что добавлено:
- CuframesFrame — owning RAII wrapper над cuframes_frame_t*
  - properties: cuda_ptr, format, width, height, pitch_y, pitch_uv,
    seq, pts_ns, released
  - release() idempotent
  - context manager (__enter__/__exit__) — release при выходе
  - после release() property access бросает CuframesError

- CuframesSubscriber — owning RAII wrapper над cuframes_subscriber_t*
  - конструктор с key/consumer_name/mode/cuda_device/connect_timeout_ms
  - next_frame(timeout_ms) → CuframesFrame
  - close() idempotent
  - context manager
  - GIL released на блокирующих вызовах (create, next_frame)

- subscribe() — module-level factory shortcut

Архитектурные решения:
- GIL release в py::gil_scoped_release на subscriber_create и _next —
  чтобы другие Python потоки могли работать пока ждём frame
- consumer_stream передаётся как nullptr в Phase 0 (default stream);
  per-subscriber stream в task #201
- Frame держит raw pointer на subscriber, refcount Python-стороной;
  если subscriber уничтожен раньше, frame.release() становится no-op

Smoke tests расширены до 8 — добавлены проверки exposed API и
error mapping на subscribe к несуществующему publisher'у.

Verify: pytest tests/test_smoke.py — 8/8 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-13 21:23:42 +01:00
gx 7b6d43efeb python: fix exception hierarchy — не вызывать .attr("__class__")
py::exception<T>(...) уже возвращает Python class object. Дополнительный
.attr("__class__") давал metaclass (type), из-за чего issubclass()
проверка для всех subexc возвращала False.

Verify: pytest tests/test_smoke.py — 5/5 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-13 21:19:03 +01:00
gx a7da4ea728 python: skeleton pybind11 bindings (issue #6 task #197)
Каркас Python-пакета `cuframes`:
- python/pyproject.toml — scikit-build-core конфиг
- python/CMakeLists.txt — pybind11 module через FetchContent
- python/src/_native.cpp — module entry, error таксономия,
  enum mirrors (PixelFormat, SubscriberMode), version
- python/cuframes/__init__.py — re-export публичного API
- python/tests/test_smoke.py — smoke tests без real subscribe
- python/README.md — статус + build instructions
- CMakeLists.txt — подключение python/ при BUILD_PYTHON_BINDINGS=ON

Реальный subscriber/frame wrapper в следующих коммитах
(tasks #198-#202).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-13 12:59:04 +01:00
gx 655649f4d8 cmake: использовать PROJECT_SOURCE_DIR вместо CMAKE_SOURCE_DIR
build / cmake build (CUDA 12.4, Ubuntu 22.04) (pull_request) Failing after 5m19s
build / ffmpeg filter patch (out-of-tree) (pull_request) Has been skipped
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Failing after 4m14s
build / ffmpeg filter patch (out-of-tree) (push) Has been skipped
При сборке cuframes как подпроекта родительского CMake-проекта
(add_subdirectory) CMAKE_SOURCE_DIR указывает на корень родителя,
а не cuframes. Из-за этого target_include_directories cuframes
получал неверный путь и компиляция падала с

  fatal error: cuframes/cuframes.h: No such file or directory

PROJECT_SOURCE_DIR резолвится в каталог project(), то есть всегда
указывает на корень cuframes независимо от способа подключения.

Standalone-сборка ведёт себя как раньше — оба пути одинаковы.
2026-06-03 04:27:24 +01:00
Claude Opus 78824c4ed1 docker: +mosquitto-clients в runtime image
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 1m42s
build / ffmpeg filter patch (out-of-tree) (push) Failing after 1m22s
Нужен для loop-publisher.sh wrapper в cctv stack — heartbeat и alert MQTT
publish. 4.5 MB добавил, runtime image теперь ~590 MB. Без него wrapper
silent fail на mqtt_alert/mqtt_state (но retry-loop работает).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 17:59:56 +01:00
gx 4862247fe2 v0.4: VMM + POSIX FD — namespace decoupling (no pid share required)
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 1m46s
build / ffmpeg filter patch (out-of-tree) (push) Failing after 1m30s
Заменяет cudaMalloc + cudaIpcGetMemHandle на cuMemCreate (VMM) +
cuMemExportToShareableHandle(POSIX_FILE_DESCRIPTOR). FDs передаются consumer'у
через sendmsg(SCM_RIGHTS) в handshake. Frigate (s6-overlay не даёт share PID)
и любой другой consumer работают БЕЗ pid namespace share — только volume mount
unix socket'a /run/cuframes и IPC share для /dev/shm header.

Sync: cudaEventRecord+IPC events → cuStreamSynchronize в do_publish.
Producer ждёт ~1 ms что stream flush'нулся, потом atomic_store(seq).
Consumer читает seq через memory_order_acquire и копирует DtoD без
event wait — HW coherence гарантирована на одном GPU.

ABI break (согласован с user'ом):
  - magic 0xCC7C1DCC → 0xCC7C1DCE (старые consumers fail cleanly)
  - protocol V3 → V4
  - libcuframes.so.0 SOVERSION остаётся, но .so.0.3.0 → .so.0.4.0
  - EXTERNAL ownership убран (VMM требует cuMemCreate-allocated memory,
    нельзя export'нуть произвольный cudaMalloc-pointer как POSIX FD)
  - cuframes-rtsp-source переведён на LIBRARY mode + один D2D memcpy
    в acquire'нутый slot (overhead малый — публишер всё равно делал такой
    D2D из FFmpeg hwframe pool в EXTERNAL pool раньше)

Размер: granularity 2 MB на 5090 → NV12 1920×1080 (~3.1 MB) округляется до
4 MB, +1 MB на slot × 16 × 4 камеры = +64 MB VRAM. Терпимо.

Packet ring (cuframes_packets://) НЕ затронут — отдельный SHM с своим
magic, работает как раньше.

PoC + smoke в spike/:
  - vmm_fd_pingpong/ — minimal cuMemCreate+FD round-trip
  - smoke_v04/ — full publisher+subscriber, 100/100 frames без pid share

Base image: Dockerfile.runtime → CUDA 12.4 (был 13.0). Matching prod
pipeline + Frigate base, иначе libcudart conflict при load.

Compose stack (localhost-infra repo) — параллельный commit:
  - убран pid: container:cuframes-pub-parking из subscribers
  - image теги: gx/cuframes:0.4, gx/cuda-grid-pipeline:phase8,
    gx/frigate:cuframes-v0.4

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 20:13:31 +01:00
gx d646f5a4e4 v0.3.3: consumer post-sync verify даже для v0.3 per-slot events
release / build runtime Docker image (push) Failing after 0s
release / build source tarball (push) Successful in 4s
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 1m41s
build / ffmpeg filter patch (out-of-tree) (push) Successful in 1m29s
test-u4-runner / u4 runner smoke test (push) Has been cancelled
Bug: cudaEventRecord(event[slot]) overwrites previous state каждый publish.
Когда producer wraps ring (~640ms при ring=16), event[slot] re-recorded для
new content. Consumer's pending cudaStreamWaitEvent satisfied новым signal —
consumer reads slot[slot_idx] thinking it's target_seq, реально получает
seq+ring_size content (stale-by-1-wrap drift).

После 50k+ wraps в long-running pipeline (9h uptime) drift накапливается:
output stream имеет 60-70% duplicate frames (vs 10% сразу после restart).

Симптом: TV picture freezes на 1-2 sec периодически. Encoder fps=25 stable
(content duplicates same PTS-advance), но motion choppy на 8-9 fps real.

Fix: unconditional post-sync verify (atomic re-read slot.seq после event wait).
Если producer wrap occurred — slot.seq != target_seq → continue к новому
target_seq. Cheap (one atomic load), correctness > perf.

Verified: после deploy с fresh pipeline, 18-sec sample = 4% duplicates
(vs 8.4% при том же setup но без fix).

Proper v0.4 fix: per-slot+per-publish event pool с unique handle per cycle.
Текущий v0.3.3 — sufficient mitigation для current production scale.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v0.3.3
2026-05-24 20:27:00 +01:00
gx becfbebc78 cuframes-rtsp-source: + --policy + --ack-timeout-ms CLI flags
release / build runtime Docker image (push) Failing after 0s
release / build source tarball (push) Successful in 2s
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 1m39s
build / ffmpeg filter patch (out-of-tree) (push) Successful in 1m25s
test-u4-runner / u4 runner smoke test (push) Has been cancelled
Opt-in для STRICT_WAIT policy (default остаётся DROP_OLDEST).

Use case STRICT_WAIT:
  Frame integrity критичен (e.g. recording, frame-accurate analytics).
  Producer ждёт ack от всех subscribers перед wrap ring → no torn frames.
  Trade-off: slow consumer задерживает all (default 200ms timeout затем
  subscriber dropped from bitmap).

Use case DROP_OLDEST (default):
  Low-latency real-time display (TV grid). Producer wraps freely; v0.3
  per-slot CUDA events закрывают race без waiting.

Validation: policy=wait + ack-timeout-ms<=0 = infinite hold dead consumer —
warning + force к 200ms safe default.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v0.3.2
2026-05-24 08:47:14 +01:00
gx 656e36e9b0 v0.3.1: per-subscriber monitor thread — fix bitmap leak
release / build runtime Docker image (push) Failing after 0s
release / build source tarball (push) Successful in 4s
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 1m39s
build / ffmpeg filter patch (out-of-tree) (push) Successful in 1m32s
test-u4-runner / u4 runner smoke test (push) Has been cancelled
Bug: handshake_subscriber assigned bit + activated slot но НЕ tracked
client_fd. Когда subscriber container exited, socket closed on client side
но producer не detected → bit оставался set forever → после 32 connections
subscribe_create('cam-X'): too many subscribers (max 32).

Симптом в production: каждый pipeline recreate accumulated 1 stale subscriber.
После 4-5 recreate операций publishers перестали accept new pipeline →
"too many subscribers" crash loop.

Fix: после успешного handshake spawn detached pthread monitoring socket
via blocking recv(). recv() returns 0 (EOF) когда other side closes —
monitor clears bit (subscriber_bitmap &= ~(1<<bit)) + state[bit] = 0,
closes fd, exits.

Cost: 1 thread per active subscriber. Max 32 threads — небольшой
overhead. Threads detached, no join needed.

Stress test: 5x pipeline recreate без single "too many subscribers" error.
Раньше: 2-3 recreate → bitmap overflow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v0.3.1
2026-05-24 08:00:41 +01:00
gx 8c7abbc4e8 v0.3: per-slot CUDA events — закрывает TOCTOU race без crutches
release / build runtime Docker image (push) Failing after 1s
release / build source tarball (push) Successful in 5s
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 1m40s
build / ffmpeg filter patch (out-of-tree) (push) Successful in 1m22s
test-u4-runner / u4 runner smoke test (push) Has been cancelled
Protocol bump V2→V3:
  + shm header: cudaIpcEventHandle_t slot_event_handles[CUFRAMES_MAX_RING]
  + producer creates ring_size events (вместо одного global)
  + producer.do_publish records event[slot] (вместо pub->event)
  + consumer opens all slot events при subscribe
  + consumer waits event[slot_idx] specifically (вместо global producer_event)

Backward compat:
  - Legacy pub->event сохранён + ipc_event_handle export'ится — v0.2 consumers
    видят его и работают по-старому (с post-sync verify hack из 517107d).
  - v0.3 consumer auto-detects proto_version >= 3, fallback к legacy если
    cudaIpcOpenEventHandle на slot fail (graceful degradation).

Effect (15-sec sample на Phase 7 single-cam, motion):
  v0.1 production:  dup runs 34.7%, max 14 frames (560ms freeze)
  v0.2.1 fix:       dup runs 10%, max 6, 0 back-jumps detected
  v0.3 per-slot:    dup runs 1.9%, max 5, 3 back-jumps (likely encoder
                    static-content artifacts, not real race)

Размер shm header: 7424 → 8448 bytes (+1024 для slot_event_handles).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v0.3.0
2026-05-22 09:23:53 +01:00
gx 517107d741 libcuframes: fix TOCTOU race в consumer slot read
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 1m34s
build / ffmpeg filter patch (out-of-tree) (push) Successful in 1m19s
release / build runtime Docker image (push) Failing after 1s
release / build source tarball (push) Successful in 4s
test-u4-runner / u4 runner smoke test (push) Has been cancelled
Bug: producer signals **один global** cudaEvent для всего ring (один на
producer). Consumer waits этот event после slot_seq validation, но event
соответствует ПОСЛЕДНЕМУ published frame, не slot[target_seq]. Если
producer wrap'нет ring во время event wait (ring=6 = 240ms окно), slot
содержит уже next-gen data, consumer возвращает torn/stale frame.

Симптом в production: video stream показывает «back-jump на момент»
periodically — camera OSD timestamp дёргается, motion machines briefly
teleport назад. cluster md5 analysis НЕ ловит (содержимое frames всё ещё
unique, просто из неправильной epoch).

Fix: post-sync verify. После cudaStreamWaitEvent / cudaEventSynchronize
re-check slots[slot_idx].seq == target_seq. Если producer перезаписал —
continue outer loop с новым target_seq.

Закрывает race window между slot validation и event sync return. Остаются
открытыми:
  - downstream GPU access после frame fill (consumer-side) — producer
    может wrap во время этого. Mitigation: STRICT_WAIT policy в publisher
    + ack discipline в consumer (cuframes_release_frame ack уже works).
  - bigger ring size снижает wrap frequency (240ms → 1.2s при ring=30).

Test: после deploy в cuda-grid-pipeline (Phase 7 single cam), camera OSD
clock больше не дёргается (раньше дёргалось каждые ~16 sec).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v0.2.1
2026-05-21 22:27:39 +01:00
gx 4d54173bb2 roadmap: vf_cuda_grid выделен в отдельный продукт gx/vf-cuda-grid 2026-05-19 20:39:47 +01:00
gx 52fb2ad722 benchmarks: actual measured VRAM + network bandwidth (tcpdump-based)
VRAM breakdown (nvidia-smi pmon):
- 4 publishers = 4.4 GB (FHD + 2688x1520 ring buffers + NVDEC)
- cctv-backend = 1.0 GB
- frigate embeddings_manager = 1.6 GB
- frigate detector:onnx = 0.6 GB
- Total cuframes-stack = ~7.7 GB

Network (10-sec tcpdump capture от camera subnet к R9):
- Measured: 31.5 Mbps (всё включая go2rtc on-demand, ONVIF)
- cuframes core: ~16 Mbps (4 publishers × main HEVC)
- ONVIF/RTSP keepalives: ~1-2 Mbps
- Без cuframes setup тех же 4 cam × 3 consumer был бы ~45-50 Mbps

Source: production deploy 2026-05-19 measurement.
2026-05-19 19:22:53 +01:00
gx 3779175737 docs(benchmarks): production v0.2 deploy metrics (4 cam × 3 consumer)
Real-world numbers с production deploy 2026-05-19:
- RTSP к камерам: 12 → 4 (−67%)
- NVDEC sessions: 8 → 4 (−50%)
- Camera bandwidth: 34 → 16 Mbps (−54%)
- PCIe D2H copies: 346 MB/s → ~0 (−100% через zero-copy CUDA IPC)
- Frigate прямые RTSP: 8 → 0 (−100%)

Plus live nvidia-smi metrics, что сохранилось vs не сэкономлено,
projection table для других setup'ов (8/16 cam × 2/3/4 consumer).

Для promotional material — public-facing claims на основе measured deploy.
2026-05-19 19:07:16 +01:00
gx 98d1bb5296 release: v0.2.0 — encoded packet ring
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Failing after 3m3s
test-u4-runner / u4 runner smoke test (push) Successful in 1s
build / ffmpeg filter patch (out-of-tree) (push) Has been skipped
release / build runtime Docker image (push) Failing after 5m58s
release / build source tarball (push) Successful in 6m2s
- CHANGELOG: [Unreleased] → [0.2.0] — 2026-05-19
- CMakeLists VERSION 0.1.0 → 0.2.0 (both root + libcuframes)
- CUFRAMES_VERSION_MINOR: 1 → 2 в include/cuframes/cuframes.h

См. issue #2 (closed) + PR #4 (merged).
v0.2.0
2026-05-19 17:49:14 +01:00
gx 5536d23992 Merge pull request 'v0.2: encoded packet ring' (#4) from v0.2-encoded-packets into main
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 10m0s
build / ffmpeg filter patch (out-of-tree) (push) Successful in 8m32s
2026-05-19 17:47:10 +01:00
gx 2b94742df4 ci: retry + explicit Node 20 version check в bootstrap
build / cmake build (CUDA 12.4, Ubuntu 22.04) (pull_request) Successful in 6m24s
build / ffmpeg filter patch (out-of-tree) (pull_request) Successful in 6m21s
Symptom (run #1826 fail на u4-runner):
  Bootstrap step молча установил Node 12 (Ubuntu default) вместо Node 20
  из NodeSource → actions/checkout@v4 не парсится (ES2022 static blocks).

Cause:
  curl ... setup_20.x на slow network (u4 через VPN) timeout/fail silently,
  apt install fallback на default ubuntu nodejs (Node 12). Без error.

Fix:
  - curl --retry 3 --retry-delay 5 --connect-timeout 30
  - retry-loop на NodeSource setup (3 попытки)
  - явная verification major version >= 18 после install, fail с exit 1
    если установился Node < 18

Применяется к обоим jobs (cmake-build и filter-build).

Связано: PR #4 (v0.2), run #1826 fail.
2026-05-19 17:31:33 +01:00
gx fca07bf669 test+docs: packet ring stress test + Frigate dual-input guide (v0.2 Step 6)
build / cmake build (CUDA 12.4, Ubuntu 22.04) (pull_request) Failing after 3m43s
build / ffmpeg filter patch (out-of-tree) (pull_request) Has been skipped
Тесты:
- libcuframes/tests/test_packet_ring.c — 2 scenarios:
  1) normal flow: 1 pub × 1 sub × 2000 packets, varied sizes, GOP=30,
     payload integrity check (seq в первых 8 байтах + pattern). PTS
     monotonicity, first KEY seq, нет data errors.
  2) slow consumer (10ms delay): publisher 200 fps, subscriber должен
     detect OVERRUN, library resync на keyframe — verify received >10
     даже на сильно медленном консьюмере.
- libcuframes/tests/CMakeLists.txt: add_test packet_ring_basic.

Docs:
- CHANGELOG.md: новая [Unreleased] секция с full v0.2 highlights и
  явно declared limitations (sub-stream, audio, codec change → v0.3).
- docs/integrations/frigate.md: новая секция "v0.2: dual-input (detect +
  record через один RTSP)" с config example, requirements, trade-offs.

Связано: #2, PR #4. Step 6 (final) перед снятием draft.
2026-05-19 17:08:17 +01:00
gx 8cd96721ff feat(rtsp-source): packet ring publishing (v0.2 Step 4)
build / cmake build (CUDA 12.4, Ubuntu 22.04) (pull_request) Successful in 1m39s
build / ffmpeg filter patch (out-of-tree) (pull_request) Successful in 1m44s
- cuframes::Publisher (C++ wrapper): добавлены enable_packets(),
  set_codec_extradata(), publish_packet() методы.
- cuframes-rtsp-source: новый CLI flag --enable-packet-ring. При его
  установке после opening stream — pub.enable_packets(codec_id) +
  set_codec_extradata из vstream->codecpar->extradata.
- В main loop: после av_read_frame, до avcodec_send_packet, packet
  публикуется в packet ring с конверсией pts/dts из stream_tb в ns,
  AV_PKT_FLAG_KEY/CORRUPT/DISCONTINUITY → CUFRAMES_PKT_FLAG_*.

Тест:
  cuframes-rtsp-source --rtsp rtsp://... --key cam1 --enable-packet-ring
  # frames consumer'ы продолжают работать через cuframes:// (как v0.1)
  # record consumer'ы могут brать packets через cuframes_packets:// (Step 5)

Связано: #2, PR #4.
2026-05-19 16:45:29 +01:00
gx 4cb0321a6f feat(api): public C API для packet ring (v0.2 Step 3)
build / cmake build (CUDA 12.4, Ubuntu 22.04) (pull_request) Successful in 1m36s
build / ffmpeg filter patch (out-of-tree) (pull_request) Successful in 1m24s
Публичные функции в include/cuframes/cuframes.h:
- cuframes_publisher_enable_packets(opts)  — активирует ring на
  существующем publisher'е; default sizing (64 slots, 8MiB data, 2MiB max).
- cuframes_publisher_set_codec_extradata(data, size) — SPS/PPS bytes.
- cuframes_publisher_publish_packet(data, size, pts, dts, flags)
- cuframes_subscriber_enable_packets()  — открывает packet shm у subscriber'а.
- cuframes_subscriber_next_packet(pkt_out, timeout_ms) с поллингом 1ms.
- cuframes_packet_data/size/pts/dts/flags/seq accessors.
- cuframes_subscriber_release_packet()
- cuframes_subscriber_get_codec_params()

Internal:
- producer.c: расширена struct cuframes_publisher (has_pkt_ring,
  max_packet_size, pkt_ring); cleanup в destroy(); enable_packets()
  bump'ит proto_version=2 в frames header.
- consumer.c: расширена struct cuframes_subscriber (has_pkt_ring,
  pkt_ring, last_packet_seq, packet_obj); single-packet pattern (как
  frame_obj — busy flag, переиспользование buffer). enable_packets()
  стартует с last_keyframe_seq-1 для late subscriber resync. На
  PACKET_OVERRUN автоматически resync на last_keyframe и возвращает
  ERR наружу для signalling discontinuity.

Связано: #2, PR #4.
2026-05-19 16:27:05 +01:00
gx bd7fd95fef feat(libcuframes): packet ring buffer implementation (v0.2 Step 2)
build / cmake build (CUDA 12.4, Ubuntu 22.04) (pull_request) Successful in 1m37s
build / ffmpeg filter patch (out-of-tree) (pull_request) Successful in 1m21s
Реализация encoded packet ring per docs/protocol.md §10.

Files:
- internal.h: cuframes_pkt_slot_t (64b packed), cuframes_pkt_header_t
  (0x1040 fixed header), cuframes_pkt_ring_t handle, constants for
  default sizing, packet flags, helper inline functions for slot/data
  pointer arithmetic.
- packet_ring.c (new, ~290 LOC): create/open/publish/read/destroy.
  Stale recovery симметрично frames SHM (pid liveness check). Seqlock
  pattern для subscriber защиты от overrun mid-read (post-check seq
  после copy). Wraparound memcpy helpers для variable-length data ring.
- utils.c: cuframes_internal_pkt_shm_name helper + strerror entries.
- cuframes.h: 4 новых error codes (PACKET_OVERSIZED, NO_PACKET_RING,
  NO_CODEC_PARAMS, PACKET_OVERRUN).
- CMakeLists.txt: src/packet_ring.c в sources.

API внутренний (cuframes_internal_pkt_ring_*) — publicly exposed
функции будут в Step 3 (cuframes.h API extension).

Связано: #2 (v0.2), PR #4 (draft).
2026-05-19 16:11:42 +01:00
gx ad75aa9624 docs(protocol): v0.2 — encoded packet ring spec (§10)
build / cmake build (CUDA 12.4, Ubuntu 22.04) (pull_request) Successful in 1m35s
build / ffmpeg filter patch (out-of-tree) (pull_request) Successful in 1m39s
Полный wire-protocol spec для encoded packet ring:
- Отдельный SHM /dev/shm/cuframes-<key>-packets (variable-length)
- Backward-compat с v1: proto_version=2 publishers принимают v1 subscribers
- HELLO_REQ/HELLO_RESP extension через reserved bytes — без слома v1 layout
- Codec extradata (SPS/PPS) в shared header
- Late subscriber → keyframe-aligned start (initial_packet_seq)
- Seqlock pattern для защиты от overrun mid-read
- API extension: publish_packet, next_packet, get_codec_params
- 4 новых error codes (OVERSIZED, NO_PACKET_RING, NO_CODEC_PARAMS, PACKET_OVERRUN)

Связано: #2
2026-05-19 16:04:00 +01:00
gx 264b9d59db roadmap: future ideas — gst-cuframes-src + vf_cuda_grid
Две идеи добавлены в новую секцию "Future ideas" (без ETA):

- gst-cuframes-src: GStreamer source-element для DeepStream / обычных
  GStreamer pipeline'ов. Аналог FFmpeg-демуксера для другого стека.

- vf_cuda_grid: FFmpeg filter с runtime grid composition полностью
  на GPU. Заменяет custom C++ GridComposer cctv-processor (см. gx/cctv#22).
  Превращает cuframes в GPU-native video routing platform.

Обе идеи waiting на планирование, scope для v0.5+.
2026-05-19 15:58:49 +01:00
gx d2bae7d0fd ci: clone ffmpeg-patched через GITHUB_SERVER_URL (для VPN-runner'а)
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 1m57s
build / ffmpeg filter patch (out-of-tree) (push) Successful in 3m36s
Жёсткий URL git.goldix.org не работает на u4-runner — там
gitea доступен только через VPN (10.8.0.6:3222). Используем
переменную runner'а — на R9 = 192.168.88.23:3222, на u4 = 10.8.0.6:3222.
2026-05-19 02:55:14 +01:00
gx eb3c058341 ci: smoke test workflow для verify u4 runner через VPN
test-u4-runner / u4 runner smoke test (push) Successful in 54s
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 38m52s
build / ffmpeg filter patch (out-of-tree) (push) Failing after 1m34s
2026-05-19 02:12:38 +01:00
gx 612843bd39 docs: launch drafts (Frigate discussion + FFmpeg-devel RFC + Show HN)
3 черновика для upstream visibility (Etap E):
- docs/launch/frigate-integration-issue.md — Discussion на blakeblackshear/frigate
- docs/launch/ffmpeg-devel-rfc.md — RFC patch + cover letter для ffmpeg-devel ML
- docs/launch/hn-show-post.md — Show HN draft (Etap F)
- docs/launch/README.md — порядок, чек-лист, pre-flight notes

См. issue #3.
2026-05-19 02:04:42 +01:00
gx bcc1d29ae8 ci: clone FFmpeg из local gitea fork (вместо unstable upstream github clone)
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 1m52s
build / ffmpeg filter patch (out-of-tree) (push) Successful in 1m31s
git clone github.com/FFmpeg/FFmpeg на слабом интернете оборвался через 11 мин
(RPC HTTP/2 CANCEL). Local gx/ffmpeg-patched n7.1-cuframes branch имеет
patch уже applied — clone instant без internet round-trip.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 00:40:40 +01:00
gx fbe1d18c39 docs: troubleshooting guide + production notes
- docs/troubleshooting.md — 13 секций с реальными grабельками которые мы
  прошли: cudaIpcOpenEventHandle invalid device context (pid namespace),
  s6-overlay vs pid share, scale_cuda missing (cuda-llvm + stdbit.h glibc 2.36),
  libcuframes not found install paths, ffbuild/ missing source, GMP no working
  compiler (long-long reliability), zlib.net deprecated URL, RTSP/RTP UDP
  docker NAT, gitea actions Node version
- docs/architecture.md — Appendix A "Production deployment notes" с реальными
  observations после 24h+ run: что подтвердилось, что доработали, что не учли
- docs/requirements.md — production deployment matrix + Docker namespace
  requirements таблица (cross-container CUDA IPC требует 5 условий)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 00:37:13 +01:00
gx 022a198c33 ci: same Node 20 bootstrap для filter-build job (как в cmake-build)
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 13m20s
build / ffmpeg filter patch (out-of-tree) (push) Failing after 18m48s
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 00:05:59 +01:00
gx 611918ce7a ci: install Node 20 from NodeSource (apt nodejs = Node 12 — слишком старый для actions/checkout@v4)
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 1m48s
build / ffmpeg filter patch (out-of-tree) (push) Failing after 51s
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 21:56:33 +01:00
gx 00fb3e9528 ci: preinstall node+git в CUDA container (actions/checkout требует node)
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Failing after 1m6s
build / ffmpeg filter patch (out-of-tree) (push) Has been skipped
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 21:47:25 +01:00
gx 4a6a6f4a6c ci: gitea Actions workflows (build, release) + README badges
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Failing after 1m4s
build / ffmpeg filter patch (out-of-tree) (push) Has been skipped
- .gitea/workflows/build.yml — on push/PR:
    * cmake build на CUDA 12.4 devel image (Ubuntu 22.04 base)
    * compile-only smoke (no GPU нужен): libcuframes.so + tools + examples
    * install-prefix layout verify (headers + libs в правильных путях)
    * filter/ — clone FFmpeg n7.1 + apply patch + build minimal patched
      ffmpeg, verify cuframes demuxer registered

- .gitea/workflows/release.yml — on tag v*:
    * build runtime Docker image, push в git.goldix.org/gx/cuframes:<version>
    * build source tarball cuframes-<version>.tar.gz как artifact

- README.md badges: build status, release version, license

Runner: gitea act_runner v0.4.1 на R9-88.23 — labels ubuntu-22.04 / ubuntu-24.04
доступны через docker.gitea.com/runner-images. CUDA devel image использует
nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04 (уже cached на runner host).

Stress test (требует GPU) намерено НЕ в CI — runner без GPU. Запускать
отдельно на dev-машине через ctest.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 21:43:55 +01:00
gx 12708618d4 docs: reference integrations + examples
- docs/integrations/frigate.md — полный production-tested guide:
  Dockerfile, docker-compose, config.yml, troubleshooting (s6+pid, scale_cuda,
  hwaccel issues), build steps
- docs/integrations/cctv-cpp.md — C++ pattern: IFrameSource interface +
  CuframesSource skeleton + CMake setup + runtime requirements
- examples/frigate-compose/ — reference compose stack (cuframes-pub + Frigate)
  с config.yml stub, .env.example, README
- examples/python-consumer/ — ctypes-based skeleton для AI/ML pipeline'ов
  (до v0.3 native pybind11 bindings)
- docs/integration.md — превратился в index-страницу, ссылается на specific guides

Reorganization упрощает onboarding: пользователь выбирает guide по типу
integration'а (Frigate/C++/Python/FFmpeg) и сразу видит реальный code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 21:37:35 +01:00
gx a3ba3a95b2 docs: ROADMAP + CHANGELOG v0.1.0 + BENCHMARKS
- ROADMAP.md: structured v0.1 / v0.2📋 (encoded packet sharing + FFmpeg
  upstream PR + scale-cuda alt) / v0.3 (Python bindings, Jetson, multi-GPU)
  / v1.0 (stable ABI)
- CHANGELOG.md: full v0.1.0 release notes — features, tested config,
  production deployment, known limitations
- BENCHMARKS.md: measurements (stress 1×pub×4×sub, E2E real camera,
  prod multi-consumer 24h, VRAM cost per resolution, cuframes vs N×NVDEC)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v0.1.0
2026-05-18 21:11:37 +01:00
gx 601806a5f8 build: add cmake install rules for libcuframes
cmake --install теперь правильно кладёт libcuframes.so/.a в lib/ и
headers в include/cuframes/. Нужно для downstream builders (FFmpeg
patched build, deb packaging).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 12:52:16 +01:00
gx 99ab0e0524 Merge pull request 'feat(filter): FFmpeg 7.1 cuframes:// input demuxer (PoC v1)' (#1) from feat/ffmpeg-demuxer into main
Reviewed-on: #1
2026-05-17 09:08:09 +01:00
gx 99df68f69c feat(filter): FFmpeg 7.1 cuframes:// input demuxer
Out-of-tree patch + sources для FFmpeg-демаксера, который позволяет любому
FFmpeg-based потребителю (Frigate, кастомные рекордеры, re-streamers)
читать "cuframes://<key>" как обычный URL — без своего NVDEC.

Состав:
- filter/cuframesdec.c — реализация (libavformat-style)
- filter/ffmpeg-7.1-cuframes-demuxer.patch — patch для FFmpeg n7.1
  (Makefile / allformats.c / configure)
- filter/README.md — инструкции по сборке + CLI smoke test + Frigate plan

v1 ограничения (намеренно):
- только NV12
- GPU → CPU копия через cudaMemcpy2DAsync (zero-copy AVHWFramesContext — v2)

CLI smoke test 2026-05-17 (host build FFmpeg + libcuframes,
publisher на камере 192.168.88.98 1920x1080 HEVC 25fps):
  ffmpeg -f cuframes -i cuframes://cam-ff -c:v copy -f null -
  → frame=100 fps=25 q=-1.0 speed=1x  ✓
  → "cuframes: connected to 'cam-ff' — 1920x1080 NV12"

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 09:02:12 +01:00
gx f10413580d docs: cross-container CUDA IPC requires both --ipc и --pid namespace share
Реальный тест на 192.168.88.98 (1920x1080 HEVC, 25fps) показал: для отдельных
consumer-container'ов недостаточно ipc=container:X — нужен также
pid=container:X, иначе cudaIpcOpenEventHandle падает с invalid device
context. CUDA driver валидирует IPC peer через /proc/<pid>/...

E2E на реальной камере проверен:
  publisher (отдельный контейнер) -> consumer (docker exec): 250 frames, 0 gaps
  publisher (отдельный контейнер) -> consumer (отдельный с pid+ipc): 200, 0 gaps

Обновлено:
- docs/integration.md compose snippet, verification, troubleshooting section
- docker-compose.example.yml — добавлен pid: container:cuframes-cam-test
- README.md quickstart — добавлен --pid в docker run subscriber

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 06:37:09 +01:00
gx 44dab75e08 docs+docker: integration guide и runtime image для Frigate/cctv stack
docs/integration.md — детальный guide для интеграции в существующий CCTV
docker-compose: критичные требования (ipc=shareable/container, общий
shared volume для socket), пример CuframesSource для cctv-processor,
verification checklist, troubleshooting (timeout, ipc namespace mismatch,
high latency). Зафиксировано: v0.1 frigate-decode не убирается без
patch'а FFmpeg — это v0.2 scope.

docker/Dockerfile.runtime — multi-stage build (devel → runtime), копирует
libcuframes.so + cuframes-rtsp-source + sub_count в /usr/local. Образ
~700 MB (vs ~7 GB у dev'а). Smoke-test: бинарки запускаются, ldd видит
все нужные libs.

docker-compose.example.yml — reference docker-compose с правильным ipc
mode и volume mounts для копирования в свои проекты.

.dockerignore — исключает build/ и build-*/ из COPY context.

README обновлён: статус v0.1 done, quickstart с реальным docker run,
ссылка на integration guide.
2026-05-14 23:47:56 +01:00
gx a21812d3f6 tools+examples+test: end-to-end pipeline ready (Steps 9-10)
cuframes-rtsp-source — standalone bridge между RTSP/file и cuframes IPC.
Декодирует на CUDA (nvdec), копирует D2D в pre-allocated pool (EXTERNAL
ownership), публикует через cuframes. --realtime для pacing файлового
ввода, --loop для зацикливания. Альтернатива FFmpeg-фильтра до v0.2
(filter требует patch FFmpeg, конфликтует с Frigate's bundled build).

examples/sub_count — reference subscriber на raw C API: counts frames,
trackit gaps, выходит clean при disconnect/timeout/SIGINT.

test_stress (4 subscribers × 2000 frames @ 120fps) — PASS на RTX 5090.
0 torn frames у всех consumers (включая 2 slow с 5ms sleep).

Smoke-проверено: testsrc 25fps → cuframes-rtsp-source → cuframes IPC
→ sub_count (отдельный процесс) → 200/200 frames, 0 gaps, avg_fps=25.2.
2026-05-14 23:39:01 +01:00
gx 2530057507 hpp: C++ RAII wrapper (header-only, Step 7)
Тонкий слой поверх C API:
- cuframes::Error — exception при ошибках, code() для подробностей
- cuframes::Publisher — RAII обёртка publisher'а (LIBRARY + EXTERNAL constructors)
- cuframes::Subscriber + cuframes::FrameRef — RAII frame с автo-release
- cuframes::AsyncSubscriber — с std::function callbacks
- cuframes::Frame — read-only view (для callback'а)
- cuframes::calc_frame_size(), now_ns() — utilities

Smoke test (in dev container):
  $ g++ -std=c++17 ... -lcuframes -lcudart smoke.cpp
  $ ./smoke
  version: 0.1.0
  FullHD NV12 frame: 3317760 bytes (pitch_y=2048, pitch_uv=2048)
2026-05-14 23:23:35 +01:00
gx 46c2b94939 libcuframes v0.1: producer + consumer (sync + async) + tests
Implements Steps 3-6 of Phase 1 according to docs/protocol.md.

libcuframes/src/:
- internal.h     (660 lines) — shared structs (byte-exact protocol.md layout)
                                + _Static_assert на offsets/sizes
- utils.c        — error strings, frame size calc, now_ns, key validation
- protocol.c     — TLV framing для Unix socket с poll-based timeout
- producer.c     (~700 lines) — Step 3:
                    * LIBRARY mode: cudaMalloc pool, IPC handle export
                    * EXTERNAL mode: register user-provided pointers
                    * cudaIpcEventHandle_t для cross-process sync (R1/R2)
                    * Unix socket accept thread, handshake state machine
                    * Bit allocation 1..31, name collision check (Y5)
                    * STRICT_WAIT policy: timeout with dead-subscriber eviction
- consumer.c     (~400 lines) — Step 4:
                    * Synchronous next() with poll-based wait
                    * cudaStreamWaitEvent на consumer-stream (R1/R2)
                    * Opaque cuframes_frame_t с accessor functions (Y6)
                    * NEWEST_ONLY и STRICT_ORDER modes
                    * ACK via atomic_fetch_or на bitmap
- consumer_async.c — Step 5: thread + callback wrapper над sync API

libcuframes/tests/:
- test_pingpong.cu — single producer × single consumer, 200 frames @ 60fps,
                     verify через kernel-on-consumer-stream (правильный test
                     для sync semantics, см. spike-v2)
- test_multi.cu    — 1 producer × 3 consumers через fork()

Build:
- Top-level CMakeLists.txt с options
- libcuframes/CMakeLists.txt: shared + static library, c_std_11
- Suppress -Waddress-of-packed-member (известная безопасная warning x86_64)

Results (внутри cuframes-dev container, RTX 5090):
- pingpong_basic PASS  4.5s  200 frames, 0 torn
- multi_consumer PASS  4.1s  1 × 3 consumers, all PASS

Phase 1 Step 6 done. Дальше: Step 7 (C++ wrapper), Step 9 (FFmpeg filter).
2026-05-14 23:21:30 +01:00
gx dc478c7cda docs: system requirements (hardware, software, build, Docker, k8s)
docs/requirements.md (220 строк):
- Hardware: NVIDIA GPU CC ≥7.5 (Turing+), Linux x86_64, VRAM/RAM/CPU minimum
- Software host: kernel ≥5.4, driver ≥525/555, glibc ≥2.31, Ubuntu/Debian/RHEL
- Build deps: CUDA Toolkit ≥12.0, GCC 11+, CMake 3.20+, FFmpeg 4.4+
- Docker: nvidia-container-toolkit, --gpus, --ipc=shareable, --shm-size=2gb
- Cross-container CUDA IPC: variant A (--ipc=container:X), variant B (host),
  k8s через emptyDir + shareProcessNamespace
- Out-of-scope: AMD/Intel/macOS/Windows/WSL2/Jetson/multi-GPU/multi-host
- Quick-check команды (nvidia-smi, uname, ldd, df /dev/shm)
- Tested matrix (Phase 0): RTX 5090, driver 595, CUDA 13.0.88, Ubuntu 24.04

README.md обновлён:
- Краткая таблица minimum vs recommended
- Список не-поддерживаемых платформ
- Ссылки на все docs/ файлы (architecture, protocol, requirements, benchmarks)
2026-05-14 23:11:30 +01:00
gx 6608f5d2f6 docs(protocol): bit-exact wire protocol specification (R4)
Closes последний RED-flag из arch review.

Что описано (§-sections):
1. Resources & lifecycle (socket / shm / IPC handles cleanup, crash recovery)
2. Shared memory byte-by-byte layout (offsets, packing, atomics)
   2.1 frame meta (64 bytes)
   2.2 slot descriptor (192 bytes)
   2.3 subscriber slot (128 bytes)
3. Unix socket TLV protocol (8 message types, framing)
4. State machines (subscriber-side, publisher-side per-subscriber)
5. ACK protocol с cudaEventRecord / cudaStreamWaitEvent
6. Versioning rules (proto_version vs lib_version, reserved fields)
7. Conformance test skeleton (offset checks, sizeof checks, handshake)
8. Open для v0.2 (TLS, multi-format, ROCm)
9. Reference impl pointer (libcuframes/src/protocol.c — Phase 1)

После v0.2 release — wire protocol frozen, breaking changes = bump
proto_version. До v0.2 — experimental.

Решает все 4 пункта из arch review section R4:
✓ SHM layout (annotated struct + ASCII layout)
✓ Socket protocol (state machine + message framing)
✓ Versioning rules
✓ Lifecycle / cleanup (incl. CUDA IPC handle leak при crash)

Готов к Step 2 (Phase 1 implementation).
2026-05-14 23:04:46 +01:00
gx 98a60b7730 header v2: address arch review R3 + Y4/Y5/Y6/Y7/Y9
R3 (publisher API не работает с FFmpeg's hwframe pool):
- Добавлен ownership_mode field: LIBRARY (default, текущий API) или EXTERNAL.
- Новая функция publisher_create_external(cuda_ptrs[], ptr_count, frame_size)
  для случая когда CUDA память выделена upstream (FFmpeg AVHWFramesContext).
- Новая publish_external(cuda_ptr) — публикует один из pre-registered handles.
- Для FFmpeg filter теперь zero-copy: filter получает AVFrame, library уже
  имеет IPC handle на этот pointer (registered в create), publish — atomic seq bump.

R1/R2 closure отражено в API:
- publish() теперь принимает cudaStream_t — library делает cudaEventRecord
  вместо stream sync.
- next() теперь принимает consumer_stream — library делает cudaStreamWaitEvent
  перед возвратом frame. Cross-process sync через cudaIpcEventHandle_t.

Y6 (opaque frame через handle, не struct с _internal_*):
- cuframes_frame_t стал opaque (typedef struct, не определена).
- Accessor functions: cuda_ptr, format, size, pitch_y, pitch_uv, seq, pts_ns.
- ABI-stable при добавлении полей в minor releases.

Y7 (redundant try_next):
- Удалён subscriber_try_next. next(.., timeout_ms=0) — non-blocking
  с CUFRAMES_ERR_WOULD_BLOCK.

Y5 (consumer_name uniqueness):
- Документировано что duplicate name → ALREADY_EXISTS.
- Добавлен CUFRAMES_ERR_TOO_MANY для случая >32 subscribers.

Y9 (pts_ns clock):
- Документировано что MONOTONIC у publisher'а, consumer должен sanity-check
  на epoch reset при publisher restart.

Также:
- meta-блок (cuframes_frame_meta_t) перестал быть public — meta доступна
  через accessor'ы на opaque frame.
- _reserved[4] в configs для forward-compat без breaking ABI.
- Добавлен cuframes_protocol_version() — wire protocol majoring отдельно
  от lib version.

Готов к Step 2 (docs/protocol.md + implementation).
2026-05-14 23:02:50 +01:00
gx fe330ca279 arch: close open question §6.6 — events as default for cross-process sync
См. spike-v2 (commit ad54305) + arch review 2026-05-15.

cudaStreamSynchronize-only фактически работает на single-host single-GPU
(0 torn в 4 scenarios PoC), но NVIDIA Programming Guide §3.2.8 не даёт
contractual гарантии. Переключаемся на cudaIpcEventHandle_t как default,
stream-sync остаётся опциональным fallback.

Net: +20µs mean latency, -3× max latency (predictable tail), future-proof
для multi-GPU.
2026-05-14 23:00:40 +01:00
gx ad543054fc spike-v2: validate sync semantics (R1/R2 architectural review)
Architectural review (2026-05-15) указал что cudaStreamSynchronize-only на
producer-side не достаточен для cross-process visibility — NVIDIA Programming
Guide §3.2.8 требует cudaIpcEventHandle_t. Phase 0 PoC v1 не проверял этот
случай из-за cudaMemcpy который имеет implicit barriers.

spike-v2 воспроизводит правильный сценарий: consumer запускает verify_kernel
на ОТДЕЛЬНОМ stream'е (real-world use case — PyTorch / OpenCV CUDA), pattern
включает row-based component для отлова partial-frame torn.

Запуск 4 scenarios × 1500/600 frames:
  A-fhd60 (stream sync, FHD@60):  0 torn, p99=267µs, max=14.7ms
  B-fhd60 (event  sync, FHD@60):  0 torn, p99=344µs, max=5.2ms
  A-4k30  (stream sync, 4K@30):   0 torn, p99=606µs, max=4.4ms
  B-4k30  (event  sync, 4K@30):   0 torn, p99=437µs, max=3.7ms

Все 4 показали 0 torn frames. R1 на single-host single-GPU фактически
не воспроизводится — но NVIDIA contractually не гарантирует это.

Decision: events as default (R1/R2 resolved). Architecture.md §6.6 закрыт.
Tradeoff: mean latency +20µs, max latency в 3× ниже (predictable tail) +
future-proof для multi-GPU.

Также Dockerfile.dev — апдейт CUDA до 13.0.3 (12.4 не существует с devel-ubuntu24.04).

Связано с PR review: R1, R2, R3 (R3, R4 — в следующих коммитах).
2026-05-14 23:00:13 +01:00
gx c2c2a9751a phase0: benchmark results — PASSED on RTX 5090 (Blackwell sm_120)
Basic (1 producer × 1 consumer):
  p50=75µs  p95=146µs  p99=152µs   (target was <5ms — мы 33× ниже)
  500 frames, 0 torn, 0 skipped, zero-copy verified

Multi-consumer (1 × 3):
  p99 для всех 3: 151-152µs (identical = proof zero-copy без contention)
  300 frames each, 0 torn, 0 skipped

Acceptance criteria — GREEN. Переходим к Phase 1 (libcuframes API).

Sync через cudaStreamSynchronize достаточен для v0.1; CUDA IPC event
handles overlap отложен до v0.2.

Raw measurement logs сохранены в docs/measurements/phase0-consumer-*.log
для verification (4 файла из 2 scenarios).

Также fixed unused variable warning в pingpong_consumer.cu.
2026-05-14 22:02:49 +01:00