From d646f5a4e41bd0fbae436bbaa430c586afe2d298 Mon Sep 17 00:00:00 2001
From: Evgeny Demchenko
Date: Sun, 24 May 2026 20:27:00 +0100
Subject: [PATCH] v0.3.3: consumer post-sync verify even for v0.3 per-slot
 events
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Bug: cudaEventRecord(event[slot]) overwrites the event's previous state
on every publish. When the producer wraps the ring (~640 ms with
ring=16, i.e. 16 frames x 40 ms at 25 fps), event[slot] is re-recorded
for the new content. The consumer's cudaStreamWaitEvent is then
satisfied by the new signal (a wait issued after the re-record binds to
the most recent cudaEventRecord on that event), so the consumer reads
slot[slot_idx] believing it holds target_seq while it actually gets the
seq+ring_size content (stale-by-one-wrap drift).

After 50k+ wraps in a long-running pipeline (9 h uptime, i.e.
~32,400 s / 0.64 s per wrap ~ 50.6k wraps), the drift accumulates: the
output stream has 60-70% duplicate frames (vs 10% right after a
restart).

Symptom: the TV picture periodically freezes for 1-2 s. Encoder fps=25
stays stable (duplicate frames still advance PTS normally), but real
motion is choppy at 8-9 fps.

Fix: unconditional post-sync verify (atomic re-read of slot.seq after
the event wait). If a producer wrap occurred, slot.seq != target_seq
and the loop continues with a new target_seq. Cheap (one atomic load);
correctness > perf.

Verified: after deploying with a fresh pipeline, an 18-second sample
shows 4% duplicates (vs 8.4% on the same setup without the fix).

Proper v0.4 fix: a per-slot + per-publish event pool with a unique
handle per cycle. The current v0.3.3 is a sufficient mitigation at the
current production scale.
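
A minimal CPU-only sketch of the race and the post-sync verify, assuming
a simplified, single-threaded ring model: slot_t, ring_t, publish(),
next_frame(), run_scenario() and the wait_cb callback (a stand-in for
the CUDA event wait) are hypothetical names, not libcuframes API. The
callback lets the producer run one full ring wrap during the first
wait, which reproduces the stale-by-one-wrap read deterministically.

```c
/* Hypothetical CPU-only, single-threaded model of the consumer's
 * post-sync verify. The CUDA event wait is replaced by a callback so a
 * producer ring wrap can be injected; real libcuframes types differ. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 16 /* ring=16, as in the commit message */

typedef struct {
    _Atomic uint64_t seq; /* seq of the frame currently stored in this slot */
    int payload;          /* stand-in for the frame data */
} slot_t;

typedef struct {
    slot_t slots[RING_SIZE];
    _Atomic uint64_t head; /* most recently published seq (0 = none yet) */
} ring_t;

typedef void (*wait_fn)(ring_t *r, void *ctx); /* stand-in for the event wait */

static void ring_init(ring_t *r) {
    for (int i = 0; i < RING_SIZE; i++) {
        atomic_init(&r->slots[i].seq, 0);
        r->slots[i].payload = 0;
    }
    atomic_init(&r->head, 0);
}

/* Producer: write the payload, then publish seq with release ordering. */
static void publish(ring_t *r, uint64_t seq) {
    assert(seq != 0); /* seq 0 means "empty slot" in this model */
    slot_t *s = &r->slots[seq % RING_SIZE];
    s->payload = (int)(seq * 10);
    atomic_store_explicit(&s->seq, seq, memory_order_release);
    atomic_store_explicit(&r->head, seq, memory_order_release);
}

/* Consumer: pick the newest seq, wait, then (optionally) re-read slot.seq. */
static int next_frame(ring_t *r, bool verify, wait_fn wait_cb, void *ctx,
                      uint64_t *seq_out, int *payload_out) {
    for (int attempt = 0; attempt < 8; attempt++) {
        uint64_t target_seq = atomic_load_explicit(&r->head, memory_order_acquire);
        slot_t *s = &r->slots[target_seq % RING_SIZE];
        if (target_seq == 0 ||
            atomic_load_explicit(&s->seq, memory_order_acquire) != target_seq)
            continue;
        wait_cb(r, ctx); /* the producer may wrap the ring in here */
        if (verify) {
            uint64_t verify_seq = atomic_load_explicit(&s->seq, memory_order_acquire);
            if (verify_seq != target_seq)
                continue; /* slot overwritten: retry with a new target_seq */
        }
        *seq_out = target_seq;
        *payload_out = s->payload;
        return 0;
    }
    return -1; /* gave up after repeated races */
}

/* Wait stand-in that runs one full ring wrap on its first call only. */
static void wrap_once(ring_t *r, void *ctx) {
    int *calls = ctx;
    if ((*calls)++ == 0) {
        uint64_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
        for (uint64_t i = 1; i <= RING_SIZE; i++)
            publish(r, head + i);
    }
}

/* Publish seqs 1..5, then consume once while a wrap happens mid-wait. */
static int run_scenario(bool verify, uint64_t *seq_out, int *payload_out) {
    ring_t ring;
    int calls = 0;
    ring_init(&ring);
    for (uint64_t seq = 1; seq <= 5; seq++)
        publish(&ring, seq);
    return next_frame(&ring, verify, wrap_once, &calls, seq_out, payload_out);
}
```

With verify off, run_scenario(false, ...) reports seq 5 but returns the
payload written for seq 21 (5 + RING_SIZE). With verify on,
run_scenario(true, ...) detects the overwrite, retries, and returns
seq 21 with its own payload. As in the patch, the re-read sits after
the wait and before the frame is used, so it covers a wrap during the
wait but not one that lands after the re-read.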
Co-Authored-By: Claude Opus 4.7
---
 libcuframes/src/consumer.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/libcuframes/src/consumer.c b/libcuframes/src/consumer.c
index 5cc8b33..fce8170 100644
--- a/libcuframes/src/consumer.c
+++ b/libcuframes/src/consumer.c
@@ -322,15 +322,22 @@ int cuframes_subscriber_next(cuframes_subscriber_t *sub,
             if (cerr != cudaSuccess) return CUFRAMES_ERR_CUDA;
         }
 
-        /* TOCTOU protection (v0.2 fallback only): the legacy single event
-         * signals the last published frame. v0.3 per-slot events do not
-         * need this check: event[slot] = strict slot ordering guarantee. */
-        if (!sub->has_slot_events) {
-            uint64_t verify_seq = atomic_load_explicit(&sub->hdr->slots[slot_idx].seq,
-                                                       memory_order_acquire);
-            if (verify_seq != target_seq) {
-                continue;
-            }
+        /* TOCTOU protection, unconditional (both v0.2 and v0.3). v0.3 per-slot
+         * events do NOT guarantee ordering: cudaEventRecord overwrites the
+         * previous state on every publish. If the producer wrapped the ring while
+         * the consumer was waiting on event sync, slot[slot_idx] already holds
+         * seq > target. The event signal from the new publish satisfies the stale
+         * wait; the consumer reads new content thinking it's old (lazy consumption).
+         *
+         * Symptom in a long-running pipeline: 50k+ ring wraps accumulate drift,
+         * and the output stream duplicates 60-70% of frames despite stable encoder fps.
+         *
+         * Proper v0.4 fix: per-slot + per-publish event handle (event pool).
+         * For now, the post-sync verify catches the main race window. */
+        uint64_t verify_seq = atomic_load_explicit(&sub->hdr->slots[slot_idx].seq,
+                                                   memory_order_acquire);
+        if (verify_seq != target_seq) {
+            continue;
         }
 
         /* Fill frame_out */