v0.3.3: consumer post-sync verify even for v0.3 per-slot events
release / build runtime Docker image (push) Failing after 0s
release / build source tarball (push) Successful in 4s
build / cmake build (CUDA 12.4, Ubuntu 22.04) (push) Successful in 1m41s
build / ffmpeg filter patch (out-of-tree) (push) Successful in 1m29s
test-u4-runner / u4 runner smoke test (push) Has been cancelled
Bug: cudaEventRecord(event[slot]) overwrites the event's previous state on every publish. When the producer wraps the ring (~640 ms at ring=16), event[slot] is re-recorded for the new content. The consumer's pending cudaStreamWaitEvent is then satisfied by the new signal: the consumer reads slot[slot_idx] believing it holds target_seq, but actually gets the content of seq+ring_size (stale-by-1-wrap drift).

After 50k+ wraps in a long-running pipeline (9 h uptime) the drift accumulates: the output stream has 60-70% duplicate frames (vs 10% right after a restart). Symptom: the TV picture periodically freezes for 1-2 s. Encoder fps=25 stays stable (duplicated content still advances PTS), but real motion is choppy at 8-9 fps.

Fix: unconditional post-sync verify (atomic re-read of slot.seq after the event wait). If a producer wrap occurred, slot.seq != target_seq and the consumer continues to a new target_seq. Cheap (one atomic load); correctness > perf.

Verified: after deploying with a fresh pipeline, an 18-second sample shows 4% duplicates (vs 8.4% with the same setup but without the fix).

Proper v0.4 fix: a per-slot + per-publish event pool with a unique handle per cycle. The current v0.3.3 is a sufficient mitigation for the current production scale.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
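The race and the post-sync verify described above can be sketched on the host alone. This is a minimal single-threaded simulation, not the library's code: `slot_t`, `publish`, `read_verified` and the demo function are hypothetical names, the CUDA event wait is replaced by a comment, and only the `atomic_load_explicit(..., memory_order_acquire)` re-read mirrors what the diff shows.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 16 /* ring=16, as in the commit message */

/* Hypothetical slot layout; the real header struct is not shown in the diff. */
typedef struct {
    _Atomic uint64_t seq; /* sequence number of the frame stored in this slot */
    uint64_t payload;     /* stand-in for the frame data */
} slot_t;

static slot_t ring[RING_SIZE];

/* Producer: write the payload, then publish seq with release ordering. */
static void publish(uint64_t seq) {
    slot_t *s = &ring[seq % RING_SIZE];
    s->payload = seq * 1000;
    atomic_store_explicit(&s->seq, seq, memory_order_release);
}

/* Consumer: post-sync verify. Returns 1 and fills *out if the slot still
 * holds target_seq, 0 if the producer has already overwritten it. */
static int read_verified(uint64_t target_seq, uint64_t *out) {
    slot_t *s = &ring[target_seq % RING_SIZE];
    /* The real code waits on the slot's CUDA event before this point. */
    uint64_t verify_seq = atomic_load_explicit(&s->seq, memory_order_acquire);
    if (verify_seq != target_seq)
        return 0; /* caller picks a new target_seq, like `continue` in the diff */
    *out = s->payload;
    return 1;
}

/* Demo: the producer laps the ring while the consumer still targets seq 5. */
static int demo_wrap_is_detected(void) {
    uint64_t out = 0;
    for (uint64_t seq = 1; seq <= 5; seq++)
        publish(seq);
    int fresh = read_verified(5, &out);
    assert(fresh == 1 && out == 5000); /* no wrap yet: the read is accepted */

    for (uint64_t seq = 6; seq <= 5 + RING_SIZE; seq++)
        publish(seq); /* slot 5 now holds seq 21 */
    return read_verified(5, &out) == 0; /* 1 means the wrap was caught */
}
```

The check only compares sequence numbers at one instant, so a wrap that happens after the re-read but before the frame is consumed would still slip through; this matches the commit's own wording that the verify is a mitigation, not the final fix.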
@@ -322,15 +322,22 @@ int cuframes_subscriber_next(cuframes_subscriber_t *sub,
             if (cerr != cudaSuccess) return CUFRAMES_ERR_CUDA;
         }
 
-        /* TOCTOU protection (v0.2 fallback only): the legacy single event signals
-         * for the last published frame. v0.3 per-slot events do not need
-         * this check: event[slot] = strict slot ordering guarantee. */
-        if (!sub->has_slot_events) {
-            uint64_t verify_seq = atomic_load_explicit(&sub->hdr->slots[slot_idx].seq,
-                                                       memory_order_acquire);
-            if (verify_seq != target_seq) {
-                continue;
-            }
+        /* TOCTOU protection, unconditional (both v0.2 and v0.3). v0.3 per-slot
+         * events do NOT guarantee ordering: cudaEventRecord overwrites the previous
+         * state on every publish. If the producer wrapped the ring while the consumer
+         * was waiting on the event sync, slot[slot_idx] already holds seq > target.
+         * The event signal from the new publish satisfies the stale wait, so the
+         * consumer reads new content thinking it's old (lazy consumption).
+         *
+         * Symptom in a long-running pipeline: 50k+ ring wraps accumulate drift,
+         * output stream duplicates 60-70% of frames despite stable encoder fps.
+         *
+         * Proper v0.4 fix: per-slot + per-publish event handle (event pool).
+         * For now, the post-sync verify catches the main race window. */
+        uint64_t verify_seq = atomic_load_explicit(&sub->hdr->slots[slot_idx].seq,
+                                                   memory_order_acquire);
+        if (verify_seq != target_seq) {
+            continue;
         }
 
         /* Fill frame_out */
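The commit only names the planned v0.4 fix (a per-slot + per-publish event pool with a unique handle per cycle) without showing it. One possible reading of that idea is sketched below as a host-only model. Everything in it is an assumption made for illustration: `GENERATIONS`, `fake_event_t`, `event_for` and the other names are hypothetical, and a plain struct stands in for a real CUDA event.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE   16
#define GENERATIONS 4 /* hypothetical pool depth per slot */

/* Host-side stand-in for a CUDA event: remembers which publish recorded it. */
typedef struct {
    uint64_t recorded_seq;
} fake_event_t;

static fake_event_t pool[GENERATIONS][RING_SIZE];

/* Map a sequence number to its handle: the slot index picks the column,
 * the wrap count (modulo the pool depth) picks the row. */
static fake_event_t *event_for(uint64_t seq) {
    uint64_t slot = seq % RING_SIZE;
    uint64_t gen  = (seq / RING_SIZE) % GENERATIONS;
    return &pool[gen][slot];
}

/* Producer: record the handle that belongs to this publish. */
static void record_publish(uint64_t seq) {
    event_for(seq)->recorded_seq = seq;
}

/* Consumer: 1 if the handle for target_seq was not re-recorded since. */
static int handle_is_fresh(uint64_t target_seq) {
    return event_for(target_seq)->recorded_seq == target_seq;
}

/* Demo: the handle for seq 5 survives one ring wrap, but not GENERATIONS wraps. */
static int demo_pool_survives_one_wrap(void) {
    for (uint64_t seq = 1; seq <= 5 + RING_SIZE; seq++)
        record_publish(seq);
    int after_one_wrap = handle_is_fresh(5); /* seq 21 used a different row */
    assert(after_one_wrap == 1);

    for (uint64_t seq = 5 + RING_SIZE + 1; seq <= 5 + GENERATIONS * RING_SIZE; seq++)
        record_publish(seq); /* seq 69 lands on the same handle as seq 5 */
    return after_one_wrap && !handle_is_fresh(5);
}
```

With a finite pool the handle is still reused after `GENERATIONS` wraps, so in this model the pool widens the safe window rather than removing the need for a sequence check.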