diff --git a/BENCHMARKS.md b/BENCHMARKS.md
new file mode 100644
index 0000000..a82f763
--- /dev/null
+++ b/BENCHMARKS.md
@@ -0,0 +1,119 @@
+# Benchmarks
+
+All measurements were taken on the reference hardware (see below). The numbers are
+repeatable and can be reproduced with `libcuframes/tests/test_stress.cu` and
+[`tools/cuframes-rtsp-source`](tools/cuframes-rtsp-source) + `examples/sub_count`.
+
+## Reference hardware
+
+| Component | Value |
+|---|---|
+| GPU | NVIDIA RTX 5090 (Blackwell, sm_120, 32 GB VRAM, 3× NVDEC, 1× NVENC) |
+| CPU | AMD Ryzen 9 7950X (16 cores, 32 threads) |
+| RAM | 64 GB DDR5-6000 |
+| OS | Ubuntu 24.04 (kernel 6.17, glibc 2.39) |
+| CUDA | Driver 555+, Toolkit 12.x / 13.x |
+| PCIe | Gen5 ×16 (GPU connection) |
+
+## Stress test — 1×publisher × 4×consumer × 2000 frames
+
+Run: `libcuframes/tests/test_stress.cu` (fork-based), 1 publisher + 4 consumers
+(2 fast + 2 slow @ 5 ms sleep) on 1280×720 NV12 frames @ ~120 fps target.
+
+| Metric | Value |
+|---|---|
+| Frames per consumer | 2000 / 2000 |
+| Gaps (lost seq) | **0 for all 4 consumers** |
+| Torn frames (verified by the `verify_y` kernel) | **0 for all 4 consumers** |
+| Wall time | 18.8 s |
+| Effective publisher rate | ~106 fps (below the ~120 fps target because of the slow consumers) |
+
+## E2E real camera — 1 publisher + 1 consumer
+
+Camera: Dahua HFW3441 main-stream 1920×1080 HEVC 25 fps, sub-stream 640×480 25 fps.
+Publisher: `cuframes-rtsp-source` (host build), consumer: `examples/sub_count` or
+`test_cuframes_source` (cctv-processor's `CuframesSource`).
+
+| Metric | NV12 1920×1080 | NV12 640×480 |
+|---|---|---|
+| Frame size (packed) | 3,110,400 bytes (~3 MB) | 460,800 bytes |
+| Effective bandwidth | 75 MB/s | 11 MB/s |
+| Publisher decode rate | 25.03 fps (matches camera) | 25.00 fps |
+| Consumer receive rate | 25.03 fps | 25.34 fps |
+| 100-frame test | 0 drops, 0 gaps | 0 drops, 0 gaps |
+
+## Production: 1× publisher → N× consumers (Frigate + cctv-backend)
+
+Real production setup (24+ hours of uptime):
+- Publisher: `cuframes-pub-parking`, Dahua 192.168.88.98 sub-stream 640×480 HEVC 25 fps
+- Consumer 1: **Frigate 0.17.1** via the FFmpeg `cuframes://` demuxer (detect path; ONNX object detection)
+- Consumer 2: **cctv-backend** via C++ `CuframesSource` (motion detect + grid composer + RTSP encode → TV)
+
+| Metric | Value |
+|---|---|
+| Total NVDEC operations | **1** (publisher only) |
+| Without cuframes it would be | **2** (Frigate detect + cctv-backend detect) |
+| GPU encoder | 1× (cctv-backend H.264 encode for RTSP output) |
+| Publisher VRAM ring | 6 buffers × 460 KB ≈ **2.8 MB** (sub-stream) |
+| Frigate detect drops | 0 over 24h |
+| cctv-backend frame loss | 0 over 24h |
+
+## VRAM cost — NV12 ring buffer
+
+Ring size in bytes = `frame_size × ring_size`, where `ring_size` is the number of buffers. NV12 frame size = `width × height × 1.5`.
+
+| Resolution | Frame size | Ring 6 buffers |
+|---|---|---|
+| 640×480 | 460 KB | 2.8 MB |
+| 1280×720 | 1.38 MB | 8.3 MB |
+| 1920×1080 (FHD) | 3 MB | 18 MB |
+| 2560×1440 | 5.5 MB | 33 MB |
+| 2688×1520 (Dahua 4MP) | 6 MB | 36 MB |
+| 3840×2160 (4K) | 12 MB | 72 MB |
+
+For a 16-camera setup on the RTX 5090 (32 GB VRAM), with all cameras at FHD and ring=6,
+the total is ~288 MB: **<1% of the available VRAM.**
+
+## Comparison: cuframes vs traditional N×NVDEC
+
+Scenario: 16 cameras × 25 fps × 3 consumers (Frigate, cctv-processor, AI-pipeline).
+
+| Approach | NVDEC ops/sec | VRAM bandwidth (decoded path) |
+|---|---|---|
+| Without cuframes | 16 × 25 × 3 = **1200** | ≥ 1200 × 6 MB = 7.2 GB/s |
+| With cuframes (v0.1) | 16 × 25 × 1 = **400** | ≥ 16 × 25 × 6 MB = 2.4 GB/s |
+| **Savings** | **3× fewer NVDEC ops** | **3× less memory bandwidth** |
+
+NVDEC throughput limit on the RTX 5090: ~50 concurrent FHD25 streams. Without cuframes,
+3 consumers × 16 cameras = 48 streams take ~96% of capacity → saturation. With cuframes, 16 streams take ~32% → headroom
+for scaling.
+
+## Latency
+
+| Hop | Latency |
+|---|---|
+| RTSP → publisher demuxer | sub-frame (<40 ms FHD25) |
+| NVDEC decode | ~3-5 ms per frame |
+| publish_external → consumer receive | **<0.5 ms** (cudaEventRecord → cudaStreamWaitEvent) |
+| consumer cudaMemcpy NV12 → host (FFmpeg demuxer v1) | ~2-3 ms FHD |
+| **End-to-end RTSP → consumer frame ready** | ~50-100 ms typical |
+
+The zero-copy path (via `AVHWFramesContext`, planned for v0.2) will remove the CPU copy: ideally `<10 ms`
+end-to-end.
+
+## Reproducibility
+
+All benchmarks can be reproduced from the repo:
+
+```bash
+# Stress test
+cd build && cmake -DBUILD_TESTING=ON ..  && cmake --build . && ctest -R stress -V
+
+# E2E single consumer
+./tools/cuframes-rtsp-source --rtsp rtsp://... --key cam1 --ring 6 --verbose &
+./examples/sub_count --key cam1 --max-frames 100 --verbose
+```
+
+For production deployment measurements, see the integration guides:
+- [docs/integration.md](docs/integration.md): cctv-processor C++ pipeline
+- [filter/README.md](filter/README.md): FFmpeg demuxer (Frigate setup)
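The VRAM table and the NVDEC comparison in BENCHMARKS.md follow from two formulas stated there (`frame_size = width × height × 1.5`, `ring = frame_size × ring_size`) and from the rough capacity estimate of ~50 concurrent FHD25 streams. A minimal Python sketch of that arithmetic is shown here; the helper names `nv12_frame_bytes`, `ring_bytes` and `nvdec_load` are hypothetical and are not part of cuframes.

```python
# Illustrative arithmetic only; these helpers are hypothetical, not cuframes API.


def nv12_frame_bytes(width: int, height: int) -> int:
    """Packed NV12: full-resolution Y plane plus a half-size interleaved UV plane."""
    return width * height * 3 // 2


def ring_bytes(width: int, height: int, ring_size: int = 6) -> int:
    """VRAM used by one publisher's ring of `ring_size` NV12 buffers."""
    return nv12_frame_bytes(width, height) * ring_size


def nvdec_load(cameras: int, consumers: int, shared: bool,
               capacity_streams: int = 50) -> float:
    """Fraction of NVDEC capacity in use.

    capacity_streams=50 is the document's rough RTX 5090 estimate for FHD25.
    With shared=True each camera is decoded once, whatever the consumer count.
    """
    sessions = cameras if shared else cameras * consumers
    return sessions / capacity_streams


if __name__ == "__main__":
    for w, h in [(640, 480), (1280, 720), (1920, 1080), (3840, 2160)]:
        frame_mb = nv12_frame_bytes(w, h) / 1e6
        ring_mb = ring_bytes(w, h) / 1e6
        print(f"{w}x{h}: frame {frame_mb:.2f} MB, ring(6) {ring_mb:.1f} MB")
    print(f"NVDEC load, 16 cams x 3 consumers, no sharing: "
          f"{nvdec_load(16, 3, shared=False):.0%}")
    print(f"NVDEC load, 16 cams x 3 consumers, shared:     "
          f"{nvdec_load(16, 3, shared=True):.0%}")
```

Exact byte counts differ slightly from the rounded table entries, since the table rounds frame sizes before multiplying by the ring size.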
diff --git a/CHANGELOG.md b/CHANGELOG.md
index b6f98f4..2b7c2e2 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,16 +5,52 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and the project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [Unreleased]
+## [0.1.0] — 2026-05-17
+
+First functional release with a production deployment.
 
 ### Added
+
+- **`libcuframes.so`** — main shared library:
+  - Producer/consumer ring buffer in CUDA memory (via `cudaIpcGetMemHandle`)
+  - Cross-process sync via `cudaIpcEventHandle_t` (NVIDIA Programming Guide §3.2.8)
+  - Handshake protocol over a Unix domain socket (`/run/cuframes/<key>.sock`)
+  - Shared metadata in POSIX SHM (`/dev/shm/cuframes-<key>`)
+  - `EXTERNAL` ownership support (the publisher passes its own pre-allocated CUDA pointers)
+  - Up to 32 simultaneous subscribers per publisher (configurable bit-mask)
+- **`cuframes.hpp`**: header-only C++17 RAII wrapper over the C API
+- **`cuframes-rtsp-source`** — standalone tool: RTSP → libavformat decode (hwaccel CUDA) → cuframes IPC
+- **FFmpeg input demuxer `cuframes://`** (out-of-tree patch for n7.1):
+  - `--enable-libcuframes` configure option
+  - URL form `cuframes://<key>`, options `-cuda_device` and `-connect_timeout`
+  - NV12 frame output via `cudaMemcpy2DAsync` GPU→host (v1 path; zero-copy in v2)
+- **CMake install rules** for downstream projects
+- **Docker runtime image** (`gx/cuframes:0.1`) — publisher + sub_count + libs
+- **Integration guides**:
+  - `docs/integration.md`: general integration with the cctv stack
+  - `filter/README.md`: FFmpeg patch + smoke test
+- **Reference deployments**:
+  - cctv-processor C++ via `CuframesSource` (see [gx/cctv](https://git.goldix.org/gx/cctv) PR #19, #20)
+  - Frigate 0.17.1 via patched FFmpeg (Frigate image rebuild; see `gx/frigate:cuframes`)
+
+### Tested
+
+- Stress: 1 publisher × 4 consumer × 2000 frames @ 120 fps — **0 torn frames, 0 gaps**
+- E2E: real Dahua camera 192.168.88.98 (1920×1080 HEVC 25 fps), 100/100 frames, 0 drops, avg_fps=25.03
+- **Production**: 1 publisher simultaneously serves Frigate (detect) + cctv-backend (motion+grid+encode→RTSP→TV) on a single NVDEC, ~24h uptime
+
+### Limitations (documented; see docs/integration.md)
+
+- NV12 frame format only (v1)
+- GPU→CPU copy in the FFmpeg demuxer (zero-copy via `AVHWFramesContext` is planned for v0.2)
+- Cross-container CUDA IPC requires sharing the `ipc` and `pid` namespaces. If the consumer
+  uses s6-overlay (as Frigate does), the pid namespace cannot be shared and alternatives are needed.
+- Linux only, with an NVIDIA GPU of compute capability ≥ 7.5
+
+## [0.0.1] — 2026-05-14
+
+### Added
+
+- Initial repository, LICENSE (LGPL-2.1+), README.md, CONTRIBUTING.md
 - Design specification (docs/architecture.md)
-- Prior art analysis: confirmed that the niche is unoccupied
-- Roadmap v0.1: 6 phases, ~6-8 weeks of work
-
-## [0.0.1] - 2026-05-14
-
-### Added
-- Initial repository
-- LICENSE (LGPL-2.1+)
-- README.md, CONTRIBUTING.md
+- Prior art analysis
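The 0.1.0 changelog entry mentions up to 32 simultaneous subscribers per publisher, tracked by a configurable bit-mask. One way such a slot mask could behave is sketched below in Python. The class `SubscriberMask` and its methods are hypothetical illustrations; the actual layout and locking used by libcuframes in shared memory are not shown in this diff and may differ.

```python
class SubscriberMask:
    """Tracks occupied subscriber slots as bits of one integer (hypothetical sketch)."""

    def __init__(self, max_subscribers: int = 32) -> None:
        self.max_subscribers = max_subscribers
        self.mask = 0  # bit i set means slot i is taken

    def acquire(self) -> int:
        """Claim the lowest free slot and return its index; raise if all are taken."""
        for slot in range(self.max_subscribers):
            if not self.mask & (1 << slot):
                self.mask |= 1 << slot
                return slot
        raise RuntimeError("no free subscriber slots")

    def release(self, slot: int) -> None:
        """Free a slot so a later subscriber can reuse it."""
        self.mask &= ~(1 << slot)

    def pending(self, acked: int) -> int:
        """Bits of connected subscribers that have not yet acknowledged a frame."""
        return self.mask & ~acked


if __name__ == "__main__":
    m = SubscriberMask()
    first, second = m.acquire(), m.acquire()
    print(f"slots {first}, {second} -> mask {m.mask:#034b}")
```

A real cross-process implementation would need atomic read-modify-write operations on the shared word; this single-process sketch only shows the bookkeeping.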
diff --git a/ROADMAP.md b/ROADMAP.md
new file mode 100644
index 0000000..63f9df7
--- /dev/null
+++ b/ROADMAP.md
@@ -0,0 +1,77 @@
+# Roadmap
+
+cuframes provides zero-copy sharing of decoded video frames between processes via
+CUDA IPC. Current public version: **v0.1.0** (see [CHANGELOG.md](CHANGELOG.md)).
+
+## Release policy
+
+SemVer: MAJOR.MINOR.PATCH.
+
+- `0.x.y`: pre-1.0, the API may change between minor releases. Patch releases are only
+  for bug fixes that do not break the ABI.
+- `1.0+`: stable ABI within a major version. Minor releases add features
+  without breaking existing code.
+
+To check the current protocol, call `cuframes_protocol_version()` (see the C API). A subscriber
+with an incompatible protocol refuses to connect (`CUFRAMES_ERR_PROTOCOL`).
+
+## v0.1 — Foundation ✅ (released 2026-05-17)
+
+| Component | Status |
+|---|---|
+| `libcuframes.so` — producer/consumer ring + CUDA IPC handshake | ✅ |
+| C++ RAII wrapper `cuframes.hpp` | ✅ |
+| `cuframes-rtsp-source` standalone publisher (RTSP → NVDEC → IPC) | ✅ |
+| FFmpeg input demuxer `cuframes://` (out-of-tree patch for n7.1) | ✅ |
+| Docker runtime image | ✅ |
+| CMake install rules | ✅ |
+| Integration guide for cctv-processor (C++) | ✅ |
+| Stress test 1×pub × 4×sub × 2000 frames @ 120 fps (0 torn) | ✅ |
+| **Production deployment** on a multi-camera CCTV stack (Frigate + custom processor) | ✅ |
+
+## v0.2 — Encoded packet sharing 📋 (planned)
+
+Main extension: the publisher delivers not only decoded NV12 frames but also
+**encoded packets** (H.264/H.265 NAL units) through a separate shared ring.
+Use cases: the Frigate `record` role (muxing to mp4 without re-encoding) and AI pipelines that
+do not need decoding, which filter by metadata and save the encoded clip.
+
+| Feature | Why |
+|---|---|
+| `cuframes_publisher_publish_packet()` C API | Publisher sends an AVPacket equivalent into the shared ring |
+| `cuframes_subscriber_next_packet()` C API | Consumer reads encoded packets |
+| Variable-length ring buffer for packets | Encoded size is variable (≠ fixed NV12) |
+| FFmpeg `cuframes_packets://` demuxer | The complement to the existing `cuframes://` |
+| Sub-stream selection (for multi-resolution streams) | A single camera's RTSP provides 2-3 substreams |
+| **Scale-cuda alternative**: software bilinear resize filter for FFmpeg builds without cuda-llvm | Patched FFmpeg on glibc-2.36 platforms (Debian 12 Frigate base) lacks cuda-llvm → scale_cuda is unavailable. The workaround is CPU scaling, which is a regression. Alternative: cuframes-side resize in the publisher (publish pre-scaled frames). |
+| **FFmpeg upstream PR**: submit `cuframesdec.c` to FFmpeg mainline | Reduces adoption overhead; FFmpeg will no longer need patching. |
+
+ETA: 1-2 weeks of focused work.
+
+## v0.3 — Bindings & Platforms 📋 (planned)
+
+| Feature | Why |
+|---|---|
+| Python bindings (pybind11) | AI/ML scripts currently have to write their own ctypes wrapper |
+| Jetson (Tegra arm64) support | Edge deployment; Frigate is also popular on Jetson |
+| Multi-GPU producer/consumer | NVIDIA IPC supports same-GPU only; a fallback through the encoded path is needed |
+| `pkg-config` `.pc` file | Downstream cmake/meson can drop ad-hoc `--extra-cflags/-ldflags` |
+| Frigate plugin POC (Python side, not FFmpeg) | Alternative path for users who do not want to patch FFmpeg |
+| Docker images in a public registry | Snapshot CI-built tarballs + multi-arch |
+
+## v1.0 — Stable ABI 📋
+
+- Stable wire protocol (minor versions add fields in reserved space)
+- Multi-GPU officially supported
+- Credentials/config via env / Docker secrets (not in config.json)
+- Comprehensive test suite (unit + integration + soak)
+- FFmpeg upstream merge accomplished
+- 2+ independent production deployments documented
+
+## Related documents
+
+- [docs/architecture.md](docs/architecture.md): cuframes IPC internals
+- [docs/integration.md](docs/integration.md): guide for downstream projects
+- [BENCHMARKS.md](BENCHMARKS.md): latency and throughput measurements
+- [CHANGELOG.md](CHANGELOG.md): release notes
+- [filter/README.md](filter/README.md): FFmpeg demuxer
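The release policy in ROADMAP.md (pre-1.0 minors may break the API, patches never break the ABI, 1.0+ keeps the ABI stable within a major) can be read as a compatibility rule between the version a consumer was built against and the version found at runtime. The Python sketch below encodes one such reading. `parse_version` and `is_compatible` are hypothetical names, and the real check behind `cuframes_protocol_version()` / `CUFRAMES_ERR_PROTOCOL` is not shown in this diff; it may well compare a single protocol integer instead.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse 'v0.1.0' or '0.1.0' into (major, minor, patch)."""
    major, minor, patch = (int(part) for part in text.lstrip("v").split("."))
    return major, minor, patch


def is_compatible(built_against: Version, runtime: Version) -> bool:
    """One reading of the roadmap's SemVer policy (an assumption, not cuframes code).

    Pre-1.0: the minor version must match exactly; patch releases are bug fixes only.
    1.0+:    same major, and the runtime minor must be at least the build minor.
    """
    if built_against[0] != runtime[0]:
        return False
    if built_against[0] == 0:
        return built_against[1] == runtime[1]
    return runtime[1] >= built_against[1]


if __name__ == "__main__":
    built = parse_version("v0.1.0")
    for candidate in ("0.1.3", "0.2.0"):
        ok = is_compatible(built, parse_version(candidate))
        print(f"built against 0.1.0, runtime {candidate}: compatible={ok}")
```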