docs(streaming): add single-server implementation plan

This commit is contained in:
2026-02-07 14:50:00 +00:00
parent b9920497b1
commit 63e7700340
2 changed files with 95 additions and 0 deletions


@@ -147,6 +147,7 @@ Stream realtime events:
- This backend currently acts as a control plane (commands, session state, credentials, events), not a full media plane/SFU.
- Running live transport + fan-out + recording on the same web server is possible for small loads but introduces significant CPU, RAM, and network egress pressure under concurrency.
- For larger deployments, use a dedicated media plane (managed or self-hosted SFU + recorder) and keep this service focused on auth/session/control APIs.
- For a pragmatic prototype path that keeps media on the current server, see `docs/streaming-on-web-server-plan.md`.
### API Docs
OpenAPI docs are generated from Zod/OpenAPI definitions:


@@ -0,0 +1,94 @@
# Single-Server Streaming Implementation Plan (Prototype)
## Scope
Build live camera streaming and simultaneous recording on the current web server for low-to-moderate load testing, with explicit non-scale assumptions.
## Constraints
- Keep existing backend as the control API (`/streams/*`, device auth, command lifecycle).
- Add media transport and recording in the same deployment for now.
- Prefer solutions that can later be split into a dedicated media service.
## Recommended Stack (Current Server)
1. SFU: `mediasoup` (Node.js SFU library).
2. TURN/STUN: `coturn` (external process/service, mandatory for NAT traversal reliability).
3. Recording worker: `ffmpeg` process consuming RTP from SFU plain transports.
4. Signaling: keep existing Socket.IO channel (`webrtc:signal`) or migrate to REST+WS messages while preserving auth.
5. Storage: keep MinIO upload path and reuse current recordings finalize flow.
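Since `coturn` runs as an external service, the control API only needs to hand each peer an ICE server list derived from the TURN env vars. A minimal sketch, assuming the `TURN_URLS`/`TURN_USERNAME`/`TURN_CREDENTIAL` variables from Phase 0 (the fallback STUN server and helper name are illustrative, not from the existing codebase):

```typescript
// Hypothetical helper: builds the iceServers list handed to publishers and
// viewers from TURN_* env vars. TURN_URLS is assumed comma-separated.
interface IceServer {
  urls: string[];
  username?: string;
  credential?: string;
}

function buildIceServers(env: Record<string, string | undefined>): IceServer[] {
  // Public STUN as a baseline; direct paths still fail behind symmetric NAT,
  // which is why the TURN entry below is the one that matters in practice.
  const servers: IceServer[] = [{ urls: ["stun:stun.l.google.com:19302"] }];
  const turnUrls = (env.TURN_URLS ?? "")
    .split(",")
    .map((u) => u.trim())
    .filter(Boolean);
  if (turnUrls.length > 0) {
    servers.push({
      urls: turnUrls,
      username: env.TURN_USERNAME,
      credential: env.TURN_CREDENTIAL,
    });
  }
  return servers;
}
```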
## Why this stack
- `mediasoup` gives server-side fan-out (camera publishes once, multiple subscribers).
- `ffmpeg` can write MP4/HLS outputs from server-side RTP.
- `coturn` is required for real-world networks where direct ICE paths fail.
- This minimizes changes to existing route structure and DB entities.
## Candidate Library Check
- `mediasoup`: mature SFU for Node, suitable for self-hosted media routing.
- `@roamhq/wrtc` / `node-webrtc` style bindings: useful for peer/bot use-cases, but not a full SFU architecture by itself.
- `werift`: pure TypeScript WebRTC stack; possible for custom flows, but higher implementation risk than mediasoup for production-like behavior.
- Managed alternatives (LiveKit/Twilio/Agora/100ms/Mux/Cloudflare): faster and more reliable, but outside strict single-server-in-process scope.
## Implementation Phases
### Phase 0: Environment + Guardrails
1. Add env vars:
- `TURN_URLS`, `TURN_USERNAME`, `TURN_CREDENTIAL`
- `MEDIA_RECORDINGS_DIR`
- `MEDIA_MAX_PUBLISHERS`, `MEDIA_MAX_SUBSCRIBERS_PER_ROOM`
2. Add explicit README warning that this mode is prototype-only.
3. Add metrics baseline (CPU, RAM, event loop lag, outbound bitrate, active sessions).
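The guardrail env vars above can be parsed with safe defaults so a missing or malformed value never disables the caps. A sketch, assuming hypothetical default values (4 publishers, 16 subscribers per room) that are not taken from the existing codebase:

```typescript
// Hypothetical Phase 0 guardrail config. Defaults and names are assumptions.
interface MediaLimits {
  maxPublishers: number;
  maxSubscribersPerRoom: number;
}

function loadMediaLimits(env: Record<string, string | undefined>): MediaLimits {
  // Fall back to the default whenever the value is absent, non-numeric,
  // or non-positive, so limits can never be accidentally switched off.
  const parse = (raw: string | undefined, fallback: number): number => {
    const n = Number(raw);
    return Number.isInteger(n) && n > 0 ? n : fallback;
  };
  return {
    maxPublishers: parse(env.MEDIA_MAX_PUBLISHERS, 4),
    maxSubscribersPerRoom: parse(env.MEDIA_MAX_SUBSCRIBERS_PER_ROOM, 16),
  };
}
```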
### Phase 1: Media Plane Skeleton
1. Add `media/sfu/` module:
- worker bootstrap
- router lifecycle per stream session
- transport creation helpers
2. Extend `media/types.ts` provider contracts:
- publish transport params
- subscribe transport params
- producer/consumer lifecycle ops
3. Add stream session registry in memory + DB mapping (`streamSessionId -> router/producer state`).
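The in-memory side of the session registry can be sketched as a small class; the real version would hold mediasoup router/producer handles and mirror the mapping to the DB. All names here are illustrative:

```typescript
// Minimal in-memory registry sketch for Phase 1. Router and producer handles
// are tracked by id only; real code would hold mediasoup objects.
interface StreamSessionState {
  streamSessionId: string;
  routerId: string;
  producerIds: Set<string>;
}

class StreamSessionRegistry {
  private sessions = new Map<string, StreamSessionState>();

  open(streamSessionId: string, routerId: string): StreamSessionState {
    const state: StreamSessionState = {
      streamSessionId,
      routerId,
      producerIds: new Set(),
    };
    this.sessions.set(streamSessionId, state);
    return state;
  }

  addProducer(streamSessionId: string, producerId: string): void {
    const s = this.sessions.get(streamSessionId);
    if (!s) throw new Error(`unknown stream session: ${streamSessionId}`);
    s.producerIds.add(producerId);
  }

  get(streamSessionId: string): StreamSessionState | undefined {
    return this.sessions.get(streamSessionId);
  }

  // Returns false if the session was already gone, which callers can use
  // to make end-of-stream handling idempotent.
  close(streamSessionId: string): boolean {
    return this.sessions.delete(streamSessionId);
  }
}
```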
### Phase 2: Publish/Subscribe Handshake
1. Camera flow:
- request publish transport params
- connect DTLS
- produce video track
2. Client flow:
- request subscribe transport params
- connect DTLS
- consume producer track
3. Use existing device auth checks and stream ownership checks.
4. Keep `stream:started`/`stream:ended` events for UI state updates.
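Because signaling rides an async Socket.IO channel, messages can arrive out of order; modeling the publish handshake as a tiny state machine lets the server reject bad sequences early. The states and message names below are assumptions for illustration, not the existing `webrtc:signal` payloads:

```typescript
// Sketch: publish-side handshake ordering (create transport -> connect DTLS
// -> produce). Out-of-order signals throw instead of corrupting state.
type PublishState = "idle" | "transport_created" | "connected" | "producing";

type PublishSignal =
  | { type: "create_transport" }
  | { type: "connect"; dtlsParameters: unknown }
  | { type: "produce"; kind: "video" | "audio" };

function nextPublishState(state: PublishState, signal: PublishSignal): PublishState {
  switch (signal.type) {
    case "create_transport":
      if (state !== "idle") throw new Error(`unexpected create_transport in ${state}`);
      return "transport_created";
    case "connect":
      if (state !== "transport_created") throw new Error(`unexpected connect in ${state}`);
      return "connected";
    case "produce":
      // Allowed repeatedly so a camera can add an audio track later.
      if (state !== "connected" && state !== "producing") {
        throw new Error(`unexpected produce in ${state}`);
      }
      return "producing";
  }
}
```

The subscribe flow would mirror this with `consume` in place of `produce`.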
### Phase 3: Recording on Server
1. On first producer for a stream, start `ffmpeg` recording worker.
2. Record strategy:
- start with single-track MP4 for simplicity
- optionally add HLS segment output later
3. On `/streams/:id/end`:
- stop recorder
- upload result to MinIO
- call existing recording finalize path
4. Add retry and orphan cleanup worker for interrupted recordings.
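The single-track MP4 strategy typically follows the mediasoup PlainTransport + ffmpeg pattern: the SFU forwards RTP to a local port, an SDP file describes that stream, and ffmpeg remuxes it without transcoding. A hedged sketch of the argument list the recording worker might spawn (paths are illustrative):

```typescript
// Hypothetical builder for the ffmpeg argument list used by the recording
// worker. Assumes a local SDP file describing plain RTP from the SFU.
function buildRecorderArgs(sdpPath: string, outputPath: string): string[] {
  return [
    "-protocol_whitelist", "file,udp,rtp", // must precede -i to allow SDP + RTP input
    "-i", sdpPath,                         // SDP describing the plain RTP stream
    "-c:v", "copy",                        // remux without transcoding to keep CPU low
    "-movflags", "+faststart",             // move moov atom up front at finalize
    outputPath,
  ];
}
```

The worker would pass this array to `child_process.spawn("ffmpeg", args)` and treat process exit as the trigger for the MinIO upload and finalize call.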
### Phase 4: Reliability + Backpressure
1. Remove JPEG `stream:frame` fallback from simulator once SFU path is stable.
2. Add connection timeout, ICE restart, and stream health checks.
3. Add admission limits per account and global concurrent stream caps.
4. Add stale session cleanup and worker crash recovery.
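The admission limits in step 3 can be checked before any transport is created, reusing the Phase 0 caps. A sketch with assumed names; counts would come from the session registry:

```typescript
// Admission-limit sketch for Phase 4: reject before allocating SFU resources.
interface AdmissionCounts {
  activePublishers: number;
  subscribersInRoom: number;
}

interface AdmissionLimits {
  maxPublishers: number;
  maxSubscribersPerRoom: number;
}

function admit(
  role: "publisher" | "subscriber",
  counts: AdmissionCounts,
  limits: AdmissionLimits,
): { ok: true } | { ok: false; reason: string } {
  if (role === "publisher" && counts.activePublishers >= limits.maxPublishers) {
    return { ok: false, reason: "global publisher cap reached" };
  }
  if (role === "subscriber" && counts.subscribersInRoom >= limits.maxSubscribersPerRoom) {
    return { ok: false, reason: "room subscriber cap reached" };
  }
  return { ok: true };
}
```

Returning a reason string (rather than throwing) lets the signaling layer relay a clean rejection to the client instead of a generic error.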
### Phase 5: Load Test + Exit Criteria
1. Target load test:
- 1 publisher + N viewers per stream
- multiple concurrent streams
2. Capture:
- startup latency (request -> first frame)
- packet loss behavior
- server CPU/RAM/network saturation points
3. Define threshold to migrate to dedicated media service when limits are hit.
## Immediate Code Changes (Low-Risk First)
1. Add `docs` and env scaffolding for TURN and recording worker.
2. Add `media/sfu` interfaces with no routing behavior yet (feature-flagged).
3. Implement one end-to-end stream path behind a flag (`MEDIA_MODE=single_server_sfu`).
4. Deprecate frame relay fallback after validation.