docs(streaming): add single-server implementation plan
Backend/docs/streaming-on-web-server-plan.md
@@ -0,0 +1,94 @@
# Single-Server Streaming Implementation Plan (Prototype)

## Scope
Build live camera streaming and simultaneous recording on the current web server for low-to-moderate load testing. This is a prototype, built on the explicit assumption that it does not need to scale.

## Constraints
- Keep the existing backend as the control API (`/streams/*`, device auth, command lifecycle).
- Add media transport and recording in the same deployment for now.
- Prefer solutions that can later be split into a dedicated media service.

## Recommended Stack (Current Server)
1. SFU: `mediasoup` (Node.js SFU library).
2. TURN/STUN: `coturn` (external process/service, mandatory for NAT traversal reliability).
3. Recording worker: `ffmpeg` process consuming RTP from SFU plain transports.
4. Signaling: keep existing Socket.IO channel (`webrtc:signal`) or migrate to REST+WS messages while preserving auth.
5. Storage: keep MinIO upload path and reuse current recordings finalize flow.

## Why this stack
- `mediasoup` gives server-side fan-out (camera publishes once, multiple subscribers).
- `ffmpeg` can write MP4/HLS outputs from server-side RTP.
- `coturn` is required for real-world networks where direct ICE paths fail.
- This minimizes changes to existing route structure and DB entities.

## Candidate Library Check
- `mediasoup`: mature SFU for Node, suitable for self-hosted media routing.
- `@roamhq/wrtc` / `node-webrtc` style bindings: useful for peer/bot use-cases, but not a full SFU architecture by themselves.
- `werift`: pure TypeScript WebRTC stack; possible for custom flows, but higher implementation risk than mediasoup for production-like behavior.
- Managed alternatives (LiveKit/Twilio/Agora/100ms/Mux/Cloudflare): faster to adopt and more reliable, but outside the strict single-server, in-process scope.

## Implementation Phases

### Phase 0: Environment + Guardrails
1. Add env vars:
   - `TURN_URLS`, `TURN_USERNAME`, `TURN_CREDENTIAL`
   - `MEDIA_RECORDINGS_DIR`
   - `MEDIA_MAX_PUBLISHERS`, `MEDIA_MAX_SUBSCRIBERS_PER_ROOM`
2. Add explicit README warning that this mode is prototype-only.
3. Add metrics baseline (CPU, RAM, event loop lag, outbound bitrate, active sessions).
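
The env vars listed above could be validated once at startup. The following is a minimal sketch, not the project's actual loader: the `MediaConfig` shape, the helper names, and the assumption that `TURN_URLS` is a comma-separated list are all illustrative. Only the variable names come from this plan.

```typescript
// Hypothetical config shape; only the env var names are taken from the plan.
interface MediaConfig {
  // Mirrors the WebRTC RTCIceServer dictionary (urls, username, credential).
  iceServers: { urls: string[]; username: string; credential: string }[];
  recordingsDir: string;
  maxPublishers: number;
  maxSubscribersPerRoom: number;
}

type Env = Record<string, string | undefined>;

// Returns the trimmed value or throws if the variable is missing or empty.
function requireVar(env: Env, name: string): string {
  const value = (env[name] || "").trim();
  if (value.length === 0) {
    throw new Error(`Missing required env var ${name}`);
  }
  return value;
}

// Parses a variable that must be a positive integer (used for the caps).
function requirePositiveInt(env: Env, name: string): number {
  const raw = requireVar(env, name);
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return n;
}

function loadMediaConfig(env: Env): MediaConfig {
  // Assumption: TURN_URLS holds one or more URLs separated by commas.
  const urls = requireVar(env, "TURN_URLS")
    .split(",")
    .map((u) => u.trim())
    .filter((u) => u.length > 0);
  if (urls.length === 0) {
    throw new Error("TURN_URLS must contain at least one URL");
  }
  return {
    iceServers: [
      {
        urls,
        username: requireVar(env, "TURN_USERNAME"),
        credential: requireVar(env, "TURN_CREDENTIAL"),
      },
    ],
    recordingsDir: requireVar(env, "MEDIA_RECORDINGS_DIR"),
    maxPublishers: requirePositiveInt(env, "MEDIA_MAX_PUBLISHERS"),
    maxSubscribersPerRoom: requirePositiveInt(env, "MEDIA_MAX_SUBSCRIBERS_PER_ROOM"),
  };
}
```

In a Node process this would likely be called as `loadMediaConfig(process.env)` during boot, so a misconfigured deployment fails fast instead of failing on the first stream.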

### Phase 1: Media Plane Skeleton
1. Add `media/sfu/` module:
   - worker bootstrap
   - router lifecycle per stream session
   - transport creation helpers
2. Extend `media/types.ts` provider contracts:
   - publish transport params
   - subscribe transport params
   - producer/consumer lifecycle ops
3. Add stream session registry in memory + DB mapping (`streamSessionId -> router/producer state`).
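
The in-memory half of the session registry could look roughly like the sketch below. It is deliberately generic over the router and producer handle types so it does not assume any particular SFU API; `StreamSessionRegistry` and its method names are hypothetical, and the DB mapping is left out.

```typescript
// Hypothetical in-memory registry: streamSessionId -> router/producer state.
// R and P stand in for whatever router and producer handles the SFU exposes.
interface StreamSession<R, P> {
  streamSessionId: string;
  router: R;
  producers: Map<string, P>;
  createdAt: number; // epoch milliseconds, useful for stale-session cleanup
}

class StreamSessionRegistry<R, P> {
  private readonly sessions = new Map<string, StreamSession<R, P>>();

  // Registers a new session; a duplicate id is treated as a programming error.
  create(streamSessionId: string, router: R): StreamSession<R, P> {
    if (this.sessions.has(streamSessionId)) {
      throw new Error(`Session ${streamSessionId} already exists`);
    }
    const session: StreamSession<R, P> = {
      streamSessionId,
      router,
      producers: new Map<string, P>(),
      createdAt: Date.now(),
    };
    this.sessions.set(streamSessionId, session);
    return session;
  }

  // Attaches a producer handle to an existing session.
  addProducer(streamSessionId: string, producerId: string, producer: P): void {
    const session = this.sessions.get(streamSessionId);
    if (!session) {
      throw new Error(`Unknown session ${streamSessionId}`);
    }
    session.producers.set(producerId, producer);
  }

  get(streamSessionId: string): StreamSession<R, P> | undefined {
    return this.sessions.get(streamSessionId);
  }

  // Removes and returns the session so the caller can close its resources.
  remove(streamSessionId: string): StreamSession<R, P> | undefined {
    const session = this.sessions.get(streamSessionId);
    this.sessions.delete(streamSessionId);
    return session;
  }

  count(): number {
    return this.sessions.size;
  }
}
```

Returning the removed session from `remove` is a design choice: the caller owns closing the router and producers, which keeps the registry free of SFU-specific teardown logic.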

### Phase 2: Publish/Subscribe Handshake
1. Camera flow:
   - request publish transport params
   - connect DTLS
   - produce video track
2. Client flow:
   - request subscribe transport params
   - connect DTLS
   - consume producer track
3. Use existing device auth checks and stream ownership checks.
4. Keep `stream:started`/`stream:ended` events for UI state updates.
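
Both flows above follow the same three-step order (transport params, DTLS connect, then produce or consume). One way to enforce that order on the signaling side is a small state machine, sketched below. The state and step names are hypothetical and not part of any existing message schema.

```typescript
// Hypothetical handshake states shared by the camera and client flows.
type HandshakeState = "new" | "transport_ready" | "connected" | "active";
// "start_media" stands for "produce" on the camera side and "consume" on the client side.
type HandshakeStep = "request_transport" | "connect_dtls" | "start_media";

// The only step allowed from each state, and the state it leads to.
const NEXT_STEP: Record<HandshakeState, { step: HandshakeStep; to: HandshakeState } | null> = {
  new: { step: "request_transport", to: "transport_ready" },
  transport_ready: { step: "connect_dtls", to: "connected" },
  connected: { step: "start_media", to: "active" },
  active: null, // handshake finished; no further steps accepted
};

// Returns the next state, or throws if the step arrives out of order.
function advanceHandshake(state: HandshakeState, step: HandshakeStep): HandshakeState {
  const allowed = NEXT_STEP[state];
  if (allowed === null || allowed.step !== step) {
    throw new Error(`Step "${step}" is not allowed in state "${state}"`);
  }
  return allowed.to;
}
```

A signaling handler would presumably run the device auth and stream ownership checks first, then call `advanceHandshake` and reject the message if it throws.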

### Phase 3: Recording on Server
1. On first producer for a stream, start `ffmpeg` recording worker.
2. Record strategy:
   - start with single-track MP4 for simplicity
   - optionally add HLS segment output later
3. On `/streams/:id/end`:
   - stop recorder
   - upload result to MinIO
   - call existing recording finalize path
4. Add retry and orphan cleanup worker for interrupted recordings.
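
For the single-track MP4 strategy above, the recording worker needs an SDP file describing the incoming RTP and an `ffmpeg` argument list. The sketch below builds both as plain data. It assumes an H264 video track (so the stream can be copied into MP4 without re-encoding) and a local UDP port that the SFU forwards RTP to; the function and field names are hypothetical, and a real SDP may need extra codec parameters taken from the producer.

```typescript
// Hypothetical description of where the SFU forwards the video RTP.
interface RtpVideoSource {
  ip: string;          // address ffmpeg receives RTP on, e.g. 127.0.0.1
  port: number;        // UDP port the SFU sends RTP to
  payloadType: number; // RTP payload type negotiated for the video codec
  clockRate: number;   // 90000 for video
}

// Builds a minimal SDP for one H264 video stream (assumption: H264 only).
function buildRecorderSdp(src: RtpVideoSource): string {
  return [
    "v=0",
    `o=- 0 0 IN IP4 ${src.ip}`,
    "s=stream-recording",
    `c=IN IP4 ${src.ip}`,
    "t=0 0",
    `m=video ${src.port} RTP/AVP ${src.payloadType}`,
    `a=rtpmap:${src.payloadType} H264/${src.clockRate}`,
    "a=recvonly",
  ].join("\n") + "\n";
}

// Builds ffmpeg arguments that read the SDP and copy the video into an MP4.
function buildFfmpegArgs(sdpPath: string, outputPath: string): string[] {
  return [
    "-nostdin",                             // do not read commands from stdin
    "-protocol_whitelist", "file,udp,rtp",  // required to open an SDP input
    "-i", sdpPath,
    "-map", "0:v:0",                        // first video stream only
    "-c:v", "copy",                         // no re-encode
    "-y", outputPath,                       // overwrite an existing file
  ];
}
```

The argument list could be passed to Node's `child_process.spawn("ffmpeg", args)`. Stopping the recorder with `SIGINT` rather than `SIGKILL` gives `ffmpeg` the chance to finalize the MP4, which matters for the "stop recorder" step before upload.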

### Phase 4: Reliability + Backpressure
1. Remove JPEG `stream:frame` fallback from simulator once SFU path is stable.
2. Add connection timeout, ICE restart, and stream health checks.
3. Add admission limits per account and global concurrent stream caps.
4. Add stale session cleanup and worker crash recovery.
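
The per-account and global admission limits from item 3 can be sketched as a small counter-based controller. This is an illustrative in-memory version with hypothetical names; it does not cover persistence across restarts or the stale-session cleanup in item 4.

```typescript
// Hypothetical limits; the real values would come from configuration.
interface AdmissionLimits {
  maxGlobalStreams: number;
  maxStreamsPerAccount: number;
}

type AdmissionResult =
  | { admitted: true }
  | { admitted: false; reason: "global_cap" | "account_cap" };

class AdmissionController {
  private readonly activeByAccount = new Map<string, number>();
  private total = 0;

  constructor(private readonly limits: AdmissionLimits) {}

  // Reserves a slot for the account, or explains which cap blocked it.
  tryAdmit(accountId: string): AdmissionResult {
    if (this.total >= this.limits.maxGlobalStreams) {
      return { admitted: false, reason: "global_cap" };
    }
    const current = this.activeByAccount.get(accountId) || 0;
    if (current >= this.limits.maxStreamsPerAccount) {
      return { admitted: false, reason: "account_cap" };
    }
    this.activeByAccount.set(accountId, current + 1);
    this.total += 1;
    return { admitted: true };
  }

  // Frees a slot when a stream ends; ignores accounts with no active streams.
  release(accountId: string): void {
    const current = this.activeByAccount.get(accountId) || 0;
    if (current === 0) {
      return;
    }
    if (current === 1) {
      this.activeByAccount.delete(accountId);
    } else {
      this.activeByAccount.set(accountId, current - 1);
    }
    this.total -= 1;
  }
}
```

Checking the global cap before the account cap means a saturated server reports `global_cap` even to accounts that are also at their own limit, which is usually the more useful signal for operators.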

### Phase 5: Load Test + Exit Criteria
1. Target load test:
   - 1 publisher + N viewers per stream
   - multiple concurrent streams
2. Capture:
   - startup latency (request -> first frame)
   - packet loss behavior
   - server CPU/RAM/network saturation points
3. Define threshold to migrate to dedicated media service when limits are hit.
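
The captured metrics could be reduced to a simple pass/fail check against the migration threshold. The sketch below uses a nearest-rank percentile for startup latency and reports which limits a load-test sample breaches. The field names and every threshold value are placeholders; this plan does not define them.

```typescript
// Nearest-rank percentile: smallest sample with at least p% of values at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile needs at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new Error("p must be in the range (0, 100]");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Hypothetical shape of one load-test measurement window.
interface LoadSample {
  cpuPercent: number;
  packetLossPercent: number;
  startupLatenciesMs: number[]; // request -> first frame, one entry per viewer
}

// Hypothetical exit criteria; real values are to be decided in this phase.
interface ExitThresholds {
  maxCpuPercent: number;
  maxPacketLossPercent: number;
  maxP95StartupMs: number;
}

// Returns the names of every breached limit; an empty list means "stay on one server".
function breachedLimits(sample: LoadSample, t: ExitThresholds): string[] {
  const breached: string[] = [];
  if (sample.cpuPercent > t.maxCpuPercent) breached.push("cpu");
  if (sample.packetLossPercent > t.maxPacketLossPercent) breached.push("packet_loss");
  if (percentile(sample.startupLatenciesMs, 95) > t.maxP95StartupMs) breached.push("startup_latency_p95");
  return breached;
}
```

Using the 95th percentile instead of the mean keeps a few slow viewers from being hidden by many fast ones, which is typically what matters for perceived startup time.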

## Immediate Code Changes (Low-Risk First)
1. Add `docs` and env scaffolding for TURN and recording worker.
2. Add `media/sfu` interfaces with no routing behavior yet (feature-flagged).
3. Implement one end-to-end stream path behind a flag (`MEDIA_MODE=single_server_sfu`).
4. Deprecate frame relay fallback after validation.
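
The `MEDIA_MODE` flag from item 3 could be resolved in one place so the rest of the code branches on a typed value. In the sketch below, `single_server_sfu` comes from this plan, while the `frame_relay` value and the choice to default to it when the flag is unset are assumptions made for illustration.

```typescript
// "single_server_sfu" is the flag value named in the plan; "frame_relay" is a
// hypothetical name for the existing JPEG frame relay path.
type MediaMode = "frame_relay" | "single_server_sfu";

function resolveMediaMode(env: Record<string, string | undefined>): MediaMode {
  const raw = (env["MEDIA_MODE"] || "").trim();
  if (raw === "") {
    // Assumption: keep the existing path unless the flag is set explicitly.
    return "frame_relay";
  }
  if (raw === "frame_relay" || raw === "single_server_sfu") {
    return raw;
  }
  throw new Error(`Unsupported MEDIA_MODE "${raw}"`);
}
```

Rejecting unknown values instead of silently falling back makes a typo in the deployment config visible at startup.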