docs(streaming): add single-server implementation plan
@@ -147,6 +147,7 @@ Stream realtime events:
- This backend currently acts as a control plane (commands, session state, credentials, events), not a full media plane/SFU.
- Running live transport + fan-out + recording on the same web server is possible for small loads but introduces significant CPU, RAM, and network egress pressure under concurrency.
- For larger deployments, use a dedicated media plane (managed or self-hosted SFU + recorder) and keep this service focused on auth/session/control APIs.
- For a pragmatic prototype path that keeps media on the current server, see `docs/streaming-on-web-server-plan.md`.
### API Docs
OpenAPI docs are generated from Zod/OpenAPI definitions:
94
Backend/docs/streaming-on-web-server-plan.md
Normal file
@@ -0,0 +1,94 @@
# Single-Server Streaming Implementation Plan (Prototype)

## Scope

Build live camera streaming with simultaneous recording on the current web server, for low-to-moderate load testing only. This setup is explicitly not expected to scale.

## Constraints

- Keep existing backend as the control API (`/streams/*`, device auth, command lifecycle).
- Add media transport and recording in the same deployment for now.
- Prefer solutions that can later be split into a dedicated media service.

## Recommended Stack (Current Server)

1. SFU: `mediasoup` (Node.js SFU library).
2. TURN/STUN: `coturn` (external process/service, mandatory for NAT traversal reliability).
3. Recording worker: `ffmpeg` process consuming RTP from SFU plain transports.
4. Signaling: keep existing Socket.IO channel (`webrtc:signal`) or migrate to REST+WS messages while preserving auth.
5. Storage: keep MinIO upload path and reuse current recordings finalize flow.

## Why this stack

- `mediasoup` gives server-side fan-out (camera publishes once, multiple subscribers).
- `ffmpeg` can write MP4/HLS outputs from server-side RTP.
- `coturn` is required for real-world networks where direct ICE paths fail.
- This minimizes changes to existing route structure and DB entities.
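
To make the fan-out point concrete, here is a rough back-of-the-envelope sketch of what "publish once, many subscribers" means for server bandwidth. The function name and the `LoadShape` fields are hypothetical, and the figures ignore recording traffic, RTCP, and retransmissions, so treat the result as a lower bound rather than a measurement.

```typescript
// Hypothetical load description; none of these names come from the codebase.
interface LoadShape {
  streams: number;          // concurrent camera streams
  viewersPerStream: number; // subscribers attached to each stream
  bitrateMbps: number;      // assumed average video bitrate per stream
}

// With an SFU, each camera uploads once (ingress), and the server forwards
// one copy per viewer (egress). Egress therefore grows with the viewer count.
function estimateBandwidthMbps(load: LoadShape): { ingressMbps: number; egressMbps: number } {
  const ingressMbps = load.streams * load.bitrateMbps;
  const egressMbps = load.streams * load.viewersPerStream * load.bitrateMbps;
  return { ingressMbps, egressMbps };
}

// Example: 5 streams, 10 viewers each, 2 Mbps per stream.
const estimate = estimateBandwidthMbps({ streams: 5, viewersPerStream: 10, bitrateMbps: 2 });
console.log(estimate); // { ingressMbps: 10, egressMbps: 100 }
```

Egress, not ingress, is usually the first network limit a single server reaches under this model.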

## Candidate Library Check

- `mediasoup`: mature SFU for Node, suitable for self-hosted media routing.
- `@roamhq/wrtc` / `node-webrtc` style bindings: useful for peer/bot use-cases, but not a full SFU architecture by themselves.
- `werift`: pure TypeScript WebRTC stack; possible for custom flows, but higher implementation risk than mediasoup for production-like behavior.
- Managed alternatives (LiveKit/Twilio/Agora/100ms/Mux/Cloudflare): faster and more reliable, but outside strict single-server-in-process scope.

## Implementation Phases

### Phase 0: Environment + Guardrails

1. Add env vars:
   - `TURN_URLS`, `TURN_USERNAME`, `TURN_CREDENTIAL`
   - `MEDIA_RECORDINGS_DIR`
   - `MEDIA_MAX_PUBLISHERS`, `MEDIA_MAX_SUBSCRIBERS_PER_ROOM`
2. Add explicit README warning that this mode is prototype-only.
3. Add metrics baseline (CPU, RAM, event loop lag, outbound bitrate, active sessions).
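
One possible way to validate the Phase 0 env vars at startup is sketched below. It assumes `TURN_URLS` is a comma-separated list and that both limits are required positive integers; neither is specified by this plan. `loadMediaConfig`, `MediaConfig`, and the helper names are hypothetical. The `iceServers` entries follow the `urls`/`username`/`credential` shape of the standard WebRTC `RTCIceServer` dictionary.

```typescript
// Env is passed in explicitly so the loader can be tested without process.env.
type Env = Record<string, string | undefined>;

interface IceServer {
  urls: string[];
  username: string;
  credential: string;
}

interface MediaConfig {
  iceServers: IceServer[];
  recordingsDir: string;
  maxPublishers: number;
  maxSubscribersPerRoom: number;
}

function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required env var: ${name}`);
  }
  return value.trim();
}

function requirePositiveInt(env: Env, name: string): number {
  const parsed = Number(requireVar(env, name));
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`${name} must be a positive integer`);
  }
  return parsed;
}

function loadMediaConfig(env: Env): MediaConfig {
  // Assumption: multiple TURN URLs are separated by commas.
  const urls = requireVar(env, "TURN_URLS")
    .split(",")
    .map((url) => url.trim())
    .filter((url) => url.length > 0);

  return {
    iceServers: [
      {
        urls,
        username: requireVar(env, "TURN_USERNAME"),
        credential: requireVar(env, "TURN_CREDENTIAL"),
      },
    ],
    recordingsDir: requireVar(env, "MEDIA_RECORDINGS_DIR"),
    maxPublishers: requirePositiveInt(env, "MEDIA_MAX_PUBLISHERS"),
    maxSubscribersPerRoom: requirePositiveInt(env, "MEDIA_MAX_SUBSCRIBERS_PER_ROOM"),
  };
}
```

Failing fast on a missing or malformed variable keeps a misconfigured prototype from starting with silently unlimited publishers.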

### Phase 1: Media Plane Skeleton

1. Add `media/sfu/` module:
   - worker bootstrap
   - router lifecycle per stream session
   - transport creation helpers
2. Extend `media/types.ts` provider contracts:
   - publish transport params
   - subscribe transport params
   - producer/consumer lifecycle ops
3. Add stream session registry in memory + DB mapping (`streamSessionId -> router/producer state`).
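
The in-memory half of the session registry could look roughly like the sketch below. It tracks only opaque IDs, so it does not depend on `mediasoup` types. The class, its methods, and the field names are hypothetical, and the DB mapping is left out.

```typescript
// Hypothetical in-memory record for one stream session.
interface StreamSessionEntry {
  streamSessionId: string;
  routerId: string;
  producerId: string | null; // null until the camera starts producing
  consumerIds: Set<string>;
}

class StreamSessionRegistry {
  private readonly sessions = new Map<string, StreamSessionEntry>();

  create(streamSessionId: string, routerId: string): StreamSessionEntry {
    if (this.sessions.has(streamSessionId)) {
      throw new Error(`Session already registered: ${streamSessionId}`);
    }
    const entry: StreamSessionEntry = {
      streamSessionId,
      routerId,
      producerId: null,
      consumerIds: new Set<string>(),
    };
    this.sessions.set(streamSessionId, entry);
    return entry;
  }

  get(streamSessionId: string): StreamSessionEntry | undefined {
    return this.sessions.get(streamSessionId);
  }

  setProducer(streamSessionId: string, producerId: string): void {
    this.mustGet(streamSessionId).producerId = producerId;
  }

  addConsumer(streamSessionId: string, consumerId: string): void {
    this.mustGet(streamSessionId).consumerIds.add(consumerId);
  }

  // Returns true if a session was removed; the caller closes the router itself.
  remove(streamSessionId: string): boolean {
    return this.sessions.delete(streamSessionId);
  }

  private mustGet(streamSessionId: string): StreamSessionEntry {
    const entry = this.sessions.get(streamSessionId);
    if (entry === undefined) {
      throw new Error(`Unknown stream session: ${streamSessionId}`);
    }
    return entry;
  }
}
```

Keeping the registry free of media-library types should also make it easier to move behind a dedicated media service later, in line with the constraints above.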

### Phase 2: Publish/Subscribe Handshake

1. Camera flow:
   - request publish transport params
   - connect DTLS
   - produce video track
2. Client flow:
   - request subscribe transport params
   - connect DTLS
   - consume producer track
3. Use existing device auth checks and stream ownership checks.
4. Keep `stream:started`/`stream:ended` events for UI state updates.
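
Both the camera and client flows follow the same three-step order, which a small state machine can enforce on the signaling side. The sketch below is illustrative only: the state and step names are hypothetical, and `startMedia` stands in for "produce" on the camera side and "consume" on the client side.

```typescript
type HandshakeState = "idle" | "transport_ready" | "dtls_connected" | "media_flowing";
type HandshakeStep = "requestTransportParams" | "connectDtls" | "startMedia";

// Allowed transitions: each state accepts exactly one next step.
const transitions: Record<HandshakeState, Partial<Record<HandshakeStep, HandshakeState>>> = {
  idle: { requestTransportParams: "transport_ready" },
  transport_ready: { connectDtls: "dtls_connected" },
  dtls_connected: { startMedia: "media_flowing" },
  media_flowing: {},
};

// Returns the next state, or throws if the step arrives out of order.
function advance(state: HandshakeState, step: HandshakeStep): HandshakeState {
  const next = transitions[state][step];
  if (next === undefined) {
    throw new Error(`Step "${step}" is not allowed in state "${state}"`);
  }
  return next;
}

// Example: the happy path for either a camera or a viewer.
let state: HandshakeState = "idle";
state = advance(state, "requestTransportParams");
state = advance(state, "connectDtls");
state = advance(state, "startMedia");
console.log(state); // "media_flowing"
```

Auth and ownership checks would run before each `advance` call, so a rejected peer never leaves `idle`.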

### Phase 3: Recording on Server

1. On first producer for a stream, start `ffmpeg` recording worker.
2. Record strategy:
   - start with single-track MP4 for simplicity
   - optionally add HLS segment output later
3. On `/streams/:id/end`:
   - stop recorder
   - upload result to MinIO
   - call existing recording finalize path
4. Add retry and orphan cleanup worker for interrupted recordings.
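
As a rough illustration of the single-track MP4 strategy, the sketch below builds an SDP description for one incoming RTP video stream and the matching `ffmpeg` argument list. It assumes H.264 on payload type 96 arriving on a local UDP port. The codec, port, paths, and function names are all assumptions, not decisions made by this plan. Depending on the encoder, the SDP may also need an `a=fmtp` line, which is omitted here.

```typescript
// Hypothetical description of where the SFU forwards RTP for recording.
interface RecorderInput {
  rtpPort: number;     // local UDP port ffmpeg listens on
  payloadType: number; // dynamic RTP payload type, e.g. 96
}

// Minimal SDP for one H.264 video stream sent to 127.0.0.1.
function buildRecorderSdp(input: RecorderInput): string {
  return [
    "v=0",
    "o=- 0 0 IN IP4 127.0.0.1",
    "s=recording",
    "c=IN IP4 127.0.0.1",
    "t=0 0",
    `m=video ${input.rtpPort} RTP/AVP ${input.payloadType}`,
    `a=rtpmap:${input.payloadType} H264/90000`,
    "",
  ].join("\r\n");
}

// Arguments for: read the SDP file, copy the video stream into an MP4 file.
function buildFfmpegArgs(sdpPath: string, outputPath: string): string[] {
  return [
    "-protocol_whitelist", "file,udp,rtp", // needed to read RTP via a local SDP file
    "-i", sdpPath,
    "-c:v", "copy",                        // no re-encode, keeps CPU cost low
    "-y", outputPath,
  ];
}

const sdp = buildRecorderSdp({ rtpPort: 5004, payloadType: 96 });
const args = buildFfmpegArgs("/tmp/stream-1.sdp", "/tmp/stream-1.mp4");
console.log(sdp);
console.log(["ffmpeg", ...args].join(" "));
```

When stopping the recorder, it is worth shutting `ffmpeg` down gracefully (for example with an interrupt signal rather than a hard kill) so that the MP4 file is finalized before upload. An interrupted recording with an unfinished file is one of the cases the orphan cleanup worker would need to handle.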

### Phase 4: Reliability + Backpressure

1. Remove JPEG `stream:frame` fallback from simulator once SFU path is stable.
2. Add connection timeout, ICE restart, and stream health checks.
3. Add admission limits per account and global concurrent stream caps.
4. Add stale session cleanup and worker crash recovery.
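
The admission limits in step 3 can be sketched as a small in-memory counter that checks the global cap first and the per-account cap second. The class, its method names, and the limit fields are hypothetical. A real version would also need to release slots when stale sessions are cleaned up.

```typescript
interface AdmissionLimits {
  maxGlobalStreams: number;
  maxStreamsPerAccount: number;
}

type AdmissionResult =
  | { admitted: true }
  | { admitted: false; reason: "global_cap" | "account_cap" };

class AdmissionController {
  private readonly limits: AdmissionLimits;
  private readonly perAccount = new Map<string, number>();
  private globalCount = 0;

  constructor(limits: AdmissionLimits) {
    this.limits = limits;
  }

  // Reserves a slot for the account, or explains which cap rejected it.
  tryAdmit(accountId: string): AdmissionResult {
    if (this.globalCount >= this.limits.maxGlobalStreams) {
      return { admitted: false, reason: "global_cap" };
    }
    const current = this.perAccount.get(accountId) ?? 0;
    if (current >= this.limits.maxStreamsPerAccount) {
      return { admitted: false, reason: "account_cap" };
    }
    this.perAccount.set(accountId, current + 1);
    this.globalCount += 1;
    return { admitted: true };
  }

  // Frees one slot; a release for an account with no slots is ignored.
  release(accountId: string): void {
    const current = this.perAccount.get(accountId) ?? 0;
    if (current === 0) {
      return;
    }
    if (current === 1) {
      this.perAccount.delete(accountId);
    } else {
      this.perAccount.set(accountId, current - 1);
    }
    this.globalCount -= 1;
  }
}
```

Calling `tryAdmit` before creating any transport means a rejected request costs almost nothing, which matters most when the server is already saturated.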

### Phase 5: Load Test + Exit Criteria

1. Target load test:
   - 1 publisher + N viewers per stream
   - multiple concurrent streams
2. Capture:
   - startup latency (request -> first frame)
   - packet loss behavior
   - server CPU/RAM/network saturation points
3. Define threshold to migrate to dedicated media service when limits are hit.
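
For the startup-latency measurement, one simple way to summarize a load-test run is a nearest-rank percentile over the collected samples. The sketch below assumes each sample is the time from request to first frame in milliseconds. The function and field names are hypothetical, and the sample values are made up for illustration.

```typescript
// One observation from the load test; timestamps in milliseconds.
interface StartupSample {
  requestedAtMs: number;
  firstFrameAtMs: number;
}

function startupLatencyMs(sample: StartupSample): number {
  return sample.firstFrameAtMs - sample.requestedAtMs;
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(valuesMs: number[], p: number): number {
  if (valuesMs.length === 0) {
    throw new Error("percentile requires at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new Error("p must be in the range (0, 100]");
  }
  const sorted = [...valuesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const value = sorted[rank - 1];
  if (value === undefined) {
    throw new Error("rank out of range");
  }
  return value;
}

// Illustrative samples only, not measured data.
const latencies = [
  { requestedAtMs: 0, firstFrameAtMs: 500 },
  { requestedAtMs: 0, firstFrameAtMs: 100 },
  { requestedAtMs: 0, firstFrameAtMs: 300 },
  { requestedAtMs: 0, firstFrameAtMs: 200 },
  { requestedAtMs: 0, firstFrameAtMs: 400 },
].map(startupLatencyMs);

console.log(percentile(latencies, 50)); // 300
console.log(percentile(latencies, 95)); // 500
```

A migration threshold could then be phrased in these terms, for instance as a p95 startup latency limit at a given viewer count, with the actual numbers chosen after the first test run.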

## Immediate Code Changes (Low-Risk First)

1. Add `docs` and env scaffolding for TURN and recording worker.
2. Add `media/sfu` interfaces with no routing behavior yet (feature-flagged).
3. Implement one end-to-end stream path behind a flag (`MEDIA_MODE=single_server_sfu`).
4. Deprecate frame relay fallback after validation.
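
The `MEDIA_MODE` flag could be resolved with a helper along these lines. Only the `single_server_sfu` value comes from this plan. The `frame_relay` name for the existing fallback path and the `resolveMediaMode` function are hypothetical placeholders.

```typescript
// "frame_relay" is a made-up label for the current JPEG frame relay path.
type MediaMode = "frame_relay" | "single_server_sfu";

// Any value other than the exact SFU flag keeps the existing behavior,
// so a typo in the env var cannot accidentally enable the new path.
function resolveMediaMode(env: Record<string, string | undefined>): MediaMode {
  return env["MEDIA_MODE"] === "single_server_sfu" ? "single_server_sfu" : "frame_relay";
}

console.log(resolveMediaMode({ MEDIA_MODE: "single_server_sfu" })); // "single_server_sfu"
console.log(resolveMediaMode({}));                                   // "frame_relay"
```

Defaulting to the existing path keeps the flag low-risk until the SFU path has been validated and the fallback is deprecated.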