1) System Context
Core Role
The backend acts primarily as a control plane for auth, command routing, stream state, credential issuance, and recording metadata. It is not yet a full production media plane.
Transport Split
HTTP handles CRUD/state endpoints; Socket.IO handles realtime command delivery, acknowledgements, and WebRTC signaling relay.
Persistence Split
Postgres stores state + metadata. MinIO stores binary objects. Routes often coordinate both.
2) Startup Sequence
If MinIO initialization fails, the process exits with code 1 by design (index.ts).
3) HTTP Request Pipeline (Express)
4) Authentication and Identity Model
A) Session auth (requireAuth)
- Used by user-facing REST routes like /videos, /devices/register, /device-links.
- Reads Better Auth session from request headers/cookies via auth.api.getSession().
- Attaches session object to req.auth.
- Backed by Better Auth tables: users, account, session, verification.
B) Device auth (requireDeviceAuth)
- Used by device-to-backend routes and Socket.IO auth.
- Bearer token format: base64url(payload).hmac.
- Payload fields: userId, deviceId, role, exp.
- Signed with HMAC-SHA256 using BETTER_AUTH_SECRET.
- The token's role is verified against the device's role during the realtime handshake.
This dual model separates user session identity from per-device identity and permissions.
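The device token format described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of utils/device-token.ts: the function names `signDeviceToken` and `verifyDeviceToken` are hypothetical, the signature is assumed to be base64url-encoded, `exp` is assumed to be Unix seconds, and the role values are assumed to be `camera` and `client`.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Payload fields as listed above. Role values and the unit of exp are assumptions.
interface DeviceTokenPayload {
  userId: string;
  deviceId: string;
  role: "camera" | "client";
  exp: number; // expiry, assumed to be Unix seconds
}

// HMAC-SHA256 over the encoded payload; base64url output encoding is an assumption.
function hmacSignature(data: string, secret: string): string {
  return createHmac("sha256", secret).update(data).digest("base64url");
}

// Token layout: base64url(JSON payload) + "." + signature.
function signDeviceToken(payload: DeviceTokenPayload, secret: string): string {
  const body = Buffer.from(JSON.stringify(payload), "utf8").toString("base64url");
  return `${body}.${hmacSignature(body, secret)}`;
}

// Returns the payload if the signature matches and the token has not expired, else null.
function verifyDeviceToken(
  token: string,
  secret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): DeviceTokenPayload | null {
  const dot = token.indexOf(".");
  if (dot <= 0 || dot !== token.lastIndexOf(".")) return null;
  const body = token.slice(0, dot);
  const signature = token.slice(dot + 1);

  const expected = Buffer.from(hmacSignature(body, secret));
  const actual = Buffer.from(signature);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return null;

  try {
    const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as DeviceTokenPayload;
    return typeof payload.exp === "number" && payload.exp > nowSeconds ? payload : null;
  } catch {
    return null; // payload was not valid JSON
  }
}
```

In this sketch the signature is checked before the payload is parsed, so untrusted JSON is never decoded unless the HMAC matches.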
5) Realtime Gateway (Socket.IO)
Connection model
- Devices authenticate with token in handshake.auth.token or Authorization header.
- Each device joins room device:{deviceId}.
- Presence updates devices.status + lastSeenAt.
- Disconnect applies a 500ms delay to reduce status flapping on fast reconnect.
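The delayed-disconnect behavior above can be illustrated with a small debouncer. This is a sketch under assumptions rather than the gateway's actual code: the class name `PresenceDebouncer`, the `markOffline` callback, and the injectable `Timers` abstraction are all hypothetical and exist only so the logic can be exercised without real timers or a database.

```typescript
// Minimal timer abstraction (hypothetical) so the logic can be driven without real time.
interface Timers {
  set: (fn: () => void, ms: number) => unknown;
  clear: (handle: unknown) => void;
}

const realTimers: Timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

class PresenceDebouncer {
  private readonly pending = new Map<string, unknown>();
  private readonly markOffline: (deviceId: string) => void;
  private readonly delayMs: number;
  private readonly timers: Timers;

  constructor(markOffline: (deviceId: string) => void, delayMs = 500, timers: Timers = realTimers) {
    this.markOffline = markOffline;
    this.delayMs = delayMs;
    this.timers = timers;
  }

  // On socket disconnect: schedule the offline write instead of performing it immediately.
  onDisconnect(deviceId: string): void {
    this.cancel(deviceId);
    const handle = this.timers.set(() => {
      this.pending.delete(deviceId);
      this.markOffline(deviceId);
    }, this.delayMs);
    this.pending.set(deviceId, handle);
  }

  // On socket connect: a reconnect inside the delay window cancels the pending offline write.
  onConnect(deviceId: string): void {
    this.cancel(deviceId);
  }

  private cancel(deviceId: string): void {
    if (this.pending.has(deviceId)) {
      this.timers.clear(this.pending.get(deviceId));
      this.pending.delete(deviceId);
    }
  }
}
```

A device that drops and reconnects within the window never has its status written as offline, which is the flapping the delay is meant to avoid.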
Gateway responsibilities
- command:received delivery to target room.
- command:ack validation + DB update + source notification.
- webrtc:signal relay with stream-participant validation.
- stream:requested, stream:started, and stream:ended lifecycle fan-out.
- Legacy command retries remain only for non-stream commands while SIMPLE_STREAMING is enabled.
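As an illustration of the room naming and the stream-participant check on webrtc:signal relay, the following sketch resolves which room a signal may be forwarded to. The row shape and the column names `cameraDeviceId` and `clientDeviceId` are assumptions, and `signalTargetRoom` is a hypothetical helper, not a function from realtime/gateway.ts.

```typescript
// Hypothetical subset of a stream session row; the column names are assumptions.
interface StreamSessionRow {
  id: string;
  cameraDeviceId: string;
  clientDeviceId: string;
}

// Room naming as described above: device:{deviceId}.
const deviceRoom = (deviceId: string): string => `device:${deviceId}`;

// Returns the room of the other participant, or null if the sender is not part of the stream.
function signalTargetRoom(stream: StreamSessionRow, senderDeviceId: string): string | null {
  if (senderDeviceId === stream.cameraDeviceId) return deviceRoom(stream.clientDeviceId);
  if (senderDeviceId === stream.clientDeviceId) return deviceRoom(stream.cameraDeviceId);
  return null; // not a participant: the signal should be dropped
}
```

A gateway handler would then emit the signal only when the result is non-null, so a device cannot inject signaling into a stream it does not belong to.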
Command dispatch and ack sequence
6) Stream Lifecycle and Media Control
State machine (stream session)
On-demand stream end-to-end sequence
Media provider abstraction
- Current provider: mock (media/providers/mock.ts).
- Creates deterministic mock media session IDs.
- Issues signed publish/subscribe tokens with TTL.
- Uses BETTER_AUTH_SECRET for HMAC signing.
SFU mode status
- MEDIA_MODE=single_server_sfu enables SFU endpoints.
- Current implementation is a no-op scaffold with an in-memory session registry and synthetic transport IDs.
- No full server-side RTP forwarding pipeline is implemented yet.
7) Data Model (Core Tables and Relationships)
Note: notifications table exists for event notification tracking; push delivery queue is modeled separately by push_notifications.
8) Route Surface and Responsibilities
| Area | Auth | Main tables/resources | Side effects |
|---|---|---|---|
| /api/auth/* | Better Auth | users, account, session, verification | session cookie lifecycle |
| /devices | session + device token for heartbeat | devices, device_links | auto-link opposite-role devices on register; stale-status projection on list |
| /device-links | session | device_links, devices | enforces camera/client role pairing |
| /commands | session + device token ack fallback | device_commands, devices, device_links | dispatch to Socket.IO, ack/reject propagation |
| /events | device token (start/end), session (list) | events, device_links, notifications | realtime motion fanout, push fallback, audit log |
| /streams | device token | stream_sessions, device_commands, recordings | stream command dispatch, media credentials, stream realtime events, push fallback, optional SFU calls |
| /recordings | device token | recordings | storage object validation, presigned download URL, audit log |
| /videos | session | videos, devices, MinIO | presigned PUT/GET generation, object listing/deletion |
| /push-notifications | device token | push_notifications | manual worker dispatch trigger |
| /audit | device token | audit_logs | none (read only) |
| /ops | none | DB, MinIO, in-memory metrics, SFU service | readiness checks, metric export |
| /admin | HTTP Basic auth | MinIO | embedded admin UI + object operations |
Detailed endpoint groups
Devices and Links
- POST /devices/register: creates device, sets initial online status, auto-creates links with existing opposite-role devices, returns device token.
- GET /devices: lists user devices with computed effective presence status using DEVICE_ONLINE_STALE_SECONDS.
- PATCH /devices/:id: updates mutable metadata and role.
- POST /devices/:id/heartbeat: token-authenticated presence update for exact device token/device match.
- /device-links: enforces a single active camera-client pair and applies ownership checks.
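The computed presence status returned by GET /devices can be sketched as a pure projection over the stored row. This is an illustration only: the row shape and the helper name `effectiveStatus` are assumptions, and `staleSeconds` stands in for the value of DEVICE_ONLINE_STALE_SECONDS.

```typescript
type PresenceStatus = "online" | "offline";

// Hypothetical subset of a devices row.
interface DevicePresenceRow {
  status: PresenceStatus;
  lastSeenAt: Date | null;
}

// A device is reported online only if it is stored as online AND was seen recently enough.
function effectiveStatus(
  device: DevicePresenceRow,
  staleSeconds: number,
  now: Date = new Date(),
): PresenceStatus {
  if (device.status !== "online" || device.lastSeenAt === null) return "offline";
  const ageMs = now.getTime() - device.lastSeenAt.getTime();
  return ageMs <= staleSeconds * 1000 ? "online" : "offline";
}
```

Because the projection happens at read time, a device that crashed without a clean disconnect is shown as offline once its last heartbeat is older than the threshold, without any background job rewriting the row.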
Commands, Events, Streams
- POST /commands: only client -> camera, only for active links.
- POST /events/motion/start: camera-only; sends realtime to linked clients, queues push if offline.
- POST /streams/request: creates stream session + start_stream command + realtime notification.
- POST /streams/:id/accept: camera transitions stream to streaming; creates media session and optional SFU bootstrap.
- GET /streams/:id/publish-credentials: camera-only credential issuance.
- GET /streams/:id/subscribe-credentials: participant credential issuance.
- POST /streams/:id/end: closes session, ends SFU (if enabled), creates recording placeholder, notifies both parties.
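The stream endpoints above imply a small state machine, which can be sketched as a transition table. Only the `streaming` state is named in this document; the `requested` and `ended` state names, the action names, and the rule that a stream can be ended before it is accepted are assumptions made for illustration.

```typescript
// State and action names are assumptions, except "streaming", which the text names.
type StreamState = "requested" | "streaming" | "ended";
type StreamAction = "accept" | "end";

// Allowed transitions; anything absent from the table is rejected.
const streamTransitions: Record<StreamState, Partial<Record<StreamAction, StreamState>>> = {
  requested: { accept: "streaming", end: "ended" },
  streaming: { end: "ended" },
  ended: {},
};

function nextStreamState(current: StreamState, action: StreamAction): StreamState {
  const next = streamTransitions[current][action];
  if (next === undefined) {
    throw new Error(`Invalid stream transition: ${action} from ${current}`);
  }
  return next;
}
```

Keeping the transitions in one table makes it straightforward for a route handler to reject out-of-order calls, such as accepting a stream that has already ended.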
Storage and Recordings
- POST /videos/upload-url: session route to mint presigned PUT + metadata row.
- POST /recordings/:id/finalize: camera marks recording ready once object exists, or creates simulator placeholder if object key starts with sim/.
- GET /recordings/:id/download-url: requester/camera only, ready-only, verifies object exists before presigning.
9) Workers and Reliability Mechanisms
Command retry loop
Runs inside the realtime gateway. Every 5s it re-dispatches device_commands rows whose status is sent and that have been stale for more than 10s. A command is marked failed after 3 retries.
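The selection step of the retry loop can be sketched as a pure function over the candidate rows. The thresholds come from the description above; the row shape, the column names `lastSentAt` and `retryCount`, and the function name `planCommandRetries` are assumptions.

```typescript
// Hypothetical subset of a device_commands row; column names are assumptions.
interface SentCommand {
  id: string;
  status: "sent";
  lastSentAt: Date;
  retryCount: number;
}

const STALE_AFTER_MS = 10000; // stale by more than 10s
const MAX_RETRIES = 3; // fail after 3 retries

interface RetryPlan {
  redispatch: string[];
  fail: string[];
}

// Decide, for one scan, which commands to re-send and which to mark failed.
function planCommandRetries(commands: SentCommand[], now: Date): RetryPlan {
  const plan: RetryPlan = { redispatch: [], fail: [] };
  for (const cmd of commands) {
    const stale = now.getTime() - cmd.lastSentAt.getTime() > STALE_AFTER_MS;
    if (!stale) continue; // still within the ack window
    if (cmd.retryCount >= MAX_RETRIES) plan.fail.push(cmd.id);
    else plan.redispatch.push(cmd.id);
  }
  return plan;
}
```

In this sketch a timer firing every 5s would call `planCommandRetries`, re-emit the `redispatch` commands while incrementing their retry counts, and update the `fail` commands to a failed status.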
Push worker
Runs on an interval (default 10s) and dispatches queued notifications whose nextAttemptAt <= now. A missing push token triggers retry backoff; the maximum number of attempts is configurable.
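The per-notification decision made by the push worker can be sketched as follows. The row shape and the function name `processPush` are hypothetical, and the exponential backoff formula is an assumption: this document states only that a missing push token triggers retry backoff, not how the delay is computed.

```typescript
// Hypothetical subset of a push_notifications row.
interface QueuedPush {
  id: string;
  attempts: number;
  nextAttemptAt: Date;
  pushToken: string | null;
}

type PushOutcome =
  | { kind: "delivered" }
  | { kind: "retry"; attempts: number; nextAttemptAt: Date }
  | { kind: "failed"; attempts: number };

// Returns null when the item is not due yet; otherwise the outcome of this attempt.
function processPush(
  item: QueuedPush,
  now: Date,
  maxAttempts: number,
  baseDelayMs = 10000, // assumed base delay
): PushOutcome | null {
  if (item.nextAttemptAt.getTime() > now.getTime()) return null;
  // Mock provider semantics: presence of a push token counts as delivery success.
  if (item.pushToken !== null) return { kind: "delivered" };
  const attempts = item.attempts + 1;
  if (attempts >= maxAttempts) return { kind: "failed", attempts };
  // Assumed exponential backoff: base, 2x base, 4x base, ...
  const delayMs = baseDelayMs * 2 ** (attempts - 1);
  return { kind: "retry", attempts, nextAttemptAt: new Date(now.getTime() + delayMs) };
}
```

With `maxAttempts` set to 3, an item with no push token is retried after 10s and 20s and is marked failed on its third attempt.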
Recording worker
Runs on an interval (default 30s) and marks stale awaiting_upload recordings as failed once the timeout window (default 30 min) has elapsed.
Workers perform startup guards using hasRequiredTables() so they do not run before migrations are applied.
10) Security Controls and Guardrails
Implemented
- Helmet CSP with explicit script/style/font/connect/media/image directives.
- CORS tied to BETTER_AUTH_TRUSTED_ORIGINS (or permissive fallback).
- Rate limiting globally and on high-traffic route groups.
- Ownership checks on almost all queries (user-scoped data access).
- Role constraints (for example client->camera command direction).
- Token integrity via timing-safe HMAC verification.
Important caveats
- Rate limits are in-memory; not shared across replicas.
- Metrics are in-memory counters only (no persistence/export protocol).
- Mock push provider treats presence of push token as delivery success.
- Mock media provider + SFU scaffold are not production media infrastructure.
11) Configuration Map (Key Env Variables)
| Domain | Variables | Architectural effect |
|---|---|---|
| Core server | PORT, DATABASE_URL | listener + DB connectivity |
| Auth | BETTER_AUTH_SECRET, BETTER_AUTH_BASE_URL, BETTER_AUTH_TRUSTED_ORIGINS | session signing, base URL, trusted origins, device token signing |
| Presence | DEVICE_ONLINE_STALE_SECONDS | effective online/offline projection in device listings |
| Storage | MINIO_ENDPOINT, MINIO_PORT, MINIO_USE_SSL, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, MINIO_BUCKET, MINIO_PRESIGNED_EXPIRY_SECONDS | object I/O, presign TTL, startup bucket bootstrap |
| Media | MEDIA_MODE, MEDIA_PROVIDER, TURN_URLS, TURN_USERNAME, TURN_CREDENTIAL | control plane mode and transport descriptor generation |
| Workers | PUSH_WORKER_INTERVAL_MS, PUSH_MAX_ATTEMPTS, RECORDING_WORKER_INTERVAL_MS, RECORDING_STALE_SECONDS | queue throughput, retry/failure timing |
| Admin | ADMIN_USERNAME, ADMIN_PASSWORD | required to mount admin dashboard route logic |
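As an example of how the worker variables in this table might be read, the sketch below parses them with validation and fallbacks. The helper names are hypothetical. The 10s, 30s, and 30 min defaults are the ones stated in the worker descriptions; the PUSH_MAX_ATTEMPTS default of 5 is an assumption, since this document does not give one.

```typescript
interface WorkerConfig {
  pushWorkerIntervalMs: number;
  pushMaxAttempts: number;
  recordingWorkerIntervalMs: number;
  recordingStaleSeconds: number;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer from the environment, falling back when unset or blank.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadWorkerConfig(env: Env): WorkerConfig {
  return {
    pushWorkerIntervalMs: intFromEnv(env, "PUSH_WORKER_INTERVAL_MS", 10000), // documented default: 10s
    pushMaxAttempts: intFromEnv(env, "PUSH_MAX_ATTEMPTS", 5), // default is an assumption
    recordingWorkerIntervalMs: intFromEnv(env, "RECORDING_WORKER_INTERVAL_MS", 30000), // documented default: 30s
    recordingStaleSeconds: intFromEnv(env, "RECORDING_STALE_SECONDS", 1800), // documented default: 30 min
  };
}
```

At startup this would be called once as `loadWorkerConfig(process.env)`, so that a malformed value stops the process early instead of producing a worker with a NaN interval.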
12) Code Ownership Map (Where to Modify What)
Server composition
index.ts: middleware stack, route mounting, startup ordering, workers, realtime setup.
Identity
auth.ts, middleware/auth.ts, middleware/device-auth.ts, utils/device-token.ts.
Persistence schema
db/schema.ts + drizzle/* migrations.
Realtime + command delivery
realtime/gateway.ts and routes/commands.ts.
Streaming control
routes/streams.ts, media/*, routes/recordings.ts.
Operational views
routes/ops.ts, observability/metrics.ts, routes/admin.ts.
13) Current Constraints and Scaling Boundaries
State locality
Presence, rate-limit counters, metrics counters, and SFU registry are process-local. Horizontal scaling requires external shared state.
Media realism
Media provider is mock; SFU service is scaffold/noop. Production deployment needs real media infrastructure for reliability and scale.
Queue semantics
Push and command retries are interval-based polling workers. Throughput, ordering guarantees, and dead-letter handling are minimal.
To carry production load, the natural next architectural steps are to extract shared state (Redis/queue), introduce a production media plane, and move rate limiting and metrics to distributed telemetry.