# X Article to Audio

Turn X long-form Articles into listenable audio by mentioning a bot account.

## Product Scope

### Primary user flow

1. User publishes an Article on X.
2. User replies to the Article's parent post and mentions `@YourBot`.
3. Bot detects the mention and finds the parent post.
4. System checks whether the parent post contains an Article payload.
5. If not an Article, bot replies that the parent post is not an Article and no credits are charged.
6. If valid and user has credits, system generates audio and replies with a listen link.

### Billing model

- A base charge of `base_credits` covers each article up to `included_chars` characters.
- Above `included_chars`, charge `step_credits` for every additional `step_chars` characters, rounded up.
- Formula:

```text
credits_needed = base_credits
  + max(0, ceil((char_count - included_chars) / step_chars)) * step_credits
```

V1 default configuration (recommended):

- `base_credits = 1`
- `included_chars = 25_000`
- `step_chars = 10_000`
- `step_credits = 1`
- `max_chars_per_article = 120_000`

Rationale:

1. `25,000` included chars keeps most normal Articles at predictable 1-credit pricing.
2. `10,000`-char increments avoid overcharging for small overages.
3. `120,000` hard cap controls abuse and runaway TTS cost.

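The formula and V1 defaults above can be sketched as a small pure function. This is an illustration only: the names `DEFAULT_POLICY` and `creditsNeeded` are hypothetical, and returning `null` for articles above the hard cap is an assumption, since the policy does not say how rejection is signalled.

```javascript
// V1 default policy values taken from the list above.
const DEFAULT_POLICY = {
  baseCredits: 1,
  includedChars: 25_000,
  stepChars: 10_000,
  stepCredits: 1,
  maxCharsPerArticle: 120_000,
};

// Returns the credits to charge for an article of `charCount` characters.
// Assumption: articles above the hard cap yield null (caller rejects them).
function creditsNeeded(charCount, policy = DEFAULT_POLICY) {
  if (!Number.isInteger(charCount) || charCount < 0) {
    throw new RangeError("charCount must be a non-negative integer");
  }
  if (charCount > policy.maxCharsPerArticle) return null;

  // Characters beyond the included allowance, never negative.
  const overage = Math.max(0, charCount - policy.includedChars);
  // Each started block of stepChars costs stepCredits.
  const steps = Math.ceil(overage / policy.stepChars);
  return policy.baseCredits + steps * policy.stepCredits;
}
```

With the defaults, an article of exactly `25,000` characters costs 1 credit, `25,001` characters costs 2, and the `120,000` maximum costs 11.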
### Ownership and access model

1. The caller (the authenticated account that mentions the bot) pays for generation.
2. Generated audio is tied to caller ownership.
3. Bot posts a public link, but the asset is access-controlled.
4. Non-owner access rule:
   - If unauthenticated, user is prompted to create/sign in to an account.
   - If authenticated but no access grant, user pays the same credit amount originally charged for that audio.
   - After payment, user receives an access grant for that audio.
   - Access grants are permanent (no expiry) once purchased.

## Architecture

### High-level components

1. Web App (`Next.js` + PWA)
   - Auth, wallet UI, history, playback.
   - UI stack: Tailwind CSS + `daisyUI`.
   - Design requirement: mobile-first layouts and interactions by default.

2. Backend (`Convex`)
   - Core domain logic.
   - Credit ledger and atomic debit/refund.
   - Job queue/state machine.
   - File metadata and playback authorization.

3. X Integration Service
   - Receives mention events (webhooks as the primary path, with polling as failover/recovery; see X API Strategy).
   - Fetches parent post + article metadata/content.
   - Posts success/failure replies back to X.

4. TTS Worker
   - Pulls queued jobs.
   - Calls TTS model (e.g. Qwen3-TTS).
   - Stores audio in object storage.

5. Payments (`Polar.sh`)
   - Checkout for credit packs/subscription.
   - Webhook-driven wallet top-ups.

### Suggested deployment layout

1. `frontend`: Vercel (or similar) for Next.js + API routes.
2. `backend`: Convex deployment for DB/functions.
3. `worker`: container on Fly.io/Render/Railway for X polling/webhooks + TTS jobs.
4. `storage`: S3-compatible bucket (audio assets + signed URLs).

## Domain Model

### Core entities

1. `users`
   - `id`, `x_user_id`, `username`, `created_at`

2. `wallets`
   - `user_id`, `balance_credits`, `updated_at`

3. `wallet_transactions` (append-only ledger)
   - `id`, `user_id`, `type` (`credit|debit|refund`), `amount`, `reason`, `idempotency_key`, `created_at`

4. `mention_events`
   - `id`, `mention_post_id`, `mention_author_id`, `parent_post_id`, `status`, `error_code`, `created_at`

5. `articles`
   - `id`, `x_article_id`, `parent_post_id`, `author_id`, `title`, `char_count`, `content_hash`, `raw_content`, `created_at`

6. `audio_jobs`
   - `id`, `user_id`, `mention_event_id`, `article_id`, `status`, `credits_charged`, `tts_provider`, `tts_model`, `error`, `created_at`, `updated_at`

7. `audio_assets`
   - `id`, `job_id`, `storage_key`, `duration_sec`, `size_bytes`, `codec`, `public_url_ttl`

8. `audio_access_grants`
   - `id`, `audio_asset_id`, `user_id`, `granted_via` (`owner|repurchase|admin`), `credits_paid`, `created_at`

9. `payment_events`
   - `id`, `provider`, `provider_event_id`, `status`, `payload_hash`, `created_at`

### State machine (`audio_jobs`)

1. `received`
2. `validated`
3. `priced`
4. `charged`
5. `synthesizing`
6. `uploaded`
7. `completed`
8. `failed_refunded`
9. `failed_not_refunded`

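The state list above names the states but not the allowed transitions. The sketch below shows one way to enforce them with a lookup table. The transition edges are an assumption (a linear happy path, with failures before the debit going to `failed_not_refunded` and failures after it going to either failure state), and `canTransition` and `advance` are hypothetical helper names.

```javascript
// Assumed transition table: the document lists states only, not edges.
const TRANSITIONS = new Map([
  ["received", ["validated", "failed_not_refunded"]],
  ["validated", ["priced", "failed_not_refunded"]],
  ["priced", ["charged", "failed_not_refunded"]],
  ["charged", ["synthesizing", "failed_refunded", "failed_not_refunded"]],
  ["synthesizing", ["uploaded", "failed_refunded", "failed_not_refunded"]],
  ["uploaded", ["completed", "failed_refunded", "failed_not_refunded"]],
  ["completed", []],
  ["failed_refunded", []],
  ["failed_not_refunded", []],
]);

// True when moving from `from` to `to` is allowed by the table.
function canTransition(from, to) {
  const allowed = TRANSITIONS.get(from);
  return allowed !== undefined && allowed.includes(to);
}

// Returns a copy of the job in the new state, or throws on an illegal move.
function advance(job, to) {
  if (!canTransition(job.status, to)) {
    throw new Error(`illegal transition: ${job.status} -> ${to}`);
  }
  return { ...job, status: to };
}
```

Keeping the table in one place makes replayed or out-of-order worker callbacks fail loudly instead of silently moving a job backwards.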
## Event Flows

### A) Mention -> audio flow

1. X Integration receives mention event.
2. Deduplicate by `mention_post_id` (idempotency).
3. Resolve parent post.
4. Check if parent has `article` field/metadata.
5. Extract canonical article text and compute `char_count`.
6. Compute `credits_needed`.
7. Atomic wallet debit in Convex.
8. Enqueue TTS job.
9. Worker synthesizes audio and uploads file.
10. Mark complete and reply on X with playback URL.

### B) Payment -> credit top-up flow

1. User checks out via Polar.
2. Polar webhook arrives.
3. Verify signature and deduplicate by `provider_event_id`.
4. Append `wallet_transactions` credit event.
5. Update wallet balance.

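A minimal sketch of steps 3 to 5, assuming a hex-encoded HMAC-SHA256 signature over the raw request body. The signature encoding, the payload shape (`id`, `userId`, `credits`), and the function names are all hypothetical. The payment provider's own documentation defines the real header names and signing scheme, and those should take precedence.

```javascript
const crypto = require("crypto");

// Assumed scheme: signature = hex(HMAC-SHA256(secret, rawBody)).
function verifySignature(rawBody, signatureHex, secret) {
  const expected = crypto.createHmac("sha256", secret).update(rawBody).digest();
  const given = Buffer.from(String(signatureHex), "hex");
  // timingSafeEqual requires equal lengths, so check that first.
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}

// In-memory stand-ins for payment_events and wallets.
function createTopUpHandler(secret) {
  const seenEventIds = new Set();
  const balances = new Map();

  return function handle(rawBody, signatureHex) {
    if (!verifySignature(rawBody, signatureHex, secret)) return { status: 401 };

    const event = JSON.parse(rawBody); // hypothetical shape: { id, userId, credits }
    const current = balances.get(event.userId) ?? 0;
    if (seenEventIds.has(event.id)) {
      // Duplicate delivery: acknowledge without crediting again.
      return { status: 200, duplicate: true, balance: current };
    }
    seenEventIds.add(event.id);
    balances.set(event.userId, current + event.credits);
    return { status: 200, duplicate: false, balance: current + event.credits };
  };
}
```

Verifying against the raw bytes before parsing matters: re-serialized JSON may not match the bytes the sender signed.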
### C) Failure handling

1. If TTS fails after debit, create a refund transaction.
2. If a webhook is delivered more than once, the idempotency key prevents duplicate charges or credits.
3. If parent post is not an article, no charge and bot replies with reason.

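The debit, refund, and duplicate-delivery rules can be illustrated with an in-memory append-only ledger keyed by `idempotency_key`. This is a sketch only: the `Ledger` class and the key formats are hypothetical, and a real backend would run the same steps inside a single database transaction rather than relying on single-threaded memory.

```javascript
class Ledger {
  constructor() {
    this.entries = [];         // append-only, mirrors wallet_transactions
    this.appliedKeys = new Set(); // idempotency keys already processed
    this.balances = new Map(); // user_id -> balance_credits
  }

  balance(userId) {
    return this.balances.get(userId) ?? 0;
  }

  // Applies one transaction at most once per idempotency key.
  apply({ userId, type, amount, reason, idempotencyKey }) {
    if (!["credit", "debit", "refund"].includes(type)) {
      throw new Error(`unknown transaction type: ${type}`);
    }
    if (!Number.isInteger(amount) || amount <= 0) {
      throw new Error("amount must be a positive integer");
    }
    if (this.appliedKeys.has(idempotencyKey)) {
      return { applied: false, balance: this.balance(userId) };
    }
    // Credits and refunds add to the balance; debits subtract.
    const next = this.balance(userId) + (type === "debit" ? -amount : amount);
    if (next < 0) throw new Error("insufficient credits");

    this.appliedKeys.add(idempotencyKey);
    this.entries.push({ userId, type, amount, reason, idempotencyKey });
    this.balances.set(userId, next);
    return { applied: true, balance: next };
  }
}
```

A failed TTS job would then call `apply` with `type: "refund"` and a key derived from the job ID, so a retried failure handler cannot refund twice.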
### D) Shared link access flow

1. User opens playback URL.
2. If owner or already in `audio_access_grants`, allow stream.
3. If unauthenticated, redirect to auth.
4. If authenticated without grant, charge `credits_charged` from original job.
5. On successful debit, create `audio_access_grants` entry and allow stream.

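The access decision and unlock step can be sketched as two small functions over in-memory stand-ins for grants and wallets. The function names, the returned state strings, and the `insufficient_credits` outcome are assumptions for illustration. The flow above does not specify what happens when the viewer cannot afford the unlock.

```javascript
// grants: Set of "assetId:userId" strings (stand-in for audio_access_grants).
function resolveAccess({ viewerId, asset, grants }) {
  if (!viewerId) return "auth_required";
  if (viewerId === asset.ownerId) return "stream";
  if (grants.has(`${asset.id}:${viewerId}`)) return "stream";
  return "payment_required";
}

// wallets: Map of userId -> balance (stand-in for wallets.balance_credits).
function unlock({ viewerId, asset, grants, wallets }) {
  const state = resolveAccess({ viewerId, asset, grants });
  if (state !== "payment_required") return state; // nothing to charge

  const balance = wallets.get(viewerId) ?? 0;
  if (balance < asset.creditsCharged) return "insufficient_credits"; // assumed outcome

  // Debit the same amount the original job was charged, then grant access.
  wallets.set(viewerId, balance - asset.creditsCharged);
  grants.add(`${asset.id}:${viewerId}`);
  return "stream";
}
```

Because `unlock` checks for an existing grant first, a repeated request from the same viewer streams without a second charge, which matches the permanent-grant rule.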
## X API Strategy

### Mention ingestion

1. V1 recommendation: Account Activity / Activity webhooks as the primary ingestion path.
2. Verify webhook signatures and support CRC challenge where required.
3. Keep polling `/2/users/{id}/mentions` as failover/recovery if webhook delivery degrades.

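For the CRC challenge in step 2, the sketch below assumes the Account Activity convention: X sends a `crc_token`, and the endpoint answers with `sha256=` followed by the base64 HMAC-SHA256 of that token, keyed with the app's consumer secret. The function name `crcResponse` is hypothetical, and the current X documentation should be checked before relying on this format.

```javascript
const crypto = require("crypto");

// Builds the JSON body for a CRC challenge response.
// Assumption: the HMAC key is the app's consumer secret.
function crcResponse(crcToken, consumerSecret) {
  const digest = crypto
    .createHmac("sha256", consumerSecret)
    .update(crcToken)
    .digest("base64");
  return { response_token: `sha256=${digest}` };
}
```

The webhook route would return this object as JSON with a `200` status when the incoming request carries a `crc_token` query parameter.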
### Parent/article resolution

1. Read mention's `referenced_tweets` to locate parent post ID.
2. Fetch parent post with `tweet.fields=article,...`.
3. If `article` field exists, use article metadata/content path.
4. If article content is partially available, run resolver fallback (follow canonical URL and extract body).

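Steps 1 and 3 can be sketched against already-fetched post objects. This assumes the mention payload carries `referenced_tweets` entries of the form `{ type, id }` with `type: "replied_to"` marking the parent. The helper names are hypothetical, and the inner shape of the `article` field is deliberately not inspected here.

```javascript
// Returns the ID of the post this mention replies to, or null if none.
function findParentPostId(mention) {
  const refs = Array.isArray(mention.referenced_tweets) ? mention.referenced_tweets : [];
  const reply = refs.find((ref) => ref.type === "replied_to");
  return reply ? reply.id : null;
}

// True when the fetched parent post carries an article payload at all.
function hasArticle(parentPost) {
  return Boolean(parentPost) && parentPost.article != null;
}
```

A mention with no `replied_to` reference (for example, a top-level post that only tags the bot) has no parent to resolve and can take the same "not an Article" reply path.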
### Required robustness

1. Replay-safe processing.
2. Backoff + retry on `429` and `5xx`.
3. Strict rate-limit budget per minute.
4. Idempotent `mention_post_id` + `parent_post_id` dedupe keys.

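One common way to implement item 2 is exponential backoff with full jitter. The sketch below is an assumption about how that could look. The base delay, cap, and helper names are illustrative values, not settings taken from this project.

```javascript
// Retry only on rate limiting and server-side errors.
function isRetryable(status) {
  return status === 429 || (status >= 500 && status <= 599);
}

// Full-jitter backoff: a random delay in [0, min(maxMs, baseMs * 2^attempt)).
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(attempt, { baseMs = 500, maxMs = 60_000, random = Math.random } = {}) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

When a `429` response includes a rate-limit reset header, waiting until that reset time is usually a better choice than the computed delay.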
## Credit and Pricing Design

### Configurable credit policy

Store in Convex config:

- `included_chars`
- `base_credits`
- `step_chars`
- `step_credits`
- `max_chars_per_article`

### Anti-abuse controls

1. Max jobs per user/day.
2. Max chars/article.
3. Cooldown per mention author.
4. Deny-list and spam heuristics.

## API Surface (internal)

### Web app -> backend

1. `GET /api/jobs/:id`
2. `GET /api/me/wallet`
3. `POST /api/payments/create-checkout`
4. `POST /api/audio/:id/unlock` (debit credits equal to original generation)

### External webhooks

1. `POST /api/webhooks/x` (mention events + CRC challenge support)
2. `POST /api/webhooks/polar`

### Worker-only actions

1. `POST /internal/jobs/:id/start`
2. `POST /internal/jobs/:id/complete`
3. `POST /internal/jobs/:id/fail`

## Security and Compliance

1. Verify signatures for X and Polar webhooks.
2. Store only needed article data; optionally strip raw text after audio generation.
3. Encrypt secrets and use least-privilege API keys.
4. Keep an audit trail for wallet and job state changes.
5. Include takedown/delete endpoint for generated assets.
6. Enforce signed, short-lived playback URLs backed by access checks.

## Retention Policy (recommended)

1. Raw article text: retain for `24 hours`, then delete and keep only hash/metadata.
2. Generated audio files: retain for `90 days` from last play.
3. Access grants and financial ledger: retain indefinitely for audit.
4. Soft-delete on user request where legally required; hard-delete binaries on retention expiry.
5. All retention values are configurable in backend settings.

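A periodic cleanup job could apply the first two rules with checks like the ones below. This is a sketch under assumptions: timestamps are epoch milliseconds, the 24-hour window is measured from the article's `createdAt`, and `lastPlayedAt` is a hypothetical field that does not appear in the `audio_assets` entity list.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Defaults from the policy above; a real deployment would read these from settings.
const RETENTION = {
  rawTextMs: 24 * HOUR_MS,
  audioSinceLastPlayMs: 90 * DAY_MS,
};

// Raw text is purged once it is at least 24 hours old (assumed: from createdAt).
function shouldPurgeRawText(article, now, retention = RETENTION) {
  return now - article.createdAt >= retention.rawTextMs;
}

// Audio is purged 90 days after the last play (lastPlayedAt is hypothetical).
function shouldPurgeAudio(asset, now, retention = RETENTION) {
  return now - asset.lastPlayedAt >= retention.audioSinceLastPlayMs;
}
```

Each playback would need to update `lastPlayedAt`, so assets that are still listened to keep renewing their 90-day window.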
## Observability

### Metrics

1. Mention ingest rate and failures.
2. Article validation fail rate.
3. Queue depth + job latency percentiles.
4. TTS success/failure by model.
5. Credit debit/refund mismatch.

### Alerts

1. Webhook 5xx spikes.
2. Consecutive TTS failures.
3. Wallet transaction imbalance.
4. Rising `429` from X.

## Implementation Plan

### Phase 1 (Week 1-2): Core backend + wallet

1. Convex schema, wallet ledger, Polar webhook.
2. Auth and minimal dashboard.
3. Playback ACL with owner-only access.

### Phase 2 (Week 3-4): Mention bot MVP

1. X webhook ingestion + parent resolution + "not an Article" reply path.
2. Credit charging + async TTS + audio storage + owner grant creation.
3. Bot reply with playback URL.

### Phase 3 (Week 5-6): Hardening

1. Polling failover/recovery path (augment webhooks).
2. Retries, backoff, idempotency audits.
3. Shared-link repurchase flow, observability, and admin tooling.

## Estimated Costs

### 1) One-time build effort

1. MVP (mention bot + credits + payments + owner-only playback): `220-320 engineer hours`.
2. Hardening + shared-link repurchase + observability: `90-160 engineer hours`.

Total: `310-480 engineer hours`.

### 2) Ongoing monthly operating costs

Use this formula:

```text
monthly_cost =
  X_api_cost
  + TTS_cost_per_char * total_chars
  + storage_egress_cost
  + hosting_cost
  + payment_fees
```

Practical baseline for early stage (excluding X API variability):

1. TTS: scales linearly with characters (dominant variable cost).
2. Hosting + backend: low double-digits to low hundreds USD/month.
3. Storage: usually modest unless retention is long and playback volume is high.
4. Polar fees: percentage + fixed per successful payment.

### 3) Example unit economics (replace with real rates)

Assumptions:

1. Average article length: `25,000` chars.
2. TTS rate placeholder: `$0.12 / 10,000 chars`.
3. Variable infra/storage per generated audio: `$0.005`.
4. X API per-job variable cost placeholder: `$x_api_job_cost`.

Estimated per completed audio:

```text
tts_cost = 25,000 / 10,000 * 0.12 = $0.30
infra_cost = $0.005
total_cost_per_audio ~= $0.305 + x_api_job_cost
```

If one credit pack gives `10` base articles and sells at `$9.99`, target:

1. Gross margin after payment fees > `60%`.
2. Credit policy tuned so average credits consumed aligns with this margin.

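To check a pack price against the 60% target, the per-audio cost and the pack margin can be computed as below. This is a sketch: the function names are hypothetical, gross margin is taken as revenue minus payment fees and generation costs, divided by revenue, and the fee parameters are inputs to fill in from the payment provider's actual pricing.

```javascript
// Cost of one generated audio, following the estimate above.
function costPerAudio({ chars, ttsRatePer10k, infraCost, xApiJobCost }) {
  return (chars / 10_000) * ttsRatePer10k + infraCost + xApiJobCost;
}

// Gross margin of one credit pack as a fraction of its price.
// feePercent is a fraction (0.05 = 5%); feeFixed is in the same currency as packPrice.
function grossMargin({ packPrice, articlesPerPack, costPerArticle, feePercent, feeFixed }) {
  const fees = packPrice * feePercent + feeFixed;
  const generationCost = articlesPerPack * costPerArticle;
  return (packPrice - fees - generationCost) / packPrice;
}
```

With the placeholder rates above, zero X API cost, and payment fees ignored, a `$9.99` pack of 10 articles costs about `$3.05` to serve, a margin of roughly 69%. Real payment fees and X API costs reduce that figure, so the target should be rechecked with actual rates.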
## Resolved Defaults

1. Credit policy defaults (`25k included`, `+1 / 10k`, `max 120k`) with backend configurability.
2. Unlock policy: permanent access grants after payment (no expiry).
3. Retention defaults: `24h` raw text, `90d` audio from last play, ledger/grants retained for audit.
4. No region/content restrictions configured initially.
5. UI framework: Tailwind CSS + `daisyUI`, implemented mobile-first.

## Repo status

This repository now contains an implemented MVP aligned to this architecture.

## Implemented MVP (current repo)

This repository now includes a runnable MVP server and tests for core flows.

### Stack in this repo

1. Node.js HTTP server (`src/server.js`).
2. Domain modules for credits, wallet ledger, article extraction, access grants, and webhook signatures.
3. Mobile-first server-rendered UI using `daisyUI` stylesheet via CDN.
4. PWA basics (`/manifest.webmanifest`, `/sw.js`).

### Auth model in MVP

1. API auth is represented by the `x-user-id` request header.
2. This is a development placeholder for a future "Login with X" OAuth flow.

### Core endpoints

1. `POST /api/webhooks/x` -> mention webhook ingestion (HMAC verified).
2. `POST /api/webhooks/polar` -> credit top-up webhook (HMAC verified).
3. `GET /api/me/wallet` -> caller wallet balance.
4. `GET /api/jobs/:id` -> caller job status.
5. `POST /api/audio/:id/unlock` -> pay same credits and unlock permanent access.
6. `GET /audio/:id` -> playback access page (owner/grant/auth/payment states).
7. `GET /` -> mobile-first dashboard.

### Local commands

1. `npm test` -> run full test suite.
2. `npm run start` -> start server (port from `PORT`, default `3000`).

### Important environment notes

1. Credit policy is configurable via env:
   - `BASE_CREDITS`
   - `INCLUDED_CHARS`
   - `STEP_CHARS`
   - `STEP_CREDITS`
   - `MAX_CHARS_PER_ARTICLE`
2. Webhook secrets:
   - `X_WEBHOOK_SECRET`
   - `POLAR_WEBHOOK_SECRET`

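Reading the credit policy variables might look like the sketch below. It assumes that unset variables fall back to the V1 defaults from the billing section and that every value must be a positive integer. The actual loader in this repo may behave differently, and `loadCreditPolicy` is a hypothetical name.

```javascript
// Assumed fallbacks: the V1 defaults from the billing model.
const POLICY_DEFAULTS = {
  BASE_CREDITS: 1,
  INCLUDED_CHARS: 25000,
  STEP_CHARS: 10000,
  STEP_CREDITS: 1,
  MAX_CHARS_PER_ARTICLE: 120000,
};

// Builds a policy object from an env-like map, failing fast on bad values.
function loadCreditPolicy(env = process.env) {
  const policy = {};
  for (const [name, fallback] of Object.entries(POLICY_DEFAULTS)) {
    const raw = env[name];
    const value = raw === undefined || raw === "" ? fallback : Number(raw);
    if (!Number.isInteger(value) || value <= 0) {
      throw new Error(`${name} must be a positive integer, got "${raw}"`);
    }
    policy[name] = value;
  }
  return policy;
}
```

Failing at startup on a malformed value is a deliberate choice here: a silently ignored `STEP_CHARS` typo would otherwise change what users are charged.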