chore: initialize repo with product specification
311
README.md
Normal file
@@ -0,0 +1,311 @@
# X Article to Audio

Turn X long-form Articles into listenable audio by mentioning a bot account.

## Product Scope

### Primary user flow
1. User publishes an Article on X.
2. User replies to the Article's parent post and mentions `@YourBot`.
3. Bot detects the mention and finds the parent post.
4. System checks whether the parent post contains an Article payload.
5. If not an Article, bot replies that the parent post is not an Article and no credits are charged.
6. If valid and user has credits, system generates audio and replies with a listen link.

### Billing model
- A base charge of `base_credits` covers each article up to `X` characters (`included_chars`).
- Above `X`, charge `+1` credit (`step_credits`) for every `Y` characters (`step_chars`), rounding partial steps up.
- Formula:

```text
credits_needed = base_credits
  + max(0, ceil((char_count - included_chars) / step_chars)) * step_credits
```

V1 default configuration (recommended):
- `base_credits = 1`
- `included_chars = 25_000`
- `step_chars = 10_000`
- `step_credits = 1`
- `max_chars_per_article = 120_000`

Rationale:
1. `25,000` included chars keeps most normal Articles at predictable 1-credit pricing.
2. `10,000`-char increments avoid overcharging for small overages.
3. `120,000` hard cap controls abuse and runaway TTS cost.
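
A minimal TypeScript sketch of the formula with the V1 defaults may help when checking edge cases. The names `CreditPolicy`, `V1_POLICY`, and `creditsNeeded` are illustrative, and returning `null` for articles over the hard cap is an assumption (the spec only states that the cap exists).

```typescript
// Credit policy; field names mirror the V1 configuration keys in camelCase.
interface CreditPolicy {
  baseCredits: number;
  includedChars: number;
  stepChars: number;
  stepCredits: number;
  maxCharsPerArticle: number;
}

const V1_POLICY: CreditPolicy = {
  baseCredits: 1,
  includedChars: 25_000,
  stepChars: 10_000,
  stepCredits: 1,
  maxCharsPerArticle: 120_000,
};

// Returns the credits to charge for an article of `charCount` characters.
// Assumption: articles above the hard cap are rejected, signalled here by null.
function creditsNeeded(
  charCount: number,
  policy: CreditPolicy = V1_POLICY,
): number | null {
  if (charCount > policy.maxCharsPerArticle) return null;
  const overage = charCount - policy.includedChars;
  // Partial steps round up; articles under the included size yield zero steps.
  const extraSteps = Math.max(0, Math.ceil(overage / policy.stepChars));
  return policy.baseCredits + extraSteps * policy.stepCredits;
}
```

With the defaults, an article of 25,000 characters costs 1 credit, 25,001 characters costs 2, and an article at the 120,000 cap costs 11.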

### Ownership and access model
1. The caller (the authenticated account that mentions the bot) pays for generation.
2. Generated audio is tied to caller ownership.
3. Bot posts a public link, but the asset is access-controlled.
4. Non-owner access rule:
   - If unauthenticated, user is prompted to create/sign in to an account.
   - If authenticated but no access grant, user pays the same credit amount originally charged for that audio.
   - After payment, user receives an access grant for that audio.
   - Access grants are permanent (no expiry) once purchased.

## Architecture

### High-level components
1. Web App (`Next.js` + PWA)
   - Auth, wallet UI, history, playback.
   - UI stack: Tailwind CSS + `daisyUI`.
   - Design requirement: mobile-first layouts and interactions by default.

2. Backend (`Convex`)
   - Core domain logic.
   - Credit ledger and atomic debit/refund.
   - Job queue/state machine.
   - File metadata and playback authorization.

3. X Integration Service
   - Receives mention events (webhooks as the primary path in V1, with polling of `/2/users/{id}/mentions` as failover/recovery).
   - Fetches parent post + article metadata/content.
   - Posts success/failure replies back to X.

4. TTS Worker
   - Pulls queued jobs.
   - Calls TTS model (e.g. Qwen3-TTS).
   - Stores audio in object storage.

5. Payments (`Polar.sh`)
   - Checkout for credit packs/subscription.
   - Webhook-driven wallet top-ups.

### Suggested deployment layout
1. `frontend`: Vercel (or similar) for Next.js + API routes.
2. `backend`: Convex deployment for DB/functions.
3. `worker`: container on Fly.io/Render/Railway for X polling/webhooks + TTS jobs.
4. `storage`: S3-compatible bucket (audio assets + signed URLs).

## Domain Model

### Core entities
1. `users`
   - `id`, `x_user_id`, `username`, `created_at`

2. `wallets`
   - `user_id`, `balance_credits`, `updated_at`

3. `wallet_transactions` (append-only ledger)
   - `id`, `user_id`, `type` (`credit|debit|refund`), `amount`, `reason`, `idempotency_key`, `created_at`

4. `mention_events`
   - `id`, `mention_post_id`, `mention_author_id`, `parent_post_id`, `status`, `error_code`, `created_at`

5. `articles`
   - `id`, `x_article_id`, `parent_post_id`, `author_id`, `title`, `char_count`, `content_hash`, `raw_content`, `created_at`

6. `audio_jobs`
   - `id`, `user_id`, `mention_event_id`, `article_id`, `status`, `credits_charged`, `tts_provider`, `tts_model`, `error`, `created_at`, `updated_at`

7. `audio_assets`
   - `id`, `job_id`, `storage_key`, `duration_sec`, `size_bytes`, `codec`, `public_url_ttl`

8. `audio_access_grants`
   - `id`, `audio_asset_id`, `user_id`, `granted_via` (`owner|repurchase|admin`), `credits_paid`, `created_at`

9. `payment_events`
   - `id`, `provider`, `provider_event_id`, `status`, `payload_hash`, `created_at`

### State machine (`audio_jobs`)
1. `received`
2. `validated`
3. `priced`
4. `charged`
5. `synthesizing`
6. `uploaded`
7. `completed`
8. `failed_refunded`
9. `failed_not_refunded`
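
The spec lists the states but not the allowed transitions. The sketch below is one possible reading, offered as an assumption: the first seven states advance linearly, and a job can move to either failure state only once it has been charged. `TRANSITIONS` and `canTransition` are hypothetical names.

```typescript
type JobStatus =
  | "received"
  | "validated"
  | "priced"
  | "charged"
  | "synthesizing"
  | "uploaded"
  | "completed"
  | "failed_refunded"
  | "failed_not_refunded";

// Assumed transition table: linear happy path, failure states reachable
// only after the wallet has been debited.
const TRANSITIONS: Record<JobStatus, readonly JobStatus[]> = {
  received: ["validated"],
  validated: ["priced"],
  priced: ["charged"],
  charged: ["synthesizing", "failed_refunded", "failed_not_refunded"],
  synthesizing: ["uploaded", "failed_refunded", "failed_not_refunded"],
  uploaded: ["completed", "failed_refunded", "failed_not_refunded"],
  completed: [],
  failed_refunded: [],
  failed_not_refunded: [],
};

// True when moving a job from `from` to `to` is allowed by the table above.
function canTransition(from: JobStatus, to: JobStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}
```

Guarding every status update with a check like this keeps a retried or out-of-order worker callback from moving a finished job backwards.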

## Event Flows

### A) Mention -> audio flow
1. X Integration receives mention event.
2. Deduplicate by `mention_post_id` (idempotency).
3. Resolve parent post.
4. Check if parent has `article` field/metadata.
5. Extract canonical article text and compute `char_count`.
6. Compute `credits_needed`.
7. Atomic wallet debit in Convex.
8. Enqueue TTS job.
9. Worker synthesizes audio and uploads file.
10. Mark complete and reply on X with playback URL.

### B) Payment -> credit top-up flow
1. User checks out via Polar.
2. Polar webhook arrives.
3. Verify signature and deduplicate by `provider_event_id`.
4. Append `wallet_transactions` credit event.
5. Update wallet balance.
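
Steps 3 to 5 can be sketched with an in-memory store, as a rough illustration only. Signature verification is omitted, a real implementation would run inside a single Convex mutation, and `InMemoryWalletStore`, `applyTopUp`, and the `polar:` key prefix are hypothetical.

```typescript
interface WalletTransaction {
  userId: string;
  type: "credit" | "debit" | "refund";
  amount: number;
  reason: string;
  idempotencyKey: string;
}

// In-memory stand-in for the wallets and wallet_transactions tables.
class InMemoryWalletStore {
  readonly transactions: WalletTransaction[] = [];
  // Prototype-free objects so arbitrary string keys cannot collide with built-ins.
  private readonly seenKeys: Record<string, true> = Object.create(null);
  private readonly balances: Record<string, number> = Object.create(null);

  balanceOf(userId: string): number {
    return this.balances[userId] ?? 0;
  }

  // Applies a top-up once per provider event; returns false for a duplicate delivery.
  applyTopUp(providerEventId: string, userId: string, credits: number): boolean {
    const idempotencyKey = `polar:${providerEventId}`;
    if (this.seenKeys[idempotencyKey]) return false;
    this.seenKeys[idempotencyKey] = true;
    // Append to the ledger first, then update the cached balance.
    this.transactions.push({
      userId,
      type: "credit",
      amount: credits,
      reason: "polar_top_up",
      idempotencyKey,
    });
    this.balances[userId] = this.balanceOf(userId) + credits;
    return true;
  }
}
```

Delivering the same event twice leaves the balance and the ledger unchanged after the first call, which is the property the `provider_event_id` dedupe is meant to guarantee.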

### C) Failure handling
1. If TTS fails after debit, create refund transaction.
2. If a webhook is delivered more than once, the idempotency key prevents duplicate charges or top-ups.
3. If parent post is not an article, no charge and bot replies with reason.
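
The debit-then-refund rule in item 1 could look roughly like the sketch below. It uses a plain in-memory wallet and a synchronous `synthesize` callback for brevity; both are simplifications, and the helper names and key formats are assumptions rather than part of the spec.

```typescript
type LedgerEntry = {
  type: "debit" | "refund";
  amount: number;
  idempotencyKey: string;
};

interface Wallet {
  balance: number;
  ledger: LedgerEntry[];
}

type JobOutcome = "completed" | "failed_refunded" | "insufficient_credits";

// Debits only when the balance covers the amount; returns false otherwise.
function debit(wallet: Wallet, amount: number, key: string): boolean {
  if (wallet.balance < amount) return false;
  wallet.balance -= amount;
  wallet.ledger.push({ type: "debit", amount, idempotencyKey: key });
  return true;
}

// Returns previously debited credits and records the refund in the ledger.
function refund(wallet: Wallet, amount: number, key: string): void {
  wallet.balance += amount;
  wallet.ledger.push({ type: "refund", amount, idempotencyKey: key });
}

// Charges first, then synthesizes; any synthesis error triggers a full refund.
function chargeAndSynthesize(
  wallet: Wallet,
  jobId: string,
  credits: number,
  synthesize: () => void,
): JobOutcome {
  if (!debit(wallet, credits, `job:${jobId}:debit`)) return "insufficient_credits";
  try {
    synthesize();
    return "completed";
  } catch (_err) {
    refund(wallet, credits, `job:${jobId}:refund`);
    return "failed_refunded";
  }
}
```

Because every debit and refund is appended to the ledger, the balance can always be recomputed from the entries, which is what the debit/refund mismatch metric would check.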

### D) Shared link access flow
1. User opens playback URL.
2. If owner or already in `audio_access_grants`, allow stream.
3. If unauthenticated, redirect to auth.
4. If authenticated without grant, charge `credits_charged` from original job.
5. On successful debit, create `audio_access_grants` entry and allow stream.
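
The decision in steps 2 to 4 reduces to a small pure function. The sketch below is illustrative: `PlaybackRequest`, `AccessDecision`, and `decideAccess` are hypothetical names, and the grant list is passed in directly rather than queried from `audio_access_grants`.

```typescript
type AccessDecision =
  | { kind: "allow" }
  | { kind: "redirect_to_auth" }
  | { kind: "charge"; credits: number };

interface PlaybackRequest {
  viewerId: string | null; // null when the viewer is not signed in
  ownerId: string; // user who paid for the original generation
  grantedUserIds: string[]; // users holding an access grant for this asset
  creditsCharged: number; // credits charged on the original job
}

// Decides what to do when a viewer opens a playback URL.
function decideAccess(req: PlaybackRequest): AccessDecision {
  if (req.viewerId === null) return { kind: "redirect_to_auth" };
  const hasGrant = req.grantedUserIds.indexOf(req.viewerId) !== -1;
  if (req.viewerId === req.ownerId || hasGrant) return { kind: "allow" };
  // Authenticated non-owner without a grant pays the original price.
  return { kind: "charge", credits: req.creditsCharged };
}
```

A `charge` decision only describes the price; the caller would still need to perform the debit and create the grant before streaming, as in step 5.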

## X API Strategy

### Mention ingestion
1. V1 recommendation: Account Activity / Activity webhooks as the primary ingestion path.
2. Verify webhook signatures and support CRC challenge where required.
3. Keep polling `/2/users/{id}/mentions` as failover/recovery if webhook delivery degrades.

### Parent/article resolution
1. Read mention's `referenced_tweets` to locate parent post ID.
2. Fetch parent post with `tweet.fields=article,...`.
3. If `article` field exists, use article metadata/content path.
4. If article content is partially available, run resolver fallback (follow canonical URL and extract body).
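
Steps 1 and 3 can be sketched as two small helpers over already-fetched post objects. The `referenced_tweets` entries follow the X API v2 shape (`type` plus `id`); the internal structure of the `article` field is not described in this spec, so it is typed loosely here as an assumption. `findParentPostId` and `hasArticle` are hypothetical names.

```typescript
interface ReferencedTweet {
  type: "replied_to" | "quoted" | "retweeted";
  id: string;
}

interface MentionPost {
  id: string;
  referenced_tweets?: ReferencedTweet[];
}

interface ParentPost {
  id: string;
  // Assumption: only the presence of the field is checked; its shape is left open.
  article?: Record<string, unknown>;
}

// Returns the ID of the post the mention replies to, or null if it is not a reply.
function findParentPostId(mention: MentionPost): string | null {
  const refs = mention.referenced_tweets ?? [];
  for (const ref of refs) {
    if (ref.type === "replied_to") return ref.id;
  }
  return null;
}

// True when the fetched parent post carries an article field.
function hasArticle(parent: ParentPost): boolean {
  return parent.article !== undefined;
}
```

A mention that is not a reply yields `null`, which maps naturally onto the "not an Article" reply path with no charge.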

### Required robustness
1. Replay-safe processing.
2. Backoff + retry on `429` and `5xx`.
3. Strict rate-limit budget per minute.
4. Idempotent `mention_post_id` + `parent_post_id` dedupe keys.
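
For item 2, one common approach is capped exponential backoff with full jitter. The spec does not prescribe a strategy, so the sketch below is an assumption, including the 1-second base and 60-second cap; `shouldRetry` and `backoffDelayMs` are hypothetical helpers.

```typescript
// Retry only on rate limiting (429) and server errors (5xx).
function shouldRetry(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Capped exponential backoff with full jitter.
// `attempt` is zero-based; `random` is injectable so the delay can be tested.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000,
  capMs: number = 60000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}
```

The jitter spreads retries from many queued jobs over time so they do not all hit the X API again at the same instant.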

## Credit and Pricing Design

### Configurable credit policy
Store in Convex config:
- `included_chars`
- `base_credits`
- `step_chars`
- `step_credits`
- `max_chars_per_article`

### Anti-abuse controls
1. Max jobs per user/day.
2. Max chars/article.
3. Cooldown per mention author.
4. Deny-list and spam heuristics.
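
Controls 1 and 3 can be checked from a user's recent job timestamps. In the sketch below, the rolling 24-hour window and the order of the checks are assumptions, since the spec gives no limit values or window definition; `AbuseLimits` and `checkRateLimits` are hypothetical names.

```typescript
const DAY_IN_MS = 24 * 60 * 60 * 1000;

interface AbuseLimits {
  maxJobsPerDay: number;
  cooldownMs: number;
}

type LimitResult = "ok" | "cooldown" | "daily_limit";

// Checks a user's previous job start times (epoch ms) against the limits.
// Assumption: "per day" means a rolling 24-hour window ending at `nowMs`.
function checkRateLimits(
  previousJobTimesMs: number[],
  nowMs: number,
  limits: AbuseLimits,
): LimitResult {
  let jobsInWindow = 0;
  let latest = -Infinity;
  for (const t of previousJobTimesMs) {
    if (nowMs - t < DAY_IN_MS) jobsInWindow += 1;
    if (t > latest) latest = t;
  }
  if (jobsInWindow >= limits.maxJobsPerDay) return "daily_limit";
  if (nowMs - latest < limits.cooldownMs) return "cooldown";
  return "ok";
}
```

Running this check before pricing and debit means a throttled user is never charged for a job that will not run.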

## API Surface (internal)

### Web app -> backend
1. `GET /api/jobs/:id`
2. `GET /api/me/wallet`
3. `POST /api/payments/create-checkout`
4. `POST /api/audio/:id/unlock` (debit credits equal to original generation)

### External webhooks
1. `POST /api/webhooks/x` (mention events + CRC challenge support)
2. `POST /api/webhooks/polar`

### Worker-only actions
1. `POST /internal/jobs/:id/start`
2. `POST /internal/jobs/:id/complete`
3. `POST /internal/jobs/:id/fail`

## Security and Compliance

1. Verify signatures for X and Polar webhooks.
2. Store only needed article data; optionally strip raw text after audio generation.
3. Encrypt secrets and use least-privilege API keys.
4. Keep an audit trail for wallet and job state changes.
5. Include takedown/delete endpoint for generated assets.
6. Enforce signed, short-lived playback URLs backed by access checks.

## Retention Policy (recommended)

1. Raw article text: retain for `24 hours`, then delete and keep only hash/metadata.
2. Generated audio files: retain for `90 days` from last play.
3. Access grants and financial ledger: retain indefinitely for audit.
4. Soft-delete on user request where legally required; hard-delete binaries on retention expiry.
5. All retention values are configurable in backend settings.
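
Rules 1 and 2 translate into two expiry checks that a cleanup job might run. This is a sketch under assumptions: the names are hypothetical, and treating a never-played file as if it was last played at creation time is a choice the spec does not make.

```typescript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

interface RetentionPolicy {
  rawTextHours: number;
  audioDaysFromLastPlay: number;
}

const DEFAULT_RETENTION: RetentionPolicy = {
  rawTextHours: 24,
  audioDaysFromLastPlay: 90,
};

// True once raw article text has outlived its retention window.
function rawTextExpired(
  createdAtMs: number,
  nowMs: number,
  policy: RetentionPolicy = DEFAULT_RETENTION,
): boolean {
  return nowMs - createdAtMs >= policy.rawTextHours * HOUR_MS;
}

// True once an audio file has gone unplayed for the retention window.
// Assumption: a file that was never played counts from its creation time.
function audioExpired(
  createdAtMs: number,
  lastPlayedAtMs: number | null,
  nowMs: number,
  policy: RetentionPolicy = DEFAULT_RETENTION,
): boolean {
  const reference = lastPlayedAtMs === null ? createdAtMs : lastPlayedAtMs;
  return nowMs - reference >= policy.audioDaysFromLastPlay * DAY_MS;
}
```

Because the audio window restarts on every play, the playback path would need to record a last-played timestamp for this check to work.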

## Observability

### Metrics
1. Mention ingest rate and failures.
2. Article validation fail rate.
3. Queue depth + job latency percentiles.
4. TTS success/failure by model.
5. Credit debit/refund mismatch.

### Alerts
1. Webhook 5xx spikes.
2. Consecutive TTS failures.
3. Wallet transaction imbalance.
4. Rising `429` from X.

## Implementation Plan

### Phase 1 (Week 1-2): Core backend + wallet
1. Convex schema, wallet ledger, Polar webhook.
2. Auth and minimal dashboard.
3. Playback ACL with owner-only access.

### Phase 2 (Week 3-4): Mention bot MVP
1. X webhook ingestion + parent resolution + "not an Article" reply path.
2. Credit charging + async TTS + audio storage + owner grant creation.
3. Bot reply with playback URL.

### Phase 3 (Week 5-6): Hardening
1. Polling failover/recovery path (augment webhooks).
2. Retries, backoff, idempotency audits.
3. Shared-link repurchase flow, observability, and admin tooling.

## Estimated Costs

### 1) One-time build effort
1. MVP (mention bot + credits + payments + owner-only playback): `220-320 engineer hours`.
2. Hardening + shared-link repurchase + observability: `90-160 engineer hours`.

Total: `310-480 engineer hours`.

### 2) Ongoing monthly operating costs
Use this formula:

```text
monthly_cost =
  X_api_cost
  + TTS_cost_per_char * total_chars
  + storage_egress_cost
  + hosting_cost
  + payment_fees
```

Practical baseline for early stage (excluding X API variability):
1. TTS: scales linearly with characters (dominant variable cost).
2. Hosting + backend: low double-digits to low hundreds USD/month.
3. Storage: usually modest unless retention is long and playback volume is high.
4. Polar fees: percentage + fixed per successful payment.

### 3) Example unit economics (replace with real rates)
Assumptions:
1. Average article length: `25,000` chars.
2. TTS rate placeholder: `$0.12 / 10,000 chars`.
3. Variable infra/storage per generated audio: `$0.005`.
4. X API per-job variable cost placeholder: `$x_api_job_cost`.

Estimated per completed audio:

```text
tts_cost = 25,000 / 10,000 * 0.12 = $0.30
infra_cost = $0.005
total_cost_per_audio ~= $0.305 + x_api_job_cost
```

If one credit pack gives `10` base articles and sells at `$9.99`, target:
1. Gross margin after payment fees > `60%`.
2. Credit policy tuned so average credits consumed aligns with this margin.
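
To check the margin target against the placeholder numbers, a small calculation helps. The payment fee values used below (4% plus `$0.40` per payment) are hypothetical placeholders, not quoted Polar rates, and `x_api_job_cost` is taken as zero; `grossMarginAfterFees` is an illustrative name.

```typescript
interface PackEconomics {
  packPriceUsd: number;
  articlesPerPack: number;
  costPerAudioUsd: number; // TTS + infra + X API cost per completed audio
  feePercent: number; // payment fee as a fraction, e.g. 0.04 for 4%
  feeFixedUsd: number; // fixed fee per successful payment
}

// Gross margin after payment fees, as a fraction of net revenue.
function grossMarginAfterFees(e: PackEconomics): number {
  const fees = e.packPriceUsd * e.feePercent + e.feeFixedUsd;
  const netRevenue = e.packPriceUsd - fees;
  const variableCost = e.articlesPerPack * e.costPerAudioUsd;
  return (netRevenue - variableCost) / netRevenue;
}

// Placeholder inputs: $9.99 pack, 10 base articles, $0.305 per audio,
// hypothetical 4% + $0.40 payment fee, X API cost assumed to be zero.
const exampleMargin = grossMarginAfterFees({
  packPriceUsd: 9.99,
  articlesPerPack: 10,
  costPerAudioUsd: 0.305,
  feePercent: 0.04,
  feeFixedUsd: 0.4,
});
```

Under these placeholders the margin comes out at roughly 67%, above the 60% target; any non-zero X API cost per job or longer-than-average articles at 1 credit would lower it.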

## Resolved Defaults

1. Credit policy defaults (`25k included`, `+1 / 10k`, `max 120k`) with backend configurability.
2. Unlock policy: permanent access grants after payment (no expiry).
3. Retention defaults: `24h` raw text, `90d` audio from last play, ledger/grants retained for audit.
4. No region/content restrictions configured initially.
5. UI framework: Tailwind CSS + `daisyUI`, implemented mobile-first.

## Repo status

This repository currently contains only this `README.md`. It defines a concrete architecture and delivery plan that can now be implemented.