chore: initialize repo with product specification

2026-02-18 12:23:00 +00:00
commit 5ebaf3ad5c
2 changed files with 321 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,311 @@
+# X Article to Audio
+
+Turn X long-form Articles into listenable audio by mentioning a bot account.
+
+## Product Scope
+
+### Primary user flow
+1. User publishes an Article on X.
+2. User replies to the post that contains the Article and mentions `@YourBot`.
+3. Bot detects the mention and finds the parent post.
+4. System checks whether the parent post contains an Article payload.
+5. If not an Article, bot replies that the parent post is not an Article and no credits are charged.
+6. If valid and user has credits, system generates audio and replies with a listen link.
+
+### Billing model
+- Each article costs `base_credits` for up to `included_chars` characters.
+- Above `included_chars`, charge `step_credits` for each additional `step_chars` characters or part thereof.
+- Formula:
+
+```text
+credits_needed = base_credits
+  + max(0, ceil((char_count - included_chars) / step_chars)) * step_credits
+```
+
+V1 default configuration (recommended):
+- `base_credits = 1`
+- `included_chars = 25_000`
+- `step_chars = 10_000`
+- `step_credits = 1`
+- `max_chars_per_article = 120_000`
+
+Rationale:
+1. `25,000` included chars keeps most normal Articles at predictable 1-credit pricing.
+2. `10,000`-char increments avoid overcharging for small overages.
+3. `120,000` hard cap controls abuse and runaway TTS cost.
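+
+A minimal TypeScript sketch of the formula with the V1 defaults. `CreditPolicy` and `creditsNeeded` are hypothetical names, and rejecting articles above `max_chars_per_article` (rather than pricing them) is an assumption about how the cap is enforced.
+
+```typescript
+// Hypothetical shape of the credit policy stored in backend config.
+export interface CreditPolicy {
+  baseCredits: number;
+  includedChars: number;
+  stepChars: number;
+  stepCredits: number;
+  maxCharsPerArticle: number;
+}
+
+export const DEFAULT_POLICY: CreditPolicy = {
+  baseCredits: 1,
+  includedChars: 25_000,
+  stepChars: 10_000,
+  stepCredits: 1,
+  maxCharsPerArticle: 120_000,
+};
+
+// Returns the credits to charge, or null when the article exceeds the hard cap
+// (assumption: over-cap articles are rejected, not priced).
+export function creditsNeeded(charCount: number, policy: CreditPolicy = DEFAULT_POLICY): number | null {
+  if (!Number.isInteger(charCount) || charCount < 0) {
+    throw new RangeError(`invalid char count: ${charCount}`);
+  }
+  if (charCount > policy.maxCharsPerArticle) return null;
+  const overage = Math.max(0, charCount - policy.includedChars);
+  // Every started block of stepChars beyond the included allowance costs stepCredits.
+  const steps = Math.ceil(overage / policy.stepChars);
+  return policy.baseCredits + steps * policy.stepCredits;
+}
+```
+
+With these defaults, a 25,000-char article costs 1 credit, 25,001 to 35,000 chars cost 2, and a 120,000-char article costs 11.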
+
+### Ownership and access model
+1. The caller (the authenticated account that mentions the bot) pays for generation.
+2. Generated audio is tied to caller ownership.
+3. Bot posts a public link, but the asset is access-controlled.
+4. Non-owner access rule:
+- If unauthenticated, the user is prompted to sign in or create an account.
+- If authenticated but without an access grant, the user pays the same number of credits originally charged for that audio.
+- After payment, user receives an access grant for that audio.
+- Access grants are permanent (no expiry) once purchased.
+
+## Architecture
+
+### High-level components
+1. Web App (`Next.js` + PWA)
+- Auth, wallet UI, history, playback.
+- UI stack: Tailwind CSS + `daisyUI`.
+- Design requirement: mobile-first layouts and interactions by default.
+
+2. Backend (`Convex`)
+- Core domain logic.
+- Credit ledger and atomic debit/refund.
+- Job queue/state machine.
+- File metadata and playback authorization.
+
+3. X Integration Service
+- Receives mention events (webhooks as the primary path in V1, with polling as failover/recovery).
+- Fetches parent post + article metadata/content.
+- Posts success/failure replies back to X.
+
+4. TTS Worker
+- Pulls queued jobs.
+- Calls a TTS model (e.g., Qwen3-TTS).
+- Stores audio in object storage.
+
+5. Payments (`Polar.sh`)
+- Checkout for credit packs/subscription.
+- Webhook-driven wallet top-ups.
+
+### Suggested deployment layout
+1. `frontend`: Vercel (or similar) for Next.js + API routes.
+2. `backend`: Convex deployment for DB/functions.
+3. `worker`: container on Fly.io/Render/Railway for X polling/webhooks + TTS jobs.
+4. `storage`: S3-compatible bucket (audio assets + signed URLs).
+
+## Domain Model
+
+### Core entities
+1. `users`
+- `id`, `x_user_id`, `username`, `created_at`
+
+2. `wallets`
+- `user_id`, `balance_credits`, `updated_at`
+
+3. `wallet_transactions` (append-only ledger)
+- `id`, `user_id`, `type` (`credit|debit|refund`), `amount`, `reason`, `idempotency_key`, `created_at`
+
+4. `mention_events`
+- `id`, `mention_post_id`, `mention_author_id`, `parent_post_id`, `status`, `error_code`, `created_at`
+
+5. `articles`
+- `id`, `x_article_id`, `parent_post_id`, `author_id`, `title`, `char_count`, `content_hash`, `raw_content`, `created_at`
+
+6. `audio_jobs`
+- `id`, `user_id`, `mention_event_id`, `article_id`, `status`, `credits_charged`, `tts_provider`, `tts_model`, `error`, `created_at`, `updated_at`
+
+7. `audio_assets`
+- `id`, `job_id`, `storage_key`, `duration_sec`, `size_bytes`, `codec`, `public_url_ttl`
+
+8. `audio_access_grants`
+- `id`, `audio_asset_id`, `user_id`, `granted_via` (`owner|repurchase|admin`), `credits_paid`, `created_at`
+
+9. `payment_events`
+- `id`, `provider`, `provider_event_id`, `status`, `payload_hash`, `created_at`
+
+### State machine (`audio_jobs`)
+1. `received`
+2. `validated`
+3. `priced`
+4. `charged`
+5. `synthesizing`
+6. `uploaded`
+7. `completed`
+8. `failed_refunded`
+9. `failed_not_refunded`
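+
+One way to enforce this state machine is an explicit transition table, sketched below in TypeScript. The spec lists the states but not the edges, so the table is an assumption: failures before the debit end in `failed_not_refunded` (nothing was charged), and failures after it end in `failed_refunded`.
+
+```typescript
+export type JobStatus =
+  | "received" | "validated" | "priced" | "charged" | "synthesizing"
+  | "uploaded" | "completed" | "failed_refunded" | "failed_not_refunded";
+
+// Assumed edges: pre-debit failures are not refunded (no charge), post-debit failures are.
+const TRANSITIONS: Record<JobStatus, readonly JobStatus[]> = {
+  received: ["validated", "failed_not_refunded"],
+  validated: ["priced", "failed_not_refunded"],
+  priced: ["charged", "failed_not_refunded"],
+  charged: ["synthesizing", "failed_refunded"],
+  synthesizing: ["uploaded", "failed_refunded"],
+  uploaded: ["completed", "failed_refunded"],
+  completed: [],
+  failed_refunded: [],
+  failed_not_refunded: [],
+};
+
+export function canTransition(from: JobStatus, to: JobStatus): boolean {
+  return TRANSITIONS[from].includes(to);
+}
+
+// Throws on an illegal move so a buggy worker cannot, e.g., complete a job twice.
+export function transition(from: JobStatus, to: JobStatus): JobStatus {
+  if (!canTransition(from, to)) {
+    throw new Error(`illegal audio_jobs transition: ${from} -> ${to}`);
+  }
+  return to;
+}
+```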
+
+## Event Flows
+
+### A) Mention -> audio flow
+1. X Integration receives mention event.
+2. Deduplicate by `mention_post_id` (idempotency).
+3. Resolve parent post.
+4. Check if parent has `article` field/metadata.
+5. Extract canonical article text and compute `char_count`.
+6. Compute `credits_needed`.
+7. Atomic wallet debit in Convex.
+8. Enqueue TTS job.
+9. Worker synthesizes audio and uploads file.
+10. Mark complete and reply on X with playback URL.
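+
+The flow above can be sketched as a single TypeScript handler with its side effects injected. `MentionDeps`, `handleMention`, and the outcome names are hypothetical; the in-memory `Set` stands in for a unique index on `mention_post_id`, and counting Unicode code points as `char_count` is an assumption.
+
+```typescript
+// Hypothetical ports; real versions would call the X API, Convex, and the job queue.
+export interface MentionDeps {
+  seenMentionIds: Set<string>;
+  fetchParentArticleText(parentPostId: string): Promise<string | null>; // null = not an Article
+  priceArticle(charCount: number): number | null; // null = over the hard cap
+  tryDebit(userId: string, credits: number, idempotencyKey: string): Promise<boolean>;
+  enqueueTts(job: { userId: string; parentPostId: string; credits: number }): Promise<void>;
+}
+
+export type MentionOutcome = "duplicate" | "not_article" | "too_long" | "insufficient_credits" | "queued";
+
+export async function handleMention(
+  mention: { mentionPostId: string; authorId: string; parentPostId: string },
+  deps: MentionDeps,
+): Promise<MentionOutcome> {
+  // Step 2: drop replays of the same mention.
+  if (deps.seenMentionIds.has(mention.mentionPostId)) return "duplicate";
+  deps.seenMentionIds.add(mention.mentionPostId);
+
+  // Steps 3-5: resolve the parent and extract text; no charge if it is not an Article.
+  const text = await deps.fetchParentArticleText(mention.parentPostId);
+  if (text === null) return "not_article";
+
+  // Step 6: price by character count (code points, by assumption).
+  const credits = deps.priceArticle([...text].length);
+  if (credits === null) return "too_long";
+
+  // Step 7: debit keyed by the mention ID so a retried handler cannot charge twice.
+  const debited = await deps.tryDebit(mention.authorId, credits, `mention:${mention.mentionPostId}`);
+  if (!debited) return "insufficient_credits";
+
+  // Step 8: hand off to the TTS worker; steps 9-10 happen there.
+  await deps.enqueueTts({ userId: mention.authorId, parentPostId: mention.parentPostId, credits });
+  return "queued";
+}
+```
+
+Marking the mention as seen before the debit keeps replays from charging twice, but a crash between the two would drop the mention; a production version would record the mention and its outcome in the same transaction.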
+
+### B) Payment -> credit top-up flow
+1. User checks out via Polar.
+2. Polar webhook arrives.
+3. Verify signature and deduplicate by `provider_event_id`.
+4. Append `wallet_transactions` credit event.
+5. Update wallet balance.
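+
+Both this flow and the debit in flow A depend on applying each ledger entry exactly once. A minimal in-memory TypeScript sketch of the `wallets` + `wallet_transactions` pair; the class and method names are hypothetical, and in Convex the key check, ledger append, and balance update would sit in a single mutation (assumption).
+
+```typescript
+export type TxType = "credit" | "debit" | "refund";
+export type ApplyResult = "applied" | "duplicate" | "insufficient_credits";
+
+export interface WalletTx {
+  userId: string;
+  type: TxType;
+  amount: number; // positive integer; the type decides the sign
+  reason: string;
+  idempotencyKey: string;
+}
+
+export class Wallets {
+  private balances = new Map<string, number>();
+  private seenKeys = new Set<string>();
+  readonly ledger: WalletTx[] = [];
+
+  balance(userId: string): number {
+    return this.balances.get(userId) ?? 0;
+  }
+
+  apply(tx: WalletTx): ApplyResult {
+    if (!Number.isInteger(tx.amount) || tx.amount <= 0) {
+      throw new RangeError("amount must be a positive integer");
+    }
+    // A replayed webhook or retried job carries the same key and never applies twice.
+    if (this.seenKeys.has(tx.idempotencyKey)) return "duplicate";
+    const delta = tx.type === "debit" ? -tx.amount : tx.amount;
+    const next = this.balance(tx.userId) + delta;
+    // Rejected debits do not record their key, so they can be retried after a top-up.
+    if (next < 0) return "insufficient_credits";
+    this.seenKeys.add(tx.idempotencyKey);
+    this.ledger.push(tx);
+    this.balances.set(tx.userId, next);
+    return "applied";
+  }
+}
+```
+
+A Polar top-up could use a key such as `polar:<provider_event_id>` and a failed-job refund `refund:<job_id>`; both key formats are hypothetical.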
+
+### C) Failure handling
+1. If TTS fails after the debit, create a refund transaction.
+2. If a webhook is delivered more than once, its idempotency key prevents a duplicate charge or top-up.
+3. If the parent post is not an Article, charge nothing and have the bot reply with the reason.
+
+### D) Shared link access flow
+1. User opens playback URL.
+2. If owner or already in `audio_access_grants`, allow stream.
+3. If unauthenticated, redirect to auth.
+4. If authenticated without grant, charge `credits_charged` from original job.
+5. On successful debit, create `audio_access_grants` entry and allow stream.
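+
+The access rules in this flow reduce to one decision function, sketched in TypeScript below. The input shape and names are hypothetical; the debit and grant creation for `requires_unlock` would happen in `POST /api/audio/:id/unlock`, ideally in one transaction.
+
+```typescript
+export type AccessDecision =
+  | { kind: "stream" }
+  | { kind: "redirect_to_auth" }
+  | { kind: "requires_unlock"; credits: number };
+
+// Hypothetical inputs: the asset owner, the original job's credits_charged,
+// and the user IDs holding an audio_access_grants row for this asset.
+export function decidePlaybackAccess(
+  viewerId: string | null,
+  asset: { ownerId: string; creditsCharged: number; grantedUserIds: ReadonlySet<string> },
+): AccessDecision {
+  if (viewerId === null) return { kind: "redirect_to_auth" };
+  if (viewerId === asset.ownerId || asset.grantedUserIds.has(viewerId)) return { kind: "stream" };
+  // Non-owners pay the same amount the owner was originally charged.
+  return { kind: "requires_unlock", credits: asset.creditsCharged };
+}
+```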
+
+## X API Strategy
+
+### Mention ingestion
+1. V1 recommendation: Account Activity / Activity webhooks as the primary ingestion path.
+2. Verify webhook signatures and support CRC challenge where required.
+3. Keep polling `/2/users/{id}/mentions` as failover/recovery if webhook delivery degrades.
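+
+For the polling fallback, a request builder along these lines can resume after the newest mention already processed via `since_id`. `buildMentionsPollUrl` is a hypothetical helper; the query parameters are standard X API v2 timeline parameters, but field availability depends on the access tier and should be confirmed.
+
+```typescript
+// Builds the recovery-poll URL for GET /2/users/{id}/mentions.
+// referenced_tweets lets the handler locate each mention's parent post.
+export function buildMentionsPollUrl(botUserId: string, sinceId?: string, paginationToken?: string): string {
+  const url = new URL(`https://api.x.com/2/users/${encodeURIComponent(botUserId)}/mentions`);
+  url.searchParams.set("max_results", "100");
+  url.searchParams.set("tweet.fields", "author_id,created_at,referenced_tweets");
+  url.searchParams.set("expansions", "referenced_tweets.id");
+  if (sinceId) url.searchParams.set("since_id", sinceId);
+  if (paginationToken) url.searchParams.set("pagination_token", paginationToken);
+  return url.toString();
+}
+```
+
+The response's `meta.newest_id` is the natural value to persist as the next `since_id`.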
+
+### Parent/article resolution
+1. Read mention's `referenced_tweets` to locate parent post ID.
+2. Fetch parent post with `tweet.fields=article,...`.
+3. If `article` field exists, use article metadata/content path.
+4. If the article content is only partially available, run a resolver fallback (follow the canonical URL and extract the body).
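+
+Steps 1 and 3 reduce to two small checks on the API response. The `XPost` type below models only the fields used here; the internal shape of the `article` object is not modelled (assumption), so the check only tests for its presence.
+
+```typescript
+export interface ReferencedTweet {
+  type: "retweeted" | "quoted" | "replied_to";
+  id: string;
+}
+
+export interface XPost {
+  id: string;
+  text: string;
+  referenced_tweets?: ReferencedTweet[];
+  article?: unknown; // present only when the post carries an Article payload
+}
+
+// The parent is the post the mention replies to (not a quoted or retweeted post).
+export function parentPostId(mention: XPost): string | null {
+  return mention.referenced_tweets?.find((r) => r.type === "replied_to")?.id ?? null;
+}
+
+export function isArticlePost(post: XPost): boolean {
+  return post.article !== undefined && post.article !== null;
+}
+```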
+
+### Required robustness
+1. Replay-safe processing.
+2. Backoff + retry on `429` and `5xx`.
+3. Strict rate-limit budget per minute.
+4. Idempotent `mention_post_id` + `parent_post_id` dedupe keys.
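+
+A sketch of the retry rule in TypeScript: capped exponential backoff with full jitter for `429` and `5xx`, preferring X's `x-rate-limit-reset` header (a Unix timestamp in seconds) on `429`. The base delay, cap, and attempt limit are placeholder values, not tuned numbers.
+
+```typescript
+export function isRetryable(status: number): boolean {
+  return status === 429 || (status >= 500 && status <= 599);
+}
+
+// Full jitter: a uniform delay in [0, min(capMs, baseMs * 2^attempt)).
+export function backoffMs(attempt: number, baseMs = 500, capMs = 60_000, random: () => number = Math.random): number {
+  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
+  return Math.floor(random() * ceiling);
+}
+
+// On 429, wait until x-rate-limit-reset when the header is present and in the future.
+export function retryDelayMs(status: number, attempt: number, resetHeader: string | null, nowMs: number, random: () => number = Math.random): number {
+  if (status === 429 && resetHeader !== null) {
+    const resetMs = Number(resetHeader) * 1000;
+    if (Number.isFinite(resetMs) && resetMs > nowMs) return resetMs - nowMs;
+  }
+  return backoffMs(attempt, 500, 60_000, random);
+}
+
+export async function fetchWithRetry(url: string, init: RequestInit = {}, maxAttempts = 5): Promise<Response> {
+  for (let attempt = 0; ; attempt++) {
+    const res = await fetch(url, init);
+    if (!isRetryable(res.status) || attempt + 1 >= maxAttempts) return res;
+    const delay = retryDelayMs(res.status, attempt, res.headers.get("x-rate-limit-reset"), Date.now());
+    await new Promise((resolve) => setTimeout(resolve, delay));
+  }
+}
+```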
+
+## Credit and Pricing Design
+
+### Configurable credit policy
+Store in Convex config:
+- `included_chars`
+- `base_credits`
+- `step_chars`
+- `step_credits`
+- `max_chars_per_article`
+
+### Anti-abuse controls
+1. Max jobs per user/day.
+2. Max chars/article.
+3. Cooldown per mention author.
+4. Deny-list and spam heuristics.
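+
+Controls 1 and 3 can be sketched as an in-memory guard. `AbuseGuard`, its default limits, and the use of UTC calendar days are assumptions; in production the counters would live in Convex so every worker sees the same state.
+
+```typescript
+export class AbuseGuard {
+  private readonly cooldownMs: number;
+  private readonly maxJobsPerDay: number;
+  private lastMentionAt = new Map<string, number>();
+  private jobsPerDay = new Map<string, number>(); // key: `${userId}:${YYYY-MM-DD}` (UTC)
+
+  constructor(cooldownMs = 60_000, maxJobsPerDay = 20) {
+    this.cooldownMs = cooldownMs;
+    this.maxJobsPerDay = maxJobsPerDay;
+  }
+
+  // Returns a rejection reason, or null after recording the accepted request.
+  check(userId: string, nowMs: number): "cooldown" | "daily_limit" | null {
+    const last = this.lastMentionAt.get(userId);
+    if (last !== undefined && nowMs - last < this.cooldownMs) return "cooldown";
+    const dayKey = `${userId}:${new Date(nowMs).toISOString().slice(0, 10)}`;
+    const count = this.jobsPerDay.get(dayKey) ?? 0;
+    if (count >= this.maxJobsPerDay) return "daily_limit";
+    this.lastMentionAt.set(userId, nowMs);
+    this.jobsPerDay.set(dayKey, count + 1);
+    return null;
+  }
+}
+```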
+
+## API Surface (internal)
+
+### Web app -> backend
+1. `GET /api/jobs/:id`
+2. `GET /api/me/wallet`
+3. `POST /api/payments/create-checkout`
+4. `POST /api/audio/:id/unlock` (debit credits equal to original generation)
+
+### External webhooks
+1. `POST /api/webhooks/x` (mention events + CRC challenge support)
+2. `POST /api/webhooks/polar`
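+
+For `POST /api/webhooks/x`, X's Account Activity API signs both the CRC response and each delivery as `sha256=` plus the base64 HMAC-SHA256 of the message, keyed with the app's consumer secret. A minimal Node sketch, assuming the raw request body is available as a string; the helper names are hypothetical.
+
+```typescript
+import { createHmac, timingSafeEqual } from "node:crypto";
+
+function xSignature(consumerSecret: string, message: string): string {
+  return "sha256=" + createHmac("sha256", consumerSecret).update(message).digest("base64");
+}
+
+// CRC challenge (GET ?crc_token=...): respond with this JSON body.
+export function crcResponse(crcToken: string, consumerSecret: string): { response_token: string } {
+  return { response_token: xSignature(consumerSecret, crcToken) };
+}
+
+// Delivery check: compare the x-twitter-webhooks-signature header in constant time.
+export function isValidXSignature(rawBody: string, header: string | null, consumerSecret: string): boolean {
+  if (header === null) return false;
+  const expected = Buffer.from(xSignature(consumerSecret, rawBody));
+  const received = Buffer.from(header);
+  // timingSafeEqual throws on unequal lengths, so check length first.
+  return expected.length === received.length && timingSafeEqual(expected, received);
+}
+```
+
+Polar signs its webhooks with a different scheme, so those need separate verification.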
+
+### Worker-only actions
+1. `POST /internal/jobs/:id/start`
+2. `POST /internal/jobs/:id/complete`
+3. `POST /internal/jobs/:id/fail`
+
+## Security and Compliance
+
+1. Verify signatures for X and Polar webhooks.
+2. Store only needed article data; optionally strip raw text after audio generation.
+3. Encrypt secrets and use least-privilege API keys.
+4. Keep an audit trail for wallet and job state changes.
+5. Include takedown/delete endpoint for generated assets.
+6. Enforce signed, short-lived playback URLs backed by access checks.
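+
+Item 6 can be implemented at the app layer with an HMAC-signed, expiring URL, sketched below. This is an assumed design, separate from the bucket's own presigned URLs: the access check (owner or grant) runs before the URL is issued, and the signature only makes the issued link tamper-evident and short-lived. The `/audio/:id` path and the `u`, `exp`, `sig` parameters are hypothetical.
+
+```typescript
+import { createHmac, timingSafeEqual } from "node:crypto";
+
+function mac(secret: string, assetId: string, userId: string, expiresAt: number): string {
+  return createHmac("sha256", secret).update(`${assetId}\n${userId}\n${expiresAt}`).digest("base64url");
+}
+
+export function signPlaybackUrl(base: string, assetId: string, userId: string, nowMs: number, ttlSec: number, secret: string): string {
+  const expiresAt = Math.floor(nowMs / 1000) + ttlSec; // Unix seconds
+  const url = new URL(`/audio/${encodeURIComponent(assetId)}`, base);
+  url.searchParams.set("u", userId);
+  url.searchParams.set("exp", String(expiresAt));
+  url.searchParams.set("sig", mac(secret, assetId, userId, expiresAt));
+  return url.toString();
+}
+
+// Returns the asset and viewer when the link is intact and unexpired, else null.
+export function verifyPlaybackUrl(rawUrl: string, nowMs: number, secret: string): { assetId: string; userId: string } | null {
+  const url = new URL(rawUrl);
+  const match = /^\/audio\/([^/]+)$/.exec(url.pathname);
+  const userId = url.searchParams.get("u");
+  const sig = url.searchParams.get("sig");
+  const exp = Number(url.searchParams.get("exp"));
+  if (!match || userId === null || sig === null || !Number.isInteger(exp)) return null;
+  if (exp * 1000 < nowMs) return null; // expired
+  const assetId = decodeURIComponent(match[1]);
+  const expected = Buffer.from(mac(secret, assetId, userId, exp));
+  const received = Buffer.from(sig);
+  if (expected.length !== received.length || !timingSafeEqual(expected, received)) return null;
+  return { assetId, userId };
+}
+```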
+
+## Retention Policy (recommended)
+
+1. Raw article text: retain for `24 hours`, then delete and keep only hash/metadata.
+2. Generated audio files: retain for `90 days` from last play.
+3. Access grants and financial ledger: retain indefinitely for audit.
+4. Soft-delete on user request where legally required; hard-delete binaries on retention expiry.
+5. All retention values are configurable in backend settings.
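+
+The retention rules reduce to two age checks, sketched below with the defaults as configurable values. Names are hypothetical, and the audio check assumes a last-played timestamp, which the `audio_assets` entity does not currently list; audio that has never been played is measured from creation (assumption).
+
+```typescript
+export interface RetentionConfig {
+  rawTextMs: number;   // default: 24 hours after the article row is created
+  audioIdleMs: number; // default: 90 days after the last play
+}
+
+export const DEFAULT_RETENTION: RetentionConfig = {
+  rawTextMs: 24 * 60 * 60 * 1000,
+  audioIdleMs: 90 * 24 * 60 * 60 * 1000,
+};
+
+export function shouldPurgeRawText(createdAtMs: number, nowMs: number, cfg: RetentionConfig = DEFAULT_RETENTION): boolean {
+  return nowMs - createdAtMs >= cfg.rawTextMs;
+}
+
+export function shouldDeleteAudio(
+  createdAtMs: number,
+  lastPlayedAtMs: number | null,
+  nowMs: number,
+  cfg: RetentionConfig = DEFAULT_RETENTION,
+): boolean {
+  return nowMs - (lastPlayedAtMs ?? createdAtMs) >= cfg.audioIdleMs;
+}
+```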
+
+## Observability
+
+### Metrics
+1. Mention ingest rate and failures.
+2. Article validation fail rate.
+3. Queue depth + job latency percentiles.
+4. TTS success/failure by model.
+5. Credit debit/refund mismatch.
+
+### Alerts
+1. Webhook 5xx spikes.
+2. Consecutive TTS failures.
+3. Wallet transaction imbalance.
+4. Rising `429` from X.
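+
+The wallet-imbalance alert can be driven by recomputing each balance from the append-only ledger and comparing it with `wallets.balance_credits`. A minimal TypeScript sketch with hypothetical names:
+
+```typescript
+export interface LedgerRow {
+  userId: string;
+  type: "credit" | "debit" | "refund";
+  amount: number;
+}
+
+// Returns every user whose stored balance differs from the ledger-derived balance.
+export function findWalletImbalances(
+  ledger: LedgerRow[],
+  balances: Map<string, number>,
+): Map<string, { stored: number; derived: number }> {
+  const derived = new Map<string, number>();
+  for (const row of ledger) {
+    const delta = row.type === "debit" ? -row.amount : row.amount;
+    derived.set(row.userId, (derived.get(row.userId) ?? 0) + delta);
+  }
+  const mismatches = new Map<string, { stored: number; derived: number }>();
+  for (const userId of new Set([...derived.keys(), ...balances.keys()])) {
+    const stored = balances.get(userId) ?? 0;
+    const fromLedger = derived.get(userId) ?? 0;
+    if (stored !== fromLedger) mismatches.set(userId, { stored, derived: fromLedger });
+  }
+  return mismatches;
+}
+```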
+
+## Implementation Plan
+
+### Phase 1 (Week 1-2): Core backend + wallet
+1. Convex schema, wallet ledger, Polar webhook.
+2. Auth and minimal dashboard.
+3. Playback ACL with owner-only access.
+
+### Phase 2 (Week 3-4): Mention bot MVP
+1. X webhook ingestion + parent resolution + "not an Article" reply path.
+2. Credit charging + async TTS + audio storage + owner grant creation.
+3. Bot reply with playback URL.
+
+### Phase 3 (Week 5-6): Hardening
+1. Polling failover/recovery path (augment webhooks).
+2. Retries, backoff, idempotency audits.
+3. Shared-link repurchase flow, observability, and admin tooling.
+
+## Estimated Costs
+
+### 1) One-time build effort
+1. MVP (mention bot + credits + payments + owner-only playback): `220-320 engineer hours`.
+2. Hardening + shared-link repurchase + observability: `90-160 engineer hours`.
+
+Total: `310-480 engineer hours`.
+
+### 2) Ongoing monthly operating costs
+Use this formula:
+
+```text
+monthly_cost =
+  X_api_cost
+  + TTS_cost_per_char * total_chars
+  + storage_egress_cost
+  + hosting_cost
+  + payment_fees
+```
+
+Practical baseline for early stage (excluding X API variability):
+1. TTS: scales linearly with characters (dominant variable cost).
+2. Hosting + backend: low double-digits to low hundreds USD/month.
+3. Storage: usually modest unless retention is long and playback volume is high.
+4. Polar fees: percentage + fixed per successful payment.
+
+### 3) Example unit economics (replace with real rates)
+Assumptions:
+1. Average article length: `25,000` chars.
+2. TTS rate placeholder: `$0.12 / 10,000 chars`.
+3. Variable infra/storage per generated audio: `$0.005`.
+4. X API per-job variable cost placeholder: `$x_api_job_cost`.
+
+Estimated per completed audio:
+
+```text
+tts_cost = 25,000 / 10,000 * 0.12 = $0.30
+infra_cost = $0.005
+total_cost_per_audio ~= $0.305 + x_api_job_cost
+```
+
+If one credit pack gives `10` base articles and sells at `$9.99`, target:
+1. Gross margin after payment fees > `60%`.
+2. Credit policy tuned so average credits consumed aligns with this margin.
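+
+The same arithmetic as a small TypeScript sketch, so the margin target can be rechecked once real rates are known. It assumes every credit in the pack is redeemed for one base-length article; the fee percentage and fixed fee are inputs, not Polar's actual rates.
+
+```typescript
+// Per-audio variable cost: TTS priced per 10,000 chars, plus infra and X API placeholders.
+export function costPerAudio(chars: number, ttsUsdPer10kChars: number, infraUsd: number, xApiUsd: number): number {
+  return (chars / 10_000) * ttsUsdPer10kChars + infraUsd + xApiUsd;
+}
+
+// Gross margin of one pack after payment fees, as a fraction of the pack price.
+export function packGrossMargin(opts: {
+  packPriceUsd: number;
+  articlesPerPack: number;
+  costPerAudioUsd: number;
+  feePct: number;      // fraction, e.g. 0.05 for 5% (placeholder)
+  feeFixedUsd: number; // fixed fee per successful payment (placeholder)
+}): number {
+  const fees = opts.packPriceUsd * opts.feePct + opts.feeFixedUsd;
+  const cogs = opts.articlesPerPack * opts.costPerAudioUsd;
+  return (opts.packPriceUsd - fees - cogs) / opts.packPriceUsd;
+}
+```
+
+With the example numbers, the X API cost set to zero, and no fees, the 10-article pack keeps about 69% (`(9.99 - 3.05) / 9.99`). Holding the `60%` target therefore leaves about `$0.95` per pack for payment fees plus X API costs.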
+
+## Resolved Defaults
+
+1. Credit policy defaults (`25k included`, `+1 / 10k`, `max 120k`) with backend configurability.
+2. Unlock policy: permanent access grants after payment (no expiry).
+3. Retention defaults: `24h` raw text, `90d` audio from last play, ledger/grants retained for audit.
+4. No region/content restrictions configured initially.
+5. UI framework: Tailwind CSS + `daisyUI`, implemented mobile-first.
+
+## Repo status
+
+This repository currently contains only this README and `spec.md` (the original product brief); no implementation code exists yet. The README turns that brief into a concrete architecture and delivery plan to implement.
--- a/spec.md
+++ b/spec.md
@@ -0,0 +1,10 @@
+# X article to audio spec
+ 
+My idea is to create a website that allows people to take Articles on X and convert them into audiobooks that people can listen to. The idea is that people can open the website, log in, and top up credits, and with these credits they can generate a certain number of audiobooks.
+
+They can then go under Articles on X and call the bot, or open the website and paste in the Article link. In the background, the site will use a TTS model like Qwen3-TTS to create an audio clip that they can listen to.
+
+I'm thinking of having it be a PWA that uses Convex for the backend, with auth set up through X OAuth. I don't know how X bots work or how much this would cost.
+
+I'm thinking of using polar.sh for payments.
+