TweetSim v1 roadmap: what ships, what's deferred, what we won't build
Public roadmap for TweetSim v1, written the day after launch. Three sections: what ships in v1, what's deferred to v2+, what we won't build at all.
Most product roadmaps lie about timelines. This one is honest about what's done, what's next, and what we'll deliberately skip — even if customers ask for it.
What ships in v1 (next 4 weeks)
Stripe billing + paid tiers
The Free / Builder / Pro tiers shown on the landing page need actual Stripe Checkout + Customer Portal wiring. Quotas need to be enforced at the API gateway layer, not in the engine itself. Estimated 3-5 days of work, blocked on a decision between Stripe Checkout Sessions and Stripe Elements.
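Whichever way that decision goes, the Checkout path looks roughly like this. A minimal sketch with the official stripe Python library; the price IDs and URLs are placeholders, not real dashboard values:

```python
import os
import stripe

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]

# Placeholder price IDs -- the real ones come from the Stripe dashboard.
PRICE_IDS = {"builder": "price_XXXbuilder", "pro": "price_XXXpro"}

def checkout_url(tier: str, email: str) -> str:
    """Create a subscription Checkout Session for a paid tier; return the redirect URL."""
    session = stripe.checkout.Session.create(
        mode="subscription",
        customer_email=email,
        line_items=[{"price": PRICE_IDS[tier], "quantity": 1}],
        success_url="https://example.com/billing/success",
        cancel_url="https://example.com/pricing",
    )
    return session.url

def portal_url(customer_id: str) -> str:
    """Create a Customer Portal session so a subscriber can manage their plan."""
    session = stripe.billing_portal.Session.create(
        customer=customer_id,
        return_url="https://example.com/account",
    )
    return session.url
```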
Trained view-curve model
Today the view-curve runs on a generic Gompertz prior. The Render cron is collecting metric snapshots every 30 minutes via RapidAPI; in ~24 hours we'll have enough time-spread data to fit a real per-account model. The fitter is built (fit_view_curve.py); just needs the data + a VIEW_CURVE_MODEL_PATH env var pointing at the trained JSON.
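For the curious, this is roughly what the fitter does. A minimal sketch with scipy, assuming one common Gompertz parameterization; the actual fit_view_curve.py may use a different form and a different JSON shape:

```python
import json
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, k, b, c):
    """Cumulative views at t hours: k = view ceiling, b = displacement, c = growth rate."""
    return k * np.exp(-b * np.exp(-c * t))

def fit_account_curve(hours, views):
    """Fit per-account parameters from the 30-minute snapshot series."""
    p0 = [views.max() * 1.5, 3.0, 0.2]  # rough starting point
    (k, b, c), _cov = curve_fit(gompertz, hours, views, p0=p0, maxfev=10_000)
    return {"k": float(k), "b": float(b), "c": float(c)}

# Synthetic example: one tweet's snapshots, 30 minutes apart over a day
hours = np.arange(0.5, 24.5, 0.5)
views = gompertz(hours, 12_000, 4.0, 0.3) + np.random.default_rng(0).normal(0, 50, hours.size)
with open("view_curve_model.json", "w") as f:
    json.dump(fit_account_curve(hours, views), f)  # what VIEW_CURVE_MODEL_PATH would point at
```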
X OAuth + per-user persona
Right now there's one persona (mine). For v1, paid users will connect their X account, the system will pull their last ~200 tweets, run them through the persona-ingest pipeline (PRD_PERSON_INGEST_BRAND_CONTEXT.md), and use the derived brand brief / offer / CTA library to steer generation.
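The fetch side is the easy part. A sketch using tweepy as the client (an assumption; any X API v2 client works), with pagination because the API caps each page at 100 tweets:

```python
import tweepy

# The user access token comes from the OAuth flow when they connect their X account.
client = tweepy.Client("USER_ACCESS_TOKEN")

def pull_recent_tweets(user_id: str, limit: int = 200) -> list[str]:
    """Fetch the user's last ~200 original tweets for the persona-ingest pipeline."""
    pages = tweepy.Paginator(
        client.get_users_tweets,
        id=user_id,
        exclude=["retweets", "replies"],  # persona should reflect original posts
        max_results=100,                  # per-page cap in the v2 API
    )
    return [tweet.text for tweet in pages.flatten(limit=limit)]
```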
Stats dashboard
Predicted-vs-actual scatter plots per published tweet. Calibration scorecard with traffic-light status (Pearson r > 0.5 green, 0.25-0.5 yellow, < 0.25 red). Top-K winners and mispredictions surfaced. This is mostly UI work on top of the existing tweet_stats.py CLI, which already has the queries.
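The traffic-light mapping itself is tiny; the thresholds above are the whole spec. A sketch of the check (the function name is mine, not from tweet_stats.py):

```python
from scipy.stats import pearsonr

def calibration_status(predicted: list[float], actual: list[float]) -> tuple[float, str]:
    """Pearson r between predicted and actual engagement, mapped to the scorecard color."""
    r, _p = pearsonr(predicted, actual)
    if r > 0.5:
        return r, "green"
    if r >= 0.25:
        return r, "yellow"
    return r, "red"
```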
Short-tweet-specific composite scorer
The current composite scorer was designed for 60-220 word long-form posts. When run on single-line tweets, it penalizes uniformly for clarity / dwell, compressing scores into a tight 38-45 band. A purpose-built scorer for the short-tweet path will widen the differentiation. Should be a 1-2 day refactor.
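To make the fix concrete, here's the rough shape of it, with made-up dimension names and weights; the real scorer's dimensions will differ:

```python
# Illustrative dimensions and weights only -- not the shipped scorer.
LONG_FORM_WEIGHTS = {"hook": 0.20, "clarity": 0.25, "dwell": 0.25, "cta": 0.15, "novelty": 0.15}
SHORT_TWEET_WEIGHTS = {"hook": 0.40, "novelty": 0.30, "cta": 0.20, "clarity": 0.10}  # dwell dropped

def composite(dim_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum over whichever dimensions apply to this format (0-100 scale)."""
    return sum(dim_scores[d] * w for d, w in weights.items())

def score(dim_scores: dict[str, float], word_count: int) -> float:
    # Route single-line tweets to the short-form weights so the clarity/dwell
    # penalties stop compressing everything into the same narrow band.
    weights = SHORT_TWEET_WEIGHTS if word_count < 60 else LONG_FORM_WEIGHTS
    return composite(dim_scores, weights)
```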
What's deferred to v2+
Audience panel UI improvements
The 6-archetype panel works in the engine but isn't surfaced in the UI yet. Each draft would show a bar chart per archetype: "the indie founder loves it, the casual lurker skips it." Useful but not critical — deferred until v1.5.
Topic-cluster amplification model
Phase 2 of the view-curve roadmap: detect when your tweet would land in a trending topic cluster and adjust the predicted ceiling upward. Requires real-time topic-trend ingestion from X (or a scraper). Useful but expensive in API costs. v2 territory.
Reply-tree depth modeling
Long reply chains drive disproportionate impressions. Modeling them requires a secondary scrape to count reply-of-reply depth. Real signal, but additive — current view-curve already handles the tweet-level reach. v2.
Multi-account workspace billing
Agency feature. Manage 5+ accounts under one subscription. Real demand but adds workspace + role management complexity. v2 if Builder tier shows demand for it.
What we won't build (even if asked)
An LLM "tweet quality vibe-check" mode
A single button that says "use Claude to grade this tweet." We already have one, the audience panel, but it's framed as per-archetype reactions, not a single score. The reason: a single "LLM thinks this is a 7/10" is unfalsifiable, doesn't backtest, and trains users to outsource judgment. The whole product premise is that scores should be measurable, calibratable, and verifiable against real engagement.
Faked or composite case studies
We launched today. Real case studies need real, attributable customers using the product over weeks. Until then, the methodology page is the proof — the engine is open source, the math is published, you can verify the demo in 30 seconds. When real customers want to share results, we'll publish them with names attached, not as "a Fortune 500 used TweetSim and saw 47% lift."
"Algorithm hacks" content
No "10 secrets the X algorithm doesn't want you to know" blog posts. The algorithm has documented characteristics. We model what's public. The honest version of those posts is already on the methodology page.
Tweet-thread autoscheduling at high cadence
The cadence guardrail in StrategyThread refuses to post more than 4 threads per UTC day with a 4-hour floor between posts. Reason: flooding the feed tanks per-tweet reach (the algorithm normalizes attention across your recent activity). We will not build a "post 50 tweets a day" mode. There's no version of that which doesn't hurt the user.
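For reference, the guardrail is simple enough to sketch in a few lines. This mirrors the rule as stated; the actual StrategyThread code may track more state:

```python
from datetime import datetime, timedelta, timezone

MAX_THREADS_PER_UTC_DAY = 4
MIN_GAP = timedelta(hours=4)

def may_post(recent: list[datetime], now: datetime | None = None) -> bool:
    """True if posting now stays within the cadence guardrail (timestamps tz-aware)."""
    now = now or datetime.now(timezone.utc)
    posted_today = [t for t in recent if t.astimezone(timezone.utc).date() == now.date()]
    if len(posted_today) >= MAX_THREADS_PER_UTC_DAY:
        return False
    if recent and now - max(recent) < MIN_GAP:
        return False
    return True
```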
Fake follower tools, mass DM, engagement pods
Out of scope on principle. These tools compress your account's long-term reach in exchange for short-term metrics. The whole product premise is honest pre-publish signal.
What we'll publish weekly
While we're building v1, the blog will publish:
- Calibration scorecard updates — actual Pearson r per action against my real engagement, every Monday
- Why-did-this-tweet-flop write-ups — pick a tweet that scored 70+ but flopped, audit what the model missed
- Why-did-this-tweet-pop write-ups — same, in reverse
- Engine release notes — when phoenix weights or composite dimensions change
That's the build-in-public commitment. Ship the engine, publish the calibration, fix what doesn't work, repeat.