A/B experiments let you trial different agent personas against each other on real visitor traffic and pick the winner based on engagement, lead capture, or conversation length. Pitchbar's assignment is sticky: the same visitor always sees the same variant, so the comparison stays fair.

Where it lives

Open any agent → A/B experiments tab in the agent nav (Beaker icon). URL: /app/agents/{id}/experiments. Owners and Admins can create / start / stop experiments; Editors can't.

Creating an experiment

Click New experiment.
Pick a kind:
- persona: variants override the agent's name + tone in the system prompt for assigned visitors. Use this to test "Aria" vs "Max", "friendly" vs "punchy", etc.
- cta: variants are assigned and recorded for measurement, but they don't yet alter the runtime CTA payload. Full runtime application ships in a future release.
- trigger: same as cta; assignment is recorded for measurement only.
Add at least two variants. The default is control and treatment at 50/50. You can rename variants and change their weights; any positive integers work because weights are normalised (for example, 3 and 1 gives a 75/25 split).
For persona kind, each variant's config JSON should hold a persona object:
```
{ "persona": { "name": "Aria", "tone": "warm and concise" } }
```
The widget renders the variant's persona name in the chat header, and the LLM speaks under that name + tone for the assigned conversation.
Save. Status starts at draft — no visitors are assigned yet.
Click Start. Status flips to running. From then on, visitors are bucketed on the first message of each new conversation.
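
The weight and config rules in the steps above can be sanity-checked before you save. The following is a minimal sketch, not Pitchbar's own validation: the function names are hypothetical, and treating persona.name as required is an assumption based on the examples in this article.

```python
import json


def normalise_weights(variants):
    """Turn positive integer weights into traffic shares that sum to 1."""
    for v in variants:
        if not isinstance(v["weight"], int) or v["weight"] <= 0:
            raise ValueError(f"weight for {v['name']!r} must be a positive integer")
    total = sum(v["weight"] for v in variants)
    return {v["name"]: v["weight"] / total for v in variants}


def check_persona_config(raw_json):
    """Parse a variant's config JSON and return its persona object.

    Assumption: a persona variant needs at least a name; tone is optional.
    """
    config = json.loads(raw_json)
    persona = config.get("persona")
    if not isinstance(persona, dict) or not persona.get("name"):
        raise ValueError("persona variants need a persona object with a name")
    return persona
```

For example, normalise_weights on weights 3 and 1 yields shares of 0.75 and 0.25, and check_persona_config raises if the config JSON has no persona object.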

How assignment works

On the first message of a conversation, the MessageStreamController calls ExperimentResolver::resolveForConversation:

If conversation.variant_id is already set, use it (sticky).
Otherwise, look up the most recently started running experiment for this agent. Only one experiment is active per agent at a time: if you start a second while the first is still running, the resolver picks the most recently started one. To run a different kind, stop the previous experiment first.
Hash (visitor_id + experiment_id) into a bucket on the weighted variant list (Assigner). The same visitor returning days later lands in the same variant — the assignment row is durable.
Persist conversation.variant_id. Every future turn for this conversation reads the same variant.
For kind = persona, the variant's config[persona] overrides the agent's default persona in PromptBuilder::build each time a turn's prompt is built.
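
The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual ExperimentResolver or Assigner code: the SHA-256 hash, the ":" separator between the two IDs, the dict shapes, and the per-field persona overlay are all choices made for the example.

```python
import hashlib


def assign_variant(visitor_id, experiment_id, variants):
    """Deterministically pick a variant for a visitor, honouring weights.

    `variants` is a list of dicts with "id" and a positive integer "weight".
    """
    total = sum(v["weight"] for v in variants)
    key = f"{visitor_id}:{experiment_id}".encode("utf-8")
    # Use the first 8 bytes of the digest as an integer, reduced to [0, total).
    point = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % total
    cumulative = 0
    for variant in variants:
        cumulative += variant["weight"]
        if point < cumulative:
            return variant
    raise AssertionError("unreachable when all weights are positive")


def resolve_for_conversation(conversation, visitor_id, experiment):
    """Return the conversation's variant id, assigning one only if needed."""
    if conversation.get("variant_id") is not None:
        return conversation["variant_id"]        # sticky: already assigned
    if experiment is None:
        return None                              # nothing running for this agent
    variant = assign_variant(visitor_id, experiment["id"], experiment["variants"])
    conversation["variant_id"] = variant["id"]   # persist for future turns
    return variant["id"]


def effective_persona(agent_persona, variant_config):
    """Overlay the variant's persona fields on the agent's default persona.

    Assumption: fields the variant omits fall back to the agent's defaults.
    """
    return {**agent_persona, **variant_config.get("persona", {})}
```

Because the bucket depends only on the two IDs and the weights, the same visitor lands in the same variant on every call, with no lookup required. Changing the weights of a running experiment would move some visitors between buckets in this sketch, which is one reason to persist the assignment rather than recompute it.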

Seeing it in action

The fastest way to confirm the wiring:

Create a persona experiment with two clearly different variants — e.g. { "persona": { "name": "Helpfulbot" } } vs { "persona": { "name": "Snarkbot" } }.
Start the experiment.
Open your widget in two different browsers (or one normal + one incognito — different cookies = different visitor_id).
Ask the same question in each. The chat panel header should read "Helpfulbot" in one and "Snarkbot" in the other, and the answers should sound noticeably different.
Open /admin/conversations and confirm each conversation row has a variant_id stamped.

Measuring results

Everything you need is persisted: experiment_assignments rows, plus the variant_id on each assigned conversations row. Join these against variants, messages, and leads for whatever analysis you want, for example:

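-- Leads are counted per conversation first, so a conversation with
-- several leads is not repeated in the join and does not skew avg_messages.
SELECT v.name,
       COUNT(c.id) AS conversations,
       COALESCE(SUM(l.lead_count), 0) AS leads_captured,
       AVG(c.message_count) AS avg_messages
FROM variants v
LEFT JOIN conversations c ON c.variant_id = v.id
LEFT JOIN (
    SELECT conversation_id, COUNT(*) AS lead_count
    FROM leads
    GROUP BY conversation_id
) l ON l.conversation_id = c.id
WHERE v.experiment_id = '...'
GROUP BY v.name;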

A built-in stats panel inside /app/agents/{id}/experiments is on the roadmap. For now, run the query manually: on the self-host build you need a workspace API token and /api/v1/db read access; otherwise, ask support.
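
To try this kind of per-variant report without touching production data, here is a small sketch against an in-memory SQLite database. The table and column names follow the query in this section, but the column types, the sample rows, and the helper names are made up for the example; your real schema may differ. Leads are pre-aggregated per conversation so that a conversation with several leads is not repeated in the join.

```python
import sqlite3

QUERY = """
SELECT v.name,
       COUNT(c.id)                    AS conversations,
       COALESCE(SUM(l.lead_count), 0) AS leads_captured,
       AVG(c.message_count)           AS avg_messages
FROM variants v
LEFT JOIN conversations c ON c.variant_id = v.id
LEFT JOIN (
    SELECT conversation_id, COUNT(*) AS lead_count
    FROM leads
    GROUP BY conversation_id
) l ON l.conversation_id = c.id
WHERE v.experiment_id = ?
GROUP BY v.name
ORDER BY v.name;
"""


def variant_stats(conn, experiment_id):
    """Run the per-variant report and add a leads-per-conversation ratio."""
    rows = conn.execute(QUERY, (experiment_id,)).fetchall()
    return [
        {
            "variant": name,
            "conversations": n,
            "leads": leads,
            "avg_messages": avg,
            "leads_per_conversation": (leads / n) if n else None,
        }
        for name, n, leads, avg in rows
    ]


def demo_connection():
    """Build a throwaway database with a hypothetical minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE variants (id INTEGER PRIMARY KEY, experiment_id TEXT, name TEXT);
        CREATE TABLE conversations (id INTEGER PRIMARY KEY, variant_id INTEGER, message_count INTEGER);
        CREATE TABLE leads (id INTEGER PRIMARY KEY, conversation_id INTEGER);
        INSERT INTO variants VALUES (1, 'exp-1', 'control'), (2, 'exp-1', 'treatment');
        INSERT INTO conversations VALUES (10, 1, 4), (11, 1, 6), (12, 2, 8);
        INSERT INTO leads VALUES (100, 10), (101, 12), (102, 12);
    """)
    return conn
```

Calling variant_stats(demo_connection(), "exp-1") returns one dict per variant. With samples this small the numbers mean nothing; on real traffic, check that each variant has enough conversations before declaring a winner.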

Stop / delete

Stop sets status to stopped and flushes the running-experiment cache so new conversations immediately stop getting assigned. Existing conversations keep their assigned variant for consistency in mid-flight chats.

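Delete hard-deletes the experiment row, and its variant rows are deleted by cascade. Existing conversations.variant_id values become foreign-key orphans. That's intentional: the ID on each conversation remains as a historical record of which variant it got, even after the experiment is gone. Because the variant rows no longer exist, queries that join conversations to variants (like the one above) return nothing for a deleted experiment, so run your analysis before deleting.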