A/B experiments let you trial different agent personas against each other on real visitor traffic and pick the winner based on engagement, lead capture, or conversation length. Pitchbar assigns each visitor stickily (the same visitor always sees the same variant), so the comparison stays fair.
Open any agent → A/B experiments tab in the agent
nav (Beaker icon). URL: /app/agents/{id}/experiments.
Owners and Admins can create / start / stop experiments; Editors
can't.
A new experiment starts with two variants, control and treatment, at 50/50. You can change the weights (any positive integers; they're normalised) and the names.
For the persona kind, each variant's config JSON should hold a persona object:
{ "persona": { "name": "Aria", "tone": "warm and concise" } }
The widget renders the variant's persona name in the chat header, and the LLM speaks under that name + tone for the assigned conversation.
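As a rough illustration of how a variant's persona could take precedence over the agent default, here is a minimal Python sketch. The helper name `effective_persona` and the default persona values are hypothetical. The sketch also assumes a key-by-key merge, whereas the real override may replace the persona wholesale.

```python
import json

def effective_persona(default_persona: dict, variant_config_json: str) -> dict:
    """Return the persona to use for a turn (hypothetical helper)."""
    config = json.loads(variant_config_json) if variant_config_json else {}
    override = config.get("persona") or {}
    # Assumption: keys set on the variant win; the agent default fills any gaps.
    return {**default_persona, **override}

# Hypothetical agent default, combined with the variant config shown above.
default = {"name": "Assistant", "tone": "neutral"}
variant = '{ "persona": { "name": "Aria", "tone": "warm and concise" } }'
print(effective_persona(default, variant))
```

A variant whose config has no persona key would simply fall back to the agent default under this assumption.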
An experiment starts in draft: no visitors are assigned yet.
Starting it moves it to running. Every subsequent first-turn visitor is bucketed.
On the first message of a conversation, the MessageStreamController calls ExperimentResolver::resolveForConversation:
1. If conversation.variant_id is already set, use it (sticky).
2. Otherwise, look up the running experiment for this agent. Only ONE experiment is active per agent: if you start a second one while the first is running, the resolver picks the most recent. To run a different kind, stop the previous one first.
3. Hash (visitor_id + experiment_id) into a bucket on the weighted variant list (Assigner). The same visitor returning days later lands in the same variant, because the assignment row is durable.
4. Stamp conversation.variant_id. Every future turn for this conversation reads the same variant.
5. For kind = persona, the variant's config[persona] overrides the agent's default persona in PromptBuilder::build for that turn.
The fastest way to confirm the wiring:
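1. Create a persona experiment with two clearly different variants, e.g. { "persona": { "name": "Helpfulbot" } } vs { "persona": { "name": "Snarkbot" } }.
2. Start the experiment and chat with the widget as two different visitors (each with a different visitor_id).
3. Open /admin/conversations and confirm each conversation row has a variant_id stamped.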
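The resolver steps described earlier (sticky check, durable assignment, weighted hashing) can be sketched in Python as follows. This is only an illustration, not Pitchbar's implementation. The function names, the SHA-256 hash, the ":" separator, and the plain dicts standing in for the conversations and experiment_assignments tables are all assumptions.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

def pick_variant(visitor_id: str, experiment_id: str,
                 variants: List[Tuple[str, int]]) -> str:
    """Deterministically map a visitor to a variant, proportional to weight."""
    total = sum(weight for _, weight in variants)
    if total <= 0:
        raise ValueError("weights must be positive integers")
    # Assumption: SHA-256 over "visitor:experiment"; the real hash may differ.
    digest = hashlib.sha256(f"{visitor_id}:{experiment_id}".encode()).digest()
    point = int.from_bytes(digest[:8], "big") % total  # integer in [0, total)
    cumulative = 0
    for variant_id, weight in variants:
        cumulative += weight  # walking cumulative weights normalises them
        if point < cumulative:
            return variant_id
    raise AssertionError("unreachable: point is always below the total weight")

def resolve_for_conversation(conversation: dict, visitor_id: str,
                             running_experiment: Optional[dict],
                             assignments: Dict[tuple, str]) -> Optional[str]:
    """Sticky variant resolution for the first message of a conversation."""
    if conversation.get("variant_id"):
        return conversation["variant_id"]      # already stamped: reuse it
    if running_experiment is None:
        return None                            # nothing running for this agent
    key = (visitor_id, running_experiment["id"])
    if key not in assignments:                 # stand-in for the durable row
        assignments[key] = pick_variant(
            visitor_id, running_experiment["id"], running_experiment["variants"])
    conversation["variant_id"] = assignments[key]  # stamp the conversation
    return conversation["variant_id"]

# Hypothetical experiment: two variants with equal weights.
experiment = {"id": "exp_1", "variants": [("control", 1), ("treatment", 1)]}
assignments: Dict[tuple, str] = {}
first = resolve_for_conversation({}, "visitor-42", experiment, assignments)
again = resolve_for_conversation({}, "visitor-42", experiment, assignments)
print(first == again)  # True: same visitor, same variant
```

Because the bucket depends only on the visitor and experiment identifiers, a returning visitor would land in the same variant even without the stored assignment. The stored row additionally protects the assignment if the weights are edited later.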
Everything is persisted: the experiment_assignments rows plus the variant_id on every conversations row. Join those two tables against messages and leads for any analysis you want, e.g.:
-- Per-variant totals for one experiment.
SELECT v.name,
       COUNT(DISTINCT c.id) AS conversations,
       COUNT(DISTINCT l.id) AS leads_captured,
       -- Averaged over joined rows: a conversation with several leads counts once per lead.
       AVG(c.message_count) AS avg_messages
FROM variants v
LEFT JOIN conversations c ON c.variant_id = v.id
LEFT JOIN leads l ON l.conversation_id = c.id
WHERE v.experiment_id = '...'
GROUP BY v.name;
A built-in stats panel inside /app/agents/{id}/experiments is on the roadmap. For now you'll need to run that query manually: on the self-host build that takes a workspace API token plus /api/v1/db read access; otherwise, ask support.
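To see what the query returns, the sketch below runs it against an in-memory SQLite database using Python's standard sqlite3 module. The schema is a hypothetical minimum containing only the columns the query references, and the sample rows are invented purely for illustration. The `lead_rate` field is an extra derived value, not something Pitchbar reports.

```python
import sqlite3

QUERY = """
SELECT v.name,
       COUNT(DISTINCT c.id) AS conversations,
       COUNT(DISTINCT l.id) AS leads_captured,
       AVG(c.message_count) AS avg_messages
FROM variants v
LEFT JOIN conversations c ON c.variant_id = v.id
LEFT JOIN leads l ON l.conversation_id = c.id
WHERE v.experiment_id = ?
GROUP BY v.name
ORDER BY v.name;
"""

def variant_stats(db: sqlite3.Connection, experiment_id: str) -> list:
    """Run the per-variant query and add a lead-capture rate per variant."""
    stats = []
    for name, conversations, leads, avg_messages in db.execute(QUERY, (experiment_id,)):
        stats.append({
            "name": name,
            "conversations": conversations,
            "leads_captured": leads,
            "avg_messages": avg_messages,
            # Guard against variants that have no conversations yet.
            "lead_rate": leads / conversations if conversations else 0.0,
        })
    return stats

# Hypothetical minimal schema and sample rows, for illustration only.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE variants (id TEXT PRIMARY KEY, experiment_id TEXT, name TEXT);
CREATE TABLE conversations (id TEXT PRIMARY KEY, variant_id TEXT, message_count INTEGER);
CREATE TABLE leads (id TEXT PRIMARY KEY, conversation_id TEXT);
INSERT INTO variants VALUES ('v1', 'exp_1', 'control'), ('v2', 'exp_1', 'treatment');
INSERT INTO conversations VALUES ('c1', 'v1', 4), ('c2', 'v1', 6), ('c3', 'v2', 10);
INSERT INTO leads VALUES ('l1', 'c3');
""")

for row in variant_stats(db, "exp_1"):
    print(row)
```

Passing the experiment id as a bound parameter (the `?` placeholder) avoids pasting it into the SQL string by hand.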
Stop sets status to stopped and flushes the
running-experiment cache so new conversations immediately stop
getting assigned. Existing conversations keep their assigned
variant for consistency in mid-flight chats.
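The interaction between Stop and the running-experiment cache might look roughly like the Python sketch below. The class, the function names, and the dict-based experiment store are hypothetical stand-ins, chosen only to show why the flush makes the stop take effect immediately for new conversations.

```python
class RunningExperimentCache:
    """Hypothetical per-agent cache of the currently running experiment."""

    def __init__(self, load_running):
        self._load_running = load_running  # callback that reads the source of truth
        self._cache = {}

    def get(self, agent_id):
        # Load once per agent, then serve from memory until flushed.
        if agent_id not in self._cache:
            self._cache[agent_id] = self._load_running(agent_id)
        return self._cache[agent_id]

    def flush(self, agent_id):
        self._cache.pop(agent_id, None)

def stop_experiment(experiment: dict, cache: RunningExperimentCache) -> None:
    """Mark the experiment stopped and drop the cached entry for its agent."""
    experiment["status"] = "stopped"
    cache.flush(experiment["agent_id"])

# Hypothetical store with one running experiment per agent.
store = {"agent_1": {"id": "exp_1", "agent_id": "agent_1", "status": "running"}}

def load_running(agent_id):
    experiment = store.get(agent_id)
    return experiment if experiment and experiment["status"] == "running" else None

cache = RunningExperimentCache(load_running)
before = cache.get("agent_1")            # the running experiment
stop_experiment(store["agent_1"], cache)
after = cache.get("agent_1")             # None: new conversations get no variant
print(before is not None, after is None)
```

Conversations that already carry a variant_id never consult this lookup, which is consistent with mid-flight chats keeping their variant.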
Delete hard-deletes the experiment row. Variant rows
cascade. Existing conversations.variant_id values become
foreign-key orphans — that's intentional; we keep the historical
record of which conversation got which variant even after the
experiment ends.