# Proposal — Knowledge-Base Accuracy Feedback Loop

**Status:** draft (audit iter 12, 2026-05-16)
**Owner:** TBD
**Effort:** 5 days (3 backend, 1 frontend admin, 1 docs+QA)
**Plan gate:** all plans (improves core quality, not a tier feature)

---

## Problem

Today the bot can hallucinate, cite irrelevant chunks, or give
outdated answers, and the workspace owner has no signal showing
which chunks of their crawled knowledge base are driving the bad
replies.

Today's surfaces:
- Visitors see a 👍 / 👎 prompt on each reply (already shipped).
- `FeedbackController` writes thumbs up/down to `message_feedback`.
- `SuggestFromGapsCommand` looks at low-confidence/unanswered
  questions and proposes curated answers (already shipped).

The missing link: **per-chunk attribution of bad replies.**

Buyer feedback (recurring):
> "I get 10 thumbs down a week but no idea WHICH page on my site
>  caused it. I have 400 pages crawled — am I supposed to read all of
>  them?" — ttsoft, batch v3
> "Stale pricing in our docs caused 4 visitors to think we still sold
>  the Lite plan. I want to know WHICH chunk gave them that answer." —
>  Lucian, batch v4

Competitors:
- **Chatbase** — per-chunk quality score on the inspector.
- **Intercom Fin** — admin-tagged "this answer is wrong" → tracks the
  source page → flags for review.
- **MendableAI** — bulk reindex with a "stale" filter.

Pitchbar's "knowledge" page today is read-only crawl status. No
quality signal. No "fix this chunk" workflow.

---

## Goals

1. Persist the chunks that drove every reply (already retrievable
   from `Retriever::retrieve()` result, but not stored long-term).
   Tie thumbs-down to the specific chunks that were in context.
2. Reverse-aggregate: per-chunk dashboard showing
   `(thumbs_down_count / used_in_replies)` ratio, sorted descending.
3. Workspace admin sees a new "Low-quality chunks" tab on the
   Knowledge page, with the offending chunks + the conversations
   they drove + a "Rewrite this chunk" / "Mark obsolete" / "Re-crawl
   this page" action set.
4. Operator-driven re-crawl: clicking "Re-crawl" enqueues
   `CrawlPageJob` for the chunk's parent Document so the latest page
   state replaces the stale chunk.
5. Bulk-mark-obsolete: workspace admin can hide a chunk from RAG
   retrieval without deleting it (so reindex doesn't bring it back
   while they fix the source page).

---

## Non-goals

- **No LLM-side "rate this chunk" auto-scoring.** Out of scope.
  Signal must come from real visitor feedback, not from the model
  evaluating itself.
- **No public visitor-facing "this chunk was wrong" surface.** Only
  workspace admins see the dashboard.
- **No automatic chunk deletion on N negative feedbacks.** Manual
  workflow only; auto-purge could silently destroy correct chunks
  flagged by spammers.
- **No retroactive reattribution** for replies before this ship. New
  data only.

---

## Data model

New table:

```php
Schema::create('message_chunk_attribution', function (Blueprint $table) {
    $table->uuid('id')->primary();
    $table->foreignUuid('message_id')->constrained('messages')->cascadeOnDelete();
    $table->foreignUuid('chunk_id')->constrained('chunks')->cascadeOnDelete();
    $table->float('score');                      // from Retriever
    $table->boolean('boosted_for_current_page'); // CLAUDE.md gotcha #4
    $table->timestampTz('created_at')->useCurrent();

    $table->index(['chunk_id', 'created_at']);
    $table->index(['message_id']);
});
```

New columns on existing `chunks`:

```php
Schema::table('chunks', function (Blueprint $table) {
    $table->boolean('hidden_from_retrieval')->default(false)->index();
    $table->timestampTz('marked_obsolete_at')->nullable();
});
```

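`message_feedback` already exists and needs no schema change. The FK
on `message_chunk_attribution.message_id` points at `messages`, the
same row the feedback belongs to, so joining the two tables on
`message_id` turns a thumbs-down on a message into a thumbs-down for
every chunk that contributed.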
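
The back-attribution join can be pictured with a minimal in-memory sketch. Plain arrays stand in for database rows; the function name `chunkStats`, the `rating` field, and its `'down'` value are assumptions for illustration, not the actual `message_feedback` schema.

```php
/**
 * Back-attribute message-level feedback to the chunks in context.
 *
 * @param array<int, array{message_id: string, chunk_id: string}> $attributions
 * @param array<int, array{message_id: string, rating: string}>   $feedback
 * @return array<string, array{uses: int, thumbs_down: int}> keyed by chunk_id
 */
function chunkStats(array $attributions, array $feedback): array
{
    // Collect the ids of thumbs-down messages for O(1) lookup.
    // Several 'down' rows on one message still count once.
    $downvoted = [];
    foreach ($feedback as $row) {
        if ($row['rating'] === 'down') {
            $downvoted[$row['message_id']] = true;
        }
    }

    $stats = [];
    foreach ($attributions as $row) {
        $chunkId = $row['chunk_id'];
        $stats[$chunkId] ??= ['uses' => 0, 'thumbs_down' => 0];
        $stats[$chunkId]['uses']++;
        if (isset($downvoted[$row['message_id']])) {
            $stats[$chunkId]['thumbs_down']++;
        }
    }

    return $stats;
}

$stats = chunkStats(
    [
        ['message_id' => 'm1', 'chunk_id' => 'c1'],
        ['message_id' => 'm1', 'chunk_id' => 'c2'],
        ['message_id' => 'm2', 'chunk_id' => 'c1'],
    ],
    [['message_id' => 'm1', 'rating' => 'down']],
);
// c1: 2 uses, 1 thumbs-down; c2: 1 use, 1 thumbs-down
```

In production this would presumably be a single SQL join and `GROUP BY chunk_id`; the loop only shows which rows pair up.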

---

## Write path

After `PersistTurnJob` writes the assistant turn, dispatch
`AttributeChunksJob`:

```php
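use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Str;

class AttributeChunksJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public function __construct(
        public string $messageId,
        public array $chunks,   // from Retriever result
    ) {}

    public function handle(): void
    {
        // No chunks were retrieved for this reply: nothing to attribute.
        if ($this->chunks === []) {
            return;
        }

        // One row per chunk that was in context for this message.
        $rows = collect($this->chunks)
            ->map(fn ($c) => [
                'id' => (string) Str::uuid7(),
                'message_id' => $this->messageId,
                'chunk_id' => $c['chunk_id'],
                'score' => $c['score'],
                'boosted_for_current_page' => $c['boosted_for_current_page'] ?? false,
                'created_at' => now(),
            ])
            ->all();

        // Single multi-row insert instead of one query per chunk.
        DB::table('message_chunk_attribution')->insert($rows);
    }
}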
```

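Dispatched from the SSE `after-stream` block, after `PersistTurnJob`
has written the `messages` row: the FK on `message_id` rejects the
insert if attribution runs first. Hot-path safe, since nothing runs
before or during token streaming.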

---

## Read path

`Retriever::retrieve()` adds a pre-filter:

```php
->where('chunks.hidden_from_retrieval', false)
```

Hidden chunks are skipped at the DB hydration step. Vectorize still
returns their point IDs (we don't sync hidden state to Vectorize; too
chatty), and the DB query discards them. One consequence: when some
of the top matches are hidden, a query returns fewer than topK
chunks. Cost: minor (one extra predicate on the existing hydration
query).
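
A minimal in-memory sketch of that hydration step, under stated assumptions: `hydrateVisible` and its parameters are hypothetical names, arrays stand in for the Vectorize response and the `chunks` rows, and asking the vector index for more than topK candidates (so hidden chunks can be backfilled) is an idea this proposal does not itself specify.

```php
/**
 * Keep the vector index's ranking, drop hidden or missing chunks,
 * and stop once $topK visible chunks have been collected.
 *
 * @param string[] $rankedIds chunk ids, best match first
 * @param array<string, array{id: string, hidden_from_retrieval: bool}> $rowsById
 * @return array<int, array{id: string, hidden_from_retrieval: bool}>
 */
function hydrateVisible(array $rankedIds, array $rowsById, int $topK): array
{
    $visible = [];
    foreach ($rankedIds as $id) {
        $row = $rowsById[$id] ?? null;
        // Skip ids with no DB row and chunks an admin has hidden.
        if ($row === null || $row['hidden_from_retrieval']) {
            continue;
        }
        $visible[] = $row;
        if (count($visible) === $topK) {
            break;
        }
    }

    return $visible;
}

$rows = [
    'a' => ['id' => 'a', 'hidden_from_retrieval' => false],
    'b' => ['id' => 'b', 'hidden_from_retrieval' => true],
    'c' => ['id' => 'c', 'hidden_from_retrieval' => false],
];
$context = hydrateVisible(['a', 'b', 'c'], $rows, 2);
// $context holds chunks 'a' and 'c'; 'b' is skipped because it is hidden
```

If the index is asked for exactly topK ids, the same call returns fewer rows whenever one of them is hidden, which is the trade-off to weigh against the cost of over-fetching.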

---

## Admin dashboard

New tab on `/app/agents/{agent}/knowledge`:

**Low-quality chunks** — sorted by `thumbs_down_count /
used_in_replies` descending, filtered to chunks with ≥5 uses (avoid
small-sample noise).
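
The ranking rule can be sketched as a small pure function. This is an illustration only: `rankLowQuality` is a hypothetical name, the input is per-chunk counts held in plain arrays, and breaking ties by higher use count is an assumption the proposal does not state.

```php
/**
 * Rank chunks by bad rate (thumbs_down / uses), highest first,
 * ignoring chunks below the sample-size threshold.
 *
 * @param array<string, array{uses: int, thumbs_down: int}> $stats keyed by chunk_id
 * @return array<int, array{chunk_id: string, uses: int, thumbs_down: int, bad_rate: float}>
 */
function rankLowQuality(array $stats, int $minUses = 5): array
{
    $rows = [];
    foreach ($stats as $chunkId => $s) {
        // Sample-size guard; the zero check also prevents division by zero.
        if ($s['uses'] < $minUses || $s['uses'] === 0) {
            continue;
        }
        $rows[] = [
            'chunk_id' => (string) $chunkId,
            'uses' => $s['uses'],
            'thumbs_down' => $s['thumbs_down'],
            'bad_rate' => $s['thumbs_down'] / $s['uses'],
        ];
    }

    // Highest bad rate first; ties go to the chunk with more uses (assumption).
    usort(
        $rows,
        fn (array $a, array $b) => [$b['bad_rate'], $b['uses']] <=> [$a['bad_rate'], $a['uses']]
    );

    return $rows;
}

$ranked = rankLowQuality([
    'c1' => ['uses' => 42, 'thumbs_down' => 8],  // 8 / 42 is about 19%
    'c2' => ['uses' => 3,  'thumbs_down' => 3],  // dropped: fewer than 5 uses
    'c3' => ['uses' => 10, 'thumbs_down' => 5],  // 50%
]);
// Order: c3, then c1
```

Without the threshold, `c2` would top the list at a 100% bad rate on only three uses, which is the small-sample noise the filter is meant to suppress.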

Each row shows:
- Source URL + chunk excerpt (first 200 chars).
- Stats: `42 uses · 8 thumbs-down · 19% bad rate`.
- Last 3 conversations the chunk appeared in (linked).
- Actions: `Re-crawl page` / `Mark obsolete` / `Edit chunk text`.

`Edit chunk text` opens a modal. Operator-rewritten chunks are
flagged with `chunks.edited_at` (a further nullable timestamp column,
not yet listed in the Data model migration above) so the next
re-crawl doesn't blow their fix away.

Wayfinder typed routes:
- `GET /app/agents/{agent}/knowledge/quality` — returns the dashboard
  data.
- `POST /app/agents/{agent}/knowledge/chunks/{chunk}/hide` — flips
  `hidden_from_retrieval` to `true`; repeat calls are no-ops, which
  keeps the endpoint idempotent.
- `POST /app/agents/{agent}/knowledge/chunks/{chunk}/recrawl` —
  dispatches `CrawlPageJob` for the parent Document.

---

## Hot-path safety

- Attribution write happens after the stream completes, on the queue
  via `AttributeChunksJob`. Nothing is added before or during token
  streaming.
- Retrieval adds one `WHERE hidden_from_retrieval = false` clause to
  the existing chunk hydration query. Indexed; negligible cost.
- Dashboard query is admin-side, not hot path.

---

## Test plan

Pest feature tests:
- `AttributeChunksJob` writes one row per chunk in the Retriever
  result.
- Thumbs-down on a message correctly back-attributes to every
  contributing chunk via the join.
- Retriever skips chunks with `hidden_from_retrieval=true`.
- Re-crawl endpoint enqueues `CrawlPageJob` for the chunk's Document.
- Hide / unhide is idempotent and audit-logged.
- Cross-tenant: workspace A's chunks never appear in workspace B's
  dashboard.
- Dashboard query excludes chunks with fewer than 5 uses (sample-size
  guard).

UI test plan:
1. Sign in as workspace admin. Open the demo agent's Knowledge page.
2. Click new "Low-quality chunks" tab.
3. On the demo workspace, expect the placeholder "Not enough feedback
   yet"; seed attribution and feedback rows via tinker to populate
   the tab.
4. Click "Mark obsolete" on a row → confirm chunk hidden from
   retrieval (verify next visitor question doesn't surface it).
5. Click "Re-crawl page" → confirm CrawlPageJob enqueued.

---

## Rollout

1. Phase 1 (2 days): schema migrations + `AttributeChunksJob` +
   Retriever hidden-chunk filter + tests. No UI change yet.
2. Phase 2 (2 days): admin dashboard tab + Wayfinder routes + the 3
   action endpoints (hide, re-crawl, edit chunk text).
3. Phase 3 (1 day): docs page `troubleshooting-knowledge-quality.blade.php`
   + nav entry.
4. Canary: enable on Pitchbar's demo agent first. Watch
   `message_chunk_attribution` row count vs message count to ensure
   the job is dispatched correctly.
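
The canary check in step 4 could look something like the sketch below. It is hedged throughout: `findUnattributed` is a hypothetical helper, and plain arrays stand in for the ids of assistant messages and the rows of `message_chunk_attribution`.

```php
/**
 * Return assistant message ids that have no attribution row at all.
 *
 * @param string[] $assistantMessageIds
 * @param array<int, array{message_id: string, chunk_id: string}> $attributions
 * @return string[]
 */
function findUnattributed(array $assistantMessageIds, array $attributions): array
{
    // Set of message ids that appear in at least one attribution row.
    $attributed = array_flip(array_column($attributions, 'message_id'));

    return array_values(array_filter(
        $assistantMessageIds,
        fn (string $id) => !isset($attributed[$id])
    ));
}

$missing = findUnattributed(
    ['m1', 'm2', 'm3'],
    [
        ['message_id' => 'm1', 'chunk_id' => 'c1'],
        ['message_id' => 'm1', 'chunk_id' => 'c2'],
        ['message_id' => 'm3', 'chunk_id' => 'c1'],
    ]
);
// $missing contains only 'm2'
```

A non-empty result is not automatically a bug: a reply for which the Retriever returned no chunks legitimately has no rows, so the signal to watch is a sustained gap rather than any single miss.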

---

## Risks / open questions

- **Storage growth.** Each turn writes ~6 attribution rows (topK=6).
  A medium workspace at 100K turns/month adds 600K rows/month, or
  ~7.2M rows after 12 months. The index on `chunk_id, created_at`
  keeps queries fast. Acceptable. A retention policy
  (`DELETE WHERE created_at < now() - 6 months`) can be added later.
- **False positives via spam thumbs-down.** One visitor (for example
  a competitor) can flood 👎 from the same IP or conversation to bury
  a correct chunk. Cap at 1 thumbs-down per conversation per chunk;
  the dashboard counts distinct visitors, not raw votes.
- **Edited chunks vs re-crawl.** Operator manually rewrites a chunk
  → re-crawl would overwrite it. Solution: `chunks.edited_at` flag
  pauses re-crawl for that chunk; operator gets a banner when the
  source page changed.
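
The spam mitigation above amounts to counting distinct visitors rather than raw votes. A minimal sketch, assuming each thumbs-down can be tied to a `visitor_id` (a hypothetical field; the proposal does not say how visitors are identified) and using the made-up name `distinctDownvotes`:

```php
/**
 * Count thumbs-down per chunk by distinct visitor, so one visitor
 * voting repeatedly still counts once.
 *
 * @param array<int, array{chunk_id: string, visitor_id: string}> $downvotes
 * @return array<string, int> distinct-visitor thumbs-down count per chunk_id
 */
function distinctDownvotes(array $downvotes): array
{
    $seen = []; // chunk_id => [visitor_id => true]
    foreach ($downvotes as $vote) {
        $seen[$vote['chunk_id']][$vote['visitor_id']] = true;
    }

    // array_map over a single array preserves the chunk_id keys.
    return array_map('count', $seen);
}

$counts = distinctDownvotes([
    ['chunk_id' => 'c1', 'visitor_id' => 'v1'],
    ['chunk_id' => 'c1', 'visitor_id' => 'v1'], // repeat vote, ignored
    ['chunk_id' => 'c1', 'visitor_id' => 'v1'], // repeat vote, ignored
    ['chunk_id' => 'c1', 'visitor_id' => 'v2'],
    ['chunk_id' => 'c2', 'visitor_id' => 'v1'],
]);
// c1 counts 2 distinct visitors, c2 counts 1
```

This only bounds the damage a single identity can do; a visitor who rotates identities would still need a separate rate limit.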

---

## Why now

- 4+ months of buyer complaints with no observability surface.
- Engineering scope is clean: one new table, two new columns on
  `chunks` (plus `edited_at` for operator edits), one queued job,
  one new tab. No hot-path risk.
- Unlocks an entire new operator workflow: "rate-of-improvement of
  knowledge base over time" — a quality metric Pitchbar can market
  alongside conversion rate.
- 8th proposal in Q3 roadmap; ~41 days total engineering across all 8
  proposals.
