# Proposal — GDPR Visitor Data Export & Deletion API

**Status:** draft (audit iter 17, 2026-05-16)
**Owner:** TBD
**Effort:** 5 days (3 backend, 1 frontend admin, 1 docs+QA)
**Plan gate:** all plans (legal compliance, not a tier feature)

---

## Problem

Today the Pitchbar widget stores per-visitor data: `visitors` rows
(IP hash, UA, anonymous id), `conversations` (messages, citations,
page URLs), `leads` (email, phone, name, custom fields). Workspace
admins can read this data, but a visitor has no programmatic way to:

- Request a copy of all data Pitchbar holds about them (GDPR
  Article 15 right of access).
- Demand erasure of their data (GDPR Article 17 "right to be
  forgotten" and its CCPA equivalent).

For workspace admins established in, or serving visitors from, the
EU / UK / California, this is a hard legal blocker: they can't
legally deploy Pitchbar to their site without a documented
data-subject-request (DSR) path.

Competitors:
- **Intercom** — built-in "Export visitor data" + "Delete visitor"
  buttons in admin panel; visitor-facing self-serve via signed link.
- **Drift** — admin endpoint only; visitor must email support.
- **HubSpot** — GDPR module includes visitor erasure + portability.

Pitchbar's missing surface = compliance gap = lost EU/UK buyers.

---

## Goals

1. **Admin export endpoint** — workspace admin can request a
   structured export of all data Pitchbar holds for a single visitor
   (identified by email, conversation id, or anonymous id).
2. **Admin delete endpoint** — same lookup, but erases instead of
   exports. Visitor-attributed rows (conversations, messages, leads,
   feedback, attribution) get the visitor link nulled or the row
   anonymized; the visitor row itself is hard-deleted.
3. **Visitor-facing self-serve** — admin can paste a signed URL into
   their privacy-policy page. Visitor opens it, types their email,
   hits "Export my data" or "Delete my data". Workspace admin gets
   notified; action is queued for review (not auto-applied, to
   prevent abuse).
4. **Audit trail** — every DSR is logged in `audit_logs` with the
   resolved visitor id, requester (admin or visitor self-serve),
   timestamp, and outcome.

---

## Non-goals

- **No auto-deletion on visitor request.** Every visitor-initiated
  erasure is admin-confirmed in v1. This guards against
  social-engineering attacks in which a third party (for example a
  competitor) requests erasure of a workspace's leads.
- **No SSO with Pitchbar's account system** — workspace owners
  authenticate the visitor request via the signed URL, not a Pitchbar
  account.
- **No cross-workspace export** — data export is scoped to one
  workspace per request; cross-workspace visitors must request
  per workspace.
- **No legal-hold escrow** — out of scope; ops team coordinates
  with legal counsel for litigation holds.
- **No analytics-retention impact** — anonymized rows still feed
  aggregate counters; deletion removes only the personal link.

---

## Data model

New table `dsr_requests`:

```php
Schema::create('dsr_requests', function (Blueprint $table) {
    $table->uuid('id')->primary();
    $table->foreignUuid('workspace_id')->constrained()->cascadeOnDelete();
    $table->string('action', 16);           // export | delete
    $table->string('lookup_email')->nullable();
    $table->string('lookup_visitor_id', 36)->nullable();
    $table->string('lookup_anonymous_id')->nullable();
    $table->string('source', 16);           // admin | visitor_self_serve
    $table->foreignId('requested_by_user_id')->nullable()->constrained('users')->nullOnDelete();
    $table->string('status', 24)->default('pending'); // pending|approved|rejected|completed
    $table->json('matched_visitor_ids')->nullable();
    $table->json('result_payload')->nullable(); // exported data for `export`, null for `delete`
    $table->timestampTz('completed_at')->nullable();
    $table->timestampsTz();

    $table->index(['workspace_id', 'status', 'created_at']);
});
```

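No new columns on existing tables. Erasure operates by:
- `visitors` → hard delete. The goals and test plan require
  conversations and messages to survive, so the delete must null the
  visitor link on `conversations` rather than cascade; if the current
  FK cascades, it has to change to set-null.
- `leads` → null `email/phone/name`, set `fields = []`.
- `messages` → keep (treated as non-PII); they only lose the visitor link.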

---

## Lookup pipeline

`App\Services\Gdpr\VisitorResolver::resolve(Workspace, array $criteria): Collection<Visitor>`

- Accept any of: `email` (matches `leads.email` in workspace),
  `visitor_id` (uuid), `anonymous_id` (widget storage key).
- Returns all matching visitors (one visitor may have multiple
  conversations across pages, but the visitor row is canonical).
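
To make the lookup rules concrete, here is a minimal in-memory sketch of the resolver logic. It is not the proposed Eloquent implementation: `resolveVisitorIds`, the row shapes, the `leads.visitor_id` link, and the case-insensitive email match are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for VisitorResolver::resolve().
 * $visitors and $leads are plain arrays mimicking table rows.
 *
 * @return list<string> unique visitor ids matched inside the workspace
 */
function resolveVisitorIds(string $workspaceId, array $criteria, array $visitors, array $leads): array
{
    $matched = [];

    foreach ($visitors as $v) {
        if ($v['workspace_id'] !== $workspaceId) {
            continue; // never match rows from another workspace
        }
        if (isset($criteria['visitor_id']) && $v['id'] === $criteria['visitor_id']) {
            $matched[$v['id']] = true;
        }
        if (isset($criteria['anonymous_id']) && $v['anonymous_id'] === $criteria['anonymous_id']) {
            $matched[$v['id']] = true;
        }
    }

    if (isset($criteria['email'])) {
        // Assumption: emails are compared case-insensitively after trimming.
        $needle = strtolower(trim($criteria['email']));
        foreach ($leads as $lead) {
            if ($lead['workspace_id'] === $workspaceId
                && $lead['email'] !== null
                && strtolower($lead['email']) === $needle) {
                $matched[$lead['visitor_id']] = true;
            }
        }
    }

    // Keys de-duplicate visitors matched by more than one criterion.
    return array_keys($matched);
}
```

Scoping every comparison to `$workspaceId` is what makes the "lookup against another workspace's email returns empty" test pass.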

`App\Services\Gdpr\Exporter::export(Visitor): array` returns the
following shape:

```json
{
  "visitor": { /* row sans IP-hash */ },
  "conversations": [
    {"id": "…", "started_at": "…", "page_url": "…", "messages": [...], "lead": {...}}
  ],
  "leads": [/* row, fields */],
  "feedback": [...],
  "events": [...]   // CTA clicks, satisfaction prompts, etc.
}
```
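
The shape above could be assembled roughly as follows. This is a sketch on plain arrays, not the proposed `Exporter`; `buildExport` and the `ip_hash` column name are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of the export payload builder.
 * Each argument is a plain array standing in for loaded rows.
 */
function buildExport(array $visitor, array $conversations, array $leads, array $feedback, array $events): array
{
    // "row sans IP-hash": drop the hash before it reaches the payload.
    unset($visitor['ip_hash']); // column name is an assumption

    return [
        'visitor'       => $visitor,
        'conversations' => array_values($conversations),
        'leads'         => array_values($leads),
        'feedback'      => array_values($feedback),
        'events'        => array_values($events),
    ];
}

$export = buildExport(
    ['id' => 'v1', 'anonymous_id' => 'a1', 'ip_hash' => 'abc123'],
    [['id' => 'c1', 'page_url' => 'https://example.com/pricing', 'messages' => []]],
    [['email' => 'ann@example.com', 'fields' => []]],
    [],
    []
);

// Serialize the way a download endpoint might.
echo json_encode($export, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
```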

`App\Services\Gdpr\Eraser::erase(Visitor): void` performs:
1. `DB::transaction`:
   - Delete `visitor_typing_until` references in conversations.
   - Null `leads.email / phone / name`, clear `leads.fields`.
   - Hard-delete `feedback` rows for the visitor's messages.
   - Hard-delete `events` rows where `payload.visitor_id` matches.
   - Hard-delete the `visitors` row. Conversations, messages, and
     chunk attribution are retained with the visitor link nulled
     (not cascade-deleted), so the "messages retained" tests hold.
2. Append `audit_logs` row + `dsr_requests.status = completed`.
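
The core of the erase transaction can be sketched with PDO and in-memory SQLite. This is an illustration under stated assumptions, not the proposed `Eraser`: the schema is cut down to a few columns, the `conversations.visitor_id` FK is assumed to be `ON DELETE SET NULL` so that messages survive, and the feedback, events, and audit-log steps are omitted.

```php
<?php
declare(strict_types=1);

// Assumed minimal schema; the real tables have more columns.
function makeDemoDb(): PDO
{
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('PRAGMA foreign_keys = ON'); // SQLite ignores FKs unless enabled
    $db->exec('CREATE TABLE visitors (id TEXT PRIMARY KEY)');
    $db->exec('CREATE TABLE conversations (
        id TEXT PRIMARY KEY,
        visitor_id TEXT REFERENCES visitors(id) ON DELETE SET NULL
    )');
    $db->exec('CREATE TABLE messages (
        id TEXT PRIMARY KEY,
        conversation_id TEXT NOT NULL REFERENCES conversations(id),
        body TEXT
    )');
    $db->exec('CREATE TABLE leads (
        id TEXT PRIMARY KEY, visitor_id TEXT, email TEXT, phone TEXT, name TEXT, fields TEXT
    )');
    return $db;
}

function eraseVisitor(PDO $db, string $visitorId): void
{
    $db->beginTransaction();
    try {
        // Anonymize leads first, while the visitor link still identifies them.
        $stmt = $db->prepare(
            "UPDATE leads
                SET email = NULL, phone = NULL, name = NULL, fields = '[]', visitor_id = NULL
              WHERE visitor_id = ?"
        );
        $stmt->execute([$visitorId]);

        // Hard-delete the visitor; the FK nulls conversations.visitor_id.
        $db->prepare('DELETE FROM visitors WHERE id = ?')->execute([$visitorId]);

        $db->commit();
    } catch (Throwable $e) {
        $db->rollBack(); // leave no half-erased state behind
        throw $e;
    }
}
```

Running everything in one transaction means a failure midway leaves the visitor fully intact rather than partially anonymized.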

---

## Admin endpoints

Wayfinder routes (scoped to `app/dsr/*`):

- `POST /app/dsr/lookup` — body: `{email?, visitor_id?, anonymous_id?}`.
  Returns matched visitors + a preview of what would be exported /
  erased.
- `POST /app/dsr/export` — body: `{visitor_id, dsr_request_id}`.
  Queues `ExportDsrJob`; admin gets a notification when the JSON
  is ready, downloads from `/app/dsr/{id}/download` (signed URL).
- `POST /app/dsr/erase` — body: `{visitor_id, confirm_typed:
  "ERASE"}`. Runs `Eraser` synchronously inside a DB transaction,
  closes the request.

`DsrPolicy::manage` gated to Admin+ (Viewer / Editor cannot trigger
erasure).
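
The gate on the erase endpoint combines a role check with the typed confirmation. A framework-free sketch of that decision follows; `canManageDsr`, `authorizeErase`, the role ranking, the `owner` role, and the 422 status for a bad body are assumptions, not part of the proposal.

```php
<?php
declare(strict_types=1);

// Assumed role ordering; only Viewer / Editor / Admin appear in the proposal.
const DSR_ROLE_RANK = ['viewer' => 0, 'editor' => 1, 'admin' => 2, 'owner' => 3];

/** Mirrors the intent of DsrPolicy::manage: Admin and above only. */
function canManageDsr(string $role): bool
{
    return (DSR_ROLE_RANK[strtolower($role)] ?? -1) >= DSR_ROLE_RANK['admin'];
}

/** Returns an HTTP-style status code for an erase request body. */
function authorizeErase(string $role, array $body): int
{
    if (!canManageDsr($role)) {
        return 403; // Viewer / Editor cannot trigger erasure
    }
    if (empty($body['visitor_id']) || ($body['confirm_typed'] ?? '') !== 'ERASE') {
        return 422; // missing target or wrong confirmation phrase
    }
    return 200;
}
```

Checking the role before the body keeps unauthorized callers from learning anything about validation rules.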

---

## Visitor self-serve endpoint

`GET /widget/dsr/{signed_token}` — public, no auth. Token is signed
with `workspaces.cta_context_secret` (already encrypted-at-rest); it
encodes `{workspace_id, action: 'export'|'delete', requested_at}`.
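
One way to sign and verify such a token is an HMAC over a base64url payload. The sketch below assumes a `payload.signature` format, HMAC-SHA256, and `requested_at` stored as a Unix timestamp; none of these are specified by the proposal, and `signDsrToken` / `verifyDsrToken` are hypothetical names. The 90-day default matches the expiry stated later in this section.

```php
<?php
declare(strict_types=1);

function b64url(string $bin): string
{
    return rtrim(strtr(base64_encode($bin), '+/', '-_'), '=');
}

function signDsrToken(array $claims, string $secret): string
{
    $payload = b64url(json_encode($claims, JSON_THROW_ON_ERROR));
    $sig = b64url(hash_hmac('sha256', $payload, $secret, true));
    return $payload . '.' . $sig;
}

/** Returns the claims array, or null if the token is tampered, malformed, or expired. */
function verifyDsrToken(string $token, string $secret, int $now, int $maxAgeSeconds = 90 * 86400): ?array
{
    $parts = explode('.', $token);
    if (count($parts) !== 2) {
        return null;
    }
    [$payload, $sig] = $parts;

    // Constant-time comparison avoids leaking signature bytes via timing.
    $expected = b64url(hash_hmac('sha256', $payload, $secret, true));
    if (!hash_equals($expected, $sig)) {
        return null;
    }

    // Restore the padding stripped by b64url() before strict decoding.
    $padded = strtr($payload, '-_', '+/') . str_repeat('=', (4 - strlen($payload) % 4) % 4);
    $json = base64_decode($padded, true);
    $claims = $json === false ? null : json_decode($json, true);
    if (!is_array($claims) || !isset($claims['requested_at'])) {
        return null;
    }
    if ($now - (int) $claims['requested_at'] > $maxAgeSeconds) {
        return null; // older than the allowed lifetime
    }
    return $claims;
}
```

Verifying the signature before decoding the payload means untrusted JSON is never parsed unless it was produced with the workspace secret.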

Visitor lands on a minimal page:

```
─────────────────────────────────────────────
   Request your data from <Workspace Name>

   Email you used:  [_________________]

   ( ) Send me a copy of my data
   ( ) Delete all my data

   [Submit]
─────────────────────────────────────────────
```

Submission creates a `dsr_requests` row with `source=visitor_self_serve`
and `status=pending`. Workspace admin gets notified; admin manually
approves/rejects via /app/dsr.
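
The approval gate implies a small state machine over `dsr_requests.status`. The proposal lists only the four status values, so the transition map below is an assumption about how a visitor-initiated request would move between them.

```php
<?php
declare(strict_types=1);

// Assumed transitions for visitor self-serve requests.
const DSR_TRANSITIONS = [
    'pending'   => ['approved', 'rejected'],
    'approved'  => ['completed'],
    'rejected'  => [], // terminal
    'completed' => [], // terminal
];

function canTransition(string $from, string $to): bool
{
    return in_array($to, DSR_TRANSITIONS[$from] ?? [], true);
}
```

Under this map a pending request can never jump straight to `completed`, which is the property the "not auto-applied" goal depends on.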

Submissions are rate-limited to 3 per hour per IP to prevent
flooding. The signed token expires 90 days after issue, so admins
must paste a fresh URL on sites that still carry an older one.
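
The 3-per-hour-per-IP limit can be illustrated with a fixed-window counter. This is an in-memory sketch with a hypothetical `FixedWindowLimiter` class; a Laravel app would more likely rely on the framework's `throttle` middleware backed by the cache.

```php
<?php
declare(strict_types=1);

/** Fixed-window limiter: at most $limit hits per $windowSeconds per key. */
final class FixedWindowLimiter
{
    /** @var array<string, array{start:int, count:int}> */
    private array $windows = [];

    public function __construct(
        private int $limit = 3,
        private int $windowSeconds = 3600
    ) {
    }

    /** Records one attempt for $key at time $now; returns false when over the limit. */
    public function attempt(string $key, int $now): bool
    {
        $w = $this->windows[$key] ?? null;
        if ($w === null || $now - $w['start'] >= $this->windowSeconds) {
            $w = ['start' => $now, 'count' => 0]; // open a fresh window
        }
        if ($w['count'] >= $this->limit) {
            $this->windows[$key] = $w;
            return false;
        }
        $w['count']++;
        $this->windows[$key] = $w;
        return true;
    }
}
```

With the defaults, the fourth submission from one IP inside an hour is refused, matching the rate-limit case in the test plan.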

---

## Hot-path safety

- Lookup + export are admin-only, off hot path.
- Erasure is synchronous in a DB transaction; called from admin
  endpoint only, not from visitor turn.
- Visitor self-serve endpoint NEVER triggers erasure directly —
  admin approval gate guarantees the hot path is not affected.
- `Eraser` does NOT touch the Vectorize index; chunks remain
  searchable because they're tied to documents, not visitors.
  (Visitor PII from messages is generally absent from the indexed
  text; for a buyer with a high-PII corpus, a separate flag enables
  a full re-index.)

---

## Test plan

Pest feature tests:
- Lookup by email returns the right visitor + preview.
- Lookup by visitor_id returns the visitor.
- Lookup by anonymous_id returns the visitor.
- Lookup against another workspace's email returns empty.
- Export queues `ExportDsrJob` + writes audit log.
- Erase nulls lead PII; visitor row gone; messages retained.
- Erase across multiple conversations (same visitor browsed many pages).
- Viewer role gets 403 on erase endpoint.
- Visitor self-serve writes pending dsr_requests; admin can approve.
- Signed token verification rejects tampered tokens.
- Rate limit fires at 4th request from same IP.

UI test plan:
1. Sign in as workspace admin → /app/dsr.
2. Paste a captured lead's email → preview shows the visitor + N
   conversations + 1 lead.
3. Click "Export" → confirm JSON downloads after a few seconds.
4. Click "Erase" → confirm prompt, type "ERASE" → confirm visitor +
   lead PII gone but messages retained.
5. Admin pastes the signed URL into their privacy page → visitor
   visits → submits → admin sees pending request → approves.

---

## Rollout

1. Phase 1 (2 days): schema + `VisitorResolver` + `Exporter` +
   `Eraser` + tests. Admin endpoints `lookup/export/erase`. No UI.
2. Phase 2 (2 days): admin Inertia page (Wayfinder routes) +
   visitor self-serve page + signed-token generation.
3. Phase 3 (1 day): docs + Mintlify nav + privacy-policy template
   snippet workspace admin can paste.
4. Canary: Pitchbar's own demo workspace first.

---

## Risks / open questions

- **Performance on big workspaces.** Export of a visitor with 1000+
  conversations could be slow, which is why export runs through the
  queued `ExportDsrJob` rather than inline in the request.
- **Vectorize residue.** Erasure clears the relational PII but
  cannot un-train a model that has already seen the data. Pitchbar
  uses Retrieval-Augmented Generation with no fine-tuning, so this
  is a non-issue in practice; state that explicitly in the
  privacy-page snippet.
- **Audit log retention.** GDPR also covers audit logs. Ensure rows
  with `audit_logs.action='dsr.completed'` are excluded from a
  subsequent erase request; otherwise erasing would destroy the
  record that the erasure happened (catch-22). Document the
  carve-out.
- **What constitutes "the data"?** Pitchbar holds the message text,
  but that text was ALSO sent to the LLM provider (Cloudflare
  Workers AI / OpenAI) during the conversation. Document that
  upstream-provider data is the workspace owner's contractual issue
  with the LLM provider, not Pitchbar's.

---

## Why now

- EU/UK/California GDPR/CCPA enforcement actions accelerating in
  2025-2026. CodeCanyon buyers in these jurisdictions can't deploy
  Pitchbar without a documented DSR path.
- Engineering scope is small (5 days) and additive — no hot-path
  risk, no existing-data migration.
- Closes the last compliance-gap blocker for EU agency / SMB market.
- 11th proposal in Q3 roadmap (~56 days total engineering across
  11 proposals).
