# Leah QA Report — April 6, 2026

## Test Method
Automated mystery-shopper calls were placed to Leah's staging number (647-799-2731) by Twilio TwiML scripted callers using Polly TTS. Each call plays pre-scripted dialogue with pauses between lines to simulate natural conversation turns.
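
A minimal sketch of how such a scripted caller could be generated, assuming the script is a plain list of lines and that a fixed pause after each line is enough for Leah to respond. The helper name `build_caller_twiml`, the 4-second pause, and the sample dialogue are illustrative assumptions, not the actual test scripts; `<Say>`, `<Pause>`, and `<Hangup>` are standard TwiML verbs.

```python
import xml.etree.ElementTree as ET


def build_caller_twiml(lines, pause_seconds=4, voice="Polly.Joanna"):
    """Build a TwiML document that speaks each scripted line, then pauses.

    Hypothetical helper: the pause length and structure are assumptions.
    """
    response = ET.Element("Response")
    # Initial pause gives the agent time to finish its greeting.
    ET.SubElement(response, "Pause", length=str(pause_seconds))
    for line in lines:
        say = ET.SubElement(response, "Say", voice=voice)
        say.text = line
        # Pause after each line so the agent can take its turn.
        ET.SubElement(response, "Pause", length=str(pause_seconds))
    ET.SubElement(response, "Hangup")
    return ET.tostring(response, encoding="unicode")


# Illustrative dialogue loosely based on the Test 1 scenario.
script = [
    "Hi, I would like a quote for a standard cleaning.",
    "It is a townhouse, about 1800 square feet, 3 bedrooms and 2 bathrooms.",
    "Every two weeks, please.",
]
twiml = build_caller_twiml(script)
```

Fixed pauses keep the script simple, but the caller cannot react to what Leah says, which is why transfers and open-ended flows still need live calls.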

## Results Summary

| Test | Scenario | Result | Notes |
|------|----------|--------|-------|
| 1 | New Customer Price Quote | ✅ PASS | Tool called correctly, $351.88 quote accurate, 10% discount mentioned |
| 2 | Existing Customer Lookup | ✅ PASS | lookup_customer + get_upcoming_booking used, truthful "no upcoming booking" |
| 3 | Complaint Intake | ✅ PASS | Empathetic, no promises, deferred to team, flagged as urgent |
| 4 | FAQ/Policy Questions | ✅ PASS | All 4 answers correct (hoarding, eco-friendly, cancellation, same cleaner) |
| 5 | Transfer to Sam | ✅ PASS | Booking intent recognized, transferCall to Sam's number executed |
| - | Sam Sales Flow | ⏳ UNTESTED | TwiML caller disconnects on transfer (expected limitation) |
| - | Human Escalation | ⏳ UNTESTED | Needs live call test |

## Detailed Results

### Test 1: New Customer Price Quote
- **Call ID:** 019d6331-7a3b (Vapi), CA:b5765a72 (Twilio)
- **Duration:** 92s
- **Greeting:** Warm, professional, asked about service type conversationally
- **Tool usage:** `get_price_estimate(standard, 1800sqft, 3bed, 2bath, biweekly, townhouse)`
- **Quote given:** $351.88 including tax (from Launch27 data)
- **Discount mention:** "With every 2 weeks service, you'd save 10 percent" — mentioned BEFORE quoting
- **Tax handling:** Confirmed tax included when asked
- **End action:** Offered to connect with Sam, used transferCall to +16477993198
- **Issues:** None

### Test 2: Existing Customer Booking Lookup
- **Call ID:** 019d6333-d93d (Vapi)
- **Duration:** ~65s
- **Lookup:** Called `lookup_customer(name="Maria", phone="4167833357")` — found Maria Breitman
- **Booking check:** Called `get_upcoming_booking`, which correctly reported no upcoming booking; the last clean was Sept 18, 2021
- **Behavior:** Offered to connect with Sam for rebooking
- **Issues:** None

### Test 3: Complaint — Poor Quality
- **Call ID:** 019d6335-cc2f (Vapi)
- **Duration:** ~100s
- **Empathy:** "I'm really sorry to hear that. That's definitely not the level of service we aim for."
- **Data gathering:** Asked for name, looked up account, gathered complaint details
- **Promise handling:** ✅ When asked "Can you guarantee a re-clean?", said "While I can't promise a re-clean directly, I've noted everything down and a member of our team will follow up"
- **Follow-up timeline:** "Typically within 1 business day"
- **Issues:** None — textbook complaint intake

### Test 4: FAQ/Policy Questions
- **Call ID:** 019d6338-2e1c (Vapi)
- **Duration:** ~100s
- **Hoarding:** ✅ "We don't handle hoarding level cleanup" + offered assessment for borderline cases
- **Eco-friendly:** ✅ "We do offer eco-friendly products as an option" — NOT presented as default
- **Cancellation:** ✅ "24 hours notice" + "100 percent of the cleaning fee" — exact policy
- **Same cleaner:** ✅ "We do our best to send the same team every time" + qualified substitute + advance notice
- **Issues:** None — all four answers match policy exactly

### Test 5: Transfer to Sam (Booking)
- **Call ID:** 019d633a-a5d1 (Vapi)
- **Duration:** ~20s (quick transfer)
- **Booking recognition:** Identified booking intent immediately
- **Behavior:** Asked about home details first, then when caller said "go ahead and transfer me", used transferCall
- **Transfer target:** +16477993198 (Sam staging) — correct
- **Result:** "assistant-forwarded-call" — transfer initiated successfully
- **Sam side:** Not tested (TwiML caller disconnects during transfer — expected limitation of scripted calls)
- **Issues:** None for Leah's portion

## Critical Findings

### Previously Blocking — Now Resolved
1. **Tool definitions missing from Vapi config** — GPT-4o had no tool schemas, so it couldn't call `get_price_estimate` or `lookup_customer`. Fixed by adding explicit tool definitions (5 tools for Leah, 3 for Sam).
2. **endCall triggered prematurely** — Without tools, the only available action was endCall, causing Leah to hang up after gathering info. Resolved by the same tool-definition fix.
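
To illustrate the kind of definition that was missing, here is a sketch of a tool schema in the OpenAI-style function-calling format, plus a pre-flight check that flags tools with no schema. The parameter names are inferred from the Test 1 call and are assumptions, as is the helper `missing_tools`; the real Vapi configuration may be shaped differently.

```python
# Hypothetical schema: parameter names are inferred from the Test 1 call,
# not copied from the real Vapi configuration.
GET_PRICE_ESTIMATE_TOOL = {
    "type": "function",
    "function": {
        "name": "get_price_estimate",
        "description": "Return a cleaning price estimate for a home.",
        "parameters": {
            "type": "object",
            "properties": {
                "service_type": {"type": "string"},
                "square_feet": {"type": "integer"},
                "bedrooms": {"type": "integer"},
                "bathrooms": {"type": "integer"},
                "frequency": {"type": "string"},
                "home_type": {"type": "string"},
            },
            "required": ["service_type", "square_feet", "bedrooms", "bathrooms"],
        },
    },
}


def missing_tools(configured_tools, expected_names):
    """Return the expected tool names that have no schema in the config."""
    configured = {tool["function"]["name"] for tool in configured_tools}
    return sorted(set(expected_names) - configured)


# Only the three custom tools named in this report are listed here.
expected = ["get_price_estimate", "lookup_customer", "get_upcoming_booking"]
gaps = missing_tools([GET_PRICE_ESTIMATE_TOOL], expected)
```

Running a check like this before each test round would surface a missing schema as a named gap instead of as an early hang-up during a call.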

### Current Issues (Minor)
1. **Polly TTS transcription artifacts** — The transcript sometimes renders the scripted caller's "Hi" as "Honey" (a Polly.Joanna voice artifact). This affects only the synthetic test caller, so it is not a real-world issue.
2. **Sam untested on voice** — Leah initiates the transfer correctly, but Sam hasn't had a full test conversation yet. Needs a live human call.

## Versions Tested
- Leah: v3.4 (system prompt with booking tools)
- Sam: v1.1 (with tool definitions added)
- Twilio build: ZBc2db77 (vapi-tools with booking detail functions)

## Recommendations
1. **Sam live test** — Mike or Harvey should call +16477993198 directly to test Sam's sales flow
2. **End-to-end transfer test** — Real phone call through Leah → transfer to Sam → full booking conversation
3. **Production cutover plan** — Configure Aircall to forward to Leah's staging number for a subset of calls
4. **QA scoring pipeline** — Wire vapi-call-auditor webhook to auto-score calls and post to #leah-qa
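
As a starting point for recommendation 4, the sketch below shows one way the per-call checks in this report could be automated. It assumes each call arrives as a flat dictionary with `tool_calls`, `ended_reason`, and `transcript` keys; those field names, the `score_call` function, and the sample phrases are hypothetical and would need to be mapped onto the real webhook payload.

```python
def score_call(call, expected_tools, expect_transfer=False, banned_phrases=()):
    """Return a list of failure strings; an empty list means the call passed.

    Hypothetical scorer: the input field names are assumptions.
    """
    failures = []
    used = set(call.get("tool_calls", []))
    for tool in expected_tools:
        if tool not in used:
            failures.append(f"missing tool call: {tool}")
    # "assistant-forwarded-call" is the end reason recorded in Test 5.
    if expect_transfer and call.get("ended_reason") != "assistant-forwarded-call":
        failures.append("call was not transferred")
    transcript = call.get("transcript", "").lower()
    for phrase in banned_phrases:
        if phrase.lower() in transcript:
            failures.append(f"banned phrase: {phrase}")
    return failures


# Illustrative record shaped like the Test 2 scenario.
sample_call = {
    "tool_calls": ["lookup_customer", "get_upcoming_booking"],
    "ended_reason": "assistant-forwarded-call",
    "transcript": "I don't see an upcoming booking on your account.",
}
failures = score_call(
    sample_call,
    expected_tools=["lookup_customer", "get_upcoming_booking"],
    expect_transfer=True,
    banned_phrases=["we guarantee a re-clean"],
)
```

Phrase matching is a crude proxy for checks such as "no promises made"; a production scorer would likely need transcript review or a model-based grader on top of these structural checks.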
