Measuring Speech Application Success: A Comprehensive Health Check Methodology
In this insightful presentation, Silke Witt-Ehsani, PhD from TuVox, explores the complex landscape of measuring success in speech applications. Key topics include defining success criteria, establishing success metrics, and how design choices can influence user experience. With a focus on both subjective and objective success metrics, the session will delve into case studies demonstrating varying success criteria and metrics for different applications. Gain valuable insights on routing accuracy, caller satisfaction, and effective health check methodologies to enhance your voice user interface performance.
Presentation Transcript
Beyond Usability: Measuring Speech Application Success
Silke Witt-Ehsani, PhD
VP, VUI Design Center
TuVox
Outline
• What is Success?
• Success Criteria
• Success Metrics
• Putting it all together: A health check methodology
• Success vs Design
• How they affect each other
• Case studies
Success Criteria: i.e. What is “success”?
Different questions (Success Criteria) require different answers (Success Metrics). How do we do that?
• Common criteria:
• Are callers transferred to the correct destination?
• How many callers are being helped?
• How do callers like my speech applications?
• What is the system recognition accuracy?
Success Metrics: Subjective vs Objective
Subjective:
• Usability study
• Whole call recordings
• Individual caller feedback
Objective = Application Statistics:
• Automation rates
• Containment rates
• Non-cooperative caller rate
Success Metrics: Business vs Technical
Business stakeholders care about the bottom-line impact of several application and speech events:
• Higher routing accuracy = fewer agent-to-agent transfers
• More transfers out of the application = higher call center cost
Business metrics for business users:
• Routing Accuracy
• Agent Transfers
• Customer Satisfaction
Technical users:
• need detailed application performance at the dialog state level
• grammar coverage
• NoMatch, NoInput
• need the ability to drill down
Common Business Metrics
• Containment rate = “keep caller hostage in the system”
• Automation rate = “offer complete functionality…”
• Successful routing = “get the caller to the right expert”
• Average call duration
• And many, many more…
Application Health Check - Business
The 3 main elements of a Business Health Check are:
• Custom defined success rate
• Non-cooperative caller rate
• Agent transfer rate
  • Transfer due to explicit caller request
  • Transfer due to errors (both speech and system)
  • Transfer by design (i.e. correctly routed calls)
Example Success Metric: Routing Accuracy
Definition: Confirmed routed calls (calls reaching an end destination) over all calls
Useful metric when using:
• Skills-based routing
• Routing application with N routing points
[Chart: % Routing Accuracy. Values shown: 85%, 77%, 68.3%. Applications shown: ~150 routing points, ~50 routing points, 4 routing points]
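The routing accuracy definition above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's tooling: the `Call` record and its `reached_destination` flag are hypothetical stand-ins for whatever the application log uses to mark a confirmed routed call.

```python
from dataclasses import dataclass


@dataclass
class Call:
    call_id: str
    # Hypothetical flag: True when the call was confirmed at an end destination.
    reached_destination: bool


def routing_accuracy(calls):
    """Confirmed routed calls over all calls, as a percentage."""
    if not calls:
        return 0.0
    confirmed = sum(1 for c in calls if c.reached_destination)
    return 100.0 * confirmed / len(calls)


# Illustrative data only: three of four calls reached an end destination.
calls = [Call("a", True), Call("b", True), Call("c", False), Call("d", True)]
print(routing_accuracy(calls))  # 75.0
```

Note that the denominator is all calls, not only calls that attempted routing, which follows the definition on the slide.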
Example: Non-cooperative Callers
Definition: Non-cooperative callers is the percentage of all callers who immediately hang up or request an agent but never interact with the application
Possible reasons:
• Degree of caller acceptance of the system
• Non-application-related causes, such as wrong number, child crying, etc.
Expected range: 5-10% of call volume
[Chart: % Non-cooperative Callers. Values shown: 8.6%, 6.3%. Applications shown: Open-ended Router, Directed Dialog Technical Support]
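One way to classify non-cooperative callers from per-call event logs is sketched below. The event names (`"input"`, `"hangup"`, `"agent_request"`) are an assumed log format invented for this example; a real application would map its own events onto the definition above.

```python
def is_non_cooperative(events):
    """Return True if the call ended without any caller interaction.

    `events` is the ordered list of event names for one call. The names
    "input", "hangup" and "agent_request" are a hypothetical log format.
    """
    if not events:
        return False
    interacted = "input" in events
    gave_up = events[-1] in ("hangup", "agent_request")
    return gave_up and not interacted


def non_cooperative_rate(calls):
    """Percentage of all calls classified as non-cooperative."""
    if not calls:
        return 0.0
    return 100.0 * sum(is_non_cooperative(e) for e in calls) / len(calls)


# Ten illustrative calls: one immediate hang-up, nine with real interaction.
calls = [["hangup"]] + [["input", "hangup"] for _ in range(9)]
rate = non_cooperative_rate(calls)
# Compare against the 5-10% expected range quoted on the slide.
print(rate, 5.0 <= rate <= 10.0)  # 10.0 True
```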
Example: Agent Transfers
Definition: % Agent transfers of all calls
• Applications tend to have many different types of agent transfers.
• Main categories:
  • Customer zeroing out
  • Routing to an agent based on caller information is a “Designed Transfer”
  • Routing due to some logic in the application is a “Necessary Transfer”
• Agent transfers have an immediate impact on call center cost
[Chart: % Agent Requests, example from a telecommunications company. Values shown: 45%, 4.7%]
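Because the transfer categories have very different cost implications, it helps to report them separately rather than as one number. The sketch below assumes each call has already been labeled with a transfer reason; the labels (`"caller_request"`, `"error"`, `"by_design"`) are hypothetical names for the categories discussed above.

```python
from collections import Counter


def transfer_breakdown(call_outcomes):
    """Percentage of all calls per transfer reason.

    `call_outcomes` has one entry per call: a reason label for calls that
    were transferred to an agent, or None for calls that were not.
    """
    total = len(call_outcomes)
    if total == 0:
        return {}
    counts = Counter(o for o in call_outcomes if o is not None)
    return {reason: 100.0 * n / total for reason, n in counts.items()}


# Illustrative data only: 20 calls, 8 of which ended in an agent transfer.
outcomes = (
    ["caller_request"] * 2 + ["error"] * 1 + ["by_design"] * 5 + [None] * 12
)
breakdown = transfer_breakdown(outcomes)
print(breakdown)
print(sum(breakdown.values()))  # overall agent transfer rate
```

Summing the per-category values gives the overall agent transfer rate, so the same function serves both the business view and the drill-down view.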
Performance Baseline and Trending
Numbers are relative; they only have meaning in context.
When defining success metrics, create a baseline and then compare against it.
Potential baselines:
• Previous IVR touch-tone application
• Go-live
[Chart: Customers finding speech easier or much easier than IVR. Values shown: 76%, 66%, 52%. Stages shown: Usability, Go-live, Tuning 1]
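A baseline comparison can be as simple as reporting each later measurement as a change against the baseline value. The sketch below uses made-up containment rates; none of the numbers come from the presentation, and `trend_against_baseline` is a hypothetical helper name.

```python
def trend_against_baseline(baseline, measurements):
    """Return (label, value, change in percentage points vs. baseline)."""
    return [(label, value, value - baseline) for label, value in measurements]


# Illustrative numbers only; they are not taken from the presentation.
baseline_containment = 40.0  # e.g. the previous touch-tone IVR
measurements = [("Go-live", 46.0), ("Tuning 1", 52.0)]

for label, value, delta in trend_against_baseline(baseline_containment, measurements):
    print(f"{label}: {value:.1f}% ({delta:+.1f} points vs. baseline)")
```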
Application Health Check - Technical
• Purpose of hotspot analysis:
  • Identify areas where the application is performing sub-optimally
• Hotspot analysis should be done for each dialog state
• Important: Hotspot analysis gives the “where” of issues, not the “why”!
Framework for Technical Health Check
TuVox hotspot analysis = integrated view of:
• % Hang-up (%H)
• % Final NoInput (%NI)
• % Final NoMatch (%NM)
• % Transfer Requests (%TR)
Rule of thumb: State Exit Count = # of calls * (%H + %NI + %NM + %TR)
These numbers are a first-order approximation:
• Sort by highest state exit count
• Review one by one in context, e.g. a high hang-up rate may simply mean the state is a logical end point
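The rule of thumb translates directly into a ranking of dialog states. The sketch below assumes that "# of calls" means calls entering the state and that the four rates are stored as percentages (hence the division by 100). The `StateStats` record and the state names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StateStats:
    name: str
    calls: int                   # assumed: calls entering this dialog state
    pct_hangup: float            # %H
    pct_final_noinput: float     # %NI
    pct_final_nomatch: float     # %NM
    pct_transfer_request: float  # %TR

    def exit_count(self):
        """State Exit Count = # of calls * (%H + %NI + %NM + %TR)."""
        total_pct = (
            self.pct_hangup
            + self.pct_final_noinput
            + self.pct_final_nomatch
            + self.pct_transfer_request
        )
        return self.calls * total_pct / 100.0


def rank_hotspots(states):
    """Sort dialog states by highest state exit count first."""
    return sorted(states, key=lambda s: s.exit_count(), reverse=True)


# Illustrative data only.
states = [
    StateStats("GetAccount", 800, 2.0, 1.0, 3.0, 1.5),
    StateStats("MainMenu", 1000, 5.0, 3.0, 8.0, 4.0),
    StateStats("Goodbye", 600, 30.0, 0.0, 0.0, 0.0),
]
for s in rank_hotspots(states):
    print(s.name, s.exit_count())
```

In this made-up data the "Goodbye" state ranks second purely because of hang-ups, which is the kind of logical end point the slide says to discount when reviewing the list in context.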
Success and Design are tightly linked
• Design influences success
• Success determines the design
[Diagram labels: Design, Success Metric. Example call flow shown: Authentication; Look up all loans for this caller; Does caller have more than 1 loan? (yes: caller selects from list of loans; no); Does caller have a line of credit? (yes / no); Loan Menu: Balance, More loan details, Make loan payment]
Case Study 1: Airline Application
• Customer requirement: 64% success
• Success definition:
  • “For 64% of the callers entering the application, their ticket reservation record has to be retrieved from the back-end”
• Design consequences:
  • Ensure via prompting that callers have their record identifier number before entering the application
  • Make it hard to get to an agent, i.e. multiple retries
  • Explain what the record identifier was
Design tailored to the success criteria, but at the expense of ease of use and caller experience
Case Study 2: Travel Application
Hotspot analysis identifies too high a number of exits at a main menu
• Observation: One menu option is much more common than the other 5 choices
• Old design: Menu with 6 options
• New design: Yes/no question followed by a menu
Impact on application performance:
• Turn failure rate: decreased by 39%
• Opt-out rate to the call center: decreased by 44%
Case Study 3: HighTech Routing Application
• 3 success criteria:
  • Average call handling less than 30 secs
  • High customer satisfaction
  • 4 queues to route to, but many different call reasons
• Influence of these criteria on the design:
  • Only 1 reprompt instead of the standard 2 attempts
  • No traditional error prompting à la ‘sorry I didn’t get that’
  • Natural language open-ended prompting with high-coverage grammar
Summary
• Define application success criteria
• Based on that, define success metrics
• Use trending and baseline to put data in context
• Success criteria and design are highly interlinked, i.e. success criteria determine the design
• The design influences how targeted success metrics can be met