Warranty/Performance Text Exploration for Modern Reliability
Scott Wise – Global Enablement & Training Manager
Discovery Summit Europe 2019
Overview
• Why Text Exploration in Reliability?
  • With access to ever more reliability information, we often ignore the unstructured text data that is included with our warranty/performance data.
  • We will show the basic steps and methods that reliability professionals can use to explore unstructured text and better uncover the real reliability trends in their data.
• By Including Unstructured Data in Your Reliability Studies, You Can:
  • Clearly See Warranty/Performance Text Patterns and Trends, and Validate Current Approaches
  • Create More Accurate Reliability Models by Incorporating Warranty/Performance Text Analysis
Basic Text Exploration Steps
Basic Steps for Exploring Unstructured Text in Warranty and Performance Data
• Summarizing: Find the words that occur most often in your text data
• Preparing: Fine-tune the list of the most frequent terms and phrases in your text data
• Visualizing: Graphically display the largest terms in your text data
• Analyzing: Reduce the dimensions down to the most important terms and topics
• Modeling: Incorporate what you learn into better reliability and predictive models
Modern Analytic Software Makes It Easy for Reliability Practitioners to Learn and Apply Text Exploration
• This presentation uses JMP Pro Statistical Discovery Software (version 13.2) from SAS Institute for the analysis
Important Text Exploration Definitions
• Important Text Exploration Terms:
  • Document: an individual body of text (each row of text)
  • Corpus: the set of documents (all rows in a text column)
  • Term: the unit of analysis (a single word or a multiword phrase)
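To make these definitions concrete, the Python sketch below treats a corpus as a list of strings, one document per row. It is an illustration under simple assumptions (lowercasing plus a whitespace split, far cruder than JMP's tokenizer), and the sample `corpus` and the helper `terms()` are hypothetical. `terms()` returns the single-word terms followed by the multiword phrases of up to `max_words` words.

```python
# Hypothetical toy corpus: each string is one document (one row of a text column).
corpus = [
    "BIOS hangs at POST after update",
    "Power supply failed during burn-in test",
    "BIOS setting lost after power cycle",
]

def words(document):
    """Split a document into lowercase single-word tokens (whitespace only)."""
    return document.lower().split()

def terms(document, max_words=2):
    """Return single-word terms, then multiword phrases up to max_words long."""
    tokens = words(document)
    result = []
    for n in range(1, max_words + 1):
        for i in range(len(tokens) - n + 1):
            result.append(" ".join(tokens[i:i + n]))
    return result

for doc in corpus:
    print(terms(doc))
```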
Background & Introduction
• Case Background:
  • BIOS Time Delta Free Text: typical computer warranty data on failures found in test
• Explore the Case Data:
  • Columns: Model, Issue ID, Phase Found, Severity, Submit Date/Time, Close Date/Time, Time Delta, Text (unstructured free text)
• Findings:
  • Some categorical and continuous columns are available for drilling down on failures
  • But valuable information is in the comments (unstructured free text)
Case Example: Case Data Snapshot & Graph
Summarizing Text Data
• Objective:
  • Create a list of the most frequently occurring terms in your text data.
• Tokenizing:
  • Parse text into terms, or tokens
  • Uses built-in libraries and techniques to recognize patterns (domain names, money, words, HTML links, times, numbers, etc.)
• Stemming:
  • Strips word endings (such as -s, -ed, -ing) and combines the variants into a single root term
  • Example: Fail = Fails, Failed, Failing, etc.
Case Example: Common Text Exploration Set-Up
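The summarizing step (tokenize, stem, count) can be sketched in a few lines of Python. This is a hypothetical illustration, not how JMP works internally: a regular expression stands in for the built-in tokenizing libraries, and a crude suffix-stripping rule stands in for a real stemmer. The sample comments and the names `tokenize`, `stem`, and `term_counts` are made up for the example.

```python
import re
from collections import Counter

# Words (optionally hyphenated) or runs of digits; a stand-in for a real tokenizer.
TOKEN_PATTERN = re.compile(r"[a-z]+(?:-[a-z]+)*|\d+")
SUFFIXES = ("ing", "ed", "s")  # crude rule, longest suffix checked first

def tokenize(text):
    """Lowercase the text and split it into word and number tokens."""
    return TOKEN_PATTERN.findall(text.lower())

def stem(token):
    """Strip one suffix, keeping at least 4 letters so short words like 'bios' survive."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token

def term_counts(documents):
    """Count stemmed terms across the whole corpus."""
    counts = Counter()
    for doc in documents:
        counts.update(stem(tok) for tok in tokenize(doc))
    return counts

# Hypothetical warranty comments, not the case data.
comments = [
    "System fails to boot; BIOS failed at POST.",
    "Unit failing power-on test, 3 retries.",
    "BIOS update failed",
]
counts = term_counts(comments)
print(counts.most_common(2))  # → [('fail', 4), ('bios', 2)]
```

The minimum stem length is a design choice: without it, the same rule would turn "bios" into "bio". Production stemmers (Porter or Snowball, for example) use much richer rules.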
Preparing the Text Data
• Objective:
  • Create a list of terms and phrases
• Build on Terms/Phrases:
  • Add phrases to the term list, and combine related phrases and terms
• Stop Words:
  • Remove terms or phrases you don't want
• Case Results:
  • Many phrases can be combined, such as those related to “Power”
Case Example: Terms & Phrases After Combining “Power”-Related Phrases
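Stop-word removal and phrase combining amount to a filter followed by a lookup table. The sketch below assumes the text has already been split into terms and phrases; the stop list `STOP_WORDS` and the recode map `RECODE` (which folds several power-related phrases into one term, echoing the case) are hypothetical.

```python
from collections import Counter

# Hypothetical stop list and recode map; in practice these come from reviewing the term list.
STOP_WORDS = {"the", "a", "at", "to", "unit"}
RECODE = {"power supply": "power", "power cord": "power", "psu": "power"}

def prepare(terms):
    """Drop stop words, then map related phrases onto one combined term."""
    return [RECODE.get(t, t) for t in terms if t not in STOP_WORDS]

raw_terms = ["the", "power supply", "fail", "psu", "at", "post", "power cord"]
prepared = Counter(prepare(raw_terms))
print(prepared)
```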
Visualizing Text Data
• Investigate Large Terms and Phrases:
  • Drill down to the cases (rows) that contain specific terms and phrases
• Word Cloud:
  • Creates a cloud of the words, sized by frequency
  • Can also add a color gradient, by frequency or by another factor
• Case Results:
  • Large “Bios” effect
Case Example: Word Cloud
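A word cloud needs a rule that turns counts into visual weight. The sketch below assumes a simple linear scaling between the least and most frequent terms, applied to both font size and a light-to-dark color gradient; the counts, size range, and colors are hypothetical, and JMP's own scaling and layout may differ. Placing the words on the canvas is not shown.

```python
def scale(freq, lo, hi):
    """Position of freq between the smallest and largest counts, from 0.0 to 1.0."""
    return (freq - lo) / (hi - lo) if hi > lo else 1.0

def word_cloud_styles(frequencies, min_size=10.0, max_size=48.0,
                      light=(200, 220, 255), dark=(0, 40, 120)):
    """Map each term to (font size, hex color); both grow with frequency."""
    lo, hi = min(frequencies.values()), max(frequencies.values())
    styles = {}
    for term, freq in frequencies.items():
        f = scale(freq, lo, hi)
        size = min_size + f * (max_size - min_size)
        # Interpolate each RGB channel from the light color toward the dark color.
        rgb = [round(a + f * (b - a)) for a, b in zip(light, dark)]
        styles[term] = (size, "#{:02x}{:02x}{:02x}".format(*rgb))
    return styles

# Hypothetical term counts after the preparing step.
styles = word_cloud_styles({"bios": 40, "power": 15, "boot": 5})
for term, (size, color) in styles.items():
    print(term, round(size, 1), color)
```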
Analyzing Text Results
• Create Indicator Columns:
  • Save desired terms or phrases as indicator columns
  • Filter by other columns
• Create a Document Term Matrix (DTM):
  • Transform the prepared text into a DTM, which is easier to analyze
  • Creates an indicator or scored column for each term in the data table
Case Example: Reduced DTM Snapshot
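A binary DTM has one row per document and one 0/1 column per term, and an indicator column is simply one of those columns. This Python sketch assumes the documents are already prepared (tokenized, stemmed, stop words removed); the documents, `vocabulary`, and helper names are hypothetical. Passing `binary=False` gives raw counts instead, one example of a scored column.

```python
# Hypothetical prepared documents (already tokenized, stemmed, and stop-word filtered).
documents = [
    "bios hang post",
    "power fail",
    "bios set lost power cycle",
]
vocabulary = ["bios", "power", "fail"]  # hypothetical reduced term list

def document_term_matrix(docs, terms, binary=True):
    """One row per document, one column per term: 0/1 indicators or raw counts."""
    matrix = []
    for doc in docs:
        tokens = doc.split()
        if binary:
            matrix.append([1 if term in tokens else 0 for term in terms])
        else:
            matrix.append([tokens.count(term) for term in terms])
    return matrix

def indicator_column(matrix, terms, term):
    """Pull one term's column out of the DTM, like saving a single indicator column."""
    j = terms.index(term)
    return [row[j] for row in matrix]

dtm = document_term_matrix(documents, vocabulary)
bios_flag = indicator_column(dtm, vocabulary, "bios")
bios_rows = [i for i, flag in enumerate(bios_flag) if flag]  # drill down to these rows
print(dtm, bios_rows)
```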
Analyzing Text Data (Cont.)
• Cluster (Group) the Documents Using Multivariate Methods:
  • Latent Semantic Analysis with SVD
    • Performs a singular value decomposition (SVD) on the DTM (similar to PCA)
    • SVD Scatterplot Matrix – shows large differences in dimensional distance among documents and terms, and enables drilldown
  • Topic Analysis
    • Groups words into themes, or topics; similar to factor analysis
• Case Results:
  • “Bios” documents and terms stand out from the others in the SVD matrix
  • Drilldown and topic analysis results show that Topic 1 contains terms commonly associated with Bios failures (IPMI, DRAC3/XT, etc.)
Case Example: SVD Matrix & Topic Analysis
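Latent semantic analysis can be illustrated by computing the first few singular vectors of a small DTM. The sketch below swaps in power iteration with deflation so it needs only the standard library; in practice a numerical library's SVD routine would be used. It works on an unweighted, uncentered matrix, which is an assumption (JMP may weight or center the DTM first), and the toy DTM is hypothetical. Documents that share terms land near each other: here the "power" documents load on the first dimension and the "bios" documents on the second. Topic analysis is not shown.

```python
import math

def mat_vec(matrix, v):
    """Return A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]

def mat_t_vec(matrix, u):
    """Return A^T u."""
    return [sum(row[j] * ui for row, ui in zip(matrix, u)) for j in range(len(matrix[0]))]

def leading_triplet(matrix, iterations=200):
    """Power iteration on A^T A; returns (sigma, u, v) for the largest singular value."""
    v = [1.0] * len(matrix[0])
    for _ in range(iterations):
        v = mat_t_vec(matrix, mat_vec(matrix, v))
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    av = mat_vec(matrix, v)
    sigma = math.sqrt(sum(x * x for x in av))
    return sigma, [x / sigma for x in av], v

def deflate(matrix, sigma, u, v):
    """Subtract sigma * u v^T so the next power iteration finds the next singular value."""
    return [[a - sigma * ui * vj for a, vj in zip(row, v)] for row, ui in zip(matrix, u)]

def lsa_coordinates(dtm, k=2):
    """Document coordinates (sigma_i * u_i) on the first k SVD dimensions."""
    coords = [[] for _ in dtm]
    residual = [[float(x) for x in row] for row in dtm]
    for _ in range(k):
        sigma, u, v = leading_triplet(residual)
        for doc, ui in zip(coords, u):
            doc.append(sigma * ui)
        residual = deflate(residual, sigma, u, v)
    return coords

# Hypothetical binary DTM; columns: bios, ipmi, power, supply.
dtm = [
    [1, 1, 0, 0],  # "bios ipmi"
    [1, 0, 0, 0],  # "bios"
    [0, 0, 1, 1],  # "power supply"
    [0, 0, 1, 0],  # "power"
    [0, 0, 1, 0],  # "power"
]
for row in lsa_coordinates(dtm):
    print([round(x, 3) for x in row])
```

Singular vectors are only defined up to sign, so the sign of a coordinate carries no meaning by itself.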
Modeling with Text Data
• Reliability Modeling for Bios:
  • Use “CloseDate/Time” as the time to failure
  • Use the “bios Binary” column from the DTM as the censor column
  • The software indicated a better reliability fit with an underlying Fréchet distribution
  • Now ready to use the Quantile Profiler to answer Bios reliability questions
Case Example: Reliability Model Setup & Results
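The reliability fit can be sketched as maximum likelihood with right censoring. The code assumes the log-location-scale form of the Fréchet distribution, F(t) = exp(-exp(-(log t - μ)/σ)), and uses a coarse grid search in place of a proper optimizer. The times and failure flags are hypothetical; `failed` uses 1 for a failure and 0 for a censored row (the censor coding in JMP may be the opposite). `frechet_quantile` plays the role of reading a quantile, such as the B10 life, off the Quantile Profiler.

```python
import math

def frechet_cdf(t, mu, sigma):
    """F(t) = exp(-exp(-(log t - mu) / sigma)); assumed log-location-scale form."""
    z = (math.log(t) - mu) / sigma
    return math.exp(-math.exp(-z))

def frechet_quantile(p, mu, sigma):
    """Time by which a fraction p has failed (inverse of frechet_cdf)."""
    return math.exp(mu - sigma * math.log(-math.log(p)))

def censored_loglik(times, failed, mu, sigma):
    """Sum of log f(t) for failures and log(1 - F(t)) for right-censored rows."""
    total = 0.0
    for t, is_failure in zip(times, failed):
        z = (math.log(t) - mu) / sigma
        if is_failure:
            # log f(t) = -log(sigma * t) - z - exp(-z)
            total += -math.log(sigma * t) - z - math.exp(-z)
        else:
            # log(1 - F(t)), computed with expm1 for numerical stability
            total += math.log(-math.expm1(-math.exp(-z)))
    return total

def fit_grid(times, failed):
    """Crude grid search for the (mu, sigma) that maximizes the log-likelihood."""
    best = None
    for k in range(121):          # mu from 1.0 to 7.0 in steps of 0.05
        mu = 1.0 + 0.05 * k
        for m in range(1, 41):    # sigma from 0.05 to 2.0 in steps of 0.05
            sigma = 0.05 * m
            ll = censored_loglik(times, failed, mu, sigma)
            if best is None or ll > best[0]:
                best = (ll, mu, sigma)
    return best

# Hypothetical data: time to close, and 1 if the comment flags a BIOS failure (else censored).
times = [12, 30, 45, 60, 75, 90, 120, 150]
failed = [1, 1, 0, 1, 0, 1, 0, 0]

loglik, mu_hat, sigma_hat = fit_grid(times, failed)
b10 = frechet_quantile(0.10, mu_hat, sigma_hat)  # time by which 10% are expected to fail
print(round(mu_hat, 2), round(sigma_hat, 2), round(b10, 1))
```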
Modeling with Text Data (Cont.)
• Predictive Modeling for the Top Ten Saved Terms:
  • Model of Severity based on the top DTM terms – interactive ordinal logistic regression
  • Allows prediction of Severity based on the active failure terms
Case Example: Severity Based on Topics
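Once a model is fitted, predicting severity from the active failure terms reduces to evaluating cumulative logits. The sketch assumes a proportional-odds (cumulative logit) parameterization, P(Y ≤ j) = logistic(θ_j - x·β), so a positive coefficient pushes toward higher severity; JMP's sign convention may differ. The severity levels, cutpoints, and term coefficients are hypothetical placeholders, not fitted values from the case, and fitting itself is not shown.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def severity_probabilities(indicators, coefficients, cutpoints):
    """Proportional-odds model: P(Y <= j) = logistic(cutpoint_j - x . beta)."""
    eta = sum(beta * indicators.get(term, 0) for term, beta in coefficients.items())
    cumulative = [logistic(theta - eta) for theta in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs

# Hypothetical values; a real model would estimate these from the DTM and Severity.
LEVELS = ["Low", "Medium", "High"]   # ordered from least to most severe
CUTPOINTS = [-0.5, 1.5]              # must be increasing
COEFFICIENTS = {"bios": 1.2, "power": 0.8, "boot": 0.4}

def predict(indicators):
    """Return the most probable severity level and the full probability vector."""
    probs = severity_probabilities(indicators, COEFFICIENTS, CUTPOINTS)
    return LEVELS[probs.index(max(probs))], probs

label, probs = predict({"bios": 1, "boot": 1})
print(label, [round(p, 3) for p in probs])
```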
Recap
• Overview:
  • Use the unstructured text data that is often included with our warranty/performance data
  • Text exploration of unstructured text better uncovers the real reliability trends
• By Including Unstructured Data in Your Studies, You Can:
  • Clearly See Warranty/Performance Text Patterns and Trends, and Validate Current Approaches
  • Create More Accurate Reliability Models by Incorporating Warranty/Performance Text Analysis