Explore the importance of text exploration in reliability studies to extract valuable insights from unstructured warranty and performance data. Learn basic steps and methods for analyzing text data to enhance reliability models and validate approaches. Discover how modern analytic software like JMP Pro facilitates text exploration for better predictive models. Dive into text exploration definitions, step-by-step processes, and real case examples to demonstrate the power of incorporating unstructured data in reliability analysis.
Warranty/Performance Text Exploration for Modern Reliability
Scott Wise – Global Enablement & Training Manager
Discovery Summit Europe 2019
Overview
• Why Text Exploration in Reliability:
  • With access to ever more reliability information, we often ignore the unstructured text data that accompanies our warranty/performance data.
  • We will show the basic steps and methods that reliability professionals can use to explore unstructured text and uncover the real reliability trends in their data.
• By Including Unstructured Data in Your Reliability Studies:
  • Clearly See Reliability Warranty/Performance Text Patterns and Trends, and Validate Current Approaches
  • Create More Accurate Reliability Models by Incorporating Warranty/Performance Text Analysis
Basic Text Exploration Steps
Basic Steps for Exploring Unstructured Text in Warranty and Performance Data
• Summarizing – Find the words that occur most often in your text data
• Preparing – Fine-tune the list of the most frequent terms and phrases in your text data
• Visualizing – Graphically display the largest terms in your text data
• Analyzing – Reduce dimensionality down to the most important terms and topics
• Modeling – Incorporate the learnings into better reliability and predictive models
Modern Analytic Software Makes It Easy for Reliability Practitioners to Learn and Apply Text Exploration
• This presentation uses JMP Pro Statistical Discovery Software (version 13.2) from SAS Institute for the analysis
Important Text Exploration Definitions
• Important Text Exploration Terms:
  • Document – an individual body of text (each row of text)
  • Corpus – the set of documents (all rows in a text column)
  • Term – the unit of analysis (a single word or a multiword phrase)
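To make the three definitions concrete, here is a minimal Python sketch. The corpus is made up for illustration, and the `terms` helper is hypothetical: it treats single words and contiguous two-word phrases as terms, which is one simple reading of "single or multiword phrase", not JMP's own phrase detection.

```python
import re

# A tiny made-up corpus: each string is one document (one row of a text column).
corpus = [
    "BIOS hangs at POST after update",
    "Power supply failed during test",
    "BIOS update fails on IPMI check",
]


def terms(document, max_phrase_len=2):
    """Return single-word terms followed by multiword phrases of up to max_phrase_len words."""
    words = re.findall(r"[a-z0-9]+", document.lower())
    found = []
    for n in range(1, max_phrase_len + 1):
        for i in range(len(words) - n + 1):
            found.append(" ".join(words[i:i + n]))
    return found


document_terms = [terms(doc) for doc in corpus]  # one term list per document
```

Raising `max_phrase_len` admits longer phrases at the cost of many rare, low-value terms.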
Background & Introduction
• Case Background:
  • BIOS Time Delta free text – typical computer warranty data for failures found in test
• Explore the Case Data:
  • Columns: Model, Issue ID, Phase Found, Severity, Submit Date/Time, Close Date/Time, Time Delta, Text (unstructured free text)
• Findings:
  • Some categorical and continuous columns allow drilldown on failures
  • But valuable information is in the comments (unstructured free text)
Case Example: Case Data Snapshot & Graph
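The case data carries a Time Delta column alongside the submit and close timestamps, but the slide does not define it. Assuming it is the elapsed time from submission to closure, it could be derived as below. The row values and the timestamp format are hypothetical; only the column names come from the case data.

```python
from datetime import datetime

# Made-up rows; the column names mirror the case data, the values do not.
rows = [
    {"Issue ID": 1, "Submit Date/Time": "2018-03-01 08:00", "Close Date/Time": "2018-03-03 20:00"},
    {"Issue ID": 2, "Submit Date/Time": "2018-03-02 09:30", "Close Date/Time": "2018-03-02 15:30"},
]

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format


def time_delta_hours(row):
    """Elapsed hours from submit to close (an assumed definition of Time Delta)."""
    submit = datetime.strptime(row["Submit Date/Time"], FMT)
    close = datetime.strptime(row["Close Date/Time"], FMT)
    return (close - submit).total_seconds() / 3600


for row in rows:
    row["Time Delta"] = time_delta_hours(row)
```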
Summarizing Text Data
• Objective:
  • Create a list of the most commonly occurring terms in your text data.
• Tokenizing:
  • Parse the text into terms, or tokens
  • Uses built-in libraries and techniques to recognize patterns (domain names, money, words, HTML links, times, numbers, etc.)
• Stemming:
  • Strips word endings (such as “s”, “ed”, and “ing”) and combines the variants into a root term
  • Example: Fail = Fails, Failed, Failing, etc.
Case Example: Common Text Exploration Set-Up
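Tokenizing, stemming, and counting can be sketched in a few lines of Python. This is an illustration under stated assumptions, not JMP's implementation. The regex patterns stand in for the built-in tokenizing libraries. The suffix stripper is deliberately crude, far simpler than real stemmers such as Porter or Snowball, and is included only to show how variants collapse to one root term.

```python
import re
from collections import Counter

# Hypothetical token patterns (NOT JMP's built-in libraries); alternatives are tried left to right.
TOKEN_RE = re.compile(
    r"https?://\S+"        # web links
    r"|\d+(?:\.\d+)?"      # numbers such as 3 or 13.2
    r"|[a-z][a-z0-9/]*"    # words, allowing digits and slashes (e.g. drac3/xt)
)

# Crude suffix list for illustration only.
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Lowercase the text and split it into tokens."""
    return TOKEN_RE.findall(text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least four characters of the root."""
    if token.endswith("ss"):  # leave words like "process" alone
        return token
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def term_counts(corpus):
    """Count stemmed terms across all documents in the corpus."""
    counts = Counter()
    for document in corpus:
        counts.update(stem(token) for token in tokenize(document))
    return counts
```

For example, `term_counts(["System fails", "Failed to boot", "failing BIOS"])` counts “fail” three times, while the four-character minimum keeps “bios” intact.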
Preparing the Text Data
• Objective:
  • Create a list of terms and phrases
• Build on Terms/Phrases:
  • Add to and combine phrases and terms
• Stop Words:
  • Remove terms or phrases you don’t want
• Case Results:
  • Many phrases can be combined, such as those related to “Power”
Case Example: Terms & Phrases After Combining “Power” Related Phrases
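Stop-word removal and phrase combining amount to a filter plus a recode map applied to the term counts. In this minimal sketch, the stop list and the “Power” recode entries are hypothetical examples. In practice, you would build both by reviewing your own term list.

```python
from collections import Counter

# Hypothetical stop list and recode map, built by reviewing the term list.
STOP_WORDS = {"the", "a", "an", "to", "of", "on", "at", "during"}
RECODE = {  # map variant phrases onto one combined term
    "power supply": "power",
    "power cycle": "power",
    "psu": "power",
}


def prepare(term_counts):
    """Drop stop words and merge recoded terms, summing their counts."""
    prepared = Counter()
    for term, count in term_counts.items():
        if term in STOP_WORDS:
            continue
        prepared[RECODE.get(term, term)] += count
    return prepared
```

Merging counts, rather than simply renaming terms, is what lets a combined term such as “power” rise to the top of the list.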
Visualizing Text Data
• Investigate Large Terms and Phrases:
  • Drill down to the cases (rows) that contain specific terms and phrases
• Word Cloud:
  • Creates a cloud of the words, sized by frequency
  • Optionally add a color gradient, also by frequency or by another factor
• Case Results:
  • A large “Bios” effect
Case Example: Word Cloud
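A word cloud's core step is mapping each term's frequency to a display size. The sketch below assumes a simple linear scale between a minimum and maximum font size. The `font_sizes` helper and its default sizes are hypothetical, layout is out of scope, and JMP may scale sizes differently.

```python
def font_sizes(term_counts, min_size=10, max_size=48):
    """Scale each term's font size linearly between min_size and max_size by its frequency."""
    lo, hi = min(term_counts.values()), max(term_counts.values())
    span = hi - lo or 1  # avoid dividing by zero when all counts are equal
    return {
        term: min_size + (count - lo) * (max_size - min_size) / span
        for term, count in term_counts.items()
    }
```

The same normalized value, `(count - lo) / span`, could also drive a color gradient.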
Analyzing Text Results
• Create Indicator Columns:
  • Save desired terms or phrases as indicator columns
  • Filter by other columns
• Create a Document Term Matrix (DTM):
  • Transform the prepared text into a DTM, which is easier to analyze
  • Creates an indicator or scored column for each term in the data table
Case Example: Reduced DTM Snapshot
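A document term matrix has one row per document and one column per term. This minimal sketch builds binary indicator columns and raw-frequency “scored” columns. The function name and the two weighting labels are illustrative assumptions. JMP offers its own set of weighting options.

```python
def document_term_matrix(tokenized_docs, vocabulary, weighting="binary"):
    """Rows = documents, columns = vocabulary terms.

    weighting="binary" gives 0/1 indicator columns; "frequency" gives raw counts.
    """
    matrix = []
    for tokens in tokenized_docs:
        if weighting == "binary":
            present = set(tokens)
            matrix.append([1 if term in present else 0 for term in vocabulary])
        elif weighting == "frequency":
            matrix.append([tokens.count(term) for term in vocabulary])
        else:
            raise ValueError(f"unknown weighting: {weighting}")
    return matrix
```

Restricting `vocabulary` to the prepared top terms gives a reduced DTM.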
Analyzing Text Data Cont.
• Cluster (Group) the Documents Using Multivariate Methods:
  • Latent Semantic Analysis with SVD
    • Performs a singular value decomposition (SVD) on the DTM (similar to PCA)
    • SVD Scatterplot Matrix – shows large differences in dimensional distance among documents and terms, and enables drilldown
  • Topic Analysis
    • Groups words into themes, or topics; similar to factor analysis
• Case Results:
  • “Bios” documents and terms stand out from the others in the SVD matrix
  • Drilldown and topic analysis results show that Topic 1 contains terms commonly associated with Bios failures (IPMI, DRAC3/XT, etc.)
Case Example: SVD Matrix & Topic Analysis
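Latent semantic analysis keeps the first k singular vectors of the DTM. Document and term coordinates in that k-dimensional space are what a scatterplot matrix would display. This sketch uses NumPy's SVD on the raw, uncentered DTM. Centering the columns first would make it equivalent to PCA. The slide does not say whether JMP centers or weights the DTM before decomposing it.

```python
import numpy as np


def lsa(dtm, k=2):
    """Latent semantic analysis: truncated SVD of a document-term matrix.

    Returns document coordinates (U_k * S_k), term coordinates (V_k * S_k),
    and all singular values.
    """
    X = np.asarray(dtm, dtype=float)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    doc_coords = U[:, :k] * s[:k]
    term_coords = Vt[:k].T * s[:k]
    return doc_coords, term_coords, s
```

Documents that share terms land close together. In a DTM where two rows contain the same terms, those two documents receive identical coordinates, and a document with disjoint terms lands far from them.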
Modeling with Text Data
• Reliability Modeling for Bios:
  • Use “CloseDate/Time” as the time to failure
  • Use the “bios Binary” DTM column as the censor variable
  • The software indicated a better reliability fit with an underlying Fréchet distribution
  • Now ready to use the Quantile Profiler interactively to answer Bios reliability questions
Case Example: Reliability Model Setup & Results
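The reliability step can be sketched as a right-censored maximum-likelihood fit of a two-parameter Fréchet distribution, with CDF F(t) = exp(-(t/scale)^(-shape)) for t > 0. The sketch makes several assumptions. Times are positive elapsed durations, so calendar timestamps must first be converted. A row whose “bios Binary” indicator is 1 counts as a Bios failure, and every other row is treated as censored. JMP may report the Fréchet fit in a different parameterization. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def frechet_cdf(t, shape, scale):
    """F(t) = exp(-(t/scale)^(-shape)) for t > 0."""
    return np.exp(-(np.asarray(t, dtype=float) / scale) ** (-shape))


def frechet_quantile(p, shape, scale):
    """Time by which a fraction p of units is expected to have failed (inverse of the CDF)."""
    return scale * (-np.log(p)) ** (-1.0 / shape)


def neg_log_likelihood(log_params, times, failed):
    """Right-censored negative log-likelihood; parameters are on the log scale to stay positive."""
    shape, scale = np.exp(log_params)
    z = (times / scale) ** (-shape)
    log_pdf = np.log(shape / scale) - (shape + 1) * np.log(times / scale) - z
    log_survival = np.log(-np.expm1(-z))  # log(1 - F(t)) for censored rows
    return -(log_pdf[failed].sum() + log_survival[~failed].sum())


def fit_frechet(times, failed):
    """Maximum-likelihood (shape, scale); failed[i] is True for a failure, False if censored."""
    times = np.asarray(times, dtype=float)
    failed = np.asarray(failed, dtype=bool)
    start = np.log([1.0, np.median(times)])
    result = minimize(neg_log_likelihood, start, args=(times, failed), method="Nelder-Mead")
    shape, scale = np.exp(result.x)
    return shape, scale
```

With a hypothetical indicator array `bios_binary`, the fit would be `fit_frechet(hours, bios_binary == 1)`. The function `frechet_quantile(0.1, shape, scale)` then plays the role of a quantile-profiler query: by when are 10% of units expected to show a Bios failure?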
Modeling with Text Data Cont.
Case Example: Severity Based on Topics
• Predictive Modeling for the Top Ten Saved Terms:
  • Model of severity based on the top DTM terms – interactive ordinal logistic regression
  • Allows prediction of severity based on the active failure terms
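Ordinal logistic regression of severity on saved term indicators can be sketched as a proportional-odds (cumulative logit) model, P(severity ≤ j) = logistic(cut_j − x·β), fitted by maximum likelihood. This is a hand-rolled illustration, not JMP's implementation. It assumes severity levels are coded 0 to K−1 with 0 least severe, and X holds one 0/1 column per saved term. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def unpack(params, n_features):
    """Split the parameter vector into term coefficients and ordered cut points."""
    beta = params[:n_features]
    # The first cut point is free; each later one adds a positive gap (exp keeps it > 0).
    gaps = np.exp(params[n_features + 1:])
    cuts = np.cumsum(np.concatenate(([params[n_features]], gaps)))
    return beta, cuts


def predict_proba(params, X):
    """P(severity = j | term indicators) under the proportional-odds model."""
    X = np.asarray(X, dtype=float)
    beta, cuts = unpack(params, X.shape[1])
    # Cumulative probabilities P(Y <= j) = logistic(cut_j - x . beta), shape (n, K-1).
    cum = expit(cuts[None, :] - (X @ beta)[:, None])
    n = X.shape[0]
    cum = np.hstack([np.zeros((n, 1)), cum, np.ones((n, 1))])
    return np.diff(cum, axis=1)  # shape (n, K); each row sums to 1


def fit_ordinal_logit(X, y, n_levels):
    """Maximum-likelihood fit; y holds severity levels coded 0..n_levels-1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    start = np.zeros(X.shape[1] + n_levels - 1)

    def neg_log_likelihood(params):
        probs = predict_proba(params, X)
        return -np.sum(np.log(probs[np.arange(len(y)), y] + 1e-12))

    return minimize(neg_log_likelihood, start, method="BFGS").x
```

A positive coefficient for a term shifts probability toward higher severity levels whenever that term is active. Calling `predict_proba` on a new row of indicators gives its predicted severity distribution.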
Recap
• Overview:
  • Utilize the unstructured text data that is often included with our warranty/performance data
  • Text exploration of unstructured text better uncovers the real reliability trends
• By Including Unstructured Data in Your Studies:
  • Clearly See Reliability Warranty/Performance Text Patterns and Trends, and Validate Current Approaches
  • Create More Accurate Reliability Models by Incorporating Warranty/Performance Text Analysis