Content Analyzer

Presentation Transcript


  1. Content Analyzer
     UNCLASSIFIED//FOR OFFICIAL USE ONLY
     • Compile industry-standard information
        • Global Dialing Data Solution (GDDS)
        • E.164 (international telephone numbering plan)
        • E.212 (mobile country codes, network codes, and IMSI numbering)
     • Augment with specialized (local) information
        • Names (naming conventions, honorific titles, etc.)
        • Common addressing
     • Make significant use of pattern matching and statistics
        • Frequency distribution
        • Cardinality
        • Sequencing
        • Mathematical functions
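
Slide 1 names frequency distribution, cardinality, and sequencing as core statistics but does not show how they are computed. The following minimal Python sketch computes all three for a single field, assuming the field's values arrive as a list of strings; `profile_field` and its result keys are hypothetical names, not part of the original tool.

```python
from collections import Counter


def profile_field(values):
    """Profile one field given as a list of strings (empty strings are ignored)."""
    non_empty = [v for v in values if v]
    freq = Counter(non_empty)  # frequency distribution of distinct values
    cardinality = len(freq)    # number of distinct non-empty values
    # Sequencing: are the all-digit values in strictly ascending order?
    numeric = [int(v) for v in non_empty if v.isdecimal()]
    ascending = all(a < b for a, b in zip(numeric, numeric[1:]))
    return {
        "count": len(non_empty),
        "cardinality": cardinality,
        # A ratio near 1.0 suggests a key-like field such as a sequence number.
        "cardinality_ratio": cardinality / len(non_empty) if non_empty else 0.0,
        "top_values": freq.most_common(3),
        "ascending": ascending,
    }
```

High cardinality combined with ascending values is the kind of evidence slide 3 lists next to its Internal Sequence Number field.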

  2. Content Analyzer Workflow
     • Input Standardizer
        • doc, pdf, txt, xls, xml -> Text Extractor -> text
        • pdf (image) -> Image Extractor -> OCR -> Text Re-Assembly -> text
     • text -> Content Analyzer -> Database Loader
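
The workflow diagram has two extraction paths that both end in plain text. A minimal Python sketch of that routing, assuming the Input Standardizer is the entry stage and that whether a PDF is image-only is already known; `plan_stages` is hypothetical, and each stage is only a label here, not a real extractor.

```python
from pathlib import Path

# File types the slide sends through the Text Extractor.
TEXT_TYPES = {".doc", ".pdf", ".txt", ".xls", ".xml"}


def plan_stages(filename, image_only_pdf=False):
    """Return the ordered stage names a file would pass through (hypothetical)."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf" and image_only_pdf:
        # Scanned PDFs need OCR before their text can be analyzed.
        extraction = ["Image Extractor", "OCR", "Text Re-Assembly"]
    elif suffix in TEXT_TYPES:
        extraction = ["Text Extractor"]
    else:
        raise ValueError(f"unsupported input type: {suffix or filename}")
    # Both paths yield text, which is then analyzed and loaded.
    return ["Input Standardizer"] + extraction + ["Content Analyzer", "Database Loader"]
```

Keeping the two paths separate until they produce text lets the Content Analyzer treat every source the same way.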

  3. Content Analyzer
     Input (tilde-delimited records):
        ~1028922367~~110509191856~~799277896~AMIN GUL~~412200307862307~ZHAWANDAI~~
        ~1029017905~~110517065408~~700314788~HAYDARI~~412500204345693~HALIL~~
        ~1029133489~~110601234059~~700314788~HAYDARI~~924236260210~~~~
     Stages: Input, Patterns & Statistics, Field ID, Standardization Rules, Pattern Matching, Data Collection, DB Loader
     Field #2 of 13: Internal Sequence Number, D{10} (96.7%)
        • Sample values: 1028922367, 1029017905, 1029133489, LO29117401
        • Known patterns: all digits, D{10} (98.2%); [A-Z]{2}D{8} (1.8%)
        • Length{10}
        • Sequential, ascending
        • Errors: OCR errors, DE errors
        • Pattern stats ……
        • Dedupe
        • High cardinality
        • Confidence level: 100%
        • ….. ….. Time
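
Slide 3 writes field patterns in a compact notation such as D{10} or [A-Z]{2}D{8}. One way to produce that notation is to map each character to a class and run-length encode the classes, as in this minimal Python sketch. The helpers `field`, `shape`, `pattern_stats`, and `ocr_repair` are hypothetical, and the letter-to-digit confusion table in `ocr_repair` is an illustrative assumption rather than the slide's rule.

```python
from collections import Counter
from itertools import groupby


def field(record, n):
    """Return field n (1-based) of a tilde-delimited record."""
    return record.split("~")[n - 1]


def char_class(ch):
    """Map one character to the class notation used on the slide."""
    if "0" <= ch <= "9":
        return "D"
    if "A" <= ch <= "Z":
        return "[A-Z]"
    if "a" <= ch <= "z":
        return "[a-z]"
    return ch  # punctuation and spaces are kept as literals


def shape(value):
    """Generalize a value to a pattern such as D{10} or [A-Z]{2}D{8}."""
    parts = []
    for cls, run in groupby(char_class(ch) for ch in value):
        n = len(list(run))
        parts.append(cls if n == 1 else f"{cls}{{{n}}}")
    return "".join(parts)


def pattern_stats(values):
    """Share of non-empty values (in percent) per pattern, most common first."""
    shapes = Counter(shape(v) for v in values if v)
    total = sum(shapes.values())
    return [(p, round(100 * c / total, 1)) for p, c in shapes.most_common()]


# Hypothetical OCR confusion table: letter -> digit it is commonly mistaken for.
OCR_DIGITS = {"O": "0", "L": "1", "I": "1", "S": "5", "B": "8"}


def ocr_repair(value, expected="D{10}"):
    """Replace look-alike letters with digits if that yields the expected pattern."""
    fixed = "".join(OCR_DIGITS.get(ch, ch) for ch in value)
    return fixed if shape(fixed) == expected else value
```

For example, `shape("LO29117401")` gives `[A-Z]{2}D{8}`, and `ocr_repair` turns that value into `1029117401`, which falls between 1029017905 and 1029133489 in the ascending sequence.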

  4. Content Analyzer
     Input (tilde-delimited records):
        ~1028922367~~110509191856~~799277896~AMIN GUL~~412200307862307~ZHAWANDAI~~
        ~1029017905~~110517065408~~700314788~HAYDARI~~412500204345693~HALIL~~
        ~1029133489~~110601234059~~700314788~HAYDARI~~924236260210~PCIL~~
     Field checks:
        • Timestamp (YYMMDDHHMMSS): frequency distribution
        • Afghan mobile phone: length match, MPC match, multiple phone pairing
        • Subscriber: surname match
        • IMSI (GDDS): length match, MCC match, subscriber match (E.212)
        • Phone: length match
        • Surname match (GDDS)
     Output: DB Loader
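
Several of the slide 4 checks reduce to simple field tests. A minimal Python sketch, assuming 15-digit IMSIs, MCC 412 (Afghanistan, per ITU-T E.212) as the expected country code, and 9-digit Afghan mobile numbers beginning with 7; the function names are hypothetical, and a real system would take these reference values from GDDS or E.212 tables rather than constants.

```python
from datetime import datetime

# Assumed reference values; a real system would load them from GDDS/E.212 tables.
EXPECTED_MCC = "412"     # E.212 mobile country code for Afghanistan
IMSI_LENGTH = 15         # E.212 allows at most 15 digits; most IMSIs use all 15
AF_MOBILE_LENGTH = 9     # assumed length of an Afghan mobile number without country code


def parse_timestamp(value):
    """Parse a YYMMDDHHMMSS field such as 110509191856."""
    return datetime.strptime(value, "%y%m%d%H%M%S")


def imsi_length_match(value):
    """True if the value is an all-digit, full-length IMSI."""
    return value.isdecimal() and len(value) == IMSI_LENGTH


def imsi_mcc_match(value, mcc=EXPECTED_MCC):
    """True if a full-length IMSI starts with the expected 3-digit MCC."""
    return imsi_length_match(value) and value[:3] == mcc


def af_mobile_length_match(value):
    """True if the value has the assumed shape of an Afghan mobile number (7XXXXXXXX)."""
    return value.isdecimal() and len(value) == AF_MOBILE_LENGTH and value.startswith("7")
```

Under this format the slide's 110509191856 parses to 2011-05-09 19:18:56, because Python's `%y` maps two-digit years 00 to 68 onto 2000 to 2068.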
