Portuguese BI-RADS Features Extractor for Mammogram Radiologist Structured Reports

Extracting BI-RADS Features from Portuguese Clinical Texts

H. Nassif, F. Cunha, I.C. Moreira, R. Cruz-Correia, E. Sousa, D. Page, E. Burnside, and I. Dutra

University of Wisconsin – Madison, and University of Porto, Portugal

The American Cancer Society, Cancer Facts & Figures 2009.

[Diagram: Mammogram; Radiologist; Structured Database; Impression (free text); Predictive Model; Benign / Malignant]

BI-RADS Lexicon Concepts

Example
• In the right breast, an approximately 1.0 cm mass is identified in the right upper slightly inner breast. This mass is noncalcified and partially obscured and lobulated in appearance.

Nassif 09

Syntax Analyzer
• Tokenize sentences
• Discard punctuation
• Keep stop words
• Stem words
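The four steps above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names, the `SUFFIXES` list, and the sample sentences are assumptions, and the suffix stripper is a deliberately crude stand-in for a real Portuguese stemmer.

```python
import re

# Hypothetical, deliberately crude suffix list (longest first).
# A real system would use an established Portuguese stemmer instead.
SUFFIXES = ("mente", "ções", "ção", "ados", "adas", "ado", "ada",
            "os", "as", "es", "o", "a", "s")


def split_sentences(text):
    """Split on sentence-ending punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase and keep runs of word characters, discarding punctuation."""
    return re.findall(r"\w+", sentence.lower())


def stem(token, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Return one list of stemmed tokens per sentence; stop words are kept."""
    return [[stem(t) for t in tokenize(s)] for s in split_sentences(text)]


if __name__ == "__main__":
    # Made-up sample text, not taken from the dataset.
    for tokens in analyze("Sem microcalcificações suspeitas. Não se observam nódulos."):
        print(tokens)
```

Stop words such as "sem" and "não" are kept on purpose, since they serve as negation triggers later in the pipeline.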


Information from Lexicon
• Translate lexicon into Portuguese
• Lexicon specifies synonyms, e.g. Equal density, Isodense
• Lexicon allows for ambiguous wording:


Experts
• Provide domain-specific information
  • Synonyms: Oval, Ovoid
  • Acronyms, abbreviations
  • Domain idiosyncrasies
• Interact with and modify semantic rules


Concept Finder
• Regular expression rules
• Extract concepts from text
• Rule formation:
  • Initial rules based on lexicon
  • Rules refined by experts
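A concept finder of this kind can be pictured as a table of regular expressions, one per concept, applied to each report. The sketch below is an assumption-laden illustration: the `RULES` table, the concept names, and `find_concepts` are hypothetical, and the patterns are in English for readability, using the synonyms mentioned on the earlier slides.

```python
import re

# Hypothetical rule table: concept name -> regular expression.
# Synonym pairs ("oval"/"ovoid", "equal density"/"isodense") come from
# the slides; the concept names themselves are made up for illustration.
RULES = {
    "oval": re.compile(r"\b(?:oval|ovoid)\b", re.IGNORECASE),
    "equal_density": re.compile(r"\b(?:equal density|isodense)\b", re.IGNORECASE),
    "obscured": re.compile(r"\bobscured\b", re.IGNORECASE),
    "lobulated": re.compile(r"\blobulated\b", re.IGNORECASE),
}


def find_concepts(text):
    """Return the sorted names of all concepts whose rule matches the text."""
    return sorted(name for name, rule in RULES.items() if rule.search(text))


if __name__ == "__main__":
    report = ("This mass is noncalcified and partially obscured "
              "and lobulated in appearance.")
    print(find_concepts(report))
```

Keeping the rules in a plain table makes it easy for experts to inspect and refine individual patterns without touching the matching code.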

Rule Generation Example 1
• Aim: Regional Distribution Concept
• Lexicon specifies the word “regional”
• Initial rule: presence of the word “regional”
• Run on training set, experts see results
• Many false positives:
  • “regional medical center”, “regional hospital”
• Rule refined by experts:
  • “regional .* !(medical|hospital)”
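The refined rule on the slide is written in an informal notation. One possible reading, assumed here, is "the word regional, not immediately followed by medical or hospital", which Python's `re` module can express with a negative lookahead. The names `INITIAL_RULE`, `REFINED_RULE`, and `matches`, and the sample sentences, are hypothetical.

```python
import re

# Initial rule from the lexicon: the word "regional" anywhere.
INITIAL_RULE = re.compile(r"\bregional\b", re.IGNORECASE)

# Assumed reading of the refined rule: "regional" not directly followed
# by "medical" or "hospital" (negative lookahead).
REFINED_RULE = re.compile(
    r"\bregional\b(?!\s+(?:medical|hospital)\b)", re.IGNORECASE
)


def matches(rule, sentence):
    """True if the rule fires anywhere in the sentence."""
    return rule.search(sentence) is not None


if __name__ == "__main__":
    # Made-up sentences built around the phrases quoted on the slide.
    for sentence in (
        "Calcifications in a regional distribution.",
        "Prior films from the regional medical center.",
        "Patient referred by the regional hospital.",
    ):
        print(matches(INITIAL_RULE, sentence), matches(REFINED_RULE, sentence), sentence)
```

The initial rule fires on all three sentences, while the refined rule fires only on the first.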

Rule Generation Example 2
• Aim: Skin Thickening Concept
• Lexicon specifies “skin thickening”
• Try “skin” and “thickening” in same sentence
  • “skin retraction and thickening”
  • “thickening of the overlying skin”
  • “A BB placed on the skin overlying a palpable focal area of thickening in the upper outer right breast”
• Experts suggest “skin” and “thickening” in close proximity

Scope
• Scope: distance between two words
• Start with a large scope:
  • assess number of true and false positives
• Move to smaller scopes:
  • assess number of false negatives
• Check precision and recall estimates
• Experts decide on the best distance
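The scope search can be sketched as a sweep over candidate distances, computing precision and recall at each one. Everything below is illustrative: the function names are hypothetical, the three sentences are the skin thickening phrases quoted earlier, and the gold labels are an assumption (the first two treated as true mentions, the third as a false positive).

```python
def within_scope(tokens, word_a, word_b, scope):
    """True if word_a and word_b occur at most `scope` tokens apart, in either order."""
    pos_a = [i for i, t in enumerate(tokens) if t == word_a]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b]
    return any(abs(i - j) <= scope for i in pos_a for j in pos_b)


def precision_recall(predicted, gold):
    """Compute (precision, recall) from parallel lists of booleans."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def sweep(sentences, gold, word_a, word_b, scopes):
    """Map each candidate scope to its (precision, recall) on the labeled sentences."""
    results = {}
    for scope in scopes:
        predicted = [
            within_scope(s.lower().split(), word_a, word_b, scope)
            for s in sentences
        ]
        results[scope] = precision_recall(predicted, gold)
    return results


if __name__ == "__main__":
    sentences = [
        "skin retraction and thickening",
        "thickening of the overlying skin",
        "A BB placed on the skin overlying a palpable focal area of "
        "thickening in the upper outer right breast",
    ]
    gold = [True, True, False]  # assumed labels, for illustration only
    for scope, (p, r) in sweep(sentences, gold, "skin", "thickening", (7, 4, 3)).items():
        print(scope, round(p, 2), round(r, 2))
```

On this toy input a large scope admits the false positive, a very small one misses a true mention, and an intermediate value separates the two, which is the trade-off the experts are asked to judge.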


Negation Detector
• Negation triggers (Mutalik 01, Gindl 08):
  • “não” (not) when not preceded by “onde” (where)
  • “sem” (without)
  • “nem” (nor)
• Precedes or appears within the subsentence
• Establish negation scope
  • “without evidence of suspicious cluster of microcalcifications”
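A minimal sketch of the trigger logic follows. The three triggers and the "onde" exception come from the slide; everything else is an assumption made for illustration: subsentences are split on commas and semicolons, a concept counts as negated when a trigger precedes it within a fixed token window in the same subsentence, and the default window of 6 tokens is arbitrary.

```python
import re

SIMPLE_TRIGGERS = {"sem", "nem"}


def trigger_positions(tokens):
    """Indices of negation triggers; "não" is skipped when preceded by "onde"."""
    positions = []
    for i, tok in enumerate(tokens):
        if tok in SIMPLE_TRIGGERS:
            positions.append(i)
        elif tok == "não" and (i == 0 or tokens[i - 1] != "onde"):
            positions.append(i)
    return positions


def is_negated(sentence, concept_word, scope=6):
    """True if a trigger precedes concept_word within `scope` tokens
    in the same subsentence (assumed to be delimited by commas or semicolons)."""
    for sub in re.split(r"[,;]", sentence.lower()):
        tokens = re.findall(r"\w+", sub)
        triggers = trigger_positions(tokens)
        for j, tok in enumerate(tokens):
            if tok == concept_word and any(0 < j - i <= scope for i in triggers):
                return True
    return False


if __name__ == "__main__":
    # Made-up Portuguese sentences, not taken from the dataset.
    print(is_negated("Sem evidência de microcalcificações suspeitas.", "microcalcificações"))
    print(is_negated("Observam-se microcalcificações agrupadas.", "microcalcificações"))
```

Restricting the search to one subsentence keeps a trigger in one clause from negating a finding reported in the next.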

Dataset
• Training set: 1,129 reports, unlabeled
• Testing set: 153 pairs, labeled by radiologist
  • Basic screening report
  • Detailed diagnostic report
• Perform three refinement passes
  • Double blind, based on lexicon
  • Refine rules
  • Refine manual labeling and rules

Results

Conclusion
• Out of 48 disputed cases, the parser correctly classified 25 (52.1%)
• First Portuguese BI-RADS extractor
• Discovers features missed or misclassified
• Similar performance to manual annotation
• Method portable to other languages
