Applying Natural Language Generation to Electronic Health Records in an e-Science context

This article discusses the use of Natural Language Generation (NLG) in the CLEF project to query and generate tailored reports from data-encoded patient histories in electronic health records. It provides an overview of the CLEF project, its aim to provide a repository of well-organized clinical data, and its role in facilitating in silico medical research.


Presentation Transcript


  1. Applying Natural Language Generation to Electronic Health Records in an e-Science context Donia Scott Centre for Research in Computing The Open University

  2. Outline • Background: the CLEF project • Patient records as data-encoded patient histories • Role of NLG in CLEF • Intuitive querying with natural language • Generating tailored reports from CLEF data

  3. Background: the CLEF project • CLEF (Clinical E-Science Framework) is an MRC-funded project aiming at providing a repository of well organised data-encoded clinical histories • Aim: to provide the framework for a new type of medical research: in silico experiments • Partners: • NLP: OU, Sheffield • Medical informatics: Manchester • Electronic Health Records: Royal Marsden Hospital, UCL • Privacy/confidentiality: Cambridge

  4. • Collect clinical information from multiple sites • Analyse, structure and integrate it • Make it available, using GRID tools • To authorised clinicians and e-Health scientists • In a secure and ethical collaborative framework

  5. Data from: • Referral letters • Review notes • Lab results • Nurse notes • Hospital admission notes • Hospital discharge notes • Treatment notes • Surgery reports [Diagram labels: Chronicle; Organised data on individual patients; The CLEF repository]

  6. The CLEF Chronicle Representing the story of a patient over time

  7. The story of an illness [Figure: chronicle graph with nodes Human:1382, Pain:5735, Ulcer:1945, Breast:1492, Clinic:4096, Clinic:1024, Clinic:2010, Biopsy:1066, Radio:1812, Chemo:6502, Mass:1666 and Cancer:1914, connected by the relations locus, reason, attends, time, plans, finding, target and treats]

  8. A typical cancer patient [Figure: the patient's chronicle shown as seven tables (Problems, Interventions, Investigations, Drugs, Loci, Relations, Consults) with the row counts ~200, 15, ~100, ~5, ~20, ~600 and ~10 displayed on the slide. Column names include SimID, ID, EventStartDate, EventEndDate, Name, Status, Outcome, Type, Location, Item1Type, Item1ID, Relation, Item2Type and Item2ID. Sample rows record, for example, a completed lumpectomy with complete excision, chemotherapy cycles, follow up clinics, and relations such as HAS_LOCUS, HAS_TARGET, HAS_FINDING, INDICATED_BY, RECOMMENDED_BY and ARRANGED]

  9. The role of NLG • an intuitive query interface to provide efficient access to aggregated data-encoded patient histories for: • Assisting in diagnosis and treatment • Identifying patterns in treatment • Selecting subjects for clinical trials • generating reports from the data-encoded histories, for clinicians to use at the point of care.

  10. Intuitive querying of the CLEF repository

  11. What does the CLEF database provide • Evidence from about 20,000 patient records, comprising 3.5 million record components (about 5GB of data). These are all in the area of cancer. • 162 queriable fields • various text-only records (non-queriable) • Two types of data: • Structured • Extracted from narratives by IE • Queriable data is encoded according to various medical terminologies (SNOMED, ICD, UMLS) • There are approximately 19,500 different medical codes currently used in the database (a relatively small subset of SNOMED and ICD)

  12. Queriable data • Structured data: • Demographics: • Age, gender, postal district, ethnic group, occupation • Laboratory findings: • 32 types of haematology findings • 51 types of chemistry findings • Cytology reports • Histopathology reports • Imaging studies: • Radiology procedure, site, diagnosis, morphology, topography, report, indication, department • Treatments: • Prescription drugs • Chemotherapy protocol • IV chemotherapy • Radiotherapy • Surgical procedures • Diagnoses • Clinical diagnosis • Cause(s) of death • Data extracted from narratives

  13. Query interface requirements • Designed for: • casual and moderate users, who are familiar with the semantic domain of the repository but not with its technical implementation • Typically clinicians or medical researchers • Should be able to: • Allow the construction of complex queries with nested structures and temporal expressions • Minimise the risk of ambiguities • Offer good coverage of the data types in the CLEF database • Should be used with: • Minimal training • No prior knowledge of medical terminologies, formal querying languages, databases

  14. Typical queries • “How many patients with AML have had a normal count after two cycles of treatment?” • “How many patients with primary breast cancer have relapsed in the last five years?” • “What is the median time between first drug treatment for metastatic breast cancer and death?” • “In breast cancer patients, what is the incidence of lymphoedema of the arm that persists more than two years after primary surgical treatment?” • “What is the average number of x-rays for patients with prostate cancer?” • “What is the average time between first treatment for cervical cancer and death for patients aged less than 60 at death compared with those aged over 60?” • “How many patients between the ages of 40 and 60 when they were first diagnosed with lung cancer had a platelet count higher than 300 but a white cell count lower than 3 before the 4th cycle of any course of chemotherapy they received during treatment?”

  15. Querying alternatives • SQL: • Not appropriate for the typical CLEF user • Requires deep knowledge of the database structure and content, medical terminologies used in the database • Graphical interfaces: • Have to cope with large number of parameters • Nested structures and temporal restrictions are difficult to express • Natural Language interfaces: • More natural and more expressive than formal querying languages, but… • Sensitive to errors in composition, spelling, vocabulary • Normally understand only a subset of natural language • Complex queries are difficult to process • It is difficult to trace the source of errors in the result

  16. The CLEF approach • Similar to Natural Language interfaces; however, the user edits the conceptual meaning of a query instead of its surface text • Allows users to easily construct non-ambiguous queries • Guides the users towards constructing correct queries only (queries compatible with the content of the database) • It is semi-database independent but very domain specific • Based on the Conceptual Authoring (aka WYSIWYM) technique (Power and Scott, 1998) • The query is presented to the user as an interactive text, and it is edited by making selections on various components of the query • Each selection triggers a text re-generation process which results in a new feedback text containing the selection the user made

  17. Query editing

  18. Modelling queries • There are 4 distinct sections of a query: • A description of the subjects (in terms of demographics information and basic diagnosis) • A description of treatments that the subjects received • A description of laboratory findings • An outcome section (what do we want from the group of patients we have just described) • Each query element can be expressed as a conjunction or disjunction of same-type query elements, e.g.,: • Cancer of the breast and of the lung • Patients who received chemotherapy and radiotherapy • Some query elements can be temporally related to each other, e.g.,: • Patients who received chemotherapy within 5 months of surgery • Patients alive 5 years after the diagnosis
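
The four-section query model on this slide could be represented roughly as follows. This is only a sketch: the class and field names (`Query`, `Element`, `Temporal`) are hypothetical and do not come from the CLEF implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Temporal:
    """Hypothetical temporal link, e.g. 'within 5 months of surgery'."""
    relation: str  # e.g. "within", "after"
    amount: int
    unit: str      # e.g. "months", "years"
    anchor: str    # the event this element is related to


@dataclass
class Element:
    """A conjunction or disjunction of same-type query elements."""
    kind: str                  # e.g. "diagnosis", "treatment"
    values: List[str]
    connective: str = "and"    # "and" or "or"
    temporal: Optional[Temporal] = None

    def render(self) -> str:
        text = f" {self.connective} ".join(self.values)
        if self.temporal:
            t = self.temporal
            text += f" {t.relation} {t.amount} {t.unit} of {t.anchor}"
        return text


@dataclass
class Query:
    """The four sections named on the slide."""
    subjects: List[Element] = field(default_factory=list)
    treatments: List[Element] = field(default_factory=list)
    findings: List[Element] = field(default_factory=list)
    outcome: str = "number of patients"


q = Query(
    subjects=[Element("diagnosis", ["cancer of the breast", "cancer of the lung"])],
    treatments=[Element("treatment", ["chemotherapy"],
                        temporal=Temporal("within", 5, "months", "surgery"))],
)
print(q.subjects[0].render())
print(q.treatments[0].render())
```

Keeping each section as a list of typed elements makes it straightforward to restrict conjunctions and disjunctions to elements of the same type, as the slide requires.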

  19. Constraining user choices • At each step, users are only given correct choices • Choices are context dependent • Patients diagnosed with [some cancer] in [some body part] • User selects [some cancer] => “squamous cell carcinoma” • The interface restricts the choices available for [some body part] to those sites where squamous cell carcinoma can develop
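
The context-dependent restriction of choices can be illustrated with a small lookup. The compatibility table below is invented for illustration (it is not clinical data from CLEF), and the function names are hypothetical.

```python
# Hypothetical compatibility table: cancer type -> body sites to offer.
# The entries are illustrative only, not clinical facts from CLEF.
COMPATIBLE_SITES = {
    "squamous cell carcinoma": ["skin", "lung", "oesophagus"],
    "adenocarcinoma": ["breast", "lung", "colon"],
}
ALL_SITES = sorted({s for sites in COMPATIBLE_SITES.values() for s in sites})


def choices_for_body_part(selected_cancer=None):
    """Return the options to offer for the [some body part] placeholder."""
    if selected_cancer is None:
        return ALL_SITES  # nothing chosen yet: offer every known site
    return COMPATIBLE_SITES[selected_cancer]


def feedback_text(cancer=None, body_part=None):
    """Regenerate the feedback text after each selection."""
    return "Patients diagnosed with {} in {}".format(
        cancer or "[some cancer]", body_part or "[some body part]")


print(feedback_text())
print(feedback_text(cancer="squamous cell carcinoma"))
print(choices_for_body_part("squamous cell carcinoma"))
```

Because the user only ever picks from the returned list, a query that is incompatible with the table cannot be constructed in the first place.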

  20. Dealing with ambiguities • Once a query is constructed, there is only one way it can be interpreted – there is no disambiguation task to be performed • … but users may be misled into constructing a query different from the one they intended

  21. Answer generation • The answer set consists of an age/gender breakdown of the patients that fulfil the query requirements • Each additional clinical feature is combined with the age/gender breakdown to provide more detailed information • 3 types of rendering: • Text • Charts • Table
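
An age/gender breakdown of the kind described here might be computed as below. The ten-year age bands and the record layout are assumptions made for the sketch; the slides do not say how CLEF bins ages.

```python
from collections import Counter


def age_band(age, width=10):
    """Map an age to a band label such as '40-49' (band width is assumed)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def breakdown(patients):
    """Count matching patients per (age band, gender) pair."""
    return Counter((age_band(p["age"]), p["gender"]) for p in patients)


# Illustrative answer set, not real patient data.
patients = [
    {"age": 43, "gender": "F"},
    {"age": 47, "gender": "F"},
    {"age": 61, "gender": "M"},
]
table = breakdown(patients)
for (band, gender), n in sorted(table.items()):
    print(f"{band:>7} {gender} {n}")
```

The same counter could feed any of the three renderings on the slide (text, chart or table); an additional clinical feature would simply become a third component of the key.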

  22. Evaluation • Research questions: • Can the WYSIWYM query formulation method be easily learned by users of CLEF? • Is it easier to formulate CLEF queries in SQL or with the WYSIWYM query formulation method? • Are the interactive feedback texts ambiguous?

  23. Evaluation results show that… • The CLEF Conceptual Authoring query interface works! • The method is easily acquired. • Investigation shows that it is much easier to use than current alternatives (viz. SQL). • The feedback texts tend to be easily understood. • It is a viable solution for querying the CLEF repository. • However ….

  24. Unresolved issues • Are the queries we currently support really the ones users will want to ask? • Does the query interface provide sufficient data coverage?

  25. Generating reports from the CLEF repository

  26. The context • We aim at generating reports from the data-encoded Electronic Patient Records • Our reports are aimed at clinicians for use at the point of care • Various types of report work on the same input (roughly the same content) but express information from different viewpoints • We address the problem of conceptual restatement in generating summarised reports

  27. Typical input [Figure: the chronicle tables of a typical cancer patient (Problems, Interventions, Investigations, Drugs, Loci, Relations, Consults), as shown on slide 8]

  28. Why are textual reports needed? • Clinicians and other health professionals use patient health summaries at the point of care, where time is a critical resource • Reports provide quick access to an overview of a patient’s medical history • Typically, an electronic patient record contains around 1000 messages • Even structured, this volume of data is very large • Access to relevant information about particular patients is difficult • Textual reports: • are easy to read and understand • can be customised to the type of information needed • provide a quick way of identifying errors in the patient record • alleviate the need to know in detail the structure of the underlying database

  29. Why are paraphrases needed? • Alternative views of the patient record, i.e., Reports from various viewpoints: • Full chronological reports • Summaries of investigations, interventions, treatments • Same content, different textual representation • Potted summaries also important (30-second overview of patient’s history)

  30. Content selection • Two notions: • Spine events: the main concepts in the summary (depending on user-defined type of summary) • Skeleton events: linked to the spine by various relations • Basic procedure: • Step 1: group linked events into clusters and remove small clusters • Typically, a small number of very large clusters and a small number of small clusters • Small clusters are assumed not to be related to the main topic of the summary • Step 2: Identify spine events according to the type of summary • Longitudinal, Investigations, Interventions, Problems • Step 3: Identify the skeleton events • If (“problem is spine event” and “investigation has_indication problem”) then select investigation (unless already selected) • Repeat step 2 a certain number of times (given by a threshold parameter)
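
The three steps above can be sketched as follows. This is a simplification under stated assumptions: clusters are taken to be connected components of the relation graph, skeleton events are reached through any relation (the slide's rule is specific to relation types), and the event "flu jab" is an invented unrelated event.

```python
from collections import defaultdict


def clusters(events, relations):
    """Step 1: group event ids into connected components over the relations."""
    neighbours = defaultdict(set)
    for a, _name, b in relations:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, result = set(), []
    for start in events:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        result.append(component)
    return result


def select(events, relations, spine_type, min_size=2, depth=1):
    """events: id -> type; relations: (id, name, id) triples."""
    big = [c for c in clusters(events, relations) if len(c) >= min_size]
    kept = set().union(*big)                                   # drop small clusters
    spine = {e for e in kept if events[e] == spine_type}       # step 2
    selected = set(spine)
    for _ in range(depth):                                     # step 3, repeated
        frontier = set()
        for a, _name, b in relations:
            if a in selected and b in kept:
                frontier.add(b)
            if b in selected and a in kept:
                frontier.add(a)
        selected |= frontier
    return spine, selected - spine


events = {"pain": "problem", "mammogram": "investigation", "lump": "problem",
          "biopsy": "investigation", "cancer": "problem",
          "radiotherapy": "intervention", "flu jab": "intervention"}
relations = [("mammogram", "has_indication", "pain"),
             ("mammogram", "has_finding", "lump"),
             ("biopsy", "has_indication", "lump"),
             ("biopsy", "has_finding", "cancer"),
             ("radiotherapy", "treats", "cancer")]
spine, skeleton = select(events, relations, "problem")
print(sorted(spine), sorted(skeleton))
```

The `depth` argument plays the role of the threshold parameter mentioned on the slide: a larger value pulls in events further from the spine.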

  31. Spine of Problem events

  32. Problem: The patient identifies pain in the left breast. A lump in the breast is found through a mammogram. A biopsy performed on the breast reveals cancer in the left breast. The patient receives radiotherapy to treat the cancer. Skin ulceration develops in the left breast as a result of radiotherapy, which is treated with hyperbaric oxygenation. [Figure: event graph with nodes pain, breast, mammogram, lump, biopsy, cancer, radiotherapy, radiotherapy cycle, ulcer and hyperbaric oxygenation]

  33. Interventions: Radiotherapy on the breast is initiated to treat cancer in the breast. A first radiotherapy cycle is performed. The radiotherapy causes skin ulceration. The patient receives hyperbaric oxygenation to treat the ulcer. [Figure: the same event graph]

  34. Investigations: A mammogram is performed because of pain in the left breast, which identifies a lump in the breast. A biopsy of the lump identifies cancer in the left breast. [Figure: the same event graph]

  35. [Figure: the same event graph shown three times, labelled Investigations, Interventions and Problem]

  36. Discourse structuring • Mostly given by relations in the EPR • 19 different types of relations, which can be: • Attributive: Problem has_locus Locus • Rhetorical: Problem caused_by Intervention • Attributive relations do not contribute to the discourse structure • In a first step, events linked through attributive relations are combined: Message_Problem+Message_Locus => Message_Problem_Locus • Messages are grouped according to type of summary: • Longitudinal: events occurring in the same week should be grouped together and further grouped into years • Logical: arrange chronologically and then group similar events (e.g., liver panels, screening consults)
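
A minimal sketch of the two operations on this slide: folding attributive relations into a single message, then grouping messages by week and year for a longitudinal report. The message layout is hypothetical, and the year rule (`max(1, week // 52)`) is an assumption chosen only because it reproduces the week and year labels in the sample report later in the deck.

```python
from itertools import groupby

ATTRIBUTIVE = {"has_locus"}  # assumed subset of the 19 relation types


def merge_attributive(messages, relations):
    """Message_Problem + Message_Locus => Message_Problem_Locus."""
    merged = {mid: dict(m) for mid, m in messages.items()}
    absorbed = set()
    for src, name, dst in relations:
        if name in ATTRIBUTIVE:
            merged[src]["locus"] = messages[dst]["name"]
            absorbed.add(dst)
    return {mid: m for mid, m in merged.items() if mid not in absorbed}


def group_longitudinal(messages, weeks_per_year=52):
    """Group messages by week, then group weeks into years."""
    ordered = sorted(messages.values(), key=lambda m: m["week"])
    report = {}
    for week, items in groupby(ordered, key=lambda m: m["week"]):
        year = max(1, week // weeks_per_year)  # assumed labelling rule
        report.setdefault(year, {})[week] = list(items)
    return report


messages = {
    1: {"name": "cancer", "week": 1},
    2: {"name": "right breast"},       # locus message, carries no date
    3: {"name": "Xray", "week": 131},
}
relations = [(1, "has_locus", 2)]
merged = merge_attributive(messages, relations)
report = group_longitudinal(merged)
print(report)
```

Merging first means the grouping step only ever sees dated event messages, which is why locus messages need no week of their own.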

  37. Discourse structuring • Within each group: • link messages by discourse relations inferred from EPR relations: Cause, Result, Sequence • assume a List relation if no relation specified • Between groups: • If all events in one group are linked to events in another group by some EPR relation, link groups through the corresponding discourse relation • Otherwise, assume a List relation
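
The linking rules on this slide might look like this in code. The mapping from EPR relation names to discourse relations is an assumption: the slides name the discourse relations (Cause, Result, Sequence, List) but do not give the mapping, and Sequence is left unmapped here.

```python
# Assumed mapping from EPR relation names to discourse relations.
EPR_TO_DISCOURSE = {"caused_by": "Cause", "has_finding": "Result"}


def link_messages(group, epr_relations):
    """Link consecutive messages in a group; fall back to List."""
    links = []
    for a, b in zip(group, group[1:]):
        relation = "List"
        for src, name, dst in epr_relations:
            if {src, dst} == {a, b} and name in EPR_TO_DISCOURSE:
                relation = EPR_TO_DISCOURSE[name]
                break
        links.append((a, relation, b))
    return links


def link_groups(first, second, epr_relations):
    """Link two groups only if every event in `first` is related to `second`."""
    names = set()
    for event in first:
        found = [name for src, name, dst in epr_relations
                 if (src == event and dst in second)
                 or (dst == event and src in second)]
        if not found:
            return "List"
        names.update(found)
    if len(names) == 1:
        return EPR_TO_DISCOURSE.get(names.pop(), "List")
    return "List"


rels = [("ulcer", "caused_by", "radiotherapy")]
print(link_messages(["ulcer", "radiotherapy", "biopsy"], rels))
print(link_groups(["ulcer"], ["radiotherapy"], rels))
```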

  38. Text structuring • Aggregation • Problems: Problem_1:name HAS_LOCUS Locus_1 + Problem_2:name HAS_LOCUS Locus_2 => Problem_3 HAS_LOCUS {Locus_1, Locus_2} • Example: Enlargement of the liver + Enlargement of the spleen => Enlargement of the liver and/but not of the spleen • Investigations: Investigation_1:name HAS_INDICATION Problem_1 HAS_LOCUS Locus_1 + Investigation_2:name HAS_INDICATION Problem_2 HAS_LOCUS Locus_2 => Investigation_3 HAS_INDICATION {Problem_1, Problem_2} • Example: Examination of the abdomen revealed no enlargement of the liver + Examination of the lymphnodes revealed no lymphadenopathy => Examination revealed no enlargement of the liver and no lymphadenopathy
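
The problem aggregation rule can be sketched as a small realisation function. The function names and the (locus, present) input layout are hypothetical; the wording of the "and", "but not" and "no ... or" variants follows the examples in the slides.

```python
def of_list(loci, joiner):
    """Render loci as 'of the liver and of the spleen'."""
    return f" {joiner} ".join(f"of the {locus}" for locus in loci)


def aggregate(name, findings):
    """findings: list of (locus, present) pairs sharing one problem name."""
    present = [locus for locus, found in findings if found]
    absent = [locus for locus, found in findings if not found]
    if present and absent:
        return (f"{name.capitalize()} {of_list(present, 'and')} "
                f"but not {of_list(absent, 'or')}")
    if present:
        return f"{name.capitalize()} {of_list(present, 'and')}"
    return f"No {name} {of_list(absent, 'or')}"


print(aggregate("enlargement", [("liver", True), ("spleen", True)]))
print(aggregate("enlargement", [("liver", True), ("spleen", False)]))
print(aggregate("enlargement", [("liver", False), ("spleen", False)]))
```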

  39. Text structuring • Aggregation • Interventions: Intervention_1 PART_OF Intervention_0 + Intervention_2 PART_OF Intervention_0 => {count} Intervention_1 • Example: [ID01]Chemotherapy cycle PART_OF [ID0]Chemotherapy + [ID02]Chemotherapy cycle PART_OF [ID0]Chemotherapy + [ID03]Chemotherapy cycle PART_OF [ID0]Chemotherapy => 3 chemotherapy cycles • Ellipsis: Examination of the left breast revealed no recurrent cancer in the left breast => Examination of the left breast revealed no recurrent cancer
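
Both operations on this slide are simple enough to sketch directly. The function names are hypothetical, the plural is formed by naively appending "s", and the ellipsis rule only handles a trailing "in <locus>" phrase.

```python
from collections import Counter


def aggregate_parts(part_of):
    """part_of: list of (part name, parent id) pairs -> '{count} name' strings."""
    counts = Counter(part_of)
    return [f"{n} {name}{'s' if n > 1 else ''}"
            for (name, _parent), n in counts.items()]


def elide_repeated_locus(sentence, locus):
    """Drop a trailing 'in <locus>' when the locus was already mentioned."""
    tail = f" in {locus}"
    if sentence.endswith(tail) and locus in sentence[: -len(tail)]:
        return sentence[: -len(tail)]
    return sentence


cycles = [("chemotherapy cycle", "ID0")] * 3
print(aggregate_parts(cycles))
print(elide_repeated_locus(
    "Examination of the left breast revealed no recurrent cancer in the left breast",
    "the left breast"))
```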

  40. Text structuring • Events can be compacted according to domain-specific rules: • Clinical examination is: examination of the liver, examination of the spleen, examination of the abdomen • Clinical examination was normal • Clinical examination was normal apart from an enlargement of the spleen • Clinical examination revealed enlargement of the spleen • Liver panel is: bilirubin concentration, ESR concentration, GGT concentration • The liver panel was in the normal range (apart from a very high level of GGT)
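
A compaction rule of this kind could be sketched as below. The slide does not say when the "apart from" wording is preferred over the "revealed" wording, so this sketch arbitrarily uses the former; it also assumes the rule applies only when all three component examinations are present.

```python
CLINICAL_EXAMINATION = ("liver", "spleen", "abdomen")


def article(word):
    """Pick 'a' or 'an' by first letter (a rough heuristic)."""
    return "an" if word[0].lower() in "aeiou" else "a"


def summarise_examination(findings):
    """findings: locus -> abnormality name, or None when normal.

    Returns None when the rule does not apply, i.e. when some component
    examination is missing and the events must be reported individually.
    """
    if not all(locus in findings for locus in CLINICAL_EXAMINATION):
        return None
    abnormal = [f"{article(f)} {f} of the {locus}"
                for locus, f in findings.items() if f is not None]
    if not abnormal:
        return "Clinical examination was normal"
    return "Clinical examination was normal apart from " + " and ".join(abnormal)


print(summarise_examination({"liver": None, "spleen": None, "abdomen": None}))
print(summarise_examination({"liver": None, "spleen": "enlargement", "abdomen": None}))
```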

  41. Maintaining the thread of discourse • Textual representation should reflect the relative importance of events • At discourse level: spine concepts are preferably realised in nuclear units and skeleton events in satellite units • At sentence level: spine events are assigned salient syntactical roles • The status of an event of being on the spine or on the skeleton determines its realisation as a sentence, a main or subordinate clause, phrase

  42. Typical output of the NL generator: long chronological report Year 1 Week 0 • A mammography screening was scheduled at the clinic. Week 1 • Primary cancer of the right breast; histopathology: invasive tubular adenocarcinoma. Year 2 Week 131 • Xray revealed no cancer of the right breast. Year 5 Week 287 • Xray revealed no cancer of the right breast. Year 8 Week 443 • Xray revealed cancer of the right breast. Week 446 • Examination (indicated by primary cancer of the right breast) revealed no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. • Testing (indicated by primary cancer of the right breast) revealed no abnormality of the haemoglobin concentration and no abnormality of the leucocyte count. • An Xray (indicated by primary cancer of the right breast) was performed. • Very high level of the ESR concentration. • Very high level of the Creatinine concentration. • Very high level of the Alkaline Phosphatase concentration. • Very high level of the Bilirubin concentration. • Very high level of the GGT concentration. • No abnormality of the platelet count. Week 449 • An initial treatment planning was completed at the clinic. • Excision biopsy revealed no metastatic lymphnode count of the right axilla. • Histopathology revealed primary cancer of the right breast. • Cancer staging revealed stage1 cancer. • Hormone antagonist therapy was started to treat primary cancer of the right breast. • Lumpectomy was performed on the breast to treat primary cancer of the right breast. • Primary treatment package was started to treat primary cancer of the right breast. …………………. Year 17 Week 893 • Xray revealed no cancer of the right breast.

  43. Typical output of the NL generator: Compact reports
      Focus on Problems
      In week 0, the patient is diagnosed with primary cancer of the right breast, histopathology: invasive tubular adenocarcinoma. In weeks 131 and 287 Xray revealed no cancer of the right breast. In week 446, there was no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes revealed by examination. There was no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration or of the ESR concentration. In week 449, excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast. Lumpectomy was performed on the right breast. Hormone antagonist therapy was initiated to treat primary cancer of the right breast. In weeks 457 to 737, there was no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. There was no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration and of the ESR concentration. In weeks 457 to 893, Xray revealed no cancer of the right breast.
      Focus on Interventions
      In week 0, the patient is diagnosed with primary cancer of the right breast, histopathology: invasive tubular adenocarcinoma. In week 449, excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast. Lumpectomy was performed on the right breast. Hormone antagonist therapy was started to treat primary cancer of the right breast.
      Focus on Investigations
      In week 0, the patient is diagnosed with primary cancer of the right breast, histopathology: invasive tubular adenocarcinoma. In weeks 131 and 287 Xray revealed no cancer of the right breast. In week 446, examinations revealed no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. Testing revealed no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration or of the ESR concentration. In week 449, excision biopsy revealed no metastatic lymphnode count of the right axilla. Histopathology revealed primary cancer of the right breast. In weeks 457 to 737, examinations revealed no enlargement of the liver or of the spleen, no recurrent cancer of the right breast and no lymphadenopathy of the right axillary lymphnodes. Testing revealed no abnormality of the haemoglobin concentration or of the leucocyte count, no abnormality of the platelet count, very high level of the GGT concentration, of the Bilirubin concentration, of the Alkaline Phosphatase concentration, of the Creatinine concentration and of the ESR concentration. In weeks 457 to 893, Xray revealed no cancer of the right breast.
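The compact reports merge a finding that recurs across weeks into a single sentence ("In weeks 131 and 287 ...", "In weeks 457 to 737 ..."). The following is a minimal Python sketch of that kind of temporal aggregation. It is not the CLEF generator's algorithm: the `(week, finding)` event tuples, the function names `week_phrase` and `aggregate`, and the phrasing rule (two weeks are listed, three or more are given as a range) are all assumptions made for illustration.

```python
from collections import defaultdict


def week_phrase(weeks):
    """Render week numbers as a temporal phrase (assumed phrasing rule)."""
    weeks = sorted(set(weeks))
    if len(weeks) == 1:
        return f"In week {weeks[0]}"
    if len(weeks) == 2:
        return f"In weeks {weeks[0]} and {weeks[1]}"
    # Three or more occurrences are summarised as a range.
    return f"In weeks {weeks[0]} to {weeks[-1]}"


def aggregate(events):
    """Group (week, finding) pairs by finding; emit one sentence per finding."""
    weeks_by_finding = defaultdict(list)
    for week, finding in events:
        weeks_by_finding[finding].append(week)
    # Order sentences by the first week in which each finding occurs.
    ordered = sorted(weeks_by_finding.items(), key=lambda item: min(item[1]))
    return [f"{week_phrase(weeks)}, {finding}." for finding, weeks in ordered]


# Hypothetical events, with wording taken from the sample report.
events = [
    (131, "Xray revealed no cancer of the right breast"),
    (287, "Xray revealed no cancer of the right breast"),
    (443, "Xray revealed cancer of the right breast"),
]
for sentence in aggregate(events):
    print(sentence)
```

This naive grouping ignores what happens between occurrences: if the same negative finding reappeared after week 443, it would be folded into one range that spans the positive result. A real report generator would need to split groups at such intervening events.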

  44. Ongoing work on report generation • Add domain-specific knowledge to improve content selection • Some events become important depending on context • Change the (sub-)domain • Test if the generation method is easily portable • Link NLG to IR to improve IR • Produce reports for patients

  45. Summary and Conclusions • CLEF is now entering the integration phase, moving towards testing and deployment • Major emphases at this point are on privacy and security • Informing patients is a major thread for future work • Integrating IE and NLG

  46. Thank You! Collaborators: Catalina Hallett Richard Power

  47. Evaluation procedure • Subjects: • We tested the performance of 15 subjects. • Subjects had a range of expertise in the CLEF domain -- from expert (oncologist) to novice (computer scientist), but most subjects had some medical training. • Subjects had no previous experience with the CLEF WYSIWYM query interface, but most were aware of its fundamental principles. • Methodology: • Subjects were given a set of four fixed queries to formulate using the CLEF WYSIWYM query interface. • The queries were expressed in language as different as possible from the language in the query interface. • Each subject received the queries in a different order.
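Four queries can be ordered in 4! = 24 ways, so each of the 15 subjects can receive a distinct order. The slides do not say how orders were assigned; the Python sketch below shows one possible scheme, drawing distinct permutations at random. The function name `assign_orders`, the subject and query labels, and the fixed seed are hypothetical.

```python
import itertools
import random


def assign_orders(subjects, queries, seed=0):
    """Give every subject a different ordering of the same queries."""
    orders = list(itertools.permutations(queries))
    if len(subjects) > len(orders):
        raise ValueError("more subjects than distinct orderings")
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    chosen = rng.sample(orders, len(subjects))  # sampling without replacement
    return dict(zip(subjects, chosen))


# Hypothetical labels for the 15 subjects and the 4 fixed queries.
subjects = [f"S{i}" for i in range(1, 16)]
queries = ["Q1", "Q2", "Q3", "Q4"]
orders = assign_orders(subjects, queries)
for subject, order in orders.items():
    print(subject, " ".join(order))
```

Random sampling guarantees distinct orders but does not balance how often each query appears in each position; a Latin-square design would be the usual choice if that balance mattered.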

  48. Evaluation – data analysis • We recorded • the time taken to compose each query. • the number of operations used for constructing a query, and compared it with the optimal number of operations (pre-computed). • We analysed whether performance, as indicated by speed and efficiency, improves with training (experience).
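The two measures can be computed per trial and then averaged by the position of the query in each subject's session, to see whether they improve with experience. The Python sketch below assumes that efficiency is the ratio of optimal to actual operations; the slides only say the two counts were compared. The `Trial` record, the function names and every number in the example are invented for illustration and are not results from the study.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    subject: str
    position: int            # 1 = first query the subject composed
    seconds: float           # time taken to compose the query
    operations: int          # operations the subject actually used
    optimal_operations: int  # pre-computed minimum for that query


def efficiency(trial):
    """Assumed measure: optimal / actual operations (1.0 means optimal)."""
    return trial.optimal_operations / trial.operations


def mean_by_position(trials, measure):
    """Average a per-trial measure over subjects, keyed by query position."""
    values = {}
    for t in trials:
        values.setdefault(t.position, []).append(measure(t))
    return {pos: sum(v) / len(v) for pos, v in sorted(values.items())}


# Made-up data for two subjects and two query positions.
trials = [
    Trial("S1", 1, 100.0, 10, 5),
    Trial("S1", 2, 80.0, 6, 5),
    Trial("S2", 1, 120.0, 8, 4),
    Trial("S2", 2, 60.0, 5, 4),
]
mean_time = mean_by_position(trials, lambda t: t.seconds)
mean_efficiency = mean_by_position(trials, efficiency)
print(mean_time)
print(mean_efficiency)
```

Keying the averages by position rather than by query identity fits the design in which each subject sees the queries in a different order, so the effect of a query's difficulty is spread across positions.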

  49. Evaluation results: Time to completion • Subjects’ performance improved dramatically with experience. • After their first experience of composing a query, subjects’ completion time halved and then levelled off at that value.
