This course at the University of Aberdeen's Department of Computing Science emphasizes developing novel technologies for data analysis and communication. Covering major topics like data interpretation, statistical methods, and accessibility, students will explore various domains, such as medical and engineering data. The course includes lectures, practicals, and assessments that integrate continuous evaluation and a final exam, ensuring students grasp both theoretical and practical aspects of the subject. Learn to convey information effectively across varied user abilities and contexts.
CS5545 Data Interpretation and Communication
Yaji Sripada, Ehud Reiter
Dept. of Computing Science, University of Aberdeen
Timetable
• Lectures
  • 2 lectures on Mondays in Meston 311
    • 9:30-10:30
    • 11:00-12:00
  • No lectures in Week 6 and Week 12
• Practicals/Tutorials
  • 1 two-hour practical/tutorial on Mondays in Meston 311
    • 14:00-16:00
Assessment
• Two components
  • 25% continuous assessment
  • 75% end of term exam
• Continuous assessment
  • First assignment
    • Weight – 12.5%
    • Issued in Week 5
    • Due on the Friday of Week 6
  • Second assignment
    • Weight – 12.5%
    • Issued in Week 11
    • Due on the Thursday of Week 12
Course Organization
• Three parts
  • Weeks 1-4 – YS (Yaji Sripada)
  • Weeks 5-7 – ER (Ehud Reiter)
  • Weeks 9-11 – YS+ER
Reading
• Weeks 1-4
  • Mostly lecture notes and some research papers
• Weeks 5-7
  • Lecture notes and research papers
  • Background: Ehud Reiter and Robert Dale, “Building Natural Language Generation Systems”, Cambridge University Press
• Weeks 9-11
  • Lecture notes and research papers
Introduction
• Humans have access to large volumes of data in many domains
  • Scientific
    • Complete sequence data from the Human Genome Project, covering 3 billion DNA units
  • Medical
    • Physiological data
    • Tens of parameters such as blood pressure and heart rate measured every second
  • Engineering
    • Hundreds of sensors on a gas turbine taking measurements every second
  • And many more
Varying purpose/task
• Different people use data for different purposes/tasks
• For example, physiological data is used by
  • Medical staff on the ward to monitor the patient
  • Medical researchers for scientific explorations
  • Medical admin staff to archive them in patient records
Varying abilities/disabilities
• Not everyone is equally able to use the available data
  • 1 in 4 adults in the UK has poor numerical skills
  • 1 in 7 people in the UK has some form of physical disability (such as visual impairment)
• Many of us just don’t have the time to use all the data at our disposal
  • Data from our credit card bills and utility bills
• Many of us don’t have the required domain knowledge to interpret the data
  • Data from medical lab tests such as blood tests
What we need
• Novel computer technology to
  • (1) analyse and interpret large volumes of data
  • (2) communicate to us ‘the required’ information suitable to our task/purpose in a way suited to our abilities/disabilities
• In this course we study
  • (1) issues involved in developing such novel technology
  • (2) currently available techniques to be used as part of the novel technology
  • (3) some systems developed in limited domains using existing technology
Data Analysis and Interpretation
• Data analysis
  • Techniques from several fields are used
    • Statistics
    • Medical signal processing
    • Image processing
    • Data mining, etc.
  • Issues with reusing data analysis methods
    • Choosing one algorithm from the many available for a task may not be easy
    • Even when we find an algorithm, it may not be the best fit for use in a communication context
    • In other words, we may have to adapt available data mining algorithms to suit our purpose
• Data interpretation
  • Knowledge-based techniques are used
  • Context dependent
  • Varies from domain to domain
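As a small illustration of the statistical end of data analysis, the sketch below fits a least-squares line to a short series and labels its overall trend. It is not taken from the course materials: the function names `linear_trend` and `describe_trend` and the `tolerance` threshold are assumptions made for this example, and a real system would choose the threshold from domain knowledge.

```python
from statistics import mean


def linear_trend(values):
    """Return the least-squares slope of values plotted against their index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    xs = list(range(n))
    x_bar = mean(xs)
    y_bar = mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def describe_trend(values, tolerance=0.1):
    """Label a series as rising, falling, or steady.

    tolerance is a hypothetical threshold on the slope (units per sample).
    """
    slope = linear_trend(values)
    if slope > tolerance:
        return "rising"
    if slope < -tolerance:
        return "falling"
    return "steady"


# Example: heart rate readings taken once per second (made-up numbers).
print(describe_trend([70, 72, 75, 79, 84]))  # rising
```

The choice of `tolerance` is where interpretation enters: the same slope may be negligible for one user's task and significant for another's.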
Communication
• Information can be presented to users
  • Graphically – using visualization technology
  • Textually – using Natural Language Generation (NLG) technology
  • As speech – using text-to-speech technology
  • As combinations of the above
• Issues with communication
  • Visualization
    • A relatively mature technology: a large collection of visualization techniques is available for different kinds of data
    • Communicating high-dimensional data is hard
    • Communicating large data sets on low-resolution screens is a challenge
  • NLG
    • Communicates messages more directly
    • Effective for communicating over low-bandwidth channels such as SMS
    • Still under development, with a few success stories in some limited domains
Accessibility
• Communication works
  • for an intended audience with their associated abilities/disabilities
  • with an intended task/purpose
• Therefore communication should be sensitive to different users with different abilities and purposes
System Building Life Cycle
• Several iterations of the following phases
  • Knowledge acquisition (requirements collection and analysis)
  • System design
  • Implementation
  • Evaluation
• Differs from the normal software development life cycle
  • Poorly understood requirements
  • System design ideas still under research
  • Evaluation ideas also still under research
Course Organization – in detail
• Lectures (3 parts)
  • Part 1: Data Analysis & Interpretation
    • Basic statistics
    • Data analysis – trend and pattern detection
  • Part 2: Data Communication
    • Visualization
    • NLG
    • Accessibility
  • Part 3: Real World Applications
• Practicals
  • Part 1
    • Basic data analysis techniques using Excel
    • Trend and pattern detection in time series and spatial data
    • Visualization of time series and spatial data
  • Part 2
    • Document planning for summarising time series data
    • Micro-planning for summarising time series data
In our department
• Many projects aim to develop technology for “data interpretation and communication”
  • It is one of the three research themes in the department
• Projects
  • SumTime – Summarising time series data
  • RoadSafe – Automatically generating advisory text for road maintenance vehicle routing (new project)
  • BabyTalk – Generating textual summaries of clinical temporal data (new project)
  • ScubaText – Generating textual reports of scuba dive computer data
  • Atlas.txt – Generating textual reports of census data for visually impaired people
Example 1: SumTime-Mousam
• Software developed in the department as part of the SumTime project
• Task – Automatically generate weather forecast texts in English
• Input – Numerical Weather Prediction (NWP) data, the output of weather simulation software
• Output – English text delivered
  • As an ASCII file to the client
  • In spoken form over a telephone line
  • As a text message over a mobile line (currently being explored)
• Operationally deployed at a weather services company in Aberdeen
  • Produces around 150 draft forecasts/day
  • Produces text that is in some ways better than that of human authors
SumTime-Mousam (2)
• SumTime technology
  • (1) Analyses NWP data using segmentation techniques developed in the time series data mining community
  • (2) Automatically produces the English forecast text using Natural Language Generation (NLG) technology
• The majority of SumTime output texts are used by oil company staff supporting oil rigs in the North Sea
• Can we produce weather forecasts for a different purpose/task, say for hill climbers?
• In this course, we study how data analysis/interpretation and its communication (presentation) vary with the end-user task/purpose.
Example 2: GIS
• Technology to store, retrieve, analyse and visualize spatial data on geographic maps
  • Plot delivery routes on street maps to a level of detail pinpointing even the locations of manholes and speed cameras
  • Plot census data such as residents’ ages, gender, income, etc. on country or regional maps for businesses to target their customers
GIS (2)
• GIS technology
  • (1) Analyses/interprets spatial data
  • (2) Presents spatial data in the form of visual maps
• Great for sighted users, but useless for visually impaired users
• In this course, we study technology not just based on ‘what it does’, but also based on ‘for whom it does it’.
  • Accessibility issues
Summary
• You will learn about novel technology to
  • Analyse and interpret large data sets by adapting data analysis techniques developed in other fields
  • Communicate (present) relevant information to different users with different tasks and abilities
• Relevant to E-technologies
  • All modern organizations
    • possess large volumes of data and
    • communicate information to different stakeholders