SANJEEV GUPTA. “Addressing the issues of KM for citizens (disabled, rural, women and others) on the other side of the digital divide of the country”. IBM Research - India. IBM Human Ability & Accessibility Center.
“Accessibility – which started out as a philanthropic effort – has now evolved to a business transformation effort for IBM and our clients.” Sam Palmisano, IBM CEO, April 27, 2004 • What is Accessibility? Access to information technology regardless of ability or disability
IBM Vision on Accessibility From compliance to societal transformation... IBM’s vision is to enhance human capabilities through technological innovation so that societal participation and personal fulfillment can be maximized, regardless of age or ability…. Accessibility is not about “them”, it’s about ALL of us…
Accessibility in India – Broader Landscape • Factors affecting accessibility • Physical disability • About 60 million people with disability • Women make up 42.5% of the disabled population • 75% of persons with disabilities live in rural areas • Educational/Economic disability • About 70% of India’s population lives in villages • A large percentage of them are poor • Literacy is mainly limited to writing their names, making signatures and reading large hoardings • Low internet penetration • Computer penetration is still very low • People in remote areas are not very comfortable using computers
Addressing India Accessibility Issues • Hindi Speech Recognition • Indian English TTS • EWB • WebAdapt2Me • aDesigner Bridging the digital divide Making Web Accessible IBM Research Innovation Novel ways for Information sharing • Telecom Web/ WAV
The Spoken Web IBM Research - India
Introducing VoiceSites • A VoiceSite is: • A voice-driven application hosted in the network and created by subscribers themselves • Consists of a set of interconnected VoicePages (e.g., VXML files) • Accessed by calling up the associated phone number and interacting with its underlying application flow through a telephony interface • Analogous to websites in the World Wide Web
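The idea of a VoiceSite as a set of interconnected VoicePages behind one phone number can be sketched as a small data structure. This is only an illustration: the class names, the keypad-style inputs and the placeholder phone number are assumptions, and a real VoiceSite would consist of VXML pages served over a telephony interface.

```python
from dataclasses import dataclass, field


@dataclass
class VoicePage:
    """One page of a VoiceSite: a spoken prompt plus links to other pages."""
    name: str
    prompt: str
    links: dict[str, str] = field(default_factory=dict)  # caller input -> page name


@dataclass
class VoiceSite:
    """A set of interconnected VoicePages reached through one phone number."""
    phone_number: str
    home: str
    pages: dict[str, VoicePage] = field(default_factory=dict)

    def add_page(self, page: VoicePage) -> None:
        self.pages[page.name] = page

    def browse(self, inputs: list[str]) -> list[str]:
        """Return the prompts a caller hears when giving `inputs` in order."""
        current = self.pages[self.home]
        heard = [current.prompt]
        for choice in inputs:
            target = current.links.get(choice)
            if target is not None:  # unknown input: stay on the same page
                current = self.pages[target]
            heard.append(current.prompt)
        return heard


# Hypothetical plumber VoiceSite; the number is a placeholder.
site = VoiceSite(phone_number="0000000000", home="welcome")
site.add_page(VoicePage("welcome", "Hi, my name is Sam, and I am a plumber.",
                        {"1": "hours", "2": "charges"}))
site.add_page(VoicePage("hours", "I work from 9 am to 7 pm.", {"0": "welcome"}))
site.add_page(VoicePage("charges", "I charge 5 dollars an hour.", {"0": "welcome"}))

for prompt in site.browse(["1", "0", "2"]):
    print(prompt)
```

Following a link between pages plays the same role here as following a hyperlink on a website.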
Call VoiServ to create VoiceSite
[Diagram labels: VoiServ, VoiGen, VoiceSite, Calendaring Service, Yellow Page Service, Location Tracker, WWW, YellowPages Website, Database, IMS, Yellow Pages Server, Presence Server]
VoiServ: Please specify your profession
Caller: Plumber
VoiServ: Please say your name
Caller: Sam
VoiServ: Please enter your working hours
Caller: 9 am to 7 pm
VoiServ: Please record your welcome message
Caller: Hi my name is Sam, and I am a plumber. Please find information regarding my services on my VoiceSite.
VoiServ: Please say your home location
Caller: South Delhi
VoiServ: Would you like to provide some references for your work?
Caller: Yes
VoiServ: Would you like to publish your information in yellow pages?
Caller: Yes
VoiServ: Please specify your service charges
Caller: I charge 5 dollars an hour
VoiServ: Do you accept jobs while you are away from your home location?
Caller: Yes
VoiServ: Please say the name and phone number of your references
Caller: You can talk to Jack about my work. His number is 41292100
VoiServ: Would you like to offer appointment scheduling services?
Caller: Yes
VoiServ: You have now created your voice site successfully. Users can now access your voice site through your phone number. Thank you for using this system.
Small user study – Plumber VoiceSite • Carpenters/Electricians make a call to VoiGen to generate their voice sites • The voice sites are automatically deployed in the system • People call these voice sites to schedule time with the specialists • Methodology for survey • Electricians/Plumbers/Carpenters make a call to VoiGen and create their voice sites • We ask the subjects about the usability of the VoiGen system • 12 subjects surveyed for technology validation • 10 were able to create the voice site successfully (within 4 minutes) • There were usability issues with respect to conversation flow and speech recognition accuracy • Everyone realised that this technology can have tremendous impact • Since this technology does not require the end user to bear any device costs, it has a low acceptance barrier
What is the Spoken Web? The Spoken Web is a world wide web in the telecom network, where people can host and browse VoiceSites, traverse VoiLinks, even conduct business transactions, all just by talking over the existing telephone network. • The Spoken Web will interoperate with the existing WWW. • The Spoken Web will interoperate with Next Generation Networks too.
Spoken Web enables multiple business opportunities • New source of revenue opportunity for telecom operators • Creating and hosting VoiceSites • Payments and financial transactions • SMBs and microbusinesses can leverage the T-Web • Examples • Microbusiness VoiceSite • VoiceSite Personalisation • Rural VoiKiosk “Anyone with a mobile handset can become a T-Web enabled microbusiness VoiceSite owner and accessor, and also conduct transactions on the T-Web”
SpokenWeb Andhra Pilot Statistics Matrimonial Ads • Pilot Launch: May 23, 2008 • Report Summary (ended on Jan 28, 2009) • Total number of calls received = 114782 • Number of unique callers = 6509 • Total time spent = 2135 hours • Average call time spent = 0 hours, 1 min, and 14 seconds. • Maximum call duration = 0 hours, 49 min, and 40 seconds. • Minimum call duration = 0 hours, 0 min, and 0 seconds. • Number of calls to Ashwini Center = 8399 • Number of calls to Health Center = 14216 • Number of calls to V-Agri = 13881 • Number of calls to Professional Services = 37112 Social Space Election Speech
Motivation • Web is a rich source of useful local information • Weather, travel, entertainment, insurance, finance • However, a significant population (especially in emerging countries) is not using this information due to • limited computer skills, little exposure to browsing, language skills, physical limitations, aging • A large number of such people have access to a phone (landline/mobile) • Growing at a fast rate • Even computer users can’t browse in several conditions • On the move, in areas with no connectivity, at low speed, etc.
Proposition • Decouple web information from web browsing • Let people access web information without having to browse or know how to browse • However, still leverage the web interface • No change required on the website/content provider side • Let the system browse instead of the user • System figures out how to extract the information from the web for a user’s query • The interaction can be enabled in the user’s language for simple queries (structured input/output) • through speech recognition and language translation
Scenarios • A person wants to go from station A to B. He wants to know which trains are available, their schedule, availability, etc. He has only a phone and is not familiar with the web. • Access a relevant website (e.g., indianrail.gov.in) • Get the required inputs from the user • Source, destination, class, dates, etc. • Fetch the information from the web and give it back to the user • A person wants to know the interest rates offered by various banks for home loans • System can go to a popular website (e.g., apnaloan.com) • Get the required inputs from the user • Term, floating, fixed • Fetch the information from the web and give it back to the user • lowest interest rate offered • A person is planning a trip to Chennai and wants to know the current weather there • Go to cnn.com • Fill in the form • Speak the weather over the phone in the local language • Go to Google • Get the weather information and reply back to the user
What is currently available? • Web browsers on mobile phones • Person can browse the web on a handheld device • Costly, complicated, tiny interface, inaccessible, not suited for the common man • Browsing of voice sites • Created from scratch using VXML • Speech interface to specific services (TellMe, Nuance) • Nearest restaurant, police station, hospitals, etc. • Based on a knowledge base created offline OR • Proprietary tie-up with content providers to have access to databases • Predefined, keyword based • “A third-party data provider gathers the business information that Tellme provides in the Tellme download and on 1-800-555-TELL, so we are unable to directly add or correct specific business information. If you would like to add or correct information that is listed for your business, please use the easy form on the InfoUSA website.” (Taken from TellMe Website: http://www.tellme.com/you/faqs)
Proposed approach World Wide Web Request Generator Dialogue Component ASR/TTS Voice/ DTMF Response Generator
How does it work? Web Site Service1 Service2 Response Process Generator Information Extraction tools Service3 Service4 Request Process Generator Browser scripting tools such as Co-scripter
Request Generation & Execution • Leveraging the browser interface through scripts • Generate a script with inputs taken from the user • Execute the script with a browser Data for Script Scripting Tool Input Collection (VXML) Web Browser Inputs Web-Page Data Script
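The step "generate a script with inputs taken from the user" can be sketched with a simple text template. This is a minimal sketch under stated assumptions: the step wording is only loosely modelled on browser scripting tools such as the Co-scripter mentioned earlier, and the field labels ("From", "To", "Date", "Class"), the input names and the station names are hypothetical rather than taken from a real site.

```python
from string import Template

# Hypothetical script template; step wording and field labels are assumptions.
TRAIN_SEARCH_SCRIPT = Template(
    'go to "$url"\n'
    'enter "$source" into the "From" textbox\n'
    'enter "$destination" into the "To" textbox\n'
    'enter "$date" into the "Date" textbox\n'
    'select "$travel_class" from the "Class" listbox\n'
    'click the "Search" button\n'
)

# Inputs the voice dialogue (e.g. a VXML form) is assumed to collect.
REQUIRED_INPUTS = ("source", "destination", "date", "travel_class")


def generate_script(url: str, user_inputs: dict[str, str]) -> str:
    """Fill the script template with the inputs collected from the caller."""
    missing = [key for key in REQUIRED_INPUTS if not user_inputs.get(key)]
    if missing:
        # The dialogue component would go back and ask for these.
        raise ValueError(f"missing inputs: {', '.join(missing)}")
    values = {key: user_inputs[key] for key in REQUIRED_INPUTS}
    return TRAIN_SEARCH_SCRIPT.substitute(url=url, **values)


script = generate_script(
    "http://indianrail.gov.in",
    {"source": "New Delhi", "destination": "Chennai",
     "date": "2009-01-28", "travel_class": "Sleeper"},
)
print(script)
```

The generated text would then be handed to the scripting tool, which drives the browser; that execution step is outside this sketch.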
Information Extraction & Response • Use HTML Syntax and Semantics to extract information • Look in HTML sections using syntax knowledge • Use semantics based on context and keywords Information Extraction Module Web-Page’s HTML Source Response User Relevant Keywords/ Semantics Syntax User Interaction (Iterative) Request Generation & Execution Keywords : Airline / Lowest / Cheapest / Prices HTML Syntax : TABLE , ROW-COLUMN (<TR><TD>) Top three cheapest flights are : Go Air 4435 Rs at 5:05 AM Deccan 4449 Rs at 4:15 AM Spice Jet 4859 Rs at 8:00 AM
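A minimal sketch of the syntax-driven part of this extraction, using Python's standard `html.parser` to read TABLE rows (`<TR><TD>`) and pick the cheapest fares. The page markup, the three-column layout and the "Example Air" row are made up for illustration; the other three fares are the ones quoted on the slide. A real page would need site-specific handling and the keyword/semantic step described above.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made of <th> cells stay empty
                self.rows.append(self._row)
            self._row = None


def cheapest_flights(html: str, limit: int = 3) -> list[tuple[str, int, str]]:
    """Return up to `limit` (airline, price, departure) tuples, cheapest first."""
    parser = TableRows()
    parser.feed(html)
    flights = []
    for row in parser.rows:
        if len(row) != 3 or not row[1].isdigit():
            continue  # skip rows that do not look like airline/price/time
        airline, price, departs = row
        flights.append((airline, int(price), departs))
    return sorted(flights, key=lambda flight: flight[1])[:limit]


# Hypothetical page source.
PAGE = """
<table>
  <tr><th>Airline</th><th>Price (Rs)</th><th>Departure</th></tr>
  <tr><td>Spice Jet</td><td>4859</td><td>8:00 AM</td></tr>
  <tr><td>Example Air</td><td>5200</td><td>9:30 AM</td></tr>
  <tr><td>Go Air</td><td>4435</td><td>5:05 AM</td></tr>
  <tr><td>Deccan</td><td>4449</td><td>4:15 AM</td></tr>
</table>
"""

for airline, price, departs in cheapest_flights(PAGE):
    print(f"{airline} {price} Rs at {departs}")
```

The sorted rows can then be turned into a spoken response such as the "Top three cheapest flights" example.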
Overview • Having difficulty viewing Web pages? Easy Web Browsing is a solution that helps bridge the digital divide for novice computer users, people who are experiencing vision loss, second-language learners, seniors, and persons with reading challenges • Highlights • Installs by automatically downloading from Web site. • Reads text aloud with adjustable speed and volume control. • Allows users to customize size and color of Web content. • Ruler function that helps users find and follow their reading position. • Highlight function focusing on the reading text with four patterns of marking. • Customizable line and word spacing features that enhance readability.
Easy Web Browsing IBM Easy Web Browsing display on a client's personal computer
Summary • Accessible web sites are required for PWDs (persons with disabilities), but they assist seniors, novices and non-native speakers as well. • The financial, retail, travel and government industries are interested in Web and kiosk accessibility. • IBM’s EWB is a quick and reasonable solution to make web sites and kiosks more accessible for consumers, citizens and travelers. • It also brings new opportunities through combined offerings • Both the customers and the end users are satisfied with this solution
Reading Companion “Reading Companion has opened new cultural horizons for our children. With such a wide choice of books to increase their vocabulary and improve their comprehension skills, they’re developing a true love for reading.” Patricia Diaz Covarrubias, Executive Director, Christel House de Mexico, A.C. • readingcompanion.org • IBM’s multi-million-dollar investment in literacy, using voice recognition technology over the web to help children and adults learn to read. • Anytime, anywhere web access, providing feedback and as-needed assistance • More than 1,380 schools and nonprofit organizations, about half of which are schools, in 25 countries and approximately 56,200 users are participating in this grant program. • 850 schools & nonprofit organizations in 26 countries, benefitting more than 40,000 children and adults • Evaluation showed: • Child: higher test scores on word recognition and reading comprehension • Adult: Increased English communication skills and literacy; positive job outcomes for some learners
Meet IBMer Dimitri Kanevsky: • Deaf • Master Inventor in IBM Research • 2002 Science Accomplishment for Maximization Algorithms • Generated 80 IBM patents IBM: Employing Diversity & Excellence
Meet IBMer Mike Squillace: • Blind • Joined IBM in 2002 • Sun Certified Java Programmer • PhD in Philosophy and B.S. in Computer Science • Developed patents for multiple GUI architectures and for defining GUIs via markup languages & reflection IBM: Employing Diversity & Excellence
Meet IBMer Chieko Asakawa: • Blind • Joined IBM Research in 1985 • An IBM Fellow • Member of Women in Technology Hall of Fame • Developed Digital Braille System & 3 key applications IBM: Employing Diversity & Excellence
Recognition by the Hon. President of India in 2007 & 2009 for providing technology for people on the other side of the digital divide, helping to build a complete knowledge society.
aDesigner • Characteristics • Visualization of blind usability • Simulation of low-vision users’ view: weak eyesight, color vision deficiency, cataracts • Checking compliance items: WCAG, Section 508, IBM CI162, JIS, etc. • Award • Wall Street Journal Technology Innovation Award 2004 (Runner-up) • Status • Open-sourced as a basis of Eclipse.org ACTF (Accessibility Tools Framework)
Blind Usability Visualization Example • Original: inaccessible • With heading tags: headings can be used as a TOC; easy to navigate through the page • With skip-link: easy to find main contents
Low Vision Simulation • Simulating the experience of users who have low vision • The original Web page, as viewed by people without low vision • Low vision simulation: in this example, Color Vision Deficiency (Deutan) and cataract are simulated • Problem map that indicates the positions of problems • Summary Report • Setting panel (eyesight, color vision deficiencies, crystalline lens transparency)
IBM Research Overview: Famous for its science and vital to IBM • Research labs: 1945 1st IBM Research Lab in NY (Columbia U); 1952 San Jose, California; 1955 Zürich; 1961 Watson; 1972 Haifa; 1982 Tokyo; 1986 Almaden; 1995 Austin; 1995 Beijing; 1998 Delhi • 1970's: Centrally funded • Corporate funded research agenda • Technology transfer • 1980's: Joint programs • Collaborative team • Shared agenda • Effectiveness • 1990's: Research in the marketplace • Work on customer problems • Create business advantage for customers • 2000's: eBusiness research • Innovation that Matters: Business, New Insights, Society, Technology • ODIS: On Demand Innovation Services • EBO: Emerging Business Opportunities • FOAK: First of a Kind • Technology Transfer
Focus Areas Business Areas Service Delivery Emerging Solutions Software Infrastructure Services Application Services Contact Center Services Telecom Others (Banking, etc.) Systems Technical Competencies • Computer Science • Distributed Systems – system mgmt., middleware • Information Management – data mining, machine learning • Interaction Technologies – speech • Programming Technologies – parallel and hi-perf. prog. • Software Engineering – model-driven, distributed dev. • Math Science • Operations Research • Algorithms • Optimization • Game Theory • Service Science • Service Engineering • Service Productivity • Service Management • Service Quality • Service Supply Chains
IBM Research Website: http://www.research.ibm.com • IBM Research - India: http://www.ibm.com/in/research
Easy Web Browsing – UI technologies (1) • Easy operations and easy-to-use operation panel • No URL input field; users can only surf within specified domains. • Operation Panel • Navigation (Home/Back/Stop) • Voice speed/volume • Zoom • Line Spacing • Ruler • Color setting • Print • Detail Setting • Help
Easy Web Browsing – UI technologies (2) • Read aloud with speed control • Character enlarging (with screen magnifier) Accessibility at IBM means enabling IT hardawa,
Easy Web Browsing – UI technologies (3) • Background color change • Color vision deficiency • Cataract • Weak sighted • Black text on a white background with blue for links for normal display • Yellow text on a blue background with white for links • Black text on a light yellow background with blue for links • Yellow text on a black background with white for links
Easy Web Browsing – UI technologies (4) • Automatic language switch (panel, TTS, etc.) according to the lang attribute of the Web page. • Support for thirteen languages: Chinese (Simplified), Chinese Traditional (Taiwan, Hong Kong), English (US and UK), French, German, Italian, Japanese, Korean, Spanish, and Portuguese (Brazil, Portugal). [Screenshots: Japanese, English]
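How a language switch driven by the page's lang attribute might work can be sketched as follows. This is not EWB's implementation: the language codes, the fallback choices for bare codes such as "en", and reading the attribute from the `<html>` element are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumed codes for the thirteen languages listed above (illustrative only).
SUPPORTED = {
    "zh-cn": "Chinese (Simplified)",
    "zh-tw": "Chinese Traditional (Taiwan)",
    "zh-hk": "Chinese Traditional (Hong Kong)",
    "en-us": "English (US)",
    "en-gb": "English (UK)",
    "fr": "French",
    "de": "German",
    "it": "Italian",
    "ja": "Japanese",
    "ko": "Korean",
    "es": "Spanish",
    "pt-br": "Portuguese (Brazil)",
    "pt-pt": "Portuguese (Portugal)",
}
# Assumed defaults when a page gives only the primary language.
DEFAULTS = {"zh": "zh-cn", "en": "en-us", "pt": "pt-br"}


class LangFinder(HTMLParser):
    """Remember the lang attribute of the first <html> element."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def pick_language(html: str, fallback: str = "en-us") -> str:
    """Choose the panel/TTS language code for a page."""
    finder = LangFinder()
    finder.feed(html)
    code = (finder.lang or "").strip().lower().replace("_", "-")
    if code in SUPPORTED:
        return code
    primary = code.split("-")[0]
    if primary in SUPPORTED:
        return primary
    return DEFAULTS.get(primary, fallback)


print(SUPPORTED[pick_language('<html lang="ja"><body>...</body></html>')])
print(SUPPORTED[pick_language('<html lang="en-GB"><body>...</body></html>')])
```

Pages with a missing or unsupported lang attribute fall back to a default voice in this sketch; what the product does in that case is not stated in the slides.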
Sensei • An automated tool for assessing spoken English skills • Evaluates pronunciation, grammar, comprehension • Uses advanced speech processing techniques • Provides scores for each of the categories in real time • The tool is Web enabled • Can be used for remote hiring/assessment • Can be used for training • Centralized database/content update • Can help children learn the English language
Evaluation of Syllable Stress • Lexical stress evaluation • Important for spoken English comprehension • Meaning changes with stress pattern (PROject, proJECT, conTENT, CONtent) • Different stress point for different words (aVAilable, INdustry) • Primary features – pitch, duration & energy • Challenges • Every word has a different stress pattern • Stress can also change depending upon context • Relative importance of the features varies for different words and speakers • Primary syllable can be inherently low in energy or short in duration • Word Dependent Classifiers • A separate classifier is trained for each of the words • Performs better than the word independent models • Single class classifiers • Estimating the multi-dimensional shape corresponding to the correct class (spanned only by the correct utterances) • Word Independent Classifiers • Classify each individual syllable as stressed/unstressed • Combine soft decisions to determine correctness at word level • Accuracy with Human • Human Assessors Repeatability is 86% • Human Assessors Reproducibility is 64% • Human Assessors Accuracy is 85% • Sensei Accuracy is 81% • Sensei Reproducibility is 100%
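The word-independent approach (classify each syllable as stressed/unstressed, then combine the soft decisions at word level) can be illustrated with a small sketch. The slides do not say how the soft decisions are combined; the geometric mean and the 0.5 threshold used here are assumptions chosen for illustration, and the per-syllable probabilities are made-up stand-ins for the output of a classifier working on pitch, duration and energy.

```python
import math


def word_stress_score(p_stressed: list[float], stressed_index: int) -> float:
    """Geometric mean of per-syllable probabilities of matching the target pattern.

    p_stressed[i] is the classifier's probability that syllable i is stressed;
    stressed_index is the syllable that should carry the primary stress.
    """
    if not 0 <= stressed_index < len(p_stressed):
        raise ValueError("stressed_index out of range")
    log_sum = 0.0
    for i, p in enumerate(p_stressed):
        agree = p if i == stressed_index else 1.0 - p
        log_sum += math.log(max(agree, 1e-9))  # floor avoids log(0)
    return math.exp(log_sum / len(p_stressed))


def is_stress_correct(p_stressed: list[float], stressed_index: int,
                      threshold: float = 0.5) -> bool:
    """Word-level decision from the combined soft syllable decisions."""
    return word_stress_score(p_stressed, stressed_index) >= threshold


# "proJECT" (verb): the second syllable should be stressed (index 1).
print(is_stress_correct([0.2, 0.9], stressed_index=1))   # speaker stressed "JECT"
print(is_stress_correct([0.85, 0.3], stressed_index=1))  # speaker stressed "PRO"
```

Other combination rules (a weighted sum, or a second classifier over the syllable scores) would fit the same description equally well.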
Evaluation of Spoken Grammar • Evaluate spoken grammar skills of the candidate • Not possible to evaluate free speech – low recognition accuracy, LM bias • Prompts and answers (make it interactive) • Prompts designed to test various parameters • Tenses, articles, prepositions, subject-verb agreement • Flow: a grammatically correct or incorrect sentence is prompted; the candidate records the correct sentence; speech recognition scores the recording (1: assigned for a correct sentence, 0: assigned for an incorrect sentence) • Example • Prompt: Both the dogs is barking • Correct answers: Both the dogs are barking; Both the dogs were barking • Possible responses: Both the dogs is barking (incorrect); Both the dogs are barking (correct); Those dogs are barking; Both the dogs were barking (correct) • Challenges • Correct and incorrect answers acoustically close to each other • Multiple correct answers are possible • Incomplete recordings (last word chopped), response outside speech grammar • Content challenges • Effectiveness of questions • Accuracy with Human • Human Assessors Repeatability is 94% • Human Assessors Reproducibility is 82% • Human Assessors Accuracy is 95% • Sensei Accuracy is 85% • Sensei Reproducibility is 100%
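The 1/0 scoring of a recognized response against the list of correct answers can be sketched as below. The sketch assumes the speech recognizer has already produced text; the normalization step and the function names are illustrative assumptions, and the hard part named above (acoustically close correct and incorrect answers) lies in the recognizer, not in this comparison.

```python
import re


def normalize(sentence: str) -> str:
    """Lower-case and drop punctuation so that only the words are compared."""
    return " ".join(re.findall(r"[a-z']+", sentence.lower()))


def score_response(recognized: str, correct_answers: list[str]) -> int:
    """Return 1 if the recognized sentence matches a correct answer, else 0."""
    accepted = {normalize(answer) for answer in correct_answers}
    return 1 if normalize(recognized) in accepted else 0


# Prompt: "Both the dogs is barking"
CORRECT = ["Both the dogs are barking", "Both the dogs were barking"]

for response in ["Both the dogs is barking",
                 "Both the dogs are barking.",
                 "Those dogs are barking",
                 "Both the dogs were barking"]:
    print(response, "->", score_response(response, CORRECT))
```

Because several answers can be correct, the accepted set holds every listed answer; a response outside the speech grammar, such as "Those dogs are barking", scores 0 in this sketch even though it is grammatical.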
Evaluation of Articulation • Impact Sounds • s/sh, z/sh, v/w, t/d, AO (ball) • Correct pronunciation of words • Different from speech recognition • Recognition should discard pronunciation variation • Customization of acoustic models • Models trained from model speakers • US/UK models adapted to model speakers • Indian English models adapted to model speakers • Features • Phone confidence scores • Word endings • Duration of phones • Challenges • Subjectivity in human ratings • Lack of model speakers’ data (only a few model speakers) • Other considerations in human rating – stress, fluency, etc. [Charts: Decision Level; Accuracy with Human]