See, Hear, Do: Language and Robots

Far Reaching Research (FRR) Project
See, Hear, Do: Language and Robots
Jonathan Connell, Exploratory Computer Vision Group
Etienne Marcheret, Speech Algorithms & Engines Group
Sharath Pankanti (ECVG)
Josef Vopicka (Speech)

Challenge = Multi-modal instructional dialogs
• Use speech, language, and vision to learn objects & actions
  Innate perception abilities (objects / properties)
  Innate action capabilities (navigation / grasping)
  Easily acquire terms not knowable a priori
• Example dialog:
  Human: Round up my mug.
  Robot: I don’t know how to “round up” your mug.
  Human: Walk around the house and look for it. When you find it bring it back to me.
  Robot: I don’t know what your “mug” looks like.
  Human: It is like this <shows another mug> but sort of orange-ish.
  Robot: OK … I could not find your mug.
  Human: Try looking on the table in the living room.
  Robot: OK … Here it is!
• Annotations on the dialog: command following, verb learning, noun learning, advice taking
• Language Learning & Understanding is an AAAI Grand Challenge
  http://www.aaai.org/aitopics/pmwiki/pmwiki.php/AITopics/GrandChallenges#language

Eldercare as an application
• Example tasks:
  Pick up dropped phone
  Get blanket from another room
  Bring me the book I was reading yesterday
• Large potential market
  Many affluent societies have a demographic imbalance (Japan, EU, US)
  Institutional care can be very expensive (to person, insurance, state)
• A little help can go a long way
  Can be supplied immediately (no waiting list for admission)
  Allows person to stay at home longer (generally easier & less expensive)
  Boosts independence and feeling of control (psychological advantage)
• Note: We are not attempting to address the whole problem
  X Aggressive production cost containment
  X Robust self-recharging and stairs traversal
  X Bathing and bathroom care, patient transfer, cooking
  X OSHA, ADA, FDA, FCC, UL or CE certification

State of the art
• Indoor navigation
  Minerva from CMU, Jose from Univ. British Columbia
• Perception & manipulation
  Herb from CMU / Intel (Kanade), PR2 from Willow Garage
• Language learning
  Ripley from MIT (Deb Roy), HAM from KTH in Sweden
• Dialog and speech
  Honda system from IBM, call center handling from IBM

• No object perception
• No manipulation capability
• Off-line object model generation
• No natural language interface
• Either fetch or carry
• No procedural learning
• No physical presence or action
• No visual perception of objects

Business Model
[Diagram: OEM; IBM; Third Party; customers. Labels: buy hardware; add software and services; resell robot + value added software + field service; $70B / year; $24B / yr]
• Eldercare market in US (x3 if EU and AP also): 3 million
  Total US population: 300 million
  Ages 75-85: 10%
  Suitable (ability level, desire, finances): 10%
• Manufacturing business ($2000 / robot yr): $6B / yr
• Services business ($3000 / robot yr): $9B / yr

Costs & revenue potential
• OEM sales price for hardware: $6000
  Electromechanical parts: $1300
  Onboard computer: $500
  Assembly (15 hrs x $80 / hr): $1200
  + 30% Sales & distribution + 20% profit: $3000
• Value-added wholesale price (w/ software): $15,000
  10% Continued R&D: $1500
  30% Sales & distribution: $4500
  20% Profit: $3000
  Price = Less than a new car
• Total cost of ownership: $8000 / yr
  Lifetime = 3 years: $5000 / yr
  Service (15 hrs / quarter x $50 / hr x 4 quarters): $3000 / yr
  Effective wage (40 hrs / wk x 50 wks / yr = 2000 hrs / yr): $4 / hr
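The figures on these two slides chain together, and a short Python sketch makes the arithmetic explicit. It assumes the percentages are shares of the selling price (which is the reading that reproduces the slide's $6000 and $15,000); all function and variable names are illustrative and do not come from the presentation.

```python
# Figures are copied from the slides; names are hypothetical.
US_POPULATION = 300_000_000
PCT_AGES_75_85 = 10
PCT_SUITABLE = 10
LIFETIME_YEARS = 3


def price_from_cost(cost, share_pcts):
    """Price at which `cost` is what remains after the given percent-of-price shares.

    Assumption: each percentage is a share of the final price, not a markup on cost.
    """
    return cost * 100 // (100 - sum(share_pcts))


def market_size():
    """Number of candidate households: 10% aged 75-85, of which 10% are suitable."""
    return US_POPULATION * PCT_AGES_75_85 // 100 * PCT_SUITABLE // 100


build_cost = 1300 + 500 + 15 * 80                       # parts + computer + assembly
oem_price = price_from_cost(build_cost, [30, 20])       # sales & distribution, profit
wholesale = price_from_cost(oem_price, [10, 30, 20])    # R&D, sales & distribution, profit

service_per_year = 15 * 50 * 4                          # hrs / quarter x $ / hr x quarters
tco_per_year = wholesale // LIFETIME_YEARS + service_per_year
effective_wage = tco_per_year / (40 * 50)               # robot "works" 2000 hrs / yr

robots = market_size()
print(oem_price, wholesale, tco_per_year, effective_wage)   # 6000 15000 8000 4.0
print(robots * tco_per_year)                                # 24000000000 ($24B / yr)
print(robots * oem_price // LIFETIME_YEARS)                 # 6000000000  ($6B / yr)
print(robots * service_per_year)                            # 9000000000  ($9B / yr)
```

Under this reading, the $24B / yr total is the 3 million households times the $8000 / yr cost of ownership, and the manufacturing and services lines are the hardware and service components of that same per-robot figure.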

Sample business case
• Home eldercare now (employer costs): $25,000 / yr
  1 aide from 8am to 6pm = 10 hrs
  50 wks x 5 days / wk x 10 hrs / day = 2500 hrs / yr
  Federal min. wage = $7.25 / hr
  +38% overhead (FICA + 401K + medical) = $10 / hr
• Aide’s activities:
  Help with clothes, hygiene, meals
  Odd tasks such as fetching objects
  Sitting around watching TV
• Alternative: Half-time aide + robot: $20,500 / yr
  Human still helps with clothes, hygiene, meals
  Robot potentially available after hours and on weekends
  No problem with robot Training, Turnover, and Trust (stealing)
• Value proposition (to client): 30% more hours @ 10% less cost
  Split savings with customer ($50,000 → $45,000 per client)
  Human 5 hrs + robot 8 hrs = 13 hrs / day during week
  10% less revenue but 22% more profit (= $6.6B / yr extra profit if 100% market share)
  Bill at $20,000 - $3000 service = $17,000 / yr revenue → 10.6 months payback on $15,000 purchase
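The business-case numbers can be checked the same way. The sketch below recomputes only the quantities that follow directly from figures stated on the slide; the 22% profit increase is not reproduced because the slide does not give the baseline profit. Names are illustrative, and the $8000 / yr robot cost is the total cost of ownership quoted earlier in the deck.

```python
# Figures are copied from the slide; names are hypothetical.
MIN_WAGE = 7.25            # federal minimum wage, $ / hr
OVERHEAD = 0.38            # FICA + 401K + medical
ROBOT_TCO = 8000           # $ / yr, from the cost slide
ROBOT_PRICE = 15_000       # wholesale purchase price
ROBOT_SERVICE = 3000       # $ / yr

aide_hours = 50 * 5 * 10                           # wks x days / wk x hrs / day
loaded_wage = round(MIN_WAGE * (1 + OVERHEAD))     # slide rounds $10.005 to $10 / hr
full_time_cost = aide_hours * loaded_wage          # current employer cost per year

# Alternative: aide works half the hours, robot covers the rest and more.
hybrid_cost = full_time_cost // 2 + ROBOT_TCO

# Coverage per weekday: human 5 hrs + robot 8 hrs versus 10 hrs today.
extra_hours_pct = (5 + 8 - 10) * 100 // 10

# Price to the client drops from $50,000 to $45,000.
price_cut_pct = (50_000 - 45_000) * 100 // 50_000

# Payback on the robot purchase when billing $20,000 / yr less service.
net_revenue = 20_000 - ROBOT_SERVICE
payback_months = ROBOT_PRICE / net_revenue * 12

print(aide_hours, full_time_cost, hybrid_cost)     # 2500 25000 20500
print(extra_hours_pct, price_cut_pct)              # 30 10
print(round(payback_months, 1))                    # 10.6
```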

What’s different and important
• Speech-driven interface
  No headset required (far field), can learn new nouns and verbs
• Multi-modal dialog
  Responds to gestures, exploits synergies between modalities
• Manipulation as well as mobility
  Not just a walking telephone, can do useful physical work also
• One-shot learning
  No turntable scanning, not 100’s of examples, no trial-and-error experiments
• Cost containment
  Vision instead of special-purpose sensors and precise mechanicals
