Far Reaching Research (FRR) Project
See, Hear, Do: Language and Robots
Jonathan Connell, Exploratory Computer Vision Group (ECVG)
Etienne Marcheret, Speech Algorithms & Engines Group
Sharath Pankanti (ECVG), Josef Vopicka (Speech)
Challenge = Multi-modal instructional dialogs
Use speech, language, and vision to learn objects & actions
• Innate perception abilities (objects / properties)
• Innate action capabilities (navigation / grasping)
• Easily acquire terms not knowable a priori
Example dialog:
User: Round up my mug. [command following]
Robot: I don’t know how to “round up” your mug.
User: Walk around the house and look for it. When you find it, bring it back to me. [verb learning]
Robot: I don’t know what your “mug” looks like.
User: It is like this <shows another mug> but sort of orange-ish. [noun learning]
Robot: OK … I could not find your mug.
User: Try looking on the table in the living room. [advice taking]
Robot: OK … Here it is!
Language Learning & Understanding is an AAAI Grand Challenge: http://www.aaai.org/aitopics/pmwiki/pmwiki.php/AITopics/GrandChallenges#language
Eldercare as an application
• Example tasks:
  Pick up dropped phone
  Get blanket from another room
  Bring me the book I was reading yesterday
• Large potential market
  Many affluent societies have a demographic imbalance (Japan, EU, US)
  Institutional care can be very expensive (to the person, insurance, the state)
• A little help can go a long way
  Can be supplied immediately (no waiting list for admission)
  Allows the person to stay at home longer (generally easier & less expensive)
  Boosts independence and feeling of control (psychological advantage)
• Note: We are not attempting to address the whole problem
  X Aggressive production cost containment
  X Robust self-recharging and stair traversal
  X Bathing and bathroom care, patient transfer, cooking
  X OSHA, ADA, FDA, FCC, UL or CE certification
State of the art
• Indoor navigation: Minerva from CMU, Jose from Univ. British Columbia
  No object perception; no manipulation capability
• Perception & manipulation: Herb from CMU / Intel (Kanade), PR2 from Willow Garage
  Off-line object model generation; no natural language interface
• Language learning: Ripley from MIT (Deb Roy), HAM from KTH in Sweden
  Either fetch or carry; no procedural learning
• Dialog and speech: Honda system from IBM, call center handling from IBM
  No physical presence or action; no visual perception of objects
Business Model
[Diagram: IBM buys hardware from an OEM and resells the robot to customers with value-added software and field service; third parties add software and services. Figure labels: $70B / year, $24B / yr]

Costs & revenue potential
• Eldercare market in US: 3 million (x3 if EU and AP also)
  Total US population: 300 million
  Ages 75-85: 10%
  Suitable (ability level, desire, finances): 10%
• Manufacturing business ($2000 / robot-yr): $6B / yr
• Services business ($3000 / robot-yr): $9B / yr
• OEM sales price for hardware: $6000
  Electromechanical parts: $1300
  Onboard computer: $500
  Assembly (15 hrs x $80 / hr): $1200
  + 30% sales & distribution + 20% profit: $3000
• Value-added wholesale price (w/ software): $15,000
  10% continued R&D: $1500
  30% sales & distribution: $4500
  20% profit: $3000
  Price = less than a new car
• Total cost of ownership: $8000 / yr
  Purchase amortized over 3-year lifetime: $5000 / yr
  Service (15 hrs / quarter x $50 / hr x 4 quarters): $3000 / yr
  Effective wage (40 hrs / wk x 50 wks / yr = 2000 hrs / yr): $4 / hr
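The figures on the costs slide are simple products and sums, so they can be re-derived directly. The sketch below is a minimal Python check, assuming the slide's own inputs; the function names are hypothetical and only restate the slide's arithmetic, not a model the authors published.

```python
"""Back-of-envelope check of the "Costs & revenue potential" slide.

Every input is copied from the slide; percentages and dollar figures
are planning assumptions, not measured data.
"""

# --- Market size (US only) ---
US_POPULATION = 300_000_000
PCT_AGED_75_85 = 10
PCT_SUITABLE = 10  # ability level, desire, finances


def us_market_units() -> int:
    """Potential robot clients in the US (integer math avoids float drift)."""
    return US_POPULATION * PCT_AGED_75_85 // 100 * PCT_SUITABLE // 100


def annual_revenue(per_robot_per_year: int) -> int:
    """Yearly revenue if every suitable client has one robot."""
    return us_market_units() * per_robot_per_year


# --- Hardware and wholesale pricing ---
def oem_hardware_price() -> int:
    parts, computer = 1300, 500
    assembly = 15 * 80          # 15 hrs x $80/hr
    sales_and_profit = 3000     # slide's "+30% S&D + 20% profit" line
    return parts + computer + assembly + sales_and_profit


def wholesale_price() -> int:
    # R&D, sales, profit = 10%, 30%, 20% of the $15,000 wholesale price
    r_and_d, sales, profit = 1500, 4500, 3000
    return oem_hardware_price() + r_and_d + sales + profit


# --- Total cost of ownership ---
def annual_tco(lifetime_years: int = 3) -> int:
    amortized = wholesale_price() // lifetime_years
    service = 15 * 50 * 4       # 15 hrs/quarter x $50/hr x 4 quarters
    return amortized + service


def effective_wage() -> float:
    hours_per_year = 40 * 50    # 40 hrs/wk x 50 wks/yr
    return annual_tco() / hours_per_year


if __name__ == "__main__":
    print(us_market_units())                        # 3000000
    print(annual_revenue(2000))                     # 6000000000 (manufacturing)
    print(annual_revenue(3000))                     # 9000000000 (services)
    print(oem_hardware_price(), wholesale_price())  # 6000 15000
    print(annual_tco(), effective_wage())           # 8000 4.0
```

Integer arithmetic is used for the market size so the 10% x 10% chain yields exactly 3,000,000 rather than a floating-point approximation.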
Sample business case
• Home eldercare now (employer costs): $25,000 / yr
  1 aide from 8am to 6pm = 10 hrs / day
  50 wks x 5 days / wk x 10 hrs / day = 2500 hrs / yr
  Federal min. wage = $7.25 / hr
  +38% overhead (FICA + 401K + medical) = $10 / hr
• Aide’s activities:
  Help with clothes, hygiene, meals
  Odd tasks such as fetching objects
  Sitting around watching TV
• Alternative: Half-time aide + robot: $20,500 / yr
  Human still helps with clothes, hygiene, meals
  Robot potentially available after hours and on weekends
  No Training, Turnover, or Trust (stealing) problems with a robot
• Value proposition (to client): 30% more hours @ 10% less cost
  Split savings with customer ($50,000 → $45,000 per client)
  Human 5 hrs + robot 8 hrs = 13 hrs / day during week
  10% less revenue but 22% more profit (= $6.6B / yr extra profit if 100% market share)
  Bill at $20,000 - $3000 service = $17,000 / yr revenue
  10.6 months payback on $15,000 purchase
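Most headline numbers in the sample business case follow from the same kind of arithmetic. This sketch assumes the slide's inputs ($10/hr loaded wage, a 50-week, 5-day year, an $8000/yr robot cost of ownership) and reproduces the baseline and alternative costs, the hours gain, the client-bill reduction, and the payback period. It does not attempt the 22% profit figure, because the slide does not state the cost basis behind it. All names are hypothetical.

```python
"""Rough arithmetic check of the "Sample business case" slide.

Inputs are the slide's own assumptions: federal minimum wage plus 38%
overhead (rounded to $10/hr), a 50-week, 5-day year, and an $8000/yr
robot total cost of ownership.
"""

WEEKS_PER_YEAR = 50
DAYS_PER_WEEK = 5
LOADED_WAGE = 10            # $/hr: $7.25 + 38% overhead, rounded as on the slide
ROBOT_TCO_PER_YEAR = 8000   # $/yr, robot total cost of ownership


def aide_cost(hours_per_day: int) -> int:
    """Employer's yearly cost for one aide working weekdays."""
    return WEEKS_PER_YEAR * DAYS_PER_WEEK * hours_per_day * LOADED_WAGE


def pct_change(before: int, after: int) -> int:
    """Percentage change from `before` to `after` (floored to an integer)."""
    return (after - before) * 100 // before


def payback_months(purchase: int, billed_per_year: int, service_per_year: int) -> float:
    """Months of net robot revenue needed to recover the purchase price."""
    return 12 * purchase / (billed_per_year - service_per_year)


if __name__ == "__main__":
    baseline = aide_cost(10)                          # one aide, 8am-6pm
    alternative = aide_cost(5) + ROBOT_TCO_PER_YEAR   # half-time aide + robot
    print(baseline, alternative)                      # 25000 20500
    print(pct_change(10, 5 + 8))                      # 30 (% more weekday hours)
    print(pct_change(50_000, 45_000))                 # -10 (% change in client bill)
    print(round(payback_months(15_000, 20_000, 3_000), 1))  # 10.6
```

The payback line mirrors the slide: $15,000 purchase divided by $17,000 of net yearly revenue, expressed in months.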
What’s different and important
• Speech-driven interface
  No headset required (far field), can learn new nouns and verbs
• Multi-modal dialog
  Responds to gestures, exploits synergies between modalities
• Manipulation as well as mobility
  Not just a walking telephone; can do useful physical work also
• One-shot learning
  No turntable scanning, not hundreds of examples, no trial-and-error experiments
• Cost containment
  Vision instead of special-purpose sensors and precise mechanicals