Design Exploration of a Human-machine Interface (HMI) Application

Presentation Transcript

  1. Design Exploration of a Human-machine Interface (HMI) Application • Francis Li, Sam Madden

  2. The Application • Data glove interface • Wired, bulky • SmartDust scenario • A mote on each fingertip • Investigate implementations • Explore design alternatives

  3. Proof-of-Concept Prototype • By SmartDust group • Atmel AVR Microprocessor • RFM TR1000 Radio • 6 accelerometers • Host PC performs processing • Analysis • Power: 45 mW measured • Continuous operation of processor, accelerometers, communication with host

  4. Application Analysis • Processing (on PC) • Do 20 times per second, for each accelerometer • Read in X and Y samples (10 bits each) • Compute rolling average to smooth input data • Convert averages to polar coordinates • Dominates cost: sqrt, acos, atan • Secondary cost: floating point operations • Periodically, calculate gesture via simple template matching (static hand positions)
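
The per-sample processing on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the window size, the zero-g offset `center`, the class and function names, and the squared-error template metric are all assumptions. It also uses `math.hypot` and `math.atan2` in place of the separate sqrt, acos and atan calls the slide lists, and ignores angle wrap-around in the matching step.

```python
import math
from collections import deque

WINDOW = 4  # hypothetical smoothing window; the slides do not give one


class AxisSmoother:
    """Rolling average over the most recent raw 10-bit samples of one axis."""

    def __init__(self, window=WINDOW):
        self.samples = deque(maxlen=window)

    def update(self, raw):
        self.samples.append(raw)
        return sum(self.samples) / len(self.samples)


def to_polar(x_avg, y_avg, center=512.0):
    """Convert smoothed X/Y readings to (magnitude, angle in radians).

    `center` is an assumed zero-g offset for a 10-bit reading (0..1023).
    """
    dx, dy = x_avg - center, y_avg - center
    return math.hypot(dx, dy), math.atan2(dy, dx)


def match_gesture(angles, templates):
    """Return the template name with the smallest summed squared angle error."""
    def error(name):
        return sum((a - t) ** 2 for a, t in zip(angles, templates[name]))
    return min(templates, key=error)
```

In this sketch one `AxisSmoother` would be kept per axis per accelerometer and updated 20 times per second, with `match_gesture` called only periodically on the latest angles.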

  5. Application Analysis (cont.) • Communication (from Atmel to PC) • 20 samples/sec • 6 accelerometers • 4 bytes/sample → 480 bytes/sec • 115.6 kb/sec RF link • Radio = 12 mA @ 3 V when transmitting (36 mW); at a transmit duty cycle of about 3.3% → 1.2 mW for radio alone • Real-world power >> 1.2 mW, due to software and analog overhead (real-world analysis later)
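
The 1.2 mW figure appears to follow from the fraction of time the radio spends transmitting. The arithmetic can be checked with the short sketch below; the duty-cycle interpretation and the variable names are assumptions, while the numeric inputs come from the slide.

```python
SAMPLES_PER_SEC = 20
ACCELEROMETERS = 6
BYTES_PER_SAMPLE = 4
LINK_BITS_PER_SEC = 115_600  # "115.6 kb/sec" as given on the slide
TX_CURRENT_A = 0.012         # 12 mA while transmitting
SUPPLY_V = 3.0

# Payload rate from sensors to the host PC.
bytes_per_sec = SAMPLES_PER_SEC * ACCELEROMETERS * BYTES_PER_SAMPLE
bits_per_sec = bytes_per_sec * 8

# Assumed model: the radio draws transmit power only while sending bits.
duty_cycle = bits_per_sec / LINK_BITS_PER_SEC
tx_power_mw = TX_CURRENT_A * SUPPLY_V * 1000
avg_radio_mw = tx_power_mw * duty_cycle

print(f"{bytes_per_sec} bytes/sec, duty cycle {duty_cycle:.1%}, "
      f"average radio power {avg_radio_mw:.2f} mW")
```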

  6. Optimization Process • Match Application to HW

  7. Optimization Process • Match Application to HW • Match Hardware to Application

  8. Optimization Process • Match Application to HW • Local computation to reduce communication • Match Hardware to Application

  9. Optimization Process • Match Application to HW • Local computation to reduce communication • Floating point → Fixed Point • Match Hardware to Application

  10. Optimization Process • Match Application to HW • Local computation to reduce communication • Floating point → Fixed Point • Match Hardware to Application • Distributed vs. Centralized

  11. Optimization Process • Match Application to HW • Local computation to reduce communication • Floating point → Fixed Point • Match Hardware to Application • Distributed vs. Centralized • TI vs. Atmel

  12. Optimization Process • Match Application to HW • Local computation to reduce communication • Floating point → Fixed Point • Match Hardware to Application • Distributed vs. Centralized • TI vs. Atmel • DSP

  13. Optimization Process • Match Application to HW • Local computation to reduce communication • Floating point → Fixed Point • Match Hardware to Application • Distributed vs. Centralized • TI vs. Atmel • DSP

  14. Communication vs. Computation • Estimates of local processing cost on Atmel (via simulation of GCC program) • Loop runs 6 × 20 times/sec: Average: 2223 instr. × 2, CalcPolar: 19017 instr. → 2.83×10^6 instructions • Report gesture once per second: FindGestureError: 5444 instr., 10 gestures, 6 accelerometers → 5444 × 60 → 3.26×10^5 instr. • Memory operations are 2 cycles/instruction • Total cycles ~ 3.7M → 4 MHz → 13.5 mW • Communication = 8 bits/sec → negligible cost
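
The instruction budget on this slide can be re-derived with a few lines of arithmetic. The sketch below uses only the per-routine counts from the slide; the variable names are illustrative. The loop total comes out at roughly 2.8 million instructions per second, in line with the slide's rounded 2.83×10^6, and the gesture step matches 3.26×10^5.

```python
AVERAGE_INSTR = 2223             # one rolling average (run for X and for Y)
CALC_POLAR_INSTR = 19017         # polar conversion
FIND_GESTURE_ERROR_INSTR = 5444  # one template comparison
LOOPS_PER_SEC = 6 * 20           # 6 accelerometers, 20 samples/sec each
GESTURES = 10
ACCELEROMETERS = 6

per_loop = AVERAGE_INSTR * 2 + CALC_POLAR_INSTR
loop_instr_per_sec = per_loop * LOOPS_PER_SEC
gesture_instr_per_sec = FIND_GESTURE_ERROR_INSTR * GESTURES * ACCELEROMETERS
total_instr_per_sec = loop_instr_per_sec + gesture_instr_per_sec

# The slide's ~3.7M cycle total is higher than the instruction count
# because memory operations take 2 cycles each.
clock_utilization = 3.7e6 / 4e6

print(loop_instr_per_sec, gesture_instr_per_sec, total_instr_per_sec)
print(f"4 MHz clock utilization: {clock_utilization:.1%}")
```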

  15. Communication vs. Computation 2 • Cost of communication to Host PC (measured) • 4317 nJ/bit • From Culler, Hill, Szewczyk, Woo, “System Architecture For Networked Sensors.” • 4317 nJ/bit × 480 bytes/sec × 8 = 16.57 mW • Processor still sucks power • Current implementation requires 13.5 mW • Using sleep, only 1.17 mW → 17.74 mW total
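
As a quick check of the communication cost, energy per bit multiplied by bit rate gives average power. The sketch below uses the slide's figures; the variable names are illustrative, and the result differs from the slide's 16.57 mW only in the last digit because of rounding.

```python
ENERGY_PER_BIT_NJ = 4317  # measured cost per transmitted bit
BITS_PER_SEC = 480 * 8    # 480 bytes/sec of raw samples
CPU_SLEEP_MW = 1.17       # processor cost when sleep mode is used

# nJ/bit * bits/sec = nJ/sec = nW; scale by 1e-6 to get mW.
comm_mw = ENERGY_PER_BIT_NJ * BITS_PER_SEC * 1e-6
total_with_sleep_mw = comm_mw + CPU_SLEEP_MW

print(f"communication {comm_mw:.2f} mW, total {total_with_sleep_mw:.2f} mW")
```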

  16. Optimization Process • Match Application to HW • Local computation to reduce communication • Floating point → Fixed Point • Match Hardware to Application • Distributed vs. Centralized • TI vs. Atmel • DSP

  17. Distributed vs. Centralized • Move some processing to each sensor • 6 processors • Each computing average, polar transform • Transmitting 4 × 8 = 32 bits once/second • Using Atmel processor on each mote • Computation • ~0.5M cycles/sec → 2 mA @ 2.7 V → 5.4 mW • Communication • Very small: 4317 nJ × 32 = 0.13 mW • 5.53 mW/mote × 6 motes = 33.2 mW total (Bad Idea!)
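
The per-mote budget can be reproduced as below, which also makes it clear why distributing the work across six Atmel processors loses to the centralized design with sleep mode (17.74 mW on the earlier slide). The names are illustrative; the inputs are the slide's.

```python
MOTES = 6
CPU_CURRENT_A = 0.002     # 2 mA at ~0.5M cycles/sec
CPU_SUPPLY_V = 2.7
ENERGY_PER_BIT_NJ = 4317
BITS_PER_SEC = 4 * 8      # 4 bytes once per second
CENTRALIZED_SLEEP_MW = 17.74

compute_mw = CPU_CURRENT_A * CPU_SUPPLY_V * 1000
comm_mw = ENERGY_PER_BIT_NJ * BITS_PER_SEC * 1e-6  # nW -> mW
per_mote_mw = compute_mw + comm_mw
total_mw = per_mote_mw * MOTES

print(f"per mote {per_mote_mw:.2f} mW, six motes {total_mw:.1f} mW, "
      f"{total_mw / CENTRALIZED_SLEEP_MW:.1f}x the centralized cost")
```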

  18. Optimization Process • Match Application to HW • Local computation to reduce communication • Floating point → Fixed Point • Match Hardware to Application • Distributed vs. Centralized • TI vs. Atmel • DSP

  19. TI Microcontroller Evaluation • A microcontroller with better specs • MSP430P112: 330 µA/MHz active mode, 1.5 µA standby (6 µs wakeup) • Used IAR Systems compiler, profiler, development environment • Analysis • Centralized, 3.3 V, 4 MHz: 3.8 mW • Distributed, 2.5 V, 1 MHz: 0.48 mW per mote • Six processors → 2.9 mW

  20. Optimization Process • Match Application to HW • Local computation to reduce communication • Floating point → Fixed Point • Match Hardware to Application • Distributed vs. Centralized • TI vs. Atmel • DSP

  21. TI DSP Evaluation • TMS320C54x • Used TI Code Composer Studio, compiler, simulator • Power • Active mode, 3.3 V, 10 MHz: 33 mW • IDLE1: 0.36 mW • Analysis • Centralized: 7.8 mW • Distributed: 1.6 mW per mote • Six processors = 9.6 mW total

  22. TI DSP Evaluation Part 2 • TMS320C55x (two parallel MACs) • Same tools, with C55x compiler, simulator • Power: No details available... • Advertised: 0.9 V, 0.05 mW/MHz • Analysis • Centralized: 1170240 cycles (vs. 2290440 on C54x) • 2 MHz: 0.1 mW • Distributed: 195040 cycles (vs. 381740 on C54x) • 1 MHz: 0.05 mW • Six processors: 0.3 mW total
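
The DSP figures on the last two slides are consistent with a simple duty-cycle model: the processor is active for the cycles it needs each second and sits in IDLE1 for the rest. The sketch below is an assumed model, not something the slides state, and the function names are illustrative. It reproduces the C54x numbers to the precision shown and the C55x numbers from the advertised 0.05 mW/MHz.

```python
def duty_cycled_mw(cycles_per_sec, clock_hz, active_mw, idle_mw):
    """Average power when active for the needed cycles and idle otherwise."""
    duty = cycles_per_sec / clock_hz
    return active_mw * duty + idle_mw * (1.0 - duty)


def c55x_mw(clock_mhz, mw_per_mhz=0.05):
    """Advertised C55x power, scaled linearly with clock rate."""
    return clock_mhz * mw_per_mhz


# C54x: 33 mW active at 10 MHz, 0.36 mW in IDLE1 (from the slides).
c54x_central = duty_cycled_mw(2_290_440, 10e6, 33.0, 0.36)
c54x_mote = duty_cycled_mw(381_740, 10e6, 33.0, 0.36)

print(f"C54x centralized {c54x_central:.1f} mW, "
      f"distributed {c54x_mote:.1f} mW/mote, six motes {6 * c54x_mote:.1f} mW")
print(f"C55x centralized {c55x_mw(2):.2f} mW, "
      f"six motes {6 * c55x_mw(1):.2f} mW")
```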

  23. Other Explorations • Hand optimized code • Possible to massively reduce computation cost • FP/Transcendentals conspicuously painful • Outside scope of our exploration • Radio Hardware • Bluetooth ~ 100 times more efficient • Reconfigurable Computing • Other circuitry (e.g. accelerometers)

  24. Results Summary • Cost, in mW, of various implementations • 17.74 using sleep mode, 28 without • 31% / 104% improvement with same hardware • 170x improvement with new hardware

  25. Conclusions • By finding better mappings from SW → HW → Application, big performance gains are possible. • Effective use of local processor resources can reduce communication overheads, which are significant. • DSPs and other specialized processors can be a big win and don’t require hand-coded assembly or reconfigurable design.