Low power Design Strategies

Low power Design Strategies. Daniele Folegnani. Talk outline: Why Low Power is Important; Power Consumption in CMOS Circuits; New Trends for Future Microprocessors; Low Power Strategies; Power Consumption Evaluation of a Superscalar Processor.

Presentation Transcript


  1. Low power Design Strategies Daniele Folegnani

  2. Talk outline • Why Low Power is Important • Power Consumption in CMOS Circuits • New Trends for Future Microprocessors • Low Power Strategies • Power Consumption Evaluation of a Superscalar Processor • An Architectural Technique to Reduce the Power Consumption of the Issue Logic • Conclusions

  3. Why Low Power is Important High-performance microprocessors: PowerPC704 consumes 85 W; Alpha 21364 consumes 100 W. Problems involved: thermal runaway, gate dielectric, junction fatigue, electromigration diffusion, electrical parameter shift, silicon interconnection fatigue, package-related failure. THE FUNCTIONALITY AND THE CLOCK SPEED CAN BE LIMITED

  4. Thermal and Power dissipation costs

  5. Low performance processors High demand for portable devices (mobile phones, laptops, smart cards, videogames, etc.) >>> 95% of production !!! Extensive use of multimedia features. Problems involved: >>> Battery life !!! Battery energy will not grow drastically in the near future, for technology and safety reasons (today's batteries have the same energy as a grenade !!!). One of the selling points is hours of use and hours of standby. Techniques are needed to improve energy efficiency without penalizing performance

  6. Power Consumption in CMOS Circuits • Static: theoretically 0; in practice, leakage and threshold currents exist in transistors • Dynamic: transients (the linear zone) and capacitance switching (THE MOST IMPORTANT FACTOR)
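The slide names capacitance switching as the dominant term but the transcript gives no formula. For background, the standard first-order model of switching power is shown below; it is the usual textbook expression, not necessarily the one on the original slide.

```latex
P_{\text{switching}} = \alpha \, C \, V_{dd}^{2} \, f
```

Here $\alpha$ is the activity factor (fraction of the capacitance switched per cycle), $C$ the switched capacitance, $V_{dd}$ the supply voltage, and $f$ the clock frequency.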

  7. New Trends for Future Microprocessors

  8. Moore's Law: transistor count doubles every 18 months. Power is proportional to DIE AREA and FREQUENCY. Within the same technology, a new architecture takes 2-3X the die area. Changing technology implies 2X frequency. SCALING TECHNOLOGY ... Voltage decreases (0.7 scaling factor). Die area decreases (0.5 scaling factor). C per unit area increases by 43% !!!

  9. This implies that power density increases by 40% every generation !!! Temperature is a function of power density and determines the type of cooling system needed. VARIABLES • PEAK POWER (worst case): today's packages can sustain a power dissipation of over 100 W for up to 100 ms >>> a cheaper package is possible if peaks are reduced • ENERGY SPENT (for a workload): more closely correlated with battery life
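The 40% figure can be checked from the scaling factors on slide 8. The sketch below assumes the first-order model in which power is proportional to C × V² × f, and assumes the 43% rise in capacitance per unit area comes from a 0.7x linear shrink (1 / 0.7 ≈ 1.43); the function and constant names are illustrative only.

```python
# Per-generation scaling factors quoted on slide 8.
VOLTAGE_SCALE = 0.7     # supply voltage shrinks by 0.7x
FREQUENCY_SCALE = 2.0   # clock frequency doubles
LINEAR_SCALE = 0.7      # assumption: linear dimensions shrink by 0.7x


def power_density_scale(linear=LINEAR_SCALE,
                        voltage=VOLTAGE_SCALE,
                        frequency=FREQUENCY_SCALE):
    """Relative change in power per unit area, assuming P ~ C * V^2 * f."""
    cap_per_area = 1.0 / linear  # about 1.43x, the "43%" on the slide
    return cap_per_area * voltage ** 2 * frequency


if __name__ == "__main__":
    print(f"power density scales by {power_density_scale():.2f}x")
```

Under these assumptions the product is 1.43 × 0.49 × 2 ≈ 1.40, which matches the 40% increase per generation stated on the slide.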

  10. Low Power Strategies • OS level: PARTITIONING, POWER DOWN • Software level: REGULARITY, LOCALITY, CONCURRENCY (compiler technology for low power, instruction scheduling) • Architecture level: PIPELINING, REDUNDANCY, DATA ENCODING (ISA, architectural design, memory hierarchy, HW extensions, etc.) • Circuit/logic level: LOGIC STYLES, TRANSISTOR SIZING, ENERGY RECOVERY (logic families, conditional clocking, adiabatic circuits, asynchronous design) • Technology level: threshold reduction, multi-threshold devices, etc.

  11. Power Consumption Estimation

  12. Architectural estimation has a relatively high error rate (no view of the total area, circuit types, technology, block activity, etc.), yet IMPORTANT DESIGN DECISIONS MUST BE MADE AT ARCHITECTURAL LEVEL • Accurate power evaluation is done in late design phases • Good feedback is needed between all the design phases - Correlation between power estimates from low level to high level TRY TO IMPROVE ACCURACY AT HIGH LEVEL - Critical-path-based power consumption analysis (CIRCUIT TYPES, TECHNOLOGY, ACTIVITY FACTOR) - Thermal-image-based correlation analysis (HOTTEST SPOTS LOCATION, COOLEST SPOTS LOCATION, TEMPERATURE DIFFERENCES, TEMPERATURE DISTRIBUTION)

  13. Architectural Power Evaluation [ G.Cai, Intel ] • Architectural design partition • Power consumption evaluation at block level - Power density of blocks (SPICE simulation, statistical input set, technology and circuit types definition) - Activity of blocks and sub-blocks (running benchmarks) - Area (feedback from VLSI design, circuits and technology defined) • TRY TO DEFINE SCALING FACTORS THAT ALLOW THE ARCHITECTURAL POWER SIMULATOR TO BE REMAPPED WHEN TECHNOLOGY, AREA AND CIRCUIT TYPES CHANGE • TRY TO REDUCE THE ESTIMATION ERROR AT HIGH LEVEL

  14. POW OUT ORDER • Technology assumed: CMOS 0.18 micron • 5 types of circuit logic (static, dynamic, SRAM, clock distribution, PLA) • 32 architectural blocks, each with an associated area • Blocks built with custom design • Two types of power density (active and inactive power density)
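One way to read the block-level model on slides 13 and 14 is sketched below: each block contributes its area times a power density, where the density is a blend of the active and inactive values weighted by how often the block is active. The weighting rule, the `Block` type, the units, and every number are assumptions made for illustration; the transcript does not give the simulator's actual equations.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical description of one architectural block."""
    name: str
    area_mm2: float          # area fed back from VLSI design
    active_density: float    # W/mm^2 while the block is active
    inactive_density: float  # W/mm^2 while the block is idle
    activity: float          # fraction of cycles active, from benchmarks


def block_power(block: Block) -> float:
    """Average power of one block: area times activity-weighted density."""
    density = (block.activity * block.active_density
               + (1.0 - block.activity) * block.inactive_density)
    return block.area_mm2 * density


def total_power(blocks: list[Block]) -> float:
    """Sum the per-block estimates over the whole design partition."""
    return sum(block_power(b) for b in blocks)


if __name__ == "__main__":
    # Made-up numbers, only to show how the model is used.
    design = [
        Block("instruction_queue", 10.0, 0.5, 0.1, 0.5),
        Block("dcache", 20.0, 0.2, 0.05, 0.25),
    ]
    for b in design:
        print(f"{b.name}: {block_power(b):.2f} W")
    print(f"total: {total_power(design):.2f} W")
```

Keeping area, density, and activity as separate inputs mirrors the slide's goal of remapping the estimate when technology, area, or circuit type changes: only the affected field has to be rescaled.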

  15. Power Consumption Evaluation of a Superscalar Processor Architectural parameters: • 4-instruction fetch, issue and commit • 128-entry instruction queue • I-Cache: 128 Kbytes, direct mapped, 32-byte line, 1-cycle hit, 3-cycle miss • D-Cache: 128 Kbytes, 4-way set associative, 32-byte line, 1-cycle hit, 3-cycle miss • UL2-Cache: 1024 Kbytes, 4-way set associative, 64-byte line, 3-cycle hit • Combined predictor of 1K entries, with a Gshare of 1K 2-bit counters and 8-bit global history, and a bimodal predictor of 2K entries with 2-bit counters • 4 int ALU, 4 fp ALU, 1 int mul/div, 1 fp mul/div • Out-of-order issue, oldest-ready-first selection policy

  16. An Architectural Technique to Reduce the Power Consumption of the Issue Logic • IQ + ROB are responsible for about 53% of power consumption • The cache hierarchy is not the most important power consumption factor in the superscalar paradigm • Power consumption is almost independent of the instruction mix TRENDS IN SUPERSCALAR • Increasing issue width • The size of the instruction window grows more than linearly with respect to IW • The area of the IQ grows more than linearly with respect to the number of entries. The IQ power contribution may grow in the future

  17. Every cycle the wakeup logic broadcasts the result tags through the result buses to all the entries, and each entry compares them with its own operand tags to find a match. THE ISSUE ENGINE SPENDS A LARGE AMOUNT OF POWER EVERY CYCLE ONLY TO CHECK WHETHER SOME INSTRUCTIONS ARE AVAILABLE FOR EXECUTION Considering • Periods of execution with high parallelism: just a subpart of the IQ may satisfy the IW • Periods of execution with poor parallelism: some parts of the IQ may not provide any useful instruction ready to execute. The issue engine is very power inefficient
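To make the cost of the broadcast concrete, here is a minimal software model of one wakeup cycle. It assumes two source-operand slots per queue entry and that every comparator fires on every broadcast, whether or not the slot is still waiting; both are simplifying assumptions, and the names are illustrative rather than taken from the talk.

```python
OPERANDS_PER_ENTRY = 2  # assumption: two source operands per instruction


def wakeup_cycle(entries, result_tags):
    """Broadcast result_tags to every entry of the queue.

    entries: list of [src0, src1]; a slot holds the tag it is waiting
             for, or None once that operand is available.
    Returns (indices of entries with all operands ready, comparisons made).
    """
    comparisons = 0
    for operands in entries:
        for slot in range(OPERANDS_PER_ENTRY):
            waiting_for = operands[slot]
            for tag in result_tags:
                comparisons += 1  # the comparator fires regardless of a match
                if waiting_for is not None and waiting_for == tag:
                    operands[slot] = None  # operand has just become ready
    ready = [i for i, ops in enumerate(entries)
             if all(src is None for src in ops)]
    return ready, comparisons


if __name__ == "__main__":
    queue = [[1, None], [2, 3], [None, None]]
    ready, cost = wakeup_cycle(queue, result_tags=[1, 2])
    print("ready entries:", ready, "comparisons:", cost)
```

In this model the number of comparisons depends only on the queue size and the number of broadcast tags, not on how many instructions actually wake up. For the 128-entry queue of slide 15, and assuming four result tags per cycle, that would be 128 × 2 × 4 = 1024 comparisons every cycle.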

  18. Dynamically Resizing the Instruction Queue • We propose a run-time mechanism that adapts the size of the IQ based on its contribution to IPC • We avoid the wake-up function in the parts that are temporarily disabled • Resize decisions are commit-based. The IQ is implemented as a circular FIFO with head and tail pointers, no collapsing

  19. What we do is ... Partition the queue into 16 parts of 8 entries. Define a new pointer for the queue, called the limit pointer • At start time it has the same value as the head pointer, and it is updated whenever the head pointer is • When a resize decision is made, an offset (one portion) is added to or subtracted from it • The zone between the head and the limit pointer is the disabled zone (no wake-up) • If the tail grows beyond the limit, we allow the correct wake-up and stop insertion until the limit reaches the tail
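The pointer rules above can be read as the following sketch of a circular queue with a limit pointer. It is one interpretation of the slide, not the author's implementation: the class and method names are hypothetical, the head is assumed to advance when the oldest entry is released, and a floor of one portion is imposed so the queue never disables itself completely.

```python
class ResizableQueue:
    """Circular FIFO whose usable size is bounded by a limit pointer."""

    def __init__(self, size=128, portion=8):
        self.size = size
        self.portion = portion
        self.head = 0    # oldest entry
        self.tail = 0    # next free slot
        self.limit = 0   # starts equal to head: nothing is disabled
        self.count = 0   # occupied entries

    def disabled_entries(self):
        """Entries in the zone between the limit pointer and the head."""
        return (self.head - self.limit) % self.size

    def capacity(self):
        return self.size - self.disabled_entries()

    def can_insert(self):
        # Insertion stalls while the tail has reached or passed the limit.
        return self.count < self.capacity()

    def insert(self):
        if not self.can_insert():
            return False
        self.tail = (self.tail + 1) % self.size
        self.count += 1
        return True

    def release_oldest(self):
        """Free the head entry; the limit pointer moves with the head."""
        if self.count == 0:
            return False
        self.head = (self.head + 1) % self.size
        self.limit = (self.limit + 1) % self.size
        self.count -= 1
        return True

    def shrink(self):
        """Disable one more portion (assumed floor: one portion stays on)."""
        if self.capacity() > self.portion:
            self.limit = (self.limit - self.portion) % self.size

    def grow(self):
        """Re-enable one portion if any is disabled."""
        if self.disabled_entries() >= self.portion:
            self.limit = (self.limit + self.portion) % self.size

    def entries_woken(self):
        """Entries taking part in wake-up: the enabled zone, plus occupied
        entries left beyond the limit after a shrink."""
        return max(self.capacity(), self.count)


if __name__ == "__main__":
    q = ResizableQueue()
    q.shrink()
    print("capacity after one shrink:", q.capacity())
```

Because the limit pointer moves together with the head, the disabled zone keeps a constant size between resize decisions, and a shrink that leaves occupied entries beyond the limit simply blocks insertion until enough old entries have been released.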

  20. Heuristic to reduce size • Every quantum (1000 cycles), collect statistics about the instructions committed from the youngest portion of the queue. We propose adding a bit to each ROB entry that is set at dispatch time if the instruction's physical position in the IQ is in the current youngest portion • The resize decision is threshold-based >>> 0.025 of IPC in the current portion • No limit to cut Heuristic to increase size • Blind >>> grow by one portion every 5 quanta and let the cut heuristic decide whether the decision was correct (period of high parallelism or not)
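The two heuristics can be combined into a small controller, sketched below. Several points are assumptions rather than statements from the talk: the threshold is read as an absolute IPC contribution (commits from the youngest portion divided by the cycles in the quantum), one portion is always kept enabled even though the slide says there is no limit to the cut, and the class and method names are hypothetical.

```python
QUANTUM_CYCLES = 1000   # length of one quantum, from the slide
IPC_THRESHOLD = 0.025   # contribution below which a portion is cut
GROW_PERIOD = 5         # blind growth every 5 quanta
TOTAL_PORTIONS = 16     # 16 portions of 8 entries


class ResizeController:
    """Decides, once per quantum, how many queue portions stay enabled."""

    def __init__(self):
        self.active_portions = TOTAL_PORTIONS
        self.quanta_seen = 0

    def end_of_quantum(self, youngest_portion_commits):
        """youngest_portion_commits: instructions committed in this quantum
        whose ROB bit marks them as coming from the youngest portion."""
        self.quanta_seen += 1
        contribution = youngest_portion_commits / QUANTUM_CYCLES
        # Shrink: the youngest portion adds too little to IPC.
        if contribution < IPC_THRESHOLD and self.active_portions > 1:
            self.active_portions -= 1
        # Grow blindly; a later quantum will cut again if it was a mistake.
        if (self.quanta_seen % GROW_PERIOD == 0
                and self.active_portions < TOTAL_PORTIONS):
            self.active_portions += 1
        return self.active_portions


if __name__ == "__main__":
    ctrl = ResizeController()
    for commits in [10, 100, 100, 100, 100]:
        print(ctrl.end_of_quantum(commits))
```

In a quantum where both rules fire, this sketch applies the cut first and the blind growth second, so the size is unchanged; the slide does not say how that case is resolved.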

  21. Results

  22. Conclusions • Power consumption is a new constraint in the design of computer systems, alongside cost and performance • The problem must be attacked at different levels of abstraction • Power decisions must be made in the early steps of the design • There is a need for power estimation models and tools, especially at the architectural level

  23. Q&A ?
