
Power Management

Presentation Transcript


  1. Power Management

  2. Outline: • Introduction to the topic • Part 1: Vertigo • How can power consumption by a processor be reduced? • What is DVS? • Performance setting algorithms • Policy stack • Evaluation • Conclusion • Part 2: Cooperative I/O • Components of Cooperative I/O • Low power modes • Cooperative File operations • Cooperative Updates & DDT • Prototype Implementation • Experiments & Results • Conclusion

  4. What were some of the common features among the devices you just saw?

  4. Power Management The goal of power management is to combine high performance with low power consumption.

  5. How can we better manage power? • Develop longer-lasting batteries. • Reduce the power consumed by the processor/CPU. • Reduce the power consumed by the network interface, the user interface, and during I/O. • Application-driven power management, which involves developing energy-aware applications.

  6. Part 1: Vertigo

  7. Reduce the power consumed by the processor Three possible solutions to the problem • By taking advantage of the highly variable performance requirements of the application. • By Dynamic Voltage Scaling (DVS), which involves reducing the clock frequency and the corresponding operating voltage of the CPU. • By reducing the static power consumption that results from leakage current in CMOS devices, using techniques such as Adaptive Reverse Body Biasing (ABB).

  8. DVS (Dynamic Voltage Scaling) exploits the fact that the peak frequency of a processor implemented in CMOS is proportional to the supply voltage, while the amount of dynamic energy required for a given workload is proportional to the square of the processor’s supply voltage. • In simple terms, “running the processor slower means that the voltage level can be lowered, yielding a quadratic reduction in energy consumption, at the cost of increased run-time”. • So the key is to reduce the processor’s performance level so as to exactly meet the software deadlines, rather than running the processor at its peak level at all times. • Similarly, static power consumption due to leakage can be significantly reduced if the processor does not operate at its peak performance level.
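The quadratic relationship on this slide can be sketched numerically. This is a minimal illustration assuming, as the slide states, that peak frequency scales linearly with supply voltage and dynamic energy scales with its square; the function name is hypothetical:

```python
def dvs_energy_ratio(freq_fraction):
    """Relative dynamic energy for a fixed workload when the CPU runs at
    `freq_fraction` of its peak frequency, assuming f ~ V (so voltage can
    be lowered in proportion to frequency) and E_dynamic ~ V^2 per cycle.
    The workload's cycle count is fixed, so total energy scales as V^2."""
    voltage_fraction = freq_fraction  # peak frequency ~ supply voltage
    return voltage_fraction ** 2

# Halving the clock (and thus the voltage) quarters the energy,
# at the cost of doubled run time:
print(dvs_energy_ratio(1.0))  # 1.0  (full-speed baseline)
print(dvs_energy_ratio(0.5))  # 0.25 (quarter of the energy)
```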

  9. Dynamic Voltage Scaling • A simple approach to voltage scaling is based on the usage model (not on the workload). This approach does not need any OS support. • Another approach is to build the power management policy into the processor’s firmware. This approach also does not require any modifications to the OS. • The approach used in Vertigo is to build the power management policy as hooks into the existing OS. This gives it access to a richer set of data for making performance predictions.

  10. Vertigo[1] Design objectives of the Vertigo system: • To be minimally intrusive into the host operating system, so as to keep it autonomous from the other units of the kernel. • To automatically set the processor performance, without any application-specific involvement. • To be able to run in passive mode in order to monitor the system.

  11. Vertigo Architecture • Vertigo is implemented as a set of kernel patches that hook into the Linux kernel to monitor program execution and control the speed and voltage levels of the processor. • Vertigo co-exists with the existing scheduler, system calls and power manager. • It uses a composition of performance setting algorithms, each specializing in different kinds of run-time situations.

  12. Performance Setting Algorithms They can be broadly divided into two categories 1. Those that use information about task deadlines in real-time kernels to guide performance-setting decisions of the processor. 2. Those that seek to derive deadlines automatically, either by monitoring the past utilization of the processor (interval-based techniques) or based on semantic task and event classification. Note: Vertigo’s performance setting algorithms fall into the 2nd category.

  13. Vertigo’s Performance Setting Algorithms • The multiple performance setting algorithms are co-ordinated to find the best estimate for the necessary performance level. • Hence there is a decision hierarchy, where the top-level algorithms have the right to override the lower levels. • Vertigo has three levels of decision hierarchy 1. Top level - This algorithm quantifies the performance requirements of an interactive application. 2. Middle level - An application-specific layer, where DVS-aware applications can submit their information. 3. Bottom level - The algorithm tries to estimate future utilization based on past information. Note: The focus of the paper is on the top-level and bottom-level algorithms.

  14. How to quantify Work? The full-speed-equivalent (fse) work estimate is computed as Work_fse = Σ_{i=1..n} t_i · p_i (Eq 1), where i refers to one of the n different performance levels during a given interval, t_i is the amount of non-idle time (in seconds) spent at that performance level, and p_i is that level’s frequency specified as a fraction of peak performance. The equation rests on the assumption that the workload’s run time is linearly related to the inverse of the processor’s performance level.
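Eq 1 can be sketched directly as code (an illustrative implementation, not taken from the paper):

```python
def work_fse(intervals):
    """Full-speed-equivalent work (Eq 1): the sum of t_i * p_i over the
    performance levels used during an interval.
    intervals: list of (t_i, p_i) pairs, where t_i is the non-idle time in
    seconds spent at level i and p_i is that level as a fraction of peak."""
    return sum(t * p for t, p in intervals)

# 2 s at 50% of peak plus 1 s at full speed equals 2 s of full-speed work:
print(work_fse([(2.0, 0.5), (1.0, 1.0)]))  # 2.0
```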

  15. Perspective-based Algorithm • It is the algorithm used at the lowest level of the policy stack. • It is called perspective-based because it computes performance predictions from the perspective of each task and uses the combined result to control the performance setting of the processor. • Vertigo computes each task’s observed period and uses it to recompute per-task exponentially decaying averages of the work done during the task’s period and of its estimated deadlines. • The performance level is given by Perf = WorkEst / Deadline (Eq 4). • To find the task’s period, the algorithm tracks the time from when the task starts executing, through points when it is preempted and eventually runs out of work, until the next time it is rescheduled.
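Eq 4 is simple enough to sketch. The clamp to the [0, 1] range is an assumption added for illustration (the processor cannot run above its peak level):

```python
def required_perf(work_est, deadline):
    """Eq 4: the performance level (as a fraction of peak) needed to
    finish the predicted full-speed-equivalent work by the task's
    estimated deadline, clamped at peak performance."""
    return min(1.0, work_est / deadline)

print(required_perf(0.3, 1.0))  # 0.3: 30% of peak suffices
print(required_perf(2.0, 1.0))  # 1.0: saturates at peak performance
```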

  16. Task Period At point c, Work_fse is computed for the period between a and c.

  17. WorkEst and Deadline WorkEst_new = (k · WorkEst_old + Work_fse) / (k + 1) (Eq 2) Deadline_new = (k · Deadline_old + Work_fse + Idle) / (k + 1) (Eq 3) Note: k is a constant whose value is chosen as k = 3.
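The two decaying-average updates (Eq 2 and Eq 3) can be sketched together; the numbers in the example are invented for illustration:

```python
K = 3  # weighting constant chosen in the paper

def update_estimates(work_est_old, deadline_old, work_fse, idle, k=K):
    """Eq 2 and Eq 3: exponentially decaying averages of a task's
    full-speed-equivalent work and of its deadline, updated once per
    observed task period."""
    work_est_new = (k * work_est_old + work_fse) / (k + 1)
    deadline_new = (k * deadline_old + work_fse + idle) / (k + 1)
    return work_est_new, deadline_new

# A task that did 0.2 s of full-speed work and then idled for 0.6 s:
print(update_estimates(0.4, 1.0, 0.2, 0.6))  # (0.35, 0.95)
```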

  18. Advantage: By keeping track of work and deadline predictions separately, the performance predictions are weighted over the length of the interval over which the work estimates are measured. Disadvantage: If a new non-interactive, CPU-bound task utilizes the CPU without being preempted, significant latency is incurred in responding to the load. This is overcome by placing a limit on the non-preempted duration over which the work estimate is computed: a value of 100 ms was chosen, after which the work estimate of a task that has not been preempted is recomputed.

  19. Top level Algorithm(1) • The middle level - DVS-aware applications can submit their requirements to this layer, but it is not currently implemented in Vertigo. • The top level - The algorithm automatically quantifies the performance requirements of interactive applications and ensures that software deadlines are met, so that the user experience does not suffer. • An interactive episode is started by the user through a GUI event. • This event gets handled by the GUI controller and the appropriate task handler. • Therefore, by monitoring the appropriate system calls, Vertigo can detect interactive episodes. • The GUI controller, the task handler and all other tasks initiated by these are marked as interactive tasks. • An interactive episode ends when there are no more interactive tasks to perform.

  20. Top level Algorithm(2) • Skip threshold: episodes shorter than this are too short to matter; it is set to a value of 5 ms. • Panic threshold: if the interactive episode has not ended by this point, the processor performance is increased to the maximum. It is set to 50 ms. • Perception threshold: if an episode is longer than both the skip and the perception threshold, then the performance requirement is set to 100%. • The perception threshold describes a cut-off point under which events appear to happen instantaneously for the user. • The perception threshold depends upon the task and the user, but 50 ms is commonly used.

  21. Top level Algorithm(3) • The predictions for the interactive episodes are computed as exponentially decaying averages of the correct settings of the past interactive episodes. • If the panic threshold was reached during an episode, the moving average is rescaled so that the last performance setting gets incorporated with a higher weight (k=1 instead of k=3). • The performance requirement of interactive episodes shorter than the perception threshold is calculated as Perf = Work_fse / PerceptionThreshold (Eq 5).
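The threshold logic of slides 20 and 21 can be combined into one small sketch. This is an illustrative reading of the slides, not the actual Vertigo code; the function name and the exact control flow are assumptions:

```python
SKIP_THRESHOLD = 0.005        # 5 ms: shorter episodes are ignored
PERCEPTION_THRESHOLD = 0.050  # 50 ms: below this, events feel instantaneous

def episode_perf(work_fse, episode_length):
    """Performance requirement of one interactive episode, in seconds.
    Episodes longer than both the skip and perception thresholds need
    100% performance; shorter ones only need to finish within the
    perception threshold (Eq 5)."""
    if episode_length <= SKIP_THRESHOLD:
        return None   # too short to influence the prediction
    if episode_length > PERCEPTION_THRESHOLD:
        return 1.0    # must run at full speed to protect the user experience
    return min(1.0, work_fse / PERCEPTION_THRESHOLD)

# 25 ms of full-speed-equivalent work in a 20 ms episode: half speed is
# enough to finish within the 50 ms perception budget.
print(episode_perf(0.025, 0.020))  # 0.5
```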

  22. Policy Stack and Work Tracking in Vertigo • The policy stack keeps track of the various commands and performance level requests from each policy and uses this information to combine them into a single global performance level decision when needed. • The policies can be triggered by any event in the system, and they may submit new performance requests at any time. • The algorithms use processor utilization history to compute the performance level, hence they need to be able to track the work done over an interval. • This is done by the policies in Vertigo by making use of a structure called “Vertigo_work”. • This not only simplifies the policy code but also simplifies the porting of Vertigo.
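One way to picture the stack combining requests: slide 13 says higher levels have the right to override lower ones, so a plausible rule is "first level with an opinion wins". The exact combination rule below is an assumption for illustration, not the documented Vertigo behavior:

```python
def combine_requests(requests):
    """Combine per-policy performance requests into one global level.
    `requests` is ordered top level first; a policy that submits None
    delegates the decision to the levels below it (assumed semantics)."""
    for level in requests:
        if level is not None:
            return level
    return 1.0  # no policy has an opinion: stay safe at peak performance

# The top level demands full speed for an interactive episode, overriding
# the bottom level's utilization-based 40% estimate:
print(combine_requests([1.0, None, 0.4]))   # 1.0
print(combine_requests([None, None, 0.4]))  # 0.4
```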

  23. Evaluation • Sony Vaio PCG-CIVN notebook computer • Transmeta Crusoe 5600 processor running at 300 MHz to 600 MHz in 100 MHz steps. • This particular model was used because it has its own power management done at the firmware level, which can be turned on or off; this made it useful for evaluating Vertigo. • LongRun is the power management system of the Crusoe. • The OS used was Mandrake 7.2 with a modified Linux 2.4.4-ac18 kernel.

  24. Table 3: Application-level statistics about the plaympeg benchmark

  25. Figure 8: Fraction of the time spent in different interactive workloads

  26. Conclusion • Plus • The idea for using more than one performance setting algorithm in a policy stack • The Linux kernel itself wasn’t modified. • The applications need not be energy-aware • The system showed good results in most situations • Minus • It would have been interesting to see the actual energy savings. • The amount of energy saved by optimising the energy consumption of the processor alone may not be significant • The applications need to be energy-aware to make use of the middle level of the performance stack. • The ideas used in Vertigo can be used along with other power management methods for overall improvement.

  27. Part 2 Cooperative I/O- A Novel I/O Semantics for Energy Aware Applications

  28. Cooperative I/O- A Novel I/O Semantics for Energy Aware Applications[2] • Integrates the application layer into the dynamic power management of devices. • Energy-aware applications make use of the operating system’s new interface to do cooperative I/O. • The basic idea is for the OS to batch deferrable requests in order to create long idle periods during which switching to a low-power mode pays off. Ex: the auto-save function of a text editor. • The proposed model consists of three major parts • New cooperative file operations that have time-out and cancel flags as additional parameters. • The update policy of the OS is re-designed to save energy. • The OS controls the hard disk modes by using a simple algorithm called “Device-dependent time-out” or DDT.

  29. Components of Coop-I/O

  30. Low power modes(1) • The OS may reduce the power consumption of a device by switching between the device’s operation modes. • The ATA standard defines 4 operation modes for the hard disk • Active mode: The hard disk is reading, writing, seeking or spinning • Idle mode: The disk is rotating and the interface is active, but the head is in the parking position. • Standby mode: The disk spindle motor is off, but the interface is still alive • Sleep mode: Both the spindle motor and the interface are off. To reactivate it, the reset command is needed. • The device is “shut down” if it is in the Standby or the Sleep mode.

  31. Low power modes(2) • A non-negligible amount of time and power is required to enter and leave these modes. • The break-even time is defined as the minimum interval between 2 disk operations for which the switching pays off. • The break-even time depends on the characteristics of the disk. • Standby time: if an interval of idleness is longer than the break-even time, it is called a standby period.
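The break-even time follows from an energy balance: staying active for an interval t costs P_active · t, while shutting down costs the two transition energies plus P_standby · t. Setting the two equal and solving for t gives the sketch below; parameter names and example values are illustrative, not taken from a datasheet:

```python
def break_even_time(p_active, p_standby, e_down, e_up):
    """Minimum idle interval (seconds) for which shutting the disk down
    saves energy, from p_active * t = e_down + e_up + p_standby * t:
        t_be = (e_down + e_up) / (p_active - p_standby)
    p_active, p_standby in watts; e_down, e_up (spin-down/spin-up
    transition energies) in joules."""
    return (e_down + e_up) / (p_active - p_standby)

# e.g. 2 W while spinning idle, 0.2 W in standby, 3 J to spin down,
# 6 J to spin back up:
print(break_even_time(2.0, 0.2, 3.0, 6.0))  # 5.0 seconds
```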

  32. Cooperative File Operations(1) • These operations provide a way for applications to exert influence on the way the hard disk operates. • Three cooperative variants to the commonly used file operations: • open_coop() • read_coop() • write_coop() • In addition to the standard parameters, these operations have 2 additional parameters, the time-out and the cancel flag. • The time-out value specifies when the operation is initiated at the latest, and the cancel flag specifies whether or not the particular operation can be aborted. • In the case of legacy applications, their open(), read() and write() calls map onto the corresponding cooperative variants, but with the time-out set to zero and the cancel flag inactive.

  33. Cooperative Read operation • Copies of the disk blocks are kept in a cache in main memory, called the block buffers, to reduce the number of slow disk operations. • Modified block buffers are marked dirty, and the “dirty buffer life span” determines the time when these buffers must be written back to the disk. • Cooperative read: • Checks to see if the block is in the cache; if yes, retrieves it. • If not, checks if the disk is in the Active or Idle mode; if yes, then the block is retrieved immediately. • Else, it waits until the disk spins again or the time-out has elapsed. • If the time-out has elapsed, it checks the cancel flag to determine whether to restart the disk or abort the read operation.
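The decision sequence above can be sketched as plain Python. Everything here is a hypothetical user-space model of the slide's steps: `FakeDisk`, its method names, and the control flow are invented for illustration, and the real logic lives inside the kernel's buffer cache and IDE driver:

```python
class FakeDisk:
    """Illustrative stand-in for the drive's state machine."""
    def __init__(self, mode):
        self.mode = mode              # "active", "idle", "standby", "sleep"
    def read(self, block):
        return f"data-{block}"
    def spin_up(self):
        self.mode = "active"
    def wait_until_active(self, timeout):
        return False                  # no other request spun the disk up

def read_coop(block, cache, disk, timeout, cancel):
    """Sketch of the cooperative read decision on slide 33."""
    if block in cache:                    # 1. serve from the block buffers
        return cache[block]
    if disk.mode in ("active", "idle"):   # 2. disk spinning: read now
        return disk.read(block)
    if disk.wait_until_active(timeout):   # 3. wait until the disk spins
        return disk.read(block)           #    again, or the time-out elapses
    if cancel:                            # 4. time-out elapsed and the
        return None                       #    operation is abortable: drop it
    disk.spin_up()                        # 5. otherwise restart the disk
    return disk.read(block)

# An abortable read of an uncached block on a standby disk is aborted:
print(read_coop(7, {}, FakeDisk("standby"), timeout=0.5, cancel=True))  # None
```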

  34. Cooperative Write operation • If the block is cached, the update task defers the write operation until the dirty buffer life span has elapsed. • If the block isn’t cached, then the write may induce a read operation, so that the block can be read into main memory before it can be modified. • The situation gets more complicated when the write operation needs to read an uncached block after several modifications to the cached block. • The problem is overcome using the early commit/abort strategy.

  35. Early Commit/Abort Strategy(1) • Decides to commit or abort as soon as the first modification to a block buffer is going to take place. • Assume we want to modify the buffer and the drive is in standby mode; then we can distinguish 3 situations: • The drive is activated by another request before the time-out for our request is reached. • If at the end of our time-out there are no other dirty buffers for the drive, depending on the cancel flag we can either commit or abort the write operation. • If there is another dirty buffer for the drive, we wait until the old request’s time-out elapses and the drive is started. • A write to a block is delayed only as long as the drive is in standby mode and there are no dirty buffers for that drive.

  36. Early Commit/Abort Strategy(2) Therefore, since a write operation’s first buffer modification commits the operation, a write can be committed even if the disk is not running. Also, an abortable write will not be aborted after the time-out even if the disk is in standby mode. This is the case when a read follows a committed write: • The disk is in standby and there exist dirty buffers for the disk. • The write operation has to modify several disk blocks. • The write operation is committed after the first modification. • If a subsequent block is not cached, it has to be read from the disk. • This action is deferred until the time-out has elapsed. • The disk must then be spun up to read the block, as the complete operation is already committed.

  37. Cooperative Open Operation • An open operation can cause a read or a write operation if the file has to be created or truncated. • For open_coop(), the read and write induced by it are mapped to their corresponding cooperative counterparts. Usage of the cooperative operations: • Since the coop operations can be delayed, the I/O must be handled by a separate thread in the program, which caches data for the main thread. • The main thread thus reads and writes data to the buffer as if there were no cooperative I/O.

  38. Cooperative Updates • Cooperative Update tries to optimize the caching mechanism with respect to power consumption. • The update process usually takes place when one of the following conditions is met • An explicit sync() command forces a write-back of the buffers to disk • The dirty buffer life span has elapsed • A certain percentage of block buffers are dirty • The system needs main memory and has to do a write-back to reclaim memory • The problem with the above policy is that 2 updates can be spaced by a time period that is shorter than the break-even time. • Solution: Try to batch write requests so that the device can spend more time in low power modes. • The policy is called “Drive specific cooperative updates”.

  39. Drive specific cooperative updates It consists of 4 strategies: • Write back all buffers instead of only the oldest ones • Update cooperatively: The OS tries to combine other disk accesses, such as read requests, with writing back dirty buffers whose “dirty buffer life span” hasn’t yet elapsed. • Update each drive separately: It may help to increase the update interval for a single drive and also balance the I/O load, yet at the same time it will not compromise file system consistency. • But this requires maintaining the oldest dirty buffer for each individual drive, along with a list of all the dirty buffers that belong to that drive. • Update on shutdown: If the drive is about to shut down, write back all the dirty buffers belonging to that drive.

  40. Device dependent time-out(DDT) policy The policy states: shut down the hard disk if t_la + t_be <= t, where t is the current time, t_la is the time of the last hard disk access, and t_be is the break-even time. Thus the disk driver has to keep track of t_la and t_be.
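The DDT rule is a one-line predicate: shut down once the disk has been idle longer than its break-even time. A minimal sketch (function and parameter names are illustrative):

```python
def should_shut_down(t, t_la, t_be):
    """Device-dependent time-out (DDT) policy: shut the disk down once
    the time since the last access (t - t_la) exceeds the break-even
    time t_be, i.e. when t_la + t_be <= t."""
    return t_la + t_be <= t

# Last access at t=5 s, break-even time 5 s:
print(should_shut_down(t=12.0, t_la=5.0, t_be=5.0))  # True  (idle for 7 s)
print(should_shut_down(t=8.0,  t_la=5.0, t_be=5.0))  # False (idle for 3 s)
```

In the prototype this check runs periodically from a timer (once per second, per slide 41), rather than on every access.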

  41. Prototype implementation • They have implemented the Coop-I/O concept in the Linux kernel, version 2.4.10. • They have also made the following modifications • Cooperative file operations: In addition to the already mentioned file operations, a function wait_for_drive() is provided to block them while the drive is not active. • The VFS and Ext2 file system have been modified to support the drive-specific cooperative update policy: device numbers are mapped to drives, and the need for cooperative updates is now checked every time one reads from or writes to a drive. • The IDE driver has been enhanced with power mode control for disk drives, which includes the DDT algorithm. The DDT algorithm is implemented as a timer-based function that is called every second.

  42. Experiments and Results • Experiments were run on a • personal computer running the Linux kernel • with an IBM Travelstar 15GN hard disk. • Power measurements were taken using 4 analog-to-digital converters • Testing a cooperative audio player • the application was modified to make it energy-aware • Two audio files were played, with the same length but different compression ratios. • The two audio files were used both alone and accompanied by a mail reader. • All the tests were run for 534 seconds.

  43. The system was tested under 4 strategies: • Cooperative: Use DDT and the new buffer cache and update mechanism. read_coop() has a delay time equivalent to the play time of one semi-buffer. • ECU (Energy-aware Caching and Update): Same as above, but instead of the read_coop() function a standard read() was used. • DDT only: Use DDT with the original buffer cache and update mechanism • None: Do not use any power saving mechanism at all • Uncooperative Oracle policy: Values for this are derived from running the unmodified Linux. This represents the theoretical lower bound.

  44. Figure 4 & Table 1- compares the 4 strategies

  45. Figure 5 - compares the 4 energy saving strategies (playing “Toccata” without mail)

  46. Figure 6- compares interaction between 2 independent tasks, “Toccata” and Mail cooperative.

  47. Reads and Writes with varying average period length • Multiple tasks periodically read or write data • The energy consumption was measured over 1000 seconds • The read and write processes have varying period lengths and idle times

  48. Fig 9: Varying the number of cooperative processes The energy consumption steadily declines with the increasing proportion of cooperative processes.
