
FAWN: A Fast Array of Wimpy Nodes, Power-efficient Clusters


Presentation Transcript


  1. CSCI 620 – Advanced Computer Architecture FAWN: A Fast Array of Wimpy Nodes, Power-efficient Clusters Tilini Senevirathna Project 2

  2. Outline • Introduction • Data-intensive Computing • A Fast Array of Wimpy Nodes • Why FAWN? • Alternatives: When FAWN?

  3. Outline • Introduction • Data-intensive Computing • A Fast Array of Wimpy Nodes • Why FAWN? • Alternatives: When FAWN?

  4. Introduction • Power is becoming an increasingly large financial and scaling burden for computing. • By 2012, 3-year data center energy costs will be double the expenditure on the server equipment itself. • While power constraints have pushed the processor industry toward multi-core architectures, power-efficient alternatives to traditional disk- and DRAM-based cluster architectures have been slow to emerge. • As a power-efficient alternative for data-intensive computing, a cluster architecture called a Fast Array of Wimpy Nodes (FAWN) is proposed.

  5. Introduction • A FAWN consists of a large number of slower but efficient nodes that each draw only a few watts of power, coupled with low-power storage. • Prototype FAWN nodes are built from 500 MHz embedded devices with CompactFlash storage that are typically used as wireless routers, Internet gateways, or thin clients. • In queries per joule, FAWN can be up to six times more efficient than traditional systems with Flash storage for seek-bound applications, and between two and eight times more efficient for I/O throughput-bound applications.
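The headline efficiency metric here, queries per joule, is simply sustained query throughput divided by average power draw. A minimal sketch in Python; the node figures below are illustrative assumptions, not measurements from the paper:

    # Queries per joule = sustained queries/sec divided by average watts
    # (1 watt = 1 joule per second). The node figures below are illustrative
    # placeholders, not measurements from the FAWN paper.

    def queries_per_joule(queries_per_sec: float, avg_watts: float) -> float:
        return queries_per_sec / avg_watts

    wimpy = queries_per_joule(queries_per_sec=350, avg_watts=4)     # ~87 queries/J
    server = queries_per_joule(queries_per_sec=1000, avg_watts=80)  # ~13 queries/J
    print(f"wimpy node: {wimpy:.1f} q/J, traditional server: {server:.1f} q/J")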

  6. Introduction

  7. Outline • Introduction • Data-intensive Computing • A Fast Array of Wimpy Nodes • Why FAWN? • Alternatives: When FAWN?

  8. Data-intensive Computing • Data-intensive workloads are often I/O-bound, and can be broadly classified into two forms: • seek-bound • scan-bound

  9. Seek-bound workloads • Read-intensive workloads with random access patterns for small objects from a large corpus of data. • Poorly suited to conventional disk-based architectures, where magnetic hard disks limit performance. • Ex: random accesses to small blocks of data on a magnetic disk sustain only 200-300 requests per second per disk. • Existing and emerging Internet applications are forced to create and maintain large cluster-based memory caches. • Ex: Facebook, LiveJournal, Amazon, LinkedIn
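A back-of-envelope sizing exercise shows why such workloads strain disk-based clusters. The sketch below uses the 200-300 requests/sec/disk range from this slide; the target query rate is a hypothetical example, not a number from the paper:

    # Rough number of magnetic disks needed just to satisfy a random-read
    # (seek-bound) workload. seeks_per_disk follows the 200-300 req/sec/disk
    # range quoted above; target_qps is a hypothetical example.
    import math

    def disks_needed(target_qps: int, seeks_per_disk: int = 250) -> int:
        return math.ceil(target_qps / seeks_per_disk)

    print(disks_needed(100_000))  # -> 400 disks, regardless of how small the dataset is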

  10. Scan-bound workloads • These data-intensive workloads are exemplified by large-scale data analysis. • Analysis of large, unstructured datasets is becoming increasingly important in log-file analysis, data mining, and large-data applications such as machine learning. • These workloads are, at first glance, well suited to platter-based disks, which provide fast sequential I/O. • In many cases, however, the I/O capability provided by a typical drive or small RAID array is insufficient to saturate a modern high-speed, high-power CPU. As a result, performance is limited by the speed at which the storage system can deliver data to the processors.

  11. Research Question • Can we build a cost-effective cluster for data-intensive workloads that uses less than a tenth of the power required by a conventional architecture, but that still meets the same capacity, availability, throughput, and latency requirements?

  12. Outline • Introduction • Data-intensive Computing • A Fast Array of Wimpy Nodes • Why FAWN? • Alternatives: When FAWN?

  13. A Fast Array of Wimpy Nodes • Uses a large number of “wimpy” nodes that act as data storage/retrieval nodes. These nodes use energy-efficient low-power processors combined with low-power storage and a small amount of DRAM. • FAWN creates a well-matched system architecture around flash: each node can use the full capacity of the flash without memory or bus bottlenecks, but does not waste excess power. • FAWN-KV is designed specifically with the FAWN hardware in mind, and is able to exploit the advantages and avoid the limitations of wimpy nodes with flash memory for storage. • Small writes on flash are very expensive: updating even a single byte of data is as expensive as rewriting an entire block of pages.
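Because small random writes are so costly on flash, FAWN-DS (described on the next slide) is log-structured: every update is a sequential append, and an in-memory index records where the latest value for each key lives. The sketch below illustrates that idea only; the names and on-disk layout are mine, not the FAWN-DS implementation:

    # Minimal log-structured key-value store in the spirit of FAWN-DS:
    # puts are sequential appends (cheap on flash); gets are one random read.
    # This is an illustration, not the actual FAWN-DS code or file format.
    import os
    from typing import Optional

    class AppendLogStore:
        def __init__(self, path: str):
            self.log = open(path, "a+b")
            self.index = {}  # key -> (offset, length) of the newest value

        def put(self, key: bytes, value: bytes) -> None:
            self.log.seek(0, os.SEEK_END)
            offset = self.log.tell()
            self.log.write(value)              # sequential write only
            self.log.flush()
            self.index[key] = (offset, len(value))

        def get(self, key: bytes) -> Optional[bytes]:
            if key not in self.index:
                return None
            offset, length = self.index[key]
            self.log.seek(offset)              # single random read per lookup
            return self.log.read(length)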

  14. A Fast Array of Wimpy Nodes

  15. A Fast Array of Wimpy Nodes • Client requests enter the system at one of several front-ends. • The front-end nodes forward the request to the back-end FAWN-KV node responsible for serving that particular key. • The back-end node serves the request from its FAWN-DS data-store and returns the result to the front-end. • FAWN-DS is a log-structured key-value store: all writes to the data-store are sequential, and reads require a single random access. • The large number of back-end FAWN-KV storage nodes are organized into a ring using consistent hashing. • To balance load and reduce failover times, each physical node joins the ring as a small number (V) of virtual nodes, each representing a virtual ID (“VID”) in the ring space. • FAWN-KV is a cluster-wide key-value lookup system that also provides caching, replication, and consistency.
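A small sketch of the ring-based routing just described: each back-end joins the consistent-hash ring as several virtual IDs (VIDs), and a front-end forwards a key to the first VID clockwise from the key's hash. The hash function, the number of VIDs per node, and the node names below are illustrative assumptions, not details from the paper:

    # Consistent hashing with virtual IDs, used to assign keys to back-end
    # FAWN-KV nodes. Hash choice, V, and node names are assumptions.
    import bisect
    import hashlib

    def ring_hash(s: str) -> int:
        return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")

    class Ring:
        def __init__(self, nodes, vids_per_node: int = 4):
            self.points = sorted(
                (ring_hash(f"{node}#vid{i}"), node)
                for node in nodes
                for i in range(vids_per_node)
            )
            self.hashes = [h for h, _ in self.points]

        def owner(self, key: str) -> str:
            i = bisect.bisect_right(self.hashes, ring_hash(key)) % len(self.points)
            return self.points[i][1]

    ring = Ring(["backend-a", "backend-b", "backend-c"])
    print(ring.owner("user:42"))  # the front-end forwards the request to this node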

  16. Outline • Introduction • Data-intensive Computing • A Fast Array of Wimpy Nodes • Why FAWN? • Alternatives: When FAWN?

  17. Why FAWN? • The Increasing CPU-I/O Gap • For data-intensive computing workloads, storage, network, and memory bandwidth bottlenecks often cause low CPU utilization. • To efficiently run I/O-bound data-intensive, computationally simple applications, FAWN uses wimpy processors selected to reduce I/O-induced idle cycles while maintaining high performance. The reduced processor speed then benefits from a second trend:

  18. Why FAWN? • CPU power consumption grows faster than speed. • Branch prediction, speculative execution, and increasing the amount of on-chip caching all require additional processor die area. • Modern processors dedicate as much as half their die to L2/L3 caches. • These techniques do not increase the speed of basic computations, but do increase power consumption, making faster CPUs less energy efficient. • A FAWN cluster’s slower CPUs dedicate more transistors to basic operations. These CPUs execute significantly more instructions per joule (or instructions/sec per Watt) than their faster counterparts.

  19. Why FAWN? • Dynamic power scaling on traditional systems is surprisingly ineffective. • A primary energy-saving benefit of DVFS was its ability to reduce voltage as it reduced frequency, but modern CPUs already operate near the voltage floor. • Reducing peak power is more effective than dynamic power scaling for several reasons. • Despite improvements to power scaling technology, systems remain most power-efficient when operating at peak power.

  20. Outline • Introduction • Data-intensive Computing • A Fast Array of Wimpy Nodes • Why FAWN? • Alternatives: When FAWN?

  21. When FAWN? • Estimate the three-year total cost of ownership (TCO) for a cluster serving a seek-bound workload. • The 3-year TCO is defined as the sum of the capital cost and the 3-year power cost at 10 cents per kWh.
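The TCO definition on this slide is simple enough to compute directly. In the sketch below, the node prices and power draws are placeholder values for illustration, not the figures used in the paper's comparison:

    # 3-year TCO = capital cost + 3-year energy cost at $0.10/kWh (slide 21).
    # The example prices and wattages are illustrative placeholders,
    # not the paper's numbers.
    HOURS_3Y = 3 * 365 * 24      # 26,280 hours
    PRICE_PER_KWH = 0.10         # dollars

    def tco_3yr(capital_cost: float, avg_watts: float) -> float:
        energy_kwh = avg_watts / 1000 * HOURS_3Y
        return capital_cost + energy_kwh * PRICE_PER_KWH

    print(tco_3yr(capital_cost=250, avg_watts=5))     # wimpy node:  ~$263
    print(tco_3yr(capital_cost=2000, avg_watts=250))  # big server: ~$2,657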

  22. When FAWN?

  23. When FAWN?

  24. References • FAWN: A Fast Array of Wimpy Nodes - David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan - Carnegie Mellon University, Intel Labs • FAWNdamentally Power-efficient Clusters - Vijay Vasudevan, Jason Franklin, David G. Andersen, Amar Phanishayee, Lawrence Tan, Michael Kaminsky, Iulian Moraru - Carnegie Mellon University and Intel Research, Pittsburgh

  25. Questions?
