
Metrics



  1. Metrics
  • FLOPS (FLoating-point Operations Per Second): a measure of a CPU's numerical processing rate, which can indicate its scientific computing capability.
  • The floating-point format is a variation of scientific notation: a real number is represented by a mantissa, a base and an exponent.
  • Storing real numbers in computers:
    • A fixed-length word (e.g. 64 bits) is used as the storage space for each real number.
    • The mantissa is normalised (1.61 is normalised, 16.1 is not).
    • The mantissa and exponent are converted to base 2.
    • One bit of the word stores the sign, some bits store the mantissa, and the rest store the exponent.
  • Advantages and disadvantages:
    • A fixed-length space can store values over a wide overall range.
    • If 64 bits are used, with 11 bits for the exponent, 52 bits for the mantissa and the remaining 1 bit for the sign, we can derive the range of numbers this layout can represent.
    • More bits for the mantissa: higher precision, but a smaller range.
    • More bits for the exponent: a wider range, but lower precision.
    • The gap between two successive representable numbers is not uniform.
    • Numbers that cannot be converted exactly to base 2 must be rounded to fit the format, so some algebraic rules (such as the associativity of addition) do not always hold.
  • The LINPACK benchmark produces a FLOPS result: it solves a dense system of linear equations by Gaussian elimination.
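The range derivation mentioned above can be sketched in Python. This is a minimal sketch assuming the IEEE 754 double-precision conventions (an exponent bias of 1023, with the all-zeros and all-ones exponent codes reserved for special values), which the slide does not spell out. It also shows the non-uniform spacing between neighbouring values and a rounding effect that breaks associativity.

```python
import math
import sys

# Layout from the slide: 1 sign bit, 11 exponent bits, 52 mantissa bits.
EXP_BITS = 11
MANT_BITS = 52

# Assumption (IEEE 754): exponents are stored with a bias, and the all-zeros and
# all-ones exponent codes are reserved (zero/subnormals and infinity/NaN).
bias = 2 ** (EXP_BITS - 1) - 1        # 1023
max_exp = (2 ** EXP_BITS - 2) - bias  # 1023, largest exponent of a normal number
min_exp = 1 - bias                    # -1022, smallest exponent of a normal number

# Largest value: mantissa 1.11...1 (52 ones after the point) times 2**max_exp.
largest = (2 - 2.0 ** -MANT_BITS) * 2.0 ** max_exp
# Smallest positive normalised value: mantissa 1.00...0 times 2**min_exp.
smallest_normal = 2.0 ** min_exp

print(largest)          # 1.7976931348623157e+308
print(smallest_normal)  # 2.2250738585072014e-308
print(largest == sys.float_info.max, smallest_normal == sys.float_info.min)  # True True

# Spacing between neighbouring values grows with magnitude (needs Python 3.9+).
print(math.ulp(1.0), math.ulp(1e16))  # 2.220446049250313e-16 2.0
print(1e16 + 1.0 == 1e16)             # True: the 1.0 is rounded away

# Rounding means addition is not always associative.
print((0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3))  # 0.6000000000000001 0.6
```

Python's `float` is an IEEE 754 double on common platforms, which is why the derived limits can be compared directly with `sys.float_info`.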

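  2. Example of Floating-Point Numbers
  • 172.625 (base 10)
  • 10101100.101 × 2^0 (base 2)
  • 1.0101100101 × 2^7 (base 2, normalised)
  • Using 32 bits (4 bytes) to store the number, with 1 bit for the sign, 8 bits for the exponent and the remaining 23 bits for the mantissa (IEEE 754 single precision):
    • The exponent is stored with a bias of 127: 7 + 127 = 134 = 10000110
    • The leading 1 of the normalised mantissa is implicit, so only the bits after the binary point are stored, left-aligned: 0101100101 followed by 13 zeros
    S  Exp       Mantissa
    0  10000110  01011001010000000000000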
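The bit pattern for 172.625 can be checked with Python's standard `struct` module, which packs a value in IEEE 754 binary32 format (biased exponent, implicit leading 1). The helper name `float32_fields` is hypothetical, chosen for this sketch.

```python
import struct


def float32_fields(x: float) -> tuple[str, str, str]:
    """Return the sign, exponent and mantissa bit strings of x stored as IEEE 754 binary32."""
    # Pack as a big-endian 32-bit float, then reinterpret the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    b = f"{bits:032b}"
    return b[0], b[1:9], b[9:]


sign, exponent, mantissa = float32_fields(172.625)
print(sign, exponent, mantissa)  # 0 10000110 01011001010000000000000
print(int(exponent, 2) - 127)    # 7: the stored exponent minus the bias of 127
```

Subtracting the bias recovers the exponent 7 from the normalised form 1.0101100101 × 2^7.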

  3. Metrics
  • MIPS (Millions of Instructions Per Second): a measure of the speed of a processor.
  • Peak MIPS rates (usually vendor-supplied) can be misleading, because the amount of work done per instruction differs between instruction sets.
  • Hence the joke expansion "Meaningless Information on Performance for Salespeople".
  • Rarely quoted today.
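MIPS is the number of instructions executed divided by the execution time in seconds × 10^6. The sketch below uses made-up instruction counts and times for two hypothetical processors running the same program, to show why a higher MIPS rating need not mean a faster machine.

```python
def mips(instruction_count: int, seconds: float) -> float:
    """MIPS = instructions executed / (execution time in seconds * 10**6)."""
    return instruction_count / (seconds * 1e6)


# Hypothetical numbers: the same program compiled for two instruction sets.
a_mips = mips(10_000_000_000, 5.0)  # machine A: many simple instructions, 5 s
b_mips = mips(4_000_000_000, 4.0)   # machine B: fewer, more complex instructions, 4 s
print(a_mips, b_mips)  # 2000.0 1000.0
```

Machine A has twice the MIPS rating, yet machine B finishes the program a second sooner.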

  4. Metrics
  • SPECint: measures a processor's integer processing capability.
  • The latest version at the time of these slides is SPECint2006.
  • Exercises the CPU, memory and compiler, but not networking or I/O.
  • Consists of a series of benchmarks (12, including compression and compilation).
  • Each benchmark has a reference time.
  • Dividing the reference time by the measured runtime of the benchmark and multiplying by 100 gives a base ratio.
  • For example, suppose the benchmark 401.bzip2 has a reference time of 1400 s and its actual runtime on the system under test is 140 s; the base ratio is then 1400 / 140 × 100 = 1000.
  • The ratios are averaged (using the geometric mean) to produce a final performance figure for the processor.
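The ratio and averaging steps can be sketched as follows. SPEC combines the per-benchmark ratios with a geometric mean; the three ratios in the second part are made-up values, and the `scale` parameter is only there to reproduce the ×100 factor used in the worked example.

```python
import math


def spec_ratio(reference_time: float, run_time: float, scale: float = 1.0) -> float:
    """Reference time divided by measured run time, optionally multiplied by a scale factor."""
    return scale * reference_time / run_time


def geometric_mean(values: list[float]) -> float:
    # Average the logarithms so that no single benchmark dominates the result.
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(spec_ratio(1400, 140, scale=100))  # 1000.0 (the worked example)

# Hypothetical ratios for three benchmarks.
ratios = [10.0, 20.0, 40.0]
print(geometric_mean(ratios))  # approximately 20
```

A geometric mean means that doubling any one benchmark's ratio changes the overall figure by the same factor, whichever benchmark it is.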

  5. SPECint2006 benchmark suite

  6. Metrics
  • Communication:
    • Bandwidth (bytes/sec): how much data can be sent per second over the network
    • Latency (seconds): the time between one processor sending a message and the other processor receiving it
    • Interconnection type: on-board interconnection or over networks
    • Topologies: bus, crossbar, hub, switch
    • Protocols: protocol stacks; unicast, multicast, broadcast
  • Storage capabilities:
    • Storage facilities: registers, cache, memory, hard disk
    • Bandwidth and latency:
      • Bandwidth: how much data can be accessed per second from a given storage facility
      • Latency: the time between issuing a data access request and receiving the requested data
    • Memory hierarchy (CPU registers -> cache -> main memory -> remote memory)
    • Local and remote file systems
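Bandwidth and latency combine in a common first-order model: transfer time = latency + message size / bandwidth. The sketch below assumes that model and a hypothetical link with 50 µs latency and 100 MB/s bandwidth; real networks add effects (protocol overhead, contention) that it ignores.

```python
def transfer_time(message_bytes: int, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """First-order model: startup latency plus the time to push the bytes through the link."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


# Hypothetical link parameters.
LAT = 50e-6   # 50 microseconds
BW = 100e6    # 100 MB/s

for size in (8, 1_000, 1_000_000):
    t = transfer_time(size, LAT, BW)
    # Effective bandwidth is far below the link bandwidth for small messages.
    print(f"{size:>9} bytes: {t * 1e6:10.2f} us, effective {size / t / 1e6:8.3f} MB/s")
```

Small messages are dominated by latency and large ones by bandwidth, which is why both figures are quoted for an interconnect or storage device.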

  7. Top500 Supercomputer list
  • Website: www.top500.org
  • The Top500 project started in 1993 and is updated twice a year
  • Aims to track trends in HPC
  • Uses LINPACK to measure performance (FLOPS)
    • Essentially, LINPACK solves a dense system of linear equations Ax = b (commonly encountered in engineering)
    • Users may change the problem size to obtain maximum performance, which is used to rank the supercomputers
    • Theoretical peak performance is also given for reference
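A LINPACK-style measurement can be sketched in pure Python: solve a random dense system Ax = b by Gaussian elimination with partial pivoting, time it, and divide an operation count by the elapsed time. This is not the actual LINPACK or HPL code; the matrix size, the random test matrix and the approximate operation count (about 2/3 n^3 for the elimination plus about 2 n^2 for the triangular solves) are assumptions for illustration, and interpreted Python will report a rate far below the hardware's capability.

```python
import random
import time


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve Ax = b by Gaussian elimination with partial pivoting (a and b are modified)."""
    n = len(a)
    for k in range(n):
        # Partial pivoting: swap in the row with the largest |a[i][k]|.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k below the diagonal.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            row_i, row_k = a[i], a[k]
            for j in range(k, n):
                row_i[j] -= m * row_k[j]
            b[i] -= m * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


n = 200
rng = random.Random(0)
A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
x_true = [1.0] * n
b = [sum(row[j] * x_true[j] for j in range(n)) for row in A]

start = time.perf_counter()
x = solve([row[:] for row in A], b[:])
elapsed = time.perf_counter() - start

# Approximate operation count: ~2/3 n^3 for elimination plus ~2 n^2 for the solves.
flops = 2 / 3 * n**3 + 2 * n**2
print(f"max error {max(abs(xi - 1.0) for xi in x):.2e}, {flops / elapsed / 1e6:.1f} MFLOPS")
```

Because the cubic term dominates, larger problem sizes give the hardware more work per unit of overhead, which is why Top500 submitters are allowed to choose n.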

  8. Top500 Supercomputer list
  • Tends to represent parallel computers, so distributed systems such as SETI@Home are not included
  • Does not consider storage or I/O
  • Both custom-designed machines and commodity machines win places in the list
    • General trend towards commodity machines (COTS: Commodity Off-The-Shelf); BlueGene/L, however, is not a COTS machine
  • Connecting a large number of relatively low-performance machines is more rewarding than connecting a small number of high-performance machines
  • Read the paper "A note on the Zipf distribution of Top500 supercomputers" (download from my homepage)
  • Performance doubles each year, faster than Moore's Law
    • Moore's Law (in its popular form): performance doubles approximately every 18 months
  • Dominated by the United States (location map of the Top100 machines: http://www.top500.org/lists/2006/11/top100map)
  • UK supercomputers in the list:
    • Cambridge: No. 20 (http://www.top500.org/system/8267)
    • AWE: No. 15
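To see how quickly the two growth rates diverge, a short calculation helps. The doubling periods (one year for the Top500 trend, 18 months for the popular form of Moore's Law) come from the slide; the time spans are arbitrary choices.

```python
def growth(years: float, doubling_period_years: float) -> float:
    """Performance multiple after `years` if performance doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


for years in (1.5, 3, 6, 10):
    yearly = growth(years, 1.0)
    moore = growth(years, 1.5)
    print(f"{years:>4} years: x{yearly:8.1f} (yearly doubling) vs x{moore:6.1f} (every 18 months)")
```

After three years the yearly trend gives an 8× gain against 4× for an 18-month doubling, and the gap widens exponentially from there.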

  9. Top Machine
  • BlueGene/L
    • The first supercomputer in the Blue Gene project
    • A specialised system based on the PowerPC architecture
    • Individual PowerPC 440 processors at 700 MHz
    • Two processors reside on a single chip
    • Two chips reside on a "compute card" with 512 MB of memory
    • 16 of these compute cards are placed on a node board
    • 32 node boards fit into one cabinet, and there are 64 cabinets
    • 131,072 CPUs (2 × 2 × 16 × 32 × 64) with a theoretical peak of about 367 TFLOPS (the often-quoted 183.5 TFLOPS corresponds to the earlier 32-cabinet, 65,536-CPU configuration)
    • Multiple network topologies are available and can be selected depending on the application
  • High density of processors in a small area:
    • Low-power and (comparatively) slow processors, just lots of them!
    • Fast, low-latency interconnects
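The processor count and theoretical peak follow from multiplying out the component counts. The component counts are from the slide; the figure of 4 floating-point operations per cycle per core (two fused multiply-adds on each core's double floating-point unit) is an assumption stated here for the peak calculation.

```python
# Component counts from the slide.
CPUS_PER_CHIP = 2
CHIPS_PER_CARD = 2
CARDS_PER_BOARD = 16
BOARDS_PER_CABINET = 32
CABINETS = 64

cpus = CPUS_PER_CHIP * CHIPS_PER_CARD * CARDS_PER_BOARD * BOARDS_PER_CABINET * CABINETS
print(cpus)  # 131072

# Assumption: each 700 MHz core completes 4 floating-point operations per cycle.
CLOCK_HZ = 700e6
FLOPS_PER_CYCLE = 4

peak_tflops = cpus * CLOCK_HZ * FLOPS_PER_CYCLE / 1e12
print(f"{peak_tflops:.1f} TFLOPS")      # 367.0 TFLOPS
# Half the cabinets (32 instead of 64) gives half the peak.
print(f"{peak_tflops / 2:.1f} TFLOPS")  # 183.5 TFLOPS
```

Theoretical peak is simply processors × clock rate × operations per cycle; the LINPACK result used for ranking is always lower.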
