
Above: The first magnetic core memory, from the IBM 405 Alphabetical Accounting Machine. This experimental system was t




Presentation Transcript


    1. 1

    2. 2 COMP 206: Computer Architecture and Implementation Montek Singh Thu, April 16, 2009 Topic: Main Memory (DRAM) Organization

    3. 3 Outline: Introduction; SRAM (briefly); DRAM; Organization; Challenges: Bandwidth, Granularity, Performance

    4. 4 Structure of SRAM Cell: control logic, with one memory cell per bit. Each cell consists of one or more transistors; it is not really a latch made of logic, but is shown as a logic equivalent.

    5. 5 Bit Slice: cells connected to form one bit position. Word Select gates one latch, driven from the address lines; note that it selects reads as well. B (and B-not) are set by R/W, Data In, and Bit Select.

    6. 6 Bit Slice Can Become a Module: basically, a bit slice is a x1 memory (next slide).

    7. 7 16 x 1 RAM: now shows the decoder.

    8. 8 Row/Column: if the RAM gets large, there is a large decoder. Impossibly large! It also runs into chip layout issues. Larger memories are usually 2D, in a matrix layout (next slide).

    9. 9 16 x 1 as 4 x 4 Array: two decoders, one for the row and one for the column. The address is just broken up; this is not visible from outside.
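
The row/column split on this slide can be sketched in a few lines; which address bits go to the row and which to the column is an assumption here, since the slide only says the address is broken up:

```python
# Split a 4-bit address into a 2-bit row and a 2-bit column index,
# as a 16 x 1 RAM organized as a 4 x 4 array would do internally.
# (Assumption: low bits select the column, high bits the row.)
def split_address(addr, col_bits=2):
    col = addr & ((1 << col_bits) - 1)   # low bits -> column decoder
    row = addr >> col_bits               # high bits -> row decoder
    return row, col

for a in (0, 5, 15):
    print(a, split_address(a))   # 0 -> (0, 0), 5 -> (1, 1), 15 -> (3, 3)
```

Each decoder now only handles 2 bits (4 outputs) instead of one 4-bit decoder with 16 outputs, which is the point of the 2D layout.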

    10. 10 Dynamic RAM: a capacitor can hold charge, and a transistor acts as a gate. No charge is a 0; charge can be added to store a 1, and then the switch is opened (disconnected). The cell can be read by closing the switch (explanation on the next slide).

    11. 11 Precharge and Sense Amps: you'll see a precharge time. B is precharged to V; charge or no charge on C will increase or decrease the voltage, and the sense amps detect this.

    12. 12 DRAM Characteristics: destructive read. When a cell is read, its charge is removed, so it must be restored after a read. Refresh: there is also steady leakage, so charge must be restored periodically.

    13. 13 DRAM Logical Diagram

    14. 14 DRAM Refresh: many strategies, with logic on the chip. Here, a row counter.

    15. 15 Timing: say the DRAM needs a refresh every 64 ms. Distributed refresh spreads the refreshes out evenly over the 64 ms; on a 4M x 4 DRAM with 4096 rows, that is one row refresh every 64 ms / 4096 = 15.6 µs. The total time spent is 0.25 ms, but spread out. Burst refresh takes the same 0.25 ms, but all at once, which may not be good in a computer system. Refresh takes 1% or less of total time.
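
The slide's refresh arithmetic can be checked directly; the ~60 ns per-row refresh time is an assumed figure, chosen to be consistent with the slide's 0.25 ms total:

```python
# Refresh arithmetic from the slide: 4M x 4 DRAM, 4096 rows,
# full refresh required every 64 ms.
refresh_period = 64e-3        # s, required refresh interval
rows = 4096
row_refresh_time = 60e-9      # s per row (assumed, not on the slide)

interval = refresh_period / rows        # spacing for distributed refresh
total = rows * row_refresh_time         # total time spent refreshing
overhead = total / refresh_period       # fraction of all time lost to refresh

print(f"{interval * 1e6:.1f} us between row refreshes")  # ~15.6 us
print(f"{total * 1e3:.2f} ms total refresh time")        # ~0.25 ms
print(f"{overhead:.2%} overhead")                        # well under 1%
```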

    16. 16 Summary: DRAM vs. SRAM
        DRAM (Dynamic RAM): used mostly in main memory. One capacitor + one transistor per bit. Needs a refresh every 4-8 ms (5% of total time). Read is destructive (hence the need for write-back). Access time < cycle time (because of writing back). Density advantage of (25-50):1 over SRAM. Address lines are multiplexed, since pins are scarce!
        SRAM (Static RAM): used mostly in caches (I, D, TLB, BTB). One flip-flop (4-6 transistors) per bit. Read is not destructive. Access time = cycle time. Speed advantage of (8-16):1 over DRAM. Address lines are not multiplexed, since high decoding speed is important.

    17. 17 Chip Organization: chip capacity (= number of data bits) tends to quadruple: 1K, 4K, 16K, 64K, 256K, 1M, 4M, and so on. In early designs, each data bit belonged to a different address (x1 organization). Starting with 1 Mbit chips, wider chips (4, 8, 16, 32 bits wide) began to appear. Advantage: higher bandwidth. Disadvantage: more pins, hence more expensive packaging.

    18. 18 Chip Organization Example: 64Mb DRAM

    19. 19 Memory Performance Characteristics. Latency (access time): the time interval between the instant at which the data is called for (READ) or requested to be stored (WRITE), and the instant at which it is delivered or completely stored. Cycle time: the time between the instant the memory is accessed and the instant at which it may be validly accessed again. Bandwidth (throughput): the rate at which data can be transferred to or from memory; the reciprocal of cycle time. Burst-mode bandwidth is of greatest interest. Cycle time > access time for conventional DRAM; cycle time < access time in burst mode, when a sequence of consecutive locations is read or written.
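
Bandwidth as the reciprocal of cycle time, scaled by the transfer width, can be illustrated with a quick sketch; the numbers here are illustrative, not from the slide:

```python
# Peak bandwidth = transfer width / cycle time.
cycle_time = 10e-9    # s per transfer (assumed)
width_bytes = 8       # bytes per transfer, i.e. a 64-bit path (assumed)

bandwidth = width_bytes / cycle_time   # bytes per second
print(f"{bandwidth / 1e9:.1f} GB/s")   # 0.8 GB/s
```

This is why burst mode matters: shrinking the effective cycle time for consecutive locations raises bandwidth even though the first-access latency is unchanged.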

    20. 20 Improving Performance. Latency can be reduced by reducing the access time of chips, or by using a cache (a cache trades latency for bandwidth). Bandwidth can be increased by using wider memory (more chips), more data pins per DRAM chip, or increased bandwidth per data pin.

    21. 21 Two Recent Problems. DRAM chip sizes are quadrupling every three years, while main memory sizes are doubling every three years. Thus, the main memory of the same kind of computer is being constructed from fewer and fewer DRAM chips. This results in two serious problems: diminishing main memory bandwidth, and increasing granularity of memory systems.

    22. 22 Increasing Granularity of Memory Systems. The granularity of a memory system is the minimum memory size, and also the minimum increment in the amount of memory, permitted by the memory system. Too large a granularity is undesirable: it increases the cost of the system and restricts its competitiveness. Granularity can be decreased by widening the DRAM chips, or by increasing the per-pin bandwidth of the DRAM chips.
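
A minimal sketch of how chip width sets granularity, assuming a 64-bit memory bus populated with 64 Mbit chips (both figures are assumptions, not from the slides):

```python
# Minimum memory increment for one bank of DRAM chips:
# enough chips to cover the bus width, each contributing its full capacity.
def granularity_bytes(bus_width_bits, chip_bits, chip_width_bits):
    chips_per_bank = bus_width_bits // chip_width_bits
    return chips_per_bank * chip_bits // 8

CHIP = 64 * 2**20  # 64 Mbit chip capacity (assumed)

# x1 chips: 64 chips needed -> 512 MB minimum increment
print(granularity_bytes(64, CHIP, 1) // 2**20, "MB")
# x16 chips: only 4 chips needed -> 32 MB minimum increment
print(granularity_bytes(64, CHIP, 16) // 2**20, "MB")
```

Widening the chips from x1 to x16 cuts the minimum increment by 16x, which is exactly the slide's argument for wider DRAMs.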

    23. 23 Granularity Example

    24. 24 Granularity Example (2)

    25. 25 Improving Memory Chip Performance. Several techniques get more bits/sec out of a DRAM chip: allow repeated accesses to the row buffer without paying another row access time (burst mode, fast page mode, EDO mode, ...); simplify the DRAM-CPU interface by adding a clock to reduce the overhead of synchronizing with the controller = synchronous DRAM (SDRAM); transfer data on both rising and falling clock edges = double data rate (DDR). Each of the above adds a small amount of logic to exploit the high internal DRAM bandwidth.

    26. 26 Block Diagram

    27. 27 Activate Row

    28. 28 Read (Select column)

    29. 29 Basic Mode of Operation: the slowest mode; uses only a single row and column address. Row access is slow (60-70 ns) compared to column access (5-10 ns). This leads to three techniques for DRAM speed improvement: getting more bits out of the DRAM on one access, given the timing constraints; pipelining the various operations to minimize total time; and segmenting the data in such a way that some operations are eliminated for a given set of accesses.

    30. 30 Nibble (or Burst) Mode: several consecutive columns are accessed. Only the first column address is explicitly specified; the rest are internally generated using a counter.

    31. 31 Fast Page Mode: accesses arbitrary columns within the same row. Static column mode is similar.
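
The payoff of reusing the open row can be estimated from the timing figures on slide 29 (row access ~60 ns, column access ~10 ns); the access count n is arbitrary:

```python
# Total time for n accesses to columns of the same row:
# basic mode re-opens the row for every access; page mode opens it once.
t_row, t_col = 60e-9, 10e-9   # figures from the slides (upper/typical ends)
n = 8                         # number of accesses (arbitrary choice)

basic = n * (t_row + t_col)   # every access pays row + column time
page = t_row + n * t_col      # one row open, then n column accesses

print(f"basic: {basic * 1e9:.0f} ns, page mode: {page * 1e9:.0f} ns")
# basic: 560 ns, page mode: 140 ns -> 4x faster for this access pattern
```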

    32. 32 EDO Mode: arbitrary column addresses, pipelined. EDO = Extended Data Out. There are other modes like burst EDO, which allows reading a fixed number of bytes starting with each specified column address.

    33. 33 Evolutionary DRAM Architectures. SDRAM (Synchronous DRAM): the interface retains a good part of the conventional DRAM interface (addresses multiplexed in two halves, separate data pins, two control signals). All address, data, and control signals are synchronized with an external clock (100-150 MHz). This allows decoupling of processor and memory, and allows pipelining a series of reads and writes. Peak speed per memory module: 800-1200 MB/sec.

    34. 34 Synchronous DRAM (SDRAM): the common type in PCs since the late '90s. Clocked; addresses multiplexed in two halves; burst transfers; multiple banks; pipelined. Start a read in one bank after another, then come back and read the resulting values one after another.

    35. 35 DDR DRAM: Double Data Rate SDRAM transfers data on both edges of the clock. Currently popular is DDRx, where x refers to voltage and signaling specs: DDR1 was 2.5 V, DDR2 1.8 V, DDR3 1.5 V. Graphics cards now use GDDR4 (Graphics Double Data Rate) memory chips, with memory clocks of 900 MHz or so (an 1800 MHz transfer-rate equivalent).
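
A quick sketch of the double-data-rate arithmetic, using the slide's ~900 MHz GDDR clock; the 32-bit bus width is an assumption added for the bandwidth figure:

```python
# DDR moves data on both clock edges: two transfers per clock cycle.
clock_hz = 900e6        # memory clock from the slide
bus_width_bytes = 4     # 32-bit interface (assumed, not on the slide)

transfers_per_sec = 2 * clock_hz               # "1800 MHz equivalent"
peak_bw = transfers_per_sec * bus_width_bytes  # peak bytes per second

print(f"{transfers_per_sec / 1e6:.0f} MT/s")   # 1800 MT/s
print(f"{peak_bw / 1e9:.1f} GB/s peak")        # 7.2 GB/s for this width
```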

    36. 36 RAMBUS DRAM (RDRAM, XDR). RDRAM: another attempt to alleviate pinout limits. Many (16-32) smaller banks per chip, made to be read/written with a packet protocol; each chip has more of a controller. Did not do well in the market: high latency. XDR: a newer technology with differential, low-voltage-swing signaling. Used in the PS3; 65 GB/s transfer rate.

    37. 37 DRAM Controllers: it is very common to have a circuit that controls memory. It handles banks, handles refresh, multiplexes column and row addresses, and drives RAS and CAS timing. On a PC chipset, this is the Northbridge.

    38. 38 Memory Interleaving. Goal: try to take advantage of the bandwidth of multiple DRAMs in the memory system. A memory address A is converted into a (b, w) pair, where b = bank index and w = word index within the bank. Logically this is a wide memory: accesses to B banks are staged over time to share internal resources such as the memory bus. Interleaving can be on the low-order bits of the address (cyclic), with b = A mod B and w = A div B; on the high-order bits of the address (block); or on a combination of the two (block-cyclic).
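
The cyclic formulas b = A mod B, w = A div B from the slide, alongside a high-order (block) variant, can be sketched as:

```python
# Map address A to a (bank, word) pair for B banks of W words each.
def cyclic(A, B):
    # Low-order interleaving: consecutive addresses hit different banks.
    return A % B, A // B          # b = A mod B, w = A div B

def block(A, B, W):
    # High-order interleaving: high bits pick the bank, low bits the word.
    return A // W, A % W

B, W = 4, 8
print([cyclic(a, B) for a in range(6)])
# (0,0) (1,0) (2,0) (3,0) (0,1) (1,1): banks rotate on every address
print([block(a, B, W) for a in (0, 7, 8, 31)])
# (0,0) (0,7) (1,0) (3,7): a whole block of W words sits in one bank
```

Cyclic interleaving is what lets sequential accesses overlap across banks; block interleaving keeps each contiguous region in one bank.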

    39. 39 Low-order Bit Interleaving

    40. 40 Mixed Interleaving: the memory address register is 6 bits wide. The most significant 2 bits give the bank address, the next 3 bits give the word address within the bank, and the LSB gives the (parity of the) module within the bank. 6 = 000110₂ = (00, 011, 0) = (0, 3, 0); 41 = 101001₂ = (10, 100, 1) = (2, 4, 1).
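
The slide's 6-bit decomposition can be written directly as bit-field extraction, reproducing both worked examples:

```python
# Decode a 6-bit address: top 2 bits = bank, next 3 bits = word,
# LSB = (parity of) module within the bank.
def decode(addr):
    bank = (addr >> 4) & 0b11     # bits 5..4
    word = (addr >> 1) & 0b111    # bits 3..1
    module = addr & 0b1           # bit 0
    return bank, word, module

print(decode(6))    # (0, 3, 0), matching 000110 -> (00, 011, 0)
print(decode(41))   # (2, 4, 1), matching 101001 -> (10, 100, 1)
```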

    41. 41 Other Types of Memory. ROM = Read-Only Memory. Flash = ROM which can be written once in a while; used in embedded systems and small microcontrollers; offers IP protection and security. Others?
