
Chapter 8: Storage, Networks, and Other Peripherals




Presentation Transcript


  1. Chapter 8: Storage, Networks, and Other Peripherals Instructor: Chuan-Yu Chang, Ph.D. (張傳育 博士) E-mail: chuanyu@yuntech.edu.tw Tel: (05)5342601 ext. 4337

  2. Interfacing Processors and Peripherals • I/O design is affected by many factors (expandability, resilience) • I/O-system performance is harder to characterize than CPU performance: • some devices care most about access latency • others care most about throughput • I/O-system performance depends on many aspects of the system: • the connection between devices and the system • the memory hierarchy • the operating system

  3. Example • Suppose we have a benchmark that executes in 100 seconds of elapsed time, where 90 seconds is CPU time and the rest is I/O time. If CPU time improves by 50% per year for the next five years but I/O time doesn't improve, how much faster will our program run at the end of five years? • Solution: elapsed time = CPU time + I/O time, so 100 = 90 + I/O time and I/O time = 10 s. After five years, CPU time shrinks to 90/1.5⁵ ≈ 12 s, so CPU performance improves by 90/12 = 7.5×, but elapsed time improves by only 100/22 ≈ 4.5×, and the I/O share of elapsed time grows from 10% to about 45%.
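The arithmetic in the solution can be checked with a short Python sketch (the slide rounds the five-year CPU time of about 11.85 s up to 12 s, which is where the quoted 7.5× and 4.5× figures come from):

```python
# CPU time improves 50% per year for five years; I/O time stays at 10 s.
cpu_time, io_time = 90.0, 10.0
for _ in range(5):
    cpu_time /= 1.5            # 1.5x CPU performance each year

elapsed = cpu_time + io_time   # ~21.9 s, down from the original 100 s
speedup = 100 / elapsed        # ~4.6x overall (slide rounds to 4.5x)
io_share = io_time / elapsed   # I/O grows to ~46% of elapsed time
```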

  4. Types and Characteristics of I/O Devices • I/O devices vary widely, but three characteristics summarize them: • behavior (i.e., input vs. output) • partner (who is at the other end?) • data rate

  5. Types and Characteristics of I/O Devices • Mouse • The interface between a mouse and the system can be one of the following: • a stream of pulses generated as the mouse moves • counters that are incremented or decremented as the mouse moves

  6. I/O Example: Disk Drives • Components of a hard disk • Platter • Track • Sector • the smallest unit that can be read or written • Logical Block Addressing (LBA) makes the block the smallest read/write unit • Every track has the same number of sectors • Zone Bit Recording (ZBR) puts more sectors on the outer tracks to increase capacity • Cylinder • There is a gap between sectors, and each sector carries an ECC

  7. Disk access time: • Seek time: • move the head to the proper track (8 to 20 ms avg.) • Rotational latency: • wait for the desired sector to rotate under the read/write head • Transfer time: • grab the data (one or more sectors), 2 to 15 MB/sec • Controller time: • the overhead the controller imposes in performing an I/O access • Disk access time = Seek time + Rotational latency + Transfer time + Controller time

  8. Types and Characteristics of I/O Devices • Compared with floppy disks, hard disks have the following advantages: • The hard disk can be larger because it is rigid. • The hard disk has higher density because it can be controlled more precisely. • The hard disk has a higher data rate because it spins faster. • Hard disks can incorporate more than one platter.

  9. Example • Disk Read Time • What is the average time to read or write a 512-byte sector for a typical disk rotating at 5400 RPM? The advertised average seek time is 12 ms, the transfer rate is 5 MB/sec, and the controller overhead is 2 ms. Assume that the disk is idle so that there is no waiting time. • Solution: • Disk access time = seek time + rotational latency + transfer time + controller overhead • Disk access time = 12 ms + 0.5 rotation/(5400 RPM) + 0.5 KB/(5 MB/sec) + 2 ms = 12 ms + 5.6 ms + 0.1 ms + 2 ms = 19.7 ms
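The same arithmetic in Python, as a quick sanity check (using 1 MB = 1000 KB, as the example does; the average rotational latency is half a revolution):

```python
# Average time to read a 512-byte sector with the numbers above.
seek_ms       = 12.0
rotation_ms   = 0.5 / (5400 / 60) * 1000   # half a revolution at 5400 RPM, in ms
transfer_ms   = 0.5 / (5 * 1000) * 1000    # 0.5 KB at 5 MB/sec
controller_ms = 2.0

access_ms = seek_ms + rotation_ms + transfer_ms + controller_ms  # ~19.7 ms
```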

  10. RAID • Redundant Array of Independent Disks • Redundant Array of Inexpensive Disks • Seven levels (0–6) in common use • Not a hierarchy • Set of physical disks viewed as a single logical drive by the OS • Data distributed across the physical drives • Redundant capacity can be used to store parity information

  11. RAID 0 • No redundancy • Data striped across all disks • Round Robin striping • Increase speed • Multiple data requests probably not on same disk • Disks seek in parallel • A set of data is likely to be striped across multiple disks

  12. RAID 1 • Mirrored Disks • Data is striped across disks • 2 copies of each stripe on separate disks • Read from either • Write to both • Recovery is simple • Swap faulty disk & re-mirror • No down time • Expensive

  13. RAID 2 • Disks are synchronized • Very small stripes • Often single byte/word • Error correction calculated across corresponding bits on disks • Multiple parity disks store Hamming code error correction in corresponding positions • Lots of redundancy • Expensive • Not used

  14. RAID 3 • Similar to RAID 2 • Only one redundant disk, no matter how large the array • Simple parity bit for each set of corresponding bits • Data on failed drive can be reconstructed from surviving data and parity info • Very high transfer rates
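The parity idea behind RAID 3 (and RAID 4/5) is bitwise XOR across the data disks; the bit patterns below are made up for illustration:

```python
# Parity is the XOR of the corresponding data on every data disk.
disks = [0b1011, 0b0110, 0b1100]          # data strips on three disks
parity = disks[0] ^ disks[1] ^ disks[2]   # stored on the parity disk

# Suppose disk 1 fails: XOR the survivors with the parity to rebuild it.
rebuilt = disks[0] ^ disks[2] ^ parity
assert rebuilt == disks[1]
```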

  15. RAID 4 • Each disk operates independently • Good for high I/O request rate • Large stripes • Bit by bit parity calculated across stripes on each disk • Parity stored on parity disk

  16. RAID 5 • Like RAID 4 • Parity striped across all disks • Round robin allocation for the parity stripe • Avoids the RAID 4 bottleneck at the parity disk • Commonly used in network servers • N.B. RAID 5 does not mean five disks!

  17. RAID 6 • Two parity calculations • Stored in separate blocks on different disks • User requirement of N disks needs N+2 • High data availability • Three disks need to fail for data loss • Significant write penalty

  18. RAID 0, 1, 2

  19. RAID 3 & 4

  20. RAID 5 & 6

  21. Data Mapping For RAID 0

  22. Optical Storage: CD-ROM • Originally for audio • 650 MB, giving over 70 minutes of audio • Polycarbonate coated with a highly reflective coat, usually aluminium • Data stored as pits • Read by a reflected laser • Constant packing density • A CD-ROM contains a single spiral track • Sectors near the outside of the disk are the same length as those near the inside. • Information is packed evenly across the disk in segments of the same size. • Constant linear velocity • The disk rotates more slowly for accesses near the outer edge than for those near the center.

  23. I/O Example: Buses • Shared communication link (one or more wires) • Advantages of a bus: • versatility, low cost • Difficult design: • may be a bottleneck • length of the bus • number of devices • tradeoffs (buffers for higher bandwidth increase latency) • support for many different devices • cost

  24. Buses: Connecting I/O Device to Processor and Memory • Bus transaction • Read: transfers data from memory • Write: writes data to memory • Input: moves data from the device to memory • Output: data is read from memory and sent to the device. The steps of an output operation: 1. The CPU sends a read control signal and the address to memory. 2. Memory reads the requested data. 3. Memory puts the data on the data lines and signals the disk that the data is available. 4. The disk writes the data from the data lines.

  25. Buses: Connecting I/O Device to Processor and Memory • Input operation (loading the contents of the disk into memory): 1. The CPU sends a write-request control signal and the address to memory. 2. The disk is notified that memory is ready. 3. The disk puts the data on the data lines. 4. Memory writes the data from the data lines.

  26. Buses: Connecting I/O Device to Processor and Memory • Types of buses: • processor-memory bus • Short, high speed, to maximize memory-processor bandwidth • backplane bus • Allow processor, memory, and I/O devices to coexist on a single bus. • high speed, often standardized, e.g., PCI • I/O bus • lengthy, different devices, standardized, e.g., SCSI

  27. I/O Bus Standards • Today we have two dominant bus standards:

  28. Buses: Connecting I/O Device to Processor and Memory

  29. Buses: Connecting I/O Device to Processor and Memory • Synchronous and Asynchronous Buses • Synchronous • uses a clock and a synchronous protocol, e.g., a processor-memory bus • the bus can run very fast and the interface logic is small • Drawbacks: • every device must operate at the same rate • clock skew requires the bus to be short • Asynchronous • no clock; uses handshaking instead • Handshaking: assume there are three control lines: • ReadReq: • indicates a read request to memory; the address is put on the data lines • DataRdy: • indicates that the data word is now ready on the data lines • Ack: • acknowledges the ReadReq or DataRdy signal of the other party

  30. An I/O device reads a word from memory: 1. The I/O device raises ReadReq and puts the address on the data bus. 2. Memory responds with Ack and reads the address from the data bus; on seeing Ack, the I/O device releases ReadReq and the data bus. 3. Memory detects that ReadReq is low and releases Ack. 4. Memory prepares the data, puts it on the data bus, and raises DataRdy to notify the I/O device. 5. The I/O device detects DataRdy, reads the data from the data bus, and raises Ack to notify memory. 6. On seeing Ack, memory releases DataRdy and the data bus. 7. The I/O device detects that DataRdy is low and releases Ack. The transfer is complete.
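The seven handshake steps can be traced in a toy Python model. The memory dict and the address are made-up values; the point is the fixed request/acknowledge ordering:

```python
# Toy trace of the asynchronous handshake for one I/O read from memory.
memory = {0x40: 0xBEEF}   # hypothetical memory contents
trace = []

def async_read(addr):
    trace.append("1: I/O raises ReadReq, drives address")
    trace.append("2: memory raises Ack, latches address; I/O releases ReadReq and bus")
    trace.append("3: memory sees ReadReq low, lowers Ack")
    data = memory[addr]
    trace.append("4: memory drives data, raises DataRdy")
    trace.append("5: I/O reads data, raises Ack")
    trace.append("6: memory sees Ack, lowers DataRdy, releases bus")
    trace.append("7: I/O sees DataRdy low, lowers Ack")
    return data

assert async_read(0x40) == 0xBEEF
```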

  31. Example • Performance Analysis of Synchronous vs. Asynchronous Buses • The synchronous bus has a clock cycle time of 50 ns, and each bus transmission takes 1 clock cycle. The asynchronous bus requires 40 ns per handshake. The data portion of both buses is 32 bits wide. Find the bandwidth for each bus when performing one-word reads from a 200-ns memory. • Solution: On the synchronous bus each transmission takes one 50-ns clock cycle, so reading one word from memory costs: 1. send the address to memory: 50 ns; 2. read the memory: 200 ns; 3. send the data to the device: 50 ns; total = 300 ns. Transferring 4 bytes in 300 ns gives 4/300 ns = 13.3 MB/sec. • On the asynchronous bus each handshake takes 40 ns, but several of the seven steps can overlap: steps 2–4 overlap with the (longer) memory access. Reading one word costs: step 1: 40 ns; steps 2–4: max(3 × 40 ns, 200 ns) = 200 ns; steps 5–7: 3 × 40 ns = 120 ns; total = 360 ns. Transferring 4 bytes in 360 ns gives 4/360 ns = 11.1 MB/sec.
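The two bandwidth figures follow directly from the step costs (note 4 bytes per nanosecond-count converts to MB/s by multiplying by 1000, since 1 byte/ns = 1000 MB/s):

```python
# One-word (4-byte) reads from a 200 ns memory.
sync_ns  = 50 + 200 + 50                  # address, memory read, data
async_ns = 40 + max(3 * 40, 200) + 3 * 40 # step 1; steps 2-4 overlap the
                                          # memory access; steps 5-7

sync_mb_s  = 4 / sync_ns * 1000           # ~13.3 MB/sec
async_mb_s = 4 / async_ns * 1000          # ~11.1 MB/sec
```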

  32. Buses: Connecting I/O Device to Processor and Memory • Increasing the Bus Bandwidth • Data bus width • widen the data bus • Separate versus multiplexed address and data lines • with separate address and data buses, an address and data can be sent in the same bus cycle • Block transfers • allow the bus to transfer multiple words back-to-back without sending an address or releasing the bus each time, reducing the time needed to transfer a large block of data

  33. Example: • Performance Analysis of two bus schemes • Suppose we have a system with the following characteristics: • A memory and bus system supporting block access of 4 to 16 32-bit words. • A 64-bit synchronous bus clocked at 200MHz, with each 64-bit transfer taking 1 clock cycle, and 1 clock cycle required to send an address to memory. • Two clock cycles needed between each bus operation. (Assume the bus is idle before an access.) • A memory access time for the first four words of 200ns; each additional set of four words can be read in 20 ns. Assume that a bus transfer of the most recently read data and a read of the next four words can be overlapped. • Find the sustained bandwidth and the latency for a read of 256 words for transfers that use 4-word blocks and for transfers that use 16-word blocks. Also compute the effective number of bus transactions per second for each case. Recall that a single bus transaction consists of an address transmission followed by data.

  34. Example: • Solution: • Bus clock = 200 MHz, so one clock cycle = 1/200 MHz = 5 ns • For 4-word block transfers, each block needs: • send the address to memory: 1 clock cycle • read the data from memory: 200 ns / 5 ns = 40 clock cycles • transfer the data from memory: 2 clock cycles • idle time between transfers: 2 clock cycles • Each block takes 1 + 40 + 2 + 2 = 45 clock cycles, and 256/4 = 64 transfers are needed. • Total: 45 × 64 = 2880 clock cycles = 2880 × 5 ns = 14400 ns • Transactions per second = 64/14400 ns = 4.44M transactions/sec • Bus bandwidth = (256 × 4)/14400 ns = 71.11 MB/sec • For 16-word block transfers, each block needs: • send the address to memory: 1 clock cycle • read the data from memory: 200 ns / 5 ns = 40 clock cycles • transfer the data from memory: 2 clock cycles × 4 = 8 clock cycles • idle time between transfers: 2 clock cycles × 4 = 8 clock cycles • Each block takes 1 + 40 + 8 + 8 = 57 clock cycles, and 256/16 = 16 transfers are needed. • Total: 57 × 16 = 912 clock cycles = 912 × 5 ns = 4560 ns • Transactions per second = 16/4560 ns = 3.51M transactions/sec • Bus bandwidth = (256 × 4)/4560 ns = 224.56 MB/sec
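The cycle accounting generalizes over the block size; `block_read` is a hypothetical helper that mirrors the slide's model (only the first 200 ns read appears because the 20 ns reads of subsequent 4-word sets overlap with the bus transfers):

```python
def block_read(block_words, total_words=256, clk_ns=5):
    """Latency (ns), bandwidth (MB/s), transactions (millions/s)."""
    sets = block_words // 4                  # memory delivers 4 words per set
    cycles = 1 + 40 + 2 * sets + 2 * sets    # address + first read + transfers + idle
    blocks = total_words // block_words
    total_ns = cycles * blocks * clk_ns
    bandwidth_mb_s = total_words * 4 / total_ns * 1000
    transactions_m_s = blocks / total_ns * 1000
    return total_ns, bandwidth_mb_s, transactions_m_s
```

With 4-word blocks this returns 14400 ns and ~71.1 MB/s; with 16-word blocks, 4560 ns and ~224.6 MB/s, matching the solution above.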

  35. Buses: Connecting I/O Device to Processor and Memory • Obtaining Access to the Bus • In a single-master system, all bus requests must be controlled by the processor. • Drawback: the processor must handle every bus transaction.

  36. Buses: Connecting I/O Device to Processor and Memory • Bus Arbitration: • Deciding which bus master gets to use the bus next. • Arbitration must balance bus priority against fairness. • Four bus arbitration schemes: • Daisy chain arbitration (not very fair) • Centralized arbitration (requires an arbiter), e.g., PCI • Distributed arbitration by self-selection, e.g., the NuBus used in the Macintosh • Distributed arbitration by collision detection, e.g., Ethernet

  37. Centralized arbitration (requires an arbiter), e.g., PCI

  38. Communicating with the Processor: • Polling • The process of periodically checking status bits to see if it is time for the next I/O operation. • Interrupts • When an I/O device requires attention from the processor. • Direct Memory Access, DMA • Off-loading the processor and having the device controller transfer data directly to or from the memory without involving the processor.

  39. Interrupts • Mechanism by which other modules (e.g. I/O) may interrupt normal sequence of processing • Program • e.g. overflow, division by zero • Timer • Generated by internal processor timer • Used in pre-emptive multi-tasking • I/O • from I/O controller • Hardware failure • e.g. memory parity error

  40. Program Flow Control

  41. Interrupt Cycle • Added to instruction cycle • Processor checks for interrupt • Indicated by an interrupt signal • If no interrupt, fetch next instruction • If interrupt pending: • Suspend execution of current program • Save context • Set PC to start address of interrupt handler routine • Process interrupt • Restore context and continue interrupted program
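The interrupt cycle above can be sketched as a fetch-execute loop that checks for a pending interrupt after each instruction; the program and the device name are made up for illustration:

```python
# Minimal model: check a pending-interrupt table after every instruction.
program = ["add", "load", "store", "add"]
pending = {2: "disk"}   # hypothetical: disk interrupt arrives after instr 2
executed = []

pc = 0
while pc < len(program):
    executed.append(program[pc])   # fetch and execute next instruction
    pc += 1
    if pc in pending:              # interrupt pending? suspend, save context,
        irq = pending.pop(pc)      # run the handler, then restore and resume
        executed.append(f"handler:{irq}")
```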

  42. Instruction Cycle (with Interrupts) - State Diagram

  43. Multiple Interrupts • Disable interrupts • Processor will ignore further interrupts whilst processing one interrupt • Interrupts remain pending and are checked after first interrupt has been processed • Interrupts handled in sequence as they occur • Define priorities • Low priority interrupts can be interrupted by higher priority interrupts • When higher priority interrupt has been processed, processor returns to previous interrupt
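The priority policy can be illustrated with a toy nested-handler model (device names and priority numbers are invented; lower number means higher priority):

```python
# With priorities, a higher-priority interrupt nests inside the handler
# of a lower-priority one; the interrupted handler resumes afterwards.
order = []

def handle(irq, priority, arrivals):
    order.append(f"start {irq}")
    for p, name in sorted(arrivals):
        if p < priority:           # strictly higher priority preempts us
            handle(name, p, [])
    order.append(f"end {irq}")

# A disk interrupt (priority 1) arrives while the printer handler runs.
handle("printer", 3, [(1, "disk")])
```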

  44. Multiple Interrupts - Sequential

  45. Multiple Interrupts - Nested

  46. Concept of DMA • Direct Memory Access (DMA) • DMA is an interface in which a DMA controller (DMAC) uses cycle stealing to transfer large amounts of data directly between memory and a peripheral. • After the CPU starts the DMAC by sending it a starting address and a word count, the DMAC exploits the fact that the CPU does not use the system bus while decoding and executing instructions: it asks the CPU to relinquish the bus during those phases and then transfers data directly between the peripheral and memory without CPU involvement. This technique is called cycle stealing. • Unlike programmed I/O, DMA does not use the CPU's registers; it steals memory cycles to move the data directly, which makes it suitable for high-speed peripherals.

  47. The DMA transfer sequence: 1. The peripheral requests service from the DMAC. 2. The DMAC requests the system bus from the CPU (HRQ). 3. The CPU completes its current cycle and responds with HLDA. 4. The DMAC acknowledges the peripheral's service request. 5. The DMAC sends the starting address of the data to be transferred and begins moving data between memory and the peripheral. 6. When the transfer is complete, the DMAC deasserts HRQ, returning control of the system bus to the CPU.

  48. Example: • Overhead of Polling in an I/O system • Assume that the number of clock cycles for a polling operation is 400 and the processor executes with a 500MHz clock. Determine the fraction of CPU time consumed for the following three cases, assuming that you poll often enough so that no data is ever lost and assuming that the devices are potentially always busy: • The mouse must be polled 30 times per second to ensure that we do not miss any movement made by the user. • The floppy disk transfers data to the processor in 16-bit units and has a data rate of 50KB/sec. No data transfer can be missed. • The hard disk transfers data in four-word chunks and can transfer at 4MB/sec. Again, no transfer can be missed.

  49. Example: • Solution: • (a) Mouse: 30 × 400 = 12,000 cycles per second, so the fraction of processor time consumed = 12,000/500M ≈ 0.002% • (b) Floppy: 50 KB/sec in 16-bit (2-byte) units requires 50 KB/2 = 25K polls per second; at 400 clock cycles per poll, that is 25K × 400 = 10,000K cycles, so the fraction = 10,000K/500M = 2% • (c) Hard disk: 4 MB/sec in 4-word (16-byte) chunks requires 4 MB/16 = 250K polls per second; at 400 clock cycles per poll, that is 250K × 400 = 100,000K cycles, so the fraction = 100,000K/500M = 20%
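All three cases are the same calculation with different poll rates; `poll_fraction` is a hypothetical helper name:

```python
# Polling cost: each poll takes 400 cycles on a 500 MHz processor.
def poll_fraction(polls_per_sec, cycles_per_poll=400, clock_hz=500e6):
    return polls_per_sec * cycles_per_poll / clock_hz

mouse  = poll_fraction(30)        # ~0.0024% of the CPU (slide rounds to 0.002%)
floppy = poll_fraction(50e3 / 2)  # 50 KB/s in 2-byte units -> 2%
disk   = poll_fraction(4e6 / 16)  # 4 MB/s in 16-byte chunks -> 20%
```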

  50. Example: • Overhead of Interrupt-Driven I/O • Suppose we have the same hard disk and processor we used in the previous example, but we use interrupt-driven I/O. The overhead for each transfer is 500 clock cycles. Find the fraction of the processor consumed if the hard disk is only transferring data 5% of the time. • Solution • At 4 MB/sec in 4-word (16-byte) chunks, the disk generates 4 MB/16 = 250K interrupts per second; at 500 clock cycles per interrupt, that is 250K × 500 = 125,000K cycles, i.e., 125,000K/500M = 25% of the processor. • Since the hard disk is only transferring data 5% of the time, the fraction of processor time consumed = 25% × 5% = 1.25%
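The key difference from polling is the final scaling by the disk's duty cycle, since interrupts only occur while the disk is actually transferring:

```python
# Interrupt-driven I/O: 500-cycle overhead per transfer, 500 MHz clock.
interrupts_per_sec = 4e6 / 16                       # 4 MB/s in 16-byte chunks
always_busy = interrupts_per_sec * 500 / 500e6      # 25% if always transferring
actual = always_busy * 0.05                         # disk busy only 5% of the time
```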
