SaddleHill is an innovative application designed to test SCSI controller firmware and hardware, improving feature flexibility and fault tolerance. By enabling multiple hosts to access and share SATA affiliations, it overcomes the limitations of single initiators in SAS domains. The system prioritizes long-term fairness without violating I/O time constraints. Built using Trolltech’s Qt framework, SaddleHill comprises logical components for managing I/O, displaying real-time statistics, and ensuring effective communication in complex storage environments. This project aims to enhance performance and reliability in enterprise and embedded storage solutions.
SaddleHill A SCSI I/O Generator By Owen Parry
Project Motivation • Create an application that exercises new controller firmware and hardware. • Provide the ability to rapidly add features. • Provide more tolerance for hardware/firmware failures. • Develop a mechanism that allows multiple hosts to randomly access and efficiently share the SATA affiliations. • Current methods avoid the Serial ATA limitation by using a single initiator in a SAS domain, which limits communication with the disk drive to one initiator at a time. • Need a simple and decentralized strategy for an embedded environment. • Achieve long-term max-min fairness. • Avoid violating I/O time limits.
Background • SCSI is targeted at the enterprise storage market and is used primarily to attach hard disk drives. • High performance • RPM: 10K, 15K • Seek time: 3.2 – 7.4 ms • Greater reliability • MTBF: 1.2 M hr • Capacity: 18 – 300 GB • Expensive: $160 – $1400 • Multiple host support • ATA is targeted at the desktop market. • Medium performance • RPM: 5400, 7200 • Seek time: 8.9 – 9.5 ms • Mediocre reliability • MTBF: 500 K hr • Capacity: 40 GB – 1 TB • Cheap: $75 – $300 • Single host
Background • Serial Attached SCSI (SAS) is the new transport protocol replacing parallel SCSI. • SAS advantages • Faster data rates • SAS-1: 300 MB/s • SAS-2: 600 MB/s • Larger drive counts • Typical domain size: 128 • 16K addresses using fan-out expanders • Increased data integrity • Configuration flexibility • Supports Serial ATA drives
Background • Problem with SATA in a SAS topology • The SATA architecture only allows a single host. • SAS uses a form of mutual exclusion called an “affiliation.” The first initiator to open a connection may own the affiliation indefinitely. • Vendors want to simultaneously issue commands to SATA disks from multiple initiators.
Related Work • Unable to locate other works in the storage area. • Closely related research • Wireless LANs • Bandwidth sharing schemes: • Maxmin Fair Scheduling in Wireless Networks, Leandros Tassiulas and Saswati Sarkar. • Channel time sharing schemes: • Proportional Fairness in Wireless LANs and Ad Hoc Networks, Li Bin Jiang and Soung Chang Liew. • Time-based Fairness Improves Performance in Multi-rate WLANs, Godfrey Tan and John Guttag.
SaddleHill Design / Implementation • Built using Trolltech’s Qt 4.2.3. • Compiled for x86_64 systems. • Composed of four logical blocks.
SaddleHill Design / Implementation • MainWindow • Lists PCI SAS Initiator devices. • Lists SAS Target devices. • Displays live test statistics. • Displays application messages. • Accepts user input.
SaddleHill Design / Implementation • Management Unit • Manages SaddleHill’s physical I/O data buffers and Initiator operational buffers. • Address conversion: virtual to physical; physical to virtual. • Maintains a list of SAS Initiators and Targets. • Maintains the application message log. • Maintains the model objects (system device, message, statistics) that the GUI uses to gather and display information to the user. • Distributes device configurations. • Starts/stops I/O tests. • Calculates I/O and throughput rates.
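The rate calculation in the last bullet can be sketched as a difference between two counter snapshots. This is only an illustration: the `Snapshot` structure, its field names, and the 512-byte block size are assumptions, not details taken from SaddleHill itself.

```python
from dataclasses import dataclass
from typing import Tuple

BLOCK_SIZE = 512  # bytes per logical block (assumed, not stated in the slides)


@dataclass
class Snapshot:
    """Cumulative counters sampled at one instant (hypothetical structure)."""
    time_s: float        # sample time in seconds
    ios_completed: int   # I/Os completed since the test started
    blocks_moved: int    # blocks transferred since the test started


def rates(prev: Snapshot, curr: Snapshot) -> Tuple[float, float]:
    """Return (IOPS, MB/s) over the interval between two snapshots."""
    dt = curr.time_s - prev.time_s
    if dt <= 0:
        raise ValueError("snapshots must be in increasing time order")
    iops = (curr.ios_completed - prev.ios_completed) / dt
    bytes_moved = (curr.blocks_moved - prev.blocks_moved) * BLOCK_SIZE
    return iops, bytes_moved / dt / 1e6


iops, mb_per_s = rates(Snapshot(0.0, 0, 0), Snapshot(2.0, 4500, 36000))
```

Working from cumulative counters keeps the statistics threads simple: they only increment, and the display layer chooses the sampling interval.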
SaddleHill Design / Implementation • IO Engine • Initializes SAS targets. • Maintains disk SAS addresses and Target IDs. • Generates, issues, and completes SCSI commands, e.g. Read10, Write10, Write and Verify10, Inquiry, Read Capacity. • Uses three threads, one for each of the above tasks (generate, issue, complete). • Maintains statistics: • Number of I/Os issued • Number of I/Os completed • Error count • Amount of data transferred • I/O response times
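To make the command-generation step concrete, the sketch below packs the standard 10-byte command descriptor block used by READ(10) and WRITE(10). The opcodes and field layout follow the SCSI block command set; the function name `build_rw10_cdb` is hypothetical, and the flag, group-number, and control bytes are simply left at zero. It does not reflect how SaddleHill itself builds commands.

```python
import struct

READ_10 = 0x28   # READ(10) operation code
WRITE_10 = 0x2A  # WRITE(10) operation code


def build_rw10_cdb(opcode: int, lba: int, blocks: int) -> bytes:
    """Pack a 10-byte READ(10)/WRITE(10) command descriptor block."""
    if opcode not in (READ_10, WRITE_10):
        raise ValueError("unsupported opcode")
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("LBA must fit in 32 bits")
    if not 0 <= blocks <= 0xFFFF:
        raise ValueError("transfer length must fit in 16 bits")
    # Layout: opcode, flags, LBA (big-endian 32-bit), group number,
    # transfer length (big-endian 16-bit), control.
    return struct.pack(">BBIBHB", opcode, 0, lba, 0, blocks, 0)


cdb = build_rw10_cdb(READ_10, lba=0x1000, blocks=8)
```

The 16-bit transfer length is what bounds a single Read10 or Write10 to 65535 blocks.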
SaddleHill Design / Implementation • Hardware Abstraction Layer • SaddleHillDriver • Registers with the Linux kernel as a character device. • Registers with the PCI core. • Allocates blocks of physical memory. Currently 16 MB. • Reserves the physical memory to prevent swapping. • Provides the facilities to map PCI SAS I/O control registers to user space. • Provides the facilities to map the physical memory to user space. • Provides PCI device configuration information to the user space application. • HAL (User Level) • Implements the MPI specification. • Initializes the SAS adapter. • Converts requests from the IO Engine to the MPI-specific format. • Sends requests to and receives replies from the Initiator via the PCI control registers. • Processes MPI replies and completes requests to the IO Engine. • Manages STP affiliations. • Maintains test statistics: • Number of I/Os issued • Number of I/Os completed • Error count • Amount of data transferred • I/O response times • Affiliation ownership times • Affiliation synchronization count
Affiliation Synchronization • Uses the idea put forward in “Proportional Fairness in Wireless LANs and Ad Hoc Networks.” • Fix the maximum transmission time. • Contend fairly among the initiators for the mutex. • Implementation • Affiliation Acquisition • Acquisition is started by reception of a new I/O. • Calculate the back-off. • Use a uniform-distribution random number generator to choose a back-off time within the contention window size. • Generate a SCSI Inquiry command. • Sleep for the length of the back-off. • Issue the Inquiry. • A failed synchronization attempt doubles the contention window size. • Start the ownership timer on successful acquisition. • Affiliation Release • Resource is released if no I/Os are waiting to be sent. • Resource is released after the ownership timer expires. • There are no preemptions. • I/Os are placed into a waiting state during the release and acquisition process. • I/Os outstanding at the time of release are allowed to complete. • The truncated binary exponential back-off strategy is used to calculate the back-off times.
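The acquisition and release steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not SaddleHill's code: `try_inquiry` stands in for issuing the SCSI Inquiry and reporting whether it succeeded, and the window sizes, attempt limit, and function names are all hypothetical.

```python
import random
import time
from typing import Callable, Optional


def acquire_affiliation(
    try_inquiry: Callable[[], bool],         # hypothetical: True if Inquiry succeeded
    min_window_s: float = 0.001,             # assumed initial contention window
    max_window_s: float = 0.064,             # assumed truncation limit
    max_attempts: int = 16,                  # assumed give-up threshold
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Contend for the affiliation; return the time ownership began."""
    rng = rng or random.Random()
    window = min_window_s
    for _ in range(max_attempts):
        backoff = rng.uniform(0.0, window)   # uniform draw inside the window
        sleep(backoff)                       # wait out the back-off
        if try_inquiry():                    # success: this initiator owns it
            return time.monotonic()          # start of the ownership timer
        window = min(window * 2, max_window_s)  # failure doubles, truncated
    raise TimeoutError("affiliation not acquired")


def should_release(owned_since: float, pending_ios: int,
                   max_hold_s: float, now: float) -> bool:
    """Release when nothing is waiting or the ownership timer has expired."""
    return pending_ios == 0 or (now - owned_since) >= max_hold_s
```

Passing `sleep` and `rng` as parameters is a choice made here so the contention logic can be exercised without real delays or hardware.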
Finding the Back-Off Strategy • Back-off strategies considered: • No back-off • Fixed window • BEB (binary exponential back-off) • TBEB (truncated binary exponential back-off) • Logarithmic • Test strategy: • Read10, Write10 commands • Single-block transfers • Same LBA • Drive caching enabled • NCQ enabled • Drive queue depth = 8 • 3 Gb/s SATA disk • Multiple initiators
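The strategies differ mainly in how the contention window grows after each failed attempt. The sketch below compares them; the base and cap values are illustrative, and the logarithmic formula in particular is an assumed form, since the slides do not define it.

```python
import math


def contention_window(strategy: str, failures: int,
                      base: float = 1.0, cap: float = 64.0) -> float:
    """Contention window size after a given number of failed attempts."""
    if strategy == "none":
        return 0.0                              # retry immediately
    if strategy == "fixed":
        return base                             # never grows
    if strategy == "beb":
        return base * 2 ** failures             # doubles without bound
    if strategy == "tbeb":
        return min(base * 2 ** failures, cap)   # doubles up to a cap
    if strategy == "log":
        # Assumed form: the window grows with the log of the failure count.
        return base * (1 + math.log2(1 + failures))
    raise ValueError(f"unknown strategy: {strategy}")


for name in ("none", "fixed", "beb", "tbeb", "log"):
    print(name, [contention_window(name, k) for k in (0, 3, 10)])
```

After ten failures the untruncated window is 1024 times the base while the truncated one stays at the cap, which matches the slide's concern that BEB can push waits past the I/O time limits in long runs.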
Finding the Back-Off Strategy • STP Ownership Times
Finding the Back-Off Strategy • Synchronization Requests
Finding the Back-Off Strategy • Average I/O Response
Finding the Back-Off Strategy • No Back-Off • Too many synchronization attempts. • Depending on the topology configuration, it favors some initiators. • Fixed Window • There is no way to choose the appropriate window size. • BEB • Violates the I/O time limits in long test runs. • Logarithmic • Achieves near-perfect max-min fairness in resource ownership in both the short and long term. • Large number of synchronization requests. Unacceptable in large topologies. • The truncated binary exponential strategy was chosen for the implementation of the synchronization algorithm. • Closely achieves long-term max-min fairness. • Low number of synchronization attempts.
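For reference, max-min fairness, the criterion used above, can be illustrated with the standard progressive-filling calculation. The function below is a generic sketch and not part of SaddleHill; the capacity and demands are in arbitrary units (for example, shares of affiliation ownership time).

```python
from typing import List


def max_min_fair(capacity: float, demands: List[float]) -> List[float]:
    """Max-min fair allocation of `capacity` by progressive filling."""
    alloc = [0.0] * len(demands)
    remaining = capacity
    # Serve the smallest demands first; each unsatisfied party gets an equal
    # share of what is left, capped at its own demand.
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    for pos, i in enumerate(order):
        share = remaining / (len(order) - pos)
        alloc[i] = min(demands[i], share)
        remaining -= alloc[i]
    return alloc


equal = max_min_fair(100, [100, 100, 100, 100])   # four saturated initiators
mixed = max_min_fair(10, [2, 8, 8])               # one light, two heavy
```

With four initiators that all want the resource continuously, the max-min fair outcome is an equal split, which is the target the measured ownership times are compared against.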
Performance • Transaction processing profile was used. • Small Block Transfer (1-16 Blocks) • Concerned with I/O Rates rather than throughput. • Single Initiator • Same IO size ~2250 IOPS. • Random IO sizes ~902 IOPS. • Dual Initiators • Same IO sizes ~ 2075 IOPS. 8% Performance decrease. • Random IO sizes ~ 786 IOPS. 12% Performance decrease • Quad Initiators • Same IO sizes ~ 1975 IOPS. 14% Performance decrease • Random IO sizes ~745 IOPS. 17% Performance decrease
Future Directions • Because of the challenges of SATA in enterprise storage environments, vendors are employing varying strategies to deal with the SATA problem. These include: • Completely removing SATA from topologies. • Building special hardware that increases the affiliation resources. • The STP resource sharing algorithm will be moved to the SAS Initiator port. • Requires a change in the mechanism that acquires and releases affiliations. • Utilize the SAS CLOSE (CLEAR AFFILIATION) primitive when tearing down connections. • Simply convert and issue host I/O. • SaddleHill • Short Term • Support SAS-2 Initiator • Support additional SBC and SPC commands • Support SSC and MMC SCSI command sets • FW upgrade support • Initiator configuration modification • Long Term • Build into an automated firmware unit test system.
Conclusion • All project goals achieved • User-Level SCSI I/O generator • Synchronization algorithm that meets the simplicity, fairness and decentralization objectives.