Performance Simulation of Digital Video Cluster Fabrics: Insights and Findings

Digital Video Cluster Simulation
Martin Milkovits
CS699 – Professional Seminar
April 26, 2005

Goal of Simulation
• Build an accurate performance model of the interconnecting fabrics in a Digital Video cluster
• Assumptions
  • The RAID controller follows a triangular distribution of I/O interarrival times
  • The Gigabit Ethernet IP edge card does not impose any backpressure on the I/Os
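The triangular interarrival assumption can be illustrated with a minimal sketch. The slides do not give the distribution's parameters, so the `low`, `mode`, and `high` values and the function names below are hypothetical, chosen only to show the mechanics.

```python
import random

def interarrival_times(n, low, mode, high, seed=0):
    """Draw n I/O interarrival gaps from a triangular distribution."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    # Note the argument order of random.triangular: (low, high, mode).
    return [rng.triangular(low, high, mode) for _ in range(n)]

def arrival_times(gaps):
    """Turn interarrival gaps into absolute arrival timestamps."""
    t = 0.0
    stamps = []
    for gap in gaps:
        t += gap
        stamps.append(t)
    return stamps

# Hypothetical parameters (arbitrary time units), not taken from the slides.
gaps = interarrival_times(5, low=1.0, mode=2.0, high=4.0)
print(arrival_times(gaps))
```

A triangular distribution needs only a minimum, a most likely value, and a maximum, which makes it convenient when just rough bounds on the source's behavior are known.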

Fabrics Simulated

Digital Video Cluster

Digital Video Node

Modules, Connections and Messages
• Messages represent data packets AND are used to control the model
  • Data packets have a non-zero length parameter
  • Contain routing and source information
• Modules handle message processing and routing
  • By and large represent hardware in the system
  • PCI Bus module: not actual hardware, but necessary to simulate a bus architecture
• Connections allow messages to flow between modules
  • Represent links/busses
  • Independent connections for data vs. control messages
  • May be configured with a data rate value to simulate transmission delay
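How a data rate on a connection turns into transmission delay can be sketched as follows. The `Message` and `Connection` classes and their field names are hypothetical stand-ins, not the model's actual types, and treating control messages as zero-length is an assumption inferred from the slide's remark that data packets have a non-zero length.

```python
from dataclasses import dataclass

@dataclass
class Message:
    source: str           # source information carried by the message
    dest: str             # routing information
    length_bits: int = 0  # assumption: control messages carry zero length

@dataclass
class Connection:
    rate_bits_per_s: float = 0.0  # 0 means no data rate was configured

    def delay_ns(self, msg):
        """Transmission delay in nanoseconds for msg on this connection."""
        if self.rate_bits_per_s <= 0 or msg.length_bits == 0:
            return 0.0  # unconfigured link or control message: no delay
        return msg.length_bits * 1e9 / self.rate_bits_per_s

# Hypothetical 1 Gbit/s link carrying an 8192-bit data message.
link = Connection(rate_bits_per_s=1e9)
print(link.delay_ns(Message("raid", "edge", length_bits=8192)))
```

Keeping the delay on the connection rather than in the modules means the same module code works unchanged across links of different speeds.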

Managing Buffer/Bus Access
• Before transferring a data message (RWM), a module needs to gain access to both the transfer link/bus and the destination buffer

PCI Bus Challenges
• Maintain bus fairness
• Allow multiple PCI bus masters to interleave transactions (account for retry overhead)
• Allow bursting if there is only one master

PCI Bus Module Components
• Queue: pending RWMs
• pciBus[maxDevices] array: utilization key
• reqArray[maxDevices]: pending rqst messages
• Work area: manages the RWM actually being transferred by the PCI bus
• 3 message types to handle
  • rqst messages from PCI bus masters
  • RWM messages
  • qCheck self-messages
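One way to meet the fairness and bursting goals with a per-device request array is round-robin arbitration. The slides do not say which policy the model uses, so the sketch below is an assumption; `req_pending` is a hypothetical stand-in for reqArray, and both function names are invented for illustration.

```python
def next_master(req_pending, last_served):
    """Round-robin: pick the first requesting master after last_served.

    Returns the master's index, or None if nobody is requesting.
    """
    n = len(req_pending)
    for offset in range(1, n + 1):
        candidate = (last_served + offset) % n
        if req_pending[candidate]:
            return candidate
    return None

def may_burst(req_pending, current):
    """The current master may keep the bus only if no other master waits."""
    return not any(p for i, p in enumerate(req_pending) if i != current)

# Masters 0 and 2 are requesting; master 0 was served last.
pending = [True, False, True]
print(next_master(pending, last_served=0))  # 2
print(may_burst(pending, current=2))        # False, master 0 is waiting
```

Starting the search one slot past the last served master keeps any single device from monopolizing the bus, while the burst check lets a lone master transfer without interruption.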

Handling rqst and RWM messages
• When the RWM finally hits the work area
  • Set RWM.transfer value = length of message (1024)
  • Schedule qCheck self-message to fire in 240 ns (time to transfer 128 bits)
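The chunked transfer implied by these numbers can be sketched with a small event loop. This assumes each qCheck firing accounts for one 128-bit chunk and reschedules itself until the message is fully transferred, with no other master interleaving; the slides do not spell that out. They also do not state the unit of the 1024 length, so the example shows both the bytes and the bits reading. The function name is hypothetical.

```python
import heapq

CHUNK_BITS = 128     # bits moved per qCheck interval (from the slides)
CHUNK_TIME_NS = 240  # qCheck interval in nanoseconds (from the slides)

def simulate_transfer(length_bits):
    """Return (finish_time_ns, qcheck_firings) for one uninterrupted RWM."""
    if length_bits <= 0:
        return 0, 0
    remaining = length_bits
    events = [(CHUNK_TIME_NS, "qCheck")]  # first self-message
    now, fired = 0, 0
    while events:
        now, _kind = heapq.heappop(events)
        fired += 1
        remaining -= min(CHUNK_BITS, remaining)  # one chunk has moved
        if remaining > 0:
            # Assumed behavior: reschedule qCheck for the next chunk.
            heapq.heappush(events, (now + CHUNK_TIME_NS, "qCheck"))
    return now, fired

print(simulate_transfer(1024 * 8))  # if 1024 is bytes: (15360, 64)
print(simulate_transfer(1024))      # if 1024 is bits:  (1920, 8)
```

Advancing time per chunk rather than per message is what would let a second bus master slip in between chunks, which a single whole-message delay could not represent.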

Handling qCheck Messages

Determining Max Bandwidth

Simulation Ramp-up

105 Seconds @ 120 MBps: Results

Contention / Utilization / Capacity

Learning Experiences
• PCI contention
  • First modeled as a link like any other, maintained by the StarGen chip
• Buffer contention and access
  • Originally used retry loops, like the actual system: way too much processing time!
  • Retry messages that are returned are a natural design given the language of messages and connections

Conclusion / Future Work
• Simulation performed within 7% of actual system performance
• PCI bus between IB and StarGen is a potential hotspot
• Complete more iterations with minor system modifications (dualDMA, scheduling)
• Submitted paper to the Winter Simulation Conference
