A Methodology for Evaluating Runtime Support in Network Processors
A Methodology for Evaluating Runtime Support in Network Processors University of Massachusetts, Amherst Xin Huang and Tilman Wolf {xhuang,wolf}@ecs.umass.edu
Runtime Support in Network Processors • Network processor (NP) • Multi-core system-on-chip • Programmability & high packet processing rate • Heterogeneous resources • Control processors • Multiple packet processors • Co-processors • Memory hierarchy • Interconnect • Runtime support • Dynamic task allocation • Example: Intel IXP 2800
General Operation of Runtime Support in NP • Input • Hardware resources • Workload • Mapping method • Output • Task allocation • Dynamic adaptation • Different runtime support systems are difficult to compare
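The common structure above (resources and workload in, an adaptable task allocation out) can be sketched as an abstract interface. This is our own illustration; the class and method names are not taken from any of the surveyed systems:

```python
from abc import ABC, abstractmethod

class RuntimeSupport(ABC):
    """Sketch of the generic runtime support operation in an NP."""

    @abstractmethod
    def map_tasks(self, resources, workload):
        """Produce an initial task allocation: task -> processors."""

    @abstractmethod
    def adapt(self, allocation, observed_traffic):
        """Adjust the allocation dynamically as traffic changes."""
```

Each concrete system (ideal, full processor, partitioned application) would fill in these two methods differently, which is exactly what makes them hard to compare without a common model.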
Contributions • Evaluation methodology • Traffic representation • Analytical system model based on queuing networks • Results • Specific: 3 example runtime support systems • Ideal Allocation • Full Processor Allocation • R. Kokku, T. Riche, A. Kunze, J. Mudigonda, J. Jason, and H. Vin. A case for run-time adaptation in packet processing systems. In Proc. of the 2nd Workshop on Hot Topics in Networks (HOTNETS-II), Cambridge, MA, Nov. 2003 • Partitioned Application Allocation • T. Wolf, N. Weng, and C.-H. Tai. Design considerations for network processor operating systems. In Proc. of ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), pages 71-80, Princeton, NJ, Oct. 2005
Outline • Introduction • Evaluation Methodology • Dynamic Workload Model • Runtime System Model • Results • Summary
Workload • NP workload is characterized by applications and traffic • How to represent workload?
Dynamic Workload Model • Workload graph: • Application/Task: T • Traffic: • Processing requirement: • Example: • Processing requirement: • R. Ramaswamy and T. Wolf. PacketBench: A tool for workload characterization of network processing. In Proc. of IEEE 6th Annual Workshop on Workload Characterization (WWC-6), pages 42-50, Austin, TX, Oct. 2003
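A minimal sketch of this workload model: each application is a graph of tasks, and the traffic component assigns each task an arrival rate and a per-packet processing requirement. The field names here are ours, purely illustrative; in the paper the processing requirements come from PacketBench measurements:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    instr_per_packet: float  # processing requirement (instructions)
    successors: list = field(default_factory=list)  # task-graph edges

@dataclass
class Workload:
    tasks: dict          # task name -> Task
    arrival_rate: dict   # task name -> packets/s (varies over time)

    def service_time(self, name, mips):
        """Mean service time of a task on a processor of the given MIPS."""
        return self.tasks[name].instr_per_packet / (mips * 1e6)
```

For example, a hypothetical 2000-instruction forwarding task on a 100 MIPS engine would have a mean service time of 20 microseconds.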
Outline • Introduction • Evaluation Methodology • Dynamic Workload Model • Runtime System Model • Results • Summary
Runtime System Model • Unified approach for all runtime systems • Queuing networks • Specific solution for each runtime system • Runtime mapping: • Graph: • Packet arrival rate: • Service time: • Metrics for all runtime systems • Processor utilization: • Average number of packets in the system:
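Both metrics follow from the arrival rate and the service times via Little's law. Written generically for a station with m processors, arrival rate λ, mean service time E[S], and mean waiting time W (consistent with the specific models below):

```latex
% Processor utilization and mean number of packets in the system,
% for a station with m processors (Little's law):
\rho = \frac{\lambda \, E[S]}{m}, \qquad
\bar{N} = \lambda \, E[T] = \lambda \bigl( W + E[S] \bigr)
```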
Three Example Runtime Support Systems • System I: Ideal Allocation • System II: Full Processor Allocation • System III: Partitioned Application Allocation
Example Evaluation Model – System I • Ideal Allocation • All processors can process all packets completely • Unrealistic, but provides a baseline • Modeled as an M/G/m FCFS single station
M/G/m Single Station Queuing System • Cosmetatos approximation • Evaluation metrics • G. Cosmetatos. Some Approximate Equilibrium Results for the Multiserver Queue (M/G/r). Operational Research Quarterly, pages 615-620, 1976 • G. Bolch, S. Greiner, H. de Meer, and K. S. Trivedi. Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications. John Wiley & Sons, Inc., New York, NY, August 1998
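As an illustration, the Cosmetatos approximation for the mean M/G/m waiting time can be computed from the Erlang-C formula. This sketch follows the interpolation form presented in Bolch et al.; the numeric parameters at the bottom are hypothetical, not the paper's:

```python
import math

def erlang_c(m, rho):
    """Probability that an arriving packet must wait (M/M/m)."""
    a = m * rho  # offered load
    tail = a**m / (math.factorial(m) * (1 - rho))
    return tail / (sum(a**k / math.factorial(k) for k in range(m)) + tail)

def mgm_wait(lam, mu, m, cs2):
    """Approximate mean waiting time in an M/G/m-FCFS queue.

    cs2 is the squared coefficient of variation of the service time.
    Interpolates between M/M/m (cs2 = 1) and M/D/m (cs2 = 0)."""
    rho = lam / (m * mu)
    assert 0 < rho < 1, "system must be stable"
    w_mmm = erlang_c(m, rho) / (m * mu * (1 - rho))
    # Cosmetatos correction factor for deterministic service (M/D/m)
    f = (1 - rho) * (m - 1) * (math.sqrt(4 + 5 * m) - 2) / (16 * rho * m)
    w_mdm = 0.5 * (1 + f) * w_mmm
    return cs2 * w_mmm + (1 - cs2) * w_mdm

# Evaluation metrics for the ideal-allocation baseline
# (hypothetical parameters: 16 engines, rates in packets/s):
lam, mu, m, cs2 = 12.0, 1.0, 16, 0.8
rho = lam / (m * mu)              # processor utilization
W = mgm_wait(lam, mu, m, cs2)    # mean waiting time
N = lam * (W + 1 / mu)           # mean packets in system (Little's law)
```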
Example Evaluation Model – System II • Full Processor Allocation • Allocate entire tasks to subsets of processors • Allocate as few processors as possible to save power • Each processor runs one type of task • Reallocation is triggered by queue length • Modeled as a BCMP M/M/1-FCFS network (Jackson network)
BCMP Network • BCMP: Baskett, Chandy, Muntz, and Palacios • Characteristics: open, closed, and mixed queuing networks; several job classes; four types of nodes: M/M/m–FCFS (class-independent service time), M/G/1–PS, M/G/∞–IS, and M/G/1–LCFS-PR • Product-form steady-state solution: • Open M/M/1-FCFS BCMP Queuing Network: • Evaluation metrics: • F. Baskett, K. Chandy, R. Muntz, and F. Palacios. Open, Closed, and Mixed Networks of Queues with Different Classes of Customers. Journal of the ACM, 22(2):248-260, April 1975
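In the open M/M/1-FCFS case, the product-form solution reduces to independent per-station M/M/1 results once the traffic equations are solved. A minimal sketch (the station rates and routing matrix below are illustrative, not the paper's):

```python
import numpy as np

def jackson_metrics(gamma, R, mu):
    """Evaluation metrics for an open Jackson network.

    gamma: external arrival rate per station
    R:     routing matrix, R[i][j] = P(packet goes from i to j)
    mu:    service rate per station
    """
    gamma, R, mu = (np.asarray(x, float) for x in (gamma, R, mu))
    n = len(gamma)
    # Traffic equations: lambda_j = gamma_j + sum_i lambda_i * R[i][j]
    lam = np.linalg.solve(np.eye(n) - R.T, gamma)
    rho = lam / mu                 # per-station utilization
    assert (rho < 1).all(), "every station must be stable"
    N = rho / (1 - rho)            # mean packets per M/M/1 station
    return rho, N, N.sum()

# Two-stage pipeline: all packets enter station 0, then station 1.
rho, N, total = jackson_metrics(
    gamma=[5.0, 0.0],
    R=[[0.0, 1.0],
       [0.0, 0.0]],
    mu=[8.0, 10.0])
```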
Example Evaluation Model – System III • Partitioned Application Allocation • Tasks are partitioned across multiple processors • Synchronized pipelines • Allocate tasks equally across all processors to maximize throughput • Reallocate at fixed time intervals • Equations for the evaluation metrics are the same as for System II • Modeled as a BCMP M/M/1-FCFS network (Jackson network)
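Equal distribution of subtasks across all processors can be approximated by a simple load-balancing heuristic such as longest-processing-time-first. This is our own sketch of the idea, not the mapping algorithm the system actually uses:

```python
import heapq

def partition(subtask_costs, num_procs):
    """Spread subtasks (per-packet instruction counts) across
    processors, always giving the next-largest subtask to the
    currently least-loaded processor (greedy LPT heuristic)."""
    heap = [(0.0, p, []) for p in range(num_procs)]
    heapq.heapify(heap)
    for cost in sorted(subtask_costs, reverse=True):
        load, p, assigned = heapq.heappop(heap)
        heapq.heappush(heap, (load + cost, p, assigned + [cost]))
    return sorted(heap)  # (load, processor, assigned subtask costs)
```

Even a good heuristic leaves the allocation slightly unbalanced, which is one source of the overhead the results below attribute to this system.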
Outline • Introduction • Evaluation Methodology • Dynamic Workload Model • Runtime System Model • Results • Summary
Setup • System • 16 processing engines at 100 MIPS each • Queue lengths are infinite • Workload • Other assumptions • Applications are partitioned into 7-15 subtasks
Processor Allocation Over Time • Ideal: • 16 processors • Full Processor: • Changes with traffic • Partitioned Application: • 16 processors • (Figure: processor allocation in the full processor allocation system)
Processor Utilization Over Time • Ideal: • Lowest processor utilization • Full Processor: • Highest processor utilization because it uses the fewest processors • Partitioned Application: • Low processor utilization • Not equal to the ideal case due to unbalanced task allocation and pipeline overhead
Packets in System Over Time • Ideal: • Fewest packets • Full Processor: • Packets queue up due to its high processor utilization • Partitioned Application: • Most packets, due to unbalanced task allocation and pipeline overhead • More stable performance because of finer processor allocation granularity
Performance for Different Data Rates • Ideal: • Smooth increase • Full Processor: • Periodic peaks • Partitioned Application: • Smooth increase • Maximum data rate supported by each system • Ideal: 100% • Full Processor: 79.6% • Partitioned Application: 75.1%
Implications of the Results • Ideal Allocation • Provides a baseline • Full Processor Allocation • Allocates as few processors as possible to save power • Uses an entire processor as the allocation granularity • Good: high processor utilization • Bad: high performance variance • Partitioned Application Allocation • Distributes tasks equally across all processors • Finer processor allocation granularity • Good: stable performance • Bad: hard to find an optimal partitioning; pipeline synchronization adds overhead
Summary • Analytical methodology for evaluating different NP runtime support systems • Dynamic workload model and runtime system model • Results: 3 example runtime support systems • Quantitative metrics • Tradeoffs