Using Uncacheable Memory to Improve Unity Linux Performance. Ning Qu, Xiaogang Gou, Xu Cheng. Microprocessor Research and Development Center, Peking University
Issues • [Figure: Unity SoC architecture] • No snooping • Cache coherency problems everywhere
Issues cont. • Poor temporal locality • [Figure: I/O data path between the user process (process I/O buffer), the Linux kernel (kernel I/O buffer), DMA, and the I/O device (I/O device buffer)]
Motivation • Heavy cost of cache coherency operations • Many high-end embedded processors have a cache, but many of them offer very limited support for guaranteeing cache coherency • Poor locality leads to more data cache pollution • Caches rely on the property of locality • Some programs have poor locality, for example TCP/IP processing • How can these disadvantages be avoided? Uncacheable memory may be a solution!
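The pollution argument can be illustrated with a toy simulation. The sketch below is not from the slides: it assumes a small direct-mapped cache, and the cache size, the hot working set, and the buffer addresses are all hypothetical values chosen for illustration. It warms a hot working set, streams a large buffer once (as packet data might be touched once), and counts how many hot lines miss afterwards.

```python
# Toy direct-mapped cache: illustrative only, not a model of the Unity cache.
class DirectMappedCache:
    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # line address held in each slot
        self.misses = 0

    def access(self, line_addr):
        """Touch one cache line; count a miss and fill the slot if absent."""
        slot = line_addr % self.num_lines
        if self.tags[slot] != line_addr:
            self.misses += 1
            self.tags[slot] = line_addr


def hot_misses_after_stream(stream_cacheable, num_lines=64,
                            hot_lines=32, stream_lines=256):
    """Warm a hot set, stream a buffer once, then re-touch the hot set.

    Returns how many of the re-touches miss. All sizes are hypothetical.
    """
    cache = DirectMappedCache(num_lines)
    hot = range(hot_lines)                     # hypothetical hot working set
    stream = range(1000, 1000 + stream_lines)  # hypothetical one-shot buffer

    for addr in hot:
        cache.access(addr)
    for addr in stream:
        if stream_cacheable:
            cache.access(addr)  # cacheable buffer: fills slots, evicting hot lines
        # uncacheable buffer: the access bypasses the cache, nothing is evicted

    before = cache.misses
    for addr in hot:
        cache.access(addr)
    return cache.misses - before


polluted = hot_misses_after_stream(stream_cacheable=True)
bypassed = hot_misses_after_stream(stream_cacheable=False)
print(polluted, bypassed)  # prints "32 0"
```

With these made-up sizes the cacheable stream evicts the entire hot set, while the uncacheable stream leaves it intact. The model ignores the slower access time of uncacheable memory, which the slides list as its main drawback.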
Contributions • Analyze the scenarios in which the cache does not perform well and propose that uncacheable memory has two advantages • Eliminates most cache coherency operations • Avoids cache pollution • Apply uncacheable memory in Unity Linux to improve I/O performance • Some important aspects improve by 5% to 29%
Outline • Issues • Motivation • Contribution • Uncacheable Memory • Evaluation • Related Work • Conclusions
Using uncacheable memory: Recv Packet Flow • [Figure: steps 1 to 4 across user space (user buffer), kernel space (buffers), and the I/O device; labels: DMA copy, flush cache, simple data processing, CPU copy]
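Why the receive path needs a cache flush on a system without snooping can be sketched with a toy model. Everything here is an illustrative assumption rather than Unity Linux code: the class, the buffer address, and the reading of the slide's "flush cache" as discarding the cached copy are all hypothetical.

```python
class NoSnoopSystem:
    """Toy model: one CPU cache in front of memory; DMA writes memory directly."""

    def __init__(self):
        self.memory = {}
        self.cache = {}

    def dma_write(self, addr, value):
        # The device writes main memory. Without snooping, the cache is not updated.
        self.memory[addr] = value

    def cpu_read(self, addr, cacheable=True):
        if not cacheable:
            return self.memory[addr]              # uncacheable: always read memory
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]  # miss: fill from memory
        return self.cache[addr]

    def flush(self, addr):
        # Software coherency operation, modeled as dropping the cached copy.
        self.cache.pop(addr, None)


system = NoSnoopSystem()
BUF = 0x100                      # hypothetical kernel buffer address
system.memory[BUF] = "old"
system.cpu_read(BUF)             # CPU touches the buffer, caching "old"
system.dma_write(BUF, "packet")  # device delivers a new packet by DMA

stale = system.cpu_read(BUF)                      # "old": hit on stale data
uncached = system.cpu_read(BUF, cacheable=False)  # "packet": bypasses the cache
system.flush(BUF)
fresh = system.cpu_read(BUF)                      # "packet": refilled from memory
print(stale, uncached, fresh)  # prints "old packet packet"
```

The uncacheable read returns the new packet without any coherency operation, which is the first advantage the slides claim for uncacheable memory.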
Using uncacheable memory: Send Packet Flow • [Figure: steps 1 to 4 across user space (user buffer), kernel space (buffers), and the I/O device; labels: CPU copy, simple data processing, clean cache, DMA copy]
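The send direction has the mirror-image problem, which a second toy model can show. It assumes a write-back cache (suggested by the slide's "clean cache" step, but not stated outright); the class and the buffer addresses are hypothetical.

```python
class WriteBackSystem:
    """Toy model: write-back CPU cache; DMA reads main memory directly."""

    def __init__(self):
        self.memory = {}
        self.dirty = {}  # dirty lines still held in the write-back cache

    def cpu_write(self, addr, value, cacheable=True):
        if cacheable:
            self.dirty[addr] = value   # stays in the cache until cleaned
        else:
            self.memory[addr] = value  # uncacheable store goes straight to memory

    def clean(self):
        # Software coherency operation: write all dirty lines back to memory.
        self.memory.update(self.dirty)
        self.dirty.clear()

    def dma_read(self, addr):
        # The device sees only main memory, never the CPU cache.
        return self.memory.get(addr)


system = WriteBackSystem()
BUF = 0x200                           # hypothetical cacheable send buffer
system.memory[BUF] = "old"
system.cpu_write(BUF, "payload")
before_clean = system.dma_read(BUF)   # "old": payload is still in the cache
system.clean()
after_clean = system.dma_read(BUF)    # "payload"

UNCACHED_BUF = 0x300                  # hypothetical uncacheable send buffer
system.cpu_write(UNCACHED_BUF, "payload", cacheable=False)
uncached = system.dma_read(UNCACHED_BUF)  # "payload", no clean needed
print(before_clean, after_clean, uncached)  # prints "old payload payload"
```

In this model the uncacheable buffer is visible to the device immediately, so the clean step and its cost disappear from the send path.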
Cacheable vs. Uncacheable: DMA send and receive cost analysis
Cacheable vs. Uncacheable cont. • DMA Send: cache clean cost • DMA Recv: cache flush cost • [Figure: per-step cache activity; labels: load U to cache, load K to cache, load U into cache and store to K, load K]
Cacheable vs. Uncacheable cont. • [Figure: Recv and Send performance, CH vs NC]
Using Uncacheable Memory • Implemented in Unity Linux, ported from Linux 2.4.17 • Uncacheable page table • Eliminates cache coherency operations when modifying the page tables • Uncacheable socket buffer for sending • Eliminates cache coherency operations • Avoids data cache pollution
Outline • Motivation • Issues • Contribution • Uncacheable Memory? • Evaluation • Related Work • Conclusions
Methodology • Benchmarks: Netperf, Lmbench, and the Modified Andrew Benchmark • Experimental environment • 160 MHz Unity network computer with 256 MB DRAM and an SoC built-in 10M/100M Ethernet card • Dell 4600 server with two Intel Xeon PIII 700 MHz processors, 4 GB DRAM, and a 1000M/100M Ethernet card • All benchmarks are executed in single-user mode on NFS.
Netperf Benchmark Results • [Figure: Netperf TCP_STREAM send performance]
Netperf Benchmark Results cont. • [Figure: Netperf TCP_RR performance]
Lmbench Benchmark Results • [Figure: Lmbench performance]
Modified Andrew Benchmark Results • [Figure: Modified Andrew Benchmark]
Related Work • Related work: accelerating uncacheable memory performance • New memory types • Intel write-combining • MIPS R10000: uncached accelerated pages • New instructions • SPARC V9, ARM, Unity II: block move instructions • Future work: support for a new memory type • Reads behave like an ordinary cache, with low pollution • Writes behave like write-combining, without write-allocate
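The idea behind write-combining can be sketched as a transaction count. This is a deliberately simplified model, not a description of any real processor: it assumes a single combining buffer, a hypothetical 32-byte block size, and that a burst is issued only when a store lands in a different block.

```python
def bus_transactions(store_addrs, combine, block_bytes=32):
    """Count bus transactions for a sequence of uncached stores.

    Without combining, every store is its own transaction. With combining,
    consecutive stores to the same aligned block are merged into one burst.
    block_bytes is a hypothetical value chosen for illustration.
    """
    if not combine:
        return len(store_addrs)
    count = 0
    open_block = None  # block currently collected in the single buffer
    for addr in store_addrs:
        block = addr // block_bytes
        if block != open_block:
            count += 1  # start a new burst for a new block
            open_block = block
    return count


stores = list(range(0, 256, 4))  # 64 word-sized stores over 256 bytes
plain = bus_transactions(stores, combine=False)
combined = bus_transactions(stores, combine=True)
print(plain, combined)  # prints "64 8"
```

For a sequential copy such as filling a socket buffer, merging stores this way reduces the number of slow uncached transactions, which is why these memory types help offset the access-time cost of uncacheable memory.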
Conclusions • This paper focuses on the use of uncacheable memory • Pros: eliminates coherency operations and avoids data cache pollution • Cons: slow access time • Uncacheable memory can perform well with careful design that takes the specifics of the system into account
Thank You! Questions?