Critical Factors in Windows NT Server Performance Gary Cline AvaTech, Inc. 2509 Freetown Drive Reston, Virginia 20191
Objective • To provide an overview of some of the critical factors that impact Windows NT Server performance, including: • CPU, Memory, and PCI Bus Parameter Settings • Adapter Card Installation and Configuration • Disk and RAID Settings • Real-time Thread Priority • Processor Affinity • Registry Settings • Sources of Information
CPU, Memory, & PCI Bus Parameters • The Pentium processor was the first Intel processor implemented with an L2 cache. • The Pentium Pro processor incorporated the L2 cache into the processor package and increased the speed of the L2 cache to equal the processor's. • The Pentium II processor also packaged the L2 cache with the processor (on the processor cartridge), but ran it at only half the speed of the processor. • To offset the slower speed of the Pentium II's L2 cache, Intel increased the size of the Pentium II's L1 cache.
CPU, Memory, & PCI Bus Parameters • Second-generation Pentium II processors (Deschutes) operate with a front-side bus of 100 MHz. • The Pentium II Xeon processor was designed for high-end server applications. Its L2 cache operates at the speed of the processor, and the processor can address up to 64 GB of memory. • Since the size of the L2 cache cannot be increased without replacing the processor, purchase your servers with the largest L2 cache possible.
CPU, Memory, & PCI Bus Parameters • The speed of the system clock, which regulates the speed of the system bus, is determined by jumpers on the motherboard. • The speeds of the processor and the PCI bus are derived from the system bus speed by ratios that are also set by jumpers on the motherboard. • It is possible to over-clock the processor and the PCI bus. However, your processor may become unstable, and you must have high-performance PCI adapters that can operate at higher PCI bus speeds. Heat may also be a problem.
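The relationship between the system bus, processor, and PCI clocks can be sketched as simple arithmetic. The 3.5x processor multiplier and the PCI divider of 2 below are illustrative assumptions for a 66 MHz system bus; they are not taken from any particular motherboard manual, so check your own board's documentation.

```python
def derived_clocks(system_bus_mhz, cpu_multiplier, pci_divider):
    """Return (cpu_mhz, pci_mhz) derived from the system bus clock."""
    cpu_mhz = system_bus_mhz * cpu_multiplier
    pci_mhz = system_bus_mhz / pci_divider
    return cpu_mhz, pci_mhz

# Assumed stock settings: 66 MHz bus, 3.5x multiplier, PCI at half the bus speed.
stock = derived_clocks(66.0, 3.5, 2)
# Assumed over-clocked bus: the same ratios now push the PCI bus past 33 MHz.
overclocked = derived_clocks(75.0, 3.5, 2)

print("stock CPU %.1f MHz, PCI %.1f MHz" % stock)
print("over-clocked CPU %.1f MHz, PCI %.1f MHz" % overclocked)
```

Under these assumptions, raising only the system bus speed also raises the PCI clock, which is why the adapters must tolerate the higher PCI bus speed.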
CPU, Memory, & PCI Bus Parameters • An excellent source of information, including information on over-clocking, may be found at: • www.tomshardware.com/overclocking.html. • Overclocking an SMP server should only be considered after careful experimentation and as a measure of last resort. • Overclocking probably should not be done on an Intel processor faster than 233 MHz. However, overclocking is being done on AMD Athlon CPUs at speeds as high as 1,000 MHz.
CPU, Memory, & PCI Bus Parameters • Intel chipsets (processor, memory controller, PCI bus controller) have over two hundred settings, and many of these settings impact system performance. They include: • CPU to PCI Posting • CPU Pre-fetch • CPU Multiple Read Pre-fetch • CPU Line Read Pre-fetch • PCI Bus Master Enable • These settings are processor, BIOS, and manufacturer dependent.
CPU, Memory, & PCI Bus Parameters • While there are a large number of utilities for displaying and editing these settings for single-processor desktop units, you are dependent upon the manufacturer for multi-processor SMP servers. • Manufacturers do not necessarily configure the chipset for maximum performance; rather, they configure it for maximum stability. • You should have a copy of the factory chipset settings (basic & advanced), and you should have a copy of a software utility for displaying and editing the settings on your server.
CPU, Memory, & PCI Bus Parameters • Changing the values of the advanced settings is not for the faint-hearted. You may discover that your system will no longer boot. • Sources of CPU, Memory, and PCI bus parameters include: • www.sysopt.com System Optimization • www.miro.pair.com TweakBios program & BIOS Companion book • www.pcguide.com Excellent technical site
Adapter Card Install & Configuration • The maximum throughput of a 32-bit PCI bus operating at 33 MHz is 132 MB/sec. Actual throughput is around 20 MB/sec. • Manufacturers of high-performance SMP servers have introduced multiple PCI busses in a single chassis to improve I/O performance. • The secondary bus tends to be 10% faster than the primary bus. This is because the serial ports, parallel ports, and keyboard are connected via the primary bus.
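The 132 figure for the PCI bus follows from the bus clock and bus width, and it is a megabyte (not megabit) rate. A minimal sketch of the arithmetic, assuming a standard 32-bit bus and one transfer per clock; the function name is made up for illustration:

```python
def pci_peak_mb_per_sec(clock_mhz, bus_width_bits):
    """Peak rate in megabytes per second: one bus-width transfer per clock."""
    return clock_mhz * bus_width_bits / 8

peak = pci_peak_mb_per_sec(33, 32)   # 32-bit PCI at 33 MHz
observed = 20                        # actual figure quoted in the slide above
share = 100 * observed / peak        # percentage of the peak actually achieved

print("peak %.0f MB/sec, observed is %.0f%% of peak" % (peak, share))
```

The gap between the peak and the observed rate is one reason to spread adapters across multiple PCI busses.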
Adapter Card Install & Configuration • If you have one network adapter and one SCSI adapter, you should install the network adapter on the primary bus and the SCSI adapter on the secondary bus. • If you have multiple SCSI adapters, then balance them evenly across all of the busses. • If you have multiple network adapters, then purchase a multi-port board. Adaptec and Aurora Technologies each offer a 4-port Ethernet adapter.
Adapter Card Install & Configuration • Use only high-performance PCI adapters that support PCI bus mastering and burst mode. • Avoid adapters using programmed I/O peripherals, because they use the PCI bus inefficiently. • Never use EISA bus adapters. • Ensure that you are using the most current device drivers.
Adapter Card Install & Configuration • Do not assume that the manufacturer will follow these guidelines. • Do not install a sound card in a server. • Do not enable the screen saver with anything other than a blank screen.
Disk and RAID Settings • Disk operations are measured in milliseconds, while CPU and PCI bus operations are measured in nanoseconds or microseconds. • Disk subsystems are important because the physical orientation of the data stored on disk has an influence on overall server performance. • A detailed understanding of how disk subsystems operate is critical for effectively solving many server bottlenecks.
Disk and RAID Settings • Do not use EIDE disk drives. The EIDE interface does not handle multiple I/O requests very efficiently, and it consumes more CPU capacity per I/O than SCSI. • Obtain the technical specifications as well as software utilities for changing the settings for the disk drives and adapters in your servers.
Disk and RAID Settings • The National Software Testing Laboratories reports that defragmenting your disk drives can dramatically improve your server's performance. • Benchmark tests show improvements greater than 20% running Microsoft Exchange and SQL Server 7.0. • Windows 2000 will be delivered with defragmentation software from Executive Software, but it is the lite version. • The benchmark report can be found on Executive Software's web page at http://www.execsoft.com/nstl-dk/
Disk and RAID Settings • Run disk benchmarks to determine the throughput of your disk subsystem, and compare the numbers you obtain with industry standards. • As a general rule, do not attach more than three or four disk drives to a single SCSI adapter. • RAID was created to address the huge gap between computer I/O requirements and the latency and throughput of a single disk drive.
Disk and RAID Settings • RAID is a collection of techniques that treat a redundant array of inexpensive disk drives as a single unit with the objective of improving performance and reliability. • RAID manufacturers employ several RAID strategies, known as RAID levels, each with its own advantages and disadvantages.
RAID Levels Defined • RAID 0 stripes data across all disks, with no redundancy or parity. • RAID 1 mirrors data across multiple disks. • RAID 2 interleaves data at the bit level across multiple disks with error-correcting (Hamming) code information. This level is not used in practice. • RAID 3 and 4 stripe data across multiple drives and write parity to a dedicated drive. • RAID 5 stripes data and parity information at the block level across all the drives in the array.
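The parity used by RAID 3, 4, and 5 is commonly computed as the bitwise XOR of the data blocks in a stripe, which lets the array rebuild any single lost block from the survivors. A minimal sketch of that idea follows; the block contents and the helper name xor_blocks are made up for illustration, and a real controller works on much larger blocks in firmware.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks from one stripe (contents are made up for illustration).
data = [b"NTFS", b"RAID", b"DISK"]
parity = xor_blocks(data)

# Simulate losing the second drive and rebuilding its block from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)
```

The same property explains the write cost of parity RAID: every data write also requires the parity block to be updated.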
Disk and RAID Settings • Many factors affect RAID performance. The most significant factors, in order of importance, include: • RAID strategy • Number of disk drives • Drive performance • Firmware level • Stripe size • SCSI bus configuration • Write-back. • In general, adding disk drives is one of the most effective changes that can be made to increase overall server performance.
Disk and RAID Settings • Optimum performance of your disk subsystem is highly dependent upon the characteristics of your applications' I/O requirements. • You may consider employing RAM disk software to eliminate I/Os on your disk subsystem for high-usage files. • An excellent source of information on RAID technology is the RAID Advisory Board. Their web page can be found at www.raid-advisory.com.
NT Scheduler & Thread Priority • Windows NT implements a priority-driven, preemptive scheduling system. • When a thread is selected to run, it runs for an amount of time called a Quantum. • Typically, the default Quantum on a multiprocessor Pentium server is 180 msec. • Normally, each time a thread completes its time slice, the value of its Quantum is decremented by one; this continues until the value reaches zero, at which point it is reset to its specified value.
NT Scheduler & Thread Priority • This can result in an uneven allocation of CPU resources for similar transactions in the same application. • An NT process can be designated as "real-time," in which case the threads associated with that process do not have the value of their Quantum decremented. • Care must be exercised, because you can deny CPU resources to the non-real-time processes on your server.
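The starvation risk can be illustrated with a toy model of a priority-driven scheduler. This is a simplified sketch, not NT's actual scheduler: it assumes every thread is always ready to run (CPU-bound), ignores priority boosts, and uses made-up thread names. The priority values assume NT's convention that levels 16 through 31 belong to the real-time class; on a real system a process is typically placed in that class with the Win32 SetPriorityClass call and REALTIME_PRIORITY_CLASS.

```python
def run_scheduler(threads, quanta):
    """Toy priority-driven scheduler: each quantum goes to the highest-priority
    ready thread; ties are broken round-robin. Returns quanta used per thread."""
    used = {name: 0 for name, _ in threads}
    last_run = {name: -1 for name, _ in threads}
    for tick in range(quanta):
        top = max(priority for _, priority in threads)
        candidates = [name for name, priority in threads if priority == top]
        # Among equal priorities, pick the thread that ran least recently.
        chosen = min(candidates, key=lambda name: last_run[name])
        used[chosen] += 1
        last_run[chosen] = tick
    return used

# Both threads at an assumed normal priority of 8: the CPU is shared.
normal_only = run_scheduler([("db", 8), ("web", 8)], 10)
# "db" moved to an assumed real-time priority of 24: "web" never runs.
with_realtime = run_scheduler([("db", 24), ("web", 8)], 10)
print(normal_only)
print(with_realtime)
```

In this model the lower-priority thread receives no quanta at all once a CPU-bound real-time thread is present, which is the behavior to guard against on a production server.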
Processor Affinity • In a multiprocessor environment, a thread will typically execute on any available processor. For a CPU-intensive thread, this may not be the most efficient means of execution. • Each processor has its own L1 and L2 cache. Should a thread execute on a processor other than the one it last ran on, the cache for this thread will most likely have to be refreshed. • Windows NT provides facilities for limiting the processors on which a thread is allowed to run. This is called processor affinity.
Processor Affinity • It should only be employed on CPU-intensive threads, such as those of a DBMS. • It can also be used to segregate the execution of two applications sharing the same server. • Processor affinity can have the negative effect of increasing system overhead. You should experiment with it before using it in a production environment.
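Processor affinity is expressed as a bit mask in which bit n stands for processor n; the Win32 calls SetProcessAffinityMask and SetThreadAffinityMask take such a mask. The sketch below only builds and decodes masks and does not call the operating system. The helper names and the four-processor split between a DBMS and a second application are hypothetical.

```python
def cpus_to_mask(cpus):
    """Build an affinity bit mask: bit n set means processor n is allowed."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask

def mask_to_cpus(mask):
    """List the processor numbers whose bits are set in the mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]

# Hypothetical split on a four-processor server:
# the DBMS gets CPUs 0-2 and the second application gets CPU 3.
dbms_mask = cpus_to_mask([0, 1, 2])
other_mask = cpus_to_mask([3])
print(hex(dbms_mask), hex(other_mask))
```

Keeping the two masks disjoint is what segregates the applications; overlapping masks would let them compete for the same processors and caches again.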
Registry Settings • There are several registry settings that can dramatically impact your server's performance. • A high cache hit ratio is vital to the performance of your NT server. There is an L2 cache setting in the Windows NT registry. • To manually edit the registry, open HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Memory Management and change the SecondLevelDataCache value to the size of your second-level cache.
Registry Settings • IoPageLockLimit determines how much memory NT may lock for I/O operations at one time, which bounds the amount of data it can read from or write to the hard disk in a single operation. • If your system performs a significant number of physical I/Os, then raising this limit may improve the throughput of your disk subsystem. • To manually edit the registry, open HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Memory Management and change the IoPageLockLimit value. • The default is zero.
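As an alternative to editing by hand, the two values can be written to a .reg file and imported with the registry editor. The sketch below only generates the file text; it does not touch the registry. It assumes the REGEDIT4 file format, that SecondLevelDataCache is expressed in kilobytes, and that IoPageLockLimit is expressed in bytes. The 512 KB and 8 MB figures are illustrative, not recommendations, so confirm the units and values for your system before importing anything.

```python
KEY = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"
       r"\Control\Session Manager\Memory Management")

def reg_file(values):
    """Render DWORD values as the text of a REGEDIT4-format .reg file."""
    lines = ["REGEDIT4", "", "[%s]" % KEY]
    for name, number in values.items():
        lines.append('"%s"=dword:%08x' % (name, number))
    return "\r\n".join(lines) + "\r\n"

# Assumed example values: a 512 KB L2 cache, and an 8 MB I/O lock limit in bytes.
text = reg_file({
    "SecondLevelDataCache": 512,
    "IoPageLockLimit": 8 * 1024 * 1024,
})
print(text)
```

Back up the registry before importing a generated file, and benchmark before and after so the effect of each value can be measured.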
Performance Validation • Employ benchmarking tools to provide an accurate assessment of your servers’ performance • There are benchmarks for exercising server components, servers, and total systems • Some software suppliers provide benchmarks for their application software • Excellent source: www.benchmarkresources.com
Performance Prediction • Analytical modeling and simulation tools can provide performance predictions for planned upgrades and/or configuration changes. • Companies that offer modeling or simulation tools include: • BMC Software (Best/1) www.bmc.com • SES, Inc. (SES/Workbench) www.ses.com • AST Engineering Services (QASE) www.aetes.com • netFUSION (ProSim/NT) www.netfusion.com
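To give a feel for what an analytical model predicts, here is a minimal sketch using the textbook single-server queue (M/M/1), where mean response time is R = S / (1 - U) for service time S and utilization U. This is not the method used by any of the products listed above, and the workload numbers are assumptions chosen for illustration.

```python
def mm1_response_time(service_time_sec, arrivals_per_sec):
    """Mean response time of a single-server queue (M/M/1): R = S / (1 - U)."""
    utilization = service_time_sec * arrivals_per_sec
    if utilization >= 1.0:
        raise ValueError("device is saturated; the queue grows without bound")
    return service_time_sec / (1.0 - utilization)

# Assumed workload: 50 I/Os per second against a disk that needs 10 ms per I/O.
current = mm1_response_time(0.010, 50)
# Planned upgrade: a faster disk that needs 8 ms per I/O.
upgraded = mm1_response_time(0.008, 50)
print("current %.1f ms, upgraded %.1f ms" % (current * 1000, upgraded * 1000))
```

Even this simple model shows why a modest reduction in service time can yield a larger reduction in response time on a busy device.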
Sources of Information • The web is a great source of information. Most of the more prominent manufacturers have libraries of white papers on how to efficiently configure their servers • There are too many sources and more are added each month. It's best to employ internet search engines to find information specific to your needs. However, one excellent site I recently discovered is: http://www.pureperformance.com
Sources of Information • You might also try www.deja.com. It's a search engine for the Internet newsgroups. For example you might search newsgroups with the query: seagate AND (performance AND (problem OR tuning)) • Four excellent books include: • Inside Windows NT (2nd Edition), David A. Solomon, Microsoft Press • Optimizing Windows NT, Sean K. Daily, IDG Press • Tuning & Sizing NT Server, Curt Aubley, Prentice Hall • Windows NT Applications: Measuring and Optimizing Performance
Summary Recommendations • Intel-based systems do not come out of the box optimized for performance. Obtain the necessary software utilities to display and change your chipset settings. • Relatively small configuration changes can dramatically change the performance of your server, for better or for worse. A busy PCI bus can appear in NT's Performance Monitor to be a busy-CPU problem. • You should balance the I/O workload across all of the PCI busses in your server.
Summary Recommendations • What you don't know about your disk subsystem may be dramatically hurting your server's performance. • You should employ both system and component benchmarks to baseline the performance of your server before and after making changes. • There is adequate documentation available to walk you through a performance improvement project.
Questions, Comments, or Experiences • Should you have questions or comments, or want to share experiences, you can contact me by the following methods: • Telephone: 703-391-2142, fax 703-391-2140 • E-mail: email@example.com • Postal mail: 2509 Freetown Drive, Reston, Virginia 20191 • Visit our web site at www.avatech-usa.com, if only to obtain an electronic copy of this and other presentations and papers.