
The Inside Scoop: How Microsoft Built a Scale Lab with 120 Million Items across two 15 TB Content Databases

SPC399. The Inside Scoop: How Microsoft Built a Scale Lab with 120 Million Items across two 15 TB Content Databases. Paul Andrew Sr. Technical Product Manager Microsoft Corporation. Paul J. Learning Sr. Consultant Microsoft Corporation. Barry Waldbaum FAST Architect Microsoft Corporation.


Presentation Transcript


  1. SPC399 The Inside Scoop: How Microsoft Built a Scale Lab with 120 Million Items across two 15 TB Content Databases • Paul Andrew, Sr. Technical Product Manager, Microsoft Corporation • Paul J. Learning, Sr. Consultant, Microsoft Corporation • Barry Waldbaum, FAST Architect, Microsoft Corporation

  2. Session Objectives and Takeaways • Describe the very large-scale test lab we built for SharePoint • The test lab was shown in the keynote by Jeff Teper and Richard Riley • Present test results for six series of tests on the populated farm • Review the architecture for SharePoint and FAST • Identify lessons learned from building a large-scale environment • Discuss the tools used to create and load content and to run performance tests

  3. Project Overview and Results Paul Andrew, Sr. Technical Product Manager

  4. Scale Lab Test Goals • Demonstrate very large SharePoint Farm • Example of new SharePoint Boundaries and Limits • Enterprise Content Management (ECM) document archive scenario • Use average Office document types • Largest scale limits are document archive focused • Scale out across multiple content databases • Adds scale out and scale up • Test SharePoint without limits on hardware or storage resources • Index content with FAST Search • Load test with 15,000 concurrent users • Test upgrading on a very large farm

  5. FAST Search Indexes Multiple SharePoint Content Databases • Content database limit: 60 million items • Scale out permits multiple content databases • New documents are saved to a drop box • Content routing rules move them to the archive • Separate content databases • Index all content with FAST • Diagram: Documents → Drop Box Document Library → Content Routing → Archive Content Database(s)
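The drop box and content routing flow above can be sketched as a priority-ordered rule list where the first matching rule wins and unmatched documents stay in the drop box. This is a minimal illustration only: `RoutingRule`, `route`, the `Year` property and the archive URLs are all hypothetical, not the SharePoint Content Organizer API. Because a site collection lives in a single content database, routing to libraries in different site collections is one way documents can end up spread across content databases.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

DROP_BOX = "Drop Box"  # documents that match no rule stay in the drop box


@dataclass(order=True)
class RoutingRule:
    """Hypothetical routing rule: only `priority` takes part in ordering."""
    priority: int
    name: str = field(compare=False)
    condition: Callable[[Dict[str, str]], bool] = field(compare=False)
    target: str = field(compare=False)


def route(doc: Dict[str, str], rules: List[RoutingRule]) -> str:
    """Return the target of the first matching rule, lowest priority number first."""
    for rule in sorted(rules):
        if rule.condition(doc):
            return rule.target
    return DROP_BOX


# Hypothetical archive libraries, each in its own site collection (and so
# potentially in its own content database).
rules = [
    RoutingRule(1, "2010 archive", lambda d: d.get("Year") == "2010",
                "/sites/archive2010/Documents"),
    RoutingRule(2, "2011 archive", lambda d: d.get("Year") == "2011",
                "/sites/archive2011/Documents"),
]

print(route({"Year": "2011"}, rules))  # /sites/archive2011/Documents
print(route({"Year": "1999"}, rules))  # Drop Box
```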

  6. Software Boundaries and Limits Impacted • New boundaries and limits for SharePoint released in July 2011 • SharePoint can scale to any customer requirement • Partly thanks to this test lab • Content databases up to 200GB supported as before • Up to 4TB supported for ALL scenarios with requirements guidance • Unlimited size supported for Document Archive scenarios with requirements guidance • New limit of 60 million items in a content database • 5TB SQL Server database instance limit is removed • Remote Blob Storage (RBS) does not alter these limits
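The content database limits on this slide can be expressed as a small decision function. This is a sketch, not an official support checker: the function name and return strings are hypothetical, "requirements guidance" is reduced to a label, and 1 TB is assumed to be 1024 GB. The example assumes each of the lab's two 15 TB databases held half of the 120 million items.

```python
ITEM_LIMIT = 60_000_000            # items per content database
GENERAL_LIMIT_GB = 200             # supported as before
ALL_SCENARIOS_LIMIT_GB = 4 * 1024  # 4 TB, assuming 1 TB = 1024 GB


def classify_content_db(size_gb: float, items: int, document_archive: bool) -> str:
    """Rough support category for one content database under the July 2011 limits."""
    if items > ITEM_LIMIT:
        return "exceeds the 60 million item limit"
    if size_gb <= GENERAL_LIMIT_GB:
        return "supported"
    if size_gb <= ALL_SCENARIOS_LIMIT_GB:
        return "supported with requirements guidance"
    if document_archive:
        # No size cap for document archive scenarios, but guidance still applies
        return "supported for document archive with requirements guidance"
    return "exceeds the 4 TB limit for non-archive scenarios"


# One of the lab's two 15 TB document archive databases
print(classify_content_db(15 * 1024, 60_000_000, document_archive=True))
```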

  7. Value of Remote Blob Storage (RBS) • RBS allows Binary Large Objects (BLOBs) to be stored outside SQL Server • Reduces the SQL Server database to metadata only • This may be just 5% of the total SharePoint content database size • RBS does not alter SharePoint content size limits • BLOBs and metadata must be kept synchronized during backup and restore • Storage must return time to first byte (TTFB) under 20 ms • RBS extensions must use supported SharePoint APIs and must not access the SQL database directly • RBS Benefits • Allows use of NAS (with iSCSI) • ISVs adding tiered storage • ISVs adding custom backup and restore and other management features • Performance improvements have been seen with files larger than 1 MB • Useful in write-once archive scenarios • We didn’t use RBS in this test lab
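A rough sizing sketch of the RBS points above: the SQL Server footprint shrinks to roughly the metadata fraction, while the supported limits are still evaluated against the full content size, and storage latency is checked against the 20 ms TTFB requirement. The 5% figure is the slide's "may be" estimate, the function names are hypothetical, and the TTFB samples are made-up inputs, not lab measurements.

```python
from typing import Iterable

RBS_METADATA_FRACTION = 0.05  # "may be just 5%"; the real ratio depends on the content
TTFB_LIMIT_MS = 20.0          # storage must return TTFB under 20 ms


def sql_size_with_rbs_gb(content_db_gb: float,
                         metadata_fraction: float = RBS_METADATA_FRACTION) -> float:
    """Estimated size left in SQL Server once BLOBs move to RBS storage."""
    return content_db_gb * metadata_fraction


def limit_basis_gb(content_db_gb: float) -> float:
    """RBS does not alter the limits, so they apply to the full content size."""
    return content_db_gb


def ttfb_ok(samples_ms: Iterable[float], limit_ms: float = TTFB_LIMIT_MS) -> bool:
    """True if every measured time to first byte is under the limit."""
    return all(sample < limit_ms for sample in samples_ms)


size_gb = 15 * 1024  # a 15 TB content database
print(sql_size_with_rbs_gb(size_gb))  # roughly 768 GB of metadata in SQL Server
print(limit_basis_gb(size_gb))        # limits still evaluated against 15360 GB
print(ttfb_ok([4.2, 11.8, 19.5]))     # True
```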

  8. Very Large Scale Lab Whitepaper • The report with all of this detail was published on Monday • http://go.microsoft.com/fwlink/?LinkId=229493 or http://blogs.msdn.com/pandrew

  9. Partner Contributions • NEC: provided the Express5800 servers • Intel: provided Westmere processors • EMC: provided the VNX5700 SAN

  10. Testing Baseline on 100 million items, 30 TB

  11. Load Test Series A – Vary User Load • 4,000, 10,000 and 15,000 users • Web front ends consistently used 2.5 GB of RAM • CPU use on WFEs went down from 55% to 30% at 15,000 users • The 15,000-user load introduced some response delay

  12. Load Test Series B – Vary SQL RAM • 16GB, 32GB, 64GB, 128GB, 256GB, 600GB • No significant change in performance • Response time follows a curve, but all results are under 1 second

  13. Load Test Series C – Vary Search Transaction Mix • 15%, 30%, 40%, 50%, 50%, 75% • Maximum of about 75 search queries per second for this farm • At a 75% search mix, the farm exceeded its search capacity
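One way to read the roughly 75 search queries per second ceiling is to ask how much total load the farm can take at a given search mix before search alone saturates. The sketch below does that arithmetic under a strong simplifying assumption (each search transaction is exactly one query and one request), so it ignores the multiple requests a real page view generates; the function name is hypothetical.

```python
SEARCH_QPS_CAPACITY = 75.0  # approximate maximum observed for this farm


def max_total_rps(search_fraction: float,
                  qps_capacity: float = SEARCH_QPS_CAPACITY) -> float:
    """Highest total request rate before search alone reaches capacity.

    Simplifying assumption: each search transaction is exactly one search query
    and counts as one request.
    """
    if not 0 < search_fraction <= 1:
        raise ValueError("search_fraction must be in (0, 1]")
    return qps_capacity / search_fraction


for mix in (0.15, 0.30, 0.40, 0.50, 0.75):
    print(f"{mix:.0%} search mix -> at most {max_total_rps(mix):.0f} RPS")
```

Under this model, the higher the search share, the lower the total request rate at which the search tier becomes the bottleneck, which is consistent with the 75% mix exceeding capacity.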

  14. Load Test Series D – Vary Front End Server RAM • 4GB, 6GB, 8 GB, 16GB • No impact on Requests Per Second • Minimal impact on response time at 4GB

  15. Load Test Series E – Vary Number of Web Front Ends • 2, 3, 4, 5, 6 • 2 WFEs were clearly not enough; RPS was also slightly lower with 2 • Chart shows CPU use decreasing as the number of WFEs increases

  16. Load Test Series F – Vary SQL Server CPUs • 4, 6, 8, 16 and 80 CPUs • Page response time improved as more CPU resources became available • Minimal impact on RPS at 4 CPUs

  17. Results from the Lab • Report published this week with all details of the test farm and results • Published document generator and load tools • Published increased software boundaries and limits for SharePoint • 120 million 256KB items loaded into a 30TB SharePoint farm • FAST Search indexed 100 million items • Farm renders pages and search results under load in 0.2 seconds
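A quick check that 120 million items of about 256 KB each does land near the quoted 30 TB. This sketch assumes 1 KB = 1024 bytes and counts file content only, ignoring metadata and database overhead; the function name is hypothetical.

```python
from typing import Tuple


def corpus_size(items: int, item_kib: int) -> Tuple[float, float]:
    """Return (decimal TB, binary TiB) for items of a fixed size in KiB."""
    total_bytes = items * item_kib * 1024
    return total_bytes / 10**12, total_bytes / 2**40


tb, tib = corpus_size(120_000_000, 256)
print(f"{tb:.1f} TB (decimal), {tib:.1f} TiB (binary)")  # 31.5 TB (decimal), 28.6 TiB (binary)
```

Either unit convention puts the corpus at roughly 30 TB.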

  18. SharePoint Architecture Paul Learning, MCS Sr. Consultant

  19. Very Large SharePoint farm demo Paul Learning Sr. Consultant Microsoft Consulting Services

  20. Did you see the keynote?

  21. Logical Architecture

  22. Physical Architecture

  23. Hardware in the Lab – Physical • SPDC01 (Domain Controller, DNS) • 4 CPU cores, 8GB RAM, 33GB disk • PACNEC01 (SQL Server host) • NEC Express5800/A1080a Server • 80 CPU cores, 1TB RAM, 2x 8Gb Fiber Optic HBAs • SharePoint service application DBs and content DBs, FAST admin DBs • PACNEC02 (Hyper-V host) • NEC Express5800/A1080a Server • 64 CPU cores, 1TB RAM, 2x 8Gb Fiber Optic HBAs • VNX5700 (Storage Area Network – SAN) • 250x 600GB 7200 RPM SAS and 75x 2TB 5400 RPM NL-SAS drives in RAID10 for 120 TB • 2x 8Gb Fiber Optic HBAs

  24. Hardware in the Lab – Virtual • 35 virtual machines in total • Testrig1 through 20: Visual Studio controller and test agents • APP-1: Central Administration, FAST SSA (crawler, query) • APP-2: service applications, FAST SSA, FAST Search Center • FAST-SSA-1/2: FAST service and administration • FAST-IS1/4: FAST Search indexers (index, search, web analyzer) • WFE-CRAWL1: dedicated FAST Search crawl target WFE • WFE-1/6: SharePoint WFEs
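The virtual machine list can be tallied to confirm it adds up to the stated 35 VMs. The grouping labels are paraphrased from the slide, and the counts assume "Testrig1 through 20" means 20 machines and names such as "FAST-IS1/4" mean machines 1 through 4.

```python
# Role group -> number of VMs, read from the slide (ranges assumed inclusive)
VM_INVENTORY = {
    "Testrig (Visual Studio controller and test agents)": 20,
    "APP (Central Administration, service applications, FAST SSA)": 2,
    "FAST-SSA (FAST service and administration)": 2,
    "FAST-IS (FAST Search indexers)": 4,
    "WFE-CRAWL (dedicated crawl target WFE)": 1,
    "WFE (SharePoint web front ends)": 6,
}

total_vms = sum(VM_INVENTORY.values())
print(total_vms)  # 35
```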

  25. Hardware in the Lab – Storage Area Network (SAN) • Content segregation to unique LUNs by database type is CRITICAL for reliability, high-scale and high-performance!

  26. Data Sizing Details • Each NEC 1080a had 8x 146 GB drives • Two Document Center Sizes reflected below • Corpus total was ~30TB content
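Applying the "2 IOPS per GB" recommendation cited in the SharePoint lessons learned to the ~30 TB corpus gives a rough aggregate IOPS target. This sketch assumes decimal terabytes and treats the corpus as one pool; a real plan would size each database and LUN separately, as the SAN segregation guidance suggests. The function name is hypothetical.

```python
RECOMMENDED_IOPS_PER_GB = 2.0  # guidance cited on the lessons-learned slide


def required_iops(data_gb: float, iops_per_gb: float = RECOMMENDED_IOPS_PER_GB) -> float:
    """Total IOPS the storage should sustain for a given amount of data."""
    return data_gb * iops_per_gb


corpus_gb = 30 * 1000  # ~30 TB corpus, assuming decimal terabytes
print(required_iops(corpus_gb))  # 60000.0
```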

  27. Data IOPS

  28. Document Creation and Loading • BulkLoader utility • Creates up to 10 million unique Word, Excel, PowerPoint and HTML documents • Variable size (250KB used in the lab) • Requires .NET Framework 4.0, the Open XML SDK 2.0 and a Wikipedia dump file • http://code.msdn.microsoft.com/Bulk-Loader-Create-Unique-eeb2d084 • LoadBulk2SP utility • 4 processes with 16 threads each, targeting unique document libraries • Mimics the folder/file hierarchy from the file system • Loads using the SPFileCollection.Add() method • Top load rate achieved was 233 documents/second • Average load rate achieved was 127 documents/second • http://code.msdn.microsoft.com/Load-Bulk-Content-to-3f379974
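The loader's 4 × 16 worker layout and its measured rates can be turned into a back-of-the-envelope plan. The round-robin split below is a hypothetical way to divide files among 64 workers; the slide does not say how LoadBulk2SP actually schedules work, and the real upload step (SPFileCollection.Add() in .NET) is not shown. The duration estimate simply divides item count by the sustained rate.

```python
from itertools import cycle
from typing import Iterable, List

PROCESSES = 4
THREADS_PER_PROCESS = 16
SECONDS_PER_DAY = 86_400


def partition_round_robin(paths: Iterable[str],
                          workers: int = PROCESSES * THREADS_PER_PROCESS) -> List[List[str]]:
    """Deal files out to workers in turn so each gets a near-equal share."""
    buckets: List[List[str]] = [[] for _ in range(workers)]
    for bucket, path in zip(cycle(buckets), paths):
        bucket.append(path)
    return buckets


def load_days(items: int, docs_per_second: float) -> float:
    """Wall-clock days to load `items` at a sustained rate."""
    return items / docs_per_second / SECONDS_PER_DAY


print(f"{load_days(120_000_000, 127):.1f} days at the average rate")  # 10.9
print(f"{load_days(120_000_000, 233):.1f} days at the top rate")      # 6.0
```

At the average rate, loading 120 million items takes on the order of eleven days of continuous loading.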

  29. Applying Service Pack 1 and June Cumulative Update

  30. SPC11 Keynote Demo • SQL Server “Denali” CTP3 Refresh • Windows Cluster Services • SQL Availability Group • Client Access Point with Clustered IP address • Reconstructed and connected to original SAN and Virtual Network • Full Farm Failover

  31. Lessons Learned for SharePoint • PURELY OUT-OF-BOX INSTALLATION FOR THE LARGE-SCALE LAB • No caching enabled • No thresholds • No site quotas • Storage provisioned to the recommended 2 IOPS per GB • SPFileCollection.Add() vs. SPFolder.CopyTo() • Add achieved a maximum of 233 documents/second with 16 concurrent threads • CopyTo achieved a maximum of 31 documents/second • Loopback check registry key • Create the DWORD value DisableLoopbackCheck and set it to 1 to disable the loopback check • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\DisableLoopbackCheck=1
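The loopback check setting above can be captured as a standard `.reg` file so that it is applied identically on every server. This sketch only generates the file contents; the function names are hypothetical, and the file is written as UTF-16, which is the usual encoding for the "Version 5.00" format. Importing it with regedit remains a manual step.

```python
LOOPBACK_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa"


def loopback_reg_file() -> str:
    """Contents of a .reg file that sets DisableLoopbackCheck (DWORD) to 1."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{LOOPBACK_KEY}]",
        '"DisableLoopbackCheck"=dword:00000001',
        "",
    ]
    return "\r\n".join(lines)


def write_reg_file(path: str) -> None:
    # .reg files in the "Version 5.00" format are conventionally UTF-16 encoded;
    # newline="" keeps the CRLF line endings exactly as written.
    with open(path, "w", encoding="utf-16", newline="") as handle:
        handle.write(loopback_reg_file())


print(loopback_reg_file())
```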

  32. Lessons Learned for SQL Server • SQL Server MAXDOP=1 (default installation value is 0) • Multiple LUNs on the SAN and one virtual CPU for each LUN • Database segregation to unique LUNs, spindles and CPUs • Reduced SQL Server RAM to 600 GB • Table index fragmentation (bulk load only) • SharePoint timer job (Health Analyzer rule) did not function correctly • Microsoft.SharePoint.Administration.Health.DatabasesAreFragmented • Table indexes closely monitored during content loading • Identified the indexes most affected by the load and created a SQL stored procedure that executes ALTER INDEX for dynamic rebuilds • The stored procedure also runs a job to update statistics • The procedure can be run dynamically at load start (application configuration)
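The SQL Server settings above can be sketched as generated T-SQL: one script to set MAXDOP to 1 and one that rebuilds or reorganizes fragmented indexes and then updates statistics. This is not the lab's stored procedure, which was not published on the slide. The 5% and 30% thresholds are commonly cited index-maintenance guidance, used here as an assumption, and the table and index names are hypothetical. In practice the fragmentation values would be read from `sys.dm_db_index_physical_stats`.

```python
from typing import Iterable, List, Optional, Tuple

# (schema, table, index, avg_fragmentation_in_percent)
IndexStat = Tuple[str, str, str, float]


def maxdop_script(maxdop: int = 1) -> str:
    """T-SQL that sets the instance-wide max degree of parallelism."""
    return (
        "EXEC sp_configure 'show advanced options', 1;\nRECONFIGURE;\n"
        f"EXEC sp_configure 'max degree of parallelism', {maxdop};\nRECONFIGURE;\n"
    )


def index_action(fragmentation_pct: float) -> Optional[str]:
    """Assumed thresholds: rebuild above 30%, reorganize above 5%."""
    if fragmentation_pct > 30:
        return "REBUILD"
    if fragmentation_pct > 5:
        return "REORGANIZE"
    return None


def maintenance_script(indexes: Iterable[IndexStat]) -> str:
    """ALTER INDEX statements for fragmented indexes, then UPDATE STATISTICS per table."""
    statements: List[str] = []
    touched: List[Tuple[str, str]] = []
    for schema, table, index, frag in indexes:
        action = index_action(frag)
        if action is None:
            continue
        statements.append(f"ALTER INDEX [{index}] ON [{schema}].[{table}] {action};")
        if (schema, table) not in touched:
            touched.append((schema, table))
    statements += [f"UPDATE STATISTICS [{s}].[{t}];" for s, t in touched]
    return "\n".join(statements)


# Hypothetical table and index names, for illustration only
sample = [
    ("dbo", "ExampleDocs", "IX_Example_A", 45.0),
    ("dbo", "ExampleDocs", "IX_Example_B", 12.0),
    ("dbo", "ExampleLinks", "IX_Example_C", 2.0),
]
print(maxdop_script())
print(maintenance_script(sample))
```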

  33. FAST Search for SharePoint Barry Waldbaum, MCS Architect

  34. FAST Search Server 2010 for SharePoint • Deep refinement • Thumbnails • Sort on any field • Similar results • Previews • Built on the SharePoint Search Center • Leverages all of the innovations in SharePoint: open Web Parts, federation, query suggestions, related queries, “Did you mean?” • Visual results connect users with content • Thumbnails for Word and PowerPoint • Visual Best Bets highlight premium content • Preview in the browser without leaving the results

  35. Why was this interesting to me? • Big goals • Access to big iron! • Virtualized hardware and storage • SharePoint topology • Crawling SharePoint vs. file share content • Monitoring at this scale

  36. Screenshots

  37. Screenshots

  38. FAST Topology • 2 Physical nodes for document processing • 4 VMs • (16GB + 4 VCPUs) • Index, Search, Web analyzer • Disks: • C: 128GB VHD • (not expanded, < 40GB used) • E: 3TB LUN • IO Observed: • 100MB/s Reads, 100MB/s Writes, 1K IOPS

  39. SharePoint topology for FAST Search • 2 Crawl components + 2 Query components • VM specs: • 4 Virtual CPUs @ 16GB of memory • C: 128GB VHD (not expanded, < 40GB used) • Crawl Store database kept on a dedicated LUN

  40. SharePoint Crawler Configuration • Registry settings on crawler nodes • HKLM\SOFTWARE\Microsoft\Office Server\14.0\Search\Global\Gathering Manager • FilterProcessMemoryQuota • Default 100MB, changed to 200MB • DedicatedFilterProcessMemoryQuota • Default 100MB, changed to 200MB • Monitoring the crawler via perfmon • OSS FAST plugin counters: Batches Open, Ready, Submitted, Failed • Incremental crawl • Can take an hour to kick off, with high database load • For 120M items, the crawl database stays under 600GB • Overall crawl rate around 70 documents per second (DPS)
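At the overall crawl rate of about 70 DPS, the time for one full pass over the corpus follows directly. This sketch assumes the rate is sustained around the clock and ignores crawl start-up delays and pauses for maintenance; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def crawl_days(items: int, docs_per_second: float) -> float:
    """Days for a full crawl at a sustained documents-per-second rate."""
    return items / docs_per_second / SECONDS_PER_DAY


print(f"{crawl_days(120_000_000, 70):.1f} days")  # 19.8 days
```

A full crawl of this size runs for weeks, which is why monitoring the crawl several times a day, as recommended in the lessons learned, matters.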

  41. FAST Search Lessons Learned • We can run on big iron • FAST can run on VMs, but physical nodes do have advantages • The SAN performed very well • Monitor the crawl at least 3 times a day • SCOM • SharePoint • Perfmon • FAST command line tools • Backup of the index is not recommended at scale

  42. Monitoring inside FAST • FAST has lots of tools to monitor what’s going on! • rc -r | select-string "# doc" • Shows how busy the document processors are • Monitoring crawl queue size • Use reporting or SQL Server Management Studio to see MSCrawlURL • indexerinfo -a doccount • Make sure all indexers are reporting, and compare counts to see how many documents are indexed in 1000 seconds • indexerinfo -a status • Monitor the health of the indexers and partition layout
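The "compare counts over 1000 seconds" check above can be made systematic: take two document-count samples per indexer, compute a per-second rate, and flag indexers that did not report or made no progress. The counts are assumed to be copied by hand from two `indexerinfo -a doccount` runs; parsing that command's output is not shown because its format is not given here. The indexer names and counts are hypothetical.

```python
from typing import Dict, List, Optional


def indexing_rates(before: Dict[str, int], after: Dict[str, int],
                   interval_s: float = 1000.0) -> Dict[str, Optional[float]]:
    """Documents indexed per second for each indexer between two samples.

    An indexer missing from the second sample gets None (it did not report).
    """
    rates: Dict[str, Optional[float]] = {}
    for name, start in before.items():
        end = after.get(name)
        rates[name] = None if end is None else (end - start) / interval_s
    return rates


def stalled(rates: Dict[str, Optional[float]]) -> List[str]:
    """Indexers that did not report or made no progress."""
    return sorted(name for name, rate in rates.items() if rate is None or rate <= 0)


# Hypothetical counts from two doccount runs taken 1000 s apart
before = {"fast-is1": 1_000_000, "fast-is2": 1_200_000, "fast-is3": 900_000}
after = {"fast-is1": 1_050_000, "fast-is2": 1_200_000}
rates = indexing_rates(before, after)
print(rates)           # {'fast-is1': 50.0, 'fast-is2': 0.0, 'fast-is3': None}
print(stalled(rates))  # ['fast-is2', 'fast-is3']
```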

  43. FAST Search Tips and tricks • The limit of document processors per node is 20 • can be increased if procserver_21 is stopped • 50 ran successfully on the physical nodes • System maintenance during a crawl: pause the crawl • Do not ignore the capacity planning guide • Make sure your hardware is spec’d to the minimums • Admin node makes a great VM!

  44. References • Test Report (http://go.microsoft.com/fwlink/?LinkId=229493) • SharePoint Server 2010 capacity management: Software boundaries and limits (http://technet.microsoft.com/en-us/library/cc262787.aspx) • Estimate performance and capacity requirements for large scale document repositories in SharePoint Server 2010 (http://technet.microsoft.com/en-us/library/hh395916.aspx) • Storage and SQL Server capacity planning and configuration (SharePoint Server 2010) (http://technet.microsoft.com/en-us/library/cc298801.aspx) • SharePoint Performance and Capacity Planning Resource Center on TechNet (http://technet.microsoft.com/en-us/office/sharepointserver/bb736741) • Best practices for virtualization (SharePoint Server 2010) (http://technet.microsoft.com/en-us/library/hh295699.aspx) • Best practices for SQL Server 2008 in a SharePoint Server 2010 farm (http://technet.microsoft.com/en-us/library/hh292622.aspx) • Best practices for capacity management for SharePoint Server 2010 (http://technet.microsoft.com/en-us/library/hh403882.aspx) • Performance and Capacity Recommendations for FAST Search Server 2010 for SharePoint (http://technet.microsoft.com/en-us/library/gg702613.aspx) • Bulk Loader tool (http://code.msdn.microsoft.com/Bulk-Loader-Create-Unique-eeb2d084) • LoadBulk2SP tool (http://code.msdn.microsoft.com/Load-Bulk-Content-to-3f379974) • SharePoint Performance Testing Scripts (http://code.msdn.microsoft.com/SharePoint-Testing-c621ae38)

  45. © 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
