
Managing your Blackboard® System for Growth and Performance


Presentation Transcript


  1. Managing your Blackboard® System for Growth and Performance Presented By Steve Feldman April 13, 2005

  2. Welcome • Session Objectives: • Introduction to Capacity Planning • Introduction to Performance Management • Handling Performance and Capacity Issues • Introduction to Load Testing • Innovation • Methodology for Resolving Issues • Results/Outcomes • Awareness of what you are doing well or not doing at all.

  3. Introduction: About Your Presenter • What do I do at Blackboard? • Director, Software Performance Engineering and Architecture • Part of Product Development, but interface with every department in Blackboard. • Manage the Software Performance Engineering (SPE) Process as part of the development lifecycle. • A few key points… • Been at Blackboard since the Fall of 2003. • Worked on AP2, AP3 and R7.0 • Manage a team of several developers/engineers. • Practicing Member of CMG

  4. Performance Maturity Model: Where do you fit in? • Level 1: Reactive Fire Fighting • Level 2: Monitoring and Instrumenting • Level 3: Performance Optimizing • Level 4: Business Optimizing • Level 5: Process Optimizing. Source: Michael Maddox, MCI; A Performance Process Maturity Model

  5. Invest Your Time Understanding Performance and Capacity • Set Performance Objectives from the Start • Optimize Your Environment from the Start.

  6. Set Performance and Capacity Objectives from the Start • It's never too late to define a performance or capacity objective. • Objectives often come as the result of a problem or issue: • Solving a maintenance window or schedule • Planning for an upgrade • Planning for a rollout to new users • New Blackboard Building Blocks, Features or Integration • Define Clear and Concise Objectives • Measurable/Quantifiable and Achievable • Differentiate between Performance and Capacity • Processing Time versus Workload • Growth versus Adoption • Resource Utilization and Maintenance

  7. Optimize Your Environment from the Start • Blackboard environments moving from supported to mission critical (Application Management Maturity Model) • Dedicate equipment and even network bandwidth. • Understand the working parts • Acquire knowledge about the integrated sub-systems. • Don’t need to be a web, app or db guru, but know enough to: • Manage and Maintain Independently • Research Knowledge Gaps • Solve Common Issues without Help

  8. Optimize Your Environment from the Start • Optimize Environment from the Start based on Knowledge of Sub-Systems • Monitor and Instrument Regularly • Talk to Your Users about their Experience. • Investigate Yourself • Finding the Right Configuration takes time: • Make 1 Change at a Time • Make the Change Based on Empirical Information (Not Hunches…) • Maintain a Consistent Configuration for 1 period of time (month, semester or grading period)

  9. Introduction to Capacity Planning

  10. Capacity Planning: Building an Ideal Blackboard Environment • What is Capacity Planning? • Capacity Planning Factors • Determine an Initial Deployment Architecture. • Handling Adoption and Growth • Archiving Data • Backups and Restoration • Maintenance Windows and Tasks • Integrating with External Systems • Redundancy and Failover • Business Processes • Upgrades • Rolling out New Features • Capacity Planning Tools

  11. Capacity Planning Factors: Determine an Initial Deployment Architecture • It’s Never Too Late to Consider or Reconsider Your Deployment Architecture. • Try to Understand Key Components • Eventual Audience Rollout • User Behavior • Session Patterns • Frequency • Concurrency • Data Management Strategy • Resource Needs • Processing • Storage

  12. Capacity Planning Factors: Handling Adoption and Growth • Work with Functional Leaders to Understand Deployment Strategy • Adoption Patterns of Users and Features • Study Growth • Not just users and courses, but data and content. • Instrument daily, weekly, monthly, yearly, etc. • Study the Activity Patterns of your Users (Behavior Modeling) • Session Times • Where they go and what they do…
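
The "instrument daily, weekly, monthly" point lends itself to a small homegrown script. Below is a minimal sketch, assuming read-only database access through a standard Python DB-API cursor; the USERS and COURSE_MAIN table names and the content-directory path are illustrative and will differ per installation and release.

```python
# Sketch: snapshot daily growth counters to a CSV for trend analysis.
# The table names below (USERS, COURSE_MAIN) are illustrative only.
import csv
import datetime
import os

def content_size_bytes(content_root):
    """Total bytes under the course content directory (assumed path)."""
    total = 0
    for dirpath, _dirs, files in os.walk(content_root):
        for name in files:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

def snapshot(cursor, content_root, out_csv="growth_history.csv"):
    cursor.execute("SELECT COUNT(*) FROM USERS")
    user_count = cursor.fetchone()[0]
    cursor.execute("SELECT COUNT(*) FROM COURSE_MAIN")
    course_count = cursor.fetchone()[0]
    row = [datetime.date.today().isoformat(), user_count, course_count,
           content_size_bytes(content_root)]
    new_file = not os.path.exists(out_csv)
    with open(out_csv, "a", newline="") as fh:
        writer = csv.writer(fh)
        if new_file:
            writer.writerow(["date", "users", "courses", "content_bytes"])
        writer.writerow(row)
```

Run from cron (or the Windows scheduler) once a day, the resulting CSV makes growth in users, courses and content easy to plot and extrapolate.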

  13. Capacity Planning Factors: Archiving Data • A Lot of Data Can be Viewed as Disposable to Many and Priceless to a Few • Define a Strategy Early On About Archiving Data. • Enable Tracking and Study Last Modified Dates • Use Blackboard Tools to Archive and Export • Remove from the System • Maintain Activity Accumulator Data • Export • Purge Regularly
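
A minimal sketch of the "study last modified" idea, again assuming a DB-API cursor; the table, columns and parameter placeholder style are illustrative and depend on the actual schema and database driver.

```python
# Sketch: flag archive/export candidates whose last activity is older than a cutoff.
# COURSE_MAIN and last_modified are illustrative names, not the actual schema.
import datetime

def stale_courses(cursor, days_idle=365):
    cutoff = datetime.datetime.now() - datetime.timedelta(days=days_idle)
    cursor.execute(
        "SELECT course_id FROM COURSE_MAIN WHERE last_modified < %s", (cutoff,))
    return [row[0] for row in cursor.fetchall()]
```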

  14. Capacity Planning Factors: Backups and Restoration • Database Backups • Differential versus Full • Depends on Size, Confidence in Process and Usage • Plan for the Unexpected • Restore on Development Environments Routinely • Store in a Safe Place • Practice During Maintenance Windows • File System Backups • Perform Regularly • Just as Valuable as database back-ups • Not just data, but configuration
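
File-system backups are easy to script. Here is a minimal sketch using Python's standard tarfile module that captures configuration alongside data; the paths and destination directory are assumptions to adapt to your environment.

```python
# Sketch: dated tar.gz backup of content and configuration directories.
import datetime
import tarfile

def backup(paths, dest_dir="/backups"):
    stamp = datetime.date.today().isoformat()
    archive = f"{dest_dir}/blackboard-files-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for path in paths:
            tar.add(path)  # stores the directory tree recursively
    return archive

# e.g. backup(["/usr/local/blackboard/content", "/usr/local/blackboard/config"])
```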

  15. Capacity Planning Factors: Maintenance Windows and Tasks • Keep Your Users Informed • Downtime/Outages • Periods where Performance Can be Affected • Schedule Regularly • Log Rotations • Server Restarts • Database Statistics, Index Rebuilds and Extent Management • Data Fragmentation • Archiving and Purging Data • Service Packs and Upgrades (discussed later)

  16. Capacity Planning Factors: Integrating with External Systems • Understand the integration • What data is affected • Inbound versus Outbound • Frequency of Integration • Real-time versus Batched/Scheduled • Ideally, no manual intervention is required • The performance of either system should not suffer because of the integration

  17. Capacity Planning Factors: Failover and Redundancy • Have a Plan • Make a Budget • If no budget, communicate plan and downtime • Practice for the Unexpected • Be Realistic • Built-In Capabilities for Redundancy and Failover • Blackboard Load-Balancing • SQL-Server Clustering and Oracle RAC • Quality of Service Models • Tomcat Clusters

  18. Capacity Planning Factors: Business Processes • Define Schedule with Functional and Technical Leaders • Schedule for an extended period of time • Map out window based on need and usage • Model and Prototype • Make Sure the Window is Large Enough • Business processes should make sense and be realistic • Schedule During Periods of Low Usage and Non-Peak Times • Make it Repeatable, Automated and Easy to Debug

  19. Capacity Planning Factors: Planning for Upgrades • Updating Versions of Blackboard • Take Advantage of New Features • Functional Patches • Performance Same or Optimized • Performance Requirement for Every Development Release • Updating Platform Technology • Platform Patches • Operating System Upgrades • Plan for Downtime (Data Restoration) • Updating Hardware Architecture • Plan for Downtime (Data Restoration) • Take Advantage of Faster, Cheaper Equipment

  20. Capacity Planning Factors: Rolling Out New Features • Understand How New Features Change the Following: • Customer/User Behavior • Adoption • Growth • Resource Utilization • Integration Patterns • Business Process Changes

  21. Capacity Planning Tools • Behavior Modeling • What is it? • What tools can you use? • Valid Instrumentation Periods. • What to look for and to learn from the data. • Homegrown Tools (What to Mine) • Last Modified • Growth Changes • Adoption Patterns • Concurrency Patterns • Business Processes (Run Times)
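
One way to build the homegrown adoption report described above is to mine an exported activity log. A minimal sketch follows, assuming a CSV export with timestamp, user_id and event columns; the real Activity Accumulator layout varies by release.

```python
# Sketch: daily distinct-user counts from an exported activity log (CSV assumed).
import csv
from collections import defaultdict
from datetime import datetime

def daily_active_users(activity_csv):
    users_by_day = defaultdict(set)
    with open(activity_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            day = datetime.fromisoformat(row["timestamp"]).date()
            users_by_day[day].add(row["user_id"])
    return {day: len(users) for day, users in sorted(users_by_day.items())}
```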

  22. Behavior Modeling

  23. Capacity Planning Resources • Modeling • SPEED • IBM Rational • Simul8 • OPNET • NetIQ (WebTrends) • Many Freeware Products on SourceForge • Resources • Performance by Design: Computer Capacity Planning by Example; Menasce, Daniel

  24. Introduction to Performance Management

  25. Measuring Performance • What to Focus On • Response Time • Processing Time • Storage/Growth (volumetric patterns) • Workload (Processing and Memory) • Network Utilization/Bandwidth • Adoption/Behavior • New Features and Deployments • Plot, Measure and Model • Distinct Sessions • Physical Resource Utilization (Workload) • Logical Resource Utilization

  26. Measuring Performance (chart): workload and users plotted over time (0 to 60 minutes), annotated with the Peak of Concurrency, Point of Max Workload, Peak of Saturation, Slope of Abandonment and Slope of Recovery; Sessions Per Hour = (Σ i=0..s session_i) / time
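
A minimal sketch of the two measures behind that chart, computed from session records as (start, end) datetime pairs; where those records come from (web logs, a session table) is left as an assumption.

```python
# Sketch: sessions per hour and peak concurrency from (start, end) session pairs.
from datetime import timedelta

def sessions_per_hour(sessions, window_start):
    """Sessions active at any point during the hour beginning at window_start."""
    window_end = window_start + timedelta(hours=1)
    return sum(1 for start, end in sessions if start < window_end and end > window_start)

def peak_concurrency(sessions, step_minutes=5):
    """Sample concurrency on a fixed grid and report the highest value observed."""
    if not sessions:
        return 0
    t, stop, peak = min(s for s, _ in sessions), max(e for _, e in sessions), 0
    while t <= stop:
        peak = max(peak, sum(1 for start, end in sessions if start <= t < end))
        t += timedelta(minutes=step_minutes)
    return peak
```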

  27. Quality of Service Paradigm • A web application’s quality of service is measured by response time, throughput and availability. • Poor quality of service leads to abandonment, decline in adoption and potentially permanently lost users. • QoS is key to assessing how well Web-based applications meet user expectations on two primary measures: availability and response time.
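
As a concrete illustration of those three measures, here is a minimal sketch that summarizes request samples pulled from a web-server access log; the (status_code, response_ms) sample format is an assumption about how the log was parsed.

```python
# Sketch: availability, throughput and 95th-percentile response time from samples.
def qos_summary(samples):
    """samples: list of (status_code, response_ms) tuples for one measurement window."""
    total = len(samples)
    ok = sum(1 for status, _ in samples if status < 500)
    times = sorted(ms for _, ms in samples)
    p95 = times[int(0.95 * (len(times) - 1))] if times else None
    return {"availability": ok / total if total else None,
            "requests_in_window": total,
            "p95_response_ms": p95}
```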

  28. Quality of Service: All for One and One for All Architecture… • What exactly does this mean? • In today’s architecture no system, sub-system, use case, transaction, data element, etc. has a greater utility value than its neighbor component in the system. • Is this an accurate representation of the product? • In Blackboard, all things are not created equally or weighted equally in value as deemed by our users. • However, our architecture is such that all things are created and weighted equally. • Why is this bad? • The QoS of the application becomes unpredictable. • No guarantees can be made for capacity planning and utilization. • Clients rarely have the comfort level that their application environment is ever stable other than during periods of light usage.

  29. Quality of Service: All Things are Not Equal, So Let’s Not Treat Them Equally… • From a psychological perspective, it’s easy to predict which systems have greater QoS needs than others. • Taking an assessment has greater utility than reading an announcement. • Entering gradebook scores has greater utility than adding a course document or folder. • From a workload perspective, it’s easy to conceptualize which systems demand greater QoS needs than others. • A lab of 20 students taking an assessment places a greater workload on the system than a lab of 20 students reading a course document. • A virtual workshop of 20 users collaborating places a greater workload than 20 students navigating through a course.

  30. Quality of Service: Where Can We Go With This… • Resource management policies and procedures can be implemented to support the workload needs of the system. • Sub-system or potentially task workload monitoring. • Administrator defined thresholds for application management. • Seasonal deployment changes based on patterns/trends of usage or even predefined scheduling by course administrators. • Better utilization of capital expenditures. • Potentially more expensive with greater adoption. • Quantifiably reliable.
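
A minimal sketch of the "administrator defined thresholds" idea: compare measured sub-system workload against configured limits and surface the ones that are over. The metric names and numbers are purely illustrative.

```python
# Sketch: administrator-defined workload thresholds per sub-system (values are made up).
THRESHOLDS = {
    "assessment_sessions": 200,     # concurrent assessment takers
    "collaboration_sessions": 50,   # concurrent virtual-classroom users
    "cpu_utilization_pct": 80,
}

def over_threshold(measurements, thresholds=THRESHOLDS):
    """Return only the metrics whose measured value exceeds the configured limit."""
    return {name: value for name, value in measurements.items()
            if name in thresholds and value > thresholds[name]}

# over_threshold({"assessment_sessions": 240, "cpu_utilization_pct": 65})
# -> {"assessment_sessions": 240}
```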

  31. Quality of Service: Example (diagram): a General Workload contrasted with a Distributed Workload broken out into an Adaptive Content Workload, a Content Collection Workload and an Assessment Workload

  32. Dealing with Performance and Capacity Issues

  33. Dealing with Performance Issues Solving a performance issue is no different than solving a functional issue. The same level of care and effort should be given to solving the issue. We recommend the following three steps as the appropriate path for problem determination and resolution: • Decompose the Problem • Resolve the Issue • Follow Up and Prevent

  34. Dealing with Performance and Capacity Issues • Most clients fail to report performance issues. The bulk users of the system (students) rarely report issues. • Most issues are reported when: • Administrators experience performance issues firsthand in their own tasks. • Instructors are performing course administration activities. • Instructors are working on the product in a classroom environment. • Administrators pick up student chatter in blogs and discussion boards. • What does that mean? • Identifying the actual performance bottleneck is hard and requires a well-formulated approach. • Performance issues are primarily the result of: • Poor System Management in Dealing with Growth • Changes in Adoption Patterns (Concurrency Thresholds) • Functional Issues in the Application • Undersized Hardware and Resources • User Error (Unrealistic Operations)

  35. Characteristics of a Good Problem Resolution Methodology • Measurable • Reliable • Deterministic • Practical • Finite • Predictive • Efficient • Impact Aware

  36. Performance Resolution Methods • Trial and Error Method • Response Time Method • Do Nothing and Ignore Method • Blame the Users Sub-Method • Blame the Hardware Sub-Method • Blame the Vendor Sub-Method

  37. Trial and Error Method • Identify that a particular operation X has an unacceptable response time. • Make changes with the intent of improving X. • Remove any changes that make X worse. • If no improvement is perceived, go back and make additional changes. • If the improvement is minor, go back and make more changes, as additional changes may produce further improvements.

  38. Response Time Method • Select the critical operations for which the business needs improved performance. • Collect proper diagnostic data during periods of poor performance, with a focus on: • Response Time Consumption • Execute the optimization activity that will have the greatest net payoff to the business. • If the best-payoff activity fails to yield the desired results, suspend optimization activities until something changes.
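
A toy illustration of the "greatest net payoff" step: rank candidate optimizations by call frequency times expected time saved. The payoff model and the numbers are illustrative, not part of the method as presented.

```python
# Sketch: rank optimization candidates by a simple frequency-times-savings payoff.
def rank_by_payoff(candidates):
    """candidates: dicts with name, calls_per_day and expected_savings_ms."""
    return sorted(candidates,
                  key=lambda c: c["calls_per_day"] * c["expected_savings_ms"],
                  reverse=True)

# A frequent login query can outrank a much slower report that runs only weekly:
print(rank_by_payoff([
    {"name": "login query", "calls_per_day": 20000, "expected_savings_ms": 50},
    {"name": "weekly report", "calls_per_day": 3, "expected_savings_ms": 30000},
]))
```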

  39. Example #1 Scenario: Butch (Student) logs into Blackboard to access music files he stores in the Content Collection. He selects the appropriate tab and waits for the left navigation frame to completely load. He ends up waiting two minutes until the tree fully loads. Angered by repeated incidents of this, he sends a furious email to the system administrators complaining about his “lost time” waiting for the tree to load. Question: How do we address this problem appropriately?

  40. Example #2 Scenario: The accounting department has decided to utilize the Blackboard assessment engine for high-stakes testing during semester mid-terms. The department has issued a 1000-question random block assessment, in which students will be responsible for answering 25 questions in an all-at-once deployment fashion. The department wants all 500 students to complete testing during a 2-hour window over the course of a week. The last time the department used Blackboard for high-stakes assessment, students complained about page load times and a few incidents in which students were kicked out of the application, resulting in a locked assessment. Question: The department has approached you for help. How do you avoid a repeat of the issue?
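
One piece of the answer is sizing arithmetic. A back-of-envelope sketch is below; the attempt length and the number of available testing windows are assumptions to replace with the department's real schedule.

```python
# Sketch: rough estimate of concurrent assessment takers for Example #2.
def estimated_concurrency(total_students, attempt_minutes, available_hours, peak_factor=2.0):
    """Average concurrency if arrivals were spread evenly, scaled by a peak factor
    because students cluster around lab sessions and deadlines."""
    average = total_students * (attempt_minutes / 60.0) / available_hours
    return average, average * peak_factor

# 500 students, ~30-minute attempts, ten 2-hour windows during the week (assumed):
avg, peak = estimated_concurrency(500, 30, 20)
print(f"average ~{avg:.0f}, plan and load test for ~{peak:.0f} concurrent assessment sessions")
```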

  41. Example #3 Scenario: An integration between the campus SCT system and Blackboard must take place to ensure that students and faculty exist in the system with the appropriate course enrollments based on recent course registration. The integration must take place prior to the beginning of the semester. The same integration took place last semester, but was deemed a failure by the faculty, as it took over a week for all courses, faculty and students to be entered and associated on the system. You were/are the administrator in charge of the integration. Part of the problem was that your data feeds from SCT were unorganized. Another problem is that you ran into a large number of system-level issues that caused your integrations to fail. Question: How do you reduce the risk and ensure a successful integration?

  42. Example #4: Scenario: You have procured budgetary funding to replace the older Blackboard servers and storage device for newer hardware. This new hardware is expected to solve all of your performance problems. The new servers will arrive in late May, which will give you 45 days to configure and convert your Blackboard environment before the bulk of your students get back on the system. You have been told by your boss that the system can only be down for 48 hours, as the summer school still uses Blackboard. Question: How do you ensure a smooth conversion with minimal downtime? What can you do in advance? How would you spend your 48 hours of downtime?

  43. Example #5: Scenario: Suzie (Blackboard Administrator) has been contacted by her boss about a change in the school’s Blackboard licensing. The school had been using a Blackboard Learning System™ - Basic license for the past two years. They have upgraded to the Blackboard Learning System and purchased the Blackboard Community System™ and Blackboard Content System™ in order to support a new distance learning initiative. Her boss tells Suzie that she is responsible for the following: • Purchasing of hardware and storage to support the new products. • Software Upgrade from Blackboard Learning System – Basic Edition to Blackboard Learning System • Installation and Configuration of the new implementation. The new software components are expected to change the way Blackboard has traditionally been used at the school. There will be lots more data, and the system will cater to a community 10X the size of the present implementation. Question: What can Suzie do in order to prepare for the change in features, adoption and growth?

  44. Performance Resources • Measurement • Windows Tool Kit, Top, Sar, VMStat, Prstat • JProbe, OptimizeIt, HPJmeter, JMPI/Thread Dumps • Hotsos, Statspack, TKProf, Enterprise Manager, Query Analyzer • Performasure, Spotlight, Patrol, Unicenter • Apache Server-Status, JVMStat, VerboseGC • Resources • http://support.microsoft.com/kb/224587 • http://www.sql-server-performance.com/jc_sql_server_quantative_analysis1.asp • http://www.javaperformancetuning.com • http://www.oraperf.com • http://www.ixora.com.au • http://www.hotsos.com • http://perl.apache.org/docs/1.0/guide/performance.html

  45. Introduction to Load Testing

  46. Introduction to Load Testing Load Testing is the process of… • Simulating synthetic workload on a software application. • Identifying where bottlenecks exist: • Software Layer • Hardware and/or Interface Layer • Determining software and system capacity capabilities under a given workload. • Attempting to meet or exceed predefined performance objectives. • Representing conditional patterns of application usage.

  47. Introduction to Load Testing • Software load testing requires a significant investment from an organization both financially and operationally. • Most commercially available load testing tools cost tens of thousands of dollars to purchase and maintain. • Organizing and managing a staff focused on using these specialized tools bears similar expense. • Organizations must be prepared to deal with the results of the load tests. • Optimizing Software (Refactoring) • Identifying Accurate Sizing and Capacity Configurations

  48. Components of Load Testing • Library of Test Assets: Reusable autonomous actions in the application (Create, Read, Delete, Update and Execute); Isolated verification points; Incorporation of abandonment (patience rating) • Volumetrics and Usage Analysis: Capture a statistical overview of current implementations (data models); Study usage patterns and trends for simulation; Develop performance data models based on findings • Scenario Definition: Simulation of realistic scenarios based on actual usage (artifacts); Focus on sessions per hour rather than solely on concurrency; Session outcomes: Abandon, Abort, Continue or Idle • Abandonment: Define a user patience rating (will users abandon if the transaction or site is slow?); Incorporate as a means of preserving realistic/expected usage patterns
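
A minimal sketch of how the abandonment (patience rating) component might be folded into a synthetic workload loop; the patience threshold and the page-time distribution are invented for illustration.

```python
# Sketch: abandonment-aware virtual user; timings and patience threshold are fake.
import random

def run_virtual_user(page_times_ms, patience_ms=8000):
    """Walk a scripted session and abandon as soon as a page exceeds the user's patience."""
    for elapsed in page_times_ms:
        if elapsed > patience_ms:
            return "Abandon"
    return "Continue"

# Simulate 1,000 ten-page sessions with skewed (log-normal) page times.
outcomes = [run_virtual_user([random.lognormvariate(7.5, 0.8) for _ in range(10)])
            for _ in range(1000)]
print(outcomes.count("Abandon") / len(outcomes), "abandonment rate")
```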

  49. Load Testing as a Part of the Blackboard SDLC

  50. Load Testing as a Part of the Blackboard SDLC • Five-step process deeply rooted in designing for performance before a feature is developed. • Part of the requirements process by assessing risk, defining performance requirements and isolating high-impact use cases. • Study artifacts of performance within current implementations: • Usage Analysis • Data Collection (Volumetrics within the Data Model) • Isolate software contention by identifying software anti-patterns. • Refactor and optimize the software application layer (business logic and database structure). • Performance test the software under conditional and common load on standard/recommended configurations. • Simulate Abandonment for Calibration Purposes • Generate enough samples of a given function • Stay within 2 Sigma (95% response time)
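
A minimal sketch of the calibration check implied by the last two bullets: require a minimum number of samples and verify that the 95th-percentile response time stays under the objective. The sample floor and objective value are assumptions.

```python
# Sketch: does a load-tested function meet its 95th-percentile response-time objective?
def meets_objective(samples_ms, objective_ms, min_samples=100):
    if len(samples_ms) < min_samples:
        return False  # not enough samples to calibrate against
    ordered = sorted(samples_ms)
    p95 = ordered[int(0.95 * (len(ordered) - 1))]
    return p95 <= objective_ms
```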
