1 / 43

Large Databases in Industry

Large Databases in Industry. Wendy Moncur Department of Computing Science, University of Aberdeen. Large Databases in Industry. Database design & management in a major bank Case study 6000-table Personnel database. My background.

Télécharger la présentation

Large Databases in Industry

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Large Databases in Industry Wendy Moncur Department of Computing Science, University of Aberdeen

  2. Large Databases in Industry • Database design & management in a major bank • Case study • 6000-table Personnel database

  3. My background • DataBase Administrator (DBA) at one of UK’s largest banks. • Designed databases for high performance & availability. • Platform: DB2 & SQL • Largest database: 6000 tables

  4. DBA Salaries • DBA Average Minimum Salary £41,896 • DBA Average Maximum Salary £47,147 • Source: http://www.itjobswatch.co.uk (2008)

  5. What does a DBA do? • Database design & creation • Quality assurance of SQL • Database optimisation • Performance management • Database administration • Security

  6. Database design & creation • Process of fitting a database design to clients’ requirements. • Database design achieved in 3 phases: • Conceptual – model data independent of all physical considerations • Logical – refine and map conceptual model onto relational model (or some other database model such as object-oriented ) • Physical – map logical model onto a specific DBMS

  7. Quality assurance of SQL • Review application code written by developers • Understand application • Use EXPLAIN to check individual SQL statements • May need to change application or indexes • Are indexes used? • Is the run time acceptable? • Batch • Online

  8. Database optimisation • Improve indexing • Delete redundant indexes • Check order of columns in multi-column indexes matches application needs: • e.g. – Personnel table with index on Surname, FirstName versus FirstName, Surname • Confirm whether indexes should be Ascending or Descending • Verify clustering key is appropriate

  9. Performance management • Clients will specify: • Database implementation date • Online transaction times in milliseconds • Batch process run times • Recoverability of data

  10. Database administration • Once the database is ‘live’: • Backup and recovery strategy • How far back? • How many transactions lost? – depends on business data held • Reorganisation strategy • Trade off between availability & performance • Implementation of changes on database • Application implementation – stability tests

  11. Security • Control access to data in test & production • Even test data may be sensitive • Sample data from live database • LEB: “Baroness Gardner of Parkes” • Coutts is the bank for the Queen …. • Only DBAs should have access to delete or modify the database…. • Use views to control users’ & developers’ information access

  12. Other DBA Deliverables • Documentation for: • Requirements specification • As defined by clients, developers, managers, contractors • Design decisions – in case of problems/ upgrades • Application design reviews and tests • Handover to Production

  13. Career structure • Graduate • Trainee DBA • DBA • May be split into production or development • Production - £££ for being on call • Development – less stress! • Consultant/ Team leader

  14. Case study: the monster database • 6000+ tables • 18000+ indexes

  15. Part1: Challenges • “One size fits all” • External supplier • 6000+ tables • 18000+ indexes • 1 tablespace • Short timescale

  16. Challenges: “one size fits all”? • One size does not fit all. • Performance of SQL statements dependent on: • Database design • Index design • The DATA

  17. Challenges: “one size fits all”? • Every company has different requirements. • Customers demand high performance... and control the budget. • Service Level Agreements (SLAs) dictate … • Minimum transaction speed • Number of concurrent users • Number of remote locations • Daily system availability • Database must be tailored to achieve site-specific SLAs.

  18. Challenges: external supplier • Software package & database from external supplier. • Cannot change this.

  19. Challenges: 6,000+ tables • Cannot change tables: no denormalisation allowed. • Supplied program code demands these tables exist. • Cannot change supplied program code unless essential.

  20. Challenges: 18,000+ indexes • Can change indexes: • Unique indexes • Clustering indexes • Secondary indexes

  21. Unique index • Defines what makes a row unique. • Components of the index cannotbe changed. • Order of componentscanbe changed.

  22. Unique index E.g. – for Table “EMPLOYEE” Unique index =DateOfBirth, Firstname, Surname. Most queries ask for data where only Surname, Firstnameare known. SELECT Surname, Firstname, DateOfBirth From Employee Where Surname= “Jenkins” AndFirstname= “Malcolm” ; Recommendation: Change order of unique index to Surname, Firstname, DateOfBirth.

  23. Clustering indexes • Defines the physical order in which rows of data should be stored. • Components of the index can be changed. • Order of components can be changed.

  24. Clustering indexes E.g. – Table “EMPLOYEE” Clustering index = DateOfBirth Yet most queries order by EmploymentStartDate SELECT EmploymentStartDate, Surname, Firstname From Employee Where Surname = “Jenkins” And Firstname = “Malcolm” ; Order by EmploymentStartDate; Recommendation: Change clustering index to use EmploymentStartDate.

  25. Secondary indexes • Not unique. • Do not dictate how the data is to be held. • Created to improve performance of queries and updates. • Increases cost of insert and update, as must be created and maintained along with the table. Recommendation: Drop superfluous secondary indexes.

  26. Challenge: Short timescale • At least 4 test environments: • 96,000 objects! ((6,000 tables + 18,000 indexes) * 4 environments) • 3 months Vanilla Unit test System test • Pre-live

  27. Tools • Use tools to… • Check performance of each SQL statement • Manage change process

  28. Check performance • “EXPLAIN” • Evaluates route to data for every SQL statement. • Identifies what indexes are used • Doesn’t identify redundant indexes • Doesn’t identify indexes that need to be changed.

  29. Manage change process • Rigorous control needed • Achieved through… • Consistent naming standards • Detailed record of every change • Consistent route through environments, no short cuts • DBA tools

  30. Can’t change: “One size fits all” External supplier 6000+ tables Can change: 18000+ indexes 1 tablespace Short timescale Part1: Recap of challenges

  31. Part2: The Production Database • Does it perform? • Can the right people use it? • If disaster strikes, can the data be recovered?

  32. Does the database perform? • Database performance monitored against Service Level Agreements (SLAs). • Regular health checks carried out: • Data stored in sequence? • Enough space? • If sub-standard performance, further database design work done.

  33. Can the right people access the data? PERSONNEL database

  34. Can the right people access the data? PERSONNEL database Personnel team Query & update data at individual or regional level

  35. Can the right people access the data? DBA Backup/ restore data Reorganise data Change database definitions Update statistics on data PERSONNEL database Personnel team Query & update data at individual or regional level

  36. Can the right people access the data? DBA Backup/ restore data Reorganise data Change database definitions Update statistics on data Chief executive Employee statistics PERSONNEL database Personnel team Query & update data at individual or regional level

  37. Can the right people access the data? DBA Backup/ restore data Reorganise data Change database definitions Update statistics on data Chief executive Employee statistics PERSONNEL database Staff member Their own data Personnel team Query & update data at individual or regional level

  38. Can the right people use the database? • Different people, different information needs. • Sensitive data – salary, health, discipline… • Solution • VIEWS • Transaction Management

  39. If disaster strikes, can the data be recovered? • Robust backup & recovery strategies for: • Hardware failure • Software failure

  40. Part2: Recap of Production Database issues • Database must perform to acceptable level. • Only the right people should have access to any data item. • No matter what, the data must be recoverable.

  41. Summary • MSc learning relevant to real world • Everything is bigger out there! • Grounding in basic understanding lets you handle complex challenges

  42. Questions?

More Related