
NCAR storage accounting and analysis possibilities

David L. Hart, Pam Gillman, Erich Thanhardt, NCAR CISL, July 22, 2013 (dhart@ucar.edu). Why storage accounting? Big Data; increasing cost of storage with respect to compute; NSF data management plan mandate; tools for users.


Presentation Transcript


  1. NCAR storage accounting and analysis possibilities David L. Hart, Pam Gillman, Erich Thanhardt, NCAR CISL, July 22, 2013, dhart@ucar.edu

  2. Why storage accounting? • Big Data • Increasing cost of storage with respect to compute • NSF data management plan mandate • Tools for users • Some info is better than no info • Some process is better than ad hoc fire drills • Supports allocation processes

  3. Accounting for archive storage • NCAR has “charged” users for archive use for many years. • Archive accounting has institutional inertia • NCAR HPSS details, June-July 2013

  4. Archive storage record • Activity date – date record was collected • Activity type – Read, Write, Storage • Unix uid • Project code – project to charge • Number of files • Bytes – read, written, or stored • Class of service – e.g., single-copy, dual-copy • DNS – DNS name of client host • Frequency – interval, in days, between accounting runs
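The record layout above can be sketched as a Python dataclass. This is a minimal illustration only: the field names, the pipe-delimited input line, and the sample project code are assumptions, since the slides do not describe the actual file format NCAR uses.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical schema mirroring the slide 4 fields; names and the
# pipe-delimited layout are illustrative assumptions, not NCAR's format.
@dataclass(frozen=True)
class ArchiveRecord:
    activity_date: date      # date the record was collected
    activity_type: str       # "Read", "Write", or "Storage"
    uid: int                 # Unix uid
    project: str             # project code to charge
    num_files: int
    num_bytes: int           # bytes read, written, or stored
    class_of_service: str    # e.g., "single-copy", "dual-copy"
    client_dns: str          # DNS name of the client host
    frequency_days: int      # interval, in days, between accounting runs

VALID_TYPES = {"Read", "Write", "Storage"}

def parse_archive_record(line: str) -> ArchiveRecord:
    """Parse one hypothetical pipe-delimited accounting line."""
    f = line.strip().split("|")
    if len(f) != 9:
        raise ValueError(f"expected 9 fields, got {len(f)}")
    if f[1] not in VALID_TYPES:
        raise ValueError(f"unknown activity type: {f[1]}")
    return ArchiveRecord(
        activity_date=date.fromisoformat(f[0]),
        activity_type=f[1],
        uid=int(f[2]),
        project=f[3],
        num_files=int(f[4]),
        num_bytes=int(f[5]),
        class_of_service=f[6],
        client_dns=f[7],
        frequency_days=int(f[8]),
    )

# Made-up sample line; the project code and host are placeholders.
rec = parse_archive_record(
    "2013-07-07|Storage|12345|P0000001|42|1099511627776|dual-copy|host.example.org|7")
print(rec.project, rec.num_bytes)  # P0000001 1099511627776
```

Validating the activity type at parse time means a malformed row fails loudly instead of being charged to the wrong category.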

  5. Collecting data from HPSS • Read/write activity • Analyze logs from HSI and HTAR (since May 2013). Logs are archived daily and processed weekly. • Storage activity • Weekly DB2 table scan and separate post-processing steps. • Accounting system impact • Approx. 6,000 records per week • Major accounting requirements • Use of HPSS accounting hooks to associate NCAR project code with HPSS file “account” • Accounting system and HPSS enforce the requirement that every user have a “default project” to which files are charged if no other project is provided
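The weekly read/write roll-up, including the fallback to each user's default project, might look roughly like the sketch below. It assumes the HSI/HTAR logs have already been parsed into simple entries; the `LogEntry` shape and the one-file-per-entry counting rule are hypothetical, because the slides do not describe the log format.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

# Hypothetical parsed HSI/HTAR log entry; the real log format is not shown.
class LogEntry(NamedTuple):
    uid: int
    project: Optional[str]   # None when the user supplied no project
    op: str                  # "Read" or "Write"
    num_bytes: int

def weekly_activity(entries, default_project):
    """Sum (num_files, num_bytes) per (uid, project, op) for one weekly run.

    default_project maps uid -> default project code; every user is
    required to have one, so a missing entry raises KeyError.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [num_files, num_bytes]
    for e in entries:
        project = e.project or default_project[e.uid]  # charge default if none given
        key = (e.uid, project, e.op)
        totals[key][0] += 1            # assumption: one log entry = one file
        totals[key][1] += e.num_bytes
    return {k: tuple(v) for k, v in totals.items()}

entries = [
    LogEntry(101, "P1", "Write", 500),
    LogEntry(101, None, "Write", 200),   # falls back to uid 101's default
    LogEntry(101, "P2", "Write", 300),
    LogEntry(202, None, "Read", 1000),
]
defaults = {101: "P2", 202: "P3"}
report = weekly_activity(entries, defaults)
print(report[(101, "P2", "Write")])  # (2, 500)
```

An HTAR transfer bundles many member files into one archive, so a real implementation would need the file count from the log itself rather than the one-per-entry shortcut used here.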

  6. Accounting for disk storage • Focus on long-term project spaces, which are allocated • But the mechanism captures scratch snapshots, too! • GLADE total storage, June-July 2013

  7. Disk storage record • Event time – date record was collected • Project directory • Group – Unix group • Username • Number of files • kB used • Period – reporting interval, in days • QOS – quality-of-service field (for future use)

  8. Collecting data from GPFS • File systems don’t have a concept of “project”, but GPFS has the notion of “file sets” • Leverage file sets to map to project spaces • For scratch, work, home: report per-user data • Process runs weekly and provides a storage snapshot • With GPFS tools, the process takes only a few minutes to complete; a full file system scan is not required • Accounting system impact • Approx. 4,000 records per week • Major accounting requirements • Agreements and processes between GLADE administrators and User Services about how spaces are created • Deviation from these agreements would break the system
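A rough sketch of the file-set-to-project mapping is shown below, producing records with the slide 7 fields. The usage tuples are assumed to come from GPFS file set reporting; the mapping table, the file set name, and the directory path are hypothetical stand-ins for whatever the administrators and User Services actually agree on.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class DiskRecord:
    event_time: date       # date the record was collected
    project_dir: str       # project directory the file set maps to
    group: str             # Unix group
    username: str
    num_files: int
    kb_used: int
    period_days: int = 7   # reporting interval; the process runs weekly
    qos: str = ""          # quality-of-service field, reserved for future use

def fileset_records(usage, fileset_to_project, when):
    """Build one disk record per (file set, user) usage row.

    usage: iterable of (fileset, username, group, num_files, kb_used) tuples,
    assumed to come from GPFS file set reporting rather than a full scan.
    fileset_to_project: hypothetical mapping table. A file set missing from
    the table raises KeyError instead of being silently dropped.
    """
    records = []
    for fileset, username, group, num_files, kb_used in usage:
        if fileset not in fileset_to_project:
            raise KeyError(f"fileset {fileset!r} has no project mapping")
        records.append(DiskRecord(when, fileset_to_project[fileset],
                                  group, username, num_files, kb_used))
    return records

mapping = {"fs_example": "/glade/p/example"}   # hypothetical names
recs = fileset_records([("fs_example", "alice", "grp1", 120, 4096)],
                       mapping, date(2013, 7, 7))
print(recs[0].project_dir, recs[0].kb_used)   # /glade/p/example 4096
```

Raising on an unmapped file set reflects the slide's warning that deviating from the agreed process breaks the system: the failure surfaces in the weekly run instead of as missing usage.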

  9. Analysis and reporting

  10. Storage growth over time (1): HPSS growth in 2013; GLADE growth in 2013

  11. Storage growth over time (3) User reports show project usage by week and a per-user breakdown

  12. Top consumers

  13. Aggregate behavior (1) Net growth, 3/3-4/7: ~261 TB

  14. Aggregate behavior (2) Data written, 3/3-4/7: 594 TB
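The two aggregate figures can be related with simple arithmetic. This assumes, as a simplification not stated on the slides, that net growth over the window equals data written minus data deleted or overwritten, with no other source of change.

```python
# Slide figures for the 3/3-4/7 window, in TB (net growth is approximate).
written_tb = 594
net_growth_tb = 261

# Assumption: net growth = data written - data deleted over the same window,
# so the difference estimates how much was deleted or overwritten.
removed_tb = written_tb - net_growth_tb
retained_fraction = net_growth_tb / written_tb

print(removed_tb)                  # 333
print(f"{retained_fraction:.0%}")  # 44%
```

Under that assumption, roughly 333 TB left the archive during the period, and well under half of the data written stayed as net growth.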

  15. Compute v. storage (1)

  16. Compute v. storage use (2)

  17. Big compute != Big data

  18. What is “Big Data”? Average file size vs. Total data holdings
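The average-file-size versus total-holdings comparison can be derived directly from the storage records, since each record carries both a file count and a byte count. The sketch below assumes records reduced to (project, number of files, bytes) tuples; the project codes and values in the example are invented for illustration.

```python
from collections import defaultdict

def file_size_profile(records):
    """Return {project: (total_bytes, average_file_size_bytes)}.

    records: iterable of (project, num_files, num_bytes) tuples, e.g. the
    "Storage" activity rows from a weekly accounting run.
    """
    files = defaultdict(int)
    nbytes = defaultdict(int)
    for project, num_files, num_bytes in records:
        files[project] += num_files
        nbytes[project] += num_bytes
    return {
        p: (nbytes[p], nbytes[p] / files[p] if files[p] else 0.0)
        for p in nbytes
    }

# Invented rows: many small files vs. a few very large ones.
rows = [("P1", 1_000_000, 2 * 10**12), ("P2", 50, 5 * 10**12)]
for project, (total, avg) in sorted(file_size_profile(rows).items()):
    print(project, total, f"{avg:.3g}")
```

Plotting the two values on log axes separates projects with large holdings made of many small files from those holding a few very large files, which is the distinction the slide title raises.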

  19. Managing “orphaned” files • Verifying accounting records lets site operators identify files owned by inactive users or inactive projects • On July 7, HPSS accounting showed 177 users with 885 TB of “orphaned” files • Early outreach to users and project leads does translate to deletions and to fewer files for which no owner can be found • Users are required to be “actively engaged” in the disposition of their archive holdings. www2.cisl.ucar.edu/docs/hpss/policies
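The orphaned-file check amounts to joining storage records against the sets of active users and projects. A minimal sketch follows, assuming those sets can be exported from the accounting system; the function name and inputs are hypothetical. The number of keys and the sum of values in the result correspond to the "users" and "TB" figures quoted on the slide.

```python
from collections import defaultdict

def orphaned_holdings(storage_rows, active_users, active_projects):
    """Sum stored bytes whose owner or project is no longer active.

    storage_rows: iterable of (uid, project, num_bytes) from archive
    "Storage" records. active_users / active_projects: hypothetical sets
    exported from the accounting system. Returns {uid: orphaned_bytes}.
    """
    orphaned = defaultdict(int)
    for uid, project, num_bytes in storage_rows:
        if uid not in active_users or project not in active_projects:
            orphaned[uid] += num_bytes
    return dict(orphaned)

rows = [(1, "P1", 10), (2, "P1", 20), (3, "P9", 30)]
result = orphaned_holdings(rows, active_users={1, 3}, active_projects={"P1"})
print(len(result), sum(result.values()))  # 2 50
```

Keying the result by uid gives operators a direct list of users to contact, matching the outreach step described above.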

  20. Questions?
