
Accounting in LCG


Presentation Transcript


  1. Accounting in LCG Dave Kant & John Gordon CCLRC, e-Science Centre

  2. APEL in LCG/EGEE • Overview • Batch System Support • Data Aggregation HEPiX, Fall 2005, SLAC - 2

  3. Types of Accounting • Job Accounting AFTER the event (APEL Domain) • Concept of a “Job” as a unit of resource consumption • Determination of value after job execution • Job usage record as a complete description of resource consumption • Suitable for post-paid services • Metering • Real Time Accounting (DGAS, SGAS Domain) • Incremental determination of resource value while the job is being executed • Incremental decrement of account balance • Can enforce user quotas • Suitable for pre-paid services
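
The two styles on this slide can be contrasted in a minimal Python sketch. The class and function names (`JobUsageRecord`, `PrepaidAccount`, `run_metered`) and the cost model are assumptions made for illustration only; they are not APEL, DGAS or SGAS interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobUsageRecord:
    """Post-event accounting: one complete record written after the job ends."""
    job_id: str
    vo: str
    cpu_seconds: int
    wall_seconds: int


class PrepaidAccount:
    """Real-time accounting: the balance is decremented while the job runs."""

    def __init__(self, balance: float) -> None:
        self.balance = balance

    def charge(self, amount: float) -> bool:
        """Deduct one increment; return False if the remaining quota is too small."""
        if amount > self.balance:
            return False
        self.balance -= amount
        return True


def run_metered(account: PrepaidAccount, ticks: int, cost_per_tick: float) -> int:
    """Charge once per tick and stop when the account runs out; return ticks completed."""
    done = 0
    for _ in range(ticks):
        if not account.charge(cost_per_tick):
            break  # quota enforced during execution, not afterwards
        done += 1
    return done
```

In the post-event style nothing is checked while the job runs; a `JobUsageRecord` is simply written at the end. In the real-time style the job can be stopped part-way, which is what makes quota enforcement possible.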

  4. APEL, Job Accounting Flow Diagram • [1] Build Job Accounting Records at site. • [2] Send Job Records to a central repository • [3] Data Aggregation

  5. Accounting for Grid Jobs • Build Job Records at Site • APEL maps grid users to their resource usage on local farms
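
The mapping step on this slide can be pictured as a join between what the batch system logged and what the grid layer knows about who submitted each job. The sketch below assumes both sides share a local job ID; the field names (`local_job_id`, `dn`, `vo`, `cpu_seconds`, `wall_seconds`) are hypothetical and are not taken from the APEL schema.

```python
def build_job_records(batch_events, grid_mappings):
    """Join batch usage with grid identity on the (assumed) shared local job ID."""
    identity_by_job = {m["local_job_id"]: m for m in grid_mappings}
    records = []
    for event in batch_events:
        identity = identity_by_job.get(event["local_job_id"])
        if identity is None:
            # No grid identity found: treat it as a purely local job and skip it.
            continue
        records.append({
            "local_job_id": event["local_job_id"],
            "dn": identity["dn"],
            "vo": identity["vo"],
            "cpu_seconds": event["cpu_seconds"],
            "wall_seconds": event["wall_seconds"],
        })
    return records
```

Jobs with no grid identity are dropped here so that only grid usage is reported; a site could equally keep them under a separate "local" label.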

  6. [Diagram: data flow into the GOC] • Job records in via R-GMA • Consolidation of data: 1 record per grid job (millions of records expected) • R-GMA MON: SQL query to accounting server, 1 query per hour • Summary data refreshed every hour (max. about 100K records per year) • Home page: user queries and graphs • On-demand accounting pages based on SQL queries to summary data
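
The consolidation step, from one record per job to a much smaller summary table, amounts to a grouped SQL query. The following is a sketch using Python's built-in `sqlite3` module with an in-memory database; the table and column names are assumptions for illustration and do not reproduce the schema used on the accounting server.

```python
import sqlite3


def summarise(job_records):
    """Collapse per-job rows (site, vo, month, cpu_s, wall_s) into one row per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_records "
        "(site TEXT, vo TEXT, month TEXT, cpu_s INTEGER, wall_s INTEGER)"
    )
    conn.executemany("INSERT INTO job_records VALUES (?, ?, ?, ?, ?)", job_records)
    rows = conn.execute(
        "SELECT site, vo, month, COUNT(*), SUM(cpu_s), SUM(wall_s) "
        "FROM job_records GROUP BY site, vo, month ORDER BY site, vo, month"
    ).fetchall()
    conn.close()
    return rows
```

Because the web pages query only the grouped rows, their cost depends on the number of site, VO and month combinations rather than on the number of jobs.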

  7. Accounting Home Page • http://goc.grid-support.ac.uk// • 127 sites publishing data (Oct 10 2005) • 3.9 million job records • ~100K records per week (period June – Oct 2005)

  8. Batch Support in APEL • Currently available in LCG 2.6: • OpenPBS, Torque, PBSPro and Vanilla PBS (~90% of sites in LCG/EGEE) • Load Share Facility, versions 5 and 6 (CERN, Italy) • Available in LCG 2.7: • Condor (Canada) • Sun Grid Engine in development (Imperial College)

  9. Demos of Accounting Aggregation • Global views of resource consumption. • LCG View • http://goc.grid-support.ac.uk/gridsite/accounting/tree/treeview.php • Shows aggregation for each LHC VO • Requirements driven by RRB / Kors Bos • Tier-1 and Country entry points • LHC VO only • All data normalised in units of 1000 · SI2000 · hour • Tabular summaries per Tier-1 / Country • GridPP View • http://goc.grid-support.ac.uk/gridsite/accounting/tree/gridppview.php • Shows aggregation for an EGEE partner • Prototype for EGEE View
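
As a worked illustration of the normalised unit, the sketch below assumes that normalisation means multiplying raw CPU time by the SpecInt2000 (SI2000) rating of the machine that ran the job. The function name and the example rating are hypothetical.

```python
def normalised_cpu(cpu_seconds: float, si2k_rating: float) -> float:
    """Convert raw CPU seconds to units of 1000 SI2000 hours (assumed formula)."""
    cpu_hours = cpu_seconds / 3600.0
    return cpu_hours * (si2k_rating / 1000.0)
```

Under this assumption, 10 CPU hours on a node rated at 1500 SI2000 count as 15 units, so work done on faster and slower farms can be summed on one scale.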

  10. LHC View: Data Aggregation For VOs per Tier1, per Country

  11. Aggregation of Data for GridPP

  12. Aggregation of Data for Tier2

  13. Data Aggregation at Site Level • Breakdown of data per VO per month showing Njobs, CPUt, WCT and record history • Total CPU usage per VO • Gantt chart • NB: Gaps across all VOs are consistent with scheduled downtimes in GocDB

  14. APEL Summary • APEL is not a banking system. • Job accounting AFTER the event; not in real time. • APEL is designed to build accounting records at a site • Supports PBS and LSF; SGE (done); Condor in development • Middleware independent. • Although APEL uses R-GMA in LCG/EGEE, it could quite happily use any other mechanism for transportation (e.g. MySQL, Web Services, GridFTP). • Can be deployed on other grids, e.g. OSG • Implementation is simple. • One database per site • One central repository • APEL provides high-level views of usage data • Can also show usage at the DN level with restricted access via ACLs (GridSite) • APEL has been running on the production EGEE grid for 1 year

  15. What Lies Ahead? • Challenges Ahead • World Wide Accounting Service for LCG

  16. Challenges Ahead • Recognise that accounting isn’t just about “job usage”; it’s about resource usage, which encompasses many things: • CPU Usage • Also Storage & Network Usage (Who should we talk to?) • How do we describe this data? • Luckily there is a GGF Usage Record which provides a generic description of resource usage • Are these descriptors stable? • Are they sufficient to describe the data? • Can we get Network and Storage people to use the same schema? • CPU is consumed; Storage is occupied and can be recycled • How important is accounting? • Compute resource viewed as a grid currency • Need a guarantee that the data has not been tampered with in an unfair way • How does normalisation fit into this? The concept of a raw usage record has no meaning if internal scaling is applied to heterogeneous farms. • GGF UR allows a “cost” descriptor • Do we need an agreement on cost?

  17. Challenges Ahead • Data Collection • Many implementations for collecting accounting data in the LCG world: • APEL/DGAS in EGEE • SGAS in SweGrid • Sites that implement their own systems (Fermilab: multiple grid job managers from different grids feed a single Condor pool) • Also OSG, who are interested in deploying APEL with their own transport mechanism. • Switching one for another doesn’t resolve the problem of data sharing across the project. • No mechanism is in place to share this data in a consistent way. • GGF is working on a Resource Usage Service • What would the model for data sharing look like? Low level or high level? • Low level: sensors publishing data via a web service? • High level: data collected within the infrastructure, aggregated in a meaningful way, then reviewed and approved before it can be passed on (Fermilab) • Some Tier-1 centres have concerns about data association: “LCG not EGEE”, “Will the service be separate?”

  18. Challenges Ahead (This Slide Highly Recommended by Jeff Templon) • Usage Reporting at what Level? • Anonymous level: how much resource has been provided to each VO • Aggregation across: VOs, Countries, Regions, Grids, Organisations • Granularity: summed over units of hours, days, weeks, months? • User Level Reporting? • If 10,000 CPU hours were consumed by the Atlas VO, who are the users that submitted the work? • Data privacy laws • A Grid “DN” is personal information which could be used to target an individual. • Who has access to this data and how do you get it? • Can CA policies change to support anonymous DNs and reverse DN mappings? • What are the consequences? Are there any lawyers in the audience?

  19. World Wide Accounting Service for LCG • Project involves combining results from all three peer infrastructures and presenting an aggregated view of resource usage for LHC VOs to the RRB • Peer Infrastructures in LCG • Open Science Grid + Others (Ruth Pordes, Philippe Canal, Matteo Melani) • Nordugrid (Per Oster, Thomas Sandholm) • LCG/EGEE (Kors Bos, Dave Kant) • Mailing list: GRID-ACCOUNTING@LISTSERV.RL.AC.UK

  20. Resource Usage Service • [Diagram: Web Service Container holding the Service Interface, RUS WS Application, ACL and DB] • Based on emerging GGF standards and Web Services • GGF UR, OGSI • An implementation exists in “Market for Computational Science”, a UK e-Science project • Use case might be: • A user invokes the query service through a web browser, using SSL for client authentication, to ensure that usage information at user level belongs to the user. The servlet sends the query to the RUS web service and gets the user data.
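
The access rule in this use case, that a user sees only their own user-level records unless an ACL grants more, can be sketched as a filter. This is an illustration only: `query_usage`, its parameters and the record layout are hypothetical and are not part of any GGF RUS interface, and `authenticated_dn` is assumed to have been taken from the SSL client certificate before the call.

```python
def query_usage(records, authenticated_dn, acl):
    """Return the usage records that the authenticated caller is allowed to see."""
    if authenticated_dn in acl:
        # DNs listed in the ACL (for example, a VO manager) may see every record.
        return list(records)
    # Everyone else sees only the records that carry their own DN.
    return [r for r in records if r["dn"] == authenticated_dn]
```

Keeping the check on the service side, rather than in the servlet, means the same rule applies to any client that reaches the service.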

  21. Possible Roadmap • Stage 1: Let’s try to get from each of the Tier-1s some summary records describing VO usage over a finite period of time • Before end 2005 • SweGrid, Fermilab and DGAS ARE providing data! • Stage 2: Centralised database with a web service interface (RUS) to publish/query accounting data (summary records) • Sometime in 2006 • Stage 3: Distributed databases with a complete RUS implementation including permission model. • Sometime early 2007

  22. Summary of Issues • Batch systems • PBS and LSF • Working on Condor, SGE • BQS - see below • Privacy • Different laws: personal data must not be transmitted over the net unencrypted • Personal information: it must not be possible to identify individuals or derive patterns of use. • This has blocked some sites from transmitting ANY data (legal paralysis). • Existing accounting • No need to do anything twice. APEL can accept aggregated data which can be plugged straight into the summary tables (per site, per day, per VO, njobs) • Multiple Grids - common portal • Similar to the existing accounting solution, other grids can publish in the GGF schema to a common repository • Grids should use a standard repository/interface • Working with SweGrid, GridIT, OSG
