
Accounting

Accounting. Dave Kant, CCLRC e-Science Centre. Outline: CPU, User-Level Accounting, Group and Role Accounting, Accounting Portal, Storage Accounting, RUS, Summary. Motivation: Accounting for Usage on the Grid.


Presentation Transcript


  1. Accounting Dave Kant CCLRC, e-Science Centre

  2. Outline • CPU • User-Level Accounting • Group and Role Accounting • Accounting Portal • Storage Accounting • RUS • Summary

  3. Motivation • Accounting for Usage on the Grid • If 10,000 CPU production years were consumed by a VO, who are the users that submitted the work? • Are the resources being utilised efficiently? • Kinds of accounting data • Anonymous, statistical accounting information of type “aggregated consumed CPU time per VO / site / month”. • This type of data is mostly uncritical and may be provided in a “world readable” way to scientific boards and communities. • VO-related • Can provide information about usage within the activities of the VO. Access restricted to members of the VO community. • User-related • This type of data can potentially be used to record and control the work of individual persons, and allows conclusions about his/her working methods, results, performance etc.

  4. User-Level Accounting • Focus of activities in EGEE/LCG has been in two areas: • JSPG User-Level Accounting Data Policy (D.Kelsey) • Discusses the treatment of accounting data • User Consent. • What is stored and where? • Who has Access rights? • For what purposes? • Implementation in middleware • Data Collection • APEL, DGAS • Confidentiality • Encryption • Reporting via Accounting Portal • Access control

  5. User-Level Accounting • Tier-1s are requested to publish encrypted User DNs. • The policy document isn’t official yet, so the information is not actually reported anywhere. • 18 sites have already done this: CESGA-EGEE, CIEMAT-LCG2, CNB-LCG2, CSCS-LCG2, FZK-LCG2, IFAE, IFCA-LCG2, INFN-TORINO, ITEP, JINR-LCG2, LIP-Lisbon, RAL-LCG2, RU-SPbSU, TAU-LCG2, UB-LCG2, UKI-NORTHGRID-LANCS-HEP, UPV-GRyCAP, USC-LCG2 • Most of these sites use APEL • Confidentiality is based on encrypting the UserDN using 1024-bit public key cryptography.

  6. Group and Role Accounting • Accounting at the User-Level isn’t enough. • Users wear many hats • A requirement to implement a fine-grained management of permissions and privileges in the Grid • “Support for VOMS roles in production services” • Chris Brew et al. WLCG Workshop, CERN, 23 Jan 2007 • Need for VOMS aware grid services • Access to computing resources e.g. Job priorities • Access to data and storage resources • Example • Define how CPU resources should be allocated to different VO-related activities • Allocate 60% of CPU resources to production at a site for the next three months • Measure what has been delivered • Accounting • Implementation for CPU Accounting • On CE via the Grid Accounting Mapping File (Patch #898) • Maps grid credentials to local batch resources • userDN, userFQAN (VOMS), CE_ID, grid_job_ID + LRMS_ID • Implemented for BLAH (gLite CE and CREAM) and LCG-CE • UserFQAN is derived from LCMAPS on the CE.
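The allocation example on this slide (60% of a site's CPU for production) can be compared with what accounting records show was delivered. The following is a minimal sketch, assuming job records are plain dictionaries with hypothetical `fqan` and `cpu_seconds` keys. The field names, the function, and the numbers are illustrative and are not taken from APEL.

```python
from collections import defaultdict


def cpu_share_by_fqan(records):
    """Return the fraction of total CPU time consumed under each FQAN."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["fqan"]] += rec["cpu_seconds"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {fqan: cpu / grand_total for fqan, cpu in totals.items()}


# Illustrative records only; the numbers are made up.
records = [
    {"fqan": "/cms/Role=production", "cpu_seconds": 6000.0},
    {"fqan": "/cms/Role=production", "cpu_seconds": 1500.0},
    {"fqan": "/cms/analysis", "cpu_seconds": 2500.0},
]

shares = cpu_share_by_fqan(records)
target = 0.60  # hypothetical allocation for production at this site
delivered = shares["/cms/Role=production"]
print(f"production share: {delivered:.0%} (target {target:.0%})")
```

Measuring the share per FQAN rather than per VO is what makes the comparison with a role-based allocation possible.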

  7. VOMS Group Tree • Many possible group hierarchies. • Regional groups • /vo/regionX/siteY • Activity-related groups • /vo/analysis/StandardModel • /vo/reconstruction • In future, O(100) distinct group/role combinations are expected • What we have seen so far: • /cms/Role=production/Capability=NULL • /lhcb/lcgprod/Role=NULL/Capability=NULL • /atlas/Role=lcgadmin/Capability=NULL • Breakdown of usage according to activity and VO • FQAN chain with primary and secondary components. • No activity information in the primary • Can only account usage to the VO • Example chain: /dteam/Role=NULL/Capability=NULL;/dteam/cern/Role=NULL/Capability=NULL
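To make the primary/secondary distinction concrete, here is a minimal sketch of splitting an FQAN chain into its components. It assumes the chain is a `;`-separated string in the form shown on this slide, with the first entry as the primary FQAN. The function names are hypothetical and this is not the parser used by APEL or the portal.

```python
def parse_fqan(fqan):
    """Split one FQAN into VO, group path, role and capability."""
    parts = [p for p in fqan.strip().split("/") if p]
    groups = [p for p in parts if "=" not in p]
    attrs = dict(p.split("=", 1) for p in parts if "=" in p)

    def value(key):
        # The string "NULL" marks an unset attribute in the examples above.
        v = attrs.get(key)
        return None if v in (None, "NULL") else v

    return {
        "vo": groups[0] if groups else None,
        "group": "/" + "/".join(groups),
        "role": value("Role"),
        "capability": value("Capability"),
    }


def parse_fqan_chain(chain):
    """Parse a ';'-separated chain; the first entry is the primary FQAN."""
    return [parse_fqan(f) for f in chain.split(";") if f.strip()]


chain = "/dteam/Role=NULL/Capability=NULL;/dteam/cern/Role=NULL/Capability=NULL"
primary, *secondary = parse_fqan_chain(chain)
print(primary)
print(secondary)
```

For the dteam chain the primary entry carries only the VO, so the subgroup is visible only in the secondary component.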

  8. Implementation in APEL • Data Collection via Sensors • Transportation (On-the-fly Encryption of UserDN) via R-GMA • High-level Aggregation and Reporting via Graphical Front-end • High-Level Reporting: Tables, Pies, Gantts, Metrics, Trees

  9. APEL Test Record • What you see in R-GMA: • GlobalJobID: https://lxb1762.cern.ch:9000/-cfWmtwoGI5CUwGrI5gVmA • UserDN: *APEL V.0.2* P7IMQIkPndWbCpeecX/iqqBhqYdsOp+DZl+9+o9GQ3BPXx8uzcHUqZOJQSq8F6KIczAvxSw+I8x8lwHJ6l9kxIyYbLTEkELI3Ul77I6zhzT90zUDLEpgsQ+0XY3tPRGwu5uG/ibtcWxefOtvZoM6FWwf8yZmv8yLpjmNkMxZOtI= • UserFQAN: /dteam/Role=NULL/Capability=NULL;/dteam/cern/Role=NULL/Capability=NULL; • ExecutingCE: lxb2034.cern.ch:2119/jobmanager-lcgpbs-dteam

  10. Is User FQAN Confidential? • User FQAN Chain readable in R-GMA by anyone with an IGTF approved certificate. • Should we encrypt this information?

  11. Data Life Cycle • From Site to Portal • Desirable to refresh the data shown on the portal at frequent intervals • Every 8 hours • Site Publish • Migration • Extract from R-GMA • Send data to offline Database • Off-line Data Processing • Decrypt UserDN • Extract Group and Role from UserFQAN chain • Off-line Aggregation • Build summaries of data: • Site-Level, VO-level, User-Level, intra-VO • [Diagram labels: R-GMA, MySQL HotCopy, HotCopyInsert (every hour), Offline DB; 30 million job records (2005-today)]
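The off-line aggregation step can be sketched as a group-and-sum over job records. This is a minimal illustration, assuming each record is a dictionary with hypothetical keys (`site`, `vo`, `month`, `cpu_seconds`, `wall_seconds`). The real APEL schema and the portal's aggregation code are not shown in the slides, and the sample values are made up.

```python
from collections import defaultdict


def summarise(records, keys):
    """Sum job count, CPU and wall-clock time for each combination of `keys`."""
    summary = defaultdict(
        lambda: {"jobs": 0, "cpu_seconds": 0.0, "wall_seconds": 0.0}
    )
    for rec in records:
        bucket = summary[tuple(rec[k] for k in keys)]
        bucket["jobs"] += 1
        bucket["cpu_seconds"] += rec["cpu_seconds"]
        bucket["wall_seconds"] += rec["wall_seconds"]
    return dict(summary)


# Made-up job records for illustration.
jobs = [
    {"site": "RAL-LCG2", "vo": "atlas", "month": "2007-01",
     "cpu_seconds": 3600.0, "wall_seconds": 4000.0},
    {"site": "RAL-LCG2", "vo": "atlas", "month": "2007-01",
     "cpu_seconds": 1800.0, "wall_seconds": 2000.0},
    {"site": "FZK-LCG2", "vo": "cms", "month": "2007-01",
     "cpu_seconds": 7200.0, "wall_seconds": 9000.0},
]

site_level = summarise(jobs, ("site", "month"))
vo_level = summarise(jobs, ("vo", "month"))
print(site_level)
print(vo_level)
```

The same function yields user-level or intra-VO summaries by choosing different keys, provided the records carry the corresponding fields.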

  12. Portal • Accounting Portal • Access rights according to the “Actor Model” • Five Actors: • Users, VO-Resources Manager, VO-Member, Site-Admin, GOC Admin • Three are implemented via DN proxy and ACLs • VO-Member access requires the VOMS proxy information. • Display Features • Build resource statistics as a function of User and FQAN • How much CPU was consumed by User X at Site Y • Top Ten Users in a VO • How much production work was done by VO X at Site Y

  13. VO Resource Manager • Table shows CPU, WCT and Job Eff. of the Top 10 Anonymised Users • Breakdown of Usage: DN / VO / Group / Role • Work done by Pablo Mayo, Javier Lopez at CESGA
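A table like the one described on this slide could be built as follows. This is a minimal sketch, assuming job efficiency is defined as CPU time divided by wall-clock time (WCT) and that users are anonymised by replacing the DN with a rank label. Both are assumptions, and the function and field names are hypothetical rather than the portal's implementation.

```python
def top_users(usage, n=10):
    """Rank users by CPU time and report CPU, WCT and efficiency (CPU / WCT)."""
    ranked = sorted(usage.items(), key=lambda item: item[1]["cpu"], reverse=True)
    rows = []
    for rank, (_dn, u) in enumerate(ranked[:n], start=1):
        efficiency = u["cpu"] / u["wct"] if u["wct"] > 0 else 0.0
        rows.append({
            "label": f"User {rank}",  # the DN itself is not shown
            "cpu": u["cpu"],
            "wct": u["wct"],
            "efficiency": efficiency,
        })
    return rows


# Made-up per-user totals in hours, keyed by placeholder DNs.
usage = {
    "dn-a": {"cpu": 50.0, "wct": 100.0},
    "dn-b": {"cpu": 90.0, "wct": 100.0},
    "dn-c": {"cpu": 10.0, "wct": 0.0},
}

for row in top_users(usage, n=2):
    print(row)
```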

  14. Issues / Status • T1s consider whether they should report their non-Grid Usage • Should we attempt to distinguish grid from non-grid usage? • No clear recommendations have been made to sites on this if they choose to publish • OSG • Deployment of Gratia accounting system across the CMS and ATLAS tier 1 and 2 centres. • Centralised collection of CPU usage records. • Aggregation to deliver Monthly summaries. • Started publishing March 2007 using MySQL client. • NorduGrid • Deployment of SGAS • Have published job records in the early part of 2006 • Comparison between CERN and APEL • Found large discrepancies in usage numbers, which were traced to a bug in multiple CE support in APEL. • Implemented a bug fix. • Waiting for Feedback • Storing job level information centrally may not scale forever. • Push back to sites and only store summaries centrally • DGAS already does this. • Do Pilot jobs and GLEXEC break user-level accounting? • Do we have an answer to this question? • Account for the VO…

  15. INFN Status • Status at end of Nov 2006. • Data from DGAS HLR has been inserted into the GOC accounting database using DGAS2APEL • Transformation from their HLR schema to APEL schema • Use APEL publisher • Tests performed by Rosario Piro (INFN) • ~ 2,000 Test records published for INFN-TORINO-DEV • To be deployed at Torino site first, then other Tier-2s • However, we are not sure why they have not deployed DGAS2APEL across their sites….

  16. Storage Accounting • Display of Storage Information published by GIPs into the information system. • Sensor queries top-level BDII -> MySQL • Capture information every two hours • See talk by Greig Cowan • Process Data • Used and Allocated storage Disk and Tape at the VO level • Visualisation • Limitations: • Only a high level view • No breakdown of these numbers at a deeper level • UserDN, VOMS Group and Role • File Transactions (Uploads, downloads) • Protocols

  17. http://goc02.grid-support.ac.uk/accountingDisplay/view.php • Select Resources via a Tree • Select time interval (last year, last month, last week, last day) • Select VOs, SEArchitecture

  18. Issues • Enhance the LHC view of storage resources • Tier-1 and Tier-2 tree • Extend breakdown per site to the RRD graphs • To show AverageUsage, AverageAllocated per Week or Month per VO per LHC Tier1. • Example: If VO used 50TB tape in the first week of Jan, and then deleted the data… • How do we get data from OSG? • Bring the Storage and CPU reporting together into a single Portal. • Still only a prototype, so it sometimes doesn’t run. • Need to check correctness of data • See Greig Cowan's talk • More work to be done on display • Comments welcome • Stability • Sensor dropouts address at WLCG • Migrate service to its own machine • What about gLite? • See Greig Cowan's talk • When will Glue Schema change? • What more can we learn at the BDII level?
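The 50 TB example above suggests why an average per week is more informative than a single snapshot. Below is a minimal sketch, assuming equally spaced samples (such as the two-hourly captures mentioned for the storage sensor) given as hypothetical `(timestamp, used_tb)` pairs. The data are made up and this is not the portal's code.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def weekly_average(samples):
    """Average equally spaced (timestamp, used_tb) samples per ISO week."""
    buckets = defaultdict(list)
    for ts, used_tb in samples:
        iso = ts.isocalendar()
        buckets[(iso[0], iso[1])].append(used_tb)
    return {week: sum(vals) / len(vals) for week, vals in buckets.items()}


# Made-up data: 50 TB on tape during the first week of January, then deleted.
start = datetime(2007, 1, 1)
samples = []
for i in range(14 * 12):  # two weeks of two-hourly samples
    ts = start + timedelta(hours=2 * i)
    samples.append((ts, 50.0 if ts < datetime(2007, 1, 8) else 0.0))

averages = weekly_average(samples)
for week, avg in sorted(averages.items()):
    print(week, avg)
```

A snapshot taken at the end of the second week would report no tape usage, whereas the weekly averages still show the 50 TB held during the first week.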

  19. RUS • RUS is a specification for a service used to publish and query resource information. • Can be useful for LCG because • Many partners have their own implementations for collecting resource information. • Cannot control how the partners implement their tools, but you can specify – through project specification – how data should be exchanged between them. • Collect usage from each partner, put the data together and give a global view of resource usage

  20. LCG-RUS • User/VO level Accounting • HTTPS-based authentication; • Every authenticated user automatically takes the role of Grid user; • A user claiming the role of “VO manager” has to be authorised against the query access control file; • Get aggregated usage information via the “extraction” service interface; • On-demand job usage tracing through the “tracing” service interface on usage storages deployed at sites

  21. RUS architecture • [Architecture diagram. Labels: Site, XML Job UR; RUS Request, RUS Response; RUS Implementation; XACML Policy authorization; RUS policy Authorizer; OGSA-DAI Request Wrapper; OGSA-DAI Request, OGSA-DAI Response; OGSA-DAI service; Server-side Aggregation Extensions; Extensions for RUS Implementations as usage Data Sources; Other RUS Implementations; Properties for User/vo/resource … accounts] • These properties are refreshed each time new data is inserted, in order to ensure efficient queries

  22. Summary • User level accounting is deployed and many sites are publishing. T1s are encouraged to deploy. • This is not sufficient for the role/group based accounting that VOs have requested. • New release of APEL in certification; together with the accounting mapping log file, FQAN accounting is possible. • Storage Accounting Visualisation tools displaying Used and Allocated disk per VO; LHC view is needed. • RUS Prototype is based on the concept of an Aggregated Usage Record. RUS Working group aiming for implementations based on the OGSA-DAI framework.
