Development of the Aggregated Topology Provider (ATP) started in 2009 during the EGEE project, with contributions from Steve Traylen, James Casey, David Collados, and others. ATP aggregates grid topology information from multiple sources, acts as a single authoritative source for that information, and manages groups of resources from a Virtual Organization (VO) perspective. It is composed of three main packages: ATP Sync, which periodically synchronizes data from the topology providers, MYWLCG ATP Web, which provides a web interface, and MYWLCG ATP API, which exposes ATP data through JSON/XML feeds. Its known problems are that it supports no topology entities other than services, that it duplicates PL/SQL code to cover both Oracle and MySQL, and that its relational database model is complex.
SAM Aggregated Topology Provider pedro.andrade@cern.ch 5 June 2013 IT/SDC/MI section meeting
History • Development started in 2009 during the EGEE project by Steve Traylen, James Casey, David Collados, and others • Many features/improvements added by the BARC team during 2011-12 • Maintained by me in the last months
Overview • ATP scope is: • Aggregate grid topology info from different sources • Single authoritative source of grid topology info • Manage groups of resources from VO perspective
Architecture (diagram) • Components shown: GOCDB, BDII, OSG, Central Operations Portal, VO Feeds, ATP Sync, MyWLCG ATP API, MyWLCG ATP WEB, Oracle, MySQL
Architecture • ATP is composed of 3 main packages: • ATP Sync: A Python-based package that periodically synchronizes data from various topology providers. It also includes the PL/SQL code for Oracle/MySQL and the Django model. • MYWLCG ATP Web: A front end for ATP developed in Django. It provides a web interface to display/find the topologies of grid resources. • MYWLCG ATP API: A front end for ATP developed in Django. It provides programmatic feeds that expose ATP data through JSON/XML interfaces.
Input • CIC Portal • VOs • VOMS • VO contacts • GOCDB (EGI) • Sites, Services • Flavours, Downtimes • Site and region contacts • RSV (OSG) • Sites, Services • Flavours, Downtimes • Capacity • GStat • Capacity • REBUS • WLCG federations • WLCG tiers • BDII • Service endpoints • Services/VOs mapping • MPI info • VO Feeds • VO groups of services
Clients • ATP WEB: POEM, NCG • ATP DB: MRS, ACE, MyWLCG
Source Code Repo: http://svnweb.cern.ch/world/wsvn/sam/trunk/atp/ Doc: http://sam-doc.web.cern.ch/sam-doc/atp/doc/build/html/
Configuration • Default configuration structure distributed in ATP package • atp_synchro.conf : main configuration file • atp_db.conf : database connection configuration • atp_logging_files.conf : location of log configuration file • atp_logging_parameters_config.conf : log configuration • roc.conf : list of enabled regions • vo_feeds.conf : list of enabled vo feeds
Execution • Cronjob running ATP daemon:
[root@samnag031 ~]# cat /etc/cron.d/atp-sync
50 * * * * edguser [ -f /var/lock/subsys/atp_synchro ] && ( /usr/bin/atp_synchro -d /etc/atp/atp_db.conf -c /etc/atp/atp_synchro.conf -l /etc/atp/atp_logging_files.conf ) > /dev/null 2>&1
• ATP sync execution is structured in synchronizers:
[root@samnag031 ~]# cat /etc/atp/atp_synchro.conf
cic_portal = Yes
gocdb_topology = Yes
gocdb_downtime = Yes
osg = Yes
osg_downtime = Yes
gstat = Yes
bdii = Yes
vo_feeds = Yes
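To check quickly which synchronizers a given configuration enables, the `key = Yes` lines shown above can be read with a few lines of Python. This is only a sketch: the function name `enabled_synchronizers` is hypothetical, and it assumes that the file has no section headers, that `#` starts a comment, and that any value other than `Yes` (case-insensitive) means disabled. None of this is confirmed by the ATP documentation.

```python
def enabled_synchronizers(conf_text):
    """Return the keys whose value is 'Yes' in atp_synchro.conf-style text."""
    enabled = []
    for raw in conf_text.splitlines():
        # Assumption: '#' starts a comment; drop it and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue  # blank line or something that is not a key = value pair
        key, value = (part.strip() for part in line.split("=", 1))
        if value.lower() == "yes":
            enabled.append(key)
    return enabled


# Content copied from the atp_synchro.conf listing on the slide.
SAMPLE = """\
cic_portal = Yes
gocdb_topology = Yes
gocdb_downtime = Yes
osg = Yes
osg_downtime = Yes
gstat = Yes
bdii = Yes
vo_feeds = Yes
"""

print(enabled_synchronizers(SAMPLE))
```

On a real node the text would come from `open("/etc/atp/atp_synchro.conf").read()` instead of `SAMPLE`.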
Logs • Log of last execution: /var/log/atp/atp.log • Log of all executions: /var/log/atp/atp_full.log (logrotate) • Errors are also sent to system logging • Six logging levels: • CRITICAL, ERROR, WARNING, INFO, DEBUG, NOTSET • Default configuration is INFO (20) • Standard log file line: • “2012-03-22 15:24:02,308 - ATP - INFO - CIC - Execution - Starting” • CIC: synchronizer name (e.g. CIC, GOCDB Topology, VOFeeds, etc.) • Execution: task type (e.g. configuration, validation, execution) • Starting: action description
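The standard log line can be split into its fields programmatically. The sketch below assumes that every field is separated by ` - ` and that the first three fields are the timestamp, the logger name, and the level, as in the sample line. The names `AtpLogRecord` and `parse_atp_log_line` are hypothetical and not part of ATP.

```python
import logging
from datetime import datetime
from typing import NamedTuple, Optional

# Numeric values of the six levels listed on the slide (INFO is 20).
LEVELS = {
    name: getattr(logging, name)
    for name in ("CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", "NOTSET")
}


class AtpLogRecord(NamedTuple):
    timestamp: datetime
    logger: str
    level: int
    synchronizer: str
    task: str
    action: str


def parse_atp_log_line(line: str) -> Optional[AtpLogRecord]:
    """Parse one atp.log line; return None if it does not match the format."""
    # Split on the first five separators only, so the action text may
    # itself contain ' - ' without being cut.
    parts = line.strip().split(" - ", 5)
    if len(parts) != 6:
        return None
    stamp, logger, level_name, synchronizer, task, action = parts
    try:
        timestamp = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None
    if level_name not in LEVELS:
        return None
    return AtpLogRecord(
        timestamp, logger, LEVELS[level_name], synchronizer, task, action
    )


record = parse_atp_log_line(
    "2012-03-22 15:24:02,308 - ATP - INFO - CIC - Execution - Starting"
)
print(record)
```

Keeping the level as a number makes threshold filters simple, for example `record.level >= logging.WARNING`.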
Debug Tips • The atp.log is quite useful to understand problems: • It will at least help to locate the affected synchronizer • However, ATP is based on many PL/SQL procedures/functions: • SQL Developer will help ;) • ATP synchronizes from distinct external data sources, so its execution can fail because of “invalid” or “not available” input data: • Check the “validation” tag in atp.log to understand which data source was not reachable or was providing invalid data
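The validation check described above can be automated by scanning atp.log and grouping validation messages by synchronizer. This is a minimal sketch under several assumptions: lines follow the ` - `-separated format of the standard log line, the task type appears as "Validation" (matched case-insensitively), and data-source problems are logged at WARNING or above. The function name `validation_problems` and the sample lines are made up for illustration and are not real ATP output.

```python
from collections import defaultdict

# Assumption: input-data problems are logged at one of these levels.
PROBLEM_LEVELS = {"WARNING", "ERROR", "CRITICAL"}


def validation_problems(lines):
    """Group validation messages at WARNING or above by synchronizer name."""
    problems = defaultdict(list)
    for line in lines:
        parts = line.strip().split(" - ", 5)
        if len(parts) != 6:
            continue  # skip lines that do not follow the standard format
        _stamp, _logger, level, synchronizer, task, action = parts
        if task.lower() == "validation" and level in PROBLEM_LEVELS:
            problems[synchronizer].append(action)
    return dict(problems)


# Hypothetical sample lines, invented for this example.
SAMPLE_LINES = [
    "2012-03-22 15:24:02,308 - ATP - INFO - CIC - Execution - Starting",
    "2012-03-22 15:24:05,101 - ATP - ERROR - GOCDB Topology - Validation - source not reachable",
    "2012-03-22 15:24:09,517 - ATP - INFO - VOFeeds - Validation - Finished",
]

for name, messages in validation_problems(SAMPLE_LINES).items():
    print(name, messages)
```

Any iterable of lines works, so an open file handle on /var/log/atp/atp.log can be passed directly.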
Problems • No support for other topology entities • Designed to monitor only services • Services check • Strict dependency on services declared in GOCDB, OIM • Duplication of PL/SQL code • Difficult to manage two versions for Oracle and MySQL • Complex relational database model • e.g. isdeleted flags
Suggestions • ATP was started in Sep 2009… 4 years ago • Perhaps it is ready for retirement • The grid topology is always evolving • Perhaps less focus on state, and more on history • Support for two RDBMS is hard • Perhaps no RDBMS can be even better