Jeff Naughton March 14, 2005 Leveraging Database Technologies in Condor
Overview • Introducing ourselves • How we got involved • What we are doing and what we hope to do • Request for input
Who we are • Faculty: David DeWitt, Jeff Naughton • Students: Jiansheng Huang, Ameet Kini, Christine Reilly, Eric Robinson, Srinath Shankar, Lakshmikant Shrinivas
Wisconsin DB Group • A world-leading DB research group for over 20 years. • Strong presence in: • Research publications. • Grads on faculty at top schools (Berkeley X 2, Cornell X 2, CMU) • Grads at top industrial DB research centers (IBM Almaden, MS Research) • Grads in development organizations of main DB companies (IBM DB2, Oracle, MS SQL Server) • History of influential software artifacts (WiSS, Gamma, Exodus, SHORE, Paradise)
So how did we get to Condor/Paradyn week? • 4th floor of CS building: 4361 Naughton, 4367 DeWitt, 4369 Livny (adjacent offices!) • Miron was very persuasive. His algorithm: 1. Enter our offices. 2. Describe some challenging and interesting data management problem Condor faces or will face. 3. Leave office, get on airplane. 4. Return to Madison, go to 1.
Why Condor and DBMS? • Premise: A running Condor system is awash in data: • Operational data • Historical data • User data • DBMS technology can help capture, organize, manage, archive, and query this data.
Three potential levels of involvement • Passively collect and organize data, expose it through DB query interfaces. • Move/extend some data-related portions of Condor to DBMS (Condor writes to and reads from DBMS) • Provide services to help users manage their data.
Why do this? • For Condor developers: • Easier to troubleshoot and debug the system; • Easier to implement new functionality; • Less time hassling with data management issues; • The power of a declarative data management language; • Easier to make data management aspects of the system scalable; • Leverage 25 years of DBMS research on scalable data management.
Why do this? • For Condor administrators: • Easier to analyze and troubleshoot; • Easier to audit; • Easier to explore current and past system status and behavior.
Why do this? • For Condor users: • An ever-improving system due to more productive developers and administrators. • Easier to monitor and understand performance of their jobs. • Easier to analyze history of their use of the system. • Complete record of every job they have submitted, and everything that happened to every job while it was running. • Support for detailed data lineage queries. • Data management facilities to assist them in handling large, complex, inter-related data sets.
Our projects and plans • Quill: Transparently provides a DBMS query interface to job_queue and history data. [status: ready to deploy!] • CondorDB: Transparently captures and provides an interface to critical data from all Condor daemons. [status: partial prototype working in our own “sandbox”]
Longer-term plans • Tight integration of DBMS technology and Condor [status: thinking hard!]. • DBMS-inspired data management services to help Condor users manage their own data. [status: thinking really hard!]
Why doesn’t Condor currently use DBMS technology? • Simple answer: Condor and DBMSs “grew up” together. • Condor project started 1986. • Postgres project started 1986. • Now both are ready for each other.
Project 1: Quill • Non-invasive approach to capturing job-related information • Works by sniffing updates to the job queue log • Serves condor_q and condor_history queries • Independent, reliable, and efficient querying of job-related information • So how does it work?
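Quill Architecture [Diagram: the Master oversees the Startd, …, Schedd, and Quill daemons. Edge labels: the Schedd "Write events" to the job queue log; Quill "Get new events" from the log and "Store events" in the RDBMS queue and history tables.]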
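The log-sniffing flow can be sketched in a few lines. This is a minimal illustration, not Quill's implementation: the record format (`SET <job> <attr> <value>`, `REMOVE <job>`), the table names, and the functions `open_store` and `poll_log` are all hypothetical, and SQLite stands in for the RDBMS. The one idea it shows is that the sniffer keeps a bookmark (a byte offset into the log) in the database, so each poll applies only the records written since the previous poll.

```python
import os
import sqlite3
import tempfile


def open_store(path=":memory:"):
    """Create a hypothetical queue table plus a one-row bookmark table."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS queue ("
               "job_id TEXT, attr TEXT, value TEXT, PRIMARY KEY (job_id, attr))")
    db.execute("CREATE TABLE IF NOT EXISTS bookmark (log_offset INTEGER)")
    if db.execute("SELECT COUNT(*) FROM bookmark").fetchone()[0] == 0:
        db.execute("INSERT INTO bookmark VALUES (0)")
    db.commit()
    return db


def poll_log(db, log_path):
    """Apply records appended since the last poll; return how many were applied."""
    offset = db.execute("SELECT log_offset FROM bookmark").fetchone()[0]
    applied = 0
    with open(log_path, "rb") as log:
        log.seek(offset)
        while True:
            line = log.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a record still being written: retry later
            offset += len(line)
            fields = line.decode().split()
            if len(fields) >= 4 and fields[0] == "SET":
                db.execute("INSERT OR REPLACE INTO queue VALUES (?, ?, ?)",
                           (fields[1], fields[2], " ".join(fields[3:])))
                applied += 1
            elif len(fields) == 2 and fields[0] == "REMOVE":
                db.execute("DELETE FROM queue WHERE job_id = ?", (fields[1],))
                applied += 1
    # The bookmark and the applied records are committed together.
    db.execute("UPDATE bookmark SET log_offset = ?", (offset,))
    db.commit()
    return applied


# Demo: the "schedd" appends to the log; the sniffer polls twice.
log_path = os.path.join(tempfile.mkdtemp(), "job_queue.log")
with open(log_path, "w") as f:
    f.write("SET 1.0 Owner alice\nSET 1.0 JobStatus 1\n")
db = open_store()
first = poll_log(db, log_path)
with open(log_path, "a") as f:
    # The last record is deliberately incomplete (no trailing newline).
    f.write("SET 1.0 JobStatus 2\nSET 2.0 Owner bob\nREMOVE 2.0\nSET 3.0 Own")
second = poll_log(db, log_path)
rows = db.execute(
    "SELECT job_id, attr, value FROM queue ORDER BY job_id, attr").fetchall()
```

Because the sniffer only reads the log, the process that writes it needs no changes, which is the sense in which the approach is non-invasive.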
Querying Job-Related Information [Diagram: the Master, …, Startd, Schedd, and Quill daemons. condor_q queries go to the Schedd ("Querying an already busy schedd!!"); condor_q++ queries go to the RDBMS ("Independent and a more powerful query functionality").]
Quill benefits • Robustness: Monitored by the master just like other Condor daemons, so it is resilient to failure • Independence: Not in the critical path of any other Condor daemon • Performance: Uses SQL to serve job-related queries an order of magnitude faster • Functionality: A broader range of queries • Extensibility: Easy to add more complex queries • Downside: Only handles job queue and history data.
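To make "a broader range of queries" concrete, here is a small sketch of the kind of question that becomes a one-line SQL statement once queue data sits in a relational table. The `jobs` schema, its column names, and the sample rows are assumptions made for illustration (they are not Quill's schema), and SQLite again stands in for the RDBMS.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical queue table; the columns are illustrative only.
db.execute("CREATE TABLE jobs (cluster_id INTEGER, proc_id INTEGER, "
           "owner TEXT, status TEXT, run_seconds INTEGER)")
db.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?, ?)", [
    (1, 0, "alice", "running", 120),
    (1, 1, "alice", "idle", 0),
    (2, 0, "bob", "running", 900),
    (3, 0, "alice", "running", 300),
])


def running_summary(db):
    """Per-owner count and total run time of running jobs."""
    return db.execute(
        "SELECT owner, COUNT(*), SUM(run_seconds) FROM jobs "
        "WHERE status = 'running' GROUP BY owner ORDER BY owner").fetchall()


def long_running(db, min_seconds):
    """Running jobs that have accumulated at least min_seconds of run time."""
    return db.execute(
        "SELECT cluster_id, proc_id, owner FROM jobs "
        "WHERE status = 'running' AND run_seconds >= ? "
        "ORDER BY cluster_id, proc_id", (min_seconds,)).fetchall()


summary = running_summary(db)
slow = long_running(db, 300)
```

Both questions are answered by the database alone, so neither adds load to the schedd.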
Project 2: CondorDB • CondorDB is a passive approach to capturing operational data in a Condor pool • Modified daemons log events to the database at run time, with no log sniffing • A central database serves the entire pool • Web-based query GUI
Data Capture in CondorDB [Diagram labels: Schedd, Shadow, Startd, Starter, Negotiator, A Machine, Database] • Condor daemons augmented to record important events in a database • Database is in addition to standard daemon logs • Pool will run unaffected even in the absence of a database
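One way to read "in addition to standard daemon logs" and "unaffected even in the absence of a database" is as best-effort event recording: always write the ordinary log line, try the database insert, and carry on if the insert fails. The sketch below illustrates that pattern under stated assumptions; `EventRecorder`, the `events` table, and the event names are hypothetical and are not taken from CondorDB.

```python
import logging
import sqlite3


class ListHandler(logging.Handler):
    """Collects log messages in memory so the demo can inspect them."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(record.getMessage())


class EventRecorder:
    """Writes each event to the daemon log and, best effort, to a database."""

    def __init__(self, connect, logger):
        self._connect = connect  # zero-argument callable returning a connection
        self._logger = logger
        self._db = None

    def record(self, daemon, event, detail):
        # The standard daemon log is always written first.
        self._logger.info("%s %s %s", daemon, event, detail)
        try:
            if self._db is None:
                self._db = self._connect()
                self._db.execute("CREATE TABLE IF NOT EXISTS events "
                                 "(daemon TEXT, event TEXT, detail TEXT)")
            self._db.execute("INSERT INTO events VALUES (?, ?, ?)",
                             (daemon, event, detail))
            self._db.commit()
            return True
        except sqlite3.Error:
            self._db = None  # drop the connection; reconnect on the next event
            return False


def unreachable():
    """Simulates a database that cannot be contacted."""
    raise sqlite3.OperationalError("database unreachable")


handler = ListHandler()
logger = logging.getLogger("condordb_sketch")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

down = EventRecorder(unreachable, logger)
up = EventRecorder(lambda: sqlite3.connect(":memory:"), logger)
down_ok = down.record("schedd", "job_submitted", "1.0")
up_ok = up.record("startd", "job_started", "1.0")
```

The caller never sees a database exception, so the daemon's own work does not depend on the database being reachable. The trade-off in this sketch is that events raised while the database is down appear only in the daemon log.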
CondorDB User Interface [Diagram label: Database] • Users can access Condor through a web interface • Job queue, job history, machine info, match and reject info, aggregates and summaries, etc. • The web server queries the database with PHP
Matchmaking data at your fingertips [Screenshots: Matches, Rejects]
The data-centric approach makes many tasks easier • Privacy enhanced by presenting each user with queue/history information about her jobs only • Intuitive “drill-down” navigation to get increasingly detailed information • All information about a job, from submit time until the present, available from a single screen • Useful summary information presented in tabular and graphical format • Optionally query the database directly for ad hoc information on job queue, job history, matchmaking, and file usage
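The per-user view and the drill-down can both be expressed as queries scoped by owner. The sketch below assumes a hypothetical two-table layout (`jobs` and `events`) with made-up rows, and uses SQLite in place of the pool's database. The slides say the real interface is PHP-based, so this shows only the query pattern, not the actual GUI code.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical schema and sample data, for illustration only.
db.executescript("""
CREATE TABLE jobs (job_id TEXT PRIMARY KEY, owner TEXT);
CREATE TABLE events (job_id TEXT, at INTEGER, event TEXT);
INSERT INTO jobs VALUES ('1.0', 'alice'), ('2.0', 'bob');
INSERT INTO events VALUES
  ('1.0', 100, 'submitted'), ('1.0', 130, 'matched'), ('1.0', 135, 'started'),
  ('2.0', 110, 'submitted'), ('2.0', 140, 'rejected');
""")


def my_jobs(db, user):
    """Top level: one summary row per job owned by this user."""
    return db.execute(
        "SELECT j.job_id, COUNT(e.event) FROM jobs j "
        "LEFT JOIN events e ON e.job_id = j.job_id "
        "WHERE j.owner = ? GROUP BY j.job_id ORDER BY j.job_id",
        (user,)).fetchall()


def job_timeline(db, user, job_id):
    """Drill-down: every event for one job, but only if the user owns it."""
    return db.execute(
        "SELECT e.at, e.event FROM events e "
        "JOIN jobs j ON j.job_id = e.job_id "
        "WHERE j.owner = ? AND e.job_id = ? ORDER BY e.at",
        (user, job_id)).fetchall()


overview = my_jobs(db, "alice")
own = job_timeline(db, "alice", "1.0")
other = job_timeline(db, "alice", "2.0")
```

Putting the owner filter inside every query, rather than in the page that renders the result, means a request for someone else's job returns nothing instead of relying on the interface to hide it.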
Acknowledgement • The Condor team has been wonderfully responsive and supportive throughout this effort.
Demos! • Come see demos of Quill and CondorDB in room 4360 CS on Wed. afternoon.
Virtuous Cycle • As we learn where Condor can use DBMS technology, we also learn where DBMS technology can be (must be?) improved. • Support for dynamic-schema sparse data sets. • Extreme requirements of self-installation and self-maintenance. • Pushing match-making style operations into DBMS. • Improving DBMS technology will lead to more places that it can be installed.
Request • We want your input! • We have a lot of ideas but want to filter, modify, and augment them through the benefit of your experience. • Send mail to naughton@cs.wisc.edu anytime.