This guide gives instructions for building and modifying Condor, a distributed job scheduler. It covers disk space requirements, UNIX and Windows specifics, and required tools such as GNU make, GCC, and Perl. It walks through downloading, unpacking, and building Condor, describes common problems related to GCC and glibc versions along with their solutions, and shows how to configure the build to manage dependencies on external packages.
Nick LeRoy Computer Sciences Department University of Wisconsin - Madison nleroy@cs.wisc.edu http://www.cs.wisc.edu/condor Building and Modifying Condor
Before I start … • If you have any questions, stop me along the way • There should hopefully be time for discussions after the talk • Feel free to talk to me, or any of the Condor developers, any time during the conference • Todd will give the last part of the talk • Windows specifics
Space Requirements • 5 GB is probably enough • The actual amount depends on the features built • Bare minimum is 2 GB • Temporary space is required for building externals; it is cleaned up automatically
UNIX Requirements • Most tools are standard on Linux development systems • In other cases, they can be downloaded as binaries • Or, downloaded as source and built by hand
UNIX Requirements List • GNU tools: • GNU make • GNU autoconf and autoheader (2.59 or greater) • GNU tar (1.13 or higher) • GNU Compiler Collection (gcc >= 2.95.3) • gzip • Other tools: • perl (5.005_03 or greater) • patch (must support unified diffs, GNU patch is preferred) • strip (can be either GNU or the vendor's version) • lex • yacc (or GNU bison) • some other typically-found utilities (for example, cut, awk, etc.)
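A quick way to check the list above before starting is to ask the shell which tools it can find. The sketch below is not part of the Condor distribution: the function name `missing_tools` is made up for illustration, and it only checks that each tool is on the PATH, not that its version is new enough.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Condor): print the name of every
# tool from the argument list that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools named in the requirements list; bison may stand in for yacc.
missing_tools make autoconf autoheader tar gcc gzip perl patch strip lex yacc cut awk
```

An empty result means every tool was found; versions still need to be checked by hand (for example with `autoconf --version`).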
Getting it • Download from the same place that you download the rest of Condor • In the form of a gzip-ed “tarball” • Unpack the tarball • If you don’t know how to do this, try: $ tar xzf condor_src-7.1.0-all-all.tar.gz
First Glance • BUILD-ID • NMI build ID, you can ignore this • config and imake • Yes, we still use imake • The rest of the world wisely abandoned it years ago … • You can probably ignore these • Adds requirement: GNU cpp <= 4.1.3 • LICENSE-2.0.txt • Copy of the Apache License, Version 2.0 • The license under which we’ve released Condor
Interesting Pieces • README.building • Document describing building Condor • NTconfig • Files required for building under Windows • externals • Externally maintained packages • Some are “hard” requirements, others “soft” • src • The Condor source code
Simple Build • The basic Condor build is simple: $ cd src $ ./build_init $ ./configure $ make
Didn’t work? • Most common problem is that you’re trying to build on a system that we haven’t ported the Standard Universe to • Solution: Disable the standard universe and try again $ ./configure --disable-full-port \ --disable-gcc-version-check $ make
Externals • Always have your bags packed • Bags are getting pretty big these days • Globus, ClassAds, PCRE, zlib, Kerberos • Externals and their versions are selected by configure • To use system packages: $ ./configure --enable-proper • “All or nothing” • Some features (in particular Condor-G) will be disabled • We’re working on making this selective • Externals tree selected by: $ ./configure --with-externals=/path/tree
First look at src • CODING_GUIDELINES • condor_* • Directories with most of the source code • In the future, we’ll rename them and get rid of the condor_ prefix • Also: h • We’ll look at more of these later
Configuring the build • Uses GNU configure • Some options, like, --prefix don’t work • Make sure that the cpp you use isn’t >= 4.2 $ export CXXCPP=/usr/bin/cpp-4.1 $ ./configure • Default: $ ./configure
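The cpp version limit can be checked before running configure. This is a minimal sketch, not something the Condor build provides: the function name `cpp_version_ok` is invented here, and it assumes your cpp accepts `-dumpversion` the way the gcc driver does.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a "major.minor[.patch]" version string
# is older than 4.2, the limit the imake-based build imposes on cpp.
cpp_version_ok() {
    major=${1%%.*}          # text before the first dot
    rest=${1#*.}            # text after the first dot
    minor=${rest%%.*}       # text before the next dot, if any
    [ "$major" -lt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -lt 2 ]; }
}

# Assumption: cpp accepts -dumpversion the way the gcc driver does.
version=$(cpp -dumpversion 2>/dev/null || true)
if [ -n "$version" ] && ! cpp_version_ok "$version"; then
    echo "cpp $version is too new; point CXXCPP at an older cpp" >&2
fi
```

If the warning appears, set CXXCPP to an older preprocessor before running ./configure, as in the export line on the slide.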
Minimal configuration • To save disk space and build time, pass --without-xxx or --disable-xxx for the features you don’t care about • Use ./configure --help to get a list of them • Packages listed as “hard requirement” can’t be turned off • There are some interdependencies $ ./configure --without-globus --without-nordugridgahp --without-unicoregahp --without-gt4gahp --without-srb --without-oci --without-gcb --without-gsoap --without-drmaa --without-gahp --without-blahp --disable-full-port
Some Problems & Solutions: Unknown GCC version configure: error: Condor will not compile with gcc version 4.2.1 • Try: $ ./configure --disable-gcc-version-check • The build itself may fail due to compiler incompatibilities
Some Problems & Solutions: Unknown glibc version checking glibc... ERROR configure: error: Condor does NOT know what glibc external to use with glibc-2.6.1 • Edit (yeah, with vi or emacs) configure.ac • Around line 2500, add a block for your glibc version (cut & paste from nearby): "2.6.1" ) # OpenSUSE 10.3 uses glibc 2.6.1 including_glibc_ext=NO ;; • Rerun ./build_init for this to take effect
Build it • From the src directory: $ make • Will build the externals as required • Go get a beverage – this could take quite a while
Build Problems & Solutions: Error in ClassAds external classads-1.0rc5: FAILED! (see /home/condor-7.1.0/externals/build/log.classads-1.0rc5) • Disable ClassAds in configure: $ ./configure --without-classads • condor_q -better-analyze will be broken
Build Problems & Solutions: Error building other externals xxxx-1.2.3: FAILED! (see /home/condor-7.1.0/externals/build/log.xxxx-1.2.3) • Disable xxxx in configure: $ ./configure --without-xxxx • If this is a “hard requirement” or you rely on this feature: • Look in the above log and correct the problem
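When several externals fail, it can help to pull the log paths out of the build output rather than hunting for them by eye. The sketch below assumes each failure message sits on one line in the form shown above; the function name `failed_external_logs` and the file name build.out are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read build output on stdin and print the log path
# named on each "FAILED! (see <path>)" line.
failed_external_logs() {
    sed -n 's/.*FAILED! (see \(.*\))[[:space:]]*$/\1/p'
}

# Example: show the end of each failure log.
# build.out is an assumed file name, e.g. from: make 2>&1 | tee build.out
if [ -f build.out ]; then
    failed_external_logs < build.out | while read -r log; do
        echo "==== $log"
        tail -n 40 "$log"
    done
fi
```

The last lines of each log are usually where the compiler or configure error for that external shows up.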
Build Problems & Solutions: Standard Universe /tmp/IIf.0twp5X:114:6: error: #error Checkpoint library not compatible with compiler! ../../imake/imake: Exit code 1. Stop. • Standard Universe features haven’t been ported to this compiler / platform yet. $ ./configure --disable-full-port
It built! make[1]: Nothing to be done for `all'. make[1]: Leaving directory `/home/build/condor-7.1.0/src/condor_examples' $ make release …
Build targets • Testing release • $ make release • Suitable for testing • Creates release_dir • Public release • What we actually release to the public • $ make public • Packaged tarballs wind up in ../public
Test It • We’ll create a test installation of our Condor build • We built Condor in /home/condor-7.1.0 • We’ll make our test directory a subdirectory of that • /home/condor-7.1.0/install • Install Condor from release_dir, just like you would any other Condor install • Or …
Test Installation (Step by step) $ CONDOR=/home/condor-7.1.0/install $ mkdir $CONDOR $ cd $CONDOR $ mkdir checkpoints cred_dir execute spool log test $ ln -s ../release_dir/* . $ cp etc/examples/condor_config.generic etc/condor_config $ export CONDOR_CONFIG=$CONDOR/etc/condor_config $ vi $CONDOR_CONFIG $ export PATH=$CONDOR/bin:$CONDOR/sbin:$PATH $ hash -r $ condor_master
Simple checks • Run ‘ps’, verify that the Condor processes are running • Run condor_status -any • Run condor_status to verify that the Startd’s machine is correct • Make sure that you wait a bit for the Startd to publish its ad(s) • Look through the logs • Submit a simple “hello world” test job, verify that it runs as expected
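For the “hello world” job in the last check, something like the following may be enough. It is a sketch under assumptions: the file names and scratch directory are arbitrary, and it assumes a single-machine test pool where the vanilla universe can run /bin/echo directly.

```shell
#!/bin/sh
# Write a minimal submit description for a "hello world" job.
# File names and the scratch directory are arbitrary choices.
workdir=$(mktemp -d)
cat > "$workdir/hello.sub" <<'EOF'
universe   = vanilla
executable = /bin/echo
arguments  = hello world
output     = hello.out
error      = hello.err
log        = hello.log
queue
EOF

# Submit only if the tools from the test installation are on the PATH.
if command -v condor_submit >/dev/null 2>&1; then
    (cd "$workdir" && condor_submit hello.sub && condor_q)
fi
echo "submit file: $workdir/hello.sub"
```

Once the job leaves the queue, hello.out in the scratch directory should contain the job’s output and hello.log should record its execution.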
More tests • We have a whole suite of tests $ cd condor_tests $ make $ ./batch_test.pl -b IsThisNightly passed <…/src/condor_tests> Workspace testing … submitting . tests lib_chirpio_van.run succeeded lib_procapi_pidtracking-snapshot.run succeeded … • Wait patiently (very patiently)
Use the source, Luke • Libraries • Daemon Core • Client (command line) Tools • Daemons • Standard Universe • Other
Source Directories • Most of the directory names are pretty clear • We’re in the process of cleaning up, moving things around, and renaming, so be prepared for changes over time • GIT is finally giving us this freedom • Quite a few have version numbers in the name that make little or no sense to the outside world (condor_startd.V6, …) • This will get cleaned up, too
Layering • Master, Quill, Startd, Shadow, Starter, Collector • Submit, Q, tools, etc. • ClassAds, I/O, Daemon Client, Daemon Core, ProcAPI, SysAPI • C++ Utilities, C Utilities • “h”, includes
Condor Libraries • The layering is not perfect; there are interdependencies • General purpose: • condor_util_lib • condor_c++_util • I/O & Networking: • condor_io • condor_daemon_client • Process Tracking: • condor_procapi • System Information: • condor_sysapi • ClassAds: • condor_classad • Daemon Core: • condor_daemon_core.V6
C / C++ Utilities • In general, there’s a utility for everything • POSIX and stdio library wrappers • C++ Standard library replacements • Condor templates (CTL) • We don’t use STL for hysterical reasons • Designed to be portable • Look here before reinventing the wheel
C: dprintf() • Works like printf() • Conditionally writes to the log dprintf(D_ALWAYS, "Two + two is %d\n", 2+2); • OR levels together to log at several of them, e.g. dprintf(D_COMMAND|D_SECURITY, <…>); • Useful debug levels • D_ALWAYS • D_FULLDEBUG • Everything else is probably too esoteric (see condor_debug.h)
C++: MyString.h • Similar to STL’s string • Prefer MyString buffer to char buffer[1024] • automatically allocates and resizes memory • Notable methods / operators: • sprintf() and sprintf_cat() • Value() and GetCStr() – read-only access • += is overloaded to append a lot of types to the string • perl-like chomp() and trim() to get rid of whitespace • readLine() that can slurp in data from a FILE* and ostreams • replacement for strtok() • Other tricks • search for substrings • escape characters
C++: Configuration • Lookup values from the configuration • NOT a ClassAd! • Basic: param(const char *name) • Returns a char * that you must decode manually • You MUST free() this buffer! • Others: param_<type>(<name>) • Decodes to the specified type, and free()’s the buffer • Does NOT handle expressions! • Integer: param_integer(<name>) • Double: param_double(<name>) • Boolean: param_boolean(<name>)
C++: Boolean Configuration Expressions • Boolean Expression: param_boolean_expr(<name>) • This one does handle expressions • Configuration: WIZBANG = ( FUBAR > 10 || SUPERCALIFRAGILISIC ) • Source Code: bool wizbang = param_boolean_expr( "WIZBANG" );
More C & C++ • Wrappers and similar: • safe_open_wrapper(), my_popen() • “CTL” • ExtArray, string_list, Queue, tree, stringSpace, counted_ptr • A lot of other classes & functions • File / Directory access classes: Directory, StatInfo • exponential_backoff • my_hostname(), my_username()
Condor I/O & Networking • All Condor daemons have a “Command Socket” • Data is encoded with CEDAR • Condor External DAta Representation • CEDAR is all-singing, all-dancing • Data representation • socket abstraction • Security • bandwidth limiting • port ranges
Stream, Sock, et al. • The layering of the Condor socket objects is not obvious • Stream (base class, in stream.{h,C} ) • CEDAR streaming • Integers, chars, strings, etc. • Sock (derived from Stream, in sock.{h,C} ) • Adds connection / session management • ReliSock (derived from Sock, in reli_sock.{h,C} ) • TCP-specific “Sock” • SafeSock (derived from Sock, in safe_sock.{h,C} ) • UDP-specific “Sock”
Daemon Client • Series of classes with knowledge of how to communicate with specific daemons • Master, Collector, Startd, etc. • All derived from a common base
ClassAds • C++ API to access the ClassAds that Condor uses internally • “Old” ClassAds • Subclassed from AttrList, so look there • Lookup() versus Eval() • Lookup() will return “7 + 2” • Eval() will return 9 • ClassAds are parsed to ExprTree(s) • Can generally avoid this and use Eval<Type> • Insert() and Assign() to update the ad • sPrint(), fPrint(), and dPrint() to serialize
Condor Daemons • The code for most Condor daemons is in directories named after the daemon: • Startd is in condor_startd.V6 … • Note: 2 sets of starters / shadows • condor_starter.V5 and condor_shadow.V6 • Standard Universe • condor_{starter,shadow}.V6.1 • All others
Daemon Core • Heart and body of a Condor daemon • Usually a singleton object • Event-driven loop around select() • Single threaded! • Your code registers events for select() and callbacks • Timers, Pipes, Signals, Reaper, Socket, CEDAR “Commands”
Registering a Callback • Use Daemon Core’s Register_Command() method: daemonCore->Register_Command(128, "SAY_HELLO", (CommandHandler)&say_hello, "say_hello", NULL, READ, D_FULLDEBUG ); • Parameters: • The command number (usually defined in condor_commands.h and condor_commands.C) • Text description of the command • "CommandHandler", which is really a function pointer • Text description of the handler • The service class to use; since this is a C handler, we don't need one • The permission level a caller needs in order to invoke this command (e.g. HOSTALLOW_READ, HOSTALLOW_ADMINISTRATOR, etc.) • What dprintf() level to use
Some guidelines • You must not • Throw an exception • Call printf() or exit() or assert() • You can: • call ASSERT() • call dprintf()
Dependency Hell • Dependencies work on Windows • On UNIX, our build system has no knowledge of dependencies • If you modify an include file, make sure that everything that depends on it gets rebuilt • $ make clean && make
More on Dependencies • Objects from some directories need to get “repackaged” with the C++ library • condor_classads • condor_daemon_client • Thus, to rebuild these: • $ make && make -C ../condor_c++_util
(Even) More on Dependencies • If you’re working on a daemon and make a library change • Example daemon: Startd in the condor_startd.V6 directory • Example library: condor_daemon_client $ make -C ../condor_daemon_client && make -C ../condor_c++_util && make release • If you modified dc_startd.h and want to be paranoid: $ (cd ../condor_daemon_client && make clean && make) $ (cd ../condor_c++_util && make clean && make) $ make clean && make release
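Since the build system will not chase these dependencies for you, the rebuild sequence above can be wrapped in a small shell function. This is only a sketch: `rebuild_with_libs` is a made-up name, and the MAKE override exists purely so the sequence can be previewed without building anything.

```shell
#!/bin/sh
# Hypothetical helper: rebuild each library directory given as an argument,
# in order, then run "make release" in the current (daemon) directory.
# Set MAKE="echo make" for a dry run that only prints the commands.
rebuild_with_libs() {
    for dir in "$@"; do
        ${MAKE:-make} -C "$dir" || return 1
    done
    ${MAKE:-make} release
}

# From condor_startd.V6, after changing condor_daemon_client:
#   rebuild_with_libs ../condor_daemon_client ../condor_c++_util
```

The order of the arguments matters: the library you changed comes first and condor_c++_util, which repackages its objects, comes after it.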
Adding a Source File • Add the file to the appropriate section of the Imakefile • No, I’m not going to explain our Imakefile syntax here $ ../condor_imake $ make
Testing & Debugging • OK, you’ve built a modified Startd; how do you test / debug it? • Remove STARTD from DAEMON_LIST • Start the master • Run the startd by hand $ ./condor_startd -t -f • -t to log to stdout • -f to run it in the foreground • CTRL-C to kill it
More debugging • Segfaults can sometimes be caused by object version mismatches • For example, you added a field to a class in C++ Util, but didn’t rebuild the Startd that uses the class • With the -t and -f flags, you can debug like any other program • Adding dprintf()’s • With gdb • Using strace