
Planning on the Grid

Planning on the Grid. With slides contributed by Ewa Deelman and Yolanda Gil. Thinking about applications of planning. You’ve seen Planning as X, X ∈ {SAT, CSP, ILP, …}. Now: Y as Planning, Y ∈ {Grid/Web services composition, …}.


Presentation Transcript


  1. Planning on the Grid With slides contributed by Ewa Deelman and Yolanda Gil

  2. Thinking about applications of planning • You’ve seen Planning as X, X ∈ {SAT, CSP, ILP, …} • Now: Y as Planning, Y ∈ {Grid/Web services composition, …}

  3. Problem-solving on Grids • Users pool access to distributed resources (computers, instruments, data, …) • Applications are often composed of separate components run at several locations • Grid middleware tools allow for job scheduling and resource discovery, e.g. the Globus Toolkit

  4. The Computational Grid • Emerging computational and networking infrastructure • bring together compute resources, data storage system, instruments, human resources • Enable entirely new approaches to applications and problem solving • remote resources the rule, not the exception • can solve ever bigger problems • Wide-area distributed computing • national and international • Facilitate collaborative environments • Sharing of data which can be expensive to produce (experimentation/simulation)

  5. Example: LIGO Experiment (Laser Interferometer Gravitational-Wave Observatory) • Aims to detect gravitational waves predicted by the general theory of relativity. • Can be used to detect • binary pulsars • mergers of black holes • “starquakes” in neutron stars • Two installations: in Louisiana (Livingston) and Washington State • Other projects: Virgo (Italy), GEO (Germany), Tama (Japan) • Instruments are designed to measure the effect of gravitational waves on test masses suspended in vacuum. • Data collected during experiments is a collection of time series (multi-channel) • Analysis is performed in time and Fourier domains

  6. LIGO’s Pulsar Search (Laser Interferometer Gravitational-wave Observatory) [Figure: pulsar-search pipeline diagram. Labels: interferometer; raw channels; archive; extract channel; short time frames; long time frames; 30 minutes; Short Fourier Transform; transpose; single frame; time-frequency image (axes: time, Hz); extract frequency range; construct image; find candidate; store; event DB.]

  7. Motivation: Using Today’s Grid • Users have high level requirements naturally stated in terms of the application domain • Ex: Obtain frequency spectrum for signal S in instrument I and timeframe T • Users have to turn these requirements into executable job workflows in detailed scripts • Users must figure out which code generates desired products, which files contain it, physical location of the files, hosts that support execution given code requirements, availability of hosts, access policies, etc. • Users must query Grid middleware: metadata catalog, replica locator, resource descriptor and monitoring, etc. • Users must oversee execution

  8. Problems with today’s Grid • Usability: users must be proficient in grid computing • Complexity: many interrelated choices and dead ends • Solution cost: any-cost solutions are already hard • Global cost: optimization necessary when contention • Reliability of execution: job resubmission upon failure

  9. Planning for workflow generation and maintenance Outline: • Formalization as a planning problem • Integration with the grid middleware • Case study: planning for workflows in LIGO • The grid as a test bed for planning and scheduling research

  10. Abstract Workflow Generation Concrete Workflow Generation

  11. Desiderata for workflow generator • Allow users to refer to data requirements by descriptions, not file names • Intuitive, requires far less input • Seek high quality workflows according to variable metric • Model variety of constraints declaratively • Data dependencies, resource constraints, user access rights, ….

  12. Planning for workflow generation and maintenance Outline: • Formalization as a planning problem • Integration with the grid middleware • Case study: planning for workflows in LIGO • The grid as a test bed for planning and scheduling research

  13. Planning for workflow generation • Application components as operators • Desired data as goals • World state includes available hosts, existing data products, network bandwidths, …
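The formalization on this slide can be illustrated with a minimal sketch, assuming a STRIPS-like encoding in which facts are plain strings and operators only add facts. The `Operator` type, the `plan` function, and the two-step domain are hypothetical illustrations, not the planner used in the talk.

```python
from collections import deque
from typing import NamedTuple


class Operator(NamedTuple):
    name: str
    preconds: frozenset  # facts that must hold before the component runs
    adds: frozenset      # facts the component makes true


def plan(initial, goals, operators):
    """Breadth-first forward search; returns a list of operator names or None."""
    start = frozenset(initial)
    goals = frozenset(goals)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goals <= state:
            return steps
        for op in operators:
            if op.preconds <= state:
                nxt = state | op.adds
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [op.name]))
    return None


# Hypothetical two-step domain loosely modeled on the pulsar search.
OPS = [
    Operator("create-sft", frozenset({"raw-channel"}), frozenset({"sft"})),
    Operator("pulsar-search", frozenset({"sft"}), frozenset({"pulsar-result"})),
]
RESULT = plan({"raw-channel"}, {"pulsar-result"}, OPS)
```

Here the initial state plays the role of existing data products, the goal is the desired data, and the returned step list is the workflow. A realistic world state would also carry hosts and bandwidths.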

  14. Existing tools for building workflows: abstract workflow generation • Chimera • Input-output transforms for files, in ‘Virtual Data Language’: DV third1->pulsar(a=@{input:"H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd"}, b=@{output:"H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.ilwd"}, t1="714384000", t2="714384255", format="ilwd", channel="LSC-AS-Q", fcenter="50.5", fband="0.004", instrument="H2", ra="3.123643", de="+2.56234", fderv1="0.0", fderv2="0.0", fderv3="0.0", fderv4="0.0", fderv5="0.0");

  15. Planning operator: (operator pulsar-search (preconds ((<start-time> 7143800) (<channel> LSC-AS-Q) (<fcenter> 0.5) (<right-ascension> 50) (<sample-rate> 20) …) (and (created "H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd"))) (effects () ((add (created "H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.ilwd")))))

  16. Operator with metadata parameters: (operator pulsar-search (preconds ((<start-time> Number) (<channel> Channel) (<fcenter> Number) (<right-ascension> Number) (<sample-rate> Number) (<file> File-Handle) ;; These two are parameters for the frequency-extract. (<f0> (and Number (get-low-freq-from-center-and-band <fcenter> <fband>))) (<fN> (and Number (get-high-freq-from-center-and-band <fcenter> <fband>))) …) (and (forall ((<sub-sft-file-group> (and File-Group-Handle (gen-sub-sft-range-for-pulsar-search <f0> <fN> <start-time> <end-time> <sub-sft-file-group>)))) (and (sub-sft-group <start-time> <end-time> <channel> <instrument> <format> <f0> <fN> <sample-rate> <sub-sft-file-group>) (at <sub-sft-file-group> <host>))))) (effects () ((add (created <file>)) (add (pulsar <start-time> <end-time> <channel> <instrument> <format> <fcenter> <fband> <fderv1> <fderv2> <fderv3> <fderv4> <fderv5> <right-ascension> <declination> <sample-rate> <file>)))))

  17. Operator with host identified: (operator pulsar-search (preconds ((<host> (or Condor-pool Mpi)) (<start-time> Number) (<channel> Channel) (<fcenter> Number) (<right-ascension> Number) (<sample-rate> Number) (<file> File-Handle) ;; These two are parameters for the frequency-extract. (<f0> (and Number (get-low-freq-from-center-and-band <fcenter> <fband>))) (<fN> (and Number (get-high-freq-from-center-and-band <fcenter> <fband>))) (<run-time> (and Number (estimate-pulsar-search-run-time <start-time> <end-time> <sample-rate> <f0> <fN> <host> <run-time>))) …) (and (available pulsar-search <host>) (forall ((<sub-sft-file-group> (and File-Group-Handle (gen-sub-sft-range-for-pulsar-search <f0> <fN> <start-time> <end-time> <sub-sft-file-group>)))) (and (sub-sft-group <start-time> <end-time> <channel> <instrument> <format> <f0> <fN> <sample-rate> <sub-sft-file-group>) (at <sub-sft-file-group> <host>))))) (effects () ((add (created <file>)) (add (at <file> <host>)) (add (pulsar <start-time> <end-time> <channel> <instrument> <format> <fcenter> <fband> <fderv1> <fderv2> <fderv3> <fderv4> <fderv5> <right-ascension> <declination> <sample-rate> <file>)))))

  18. Planning for workflow generation • Application components as operators • Parameters include host: plan is a concrete workflow • Desired data (in descriptive form) as goals • World state includes available hosts, existing data products, network bandwidths, …

  19. Operator descriptions • Represent applying a given component at a particular location with fixed parameters, inputs and outputs. • Preconditions combine • data dependencies – derive input requirements from outputs • Task constraints – e.g. component must be run on an MPI machine

  20. Objective function may include • Performance – expected runtime, variance • Reliability – probability of failure, expected number of retries • Computational cost – use of ‘expensive’ resources, conformance to policies • Plan quality
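One way such an objective could be combined into a single number is sketched below. This is an illustration only: the `StepEstimate` fields, the weights, and the assumptions that steps run serially and that retries are independent are all hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class StepEstimate:
    runtime_s: float     # expected runtime of one attempt, in seconds
    failure_prob: float  # probability that a single attempt fails
    cost: float          # charge for using an 'expensive' resource


def expected_attempts(failure_prob):
    """Expected attempts until success, assuming independent retries."""
    return 1.0 / (1.0 - failure_prob)


def plan_score(steps, w_runtime=1.0, w_cost=1.0):
    """Lower is better: retry-adjusted runtime plus weighted resource cost."""
    total = 0.0
    for s in steps:
        total += w_runtime * s.runtime_s * expected_attempts(s.failure_prob)
        total += w_cost * s.cost
    return total


SCORE = plan_score([StepEstimate(100.0, 0.5, 10.0), StepEstimate(50.0, 0.0, 0.0)])
```

With a failure probability of 0.5 the first step is expected to run twice, so its runtime term doubles before the cost term is added.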

  21. Using local heuristics and global metrics • Need local heuristics since search space is intractable • e.g. prefer host for program with high-bandwidth connection to where the output is required • Need to test a global metric (e.g. overall runtime) since local heuristics can lead to globally poor solution • Create as many plans as possible, return best • Search control to eliminate redundant solutions
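The tension between a local heuristic and a global metric can be seen in a toy example. The sketch below assumes a hypothetical two-task, two-host problem in which tasks on the same host run serially and hosts run in parallel; the runtimes and function names are invented for illustration.

```python
from itertools import product

# Hypothetical runtimes (seconds) of each task on each host.
RUNTIME = {
    "t1": {"fast": 10, "slow": 15},
    "t2": {"fast": 10, "slow": 15},
}


def makespan(assignment):
    """Global metric: tasks on one host run serially, hosts run in parallel."""
    load = {}
    for task, host in assignment.items():
        load[host] = load.get(host, 0) + RUNTIME[task][host]
    return max(load.values())


def greedy_plan():
    """Local heuristic: each task independently picks its fastest host."""
    return {t: min(hosts, key=hosts.get) for t, hosts in RUNTIME.items()}


def best_plan():
    """Enumerate every assignment and keep the one with the best global metric."""
    tasks = sorted(RUNTIME)
    candidates = (
        dict(zip(tasks, hosts))
        for hosts in product(*(sorted(RUNTIME[t]) for t in tasks))
    )
    return min(candidates, key=makespan)


GREEDY = makespan(greedy_plan())  # both tasks pile onto "fast"
BEST = makespan(best_plan())      # tasks are spread across the two hosts
```

Exhaustive enumeration is only feasible for tiny problems, which is why the slide pairs the global test with heuristics and search control.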

  22. Example search heuristics: (control-rule only-transfer-from-loc-with-greatest-bandwidth (if (and (current-ops (transfer-file)) (current-goal (at <file> <dest>)) (true-in-state (at <file> <loc1>)) (true-in-state (at <file> <loc2>)) (higher-bandwidth <loc1> <loc2> <dest>))) (then reject bindings ((<from-loc> . <loc2>)))) (control-rule prefer-mpi-to-condor-for-pulsar-search (if (and (current-ops (pulsar-search)) (type-of <mpi> Mpi) (type-of <condor> Condor-pool))) (then prefer bindings ((<host> . <mpi>)) ((<host> . <condor>))))
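The intent of the two control rules can be restated procedurally. The following is a rough sketch, not the planner's actual mechanism: the function names, the bandwidth table, and the host names are hypothetical.

```python
def reject_slower_sources(locations, dest, bandwidth):
    """Keep only the source location(s) with the greatest bandwidth to dest."""
    best = max(bandwidth[(loc, dest)] for loc in locations)
    return [loc for loc in locations if bandwidth[(loc, dest)] == best]


def prefer_mpi(hosts, host_type):
    """Order candidate hosts so that MPI hosts are tried before Condor pools."""
    return sorted(hosts, key=lambda h: host_type[h] != "Mpi")


# Hypothetical bandwidths (arbitrary units) from each source to the destination.
BW = {("site-a", "site-c"): 100, ("site-b", "site-c"): 10}
SOURCES = reject_slower_sources(["site-a", "site-b"], "site-c", BW)
ORDER = prefer_mpi(
    ["pool-1", "cluster-2"],
    {"pool-1": "Condor-pool", "cluster-2": "Mpi"},
)
```

The first function prunes alternatives outright, as a reject rule does, while the second only reorders them, as a prefer rule does, so the Condor pool remains available as a fallback.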

  23. Planning for workflow generation and maintenance Outline: • Formalization as a planning problem • Integration with the grid middleware • The grid as a test bed for planning and scheduling research

  24. Generating the planning problem • Currently, static file representation for available hosts, bandwidths • Query grid services prior to planning to find which relevant files exist • Future versions will make dynamic queries • Goal is translated from user request, plan is translated into DAG format suitable for grid scheduler.
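The last step, turning a plan into a DAG for the scheduler, can be sketched by deriving dependencies from the files each step reads and writes. The step names and file names below are hypothetical, and the output is a plain dependency mapping rather than any particular scheduler's input format.

```python
from graphlib import TopologicalSorter

# Hypothetical plan steps: name -> (input files, output files).
STEPS = {
    "create-sft": ((), ("a.sft",)),
    "extract-band": (("a.sft",), ("a.band",)),
    "pulsar-search": (("a.band",), ("a.result",)),
}


def plan_to_dag(steps):
    """Map each step to the set of steps that produce one of its inputs."""
    producer = {out: name for name, (_, outs) in steps.items() for out in outs}
    return {
        name: {producer[f] for f in ins if f in producer}
        for name, (ins, _) in steps.items()
    }


DAG = plan_to_dag(STEPS)
# A valid execution order: every step appears after the steps it depends on.
ORDER = list(TopologicalSorter(DAG).static_order())
```

Inputs with no producing step are treated as files that already exist, matching the idea that the planner is told which relevant files are on the Grid before planning starts.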

  25. LIGO’s Pulsar Search at SC’02 • Used LIGO’s data collected during the first scientific run of the instrument • Targeted a set of 1000 locations: known pulsars or random locations • Results of the analysis were published to the LIGO Scientific Collaboration • Performed using LDAS and compute and storage resources at Caltech, the University of Southern California, and the University of Wisconsin Milwaukee

  26. Summary: benefits of planning • Automating workflow composition • Just being addressed in Grid middleware • Reasoning with explicit descriptions of data • More intuitive for users • Far fewer inputs required than at file level • Better workflows by searching many plans

  27. Planning for workflow generation and maintenance Outline: • Existing Grid tools for workflow generation • Formalization as a planning problem • Integration with the grid middleware • The grid as a test bed for planning and scheduling research

  28. Many areas of planning research relevant for grid • Planning for a dynamic environment: plan monitoring and repair, planning under uncertainty • Scheduling: resource reasoning, temporal reasoning • Plan quality: learning, acquiring preferences, local search planning • Planning for information gathering: integrating access to grid services with workflow creation • Domain modeling: handling multiple ontologies, acquiring metadata descriptions, acquiring operators

  29. Fault-tolerant planning for a dynamic environment • Grid resources become unavailable, queue length & network bandwidth change • Exploring plan repair strategies, balance of work done off-line and on-line • Modeling failures, keeping statistics for creating plans more likely to succeed, conditional plans, ..
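Keeping failure statistics to favor plans that are more likely to succeed might look like the sketch below. It assumes per-host success counts, Laplace smoothing, and independent step failures; the `HostStats` class and host names are hypothetical, and none of these modeling choices come from the slides.

```python
from collections import Counter


class HostStats:
    """Running success/failure counts per host."""

    def __init__(self):
        self.ok = Counter()
        self.failed = Counter()

    def record(self, host, succeeded):
        (self.ok if succeeded else self.failed)[host] += 1

    def success_rate(self, host):
        """Laplace-smoothed estimate, so unseen hosts get 0.5 rather than 0 or 1."""
        return (self.ok[host] + 1) / (self.ok[host] + self.failed[host] + 2)


def plan_success_probability(hosts_used, stats):
    """Assumes steps fail independently; multiplies per-step success rates."""
    p = 1.0
    for host in hosts_used:
        p *= stats.success_rate(host)
    return p


stats = HostStats()
for outcome in (True, True, True, False):
    stats.record("host-a", outcome)
RATE_A = stats.success_rate("host-a")    # (3 + 1) / (4 + 2)
RATE_NEW = stats.success_rate("host-b")  # no observations yet
```

A planner could use `plan_success_probability` as one term of its plan-quality metric when comparing candidate host assignments.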

  30. Fault-tolerant straw men • Current version: build fully detailed plan offline, resource allocation is fixed • Ignores world dynamics • Build abstract plan (without specifying hosts) offline, use a matchmaker online • Matchmaker makes local decisions only

  31. Global reasoning is needed for resource allocation

  32. Approaches for fault-tolerant planning in dynamic domains • RAX (Jonsson et al.): general framework. As implemented: offline, builds a complete plan; online, adjusts temporal intervals • Combining planning and scheduling: offline, build several abstract plans; online, reason about the critical path to instantiate each plan • MDP/POMDP approaches • Open area…
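Reasoning about the critical path of an abstract plan can be illustrated with a longest-path computation over the task graph. The task names and durations below are hypothetical, and the sketch assumes durations are known estimates.

```python
from functools import lru_cache

# Hypothetical abstract plan: task -> (estimated duration, prerequisite tasks).
TASKS = {
    "extract": (30, ()),
    "sft": (20, ("extract",)),
    "transpose": (5, ("extract",)),
    "search": (40, ("sft", "transpose")),
}


@lru_cache(maxsize=None)
def earliest_finish(task):
    """Longest-path length from any source to the end of this task."""
    duration, deps = TASKS[task]
    return duration + max((earliest_finish(d) for d in deps), default=0)


def critical_path(task):
    """Walk back from a task along the prerequisite that finishes last."""
    path = [task]
    while TASKS[path[-1]][1]:
        path.append(max(TASKS[path[-1]][1], key=earliest_finish))
    return path[::-1]


FINISH = earliest_finish("search")
PATH = critical_path("search")
```

Tasks on the critical path are the ones whose host choice affects overall runtime, so an online instantiation step could give them the better resources first.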

  33. Challenge: understanding when different approaches are more important • Hypotheses: • Uneven task distribution, in terms of computational and data expense and resource constraints will indicate global planning • Time-dependency, e.g. need to re-plan during execution, will indicate local planning • Interesting project: use experiments in synthetic and real domains to test hypotheses and uncover new insights

  34. Empirical tests with synthetic LIGO problems • Example: Problem requires 100 files on one machine. Vary the number that exist.

  35. Domain modeling • Current system: knowledge from several sources must be used • KBs combined in one location [Figure: architecture diagram. Labels: info from Grid services (RLS, MCS etc); task requirements; existing data in files; state info (files, resources); resource policies; user policies; available resources; comp. selector; resource selector; monolithic planner; concrete tasks; resource queues; exec. monitor; network bandwidth; Grid task schedulers.]

  36. Where does knowledge used by our planners come from? [Figure: an operator schema, (Operator … (preconditions ..) (effects ..)), annotated with its knowledge sources: task resource requirements; user policies & preferences; resource policies; data dependencies (VDL*).] Each knowledge component is used for other purposes beyond planning

  37. Automatically generated operators for several application domains [Figure: the same operator schema, (Operator … (preconditions ..) (effects ..)), annotated with: task resource requirements; policies; data dependencies (VDL*). Domains listed: Digital sky survey, LIGO, GEO, Galaxy morphology, Tomography.] Investigating patterns of data descriptions for more efficient planning

  38. Question: if operators are gathered from distributed services, can we still guarantee soundness and completeness? • Under what kinds of conditions?

  39. Representing appropriate information units with metadata • E.g. have 60,000 files, want to allocate 60 tasks each dealing with 1,000 files. • Previously, application components were specified in terms of specific files: DV run59000->extractSFTData( input=[@{input:"nSFT.59000"},…,@{input:"nSFT.59999"}], output=[@{output:"eSFT.59000"},…,@{output:"eSFT.59999"}], t1="714384000", t2="714384063", freq="1008", band="4", instrument="H2"); … 59 similar clauses … DV final->computeFStatistic( input=[@{input:"eSFT.00000"},…,@{input:"eSFT.59999"}],…); [Figure labels: 1000 files (each extractSFTData clause); 60000 files (computeFStatistic input).]

  40. Metadata representation • Replace with two clauses, two input predicates • A predicate now represents a range of files • Simpler to model, greater generality, more efficient for reasoner (operator run-extractSFTData-range (preconds ((<begin-file> Number) (<number-of-files> (and Number (> <number-of-files> 0))) (<local-begin-file> (and Number (gen-smaller-number <number-of-files> 1000 <begin-file>)))) (and (range "eSFT" <begin-file> 2 1 <local-begin-file>) (range "nSFT" <local-begin-file> 2 1 999))) (effects () ((add (range "eSFT" <begin-file> 2 <number-of-files>)))))
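The idea of letting one predicate stand for a range of files can be sketched as follows. The `FileRange` type and `split_range` helper are hypothetical, and the (kind, begin, count) layout is an assumption rather than the exact argument order of the `range` predicate on the slide.

```python
from typing import NamedTuple


class FileRange(NamedTuple):
    kind: str   # e.g. "nSFT" or "eSFT"
    begin: int  # index of the first file in the range
    count: int  # number of files in the range


def split_range(rng, chunk):
    """Split one range into consecutive sub-ranges of at most `chunk` files."""
    end = rng.begin + rng.count
    return [
        FileRange(rng.kind, start, min(chunk, end - start))
        for start in range(rng.begin, end, chunk)
    ]


# 60,000 files become 60 task inputs of 1,000 files each.
TASK_INPUTS = split_range(FileRange("nSFT", 0, 60000), 1000)
```

Each task input is now three values instead of a list of 1,000 file names, which is the saving the slide attributes to the metadata representation.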

  41. Requires library operators for ranges • E.g. if a range of files exists, then so does any subrange • Questions: what are the required operators? Similar to spatial calculus RCC-8? (operator subranges-exist (preconds ((<begin-file> Number) (<type> Object) (<number-of-files> (and Number (> <number-of-files> 0))) (<enclosing-begin> (and Number (gen-known-enclosing-begins <type> <begin-file> 2 1 <number-of-files>))) (<enclosing-number-of-files> (and Number (gen-known-enclosing-number-of-files <type> <enclosing-begin> 2 1 <number-of-files> <begin-file>)))) (created-range <type> <enclosing-begin> 2 1 <enclosing-number-of-files>)) (effects () ((add (created-range <type> <begin-file> 2 1 <number-of-files>)))))
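The subrange rule, that any subrange of an existing range also exists, reduces to an interval-containment test. The sketch below assumes ranges are (kind, begin, count) tuples, which is a simplification of the `created-range` predicate; the function names are hypothetical.

```python
def contains(enclosing, sub):
    """True if range `sub` lies entirely inside range `enclosing`.

    Each range is a (kind, begin, count) tuple; this layout is an assumption.
    """
    kind_e, begin_e, count_e = enclosing
    kind_s, begin_s, count_s = sub
    return (
        kind_e == kind_s
        and count_s > 0
        and begin_e <= begin_s
        and begin_s + count_s <= begin_e + count_e
    )


def subrange_exists(known_ranges, sub):
    """A sub-range exists if some single known range encloses it."""
    return any(contains(r, sub) for r in known_ranges)


KNOWN = [("eSFT", 0, 1000), ("eSFT", 2000, 500)]
INSIDE = subrange_exists(KNOWN, ("eSFT", 100, 200))
STRADDLING = subrange_exists(KNOWN, ("eSFT", 900, 200))
```

This version does not merge adjacent known ranges, so a request spanning two touching ranges would be reported as missing. Handling that case is one of the additional library operators the slide asks about.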

  42. Conclusions • Implemented system takes data description requests from LIGO users, composes workflow and executes on the Grid • Planning and scheduling technologies can make a large contribution to Grid infrastructure • Many interesting challenges for planning and scheduling research from Grid applications http://www.isi.edu/ikcap/cognitive-grids http://www.isi.edu/~deelman/pegasus.htm

  43. Koehler and Srivastava • Different approaches to specifying workflows by hand

  44. WSDL service specification (no workflow specified) <definitions targetNamespace="http://..." xmlns="http://schemas.xmlsoap.org/wsdl/"> <message name = "OrderEvent"></message> <message name = "TripRequest"></message> <message name = "FlightRequest"></message> <message name = "HotelRequest"></message> <message name = "BookingFailure"></message> <portType name ="pt1"> <operation name ="CToCI"> <input message ="TripRequest"/> </operation> </portType> <portType name ="pt2"> <operation name ="CIToHS"> <output message ="HotelRequest"/> </operation> </portType> <portType name ="pt3"> <operation name ="CIToFS"> <output message ="FlightRequest"/> </operation> </portType> ... <portType name ="pt9"> <operation name ="RIToFS"> <output message ="BookingFailure"/> </operation> </portType> </definitions>

  45. BPEL4WS <sequence> <receive partner="Customer" portType ="pt1" operation ="CToCI" container ="OrderEvent"> </receive> <flow> <invoke partner ="HotelService" portType ="pt2" operation ="CIToHS" inputContainer ="HotelRequest"> </invoke> <invoke partner ="FlightService" portType ="pt3" operation ="CIToFS" inputContainer ="FlightRequest"> </invoke> </flow>

  46. Golog

  47. Back-up slides

  48. What is Needed • We need alternative foundations that offer • expressive representations • flexible reasoners • Many Artificial Intelligence (AI) techniques are relevant: • Planning to achieve given requirements • Searching through problem spaces of related choices • Using and combining heuristics • Expressive knowledge representation languages • Reasoners that can incorporate rules, definitions, axioms, etc. • Schedulers and resource allocation techniques

  49. Existing tools for building workflows: abstract workflow generation • Chimera • Input-output transforms at level of actual files, in ‘Virtual Data Language’: DV first1->createSFT( b=@{output:"H2_SFT_LSC-AS-Q_714384000_64.gwf"}, t1="714384000", t2="714384063", format="frame", channel="H2:LSC-AS-Q", instrument="H2"); DV first2->createSFT( b=@{output:"H2_SFT_LSC-AS-Q_714384064_64.gwf"}, t1="714384064", t2="714384127", format="frame", channel="H2:LSC-AS-Q", instrument="H2"); DV third1->pulsar(a=@{input:"H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd"}, b=@{output:"H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.123643_+2.56234.ilwd"}, t1="714384000", t2="714384255", format="ilwd", channel="LSC-AS-Q", fcenter="50.5", fband="0.004", instrument="H2", ra="3.123643", de="+2.56234", fderv1="0.0", fderv2="0.0", fderv3="0.0", fderv4="0.0", fderv5="0.0");
