The Foundation API

kaelem

Presentation Transcript


  1. The Foundation API • How does it work?

  2. How It Runs...
     • At the DE:
       • When the job is started, it runs the Foundation API App with the information provided in json#1 (created in TITO) and the inputs provided by the user.
       • As it runs, the App sends a message to TACC. The message tells the Foundation API Application at TACC what application to run, what TACC system to run it on, where the application and its wrapper script are located in iRODS, and the specific settings or arguments to pass to the wrapper script.
     • At TACC:
       • The Foundation API Application at TACC runs and helps create a bash run script to run the job on the SGE queue, using the application’s wrapper script and json#2, which resides within the FAPI system at TACC. The run script includes specifics about where the input data is in iRODS, where the outputs should be put in iRODS after the job, and what settings to pass to the specific bioinformatics application being run.

  3. An Example: Newbler2.6
     runAssembly -o outputname -m -force -large -cpu 1 inputReads.sff

     runAssembly -o "${OUTNAME}" -m -force -large -cpu "${CPU}" \
       -a "${MIN_CONTIG_SIZE}" -l "${LARGE_CONTIG_SIZE}" "${OTHER}" "${INPUT_F}"
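Because the parameterized command quotes "${OTHER}", an empty value still reaches runAssembly as an empty argument. The following is a minimal sketch of one way to avoid that by building the argument list in a bash array. The helper name build_args and the ARGS variable are hypothetical and not part of the original wrapper, and the command is only printed here, not executed.

```shell
#!/bin/bash
# Hypothetical helper: assemble the runAssembly argument list in an array,
# adding the free-form OTHER options only when they are non-empty.
build_args() {
    local outname="$1" cpu="$2" min_contig="$3" large_contig="$4" other="$5" input_f="$6"
    ARGS=(-o "$outname" -m -force -large -cpu "$cpu" -a "$min_contig" -l "$large_contig")
    if [ -n "$other" ]; then
        # Intentionally unquoted: split OTHER on whitespace so that
        # "-x -y" becomes two separate arguments.
        ARGS+=($other)
    fi
    ARGS+=("$input_f")
}

build_args "NewblerOut" 1 200 500 "" "inputReads.sff"
# Print instead of executing, since runAssembly may not be installed here.
printf 'runAssembly'; printf ' %s' "${ARGS[@]}"; printf '\n'
```

With an empty OTHER the printed command has no stray empty argument; with a value such as "-x -y" both options are appended before the input file.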

  4. An Example: Newbler2.6 The first test of the wrapper script
     #!/bin/bash
     #$ -V                              # Inherit the submission environment
     #$ -cwd                            # Start job in submission directory
     #$ -N newblertest                  # Job Name
     #$ -j y                            # Combine stderr and stdout
     #$ -o $JOB_NAME.o$JOB_ID           # Name of the output
     #$ -pe 1way 12                     # Requests 1 task/node, 12 cores total
     #$ -q development                  # Queue name "development"
     #$ -l h_rt=01:00:00                # Run time (hh:mm:ss) - 1.0 hours
     #$ -M rogerab@email.arizona.edu    # Use email notification address
     #$ -m be                           # Email at Begin and End of job
     set -x                             # Echo commands, use "set echo" with csh
     #MODE=${mode}
     #INPUT="${inputSeqs}"
     #OUTNAME="${outputName}"
     #OUTFORM=${outputFormat}
     INPUT="/iplant/home/rogerab/data/sequencing1/FFGLB5S04.sff"
     OUTNAME="NewblerOut"
     CPU=12
     MIN_CONTIG_SIZE=100
     LARGE_CONTIG_SIZE=500
     module purge
     module load TACC
     module swap intel gcc
     module load irods
     iinit password                     # "password" stands in for the iRODS password
     wait
     #Copy from iRODS
     iget -fT "${INPUT}"
     wait
     INPUT_F=$(basename ${INPUT})
     /work/01685/rogerab/bin3/bin/runAssembly -o "${OUTNAME}" -m -force -large -cpu "${CPU}" \
       -a "${MIN_CONTIG_SIZE}" -l "${LARGE_CONTIG_SIZE}" "${OTHER}" "${INPUT_F}"

  5. An Example: Newbler2.6: The final form of the wrapper script
     CONTENTS OF newbler_wrapper.sh
     INPUT="${inputSeqs}"
     OUTNAME="${outputName}"
     CPU="${cpu}"
     MIN_CONTIG_SIZE="${min_contig_size}"
     LARGE_CONTIG_SIZE="${large_contig_size}"
     OTHER="${other}"
     #Copy Input File from iRODS
     iget -fT "${INPUT}"
     wait
     INPUT_F=$(basename ${INPUT})
     chmod a+x runAssembly
     chmod a+x createProject
     chmod a+x addRun
     chmod a+x newbler
     chmod a+x newMapping
     chmod a+x runMapping
     chmod a+x runProject
     chmod a+x newAssembly
     chmod a+x stopRun
     runAssembly -o "${OUTNAME}" -m -force -large -cpu "${CPU}" -a "${MIN_CONTIG_SIZE}" -l "${LARGE_CONTIG_SIZE}" "${OTHER}" "${INPUT_F}"
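In its final form the wrapper script leaves ${inputSeqs}, ${outputName}, ${cpu} and the rest as placeholders whose names match the ids in the json at TACC. The sketch below only mimics the visible effect of filling them in; the function fill_template is hypothetical, and how the Foundation API service actually performs the substitution is not shown in the slides.

```shell
#!/bin/bash
# Hypothetical illustration: replace ${id} placeholders in a wrapper
# template with job values given as id=value pairs.
fill_template() {
    # $1 is the template text; the remaining arguments are id=value pairs.
    local text="$1"; shift
    local pair id value
    for pair in "$@"; do
        id="${pair%%=*}"      # text before the first "="
        value="${pair#*=}"    # text after the first "="
        # Bash pattern substitution: replace every literal ${id}.
        text="${text//\$\{$id\}/$value}"
    done
    printf '%s\n' "$text"
}

template='OUTNAME="${outputName}"
CPU="${cpu}"'
fill_template "$template" "outputName=NewblerOutDir" "cpu=1"
```

Run on the two-line template, this prints the same kind of assignments that appear in the generated run script, for example OUTNAME="NewblerOutDir" and CPU="1".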

  6. An Example: Newbler2.6: The json for the Lonestar Foundation API Application
     CONTENTS OF appN.json
     {
       "name": "newbler",
       "parallelism": "SERIAL",
       "version": "2.6",
       "helpURI": "https://pods.iplantcollaborative.org/wiki/display/DEapps/Newbler",
       "label": "Newbler 2.6",
       "shortDescription": "Newbler, genome assembler",
       "longDescription": "Genome assembler for 454 sequencing reads",
       "author": "Roger Barthelson",
       "datePublished": "03/20/12",
       "tags": [ "assembler", "NGS", "454", "Roche" ],
       "ontology": [ "http://sswapmeet.sswap.info/sequenceServices/SequenceServices" ],
       "executionHost": "lonestar4.tacc.teragrid.org",
       "executionType": "HPC",
       "deploymentPath": "/iplant/home/rogerab/applications/newbler2.6/bin",
       "templatePath": "newbler_wrapper.sh",
       "testPath": "test/newblerwrapper.sh",
       "checkpointable": "true",
       "modules": [ "purge", "load TACC", "load irods" ],
       "inputs": [
         {
           "id": "inputSeqs",
           "value": {
             "default": "",
             "validator": "",
             "visible": true,
             "required": true
           },

  7. An Example: Newbler2.6: The json for the Lonestar Foundation API Application
     CONTENTS OF appN.json (continued)
       "inputs": [
         {
           "id": "inputSeqs",
           "value": {
             "default": "",
             "validator": "",
             "visible": true,
             "required": true
           },
           "details": {
             "label": "Sequences:",
             "description": "Sequence file in SFF or fasta format"
           },
           "semantics": {
             "ontology": [ "http://sswapmeet.sswap.info/sequence/FASTA" ],
             "minCardinality": 1,
             "maxCardinality": 1,
             "fileTypes": [ "fasta-0" ]
           }
         }
       ],

  8. An Example: Newbler2.6: The json for the Lonestar Foundation API Application
     CONTENTS OF appN.json (continued)
       "parameters": [
         {
           "id": "cpu",
           "value": {
             "default": "1",
             "type": "string",
             "validator": "",
             "required": true,
             "visible": true
           },
           "details": {
             "label": "number of threads",
             "description": "Specify the number of cores to be used",
             "visible": true
           },
           "semantics": {
             "ontology": [ "xs:string" ]
           }
         },
         {
           "id": "min_contig_size",
           "value": {
             "default": "200",
             "validator": "",
             "required": false,
             "visible": true,
             "type": "string"
           },
           "details": {
             "label": "minimum contig size",
             "description": "Specify the minimum contig size to be output.",
             "visible": true
           },
           "semantics": {
             "ontology": [ "xs:string" ]
           }
         },

  9. An Example: Newbler2.6: The json for the Lonestar Foundation API Application
     CONTENTS OF appN.json (continued)
     {
       "name": "newbler",
       "parallelism": "SERIAL",
       "version": "2.6",
       "helpURI": "https://pods.iplantcollaborative.org/wiki/display/DEapps/Newbler",
       "label": "Newbler 2.6",
       "shortDescription": "Newbler, genome assembler",
       "longDescription": "Genome assembler for 454 sequencing reads",
       "author": "Roger Barthelson",
       "datePublished": "03/20/12",
       "tags": [ "assembler", "NGS", "454", "Roche" ],
       "ontology": [ "http://sswapmeet.sswap.info/sequenceServices/SequenceServices" ],
       "executionHost": "lonestar4.tacc.teragrid.org",
       "executionType": "HPC",
       "deploymentPath": "/iplant/home/rogerab/applications/newbler2.6/bin",
       "templatePath": "newbler_wrapper.sh",
       "testPath": "test/newblerwrapper.sh",
       "checkpointable": "true",
       "modules": [ "purge", "load TACC", "load irods" ],
       "inputs": [ ],
       "parameters": [ ]
     }

  10. Where is the application?

  12. Where is the Application?

  13. Where is the json file?
      • I don’t know.
      • It seems to be entered into a database of information held by the Foundation API application on the TACC side, e.g. on Lonestar. The actual file name doesn’t matter.
      • How does it get there?
        curl -X POST -sku "user:password" -F "fileToUpload=@newbler.json" https://foundation.iplantc.org/apps-v1/apps
      • The response should be the contents of the json file. That means the service accepted your json.
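Text copied out of slides or a word processor often picks up typographic (curly) quotes, which are not valid string delimiters in JSON. As a small sketch, a check like the following could be run on the file before posting it with curl. The function name find_curly_quotes and the sample file contents are made up for illustration.

```shell
#!/bin/bash
# Hypothetical pre-upload check: report lines that contain typographic
# (curly) quotes so they can be reviewed before the json is posted.
find_curly_quotes() {
    # Each -e pattern is one literal character; -n prints line numbers.
    # grep exits with status 0 only when at least one line matches.
    grep -n -e '“' -e '”' -e '‘' -e '’' "$1"
}

# Demonstration on a throwaway file.
tmpfile=$(mktemp)
printf '%s\n' '{ "id": "cpu",' '  "default": ”1" }' > "$tmpfile"
if find_curly_quotes "$tmpfile"; then
    echo "Review the lines above before uploading."
fi
rm -f "$tmpfile"
```

A curly quote inside a string value, such as an apostrophe in a description, is still legal JSON, so a match is a prompt to look at the line rather than proof of an error.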

  14. Test the Application on TACC with the Test Application
      • https://foundation.iplantcollaborative.org/iplant-test/
      • Your App name is the “id” you entered into the json, plus the version number.
      • Example: id:newbler, version:2.6, becomes newbler-2.6
      • What to do:
        • Log in with your iPlant user ID and password
        • Find your App under Apps Service > Shared Apps
        • Get a job submission form, fill it out, submit it!
        • Monitor the results under Job Service

  15. The Test Application with the Apps service section shown.

  16. The Test Application: job submission form for newbler.

  17. Summary of the TACC Portion of the Foundation API
      • The json loaded into the TACC Foundation API application is central.
      • The json tells the application where everything is: the application, the wrapper script, and what inputs and settings to look for.
      • The wrapper script feeds the main arguments to the application.
      • The input files are in the Data Store.

  18. Actual Run Script for Newbler Through the Test Application on Lonestar
      #!/bin/bash
      #$ -N newbler4-2_15
      #$ -cwd
      #$ -V
      #$ -o newbler4-2_15$JOB_ID.out
      #$ -e newbler4-2_15$JOB_ID.err
      #$ -l h_rt=01:00:00
      #$ -A TG-MCB110022
      #$ -pe 12way 24
      #$ -q largemem
      curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/RUNNING
      cd /scratch/0004/iplant/rogerab/job-1625-newbler4-2_15/newbler2.6
      # Environmental settings for newbler-2.6:
      module purge
      module load TACC
      module load irods
      INPUT="/iplant/home//rogerab/data/sequencing1/FFGLB5S04.sff"
      OUTNAME="NewblerOutDir"
      CPU="1"
      MIN_CONTIG_SIZE="200"
      LARGE_CONTIG_SIZE="500"
      OTHER=""
      #Copy from iRODS
      iget -fT "${INPUT}"
      wait
      INPUT_F=$(basename ${INPUT})
      runAssembly -o "${OUTNAME}" -m -force -large -cpu "${CPU}" -a "${MIN_CONTIG_SIZE}" -l "${LARGE_CONTIG_SIZE}" "${OTHER}" "${INPUT_F}"
      curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/FINISHED
      curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/ARCHIVING
      imkdir -p /iplant/home/rogerab/archive/jobs/job-1625-newbler4-2_15
      for i in `find . -maxdepth 1`; do
        exists=`grep -x "$i" .iplant.archive`
        if [ ! -n "$exists" ]; then
          iput -v -f -r $i /iplant/home/rogerab/archive/jobs/job-1625-newbler4-2_15
        fi
      done
      curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/ARCHIVING_FINISHED
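The archiving loop at the end of the run script uploads every top-level entry that is not named in .iplant.archive. The sketch below isolates that filter and prints the entries instead of calling iput, so it can be tried without iRODS. The function name list_unarchived and the demo file names are hypothetical, and the reading of .iplant.archive as a one-entry-per-line list of things to skip is inferred from the grep -x call, not stated in the slides.

```shell
#!/bin/bash
# Hypothetical helper isolating the filter used by the archiving loop:
# print top-level entries of a directory that the manifest does not name.
list_unarchived() {
    local dir="$1" manifest="$2" entry
    # -mindepth 1 leaves out the directory itself; the generated script
    # does not do this, so treat it as a deliberate deviation.
    find "$dir" -mindepth 1 -maxdepth 1 | sort | while IFS= read -r entry; do
        # -x matches whole lines only, -F disables regex interpretation.
        if ! grep -qxF -- "$entry" "$manifest"; then
            printf '%s\n' "$entry"
        fi
    done
}

# Demonstration in a scratch directory.
work=$(mktemp -d)
touch "$work/reads.sff" "$work/assembly.fna"
# The manifest names the input file and itself, so neither is listed.
printf '%s\n' "$work/reads.sff" "$work/.iplant.archive" > "$work/.iplant.archive"
# Prints the entries a real run would hand to iput.
list_unarchived "$work" "$work/.iplant.archive"
rm -rf "$work"
```

Reading entries line by line, rather than looping over the unquoted output of find, also keeps file names with spaces intact.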

  19. The Discovery Environment Side of the Foundation API
      • Key arguments in the TACC json and in the wrapper script need to be entered by way of the App in the DE.
      • The Foundation API Application is the tool that is run by the DE (foundational_api_adapter.pl).
      • The interface for the App is designed by you in TITO.

  20. The Discovery Environment Side of the Foundation API
      • The Foundation API Application has some of its own arguments that it requires for setting up the run at TACC:
        • Application ID (appid)
        • Maximum Memory (maxMemory)
        • Estimated Run Time (requestedTime)
        • Job Size (processorCount)
      • These are in your json at TACC also, and should be preserved in the precise syntax used here (and in the following examples).

  21. In TITO: foundation_api_adapter.pl is the application you are integrating.

  22. In TITO: Your arguments are ordered in a way similar to their appearance in the json at TACC.

  23. Note Format! In TITO: The inputs are whatever your application may need. These are the files selected from the Data Store when you set up a run with the application.

  24. Note Format! In TITO: The options are whatever your application may use for its settings.

  25. Note Format! In TITO: The run options are whatever TACC needs to set up the run! For memory, use a setting of less than 1000 (gbytes) for the normal queue. Use a setting of 1000 to tell TACC to run it on the largemem queue.
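The memory rule above maps a maxMemory value to a queue. A minimal sketch of that mapping follows; the function name queue_for_memory is hypothetical, and values above 1000 are rejected only because the slides do not say how they are handled.

```shell
#!/bin/bash
# Hypothetical helper mirroring the rule on the slide:
# maxMemory below 1000 -> normal queue, exactly 1000 -> largemem queue.
queue_for_memory() {
    local mem="$1"
    case "$mem" in
        ''|*[!0-9]*) echo "maxMemory must be a whole number" >&2; return 1 ;;
    esac
    if [ "$mem" -lt 1000 ]; then
        echo "normal"
    elif [ "$mem" -eq 1000 ]; then
        echo "largemem"
    else
        # The slides do not cover values above 1000.
        echo "maxMemory above 1000 is not covered by the slides" >&2
        return 1
    fi
}

queue_for_memory 4       # prints "normal"
queue_for_memory 1000    # prints "largemem"
```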

  26. Actual Run Script for Newbler Through the Test Application on Lonestar
      #!/bin/bash
      #$ -N newbler4-2_15
      #$ -cwd
      #$ -V
      #$ -o newbler4-2_15$JOB_ID.out
      #$ -e newbler4-2_15$JOB_ID.err
      #$ -l h_rt=01:00:00
      #$ -A TG-MCB110022
      #$ -pe 1way 24
      #$ -q largemem
      curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/RUNNING
      cd /scratch/0004/iplant/rogerab/job-1625-newbler4-2_15/newbler2.6
      # Environmental settings for newbler-2.6:
      module purge
      module load TACC
      module load irods
      INPUT="/iplant/home//rogerab/data/sequencing1/FFGLB5S04.sff"
      OUTNAME="NewblerOutDir"
      CPU="1"
      MIN_CONTIG_SIZE="200"
      LARGE_CONTIG_SIZE="500"
      OTHER=""
      #Copy from iRODS
      iget -fT "${INPUT}"
      wait
      INPUT_F=$(basename ${INPUT})
      runAssembly -o "${OUTNAME}" -m -force -large -cpu "${CPU}" -a "${MIN_CONTIG_SIZE}" -l "${LARGE_CONTIG_SIZE}" "${OTHER}" "${INPUT_F}"
      curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/FINISHED
      curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/ARCHIVING
      imkdir -p /iplant/home/rogerab/archive/jobs/job-1625-newbler4-2_15
      for i in `find . -maxdepth 1`; do
        exists=`grep -x "$i" .iplant.archive`
        if [ ! -n "$exists" ]; then
          iput -v -f -r $i /iplant/home/rogerab/archive/jobs/job-1625-newbler4-2_15
        fi
      done
      curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/ARCHIVING_FINISHED

  27. Note Format! The time limit is specified on TACC runs to help conserve resources and manage the queue. The maximum time is 24 h, but providing options encourages users to ask for less time if they don’t think they need it.
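The 24 h ceiling can be checked before a job is submitted. The sketch below assumes the requested time is written as hh:mm:ss, as in the h_rt=01:00:00 line of the run script; the format the DE uses for requestedTime is not shown in the slides, and the function name time_within_limit is hypothetical.

```shell
#!/bin/bash
# Hypothetical check that a requested run time, written as hh:mm:ss,
# does not exceed the 24 h maximum mentioned on the slide.
time_within_limit() {
    local h m s total
    # Split the argument on colons into hours, minutes, seconds.
    IFS=: read -r h m s <<< "$1"
    # Reject missing fields.
    [ -n "$h" ] && [ -n "$m" ] && [ -n "$s" ] || return 1
    # Reject anything that is not purely digits (this also catches
    # a fourth field, which would leave a colon in $s).
    case "$h$m$s" in
        *[!0-9]*) return 1 ;;
    esac
    # 10# forces base 10 so that "08" is not read as octal.
    total=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
    [ "$total" -le $(( 24 * 3600 )) ]
}

time_within_limit "01:00:00" && echo "01:00:00 is within the limit"
time_within_limit "30:00:00" || echo "30:00:00 exceeds the limit"
```

The function only compares the total against 24 hours; it does not check that minutes and seconds are below 60.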

  28. Note Format! The number of processors needed is also specified on TACC runs to help conserve resources. What can be effectively used is an important consideration. A serial application, e.g. one like Newbler that does not use MPI for multiprocessing, will not benefit from large numbers of processors. Serial applications (as specified in the json at TACC) must be set to 1 processor. Apps that are set to –maxMemory=1000 will run on the largemem queue with 24 cores per node, 48 cores maximum.
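The two processor rules above (serial applications use 1 processor; largemem allows at most 48 cores) can be expressed as a small pre-submission check. This is only a sketch: the function name check_processor_count and its argument order are hypothetical, and "SERIAL" is the parallelism value shown in the example json.

```shell
#!/bin/bash
# Hypothetical validation of a processor count against the slide's rules.
# Arguments: parallelism (e.g. SERIAL), queue name, processor count.
check_processor_count() {
    local parallelism="$1" queue="$2" count="$3"
    case "$count" in
        ''|*[!0-9]*) echo "processor count must be a whole number" >&2; return 1 ;;
    esac
    if [ "$count" -lt 1 ]; then
        echo "processor count must be at least 1" >&2; return 1
    fi
    if [ "$parallelism" = "SERIAL" ] && [ "$count" -ne 1 ]; then
        echo "serial apps must be set to 1 processor" >&2; return 1
    fi
    if [ "$queue" = "largemem" ] && [ "$count" -gt 48 ]; then
        echo "largemem allows at most 48 cores" >&2; return 1
    fi
    return 0
}

check_processor_count SERIAL largemem 1 && echo "accepted"
check_processor_count SERIAL largemem 12 || echo "rejected"
```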

  30. [Diagram: job flow between the iPlant DE, the iPlant Data Store (iRODS), and TACC]
      • iPlant DE (Apps): Initiate Job Here. Sends Job Request, Inputs, Settings to the TACC Foundation API Application; Run Information, Progress Returned.
      • TACC Foundation API Application (Application Wrapper Script, json#2): Requests Executables, Wrapper from the iPlant Data Store (iRODS); Returns Executables, Wrapper; Submit Job.
      • TACC SGE Queue: Job Runs Here! Requests Input Files (User’s Input Data); Returns Input Files; Sends Output Files.
      • iPlant DE Results Store: Results Stored Here.
