
iRODS - integrated Rule Oriented Data System


Presentation Transcript


    1. iRODS - integrated Rule Oriented Data System
       Reagan Moore, rwmoore@renci.org

    2. Development Team
       DICE team
         Arcot Rajasekar - iRODS Development Lead
         Mike Wan - iRODS Chief Architect
         Wayne Schroeder - iRODS Product Mgr., Developer
         Bing Zhu - Fedora, Windows
         Mike Conway - Java (Jargon)
         Paul Tooby - Documentation, Foundation
         Sheau-Yen Chen - Data Grid Administration
         Reagan Moore - PI
       Preservation
         Richard Marciano - Preservation Development Lead
         Chien-Yi Hou - Preservation Micro-services
         Antoine de Torcy - Preservation Micro-services

    4. Scale of iRODS Data Grid
       Number of files: tens to millions to hundreds of millions of files
       Size of data: gigabytes to hundreds of terabytes to petabytes of data
       Number of policy enforcement points: 64 actions define when policies are checked
       System state information: 112 metadata attributes for system information per file
       Number of functions: 185 composable micro-services
       Number of storage systems that are linked: one to tens to a hundred storage resources
       Number of data grids: one to a federation of tens of data grids

    5. Data are Inherently Distributed
       Distributed sources: projects span multiple institutions
       Distributed analysis platforms: grid computing
       Distributed data storage: minimize risk of data loss, optimize access
       Distributed users: caching of data near the user
       Multiple stages of data life cycle: data repurposing for use in a broader context

    6. Organize Distributed Data into a Sharable Collection
       Project repository
         MotifNet - manage collection of analysis products
       Institutional repository
         Carolina Digital Repository for UNC collections
       Regional collaboration
         RENCI Data Grid linking resources across North Carolina
       National collaboration
         NSF Temporal Dynamics of Learning Center
         Australian Research Collaboration Service
       National Library
         French National Library
       National Archive
         NARA Transcontinental Persistent Archive Prototype, Taiwan
       International collaboration
         BaBar High Energy Physics (SLAC-IN2P3)
         National Optical Astronomy Observatory (Chile-US)

    7. 7 Logical Name Spaces

    8. Social Challenges
       Every community prefers its own user interface
         Unix shell commands - icommands
         Java I/O library - JARGON / JUX
         C I/O library
         Portals - EnginFrame
         Digital libraries - Fedora / DSpace
         Workflows - Kepler / Taverna
         Transport - GridFTP / Parrot
         Web browsers / Windows browser
         Load libraries - Python (Pyrods)
         User level file systems - FUSE / WebDAV / PetaFS
         Grid APIs - JSAGA
         Web services - URSpace / VOSpace
         Future ports - Islandora / iDROP

    9. Heterogeneity Challenges
       Many types of operating systems
         Unix variants, 32-bit/64-bit
         Mac OSX/IntelPC, Mac OSX/PowerPC
         Linux
         Windows XP, Vista
       Many types of storage systems
         File systems
         Tape archives
         Cloud storage
       Different administrative domains
         Challenge-response authentication
         Kerberos
         GSI - Grid Security Infrastructure (PKI certificates)
         Shibboleth

    10. 10 Data Virtualization

    11. iRODS - Policy-based Management
        Turn policies into computer actionable rules
          Compose rules by chaining micro-services
        Manage state information as attributes on namespaces:
          Files / collections / users / resources / rules
        Validate assessment criteria
          Queries on state information, parsing of audit trails
        Automate administrative functions

    12. 12 iput With Replication

    13. 13 Under the hood - a glimpse

    14. 14 iRODS Distributed Data Management

    15. iRODS Wiki
        Presentations, papers, tutorials
          http://irods.diceresearch.org
        Open source software - BSD license
          Contributed clients, software
          Performance assessments
        Download source code
          Windows - binary release
          Unix / Mac / Linux - build from source
        iRODS Primer
          Morgan & Claypool, Synthesis Lectures on Information Concepts, Retrieval, and Services

    17. Infrastructure Independence
        Manage properties of the collection independently of the choice of technology
          Access, authentication, authorization, description, location, distribution, replication, integrity, retention
        Enforce policies across all storage locations
          Rule Engine resident at each storage site
        Apply procedures at each remote storage site
          Chain encapsulated operations into workflows
        Use infrastructure independence to enable use of new technology without interruption
          Integrate new access methods, new storage systems, new network protocols, new authentication systems

    18. Data Grid Security
        Manage name spaces for: {users, files, storage}
        Assign access controls as constraints imposed between two logical name spaces
          Access controls remain invariant as files are moved within the data grid
          Controls on: files / storage systems / metadata
        Authenticate each user access
          PKI, Kerberos, challenge-response, Shibboleth
          Use internal or external identity management system
        Authorize all operations
          ACLs (Access Control Lists) on users and groups
          Separate condition for execution of each rule
          Internal approval flags (IRB) within a rule
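One way to picture why access controls stay invariant when a file moves: the ACL is keyed on (user name, logical file name), while the physical location lives in a separate mapping. The following is a minimal sketch of that idea, not iRODS code. The table layout, the user names, the paths, the `tape-archive` resource and the three-level permission ordering are all illustrative assumptions.

```python
# Hypothetical in-memory model; names, paths and levels are illustrative only.
LEVELS = {"read": 1, "write": 2, "own": 3}

# Access control: a constraint between the user name space
# and the logical file name space.
acl = {
    ("alice", "/renci/home/alice/data.txt"): "own",
    ("bob", "/renci/home/alice/data.txt"): "read",
}

# Separate mapping from logical file name to (storage resource, physical path).
replicas = {
    "/renci/home/alice/data.txt": ("renci-vault1", "/vault1/alice/data.txt"),
}


def is_allowed(user, logical_path, needed):
    """Check the ACL only; the physical location is never consulted."""
    granted = acl.get((user, logical_path))
    return granted is not None and LEVELS[granted] >= LEVELS[needed]


def move(logical_path, resource, physical_path):
    """Relocate the physical copy; the ACL table is not touched."""
    replicas[logical_path] = (resource, physical_path)


before = is_allowed("bob", "/renci/home/alice/data.txt", "read")
move("/renci/home/alice/data.txt", "tape-archive", "/tape/0042/data.txt")
after = is_allowed("bob", "/renci/home/alice/data.txt", "read")
print(before, after, is_allowed("bob", "/renci/home/alice/data.txt", "write"))
```

Because `move` changes only `replicas`, the answer from `is_allowed` is the same before and after the relocation.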

    19. 19 iRODS Rules and Micro-services Reagan W. Moore

    20. Rule Base
        Rules are stored in the core.irb file
          A separate copy of core.irb is installed at each storage location
          Can have storage- or site-specific rules
        Each rule is associated (through its name) with a specific event in the iRODS framework (64 hooks)
          acPreProcForPut
          acPostProcForPut
          acDeleteUser
        User-defined rules can also be executed through the irule command

    21. Variables
        Session variables
          Define parameters associated with the client session, such as:
            $userNameClient
            $rodsZoneClient
        Workflow variables
          Define parameters used within the workflow
            *A, *CollName
            stdout
        Persistent state information
          Maintained across sessions, stored in iCAT
            DATA_NAME, DATA_SIZE, COLL_NAME, DATA_CHECKSUM
            META_DATA_ATTR_NAME, META_DATA_ATTR_UNITS

    22. iRODS Rules
        Each rule defines
          An action for an event
          Condition
          Action chains (micro-services and rules)
          Recovery chains
        Invoked by servers to enforce policies
        Invoked by clients to run workflows on servers
        Rule types
          Atomic - applied immediately
          Deferred - run at a later time in the background
          Periodic - run at a fixed time interval
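The pairing of an action chain with a recovery chain can be pictured as a list of (step, undo) pairs. The sketch below is a toy model in Python, not the iRODS rule engine. It assumes that when a step fails, the recovery for that step and for every earlier step runs in reverse order; the rollback order is an assumption made for illustration. The Python function names simply echo the createFile / removeFile and ingestMetadata / rollback pairs.

```python
from typing import Callable, List, Tuple

# Each step pairs a micro-service stand-in with its recovery stand-in.
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_chain(steps: List[Step]) -> bool:
    """Run each action in order; on failure, run recoveries newest first."""
    done: List[Callable[[], None]] = []
    for action, recovery in steps:
        try:
            action()
        except Exception:
            # Assumption: recover the failed step, then all earlier ones.
            for undo in [recovery] + done[::-1]:
                undo()
            return False
        done.append(recovery)
    return True


log: List[str] = []


def create_file() -> None:
    log.append("createFile")


def remove_file() -> None:
    log.append("removeFile")


def ingest_metadata() -> None:
    raise RuntimeError("catalog unavailable")  # simulated failure


def rollback() -> None:
    log.append("rollback")


ok = run_chain([(create_file, remove_file), (ingest_metadata, rollback)])
print(ok, log)
```

Here the second step fails, so the file created by the first step is removed again and the chain reports failure.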

    23. Format of a Rule
          Action | Condition | MS1, …, MSn | RMS1, …, RMSn
        Action
          Name of the action to be performed
          Name known to the server and invoked by the server
        Condition - condition under which the rule applies
        Micro-services - if the condition applies, the micro-services are executed
        Recovery micro-services - if any micro-service fails, the recovery micro-service(s) are executed to maintain transactional consistency
        Example of MS/RMS pairs
          createFile(*F)           removeFile(*F)
          ingestMetadata(*F,*M)    rollback
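The four-field layout can be made concrete by splitting a rule line into its parts. The following is a small Python sketch, not the iRODS parser. It assumes fields are separated by `|`, that chain entries are joined with `##` as in the rule files elsewhere in this deck, and that neither separator occurs inside a field. The sample rule line is hypothetical and only combines names that appear on the slides.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    action: str
    condition: str
    microservices: List[str]
    recovery: List[str]


def split_chain(chain: str) -> List[str]:
    """Split an 'A##B##C' chain into its entries, dropping empty ones."""
    return [entry.strip() for entry in chain.split("##") if entry.strip()]


def parse_rule(line: str) -> Rule:
    """Split 'Action|Condition|MS chain|RMS chain' into four fields."""
    parts = [part.strip() for part in line.split("|")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(parts)}")
    action, condition, ms_chain, rms_chain = parts
    return Rule(action, condition, split_chain(ms_chain), split_chain(rms_chain))


# Hypothetical rule line assembled from names used on the slides.
rule = parse_rule(
    "acPostProcForPut|$objPath like /x/y/z/*|"
    "createFile(*F)##ingestMetadata(*F,*M)|removeFile(*F)##rollback"
)
print(rule.action)
print(rule.condition)
print(rule.microservices)
print(rule.recovery)
```

An empty condition field, as in `myTestRule||...`, simply parses to an empty string.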

    24. Condition
        Condition under which this rule applies
        Examples
          $rescName == demoResc8
          $objPath like /x/y/z/*
        Many operators
          ==, !=, >, <, >=, <=
          %%, !! (and, or)
          expr like reg-expr, expr not like reg-expr, expr ::= string
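To make the two example conditions concrete, here is a rough Python sketch of how such tests could be evaluated against session values. It is an illustration only: it treats the `like` pattern as a shell-style wildcard via `fnmatch`, which is an assumption about the pattern syntax, and the session values are made up.

```python
import operator
from fnmatch import fnmatchcase

COMPARISONS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def check(value, op, operand):
    """Evaluate one 'value op operand' test."""
    if op == "like":
        # Assumption: '*' behaves like a shell wildcard.
        return fnmatchcase(value, operand)
    if op == "not like":
        return not fnmatchcase(value, operand)
    return COMPARISONS[op](value, operand)


# Made-up session values for illustration.
session = {"$rescName": "demoResc8", "$objPath": "/x/y/z/data/file1.txt"}

print(check(session["$rescName"], "==", "demoResc8"))
print(check(session["$objPath"], "like", "/x/y/z/*"))
print(check(session["$objPath"], "not like", "/x/y/z/*"))
```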

    25. Micro-services (MSs)
        Well-defined server-side procedures and functions
          C functions on servers
        MSs can be chained to form a workflow using '##'
          msiDataObjOpen(*A,*S_FD)##
          msiDataObjRead(*S_FD,10000,*R_BUF)##
          msiDataObjClose(*S_FD,*stat)
        Flow control
          whileExec - while loop
          forExec - for loop
          forEachExec - for each in the table or list
          break
          ifExec - if-else

    26. Micro-services - flow control examples
        whileExec
          assign(*A,0)##whileExec( *A < 20, writeLine(stdout,*A)##assign(*A, *A + 4), nop##nop)
        forExec
          forExec(assign(*A,0), *A < 20 , assign(*A,*A + 4), writeLine(stdout,*A),nop)
        ifExec
          ifExec(*A > *D, assign(*A,*D),nop,assign(*D,*A),nop)
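For readers more used to a general-purpose language, the three flow-control examples can be read as the Python below. This is an interpretation of the slide, not generated output: it assumes the argument order is (condition, body, recovery) for whileExec, (init, condition, step, body, recovery) for forExec, and (condition, then, then-recovery, else, else-recovery) for ifExec. The recovery arguments (`nop`) are omitted.

```python
def while_exec_example():
    out = []
    a = 0                  # assign(*A,0)
    while a < 20:          # whileExec(*A < 20, ...
        out.append(a)      #   writeLine(stdout,*A)
        a = a + 4          #   assign(*A, *A + 4)
    return out


def for_exec_example():
    out = []
    # forExec(assign(*A,0), *A < 20, assign(*A,*A + 4), writeLine(stdout,*A), nop)
    for a in range(0, 20, 4):
        out.append(a)
    return out


def if_exec_example(a, d):
    # ifExec(*A > *D, assign(*A,*D), nop, assign(*D,*A), nop)
    if a > d:
        a = d
    else:
        d = a
    return a, d


print(while_exec_example())
print(for_exec_example())
print(if_exec_example(7, 3), if_exec_example(2, 9))
```

Under this reading both loops write 0, 4, 8, 12 and 16, and the ifExec example leaves both variables equal to the smaller of the two values.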

    27. Other Micro-services
        delayExec - execute MSs at a later time
          Executed by the iRODS batch server (irodsReServer) in the background
          Example
            delayExec(<PLUSET>1m</PLUSET>,msiReplColl(*desc_coll,*desc_resc, backupMode,*outbuf),nop)
          Time keywords
            PLUSET - exec after the specified time has passed
            ET - exec at the specified time (<ET>23:00</ET>)
            EF - repeat exec at the specified frequency
          Can be combined
            <PLUSET>1m</PLUSET><EF>5m</EF>
        remoteExec - execute MSs on remote servers
          remoteExec(andal.sdsc.edu,null,msiSleep(10,0)##writeLine(stdout,open remote write in andal), nop)
        assign - assign a value to a parameter
        writeString - write a string to stdout buffer
        writeLine - write a line (with end of line) to stdout buffer
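The tagged time specification passed to delayExec can be taken apart with a small regular expression. The sketch below is illustrative Python, not the parser used by irodsReServer. The set of unit suffixes (`s`, `m`, `h`, `d`) is an assumption; the slide only shows `1m` and `5m`.

```python
import re
from datetime import timedelta

# Assumed unit suffixes; only 'm' appears on the slide.
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_delay(spec):
    """Return {tag: text} for each <TAG>text</TAG> pair in the spec."""
    return dict(re.findall(r"<(\w+)>(.*?)</\1>", spec))


def to_timedelta(text):
    """Convert a value such as '5m' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"unrecognised interval: {text!r}")
    return timedelta(**{UNITS[match.group(2)]: int(match.group(1))})


spec = parse_delay("<PLUSET>1m</PLUSET><EF>5m</EF>")
print(spec)
print(to_timedelta(spec["PLUSET"]), to_timedelta(spec["EF"]))
```

With the combined example from the slide, this yields a first run after one minute and a repeat interval of five minutes.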

    28. Micro-services parameters
        Micro-services communicate through:
          Arguments/parameters
            Input from the initiator (client/server)
            Literals
            Variables start with *
            Output of an MS can be used as input of another MS in an MS chain
          System session parameters
            Start with "$"
            Valid across rule invocations
          Persistent data - iCAT
            Query the iCAT
            Valid across sessions
          XMessages - out-of-band communications
            Sender obtains send/receive tickets
            Pass receive ticket to receivers
            Receivers use the ticket to read messages
            Message exchange
              Between parallel sessions
              Between the batch manager and the task manager on the task status

    29. Example of passing parameters between Micro-services
        trimColl.ir file:
          myTestRule||acGetIcatResults(*Action,*Condition,*B)##forEachExec(*B,msiDataObjTrim(*B,tgReplResc,null,1,null,*C),nop)|nop##nop
          *Action=trim%*Condition= COLL_NAME = '/tempZone/home/rods/loopTest'
          *Action%*Condition
        Run it with:
          irule -F trimColl.ir
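The trimColl.ir file has three parts: the rule, the input parameters separated by `%`, and the output parameters separated by `%`. A short Python sketch of reading that layout follows. It is an illustration of the file structure as shown on the slide, not the irule implementation, and it assumes each part sits on its own line and that only the first `=` in an input item separates name from value.

```python
IR_TEXT = """\
myTestRule||acGetIcatResults(*Action,*Condition,*B)##forEachExec(*B,msiDataObjTrim(*B,tgReplResc,null,1,null,*C),nop)|nop##nop
*Action=trim%*Condition= COLL_NAME = '/tempZone/home/rods/loopTest'
*Action%*Condition
"""


def parse_ir(text):
    """Split a three-line .ir file into (rule, inputs, outputs)."""
    rule, input_line, output_line = [ln.strip() for ln in text.strip().splitlines()]
    inputs = {}
    for item in input_line.split("%"):
        # Only the first '=' separates the name from the value, so a
        # value such as "COLL_NAME = '...'" keeps its own '=' sign.
        name, _, value = item.partition("=")
        inputs[name.strip()] = value.strip()
    outputs = [name.strip() for name in output_line.split("%")]
    return rule, inputs, outputs


rule_line, inputs, outputs = parse_ir(IR_TEXT)
print(inputs["*Action"])
print(inputs["*Condition"])
print(outputs)
```

The `*Action` and `*Condition` values read here are the ones handed to acGetIcatResults in the rule line.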

    30. Using the rulegen parser
        See: https://www.irods.org/index.php/HELP.rulegen
        Uses a nicer rule language and converts it into the core.irb version
          rulegen -s rX.r
            Converts from the rulegen syntax to the core.irb syntax and displays the result on your screen
          rulegen -s rX.r > rX.ir
            Converts from the rulegen syntax to the core.irb syntax and stores the result in the file rX.ir
          irule -F rX.ir
            Executes the policy

    31. Adding metadata values
          mytestrule{
            msiString2KeyValPair("FILETYPE_STATUS2=FTPASS",*kvp);
            msiAssociateKeyValuePairsToObj(*kvp,*path,"-d");
          }
          INPUT *Att=$FILETYPE,*Val=$text,*path=/renci/home/rods/listMS.ir
          OUTPUT ruleExecOut
        Note that there cannot be any spaces around the "=" sign within the msiString2KeyValPair micro-service. Spaces are interpreted as part of the attribute name and attribute value.
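The warning about spaces is easy to see with a toy model of the split. The Python below only mimics the behaviour described on the slide (split at "=", keep everything else verbatim); it is not the msiString2KeyValPair source, and the function name `string_to_kvp` is made up.

```python
def string_to_kvp(text):
    """Split 'name=value' at the first '=' without trimming whitespace."""
    name, sep, value = text.partition("=")
    if not sep:
        raise ValueError("expected a string of the form name=value")
    return {name: value}


print(string_to_kvp("FILETYPE_STATUS2=FTPASS"))
print(string_to_kvp("FILETYPE_STATUS2 = FTPASS"))
```

In the second call the attribute name carries a trailing space and the value a leading space, so a later query for `FILETYPE_STATUS2` would not match.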

    32. Adding Metadata
          mytestrule{
            msiString2KeyValPair("*attrname=*attrvalue",*kvp);
            assign(*A,*path/*obj);
            writeLine(stdout,*A);
            msiAssociateKeyValuePairsToObj(*kvp,*path/*obj,"-d");
          }
          INPUT *path=/renci/home/rods,*obj=$listMS.ir,*attrname="FILETYPE", *attrvalue="25"
          OUTPUT ruleExecOut

    33. Reading user-defined metadata
          acGetDataObjAVU{
            msiMakeQuery("META_DATA_ATTR_NAME, META_DATA_ATTR_VALUE, COLL_NAME, DATA_NAME", "COLL_NAME = '*CollName'", *Query);
            msiExecStrCondQuery(*Query, *GenQOut);
            forEachExec(*GenQOut){
              msiGetValByKey(*GenQOut, META_DATA_ATTR_VALUE, *AttrValue);
              msiGetValByKey(*GenQOut, META_DATA_ATTR_NAME, *AttrName);
              msiGetValByKey(*GenQOut, DATA_NAME, *name);
              writeLine(stdout,"*name has attribute *AttrName and value *AttrValue");
            }
          }
          INPUT *CollName="$/renci/home/rods"
          OUTPUT ruleExecOut
        This lists all of the user-defined metadata values for all of the files in the named collection

    34. Example of multiple conditions
          acGetDataObjAVU{
            msiMakeQuery("META_DATA_ATTR_NAME, META_DATA_ATTR_VALUE, COLL_NAME, DATA_NAME", "COLL_NAME = '*CollName' and META_DATA_ATTR_NAME = '*AttrName'", *Query);
            msiExecStrCondQuery(*Query, *GenQOut);
            forEachExec(*GenQOut){
              msiGetValByKey(*GenQOut, META_DATA_ATTR_VALUE, *AttrValue);
              msiGetValByKey(*GenQOut, META_DATA_ATTR_NAME, *AttrName);
              msiGetValByKey(*GenQOut, DATA_NAME, *name);
              writeLine(stdout,"*name has attribute *AttrName and value *AttrValue");
            }
          }
          INPUT *CollName="$/renci/home/rods", *AttrName="FILETYPE"
          OUTPUT ruleExecOut
        This only lists files that have the specified attribute name

    35. Simple rule to list files
        testlist.ir
          mytestRule||acGetIcatResults(*Action,*Condition,*B)##forEachExec(*B,msiGetValByKey(*B,DATA_NAME,*D)##msiGetValByKey(*B,COLL_NAME,*E)##writeLine(stdout,*E/*D),nop)|nop##nop
          *C=/renci/home/rods%*Action=list%*Condition=COLL_NAME = '*C'
          ruleExecOut
        Try
          irule -F testlist.ir prompt
          irule -F testlist.ir 'yourpathname'
          irule -F testlist.ir *C='yourpathname'

    36. Converting String to AVU triplet
          testrule||msiDataObjChksum(*objPath,null,*ChksumStr)##msiGetSystemTime(*Date,human)##msiString2KeyValPair(Checksum.*Date=*ChksumStr,*KVPair)##msiAssociateKeyValuePairsToObj(*KVPair,*objPath,-d)|nop
          *objPath=/tempZone/home/antoine/tmp.txt
          ruleExecOut

    37. 37 Installation of iRODS Chien-Yi Hou

    38. iRODS Wiki
        http://irods.diceresearch.org
          Descriptions of the technology
          Publications / presentations
          Download
          Performance tests
          Tinderbox system (tracks upgrades)
          irods-chat page

    39. iRODS installation
        Download the appropriate installation manual from the iRODS Wiki
          http://irods.diceresearch.org
        The installation procedure will take
          Up to 30 minutes for server/catalog/clients
          Up to 10 minutes for server/clients
          About 3 minutes for clients
        We will do a client install

    40. Windows Installation
        From the URL https://www.irods.org/index.php/windows
          go to the section labeled Windows i-Commands and click on the file
            10-29-09: Windows i-commands 2.2
          This will download the file win_icmds_2_2.zip
        Uncompress the file

    41. Detailed Windows Install
        Extract the exe files. This will be a long list of separate executable commands, one for each type of operation that you may need to perform. The list will include:
          iadmin - used by the data grid administrator to set up resources and accounts
          icd - change to a different directory in the data grid
          ils - list files in a data grid directory
        To use these icommands, you will need to set up an environment variable file which has default settings for the data grid that the class will use.
        Note the directory name where you have put the executables

    42. Detailed Windows Install
        On the URL https://www.irods.org/index.php/windows there are instructions in the section labeled
          Setting up the iRODS User Environment file in Windows (for i-commands only)
        To create the .irodsEnv file:
          * Launch a "Command Prompt" by navigating to the menu "Start" -> "Accessories" -> "Command Prompt".
          * Change directory to the user home directory.
              > cd %HOMEDRIVE%%HOMEPATH%
          * Type the following Windows commands to create a folder, ".irods", and move into this directory.
              > md .irods
              > cd .irods
              > Notepad .irodsEnv
        This will launch Notepad and create a text file named ".irodsEnv".

    43. Detailed Windows Install
        Enter the following information into Notepad and click save.
          irodsHost 'iren.renci.org'
          irodsPort 1247
          irodsDefResource 'renci-vault1'
          irodsHome '/RENCI/home/usertutor1'
          irodsCwd '/RENCI/home/usertutor1'
          irodsUserName 'usertutor1'
          irodsZone 'renci'
        These are the environment variables for a user account on the data grid 'RENCI'
        You will need to replace the three occurrences of 'usertutor1' with your iRODS account name on lines 4, 5, 6

    44. Detailed Windows Install
        To run i-commands in any directory on a Windows machine, the path to where the i-commands reside should be set in the Windows PATH environment variable. To do this, launch the System dialogue via:
          * Start -> settings -> control panel.
          * Click the "System" icon.
          * In the "Advanced" tab, click the "Environment variables" button.
        Add the path name for the i-commands directory to the "PATH" in either the user category or the system category.
        The path name can be found from the window that shows the icommand executables. Add a semi-colon and this path name to the end of the PATH text.
        Then close the window and start a new command prompt window. You will be able to execute the icommands from any directory on your system.

    45. Detailed Windows Install
        To connect to the data grid, type
          iinit
        To change your password, type
          ipasswd
        You will be prompted for your current password
        You will then be asked for the new password

    46. iRODS - Unix/Linux/Mac Installation
        https://www.irods.org/download.html
        Fill out form for:
          BSD license
          Registration / agreement
        Tar file
        Installation script (Linux, Solaris, Mac OSX)
          Automated download of PostgreSQL, ODBC
          Installation of PostgreSQL, ODBC, iRODS
          Initiation of iRODS collection

    47. iRODS Installation - Unix
        Unpack the release tar file
          gzip -d irods.tgz
          tar xf irods.tar
        cd into the top directory and execute
          ./irodssetup
        It will prompt for a few parameters

    48. irodssetup
          Set up iRODS
          ------------------------------------------------------------------------
          iRODS is a flexible data archive management system that supports many different site configurations. This script will ask you a few questions, then automatically build and configure iRODS.
          There are four main components to iRODS:
            1. An iRODS server that manages stored data.
            2. An iCAT catalog that manages metadata about the data.
            3. A database used by the catalog.
            4. A set of 'i-commands' for command-line access to your data.
          You can build some, or all of these, in a few standard configurations. For new users, we recommend that you build everything.

    49. 49 iRODS Client Installation iRODS configuration setup ---------------------------------------------------------------- This script prompts you for key iRODS configuration options. Default values (if any) are shown in square brackets [ ] at each prompt. Press return to use the default, or enter a new value. For flexibility, iRODS has a lot of configuration options. Often the standard settings are sufficient, but if you need more control enter yes and additional questions will be asked. Include additional prompts for advanced settings [no]?

    50. 50 iRODS Client Installation iRODS configuration (advanced) ------------------------------ iRODS consists of clients (e.g. i-commands) with at least one iRODS server. One server must include the iRODS metadata catalog (iCAT). For the initial installation, you would normally build the server with the iCAT (an iCAT-Enabled Server, IES), along with the i-commands. After that, you might want to build another Server to support another storage resource on another computer (where you are running this now). You would then build the iRODS server non-ICAT, and configure it with the IES host name (the servers connect to the IES for ICAT operations). If you already have iRODS installed (an IES), you may skip building the iRODS server and iCAT, and just build the command-line tools. Build an iRODS server [yes]? no

    51. 51 iRODS Client Installation iRODS can make use of the Grid Security Infrastructure (GSI) authentication system in addition to the iRODS secure password system (challenge/response, no plain-text). In most cases, the iRODS password system is sufficient but if you are using GSI for other applications, you might want to include GSI in iRODS. Both the clients and servers need to be built with GSI and then users can select it by setting irodsAuthScheme=GSI in their .irodsEnv files (or still use the iRODS password system if they want). Include GSI [no]? no

    52. iRODS Client Installation
          Confirmation
          ------------
          Please confirm your choices.
          --------------------------------------------------------
          GSI not selected
          Build iRODS command-line tools
          --------------------------------------------------------
          Save configuration (irods.config) [yes]?
          Saved.
          Start iRODS build [yes]?

    53. iRODS Client Installation
          Build and configure
          -------------------
          Preparing...
          Configuring iRODS...
          Step 1 of 4: Enabling modules... properties
          Step 2 of 4: Verifying configuration... No database configured.
          Step 3 of 4: Checking host system...
            Host OS is Mac OS X.
            Perl: /usr/bin/perl
            C compiler: /usr/bin/gcc (gcc)
            Flags: none
            Loader: /usr/bin/gcc
            Flags: none
            Archiver: /usr/bin/ar
            Ranlib: /usr/bin/ranlib
            64-bit addressing not supported and automatically disabled.

    54. iRODS Client Installation
          Step 4 of 4: Updating configuration files...
            Updating config.mk...
            Created /iRODS/config/config.mk
            Updating platform.mk...
            Created /iRODS/config/platform.mk
            Updating irods.config...
            Updating irodsctl...
          Compiling iRODS...
          Step 1 of 2: Compiling library and i-commands...
          Step 2 of 2: Compiling tests...
          Done!

    55. iRODS Client Installation
          To use the iRODS command-line tools, update your PATH:
            For csh users:
              set path=(/iRODS/clients/icommands/bin $path)
            For sh or bash users:
              PATH=/iRODS/clients/icommands/bin:$PATH
          Please see the iRODS documentation for additional notes on how to manage the servers and adjust the configuration.
        Change the path name to your installation path

    56. Environment Variables
        In home directory
          cd ~/.irods
          vi .irodsEnv
        Default values to describe settings for interacting with your data grid

    57. Environment File
          # iRODS personal configuration file.
          #
          # This file was automatically created during iRODS installation.
          # Created Fri Jan 18 10:01:48 2008
          #
          # iRODS server host name:
          irodsHost 'iren.renci.org'
          # iRODS server port number:
          irodsPort 1247
          # Home directory in iRODS:
          irodsHome '/RENCI/home/usertutor1'
          # Current directory in iRODS:
          irodsCwd '/RENCI/home/usertutor1'
          # Account name:
          irodsUserName 'usertutor1'
          # Zone:
          irodsZone 'renci'
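The environment file is a simple list of `name value` lines with `#` comments, so it is easy to inspect from a script. The following Python sketch reads that layout into a dictionary. It is not part of iRODS; it assumes one setting per line, a single space between name and value, and optional single or double quotes around the value.

```python
ENV_TEXT = """\
# iRODS personal configuration file.
# iRODS server host name:
irodsHost 'iren.renci.org'
# iRODS server port number:
irodsPort 1247
# Home directory in iRODS:
irodsHome '/RENCI/home/usertutor1'
# Account name:
irodsUserName 'usertutor1'
# Zone:
irodsZone 'renci'
"""


def parse_irods_env(text):
    """Return the settings in an .irodsEnv-style text as a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition(" ")
        settings[name] = value.strip().strip("'\"")
    return settings


settings = parse_irods_env(ENV_TEXT)
print(settings["irodsHost"], settings["irodsPort"], settings["irodsUserName"])
```

To check a real file, the text could be read from `~/.irods/.irodsEnv` instead of the inline sample.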

    58. User Configuration
        To use the iRODS 'i-commands', update your PATH:
          For csh users:
            set path=(/storage-site/iRODS/clients/icommands/bin $path)
          For sh or bash users:
            PATH=/storage-site/iRODS/clients/icommands/bin:$PATH

    59. irodsctl - script to control iRODS
          Usage is: ./irods/irodsctl [options] [commands]
          Help options:
            --help      Show this help information
          Verbosity options:
            --quiet     Suppress all messages
            --verbose   Output all messages (default)
          iRODS server Commands:
            istart      Start the iRODS servers
            istop       Stop the iRODS servers
            irestart    Restart the iRODS servers

    60. irodsctl options
          Database commands:
            dbstart     Start the database servers
            dbstop      Stop the database servers
            dbrestart   Restart the database servers
            dbdrop      Delete the iRODS tables in the database
            dboptimize  Optimize the iRODS tables in the database
            dbvacuum    Same as 'optimize'
          General Commands:
            start       Start the iRODS and database servers
            stop        Stop the iRODS and database servers
            restart     Restart the iRODS and database servers
            status      Show the status of iRODS and database servers
            test        Test the iRODS installation
