Globus Online. Prepared by Bekir Güler. What is Globus Online.
What is Globus Online • Globus Online transfer is a hosted service (deployed on Amazon’s cloud infrastructure) which aims to make high-performance data transfers easy and painless, not just for the end users of the data but also for the organizations hosting the machines that those users interact with. • In short, it is a transfer service. It uses a software-as-a-service model.
Why data movement is important • Getting data to where it needs to be can be a crucial first (or middle or final) step for primary work. For example, in order to analyze satellite data, a researcher might first need to move it from a remote sensor to a local compute resource.
Why moving data is difficult • Reliability • Security • Performance • Usability • Maintainability
1. Reliability • Potential problems at each endpoint include operating system misconfigurations, filesystem failures, hardware failures, bad firewall settings, authentication failures, authorization failures, etc. Then of course there are the many individual network links connecting the two endpoints, any one of which is subject to ephemeral failures.
2. Security • Moving data within a site you administer is relatively simple from a security standpoint: you are able to control exactly who accesses what data. However, enabling wide-area transfers necessarily involves exposing your resource to outside connections. • People commonly use SCP or FTP, but SCP has performance drawbacks and FTP is not a secure protocol.
3. Performance • Unfortunately, most data transfer protocols are not designed for high performance. • Popular tools like SCP are secure and easy to use, but performance can be a problem for large datasets. Basically, the cost of SCP’s approach to security is performance. SCP fully encrypts all data even when a more limited approach would be sufficient (such as applying strong authentication and authorization techniques when establishing connections).
4. Usability • Transferring large datasets across a wide area is not typically a user-friendly experience. Such transactions often involve huge amounts of data and can take hours or days (or even weeks) to complete. Long-running transfers often end prematurely due to reliability issues, and figuring out which transfers succeeded and which failed after a transfer ends can be a tedious task.
5. Maintainability • Providing a high-performance data transfer capability typically requires that you install and maintain special software on your users’ machines. If you’re a member of a large organization such as a national laboratory, this is usually not a big deal (or, rather, it probably is a big deal, and thus people are actually employed to do that). If you’re a small or medium resource provider, enabling high-performance data movement is often a one-person job. Not only do you have to install new software and keep it current, you also have to make sure it is configured to run at peak efficiency. And, of course, user support responsibilities might also fall solely on you.
What is the GO-Transfer approach? • To mitigate reliability issues, GO-Transfer includes built-in automated fault detection and recovery code and is backed by a responsive Help Desk staffed by data movement experts. • To address security concerns, communications with endpoints are strongly authenticated and authorized using native mechanisms; in addition, Globus Online never stores private data like passwords and keys.
• Performance-wise, Globus Online is able to obtain state-of-the-art speeds through its use of the GridFTP transport protocol; user files are transferred directly from endpoint to endpoint, making use of high-performance networks where available. • Usability issues are addressed with an asynchronous “fire-and-forget” model that frees users from babysitting long-running transfers; email notifications are sent for interesting events like transfer completion; a rich query interface returns detailed information on transfer status and events; the CLI and GUI are implemented with technologies familiar to most users.
• Maintenance costs are mitigated by Globus Online’s software-as-a-service model; users do not need special software to control their transfers; streamlined packaging of endpoint software eases the burden of creating new endpoints; and the Globus Online development team stands ready to fix bugs and provide technical assistance where needed.
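The “fire-and-forget” model with automated fault recovery described above can be illustrated with a small Python sketch. This is a toy, in-memory model and not the Globus Online implementation or API: the class names (TransferTask, TransferService), the status strings, the retry limit, and the fake copy function are all assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class TransferTask:
    """One transfer request; every name here is hypothetical."""
    task_id: int
    source: str
    destination: str
    status: str = "ACTIVE"
    attempts: int = 0
    events: list = field(default_factory=list)


class TransferService:
    """Toy stand-in for a hosted transfer service (not the real GO API)."""

    def __init__(self, copy_func, max_attempts=3):
        self._copy = copy_func            # performs one copy attempt
        self._max_attempts = max_attempts
        self._ids = itertools.count(1)
        self._tasks = {}

    def submit(self, source, destination):
        """Register the transfer and return at once with a task id."""
        task = TransferTask(next(self._ids), source, destination)
        self._tasks[task.task_id] = task
        return task.task_id

    def run_pending(self):
        """Service-side loop: retry each active task on transient faults."""
        for task in self._tasks.values():
            while task.status == "ACTIVE":
                task.attempts += 1
                try:
                    self._copy(task.source, task.destination)
                    task.status = "SUCCEEDED"
                    task.events.append("transfer complete")
                except ConnectionError as exc:
                    task.events.append(f"fault detected: {exc}")
                    if task.attempts >= self._max_attempts:
                        task.status = "FAILED"

    def status(self, task_id):
        """Query interface: state, attempt count, and event log."""
        task = self._tasks[task_id]
        return task.status, task.attempts, list(task.events)


def flaky_copy_factory(failures_before_success):
    """Build a fake copy function that fails a fixed number of times."""
    remaining = {"n": failures_before_success}

    def copy(source, destination):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise ConnectionError("network link dropped")

    return copy


service = TransferService(flaky_copy_factory(2))
task_id = service.submit("endpoint_a:/data/file.dat", "endpoint_b:/data/file.dat")
service.run_pending()   # in a hosted service this runs without the user
state, attempts, events = service.status(task_id)
print(state, attempts)  # SUCCEEDED 3
```

The user only calls submit and, later, status; detecting the two simulated link failures and retrying happens entirely on the service side, which is the point of the asynchronous model.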
How to use Globus Online Transfer • First of all, you will need to sign up for a Globus Online account. On the sign-up page, you will need to provide just some basic information, as shown on the next slide.
Once you’ve signed up, you will be taken directly to your dashboard. Your account is not quite ready yet, though. You’ll see a notice on the left sidebar telling you that your account is being provisioned.
You’re now ready to do your first transfer with Globus Online. As mentioned earlier, we won’t need to set up a GridFTP server or even create our own endpoint for this. We will simply use a pair of public tutorial endpoints that all GO users have access to. To perform a transfer, click on “Start Transfer” from your dashboard. You will be taken to the transfer interface:
How to turn our local computer into an endpoint • Use Globus Connect (or GC, for short). • It will set up a private endpoint on your personal computer, which you will then be able to access from the GO-Transfer web interface. Internally, Globus Connect installs and sets up a GridFTP server on your computer, but it hides all those details from you. • To install Globus Connect, just click on “Get Globus Connect” from the transfer interface. You should see a pop-up like this:
• Globus Connect establishes only outbound connections and thus can work behind a firewall or other network interface device that does not allow inbound connections. • The Globus Connect server is stateless and thus can be started and stopped at will; all state associated with transfers is maintained by GO. • Auto-update means the user need not maintain the software over time.
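The two properties above (outbound-only connections and a stateless local component) can be sketched in Python. This is an in-memory illustration under stated assumptions, not how Globus Connect is actually built: HostedService, LocalAgent, and their methods are hypothetical names, and method calls stand in for network requests that the agent initiates.

```python
from collections import deque


class HostedService:
    """Holds all transfer state; the agent only ever calls into it."""

    def __init__(self):
        self.pending = deque()
        self.completed = []

    def enqueue(self, path):
        self.pending.append(path)

    # The two methods below model requests initiated by the agent,
    # i.e. outbound connections from the user's machine.
    def fetch_work(self):
        return self.pending[0] if self.pending else None

    def report_done(self, path):
        if self.pending and self.pending[0] == path:
            self.pending.popleft()
            self.completed.append(path)


class LocalAgent:
    """Stateless worker: keeps no record of past or pending transfers."""

    def __init__(self, service):
        self.service = service

    def poll_once(self):
        path = self.service.fetch_work()
        if path is None:
            return False
        # A real agent would move the file here.
        self.service.report_done(path)
        return True


service = HostedService()
for name in ["a.dat", "b.dat", "c.dat"]:
    service.enqueue(name)

agent = LocalAgent(service)
agent.poll_once()             # handles a.dat
del agent                     # agent stopped; nothing is lost

agent = LocalAgent(service)   # fresh agent, no local state to restore
while agent.poll_once():
    pass

print(service.completed)      # ['a.dat', 'b.dat', 'c.dat']
```

Because the service never calls the agent, nothing has to listen for inbound connections on the user's machine, and because the queue lives on the service side, restarting the agent mid-batch loses no work.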
Different Interfaces for GO • Globus Online provides different interfaces for users: • Friendly web GUI: for ad hoc and less technical users • CLI (command-line interface): for advanced users • REST (Representational State Transfer) application programming interface: for system builders
User Profile and Identity Management • An important GO feature is the ability to handle transfers across multiple security domains with multiple user identities. Unlike many systems, including most previous Grid file transfer services, GO does not require a single, common security credential across all transfer endpoints.
GO doesn’t store passwords. Instead, it stores identities. Users can configure their profile with various identities, such as MyProxy certificate and OAuth protocol identities.
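One way to picture a profile that holds several identities, one per security domain, is the Python sketch below. The data model is an assumption made for illustration and is not GO's actual schema: Identity, UserProfile, the provider strings, the endpoint names, and the usernames are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Identity:
    """Who the user is in one security domain; no password or key is kept."""
    provider: str   # e.g. "myproxy" or "oauth" (labels are illustrative)
    username: str


@dataclass
class UserProfile:
    go_username: str
    identities: dict = field(default_factory=dict)  # endpoint name -> Identity

    def link(self, endpoint, identity):
        """Associate an identity with one endpoint."""
        self.identities[endpoint] = identity

    def identity_for(self, endpoint):
        """Pick the identity to present when contacting an endpoint."""
        try:
            return self.identities[endpoint]
        except KeyError:
            raise LookupError(f"no identity linked for endpoint {endpoint!r}") from None


profile = UserProfile("alice")
profile.link("lab_cluster", Identity("myproxy", "asmith"))
profile.link("campus_storage", Identity("oauth", "alice.smith"))

# A transfer between two domains uses a different identity on each side.
source_id = profile.identity_for("lab_cluster")
dest_id = profile.identity_for("campus_storage")
print(source_id.username, "->", dest_id.username)  # asmith -> alice.smith
```

The profile maps each endpoint to its own identity, so no single credential has to be valid at both ends of a transfer.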
Scalable Cloud-based Implementation • SaaS requires reliability and scalability: continuing to operate despite the failure of individual components and behaving appropriately as usage grows. To this end, the GO team applies methods commonly used by SaaS providers, running GO on a commercial cloud provider, Amazon Web Services (AWS). The GO implementation uses a combination of Amazon Elastic Compute Cloud (EC2), Amazon Elastic Load Balancing, and Amazon Simple Storage Service (S3).
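The idea of continuing to operate despite the failure of an individual component can be shown with a minimal round-robin balancer in Python. This is a generic sketch, not Amazon Elastic Load Balancing and not GO's deployment: the LoadBalancer class and the instance names are hypothetical.

```python
import itertools


class LoadBalancer:
    """Round-robin router that skips instances marked as unhealthy."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._cycle = itertools.cycle(self.instances)

    def mark_down(self, instance):
        self.healthy.discard(instance)

    def mark_up(self, instance):
        if instance in self.instances:
            self.healthy.add(instance)

    def route(self):
        """Return the next healthy instance, or fail if none remain."""
        if not self.healthy:
            raise RuntimeError("no healthy instances available")
        # One full pass over the ring is enough to find a healthy instance.
        for _ in range(len(self.instances)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


balancer = LoadBalancer(["web-1", "web-2", "web-3"])
balancer.mark_down("web-2")                 # simulate one failed server
routed = [balancer.route() for _ in range(4)]
print(routed)  # ['web-1', 'web-3', 'web-1', 'web-3']
```

Requests keep flowing to the remaining servers while one is down, and adding more instances to the list is how such a setup would absorb growing usage.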
The vast majority of GO is programmed in Python, running on Ubuntu Linux servers, with Cassandra and Postgres databases.