
Shelter from the Storm


Presentation Transcript


  1. Shelter from the Storm: Building a Safe Archive in a Hostile World

  2. SCOOP Goal • SURA-funded Coastal Modeling Project • Want to develop the community’s cutting-edge techniques and make them ready for use in tomorrow’s production systems • For example, automatic verification of storm-surge models against observed data, to help improve the models

  3. CCT Goals • One of CCT’s key research outputs is software • Want this software to be high quality and robust • Want software to be re-used across projects • Also want software to be picked up by external users as well as collaborators

  4. The SCOOP Archive • Need to archive lots of files • Atmospheric models (MM5, GFDL) • Hydrodynamic models (ADCIRC, SWAN, etc.) • Observational data (sensor data, buoys) • Requirements are poorly defined: • How much data? Don’t know • How long should we keep it? Don’t know • Have to interface with bespoke data transport mechanisms (LDM) • How do we achieve our goals under these conditions?!

  5. Basic Archive Operation: Upload • Client signals that it wants to upload some files (names are given) • Archive tells the client where to upload them (transaction handles) • Client uploads the files (independently of the archive) • Client tells the archive it’s done • Archive creates the logical filenames • The “upload” tool encapsulates this flow (sketched below)
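
  The flow above can be pictured as a small client-side sketch. The ArchiveClient class, its method signatures, the URLs, and the filenames below are illustrative stand-ins for the real service calls behind the “upload” tool, not the actual SCOOP API; the transfer step is stubbed out.

      #include <iostream>
      #include <string>
      #include <vector>

      struct UploadTarget {
          std::string transactionHandle;  // handle returned by the archive
          std::string uploadUrl;          // where this file should be put
      };

      class ArchiveClient {
      public:
          // Steps 1-2: announce the filenames, get back one target per file.
          std::vector<UploadTarget> fileUploadBegin(const std::vector<std::string>& names) {
              std::vector<UploadTarget> targets;
              for (const auto& n : names)
                  targets.push_back({"txn-" + n, "gsiftp://archive.example.org/stage/" + n});
              return targets;
          }

          // Steps 4-5: report completion so the archive can create logical names.
          bool fileUploadEnd(const std::vector<UploadTarget>& targets) {
              std::cout << "registered " << targets.size() << " logical filenames\n";
              return true;
          }
      };

      // Step 3: the transfer itself happens outside the archive service.
      bool transferFile(const std::string& name, const std::string& url) {
          std::cout << "uploading " << name << " -> " << url << "\n";
          return true;
      }

      int main() {
          ArchiveClient archive;
          std::vector<std::string> files = {"mm5_forecast.nc", "adcirc_surge.nc"};

          auto targets = archive.fileUploadBegin(files);      // steps 1-2
          for (size_t i = 0; i < files.size(); ++i)
              transferFile(files[i], targets[i].uploadUrl);   // step 3
          archive.fileUploadEnd(targets);                     // steps 4-5
      }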

  6. Basic Archive Operation: Download • Clients use the catalog service to discover/search for logical filenames • Clients ask the RLS server for the physical URLs • Clients then interact with the physical URLs directly • The “getdata” CLI tool encapsulates this (sketched below) • There are also portal pages...
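
  A corresponding sketch of the download path: catalog search gives logical filenames, an RLS lookup gives physical URLs, and the client fetches those directly. The catalog, RLS, and transport calls are stubs, and all names and URLs here are invented for illustration.

      #include <iostream>
      #include <string>
      #include <vector>

      // Hypothetical: query the catalog service for matching logical filenames.
      std::vector<std::string> catalogSearch(const std::string& query) {
          return {"scoop/adcirc/2005-08-28/surge.nc"};
      }

      // Hypothetical: ask the RLS server for the physical replicas of one logical name.
      std::vector<std::string> rlsLookup(const std::string& logicalName) {
          return {"gsiftp://archive.example.org/data/" + logicalName};
      }

      // Hypothetical: retrieve one physical URL directly.
      bool fetch(const std::string& url) {
          std::cout << "fetching " << url << "\n";
          return true;
      }

      int main() {
          for (const auto& lfn : catalogSearch("model=ADCIRC date=2005-08-28"))
              for (const auto& url : rlsLookup(lfn))
                  fetch(url);  // the "getdata" tool wraps this whole sequence
      }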

  7. Operations on the Service • fileUploadBegin - starts an upload • fileUploadEnd - signals that an upload is complete • logicalNameRetry • removeDeadTransactions • closeArchive
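
  Taken together, these operations suggest a service interface roughly along the following lines. This is only a guess at its shape: the parameter types, and the comments describing the last three operations, are inferred from the surrounding slides rather than taken from the real service definition.

      #include <string>
      #include <vector>

      class ArchiveService {
      public:
          virtual ~ArchiveService() = default;

          // Start an upload: client supplies filenames, archive returns
          // transaction handles pointing at staging locations.
          virtual std::vector<std::string>
          fileUploadBegin(const std::vector<std::string>& filenames) = 0;

          // Finish an upload: the archive registers the logical filenames.
          virtual bool fileUploadEnd(const std::vector<std::string>& handles) = 0;

          // Presumably: retry logical-name registrations that failed earlier.
          virtual void logicalNameRetry() = 0;

          // Presumably: discard transactions whose clients never called fileUploadEnd.
          virtual void removeDeadTransactions() = 0;

          // Presumably: shut the service down cleanly, persisting internal state.
          virtual void closeArchive() = 0;
      };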

  8. Distributed Software • Some services are hosted externally • Can’t assume our machine or software never fails • Need to retain the service’s state across restarts

  9. Robust Code • Don’t assume our service will remain “up” => Keep all internal state in a database => Reload internal state on restart • Don’t assume external services are always “up” => Design loosely coupled services => Store pending interactions in the database => Retry these periodically (sketched below) • Do “stress testing” on the service during the test/debug cycle
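
  A minimal sketch of the “store pending work, retry periodically” pattern described here. The in-memory PendingStore is a stand-in for the database table; a real implementation would persist the rows and reload them on restart, and the external call is faked.

      #include <chrono>
      #include <deque>
      #include <iostream>
      #include <string>
      #include <thread>
      #include <utility>

      struct PendingInteraction {
          std::string logicalName;  // work item to (re)send to an external service
          int attempts = 0;
      };

      class PendingStore {                     // stand-in for the database table
      public:
          void add(PendingInteraction p) { queue_.push_back(std::move(p)); }
          bool empty() const { return queue_.empty(); }
          PendingInteraction take() {
              PendingInteraction p = queue_.front();
              queue_.pop_front();
              return p;
          }
      private:
          std::deque<PendingInteraction> queue_;
      };

      // Fake call to a loosely coupled external service (e.g. name registration).
      bool registerLogicalName(const PendingInteraction& p) {
          return p.attempts >= 1;  // pretend it succeeds on the second try
      }

      int main() {
          PendingStore store;                          // reloaded from the DB on restart
          store.add({"scoop/adcirc/surge.nc", 0});

          while (!store.empty()) {                     // periodic retry sweep
              PendingInteraction p = store.take();
              if (registerLogicalName(p)) {
                  std::cout << "registered " << p.logicalName << "\n";
              } else {
                  ++p.attempts;
                  store.add(p);                        // keep it pending, try again later
                  std::this_thread::sleep_for(std::chrono::milliseconds(100));
              }
          }
      }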

  10. Keep the Internal APIs Simple

      int logname_initialize(void);
      void logname_remove(void);
      bool logname_create_logfile(std::string logical_name,
                                  bool name_is_final,
                                  const std::vector<std::string>& urls);
      bool logname_delete_logfile(std::string logical_name);
      ulong logname_upload_pending_lognames(ulong max_rows,
                                            ulong& total_found,
                                            ulong& max_rows_used);

  11. Encouraging Reuse • The SCOOP Archive has lots of strange rules about filenames and metadata • During design and implementation, keep asking: is this for the SCOOP project, or is it a generic feature? • Use good O-O design to keep SCOOP code separate from the archive code

  12. Keeping SCOOP to one side...

      class ArchiveFilingLogic {
      public:
          // Called by the default moveFiles implementation
          virtual bool createPhysicalPath(std::string physicalPath);

          virtual bool moveFiles(std::vector<std::string>& fileNames,
                                 std::vector<std::string>& missingFiles,
                                 std::string stagePath,
                                 std::string physicalPath);

          // Project-specific subclasses must implement these two
          virtual void physicalLocationForFiles(const std::vector<std::string>& filenames,
                                                std::map<std::string, std::string>& directories,
                                                std::map<std::string, std::string>& errors) = 0;

          virtual std::vector<std::string> logicalNamesForFiles(const std::vector<std::string>& filenames,
                                                                std::string physicalPath) = 0;
      };
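
  Building on the ArchiveFilingLogic declaration above, a hypothetical SCOOP-specific subclass might look like the following. The class name, directory layout, and naming rules are invented for illustration; the point is that every SCOOP-specific rule lives in the subclass, leaving the generic archive code untouched.

      #include <map>
      #include <string>
      #include <vector>

      class ScoopFilingLogic : public ArchiveFilingLogic {
      public:
          void physicalLocationForFiles(const std::vector<std::string>& filenames,
                                        std::map<std::string, std::string>& directories,
                                        std::map<std::string, std::string>& errors) override {
              for (const auto& f : filenames) {
                  // Illustrative SCOOP convention: group files by model name,
                  // taken here from the first token of the filename.
                  std::string model = f.substr(0, f.find('_'));
                  if (model.empty())
                      errors[f] = "cannot determine model from filename";
                  else
                      directories[f] = "/archive/scoop/" + model;
              }
          }

          std::vector<std::string> logicalNamesForFiles(const std::vector<std::string>& filenames,
                                                        std::string physicalPath) override {
              std::vector<std::string> logicalNames;
              for (const auto& f : filenames)
                  logicalNames.push_back("scoop:" + f);  // invented naming rule
              return logicalNames;
          }
      };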

  13. New Requirements • Handling common compression formats • Producing subsets of data (predictively) • Tracking data before it is ingested • Notifying people when data arrives • Transforming data to other formats • Generating analytical data “on the fly” • Federating data across multiple locations • Good initial design will simplify all this...

  14. Highest Priority... • The archive machine is running out of space • People have started to rely on the service • So we are currently uploading copies of all data to the SDSC data center using SRB • We now need to keep track of URLs on physically distributed resources • But SRB can help with some of the other requirements...

  15. Any Questions?
