
Data Management Breakouts

Data Management Breakouts. Jeff Kantor. Data Management Sessions: DM Overview for Newcomers / Intro to Summer 2014 Session – Jeff Kantor, Mario Juric. We introduced the new members of DM. Jeff Kantor presented an overview of Data Management.





Presentation Transcript


  1. Data Management Breakouts Jeff Kantor

  2. Data Management Sessions
     • DM Overview for Newcomers / Intro to Summer 2014 Session – Jeff Kantor, Mario Juric
       • We introduced the new members of DM
       • Jeff Kantor presented an overview of Data Management
       • Arfon Smith gave a presentation on GitHub development, followed by a question and answer session
       • The DM Leadership Team will decide how to leverage what we heard

  3. Data Management Sessions
     • DM's Calibrations Plans; Refine Calibration Data, Products, Processing – Robert Lupton
       • Presented the latest calibration plan
       • Discussion failed to find any fatal flaws
       • Next steps:
         • Add details to the document and circulate it
         • Identify inputs from the camera system, and check them against LSE-130

  4. Data Management Sessions
     • Summer 2014 Development Retrospective – Jeff Kantor, Mario Juric
       • We reviewed the Summer 2014 plan in PMCS (as imported from JIRA Agile), noting what was complete and what remained to be done
       • We explained the import process and discussed how we can standardize the JIRA Agile information to support EV
       • We reviewed the major deliverables from each team and what had been accomplished to date
       • Performance analysis
         • The good: new tools (JIRA) are in use and appear to be working, the majority of the goals set by the individual teams were accomplished, and the initial rebuild of the Continuous Integration (CI) system is done
         • The bad: not everything is completed yet, but it can be by September 30th
         • The ugly: JK/MJ and many others were kept busy by hiring activities, NSF, and long-term planning, with no time left to oversee the development work. Some of it diverged from the long-term plan. We knew this was a risk, but made the conscious calculation that (quality) hiring and starting construction were more important. Going forward, it's critical to ramp up ASAP (with good people!).

  5. Data Management Sessions
     • LSST Transient Management: Building on Current Experiences 1 & 2 – Andy Becker
       • Presentations were given by:
         • Robert Gruendl – DES Data Management
         • Luiz da Costa – Real-time streaming QA for DES
         • Rick Kessler – Image subtraction for DES
         • Simon Krughoff – obs_decam
         • Alex Kim – Random forest classification of transients
         • Francisco Forster and Guillermo Cabrera – Looking for core-collapse supernova shock breakouts with DECam
         • Przemek Wozniak – LANL version of Real/Bogus software v5.0 for iPTF
         • Jonathan Myers – Linktracklets
       • A number of areas of interest for future collaboration were identified

  6. Data Management Sessions
     • Tool Chains, Developer Visualization & Debugging Tools – Kian-Tat Lim
       • DM-internal training and discussion
       • Described and gave links to documents on DM development tools and their usage
       • Decisions:
         • C++11 after testing SWIG 3 (Russell Owen)
         • Python 3 when dependencies prefer it
         • Allow force-push of ticket branches, rebase/squash
         • Improve shared stack performance at NCSA

  7. Data Management Sessions
     • LSST Software Stack Users Tutorial – Dick Shaw
       • We described the LSST Stack: how to install it, how to use it, and how to get help
       • Of the ~35 participants, about half were scientists and half were engineers or software developers; about one third were comfortable programming in Python
       • The examples demonstrated how to:
         • Download (SDSS) or create (with PhoSim) data and configure it
         • Bulk process the data with command-line tasks, all the way through a data-release production
       • We plan to address user requests:
         • More worked examples of using the Stack
         • Simpler examples of how to customize the Stack for their use
         • The ability to perform photometry on a single FITS image from any camera

  8. Data Management Sessions
     • Winter 2015 Planning 1 & 2 – Mario Juric, Jeff Kantor
       • We reviewed the major features and results of the Summer 2014 release in LDM-240, the Data Management Development Roadmap
       • We reviewed the major features planned for the Winter 2015 release in LDM-240, adjusting those that needed to move to/from another release
       • Winter 2015 priorities:
         • Establish the Continuous Integration system
         • Track performance metrics
         • Release often (intermediate releases)
         • Adopt a DevOps mindset
         • Bring new people on and up to speed
         • Implement MultiFit in the DM framework, and start on 2015 roadmap goals
       • We tasked the team to develop the Winter 2015 JIRA Epics, prioritize them, and estimate the resources required for each and available within the team
       • With this input, we will implement the Epics in JIRA and import this information into the PMCS for EV

  9. Data Management Sessions
     • DM Stack Boot Camp – Paul Price, Simon Krughoff
       • Introduction to afw, Task, CameraGeom
       • Feedback so far has been all positive
         • Greater understanding of and appreciation for the DM stack
         • We hope this translates to more extensive and more confident use of the stack
       • Now we need to:
         • Continue the push on documentation
         • Support our growing user base
         • Establish a policy for supporting existing APIs

  10. Data Management Sessions
     • How to Use, Re-use Tasks and Integrating Camera Geom in other work – Paul Price, Simon Krughoff
       • Paul gave a comprehensive survey of the available tasks and covered many of the basic Task concepts: configuration, inheritance, sub-tasks
       • Simon gave an overview of how CameraGeom is used and how to construct a camera, using obs_decam as an example
       • No decisions, but good interaction with the community and interest in using Tasks and CameraGeom. Some priorities are:
         • Documentation of tasks and task flow
         • Tools to build CameraGeom from multi-extension FITS files
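The three Task concepts named on this slide (configuration, inheritance, sub-tasks) can be illustrated with a minimal sketch in plain Python. This is not the DM stack API: `Task`, `SmoothTask`, `ScaledSmoothTask`, and both config classes are hypothetical stand-ins invented for illustration, assuming only that a task carries a config object, subclasses a common base, and may own other tasks.

```python
from dataclasses import dataclass, field


@dataclass
class SmoothConfig:
    """Hypothetical configuration for SmoothTask."""
    width: int = 3  # number of samples in the moving average


class Task:
    """Minimal stand-in base class (an assumption, not the real DM Task)."""
    ConfigClass = None

    def __init__(self, config=None):
        # Fall back to the default configuration when none is supplied.
        self.config = config if config is not None else self.ConfigClass()

    def run(self, data):
        raise NotImplementedError


class SmoothTask(Task):
    """Inheritance: a concrete task only supplies its config class and run()."""
    ConfigClass = SmoothConfig

    def run(self, data):
        w = self.config.width
        return [sum(data[i:i + w]) / w for i in range(len(data) - w + 1)]


@dataclass
class ScaledSmoothConfig:
    """Hypothetical parent configuration that nests the sub-task's config."""
    smooth: SmoothConfig = field(default_factory=SmoothConfig)
    scale: float = 2.0


class ScaledSmoothTask(Task):
    """Sub-tasks: the parent builds SmoothTask from its own nested config."""
    ConfigClass = ScaledSmoothConfig

    def __init__(self, config=None):
        super().__init__(config)
        self.smooth = SmoothTask(self.config.smooth)

    def run(self, data):
        return [self.config.scale * x for x in self.smooth.run(data)]


# Configuration: override one nested field, then run the parent task.
config = ScaledSmoothConfig()
config.smooth.width = 2
result = ScaledSmoothTask(config).run([1.0, 2.0, 3.0, 4.0])
print(result)
```

Nesting the sub-task's config inside the parent's config means a single object describes the whole run, which is one plausible reason to pair every task with its own config class.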

  11. Data Management Sessions
     • Unit and Regression Tests – Kian-Tat Lim
       • DM-internal training and discussion
       • Worked example of coverage tools and improving unit tests
       • Discussed end-to-end integration needs
         • Improve the test dataset
         • (Re-)write end-to-end test scripts with multiple configurations
         • Write Tasks to compute performance metrics
         • Build monitoring/trending for metrics
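As a rough illustration of running one test across multiple configurations, here is a minimal sketch using Python's standard `unittest` module. The function under test, `clipped_mean`, and its `n_sigma` parameter are hypothetical and are not taken from the session or from the DM stack.

```python
import unittest


def clipped_mean(values, n_sigma):
    """Hypothetical function under test: mean after one sigma-clipping pass."""
    mean = sum(values) / len(values)
    sigma = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    kept = [v for v in values if abs(v - mean) <= n_sigma * sigma]
    return sum(kept) / len(kept)


class ClippedMeanTest(unittest.TestCase):
    def test_configurations(self):
        data = [1.0] * 9 + [100.0]  # nine good samples and one outlier
        # Each configuration (n_sigma) is paired with its expected result.
        cases = {2.0: 1.0, 5.0: 10.9}
        for n_sigma, expected in cases.items():
            # subTest reports each configuration separately on failure.
            with self.subTest(n_sigma=n_sigma):
                self.assertAlmostEqual(clipped_mean(data, n_sigma), expected)


def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClippedMeanTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

A coverage tool such as coverage.py can then report which lines the tests exercised, for example with `coverage run -m unittest` followed by `coverage report`.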

  12. Data Management Sessions
     • Using the DM Stack to Characterize Detectors – Robert Lupton
       • Discussed work at BNL/SNAL
       • Identified ways that DM can help:
         • Introduce use of DM's cameraGeom
         • Write code to generate from LCA-10140 headers
         • Use DM's ISR/assembleCCD tasks
       • It is not clear how useful it would be to use DM's measurement framework

  13. Data Management Sessions
     • How to Fit a Galaxy Model – Jim Bosch
       • Tutorial and discussion of algorithmic ideas, with mostly non-DM participants
       • Some topics discussed:
         • What MultiFit means to us and what we plan to do
         • Kinds of models to fit, and how to evaluate them
         • Bayesian priors and sampling
         • How to define galaxy colors
         • Star/galaxy classification
         • Whether per-pixel variances should be used when doing photometry (very lively discussion)
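The slide does not record how the per-pixel variance discussion was resolved. As a sketch of what the two options compute, assume the simplest case: a known, normalized PSF profile and a linear least-squares fit for the flux. The function `psf_flux` and the sample numbers are illustrative assumptions, not code from the DM stack.

```python
def psf_flux(data, psf, variance=None):
    """Linear least-squares flux estimate for a known PSF profile.

    With `variance`, each pixel is weighted by 1/variance;
    without it, all pixels get equal weight.
    """
    if variance is None:
        variance = [1.0] * len(data)
    num = sum(p * d / v for p, d, v in zip(psf, data, variance))
    den = sum(p * p / v for p, v in zip(psf, variance))
    return num / den


psf = [0.25, 0.5, 0.25]         # normalized 1-D profile (illustrative)
variance = [1.0, 4.0, 1.0]      # the central pixel is the noisiest

clean = [2.5, 5.0, 2.5]         # noiseless source with flux 10
perturbed = [2.5, 7.0, 2.5]     # same source, central pixel pushed high

flux_clean_unweighted = psf_flux(clean, psf)
flux_clean_weighted = psf_flux(clean, psf, variance)
flux_unweighted = psf_flux(perturbed, psf)
flux_weighted = psf_flux(perturbed, psf, variance)
print(flux_clean_unweighted, flux_clean_weighted, flux_unweighted, flux_weighted)
```

Both estimators recover the true flux on noiseless data; they differ only in how strongly a deviation in a high-variance pixel pulls the result.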

  14. Data Management Sessions
     • Summit, Summit - Base Network Infrastructure – Ron Lambert
       • We showed the current and proposed plans for the networking paths from Chile to NCSA and from Summit to Base. Path diversification on some links has improved over what was previously considered.
       • Plans of the summit and base computer networks were presented
       • Continued work is required to refine the data paths for the summit and base computer networks
       • We are continuing to work with the international link provider AmLight, Chilean telecoms, and the Chilean NREN to legally ratify the various network paths
       • We expect to have the network paths for LSST, with the required bandwidths from Summit to NCSA, in operation well before the end of CY2016
       • We took actions to further review and update the diagrams and to develop an inventory of the network equipment at both sites
       • After the August 25 meetings with telcos, we will submit an LCR to update LSE-78

  15. Data Management Sessions
     • DM Developer Hackathon 1 & 2 – Mario Juric
       • Qserv now builds within the continuous integration system
       • We now have the science pipelines, the database, MAF, and CatSim all building and being tested automatically!
       • CI Team: All your base are belonging to us!
       • Fixes for CFHT processing
       • Speedups of builds (work in progress)
       • General note: we should do this more often

  16. Data Management Sessions
     • Base Facility Infrastructure, Data Center Design – Kian-Tat Lim, Don Petravick
       • Went through LSE-77 and its support document
       • Exposed issues of redundant cooling and the need for inventories of equipment with a notional layout
       • Decided:
         • Will support two reliability tiers
         • No separate access controls needed within the computing area
         • Need to engage an engineer knowledgeable about cooling
         • Need to look at typical tape library footprints in addition to the best case
         • Concrete floor with overhead wiring is now the consensus
         • Power Usage Effectiveness (PUE) monitoring required
         • Loading dock requirements defined

  17. Data Management/SE session
     • Visualization Tools – Gregory Dubois-Felsmann (Thursday 11:00)
       • Assembled people from OCS, TCS, Camera, DM, EPO, and Simulation to discuss visualization requirements and devise a plan
       • We put together a list of highest-level use cases
       • The plan is to flesh these out into finer-grained use cases and requirements, and use that information to…
         • Identify areas where common tools can be adopted or built, and areas where subsystems may have to go it alone
       • We will consider the overlap between tools required internally by the project (during construction and operation) and tools required for the external science user interface
       • We plan to have two major teleconferences, on October 3rd and in early December, and then an in-person meeting in January at which scope decisions can be made
       • We have a Confluence page and a mailing list for communication
