
Point Data




  1. Point Data

  2. Overview • NetCDF-CF proposal • CDM Point Feature API • TDS • DAPPER • CdmRemote

  3. CF proposal • Encode common “discrete sampling” data collections into NetCDF classic model / netCDF-3 format • Proposal to CF Conventions • “Discrete Sampling Geometries” • John Caron, Steve Hankin, Jonathan Gregory • 25 pages, 2 years and counting • Version 1 Real Soon Now (really) • https://cf-pcmdi.llnl.gov/trac/

  4. Discrete Sampling Data encoding • Encoding variants • Multidimensional arrays • Contiguous Ragged Arrays • Indexed Ragged Arrays • Single Feature in a file • Make it easy / efficient to • Read a Feature from a file • Subset the collection by space and time

  5. Discrete Sample Feature Types • point: a collection of data points with no connection in time and space • timeSeries: a series of data points at the same location, with varying time • trajectory: a series of data points along a curve in time and space • profile: a set of data points along a vertical line • timeSeriesProfile: a series of profiles at the same location, with varying time • trajectoryProfile: a set of profiles which originate from points along a trajectory

  6. CDM Implementation • Have a working implementation of the entire proposal in CDM 4.2 • Some minor changes not yet made • Beta: needs user testing • Not entirely happy with the API (complexity) • Based on “Nested Table” abstraction • common representations • ArrayStructure, Construct, Contiguous, LinkedList, MultidimInner, MultidimInner3D, MultidimInnerPsuedo, MultidimInnerPsuedo3D, MultidimStructure, NestedStructure, ParentId, ParentIndex, Singleton, Structure, Top • XML configuration possible – easy to add new datasets

  7. Table Configurer Plugins • BUFR • CF Conventions • Cosmic • GEMPAK Point • IRIDL station (IRI/LDEO Climate Data Library) • Jason (NASA Ocean Surface Topography Mission) • FSL Wind Profiler • MADIS ACARS • MADIS surface observations • NDBC (National Data Buoy Center) • NCAR-RAF/nimbus • NLDN (National Lightning Detection Network) • Suomi-Station • Unidata Observation Dataset Conventions

  8. CDM architecture (diagram; labels only) • Application • Point Feature API • Point Feature Types • Datatype Adapter • NetcdfDataset • Table Configurer Plugins: BUFR, GEMPAK, CF, COSMIC, … • CoordSystem Builder • NetcdfFile • I/O service provider: NetCDF-3, NIDS, NetCDF-4, GRIB, …

  9. CDM PointFeature UML

  10. CDM Point Feature API
      FeatureDataset fd = FeatureDatasetFactoryManager.open(
          FeatureType.STATION, location, null, log);
      FeatureCollection fc = fd.getPointFeatureCollectionList().get(0);
      StationCollection timeSeriesCollection = (StationCollection) fc;
      // subset by bounding box and date range
      PointFeatureCollection points = timeSeriesCollection.flatten(
          new LatLonRect(
              new LatLonPointImpl(40.0, -105.0),
              new LatLonPointImpl(42.0, -100.0)),
          new DateRange(start, end));
      // iterate
      while (points.hasNext()) {
        ucar.nc2.ft.PointFeature pointFeature = points.next();
        Location loc = pointFeature.getLocation();
        ...
      }

  11. Some observations • All requests are in “coordinate space” • Indicate what subset you want, then request “all at once” • Subset is virtual • Library can (try to) optimize the request • “Iterators over result set” vs List or Array • Result set does not have to fit into memory • Allow streaming data (don’t wait until you have it all)

  12. Index vs Coordinate Request
      float temp(station, time);
      float data = temp.read(234, 23);
      http://server/Metar.dods?temp[234][23]
      vs
      http://server/ncss/Metar.nc?var=temp&time=2008-10-28T12:00:00Z&station=KMDR
      SELECT temp FROM metar WHERE metar.time = "2008-10-28T12:00:00Z" AND metar.station = "KMDR"
      Array data = temp.read("2008-10-28T12:00:00Z", "KMDR");

  13. Multidim vs Ragged
      float temp(sample);
      int station_index(sample);

      // one remote request per matching sample
      for (int i = 0; i < sample.len; i++) {
        if (station_index(i) == KMDR_index)
          data = read("http://server/Metar.dods?temp[i]"); // BAD
      }

      // collect the matching indices first, then send one request
      for (int i = 0; i < sample.len; i++) {
        if (station_index(i) == KMDR_index)
          indexList.add(i);
      }
      http://server/Metar.dods?temp[2,3,5,78,90,123,456,789] // BETTER
      vs
      http://server/ncss/Metar.nc?var=temp&time=all&station=KMDR // MO BETTA

  14. Indexed access considered harmful for rolling data archives • Cannot deal with constantly changing dataset • This is a contract with the application • When can you break it? • Difficult to reconcile with HTTP/OPeNDAP as a stateless protocol • TDS is broken (Shhhh…)
      float temp(sample=238743874);
      float time(sample=238743874);
        :units = "secs since 2008-10-28T12:00:00Z";
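The contract problem on this slide can be illustrated with a small in-memory sketch. The class `RollingArchive` and its methods are hypothetical and stand in for a dataset that drops its oldest sample when a new one arrives; this is not TDS or CDM code. The same index returns a different value after the archive rolls, while a lookup by time coordinate either returns the same value or reports that the sample is gone.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical rolling archive: keeps only the newest `capacity` samples.
class RollingArchive {
    private final List<Long> times = new ArrayList<>();   // seconds since an epoch
    private final List<Float> temps = new ArrayList<>();
    private final int capacity;

    RollingArchive(int capacity) { this.capacity = capacity; }

    void append(long time, float temp) {
        if (times.size() == capacity) {   // archive is full: drop the oldest sample
            times.remove(0);
            temps.remove(0);
        }
        times.add(time);
        temps.add(temp);
    }

    // Index space: the meaning of i changes whenever the archive rolls.
    float readByIndex(int i) { return temps.get(i); }

    // Coordinate space: returns null if the sample has aged out.
    Float readByTime(long time) {
        int i = times.indexOf(time);
        return i < 0 ? null : temps.get(i);
    }

    public static void main(String[] args) {
        RollingArchive archive = new RollingArchive(3);
        for (long t = 0; t < 3; t++) archive.append(t * 3600, 10f + t);
        float before = archive.readByIndex(0);
        archive.append(3 * 3600, 13f);    // the archive rolls here
        System.out.println("index 0: " + before + " then " + archive.readByIndex(0));
        System.out.println("time 3600: " + archive.readByTime(3600));
    }
}
```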

  15. Where are we going? • Indexed access ok for local, static, “small” datasets • Need new data access paradigm for large, changing, remote dataset collections • Requests in Coordinate Space • Specify entire subset at once – aka “set at a time” • Allow parallelism, optimization

  16. THREDDS Data Server • Forecast Model Run Collection (2D time) • Create a set of 1D Grid datasets • Place in the TDS Configuration catalog:
      <featureCollection featureType="FMRC" path="fmrc/NCEP/GFS/CONUS_80km">
        <collection spec="/data/NCEP/GFS_CONUS_80km_#yyyyMMdd_HHmm#.grib1"/>
        <update startup="true" rescan="0 5 3 * * ? *" trigger="allow"/>
        <protoDataset choice="Penultimate" change="0 2 3 * * ? *"/>
        <fmrcConfig datasetTypes="Best Files Runs ConstantForecasts"/>
      </featureCollection>

  17. TDS Point Feature Collection • Scheduled for TDS 4.3 • Configuration: • What services to expose? • Not indexed data access • Hook into Point Feature API on client
      <featureCollection featureType="STATION" path="nws/metar/ncdecoded">
        <collection spec="/data/metar/Surface_METAR_#yyyyMMdd_HHmm#.nc$"/>
        <update startup="true" rescan="0 5 3 * * ? *" trigger="allow"/>
        <protoDataset choice="Penultimate" change="0 2 3 * * ? *"/>
      </featureCollection>
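The `#yyyyMMdd_HHmm#` marker in the collection spec tells the server where each file name carries its date. The sketch below shows one way such a marker could be interpreted. It assumes the spec is reduced to its file-name part, contains exactly one `#...#` pair, and that the pattern is valid for `java.time.format.DateTimeFormatter`. The class `DateFromFilename` is hypothetical and is not the TDS implementation.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: reads the date out of a file name using a '#pattern#' template.
class DateFromFilename {
    static LocalDateTime extract(String spec, String filename) {
        int open = spec.indexOf('#');
        int close = spec.indexOf('#', open + 1);
        String pattern = spec.substring(open + 1, close);
        // The date text starts at the same offset as the first '#' in the template
        // and is assumed to be exactly as long as the pattern itself.
        String datePart = filename.substring(open, open + pattern.length());
        return LocalDateTime.parse(datePart, DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        String spec = "Surface_METAR_#yyyyMMdd_HHmm#.nc";
        System.out.println(extract(spec, "Surface_METAR_20081028_1200.nc"));
    }
}
```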

  18. DAPPER TimeSeries
      Dataset {
        Sequence {
          Float32 lat;
          Float32 lon;
          Float32 elev;
          Int32 _id;
          Sequence {
            Float32 visibility;
            Float32 max_wind_gust;
            Float32 dewp;
            Float64 time;
            Float32 slp;
            Float32 temp;
            Float32 wind_speed;
            Float32 max_temp;
            Float32 max_sustained_wind_speed;
            Float32 min_temp;
            Float32 precip;
          } time_series;
        } location;
        ...
      } gsod_time_series;

  19. DAPPER Conventions • Two-level (nested) DAP 2 Sequences • Ragged Arrays with coordinate subsetting • Handles timeSeries and profile FeatureTypes • Requires a unique id for each feature • Requires lat / lon / z / time coordinates • Handle longitude wrapping? (yes) • Data variables only in inner sequence, must be floats • Handling of constraint expressions (CE) on data variables not required • OPeNDAP spec requires it • How does the client know whether a CE is allowed?
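Longitude wrapping matters as soon as a bounding box crosses the point where longitude values jump by 360 degrees. A minimal sketch of a wrap-aware containment test follows. The class `LonBox` is hypothetical and not taken from DAPPER or the CDM, and it assumes `west` and `east` differ by no more than 360 degrees.

```java
// Hypothetical helper: longitude containment that tolerates wrapping.
class LonBox {
    // Reduce any angle to the range [0, 360).
    static double mod360(double degrees) {
        double m = degrees % 360.0;
        return m < 0 ? m + 360.0 : m;
    }

    // True if lon lies in the box that starts at west and extends eastward to east.
    static boolean contains(double west, double east, double lon) {
        double width = east - west;
        if (width < 0) width += 360.0;        // the box crosses the wrap point
        return mod360(lon - west) <= width;   // eastward offset of lon from the west edge
    }

    public static void main(String[] args) {
        System.out.println(contains(170, -170, 175));   // box across the date line
        System.out.println(contains(-105, -100, 258));  // same meridian as -102
        System.out.println(contains(170, -170, 0));
    }
}
```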

  20. DAPPER Conventions - Analysis • Likely easy to hook up to CDM Station/Profile Feature Collection API • Needs to be generalized / clarified • to handle arbitrary datasets • to support the other Point Feature Types • Not sure who would be in charge of the standard? • Result set has fixed layout, which makes streaming hard • Not accessible through the NetCDF API: who are the clients? • DAP 2 can't transport the NetCDF-4 / CDM data model • Shared Dimensions, Groups, enums, longs, chars, etc • DAP 4, where are you? • Doesn't have a general coordinate system mechanism.

  21. NetCDF Subset Service (4.0) • Experiment with REST style web service • Allows subsetting the dataset by: • Lat/lon bounding box • time and vertical coordinate range • list of Variables • Gridded Data • Output is NetCDF-CF file • Variation of WCS (simplified request protocol) • Grid as Point Datasets • Extract vertical profile, time series from one point in model data • Output: NetCDF-CF, XML, CSV • Tried to use for point datasets • NetCDF can't be streamed • Quite slow for large data collections

  22. ncstream (4.1) • NetCDF files (almost always) have to be written, then copied to network • Assumes random access, not stream • “read optimized”: data layout is known • ncstream explores what “streaming netcdf” might look like • “write-optimized”: append only • Efficient conversion to netCDF files on the client • ncstream data model == CDM data model • Binary encoding using Google's Protobuf • Binary object serialization, cross language, transport neutral, extensible • Very fast: some tests show more than 10x the speed of OPeNDAP • Have experimental versions in CDM and TDS since 4.1
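To make “append only” concrete, here is a sketch of a generic length-prefixed record stream. It is not the ncstream wire format and does not use Protobuf; the class `AppendOnlyStream` and its 4-byte length prefix are assumptions chosen for illustration. The writer only ever adds bytes at the end, and the reader consumes records in arrival order without seeking.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical framing: each record is [4-byte length][payload bytes].
class AppendOnlyStream {
    // Writer: append one record; bytes already written are never revisited.
    static void append(OutputStream out, byte[] payload) {
        try {
            DataOutputStream data = new DataOutputStream(out);
            data.writeInt(payload.length);
            data.write(payload);
            data.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reader: pull records in order until the stream ends; no random access.
    // For brevity, a record truncated by end-of-stream is silently dropped.
    static List<byte[]> readAll(InputStream in) {
        List<byte[]> records = new ArrayList<>();
        DataInputStream data = new DataInputStream(in);
        try {
            while (true) {
                byte[] payload = new byte[data.readInt()];
                data.readFully(payload);
                records.add(payload);
            }
        } catch (EOFException endOfStream) {
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        append(wire, "header".getBytes(StandardCharsets.UTF_8));
        append(wire, "data chunk".getBytes(StandardCharsets.UTF_8));
        for (byte[] record : readAll(new ByteArrayInputStream(wire.toByteArray())))
            System.out.println(new String(record, StandardCharsets.UTF_8));
    }
}
```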

  23. CdmRemote web services (4.2) • Follow-on to NetCDF Subset Service • Point Feature datasets • Use ncstream for the over-the-wire (OTW) protocol • In CDM, TDS since version 4.2 • Need to add FeatureCollection configuration

  24. Accessing Point Feature Collections (diagram; labels only) • Application, Java Client: CDM Point Feature API • Application, C Client: CDM Point Feature API • cdmRemote • TDS: CDM Point Feature API, Coordinate Systems, Data Access • Data

  25. Possibility: CdmRemote Server • Lightweight server for CDM datasets • Zero configuration • Local filesystem • Cache expensive objects • Java and C clients • Allow non-Java applications access to CDM stack • Coordinate space queries • Virtual datasets • Feature Types

  26. C library: enable other languages (diagram; labels only) • Application, Python / ? • C Client: CDM Point Feature API • cdmRemote • cdmRemote Server: CDM Point Feature API, Coordinate Systems, Data Access • Data

  27. Summary • Discrete Sampling CF Conventions almost ready • CDM Point Feature API ready for testing • TDS Point Feature Collections almost ready • Using cdmRemote / ncstream • Needs catalog configuration mechanism • Need new APIs: what should they be? • What are the clients? • Unidata is evaluating new APIs in C using ncstream as IPC to Java services in another process • May add Python to our list, as resources permit • Open to other solutions, as resources permit

  28. PS: Can you say SQL? • Jim Gray: “Scientific Data Management in the Coming Decade” • Michael Stonebraker: SciDB (http://scidb.org/) • Array-oriented data model that extends relational tables • Usable release in Jan 2011 • Evaluating participation in this effort • New Data Access APIs: • Requests in Coordinate Space • Specify entire subset at once, aka “set at a time” • Allow parallelism, optimization

  29. THREDDS/CDM Developers Conference • Invitational • Show related work • Launch Open Source project • Steering Committee • Broader than Unidata • Fall 2011 (?) • FOSS4G conference in Denver (Sep 12) • OGC Technical Committee / Boulder (Sep 19)
