
GRAD 521, Research Data Management Winter 2014 – Lecture 3 Amanda L. Whitmire, Asst. Professor






Presentation Transcript


  1. Types, Formats & Stages of Data GRAD 521, Research Data Management Winter 2014 – Lecture 3 Amanda L. Whitmire, Asst. Professor

  2. Lesson Three Outline
     • Data types & formats
     • Research lifecycle & stages of data

  3. Data management in the lifecycle

  4. Data Types: Observational
     • Captured in situ
     • Can’t be recaptured, recreated or replaced
     • Examples: sensor readings, sensory (human) observations, survey results

  5. Data Types: Experimental
     • Data collected under controlled conditions, in situ or laboratory-based
     • Should be reproducible, but can be expensive
     • Examples: gene sequences, chromatograms, spectroscopy, microscopy, cell counts

  6. Data Types: Derived or Compiled
     • Compiled: integration of data from several sources (could be many types)
     • Derived: new variables created from existing data
     • Reproducible, but can be very expensive
     • Examples: text and data mining, compiled database, 3D models

  7. Data Types: Simulation
     • Results from using a model to study the behavior and performance of an actual or theoretical system
     • Models and metadata, where the input can be more important than output data
     • Examples: climate models, economic models, biogeochemical models

  8. Data Types: Reference/Canonical
     • Static or organic collections of [peer-reviewed] datasets, most probably published and/or curated
     • Examples: gene sequence databanks, chemical structures, census data, spatial data portals

  9. Amanda’s dissertation: The spectral backscattering properties of marine particles
     • Reference: global oceanic bio-optical observations [NASA]
     • Observations: ship-based sampling & moored instruments
     • Compiled observations: global oceanic bio-optical observations [self + from peers]
     • Simulation results: scattering & absorption of light
     • Derived variables: endless things
     • Experimental: optical properties of phytoplankton cultures

  10. Data types: another classification
     • Qualitative data “is a categorical measurement expressed not in terms of numbers, but rather by means of a natural language description. In statistics, it is often used interchangeably with "categorical" data.” See also: nominal, ordinal
     • Quantitative data “is a numerical measurement expressed not by means of a natural language description, but rather in terms of numbers. However, not all numbers are continuous and measurable.”
     • Example: “My favorite color is blue-green.” vs. “My favorite color is 510 nm.”
     Source: Wikibooks: Statistics

  11. More common data types
     • Geospatial data: has a geographical or geospatial aspect. Spatial location is critically tied to other variables.
     • Digital image, audio & video data
     • Documentation & scripts: sometimes, software code IS data; likewise with documentation (laboratory notebooks, written observations, etc.)

  12. File Formats
     “A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium.” - Wikipedia
     Data type: Qualitative, tabular experimental data
     Possible formats:
     • Excel spreadsheet (.xlsx)
     • Comma-delimited text (.csv)
     • Access database (.mdb/.accdb)
     • Google Spreadsheet
     • SPSS portable file (.por)
     • XML file

  13. Preferable Format Types for Long-Term Access to Data
     Data formats that offer the best chance for long-term access are both:
     • Non-proprietary (also known as open), and
     • Unencrypted & uncompressed

  14. Preferred Formats
     • Moving Images: MOV, MP4
     • Audio: Free Lossless Audio Codec (.flac), WAVE, MP3
     • Quantitative: ASCII, SAS, SPSS
     • Geospatial: ESRI Shapefile (essential: .shp, .shx, .dbf; optional: .prj, .sbx, .sbn), geo-referenced TIFF (.tif, .tfw), CAD data (.dwg), tabular GIS attribute data
     • Digital Images: TIFF v.6, JPEG 2000
     • Text: PDF/A, ASCII
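As a rough illustration of how the preferred-formats list could be applied to a folder of files, the sketch below looks up a file's extension in a small table. The table covers only part of the slide's list, and the extension spellings (.wav for WAVE, .jp2 for JPEG 2000, .txt for ASCII text) and the function name `preferred_category` are assumptions made for this example. PDF/A is left out because it cannot be told apart from ordinary PDF by extension alone.

```python
from pathlib import Path

# Extension -> category, transcribed from part of the slide's list.
# The extension spellings here are assumptions, not part of the lecture.
PREFERRED = {
    ".mov": "moving images",
    ".mp4": "moving images",
    ".flac": "audio",
    ".wav": "audio",
    ".mp3": "audio",
    ".tif": "digital images",
    ".tiff": "digital images",
    ".jp2": "digital images",
    ".txt": "text",
}


def preferred_category(filename):
    """Return the category for a listed extension, or None if not listed."""
    return PREFERRED.get(Path(filename).suffix.lower())


# Example lookups: a TIFF image is listed, a Word document is not.
tiff_result = preferred_category("cast01.TIF")
docx_result = preferred_category("notes.docx")
```

A result of None only means the extension is not in this small table; it does not by itself mean the format is unsuitable for archiving.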

  15. What types & formats of data will you be generating and/or using?
     Reflective Writing: 60 seconds
     • Observational | Experimental | Derived | Compiled | Simulation | Reference/Canonical
     • Qualitative | Quantitative | Geospatial
     • Image/audio/video | Scripts/codes

  16. Back to the research lifecycle

  17. Stage 1: Project conception & planning
     Project ideas are developed and fleshed out.
     Actions
     • Review existing datasets.
     • Determine where you will archive the data (your funder may have a required location), and consult with the archive about metadata and data formatting requirements.
     • Plan consent for data sharing, if needed.
     • Determine costs related to data storage & archiving.
     • Data management plan (DMP) is created.

  18. Stage 2: Project startup
     Actions
     • Finalize plans for data documentation & storage procedures.
     • Revise DMP if necessary.
     • Communicate data responsibilities to project members.
     • Establish, document and communicate protocols and methods.

  19. Stage 3: Data lifecycle*
     Each lifecycle stage has its own set of data management actions.
     *Research is an iterative process. As a result, all of these stages may occur simultaneously.

  20. Stage 3: Data lifecycle
     Lifecycle stage: Data Collection
     Data stage: Raw
     Actions
     • Organize files; use consistent, thoughtful file-naming conventions;
     • Carry out regular backups (rule: LOCKSS, "Lots of Copies Keep Stuff Safe");
     • Consider access control & security.
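One way to keep file names consistent is to generate them from the same few fields every time. The sketch below is only an illustration: the pattern (project, description, collection date, version) and the function name `build_filename` are assumptions for this example, not a convention prescribed in the lecture.

```python
import re
from datetime import date


def slug(text):
    """Lower-case the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project, description, collected, version, ext):
    """Compose a name such as 'project_description_YYYYMMDD_v01.ext'."""
    extension = ext.lstrip(".").lower()
    return (
        f"{slug(project)}_{slug(description)}_"
        f"{collected:%Y%m%d}_v{version:02d}.{extension}"
    )


# Hypothetical example values.
example_name = build_filename("GRAD 521", "CTD cast", date(2014, 1, 15), 2, ".CSV")
```

Putting the date in year-month-day order and zero-padding the version number means that an alphabetical sort of the folder is also a chronological one.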

  21. Stage 3: Data lifecycle
     Lifecycle stage: QA/QC
     Data stage: Raw, Intermediate
     Actions
     • Assign clear roles for QA/QC;
     • Check instrument precision, bias, and scale; replicates where possible;
     • Use controlled vocabulary on log/data sheets;
     • Check for out-of-range values, empty cells, etc.
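The last check in the list above (out-of-range values and empty cells) is easy to automate for tabular data. The following is a minimal sketch using only the Python standard library; the column name `temp_c`, the sample values, and the accepted range are invented for illustration.

```python
import csv
import io


def qc_check(rows, column, low, high):
    """Return (row_number, problem) pairs for empty, non-numeric or out-of-range cells."""
    problems = []
    for number, row in enumerate(rows, start=1):
        raw = (row.get(column) or "").strip()
        if raw == "":
            problems.append((number, "empty"))
            continue
        try:
            value = float(raw)
        except ValueError:
            problems.append((number, "not numeric"))
            continue
        if not (low <= value <= high):
            problems.append((number, "out of range"))
    return problems


# Made-up sample data standing in for a real data sheet.
sample = io.StringIO("station,temp_c\nA,12.5\nB,\nC,99.0\nD,n/a\n")
rows = list(csv.DictReader(sample))
issues = qc_check(rows, "temp_c", low=-2.0, high=35.0)
```

Row numbers count data rows (the header is not counted), so the flagged entries can be traced back to the original log or data sheet.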

  22. Stage 3: Data lifecycle
     Lifecycle stage: Describe & Document
     Data stage: Intermediate
     Actions
     • Document the context of data collection;
     • Data-level documentation;
     • Document protocols, conditions, etc.;
     • Document how figures are produced.

  23. Stage 3: Data lifecycle
     Lifecycle stage: Processing & Analysis
     Data stage: Intermediate
     Actions
     • Document and manage file versions;
     • Document file manipulations;
     • Document analysis procedures;
     • Write software code with future sharing in mind;
     • You should still be doing regular backups.
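One lightweight way to document file versions and manipulations is to record a checksum and a short note each time a file changes. The sketch below assumes a simple in-memory list as the log; the function names `sha256_of` and `log_version` and the entry fields are hypothetical, and a real project might write the same entries to a text file kept alongside the data.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_version(path, note, log):
    """Append a record of the file's name, checksum and a note to the log."""
    entry = {"file": Path(path).name, "sha256": sha256_of(path), "note": note}
    log.append(entry)
    return entry


# Demonstration with a throwaway file in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    data_file = Path(folder) / "counts_v01.csv"
    data_file.write_bytes(b"sample,count\nA,10\n")
    log = []
    entry = log_version(data_file, "raw export from instrument", log)
```

If two versions of a file have the same checksum, their contents are identical, which makes it straightforward to confirm that a backup or shared copy matches the version that was analyzed.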

  24. Stage 3: Data lifecycle
     Lifecycle stage: Data Sharing
     Data stage: Final
     Actions
     • Determine appropriate file formats;
     • Finalize data documentation;
     • Decide which data need to/should be/can be shared;
     • Share data within project team for analysis and interpretation.

  25. Back to the research lifecycle

  26. Stage 4: End of Project
     Project is wrapping up; manuscripts in prep.
     Actions
     • Write paper(s);
     • Contact archive for up-to-date requirements; deposit data in archive or repository;
     • Establish links between dataset(s) and paper(s) via citation and the use of identifiers (DOI, ORCID, etc.);
     • Perform data management audit against DMP.

  27. Stages 5 & 6: Data Archived & Shared Data are available for discovery and reuse, by you & others

  28. Questions?

  29. Buzz Groups: 15 minutes
     1. Draw research lifecycle
     2. Match data management actions with research lifecycle stages
     3. Report back
     EXAMPLE ACTIONS: plan | perform backups | determine appropriate file formats | contact data archive | assign responsibilities | use thoughtful file-naming conventions | document analysis procedures | determine costs

  30. For Thursday…
     • Think about, or investigate, common data collection methods in your area of research, and how paper or electronic notebooks are used.
     • Read the mini-case study and answer the included questions: Mini-Case: Identifying Data Types and Stages of Data
