
Visualizing MiMarks Compliant Datasets: Insights from Bacterial Diversity Studies

This internship project at Tetherless World Constellation focuses on visualizing datasets compliant with Minimum Information about a Marker Gene Sequence (MiMarks). It analyzes two datasets stemming from a bacterial diversity study in the Western English Channel, revealing seasonal microbial community structures. Using tools such as R and D3.js, advanced visualizations like scatter plots and bubble charts provide insights into correlations between environmental variables like nitrate, phosphate, temperature, and organism counts. The project emphasizes the importance of data cleaning and the conversion of datasets to RDF for enhanced analysis.


Presentation Transcript


  1. Data Science and Visualization 2014 Summer Internship - Tetherless World Constellation • Sumithra Gnanasekar • Lakshmi Chenicheri

  2. Objective • Visualize Minimum Information about a Marker Gene Sequence (MiMarks) compliant datasets • A dark data exercise

  3. MiMarks • A standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences • Describes the environment from which the sample was taken • Ensures contextual data is collected and submitted

  4. MiMarks Checklist

  5. Datasets • Two datasets from a bacterial diversity study in the Western English Channel • Focused on the seasonal structure of microbial communities • Dataset 1 was converted from Excel to CSV • Dataset 2 was converted from SRA to CSV • Data cleaning was undertaken to retrieve relevant fields
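The slides do not describe the cleaning step in detail. A minimal sketch of one plausible version, keeping only selected columns of a CSV export and dropping rows with blanks in them, might look like the following. The column names and sample rows are hypothetical, not taken from the study's files.

```python
import csv
import io

# Hypothetical column names; the real dataset headers are not given in the slides.
RELEVANT_FIELDS = ["sample_id", "collection_date", "nitrate", "phosphate"]


def clean_rows(csv_text, fields=RELEVANT_FIELDS):
    """Keep only the chosen fields and drop rows where any of them is blank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = []
    for row in reader:
        # Missing or empty cells become "" so they can be detected uniformly.
        kept = {name: (row.get(name) or "").strip() for name in fields}
        if all(kept.values()):
            cleaned.append(kept)
    return cleaned


# Made-up rows for illustration only.
SAMPLE = """sample_id,collection_date,nitrate,phosphate,notes
S1,2007-01-15,7.2,0.45,ok
S2,2007-02-12,,0.40,missing nitrate
S3,2007-03-10,5.1,0.31,
"""

rows = clean_rows(SAMPLE)
for row in rows:
    print(row)
```

Row S2 is dropped because its nitrate cell is empty, while S3 is kept because the blank `notes` column is not among the retained fields.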

  6. Tools for Visualization • R • Google charts integrated with R • Shiny R Studio • D3.js. D3.js was ultimately chosen for its flexibility and the range of visualizations available

  7. Scatter Plot Dataset 1 • Allows the user to filter fields • Drill and expand • Group based on fields • Handy in determining correlations between variables

  8. Analysis of Scatter Plot Dataset 1 • Depth, density, total_Depth of water column, longitude and latitude were found to be independent of the other environmental variables • Near-linear correlation between nitrate and silicate, and between nitrate and phosphate
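The slides read these relationships off the scatter plots. One way to put a number on "near-linear" is the Pearson correlation coefficient, sketched below. The function is a generic textbook implementation, and the values are made up for illustration, not measurements from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values, chosen so the two series are exactly proportional.
nitrate = [1.0, 2.0, 3.0, 4.0]
phosphate = [0.1, 0.2, 0.3, 0.4]

r = pearson(nitrate, phosphate)
print(round(r, 6))
```

A value of `r` close to 1 or -1 indicates a near-linear relationship; a value near 0 is consistent with no linear relationship, although it does not by itself establish independence.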

  9. Scatter Plot Dataset 2 • Allows the user to filter fields • Drill and expand

  10. Analysis of Scatter Plot Dataset 2 A linear trend is seen in the scatter plots of: • Spots vs Bases • Nitrate vs Phosphate • Org_nitro vs Org_carb • Temperature vs Density

  11. Temporal Visualization Allows one to filter values by time and analyze the effect on other variables
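The filtering itself happens interactively in the browser in the original visualization. As a minimal sketch of the underlying idea, the code below selects the records whose date falls inside a window so that other variables can be examined for that period. The field name `collection_date` and the records are hypothetical.

```python
from datetime import date


def filter_by_date(records, start, end, date_field="collection_date"):
    """Return the records whose ISO-formatted date lies in [start, end]."""
    return [
        record
        for record in records
        if start <= date.fromisoformat(record[date_field]) <= end
    ]


# Made-up records for illustration only.
records = [
    {"collection_date": "2007-01-15", "temperature": 9.8},
    {"collection_date": "2007-02-12", "temperature": 9.1},
    {"collection_date": "2007-03-10", "temperature": 9.4},
]

window = filter_by_date(records, date(2007, 2, 1), date(2007, 3, 31))
print([record["temperature"] for record in window])
```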

  12. DOI Visualization • Visually represents DOIs associated with data points • On clicking a bubble, the metadata for that DOI is fetched and displayed
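The slides do not say how the metadata is fetched. One possible approach, shown here only as an assumption, is DOI content negotiation: requesting the DOI's resolver URL with an `Accept` header that asks for CSL JSON metadata. The sketch builds the request without sending it; the helper name and the DOI are placeholders.

```python
import urllib.request
from urllib.parse import quote

# Media type commonly used for citation metadata in DOI content negotiation.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_doi_metadata_request(doi):
    """Build (but do not send) an HTTP request asking for a DOI's metadata."""
    url = "https://doi.org/" + quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": CSL_JSON})


# Placeholder DOI, not one of the DOIs shown in the visualization.
request = build_doi_metadata_request("10.1000/example")
print(request.full_url)

# Sending the request would look like this (requires network access):
#   import json
#   with urllib.request.urlopen(request) as response:
#       metadata = json.load(response)
```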

  13. Bubble Chart • Visually represents the environment data associated with each sample • Bubble size corresponds to organism count
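The slides do not state how counts map to bubble size. A common convention, assumed here, is to make the bubble's area rather than its radius proportional to the value, which means scaling the radius by the square root of the count. The function name and the default maximum radius are illustrative.

```python
import math


def bubble_radius(count, max_count, max_radius=40.0):
    """Radius such that bubble area is proportional to count.

    The largest count maps to max_radius; a count of zero maps to zero.
    """
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    if count < 0:
        raise ValueError("count must be non-negative")
    return max_radius * math.sqrt(count / max_count)


# Made-up organism counts for illustration only.
counts = [0, 25, 100]
radii = [bubble_radius(c, max(counts)) for c in counts]
print(radii)
```

Scaling the radius linearly instead would make a sample with four times the count look sixteen times larger by area, which tends to exaggerate differences.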

  14. RDF Conversion The RDF conversion for MiMarks-compliant datasets involves two steps: • Construct an ontology or use an existing one • Convert the dataset into triple instances using CSV-to-RDF conversion tools. csv2rdf4lod is an open source tool that can be used to easily convert the data in a CSV file into RDF-encoded data
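To illustrate the second step, the sketch below turns each CSV row into N-Triples lines, with one subject per sample and one predicate per column. It is not how csv2rdf4lod works internally. The `http://example.org/` namespace, the `sample_id` column, and the choice of plain string literals are all assumptions made for the example; a real conversion would take its predicates from the chosen ontology.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace; a real conversion would use the ontology's IRIs.
BASE = "http://example.org/mimarks/"


def escape_literal(value):
    """Escape characters that are not allowed raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_triples(row, id_field="sample_id"):
    """One triple per non-empty cell, with the sample as subject."""
    subject = "<" + BASE + "sample/" + quote(row[id_field], safe="") + ">"
    triples = []
    for field, value in row.items():
        if field == id_field or not value:
            continue
        predicate = "<" + BASE + "vocab/" + quote(field, safe="") + ">"
        triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


def csv_to_ntriples(csv_text):
    """Convert CSV text with a header row into N-Triples text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        lines.extend(row_to_triples(row))
    return "\n".join(lines)


# Made-up row for illustration only; the empty phosphate cell is skipped.
ntriples = csv_to_ntriples("sample_id,nitrate,phosphate\nS1,7.2,\n")
print(ntriples)
```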

  15. Spatio-temporal features of MiMarks, VAMPS and CoDL datasets Some tools and visualizations that could be used for the MiMarks, VAMPS and CoDL datasets: • Planetary.js, an open source tool, would be effective in representing spatial features interactively • Motion charts can show change over a period of time, with the changing quantity represented as the size of the bubble • A calendar-based representation of values is another option if the data is continuous

  16. Links to Visualizations • Timeline crossfiltering visualization: http://dco.tw.rpi.edu/viz/timeline/index.html • DOI visualization: http://dco.tw.rpi.edu/viz/doiVis/index.html • Scatterplot visualization for Dataset 1: http://dco.tw.rpi.edu/viz/scatterPlot/demo/demo.html • Bubble chart visualization: http://dco.tw.rpi.edu/viz/Bubblechart/bubble_dataset2/index.html • Scatterplot visualization for Dataset 2: http://dco.tw.rpi.edu/viz/scatterplot_dataset2/demo/demo.html
