
LPHIG Bioinformatics of SFS Genomics Center Program Projects

LPHIG Bioinformatics of SFS Genomics Center Program Projects. Project leader: Chun-Yuan Huang (1). Members: Charles Joseph Murphy (1), Aurash Mohaimani (2). PIs: Peter J. Tonellato (1,2,3), Rebecca Klaper (4). (1) Zilber School of Public Health, University of Wisconsin at Milwaukee, Milwaukee, WI


Presentation Transcript


  1. LPHIG Bioinformatics of SFS Genomics Center Program Projects Project leader: Chun-Yuan Huang (1) Members: Charles Joseph Murphy (1), Aurash Mohaimani (2) PIs: Peter J. Tonellato (1,2,3), Rebecca Klaper (4) Affiliations: (1) Zilber School of Public Health, University of Wisconsin at Milwaukee, Milwaukee, WI (2) Medical Informatics Program, University of Wisconsin at Milwaukee, Milwaukee, WI (3) Center for Biomedical Informatics, Harvard Medical School, Boston, MA (4) Great Lakes Genomics Center, School of Freshwater Sciences, University of Wisconsin, Milwaukee, WI

  2. SFS Genomics Center Program Projects • Project 1: Biomarkers of Reproduction Staging of Sturgeon (Acipenser fulvescens) • Project 2: Daphnia magna Gene Expression under Nanomaterials Exposure

  3. Background for conservation of sturgeon population • Sturgeon appeared in the fossil record 200 million years ago and have undergone remarkably little morphological change, indicating that their evolution has been exceptionally slow and earning them informal status as living fossils. • Sturgeon have lately become prized for their meat, eggs (caviar), and oil. • Sturgeon are exceptionally vulnerable to overfishing, and restoration of their populations is complicated by their slow reproductive cycle. Sturgeon exhibit delayed sexual maturity (between 10 and 30 years of age), infrequent spawning (every few years), and sexual monomorphism.

  4. Background for sturgeon sex determination • Searches for sex-specific markers using DNA-based techniques such as RAPD and AFLP have failed [1]. • Two candidate sturgeon sex-determining genes, dmrt1 (human homolog: doublesex and mab-3 related transcription factor 1) and tra-1, were identified using next-generation 454 sequencing and de novo assembly of gonad transcriptomes [2]. • Sturgeon undergoing male differentiation express high levels (by qPCR analysis) of Sertoli cell factors (dmrt1, sox9) and of genes involved in the production and receptivity of androgens (cyp17a1, star, and ar), together with lh [3]. [1] Saeed Keyvanshokooh and Ahmad Gharaei. A review of sex determination and searches for sex-specific markers in sturgeon. Aquaculture Research. 2010 Aug;41(9):e1-e7. [2] Hale MC, Jackson JR, Dewoody JA. Discovery and evaluation of candidate sex-determining genes and xenobiotics in the gonads of lake sturgeon (Acipenser fulvescens). Genetica. 2010 Jul;138(7):745-56. [3] Berbejillo J, et al. Expression and phylogeny of candidate genes for sex differentiation in a primitive fish species, the Siberian sturgeon, Acipenser baerii. Mol Reprod Dev. 2012 Aug;79(8):504-16.

  5. Background for sturgeon reproduction stage • Determination of reproduction stages would help to detect gonadal maturity for sturgeon reproduction and population conservation. • Reproduction stages are defined by DNR [1] as follows: • Stage 1 • Stage 2 • Stage 3 • Stage 4 [1] N.A.

  6. Biomarkers of Reproduction Staging of Sturgeon Primary goals: • Use RNA-seq to compare multiple sexual stages and determine genes uniquely expressed in each stage, which could potentially be used as biomarkers for reproduction stage determination. • Use the RNA-seq annotated gene information above to identify proteins in our proteomics data obtained from sturgeon blood at various stages. • Use the RNA-seq annotated gene information above to examine evolutionary questions about how the sturgeon genome relates to those of other, more recently evolved fish species.

  7. Sturgeon • Gene expression biomarkers for sexual stages (RNA-Seq) • Annotate proteomics data from blood samples • Phylogenetic analysis

  8. Preliminary Considerations • No reference genome is available for lake sturgeon. • Ongoing and finished fish genomes: • Pufferfish (Tetraodon nigroviridis) • Fugu (Japanese pufferfish) • Stickleback (Gasterosteus aculeatus) • Coelacanth (Indonesian), Coelacanth (South African) • Tilapia (family Cichlidae) Genome Project, which includes Nile tilapia (Oreochromis niloticus), Astatotilapia burtoni, Pundamilia nyererei, Malawi zebra, Neolamprologus brichardi • Zebrafish • Salmon • Catfish • Medaka ricefish • Lamprey (Lampetra fluviatilis) • Dogfish (Scyliorhinus canicula) • Southern platyfish (Xiphophorus maculatus) • Poeciliid fish (Xiphophorus maculatus) • Spotted gar (Lepisosteus oculatus)

  9. Preliminary Considerations (cont.) • Sturgeon belong to one of the oldest families of bony fish (ray-finned fish, class Actinopterygii) in existence and are quite distant from the other fishes that have been sequenced. • Based on Near et al. [1], sturgeon appear closer to gar than to the other fishes. • A spotted gar transcriptome (by RNA-Seq) was constructed by Amores et al. [2] and is available from DDBJ [3], while its draft genome assembly is available from the Broad Institute [4]. • Zebrafish [1] Near et al. Resolution of ray-finned fish phylogeny and timing of diversification. Proc Natl Acad Sci U S A. 2012 Aug 21;109(34):13698-703. [2] Amores et al. Genome evolution and meiotic maps by massively parallel DNA sequencing: spotted gar, an outgroup for the teleost genome duplication. Genetics. 2011 Aug;188(4):799-808. [3] https://trace.ddbj.nig.ac.jp/DRASearch/submission?acc=SRA026509 [4] ftp://ftp.broadinstitute.org/pub/assemblies/fish/spottedGar/

  10. Slide adapted from “Leveraging Trinity for de novo transcriptome assembly and analysis”, 2012 CSHL workshop, Brian Haas, Broad Institute

  11. Overview of the de novo transcriptome assembly strategy Martin JA, Wang Z. Next-generation transcriptome assembly. Nat Rev Genet. 2011 Sep 7;12(10):671-82.

  12. Survey on de novo transcriptome assembly methods Martin JA, Wang Z. Next-generation transcriptome assembly. Nat Rev Genet. 2011 Sep 7;12(10):671-82.

  13. Sample Description • Sturgeon liver samples from three biological replicates of female reproduction stages 1 and 2 and male reproduction stages 1 and 2. • 100-bp paired-end reads from Illumina HiSeq 2500. • 915,602,572 total reads (457,801,286 read pairs). • The Purdue Genomics Center performed the de novo transcriptome assembly using the Trinity method. • The reads were then mapped to the contigs via Bowtie and sorted by coordinate, resulting in BAM-format files for each sample. • For each sample, the mapped reads were also counted per contig and reported along with the contig's length (bp), the homolog search result of each contig blasted against NCBI, and the homolog's GO terms and GenBank IDs. http://www.genomics.purdue.edu/%7Ecore/projects/Klaper/

  14. Sample Description Note: F1L: female stage 1, F2: female stage 2, M1: male stage 1, M2: male stage 2. Samples are all prepared from sturgeon liver (L).

  15. Trinity de novo reconstruction of transcriptomes: overview (Purdue Genomics Center)

  16. Example of Data FastQC • Data are of high quality. Unfiltered reads:

  17. Slide adapted from “Leveraging Trinity for de novo transcriptome assembly and analysis”, 2012 CSHL workshop, Brian Haas, Broad Institute

  18. Analysis Plan A (the fast-track version) • Use the contigs, counts, and homologs (BLAST results for the contigs) available from the Purdue Genomics Center • Differential expression and biomarker discovery • Estimate contig abundance using Bowtie (done at Purdue) • Statistical analysis for significantly differentially expressed (DE) contigs using edgeR: • Identify DE contigs/biomarkers between reproduction stages (F1 vs. F2; M1 vs. M2) • Identify DE contigs/biomarkers between sexes (F1 vs. M1; F2 vs. M2) • Functional annotation by Trinotate (a module in the Trinity package). • Pathway analysis by Ingenuity Pathway Analysis (IPA). • GO term enrichment analysis by the Database for Annotation, Visualization and Integrated Discovery (DAVID) • Gene Set Enrichment Analysis (GSEA) • Identify proteins in the proteomics data obtained from sturgeon blood at various stages. • Comparative genomic study: evolutionary aspects of the sturgeon genome as it relates to other fish species

  19. Sturgeon F1 vs. F2 DE contigs (isoforms) analysis by bowtie-edgeR Red: FDR<0.05, total 309 contigs
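
The FDR < 0.05 cutoff used on this and the following slides can be illustrated with the Benjamini-Hochberg adjustment, the standard FDR procedure. The sketch below is a generic Python illustration, not the edgeR code used for the slides; the function name and the five p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five contigs (not taken from the sturgeon data).
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
fdr = benjamini_hochberg(p_values)
# Contigs that would be drawn in red under the FDR < 0.05 rule.
significant = [i for i, q in enumerate(fdr) if q < 0.05]
```

Note that a raw p-value below 0.05 (here 0.039 and 0.041) does not guarantee an FDR below 0.05, which is why the number of red contigs can be much smaller than the number of nominally significant ones.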

  20. Sturgeon M1 vs. M2 DE contigs (isoforms) analysis by bowtie-edgeR Red: FDR<0.05, total 229 contigs

  21. Sturgeon F1 vs. M1 DE contigs (isoforms) analysis by bowtie-edgeR Red: FDR<0.05, total 616 contigs

  22. Sturgeon F2 vs. M2 DE contigs (isoforms) analysis by bowtie-edgeR Red: FDR<0.05, total 1543 contigs

  23. Analysis Plan A (the fast-track version) • Use the contigs, counts, and homologs (BLAST results for the contigs) available from the Purdue Genomics Center • Differential expression and biomarker discovery • Estimate contig abundance using RSEM • Statistical analysis for significantly differentially expressed (DE) contigs using edgeR: • Identify DE contigs/biomarkers between reproduction stages (F1 vs. F2; M1 vs. M2) • Identify DE contigs/biomarkers between sexes (F1 vs. M1; F2 vs. M2) • Functional annotation by Trinotate (a module in the Trinity package). • Pathway analysis by Ingenuity Pathway Analysis (IPA). • GO term enrichment analysis by the Database for Annotation, Visualization and Integrated Discovery (DAVID) • Gene Set Enrichment Analysis (GSEA) • Identify proteins in the proteomics data obtained from sturgeon blood at various stages. • Comparative genomic study: evolutionary aspects of the sturgeon genome as it relates to other fish species

  24. RNA-Seq by Expectation Maximization (RSEM) • RSEM: Bowtie is used internally in the mapping step; during the EM step, each read is counted only once in total, with its count apportioned among the isoforms it maps to according to the EM model. • Bowtie alone: one read can map to several isoforms and be counted several times.
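
The contrast between the two counting schemes can be sketched with a toy example. This is a deliberately simplified illustration, not RSEM itself: it ignores isoform length, read quality, and everything else RSEM's model includes, and the isoform names and read sets are hypothetical. Here each read contributes one count in total, split among its compatible isoforms in proportion to the current abundance estimates.

```python
def naive_counts(reads, isoforms):
    """Count a read once for every isoform it maps to (multi-counting)."""
    counts = {iso: 0.0 for iso in isoforms}
    for compatible in reads:
        for iso in compatible:
            counts[iso] += 1.0
    return counts


def em_counts(reads, isoforms, iterations=100):
    """Toy EM: every read contributes a total of one count, shared among
    its compatible isoforms in proportion to the estimated abundances."""
    theta = {iso: 1.0 / len(isoforms) for iso in isoforms}
    counts = {iso: 0.0 for iso in isoforms}
    for _ in range(iterations):
        # E-step: split each read fractionally among its compatible isoforms.
        counts = {iso: 0.0 for iso in isoforms}
        for compatible in reads:
            total = sum(theta[iso] for iso in compatible)
            for iso in compatible:
                counts[iso] += theta[iso] / total
        # M-step: abundances become the normalized expected counts.
        theta = {iso: counts[iso] / len(reads) for iso in isoforms}
    return counts


# Hypothetical data: 6 reads unique to A, 2 unique to B, 4 mapping to both.
isoforms = ["A", "B"]
reads = [("A",)] * 6 + [("B",)] * 2 + [("A", "B")] * 4
naive = naive_counts(reads, isoforms)
em = em_counts(reads, isoforms)
```

With 12 reads, the naive scheme reports 16 counts in total because the four ambiguous reads are counted twice, while the EM totals still sum to 12. Inflated counts of this kind are one plausible reason multi-counting calls more isoforms differentially expressed.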

  25. A case of bowtie-edgeR on Sturgeon DE isoforms: comp221487_c2

  26. A case of RSEM-edgeR on Sturgeon DE isoforms: comp221487_c2

  27. Summary of RSEM-edgeR vs. bowtie-edgeR With Bowtie alone, one read can map to several isoforms and be counted several times. In RSEM, reads are first mapped internally with Bowtie; the EM model then apportions each read among its candidate isoforms so that it is counted only once in total. In the case of sturgeon DE isoforms of comp221487_c2, six isoforms are called differentially expressed using bowtie-edgeR, whereas only one isoform is called differentially expressed using RSEM-edgeR.

  28. Trinity transcript naming convention [1] Trinity transcripts are named like compX_cY_seqZ. X identifies the de Bruijn graph component generated by Chrysalis (from clustering Inchworm contigs). Butterfly teases subgraphs apart from each other within a single component, based on the read support data, which gives rise to subgraphs (cY) for each connected component. Each subgraph then gives rise to path sequences (seqZ). Trinity does not reason in terms of genes, loci, and alternative splicing events. It solves a de Bruijn graph problem, though of course the heuristics are tuned to the needs of biology. So while it is highly likely that all the isoforms of a given gene belong to the same subcomponent, you shouldn't assume that a subcomponent is a gene. [1] Trinity: Frequently Asked Questions and Topics http://trinityrnaseq.sourceforge.net/trinity_faq.html
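
The compX_cY_seqZ convention can be taken apart with a short parser. This is a minimal sketch; the class and function names are hypothetical, and it assumes names follow the pattern exactly as described on this slide.

```python
import re
from typing import NamedTuple

# Pattern for names such as "comp221487_c2_seq3".
TRINITY_NAME = re.compile(r"^comp(\d+)_c(\d+)_seq(\d+)$")


class TrinityId(NamedTuple):
    component: int  # X: Chrysalis de Bruijn graph component
    subgraph: int   # Y: Butterfly subgraph within the component
    sequence: int   # Z: path sequence (isoform) from the subgraph

    @property
    def subcomponent(self) -> str:
        """The compX_cY prefix shared by all isoforms of one subgraph."""
        return f"comp{self.component}_c{self.subgraph}"


def parse_trinity_name(name: str) -> TrinityId:
    """Split a Trinity transcript name into its three numeric parts."""
    match = TRINITY_NAME.match(name)
    if match is None:
        raise ValueError(f"not a compX_cY_seqZ name: {name!r}")
    return TrinityId(*(int(group) for group in match.groups()))


parsed = parse_trinity_name("comp221487_c2_seq3")
```

For the contig discussed in this deck, `parsed.subcomponent` is `comp221487_c2`, the label under which its isoforms are grouped.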

  29. Sturgeon F1 vs. F2 DE contigs (isoforms) analysis by RSEM-edgeR Red: FDR<0.05, total 27 contigs

  30. Sturgeon F1 vs. F2: 27 DE contigs (isoforms) sorted by logFoldChange (total 27 contigs with FDR < 0.05)

  31. Sturgeon M1 vs. M2 DE contigs (isoforms) analysis by RSEM-edgeR Red: FDR<0.05, total 4 contigs

  32. Sturgeon M1 vs. M2: 4 DE contigs (isoforms) (total 4 contigs with FDR < 0.05)

  33. Sturgeon F1 vs. M1 DE contigs (isoforms) analysis by RSEM-edgeR Red: FDR<0.05, total 0 contigs

  34. Sturgeon F2 vs. M2 DE contigs (isoforms) analysis by RSEM-edgeR Red: FDR<0.05, total 70 contigs

  35. Sturgeon F2 vs. M2: 70 DE contigs (isoforms) sorted by logFoldChange (total 70 contigs with FDR < 0.05)

  36. Sturgeon F2 vs. M2: 70 DE contigs (isoforms) sorted by logFoldChange (total 70 contigs with FDR < 0.05) (continued)

  37. Sturgeon F1 vs. F2 DE components (genes) analysis by RSEM-edgeR Red: FDR<0.05, total 5 components
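
The slides do not say how component-level ("gene") counts were derived from isoform-level ones. One simple possibility, shown here purely as an assumption, is to sum the counts of all isoforms that share a compX_cY prefix. The function names and the count values below are hypothetical.

```python
from collections import defaultdict


def component_of(isoform_name):
    """Drop the _seqZ suffix: 'comp221487_c2_seq3' -> 'comp221487_c2'."""
    return isoform_name.rsplit("_seq", 1)[0]


def sum_by_component(isoform_counts):
    """Aggregate isoform-level counts into component-level counts."""
    totals = defaultdict(float)
    for name, count in isoform_counts.items():
        totals[component_of(name)] += count
    return dict(totals)


# Hypothetical expected counts (not taken from the sturgeon data).
isoform_counts = {
    "comp221487_c2_seq1": 12.5,
    "comp221487_c2_seq3": 30.0,
    "comp10_c0_seq1": 7.0,
}
component_counts = sum_by_component(isoform_counts)
```

Because a Trinity subcomponent is not guaranteed to be a gene, totals built this way are best read as component-level, not strictly gene-level, expression.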

  38. Sturgeon F1 vs. F2: 5 DE components (genes) (total 5 components with FDR < 0.05)

  39. Sturgeon M1 vs. M2 DE components (genes) analysis by RSEM-edgeR Red: FDR<0.05, total 7 components

  40. Sturgeon M1 vs. M2: 7 DE components (genes) (total 7 components with FDR < 0.05)

  41. Sturgeon F1 vs. M1 DE components (genes) analysis by RSEM-edgeR Red: FDR<0.05, total 10 components

  42. Sturgeon F1 vs. M1: 10 DE components (genes) (total 10 components with FDR < 0.05)

  43. Sturgeon F2 vs. M2 DE components (genes) analysis by RSEM-edgeR Red: FDR<0.05, total 49 components

  44. Sturgeon F2 vs. M2: 49 DE components (genes) sorted by logFoldChange (total 49 components with FDR < 0.05)

  45. Sturgeon F2 vs. M2: 49 DE components (genes) sorted by logFoldChange (total 49 components with FDR < 0.05) (continued)

  46. Analysis Plan A (the fast-track version) • Use the contigs, counts, and homologs (BLAST results for the contigs) available from the Purdue Genomics Center • Differential expression and biomarker discovery • Estimate contig abundance using RSEM • Statistical analysis for significantly differentially expressed (DE) contigs using edgeR: • Identify DE contigs/biomarkers between reproduction stages (F1 vs. F2; M1 vs. M2) • Identify DE contigs/biomarkers between sexes (F1 vs. M1; F2 vs. M2) • Functional annotation by Trinotate (a module in the Trinity package). • Pathway analysis by Ingenuity Pathway Analysis (IPA). • GO term enrichment analysis by the Database for Annotation, Visualization and Integrated Discovery (DAVID) • Gene Set Enrichment Analysis (GSEA) • Identify proteins in the proteomics data obtained from sturgeon blood at various stages. • Comparative genomic study: evolutionary aspects of the sturgeon genome as it relates to other fish species

  47. Functional annotation by Trinotate • Trinotate is a comprehensive annotation suite designed for automatic functional annotation of de novo transcriptome assemblies created using the Trinity assembly program. • Trinotate makes use of a number of well-referenced methods for functional annotation, including: • Search for and generation of the most likely longest-ORF peptide candidates from the contigs of the Trinity assembly (TransDecoder), • Homology search against known sequence data (NCBI BLASTP), • Protein domain identification (HMMER/Pfam), • Protein signal prediction (SignalP/TMHMM), and • Comparison to currently curated annotation databases (EMBL UniProt, eggNOG, and GO Pathways databases).

  48. Functional annotation by Trinotate • The Trinity de novo assembled isoforms (446,408) were subjected to Trinotate analysis. • blastp of one contig can yield more than one peptide. • Trinotate annotation of the 446,408 contigs yielded 478,700 peptide records.

  49. Example of two records for one contig (comp221487_c2_seq3)

  50. Validation of contig comp221487_c2_seq3 blastp result • Consistent with Pfam result • Consistent with Pfam result
