Visualization of Genomic Data: A Survey on Genome Browsers Like UCSC and Ensembl
This survey explores the functionality and features of popular genome browsers such as UCSC Genome Browser and Ensembl. It covers essential functionalities like gene searching by name or sequence, understanding gene structure, identifying orthologues across species, and analyzing SNPs (Single Nucleotide Polymorphisms). The advantages of graphic displays over flat files for data visualization are also discussed, emphasizing the compactness and ease of data access. Moreover, it highlights custom track uploads and tools like BLAT for efficient sequence alignment.
Presentation Transcript
Genome browsers Visualization of genomic data
Survey • UCSC browser • Ensembl browser • Others ?
UCSC genome browser: Basic functionalities used in exercise
• Finding a gene
  • by name
  • by sequence
• Gene structure
• Orthologues, i.e. functional homologues in other organisms
• SNPs (Single Nucleotide Polymorphisms)
• Several other functionalities
  • Gene Sorter: sort according to expression, homology, in situ images of genes in different tissues
  • Custom tracks: upload your own data
Genome browsers: Visualization of a gene
Flat files / tab files
>chr5:123.004.678-125.345.112
ATGAAGTTATGGGATGTCGTGGCTGTCTGCCTGGTGCTGCTCCACACCGC
GTCCGCCTTCCCGCTGCCCGCCGGTAAGAGGCCTCCCGAGGCGCCCGCCG
AAGACCGCTCCCTCGGCCGCCGCCGCGCGCCCTTCGCGCTGAGCAGTGAC
TGTAAGAACCGTTCCCTCCCCGCGGGGGGGCCGCCGGCGGACCCCCTCGC
ACCCCCACCCGCAGCCAGCCCCGCACGTACCCCAAGCCAGCCTGATGGCT
GTGTGGCCTACCGACCCGTGGGCAAGGGGTGCGGGTGCTGAAGCCCCCAG
GGGTGCCTGGCTGCCCACTGCTGCCCGCACGCCTGGCCTGAAAGTGACAC
GCGCTGGTTTGCCCAGCACAGAGGGGATGGAATTTTTATGCTGCTCCTTT
AGCATTCTGATGAACAAATATCCTCCCCACCAGCACCACCACCTCAGTAA
Figure labels: Exon, Intron, Exon
Chr5 123.004.678 123.404.678 124.987.012 125.345.112
Open Reading Frame (ORF): from start to stop codon
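The ORF definition on this slide (from start codon to stop codon) can be illustrated with a minimal Python sketch. The function name `find_orf` is hypothetical and not part of any genome browser. It scans one strand only and assumes an unspliced sequence, so applied to genomic DNA that contains introns it would not recover the real coding region.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf(seq, start=0):
    """Return (begin, end) of the first ORF found at or after `start`.

    `begin` is the index of the ATG start codon and `end` is exclusive,
    so seq[begin:end] includes the stop codon. Returns None if no start
    codon is followed by an in-frame stop codon.
    """
    seq = seq.upper()
    begin = seq.find("ATG", start)
    while begin != -1:
        # Walk codon by codon in the reading frame set by this ATG.
        for i in range(begin, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return begin, i + 3
        # No in-frame stop for this ATG: try the next start codon.
        begin = seq.find("ATG", begin + 1)
    return None


toy = "CCATGAAGTTATGGTAGCC"  # made-up toy sequence, not from the slide
print(find_orf(toy))  # (2, 17), i.e. ATG AAG TTA TGG TAG
```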
Genome browsers: Why graphic display?
• Why is a graphic display better than flat files / tab files?
• A graphic display is compact
• Metadata is available, i.e. supporting information about a gene
  • Experimental evidence such as ESTs
  • Predicted gene structures
  • SNP information
  • Links to many databases
• In short, much data about a gene is gathered in one place and can be viewed easily.
Genome browsers: Visualization of a gene (UCSC)
Figure labels: Exon, Intron, UTR
Genome browsers
• UCSC genome browser
  • http://genome.ucsc.edu/
  • Easy to use
  • Updated often, but not as often as Ensembl
  • Upload of personal tracks
• Ensembl browser
  • http://www.ensembl.org/index.html
  • Less easy to use
  • Maintained/updated by several people
• GBrowse
  • http://www.gmod.org/GBrowse
BLAT: BLAST-Like Alignment Tool
• BLAT (2002)
• Very fast searches (MySQL database)
• Handles introns in RNA/DNA alignments
• Data for more than 30 genomes (human, mouse, rat…)
Figure labels: Splice sites, Exon, Intron, Exon
BLAT genome Browser http://genome.ucsc.edu/
BLAT genome Browser Using a search term or position, e.g. Chr1:10,234-11,567
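A position string of the form shown on this slide can be split into chromosome, start and end with a short Python sketch. The helper `parse_position` and its regular expression are assumptions made for illustration. They are not the browser's own parser, and they only handle commas as thousands separators.

```python
import re

# chromosome name, then start and end with optional comma separators
POSITION_RE = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$", re.IGNORECASE)


def parse_position(text):
    """Split 'Chr1:10,234-11,567' into ('Chr1', 10234, 11567)."""
    match = POSITION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a genomic position: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError(f"start is after end: {text!r}")
    return chrom, start, end


print(parse_position("Chr1:10,234-11,567"))  # ('Chr1', 10234, 11567)
```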
BLAT genome Browser http://genome.ucsc.edu/
BLAT genome Browser Using a protein or DNA sequence
BLAT genome Browser "Details"
Correct splice site?
Logo Plot: Information Content
• IC = -H(p) + log2(4) = Σ_a p_a*log2(p_a) + 2 (for DNA, with 4 possible letters; for proteins the constant is log2(20) = 4.32)
• The information content is calculated from a multiple sequence alignment.
• The result is a graphical visualization of sequence conservation where:
  • Total height at a position is the information content
  • Height of a single letter is proportional to the frequency of that letter
• Multiple alignment of 3 protein sequences:
  • Seq1: A L R K P Q R T
  • Seq2: A V R H I L L I
  • Seq3: A I K V H N N T
• Pos1: I = [1*log2(1)] + 4.32 = log2(20) = 4.32
• Pos2: I = [1/3*log2(1/3) + 1/3*log2(1/3) + 1/3*log2(1/3)] + 4.32 = 2.74
• Pos3: I = [2/3*log2(2/3) + 1/3*log2(1/3)] + 4.32 = 3.40
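The per-position calculation can be checked numerically with a small Python sketch. It assumes the slide's three-sequence protein alignment and a 20-letter alphabet. The function name `information_content` is hypothetical, and no small-sample correction is applied (logo tools often add one).

```python
from collections import Counter
from math import log2


def information_content(column, alphabet_size=20):
    """IC = log2(alphabet_size) + sum over letters a of p_a * log2(p_a)."""
    counts = Counter(column)
    total = len(column)
    return log2(alphabet_size) + sum(
        (n / total) * log2(n / total) for n in counts.values()
    )


alignment = ["ALRKPQRT", "AVRHILLI", "AIKVHNNT"]
# Transpose the alignment so that each string is one column (position).
columns = ["".join(col) for col in zip(*alignment)]

for pos, col in enumerate(columns[:3], start=1):
    print(f"Pos{pos} ({col}): {information_content(col):.2f}")
# Pos1 (AAA): 4.32
# Pos2 (LVI): 2.74
# Pos3 (RRK): 3.40
```

In a logo, the height of each letter at a position would then be its frequency multiplied by this value.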
Logo Plot Exon
BLAT genome Browser "Details"
Correct splice site?
BLAT genome Browser "Details"
Donor site | Acceptor site
exon ...G | GT... intron ...AG | exon...
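The GT...AG pattern on this slide can be tested on a candidate intron with a few lines of Python. This is a minimal sketch: the function name is hypothetical, the example sequences are made up, and it only checks the two dinucleotides, so it ignores non-canonical splice sites and the wider consensus around them.

```python
def has_canonical_splice_sites(intron):
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def intron_between(genomic, exon1_end, exon2_start):
    """Return the sequence between two exons (0-based, end-exclusive indices)."""
    return genomic[exon1_end:exon2_start]


genomic = "CCGCCG" + "GTAAGAGGCCTCAG" + "ATTCTG"  # exon + intron + exon (toy data)
intron = intron_between(genomic, 6, 20)
print(intron, has_canonical_splice_sites(intron))  # GTAAGAGGCCTCAG True
```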
BLAT genome Browser "Browser"
Base, Center & Zoom
Known genes, Predictions, RNA, EST, Expression, Conservation
BLAT genome Browser: Center & zoom
Selected number of tracks
Forward/reverse direction
Single Nucleotide Polymorphism (SNP)
• SNPs can be located anywhere in the genome
• Non-synonymous SNP (nsSNP), i.e. the amino acid is changed (shown below)
• A synonymous SNP does not affect the protein
T V I P
An amino acid is coded by 3 nucleotides, e.g. Valine (V): GTC
Humans are diploid: cells have 2 homologous copies of each chromosome, i.e. 2*23 chromosomes. Haploid cells (sex cells) have only 23 chromosomes.
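Whether a SNP is synonymous or non-synonymous can be worked out by translating the codon before and after the change. The Python sketch below assumes the standard genetic code and a SNP inside a known reading frame. The function `classify_snp` is a hypothetical helper, and the two substitutions in the usage lines are illustrative choices rather than the SNP shown on the slide.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon, position, alt_base):
    """Classify a single-base change at `position` (0, 1 or 2) of `codon`."""
    codon = codon.upper()
    mutated = codon[:position] + alt_base.upper() + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    kind = "synonymous" if before == after else "non-synonymous"
    return kind, before, after


print(classify_snp("GTC", 2, "T"))  # ('synonymous', 'V', 'V')
print(classify_snp("GTC", 0, "A"))  # ('non-synonymous', 'V', 'I')
```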
Diploid organism: most mammals
An example of two homologous copies of e.g. chromosome 9 within a cell: a chromosome from the mother and a chromosome from the father.
If the red strand is the plus strand: C;T (or T;C, but we write it alphabetically)
If the green strand is the minus strand: G;A, but we write it as A;G
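The strand conversion can be sketched in Python as follows. It assumes that the alphabetical ordering stated for the plus strand is also applied to the minus strand. The function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def genotype_on_other_strand(genotype):
    """Complement both alleles of 'X;Y' and write them in alphabetical order."""
    alleles = genotype.upper().split(";")
    return ";".join(sorted(COMPLEMENT[allele] for allele in alleles))


print(genotype_on_other_strand("C;T"))  # A;G
```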
Exercise
• Basic understanding of the graphics
• Effect of Single Nucleotide Polymorphisms (SNPs)
• Finding orthologous genes
• Identifying the chromosomal locus for a gene