BIOL6900 Bioinformatics Chapter 8 Bioinformatics approaches to ribonucleic acid

BIOL6900 Bioinformatics Chapter 8 Bioinformatics approaches to ribonucleic acid

Outline of upcoming lectures The first part of the course covered sequence analysis, including BLAST (Chapters 1-7). We begin the next part of the course: functional genomics (Chapters 8-12). We will study how DNA is transcribed to RNA (i.e. gene expression), and we will discuss microarrays. Then we will study proteins. We will conclude with a survey of genomes (Ch. 13-20).

2e Fig. 8.1

Types of RNAs • tRNA, rRNA - together 95% of total RNA • mRNA, • Other non-coding RNA: small nuclear RNA (snRNA); small nucleolar RNA (snoRNA); microRNA (~22 nt); short interfering RNA (siRNA)

http://rfam.sanger.ac.uk/ The Rfam family includes alignments and descriptions of RNA families Rfam 2e Fig. 8.3

Summary of non-coding RNA families in Rfam database that are assigned to the long arm of human chromosome 21. 2e Fig. 8.4

Figure 8.5 Identification of tRNAs using tRNAscan-SE server 2e Fig. 8.5

2e Fig. 8.5

Vienna RNA package 2e Fig. 8.6

Figure 8.7 Structure of a eukaryotic ribosomal DNA repeat unit 2e Fig. 8.7

2e Fig. 8.8

2e Fig. 8.9

Gene expression is regulated in several basic ways • • by region (e.g. brain versus kidney) • • in development (e.g. fetal versus adult tissue) • • in dynamicresponse to environmental signals • (e.g. immediate-early response genes) • in disease states • by gene activity Page 157

Organism Gene expression changes measured... virus bacteria fungi invertebrates rodents human In response to stimuli Cell types In virus, bacteria, and/or host In mutant or wildtype cells Disease Development Fig. 6.1 Page 158

DNA RNA protein phenotype cDNA Page 159

protein protein DNA RNA DNA RNA cDNA cDNA UniGene SAGE (Serial Analysis of Gene Expression) microarray Fig. 6.2 Page 159

DNA RNA protein phenotype cDNA [1] Transcription [2] RNA processing (splicing) [3] RNA export [4] RNA surveillance Page 160

5’ 3’ exon 1 intron exon 2 intron exon 3 3’ 5’ transcription 5’ 3’ RNA splicing (remove introns) 3’ 5’ polyadenylation 5’ AAAAA 3’ Export to cytoplasm Fig. 6.3 Page 161

2e Fig. 8.10

Relationship of mRNA to genomic DNA for RBP4 ~2e Fig. 8.11

2e Fig. 8.11

2e Fig. 8.12

exon 3 exon 2 exon 1 2e Fig. 8.12

intron intron exon 3 exon 3 exon 2 exon 1 exon 1 query 1: genomic contig NT_037887, nucleotides 162875-163708 query 2: cDNA NM_000517 2e Fig. 8.12

Analysis of gene expression in cDNA libraries • A fundamental approach to studying gene expression • is through cDNA libraries. • Isolate RNA (always from a specific • organism, region, and time point) • Convert RNA to complementary DNA • Subclone into a vector • Sequence the cDNA inserts. • These are expressed sequence tags • (ESTs) insert vector 2e ~Fig. 8.13

UniGene: unique genes via ESTs • • Find UniGene at NCBI: from the home page click All databases (on the top bar) then UniGene, or go to: • www.ncbi.nlm.nih.gov/UniGene • UniGene clusters contain many ESTs • • UniGene data come from many cDNA libraries. • Thus, when you look up a gene in UniGene you get information on its abundance and its regional distribution. Page 164

Cluster sizes in UniGene This is a gene with 1 EST associated; the cluster size is 1 Page 164 & Fig. 2.3, Page 23

Cluster sizes in UniGene This is a gene with 10 ESTs associated; the cluster size is 10 Page 164

Cluster sizes in UniGene (human) Cluster size (ESTs)Number of clusters 1 42,800 2 6,500 3-4 6,500 5-8 5,400 9-16 4,100 17-32 3,300 500-1000 2,128 2000-4000 233 8000-16,000 21 16,000-30,000 8 UniGene build 194, 8/06

Ten largest human UniGene clusters Cluster size Gene 22,925 eukary. translation EF (Hs. 522463) 22,320 eukary. translation EF (Hs. 4395522) 16,562 actin, gamma 1 (Hs.514581) 16,309 GAPDH (Hs.169476) 16,231 actin, beta (Hs.520640) 11,076 ribosomal prot. L3 (Hs.119598) 10,517 dehydrin (Hs.524390) 10,087 enolase 1 (alpha)(Hs.517145) 9,973 ferritin (Hs.433670) 8,966 metastasis associated (Hs.187199) Table 6.2 Page 165 UniGene build 186, 9/05

Why ribosomal transcripts are abundant in UniGene The major types of RNA are: ribosomal RNA rRNA (~85%) transfer RNA tRNA (~15%) messenger RNA mRNA (~3%) noncoding RNA ncRNA (<1%) small nuclear RNA snRNA small nucleolar RNA snoRNA small interfering RNA siRNA

UniGene clusters are often “similar to” a known gene There are three distinctions of similarity in UniGene: 1. "Highly similar to" means >90% in the aligned region. 2. "Moderately Similar to" means 70-90% similar in the aligned region. 3. "Weakly similar to" means <70% similar in the aligned region. Page 164

UniGene includes 74 species (as of Aug. 2006), all with many ESTs available. Recent entries include: Currently: ~130 species Species Canis familiaris (dog) Helianthus annuus (sunflower) Salmo salar (Atlantic salmon) Bombyx mori (domestic silkworm) Apis mellifera (honey bee) Lotus corniculatus (Birdsfoot trefoil) Physcomitrella patens (physco. moss) Lactuca sativa (garden lettuce) Malus x domestica (Apple) Hydra magnipapillata Populus tremula x Populus tremuloides (aspen) Ovis aries (sheep)

The significance of UniGene’s continued growth Identifying protein-coding genes in genomic DNA remains a tremendous challenge. Genes can be predicted “ab initio” (by analyzing genomic DNA for the features of start and stop sites, exons/intron structures, regulatory regions etc.). When EST data are coupled with gene prediction, the accuracy soars. Thus all ongoing genome sequencing projects include a major component of large-scale EST sequencing. Typically, this is done at different developmental stages (e.g. embryo versus adult), regions (e.g. brain versus gut), and physiological states (e.g. mosquitoes having fed on blood versus sucrose). EST data are deposited in UniGene. (dbEST)

Digital Differential Display (DDD) in UniGene • UniGene clusters contain many ESTs • UniGene data come from many cDNA libraries • Libraries can be compared electronically Page 165

Fig. 6.6 Page 166

UniGene brain libraries

UniGene lung libraries

Fig. 6.7 Page 167

CamKII up-regulated in brain n-sec1 up-regulated in brain surfactant up- regulated in lung Page 167

fraction of sequences within the pool that mapped to the cluster shown

DDD at UniGene Question: are there individual RNA transcripts that are differentially present in a comparison of EST libraries? Approach to estimating statistical significance: Fisher’s exact test. Pages 165

2e Fig. 8.14

DDD at UniGene • Fisher’s exact test is a nonparametric method. • It does not assume a normal distribution of the observations • It is easy to calculate • It often has less statistical power than parametric tests (such as a t-test) • For nonparametric methods, observations are typically arranged in an array with ranks assigned from 1 to n.

BIOL6900 Bioinformatics Chapter 8 Bioinformatics approaches to ribonucleic acid

BIOL6900 Bioinformatics Chapter 8 Bioinformatics approaches to ribonucleic acid

Presentation Transcript

Bioinformatics

Chapter 7 Bioinformatics

RIBONUCLEIC ACID RNA

Bioinformatics approaches for…

Bioinformatics

Bioinformatics

Bioinformatics

Ribonucleic Acid

Bioinformatics

Ribonucleic Acid (RNA)

RNA Ribonucleic Acid

Bioinformatics

Bioinformatics

Bioinformatics

RNA = RiboNucleic Acid

Bioinformatics

Ribonucleic Acid

RiboNucleic Acid -RNA