Comprehensive Junction Array Design for Alternative Splicing Analysis in Human and Mouse Genomes
This presentation describes a research junction array design (HJAY) for studying alternative splicing in the human and mouse genomes. Exon-exon junction probes measure exon skipping events directly, support reciprocal analyses, report how exons are joined, and can monitor small exons. The arrays pair exon probes with junction probes for observed junctions, using content from ExonWalk, Ensembl, and RefSeq, and cover more than 30,000 human and mouse genes on separate chips. An initial analysis with the Splicing Index algorithm illustrates the kinds of splicing events the design can detect.
Presentation Transcript
HJAY: Human (Research) Junction Array
EURASNET, Cambridge, Sept. 14th, 2007
Tyson A. Clark

Junction Arrays vs. Exon Arrays
Junction arrays, pros:
• Direct measure of skipping events
• Reciprocal analysis
• Multiple independent measurements of a single event
• Information on how exons are joined
• Ability to monitor small exons
Junction arrays, cons:
• Observed events only (no discovery)
• Requires lots of probes
• Limited flexibility
• Half-Hyb
• Difficult to predict joining events without empirical evidence
Exon arrays, pros:
• Increased genome coverage
• “Discovery” of Alt. Splicing
• Probe selection flexibility
• Comprehensive / Unbiased Design
Exon arrays, cons:
• Cannot distinguish some isoforms
• Fewer probes per splicing event
• No joining information

Why use Exon-Exon Junction probes?
• Alt. spliced Exons (A & B) present in 50% of transcripts
• [Figure: two possible isoform mixtures, situation 1 - or - situation 2]
• Cannot distinguish between situations 1 & 2 (using only exon representation)

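The ambiguity on this slide can be checked with a little arithmetic. The sketch below assumes one plausible reading of the lost figure: in situation 1, half of the transcripts contain both A and B and half skip both; in situation 2, half contain only A and half contain only B. The flanking exons "U" and "D" and the function name are hypothetical and used only for illustration.

```python
from collections import Counter

def exon_and_junction_levels(isoforms):
    """Sum relative abundance per exon and per exon-exon junction.

    isoforms: list of (fraction, exon_chain) pairs, where exon_chain is a
    tuple of exon names in transcript order.
    """
    exons = Counter()
    junctions = Counter()
    for fraction, chain in isoforms:
        for exon in chain:
            exons[exon] += fraction
        # Each pair of neighbouring exons in the chain is one junction.
        for left, right in zip(chain, chain[1:]):
            junctions[(left, right)] += fraction
    return dict(exons), dict(junctions)

# Assumed isoform mixtures (the original figure is not available).
situation_1 = [(0.5, ("U", "A", "B", "D")), (0.5, ("U", "D"))]
situation_2 = [(0.5, ("U", "A", "D")), (0.5, ("U", "B", "D"))]

exons_1, junctions_1 = exon_and_junction_levels(situation_1)
exons_2, junctions_2 = exon_and_junction_levels(situation_2)

print(exons_1 == exons_2)          # exon-level signal is identical
print(junctions_1 == junctions_2)  # junction-level signal differs
```

Under these assumptions, A and B each sit at 50% in both situations, so exon probes alone give the same picture, while the A-B and U-D junctions appear only in situation 1.
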
Advantages of Junction Probes
• Information on how exons are joined together
• Exon skipping events are measured directly
  - rather than just a decrease in exon signal
  - allows for reciprocal change analyses
• Exon-Exon junctions are non-genomic sequence
  - not present on a genome tiling array
• Ability to monitor small exons and distinguish alternative splice sites that are very close

New Research Junction Array Design: Genome-wide “Observed” Junctions
• Using content from ExonWalk (C. Sugnet), Ensembl, and RefSeq, we have designed an array that will include:
  - Exon Probes: 8-12 PM probes per exon
  - Exon-Exon Junction Probes: 8 probes per junction (-4 to +4)
  - >30,000 Human & Mouse Genes
• Human and Mouse designs will be manufactured onto separate chips

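As a rough illustration of tiling 8 probes across a junction at positions -4 to +4, the sketch below shifts a fixed-length window across the exon-exon boundary. The slide does not give the probe length or the exact offset convention, so the 25-base length, the skipped 0 offset, and the function name `junction_probes` are all assumptions made for this example, not the actual design procedure.

```python
def junction_probes(upstream_exon, downstream_exon, probe_len=25,
                    offsets=(-4, -3, -2, -1, 1, 2, 3, 4)):
    """Return one probe sequence per offset, each spanning the junction.

    Assumed convention: offset k moves the window k bases downstream, so the
    probe takes (probe_len // 2 + k) bases from the end of the upstream exon
    and the remainder from the start of the downstream exon.
    """
    probes = []
    for offset in offsets:
        left_len = probe_len // 2 + offset
        right_len = probe_len - left_len
        if left_len <= 0 or right_len <= 0:
            raise ValueError("offset pushes the probe off the junction")
        if left_len > len(upstream_exon) or right_len > len(downstream_exon):
            raise ValueError("exon too short for this probe placement")
        probes.append(upstream_exon[-left_len:] + downstream_exon[:right_len])
    return probes

# Toy sequences: all-A upstream exon, all-C downstream exon.
probes = junction_probes("A" * 20, "C" * 20)
for probe in probes:
    print(probe)
```

With these toy inputs every probe contains bases from both exons, which is what makes the sequence non-genomic and specific to the spliced transcript.
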
Design Input
• Human input files (10,063,211 input exons), NCBI 36, March 2006 genome assembly
  - RefSeq (hNCBI36)
  - Ensembl (38)
  - ExonWalk (hNCBI36_exonwalkall)
• Mouse input files (3,963,343 input exons), MM7, August 2005 genome assembly
  - RefSeq (mm7)
  - Ensembl (38)
  - ExonWalk (mm7_exonwalkall)

Exon Walk (developed by Chuck Sugnet)
• The ExonWalk program merges cDNA evidence together to predict full-length isoforms, including alternative transcripts.
• ESTs Filtered
  - Present in cDNA libraries of another organism (i.e. also present in mouse)
  - Or have three separate cDNA GenBank entries supporting it

Junction Design Strategy (Note: reverse strand and non-overlapping transcripts are separated into unique Transcript Clusters)

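The clustering rule in the note can be sketched as interval merging: transcripts on the same chromosome and strand that overlap share a cluster, and anything on the opposite strand or not overlapping starts a new one. This is a minimal sketch only; the tuple layout, the half-open coordinates, the use of whole-transcript spans rather than exon overlap, and the function name `transcript_clusters` are assumptions, not details from the slides.

```python
from itertools import groupby

def transcript_clusters(transcripts):
    """Group transcripts into clusters of overlapping, same-strand spans.

    transcripts: list of (name, chrom, strand, start, end) tuples,
    with end treated as exclusive.
    """
    clusters = []
    ordered = sorted(transcripts, key=lambda t: (t[1], t[2], t[3]))
    # Each (chrom, strand) pair is clustered on its own.
    for _, group in groupby(ordered, key=lambda t: (t[1], t[2])):
        current, current_end = [], None
        for name, _chrom, _strand, start, end in group:
            if current and start >= current_end:
                # No overlap with the running cluster: close it.
                clusters.append(current)
                current, current_end = [], None
            current.append(name)
            current_end = end if current_end is None else max(current_end, end)
        clusters.append(current)
    return clusters

example = [
    ("tx1", "chr1", "+", 100, 500),
    ("tx2", "chr1", "+", 400, 900),    # overlaps tx1 on the same strand
    ("tx3", "chr1", "-", 450, 800),    # overlaps, but on the reverse strand
    ("tx4", "chr1", "+", 2000, 2500),  # same strand, no overlap
]
clusters = transcript_clusters(example)
print(clusters)
```
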
Human Design Run #2 (1 week 1 day 41 minutes and 55 seconds elapsed)
• Transcript Clusters: 35,123
  - with junctions: 24,753
• Transcripts: 335,663
• Junctions (Obs): 260,488
• Exons: 360,569
• Exon Clusters: 249,240
• PSRs: 315,137

Mouse Design (3 days 17 hrs 34 minutes and 1 second elapsed)
• Transcript Clusters: 30,833
  - with junctions: 25,431
• Transcripts: 145,993
• Junctions (Obs): 237,871
• Exons: 319,769
• Exon Clusters: 239,114
• PSRs: 282,186

One 49 Format 5 Micron Mask Set, Split Between Designs (6,553,600 features)

Analysis Approach
• Used the Splicing Index Algorithm, treating each probeset (Exon or Junction) as independent
  - P-value cutoff < 0.001
  - Magnitude of change: |Splicing Index| > 0.5 on the log2 scale (~1.4-fold)
  - 1570 probesets in total passed those cutoffs
• Looked for splicing events that had more than 1 significant probeset
  - 252 genes with multiple probesets on the list
  - ~75% looked like real AS events

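The filtering steps above can be sketched as follows. The slides do not spell out the Splicing Index formula, so this example assumes the commonly described form: each probeset intensity is normalized to its gene-level intensity, and the index is the log2 ratio of that normalized value between two samples. The p-values are taken as given inputs, and the row layout, the function names, and all numbers are hypothetical.

```python
import math
from collections import defaultdict

def splicing_index(probeset_a, gene_a, probeset_b, gene_b):
    """log2 ratio of gene-normalized probeset intensity, sample A vs. B."""
    return math.log2((probeset_a / gene_a) / (probeset_b / gene_b))

def candidate_genes(rows, p_cutoff=0.001, si_cutoff=0.5):
    """Keep genes with more than one probeset passing both cutoffs."""
    hits = defaultdict(list)
    for row in rows:
        si = splicing_index(row["probeset_a"], row["gene_a"],
                            row["probeset_b"], row["gene_b"])
        if row["p"] < p_cutoff and abs(si) > si_cutoff:
            hits[row["gene"]].append(row["probeset"])
    return {gene: ids for gene, ids in hits.items() if len(ids) > 1}

# Hypothetical linear-scale intensities and precomputed p-values.
rows = [
    {"gene": "G1", "probeset": "G1_skip_junction", "p": 0.0001,
     "probeset_a": 400, "gene_a": 1000, "probeset_b": 100, "gene_b": 1000},
    {"gene": "G1", "probeset": "G1_exon", "p": 0.0002,
     "probeset_a": 100, "gene_a": 1000, "probeset_b": 400, "gene_b": 1000},
    {"gene": "G2", "probeset": "G2_exon", "p": 0.0001,
     "probeset_a": 300, "gene_a": 1000, "probeset_b": 100, "gene_b": 1000},
    {"gene": "G3", "probeset": "G3_exon", "p": 0.2,
     "probeset_a": 400, "gene_a": 1000, "probeset_b": 100, "gene_b": 1000},
]
result = candidate_genes(rows)
print(result)
```

In this toy data only G1 survives: its skip junction and its exon change in opposite directions, which is the reciprocal pattern the slides describe for junction arrays. G2 has a single significant probeset and G3 fails the p-value cutoff.
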
CLASP1 Event #1
CLASP1 Event #2
221 Exons Total, MED or higher confidence
By Confidence Level
• 26 HIGHEST: all probesets from that splicing event made the list
• 89 HIGH: reciprocal junctions from the splicing event made the list
• 53 MED HIGH: multiple junctions from the splicing event made the list
• 53 MEDIUM

By Splicing Event
• 179 Cassette Exons
• 19 Mutually Exclusive Cassettes
• 27 Multiple (consecutive) Cassettes
• 32 Alternative 3’ Terminal Exons
• 5 Alt. 5’ss
• 4 Alt. 3’ss

Histogram of Cassette Exon Size (figure): 16 exons < 25 bp