290 likes | 304 Vues
This research focuses on finding the shortest superstring for the human genome, using the greedy algorithm. Results show that the greedy algorithm yields a superstring that is never more than twice the length of the shortest superstring.
E N D
SUMMER RESEARCH: THE SUPERSTRING PROBLEM Charles Mullins DIMACS Biomaths Conference April 30, 2005
THE SUPERSTRING PROBLEM • Human genome consists of billions of bases: A,C,G,T • Current technology can only sequence “short” strings from 500-1000 bases • Genome is cut into smaller strings that are sequenced • How to recover the original superstring
A SUPERSTRING CONTAINS ALL THE ORIGINAL STRINGS • Occam’s razor • Nature is efficient • LOOK FOR SHORTEST SUPERSTRING SS! • Greedy Algorithm: proceed pairwise to get maximal overlap at each “stage” • Greedy doesn’t always give SS
HOW GOOD IS “GREEDY?” • Early results proved resulting SS was never worse than 3 times as long • This factor was slowly reduced by others • Our mentor Elizabeth “Z” Sweedyk obtained a factor of 2.5
EXAMPLE OF GREEDY • XABAB ABABY BABA • FIRST, SECOND: ABAB • FIRST, THIRD: BAB • SECOND,THIRD: ABA • REPLACE FIRST PAIR WITH XABABY • XABABY,BABA YIELD XABABYBABA • SS IS XABABABY
Our research considered strings consisting of m zeros followed by n ones followed by p zeros: 01100 000111100 etc Key result: Greedy gives SS
CONJECTURE In general, “Greedy” will never produce a result more than twice the length of a shortest superstring
TEACHING RESEARCH METHODS AT ASMSA Charles Mullins Arkansas School for Mathematics, Science and the Arts Hot Springs AR 71910 Mullinsc@asmsa.org
Topics • Research Through Technology • Junior FIRM • Senior FIRM
RTT • Required course for all entering juniors • Fall semester • Objectives in: • Technology • Science • Math • Writing
Technology objectives • Learn to use: • TI calculator • GraphLink & TI-Interactive • Office • E-mail, Web, HTML • Turnitin.com
Math Objectives: • Get introduced to : • Regressions and data modeling • Probability • Descriptive statistics • Inferential statistics
Structure • Introductory lessons & activities • Four mini projects • The Ideal Weight • The Dubl Stuf Dilemma • Pop Off • M & Ms
Science Objectives • Learn: • How to design & do experiments • How to present & model data
Writing objectives • Learn: • Our lab report format & style • How to paraphrase & cite • How to integrate data, graphs, equations, etc.
Text • http://165.29.91.7/math/Rizzle/Final.pdf • PDF-formatted copy of the text we wrote for RTT
Scheduling • All our classes meet 3 times per week • Monday all 7 classes for 55 mins • Tuesday periods 1 - 4 for 75 mins • Wed. periods 5 - 7 for 75 mins. • Thur & Fri are repeats but for 90 mins.
Scheduling • Gives us Tues. & Wed. afternoon w/o classes • Tuesday for Junior FIRM • Wednesday for senior FIRM • 2 hour blocks to work with our students on their projects
Junior FIRM • Prelude during November • Faculty post database of problem statements and interest areas • Students review database • Choose faculty ideas they like • Formulate their own that overlaps w/ faculty interest
Project matching • Students interview w/ chosen faculty to: • Compete for a faculty-chosen problem • Sell their idea to a mentor • Goals: • Match each junior w/ mentor by end of Jan. • Distribute juniors, 5 per teacher
Assignments • Be ready to start experiment on 1 June • Formulate problem statement & hypothesis (design goal) • Collect sources & start bibliography • Study background science • Start thinking about required materials • Plan experimental techniques
Assignments • Critique seniors project displays and oral presentations • Present their planned experiment to a panel of faculty & seniors
Summer work • Ideally they should start their experiment if possible • Minimum requirement is to be ready to start in August
Senior FIRM • More of the same • Continue to study background • Refine method • Collect data, obtain results, & draw a conclusion • Early Dec. deadline for preliminary results
Cooperation • All writing assignments submitted to mentor and in composition class • Graded by differing criteria: • Mentor looks for quality science • Comp. teacher looks at writing • Math teachers help w/ statistics
End products • Science paper • Project display for science fair • Oral presentation Junior Academy of Science
Benefits • Students leave school: • with lab skills • knowing how to write lab reports • Knowing how to present results • Students do well in state and international science fairs
Science fair • We have enough students to have our own ISEF-affiliated regional fair • Must have 50 students • $500 affiliation fee • Must send at least one finalist and adult to International fair.
ACKNOWLEDGEMENTS • The presentation on implementing research at ASMSA was first given at the NCSSSMST Expedition 2005 conference in St. Louis, March 9-12, 2005, by my colleagues, Dr. Brian Monson, Dept of Science Chair, and Bruce Turkal, Dept of Mathematics