Genome Revolution & BLAST: Understanding DNA Beyond the Basics

The cartoon above states the early description of DNA and the double helix in 1953. It is most alarming and interesting to see how in 54 years we have come so far as to understand such complex algorithms as BLAST that help us to know much more beyond the basic structure of DNA. BLAST today and its implications for the future. Jeff Shen, Morgan Kearse, Jeff Shi, Yang Ding, & Owen Astrachan Genome Revolution Focus 2007, Duke University, Durham, North Carolina 27708 • APT • Problem Statement • You are writing code to find which of several DNA strands in a given DNA library have similarities to a given query strand…. • (Full problem statement of APT available at http://www.cs.duke.edu/csed/algoprobs/4g07blast1.html) • …Return an array of library strands each of which contains the query strand. • Class • public class Blast • { • public String[] findAll(String query, String[] library){ • // fill in code here • } • } • Constraints • query contains at most 30 characters. • Each string in library contains at most 50 characters. • There are at most 50 Strings in library. • Example • query: "ATCG" • library= { "ATC", "TATC, "ATCATC", "GATCATC", "ATCGATG", "GATATCG" } • Returns: {"ATCGATG", "GATATCG"} The other strands in library do not contain "ATCG", only the last two strands of library contain "ATCG". Introduction BLAST (Basic Local Alignment Search Tool) is a widely used bioinformatics search tool used to compare different DNA samples for their similarities. Researchers can use this search tool to compare their own DNA samples to all the DNA and protein sequences in various genebanks and libraries.BLAST takes a heuristic approach to compare the different sequences, which dramatically increases the speed of searches. The program scans at approximately 2 x 106 bases/s. The increase in speed has made a lasting impact in the fields of bioinformatics and computer science. In the past, searches that would have taken days to finish now can be done in mereseconds. Figure 4. The table below shows an example of a scoring table/matrix that the BLAST algorithm might use to store the comparisons between certain segments. Figure 1. This image shows how target sequences are matched up during the BLAST process for comparison. The lines between the sequences represent matches in those segments. Method User inputs a target query sequence (a so-called w-mer of length w) into BLAST that is to be compared User also can specify a value W for which matches under W will be ignored, based on the user’s preference for accuracy and speed 3 Phases of BLAST search 1st Phase: The w-mer is then searched against a sequence database of billions of base pairs (length W or higher) that have been previously organized and find exact matches. 2nd Phase: These matching sequences are then extended from both sides and any further matches with the target sequence are tallied up – in this phase, insertions/deletions are ignored in terms of score. 3rd Phase: High-scoring alignments from these processes are compared and an optimized measurement of similarity and other statistics are returned to the user in this final algorithmic phase. In this process segments of all possible lengths are compared. Application Because of its speed, BLAST has become a very popular bioinformatics search tool. BLAST has been cited by over twenty thousand scientific journals whose authors have used BLAST to compare different DNA sequences or whole genomes for similarities.For example, researchers in the Cold Spring Harbor Lab used an enhanced version of BLAST, BLATZ, to find the similarity between the human genome and the mouse genome.Using BLATZ, they concluded that 39.154% of the human sequence aligned to mouse sequence. Also, other organisms such as drosphilia (fruit fly) have been compared with the human genome with BLAST.Another area that BLAST is prevalent is in the field of protein studies. Not only can researchers use BLAST for comparing DNA sequences, but they can also use the program to find similarities between protein sequences.BLAST has become an indispensable bioinformatics tool in the fields of biology, engineering, and biochemistry. Conclusions BLAST (Basic local alignment search tool) is a bioinformatics tool that became a marginal aid to many researchers to help in quickly comparing their own DNA samples. It is popular because of its speed even though it does allow certain errors. It allows navigation by letting the user determine the number/length of sequence to compare. After this they can also set the threshold to decrease the number of matches. From the use of this search tool it is important to see what could arise from this tool. Already there are other search tools such as BEAUTY. BEAUTY database search tool is very similar to BLAST but more advanced. This is due to the fact that BEAUTY which stands for BLAST enhancement alignment utility. The Beauty tool works by incorporating conserved regions and functional domains proteins sequences into the BLAST program to make it more specific. So as time goes on we will most likely see an increase in programs such as BEAUTY. Figure 2. In this image we are given an example of what an individual search might look like using BLAST. The target sequence above is PQG (w, length=3 letters) and a threshold is set at 13 (the W mentioned in the method). The neighborhoods words stands for the words in the database that the query word or target is compared too. To the right of the neighborhood word is a score that represents the match equivalence. This model stops taking score after 13 to optimize the match. The segment that is thought to be the best match is returned along with the source/subject of where it came from. BLAST in the future… As an example, there are companies such as Korilog that have made software (KoriBLAST) that use the BLAST system along with other programs to create software solutions to make it easier for labs and researchers in areas of data integration, visualization and management. Their goal is to provide the means for state-of-the-art graphical environments for quick and easy research. The software program is dedicated to making the BLAST program very useful by doing sequence data mining. Literature cited http://www.acm.org/crossroads/wikifiles/13-1-CE/13-1-11-CE.html http://www.mrc-lmb.cam.ac.uk/genomes/madanm/articles/antim.htm http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Alignment_Scores2.html http://www.goroadachi.com/etemenanki/blog-05feb.htm http://www.cs.duke.edu/courses/cps004g/fall06/papers/duke/altschuletal1990.pdf http://www.korilog.com/products/improving-NCBI-BLAST.php Figure 3. In the imageabove we are shown how the score is created. It is the sum of all the matches and mismatched amino acids minus the sum of the number of gaps. It also shows as previously stated that the score returned is the max/optimized score.

Genome Revolution & BLAST: Understanding DNA Beyond the Basics

Genome Revolution & BLAST: Understanding DNA Beyond the Basics

Presentation Transcript

Writing Biomedical Research Papers

DB2 9 for z/OS Overview

Juvenile Depression Jeff Q. Bostic, M.D., Ed.D. Massachusetts General Hospital Harvard Medical School

Introduction to the Unix operating system CIS 118 Jeff Moro

Biomathematics 170A Medical Statistics Jeff Gornbein Office:Life Science 5202 gornbein@ucla.edu 310-825-4193 Office hr

Jeff Hippen

Marketo Training

Federal Update Jeff Baker, Federal Student Aid November 12, 2013

Class 6~9: Formulas That Tonify Qi, Blood, Yin and Yang

Tutorial C: Service Oriented Architecture in the Context of Ground Systems

Permutation, Combinations, Probability, Oh My…

Psychologists at Health Psych Maine

NRSA and Stuff

Steve Morgan, MD & Scott Adams, MD Original Authors: Steve Morgan, MD; March 2004;

Air Breathing Propulsion JPC Report

THE SUN

M&A and Reinsurance Panel Insurance Tax Conference November 5, 2010 Jeff Vogel, KPMG LLP

WHO DARES LOOSES

Federal Update Jeff Baker Policy Liaison and Implementation Federal Student Aid

Redding East Rotary

Genome Revolution & BLAST: Understanding DNA Beyond the Basics