10 likes | 114 Vues
GBK FILES. CMR DATA FILE. rRNA PARSER. FILE 1. CMR-GBK range. FILE 2. TEST_PCR. PCRprimer. FILE 3. PRIMER LOCATIONS. VisualPCR. rRNA DATA.
E N D
GBK FILES CMR DATA FILE rRNA PARSER FILE 1 CMR-GBK range FILE 2 TEST_PCR PCRprimer FILE 3 PRIMER LOCATIONS VisualPCR rRNA DATA A Closer Look Into the rRNA Gene Complex Of Pathogenic Members Of Actinobacteria Laura E Heuermann* and Dhundy Bastola*° *College of Information Science & Technology, University of Nebraska at Omaha, Omaha, NE 68182 °Department of Pathology & Microbiology, University of Nebraska Medical Center, Omaha, NE 68198 Abstract Internal Transcribed Spacers (ITS) are highly variable regions within the rRNA gene complex in the genomes of any organism. These are of special interest for the development of PCR-based molecular diagnostic tools. As a first step in the development of genus-specific PCR primer sets for amplification of the ITS region from Mycobacterium, we developed a computational tool to automate the analysis of multiple genomic sequences. We report the analysis of the rRNA complex from 19 organisms belonging to the family Actinobacteria. This Family consists of gram-positive bacteria species that are pathogenic to humans and commonly found in the flora of human lungs. Mycobacterium, Corynebacterium, Nocardia, Rhodococcus and a few species of Streptomyces are representative members of Actinobacteria whose genomes have been sequenced. Our results show that there are multiple copies of this genomic target within each genome and evidence for possibly incorrect annotation of some of the genomic sequences from different resources. Motivation The polymerase chain reaction (PCR) is a technique widely used in molecular biology to amplify a certain region within the target DNA of interest (Fig. 2). Due to high degree of sensitivity of this method, use of highly specific primers is one of the important parameters for a successful PCR reaction. To determine a set of genus specific primers, the most common approaches include multiple sequence alignment of target DNA and determining highly conserved regions across the members of the given genus. While the use of a multi-copy target (such as the rRNA) allows increased sensitivity due to increased template availability in the reaction, we hypothesize that possible change in orientation of these multi-copy template produce unexpected PCR products of different length. Background Mycobacterium belongs to the family of Actinobacteria. This family consists of gram-positive bacteria species, some of which are pathogenic to humans and commonly found in the flora of human lungs. Different Mycobacterium sp. have been identified using PCR amplification of the ITS region (Fig 1) of the rRNA using primer sets described earlier [1]. The DNA sample obtained from various Mycobacterium spp. gives a representative 400-600 bp amplified product (Fig 3). Occasionally, multiple unexpected PCR products have been observed from Mycobacterium sample (Fig 3, Table 1 ), especially from clinical material. Literature search shows the lung flora to consist of Mycobacterium, Corynebacterium, Nocardia, Rhodococcus and a few species of Streptomyces. The genome sequences of these representative members of the Actinobacteria are available in public online databases. Together with the availability of such genomic sequences and the computational tools, it is now possible for an automated in silico PCR (isPCR) process. This process can comprehensively search the genome to quickly determine possible PCR products without the interference of errors that is possible and commonly introduced by humans in the molecular biology laboratory. To trouble shoot an unexpected amplified product will be a daunting task in a wet-lab setting. To overcome this limitation, we present an in silico PCR; an alternative that will not only identify PCR primers and the possible product(s) but also display all possible orientations of these primers in a genomic DNA sequence. The results of isPCR using these sequences are expected to allow assessment of wet-lab results in a genomic context and also determine the identity of the bands if they were in fact from the amplification of Actinobacteria. Additionally, the isPCR could be used to test the specificity of any primer using genome sequences from Genbank. Figure 1, Genomic Structure of rRNA gene complex in prokaryotes. Internal Transcribe Spacers (ITS) are highly variable regions within the rRNA gene complex and are typically used for identification of closely related bacterial species. Figure 2. An overview of Polymerase Chain Reaction (PCR) showing three major steps (1,2,&3) in the PCR based amplification of a shorter region from a genome. The terminal red color represents forward and reverse primers Figure 3. PCR amplification of the ITS region using a DNA sample taken from fluid in the lung flora. Primers that were believed to be specific to Mycobacterium were used. Methods The in silico PCR (isPCR) is a tool written in Perl using the Bioperl module. The program consists of multiple steps (Fig 4) and uses sequence data from two publicly available databases, namely GenBank at NCBI and Comprehensive Microbial Resources (CMR) [2]. The program begins by reading in features from a Genbank file. An over view of the entire pipeline is shown (Fig 5). To assure comparable annotation from the two public sequence repositories, GenBank and CMR, the coordinates for known rRNA gene complexes were obtained from CMR and compared with the annotation provided in the GenBank entry. The results were written to a file (Table 1). Additionally, the program searched for the pair of given primer sequences (Forward= "CCAGCAGCCGCGGTAATACG" ; Reverse= "GAAGTCGTAACAAGGTARCCGII") in both forward and reverse orientations. For every search hit, the location and orientation was recorded for further processing. However, before processing the real genome sequence, we implemented this algorithm to a test sequence where the Forward and Reverse primers were embedded in the following four orientations: 1. FP 2. RP 3. rc(FP) 4. rc(RP) After successfully identifying known-embedded primers, the program analyzed all the genomic sequences and output the results to a data-folder. In the first step of this process, isPCR collected the rRNA features including: (1) Accession number, (2) organism name, (3) rRNA subunit type, (4) rRNA location, and (5) sequence. The subunit locations obtained from CMR and the locations found in the Genbank were then compared, and the differences were written to a file. Similarly, the ‘PCRprimer function’ in the program outputs the primer location and orientation to a separate file which is formatted in a manner that is acceptable as an input for Gbrowse [3] . Additionally, ‘VisualPCR’ program (Fig 6) was used in the visualization of Primer Pairs and its orientation on a small fragment of a genome. Read GBKfile Save $CMRdata = ( Read CMRfile and collect data ) Call function TESTPCR { Read TESTseq Call PCRprimer ( $primer, $TESTseq ) { Check $function.results with $known.results }} Foreach ( GBKfile ) { collect ( rRNA features ) { Save $rRNAdata = $Accession_#, $Organism_name, $rRNA_type, $rRNA_location, $strand, $sequence; } } Call function RANGEfinder { Compare ( $rRNAdata.location with $CMRdata.location ) { If ( similar ) { Save$status = “OK”; } Else { save $status = location.difference; } } Print FILE1 rRNAdata, $status; } Call function PCRprimer ( $primer, $sequence ) { Foreach ( $primer found in$sequence ) { Save$primerDATA = ( $STARTlocation and$ENDlocation and$orientation ); Print FILE2 $primerDATA; } Callfunction VisualPCR( $sequence, $primerDATA ) { PrintFILE3 visual picture of $primerDATA in $sequence } } Figure 4. Pseudocode describing important functions used by the isPCR program. Figure 5. An overview of in silico PCR (isPCR) showing work flow and associations between various input/output files, parser and visualization tools. Figure 6. Display of primer locations and orientations within the genome of Corynebacterium glutamicum ATCC 13032 Kitasato. Visual primer locations are helpful in determining different possible bands produced from multiple primers. • Results • Parsing of the GenBank files and comparing the information to CMR database revealed inconsistencies in the annotation information available in public databases. Inconsistent information for Mycobacterium avium paratuberculosis, Cornebacterium jeikeium k411 and Cornebacterium glutamicum ATCC 13032 Kitasato were identified by our tools. Summary of result is shown below (Table 1). • Table 1. Ribosomal RNA subunit data obtained from Genbank files (O) and data expected from CMR database (E). Status is referred to as ‘U’ when discrepancy occurred between the annotation information between two databases. • Conclusion • If genome sequences for organism are available, computational tool such as isPCR can be used to comprehensively search any genome to quickly determine possible PCR products and assist in the experimental design. • Based on annotation information, every members of Actinobacteria contain more than one copy of rRNA gene complex. The copy number ranges from one to seven between the different members of this family. • Public Databases are good sources of sequence data. However, use of these data in the development of down stream application requires curation and validation. • The visual results obtained from isPCR displaying the location and orientation of primers within the genomes is a convenient way to suggest the possibility of multiple PCR products with a given set (or multiple sets) of primers. • Future Work • Visualize primer position, location and product size with single or multiple pairs of primers. This is expected to allow effective comparison between in silico result and actual experimental result obtained from a molecular biology laboratory. • References • [1] Mohamed AM, Kuyper D, Iwen PC, Bastola DR, Ali HH and Hinrichs SH (2005). Computational Approach Involving Use of the Internal Transcribed Spacer 1 Region for Identification of Mycobacterium species. J Clinical Micro. 43:3811-3817 • [2] Center for Microbial Resource (CMR)-http://cmr.jcvi.org/tigrscripts/CMR/CmrHomePage.cgi • [3] Gbrowse - http://gmod.org/wiki/index.php/Gbrowse • Acknowledgement • This project was supported by the NIH grant number P20 RR016469 from the INBRE Program of the National Center for Research Resources