Comparative Genomics: The Finale
Angela Pena, Ambily Sivadas, Amit Rupani, Shimantika Sharma, Juliette Zerick, Keerti Surapaneni, Artika Nath, Hema Nagrajan
Outline Results • Goal 1 – PCR Assay • Goal 2 – Comparative genome analysis • Goal 3 – Haemolysis study • Goal 4 – Virulence factors • Discussion
Goal 1 Identification and characterization of target genes for PCR Assay
Identification of target genes
[Diagram: Fw and Rv primers flanking genes A, B, C in Hhae and genes A, C in NTHi, giving PCR products of different size]
• One copy of B or multiple copies?
• Is A-C organized the same way in both organisms?
• Identify candidate clusters/genes for assay development and conserved regions for primer/probe design
Cluster statistics • Total clusters: 8402 • Total common to all genomes: 361 • Total unique to Hhae: 82 • Total unique to Hinf: 38 • Total unique to pathogenic strains: 0 • Protein sequences were clustered using BLASTClust
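The counts on this slide can be derived from cluster membership. The sketch below is a minimal illustration, not the pipeline that was actually used. It assumes each cluster is available as a list of protein IDs, and that two hypothetical lookup tables (`genome_of`, `species_of`) map proteins to genomes and genomes to species. It also assumes that "unique to a species" means every member of the cluster comes from that species.

```python
from collections import Counter


def cluster_stats(clusters, genome_of, species_of):
    """Count clusters shared by all genomes and clusters unique to one species.

    clusters   : list of lists of protein IDs (one inner list per cluster)
    genome_of  : dict protein ID -> genome name (hypothetical mapping)
    species_of : dict genome name -> species label, e.g. "Hhae" or "Hinf"
    """
    all_genomes = set(species_of)
    stats = Counter(total=len(clusters))
    for members in clusters:
        genomes = {genome_of[p] for p in members}
        species = {species_of[g] for g in genomes}
        if genomes == all_genomes:
            # At least one member from every genome in the study.
            stats["common_to_all"] += 1
        elif len(species) == 1:
            # Assumption: all members come from a single species.
            stats["unique_to_" + species.pop()] += 1
    return stats
```

Clusters that are neither common to all genomes nor restricted to one species only contribute to the total.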
Target identification - Protocol 1 1. Cluster analysis: take all proteins common to all 25 genomes • Common = most conserved proteins 2. Compute and compare inter-cluster distances for Hhae vs Hinf • Look for species-specific patterns • Look for regions that include unique genes
Protocol II: BLAST Everything If a pistol just isn't working for you . . . • Our method: for every unique Hhae gene, we locate its corresponding contig. • We check the flanking regions (on the contig) for conserved genes. • We then locate the conserved genes in the Hinf genome and see if they are adjacent. • Since a wide net can be cast with BLASTn searches, this includes homologs.
Protocol II flowchart
1. Start with a set of Hhae genes.
2. Select a unique Hhae gene from the set.
3. Search the set of common Hhae/Hinf (conserved) genes for genes in the flanking regions.
4. Is there at least one conserved gene in each flanking region? NO: reject gene and start over. YES: continue.
5. Get the locations of the conserved flanking genes in the Hinf genome.
6. Are the conserved genes adjacent or “close enough” in the Hinf genome? NO: reject gene and start over. YES: we found a (more) unique gene!
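The decision steps of Protocol II can be sketched as a simple filter. This is only an illustration under stated assumptions: the input dictionaries, the helper `_close`, and the `max_gap` value standing in for "close enough" are all hypothetical, and in practice the flanking genes and their Hinf locations would come from BLASTn results.

```python
def screen_unique_genes(unique_genes, flanking_conserved, hinf_positions, max_gap=5000):
    """Keep unique Hhae genes whose conserved flanking genes lie close together in Hinf.

    unique_genes       : iterable of unique Hhae gene IDs
    flanking_conserved : dict gene ID -> (left_genes, right_genes), the conserved
                         genes found in each flanking region on the Hhae contig
    hinf_positions     : dict conserved gene ID -> (contig, start) in the Hinf genome
    max_gap            : hypothetical "close enough" distance in bp
    """
    accepted = []
    for gene in unique_genes:
        left, right = flanking_conserved.get(gene, ((), ()))
        # Reject unless each flanking region holds at least one conserved gene.
        if not left or not right:
            continue
        # Accept if some left/right pair is close together in Hinf.
        if any(_close(hinf_positions, l, r, max_gap) for l in left for r in right):
            accepted.append(gene)
    return accepted


def _close(positions, a, b, max_gap):
    """True if genes a and b are on the same Hinf contig within max_gap bp."""
    if a not in positions or b not in positions:
        return False
    (contig_a, start_a), (contig_b, start_b) = positions[a], positions[b]
    return contig_a == contig_b and abs(start_a - start_b) <= max_gap
```

Genes that pass are candidates where the unique gene sits between conserved neighbours, so primers placed in the neighbours would give products of different size in the two species.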
PCR Assay: Results
• Target 1
• No duplication was found for these genes
• Genes in the region: fatty acid/phospholipid synthesis protein, 50S ribosomal protein, nucleic acid binding protein (hypothetical), 3-oxoacyl-(acyl carrier protein) synthase III
• [Gene map: Hh carries A B C D (1020 bp); NTHi carries A B D (170 bp)]
• [PCR product: Hh 1250 bp, NTHi 380 bp]
PCR Assay: Results
• Target 2
• Genes in the region: purine nucleoside phosphorylase, predicted membrane protein, fructose-bisphosphate aldolase
• [Gene map of genes A to E in Hh and NTHi]
• [PCR product: Hh 1451 bp, NTHi 1934 bp]
Target validation by in silico PCR
Step 1: Multiple sequence alignment by ClustalW2 (overview)
• Non-typeable H. influenzae: 19 strains + 1 typeable strain
• H. haemolyticus: 5 strains
• Alignment positions 1 to 2749; Target 1 spans positions 870 to 1775 (905 nts)
Step 2: Phylogenetic analysis • Neighbor-joining tree based on percentage identity, built using Jalview
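A neighbor-joining tree of this kind is built from pairwise distances between aligned sequences. The sketch below shows one simple way to compute percentage identity and the matching distance from an alignment. It assumes identity is counted over columns where neither sequence has a gap, which may differ from the exact definition Jalview applies, and the function names are illustrative.

```python
def percent_identity(a, b, gap="-"):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def distance_matrix(aligned):
    """Pairwise distances (100 - percent identity) for a dict name -> aligned sequence."""
    names = list(aligned)
    return {(m, n): 100.0 - percent_identity(aligned[m], aligned[n])
            for m in names for n in names}
```

The resulting matrix is the kind of input a neighbor-joining implementation expects; the tree building itself is left to the alignment viewer or a phylogenetics package.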
Step 3: Finding primers
• Alignment: non-typeable H. influenzae (20 strains) and H. haemolyticus (5 strains), with positions marked at 1, 870, 1775 and 2749
• Forward: 5’-CTCACTTACGCCACCACGTA-3’
• Reverse: 3’-TGCAACAATAATCAGTTCAATATCT-5’
In silico PCR Analysis • Non-typeable H. influenzae AAZD00000000: product length 487 • H. haemolyticus M21621: product length 1354
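The idea behind an in silico PCR check can be sketched in a few lines. This is a simplified illustration, not the tool used for the results above: it only handles exact primer matches on the strand given, and it expects both primers written 5'->3' (a primer written 3'->5' would need to be reversed first). The function names are illustrative.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(comp)[::-1]


def in_silico_pcr(template, forward, reverse):
    """Return the product length for exact primer matches, or None if no product.

    template : DNA sequence of the top strand
    forward  : forward primer, 5'->3'
    reverse  : reverse primer, 5'->3' (it anneals to the top strand, so its
               reverse complement is what appears in the template)
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(reverse.upper())
    # The reverse primer site must lie downstream of the forward primer.
    end = template.find(rev_site, start + len(forward))
    if end == -1:
        return None
    return end + len(rev_site) - start
```

Running this for each genome and comparing the returned lengths mirrors the comparison on the slide: a usable target gives clearly different product lengths in the two species.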
MSA – Target 2 • Alignment positions 1 to 5372 • Non-typeable H. influenzae: 20 strains • H. haemolyticus: 5 strains
Goal 2 Comparative genomic analysis
Horizontal Gene Transfer • Horizontal gene transfer (HGT), also called lateral gene transfer (LGT), refers to the transfer of genetic material between organisms. Alien Hunter • Predicts putative horizontally transferred regions. • Standalone software • Available at http://www.sanger.ac.uk/Software/analysis/alien_hunter • Usage: ./alien-hunter <input_file> <output_file> • INPUT: raw genomic sequence • PREDICTION: HGT regions based on Interpolated Variable Order Motifs (IVOMs)
Thresholding the HGT predictions • In the previous run, we got many hits with widely varying scores that covered almost 90% of the genes in each genome, so we decided to place a threshold on the scores. • We studied the score distribution of each genome by plotting a histogram of its scores. • After studying all the histograms, we kept only hits with a score >70. [Screenshot: M21621]
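The thresholding step can be sketched as follows. The example assumes the predicted regions have already been parsed into `(start, end, score)` tuples. That representation, the function names, and the bin width are illustrative choices, not the tool's output format.

```python
def filter_by_score(regions, threshold=70.0):
    """Keep only predicted regions whose score is strictly above the threshold."""
    return [r for r in regions if r[2] > threshold]


def score_histogram(regions, bin_width=10):
    """Count regions per score bin; keys are the lower edge of each bin."""
    counts = {}
    for _, _, score in regions:
        low = int(score // bin_width) * bin_width
        counts[low] = counts.get(low, 0) + 1
    return dict(sorted(counts.items()))
```

Inspecting the histogram for each genome before filtering is what motivates the cut-off; the value 70 is the one chosen on the slide.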
Insertion elements • An insertion element is a short DNA sequence that acts as a simple transposable element. • A transposable element (TE) is a DNA sequence that can change its relative position (self-transpose) within the genome of a single cell. The mechanism of transposition can be either "copy and paste" or "cut and paste".
FASTA sequences • We retrieved the FASTA sequences by submitting their accession IDs to NCBI.
BLAST • We ran BLAST with these insertion sequences against each of the strains to get the location of the insertion sequences in each strain. • A Perl script was written to extract the insertion sequences from their respective contigs in each strain.
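The extraction step can be illustrated with a short Python sketch (not the original Perl script, which is not shown in the slides). It assumes hit coordinates are 1-based and inclusive, and that a hit on the reverse strand is reported with its start greater than its end, as in BLAST tabular output. The function names are illustrative.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(comp)[::-1]


def read_fasta(lines):
    """Parse FASTA lines into a dict: sequence ID (first word of header) -> sequence."""
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def extract_hit(contigs, contig_id, sstart, send):
    """Return the hit region of a contig, reverse-complemented for minus-strand hits."""
    seq = contigs[contig_id]
    if sstart <= send:
        return seq[sstart - 1:send]
    return reverse_complement(seq[send - 1:sstart])
```

For a file on disk, `read_fasta(open(path))` would work the same way, since the function only iterates over lines.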
Goal 3 Identification and Characterization of Haemolysin in Hhae
AIM #1 Look for the hemolysin BA operon in the H. haemolyticus strains and characterize it as present/absent in the hemolytic and non-hemolytic strains
HEMOLYSIN • Haemophilus ducreyi requires two adjacent genes, hhdB and hhdA, for hemolysis. • hhdB is an outer membrane protein, which is required for secretion and activation of the hemolysin structural protein, hhdA. • Once secreted, hhdA interacts with target cell membranes, oligomerizes, and forms pores 2.5 to 3.0 nm in diameter, which lyse the target cell. TWO-PROTEIN SECRETION SYSTEM
OUR STRATEGY • Downloaded the FASTA files of all hemolysin protein sequences of the Pasteurellaceae family from the NCBI protein database. • Ran BLAST with the predicted protein sequences of the six strains against these. • Cut-off thresholds: identity 70%, coverage 80%
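Applying the two cut-offs to BLAST results can be sketched as below. The example assumes tab-separated hits in the default tabular column order (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and a separate, hypothetical `query_lengths` table. It also assumes that "coverage" means the fraction of the query covered by the hit, which the slides do not spell out.

```python
def passes_cutoff(pident, qstart, qend, qlen, min_identity=70.0, min_coverage=80.0):
    """True if a hit meets both the identity and the query-coverage thresholds."""
    coverage = 100.0 * (abs(qend - qstart) + 1) / qlen
    return pident >= min_identity and coverage >= min_coverage


def filter_hits(rows, query_lengths, min_identity=70.0, min_coverage=80.0):
    """Filter tab-separated BLAST hits; returns (query, subject, identity) tuples.

    rows          : iterable of tab-separated hit lines (default tabular columns)
    query_lengths : dict query ID -> query length (hypothetical lookup table)
    """
    kept = []
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        qseqid, sseqid, pident = fields[0], fields[1], float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        if passes_cutoff(pident, qstart, qend, query_lengths[qseqid],
                         min_identity, min_coverage):
            kept.append((qseqid, sseqid, pident))
    return kept
```

Hits that survive this filter are the ones that would be reported as putative hemolysins for each strain.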
RESULTS • All hits had at least 70% identity and 95-100% coverage
AIM #2 • Characterize the domains/motifs/residues in hemolysin. • Depict the secondary structures in hemolysin. • Predict the 3D structure of hemolysin.
SIGNAL PEPTIDE & HAEMAGGLUTINATION ACTIVITY DOMAIN
[Diagram: signal peptide at the N-terminus, followed by the haemagglutination activity domain]
• A signal peptide (25 aa) transports the hemolysin to the outer membrane or periplasm. LipoP predicts an SPase I cleavage site at 25-26. NOT A LIPOPROTEIN
• Haemagglutination activity domain: suggested to be a carbohydrate-dependent haemagglutination activity site, found in a range of haemagglutinins and haemolysins
HAEMAGGLUTININ REPEAT • The haemagglutinin repeat is a highly divergent repeat that occurs in a number of proteins implicated in cell aggregation
TPS DOMAIN
• All TPS-secreted proteins contain a distinctive N-proximal module essential for secretion, the TPS domain.
• TpsA proteins display two conserved regions, C1 and C2, and two less-conserved (LC) regions. The motifs ANPNL and NPNGIS are found in this region.
• TPS-secreted proteins include the hemolysins/cytolysins ShlA of Serratia marcescens, HpmA of Proteus mirabilis, EthA of Edwardsiella tarda, and HhdA of Haemophilus ducreyi, the large supernatant proteins LspA1 and LspA2 of H. ducreyi, and the HecA adhesin of E. chrysanthemi.
Clantin et al., 2004. The crystal structure of filamentous hemagglutinin secretion domain and its implications for the two-partner secretion pathway. PNAS.
Does the TPS domain exist in H. haemolyticus strains? [Alignment figure: Fha30, HhdA, EthA, ShlA, HpmA, LspA1, LspA2 and H. haemolyticus strains 21127, 21621 and 19107]
CONSERVED RESIDUES IN TPS DOMAIN
• Motifs labeled in the alignment: ANPNL, NPNLGI
• NPNL & NPNGI: these motifs form type I β-turns, which might play important stabilizing roles.
• The conserved residues of the TPS domain serve to drive the folding of the TPS domain into a β-helix and to stabilize the helix.
• Domain boundaries: TPS/HAD 39-159 (Pfam), or TPS 39-270
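Checking a predicted protein for these conserved motifs is a plain substring search. The sketch below is illustrative only: the motif list is taken from this slide, the function name is hypothetical, and an exact-match search will miss diverged variants that an alignment would still reveal.

```python
import re

# Motifs named on the slide; exact matching is a simplifying assumption.
TPS_MOTIFS = ("ANPNL", "NPNL", "NPNGI")


def find_motifs(protein, motifs=TPS_MOTIFS):
    """Return a dict motif -> list of 1-based start positions in the protein sequence."""
    protein = protein.upper()
    hits = {}
    for motif in motifs:
        # The lookahead reports overlapping occurrences as well.
        pattern = "(?=" + re.escape(motif) + ")"
        hits[motif] = [m.start() + 1 for m in re.finditer(pattern, protein)]
    return hits
```

Positions that fall inside the predicted TPS domain boundaries would support the presence of the domain in that strain.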
AIM #3 • Identify the domains in the hemolysin activator gene • Determine the secondary and 3D structure of hemolysin activator gene
HEMOLYSIN ACTIVATOR PROTEIN: TRANSMEMBRANE PROTEIN • Membrane proteins: α-helical or β-barrel • Proteins of the β-barrel membrane protein class are located in the outer membrane of Gram-negative bacteria. These proteins have membrane-spanning segments formed by antiparallel β-strands, creating a channel in the form of a barrel that spans the outer membrane.
DOMAINS IN HEMOLYSIN ACTIVATOR
[Diagram: SP, POTRA_2, activator domain]
• SP (LipoP): SPase I cleavage site between positions 19 and 20. NOT A LIPOPROTEIN
• POTRA_2: polypeptide-transport-associated domain. In ShlB this domain has a chaperone-like function over ShlA.
• The activator domain in ShlB is shown to interact with ShlA during secretion and impose a conformational change in ShlA to form the active hemolysin.
• ShlA/B: Serratia marcescens