1 / 19

Exploring Williams-Beuren Syndrome using my Grid

Exploring Williams-Beuren Syndrome using my Grid. R.D. Stevens, a H.J. Tipney, b C.J. Wroe, a T.M. Oinn, c M. Senger, c P.W. Lord, a C.A. Goble, a A. Brass, a M. Tassabehji b. b University of Manchester, Academic Unit of Medical Genetics St Mary’s Hospital.

denzel
Télécharger la présentation

Exploring Williams-Beuren Syndrome using my Grid

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Exploring Williams-Beuren Syndrome using myGrid R.D. Stevens,a H.J. Tipney,b C.J. Wroe,a T.M. Oinn,c M. Senger,c P.W. Lord,a C.A. Goble,a A. Brass,a M. Tassabehji b bUniversity of Manchester, Academic Unit of Medical Genetics St Mary’s Hospital cEuropean Bioinformatics Institute Wellcome Trust Genome Campus Hinxton a Department of Computer Science University of Manchester

  2. Williams-Beuren Syndrome (WBS) • Congenital disorder caused by sporadic gene deletion • 1/20,000 live births • Effects multiple systems – muscular, nervous, circulatory • Characteristic facial features • Unique cognitive profile • Mental retardation (IQ 40-100, mean~60, ‘normal’ mean ~ 100 ) • Outgoing personality, friendly nature, ‘charming’ • Haploinsuffieciency of the region results in the phenotype

  3. ~1.5 Mb 7q11.23 Patient deletions * * WBS SVAS GTF2IRD2P Physical Map CTA-315H11 ‘Gap’ CTB-51J22 FKBP6T POM121 GTF2IP NOLR1 NCF1P PMS2L STAG3 Chr 7 ~155 Mb Block B Block A Block C Williams-Beuren Syndrome Microdeletion Eicher E, Clark R & She, X An Assessment of the Sequence Gaps: Unfinished Business in a Finished Human Genome. Nature Genetics Reviews (2004) 5:345-354 Hillier L et al. The DNA Sequence of Human Chromosome 7. Nature (2003) 424:157-164 A-cen B-cen C-cen C-mid B-mid A-mid B-tel A-tel C-tel WBSCR1/E1f4H WBSCR5/LAB GTF2IRD1 WBSCR21 WBSCR22 WBSCR18 WBSCR14 GTF2IRD2 POM121 NOLR1 BAZ1B BCL7B FKBP6 GTF2I CLDN3 CLDN4 CYLN2 STX1A LIMK1 NCF1 TBL2 RFC2 FZD9 ELN

  4. 12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt 12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt 12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct 12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt 12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt 12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt 12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg 12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga 12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc 12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa 12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa Filling a genomic gap in Silico • Identify new, overlapping sequence of interest • Characterise the new sequence at nucleotide and amino acid level Cutting and pasting between numerous web-based services i.e. BLAST, InterProScan etc

  5. Filling a genomic gap in silico • Frequently repeated – info rapidly added to public databases • Time consuming and mundane • Don’t always get results • Huge amount of interrelated data is produced – handled in notebooks and files saved to local hard drive • Much knowledge remains undocumented: Bioinformatician does the analysis Advantages: Specialist human intervention at every step, quick and easy access to distributed services Disadvantages: Labour intensive, time consuming, highly repetitive and error prone process, tacit procedure so difficult to share both protocol and results

  6. Why Workflows and Services? Workflow = general technique for describing and enacting a process Workflow = describes what you want to do, not how you want to do it Web Service = how you want to do it Web Service = automated programmatic internet access to applications • Automation • Capturing processes in an explicit manner • Tedium! Computers don’t get bored/distracted/hungry/impatient! • Saves repeated time and effort • Modification, maintenance, substitution and personalisation • Easy to share, explain, relocate, reuse and build • Available to wider audience: don’t need to be a coder, just need to know how to do Bioinformatics • Releases Scientists/Bioinformaticians to do other work • Record • Provenance: what the data is like, where it came from, its quality • Management of data (LSID - Life Science Identifiers)

  7. myGrid • E-Science pilot research project funded by EPSRC www.mygrig.org.uk • Manchester, Newcastle, Sheffield, Southampton, Nottingham, EBI and RFCGR, also industrial partners. • ‘targeted to develop open source software to support personalised in silico experiments in biology on a grid.’ www.mygrid.org.uk Which means…. Distributed computing – machines, tools, databanks, people Personalisation Provenance and Data management Enactment and notification A virtual lab ‘workbench’, a toolkit which serves life science communities.

  8. SOAPLAB Web Service Any Application Web Service e.g. DDBJ BLAST Workflow Components Freefluo Freefluo Workflow engine to run workflows Scufl Simple Conceptual Unified Flow Language Taverna Writing, running workflows & examining results SOAPLAB Makes applications available

  9. Query nucleotide sequence Williams Workflow Plan RepeatMasker BLASTwrapper Pink: Outputs/inputs of a service Purple: Tailor-made services Green: Emboss soaplab services Yellow: Manchester soaplab services GenBank Accession No Promotor Prediction URL inc GB identifier TF binding Prediction Translation/sequence file. Good for records and publications prettyseq Regulation Element Prediction GenBank Entry Amino Acid translation Sort for appropriate Sequences only Identifies PEST seq epestfind Identify regulatory elements in genomic sequence Seqret Identifies FingerPRINTS pscan MW, length, charge, pI, etc Nucleotide seq (Fasta) pepstats 6 ORFs Predicts Coiled-coil regions RepeatMasker pepcoil tblastn Vs nr, est, est_mouse, est_human databases. Blastp Vs nr Coding sequence GenScan BlastWrapper Restriction enzyme map restrict SignalP TargetP PSORTII sixpack Predicts cellular location transeq CpG Island locations and % cpgreport Identifies functional and structural domains/motifs InterPro RepeatMasker Repetitive elements ORFs Hydrophobic regions Pepwindow? Octanol? Blastn Vs nr, est databases. ncbiBlastWrapper

  10. The Williams Workflows A B C A: Identification of overlapping sequence B: Characterisation of nucleotide sequence C: Characterisation of protein sequence

  11. The Workflow Experience Have workflows delivered on their promise? YES! • Correct and Biologically meaningful results • Automation • Saved time, increased productivity • Process split into three, you still require humans! • Sharing • Other people have used and want to develop the workflows • Change of work practises • Post hoc analysis. Don’t analyse data piece by piece receive all data all at once • Data stored and collected in a more standardised manner • Results amplification • Results management and visualisation

  12. The Workflow Experience • Activation energy versus Reusability trade-off • Lack of ‘available’ services, levels of redundancy can be limited • But once available can be reused for the greater good of the community • Licensing of Bioinformatics Applications • Means can’t be used outside of licensing body • No license = access third-party websites • Instability of external services • Research level • Reliant on other peoples servers • Taverna can retry or substitute before graceful failure • Shims

  13. Shims shim (sh m) n. A thin, often tapered piece of material used to fill gaps, make something level, or adjust something to fit properly. shimmed,shim·ming,shims To fill in, level, or adjust by using shims or a shim. • Explicitly capturing the process • Unrecorded ‘steps’ which aren’t realised until attempting to build something • Enable services to fit together

  14. ‘I want to identify new sequences which overlap with my query sequence and determine if they are useful’ Sequence database entry Fasta format sequence Genbank format sequence Sequence i.e. last known 3000bp Identify new sequences and determine their degree of identity Mask BLAST Simplify and Compare Retrieve Lister Old BLAST result Alignment of full query sequence V full ‘new’ sequence BLAST2 Shims

  15. WBSCR21 WBSCR27 WBSCR24 WBSCR18 WBSCR22 WBSCR28 STX1A CLDN3 CLDN4 RP11-148M21 RP11-731K22 RP11-622P13 314,004bp extension All nine known genes identified The Biological Results Four workflow cycles totalling ~ 10 hours The gap was correctly closed and all known features identified WBSCR14 ELN CTA-315H11 CTB-51J22

  16. Conclusions • It works – a new tool has been developed which is being utilised by biologists • More regularly undertaken, less mundane, less error prone • Once notification is installed won’t even need to initiate it • More systematic collection and analysis of results • Increased productivity • Services: only as good as the individual services, lots of them, we don’t own them, many are unique and at a single site, research level software, reliant on other peoples services, licenses • Activation energy

  17. Future Directions • Scheduling and Notification • Portals • Results visualisation • Re-use: other genomic disorders, Graves Disease

  18. www.mygrid.org.uk Acknowledgments • Dr May Tassabehji • Prof Andy Brass • Medical Genetics team at St Marys Hospital, Manchester • Wellcome Trust

  19. myGrid People www.mygrid.org.uk Core Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizaukaus, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pockock Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe. Users Simon Pearce and Claire Jennings, Institute of Human Genetics School of Clinical Medical Sciences, University of Newcastle, UK Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK Postgraduates Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire Industrial Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM) Robin McEntire (GSK) Collaborators Keith Decker

More Related