“the world’s 1st structured network pattern database technology”
Introductions
Robert Hercus, CSO & Founder
• Australian, over 30 years’ IT experience
• Pioneered many large-scale IT projects
• “Language of Biology”: the basis of Synamatix
• Interests: linguistics, genomics, artificial intelligence, neural networks
Four common perceptions
1. “Bioinformatics companies are application-dependent.”
   X No: Synamatix is a database technology company that ALSO develops applications to demonstrate its technology.
2. “Using your applications will lock us in.”
   X No: YOU can develop your own applications, or ask someone else to do it.
3. “Buying proprietary software means we cannot modify it or understand how it works.”
   X No: Synamatix gives away the source code for applications built on SynaBASE.
4. “We will have to replace current software and investments.”
   X No: SynaBASE is designed to be open, so it integrates with EXISTING IT infrastructure, software investments and installations.
Not just target discovery: broad applications
Target ID → Lead ID → Lead Optimisation → Toxicology & Assays → Clinical drug candidate
• Personal genomics
• Proteomics
• Non-sequence data
• Phylogenetics
• Comparative genomics
• Chip design
• Sequence mining
• Mapping
• Motifs
• Annotation
• Clustering & assembly
Open, shareable applications
[Figure: data repository, internal software and bought-in software connected through an integration interface, steps 1 to 5]
Open application development
[Figure: internal software, bought-in software and data repository, extended by external and internal development, steps 1 to 5]
5 Unique features
Patterns and structures
• Finds, stores, relates and structures PATTERNS, not FLAT FILES
1. Patterns & Network
[Figure: the five unique features]
1. Patterns & Network
2. Significance
3. Scale
4. Speed
5. Novel Applications
Patterns – forward and reverse: ATGTGGT
• Forward: A, AT, ATG, ATGT, ATGTG, ATGTGG
• Reverse: T, TG, TGG, TGGT, TGGTG, TGGTGT, TGGTGTA
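A minimal Python sketch of the forward and reverse patterns on this slide, assuming “forward” means every prefix of the sequence and “reverse” means every prefix of the same sequence read right to left (not reverse-complemented). Under that reading it reproduces the slide’s lists for ATGTGGT. The names `forward_patterns` and `reverse_patterns` are illustrative only, not part of any SynaBASE API.

```python
def forward_patterns(seq: str) -> list[str]:
    """Every prefix of seq, read left to right: A, AT, ATG, ..."""
    return [seq[:i] for i in range(1, len(seq) + 1)]


def reverse_patterns(seq: str) -> list[str]:
    """Every prefix of seq read right to left (reversed, not complemented)."""
    return forward_patterns(seq[::-1])


if __name__ == "__main__":
    print(forward_patterns("ATGTGGT"))
    print(reverse_patterns("ATGTGGT"))
```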
Patterns – all forward intermediates of ATGTGGT
AT ATG ATGT ATGTG ATGTGG
TG TGT TGTG TGTGG TGTGGT
GT GTG GTGG GTGGT
TG TGG TGGT
GG GGT
GT
Patterns – all reverse intermediates of ATGTGGT (read right to left: TGGTGTA)
TG TGG TGGT TGGTG TGGTGT
GG GGT GGTG GGTGT GGTGTA
GT GTG GTGT GTGTA
TG TGT TGTA
GT GTA
TA
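The forward and reverse intermediates are every substring of two or more bases, one row per starting position. The sketch below assumes that reading; `intermediates` is a hypothetical helper, and the whole sequence is left out because the slide shows it as the title rather than as a row. Passing the reversed sequence (TGGTGTA) yields the reverse intermediates.

```python
def intermediates(seq: str, min_len: int = 2) -> list[list[str]]:
    """All substrings of at least min_len characters, one row per start position.

    The whole sequence is excluded because on the slide it is the title, not a row.
    """
    rows = []
    for start in range(len(seq) - min_len + 1):
        rows.append([
            seq[start:end]
            for end in range(start + min_len, len(seq) + 1)
            if (start, end) != (0, len(seq))
        ])
    return rows


if __name__ == "__main__":
    seq = "ATGTGGT"
    for label, reading in (("forward", seq), ("reverse", seq[::-1])):
        print(label)
        for row in intermediates(reading):
            print("  " + " ".join(row))
```

Each row shrinks by one because a later start position leaves fewer characters to extend into, which is why the slide’s rows taper to a single pair.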
[Figure: all forward and reverse patterns of ATGTGGT, from single bases up to the full sequence, combined in one network]
SynaBASE is 100% exhaustive
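To illustrate what “100% exhaustive” means for this example, the sketch counts every substring of ATGTGGT in both reading directions, single bases included. A plain `collections.Counter` stands in for SynaBASE’s network storage, which these slides do not describe; `pattern_counts` is a hypothetical name.

```python
from collections import Counter


def pattern_counts(seq: str) -> Counter:
    """Count every substring of seq, read forward and read in reverse."""
    counts: Counter = Counter()
    # The reverse reading is the reversed string, not the reverse complement.
    for reading in (seq, seq[::-1]):
        for start in range(len(reading)):
            for end in range(start + 1, len(reading) + 1):
                counts[reading[start:end]] += 1
    return counts


if __name__ == "__main__":
    counts = pattern_counts("ATGTGGT")
    print(sum(counts.values()), "occurrences,", len(counts), "distinct patterns")
    print("TG occurs", counts["TG"], "times")
```

All 28 forward and 28 reverse substrings are recorded; a repeated pattern such as TG is kept once with a count rather than stored again.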
TEMPORAL/SPATIAL: “events related by time or proximity are associated”
[Figure: events A, B and C associated across a time interval dt]
ASSOCIATIVE
• Precise
• Can recall
• Computationally simple
• Multi-level network structure
• Updating is simple
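One simple way to model “events related by time or proximity are associated” is a windowed co-occurrence rule: link any two events whose timestamps differ by at most dt. This is an assumed rule for illustration, not SynaBASE’s documented behaviour; `associate` is a hypothetical helper.

```python
from itertools import combinations


def associate(events: list[tuple[str, float]], dt: float) -> set[tuple[str, str]]:
    """Return the pairs of event labels whose timestamps differ by at most dt."""
    links: set[tuple[str, str]] = set()
    for (a, t_a), (b, t_b) in combinations(events, 2):
        if abs(t_a - t_b) <= dt:
            # Store each pair in sorted order so (A, B) and (B, A) are one link.
            first, second = sorted((a, b))
            links.add((first, second))
    return links


if __name__ == "__main__":
    events = [("A", 0.0), ("B", 1.0), ("C", 5.0)]
    print(associate(events, dt=2.0))
```

Updating stays simple under this rule: a new event only needs to be compared with events inside its dt window.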
SynaBASE can address diverse data types
• Patterns can be associated based on TEMPORAL or SPATIAL characteristics
• Sequence data – SPATIAL/TEMPORAL
• Protein data – INTERACTION
• Gene expression – TEMPORAL
• Phylogeny – DISTANCE MEASURES
• Text mining – SPATIAL INTERACTIONS
• Transcription factors – REGULATORY INTERACTIONS
• etc.
1. Patterns and structures
2. Significance and frequency
SynaBASE automatically learns and maintains the significance of patterns and data.
Relying on frequency alone is inadequate…
“The elephant and the giraffe walked up the mountain”
[Figure: frequency of string (word) patterns in the sentence; frequency does not reflect meaning]
“The elephant and the giraffe walked up the mountain”
[Figure: probabilities of predicting predecessor and successor characters/events (string significance); significance reflects meaning]
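The contrast on this slide can be reproduced on the example sentence. Raw word frequency is dominated by “the”, while a conditional successor probability (how often a context string is followed by a given character) captures predictable structure. The sketch assumes significance means P(next character | preceding string), which matches the ratios in the atggtgat example; the slides do not give SynaBASE’s exact definition, and the helper names are hypothetical.

```python
from collections import Counter

SENTENCE = "The elephant and the giraffe walked up the mountain"


def word_frequencies(text: str) -> Counter:
    """Frequency: how often each word occurs (case-insensitive)."""
    return Counter(text.lower().split())


def successor_probability(text: str, context: str, nxt: str) -> float:
    """Fraction of occurrences of `context` that are immediately followed by `nxt`."""
    text, context, nxt = text.lower(), context.lower(), nxt.lower()
    # Positions where the context occurs and at least one character follows it.
    starts = [i for i in range(len(text) - len(context)) if text.startswith(context, i)]
    if not starts:
        return 0.0
    hits = sum(1 for i in starts if text[i + len(context)] == nxt)
    return hits / len(starts)


if __name__ == "__main__":
    print(word_frequencies(SENTENCE).most_common(3))
    print(successor_probability(SENTENCE, "th", "e"))
    print(successor_probability(SENTENCE, "e", "l"))
```

“the” has the highest frequency (3), yet it says little; by contrast “th” is followed by “e” every time (probability 1.0), whereas “e” is followed by “l” in only 1 of its 7 occurrences.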
Significance – forward and reverse: “elephant”
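Significance of successive extensions of “atggtgat”
(significance = count of pattern ÷ count of the pattern one base shorter)
Pattern     Count   Significance
a           64000   n/a
at          17000   17000/64000 = 26.6%
atg          4930   4930/17000 = 29.0%
atgg         1725   1725/4930 = 35.0%
atggt         760   760/1725 = 44.1%
atggtg        500   500/760 = 65.8%
atggtga       355   355/500 = 71.0%
atggtgat      266   266/355 = 74.9%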
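The percentages for “atggtgat” are each pattern’s count divided by the count of the pattern one base shorter, so significance rises as the extension becomes more predictable. The sketch recomputes them from the listed counts; `extension_significance` is an illustrative name, and the ratio is inferred from the arithmetic on the slide rather than from a published SynaBASE formula.

```python
# Occurrence counts for successive extensions of "atggtgat", taken from the slide.
COUNTS = [
    ("a", 64000), ("at", 17000), ("atg", 4930), ("atgg", 1725),
    ("atggt", 760), ("atggtg", 500), ("atggtga", 355), ("atggtgat", 266),
]


def extension_significance(counts: list[tuple[str, int]]) -> list[tuple[str, float]]:
    """For each pattern after the first, return count(pattern) / count(previous pattern)."""
    return [
        (pattern, n / prev_n)
        for (_, prev_n), (pattern, n) in zip(counts, counts[1:])
    ]


if __name__ == "__main__":
    for pattern, ratio in extension_significance(COUNTS):
        print(f"{pattern:<9} {ratio:6.1%}")
```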
Frequency v Significance
[Figure: FREQUENCY vs. SIGNIFICANCE for human placental ribonuclease inhibitor]
Gene models and “SIGNIFICANCE” correlation
[Figure: Ensembl gene models for F2, F3 and the PIM1 oncogene]
SIGNIFICANCE and conservation
[Figure: multi-species comparison, as presented by Eric Green, ISMB 2004]
Pattern significance: first 500 kbp of human chromosome 7
• 6-genome database: 3.80 s
• Human genome: 7.41 s
• Mouse genome: 3.88 s
• Dog genome: 6.01 s
1. Patterns and structures
2. SIGNIFICANCE
3. Scale and 4. Speed
• A unique method for structuring data makes ultra-high-throughput applications routinely accessible
[Figure: size of database vs. number of human genome copies (Genome 1 to Genome 10)]
[Figure: size of database vs. number of genomes (Genome 1 to Genome 10)]
Analysis speed scales as log₂ n
[Figure: analysis time in milliseconds (0 to 900) vs. size of database in Gbp (1, 10, 100, 1000), Conventional vs. SynaBASE]
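As a rough worked example of what log₂ n scaling implies, assume n is the database size in base pairs (the slide does not say how SynaBASE indexes its data). Going from 1 Gbp to 1000 Gbp raises log₂ n only from about 29.9 to about 39.9: roughly 10 extra halving steps for 1000 times more data. `log2_steps` is a hypothetical helper.

```python
import math


def log2_steps(size_gbp: float) -> float:
    """log2 of a database size given in giga base pairs (1 Gbp = 10**9 bp)."""
    return math.log2(size_gbp * 10**9)


if __name__ == "__main__":
    # Database sizes taken from the chart's x-axis.
    for size in (1, 10, 100, 1000):
        print(f"{size:>5} Gbp -> log2(n) = {log2_steps(size):.1f}")
```

Each tenfold increase in size adds only log₂ 10 ≈ 3.3 steps, which is why a log₂ n curve stays nearly flat on the chart while a linear scan would grow with the data.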
Building a SynaBASE is fast!
• Swiss-Prot raw sequence data: 8 minutes
• Search, compare, keywords, annotations: fast and significant results
Analyse a marker across 100s of genomes in 100 milliseconds
[Figure: sequence data from all prokaryotes, all eukaryotes, all plants, viruses, human and mouse; all genomes, all variants, all multiples]
Personal genome & personalised medicine
Ultra-high-throughput biomarker mapping and analysis
[Figure: human wild-type, cancer, mouse and dog genomes]
1. Patterns and structures
2. SIGNIFICANCE
3. Scale and 4. Speed
5. Future-proof
• A non-brute-force investment
• Hardware independence leads to novel applications or challenging research projects
Third-party applications
[Figure: system architecture. Labels: Users; Windows / Linux; Linux Itanium; C++; Java; Java Servlets; HTML; Application Servers; WWW Interface; Custom Applications; SUITE]
Ultra High Throughput (UHT)
• Massively parallel single-molecule sequencing analysis
• Real-time proteomics
• Comparative genomics
• Probe design / testing
• Personalised medicine
• Clinical diagnostics
Summary
• Unique pattern network database
• Maintains patterns and their relationships
• Able to derive significance from data “a priori”
• Self-learning mechanism
• Accuracy & speed
• Developed the world’s 1st genomics platform capable of addressing demanding new applications:
  • Truly future-proof: hardware independent
  • Scalable
  • Ultra-high-throughput genome analysis