This presentation surveys how data mining is applied to biological and medical data. It covers epitope prediction, gene expression analysis, protein interaction extraction from scientific abstracts, and medical record analysis. The expected benefits are better drugs and treatment for patients, savings in time and cost for pharmaceutical companies, and better science for researchers. The underlying methods include artificial neural networks, emerging patterns, hidden Markov models, Bayesian inference, and decision tree induction.
Datamining: Turning Biological Data into Gold
Limsoon Wong, KRDL
What is Datamining?
[Figure: Jonathan’s blocks and Jessica’s blocks. Whose block is this?]
• Jonathan’s rules: Blue or Circle
• Jessica’s rules: All the rest
What is Datamining? Question: Can you explain how?
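The block rules from the preceding slide can be written down as a tiny classifier. This is only a sketch of the rule as stated on the slide; the function name `owner` and the string encoding of colour and shape are assumptions made for illustration.

```python
def owner(colour: str, shape: str) -> str:
    """Apply the slide's rules: Jonathan owns any block that is blue or a circle."""
    if colour == "blue" or shape == "circle":
        return "Jonathan"
    return "Jessica"  # Jessica's rule: all the rest


print(owner("red", "circle"))   # Jonathan
print(owner("red", "square"))   # Jessica
```

Data mining works in the opposite direction: given labelled blocks, it tries to recover a rule like this one.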
What are the Benefits?
• To the patient: better drug, better treatment
• To the pharma: save time, save cost, make more $
• To the scientist: better science
Epitope Prediction TRAP-559AA MNHLGNVKYLVIVFLIFFDLFLVNGRDVQNNIVDEIKYSE EVCNDQVDLYLLMDCSGSIRRHNWVNHAVPLAMKLIQQLN LNDNAIHLYVNVFSNNAKEIIRLHSDASKNKEKALIIIRS LLSTNLPYGRTNLTDALLQVRKHLNDRINRENANQLVVIL TDGIPDSIQDSLKESRKLSDRGVKIAVFGIGQGINVAFNR FLVGCHPSDGKCNLYADSAWENVKNVIGPFMKAVCVEVEK TASCGVWDEWSPCSVTCGKGTRSRKREILHEGCTSEIQEQ CEEERCPPKWEPLDVPDEPEDDQPRPRGDNSSVQKPEENI IDNNPQEPSPNPEEGKDENPNGFDLDENPENPPNPDIPEQ KPNIPEDSEKEVPSDVPKNPEDDREENFDIPKKPENKHDN QNNLPNDKSDRNIPYSPLPPKVLDNERKQSDPQSQDNNGN RHVPNSEDRETRPHGRNNENRSYNRKYNDTPKHPEREEHE KPDNNKKKGESDNKYKIAGGIAGGLALLACAGLAYKFVVP GAATPYAGEPAPFDETLGEEDKDLDEPEQFRLPEENEWN
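A predictor usually scores short peptides, not the whole protein, so a typical first step is to slide a fixed-length window along the sequence. The sketch below shows only that windowing step. The window length of 9 is an assumption (the slides do not state it), and `candidate_peptides` is a hypothetical helper name.

```python
def candidate_peptides(sequence: str, length: int = 9):
    """Yield (1-based start position, peptide) for every window of the given length."""
    for start in range(len(sequence) - length + 1):
        yield start + 1, sequence[start:start + length]


# First 20 residues of the TRAP sequence shown on the slide.
prefix = "MNHLGNVKYLVIVFLIFFDL"
peptides = list(candidate_peptides(prefix))
print(len(peptides))   # 12
print(peptides[0])     # (1, 'MNHLGNVKY')
```

Under the same assumption, the full 559-residue protein would yield 551 candidate windows, each of which would then be scored by the prediction model.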
Epitope Prediction Results
• Prediction by our ANN model for HLA-A11: 29 predictions, 22 epitopes, 76% specificity
• Prediction by BIMAS matrix for HLA-A*1101
[Figure: number of experimental binders by BIMAS rank (axis marks 1, 66, 100): 19 (52.8%), 5 (13.9%), 12 (33.3%)]
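The percentages on this slide can be checked with a few lines of arithmetic. The 76% figure matches 22 epitopes out of 29 predictions, and the three BIMAS groups add up to 36 experimental binders. Variable names here are illustrative only.

```python
# ANN model: 22 of the 29 predictions were epitopes.
predictions, epitopes = 29, 22
print(f"{epitopes / predictions:.0%}")       # 76%

# BIMAS figure: experimental binders in the three rank groups.
binders = [19, 5, 12]
total = sum(binders)
print(total)                                  # 36
print([f"{b / total:.1%}" for b in binders])  # ['52.8%', '13.9%', '33.3%']
```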
Gene Expression Analysis
• Clustering gene expression profiles
• Classifying gene expression profiles
• Finding stable differentially expressed genes
Gene Expression Analysis Results
• The Discovery System
• Correlation test
• Voter selection
• Class prediction
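The slide names three steps (correlation test, voter selection, class prediction) without giving details. One common way to realise such a pipeline is sketched below: rank genes by their correlation with the class label, keep the top few as voters, and let each voter vote for the class whose mean expression is closer. This is an assumption about the general approach, not a description of the Discovery System itself, and the data are invented toy values.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def select_voters(profiles, labels, k):
    """Correlation test + voter selection: keep the k genes most correlated with the label."""
    n_genes = len(profiles[0])
    scores = [(abs(pearson([p[g] for p in profiles], labels)), g) for g in range(n_genes)]
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def predict(profiles, labels, voters, sample):
    """Class prediction: each voter gene votes for the class with the closer mean."""
    votes = 0
    for g in voters:
        m0 = mean(p[g] for p, y in zip(profiles, labels) if y == 0)
        m1 = mean(p[g] for p, y in zip(profiles, labels) if y == 1)
        votes += 1 if abs(sample[g] - m1) < abs(sample[g] - m0) else -1
    return 1 if votes > 0 else 0


# Toy data, invented for illustration: 4 samples x 3 genes, classes 0 and 1.
profiles = [[1.0, 5.0, 2.0], [1.2, 5.1, 7.0], [3.0, 5.0, 2.5], [3.1, 4.9, 6.5]]
labels = [0, 0, 1, 1]
voters = select_voters(profiles, labels, k=1)
print(voters)                                               # [0]
print(predict(profiles, labels, voters, [2.9, 5.0, 4.0]))   # 1
```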
Protein Interaction Extraction
“What are the protein-protein interaction pathways from the latest reported discoveries?”
[Figure label: WEB]
Protein Interaction Extraction Results
• Rule-based system for processing free texts in scientific abstracts
• Specialized in extracting protein names and extracting protein-protein interactions
[Figure label: Jak1]
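To make the idea of a rule-based extractor concrete, here is a minimal sketch of a single pattern of the form "protein, interaction verb, protein". The verb list, the crude protein-name pattern, and the example sentence are all assumptions made for illustration; the actual rules of the system are not given in the slides.

```python
import re

# Hypothetical rule: "<Protein> <interaction verb> <Protein>".
VERBS = r"(?:interacts with|binds|phosphorylates|activates)"
# Crude name pattern (assumption): a capitalised token that contains a digit.
NAME = r"[A-Z][A-Za-z]*\d+[A-Za-z0-9]*"
RULE = re.compile(rf"\b({NAME})\s+({VERBS})\s+({NAME})\b")


def extract_interactions(text):
    """Return (protein, verb, protein) triples matched by the single rule above."""
    return [m.groups() for m in RULE.finditer(text)]


# Made-up sentence, used only to exercise the rule.
sentence = "We found that Jak1 phosphorylates Stat3 in these cells."
print(extract_interactions(sentence))  # [('Jak1', 'phosphorylates', 'Stat3')]
```

A real system would need many such rules plus a proper protein-name recogniser, since a single regular expression misses most phrasings.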
Medical Record Analysis
• Looking for patterns that are valid, novel, useful, and understandable
Medical Record Analysis Results
• DeEPs, a novel “emerging pattern” method
• Beats C4.5, CBA, LB, NB, TAN in 21 out of 32 UCI benchmarks
• Works for gene expressions
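An emerging pattern is, in the usual definition, a set of items whose support rises sharply from one class of records to another; the ratio of the two supports is called the growth rate. The sketch below computes only that ratio on invented toy records. It is not the DeEPs algorithm, and the function names are illustrative.

```python
def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    if not records:
        return 0.0
    return sum(itemset <= r for r in records) / len(records)


def growth_rate(itemset, background, target):
    """Support in target divided by support in background (inf if absent from background)."""
    s_bg, s_tg = support(itemset, background), support(itemset, target)
    if s_bg == 0:
        return float("inf") if s_tg > 0 else 0.0
    return s_tg / s_bg


# Toy records, invented for illustration: each record is a set of attribute=value items.
healthy = [{"fever=no", "cough=no"}, {"fever=no", "cough=yes"},
           {"fever=yes", "cough=no"}, {"fever=no", "cough=no"}]
sick = [{"fever=yes", "cough=yes"}, {"fever=yes", "cough=yes"},
        {"fever=yes", "cough=no"}, {"fever=no", "cough=yes"}]

print(growth_rate({"fever=yes"}, healthy, sick))               # 3.0
print(growth_rate({"fever=yes", "cough=yes"}, healthy, sick))  # inf
```

Patterns with a large growth rate are the ones that distinguish the target class, which is what makes them usable for classification.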
Under the Hood
• Artificial neural network
• Neighbourhood analysis
• Non-linear analysis
• Template matching
• Emerging pattern
• Hidden Markov models
• Bayesian inference
• Decision tree induction
• ...
Behind the Scene
• Epitope Prediction: Vladimir Brusic, Judice Koh, Seah Seng Hong, Zhang Guanglan, Yu Kun
• Transcription Start Prediction: Vladimir Bajic, Seah Seng Hong
• Gene Expression Analysis: Zhang Louxin, Zhang Zhuo, Zhu Song
• Medical Records: Li Jinyan
• Protein Interaction Extraction: Ng See Kiong, Zhang Zhuo