Bio-Medical Informatics

Instructor: Hanif Yaghoobi • Website: site444703.44.webydo.com • E-mail: Hyiautcourse@gmail.com • Personal e-mail: hanifeyaghoobi@gmail.com

About this Course • Activities during the semester (5 points): 1) homework, 2) MATLAB exercises • Final project (3 points) • Final exam (12 points)

Shortliffe: “Medical informatics is the rapidly developing scientific field that deals with resources, devices and formalized methods for optimizing the storage, retrieval and management of biomedical information for problem solving and decision making.” (Edward Shortliffe, MD, PhD, 1995)

Organisms • Organisms are classified into two types: • Eukaryotes: contain a membrane-bound nucleus and organelles (plants, animals, fungi, …) • Prokaryotes: lack a true membrane-bound nucleus and organelles (single-celled; includes bacteria) • Not all single-celled organisms are prokaryotes! (e.g. baker’s yeast is a single-celled eukaryote)

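Cells • A cell is a complex system enclosed in a membrane • Organisms are unicellular (bacteria, baker’s yeast) or multicellular • Humans: • about 60 trillion cells (a commonly quoted figure; more recent estimates are around 30 to 40 trillion) • about 320 cell types • Example: animal cell (image source: www.ebi.ac.uk/microarray/biology_intro.htm)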

DNA Basics – cont. • In eukaryotes, DNA is organized into chromosomes.

Chromosomes • In eukaryotes, the nucleus contains one or more double-stranded DNA molecules organized as chromosomes • Humans: • 22 pairs of autosomes • 1 pair of sex chromosomes • Human karyotype (image source: http://avery.rutgers.edu/WSSP/StudentScholars/Session8/Session8.html)

Image source: www.biotec.or.th/Genome/whatGenome.html

What is DNA? • DNA: deoxyribonucleic acid • Each strand is a polymer (polynucleotide) chain of nucleotides; in cells, DNA usually occurs as two complementary strands forming a double helix • 4 different nucleotides, named after their bases: • Adenine (A) • Cytosine (C) • Guanine (G) • Thymine (T)

Nucleotide Bases • Purines (A and G) • Pyrimidines (C and T) • The two classes differ in base structure: purines have a double ring, pyrimidines a single ring • Image source: www.ebi.ac.uk/microarray/biology_intro.htm
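To make the purine/pyrimidine split concrete, here is a minimal Python sketch (the helper name `base_composition` is hypothetical) that counts each base in a sequence and totals the two classes:

```python
from collections import Counter

# Base classes as listed on the slide.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def base_composition(seq: str) -> dict:
    """Count each base and total the purines and pyrimidines (hypothetical helper)."""
    seq = seq.upper()
    unknown = set(seq) - PURINES - PYRIMIDINES
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    counts = Counter(seq)
    return {
        "counts": {base: counts[base] for base in "ACGT"},
        "purines": sum(counts[b] for b in PURINES),
        "pyrimidines": sum(counts[b] for b in PYRIMIDINES),
    }


if __name__ == "__main__":
    print(base_composition("atgatcccaaatggaca"))
```

For the fragment atgatcccaaatggaca this reports 10 purines (7 A, 3 G) and 7 pyrimidines (4 C, 3 T).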

DNA

The Central Dogma (figure: Genome -> Transcription -> Transcriptome -> Translation -> Proteome -> Cell Function; labels: Protein Synthesis, Gene Expression Level)

Genome • The chromosomal DNA of an organism • The number of chromosomes and the genome size vary considerably from one organism to another • Genome size and number of genes do not necessarily determine organism complexity

Genome Comparison
ORGANISM                              CHROMOSOMES (haploid)   GENOME SIZE (bp)   GENES
Homo sapiens (human)                  23                      3,200,000,000      ~30,000
Mus musculus (mouse)                  20                      2,600,000,000      ~30,000
Drosophila melanogaster (fruit fly)   4                       180,000,000        ~18,000
Saccharomyces cerevisiae (yeast)      16                      14,000,000         ~6,000
Zea mays (corn)                       10                      2,400,000,000      ?
Note: the human gene count reflects early estimates; current estimates are about 20,000 protein-coding genes.
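The claim that genome size does not track complexity can be checked with simple arithmetic. The sketch below, with a hypothetical helper `bp_per_gene`, divides genome size by gene count using the figures from the Genome Comparison table (corn is skipped because its gene count is not given):

```python
# Figures copied from the Genome Comparison table (corn omitted: no gene count given).
GENOMES = {
    "Homo sapiens": (3_200_000_000, 30_000),
    "Mus musculus": (2_600_000_000, 30_000),
    "Drosophila melanogaster": (180_000_000, 18_000),
    "Saccharomyces cerevisiae": (14_000_000, 6_000),
}


def bp_per_gene(genome_size_bp: int, gene_count: int) -> float:
    """Average number of base pairs per gene (a crude density measure)."""
    return genome_size_bp / gene_count


if __name__ == "__main__":
    for name, (size, genes) in GENOMES.items():
        print(f"{name:<26} {bp_per_gene(size, genes):>10,.0f} bp per gene")
```

With these figures the human genome carries roughly 106,667 bp per gene and yeast about 2,333, a more than 40-fold difference, while human and mouse list the same gene count. This is only a crude density measure: it says nothing about how much of the extra DNA is functional.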

DNA Basics – cont. • The DNA in each chromosome can be read as a discrete signal over the alphabet {a, t, c, g} (for example: atgatcccaaatggaca…).
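Reading DNA as a discrete signal usually means mapping the letters to numbers first. One common choice in genomic signal processing is four binary indicator sequences, one per base. The Python sketch below assumes that mapping (the function name is hypothetical); other numeric mappings exist, and the choice affects later analysis:

```python
def indicator_sequences(seq: str) -> dict[str, list[int]]:
    """Map a DNA string to four binary signals u_a, u_c, u_g, u_t.

    u_b[n] is 1 if position n holds base b, otherwise 0.
    """
    seq = seq.lower()
    unknown = set(seq) - set("acgt")
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    return {base: [1 if s == base else 0 for s in seq] for base in "acgt"}


if __name__ == "__main__":
    for base, signal in indicator_sequences("atgatcccaaatggaca").items():
        print(base, signal)
```

Each position is 1 in exactly one of the four signals, so together they encode the original string without loss; they can then be passed to standard signal-processing tools such as the DFT (for example in MATLAB).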

DNA Basics – cont. • In genes (protein-coding regions), the nucleotides (letters) are read as triplets called codons when the protein is built from amino acids. Each codon specifies one amino acid (there are 20 amino acids) or signals the end of translation (a stop codon).
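With 4 bases there are 4^3 = 64 possible codons but only 20 amino acids plus a stop signal, so most amino acids are encoded by several codons. Below is a minimal Python sketch of codon-by-codon translation using the standard genetic code; the compact table layout and the function name are this sketch's own choices, and `*` marks a stop codon:

```python
from itertools import product

# Standard genetic code; codons enumerated in T, C, A, G order at each position,
# matching the order of the 64-letter amino-acid string ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate codon by codon from position 0 (leftover bases are ignored)."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


if __name__ == "__main__":
    print(translate("ATGCTCAGCGTGACCTCA"))
    print(translate("CAGCGTTAA"))
```

For example, ATGCTCAGCGTGACCTCA translates to MLSVTS and CAGCGTTAA to QR* (the final `*` is the stop codon TAA).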

DNA Basics – cont. • There are 6 ways of translating the DNA signal into a codon signal, called the reading frames (3 offsets × 2 strands). Example sequence: …CATTGCCAGT…
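The six reading frames can be generated directly: three offsets on the given strand and three on the reverse complement (the other strand, read in its own 5' to 3' direction). A minimal Python sketch with hypothetical function names; the frame labels +1 to +3 and -1 to -3 are an assumed convention, since the numbering of reverse frames varies between tools:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read in its own 5'->3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def codons(seq: str, offset: int) -> list[str]:
    """Split seq into complete triplets starting at offset (leftover bases dropped)."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(seq: str) -> dict[str, list[str]]:
    """Codons in the 3 forward (+1..+3) and 3 reverse-complement (-1..-3) frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = codons(seq, offset)
        frames[f"-{offset + 1}"] = codons(rc, offset)
    return frames


if __name__ == "__main__":
    for name, frame in six_frames("CATTGCCAGT").items():
        print(name, " ".join(frame))
```

Taking the example as exactly CATTGCCAGT, frame +1 gives CAT TGC CAG, and frame -2 (on the reverse complement ACTGGCAATG) gives CTG GCA ATG.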

DNA Basics – cont. • Start codon: ATG • Stop codons: TAA, TGA, TAG • (Figure: a gene within …CATTGCCAGT…, shown as exons separated by introns)

Understanding Genome Sequences ~3,289,000,000 characters: aattgtgctctgcaaattatgatagtgatctgtatttactacgtgcatat attttgggccagtgaatttttttctaagctaatatagttatttggacttt tgacatgactttgtgtttaattaaaacaaaaaaagaaattgcagaagtgt tgtaagcttgtaaaaaaattcaaacaatgcagacaaatgtgtctcgcagt cttccactcagtatcatttttgtttgtaccttatcagaaatgtttctatg tacaagtctttaaaatcatttcgaacttgctttgtccactgagtatatta tggacatcttttcatggcaggacatatagatgtgttaatggcattaaaaa taaaacaaaaaactgattcggccgggtacggtggctcacgcctgtaatcc cagcactttgggagatcgaggagggaggatcacctgaggtcaggagttac agacatggagaaaccccgtctctactaaaaatacaaaattagcctggcgt ggtggcgcatgcctgtaatcccagctactcgggaggctgaggcaggagaa tcgcttgaacccgggagcggaggttgcggtgagccgagatcgcaccgttg cactccagcctgggcgacagagcgaaactgtctcaaacaaacaaacaaaa aaacctgatacatggtatgggaagtacattgtttaaacaatgcatggaga tttaggttgtttccagtttttactggcacagatacggcaatgaatataat tttatgtatacattcatacaaatatatcggtggaaaattcctagaagtgg aatggctgggtcagtgggcattcatattgagaaattggaaggatgttgtc aaactctgcaaatcagagtattttagtcttaacctctcttcttcacaccc ttttccttggaagaaagctaaatttagacttttaaacacaaaactccatt ttgagacccctgaaaatctgggttcaaagtgtttgaaaattaaagcagag gctttaatttgtacttatttaggtataatttgtactttaaagttgttcca. . . Goal: Identify components encoded in the DNA sequence

Open Reading Frame • Example: ATGCTCAGCGTGACCTCA . . . CAGCGTTAA translates to M L S V T S . . . Q R STOP • A protein-encoding DNA sequence consists of a sequence of 3-letter codons • It starts with the START codon (ATG) • It ends with a STOP codon (TAA, TAG, or TGA)
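The ORF rules on this slide translate into a short check. In this Python sketch (hypothetical helper), a sequence counts as an ORF if it is a whole number of codons, starts with ATG, ends with a stop codon and has no stop codon in between; the last condition is implied rather than stated on the slide:

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_orf(seq: str) -> bool:
    """True if seq is ATG ... stop, a whole number of codons, with no internal stop."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != START_CODON or codons[-1] not in STOP_CODONS:
        return False
    # A stop codon inside the frame would end the protein early.
    return not any(c in STOP_CODONS for c in codons[1:-1])


if __name__ == "__main__":
    # Hypothetical: the two visible fragments joined, without the elided middle part.
    print(is_orf("ATGCTCAGCGTGACCTCA" + "CAGCGTTAA"))  # True
    print(is_orf("ATGTAACTCTAA"))                     # False: TAA inside the frame
```

The positive example simply concatenates the two visible fragments for illustration; the real ORF contains more codons in the part shown as ". . .".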

Finding Open Reading Frames • Try all possible starting points: • 3 possible offsets • 2 possible strands • A simple algorithm finds all ORFs in a genome • Many of these are spurious (they are not real genes) • How do we focus on the real ones?
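A sketch of the simple algorithm that tries all 3 offsets on both strands. This Python version (hypothetical names) reports, for each frame, the stretch from the first ATG after a stop to the next in-frame stop. The default minimum length of 100 codons is an arbitrary assumption, one crude way to discard short ORFs; it does not solve the spurious-ORF problem, which the following slides address with conservation:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int, int, str]]:
    """Scan 3 offsets x 2 strands; report (strand, offset, start, end, orf).

    Each ORF runs from the first ATG after the previous stop to the next in-frame
    stop codon. Coordinates on the '-' strand index the reverse complement.
    """
    seq = seq.upper()
    strands = (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1]))
    orfs = []
    for strand, s in strands:
        for offset in range(3):
            start = None
            for i in range(offset, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i
                elif codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, offset, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))
```

ATGs nested inside an ORF that is already open are not reported separately, and an ATG with no downstream stop is ignored.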

Using Additional Genomes • Basic premise: “What is important is conserved” • Evolution = Variation + Selection • Variation is random • Selection reflects function • Idea: • Instead of studying a single genome, compare related genomes • A real open reading frame will be conserved across them

Phylogenetic Tree of Yeasts (Kellis et al., Nature 2003) • Species: S. cerevisiae, S. paradoxus, S. mikatae, S. bayanus, C. glabrata, S. castellii, K. lactis, A. gossypii, K. waltii, D. hansenii, C. albicans, Y. lipolytica, N. crassa, M. graminearum, M. grisea, A. nidulans, S. pombe • Scale: ~10M years

Evolution of Open Reading Frame • Aligned sequences:
S. cerevisiae   ATGCTCAGCGTGACCTCA . . .
S. paradoxus    ATGCTCAGCGTGACATCA . . .
S. mikatae      ATGCTCAGGGTGACA--A . . .
S. bayanus      ATGCTCAGG---ACA--A . . .
• Labels: conserved positions, variable positions, a deletion, a frame shift • A frame shift (a gap whose length is not a multiple of 3) changes the interpretation of the downstream sequence
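The alignment on this slide distinguishes conserved and variable columns and two kinds of gaps. A gap whose length is a multiple of 3 removes whole codons and keeps the downstream frame, while any other length shifts it. The Python sketch below (hypothetical helper names) applies exactly these rules to the four aligned sequences; it takes the alignment as given and does not decide whether a gap is a deletion in one species or an insertion in the others:

```python
import re

# Aligned sequences copied from the slide ('-' = gap in the alignment).
ALIGNMENT = {
    "S. cerevisiae": "ATGCTCAGCGTGACCTCA",
    "S. paradoxus": "ATGCTCAGCGTGACATCA",
    "S. mikatae": "ATGCTCAGGGTGACA--A",
    "S. bayanus": "ATGCTCAGG---ACA--A",
}


def column_status(seqs: list[str]) -> list[str]:
    """Label each alignment column as 'conserved', 'variable' or 'gap'."""
    labels = []
    for column in zip(*seqs):
        if "-" in column:
            labels.append("gap")
        elif len(set(column)) == 1:
            labels.append("conserved")
        else:
            labels.append("variable")
    return labels


def gap_events(aligned: str) -> list[tuple[int, int, str]]:
    """(position, length, kind) for each run of gaps in one aligned sequence."""
    events = []
    for match in re.finditer(r"-+", aligned):
        length = match.end() - match.start()
        # Whole codons removed keep the frame; anything else shifts it.
        kind = "in-frame deletion" if length % 3 == 0 else "frame shift"
        events.append((match.start(), length, kind))
    return events


if __name__ == "__main__":
    print("".join(label[0] for label in column_status(list(ALIGNMENT.values()))))
    for species, aligned in ALIGNMENT.items():
        print(species, gap_events(aligned))
```

For S. bayanus this reports a 3-base in-frame deletion at position 9 and a 2-base frame shift at position 15.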

Examples (figure labels: conserved, variable, frame shift, ATG not conserved, sequencing error; a confirmed ORF vs. a spurious ORF) • A greedy algorithm to find conserved ORFs is surprisingly effective (> 99% accuracy) on verified yeast data [Kellis et al., Nature 2003]

Defining Conservation • (Figure: an example alignment with columns labeled conserved and variable, and the % conservation of each column) • Naïve approach: consensus between all species • Problems: • Coarse-grained • Ignores distances between species • Ignores the tree topology • Goal: more sensitive and robust methods
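The naïve approach can be written as a per-column score: the percentage of species that share the column's most common base. This Python sketch uses hypothetical names and a made-up 4-species alignment, not the one in the figure:

```python
from collections import Counter


def percent_conservation(column: str) -> float:
    """Share of species (in %) that carry the most common base in this column."""
    top_count = Counter(column).most_common(1)[0][1]
    return 100.0 * top_count / len(column)


def alignment_conservation(seqs: list[str]) -> list[float]:
    """Naive per-column conservation over equal-length aligned sequences."""
    return [percent_conservation("".join(col)) for col in zip(*seqs)]


if __name__ == "__main__":
    # Hypothetical 4-species alignment (one string per species).
    print(alignment_conservation(["ACGA", "ACGC", "ACTG", "AGTT"]))
```

Every species counts equally, so two near-identical sister species that agree weigh as much as two distant ones; this is precisely the "ignores distances" and "ignores the tree topology" problem listed on the slide.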

Bioinformatics: an area of emerging knowledge • Each cell of the body contains the whole DNA of the individual (about 40,000 genes in the human genome according to early estimates; current estimates are about 20,000 protein-coding genes), each gene spanning from about 50 to about a million base pairs (A, T, C or G) • The Main Dogma in Genetics: DNA -> RNA -> proteins • Transcription: DNA (only about 5% of it) -> mRNA • DNA -> pre-mRNA -> splicing -> mRNA (only the exons) • Translation: mRNA -> proteins • Proteins make cells alive and specialised (e.g. blue eyes) • Genome -> proteome (N. Kasabov, 2003)
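The DNA -> pre-mRNA -> splicing -> mRNA -> protein chain can be sketched in a few functions. This Python example makes simplifying assumptions: it starts from the coding (sense) strand, so transcription is just T -> U; exon coordinates are supplied by hand for a made-up gene whose intron begins with GT and ends with AG (the common splice-site pattern); and translation stops at the first stop codon. All names are hypothetical:

```python
from itertools import product

# Standard genetic code written for RNA (U instead of T); '*' = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """pre-mRNA = coding (sense) strand with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Keep only the exon intervals [start, end) and join them in order."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna: str) -> str:
    """Read codons from position 0 until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical gene: exon 1 + intron (GT...AG) + exon 2.
    gene = "ATGCTC" + "GTAAGTCTAG" + "AGCGTGTAA"
    exons = [(0, 6), (16, 25)]
    mrna = splice(transcribe(gene), exons)
    print(mrna, translate(mrna))        # AUGCUCAGCGUGUAA MLSV
    print(translate(transcribe(gene)))  # intron left in: MLVSLERV
```

Translating the unspliced pre-mRNA gives a different protein (MLVSLERV instead of MLSV), which is why the introns must be removed before translation.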

Bioinformatics • The area of science concerned with the development and application of methods, tools and systems for storing and processing biological information to facilitate knowledge discovery. • Interdisciplinary: Information and Computer Science, Molecular Biology, Biochemistry, Genetics, Physics, Chemistry, Health and Medicine, Mathematics and Statistics, Engineering, Social Sciences. • Biology, Medicine -- Information Science --> IT, Clinics, Pharmacy • Links to health informatics, clinical DSS (decision support systems), the pharmaceutical industry (N. Kasabov, 2003)

Bioinformatics: challenging problems for computer and information sciences • Discovering patterns (features) in DNA and RNA sequences (e.g. genes, promoters, ribosome binding sites (RBS), splice junctions) • Analysis of gene expression data and prediction of protein abundance • Discovery of gene networks: genes that are co-regulated over time • Protein discovery and protein function analysis • Predicting the development of an organism from its DNA code (?) • Modeling the full development (metabolic processes) of a cell (?) • Implications: health, social, … (N. Kasabov, 2003)
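A very simple first step toward discovering patterns in sequences is counting overlapping k-mers (substrings of length k). The Python sketch below (hypothetical helper) does only that; real motif discovery would also compare these counts with what is expected by chance, which this sketch does not attempt:

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    counts = kmer_counts("atgatcccaaatggaca", 3)
    print(counts.most_common(1))
```

On the 17-base fragment atgatcccaaatggaca, ATG is the only 3-mer that occurs twice; on such a short input this is of course not statistically meaningful.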

Problems in Computational Modeling for Bioinformatics • An abundance of genome, RNA, protein and metabolic pathway data is now available (see http://www.ncbi.nlm.nih.gov), and this is just the beginning of computational modeling in bioinformatics • Complex interactions: • between proteins, genes and DNA code • between the genome and the environment • much is yet to be discovered • Stability and repetitiveness: genes are relatively stable carriers of information • Many sources of uncertainty: • Alternative splicing • Mutations in genes caused by ionising radiation (e.g. X-rays), chemical contamination, replication errors, viruses that insert genes into host cells, aging processes, etc. • Mutated genes are expressed differently and cause the production of different proteins • It is extremely difficult to model dynamic, evolving processes (N. Kasabov, 2003)
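How a single-base mutation changes the protein depends on the genetic code: because several codons encode the same amino acid, some substitutions change nothing (silent), some change the amino acid (missense) and some create a premature stop (nonsense). A minimal Python sketch, assuming the standard genetic code and a coding sequence read in frame from position 0 (the helper name is hypothetical):

```python
from itertools import product

# Standard genetic code; '*' = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(cds: str, pos: int, new_base: str) -> str:
    """Effect of replacing cds[pos] by new_base on the encoded amino acid."""
    cds = cds.upper()
    start = pos - pos % 3  # first base of the codon containing pos
    mutated = cds[:pos] + new_base.upper() + cds[pos + 1:]
    old_aa = CODON_TABLE[cds[start:start + 3]]
    new_aa = CODON_TABLE[mutated[start:start + 3]]
    if old_aa == new_aa:
        return "silent"      # redundant code: same amino acid
    if new_aa == "*":
        return "nonsense"    # premature stop codon
    if old_aa == "*":
        return "stop-loss"   # stop codon turned into an amino acid
    return "missense"        # different amino acid


if __name__ == "__main__":
    cds = "ATGCTCAGCGTGACCTCA"
    for pos, base in [(5, "T"), (3, "T"), (16, "A")]:
        print(pos, base, classify_substitution(cds, pos, base))
```

On ATGCTCAGCGTGACCTCA, changing position 5 to T turns CTC into CTT (still L, silent), position 3 to T gives TTC (F, missense), and position 16 to A turns TCA into TAA (nonsense). Insertions and deletions are not handled; they can shift the reading frame instead.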

Bioinformatics Important Challenges (figure: transcription and translation) • Protein function • Protein 3D structure • Gene prediction • Gene function

Public Databases (figure: transcription and translation) • Protein sequences, e.g. KMLSLLMARTYW • DNA sequences over {A, T, C, G} • Microarray gene expression levels
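Microarray data give an expression level per gene per sample or time point. One simple way to look for the co-regulated genes mentioned among the challenging problems is to correlate expression profiles. This Python sketch (hypothetical names, made-up data, and an arbitrary threshold of 0.9) computes the Pearson correlation for every gene pair:

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def co_expressed_pairs(
    profiles: dict[str, list[float]], threshold: float = 0.9
) -> list[tuple[str, str, float]]:
    """Gene pairs whose expression profiles correlate at or above threshold."""
    pairs = []
    for (g1, p1), (g2, p2) in combinations(profiles.items(), 2):
        r = pearson(p1, p2)
        if r >= threshold:
            pairs.append((g1, g2, r))
    return pairs


if __name__ == "__main__":
    # Hypothetical expression levels of three genes at five time points.
    profiles = {
        "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
        "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],
        "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    }
    for g1, g2, r in co_expressed_pairs(profiles):
        print(f"{g1} ~ {g2}: r = {r:.2f}")
```

High correlation only suggests co-regulation; it is not evidence of a regulatory link. A gene with a constant profile has zero variance, which makes the correlation undefined (a division by zero in this sketch).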

Gene Expression

Microarray • What can it be used for? • How does it work? • What are the Advantages? • An Example Application
