Bioinformatics and Protein Database Concepts

Surabhi Agarwal
With the emergence of high-throughput techniques for generating protein sequences, computational tools are required for storing, sharing, analyzing and updating these data. Databases and their associated features provide the tools for storing biological data in a meaningful way.

Master Layout: Part 1
This animation consists of 2 parts. Part 1: From wet lab to bioinformatics. Part 2: Database concepts and protein databases.
Workflow shown in the layout:
1. Extract the protein, purify it and cleave it into smaller peptides (protein extract).
2. Sequence the peptides by mass spectrometry or Edman degradation.
3. Store the determined protein sequences in databases for future use.
Example protein sequence: MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNP
Animator note: Re-draw all images.
Reference: Biochemistry by Stryer et al., 5th edition

Definitions of the components: Part 1 – From wet lab to bioinformatics
Protein: A biomolecule made of chains of amino acid residues. Adjacent amino acids are joined by a “peptide bond”, which forms with the elimination of a water molecule. Proteins carry out structural, functional and regulatory roles in the cell.
Peptide: A short protein fragment, typically a stretch of up to around 50 amino acids.
Amino acid sequence: The linear order of amino acids in the chain. It is also known as the primary structure of the protein.
Edman degradation: A chemical method for sequencing the amino acid residues in a protein or a peptide. The N-terminal residue is labelled with phenyl isothiocyanate and then cleaved from the remaining peptide chain without disrupting any of the other peptide bonds. The labelled amino acid is detected, and the procedure is repeated to identify each N-terminal amino acid in turn.
Mass spectrometry: A technique for producing and detecting charged molecular species in vacuum after separating them, by magnetic and electric fields, according to their mass-to-charge (m/z) ratio.

Step 1: Protein Extraction
Figure labels: Protein source (usually a cultured tissue or microbial extract). Break open the cells. Re-suspend the extract in lysis buffer. Centrifugation of the crude extract (figure titled “CENTRIFUGE”). The supernatant containing proteins is isolated.
Action: As shown in animation.
Description of the action: Redraw all the figures. The animator has to re-draw the figure titled “CENTRIFUGE” with all its labeling, as it was taken from a web resource. On “Protein source”, show a zoom-in effect focusing on the purple molecule, and show the arrow that leads to breaking the molecule. Add a spin effect on the crude extract to depict centrifugation. Remove the supernatant (orange liquid in the last figure).
Audio narration: The cells present in the tissue culture are lysed, releasing a crude extract. This extract is centrifuged to separate the protein mixture from the cell debris. The supernatant obtained is a mixture of proteins with a variety of properties. The protein of interest must then be isolated from this mixture.
References: http://3.bp.blogspot.com/_xW3FQUQ2DYI/Rp4DF1r_0HI/AAAAAAAAAhY/B5MzdxVSV6I/s400/centrifugation.png; Biochemistry by Stryer et al., 5th edition

Step 1: Protein Extraction (continued)
Figure labels: Proteins are purified using various techniques such as chromatography and electrophoresis. Solution containing purified protein extract. Proteins are cleaved into smaller peptides using proteases.
Action: As shown in animation.
Description of the action: This slide continues from the previous slide. Show the arrow from the first figure to the two techniques, then show converging arrows to the last figure.
Audio narration: The protein of interest is separated from the protein mixture present in the supernatant. This is carried out by suitable techniques such as chromatography or electrophoresis, which exploit properties of the proteins such as their charge and mass.
References: Biochemistry by Stryer et al., 5th edition; Biochemistry by A.L. Lehninger et al., 3rd edition

Step 2: Edman Degradation
Figure labels: Peptide to be sequenced: Ala-Gly-Asp-Phe-Arg-Gly. First round.
Action: Breakdown of molecule.
Description of the action: Re-draw all images. Both sides depict the same process: the left side is the schematic and the right side is the same process at the molecular level. Show the steps of both processes in parallel.
Audio narration: Edman degradation employs the reagent phenyl isothiocyanate, which reacts with the amino-terminal residue of the peptide to give a phenylthiocarbamoyl derivative of that residue. Under mildly acidic conditions, a cyclic derivative of the terminal amino acid is released in the form of a PTH-amino acid, which can then be identified by chromatographic techniques. The procedure is then repeated to identify each N-terminal amino acid in turn.
Reference: Biochemistry by Stryer et al., 5th edition
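The round-by-round logic of the narration above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not of the chemistry: the function name `edman_rounds` is hypothetical, and the peptide is the one shown on the slide.

```python
def edman_rounds(peptide):
    """Yield (round number, released N-terminal residue, remaining peptide)."""
    residues = peptide.split("-")
    for round_no in range(1, len(residues) + 1):
        released = residues[0]   # the labelled N-terminal residue is cleaved off
        residues = residues[1:]  # the rest of the chain stays intact
        yield round_no, released, "-".join(residues)


peptide = "Ala-Gly-Asp-Phe-Arg-Gly"  # peptide from the slide
rounds = list(edman_rounds(peptide))
for round_no, released, remaining in rounds:
    print(f"Round {round_no}: PTH-{released} identified; remaining: {remaining or '(none)'}")

# Reading the released residues in order reconstructs the sequence.
sequence = "-".join(released for _, released, _ in rounds)
```

Each round removes exactly one residue, so a peptide of n residues needs n rounds to be read completely.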

Step 3: Mass Spectrometry
Figure labels: Sample Inlet. Ionization: Ion Source (forms ions, i.e. charged molecules). Mass Analyzer (filtering): sorts ions by mass (m/z). Detection: Ion Detector (detects ions). These components sit inside the Vacuum Envelope. Data System. Data Processing. Mass Spectrum: relative abundance plotted against m/z (axis ticks at 1000 and 2000).
Action: Experimental process as shown in animation.
Description of the action: From the first figure, show an arrow leading to figure 2, “Ion Source”. From there, an arrow leads to “Mass Analyzer”, followed by “Ion Detector”. Enclose all figures in a box titled “Vacuum Envelope”. From there on, an arrow leads to the “Data System” and then to “Data Processing”.
Audio narration: The mass spectrometer is an instrument that produces charged molecular species in vacuum, separates them by means of electric and magnetic fields, and measures the mass-to-charge ratios and relative abundances of the ions thus produced. A tandem mass spectrometer uses a combination of two mass analyzers, separated by a collision cell, to provide improved resolution of the fragment ions. The first mass analyzer usually operates in a scanning mode to select only a particular peptide ion, which is then fragmented and resolved in the second analyzer. This can be used for protein sequencing studies.
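To make the mass-to-charge ratio concrete, here is a minimal sketch. It assumes positive-mode ionization in which every charge comes from one added proton (an assumption, not something the slide states), and the neutral mass of 2000.0 Da is a made-up example value.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons


def mass_to_charge(neutral_mass, charge):
    """Return m/z for a molecule carrying `charge` extra protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# Hypothetical peptide of neutral mass 2000.0 Da observed at charge states 1+ to 3+.
peaks = {z: mass_to_charge(2000.0, z) for z in (1, 2, 3)}
for z, mz in peaks.items():
    print(f"charge {z}+: m/z = {mz:.4f}")
```

The same molecule therefore appears at different positions on the m/z axis depending on its charge state, which is why the analyzer sorts ions by m/z rather than by mass alone.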

Master Layout: Part 2
This animation consists of 2 parts. Part 1: From wet lab to bioinformatics. Part 2: Database concepts and protein databases.
Workflow shown in the layout: Based on the type of the data and its prospected usage, design a database schema. Provide software and analysis tools to access this data.
Animator note: Re-draw all images.
Reference: Biochemistry by Stryer et al., 5th edition

Definitions of the components: Part 2 – Database concepts and Protein databases
Type of data: Biological databases can store various types of data, such as pure sequences, sequences with structure, meta-data about the source of the sequence, experimental details, etc.
Prospected usage: The databases are primarily used to store all the information in a single web-based resource. They also provide analysis tools for sequence analysis functions such as pair-wise sequence alignment, multiple sequence alignment, homology modelling, etc.
Database schema: The design of the database at its various levels is called a database schema. It includes the attributes of all individual tables and the relationships between them. The schema is defined at three levels, namely “Physical”, “Logical” and “View”.
Primary database: In biological database studies, primary databases store only the protein sequence information.

Definitions of the components: Part 2 – Database concepts and Protein databases (continued)
Secondary database: In biological database studies, secondary databases are repositories of the domains and patterns that occur within a sequence. This information can be stored in the form of signature patterns, fingerprints, etc.
Structure database: In biological database studies, structural databases store the three-dimensional geometry of the protein: the coordinates of the individual atoms in the protein molecule and other geometrical parameters, along with sequence information.
Analysis tools: Software tools that are available on most web-based database sites. These tools help in conducting further studies on protein sequences, such as alignment, phylogenetic predictions, etc.
Meta-data: Information about the data being documented in a database. It covers features such as the source of the data, methods for retrieval, etc.

Step 1: A generic protein DB: Types of data
Figure labels (four categories of data):
Sequence: amino-acid sequence; location; length of the sequence; molecular type and classification; accession and version, Gene ID; keywords and feature table; patterns and domains.
Source: source organism; scientific name and common name; taxonomy; organelle.
Gene: source gene; corresponding mRNA; corresponding coding sequence (CDS).
Reference: author; title; journal; cross references; comments.
Action: Categories of data.
Description of the action: Animate the sub-parts in the order given in this animation, i.e. “Sequence” followed by its descriptive blue box, and similarly for “Source”, “Reference” and “Gene”. Re-draw all images.
Audio narration: All data related to a protein can be divided into four broad categories, namely sequence details, source, gene details and references. “Sequence” details contain the features of a protein’s amino acid sequence, such as the length, location, patterns and identifiers of the protein sequence. “Source” contains information about the biological source from which the protein was obtained. “Gene” contains details of the gene from which the protein is expressed. “Reference” contains the details of the research publication in which the study was reported.
References: http://www.ncbi.nlm.nih.gov/; http://expasy.org/; http://www.pdb.org/pdb/home/home.do; http://www.ddbj.nig.ac.jp/

Step 2: A Generic Protein DB Schema
Figure labels:
LOGICAL: Describes which type of data will be stored in which particular table and the relationships between these tables.
VIEW: Describes the user interface of the database and the view that will be shown to the user.
PHYSICAL: Describes the physical location of storage of the data within a database.
Action: Defines the various database schemata.
Description of the action: As the first step, show the three boxes while the narrator speaks the first line of the audio narration (“Database … and View”). In the next step of the animation, show the text of each box.
Audio narration: Database design is done at various levels, namely Physical, Logical and View. At the physical level, we define how and where the data are stored, in accordance with the prospected usage of the database. At the logical level, we define the tables, the attributes of the tables and the relationships between tables. The logical level is the most complex and important schema for databases and requires a thorough understanding of the data, their contexts and their relationships. At the view level, we define the views and appearance of the database.
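The three schema levels can be illustrated with Python's built-in `sqlite3` module. This is a minimal sketch, not the schema of any real protein database: the table names (`source`, `protein`), the view name (`protein_summary`) and the record with accession `DEMO0001` are all hypothetical.

```python
import sqlite3

# Physical level: where the data are stored. Here an in-memory database;
# passing a file path instead would store the same schema on disk.
conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Logical level: tables, their attributes, and the relationship between them.
CREATE TABLE source (
    source_id       INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL,
    common_name     TEXT,
    taxonomy_id     INTEGER
);
CREATE TABLE protein (
    accession TEXT PRIMARY KEY,
    name      TEXT NOT NULL,
    sequence  TEXT NOT NULL,
    source_id INTEGER REFERENCES source(source_id)
);
-- View level: what the user is shown, independent of how tables are laid out.
CREATE VIEW protein_summary AS
    SELECT p.accession, p.name, LENGTH(p.sequence) AS length, s.scientific_name
    FROM protein AS p
    JOIN source AS s ON s.source_id = p.source_id;
""")

conn.execute("INSERT INTO source VALUES (1, 'Homo sapiens', 'human', 9606)")
conn.execute(
    "INSERT INTO protein VALUES (?, ?, ?, ?)",
    ("DEMO0001", "Example protein", "MKWVTFISLL", 1),  # made-up record
)

rows = conn.execute("SELECT * FROM protein_summary").fetchall()
print(rows)
```

Keeping source-organism details in their own table, linked by `source_id`, mirrors the “Source” and “Sequence” categories from the previous slide; the view hides that split from the user.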

Step 3: Protein Database characteristics
Figure labels:
TYPE
Primary/sequence database: Swiss-Prot; UniProt; NCBI.
Derived/secondary database: Prosite; ProDom; Pfam.
Structural database: PDB; Proteopedia; Biological Structural Database from EBI.
TOOLS
Analysis tools: BLAST; FASTA; multiple sequence alignment; structure prediction; functional annotation; search engine; pattern and domain alignment/search.
Action: Defines the various database schemata.
Description of the action: Show the central round figure, followed by the 3 types of DB on the left and their examples. At the end, show the analysis tools and their examples.
Audio narration: A typical biological database can be characterized by its “Type” and its “Tools”. The “Type” defines the category of data that it includes, such as sequence, domains or structure. This means that the database’s most prominent feature is either sequences, domains or structures, and that it will primarily be used for their analysis. The analysis tools define the platforms that the site provides for gaining insight into the protein data.
References: http://www.ncbi.nlm.nih.gov/; http://expasy.org/; http://www.pdb.org/pdb/home/home.do; http://www.ddbj.nig.ac.jp/; http://www.ebi.ac.uk/Databases/structure.html; http://www.uniprot.org/; http://expasy.org/prosite/; http://prodom.prabi.fr/prodom/current/html/home.php; http://pfam.sanger.ac.uk/; http://www.proteopedia.org/wiki/index.php/Main_Page

Step 4: Database input formats
Figure labels: PROTEIN DATABASE search form (“Enter your Query term”, “SEARCH”, “SEARCH DATABASES”) with the following input fields, their hints (yellow boxes) and example entries (white boxes):
UNIQUE ID: Enter the unique identification number for the protein. These IDs vary according to database, such as accession number, GeneID, ODB ID, etc. Example: P01009
MOLECULE NAME: Enter the name of the molecule to be searched, e.g. protein, peptide, gene related to the protein, etc. Example: Serum albumin
AMINO-ACID SEQUENCE: Enter the amino-acid sequence of the protein to be analyzed. Example: MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNP
KEYWORD: Enter the keyword to identify this protein. Example: Acute phase OR blood coagulation OR Protease inhibitor
LITERATURE: Enter literature-related information such as the name of the journal, citation or title of the research paper. Example: Full-length cDNA libraries and normalization
GENE: Enter the name of the gene that codes for the protein, or other gene-related information. Example: SERPINA1
TAXONOMY: Enter taxonomic identifiers. Example: 9606[NCBI]
Action: Shows the general functions in a database.
Description of the action: Follow the steps as shown in the animation. DO NOT animate the yellow box. As the animated cursor goes to “Unique ID”, the narrator reads the text in the yellow box displayed along with the “Unique ID” entry. This is followed by an example entry in the white box. Similarly, “Molecule Name” is followed by its corresponding narration in the yellow box, and so on.
Audio narration: To extract protein information from a database, users can give a variety of input terms. These can be: Unique ID (read the text in the yellow box in each case), Molecule Name, Amino-acid sequence, Keyword, Literature, Gene, Taxonomy.

Step 5: Database output formats
Figure labels: PROTEIN DATABASE search form (“Enter your Query term”, “SEARCH”, “SEARCH DATABASES”) surrounded by output tabs: MOLECULAR DESCRIPTION; ANNOTATIONS; GENE NAMES AND DESCRIPTION; IDs OF ENTRIES IN RELATED DATABASE; EXPERIMENT DETAILS; SECONDARY STRUCTURAL DETAILS; SOURCE ORGANISM DETAILS; CITATIONS; PATTERN ANALYSIS.
Action: Output from database.
Description of the action: Co-ordinate the animation with the audio narration. For example, in animation mode, the first step is to display “Molecular Description”; the first point of the audio narration must be spoken along with it. Show each output tab as and when it is narrated.
Audio narration: Once the user submits the query, the output can be in multiple formats. The general information that users can obtain from protein databases is:
- General description of the protein molecule
- Annotations of the protein
- Name and description of the gene that encodes it
- ID of the same protein in other relevant databases
- Details of the experiment conducted for characterizing the protein
- Details of the protein’s secondary structure
- Details of the organism that was used as a source for obtaining the protein
- Citations of research conducted for obtaining this protein
- Patterns occurring within the sequence and their analysis

Step 6: Database Analysis Tools
Figure labels: INPUT: MAPWMHLLTVLALLALWGPNSVQAYSSQHLCGSNLVEALYMTCGRSGFYRPHDRRELEDLQVEQAELGLEAGGLQPSALEMILQKRGIVDQCCNNICTFNQLQNYCNVP. Arrow labelled ANALYSIS TOOLS. OUTPUT: the six result types listed in the narration.
Action: Input Output slide.
Description of the action: Display the panel on the left. In the first step the input appears, followed by the arrow embossed with the letters “Analysis Tools”. The output panel appears thereafter, with each output appearing one after the other. As each output is displayed, the narrator reads aloud the text written on it.
Audio narration: This slide shows the different kinds of analysis that can be conducted on a given protein sequence. The query can be the protein name, sequence or any other identifier of the protein. In this example, we provide the protein sequence as input. Once the query protein sequence is entered into the analysis tool, it can give various kinds of results, such as:
- Identify protein from sequence
- Identify physico-chemical properties such as chemical formula, half-life, iso-electric point, molecular weight, etc.
- Aligned sequences and structures
- Variable and conserved residues
- Predicted secondary and tertiary structures
- Synonyms and scientific terminology of proteins

Step 1: Case study: To study the characteristics of human serum albumin
Figure labels: Four panels: OBTAIN FASTA SEQUENCE; PHYSICO-CHEMICAL PROPERTIES; DOMAIN ANALYSIS; STRUCTURAL ANALYSIS. Link: View Full Animation.
Action: Slide with options to choose a step or view the full case study.
Description of the action: Display the 4 panels in the animation. These 4 steps are in sequence, but the user must be given the option to go directly to a specific step. At the bottom, give a link to view the full case study.
Audio narration: We explain the usage of protein databases using the example of the “Human Serum Albumin” protein. If you want to view a specific step in the case study, click on the relevant panel. Otherwise, click on “View Full Animation”.
Reference: http://www.pdb.org/pdb/home/home.do

Step 1.a: Obtain FASTA Sequence – SWISS PROT
Figure labels: Search box with the query “Serum Albumin”.
Action: Retrieving data.
Description of the action: All the screenshots taken from the website need to be remade by the animator to simulate the web-based environment. None of the images should be taken directly from the web database. Follow the steps as shown in the animated flowchart.
Audio narration: Open a web browser and go to http://expasy.org/sprot/. In the top right corner of the page there is a search box. Click on the drop-down arrow next to the “Search” box (indicated by the arrow) to get a list of databases to search. Select UniProtKB. Type the name of the protein of your choice (e.g. Serum Albumin) in the text box next to the word “for”.
Reference: http://expasy.org/sprot/

Step 1.b: Obtain FASTA Sequence – SWISS PROT
Action: Retrieving data.
Description of the action: Re-make all the screenshots. Follow the steps as shown in the animated flowchart.
Audio narration: The results page for the search shows 179 hits for our query; the count is shown at the top of the page. The first 25 hits are shown on the first page and can be viewed by scrolling down. Click on the entry of your choice. Here we click on the human albumin hit (ALBU_HUMAN).
Reference: http://expasy.org/sprot/

Step 1.c: Obtain FASTA Sequence – SWISS PROT
Figure labels: Place for headings. Scroll down to find the word “Sequences” in this position.
Action: Retrieving data.
Description of the action: The first image is displayed in parallel with the narration “The top… like this”. When the arrow appears, read the second line of narration, “Search for… the page”. The second panel of images in this slide goes in parallel with the narration “Click on tab… new tab”. In the last panel
Audio narration: The top of the result page looks like this. Search for the heading “Sequences” by scrolling down the page. Click on the FASTA tab next to the sequence of your interest. The FASTA sequence opens in a new tab. Save this FASTA sequence on your computer.
Reference: http://expasy.org/sprot/
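A saved FASTA file consists of a descriptive line starting with “>” followed by the sequence, usually wrapped over several lines. The sketch below shows one way to read such text in plain Python; the function name `parse_fasta` and the header line are hypothetical, and the residues are the example sequence shown earlier in this presentation.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its joined sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]          # descriptive line
            records[header] = []
        elif header is not None:
            records[header].append(line)  # one wrapped line of residues
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical header; sequence lines as displayed on the input-format slide.
fasta_text = """>demo|EXAMPLE Serum albumin example sequence
MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPF
EDHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEP
ERNECFLQHKDDNP
"""

records = parse_fasta(fasta_text)
header, sequence = next(iter(records.items()))
print(header)
print(len(sequence), "residues")
```

Taking only the `sequence` value is equivalent to deleting the descriptive first line by hand, which some analysis tools require before the sequence is pasted in.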

Step 1.d: Analysis Tools
Figure labels: ProtParam; HeliQuest; Radar; SAPS; Three to One; ColorSeq.
Action: Types of tools.
Description of the action: Show the chart with the color-coded division of tool types as shown in the figure. Highlight “Primary Structural Analysis” and follow it with the display of all the tabs on the right. Highlight the first tool, “ProtParam”.
Audio narration: Once the FASTA sequence is retrieved, we can subject it to a variety of protein analysis tools, which are broadly classified into “Sequence similarity search tools”, “Primary structural analysis tools”, “Phylogenetic analysis tools”, “Molecular modeling and visualisation tools” and “Structure prediction tools”. Here we explore the web-based service called ProtParam, which belongs to the “Primary structural analysis tools”. To explore other such services, users can visit http://expasy.org/sprot/
Reference: http://expasy.org/sprot/

Step 2.a: Physico-chemical Properties – SWISS PROT
Figure labels: Enter the accession number OR paste the sequence here. Delete the first line (descriptive line) from your FASTA sequence, so that only the amino acid sequence remains. Click on “Compute Parameters”.
Action: Tool input.
Description of the action: Re-make all the screenshots. Follow the steps as shown in the animated flowchart.
Audio narration: The front end of the tool asks you to input either the accession ID of the protein under study or the sequence of that protein. Delete the first line (descriptive line) from your FASTA sequence, so that only the amino acid sequence remains. Click on “Compute Parameters”. On the results page, scroll down to find the various physico-chemical parameters of this protein.
Reference: http://expasy.org/tools/protparam.html

Step 2.c: Physico-chemical Properties – SWISS PROT
Figure labels: CSV stands for “Comma Separated Values”. Files with the .csv extension can be opened as plain text as well as in spreadsheet programs.
Action: Tool output.
Description of the action: Re-make all the screenshots. Follow the steps as shown in the animated flowchart. When the user clicks on the green highlighted tab, the definition must be read aloud along with the written display of the definition in a separate box, as shown in the slide animation.
Audio narration: This part of the results gives the percentage of each amino acid in the sequence. The highlighted region indicates the CSV file link. CSV stands for “Comma Separated Values”; such files can be opened as text as well as in spreadsheet programs. The file can be downloaded in its comma-separated format by clicking on the link. CSV files can also be opened with Microsoft Excel.
Reference: http://expasy.org/tools/protparam.html
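The amino acid percentages and the CSV export described above are simple to reproduce. The following is a sketch under stated assumptions: the function names and the column headers `amino_acid` and `percent` are hypothetical and do not reflect the layout of the file the web tool actually produces, and `AAGK` is a made-up four-residue sequence.

```python
import csv
import io
from collections import Counter


def composition_percent(sequence):
    """Return {residue: percentage of the sequence}, sorted by residue letter."""
    counts = Counter(sequence)
    total = len(sequence)
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}


def to_csv(composition):
    """Serialize the composition as comma-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["amino_acid", "percent"])  # hypothetical column names
    for aa, pct in composition.items():
        writer.writerow([aa, f"{pct:.1f}"])
    return buffer.getvalue()


composition = composition_percent("AAGK")  # made-up sequence
csv_text = to_csv(composition)
print(csv_text)
```

Because the result is plain comma-separated text, it can be read in a text editor or loaded directly into a spreadsheet.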

Step 2.d: Physico-chemical Properties – SWISS PROT
Figure labels:
Formula: the chemical formula of the query molecule.
Number of atoms: the number of atoms present in the molecule.
Charged residues: the charge states of the amino acid residues within the protein molecule.
Half-life: the time required for the protein to degrade to half of its original amount.
GRAVY (grand average of hydropathicity): indicates the solubility of the protein. Hydrophobic molecules exhibit a positive GRAVY value, while hydrophilic molecules show a negative GRAVY value.
Action: Tool output.
Description of the action: Re-make all the screenshots. Follow the steps as shown in the animated flowchart. When the user clicks on the green highlighted tab, the definition must be read aloud along with the written display of the definition in a separate box, as shown in the slide animation.
Audio narration: Other information that can be obtained from this tool includes the chemical formula of the protein, the total number of atoms present in the protein, the total number of negatively and positively charged residues, the estimated half-life of the protein (the time in which the protein degrades to half of its original amount) and the average hydropathicity, which gives an insight into the solubility of the protein. Hydrophobic molecules exhibit a positive GRAVY value, while hydrophilic molecules show a negative GRAVY value.
Reference: http://expasy.org/tools/protparam.html
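The GRAVY value is the sum of the hydropathy values of all residues divided by the sequence length. The sketch below assumes the Kyte-Doolittle hydropathy scale and counts Asp and Glu as negatively charged and Arg and Lys as positively charged; the function names and the short test sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean hydropathy value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def charged_residue_counts(sequence):
    """Return (negatively charged, positively charged) residue counts."""
    negative = sum(sequence.count(aa) for aa in "DE")  # Asp + Glu
    positive = sum(sequence.count(aa) for aa in "RK")  # Arg + Lys
    return negative, positive


hydrophobic = gravy("IVLA")   # made-up hydrophobic stretch
hydrophilic = gravy("KDER")   # made-up charged stretch
print(f"IVLA: {hydrophobic:+.3f}, KDER: {hydrophilic:+.3f}")
print(charged_residue_counts("KDER"))
```

The sign of the result matches the rule in the narration: the hydrophobic stretch scores above zero and the charged stretch below zero.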

Step 3.a: Domain Analysis – PROSITE
Action: Tool input.
Description of the action: Re-draw all screenshots. Display the sequence and then minimize it to fit into the input window of the web-based tool. Show the clicking effect on the button named “Scan”.
Audio narration: Go to http://expasy.org/prosite/. Paste the FASTA sequence obtained in the previous steps into the input box of the server. Click on “Scan”.
Reference: http://expasy.org/prosite/

Step 3.b: Domain Analysis – PROSITE
Figure labels: HITS BY PROFILE: Hit 1; Hit 2 (highest score); Hit 3.
Action: Tool output.
Description of the action: Re-draw all screenshots. Show the 3 results and then emphasize the score of the 2nd hit, as it is the highest. Display the clicking effect on the 2nd hit.
Audio narration: The results page lists the profiles most likely to occur in the sequence; each hit is assigned a score on this basis. Select the hit with the highest score.
Reference: http://expasy.org/prosite/

Step 3.c: Domain Analysis – PROSITE
Figure labels: Location of the albumin domain in the sequence: amino acid positions 210-402. Position of the pattern matched for identifying the domain. Conserved cysteine involved in a disulphide bond. PROSITE figure of the albumin domain. Structure of an albumin domain.
Action: Tool output.
Description of the action: Re-draw all screenshots. Type the name of the query in the search box. Click on “Go”. Follow it with an arrow and the output image.
Audio narration: The result displays the position of the albumin domain, highlighted in the sequence from position 210 to 402. It also displays a graphical view, in the form of a downloadable PNG image, in which the profile hits are represented as colored shapes with their PROSITE names. It then displays the structure of the albumin domain, marking the disulphide-bonding cysteine residues as “C” and the signature pattern as “*”.
Reference: http://expasy.org/prosite/
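Signature patterns of this kind are written in a compact syntax in which elements are joined by “-”, “x” stands for any residue, square brackets list allowed residues, curly braces list forbidden residues and parentheses give repeat counts. A minimal sketch of translating that syntax into a Python regular expression follows. The converter handles only these basic elements, and both the pattern and the sequence are invented for illustration; they are not the actual albumin signature.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        repeat = ""
        counted = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if counted:                      # x(3) or x(2,4)
            element, repeat = counted.group(1), "{" + counted.group(2) + "}"
        if element == "x":
            core = "."                   # any residue
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"  # any residue except these
        else:
            core = element               # a single residue or an [ABC] set
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


demo_pattern = "C-x(2,4)-C-x(3)-[LIVM]-{P}"   # made-up pattern
demo_sequence = "AACGGCAKELQ"                 # made-up sequence
regex = prosite_to_regex(demo_pattern)

# Report 1-based start and end positions, as sequence databases do.
hits = [(m.start() + 1, m.end()) for m in re.finditer(regex, demo_sequence)]
print(regex, hits)
```

Reporting 1-based positions keeps the output comparable with the position ranges shown on a results page, such as the 210-402 range above.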

Step 4.a: Structural Analysis – RCSB PDB
Figure labels: Horizontal tabs: Summary; Biology and Chemistry; Geometry. Vertical tabs under “Summary”: Molecular Description (active); Related PDB entries; Ligand chemical components; Derived data.
Molecular Description: Classification: Transport Protein. Structure Weight: 133377.93. Molecule: Serum albumin. Polymer: 1. Type: polypeptide(L). Length: 585. Chains: A, B.
Action: Tool output display slide.
Description of the action: Re-draw the tabs. The first panel of tabs is horizontal; of these, the “Summary” tab is active in this slide (the active tab is shown in white). Under “Summary” there are 4 more tabs, which are vertical; of these, the blue tab is active. Slides 4.a to 4.d show the vertical tabs active one by one. Slides 4.e to 4.g show the remaining two horizontal tabs active. Each display is shown while the audio narration of its slide is read.
Audio narration: Once the user enters “Serum Albumin” in the PDB search box, the output page of the selected PDB entry shows the following tabs. The horizontal tabs summarize the entire result page. The vertical tabs appear as the initial description on the first page. Each of these tabs can be explored in detail. The structural analysis of the protein can display a wide range of properties, such as the description of the protein molecule, including the classification of the protein, the chains it contains, the number of amino acids, etc.
Reference: http://www.pdb.org/pdb/home/home.do

Step 4.b: Structural Analysis – RCSB PDB
Figure labels: Horizontal tabs: Summary; Biology and Chemistry; Geometry. Vertical tabs under “Summary”: Molecular Description; Related PDB entries (active); Ligand chemical components; Derived data.
Related PDB entries:
1AO6: Crystal structure of human serum albumin
1BM0: Crystal structure of human serum albumin
1E7E: Human serum albumin complexed with decanoic acid
2BXC: Human serum albumin complexed with phenylbutazone
2BXF: Human serum albumin complexed with diazepam
2BXN: Human serum albumin complexed with myristate and iodipamide
Action: Tool output display slide.
Description of the action: Re-draw the tabs. The first panel of tabs is horizontal; of these, the “Summary” tab is active in this slide (the active tab is shown in white). Under “Summary” there are 4 more tabs, which are vertical; of these, the blue tab is active. Slides 4.a to 4.d show the vertical tabs active one by one. Slides 4.e to 4.g show the remaining two horizontal tabs active. Each display is shown while the audio narration of its slide is read.
Audio narration: The display also shows entries that are closely related to the user’s query, for example the same protein characterized from a different organism.
Reference: http://www.pdb.org/pdb/home/home.do

Step 4.c: Structural Analysis – RCSB PDB
Figure labels: Horizontal tabs: Summary; Biology and Chemistry; Geometry. Vertical tabs under “Summary”: Molecular Description; Related PDB entries; Ligand chemical components (active); Derived data.
Ligand chemical components: Identifier: LQZ. Name: 2-(diethylamino)-N-(2,6-dimethylphenyl)ethanamide. Formula: C14 H22 N2 O. Interaction view: Ligand Explorer.
Action: Tool output display slide.
Description of the action: Re-draw the tabs. The first panel of tabs is horizontal; of these, the “Summary” tab is active in this slide (the active tab is shown in white). Under “Summary” there are 4 more tabs, which are vertical; of these, the blue tab is active. Slides 4.a to 4.d show the vertical tabs active one by one. Slides 4.e to 4.g show the remaining two horizontal tabs active. Each display is shown while the audio narration of its slide is read.
Audio narration: Protein molecules are generally characterized structurally in complex with a ligand, with the structure determined by experimental techniques. The description of these ligands is given in the result summary of the query protein.
Reference: http://www.pdb.org/pdb/home/home.do

Step 4.d: Structural Analysis – RCSB PDB
Figure labels: Horizontal tabs: Summary; Biology and Chemistry; Geometry. Vertical tabs under “Summary”: Molecular Description; Related PDB entries; Ligand chemical components; Derived data (active).
Action: Tool output display slide.
Description of the action: Re-draw the tabs. The first panel of tabs is horizontal; of these, the “Summary” tab is active in this slide (the active tab is shown in white). Under “Summary” there are 4 more tabs, which are vertical; of these, the blue tab is active. Slides 4.a to 4.d show the vertical tabs active one by one. Slides 4.e to 4.g show the remaining two horizontal tabs active. Each display is shown while the audio narration of its slide is read.
Audio narration: The result summary displays derived data for serum albumin, such as the molecular and biological functions that the protein is involved in.
Reference: http://www.pdb.org/pdb/home/home.do

Step 4.e: Structural Analysis – RCSB PDB
Figure labels: Horizontal tabs: Summary; Biology and Chemistry (active); Geometry.
Action: Tool output display slide.
Description of the action: Re-draw the tabs. The first panel of tabs is horizontal; of these, the “Biology and Chemistry” tab is active in this slide.
Audio narration: The biological aspects of serum albumin are also displayed as results. The unique feature of this tab is that it gives a complete list of Single Nucleotide Polymorphisms (SNPs) in the protein sequence. It shows the change in amino acids as well as the locations of the SNPs and the SNP IDs.
Reference: http://www.pdb.org/pdb/home/home.do

Step 4.f: Structural Analysis – RCSB PDB
Figure labels: Horizontal tabs: Summary; Biology and Chemistry; Geometry (active).
Bond length: the length of the covalent bond between two adjacent atoms in a protein molecule.
Bond angle: the angle formed by 3 consecutive atoms in the native conformation of a protein.
Dihedral angle: the angle between the 2 consecutive planes defined by 4 linearly bonded atoms.
Action: Tool output display slide.
Description of the action: Re-draw the tabs. The first panel of tabs is horizontal; of these, the “Geometry” tab is active in this slide.
Audio narration: The 3-D visualization of serum albumin is given as a part of the results and can be viewed with a tool called Jmol. Along with the image analysis from Jmol, users can also study and download the structural characteristics of the protein, such as its bond lengths along with the place and frequency of their occurrence. The structural results also summarize the bond angles and the dihedral angles, including the chain where they occur and the frequency of their occurrence.
Reference: http://www.pdb.org/pdb/home/home.do
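All three geometric quantities defined above can be computed directly from atomic coordinates, which is what a structure database stores. The sketch below uses only the Python standard library (Python 3.8 or later for `math.dist`). The function names are hypothetical, and the coordinates are made-up points chosen so that the results are easy to verify, not atoms from any PDB entry.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bond_length(p1, p2):
    """Distance between two atoms."""
    return math.dist(p1, p2)


def bond_angle(p1, p2, p3):
    """Angle in degrees at atom p2, formed by atoms p1-p2-p3."""
    a, b = _sub(p1, p2), _sub(p3, p2)
    cosine = _dot(a, b) / (math.hypot(*a) * math.hypot(*b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def dihedral_angle(p1, p2, p3, p4):
    """Signed angle in degrees between planes (p1, p2, p3) and (p2, p3, p4)."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.hypot(*b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: four atoms along a right-angled path.
a1, a2, a3, a4 = (1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)
print(bond_length(a1, a2))             # distance a1-a2
print(bond_angle(a1, a2, a3))          # angle at a2
print(dihedral_angle(a1, a2, a3, a4))  # torsion about the a2-a3 bond
```

Sign conventions for dihedral angles differ between tools, so only the magnitude should be compared when checking these values against another program.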

Interactivity option 1: Step No 1: Find the sequence corresponding to the beta chain of insulin and compare its length in different organisms
Steps, in the correct order:
1. Input the term insulin in the search box.
2. Choose and open a primary sequence database of your choice.
3. Click on the entry corresponding to beta chains.
4. Check the names of the source organism.
5. Store the sequence ID, source organism and length of the sequence in a separate text file.
6. Sort the file according to sequence lengths.
Interactivity type: Arrange the steps in the order to be performed.
Options: All the tabs must be arranged in the right order. The numbers mentioned indicate the correct order.
Boundary/limits: Remove the step number from the bottom of the tab. Show all the steps in mixed order. The user must click on the tabs in order. If the user clicks on a tab that is not in the right order, flash a message saying “try again”.
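The last two steps of this exercise, storing the ID, source organism and length and then sorting by length, can be sketched as follows. The record IDs, organism names and sequences are placeholders invented for the example; they are not real database entries, and the tab-separated layout is just one possible choice for the text file.

```python
# Placeholder records: (sequence ID, source organism, sequence). Not real data.
records = [
    ("DEMO_A", "Organism one", "FVNQHLCGSH"),
    ("DEMO_B", "Organism two", "FVNQHL"),
    ("DEMO_C", "Organism three", "FVNQHLCG"),
]

# Step 5: keep only the ID, the organism and the length of each sequence.
rows = [(seq_id, organism, len(seq)) for seq_id, organism, seq in records]

# Step 6: sort by sequence length, shortest first.
rows.sort(key=lambda row: row[2])

# One tab-separated line per record, ready to be written to a text file.
table = "\n".join(f"{seq_id}\t{organism}\t{length}" for seq_id, organism, length in rows)
print(table)
```

Writing `table` to disk with `open("lengths.txt", "w")` would complete the exercise; the file name is arbitrary.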

Questionnaire
1. Which of the following is a protein sequence database?
Answers: a) Swiss-Prot b) PDB c) CSD d) GEO
2. Which server should be used for identifying protein domains?
Answers: a) NCBI b) DDBJ c) PROSITE d) All
3. Which reagent is used for Edman degradation?
Answers: a) Dabsyl chloride b) Ninhydrin c) Phenyl isothiocyanate d) Cyanogen bromide
4. Which amongst the following can be used for retrieving proteins from a database?
Answers: a) Protein name b) Corresponding gene name c) Unique identifier d) All

Questionnaire (continued)
5. Which one is NOT a step in sequence identification using mass spectrometry?
Answers: a) Labelling the terminal residue b) Electrospray ionization c) Peptide fragmentation d) Calculating the m/z ratio
6. Which one is NOT a derived protein database?
Answers: a) Prosite b) Pfam c) Swiss-Prot d) ProDom
7. The most complex and important database schema is
Answers: a) Physical b) Logical c) View d) All

Links for further reading
Reference websites:
http://www.proteopedia.org/wiki/index.php/Main_Page
http://www.pdb.org/pdb/home/home.do
http://www.ncbi.nlm.nih.gov
http://expasy.org/sprot/
http://prodom.prabi.fr/prodom/current/html/home.php
http://expasy.org/prosite/
http://pfam.sanger.ac.uk

Links for further reading
Books:
Biochemistry by Stryer et al., 5th edition
Biochemistry by A.L. Lehninger et al., 3rd edition
Database System Concepts by Korth et al., 5th edition
