Annotation Presentation Week 6

Annotation Presentation Week 6 Structure-based Evidence for Function (TIGRfam, Pfam and PDB)

TIGRfams are protein families categorized by functional role

Concept: HMMs

A concrete example of an HMM: Consider two friends, Alice and Bob, who live far apart and talk daily over the telephone about what they did that day. Bob is only interested in three activities: walking in the park, shopping, and cleaning his apartment. His choice of activity is determined exclusively by the weather on a given day. Alice has no definite information about the weather where Bob lives, but she knows general trends. Based on what Bob tells her he did each day, Alice tries to guess what the weather must have been like.

Alice believes that the weather operates as a discrete Markov chain (a system that moves randomly between a set of states). There are two states, "Rainy" and "Sunny", but she cannot observe them directly; that is, they are hidden from her. On each day, there is a certain chance that Bob will perform one of the following activities, depending on the weather: "walk", "shop", or "clean". Since Bob tells Alice about his activities, those are the observations. The entire system is a hidden Markov model (HMM). Alice knows the general weather trends in the area and what Bob likes to do on average. In other words, the parameters of the HMM are known.

HMM: A Hidden Markov Model is a probabilistic model developed from observed sequences of proteins of a known function. The profile HMM is used to score the alignment of the entered amino acid sequence to other proteins based on amino acid identity and position.

http://en.wikipedia.org/wiki/Hidden_Markov_model
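The Alice and Bob example can be sketched in a few lines of Python. This is only an illustration: the slide gives no probabilities, so every number below is an assumption, and the function name `most_likely_weather` is hypothetical. The sketch uses the Viterbi algorithm, a standard way to find the most probable sequence of hidden states (weather) given the observations (Bob's activities).

```python
# All probabilities are assumed for illustration; the slide does not give any.
STATES = ("Rainy", "Sunny")
START = {"Rainy": 0.6, "Sunny": 0.4}
TRANSITION = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
EMISSION = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}


def most_likely_weather(activities):
    """Viterbi algorithm: most probable hidden weather path for Bob's activities."""
    if not activities:
        return []
    # best[state] = (probability of the best path ending in state, that path)
    best = {s: (START[s] * EMISSION[s][activities[0]], [s]) for s in STATES}
    for activity in activities[1:]:
        step = {}
        for s in STATES:
            # Pick the best previous state to arrive from.
            prob, path = max(
                (best[prev][0] * TRANSITION[prev][s], best[prev][1])
                for prev in STATES
            )
            step[s] = (prob * EMISSION[s][activity], path + [s])
        best = step
    # Return the path belonging to the highest final probability.
    return max(best.values())[1]


print(most_likely_weather(["walk", "shop", "clean"]))
```

With these assumed numbers, the call prints `['Sunny', 'Rainy', 'Rainy']`. A profile HMM used by TIGRfam or Pfam applies the same idea, with hidden states tied to alignment positions and amino acids as the observations.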

Follow this link from the lab notebook.

TIGRfams: Haft et al. (2001) Nucleic Acids Research 29: 41-43.

Search TIGRFAM database
• Change Database to “TIGRFAMS”
• Change Scope to GLOBAL
• Change E-value cutoff to “0.01”
• Enter protein sequence in FASTA format in the box
• Click on “Start HMM search”
• Then wait… “Click”
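Before pasting a sequence into the search box, it can help to confirm that it really is in FASTA format. The sketch below is a minimal check, not part of the search tool: the function name `read_single_fasta` is hypothetical, and it assumes a single record that uses only the 20 standard amino acid letters.

```python
# Assumption: only the 20 standard one-letter amino acid codes are accepted.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def read_single_fasta(text):
    """Return (header, sequence) for one FASTA record, or raise ValueError."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    header = lines[0][1:]
    # The sequence may be wrapped over several lines; join them.
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("no sequence found after the header")
    unexpected = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if unexpected:
        raise ValueError("unexpected characters: " + "".join(unexpected))
    return header, sequence


# Made-up example record.
header, sequence = read_single_fasta(">demo protein\nMKT\nAYIA\n")
print(header, sequence)
```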

RESULTS: Only hits with a positive Score and an E-value ≤ 10⁻³ should be recorded.
• Enter the TIGRfam number (format: TIGRXXXXX) from the 'Model' column into the imgACT lab notebook, in the box for significant TIGRfam hit
• Enter the TIGRfam name from the ‘Description’ column into the notebook
• NOTE: If the full name is cut off in the ‘Description’ column, go to http://cmr.jcvi.org/cgi-bin/CMR/shared/MakeFrontPages.cgi?page=text_search&crumbs=searches
• Enter the Score and E-value into the notebook as well
“Click”
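The recording rule can be written as a small filter. This is a sketch under stated assumptions: the hit records, model numbers, and field names below are made up for illustration, and the comparison treats an E-value exactly equal to 10⁻³ as recordable.

```python
E_VALUE_CUTOFF = 1e-3  # threshold from the slide, written as a float


def is_recordable(score, e_value, cutoff=E_VALUE_CUTOFF):
    """A hit is recorded only if its score is positive and its E-value is at or below the cutoff."""
    return score > 0 and e_value <= cutoff


# Hypothetical search results; the model numbers are placeholders, not real families.
hits = [
    {"model": "TIGR99991", "score": 210.4, "e_value": 3e-60},
    {"model": "TIGR99992", "score": 12.1, "e_value": 0.008},   # E-value too high
    {"model": "TIGR99993", "score": -35.0, "e_value": 5e-4},   # negative score
]

recordable = [h for h in hits if is_recordable(h["score"], h["e_value"])]
for hit in recordable:
    print(hit["model"], hit["score"], hit["e_value"])
```

Note that the search itself runs with a looser cutoff (0.01), so some listed hits will not pass this filter.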

To obtain full TIGRfam name:

Then what? Full name Complete description

TIGRfam Results in imgACT Notebook

Terms to Know for Pfam
• Domain: A structural unit which can be found in multiple protein contexts. e.g., zinc finger, leucine zipper
• Family: A collection of related proteins containing the same domain. e.g., immunoglobulins, CD4, MHC, TCR, etc.
• Clan: A collection of multiple protein families. The relationship may be defined by similarity of sequence, structure, or profile-HMM. e.g., ATPase functioning in ETC vs. ATPase functioning in DNA replication.

Click on the link provided in your notebook.

You know the Drill!
• Enter your FASTA format amino acid sequence
• Change E-value to 0.001
• “Click”

WAIT…this can sometimes take a while

RESULTS! Graphic view of domain organization

Notice there may be two types of results based on your designated E-value: significant and insignificant matches. Only investigate significant matches.

NOTE: An insignificant match may still have a valid E-value, but this Pfam result is considered insignificant because the alignment is very short, and Pfam has detected and flagged this.

If you do not have any significant matches, note this in your notebook by creating a COMMENTS section and entering “No significant hits”. Be sure your search criteria were accurate (e.g., E-value of 0.001).

Investigate SIGNIFICANT matches
• Click on [Show] to view the “pairwise alignment” for the Pfam match
• Copy/paste this pairwise alignment into the designated box in your notebook

How do I interpret the alignment?
• Top row (#HMM): all capital letters indicate conserved residues in the HMM consensus sequence
• Middle row (#MATCH): identical or functionally conserved (similar) amino acids
• Bottom row (#SEQ): query sequence aligned to the HMM representing the domain/family

Legend for #MATCH
• Upper case = identical match (conserved and high frequency)
• Lower case = identical match (conserved but low frequency)
• + symbol = functionally similar (e.g., aspartic vs. glutamic acid)
• Space = no match

What is an HMM consensus sequence?
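The #MATCH legend can be applied mechanically to tally what kind of match each position is. The sketch below only encodes the four legend entries; the function names and the example #MATCH string are made up for illustration and do not come from a real Pfam result.

```python
from collections import Counter


def describe_match_symbol(symbol):
    """Translate one character of the #MATCH row using the legend."""
    if symbol == "+":
        return "functionally similar"
    if symbol == " ":
        return "no match"
    if symbol.isupper():
        return "identical, high frequency"
    if symbol.islower():
        return "identical, low frequency"
    return "other"  # any symbol the legend does not cover


def summarize_match_line(match_line):
    """Count how many positions fall into each legend category."""
    return Counter(describe_match_symbol(c) for c in match_line)


# Hypothetical #MATCH row: C and D are strong matches, g is a weak one.
summary = summarize_match_line("C+ gD")
for category, count in summary.items():
    print(category, count)
```

Positions reported as "identical, high frequency" are the first candidates to check when looking for key functional residues.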

The HMM consensus sequence
• Right-click the Pfam link and open it in a new tab
• On the Pfam family summary page, click on “Alignments”

The HMM consensus sequence
• Full: Total number of sequences in the database that have been categorized into this Pfam family
• Seed: Number of sequences within the multiple sequence alignment representing architectural variations within a single Pfam family

What does this mean?

Architecture Diversity • Domain organization within context of full protein

The HMM consensus sequence Leave default settings and press the [View] button

The HMM consensus sequence A new window will pop up as shown: Click on [Start Jalview] button to view the multiple sequence alignment

The HMM consensus sequence
Another new window will pop up as shown. TOO MANY COLORS! How do we read this?!?

The HMM consensus sequence Let’s make the view more manageable by simplifying the colors. . . Select “Percentage Identity” from menu. NOTE: Take the time to browse other color schemes to learn more about your protein.

The HMM consensus sequence
This view reveals the amount of conservation in your amino acid sequence.
• Dark = highest frequency
• Light = lower frequency

Pay special attention to the BOTTOM graph: the consensus sequence for the protein family. Letters show which amino acids occur most frequently at that position. This consensus sequence is used to construct the HMM.
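The idea behind the consensus row can be illustrated with a short calculation: for each column of a multiple sequence alignment, find the most frequent residue and the percentage of sequences that carry it. This is a simplified sketch, not Jalview's own implementation; the function name and the four-sequence alignment are made up, and gap characters are counted like any other symbol.

```python
from collections import Counter


def column_consensus(aligned_sequences):
    """Return a list of (most frequent residue, percent of sequences) per column."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*aligned_sequences):
        residue, count = Counter(column).most_common(1)[0]
        result.append((residue, 100.0 * count / len(column)))
    return result


# Hypothetical toy alignment of four sequences.
alignment = ["MKCA", "MRCA", "MKCG", "LKCA"]
consensus = column_consensus(alignment)
print("".join(residue for residue, _ in consensus))  # consensus letters
print([percent for _, percent in consensus])          # percent identity per column
```

For this toy alignment the consensus is `MKCA`, and the third column (C) is the only one conserved in 100% of the sequences, which corresponds to the darkest shading and tallest bar in the view described above.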

What else do I need for my notebook? Pfam name and Pfam number
• Return to the Summary page for the Pfam family
• Copy/paste the full and abbreviated Pfam name as well as the Pfam number into your lab notebook

Note: Pay attention to a possible 3D image
• You may see a 3D image when you view your summary.
• If you see this image, then this is your first clue that you should expect to have significant hits in the PDB search (next section of this module).
• If you don’t see an image, then this suggests no structure has yet been solved for proteins containing the domain identified by Pfam.

What else do I need for my notebook? HMM Logo On Summary page, click on “HMM logo”

What else do I need for my notebook? HMM Logo SAVE this image in .png format and insert into your notebook.

How do we interpret the HMM Logo?
• Highly conserved amino acids are represented by wide letters
• Amino acids with a high frequency of occurrence in the alignment used to generate the HMM consensus sequence are represented by tall letters

What else do I need for my notebook? Clan name and number
• Return to the Summary page
• Click BROWSE to search for clan information
• Use key words from the Pfam family name for the clan search

What else do I need for my notebook? Clan name and number Investigate possible clans based on key word search from Pfam family description. To learn more about the clan, click on hyperlink for more clan information.

What else do I need for my notebook? Clan name and number
• Record the abbreviated clan name, the clan number, and the full clan name
• The clan page tells you which Pfam families belong to this clan. If the Pfam family to which your protein belongs is not in this list, then your protein is NOT a member of this clan.
• NOTE: Not all Pfam families belong to a clan. If no clan is found, enter “None found” in your lab notebook.

What else do I need for my notebook? Key functional residues
You have THREE key tools to assist you in identifying the KEY FUNCTIONAL RESIDUES of your protein.
• Tool #1: Pairwise Alignment
• Tool #2: HMM Logo
• Tool #3: Jalview consensus

How do we identify key functional residues?
• Tall, wide letter in HMM logo
• Capital letter in #MATCH line
• Tall bar in graphical depiction of consensus sequence

How do we report key functional residues in the notebook?
Formula: AA(start + HMM# - 1)
Example: C(47 + 8 - 1) = C54
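The notation formula is simple arithmetic, shown here as a small helper. The function name `residue_label` is hypothetical, and the sketch assumes that "start" is the position in your protein where the alignment begins and that HMM# is the residue's 1-based position counted within the alignment.

```python
def residue_label(amino_acid, start, hmm_number):
    """Apply AA(start + HMM# - 1) and return the notebook notation, e.g. 'C54'."""
    position = start + hmm_number - 1
    return f"{amino_acid}{position}"


# The worked example from the slide: C(47 + 8 - 1) = C54
print(residue_label("C", 47, 8))
```

The "- 1" is needed because the residue at HMM# 1 sits exactly at the start position, not one past it.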

SUMMARY: Identifying key functional residues
1. Use the HMM pairwise alignment to identify possible key functional residues.
2. Use the HMM Logo and Jalview alignment tools to verify key functional residues.
3. Scan the entire amino acid sequence and record all key functional residues using proper notation.

Recording results in your Lab Notebook Scroll down

Recording results in your Lab Notebook

REPEAT procedure for all significant Pfam hits 3 hits = 3 notebook entries

PDB: Protein Data Bank
• Worldwide depository for three-dimensional structures of large biological molecules, including proteins and nucleic acids
• Contains information about structure such as:
  • sequence details
  • atomic coordinates
  • crystallization conditions
  • 3-D structure neighbors
  • derived geometric data
  • structure factors
  • 3-D images

Berman et al. (2003) Nature Structural Biology 10: 980.

Click on the link provided in your notebook.

Select “Advanced Search”

• Select the “Sequence (Blast/Fasta)” option
• Copy/paste your FASTA format protein sequence into the query box
• Change the E-value cutoff to 0.001
• Click when ready to initiate the search

Results of PDB Search Scroll down Search hits listed by ascending E-value

Evaluating PDB Results
Assess the quality of the alignment:
• Is the E-value less than 10⁻³?
• Is a significant proportion of the protein aligned? (Hint: compare alignment length to total length)
If so, it is a good hit.

Each result shows the PDB code, PDB name, citation, alignment and statistics, and a thumbnail of the 3D structure. Click on the thumbnail to get a high-resolution image for your notebook.
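The two questions can be combined into a quick check. This is only a sketch: the slide does not say what counts as a "significant proportion", so the 50% coverage threshold below is an assumption you should adjust, and the function names and example numbers are made up.

```python
def alignment_coverage(alignment_length, query_length):
    """Fraction of the query protein covered by the alignment."""
    return alignment_length / query_length


def is_good_hit(e_value, alignment_length, query_length,
                max_e_value=1e-3, min_coverage=0.5):
    """Good hit: E-value below the cutoff AND enough of the protein aligned.

    min_coverage=0.5 is an assumed threshold, not a value from the slide.
    """
    coverage = alignment_coverage(alignment_length, query_length)
    return e_value < max_e_value and coverage >= min_coverage


# Hypothetical hits against a 200-residue query protein.
print(is_good_hit(1e-20, 180, 200))  # strong E-value, 90% aligned
print(is_good_hit(1e-20, 30, 200))   # strong E-value, but only 15% aligned
print(is_good_hit(0.5, 190, 200))    # long alignment, but weak E-value
```

Only the first hypothetical hit passes both tests, so the three calls print `True`, `False`, and `False`.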

Recording results in your Lab Notebook NOTE: Revise or add headings and boxes as needed Add to your notebook Scroll down

You cannot simply copy/paste the entire alignment with correct formatting into your lab notebook. DELETE THIS SECTION.
