Midterm Project

Database Schema – GeneIDTable
• Information about “gene” and corresponding “protein”
• Columns: gene_id, gene_name, gene_seq, protein_id, protein_name, protein_seq, gene_type
• gene_id – primary key (type varchar(255))
• gene_type – type varchar(255)
• All other entries are of type longtext

Database Schema – GeneFuncTable
• Information about “gene functions”
• Columns: gene_id, gene_fun, comment
• gene_id – foreign key
• All entries are of type longtext

Database Schema – ProteinFuncTable
• Information about “protein functions”
• Columns: protein_id, protein_fun, comment
• All entries are of type longtext

Database Schema – PathwayFuncTable
• Information about “pathway functions”
• Columns: pathway_id, pathway_name, pathway_fun, pathway_loc, comment
• All entries are of type longtext

Database Schema – PathwayTable
• Information about “gene pathway association”
• Columns: gene_id, pathway_id
• gene_id – type varchar(255)
• pathway_id – type longtext

Database Schema – BiologicalProcessTable
• Gene Ontology related table
• Information about “biological processes” of a particular gene
• Columns: gene_id, GO_num, biological_process
• gene_id – foreign key (type varchar(255))
• All other entries are of type longtext

Database Schema – CellularComponentTable
• Gene Ontology related table
• Information about “cellular component”
• Columns: gene_id, GO_num, cellular_component
• gene_id – foreign key (type varchar(255))
• All other entries are of type longtext

Database Schema – MolecularFunctionTable
• Gene Ontology related table
• Information about “molecular functions”
• Columns: gene_id, GO_num, molecular_function
• gene_id – foreign key (type varchar(255))
• All other entries are of type longtext
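The eight tables above can be written down as CREATE TABLE statements. The following is a minimal sketch, not the course's official setup script: the `SCHEMA` dictionary and the helper names are hypothetical, foreign-key constraints are left out because the slides do not say how they are declared, and the statements are run against an in-memory SQLite database only to check that they parse. The type names come from the slides (and MolecularFunctionTable's gene_id is assumed to be varchar(255) like the other Gene Ontology tables), so a real MySQL setup may need adjustments.

```python
import sqlite3

# Column layout copied from the schema slides: table -> [(column, type), ...].
LT, VC = "longtext", "varchar(255)"
SCHEMA = {
    "GeneIDTable": [("gene_id", VC), ("gene_name", LT), ("gene_seq", LT),
                    ("protein_id", LT), ("protein_name", LT),
                    ("protein_seq", LT), ("gene_type", VC)],
    "GeneFuncTable": [("gene_id", LT), ("gene_fun", LT), ("comment", LT)],
    "ProteinFuncTable": [("protein_id", LT), ("protein_fun", LT), ("comment", LT)],
    "PathwayFuncTable": [("pathway_id", LT), ("pathway_name", LT),
                         ("pathway_fun", LT), ("pathway_loc", LT), ("comment", LT)],
    "PathwayTable": [("gene_id", VC), ("pathway_id", LT)],
    "BiologicalProcessTable": [("gene_id", VC), ("GO_num", LT),
                               ("biological_process", LT)],
    "CellularComponentTable": [("gene_id", VC), ("GO_num", LT),
                               ("cellular_component", LT)],
    "MolecularFunctionTable": [("gene_id", VC), ("GO_num", LT),
                               ("molecular_function", LT)],
}


def create_table_sql(table, columns, primary_key=None):
    """Build one CREATE TABLE statement (hypothetical helper)."""
    parts = [f"{name} {ctype}" for name, ctype in columns]
    if primary_key:
        parts.append(f"PRIMARY KEY ({primary_key})")
    return f"CREATE TABLE {table} ({', '.join(parts)})"


def build_schema(conn):
    """Create every table; only GeneIDTable has a stated primary key."""
    for table, columns in SCHEMA.items():
        pk = "gene_id" if table == "GeneIDTable" else None
        conn.execute(create_table_sql(table, columns, pk))


# SQLite is used here only as a quick syntax check (an assumption).
conn = sqlite3.connect(":memory:")
build_schema(conn)
```

Printing `create_table_sql(...)` for each entry gives statements that can be reviewed before running them on the actual database server.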

Steps to Follow – Step 1
• Get the RefSeq accession number of your species from the NCBI Genome database
• e.g. NC_000913 for Escherichia coli K12

Steps to Follow – Step 2
• Download the files you need from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov)
• genomes/Bacteria/[species name]/[RefSeq #].gbk (main information for genes, proteins, and GO functions)
• e.g. genomes/Bacteria/Escherichia_coli_k12/NC_000913.gbk
• genomes/Bacteria/[species name]/[RefSeq #].ffn (gene sequences)
• e.g. genomes/Bacteria/Escherichia_coli_k12/NC_000913.ffn
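The two file paths in this step follow one pattern, so they can be generated from the species directory and the accession number. This is a small sketch under that assumption: `genome_file_urls` and `download` are hypothetical helper names, the directory layout is copied from the slide and may not match the server's current layout, and `download` is defined but never called here.

```python
from urllib.request import urlretrieve

FTP_ROOT = "ftp://ftp.ncbi.nlm.nih.gov"  # site named in the slide


def genome_file_urls(species_dir, accession):
    """Return the .gbk and .ffn URLs for one genome, keyed by extension."""
    base = f"{FTP_ROOT}/genomes/Bacteria/{species_dir}/{accession}"
    return {ext: f"{base}.{ext}" for ext in ("gbk", "ffn")}


def download(urls):
    """Fetch each URL into the current directory (requires network access)."""
    for url in urls.values():
        urlretrieve(url, url.rsplit("/", 1)[-1])


urls = genome_file_urls("Escherichia_coli_k12", "NC_000913")
print(urls["gbk"])
print(urls["ffn"])
```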

Steps to Follow – Step 3
• Go to the KEGG selected organisms list (http://www.genome.jp/kegg/catalog/org_list.html)
• Find your species and click its organism code in the second column (e.g. eco for E. coli)
• Go to “pathway maps” to get the pathway information to put into the PathwayFuncTable

Steps to Follow – Step 4
• Use the NCBI Entrez eutils (http://eutils.ncbi.nlm.nih.gov/entrez/eutils/) to get the file that contains the gene pathway association
• Use esearch to search for your species in the gene database
• http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=database&term=query&usehistory=y
• Use efetch to fetch the result file
• http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=database&WebEnv=WebEnvString&query_key=key
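The esearch and efetch calls are linked by two values: the esearch response carries a WebEnv string and a QueryKey, and both are passed on to efetch. The sketch below only builds the URLs and reads those two values from a response; it makes no network request. The function names are hypothetical, the base URL is copied from the slide, and `SAMPLE_RESPONSE` is a hand-written stand-in for a real esearch reply.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils"  # base URL from the slide


def esearch_url(db, term):
    """URL for a history-enabled search (usehistory=y, as in the slide)."""
    query = urlencode({"db": db, "term": term, "usehistory": "y"})
    return f"{EUTILS}/esearch.fcgi?{query}"


def history_from_esearch(xml_text):
    """Pull the WebEnv string and query key out of an esearch XML reply."""
    root = ET.fromstring(xml_text)
    return root.findtext("WebEnv"), root.findtext("QueryKey")


def efetch_url(db, web_env, query_key):
    """URL that fetches the stored search result."""
    query = urlencode({"db": db, "WebEnv": web_env, "query_key": query_key})
    return f"{EUTILS}/efetch.fcgi?{query}"


# Invented response, shaped like an esearch reply, for illustration only.
SAMPLE_RESPONSE = (
    "<eSearchResult><Count>2</Count>"
    "<QueryKey>1</QueryKey><WebEnv>EXAMPLE_WEBENV</WebEnv></eSearchResult>"
)

web_env, query_key = history_from_esearch(SAMPLE_RESPONSE)
print(esearch_url("gene", "Escherichia coli K12[orgn]"))
print(efetch_url("gene", web_env, query_key))
```

In a real run the text passed to `history_from_esearch` would be the body returned by the esearch URL.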

Steps to Follow – Step 5
• Edit the .gbk file to remove the beginning and end parts
• Parse the .gbk and .ffn files to fill all the tables except the PathwayFuncTable and the PathwayTable
• Link to the sample parser file: Parse.java
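As a rough illustration of this step (it does not reproduce Parse.java), the sketch below reads FASTA-formatted gene sequences, which is the format of the .ffn file, and trims a .gbk file down to its feature table. It assumes that "the beginning and the end" mean the header before the FEATURES line and the sequence block starting at ORIGIN; the slide does not say so explicitly. Function names and sample lines are hypothetical.

```python
def read_fasta(lines):
    """Return {header: sequence} from FASTA-formatted lines (.ffn file)."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def feature_lines(gbk_lines):
    """Keep only the lines between FEATURES and ORIGIN (assumed trimming)."""
    keep, inside = [], False
    for line in gbk_lines:
        if line.startswith("FEATURES"):
            inside = True
            continue
        if line.startswith("ORIGIN"):
            break
        if inside:
            keep.append(line.rstrip("\n"))
    return keep


# Tiny invented inputs, for illustration only.
sample_ffn = [">geneA hypothetical", "ATGAAA", "CGCTAA", "", ">geneB", "ATGTAA"]
sample_gbk = [
    "LOCUS       X\n",
    "FEATURES             Location/Qualifiers\n",
    "     gene            1..6\n",
    "ORIGIN\n",
    "        1 atgtaa\n",
    "//\n",
]
sequences = read_fasta(sample_ffn)
features = feature_lines(sample_gbk)
```

With real files, `read_fasta(open(path))` and `feature_lines(open(path))` take the file objects directly, since both functions only iterate over lines.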

Steps to Follow – Step 6
• Parse the file returned by eutils to get the gene pathway association
• Link to the sample parser file: ParsePath.java

Database Name Format
• Example species: Escherichia Coli K12
• Species name: Escherichia_Coli_K12
• Database name: escherichia_coli_k12
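Judging from the single example above, the species name replaces spaces with underscores and the database name is the same string in lowercase. A short sketch of that reading, with hypothetical function names:

```python
def species_name(label):
    """Join the words of the species label with underscores."""
    return "_".join(label.split())


def database_name(label):
    """Lowercase form of the species name, used as the database name."""
    return species_name(label).lower()


print(species_name("Escherichia Coli K12"))
print(database_name("Escherichia Coli K12"))
```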

Sample Output Files
• outputFile.txt (output file after parsing the .gbk and .ffn files)
• outputPath.txt (output file after parsing the gene pathway association file)
• PathwayFunc.txt (output file after analyzing the KEGG pathways)

To Find the Number of Genes
• Search for your species in the NCBI gene database
• e.g. Escherichia coli K12 [orgn]
• Check the number of genes in your result against this number
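If the search is run through esearch rather than the web page, the total appears in the Count element of the XML reply and can be compared with the number of distinct gene IDs the parser produced. The sketch below assumes that workflow; the function names, the sample reply, and the gene IDs are invented for illustration.

```python
import xml.etree.ElementTree as ET


def gene_count(esearch_xml):
    """Read the total number of hits from an esearch XML reply."""
    return int(ET.fromstring(esearch_xml).findtext("Count"))


def counts_match(esearch_xml, parsed_gene_ids):
    """True when the parsed output has as many distinct genes as NCBI reports."""
    return gene_count(esearch_xml) == len(set(parsed_gene_ids))


# Invented reply and gene IDs, for illustration only.
sample_reply = "<eSearchResult><Count>3</Count></eSearchResult>"
parsed_ids = ["g1", "g2", "g3"]
print(counts_match(sample_reply, parsed_ids))
```

A mismatch does not say which side is wrong; it only signals that the parsed output deserves a second look.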

Submit your project (the 3 output files, plus the parsers if you changed them) to:
• vgummulu@cise.ufl.edu
Any questions:
• yizhang@cise.ufl.edu
• anupamd@ufl.edu
