This presentation walks through a practical BLAST example: 1,506 mapped barley genes are searched with BlastX against 2,869,704 annotated proteins. A single run takes 17.5 hours and yields 905 annotations; distributed over 72 nodes of the IPK cluster, the search returns the same 905 annotations in 16 minutes. The slides also outline the Cluster Execution Framework (CEF) used to distribute jobs, show a shell script that splits the query file and submits one BLAST job per chunk, and give sample BlastN and BlastX output for a single EST.
A practical example
• 2,869,704 annotated proteins
• 1,506 mapped barley genes
• BlastX
• Result: 905 annotations
• Runtime: 17.5 h
IPK cluster BROCKEN
• Result: 905 annotations
• 72 nodes -> runtime: 16 min
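The two runtimes can be turned into a speedup and a parallel efficiency. The sketch below assumes that both figures are wall-clock times for the same BlastX workload and that all 72 nodes were used for the whole run; neither is stated explicitly on the slides. The function names are illustrative.

```python
def speedup(serial_minutes, parallel_minutes):
    """Ratio of single-machine runtime to cluster runtime."""
    return serial_minutes / parallel_minutes


def efficiency(serial_minutes, parallel_minutes, nodes):
    """Speedup divided by node count; 1.0 would be perfect linear scaling."""
    return speedup(serial_minutes, parallel_minutes) / nodes


serial = 17.5 * 60   # 17.5 h expressed in minutes
parallel = 16        # minutes on the cluster
nodes = 72

print("speedup:    %.1f" % speedup(serial, parallel))
print("efficiency: %.2f" % efficiency(serial, parallel, nodes))
```

Under these assumptions the cluster run is roughly 66 times faster, which corresponds to a parallel efficiency of about 91% on 72 nodes.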
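CEF: Cluster Execution Framework
• CEF GUI
• CEF SOAP Web Services
• file server /data/pdw-20/
• file server /data/pdw-16/
• master/head node pdw-22
• … 22 nodes
• Metadata about
  • Tools (NCBI BLAST, Spidey, …)
  • Tool parameters (-i FASTA-query, …)
  • Files (FASTA, blastable, …)
  • Jobs/sub jobs (progress, finished, …)

    #!/bin/bash
    # Split a FASTA query file, submit one BLAST job per chunk and
    # generate a script that merges the per-chunk results.
    projdir=/data/pdw-16/agbi/projects/

    # Relative paths below (Clones.fasta, split/*) are resolved against
    # the project directory, so change into it first.
    cd "$projdir" || exit 1

    # split query file
    python2.3 /data/pdw-20/python_scripts/splitFas2.py -i Clones.fasta -o "$projdir" -n 500

    blast_db=$projdir/wheat_consensus.txt
    mergescript=$projdir/domerge.sh

    # start the merge script: "cat \" followed by one result file per line
    echo "#!/bin/sh" > "$mergescript"
    echo "cat \\" >> "$mergescript"

    z=0
    for i in split/*
    do
        # $$ (the PID of this script) keeps file names unique across runs
        script_file=$projdir/script/blastjob_$$_$z.sh
        result_file=$projdir/result/blastresult_$$_$z.txt
        log_file=$projdir/log/joblog_$$_$z

        # write the job script for this chunk
        echo "#!/bin/sh" > "$script_file"
        #echo "cd $projdir" >> $script_file
        echo "/usr/bin/blastall -i $projdir/$i -p blastn -d $blast_db -m 0 -e 1E-10 -v 10 -b 10 -o $result_file" >> "$script_file"

        # add this chunk's result file to the merge command
        echo "$result_file \\" >> "$mergescript"

        # submit to the "long" queue and echo the command for the record
        qsub -o "$log_file.out" -e "$log_file.err" -q long "$script_file"
        echo "qsub -o $log_file.out -e $log_file.err -q long $script_file"

        z=$((z + 1))
    done

    # finish the merge command, then clean up logs and job scripts
    echo ">final_result.txt" >> "$mergescript"
    echo "rm log/* script/* " >> "$mergescript"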
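The splitting step relies on splitFas2.py, whose source is not part of the slides. The following is only a sketch of what such a splitter might do, written in current Python rather than Python 2.3. It assumes that `-n 500` means "at most 500 sequences per chunk", which the slides do not confirm; `read_fasta` and `split_records` are hypothetical names, not functions from the original script.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # a new header closes the previous record
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_records(records, per_chunk):
    """Group records into consecutive lists of at most per_chunk entries."""
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == per_chunk:
            yield chunk
            chunk = []
    if chunk:
        # the last chunk may be smaller than per_chunk
        yield chunk


demo = [">a", "ACGT", "AC", ">b", "GG", ">c", "TT"]
for number, chunk in enumerate(split_records(read_fasta(demo), 2)):
    print(number, [header for header, _ in chunk])
```

In a real run each chunk would be written to its own file under split/, so that the submission loop finds one query file per job.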
Input EST sequence

    >HY01A03T
    GAATTCGGCACCAGAGTGAGCACGCAAGCCAGTGTTTGTAGCCAGCAGCCACAATGGCCGGGAACATGCT
    AGCCAACTATGTCCAAGTCTACGTCATGCTCCCGCTGGATGTCGTGAGCGTCGACAACAAGTTCGAGAAG
    GGCGACGAGATCAGGGCGCAGCTGAAGAAGCTGACGGAGGCTGGCGTGGACGGCGTCATGATAGACGTCT
    GGTGGGGGCTGGTGGAGGGCAAGGGCCCCAAGGCCTACGACTGGAGCGCCTACAAGCAGGTCTTCGACCT
    GGTGCACGAGGCCAGGCTCAAGCTGCAGGCCATCATGTCGTTCCACCAGTGCGGTGGCAACGTCGGCGAC
    GTAGTCAACATCCCCATCCCACAGTGGGTGCGGGATGTCGGCGCTACCGACCCCGACATTTTCTACACGA
    ACCGCAGAGGGACGAGGAACATCGAGTACCTCACCCTTGGAGTGGATGACCAACCTCTCTTCCATGGAAG
    AACTGCCGTCCAGATGTATCATGATTACATGGCGAGCTTCAGGGAAAACATGAAAAAGTTCTTGGATGCC
    GGTACCATCGTGGACATTGAAGTGGGACTTGGCCCGGCTGGAGAGATGAGGTACCCATCCTATCCTCAGA
    GCCAGGGATGGGTCTTCCCAGGCATCGGAGAATTCATCTGCTATGATAAGTACCTGGAAGCAGACTTCAA
BlastN result

    >HY01A03T
    Length = 700
    Plus Strand HSPs:
    Score = 2595 (395.4 bits), Expect = 3.0e-112, P = 3.0e-112
    Identities = 573/618 (92%), Positives = 573/618 (92%), Strand = Plus / Plus

    Query:  77 CTATGTCCAAGTCTACGTCATGCTCCCGCTGGATGTCGTGAGC--GT-CGACAACAAGTT 133
               ||| ||| | | || |  |  | |  || ||   ||||  | |  || ||| ||
    Sbjct:  89 CTACGTC-ATG-CTCCCGCTGGATGTCG-TGAGCGTCGACAACAAGTTCGAGAAGGGCGA 145

    Query: 134 CGAGA--AGGGCGACGAGATCAGGAAGCTGACGGAGGCTGGCGTGGACGGCGTCATGATA 191
               |||||  |||||| | || | | |||||||||||||||||||||||||||||||||||||
    Sbjct: 146 CGAGATCAGGGCG-C-AGCTGAAGAAGCTGACGGAGGCTGGCGTGGACGGCGTCATGATA 203

    Query: 192 GACGTCTGGTGGGGGCTGGTGGAGGGCAAGGGCCCCAAGGCCTACGACTGGAGCGCCTAC 251
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
    Sbjct: 204 GACGTCTGGTGGGGGCTGGTGGAGGGCAAGGGCCCCAAGGCCTACGACTGGAGCGCCTAC 263

    Query: 252 AAGCAGGTCTTCGACCTGGTACACGAGGCCAGGCTCAAGCTGCAGGCCATCATGTCGTTC 311
               |||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||
    Sbjct: 264 AAGCAGGTCTTCGACCTGGTGCACGAGGCCAGGCTCAAGCTGCAGGCCATCATGTCGTTC 323

    Query: 312 CACCCCGTGCGGTGGCAACGTCGGCGACGTAGTCAACATCCCCATCCCACAGTGGGTGCG 371
               ||||  ||||||||||||||||||||||||||||||||||||||||||||||||||||||
    Sbjct: 324 CACCA-GTGCGGTGGCAACGTCGGCGACGTAGTCAACATCCCCATCCCACAGTGGGTGCG 382

    Query: 372 GGATGTCGGCGCTACCGACCCCGACATTTTCCACACGAACCTCAGAGGGACGAGGAACAT 431
               ||||||||||||||||||||||||||||||| ||||||||| ||||||||||||||||||
    Sbjct: 383 GGATGTCGGCGCTACCGACCCCGACATTTTCTACACGAACCGCAGAGGGACGAGGAACAT 442

    Query: 432 CGAGTACCTCACCCTTGGAGTGGATGACCAACCTCTCTTCCATGGAAGAACTGCCGTCCA 491
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
    Sbjct: 443 CGAGTACCTCACCCTTGGAGTGGATGACCAACCTCTCTTCCATGGAAGAACTGCCGTCCA 502

    Query: 492 GATGTATCATGATTACATGGCGAGCTTCAGGGAAAACATGAAAAAGTTCTTGGATGCCGG 551
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
    Sbjct: 503 GATGTATCATGATTACATGGCGAGCTTCAGGGAAAACATGAAAAAGTTCTTGGATGCCGG 562

    Query: 552 TACCATCGTGGACA---A-GTGGGACTTGGCCCGGCTGGAGAGATGAGGTACCCATCCTA 607
               ||||||||||||||   | |||||||||||||||||||||||||||||||||||||||||
    Sbjct: 563 TACCATCGTGGACATTGAAGTGGGACTTGGCCCGGCTGGAGAGATGAGGTACCCATCCTA 622

    Query: 608 TCCTCAGAGCCAGGGATGGGTCTTCCCAGGCATCGGAGAATTCATCTGCTATGATAAGTA 667
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
    Sbjct: 623 TCCTCAGAGCCAGGGATGGGTCTTCCCAGGCATCGGAGAATTCATCTGCTATGATAAGTA 682

    Query: 668 CCTGGAAGCAGACTTCAA 685
               ||||||||||||||||||
    Sbjct: 683 CCTGGAAGCAGACTTCAA 700
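The line of bars between each Query and Sbjct row marks the columns where both sequences carry the same base; gap characters (`-`) and mismatches leave a blank. As a reading aid, the sketch below rebuilds that line for one short stretch of the alignment (the start of the Query 312 / Sbjct 324 row) and counts its identities. The helper names are illustrative and are not part of BLAST.

```python
def match_line(query_row, sbjct_row):
    """Return '|' for identical, non-gap columns and ' ' everywhere else."""
    return "".join(
        "|" if q == s and q != "-" else " "
        for q, s in zip(query_row, sbjct_row)
    )


def count_identities(query_row, sbjct_row):
    """Number of identical columns in one aligned row pair."""
    return match_line(query_row, sbjct_row).count("|")


query = "CACCCCGTGCGG"   # Query 312..., first 12 columns
sbjct = "CACCA-GTGCGG"   # Sbjct 324..., first 12 columns

print(query)
print(match_line(query, sbjct))
print(sbjct)
print(count_identities(query, sbjct), "of", len(query), "columns identical")
```

Applied to all eleven row pairs of the BlastN alignment, the same count adds up to the 573 identities over 618 columns reported in the header.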
BlastX result

    >dbj|BAC83773.1| putative beta-amylase [Oryza sativa (japonica cultivar-group)]
     gb|EAZ40178.1| hypothetical protein OsJ_023661 [Oryza sativa (japonica cultivar-group)]
    Length=488

    Score = 403 bits (1036), Expect = 4e-111
    Identities = 191/215 (88%), Positives = 200/215 (93%), Gaps = 0/215 (0%)
    Frame = +3

    Query  54   MAGNMLANYVQVYVMLPLDVVSVDNKFEKGDEIRAQLKKLTEAGVDGVMIDVWWGLVEGK  233
                MAGN+LANYVQV VMLPLDVV+VDNKFEK DE RAQLKKLTEAGVDGVM+DVWWGLVEGK
    Sbjct  1    MAGNLLANYVQVNVMLPLDVVTVDNKFEKVDETRAQLKKLTEAGVDGVMVDVWWGLVEGK  60

    Query  234  GPKAYDWSAYKQVFDLVHEARLKLQAIMSFHQCGGNVGDVVNIPIPQWVRDVGATDPDIF  413
                GP +YDW AYKQ+F LV EA LKLQAIMSFHQCGGNVGD+VNIPIPQWVRDVGA+DPDIF
    Sbjct  61   GPGSYDWEAYKQLFRLVQEAGLKLQAIMSFHQCGGNVGDIVNIPIPQWVRDVGASDPDIF  120

    Query  414  YTNRRGTRNIEYLTLGVDDQPLFHGRTAVQMYHDYMASFRENMKKFLDAGTIVDIEVGLG  593
                YTNR G RNIEYLTLGVDDQPLFHGRTA+QMY DYM SFRENM +FLD G IVDIEVGLG
    Sbjct  121  YTNRGGARNIEYLTLGVDDQPLFHGRTAIQMYADYMKSFRENMAEFLDTGVIVDIEVGLG  180

    Query  594  PAGEMRYPSYPQSQGWVFPGIGEFICYDKYLEADF  698
                PAGEMRYPSYPQSQGWVFPGIGEFICYDKYLEADF
    Sbjct  181  PAGEMRYPSYPQSQGWVFPGIGEFICYDKYLEADF  215
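`Frame = +3` together with the query coordinates 54 to 698 means that the protein match was found by reading the EST on the forward strand in its third reading frame, with the aligned region starting at nucleotide 54. The sketch below illustrates this with the first 70 bases of HY01A03T, using the standard genetic code. The `translate` helper is illustrative only and is not how BlastX itself is implemented.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=1):
    """Translate DNA in forward frame 1, 2 or 3; unknown codons become 'X'."""
    seq = seq.upper()
    start = frame - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First line (70 nt) of the input EST HY01A03T.
EST_START = "GAATTCGGCACCAGAGTGAGCACGCAAGCCAGTGTTTGTAGCCAGCAGCCACAATGGCCGGGAACATGCT"

print(translate(EST_START, frame=3))
```

The frame-3 translation of this fragment ends in MAGNM, the first five residues of the Query row in the BlastX alignment: the ATG at nucleotides 54 to 56 is the methionine that lines up with position 1 of the rice beta-amylase.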