Join us for a follow-up session where we continue our Unix and Perl tutorial. We will start by accessing the Amazon Web Services console, launching your instance, and updating to the new public DNS address. After a quick quiz based on the provided Arabidopsis data, we will tackle questions that involve data extraction using Unix commands. This hands-on session will enhance your skills in managing virtual machines and manipulating biological data files effectively.
Follow-up Amazon & Unix Konrad Paszkiewicz
9am – 10:30am • Continue with the Unix (and/or Perl) tutorial • Open the text file containing your Amazon password from yesterday • Go to https://nescent.signin.aws.amazon.com/console • Start your instance • Remember that the public DNS address for your VM will have changed • Short 10 min challenge at 10:15, before coffee
Quiz! • Use files in: ~/tutorial_materials/Data/Arabidopsis • Q1. How many sequences are listed in the file intron_IME_data.fasta? • Q2. How many proteins with names containing the word GTP are in the file At_proteins.fasta? • Q3. Print every entry on Chromosome 5 in At_genes.gff and sort them by column 3 • Q4. How many types of feature are there in the At_genes.gff file? (Hint: this is in column 3 – how many CDS, exon, mRNA etc. entries are there?)
Answers • Q1. grep -c "^>" intron_IME_data.fasta (59260) • Q2. grep -c "^>.*GTP" At_proteins.fasta (246) • Q3. grep Chr5 At_genes.gff | sort -k 3 • Q4. cut -f 3 At_genes.gff | sort | uniq -c
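The same four patterns can be tried on a small scratch dataset before running them on the course files. The sketch below is only an illustration: toy.fasta and toy.gff are made-up files written to a temporary directory, not the Arabidopsis data, and the variable names are hypothetical. It uses sort -k 3,3, which limits the sort key to column 3 alone, whereas sort -k 3 uses everything from column 3 to the end of the line.

```shell
# Scratch directory with toy data (hypothetical files, not the course data)
workdir=$(mktemp -d)

# Three FASTA records; two headers contain "GTP"
printf '>seq1 GTP-binding protein\nMKV\n>seq2 kinase\nMAA\n>seq3 GTPase\nMTT\n' > "$workdir/toy.fasta"

# Five tab-separated GFF-like rows; column 3 holds the feature type
printf 'Chr1\tTAIR\tgene\t1\t100\nChr5\tTAIR\tmRNA\t5\t90\nChr5\tTAIR\texon\t5\t50\nChr5\tTAIR\tCDS\t10\t40\nChr5\tTAIR\texon\t60\t90\n' > "$workdir/toy.gff"

# Q1 pattern: count header lines, i.e. lines starting with ">"
n_seqs=$(grep -c "^>" "$workdir/toy.fasta")

# Q2 pattern: count header lines that contain "GTP"
n_gtp=$(grep -c "^>.*GTP" "$workdir/toy.fasta")

# Q3 pattern: keep Chr5 rows, then sort on column 3 only
chr5_sorted=$(grep Chr5 "$workdir/toy.gff" | sort -k 3,3)

# Q4 pattern: tally each feature type, and count the distinct types
feature_counts=$(cut -f 3 "$workdir/toy.gff" | sort | uniq -c)
n_types=$(cut -f 3 "$workdir/toy.gff" | sort -u | wc -l | tr -d ' ')

echo "sequences: $n_seqs"
echo "GTP headers: $n_gtp"
echo "feature types: $n_types"
echo "$feature_counts"
echo "$chr5_sorted"

# Remove the scratch directory
rm -rf "$workdir"
```

On this toy data the three counts are 3, 2 and 4, and the first Chr5 row after sorting is the CDS entry.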
When to use Perl • GFF file • FASTA file
After coffee • Jose and Dan • QIIME • You will be using a different AMI so please terminate your current instances