
Flow charts make science less hard
Importing honey bee data into lightning3
Once the results reload based on your settings, select your organism by Kingdom, then Group, then Subgroup.
Species selection
Species selection: assembly
Checkpoint: data retrieved
• Now it's time to decide what you want to do with your data set. You may want to
  • alter its format (unzip, %GC, etc.)
  • send it to a supercomputer
[Flow chart: compressed data, uncompressed data, lightning3]
From here, copy the whole folder to lightning3. The -r option tells scp to copy recursively, i.e. the folder and everything in it:
scp -r ~/Desktop/Primary_Assembly tut_user2@lightning3.its.iastate.edu:/data003/GIFTEACH/BCB660/foldername
Think of this command as three parts in one:
• scp -r ~/Desktop/Primary_Assembly (what you want to move)
• tut_user2@lightning3.its.iastate.edu (where you want to move it)
• :/data003/GIFTEACH/BCB660/foldername (where to put it once it gets there; foldername is your own folder)
You're now ready to log in to lightning3.
Log in to lightning3 with its address and your password, then:
• Change directory to your folder (foldername) containing Primary_Assembly
• Change directory to Primary_Assembly/
• Change directory to placed_scaffolds
• Type this long command; it takes every file with gz in its name and decompresses each one
Unzipping files on lightning3
• Permission may be denied. If so, enter:
chmod -R 777 Primary_Assembly/
• This grants read, write, and execute permission on every file under Primary_Assembly/ to every user on the system, not just you
• Re-enter the long command
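The step the long command performs (decompress every .gz file in placed_scaffolds) can also be sketched in Python with the standard gzip and shutil modules. This is a minimal sketch, not the command used on lightning3: the function name gunzip_all is hypothetical, it matches names ending in .gz rather than anything containing gz, and it deletes each .gz file after decompressing it, as gunzip does by default.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def gunzip_all(directory: str) -> List[Path]:
    """Decompress every *.gz file in `directory`, similar to `gunzip *.gz`.

    Hypothetical helper: each foo.fa.gz becomes foo.fa and the .gz original
    is removed afterwards, mirroring gunzip's default behaviour.
    """
    written = []
    for gz_path in sorted(Path(directory).glob("*.gz")):
        out_path = gz_path.with_suffix("")  # strip only the trailing .gz
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream in chunks instead of loading whole files
        gz_path.unlink()
        written.append(out_path)
    return written
```

For example, gunzip_all("Primary_Assembly/placed_scaffolds") returns the paths of the decompressed files.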
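We want to take all the individual scaffold files and put them into one file, so that we run the GC program on 1 file instead of 16 files.
• cat *.fa* > ApisMellifera_4.5.fasta
• Here's the breakdown of this command:
  • cat: concatenate, i.e. join the contents of all the matching files, in order
  • *: wildcard, matching any characters around the key letters
  • .fa: the key letters
  • >: redirects the output into a file
  • ApisMellifera_4.5.fasta: the name of our new file
• The new file's name also matches *.fa*, so on a second run cat would find its own output among the inputs. Delete ApisMellifera_4.5.fasta before re-running the command.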
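For readers who want the same step outside the shell, here is a minimal Python sketch of cat *.fa* > ApisMellifera_4.5.fasta. The function concatenate_fasta is hypothetical. It joins the matching files in sorted name order (the shell also sorts glob matches, though its locale-aware order can differ for mixed-case names) and skips the output file, so a re-run does not read it back in.

```python
from pathlib import Path


def concatenate_fasta(directory: str, output_name: str, pattern: str = "*.fa*") -> Path:
    """Join every file matching `pattern` into one file, like `cat *.fa* > output`.

    Hypothetical helper: inputs are read in sorted name order and the output
    file itself is excluded from the inputs.
    """
    folder = Path(directory)
    out_path = folder / output_name
    # Build the input list before opening the output, as the shell expands the glob first.
    inputs = sorted(p for p in folder.glob(pattern) if p.name != output_name)
    with open(out_path, "wb") as dst:
        for path in inputs:
            dst.write(path.read_bytes())  # byte-for-byte, like cat
    return out_path
```

Running it twice on the same folder produces the same file, which is the behaviour the shell version only gets if the old output is deleted first.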
To convert this to GC content
• ./percentGC ApisMellifera_4.5.fasta
• ./percentGC is a program in the current directory that turns ApisMellifera_4.5.fasta into a table of GC content.
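The percentGC program itself isn't shown, so its exact output columns are unknown. As an assumption, this Python sketch writes one tab-separated row per FASTA record (name, percent GC, length), so that the second column is a 0 to 100 percentage like the V2 column plotted in R. The functions read_fasta, percent_gc, and gc_table are hypothetical. The percentage counts only A, C, G, and T, so runs of N in scaffold gaps do not drag it down.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; name is the first word after '>'."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)  # finish the previous record
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line)  # sequence may wrap over many lines
    if name is not None:
        yield name, "".join(chunks)


def percent_gc(seq: str) -> float:
    """Percent of G/C among A/C/G/T bases; N and other codes are ignored."""
    seq = seq.upper()  # soft-masked (lowercase) bases count too
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def gc_table(path: str) -> Iterator[str]:
    """Yield 'name<TAB>%GC<TAB>length' rows for each record in a FASTA file."""
    with open(path) as handle:
        for name, seq in read_fasta(handle):
            yield f"{name}\t{percent_gc(seq):.2f}\t{len(seq)}"
```

Usage: for row in gc_table("ApisMellifera_4.5.fasta"): print(row)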
Open this file in R
> honeyBeeGC <- read.table("ApisMellifera_4.5gc")
> names(honeyBeeGC)
This should print
[1] "V1" "V2" "V3"
If you want a histogram of the second column (V2):
> hist(honeyBeeGC$V2, breaks=seq(0, 100, len=1000))
• Use join if the files have a common field (join expects both files to be sorted on that field)
• Use cat to glue fasta files together in order: cat file1 file2 file3 > redirectedOutput
• Transliterating: http://stackoverflow.com/questions
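A join on a common field can be sketched in Python as a dictionary lookup, for example to combine a GC table and a length table that share sequence names in their first column. This is a hypothetical helper (join_tables), not a re-implementation of the Unix join utility: it does not need sorted input and keeps only names present in both tables. If a name repeats in the second table, the last row wins, whereas join outputs every pairing.

```python
from typing import Dict, Iterable, List


def join_tables(left_lines: Iterable[str], right_lines: Iterable[str]) -> List[str]:
    """Inner-join two tab-separated tables on their first field.

    Rows keep the left table's order; names missing from either table are dropped.
    """
    right: Dict[str, List[str]] = {}
    for line in right_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        right[fields[0]] = fields[1:]  # a repeated name overwrites the earlier row
    joined = []
    for line in left_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] in right:
            joined.append("\t".join(fields + right[fields[0]]))
    return joined
```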
• List of species
• Unzipping gzips
• Current process to generate our .gc format file:
  • percentGC Abdf.fasta > gctemp
  • seqlen.awk Abr.fasta > sitemp_tabs
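The contents of seqlen.awk aren't shown; judging by its name, it reports sequence lengths. A Python sketch under that assumption uses a hypothetical function, sequence_lengths, that returns one name<TAB>length row per FASTA record, taking the name as the first word of the header line.

```python
from typing import Iterable, List


def sequence_lengths(lines: Iterable[str]) -> List[str]:
    """Return one 'name<TAB>length' row per FASTA record."""
    rows: List[str] = []
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                rows.append(f"{name}\t{length}")  # close the previous record
            header = line[1:].split()
            name, length = (header[0] if header else ""), 0
        elif line:
            length += len(line)  # sequence may wrap over many lines
    if name is not None:
        rows.append(f"{name}\t{length}")
    return rows
```

Usage: with open("Abr.fasta") as f: rows = sequence_lengths(f)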
• Cut out the first two fields of gctemp and send them to a temp file (cut splits fields on tabs by default):
  • cut -f1-2 gctemp_tabs
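The cut step can be sketched in Python as well. cut_fields is a hypothetical helper that keeps tab-separated fields first through last (1-based, inclusive); like cut without -s, it passes lines that contain no tab through unchanged.

```python
from typing import Iterable, List


def cut_fields(lines: Iterable[str], first: int = 1, last: int = 2) -> List[str]:
    """Keep tab-separated fields first..last (1-based, inclusive), like `cut -f1-2`."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            rows.append(line)  # cut prints delimiter-free lines unchanged
        else:
            rows.append("\t".join(line.split("\t")[first - 1:last]))
    return rows
```

Usage: with open("gctemp_tabs") as f: rows = cut_fields(f)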