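Command-line arguments

python myprogram.py 10 100 output.txt

sys.argv: [ "myprogram.py", "10", "100", "output.txt" ]

First item in the sys.argv list: the program name. The next items are the command-line arguments. All items are strings, so numeric arguments such as "10" must be converted (e.g. with int) before they are used as numbers.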
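A minimal sketch of reading these arguments, written with Python 3 style print calls. The helper name parse_args and the meaning given to the three arguments (two integers and an output file name) are assumptions made for illustration; the slide only shows the command line itself.

```python
import sys


def parse_args(argv):
    """Split an argv-style list into (program, first, second, filename).

    Hypothetical helper: what the three arguments mean is an assumption.
    """
    program_name = argv[0]   # first item is always the program name
    first = int(argv[1])     # arguments arrive as strings, so convert
    second = int(argv[2])
    filename = argv[3]       # stays a string
    return program_name, first, second, filename


# Simulate the command line from the slide instead of the real sys.argv.
example_argv = ["myprogram.py", "10", "100", "output.txt"]
print(parse_args(example_argv))
# In a real script you would call parse_args(sys.argv) instead.
```

Passing the list in as a parameter, rather than reading sys.argv inside the function, makes the conversion easy to try out without a command line.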
Long headers in Fasta file

>AC121234 Medicago truncatula clone mth2-19o7, WORKING DRAFT SEQUENCE, 5 unordered pieces.
GGTGAAGGATGAGGATTTGCAAAAGACGGCCTTTAGGACACGTTATGGT
CATTACGAGTACAAAGTGATGCCTTTCGGTGTTACTAAGGCGCCTGGTG
TTTTTATGGAGTACATGAACCG…
…

• Some applications can’t handle long headers
• Python program for “pruning” the headers, leaving just the unique ID?
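prune.py

Annotations on the code:
• Instructions for how to use the function
• sys.argv: the program reads its command-line arguments
• for line in infile: memory efficient, since only one line is handled at a time. Note: each line ends in a newline
• Header line: there is no newline in the first field after splitting the line, so use print, which adds one
• Sequence line: the newline is still there, so write the line as it is, with no extra newline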
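The code of prune.py itself is not reproduced here, so the following is only a sketch of what the annotations describe, written in Python 3. The names prune_headers and main, and the assumption that the program takes an input and an output file name, are illustrative and not taken from the original program.

```python
import io
import sys


def prune_headers(infile, outfile):
    """Copy a Fasta file, keeping only the ID in each header line."""
    for line in infile:                      # one line at a time: memory efficient
        if line.startswith(">"):
            fields = line.split()            # split() also drops the newline
            print(fields[0], file=outfile)   # print adds the newline back
        else:
            outfile.write(line)              # newline still there: write as is


def main(argv):
    """Usage: python prune.py <input.fasta> <output.fasta>"""
    if len(argv) != 3:
        print(main.__doc__)
        return
    with open(argv[1]) as infile, open(argv[2], "w") as outfile:
        prune_headers(infile, outfile)


# In-memory demonstration; the strings stand in for real files.
demo_in = io.StringIO(
    ">AC121234 Medicago truncatula clone mth2-19o7\nGGTGAAGG\nCATTACGA\n"
)
demo_out = io.StringIO()
prune_headers(demo_in, demo_out)
print(demo_out.getvalue(), end="")
# As a script, one would call main(sys.argv) instead.
```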
Before / after pruning

Before:
>AC121234 Medicago truncatula clone mth2-19o7, WORKING DRAFT SEQUENCE, 5 unordered pieces.
GGTGAAGGATGAGGATTTGCAAAAGACGGCCTTTAGGACACGTTATGGT
CATTACGAGTACAAAGTGATGCCTTTCGGTGTTACTAAGGCGCCTGGTG
TTTTTATGGAGTACATGAACCG…
…

After:
>AC121234
GGTGAAGGATGAGGATTTGCAAAAGACGGCCTTTAGGACACGTTATGGT
CATTACGAGTACAAAGTGATGCCTTTCGGTGTTACTAAGGCGCCTGGTG
TTTTTATGGAGTACATGAACCG…
…
More string methods: splitlines, join, replace

Parents Music Resource Center
Concerning: crude language in much of today’s music
Task: implement censorship to remove bad words
censorship.py

Annotations on the code:
• Split the text into a list of lines
• In each line, replace each bad word with BEEP
• If any words were BEEPed, print the line and play one beep per word
• Join the censored lines with newlines and return the full text
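The listing of censorship.py is not reproduced here. The sketch below follows the four annotated steps using splitlines, replace and join; the function name censor, the word list and the sample text are all hypothetical, and the audible beep is left out.

```python
def censor(text, bad_words):
    """Return (censored_text, number_of_beeped_words)."""
    censored_lines = []
    total = 0
    for line in text.splitlines():           # split text into a list of lines
        beeps = 0
        for word in bad_words:
            beeps += line.count(word)         # count before replacing
            line = line.replace(word, "BEEP")
        if beeps:
            print(line)                       # show each line that was changed
        total += beeps
        censored_lines.append(line)
    # Join the censored lines with newlines and return the full text.
    return "\n".join(censored_lines), total


# Hypothetical word list and text, chosen only for illustration.
sample = "What the heck\nA fine day\nPlease check the checklist"
censored, count = censor(sample, ["heck", "darn"])
print("Beeped words:", count)
```

Because replace works on substrings, "check" becomes "cBEEP" in this sample. This is a consequence of matching substrings rather than whole words.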
"In these moments, moments of our lives All the world is ours And this world is so right You and I sharing this time together Sharing the same dream As the time goes by we will find These are the special times Times we'll remember These are the precious times The tender times we'll hold in our hearts forever These are the sweetest times These times together And through it all, one thing will always be true The special times are the times I share with you With each moment, moment passing by We'll make memories that will last all our lives As you and I travel through time together Living this sweet dream And every day we can say .. With each moment, moment pBEEPing by Beeped words: 1 Program tested on two songs. Celine Dion: We find words containing a bad word: not desirable here. See exercise.
Crime Mob : Ol' stankin BEEP (Hoe) Jank BEEP (Hoe) Suck my BEEP you (Hoe) Ol' fat BEEP (Hoe) But aiight! We finna get these lame BEEP niggaz You see a hoe BEEP nigga, call his BEEP out. Aye! Aye! Stomp his BEEP like (Hoe) Ol' lame BEEP (Hoe) I'ma tell you how it is nigga you betta get the BEEP back cause a nigga like me don't give a BEEP A nigga suppose to gon leave yo BEEP choked You sound like a BEEP yo BEEP I'ma hit we don't give a BEEP cause you is a lame One hitter quitter yo BEEP get popped Back the BEEP up 'fore I show you who reala Whats up wit ya BEEP nigga Ol' sucka BEEP, busta BEEP, cryin to yo momma BEEP I'ma keep up drama I'm a muthaBEEPin plum BEEP See you just a dumb BEEP go on wit yo young BEEP Try me like a sucka but I know you just a lame BEEP In my section they glad to see a nigga that don't give a BEEP Stomp you to the floor and tell you get yo pussy BEEP up Pick that nigga BEEP up, tear his lame BEEP up Niggaz representin Ellenwood time to mBEEP up Throwin blows like Johnny Cage, you think you wanna BEEP wit me Do this BEEP like Pastor Troy Uuh Huh I'm outside hoe Take my BEEPin word I ain't got no reason to lie hoe Beeped words: 34
Regular Expressions – Motivation

Problem: search suspicious text for any Danish email address: <something>@<something>.dk

text1 = "No Danish email here firstname.lastname@example.org *@$@.hls.29! fj3a"
text2 = "But here: email@example.com what a *(.@#$ nice @#*.( el ds"
text3 = "And here perhaps? firstname.lastname@example.org@bogus@dk @.dk a@.dk"

- Cumbersome using ordinary string methods.
RegExp solution (to be explained later)

Text2 contains this Danish email address: email@example.com
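The regular expression used for this solution is not reproduced here. As a rough sketch only, one possible pattern for <something>@<something>.dk is shown below; the pattern, the name find_danish_email and the sample address are assumptions, and the pattern is deliberately simplified rather than a full email validator.

```python
import re

# Assumed pattern: word characters, dots or hyphens, then "@", then a
# domain made of dot-separated labels whose last label is "dk".
DANISH_EMAIL = re.compile(r"[\w.-]+@[\w-]+(?:\.[\w-]+)*\.dk\b")


def find_danish_email(text):
    """Return the first Danish-looking address in text, or None."""
    match = DANISH_EMAIL.search(text)
    return match.group() if match else None


# Hypothetical sample texts modelled on the ones in the slides.
samples = [
    "No Danish email here *@$@.hls.29! fj3a",
    "But here: jane.doe@mail.example.dk what a nice el ds",
    "And here perhaps? bogus@dk @.dk a@.dk",
]
for number, text in enumerate(samples, start=1):
    address = find_danish_email(text)
    if address:
        print("Text%d contains this Danish email address: %s" % (number, address))
```

The near misses in the third sample (bogus@dk, @.dk, a@.dk) are rejected because the pattern requires at least one character on each side of the @ and a literal ".dk" at the end of the domain.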
Regular Expressions

• Instead of searching for a specific string, we can search for a text pattern
• A regular expression is a representation of a text pattern
• In Python, regular expression processing capabilities are provided by the module re
Example

Simple regular expression:

regExp = "football"

- matches only the string "football"

To search a text for regExp, we can use

re.search( regExp, text )
Compiling Regular Expressions

re.search( regExp, text )
• Compiles regExp to a special SRE_Pattern (RegexObject) object
• Searches for this SRE_Pattern in text
• Result is an SRE_Match object, or None if nothing matches

If we need to search for regExp several times, it is more efficient to compile it once and for all:

compiledRE = re.compile( regExp )
compiledRE.search( text )

1. Now compiledRE is an SRE_Pattern object
2. Use the search method of this SRE_Pattern to search text
3. Result is the same SRE_Match object

(In recent Python 3 versions these types are named re.Pattern and re.Match.)
Searching for 'football'

import re

text1 = "Here are the football results: Bosnia - Denmark 0-7"
text2 = "We will now give a complete list of python keywords."

regularExpression = "football"

# Compile the regular expression and get the SRE_Pattern object
compiledRE = re.compile( regularExpression )

# Use the same SRE_Pattern object to search both texts and get
# either an SRE_Match object or None if the search was unsuccessful
SRE_Match1 = compiledRE.search( text1 )
SRE_Match2 = compiledRE.search( text2 )

if SRE_Match1:
    print( "Text1 contains the substring 'football'" )
if SRE_Match2:
    print( "Text2 contains the substring 'football'" )

Output:
Text1 contains the substring 'football'
Building more sophisticated patterns

Metacharacters:
?: matches zero or one occurrence of the expression it follows
+: matches one or more occurrences of the expression it follows
*: matches zero or more occurrences of the expression it follows

# search for zero or one t, followed by two a's:
regExp1 = "t?aa"
# search for g followed by one or more c's followed by one a:
regExp2 = "gc+a"
# search for ct followed by zero or more g's followed by one a:
regExp3 = "ctg*a"
metacharacters.py

Use the SRE_Pattern objects to search the text and get SRE_Match objects

Output:
Text contains the regular expression t?aa
Text contains the regular expression gc+a
Text contains the regular expression ctg*a
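The listing of metacharacters.py is not reproduced here, so the following is a sketch of how the three patterns could be compiled and searched. The text being searched and the helper name matching_patterns are assumptions; only the patterns and the output wording come from the slides.

```python
import re


def matching_patterns(text, patterns):
    """Return the patterns that match somewhere in text."""
    found = []
    for pattern in patterns:
        compiled = re.compile(pattern)   # pattern object
        if compiled.search(text):        # match object, or None
            found.append(pattern)
    return found


# Hypothetical DNA-like text; the original text is not shown.
text = "ggaactgcccatcta"
for pattern in matching_patterns(text, ["t?aa", "gc+a", "ctg*a"]):
    print("Text contains the regular expression " + pattern)
```

In this sample text, "t?aa" matches "aa" (zero t's), "gc+a" matches "gccca", and "ctg*a" matches "cta" (zero g's).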