LING 408/508: Computational Techniques for Linguists Lecture 14 9/24/2012
Outline • Strings • Print function and string formatting • Long HW #4
Declaring strings
• Strings can be enclosed in single quotes or double quotes
>>> s1 = 'spam'
>>> s2 = "spam"
>>> s5 = ''    # empty string
>>> s6 = ""    # empty string
• Multi-line strings: triple quotes
>>> s3 = '''spam
and eggs'''
>>> s4 = """spam
and eggs"""
Escape sequences: print special characters
• Whitespace characters (other than space):
\t tab character
\n newline
• Example:
>>> print('help\t\tme\nplease')
help            me
please
• A multi-line string contains a newline
>>> '''spam
and eggs'''
'spam\nand eggs'
Escape sequences • Quotes need to be escaped to distinguish from beginning/end of string markers \' single quote \" double quote • Example: >>> 'spa\'m' "spa'm" >>> "spa\"m" 'spa"m'
Escape sequences • However, double quotes may be used without escaping in a single-quoted string, and vice versa >>> s1 = "I ask, \"where is Sarah?\"" >>> s2 = 'I ask, \'where is Sarah?\'' >>> s3 = "I ask, 'how is David?'" >>> s4 = 'I ask, "how is David?"'
Backslash
• Since the backslash is interpreted as an escape, we need to use an escape sequence when we want the backslash character itself in a string
\\ backslash
>>> s1 = "hello \napalm"
>>> print(s1)
hello 
apalm
>>> s2 = "hello \\napalm"
>>> print(s2)
hello \napalm
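One way to check how an escape sequence is stored is to compare string lengths. The sketch below reuses the two strings from the slide; the variable names are chosen only for illustration.

```python
# Each escape sequence is stored as a single character.
with_newline = "hello \napalm"      # \n is one newline character
with_backslash = "hello \\napalm"   # \\ is one backslash, followed by the letter n

print(len(with_newline))     # 12
print(len(with_backslash))   # 13
print(repr(with_backslash))  # repr shows the escaped form again
```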
Raw strings
• A raw string is indicated by an r preceding the string
• Raw strings turn off the escape mechanism; Python interprets the string contents literally
• Especially useful for file names
• Example:
filename1 = 'C:\\mydir\\myfile.txt'
filename2 = r'C:\mydir\myfile.txt'
One catch with raw strings
• Due to limitations of the tokenizer, raw strings may not have a trailing backslash.
• Doesn’t work:
>>> dir1 = r'C:\mydir\'
SyntaxError: EOL while scanning string literal
• Must use a double backslash or omit the final backslash:
>>> dir2 = 'C:\\mydir\\'
>>> dir3 = r'C:\mydir'
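A minimal sketch of both points: the raw and escaped spellings give the same string, and one possible workaround for the trailing backslash is to append it from an ordinary literal. The variable names are illustrative only.

```python
# Raw and escaped literals describe the same characters.
escaped = 'C:\\mydir\\myfile.txt'
raw = r'C:\mydir\myfile.txt'
print(escaped == raw)   # True

# A raw string cannot end in a backslash, so append one from a normal literal.
dir_with_backslash = r'C:\mydir' + '\\'
print(dir_with_backslash)
```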
Indexing and slicing strings, just like lists
>>> s = 'python'
>>> s[3]
'h'
>>> s[:3]
'pyt'
>>> s[3:]
'hon'
>>> s[2:4]
'th'
>>> s[2:-2]
'th'
Strings are immutable (can’t be modified)
>>> s = 'python'
>>> s[0] = 'x'
Traceback (most recent call last):
  File "<pyshell#25>", line 1, in <module>
    s[0] = 'x'
TypeError: 'str' object does not support item assignment
>>> L = [1,2,3,4]    # but lists are mutable
>>> L[0] = 5
>>> L
[5, 2, 3, 4]
Create new strings >>> s1 = 'python' >>> s2 = 'big ' + s1 >>> s2 'big python' >>> s3 = s2[:4] + 'ball' + s2[4:] >>> s3 'big ball python'
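Because strings cannot be changed in place, "changing" one character means building a new string from slices. The helper below, `replace_at`, is a hypothetical name used only for this sketch.

```python
def replace_at(text, index, new_char):
    """Return a copy of text with the character at index replaced."""
    return text[:index] + new_char + text[index + 1:]

word = 'python'
changed = replace_at(word, 0, 'j')
print(changed)  # jython
print(word)     # python (the original string is unchanged)
```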
Looping over strings
>>> s1 = 'python'
>>> for c in s1:
        print(c)

p
y
t
h
o
n
List comprehension over strings >>> L = ['chicken', 'pot', 'pie'] >>> L2 = [s[::-1] for s in L] >>> L2 ['nekcihc', 'top', 'eip']
String operators
• Concatenation
>>> 'ham' + 'eggs'
'hameggs'
• Repetition
>>> 'eggs' * 3
'eggseggseggs'
• Membership
>>> 'cd' in 'abcde'
True
• Comparison operators
>>> 'b' < 'a'
False
Built-in functions >>> str() # string constructor '' >>> str('hello') 'hello' >>> str([1,2,3,4,5]) '[1, 2, 3, 4, 5]' >>> str((1, 'bye', 3.14)) "(1, 'bye', 3.14)"
Built-in functions
>>> len('spam')
4
• Type conversion through type constructors: useful for reading data from files (convert string to numeric types)
>>> int('356')
356
>>> float('3.56')
3.56
>>> str(356)
'356'
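As a small illustration of why these conversions matter, the sketch below treats a list of strings as if it had been read from a file; the values are made up for the example.

```python
# Hypothetical values, as they might look after being read from a file.
raw_counts = ['356', '12', '7']

# Convert each string to an integer before doing arithmetic.
counts = [int(text) for text in raw_counts]
total = sum(counts)
print(total)                     # 375

# Convert back to a string to concatenate with other text.
label = str(total) + ' tokens'
print(label)                     # 375 tokens
```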
Built-in functions
>>> a = input()    # read a string from the user
hello
>>> a
'hello'
>>> b = input()
[3,4,5]
>>> b
'[3,4,5]'
>>> c = eval(input())    # eval evaluates the string as a Python expression
3
>>> c    # result is an integer
3
>>> d = eval(input())
[3,4,5]
>>> d
[3, 4, 5]
Many string methods >>> dir(str) ['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
String methods >>> s = 'how_are_you' >>> s.split('_') ['how', 'are', 'you'] >>> s = 'how are\tyou\n' >>> s.split() # splits on whitespace ['how', 'are', 'you'] >>> ''.join(['how', 'are', 'you']) 'howareyou' >>> '_'.join(['how', 'are', 'you']) 'how_are_you'
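Combining `split` and `join` is a common way to normalize whitespace. This is a minimal sketch with an invented input string.

```python
messy = 'how   are\tyou\n'

# split() with no argument splits on any run of whitespace.
tokens = messy.split()
print(tokens)        # ['how', 'are', 'you']

# join() glues the tokens back together with a single space.
normalized = ' '.join(tokens)
print(normalized)    # how are you
```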
String methods >>> s = ' how are you\t' >>> s.strip() 'how are you' >>> s # doesn’t modify the string ' how are you\t' >>> s.lstrip() 'how are you\t' >>> s.rstrip() ' how are you'
String methods >>> s = 'good morning' >>> s.startswith('goo') True >>> s.endswith('ning') True >>> s[:3]=='goo' True >>> s[-4:]=='ning' True
String methods
>>> s = 'how are you\n'
>>> s.upper()    # shows return value
'HOW ARE YOU\n'
>>> s    # string was not modified
'how are you\n'
>>> s = s.upper()    # rebind s to the new string
>>> s
'HOW ARE YOU\n'
>>> s.lower()
'how are you\n'
>>> s.islower()    # returns boolean
False
>>> 'HELLO'.isupper()
True
count and replace >>> s = 'ab1cd1ef1gh' >>> s.count('1') 3 >>> s.replace('1', ' ') 'ab cd ef gh' >>> s 'ab1cd1ef1gh'
find and index
>>> s = 'howareyou'
>>> s.find('are')
3
>>> s.index('are')
3
>>> s.find('no')    # not in 'howareyou'
-1
>>> s.index('no')
Traceback (most recent call last):
  File "<pyshell#17>", line 1, in <module>
    s.index('no')
ValueError: substring not found
find and index >>> help(s.find) Help on built-in function find: find(...) S.find(sub [,start [,end]]) -> int Return the lowest index in S where substring sub is found, such that sub is contained within s[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 on failure. >>> help(s.index) Help on built-in function index: index(...) S.index(sub [,start [,end]]) -> int Like S.find() but raise ValueError when the substring is not found.
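The optional start argument described in the help text makes it possible to locate every occurrence of a substring. The function `find_all` below is a hypothetical helper written for this sketch, not a built-in string method.

```python
def find_all(text, sub):
    """Return the start index of every non-overlapping occurrence of sub."""
    positions = []
    if not sub:              # an empty substring would never advance the search
        return positions
    start = text.find(sub)
    while start != -1:       # find returns -1 when nothing more is found
        positions.append(start)
        start = text.find(sub, start + len(sub))
    return positions

print(find_all('ab1cd1ef1gh', '1'))   # [2, 5, 8]
print(find_all('howareyou', 'no'))    # []
```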
rfind >>> s = 'ab.cdef.ghi' >>> s.find('.') 2 >>> s.rfind('.') 7
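One typical use of `rfind` is separating a file extension from the rest of a name, since only the last dot matters. The helper name `split_extension` is an assumption made for this sketch.

```python
def split_extension(filename):
    """Split filename at its last dot into (base, extension)."""
    dot = filename.rfind('.')
    if dot == -1:                 # no dot at all: no extension
        return filename, ''
    return filename[:dot], filename[dot + 1:]

print(split_extension('ab.cdef.ghi'))  # ('ab.cdef', 'ghi')
print(split_extension('README'))       # ('README', '')
```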
isalpha, isalnum, isdigit >>> str.isalpha('abcde') # str is name of type True >>> str.isalpha('abcde1') False >>> str.isalpha('abcde+++') False >>> str.isalnum('abcde12345') True >>> str.isdigit('abcde12345') False >>> str.isdigit('12345') True
Useful string constants (in module string, which is rather obsolete)
>>> import string    # not same as str!
>>> dir(string)
['Formatter', 'Template', '_TemplateMetaclass', '__builtins__', '__cached__', '__doc__', '__file__', '__name__', '__package__', '_multimap', '_re', '_string', 'ascii_letters', 'ascii_lowercase', 'ascii_uppercase', 'capwords', 'digits', 'hexdigits', 'octdigits', 'printable', 'punctuation', 'whitespace']
>>> string.ascii_letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> string.digits
'0123456789'
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
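These constants combine well with string methods. As a rough sketch, `string.punctuation` can be passed to `strip` to remove punctuation from the edges of each token; the sample sentence is invented.

```python
import string

sentence = 'Hello, world! (This is a test.)'

# strip() removes any of the given characters from both ends of a token.
tokens = [tok.strip(string.punctuation) for tok in sentence.split()]
print(tokens)   # ['Hello', 'world', 'This', 'is', 'a', 'test']
```

Punctuation inside a token (for example the hyphen in 'very-long') is left alone, because `strip` only looks at the ends.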
Outline • Strings • Print function and string formatting • Long HW #4
String formatting • New style • format method on strings • Show you this first • Old style • Based on C printf function, also in C++, Java, etc. • Much code that you’ll see uses this style • Show you this afterwards
print function
>>> print(5)
5
>>> print(3, 5, 7)
3 5 7
>>> print(3, 5, 7, sep='->')    # default: sep=' '
3->5->7
>>> for i in range(3):    # default: end='\n'
        print(i, end=',')

0,1,2,
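Printing with `end=','` leaves a trailing comma and no final newline. If that is not wanted, one alternative (a sketch, not the only way) is to build the line with `join` and print it once.

```python
# Loop version: prints 0,1,2, with a trailing comma.
for i in range(3):
    print(i, end=',')
print()   # finish the line

# join version: separators go only between the items.
joined = ','.join(str(i) for i in range(3))
print(joined)   # 0,1,2
```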
format method
>>> s = '{} is fun'
>>> s
'{} is fun'
>>> s.format('recursion')
'recursion is fun'
>>> s2 = s.format('recursion')
>>> s2
'recursion is fun'
>>> print(s.format('recursion'))
recursion is fun
format method
>>> s = 'Sarah'
>>> d = 'David'
>>> print('{0} is neurotic'.format(d))
David is neurotic
>>> print('{} is also neurotic'.format(s))
Sarah is also neurotic
>>> print('{0} is wife of {1}'.format(s, d))
Sarah is wife of David
>>> print('{} is wife of {}'.format(s, d))
Sarah is wife of David
>>> print('{1} is husband of {0}'.format(s, d))
David is husband of Sarah
Codes for different types
>>> s = 'Sarah'
>>> d = 'David'
>>> # s: string
>>> print('{0:s}\'s car broke down'.format(d))
David's car broke down
>>> print('{0}\'s car broke down'.format(s))
Sarah's car broke down
>>> print('{0:d} is a crowd'.format(3))    # d: integer
3 is a crowd
>>> print('{0:f} is a crowd'.format(3))    # f: float
3.000000 is a crowd
Floating-point precision
>>> # default: 6 decimals for a float
>>> print('{0:f} is tasty'.format(3.14159))
3.141590 is tasty
>>> # specify 2 decimal places
>>> print('{0:.2f} is tasty'.format(3.14159))
3.14 is tasty
>>> # rounds, instead of truncating
>>> print('{0:.4f} is tasty'.format(3.14159))
3.1416 is tasty
>>> # to truncate instead, slice the string form of the number
>>> s = '{0}'.format(3.14159)    # '3.14159'; the s code is not valid for a float
>>> print(s[:s.rfind('.')+4])
3.141
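Precision codes are handy when reporting computed values such as relative frequencies. The counts below are invented for the sketch.

```python
count, total = 7, 9          # hypothetical counts
rel_freq = count / total     # 0.7777...

two_places = '{0:.2f}'.format(rel_freq)
four_places = '{0:.4f}'.format(rel_freq)
default = '{0:f}'.format(rel_freq)

print(two_places)    # 0.78
print(four_places)   # 0.7778
print(default)       # 0.777778
```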
Justification and padding
>>> print('---{:6d}---'.format(12))    # default
---    12---
>>> print('---{:06d}---'.format(12))    # pad with zero
---000012---
>>> print('---{:>6d}---'.format(12))    # right justify
---    12---
>>> print('---{:<6d}---'.format(12))    # left justify
---12    ---
>>> print('---{:^6d}---'.format(12))    # center
---  12  ---
Different defaults for justification
>>> print('---{:6d}---'.format(5))    # numbers: right-justified by default
---     5---
>>> print('---{:6s}---'.format('hi'))    # strings: left-justified by default
---hi    ---
Print justified tables
>>> X = ['wee', 'longer', 'very-long']
>>> Y = [1, 2, 33]
>>> for i in range(len(X)):
        s = '{:>15s}\t\t{:<d}'.format(X[i], Y[i])
        print(s)

            wee         1
         longer         2
      very-long         33
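The slide hard-codes a width of 15. A possible variation, sketched below, computes the column width from the data and passes it into the format spec as a nested field; the names `width` and `rows` are illustrative.

```python
words = ['wee', 'longer', 'very-long']
counts = [1, 2, 33]

# Use the longest word to decide how wide the first column must be.
width = max(len(w) for w in words)

rows = []
for word, count in zip(words, counts):
    # {:>{w}s} right-justifies the word in a field of w characters;
    # {:>3d} right-justifies the number in a field of 3 characters.
    rows.append('{:>{w}s}  {:>3d}'.format(word, count, w=width))

for row in rows:
    print(row)
```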
Old-style string formatting can be used
>>> print('%d is an integer' % 3)
3 is an integer
>>> print('%d and %d are integers' % (3, 4))
3 and 4 are integers
>>> print('pi is %.3f, I think' % 3.14159)
pi is 3.142, I think
>>> print('%s had a little lamb.' % 'Mary')
Mary had a little lamb.
>>> print('It tasted good.')
It tasted good.
Justification in old-style string formatting
>>> print('---%6d---' % 12)    # right justify
---    12---
>>> # DIFF: default right-just for strings
>>> print('---%6s---' % 'Mary')    # DIFF
---  Mary---
>>> # DIFFERENT: syntax for left justify
>>> print('---%-6d---' % 12)
---12    ---
>>> print('---%06.2f---' % 1.234)    # pad with zero
---001.23---
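To make the correspondence between the two styles concrete, the sketch below pairs each old-style expression with a new-style one that should produce the same text. Note the explicit `>` needed for strings in the new style.

```python
# Right-justified integer: same default in both styles.
old_int = '---%6d---' % 12
new_int = '---{:6d}---'.format(12)

# Right-justified string: default in old style, needs > in new style.
old_str = '---%6s---' % 'Mary'
new_str = '---{:>6s}---'.format('Mary')

# Left-justified integer: - in old style, < in new style.
old_left = '---%-6d---' % 12
new_left = '---{:<6d}---'.format(12)

print(old_int == new_int, old_str == new_str, old_left == new_left)
```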
Outline • Strings • Print function and string formatting • Long HW #4
Due Wednesday 10/3 • Link for data will be e-mailed to you
Example of concordancing software: http://www.filebuzz.com/software_screenshot/full/concordance-51987.gif