
LING 408/508: Computational Techniques for Linguists



Presentation Transcript


  1. LING 408/508: Computational Techniques for Linguists Lecture 14 9/24/2012

  2. Outline • Strings • Print function and string formatting • Long HW #4

  3. Declaring strings
     • Strings can be enclosed in single quotes or double quotes
         >>> s1 = 'spam'
         >>> s2 = "spam"
         >>> s5 = ''   # empty string
         >>> s6 = ""   # empty string
     • Multi-line strings: triple quotes
         >>> s3 = '''spam
         ... and eggs'''
         >>> s4 = """spam
         ... and eggs"""

  4. Escape sequences: print special characters
     • Whitespace characters (other than space):
         \t   tab character
         \n   newline
     • Example:
         >>> print('help\t\tme\nplease')
         help            me
         please
     • Multi-line string contains newline
         >>> '''spam
         ... and eggs'''
         'spam\nand eggs'

  5. Escape sequences
     • Quotes need to be escaped to distinguish them from the markers for the beginning and end of the string
         \'   single quote
         \"   double quote
     • Example:
         >>> 'spa\'m'
         "spa'm"
         >>> "spa\"m"
         'spa"m'

  6. Escape sequences
     • However, double quotes may be used without escaping in a single-quoted string, and vice versa
         >>> s1 = "I ask, \"where is Sarah?\""
         >>> s2 = 'I ask, \'where is Sarah?\''
         >>> s3 = "I ask, 'how is David?'"
         >>> s4 = 'I ask, "how is David?"'

  7. Backslash
     • Since backslash is interpreted as an escape, we need to use an escape sequence when we want the backslash character in a string
         \\   backslash
         >>> s1 = "hello \napalm"
         >>> print(s1)
         hello
         apalm
         >>> s2 = "hello \\napalm"
         >>> print(s2)
         hello \napalm

  8. Raw strings
     • A raw string is indicated by an r preceding the string
     • Raw strings turn off the escape mechanism; Python interprets the string contents literally
     • Especially useful for file names
     • Example:
         filename1 = 'C:\\mydir\\myfile.txt'
         filename2 = r'C:\mydir\myfile.txt'

  9. One catch with raw strings
     • Due to limitations of the tokenizer, a raw string may not end in a single backslash.
     • Doesn’t work:
         >>> dir1 = r'C:\mydir\'
         SyntaxError: EOL while scanning string literal
     • Must use a double backslash in a regular string, or omit the final backslash:
         >>> dir2 = 'C:\\mydir\\'
         >>> dir3 = r'C:\mydir'
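
A minimal sketch of the two points above: a raw string and an escaped string spell the same value, and a trailing backslash can be appended from a regular string. The path is the illustrative one from the slides, and the name dir_with_slash is made up for this example.

```python
# Two spellings of the same Windows-style path (illustrative path from the slides).
escaped = 'C:\\mydir\\myfile.txt'
raw = r'C:\mydir\myfile.txt'
print(escaped == raw)        # True

# A raw string literal cannot end in a single backslash,
# so the final backslash is appended from a regular string.
dir_with_slash = r'C:\mydir' + '\\'
print(dir_with_slash)        # C:\mydir\
print(len(dir_with_slash))   # 9
```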

  10. Indexing and slicing strings, just like lists
         >>> s = 'python'
         >>> s[3]
         'h'
         >>> s[:3]
         'pyt'
         >>> s[3:]
         'hon'
         >>> s[2:4]
         'th'
         >>> s[2:-2]
         'th'
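
As a small extension of the slices above, the sketch below shows negative indices and the optional third (step) value of a slice, which is what the s[::-1] reversal used later in the lecture relies on. The variable names are chosen for this example only.

```python
s = 'python'

last_char = s[-1]        # negative indices count from the end
last_three = s[-3:]      # from the third-to-last character to the end
every_other = s[::2]     # step of 2: indices 0, 2, 4
backwards = s[::-1]      # step of -1 walks the string in reverse
middle_step = s[1:5:2]   # indices 1 and 3

print(last_char)     # n
print(last_three)    # hon
print(every_other)   # pto
print(backwards)     # nohtyp
print(middle_step)   # yh
```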

  11. Strings are immutable (can’t be modified)
         >>> s = 'python'
         >>> s[0] = 'x'
         Traceback (most recent call last):
           File "<pyshell#25>", line 1, in <module>
             s[0] = 'x'
         TypeError: 'str' object does not support item assignment
         >>> L = [1,2,3,4]   # but lists are mutable
         >>> L[0] = 5
         >>> L
         [5, 2, 3, 4]

  12. Create new strings
         >>> s1 = 'python'
         >>> s2 = 'big ' + s1
         >>> s2
         'big python'
         >>> s3 = s2[:4] + 'ball ' + s2[4:]
         >>> s3
         'big ball python'
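
Because strings are immutable, "changing" a character really means building a new string from slices. The helper below is a hypothetical sketch of that idea (the name replace_char is not from the lecture).

```python
def replace_char(s, i, ch):
    """Return a new string equal to s with the character at index i replaced by ch."""
    # Slice up to i, add the new character, then the rest after i.
    return s[:i] + ch + s[i + 1:]

word = 'python'
new_word = replace_char(word, 0, 'x')
print(new_word)   # xython
print(word)       # python (the original string is unchanged)
```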

  13. Looping over strings
         >>> s1 = 'python'
         >>> for c in s1:
         ...     print(c)
         ...
         p
         y
         t
         h
         o
         n

  14. List comprehension over strings
         >>> L = ['chicken', 'pot', 'pie']
         >>> L2 = [s[::-1] for s in L]
         >>> L2
         ['nekcihc', 'top', 'eip']

  15. String operators
     • Concatenation
         >>> 'ham' + 'eggs'
         'hameggs'
     • Repetition
         >>> 'eggs' * 3
         'eggseggseggs'
     • Membership
         >>> 'cd' in 'abcde'
         True
     • Comparison operators
         >>> 'b' < 'a'
         False

  16. Built-in functions
         >>> str()   # string constructor
         ''
         >>> str('hello')
         'hello'
         >>> str([1,2,3,4,5])
         '[1, 2, 3, 4, 5]'
         >>> str((1, 'bye', 3.14))
         "(1, 'bye', 3.14)"

  17. Built-in functions
         >>> len('spam')
         4
     • Type conversion through type constructors: useful for reading data from files (convert string to numeric types)
         >>> int('356')
         356
         >>> float('3.56')
         3.56
         >>> str(356)
         '356'

  18. Built-in functions
         >>> a = input()   # read a string from the user
         hello
         >>> a
         'hello'
         >>> b = input()
         [3,4,5]
         >>> b
         '[3,4,5]'
         >>> c = eval(input())   # eval evaluates the string as a Python expression
         3
         >>> c                   # result is an integer
         3
         >>> d = eval(input())
         [3,4,5]
         >>> d
         [3, 4, 5]
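
One caveat with eval(input()) is that eval runs whatever expression the user types. A possible alternative, sketched here as an assumption rather than as part of the lecture, is ast.literal_eval from the standard library, which accepts only literals such as numbers, strings, lists, and tuples. The wrapper name parse_literal is hypothetical, and fixed strings stand in for input() so the sketch runs without a user.

```python
import ast

def parse_literal(text):
    """Convert a string holding a Python literal into the corresponding object."""
    return ast.literal_eval(text)

print(parse_literal('3'))         # 3
print(parse_literal('[3,4,5]'))   # [3, 4, 5]

# Anything that is not a plain literal (for example a function call) is rejected.
try:
    parse_literal('__import__("os").getcwd()')
    rejected = False
except ValueError:
    rejected = True
print(rejected)                   # True
```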

  19. Many string methods >>> dir(str) ['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

  20. String methods
         >>> s = 'how_are_you'
         >>> s.split('_')
         ['how', 'are', 'you']
         >>> s = 'how are\tyou\n'
         >>> s.split()   # splits on whitespace
         ['how', 'are', 'you']
         >>> ''.join(['how', 'are', 'you'])
         'howareyou'
         >>> '_'.join(['how', 'are', 'you'])
         'how_are_you'
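
A small sketch of how split and join combine in practice: a line of text with mixed whitespace is broken into tokens, counted, and re-joined with single spaces. The sample sentence is invented for this example.

```python
line = 'the cat saw the dog\tand the bird\n'

tokens = line.split()            # split on any run of whitespace
n_tokens = len(tokens)
n_the = tokens.count('the')      # list.count: occurrences of one token
normalized = ' '.join(tokens)    # re-join with single spaces

print(tokens)       # ['the', 'cat', 'saw', 'the', 'dog', 'and', 'the', 'bird']
print(n_tokens)     # 8
print(n_the)        # 3
print(normalized)   # the cat saw the dog and the bird
```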

  21. String methods
         >>> s = ' how are you\t'
         >>> s.strip()
         'how are you'
         >>> s   # doesn’t modify the string
         ' how are you\t'
         >>> s.lstrip()
         'how are you\t'
         >>> s.rstrip()
         ' how are you'

  22. String methods
         >>> s = 'good morning'
         >>> s.startswith('goo')
         True
         >>> s.endswith('ning')
         True
         >>> s[:3] == 'goo'
         True
         >>> s[-4:] == 'ning'
         True

  23. String methods
         >>> s = 'how are you\n'
         >>> s.upper()       # shows return value
         'HOW ARE YOU\n'
         >>> s               # string was not modified
         'how are you\n'
         >>> s = s.upper()   # rebind s to the new uppercase string
         >>> s
         'HOW ARE YOU\n'
         >>> s.lower()
         'how are you\n'
         >>> s.islower()     # returns boolean
         False
         >>> 'HELLO'.isupper()
         True

  24. count and replace
         >>> s = 'ab1cd1ef1gh'
         >>> s.count('1')
         3
         >>> s.replace('1', ' ')
         'ab cd ef gh'
         >>> s
         'ab1cd1ef1gh'

  25. find and index
         >>> s = 'howareyou'
         >>> s.find('are')
         3
         >>> s.index('are')
         3
         >>> s.find('no')   # not in 'howareyou'
         -1
         >>> s.index('no')
         Traceback (most recent call last):
           File "<pyshell#17>", line 1, in <module>
             s.index('no')
         ValueError: substring not found

  26. find and index
         >>> help(s.find)
         Help on built-in function find:
         find(...)
             S.find(sub [,start [,end]]) -> int
             Return the lowest index in S where substring sub is found,
             such that sub is contained within s[start:end]. Optional
             arguments start and end are interpreted as in slice notation.
             Return -1 on failure.
         >>> help(s.index)
         Help on built-in function index:
         index(...)
             S.index(sub [,start [,end]]) -> int
             Like S.find() but raise ValueError when the substring is not found.
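
The optional start argument described in the help text makes it possible to find every occurrence of a substring, not just the first. The function below is a hypothetical sketch (find_all is not a built-in method); it assumes non-overlapping matches and a non-empty substring.

```python
def find_all(s, sub):
    """Return the start index of every non-overlapping occurrence of sub in s."""
    if not sub:
        raise ValueError('sub must be a non-empty string')
    positions = []
    start = s.find(sub)
    while start != -1:                 # find returns -1 when nothing is left
        positions.append(start)
        start = s.find(sub, start + len(sub))   # resume after this match
    return positions

print(find_all('ab1cd1ef1gh', '1'))   # [2, 5, 8]
print(find_all('howareyou', 'no'))    # []
```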

  27. rfind
         >>> s = 'ab.cdef.ghi'
         >>> s.find('.')
         2
         >>> s.rfind('.')
         7

  28. isalpha, isalnum, isdigit
         >>> str.isalpha('abcde')   # str is name of type
         True
         >>> str.isalpha('abcde1')
         False
         >>> str.isalpha('abcde+++')
         False
         >>> str.isalnum('abcde12345')
         True
         >>> str.isdigit('abcde12345')
         False
         >>> str.isdigit('12345')
         True
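
The same tests can also be called as methods on the string itself, which is the more common spelling and combines naturally with list comprehensions. This sketch filters an invented token list.

```python
token = 'abcde12345'
same_result = token.isalnum() == str.isalnum(token)   # both call styles agree

tokens = ['hello', 'R2D2', '2012', '+++']
words = [t for t in tokens if t.isalpha()]     # letters only
numbers = [t for t in tokens if t.isdigit()]   # digits only
mixed = [t for t in tokens if t.isalnum()]     # letters and/or digits

print(same_result)   # True
print(words)         # ['hello']
print(numbers)       # ['2012']
print(mixed)         # ['hello', 'R2D2', '2012']
```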

  29. Useful string constants (in module string, which is rather obsolete)
         >>> import string   # not same as str!
         >>> dir(string)
         ['Formatter', 'Template', '_TemplateMetaclass', '__builtins__', '__cached__', '__doc__', '__file__', '__name__', '__package__', '_multimap', '_re', '_string', 'ascii_letters', 'ascii_lowercase', 'ascii_uppercase', 'capwords', 'digits', 'hexdigits', 'octdigits', 'printable', 'punctuation', 'whitespace']
         >>> string.ascii_letters
         'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
         >>> string.digits
         '0123456789'
         >>> string.punctuation
         '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
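
One use for these constants when working with text: strip accepts a string of characters to remove, so passing string.punctuation trims punctuation from both ends of a token. The helper name strip_punct and the sample sentence are made up for this sketch.

```python
import string

def strip_punct(token):
    """Remove leading and trailing punctuation characters from a token."""
    return token.strip(string.punctuation)

words = 'Hello, world! (It works.)'.split()
cleaned = [strip_punct(w) for w in words]

print(words)     # ['Hello,', 'world!', '(It', 'works.)']
print(cleaned)   # ['Hello', 'world', 'It', 'works']
```

Punctuation inside a token (as in "can't") is left alone, since strip only touches the ends.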

  30. Outline • Strings • Print function and string formatting • Long HW #4

  31. String formatting
     • New style
         • format method on strings
         • Show you this first
     • Old style
         • Based on C printf function, also in C++, Java, etc.
         • Much code that you’ll see uses this style
         • Show you this afterwards

  32. print function
         >>> print(5)
         5
         >>> print(3, 5, 7)
         3 5 7
         >>> print(3, 5, 7, sep='->')   # default: sep=' '
         3->5->7
         >>> for i in range(3):         # default: end='\n'
         ...     print(i, end=',')
         ...
         0,1,2,

  33. format method
         >>> s = '{} is fun'
         >>> s
         '{} is fun'
         >>> s.format('recursion')
         'recursion is fun'
         >>> s2 = s.format('recursion')
         >>> s2
         'recursion is fun'
         >>> print(s.format('recursion'))
         recursion is fun

  34. format method
         >>> s = 'Sarah'
         >>> d = 'David'
         >>> print('{0} is neurotic'.format(d))
         David is neurotic
         >>> print('{} is also neurotic'.format(s))
         Sarah is also neurotic
         >>> print('{0} is wife of {1}'.format(s, d))
         Sarah is wife of David
         >>> print('{} is wife of {}'.format(s, d))
         Sarah is wife of David
         >>> print('{1} is husband of {0}'.format(s, d))
         David is husband of Sarah

  35. Codes for different types
         >>> s = 'Sarah'
         >>> d = 'David'
         >>> # s: string
         >>> print('{0:s}\'s car broke down'.format(d))
         David's car broke down
         >>> print('{0}\'s car broke down'.format(s))
         Sarah's car broke down
         >>> print('{0:d} is a crowd'.format(3))   # d: integer
         3 is a crowd
         >>> print('{0:f} is a crowd'.format(3))   # f: float
         3.000000 is a crowd

  36. Floating-point precision
         >>> # default: 6 decimals for a float
         >>> print('{0:f} is tasty'.format(3.14159))
         3.141590 is tasty
         >>> # specify 2 decimal places
         >>> print('{0:.2f} is tasty'.format(3.14159))
         3.14 is tasty
         >>> # rounds, instead of truncating
         >>> print('{0:.4f} is tasty'.format(3.14159))
         3.1416 is tasty
         >>> # to truncate instead, slice the string form
         >>> s = '{0}'.format(3.14159)   # '3.14159'
         >>> print(s[:s.rfind('.')+4])
         3.141
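
To make the difference between rounding and truncating concrete, this sketch formats the same value both ways at three decimal places. The variable names are chosen for the example.

```python
x = 3.14159

rounded = '{:.3f}'.format(x)       # the format code rounds to 3 decimals

s = str(x)                         # '3.14159'
truncated = s[:s.find('.') + 4]    # keep the point plus 3 digits, no rounding

print(rounded)     # 3.142
print(truncated)   # 3.141
```

The slicing trick assumes the string form contains a decimal point and is not in scientific notation.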

  37. Justification and padding
         >>> print('---{:6d}---'.format(12))    # default
         ---    12---
         >>> print('---{:06d}---'.format(12))   # pad with zero
         ---000012---
         >>> print('---{:>6d}---'.format(12))   # right justify
         ---    12---
         >>> print('---{:<6d}---'.format(12))   # left justify
         ---12    ---
         >>> print('---{:^6d}---'.format(12))   # center
         ---  12  ---

  38. Different defaults for justification
         >>> print('---{:6d}---'.format(5))      # numbers: right-justified by default
         ---     5---
         >>> print('---{:6s}---'.format('hi'))   # strings: left-justified by default
         ---hi    ---

  39. Print justified tables
         >>> X = ['wee', 'longer', 'very-long']
         >>> Y = [1, 2, 33]
         >>> for i in range(len(X)):
         ...     s = '{:>15s}\t\t{:<d}'.format(X[i], Y[i])
         ...     print(s)
         ...
                     wee         1
                  longer         2
               very-long         33
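
A variation on the table above, offered as one possible approach: instead of a fixed width of 15, the column width is computed from the longest entry and passed into the format string as a nested field, and zip pairs the two lists. The names width and rows are made up for this sketch.

```python
X = ['wee', 'longer', 'very-long']
Y = [1, 2, 33]

width = max(len(x) for x in X)   # width of the longest label

# {:<{w}s} left-justifies the label in a field of width w;
# {:>3d} right-justifies the number in a field of width 3.
rows = ['{:<{w}s}  {:>3d}'.format(x, y, w=width) for x, y in zip(X, Y)]

for row in rows:
    print(row)
```

Every row comes out the same length, so the numbers line up in a right-justified column regardless of how long the labels are.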

  40. Old-style string formatting can be used
         >>> print('%d is an integer' % 3)
         3 is an integer
         >>> print('%d and %d are integers' % (3, 4))
         3 and 4 are integers
         >>> print('pi is %.3f, I think' % 3.14159)
         pi is 3.142, I think
         >>> print('%s had a little lamb.' % 'Mary')
         Mary had a little lamb.
         >>> print('It tasted good.')
         It tasted good.

  41. Justification in old-style string formatting
         >>> print('---%6d---' % 12)        # right justify
         ---    12---
         >>> # DIFFERENT: strings are also right-justified by default
         >>> print('---%6s---' % 'Mary')
         ---  Mary---
         >>> # DIFFERENT: syntax for left justify
         >>> print('---%-6d---' % 12)
         ---12    ---
         >>> print('---%06.2f---' % 1.234)  # pad with zero
         ---001.23---
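
To tie the two styles together, the sketch below pairs each old-style expression from the last two slides with a format-method spelling that should give the same string. Note the explicit > needed for strings in the new style, since their default justification differs.

```python
# Each pair: (old-style result, new-style result).
pairs = [
    ('%d is an integer' % 3,  '{:d} is an integer'.format(3)),
    ('pi is %.3f' % 3.14159,  'pi is {:.3f}'.format(3.14159)),
    ('---%6d---' % 12,        '---{:6d}---'.format(12)),
    ('---%-6d---' % 12,       '---{:<6d}---'.format(12)),
    ('---%6s---' % 'Mary',    '---{:>6s}---'.format('Mary')),
    ('---%06.2f---' % 1.234,  '---{:06.2f}---'.format(1.234)),
]

for old, new in pairs:
    print(old == new, repr(old))
```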

  42. Outline • Strings • Print function and string formatting • Long HW #4

  43. Due Wednesday 10/3 • Link for data will be e-mailed to you

  44. Example of concordancing software: http://www.filebuzz.com/software_screenshot/full/concordance-51987.gif
