
COSC 1306—COMPUTER SCIENCE AND PROGRAMMING PYTHON FUNCTIONS

COSC 1306—COMPUTER SCIENCE AND PROGRAMMING PYTHON FUNCTIONS. Jehan-François Pâris, jfparis@uh.edu. Module overview: we will learn how to read, create, and modify files. Pay special attention to pickled files: they are very easy to use!



  1. COSC 1306—COMPUTER SCIENCE AND PROGRAMMING PYTHON FUNCTIONS • Jehan-François Pâris • jfparis@uh.edu

  2. Module Overview • We will learn how to read, create, and modify files • Pay special attention to pickled files • They are very easy to use!

  3. The file system • Provides long-term storage of information • Stores data in stable storage (disk) • Cannot be RAM because: • Dynamic RAM loses its contents when powered off • Static RAM is too expensive • System crashes can corrupt the contents of main memory

  4. Overall organization • Data managed by the file system are grouped in user-defined data sets called files • The file system must provide a mechanism for naming these data • Each file system has its own set of conventions • All modern operating systems use a hierarchical directory structure

  5. Windows solution • Each device and each disk partition is identified by a letter • A: and B: were used by the floppy drives • C: is the first disk partition of the hard drive • If the hard drive has no other partition, D: denotes the DVD drive • Each device and each disk partition has its own hierarchy of folders

  6. Windows solution • [Figure: three separate folder trees, one per drive letter: C: (containing Windows, Users, and Program Files), a second disk D:, and a flash drive F:]

  7. UNIX/LINUX organization • Each device and disk partition has its own directory tree • Disk partitions are glued together through the mount operation to form a single tree • The typical user does not know where her files are stored

  8. UNIX/LINUX organization • [Figure: the root partition (/) contains bin and usr; the "magic" mount attaches the other partition at usr] • The second partition can be accessed as /usr

  9. Mac OS organization • Similar to Windows • Disk partitions are not merged • Represented by separate icons on the desktop

  10. Accessing a file (I) • Your Python programs are stored in a folder AKA directory • On my home PC it is C:\Users\Jehan-Francois Paris\Documents\Courses\1306\Python • All files in that directory can be directly accessed through their names • "myfile.txt"

  11. Accessing a file (II) • Files in subdirectories can be accessed by specifying the subdirectory first • Windows style: "test\\sample.txt" (note the double backslash) • Linux/Unix/Mac OS X style: "test/sample.txt" (this style generally works on Windows too)

  12. Why the double backslash? • The backslash is an escape character in Python • It combines with the character that follows it to represent non-printable characters • '\n' represents a newline • '\t' represents a tab • Must use '\\' to represent a plain backslash
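
A short sketch of how these escapes behave. The raw-string prefix r is not mentioned on the slide; it is standard Python and is shown here only as another way to write a Windows-style path.

```python
# Each escape sequence is one character, even though it is typed as two.
newline = '\n'
tab = '\t'
backslash = '\\'
print(len(newline), len(tab), len(backslash))

# Two equivalent ways to write the same Windows-style path.
escaped_path = "test\\sample.txt"   # doubled backslash, as on the slides
raw_path = r"test\sample.txt"       # raw string: backslashes are taken literally
print(escaped_path)
print(escaped_path == raw_path)
```

Both spellings produce the same 15-character string, so either can be passed to open().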

  13. Accessing a file (III) • For other files, must use full pathname • Windows Style: • "C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt"
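
Typing long path names by hand is error-prone. As a sketch only, the standard pathlib module can assemble them; it is not used on the slides. The folder names are the ones from the slides, and the "pure" path classes are used so that the example behaves the same on any operating system.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Relative path to a file in a subdirectory, in both styles.
win_relative = PureWindowsPath("test") / "sample.txt"
posix_relative = PurePosixPath("test") / "sample.txt"
print(win_relative)
print(posix_relative)

# Full Windows pathname: forward slashes are accepted and normalized.
python_folder = PureWindowsPath(
    "C:/Users/Jehan-Francois Paris/Documents/Courses/1306/Python")
win_full = python_folder / "myfile.txt"
print(win_full)
print(win_full.is_absolute())
```

In a real program, pathlib.Path picks the style of the machine it runs on, and its result can be passed directly to open().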

  14. Accessing file contents • Two-step process: • First we open the file • Then we access its contents (read or write) • When we are done, we close the file

  15. What happens at open() time? • The system verifies that you are an authorized user and that you have the right permission (read or write; execute permission exists but does not apply here) • If both checks pass, it returns a file handle (file descriptor)

  16. The file handle • Gives the user • Direct access to the file • No directory lookups • Authority to execute the file operations whose permissions have been requested
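
When one of these checks fails, Python's open() raises an exception instead of returning a handle. A minimal sketch, assuming a hypothetical helper named try_open and a file name chosen because it should not exist:

```python
def try_open(name, mode="r"):
    """Return an open file handle, or None if the request is refused.

    Hypothetical helper for illustration; it is not part of the slides.
    """
    try:
        return open(name, mode)
    except FileNotFoundError:
        print("No such file:", name)
    except PermissionError:
        print("Permission denied:", name)
    return None


handle = try_open("no_such_file_for_this_demo.txt")
print(handle)
```

FileNotFoundError is not one of the checks listed on the slide, but it is the failure a beginner is most likely to meet.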

  17. Python open() • open(name, mode = 'r', buffering = -1) where • name is the name of the file • mode is the permission requested • Default is 'r' for read-only • buffering specifies the buffer size • Use the system default value (code -1)

  18. The modes • Can request • 'r' for read-only • 'w' for write-only (always overwrites the file) • 'a' for append (writes at the end) • 'r+' or 'a+' for updating (read + write/append)
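
The difference between 'w' and 'a' is easiest to see by running them. The sketch below works in a temporary folder so that no real file is touched; the file name modes_demo.txt is made up for the example.

```python
import os
import tempfile

# Work in a temporary folder so no existing file is affected.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "modes_demo.txt")

f = open(path, "w")        # 'w' creates the file (or empties an existing one)
f.write("first\n")
f.close()

f = open(path, "w")        # 'w' again: the previous contents are lost
f.write("second\n")
f.close()

f = open(path, "a")        # 'a': new data goes after the existing contents
f.write("third\n")
f.close()

f = open(path, "r")        # 'r': read-only
contents = f.read()
f.close()
print(contents)
```

The line "first" never appears in the result because the second open in 'w' mode erased it.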

  19. Examples
      f1 = open("myfile.txt")  # same as f1 = open("myfile.txt", "r")
      f2 = open("test\\sample.txt", "r")
      f3 = open("test/sample.txt", "r")
      f4 = open("C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt")

  20. Reading a file • Three ways: • Global reads • Line by line • Pickled files

  21. Global reads • fh.read() • Returns whole contents of file specified by file handle fh • File contents are stored in a single string that might be very large

  22. Example
      f2 = open("test\\sample.txt", "r")
      bigstring = f2.read()
      print(bigstring)
      f2.close()  # not required

  23. Output of example
      To be or not to be that is the question
      Now is the winter of our discontent
      • Exact contents of file 'test\sample.txt'
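
The example marks close() as not required. A common alternative, not shown on the slides, is the with statement, which closes the file automatically when its block ends. So that it can run anywhere, this sketch first creates the sample file in a temporary folder; its two lines are the ones from the example.

```python
import os
import tempfile

# Create the sample file first so the example is self-contained.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "w") as out:
    out.write("To be or not to be that is the question\n")
    out.write("Now is the winter of our discontent")   # no final end-of-line

# Global read: the whole file ends up in one string.
with open(sample_path, "r") as f2:
    bigstring = f2.read()

print(bigstring)
print(f2.closed)   # the with block has already closed the file
```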

  24. Line-by-line reads
      for line in fh :  # do not forget the colon
          pass  # replace with anything you want
      fh.close()  # not required

  25. Example
      f3 = open("test/sample.txt", "r")
      for line in f3 :  # do not forget the colon
          print(line)
      f3.close()  # not required

  26. Output
      To be or not to be that is the question

      Now is the winter of our discontent
      • With one or more extra blank lines

  27. Why? • Each line ends with an end-of-line marker • print(…) adds an extra end-of-line

  28. Trying to remove blank lines
      print('----------------------------------------------------')
      f5 = open("test/sample.txt", "r")
      for line in f5 :  # do not forget the colon
          print(line[:-1])  # remove last char
      f5.close()  # not required
      print('-----------------------------------------------------')

  29. The output
      ----------------------------------------------------
      To be or not to be that is the question
      Now is the winter of our disconten
      -----------------------------------------------------
      • The last line did not end with an EOL, so it lost its final letter!

  30. A smarter solution (I) • Only remove the last character if it is an EOL
      if line[-1] == '\n' :
          print(line[:-1])
      else :
          print(line)

  31. A smarter solution (II)
      print('----------------------------------------------------')
      fh = open("test/sample.txt", "r")
      for line in fh :  # do not forget the colon
          if line[-1] == '\n' :
              print(line[:-1])  # remove last char
          else :
              print(line)
      print('-----------------------------------------------------')
      fh.close()  # not required

  32. It works!
      ----------------------------------------------------
      To be or not to be that is the question
      Now is the winter of our discontent
      -----------------------------------------------------
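
Two shorter alternatives, neither of which appears on the slides: rstrip('\n') removes trailing newline characters only when they are present, and print(..., end='') stops print from adding its own end-of-line. The sketch uses an in-memory list of lines in place of the file.

```python
# Stand-in for the lines of test/sample.txt; the last one has no EOL.
lines = ["To be or not to be that is the question\n",
         "Now is the winter of our discontent"]

# Alternative 1: strip the end-of-line marker if there is one.
cleaned = [line.rstrip('\n') for line in lines]
for line in cleaned:
    print(line)

# Alternative 2: keep each line as is and tell print not to add a newline.
for line in lines:
    print(line, end='')
print()   # finish the last line
```

Both loops print the two lines without blank lines in between and without losing the final letter.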

  33. Making sense of file contents • Most files contain more than one data item per line
      COSC 713-743-3350
      UHPD 713-743-3333
      • Must split lines • mystring.split(sepchar), where sepchar is a separation character, returns a list of items

  34. Splitting strings
      >>> text = "Four score and seven years ago"
      >>> text.split()
      ['Four', 'score', 'and', 'seven', 'years', 'ago']
      >>> record = "1,'Baker, Andy', 83, 89, 85"
      >>> record.split(',')
      ['1', "'Baker", " Andy'", ' 83', ' 89', ' 85']
      • Not what we wanted: the name was cut at its internal comma
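
One way around this problem, shown as a sketch rather than as the approach these slides take, is the standard csv module, which understands quoted fields. The two options used here are assumptions about this particular record: it quotes with single quotes and puts a blank after most commas.

```python
import csv

record = "1,'Baker, Andy', 83, 89, 85"

# quotechar="'"          : this record wraps the name in single quotes
# skipinitialspace=True  : ignore the blank that follows a comma
reader = csv.reader([record], quotechar="'", skipinitialspace=True)
fields = next(reader)
print(fields)
```

The name now comes back as the single item 'Baker, Andy', and the numbers no longer carry a leading blank.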

  35. Example
      # how2split.py
      print('----------------------------------------------------')
      f5 = open("test/sample.txt", "r")
      for line in f5 :
          words = line.split()
          for xxx in words :
              print(xxx)
      f5.close()  # not required
      print('-----------------------------------------------------')

  36. Output
      ----------------------------------------------------
      To
      be
      …
      of
      our
      discontent
      -----------------------------------------------------

  37. Other separators (I) • Commas • CSV Excel format • Values are separated by commas • Strings are stored without quotes, unless they contain a comma: "Doe, Jane",freshman,90,90 • Quotes within strings are doubled
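
Python's standard csv module applies these same rules when it writes a row. A small sketch, with an in-memory buffer standing in for the output file; the second row is a hypothetical record made up to show quote doubling.

```python
import csv
import io

buffer = io.StringIO()          # in-memory stand-in for an output file
writer = csv.writer(buffer, lineterminator="\n")

writer.writerow(["Doe, Jane", "freshman", 90, 90])        # contains a comma
writer.writerow(['Jane "JD" Doe', "freshman", 90, 90])    # contains quotes (made up)

csv_text = buffer.getvalue()
print(csv_text)
```

Only the fields that need protection are wrapped in double quotes; plain fields such as freshman are written as they are.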

  38. Other separators (II) • Tabs ('\t') • Advantages: • Your fields will appear nicely aligned • Spaces, commas, … are not an issue • Disadvantage: • You do not see them • They look like spaces

  39. Why it is important • When you must pick your file format, you should decide how the data inside the file will be used: • People will read them • Other programs will use them • Will be used by people and machines

  40. An exercise • Converting our output to CSV format • Replacing tabs by commas • Easy • Will use string replace function

  41. First attempt
      fh_in = open('grades.txt', 'r')  # the 'r' is optional
      buffer = fh_in.read()
      newbuffer = buffer.replace('\t', ',')
      fh_out = open('grades0.csv', 'w')
      fh_out.write(newbuffer)
      fh_in.close()
      fh_out.close()
      print('Done!')

  42. The output
      Alice 90 90 90 90 90
      Bob 85 85 85 85 85
      Carol 75 75 75 75 75
      becomes
      Alice,90,90,90,90,90
      Bob,85,85,85,85,85
      Carol,75,75,75,75,75

  43. Dealing with commas (I) • Work line by line • For each line • split input into fields using TAB as separator • store fields into a list
      Alice 90 90 90 90 90
      becomes
      ['Alice', '90', '90', '90', '90', '90']

  44. Dealing with commas (II) • Put within double quotes any entry containing one or more commas • Output list entries separated by commas
      ['"Baker, Alice"', 90, 90, 90, 90, 90]
      becomes
      "Baker, Alice",90,90,90,90,90

  45. Dealing with commas (III) • Our troubles are not over: • Must store somewhere all lines until we are done • Store them in a list

  46. Dealing with double quotes • Before wrapping items that contain commas in double quotes, replace all double quotes by pairs of double quotes
      'Aguirre, "Lalo" Eduardo'
      becomes
      'Aguirre, ""Lalo"" Eduardo'
      then
      '"Aguirre, ""Lalo"" Eduardo"'
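
The two steps can be sketched as one small function. The name quote_field is hypothetical; the order of the steps is the one given on the slide.

```python
def quote_field(item):
    """Prepare one field for CSV output (hypothetical helper, for illustration)."""
    item = item.replace('"', '""')      # step 1: double every double quote
    if ',' in item:                     # step 2: wrap fields that contain a comma
        item = '"' + item + '"'
    return item


print(quote_field('Aguirre, "Lalo" Eduardo'))
print(quote_field('Baker, Alice'))
print(quote_field('Alice'))
```

The order matters: doubling after wrapping would also double the two outer quotes. Strictly speaking, CSV also expects a field that contains a double quote but no comma to be wrapped; the sketch follows the rule as the slides state it.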

  47. General organization (I)
      linelist = [ ]
      for line in file
          itemlist = line.split(…)
          linestring = ''  # empty string
          for each item in itemlist
              remove any trailing newline
              double all double quotes
              if item contains comma, wrap
              add to linestring

  48. General organization (II)
      for line in file
          …
          for each item in itemlist
              double all double quotes
              if item contains comma, wrap
              add to linestring
          append linestring to linelist

  49. General organization (III)
      for line in file
          …
          remove last comma of linestring
          add newline at end of linestring
          append linestring to linelist
      for linestring in linelist
          write linestring into output file
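
The outline in the three slides above can be sketched as a single function. This is one possible rendering, not the author's program: the function name tabs_to_csv_lines is hypothetical, the trailing newline is removed once per line rather than per item, and the two sample lines are shortened records used only for the demonstration.

```python
def tabs_to_csv_lines(lines):
    """Convert tab-separated lines to CSV lines, following the outline."""
    linelist = []
    for line in lines:
        itemlist = line.rstrip('\n').split('\t')   # remove trailing newline, split
        linestring = ''                            # start afresh
        for item in itemlist:
            item = item.replace('"', '""')         # double all double quotes
            if ',' in item:                        # wrap items containing a comma
                item = '"' + item + '"'
            linestring += item + ','               # add to linestring
        linestring = linestring[:-1] + '\n'        # drop last comma, add newline
        linelist.append(linestring)
    return linelist


sample = ["Baker, Alice\t90\t90\n", "Bob\t85\t85"]   # sample data for the demo
converted = tabs_to_csv_lines(sample)
for linestring in converted:
    print(linestring, end='')
```

To produce a file instead of screen output, the final loop would call write(linestring) on a file opened in 'w' mode.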

  50. The program (I)
      # betterconvert2csv.py
      """ Convert tab-separated file to csv
      """
      fh = open('grades.txt','r')  # input file
      linelist = [ ]  # global data structure
      for line in fh :  # outer loop
          itemlist = line.split('\t')
          # print(str(itemlist))  # just for debugging
          linestring = ''  # start afresh
