
Working with files

Working with files. CISC3130, Spring 2013 X. Zhang. Outlines. Finish up with awk: pipeline, external commands Commands working with files tree, ls (-d option, -1 option, -R, -a) od (octal dump), stat (show meta data of file) touch command, temporary file, file with random bytes


Presentation Transcript


  1. Working with files CISC3130, Spring 2013 X. Zhang

  2. Outlines • Finish up with awk: pipeline, external commands • Commands working with files • tree, ls (-d option, -1 option, -R, -a) • od (octal dump), stat (show meta data of file) • touch command, temporary file, file with random bytes • File checksum, verification • locate, type, which, find command: Finding files

  3. Some useful tips • Bash stores the command history • Use the UP/DOWN arrows to browse it • Use "history" to show past commands • Repeat a previous command • !<command_no>, e.g., !239 • !<any prefix of a previous command>, e.g., !g++ • Search for a command • Type Ctrl-r, and then a string • Bash will search previous commands for a match • File name autocompletion: "tab" key

  4. Output redirection: to pipeline
      #!/bin/awk -f
      # pipe_mail.awk
      BEGIN {
          FS = ":"
          ## generate a temporary file
          mkcmd = "mktemp /tmp/prog.XXXXXXXX"
          mkcmd | getline tmpfile
          print "temp file is: ", tmpfile
          close(mkcmd)      ## close() needs the same string that opened the pipe
      }
      {
          # select username for users using bash
          if ($7 ~ "/bin/bash")
              print $1 >> tmpfile
      }
      END {
          close(tmpfile)    ## finish writing before reading the file back
          while ((getline < tmpfile) > 0) {
              cmd = "mail -s Fellow_BASH_USER " $0
              print "Hello," $0 | cmd    ## send an email to every bash user
              close(cmd)
          }
          close(tmpfile)
      }
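
The two pipeline directions used in the script above can be tried without sending mail. The following is a minimal sketch, assuming bash and any POSIX awk: it reads one line from a command with `cmd | getline` and writes selected fields into another command with `print | cmd`. The sample input, the `echo header` and `sort` commands, and the variable name `result` are illustrative assumptions, not part of the original slides.

```shell
#!/bin/bash
# Sketch: awk reading from a command and writing to a command.
# The input lines and the name "result" are made up for illustration.
result=$(printf 'carol:/bin/bash\nalice:/bin/bash\nbob:/bin/sh\n' |
awk -F: '
BEGIN {
    cmd = "echo header"            # command to read from
    cmd | getline first            # read its first output line into "first"
    close(cmd)                     # close with the same string used to open
    sorter = "sort"                # command to write to
}
$2 ~ /bash/ { print $1 | sorter }  # pipe matching user names into sort
END {
    close(sorter)                  # sort prints its output when the pipe closes
    print "read from command: " first
}')
echo "$result"
```

Closing `sorter` before the final `print` is a deliberate choice: it makes sort finish and emit its lines before awk writes anything of its own, so the output order does not depend on buffering.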

  5. Execute external command • Using the system function (similar to C/C++) • E.g., system("rm -f tmp") to remove a file
      if (system("rm -f tmp") != 0)
          print "failed to rm tmp"
  • A shell is started to run the command line passed as argument • The command inherits the awk program's standard input/output/error
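
As a hedged illustration of the return value of `system()`, the sketch below removes a scratch file from inside awk and then checks that it is gone. The scratch directory, the awk variable `f`, and the name `status_report` are assumptions made for the demo.

```shell
#!/bin/bash
# Sketch: checking the exit status returned by awk's system().
# The file lives in a scratch directory created only for this demo.
workdir=$(mktemp -d) || exit 1
touch "$workdir/tmp"
status_report=$(awk -v f="$workdir/tmp" 'BEGIN {
    if (system("rm -f \"" f "\"") != 0)     # rm -f succeeded if this is 0
        print "failed to rm tmp"
    else
        print "removed"
    if (system("test -e \"" f "\"") != 0)   # nonzero: the file no longer exists
        print "no longer exists"
}')
echo "$status_report"
rmdir "$workdir"
```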

  6. Outlines • Finish up with awk: pipeline, external commands • Commands working with files • tree, ls (-d option, -1 option, -R, -a) • od (octal dump), stat (show meta data of file), cmp, diff • touch command • temporary file, file with random bytes • locate, type, which, find command: Finding files

  7. What's in a file? • Files are organized in a hierarchical directory structure • Each file has a name, resides under a directory, and is associated with some meta info (permission, owner, timestamps) • Disk files, virtual file system, device files • Contents of a disk file: text (ASCII) file (such as your C/C++ source code), executable file (commands), a link to other files, … • ln -s /path/to/file1.txt /path/to/file2.txt • The /proc filesystem exposes system configuration parameters and resides in the kernel's memory • Numerical subdirectories exist for every process. • A device file or special file is an interface for a device driver that appears in a file system as if it were an ordinary file • For example, /dev/stdin, /dev/tty*

  8. What's in a file? • Recall, in ls -l output, the first character indicates the file type: • d directory, - plain file, b block-type special file, c character-type special file, l symbolic link, s socket • To check the type of a file: "file filename" • To view an "octal dump" of a file:
      od [OPTION]... [FILE]...
      od --traditional [FILE] [[+]OFFSET [[+]LABEL]]
  • Important options: • -A: what base to use when displaying addresses (default: base 8) • -t: specify how to interpret file content • a: named character, c: ASCII character or backslash representation • d[size]: signed decimal, size bytes per integer • o[size]: octal; x[size]: hexadecimal

  9. What's in a file? • Example of od
      $ echo abc def ghi jkl | od -c
      0000000   a   b   c       d   e   f       g   h   i       j   k   l  \n
      0000020
      [zhang@storm ~]$ echo abc def ghi jkl | od -Ad -c      ## same as -t c
      0000000   a   b   c       d   e   f       g   h   i       j   k   l  \n
      0000016
      $ echo abc def ghi jkl | od -Ad -t d1      ## interpret each byte as decimal integer
      0000000   97   98   99   32  100  101  102   32  103  104  105   32  106  107  108   10
      0000016
      $ echo abc def ghi jkl | od -Ad -t x1
      0000000 61 62 63 20 64 65 66 20 67 68 69 20 6a 6b 6c 0a
      0000016

  10. Disk space usage • df - report file system disk space usage: df [OPTION]... [FILE]... • Show information about the file system on which each FILE resides, or all file systems by default. • du - estimate file space usage: du [OPTION]... [FILE]... • Summarize disk usage of each FILE, recursively for directories. • quota - display disk usage and limits

  11. Compare file contents • Compare files • cmp file1 file2: finds the first place where two files differ (reported as a byte offset and a line number) • diff file1 file2: reports all lines that are different • diff's output is carefully designed so that it can be used by other programs. For example, revision control systems use diff to manage the differences between successive versions of files under their management. • patch command: apply a diff file to an original file
      patch [options] [originalfile [patchfile]]
      patch -pnum < patchfile
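
A short sketch of how cmp and diff behave on two small files, assuming the usual diffutils versions of both commands. The file names and contents are made up for the demo; patch is left out to keep the example minimal.

```shell
#!/bin/bash
# Sketch: cmp and diff on two small files created for the demo.
workdir=$(mktemp -d) || exit 1
printf 'one\ntwo\nthree\n' > "$workdir/old.txt"
printf 'one\n2\nthree\n'   > "$workdir/new.txt"

# cmp -s prints nothing; its exit status says whether the files are identical.
if cmp -s "$workdir/old.txt" "$workdir/new.txt"; then
    same=yes
else
    same=no
fi

# diff exits with status 1 when it finds differences, so "|| true" keeps
# a script running under "set -e" from stopping here.
changes=$(diff "$workdir/old.txt" "$workdir/new.txt" || true)
echo "identical: $same"
echo "$changes"
rm -r "$workdir"
```

The first line of the diff output, `2c2`, reads as "line 2 of the first file was changed into line 2 of the second file"; this compact notation is what lets other programs consume it.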

  12. File checksum • Provides a single number, a signature, that is characteristic of the file (computed from all of the bytes of the file) • Files with different contents are unlikely to have the same checksum • Usage: software announcements include checksums of distribution files so that users can tell whether a copy matches the original.

  13. openssl • a cryptography toolkit implementing Secure Sockets Layer and Transport Layer Security network protocols and related cryptography standards • openssl program: a command line tool for using various cryptography functions from shell. • Creation and management of private keys, public keys and parameters • Public key cryptographic operations • Creation of X.509 certificates, CSRs and CRLs • Calculation of Message Digests • Encryption and Decryption with Ciphers • SSL/TLS Client and Server Tests • Handling of S/MIME signed or encrypted mail • Time Stamp requests, generation and verification

  14. Message digest openssl dgst [-md5|-md4|-md2|-sha1|-sha|-mdc2|-ripemd160|-dss1] [-c] [-d] [-hex] [-binary] [-out filename] [-sign filename] [-keyform arg] [-passin arg] [-verify filename] [-prverify filename] [-signature filename] [-hmac key] [file...] Or [md5|md4|md2|sha1|sha|mdc2|ripemd160] [-c] [-d] [file...] • Output message digest of a supplied file or files in hexadecimal form

  15. Example
      $ md5sum /bin/l?
      696a4fa5a98b81b066422a39204ffea4  /bin/ln
      cd6761364e3350d010c834ce11464779  /bin/lp
      351f5eab0baa6eddae391f84d0a6c192  /bin/ls
  • Output: 32 hexadecimal digits, i.e., 128 bits. • The chance of two different files having identical signatures is 1/2^128 (the book: 1/2^64) • In 2005, researchers were able to create pairs of PostScript documents and X.509 certificates with the same hash. Later that year, MD5's designer Ron Rivest wrote, "md5 and sha1 are both clearly broken (in terms of collision-resistance)."
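
The checksum workflow for software distribution can be sketched as follows. Because the slide itself notes that MD5 is broken for collision resistance, this sketch swaps in `sha256sum`; it assumes the GNU coreutils version (the `-c` and `--status` options), and the file `dist.tar` is a stand-in created for the demo, not a real archive.

```shell
#!/bin/bash
# Sketch: record a checksum, then verify a copy against it.
workdir=$(mktemp -d) || exit 1
printf 'release contents\n' > "$workdir/dist.tar"   # stand-in distribution file

# What a publisher would post alongside the download.
sha256sum "$workdir/dist.tar" > "$workdir/dist.tar.sha256"

# -c re-computes the checksum and compares; prints "<file>: OK" on a match.
verify_ok=$(sha256sum -c "$workdir/dist.tar.sha256")

# Simulate a corrupted or tampered download and check again, quietly.
printf 'tampered\n' >> "$workdir/dist.tar"
if sha256sum -c --status "$workdir/dist.tar.sha256"; then
    verify_bad=match
else
    verify_bad=mismatch
fi
echo "$verify_ok"
echo "after modification: $verify_bad"
rm -r "$workdir"
```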

  16. public-key cryptography • Data security by two related keys: a private key, known only to its owner, and a public key, potentially known to anyone • Examples: RSA, DSA algorithms • Digital signature: Alice => Bob communication • If Alice wants to sign an open letter, she uses her private key to encrypt it. Bob uses Alice’s public key to decrypt signed letter, and can then be confident that only Alice could have signed it, provided that she is trusted not to divulge her private key. • Secrecy: • If Alice wants to send a letter to Bob that only he can read, she encrypts it with Bob’s public key, and he then uses his private key to decrypt it. As long as Bob keeps his private key secret, Alice can be confident that only Bob can read her letter.

  17. Secure Software Distribution • Many software archives include digital signatures that incorporate information from a file checksum as well as from the signer's private key. • How to verify such signatures?
      $ ls -l coreutils-5.0.tar*                 ## Show the distribution files
      -rw-rw-r-- 1 jones devel 6020616 Apr 2 2003 coreutils-5.0.tar.gz
      -rw-rw-r-- 1 jones devel      65 Apr 2 2003 coreutils-5.0.tar.gz.sig
      $ gpg coreutils-5.0.tar.gz.sig             ## Try to verify the signature
      gpg: Signature made Wed Apr 2 14:26:58 2003 MST using DSA key ID D333CBA1
      gpg: Can't check signature: public key not found

  18. Verify using public key • Obtain the public key from public servers • Add the public key to your key ring
      $ gpg --import temp.key
      gpg: key D333CBA1: public key "Jim Meyering <jim@meyering.net>" imported
      gpg: Total number processed: 1
      gpg: imported: 1
  • Verify the signature successfully:
      $ gpg coreutils-5.0.tar.gz.sig             ## Verify the digital signature
  • Online resource: The GNU Privacy Handbook

  19. Outlines • Finish up with awk: pipeline, external commands • Commands working with files • tree, ls and echo (-d option, -1 option, -R, -a) • od (octal dump), stat (show meta data of file), cmp, diff • touch command, mktemp, file with random bytes • File checksum, verification • locate, type, which, find command: Finding files • Process-related commands

  20. touch: update modification time • touch is sometimes used to create empty files: their existence and possibly their timestamps, but not their contents, are significant. • A lock file to indicate that a program is already running, and that a second instance should not be started. • To record a file timestamp for later comparison with other files. • Example:
      $ touch -t 197607040000.00 US-bicentennial
      $ ls -l US-bicentennial                    ## List the file
      -rw-rw-r-- 1 jones devel 0 Jul 4 1976 US-bicentennial
      $ touch -r US-bicentennial birthday        ## Copy timestamp to the new birthday file
      $ ls -l birthday                           ## List the new file
      -rw-rw-r-- 1 jones devel 0 Jul 4 1976 birthday
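
The "timestamp for later comparison" use can be sketched with bash's `-nt` (newer than) file test. The file names are hypothetical, and reading a file's modification time with `date -r FILE` is an assumption that holds for GNU date (BSD date interprets `-r` differently).

```shell
#!/bin/bash
# Sketch: using touch to create timestamp reference files.
workdir=$(mktemp -d) || exit 1
touch -t 200001010000.00 "$workdir/reference"   # fixed, old timestamp
touch "$workdir/fresh"                          # current time

# [ a -nt b ] is true when a's modification time is newer than b's.
if [ "$workdir/fresh" -nt "$workdir/reference" ]; then
    newer=fresh
else
    newer=reference
fi

# -r copies the timestamp of an existing file onto another file.
touch -r "$workdir/reference" "$workdir/copy"
copy_year=$(date -r "$workdir/copy" +%Y)        # GNU date: -r FILE shows the file's mtime
echo "newer file: $newer, copy year: $copy_year"
rm -r "$workdir"
```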

  21. Temporary files • So far, we have created temporary files in the current directory • And removed them after use • What if multiple scripts use the same file name? Or malicious users modify the files? • Special directories: /tmp (cleared when the system reboots) and /var/tmp • To avoid filename collisions, append the process id as a suffix
      ## create a temporary file in shell scripts
      tmpfile=temp.$$       ## $$ (process id)
      echo $tmpfile

  22. mktemp command • mktemp: takes an optional filename template containing a string of trailing X characters, preferably at least a dozen of them. • mktemp replaces them with an alphanumeric string derived from random numbers and the process ID, creates the file with no access for group and other, and prints the filename on standard output.
      $ TMPFILE=`mktemp /tmp/myprog.XXXXXXXXXXXX` || exit 1      ## Make unique temporary file
      $ ls -l $TMPFILE                                           ## List the temporary file
      -rw------- 1 jones devel 0 Mar 17 07:30 /tmp/myprog.hJmNZbq25727
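
In a script, mktemp is commonly paired with a cleanup handler so the file does not linger. The following is a sketch under that assumption; the `trap ... EXIT` cleanup and the variable names are additions for illustration and do not appear in the slides.

```shell
#!/bin/bash
# Sketch: temporary file with mktemp, removed automatically on exit.
# The "myprog" prefix follows the slide; TMPDIR falls back to /tmp.
tmpfile=$(mktemp "${TMPDIR:-/tmp}/myprog.XXXXXXXXXXXX") || exit 1
trap 'rm -f "$tmpfile"' EXIT             # clean up when the script exits

echo "scratch data" > "$tmpfile"
perms=$(ls -l "$tmpfile" | cut -c1-10)   # permission string from ls -l
contents=$(cat "$tmpfile")
echo "$tmpfile $perms $contents"
```

Compared with the `temp.$$` approach, the name is not predictable from the process ID, and the file is created by mktemp itself with owner-only access.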

  23. Random bytes • two random pseudodevices: /dev/random and /dev/urandom. • These devices serve as never-empty streams of random bytes: such a data source is needed in many cryptographic and security applications.
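
A file with random bytes, as mentioned in the outline, can be produced by reading a fixed number of bytes from /dev/urandom. This sketch assumes a `head` that supports `-c` (GNU and BSD versions do); the sizes and names are arbitrary choices for the demo.

```shell
#!/bin/bash
# Sketch: reading a fixed number of bytes from /dev/urandom.
workdir=$(mktemp -d) || exit 1

# Write 32 random bytes to a file and confirm its size.
head -c 32 /dev/urandom > "$workdir/random.bin"
size=$(wc -c < "$workdir/random.bin" | tr -d ' ')

# Turn 8 random bytes into a 16-digit hexadecimal token with od.
token=$(head -c 8 /dev/urandom | od -An -tx1 | tr -d ' \n')
echo "file size: $size bytes, token: $token"
rm -r "$workdir"
```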

  24. Outlines • Finish up with awk: pipeline, external commands • Commands working with files • tree, ls and echo (-d option, -1 option, -R, -a) • od (octal dump), stat (show meta data of file), cmp, diff • File checksum, verification • touch command • temporary file, file with random bytes • locate, type, which, find command: Finding files

  25. Search for files • locate: find files by name, using a regularly updated database constructed by complete scans of the filesystem
      locate [OPTION]... PATTERN...
      $ locate cksum
  • which: display the full pathname for a command, using the PATH variable
      $ which rm
      alias rm='rm'
              /bin/rm
  • type: shell built-in command, shows how each name would be interpreted if used as a command name • -t option: report whether a name is an alias, shell reserved word, function, builtin, or disk file
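
The categories reported by `type -t` can be seen in a short bash sketch. The function `greet` is hypothetical, and the result for `ls` assumes it is an ordinary program on PATH (in a non-interactive script, aliases are normally not expanded). `command -v` is used here as a portable stand-in for `which`.

```shell
#!/bin/bash
# Sketch: how bash classifies names with type -t.
greet() { echo hello; }            # hypothetical function for the demo

kind_builtin=$(type -t cd)         # a shell builtin
kind_keyword=$(type -t if)         # a shell reserved word
kind_function=$(type -t greet)     # a shell function
kind_file=$(type -t ls)            # a disk file found through PATH
ls_path=$(command -v ls)           # full pathname, similar to "which ls"
echo "$kind_builtin $kind_keyword $kind_function $kind_file $ls_path"
```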

  26. find command • find [ files-or-directories ] [ options ]: find files matching specified name patterns, or having given attributes.
      -atime n: select files with access times of n days (also -ctime, -mtime)
      -ls: produce a listing similar to the ls long form, rather than just filenames.
      -name 'pattern': select files matching the shell wildcard pattern (quoted to protect it from shell interpretation).
      -perm mask: select files matching the specified octal permission mask.
      -prune: do not descend recursively into directory trees.
      -size n: select files of size n.
      -type t: select files of type t, a single letter: d (directory), f (file), or l (symbolic link).

  27. find: basic operations find [ files-or-directories ] [ options ]: • files-or-directories: files and directories to search (directories are (almost) always descended into recursively) • options: select names for ultimate display or action • When find finds a file, it first carries out the selection restrictions implied by the options, and if those tests succeed, it hands the name off to an internal action routine. • Default action: print the name on standard output • -exec option: provides a command template into which the name is substituted, and the command is then executed.

  28. find usage examples • find: display all files/directories under the current directory • find -ls: display files/directories in "ls" style • find * -prune • find $HOME/. ! -user $USER • find -ls -type f -fprint /tmp/mytemp
      $ find -ls -type f -fprint /tmp/mytemp
      23724924 4 drwxr-xr-x 2 zhang staff 4096 Mar 25 22:40 .
      23724925 0 --wx------ 1 zhang staff    0 Mar 25 22:35 ./a
      23724927 0 -rw-r--r-- 1 zhang staff    0 Mar 25 22:35 ./b
      23724928 4 -rw-r--r-- 1 zhang staff   10 Mar 25 22:40 ./tmp
      [zhang@storm testfind]$ more /tmp/mytemp
      ./a
      ./b
      ./tmp

  29. find: examples • Files that haven't been modified in the last year
      find . -mtime +365
  • Unsigned integer: exactly that many days old • Negative: less than that absolute value • Positive: more than that value • Files for which the user has write permission
      find . -perm -200      ## all bits set need to match
  • Permission mask as an octal string • Unsigned: an exact match on the permissions is required. • Negative: all of the bits set are required to match. • Positive: at least one of the bits set must match, e.g., +700 (user can read, or write, or execute) • Files for which the user does not have read permission
      find . ! -perm -400
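
The `-perm -mask` and `-mtime +n` forms can be checked on a scratch directory. This is a sketch with made-up file names. It avoids the positive form `+700` because, to the best of my knowledge, current GNU find no longer accepts `-perm +mode` and documents `-perm /mode` as the replacement; treat that as something to confirm in the local man page.

```shell
#!/bin/bash
# Sketch: selecting files by permission bits and by age.
workdir=$(mktemp -d) || exit 1
touch "$workdir/writable" "$workdir/readonly" "$workdir/old"
chmod 644 "$workdir/writable" "$workdir/old"
chmod 444 "$workdir/readonly"
touch -t 200001010000.00 "$workdir/old"     # pretend it was last modified in 2000

# -perm -200: the owner write bit must be set; other bits may be anything.
writable=$(find "$workdir" -type f -perm -200 -exec basename {} \; | sort)

# ! negates the test: files whose owner write bit is not set.
not_writable=$(find "$workdir" -type f ! -perm -200 -exec basename {} \;)

# -mtime +365: last modified more than 365 days ago.
stale=$(find "$workdir" -type f -mtime +365 -exec basename {} \;)

echo "writable: $writable"
echo "not writable: $not_writable"
echo "stale: $stale"
rm -r "$workdir"
```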

  30. Find: selector • selector options can be combined: all must match for the action to be taken. • interspersed with the –a (AND) option • –o (OR) option: at least one selector of the surrounding pair must match. • Find nonempty files smaller than 10 blocks (5120 bytes) $ find . -size +0 -a -size -10 • Find files that are empty or unread in the past year $ find . -size 0 -o -atime +365
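
A sketch of combining selectors, using only size tests so the result is deterministic. The file names and sizes are made up; `-size n` counts 512-byte blocks, rounded up. The escaped parentheses are an addition to the slide: they group the `-o` alternatives, which matters because the implicit AND binds more tightly than `-o`.

```shell
#!/bin/bash
# Sketch: AND and OR selectors with find, on files of known size.
workdir=$(mktemp -d) || exit 1
: > "$workdir/empty"                        # 0 bytes  -> 0 blocks
printf 'x' > "$workdir/small"               # 1 byte   -> 1 block
head -c 20000 /dev/zero > "$workdir/big"    # 20000 bytes -> 40 blocks

# Nonempty AND smaller than 10 blocks.
nonempty_small=$(find "$workdir" -type f -size +0 -a -size -10 \
                     -exec basename {} \;)

# Empty OR larger than 10 blocks; \( \) keeps -type f applied to both branches.
empty_or_big=$(find "$workdir" -type f \( -size 0 -o -size +10 \) \
                   -exec basename {} \; | sort)

echo "nonempty and small: $nonempty_small"
echo "empty or big: $empty_or_big"
rm -r "$workdir"
```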

  31. Usage of find in shell script
      #!/bin/bash
      …                                        ## go to top level web site directory
      find . -name '*.html' -type f |          ## Find all HTML files
      while IFS= read -r file                  ## Read filename into variable
      do
          echo "$file"                         ## Print progress
          mv "$file" "$file.save"              ## Save a backup copy
          ## Make the change
          sed -f "$HOME/html2xhtml.sed" < "$file.save" > "$file"
      done

  32. html2xhtml.sed • Converts HTML to XHTML, the standardized XML-based version of HTML: converts tags to lowercase, and changes the <br> tag into its self-closing form, <br/>:
      s/<H1>/<h1>/g                        # Slash delimiter
      s/<H2>/<h2>/g
      s/<H3>/<h3>/g
      s/<H4>/<h4>/g
      s/<H5>/<h5>/g
      s/<H6>/<h6>/g
      s:</H1>:</h1>:g                      # Colon delimiter, slash in data
      s:</H2>:</h2>:g
      ..
      s:</[Hh][Tt][Mm][Ll]>:</html>:g
      s:<[Bb][Rr]>:<br/>:g
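
The find-and-convert loop can be tried end to end on a scratch directory. This sketch uses a three-line excerpt of the sed script (the file `mini.sed` is hypothetical) and a file name containing a space; to handle such names it swaps in `-print0` with `read -d ''`, which assumes bash and a find that supports `-print0` (GNU and BSD versions do).

```shell
#!/bin/bash
# Sketch: convert every HTML file under a directory, keeping a backup.
workdir=$(mktemp -d) || exit 1
printf '<H1>Title</H1><BR>\n' > "$workdir/my page.html"

# A small excerpt of the conversion rules, written to a scratch sed script.
printf '%s\n' 's/<H1>/<h1>/g' 's:</H1>:</h1>:g' 's:<[Bb][Rr]>:<br/>:g' \
    > "$workdir/mini.sed"

# NUL-separated names survive spaces and other unusual characters.
find "$workdir" -name '*.html' -type f -print0 |
while IFS= read -r -d '' file
do
    mv "$file" "$file.save"                               # keep a backup copy
    sed -f "$workdir/mini.sed" < "$file.save" > "$file"   # write converted file
done

converted=$(cat "$workdir/my page.html")
backup=$(cat "$workdir/my page.html.save")
echo "$converted"
rm -r "$workdir"
```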

  33. Total file size • $ find -ls | awk '{Sum += $7} END {printf("Total: %.0f bytes\n", Sum)}' • Total: 23079017 bytes

  34. xargs command • Supply the list returned by find as arguments to another command • Via the shell's command substitution feature. E.g., searching for the symbol POSIX_OPEN_MAX in system header files:
      $ grep POSIX_OPEN_MAX /dev/null $(find /usr/include -type f | sort)
      /usr/include/limits.h: #define _POSIX_OPEN_MAX 16
  • Note: why /dev/null here? • Potential problem: the command line might exceed the system limit => "argument list too long" error
      $ getconf ARG_MAX      ## get system configuration values
      2097152

  35. xargs command • xargs: takes a list of arguments from standard input, separated by blanks or newlines, and feeds them in suitably sized groups (determined by ARG_MAX) to another command given as arguments to xargs.
      $ find /usr/include -type f | xargs grep POSIX_OPEN_MAX /dev/null
      /usr/include/bits/posix1_lim.h:#define _POSIX_OPEN_MAX 16
      /usr/include/bits/posix1_lim.h:#define _POSIX_FD_SETSIZE _POSIX_OPEN_MAX
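
A self-contained sketch of the find and xargs pairing, run against a scratch directory rather than /usr/include. The header files and the symbol `DEMO_MAX` are made up. It uses `-print0` with `xargs -0`, an addition to the slide that assumes GNU or BSD tools, so that a file name containing a space is passed as one argument.

```shell
#!/bin/bash
# Sketch: feeding find results to grep through xargs.
workdir=$(mktemp -d) || exit 1
printf '#define DEMO_MAX 16\n' > "$workdir/a file.h"   # name with a space
printf 'int x;\n'              > "$workdir/b.h"

# /dev/null guarantees grep always sees more than one file name, so it
# prefixes each match with the file it came from.
matches=$(find "$workdir" -type f -name '*.h' -print0 |
          xargs -0 grep DEMO_MAX /dev/null)
echo "$matches"
rm -r "$workdir"
```

This also suggests an answer to the slide's question about /dev/null: with a single file argument grep would omit the file name, and an empty file such as /dev/null can never produce a match of its own.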

  36. Code Studies: filesdirectories

  37. Summary • Finish up with awk: pipeline, external commands • Commands working with files • tree, ls (-d option, -1 option, -R, -a) • od (octal dump), stat (show meta data of file) • touch command, temporary file, file with random bytes • File checksum, verification • locate, type, which, find command: Finding files
