Parsing and Validating Text Input
Presentation Transcript

  1. Outline: file opening and closing; fprintf, fscanf and sscanf; fgets and fputs; fgetc and putc; parsing a token-delimited input record; example program using strtok; input validation approaches; checking for safe or dangerous input. Parsing and Validating Text Input

  2. Except for stdin, stdout and stderr, files have to be opened before reading or writing. fopen() opens a file and returns a filehandle or NULL on error. fclose() closes a filehandle. Example 1: #include <stdio.h> #define IN "master.txt" #define OUT "backup.txt" file opening and closing

  3. Example 1 continued
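
The code for Example 1 did not survive in this transcript. The following is a minimal sketch of opening and closing two files with error checking, assuming the IN and OUT names defined above; the helper name open_pair() is hypothetical and is not taken from the original slides.

```c
#include <stdio.h>

#define IN  "master.txt"   /* file names taken from Example 1 */
#define OUT "backup.txt"

/* Hypothetical helper: opens in_path for reading and out_path for writing.
   Returns 0 on success, or -1 if either fopen() fails, in which case
   nothing is left open and both pointers are set to NULL. */
int open_pair(const char *in_path, const char *out_path, FILE **in, FILE **out)
{
    *out = NULL;
    *in = fopen(in_path, "r");
    if (*in == NULL)
        return -1;
    *out = fopen(out_path, "w");
    if (*out == NULL) {
        fclose(*in);
        *in = NULL;
        return -1;
    }
    return 0;
}
```

A caller might write open_pair(IN, OUT, &in, &out), do its reading and writing only if the result is 0, and then call fclose(in) and fclose(out).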

  4. fscanf() and fprintf() work in the same way as scanf() and printf(), the difference being an extra first filehandle parameter which directs input from, or output to, an opened file. Used with a plain %s conversion, fscanf() has no inbuilt protection against buffer overflows, nor against data in the input file being incompatible with the data wanted. It stops reading a field on encountering whitespace. What if the field contains spaces, tabs or newlines? fprintf and fscanf
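
As a hedged illustration of the points above, the sketch below reads one whitespace-separated record with fscanf(), using a field width to bound the string and the return value to detect incompatible data. The record layout (a name followed by an age), the NAME_LEN size and the function name read_record() are assumptions made for this example.

```c
#include <stdio.h>

#define NAME_LEN 32   /* assumed buffer size for this sketch */

/* Reads one "name age" record from in.
   Returns 1 if both fields were converted, 0 otherwise. */
int read_record(FILE *in, char name[NAME_LEN], int *age)
{
    /* %31s stores at most 31 characters plus the terminating '\0',
       so it cannot write beyond a 32-byte buffer. */
    return fscanf(in, "%31s %d", name, age) == 2;
}
```

Note that the name still cannot contain a space, because %s stops at the first whitespace character.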

  5. With sscanf() the first parameter is the string it reads and converts. This can be useful if the string has already been validated, e.g. as a number, and its string fields don't contain embedded whitespace. fscanf(stdin, args ... ) is the same as scanf( args ... ) . fprintf(stdout, args ... ) is the same as printf( args ... ) . sscanf(string, args ... ) is like fscanf() but scans and parses a string instead of a file. sscanf
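
A minimal sketch of sscanf() converting fields from a string that is already in memory. The input layout (a weight followed by an age) and the function name parse_weight_age() are assumptions for illustration.

```c
#include <stdio.h>

/* Parses a string such as "64.3 25" into a weight and an age.
   Returns 1 if both fields were converted, 0 otherwise. */
int parse_weight_age(const char *s, double *weight, int *age)
{
    /* sscanf() returns the number of successful conversions */
    return sscanf(s, "%lf %d", weight, age) == 2;
}
```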

  6. fgets() reads a line of data from a file into a string, up to and including the newline character '\n' (or until the buffer is full), and then appends the string terminator character '\0'. fgets() returns the value NULL (not EOF!) when attempting to read beyond the end of file. fgets() requires three parameters: * the name of the string (or any other pointer giving the address at which it starts), * the size of the buffer, of which at most size - 1 characters are read (to leave room for the '\0' end of string marker), * and the filehandle. fputs() writes a string to a file such that fputs(string,out) is the equivalent of fprintf(out,"%s",string). fgets and fputs

  7. The fgets() function is particularly useful for robustly reading text files organised into records separated using newlines as it contains built in buffer overflow protection. Data can be input using fgets() into a character string, validated to ensure the correct number and types of data items are present and then read from the string into local program variables of the appropriate types using sscanf(). fgets continued
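
The approach described above (read a line with fgets(), validate it, then convert it with sscanf()) might be sketched as follows for a line expected to hold a single integer. The 64-byte buffer and the function name read_int_line() are assumptions for this example.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line from in and converts it to an int.
   Returns 1 on success, 0 on bad data, EOF at end of file. */
int read_int_line(FILE *in, int *value)
{
    char line[64];
    char extra;

    if (fgets(line, sizeof line, in) == NULL)
        return EOF;
    if (strchr(line, '\n') == NULL && !feof(in)) {
        /* The line did not fit in the buffer: discard the rest of it. */
        int c;
        while ((c = getc(in)) != '\n' && c != EOF)
            ;
        return 0;
    }
    /* Exactly one conversion must succeed. If anything other than
       whitespace follows the number, %c also converts and the count is 2. */
    return sscanf(line, "%d %c", value, &extra) == 1;
}
```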

  8. These functions are the file-enabled equivalents of getchar() and putchar(). They are used to read and write single characters from and to files respectively. getc() and fgetc() return EOF if an attempt is made to read beyond the end of file. Because EOF is an int value outside the range of valid characters, the result should be stored in an int rather than a char. c=getc(in); is roughly the equivalent of fscanf(in,"%c",&c); and putc(c,out); is the equivalent of fprintf(out,"%c",c); . fgetc and putc

  9. Copying a file one character at a time
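
The code for this slide is missing from the transcript. A minimal sketch of a character-by-character copy using getc() and putc() could look like this; the function name copy_stream() is hypothetical.

```c
#include <stdio.h>

/* Copies in to out one character at a time.
   Returns the number of characters copied, or -1 on a write error. */
long copy_stream(FILE *in, FILE *out)
{
    int c;            /* int, not char, so EOF can be told apart from data */
    long count = 0;

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;
        count++;
    }
    return count;
}
```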

  10. Use of the strtok() function, declared in string.h, helps make this job a bit easier. The idea is to convert field delimiters into '\0' null characters. On the first call strtok() is passed the address of the string and returns the address of the start of field 1. For fields >= 2 you can pass it NULL instead, and it will automatically find the start of the next field, returning NULL when no fields remain. Alternatively, you can calculate and pass the address of the start of the next string yourself, and so on. These string addresses can be stored in an array of char pointers for use later. This technique can be used to input fields which include unknown numbers of space and tab (\t) characters. Parsing a Token Delimited Record

  11. strtok() modifies the string it parses, by replacing field delimiters with '\0' NUL byte characters. If this is a problem, clone the string first using strcpy() and then parse the clone, e.g. char *clone; /* using malloc() to avoid buffer overrun */ if( (clone = (char*) malloc(sizeof(char)*(strlen(original)+1))) == NULL ) exit(1); /* error if insufficient memory */ strcpy(clone,original); /* must remember to free(clone) later */ Warning concerning strtok

  12. strtok example program
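
The example program itself is not preserved in this transcript. The sketch below shows one way a record could be split with strtok() and printed as name, weight and age. The ':' delimiter, the field order and the function names are assumptions, not the original program.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Splits a record such as "Joseph Smith:64.3:25" in place.
   Returns 1 if exactly three fields were found, 0 otherwise. */
int parse_record(char *record, char **name, double *weight, int *age)
{
    char *fields[3];
    int n = 0;
    char *tok = strtok(record, ":\n");   /* first call: pass the string */

    while (tok != NULL && n < 3) {
        fields[n++] = tok;
        tok = strtok(NULL, ":\n");       /* later calls: pass NULL */
    }
    if (n != 3 || tok != NULL)
        return 0;                        /* too few or too many fields */
    *name = fields[0];
    *weight = strtod(fields[1], NULL);
    *age = (int)strtol(fields[2], NULL, 10);
    return 1;
}

/* Prints the three fields, one per line. */
void print_record(const char *name, double weight, int age)
{
    printf("Name: %s\nWeight: %f\nAge: %d\n", name, weight, age);
}
```

Because the delimiter is ':' rather than whitespace, the name field can contain a space. The numeric fields are converted here without validation to keep the sketch short.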

  13. Name: Joseph Smith Weight: 64.300000 Age: 25 strtok program output

  14. The static pointer variable value used internally within strtok() won't survive concurrent use in a multi-threaded application. If this is a problem, you can use the re-entrant version strtok_r(), prototype defined in the POSIX.1-2001 standard as follows: char *strtok_r(char *str, const char *delim, char **saveptr); The saveptr has to be passed the address of a pointer variable declared within the caller function, which enables the position within the string being parsed to be remembered between function calls. A thread-safe strtok
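
A minimal sketch of strtok_r() in use, assuming a POSIX system. The function name split_fields() and its parameters are hypothetical; only the strtok_r() prototype comes from the text above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* request the POSIX declaration of strtok_r() */
#endif
#include <string.h>

/* Splits line in place, storing up to max field addresses in fields[].
   Returns the number of fields stored. */
int split_fields(char *line, const char *delims, char **fields, int max)
{
    char *saveptr;   /* position state kept by the caller, not in a static */
    int n = 0;
    char *tok = strtok_r(line, delims, &saveptr);

    while (tok != NULL && n < max) {
        fields[n++] = tok;
        tok = strtok_r(NULL, delims, &saveptr);
    }
    return n;
}
```

As with strtok(), a run of adjacent delimiters is treated as a single separator, so empty fields are skipped.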

  15. Is input likely to be perfect, clumsy or hostile ? Perfect input assumes the person entering data will never use an incorrect key on the keyboard. The program is otherwise allowed to crash. Clumsy input is common for a stand-alone application. An application is fragile and less usable if it crashes e.g. due to casual use of the <enter> key by a user who hasn't read the prompt requesting input data correctly. Hostile input has to be assumed very likely if the application accepts input data from non-authenticated users over the Internet. A standalone application might later become a web-browser plugin. Input Validation Approaches

  16. A buffer overflow occurs when a program writes beyond or outside allocated blocks of memory. Attackers may attempt to write specific data into the executable part of a program, e.g. vectoring execution into inserted code by overwriting a function return address (stack smashing). The allocated block might be a structure or character array, or a block allocated dynamically using malloc(). Many network programs are compromised through buffer overflows. fgets() allows the programmer to specify the maximum number of bytes it will write into the buffer. Careful programming is needed to ensure access can only be made within allocated memory. Buffer Overflow Protection

  17. A web-based calculator program reads data from an HTML form expected to be in the format: a op b, where a and b are numbers and op is an arithmetic operation, e.g. +, -, * and / . A naive programmer has used a Perl or Python eval() function upon this input data and writes the result of the calculation to the web browser. Mr Evil Cracker tests for this possibility with the form input: open("/etc/shadow").read() This results in the output: Traceback (most recent call last): File "", line 1, in ? File "", line 0, in ? IOError: [Errno 13] Permission denied: '/etc/shadow' Hostile input example

  18. This shows that there is some rudimentary security on this system, as the webserver program is not running with the administrator privileges which would allow reading the shadow password file. Mr Evil Cracker hasn't got a crackable form of the password hash file yet, but he now knows that he can run any Python code on the target system with the permissions of the webserver program. As this allows him to create and execute other program files on this server, all he now needs is to find a local privilege escalation exploit, rather than a remote one. His chances of running a program giving him full control of this server are now much greater. Hostile input example 2
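
For contrast with the eval() approach, here is a hedged sketch in C of a calculator that only accepts input in the expected a op b format and never executes the input as code. The function name safe_calc() and the decision to reject division by zero are assumptions for illustration.

```c
#include <stdio.h>

/* Parses "a op b" and computes the result.
   Returns 1 on success, 0 if the input is not in the expected format. */
int safe_calc(const char *input, double *result)
{
    double a, b;
    char op, extra;

    /* Exactly three conversions must succeed; a fourth means trailing junk. */
    if (sscanf(input, "%lf %c %lf %c", &a, &op, &b, &extra) != 3)
        return 0;
    switch (op) {
    case '+': *result = a + b; return 1;
    case '-': *result = a - b; return 1;
    case '*': *result = a * b; return 1;
    case '/':
        if (b == 0.0)
            return 0;      /* reject division by zero */
        *result = a / b;
        return 1;
    default:
        return 0;          /* operator is not one of the four allowed */
    }
}
```

With this design the hostile input shown earlier fails at the first conversion and is rejected. A production version would also need to decide whether values such as "inf" and "nan", which %lf accepts, are allowed.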

  19. The problem with checking for dangerous input is that crackers will know things about your system that you don't. Therefore you don't really know what might be dangerous and what isn't, so you can't easily check for specifically dangerous data. However, safe input is within the range of input values which you have designed your program to handle. If the required data is in the form of a string, what are the maximum and minimum string lengths, and what characters should be allowed in a string, e.g. to input someone's name or address? You should reject anything not in your allowed, designed and tested range of values, sending a suitable error message to the user so that input mistakes can be corrected. Check safe or dangerous input ?
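
Checking for safe input might be sketched as an allow-list test on a name string. The length limits and the set of allowed characters below are assumptions chosen for illustration; in the default C locale isalpha() accepts only unaccented ASCII letters, which a real application would probably need to widen.

```c
#include <ctype.h>
#include <string.h>

#define NAME_MIN 1     /* assumed limits, for illustration only */
#define NAME_MAX 40

/* Returns 1 if name has an allowed length and contains only letters,
   spaces, hyphens and apostrophes; returns 0 otherwise. */
int is_valid_name(const char *name)
{
    size_t len = strlen(name);
    size_t i;

    if (len < NAME_MIN || len > NAME_MAX)
        return 0;
    for (i = 0; i < len; i++) {
        unsigned char ch = (unsigned char)name[i];
        if (!isalpha(ch) && ch != ' ' && ch != '-' && ch != '\'')
            return 0;
    }
    return 1;
}
```

Anything outside the allowed set is rejected without the program needing to know why it might be dangerous.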

  20. If you want to input numbers, you need to ensure that the data string input can be safely converted to a number. You want to consider what the range of acceptable numbers suitable for input to your program should be: * minimum and maximum values, avoiding numeric overflows, * whether integer or floating point, * what the maximum acceptable string length is for each input. Numbers should always be input as strings and then validated and converted. Checking safe input numbers
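
The checks listed above can be sketched for an integer input using strtol(), which reports both overflow and where conversion stopped. The MAX_LEN limit and the function name parse_long_in_range() are assumptions for this example.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LEN 10   /* assumed maximum acceptable input length */

/* Converts s to a long within [min, max].
   Returns 1 on success, 0 otherwise; *out is only written on success. */
int parse_long_in_range(const char *s, long min, long max, long *out)
{
    char *end;
    long v;

    if (s[0] == '\0' || strlen(s) > MAX_LEN)
        return 0;                       /* empty or over-long string */
    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return 0;                       /* no digits, or trailing junk */
    if (errno == ERANGE || v < min || v > max)
        return 0;                       /* overflow or outside the range */
    *out = v;
    return 1;
}
```

For example, an age field might be checked with parse_long_in_range(text, 0, 150, &age), where the limits 0 and 150 are again only illustrative.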