This chapter delves into the intricacies of strings and arrays in secure programming. It highlights the common challenges and errors associated with array management, such as size determination and out-of-bounds indexing. The text further explores character strings, emphasizing their vulnerabilities due to external inputs and manipulation errors. Mitigation strategies are discussed alongside the safer implementations of string handling functions in C/C++. Notable string vulnerabilities are examined, making this a crucial read for developers focused on enhancing software security and reliability.
SECURE PROGRAMMING Chapter 2 Strings
Overview • Arrays and their Problems • Character Strings • Common String Manipulation errors • String Vulnerabilities and exploits • Mitigation Strategies • String Handling Functions, the bad and the good • Runtime Protection Strategies • Some Notable Vulnerabilities • Summary
Arrays and their Problems 1) Hard to determine size. 2) Size defaults may not work. 3) Easy to index an array out of bounds. 4) Easy to write non-portable code (non-consistent handling, for example). 5) Size parameters may be wrong (see 3)) 6) Array copying may overflow the array 7) Pointer arithmetic may be incorrect.
Character Strings The problem: Many strings come from outside: • Command line arguments • Environment variables • Console or other input • Text files • Network Connections Strings are not built-in to C/C++, though there is (some) Library support
Character Strings: String Data Type Most people implement a string as a Null terminated array of characters; addressed by a pointer. Have all the problems of arrays magnified because most string manipulation is done through procedures. Five Important terms for arrays: • Bound = size of the array. • Lo = Address of first element of the array • Hi = Address of last element of the array • TooFar = The address of the one-too-far element of the array = Hi + 1 = Lo + Bound • Target size (Tsize) = Bound
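The five terms can be made concrete with a short sketch. The macro `ARRAY_BOUND` and the function `fill_array` are hypothetical names introduced only for illustration; the point is that a loop runs from Lo and stops before TooFar, never dereferencing it.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer): the Bound. */
#define ARRAY_BOUND(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: stores c in every element of [lo, lo + bound)
   and returns how many elements were written. */
size_t fill_array(char *lo, size_t bound, char c)
{
    char *toofar = lo + bound;      /* TooFar = Lo + Bound = Hi + 1 */
    size_t written = 0;

    for (char *p = lo; p != toofar; ++p) {  /* stop before TooFar */
        *p = c;
        ++written;
    }
    return written;
}
```

Note that `ARRAY_BOUND` only works where the array declaration is visible; once the array decays to a pointer, the bound has to be passed along explicitly, as `fill_array` does.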
Character Strings: String Data Type Two more terms for strings. • Null-terminated if there is a null character within the array. • Length: For null-terminated strings, the number of characters before the (first) null terminator. Problem with determining array size (clear procedure)
Character Strings: String Data Type More problems: What Characters? “Execution Character Set” -locale- setlocale() function Basic execution character set: 26 UC/LC letters, 10 digits 29 graphic characters, space, 33 control characters including HT VT FF Bell BS CR NL, NULL, DEL Execution character set may contain many characters, require multiple bytes to represent a character (multibyte character set); basic character set still present. Locale-specific shift states.
Character Strings: UTF-8 Can represent any character in the Unicode character set using 1-4 bytes. Code points 0-127 take 1 byte with a leading 0 bit. Otherwise, the first byte starts with as many 1 bits as the total number of bytes in the sequence, followed by a 0 bit; all succeeding bytes start with 10. Thus: • Leading 0: 1-byte code • Leading 11: start of a multibyte code • Leading 10: continuation of a multibyte code (Watch out for vulnerabilities!)
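The leading-bit rule can be sketched as a small classifier. `utf8_seq_len` is a hypothetical name, and this is only the first step of a decoder: it does not reject overlong encodings or check that the right number of continuation bytes follows, which is exactly where the vulnerabilities tend to hide.

```c
/* Hypothetical helper: returns the sequence length (1-4) implied by a
   UTF-8 lead byte, or 0 if b is a continuation byte or not a valid lead. */
int utf8_seq_len(unsigned char b)
{
    if ((b & 0x80) == 0x00) return 1;   /* 0xxxxxxx: single byte     */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx: 2-byte sequence */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx: 3-byte sequence */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx: 4-byte sequence */
    return 0;                           /* 10xxxxxx or 11111xxx      */
}
```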
Wide Strings 16 or 32 bit characters Terminated with a null wide character. As is the case with regular strings (with caveats!) • Pointers point to left-most character. • The length is the number of wide characters preceding the null wide character. • The value is the sequence of code values of the contained wide characters, in order.
String Literals Enclosed in double quotes ("). Wide string literals are prefixed by L. Adjacent string literal tokens are concatenated; if any of them is prefixed by L, the result is a wide string. Example in text, page 34. A null is appended, and the result is used to initialize a static array. In C, that array is not const-qualified, but modifying it is undefined behavior. Watch for declarations of the form: const char s[3] = "abc"; // Not a null-terminated string. Use: const char s[] = "abc";
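The difference between the two declarations can be checked directly. This is a sketch in C (C++ rejects the first initializer); `no_nul`, `with_nul`, and `is_null_terminated` are hypothetical names used only here.

```c
#include <string.h>

const char no_nul[3] = "abc";    /* legal C, but no room for the null */
const char with_nul[] = "abc";   /* compiler sizes the array: bound 4 */

/* Returns 1 if a null character occurs within the first bound bytes
   of s, 0 otherwise. Never reads past s + bound. */
int is_null_terminated(const char *s, size_t bound)
{
    return memchr(s, '\0', bound) != NULL;
}
```

Passing `no_nul` to `strlen()` or `printf("%s")` would read past the end of the array, which is why letting the compiler choose the bound is the safer habit.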
Strings in C++ • Proliferation of string classes. • Standardized (STL) down to: • string = typedef for basic_string<char> • wstring = typedef for basic_string<wchar_t> • Also allows: • null-terminated byte string (NTBS) • null-terminated multibyte string (NTMBS): an NTBS that contains a sequence of valid multibyte characters and ends in the same shift state in which it starts.
Strings in C++ (2) basic_string class template specializations are safer than NTBS, but NTBS are required all over the place: • Literals are NTBS • Existing libraries need NTBS or NTMBS string objects are passed by value or reference, while c-strings are passed by pointer. Thank goodness for member function data aka c_str
Character Types Three types: • Plain • Signed • Unsigned May cause compiler warnings if the wrong type is used.
int Some gotchas: • getc and friends return an int so that EOF (a negative value, typically -1) is distinguishable from every valid character. • Functions in ctype.h (cctype) like isalpha accept an int because they might be passed the result of a getc or similar. • In C, a character constant has type int, so sizeof('a') equals sizeof(int) (typically 4), not 1. In C++ a character constant has type char and its size is 1. Wide character literals have type wchar_t and multicharacter literals have type int.
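One practical consequence of the `int` interface: when plain `char` is signed, a byte above 127 becomes a negative `int`, and passing that to `isalpha()` is undefined behavior. A minimal sketch, with `count_alpha` as a hypothetical helper name:

```c
#include <ctype.h>
#include <stddef.h>

/* Counts alphabetic characters in a null-terminated string.
   The cast to unsigned char keeps negative plain-char values from
   reaching isalpha(), whose argument must be EOF or representable
   as an unsigned char. */
size_t count_alpha(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; ++s) {
        if (isalpha((unsigned char)*s))
            ++n;
    }
    return n;
}
```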
Unsigned char and wchar_t Unsigned char: all bits handled equally; pure binary. No padding bits, no trap representation, no sign extension, etc. wchar_t: Can be used for natural-language character data. For characters in the basic character set, it does not matter, except for type compatibility issues.
Sizing String headaches Three important numbers: Size = number of bytes allocated to the array (sizeof(a)) Count = number of elements in the array (maybe different from size!) Length = Number of characters before null terminator. Notes: If characters are wide, size may be 2*count or 4*count. (depends on OS) Length MUST be smaller than count. See Program fragments in book, pages 40-41.
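The three numbers differ for the same object, as a short sketch shows. The array `greeting`, the macro `COUNT_OF`, and the accessor functions are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <wchar.h>

/* Element count of a true array (not a pointer). */
#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

static const wchar_t greeting[16] = L"hello";

/* Size: bytes allocated, i.e. 16 * sizeof(wchar_t). */
size_t greeting_size(void)   { return sizeof greeting; }

/* Count: number of elements, 16 regardless of character width. */
size_t greeting_count(void)  { return COUNT_OF(greeting); }

/* Length: wide characters before the null wide character, 5. */
size_t greeting_length(void) { return wcslen(greeting); }
```

Mixing these up is a common source of overflows, for example passing the size in bytes to a function that expects a count of wide characters.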
Common String Manipulation Errors • Use of gets NONONONONONONONO!!!!!!!!!! • Improperly bounded string copies. Do not use: • strcpy() • strcat() • sprintf() • Watch out for: • Input strings • Environment strings • Parameter strings.... (see programs, pp 42-47)
Common String Manipulation Errors • Sizing strings: • do not use strlen for wide strings; use wcslen • Multiply result by sizeof(wchar_t) Programs, pages 41-42 • Improperly bounded string input: • Do not use: • gets • cin of string with unbounded length • Unbounded string scanf See programs pp 42-43 (the program on page 43 is a typical implementation of gets)
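The wide-string sizing rule can be sketched as follows. `wide_dup` is a hypothetical helper, not a library function; it shows `wcslen` used for the length, one extra element for the terminator, and the multiplication by `sizeof(wchar_t)` when asking `malloc` for bytes.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: returns a heap copy of the wide string s,
   or NULL if allocation fails. The caller must free() the result. */
wchar_t *wide_dup(const wchar_t *s)
{
    size_t count = wcslen(s) + 1;                    /* + null wide char */
    wchar_t *copy = malloc(count * sizeof(wchar_t)); /* bytes, not count */

    if (copy != NULL)
        wmemcpy(copy, s, count);    /* wmemcpy counts wide characters */
    return copy;
}
```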
Common String Manipulation Errors • Careless copying and concatenation of strings Program, page 44 • Watch for strcpy, strcat, memcpy, sprintf, etc. • Off-by-one errors. (see program, page 47) • Null termination errors (pp 49-49) • String truncation • If you implement them yourself, you may still be in trouble! (page 50)
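A bounded copy has to get three things right at once: the off-by-one comparison, the null terminator, and reporting truncation. The following is a minimal sketch under the assumption that truncation should be reported rather than silently accepted; `copy_bounded` is a hypothetical name.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst, which holds dst_size bytes.
   dst is always null-terminated when dst_size > 0.
   Returns 0 on success, -1 if src had to be truncated or dst_size is 0. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst_size == 0)
        return -1;

    len = strlen(src);
    if (len >= dst_size) {          /* >= because the null needs a slot */
        memcpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';
        return -1;                  /* caller decides what truncation means */
    }
    memcpy(dst, src, len + 1);      /* copy the terminator too */
    return 0;
}
```

Writing `len > dst_size` in the comparison would be the classic off-by-one: a source of exactly `dst_size` characters would then overflow the destination by one byte.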
String Vulnerabilities and Exploits • Where does your data come from? Are you sure? Program on page 51 is bad: • Uses gets • Doesn't even check the return value of gets
String Vulnerabilities and exploits (see ASM code, pp 56-58) Effect called “Stack Smashing” Example follows (remember the code from IsPasswordOK?)
String Vulnerabilities and exploits This exploit is called “arc injection”
String Vulnerabilities and exploits • Code Injection: • Injection of malicious address and malicious code • Must be acceptable as legitimate input • May not cause abnormal termination • Must result in execution of the malicious code. • IsPasswordOK is vulnerable (page 65) • Exploit with fgets and strcpy on page 66 (unclear; obviously not tested).
String Vulnerabilities and exploits Arc injection aka return-into-libc includes: Branching to an existing function System(), exec(), setuid() are favorites Example of vulnerable code, page 70 Prevents memory-based protection schemes from working.
String Vulnerabilities and exploits Return-Oriented Programming “gadget” = sequence of instructions followed by a return. Turing-complete gadget sets exist for many architectures and libraries (x86, Solaris libc), and there is a compiler for gadget programs. Programs use the stack; values are pushed/popped, return addresses can be skipped for branching. Actually similar to FORTH programming.
Mitigation Strategies Two kinds: Prevent buffer overflows Detect buffer overflows and recover securely Best to do defense in depth and apply both.
Mitigation Strategies Preventing Buffer Overflows: Cert recommends using a consistent plan for managing strings. Three models: • Caller allocates and frees Most likely to prevent memory leaks • Callee allocates, caller frees Ensures sufficient memory is available • Callee allocates and frees (only available in C++) Most secure of the three solutions
Mitigation Strategies Caller allocates and frees: C <string.h> family expanded with C11 (Annex K) functions: strcpy_s strcat_s strncpy_s strncat_s See examples 2.5, 2.6, pages 74, 75
Mitigation Strategies Callee allocates, caller frees Biggest problems: • DoS attack by exhausting memory • Dynamic memory management errors Example 2.7, p 77 fmemopen() and open_memstream() (signatures, p 78) return a FILE * for doing “I/O” on memory. Example code, page 79 Dynamic allocation is often disallowed in safety-critical systems
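A minimal sketch of memory "I/O" with `open_memstream()`, assuming a POSIX.1-2008 system: the library grows the buffer as needed and the caller releases it with `free()`. The function `make_pair` is a hypothetical example, not the book's code.

```c
#define _POSIX_C_SOURCE 200809L   /* assumption: POSIX.1-2008 is available */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: builds "name=value" in a buffer allocated by
   the stream. Returns NULL on failure; the caller must free() the result. */
char *make_pair(const char *name, const char *value)
{
    char *buf = NULL;
    size_t size = 0;
    FILE *stream = open_memstream(&buf, &size);

    if (stream == NULL)
        return NULL;

    int ok = fprintf(stream, "%s=%s", name, value) >= 0;

    /* fclose() flushes the stream and finalizes buf and size. */
    if (fclose(stream) != 0)
        ok = 0;

    if (!ok) {
        free(buf);
        return NULL;
    }
    return buf;     /* null-terminated by the stream */
}
```

No destination size appears anywhere, which removes the overflow risk but leaves the memory-exhaustion risk: untrusted `name` or `value` lengths should still be capped before calling.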
Mitigation Strategies C++ string class pp 80-83
String Handling Functions, the bad and the good gets: replace with fgets or getchar Examples 2.9, 2.10, pp 84-86 … or gets_s Example 2.11, page 87 … or getline() (~= getdelim()) Example 2.12, p88
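Replacing `gets` with `fgets` still leaves two details to handle: the retained newline and lines longer than the buffer. The sketch below is one way to deal with both; `read_line` is a hypothetical name, and it assumes `size` fits in an `int` and that input contains no embedded null characters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp into buf (size bytes).
   The trailing newline is removed.
   Returns 0 on success, 1 if the line was truncated (the rest of the
   line is discarded), -1 on end of file or read error. */
int read_line(FILE *fp, char *buf, size_t size)
{
    size_t len;
    int c;
    int truncated = 0;

    if (size == 0 || fgets(buf, (int)size, fp) == NULL)
        return -1;

    len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';            /* whole line fit: strip newline */
        return 0;
    }

    /* No newline stored: discard whatever remains of this line. */
    while ((c = getc(fp)) != '\n' && c != EOF)
        truncated = 1;
    return truncated;
}
```

Note that `c` is an `int`, not a `char`, so that `EOF` can be told apart from valid characters.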
String Handling Functions, the bad and the good strcpy() and strcat() Fixes: • Allocate required space dynamically • strncpy() and strncat() are not recommended. • strlcpy() and strlcat() (always null-terminate the result) • strcpy_s() and strcat_s() (implementation, page 91) • strdup() (dynamically allocated, requires free()) Summary, pp 92-93
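The first fix, allocating the required space dynamically, might look like the following sketch. `concat_alloc` is a hypothetical helper; it measures both inputs, checks that the total cannot wrap around, and allocates exactly what is needed.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated string holding a
   followed by b, or NULL on overflow or allocation failure.
   The caller must free() the result. */
char *concat_alloc(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    char *out;

    if (lb >= SIZE_MAX - la)        /* la + lb + 1 would wrap around */
        return NULL;

    out = malloc(la + lb + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);    /* includes b's null terminator */
    return out;
}
```

The trade-off is the one the mitigation slides name: no fixed buffer to overflow, but the caller now owns a `free()` obligation and attacker-controlled lengths can exhaust memory.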
String Handling Functions, the bad and the good strncpy() and strncat() (p 93) See strncpy_s (p 95) and strncat_s (pp 97-98) strndup() (uses dynamic memory allocation) Summary on p 99
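String Handling Functions, the bad and the good memcpy() and memmove(): replace with memcpy_s() and memmove_s() respectively. Watch out for strlen(): it reads until it finds a null. Bounded alternatives are strnlen() (POSIX) and strnlen_s() (C11 Annex K); they behave alike, except that strnlen_s() returns 0 when passed a null pointer.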
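Where the Annex K functions are not available (they are optional in C11), the core check can be sketched by hand. `copy_bytes` is a hypothetical wrapper written in the spirit of `memcpy_s()`; it is not a drop-in replacement, since the real function also has defined behavior for clearing the destination and invoking a constraint handler.

```c
#include <string.h>

/* Hypothetical wrapper: copies n bytes from src to dst only if they
   fit in dst_size. Returns 0 on success, -1 if a pointer is null or
   n exceeds dst_size. The regions must not overlap (use memmove for that). */
int copy_bytes(void *dst, size_t dst_size, const void *src, size_t n)
{
    if (dst == NULL || src == NULL || n > dst_size)
        return -1;

    memcpy(dst, src, n);
    return 0;
}
```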
Runtime Protection Strategies Detection and recovery Provided via: input validation the compiler and its runtime system (e. g. array bounds checking) Operating system
Runtime Protection Strategies: Input Validation Input data size checking. Object size checking (with __builtin_object_size()) Use by turning on _FORTIFY_SOURCE=n for n ⩾ 1 (p 104, 105)
Runtime Protection Strategies: The compiler, runtime system Visual Studio Compiler-Generated Runtime Checks Turn on with flags: • /RTCs (stack frame checks): local variable overruns and underruns (including arrays), stack pointer corruption • /RTCu: use of uninitialized variables Can be tweaked: #pragma runtime_checks("s", off) and #pragma runtime_checks("s", restore) Runtime Bounds Checkers: • Libsafe • Libverify • CRED
Runtime Protection Strategies: The compiler, runtime system Stack Canaries: • StackGuard • GCC's Stack-Smashing Protector aka ProPolice: -fstack-protector[-all], -Wstack-protector • Visual C++ .NET stack overrun detection capability: /GS Recommend adding: #pragma strict_gs_check(on) Recommend compiling with the /GS flag and linking with /GS-compiled libraries.
Runtime Protection Strategies: The Operating System Address space layout randomization: • Linux (PaX project, 2000) • Windows, since Vista • Mac OS X since 2007/2011, iOS since 4.3 Nonexecutable Stacks: • W^X • Data Execution Prevention (Windows; /NXCOMPAT linker option in Microsoft Visual Studio) • PaX marked stack as non-executable • StackGap
Some Notable Vulnerabilities rlogin – strcpy Kerberos
Summary • Arrays and their Problems • Character Strings • Common String Manipulation errors • String Vulnerabilities and exploits • Mitigation Strategies • String Handling Functions, the bad and the good • Runtime Protection Strategies • Some Notable Vulnerabilities