1 / 41

Lecture 22: Low-Level Programming in C

This lecture covers low-level programming in the C programming language, including pointers, pointer arithmetic, and type checking. It also discusses why garbage collection is challenging in C.

adarnell
Télécharger la présentation

Lecture 22: Low-Level Programming in C

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 22: Low-Level Programming in C David Evans http://www.cs.virginia.edu/evans CS201j: Engineering Software University of Virginia Computer Science

  2. Menu • PS5 • C Programming Language • Pointers in C • Pointer Arithmetic • Type checking in C • Why is garbage collection hard in C? CS 201J Fall 2003

  3. PS5 • Will return in section tomorrow • Some very impressive projects! • Will be posted on the course web site soon • Many people demonstrated ability to figure out complicated new things on their own (not a requirement for PS5) • Stapling penalty for PS6 will be 25 points CS 201J Fall 2003

  4. Programming Languages Phylogeny Fortran (1954) Algol (1958) LISP (1957) CPL (1963), U Cambridge Combined Programming Language Scheme (1975) Simula (1967) BCPL (1967), MIT Basic Combined Programming Language B (1969), Bell Labs C (1970), Bell Labs C++ (1983), Bell Labs Objective C Java (1995), Sun CS 201J Fall 2003

  5. C Programming Language • Developed to build Unix operating system • Main design considerations: • Compiler size: needed to run on PDP-11 with 24KB of memory (Algol60 was too big to fit) • Code size: needed to implement the whole OS and applications with little memory • Performance • Portability • Little (if any consideration): • Security, robustness, maintainability CS 201J Fall 2003

  6. C Language • No support for: • Array bounds checking • Null dereferences checking • Data abstraction, subtyping, inheritance • Exceptions • Automatic memory management • Program crashes (or worse) when something bad happens • Lots of syntactically legal programs have undefined behavior CS 201J Fall 2003

  7. Example C Program • In Java: • void test (int x) { • while (x = 1) { • printf (“I’m an imbecile!”); • x = x + 1; • } • } • > javac Test.java • Test.java:21: incompatible types • found : int • required: boolean • while (x = 1) { • ^ • 1 error • void test (int x) { • while (x = 1) { • printf (“I’m an imbecile!”); • x = x + 1; • } • } Weak type checking: In C, there is no boolean type. Any value can be the test expression. x = 1 assigns 1 to x, and has the value 1. I’m an imbecile! I’m an imbecile! I’m an imbecile! I’m an imbecile! I’m an imbecile! I’m an imbecile! CS 201J Fall 2003

  8. Type Checking isn’t Enough… • void test (boolean x) { • while (x = true) { • printf (“I’m an imbecile!”); • x = !x; • } • } CS 201J Fall 2003

  9. LET Fortran (1954) := Algol (1958) CPL (1963), U Cambridge Combined Programming Language := BCPL (1967), MIT Basic Combined Programming Language := = B (1969), Bell Labs = C (1970), Bell Labs = C++ (1983), Bell Labs Java (1995), Sun = CS 201J Fall 2003

  10. = vs. := • Why does Java use = for assignment? • Algol (designed for elegance for presenting algorithms) used := • CPL and BCPL based on Algol, used := • Thompson and Ritchie had a small computer to implement B, saved space by using = instead • C was successor to B (also on small computer) • C++’s main design goal was backwards compatibility with C • Java’s main design goal was surface similarity with C++ CS 201J Fall 2003

  11. C/C++ Bounds NonChecking > g++ -o bounds bounds.cc > bounds cs s is: cs x is: 9 > bounds cs201 s is: cs201 x is: 49 > bounds cs201j s is: cs201j x is: 27185 > bounds aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa s is: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa x is: 1633771873 Segmentation fault (core dumped) # include <iostream.h> int main (void) { int x = 9; char s[4]; cin >> s; cout << "s is: " << s << endl; cout << "x is: " << x << endl; } (User input) CS 201J Fall 2003

  12. So, why would anyone use C today? CS 201J Fall 2003

  13. Reasons to Use C • Legacy Code • Linux, most open source applications are in C • Simple to write compiler • Programming embedded systems, often only have a C compiler • Performance • Typically 50x faster than interpreted Java • Smaller, simpler, lots of experience CS 201J Fall 2003

  14. User-Defined Structure Types • Use struct to group data • Dot (.) operator to access fields of a struct • Fields are accessible everywhere (no way to make them private) typedef struct { char name[10]; int count; } Tally; CS 201J Fall 2003

  15. Abstract Types in C • How can we get most of the benefits of data abstraction in C? Distinguish between client code and implementation code In client code: Check types by name instead of by structure Don’t allow client code to depend on the representation of a type: Make struct fields inaccessible Don’t allow use of C operators CS 201J Fall 2003

  16. Enforcing Abstract Types • Implementation Code • Where datatype is defined (also naming conventions to allow access) • Rep and abstract type are interchangable • Client Code • Everywhere else • ADT is type name only: cannot access fields, use C operators, treat as rep • Only manipulate by passing to procedures CS 201J Fall 2003

  17. What are those arrows really? Heap Stack sb “hello” CS 201J Fall 2003

  18. Pointers • In Java, an object reference is really just an address in memory • But Java doesn’t let programmers manipulate addresses directly (unless they have a hair dryer to break type safety) Heap Stack 0x80496f0 0x80496f4 0x80496f8 hell o\0\0\0 0x80496fb sb 0x80496f8 0x8049700 0x8049704 0x8049708 CS 201J Fall 2003

  19. Pointers in C • Addresses in memory • Programs can manipulate addresses directly &expr Evaluates to the address of the location expr evaluates to *exprEvaluates to the value stored in the address expr evaluates to CS 201J Fall 2003

  20. &*%&@#*! int f (void) { int s = 1; int t = 1; int *ps = &s; int **pps = &ps; int *pt = &t; **pps = 2; pt = ps; *pt = 3; t = s; } s == 1, t == 1 s == 2, t == 1 s == 3, t == 1 s == 3, t == 3 CS 201J Fall 2003

  21. Rvalues and Lvalues What does = really mean? int f (void) { int s = 1; int t = 1; t = s; t = 2; } left side of = is an “lvalue” it evaluates to a location (address)! right side of = is an “rvalue” it evaluates to a value There is an implicit * when a variable is used as an rvalue! CS 201J Fall 2003

  22. Parameter Passing in C • Actual parameters are rvalues void swap (int a, int b) { int tmp = b; b = a; a = tmp; } int main (void) { int i = 3; int j = 4; swap (i, j); … } The value of i (3) is passed, not its location! swap does nothing CS 201J Fall 2003

  23. Parameter Passing in C • Can pass addresses around void swap (int *a, int *b) { int tmp = *b; *b = *a; *a = tmp; } int main (void) { int i = 3; int j = 4; swap (&i, &j); … } The value of &i is passed, which is the address of i CS 201J Fall 2003

  24. Beware! int *value (void) { int i = 3; return &i; } void callme (void) { int x = 35; } int main (void) { int *ip; ip = value (); printf (“*ip == %d\n", *ip); callme (); printf ("*ip == %d\n", *ip); } But it could really be anything! *ip == 3 *ip == 35 CS 201J Fall 2003

  25. Manipulating Addresses char s[6]; s[0] = ‘h’; s[1] = ‘e’; s[2]= ‘l’; s[3] = ‘l’; s[4] = ‘o’; s[5] = ‘\0’; printf (“s: %s\n”, s); expr1[expr2] in C is just syntactic sugar for *(expr1 + expr2) s: hello CS 201J Fall 2003

  26. Obfuscating C char s[6]; *s = ‘h’; *(s + 1) = ‘e’; 2[s] = ‘l’; 3[s] = ‘l’; *(s + 4) = ‘o’; 5[s] = ‘\0’; printf (“s: %s\n”, s); s: hello CS 201J Fall 2003

  27. Fun with Pointer Arithmetic int match (char *s, char *t) { int count = 0; while (*s == *t) { count++; s++; t++; } return count; } int main (void) { char s1[6] = "hello"; char s2[6] = "hohoh"; printf ("match: %d\n", match (s1, s2)); printf ("match: %d\n", match (s2, s2 + 2)); printf ("match: %d\n", match (&s2[1], &s2[3])); } &s2[1] &(*(s2 + 1))  s2 + 1 The \0 is invisible! match: 1 match: 3 match: 2 CS 201J Fall 2003

  28. Condensing match int match (char *s, char *t) { int count = 0; while (*s == *t) { count++; s++; t++; } return count; } int match (char *s, char *t) { char *os = s; while (*s++ == *t++); return s – os - 1; } s++ evaluates to spre, but changes the value of s Hence, C++ has the same value as C, but has unpleasant side effects. CS 201J Fall 2003

  29. Type Checking in C • Java: only allow programs the compiler can prove are type safe • C: trust the programmer. If she really wants to compare apples and oranges, let her. Exception: run-time type errors for downcasts and array element stores. CS 201J Fall 2003

  30. Type Checking int main (void) { char *s = (char *) 3; printf ("s: %s", s); } Windows 2000 (earlier versions of Windows would just crash the whole machine) CS 201J Fall 2003

  31. In Praise of Type Checking int match (int *s, int *t) { int *os = s; while (*s++ == *t++); return s - os; } int main (void) { char s1[6] = "hello"; char s2[6] = "hello"; printf ("match: %d\n", match (s1, s2)); } match: 2 CS 201J Fall 2003

  32. Different Matching int different (int *s, int *t) { int *os = s; while (*s++ != *t++); return s - os; } int main (void) { char s1[6] = "hello"; printf ("different: %d\n", different ((int *)s1, (int *)s1 + 1)); } different: 29 CS 201J Fall 2003

  33. So, why is it hard to garbage collect C? CS 201J Fall 2003

  34. Mark and Sweep (Java version) active = all objects on stack while (!active.isEmpty ()) newactive = { } foreach (Object a in active) mark a as reachable foreach (Object o that a points to) if o is not marked newactive = newactive U { o } active = newactive sweep () // remove unmarked objects on heap CS 201J Fall 2003

  35. Mark and Sweep (C version?) active = all pointers on stack while (!active.isEmpty ()) newactive = { } foreach (pointer a in active) mark *a as reachable foreach (address p that a points to) if *p is not marked newactive = newactive U { *p } active = newactive sweep () // remove unmarked objects on heap CS 201J Fall 2003

  36. GC Challenges char *f (void) { char *s = (char *) malloc (sizeof (char) * 100); s = s + 20; *s = ‘a’; return s – 20; } There may be objects that only have pointers to their middle! CS 201J Fall 2003

  37. GC Challenges char *f (void) { char *s = (char *) malloc (sizeof (char) * 100); int x = (int) s; s = 0; return (char *) x; } There may be objects that are reachable through values that have non-pointer apparent types! CS 201J Fall 2003

  38. GC Challenges char *f (void) { char *s = (char *) malloc (sizeof (char) * 100); int x = (int) s; x = x - &f; s = 0; return (char *) (x + &f); } There may be objects that are reachable through values that have non-pointer apparent types and have values that don’t even look like addresses! CS 201J Fall 2003

  39. Why not just do reference counting? Where can you store the references? Remember C programs can access memory directly, better not change how objects are stored! CS 201J Fall 2003

  40. Summary • Garbage collection depends on: • Knowing which values are addresses • Knowing that objects without references cannot be reached • Both of these are problems in C • Nevertheless, there are some garbage collectors for C. • Change meaning of some programs • Slow down programs a lot • Are not able to find all garbage CS 201J Fall 2003

  41. Charge • Friday’s section: practice problems on subtyping and concurrency • If you send me questions by Monday, Tuesday’s class will be a quiz review • PS6 due Tuesday • Either staple your assignment before class, or you can use my stapler for $5 per staple CS 201J Fall 2003

More Related