
1. Course Notes for CS1621 Structure of Programming Languages, Part B
By John C. Ramirez, Department of Computer Science, University of Pittsburgh

2. These notes are intended for use by students in CS1621 at the University of Pittsburgh and no one else. These notes are provided free of charge and may not be sold in any shape or form. Material from these notes is obtained from various sources, including, but not limited to, the following textbooks:
- Concepts of Programming Languages, Seventh Edition, by Robert W. Sebesta (Addison Wesley)
- Programming Languages: Design and Implementation, Fourth Edition, by Terrence W. Pratt and Marvin V. Zelkowitz (Prentice Hall)
- Compilers: Principles, Techniques, and Tools, by Aho, Sethi and Ullman (Addison Wesley)

3. Expressions
Expressions are vital to programs: they allow the programmer to specify the calculations that the computer is to perform. It is important that the programmer understand how a language evaluates expressions. Things to consider:
- Precedence and associativity
- Order of operand evaluation
- Side-effects of evaluation
- Overloadings and coercions

4. Expressions
Precedence and associativity: we learn these rules for any new language, since they are vital to using expressions correctly. Most languages have similar precedence for the standard arithmetic operators: * and / are applied before + and -. But the programmer needs to understand precedence and associativity for all operators, especially those that may be unusual.

5. Expressions
Ex: boolean and relational operators (and, or, not, <, >, <=, >=, !=, ==). In Pascal, the boolean operators have higher precedence than the relational operators (the opposite of C++).
    if x < y then writeln('Less');
    if x < y and y < z then writeln('Middle');
The second statement is an error in Pascal, since the first sub-expression evaluated would be y and y. With parentheses it is OK:
    if (x < y) and (y < z) then writeln('Middle');
In C++, the following is fine, since the relational operators bind more tightly than &&:
    if (x < y && y < z) cout << "Middle" << endl;

6. Expressions
Ex: unary ++ and -- in C++. Precedence, associativity and evaluation order interact in wacky ways!
    #include <iostream>
    using namespace std;

    int main()
    {
        unsigned int i1 = 0, i2, i3, i4, i5, j, k, m1, m2, m3, m4, m5;
        j = i1++;
        k = ++i1;
        cout << j << " " << k << endl;
        i5 = i4 = i3 = i2 = i1;
        // Each statement below modifies the same variable more than once
        // without sequencing: undefined behavior in C++, so the results
        // can differ between compilers and platforms.
        m1 = i1++ + i1++ + i1++;
        m2 = i2++ + ++i2 + i2++;
        m3 = i3++ + ++i3 + ++i3;
        m4 = ++i4 + i4++ + ++i4;
        m5 = ++i5 + ++i5 + ++i5;
        cout << i1 << " " << m1 << endl;
        cout << i2 << " " << m2 << endl;
        cout << i3 << " " << m3 << endl;
        cout << i4 << " " << m4 << endl;
        cout << i5 << " " << m5 << endl;
        return 0;
    }

7. Expressions
Output? See plusplus.cpp and try it on different platforms. See http://www.cppreference.com/operator_precedence.html and the problem in Assignment 3. Compare to plusplus.java and plusplus.pl.

8. Expressions
In some cases an expression is ambiguous, and the compiler will either reject it or warn you about it.
- Ex: A ** B ** C in Ada must have parentheses.
- Ex: mixing bitwise operators in C++ produces a warning to use parentheses. Sometimes you could probably figure out the meaning, but you're better off not trying.
- Ex: more than one coercion can apply in C++, e.g. when both a constructor and a conversion function have been defined.

9. Expressions
Sometimes you may think you need not care about precedence and associativity, but you should. In math, addition and multiplication are associative and commutative. On a computer, overflow can cause this to not always be the case. Consider the floats A = 1e+30, B = 1.0/1e+30, C = 1e+30:
    A * B * C is about 1e+30
    A * C * B = infinity (A * C overflows first)
See Overflow.cpp. Similarly, F1.add(F2) and F2.add(F1) may differ: if F1 and F2 are from different classes, the operations may be different or perhaps not even legal.
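The effect above can be reproduced with a small C++ sketch. It assumes IEEE 754 single-precision floats and no fast-math compiler flags; the function names are hypothetical (this is not Overflow.cpp). Each intermediate product is stored in a float variable so it is rounded to float before the next multiplication.

```cpp
#include <cassert>
#include <cmath>

// Multiply left to right: a * b is about 1.0, so nothing overflows.
float leftToRight(float a, float b, float c) {
    float ab = a * b;   // 1e30 * 1e-30 is about 1.0
    return ab * c;      // about 1e30
}

// Same three operands, different grouping: a * c is about 1e60, which
// exceeds the largest float (about 3.4e38) and becomes +infinity.
float overflowFirst(float a, float b, float c) {
    float ac = a * c;   // overflows to +inf
    return ac * b;      // inf times a tiny positive number is still inf
}
```

With A = 1e30f, B = 1.0f / 1e30f and C = 1e30f, leftToRight(A, B, C) is close to 1e30 while overflowFirst(A, B, C) is infinite, even though the two are equal in exact arithmetic.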

10. Expressions
Side-effects can also cause evaluation order problems. Expressions can involve function calls, which can change variable values:
    Y = f(X) + X;
    Y = X + f(X);
Without side-effects the results are the same, but if f(X) changes the value of X, the results could be different. (In C++ the order in which the operands of + are evaluated is unspecified, so even one of these statements can give different results with different compilers.) Most languages allow reference parameters with functions, and these can cause logic errors if used improperly. See side.cpp.
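A sketch of the two possible orders, using a hypothetical function f that adds 10 to its reference argument (this is not side.cpp). Because C++ leaves the operand order of + unspecified, each order is written out as separate statements so the difference is deterministic.

```cpp
#include <cassert>

// Hypothetical function with a side effect: it changes its argument
// through the reference before returning a value based on it.
int f(int& x) {
    x += 10;
    return x;
}

// Behaves like Y = f(X) + X when f(X) happens to be evaluated first.
int callFirst(int x) {
    int fx = f(x);      // x is now x + 10
    return fx + x;      // reads the modified x
}

// Behaves like Y = f(X) + X when X happens to be read first.
int readFirst(int x) {
    int xv = x;         // original value saved before the call
    return xv + f(x);
}
```

With X = 1, evaluating f(X) first gives 11 + 11 = 22, while reading X first gives 1 + 11 = 12.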

11. Expressions
How to handle this?
- Leave it up to the programmer, as in Pascal and C++. This limits compiler optimizations, some of which may include reordering of operations: the compiler cannot reorder if doing so could possibly change the result.
- Do not allow (most) side-effects to occur, as in Ada. Ada functions (before Ada 2012) cannot change their parameters, so optimizations can reorder expressions without changing the result (at least not due to this).
The best advice is to program in such a way as to either avoid all side-effects, or to only allow them in cases where they will not affect expression evaluation.

12. Expressions
Operator overloading is used in many newer high-level languages. It can be good and bad.
Good: it aids readability and simplifies code if used correctly.
- Ex: for variables A, B and C of a new class Complex, A + B + C is clearer than (A.add(B)).add(C).
- Ex: string variables can be compared: if (A < B) ... is clearer than if (A.compareTo(B) < 0) ...
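A minimal sketch of the Complex example in C++, assuming a hypothetical struct with public re and im fields and an add method; the overloaded operator+ simply forwards to add, so both spellings compute the same result.

```cpp
#include <cassert>

// Hypothetical complex-number type with an ordinary named method.
struct Complex {
    double re;
    double im;

    Complex add(const Complex& other) const {
        return Complex{re + other.re, im + other.im};
    }
};

// Non-member operator+ lets clients write A + B + C instead of
// (A.add(B)).add(C); + is left-associative, so this is (A + B) + C.
Complex operator+(const Complex& a, const Complex& b) {
    return a.add(b);
}
```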

13. Expressions
Bad: it can harm readability if used incorrectly.
- Ex: + defined to do multiplication (but methods could be improperly named as well).
- Function calls are not obvious, especially if other versions of the function exist. In C++ we could have a member function + and also a friend function +; which is used?
- It can allow some logic errors to go undetected. Ex: C++ uses / for both float and integer division. If the user expects a value between 0 and 1, it's not going to happen if integer division is used (3 / 4 is 0).

14. Expressions
Some languages, like C++ and Ada, allow programmer-defined operator overloading; others, like Java, do not. Both positions have support.

15. Expressions
Coercion and conversion: many expressions use more than one data type (mixed expressions). This seems a reasonable thing to allow. However, the operators and functions used are often defined for only a single type. In this case, to allow mixed expressions, some types must be converted to other types. Languages differ in whether these conversions should be IMPLICIT or EXPLICIT.

16. Expressions
Explicit conversion: in this case the language allows few or no mixed expressions in the code. To mix data types, the programmer must convert through an operation or function call. Ex: Ada does not even allow mixing of floats and integers.
Good: everything is clear, with no uncertainty or ambiguity. The programmer can more easily verify the correctness of programs, and it is easier to avoid logic errors.

17. Expressions
Bad: it makes the language very wordy, which can be annoying, especially when the types are similar (ex. addition of integers and floats).
Implicit conversion (coercion): in this case mixed expressions are allowed, and the language coerces types where needed so that they match. Usually a language has some rules by which the coercions are performed.
Good: less wordy, which makes programs shorter and sometimes easier to write.

18. Expressions
Bad: programs are harder to verify for correctness. It is not always clear which coercion is being done, especially when programmer-defined coercions are allowed, and this can lead to logic errors.
Ex: in C++, expressions are always coerced if they can be. The standard rules of "promotion" for predefined types can be easily remembered. However, the programmer can also define functions that will be used for coercion: constructors for classes and conversion functions are both implicitly called if necessary. Now the rules are less clear and can lead to ambiguity and logic errors.
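A sketch of both kinds of implicit C++ conversion, using a hypothetical Rational struct (not the course's rational.h): a converting constructor from long and a conversion function to double. The comment in addOne marks where having both conversions would make a mixed expression ambiguous.

```cpp
#include <cassert>

// Hypothetical rational-number type.
struct Rational {
    long num;
    long den;

    // Converting constructor: not marked explicit, so the compiler may call
    // it implicitly to turn an int or long into a Rational.
    Rational(long n, long d = 1) : num(n), den(d) {}

    // Conversion function: lets a Rational be used implicitly as a double.
    operator double() const { return static_cast<double>(num) / den; }
};

Rational operator+(const Rational& a, const Rational& b) {
    return Rational(a.num * b.den + b.num * a.den, a.den * b.den);
}

// Declared to take a Rational, yet denominatorOf(3) compiles: the
// constructor runs silently to convert 3.
long denominatorOf(const Rational& r) { return r.den; }

// Writing r + 1 here would be rejected as ambiguous: the compiler cannot
// choose between Rational + Rational (converting 1) and the built-in
// double + int (converting r). The explicit Rational(1) removes the doubt.
Rational addOne(const Rational& r) { return r + Rational(1); }
```

Marking the constructor explicit would stop the silent conversion in denominatorOf(3) and force callers to write Rational(3).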

19. Expressions
Consider A = B + C, where A, B and C are all of different types. Any or all of the following could exist:
- + operator with two type B arguments
- + operator with two type C arguments
- constructor for type B with argument type C
- constructor for type C with argument type B
- coercion function from C to B
- coercion function from B to C
- constructor for type A with argument type B
- constructor for type A with argument type C
How does the programmer know which will be used? The programmer should NOT assume any particular coercion will occur in this case; here explicit coercion should be used to remove the ambiguity. See coercion.cpp and rational.h.

20. Expressions
Boolean expressions evaluate to TRUE or FALSE. They are formed using relational operators and boolean operators.
- Relational operators compare values. Operands can be most primitive types, and complex types as well in some cases.
- Boolean operators combine boolean results. Operands must be boolean values; the exception is C/C++, where numeric operands are also accepted.

21. Expressions
The same guidelines for precedence and associativity hold here: know the rules for the current language.
- Ex: the Ada boolean operators and and or have the same precedence but are NON-associative when mixed with each other. if A and B or C then ... is illegal in Ada; it must be parenthesized.
- Ex: the C++ boolean operator && has higher precedence than ||.

22. Expressions
Short-circuit evaluation. Important note (which we may not have emphasized earlier): operator precedence and associativity are for OPERATORS, not OPERANDS. The operators simply indicate how the operands are combined or used, NOT the order in which they are accessed or determined. For example, in A + B + C + D we know we first add A and B, then add C, then add D. But the VALUES for A, B, C and D could be obtained in ANY ORDER; this freedom is used to optimize execution (ex. in parallel).

23. Expressions
This is significant in (at least) 2 situations:
1. Operand evaluation produces a side-effect that changes the result of a subsequent operand evaluation. As we discussed previously, an operand could be a function call with a reference parameter, or an operand could be used or modified more than once, as in the ++ example.
2. An operand may not even be valid if a previous operand evaluates in a certain way. Ex:
    if ((X != 0) && (Y/X < 1)) cout << "rational";
Considering the && operator, if the first operand evaluates to FALSE, evaluating the second operand causes a run-time error (division by zero). If the compiler tried to evaluate these in parallel, it could cause problems.
The solution is SHORT-CIRCUIT EVALUATION (SSE).

24. Expressions
The idea of SSE is simple: evaluate boolean expressions only until a final answer can be determined. For example, with && we know that FALSE && ANYTHING == FALSE, so we would not get the division by zero error. SSE is nice because it makes our code simpler: if we know the compiler uses SSE, we can put into a single expression what would otherwise require two.

25. Expressions
Ex:
    if ((X != 0) && (Y/X < 1)) cout << "rational";
Without SSE, how would we have to write this to prevent a possible run-time error? (Do on board.)
Drawbacks of SSE? Now the computer must evaluate operands sequentially, which slows down program execution, especially in environments with multiple CPUs. So we have safety and ease of programming vs. execution efficiency.
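One possible answer to the question above, sketched in C++ with hypothetical function names: without SSE, the guard has to become an outer if so that the division is never reached when X is 0. The two versions return the same results.

```cpp
#include <cassert>

// Relies on short-circuit &&: y / x is evaluated only when x != 0.
bool ratioBelowOneSSE(int x, int y) {
    return (x != 0) && (y / x < 1);
}

// Written as if && did NOT short-circuit: the guard is a separate, outer
// test, so the division can never execute with x == 0.
bool ratioBelowOneNested(int x, int y) {
    if (x != 0) {
        if (y / x < 1)
            return true;
    }
    return false;
}
```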

26. Expressions
The solution is to offer the programmer the choice. Ada normally uses arbitrary evaluation of operands, but the special operators and then and or else provide short-circuit evaluation if desired. C++ and Java use SSE for && and ||, but never short-circuit the bitwise & and |: both operands are always evaluated.
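A small C++ sketch of the difference, with a hypothetical function sideEffect that counts how often it is evaluated: && skips the right operand when the left one is false, while & applied to bool operands evaluates both.

```cpp
#include <cassert>

// Number of times sideEffect() has run since the last reset.
static int calls = 0;

bool sideEffect() {
    ++calls;
    return true;
}

// Returns how many times the right operand was evaluated under &&.
int countWithLogicalAnd(bool left) {
    calls = 0;
    bool result = left && sideEffect();   // skipped when left is false
    (void)result;
    return calls;
}

// Returns how many times the right operand was evaluated under &.
int countWithBitwiseAnd(bool left) {
    calls = 0;
    bool result = left & sideEffect();    // always evaluated
    (void)result;
    return calls;
}
```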

27. Expressions
Assignment is central to imperative languages: it gives a value to a variable.
Typical syntax: <variable> <assignment operator> <expression>
Semantics:
- compute the lvalue of the variable
- compute the rvalue of the expression
- store the computed rvalue in the lvalue location

28. Expressions
Variations:
- Some languages allow multiple targets.
- C++ allows conditional targets with the wacky ?: operator, ex: (flag ? a : b) = 0; Java does not.
- C, C++ and Java have many assignment variations for convenience, ex: ++, +=, *=.
- C, C++ and Java return the assigned value as the result of the operation. This allows assignment to be mixed within other expressions. As with many features from C and C++, this is both good and bad.

29. Expressions
This allows shorter code in cases such as:
    A = B = C
    while ((ch = getchar()) != EOF)
Since assignment changes the value of a variable, order of evaluation is critical. Assignment typically associates right to left, and it is a good idea to parenthesize (as above).
Famous C/C++ bug that we mentioned before: if (x = y) is wacky! It will ALWAYS be true if y is non-zero and ALWAYS be false if y is zero. Newer compilers warn you about it. It is not possible in Java (unless x and y are boolean), since if requires a boolean.
Concern also must be given to overloading the assignment operator (legal in C++; Ada allows overloading the equality operator = but not the assignment :=). It is possible to make it behave differently from what is normally expected, so care has to be taken that it works in all cases.

30. Expressions
Ex: overloading = for a linked list variable:
    LList<myData> A, B;
    // Fill B with various nodes
    A = B;
If we want to use this assignment as with other assignments, we need to return the assigned result as the result of the assignment. In C++ this is typically a reference return value, so that we can cascade the operator effectively:
    A = (B = C);
    (A = B) = C;
In the first, when the assignment B = C is finished, we need the rvalue of the result. In the second, when the assignment A = B is finished, we need the lvalue of the result. A reference allows both (even though the second seems silly to do).
Also, what about A = A? If we destroy the old list before assigning the new one, this could destroy the value.
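A minimal sketch of such an overloaded assignment for a hypothetical singly linked LList (not the course's class). It returns *this by reference so the operator cascades, and it checks for self-assignment before destroying the old nodes. The copy constructor and destructor are included because a class that needs a custom operator= usually needs both.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal singly linked list.
template <typename T>
class LList {
    struct Node {
        T data;
        Node* next;
    };
    Node* head = nullptr;
    std::size_t count = 0;

    // Deletes every node, leaving an empty list.
    void clear() {
        while (head) {
            Node* old = head;
            head = head->next;
            delete old;
        }
        count = 0;
    }

    // Deep-copies other's nodes, in order, into this (empty) list.
    void copyFrom(const LList& other) {
        Node** tail = &head;              // where the next node gets linked
        for (Node* p = other.head; p; p = p->next) {
            *tail = new Node{p->data, nullptr};
            tail = &(*tail)->next;
        }
        count = other.count;
    }

public:
    LList() = default;
    LList(const LList& other) { copyFrom(other); }
    ~LList() { clear(); }

    void pushFront(const T& value) {
        head = new Node{value, head};
        ++count;
    }
    std::size_t size() const { return count; }
    const T& front() const { return head->data; }   // list must be non-empty

    // Returning a reference supports both A = (B = C) and (A = B) = C.
    LList& operator=(const LList& other) {
        if (this == &other)   // A = A: clearing first would destroy the data
            return *this;
        clear();
        copyFrom(other);
        return *this;
    }
};
```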

31. Expressions
One issue that you may not normally consider: how is the rvalue evaluated? For statically typed languages there is usually no ambiguity: the expression result type must match the type of the variable. But for dynamically typed languages, it is no longer clear. Ex: in Prolog, A = 5 + 3. Since A is not necessarily an integer, 5 + 3 could be taken as an unevaluated term just as reasonably as it could be taken as an arithmetic expression. (Prolog binds A to the term 5 + 3; A is 5 + 3 is needed to get 8.) See assig.pl.

32. Control Statements
Primary types of control in imperative languages:
- Selection: choose between 1 or more different actions
- Iteration: repeat an action 0 or more times

33. Control Statements
Selection.
One-way selection: the if statement exists in virtually every imperative language. The idea here is that we either execute a statement or do not. In modern languages this is achieved using an if without the optional else.
Two-way selection: now we incorporate the else with the if.

34. Control Statements
Typical syntax:
    if <condition> <statement> else <statement>
Interesting issues:
- What is the form of the condition?
- What kinds of statements are allowed?
- Is nesting allowed, and how is it interpreted?

35. Control Statements
Form of condition: most languages require a boolean expression (true or false only). C/C++ are exceptions: int values are allowed.
Kinds of statements: original FORTRAN and BASIC allowed only a single statement. This is not conducive to good programming techniques, since the only way to have multiple statements is by using an unconditional branch, i.e. GO TO.

36. Control Statements
ALGOL 60 introduced the compound statement, so an arbitrary number of statements can be used. All newer imperative languages (and updates of older languages) either use compound statements or allow multiple statements within the if.
Nesting: it logically follows that a statement within an if clause or else clause could be another if statement (remember orthogonality). What issues occur in this case?

37. Control Statements
The only problem of interest is one we have already discussed: if the numbers of if clauses and else clauses are not equal, how are they associated? There are two main approaches to handling this.
1. Use a rule (static semantics) to determine how it is handled. This is the approach taken in Pascal, C, C++ and Java: an else matches the nearest unmatched if. The system applies the rule consistently, so there is no ambiguity, but, as with rules of precedence and associativity, the programmer could forget it or make a mistake that is not caught. This can lead to logic errors. We have already seen this example.
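A sketch of the rule-based resolution in C++ (hypothetical function names). The first version's indentation suggests the else belongs to the outer if, but the compiler attaches it to the nearest unmatched if; GCC and Clang can warn about this with -Wall. Braces in the second version force the intended association.

```cpp
#include <cassert>
#include <string>

// Misleading indentation: this else actually belongs to if (y > 0).
std::string danglingElse(int x, int y) {
    std::string result = "no branch taken";
    if (x > 0)
        if (y > 0)
            result = "x and y positive";
    else
        result = "x not positive?";   // really runs when x > 0 and y <= 0
    return result;
}

// Braces make the else belong to the outer if, as intended.
std::string withBraces(int x, int y) {
    std::string result = "no branch taken";
    if (x > 0) {
        if (y > 0)
            result = "x and y positive";
    } else {
        result = "x not positive";
    }
    return result;
}
```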

38. Control Statements
2. Use syntax to determine how it is handled. This is the approach taken in Ada, BASIC, Modula-2 and ALGOL 68. Every if statement must be syntactically terminated (ex: end if). Now an inner if clause without an else clause must still have an end if, and syntactically the outer else can only be associated with the outer if.
Perl has a slightly different approach: the statement for an if MUST be a compound statement. The result is the same, since the inner if will now be within a compound statement.

39. Control Statements
Multiple selection: the idea is to choose from many possible options. Clearly one way of doing this is through nested if statements. This is often preferable, especially if the means of selection is a series of separate boolean expressions:
    // Break tie for A and B in some sport
    if (A beat B twice) then A wins tie
    else if (B beat A twice) then B wins tie
    else if (A scored more points than B) then A wins tie
    else if (B scored more points than A) then B wins tie
    ...

40. Control Statements
However, in some situations the options are based on different result values of a single expression. Ex: a menu in which the user chooses an option from 1 to 5, and each option causes a different action. In these instances nested ifs could be used (in fact they are all we really need), but the nesting gets complicated, often making the statements harder to follow and more prone to logic errors. So many languages supply a case statement, specifically designed for multiple alternative selection based on different results of a single expression.

41. Control Statements
There are some interesting issues to consider here. Many are the same as for two-way selection, and the text discusses them at length. A few that we will look at:
What happens after the code for the matched selection is executed? One option is to break out of the structure, continuing with the next statement after it. This makes each option mutually exclusive. This approach is taken by Algol W, Pascal and Ada, and is probably the most intuitive idea: the choices are mutually exclusive by default.

42. Control Statements
C, C++ and Java do not automatically break out after the selection has been executed. This is good and bad (as usual).
- It adds flexibility: if the execution for one selection is a "superset" of another, it makes sense to allow the flow to continue within the selection statement.
- It causes potential logic problems: the programmer must manually add breaks, and if one is missed no syntax error occurs.
What happens if no match is found? Two logical alternatives: 1. do nothing; 2. signal an error.

43. Control Statements
C, C++ and Java adopt the "do-nothing" approach: it seems logical that if nothing matches, nothing should be done. ANSI Standard Pascal and Ada adopt the "error" approach. This is more reliable, since an accidental out-of-range value will be detected as an error rather than just doing nothing. C, C++, Java, Ada, Turbo Pascal and BASIC also provide a "default" choice. It is a good idea to always use it, so you can detect an out-of-range value without causing a run-time or logic error.
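A sketch of a C++ switch with one deliberate fall-through and a default case that reports out-of-range values; the menu options and strings are hypothetical. The [[fallthrough]] attribute (C++17) only documents intent and silences compiler warnings; it does not change behavior.

```cpp
#include <cassert>
#include <string>

// Hypothetical menu handler: option 1 does everything option 2 does, plus
// one extra step, so it falls through; all other cases end in break.
std::string menuAction(int option) {
    std::string result;
    switch (option) {
        case 1:
            result += "save;";
            [[fallthrough]];      // intentional: continue into case 2
        case 2:
            result += "close;";
            break;
        case 3:
            result += "open;";
            break;
        default:                  // catches out-of-range values
            result = "error: option out of range";
            break;
    }
    return result;
}
```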

44. Control Statements
Iteration: the three primary types of iterative loops are conditional loops, counting loops and arbitrary loops.
Conditional (logically controlled) loops: the number of iterations is determined by a boolean condition and (usually) cannot be precalculated. Ex: while (infile && valid == 1). Note that we cannot predict when this condition will become false.

45. Control Statements
Many languages have two versions of the conditional loop:
- Pretest: the condition is tested prior to entering the loop body, so the body may execute 0 times.
- Posttest: the condition is tested immediately after executing the loop body, so the body always executes at least 1 time. Ada does not have this version.
The two versions are provided for convenience: we can always simulate one loop with the other (plus some conditionals). See loops.cpp. The real difference is in where each is more appropriate.
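A C++ sketch (not loops.cpp) of the two loop forms and of simulating the posttest loop with a pretest loop plus a flag; the function names are hypothetical. With n = 0 the pretest loop runs 0 times while the other two run once.

```cpp
#include <cassert>

// Pretest: the condition is checked before the body, so 0 runs are possible.
int pretestCount(int n) {
    int iterations = 0;
    while (n > 0) {
        ++iterations;
        --n;
    }
    return iterations;
}

// Posttest: the body runs once before the condition is first checked.
int posttestCount(int n) {
    int iterations = 0;
    do {
        ++iterations;
        --n;
    } while (n > 0);
    return iterations;
}

// Posttest behavior simulated with a pretest loop: the flag forces the
// first pass regardless of the condition.
int simulatedPosttestCount(int n) {
    int iterations = 0;
    bool first = true;
    while (first || n > 0) {
        first = false;
        ++iterations;
        --n;
    }
    return iterations;
}
```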

46. Control Statements
Conditional loops are the most general kind of loop, and are really all that is needed in an imperative programming language. However, many looping applications deal with arrays and sequences of values, so for convenience and efficiency it is prudent to provide a looping structure geared toward these applications.
Counting loops (counter-controlled loops): the number of iterations is determined by a control variable, an initial value, a terminal value, and an increment.

47. Control Statements
We can (usually) precalculate the number of iterations based on the initial value, terminal value and increment. Ex:
    for (int i = 3; i <= N; i += 2) { ... }
i takes the values 3, 5, 7, ..., N (or N - 1 if N is even). The number of iterations equals CEILING((TERM - INIT + 1)/INCR); for N = 31 this is CEILING((31 - 3 + 1)/2) = CEILING(14.5) = 15.
Precalculation is nice because it allows the computer to base the loop on an iteration count (if it chooses to do so), which can be executed more quickly than testing a condition on each iteration.
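The precalculation can be sketched in C++ for a loop of the form for (i = init; i <= term; i += incr), assuming incr > 0; the function names are hypothetical. For positive integers, CEILING(a / b) can be computed with integer division as (a + b - 1) / b, and the result can be checked against a loop that simply counts.

```cpp
#include <cassert>

// Trip count computed once, before the loop starts:
// CEILING((term - init + 1) / incr), using integer arithmetic only.
// Assumes incr > 0.
int tripCount(int init, int term, int incr) {
    if (term < init) return 0;              // body never executes
    int span = term - init + 1;
    return (span + incr - 1) / incr;        // integer ceiling of span / incr
}

// Reference count obtained by actually running the loop.
int countByLooping(int init, int term, int incr) {
    int n = 0;
    for (int i = init; i <= term; i += incr) ++n;
    return n;
}
```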

48. Control Statements
The machine can use a register for the iteration count and does not have to obtain operands for a comparison at each iteration of the loop, something that must be done with a conditional loop. To allow precalculation and iteration counts to work, some restrictions must be made on the loop:
- The loop control variable cannot be altered by the programmer within the loop body.
- The terminal value must be calculated only one time, when the loop is first entered.
It will also speed things up if the loop control variable is an integer (or integral type), so no float operations are necessary. This is the approach taken in Pascal and Ada. See for.p.

49. Control Statements
Pascal and Ada also do not allow an increment other than 1 or -1, and do not carry the value of the control variable past the end of the loop. In Pascal the value is "officially" undefined, but in a typical Pascal implementation it will be one of two things: 1) the terminal value of the loop, or 2) the terminal value + 1 or - 1. Case 1) typically indicates that iteration counts are being used.
In Ada, the loop control variable is implicitly declared in the loop header and really becomes undefined at the end of the loop: accessing it afterward causes an "undeclared variable" error. This is now generally accepted as a good idea, since it reduces side-effect problems of using loop control variables that were declared and assigned elsewhere. C++ and Java both allow (but do not require) this as well.

50. Control Statements
The attitude in Pascal and Ada is that if you want more complex iteration (ex. an increment other than 1 or -1, or the option of changing the number of iterations during the loop's execution), you should use a while loop.
C, C++ and Java have a different approach. Their for loop is not really a for loop in the traditional sense: it is a very general loop that can be used for any looping application. It is more accurately described as a while loop with the addition of an initialization statement and a post-body statement.

51. Control Statements
    for (init-expr; pretest-expr; post-body-expr)
Now really anything goes, and the pretest-expr and post-body-expr are evaluated on each iteration of the loop. It can certainly be used for a counting loop, as most of you have used it, but it can also be used as an arbitrary loop to do more or less whatever the programmer wants. Added flexibility comes with added danger, the usual for C and C++. See for.cpp.

52. "foreach" loop
Newer languages have also included a "foreach" loop to iterate through data. Key difference between "for" and "foreach":
- "for" iterates through indexes (typically), which can be used to access an array or collection if desired. The loop control variable is typically an integer.
- "foreach" iterates through the values in the collection directly. No indexing is used, at least not directly. The loop control variable has the data type of the elements in the collection.

53. "foreach" loop
The foreach loop has its advantages and disadvantages. Advantages:
- Since no counter is used, we eliminate the possibility of index-out-of-bounds problems.
- We can iterate over a collection without having to know the implementation details of the collection. This allows for data hiding and improves error prevention. We will likely discuss this more when we discuss object-oriented programming.

54. "foreach" loop
Disadvantage: when accessing an array, we may want or need the index value. Ex: what if we want to change the data in the array or reorganize it? Ex: sorting would be difficult using "foreach". See forEach.java and foreach.pl.
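A C++ sketch of the trade-off, using C++11's range-based for as the "foreach" (function names are hypothetical): summing needs no index, reorganizing the array is easiest with one, and modifying elements in a foreach requires a reference control variable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// "foreach": the control variable is the element itself; no index, so no
// chance of going out of bounds.
int sumByValue(const std::vector<int>& v) {
    int total = 0;
    for (int x : v) total += x;
    return total;
}

// To change elements in place, the control variable must be a reference;
// with plain `int x` the loop would only modify copies.
void doubleAll(std::vector<int>& v) {
    for (int& x : v) x *= 2;
}

// Reorganizing (here, reversing) is natural with an index-based "for",
// since we need positions, not just values.
void reverseByIndex(std::vector<int>& v) {
    std::size_t n = v.size();
    for (std::size_t i = 0; i < n / 2; ++i) {
        int tmp = v[i];
        v[i] = v[n - 1 - i];
        v[n - 1 - i] = tmp;
    }
}
```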

55. Control Statements
Arbitrary loops: the loop is basically an infinite loop, and the programmer is expected to break out of it explicitly at some point. Ada allows this with the loop ... end loop; construct. An exit statement breaks out of the loop and can be put into an if statement (or written as exit when <condition>), so we can break out of the loop from more than one place.

56. Control Statements
Although C, C++ and Java do not explicitly have this construct, you can certainly build it by making a while or for loop an infinite loop and using the break statement to break out:
    while (1)       // C
    {
    }

    while (true)    // Java
    {
    }
Again this feature adds flexibility, but makes code less readable and harder to debug.

57. Control Statements
Unconditional branching transfers execution from one section of code to another section of code; it is commonly known as the goto. It was used extensively in early languages that lacked block control structures: ex. early FORTRAN and BASIC programs relied heavily on the goto. It was necessary then, but most modern languages contain block control structures.

58. Control Statements
Even then, computer scientists were aware of how problematic the goto could be:
- The "spaghetti code" that results is very difficult to read.
- Modification of one code segment can significantly impact many parts of the program: the programmer must be aware of all places that can "go to" that code segment.
- Debugging is very difficult: it is hard to find and fix logic errors, since all possible execution paths are difficult to trace.
Now languages have blocks and extensive control structures, and it has been shown that goto adds no functionality (i.e. nothing can be done with it that cannot be done without it). However, many languages still have goto.

59. Control Statements
An unrestricted goto allows code segments that normally have only one entry and exit point to have many. Ex: what happens if you jump into the middle of a procedure (what about parameters?) or a while loop (the condition is skipped)? Most newer languages that have the goto place restrictions on it. Ex: in Pascal you cannot jump into an inactive statement or block. If restricted and used infrequently, the goto can actually be useful in some languages. Ex: Pascal does not have a break statement. If an exceptional situation would cause an exit from a loop, using a goto may be more readable than adding extra convoluted logic.
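A C++ sketch of a restricted, forward goto used to leave nested loops, one of the few places where it can be more readable than extra flags (a C++ break exits only the innermost loop). The grid search and its names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Searches a 2-D grid for target. On success, stores its position in
// row/col and jumps forward past both loops with a single goto.
bool findValue(const std::vector<std::vector<int>>& grid, int target,
               int& row, int& col) {
    for (std::size_t r = 0; r < grid.size(); ++r) {
        for (std::size_t c = 0; c < grid[r].size(); ++c) {
            if (grid[r][c] == target) {
                row = static_cast<int>(r);
                col = static_cast<int>(c);
                goto found;           // leaves both loops at once
            }
        }
    }
    return false;                     // fell out of the loops: not found
found:
    return true;
}
```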

60. Control Statements
Some (newer) languages, ex: Java, do not have goto at all. Java instead allows breaks from loops (including labeled breaks out of nested loops) and has exception handlers.

61. Subprograms
Subprograms are semi-independent blocks of code with the following basic characteristics:
- Only one "entry point": the beginning of the subprogram, which executes when called.
- Parameter information is passed to the subprogram.
- Caller execution is temporarily suspended, and the subprogram executes.
- When the subprogram terminates, caller execution resumes at the point directly following the subprogram call.

62. Subprograms
What types of subprograms can we have? Most languages have two different types: procedures and functions.
Procedures can be thought of as new named statements that supplement the predefined statements in the language. Ex: statements to search or sort an array. Once defined, these can be used anywhere they are needed in a program.

63. Subprograms
To have an effect on the overall program, a procedure needs to act on something other than just its local variables. This can be done by:
- outputting data to the display or to a file
- altering a (relatively) global variable that will be accessed or used later by a different part of the program
- altering formal parameters such that the actual parameters in the caller are modified (this will be discussed in more detail soon)

64. Subprograms
Functions can be thought of as code segments that calculate and return a single result. They are modeled after math functions and used within expressions, where the result value is substituted for the call. The effect of a function on the overall program is the value it returns. Thus, from an ideal (and mathematical) point of view, functions should have NO OTHER effect on the overall program.

65. Subprograms
Functions should NOT modify global variables and should NOT alter actual parameters. Naturally, both of these are allowed in many languages. In these cases it is up to the programmer to decide how he/she wants to use functions; again, the tradeoff for the increased flexibility is more potential for logic errors and more difficulty in debugging.
C/C++/Java only have functions, no procedures: void functions can mimic the behavior of procedures.

66. Subprograms
Local variables: how and when are they allocated?
Stack-dynamic: the default in most modern imperative languages. This is required for recursive calls, since memory must be associated with each call, not with each subprogram. Ex: binary search, with mid = (left + right)/2. Many different values for mid must be able to coexist, one for each call on the run-time stack. This could not be done if the memory was statically allocated.
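A sketch of the recursive binary search in C++ (hypothetical signature, over a sorted std::vector): mid is a stack-dynamic local, so each active call has its own copy on the run-time stack.

```cpp
#include <cassert>
#include <vector>

// Returns the index of key in the sorted range a[left..right], or -1.
// Each recursive call gets a fresh activation record with its own `mid`.
int binarySearch(const std::vector<int>& a, int key, int left, int right) {
    if (left > right) return -1;                 // empty range: not found
    int mid = (left + right) / 2;                // local to this activation
    if (a[mid] == key) return mid;
    if (key < a[mid]) return binarySearch(a, key, left, mid - 1);
    return binarySearch(a, key, mid + 1, right);
}
```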

67. Subprograms
Overhead:
- Time for allocation and deallocation each time a subprogram is called. This may not seem like much, but it can add up if many calls are made in a program.
- Access must be indirect, since the actual memory location of a variable will not be known until a subprogram call is made. The location in the run-time stack depends upon the calls made prior to the current one, which can differ from run to run. This also adds some time overhead.
Static: used in languages that do not support recursion (ex. older FORTRAN).

68. Subprograms
Static locals are also optional in other languages, such as C and C++. They allow variables to retain values from call to call; remember that their lifetime is the duration of the program. Ex: in the CS1501 LZW algorithm for writing codewords to a file, the bit buffer is static, so the leftover bits are kept in the buffer for the next call.
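A minimal C++ sketch of the difference between a static local and a stack-dynamic local, using hypothetical counter functions rather than the LZW bit buffer.

```cpp
#include <cassert>

// Static local: allocated and initialized once, keeps its value between calls.
int staticCounter() {
    static int count = 0;
    return ++count;
}

// Stack-dynamic local: a fresh variable is created on every call.
int stackCounter() {
    int count = 0;
    return ++count;
}
```

Successive calls to staticCounter() return 1, 2, 3, ..., while every call to stackCounter() returns 1.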

69. Subprograms
Parameters are vital to subprograms. They allow information to be:
- passed IN to the subprogram
- passed OUT from the subprogram
- passed IN and OUT to and from the subprogram
When writing subprograms, the programmer decides which is required for a given subprogram.

70. Subprograms
The programmer then uses the syntax and rules of the language being used to achieve the desired option. Sometimes the syntax and rules of the language do not fit exactly with the 3 use options given; in these cases the programmer must be careful to use the parameters as he/she intends.
Some definitions:
Formal parameter: a parameter specified in the subprogram header. It only exists for the duration of the subprogram's execution. Sometimes called just "parameter".

71. Subprograms
Actual parameter: a parameter specified in the call of the subprogram. It may exist outside of the scope of the procedure. Sometimes called just "argument".
The rules for formal and actual parameters differ, as we will discuss.

72. Subprograms
Parameter passing options:
- Pass-by-Value
- Pass-by-Reference
- Pass-by-Result
- Pass-by-Value-Result
- Pass-by-Name
You should be familiar with Pass-by-Value and Pass-by-Reference; the others may be new to you. We'll discuss each.

73. Subprograms
Pass-by-Value: the formal parameter is a copy of the actual parameter, i.e. get the r-value of the actual parameter and copy it into the formal parameter. It is the default in many imperative languages, and the only kind used in C and Java. It is used for IN parameter passing. The actual parameter can typically be a variable, constant or expression.

74. Subprograms
The benefit is that actual parameters cannot be altered through manipulation of the formals. This is also useful in some recursive calls, since a new copy is made with each call. The problem is that copying a parameter can be quite expensive, both in time and memory. Ex: consider an object with an array of 1000 floats: the object is copied with each call to the function. If, for example, recursive calls are made, a lot of memory can be consumed very quickly.

75. Subprograms
Implementation: using a run-time stack, this is straightforward. When the subprogram is called, a copy of the actual parameter is placed into a local variable, which is stored on the run-time stack (in the activation record for the subprogram). During subprogram execution, the formal parameter is used like any other local variable of the subprogram; the only difference is that it is initialized via the actual parameter.

76. 76 Subprograms Pass-by-Reference Formal parameter is a reference to (or address of) the actual parameter variable get l-value of actual param and copy it into the formal param, then access the actual param indirectly through the formal param Used in Pascal (var parameters), in C (using explicit pointers) and C++ and PHP (&) Most appropriate for IN and OUT parameter passing, but can be used for all Actual param usually restricted to a variable

77. 77 Subprograms Benefit is that we can change or not change the actual parameter using the formal; it is up to the programmer. Also good that memory is saved: only an address is copied. Problem is that we can miss logic errors if we accidentally alter an actual parameter through the formal parameter. Also some applications (ex: some recursion) don't work as well: we may not want a change at one call to affect another call
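A comparable sketch of pass-by-reference, again with hypothetical names: the C++ `&` form and the explicit-pointer form that C uses both let the subprogram change the caller's variable.

```cpp
#include <cassert>

// Pass-by-reference (C++ '&'): the formal 'n' is an alias for the actual
// variable, so assigning to the formal changes the caller's variable.
void incrementInPlace(int& n) {
    n = n + 1;
}

// The C equivalent: the caller passes the l-value (address) explicitly.
void incrementViaPointer(int* n) {
    *n = *n + 1;
}

// Hypothetical driver: both calls change 'actual' itself.
int demoPassByReference() {
    int actual = 10;
    incrementInPlace(actual);      // actual is now 11
    incrementViaPointer(&actual);  // actual is now 12
    assert(actual == 12);
    return actual;
}
```

Because the formal must alias a variable, a call such as `incrementInPlace(5)` is rejected by the compiler, which matches the restriction that the actual is usually a variable.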

78. 78 Subprograms Constant Reference Parameters Developers of C++ realized that value parameters are not practical for large data objects (too much time and memory, esp. for recursive algorithms). Reference parameters have the danger of accidental side effects (when used for IN parameters). Solution is to pass parameters by reference, but not allow them to be altered: constant reference. Now the compiler gives an error if the parameter is changed within the subprogram. Passing it on to another sub by value makes a copy; in standard C++, passing it on by non-constant reference is a compile-time error

79. 79 Subprograms Good concept, but not perfect Programmer can get around it by casting to a pointer and altering indirectly See params.cpp Ada IN parameters have a similar idea Cannot be assigned/altered within the function Cannot be passed by out or in out to another sub More on Ada params shortly Implementation: Using run-time stack, address of actual is stored in activation record Actual is accessed indirectly in sub through its address
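A sketch of a constant reference parameter in C++ (`sum` and `sneakyZero` are hypothetical names). The first function gets the no-copy benefit with compiler-enforced read-only access; the second shows the loophole mentioned above, using `const_cast` to obtain a non-const pointer. Modifying through such a cast is only well-defined when the original object was not itself declared `const`, so treat it as a demonstration of the weakness rather than a technique to use.

```cpp
#include <cassert>
#include <vector>

// Constant reference: no copy of the (possibly large) vector is made, and
// the compiler rejects any assignment through 'data'.
double sum(const std::vector<double>& data) {
    double total = 0.0;
    for (double x : data) total += x;
    // data[0] = 0.0;  // would not compile: 'data' is const
    return total;
}

// The protection can be bypassed by casting to a non-const pointer.
// Well-defined here only because the caller's vector is not const.
void sneakyZero(const std::vector<double>& data) {
    std::vector<double>* p = const_cast<std::vector<double>*>(&data);
    (*p)[0] = 0.0;
}
```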

80. 80 Subprograms Pass-by-Result Reference parameters are not an exact fit for OUT parameters. Ex: A procedure designed to read data from a file into an object. Here we don't care about what used to be in the object; we just want to be sure that at the end the appropriate value is assigned. With reference parameters we COULD access the old value and use it if we "wanted" to (or by mistake). Pass-by-Result prevents this

81. 81 Subprograms In Pass-by-Result, the actual parameter's value is not passed to the subprogram; the actual only waits to have a value passed back to it. The formal parameter is a local variable. During the life of the subprogram its value does not affect the actual parameter at all. At the end of the subprogram its value is passed back to the actual parameter. So what is actually needed of the actual parameter is its address (l-value). When that address is obtained (at the call or at the return) can affect the result in some contrived examples

82. 82 Subprograms

    // Note: This is NOT real code
    int A[8];
    for (int i = 0; i < 8; i++) A[i] = i;
    global int j = 2;
    foo(A[j]);
    output(A[]);

    sub foo(int param) {
        int temp = 25;
        j = 5;
        param = temp;
    }

Output if the address is obtained at the call: 0 1 25 3 4 5 6 7
Output if the address is obtained at the return: 0 1 2 3 4 25 6 7
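C++ has no result mode, so the sketch below simulates the slide 82 example by hand, assuming the copy-back is the only write to the actual. The struct and member names are hypothetical; `bindAtCall` chooses whether the actual's address is computed at the call or at the return, reproducing the two outputs above.

```cpp
#include <array>
#include <cassert>

// Simulation of pass-by-result for the example above. C++ has no result
// mode, so the copy-back to the actual's address is written out by hand.
struct ResultDemo {
    std::array<int, 8> A{};
    int j = 2;

    // Body of foo: 'param' is a plain local; nothing is copied in from A[j].
    int fooBody() {
        int temp = 25;
        j = 5;              // changes the index used to locate the actual
        int param = temp;
        return param;       // final value of the formal
    }

    // Performs foo(A[j]); bindAtCall selects when the actual's address is found.
    void callFoo(bool bindAtCall) {
        for (int i = 0; i < 8; i++) A[i] = i;
        j = 2;
        int* target = bindAtCall ? &A[j] : nullptr;  // address obtained at call
        int result = fooBody();
        if (!bindAtCall) target = &A[j];             // address obtained at return
        *target = result;                            // copy the result back
    }
};
```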

83. 83 Subprograms If used, the address is typically obtained at the call. Ada 83 out parameters for simple types are ALMOST this, but the formal parameter value cannot be accessed within the sub (so it is not really a local variable). Ada 95 changed out parameters to allow them to be accessed, fitting the Pass-By-Result model more closely. Implementation: At sub call, the actual param address is calculated and stored in the run-time stack, as is the formal param (as a local). The final result of the formal is copied back to the actual address at the end of the sub

84. 84 Subprograms Pass-by-Value-Result Now the actual parameter's value is passed to the formal parameter when the subprogram is called, being stored and used as a local variable. At the end of the subprogram the value is passed back to the actual parameter. As the name indicates, this is a combination of Pass-by-Value and Pass-by-Result. Used for IN and OUT parameters

85. 85 Subprograms If aliasing is NOT allowed/used, and if no exceptions occur in the subprogram the effect of value-result and reference is the same Precondition: Actual parameter has value obtained previous to call During subprogram: Only formal parameter is accessed, updated as desired Postcondition: Actual parameter has last value assigned within subprogram

86. 86 Subprograms However, if aliasing is allowed/used, there can be differences. Ex: Actual parameter is accessed directly as a global variable and is also passed to the sub as a parameter. With reference params, changes to the formal immediately change the global actual param. With value-result params, changes to the formal do not affect the global actual param (until the sub terminates). Ada uses value-result for simple IN OUT parameters, but in Ada 83 it is not specified how structured IN OUT params are passed

87. 87 Subprograms Idea is that language creators did not want to require the params to be passed in any specific way; they just wanted to require the in-out effect. If the result could differ based on whether params are passed by value-result or by reference, then the program is erroneous. Up to the programmer to NOT use aliases. Ada 95 clarified this only partly: parameters of by-reference types (e.g., tagged types and task and protected types) must be passed by reference, while for other structured types the mechanism is still left unspecified. See params.adb Implementation: Value + Result
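The aliasing difference from slide 86 can be simulated in C++ (a sketch with hypothetical names): the reference version sees its own update through the global immediately, while the value-result version copies in at the call and copies out only at the end.

```cpp
#include <cassert>

int g = 1;  // a global that is also passed as the actual parameter

// Pass-by-reference: 'x' aliases g, so the update is visible through g at once.
int byReference(int& x) {
    x = x + 10;
    return g;  // already holds the new value
}

// Simulated pass-by-value-result: copy in, work on a local, copy out at the end.
int byValueResult(int* actualAddr) {
    int x = *actualAddr;  // copy in at the call
    x = x + 10;
    int seen = g;         // g still holds the old value here
    *actualAddr = x;      // copy out when the subprogram terminates
    return seen;
}
```

Both versions leave `g` at 11 once they finish; only what the subprogram sees through the global in the meantime differs.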

88. 88 Subprograms Pass-by-Name Definitely wackiest way of param passing Used for IN and OUT parameters, and only in Algol Idea is that actual parameter is textually substituted for the formal in all places that it is accessed in the subprogram Kind of like a macro substitution It is only evaluated at the point of use in the subprogram Evaluated EACH TIME it is used in subprogram

89. 89 Subprograms Thus the parameter value or address could change based on where/when in the subprogram it is evaluated However, the referencing environment used is that of the CALLER, not of the subprogram So only changes within the subprogram that have a global effect will change its evaluation This also makes implementation more difficult For simple variables this is equivalent to pass-by-reference Variable address evaluates the same way regardless of where in the subprogram it is located

90. 90 Subprograms For constant expressions, this is (almost) equivalent to pass-by-value Evaluation of constant expr. will not change from one part of the subprogram to another But cannot assign a new value to the formal param unless a copy is made But it gets wacky when array elements or variable expressions are passed Now changes within the subprogram can affect the index of the array or a variable within the expression Can cause evaluation to differ in different parts of the subprogram

91. 91 Subprograms

    global int i = 0, var = 11, n = 5;
    global int A[2] = {4, 8};
    foo(var, 2*n, A[i]);  // all pass by name

    void foo(int x, int y, int z) {
        x = x + 1;  output(var);
        output(y);
        n = n + 1;  output(y);
        output(z);
        z = z + 1;  output(z);
        i = i + 1;
        z = z + 1;  output(z);
    }

1st: var = var + 1 → var is 12
2nd: y is 10 → n = n + 1 → y is 12
3rd: z (or A[0]) is 4 → z = z + 1 → z is 5
4th: i is 1 → A[1] = A[1] + 1 → z is 9

92. 92 Subprograms Implementation: It is not trivial to allow the macro to be evaluated and reevaluated in the environment of the caller. Parameterless subprograms called thunks are used. A thunk evaluates the parameter in the current state of the caller's referencing environment and returns the resulting address or value. Clearly this is a lot of overhead. Overhead and confusing results are why this is not used in newer languages
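Thunks can be approximated in C++ with lambdas; this is an illustrative sketch, not how an Algol compiler actually emits them. Each actual parameter of the slide 91 call becomes a parameterless function that re-evaluates the expression in the caller's environment every time the formal is used. Actuals that denote variables return a reference (an address), while `2*n` returns only a value. The namespace and function names are hypothetical; the returned vector reproduces the traced outputs 12, 10, 12, 4, 5, 9.

```cpp
#include <cassert>
#include <functional>
#include <vector>

namespace byname {

// The caller's referencing environment from the slide 91 example.
int i = 0, var = 11, n = 5;
int A[2] = {4, 8};

// Thunk for an actual that denotes a variable: yields its address (as a
// reference), re-evaluated each time the formal is used.
using AddrThunk = std::function<int&()>;
// Thunk for an actual that is only an expression: yields its value.
using ValueThunk = std::function<int()>;

// foo(x, y, z) with every use of a formal replaced by a call of its thunk.
// Returns the values the original would output, in order.
std::vector<int> foo(AddrThunk x, ValueThunk y, AddrThunk z) {
    std::vector<int> out;
    x() = x() + 1;  out.push_back(var);   // var = var + 1
    out.push_back(y());                   // 2*n with n == 5
    n = n + 1;      out.push_back(y());   // 2*n re-evaluated with n == 6
    out.push_back(z());                   // A[i] with i == 0
    z() = z() + 1;  out.push_back(z());   // A[0] = A[0] + 1
    i = i + 1;
    z() = z() + 1;  out.push_back(z());   // same text, now A[1] = A[1] + 1
    return out;
}

// The call foo(var, 2*n, A[i]): each thunk reads the caller's variables
// at the moment it is invoked, not at the moment of the call.
std::vector<int> callFoo() {
    return foo([]() -> int& { return var; },
               []() { return 2 * n; },
               []() -> int& { return A[i]; });
}

}  // namespace byname
```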

93. 93 Subprograms Subprograms as Parameters We allow variables as parameters so that we can access their values (or addresses) from within a subprogram Why not allow subprograms so that we can execute them from within a subprogram? Some languages do allow this (ex. Pascal, C++, PHP) However, there are some issues to consider

94. 94 Subprograms Can the parameter subprogram arguments differ in form from each other? If so, how do we type check, or even check the number of arguments, when the subprogram is actually called? Easiest solution is to require the arguments to all have the same form: the header of the parameter subprogram must be given within the header of the subprogram it is being passed to. Scope is also an issue: what is the referencing environment of the subprogram that is being passed as a parameter? Three reasonable possibilities exist:

95. 95 Subprograms The referencing environment in which the parameter subprogram is CALLED: shallow binding The referencing environment in which the parameter subprogram is DEFINED: deep binding The referencing environment in which the parameter subprogram is PASSED as an argument: ad hoc binding Note that shallow binding fits well with dynamic scoping and deep binding fits well with static scoping

96. 96 Subprograms Pascal and C++ both use deep binding Shallow binding is used by SNOBOL, which also uses dynamic scoping Ad hoc binding has never been used See fnparams.cpp
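A C++ sketch of passing a subprogram as a parameter (names hypothetical). The function pointer parameter fixes the form of the parameter subprogram in the receiver's header. The second part uses a C++11 lambda, which these notes do not cover, as an analogue of deep binding: the captured `factor` comes from where the lambda is defined, so the local `factor` at the call site has no effect.

```cpp
#include <cassert>
#include <functional>

// The parameter subprogram's header is part of applyTwice's header:
// 'f' must take one double and return a double, so every call through
// 'f' can be type-checked at compile time.
double applyTwice(double (*f)(double), double x) {
    return f(f(x));
}

double square(double v) { return v * v; }

// Deep binding analogue: 'factor' is taken from the environment where the
// lambda is DEFINED (inside makeScaler) ...
std::function<double(double)> makeScaler(double factor) {
    return [factor](double v) { return v * factor; };
}

// ... not from the environment where it is eventually CALLED.
double callFromElsewhere(const std::function<double(double)>& g) {
    double factor = 100.0;  // ignored by g: this is not g's environment
    (void)factor;
    return g(2.0);
}
```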

97. 97 Subprograms Overloading (ad hoc polymorphism) Using the same subprogram name with different parameter lists When a subprogram is called, the compiler selects the correct version based on the parameter lists In Ada, return type for a function is also used, since coercion is not done in Ada and function return values cannot be ignored Enables programmer to use the same name for similar functions that take different argument types

98. 98 Subprograms Use: Make it easier for the programmer to use consistent names for subprograms. Without overloading: Programmer must make up different but similar names for subprograms that do similar things but for different types. Ex: abs(int) fabs(double) labs(long) Ex: ISort(int * A) FSort(float * A) With overloading: Programmer uses the same name and the compiler decides which to use. Ex: abs(int) abs(double) abs(long) Ex: Sort(int * A) Sort(float * A)

99. 99 Subprograms But programmer must be careful: Ada and C++ both allow overloading and default parameters Leaving out some parameters in the call could make a call ambiguous i.e. it matches more than one function header Call can also be ambiguous if implicit casting of arguments is done Operator Overloading is the same idea, but with symbols rather than identifiers We discussed these issues previously See Slide 12 of cs1621b.ppt
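A minimal sketch of overloading in C++; `magnitude` is a hypothetical name, used to avoid clashing with the standard library's own `abs` overloads. The compiler chooses a version from the argument types.

```cpp
#include <cassert>

// Three subprograms share one name; each call is resolved at compile time
// from the type of its argument.
int magnitude(int v)       { return v < 0 ? -v : v; }
long magnitude(long v)     { return v < 0 ? -v : v; }
double magnitude(double v) { return v < 0 ? -v : v; }
```

A call such as `magnitude(3u)` would be rejected as ambiguous: an `unsigned int` argument converts equally well to `int`, `long`, or `double`, so no single version is the best match. This is the implicit-conversion ambiguity described above.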

100. 100 Generics Generics Parametric polymorphism One or more parameters are passed to a subprogram when it is instantiated (i.e. when the code is generated) indicating the types that will be used for the parameters in the subprogram call Can also be used in conjunction with packages (Ada) and classes (C++) Thus a single subprogram declaration can be used to generate many different callable subprograms, all with the same functionality

101. 101 Generics Motivation: Programmers often apply data structures and algorithms to more than one data type. Ex. Sorting, Searching algos Ex. BST, PQ, Stack, Queue data structures Even with overloading, the programmer must still write different (identical except for type) versions of the code. Generics simply transfer the job of making the different versions from the programmer to the compiler; they automate the overloading process. Note that DIFFERENT VERSIONS of the code MUST STILL BE generated

102. 102 Generics So the reason we have generics is to save the programmer some time (and perhaps some confusion) Ada vs. C++: In Ada, template instantiations must be explicit Programmer specifies template arguments using the new statement Ex: package int_io is new integer_io(integer); The generic package is integer_io The instantiated package is int_io The type argument is integer As is usual in Ada, if declaration is explicit, there will be no surprises

103. 103 Generics In C++, template instantiations can be explicit or implicit. Implicit: generated automatically by the compiler when a call is seen with the appropriate arguments. "Duplicate" instantiations are merged into a single code segment. Coercion cannot be done, since the types won't match the template correctly. Saves programmer some typing. Explicit: programmer declares each version. Coercion can be done using regular C++ promotion and conversion rules. Programmer is aware of each version. See template.cpp and tordlist.h
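A small C++ template sketch (`maxOf` is a hypothetical name) showing both kinds of instantiation described above.

```cpp
#include <cassert>

// One template declaration; the compiler generates a separate function
// for each type it is instantiated with.
template <typename T>
T maxOf(T a, T b) {
    return (a < b) ? b : a;
}

// Explicit instantiation: the programmer asks for the long version by name,
// even if no call for it appears in this file.
template long maxOf<long>(long, long);
```

`maxOf(3, 7)` is instantiated implicitly with `T = int`. `maxOf(3, 4.5)` does not compile, because deduction gets `int` from one argument and `double` from the other; naming the type explicitly, as in `maxOf<double>(3, 4.5)`, lets the ordinary conversion of 3 to `double` happen.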

104. 104 Generics Java Generics In Java 1.5 "generics" were added to the language It is somewhat misleading, since generic abilities were always built into the Java language Collections were defined in terms of class Object, which is the superclass to other Java classes They could be used to store any Java class

105. 105 Generics However, retrieving objects back from the collection required explicit casting to the actual type if we wanted full access to them ArrayList A = new ArrayList(); A.add(new String("Wacky")); String S = (String) A.remove(0); Also any typing mistakes (mixing types in the collection unintentionally) could only be caught at run-time (via casting exceptions) Overall not bad, but some people thought type parameters should be allowed

106. 106 Generics JDK 1.5 added syntax very similar to that for C++ templates However, it is very different from C++ templates (and Ada generics as well) It is not really adding any new generic abilities to the language It is not creating new code for each version of the class or method It is designed to make collections of objects more type-safe See more details in the handout

107. 107 Implementing Subprograms What is involved when a subprogram is called, during its execution, and when it terminates? This will differ depending on if recursion is allowed in a language or not Most modern languages allow recursion, but original FORTRAN (up to FORTRAN 77) did not allow it

108. 108 Implementing Subprograms FORTRAN 77 (and before) All variables within a subprogram were static, and recursive calls were not allowed Activation records were still used, but they also could be static Since all data was static, the size was known at compile time Run-time stack not needed, since at most one call per sub could be performed at a time What do we need to know when a subprogram is called?

109. 109 Implementing Subprograms Return Value (if sub is a function); Local Variables (static); Parameters (like local variables that are initialized); Return Address (where to go back to when subprogram ends)

110. 110 Implementing Subprograms C, C++ and Java To allow for recursive calls, a run-time stack is used Multiple activations of the same subprogram can co-exist Each needs its own copy of parameters and local variables But subs are not allowed to be directly nested The only non-local variables that need to be accessed are global variables However, inner classes allow a nesting of sorts

111. 111 Implementing Subprograms So the activation record looks similar to that used in FORTRAN, with an additional link location to access global variables. Now multiple instances of an activation record can occur at the same time, so they must be created dynamically (at run-time), unlike in FORTRAN. Let's look at some of the contents of an activation record

112. 112 Implementing Subprograms Activation record layout, top to bottom: Temporaries; Local Variables; Parameters; Dynamic Link to previous call; Static Link to Non-Locals; Return Address. Temps and local variables are allocated within the subprogram call. In Pascal, C and C++, the local variables must be of fixed size; in Ada, they can be variable size (ex. arrays). Parameters, links to non-locals and the return address are placed into the AR by the caller of the subprogram, so they are lower in the record

113. 113 Implementing Subprograms See rtstack.cpp Accessing non-local variables within a subprogram: Local variables are located within the activation record (AR) and can be accessed by knowing the base address of the AR plus a local_offset for each variable. Ex: Base address of AR = 162, assuming 4-byte ints (so x occupies 4 bytes and y occupies 20)

    int x, y[5];  // address of x is 162 + (other AR stuff)
    float z;      // address of z is 162 + (other AR stuff) + 4 + 20

114. 114 Implementing Subprograms Non-locals are located elsewhere. For languages like C and C++: Subprograms cannot be nested; besides locals there are global variables. For languages like Ada and Pascal: Subprograms can be nested to arbitrary depth. A sub can be declared within a sub, which is within a sub, which is within a sub, and so on. Using static scope, variables declared in a textual parent sub are accessible from an inner sub (relative global variables). But the variable locations could be in different places on the run-time stack. How to find them?

115. 115 Implementing Subprograms What do we need to do? Locate the AR that contains the nonlocal Find where in the AR the variable is located Finding where in the AR to look is the same as for local variables Keep track of a local_offset value for the variable Locating the AR is a different story May not be directly prior to current AR

116. 116 Implementing Subprograms Two techniques used to locate AR Static links: A link is kept in an AR to that AR's textual parent (from the declaration). To access a single nonlocal many links may be crossed. Display: A single array is kept to indicate all of the currently accessible nested subs. Any nonlocal can be accessed with two indirect accesses

117. 117 Implementing Subprograms Static Links Due to rules of static scope, if a subprogram is called, its textual parent subprogram MUST be active

118. 118 Implementing Subprograms However, textual parent does NOT have to be previous call on run-time stack So dynamic link in AR is not enough (but would work for dynamic scoping)

119. 119 Implementing Subprograms Static links connect an AR to the AR of the sub's textual parent, no matter where previously on the RT stack it is. How is this used to access nonlocal variables? It can be determined and maintained based on the nesting depths of the subprograms that are called. The difference in the nesting depths between the sub using a nonlocal variable and the sub in which the nonlocal is declared is equal to the number of static links that must be crossed to find the correct AR for the variable

120. 120 Implementing Subprograms This difference can be stored for each variable when the program is compiled, so that at run-time finding the variable is simple
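A simplified C++ model of nonlocal access through static links, assuming each AR holds only a static link and an array of locals indexed by local_offset. The names are hypothetical; the point is that the (chain_offset, local_offset) pair computed at compile time is all that is needed at run time.

```cpp
#include <cassert>
#include <vector>

// Simplified activation record: only the static link and the locals
// (slot k holds the variable whose local_offset is k) are modeled.
struct ActivationRecord {
    ActivationRecord* staticLink;  // AR of the textual parent
    std::vector<int> locals;
};

// chainOffset = nesting depth of the using sub minus nesting depth of the
// declaring sub; both offsets are known at compile time.
int& lookup(ActivationRecord* current, int chainOffset, int localOffset) {
    ActivationRecord* ar = current;
    for (int k = 0; k < chainOffset; k++) ar = ar->staticLink;  // cross links
    return ar->locals[localOffset];
}
```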

121. 121 Implementing Subprograms What actually happens when a sub is called? The AR for the textual parent of the sub must be located on the run-time stack, so that the static link can be linked to it. A clear (but inefficient) way to do this is to follow dynamic links down the RTS until the AR for the parent sub is found. A better way can take advantage of the fact that the calling sub and the called sub must be "relatives" in the declaration tree: Calling sub could be parent of called sub (but not grandparent); Calling sub could be called sub (direct recursion); Calling sub could be a sibling of called sub; Calling sub could be a descendant of called sub (indirect recursion); Calling sub could be a "niece" of called sub

122. 122 Implementing Subprograms So instead of following dynamic links, at compile time we can pre-calculate the number of static links (starting from the caller's AR) to follow to find the appropriate textual parent AR. Always equal to: nesting_depth(calling sub) - nesting_depth(called sub) + 1. Calling sub could be parent of called sub: X - (X+1) + 1 = 0 static links (use caller's AR). Calling sub could be called sub (direct recursion): X - X + 1 = 1 static link (same textual parent). Calling sub could be a sibling of called sub: X - X + 1 = 1 static link (same textual parent). Calling sub could be a descendant of called sub (indirect recursion) or a "niece" of called sub: follow (difference in nesting depth + 1) static links
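A sketch of how the static link could be set at call time using the formula above (a hypothetical `AR` structure, not any particular compiler's layout). The caller follows nesting_depth(caller) - nesting_depth(callee) + 1 static links, starting from its own AR, to find the callee's textual parent.

```cpp
#include <cassert>

struct AR {
    const char* name;  // subprogram name, for readability only
    int depth;         // static nesting depth of the subprogram
    AR* staticLink;    // AR of the textual parent
    AR* dynamicLink;   // AR of the caller
};

// Number of static links to follow from the caller's AR (known at compile time).
int linksToFollow(int callerDepth, int calleeDepth) {
    return callerDepth - calleeDepth + 1;
}

// Build the callee's AR; following 0 links means the caller is the parent.
AR makeCall(AR* caller, const char* calleeName, int calleeDepth) {
    AR* parent = caller;
    for (int k = linksToFollow(caller->depth, calleeDepth); k > 0; k--)
        parent = parent->staticLink;
    return AR{calleeName, calleeDepth, parent, caller};
}
```

For example, a depth-3 sub calling a depth-2 sub follows 3 - 2 + 1 = 2 static links from its own AR.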

123. 123 Implementing Subprograms

    procedure Bigsub is
       procedure A(Flag: Boolean) is
          procedure B is
             ...
             A(false);
          end; -- B
       begin -- A
          if Flag then
             B;
          else
             C;
          end if;
       end; -- A
       procedure C is
          procedure D is
             ...  -- <-- here
          end; -- D
          ...
          D;
       end; -- C
    begin -- Bigsub
       A(true);
    end; -- Bigsub

Problem 3 in Chapter 10

124. 124 Implementing Subprograms Evaluation of static links Maintaining is not too time-consuming Chain offsets can be calculated at compile time Local variables can be accessed directly Non-locals must follow 1 or more static links Works well if nesting depths do not get too deep For deep sub nesting, cost of non-local access can be high But usually 2 or 3 levels is max used

125. 125 Implementing Subprograms Display Uses a single array to store links to ARs at all relevant nesting depths. To access a nonlocal at a given nesting depth, we just follow the display entry for that depth, then the local_offset. Never more than one link to follow. Array is updated as subs are called and as they terminate. Generally faster than static links if many nesting levels are used. We will skip the details here; read the text

126. 126 Implementing Subprograms Nested declaration blocks Idea could be similar to nested subs Blocks could be treated as parameterless subs Static links could be used to determine textual parent But it is actually much easier to handle, since block entry and exit is always the same Parent block goes to child block When child block terminates, we revert to parent block

127. 127 Implementing Subprograms Simply push new block declarations onto run-time stack, and pop them when block terminates But we only have one activation record, so no links are required "Non-locals" can be accessed just like locals

128. 128 Implementing Subprograms Dynamic Scoping When a non-local variable is accessed, we always follow the dynamic links until the correct declaration is found. Clearly this could differ depending upon the call sequence, but the mechanics are actually simple. ARs must store names of local variables so we know where to stop the search. In static scoping the names are not needed, just the offsets
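A minimal C++ sketch of dynamic-scope lookup, assuming each AR stores a name-to-value map for its locals (a hypothetical structure). The search follows dynamic links until a declaration is found, so the same reference can resolve differently depending on the call sequence.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Each AR keeps the NAMES of its locals, since the matching declaration
// is only determined at run time.
struct DynAR {
    DynAR* dynamicLink;                 // AR of the caller
    std::map<std::string, int> locals;  // name -> value
};

// Follow dynamic links until some AR declares 'name'.
int& lookupDynamic(DynAR* current, const std::string& name) {
    for (DynAR* ar = current; ar != nullptr; ar = ar->dynamicLink) {
        auto it = ar->locals.find(name);
        if (it != ar->locals.end()) return it->second;
    }
    throw std::out_of_range("undeclared variable: " + name);
}
```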

129. 129 Data Abstraction Procedural (process) abstraction: Action can be performed without requiring detailed knowledge of how it is performed. Data abstraction: New type can be used without requiring detailed knowledge of how it is implemented. We don't need to know the details of how it is stored in memory. We don't need to know the details of how it is manipulated via operations

130. 130 Data Abstraction More formally, an ADT must satisfy two conditions: The declarations of the type and operations (interface) are contained in a single syntactic unit → ENCAPSULATION. The interface does not depend on how the objects are represented or how the operations are implemented; the representation of the objects is hidden from users of the ADT → DATA HIDING. Objects can only be manipulated via the provided interface

131. 131 Data Abstraction Ex: Stack Data: something that can store and access multiple data values in the manner dictated by the operations. Operations: Push: add new value to top of stack; Pop: remove top value from stack; Top: view top value (or a copy) without removing it; Empty: is the stack empty? User of the stack only needs to know the parameters and effect of each operation to use a stack correctly. Implementation could be an array, a linked list, or maybe something different; this does not affect use. Implementer can "hide" these details from the user through private declarations
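A minimal C++ sketch of the stack ADT (the class name is hypothetical): users see only the four operations, and the `std::vector` representation is private, so it could be swapped for a linked list without changing any caller.

```cpp
#include <cassert>
#include <vector>

// Interface: push, pop, top, empty. Representation: hidden.
class IntStack {
public:
    void push(int v) { items.push_back(v); }
    void pop() { assert(!items.empty()); items.pop_back(); }
    int top() const { assert(!items.empty()); return items.back(); }
    bool empty() const { return items.empty(); }
private:
    std::vector<int> items;  // not accessible to users of the ADT
};
```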

132. 132 Data Abstraction The idea of data abstraction was not always supported by programming languages Ex: FORTRAN, Pascal, C did not fully support either encapsulation or data hiding When learning good programming style, users tried to "simulate" data abstraction Logically group type definitions, procedures and functions together as a unit Only access the data type via the procedures and functions Naturally, this was at the programmer's discretion See ADT.p

133. 133 Data Abstraction Newer languages added true data abstraction Ada via packages C++, Java, C#, Ada95 via classes / objects Encapsulation units that contain all details of the new type Access modifiers that prevent access to internal details of the ADT from outside the encapsulation unit See text for more details

134. 134 Object-Oriented Programming (OOP) Characteristics of OOP Data abstraction: encapsulation + information-hiding The operations for manipulating data are considered to be part of the data type (encapsulated) The implementation details of the data type (both the structure of the data and the implementation of the operations) are separate from their specifications and (possibly) hidden from the user As we discussed with ADTs

135. 135 OOP Inheritance The characteristics of an ADT (data + operations) can be passed on to a subtype Subtype can also add new data and operations Allows programmer to build new (derived) types from old (parent) ones Common data/operations do not have to be rewritten (or copied) Operations that are slightly different in derived type can be rewritten (overridden) for that type New data/operations tailor the derived type to the problem at hand Parent type is unchanged and may (sometimes) be used together with derived type

136. 136 OOP Ex: Shape class Has data: CenterX, CenterY Has operations AREA, DRAW Subclasses: Rectangle, Circle, Triangle Each subclass inherits the data and operations from the Shape class Rectangle adds data: length, width Rectangle overrides AREA = length * width, and DRAW in appropriate way for a rectangle Subclass of Rectangle: Square Guarantees that length == width Similar ideas for Circle and Triangle

137. 137 OOP Polymorphism Variables of a parent class can also be assigned objects of a subclass (or subclass of a subclass) Operations used with a variable are based upon the class of the object currently stored (could be a parent type object or a derived type object) Operations may have been overridden in the derived class Dynamic binding allows parent and derived objects to be used together in a logical way
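A C++ sketch of the Shape example from slide 136 (the class layout and constructor arguments are assumptions, and `area` stands in for both AREA and DRAW). `area` is virtual, so a loop over base-class pointers runs the version that belongs to each object's actual class.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Base class with the common data; area() is dynamically bound.
class Shape {
public:
    Shape(double cx, double cy) : centerX(cx), centerY(cy) {}
    virtual ~Shape() = default;
    virtual double area() const = 0;  // each subclass supplies its own
protected:
    double centerX, centerY;
};

class Rectangle : public Shape {
public:
    Rectangle(double cx, double cy, double len, double wid)
        : Shape(cx, cy), length(len), width(wid) {}
    double area() const override { return length * width; }
protected:
    double length, width;
};

// A Square is a Rectangle whose length and width are equal.
class Square : public Rectangle {
public:
    Square(double cx, double cy, double side) : Rectangle(cx, cy, side, side) {}
};

class Circle : public Shape {
public:
    Circle(double cx, double cy, double r) : Shape(cx, cy), radius(r) {}
    double area() const override { return 3.14159265358979 * radius * radius; }
private:
    double radius;
};

// One loop handles every kind of shape; the right area() runs for each object.
double totalArea(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double sum = 0.0;
    for (const auto& s : shapes) sum += s->area();
    return sum;
}
```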

138. 138 OOP Ex: Shape class We could declare (in Java):

    Shape[] shapelist = new Shape[100];
    ...
    shapelist[0] = new Rectangle(0, 0, 10, 20);
    shapelist[1] = new Square(50, 100, 30, 30);
    shapelist[2] = new Circle(100, 50, 25);
    for (int i = 0; i < 3; i++)
        shapelist[i].Draw();

Polymorphism allows these different objects to be accessed consistently within the same array. Think about how you could do the code above in C or Pascal: it would not be easy!

139. 139 OOP One option: Make one giant struct or record to contain all of the data, including a union or variant. The "Base" class would use only the core data items; "Derived" classes would use additional data items as provided in the union or variant. To do the operations, we would need a switch or case to test which type the variable is, so that it can be written out appropriately. Now what if we want to add another new derived class, Pentagon? With OOP, it is simple to add any new data and override the necessary operations. Without OOP we would have to change the overall structure of the data and operations; old types would change, possibly causing problems
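For contrast, a sketch of the non-OO approach written in C++ (all names hypothetical): a tagged struct with a union for the kind-specific data, and an operation that must switch on the tag.

```cpp
#include <cassert>

enum class Kind { Rectangle, Circle };

struct RectData   { double length, width; };
struct CircleData { double radius; };

// The "one giant struct": core data for every kind, a tag, and a union
// holding the kind-specific data.
struct ShapeRec {
    double centerX, centerY;
    Kind kind;  // says which union member is in use
    union {
        RectData rect;
        CircleData circ;
    };
};

// Every operation switches on the tag; adding a Pentagon means editing the
// enum, the union, and every switch like this one.
double area(const ShapeRec& s) {
    switch (s.kind) {
        case Kind::Rectangle: return s.rect.length * s.rect.width;
        case Kind::Circle:    return 3.14159265358979 * s.circ.radius * s.circ.radius;
    }
    return 0.0;  // not reached for valid tags
}
```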

140. 140 OOP OO Languages Smalltalk was the first and "purest" OOL. All data (even numeric literals) are objects, and are all descendants of class Object. Objects are all allocated from the heap, and implicitly deallocated (garbage collection). Variables are references, with implicit dereferencing. Execution of a program (logically) involves objects sending messages to each other, executing methods, and responding back. So the data is driving the execution, not the control statements

141. 141 OOP Smalltalk example to count letters in an input string

    | data ctr letters |
    data := Prompter prompt: 'Enter your name' default:''.
    ctr := 1.
    letters := 0.
    [ctr <= data size] whileTrue: [
        (data at: ctr) isLetter ifTrue: [ letters := letters + 1 ].
        ctr := ctr + 1.
    ].
    letters printNl.

Note variables are not typed. The only type checking is that a message sent to an object is recognized. Even blocks [] are objects, evaluated when appropriate methods are called

142. 142 OOP Consider the "while loop" below

    [ctr <= data size] whileTrue: [
        (data at: ctr) isLetter ifTrue: [ letters := letters + 1 ].
        ctr := ctr + 1.
    ].

Semantics of this loop are as follows: whileTrue: is a message sent to the top block, with the second block as a parameter. The top block executes a method corresponding to the whileTrue: message that does the following: Evaluates the top block; if true, evaluates the parameter block and repeats; if false, exits the method. This propagation of messages can sometimes lead to very short code, if variables are eliminated

143. 143 OOP Equivalent to previous code:

    | letters |
    letters := 0.
    (Prompter prompt: 'Enter your name' default:'') do: [ :c |
        c isLetter ifTrue: [ letters := letters + 1 ].
    ].
    letters printNl.

Now we chain the messages to allow fewer statements (also, the do: loop iterates through the characters in a string, so we don't need the loop counter):

    ((Prompter prompt: 'Enter your name' default:'') select: [ :c | c isLetter ]) size printNl.

Now the select: loop generates a string based on the condition in the block

144. 144 OOP More on Smalltalk (classes and objects) Data in an object can be an instance variable or a class variable Instance variables are associated with objects Separate data for each object Accessible only through the methods defined for that object � always private to the class Class variables are associated with classes Shared data for all objects of the same class Accessible from all objects, but still private to the class Methods have a similar grouping, but are public Instance methods associated with objects Class methods associated with entire class

145. 145 OOP More on Smalltalk (inheritance) Object base class of all others Only single inheritance allowed All inheritance is implementation inheritance Data and methods of parent class are always accessible to the derived class i.e. Cannot hide implementation details from derived class Advantage: Derived class can likely implement its methods more efficiently with access to parent data Disadvantage: Change in parent class implementation will likely require change in derived class implementation Ex. Traversable stack

146. 146 OOP More on Smalltalk (polymorphism) All messages are dynamically bound to methods. At run-time, when a message is received, the object's class is searched for a method, then, if necessary, its superclass, its super-superclass and so on up to Object. Variables have no types since they are only used to refer to objects, not to determine the messages an object can receive. Clearly some liabilities with this approach: Slows language down due to run-time overhead; Programmer type errors cannot be caught until execution time

147. 147 OOP Let's look at some examples: person.cls as an example of a new class See personTest.st student.cls as an example of a subclass studentTest.st as an example showing polymorphic access twodarry.cls as another subclass example See twodTest.st For more information, see the GNU Smalltalk User's Guide: http://www.gnu.org/software/smalltalk/gst-manual/gst.html

148. 148 OOP C++ is an imperative/OO mix Had to be backward compatible with C Wanted to add object-oriented features Result is that programmer can use as few or as many OO features as he/she wants to C++ Classes and Objects Can be static, stack-dynamic or heap-dynamic Member data and member functions can be private, protected or public Allows programmer to decide Like Smalltalk, has notion of class variables Declared as static in C++ Destructor needed if object uses dynamic memory
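A small C++ sketch (a hypothetical `Buffer` class) of two points above: a `static` data member acting as a class variable shared by all objects, and a destructor that releases the heap memory an object owns.

```cpp
#include <cassert>
#include <cstddef>

class Buffer {
public:
    explicit Buffer(std::size_t n) : data(new int[n]()), size(n) { ++count; }
    ~Buffer() { delete[] data; --count; }        // frees the dynamic memory
    Buffer(const Buffer&) = delete;              // copying would double-delete
    Buffer& operator=(const Buffer&) = delete;
    static int liveObjects() { return count; }
    std::size_t length() const { return size; }
private:
    int* data;            // instance variable: one per object
    std::size_t size;
    static int count;     // class variable: one copy shared by all objects
};

int Buffer::count = 0;
```

Objects of this class can be stack-dynamic (`Buffer b(10);`) or heap-dynamic (`Buffer* p = new Buffer(10); ... delete p;`); in both cases the destructor runs when the object's lifetime ends.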

149. 149 OOP C++ Inheritance Do not need a superclass (no Object base class for all other classes) Multiple inheritance is allowed Complex and difficult to use Implementation inheritance or interface inheritance are allowed With interface inheritance, all data and functions are still inherited, but only public ones are directly accessible to the derived class Advantage: Modifications to parent class do not affect derived class, as long as they do not change the interface Disadvantage: Operations may be slower, since they cannot access the data directly

150. 150 OOP C++ polymorphism By default all functions are statically bound. Recall that this allows faster execution, a goal of the C++ language. However, true polymorphism cannot be utilized with statically bound functions. Dynamic binding is enabled by using virtual functions and pointers (or references); this tells the compiler not to bind the function name to the code until run-time. Abstract base classes can be created with pure virtual functions, which are not implemented in the base class. See poly.cpp
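A minimal C++ sketch contrasting the default static binding with a virtual function (names hypothetical). Both calls are made through a `Base` reference; only the virtual one consults the type of the object actually referred to.

```cpp
#include <cassert>
#include <string>

struct Base {
    std::string staticName() const { return "Base"; }           // statically bound
    virtual std::string dynamicName() const { return "Base"; }  // dynamically bound
    virtual ~Base() = default;
};

struct Derived : Base {
    std::string staticName() const { return "Derived"; }  // hides, does not override
    std::string dynamicName() const override { return "Derived"; }
};

// The non-virtual call is bound from the reference's declared type (Base);
// the virtual call is bound from the object's actual type at run time.
std::string describe(const Base& b) {
    return b.staticName() + "/" + b.dynamicName();
}
```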

151. 151 OOP Java falls in between Smalltalk and C++. Like Smalltalk: Object is the base class of all other classes; only single inheritance is allowed; objects are (almost) all dynamic, with garbage collection, and are accessed through references; method names are (by default) dynamically bound. Like C++: access can be private, public or protected; static binding can optionally be used to improve run-time speed; the overall syntax for member data and function access is similar; variables are typed.
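A small sketch of both kinds of binding in Java: an overridden instance method is chosen by the object's run-time class, while a `static` method is chosen from the declared type of the variable at compile time. `Shape`, `Circle` and `BindingDemo` are hypothetical names used only for illustration.

```java
class Shape {
    // Instance method: dynamically bound, chosen by the object's run-time class.
    String describe() { return "shape"; }

    // Static method: statically bound, chosen by the declared type at compile time.
    static String kind() { return "Shape.kind"; }
}

class Circle extends Shape {
    @Override
    String describe() { return "circle"; }

    // Hides (does not override) Shape.kind().
    static String kind() { return "Circle.kind"; }
}

class BindingDemo {
    public static void main(String[] args) {
        Shape s = new Circle();           // declared type Shape, actual object a Circle
        System.out.println(s.describe()); // circle: bound at run time
        System.out.println(s.kind());     // Shape.kind: bound at compile time (legal but unidiomatic)
    }
}
```

Marking a method `final` or `private` likewise rules out overriding, which is what allows such calls to be bound statically.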

152. 152 OOP Other Java OOP features: Interfaces allow for a simplified form of multiple inheritance. An interface is in a sense a base class with no data and only abstract (pure virtual) methods; a class that implements an interface simply implements the methods specified therein. Advantage: objects that implement an interface can be used wherever the interface is specified, which allows for a type of generic behavior (ex: the Comparable and Runnable interfaces). Disadvantage: things can become complicated when interfaces and inheritance are both used. Reflection allows a program to examine and manipulate the classes themselves. See poly.java
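To illustrate the generic behavior interfaces allow, here is a minimal sketch using the standard `Comparable` interface. The `Student` class and the `largest` helper are hypothetical; `Collections.sort` is the real library method, and it works on `Student` only because `Student` implements `Comparable`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// A hypothetical class that implements the standard Comparable interface.
class Student implements Comparable<Student> {
    final String name;
    final double gpa;

    Student(String name, double gpa) { this.name = name; this.gpa = gpa; }

    @Override
    public int compareTo(Student other) {
        return Double.compare(this.gpa, other.gpa); // order students by GPA
    }
}

class InterfaceDemo {
    // Works for ANY type that implements Comparable, not just Student.
    static <T extends Comparable<T>> T largest(List<T> items) {
        T best = items.get(0);
        for (T item : items) {
            if (item.compareTo(best) > 0) best = item;
        }
        return best;
    }

    public static void main(String[] args) {
        List<Student> roster = new ArrayList<>(List.of(
            new Student("Ann", 3.2), new Student("Bob", 3.9), new Student("Cy", 2.8)));
        System.out.println(largest(roster).name);       // Bob
        System.out.println(largest(List.of(5, 17, 3))); // 17 (Integer is Comparable too)
        Collections.sort(roster);                        // library code calls compareTo
        System.out.println(roster.get(0).name);         // Cy
    }
}
```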

153. 153 OOP OOL implementation. Data: typically a record/struct type of storage is used, the Class Instance Record (CIR). Data members are accessed by name, in the same way as record fields. A subclass adds its extra data to the CIR of the parent class. Private access is enforced by limiting the visibility of the data.

154. 154 OOP Subprograms: With static binding, the subprograms that will be called are determined by the variable's type; variable types are known at compile time, so the code to call can be determined then. With dynamic binding, the subprograms that will be called are determined by the object's type, not the variable's type. The object stored in a variable is determined at run time, so the appropriate links must be stored with the object; however, they are the same for all objects of that class. A Virtual Method Table (VMT) is used to store links to all pertinent subprograms, and each object stores a pointer to its class's VMT.
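The VMT scheme can be modeled in Java with one array of method entries per class and a table reference in each object. This is a hypothetical sketch: `VmtEntry`, `VmtObject`, `VmtDemo`, the slot constants and the animal tables are illustrative names, and a real compiler lays the table out in memory rather than using Java objects.

```java
import java.util.Arrays;

// One entry in a method table: code that receives the object as "self".
interface VmtEntry {
    String invoke(VmtObject self);
}

// Each object stores only a reference to its class's (shared) table.
class VmtObject {
    final VmtEntry[] vmt;

    VmtObject(VmtEntry[] vmt) { this.vmt = vmt; }

    // A dynamically bound call: index into THIS object's table.
    String call(int slot) { return vmt[slot].invoke(this); }
}

class VmtDemo {
    // Slot numbers are fixed at compile time for each method name.
    static final int SPEAK = 0, NAME = 1;

    // Table for the parent class.
    static final VmtEntry[] ANIMAL_VMT = {
        self -> "...",   // speak
        self -> "animal" // name
    };

    // Table for the subclass: copy the parent's, then override one slot.
    static final VmtEntry[] DOG_VMT = makeDogVmt();

    static VmtEntry[] makeDogVmt() {
        VmtEntry[] table = Arrays.copyOf(ANIMAL_VMT, ANIMAL_VMT.length);
        table[SPEAK] = self -> "woof";
        return table;
    }

    public static void main(String[] args) {
        VmtObject[] zoo = { new VmtObject(ANIMAL_VMT), new VmtObject(DOG_VMT) };
        for (VmtObject o : zoo) {
            // Same call site, different code, chosen through the object's table.
            System.out.println(o.call(NAME) + " says " + o.call(SPEAK));
        }
    }
}
```

The call site never tests the object's class; it only indexes a table, which is why dynamic binding through a VMT costs roughly one extra indirection per call.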

155. 155 Parallelism Parallelism is incorporated into programs for two primary reasons. The first: the program is running in a multiprocessing or distributed environment. Many computers now have multiple CPUs, and many jobs are distributed over multiple computers in a network. A programming language should be able to take advantage of this parallelism, since many algorithms can be improved if designed for parallel execution. This is PHYSICAL PARALLELISM.
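As a minimal sketch of an algorithm designed for physical parallelism, the following splits an array sum into one chunk per worker thread using the standard `ExecutorService`. The class name `ParallelSum` and the chunking scheme are assumptions for illustration; any speedup depends on the number of CPUs actually available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSum {
    // Split the array into roughly equal chunks and sum each chunk in its own task.
    static long sum(int[] data, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (data.length + workers - 1) / workers; // ceiling division
            for (int start = 0; start < data.length; start += chunk) {
                final int lo = start, hi = Math.min(start + chunk, data.length);
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (int i = lo; i < hi; i++) s += data[i];
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get(); // wait for each partial sum
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sum(data, 4)); // 500000500000
    }
}
```

The chunks are independent (no shared data is written), so no synchronization is needed until the partial results are combined.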

156. 156 Parallelism The second: the program is running in a "simulated" parallel environment, allowing for asynchronous activity. Ex: Two windows are displayed to the user. One shows the current time (incremented every second) and one allows the user to draw images on the screen. We don't want the user's drawing to "stop" the clock, and we don't want the running clock to prevent the user from drawing. Even with a single processor, we want both of these activities to execute "in parallel". This is LOGICAL PARALLELISM.

157. 157 Parallelism What issues must we be concerned with? Synchronization: Execution of tasks in parallel makes them asynchronous; we cannot predict at what point in time one task will execute an instruction relative to another task. If the tasks are independent, this is not a problem: no resources are shared, so it doesn't matter where in its execution each task is. Ex: one task counts ballots from Florida, another counts ballots from New Mexico.

158. 158 Parallelism If the tasks have some dependencies, there can be a problem. The most common dependency is shared data, and to handle it we must synchronize the tasks. Cooperation Synchronization: One task depends upon an output/outcome of another. Ex: Task B must process data produced by Task A. Contractor B cannot put up drywall until contractor A has finished the wiring; the task that counts ballots cannot proceed until the task that collects ballots provides it with some. We must have a mechanism that allows Task B to pause until the data is available: B could loop and keep checking for data (busy waiting), or B could wait for some signal from A.
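The "wait for a signal" option can be sketched with the standard `java.util.concurrent.CountDownLatch`: Task A collects ballots and then counts the latch down, and Task B blocks in `await()` until that happens. The ballot data and the class name `CooperationDemo` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class CooperationDemo {
    // Task A collects ballots; Task B counts the "yes" ballots, but only
    // after A signals that the data is ready.
    static int countYes() throws InterruptedException {
        List<String> ballots = new ArrayList<>();        // data produced by Task A
        CountDownLatch collected = new CountDownLatch(1); // one-shot signal from A to B
        int[] yes = new int[1];                           // result computed by Task B

        Thread taskB = new Thread(() -> {
            try {
                collected.await(); // pause, without busy waiting, until A signals
                for (String b : ballots) if (b.equals("yes")) yes[0]++;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread taskA = new Thread(() -> {
            ballots.add("yes");
            ballots.add("no");
            ballots.add("yes");
            collected.countDown(); // signal: the data is now available
        });

        taskB.start(); // B starts first, yet it cannot run ahead of A
        taskA.start();
        taskA.join();
        taskB.join();
        return yes[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countYes()); // 2
    }
}
```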

159. 159 Parallelism Competition Synchronization: Both tasks are competing for the same shared resource. If one or both tasks modify the data, it could cause data inconsistencies. Ex: Task A and Task B are ATM (MAC machine) accesses of the same bank account. Task A checks the balance: $200. Task B checks the balance: $200. Task A withdraws $200. Task A updates the balance to $0. Task B withdraws $200. Task B updates the balance to -$200. We must have some mechanism that ensures MUTUAL EXCLUSION for CRITICAL DATA. We could have a LOCK on the data, or a similar mechanism allowing only one task to access it at a time.
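A minimal sketch of a lock on the critical data, using Java's `synchronized` methods: because the balance check and the update form one indivisible step, the interleaving in the example above cannot happen. `Account` and `CompetitionDemo` are hypothetical names.

```java
class Account {
    private int balance;

    Account(int balance) { this.balance = balance; }

    // synchronized: the check and the update happen as one indivisible step,
    // because only one thread at a time can hold this object's lock.
    synchronized boolean withdraw(int amount) {
        if (balance < amount) return false; // insufficient funds: refuse
        balance -= amount;
        return true;
    }

    synchronized int getBalance() { return balance; }
}

class CompetitionDemo {
    // Two tasks try to withdraw $200 from a $200 account at the same time.
    static int race() throws InterruptedException {
        Account acct = new Account(200);
        Thread taskA = new Thread(() -> acct.withdraw(200));
        Thread taskB = new Thread(() -> acct.withdraw(200));
        taskA.start();
        taskB.start();
        taskA.join();
        taskB.join();
        return acct.getBalance(); // always 0: exactly one withdrawal succeeds
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(race()); // 0
    }
}
```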

160. 160 Parallelism Synchronization Mechanisms. Semaphores: Devised by Dijkstra. They are basically guards placed around code. The P operation must succeed to gain access to the code; it decrements a counter when it succeeds (and blocks the task while the counter is 0). The V operation executes when the critical section ends; it increments the counter, allowing a waiting task to proceed. Based on the initial value of the counter, we can control how many tasks are allowed to access the critical section at once. If used properly, semaphores can provide either cooperation or competition synchronization. However, it is easy NOT to use them properly, which can cause problems.
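The P and V operations correspond to `acquire()` and `release()` on the standard `java.util.concurrent.Semaphore`. The sketch below, whose class and method names are hypothetical, records the largest number of threads ever inside the critical section, showing that the counter's initial value bounds it.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class SemaphoreDemo {
    // Run `tasks` threads through a critical section guarded by a counting
    // semaphore whose counter starts at `permits`. Returns the largest number
    // of threads that were ever inside the section at the same time.
    static int maxInside(int permits, int tasks) throws InterruptedException {
        Semaphore guard = new Semaphore(permits);
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        Thread[] threads = new Thread[tasks];
        for (int i = 0; i < tasks; i++) {
            threads[i] = new Thread(() -> {
                try {
                    guard.acquire(); // P: block while the counter is 0, then decrement it
                    try {
                        maxSeen.accumulateAndGet(inside.incrementAndGet(), Math::max);
                        Thread.sleep(10); // simulated work inside the critical section
                    } finally {
                        inside.decrementAndGet();
                        guard.release(); // V: increment the counter, letting a waiter in
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        return maxSeen.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(maxInside(1, 4)); // 1: a binary semaphore gives mutual exclusion
        System.out.println(maxInside(2, 6)); // at most 2
    }
}
```

Forgetting the `release()` (or calling it twice) is exactly the kind of misuse the slide warns about; the `finally` block is one way to reduce that risk.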

161. 161 Parallelism Monitors: Devised by Brinch Hansen and Hoare. The critical data is part of a data object that allows only one task inside at a time. Monitors are better than semaphores for competition synchronization, because the mechanism is built into the monitor, so it is harder for the programmer to mess up. They are no better for cooperation synchronization, which still must be done manually. Used in Concurrent Pascal, Modula-2 and (somewhat) in Java.
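Java's "somewhat" monitors are objects whose methods are all `synchronized`. In this hypothetical bounded-buffer sketch (`BoundedBuffer` and `MonitorDemo` are illustrative names), mutual exclusion comes for free, but waiting while the buffer is full or empty still has to be coded by hand with `wait()` and `notifyAll()`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A bounded buffer written as a monitor: every method is synchronized, so at
// most one thread is active inside the object at a time (competition
// synchronization is built in). Cooperation is still coded manually.
class BoundedBuffer {
    private final Deque<Integer> items = new ArrayDeque<>();
    private final int capacity;

    BoundedBuffer(int capacity) { this.capacity = capacity; }

    synchronized void put(int x) throws InterruptedException {
        while (items.size() == capacity) wait(); // manual cooperation: wait while full
        items.addLast(x);
        notifyAll(); // wake any waiting consumer
    }

    synchronized int take() throws InterruptedException {
        while (items.isEmpty()) wait(); // manual cooperation: wait while empty
        int x = items.removeFirst();
        notifyAll(); // wake any waiting producer
        return x;
    }
}

class MonitorDemo {
    // Pass the numbers 1..n through a buffer of capacity 2 and sum them.
    static long transfer(int n) throws InterruptedException {
        BoundedBuffer buf = new BoundedBuffer(2);
        long[] total = new long[1];
        Thread producer = new Thread(() -> {
            try { for (int i = 1; i <= n; i++) buf.put(i); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread consumer = new Thread(() -> {
            try { for (int i = 1; i <= n; i++) total[0] += buf.take(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        return total[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(transfer(100)); // 5050
    }
}
```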

162. 162 Parallelism Message Passing: Proposed by Brinch Hansen and Hoare. More general than either of the two previous techniques. Tasks are synchronized via messages sent to each other. A message is similar in look/execution to a subprogram call, but with restrictions: the caller (or passer) of the message is blocked at the call until the receiver is ready to receive it, and the receiver (or executer) of the message is blocked at the message code until the message is called. Caller and receiver meet at a rendezvous.

163. 163 Parallelism The idea is that we know exactly where in the code both tasks will be when a rendezvous occurs. So even though the tasks execute asynchronously, we synchronize them with respect to each other at each rendezvous. Ex: Ada. Still, much of the work is up to the programmer.
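Java has no Ada-style `accept` statement, but the standard `SynchronousQueue` has no capacity, so each hand-off is a meeting point: `put()` blocks until another thread calls `take()`, and vice versa. The sketch below approximates an Ada entry call that returns a result by using one queue for the request and one for the reply; the class name and the squaring "server" are hypothetical.

```java
import java.util.concurrent.SynchronousQueue;

class RendezvousDemo {
    // Each put/take pair is a point at which both tasks are known to be.
    static int run() throws InterruptedException {
        SynchronousQueue<Integer> request = new SynchronousQueue<>();
        SynchronousQueue<Integer> reply = new SynchronousQueue<>();

        // Server task: loosely like an Ada task body accepting one entry three times.
        Thread server = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) {
                    int x = request.take(); // receiver blocked until a caller arrives
                    reply.put(x * x);       // blocked until the caller takes the result
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        server.start();

        int sum = 0;
        for (int x = 1; x <= 3; x++) {
            request.put(x);       // caller blocked until the server accepts
            sum += reply.take();  // and blocked again until the result is handed back
        }
        server.join();
        return sum; // 1 + 4 + 9
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // 14
    }
}
```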

164. 164 Parallelism Parallel processing concerns: Data consistency, which we have already discussed. Mutual exclusion is needed to prevent multiple tasks from accessing critical data at the same time. However, efforts to ensure data consistency can cause other problems, such as DEADLOCK and STARVATION.

165. 165 Parallelism Deadlock: When a (shared) resource has restricted access, it can cause a task to stop execution and wait: in a semaphore queue, in a monitor queue, or in an accept queue. If a circular resource dependency exists, we can get deadlock. Ex: Task A has acquired binary semaphore S1; Task B has acquired binary semaphore S2; Task A is waiting for binary semaphore S2; Task B is waiting for binary semaphore S1. Neither task can ever proceed.
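The circular wait above can be reproduced safely in Java if the second acquisition uses `ReentrantLock.tryLock` with a timeout instead of blocking forever. In this hypothetical sketch, a `CyclicBarrier` is used only to force the bad interleaving: each task holds its first lock before requesting the other, so both requests time out. The class and method names are illustrative.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class DeadlockDemo {
    // Returns how many of the two tasks could not get their second lock.
    static int stuckTasks() throws InterruptedException {
        ReentrantLock s1 = new ReentrantLock();
        ReentrantLock s2 = new ReentrantLock();
        CyclicBarrier sync = new CyclicBarrier(2); // only forces the bad interleaving
        AtomicInteger stuck = new AtomicInteger();

        Thread taskA = new Thread(() -> acquireBoth(s1, s2, sync, stuck)); // S1, then S2
        Thread taskB = new Thread(() -> acquireBoth(s2, s1, sync, stuck)); // S2, then S1
        taskA.start();
        taskB.start();
        taskA.join();
        taskB.join();
        return stuck.get();
    }

    static void acquireBoth(ReentrantLock first, ReentrantLock second,
                            CyclicBarrier sync, AtomicInteger stuck) {
        first.lock();
        try {
            sync.await(); // wait until BOTH tasks hold their first lock
            // A plain second.lock() would now block forever: circular wait.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                stuck.incrementAndGet();
            }
            sync.await(); // keep holding the first lock until both attempts are over
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (BrokenBarrierException e) {
            throw new IllegalStateException(e);
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(stuckTasks()); // 2: each task waits on the lock the other holds
    }
}
```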

166. 166 Parallelism Starvation: To combat deadlock, many languages allow a task to release a resource prematurely in some circumstances. Ex: if one of the tasks in the previous example releases its semaphore, the other can proceed. Under these circumstances there is the possibility that a task may never acquire all of the resources that it needs at the time it needs them: this is starvation. We must be careful to avoid all of these problems when programming in parallel.
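A hypothetical sketch of premature release: if a task cannot get its second lock quickly, it gives up the first one as well, pauses for a random interval, and retries. This removes the deadlock, but nothing bounds the number of retries, which is where starvation becomes possible. `BackoffDemo`, `withBoth` and the timing values are assumptions chosen for illustration.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class BackoffDemo {
    // Run `work` while holding both locks. If the second lock is not available
    // soon, release the first one too (premature release), pause, and retry.
    static void withBoth(ReentrantLock first, ReentrantLock second, Runnable work)
            throws InterruptedException {
        while (true) {
            first.lock();
            try {
                if (second.tryLock(10, TimeUnit.MILLISECONDS)) {
                    try {
                        work.run();
                        return;
                    } finally {
                        second.unlock();
                    }
                }
            } finally {
                first.unlock(); // giving this up breaks the circular wait
            }
            // Nothing bounds the number of retries, so an unlucky task could
            // in principle keep losing: that is starvation.
            Thread.sleep(ThreadLocalRandom.current().nextInt(1, 20));
        }
    }

    // Tasks A and B each need S1 and S2 but request them in opposite orders.
    static int run(int rounds) throws InterruptedException {
        ReentrantLock s1 = new ReentrantLock();
        ReentrantLock s2 = new ReentrantLock();
        AtomicInteger done = new AtomicInteger();
        Thread taskA = new Thread(() -> repeat(rounds, s1, s2, done));
        Thread taskB = new Thread(() -> repeat(rounds, s2, s1, done));
        taskA.start();
        taskB.start();
        taskA.join();
        taskB.join();
        return done.get();
    }

    static void repeat(int rounds, ReentrantLock first, ReentrantLock second, AtomicInteger done) {
        try {
            for (int i = 0; i < rounds; i++) withBoth(first, second, done::incrementAndGet);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(20)); // 40: both tasks finish every round
    }
}
```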

167. 167 Parallelism Let's look at Java as an example. Deadlock: see deadlock.java. Corrupt data: see corrupt.java. Some features of older Java implementations are now deprecated because they are too prone to deadlock and data-inconsistency problems. Suspend / Resume: suspend does not free locked objects, so it can easily lead to deadlock if the thread is not resumed. Stop: immediately frees locked objects, which can leave data in an inconsistent state.
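The usual replacement for `stop()` is cooperative cancellation with `interrupt()`: the worker checks its interrupt status only between complete units of work, so shared data is never left half-updated. In this hypothetical sketch, each unit of work updates two counters that must stay equal, and the test checks that they still match after the worker has been stopped.

```java
class StopDemo {
    // Ask the worker to stop with interrupt() instead of the deprecated
    // Thread.stop(); the worker checks the request only between complete
    // units of work, so the invariant pair[0] == pair[1] always holds.
    static boolean stopsCleanly() throws InterruptedException {
        long[] pair = new long[2];
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                pair[0]++; // one complete unit of work updates BOTH fields
                pair[1]++;
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // sleep cleared the flag; set it again
                }
            }
        });
        worker.start();
        Thread.sleep(50);
        worker.interrupt(); // a request, not a forced stop
        worker.join(1000);  // wait up to one second for the worker to finish
        return !worker.isAlive() && pair[0] == pair[1];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(stopsCleanly()); // true
    }
}
```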

168. 168 Prolog As we discussed previously, Prolog is a language used for logic programming. "Programs" in Prolog consist of facts and rules in a database. A fact consists of an identifier followed by a parenthesized, comma-separated list of objects (atoms), followed by a period. The identifier represents some relationship amongst the objects and is called a predicate; the objects are its arguments. Ex. from ex1.pl: father(herb, irving).

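169. 169 Prolog Rules are clauses that consist of a head and a body. In order for the head to "succeed" in its evaluation, all of the goals in the body must be satisfied; these goals could be facts, or could be other rules. Ex. from ex1.pl: sibling(X,Y) :- X \== Y, parent(P,X), parent(P,Y). (Because \== only checks whether its arguments are currently identical, it succeeds trivially while Y is still uninstantiated, so a query with Y unbound can return a child as its own sibling; writing the test last, as in sibling(X,Y) :- parent(P,X), parent(P,Y), X \== Y., avoids this.) The :- can be read as "if". Execution of a program is in fact a sequence of questions, or assertions, and the database is searched in an effort to satisfy all of the assertions.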

170. 170 Prolog If the assertions can be satisfied, the answer is yes; otherwise, the answer is no. If a given assertion succeeds, execution proceeds to the next one. If a given assertion fails, execution backtracks and attempts to re-satisfy the previous assertion. So what about variable assignments? These are in fact just side effects that occur in an effort to satisfy the query; variables are not assigned in the traditional (imperative-language) sense.
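To make the search-and-backtrack order concrete, here is a rough Java model of how the `sibling` rule is satisfied for a given X: each nested loop plays the role of one subgoal, and moving on to the next loop iteration corresponds to backtracking and re-satisfying that subgoal from where the search left off. This is only an analogy (there is no unification), and the `PARENT` facts are hypothetical, not the contents of ex1.pl.

```java
import java.util.ArrayList;
import java.util.List;

class PrologSearchDemo {
    // Hypothetical fact database, in top-to-bottom order: parent(P, C).
    static final String[][] PARENT = {
        {"pat", "ann"}, {"pat", "bob"}, {"lee", "cal"}, {"pat", "dee"}
    };

    // All answers Y to the query sibling(x, Y), in the order a
    // depth-first, top-to-bottom search would find them.
    static List<String> siblingsOf(String x) {
        List<String> answers = new ArrayList<>();
        for (String[] f1 : PARENT) {          // subgoal parent(P, X)
            if (!f1[1].equals(x)) continue;   // fact does not match: try the next one
            String p = f1[0];                 // P becomes instantiated
            for (String[] f2 : PARENT) {      // subgoal parent(P, Y)
                if (!f2[0].equals(p)) continue;
                String y = f2[1];             // Y becomes instantiated
                if (!x.equals(y)) answers.add(y); // the X \== Y test, done once Y is known
            }
            // Leaving this iteration undoes the binding of P (backtracking past it).
        }
        return answers;
    }

    public static void main(String[] args) {
        System.out.println(siblingsOf("bob")); // [ann, dee]
    }
}
```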

171. 171 Prolog Variables in Prolog are dynamically typed and have two states. Uninstantiated: the variable is not associated with a value. Instantiated: the variable is associated with a value. Once a variable is instantiated, it keeps that value, and all occurrences of that variable within the same scope have that value; it cannot be re-assigned in the sense of imperative languages. However, if execution backtracks past the point at which it was instantiated, it can again become uninstantiated. Let's look again at ex1.pl

172. 172 Prolog Recursion and database search: Recursion is a fundamental part of programming in Prolog. Execution is simply the satisfaction of goals, and there are no loops as in imperative languages; thus, to build complex "programs" we must utilize recursive programming. Each attempt to satisfy a goal initiates a search of the database.

173. 173 Prolog By default, the database is searched from top to bottom, and we can take advantage of this in our programs. Ex: put the base case before the recursive case, so we don't have to test for it explicitly. However, as the text points out, this could be considered a flaw in the language, since the order in which the rules are considered should not matter to the "truth" of the logic.

174. 174 Prolog If a subgoal in a rule fails at any point, we backtrack and attempt to re-satisfy a previously satisfied subgoal. When re-satisfying a subgoal, the database search resumes from the point at which it succeeded the previous time. See recurse.pl

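175. 175 Prolog Lists: As in Lisp, the list is an important data structure in Prolog. A list consists of a head and a tail, where the tail is itself a list and could be the empty list. Ex: in [a, b, c] the head is a and the tail is [b, c]; the pattern [H|T] matches any non-empty list, binding H to its head and T to its tail.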
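The head/tail structure and the recursive style it encourages can be sketched in Java with immutable cons cells, using `null` for the empty list. `PList`, `len` and `member` are hypothetical names; the comments show the Prolog clauses each method mirrors, with the base case written first.

```java
import java.util.Objects;

// Hypothetical immutable list built from head/tail cells, like Prolog's [H|T].
final class PList {
    final Object head;
    final PList tail; // null plays the role of the empty list []

    PList(Object head, PList tail) { this.head = head; this.tail = tail; }

    static PList of(Object... xs) {
        PList list = null;
        for (int i = xs.length - 1; i >= 0; i--) list = new PList(xs[i], list);
        return list;
    }

    // Mirrors:  len([], 0).
    //           len([_|T], N) :- len(T, M), N is M + 1.
    static int len(PList l) {
        if (l == null) return 0;  // base case: the empty list
        return len(l.tail) + 1;   // recursive case on the tail
    }

    // Mirrors:  member(X, [X|_]).
    //           member(X, [_|T]) :- member(X, T).
    static boolean member(Object x, PList l) {
        if (l == null) return false;
        return Objects.equals(l.head, x) || member(x, l.tail);
    }

    public static void main(String[] args) {
        PList l = PList.of("a", "b", "c");
        System.out.println(len(l));         // 3
        System.out.println(member("b", l)); // true
        System.out.println(member("z", l)); // false
    }
}
```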
