CSE 452: Programming Languages

CSE 452: Programming Languages Subprograms

Outline • Subprograms • Parameter passing • Type checking • Using multidimensional arrays as parameters • Using subprograms as parameters • Overloaded subprograms • Generic subprograms • Implementation

Parameter Passing • Pass-by-value • Pass-by-result • Pass-by-value-result • Pass-by-reference • Pass-by-name

Parameter Passing in PL
• Fortran
  • Always uses the inout-mode semantic model of parameter passing
  • Before Fortran 77, mostly used pass-by-reference
  • Later implementations mostly use pass-by-value-result
• C
  • Mostly pass-by-value
  • Pass-by-reference is achieved by using pointers as parameters
    int p[] = { 1, 2, 3 };
    void change(int *q) {
        q[0] = 4;
    }
    int main(void) {
        change(p);   /* p[0] == 4 after calling the change function */
        return 0;
    }

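Parameter Passing in PL
• C
  • Pass-by-reference: the value of the pointer is copied to the called function and nothing is copied back, so this swap exchanges only the local copies of p and q and leaves the caller's arrays unchanged
    #include <stdio.h>
    void swap(int *p, int *q) {
        int *temp;
        temp = p;     /* exchanges the local pointer copies only */
        p = q;
        q = temp;
    }
    int main(void) {
        int p[] = {1, 2, 3};
        int q[] = {4, 5, 6};
        int i;
        swap(p, q);   /* p and q are unchanged here */
        return 0;
    }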
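The swap on this slide has no effect visible to the caller because it never dereferences its pointers. As a contrast, here is a minimal sketch of a swap that does change the caller's data; the name swap_ints is chosen here for illustration and is not part of the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: exchanges the two ints that p and q point to. */
void swap_ints(int *p, int *q) {
    assert(p != NULL && q != NULL);
    int temp = *p;   /* read the caller's first variable through the pointer */
    *p = *q;         /* write into the caller's storage */
    *q = temp;
}
```

Calling swap_ints(&a, &b) exchanges the values of a and b in the caller, which is the C idiom for simulating pass-by-reference.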

Parameter Passing in PL
• C++
  • Includes a special pointer type called a reference type
    void GetData(double &Num1, const int &Num2) {
        int temp;
        for (int i = 0; i < Num2; i++) {
            cout << "Enter a number: ";
            cin >> temp;
            if (temp > Num1) {
                Num1 = temp;
                return;
            }
        }
    }
  • Num1 and Num2 are passed by reference
  • The const modifier prevents a function from changing the values of reference parameters
  • Reference parameters are implicitly dereferenced
  • Why do we need a constant reference parameter?

Parameter Passing in PL
• Ada
  • Reserved words: in, out, in out (in is the default mode)
    procedure temp(A : in out Integer; B : in Integer; C : in Integer)
  • out mode can be assigned but not referenced (the Ada 83 rule; Ada 95 and later also allow out parameters to be referenced)
  • in mode can be referenced but not assigned
  • in out can be both referenced and assigned
• Fortran
  • Semantic modes are declared using the Intent attribute
    Subroutine temp(A, B, C)
      Integer, Intent(Inout) :: A
      Integer, Intent(In) :: B
      Integer, Intent(Out) :: C

Parameter Passing in PL
• Perl
  • Actual parameters are implicitly placed in a predefined array named @_
    sub foo {
        local $i, $a = 0, $b = 1;
        for ($i = 0; $i < scalar(@_); $i++) {
            $a = $a + $_[$i];
            $b = $b * $_[$i];
        }
        return ($a, $b);
    }
    …
    ($a, $b) = foo(1, 2, 3);

Implementing Parameter Passing
• Memory contents
  • Code: program code
  • Data: global and static data
  • Heap: dynamically allocated variables
  • Stack: local data

Implementing Parameter Passing
• Pass-by-value
  • Values are copied into stack locations
  • The stack locations serve as storage for the corresponding formal parameters
• Pass-by-result
  • Implemented as the opposite of pass-by-value
  • Values assigned to the actual parameters are placed in the stack, where the calling program unit can retrieve them upon termination of the called subprogram
• Pass-by-value-result
  • The stack location for each parameter is initialized by the call and then copied back to the actual parameter upon termination of the called subprogram
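C itself only passes by value, so the other two modes can only be imitated. The sketch below simulates the copy-in and copy-out steps by hand; the function names are hypothetical, and the local variable named formal stands in for the stack location described on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Pass-by-value: the formal is a private copy; the caller's variable is untouched. */
void by_value(int n) {
    n = n + 10;
}

/* Simulated pass-by-result: nothing is copied in; a value is copied out at return. */
void by_result(int *actual) {
    assert(actual != NULL);
    int formal;          /* has no initial value from the caller */
    formal = 42;
    *actual = formal;    /* copy-out on termination */
}

/* Simulated pass-by-value-result: copy in at the call, copy out at return. */
void by_value_result(int *actual) {
    assert(actual != NULL);
    int formal = *actual;   /* copy-in */
    formal = formal + 10;   /* the body works only on the local copy */
    *actual = formal;       /* copy-out on termination */
}
```

Inside by_value_result the caller's variable is read once and written once, which is the observable difference from true pass-by-reference when aliases exist.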

Implementing Parameter Passing • Pass by Reference • Regardless of type of parameter, put the address in the stack • For literals, address of literal is put in the stack • For expressions, compiler must build code to evaluate expression before the transfer of control to the called subprogram • Address of memory cell in which code places the result of its evaluation is then put in the stack • Compiler must make sure to prevent called subprogram from changing parameters that are literals or expressions • Access to formal parameters is by indirect addressing from the stack location of the address

Implementing Parameter Passing
• [Figure: stack contents when the main program calls sub(w, x, y, z), with formal parameters sub(a, b, c, d), where w is passed by value, x is passed by result, y is passed by value-result, and z is passed by reference]

Implementing Parameter Passing • Pass by Name • run-time resident code segments or subprograms evaluate the address of the parameter • called for each reference to the formal • Very expensive, compared to pass by reference or value-result
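The run-time code segments mentioned above are commonly called thunks. C has no pass-by-name, so the following is only a sketch of the idea: the actual-parameter expression a[i] * 2 is wrapped in a function that is called again at every reference to the formal. The struct env, elem_times_two, and sum_by_name names are all invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* A thunk re-evaluates the actual-parameter expression on demand. */
typedef int (*thunk_fn)(void *env);

/* The caller's variables that the expression refers to. */
struct env {
    int i;
    const int *a;
};

/* Thunk for the actual-parameter expression a[i] * 2. */
static int elem_times_two(void *env) {
    struct env *e = env;
    return e->a[e->i] * 2;
}

/* Every reference to the "by-name" formal expr calls the thunk again,
   so it sees the current value of i each time. */
int sum_by_name(struct env *e, int lo, int hi, thunk_fn expr) {
    assert(e != NULL && expr != NULL);
    int total = 0;
    for (e->i = lo; e->i <= hi; e->i++)
        total += expr(e);
    return total;
}
```

The extra call per reference is the cost the slide describes as very expensive compared with pass-by-reference or pass-by-value-result.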

Multidimensional Arrays as Parameters
• C
  • Uses row-major order for matrices
    address(mat[i][j]) = address(mat[0][0]) + (i * num_columns + j) * element_size
  • The parameter declaration must specify num_columns but not num_rows
    void fun(int matrix[][10]) { … }
    int main(void) {
        int mat[5][10];
        fun(mat);
        …
    }
  • A function declared this way cannot accept matrices with a different number of columns
  • Alternative: use pointers
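One way to read the "use pointers" alternative is to pass a pointer to the first element together with both dimensions, and to do the row-major index arithmetic by hand. This is a sketch under that assumption; sum_matrix is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Sums a rows x cols matrix stored contiguously in row-major order.
   The column count is an ordinary run-time parameter. */
int sum_matrix(const int *mat, size_t rows, size_t cols) {
    assert(mat != NULL);
    int sum = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum += mat[i * cols + j];   /* element offset = i * cols + j */
    return sum;
}
```

The same function now works for any number of columns, at the price of losing the mat[i][j] notation inside the function.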

Multidimensional Arrays as Parameters
• Ada
    type Mat_Type is array (Integer range <>, Integer range <>) of Float;
    Mat1 : Mat_Type(1..100, 1..20);
    function Sumer(Mat : in Mat_Type) return Float is
      Sum : Float := 0.0;
    begin
      for Row in Mat'Range(1) loop
        for Col in Mat'Range(2) loop
          Sum := Sum + Mat(Row, Col);
        end loop;
      end loop;
      return Sum;
    end Sumer;
  • No need to specify the size of the array
  • Use the Range attribute to obtain the index range of each dimension

Multidimensional Arrays as Parameters
• Fortran
  • Array parameters must be declared after the header
    Subroutine Sub(Matrix, Rows, Cols, Result)
      Integer, Intent(In) :: Rows, Cols
      Real, Dimension(Rows, Cols), Intent(In) :: Matrix
      Real, Intent(Out) :: Result
      …
    End Subroutine Sub

Subprogram Names as Parameters
• Issues:
  • Are parameter types checked?
    • Early Pascal and FORTRAN 77 do not check them; later versions of Pascal and Fortran 90 do
    • Ada does not allow subprogram parameters
    • Java does not allow method names to be passed as parameters
    • C and C++ pass pointers to functions; parameters can be type checked
  • What is the correct referencing environment for a subprogram that was sent as a parameter?
    • Shallow binding: the environment of the call statement that enacts the passed subprogram
    • Deep binding: the environment of the definition of the passed subprogram
    • Ad hoc binding: the environment of the call statement that passed the subprogram as an actual parameter (has never been used)
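For the C case, a small sketch shows how a function pointer parameter carries the full protocol of the passed function, so the compiler can check it. The names apply_twice, add_three, and square are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* The parameter type states the protocol of the passed function:
   it must take one int and return an int. */
int apply_twice(int (*f)(int), int x) {
    assert(f != NULL);
    return f(f(x));
}

int add_three(int n) { return n + 3; }
int square(int n)    { return n * n; }
```

Passing a function with a different protocol, for example one taking a double, draws a diagnostic from the compiler. Because C has no nested functions, the shallow versus deep binding question does not arise here.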

Subprogram Names as Parameters
    function sub1() {
        var x;
        function sub2() {
            alert(x);
        };
        function sub3() {
            var x;
            x = 3;
            sub4(sub2);
        }
        function sub4(subx) {
            var x;
            x = 4;
            subx();
        };
        x = 1;
        sub3();
    };
• Shallow binding: the referencing environment of sub2 is that of sub4 (the alert shows 4)
• Deep binding: the referencing environment of sub2 is that of sub1 (the alert shows 1)
• Ad hoc binding: the referencing environment of sub2 is that of sub3 (the alert shows 3)

Overloaded Subprograms
• A subprogram that has the same name as another subprogram in the same referencing environment
• Every version of the overloaded subprogram must have a unique protocol
  • Must differ from the others in the number, order, or types of its parameters, or in its return type (if it is a function)
• C++, Java, Ada, and C# include predefined overloaded subprograms, e.g., overloaded constructors in C++
• Overloaded subprograms with default parameters can lead to ambiguous subprogram calls
    void foo(float b = 0.0);
    void foo();
    …
    foo();   /* call is ambiguous; the compiler rejects it */

Generic (Polymorphic) Subprograms • Polymorphism: • Increase reusability of software • Types: • Ad hoc polymorphism = Overloaded subprogram • Parametric polymorphism • Provided by a subprogram that takes a generic parameter that is used in a type expression • Ada and C++ provide compile-time parametric polymorphism

Generic Subprograms
    generic
      type Index_Type is (<>);
      type Element_Type is private;
      type Vector is array (Index_Type range <>) of Element_Type;
      -- a private type has no ">" by default, so it is taken as a generic formal
      with function ">" (Left, Right : Element_Type) return Boolean is <>;
    procedure Generic_Sort(List : in out Vector);
    procedure Generic_Sort(List : in out Vector) is
      Temp : Element_Type;
    begin
      for Top in List'First .. Index_Type'Pred(List'Last) loop
        for Bottom in Index_Type'Succ(Top) .. List'Last loop
          if List(Top) > List(Bottom) then
            Temp := List(Top);
            List(Top) := List(Bottom);
            List(Bottom) := Temp;
          end if;
        end loop; -- for Bottom ...
      end loop; -- for Top ...
    end Generic_Sort;
• Example:
    procedure Integer_Sort is new Generic_Sort(
      Index_Type => Integer,
      Element_Type => Integer,
      Vector => Int_Array);

Generic Subprograms
    template <class Type>
    void generic_sort(Type list[], int len) {
        int top, bottom;
        Type temp;
        for (top = 0; top < len - 1; top++)
            for (bottom = top + 1; bottom < len; bottom++) {
                if (list[top] > list[bottom]) {
                    temp = list[top];
                    list[top] = list[bottom];
                    list[bottom] = temp;
                }
            } //** end of for (bottom ...
    } //** end of generic_sort
    float flt_list[100];
    ...
    generic_sort(flt_list, 100); // Implicit instantiation
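C has no templates, so it offers no compile-time parametric polymorphism of this kind. For comparison only, the standard library's qsort achieves a type-erased form of genericity by taking void * data, an element size, and a comparison function. The wrapper sort_floats and the comparator compare_floats below are names introduced for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback required by qsort: negative, zero, or positive. */
static int compare_floats(const void *a, const void *b) {
    float x = *(const float *)a;
    float y = *(const float *)b;
    return (x > y) - (x < y);
}

/* Sorts len floats in ascending order using the generic library routine. */
void sort_floats(float *list, size_t len) {
    assert(list != NULL);
    qsort(list, len, sizeof list[0], compare_floats);
}
```

Unlike the template version, the element type is not checked at compile time: passing the wrong element size or comparator compiles but misbehaves.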

Implementing Subprograms • The subprogram call and return operations are together called subprogram linkage • Implementation of subprograms must be based on semantics of subprogram linkage • Implementation: • Simple subprograms • no recursion, use only static local variables • Subprograms with stack-dynamic variables • Nested subprograms

Simple Subprograms • Simple • subprograms are not nested and all local variables are static • Example: early versions of Fortran • Call Semantics require the following actions: • Save execution status of current program unit • Carry out parameter passing process • Pass return address to the callee • Transfer control to the callee • Return Semantics require the following actions: • If pass by value-result or out-mode, move values of those parameters to the corresponding actual parameters • If subprogram is a function, move return value of function to a place accessible to the caller • Restore execution status of caller • Transfer control back to caller

Simple Subprograms • Required Storage: • Status information of the caller • Parameters • return address • functional value (if it is a function) • Subprogram consists of 2 parts: • Subprogram code • Subprogram data • The format, or layout, of the noncode part of an executing subprogram is called an activation record • An activation record instance (ARI) is a concrete example of an activation record (the collection of data for a particular subprogram activation)

Code and Activation record of a program with simple subprograms • Activation record instance for simple subprograms has fixed size. Therefore, it can be statically allocated • Since simple subprograms do not support recursion, there can be only one active version of a given subprogram
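The consequence of having only one statically allocated activation record can be imitated in C with a static local variable. This is a sketch, not how early Fortran was implemented; both function names are invented.

```c
#include <assert.h>

/* Mimics a "simple subprogram": the local lives in static storage,
   so every activation shares the same single cell. */
int count_down_static(int n) {
    assert(n >= 0);
    static int local;
    local = n;
    if (n > 0)
        count_down_static(n - 1);   /* the inner activation overwrites local */
    return local;                   /* no longer n after a recursive call */
}

/* Same logic with a stack-dynamic local: each activation has its own copy. */
int count_down_auto(int n) {
    assert(n >= 0);
    int local = n;
    if (n > 0)
        count_down_auto(n - 1);
    return local;
}
```

With static storage the recursive call destroys the outer activation's value, which is why simple subprograms cannot support recursion.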

Subprograms with Stack-Dynamic Variables
• Compiler must generate code to cause implicit allocation and deallocation of local variables
• Activation record instance on the run-time stack (top of the stack first):
  • Local variables
  • Parameters
  • Dynamic link: points to the top of the activation record instance of the caller
  • Return address: pointer to the code segment of the caller and an offset address of the instruction following the call

Subprograms with Stack-Dynamic Variables
    void sub(float total, int part) {
        int list[4];
        float sum;
        …
    }
• Activation record instance for sub (top of the stack first):
  • Local variable sum
  • Local variable list[3]
  • Local variable list[2]
  • Local variable list[1]
  • Local variable list[0]
  • Parameter part
  • Parameter total
  • Dynamic link
  • Return address

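Example: without Recursion
    void A(int X) {
        int Y;
        …
        C(Y);
    }
    void B(float R) {
        int S, T;
        …
        A(S);
        …
    }
    void C(int Q) {
        …
    }
    void main() {
        float P;
        …
        B(P);
        …
    }
• Call sequence: main calls B, B calls A, A calls C
• [Figure: stack contents at the points marked 1, 2, and 3 on the original slide]
• The collection of dynamic links present in the stack at any given time is called the dynamic chain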

Subprograms with Stack-Dynamic Variables • Recursion adds possibility of multiple simultaneous activations of a subprogram • Each activation requires its own copy of formal parameters and dynamically allocated local variables, along with return address

Subprograms with Recursion
    int factorial(int n) {
        …
        if (n <= 1)
            return 1;
        else
            return n * factorial(n - 1);
        …
    }
    void main() {
        int value;
        value = factorial(3);
        …
    }
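To make the separate activations visible, the sketch below records where each activation of factorial stores its parameter n. The bookkeeping array address_of_n and the bound MAX_N are additions made for this illustration and are not on the slide.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_N 12   /* 12! still fits in a 32-bit int */

/* address_of_n[k] holds the location of n in the activation where n == k. */
static uintptr_t address_of_n[MAX_N + 1];

int factorial(int n) {
    assert(n >= 0 && n <= MAX_N);
    address_of_n[n] = (uintptr_t)&n;   /* this activation's own parameter cell */
    if (n <= 1)
        return 1;
    return n * factorial(n - 1);
}
```

After factorial(3), the recorded addresses for n == 3, 2, and 1 are all different, because the three activation record instances are on the stack at the same time.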

Nested Subprograms • Support for static scoping • Implemented using static link (also called static scope pointer), which points to the bottom of the activation record instance of its static parent

Nested Subprograms
• Static chain
  • Links all static ancestors of the executing subprogram
• Static_depth
  • An integer associated with a static scope that indicates how deeply it is nested in the outermost scope
• Chain_offset
  • Difference between the static_depth of the procedure containing the reference to variable x and the static_depth of the procedure containing the declaration of x
    procedure A is
      procedure B is
        procedure C is
        …
        end; -- of C
      …
      end; -- of B
    …
    end; -- of A
• Static_depths of A, B, and C are 0, 1, and 2, respectively
• If procedure C references a variable declared in A, the chain_offset of that reference is 2

Nested Subprograms
    program MAIN_2;
      var X : integer;
      procedure BIGSUB;
        var A, B, C : integer;
        procedure SUB1;
          var A, D : integer;
          begin { SUB1 }
          A := B + C;  <-----------------1
          end; { SUB1 }
        procedure SUB2(X : integer);
          var B, E : integer;
          procedure SUB3;
            var C, E : integer;
            begin { SUB3 }
            SUB1;
            E := B + A;  <-------------2
            end; { SUB3 }
          begin { SUB2 }
          SUB3;
          A := D + E;  <----------------3
          end; { SUB2 }
        begin { BIGSUB }
        SUB2(7);
        end; { BIGSUB }
      begin
      BIGSUB;
      end. { MAIN_2 }
• Calling sequence:
  • MAIN_2 calls BIGSUB
  • BIGSUB calls SUB2
  • SUB2 calls SUB3
  • SUB3 calls SUB1

Example
• References to A in the program MAIN_2 (shown on the previous slide):
  • Position 1: (0, 3) (local)
  • Position 2: (2, 3) (two levels away)
  • Position 3: (1, 3) (one level away)

Nested Subprograms
• References are written as (chain_offset, local_offset) pairs
• At position 1 in SUB1:
  • A - (0, 3)
  • B - (1, 4)
  • C - (1, 5)
• At position 2 in SUB3:
  • E - (0, 4)
  • B - (1, 4)
  • A - (2, 3)
• At position 3 in SUB2:
  • A - (1, 3)
  • D - an error: D is local to SUB1 and is not visible in SUB2 (the ARI for SUB1 has also been removed by this point)
  • E - (0, 5)
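Resolving a (chain_offset, local_offset) pair can be sketched in C with a stripped-down activation record that keeps only a static link and some slots. The struct ari layout, the slot count, and the resolve function are assumptions made for this illustration; a real activation record also holds the dynamic link, return address, and parameters.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS 8

/* Simplified activation record instance. */
struct ari {
    struct ari *static_link;   /* ARI of the static parent, NULL at the outermost level */
    int slots[SLOTS];          /* storage addressed by local_offset */
};

/* Follows chain_offset static links, then applies local_offset. */
int *resolve(struct ari *current, int chain_offset, int local_offset) {
    struct ari *p = current;
    for (int k = 0; k < chain_offset; k++) {
        assert(p != NULL);
        p = p->static_link;    /* one hop per enclosing scope */
    }
    assert(p != NULL && local_offset >= 0 && local_offset < SLOTS);
    return &p->slots[local_offset];
}
```

The loop makes the cost visible: a reference with chain_offset k needs k pointer hops before the variable can be reached.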

Nested Subprograms
• Drawbacks of static chains
  • A nonlocal reference is slow if the number of scopes between the reference and the declaration of the referenced variable is large
  • Time-critical code is difficult to write, because the costs of nonlocal references are hard to estimate
• Displays
  • Alternative to static chains
  • Store the static links in a single array called the display, instead of storing them in the activation records
  • Accesses to nonlocals require exactly two steps, regardless of the number of scope levels
    • The link to the correct activation record is found using a statically computed value called the display_offset
    • The local_offset is then applied within that activation record instance
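The two-step access through a display can be sketched the same way. The array size, the struct ari layout, and the function name are assumptions for illustration; maintaining the display at calls and returns is not shown.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STATIC_DEPTH 4
#define SLOTS 8

/* Simplified activation record instance: only the addressable slots. */
struct ari {
    int slots[SLOTS];
};

/* display[d] points to the ARI of the currently accessible scope at static depth d. */
static struct ari *display[MAX_STATIC_DEPTH];

int *resolve_with_display(int display_offset, int local_offset) {
    assert(display_offset >= 0 && display_offset < MAX_STATIC_DEPTH);
    assert(local_offset >= 0 && local_offset < SLOTS);
    struct ari *target = display[display_offset];   /* step 1: index the display */
    assert(target != NULL);
    return &target->slots[local_offset];            /* step 2: apply local_offset */
}
```

No loop is needed, so the access cost is the same however deeply the referencing subprogram is nested.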

Buffer overflow attack
• The effectiveness of the buffer overflow attack has been common knowledge in software circles since the 1980s
• The Internet Worm used it in November 1988 to gain unauthorized access to many networks and systems nationwide
• Still used today by hacking tools to gain “root” access to otherwise protected computers
• The fix is a very simple change in the way we write array accesses; unfortunately, once code that has this vulnerability is deployed in the field, it is nearly impossible to stop a buffer overflow attack

Overview of Buffer Overflow Attacks
• The buffer overflow attack exploits a common problem in many programs
• In several high-level programming languages such as C, bounds checking (checking that the data being copied fits in the destination) is not done
    void myFunction(char *str) {
        char bufferB[16];
        strcpy(bufferB, str);
    }
    void main() {
        char bufferA[256];
        myFunction(bufferA);
    }

Overview of Buffer Overflow Attacks
    void myFunction(char *str) {
        char bufferB[16];
        strcpy(bufferB, str);
    }
    void main() {
        char bufferA[256];
        myFunction(bufferA);
    }
• main() passes a 256-byte array to myFunction(), and myFunction() copies it into a 16-byte array!
• Since there is no check on whether bufferB is big enough, the extra data overwrites whatever lies beyond bufferB in memory
• This vulnerability is the basis of buffer overflow attacks
• How is it used to harm a system?
  • It modifies the system stack
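One possible form of the missing check is a copy routine that is told the destination size and truncates instead of overflowing. The function copy_bounded below is a hypothetical helper written for this sketch, not a standard library call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies at most dst_size - 1 characters from src and always NUL-terminates.
   Returns the number of characters stored, excluding the terminator. */
size_t copy_bounded(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL && dst_size > 0);
    size_t n = strlen(src);
    if (n >= dst_size)
        n = dst_size - 1;      /* truncate rather than write past the buffer */
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

Replacing strcpy(bufferB, str) with copy_bounded(bufferB, sizeof bufferB, str) would keep the write inside the 16 bytes of bufferB, at the cost of silently truncating longer input.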

Overview of Buffer Overflow Attacks
• [Figure: stack content while main() is executing, showing bufferA]

Overview of Buffer Overflow Attacks
• [Figure: stack content after main() calls myFunction(), with regions labeled bufferB, OS data, str, dynamic link, return address to main, and bufferA]

Overview of Buffer Overflow Attacks
• [Figure: the same stack content after strcpy(bufferB, str); the region beyond bufferB is now contaminated with data from str]
• May overwrite the return address!

Overview of Buffer Overflow Attacks
• If the content of str is carefully selected, we can point the return address to a piece of code we have written
• When the system returns from the function call, it will begin executing the malicious code
• [Figure: stack content with regions labeled malicious code, new address, and bufferA]
