91.102 - Computing II

91.102 - Computing II Modularity, Information Hiding and Abstract Data Types. Difficulty: Programs that solve “Real World” problems can get very large - up to millions of lines of code. Nobody can understand or remember that many lines of code. Problem: How do we make it possible for normal human beings to contribute to such large endeavors? Note: I did not say “How do we remove the difficulty”, because we do not KNOW how to remove it. We have some techniques that help us get close enough to produce - much of the time - usable programs within feasible budgets and time requirements.

91.102 - Computing II There are two major ideas (borrowed from other, much older, engineering disciplines): 1) Break the task up into well-defined pieces; 2) Hide the implementation details of the pieces whenever possible. What do they imply?

91.102 - Computing II • The first idea requires that we find how to break the project into small pieces that can be individually designed, built and glued back together in such a way that the “big product” can be delivered. • The second idea allows us to recover from occasional bad implementation decisions without having to start all over again. A side benefit is that good implementations of “pieces” might be reusable by other projects, cutting their costs down.

91.102 - Computing II What is the mechanism we provide to implement the policies just described? The MODULE. This consists, physically, of two parts: Interface. Implementation. The Interface provides the PUBLIC information, i.e. the information a USER needs to make use of the functionality. The Implementation contains the PRIVATE information, i.e. the implementation of the functionality advertised in the Interface File.

91.102 - Computing II User Program Or Other Module Interface Public Information Header File Implementation Private Information Code File

91.102 - Computing II What IS a Module? A set of declarations placed into service inside a program. (A quote from the text…) In somewhat more detail, a module is a unit of organization of a software system that A) packages together a collection of entities (data and operations) that provide a set of capabilities useful in the solution of a class of problems; B) carefully controls what external users of the module can see and use. (NB: this has NOTHING to do with C - every useful language implements some variant of these ideas: C provides one of the LEAST sophisticated variants)

91.102 - Computing II A common characteristic of Modules through all the languages that support this mechanism is Separate Compilation. This simply means that collections of functions and data can be compiled independently of one another and of any program that may use them. Any changes to the user program, or to any of the collections require only recompilation of a minimal number of modules. This may not be important with programs of a few hundred or a few thousand lines, but is crucial to the development of software that takes hundreds of thousands or millions of lines to deliver.

91.102 - Computing II How we create a Module (C notation…) ModuleInterface.h A file that contains all the entities that must be visible to the user of the module: any constants, type definitions, variable definitions, and functions (i.e., function prototypes) that the user’s program is allowed to have explicit access to, to use or modify - depending on the entity. ModuleImplementation.c A file that contains all the private entities: the implementation code for the functions, and all those constants, variables and functions which the module user may NOT have direct access to.

91.102 - Computing II #include "ModuleInterface.h" User program ModuleInterface.h #include "ModuleInterface.h" Implementation Private Information Code File ModuleImplementation.c

91.102 - Computing II How we USE a Module: The USER PROGRAM must request the interface file via an include directive. Example: #include <stdio.h> /* include system file */ /* …. Other system inclusions …. */ #include “ModuleInterface.h” /* include non-system module */ /* …. Other modules …. */ /* …. User Program …. */

91.102 - Computing II An Example: Priority Queues. A priority queue is a finite collection of items of the same type, each associated with a number called “the priority of the item”, for which the following operations are defined: 1) Initialization - returns an empty PQ. 2) Check for Empty - is the PQ empty or not? 3) Check for Full - is there any more room in the PQ? 4) Insert a new item into an existing (not Full) PQ. 5) If the PQ is not Empty, remove from it an item X of highest priority.

91.102 - Computing II Notice that we have said NOTHING YET about implementation: the Priority Queue is an ABSTRACT DATA TYPE. The implementor will have to decide on implementation details. A good “user interface” (Module Interface, sometimes also known as API - Application Programmer Interface) - and a good design - will make the implementation details invisible to the user.

91.102 - Computing II What are Priority Queues used for? Every time you run a task on a computer, the task ends up on some kind of priority queue. The Operating System manages a number of such queues, to provide all kinds of services: printing your output, scheduling your task to be run, saving your files to disk, etc.. Any time you attempt to use a “shared resource”, you end up on some kind of PQ, waiting for your turn… So they are important and they are very common.

91.102 - Computing II Remember the list functions: Head, Tail and Cons? They allowed us to construct a list starting with an empty list and items from some universe. The construction added one item at a time. We could find the item at head of the list by just applying the function Head to the list; we could find the remainder of the list by applying the function Tail. In particular: Tail(Cons(&info, L)) = L Head(Cons(&info, L)) = &info And Cons(Head(L), Tail(L)) returns a list with the same contents as L.

91.102 - Computing II We could, by analogy, introduce functions PQCons, PQHead and PQTail. What should they do? Let PQ be a variable pointing to the Priority Queue PQHead(PQ) = &highestPriorityItemInPQ PQTail(PQ) = &aPriorityQueue which contains all the items in PQ EXCEPT FOR the highest priority one. Cons(&info, PQ) = PQ', (the address of) a new priority queue, containing all the old elements plus the new one.

91.102 - Computing II The main thing to observe is that the order of insertion is unrelated to the order of extraction. The order of extraction depends on a property of the information field which we call the PRIORITY.

91.102 - Computing II You may observe that a LIST - as defined by Head, Tail and Cons - is just a priority queue in which the latest item inserted has priority higher than that of any item already in the priority queue... There are a number of reasons why this is NOT the most convenient way to look at Priority Queues - we will now turn to a slightly more conventional approach. We also interested in constructing MODULES rather than just showing how to write a few functions...

91.102 - Computing II An immediate problem is: who decides the type of the objects that will make up the priority queue? It should be obvious from the previous discussions that NO module designer can cover ALL the possible types of objects that could be put into a priority queue - i.e., for which the notion of priority could make sense. This decision must be made by the application programmer who is the module user: only she can know what she is trying to prioritize… PQInterface.h needs to contain an “include directive” to provide the necessary type definitions : this is true for C - it need not be true for other (even strongly typed) languages.

91.102 - Computing II // PQInterface.h #include “PQUserTypes.h” // defines PQItem // See next two slides for choices for here... extern void PQInitialize(PriorityQueue *); //init empty extern bool PQEmpty(PriorityQueue *); // check if empty extern bool PQFull(PriorityQueue *); // check if full extern int PQSize(PriorityQueue *); // how many items extern void PQInsert(PQItem, PriorityQueue *);// insert extern PQItem PQRemove(PriorityQueue *);//remove highest This tells the user what the “user interface” is: the functions, the types of objects expected as function parameters, and the types of objects returned by the functions.

91.102 - Computing II PQInterface.h- Linked List Implementation. C needs to know these details - some other languages can hide them. typedef struct PQNodeTag { PQItem NodeItem; struct PQNodeTag *Link; } PQListNode; typedef struct { int Count; PQListNode *ItemList; } PriorityQueue;

91.102 - Computing II PQInterface.h- Array Implementation. typedef PQItem PQArray[MAXCOUNT]; typedef struct { int Count; PQArray ItemArray; } PriorityQueue;

91.102 - Computing II PQUserTypes.h- User decision on type of objects in Priority Queue and maximum size of queue expected.. #define MAXCOUNT 10 // Just 10? typedef int PQItem; // User defined /* this is where we stop the explicit user knowledge*/ Although YOU don’t need to know any more, your program - actually the compiler compiling your program - does. This is why more detail about the representation is available in the other header file (interface part of the module).

91.102 - Computing II Unfortunately, all this makes it impossible to “compile once - use many times”. The whole module will have to be recompiled every time you change the type of object managed by the priority queue. It can also manage only one single type of object: you could not have Priority Queues of different types of objects using functions with exactly the same names. When the object managed is fixed (e.g., in the strings module, the stdio module, etc.), then it is possible to “compile once - use many times”. In other language environments the ”single type restriction” need not hold.

91.102 - Computing II Example of use: sorting an array. // Define the types typedef int PQItem; typedef PQItem SortingArray[10]; // Declare the array SortingArray A; // Define the sorting function void PriorityQueueSort(SortingArray A) { int i; PriorityQueue PQ; Initialize(&PQ); for(i = 0; i < 10; ++i) PQInsert(A[i], &PQ); for(i = 9; i >= 0; --i) A[i] = PQRemove(&PQ); }

91.102 - Computing II Problem: how can we perform comparisons if we don’t know WHAT kind of items we need to compare and HOW we can compare them? The Priority Queue, AS GIVEN, does not require the definition of a user provided comparison function, and does not accept such a function as a parameter to either the Initialize, Insert or Remove functions. It appears to require such a comparison as a “built in”, which would seriously limit the generality of the Module. We leave this with the statement that it can be done within C - we won’t pursue this at this point, because it would further complicate our discussion.

91.102 - Computing II What are the trade-offs between the two implementations? 1) The linked-list implementation uses only the space it needs, while the array one must allocate all of its space at the beginning. 2) The Sorted linked list implementation is less efficient than the Unsorted array one at inserting an item. 3) The Unsorted array implementation is less efficient than the Sorted linked list one at removing an item.

91.102 - Computing II A Generalization. The idea of modularization can be extended to what is called Work Breakdown Structure : the division of a software project into subprojects, tasks, subtasks, deliverables, etc. The example given by the text is that of the design and implementation of a simple calculator.

91.102 - Computing II In this case, the decisions are “fairly simple”: there is a reasonably clear “user interface module” and a reasonably clear “computation module”. The functions of the two are easily separable. The interface BETWEEN the two can consist of strings of characters: the user interface sends the string containing an expression to the compute engine, which determines the legality of the expression, translates from character form to one suitable for arithmetic, performs the arithmetic, translates the result into character form and returns the result string, to be displayed.

91.102 - Computing II /*CalculatorModuleInterface.h */ char *Expression, *Value; extern void InitializeAndDisplayCalculator(void); extern void GetAndProcessOneEvent(void); extern int UserSubmittedAnExpression(void); extern void Display(char *); extern int UserWantsToQuit(void); extern void ShutDown(void); /*YourCalculatorModuleInterface.h */ extern char *Evaluate(char *);

91.102 - Computing II #include <stdio.h> #include “CalculatorModuleInterface.h” #include “YourCalculatorModuleInterface.h” int main(void) { InitializeAndDisplayCalculator(); do { GetAndProcessOneEvent(); if (USerSubmittedAnExpression()){ Value = Evaluate(Expression); Display(Value); } } while (!UserWantsToQuit()); ShutDown(); }

91.102 - Computing II More Ideas about Information Hiding and Modularization. We cannot "really" implement "Abstract Data Types" - since our implementing them requires we make representational decisions based on multiple considerations. A reasonable question is : How close can we get to a representation independent notation so that our "approximation" to an abstract data type is as good as we can manage?

91.102 - Computing II Example: Three Implementations (in C) of LINKED LISTS to be used for the abstract data type LIST. First Implementation: based on C pointers as links: L Item Link Item Link Item Link Second Implementation: array of Node (Info and Link) structs: 0 1 2 3 4 5 6 7 8 9 .Item .Link x1 x2 x3 x4 1 2 5 -1 L = 0

91.102 - Computing II Third Implementation: array of Info AND array of Link 0 1 2 3 4 5 6 7 8 9 Item x1 x2 x3 x4 Link 1 2 5 -1 L = 0 In the second and third implementation, the Link is just an integer used to index into the array.

91.102 - Computing II Some of these implementations involve pointer variables, some involve arrays of structs, some involve arrays of simple types. How can we design an interface that will work THE SAME WAY regardless of the underlying implementation? 1) It can’t explicitly deal with the underlying pointer variables; 2) It can’t explicitly deal with the underlying structs; 3) It can’t explicitly deal with the underlying arrays.

91.102 - Computing II 4) We assume the Item field can be managed as a struct; (early FORTRAN had no structs, so the multiple fields of a struct would have required one full array each) 5) We have to introduce a NULL that can be made meaningful in all three representations: call it null (lower case - no conflict with the built-in) and be prepared to initialize it to the correct value for each implementation.

91.102 - Computing II The textbook provides a set of functions, all ready for us. Question: how did the author get those functions? Miraculous inspiration? The author is SO experienced that he was able to figure them out by just having the problem presented? Neither alternative: pick a function that you think is fairly representative of the kind of operation you will need to support, code it in all three representations and see what THAT tells you. If that’s not enough, pick a function that will require manipulation of most of the representation features you missed on the first pass and try again. If you need several successive tries, so be it...

91.102 - Computing II A Reverse for Normal Linked Lists. Use a function where the new list is returned through the single parameter passed by reference (i.e., pass its address by value), since this is the accepted way to manage two-way communication via the parameter list. L is a NodePointer which could be either a true pointer or an integer used as an index into the array.

91.102 - Computing II L Item Link Item Link Item Link void Reverse(NodePointer *L) /* L is the address of a pointer to a Node */ { NodePointer R, N; R = null; // NULL in this case while(*L != null) { // there is something N = *L; // save it *L = (*L)->Link; // find the next one N->Link = R; // re-link the saved one R = N; // new head of part-reversed list } *L = R; // new head of reversed list }

91.102 - Computing II A Reverse for Linked Lists as arrays of struct. The NodePointer is just an integer; the array index of the next structure. We need something like: Node ListMemory[MAXPOINTER];// for the array of nodes

91.102 - Computing II 0 1 2 3 4 5 6 7 8 9 .Item .Link x1 x2 x3 x4 1 2 5 -1 L = 0 void Reverse(NodePointer *L) /* L is the address of a pointer to a Node */ { NodePointer R, N; R = null; // -1 in this case while(*L != null) { // there is something N = *L; // save it *L = ListMemory[*L].Link; // next one ListMemory[N].Link = R; // re-link saved one R = N; // new head of part-reversed list } *L = R; // new head of reversed list }

91.102 - Computing II A Reverse for Linked Lists as double arrays. The NodePointer is just an integer. We also need something like: ListItem Item[MAXPOINTER];// for the array of items NodePointer Link[MAXPOINTER];// for the array of links

91.102 - Computing II 0 1 2 3 4 5 6 7 8 9 Item x1 x2 x3 x4 Link 1 2 5 -1 L = 0 void Reverse(NodePointer *L) /* L is the address of a pointer to a Node */ { NodePointer R, N; R = null; // -1 in this case while(*L != null) { // there is something N = *L; // save it *L = Link[*L]; // find the next one Link[N] = R; / re-link the saved one R = N; // new head of part-reversed list } *L = R; // new head of reversed list }

91.102 - Computing II Reverse differs in only two lines from definition to definition: Normal Linked Lists: *L = (*L)->Link; // get the next one N->Link = R; // re-link the saved one Array of Nodes: *L = ListMemory[*L].Link; // get the next one ListMemory[N].Link = R; // re-link the saved one Double Array: *L = Link[*L]; // get the next one Link[N] = R; // re-link the saved one

91.102 - Computing II They differ in the syntax for GETTING the value of the link and for SETTING the value of a link. Those two operations are candidates for “hiding”: introduce intermediate functions that hide the details. Getting the Link: NodePointer GetLink(NodePointer N) { return(N->Link); } // normal linked lists NodePointer GetLink(NodePointer N) { return(ListMemory[N].Link);} // arrays of struct NodePointer GetLink(NodePointer N) { return(Link[N]); } // double arrays

91.102 - Computing II Setting the Link: void SetLink(NodePointer N, NodePointer L) { N->Link = L; } /* normal linked lists */ void SetLink(NodePointer N , NodePointer L) { ListMemory[N].Link = L;} /* arrays of struct */ void SetLink(NodePointer N , NodePointer L) { Link[N] = L; } /* double arrays */

91.102 - Computing II This leads us to the Reverse function: void Reverse(NodePointer *L) /* L is the address of a pointer to a Node */ { NodePointer R, N; R = null; // -1 in this case while(*L != null) { // there is something N = *L; // save it *L = GetLink(*L); // find the next one SetLink(N, R); // re-link the saved one R = N; // new head of part-reversed list } *L = R; // new head of reversed list }

91.102 - Computing II Another area of potential problems is in the allocation and deallocation of nodes. In the Normal Linked List version we could define (in preparation for the fact that ALL implementations will need the same function calls): void AllocateNewNode(NodePointer *N); { *N = (NodePointer)malloc(sizeof(Node)); } void FreeNode(NodePointer N) { free(N); } /* no safety - YOU set to null */ Where the runtime environment takes care of managing space...

91.102 - Computing II When lists are implemented via arrays, we must keep track of which array elements are in use and which are free. NodePointer Avail;// points to the head of the FREE LIST void AllocateNewNode(NodePointer *N); { *N = Avail; /* for arrays of struct */ Avail = ListMemory[Avail].Link; } void AllocateNewNode(NodePointer *N); { *N = Avail; /* for double arrays */ Avail = Link[Avail]; }

91.102 - Computing II Unfortunately, this requires an initialization: Avail = 0; for(i = 0; i < MAXPOINTER - 1; i++) ListMemory[i].Link = i + 1; ListMemory[MAXPOINTER - 1].Link = null; Or for(i = 0; i < MAXPOINTER - 1; i++) Link [i] = i + 1; Link [MAXPOINTER - 1] = null;

91.102 - Computing II Avail = 0; /* and all the nodes are empty */ 0 1 2 3 4 5 6 7 8 9 .Item .Link 1 2 3 4 5 6 7 8 9 null 0 1 2 3 4 5 6 7 8 9 Item Link 1 2 3 4 5 6 7 8 9 null Calls to AllocateNewNode will simply return the first free node.

91.102 - Computing II