
CSEP505: Programming Languages Lecture 8: Wrap-Up Types; Start Concurrency



  1. CSEP505: Programming Languages, Lecture 8: Wrap-Up Types; Start Concurrency Dan Grossman Winter 2009

  2. Where were we • Covered the use of type variables to increase expressiveness • Generics = “for all” • Abstract types in interfaces = “there exists” • For both, type variables can be “any type” but the same type variable in the same scope must be the “same type” • Now • One more place existentials come up (implementing closures) • ML-style type inference • Some odds and ends so I’m not lying • Combining parametric and subtype polymorphism • Then: PL support for concurrency and parallelism

  3. Closures & Existentials • There’s a deep connection between ∃ and how closures are (1) used and (2) compiled • Callbacks are the canonical example:
    (* interface *)
    val onKeyEvent : (int -> unit) -> unit
    (* implementation *)
    let callbacks : (int -> unit) list ref = ref []
    let onKeyEvent f = callbacks := f :: !callbacks
    let keyPress i = List.iter (fun f -> f i) !callbacks
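A possible pair of clients (my sketch, not on the slide): each callback closes over private state of a different type, yet both have type int -> unit, which is exactly the “there exists an environment type” idea:
    let count = ref 0
    let () = onKeyEvent (fun i -> count := !count + i)              (* env : int ref *)
    let seen = ref []
    let () = onKeyEvent (fun i -> seen := string_of_int i :: !seen) (* env : string list ref *)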

  4. The connection • Key to flexibility: • Each callback can have “private fields” of different types • But each callback has type int -> unit • There exists an environment of some type • In C, we don’t have closures or existentials, so we use void* (next slide) • Clients must downcast their environment • Clients must assume the library passes back the correct environment

  5. Now in C
    /* interface */
    typedef struct { void *env; void (*f)(void*, int); } *cb_t;
    void onKeyEvent(cb_t);
    /* implementation (assuming a list library) */
    list_t callbacks = NULL;
    void onKeyEvent(cb_t cb) {
      callbacks = cons(cb, callbacks);
    }
    void keyPress(int i) {
      for (list_t lst = callbacks; lst; lst = lst->tl)
        lst->hd->f(lst->hd->env, i);
    }
    /* clients: full of casts to/from void* */

  6. The type we want • The cb_t type should be an existential (not a forall): • Client does a “pack” to make the argument for onKeyEvent • Must “show” the types match up • Library does an “unpack” in the loop • Has no choice but to pass each cb_t function pointer its own environment • See Cyclone if curious (syntax isn’t pretty though)
    /* interface using existentials (not C) */
    typedef struct { ∃α. α env; void (*f)(α, int); } *cb_t;
    void onKeyEvent(cb_t);

  7. Where are we • Done: understand subtyping • Done: understand “universal” types and “existential” types • Now: making universal types easier to use but less powerful • Type inference • Reconsider first-class polymorphism / polymorphic recursion • Polymorphic-reference problem • Combining parametric and subtype polymorphism

  8. The ML type system • Called “Algorithm W” or “Hindley-Milner inference” • In theory, inference “fills out explicit types” • Complete if finds an explicit typing whenever one exists • In practice, often merge inference and checking • An algorithm best understood by example… • Then describe the type system for which it infers types • Yes, this is backwards: how does it do it, before defining it

  9. Example #1
    let f x =
      let (y,z) = x in
      (abs y) + z
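For reference (my annotation, easy to check in a Caml toplevel): abs forces y : int, and + forces z and the result to int, so inference reports
    val f : int * int -> int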

  10. Example #2
    let rec sum lst =
      match lst with
        [] -> 0
      | hd::tl -> hd + (sum tl)
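Here (my annotation) + forces hd : int, so the element type is pinned down and nothing generalizes:
    val sum : int list -> int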

  11. Example #3
    let rec length lst =
      match lst with
        [] -> 0
      | hd::tl -> 1 + (length tl)
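This time (my annotation) nothing constrains the element type, so after inferring the body it generalizes:
    val length : 'a list -> int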

  12. Example #4
    let compose f g x = f (g x)
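The most general type (my annotation), with three independent type variables:
    val compose : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b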

  13. Example #5
    let rec funnyCount f g lst1 lst2 =
      match lst1 with
        [] -> 0
      | hd::tl -> (if (f hd) then 1 else 0) + funnyCount g f lst2 tl
    (* does not type-check:
    let useFunny = funnyCount (fun x -> x=4) not [2;4;4] [true;false]
    *)
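Why it fails (my annotation): without polymorphic recursion, the call funnyCount g f lst2 tl must use the same monomorphic type as the definition, unifying f with g and lst1 with lst2:
    val funnyCount : ('a -> bool) -> ('a -> bool) -> 'a list -> 'a list -> int
so useFunny needs 'a to be both int and bool at once.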

  14. More generally • Infer each let-binding or toplevel binding in order • Except for mutual recursion (do all at once) • Give each variable a fresh “constraint variable” • Add constraints for each subexpression • Very similar to typing rules • Circular constraints fail (so x x never typechecks) • After inferring let-body, generalize (unconstrained constraint variables become type variables) • Note: Actual implementations much more efficient than “generate big pile of constraints then solve” • (can unify eagerly)
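A hedged sketch of this process on Example #3 (my walkthrough, not from the slides):
    (* fresh constraint variables: length : 'f, lst : 'l
       matching lst against []   forces  'l = 'e list
       the branch 0              forces  the result type to be int
       the call (length tl)      forces  'f = 'e list -> int
       'e is unconstrained, so generalize: val length : 'a list -> int *)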

  15. What this infers • “Natural” limitations of this algorithm: Universal types, but • Only let-bound variables get polymorphic types • This is why let is not sugar for fun in Caml • No first-class polymorphism (all foralls all the way to the left) • No polymorphic recursion • Unnatural limitation imposed for soundness reasons we will see: • “Value restriction”: let x = e1 in e2 gives x a polymorphic type only if e1 is a value or a variable • Includes e1 being a function, but not a partial application • Caml has recently relaxed this slightly in some cases
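A quick illustration of the partial-application point (my example, checkable in a toplevel):
    let pair = List.map (fun x -> (x, x))       (* partial application: not a value *)
    (* val pair : '_a list -> ('_a * '_a) list     weak, not polymorphic *)
    let pair' l = List.map (fun x -> (x, x)) l  (* eta-expanded: a function value *)
    (* val pair' : 'a list -> ('a * 'a) list       polymorphic *)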

  16. Why? • These restrictions are usually tolerable • Polymorphic recursion makes inference undecidable • Proven in 1992 • First-class polymorphism makes inference undecidable • Proven in 1995 • Note: Type inference for ML is efficient in practice, but not in theory: a program of size n and run-time n can have a type of size O(2^(2^n)) • The value restriction is one way to prevent an unsoundness with references

  17. Given this… • Subject to these 4 limitations, inference is perfect: • It gives every expression the most general type it possibly can • Not all type systems even have most-general types • So every program that can type-check can be inferred • That is, explicit type annotations are never necessary • Exceptions are related to the “value restriction” • Make programmer specify non-polymorphic type

  18. Going beyond • “Good” extensions to ML still being considered • A case study for “what matters” for an extension: • Soundness: Does the system still have its “nice properties”? • Conservatism: Does the system still typecheck every program it used to? • Power: Does the system typecheck “a lot” of new programs? • Convenience: Does the system not require “too many” explicit annotations?

  19. Where are we • Done: understand subtyping • Done: understand “universal” types and “existential” types • Now: making universal types easier to use but less powerful • Type inference • Reconsider first-class polymorphism / polymorphic recursion • Polymorphic-reference problem • Then: Bounded parametric polymorphism • Synergistic combination of universal and subtyping • Then onto concurrency (more than enough types!)

  20. Polymorphic references • A sound type system cannot accept this program:
    let x = ref [] in
      x := 1::[];
      match !x with
        [] -> ()
      | hd::_ -> hd ^ “gotcha”
• But it would, assuming this interface:
    type ’a ref
    val ref : ’a -> ’a ref
    val ! : ’a ref -> ’a
    val := : ’a ref -> ’a -> unit

  21. Solutions • Must restrict the type system • Many ways exist: • “Value restriction”: ref [] cannot have a polymorphic type • syntactic look for ref type not enough • Let ref [] have type (∀α. α list) ref • not useful and not an ML type • Tell the type system “mutation is special” • not “just another library interface”
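What the value restriction does to the program on the previous slide (my note, checkable in a toplevel):
    let x = ref []      (* val x : '_a list ref    weak type, not yet fixed *)
    let () = x := 1::[] (* this use fixes '_a = int *)
    (* so (match !x with [] -> () | hd::_ -> hd ^ “gotcha”) is now a type error *)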

  22. Where are we • Done: understand subtyping • Done: understand “universal” types and “existential” types • Now: making universal types easier to use but less powerful • Type inference • Reconsider first-class polymorphism / polymorphic recursion • Polymorphic-reference problem • Combining parametric and subtype polymorphism

  23. Why bounded polymorphism • Could one language have τ1 ≤ τ2 and ∀α.τ? • Sure! They’re both useful and complementary • But how do they interact? • When is ∀α.τ1 ≤ ∀β.τ2? • What about bounds?
    let dblL1 x = x.l1 <- x.l1 * 2; x
• Subtyping: dblL1 : {l1=int} → {l1=int} • Can pass subtype, but result type loses a lot • Polymorphism: dblL1 : ∀α. α → α • Lose nothing, but body doesn’t type-check

  24. What bounded polymorphism • The type we want: dblL1 : ∀α≤{l1=int}. α → α • Java and C# generics have this (different syntax) • Key ideas: • A bounded polymorphic function can use subsumption as specified by the constraint • Instantiating a bounded polymorphic function must satisfy the constraint

  25. Subtyping revisited • When is ∀α≤τ1.τ2 ≤ ∀α≤τ3.τ4? • Note: already “alpha-converted” to same type variable • Sound answer: • Contravariant bounds (τ3 ≤ τ1) • Covariant bodies (τ2 ≤ τ4) • Problem: Makes subtyping undecidable (1992; surprised many) • Common workarounds: • Require invariant bounds (τ3 ≤ τ1 and τ1 ≤ τ3) • Some ad hoc approximation

  26. Onward • That’s the end of the “types part” of the course • Which wasn’t all about types • And other parts don’t totally ignore types

  27. Concurrency • PL support for concurrency a huge topic • And increasingly important (used to skip entirely) • We’ll just do explicit threads plus • Shared memory (barriers, locks, and transactions) • Synchronous message-passing (CML) • Transactions last (wrong logic, but CML is hw5) • Skipped topics • Futures • Asynchronous methods (joins, tuple-spaces, …) • Data-parallel (vector) languages • …

  28. Threads • High-level: “Communicating sequential processes” • Low-level: “Multiple stacks plus communication” • Code for a thread is in a closure (with hidden fields) and Thread.create actually spawns the thread • Most languages make the same distinction, e.g., Java: • Create a Thread object (just the code and data) • Call its start method to actually spawn the thread
    (* thread.mli; compile with -vmthread threads.cma *)
    type t (* a thread handle *)
    val create : (’a -> ’b) -> ’a -> t (* run new thread *)
    val self : unit -> t (* which thread am I? *)
    …
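A minimal use of this interface (my sketch; requires linking the threads library as above):
    let t = Thread.create (fun () -> print_endline "hello from a thread") ()
    let () = Thread.join t  (* join appears on a later slide: wait for t to finish *)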

  29. Why use threads? • Why? Any one of: • Performance (multiprocessor or mask I/O latency) • Isolation (separate errors or responsiveness) • Natural code structure (1 stack not enough) • It’s not just performance • Useful terminology not widely enough known: • Concurrency: Respond to external events in a timely fashion • Parallelism: Increase throughput via extra computational resources • The current Caml implementation doesn’t support parallelism • F# does (via the CLR) • Hard part is concurrent garbage collection

  30. Preemption • We’ll assume pre-emptive scheduling • Running thread can be stopped whenever • yield : unit -> unit is a semantic no-op (a “hint”) • Because threads may interleave arbitrarily and communicate, execution is non-deterministic • With shared memory, via reads/writes • With message passing, via shared channels

  31. A “library”? • “Threads cannot be implemented as a library” (Hans-J. Boehm, PLDI 2005) • Does not mean you need new language constructs • thread.mli, mutex.mli, condition.mli is fine • Does mean the compiler must know threads exist • (See paper for more compelling examples, e.g., C bit-fields)
    int x=0, y=0;
    void f1() { if(x) ++y; }
    void f2() { if(y) ++x; }
    /* main: run f1, f2 concurrently */
    /* can compiler implement f2 as ++x; if(!y) --x; */

  32. Communication • If threads do nothing other threads “see”, we are done • Best to do as little communication as possible • E.g., don’t mutate shared data unnecessarily, or hide mutation behind easier-to-use interfaces • One way to communicate: Shared memory • One thread writes to a ref, another reads it • Sounds nasty with pre-emptive scheduling • Hence synchronization mechanisms • Taught in O/S for historical reasons! • Fundamentally about restricting interleavings

  33. Join • “Fork-join” parallelism • Simple approach good for “farm out independent subcomputations, then merge results”
    (* suspend caller until/unless arg terminates *)
    val join : Thread.t -> unit
• Common pattern (in C syntax; Caml also simple):
    data_t data[N]; result_t results[N];
    thread_t tids[N];
    for(i=0; i < N; ++i)
      tids[i] = create(f, &data[i], &results[i]);
    for(i=0; i < N; ++i)
      join(tids[i]);
    // now use/merge results
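The same fork-join pattern in Caml (my sketch; n and the function work are hypothetical names, not from the slide):
    let results = Array.make n 0
    let tids = Array.init n (fun i ->
      Thread.create (fun i -> results.(i) <- work i) i)
    let () = Array.iter Thread.join tids
    (* now use/merge results *)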

  34. Locks (a.k.a. mutexes) • Caml locks do not have two common features: • Reentrancy (changes semantics of lock) • Banning non-holder release (changes semantics of unlock) • Also want condition variables (see condition.mli) • also known as wait/notify or wait/pulse
    (* mutex.mli *)
    type t (* a mutex *)
    val create : unit -> t
    val lock : t -> unit (* may block *)
    val unlock : t -> unit
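Minimal use of this interface (my sketch): serialize updates to a shared counter:
    let m = Mutex.create ()
    let count = ref 0
    let bump () =
      Mutex.lock m;
      count := !count + 1;  (* no other holder of m can interleave here *)
      Mutex.unlock m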

  35. Using locks • Among infinite correct idioms using locks (and more incorrect ones), the most common: • Determine what data must be “kept in sync” • Always acquire a lock before accessing that data and release it afterwards • Have a partial order on all locks and if a thread holds m1 it can acquire m2 only if m1 < m2 • Coarser locking (more data with same lock) trades off parallelism with synchronization • Related performance bug: false sharing • In general, think about “the object-to-lock mapping”

  36. Example
    type acct = { lk : Mutex.t; bal : float ref; avail : float ref }
    let mkAcct() = { lk = Mutex.create(); bal = ref 0.0; avail = ref 0.0 }
    let get a f = (* return type unit *)
      Mutex.lock a.lk;
      (if !(a.avail) > f
       then (a.bal := !(a.bal) -. f; a.avail := !(a.avail) -. f));
      Mutex.unlock a.lk
    let put a f = (* return type unit *)
      Mutex.lock a.lk;
      a.bal := !(a.bal) +. f;
      a.avail := !(a.avail) +. (if f < 500. then f else 500.);
      Mutex.unlock a.lk

  37. Getting it wrong • Races result from too little synchronization • Data races: simultaneous read-write or write-write of same data • Lots of PL work in last 10 years on types and tools to prevent/detect • Provided the language has some guarantees (not C++), may not be a bug • Canonical example: parallel search and “done” bits • Higher-level races much tougher for the PL to help • Amount of non-determinism is problem-specific • Deadlock results from too much synchronization • Cycle of threads waiting for each other • Easy to detect dynamically, but then what?

  38. The evolution problem • Even if you get locking right today, tomorrow’s code change can have drastic effects • “Every bank account has its own lock” works great until you want an “atomic transfer” function • One lock at a time: race • Both locks first: deadlock with parallel untransfer • Same idea in JDK1.4 (documented in 1.5):
    synchronized append(StringBuffer sb) {
      int len = sb.length();
      if(this.count + len > this.value.length) this.expand(…);
      sb.getChars(0, len, this.value, this.count);
      …
    } // length and getChars also synchronized
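One standard mitigation (my sketch, applying the lock-ordering idea from slide 35; assumes each acct also carries a unique id field to order by, which the slide’s type lacks):
    let transfer a1 a2 f =
      let first, second = if a1.id < a2.id then a1, a2 else a2, a1 in
      Mutex.lock first.lk; Mutex.lock second.lk;
      a1.bal := !(a1.bal) -. f;
      a2.bal := !(a2.bal) +. f;
      Mutex.unlock second.lk; Mutex.unlock first.lk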

  39. Where are we • Thread creation • Communication via shared memory • Synchronization with join, locks • Message passing a la Concurrent ML • Very elegant • First done for Standard ML, but available in several functional languages • Can wrap synchronization abstractions to make new ones • In my opinion, quite under-appreciated • Back to shared memory for software transactions

  40. The basics • Send and receive return “events” immediately • Sync blocks until “the event happens” • Separating these is key in a few slides
    (* event.mli; Caml’s version of CML *)
    type ’a channel (* messages passed on channels *)
    val new_channel : unit -> ’a channel
    type ’a event (* when sync’ed on, get an ’a *)
    val send : ’a channel -> ’a -> unit event
    val receive : ’a channel -> ’a event
    val sync : ’a event -> ’a

  41. Simple version • Helper functions to define blocking sending/receiving • Message sent when 1 thread sends, another receives • One will block waiting for the other
    let sendNow ch a = sync (send ch a) (* block *)
    let recvNow ch = sync (receive ch)  (* block *)
• Note: In SML, the CML book, etc.: send = sendEvt, receive = recvEvt, sendNow = send, recvNow = recv
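A tiny rendezvous demo (my sketch; assumes the Event and Thread modules are opened as on these slides):
    let ch = new_channel ()
    let _ = Thread.create (fun () -> sendNow ch 42) ()
    let x = recvNow ch  (* blocks until the spawned thread sends; x = 42 *)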

  42. Example • Make a thread to handle changes to a bank account • mkAcct returns 2 channels for talking to the thread • More elegant/functional approach: loop-carried state
    type action = Put of float | Get of float
    type acct = action channel * float channel
    let mkAcct() =
      let inCh = new_channel() in
      let outCh = new_channel() in
      let bal = ref 0.0 in (* state *)
      let rec loop() =
        (match recvNow inCh with (* blocks *)
           Put f -> bal := !bal +. f
         | Get f -> bal := !bal -. f); (* allows overdraw *)
        sendNow outCh !bal;
        loop () in
      ignore (Thread.create loop ());
      (inCh, outCh)

  43. Example, continued • get and put functions use the channels
    let get acct f =
      let inCh, outCh = acct in
      sendNow inCh (Get f);
      recvNow outCh
    let put acct f =
      let inCh, outCh = acct in
      sendNow inCh (Put f);
      recvNow outCh
• Outside the module, don’t see threads or channels!! • Cannot break the communication protocol
    type acct
    val mkAcct : unit -> acct
    val get : acct -> float -> float
    val put : acct -> float -> float
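Client code sees only the abstraction (my sketch of a use):
    let a = mkAcct ()
    let b1 = put a 100.0  (* b1 = 100.0 *)
    let b2 = get a 30.0   (* b2 = 70.0; the server thread serialized both requests *)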

  44. Key points • We put the entire communication protocol behind an abstraction • The infinite-loop-as-server idiom works well • And naturally prevents races • Multiple requests implicitly queued by CML implementation • Don’t think of threads like you’re used to • “Very lightweight” • Asynchronous = spawn a thread to do synchronous • System should easily support 100,000 threads • Cost about as much space as an object plus “current stack” • Quite similar to “actors” in OOP • Cost no time when blocked on a channel • Real example: A GUI where each widget is a thread

  45. Simpler example • A stream is an infinite set of values • Don’t compute them until asked • Again we could hide the channels and thread
    let squares = new_channel()
    let rec loop i =
      sendNow squares (i*i);
      loop (i+1)
    let _ = create loop 1
    let one = recvNow squares
    let four = recvNow squares
    let nine = recvNow squares
    …

  46. So far • sendNow and recvNow allow synchronous message passing • Abstraction lets us hide concurrency behind interfaces • But these block until the rendezvous, which is insufficient for many important communication patterns • Example: add : int channel -> int channel -> int • Must choose which to receive first; hurting performance or causing deadlock if the other is ready earlier • Example: or : bool channel -> bool channel -> bool • Cannot short-circuit • This is why we split out sync and have other primitives

  47. The cool stuff
    type ’a event (* when sync’ed on, get an ’a *)
    val send : ’a channel -> ’a -> unit event
    val receive : ’a channel -> ’a event
    val sync : ’a event -> ’a
    val choose : ’a event list -> ’a event
    val wrap : ’a event -> (’a -> ’b) -> ’b event
• choose: when synchronized on, block until one of the events occurs • wrap: An event with the function as post-processing • Can wrap as many times as you want • Note: Skipping a couple other key primitives (e.g., for timeouts)
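A small combinator built from these (my sketch): an event that commits to whichever of two channels is ready first, tagging the source:
    let either c1 c2 =
      choose [ wrap (receive c1) (fun x -> `Left x);
               wrap (receive c2) (fun x -> `Right x) ]
    (* sync (either c1 c2) blocks until exactly one communication happens *)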

  48. “And from or” • Choose seems great for “until one happens” • But a little coding trick gets you “until all happen” • Code below returns answer on a third channel
    let add in1 in2 out =
      let ans = sync (choose [
        wrap (receive in1) (fun i -> sync (receive in2) + i);
        wrap (receive in2) (fun i -> sync (receive in1) + i)]) in
      sync (send out ans)

  49. Another example • Not blocking in the case of inclusive or takes some more work • Spawn a thread to receive the second input (and ignore it)
    let orE in1 in2 out = (* “or” itself is a Caml keyword *)
      let ans = sync (choose [
        wrap (receive in1) (fun b -> b || sync (receive in2));
        wrap (receive in2) (fun b -> b || sync (receive in1))]) in
      sync (send out ans)

  50. Circuits • If you’re an electrical engineer: • send and receive are ends of a gate • wrap is combinational logic connected to a gate • choose is a multiplexer (no control over which) • So after you wire something up, you sync to say “wait for communication from the outside” • And the abstract interfaces are related to composing circuits • If you’re a UNIX hacker: • UNIX select is “sync of choose” • A pain that they can’t be separated
