Collection-oriented programming focuses on operations over collections of values, emphasizing efficiency and functional styles. This programming paradigm encompasses various languages and tools, including APL, SQL, and Python, as well as techniques like map-reduce. Its concise coding style promotes ease of use, especially for parallel operations like reduce and scan. Despite its strengths, such as deterministic results and high-level abstraction, it faces challenges in concurrency and performance. This overview explores the advantages and limitations of collection-oriented programming within practical applications.
Functional Collection-Oriented Programming
Guy Blelloch, Carnegie Mellon University
Collection-oriented programming
• Programmer emphasis is on operations over collections of values. (Data Driven)
  • Array based: APL, Nial, FP, Matlab
  • Database: SQL, Linq
  • Scripting: SETL, Python
  • Data parallel: *Lisp, HPF, Nesl, Id, ZPL
  • Map-reduce
• All of these support some form of Map and some form of reduce.
Collection-oriented programming
• Concise code
  • Promotes a functional style of programming
  • Has become popular even without parallelism (matlab, python, sql, …)
• Parallelism
  • Map is naturally parallel
  • Many collection operations are parallel: reduce, scan, collect, flatten, transpose, …
  • Most often DETERMINISTIC
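The collection operations named on this slide can be illustrated with a minimal Python sketch. It runs sequentially and only shows what each operation computes; the variable names are illustrative and not from the slides.

```python
from functools import reduce
from itertools import accumulate
import operator

values = [3, 1, 4, 1, 5]

# map: apply a function to every element independently
squares = [x * x for x in values]

# reduce: combine all elements with an associative operator
total = reduce(operator.add, values, 0)

# scan: running (inclusive) prefix sums
prefix = list(accumulate(values))

# flatten: turn a collection of collections into one collection
nested = [[1, 2], [3], [4, 5]]
flat = [x for row in nested for x in row]
```

Because each map element is computed independently and addition is associative, a parallel runtime is free to evaluate these in any grouping and still produce the same result.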
Collection-oriented programming
• “Concurrency” (Non-deterministic environment)
  • On its own not really useful for “concurrent” applications (e.g. operating systems, or front-end of a web server).
Flat vs. Nested
Can collections contain collections? Can arbitrary functions be mapped?
• Flat languages
  • APL, SQL, Map-reduce, HPF, Matlab
• Nested languages
  • SETL, Python, Nesl, Id
I conjecture that flat CO languages will never be general purpose: not good for trees, divide-and-conquer, …
Quicksort in NESL
function quicksort(S) =
  if (#S <= 1) then S
  else let
    a = S[rand(#S)];
    S1 = {e in S | e < a};
    S2 = {e in S | e = a};
    S3 = {e in S | e > a};
    R = {quicksort(v) : v in [S1, S3]};
  in R[0] ++ S2 ++ R[1];
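For readers unfamiliar with NESL, the same quicksort can be sketched in Python, one of the nested collection-oriented languages listed earlier. This is a sequential transliteration under the assumption that list comprehensions stand in for NESL's parallel `{... : ...}` form; it does not run anything in parallel.

```python
import random

def quicksort(s):
    # Base case: zero or one element is already sorted.
    if len(s) <= 1:
        return list(s)
    a = s[random.randrange(len(s))]        # random pivot, as in S[rand(#S)]
    s1 = [e for e in s if e < a]           # three filters over the collection
    s2 = [e for e in s if e == a]
    s3 = [e for e in s if e > a]
    # Map quicksort over a nested collection [S1, S3]; in NESL this map
    # is what exposes the recursive calls as parallel work.
    r = [quicksort(v) for v in (s1, s3)]
    return r[0] + s2 + r[1]
```

The map over `[S1, S3]` is the reason nesting matters here: a flat language could not map a recursive function over two sub-collections of different lengths.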
Quicksort in X10
double[] quicksort(double[] S) {
  if (S.length < 2) return S;
  double a = S[rand(S.length)];
  double[] S1, S2, S3;
  finish {
    async { S1 = quicksort(lessThan(S,a)); }
    async { S2 = eqTo(S,a); }
    S3 = quicksort(grThan(S,a));
  }
  return append(S1, append(S2, S3));
}
Matrix Multiplication
Fun A*B {
  if #A < k then baseCase..
  A11,A12,A21,A22 = QuadSplit(A)
  B11,B12,B21,B22 = QuadSplit(B)
  Parallel {
    C11 = A11*B11 + A12*B21
    C12 = A11*B12 + A12*B22
    C21 = A21*B11 + A22*B21
    C22 = A21*B12 + A22*B22
  }
  return QuadJoin(C11,C12,C21,C22)
}
Need to be able to program for locality.
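The quadrant recursion above can be sketched in Python as follows. This is a sequential illustration only, assuming square matrices stored as lists of rows with a power-of-two dimension; `quad_split`, `quad_join`, `mat_add`, and `base_case` are hypothetical helper names standing in for the slide's `QuadSplit`, `QuadJoin`, `+`, and `baseCase`.

```python
def quad_split(m):
    # Split a square matrix into its four quadrants.
    h = len(m) // 2
    top, bottom = m[:h], m[h:]
    return ([r[:h] for r in top], [r[h:] for r in top],
            [r[:h] for r in bottom], [r[h:] for r in bottom])

def quad_join(c11, c12, c21, c22):
    # Inverse of quad_split: glue four quadrants back together.
    return ([l + r for l, r in zip(c11, c12)] +
            [l + r for l, r in zip(c21, c22)])

def mat_add(x, y):
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

def base_case(a, b):
    # Naive triple-loop product for small blocks.
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def matmul(a, b, k=2):
    # k must be at least 2 so that 1x1 blocks reach the base case.
    if len(a) < k:
        return base_case(a, b)
    a11, a12, a21, a22 = quad_split(a)
    b11, b12, b21, b22 = quad_split(b)
    # The four quadrant results are independent of one another; this is
    # the part the slide wraps in "Parallel { ... }".
    c11 = mat_add(matmul(a11, b11, k), matmul(a12, b21, k))
    c12 = mat_add(matmul(a11, b12, k), matmul(a12, b22, k))
    c21 = mat_add(matmul(a21, b11, k), matmul(a22, b21, k))
    c22 = mat_add(matmul(a21, b12, k), matmul(a22, b22, k))
    return quad_join(c11, c12, c21, c22)
```

Choosing a larger `k` makes the base case work on bigger contiguous blocks, which is one way the recursion lets the programmer reason about locality.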
Question: How general is functional CO programming?
• Advantages
  • High-level/concise
  • Natural/Intuitive
  • Deterministic Parallelism (for all partial results)
  • No need for annotations, commutativity, regions
  • No speculation
  • Simple cost model (even including locality)
• Potential Disadvantages
  • Performance
  • Major rewriting of code
  • Does not support “concurrency” on its own
Barnes Hut
function bTree(pts, box as (x0,y0,s)) =
  if #pts = 0 then EMPTY
  else if #pts = 1 then LEAF(pts[0])
  else let
    xm = x0 + s/2; ym = y0 + s/2;
  parallelLet
    T1 = bTree({(x,y,d) in pts | x<xm & y<ym}, (x0,y0,s/2));
    T2 = bTree({(x,y,d) in pts | x<xm & y>=ym}, (x0,y0+s/2,s/2));
    ..
  in NODE(cmass(T1,T2,T3,T4),box,T1,T2,T3,T4)
Barnes Hut
function force(p,LEAF(p')) = force(p,p')
  | force(p,EMPTY) = 0
  | force(p,NODE(c,box,T1,T2,T3,T4)) =
      if far(p,box) then forceC(p,c)
      else force(p,T1)+force(p,T2)+force(p,T3)+force(p,T4)
function forces(Points,T) = {move(p,force(p,T)) : p in Points};
“Algorithms in the Real World”
• Compression:
  • JPEG
*Easily expressed with no shared writeable state
^Depends on algorithm
Barnes Hut
function force(p,LEAF(p')) = force(p,p')
  | force(p,EMPTY) = 0
  | force(p,NODE(c,box,T1,T2,T3,T4)) =
      if far(p,box) then forceC(p,c)
      else force(p,T1)+force(p,T2)+force(p,T3)+force(p,T4)
function forces(Points,T) = {force(p,T) : p in Points};
Graph Connectivity
[Figure: example graph on vertices 0 to 6]
Edge List Representation:
Edges = [(0,1), (0,2), (2,3), (3,4), (3,5), (3,6), (1,3), (1,5), (5,6), (4,6)]
Graph Contraction
[Figure: the example graph on vertices 0 to 6 contracted in three steps: form stars, relabel, contract]
Graph Connectivity
[Figure: example graph on vertices 0 to 6]
Edge List Representation:
Edges = [(0,1), (0,2), (2,3), (3,4), (3,5), (3,6), (1,3), (1,5), (5,6), (4,6)]
Hooks = [(0,1), (1,3), (1,5), (3,6), (4,6)]
Graph Connectivity
L = Vertex Labels, E = Edge List
function connectivity(L, E) =
  if #E = 0 then L
  else let
    FL = {coinToss(.5) : x in [0:#L]};
    H = {(u,v) in E | FL[u] and not(FL[v])};
    L = L <- H;
    E = {(L[u],L[v]) : (u,v) in E | L[u] /= L[v]};
  in connectivity(L,E);
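The contraction loop above can be sketched in Python. This is a sequential illustration with two assumptions that are not in the slide: the recursion is written as a `while` loop, and a hypothetical helper `root` follows the label pointers at the end, since a vertex hooked in an early round keeps pointing at its star center even if that center is hooked later.

```python
import random

def connectivity(labels, edges, rng=random):
    labels = list(labels)
    while edges:
        # FL: flip a fair coin for every vertex.
        heads = [rng.random() < 0.5 for _ in labels]
        # H: an edge (u,v) becomes a hook when u is heads and v is tails.
        hooks = [(u, v) for (u, v) in edges if heads[u] and not heads[v]]
        # L <- H: write v into position u; if u has several hooks,
        # an arbitrary one (here the last) wins.
        for u, v in hooks:
            labels[u] = v
        # Relabel edges and drop those that now lie inside one star.
        edges = [(labels[u], labels[v]) for (u, v) in edges
                 if labels[u] != labels[v]]
    return labels

def root(labels, v):
    # Hypothetical helper: follow pointers to the component representative.
    while labels[v] != v:
        v = labels[v]
    return v
```

A possible use on the example edge list: call `connectivity(range(7), Edges)` and compare `root(L, u)` with `root(L, v)` to test whether two vertices are connected. Which vertex ends up as the representative depends on the coin flips; the partition into components does not.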
Conclusions/Questions
• Perhaps functional programming is adequate for most/all parallel applications.
• Collections seem to encourage a functional style even in non-functional languages.
• Gives fully deterministic results, including partial results.
Quicksort in Multilisp
(defun quicksort (L) (qs L nil))
(defun qs (L rest)
  (if (null L) rest
      (let* ((a (car L))
             (L1 (filter (lambda (b) (< b a)) (cdr L)))
             (L3 (filter (lambda (b) (>= b a)) (cdr L))))
        (qs L1 (future (cons a (qs L3 rest)))))))
(defun filter (f L)
  (if (null L) nil
      (if (f (car L))
          (future (cons (car L) (filter f (cdr L))))
          (filter f (cdr L)))))
Quicksort in Multilisp (futures)
Work = O(n log n)
Span = O(n)
Not a very good parallel algorithm
Scan code
function addscan(A) =
  if (#A <= 1) then [0]
  else let
    sums = {A[2*i] + A[2*i+1] : i in [0:#A/2]};
    evens = addscan(sums);
    odds = {evens[i] + A[2*i] : i in [0:#A/2]};
  in interleave(evens,odds);
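A Python transliteration of the scan code may help in tracing it by hand. It is a sketch that assumes the input length is a power of two (the pairing step silently drops a trailing element otherwise) and it computes an exclusive prefix sum, so position i holds the sum of all elements before i.

```python
def addscan(a):
    # Base case: the exclusive scan of a single element is [0].
    if len(a) <= 1:
        return [0]
    half = len(a) // 2
    # Add adjacent pairs, halving the problem size.
    sums = [a[2 * i] + a[2 * i + 1] for i in range(half)]
    # Recursive scan of the pair sums gives the results at even positions.
    evens = addscan(sums)
    # Odd positions need one more element added in.
    odds = [evens[i] + a[2 * i] for i in range(half)]
    # interleave(evens, odds)
    return [x for pair in zip(evens, odds) for x in pair]
```

For example, `addscan([1, 2, 3, 4])` first forms `sums = [3, 7]`, recursively scans it to `[0, 3]`, derives `odds = [1, 6]`, and interleaves to `[0, 1, 3, 6]`.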
Fourier Transform
function fft(a,w) =
  if #a == 1 then a
  else let
    r = {fft(b, w[0:#w:2]) : b in [a[0:#a:2], a[1:#a:2]]}
  in {a + b * w : a in r[0] ++ r[0]; b in r[1] ++ r[1]; w in w};
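The FFT above maps directly onto Python slicing. The sketch below assumes the input length is a power of two and that `w` holds the n complex roots of unity in order; `roots_of_unity` is a hypothetical helper added for illustration, and the sign convention in its exponent is an assumption.

```python
import cmath

def fft(a, w):
    if len(a) == 1:
        return list(a)
    # Recurse on even- and odd-indexed elements, using every second root.
    r = [fft(b, w[0::2]) for b in (a[0::2], a[1::2])]
    # Combine: r[0] and r[1] are each repeated so that they line up with
    # all n roots; the second half of w supplies the negated twiddles.
    return [x + y * z for x, y, z in zip(r[0] + r[0], r[1] + r[1], w)]

def roots_of_unity(n):
    # Hypothetical helper: w[k] = exp(-2*pi*i*k/n) for k = 0 .. n-1.
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]
```

A call such as `fft([1, 0, 0, 0], roots_of_unity(4))` yields four values that are each numerically equal to 1.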
Sparse Vector Matrix Multiply
function sparseMxV(M,v) =
  {sum({v[i]*w : (i,w) in row}) : row in M};
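In Python the same nested comprehension looks as follows. This sketch assumes each row of the sparse matrix is stored as a list of (column index, value) pairs, which is what the pattern `i,w in row` suggests.

```python
def sparse_mxv(m, v):
    # Outer map over rows, inner map over the nonzeros of each row,
    # followed by a sum (reduce) per row.
    return [sum(v[i] * w for i, w in row) for row in m]

# Rows of different lengths are fine: this is a nested, irregular collection.
matrix = [[(0, 2.0), (2, 1.0)],   # row 0: 2.0 at column 0, 1.0 at column 2
          [],                     # row 1: all zeros
          [(1, 3.0)]]             # row 2: 3.0 at column 1
result = sparse_mxv(matrix, [1.0, 2.0, 3.0])
```

Rows with different numbers of nonzeros are exactly the case that flat collection languages handle poorly and nested ones express in a single line.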
MapReduce
function mapReduce(MAP,REDUCE,documents) =
  let temp = flatten({MAP(d) : d in documents});
  in flatten({REDUCE(k,vs) : (k,vs) in collect(temp)});

function wordcount(docs) =
  mapReduce(d => {(w,1) : w in wordify(d)},
            (w,c) => [(w,sum(c))],
            docs);

wordcount(["this is is document 1",
           "this is document 2"]);
[("1",1),("this",2),("is",3),("document",2),("2",1)]
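The same structure can be sketched in Python. The helpers `collect` and `flatten` are written out here as assumptions about what the NESL primitives do (group values by key, and concatenate nested sequences); `str.split` stands in for `wordify`.

```python
from collections import defaultdict

def collect(pairs):
    # Group values by key: [(k, v), ...] -> [(k, [v, ...]), ...]
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return list(groups.items())

def flatten(nested):
    return [x for xs in nested for x in xs]

def map_reduce(map_fn, reduce_fn, documents):
    temp = flatten(map_fn(d) for d in documents)
    return flatten(reduce_fn(k, vs) for k, vs in collect(temp))

def wordcount(docs):
    return map_reduce(lambda d: [(w, 1) for w in d.split()],
                      lambda w, c: [(w, sum(c))],
                      docs)

counts = wordcount(["this is is document 1", "this is document 2"])
```

The order of the resulting pairs depends on how `collect` orders its keys, so it may differ from the order shown on the slide; the counts themselves are the same.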