
Parallel Programming



Presentation Transcript


  1. Parallel Programming ——Computing 2.0 By Barry.Cswords

  2. The Current State of Parallel Computing Technology

  3. LISP and List Processing
  • First, a quick look at LISP syntax
  • The MAP function
  • The FOLD function
  • Scheme (a dialect of LISP) implements these recursively, but in practice they can all be parallelized.

    (define factorial
      (lambda (n)
        (if (= n 1)
            1
            (* n (factorial (- n 1))))))
    (factorial 6)                             720

    (map (lambda (x) (* x x)) '(1 2 3 4 5))   '(1 4 9 16 25)
    (fold + 0 '(1 2 3 4 5))                   15
    (fold * 1 '(1 2 3 4 5))                   120
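The Scheme map and fold above have direct analogues in Java streams. A minimal sketch (the class and method names here are my own, not from the deck) reproducing the same three results:

```java
import java.util.List;
import java.util.function.IntBinaryOperator;

public class MapFoldDemo {
    // map: square each element, like (map (lambda (x) (* x x)) '(1 2 3 4 5))
    static List<Integer> squares(List<Integer> xs) {
        return xs.stream().map(x -> x * x).toList();
    }

    // fold: like (fold + 0 '(1 2 3 4 5)) and (fold * 1 '(1 2 3 4 5))
    static int fold(List<Integer> xs, int init, IntBinaryOperator op) {
        return xs.stream().mapToInt(Integer::intValue).reduce(init, op);
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4, 5);
        System.out.println(squares(xs));                  // [1, 4, 9, 16, 25]
        System.out.println(fold(xs, 0, Integer::sum));    // 15
        System.out.println(fold(xs, 1, (a, b) -> a * b)); // 120
    }
}
```

Because map carries no state across elements, the same pipeline can run in parallel by swapping stream() for parallelStream(), which is exactly the point the slide is making about the Scheme versions.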

  4. Map-Reduce (Hadoop)

    package org.myorg;

    import java.io.IOException;
    import java.util.*;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class WordCount {
      public static void main(String[] args) throws Exception {
        // Create the job from this class and give it a name
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        // Declare the output key/value types
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        // Declare the Map-Reduce algorithm classes
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        // Declare the input/output formats
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        // Set the input and output paths
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
      }
      // (the Map and Reduce inner classes follow on the next slide)

  (Counts the occurrences of each distinct word in a document)
  • The C++ version is somewhat more concise
  • A pipeline context ties Map and Reduce together
  • Factories handle I/O and invocation

  5. Map-Reduce (Hadoop), continued

      public static class Map extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1); // wraps the count
        private Text word = new Text(); // scratch space for the current word
        public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
          String line = value.toString();
          StringTokenizer tokenizer = new StringTokenizer(line); // split the text into words
          while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one); // emit (word, 1): the word is the key, 1 the value
          }
        }
      }

      public static class Reduce extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
          int sum = 0;
          while (values.hasNext()) {
            sum += values.next().get(); // add one for each occurrence of this word (key)
          }
          output.collect(key, new IntWritable(sum)); // emit the word and its count
        }
      }
    }
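What these two slides of Hadoop code compute can be condensed into a single-process sketch (my own illustration, not part of the deck): map emits (word, 1) for every token, and reduce sums the ones per key.

```java
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // The Hadoop job above, collapsed into one process: the Mapper's
    // tokenize-and-emit loop and the Reducer's per-key sum.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tok = new StringTokenizer(line); // same tokenizer as the Mapper
            while (tok.hasMoreTokens()) {
                counts.merge(tok.nextToken(), 1, Integer::sum); // the Reducer's sum step
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("hello world", "hello hadoop")));
        // {hadoop=1, hello=2, world=1}
    }
}
```

The framework's contribution is everything this sketch omits: splitting the input across machines, shuffling all pairs with the same key to one reducer, and handling failures.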

  6. PIG Latin (Hadoop)
  • A feel for PIG's style (finding the most engaging pages)
  • PIG expressing Map-Reduce (schematic, no real meaning)
  • PIG itself only handles scheduling, but it does achieve a degree of loose coupling

    Visits = load '/data/visits' as (user, url, time);
    Visits = foreach Visits generate user, Canonicalize(url), time;
    Pages = load '/data/pages' as (url, pagerank);
    VP = join Visits by url, Pages by url;
    UserVisits = group VP by user;
    Sessions = foreach UserVisits generate flatten(FindSessions(*));
    HappyEndings = filter Sessions by BestIsLast(*);
    store HappyEndings into '/data/happy_endings';

    a = FOREACH input GENERATE flatten(Map(*));
    b = GROUP a BY $0;
    c = FOREACH b GENERATE Reduce(*);

  7. Actor Architecture

    package app.quickstart.hello;

    import aa.core.Actor;
    import aa.core.ActorName;
    import aa.core.CreateActorException;

    public class Hello extends Actor {
      public Hello() {
        System.out.print(" Hello");
        try {
          ActorName anWorld = create("app.quickstart.hello.World");
          send(anWorld, "world"); // invokes the World actor's world() method by message
        } catch (CreateActorException e) {
          System.out.println("> Hello.Hello: " + e);
        }
      }
    }

    package app.quickstart.hello;

    import aa.core.Actor;

    public class World extends Actor {
      public World() {}
      public void world() { // the method the "world" message invokes
        System.out.println(" World!");
      }
    }

  Internal addressing of fixed and mobile actors:
  • UAN: Universal Actor Name, e.g. uan://128.174.245.49:37
  • LAN: Location-based Actor Name, e.g. lan://128.174.244.147//128.174.245.49:37
  This is a communication architecture worth learning from as a whole.
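The Hello/World exchange rests on asynchronous message sends into a mailbox. Purely as an illustration of that idea, not of AA's actual API, an actor can be sketched as a thread draining a BlockingQueue:

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// A minimal mailbox actor (my own sketch): the sender never blocks, and the
// actor's thread processes messages one at a time, in arrival order.
public class MiniActor {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private final List<String> log = new CopyOnWriteArrayList<>();
    private final Thread loop;

    public MiniActor() {
        loop = new Thread(() -> {
            try {
                while (true) {
                    String msg = mailbox.take();    // block until a message arrives
                    if (msg.equals("stop")) return; // poison pill ends the actor
                    log.add("Hello " + msg + "!");  // the "behavior" for each message
                }
            } catch (InterruptedException ignored) { }
        });
        loop.start();
    }

    public void send(String msg) { mailbox.offer(msg); } // asynchronous send

    public List<String> stopAndDrain() {
        send("stop");
        try { loop.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        return log;
    }

    public static void main(String[] args) {
        MiniActor a = new MiniActor();
        a.send("World");
        System.out.println(a.stopAndDrain()); // [Hello World!]
    }
}
```

Because all state is touched only by the actor's own thread, no locks appear in the behavior itself; that is the design property the slide's architecture generalizes.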

  8. A MapReduce Implemented with Threads

    // by qinxiaoyu@ceclipse.org
    package org.ceclipse.cswords.mapreduce;

    import java.util.Iterator;
    import java.util.List;

    public abstract class MapReducer<I, O> extends Thread {
      protected List<O> output = null; // pre-sized result list, one slot per input
      protected Iterator<I> input = null;
      protected int received = 0;

      public MapReducer(Iterator<I> in, List<O> out) {
        this.input = in;
        this.output = out;
      }

      public void run() {
        int i = 0;
        I v = null;
        while (this.input.hasNext()) {
          v = input.next();
          Reducer r = this.map(v); // spawn one Reducer per input element
          r.index = i;
          r.start();
          i++;
        }
      }

      public abstract Reducer map(I in);
      public abstract void finished();

      public abstract class Reducer extends Thread {
        protected I input = null;
        protected int index = 0;

        public Reducer(I in) {
          this.input = in;
        }

        public void run() {
          output.set(this.index, this.reduce(this.input)); // write the result into its slot
          synchronized (MapReducer.this) { // the original's unsynchronized counter could miss the last result
            received++;
            if (received == output.size()) finished(); // all results in: signal completion
          }
        }

        public abstract O reduce(I in);
      }
    }

  • Inefficient for simple operations, but a clear win for expensive ones
  • Computing 40 Fibonacci numbers (timings fluctuate):
  • This algorithm: 3656 ms
  • A plain for-each loop: 5562 ms

  9. An Actor-Model Map-Reduce (Erlang)
  (Computing the Fibonacci sequence)

    -module(pmap).
    -export([pmap/2]).

    pmap(F, L) ->
        S = self(),
        Pids = lists:map(fun(I) ->
            spawn(fun() -> do_fun(S, F, I) end)
        end, L),
        gather(Pids).

    gather([FirstPID|OtherPIDs]) ->
        receive
            {FirstPID, Result} -> [Result|gather(OtherPIDs)]
        end;
    gather([]) ->
        [].

    do_fun(Parent, F, I) ->
        Parent ! {self(), (catch F(I))}.

    -module(fib).
    -export([fib/1]).

    fib(0) -> 0;
    fib(1) -> 1;
    fib(N) when N > 1 -> fib(N-1) + fib(N-2).

    Eshell> L = lists:seq(0, 35).
    Eshell> lists:map(fun(X) -> fib:fib(X) end, L).
    Eshell> pmap:pmap(fun(X) -> fib:fib(X) end, L).

  Note: the code above is reusable.
  • pmap spawns the worker processes and builds the list of pids;
  • within each process, do_fun sends its own pid and the function's result back to the parent (the Reduce step);
  • gather recursively replaces each pid with the result that child sent back.
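For comparison with the deck's Java material, the Erlang pmap pattern (spawn one worker per element, then gather results in input order) can be sketched with an ExecutorService; the class and method names here are my own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.stream.IntStream;

public class PMap {
    static long fib(int n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

    // pmap: run f on each element in its own task ("spawn"), then collect
    // the results in input order ("gather"), mirroring the Erlang module.
    static <I, O> List<O> pmap(Function<I, O> f, List<I> xs) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Future<O>> futures = new ArrayList<>();
            for (I x : xs) futures.add(pool.submit(() -> f.apply(x))); // spawn
            List<O> out = new ArrayList<>();
            for (Future<O> fu : futures) out.add(fu.get());            // gather
            return out;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> l = IntStream.rangeClosed(0, 10).boxed().toList();
        System.out.println(pmap(PMap::fib, l));
        // [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
    }
}
```

Like gather in the Erlang version, waiting on the futures in order preserves the input order of the results, however the workers are scheduled. Erlang's processes are far cheaper than Java threads, which is why pmap can comfortably spawn one process per element.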

  10. Conclusions from the Experiments and Survey
  • Highlights of the side-by-side comparison:
    • The Hadoop stack is currently the most mature, and PIG makes large-scale data processing very convenient; but the framework is heavyweight, porting its code or embedding it in an existing solution is painful, and its semantics are unclear for concurrency that is not data-oriented.
    • FP actor model: Erlang performs excellently, with concise, explicit syntax for parallelism and communication, but the language as a whole is obscure; there is no second comparable VM, and other host languages are still immature.
  • Native Java options:
    • Actor model
      • Kilim: its scheduler does not actually achieve multi-core parallelism; it only provides messaging and fibers, so it does not help multi-core computation. Code and logic port easily, but it does not break away from thread-style thinking and does not help design.
      • The AA approach: a clean architecture with a distinctly Java flavor, comparable to Erlang; its performance is unknown, and no development methodology exists on top of it yet. The differences the actor model introduces are significant, so further encapsulation is worth attempting.
    • Thread-based Map-Reduce: inefficient, but viable for heavyweight, highly concurrent applications.
  • Practical overall options:
    • A many-core VM plus a single-machine parallelism scheme (the actor model?)
    • The full Hadoop stack
    • Start from mass storage and mass data processing (a search engine?)
    • Enough algorithms must be packaged (by a vendor?)

  11. Open Problems
  • How to encapsulate parallel scheduling algorithms
  • Semantic issues in object orientation
  • Syntactic issues in functional languages (taking Erlang as the example):
    • drop the spawn and receive keywords
    • drop obscure constructs such as pattern matching
  • The limits of domain-specific scripting languages
    • PIG Latin, and SQL over cloud databases, are restricted to storage-oriented scheduling
  • What would the new design patterns look like?
    • A subset of object orientation: message passing
    • A special case of object orientation: actor orientation
    • A complement to object orientation
  • Smaller questions
    • Do truly infinite, unordered collections exist in nature?
    • How can Round Robin optimally achieve a "greedy" goal?
    • Other questions?

  12. What We Will Need in the Near Future
  • A high-level language that
    • supports parallelism and encapsulates parallel scheduling algorithms
    • supports only functional programming, with no pronouns or variables
    • has a Java/C style
    • integrates seamlessly with popular high-level languages: the same compiler and runtime environment
  • A design pattern that
    • sits seamlessly alongside object orientation
    • does not support procedural algorithm implementations
    • can, with a few simple procedural elements, fully replace purely procedural implementations
  • An architecture that
    • is loosely coupled to the cloud-computing infrastructure
    • builds a large cloud over REST-style service protocols (HTTP)

  13. References
  • The Actor Architecture, by Myeong-Wuk Jang
  • Actor-based Programming for Scalable Concurrent Systems, by Gul Agha
  • Reflections on Middleware, Meta-Architectures, and Adaptive Distributed Monitoring, by Gul Agha
  • Cloud Computing Lecture, by Jimmy Lin
  • PIG: web-scale processing, by Christopher Olston and many others
  • Kilim: Isolation-typed actors for Java, by Sriram Srinivasan
  • MapReduce: Simplified Data Processing on Large Clusters, by Jeffrey Dean and Sanjay Ghemawat
  • among others…

  14. Thank you! Contact me: qinxiaoyu@ceclipse.org
