Thinking in Gi*(d) calculation with Map-Reduce


  1. Thinking in Gi*(d) calculation with Map-Reduce 2010-3-29

  2. Preprocessing
     • Generate Data Table
       • Divide the domain into cells and count the number of points in every cell;
       • Accumulate cells into quads;
       • Put all points into quads (I/O-intensive operation? Needs Map-Reduce?)
     • Generate Index Table: O(n^{2½})?
       • For every quad, grow its boundary step by step until it covers the whole domain.
       • At every step, find the quads that intersect the grown boundary (needs a spatial index?);
       • Store the deduplicated index items in the index table.
     • Calculation of Gi*(d)
       • Algorithm of Gi*(d) in M-R (?)
         • Use the index table to count how many neighbor quads are needed;
         • Copy the current quad to the nodes where its neighbor quads reside;
         • Run the map task to calculate Gi*(d) on all neighboring nodes;
         • Run the reduce task to calculate Gi*(d).
       • C/C++ should be used for the Gi*(d) calculation
       • A GPU may help with the calculation.
       • Hotspot cells/quads should reside in memory on most nodes
       • How can the calculation be accelerated by tuning M-R parameters or Gi*(d) algorithm parameters?

  3. Structure of Tables
     • DATA_TABLE
       • Row: Quad_id
       • Family: data
         • Count: points in Quad
         • Body
           • Point info: point1/point2/point3/…
           • Each point record: x/y/z (three floating-point numbers, 12 bytes)
     • INDEX_TABLE
       • Row: Quad_id
       • Family: border
         • XS
         • XE
         • YS
         • YE
       • Family: D
         • D1
         • D2
         • …
         • Dn
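As an illustration of the DATA_TABLE layout, the sketch below packs each point as three 32-bit floats (12 bytes) and concatenates the records into the row body. The byte order, the dictionary representation of a row and the `data:count` / `data:body` key names are assumptions made for the example; the slides do not specify them.

```python
import struct

# One point record: x, y, z as little-endian 32-bit floats (assumed byte order).
POINT = struct.Struct("<3f")

def encode_points(points):
    """Concatenate fixed-size point records into one body value."""
    return b"".join(POINT.pack(x, y, z) for x, y, z in points)

def decode_points(body):
    """Split a body value back into (x, y, z) tuples."""
    return [POINT.unpack_from(body, off) for off in range(0, len(body), POINT.size)]

def data_row(quad_id, points):
    """Build one DATA_TABLE row keyed by quad id (hypothetical dict layout)."""
    return {
        "row": quad_id,
        "data:count": len(points),
        "data:body": encode_points(points),
    }

# Usage: two points stored under one quad.
row = data_row("quad_0_0", [(1.0, 2.5, -3.0), (0.5, 0.25, 8.0)])
```

Fixed-size records keep the body easy to scan: the i-th point starts at byte offset 12 × i, and the stored count can be checked against the body length.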

  4. Storage model
     • Data distribution strategies
       • Evenly distributed across all nodes
       • Locality distributed
     • Data cache strategies
       • ??
       • ??
     • Application model
       • Batch processing of Gi*(d) (per cell/per quad)
       • Interactive processing of Gi*(d) (per point)
       • Support for different storage strategies
         • Locality distributed
         • Evenly distributed
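The two distribution strategies can be contrasted with a small sketch. It assumes a square grid of quads addressed by integer coordinates, reads "evenly distributed" as round-robin placement and "locality distributed" as placing spatially adjacent quads on the same node (here, in horizontal strips). Both readings and the function names are assumptions, since the slides do not define the strategies further.

```python
def even_node(qx, qy, quads_per_side, n_nodes):
    """Round-robin placement: neighboring quads usually land on different nodes."""
    return (qy * quads_per_side + qx) % n_nodes

def locality_node(qx, qy, quads_per_side, n_nodes):
    """Strip placement: each node holds a contiguous band of quad rows,
    so most neighbors of a quad are on the same node."""
    return qy * n_nodes // quads_per_side

# Usage: a 4x4 grid of quads spread over 2 nodes.
side, nodes = 4, 2
even = {(x, y): even_node(x, y, side, nodes) for x in range(side) for y in range(side)}
local = {(x, y): locality_node(x, y, side, nodes) for x in range(side) for y in range(side)}
```

Both placements balance the number of quads per node in this example. The difference is in the neighbor traffic: with strips, copying a quad to the nodes of its neighbors mostly stays local, while round-robin spreads hotspot load but requires more cross-node copies.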
