Apache Samza * Stream Processing at LinkedIn

Presentation Transcript


  1. Apache Samza*: Stream Processing at LinkedIn. Chris Riccomini, 11/13/2013. (* Incubating)

  2. Stream Processing?

  3. Response latency, starting at 0 ms

  4. Response latency, starting at 0 ms: RPC (synchronous)

  5. Response latency, starting at 0 ms: RPC (synchronous); later, possibly much later

  6. Response latency, starting at 0 ms: RPC (synchronous); Samza (milliseconds to minutes); later, possibly much later

  7. Newsfeed

  8. News

  9. Ad Relevance

  10. Email

  11. Search Indexing Pipeline

  12. Metrics and Monitoring

  13. Motivation

  14. Real-time Feeds • User activity • Metrics • Monitoring • Database Changes

  15. Real-time Feeds • 10+ billion writes per day • 172,000 messages per second (average) • 55+ billion messages per day to real-time consumers

  16. Stream Processing is Hard • Partitioning • State • Re-processing • Failure semantics • Joins to services or database • Non-determinism

  17. Samza Concepts & Architecture

  18. Streams Partition 0 Partition 1 Partition 2

  19-23. Streams: Partition 0 (messages 1-7), Partition 1 (messages 1-6), Partition 2 (messages 1-5)

  24. Streams: Partition 0 (messages 1-7), Partition 1 (messages 1-6), Partition 2 (messages 1-5); a "next append" marker shows where the next message is added
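The stream slides above show three partitions, each holding its own numbered sequence of messages, with new messages appended at the end. A minimal Java sketch of that model follows. The class name `PartitionedStream`, its methods, and the zero-based offsets are assumptions made for illustration (the slides number messages from 1); this is not Samza or Kafka code.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only: a stream as a fixed set of append-only partitions. */
final class PartitionedStream {
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionedStream(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Appends to the end of one partition and returns the new message's offset. */
    long append(int partition, String message) {
        List<String> log = partitions.get(partition);
        log.add(message);
        return log.size() - 1;
    }

    /** Reads the message at an offset; offsets only have meaning within one partition. */
    String read(int partition, long offset) {
        return partitions.get(partition).get((int) offset);
    }

    /** Offset that the next append to this partition will receive. */
    long nextOffset(int partition) {
        return partitions.get(partition).size();
    }

    public static void main(String[] args) {
        PartitionedStream pageViews = new PartitionedStream(3);
        long first = pageViews.append(0, "view:home");
        long second = pageViews.append(0, "view:jobs");
        long other = pageViews.append(1, "view:home");
        System.out.println("partition 0 offsets: " + first + ", " + second);
        System.out.println("partition 1 offset: " + other);
        System.out.println("next append to partition 0 gets offset " + pageViews.nextOffset(0));
    }
}
```

Each partition numbers its messages independently, which is why the same offset can appear in more than one partition.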

  25. Tasks Partition 0

  26. Tasks Partition 0 Task 1

  27-36. Tasks Partition 0

      class PageKeyViewsCounterTask implements StreamTask {
        // pageKeyViews and countStream are fields that the slide does not show.
        public void process(IncomingMessageEnvelope envelope,
                            MessageCollector collector,
                            TaskCoordinator coordinator) {
          GenericRecord record = ((GenericRecord) envelope.getMsg());
          String pageKey = record.get("page-key").toString();
          // Increment the running count for this page key and emit it.
          int newCount = pageKeyViews.get(pageKey).incrementAndGet();
          collector.send(countStream, pageKey, newCount);
        }
      }

  37. Tasks Partition 0 Task 1
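The counter task on the preceding slides can be tried without a Samza dependency. The sketch below is a hedged, self-contained Java stand-in. `Collector`, `CounterTask`, `run`, and the stream name `page-key-counts` are hypothetical names for illustration, not the Samza interfaces. The slide does not show how `pageKeyViews` is initialized, so creating a counter on first sight of a key via `computeIfAbsent` is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class PageKeyCounterSketch {
    /** Hypothetical stand-in for an output collector; not the Samza interface. */
    interface Collector {
        void send(String stream, String key, int value);
    }

    /** Keeps one running count per page key and emits the new count per message. */
    static final class CounterTask {
        private final Map<String, AtomicInteger> pageKeyViews = new HashMap<>();
        private final String countStream;

        CounterTask(String countStream) {
            this.countStream = countStream;
        }

        void process(String pageKey, Collector collector) {
            // Assumption: a counter is created the first time a key is seen.
            int newCount = pageKeyViews
                    .computeIfAbsent(pageKey, k -> new AtomicInteger())
                    .incrementAndGet();
            collector.send(countStream, pageKey, newCount);
        }
    }

    /** Feeds page keys through one task and returns what it emitted as "key=count". */
    static List<String> run(List<String> pageKeys) {
        List<String> emitted = new ArrayList<>();
        CounterTask task = new CounterTask("page-key-counts");
        for (String key : pageKeys) {
            task.process(key, (stream, k, v) -> emitted.add(k + "=" + v));
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Prints [home=1, jobs=1, home=2]
        System.out.println(run(Arrays.asList("home", "jobs", "home")));
    }
}
```

The counts live in the task's memory, so they are lost if the task restarts. That is one face of the "State" item on the "Stream Processing is Hard" slide.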

  38-43. Tasks: Page Views - Partition 0 (messages 1-4) is read by PageKeyViewsCounterTask, which writes to Output Count Stream (Partition 0, Partition 1)

  44-45. Tasks: Page Views - Partition 0 (messages 1-4) is read by PageKeyViewsCounterTask, which writes to Output Count Stream (Partition 1, Partition 0)

  46-48. Tasks: Page Views - Partition 0 (messages 1-4) is read by PageKeyViewsCounterTask, with the value 2 shown at the Checkpoint Stream, and written to the Output Count Stream; partition labels shown: Partition 1, Partition 1, Partition 0

  49. Tasks: Page Views - Partition 0 (messages 1-4) is read by PageKeyViewsCounterTask, with the value 2 shown at the Checkpoint Stream, and written to the Output Count Stream; partition labels shown: Partition 1, Partition 0, Partition 1
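The last slides show the task alongside a Checkpoint Stream. One common way to read that picture is that the task periodically records the offset it has reached, then resumes from the latest record after a restart. The Java sketch below illustrates that idea under stated assumptions. `CheckpointLog`, `runOnce`, the checkpoint interval, and the simulated crash are all hypothetical, and the slides do not spell out the resume behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CheckpointSketch {
    /** Append-only record of the last offset processed; the newest entry wins. */
    static final class CheckpointLog {
        private final List<Long> entries = new ArrayList<>();

        void write(long offset) {
            entries.add(offset);
        }

        /** Returns -1 when nothing has been checkpointed yet. */
        long latest() {
            return entries.isEmpty() ? -1L : entries.get(entries.size() - 1);
        }
    }

    /**
     * Processes messages that follow the latest checkpoint, writing a checkpoint
     * every `interval` messages, and stops after `limit` messages to simulate a
     * crash. Returns the offsets processed during this run.
     */
    static List<Long> runOnce(List<String> partition, CheckpointLog checkpoints,
                              int interval, int limit) {
        List<Long> processed = new ArrayList<>();
        long start = checkpoints.latest() + 1;
        for (long offset = start;
             offset < partition.size() && processed.size() < limit;
             offset++) {
            processed.add(offset); // stand-in for real message handling
            if (processed.size() % interval == 0) {
                checkpoints.write(offset);
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> pageViews = Arrays.asList("m0", "m1", "m2", "m3");
        CheckpointLog checkpoints = new CheckpointLog();
        // First run "crashes" after three messages; the last checkpoint is offset 1.
        System.out.println(runOnce(pageViews, checkpoints, 2, 3)); // [0, 1, 2]
        // The restart resumes after the checkpoint, so offset 2 is processed again.
        System.out.println(runOnce(pageViews, checkpoints, 2, Integer.MAX_VALUE)); // [2, 3]
    }
}
```

In this sketch a message handled after the last checkpoint but before the crash is handled twice. That at-least-once behavior relates to the "Re-processing" and "Failure semantics" items on the "Stream Processing is Hard" slide.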
