Apache NiFi

This presentation gives an overview of the Apache NiFi project. I had intended to examine the Registry specifically, but found that there was more to say about NiFi itself. It also covers the Registry project, as well as extensions and a proposed registry for them.

Links for further information and connecting:

http://www.semtech-solutions.co.nz

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/

https://nz.linkedin.com/pub/mike-frampton/20/630/385

semtechs

Presentation Transcript


  1. What Is Apache NiFi?
     ● A data flow automation system, maintained as an Apache Software Foundation project with commercial support from Cloudera
     ● Written in Java
     ● Open source / Apache 2 License
     ● Cluster based and scalable
     ● Has a web-based user interface
     ● Highly extensible
     ● Offers data flow monitoring

  2. How Does NiFi Work?
     ● NiFi runs in a JVM on each server in the cluster
     ● Uses ZooKeeper for configuration/coordination
       – One node acts as the Cluster Coordinator
       – One node acts as the primary node
     ● The JVM encapsulates
       – Web server
       – Processors / Extensions
       – Repositories for FlowFile / Content / Data Provenance

  3. NiFi Architecture

  4. NiFi Architecture
     ● Web Server for monitoring and administration
     ● Flow Controller manages extensions and resources
     ● FlowFile processors 1 .. N
       – The actual data flow workers
       – Each processor supports the NiFi data flow
     ● Extensions allow remote system connectivity
       – Can be user defined
     ● FlowFile Repo: tracks and maintains current flows
     ● Content Repo: maintains data in transit
     ● Provenance Repo: historic data flow information
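The split between the three repositories can be illustrated with a small in-memory toy model. This is only a sketch, not NiFi code: `ToyRepositories`, `FlowFile`, the `claim-N` naming and the event tuples are all hypothetical names chosen for illustration. It assumes that a FlowFile record holds attributes plus a pointer into the content store, that changed content is written to a new claim instead of being overwritten, and that every step appends a history event.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple
import uuid


@dataclass
class FlowFile:
    """Metadata record: attributes plus a pointer (claim) into the content store."""
    attributes: Dict[str, str]
    content_claim: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


class ToyRepositories:
    """Hypothetical in-memory stand-ins for the three repositories."""

    def __init__(self) -> None:
        self.flowfile_repo: Dict[str, FlowFile] = {}  # state of FlowFiles currently in the flow
        self.content_repo: Dict[str, bytes] = {}      # the actual bytes in transit
        self.provenance_repo: List[Tuple[str, str, str]] = []  # append-only event history

    def _write_content(self, data: bytes) -> str:
        # Content is only ever added in this toy, so the counter stays unique.
        claim = f"claim-{len(self.content_repo)}"
        self.content_repo[claim] = data
        return claim

    def create(self, data: bytes, attributes: Dict[str, str]) -> FlowFile:
        claim = self._write_content(data)
        ff = FlowFile(attributes=dict(attributes), content_claim=claim)
        self.flowfile_repo[ff.id] = ff
        self.provenance_repo.append(("CREATE", ff.id, claim))
        return ff

    def update_content(self, ff: FlowFile, new_data: bytes) -> None:
        # Assumption: new content goes to a new claim; the old bytes stay untouched.
        ff.content_claim = self._write_content(new_data)
        self.provenance_repo.append(("CONTENT_MODIFIED", ff.id, ff.content_claim))

    def drop(self, ff: FlowFile) -> None:
        # The FlowFile leaves the flow, but its history remains queryable.
        del self.flowfile_repo[ff.id]
        self.provenance_repo.append(("DROP", ff.id, ff.content_claim))
```

In this model, dropping a FlowFile removes it from the FlowFile repo while its events stay in the provenance list, which mirrors the "current" versus "historic" distinction on the slide.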

  5. NiFi Flow Management
     ● Guaranteed data delivery
     ● Uses write-ahead logs and content repositories
     ● Queue buffering / back pressure
     ● Queue priority configuration
     ● Flow configuration (latency / throughput)
     ● UI-based data flow builds
     ● UI-based data flow monitoring
     ● UI-based data provenance
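Queue buffering, back pressure and queue priority can be sketched together with a bounded priority queue. This is a toy model under stated assumptions, not NiFi's implementation: `ToyConnection`, `back_pressure_threshold`, `offer` and `poll` are hypothetical names, and the convention that a lower number means higher priority is an assumption made for the example. In NiFi itself back pressure stops the upstream processor from being scheduled; here that is approximated by `offer` returning `False`.

```python
import heapq
from typing import Any, List, Tuple


class ToyConnection:
    """Hypothetical queue between two processors with an object-count threshold."""

    def __init__(self, back_pressure_threshold: int) -> None:
        self.threshold = back_pressure_threshold
        self._heap: List[Tuple[int, int, Any]] = []
        self._seq = 0  # tie-breaker: keeps FIFO order within one priority level

    def is_full(self) -> bool:
        return len(self._heap) >= self.threshold

    def offer(self, item: Any, priority: int = 0) -> bool:
        """Enqueue an item, or return False to signal back pressure upstream."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, self._seq, item))
        self._seq += 1
        return True

    def poll(self) -> Any:
        """Return the queued item with the lowest priority number."""
        _, _, item = heapq.heappop(self._heap)
        return item
```

A producer that receives `False` from `offer` would simply wait and retry later, which is how the queue stays bounded while the consumer catches up.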

  6. NiFi Cluster

  7. NiFi Cluster
     ● NiFi can run in cluster mode, coordinated through ZooKeeper
     ● Each node works on a different set of data
     ● ZooKeeper
       – Elects a single Cluster Coordinator node
       – Handles node failover
     ● The Cluster Coordinator manages cluster membership
     ● ZooKeeper also elects one node as the primary node

  8. NiFi Repository Storage
     ● All repository storage is pluggable
     ● Storage can be changed through user-defined implementations
     ● The default is file system storage, which can use
       – Multiple file system locations
       – Multiple physical partitions
       – RAID configurations to optimize I/O
     ● Archiving is available for the content repository
       – Deletion is automatic and configurable

  9. NiFi Extensions
     ● Extensions are packaged in NiFi Archives (NARs)
     ● Extension points include
       – Processors, Controller Services, Reporting Tasks, Prioritizers, and Custom User Interfaces
     ● See these example NARs by Frank Sauer
       – For InfluxDB access
       – For JSON transformation
       – https://github.com/fsauer65/NiFi-Extensions

  10. What Is Apache NiFi Registry?
     ● A subproject of Apache NiFi
     ● For storage and management of shared resources
     ● Across one or more instances of NiFi and/or MiNiFi
     ● Offers version control for flows
     ● Defines users, groups and policies for flows
     ● Supports Linux, Unix and Mac OS X
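The idea of version control for flows can be sketched as a store that keeps an immutable snapshot each time a flow is saved. This is a toy illustration only and does not use the NiFi Registry API: `ToyFlowRegistry` and its methods are hypothetical, and representing a flow snapshot as a plain dictionary is an assumption. The grouping of flows into named buckets follows the Registry's own terminology.

```python
import copy
from typing import Dict, List


class ToyFlowRegistry:
    """Hypothetical store: bucket name -> flow name -> list of saved snapshots."""

    def __init__(self) -> None:
        self._buckets: Dict[str, Dict[str, List[dict]]] = {}

    def save_version(self, bucket: str, flow_name: str, snapshot: dict) -> int:
        """Store a deep copy of the snapshot and return its 1-based version number."""
        versions = self._buckets.setdefault(bucket, {}).setdefault(flow_name, [])
        versions.append(copy.deepcopy(snapshot))
        return len(versions)

    def get_version(self, bucket: str, flow_name: str, version: int) -> dict:
        """Return a copy of a stored version so callers cannot alter history."""
        return copy.deepcopy(self._buckets[bucket][flow_name][version - 1])

    def latest_version(self, bucket: str, flow_name: str) -> int:
        return len(self._buckets[bucket][flow_name])
```

Because each save stores a copy, later edits to the working flow do not change earlier versions, so any NiFi or MiNiFi instance sharing the store could roll back to a known snapshot.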

  11. NiFi Extension Registry
     ● There was also an extension registry proposal in 2016
     ● Prototyped by Puspendu Banerjee
     ● Created on GitHub at https://github.com/PuspenduBanerjee/nifi/tree/NIFI-ExtRegistry
     ● Seems like a good idea
     ● A central location for extensions
     ● But no update since 2016
       – For either the proposal or the prototype

  12. Available Books
     ● See “Big Data Made Easy”
       – Apress, Jan 2015
     ● See “Mastering Apache Spark”
       – Packt, Oct 2015
     ● See “Complete Guide to Open Source Big Data Stack”
       – Apress, Jan 2018
     ● Find the author on Amazon
       – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
     ● Connect on LinkedIn
       – www.linkedin.com/in/mike-frampton-38563020

  13. Connect
     ● Feel free to connect on LinkedIn
       – www.linkedin.com/in/mike-frampton-38563020
     ● See my open source blog at
       – open-source-systems.blogspot.com/
     ● I am always interested in
       – New technology
       – Opportunities
       – Technology-based issues
       – Big data integration
