Thursday, October 09, 2014

Using Flume - Flexible, Scalable, and Reliable Data Streaming by Hari Shreedharan; O'Reilly Media

Hadoop is an open-source software framework for storing and processing large data sets on clusters of commodity hardware. But how do you deliver logs to Hadoop HDFS? Apache Flume is an open-source tool that integrates with HDFS and HBase, and it is a good choice for collecting log data in real time from front-end servers or logging systems.
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It uses a simple data model: Source => Channel => Sink
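To make the Source => Channel => Sink idea concrete, here is a minimal Python sketch of the flow. It is only a toy model and not the Flume API: the names Event, MemoryChannel, run_source and run_sink are hypothetical, and a plain list stands in for HDFS or HBase.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """Toy event (hypothetical): a byte body plus optional headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class MemoryChannel:
    """Toy channel (hypothetical): a bounded FIFO buffer held in memory."""

    def __init__(self, capacity: int = 100):
        self.capacity = capacity
        self._queue = deque()

    def put(self, event: Event) -> None:
        # Refuse new events once the buffer is full.
        if len(self._queue) >= self.capacity:
            raise OverflowError("channel is full")
        self._queue.append(event)

    def take(self):
        # Return the oldest event, or None when the channel is empty.
        return self._queue.popleft() if self._queue else None


def run_source(lines, channel):
    """Source: wrap each incoming log line in an event and put it on the channel."""
    for line in lines:
        channel.put(Event(body=line.encode("utf-8")))


def run_sink(channel, destination):
    """Sink: drain the channel and deliver each event body to the destination."""
    while (event := channel.take()) is not None:
        destination.append(event.body.decode("utf-8"))


channel = MemoryChannel(capacity=10)
delivered = []  # assumption: stands in for HDFS or HBase
run_source(["GET /index.html 200", "GET /missing 404"], channel)
run_sink(channel, delivered)
print(delivered)
```

The source and the sink never talk to each other directly. They only share the channel, which is why either side can be swapped without touching the other.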
It's a good time to introduce a good book about Flume: Using Flume - Flexible, Scalable, and Reliable Data Streaming by Hari Shreedharan (@harisr1234). It has 8 chapters, covering the basics of Apache Hadoop and Apache HBase, the idea of streaming data with Apache Flume, the Flume model (Sources, Channels, Sinks), and more on Interceptors, Channel Selectors, Sink Groups, and Sink Processors. In addition, there are chapters on Getting Data into Flume and on Planning, Deploying, and Monitoring Flume.

This book is about how to use Flume. It gives a very good introduction to Apache Hadoop and Apache HBase before it gets to the Flume data flow model. Readers should know some Java, because the examples in the book are written in Java and are easy to understand. It's a good book for people who want to deploy Apache Flume and write custom components.
The author has split the chapters along the Flume data flow model, so readers can pick the chapter for the part of the model they care about: a reader who wants to know about sinks can read only Chapter 5 to get the idea. In addition, Flume has a lot of features, and readers will find examples for them in the book. Each chapter has a references section that readers can use to find out more, and it is very easy and quick to use in the ebook.
The illustrations in the book help readers see the big picture of using Flume and give ideas for developing it further in their own systems or projects.
So, readers will be able to learn about operations and how to configure, deploy, and monitor a Flume cluster, and can customize the examples to develop Flume plugins and custom components for their specific use cases.
  • Learn how Flume provides a steady rate of flow by acting as a buffer between data producers and consumers
  • Dive into key Flume components, including sources that accept data and sinks that write and deliver it
  • Write custom plugins to customize the way Flume receives, modifies, formats, and writes data
  • Explore APIs for sending data to Flume agents from your own applications
  • Plan and deploy Flume in a scalable and flexible way—and monitor your cluster once it’s running
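The first point in the list, Flume acting as a buffer between data producers and consumers, can be illustrated with a small simulation. This is only a sketch under my own assumptions, not Flume code: simulate, arrivals, drain_rate and capacity are hypothetical names, and time is modelled as simple ticks.

```python
from collections import deque


def simulate(arrivals, drain_rate, capacity):
    """Simulate a bounded buffer between a bursty producer and a steady consumer.

    arrivals   -- number of events produced at each tick (may be bursty)
    drain_rate -- maximum number of events the consumer takes per tick
    capacity   -- maximum number of events the buffer can hold
    Returns (events delivered per tick, buffer depth after each tick).
    """
    buffer = deque()
    delivered_per_tick = []
    depths = []
    for tick, count in enumerate(arrivals):
        # Producer side: a burst of events arrives.
        for i in range(count):
            if len(buffer) >= capacity:
                raise OverflowError(f"buffer full at tick {tick}")
            buffer.append((tick, i))
        # Consumer side: take at most drain_rate events per tick.
        taken = min(drain_rate, len(buffer))
        for _ in range(taken):
            buffer.popleft()
        delivered_per_tick.append(taken)
        depths.append(len(buffer))
    return delivered_per_tick, depths


delivered, depths = simulate(arrivals=[5, 0, 0, 4, 0, 0], drain_rate=2, capacity=10)
print(delivered)  # [2, 2, 1, 2, 2, 0]
print(depths)     # [3, 1, 0, 2, 0, 0]
```

The producer sends bursts of 5 and 4 events, but the consumer never sees more than 2 per tick, because the buffer absorbs the difference and drains during the quiet ticks.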
Book: Using Flume - Flexible, Scalable, and Reliable Data Streaming
Author: Hari Shreedharan
