Showing posts with label oreilly. Show all posts

Monday, November 17, 2014

Think Stats, 2nd Edition Exploratory Data Analysis By Allen B. Downey; O'Reilly Media

There are lots of Python data analysis books, and this is a good one for readers who want to perform statistical analysis with programs written in Python: Think Stats, 2nd Edition Exploratory Data Analysis by Allen B. Downey (@allendowney).
This second edition of Think Stats includes the chapters from the first edition, many of them substantially revised, plus new chapters on regression, time series analysis, survival analysis, and analytic methods. It uses pandas, SciPy, and StatsModels in Python. The author developed the book using Anaconda from Continuum Analytics, and readers who use it will find setup easy. Anyway, I tested the examples on Ubuntu after installing the pandas, NumPy, SciPy, StatsModels, and matplotlib packages. The book has 14 chapters that follow the processes the author applies to a dataset. It is aimed at intermediate readers: you should know how to program (the book uses Python) and have some mathematical and statistical background.
Each chapter includes exercises that readers can practice to deepen their understanding. Free Sampler
  • Develop an understanding of probability and statistics by writing and testing code.
  • Run experiments to test statistical behavior, such as generating samples from several distributions.
  • Use simulations to understand concepts that are hard to grasp mathematically.
  • Import data from most sources with Python, rather than rely on data that’s cleaned and formatted for statistics tools.
  • Use statistical inference to answer questions about real-world data.
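The simulation idea in these bullets can be tried with nothing but the standard library. Here is a minimal sketch (my own example, not from the book) that estimates the mean of an exponential distribution by sampling and compares it with the analytic answer:

```python
import random
import statistics

random.seed(42)  # make the run reproducible

# Draw 10,000 samples from an exponential distribution with rate lambda = 2.
# The analytic mean is 1/lambda = 0.5; the sample mean should land nearby.
samples = [random.expovariate(2.0) for _ in range(10_000)]

sample_mean = statistics.mean(samples)
print(round(sample_mean, 2))  # close to the analytic 0.5
```

The book builds this kind of experiment into larger analyses with pandas and SciPy; the point here is only that a few lines of code let you test statistical behavior directly.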
surachart@surachart:~/ThinkStats2/code$ pwd
/home/surachart/ThinkStats2/code
surachart@surachart:~/ThinkStats2/code$ ipython notebook  --ip=0.0.0.0 --pylab=inline &
[1] 11324
surachart@surachart:~/ThinkStats2/code$ 2014-11-17 19:39:43.201 [NotebookApp] Using existing profile dir: u'/home/surachart/.config/ipython/profile_default'
2014-11-17 19:39:43.210 [NotebookApp] Using system MathJax
2014-11-17 19:39:43.234 [NotebookApp] Serving notebooks from local directory: /home/surachart/ThinkStats2/code
2014-11-17 19:39:43.235 [NotebookApp] The IPython Notebook is running at: http://0.0.0.0:8888/
2014-11-17 19:39:43.236 [NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
2014-11-17 19:39:43.236 [NotebookApp] WARNING | No web browser found: could not locate runnable browser.
2014-11-17 19:39:56.120 [NotebookApp] Connecting to: tcp://127.0.0.1:38872
2014-11-17 19:39:56.127 [NotebookApp] Kernel started: f24554a8-539f-426e-9010-cb3aa3386613
2014-11-17 19:39:56.506 [NotebookApp] Connecting to: tcp://127.0.0.1:43369
2014-11-17 19:39:56.512 [NotebookApp] Connecting to: tcp://127.0.0.1:33239
2014-11-17 19:39:56.516 [NotebookApp] Connecting to: tcp://127.0.0.1:54395

Book: Think Stats, 2nd Edition Exploratory Data Analysis
Author: Allen B. Downey(@allendowney)

Sunday, October 26, 2014

Getting Started with Impala Interactive SQL for Apache Hadoop by John Russell; O'Reilly Media

Impala is an open source query engine that runs on Apache Hadoop. With Impala, you can query data stored in HDFS or Apache HBase (including SELECT, JOIN, and aggregate functions) in real time. If you are looking for a book to get started with it, try Getting Started with Impala Interactive SQL for Apache Hadoop by John Russell (@max_webster). It helps readers write, tune, and port SQL queries and other statements for a Big Data environment using Impala. The SQL examples in the book start from a simple base for easy comprehension, then build toward best practices that demonstrate high performance and scalability. Readers can download the QuickStart VMs, install them, and follow along with the examples in the book.
The book does not cover installing Impala or troubleshooting installation and configuration issues. It has only 5 chapters and not many pages, but that is enough to show how to use Impala (interactive SQL), and the examples are good. Chapter 5 - Tutorials and Deep Dives - is the highlight of the book, and its examples are very useful.
Free Sampler.

This book helps readers:
  • Learn how Impala integrates with a wide range of Hadoop components
  • Attain high performance and scalability for huge data sets on production clusters
  • Explore common developer tasks, such as porting code to Impala and optimizing performance
  • Use tutorials for working with billion-row tables, date- and time-based values, and other techniques
  • Learn how to transition from rigid schemas to a flexible model that evolves as needs change
  • Take a deep dive into joins and the roles of statistics
[test01:21000] > select "Surachart Opun" Name,  NOW() ;
Query: select "Surachart Opun" Name,  NOW()
+----------------+-------------------------------+
| name           | now()                         |
+----------------+-------------------------------+
| Surachart Opun | 2014-10-25 23:34:03.217635000 |
+----------------+-------------------------------+
Returned 1 row(s) in 0.14s
Author: John Russell (@max_webster)

Sunday, October 19, 2014

Learning Spark Lightning-Fast Big Data Analytics by Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia; O'Reilly Media

Apache Spark started as a research project in the AMPLab at UC Berkeley, which focuses on big data analytics. Spark is an open source cluster computing platform designed to be fast and general-purpose for data analytics; it is quick both to run and to write. Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly, much faster than with disk-based systems like Hadoop MapReduce. Users can write applications quickly in Java, Scala, or Python. In addition, it is easy to run standalone or on EC2 or Mesos, and it can read data from HDFS, HBase, Cassandra, and any Hadoop data source.
If you would like a book about Spark, try Learning Spark Lightning-Fast Big Data Analytics by Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia. It is a great book for anyone interested in Spark development or just starting with it. Readers will learn how to express MapReduce jobs with just a few simple lines of Spark code, and more:
  • Quickly dive into Spark capabilities such as collect, count, reduce, and save
  • Use one programming paradigm instead of mixing and matching tools such as Hive, Hadoop, Mahout, and S4/Storm
  • Learn how to run interactive, iterative, and incremental analyses
  • Integrate with Scala to manipulate distributed datasets like local collections
  • Tackle partitioning issues, data locality, default hash partitioning, user-defined partitioners, and custom serialization
  • Use other languages by means of pipe() to achieve the equivalent of Hadoop streaming
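To picture how few lines a MapReduce job can take, here is the classic word count expressed in plain Python rather than Spark itself. This is only a local sketch of the flatMap/map/reduceByKey chain; nothing here is actual Spark API:

```python
from collections import Counter

lines = ["to be or not to be", "to see or not to see"]

# "flatMap": split every line into words
words = [w for line in lines for w in line.split()]

# "map" + "reduceByKey": count occurrences per word
counts = Counter(words)

print(counts["to"])  # 4
```

In Spark the same shape is a handful of lines over an RDD, but it runs distributed across a cluster instead of in one process.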
The Early Release edition has 7 chapters. It explains an overview of Apache Spark, downloading Spark and the commands you should know, and programming with RDDs (plus more advanced topics) as well as working with key-value pairs, etc. It is easy to read, with good examples. For people who want to learn Apache Spark or use Spark for data analytics, it is a book worth keeping on the shelf.

Book: Learning Spark Lightning-Fast Big Data Analytics
Authors: Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia

Thursday, October 09, 2014

Using Flume - Flexible, Scalable, and Reliable Data Streaming by Hari Shreedharan; O'Reilly Media

Hadoop is an open source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. But how do you deliver logs to Hadoop HDFS? Apache Flume is an open source project that integrates with HDFS and HBase, and it is a good choice for real-time collection of log data from front-end or log systems.
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It uses a simple data model: Source => Channel => Sink
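That Source => Channel => Sink model is easy to picture as a tiny pipeline. The sketch below is plain Python, not Flume code; the class names are mine and only mirror the model, with the channel acting as the buffer between producer and consumer:

```python
from queue import Queue

class Source:
    """Accepts events from a producer and puts them on the channel."""
    def __init__(self, channel):
        self.channel = channel

    def receive(self, event):
        self.channel.put(event)

class Sink:
    """Drains events from the channel and delivers them (here: a list)."""
    def __init__(self, channel, destination):
        self.channel = channel
        self.destination = destination

    def drain(self):
        while not self.channel.empty():
            self.destination.append(self.channel.get())

channel = Queue()  # the buffer that decouples source from sink
store = []
source = Source(channel)
sink = Sink(channel, store)

source.receive("log line 1")
source.receive("log line 2")
sink.drain()
print(store)  # ['log line 1', 'log line 2']
```

In real Flume, sources, channels, and sinks are durable, configurable components; this toy only shows why the channel lets producers and consumers run at different rates.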
It is a good time to introduce a good book about Flume: Using Flume - Flexible, Scalable, and Reliable Data Streaming by Hari Shreedharan (@harisr1234). It is written in 8 chapters: the basics of Apache Hadoop and Apache HBase, the idea of streaming data using Apache Flume, the Flume model (Sources, Channels, Sinks), and some more on Interceptors, Channel Selectors, Sink Groups, and Sink Processors. Additionally, there are chapters on Getting Data into Flume and on Planning, Deploying, and Monitoring Flume.

This book is about how to use Flume. It gives good guidance on Apache Hadoop and Apache HBase before starting on the Flume data flow model. Readers should know some Java, because they will find Java code examples in the book, and those make it easy to understand. It is a good book for people who want to deploy Apache Flume and custom components.
The author devotes a chapter to each part of the Flume data flow model, so readers can pick the chapter for the part they want: a reader who wants to know about Sinks can read only Chapter 5 until the idea is clear. In addition, Flume has a lot of features, and readers will find examples for them in the book. Each chapter has a references section that readers can use to find out more, which is quick and easy to use in the ebook.
The illustrations in the book help readers see the big picture of using Flume and give ideas for developing it further in each system or project.
So, readers will be able to learn how to operate a Flume cluster - how to configure, deploy, and monitor it - and customize the examples to develop Flume plugins and custom components for their specific use cases.
  • Learn how Flume provides a steady rate of flow by acting as a buffer between data producers and consumers
  • Dive into key Flume components, including sources that accept data and sinks that write and deliver it
  • Write custom plugins to customize the way Flume receives, modifies, formats, and writes data
  • Explore APIs for sending data to Flume agents from your own applications
  • Plan and deploy Flume in a scalable and flexible way—and monitor your cluster once it’s running
Book: Using Flume - Flexible, Scalable, and Reliable Data Streaming
Author: Hari Shreedharan

Saturday, September 27, 2014

I Heart Logs - Event Data, Stream Processing, and Data Integration by Jay Kreps; O'Reilly Media

I have worked on the server side for a long time as a System Administrator, so I must spend time with logs, using them for checking and for investigating issues. Under the policies of some companies, logs must be kept for over a year, or even over ten years, so it is not unusual to look for ideas about how to store and integrate logs and do something with them.
A book titled "I Heart Logs - Event Data, Stream Processing, and Data Integration" by Jay Kreps is very interesting. I wanted to know what I could learn from it, how logs work in distributed systems, and what I could learn from an author who works at LinkedIn. It is not a long book, but it gives much more: the idea of data flow, how logs work, and why logs are worthy of a reader's attention. The book has only 4 chapters, but readers will get the concepts and ideas of data integration (making all of an organization's data easily available in all its storage and processing systems), real-time data processing (computing derived data streams), and distributed system design (how practical systems can be simplified with a log-centric design). I also like that the author wrote from his experience at LinkedIn.

After reviewing: the book links to a lot of useful information (in the ebook the links are easy to click), and readers can follow them to find out more on the Internet. For data integration, it focuses on Kafka, a distributed, partitioned, replicated commit log service that provides the functionality of a messaging system. It also explains why the Big Data Lambda Architecture is good for a batch system plus a stream processing system, and makes the point about the things a log can do.
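The commit-log idea at the heart of Kafka fits in a few lines. This is a toy model in Python, not Kafka's API: records are appended with sequential offsets, and each consumer keeps its own read position:

```python
class CommitLog:
    """Append-only log: records are immutable and addressed by offset."""
    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def read_from(self, offset):
        """Return every record at or after a consumer's position."""
        return self.records[offset:]

log = CommitLog()
log.append({"user": "a", "action": "login"})
log.append({"user": "b", "action": "click"})

# Consumers at different offsets see different tails of the same log.
print(len(log.read_from(0)))  # 2: a new consumer replays everything
print(len(log.read_from(1)))  # 1: a caught-up consumer sees only the latest
```

Replication, partitioning, and retention are what make the real thing hard; the book explains why this simple structure still unifies data integration and stream processing.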

So, readers will be able to:
  • Learn how logs are used for programmatic access in databases and distributed systems
  • Discover solutions to the huge data integration problem when more data of more varieties meet more systems
  • Understand why logs are at the heart of real-time stream processing
  • Learn the role of a log in the internals of online data systems
  • Explore how Jay Kreps applies these ideas to his own work on data infrastructure systems at LinkedIn
Book - I Heart Logs - Event Data, Stream Processing, and Data Integration
Author: Jay Kreps

Wednesday, September 10, 2014

Getting Started with Windows VDI by Andrew Fryer

Virtual desktop infrastructure (VDI) is the practice of hosting a desktop operating system within a virtual machine (VM) running on a centralized server. VDI is a variation on the client/server computing model, sometimes referred to as server-based computing.
VDI is a technology that offers many benefits:
• Efficient use of CPU and memory resources
• Reduced desktop downtime and increased availability
• Patches and upgrades performed in data center
• New users can be up and running quickly
• Data and applications reside in secure data centers
• Centralized management reduces operational expenses
Reference
Additionally, VDI can be deployed with Microsoft Windows, and I suggest learning What’s New in VDI for Windows Server 2012 R2 and 8.1.
Anyway, that is a lot of background before mentioning the book by Andrew Fryer: Getting Started with Windows VDI. This book guides readers through building VDI using Windows Server 2012 R2 and Windows 8.1 quickly, and each chapter is easy to follow.

What Readers Will Learn:
  • Explore the various server roles and features that provide Microsoft's VDI solution
  • Virtualize desktops and the other infrastructure servers required for VDI using server virtualization in Windows Server Hyper-V
  • Build high availability clusters for VDI with techniques such as failover clustering and load balancing
  • Provide secure VDI to remote users over the Internet
  • Use Microsoft's Deployment Toolkit and Windows Server Update Services to automate the creation and maintenance of virtual desktops
  • Carry out performance tuning and monitoring
  • Understand the complexities of VDI licensing irrespective of the VDI solution you have opted for
  • Deploy PowerShell to automate all of the above techniques

Saturday, August 30, 2014

OSCON 2014: Complete Video Compilation

OSCON 2014 - Today it is not only developers, system administrators, and organizations that use Open Source; businesses have established themselves on Open Source as well, so you cannot ignore it. At OSCON, you will encounter the open source ecosystem, and it helps you dig deep into the business of open source.

Five reasons to attend OSCON: get straight to the epicenter of all things open source and get better at what you do; learn from the best and make valuable connections; get solutions to your biggest challenges that you can apply today; see the latest developments, products, services, and career trends; and hear it first at OSCON.

Attending OSCON is a very good idea, and if you missed OSCON 2014, I recommend OSCON 2014: Complete Video Compilation. You can download the videos or view them through the HD player, and learn about open source from more than 350 presenters, including Matthew McCullough (GitHub), Leslie Hawthorn (Elasticsearch), James Turnbull (Docker), Andrei Alexandrescu (Facebook), Tim Berglund (DataStax), Paco Nathan (Zettacap), Kirsten Hunter (Akamai), Matt Ray (Chef Software, Inc.), and Damian Conway (Thoughtstream), among them. In these videos, you will see a lot of tracks: Business, Cloud, Community, Computational Thinking, Databases & Datastores, Education, Emerging Languages, Geek Lifestyle, Java & JVM, JavaScript - HTML5 - Web, Mobile Platforms, Open Hardware, Operations & System Admin, Perl, PHP, Python, Security, Tools & Techniques, and User Experience.

You will be able to learn from the many tracks, as I said. O'Reilly has also improved video streaming and downloading; the playback speed control and mobile viewing are very useful.

Wednesday, August 27, 2014

Hands-On Programming with R by Garrett Grolemund

R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
The R language is useful for becoming a data scientist, as well as a computer scientist. I mention a book about data science with R: Hands-On Programming with R Write Your Own Functions and Simulations by Garrett Grolemund. It was written to show how to solve the logistical problems of data science, and how to write your own functions and simulations with R. In the book, readers learn through practical data analysis projects (Weighted Dice, Playing Cards, Slot Machine) and come to understand R better. Appendixes A-E also help with installing and updating R and R packages, as well as loading data and debugging R code.
Garrett Grolemund maintains shiny.rstudio.com, the development center for the Shiny R package.
Free Sampler.

Tuesday, July 29, 2014

Solid Conference San Francisco 2014: Complete Video Compilation

The Solid Conference focused on the intersection of software and hardware. It is a great community for both software and hardware, and audiences will pick up new ideas for combining them, gathered from engineers, researchers, roboticists, artists, startup founders, and innovators.
O'Reilly launched HD videos for this conference (Solid Conference San Francisco 2014: Complete Video Compilation - Experience the revolution at the intersection of hardware and software, and imagine the future). The video files can be huge, and downloading will take a long time, so a download manager program will help.
After watching, I was excited to learn new things from it (run time: 36 hours 8 minutes): machines, devices, components, and more.

Sunday, June 29, 2014

Penetration Testing A Hands-On Introduction to Hacking

Assessing overall security on a new system before it goes online is a good idea. It is useful to find holes before somebody else does, to verify secure configurations, and for testing. Penetration testing is the process of attempting to gain access to resources without knowledge of credentials, in order to find security weaknesses (interesting paper).
Penetration Testing A Hands-On Introduction to Hacking by Georgia Weidman covers the basics of penetration testing. It gives concepts, ideas, and techniques in 5 parts: The Basics, Assessments, Attacks, Exploit Development, and Mobile Hacking.
  • Crack passwords and wireless network keys with brute-forcing and wordlists
  • Test web applications for vulnerabilities
  • Use the Metasploit Framework to launch exploits and write your own Metasploit modules
  • Automate social-engineering attacks
  • Bypass antivirus software
  • Turn access to one machine into total control of the enterprise in the post exploitation phase
First of all, readers must set up a virtual lab using Kali Linux. The book gives lots of ideas and examples of how to use tools for penetration testing. It is a very good book for people who are new to penetration testing. It might not cover everything about penetration testing or go deeply into every topic, but it helps readers understand penetration testing easily and practice on the examples.

Monday, June 23, 2014

The Art of War for Small Business

The Art of War is an ancient Chinese military treatise attributed to Sun Tzu, a high-ranking military general, strategist, and tactician. A lot of books have been written using Sun Tzu's ancient The Art of War, adapting it for military, political, and business use.

The Art of War for Small Business Defeat the Competition and Dominate the Market with the Masterful Strategies of Sun Tzu is a book that applies the Art of War to small business. It is a perfect book for small business owners and entrepreneurs entrenched in fierce competition for customers, market share, and talent. It is written in 4 parts across 224 pages: Seize the Advantage with Sun Tzu; Understanding: Essential Sun Tzu; Principles for the Battlefield; and Advanced Sun Tzu: Strategy for Your Small Business.
It is not many pages to read. The book begins with why the art of war should be used for small business, then gives lots of examples and ideas for applying the art of war to small business every day. It shows how to choose the right ground for your battles, prepare without falling prey to paralysis, leverage strengths while overcoming limitations, strike competitors' weakest points and seize every opportunity, focus priorities and resources on conquering key challenges, go where the enemy is not, and build and leverage strategic alliances.

After reading, readers should see the picture of the common advantages and disadvantages of small business and why small business needs Sun Tzu. In addition, readers will learn the basics of the art of war and ideas for applying it to small business, with examples drawn from the real world of small business.




Thursday, June 19, 2014

Intermediate Python Practical Techniques for Deeper Skill Development

It's time to learn more about Python. I found the "Intermediate Python Practical Techniques for Deeper Skill Development" video course by Python expert Steve Holden.
It is very useful for learning Python by video, but viewers should already know the basics of Python. They must install IPython.
Note: start IPython Notebook with the "ipython notebook" command; viewers can also check how to install IPython.
Viewers should also download the example code at https://github.com/DevTeam-TheOpenBastion/int-py-notes

This video course covers deeper Python learning topics using IPython, including:

  • Functions: return values, arguments, decorators, and the function API
  • Comprehensions, generator functions, and generator expressions
  • Understanding the import system and namespace relationships
  • Using the Python DB API to query and maintain relational data, and JSON to extract data from the Web
  • The NumPy, SciPy, and Matplotlib libraries for numerical and analytical computing
  • An introduction to unit testing with unittest
  • Deeper understanding of Unicode, with explanations of encoding and decoding techniques and the relationship between byte strings and text
  • An introduction to textual analysis using regular expressions
  • Information sources for documentation, further research, and coding style considerations
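Two of the topics above, decorators and generator expressions, fit in a few lines. These are my own minimal examples, not taken from the course:

```python
import functools

def logged(func):
    """Decorator: wrap a function so each call's arguments are recorded."""
    calls = []

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append(args)
        return func(*args, **kwargs)

    wrapper.calls = calls
    return wrapper

@logged
def square(x):
    return x * x

square(3)
square(4)
print(square.calls)  # [(3,), (4,)]

# Generator expression: values are produced lazily, one at a time.
total = sum(x * x for x in range(5))
print(total)  # 0 + 1 + 4 + 9 + 16 = 30
```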

First of all, viewers should install IPython and download the example code. They will be able to learn each Python topic more easily, because it is easy to follow each example demo in the video. This video course, used together with IPython, is a very good way to improve your Python.

Sunday, May 04, 2014

start python learning with "Introduction to Python"

Python is a programming language that supports object-oriented, imperative, and functional programming. Its keys are simplicity, being an easy language to learn, and making it easy to move code from development to production quickly. It is also a powerful tool to use with Big Data. So I believe it is a good time to learn the Python programming language.
You can find many resources about it on the Internet. I started to learn Python by watching "Introduction to Python" by Jessica McKellar. It is a good learning video to help you start with Python.
It gives a lot of examples for Python and is easy to learn from. If you would like to start Python programming on your own, start with this and learn to:
- Set up a development environment with Python and a text editor
- Explore basic data types such as integers, strings, lists, and dictionaries
- Learn how looping lets you do lots of work with a little bit of code
- Gain access to more functionality in Python with modules
- Practice reading, writing, and running your first Python programs
- Navigate the command line for writing larger programs
- Write your own functions for encapsulating useful work
- Use classes to group, name, and reuse functions and variables
- Practice what you’ve learned with the state capitals quizzer and Scrabble cheater projects
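The state capitals quizzer mentioned in the last item is a good first project because it needs only dictionaries, functions, and string handling. A rough non-interactive sketch of the idea (my own, not the course's code):

```python
CAPITALS = {
    "California": "Sacramento",
    "Texas": "Austin",
    "New York": "Albany",
}

def check(state, answer):
    """Return True when the answer matches the state's capital,
    ignoring case and surrounding whitespace."""
    return CAPITALS[state].lower() == answer.strip().lower()

print(check("Texas", "  austin "))   # True
print(check("New York", "Buffalo"))  # False
```

The real quizzer would loop over random states and read answers with input(); this version is kept non-interactive so it runs as-is.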
This video course should give you what you need to begin with Python.
During the course, you can follow the practice material at the link.

Saturday, March 29, 2014

Java Cookbook 3rd Edition

Java is a programming language and computing platform used by lots of applications and websites. As for the latest Java version, Oracle announced Java 8 on March 25, 2014. I mention a book that covers Java 8: Java Cookbook, 3rd Edition by Ian F. Darwin.
It isn't a book for someone who is new to Java (readers should know a bit of syntax in order to write Java), but it is a book that will help readers learn from real-world examples. People who work in Java development can use this book as a reference, or pick examples to use in their work. In the book, readers will find 24 chapters: "Getting Started: Compiling, Running, and Debugging", "Interacting with the Environment", "Strings and Things", "Pattern Matching with Regular Expressions", "Numbers", "Dates and Times - New API", "Structuring Data with Java", "Object-Oriented Techniques", "Functional Programming Techniques: Functional Interfaces, Streams, Spliterators, Parallel Collections", "Input and Output", "Directory and Filesystem Operations", "Media: Graphics, Audio, Video", "Graphical User Interfaces", "Internationalization and Localization", "Network Clients", "Server-Side Java", "Java and Electronic Mail", "Database Access", "Processing JSON Data", "Processing XML", "Packages and Packaging", "Threaded Java", "Reflection, or “A Class Named Class”", and "Using Java with Other Languages".

Each example is useful for learning and practicing Java programming. Anyone who knows a bit of Java can read and use it, but I suggest readers have basic Java programming knowledge before starting this book.

Saturday, March 15, 2014

Oracle PL/SQL Programming, 6th Edition

PL/SQL is a procedural language that is very useful when you work with Oracle Database. As a DBA, you might have to write PL/SQL for some tasks, and a developer on Oracle Database must know PL/SQL.
In this post I mention the book Oracle PL/SQL Programming, 6th Edition by Steven Feuerstein (@stevefeuerstein) and Bill Pribyl. This new edition covers PL/SQL on Oracle Database 12c. Readers can use the examples in the book for practice and get many ideas for programming in PL/SQL.

The book is easy to read and makes PL/SQL easy to understand. Readers can use it as guidance and learn about a lot of real-world problems from great PL/SQL authors. PL/SQL developers should not miss this book.
The book has 28 chapters in 6 parts:
Part I: Programming in PL/SQL
Part II: PL/SQL Program Structure
Part III: PL/SQL Program Data
Part IV: SQL in PL/SQL
Part V: PL/SQL Application Construction
Part VI: Advanced PL/SQL Topics
Free Sampler.

Thursday, January 23, 2014

Java Performance: The Definitive Guide By Scott Oaks

Java is a programming language and computing platform. You will see lots of applications and websites written in Java. Java is fast, secure, and reliable, but how about performance? Java performance is a matter of concern because lots of business software has been written in Java.

I mention the book Java Performance: The Definitive Guide by Scott Oaks. Readers will learn about the world of Java performance, and the book will help them get the best possible performance from a Java application.
Chapter 2 covers testing Java applications, including the pitfalls of Java benchmarking, and Chapter 3 gives an overview of some of the tools available to monitor Java applications.
If you are interested in Java or develop applications in Java, performance is very important to you. This book focuses on how to best use the JVM and Java platform APIs so that programs run faster. If you are interested in improving your Java applications, this book can help.

Saturday, January 04, 2014

Programming Elastic MapReduce Using AWS Services to Build an End-to-End Application

Amazon Elastic MapReduce (Amazon EMR) is a web service that makes it easy to quickly and cost-effectively process vast amounts of data. Amazon EMR uses Hadoop, an open source framework, to distribute your data and processing across a resizable cluster of Amazon EC2 instances.
Anyway, if you are looking for a book about programming Elastic MapReduce, I mention Programming Elastic MapReduce Using AWS Services to Build an End-to-End Application by Kevin Schmidt and Christopher Phillips.
This book gives readers best practices for using Amazon EMR and various AWS and Apache technologies. Readers will learn much more about how to:
  • Get an overview of the AWS and Apache software tools used in large-scale data analysis
  • Go through the process of executing a Job Flow with a simple log analyzer
  • Discover useful MapReduce patterns for filtering and analyzing data sets
  • Use Apache Hive and Pig instead of Java to build a MapReduce Job Flow
  • Learn the basics for using Amazon EMR to run machine learning algorithms
  • Develop a project cost model for using Amazon EMR and other AWS tools
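The "simple log analyzer" Job Flow in the second bullet follows the classic map/reduce shape. The sketch below is plain Python running locally; it only mirrors the two phases that Amazon EMR would distribute across a cluster, and the log lines are invented for the example:

```python
from collections import defaultdict

log_lines = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
    "GET /missing 404",
]

# Map phase: emit a (status_code, 1) pair for every request line.
mapped = [(line.split()[-1], 1) for line in log_lines]

# Shuffle + reduce phase: sum the counts for each status code.
counts = defaultdict(int)
for status, n in mapped:
    counts[status] += n

print(dict(counts))  # {'200': 2, '404': 2}
```

On EMR the map and reduce steps run as separate tasks over data in S3 or HDFS; the logic per record stays this small.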
The book shows readers how to use the Amazon EC2 Services Management Console and helps them learn more about it. Readers will get good examples in the book; it will be even better if readers can create an AWS Account and use it with the examples. The illustrations and examples are very helpful and make the book easy to read and follow.



Tuesday, December 03, 2013

Big Data Analytics From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph

Big Data Analytics is the process of examining large amounts of data of a variety of types (big data) to uncover hidden patterns, unknown correlations, and other useful information. It helps companies make better business decisions.
What is Big Data Analytics?
Big Data Analytics From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph by David Loshin helps readers understand what Big Data is, why it can add value, what types of problems are suited to a big data approach, and how to properly plan to determine the need, align the right people in the organization, and develop a strategic plan for integration.
It has 11 chapters. It would be even better if readers could read some chapters online... Anyway, readers will see:
Chapter 1: We consider the market conditions that have enabled broad acceptance of big data analytics, including commoditization of hardware and software, increased data volumes, growing variation in types of data assets for analysis, different methods for data delivery, and increased expectations for real-time integration of analytical results into operational processes.
Chapter 2: In this chapter, we look at the characteristics of business problems that traditionally have required resources that exceeded the enterprises’ scopes, yet are suited to solutions that can take advantage of the big data platforms (either dedicated hardware or virtualized/cloud based).
Chapter 3: Who in the organization needs to be involved in the process of acquiring, proving, and deploying big data solutions? And what are their roles and responsibilities? This chapter looks at the adoption of new technology and how the organization must align to integrate into the system development life cycle.
Chapter 4: This chapter expands on the previous one by looking at some key issues that often plague new technology adoption and show that the key issues are not new ones and that there is likely to be organizational knowledge that can help in fleshing out a reasonable strategic plan.
Chapter 5: In this chapter, we look at the need for oversight and governance for the data, especially when those developing big data applications often bypass traditional IT and data management channels.
Chapter 6: In this chapter, we look at specialty hardware designed for analytics and how it is engineered to accommodate large data sets.
Chapter 7: This chapter discusses and provides a high-level overview of tool suites such as Hadoop.
Chapter 8: This chapter examines the MapReduce programming model.
Chapter 9: In this chapter, we look at a variety of alternative data management methods that are being adopted for big data application development.
Chapter 10: This chapter looks at business problems suited for graph analytics, what differentiates these problems from those addressed by traditional approaches, and considerations for discovery versus search analyses.
Chapter 11: This short final chapter reviews best practices for incrementally adopting big data into the enterprise. 
This book gives readers a solid grounding in Big Data. Each chapter also includes exercises that help readers think about the big picture for each topic and then apply the ideas and knowledge they have read about in their own work.
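The MapReduce programming model that Chapter 8 examines can be illustrated with a small, self-contained Python sketch. This is a toy in-process version for intuition only, not Hadoop or anything the book ships:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Shuffle/sort by key, then reduce: sum the counts per word."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

docs = ["big data analytics", "big data platforms"]
print(dict(reduce_phase(map_phase(docs))))
# {'analytics': 1, 'big': 2, 'data': 2, 'platforms': 1}
```

A real MapReduce framework runs the map and reduce phases in parallel across many machines and handles the shuffle over the network, but the data flow is the same as in this sketch.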

Wednesday, November 27, 2013

ZooKeeper Distributed Process Coordination By Flavio Junqueira, Benjamin Reed

Apache ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications.
Why should you learn about ZooKeeper? If you use applications such as HBase, Neo4j, Solr, or Accumulo, read on...
You can read much more on the Apache ZooKeeper website. Anyway, if you are looking for a book about Apache ZooKeeper, I recommend one titled ZooKeeper: Distributed Process Coordination by Flavio Junqueira and Benjamin Reed.
The book guides readers in using Apache ZooKeeper to manage distributed systems. It has 3 parts: ZooKeeper Concepts and Basics, Programming with ZooKeeper, and Administering ZooKeeper.
So, this book is good for readers who are interested in ZooKeeper or who use applications that rely on it. It will help readers understand ZooKeeper concepts and learn to program with ZooKeeper. For me, I like it because it gave me ideas for administering ZooKeeper.
The book covers:
  • Learn how ZooKeeper solves common coordination tasks
  • Explore the ZooKeeper API’s Java and C implementations and how they differ
  • Use methods to track and react to ZooKeeper state changes
  • Handle failures of the network, application processes, and ZooKeeper itself
  • Learn about ZooKeeper’s trickier aspects dealing with concurrency, ordering, and configuration
  • Use the Curator high-level interface for connection management
  • Become familiar with ZooKeeper internals and administration tools
This book makes learning about ZooKeeper enjoyable. It will help, though, if readers already know some Java.
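One of those trickier aspects is that ZooKeeper watches are one-shot: a watch fires once on the next change and must then be re-registered. Here is a toy in-memory sketch of that behavior in Python. It is purely illustrative and is not the real ZooKeeper client API:

```python
class ToyZnode:
    """Minimal stand-in for a znode with one-shot watches."""
    def __init__(self, data=b""):
        self.data = data
        self._watchers = []

    def get(self, watcher=None):
        # In ZooKeeper, registering a watch is part of the read.
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set(self, data):
        self.data = data
        # Fire each pending watcher exactly once, then forget it.
        pending, self._watchers = self._watchers, []
        for watcher in pending:
            watcher("NodeDataChanged")

events = []
node = ToyZnode(b"v1")
node.get(watcher=events.append)   # read the data and register a watch
node.set(b"v2")                   # the watch fires once
node.set(b"v3")                   # no watch registered, nothing fires
print(events)                     # ['NodeDataChanged']
```

Because a client may miss changes between the watch firing and the re-read, real applications must re-read the znode (and re-register the watch) inside the watcher, which is exactly the kind of subtlety the book walks through.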

Saturday, November 09, 2013

Accumulo - Application Development, Table Design, and Best Practices

The NSA started building Accumulo in 2008, using the Google BigTable architecture as a starting point. Accumulo is a NoSQL database that is a simple key/value data store. BTW, Accumulo joined the Apache community in 2011.
Why is Accumulo interesting? Security! It extends the BigTable data model with a security mechanism known as cell-level security. Every key-value pair has its own security label, stored in the column visibility element of the key, which is used to determine whether a given user meets the security requirements to read the value. Wow!
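As a rough illustration of the idea, cell-level visibility can be sketched in a few lines of Python. This is a simplified model, not Accumulo's actual implementation: real visibility labels are boolean expressions that also support parentheses and quoting, and here a label is just `|`-separated alternatives of `&`-joined required tokens:

```python
def visible(label, authorizations):
    """Simplified check: the label is '|'-separated alternatives,
    each a '&'-joined set of tokens the user must hold."""
    if not label:            # empty label: visible to everyone
        return True
    auths = set(authorizations)
    return any(set(alt.split("&")) <= auths
               for alt in label.split("|"))

# Each cell carries its own visibility label as part of its key.
cells = [
    (("row1", "cf", "cq1", "public"), b"open data"),
    (("row1", "cf", "cq2", "admin&audit"), b"sensitive"),
]

def scan(cells, authorizations):
    """Return only the values a scan with these authorizations may see."""
    return [value for (key, value) in cells
            if visible(key[3], authorizations)]

print(scan(cells, {"public"}))          # [b'open data']
print(scan(cells, {"admin", "audit"}))  # [b'sensitive']
```

The key point is that filtering happens per cell at read time: two users scanning the same table can see entirely different sets of key-value pairs depending on their authorizations.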

If you are looking for a book about it, I recommend Accumulo: Application Development, Table Design, and Best Practices By Michael Wall, Aaron Cordova, Billie Rinaldi. It is written by authors who know Accumulo and have spent real time with it. For now, though, it is still an Early Release ebook, because this book has undergone a careful vetting process with the U.S. government to ensure that classified or proprietary information has not been revealed.

What should you learn from this book?
  •     Get a high-level introduction on what Accumulo has to offer
  •     Take a rapid tour through single- and multiple-node installations, data ingest, and query
  •     Learn how to write Accumulo applications for several use cases, based on examples
  •     Dive into Accumulo internals, including information not available in the documentation
  •     Get detailed information for installing, administering, tuning, and measuring performance
  •     Learn best practices based on successful implementations in the field
  •     Find answers to common questions that every new Accumulo user asks
What did I learn from this book? The Early Release version has 2 chapters for now.
Before starting this book, you should know a bit about Hadoop and ZooKeeper; the book does not tell you how to install or use them. Even so, I found it simple to get started. The book gave me a good idea of what I should know about Accumulo, and Chapter 2 gave me ideas for writing applications. You should know Java! However, with Accumulo (1.5 & 1.4.4) you can use the Thrift Proxy, so you can write applications in Python, Ruby, C++, etc.
Learned a bit - ACCUMULO
I am excited to read more :-) and I hope to pick up many more ideas from the Accumulo documentation.