Saturday, November 16, 2013

Learned - Data Science Boot Camp: Acquiring and Transforming Big Data

I found a video lecture series, Data Science Boot Camp. It is great for learning, and I hope readers will benefit from it. I went through part of it today and downloaded the example code to play with. Lesson 1: I learned to use a Flume agent to write data into HDFS and to read that data with Hive.
Terminal 1: run the Flume agent
[surachart@centos01 ~]$  git clone https://github.com/oraclebigdata/oa_lesson_1_source_and_acquire
Initialized empty Git repository in /home/surachart/oa_lesson_1_source_and_acquire/.git/
remote: Counting objects: 16, done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 16 (delta 5), reused 14 (delta 3)
Unpacking objects: 100% (16/16), done.
[surachart@centos01 ~]$ cd oa_lesson_1_source_and_acquire
[surachart@centos01 oa_lesson_1_source_and_acquire]$ ls
commands.sh  example.xml  flume_example.conf  hive_examples.hql  LICENSE  README.md
[surachart@centos01 oa_lesson_1_source_and_acquire]$ less README.md
[surachart@centos01 oa_lesson_1_source_and_acquire]$ less commands.sh
[surachart@centos01 oa_lesson_1_source_and_acquire]$ cp flume_example.conf flume_example.conf.orig
[surachart@centos01 oa_lesson_1_source_and_acquire]$ vi flume_example.conf
[surachart@centos01 oa_lesson_1_source_and_acquire]$ diff flume_example.conf.orig flume_example.conf
16c16
< hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://localhost:8020/user/oracle/flume_example
---
> hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://localhost:8020/user/surachart/flume_example
[surachart@centos01 oa_lesson_1_source_and_acquire]$
[surachart@centos01 oa_lesson_1_source_and_acquire]$
[surachart@centos01 oa_lesson_1_source_and_acquire]$ hadoop fs -ls  hdfs://localhost:8020/user/surachart/flume_example
13/11/16 20:47:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: `hdfs://localhost:8020/user/surachart/flume_example': No such file or directory
[surachart@centos01 oa_lesson_1_source_and_acquire]$
[surachart@centos01 oa_lesson_1_source_and_acquire]$
[surachart@centos01 oa_lesson_1_source_and_acquire]$
[surachart@centos01 oa_lesson_1_source_and_acquire]$ flume-ng agent -n hdfs-agent -f ./flume_example.conf
Warning: No configuration directory set! Use --conf <dir> to override.
Info: Including Hadoop libraries found via (/usr/bin/hadoop) for HDFS access
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.6.1.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar from classpath
Info: Including HBASE libraries found via (/usr/bin/hbase) for HBASE access
Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-api-1.6.1.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-api-1.6.1.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.6.1.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar from classpath
+ exec /usr/java/latest/bin/java -Xmx20m -cp '/usr/lib/flume/lib/*:/etc/hadoop/conf:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/avro-1.5.3.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jetty-6.1.26.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.3.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/zookeeper-3.4.2.jar:/usr/lib/hadoop/.//bin:/usr/lib/had
oop/.//client:/usr/lib/hadoop/.//etc:/usr/lib/hadoop/.//hadoop-annotations-2.0.5-alpha.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.5-alpha.jar:/usr/lib/hadoop/.//hadoop-common-2.0.5-alpha.jar:/usr/lib/hadoop/.//hadoop-common-2.0.5-alpha-tests.jar:/usr/lib/hadoop/.//lib:/usr/lib/hadoop/.//libexec:/usr/lib/hadoop/.//sbin:/usr/contrib/capacity-scheduler/*.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.13.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/.//bin:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.5-alpha.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.5-alpha-tests.jar:/usr/lib/hadoop-hdfs/.//lib:/usr/lib/hadoop-hdfs/.//sbin:/usr/lib/hadoop-hdfs/.//webapps:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/avro-1.5.3.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/javax.
inject-1.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/junit-4.8.2.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/netty-3.5.11.Final.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.3.2.jar:/usr/lib/hadoop-yarn/.//bin:/usr/lib/hadoop-yarn/.//etc:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-unmanaged-am-launcher-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-client-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//lib:/usr/lib/hadoop-yarn/.//sbin:/usr/lib/hadoop-mapreduce/lib/aopalliance-1.0.jar:/usr/lib/hadoop-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-mapreduce/lib/avro-1.5.3.jar:/usr/lib/hadoop-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-mapreduce/lib/guice-3.0.jar:/usr/lib/hadoop-mapreduce/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-mapreduce/lib/javax.inject-1.jar:/usr/lib/hadoop-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-mapreduce/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-mapreduc
e/lib/netty-3.5.11.Final.jar:/usr/lib/hadoop-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-mapreduce/lib/snappy-java-1.0.3.2.jar:/usr/lib/hadoop-mapreduce/.//bin:/usr/lib/hadoop-mapreduce/.//hadoop-archives-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-datajoin-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-distcp-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-extras-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-gridmix-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-app-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-common-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-core-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-hs-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-hs-plugins-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-jobclient-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-jobclient-2.0.5-alpha-tests.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-shuffle-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-examples-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-rumen-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-streaming-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//lib:/usr/lib/hadoop-mapreduce/.//sbin:/usr/lib/hbase/bin/../conf:/usr/java/latest/lib/tools.jar:/usr/lib/hbase/bin/..:/usr/lib/hbase/bin/../hbase-0.94.5.jar:/usr/lib/hbase/bin/../hbase-0.94.5-tests.jar:/usr/lib/hbase/bin/../hbase.jar:/usr/lib/hbase/bin/../lib/activation-1.1.jar:/usr/lib/hbase/bin/../lib/aopalliance-1.0.jar:/usr/lib/hbase/bin/../lib/asm-3.1.jar:/usr/lib/hbase/bin/../lib/avro-1.5.3.jar:/usr/lib/hbase/bin/../lib/avro-ipc-1.5.3.jar:/usr/lib/hbase/bin/../lib/commons-beanutils-1.7.0.jar:/usr/lib/hbase/bin/../lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hbase/bin/../lib/commons-cli-1.2.jar:/usr/lib/hbase/bin/../lib/commons-cod
ec-1.4.jar:/usr/lib/hbase/bin/../lib/commons-collections-3.2.1.jar:/usr/lib/hbase/bin/../lib/commons-configuration-1.6.jar:/usr/lib/hbase/bin/../lib/commons-daemon-1.0.13.jar:/usr/lib/hbase/bin/../lib/commons-digester-1.8.jar:/usr/lib/hbase/bin/../lib/commons-el-1.0.jar:/usr/lib/hbase/bin/../lib/commons-httpclient-3.1.jar:/usr/lib/hbase/bin/../lib/commons-io-2.1.jar:/usr/lib/hbase/bin/../lib/commons-lang-2.5.jar:/usr/lib/hbase/bin/../lib/commons-logging-1.1.1.jar:/usr/lib/hbase/bin/../lib/commons-math-2.1.jar:/usr/lib/hbase/bin/../lib/commons-net-3.1.jar:/usr/lib/hbase/bin/../lib/core-3.1.1.jar:/usr/lib/hbase/bin/../lib/gmbal-api-only-3.0.0-b023.jar:/usr/lib/hbase/bin/../lib/grizzly-framework-2.1.1.jar:/usr/lib/hbase/bin/../lib/grizzly-framework-2.1.1-tests.jar:/usr/lib/hbase/bin/../lib/grizzly-http-2.1.1.jar:/usr/lib/hbase/bin/../lib/grizzly-http-server-2.1.1.jar:/usr/lib/hbase/bin/../lib/grizzly-http-servlet-2.1.1.jar:/usr/lib/hbase/bin/../lib/grizzly-rcm-2.1.1.jar:/usr/lib/hbase/bin/../lib/guava-11.0.2.jar:/usr/lib/hbase/bin/../lib/guice-3.0.jar:/usr/lib/hbase/bin/../lib/guice-servlet-3.0.jar:/usr/lib/hbase/bin/../lib/high-scale-lib-1.1.1.jar:/usr/lib/hbase/bin/../lib/httpclient-4.1.2.jar:/usr/lib/hbase/bin/../lib/httpcore-4.1.3.jar:/usr/lib/hbase/bin/../lib/jackson-core-asl-1.8.8.jar:/usr/lib/hbase/bin/../lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hbase/bin/../lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hbase/bin/../lib/jackson-xc-1.8.8.jar:/usr/lib/hbase/bin/../lib/jamon-runtime-2.3.1.jar:/usr/lib/hbase/bin/../lib/jasper-compiler-5.5.23.jar:/usr/lib/hbase/bin/../lib/jasper-runtime-5.5.23.jar:/usr/lib/hbase/bin/../lib/javax.inject-1.jar:/usr/lib/hbase/bin/../lib/javax.servlet-3.0.jar:/usr/lib/hbase/bin/../lib/jaxb-api-2.1.jar:/usr/lib/hbase/bin/../lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hbase/bin/../lib/jersey-client-1.8.jar:/usr/lib/hbase/bin/../lib/jersey-core-1.8.jar:/usr/lib/hbase/bin/../lib/jersey-grizzly2-1.8.jar:/usr/lib/hbase/bin/../lib/jersey-guice-1.8.jar:/usr/lib/
hbase/bin/../lib/jersey-json-1.8.jar:/usr/lib/hbase/bin/../lib/jersey-server-1.8.jar:/usr/lib/hbase/bin/../lib/jersey-test-framework-core-1.8.jar:/usr/lib/hbase/bin/../lib/jersey-test-framework-grizzly2-1.8.jar:/usr/lib/hbase/bin/../lib/jets3t-0.6.1.jar:/usr/lib/hbase/bin/../lib/jettison-1.1.jar:/usr/lib/hbase/bin/../lib/jetty-6.1.26.jar:/usr/lib/hbase/bin/../lib/jetty-util-6.1.26.jar:/usr/lib/hbase/bin/../lib/jruby-complete-1.6.5.jar:/usr/lib/hbase/bin/../lib/jsch-0.1.42.jar:/usr/lib/hbase/bin/../lib/jsp-2.1-6.1.14.jar:/usr/lib/hbase/bin/../lib/jsp-api-2.1-6.1.14.jar:/usr/lib/hbase/bin/../lib/jsp-api-2.1.jar:/usr/lib/hbase/bin/../lib/jsr305-1.3.9.jar:/usr/lib/hbase/bin/../lib/junit-4.10-HBASE-1.jar:/usr/lib/hbase/bin/../lib/kfs-0.3.jar:/usr/lib/hbase/bin/../lib/libthrift-0.8.0.jar:/usr/lib/hbase/bin/../lib/log4j-1.2.16.jar:/usr/lib/hbase/bin/../lib/management-api-3.0.0-b012.jar:/usr/lib/hbase/bin/../lib/metrics-core-2.1.2.jar:/usr/lib/hbase/bin/../lib/netty-3.2.4.Final.jar:/usr/lib/hbase/bin/../lib/netty-3.5.11.Final.jar:/usr/lib/hbase/bin/../lib/protobuf-java-2.4.0a.jar:/usr/lib/hbase/bin/../lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hbase/bin/../lib/servlet-api-2.5.jar:/usr/lib/hbase/bin/../lib/snappy-java-1.0.3.2.jar:/usr/lib/hbase/bin/../lib/stax-api-1.0.1.jar:/usr/lib/hbase/bin/../lib/velocity-1.7.jar:/usr/lib/hbase/bin/../lib/xmlenc-0.52.jar:/usr/lib/hbase/bin/../lib/zookeeper.jar:/etc/hadoop/conf:/usr/bin:/usr/etc:/usr/games:/usr/include:/usr/java:/usr/lib:/usr/lib64:/usr/libexec:/usr/local:/usr/sbin:/usr/share:/usr/src:/usr/tmp:/usr/lib/anaconda-runtime:/usr/lib/bigtop-tomcat:/usr/lib/bigtop-utils:/usr/lib/ConsoleKit:/usr/lib/cups:/usr/lib/debug:/usr/lib/flume:/usr/lib/games:/usr/lib/gcc:/usr/lib/hadoop:/usr/lib/hadoop-hdfs:/usr/lib/hadoop-httpfs:/usr/lib/hadoop-mapreduce:/usr/lib/hadoop-yarn:/usr/lib/hbase:/usr/lib/hive:/usr/lib/hue:/usr/lib/java:/usr/lib/java-1.3.1:/usr/lib/java-1.4.0:/usr/lib/java-1.4.1:/usr/lib/java-1.4.2:/usr/lib/java-1.5.0:/usr/lib/java-
1.6.0:/usr/lib/java-1.7.0:/usr/lib/java-ext:/usr/lib/jvm:/usr/lib/jvm-commmon:/usr/lib/jvm-exports:/usr/lib/jvm-private:/usr/lib/locale:/usr/lib/lsb:/usr/lib/mahout:/usr/lib/oozie:/usr/lib/pig:/usr/lib/python2.6:/usr/lib/rpm:/usr/lib/sendmail:/usr/lib/sendmail.postfix:/usr/lib/whirr:/usr/lib/yum-plugins:/usr/lib/zookeeper:/usr/lib/zookeeper/bin:/usr/lib/zookeeper/conf:/usr/lib/zookeeper/lib:/usr/lib/zookeeper/zookeeper-3.4.5.jar:/usr/lib/zookeeper/zookeeper.jar:/usr/lib/zookeeper/lib/jline-0.9.94.jar:/usr/lib/zookeeper/lib/log4j-1.2.15.jar:/usr/lib/zookeeper/lib/netty-3.2.2.Final.jar:/etc/hadoop/conf:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/avro-1.5.3.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jetty-6.1.26.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.ja
r:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.3.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/zookeeper-3.4.2.jar:/usr/lib/hadoop/.//bin:/usr/lib/hadoop/.//client:/usr/lib/hadoop/.//etc:/usr/lib/hadoop/.//hadoop-annotations-2.0.5-alpha.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.5-alpha.jar:/usr/lib/hadoop/.//hadoop-common-2.0.5-alpha.jar:/usr/lib/hadoop/.//hadoop-common-2.0.5-alpha-tests.jar:/usr/lib/hadoop/.//lib:/usr/lib/hadoop/.//libexec:/usr/lib/hadoop/.//sbin:/usr/contrib/capacity-scheduler/*.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.13.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/.//bin:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.5-alpha.jar:/usr/lib/had
oop-hdfs/.//hadoop-hdfs-2.0.5-alpha-tests.jar:/usr/lib/hadoop-hdfs/.//lib:/usr/lib/hadoop-hdfs/.//sbin:/usr/lib/hadoop-hdfs/.//webapps:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/avro-1.5.3.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/junit-4.8.2.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/netty-3.5.11.Final.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.3.2.jar:/usr/lib/hadoop-yarn/.//bin:/usr/lib/hadoop-yarn/.//etc:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-unmanaged-am-launcher-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-client-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.5-alpha.jar:/usr/lib/hadoop-yarn/.//lib:/usr/lib/hadoop-yarn/.//sbin:/usr/lib/hadoop-mapreduce/lib/aopalliance-1.0.jar:/usr/lib/hadoop-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-mapreduce/lib/avro-1.5.3.jar:/usr/lib/hadoop-mapreduce/lib/commons-io-2.1.jar:/usr/li
b/hadoop-mapreduce/lib/guice-3.0.jar:/usr/lib/hadoop-mapreduce/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-mapreduce/lib/javax.inject-1.jar:/usr/lib/hadoop-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-mapreduce/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-mapreduce/lib/netty-3.5.11.Final.jar:/usr/lib/hadoop-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-mapreduce/lib/snappy-java-1.0.3.2.jar:/usr/lib/hadoop-mapreduce/.//bin:/usr/lib/hadoop-mapreduce/.//hadoop-archives-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-datajoin-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-distcp-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-extras-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-gridmix-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-app-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-common-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-core-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-hs-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-hs-plugins-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-jobclient-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-jobclient-2.0.5-alpha-tests.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-shuffle-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-examples-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-rumen-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//hadoop-streaming-2.0.5-alpha.jar:/usr/lib/hadoop-mapreduce/.//lib:/usr/lib/hadoop-mapreduce/.//sbin:/conf' 
-Djava.library.path=:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib:/usr/lib/hbase/bin/../lib/native/Linux-amd64-64 org.apache.flume.node.Application -n hdfs-agent -f ./flume_example.conf
13/11/16 20:47:52 INFO lifecycle.LifecycleSupervisor: Starting lifecycle supervisor 1
13/11/16 20:47:52 INFO node.FlumeNode: Flume node starting - hdfs-agent
13/11/16 20:47:52 INFO nodemanager.DefaultLogicalNodeManager: Node manager starting
13/11/16 20:47:52 INFO lifecycle.LifecycleSupervisor: Starting lifecycle supervisor 11
13/11/16 20:47:52 INFO properties.PropertiesFileConfigurationProvider: Configuration provider starting
13/11/16 20:47:52 INFO properties.PropertiesFileConfigurationProvider: Reloading configuration file:./flume_example.conf
13/11/16 20:47:52 INFO conf.FlumeConfiguration: Processing:hdfs-write
13/11/16 20:47:52 INFO conf.FlumeConfiguration: Processing:hdfs-write
13/11/16 20:47:52 INFO conf.FlumeConfiguration: Processing:hdfs-write
13/11/16 20:47:52 INFO conf.FlumeConfiguration: Added sinks: hdfs-write Agent: hdfs-agent
13/11/16 20:47:52 INFO conf.FlumeConfiguration: Processing:hdfs-write
13/11/16 20:47:52 INFO conf.FlumeConfiguration: Processing:hdfs-write
13/11/16 20:47:52 INFO conf.FlumeConfiguration: Processing:hdfs-write
13/11/16 20:47:52 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration  for agents: [hdfs-agent]
13/11/16 20:47:52 INFO properties.PropertiesFileConfigurationProvider: Creating channels
13/11/16 20:47:52 INFO properties.PropertiesFileConfigurationProvider: created channel memoryChannel
13/11/16 20:47:52 INFO interceptor.StaticInterceptor: Creating RegexFilteringInterceptor: regex=^echo.*,excludeEvents=true
13/11/16 20:47:52 INFO sink.DefaultSinkFactory: Creating instance of sink: hdfs-write, type: hdfs
13/11/16 20:47:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/11/16 20:47:54 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
13/11/16 20:47:54 INFO nodemanager.DefaultLogicalNodeManager: Starting new configuration:{ sourceRunners:{netcat-collect=EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:netcat-collect,state:IDLE} }} sinkRunners:{hdfs-write=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@1bf3f158 counterGroup:{ name:null counters:{} } }} channels:{memoryChannel=org.apache.flume.channel.MemoryChannel{name: memoryChannel}} }
13/11/16 20:47:54 INFO nodemanager.DefaultLogicalNodeManager: Starting Channel memoryChannel
13/11/16 20:47:54 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: CHANNEL, name: memoryChannel, registered successfully.
13/11/16 20:47:54 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memoryChannel started
13/11/16 20:47:54 INFO nodemanager.DefaultLogicalNodeManager: Starting Sink hdfs-write
13/11/16 20:47:54 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: SINK, name: hdfs-write, registered successfully.
13/11/16 20:47:54 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: hdfs-write started
13/11/16 20:47:54 INFO nodemanager.DefaultLogicalNodeManager: Starting Source netcat-collect
13/11/16 20:47:54 INFO source.NetcatSource: Source starting
13/11/16 20:47:54 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:11111]
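The last log line shows the netcat source listening on 127.0.0.1:11111. Piping lines into nc is the simplest way to feed it, but the same thing can be sketched in Python. This is only a minimal sketch: the function name send_lines and its parameters are my own, and it assumes the source keeps its default behaviour of answering "OK" for every newline-terminated line, which is what the transcript shows.

```python
import socket


def send_lines(lines, host="localhost", port=11111, timeout=5.0):
    """Send each line to a Flume netcat source and return the replies.

    send_lines is a hypothetical helper, not part of Flume. It assumes the
    source acknowledges every event with one reply line (normally "OK").
    """
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="utf-8", newline="\n") as reader:
            for line in lines:
                # One newline-terminated line becomes one Flume event.
                payload = line.rstrip("\n").encode("utf-8") + b"\n"
                sock.sendall(payload)
                # Wait for the acknowledgement before sending the next line.
                replies.append(reader.readline().strip())
    return replies


# Example use against the agent started above (needs the agent running):
#   with open("example.xml", encoding="utf-8") as f:
#       first_20 = [next(f) for _ in range(20)]
#   print(send_lines(first_20))
```

Waiting for each reply keeps the sketch simple; nc does not wait, it just streams the lines and prints whatever comes back.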
Terminal 2: send records to the agent and read the data with Hive
[surachart@centos01 ~]$ hadoop fs -ls  hdfs://localhost:8020/user/surachart/flume_example
13/11/16 20:48:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: `hdfs://localhost:8020/user/surachart/flume_example': No such file or directory
[surachart@centos01 ~]$
[surachart@centos01 ~]$ cd oa_lesson_1_source_and_acquire
[surachart@centos01 oa_lesson_1_source_and_acquire]$ head -n 20 example.xml | nc localhost 11111
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
[surachart@centos01 oa_lesson_1_source_and_acquire]$ hadoop fs -ls  hdfs://localhost:8020/user/surachart/flume_example
13/11/16 20:49:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 4 items
-rw-r--r--   1 surachart supergroup       1203 2013-11-16 20:49 hdfs://localhost:8020/user/surachart/flume_example/FlumeData.1384609744355
-rw-r--r--   1 surachart supergroup       1147 2013-11-16 20:49 hdfs://localhost:8020/user/surachart/flume_example/FlumeData.1384609744356
-rw-r--r--   1 surachart supergroup       1177 2013-11-16 20:49 hdfs://localhost:8020/user/surachart/flume_example/FlumeData.1384609744357
-rw-r--r--   1 surachart supergroup          0 2013-11-16 20:49 hdfs://localhost:8020/user/surachart/flume_example/FlumeData.1384609744358.tmp
[surachart@centos01 oa_lesson_1_source_and_acquire]$
[surachart@centos01 oa_lesson_1_source_and_acquire]$
[surachart@centos01 oa_lesson_1_source_and_acquire]$
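A note on the listing above: three files are complete and one still ends in .tmp, which is the suffix the HDFS sink puts on a file it is still writing. The numeric part looks like a counter that starts from the agent's clock in epoch milliseconds and goes up by one per file. That reading is my assumption from the names, not something the lesson states. A small Python sketch to pick a name apart (describe_flume_file is a made-up helper):

```python
from datetime import datetime, timedelta, timezone


def describe_flume_file(name, in_use_suffix=".tmp"):
    """Split a name like 'FlumeData.1384609744358.tmp' into its parts.

    Assumption: the number after the prefix is an epoch-millisecond based
    counter, so it only approximates when the file was created.
    """
    in_progress = name.endswith(in_use_suffix)
    stem = name[: -len(in_use_suffix)] if in_progress else name
    prefix, _, counter_text = stem.rpartition(".")
    counter = int(counter_text)
    # Integer arithmetic avoids float rounding on the millisecond part.
    approx_time = datetime.fromtimestamp(
        counter // 1000, tz=timezone.utc
    ) + timedelta(milliseconds=counter % 1000)
    return {
        "prefix": prefix,
        "counter": counter,
        "in_progress": in_progress,
        "approx_time_utc": approx_time,
    }


for file_name in ("FlumeData.1384609744355", "FlumeData.1384609744358.tmp"):
    print(describe_flume_file(file_name))
```

For the first file this gives 13:49 UTC on 2013-11-16, which matches the 20:49 local time in the listing if the machine is on UTC+7.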
[surachart@centos01 oa_lesson_1_source_and_acquire]$ hive
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/surachart/hive_job_log_surachart_201311162053_1256958590.txt
hive> show tables;
OK
Time taken: 13.118 seconds
hive> CREATE EXTERNAL TABLE flume_example (record string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n' LOCATION '/user/surachart/flume_example';
OK
Time taken: 1.616 seconds
hive> show tables;
OK
flume_example
Time taken: 0.413 seconds
hive>
    > SELECT xpath_int(record, "/record/PurchaseDate"), xpath_string(record, "/record/FlightsAvailable") FROM flume_example;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1384583763440_0003, Tracking URL = http://centos01:8088/proxy/application_1384583763440_0003/
Kill Command = /usr/bin/hadoop job  -kill job_1384583763440_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-11-16 20:55:19,963 Stage-1 map = 0%,  reduce = 0%
2013-11-16 20:55:50,703 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.68 sec
2013-11-16 20:55:52,539 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.68 sec
MapReduce Total cumulative CPU time: 3 seconds 680 msec
Ended Job = job_1384583763440_0003
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 3.68 sec   HDFS Read: 4433 HDFS Write: 357 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 680 msec
OK
1368111464      97, 13
1335377562      7, 67, 41, 43, 3
1383190668      97, 67
1366662121
1379048005      3, 43, 61
1376098854
1343518807      43, 29
1341610164
1353358109      41, 43, 97
1377499154
1339295392      23
1358062165      13, 23, 61
1338964230      3, 97
1348172911
1361978449      83, 31
1379977738
1347007379      83, 79, 7, 59
1341128392      3
1351321689      59, 13, 41, 67, 71
1384062904      2, 17, 41
Time taken: 58.03 seconds
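The query treats every line of the external table as one XML string and pulls two fields out of it with Hive's xpath_int and xpath_string. To see what that does to a single row, here is a rough Python stand-in. The sample record layout is hypothetical: I only assume a <record> root with PurchaseDate and FlightsAvailable children, as the XPath expressions suggest, and the real example.xml may hold more fields. The helpers mimic the visible behaviour only (0 or an empty string when a field is missing); they are not Hive's implementation.

```python
import xml.etree.ElementTree as ET


def _first_text(record, child):
    """Text of the first <child> under a <record> root, or '' if absent."""
    root = ET.fromstring(record)
    if root.tag != "record":
        return ""
    node = root.find(child)
    return (node.text or "") if node is not None else ""


def xpath_string(record, child):
    """Stand-in for xpath_string(record, '/record/<child>')."""
    return _first_text(record, child)


def xpath_int(record, child):
    """Stand-in for xpath_int(record, '/record/<child>'); 0 if not an integer."""
    try:
        return int(_first_text(record, child))
    except ValueError:
        return 0


# Hypothetical record built from the first output row above.
sample = (
    "<record>"
    "<PurchaseDate>1368111464</PurchaseDate>"
    "<FlightsAvailable>97, 13</FlightsAvailable>"
    "</record>"
)
print(xpath_int(sample, "PurchaseDate"), xpath_string(sample, "FlightsAvailable"))
```

Rows with an empty second column in the output above are what you would expect when FlightsAvailable is missing or empty in that record.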
Wow! Good for learning.
