
Monday, March 31, 2014

MapR Sandbox for Hadoop Learning

I received an email about the MapR Sandbox, which is a fully functional Hadoop cluster running on a virtual machine (CentOS 6.5) that provides an intuitive web interface for both developers and administrators to get started with Hadoop. I believe it's a good way to learn about Hadoop and its ecosystem. Users can download it for VMware or VirtualBox. I downloaded the VirtualBox image, imported it, and changed the network setting to "Bridged Adapter". After the VM started, I connected to http://ip-address:8443
Then I selected "Launch HUE" and "Launch MCS"; I ran into some errors, but fixed them.
Finally, I could use HUE and MCS.


Hue is an interface for interacting with web applications that access the MapR File System (MapR-FS). Use the applications in HUE to access MapR-FS, work with tables, run Hive queries and MapReduce jobs, and manage Oozie workflows.

The MapR Control System (MCS) is a graphical, programmatic control panel for cluster administration that provides complete cluster monitoring functionality and most of the functionality of the command line.
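
The "command line" it refers to is maprcli. A quick hedged example that should work inside the sandbox (run as a user with cluster permissions, such as maprdev shown below):

# List cluster nodes and the services running on each
sudo maprcli node list -columns hostname,svc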

After reviewing the MapR Sandbox for VirtualBox, I found that "maprdev" is a development account that can sudo to root.
login as: maprdev
Server refused our key
Using keyboard-interactive authentication.
Password:
Welcome to your Mapr Demo virtual machine.
[maprdev@maprdemo ~]$ sudo -l
Matching Defaults entries for maprdev on this host:
    !visiblepw, always_set_home, env_reset, env_keep="COLORS DISPLAY HOSTNAME HISTSIZE INPUTRC KDEDIR LS_COLORS", env_keep+="MAIL PS1 PS2 QTDIR USERNAME LANG
    LC_ADDRESS LC_CTYPE", env_keep+="LC_COLLATE LC_IDENTIFICATION LC_MEASUREMENT LC_MESSAGES", env_keep+="LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER LC_TELEPHONE",
    env_keep+="LC_TIME LC_ALL LANGUAGE LINGUAS _XKB_CHARSET XAUTHORITY", secure_path=/sbin\:/bin\:/usr/sbin\:/usr/bin
User maprdev may run the following commands on this host:
    (ALL) NOPASSWD: ALL
[maprdev@maprdemo ~]$
[maprdev@maprdemo ~]$ sudo showmount -e localhost
Export list for localhost:
/mapr                *
/mapr/my.cluster.com *
[maprdev@maprdemo ~]$
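Since the sandbox exports MapR-FS over NFS (see the export list above), you can also mount it from another machine. A minimal sketch, assuming the VM's bridged address is 192.168.1.50 (substitute your own):

# The IP address is an assumption; use your VM's bridged address
sudo mkdir -p /mnt/mapr
sudo mount -t nfs -o nolock 192.168.1.50:/mapr /mnt/mapr
ls /mnt/mapr/my.cluster.com
sudo umount /mnt/mapr   # when finished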


Thursday, July 25, 2013

Life Learning - Pivotal HD

I can't learn as much as people who have spent their careers with these products, but I was interested in something new, for example Pivotal HD, and it was easy to get since it's available as the Pivotal HD Single Node VM.

Pivotal HD is a 100% Apache-compatible Hadoop distribution featuring a fully SQL compliant query engine for processing data stored in Hadoop. By adding rich, mature SQL processing, Pivotal HD allows enterprises to simplify development, expand Hadoop’s capabilities, increase productivity, and cut costs. It has been tested for scale on the 1000 node Pivotal Analytics Workbench to ensure that the stack works flawlessly in large enterprise deployments.
Wow! I thought it wouldn't be bad to learn a little bit about it.
First of all, I downloaded the Pivotal HD Single Node VM and added it to my VirtualBox, then started the VM and checked it out (ssh with "gpadmin"/"password").
[gpadmin@pivhdsne ~]$ pwd
/home/gpadmin
[gpadmin@pivhdsne ~]$ ls
Attic  Desktop  Documents  Downloads  gpAdminLogs  pivotal-samples  workspace
[gpadmin@pivhdsne ~]$ cd Desktop/
[gpadmin@pivhdsne Desktop]$ ls
eclipse  Pivotal_Community_Edition_VM_EULA_20130712_final.pdf  Pivotal_Docs  README  README~  start_piv_hd.sh  stop_piv_hd.sh
[gpadmin@pivhdsne Desktop]$ cat README
Pivotal HD 1.0.1 Single Node (VM)
Version 1

How to use this VM:
1. Start the Hadoop services using start_all.sh on the desktop
2. Follow the tutorials at http://pivotalhd.cfapps.io/getting-started/pivotalhd-vm.html
3. Leverage the Pivotal HD community at http://gopivotal.com/community for support
4. root and gpadmin accounts have password password
5. Command Center login is gpadmin, with password gpadmin
6. gpadmin account has sudo privileges

What is included:
1. Pivotal HD - Hadoop 2.x, Zookeeper, HBase, Hive, Pig, Mahout
2. Pivotal HAWQ
3. Pivotal Extension Framework (PXF)
4. Pivotal DataLoader
5. Product usage documentation

Other installed packages:
1. JDK 6
2. Ant
3. Maven
4. Eclipse

[gpadmin@pivhdsne Desktop]$ ./start_piv_hd.sh
Starting services
SUCCESS: Start complete
Using JAVA_HOME: /usr/java/jdk1.6.0_26
Starting dataloader in standalone mode...
Starting Embedded Zookeeper Server...
Sending output to /var/log/gphd/dataloader/dataloader-embedded-zk.log
Embedded Zookeeper Server started!
Starting dataloader scheduler...
Sending output to /var/log/gphd/dataloader/dataloader-scheduler.log
Dataloader Scheduler Started!
Starting dataloader manager...
Sending output to /var/log/gphd/dataloader/dataloader-manager.log
Dataloader Manager Started!
Dataloader started!
20130725:01:16:18:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Starting gpstart with args: -a
20130725:01:16:19:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Gathering information and validating the environment...
20130725:01:16:32:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (HAWQ) 4.2.0 build 1'
20130725:01:16:41:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Greenplum Catalog Version: '201306170'
20130725:01:16:41:005738 gpstart:pivhdsne:gpadmin-[WARNING]:-postmaster.pid file exists on Master, checking if recovery startup required
20130725:01:16:41:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Commencing recovery startup checks
20130725:01:16:43:005738 gpstart:pivhdsne:gpadmin-[INFO]:-No socket connection or lock file in /tmp found for port=5432
20130725:01:16:43:005738 gpstart:pivhdsne:gpadmin-[INFO]:-No Master instance process, entering recovery startup mode
20130725:01:16:43:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Clearing Master instance pid file
20130725:01:16:43:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Starting Master instance in admin mode
20130725:01:17:18:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20130725:01:17:18:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Obtaining Segment details from master...
20130725:01:17:20:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Setting new master era
20130725:01:17:20:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Commencing forced instance shutdown
20130725:01:17:33:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Starting Master instance in admin mode
20130725:01:17:46:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20130725:01:17:46:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Obtaining Segment details from master...
20130725:01:17:47:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Setting new master era
20130725:01:17:47:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Master Started...
20130725:01:17:47:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Shutting down master
20130725:01:17:59:005738 gpstart:pivhdsne:gpadmin-[INFO]:-No standby master configured.  skipping...
20130725:01:18:00:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Commencing parallel segment instance startup, please wait...
...............
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Process results...
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-----------------------------------------------------
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-   Successful segment starts                                            = 2
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-   Failed segment starts                                                = 0
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-   Skipped segment starts (segments are marked down in configuration)   = 0
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-----------------------------------------------------
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Successfully started 2 of 2 segment instances
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-----------------------------------------------------
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Starting Master instance pivhdsne.localdomain directory /data/1/hawq_master/gpseg-1
20130725:01:18:23:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Command pg_ctl reports Master pivhdsne.localdomain instance active
20130725:01:18:23:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Database successfully started
That was the first step. After reading a bit, I thought I should test a sample. I chose to begin with "Setting up the Development Environment", following the steps from the tutorial link.
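The jar path target/customer_first_and_last_order_dates-1.0.jar suggests the sample is a Maven project, so it has to be built before running. A hedged sketch, assuming the sample sits under the pivotal-samples directory seen in the home directory listing above:

# Hypothetical path; adjust to where the sample actually lives
cd ~/pivotal-samples/customer_first_and_last_order_dates
mvn clean package   # should produce target/customer_first_and_last_order_dates-1.0.jar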
[gpadmin@pivhdsne customer_first_and_last_order_dates]$ hadoop jar target/customer_first_and_last_order_dates-1.0.jar com.pivotal.hadoop.CustomerFirstLastOrderDateDriver /retail_demo/orders/orders.tsv.gz /output-mr2
13/07/25 04:28:22 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/07/25 04:28:24 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/07/25 04:28:25 INFO input.FileInputFormat: Total input paths to process : 1
13/07/25 04:28:26 WARN snappy.LoadSnappy: Snappy native library is available
13/07/25 04:28:26 INFO snappy.LoadSnappy: Snappy native library loaded
13/07/25 04:28:27 INFO mapreduce.JobSubmitter: number of splits:1
13/07/25 04:28:27 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/07/25 04:28:27 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/07/25 04:28:27 WARN conf.Configuration: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
13/07/25 04:28:27 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/07/25 04:28:27 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/07/25 04:28:27 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/07/25 04:28:27 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/07/25 04:28:27 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/07/25 04:28:27 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/07/25 04:28:27 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/07/25 04:28:27 WARN conf.Configuration: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
13/07/25 04:28:27 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/07/25 04:28:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1374729279703_0002
13/07/25 04:28:30 INFO client.YarnClientImpl: Submitted application application_1374729279703_0002 to ResourceManager at pivhdsne/127.0.0.2:8032
13/07/25 04:28:30 INFO mapreduce.Job: The url to track the job: http://pivhdsne:8088/proxy/application_1374729279703_0002/
13/07/25 04:28:30 INFO mapreduce.Job: Running job: job_1374729279703_0002
13/07/25 04:29:01 INFO mapreduce.Job: Job job_1374729279703_0002 running in uber mode : false
13/07/25 04:29:01 INFO mapreduce.Job:  map 0% reduce 0%
13/07/25 04:29:59 INFO mapreduce.Job:  map 3% reduce 0%
13/07/25 04:30:03 INFO mapreduce.Job:  map 10% reduce 0%
13/07/25 04:30:06 INFO mapreduce.Job:  map 22% reduce 0%
13/07/25 04:30:10 INFO mapreduce.Job:  map 33% reduce 0%
13/07/25 04:30:13 INFO mapreduce.Job:  map 45% reduce 0%
13/07/25 04:30:16 INFO mapreduce.Job:  map 52% reduce 0%
13/07/25 04:30:20 INFO mapreduce.Job:  map 59% reduce 0%
13/07/25 04:30:23 INFO mapreduce.Job:  map 66% reduce 0%
13/07/25 04:30:32 INFO mapreduce.Job:  map 100% reduce 0%
13/07/25 04:31:08 INFO mapreduce.Job:  map 100% reduce 33%
13/07/25 04:31:11 INFO mapreduce.Job:  map 100% reduce 66%
13/07/25 04:31:26 INFO mapreduce.Job:  map 100% reduce 67%
13/07/25 04:31:33 INFO mapreduce.Job:  map 100% reduce 68%
13/07/25 04:31:36 INFO mapreduce.Job:  map 100% reduce 69%
13/07/25 04:31:39 INFO mapreduce.Job:  map 100% reduce 74%
13/07/25 04:31:43 INFO mapreduce.Job:  map 100% reduce 78%
13/07/25 04:31:46 INFO mapreduce.Job:  map 100% reduce 82%
13/07/25 04:31:49 INFO mapreduce.Job:  map 100% reduce 87%
13/07/25 04:31:53 INFO mapreduce.Job:  map 100% reduce 91%
13/07/25 04:31:56 INFO mapreduce.Job:  map 100% reduce 96%
13/07/25 04:31:59 INFO mapreduce.Job:  map 100% reduce 98%
13/07/25 04:32:02 INFO mapreduce.Job:  map 100% reduce 100%
13/07/25 04:32:02 INFO mapreduce.Job: Job job_1374729279703_0002 completed successfully
13/07/25 04:32:02 INFO mapreduce.Job: Counters: 43
        File System Counters
                FILE: Number of bytes read=18946633
                FILE: Number of bytes written=38031433
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=72797182
                HDFS: Number of bytes written=11891611
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Rack-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=184466
                Total time spent by all reduces in occupied slots (ms)=257625
        Map-Reduce Framework
                Map input records=512071
                Map output records=512071
                Map output bytes=17922485
                Map output materialized bytes=18946633
                Input split bytes=118
                Combine input records=0
                Combine output records=0
                Reduce input groups=167966
                Reduce shuffle bytes=18946633
                Reduce input records=512071
                Reduce output records=167966
                Spilled Records=1024142
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=2486
                CPU time spent (ms)=67950
                Physical memory (bytes) snapshot=1088774144
                Virtual memory (bytes) snapshot=5095378944
                Total committed heap usage (bytes)=658378752
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=72797064
        File Output Format Counters
                Bytes Written=11891611
[gpadmin@pivhdsne customer_first_and_last_order_dates]$
[gpadmin@pivhdsne customer_first_and_last_order_dates]$ hadoop fs -cat /output-mr2/part-r-00000 | wc -l
167966
[gpadmin@pivhdsne customer_first_and_last_order_dates]$ hadoop fs -cat /output-mr2/part-r-00000 | tail
54992348        6933068175      2010-10-04 02:21:44     6311149380      2010-10-11 18:21:12
54992896        8136297804      2010-10-02 11:33:38     6310999573      2010-10-11 19:01:48
54993581        6311050522      2010-10-11 21:43:05     8122646976      2010-10-14 16:28:00
54993992        6805481711      2010-10-01 15:32:50     8212538352      2010-10-07 00:20:50
54994403        7708210740      2010-10-08 06:41:29     8122646502      2010-10-14 06:36:05
54994814        8136355210      2010-10-02 15:36:27     7708874714      2010-10-08 19:50:08
54994951        6805748378      2010-10-01 05:38:36     8494440118      2010-10-09 04:44:55
54995088        8136355283      2010-10-02 23:29:08     5007019717      2010-10-13 12:32:23
54995225        6805524075      2010-10-01 08:26:01     6933068564      2010-10-04 23:02:24
54995773        6933024646      2010-10-04 11:57:57     5007019751      2010-10-13 03:36:25
[gpadmin@pivhdsne customer_first_and_last_order_dates]$
I forgot to show the URL (http://ip:50000)
I thought Pivotal HD is a good product that I can use to learn about Hadoop and its ecosystem (Hadoop 2.x, Zookeeper, HBase, Hive, Pig, Mahout, and Pivotal HAWQ).
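One more aside: the gpstart log shows HAWQ's Postgres-derived master listening on port 5432, so it should also be reachable with psql. A hedged sketch; the database name gpadmin is an assumption:

# Port 5432 comes from the startup log above; the database name is a guess
psql -h localhost -p 5432 -d gpadmin -c 'SELECT version();'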

Tuesday, July 02, 2013

Oracle 12.1.0.1.0 JDBC - java.sql.SQLException: Could not commit with auto-commit set on

Actually, I haven't pinpointed what's wrong with JDBC on the 12c database; I'll dig into that later. I thought it was a good day for me to learn something new... the 12c database. No! Not yet.
I had just been reading an Apache Sqoop book and was interested in it. So a question popped into my head... use Sqoop to connect to Oracle Database 12c (export data from HDFS into a 12c Pluggable Database).

The point: learn Sqoop and test a bit with the Oracle 12.1.0.1.0 JDBC 4.0 driver.

I began by using "sqoop-1.4.2-cdh4.2.1" for the test. Why? Because I had installed Kiji, which bundles Apache Hadoop + HBase, so it was easy for me ^___________^
I have used Sqoop 1 before; I will learn more about Sqoop 2 later :)
After downloading it, I copied the JDBC driver (Oracle 12.1.0.1.0 JDBC 4.0 compiled with JDK6) from the Oracle 12c home and ran it... and got an error message.
ERROR manager.OracleManager: Failed to rollback transaction
java.sql.SQLException: Could not rollback with auto-commit set on
So I switched to the Oracle 11.2.0.3.0 JDBC 4.0 driver (compiled with JDK6), and that error message went away. I suspect something might be wrong with the Oracle 12.1.0.1.0 JDBC driver. (I will read more about it later.)
My test: Sqoop 1 with a 12c database.
[oracle@test12c ~]$ wget http://archive.cloudera.com/cdh4/cdh/4/sqoop-1.4.2-cdh4.2.1.tar.gz
--2013-07-02 13:52:15--  http://archive.cloudera.com/cdh4/cdh/4/sqoop-1.4.2-cdh4.2.1.tar.gz
Resolving archive.cloudera.com... 184.73.217.71
Connecting to archive.cloudera.com|184.73.217.71|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6294861 (6.0M) [application/x-gzip]
Saving to: “sqoop-1.4.2-cdh4.2.1.tar.gz”

100%[======================================================================================================>] 6,294,861    247K/s   in 2m 49s

2013-07-02 13:55:31 (36.3 KB/s) - “sqoop-1.4.2-cdh4.2.1.tar.gz” saved [6294861/6294861]

[oracle@test12c ~]$

[oracle@test12c ~]$ tar zxf sqoop-1.4.2-cdh4.2.1.tar.gz
[oracle@test12c ~]$ cd sqoop-1.4.2-cdh4.2.1
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ ./bin/sqoop list-databases --connect  jdbc:oracle:thin:@localhost:1521/orcl --username system  -P
+======================================================================+
|      Error: JAVA_HOME is not set and Java could not be found         |
+----------------------------------------------------------------------+
| Please download the latest Sun JDK from the Sun Java web site        |
|       > http://java.sun.com/javase/downloads/ <                      |
|                                                                      |
| HBase requires Java 1.6 or later.                                    |
| NOTE: This script will find Sun Java whether you install using the   |
|       binary or the RPM based installer.                             |
+======================================================================+
+======================================================================+
|      Error: JAVA_HOME is not set and Java could not be found         |
+----------------------------------------------------------------------+
| Please download the latest Sun JDK from the Sun Java web site        |
|       > http://java.sun.com/javase/downloads/ <                      |
|                                                                      |
| Hadoop requires Java 1.6 or later.                                   |
| NOTE: This script will find Sun Java whether you install using the   |
|       binary or the RPM based installer.                             |
+======================================================================+

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ export JAVA_HOME=/u01/app/oracle/product/12.1.0/dbhome_1/jdk
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ export PATH=$JAVA_HOME/bin:$PATH
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ ./bin/sqoop list-databases --connect  jdbc:oracle:thin:@localhost:1521/orcl --username system  -P
Enter password:
13/07/02 14:05:50 INFO manager.SqlManager: Using default fetchSize of 1000
13/07/02 14:05:50 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver
java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver
        at org.apache.sqoop.manager.OracleManager.makeConnection(OracleManager.java:275)
        at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
        at org.apache.sqoop.manager.OracleManager.listDatabases(OracleManager.java:604)
        at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:49)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ cp /u01/app/oracle/product/12.1.0/dbhome_1/jdbc/lib/ojdbc6.jar lib/
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ ./bin/sqoop list-databases --connect  jdbc:oracle:thin:@localhost:1521/orcl --username system  -P
Enter password:
13/07/02 14:06:43 INFO manager.SqlManager: Using default fetchSize of 1000
13/07/02 14:06:45 INFO manager.OracleManager: Time zone has been set to GMT
13/07/02 14:06:45 ERROR manager.OracleManager: Failed to rollback transaction
java.sql.SQLException: Could not rollback with auto-commit set on

        at oracle.jdbc.driver.PhysicalConnection.rollback(PhysicalConnection.java:4506)
        at org.apache.sqoop.manager.OracleManager.listDatabases(OracleManager.java:615)
        at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:49)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
13/07/02 14:06:45 ERROR manager.OracleManager: Failed to list databases
java.sql.SQLException: Could not commit with auto-commit set on

        at oracle.jdbc.driver.PhysicalConnection.commit(PhysicalConnection.java:4439)
        at oracle.jdbc.driver.PhysicalConnection.commit(PhysicalConnection.java:4486)
        at org.apache.sqoop.manager.OracleManager.listDatabases(OracleManager.java:612)
        at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:49)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
AUDSYS
GSMUSER
SPATIAL_WFS_ADMIN_USR
SPATIAL_CSW_ADMIN_USR
APEX_PUBLIC_USER
SYSDG
DIP
SYSBACKUP
MDDATA
GSMCATUSER
SYSKM
XS$NULL
OJVMSYS
C##DB_DBA1
ORACLE_OCM
OLAPSYS
SI_INFORMTN_SCHEMA
DVSYS
ORDPLUGINS
XDB
ANONYMOUS
CTXSYS
ORDDATA
GSMADMIN_INTERNAL
APPQOSSYS
APEX_040200
WMSYS
DBSNMP
ORDSYS
MDSYS
DVF
FLOWS_FILES
SYS
SYSTEM
OUTLN
LBACSYS
It got the data, but with an error: java.sql.SQLException: Could not rollback with auto-commit set on
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ md5sum /u01/app/oracle/product/12.1.0/dbhome_1/jdbc/lib/ojdbc6.jar
b99a9e9b93aa31b787d174a1dbec7cc4  /u01/app/oracle/product/12.1.0/dbhome_1/jdbc/lib/ojdbc6.jar
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ ls -la /u01/app/oracle/product/12.1.0/dbhome_1/jdbc/lib/ojdbc6.jar
-rw-r--r--. 1 oracle oinstall 3389454 Apr  4 15:15 /u01/app/oracle/product/12.1.0/dbhome_1/jdbc/lib/ojdbc6.jar
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ ls -la ~/ojdbc6-11.2.jar
-rw-r--r--. 1 oracle oinstall 2714189 Jul  2 13:29 /home/oracle/ojdbc6-11.2.jar
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ md5sum /home/oracle/ojdbc6-11.2.jar
54c41acb9df6465f45a931fbe9734e1a  /home/oracle/ojdbc6-11.2.jar
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$  java -jar /u01/app/oracle/product/12.1.0/dbhome_1/jdbc/lib/ojdbc6.jar
Oracle 12.1.0.1.0 JDBC 4.0 compiled with JDK6 on Thu_Apr_04_15:06:58_PDT_2013
#Default Connection Properties Resource
#Tue Jul 02 14:10:14 ICT 2013

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ java -jar /home/oracle/ojdbc6-11.2.jar
Oracle 11.2.0.3.0 JDBC 4.0 compiled with JDK6 on Fri_Aug_26_08:19:15_PDT_2011
#Default Connection Properties Resource
#Tue Jul 02 14:10:26 ICT 2013

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ java -jar  lib/ojdbc6.jar
Oracle 12.1.0.1.0 JDBC 4.0 compiled with JDK6 on Thu_Apr_04_15:06:58_PDT_2013
#Default Connection Properties Resource
#Tue Jul 02 14:10:44 ICT 2013

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ rm lib/ojdbc6.jar

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ cp /home/oracle/ojdbc6-11.2.jar lib/
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ java -jar  lib/ojdbc6-11.2.jar
Oracle 11.2.0.3.0 JDBC 4.0 compiled with JDK6 on Fri_Aug_26_08:19:15_PDT_2011
#Default Connection Properties Resource
#Tue Jul 02 14:11:15 ICT 2013

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ ./bin/sqoop list-databases --connect  jdbc:oracle:thin:@localhost:1521/orcl --username system  -P
Enter password:
13/07/02 14:11:30 INFO manager.SqlManager: Using default fetchSize of 1000
13/07/02 14:11:31 INFO manager.OracleManager: Time zone has been set to GMT
AUDSYS
GSMUSER
SPATIAL_WFS_ADMIN_USR
SPATIAL_CSW_ADMIN_USR
APEX_PUBLIC_USER
SYSDG
DIP
SYSBACKUP
MDDATA
GSMCATUSER
SYSKM
XS$NULL
OJVMSYS
C##DB_DBA1
ORACLE_OCM
OLAPSYS
SI_INFORMTN_SCHEMA
DVSYS
ORDPLUGINS
XDB
ANONYMOUS
CTXSYS
ORDDATA
GSMADMIN_INTERNAL
APPQOSSYS
APEX_040200
WMSYS
DBSNMP
ORDSYS
MDSYS
DVF
FLOWS_FILES
SYS
SYSTEM
OUTLN
LBACSYS
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$
It got the data with no error message. So I tried connecting to the Pluggable Database (service "orclpdb"):
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ lsnrctl status

LSNRCTL for Linux: Version 12.1.0.1.0 - Production on 02-JUL-2013 14:46:36

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=test12c)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 12.1.0.1.0 - Production
Start Date                02-JUL-2013 13:49:28
Uptime                    0 days 0 hr. 57 min. 8 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/test12c/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=test12c)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=test12c)(PORT=5500))(Security=(my_wallet_directory=/u01/app/oracle/admin/orcl/xdb_wallet))(Presentation=HTTP)(Session=RAW))
Services Summary...
Service "orcl" has 1 instance(s).
  Instance "orcl", status READY, has 1 handler(s) for this service...
Service "orclXDB" has 1 instance(s).
  Instance "orcl", status READY, has 1 handler(s) for this service...
Service "orclpdb" has 1 instance(s).
  Instance "orcl", status READY, has 1 handler(s) for this service...

Service "y" has 1 instance(s).
  Instance "orcl", status READY, has 1 handler(s) for this service...
The command completed successfully
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ ./bin/sqoop list-databases --connect  jdbc:oracle:thin:@localhost:1521/orclpdb --username system  -P
Enter password:
13/07/02 14:54:29 INFO manager.SqlManager: Using default fetchSize of 1000
13/07/02 14:54:33 INFO manager.OracleManager: Time zone has been set to GMT
SYS
SYSTEM
OLAPSYS
SI_INFORMTN_SCHEMA
PDBADMIN
DVSYS
AUDSYS
GSMUSER
ORDPLUGINS
SPATIAL_WFS_ADMIN_USR
SPATIAL_CSW_ADMIN_USR
XDB
APEX_PUBLIC_USER
SYSDG
DIP
OUTLN
ANONYMOUS
CTXSYS
ORDDATA
SYSBACKUP
MDDATA
GSMCATUSER
GSMADMIN_INTERNAL
LBACSYS
SYSKM
XS$NULL
OJVMSYS
APPQOSSYS
C##DB_DBA1
ORACLE_OCM
APEX_040200
WMSYS
DBSNMP
ORDSYS
MDSYS
DVF
FLOWS_FILES
[oracle@test12c sqoop-1.4.2-cdh4.2.1]

Wow! I used the Oracle 11.2.0.3.0 JDBC 4.0 driver (compiled with JDK6) to connect to an Oracle 12c Pluggable Database.
Something popped up again: use Sqoop 1 to export data from HDFS into the 12c Pluggable Database.
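The export target (user and table) must already exist in the PDB. A minimal setup sketch; the sys password and the column types are assumptions, while the column names a and b come from the --columns option used below:

sqlplus sys/oracle@localhost:1521/orclpdb as sysdba <<'EOF'
-- "oracle" as the sys password is a placeholder; types guessed from the data shown below
create user demo identified by demo quota unlimited on users;
grant connect, resource to demo;
create table demo.tb_demo (a number, b varchar2(30));
exit
EOF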

SQL> show con_name

CON_NAME
------------------------------
ORCLPDB

SQL> select * from demo.tb_demo;

no rows selected

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ hadoop fs -ls hdfs://localhost:8020/user/oracle/input
13/07/02 15:47:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   3 oracle supergroup         25 2013-07-02 15:09 hdfs://localhost:8020/user/oracle/input/1.txt
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ hadoop fs -cat hdfs://localhost:8020/user/oracle/input/1.txt
13/07/02 15:47:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1       A1
2       A2
3       A3
4       A4
5       A5
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ ./bin/sqoop export  --direct  --connect   jdbc:oracle:thin:@localhost:1521/orclpdb --table demo.tb_demo --username  demo --password demo -export-dir /user/oracle/input  --input-fields-terminated-by "\t"  --columns a,b
13/07/02 15:45:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/07/02 15:45:35 INFO manager.SqlManager: Using default fetchSize of 1000
13/07/02 15:45:35 INFO tool.CodeGenTool: Beginning code generation
13/07/02 15:45:36 INFO manager.OracleManager: Time zone has been set to GMT
13/07/02 15:45:37 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM demo.tb_demo t WHERE 1=0
13/07/02 15:45:37 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/oracle/kiji-bento-buri/cluster/bin/../lib/hadoop-2.0.0-mr1-cdh4.2.1
Note: /tmp/sqoop-oracle/compile/e2f345b29efbc16876b18ee822dcc33c/demo_tb_demo.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/07/02 15:45:40 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-oracle/compile/e2f345b29efbc16876b18ee822dcc33c/demo.tb_demo.jar
13/07/02 15:45:40 INFO mapreduce.ExportJobBase: Beginning export of demo.tb_demo
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/oracle/kiji-bento-buri/cluster/lib/hadoop-2.0.0-mr1-cdh4.2.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/oracle/kiji-bento-buri/cluster/lib/hbase-0.94.2-cdh4.2.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/07/02 15:45:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/07/02 15:45:44 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/02 15:45:45 INFO input.FileInputFormat: Total input paths to process : 1
13/07/02 15:45:45 INFO input.FileInputFormat: Total input paths to process : 1
13/07/02 15:45:46 INFO mapred.JobClient: Running job: job_20130702150710259_0002
13/07/02 15:45:47 INFO mapred.JobClient:  map 0% reduce 0%
13/07/02 15:46:08 INFO mapred.JobClient:  map 25% reduce 0%
13/07/02 15:46:14 INFO mapred.JobClient:  map 50% reduce 0%
13/07/02 15:46:22 INFO mapred.JobClient:  map 75% reduce 0%
13/07/02 15:46:27 INFO mapred.JobClient:  map 100% reduce 0%
13/07/02 15:46:31 INFO mapred.JobClient: Job complete: job_20130702150710259_0002
13/07/02 15:46:31 INFO mapred.JobClient: Counters: 24
13/07/02 15:46:31 INFO mapred.JobClient:   File System Counters
13/07/02 15:46:31 INFO mapred.JobClient:     FILE: Number of bytes read=0
13/07/02 15:46:31 INFO mapred.JobClient:     FILE: Number of bytes written=930532
13/07/02 15:46:31 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/02 15:46:31 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/02 15:46:31 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/02 15:46:31 INFO mapred.JobClient:     HDFS: Number of bytes read=544
13/07/02 15:46:31 INFO mapred.JobClient:     HDFS: Number of bytes written=0
13/07/02 15:46:31 INFO mapred.JobClient:     HDFS: Number of read operations=19
13/07/02 15:46:31 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/02 15:46:31 INFO mapred.JobClient:     HDFS: Number of write operations=0
13/07/02 15:46:31 INFO mapred.JobClient:   Job Counters
13/07/02 15:46:31 INFO mapred.JobClient:     Launched map tasks=4
13/07/02 15:46:31 INFO mapred.JobClient:     Rack-local map tasks=4
13/07/02 15:46:31 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=70714
13/07/02 15:46:31 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
13/07/02 15:46:31 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/02 15:46:31 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/02 15:46:31 INFO mapred.JobClient:   Map-Reduce Framework
13/07/02 15:46:31 INFO mapred.JobClient:     Map input records=5
13/07/02 15:46:31 INFO mapred.JobClient:     Map output records=5
13/07/02 15:46:31 INFO mapred.JobClient:     Input split bytes=461
13/07/02 15:46:31 INFO mapred.JobClient:     Spilled Records=0
13/07/02 15:46:31 INFO mapred.JobClient:     CPU time spent (ms)=5040
13/07/02 15:46:31 INFO mapred.JobClient:     Physical memory (bytes) snapshot=414687232
13/07/02 15:46:31 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=4217012224
13/07/02 15:46:31 INFO mapred.JobClient:     Total committed heap usage (bytes)=251396096
13/07/02 15:46:31 INFO mapreduce.ExportJobBase: Transferred 544 bytes in 47.8912 seconds (11.3591 bytes/sec)
13/07/02 15:46:31 INFO mapreduce.ExportJobBase: Exported 5 records.

SQL> show con_name

CON_NAME
------------------------------
ORCLPDB

SQL>  select * from demo.tb_demo;

         A B
---------- ------------------------------
         1 A1
         2 A2
         3 A3
         4 A4
         5 A5

SQL>
However, I'm supposed to use the Oracle 12.1.0.1.0 JDBC 4.0 driver (compiled with JDK6), so I tried the export again with it.

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ rm lib/ojdbc6-11.2.jar

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ cp /u01/app/oracle/product/12.1.0/dbhome_1/jdbc/lib/ojdbc6.jar  lib/

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ java -jar lib/ojdbc6.jar
Oracle 12.1.0.1.0 JDBC 4.0 compiled with JDK6 on Thu_Apr_04_15:06:58_PDT_2013
#Default Connection Properties Resource
#Tue Jul 02 15:54:35 ICT 2013

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ ./bin/sqoop export  --direct  --connect   jdbc:oracle:thin:@localhost:1521/orclpdb --table demo.tb_demo --username  demo --password demo -export-dir /user/oracle/input  --input-fields-terminated-by "\t"  --columns a,b
13/07/02 15:50:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/07/02 15:50:35 INFO manager.SqlManager: Using default fetchSize of 1000
13/07/02 15:50:35 INFO tool.CodeGenTool: Beginning code generation
13/07/02 15:50:37 INFO manager.OracleManager: Time zone has been set to GMT
13/07/02 15:50:37 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM demo.tb_demo t WHERE 1=0
13/07/02 15:50:37 WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on
13/07/02 15:50:37 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/oracle/kiji-bento-buri/cluster/bin/../lib/hadoop-2.0.0-mr1-cdh4.2.1
Note: /tmp/sqoop-oracle/compile/00bc0bcd5dfe349e2c7383db51763a91/demo_tb_demo.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/07/02 15:50:40 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-oracle/compile/00bc0bcd5dfe349e2c7383db51763a91/demo.tb_demo.jar
13/07/02 15:50:40 INFO mapreduce.ExportJobBase: Beginning export of demo.tb_demo
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/oracle/kiji-bento-buri/cluster/lib/hadoop-2.0.0-mr1-cdh4.2.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/oracle/kiji-bento-buri/cluster/lib/hbase-0.94.2-cdh4.2.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/07/02 15:50:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/07/02 15:50:44 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/02 15:50:46 INFO input.FileInputFormat: Total input paths to process : 1
13/07/02 15:50:46 INFO input.FileInputFormat: Total input paths to process : 1
13/07/02 15:50:46 INFO mapred.JobClient: Running job: job_20130702150710259_0003
13/07/02 15:50:47 INFO mapred.JobClient:  map 0% reduce 0%
13/07/02 15:51:16 INFO mapred.JobClient:  map 25% reduce 0%
13/07/02 15:51:20 INFO mapred.JobClient:  map 50% reduce 0%
13/07/02 15:51:32 INFO mapred.JobClient:  map 75% reduce 0%
13/07/02 15:51:40 INFO mapred.JobClient:  map 100% reduce 0%
13/07/02 15:51:45 INFO mapred.JobClient: Job complete: job_20130702150710259_0003
13/07/02 15:51:45 INFO mapred.JobClient: Counters: 24
13/07/02 15:51:45 INFO mapred.JobClient:   File System Counters
13/07/02 15:51:45 INFO mapred.JobClient:     FILE: Number of bytes read=0
13/07/02 15:51:45 INFO mapred.JobClient:     FILE: Number of bytes written=930436
13/07/02 15:51:45 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/02 15:51:45 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/02 15:51:45 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/02 15:51:45 INFO mapred.JobClient:     HDFS: Number of bytes read=544
13/07/02 15:51:45 INFO mapred.JobClient:     HDFS: Number of bytes written=0
13/07/02 15:51:45 INFO mapred.JobClient:     HDFS: Number of read operations=19
13/07/02 15:51:45 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/02 15:51:45 INFO mapred.JobClient:     HDFS: Number of write operations=0
13/07/02 15:51:45 INFO mapred.JobClient:   Job Counters
13/07/02 15:51:45 INFO mapred.JobClient:     Launched map tasks=4
13/07/02 15:51:45 INFO mapred.JobClient:     Rack-local map tasks=4
13/07/02 15:51:45 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=94162
13/07/02 15:51:45 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
13/07/02 15:51:45 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/02 15:51:45 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/02 15:51:45 INFO mapred.JobClient:   Map-Reduce Framework
13/07/02 15:51:45 INFO mapred.JobClient:     Map input records=5
13/07/02 15:51:45 INFO mapred.JobClient:     Map output records=5
13/07/02 15:51:45 INFO mapred.JobClient:     Input split bytes=461
13/07/02 15:51:45 INFO mapred.JobClient:     Spilled Records=0
13/07/02 15:51:45 INFO mapred.JobClient:     CPU time spent (ms)=6030
13/07/02 15:51:45 INFO mapred.JobClient:     Physical memory (bytes) snapshot=418398208
13/07/02 15:51:45 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=4222406656
13/07/02 15:51:45 INFO mapred.JobClient:     Total committed heap usage (bytes)=251396096
13/07/02 15:51:45 INFO mapreduce.ExportJobBase: Transferred 544 bytes in 61.7838 seconds (8.8049 bytes/sec)
13/07/02 15:51:45 INFO mapreduce.ExportJobBase: Exported 5 records.

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ exit
exit

SQL> l
  1*  select * from demo.tb_demo
SQL> /

         A B
---------- ------------------------------
         1 A1
         2 A2
         3 A3
         4 A4
         1 A1
         2 A2
         3 A3
         4 A4
         5 A5
         5 A5

10 rows selected.
When I tested the Sqoop export with the Oracle 12.1.0.1.0 JDBC 4.0 driver (compiled with JDK6), I found no error, only "WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on". The export completed, and the table now holds the rows from both export runs (10 rows). I thought that was all right anyway.
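For the reverse direction (pulling the table from the PDB back into HDFS), the matching Sqoop import would look roughly like this; a hedged sketch reusing the same connection details (the target directory is arbitrary, and -m 1 avoids needing a split column since the table has no primary key):

# DEMO.TB_DEMO in uppercase follows Oracle's default identifier case
./bin/sqoop import --connect jdbc:oracle:thin:@localhost:1521/orclpdb --username demo -P --table DEMO.TB_DEMO --target-dir /user/oracle/tb_demo_out -m 1
hadoop fs -cat /user/oracle/tb_demo_out/part-m-00000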


Related posts:
Learned a little bit about importing data from MySQL into HDFS using Sqoop
Learned Kiji in a few minutes

Monday, June 24, 2013

Learned Kiji in a few minutes

I had spent a bit of time learning "Introduction to Data Science". Today I got to know the Kiji project. Kiji is built on Apache HBase and Apache Hadoop, so I thought it would be a good way to learn HBase and Hadoop. You can read more about the architecture on the website. So I spent a little time with it; it's easy to install and use.
[surachart@linux01 ~]$ wget http://archive.kiji.org/tarballs/kiji-bento-albacore-1.0.5-release.tar.gz
[surachart@linux01 ~]$ tar xzf kiji-bento-*.tar.gz
[surachart@linux01 ~]$ cd kiji-bento-albacore
[surachart@linux01 kiji-bento-albacore]$ source bin/kiji-env.sh
Set KIJI_HOME=/home/surachart/kiji-bento-albacore/bin/..
Set KIJI_MR_HOME=/home/surachart/kiji-bento-albacore/bin/..
Set SCHEMA_SHELL_HOME=/home/surachart/kiji-bento-albacore/bin/../schema-shell
Set EXPRESS_HOME=/home/surachart/kiji-bento-albacore/bin/../express
Added kiji and kiji-schema-shell binaries to PATH.
Set BENTO_CLUSTER_HOME=/home/surachart/kiji-bento-albacore/cluster/bin/..
Set HADOOP_HOME=/home/surachart/kiji-bento-albacore/cluster/bin/../lib/hadoop-2.0.0-mr1-cdh4.1.2
Set HADOOP_CONF_DIR=/home/surachart/kiji-bento-albacore/cluster/bin/../lib/hadoop-2.0.0-mr1-cdh4.1.2/conf
Set HBASE_HOME=/home/surachart/kiji-bento-albacore/cluster/bin/../lib/hbase-0.92.1-cdh4.1.2
Set HBASE_CONF_DIR=/home/surachart/kiji-bento-albacore/cluster/bin/../lib/hbase-0.92.1-cdh4.1.2/conf
Added Hadoop, HBase, and bento-cluster binaries to PATH.

[surachart@linux01 kiji-bento-albacore]$ bento start
Running bento-cluster port configuration utility.

Hadoop configuration directory is: /home/surachart/kiji-bento-albacore/cluster/bin/../lib/hadoop-2.0.0-mr1-cdh4.1.2/conf

HBase configuration directory is: /home/surachart/kiji-bento-albacore/cluster/bin/../lib/hbase-0.92.1-cdh4.1.2/conf

Checking if default Hadoop/HBase ports are open...
Default Hadoop/HBase ports are open.
Using these in the Hadoop/HBase configuration for your cluster.

Writing bento-managed configuration.
Writing clean core-site.xml
Writing clean hdfs-site.xml
Writing clean mapred-site.xml
Writing clean hbase-site.xml
Configuration complete.

Starting bento-cluster...
Waiting for clusters to start...
bento-cluster started.

Cluster webapps can be visited at these web addresses:
HDFS NameNode:         http://localhost:50070
MapReduce JobTracker:  http://localhost:50030
HBase Master:          http://localhost:60010

Cluster services are available on the following ports:
HDFS NameNode:        8020
MapReduce JobTracker: 8021
Zookeeper:            2181

[surachart@linux01 kiji-bento-albacore]$ bin/kiji install
Creating kiji instance: kiji://localhost:2181/default/
Creating meta tables for kiji instance in hbase...
13/06/24 12:36:33 INFO org.kiji.schema.KijiInstaller: Installing kiji instance '     '.
13/06/24 12:36:41 INFO org.kiji.schema.KijiInstaller: Installed kiji instance 'kiji://localhost:2181/default/'.
Successfully created kiji instance: kiji://localhost:2181/default/
[surachart@linux01 kiji-bento-albacore]$
[surachart@linux01 kiji-bento-albacore]$ kiji-schema-shell
Kiji schema shell v1.0.0
Enter 'help' for instructions (without quotes).
Enter 'quit' to quit.
DDL statements must be terminated with a ';'
schema> CREATE TABLE users WITH DESCRIPTION 'A table for user names and email addresses'
     ->     ROW KEY FORMAT HASH PREFIXED(2)
     ->     WITH LOCALITY GROUP default WITH DESCRIPTION 'main storage' (
     ->       MAXVERSIONS = INFINITY,
     ->       TTL = FOREVER,
     ->       INMEMORY = false,
     ->       COMPRESSED WITH GZIP,
     ->       FAMILY info WITH DESCRIPTION 'basic information' (
     ->         name "string" WITH DESCRIPTION 'the user\'s name',
     ->         email "string"));
OK.
schema> show tables;
Table           Description
=========       ==========================================
users           A table for user names and email addresses
schema> describe users;
Table: users (A table for user names and email addresses)
Row key:
        key: STRING NOT NULL

Column family: info
        Description: basic information

        Column info:name (the user's name)
                Schema: "string"

        Column info:email ()
                Schema: "string"
schema> quit
Thank you for flying Kiji!
[surachart@linux01 kiji-bento-albacore]$ kiji ls
kiji://localhost:2181/default/users

[surachart@linux01 kiji-bento-albacore]$ kiji put     --target=kiji://.env/default/users/info:name     --entity-id='"surachart@gmail.com"'     --value='"Surachart Opun"'
[surachart@linux01 kiji-bento-albacore]$ kiji put     --target=kiji://.env/default/users/info:email     --entity-id='"surachart@gmail.com"'     --value='"surachart@gmail.com"'
[surachart@linux01 kiji-bento-albacore]$
[surachart@linux01 kiji-bento-albacore]$ kiji scan default/users
Scanning kiji table: kiji://localhost:2181/default/users/
entity-id=['surachart@gmail.com'] [1372056917172] info:name
                                 Surachart Opun
entity-id=['surachart@gmail.com'] [1372057025106] info:email
                                 surachart@gmail.com

[surachart@linux01 kiji-bento-albacore]$
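To read a single row back instead of scanning the whole table, KijiSchema also ships a get tool. A hedged sketch; I'm assuming the flags mirror the put tool above, so check kiji get --help on your version:

kiji get kiji://.env/default/users --entity-id='"surachart@gmail.com"'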
Note: I think this open-source framework version was built against Cloudera CDH 4.1.2.  ^_______^

Wednesday, May 29, 2013

Learn a little bit - Pig

I've been learning Introduction to Data Science through video lectures that talk about Pig. The teacher uses Pig to explain more about MapReduce, so I needed to install Pig on my VirtualBox (Hadoop test) machine. Anyway, it's not difficult to install and test.
First, I chose to download the binary release and test it a little.
[surachart@linux01 ~]$ wget http://apache.cs.utah.edu/pig/stable/pig-0.11.1.tar.gz
[surachart@linux01 ~]$ tar zxf pig-0.11.1.tar.gz
[surachart@linux01 ~]$ ln -s  pig-0.11.1 pig
[surachart@linux01 ~]$ export PATH=$PATH:$HOME/pig/bin
[surachart@linux01 ~]$ pig -x local
2013-05-29 15:22:27,126 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-05-29 15:22:27,129 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/surachart/pig_1369815747107.log
2013-05-29 15:22:27,264 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/surachart/.pigbootup not found
2013-05-29 15:22:27,717 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt> quit
[surachart@linux01 ~]$ pig -x mapreduce
2013-05-29 15:22:46,385 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-05-29 15:22:46,389 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/surachart/pig_1369815766361.log
2013-05-29 15:22:46,531 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/surachart/.pigbootup not found
2013-05-29 15:22:47,436 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2013-05-29 15:22:49,820 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
grunt> quit
It took me about 10 minutes from download to being able to use it. Then I downloaded pig-wordcount for a word-count test.
[surachart@linux01 ~]$ tar xf  pig-wordcount-7-26.tar
[surachart@linux01 ~]$ cd pig-wordcount
[surachart@linux01 pig-wordcount]$ ls
input.txt  readme  wordcount.pig
[surachart@linux01 pig-wordcount]$ cat readme
readme file for Pig tutorial

1) Run Pig Word Count using Local Mode
        bin/pig -x local wordcount.pig
2) Run Pig Word Count using Hadoop Mode
        a.configure Hadoop cluster
        b.bin/pig -x mapreduce wordcount.pig
[surachart@linux01 pig-wordcount]$ cat wordcount.pig
A = load './input.txt';
B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
C = group B by word;
D = foreach C generate COUNT(B), group;
store D into './wordcount';

[surachart@linux01 pig-wordcount]$ pig -x local wordcount.pig
2013-05-29 16:40:25,113 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-05-29 16:40:25,117 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/surachart/pig-wordcount/pig_1369820425090.log
2013-05-29 16:40:26,303 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/surachart/.pigbootup not found
2013-05-29 16:40:26,759 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2013-05-29 16:40:30,606 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2013-05-29 16:40:31,258 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2013-05-29 16:40:31,348 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2013-05-29 16:40:31,462 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2013-05-29 16:40:31,463 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2013-05-29 16:40:31,605 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-05-29 16:40:31,688 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-05-29 16:40:31,712 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2013-05-29 16:40:31,730 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=93
2013-05-29 16:40:31,731 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2013-05-29 16:40:31,891 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2013-05-29 16:40:31,969 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2013-05-29 16:40:31,969 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2013-05-29 16:40:31,970 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Distributed cache not supported or needed in local mode. Setting key [pig.schematuple.local.dir] with code temp directory: /tmp/1369820431968-0
2013-05-29 16:40:32,404 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2013-05-29 16:40:32,499 [JobControl] INFO  org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2013-05-29 16:40:32,536 [JobControl] WARN  org.apache.hadoop.mapred.JobClient - No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2013-05-29 16:40:32,753 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-05-29 16:40:32,760 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2013-05-29 16:40:32,841 [JobControl] WARN  org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
2013-05-29 16:40:32,856 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2013-05-29 16:40:32,912 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2013-05-29 16:40:34,450 [Thread-3] INFO  org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks
2013-05-29 16:40:34,471 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local2053039924_0001_m_000000_0
2013-05-29 16:40:34,728 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local2053039924_0001
2013-05-29 16:40:34,729 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C,D
2013-05-29 16:40:34,729 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],B[2,4],D[4,4],C[3,4] C: D[4,4],C[3,4] R: D[4,4]
2013-05-29 16:40:34,864 [pool-1-thread-1] INFO  org.apache.hadoop.util.ProcessTree - setsid exited with exit code 0
2013-05-29 16:40:34,911 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.Task -  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@5b76de14
2013-05-29 16:40:34,982 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1
Total Length = 93
Input split[0]:
   Length = 93
  Locations:

-----------------------

2013-05-29 16:40:35,029 [pool-1-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/home/surachart/pig-wordcount/input.txt:0+93
2013-05-29 16:40:35,101 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2013-05-29 16:40:35,184 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2013-05-29 16:40:35,186 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2013-05-29 16:40:35,384 [pool-1-thread-1] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2013-05-29 16:40:35,487 [pool-1-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: A[1,4],B[2,4],D[4,4],C[3,4] C: D[4,4],C[3,4] R: D[4,4]
2013-05-29 16:40:35,558 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
2013-05-29 16:40:35,748 [pool-1-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine - Aliases being processed per job phase (AliasName[line,offset]): M: A[1,4],B[2,4],D[4,4],C[3,4] C: D[4,4],C[3,4] R: D[4,4]
2013-05-29 16:40:35,782 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.MapTask - Finished spill 0
2013-05-29 16:40:35,808 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local2053039924_0001_m_000000_0 is done. And is in the process of commiting
2013-05-29 16:40:35,843 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2013-05-29 16:40:35,846 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.Task - Task 'attempt_local2053039924_0001_m_000000_0' done.
2013-05-29 16:40:35,846 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local2053039924_0001_m_000000_0
2013-05-29 16:40:35,847 [Thread-3] INFO  org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2013-05-29 16:40:35,968 [Thread-3] INFO  org.apache.hadoop.mapred.Task -  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3ebc312f
2013-05-29 16:40:35,974 [Thread-3] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2013-05-29 16:40:35,998 [Thread-3] INFO  org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
2013-05-29 16:40:36,075 [Thread-3] INFO  org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 1 segments left of total size: 156 bytes
2013-05-29 16:40:36,082 [Thread-3] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2013-05-29 16:40:36,169 [Thread-3] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2013-05-29 16:40:36,235 [Thread-3] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - Aliases being processed per job phase (AliasName[line,offset]): M: A[1,4],B[2,4],D[4,4],C[3,4] C: D[4,4],C[3,4] R: D[4,4]
2013-05-29 16:40:36,247 [Thread-3] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local2053039924_0001_r_000000_0 is done. And is in the process of commiting
2013-05-29 16:40:36,257 [Thread-3] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2013-05-29 16:40:36,258 [Thread-3] INFO  org.apache.hadoop.mapred.Task - Task attempt_local2053039924_0001_r_000000_0 is allowed to commit now
2013-05-29 16:40:36,275 [Thread-3] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local2053039924_0001_r_000000_0' to file:/home/surachart/pig-wordcount/wordcount
2013-05-29 16:40:36,291 [Thread-3] INFO  org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2013-05-29 16:40:36,292 [Thread-3] INFO  org.apache.hadoop.mapred.Task - Task 'attempt_local2053039924_0001_r_000000_0' done.
2013-05-29 16:40:36,840 [main] WARN  org.apache.pig.tools.pigstats.PigStatsUtil - Failed to get RunningJob for job job_local2053039924_0001
2013-05-29 16:40:36,853 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-05-29 16:40:36,854 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats reported below may be incomplete
2013-05-29 16:40:36,868 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
1.2.0   0.11.1  surachart       2013-05-29 16:40:31     2013-05-29 16:40:36     GROUP_BY

Success!

Job Stats (time in seconds):
JobId   Alias   Feature Outputs
job_local2053039924_0001        A,B,C,D GROUP_BY,COMBINER       file:///home/surachart/pig-wordcount/wordcount,

Input(s):
Successfully read records from: "file:///home/surachart/pig-wordcount/input.txt"

Output(s):
Successfully stored records in: "file:///home/surachart/pig-wordcount/wordcount"

Job DAG:
job_local2053039924_0001


2013-05-29 16:40:36,887 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
[surachart@linux01 pig-wordcount]$ cat wordcount/part-r-00000
2       in
1       for
2       pig
2       2012
1       word
1       count
2       school
2       summer
1       indiana
2       tutorial

[surachart@linux01 pig-wordcount]$
[surachart@linux01 pig-wordcount]$ hadoop dfs -put input.txt .
[surachart@linux01 pig-wordcount]$ hadoop dfs -cat input.txt
summer school 2012 in indiana
pig tutorial for summer school 2012
word count in pig tutorial
[surachart@linux01 pig-wordcount]$ less readme
[surachart@linux01 pig-wordcount]$ pig -x mapreduce wordcount.pig
2013-05-29 16:41:51,628 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-05-29 16:41:51,636 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/surachart/pig-wordcount/pig_1369820511607.log
2013-05-29 16:41:52,631 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/surachart/.pigbootup not found
2013-05-29 16:41:53,493 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2013-05-29 16:41:54,952 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
2013-05-29 16:41:57,669 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2013-05-29 16:41:58,250 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2013-05-29 16:41:58,340 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2013-05-29 16:41:58,448 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2013-05-29 16:41:58,448 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2013-05-29 16:41:58,806 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-05-29 16:41:58,895 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-05-29 16:41:58,903 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2013-05-29 16:41:58,913 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=93
2013-05-29 16:41:58,913 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2013-05-29 16:41:58,916 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job7981723318394905164.jar
2013-05-29 16:42:11,650 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job7981723318394905164.jar created
2013-05-29 16:42:11,720 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2013-05-29 16:42:11,749 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2013-05-29 16:42:11,750 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2013-05-29 16:42:11,755 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2013-05-29 16:42:12,129 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2013-05-29 16:42:12,634 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2013-05-29 16:42:13,430 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-05-29 16:42:13,435 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2013-05-29 16:42:13,495 [JobControl] INFO  org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2013-05-29 16:42:13,496 [JobControl] WARN  org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
2013-05-29 16:42:13,507 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2013-05-29 16:42:15,162 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201305291026_0016
2013-05-29 16:42:15,162 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C,D
2013-05-29 16:42:15,162 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],B[2,4],D[4,4],C[3,4] C: D[4,4],C[3,4] R: D[4,4]
2013-05-29 16:42:15,163 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201305291026_0016
2013-05-29 16:42:36,017 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2013-05-29 16:43:05,312 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-05-29 16:43:05,323 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
1.2.0   0.11.1  surachart       2013-05-29 16:41:58     2013-05-29 16:43:05     GROUP_BY

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime        Alias   Feature Outputs
job_201305291026_0016   1       1       10      10      10      10      19      19      19      19      A,B,C,D GROUP_BY,COMBINER       hdfs://localhost:9000/user/surachart/wordcount,

Input(s):
Successfully read 3 records (458 bytes) from: "hdfs://localhost:9000/user/surachart/input.txt"

Output(s):
Successfully stored 10 records (78 bytes) in: "hdfs://localhost:9000/user/surachart/wordcount"

Counters:
Total records written : 10
Total bytes written : 78
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201305291026_0016


2013-05-29 16:43:05,408 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
[surachart@linux01 pig-wordcount]$ hadoop dfs -cat  wordcount/part-r-00000
2       in
1       for
2       pig
2       2012
1       word
1       count
2       school
2       summer
1       indiana
2       tutorial

[surachart@linux01 pig-wordcount]$
In the examples above, I used the "wordcount.pig" script, but it didn't sort the words as I wanted. So I changed it a bit and tested again.
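Judging from the aliases A–D and their line offsets in the log output above, the unsorted wordcount.pig was roughly as follows (my reconstruction, not necessarily the exact file):

-- wordcount.pig (reconstructed from the log aliases above; the real file may differ)
A = load './input.txt';                                          -- read one line per tuple
B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word; -- split each line into words
C = group B by word;                                             -- collect identical words together
D = foreach C generate COUNT(B), group;                          -- count each word
store D into './wordcount';

The fix is simply an ORDER statement between the GROUP and the final FOREACH, entered interactively in the grunt> session below.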
[surachart@linux01 ~]$ pig -x mapreduce
2013-05-29 19:07:00,396 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-05-29 19:07:00,412 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/surachart/pig_1369829220356.log
2013-05-29 19:07:00,565 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/surachart/.pigbootup not found
2013-05-29 19:07:01,429 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2013-05-29 19:07:03,754 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
grunt> A = load './input.txt';
grunt> B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
grunt> C = group B by word;
grunt> D = ORDER C BY $0;
grunt> E = foreach D generate COUNT(B), group;
grunt> store E into './wordcount-opun';
2013-05-29 19:07:48,256 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,ORDER_BY
2013-05-29 19:07:49,056 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2013-05-29 19:07:49,355 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2013-05-29 19:07:49,357 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
2013-05-29 19:07:49,726 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-05-29 19:07:49,799 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-05-29 19:07:49,810 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2013-05-29 19:07:49,819 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=93
2013-05-29 19:07:49,819 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2013-05-29 19:07:49,821 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job8101458691848291674.jar
2013-05-29 19:08:03,816 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job8101458691848291674.jar created
2013-05-29 19:08:03,914 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2013-05-29 19:08:03,950 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2013-05-29 19:08:03,950 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2013-05-29 19:08:03,970 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2013-05-29 19:08:04,331 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2013-05-29 19:08:04,834 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2013-05-29 19:08:05,791 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-05-29 19:08:05,797 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2013-05-29 19:08:05,866 [JobControl] INFO  org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2013-05-29 19:08:05,868 [JobControl] WARN  org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
2013-05-29 19:08:05,885 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2013-05-29 19:08:07,448 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201305291026_0021
2013-05-29 19:08:07,448 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C
2013-05-29 19:08:07,449 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],B[2,4],C[3,4] C:  R:
2013-05-29 19:08:07,449 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201305291026_0021
2013-05-29 19:08:30,911 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 16% complete
2013-05-29 19:08:50,200 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
2013-05-29 19:09:02,519 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-05-29 19:09:02,523 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-05-29 19:09:02,527 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2013-05-29 19:09:02,587 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=112001
2013-05-29 19:09:02,594 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2013-05-29 19:09:02,604 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job9027641740102345362.jar
2013-05-29 19:09:15,302 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job9027641740102345362.jar created
2013-05-29 19:09:15,354 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2013-05-29 19:09:15,358 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2013-05-29 19:09:15,358 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2013-05-29 19:09:15,359 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2013-05-29 19:09:15,507 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2013-05-29 19:09:16,038 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-05-29 19:09:16,040 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2013-05-29 19:09:16,047 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2013-05-29 19:09:17,156 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201305291026_0022
2013-05-29 19:09:17,157 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases D
2013-05-29 19:09:17,157 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: D[4,4] C:  R:
2013-05-29 19:09:17,158 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201305291026_0022
2013-05-29 19:09:41,586 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2013-05-29 19:10:06,039 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
2013-05-29 19:10:21,970 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-05-29 19:10:21,977 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-05-29 19:10:21,978 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2013-05-29 19:10:21,982 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job4575885537034789195.jar
2013-05-29 19:10:35,364 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job4575885537034789195.jar created
2013-05-29 19:10:35,412 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2013-05-29 19:10:35,415 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2013-05-29 19:10:35,415 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2013-05-29 19:10:35,416 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2013-05-29 19:10:35,542 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2013-05-29 19:10:36,182 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-05-29 19:10:36,183 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2013-05-29 19:10:36,192 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2013-05-29 19:10:37,290 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201305291026_0023
2013-05-29 19:10:37,290 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases D,E
2013-05-29 19:10:37,291 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: D[4,4] C:  R: E[5,4]
2013-05-29 19:10:37,291 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201305291026_0023
2013-05-29 19:11:02,322 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 83% complete
2013-05-29 19:11:07,401 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 83% complete
2013-05-29 19:11:37,447 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-05-29 19:11:37,458 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
1.2.0   0.11.1  surachart       2013-05-29 19:07:49     2013-05-29 19:11:37     GROUP_BY,ORDER_BY

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime        Alias   Feature Outputs
job_201305291026_0021   1       1       11      11      11      11      19      19      19      19      A,B,C   GROUP_BY
job_201305291026_0022   1       1       13      13      13      13      24      24      24      24      D       SAMPLER
job_201305291026_0023   1       1       12      12      12      12      20      20      20      20      D,E     ORDER_BY        hdfs://localhost:9000/user/surachart/wordcount-opun,

Input(s):
Successfully read 3 records (458 bytes) from: "hdfs://localhost:9000/user/surachart/input.txt"

Output(s):
Successfully stored 10 records (78 bytes) in: "hdfs://localhost:9000/user/surachart/wordcount-opun"

Counters:
Total records written : 10
Total bytes written : 78
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201305291026_0021   ->      job_201305291026_0022,
job_201305291026_0022   ->      job_201305291026_0023,
job_201305291026_0023


2013-05-29 19:11:37,585 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
grunt> quit
[surachart@linux01 ~]$ hadoop fs -cat wordcount-opun/part*
2       2012
1       count
1       for
2       in
1       indiana
2       pig
2       school
2       summer
2       tutorial
1       word
Finally, I got the result sorted as I wanted ^_________^ Note from the Job Stats and Job DAG above that the ORDER BY cost two extra MapReduce jobs: a SAMPLER job (job_201305291026_0022) that picks range-partition boundaries for the sort, plus the ORDER_BY job itself.
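For convenience, the same statements can be saved as a standalone script and run non-interactively. A sketch, using a hypothetical filename wordcount-sorted.pig:

-- wordcount-sorted.pig (hypothetical filename; same statements as the grunt session above)
A = load './input.txt';
B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
C = group B by word;
D = ORDER C BY $0;                        -- sort the groups alphabetically by word
E = foreach D generate COUNT(B), group;   -- emit (count, word) pairs
store E into './wordcount-opun';

Run it with: pig -x mapreduce wordcount-sorted.pig. To sort by frequency instead of alphabetically, an ORDER on the counted relation should do it, e.g. F = ORDER E BY $0 DESC; before the store (untested here).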
Read more in the Pig documentation.