Showing posts with label hdfs. Show all posts
Showing posts with label hdfs. Show all posts

Monday, October 06, 2014

rsyslog: Send logs to Flume

Good day for learning something new. After read Flume book, that something popped up in my head. Wanted to test "rsyslog" => Flume => HDFS. As we know, forwarding log to other systems. We can set rsyslog:
*.* @YOURSERVERADDRESS:YOURSERVERPORT ## for UDP
*.* @@YOURSERVERADDRESS:YOURSERVERPORT ## for TCP
For rsyslog:
[root@centos01 ~]# grep centos /etc/rsyslog.conf
*.* @centos01:7777
Came back to Flume, I used Simple Example for reference and changed a bit. Because I wanted it write to HDFS.
[root@centos01 ~]# grep "^FLUME_AGENT_NAME\="  /etc/default/flume-agent
FLUME_AGENT_NAME=a1
[root@centos01 ~]# cat /etc/flume/conf/flume.conf
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
#a1.sources.r1.type = netcat
a1.sources.r1.type = syslogudp
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 7777
# Describe the sink
#a1.sinks.k1.type = logger
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://localhost:8020/user/flume/syslog/%Y/%m/%d/%H/
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.batchSize = 10000
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 10000
a1.sinks.k1.hdfs.filePrefix = syslog
a1.sinks.k1.hdfs.round = true


# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
[root@centos01 ~]# /etc/init.d/flume-agent start
Flume NG agent is not running                              [FAILED]
Starting Flume NG agent daemon (flume-agent):              [  OK  ]
Tested to login by ssh.
[root@centos01 ~]#  tail -0f  /var/log/flume/flume.log
06 Oct 2014 16:35:40,601 INFO  [hdfs-k1-call-runner-0] (org.apache.flume.sink.hdfs.BucketWriter.doOpen:208)  - Creating hdfs://localhost:8020/user/flume/syslog/2014/10/06/16//syslog.1412588139067.tmp
06 Oct 2014 16:36:10,957 INFO  [hdfs-k1-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.renameBucket:427)  - Renaming hdfs://localhost:8020/user/flume/syslog/2014/10/06/16/syslog.1412588139067.tmp to hdfs://localhost:8020/user/flume/syslog/2014/10/06/16/syslog.1412588139067
[root@centos01 ~]# hadoop fs -ls hdfs://localhost:8020/user/flume/syslog/2014/10/06/16/syslog.1412588139067
14/10/06 16:37:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   1 flume supergroup        299 2014-10-06 16:36 hdfs://localhost:8020/user/flume/syslog/2014/10/06/16/syslog.1412588139067
[root@centos01 ~]#
[root@centos01 ~]#
[root@centos01 ~]# hadoop fs -cat hdfs://localhost:8020/user/flume/syslog/2014/10/06/16/syslog.1412588139067
14/10/06 16:37:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
sshd[20235]: Accepted password for surachart from 192.168.111.16 port 65068 ssh2
sshd[20235]: pam_unix(sshd:session): session opened for user surachart by (uid=0)
su: pam_unix(su-l:session): session opened for user root by surachart(uid=500)
su: pam_unix(su-l:session): session closed for user root
Look good... Anyway, It needs to adapt more...



Tuesday, July 02, 2013

Oracle 12.1.0.1.0 JDBC - java.sql.SQLException: Could not commit with auto-commit set on

Actually, I had not pointed what's wrong about JDBC on 12c database. I thought I will learn more about it later. I thought It's a good day for me to learn something new... 12c database. No! not yet.
I just read about Apache Sqoop book and interested in it. So, some question popped up in my head... Use Sqoop connects to Oracle Database 12c. (Export data from HDFS, then Import to 12c Pluggable DB)

Point! - learn Sqoop and Test a bit about Oracle 12.1.0.1.0 JDBC 4.0.

I began by using "sqoop-1.4.2-cdh4.2.1" for test, why? I had install Kiji for Apache Hadoop + Hbase. It was easy for me ^___________^
I used Sqoop-1 before, I will learn more Sqoop-2 later :)
After I downloaded, copied jdbc driver (Oracle 12.1.0.1.0 JDBC 4.0 compiled with JDK6) from Oracle 12c hHme and ran... I got some error message.
ERROR manager.OracleManager: Failed to rollback transaction
java.sql.SQLException: Could not rollback with auto-commit set on
So, I changed to use jdbc driver (Oracle 11.2.0.3.0 JDBC 4.0 compiled with JDK6), but I have not got that error message. I suspected something might be wrong on "Oracle 12.1.0.1.0 JDBC". (I will read more about it later).
On my test - Sqoop-1 with 12c Database.
[oracle@test12c ~]$ wget http://archive.cloudera.com/cdh4/cdh/4/sqoop-1.4.2-cdh4.2.1.tar.gz
--2013-07-02 13:52:15--  http://archive.cloudera.com/cdh4/cdh/4/sqoop-1.4.2-cdh4.2.1.tar.gz
Resolving archive.cloudera.com... 184.73.217.71
Connecting to archive.cloudera.com|184.73.217.71|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6294861 (6.0M) [application/x-gzip]
Saving to: “sqoop-1.4.2-cdh4.2.1.tar.gzâ€

100%[======================================================================================================>] 6,294,861    247K/s   in 2m 49s

2013-07-02 13:55:31 (36.3 KB/s) - “sqoop-1.4.2-cdh4.2.1.tar.gzâ€

[oracle@test12c ~]$

[oracle@test12c ~]$ tar zxf sqoop-1.4.2-cdh4.2.1.tar.gz
[oracle@test12c ~]$ cd sqoop-1.4.2-cdh4.2.1
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ ./bin/sqoop list-databases --connect  jdbc:oracle:thin:@localhost:1521/orcl --username system  -P
+======================================================================+
|      Error: JAVA_HOME is not set and Java could not be found         |
+----------------------------------------------------------------------+
| Please download the latest Sun JDK from the Sun Java web site        |
|       > http://java.sun.com/javase/downloads/ <                      |
|                                                                      |
| HBase requires Java 1.6 or later.                                    |
| NOTE: This script will find Sun Java whether you install using the   |
|       binary or the RPM based installer.                             |
+======================================================================+
+======================================================================+
|      Error: JAVA_HOME is not set and Java could not be found         |
+----------------------------------------------------------------------+
| Please download the latest Sun JDK from the Sun Java web site        |
|       > http://java.sun.com/javase/downloads/ <                      |
|                                                                      |
| Hadoop requires Java 1.6 or later.                                   |
| NOTE: This script will find Sun Java whether you install using the   |
|       binary or the RPM based installer.                             |
+======================================================================+

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ export JAVA_HOME=/u01/app/oracle/product/12.1.0/dbhome_1/jdk
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ export PATH=$JAVA_HOME/bin:$PATH
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ ./bin/sqoop list-databases --connect  jdbc:oracle:thin:@localhost:1521/orcl --username system  -P
Enter password:
13/07/02 14:05:50 INFO manager.SqlManager: Using default fetchSize of 1000
13/07/02 14:05:50 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver
java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver
        at org.apache.sqoop.manager.OracleManager.makeConnection(OracleManager.java:275)
        at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
        at org.apache.sqoop.manager.OracleManager.listDatabases(OracleManager.java:604)
        at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:49)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ cp /u01/app/oracle/product/12.1.0/dbhome_1/jdbc/lib/ojdbc6.jar lib/
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ ./bin/sqoop list-databases --connect  jdbc:oracle:thin:@localhost:1521/orcl --username system  -P
Enter password:
13/07/02 14:06:43 INFO manager.SqlManager: Using default fetchSize of 1000
13/07/02 14:06:45 INFO manager.OracleManager: Time zone has been set to GMT
13/07/02 14:06:45 ERROR manager.OracleManager: Failed to rollback transaction
java.sql.SQLException: Could not rollback with auto-commit set on

        at oracle.jdbc.driver.PhysicalConnection.rollback(PhysicalConnection.java:4506)
        at org.apache.sqoop.manager.OracleManager.listDatabases(OracleManager.java:615)
        at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:49)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
13/07/02 14:06:45 ERROR manager.OracleManager: Failed to list databases
java.sql.SQLException: Could not commit with auto-commit set on

        at oracle.jdbc.driver.PhysicalConnection.commit(PhysicalConnection.java:4439)
        at oracle.jdbc.driver.PhysicalConnection.commit(PhysicalConnection.java:4486)
        at org.apache.sqoop.manager.OracleManager.listDatabases(OracleManager.java:612)
        at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:49)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
AUDSYS
GSMUSER
SPATIAL_WFS_ADMIN_USR
SPATIAL_CSW_ADMIN_USR
APEX_PUBLIC_USER
SYSDG
DIP
SYSBACKUP
MDDATA
GSMCATUSER
SYSKM
XS$NULL
OJVMSYS
C##DB_DBA1
ORACLE_OCM
OLAPSYS
SI_INFORMTN_SCHEMA
DVSYS
ORDPLUGINS
XDB
ANONYMOUS
CTXSYS
ORDDATA
GSMADMIN_INTERNAL
APPQOSSYS
APEX_040200
WMSYS
DBSNMP
ORDSYS
MDSYS
DVF
FLOWS_FILES
SYS
SYSTEM
OUTLN
LBACSYS
Got data, but error  - java.sql.SQLException: Could not rollback with auto-commit set on
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ md5sum /u01/app/oracle/product/12.1.0/dbhome_1/jdbc/lib/ojdbc6.jar
b99a9e9b93aa31b787d174a1dbec7cc4  /u01/app/oracle/product/12.1.0/dbhome_1/jdbc/lib/ojdbc6.jar
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ ls -la /u01/app/oracle/product/12.1.0/dbhome_1/jdbc/lib/ojdbc6.jar
-rw-r--r--. 1 oracle oinstall 3389454 Apr  4 15:15 /u01/app/oracle/product/12.1.0/dbhome_1/jdbc/lib/ojdbc6.jar
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ ls -la ~/ojdbc6-11.2.jar
-rw-r--r--. 1 oracle oinstall 2714189 Jul  2 13:29 /home/oracle/ojdbc6-11.2.jar
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ md5sum /home/oracle/ojdbc6-11.2.jar
54c41acb9df6465f45a931fbe9734e1a  /home/oracle/ojdbc6-11.2.jar
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$  java -jar /u01/app/oracle/product/12.1.0/dbhome_1/jdbc/lib/ojdbc6.jar
Oracle 12.1.0.1.0 JDBC 4.0 compiled with JDK6 on Thu_Apr_04_15:06:58_PDT_2013
#Default Connection Properties Resource
#Tue Jul 02 14:10:14 ICT 2013

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ java -jar /home/oracle/ojdbc6-11.2.jar
Oracle 11.2.0.3.0 JDBC 4.0 compiled with JDK6 on Fri_Aug_26_08:19:15_PDT_2011
#Default Connection Properties Resource
#Tue Jul 02 14:10:26 ICT 2013

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ java -jar  lib/ojdbc6.jar
Oracle 12.1.0.1.0 JDBC 4.0 compiled with JDK6 on Thu_Apr_04_15:06:58_PDT_2013
#Default Connection Properties Resource
#Tue Jul 02 14:10:44 ICT 2013

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ rm lib/ojdbc6.jar

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ cp /home/oracle/ojdbc6-11.2.jar lib/
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ java -jar  lib/ojdbc6-11.2.jar
Oracle 11.2.0.3.0 JDBC 4.0 compiled with JDK6 on Fri_Aug_26_08:19:15_PDT_2011
#Default Connection Properties Resource
#Tue Jul 02 14:11:15 ICT 2013

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ ./bin/sqoop list-databases --connect  jdbc:oracle:thin:@localhost:1521/orcl --username system  -P
Enter password:
13/07/02 14:11:30 INFO manager.SqlManager: Using default fetchSize of 1000
13/07/02 14:11:31 INFO manager.OracleManager: Time zone has been set to GMT
AUDSYS
GSMUSER
SPATIAL_WFS_ADMIN_USR
SPATIAL_CSW_ADMIN_USR
APEX_PUBLIC_USER
SYSDG
DIP
SYSBACKUP
MDDATA
GSMCATUSER
SYSKM
XS$NULL
OJVMSYS
C##DB_DBA1
ORACLE_OCM
OLAPSYS
SI_INFORMTN_SCHEMA
DVSYS
ORDPLUGINS
XDB
ANONYMOUS
CTXSYS
ORDDATA
GSMADMIN_INTERNAL
APPQOSSYS
APEX_040200
WMSYS
DBSNMP
ORDSYS
MDSYS
DVF
FLOWS_FILES
SYS
SYSTEM
OUTLN
LBACSYS
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$
Got data and no error message. So, tried to connect Pluggable Database (Service "orclpdb" )
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ lsnrctl status

LSNRCTL for Linux: Version 12.1.0.1.0 - Production on 02-JUL-2013 14:46:36

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=test12c)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 12.1.0.1.0 - Production
Start Date                02-JUL-2013 13:49:28
Uptime                    0 days 0 hr. 57 min. 8 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/test12c/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=test12c)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=test12c)(PORT=5500))(Security=(my_wallet_directory=/u01/app/oracle/admin/orcl/xdb_wallet))(Presentation=HTTP)(Session=RAW))
Services Summary...
Service "orcl" has 1 instance(s).
  Instance "orcl", status READY, has 1 handler(s) for this service...
Service "orclXDB" has 1 instance(s).
  Instance "orcl", status READY, has 1 handler(s) for this service...
Service "orclpdb" has 1 instance(s).
  Instance "orcl", status READY, has 1 handler(s) for this service...

Service "y" has 1 instance(s).
  Instance "orcl", status READY, has 1 handler(s) for this service...
The command completed successfully
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ ./bin/sqoop list-databases --connect  jdbc:oracle:thin:@localhost:1521/orclpdb --username system  -P
Enter password:
13/07/02 14:54:29 INFO manager.SqlManager: Using default fetchSize of 1000
13/07/02 14:54:33 INFO manager.OracleManager: Time zone has been set to GMT
SYS
SYSTEM
OLAPSYS
SI_INFORMTN_SCHEMA
PDBADMIN
DVSYS
AUDSYS
GSMUSER
ORDPLUGINS
SPATIAL_WFS_ADMIN_USR
SPATIAL_CSW_ADMIN_USR
XDB
APEX_PUBLIC_USER
SYSDG
DIP
OUTLN
ANONYMOUS
CTXSYS
ORDDATA
SYSBACKUP
MDDATA
GSMCATUSER
GSMADMIN_INTERNAL
LBACSYS
SYSKM
XS$NULL
OJVMSYS
APPQOSSYS
C##DB_DBA1
ORACLE_OCM
APEX_040200
WMSYS
DBSNMP
ORDSYS
MDSYS
DVF
FLOWS_FILES
[oracle@test12c sqoop-1.4.2-cdh4.2.1]

Wow! I used jdbc driver (Oracle 11.2.0.3.0 JDBC 4.0 compiled with JDK6) to connect Oracle 12c - Pluggable Database.
Something popped up again! Use Sqoop v.1 import data from HDFS to 12c Pluggable Database.

SQL> show con_name

CON_NAME
------------------------------
ORCLPDB

SQL> select * from demo.tb_demo;

no rows selected

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ hadoop fs -ls hdfs://localhost:8020/user/oracle/input
13/07/02 15:47:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   3 oracle supergroup         25 2013-07-02 15:09 hdfs://localhost:8020/user/oracle/input/1.txt
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ hadoop fs -cat hdfs://localhost:8020/user/oracle/input/1.txt
13/07/02 15:47:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1       A1
2       A2
3       A3
4       A4
5       A5
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$
[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ ./bin/sqoop export  --direct  --connect   jdbc:oracle:thin:@localhost:1521/orclpdb --table demo.tb_demo --username  demo --password demo -export-dir /user/oracle/input  --input-fields-terminated-by "\t"  --columns a,b
13/07/02 15:45:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/07/02 15:45:35 INFO manager.SqlManager: Using default fetchSize of 1000
13/07/02 15:45:35 INFO tool.CodeGenTool: Beginning code generation
13/07/02 15:45:36 INFO manager.OracleManager: Time zone has been set to GMT
13/07/02 15:45:37 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM demo.tb_demo t WHERE 1=0
13/07/02 15:45:37 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/oracle/kiji-bento-buri/cluster/bin/../lib/hadoop-2.0.0-mr1-cdh4.2.1
Note: /tmp/sqoop-oracle/compile/e2f345b29efbc16876b18ee822dcc33c/demo_tb_demo.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/07/02 15:45:40 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-oracle/compile/e2f345b29efbc16876b18ee822dcc33c/demo.tb_demo.jar
13/07/02 15:45:40 INFO mapreduce.ExportJobBase: Beginning export of demo.tb_demo
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/oracle/kiji-bento-buri/cluster/lib/hadoop-2.0.0-mr1-cdh4.2.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/oracle/kiji-bento-buri/cluster/lib/hbase-0.94.2-cdh4.2.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/07/02 15:45:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/07/02 15:45:44 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/02 15:45:45 INFO input.FileInputFormat: Total input paths to process : 1
13/07/02 15:45:45 INFO input.FileInputFormat: Total input paths to process : 1
13/07/02 15:45:46 INFO mapred.JobClient: Running job: job_20130702150710259_0002
13/07/02 15:45:47 INFO mapred.JobClient:  map 0% reduce 0%
13/07/02 15:46:08 INFO mapred.JobClient:  map 25% reduce 0%
13/07/02 15:46:14 INFO mapred.JobClient:  map 50% reduce 0%
13/07/02 15:46:22 INFO mapred.JobClient:  map 75% reduce 0%
13/07/02 15:46:27 INFO mapred.JobClient:  map 100% reduce 0%
13/07/02 15:46:31 INFO mapred.JobClient: Job complete: job_20130702150710259_0002
13/07/02 15:46:31 INFO mapred.JobClient: Counters: 24
13/07/02 15:46:31 INFO mapred.JobClient:   File System Counters
13/07/02 15:46:31 INFO mapred.JobClient:     FILE: Number of bytes read=0
13/07/02 15:46:31 INFO mapred.JobClient:     FILE: Number of bytes written=930532
13/07/02 15:46:31 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/02 15:46:31 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/02 15:46:31 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/02 15:46:31 INFO mapred.JobClient:     HDFS: Number of bytes read=544
13/07/02 15:46:31 INFO mapred.JobClient:     HDFS: Number of bytes written=0
13/07/02 15:46:31 INFO mapred.JobClient:     HDFS: Number of read operations=19
13/07/02 15:46:31 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/02 15:46:31 INFO mapred.JobClient:     HDFS: Number of write operations=0
13/07/02 15:46:31 INFO mapred.JobClient:   Job Counters
13/07/02 15:46:31 INFO mapred.JobClient:     Launched map tasks=4
13/07/02 15:46:31 INFO mapred.JobClient:     Rack-local map tasks=4
13/07/02 15:46:31 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=70714
13/07/02 15:46:31 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
13/07/02 15:46:31 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/02 15:46:31 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/02 15:46:31 INFO mapred.JobClient:   Map-Reduce Framework
13/07/02 15:46:31 INFO mapred.JobClient:     Map input records=5
13/07/02 15:46:31 INFO mapred.JobClient:     Map output records=5
13/07/02 15:46:31 INFO mapred.JobClient:     Input split bytes=461
13/07/02 15:46:31 INFO mapred.JobClient:     Spilled Records=0
13/07/02 15:46:31 INFO mapred.JobClient:     CPU time spent (ms)=5040
13/07/02 15:46:31 INFO mapred.JobClient:     Physical memory (bytes) snapshot=414687232
13/07/02 15:46:31 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=4217012224
13/07/02 15:46:31 INFO mapred.JobClient:     Total committed heap usage (bytes)=251396096
13/07/02 15:46:31 INFO mapreduce.ExportJobBase: Transferred 544 bytes in 47.8912 seconds (11.3591 bytes/sec)
13/07/02 15:46:31 INFO mapreduce.ExportJobBase: Exported 5 records.

SQL> show con_name

CON_NAME
------------------------------
ORCLPDB

SQL>  select * from demo.tb_demo;

         A B
---------- ------------------------------
         1 A1
         2 A2
         3 A3
         4 A4
         5 A5

SQL>
However, I am supposed to use jdbc driver (Oracle 12.1.0.1.0 JDBC 4.0 compiled with JDK6), then Tried it with Oracle 12.1.0.1.0 JDBC 4.0.

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ rm lib/ojdbc6-11.2.jar

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ cp /u01/app/oracle/product/12.1.0/dbhome_1/jdbc/lib/ojdbc6.jar  lib/

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ java -jar lib/ojdbc6.jar
Oracle 12.1.0.1.0 JDBC 4.0 compiled with JDK6 on Thu_Apr_04_15:06:58_PDT_2013
#Default Connection Properties Resource
#Tue Jul 02 15:54:35 ICT 2013

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ ./bin/sqoop export  --direct  --connect   jdbc:oracle:thin:@localhost:1521/orclpdb --table demo.tb_demo --username  demo --password demo -export-dir /user/oracle/input  --input-fields-terminated-by "\t"  --columns a,b
13/07/02 15:50:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/07/02 15:50:35 INFO manager.SqlManager: Using default fetchSize of 1000
13/07/02 15:50:35 INFO tool.CodeGenTool: Beginning code generation
13/07/02 15:50:37 INFO manager.OracleManager: Time zone has been set to GMT
13/07/02 15:50:37 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM demo.tb_demo t WHERE 1=0
13/07/02 15:50:37 WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on
13/07/02 15:50:37 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/oracle/kiji-bento-buri/cluster/bin/../lib/hadoop-2.0.0-mr1-cdh4.2.1
Note: /tmp/sqoop-oracle/compile/00bc0bcd5dfe349e2c7383db51763a91/demo_tb_demo.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/07/02 15:50:40 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-oracle/compile/00bc0bcd5dfe349e2c7383db51763a91/demo.tb_demo.jar
13/07/02 15:50:40 INFO mapreduce.ExportJobBase: Beginning export of demo.tb_demo
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/oracle/kiji-bento-buri/cluster/lib/hadoop-2.0.0-mr1-cdh4.2.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/oracle/kiji-bento-buri/cluster/lib/hbase-0.94.2-cdh4.2.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/07/02 15:50:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/07/02 15:50:44 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/02 15:50:46 INFO input.FileInputFormat: Total input paths to process : 1
13/07/02 15:50:46 INFO input.FileInputFormat: Total input paths to process : 1
13/07/02 15:50:46 INFO mapred.JobClient: Running job: job_20130702150710259_0003
13/07/02 15:50:47 INFO mapred.JobClient:  map 0% reduce 0%
13/07/02 15:51:16 INFO mapred.JobClient:  map 25% reduce 0%
13/07/02 15:51:20 INFO mapred.JobClient:  map 50% reduce 0%
13/07/02 15:51:32 INFO mapred.JobClient:  map 75% reduce 0%
13/07/02 15:51:40 INFO mapred.JobClient:  map 100% reduce 0%
13/07/02 15:51:45 INFO mapred.JobClient: Job complete: job_20130702150710259_0003
13/07/02 15:51:45 INFO mapred.JobClient: Counters: 24
13/07/02 15:51:45 INFO mapred.JobClient:   File System Counters
13/07/02 15:51:45 INFO mapred.JobClient:     FILE: Number of bytes read=0
13/07/02 15:51:45 INFO mapred.JobClient:     FILE: Number of bytes written=930436
13/07/02 15:51:45 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/02 15:51:45 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/02 15:51:45 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/02 15:51:45 INFO mapred.JobClient:     HDFS: Number of bytes read=544
13/07/02 15:51:45 INFO mapred.JobClient:     HDFS: Number of bytes written=0
13/07/02 15:51:45 INFO mapred.JobClient:     HDFS: Number of read operations=19
13/07/02 15:51:45 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/02 15:51:45 INFO mapred.JobClient:     HDFS: Number of write operations=0
13/07/02 15:51:45 INFO mapred.JobClient:   Job Counters
13/07/02 15:51:45 INFO mapred.JobClient:     Launched map tasks=4
13/07/02 15:51:45 INFO mapred.JobClient:     Rack-local map tasks=4
13/07/02 15:51:45 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=94162
13/07/02 15:51:45 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
13/07/02 15:51:45 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/02 15:51:45 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/02 15:51:45 INFO mapred.JobClient:   Map-Reduce Framework
13/07/02 15:51:45 INFO mapred.JobClient:     Map input records=5
13/07/02 15:51:45 INFO mapred.JobClient:     Map output records=5
13/07/02 15:51:45 INFO mapred.JobClient:     Input split bytes=461
13/07/02 15:51:45 INFO mapred.JobClient:     Spilled Records=0
13/07/02 15:51:45 INFO mapred.JobClient:     CPU time spent (ms)=6030
13/07/02 15:51:45 INFO mapred.JobClient:     Physical memory (bytes) snapshot=418398208
13/07/02 15:51:45 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=4222406656
13/07/02 15:51:45 INFO mapred.JobClient:     Total committed heap usage (bytes)=251396096
13/07/02 15:51:45 INFO mapreduce.ExportJobBase: Transferred 544 bytes in 61.7838 seconds (8.8049 bytes/sec)
13/07/02 15:51:45 INFO mapreduce.ExportJobBase: Exported 5 records.

[oracle@test12c sqoop-1.4.2-cdh4.2.1]$ exit
exit

SQL> l
  1*  select * from demo.tb_demo
SQL> /

         A B
---------- ------------------------------
         1 A1
         2 A2
         3 A3
         4 A4
         1 A1
         2 A2
         3 A3
         4 A4
         5 A5
         5 A5

10 rows selected.
When I tested with Sqoop (export) with jdbc driver (Oracle 12.1.0.1.0 JDBC 4.0 compiled with JDK6) ... I had not found error message. It showed "WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on" only. I thought that's all right anyway.


Related posts:
Learned a little bit about importing data from MySQL into HDFS using Sqoop
Learned Kiji in a few minutes

Thursday, November 01, 2012

Learn - hadoop-fuse-dfs


Read Mounting HDFS. There is something very interesting. As people know HDFS is Hadoop Distributed File System. So, i wanted to test a bit about hadoop-fuse-dfs. Somehow, I had impala for test only. I thought it's easy to use it and learn.
-bash-4.1$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
rootfs                24607156   4370100  19987060  18% /
devtmpfs               1534680       200   1534480   1% /dev
tmpfs                  1544584       112   1544472   1% /dev/shm
/dev/sda1             24607156   4370100  19987060  18% /
-bash-4.1$ sudo hadoop-fuse-dfs   rw -oserver=127.0.0.1 /mnt/
INFO /data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hadoop-2.0.0-cdh4.1.0/src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:164 Adding FUSE arg /mnt/
-bash-4.1$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
rootfs                24607156   4370104  19987056  18% /
devtmpfs               1534680       200   1534480   1% /dev
tmpfs                  1544584       112   1544472   1% /dev/shm
/dev/sda1             24607156   4370104  19987056  18% /
fuse_dfs              24576000         0  24576000   0% /mnt
-bash-4.1$
-bash-4.1$ ls /mnt/user/cloudera/
file.out  file.txt
-bash-4.1$ cat  /mnt/user/cloudera/file.txt
1,Surachart
2,John
3,Jame
-bash-4.1$
-bash-4.1$  hadoop fs -cat file.txt
1,Surachart
2,John
3,Jame
-bash-4.1$ cat /mnt/user/cloudera/file.txt
1,Surachart
2,John
3,Jame
-bash-4.1$ vi /mnt/user/cloudera/file.txt
-bash-4.1$
-bash-4.1$ cat /mnt/user/cloudera/file.txt
-bash-4.1$-- no data, used "cat" command again  --
-bash-4.1$ cat /mnt/user/cloudera/file.txt
1,Surachart
2,John
3,Jame
4,Chart
-bash-4.1$  hadoop fs -cat file.txt
1,Surachart
2,John
3,Jame
4,Chart
-bash-4.1$  hadoop fs -ls
Found 4 items
-rw-r--r--   1 root     cloudera          0 2012-11-01 09:46 .file.txt.swp
-rw-r--r--   1 root     cloudera          0 2012-11-01 09:49 .file.txt.swx
drwxr-xr-x   - cloudera cloudera          0 2012-11-01 09:29 file.out
-rw-r--r--   1 cloudera cloudera         34 2012-11-01 09:58 file.txt
From above example, I mount hdfs to /mnt path and modified file. After saved file (wq), I used "cat" command but no data in file. Then tried again, i found data. :)
Then, tested creating file.

-bash-4.1$ hadoop fs -ls file2.txt
ls: `file2.txt': No such file or directory
-bash-4.1$ echo "TEST1" > /mnt/user/cloudera/file2.txt
-bash-4.1$ echo "TEST2" >> /mnt/user/cloudera/file2.txt
-bash-4.1$ cat /mnt/user/cloudera/file2.txt
TEST1
TEST2
-bash-4.1$ hadoop fs -ls file2.txt
Found 1 items
-rw-r--r--   1 cloudera cloudera         12 2012-11-01 11:08 file2.txt
-bash-4.1$ hadoop fs -cat file2.txt
TEST1
TEST2
However, there supports some... operations.
Supports reads, writes, and directory operations (e.g., cp, ls, more, cat, find, less, rm, mkdir, mv, rmdir). Things like touch, chmod, chown, and permissions are in the works. Fuse-dfs currently shows all files as owned by nobody.
If tested more... Maybe, you might find something weird(not sure why?).
-bash-4.1$ cat /mnt/user/cloudera/file2.txt
TEST1
TEST2
-bash-4.1$ echo "TEST" > /mnt/user/cloudera/file2.txt
-bash-4.1$ cat /mnt/user/cloudera/file2.txt
-bash-4.1$-- no data -- 
-bash-4.1$ hadoop fs -cat file2.txt
-bash-4.1$-- no data -- 
Maybe, someone can help to explain :)... So, just learn and test before use.