Pivotal HD is a 100% Apache-compatible Hadoop distribution featuring a fully SQL compliant query engine for processing data stored in Hadoop. By adding rich, mature SQL processing, Pivotal HD allows enterprises to simplify development, expand Hadoop’s capabilities, increase productivity, and cut costs. It has been tested for scale on the 1000 node Pivotal Analytics Workbench to ensure that the stack works flawlessly in large enterprise deployments.
Sounds good! I figured it was worth learning a little bit about it.
First of all, I downloaded the Pivotal HD Single Node VM, imported it into VirtualBox, started the VM, and logged in to check it out (ssh as "gpadmin" with password "password"). This was the first step:
[gpadmin@pivhdsne ~]$ pwd
/home/gpadmin
[gpadmin@pivhdsne ~]$ ls
Attic Desktop Documents Downloads gpAdminLogs pivotal-samples workspace
[gpadmin@pivhdsne ~]$ cd Desktop/
[gpadmin@pivhdsne Desktop]$ ls
eclipse Pivotal_Community_Edition_VM_EULA_20130712_final.pdf Pivotal_Docs README README~ start_piv_hd.sh stop_piv_hd.sh
[gpadmin@pivhdsne Desktop]$ cat README
Pivotal HD 1.0.1 Single Node (VM)
Version 1
How to use this VM:
1. Start the Hadoop services using start_all.sh on the desktop
2. Follow the tutorials at http://pivotalhd.cfapps.io/getting-started/pivotalhd-vm.html
3. Leverage the Pivotal HD community at http://gopivotal.com/community for support
4. root and gpadmin accounts have password password
5. Command Center login is gpadmin, with password gpadmin
6. gpadmin account has sudo privileges
What is included:
1. Pivotal HD - Hadoop 2.x, Zookeeper, HBase, Hive, Pig, Mahout
2. Pivotal HAWQ
3. Pivotal Extension Framework (PXF)
4. Pivotal DataLoader
5. Product usage documentation
Other installed packages:
1. JDK 6
2. Ant
3. Maven
4. Eclipse
[gpadmin@pivhdsne Desktop]$ ./start_piv_hd.sh
Starting services
SUCCESS: Start complete
Using JAVA_HOME: /usr/java/jdk1.6.0_26
Starting dataloader in standalone mode...
Starting Embedded Zookeeper Server...
Sending output to /var/log/gphd/dataloader/dataloader-embedded-zk.log
Embedded Zookeeper Server started!
Starting dataloader scheduler...
Sending output to /var/log/gphd/dataloader/dataloader-scheduler.log
Dataloader Scheduler Started!
Starting dataloader manager...
Sending output to /var/log/gphd/dataloader/dataloader-manager.log
Dataloader Manager Started!
Dataloader started!
20130725:01:16:18:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Starting gpstart with args: -a
20130725:01:16:19:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Gathering information and validating the environment...
20130725:01:16:32:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (HAWQ) 4.2.0 build 1'
20130725:01:16:41:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Greenplum Catalog Version: '201306170'
20130725:01:16:41:005738 gpstart:pivhdsne:gpadmin-[WARNING]:-postmaster.pid file exists on Master, checking if recovery startup required
20130725:01:16:41:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Commencing recovery startup checks
20130725:01:16:43:005738 gpstart:pivhdsne:gpadmin-[INFO]:-No socket connection or lock file in /tmp found for port=5432
20130725:01:16:43:005738 gpstart:pivhdsne:gpadmin-[INFO]:-No Master instance process, entering recovery startup mode
20130725:01:16:43:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Clearing Master instance pid file
20130725:01:16:43:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Starting Master instance in admin mode
20130725:01:17:18:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20130725:01:17:18:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Obtaining Segment details from master...
20130725:01:17:20:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Setting new master era
20130725:01:17:20:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Commencing forced instance shutdown
20130725:01:17:33:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Starting Master instance in admin mode
20130725:01:17:46:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20130725:01:17:46:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Obtaining Segment details from master...
20130725:01:17:47:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Setting new master era
20130725:01:17:47:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Master Started...
20130725:01:17:47:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Shutting down master
20130725:01:17:59:005738 gpstart:pivhdsne:gpadmin-[INFO]:-No standby master configured. skipping...
20130725:01:18:00:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Commencing parallel segment instance startup, please wait...
...............
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Process results...
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-----------------------------------------------------
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:- Successful segment starts = 2
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:- Failed segment starts = 0
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:- Skipped segment starts (segments are marked down in configuration) = 0
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-----------------------------------------------------
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Successfully started 2 of 2 segment instances
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-----------------------------------------------------
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Starting Master instance pivhdsne.localdomain directory /data/1/hawq_master/gpseg-1
20130725:01:18:23:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Command pg_ctl reports Master pivhdsne.localdomain instance active
20130725:01:18:23:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Database successfully started
After reading for a bit, I wanted to try a sample, so I started with "Setting up the Development Environment" and followed the steps from the tutorial link above. (I also forgot to show the URL earlier: http://ip:50000.) I'll sketch what the job roughly does after the output below.
[gpadmin@pivhdsne customer_first_and_last_order_dates]$ hadoop jar target/customer_first_and_last_order_dates-1.0.jar com.pivotal.hadoop.CustomerFirstLastOrderDateDriver /retail_demo/orders/orders.tsv.gz /output-mr2
13/07/25 04:28:22 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/07/25 04:28:24 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/07/25 04:28:25 INFO input.FileInputFormat: Total input paths to process : 1
13/07/25 04:28:26 WARN snappy.LoadSnappy: Snappy native library is available
13/07/25 04:28:26 INFO snappy.LoadSnappy: Snappy native library loaded
13/07/25 04:28:27 INFO mapreduce.JobSubmitter: number of splits:1
13/07/25 04:28:27 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/07/25 04:28:27 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/07/25 04:28:27 WARN conf.Configuration: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
13/07/25 04:28:27 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/07/25 04:28:27 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/07/25 04:28:27 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/07/25 04:28:27 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/07/25 04:28:27 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/07/25 04:28:27 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/07/25 04:28:27 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/07/25 04:28:27 WARN conf.Configuration: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
13/07/25 04:28:27 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/07/25 04:28:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1374729279703_0002
13/07/25 04:28:30 INFO client.YarnClientImpl: Submitted application application_1374729279703_0002 to ResourceManager at pivhdsne/127.0.0.2:8032
13/07/25 04:28:30 INFO mapreduce.Job: The url to track the job: http://pivhdsne:8088/proxy/application_1374729279703_0002/
13/07/25 04:28:30 INFO mapreduce.Job: Running job: job_1374729279703_0002
13/07/25 04:29:01 INFO mapreduce.Job: Job job_1374729279703_0002 running in uber mode : false
13/07/25 04:29:01 INFO mapreduce.Job: map 0% reduce 0%
13/07/25 04:29:59 INFO mapreduce.Job: map 3% reduce 0%
13/07/25 04:30:03 INFO mapreduce.Job: map 10% reduce 0%
13/07/25 04:30:06 INFO mapreduce.Job: map 22% reduce 0%
13/07/25 04:30:10 INFO mapreduce.Job: map 33% reduce 0%
13/07/25 04:30:13 INFO mapreduce.Job: map 45% reduce 0%
13/07/25 04:30:16 INFO mapreduce.Job: map 52% reduce 0%
13/07/25 04:30:20 INFO mapreduce.Job: map 59% reduce 0%
13/07/25 04:30:23 INFO mapreduce.Job: map 66% reduce 0%
13/07/25 04:30:32 INFO mapreduce.Job: map 100% reduce 0%
13/07/25 04:31:08 INFO mapreduce.Job: map 100% reduce 33%
13/07/25 04:31:11 INFO mapreduce.Job: map 100% reduce 66%
13/07/25 04:31:26 INFO mapreduce.Job: map 100% reduce 67%
13/07/25 04:31:33 INFO mapreduce.Job: map 100% reduce 68%
13/07/25 04:31:36 INFO mapreduce.Job: map 100% reduce 69%
13/07/25 04:31:39 INFO mapreduce.Job: map 100% reduce 74%
13/07/25 04:31:43 INFO mapreduce.Job: map 100% reduce 78%
13/07/25 04:31:46 INFO mapreduce.Job: map 100% reduce 82%
13/07/25 04:31:49 INFO mapreduce.Job: map 100% reduce 87%
13/07/25 04:31:53 INFO mapreduce.Job: map 100% reduce 91%
13/07/25 04:31:56 INFO mapreduce.Job: map 100% reduce 96%
13/07/25 04:31:59 INFO mapreduce.Job: map 100% reduce 98%
13/07/25 04:32:02 INFO mapreduce.Job: map 100% reduce 100%
13/07/25 04:32:02 INFO mapreduce.Job: Job job_1374729279703_0002 completed successfully
13/07/25 04:32:02 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=18946633
FILE: Number of bytes written=38031433
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=72797182
HDFS: Number of bytes written=11891611
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=184466
Total time spent by all reduces in occupied slots (ms)=257625
Map-Reduce Framework
Map input records=512071
Map output records=512071
Map output bytes=17922485
Map output materialized bytes=18946633
Input split bytes=118
Combine input records=0
Combine output records=0
Reduce input groups=167966
Reduce shuffle bytes=18946633
Reduce input records=512071
Reduce output records=167966
Spilled Records=1024142
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=2486
CPU time spent (ms)=67950
Physical memory (bytes) snapshot=1088774144
Virtual memory (bytes) snapshot=5095378944
Total committed heap usage (bytes)=658378752
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=72797064
File Output Format Counters
Bytes Written=11891611
[gpadmin@pivhdsne customer_first_and_last_order_dates]$
[gpadmin@pivhdsne customer_first_and_last_order_dates]$ hadoop fs -cat /output-mr2/part-r-00000 | wc -l
167966
[gpadmin@pivhdsne customer_first_and_last_order_dates]$ hadoop fs -cat /output-mr2/part-r-00000 | tail
54992348 6933068175 2010-10-04 02:21:44 6311149380 2010-10-11 18:21:12
54992896 8136297804 2010-10-02 11:33:38 6310999573 2010-10-11 19:01:48
54993581 6311050522 2010-10-11 21:43:05 8122646976 2010-10-14 16:28:00
54993992 6805481711 2010-10-01 15:32:50 8212538352 2010-10-07 00:20:50
54994403 7708210740 2010-10-08 06:41:29 8122646502 2010-10-14 06:36:05
54994814 8136355210 2010-10-02 15:36:27 7708874714 2010-10-08 19:50:08
54994951 6805748378 2010-10-01 05:38:36 8494440118 2010-10-09 04:44:55
54995088 8136355283 2010-10-02 23:29:08 5007019717 2010-10-13 12:32:23
54995225 6805524075 2010-10-01 08:26:01 6933068564 2010-10-04 23:02:24
54995773 6933024646 2010-10-04 11:57:57 5007019751 2010-10-13 03:36:25
[gpadmin@pivhdsne customer_first_and_last_order_dates]$
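Conceptually the sample is just a group-by: the mapper keys each order line by customer, and the reducer keeps the earliest and latest order per customer, which matches the output shown above (customer ID followed by what look like the first and last order IDs and timestamps). Below is a minimal sketch of how such a job could be written. It is not the tutorial's actual source; the column positions in orders.tsv (order_id, customer_id, order timestamp in the first three tab-separated fields) and the class name FirstLastOrderSketch are my assumptions.

// Minimal sketch (not the tutorial's source) of a job that emits each customer's
// first and last order. Assumed input columns: 0=order_id, 1=customer_id, 2=order timestamp.
import java.io.IOException;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class FirstLastOrderSketch extends Configured implements Tool {

    public static class OrderMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length < 3) {
                return; // skip malformed lines
            }
            // key: customer_id, value: "order_id<TAB>order_timestamp"
            context.write(new Text(fields[1]), new Text(fields[0] + "\t" + fields[2]));
        }
    }

    public static class FirstLastReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text customerId, Iterable<Text> orders, Context context)
                throws IOException, InterruptedException {
            String firstId = null, firstTs = null, lastId = null, lastTs = null;
            for (Text order : orders) {
                String[] parts = order.toString().split("\t");
                String orderId = parts[0];
                String ts = parts[1]; // "yyyy-MM-dd HH:mm:ss" compares correctly as a string
                if (firstTs == null || ts.compareTo(firstTs) < 0) {
                    firstTs = ts;
                    firstId = orderId;
                }
                if (lastTs == null || ts.compareTo(lastTs) > 0) {
                    lastTs = ts;
                    lastId = orderId;
                }
            }
            // output: customer_id TAB first_order_id TAB first_ts TAB last_order_id TAB last_ts
            context.write(customerId, new Text(firstId + "\t" + firstTs + "\t" + lastId + "\t" + lastTs));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "customer first and last order dates");
        job.setJarByClass(FirstLastOrderSketch.class);
        job.setMapperClass(OrderMapper.class);
        job.setReducerClass(FirstLastReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /retail_demo/orders/orders.tsv.gz
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /output-mr2
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new FirstLastOrderSketch(), args));
    }
}

One detail that lines up with the log above: the input is a single .gz file, and gzip is not splittable, so the job gets exactly one map task ("number of splits:1").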
I think Pivotal HD is a good product for learning about the Hadoop ecosystem (Hadoop 2.x, Zookeeper, HBase, Hive, Pig, Mahout) as well as Pivotal HAWQ.