Saturday, 17 September 2016

Installing Apache HBase on Windows using Cygwin64


After installing Hadoop on Windows using Cygwin, which we covered in our previous blog (Installing Apache Hadoop on Windows 10 using Cygwin64), we now install HBase on Windows using Cygwin.



Tools Used :


  • Apache HBase 1.2.3
  • Cygwin64
  • Java 1.8
  • Hadoop 2.7.1



Download the hbase-1.2.3-bin.tar.gz binary from here and place it under c:/cygwin/root/usr/local.

Start the Cygwin terminal as administrator and issue the commands below to extract the contents of hbase-1.2.3-bin.tar.gz.

$ cd /usr/local
$ tar xvf hbase-1.2.3-bin.tar.gz
Create a logs folder, i.e. C:\cygwin\root\usr\local\hbase-1.2.3\logs


HBase reads its built-in defaults from hbase-default.xml and applies site-specific overrides from ./conf/hbase-site.xml. Some default properties do not resolve to existing directories because the JVM runs on Windows. This is the major issue to keep in mind when working with Cygwin: within the shell all paths are *nix-like, hence relative to the root /. However, every parameter that is consumed by the Windows processes themselves needs to be a Windows-style setting, hence C:\-like. Set the following properties in the configuration file, adjusting paths where necessary to conform with your own installation:
  • hbase.rootdir must read e.g. file:///C:/cygwin/root/tmp/hbase/data, or hdfs://127.0.0.1:9000/hbase when using the Hadoop file system (HDFS).
  • hbase.tmp.dir must read C:/cygwin/root/tmp/hbase/tmp
  • hbase.zookeeper.quorum must read 127.0.0.1 because for some reason localhost doesn't seem to resolve properly on Cygwin.
Make sure the configured hbase.rootdir and hbase.tmp.dir directories exist and have the proper rights set up, e.g. by issuing a chmod 777 on them.
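
The directory setup and the path mapping described above can be sketched in the Cygwin shell roughly as follows. This is only an illustration: to_windows_path is a hypothetical helper name, and the C:/cygwin/root fallback assumes the install location used in this post. On a real Cygwin system the cygpath -m utility does the conversion and prints the Windows path with forward slashes.

```shell
#!/bin/sh
# Assumed Cygwin install root (matches the paths used in this post).
CYGWIN_ROOT="${CYGWIN_ROOT:-C:/cygwin/root}"

# Hypothetical helper: print the Windows-style form of a Cygwin path.
# Uses cygpath when it is available, otherwise prefixes the assumed root.
to_windows_path() {
  if command -v cygpath >/dev/null 2>&1; then
    cygpath -m "$1"
  else
    printf '%s%s\n' "$CYGWIN_ROOT" "$1"
  fi
}

# Local directories for hbase.rootdir (file:// variant) and hbase.tmp.dir.
HBASE_LOCAL_BASE="${HBASE_LOCAL_BASE:-/tmp/hbase}"
mkdir -p "$HBASE_LOCAL_BASE/data" "$HBASE_LOCAL_BASE/tmp"
chmod 777 "$HBASE_LOCAL_BASE/data" "$HBASE_LOCAL_BASE/tmp"

# Show the values to paste into the configuration.
echo "hbase.rootdir = file:///$(to_windows_path "$HBASE_LOCAL_BASE/data")"
echo "hbase.tmp.dir = $(to_windows_path "$HBASE_LOCAL_BASE/tmp")"
```

When hbase.rootdir points at HDFS, only the tmp directory is needed locally.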

Go to c:/cygwin/root/usr/local/hbase-1.2.3/conf and add the following to the hbase-site.xml file.

<configuration>
<property>
 <name>hbase.rootdir</name> 
 <!--<value>file:///C:/cygwin/root/tmp/hbase/data</value> -->
 <value>hdfs://127.0.0.1:9000/hbase</value>
</property>
<property>
 <name>hbase.zookeeper.quorum</name> 
 <value>127.0.0.1</value> 
</property>
<property>
 <name>hbase.tmp.dir</name> 
 <value>C:/cygwin/root/tmp/hbase/tmp</value>
</property>
</configuration>

Add the following to the hbase-env.sh file.

export JAVA_HOME=/usr/local/java/
export HBASE_CLASSPATH=/cygwin/root/usr/local/hbase-1.2.3/lib/
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"
export HBASE_IDENT_STRING=$HOSTNAME

Start a Cygwin terminal, if you haven't already.
Please make sure Hadoop is started before issuing the HBase start command. Type jps to check if the Hadoop daemon processes are running.
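
If you prefer a scripted check over reading the jps output by eye, something like the sketch below may help. has_process is a hypothetical helper name, and checking for NameNode and DataNode (the HDFS daemons) is an assumption about which processes matter for your setup.

```shell
#!/bin/sh
# Succeeds if jps-style output on stdin contains a process with the given name.
# jps prints one "<pid> <name>" pair per line.
has_process() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Only run the check when jps (part of the JDK) is on the PATH.
if command -v jps >/dev/null 2>&1; then
  for daemon in NameNode DataNode; do
    if jps | has_process "$daemon"; then
      echo "$daemon is running"
    else
      echo "$daemon is NOT running"
    fi
  done
fi
```

The name is compared exactly, so SecondaryNameNode does not count as NameNode.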










Create the hbase directory in HDFS.





Refer to the Hadoop-eclipse-plugin installation blog to create the folder in HDFS using Eclipse, if you haven't already.
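
As an alternative to Eclipse, the folder can probably also be created from the Cygwin terminal with the hdfs command that ships with Hadoop. The sketch below assumes the hdfs script is on your PATH and that /hbase is the path used in hbase.rootdir; create_hbase_root is a hypothetical function name.

```shell
#!/bin/sh
# Create the HBase root directory in HDFS and list the HDFS root to confirm.
# -p makes mkdir succeed even if the directory already exists.
create_hbase_root() {
  hdfs dfs -mkdir -p "$1" &&
  hdfs dfs -ls /
}

if command -v hdfs >/dev/null 2>&1; then
  create_hbase_root /hbase
else
  echo "hdfs command not found on PATH; run this from the Hadoop bin directory" >&2
fi
```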









Change directory to the HBase installation using cd /usr/local/hbase-1.2.3.
Start HBase using the command sh start-hbase.sh
When prompted to accept the SSH fingerprint, answer yes.
When prompted, provide your password, possibly multiple times.
When the command completes, the HBase server should have started.
However, to be absolutely certain, check the logs in the ./logs directory for any exceptions.
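
One way to scan the logs without opening each file is a small grep wrapper like the one below. scan_logs is a hypothetical helper, and the ERROR|FATAL|Exception pattern is only an assumption about which lines deserve a closer look.

```shell
#!/bin/sh
# Print matching lines (with line numbers) from every .log file in a directory.
scan_logs() {
  log_dir="$1"
  grep -n -E 'ERROR|FATAL|Exception' "$log_dir"/*.log
}

# Run against ./logs when it exists, i.e. from the HBase installation directory.
if [ -d ./logs ]; then
  scan_logs ./logs || echo "no suspicious lines found"
fi
```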













ZooKeeper startup logs
2016-09-18 12:59:10,944 INFO  [main] server.ZooKeeperServer: Server environment:java.library.path=C:\java\jdk1.8.0_101\bin;C:\WINDOWS\Sun\Java\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\cygwin\root\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0;%JAVA_HOME%\bin;%CYGWIN_HOME%\bin;%HADOOP_BIN_PATH%;%HADOOP_HOME%\bin;%MAVEN_HOME%\bin;.
2016-09-18 12:59:10,944 INFO  [main] server.ZooKeeperServer: Server environment:java.io.tmpdir=C:\Users\Naveen\
2016-09-18 12:59:10,944 INFO  [main] server.ZooKeeperServer: Server environment:java.compiler=<NA>
2016-09-18 12:59:10,944 INFO  [main] server.ZooKeeperServer: Server environment:os.name=Windows 10
2016-09-18 12:59:10,944 INFO  [main] server.ZooKeeperServer: Server environment:os.arch=amd64
2016-09-18 12:59:10,944 INFO  [main] server.ZooKeeperServer: Server environment:os.version=10.0
2016-09-18 12:59:10,944 INFO  [main] server.ZooKeeperServer: Server environment:user.name=Naveen
2016-09-18 12:59:10,944 INFO  [main] server.ZooKeeperServer: Server environment:user.home=C:\Users\Naveen
2016-09-18 12:59:10,944 INFO  [main] server.ZooKeeperServer: Server environment:user.dir=C:\cygwin\root\usr\local\hbase-1.2.3
2016-09-18 12:59:10,957 INFO  [main] server.ZooKeeperServer: tickTime set to 3000
2016-09-18 12:59:10,957 INFO  [main] server.ZooKeeperServer: minSessionTimeout set to -1
2016-09-18 12:59:10,957 INFO  [main] server.ZooKeeperServer: maxSessionTimeout set to 90000
2016-09-18 12:59:11,316 INFO  [main] server.NIOServerCnxnFactory: binding to port 0.0.0.0/0.0.0.0:2181

HBase startup logs
Sun, Sep 18, 2016 12:59:06 PM Starting master on Naveen
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
open files                      (-n) 256
pipe size            (512 bytes, -p) 8
stack size              (kbytes, -s) 2032
cpu time               (seconds, -t) unlimited
max user processes              (-u) 256
virtual memory          (kbytes, -v) unlimited
2016-09-18 12:59:08,128 INFO  [main] util.VersionInfo: HBase 1.2.3
2016-09-18 12:59:08,129 INFO  [main] util.VersionInfo: Source code repository git://kalashnikov.att.net/Users/stack/checkouts/hbase.git.commit revision=bd63744624a26dc3350137b564fe746df7a721a4
.
.
.
.
.
2016-09-18 12:59:25,144 INFO  [regionserver/Naveen/192.168.56.1:0.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/naveen,59600,1474174753236/naveen%2C59600%2C1474174753236.default.1474174764214 with entries=2, filesize=303 B; new WAL /hbase/WALs/naveen,59600,1474174753236/naveen%2C59600%2C1474174753236.default.1474174764743
2016-09-18 12:59:25,242 INFO  [Naveen:59566.activeMasterManager] master.HMaster: Master has completed initialization
2016-09-18 12:59:25,244 INFO  [Naveen:59566.activeMasterManager] quotas.MasterQuotaManager: Quota support disabled
2016-09-18 12:59:25,245 INFO  [Naveen:59566.activeMasterManager] zookeeper.ZooKeeperWatcher: not a secure deployment, proceeding
2016-09-18 12:59:27,174 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl: Stopping HBase metrics system...
2016-09-18 12:59:27,183 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system stopped.
2016-09-18 12:59:27,688 INFO  [HBase-Metrics2-1] impl.MetricsConfig: loaded properties from hadoop-metrics2-hbase.properties
2016-09-18 12:59:27,693 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-09-18 12:59:27,693 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system started
2016-09-18 12:59:46,246 INFO  [WALProcedureStoreSyncThread] wal.WALProcedureStore: Remove log: hdfs://127.0.0.1:9000/hbase/MasterProcWALs/state-00000000000000000001.log
2016-09-18 12:59:46,246 INFO  [WALProcedureStoreSyncThread] wal.WALProcedureStore: Removed logs: [hdfs://127.0.0.1:9000/hbase/MasterProcWALs/state-00000000000000000002.log]

Type jps to check if the HMaster daemon process is running.











Next we start the HBase shell using the command sh hbase shell












After HBase has started, the HDFS file system should show the directory structure below.


















Now, let's play with some HBase commands.




Using a long column family name, such as columnfamily1, is a horrible idea in production. Every cell (i.e. every value) in HBase is stored fully qualified. This basically means that long column family names will balloon the amount of disk space required to store your data. In summary, keep your column family names as small as possible.
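
To get a rough feel for the numbers, the arithmetic can be sketched in the shell. The cell count below is a made-up figure, and the estimate ignores compression and data block encoding, so treat it as an upper bound on the raw difference rather than a measurement.

```shell
#!/bin/sh
long_name="columnfamily1"
short_name="vi"
cells=1000000000   # hypothetical number of cells in the table

# ${#var} is the string length; for ASCII names this equals the byte count.
saved_per_cell=$(( ${#long_name} - ${#short_name} ))
saved_total=$(( saved_per_cell * cells ))

echo "bytes saved per cell: $saved_per_cell"
echo "bytes saved for $cells cells: $saved_total"
```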

To start, I’m going to create a new table named cars. My column family is vi, which is an abbreviation of vehicle information.

The schema that follows below is only for illustration purposes, and should not be used to create a production schema. In production, you should create a Row ID that helps to uniquely identify the row, and that is likely to be used in your queries. Therefore, one possibility would be to shift the Make, Model and Year left and use these items in the Row ID.
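
As a rough illustration of the composite Row ID idea above, the key could be assembled like this before issuing the put commands. make_row_id is a hypothetical helper, and the "|" delimiter, the lower-casing and the underscore substitution are all assumptions, not an HBase requirement.

```shell
#!/bin/sh
# Build a composite Row ID from make, model and year.
make_row_id() {
  printf '%s|%s|%s\n' "$1" "$2" "$3" | tr '[:upper:]' '[:lower:]' | tr ' ' '_'
}

row_id=$(make_row_id "BMW" "5 Series" "2012")
echo "$row_id"
```

Because HBase keeps rows sorted by Row ID, keys built this way place all rows of one make next to each other. If several cars can share the same make, model and year, the key would additionally need a unique suffix.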

create 'cars', 'vi'
Let’s insert 3 column qualifiers (make, model, year) and the associated values into the first row (row1).


put 'cars', 'row1', 'vi:make', 'bmw'
put 'cars', 'row1', 'vi:model', '5 series'
put 'cars', 'row1', 'vi:year', '2012'

Now let’s add a second row.

put 'cars', 'row2', 'vi:make', 'mercedes'
put 'cars', 'row2', 'vi:model', 'e class'
put 'cars', 'row2', 'vi:year', '2012'
List the tables using the command below.

list
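
The commands so far can also be collected in a file and handed to the HBase shell in one go, since hbase shell accepts the path to a command file as its argument. The sketch below only writes the file; the file name and location are arbitrary choices.

```shell
#!/bin/sh
# Write the walkthrough commands to a script file for the HBase shell.
script_file="${TMPDIR:-/tmp}/cars_setup.txt"

cat > "$script_file" <<'EOF'
create 'cars', 'vi'
put 'cars', 'row1', 'vi:make', 'bmw'
put 'cars', 'row1', 'vi:model', '5 series'
put 'cars', 'row1', 'vi:year', '2012'
put 'cars', 'row2', 'vi:make', 'mercedes'
put 'cars', 'row2', 'vi:model', 'e class'
put 'cars', 'row2', 'vi:year', '2012'
list
exit
EOF

echo "wrote $script_file"
# With the HBase server running, pass the file to the shell, e.g.:
#   sh hbase shell "$script_file"
```

The trailing exit makes the shell quit after the script; without it you are left at the HBase shell prompt.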


Scan a Table (i.e. Query a Table)

We’ll start with a basic scan that returns all columns in the cars table.

scan 'cars'
You should see output similar to:













Reading the output above you’ll notice that the Row ID is listed under ROW. The COLUMN+CELL field shows the column family after column=, then the column qualifier, a timestamp that is automatically created by HBase, and the value.

Importantly, each row in our results shows an individual row id + column family + column qualifier combination. Therefore, you’ll notice that multiple columns in a row are displayed in multiple rows in our results.

The next scan we’ll run will limit our results to the make column qualifier.

scan 'cars', {COLUMNS => ['vi:make']}
You should see output similar to:
















If you have a particularly large result set, you can limit the number of rows returned with the LIMIT option. In this example I arbitrarily limit the results to 1 row to demonstrate how LIMIT works.




scan 'cars', {COLUMNS => ['vi:make'], LIMIT => 1}
You should see output similar to:










Get One Row
The get command allows you to get one row of data at a time. You can optionally limit the number of columns returned. We’ll start by getting all columns in row1.


get 'cars', 'row1'
You should see output similar to:










When looking at the output above, you should notice how the results under COLUMN show the fully qualified column family:column qualifier, such as vi:make.

To get one specific column include the COLUMN option.

get 'cars', 'row1', {COLUMN => 'vi:model'}
You should see output similar to:










You can also get two or more columns by passing an array of columns.

get 'cars', 'row1', {COLUMN => ['vi:model', 'vi:year']}
You should see output similar to:










Delete a Cell (Value)

delete 'cars', 'row2', 'vi:year'
Let’s check that our delete worked.

get 'cars', 'row2'
You should see output that shows 2 columns.












Disable and Delete a Table

disable 'cars'
drop 'cars'
Running list afterwards should show an empty table list.











View HBase Command Help

help












Exit the HBase Shell

exit








To stop the HBase server, issue the sh stop-hbase.sh command and wait for it to complete. Killing the process might corrupt your data on disk.

$ sh stop-hbase.sh

