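I: Environment preparation

Servers: hadoop-master, hadoop-slave01, hadoop-slave02
HBase version: hbase-0.94.17.tar.gz

HBase ships with a bundled ZooKeeper, but so that other applications can also use ZooKeeper, this guide uses a separately installed ZooKeeper cluster.

The Hadoop, ZooKeeper, and HBase clusters are three independent clusters. They do not need to be deployed on the same physical nodes, because they communicate with one another over the network.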

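II: Installation
tar -zxvf hbase-0.94.17.tar.gz -C /usr/local
chown -R hadoop:hadoop /usr/local/hbase-0.94.17

The HBase version must match the Hadoop version. To check, compare the version number in the name of the hadoop-core jar under hbase-0.94.17/lib with the version of your Hadoop installation. If they differ, you can copy the hadoop-core jar from the Hadoop installation into that directory, but this is not guaranteed to be free of problems.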
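The version comparison described above can be sketched as a small script. It assumes the install paths used elsewhere in this guide (/usr/local/hbase-0.94.17 and /usr/local/hadoop-1.2.1) and that the jar is named hadoop-core-<version>.jar; the helper name jar_version is only illustrative.

```shell
#!/bin/sh
# Print the version embedded in a hadoop-core jar file name,
# e.g. "hadoop-core-1.0.4.jar" -> "1.0.4".
jar_version() {
    basename "$1" .jar | sed 's/^hadoop-core-//'
}

# Assumed install locations; override them to match your layout.
HBASE_LIB=${HBASE_LIB:-/usr/local/hbase-0.94.17/lib}
HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop-1.2.1}

for dir in "$HBASE_LIB" "$HADOOP_HOME"; do
    for jar in "$dir"/hadoop-core-*.jar; do
        # Skip the unexpanded glob when the directory holds no such jar.
        [ -e "$jar" ] || continue
        echo "$dir: hadoop-core $(jar_version "$jar")"
    done
done
```

If the two printed versions differ, the jar bundled with HBase does not match the running Hadoop release.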

III: Configuration

Switch to the hadoop user:

su - hadoop

1. Configure hbase-env.sh (in /usr/local/hbase-0.94.17/conf)

export JAVA_HOME=/usr/local/jdk1.7.0_40

export HBASE_CLASSPATH=/usr/local/hadoop-1.2.1/conf

export HBASE_MANAGES_ZK=false

export HBASE_HEAPSIZE=2048

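Notes:
HBASE_CLASSPATH points to the directory that holds the Hadoop configuration files, so that HBase can find the HDFS configuration. Because Hadoop and HBase are deployed on the same physical nodes in this guide, it points to the conf directory under the Hadoop installation path. HBASE_HEAPSIZE is in MB and can be set according to your needs and the memory actually available; the default is 1000. HBASE_MANAGES_ZK=false tells HBase to use an existing ZooKeeper rather than the bundled one.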
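Before settling on a heap size, it can help to compare it with the memory that is actually free. The following is a rough sketch for Linux only; it reads MemFree from /proc/meminfo, and the helper name heap_fits is an illustrative assumption rather than anything HBase provides.

```shell
#!/bin/sh
# Succeed when the requested heap (MB) fits in the given free memory (MB).
heap_fits() {
    [ "$1" -le "$2" ]
}

HBASE_HEAPSIZE=2048   # the value set in hbase-env.sh in this guide

# /proc/meminfo reports MemFree in kB; convert it to MB.
if [ -r /proc/meminfo ]; then
    free_mb=$(awk '/^MemFree:/ {print int($2 / 1024)}' /proc/meminfo)
    if heap_fits "$HBASE_HEAPSIZE" "$free_mb"; then
        echo "OK: ${HBASE_HEAPSIZE} MB heap fits in ${free_mb} MB free"
    else
        echo "Warning: only ${free_mb} MB free for a ${HBASE_HEAPSIZE} MB heap"
    fi
fi
```

MemFree is a conservative figure because it excludes reclaimable page cache, so a warning here is a prompt to look closer rather than a hard failure.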

2. Configure hbase-site.xml
Copy the contents of hbase-0.94.17/src/main/resources/hbase-default.xml into conf/hbase-site.xml, then change the following settings:

<property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop-master:9000/hbase</value>
    <description>The directory shared by region servers.</description>
</property>

<property>
    <name>hbase.hregion.max.filesize</name>
    <value>1073741824</value>
    <description>
    Maximum HStoreFile size. If any one of a column families' HStoreFiles has
    grown to exceed this value, the hosting HRegion is split in two.
    Default: 256M.
    </description>
</property>

<property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value>
    <description>
    Memstore will be flushed to disk if size of the memstore
    exceeds this number of bytes. Value is checked by a thread that runs
    every hbase.server.thread.wakefrequency.
    </description>
</property>

<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
    false: standalone and pseudo-distributed setups with managed Zookeeper
    true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
</property>

<property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    The port at which the clients will connect.
    </description>
</property>

<property>
    <name>zookeeper.session.timeout</name>
    <value>120000</value>
</property>

<property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>6000</value>
</property>

<property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop-master,hadoop-slave01,hadoop-slave02</value>
    <description>Comma separated list of servers in the ZooKeeper Quorum.
    For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
    By default this is set to localhost for local and pseudo-distributed modes
    of operation. For a fully-distributed setup, this should be set to a full
    list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
    this is the list of servers which we will start/stop ZooKeeper on.
    </description>
</property>

<property>
    <name>hbase.tmp.dir</name>
    <value>/data/hadoop/hbase</value>
</property>

<property>
    <name>hbase.defaults.for.version.skip</name>
    <value>true</value>
    <description>
    Set to true to skip the 'hbase.defaults.for.version' check.
    Setting this to true can be useful in contexts other than
    the other side of a maven generation; i.e. running in an
    ide. You'll want to set this boolean to true to avoid
    seeing the RuntimException complaint: "hbase-default.xml file
    seems to be for and old version of HBase (@@@VERSION@@@), this
    version is X.X.X-SNAPSHOT"
    </description>
</property>

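Notes:

1) hbase.rootdir: HBase uses HDFS as its file system, with hdfs://hadoop-master:9000/hbase as the root directory. HBase creates this directory automatically; you only need to point it at the correct HDFS NameNode.

2) hbase.hregion.max.filesize: the maximum HStoreFile size. When a store file grows beyond this value, the region that hosts it is split in two.

3) hbase.hregion.memstore.flush.size: the memstore size threshold. When the memstore grows beyond this value, it is flushed to disk.

4) hbase.cluster.distributed: runs HBase in distributed mode.

5) hbase.zookeeper.property.clientPort: the port on which clients connect to ZooKeeper.

6) zookeeper.session.timeout: the session timeout between a RegionServer and ZooKeeper, in milliseconds. Once it expires, ZooKeeper removes the RegionServer from the list of live RegionServers. When the HMaster receives the removal notice, it rebalances the regions that server was responsible for so that the surviving RegionServers take them over.

7) hbase.zookeeper.property.tickTime: the length of one ZooKeeper tick in milliseconds, the basic time unit ZooKeeper uses for heartbeats and timeouts. Properties with the hbase.zookeeper.property. prefix are handed to the ZooKeeper instances that HBase itself starts, so with an external ensemble (HBASE_MANAGES_ZK=false) the tickTime in that ensemble's zoo.cfg is most likely the value that applies.

8) hbase.zookeeper.quorum: the list of servers in the ZooKeeper ensemble. The default is localhost.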

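9) hbase.regionserver.handler.count:
Default: 10
Description: the number of I/O threads a RegionServer uses to handle requests.
Tuning:
Tuning this parameter is closely tied to memory.
Fewer I/O threads suit "Big PUT" scenarios, where a single request consumes a lot of memory (a large single PUT and a scan configured with a large cache both count as Big PUT), and cases where RegionServer memory is tight.
More I/O threads suit scenarios where each request consumes little memory and a very high TPS is required. When setting this value, use memory monitoring as the main reference.
Note that if the server hosts only a few regions and most requests land on a single region, the memstore fills quickly and triggers flushes, and the resulting read/write locking hurts overall TPS. More I/O threads is therefore not always better.
During load testing, enable RPC-level logging so that you can monitor the memory consumption of each request and the GC behavior at the same time, then adjust the I/O thread count based on the results of several test runs.
One case study is "Hadoop and HBase Optimization for Read Intensive Search Applications", whose author set the I/O thread count to 100 on SSD machines. Treat it as a reference only.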

10) hbase.tmp.dir: the directory on the local file system where HBase keeps its temporary data.
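The numeric values in hbase-site.xml are easier to review once converted into familiar units. The sketch below recomputes the two byte sizes used above and, assuming ZooKeeper's default session limits of 2 and 20 times tickTime, shows the range a 6000 ms tick allows. The function names are illustrative only.

```shell
#!/bin/sh
# Convert a size in MiB to bytes.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Print the min and max session timeout (ms) for a given tickTime (ms),
# assuming ZooKeeper's default limits of 2x and 20x tickTime.
zk_session_bounds() {
    echo "$(( $1 * 2 )) $(( $1 * 20 ))"
}

echo "hbase.hregion.max.filesize        = $(mib_to_bytes 1024)"   # 1 GiB
echo "hbase.hregion.memstore.flush.size = $(mib_to_bytes 128)"    # 128 MiB
echo "session timeout bounds (ms)       = $(zk_session_bounds 6000)"
```

Under that assumption, the configured zookeeper.session.timeout of 120000 ms sits exactly at the upper bound for a 6000 ms tick. A server with a smaller tickTime would negotiate the session timeout down.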

3. Configure regionservers (conf/regionservers, one RegionServer host name per line)

hadoop-slave01

hadoop-slave02
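The regionservers file can also be generated from a script, which helps when the host list changes often. This is a minimal sketch: write_regionservers is a hypothetical helper, and when HBASE_CONF_DIR is not set the script writes into a temporary directory so it can be tried without touching a real installation.

```shell
#!/bin/sh
# Write one RegionServer host name per line into the given file.
write_regionservers() {
    target=$1
    shift
    printf '%s\n' "$@" > "$target"
}

# Use the real conf directory when HBASE_CONF_DIR is set, else a scratch directory.
CONF_DIR=${HBASE_CONF_DIR:-$(mktemp -d)}
write_regionservers "$CONF_DIR/regionservers" hadoop-slave01 hadoop-slave02
cat "$CONF_DIR/regionservers"
```

To write the real file, run it with HBASE_CONF_DIR=/usr/local/hbase-0.94.17/conf.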

IV: Starting and stopping
bin/start-hbase.sh
bin/stop-hbase.sh
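After start-hbase.sh returns, an HMaster process should be running on the master and an HRegionServer process on each host listed in regionservers. One way to check is to filter the output of the JDK's jps tool, as in this sketch; hbase_daemons is an illustrative helper name.

```shell
#!/bin/sh
# Filter jps-style output ("<pid> <ClassName>") down to HBase daemons.
hbase_daemons() {
    awk '$2 == "HMaster" || $2 == "HRegionServer" {print $2}'
}

# jps ships with the JDK; skip the check quietly when it is not on PATH.
if command -v jps >/dev/null 2>&1; then
    jps | hbase_daemons
fi
```

An empty result on a node that should host a daemon usually means the process exited at startup, in which case the files under the HBase logs directory are the next place to look.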

V: Testing

bin/hbase shell

HBase Shell; enter 'help' for list of supported commands.

Type "exit" to leave the HBase Shell

Version 0.94.12, r1524863, Fri Sep 20 04:44:41 UTC 2013

hbase(main):001:0>

Create a table named tbTest with a single column family, testF. You can list all tables to check that it was created, and then insert some values.

hbase(main):003:0> create 'tbTest', 'testF'
0 row(s) in 1.2200 seconds
hbase(main):003:0> list
tbTest
1 row(s) in 0.0550 seconds
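Inserting and reading back a few values can be scripted by writing HBase shell commands to a file and passing that file to hbase shell. In this sketch the row keys, the column qualifiers a and b, and the values are made up for illustration; only the table tbTest and the column family testF come from the steps above.

```shell
#!/bin/sh
# Write a few HBase shell commands to a script file.
script=$(mktemp)
cat > "$script" <<'EOF'
put 'tbTest', 'row1', 'testF:a', 'value1'
put 'tbTest', 'row2', 'testF:b', 'value2'
scan 'tbTest'
exit
EOF

# Run the script only where the hbase launcher is on PATH.
if command -v hbase >/dev/null 2>&1; then
    hbase shell "$script"
fi
```

The trailing exit makes the shell return to the command line after the script finishes instead of staying at the interactive prompt.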

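VI: Monitoring

The following web interfaces can be used to access and monitor the running state of the Hadoop and HBase daemons: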

Component	Daemon	Default port	Configuration parameter
HDFS	NameNode	50070	dfs.http.address
HDFS	DataNodes	50075	dfs.datanode.http.address
HDFS	SecondaryNameNode	50090	dfs.secondary.http.address
HDFS	Backup/Checkpoint node*	50105	dfs.backup.http.address
MR	JobTracker	50030	mapred.job.tracker.http.address
MR	TaskTrackers	50060	mapred.task.tracker.http.address
HBase	HMaster	60010	hbase.master.info.port
HBase	HRegionServer	60030	hbase.regionserver.info.port
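The HBase ports in the table can be turned into URLs and optionally probed from the command line. The sketch below prints the HMaster and RegionServer UI addresses for the hosts used in this guide. The PROBE variable and the helper names are hypothetical; with PROBE=1 the script also tries each URL with curl.

```shell
#!/bin/sh
# Build the web UI URL for a host and port.
ui_url() {
    printf 'http://%s:%s/\n' "$1" "$2"
}

# Try to fetch a URL: -s silent, -o discard the body, --max-time bound the wait.
probe() {
    curl -s -o /dev/null --max-time 3 "$1"
}

# HMaster and RegionServer info ports from the table above.
for target in hadoop-master:60010 hadoop-slave01:60030 hadoop-slave02:60030; do
    url=$(ui_url "${target%%:*}" "${target##*:}")
    if [ "${PROBE:-0}" != 1 ]; then
        echo "$url"
    elif probe "$url"; then
        echo "up:   $url"
    else
        echo "down: $url"
    fi
done
```

Probing is opt-in so that the script stays quick and harmless on machines that cannot resolve the cluster host names.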
