[HBase] HBase Environment Setup and Basic Usage

1. HBase Architecture

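2. HBase Features

HBase is a database built on Hadoop for storing and retrieving data. Compared with an RDBMS, HBase can store massive data sets, with row counts reaching into the hundreds of millions, and supports near-real-time retrieval, with lookups completing within seconds. Because HBase sits on top of HDFS, it inherits the advantages of HDFS: data is kept in multiple replicas, so it is well protected against loss, and it runs on ordinary commodity PCs or servers, whereas RDBMS servers tend to be expensive.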

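3. HBase Table Design

HBase is a column-oriented (more precisely, column-family-oriented) database and also a NoSQL database (NoSQL = Not Only SQL). Each column can hold multiple versions of a value, and every row in a table has a unique identifier, the rowkey, which serves as that row's primary key.
Each piece of data is laid out as: rowkey + column family + column + timestamp : value => cell. A cell stores its contents as a byte array; the utility class Bytes converts between byte arrays and other types.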
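
Since every cell is just a byte array, the same logical value can be stored in different ways depending on how it is encoded. The small sketch below needs no HBase at all; it uses standard shell tools to show the bytes that the text value 25 occupies when it is written as a string, assuming ASCII/UTF-8 encoding. The helper name to_hex is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the bytes of a string as hex, two digits per byte.
to_hex() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

to_hex 25        # prints 3235: the characters '2' (0x32) and '5' (0x35)
to_hex zhangsan  # prints 7a68616e6773616e: one byte per ASCII letter
```

A value written as a string therefore reads back as text, not as a number. In Java client code, Bytes.toBytes(25) would instead produce the 4-byte big-endian integer 00 00 00 19, so the reader has to know which encoding the writer used.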

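4. Installing HBase

(1) Go to the /opt/software/ directory and upload the HBase installation package to the virtual machine.
(2) Grant execute permission on the HBase installation package:
software]$ chmod u+x hbase-0.98.6-hadoop2-bin.tar.gz
(3) Extract the HBase installation package:
software]$ tar -zxf hbase-0.98.6-hadoop2-bin.tar.gz -C /opt/modules/
(4) Go to the /opt/modules/hadoop-2.5.0 directory and start the NameNode and DataNode.
(5) Edit the configuration file /opt/modules/hbase-0.98.6-hadoop2/conf/hbase-site.xml.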

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop-senior.ibeifeng.com:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop-senior.ibeifeng.com</value>
</property>
</configuration>
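
The host and port in hbase.rootdir have to match the NameNode address that Hadoop itself is configured with, so it can be worth reading the value back after editing. The following is a minimal sketch, not a general XML parser: it assumes the one-tag-per-line layout used above, and the helper name get_prop is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> that follows a given <name>.
#   $1 = path to a Hadoop-style XML config file, $2 = property name
# Assumes <name> and <value> each sit on their own line, as in the file above.
get_prop() {
    awk -v want="$2" '
        index($0, "<name>" want "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")     # drop everything up to the opening tag
            sub(/<\/value>.*/, "")   # drop the closing tag and anything after
            print
            exit
        }
    ' "$1"
}
```

For example, get_prop /opt/modules/hbase-0.98.6-hadoop2/conf/hbase-site.xml hbase.rootdir should print the hdfs:// URL set above. It can be compared with fs.defaultFS in Hadoop's core-site.xml, which in a standard Hadoop 2.x layout would be at /opt/modules/hadoop-2.5.0/etc/hadoop/core-site.xml (that path is an assumption, not taken from this setup).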

(6) Edit the configuration file /opt/modules/hbase-0.98.6-hadoop2/conf/hbase-env.sh.

export JAVA_HOME=/opt/modules/jdk1.7.0_67
# export HBASE_MANAGES_ZK=true

(7) Edit the configuration file /opt/modules/hbase-0.98.6-hadoop2/conf/regionservers.

hadoop-senior.ibeifeng.com

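(8) Go to the /opt/modules/hbase-0.98.6-hadoop2/lib directory. HBase 0.98.6 ships with Hadoop 2.2.0 jars by default, so swap them for the Hadoop version I use, hadoop-2.5.0. Delete every Hadoop 2.2.0 jar in the lib directory (all jars whose names start with hadoop), upload the hadoop-2.5.0 versions, and replace zookeeper-3.4.6.jar with zookeeper-3.4.5.jar: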

lib]$ rm -rf ./hadoop-annotations-2.2.0.jar
lib]$ rm -rf ./hadoop-auth-2.2.0.jar
lib]$ rm -rf ./hadoop-common-2.2.0.jar
lib]$ rm -rf ./hadoop-hdfs-2.2.0.jar
lib]$ rm -rf ./hadoop-mapreduce-client-app-2.2.0.jar
lib]$ rm -rf ./hadoop-mapreduce-client-common-2.2.0.jar
lib]$ rm -rf ./hadoop-mapreduce-client-core-2.2.0.jar
lib]$ rm -rf ./hadoop-mapreduce-client-jobclient-2.2.0.jar
lib]$ rm -rf ./hadoop-mapreduce-client-shuffle-2.2.0.jar
lib]$ rm -rf ./hadoop-yarn-api-2.2.0.jar
lib]$ rm -rf ./hadoop-yarn-client-2.2.0.jar
lib]$ rm -rf ./hadoop-yarn-common-2.2.0.jar
lib]$ rm -rf ./hadoop-yarn-server-common-2.2.0.jar
lib]$ rm -rf ./hadoop-yarn-server-nodemanager-2.2.0.jar
lib]$ rm -rf ./hadoop-client-2.2.0.jar
lib]$ rm -rf ./zookeeper-3.4.6.jar
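
Deleting and uploading the jars one at a time is easy to get wrong, so the Hadoop part of this step can also be scripted. The following is only a sketch under stated assumptions: the helper name swap_hadoop_jars is made up, and it assumes the replacement jars can be found somewhere under the local Hadoop installation directory. It handles the hadoop-* jars only; the ZooKeeper jar still has to be replaced by hand.

```shell
#!/bin/sh
# Hypothetical helper: replace each bundled hadoop-*-2.2.0.jar in an HBase lib
# directory with the same-named jar of another Hadoop version.
#   $1 = HBase lib directory, $2 = Hadoop home, $3 = target Hadoop version
swap_hadoop_jars() {
    lib_dir=$1; hadoop_home=$2; new_version=$3
    for old in "$lib_dir"/hadoop-*-2.2.0.jar; do
        [ -e "$old" ] || continue          # the glob matched nothing
        base=${old##*/}                    # strip the directory part
        base=${base%-2.2.0.jar}            # e.g. hadoop-common
        new=$(find "$hadoop_home" -name "$base-$new_version.jar" | head -n 1)
        if [ -n "$new" ]; then
            # Remove the old jar only after the new one was copied successfully.
            cp "$new" "$lib_dir"/ && rm -f "$old"
        else
            echo "no $new_version jar found for $base, keeping $old" >&2
        fi
    done
}
```

With the paths used in this article, the call would be swap_hadoop_jars /opt/modules/hbase-0.98.6-hadoop2/lib /opt/modules/hadoop-2.5.0 2.5.0. Unlike the manual commands above, a 2.2.0 jar with no 2.5.0 counterpart is reported and left in place rather than deleted, so it can be dealt with deliberately.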

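(9) Starting HBase, option one: go to the /opt/modules/hbase-0.98.6-hadoop2 directory and start the HBase processes using the ZooKeeper instance that HBase manages itself (we have already replaced zookeeper-3.4.6.jar with zookeeper-3.4.5.jar):
hbase-0.98.6-hadoop2]$ bin/start-hbase.sh
Check the HBase processes: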

hbase-0.98.6-hadoop2]$ jps
2813 HRegionServer
3162 Jps
2724 HMaster
2670 HQuorumPeer
2196 DataNode
2137 NameNode

(10) Starting HBase, option two: start the ZooKeeper we installed ourselves, then start the master and the regionserver separately:
zookeeper-3.4.5]$ bin/zkServer.sh start
hbase-0.98.6-hadoop2]$ bin/hbase-daemon.sh start master
hbase-0.98.6-hadoop2]$ bin/hbase-daemon.sh start regionserver
Check the HBase processes:
hbase-0.98.6-hadoop2]$ jps

6283 QuorumPeerMain
6483 Jps
6334 HMaster
2196 DataNode
2137 NameNode
6431 HRegionServer
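
Reading the jps listing by eye works for a single node, but the check can also be automated. The sketch below reads jps output on standard input and reports any expected daemon that is missing; the helper name check_daemons is made up for illustration, and the process names are the ones shown in the two listings above.

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output on stdin and print "missing: NAME"
# for every expected process name that does not appear in it.
# Returns non-zero if at least one name is missing.
check_daemons() {
    listing=$(cat)
    missing=0
    for name in "$@"; do
        # In jps output, column 1 is the pid and column 2 is the process name.
        if ! printf '%s\n' "$listing" |
             awk -v n="$name" '$2 == n { ok = 1 } END { exit !ok }'; then
            echo "missing: $name"
            missing=1
        fi
    done
    return $missing
}
```

For option one the call would be jps | check_daemons NameNode DataNode HMaster HRegionServer HQuorumPeer; for option two, HQuorumPeer is replaced by QuorumPeerMain, since ZooKeeper then runs as its own standalone process.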

(11) Stop the HBase processes:
hbase-0.98.6-hadoop2]$ bin/stop-hbase.sh

5. Basic HBase Usage

(1) Start the HBase shell:
hbase-0.98.6-hadoop2]$ bin/hbase shell
(2) List the tables in HBase:
hbase(main):001:0> list

TABLE                                                                                                                                 
2018-07-22 11:46:58,921 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
0 row(s) in 3.0660 seconds
=> []

(3) Create a table named user with the column family info:
hbase(main):002:0> create 'user','info'

0 row(s) in 0.6260 seconds
=> Hbase::Table - user

(4) Show the description of the user table:
hbase(main):003:0> describe 'user'

DESCRIPTION                                                                            ENABLED                                        
'user', {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICA true                                           
TION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL =                                                
> 'FOREVER', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false                                                
', BLOCKCACHE => 'true'}                                                                                                             
1 row(s) in 0.0700 seconds
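
The VERSIONS => '1' in the description above means this column family keeps only the latest value of each cell. To experiment with multiple versions, a column family can be created with a higher limit. The following is a sketch: the table name user_v and the function name versions_demo_script are made up for illustration, and the function only prints HBase shell commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print HBase shell commands that create a table whose
# 'info' family keeps up to 3 versions, write one cell twice, and read back
# all stored versions of that cell.
versions_demo_script() {
cat <<'EOF'
create 'user_v', {NAME => 'info', VERSIONS => 3}
put 'user_v', '10001', 'info:name', 'zhangsan'
put 'user_v', '10001', 'info:name', 'lisi'
get 'user_v', '10001', {COLUMN => 'info:name', VERSIONS => 3}
EOF
}
```

From the HBase directory, the commands can be run with versions_demo_script | bin/hbase shell. The final get should then list both values with their timestamps, whereas a plain get would return only the most recent one.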

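(5) Insert data into the user table. The table is user, the rowkey is 10001, the column family is info, the columns are name, age, sex, and address, and the cell value of the first put is zhangsan: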
hbase(main):004:0> put 'user','10001','info:name','zhangsan'
hbase(main):005:0> put 'user','10001','info:age','25'
hbase(main):006:0> put 'user','10001','info:sex','male'
hbase(main):007:0> put 'user','10001','info:address','shanghai'

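HBase offers three ways to query data:
1) Lookup by rowkey, using the get command; this is the fastest.
2) Range query, using the scan command restricted to a rowkey range; this is the most commonly used.
3) Full table scan, using a plain scan command; this is the slowest.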
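
Lookups by rowkey and range scans are fast because HBase keeps rows sorted by rowkey, compared byte by byte rather than as numbers. The sketch below imitates that ordering with a C-locale sort, without touching HBase; the helper name sort_rowkeys is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: sort keys byte-wise, the way HBase orders rowkeys.
sort_rowkeys() {
    LC_ALL=C sort
}

printf '%s\n' 10002 9 10001 10003 | sort_rowkeys
# 10001
# 10002
# 10003
# 9        <- sorts last: "9" is compared as a byte string, not as a number

printf '%s\n' 10002 00009 10001 10003 | sort_rowkeys
# 00009    <- zero-padding to a fixed width restores the numeric order
# 10001
# 10002
# 10003
```

This is why numeric rowkeys are usually padded to a fixed width: otherwise a range scan from 10001 onward would also pick up the row 9.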

(6) Query the row with rowkey 10001 in the user table:
hbase(main):008:0> get 'user','10001'

COLUMN                             CELL                                                                                               
info:address                      timestamp=1532231767144, value=shanghai                                                            
info:age                          timestamp=1532231729180, value=25                                                                  
info:name                         timestamp=1532231687833, value=zhangsan                                                            
info:sex                          timestamp=1532231746853, value=male                                                                
4 row(s) in 0.0300 seconds

Query only the name column of the row with rowkey 10001 in the user table:
hbase(main):009:0> get 'user','10001','info:name'

COLUMN                             CELL                                                                                               
info:name                         timestamp=1532231687833, value=zhangsan                                                            
1 row(s) in 0.0160 seconds

(7) Insert data for rowkey 10002:
hbase(main):010:0> put 'user','10002','info:name','wangwu'
hbase(main):011:0> put 'user','10002','info:age','30'
hbase(main):012:0> put 'user','10002','info:tel','25354212'
hbase(main):013:0> put 'user','10002','info:qq','232523551'
Full scan of the user table:
hbase(main):014:0> scan 'user'

ROW                                COLUMN CELL                                                                                        
10001                             column=info:address, timestamp=1532231767144, value=shanghai                                       
10001                             column=info:age, timestamp=1532231729180, value=25                                                 
10001                             column=info:name, timestamp=1532231687833, value=zhangsan                                          
10001                             column=info:sex, timestamp=1532231746853, value=male                                               
10002                             column=info:age, timestamp=1532232249589, value=30                                                 
10002                             column=info:name, timestamp=1532232223162, value=wangwu                                            
10002                             column=info:qq, timestamp=1532232294714, value=232523551                                           
10002                             column=info:tel, timestamp=1532232273419, value=25354212                                           
2 row(s) in 0.0450 seconds

(8) Insert data for rowkey 10003 into the user table:
hbase(main):015:0> put 'user','10003','info:name','zhaoliu'
(9) Restricted scan: query only the name and age columns of the user table:
hbase(main):016:0> scan 'user',{COLUMNS => ['info:name','info:age']}

ROW                                COLUMN CELL                                                                                        
10001                             column=info:age, timestamp=1532231729180, value=25                                                 
10001                             column=info:name, timestamp=1532231687833, value=zhangsan                                          
10002                             column=info:age, timestamp=1532232249589, value=30                                                 
10002                             column=info:name, timestamp=1532232223162, value=wangwu                                            
10003                             column=info:name, timestamp=1532232516020, value=zhaoliu                                           
3 row(s) in 0.0410 seconds

(10) Range query: scan the user table starting from rowkey 10002:
hbase(main):017:0> scan 'user', {STARTROW=>'10002'}

ROW                                COLUMN CELL                                                                                        
10002                             column=info:age, timestamp=1532232249589, value=30                                                 
10002                             column=info:name, timestamp=1532232223162, value=wangwu                                            
10002                             column=info:qq, timestamp=1532232294714, value=232523551                                           
10002                             column=info:tel, timestamp=1532232273419, value=25354212                                           
10003                             column=info:name, timestamp=1532232516020, value=zhaoliu                                           
2 row(s) in 0.0340 seconds
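
A scan with only STARTROW runs to the end of the table. Adding STOPROW bounds it on both sides; STARTROW is inclusive and STOPROW is exclusive. The sketch below builds such a command as text so it can be inspected before it is run; the helper name range_scan_cmd is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print an HBase shell scan bounded on both sides.
#   $1 = table, $2 = first rowkey (inclusive), $3 = stop rowkey (exclusive)
range_scan_cmd() {
    printf "scan '%s', {STARTROW => '%s', STOPROW => '%s'}\n" "$1" "$2" "$3"
}

range_scan_cmd user 10002 10003
# prints: scan 'user', {STARTROW => '10002', STOPROW => '10003'}
```

Piped into the shell from the HBase directory (range_scan_cmd user 10002 10003 | bin/hbase shell), this should return only the row 10002 from the table above, because 10003 is the excluded stop row.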

(11) Delete the name column (column family info) of the row with rowkey 10001 in the user table:
hbase(main):018:0> delete 'user','10001','info:name'
(12) Full scan of the user table:
hbase(main):019:0> scan 'user'

ROW                                COLUMN CELL                                                                                        
10001                             column=info:address, timestamp=1532231767144, value=shanghai                                       
10001                             column=info:age, timestamp=1532231729180, value=25                                                 
10001                             column=info:sex, timestamp=1532231746853, value=male                                               
10002                             column=info:age, timestamp=1532232249589, value=30                                                 
10002                             column=info:name, timestamp=1532232223162, value=wangwu                                            
10002                             column=info:qq, timestamp=1532232294714, value=232523551                                           
10002                             column=info:tel, timestamp=1532232273419, value=25354212                                           
10003                             column=info:name, timestamp=1532232516020, value=zhaoliu                                           
3 row(s) in 0.0340 seconds

(13) Delete all data for rowkey 10001 in the user table:
hbase(main):020:0> deleteall 'user','10001'
Full scan of the user table:
hbase(main):021:0> scan 'user'

ROW                                COLUMN CELL                                                                                        
10002                             column=info:age, timestamp=1532232249589, value=30                                                 
10002                             column=info:name, timestamp=1532232223162, value=wangwu                                            
10002                             column=info:qq, timestamp=1532232294714, value=232523551                                           
10002                             column=info:tel, timestamp=1532232273419, value=25354212                                           
10003                             column=info:name, timestamp=1532232516020, value=zhaoliu                                           
2 row(s) in 0.0230 seconds

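(14) Disable the user table:
hbase(main):022:0> disable 'user'
(15) Enable the user table:
hbase(main):023:0> enable 'user'
(16) Drop the user table. A table must be disabled before it can be dropped, so disable it again first:
hbase(main):024:0> disable 'user'
hbase(main):025:0> drop 'user'
(17) Exit the HBase shell:
hbase(main):026:0> exit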