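HBase Backup with ExportSnapshot or CopyTable

Posted by iAm333 on 2014-08-13

The article "HBase Backup: Import and Export" described how to copy tables between a master cluster and a slave cluster with HBase's built-in Export and Import tools. This post introduces a faster backup method than export and import: ExportSnapshot.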

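1. ExportSnapshot

Like Export, ExportSnapshot copies a table with a MapReduce job. Unlike Export, it exports a snapshot of the table. We can first export the snapshot data to the slave cluster with ExportSnapshot, then run the restore_snapshot command on the slave cluster to restore the snapshot, which replicates the table from the master cluster to the slave cluster. The steps are as follows:

1) Create a snapshot of the table on the master cluster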

$ cd $HBASE_HOME/  
$ bin/hbase shell  
2014-08-13 15:59:12,495 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.2-hadoop2, r1591526, Wed Apr 30 20:17:33 PDT 2014

hbase(main):001:0> snapshot 'test_table', 'test_table_snapshot'
0 row(s) in 0.3370 seconds

2) Export the snapshot data with the ExportSnapshot command

$ cd $HBASE_HOME/  
$ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot test_table_snapshot -copy-to hdfs://follow_cluster_namenode:8082/hbase

Here, test_table_snapshot is the name of the snapshot just created, and hdfs://follow_cluster_namenode:8082/hbase is the full path of the HBase root directory on the slave cluster's HDFS.

ExportSnapshot can also limit the number of mappers:

$ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot test_table_snapshot -copy-to hdfs://follow_cluster_namenode:8082/hbase -mappers n

The copy bandwidth can be limited as well:

$ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot test_table_snapshot -copy-to hdfs://follow_cluster_namenode:8082/hbase -mappers n -bandwidth 200

The example above limits the copy bandwidth to 200 MB/s.
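
The optional flags described above can be assembled by a small wrapper. The following is a minimal sketch, not part of HBase: the function name build_export_cmd is hypothetical, the mapper count of 4 is an arbitrary example value, and the function only prints the command line so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Hypothetical helper (not part of HBase): prints an ExportSnapshot command
# line without running it.
# Arguments: snapshot name, destination HBase root directory,
#            optional mapper count, optional bandwidth limit.
build_export_cmd() (
  snapshot="$1"
  dest="$2"
  mappers="${3:-}"
  bandwidth="${4:-}"
  cmd="bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot ${snapshot} -copy-to ${dest}"
  # Append each optional flag only when a value was supplied.
  if [ -n "${mappers}" ]; then cmd="${cmd} -mappers ${mappers}"; fi
  if [ -n "${bandwidth}" ]; then cmd="${cmd} -bandwidth ${bandwidth}"; fi
  printf '%s\n' "${cmd}"
)

# Example with the names used in this post; 4 mappers is an assumed value.
build_export_cmd test_table_snapshot hdfs://follow_cluster_namenode:8082/hbase 4 200
```

Printing the command instead of executing it keeps the sketch safe to try on a machine without HBase; on the master cluster the printed line can be run from $HBASE_HOME.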

ExportSnapshot prints a lot of output; part of it is shown below:

2014-08-13 16:08:26,318 INFO  [main] mapreduce.Job: Running job: job_1407910396081_0027
2014-08-13 16:08:33,494 INFO  [main] mapreduce.Job: Job job_1407910396081_0027 running in uber mode : false
2014-08-13 16:08:33,495 INFO  [main] mapreduce.Job:  map 0% reduce 0%
2014-08-13 16:08:41,567 INFO  [main] mapreduce.Job:  map 100% reduce 0%
2014-08-13 16:08:42,581 INFO  [main] mapreduce.Job: Job job_1407910396081_0027 completed successfully
2014-08-13 16:08:42,677 INFO  [main] mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=116030
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1386
		HDFS: Number of bytes written=988
		HDFS: Number of read operations=7
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Job Counters 
		Launched map tasks=1
		Rack-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=13518
		Total time spent by all reduces in occupied slots (ms)=0
	Map-Reduce Framework
		Map input records=1
		Map output records=0
		Input split bytes=174
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=23
		CPU time spent (ms)=1860
		Physical memory (bytes) snapshot=323575808
		Virtual memory (bytes) snapshot=1867042816
		Total committed heap usage (bytes)=1029177344
	org.apache.hadoop.hbase.snapshot.ExportSnapshot$Counter
		BYTES_COPIED=988
		BYTES_EXPECTED=988
		FILES_COPIED=1
	File Input Format Counters 
		Bytes Read=224
	File Output Format Counters 
		Bytes Written=0
2014-08-13 16:08:42,685 INFO  [main] snapshot.ExportSnapshot: Finalize the Snapshot Export
2014-08-13 16:08:42,697 INFO  [main] snapshot.ExportSnapshot: Verify snapshot validity
2014-08-13 16:08:42,698 INFO  [main] Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
2014-08-13 16:08:42,713 INFO  [main] snapshot.ExportSnapshot: Export Completed: test_table_snapshot

3) Restore the snapshot on the slave cluster

$ cd $HBASE_HOME/  
$ bin/hbase shell  
2014-08-13 16:16:13,817 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.2-hadoop2, r1591526, Wed Apr 30 20:17:33 PDT 2014

hbase(main):001:0> restore_snapshot 'test_table_snapshot'
0 row(s) in 16.4940 seconds

4) Check that the table was restored successfully

hbase(main):002:0> list
TABLE
test_table
1 row(s) in 1.0460 seconds

=> ["test_table"]

You can also verify the result with the scan or count command.
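
One way to make the count check scriptable is to pull the number out of the shell's "N row(s) in ... seconds" summary line, the same format seen in the transcripts above. The helper below is a hypothetical sketch (extract_row_count is not an HBase command), and it assumes the summary line looks the same for count as it does for the commands shown in this post.

```shell
#!/bin/sh
# Hypothetical helper: reads HBase shell output on stdin and prints the number
# from the last "N row(s) in ... seconds" summary line.
extract_row_count() {
  sed -n 's/^\([0-9][0-9]*\) row(s) in .*$/\1/p' | tail -n 1
}

# Demonstration on a summary line copied from the transcript above.
printf '%s\n' '1 row(s) in 1.0460 seconds' | extract_row_count
```

On each cluster, something like echo "count 'test_table'" | bin/hbase shell 2>/dev/null | extract_row_count would give a number to compare. Equal counts are only a quick sanity check and do not prove that the cell contents match.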

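Restoring a snapshot is usually fast. Export and Import need two MapReduce jobs, one to export and one to import, to copy a table, so ExportSnapshot is considerably faster by comparison.

2. CopyTable

First, look at the usage of the CopyTable command: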

$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable
Usage: CopyTable [general options] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>

Options:
 rs.class     hbase.regionserver.class of the peer cluster
              specify if different from current cluster
 rs.impl      hbase.regionserver.impl of the peer cluster
 startrow     the start row
 stoprow      the stop row
 starttime    beginning of the time range (unixtime in millis)
              without endtime means from starttime to forever
 endtime      end of the time range.  Ignored if no starttime specified.
 versions     number of cell versions to copy
 new.name     new table's name
 peer.adr     Address of the peer cluster given in the format
              hbase.zookeeer.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
 families     comma-separated list of families to copy
              To copy from cf1 to cf2, give sourceCfName:destCfName. 
              To keep the same name, just give "cfName"
 all.cells    also copy delete markers and deleted cells

Args:
 tablename    Name of the table to copy

Examples:
 To copy 'TestTable' to a cluster that uses replication for a 1 hour window:
 $ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 --peer.adr=server1,server2,server3:2181:/hbase --families=myOldCf:myNewCf,cf2,cf3 TestTable 
For performance consider the following general options:
-Dhbase.client.scanner.caching=100
-Dmapred.map.tasks.speculative.execution=false

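As the usage shows, CopyTable can restrict the copy to a time range, limit the number of cell versions, select column families, and specify the address of the slave cluster, among other options.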
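
The --starttime and --endtime options take Unix time in milliseconds, so a small amount of arithmetic is needed to express a window such as "the last hour". The sketch below shows that calculation; copytable_window_args is a hypothetical helper name, and it only prints the two options rather than running CopyTable.

```shell
#!/bin/sh
# Hypothetical helper: prints --starttime/--endtime (in milliseconds) for the
# window that ends at the given Unix time in seconds and spans the given
# number of hours.
copytable_window_args() (
  end_s="$1"
  hours="$2"
  end_ms=$(( end_s * 1000 ))
  start_ms=$(( end_ms - hours * 3600 * 1000 ))
  printf '%s\n' "--starttime=${start_ms} --endtime=${end_ms}"
)

# A 1 hour window ending at Unix time 1265878794 (seconds).
copytable_window_args 1265878794 1
```

To copy the last hour of test_table, the output could be spliced into a command such as bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable $(copytable_window_args "$(date +%s)" 1) --peer.adr=slave1,slave2,slave3:2181:/hbase test_table, where the peer address is the one used in this post.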

For the test_table table above, the following command performs the copy:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=slave1,slave2,slave3:2181:/hbase  test_table

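Note: before running the command above, create a table on the slave cluster with the same schema as test_table on the master cluster.

Part of the output of the command above: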
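
Creating the target table can be sketched as follows. The post does not say which column families test_table has, so the family name cf below is purely an assumption, and build_create_stmt is a hypothetical helper that only prints an HBase shell create statement.

```shell
#!/bin/sh
# Hypothetical helper: prints an HBase shell "create" statement for a table
# name followed by one or more column family names.
build_create_stmt() (
  table="$1"
  shift
  stmt="create '${table}'"
  for cf in "$@"; do
    stmt="${stmt}, '${cf}'"
  done
  printf '%s\n' "${stmt}"
)

# 'cf' is an assumed column family name; replace it with the real families.
build_create_stmt test_table cf
```

The printed statement can be piped into bin/hbase shell on the slave cluster. Running describe 'test_table' on the master cluster shows the actual families; this sketch does not carry over per-family attributes such as the number of versions or compression, which would have to be added by hand.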

2014-08-13 16:18:21,812 INFO  [main] mapreduce.Job: Running job: job_1407910396081_0062
2014-08-13 16:18:29,955 INFO  [main] mapreduce.Job: Job job_1407910396081_0062 running in uber mode : false
2014-08-13 16:18:29,957 INFO  [main] mapreduce.Job:  map 0% reduce 0%
2014-08-13 16:18:36,005 INFO  [main] mapreduce.Job:  map 100% reduce 0%
2014-08-13 16:18:37,029 INFO  [main] mapreduce.Job: Job job_1407910396081_0062 completed successfully
2014-08-13 16:18:37,137 INFO  [main] mapreduce.Job: Counters: 37
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=117527
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=88
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=1
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
	Job Counters 
		Launched map tasks=1
		Rack-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=9740
		Total time spent by all reduces in occupied slots (ms)=0
	Map-Reduce Framework
		Map input records=1
		Map output records=1
		Input split bytes=88
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=254
		CPU time spent (ms)=1810
		Physical memory (bytes) snapshot=345137152
		Virtual memory (bytes) snapshot=1841782784
		Total committed heap usage (bytes)=1029177344
	HBase Counters
		BYTES_IN_REMOTE_RESULTS=34
		BYTES_IN_RESULTS=34
		MILLIS_BETWEEN_NEXTS=254
		NOT_SERVING_REGION_EXCEPTION=0
		NUM_SCANNER_RESTARTS=0
		REGIONS_SCANNED=1
		REMOTE_RPC_CALLS=3
		REMOTE_RPC_RETRIES=0
		RPC_CALLS=3
		RPC_RETRIES=0
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=0

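You can then compare the table on the master cluster with the corresponding table on the slave cluster to check that the data is consistent.

Please credit the source when reposting: http://blog.csdn.net/iAm333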
