背景：現將hive 資料從aws遷移至阿里雲

按照網上的方法踩了大坑最後總結了區別

分兩種情況：

事務引數：TBLPROPERTIES ('transactional'='true')
#########建表開啟事務的情況下
1 方法一export/import
export table collection_server_have_trans  to '/tmp/collection_server_have_trans/';
scp 到另一個叢集下
import from '/tmp/collection_server_have_trans';
結果：讀不出資料
2 方法二：
直接get hdfs，複製到另一個叢集，然後get上傳，然後repair
結果，讀不出資料
3 將上述兩個辦法在一個叢集內做實驗是可以的，難道是跨叢集導致的問題？還是版本的原因呢？
#########建表不開啟事務的情況下
1 方法一export/import
export table collection_server_have_trans  to '/tmp/collection_server_have_trans/';
scp 到另一個叢集下
import from '/tmp/collection_server_have_trans';
結果：讀的出資料
2方法二：
直接複製分割槽，上傳 然後repair
結果，讀的出資料
3。將上述兩個辦法在一個叢集內做實驗也是可以的

透過上面的實驗對比不難得出結論：

在 開啟事務 的情況下：本叢集內export/import 複製hdfs的方法都可用，跨叢集不可使用該方法
在 不開啟事務 的情況下：本叢集內export/import 複製hdfs的方法都可用，跨叢集也可使用該方法

具體操作流程如下：

1.登入目前hive，匯出資料

源端匯出資料
export table collection_server to '/tmp/collection_server' ;
檢視
[hadoop@ip-172-31-33-52 tmp]$ hadoop fs -ls /tmp/collection_server
Found 7 items
-rwxrwxrwt   1 hadoop hadoop      12971 2018-08-21 07:37 /tmp/collection_server/_metadata
drwxrwxrwt   - hadoop hadoop          0 2018-08-21 07:37 /tmp/collection_server/partition_day=180816
drwxrwxrwt   - hadoop hadoop          0 2018-08-21 07:37 /tmp/collection_server/partition_day=180817
drwxrwxrwt   - hadoop hadoop          0 2018-08-21 07:37 /tmp/collection_server/partition_day=180818
drwxrwxrwt   - hadoop hadoop          0 2018-08-21 07:37 /tmp/collection_server/partition_day=180821
drwxrwxrwt   - hadoop hadoop          0 2018-08-21 07:37 /tmp/collection_server/partition_day=180823
drwxrwxrwt   - hadoop hadoop          0 2018-08-21 07:37 /tmp/collection_server/partition_day=180824

2.從hdfs中下載並複製至阿里雲hfds

hfds檔案系統下載到本地
[hadoop@ip-172-31-33-52 tmp]$ hadoop fs -get /tmp/collection_server /tmp/
複製：略
上傳至阿里雲hdfs檔案系統
[hadoop@ip-172-31-46-220 ~]$ hadoop fs -put collection_server /tmp/
檢視：
[hadoop@ip-172-31-46-220 ~]$ hadoop fs -ls /tmp/
Found 4 items
drwxr-xr-x   - hadoop hadoop          0 2018-08-21 07:47 /tmp/collection_server
drwxrwxrwx   - mapred mapred          0 2018-06-22 09:04 /tmp/hadoop-yarn
drwx-wx-wx   - hive   hadoop          0 2018-07-20 09:43 /tmp/hive
-rw-r--r--   1 hadoop hadoop         22 2018-06-27 05:40 /tmp/tt.txt

3.匯入新的阿里雲hive叢集中

hive> import from '/tmp/collection_server';
Copying data from hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180816
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180816/delta_0000001_0000010
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180816/delta_0000011_0000020
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180816/delta_0000021_0000030
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180816/delta_0000031_0000040
Copying data from hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000041_0000050
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000051_0000060
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000061_0000070
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000071_0000080
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000081_0000090
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000091_0000100
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000101_0000110
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000111_0000120
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000121_0000130
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000131_0000140
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000141_0000150
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000151_0000160
Copying data from hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180818
檢查：
hive> select count(*) from collection_server;
Query ID = hadoop_20180821074918_5ad9d0e5-9bce-4c52-b1dc-93ddd9ef766f
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1529658302761_18066)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED     30         30        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 7.60 s     
----------------------------------------------------------------------------------------------
OK
21570

更多遷移方法參考：

https://blog.csdn.net/u9999/article/details/78830818

【Hive】hive資料遷移

相關文章