Importing external data into HBase with ImportTsv
This article uses HBase's built-in ImportTsv tool to load data. The data file must first be uploaded to HDFS and then imported into an HBase table. ImportTsv only accepts TSV-formatted input, so the source txt file has to be converted to TSV first.
1. Download the data. This article uses the public "NOAA (U.S. National Oceanic and Atmospheric Administration) climate normals" dataset. Visit http://www1.ncdc.noaa.gov/pub/data/normals/1981-2010/ and download the hourly temperature data under the products | hourly directory: the hly-temp-10pctl.txt file.
2. Use a Python script to convert the txt file to TSV, then upload the result to /home/hadoop/data on the Linux VM. The script can be downloaded from https://github.com/uprush/hac-book/blob/master/2-data-migration/script/to_tsv_hly.py
D:\work\hbase學習>python to_tsv_hly.py -f hly-temp-10pctl.txt -t hly-temp-10pctl.tsv
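The core of the conversion can be sketched as follows. This is a hypothetical minimal version, not the actual to_tsv_hly.py; it assumes each input line is whitespace-separated as station id, month, day, then 24 hourly values, and that the HBase row key is the station id concatenated with zero-padded month and day (which would match row keys such as AQW000617050101 seen in the scan output later):

```python
# Hypothetical sketch of the txt -> TSV conversion (not the real
# to_tsv_hly.py). Assumes each input line is whitespace-separated:
# station id, month, day, then 24 hourly values. The row key is the
# station id plus zero-padded month and day.
def line_to_tsv(line):
    fields = line.split()
    station, month, day = fields[0], fields[1], fields[2]
    values = fields[3:3 + 24]
    row_key = "%s%02d%02d" % (station, int(month), int(day))
    return "\t".join([row_key] + values)
```

Each output line then carries 25 tab-separated fields: the row key plus the 24 hourly values.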
3. Create the data directory on HDFS
hadoop fs -mkdir /input
4. Copy the data file into the HDFS data directory
[hadoop@ora12c data]$ hadoop fs -copyFromLocal /home/hadoop/data/hly-temp-10pctl.tsv /input
[hadoop@ora12c data]$ hadoop fs -ls /input
Found 1 items
-rw-r--r-- 1 hadoop supergroup 22821236 2015-12-23 20:56 /input/hly-temp-10pctl.tsv
5. Create the target table in HBase
create 'hly_temp', {NAME => 't', VERSIONS => 1}
6. Run the import
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,t:v01,t:v02,t:v03,t:v04,t:v05,t:v06,t:v07,t:v08,t:v09,t:v10,t:v11,t:v12,t:v13,t:v14,t:v15,t:v16,t:v17,t:v18,t:v19,t:v20,t:v21,t:v22,t:v23,t:v24 hly_temp /input
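Before launching the job, it can help to verify that every TSV line has exactly as many tab-separated fields as the -Dimporttsv.columns mapping expects (HBASE_ROW_KEY plus t:v01 through t:v24, i.e. 25 fields); depending on the importtsv.skip.bad.lines setting, mismatched lines either fail the job or are silently skipped. A minimal sanity check:

```python
# Sanity-check a TSV file against the ImportTsv column mapping above:
# every line should have exactly 25 tab-separated fields
# (HBASE_ROW_KEY + t:v01..t:v24).
EXPECTED_COLUMNS = 25

def count_bad_lines(lines):
    return sum(
        1 for line in lines
        if len(line.rstrip("\n").split("\t")) != EXPECTED_COLUMNS
    )
```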
7. Verify the data after the import
hbase(main):002:0> count 'hly_temp'
Current count: 1000, row: FMW000405040927
Current count: 2000, row: GQW000414150624
Current count: 3000, row: RMW000407100321
..................................................................
Current count: 166000, row: USW000949101017
166805 row(s) in 35.6090 seconds
=> 166805
hbase(main):004:0> scan 'hly_temp', {COLUMNS => 't', LIMIT => 2}
ROW COLUMN+CELL
AQW000617050101 column=t:v01, timestamp=1450924204703, value=759P
AQW000617050101 column=t:v02, timestamp=1450924204703, value=766C
AQW000617050101 column=t:v03, timestamp=1450924204703, value=759C
AQW000617050101 column=t:v04, timestamp=1450924204703, value=759C
AQW000617050101 column=t:v05, timestamp=1450924204703, value=759C
AQW000617050101 column=t:v06, timestamp=1450924204703, value=759C
AQW000617050101 column=t:v07, timestamp=1450924204703, value=752C
AQW000617050101 column=t:v08, timestamp=1450924204703, value=775C
AQW000617050101 column=t:v09, timestamp=1450924204703, value=801C
AQW000617050101 column=t:v10, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v11, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v12, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v13, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v14, timestamp=1450924204703, value=808C
AQW000617050101 column=t:v15, timestamp=1450924204703, value=806C
AQW000617050101 column=t:v16, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v17, timestamp=1450924204703, value=808C
AQW000617050101 column=t:v18, timestamp=1450924204703, value=801C
AQW000617050101 column=t:v19, timestamp=1450924204703, value=801C
AQW000617050101 column=t:v20, timestamp=1450924204703, value=790C
AQW000617050101 column=t:v21, timestamp=1450924204703, value=781C
AQW000617050101 column=t:v22, timestamp=1450924204703, value=781C
AQW000617050101 column=t:v23, timestamp=1450924204703, value=770C
AQW000617050101 column=t:v24, timestamp=1450924204703, value=770C
AQW000617050102 column=t:v01, timestamp=1450924204703, value=768P
AQW000617050102 column=t:v02, timestamp=1450924204703, value=766C
AQW000617050102 column=t:v03, timestamp=1450924204703, value=759C
AQW000617050102 column=t:v04, timestamp=1450924204703, value=759C
AQW000617050102 column=t:v05, timestamp=1450924204703, value=759C
AQW000617050102 column=t:v06, timestamp=1450924204703, value=759C
AQW000617050102 column=t:v07, timestamp=1450924204703, value=757C
AQW000617050102 column=t:v08, timestamp=1450924204703, value=775C
AQW000617050102 column=t:v09, timestamp=1450924204703, value=801C
AQW000617050102 column=t:v10, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v11, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v12, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v13, timestamp=1450924204703, value=811C
AQW000617050102 column=t:v14, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v15, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v16, timestamp=1450924204703, value=808C
AQW000617050102 column=t:v17, timestamp=1450924204703, value=808C
AQW000617050102 column=t:v18, timestamp=1450924204703, value=801C
AQW000617050102 column=t:v19, timestamp=1450924204703, value=795C
AQW000617050102 column=t:v20, timestamp=1450924204703, value=790C
AQW000617050102 column=t:v21, timestamp=1450924204703, value=781C
AQW000617050102 column=t:v22, timestamp=1450924204703, value=781C
AQW000617050102 column=t:v23, timestamp=1450924204703, value=774C
AQW000617050102 column=t:v24, timestamp=1450924204703, value=770C
2 row(s) in 0.4910 seconds
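The cell values above are stored as raw strings. Per the dataset's readme (an assumption on my part, not stated in the original article), a value such as "759P" is the temperature in tenths of a degree Fahrenheit followed by a one-letter completeness flag. A small helper to decode them:

```python
# Decode a NOAA hourly-normals cell such as "759P": the digits are the
# temperature in tenths of a degree Fahrenheit, the trailing letter is
# a completeness flag. (Assumed format, per the dataset's readme.)
def parse_value(cell):
    return int(cell[:-1]) / 10.0, cell[-1]
```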
At this point, the import has completed successfully.
Source: "ITPUB Blog", link: http://blog.itpub.net/10972173/viewspace-1944769/ (cite the source when reposting).