採用importtsv匯入外部資料到hbase中
採用hbase自帶的importtsv工具來匯入資料,首先要把資料檔案上傳到hdfs上,然後匯入hbase表,該方法只能匯入tsv格式的資料,需要先將txt格式轉換為tsv格式.
1.下載資料。本文中使用 “美國國家海洋和大氣管理局 氣候平均值”的公共資料集合。訪問http://www1.ncdc.noaa.gov/pub/data/normals/1981-2010/下載。 在目錄 products | hourly 下的小時溫度資料。下載hly-temp-10pctl.txt檔案。
2.用python指令碼將txt檔案轉換為tsv格式檔案,生成之後上傳到虛擬機器linux下的/home/hadoop/data下,python指令碼下載地址https://github.com/uprush/hac-book/blob/master/2-data-migration/script/to_tsv_hly.py
D:\work\hbase學習>python to_tsv_hly.py -f hly-temp-10pctl.txt -t hly-temp-10pctl.tsv
3.在hdfs上建立資料存放目錄
hadoop fs -mkdir /input
4.將資料庫copy到hdfs資料存放目錄中
[hadoop@ora12c data]$ hadoop fs -copyFromLocal /home/hadoop/data/hly-temp-10pctl.tsv /input
[hadoop@ora12c data]$ hadoop fs -ls /input
Found 1 items
-rw-r--r-- 1 hadoop supergroup 22821236 2015-12-23 20:56 /input/hly-temp-10pctl.tsv
5.在hbase中建立要匯入資料的表
create 'hly_temp', {NAME => 't', VERSIONS => 1}
6.執行資料匯入
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,t:v01,t:v02,t:v03,t:v04,t:v05,t:v06,t:v07,t:v08,t:v09,t:v10,t:v11,t:v12,t:v13,t:v14,t:v15,t:v16,t:v17,t:v18,t:v19,t:v20,t:v21,t:v22,t:v23,t:v24 hly_temp /input
7.匯入後檢查資料
hbase(main):002:0> count 'hly_temp'
Current count: 1000, row: FMW000405040927
Current count: 2000, row: GQW000414150624
Current count: 3000, row: RMW000407100321
..................................................................
Current count: 166000, row: USW000949101017
166805 row(s) in 35.6090 seconds
=> 166805
hbase(main):004:0> scan 'hly_temp', {COLUMNS => 't', LIMIT => 2}
ROW COLUMN+CELL
AQW000617050101 column=t:v01, timestamp=1450924204703, value=759P
AQW000617050101 column=t:v02, timestamp=1450924204703, value=766C
AQW000617050101 column=t:v03, timestamp=1450924204703, value=759C
AQW000617050101 column=t:v04, timestamp=1450924204703, value=759C
AQW000617050101 column=t:v05, timestamp=1450924204703, value=759C
AQW000617050101 column=t:v06, timestamp=1450924204703, value=759C
AQW000617050101 column=t:v07, timestamp=1450924204703, value=752C
AQW000617050101 column=t:v08, timestamp=1450924204703, value=775C
AQW000617050101 column=t:v09, timestamp=1450924204703, value=801C
AQW000617050101 column=t:v10, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v11, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v12, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v13, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v14, timestamp=1450924204703, value=808C
AQW000617050101 column=t:v15, timestamp=1450924204703, value=806C
AQW000617050101 column=t:v16, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v17, timestamp=1450924204703, value=808C
AQW000617050101 column=t:v18, timestamp=1450924204703, value=801C
AQW000617050101 column=t:v19, timestamp=1450924204703, value=801C
AQW000617050101 column=t:v20, timestamp=1450924204703, value=790C
AQW000617050101 column=t:v21, timestamp=1450924204703, value=781C
AQW000617050101 column=t:v22, timestamp=1450924204703, value=781C
AQW000617050101 column=t:v23, timestamp=1450924204703, value=770C
AQW000617050101 column=t:v24, timestamp=1450924204703, value=770C
AQW000617050102 column=t:v01, timestamp=1450924204703, value=768P
AQW000617050102 column=t:v02, timestamp=1450924204703, value=766C
AQW000617050102 column=t:v03, timestamp=1450924204703, value=759C
AQW000617050102 column=t:v04, timestamp=1450924204703, value=759C
AQW000617050102 column=t:v05, timestamp=1450924204703, value=759C
AQW000617050102 column=t:v06, timestamp=1450924204703, value=759C
AQW000617050102 column=t:v07, timestamp=1450924204703, value=757C
AQW000617050102 column=t:v08, timestamp=1450924204703, value=775C
AQW000617050102 column=t:v09, timestamp=1450924204703, value=801C
AQW000617050102 column=t:v10, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v11, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v12, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v13, timestamp=1450924204703, value=811C
AQW000617050102 column=t:v14, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v15, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v16, timestamp=1450924204703, value=808C
AQW000617050102 column=t:v17, timestamp=1450924204703, value=808C
AQW000617050102 column=t:v18, timestamp=1450924204703, value=801C
AQW000617050102 column=t:v19, timestamp=1450924204703, value=795C
AQW000617050102 column=t:v20, timestamp=1450924204703, value=790C
AQW000617050102 column=t:v21, timestamp=1450924204703, value=781C
AQW000617050102 column=t:v22, timestamp=1450924204703, value=781C
AQW000617050102 column=t:v23, timestamp=1450924204703, value=774C
AQW000617050102 column=t:v24, timestamp=1450924204703, value=770C
2 row(s) in 0.4910 seconds
至此,匯入成功!
1.下載資料。本文中使用 “美國國家海洋和大氣管理局 氣候平均值”的公共資料集合。訪問http://www1.ncdc.noaa.gov/pub/data/normals/1981-2010/下載。 在目錄 products | hourly 下的小時溫度資料。下載hly-temp-10pctl.txt檔案。
2.用python指令碼將txt檔案轉換為tsv格式檔案,生成之後上傳到虛擬機器linux下的/home/hadoop/data下,python指令碼下載地址https://github.com/uprush/hac-book/blob/master/2-data-migration/script/to_tsv_hly.py
D:\work\hbase學習>python to_tsv_hly.py -f hly-temp-10pctl.txt -t hly-temp-10pctl.tsv
3.在hdfs上建立資料存放目錄
hadoop fs -mkdir /input
4.將資料庫copy到hdfs資料存放目錄中
[hadoop@ora12c data]$ hadoop fs -copyFromLocal /home/hadoop/data/hly-temp-10pctl.tsv /input
[hadoop@ora12c data]$ hadoop fs -ls /input
Found 1 items
-rw-r--r-- 1 hadoop supergroup 22821236 2015-12-23 20:56 /input/hly-temp-10pctl.tsv
5.在hbase中建立要匯入資料的表
create 'hly_temp', {NAME => 't', VERSIONS => 1}
6.執行資料匯入
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,t:v01,t:v02,t:v03,t:v04,t:v05,t:v06,t:v07,t:v08,t:v09,t:v10,t:v11,t:v12,t:v13,t:v14,t:v15,t:v16,t:v17,t:v18,t:v19,t:v20,t:v21,t:v22,t:v23,t:v24 hly_temp /input
7.匯入後檢查資料
hbase(main):002:0> count 'hly_temp'
Current count: 1000, row: FMW000405040927
Current count: 2000, row: GQW000414150624
Current count: 3000, row: RMW000407100321
..................................................................
Current count: 166000, row: USW000949101017
166805 row(s) in 35.6090 seconds
=> 166805
hbase(main):004:0> scan 'hly_temp', {COLUMNS => 't', LIMIT => 2}
ROW COLUMN+CELL
AQW000617050101 column=t:v01, timestamp=1450924204703, value=759P
AQW000617050101 column=t:v02, timestamp=1450924204703, value=766C
AQW000617050101 column=t:v03, timestamp=1450924204703, value=759C
AQW000617050101 column=t:v04, timestamp=1450924204703, value=759C
AQW000617050101 column=t:v05, timestamp=1450924204703, value=759C
AQW000617050101 column=t:v06, timestamp=1450924204703, value=759C
AQW000617050101 column=t:v07, timestamp=1450924204703, value=752C
AQW000617050101 column=t:v08, timestamp=1450924204703, value=775C
AQW000617050101 column=t:v09, timestamp=1450924204703, value=801C
AQW000617050101 column=t:v10, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v11, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v12, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v13, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v14, timestamp=1450924204703, value=808C
AQW000617050101 column=t:v15, timestamp=1450924204703, value=806C
AQW000617050101 column=t:v16, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v17, timestamp=1450924204703, value=808C
AQW000617050101 column=t:v18, timestamp=1450924204703, value=801C
AQW000617050101 column=t:v19, timestamp=1450924204703, value=801C
AQW000617050101 column=t:v20, timestamp=1450924204703, value=790C
AQW000617050101 column=t:v21, timestamp=1450924204703, value=781C
AQW000617050101 column=t:v22, timestamp=1450924204703, value=781C
AQW000617050101 column=t:v23, timestamp=1450924204703, value=770C
AQW000617050101 column=t:v24, timestamp=1450924204703, value=770C
AQW000617050102 column=t:v01, timestamp=1450924204703, value=768P
AQW000617050102 column=t:v02, timestamp=1450924204703, value=766C
AQW000617050102 column=t:v03, timestamp=1450924204703, value=759C
AQW000617050102 column=t:v04, timestamp=1450924204703, value=759C
AQW000617050102 column=t:v05, timestamp=1450924204703, value=759C
AQW000617050102 column=t:v06, timestamp=1450924204703, value=759C
AQW000617050102 column=t:v07, timestamp=1450924204703, value=757C
AQW000617050102 column=t:v08, timestamp=1450924204703, value=775C
AQW000617050102 column=t:v09, timestamp=1450924204703, value=801C
AQW000617050102 column=t:v10, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v11, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v12, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v13, timestamp=1450924204703, value=811C
AQW000617050102 column=t:v14, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v15, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v16, timestamp=1450924204703, value=808C
AQW000617050102 column=t:v17, timestamp=1450924204703, value=808C
AQW000617050102 column=t:v18, timestamp=1450924204703, value=801C
AQW000617050102 column=t:v19, timestamp=1450924204703, value=795C
AQW000617050102 column=t:v20, timestamp=1450924204703, value=790C
AQW000617050102 column=t:v21, timestamp=1450924204703, value=781C
AQW000617050102 column=t:v22, timestamp=1450924204703, value=781C
AQW000617050102 column=t:v23, timestamp=1450924204703, value=774C
AQW000617050102 column=t:v24, timestamp=1450924204703, value=770C
2 row(s) in 0.4910 seconds
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/10972173/viewspace-1944769/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- SQLServer匯出匯入資料到MySQLServerMySql
- Python批量匯入Excel資料到MySQLPythonExcelMySql
- 資料匯入終章:如何將HBase的資料匯入HDFS?
- 利用Excel匯入資料到SAP C4CExcel
- 使用Excel匯入資料到SAP Cloud for Customer系統ExcelCloud
- fdw批次匯入外部表
- 如何將外部資料庫 匯入到系統的SQL中資料庫SQL
- pandas的外部資料匯入與常用方法
- spark sql與mysql 資料載入與匯出資料到mysqlSparkMySql
- Sqoop匯出ClickHouse資料到HiveOOPHive
- Java 匯入資料到Excel並提供檔案下載介面JavaExcel
- c# 採用datatable 快速匯入資料至MSSQL的方法分享C#SQL
- Hive資料匯入HBase引起資料膨脹引發的思考Hive
- 從CSV檔案匯入資料到Analytics Cloud裡建立模型和StoryCloud模型
- Vue匯出資料到Excel電子表格VueExcel
- 【oracle 多種形式的外部表匯入、匯出】實驗Oracle
- 使用Logstash工具匯入sqlserver資料到elasticSearch及elk分散式日誌中心SQLServerElasticsearch分散式
- GoldenGate實時投遞資料到大資料平臺(7)– Apache HbaseGo大資料Apache
- IDEA如何匯入外部依賴的jar包IdeaJAR
- waterdrop匯出hdfs資料到clickhouse(text,csv,json)JSON
- 高效整合:聚水潭採購入庫資料匯入MySQL案例MySql
- C#快速匯出百萬級資料到Excel方法C#Excel
- 利用跳板機連線mysql,匯出資料到csvMySql
- MongoDB--Mongodb 中資料匯出與匯入MongoDB
- Eclipse下匯入外部jar包的最簡單方法EclipseJAR
- MongoDB日常運維-07遠端匯出資料到execlMongoDB運維
- 如何通過 Excel import 的方式匯入測試資料到 SAP Commerce Cloud 伺服器ExcelImportCloud伺服器
- Flume:資料匯入到hdfs中
- 用Navicat把SQLServer資料匯入MySQLServerMySql
- sqoop資料匯入匯出OOP
- Oracle 資料匯入匯出Oracle
- 資料泵匯出匯入
- Oracle資料匯入匯出Oracle
- phpMyAdmin匯入/匯出資料PHP
- ASP.NET 匯出gridview中的資料到Excel表中,並對指定單元格換行操作ASP.NETViewExcel
- MySQL入門--匯出和匯入資料MySql
- Python匯出資料到Excel表格-NotImplementedError: formatting_info=True not yet implementedPythonExcelErrorORM
- 一條資料HBase之旅,簡明HBase入門教程開篇
- 用DataX導資料到Clickhouse遇到的坑