A First Look at HBase

Posted by 541732025 on 2014-03-10
string hbaseCluster = "https://charju.azurehdinsight.net";
string hadoopUsername = "<your name>";
string hadoopPassword = "<your password>";

ClusterCredentials creds = new ClusterCredentials(new Uri(hbaseCluster), hadoopUsername, hadoopPassword);
var hbaseClient = new HBaseClient(creds);

// In a WinForms app, GetVersion never returns a response
var version = hbaseClient.GetVersion();

Console.WriteLine(Convert.ToString(version));

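Code first. This one was a nasty trap: the code above will not run in a WinForms app, but it works fine in a console application. (It cost me several days.)

In WinForms, debugging with WinDbg showed that when GetVersion is called, the main thread starts a Task and then waits for that Task to complete. Early in the Task's execution (within roughly the first minute), another thread sits waiting on a WaitHandle; after a while that thread disappears. The main thread then starts its retry calls, and after that nothing more happens.

Anyway, in a console application the code is OK.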
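One possible explanation, which the WinDbg session above does not confirm, is the usual sync-over-async deadlock: the UI thread blocks on a Task whose continuations are posted back to that same UI thread. A minimal sketch of a workaround under that assumption is to push the blocking call onto a thread-pool thread and await it from the UI. OffUiThread, button1_Click and label1 are hypothetical names; only GetVersion comes from the code above.

```csharp
using System;
using System.Threading.Tasks;

public static class OffUiThread
{
    // Runs a blocking call on a thread-pool thread. Thread-pool threads have no
    // UI SynchronizationContext, so async continuations inside the call do not
    // need the (blocked) UI thread in order to complete.
    public static Task<T> Run<T>(Func<T> blockingCall)
    {
        if (blockingCall == null) throw new ArgumentNullException(nameof(blockingCall));
        return Task.Run<T>(blockingCall);
    }
}

// Hypothetical WinForms usage (assumes an existing hbaseClient field):
//
// private async void button1_Click(object sender, EventArgs e)
// {
//     var version = await OffUiThread.Run(() => hbaseClient.GetVersion());
//     label1.Text = Convert.ToString(version);
// }
```

If the hang has a different cause (proxy settings, for instance), this wrapper will not help; it only removes the UI thread from the wait.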

My example uses the Sina API to fetch stock quotes, for example http://hq.sinajs.cn/list=sz000977,sh600718. I call it once per second and flush the data into HBase.
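To make the data source concrete, here is a minimal sketch of how one line of that response might be split into fields. It assumes each line looks like var hq_str_<code>="f0,f1,...,f31"; with the name in field 0, a price in each of fields 1 to 5, and the date and time in fields 30 and 31. That layout is an assumption about the endpoint, not something this post verifies, and SinaQuoteParser and its methods are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class SinaQuoteParser
{
    // Returns the comma-separated fields found between the first and the last
    // double quote of one response line.
    public static string[] SplitFields(string line)
    {
        int start = line.IndexOf('"');
        int end = line.LastIndexOf('"');
        if (start < 0 || end <= start)
        {
            throw new FormatException("No quoted payload found in: " + line);
        }
        return line.Substring(start + 1, end - start - 1).Split(',');
    }

    // Parses a numeric field with the invariant culture so that "12.5" is read
    // the same way regardless of the machine's regional settings.
    public static double ParsePrice(string[] fields, int index)
    {
        return double.Parse(fields[index], CultureInfo.InvariantCulture);
    }

    // Assumed layout: field 30 is the date (yyyy-MM-dd), field 31 the time (HH:mm:ss).
    public static DateTime ParseTransactionTime(string[] fields)
    {
        return DateTime.ParseExact(
            fields[30] + " " + fields[31],
            "yyyy-MM-dd HH:mm:ss",
            CultureInfo.InvariantCulture);
    }
}
```

A caller would split each line once, then fill the entity's properties from the field indexes it trusts.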

 

The stock entity class is defined as follows:

public class StockEntity
{
    public string Name { get; set; }
    public double TodayOpeningPrice { get; set; }
    public double YesterdayClosingPrice { get; set; }
    public double CurrentPrice { get; set; }
    public double TodayMaxPrice { get; set; }
    public double TodayMinPrice { get; set; }
    public double BidPriceBuy { get; set; }
    public double BidPriceSell { get; set; }
    public int FixtureNumber { get; set; }
    public double FixtureAmount { get; set; }
    public int Buy1Number { get; set; }
    public double Buy1Price { get; set; }
    public int Buy2Number { get; set; }
    public double Buy2Price { get; set; }
    public int Buy3Number { get; set; }
    public double Buy3Price { get; set; }
    public int Buy4Number { get; set; }
    public double Buy4Price { get; set; }
    public int Buy5Number { get; set; }
    public double Buy5Price { get; set; }
    public int Sell1Number { get; set; }
    public double Sell1Price { get; set; }
    public int Sell2Number { get; set; }
    public double Sell2Price { get; set; }
    public int Sell3Number { get; set; }
    public double Sell3Price { get; set; }
    public int Sell4Number { get; set; }
    public double Sell4Price { get; set; }
    public int Sell5Number { get; set; }
    public double Sell5Price { get; set; }

    public DateTime TransactionTime { get; set; }
}

Once the data has been pulled down, it is handed to a thread-pool worker thread, which writes it to HBase:

ThreadPool.QueueUserWorkItem(new WaitCallback(SaveStockDataToHbase), se);

 

The code that does the actual work is as follows:

 1 private void SaveStockDataToHbase(object state)
 2         {
 3             StockEntity se = state as StockEntity;
 4 
 5             // Insert data into the HBase table.
 6             string rowKey = Guid.NewGuid().ToString();
 7 
 8             CellSet cellSet = new CellSet();
 9             CellSet.Row cellSetRow = new CellSet.Row { key = Encoding.UTF8.GetBytes(rowKey) };
10             cellSet.rows.Add(cellSetRow);
11 
12 
13             Type t = typeof(StockEntity);
14 
15             foreach (string colname in stockEntityColumns)
16             {
17                 var pi = t.GetProperty(colname);
18                 object val = pi.GetValue(se);
19 
20                 Cell value = new Cell { column = Encoding.UTF8.GetBytes("charju:" + colname), data = Encoding.UTF8.GetBytes(Convert.ToString(val)) };
21                 cellSetRow.values.Add(value);
22             }
23 
24             try
25             {
26                 hbaseClient.StoreCells(hbaseStockTableName, cellSet);
27             }
28             catch (Exception ex)
29             {
30                 Console.WriteLine(ex.Message);
31             }
32         }

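Lines 6~10 create a new row. Lines 17~20 use reflection to read the value of each property of the entity class (otherwise I would have to write a pile of repetitive code) and wrap it in a Cell. Line 21 adds that cell to the row.

Line 26 is where the data actually goes into HBase.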
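The stockEntityColumns collection used in line 15 is not shown in this post. One way it could be built, as a sketch only, is by reflecting over the entity's public properties. SampleEntity is a trimmed stand-in for StockEntity so the example is self-contained, and ColumnNames is a hypothetical helper.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Trimmed stand-in for StockEntity, used only to keep the example self-contained.
public class SampleEntity
{
    public string Name { get; set; }
    public double CurrentPrice { get; set; }
    public DateTime TransactionTime { get; set; }
}

public static class ColumnNames
{
    // Returns the names of all public instance properties of T.
    // Note: GetProperties does not guarantee any particular order.
    public static string[] For<T>()
    {
        return typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Select(p => p.Name)
            .ToArray();
    }
}
```

Computing the array once (for example in a static readonly field) avoids repeating the reflection lookup for every quote that is saved.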

 

In line 20 above you may have noticed charju: that is the name of my column family. Looking back, here is how the table was created in HBase:

string hbaseCluster = "https://charju.azurehdinsight.net";
string hadoopUsername = "<your name>";
string hadoopPassword = "<your password>";
string hbaseStockTableName = "StockInformation";
HBaseClient hbaseClient;

public void CreateHbaseTable()
{
    // Create a new HBase table: StockInformation
    TableSchema stockTableSchema = new TableSchema();
    stockTableSchema.name = hbaseStockTableName;
    stockTableSchema.columns.Add(new ColumnSchema() { name = "charju" });
    hbaseClient.CreateTable(stockTableSchema);
}

 

The hbaseClient itself is instantiated here:

ClusterCredentials creds = new ClusterCredentials(new Uri(hbaseCluster), hadoopUsername, hadoopPassword);
hbaseClient = new HBaseClient(creds);

 

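Once the data has been written, there are a few ways to check it. One is to configure the HBase cluster to allow RDP, remote in, and run hbase shell commands; unfortunately RDP kept failing from my virtual machine and I don't know why. The second way is to query it through Hive.

After connecting to the HBase cluster's website, first create the corresponding table in the Hive editor page: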

CREATE EXTERNAL TABLE StockInformation(rowkey STRING, TodayOpeningPrice STRING, YesterdayClosingPrice STRING, CurrentPrice STRING, TodayMaxPrice STRING, TodayMinPrice STRING, BidPriceBuy STRING, BidPriceSell STRING, FixtureNumber STRING, FixtureAmount STRING, Buy1Number STRING, Buy1Price STRING, Buy2Number STRING, Buy2Price STRING, Buy3Number STRING, Buy3Price STRING, Buy4Number STRING, Buy4Price STRING, Buy5Number STRING, Buy5Price STRING, Sell1Number STRING, Sell1Price STRING, Sell2Number STRING, Sell2Price STRING, Sell3Number STRING, Sell3Price STRING, Sell4Number STRING, Sell4Price STRING, Sell5Number STRING, Sell5Price STRING, TransactionTime STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,charju:TodayOpeningPrice,charju:YesterdayClosingPrice,charju:CurrentPrice,charju:TodayMaxPrice,charju:TodayMinPrice,charju:BidPriceBuy,charju:BidPriceSell,charju:FixtureNumber,charju:FixtureAmount,charju:Buy1Number,charju:Buy1Price,charju:Buy2Number,charju:Buy2Price,charju:Buy3Number,charju:Buy3Price,charju:Buy4Number,charju:Buy4Price,charju:Buy5Number,charju:Buy5Price,charju:Sell1Number,charju:Sell1Price,charju:Sell2Number,charju:Sell2Price,charju:Sell3Number,charju:Sell3Price,charju:Sell4Number,charju:Sell4Price,charju:Sell5Number,charju:Sell5Price,charju:TransactionTime')
TBLPROPERTIES ('hbase.table.name' = 'StockInformation');
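The mapping string is easy to get wrong by hand: Hive's HBase storage handler treats whitespace between entries as part of the column name, so a stray space before a comma produces a qualifier that matches nothing. A small sketch of generating the string from the column names instead follows; HiveMapping.Build is a hypothetical helper, not part of any SDK.

```csharp
using System;
using System.Linq;

public static class HiveMapping
{
    // Builds the value for 'hbase.columns.mapping': ":key" for the row key,
    // then family:qualifier for each column, joined by commas with no spaces.
    public static string Build(string family, params string[] columns)
    {
        var entries = new[] { ":key" }
            .Concat(columns.Select(c => family + ":" + c.Trim()));
        return string.Join(",", entries);
    }
}
```

The columns must be passed in the same order as the columns declared in the CREATE EXTERNAL TABLE statement, because Hive pairs the two lists positionally.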

Once the table has been created, you can run SQL against it, for example:

select * from StockInformation where buy1number=9800 order by transactiontime

That is today's largest single buy order for Xiaolang (小浪). Of course, queries such as select count(0) work fine too.

Useful links:

https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hbase-tutorial-get-started/
