Big Data Architecture: Configuring Storage and Indexing with HBase and Solr

Posted by wulantian on 2015-01-23


2014-08-22 11:04, 王安琪, cnblogs

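HBase can send requests to Solr through a coprocessor, and Solr then applies the matching index operations (add, delete, update) to the data it receives. This combines HBase's capacity for very large data volumes with Solr's fast search, and both HBase and Solr can run as clusters. It offers one way to store and search massive data sets: storage and indexing live on different machines, which the author regards as essential for big data architecture.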


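HBase can use a coprocessor to send requests to Solr, and Solr can then add, delete, or update index entries for the data it receives. Placing storage and indexing on different machines is essential in big data architecture. Many people are still unfamiliar with this approach and find the idea novel, but it is definitely the right direction, so if it is new to you, now is a good time to learn it.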

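A friend left a comment on that earlier post of mine saying that CDH can do the same thing. I have not tried it yet. He also asked me for the related code, so I tidied it up a little, and it makes up the main content of this article. I will try CDH as soon as I can; if you know about it, please leave me a comment.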

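Below I walk through the code I wrote to add data to HBase, with an HBase coprocessor attached, while testing the performance of HBase and Solr, along with some explanation.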

1. Writing the HBase Coprocessor

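As soon as postPut fires for new data, the corresponding Solr core is updated. The code uses ConcurrentUpdateSolrServer, which is what keeps Solr's indexing throughput high. When you use it, do not forget to configure autoCommit in Solr.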

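/*
 * Copyright: 王安琪
 * Description: watches HBase and sends data to Solr on every postPut;
 *              this class must be added to HBase as a trigger (coprocessor)
 * Modified: 2014-05-27
 * Change: created
 */
package solrHbase.test;

import java.io.UnsupportedEncodingException;

// The original elided its remaining imports ("import ***;"). The list below
// assumes HBase 0.94-era and SolrJ 4.x APIs, which match the signatures used here.
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SorlIndexCoprocessorObserver extends BaseRegionObserver {
    private static final Logger LOG = LoggerFactory
            .getLogger(SorlIndexCoprocessorObserver.class);
    private static final String solrUrl = "http://192.1.11.108:80/solr/core1";
    // Arguments: Solr core URL, queue size, number of sender threads.
    private static final SolrServer solrServer = new ConcurrentUpdateSolrServer(
            solrUrl, 10000, 20);

    /**
     * Builds the Solr index entry for every Put written to the region.
     *
     * @throws UnsupportedEncodingException
     */
    @Override
    public void postPut(final ObserverContext<RegionCoprocessorEnvironment> e,
            final Put put, final WALEdit edit, final boolean writeToWAL)
            throws UnsupportedEncodingException {
        inputSolr(put);
    }

    public void inputSolr(Put put) {
        try {
            solrServer.add(TestSolrMain.getInputDoc(put));
        } catch (Exception ex) {
            // Log the full stack trace, not just the message; a failed
            // index update must not fail the HBase write itself.
            LOG.error("Failed to send Put to Solr", ex);
        }
    }
}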
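To make the role of the two numeric constructor arguments (queue size and thread count) more concrete, here is a minimal conceptual sketch of a bounded queue drained by background threads. It is not SolrJ's implementation: BufferedSender and all of its members are hypothetical names, documents are plain strings, and the "send" step is only a counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Conceptual model only; BufferedSender is a hypothetical class, not part of SolrJ.
class BufferedSender {
    // Stop marker, compared by identity so it can never collide with a real document.
    private static final String STOP = new String("stop");

    private final BlockingQueue<String> queue;
    private final List<Thread> workers = new ArrayList<Thread>();
    private final AtomicInteger sent = new AtomicInteger();

    BufferedSender(int queueSize, int threadCount) {
        queue = new LinkedBlockingQueue<String>(queueSize);
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    drain();
                }
            });
            t.start();
            workers.add(t);
        }
    }

    // Returns at once while the queue has room; blocks when it is full.
    void add(String doc) {
        put(doc);
    }

    // Queues one stop marker per worker, waits for all workers to exit,
    // and returns the number of documents they handled.
    int shutdownAndCount() {
        for (int i = 0; i < workers.size(); i++) {
            put(STOP);
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sent.get();
    }

    private void put(String item) {
        try {
            queue.put(item);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private void drain() {
        try {
            for (String doc = queue.take(); doc != STOP; doc = queue.take()) {
                sent.incrementAndGet(); // a real client would send an update request here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The point of the design is that the caller (here, the coprocessor's postPut hook) only pays for an enqueue, so the HBase write path is not held up by each index request unless the queue fills.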

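Note: getInputDoc is the heart of this coprocessor. It converts the contents of an HBase Put into the values Solr needs. In the expression String fieldName = key.substring(key.indexOf(columnFamily) + 3, key.indexOf("我在這")).trim(); the placeholder 我在這 ("I am here") marks the position of an unprintable character that cannot be displayed on this page, so please watch out for it.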

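public static SolrInputDocument getInputDoc(Put put) {
    SolrInputDocument doc = new SolrInputDocument();
    // The HBase row key becomes the Solr document ID.
    doc.addField("test_ID", Bytes.toString(put.getRow()));
    for (KeyValue c : put.getFamilyMap().get(Bytes.toBytes(columnFamily))) {
        String value = Bytes.toString(c.getValue());
        if (value.isEmpty()) {
            continue; // skip empty cells
        }
        // Use the column qualifier as the Solr field name. The original
        // sliced it out of Bytes.toString(c.getKey()) with an unprintable
        // delimiter character that does not survive on this page, so this
        // version reads the qualifier directly via KeyValue.getQualifier().
        String fieldName = Bytes.toString(c.getQualifier()).trim();
        doc.addField(fieldName, value);
    }
    return doc;
}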
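The unprintable character mentioned above comes from the binary layout of the key. As a rough illustration, assuming the KeyValue key layout of HBase versions from that period (2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type), the sketch below builds such a key by hand and extracts the qualifier by offsets. KeyLayout, buildKey, and qualifierOf are hypothetical helper names, not HBase APIs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that mimics the assumed key layout; not an HBase class.
class KeyLayout {
    // Layout: [2-byte row length][row][1-byte family length][family]
    //         [qualifier][8-byte timestamp][1-byte type]
    static byte[] buildKey(String row, String family, String qualifier,
            long timestamp, byte type) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(2 + r.length + 1 + f.length + q.length + 8 + 1);
        buf.putShort((short) r.length).put(r);
        buf.put((byte) f.length).put(f);
        buf.put(q).putLong(timestamp).put(type);
        return buf.array();
    }

    // Finds the qualifier by walking the length prefixes, without decoding
    // the whole key as text.
    static String qualifierOf(byte[] key) {
        int rowLength = ByteBuffer.wrap(key).getShort();
        int familyLengthPos = 2 + rowLength;
        int familyLength = key[familyLengthPos];
        int start = familyLengthPos + 1 + familyLength;
        int end = key.length - 8 - 1; // strip timestamp and type
        return new String(key, start, end - start, StandardCharsets.UTF_8);
    }
}
```

Because the timestamp and type bytes follow the qualifier directly, decoding the whole key as a string leaves non-text bytes after the column name, which is why string slicing on the decoded key is fragile.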

2. Writing the Test Program Entry Point (main)

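This code asks HBase to create a table and then continuously submits batches of simulated data to it, printing a timestamp after each batch so that insert performance can be measured.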

/*
 * Copyright: 王安琪
 * Description: tests HBaseInsert, i.e. HBase insert performance
 * Modified: 2014-05-27
 * Change: created
 */
package solrHbase.test;

import hbaseInput.HbaseInsert;

// The original elided its remaining imports ("import ***;"); these are the
// ones the code below needs, assuming HBase 0.94-era client APIs.
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class TestHBaseMain {
    private static Configuration config;
    private static String tableName = "angelHbase";
    private static HTable table = null;
    private static final String columnFamily = "wanganqi";

    /**
     * @param args
     */
    public static void main(String[] args) {
        config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "192.103.101.104");
        HbaseInsert.createTable(config, tableName, columnFamily);
        try {
            table = new HTable(config, Bytes.toBytes(tableName));
            // One writer thread only: HTable is not safe to share across threads.
            for (int k = 0; k < 1; k++) {
                Thread t = new Thread() {
                    public void run() {
                        // SimpleDateFormat avoids the pitfalls of raw Calendar
                        // fields (zero-based MONTH, 12-hour HOUR).
                        SimpleDateFormat fmt = new SimpleDateFormat(
                                "yyyy-MM-dd'T'HH:mm:ss.SSS");
                        for (int i = 0; i < 100000; i++) {
                            HbaseInsert.inputData(table,
                                    PutCreater.createPuts(1000, columnFamily));
                            // Rows written so far, including this batch.
                            System.out.println(fmt.format(new Date())
                                    + " written: " + (i + 1) * 1000);
                        }
                    }
                };
                t.start();
            }
        } catch (IOException e1) {
            e1.printStackTrace();
        }
    }
}
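Each progress line in the test pairs a timestamp with a running row count, and throughput can be derived from two such readings. The following is a minimal sketch of both steps using only the JDK; ProgressLog, line, and rowsPerSecond are hypothetical names, and formatting in UTC is an assumption made so the output does not depend on the machine's time zone.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for progress output; not part of the original test code.
class ProgressLog {
    // Formats a progress line such as "<UTC timestamp> written: <rows>".
    static String line(long epochMillis, long rowsWritten) {
        // SimpleDateFormat is not thread-safe, so create one per call.
        SimpleDateFormat fmt = new SimpleDateFormat(
                "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(epochMillis)) + " written: " + rowsWritten;
    }

    // Converts a row count and an elapsed time in nanoseconds to rows per second.
    static double rowsPerSecond(long rows, long elapsedNanos) {
        return rows / (elapsedNanos / 1e9);
    }
}
```

In a test loop, the elapsed time would typically come from two System.nanoTime() readings taken before the first batch and after the last one.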

The HBase-related operations below are wrapped in a single class. It contains only the code for creating the table and inserting data.

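/*
 * Copyright: 王安琪
 * Description: HBase-related operations: creating the table and inserting data
 * Modified: 2014-05-27
 * Change: created
 */
package hbaseInput;

// The original elided most imports ("import ***;"); these are the ones the
// code below needs, assuming HBase 0.94-era client APIs.
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

public class HbaseInsert {
    // Creates the table with one column family if it does not exist yet.
    public static void createTable(Configuration config, String tableName,
            String columnFamily) {
        HBaseAdmin hBaseAdmin = null;
        try {
            hBaseAdmin = new HBaseAdmin(config);
            if (hBaseAdmin.tableExists(tableName)) {
                return;
            }
            HTableDescriptor tableDescriptor = new HTableDescriptor(tableName);
            tableDescriptor.addFamily(new HColumnDescriptor(columnFamily));
            hBaseAdmin.createTable(tableDescriptor);
        } catch (IOException e) {
            // Also covers MasterNotRunningException and
            // ZooKeeperConnectionException, both IOException subclasses.
            e.printStackTrace();
        } finally {
            // Close the admin connection on every path, including the
            // early return when the table already exists.
            if (hBaseAdmin != null) {
                try {
                    hBaseAdmin.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    // Writes one batch of Puts, flushes it, and empties the list for reuse.
    public static void inputData(HTable table, List<Put> puts) {
        try {
            table.put(puts);
            table.flushCommits();
            puts.clear();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}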

3. Generating the Simulated Put Data

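Writing data to HBase requires constructing Put objects. Below is how I build the simulated Puts. For the string values, I randomly pick words from the words.dic dictionary that ships with mmseg and join them into a sentence. That part is not shown in the code below, but it is easy; just generate whatever data you need.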

public static Put createPut(String columnFamily) {
    String ss = getSentence();
    byte[] family = Bytes.toBytes(columnFamily);
    // Random non-negative number as the row key.
    byte[] rowKey = Bytes.toBytes("" + Math.abs(r.nextLong()));
    Put put = new Put(rowKey);
    put.add(family, Bytes.toBytes("DeviceID"),
            Bytes.toBytes("" + Math.abs(r.nextInt())));
    // ****** (further columns elided in the original)
    // Store the generated sentence itself, not the string literal "ss".
    put.add(family, Bytes.toBytes("Company_mmsegsm"), Bytes.toBytes(ss));
    return put;
}
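The getSentence helper is not shown in the original. As one possible shape for it, here is a minimal sketch that picks words at random from an in-memory list and concatenates them. SentenceMaker, its parameters, and the choice to pass the word list and Random explicitly are all assumptions; loading words.dic is left out because its file format is not described in the article.

```java
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for the author's getSentence(); not the original code.
class SentenceMaker {
    // Joins wordCount randomly chosen words with no separator, as is usual
    // for Chinese text.
    static String getSentence(List<String> words, int wordCount, Random random) {
        StringBuilder sentence = new StringBuilder();
        for (int i = 0; i < wordCount; i++) {
            sentence.append(words.get(random.nextInt(words.size())));
        }
        return sentence.toString();
    }
}
```

Passing a seeded Random makes the generated data reproducible between test runs, which can help when comparing insert timings.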

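Before running the program above, of course, you need to configure the required fields in Solr. Installing and configuring HBase and Solr, and their basic usage, will be covered in later articles. Here, simply give the Solr fields the same names as the columns in the Put produced by createPut; dynamic fields work as well.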

4. Testing Solr Performance Directly

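If you do not want to test HBase and Solr together and only want to test Solr's performance on its own, it is even simpler: the code segments above can be reused with a little assembly.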

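private static void sendConcurrentUpdateSolrServer(final String url,
        final int count) throws SolrServerException, IOException {
    ConcurrentUpdateSolrServer solrServer = new ConcurrentUpdateSolrServer(url, 10000, 20);
    for (int i = 0; i < count; i++) {
        solrServer.add(getInputDoc(PutCreater.createPut(columnFamily)));
    }
    // add() only queues documents; wait for the sender threads to drain the
    // queue so that timing covers the whole run, then release the threads.
    solrServer.blockUntilFinished();
    solrServer.shutdown();
}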

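I hope this helps. Strict standards, thorough effort (規格嚴格，功夫到家). This article is again a little heavy on code, but code is the best language for explaining ideas. My advice is to keep comments to a minimum and simplify your code as far as you can, so that it is clear enough to read almost like pseudocode, which is also what the book Refactoring advocates.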

Original link: http://www.cnblogs.com/wgp13x/p/3927979.html
