HBase Modeling, Usage, and Optimization

Posted by 友德 on 2018-07-11

Original URL: https://flycode.co/archives/185610

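Basic principles for creating HBase tables

Common rowKey design problems and their solutions

Modeling case study: modeling item click counts in e-commerce

Using the HBase client

HBase optimization

A reference solution for continuous paging queries in HBase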

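Purpose of this talk:

Use HBase sensibly, build on top of it the features that HBase itself does not provide, and improve HBase's runtime efficiency.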

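I. Basic principles for creating HBase tables

1. Controlling the number of column families:

Do not create too many; no more than two is recommended. In most cases a single family per table is best: it keeps the multiple Stores under one Region from affecting each other during memstore flushes and avoids unnecessary I/O.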

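2. rowKey design:

Do not write consecutive rowKeys; scatter them as much as possible. Consecutive keys create write hotspots on a single region, which unbalances the load across regionServers, keeps the cluster from reaching its high-concurrency write throughput, and in extreme cases takes a regionServer offline.
As long as the business requirements are met, keep the rowKey as short as possible. The rowKey is stored with every cell, so shorter keys save both memory and disk space.
Choose a rowKey separator: when concatenating several business fields, join them with a fixed separator such as # or $ (this makes prefix range queries easy while still avoiding hotspots).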

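3. Controlling the number of versions:

Unless the business specifically requires otherwise, keep a single version, that is, set both the maximum and the minimum number of versions to 1.

4. Controlling the expiration time (TTL):

Set the TTL according to actual business needs. To save storage, the shorter the retention that still satisfies the business, the better; data that must be kept permanently is the exception.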

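II. Common rowKey design problems and their solutions

1. Region hotspots caused by consecutive rowKeys:

Solutions:

Reverse the entire rowKey. This mainly suits serial (log-like) data; prefix range queries become unavailable.

           Before reversal: 20170227204355331 and 20170227204355339 (same region)
           After reversal:  13355340272207102 and 93355340272207102 (different regions)

Reverse part of the rowKey. This mainly suits serial data that carries a distinguishing field, and range queries on part of the prefix still work.

           Before reversal: 20170227204355TNG and 20170227204355TFF (same region)
           After reversal:  GNT20170227204355 and FFT20170227204355 (different regions)

Take an MD5 hash of the rowKey; adjacent values produce completely different MD5 hashes.
Use a UUID directly, which scatters the keys completely.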
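
The first three techniques above can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names (RowKeyScatter, reverse, partialReverse, md5Hex) are hypothetical, keys are treated as strings, and the suffix length for the partial reversal is supplied by the caller.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative helpers (hypothetical names) for scattering sequential row keys. */
final class RowKeyScatter {

    /** Reverses the whole key, e.g. a timestamp-style serial number. */
    static String reverse(String key) {
        return new StringBuilder(key).reverse().toString();
    }

    /** Moves the last suffixLen characters to the front, reversed; the rest keeps its order. */
    static String partialReverse(String key, int suffixLen) {
        int split = key.length() - suffixLen;
        String suffix = new StringBuilder(key.substring(split)).reverse().toString();
        return suffix + key.substring(0, split);
    }

    /** Returns the MD5 digest of the key as 32 lowercase hex characters. */
    static String md5Hex(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to ship MD5, so this is not expected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

With the article's sample keys, reverse("20170227204355331") yields 13355340272207102 and partialReverse("20170227204355TNG", 3) yields GNT20170227204355. A hashed key gives up range scans entirely, so it fits lookups by exact key only.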

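2. Duplicate rowKeys, that is, the business data has no unique field:

Append a random string suffix;
Append a timestamp suffix, ordered ascending or descending by time as needed;
Note: when adding a prefix or suffix, define a separator to simplify business-side processing. Most keys consist of upper- and lower-case letters and digits, so common separators such as $ and # work well.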
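
A timestamp suffix could look like the following minimal sketch. The names (RowKeySuffix, oldestFirst, newestFirst) are hypothetical, the separator is assumed to be '#', and timestamps are assumed to be non-negative epoch milliseconds. Zero-padding to a fixed width keeps string order identical to numeric order, and subtracting from Long.MAX_VALUE flips that order so the newest row sorts first.

```java
/** Illustrative suffix builders (hypothetical names) that make a non-unique key unique. */
final class RowKeySuffix {

    static final char SEPARATOR = '#';

    /** Ascending by time: older rows sort before newer rows. */
    static String oldestFirst(String businessKey, long epochMillis) {
        return businessKey + SEPARATOR + String.format("%019d", epochMillis);
    }

    /** Descending by time: newer rows sort before older rows. */
    static String newestFirst(String businessKey, long epochMillis) {
        return businessKey + SEPARATOR + String.format("%019d", Long.MAX_VALUE - epochMillis);
    }
}
```

Two writes within the same millisecond would still collide, so a random or sequence suffix may be needed in addition.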

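III. Modeling case study: modeling item click counts in e-commerce

Requirement: count the clicks a given item received over the most recent week.
Table design:

Near-real-time statistics table at one-minute granularity: rowKey = itemId + '#' + 'yyyyMMddHHmm'
Historical click total table: rowKey = itemId
Approximate most-recent-week click total table: rowKey = itemId

The business layer runs a scheduled update every half hour:
1. New weekly clicks = clicks in the most recent half hour - clicks in the half hour 7 days ago + previous weekly total
2. Historical clicks += clicks in the most recent half hour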
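
The rowKey layout and the two half-hourly update rules can be written out as a small sketch. The class and method names are hypothetical, and the sketch covers only the arithmetic; reading the half-hour sums from the per-minute table is left out.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Illustrative arithmetic (hypothetical names) for the click-count tables. */
final class ClickCounters {

    private static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    /** Row key of the per-minute table: itemId + '#' + yyyyMMddHHmm. */
    static String minuteRowKey(String itemId, LocalDateTime time) {
        return itemId + '#' + time.format(MINUTE);
    }

    /** Rule 1: slide the weekly window forward by one half hour. */
    static long rollWeekly(long previousWeekly, long latestHalfHour, long halfHourSevenDaysAgo) {
        return previousWeekly + latestHalfHour - halfHourSevenDaysAgo;
    }

    /** Rule 2: the all-time total only ever grows. */
    static long rollTotal(long previousTotal, long latestHalfHour) {
        return previousTotal + latestHalfHour;
    }
}
```

For example, a weekly total of 1000 with 30 new clicks and 20 clicks falling out of the window becomes 1010.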

IV. Using the HBase client

1 Native client API

1.1 Connecting

Configuration conf = HBaseConfiguration.create();
// Set the HBase ZooKeeper quorum and other parameters on conf
HTable table = new HTable(conf, "test.testTable");

1.2 Single operations

// Put
Put put = new Put(Bytes.toBytes("row1"));
put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"), Bytes.toBytes("val1"));
put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"), Bytes.toBytes("val1"));
table.put(put);

// Delete
Delete delete = new Delete(Bytes.toBytes("row2"));
table.delete(delete);

// Get
Get get = new Get(Bytes.toBytes("row1"));
Result result = table.get(get);

1.3 Batch operations

List<Put> puts = new ArrayList<Put>(2);
Put put2 = new Put(Bytes.toBytes("row2"));
put2.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"), Bytes.toBytes("val2"));
put2.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"), Bytes.toBytes("val2"));
Put put3 = new Put(Bytes.toBytes("row3"));
put3.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"), Bytes.toBytes("val3"));
put3.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"), Bytes.toBytes("val3"));
puts.add(put2);
puts.add(put3);
table.put(puts);

1.4 Atomic increment

// Atomically add 3 to the counter stored in colfam1:qual3 of row1
long result = table.incrementColumnValue(Bytes.toBytes("row1"), Bytes.toBytes("colfam1"),
        Bytes.toBytes("qual3"), 3);

1.5 Using filters

PageFilter pageFilter = new PageFilter(1);
Scan scan = new Scan();
scan.setFilter(pageFilter);
Long start = System.currentTimeMillis();
String rowKey = hbaseTemplate.find(tableName, scan, new MinRowKeyMapperExtractor());

Note: avoid Filters unless there is no alternative. Some Filters are very slow to evaluate, so it is best to combine them with a rowKey range scan.
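
One common way to turn a prefix lookup into a bounded rowKey range scan is to compute the exclusive stop row from the prefix. The helper below is a sketch with hypothetical names (PrefixRange, stopRowForPrefix) that uses only the JDK; the resulting pair [prefix, stopRow) would then be passed as the start and stop rows of a Scan.

```java
import java.util.Arrays;

/** Illustrative helper (hypothetical name) that derives a scan range from a key prefix. */
final class PrefixRange {

    /**
     * Returns the smallest key that is greater than every key starting with prefix.
     * An empty array means "no upper bound", i.e. scan to the end of the table.
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // increment the last remaining byte
        return stop;
    }
}
```

For the prefix item42# the stop row is item42$, because '$' is the byte that follows '#'. The region servers then only read the rows inside that range instead of filtering the whole table.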

1.6 HBase admin API

Admin API: list all tables in an HBase cluster

HBaseAdmin admin = new HBaseAdmin(configuration);
HTableDescriptor[] htds = admin.listTables();

2 Integrating the HBase client with Spring

1. Add the Maven dependencies

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>0.98.6-hadoop2</version>
</dependency>
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-jpa</artifactId>
    <version>1.6.0.RELEASE</version>
</dependency>
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-hadoop</artifactId>
    <version>2.0.2.RELEASE</version>
</dependency>

2. Add the Spring HBase utility classes

HbaseConfigurationSpringFactoryBean (custom wrapper class)

/**
 * HBase configuration factory class
 * @author tiandesheng
 *
 */
public class HbaseConfigurationSpringFactoryBean implements InitializingBean, FactoryBean<Configuration> {
 
    private static Logger logger = LoggerFactory.getLogger(HbaseConfigurationSpringFactoryBean.class);
    
    private Configuration configuration;
    
    private Properties properties;
    
    public Configuration getObject() throws Exception {
        return configuration;
    }

    public Class<Configuration> getObjectType() {
        return Configuration.class;
    }

    public boolean isSingleton() {
        return true;
    }

    public void afterPropertiesSet() throws Exception {
        
        configuration = HBaseConfiguration.create();
        addProperties(configuration, properties);
        if (logger.isInfoEnabled()) {
            logger.info("HBase connection initialized!");
        }
    } 
    
    public void addProperties(Configuration configuration, Properties properties) {
        
        if (properties != null) {
            for (Entry<Object, Object> entry : properties.entrySet()) {
                String key = entry.getKey().toString();
                String value = entry.getValue().toString();
                configuration.set(key, value);
            }
        }
    }

    public void setConfiguration(Configuration configuration) {
        this.configuration = configuration;
    }

    public Configuration getConfiguration() {
        return configuration;
    }

    public void setProperties(Properties properties) {
        this.properties = properties;
    }
}

HTableInterfacePoolFactory (custom wrapper class)

/**
 * HTable factory backed by a table pool
 */
@SuppressWarnings("deprecation")
public class HTableInterfacePoolFactory implements HTableInterfaceFactory, DisposableBean, InitializingBean {

    private static final Logger        logger   = LoggerFactory.getLogger(HTableInterfacePoolFactory.class);

    private int                        poolSize = 50;
    private HTablePool                 pool     = null;
    private Configuration              configuration;
    private Map<String, AtomicInteger> initLock;
    
    public void releaseHTableInterface(HTableInterface table) {
        close(table);
    }
    
    public HTableInterface createHTableInterface(Configuration config, byte[] tableName) {
        
        if (tableName == null) {
            return null;
        }
        if (initLock != null) {
            AtomicInteger tlock = initLock.get(new String(tableName).trim());
            if (tlock != null && tlock.get() == 0) { 
                if (logger.isInfoEnabled()) {
                    logger.info("get Htable:{} connection lock", new String(tableName));
                }
                tlock.getAndAdd(1);
                return pool.getTable(tableName);
            }
        }
        return pool.getTable(tableName);
    }

    public void afterPropertiesSet() throws Exception {
        // Initialize the HTablePool
        if (pool == null) {
            pool = new HTablePool(configuration, poolSize);
            if (logger.isInfoEnabled()) {
                logger.info("HBase connection pool created and initialized!");
            }
        }
    }
    
    private void close(HTableInterface hTableInterface) {
        if(hTableInterface != null) {
            try {
                hTableInterface.close();
            } catch(Throwable t) {
                logger.error("close failed", t);
            }
        }
    }

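    public void destroy() throws Exception {
        
        // initLock is never assigned in this class, so guard against null
        if (initLock != null) {
            initLock.clear();
        }
        // Release the pooled tables so the shutdown message is accurate
        if (pool != null) {
            pool.close();
        }
        if (logger.isInfoEnabled()) {
            logger.info("HBase connection pool closed!");
        }
    }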

    public void setPoolSize(int poolSize) {
        this.poolSize = poolSize;
    }

    public void setConfiguration(Configuration configuration) {
        this.configuration = configuration;
    }
}

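HbaseTemplate (core class)

This class follows Spring's template design pattern and is used much like JdbcTemplate, JmsTemplate, and TransactionTemplate.
The HBase client can be integrated with Spring either through a Spring configuration file or in pure code. Either way, first set the HBase connection parameters, which must include the ZooKeeper quorum and the ZooKeeper port, then initialize objects of the three classes above in order. The resulting HbaseTemplate object is the one used to operate on HBase directly. The Spring configuration file and the pure-code approach are as follows: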

1. Spring configuration file approach:

<bean id="hBaseConfiguration" class="com.xxx.hbase.client.HbaseConfigurationSpringFactoryBean">
      <property name="properties">
          <props>
               <prop key="hbase.zookeeper.quorum">${hbase.zookeeper.quorum}</prop>
               <prop key="hbase.zookeeper.property.clientPort">${hbase.zookeeper.property.clientPort}</prop>
               <prop key="hbase.master.port">${hbase.master.port}</prop>
          </props>
     </property>
</bean>

<bean id="hTableInterfacePoolFactory" class="com.xxx.hbase.client.HTableInterfacePoolFactory" >
       <property name="configuration" ref="hBaseConfiguration" />
       <property name="poolSize" value="${poolSize}" />
</bean>

<bean id="hbaseTemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate"> 
       <property name="encoding" value="utf-8" />
       <property name="configuration" ref="hBaseConfiguration" />
       <property name="tableFactory" ref="hTableInterfacePoolFactory" />
</bean>

2. Pure-code approach:

public HbaseTemplate getHbaseTemplate(String hbaseClusterName) {
    HbaseTemplate hbaseTemplate = hbaseTemplates.get(hbaseClusterName);
    if (hbaseTemplate == null) {
        // Not cached yet: load the cluster settings and build a template for it
        DcHbaseCluster dcHbaseCluster =
                dcHbaseClusterDao.selectByHbaseClusterName(hbaseClusterName);
        if (dcHbaseCluster == null) {
            throw new RuntimeException("getHbaseTemplate(String hbaseClusterName1) failed, hbaseClusterName="
                    + hbaseClusterName);
        }
        String hbaseZookeeperQuorum = dcHbaseCluster.getHbaseZookeeperQuorum();
        String hbaseZookeeperPropertyClientPort =
                dcHbaseCluster.getHbaseZookeeperPropertyClientPort();
        String hbaseMasterPort = dcHbaseCluster.getHbaseMasterPort();
        Integer poolSize = dcHbaseCluster.getPoolSize();
        this.addHbaseTemplate(hbaseClusterName, hbaseZookeeperQuorum,
                hbaseZookeeperPropertyClientPort, hbaseMasterPort, poolSize);
        hbaseTemplate = hbaseTemplates.get(hbaseClusterName);
        if (hbaseTemplate == null) {
            throw new RuntimeException("getHbaseTemplate(String hbaseClusterName2) failed, hbaseClusterName="
                    + hbaseClusterName);
        }
    }
    return hbaseTemplate;
}


public boolean addHbaseTemplate(String hbaseClusterName, String hbaseZookeeperQuorum,
                                String hbaseZookeeperPropertyClientPort,
                                String hbaseMasterPort,
                                Integer poolSize) {
    HbaseConfigurationSpringFactoryBean hbaseConfigurationSpringFactoryBean =
            new HbaseConfigurationSpringFactoryBean();
    Properties properties = new Properties();
    // ZooKeeper quorum address
    properties.put("hbase.zookeeper.quorum", hbaseZookeeperQuorum);
    // ZooKeeper client port
    properties.put("hbase.zookeeper.property.clientPort", hbaseZookeeperPropertyClientPort);
    // HMaster port
    properties.put("hbase.master.port", hbaseMasterPort);
    hbaseConfigurationSpringFactoryBean.setProperties(properties);
    try {
        // Initialize the configuration factory bean
        hbaseConfigurationSpringFactoryBean.afterPropertiesSet();
    } catch (Exception e) {
        e.printStackTrace();
        throw new RuntimeException("hbaseConfigurationSpringFactoryBean.afterPropertiesSet() failed!");
    }
    Configuration configuration = null;
    try {
        // Obtain the connection Configuration object
        configuration = (Configuration) hbaseConfigurationSpringFactoryBean.getObject();
    } catch (Exception e) {
        e.printStackTrace();
        throw new RuntimeException("hbaseConfigurationSpringFactoryBean.getObject() failed!");
    }
    HTableInterfacePoolFactory hTableInterfacePoolFactory = new HTableInterfacePoolFactory();
    hTableInterfacePoolFactory.setConfiguration(configuration);
    hTableInterfacePoolFactory.setPoolSize(poolSize);
    try {
        // Initialize the hTableInterfacePoolFactory object
        hTableInterfacePoolFactory.afterPropertiesSet();
    } catch (Exception e) {
        e.printStackTrace();
        throw new RuntimeException("hTableInterfacePoolFactory.afterPropertiesSet() failed!");
    }
    HbaseTemplate hbaseTemplate = new HbaseTemplate();
    hbaseTemplate.setEncoding("UTF-8");
    hbaseTemplate.setConfiguration(configuration);
    hbaseTemplate.setTableFactory(hTableInterfacePoolFactory);
    try {
        // Initialize the hbaseTemplate object
        hbaseTemplate.afterPropertiesSet();
    } catch (Exception e) {
        e.printStackTrace();
        throw new RuntimeException("hbaseTemplate.afterPropertiesSet() failed!");
    }
    // Cache both objects per cluster name
    configurations.put(hbaseClusterName, configuration);
    hbaseTemplates.put(hbaseClusterName, hbaseTemplate);
    return true;
}

Example: reading data with a single Get

public Map<String, String> find(HbaseTemplate hbaseTemplate, 
                            String tableName, 
                    final String rowKey, 
                            final List<String> columns) {
    
       if (logger.isInfoEnabled()) {
              logger.info("find method tableName={}, rowKey={}", tableName, rowKey);
    }
    Result result = hbaseTemplate.execute(tableName, new TableCallback<Result>() {

                 public Result doInTable(HTableInterface table) throws Throwable {
        // 1. Build the Get object
        Get get = getObjectGenerator.generateGet(rowKey, columns);
        // 2. Query HBase for a single row
        Long start = System.currentTimeMillis();
        Result result = table.get(get);
        if (logger.isInfoEnabled()) {
               logger.info("find method elapsed time: {}ms", System.currentTimeMillis() - start);
        }
        return result;
                }
    });
    if (result == null) {
               return null;
    }
    Map<String, String> res = mapFrom(result);
    return res;
}

Example: reading data with a batch Get

public List<Map<String, String>> batchFind(HbaseTemplate hbaseTemplate, 
                                                       String tableName, 
                                                       final List<String> columns, 
                                                       final List<String> rowKeys) {
     if (logger.isInfoEnabled()) {
      logger.info("batchFind method tableName={}, rowKey={}", tableName, rowKeys);
     }
     Object object = hbaseTemplate.execute(tableName, new TableCallback<Object>() {
          public Object doInTable(HTableInterface table) throws Throwable {
            List<Get> listGets = new ArrayList<Get>(10);
            for (String rowKey : rowKeys) {
                     // 1. Build the Get object
                     Get get = getObjectGenerator.generateGet(rowKey, columns);
                      listGets.add(get);
             }
             // 2. Query HBase in one batch
            Long start = System.currentTimeMillis();
            Result[] results = table.get(listGets);
            if (logger.isInfoEnabled()) {
             logger.info("batchFind method elapsed time: {}ms", System.currentTimeMillis() - start);
            }
           return results;
       }
       });
      if (object == null) {
              return null;
      }
      Result[] results = (Result[])object;
      List<Map<String, String>> res = new ArrayList<Map<String, String>>(10);
      for (Result result : results) {
            Map<String, String> map = mapFrom(result);
            if (map != null) {
                 res.add(map);
             }
       }
     return res;
}

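The essence of the Spring HBase integration

As the code above shows, the Spring HBase client integration is essentially a wrapper around the native HBase client. The execute method of HbaseTemplate performs three main steps:

Obtain or create the HBase table connection instance, an HTableInterface, by table name;
Run the business-provided doInTable callback, which performs the actual HBase reads and writes;
Return the HBase connection instance to the table pool;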
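
The three steps can be modeled with a few lines of plain Java. This is not the HbaseTemplate source; it is a simplified sketch in which every name (TemplateSketch, FakeTable, Callback, released) is a hypothetical stand-in, and it is not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Simplified model of the acquire / callback / release pattern (all names hypothetical). */
final class TemplateSketch {

    /** Stand-in for a pooled table handle; not an HBase type. */
    static final class FakeTable {
        final String name;
        FakeTable(String name) { this.name = name; }
    }

    /** Stand-in for the callback that carries the business logic. */
    interface Callback<T> {
        T doInTable(FakeTable table) throws Exception;
    }

    private final Map<String, Deque<FakeTable>> pool = new HashMap<>();
    int released = 0; // counts how many times a table was returned to the pool

    <T> T execute(String tableName, Callback<T> action) {
        // Step 1: obtain a pooled table for this name, or create one
        Deque<FakeTable> idle = pool.computeIfAbsent(tableName, n -> new ArrayDeque<>());
        FakeTable table = idle.isEmpty() ? new FakeTable(tableName) : idle.pop();
        try {
            // Step 2: run the caller's read/write logic
            return action.doInTable(table);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            // Step 3: always hand the table back, even when the callback fails
            idle.push(table);
            released++;
        }
    }
}
```

The finally block is the point of the pattern: callers cannot forget to release the table, whether the callback returns normally or throws.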

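V. HBase optimization

1. JVM parameter tuning:

-Xmn12g -Xms24g -Xmx24g: adjust to the actual machine, typically half of the machine's total memory; keeping the regionServer heap at or below 32 GB is also recommended;
-XX:PermSize=512M -XX:MaxPermSize=512M (these apply only to Java 7 and earlier; the permanent generation was removed in Java 8);
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC (the CMS garbage collector is recommended; other JVM parameters are best left at their defaults)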

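2. HBase client usage tuning:

Use a table pool or create multiple clients to raise client write concurrency appropriately;
Batch writes and reads wherever possible to reduce the number of RPC round trips;
Do not read columns you do not need, and bound the range of every scan;
After reading results on the client, always close the result set, that is, close the ResultScanner;
Choose a compression algorithm according to HBase CPU utilization, provided the compression library is installed on the servers;
Disabling AutoFlush with setAutoFlush(false) is not recommended: there is a risk of data loss;
Disabling the WAL with setWriteToWAL(false) is not recommended: there is a risk of data loss;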

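3. Tuning at table creation:

Set up HBase table namespaces sensibly for the different businesses;
Create pre-split regions to reduce the number of region splits and to balance the write load better;
Add regions dynamically: at the time of writing, this feature was expected in later stable HBase releases, and the in-house HBase forks at Huawei, Alibaba, and Facebook had already added it;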
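
Pre-splitting needs a list of split keys chosen to match the key distribution. As a sketch, assuming rowKeys start with an evenly distributed hex prefix (for example an MD5-hashed key), the boundaries can be computed as below; the names PreSplit and hexSplitKeys are hypothetical. The resulting keys, converted to bytes, would be supplied as the split keys when the table is created.

```java
/** Illustrative split-key calculation (hypothetical names) for hex-prefixed row keys. */
final class PreSplit {

    /** Returns regions - 1 boundaries that divide the 4-hex-digit space 0000..ffff evenly. */
    static String[] hexSplitKeys(int regions) {
        if (regions < 1 || regions > 65536) {
            throw new IllegalArgumentException("regions must be between 1 and 65536");
        }
        String[] keys = new String[regions - 1];
        for (int i = 1; i < regions; i++) {
            // long arithmetic avoids int overflow for large region counts
            keys[i - 1] = String.format("%04x", (int) ((long) i * 65536 / regions));
        }
        return keys;
    }
}
```

For four regions the boundaries are 4000, 8000, and c000. Keys that are not uniformly distributed need boundaries derived from the real data instead.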

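4. Operations tuning:

During off-peak hours, run the balancer manually or on a schedule;
During peak hours, turn off automatic load balancing (not recommended);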

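5. Configuration parameter tuning (the defaults quoted below vary between HBase versions, so check those of your release):

Set the StoreFile size: depending on the workload, raise hbase.hregion.max.filesize to reduce the number of splits during writes, which also reduces the number of Regions. The default is 10 GB in newer versions and 512 MB in older ones; on newer versions the default is recommended;
Set the size of the memstore's fixed allocation chunk: hbase.hregion.memstore.mslab.chunksize, 2 MB by default. Ideally evaluate it against the size of a typical write; the default is recommended;
Shorten the ZooKeeper session timeout: zookeeper.session.timeout defaults to 3 minutes and can be lowered to one minute or half a minute so that the HMaster detects a failed regionServer sooner;
Adjust the handler threads: hbase.regionserver.handler.count defaults to 10. For batch writes it can be set lower; for single-row reads and writes it can be raised appropriately;
Enable data compression: Snappy or LZO is recommended, provided the compression library is installed first; then configure it and restart;
Raise the block cache size appropriately: hfile.block.cache.size defaults to 0.2. Check how often the memstore is flushed to disk; if flushes are infrequent, this value can be raised, with 0.2 to 0.3 recommended;
Adjust the memstore limits: hbase.regionserver.global.memstore.upperLimit defaults to 0.4 and hbase.regionserver.global.memstore.lowerLimit defaults to 0.35; setting the two values close together or equal is recommended;
Raise the blocking StoreFile count: hbase.hstore.blockingStoreFiles defaults to 7. When the number of StoreFiles in a region's Store exceeds this value, updates block; under highly concurrent writes, about 10 is reasonable;
Raise the block multiplier: hbase.hregion.memstore.block.multiplier defaults to 2. When a memstore reaches the multiplier times the flush size, writes block; under heavy write pressure it can be raised, typically to 2-4;
Lower the maximum number of WAL files: hbase.regionserver.maxlogs defaults to 32. Under heavy write pressure it can be lowered to speed up the periodic cleanup done by the background threads;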

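VI. A reference solution for continuous paging queries in HBase

Real applications often need paged query display, but the HBase page filter runs into unpredictable problems when a scan crosses regions, and rows can go missing from the results. To work around this, the business side has to issue several consecutive queries. Assuming regions are laid out in continuously increasing rowKey order, the solution is as follows:

The source code is as follows: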

public Map<String, Map<String, String>> prefixScanOfPageFilter(HbaseTemplate hbaseTemplate, 
                                                                         String tableName,
                                                                          List<String> columns, 
                                                                          String prefix, 
                                                                          String startRow, 
                                                                          int pageSize, 
                                                                          final String isProcessedNostringColumn, 
                                                                          final Set<String> tableColumnSet) {
        
        if (logger.isInfoEnabled()) {
            logger.info("prefixScanOfPageFilter method tableName={}", tableName);
        }
        
        // Re-query the region list on every call
        TreeSet<HbaseRegionInfo> hbaseRegionInfos = new TreeSet<HbaseRegionInfo>(new Comparator<HbaseRegionInfo>() {
            @Override
            public int compare(HbaseRegionInfo o1, HbaseRegionInfo o2) {
                
                if (o1 == null && o2 == null) {
                    return 0;
                } else if (o1 != null && o2 == null) {
                    return 1;
                } else if (o1 == null && o2 != null) {
                    return -1;
                }
                return Bytes.compareTo(o1.getStartRow(), o2.getStartRow());
            }
        });
        
        // try-with-resources closes the HBaseAdmin connection when done
        try (HBaseAdmin admin = new HBaseAdmin(hbaseTemplate.getConfiguration())) {
            List<HRegionInfo> regionInfos = admin.getTableRegions(tableName.getBytes());
            if (regionInfos == null) {
                return null;
            }
            
            for (HRegionInfo hRegionInfo : regionInfos) {
                HbaseRegionInfo hbaseRegionInfo = new HbaseRegionInfo(hRegionInfo.getStartKey(), hRegionInfo.getEndKey());
                hbaseRegionInfos.add(hbaseRegionInfo);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        
        StringBuffer buffer = new StringBuffer(2);
        String endRow = buffer.append((char)((int)prefix.charAt(0) + 1)).toString();
        
        // Determine the range of regions to query
        TreeSet<HbaseRegionInfo> regionInfoRanges = decideRegionRange(hbaseRegionInfos, startRow, endRow);

        // Result to return
        Map<String, Map<String, String>> result = new HashMap<String, Map<String, String>>(pageSize+1);

        Long start = System.currentTimeMillis();

        byte[] byteBegin = toBytes(startRow);
        byte[] byteEnd = toBytes(endRow);

        HbaseRegionInfo first = regionInfoRanges.first();
        
        while (result.size() < pageSize + 1 ) {
            
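            // Page filter sized to the rows still missing from this page
            PageFilter pageFilter = new PageFilter(pageSize + 1 - result.size());
            // Result of the current query
            Map<String, Map<String, String>> currentResult = null;
            
            byte[] firstEndRow = first.getEndRow();
            
            // An empty end key means this is the table's last region: exit 1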
            if (firstEndRow == null || firstEndRow.length == 0) {
                Scan scan = new Scan( byteBegin, byteEnd  );
                scan = getObjectGenerator.generateScan(scan, columns);
                scan.setFilter(pageFilter);
                currentResult = hbaseTemplate.find(tableName, scan, 
                      new MapperAndRowKeyDetailExtractor(isProcessedNostringColumn, tableColumnSet));
                result.putAll(currentResult);
                if (logger.isInfoEnabled()) {
                    logger.info("findByPageFilter2 method elapsed time: {}ms", System.currentTimeMillis() - start);
                }
                return result;
            }
            
            // Compare the scan's end row with the region's end key to decide the case
            int cmp = Bytes.compareTo( byteEnd, firstEndRow );
            
            if ( cmp < 0 ) { 
                // The end row lies inside this region: exit 1
                Scan scan = new Scan( byteBegin, byteEnd );
                scan = getObjectGenerator.generateScan(scan, columns);
                scan.setFilter(pageFilter);
                currentResult = hbaseTemplate.find(tableName, scan, 
                      new MapperAndRowKeyDetailExtractor(isProcessedNostringColumn, tableColumnSet));
                result.putAll(currentResult);
                if (logger.isInfoEnabled()) {
                    logger.info("findByPageFilter2 method elapsed time: {}ms", System.currentTimeMillis() - start);
                }
                return result;
            } else if ( cmp == 0 ) { // Needs special handling later (do not merge with case 1 for now)
                // The end row coincides exactly with the region's end key: exit 2
                Scan scan = new Scan( byteBegin, byteEnd );
                scan = getObjectGenerator.generateScan(scan, columns);
                scan.setFilter(pageFilter);
                currentResult = hbaseTemplate.find(tableName, scan, 
                     new MapperAndRowKeyDetailExtractor(isProcessedNostringColumn, tableColumnSet));
                result.putAll(currentResult);
                if (logger.isInfoEnabled()) {
                    logger.info("findByPageFilter2 method elapsed time: {}ms", System.currentTimeMillis() - start);
                }
                return result;
            } else {
                // The range extends past this region (crosses regions): another query is needed
                Scan scan = new Scan( byteBegin, firstEndRow );
                scan = getObjectGenerator.generateScan(scan, columns);
                scan.setFilter(pageFilter);
                currentResult = hbaseTemplate.find(tableName, scan, 
                     new MapperAndRowKeyDetailExtractor(isProcessedNostringColumn, tableColumnSet));
                result.putAll(currentResult);
                if (result.size() < pageSize + 1) {
                    if ( first.getStartRow() == null || first.getStartRow().length == 0 ) {
                        HbaseRegionInfo tmp = new HbaseRegionInfo(first.getEndRow(), first.getEndRow());
                        first = regionInfoRanges.ceiling(tmp);
                    } else {
                        first = regionInfoRanges.higher(first);
                    }
                    if (first == null) {
                        return result;
                    }
                    byteBegin = first.getStartRow();
                } else {
                    // The page is full: exit 4
                    if (logger.isInfoEnabled()) {
                        logger.info("findByPageFilter2 method elapsed time: {}ms", System.currentTimeMillis() - start);
                    }
                    return result;
                }
            }
        }
    return null;
    }

/**
     * Determines the range of regions covered by startRow and endRow
     * @param sets
     * @param startRow
     * @param endRow
     * @return
     */
private TreeSet<HbaseRegionInfo> decideRegionRange(TreeSet<HbaseRegionInfo> sets, String startRow, String endRow) {
        
        HbaseRegionInfo first = new HbaseRegionInfo(toBytes(startRow), toBytes(startRow));
        HbaseRegionInfo second = new HbaseRegionInfo(toBytes(endRow), toBytes(endRow));
        
        HbaseRegionInfo firstLower = sets.lower(first);
        HbaseRegionInfo firstHigher = sets.ceiling(first);
        HbaseRegionInfo secondLower = sets.lower(second);
        HbaseRegionInfo secondHigher = sets.ceiling(second);
        
        HbaseRegionInfo firstRegion = null;
        HbaseRegionInfo secondRegion = null;
        
        if ( (firstLower != null  &&  (firstLower.getEndRow() == null || firstLower.getEndRow().length == 0) ) &&  
                      firstHigher == null) {
            TreeSet<HbaseRegionInfo> result = new TreeSet<HbaseRegionInfo>(new Comparator<HbaseRegionInfo>() {
                @Override
                public int compare(HbaseRegionInfo o1, HbaseRegionInfo o2) {
                    
                    if (o1 == null && o2 == null) {
                        return 0;
                    } else if (o1 != null && o2 == null) {
                        return 1;
                    } else if (o1 == null && o2 != null) {
                        return -1;
                    }
                    return Bytes.compareTo(o1.getStartRow(), o2.getStartRow());
                }
            });
            result.add(firstLower);
            return result;
        }
        
        // Regions are contiguous: pick the region that contains startRow.
        // Compare as byte arrays; String.equals(byte[]) would always be false.
        if (firstHigher != null && Bytes.equals(toBytes(startRow), firstHigher.getStartRow())) {
            // startRow sits exactly on a region boundary, i.e. it is the region's first key
            firstRegion = firstHigher;
        } else {
            // startRow falls inside a region
            firstRegion = firstLower;
        }
        
        // Same rule for endRow; a null secondHigher means endRow lies in the last region
        if (secondHigher != null && Bytes.equals(toBytes(endRow), secondHigher.getStartRow())) {
            secondRegion = secondHigher;
        } else {
            secondRegion = secondLower;
        }
        
        return (TreeSet<HbaseRegionInfo>)sets.subSet(firstRegion, true, secondRegion, true);
    }

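Because regions are generally large, we found in practice that the vast majority of queries do not cross a region boundary. When they do, most cross only one boundary, and crossing two or more is very rare, so we noticed no significant slowdown in query performance.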
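
The core idea, filling one page by querying region after region and asking each only for the rows still missing, can be modeled without HBase. In this simplified sketch each region is an in-memory sorted map; the names (RegionPagingModel, page) are hypothetical, and the regions are assumed to be ordered by start key and non-overlapping.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

/** In-memory model (hypothetical names) of filling one page region by region. */
final class RegionPagingModel {

    /** Collects up to pageSize rows from [startRow, stopRow), visiting regions in key order. */
    static Map<String, String> page(List<NavigableMap<String, String>> regions,
                                    String startRow, String stopRow, int pageSize) {
        Map<String, String> page = new LinkedHashMap<>();
        if (startRow.compareTo(stopRow) >= 0) {
            return page; // empty or inverted range
        }
        for (NavigableMap<String, String> region : regions) {
            if (page.size() >= pageSize) {
                break; // page is full, no further region needs to be queried
            }
            // One bounded "scan" per region, capped at the rows still missing
            for (Map.Entry<String, String> row
                    : region.subMap(startRow, true, stopRow, false).entrySet()) {
                if (page.size() >= pageSize) {
                    break;
                }
                page.put(row.getKey(), row.getValue());
            }
        }
        return page;
    }
}
```

With one region holding a1, a2, a3 and the next holding b1, b2, a page of size 3 starting at a2 returns a2, a3, b1: the second region is touched only because the first could not fill the page.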
