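Preface: This article walks through the Java and Shell APIs of the HBase ValueFilter, with example code for reference. ValueFilter filters on column values, so consider it whenever you need to select HBase data by cell value. For comparator details and internals, see the earlier post: HBase Filter Comparators: Principles and Source Code Study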
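I. Java API
Header code (MyBase is the author's own helper class, not part of the HBase API)
import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedList;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.BinaryPrefixComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;
/**
* Demonstrates filtering on column values.
*/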
public class ValueFilterDemo {
private static boolean isok = false;
private static String tableName = "test";
private static String[] cfs = new String[]{"f1","f2"};
private static String[] data = new String[]{
"row-1:f1:c1:abcdefg",
"row-2:f1:c2:abc",
"row-3:f2:c3:abc123456",
"row-4:f2:c4:1234abc567"
};
public static void main(String[] args) throws IOException {
MyBase myBase = new MyBase();
Connection connection = myBase.createConnection();
if (isok) {
myBase.deleteTable(connection, tableName);
myBase.createTable(connection, tableName, cfs);
// Load the sample data
myBase.putRows(connection, tableName, data);
}
Table table = connection.getTable(TableName.valueOf(tableName));
Scan scan = new Scan();
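Middle code
Each line below is an alternative definition of valueFilter; keep exactly one of them when running the demo. The trailing comment on each line shows the output for that filter. Note that on HBase 2.x, CompareFilter.CompareOp is deprecated in favor of org.apache.hadoop.hbase.CompareOperator.
1. Building the filter with BinaryComparator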
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("abc"))); // [row-2:f1:c2:abc]
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.NOT_EQUAL, new BinaryComparator(Bytes.toBytes("abc"))); // [row-1:f1:c1:abcdefg, row-3:f2:c3:abc123456, row-4:f2:c4:1234abc567]
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.GREATER, new BinaryComparator(Bytes.toBytes("abc"))); // [row-1:f1:c1:abcdefg, row-3:f2:c3:abc123456]
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL, new BinaryComparator(Bytes.toBytes("abc1"))); // [row-1:f1:c1:abcdefg, row-3:f2:c3:abc123456]
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.LESS, new BinaryComparator(Bytes.toBytes("abc"))); // [row-4:f2:c4:1234abc567]
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.LESS_OR_EQUAL, new BinaryComparator(Bytes.toBytes("abc"))); // [row-2:f1:c2:abc, row-4:f2:c4:1234abc567]
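The results above follow from byte-wise lexicographic ordering. Here is a minimal sketch of that ordering in plain Java, with no HBase dependency. It assumes Java 9+ for Arrays.compareUnsigned, and the class and method names (BinaryOrderSketch, matching) are made up for illustration; this imitates the comparison, it is not HBase's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BinaryOrderSketch {
    // Sample cell values from the demo table, in row order.
    static final String[] VALUES = {"abcdefg", "abc", "abc123456", "1234abc567"};

    // Lexicographic comparison of unsigned bytes: negative, zero, or positive.
    static int compare(String cellValue, String target) {
        return Arrays.compareUnsigned(
                cellValue.getBytes(StandardCharsets.UTF_8),
                target.getBytes(StandardCharsets.UTF_8));
    }

    // Returns the sample values for which "cellValue <op> target" holds.
    static List<String> matching(String op, String target) {
        List<String> hits = new ArrayList<>();
        for (String v : VALUES) {
            int c = compare(v, target);
            boolean keep;
            switch (op) {
                case "EQUAL":            keep = c == 0; break;
                case "NOT_EQUAL":        keep = c != 0; break;
                case "GREATER":          keep = c > 0;  break;
                case "GREATER_OR_EQUAL": keep = c >= 0; break;
                case "LESS":             keep = c < 0;  break;
                case "LESS_OR_EQUAL":    keep = c <= 0; break;
                default: throw new IllegalArgumentException("unknown op: " + op);
            }
            if (keep) {
                hits.add(v);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(matching("GREATER", "abc"));           // [abcdefg, abc123456]
        System.out.println(matching("GREATER_OR_EQUAL", "abc1")); // [abcdefg, abc123456]
        System.out.println(matching("LESS_OR_EQUAL", "abc"));     // [abc, 1234abc567]
    }
}
```

A value that merely extends the target (abcdefg vs. abc) sorts after it, and anything starting with a digit sorts before anything starting with a lowercase letter, because '1' has a smaller byte value than 'a'.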
2. Building the filter with BinaryPrefixComparator
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.EQUAL, new BinaryPrefixComparator(Bytes.toBytes("123"))); // [row-4:f2:c4:1234abc567]
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.NOT_EQUAL, new BinaryPrefixComparator(Bytes.toBytes("ab"))); // [row-4:f2:c4:1234abc567]
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.GREATER, new BinaryPrefixComparator(Bytes.toBytes("ab"))); // [] only the first prefix-length bytes of each value are compared
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL, new BinaryPrefixComparator(Bytes.toBytes("ab"))); // [row-1:f1:c1:abcdefg, row-2:f1:c2:abc, row-3:f2:c3:abc123456]
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.LESS, new BinaryPrefixComparator(Bytes.toBytes("abc"))); // [row-4:f2:c4:1234abc567]
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.LESS_OR_EQUAL, new BinaryPrefixComparator(Bytes.toBytes("abc"))); // [row-1:f1:c1:abcdefg, row-2:f1:c2:abc, row-3:f2:c3:abc123456, row-4:f2:c4:1234abc567]
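The empty result for GREATER with prefix "ab" is easier to see once the value is truncated to the prefix length before comparing. A small sketch of that idea in plain Java follows, assuming Java 9+ for the ranged Arrays.compareUnsigned; PrefixOrderSketch, comparePrefix, and matching are hypothetical names, and the code only imitates the behaviour described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

public class PrefixOrderSketch {
    // Sample cell values from the demo table, in row order.
    static final String[] VALUES = {"abcdefg", "abc", "abc123456", "1234abc567"};

    // Compares only the leading bytes of the value (at most prefix.length of them)
    // against the prefix, as unsigned bytes.
    static int comparePrefix(String cellValue, String prefix) {
        byte[] v = cellValue.getBytes(StandardCharsets.UTF_8);
        byte[] p = prefix.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(v.length, p.length);
        return Arrays.compareUnsigned(v, 0, n, p, 0, p.length);
    }

    // Returns the sample values whose comparison result satisfies the given test.
    static List<String> matching(String prefix, IntPredicate test) {
        List<String> hits = new ArrayList<>();
        for (String v : VALUES) {
            if (test.test(comparePrefix(v, prefix))) {
                hits.add(v);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(matching("ab", c -> c > 0));   // GREATER: []
        System.out.println(matching("ab", c -> c >= 0));  // GREATER_OR_EQUAL: [abcdefg, abc, abc123456]
        System.out.println(matching("123", c -> c == 0)); // EQUAL: [1234abc567]
    }
}
```

Every value that starts with "ab" compares as equal to the prefix, so none of them can be strictly greater; the only remaining value starts with "12", which sorts lower.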
3. Building the filter with SubstringComparator
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.EQUAL, new SubstringComparator("123")); // [row-3:f2:c3:abc123456, row-4:f2:c4:1234abc567]
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.NOT_EQUAL, new SubstringComparator("def")); // [row-2:f1:c2:abc, row-3:f2:c3:abc123456, row-4:f2:c4:1234abc567]
4. Building the filter with RegexStringComparator
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.NOT_EQUAL, new RegexStringComparator("4[a-z]")); // [row-1:f1:c1:abcdefg, row-2:f1:c2:abc, row-3:f2:c3:abc123456]
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator("4[a-z]")); // [row-4:f2:c4:1234abc567]
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator("abc")); // [row-1:f1:c1:abcdefg, row-2:f1:c2:abc, row-3:f2:c3:abc123456, row-4:f2:c4:1234abc567]
Tail code
scan.setFilter(valueFilter);
ResultScanner scanner = table.getScanner(scan);
Iterator<Result> iterator = scanner.iterator();
LinkedList<String> keys = new LinkedList<>();
while (iterator.hasNext()) {
String key = "";
Result result = iterator.next();
for (Cell cell : result.rawCells()) {
byte[] rowkey = CellUtil.cloneRow(cell);
byte[] family = CellUtil.cloneFamily(cell);
byte[] column = CellUtil.cloneQualifier(cell);
byte[] value = CellUtil.cloneValue(cell);
key = Bytes.toString(rowkey) + ":" + Bytes.toString(family) + ":" + Bytes.toString(column) + ":" + Bytes.toString(value);
keys.add(key);
}
}
System.out.println(keys);
scanner.close();
table.close();
connection.close();
}
}
II. Shell API
1. Building the filter with BinaryComparator
Method 1:
hbase(main):006:0> scan 'test',{FILTER=>"ValueFilter(=,'binary:abc')"}
ROW COLUMN+CELL
row-2 column=f1:c2, timestamp=1589453592471, value=abc
1 row(s) in 0.0240 seconds
Supported comparison operators: =, !=, >, >=, <, <=. They are not all demonstrated here.
Method 2:
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.BinaryComparator
import org.apache.hadoop.hbase.filter.ValueFilter
hbase(main):010:0> scan 'test',{FILTER => ValueFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), BinaryComparator.new(Bytes.toBytes('abc')))}
ROW COLUMN+CELL
row-2 column=f1:c2, timestamp=1589453592471, value=abc
1 row(s) in 0.0230 seconds
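Supported comparison operators: LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER, GREATER_OR_EQUAL. They are not all demonstrated here.
Method 1 is recommended because it is more concise.
2. Building the filter with BinaryPrefixComparator
Method 1: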
hbase(main):011:0> scan 'test',{FILTER=>"ValueFilter(=,'binaryprefix:ab')"}
ROW COLUMN+CELL
row-1 column=f1:c1, timestamp=1589453592471, value=abcdefg
row-2 column=f1:c2, timestamp=1589453592471, value=abc
row-3 column=f2:c3, timestamp=1589453592471, value=abc123456
3 row(s) in 0.0430 seconds
Method 2:
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.BinaryPrefixComparator
import org.apache.hadoop.hbase.filter.ValueFilter
hbase(main):013:0> scan 'test',{FILTER => ValueFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), BinaryPrefixComparator.new(Bytes.toBytes('ab')))}
ROW COLUMN+CELL
row-1 column=f1:c1, timestamp=1589453592471, value=abcdefg
row-2 column=f1:c2, timestamp=1589453592471, value=abc
row-3 column=f2:c3, timestamp=1589453592471, value=abc123456
3 row(s) in 0.0440 seconds
The other operators work the same way as above.
3. Building the filter with SubstringComparator
Method 1:
hbase(main):014:0> scan 'test',{FILTER=>"ValueFilter(=,'substring:123')"}
ROW COLUMN+CELL
row-3 column=f2:c3, timestamp=1589453592471, value=abc123456
row-4 column=f2:c4, timestamp=1589453592471, value=1234abc567
2 row(s) in 0.0340 seconds
Method 2:
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.filter.ValueFilter
hbase(main):016:0> scan 'test',{FILTER => ValueFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('123'))}
ROW COLUMN+CELL
row-3 column=f2:c3, timestamp=1589453592471, value=abc123456
row-4 column=f2:c4, timestamp=1589453592471, value=1234abc567
2 row(s) in 0.0240 seconds
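Unlike the comparators above, this one takes the string directly rather than a byte array, and it supports only the EQUAL and NOT_EQUAL operators.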
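Why only EQUAL and NOT_EQUAL? A minimal sketch, assuming (as I understand the comparator) that it reports only "contained" (0) or "not contained" (non-zero) after lowercasing both sides, so there is no ordering for >, >=, < or <= to act on. SubstringSketch and compare are hypothetical names used for illustration only.

```java
import java.util.Locale;

public class SubstringSketch {
    // Returns 0 when substr occurs in cellValue (case-insensitively), 1 otherwise.
    // The result is never negative, so it carries no ordering information.
    static int compare(String cellValue, String substr) {
        String haystack = cellValue.toLowerCase(Locale.ROOT);
        String needle = substr.toLowerCase(Locale.ROOT);
        return haystack.contains(needle) ? 0 : 1;
    }

    public static void main(String[] args) {
        String[] values = {"abcdefg", "abc", "abc123456", "1234abc567"};
        for (String v : values) {
            // Prints 1, 1, 0, 0: only the last two values contain "123".
            System.out.println(v + " -> " + compare(v, "123"));
        }
        System.out.println(compare("ABC123", "abc")); // 0, matching ignores case
    }
}
```

With a two-valued result like this, EQUAL means "contains" and NOT_EQUAL means "does not contain"; the ordering operators have nothing meaningful to test.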
4. Building the filter with RegexStringComparator
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.RegexStringComparator
import org.apache.hadoop.hbase.filter.ValueFilter
hbase(main):018:0> scan 'test',{FILTER => ValueFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), RegexStringComparator.new('4[a-z]'))}
ROW COLUMN+CELL
row-4 column=f2:c4, timestamp=1589453592471, value=1234abc567
1 row(s) in 0.0290 seconds
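This comparator also takes the string directly and supports only the EQUAL and NOT_EQUAL operators. To use Method 1, try the regexstring prefix; my HBase version is too old to support it, so it is not demonstrated here.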
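Note that a regex "match" here means containment: the pattern only has to match somewhere inside the value, which corresponds to the underlying find() method.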
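The difference between containment and a full match can be shown with java.util.regex directly. This is a small standalone sketch; RegexFindSketch, containsMatch, and fullMatch are hypothetical names, and it illustrates find() versus matches() rather than HBase's own code.

```java
import java.util.regex.Pattern;

public class RegexFindSketch {
    // True when the pattern matches anywhere inside the value (find semantics).
    static boolean containsMatch(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    // True only when the pattern matches the entire value (matches semantics).
    static boolean fullMatch(String regex, String value) {
        return Pattern.compile(regex).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("4[a-z]", "1234abc567")); // true, "4a" occurs inside
        System.out.println(fullMatch("4[a-z]", "1234abc567"));     // false, whole value differs
        System.out.println(fullMatch(".*4[a-z].*", "1234abc567")); // true, padded to full length
        System.out.println(containsMatch("^abc$", "abcdefg"));     // false, anchors force exact value
        System.out.println(containsMatch("^abc$", "abc"));         // true
    }
}
```

Under find semantics, a bare pattern such as abc selects every value that contains it, which is why all four rows come back in the Java example; adding ^ and $ anchors narrows the pattern to the exact value.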
ValueFilter does not support the LongComparator comparator, and the BitComparator and NullComparator comparators are rarely used, so they are not covered here.
The full source code for this article is available at the following GitHub address:
https://github.com/zhoupengbo/demos-bigdata/blob/master/hbase/hbase-filters-demos/src/main/java/com/zpb/demos/ValueFilterDemo.java
Please credit the source when reposting. You are welcome to follow my WeChat official account 【HBase工作筆記】.