1. The Error
An error came up while importing data into HBase with BulkLoad:
2014-04-04 15:39:08,521 WARN org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles - Bulk load operation did not find any files to load in directory hdfs://192.168.1.200:9000/user/root/output1. Does it contain files in subdirectories that correspond to column family names?
Checking the MapReduce job's output directory confirmed it: there was no data folder (the per-column-family subdirectory of HFiles), only the _SUCCESS file.
2. job.setMapOutputValueClass vs. job.setOutputValueClass
This had to be a problem on the reduce side, so I looked at what HFileOutputFormat.configureIncrementalLoad(job, htable) actually does:
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(KeyValue.class);
job.setOutputFormatClass(HFileOutputFormat.class);

// Based on the configured map output class, set the correct reducer to properly
// sort the incoming values.
// TODO it would be nice to pick one or the other of these formats.
if (KeyValue.class.equals(job.getMapOutputValueClass())) {
  job.setReducerClass(KeyValueSortReducer.class);
} else if (Put.class.equals(job.getMapOutputValueClass())) {
  job.setReducerClass(PutSortReducer.class);
} else if (Text.class.equals(job.getMapOutputValueClass())) {
  job.setReducerClass(TextSortReducer.class);
} else {
  LOG.warn("Unknown map output value type:" + job.getMapOutputValueClass());
}
Debugging showed that job.getMapOutputValueClass() was KeyValue, so KeyValueSortReducer was chosen even though my mapper emits Put. So what is the difference between job.setMapOutputValueClass and job.setOutputValueClass? They map to different configuration keys:
setOutputValueClass / getOutputValueClass        ->  mapreduce.job.output.value.class
setMapOutputValueClass / getMapOutputValueClass  ->  mapreduce.map.output.value.class
/**
* Set the value class for the map output data. This allows the user to
* specify the map output value class to be different than the final output
* value class.
*
* @param theClass the map output value class.
* @throws IllegalStateException if the job is submitted
*/
public void setMapOutputValueClass(Class<?> theClass) throws IllegalStateException {
  ensureState(JobState.DEFINE);
  conf.setMapOutputValueClass(theClass);
}
/**
* Get the value class for the map output data. If it is not set, use the
* (final) output value class. This allows the map output value class to be
* different than the final output value class.
*
* @return the map output value class.
*/
public Class<?> getMapOutputValueClass() {
  Class<?> retv = getClass(JobContext.MAP_OUTPUT_VALUE_CLASS, null, Object.class);
  if (retv == null) {
    retv = getOutputValueClass();
  }
  return retv;
}
In other words:
- If setMapOutputValueClass has never been called, getMapOutputValueClass falls back to the value set by setOutputValueClass (see the sketch below).
- The map output value class (getMapOutputValueClass) is allowed to differ from the final output value class, i.e. the reduce output value class (getOutputValueClass). The generics of PutSortReducer<ImmutableBytesWritable, Put, ImmutableBytesWritable, KeyValue> show exactly that: the map output value class is Put, the final one is KeyValue.
- The same applies to the key classes.
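To make both points concrete, here is a tiny, illustrative sketch (not part of the original program) that simply inspects a Job; it assumes the Hadoop 2.x API and the property names listed above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.mapreduce.Job;

public class OutputClassDemo {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration());

    // Only the final output value class is set, as in my original program.
    job.setOutputValueClass(KeyValue.class);
    System.out.println(job.getMapOutputValueClass());  // KeyValue: falls back to the final output value class
    System.out.println(job.getConfiguration().get("mapreduce.map.output.value.class"));  // null: never set

    // Set the map output value class explicitly; now the two differ.
    job.setMapOutputValueClass(Put.class);
    System.out.println(job.getMapOutputValueClass());  // Put
    System.out.println(job.getOutputValueClass());     // still KeyValue
  }
}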
In my program I had only called job.setOutputValueClass(Put.class). But configureIncrementalLoad overwrites the final output value class with KeyValue (see the snippet above), so the fallback in getMapOutputValueClass returned KeyValue and KeyValueSortReducer was selected. Changing the call to job.setMapOutputValueClass(Put.class) fixed it.
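For reference, a minimal driver sketch with the fix in place. This is not the original program: the table name "mytable", the args paths, the column family "data", and the placeholder mapper are assumptions, and it uses the Hadoop 2.x Job API with the HFileOutputFormat/HTable classes of that HBase era:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {

  // Placeholder mapper (an assumption, not the original one): parses "rowkey,value"
  // lines and emits <ImmutableBytesWritable, Put> pairs for the "data" column family.
  public static class MyPutMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text line, Context ctx) throws IOException, InterruptedException {
      String[] parts = line.toString().split(",");
      byte[] row = Bytes.toBytes(parts[0]);
      Put put = new Put(row);
      put.add(Bytes.toBytes("data"), Bytes.toBytes("v"), Bytes.toBytes(parts[1]));
      ctx.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "bulkload");
    job.setJarByClass(BulkLoadDriver.class);

    job.setMapperClass(MyPutMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);  // the fix: map output value class, not the final output value class

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Overwrites the final output classes with ImmutableBytesWritable/KeyValue and,
    // because the map output value class is Put, picks PutSortReducer.
    HTable htable = new HTable(conf, "mytable");
    HFileOutputFormat.configureIncrementalLoad(job, htable);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}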
3. Deleting All Data from HBase
This has nothing to do with the main topic; consider it a side note.
Yesterday I suddenly wondered: is there a way to "format" HBase without reinstalling it?
My first thought was to delete the hbase directory on HDFS and restart HBase, but then the RegionServers could not connect to the Master. The likely reason is that the -ROOT- and .META. tables were gone: when a RegionServer reported its heartbeat to ZooKeeper, ZooKeeper looked the RegionServer up in the -ROOT- table, found its information missing, and so could not pass the RegionServer on to the Master. After also deleting the ZooKeeper data, the restart succeeded:
rm -rf /tmp/hbase-root*
The path is the (default) value of hbase.zookeeper.property.dataDir:
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/tmp/hbase-root</value>
  <description>Property from ZooKeeper's config zoo.cfg.
    The directory where the snapshot is stored.
  </description>
</property>
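As a side note on the first step above (removing the hbase directory on HDFS), the same can be done from Java with the FileSystem API. A sketch only: the namenode URI is taken from the error log at the top, /hbase is assumed to be hbase.rootdir, and HBase should be stopped before running it.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WipeHBaseRootDir {
  public static void main(String[] args) throws Exception {
    // Namenode URI from the log above; /hbase is the assumed hbase.rootdir.
    FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.1.200:9000"), new Configuration());
    fs.delete(new Path("/hbase"), true);  // recursive delete of the HBase root directory
    fs.close();
  }
}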