HBase Study Notes (4): Writing MapReduce Output to HBase, and How to Fix java.lang.NoClassDefFoundError

Posted by anickname on 2016-07-07

Source code for processing data with MapReduce and writing the results to HBase (adapted from material found online; tested and working):
Mapper class:

package hbase;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HBaseMapper extends Mapper<LongWritable, Text, Text, Text> {
    // Each input line has the form "rowkey,value"; split on the comma and
    // emit the rowkey as the map output key and the rest as the value.
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] item = value.toString().split(",");
        String k = item[0];
        String v = item[1];
        context.write(new Text(k), new Text(v));
    }
}

Reducer class:

package hbase;

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

public class HBaseReducer extends TableReducer<Text, Text, ImmutableBytesWritable> {
    @Override
    public void reduce(Text key, Iterable<Text> value, Context context)
            throws IOException, InterruptedException {
        String k = key.toString();
        // Only the first value per key is used; the rowkeys in this input are unique.
        String v = value.iterator().next().toString();
        Put putrow = new Put(Bytes.toBytes(k));
        putrow.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes(v));
        // Note the output key and value types: ImmutableBytesWritable and Put.
        context.write(new ImmutableBytesWritable(Bytes.toBytes(k)), putrow);
    }
}

Driver class:

package hbase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.util.Tool;

public class HBaseDriver extends Configured implements Tool {

    @Override
    public int run(String[] arg0) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // ZooKeeper quorum and znode parent of the target HBase cluster.
        conf.set("hbase.zookeeper.quorum",
                "hadoop001.icccuat.com,hadoop002.icccuat.com,hadoop003.icccuat.com");
        conf.set("zookeeper.znode.parent", "/hbase-unsecure");
        Job job = Job.getInstance(conf, "Txt-to-Hbase");
        job.setJarByClass(TxHBase.class);
        Path in = new Path("/home/hbase/"); // input directory on HDFS
        FileInputFormat.addInputPath(job, in);
        job.setMapperClass(HBaseMapper.class);
        job.setReducerClass(HBaseReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        // "emp" is the target table in HBase; this also wires up TableOutputFormat.
        TableMapReduceUtil.initTableReducerJob("emp", HBaseReducer.class, job);
        return job.waitForCompletion(true) ? 0 : 1;
    }
}

Entry point:
package hbase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

public class TxHBase {
    public static void main(String[] args) throws Exception {
        int mr = ToolRunner.run(new Configuration(), new HBaseDriver(), args);
        System.exit(mr);
    }
}


The input file contents:
[hdfs@hadoop002 lib]$ hadoop fs -cat /home/hbase/a.txt;
78827,jiangxiaozhi
666777,zhangsan
77877,hecheng
123322,liusi
Note: the part before the comma becomes the rowkey; the rowkeys here were chosen so they do not collide with rowkeys already in the table.
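Also note that the job writes to column family info of table emp, so that table (with that family) must already exist before the job runs. If it does not, it can be created from the HBase shell; a minimal example, assuming default table settings:

hbase(main):001:0> create 'emp', 'info'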
Package the program as a jar (in Eclipse):
export -> jar file -> next -> choose the output path -> next -> next -> specify the main class (here TxHBase) -> finish
Upload it to the Hadoop cluster and run: hadoop jar mapreducehbase.jar hbase.TxHBase — the main class must be given by its fully qualified name.
At this point the job fails with the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
	at hbase.HBaseDriver.run(HBaseDriver.java:18)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at hbase.TxHBase.main(TxHBase.java:8)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	... 9 more

A MapReduce job with no HBase code runs without any problem, but as soon as HBase classes are referenced the job fails with assorted HBase-related NoClassDefFoundError exceptions. After digging through many references, the cause boils down to this: the MR program cannot see the HBase jars installed on the cluster. The fix is to add the HBase jars to the Hadoop classpath; the following approach was tested and works:
In the conf directory under the Hadoop installation, open hadoop-env.sh and locate HADOOP_CLASSPATH:
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}${JAVA_JDBC_LIBS}:${MAPREDUCE_LIBS}
After appending the HBase lib directory it becomes:
HBASE_LIBS=/usr/hdp/2.2.6.0-2800/hbase/lib/*   # location of the HBase jars
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}${JAVA_JDBC_LIBS}:${MAPREDUCE_LIBS}:${HBASE_LIBS}   # append ${HBASE_LIBS}
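As an alternative that avoids hardcoding the HDP version path, the hbase launcher script has a classpath subcommand that prints the full HBase classpath, so it can be injected for a single invocation without editing hadoop-env.sh. A minimal sketch, assuming the hbase command is on the PATH:

export HADOOP_CLASSPATH=$(hbase classpath)
hadoop jar mapreducehbase.jar hbase.TxHBase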
Either way, no restart is required. Rerunning hadoop jar mapreducehbase.jar hbase.TxHBase now succeeds. Next, verify that the data was inserted by querying from the HBase shell:
hbase(main):001:0> scan 'emp'
ROW                                              COLUMN+CELL                                                                                                                                 
 1001                                            column=info:age, timestamp=1467103276147, value=20                                                                                          
 1001                                            column=info:name, timestamp=1467103276137, value=zhangsan                                                                                   
 1002                                            column=info:age, timestamp=1467103276151, value=21                                                                                          
 1002                                            column=info:name, timestamp=1467103276149, value=lisi                                                                                       
 1003                                            column=info:age, timestamp=1467103276154, value=22                                                                                          
 1003                                            column=info:name, timestamp=1467103276152, value=wangwu                                                                                     
 1004                                            column=info:age, timestamp=1467103276157, value=22                                                                                          
 1004                                            column=info:name, timestamp=1467103276156, value=xiaoming                                                                                   
 1005                                            column=info:age, timestamp=1467103276160, value=17                                                                                          
 1005                                            column=info:name, timestamp=1467103276159, value=hanmeimei                                                                                  
 1006                                            column=info:age, timestamp=1467103276165, value=28                                                                                          
 1006                                            column=info:name, timestamp=1467103276162, value=xiaohong                                                                                   
 1007                                            column=info:age, timestamp=1467103276168, value=45                                                                                          
 1007                                            column=info:name, timestamp=1467103276167, value=haimingwei                                                                                 
 1008                                            column=info:age, timestamp=1467103276172, value=16                                                                                          
 1008                                            column=info:name, timestamp=1467103276170, value=xiaoqi                                                                                     
 123322                                          column=info:name, timestamp=1467809673640, value=liusi         ---- newly inserted row, OK
 2001                                            column=info:age, timestamp=1467103276175, value=23                                                                                          
 2001                                            column=info:name, timestamp=1467103276173, value=zhaoliu                                                                                    
 3002                                            column=info:age, timestamp=1467103276178, value=24                                                                                          
 3002                                            column=info:name, timestamp=1467103276177, value=liqi                                                                                       
 666777                                          column=info:name, timestamp=1467809673640, value=zhangsan      ---- newly inserted row, OK
 77877                                           column=info:name, timestamp=1467809673640, value=hecheng       ---- newly inserted row, OK
 78827                                           column=info:name, timestamp=1467809673640, value=jiangxiaozhi  ---- newly inserted row, OK
14 row(s) in 0.2780 seconds
Separately, when doing paginated HBase queries with complex filter conditions, the page size configured on PageFilter did not take effect by itself; the query only behaved as expected after also capping the result on the client with rs.next(pageSize). The usual explanation is that the filter is shipped to each region server and applied there independently, so there is no guarantee that the rows collected back at the client add up to exactly the configured pageSize. The exact execution semantics of filters remain to be verified.
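For reference, a minimal sketch of that workaround against the same emp table (the class name PageScanDemo and the page size of 5 are illustrative; the client API is the HBase 0.98-era one shipped with HDP 2.2): set a PageFilter on the Scan, then additionally cap the page on the client with rs.next(pageSize).

package hbase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PageFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class PageScanDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        int pageSize = 5;
        HTable table = new HTable(conf, "emp");
        try {
            Scan scan = new Scan();
            // PageFilter limits rows per region server, not the total:
            // each region may return up to pageSize rows.
            scan.setFilter(new PageFilter(pageSize));
            ResultScanner rs = table.getScanner(scan);
            // Cap the page on the client as well; rs.next(pageSize)
            // returns at most pageSize rows in total.
            for (Result row : rs.next(pageSize)) {
                System.out.println(Bytes.toString(row.getRow()));
            }
            rs.close();
        } finally {
            table.close();
        }
    }
}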
