Big Data Project in Practice: User Profiling Project Analysis
I. User Profiling Project Analysis
-------------------------------------------------------
1. Concept
A user profile, also called user information tagging (or customer information labeling), describes a user with a set of labels derived from the user's attributes and behavior; that set of labels is the user profile.
2. Project overview
Infer the phone owner's age bracket and gender from the apps installed on the phone.
Each app is assigned gender and age-bracket weights, which are combined with the user's daily usage duration and usage count for that app.
// Core algorithm
/** Gender fusion */
public void protraitSex(double newdata1, double newdata2, long factor) {
double sum = (this.data1 + this.data2 + (newdata1 + newdata2) * factor);
if (sum != 0) {
this.data1 = (this.data1 + newdata1 * factor) / sum;
this.data2 = (this.data2 + newdata2 * factor) / sum;
}
}
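To make the fusion arithmetic concrete, here is a minimal standalone sketch (not part of the project code) that applies the same formula to made-up numbers: starting from an even 0.5/0.5 gender split, folding in an app whose male and female weights are both 0.001 and which was used 10 times leaves the split essentially unchanged, because the contribution is symmetric.
public class SexFusionExample {
public static void main(String[] args) {
// Hypothetical running profile: gender is still an even split.
double male = 0.5, female = 0.5;
// Hypothetical tag-library weights for one app, and its usage count.
double appMale = 0.001, appFemale = 0.001;
long times = 10;
// Same formula as the gender-fusion method above: scale the app's weights
// by its usage count, add them to the running totals, and renormalize.
double sum = male + female + (appMale + appFemale) * times;
if (sum != 0) {
male = (male + appMale * times) / sum;
female = (female + appFemale * times) / sum;
}
System.out.println(male + " / " + female); // approximately 0.5 / 0.5
}
}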
3. Key terms
a. appID: the app's unique identifier, used to determine which apps are installed on the device
b. Age bracket definitions
Name ==> Description
Age bracket 1 ==> 24 and under
Age bracket 2 ==> 25-30
Age bracket 3 ==> 31-35
Age bracket 4 ==> 36-40
Age bracket 5 ==> over 40
c. Tag library
The tag library covers the mainstream apps in the app markets and classifies them; each entry records the app name, the male/female weights, and the weight for each age bracket.
AppID|App name|Male weight|Female weight|Age bracket 1|Age bracket 2|Age bracket 3|Age bracket 4|Age bracket 5
10001|QQ|0.001|0.001|0|0.2|0.3|0.2|0.3
10002|飛信|0.001|0.001|0|0.2|0.3|0.2|0.3
10003|MSN|0.001|0.001|0|0.2|0.3|0.2|0.3
10004|阿里旺旺|0.001|0.001|0|0.2|0.3|0.2|0.3
10005|微信|0.001|0.001|0|0.2|0.3|0.2|0.3
10006|陌陌|0.001|0.001|0|0.2|0.3|0.2|0.3
10007|米聊|0.001|0.001|0|0.2|0.3|0.2|0.3
10008|啪啪|0.001|0.001|0|0.2|0.3|0.2|0.3
10009|飛聊|0.001|0.001|0|0.2|0.3|0.2|0.3
10010|來往|0.001|0.001|0|0.2|0.3|0.2|0.3
10011|連我|0.001|0.001|0|0.2|0.3|0.2|0.3
10012|有你|0.001|0.001|0|0.2|0.3|0.2|0.3
4. Profiling data (sample tag-library entries)
10011|連我|0.001|0.001|0|0.2|0.3|0.2|0.3
10012|有你|0.001|0.001|0|0.2|0.3|0.2|0.3
10013|Kakao Talk|0.001|0.001|0|0.2|0.3|0.2|0.3
10014|Whatsapp|0.001|0.001|0|0.2|0.3|0.2|0.3
10015|比鄰|0.001|0.001|0|0.2|0.3|0.2|0.3
20016|新浪讀書|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20017|瀟湘書院|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20018|紅袖添香|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20019|縱橫中文網|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20020|掌上書院|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20021|和閱讀|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20022|掌閱iReader|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20023|QQ閱讀|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20024|百閱|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20025|塔讀小說|0.001|0.001|0.1|0.3|0.3|0.2|0.1
20026|Flipboard|0.001|0.001|0.1|0.3|0.3|0.2|0.1
...
...
5. Aggregation cycle
Aggregate once per day; it is recommended to summarize the previous day's data in the early hours of the following morning.
6. Notes
a. Prerequisite: the gender and age-bracket ratios can only be computed after an appID has been identified; records with an empty appID are skipped.
b. Both the gender and the age estimates depend on the design of the tag library, i.e. on the apps installed on the user's phone: the per-app weights are combined into gender and age-bracket weights for the user, from which the final gender and age bracket are derived.
II. Building the Project
-----------------------------------------------------
1. The user profile class
package userdraw;
import java.math.BigDecimal;
/**
* User profile
**/
public class UserDraw {
// Fields
private String statTimeDay;
private String MDN; // phone number (MDN)
private double male; // male weight
private double female; // female weight
private double age1;
private double age2;
private double age3;
private double age4;
private double age5;
// Serialize as a pipe-delimited record: statTimeDay|MDN|male|female|age1..age5|
public String toString() {
StringBuffer sb = new StringBuffer();
sb.append(statTimeDay).append("|");
sb.append(MDN).append("|");
// setScale(3, 4): keep 3 decimal places, rounding mode 4 = ROUND_HALF_UP
sb.append(new BigDecimal(male).setScale(3, 4).doubleValue()).append("|");
sb.append(new BigDecimal(female).setScale(3, 4).doubleValue()).append("|");
sb.append(new BigDecimal(age1).setScale(3, 4).doubleValue()).append("|");
sb.append(new BigDecimal(age2).setScale(3, 4).doubleValue()).append("|");
sb.append(new BigDecimal(age3).setScale(3, 4).doubleValue()).append("|");
sb.append(new BigDecimal(age4).setScale(3, 4).doubleValue()).append("|");
sb.append(new BigDecimal(age5).setScale(3, 4).doubleValue()).append("|");
return sb.toString();
}
// Fusion methods
/** Gender fusion: fold one app's gender weights into the profile, weighted by its usage count (times) */
public void protraitSex(double male2, double female2, long times) {
double sum = (this.male + this.female + (male2 + female2) * times);
if(sum != 0){
this.male = (this.male + male2 * times) / sum;
this.female = (this.female + female2 * times) / sum;
}
}
/** Age-bracket fusion: fold one app's age-bracket weights into the profile, weighted by its usage count (times) */
public void protraitAge(double pAge1, double pAge2, double pAge3, double pAge4, double pAge5, long times) {
double sum = (age1 + age2 + age3 + age4 + age5) // weights accumulated from earlier apps
+ (pAge1 + pAge2 + pAge3 + pAge4 + pAge5) * times; // the current app's weights, scaled by usage count
if(sum != 0){
this.age1 = (pAge1 * times + age1) / sum;
this.age2 = (pAge2 * times + age2) / sum;
this.age3 = (pAge3 * times + age3) / sum;
this.age4 = (pAge4 * times + age4) / sum;
this.age5 = (pAge5 * times + age5) / sum;
}
}
/** Initialize the gender probabilities (normalized so they sum to 1) */
public void initSex(float male, float female) {
float sum = male + female;
if(sum != 0){
this.male = male / sum;
this.female = female / sum;
}
}
/** Initialize the age-bracket probabilities (normalized so they sum to 1) */
public void initAge(float pAge1, float pAge2, float pAge3, float pAge4, float pAge5) {
float sum = pAge1 + pAge2 + pAge3 + pAge4 + pAge5;
if(sum != 0){
this.age1 = pAge1 / sum;
this.age2 = pAge2 / sum;
this.age3 = pAge3 / sum;
this.age4 = pAge4 / sum;
this.age5 = pAge5 / sum;
}
}
// getters and setters
public String getStatTimeDay() {
return statTimeDay;
}
public void setStatTimeDay(String statTimeDay) {
this.statTimeDay = statTimeDay;
}
public String getMDN() {
return MDN;
}
public void setMDN(String mDN) {
MDN = mDN;
}
public double getMale() {
return male;
}
public double getFemale() {
return female;
}
public double getAge1() {
return age1;
}
public double getAge2() {
return age2;
}
public double getAge3() {
return age3;
}
public double getAge4() {
return age4;
}
public double getAge5() {
return age5;
}
}
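A minimal usage sketch of the class (the MDN, weights, and usage count below are hypothetical), showing the same initialize-then-fuse sequence that the second MR job performs for each user:
import userdraw.UserDraw;
public class UserDrawExample {
public static void main(String[] args) {
UserDraw draw = new UserDraw();
draw.setStatTimeDay("20180101");
draw.setMDN("13800000000");
// First app seen for this MDN: initialize the profile from its tag-library weights.
draw.initSex(0.001f, 0.001f);
draw.initAge(0.1f, 0.3f, 0.3f, 0.2f, 0.1f);
// A second app, used 5 times: fold its weights into the running profile.
draw.protraitSex(0.002, 0.001, 5);
draw.protraitAge(0.0, 0.2, 0.3, 0.2, 0.3, 5);
// Prints statTimeDay|MDN|male|female|age1|age2|age3|age4|age5|
System.out.println(draw);
}
}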
2. The driver (main entry point)
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job1 = Job.getInstance(conf, "UserDrawMapReduceJob1");
job1.setJarByClass(UserDrawMapReduce.class);
job1.setMapperClass(MyMap.class);
job1.setReducerClass(MyReduce.class);
job1.setMapOutputKeyClass(Text.class);
job1.setMapOutputValueClass(TextArrayWritable.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(Text.class);
job1.setInputFormatClass(TextInputFormat.class);
job1.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job1, new Path("file:///D:\\share\\project\\userdraw\\data")); // input path
FileOutputFormat.setOutputPath(job1, new Path("file:///D:\\share\\project\\userdraw\\out")); // output path
boolean state1 = job1.waitForCompletion(true);
if (state1) {
System.out.println("job1 finished successfully");
conf = new Configuration();
Job job2 = Job.getInstance(conf, "UserDrawMapReduceJob2");
job2.setJarByClass(UserDrawMapReduce.class);
job2.setMapperClass(MyMap2.class);
job2.setReducerClass(MyReduce2.class);
job2.setMapOutputKeyClass(Text.class);
job2.setMapOutputValueClass(Text.class);
job2.setOutputKeyClass(Text.class);
job2.setOutputValueClass(Text.class);
job2.setInputFormatClass(TextInputFormat.class);
job2.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job2, new Path("file:///D:\\share\\project\\userdraw\\out")); // input path (job1's output)
FileOutputFormat.setOutputPath(job2, new Path("file:///D:\\share\\project\\userdraw\\out2")); // output path
boolean state2 = job2.waitForCompletion(true);
if (state2) {
System.out.println("job2 finished successfully");
conf = new Configuration();
// ZooKeeper quorum used by the HBase client
conf.set(UserDrawMapReduce.conf.consite, UserDrawMapReduce.conf.hbaseip);
// target HBase table name
conf.set(TableOutputFormat.OUTPUT_TABLE, UserDrawMapReduce.conf.tableDraw);
// raise the socket timeout so the HBase writes do not time out
conf.set(UserDrawMapReduce.conf.coftime, UserDrawMapReduce.conf.time);
Job job3 = Job.getInstance(conf, "UserDrawPutInHbase");
job3.setJarByClass(UserDrawMapReduce.class);
TableMapReduceUtil.addDependencyJars(job3);
FileInputFormat.setInputPaths(job3, new Path("file:///D:\\share\\project\\userdraw\\out2"));
job3.setMapperClass(UserDrawPutInHbaseMap.class);
job3.setMapOutputKeyClass(Text.class);
job3.setMapOutputValueClass(Text.class);
job3.setReducerClass(UserDrawPutInHbaseReduce.class);
job3.setOutputFormatClass(TableOutputFormat.class);
job3.waitForCompletion(true);
}
}
}
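One practical note for re-running this driver locally: FileOutputFormat refuses to start a job whose output directory already exists. A hedged fragment, not part of the original code, that could be placed before job1.waitForCompletion to clear the previous run's output (it needs an additional import of org.apache.hadoop.fs.FileSystem):
// Sketch: delete the previous run's output directories so the job chain can be re-run.
Path out1 = new Path("file:///D:\\share\\project\\userdraw\\out");
Path out2 = new Path("file:///D:\\share\\project\\userdraw\\out2");
FileSystem fs = out1.getFileSystem(conf);
if (fs.exists(out1)) {
fs.delete(out1, true); // recursive delete
}
if (fs.exists(out2)) {
fs.delete(out2, true);
}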
3. The MapReduce job classes
a. First MR job
package userdrawmr;
import java.io.IOException;
import java.text.SimpleDateFormat;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import util.Config;
import util.TextArrayWritable;
public class UserDrawMapReduce {
public static Config conf = new Config();
public static class MyMap extends Mapper<LongWritable, Text, Text, TextArrayWritable> {
Text k = new Text();
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
// one line of raw input
String line = value.toString();
// split on the configured separator (|)
String[] dataArray = line.split(conf.Separator);
// unique key: phone number + appID
String uiqkey = dataArray[Integer.parseInt(conf.MDN)]
+ dataArray[Integer.parseInt(conf.appID)]; // MDN + appID
String[] val = new String[5];
// record timestamp (epoch millis, formatted to yyyyMMdd below)
String timenow = dataArray[Integer.parseInt(conf.Date)];
SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
val[0] = sdf.format(Long.parseLong(timenow)); // date (yyyyMMdd)
val[1] = dataArray[Integer.parseInt(conf.MDN)]; // phone number (MDN)
val[2] = dataArray[Integer.parseInt(conf.appID)]; // appID
val[3] = "1"; // count
val[4] = dataArray[Integer.parseInt(conf.ProcedureTime)]; // usage duration
k.set(uiqkey);
context.write(k, new TextArrayWritable(val));
}
}
/**
* Aggregate, per unique key (MDN + appID), the total usage count and total usage duration of each app
*/
public static class MyReduce extends Reducer<Text, TextArrayWritable, Text, Text> {
Text v = new Text();
public void reduce(Text key, Iterable<TextArrayWritable> values,
Context context) throws IOException, InterruptedException {
long sum = 0;
int count = 0;
String[] res = new String[5];
boolean first = true;
for (TextArrayWritable t : values) {
String[] vals = t.toStrings();
if (first) {
// keep the date / MDN / appID fields from the first record of this key
res = vals;
first = false;
}
if (vals[3] != null) {
count = count + 1;
}
if (vals[4] != null) {
sum += Long.valueOf(vals[4]);
}
}
res[3] = String.valueOf(count);
res[4] = String.valueOf(sum);
StringBuffer sb = new StringBuffer();
sb.append(res[0]).append("|"); // date
sb.append(res[1]).append("|"); // phone number (MDN)
sb.append(res[2]).append("|"); // appID
sb.append(res[3]).append("|"); // count
sb.append(res[4]); // usage duration
v.set(sb.toString());
context.write(null, v);
}
}
}
b. Second MR job
package userdrawmr;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import userdraw.UserDraw;
import util.LoadHdfsTable;
public class UserDrawMapReduce2 {
public static class MyMap2 extends Mapper<LongWritable, Text, Text, Text> {
Text k = new Text();
// re-key the record by phone number (MDN)
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String[] dataArray = line.split("\\|");
String newkey = dataArray[1]; // MDN (phone number)
k.set(newkey);
context.write(k, value);
}
}
public static class MyReduce2 extends Reducer<Text, Text, Text, Text> {
Map<String, String[]> appMap = LoadHdfsTable.getAppMap();
Text v = new Text();
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
Map<String, UserDraw> userDrawMap = new HashMap<String, UserDraw>();
Set<String> keySet = userDrawMap.keySet(); // live view: it reflects the entries added in the loop below
String keyMDN = null;
for (Text t : values) {
String[] dataArray = t.toString().split("\\|");
keyMDN = dataArray[1]; // the user's MDN (phone number)
String appID = dataArray[2]; // appID
// look up the tag-library entry for this appID
if (appID.length() > 0) { // skip records with an empty appID
if (appMap.get(appID) == null) {
continue;
}
String favourite = appMap.get(appID)[2]; // note: index 2 is the female-weight column; this value is passed on but never used
float male = Float.parseFloat(appMap.get(appID)[1]);
float female = Float.parseFloat(appMap.get(appID)[2]);
float age1 = Float.parseFloat(appMap.get(appID)[3]);
float age2 = Float.parseFloat(appMap.get(appID)[4]);
float age3 = Float.parseFloat(appMap.get(appID)[5]);
float age4 = Float.parseFloat(appMap.get(appID)[6]);
float age5 = Float.parseFloat(appMap.get(appID)[7]);
long times = Long.parseLong(dataArray[4]); // weighting factor: total usage duration from job1's output
if (userDrawMap.containsKey(keyMDN)) {
UserDraw userDraw = userDrawMap.get(keyMDN);
// fuse the gender weights into the running profile
userDraw.protraitSex(male, female, times);
// fuse the age-bracket weights into the running profile
userDraw.protraitAge(age1, age2, age3, age4, age5, times);
} else {
userDrawMap.put(keyMDN, createDrawData(dataArray, favourite, male, female, age1, age2, age3, age4, age5, times));
}
}
}
for (String keys : keySet) {
v.set(userDrawMap.get(keys).toString());
context.write(null, v);
}
}
}
// Create the initial profile for an MDN from the first app's weights
private static UserDraw createDrawData(String[] dataArray, //
String favourite, // interest category (currently unused)
float male, float female, // gender weights
float age1, float age2, float age3, float age4, float age5, // age-bracket weights
long times) {
UserDraw userDraw = new UserDraw();
userDraw.setStatTimeDay(dataArray[0]);
userDraw.setMDN(dataArray[1]);
// initialize the normalized probabilities
userDraw.initAge(age1, age2, age3, age4, age5);
userDraw.initSex(male, female);
return userDraw;
}
}
c. Third MR job: write the results to HBase
package userdrawputinhbase;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class UserDrawPutInHbaseMap extends Mapper<LongWritable, Text, Text, Text>{
Text k2 = new Text();
Text v2 = new Text();
@Override
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)
throws IOException, InterruptedException {
String line = value.toString();
String[] splited = line.split("\\|");
k2.set(splited[1]);
v2.set(line);
context.write(k2, v2);
}
}
--------------------------------------------------------------------
package userdrawputinhbase;
import java.io.IOException;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class UserDrawPutInHbaseReduce extends TableReducer<Text, Text, NullWritable>{
@SuppressWarnings("deprecation")
@Override
protected void reduce(Text k2, Iterable<Text> val,
Reducer<Text, Text, NullWritable,Mutation>.Context context)
throws IOException, InterruptedException {
for (Text v2 : val) {
String[] splited = v2.toString().split("\\|");
// row key: the user's MDN
if(k2.toString().length()!=0){
Put put = new Put(Bytes.toBytes(k2.toString()));
// skip the WAL (HLog) to speed up writes, at the cost of durability if a region server fails
put.setDurability(Durability.SKIP_WAL);
put.add(Bytes.toBytes("draw"), Bytes.toBytes("mdn"), Bytes.toBytes(splited[1]));
put.add(Bytes.toBytes("draw"), Bytes.toBytes("male"), Bytes.toBytes(splited[2]));
put.add(Bytes.toBytes("draw"), Bytes.toBytes("female"), Bytes.toBytes(splited[3]));
put.add(Bytes.toBytes("draw"), Bytes.toBytes("age1"), Bytes.toBytes(splited[4]));
put.add(Bytes.toBytes("draw"), Bytes.toBytes("age2"), Bytes.toBytes(splited[5]));
put.add(Bytes.toBytes("draw"), Bytes.toBytes("age3"), Bytes.toBytes(splited[6]));
put.add(Bytes.toBytes("draw"), Bytes.toBytes("age4"), Bytes.toBytes(splited[7]));
put.add(Bytes.toBytes("draw"), Bytes.toBytes("age5"), Bytes.toBytes(splited[8]));
context.write(NullWritable.get(), put);
}
}
}
}
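A small aside on the @SuppressWarnings("deprecation") above: in the HBase 1.x client API, Put.add(family, qualifier, value) is deprecated in favour of Put.addColumn with the same arguments, so each write could equivalently be expressed as:
// Equivalent, non-deprecated form of the writes above (HBase 1.x client API):
put.addColumn(Bytes.toBytes("draw"), Bytes.toBytes("mdn"), Bytes.toBytes(splited[1]));
put.addColumn(Bytes.toBytes("draw"), Bytes.toBytes("male"), Bytes.toBytes(splited[2]));
// ... and so on for the remaining columns.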
4. Utility classes
a. Config.class
package util;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
public class Config {
static Properties properties;
static{
properties = new Properties();
// load the properties file from the classpath
InputStream inStream = Thread.currentThread().getContextClassLoader().getResourceAsStream("UserDraw.properties");
try {
properties.load(inStream);
} catch (IOException e) {
e.printStackTrace();
}
}
// UserDraw job settings (field separator and column indices)
public String Separator = properties.getProperty("Separator");
public String Date = properties.getProperty("Date");
public String MDN = properties.getProperty("MDN");
public String appID = properties.getProperty("appID");
public String count = properties.getProperty("count");
public String ProcedureTime = properties.getProperty("ProcedureTime");
// HBase settings
public String consite = properties.getProperty("consite");
public String hbaseip = properties.getProperty("hbaseip");
public String coftime = properties.getProperty("coftime");
public String time = properties.getProperty("time");
public String tableDraw = properties.getProperty("tableDraw");
}
b. LoadHdfsTable.class
package util;
import java.io.BufferedReader;
import java.util.HashMap;
import java.util.Map;
public class LoadHdfsTable {
public static String appTab = "/home/appTab.txt";
private static Map<String, String[]> appMap = new HashMap<String, String[]>();
static {
try {
StringBuffer sb = new StringBuffer();
String line = null;
BufferedReader breader = ReadHdfsFile.fileReader(appTab);
while ((line = breader.readLine()) != null) {
String[] appArray = line.split("\\|");
// app name
sb.append(appArray[1]).append(",");
// gender weights (male, female)
sb.append(appArray[2]).append(",").append(appArray[3]).append(",");
// age-bracket weights (brackets 1-5)
sb.append(appArray[4]).append(",").append(appArray[5]).append(",").append(appArray[6]).append(",");
sb.append(appArray[7]).append(",").append(appArray[8]);
// value layout: [0]=app name, [1]=male, [2]=female, [3]..[7]=age brackets 1-5; keyed by appID
String[] appToValueArray = sb.toString().split(",");
appMap.put(appArray[0], appToValueArray);
sb.delete(0, sb.length());
}
} catch (Exception e) {
e.printStackTrace();
}
}
public static Map<String, String[]> getAppMap() {
return appMap;
}
public static void setAppMap(Map<String, String[]> appMap) {
LoadHdfsTable.appMap = appMap;
}
}
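A quick lookup sketch (the appID literal is just one value from the sample data above) showing how the cached table is read, matching the index layout used in MyReduce2:
import java.util.Map;
import util.LoadHdfsTable;
public class TagLookupExample {
public static void main(String[] args) {
Map<String, String[]> appMap = LoadHdfsTable.getAppMap();
String[] entry = appMap.get("10005"); // appID of 微信 in the sample tag library
if (entry != null) {
String appName = entry[0]; // app name
float maleWeight = Float.parseFloat(entry[1]); // male weight
float femaleWeight = Float.parseFloat(entry[2]); // female weight
// entry[3] .. entry[7] hold the weights for age brackets 1-5
System.out.println(appName + " male=" + maleWeight + " female=" + femaleWeight);
}
}
}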
c. ReadHdfsFile.class
package util;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
/** Read a file (local test version; the HDFS version is commented out below) **/
public class ReadHdfsFile {
public static BufferedReader fileReader(String fileName) throws Exception {
// HDFS version, for cluster runs:
// Configuration conf = new Configuration();
// FileSystem fs = FileSystem.get(conf);
// FSDataInputStream in = fs.open(new Path(fileName));
// BufferedReader br = new BufferedReader(new InputStreamReader(in));
// Local test version: note that it ignores the fileName argument and always reads the local tag-library file.
FileReader fr = new FileReader("D:\\share\\project\\userdraw\\appTab.txt");
return new BufferedReader(fr);
}
}
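For cluster runs, the commented-out lines can be restored; a sketch of that HDFS-backed variant follows (the class name is made up so that both versions can sit side by side; the default filesystem comes from the cluster's core-site.xml):
package util;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
/** Sketch: HDFS-backed reader assembled from the commented-out lines above. */
public class ReadHdfsFileCluster {
public static BufferedReader fileReader(String fileName) throws Exception {
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf); // default filesystem from the cluster configuration
FSDataInputStream in = fs.open(new Path(fileName)); // e.g. the appTab path used in LoadHdfsTable
return new BufferedReader(new InputStreamReader(in));
}
}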
d. TextArrayWritable class
package util;
import java.util.ArrayList;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;
public class TextArrayWritable extends ArrayWritable {
public TextArrayWritable() {
super(Text.class);
}
public TextArrayWritable(String[] strings) {
super(Text.class);
Text[] texts = new Text[strings.length];
for (int i = 0; i < strings.length; i++) {
texts[i] = new Text(strings[i]);
}
set(texts);
}
public TextArrayWritable(ArrayList<String> strings) {
super(Text.class);
Text[] texts = new Text[strings.size()];
int i = 0;
for (String str : strings) {
texts[i] = new Text(str);
i++;
}
set(texts);
}
public ArrayList<String> toArrayList(String[] writables) {
ArrayList<String> arraylist = new ArrayList<String>();
for (String writable : writables) {
arraylist.add(writable.toString());
}
return arraylist;
}
public ArrayList<String> toArrayList() {
return toArrayList(super.toStrings());
}
}
III. Resource Files
-------------------------------------
1.Resources
[UserDraw.properties]
# User profiling configuration
################################
# field separator
Separator=\\|
# date field index
Date=11
# phone number (MDN) field index
MDN=0
# appID field index
appID=15
# count field index
count=1
# usage duration field index
ProcedureTime=12
# HBase configuration
consite=hbase.zookeeper.quorum
hbaseip=192.168.0.4,192.168.0.5,192.168.0.6
coftime=dfs.socket.timeout
time=180000
tableDraw=user_draw
2. pom.xml dependencies
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>groupId</groupId>
<artifactId>UserDraw</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>commons-codec</groupId>
<artifactId>commons-codec</artifactId>
<version>1.4</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.2.4</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>1.2.4</version>
</dependency>
</dependencies>
</project>
IV. Miscellaneous
------------------------------------------------
1. Create the HBase table (pre-split regions, Snappy compression); a Java Admin API equivalent follows the shell command:
create 'user_draw',{NAME=>'draw',VERSIONS=>1,BLOCKCACHE=>true,BLOOMFILTER=>'ROW',COMPRESSION=>'SNAPPY'},
{SPLITS => ['/','+','0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w',
'x','y','z','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']}
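A hedged Java sketch of the same table creation through the HBase 1.x Admin API (only a few of the split keys from the shell command are shown, to keep the example short):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.apache.hadoop.hbase.util.Bytes;
public class CreateUserDrawTable {
public static void main(String[] args) throws Exception {
Configuration conf = HBaseConfiguration.create();
try (Connection conn = ConnectionFactory.createConnection(conf);
Admin admin = conn.getAdmin()) {
HTableDescriptor table = new HTableDescriptor(TableName.valueOf("user_draw"));
HColumnDescriptor draw = new HColumnDescriptor("draw");
draw.setMaxVersions(1);
draw.setBlockCacheEnabled(true);
draw.setBloomFilterType(BloomType.ROW);
draw.setCompressionType(Compression.Algorithm.SNAPPY);
table.addFamily(draw);
// Only a subset of the split points from the shell command above.
byte[][] splits = { Bytes.toBytes("0"), Bytes.toBytes("5"), Bytes.toBytes("a"), Bytes.toBytes("n") };
admin.createTable(table, splits);
}
}
}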
2. HBase insert code (the same logic as the reducer above):
for (Text v2 : val) {
String[] splited = v2.toString().split("\\|");
// row key: the user's MDN
if(k2.toString().length()!=0){
Put put = new Put(Bytes.toBytes(k2.toString()));
// skip the WAL (HLog) to speed up writes, at the cost of durability if a region server fails
put.setDurability(Durability.SKIP_WAL);
put.add(Bytes.toBytes("draw"), Bytes.toBytes("mdn"), Bytes.toBytes(splited[1]));
put.add(Bytes.toBytes("draw"), Bytes.toBytes("male"), Bytes.toBytes(splited[2]));
put.add(Bytes.toBytes("draw"), Bytes.toBytes("female"), Bytes.toBytes(splited[3]));
put.add(Bytes.toBytes("draw"), Bytes.toBytes("age1"), Bytes.toBytes(splited[4]));
put.add(Bytes.toBytes("draw"), Bytes.toBytes("age2"), Bytes.toBytes(splited[5]));
put.add(Bytes.toBytes("draw"), Bytes.toBytes("age3"), Bytes.toBytes(splited[6]));
put.add(Bytes.toBytes("draw"), Bytes.toBytes("age4"), Bytes.toBytes(splited[7]));
put.add(Bytes.toBytes("draw"), Bytes.toBytes("age5"), Bytes.toBytes(splited[8]));
context.write(NullWritable.get(), put);
}
}