Running the weather example from Hadoop: The Definitive Guide

Posted by 後開啟撒打發了 on 2017-10-15

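1. Prepare the code first. The full source is listed at the end of this post.
My Hadoop path is /Users/chenxun/software/hadoop-2.8.1, so I created a directory of my own there, myclass, and put the code in it, as shown below: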

[chenxun@chen.local 17:21 ~/software/hadoop-2.8.1/myclass]$ll
total 64
-rw-r--r--  1 chenxun  staff  1017 10 15 15:36 MaxTemperature.java
-rw-r--r--  1 chenxun  staff   977 10 15 15:39 MaxTemperatureMapper.java
-rw-r--r--  1 chenxun  staff   579 10 15 15:39 MaxTemperatureReducer.java

2. Configure the classpath used to compile the code
Set up the Java environment and the Hadoop jars that the code needs at compile time:

vim ~/.bash_profile

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export HADOOP_HOME=/Users/chenxun/software/hadoop-2.8.1
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

for f in $HADOOP_HOME/share/hadoop/common/hadoop-*.jar;do
     export CLASSPATH=$CLASSPATH:$f
done
for f in $HADOOP_HOME/share/hadoop/hdfs/hadoop-*.jar;do
     export CLASSPATH=$CLASSPATH:$f
done
for f in $HADOOP_HOME/share/hadoop/mapreduce/hadoop-*.jar;do
     export CLASSPATH=$CLASSPATH:$f
done
for f in $HADOOP_HOME/share/hadoop/yarn/hadoop-*.jar;do
     export CLASSPATH=$CLASSPATH:$f
done

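# Third-party jars bundled with Hadoop; the quoted /* wildcard is expanded by the JVM, not by the shell
export CLASSPATH="$CLASSPATH:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/hdfs/lib/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_HOME/share/hadoop/tools/lib/*:$HADOOP_HOME/share/hadoop/yarn/lib/*"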

source ~/.bash_profile 
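
One quick way to confirm that the CLASSPATH configured above really exposes the Hadoop jars is to ask the JVM to load a few Hadoop classes. The sketch below is not part of the book's example; the class name ClasspathCheck and the helper isLoadable are made up for illustration, and the three Hadoop class names are simply the ones the driver code imports.

```java
// ClasspathCheck.java: hypothetical helper, not part of the book's example.
class ClasspathCheck {

  // Returns true if the named class can be found on the current classpath.
  static boolean isLoadable(String className) {
    try {
      // initialize=false: only locate the class, do not run its static initializers
      Class.forName(className, false, ClasspathCheck.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Classes imported by MaxTemperature.java
    String[] names = {
      "org.apache.hadoop.fs.Path",
      "org.apache.hadoop.io.Text",
      "org.apache.hadoop.mapreduce.Job"
    };
    for (String name : names) {
      System.out.println(name + " -> " + (isLoadable(name) ? "found" : "MISSING"));
    }
  }
}
```

After sourcing ~/.bash_profile, compile it with javac ClasspathCheck.java and run java ClasspathCheck. Any line reporting MISSING suggests that the corresponding jar is not on the classpath yet.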

3. Compile the code and package it into a jar

javac *.java

jar -cvf MaxTemperature.jar .

[chenxun@chen.local 17:21 ~/software/hadoop-2.8.1/myclass]$ll
total 64
-rw-r--r--  1 chenxun  staff  1413 10 15 15:40 MaxTemperature.class
-rw-r--r--  1 chenxun  staff  6333 10 15 16:18 MaxTemperature.jar
-rw-r--r--  1 chenxun  staff  1017 10 15 15:36 MaxTemperature.java
-rw-r--r--  1 chenxun  staff  1876 10 15 15:40 MaxTemperatureMapper.class
-rw-r--r--  1 chenxun  staff   977 10 15 15:39 MaxTemperatureMapper.java
-rw-r--r--  1 chenxun  staff  1687 10 15 15:40 MaxTemperatureReducer.class
-rw-r--r--  1 chenxun  staff   579 10 15 15:39 MaxTemperatureReducer.java

4. Prepare the data
Download the NCDC weather data used in the Hadoop book from: ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2010/
I put the weather data into file.txt. The data looks like this:
0029227070999991901122820004+62167+030650FM-12+010299999V0200501N003119999999N0000001N9-01561+99999100061ADDGF108991999999999999999999
0029227070999991901122906004+62167+030650FM-12+010299999V0200901N003119999999N0000001N9-01501+99999100181ADDGF108991999999999999999999
0029227070999991901122913004+62167+030650FM-12+010299999V0200701N002119999999N0000001N9-01561+99999100271ADDGF104991999999999999999999
0029227070999991901122920004+62167+030650FM-12+010299999V0200701N002119999999N0000001N9-02001+99999100501ADDGF107991999999999999999999
0029227070999991901123006004+62167+030650FM-12+010299999V0200701N003119999999N0000001N9-01501+99999100791ADDGF108991999999999999999999
0029227070999991901123013004+62167+030650FM-12+010299999V0200901N003119999999N0000001N9-01331+99999100901ADDGF108991999999999999999999
0029227070999991901123020004+62167+030650FM-12+010299999V0200701N002119999999N0000001N9-01221+99999100831ADDGF108991999999999999999999
0029227070999991901123106004+62167+030650FM-12+010299999V0200701N004119999999N0000001N9-01391+99999100521ADDGF108991999999999999999999
0029227070999991901123113004+62167+030650FM-12+010299999V0200701N003119999999N0000001N9-01391+99999100321ADDGF108991999999999999999999
0029227070999991901123120004+62167+030650FM-12+010299999V0200701N004119999999N0000001N9-01391+99999100281ADDGF108991999999999999999999

Create the HDFS directory for the input data:

[chenxun@chen.local 16:42 ~/software/hadoop-2.8.1/myclass]$hadoop fs -mkdir -p /user/chenxun/data
[chenxun@chen.local 16:42 ~/software/hadoop-2.8.1/myclass]$hadoop fs -ls /user/chenxun
Found 3 items
drwxr-xr-x   - chenxun supergroup          0 2017-10-15 16:42 /user/chenxun/data
drwxr-xr-x   - chenxun supergroup          0 2017-10-14 01:54 /user/chenxun/input
drwxr-xr-x   - chenxun supergroup          0 2017-10-14 01:55 /user/chenxun/output

Upload the weather data to that input directory:

[chenxun@chen.local 16:47 ~/software/hadoop-2.8.1/myclass]$hadoop fs -put ./data/file.txt /user/chenxun/data
[chenxun@chen.local 16:47 ~/software/hadoop-2.8.1/myclass]$hadoop fs -ls /user/chenxun/data
Found 1 items
-rw-r--r--   1 chenxun supergroup       9855 2017-10-15 16:47 /user/chenxun/data/file.txt

Run the job:

[chenxun@chen.local 17:10 ~/software/hadoop-2.8.1/myclass]$hadoop jar MaxTemperature.jar  MaxTemperature /user/chenxun/data/file.txt /user/chenxun/dataoutput
...
[chenxun@chen.local 17:11 ~/software/hadoop-2.8.1/myclass]$hadoop fs -ls /user/chenxun/dataoutput
Found 2 items
-rw-r--r--   1 chenxun supergroup          0 2017-10-15 17:11 /user/chenxun/dataoutput/_SUCCESS
-rw-r--r--   1 chenxun supergroup          9 2017-10-15 17:11 /user/chenxun/dataoutput/part-r-00000
[chenxun@chen.local 17:12 ~/software/hadoop-2.8.1/myclass]$hadoop fs -cat /user/chenxun/dataoutput/part-r-00000
1901    -56
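
NCDC records store air temperature in tenths of a degree Celsius, so the value -56 above corresponds to -5.6 degrees Celsius. The sketch below shows that conversion for one line of part-r-00000, assuming the default tab-separated key/value layout of the text output; the class OutputLine and its methods are hypothetical names used only for illustration.

```java
// OutputLine.java: hypothetical helper for reading the job output.
class OutputLine {

  // Converts a raw NCDC reading (tenths of a degree Celsius) to degrees Celsius.
  static double toCelsius(int tenths) {
    return tenths / 10.0;
  }

  // Parses one "year<TAB>value" line and formats it for reading.
  static String describe(String line) {
    String[] parts = line.split("\t");
    int tenths = Integer.parseInt(parts[1].trim());
    return parts[0] + ": " + toCelsius(tenths) + " C";
  }

  public static void main(String[] args) {
    System.out.println(describe("1901\t-56")); // prints "1901: -5.6 C"
  }
}
```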

Code:
MaxTemperature.java

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: MaxTemperature <input path> <output path>");
      System.exit(-1);
    }

    Job job = Job.getInstance(); // new Job() is deprecated in Hadoop 2.x
    job.setJarByClass(MaxTemperature.class);
    job.setJobName("Max temperature");

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(MaxTemperatureMapper.class);
    job.setReducerClass(MaxTemperatureReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

MaxTemperatureMapper.java

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
  extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final int MISSING = 9999;

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

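    String line = value.toString();
    // Fixed-width NCDC record: characters 15-18 hold the year
    String year = line.substring(15, 19);
    int airTemperature;
    // Characters 87-91 hold the signed air temperature in tenths of a degree Celsius
    if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
      airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
      airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    // Character 92 is the quality code of the reading
    String quality = line.substring(92, 93);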
    if (airTemperature != MISSING && quality.matches("[01459]")) {
      context.write(new Text(year), new IntWritable(airTemperature));
    }
  }
}
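
To see what the mapper extracts without starting a Hadoop job, the same fixed-width parsing can be tried on a single record in plain Java. This is only a sketch: the class RecordParser and its methods are hypothetical, and the length check is an addition for this example that the mapper above does not have. The sample line is the first record from file.txt.

```java
// RecordParser.java: hypothetical stand-alone version of the mapper's parsing.
class RecordParser {

  static final int MISSING = 9999;

  // Returns the year field (characters 15-18) of an NCDC record.
  static String year(String line) {
    return line.substring(15, 19);
  }

  // Returns the temperature in tenths of a degree Celsius, or null when the
  // record is too short, the reading is missing, or the quality code is bad.
  static Integer temperature(String line) {
    if (line.length() < 93) {
      return null; // guard added for this sketch only
    }
    String raw = line.charAt(87) == '+'
        ? line.substring(88, 92)   // skip the leading plus sign
        : line.substring(87, 92);  // keep the minus sign
    int value = Integer.parseInt(raw);
    String quality = line.substring(92, 93);
    if (value != MISSING && quality.matches("[01459]")) {
      return value;
    }
    return null;
  }

  public static void main(String[] args) {
    String line = "0029227070999991901122820004+62167+030650FM-12+010299999"
        + "V0200501N003119999999N0000001N9-01561+99999100061ADDGF108991999999999999999999";
    System.out.println(year(line) + " " + temperature(line)); // prints "1901 -156"
  }
}
```

For this record the mapper would emit the pair (1901, -156), that is, -15.6 degrees Celsius.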

MaxTemperatureReducer.java

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
  extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterable<IntWritable> values,
      Context context)
      throws IOException, InterruptedException {

    int maxValue = Integer.MIN_VALUE;
    for (IntWritable value : values) {
      maxValue = Math.max(maxValue, value.get());
    }
    context.write(key, new IntWritable(maxValue));
  }
}
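
The reduce step can also be imitated in memory to check the logic. The sketch below folds (year, temperature) pairs into a per-year maximum with a plain map; the class MaxByYear is a hypothetical name, and the temperatures are the ones read off the ten sample records listed earlier. The job output shown above (-56) presumably comes from the full 9855-byte file.txt, of which only ten records are listed, so the result here differs.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// MaxByYear.java: hypothetical in-memory imitation of the reducer.
class MaxByYear {

  // Keeps the largest temperature seen for each year, as the reducer does per key.
  static Map<String, Integer> maxByYear(String[] years, int[] temps) {
    Map<String, Integer> max = new TreeMap<>();
    for (int i = 0; i < years.length; i++) {
      max.merge(years[i], temps[i], Integer::max);
    }
    return max;
  }

  public static void main(String[] args) {
    // Temperatures (tenths of a degree Celsius) of the ten sample records
    int[] temps = {-156, -150, -156, -200, -150, -133, -122, -139, -139, -139};
    String[] years = new String[temps.length];
    Arrays.fill(years, "1901");
    System.out.println(maxByYear(years, temps)); // prints "{1901=-122}"
  }
}
```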
