[Repost] Writing an Hadoop MapReduce Program in Python

Posted by BloodD on 2014-09-10

Original URL: https://segmentfault.com/a/1190000000663504

mapper.py

#!/usr/bin/env python
"""A more advanced Mapper, using Python iterators and generators."""

import sys

def read_input(file):
    for line in file:
        # split the line into words
        yield line.split()

def main(separator='\t'):
    # input comes from STDIN (standard input)
    data = read_input(sys.stdin)
    for words in data:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        #
        # tab-delimited; the trivial word count is 1
        for word in words:
            print('%s%s%d' % (word, separator, 1))

if __name__ == "__main__":
    main()
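
To see what the mapper emits, its generator logic can be exercised without Hadoop. The following is a minimal sketch, assuming Python 3 and an in-memory `io.StringIO` object standing in for `sys.stdin`; the helper name `map_lines` and the sample text are hypothetical and not part of the original script.

```python
import io

def read_input(file):
    # same generator as in mapper.py: one list of words per input line
    for line in file:
        yield line.split()

def map_lines(file, separator='\t'):
    """Hypothetical helper: collect the mapper's output lines in a list
    instead of printing them, so they can be inspected."""
    return ['%s%s%d' % (word, separator, 1)
            for words in read_input(file)
            for word in words]

# made-up sample input standing in for STDIN
sample = io.StringIO("foo foo quux\nlabs foo\n")
pairs = map_lines(sample)
for pair in pairs:
    print(pair)
```

Each word produces its own `word<TAB>1` line, repeated words included; the mapper does no aggregation, which is left entirely to the reduce step.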

reducer.py

#!/usr/bin/env python
"""A more advanced Reducer, using Python iterators and generators."""

from itertools import groupby
from operator import itemgetter
import sys

def read_mapper_output(file, separator='\t'):
    for line in file:
        yield line.rstrip().split(separator, 1)

def main(separator='\t'):
    # input comes from STDIN (standard input)
    data = read_mapper_output(sys.stdin, separator=separator)
    # groupby groups multiple word-count pairs by word,
    # and creates an iterator that returns consecutive keys and their group:
    #   current_word - string containing a word (the key)
    #   group - iterator yielding all ["<current_word>", "<count>"] items
    for current_word, group in groupby(data, itemgetter(0)):
        try:
            total_count = sum(int(count) for _, count in group)
            print("%s%s%d" % (current_word, separator, total_count))
        except ValueError:
            # count was not a number, so silently discard this item
            pass

if __name__ == "__main__":
    main()
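
The reducer relies on `groupby` merging only consecutive items with the same key, so its input must already be sorted by word. Hadoop Streaming sorts the mapper output by key before it reaches the reducer; when testing locally, a `sort` step between the two scripts plays that role. The sketch below illustrates the difference, assuming Python 3; the helper name `reduce_pairs` and the sample data are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def reduce_pairs(lines, separator='\t'):
    """Hypothetical helper: same grouping logic as reducer.py, but it
    returns (word, total) tuples instead of printing them."""
    data = (line.rstrip().split(separator, 1) for line in lines)
    totals = []
    for word, group in groupby(data, itemgetter(0)):
        try:
            totals.append((word, sum(int(count) for _, count in group)))
        except ValueError:
            # count was not a number (or the separator was missing): skip
            pass
    return totals

# made-up mapper output in arrival order, i.e. not sorted by word
mapper_output = ["foo\t1", "quux\t1", "foo\t1", "labs\t1", "foo\t1"]

unsorted_totals = reduce_pairs(mapper_output)        # "foo" split into three groups
sorted_totals = reduce_pairs(sorted(mapper_output))  # one group per word

print(unsorted_totals)
print(sorted_totals)
```

Without sorting, the same word shows up in several separate groups, each with a partial count; with sorted input, every word is reported exactly once with its full total.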

Reposted from: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
