Hadoop Benchmarking: The Hadoop TeraSort Benchmark

Posted by cuma2369 on 2020-07-29

TeraSort is one of Hadoop’s most widely used benchmarks. Hadoop’s distribution contains both the input generator and the sorting implementation: TeraGen generates the input and TeraSort performs the sort. Here, we provide a short tutorial on using the Hadoop TeraSort benchmark.

TeraGen generates random data that can be used as the input for a subsequent TeraSort run.

Generate input with TeraGen

The syntax for TeraGen:

$ hadoop jar hadoop-*examples*.jar teragen \
    <number of 100-byte rows> <output dir>
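
Because TeraGen's first argument counts 100-byte rows rather than bytes, the row count for a target data size is that size divided by 100. A minimal POSIX shell sketch of the arithmetic, assuming decimal units (1 TB = 10^12 bytes); the function name rows_for_bytes is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: number of 100-byte TeraGen rows for a target size.
# Assumes the size is given in bytes (decimal units) and is a multiple of 100.
rows_for_bytes() {
  echo $(( $1 / 100 ))
}

# 1 TB (10^12 bytes) of input
rows_for_bytes 1000000000000    # prints 10000000000
```

So a 1 TB input corresponds to passing 10000000000 as the row count.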

To make TeraGen run as multiple tasks across multiple nodes, you may need to specify the number of map tasks (30 here, as an example; the property name below is for Hadoop 2):

$ hadoop jar hadoop-*examples*.jar teragen \
    -D mapreduce.job.maps=30 \
    <number of 100-byte rows> <output dir>

The appropriate number of mappers depends on the number of rows you will generate and the number of nodes you have. For more information on how to set the number of mappers and reducers, please check this post.
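
As a rough illustration of that trade-off, the sketch below computes how many rows, and how many bytes, each map task would write. It assumes the rows are spread roughly evenly across the map tasks; that even split is an assumption made for the arithmetic, and rows_per_map is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: rows per map task, rounded up, assuming the requested
# rows are spread roughly evenly over the map tasks.
rows_per_map() {
  echo $(( ($1 + $2 - 1) / $2 ))   # ceiling division: (rows + maps - 1) / maps
}

rows=10000000000   # about 1 TB of 100-byte rows
maps=30
per_map=$(rows_per_map "$rows" "$maps")
echo "rows per map: $per_map"
echo "bytes per map: $(( per_map * 100 ))"
```

With these numbers each of the 30 maps writes 333333334 rows, roughly 33 GB; more nodes allow more maps and therefore less data per task.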

Run TeraSort

After the data is generated, sort it with TeraSort:

$ hadoop jar hadoop-*examples*.jar terasort \
    <input dir> <output dir>

You may also need to set the number of mappers and reducers (for example, -D mapreduce.job.reduces=N for the reducers) to get better performance.

Validate the sorted output of TeraSort

TeraValidate checks that the output data of TeraSort is globally sorted.

The syntax for TeraValidate:

$ hadoop jar hadoop-*examples*.jar teravalidate \
    <output dir> <terasort-validate dir>
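
The three steps chain through their directories: TeraGen's output directory is TeraSort's input, and TeraSort's output directory is TeraValidate's input. The sketch below only prints the commands (a dry run), so the order and the argument placement can be checked before anything is submitted. The directory names, row count, and map/reduce counts are placeholders chosen for illustration, and the sketch assumes the generic -D options go after the program name (teragen, terasort), where Hadoop's generic option parsing picks them up:

```shell
#!/bin/sh
# Dry run: print (not execute) the TeraGen -> TeraSort -> TeraValidate sequence.
# All directory names and counts below are hypothetical placeholders.
JAR='hadoop-*examples*.jar'   # single-quoted so the glob is printed, not expanded
ROWS=10000000000              # about 1 TB of 100-byte rows
MAPS=30
REDUCES=30

GEN_DIR=/benchmarks/terasort-input
SORT_DIR=/benchmarks/terasort-output
VALIDATE_DIR=/benchmarks/terasort-validate

print_pipeline() {
  echo "hadoop jar $JAR teragen -D mapreduce.job.maps=$MAPS $ROWS $GEN_DIR"
  echo "hadoop jar $JAR terasort -D mapreduce.job.reduces=$REDUCES $GEN_DIR $SORT_DIR"
  echo "hadoop jar $JAR teravalidate $SORT_DIR $VALIDATE_DIR"
}

print_pipeline
```

On Hadoop 2 the examples jar is typically found under $HADOOP_HOME/share/hadoop/mapreduce/; once the printed commands look right, they can be run one at a time in that order.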

Translated from: https://www.systutorials.com/hadoop-terasort-benchmark/
