Hadoop Benchmarking: The Hadoop TeraSort Benchmark

Posted by cuma2369 on 2020-07-29

Original URL: https://blog.csdn.net/cuma2369/article/details/107665444

TeraSort is one of the most widely used Hadoop benchmarks. The Hadoop distribution includes both the input generator and the sort implementation: TeraGen generates the input and TeraSort sorts it. This is a short tutorial on running the Hadoop TeraSort benchmark.

TeraGen generates random data that serves as the input for a subsequent TeraSort run.

Generate input with TeraGen

The syntax for TeraGen:

$ hadoop jar hadoop-*examples*.jar teragen <number of 100-byte rows> <output dir>
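Because every row is 100 bytes, the row count for a target data size is simply the size in bytes divided by 100. The following is a minimal sketch of that arithmetic, assuming decimal units (1 GB = 1,000,000,000 bytes); the helper name `teragen_rows` is hypothetical and not part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): convert a target input size in GB
# into the number of 100-byte rows that TeraGen expects.
# Assumes decimal units: 1 GB = 1,000,000,000 bytes.
teragen_rows() {
    size_gb=$1
    echo $(( size_gb * 1000000000 / 100 ))
}

# Example: row counts for 1 GB and for 1 TB (1000 GB) of input.
teragen_rows 1
teragen_rows 1000
```

The printed value can be passed as the first TeraGen argument.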

To run TeraGen on multiple nodes with multiple tasks, you may need to specify the number of map tasks. The example below uses 30 map tasks and the Hadoop 2 property name; the -D option goes after the program name (teragen):

$ hadoop jar hadoop-*examples*.jar teragen -D mapreduce.job.maps=30 <number of 100-byte rows> <output dir>

The appropriate number of mappers depends on the number of rows you generate and the number of nodes in the cluster. For more information on how to set the number of mappers and reducers, please check this post.
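To get a feel for how the row count and the map count interact, it can help to estimate how much data each map task would generate. This is a rough sketch that assumes the rows are divided evenly among the maps; the helper name `teragen_per_map` is hypothetical.

```shell
# Hypothetical helper: approximate rows and bytes produced by each TeraGen
# map task, assuming the rows are divided evenly (integer division).
teragen_per_map() {
    rows=$1
    maps=$2
    per_map=$(( rows / maps ))
    echo "$per_map rows, $(( per_map * 100 )) bytes per map"
}

# Example: 10,000,000,000 rows (1 TB of input) across 30 map tasks.
teragen_per_map 10000000000 30
```

If the per-map volume looks too large or too small for your nodes, adjust the map count accordingly.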

Run TeraSort

After the data is generated, run the sort with TeraSort:

$ hadoop jar hadoop-*examples*.jar terasort <input dir> <output dir>

You may also need to set the number of mappers and reducers for better performance.
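As an illustration, the reducer count can be passed with the generic -D option and the Hadoop 2 property mapreduce.job.reduces. The sketch below only prints the resulting command line so it can be inspected before running; the helper name `terasort_cmd` and the HDFS paths are hypothetical placeholders.

```shell
# Hypothetical helper: print (not run) a TeraSort command line that sets the
# reducer count through the generic -D option.
# mapreduce.job.reduces is the Hadoop 2 property for the number of reduce tasks.
terasort_cmd() {
    reducers=$1
    input_dir=$2
    output_dir=$3
    printf 'hadoop jar hadoop-*examples*.jar terasort -D mapreduce.job.reduces=%s %s %s\n' \
        "$reducers" "$input_dir" "$output_dir"
}

# Example with placeholder HDFS paths and 30 reducers.
terasort_cmd 30 /benchmarks/terasort-input /benchmarks/terasort-output
```

The right reducer count is cluster-specific; 30 here is only an example value.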

Validate the sorted output data of TeraSort

TeraValidate checks that the output data of TeraSort is globally sorted.

The syntax for TeraValidate:

$ hadoop jar hadoop-*examples*.jar teravalidate <output dir> <terasort-validate dir>
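The three steps chain together: the TeraGen output directory is the TeraSort input, and the TeraSort output directory is the TeraValidate input. The following sketch prints the three commands in order for a given row count and base directory; the helper name `terasort_pipeline` and the directory layout (input, output, validate under one base path) are assumptions made for illustration.

```shell
# Hypothetical wrapper: print the three benchmark steps in order.
# To execute them instead, set "jar" to the real path of the examples jar
# and remove the leading "echo" on each line.
terasort_pipeline() {
    rows=$1
    base=$2
    jar='hadoop-*examples*.jar'
    echo hadoop jar "$jar" teragen "$rows" "$base/input"
    echo hadoop jar "$jar" terasort "$base/input" "$base/output"
    echo hadoop jar "$jar" teravalidate "$base/output" "$base/validate"
}

# Example: 10,000,000 rows (1 GB) under a placeholder HDFS base directory.
terasort_pipeline 10000000 /benchmarks/terasort
```

Printing the commands first makes it easy to confirm that each step reads the directory the previous step wrote.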

Translated from: https://www.systutorials.com/hadoop-terasort-benchmark/
