Translating the Hadoop Official Website: Day 1

Posted by xiaoliuyiting on 2018-12-24

The Apache™ Hadoop

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop project is open-source software developed for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

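In other words, the Apache Hadoop software library is a framework that lets large data sets be processed in a distributed way across clusters of computers using simple programming models. It is designed to scale from a single server to thousands of machines, each providing local computation and storage. Instead of depending on hardware for high availability, the library itself detects and handles failures at the application layer, so it can offer a highly available service on top of a cluster in which any individual machine may fail.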
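To make "detect and handle failures at the application layer" a bit more concrete, here is a minimal sketch in plain Python. It is not Hadoop code and uses no Hadoop API; `WorkerFailure`, `run_with_failover`, `count_lines`, and the node names are all hypothetical names invented for this illustration. The idea it shows is simply that the software, not the hardware, notices a failed worker and reruns the task elsewhere.

```python
from typing import Callable, List, Sequence


class WorkerFailure(Exception):
    """Raised by a (simulated) worker that cannot finish its task."""


def run_with_failover(task: Callable[[str], int], workers: Sequence[str]) -> int:
    """Try the task on each worker in turn; return the first successful result."""
    errors: List[str] = []
    for worker in workers:
        try:
            return task(worker)
        except WorkerFailure as exc:
            # Failure detected in software: record it and move on to the next worker.
            errors.append(f"{worker}: {exc}")
    raise RuntimeError("all workers failed: " + "; ".join(errors))


def count_lines(worker: str) -> int:
    """Hypothetical task: pretend 'node-1' is down and every other node succeeds."""
    if worker == "node-1":
        raise WorkerFailure("node unreachable")
    return 42


result = run_with_failover(count_lines, ["node-1", "node-2", "node-3"])
print(result)
```

Here the task fails on `node-1` and is transparently rerun on `node-2`. A real cluster scheduler does far more (heartbeats, data replication, speculative execution), so treat this only as a picture of the principle.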

 

Modules

The project includes these modules:

  • Hadoop Common: The common utilities that support the other Hadoop modules.

        General-purpose utilities that the other Hadoop modules build on.

  • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.

        A distributed file system that gives applications high-throughput access to their data.

  • Hadoop YARN: A framework for job scheduling and cluster resource management.

       A framework that schedules jobs and manages cluster resources.

  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

       A system built on YARN for processing large data sets in parallel.

        An object store for Hadoop.
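The "simple programming model" behind MapReduce can be pictured with the classic word count. The sketch below runs in a single Python process and does not use the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are my own, chosen only to mirror the three stages. On a real cluster the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

Because each call to `map_phase` looks at one line only and each call to `reduce_phase` looks at one key only, the work can be split across machines without the pieces needing to know about each other.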

 

Related projects

Other Hadoop-related projects at Apache include:

  • Ambari™: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heatmaps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner.

       A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, with support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop. Ambari also offers a dashboard for checking cluster health (heatmaps, for example) and lets you inspect MapReduce, Pig, and Hive applications visually to diagnose their performance characteristics in a user-friendly way.

  • Avro™: A data serialization system.

    A system for serializing data.

  • Cassandra™: A scalable multi-master database with no single points of failure.

    A scalable database with multiple master nodes and no single point of failure.

  • Chukwa™: A data collection system for managing large distributed systems.

      A data collection system used to manage large distributed systems.

  • HBase™: A scalable, distributed database that supports structured data storage for large tables.

     A scalable, distributed database that stores structured data in very large tables.

  • Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.

      A data warehouse that supports data summarization and ad hoc (on-the-fly) queries.

  • Mahout™: A scalable machine learning and data mining library.

       A scalable library for machine learning and data mining.

  • Pig™: A high-level data-flow language and execution framework for parallel computation.

      A high-level data-flow language, with an execution framework, for parallel computation.

  • Spark™: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.

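         A fast, general-purpose compute engine for Hadoop data. Spark offers a simple, expressive programming model that supports many kinds of applications, including ETL, machine learning, stream processing, and graph computation.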

  • Tez™: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use-cases. Tez is being adopted by Hive™, Pig™ and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop™ MapReduce as the underlying execution engine.

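       A general-purpose data-flow framework built on Hadoop YARN. It provides a powerful, flexible engine that runs an arbitrary DAG of tasks to process data in both batch and interactive modes. Hive, Pig, other frameworks in the Hadoop ecosystem, and commercial software such as ETL tools are adopting Tez to replace Hadoop MapReduce as the underlying execution engine.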

  • ZooKeeper™: A high-performance coordination service for distributed applications.

      A high-performance coordination service for distributed applications.
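The Tez entry above talks about executing "an arbitrary DAG of tasks". As a rough picture of what that means, the sketch below runs a small dependency graph in a valid order using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). It is not the Tez API; `run_dag` and the task names `extract`, `clean`, `aggregate`, and `report` are hypothetical and exist only for this example.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_dag(tasks: Dict[str, Callable[[], None]], deps: Dict[str, Set[str]]) -> List[str]:
    """Run every task after all of its dependencies; return the execution order."""
    # Map each task to the set of tasks that must finish before it starts.
    graph = {name: deps.get(name, set()) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        tasks[name]()
    return order


log: List[str] = []

# Hypothetical pipeline: each task just records that it ran.
tasks = {
    "extract": lambda: log.append("extract"),
    "clean": lambda: log.append("clean"),
    "aggregate": lambda: log.append("aggregate"),
    "report": lambda: log.append("report"),
}
deps = {
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}

order = run_dag(tasks, deps)
print(order)
```

A plain MapReduce job is a fixed two-stage shape (map, then reduce), whereas a DAG engine lets a job have any number of stages wired together like this. The sketch runs tasks one at a time; an engine like Tez would also run independent tasks in parallel across the cluster.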

 

Plus today's little cutie.

 

 

 

 
