The Hadoop Ecosystem - Keywords

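Posted by leniz on 2016-12-19
Big data is not only about the sheer volume of data you face; the landscape of tools is also quite a sight. Every time I read about it I run into all sorts of new terms (probably old terms to veterans), so I want to write them down.
I try to keep to terms that can be explained in one sentence, enough to give an intuitive feel.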

Core framework:
  • Hadoop Common: The common utilities that support the other Hadoop modules. (more like a collection of interfaces)
  • Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data. (the underlying distributed file system)
  • Hadoop YARN: A framework for job scheduling and cluster resource management. (this resource management framework only appeared from Hadoop 2 onward; YARN stands for Yet Another Resource Negotiator)
  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets. (a distributed data processing model and execution environment)
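To make the MapReduce entry above more concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases, using word counting as the example. It is plain Python and does not touch Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration and are not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map -> shuffle -> reduce over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hadoop yarn hdfs", "Hadoop mapreduce"]))
```

In a real cluster the map and reduce calls would run in parallel on many machines, with HDFS holding the input and output; the sketch only shows the shape of the programming model.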

Related projects:

  • Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heatmaps and the ability to view MapReduce, Pig and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner. (a web-based Hadoop management platform)
  • Avro: A data serialization system. (a serialization system that supports efficient, cross-language RPC and persistent data storage)
  • Cassandra: A scalable multi-master database with no single points of failure.
  • Chukwa: A data collection system for managing large distributed systems.
  • HBase: A scalable, distributed database that supports structured data storage for large tables. (a distributed, column-oriented database. HBase uses HDFS as its underlying storage and supports both MapReduce batch computation and point queries, i.e. random reads)
  • Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying. (a distributed data warehouse. Hive manages data stored in HDFS and provides SQL-style access to it)
  • Mahout: A scalable machine learning and data mining library. (a library of applied machine learning algorithms)
  • Pig: A high-level data-flow language and execution framework for parallel computation. (a data-flow language that runs on top of MapReduce and HDFS)
  • Spark: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
  • Tez: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use-cases. Tez is being adopted by Hive, Pig and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop MapReduce as the underlying execution engine.
  • ZooKeeper: A high-performance coordination service for distributed applications.
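  • Sqoop: A tool for efficient bulk transfer of data between structured data stores (such as relational databases) and HDFS. (an ETL tool)
  • Oozie: A service for running and scheduling Hadoop jobs (such as MapReduce, Pig, Hive and Sqoop jobs). (somewhat like a job monitoring system)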
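
The Tez entry above talks about executing "an arbitrary DAG of tasks", and Oozie likewise schedules jobs that depend on one another. As a rough illustration of that idea only, the sketch below runs a small set of dependent tasks in dependency order using Python's standard `graphlib` module (Python 3.9+). The `run_dag` helper and the task names are hypothetical and have nothing to do with the real Tez or Oozie APIs.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, dependencies):
    """Run each task once, after all of its upstream tasks have finished.

    tasks:        dict of task name -> callable taking a dict of upstream results
    dependencies: dict of task name -> set of upstream task names
    Returns the execution order and the result of every task.
    """
    graph = {name: dependencies.get(name, set()) for name in tasks}
    results = {}
    order = []
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: results[dep] for dep in graph[name]}
        results[name] = tasks[name](upstream)
        order.append(name)
    return order, results


if __name__ == "__main__":
    # Hypothetical four-step job: one source feeding two branches and a join.
    tasks = {
        "extract": lambda up: [3, 1, 2],
        "sort": lambda up: sorted(up["extract"]),
        "total": lambda up: sum(up["extract"]),
        "report": lambda up: (up["sort"], up["total"]),
    }
    dependencies = {
        "sort": {"extract"},
        "total": {"extract"},
        "report": {"sort", "total"},
    }
    order, results = run_dag(tasks, dependencies)
    print(order)
    print(results["report"])
```

A plain MapReduce job is the special case of a two-stage graph (map, then reduce); a DAG engine lets a job branch and join freely, which is the sense in which Tez is described as a replacement execution engine.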

Source: ITPUB blog, link: http://blog.itpub.net/554557/viewspace-2130905/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
