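The Hadoop Ecosystem - Keywords
Big data is not only about the sheer volume of data to be handled; the landscape of tools around it is just as sprawling. Every time I read about it I run into all kinds of new terms (probably old ones to veterans), so I want to write them down here.
I try to stick to terms that can be explained in a single sentence, enough to get an intuitive feel for each.
Core framework: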
- Hadoop Common: The common utilities that support the other Hadoop modules. (More like a collection of shared interfaces.)
- Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data. (The underlying distributed file system.)
- Hadoop YARN: A framework for job scheduling and cluster resource management. (Introduced in Hadoop 2 as the resource management layer; the name stands for Yet Another Resource Negotiator.)
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets. (A distributed data processing model and execution environment.)
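To make the MapReduce model in the list above a little more concrete, here is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to word counting. It is plain Python and does not use Hadoop's API; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are made up for illustration, and a real job would run these stages in parallel across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (illustrative only)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = [
        "Hadoop stores data in HDFS",
        "YARN schedules jobs",
        "hadoop runs MapReduce jobs",
    ]
    print(word_count(sample))
```

The point of the split is that each `map_phase` call depends on only one line and each `reduce_phase` call on only one key, which is what lets a framework distribute the work.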
Related projects:
- Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heatmaps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner. (A web-based Hadoop management platform.)
- Avro: A data serialization system. (A serialization system that supports efficient, cross-language RPC and persistent data storage.)
- Cassandra: A scalable multi-master database with no single points of failure.
- Chukwa: A data collection system for managing large distributed systems.
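- HBase: A scalable, distributed database that supports structured data storage for large tables. (A distributed, column-oriented database. HBase uses HDFS as its underlying storage and supports both MapReduce batch computation and point queries, i.e. random reads.)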
- Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying. (A distributed data warehouse. Hive manages data stored in HDFS and provides SQL-style access to it.)
- Mahout: A scalable machine learning and data mining library. (A library of applied machine learning algorithms.)
- Pig: A high-level data-flow language and execution framework for parallel computation. (A data-flow language that runs on top of MapReduce and HDFS.)
- Spark: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
- Tez: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use-cases. Tez is being adopted by Hive, Pig and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop MapReduce as the underlying execution engine.
- ZooKeeper: A high-performance coordination service for distributed applications.
- Sqoop: A tool for efficient bulk transfer of data between structured data stores (such as relational databases) and HDFS. (An ETL tool.)
- Oozie: A service for running and scheduling Hadoop jobs (such as MapReduce, Pig, Hive, and Sqoop jobs). (Somewhat like a job monitoring system.)
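The Tez entry above speaks of executing "an arbitrary DAG of tasks". As a rough illustration of what that means, the sketch below runs a small dependency graph in topological order using Python's standard `graphlib` module (Python 3.9+). It is not Tez's API: `run_dag`, the task names, and the toy data are all hypothetical, and a real engine would run independent tasks in parallel on a cluster.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, dependencies):
    """Run each task only after all of its upstream tasks have finished.

    tasks: maps a task name to a function taking a dict of upstream results.
    dependencies: maps a task name to the set of task names it depends on.
    Every dependency is assumed to also appear in `tasks`.
    """
    graph = {name: dependencies.get(name, set()) for name in tasks}
    results = {}
    # static_order() yields each node after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: results[dep] for dep in graph[name]}
        results[name] = tasks[name](upstream)
    return results


if __name__ == "__main__":
    # Hypothetical four-task pipeline: one source, two branches, one join.
    tasks = {
        "load": lambda up: [3, 1, 2],
        "sort": lambda up: sorted(up["load"]),
        "total": lambda up: sum(up["load"]),
        "report": lambda up: (up["sort"], up["total"]),
    }
    dependencies = {
        "sort": {"load"},
        "total": {"load"},
        "report": {"sort", "total"},
    }
    print(run_dag(tasks, dependencies)["report"])
```

Unlike a fixed map-then-reduce pipeline, the graph here has two branches that rejoin, which is the kind of shape a DAG engine can express directly.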
Source: "ITPUB Blog", link: http://blog.itpub.net/554557/viewspace-2130905/. If you repost this, please credit the source; otherwise legal liability will be pursued.