What Is Apache Hadoop?
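The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows large data sets to be processed in a distributed fashion across clusters of computers using simple programming models. It is designed to scale from a single server up to thousands of machines, each of which provides local computation and storage.
Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer. It can therefore deliver a highly available service on top of a cluster of computers in which any individual node is allowed to fail.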
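As a rough illustration of handling failure at the application layer rather than in hardware, the sketch below retries a task on the next node when one node fails. It is not how Hadoop itself is implemented; `NodeFailure`, `run_with_failover`, `flaky_task`, and the node names are hypothetical and exist only for this example.

```python
from typing import Callable, List, Sequence


class NodeFailure(Exception):
    """Raised by a simulated node that cannot finish its task (hypothetical)."""


def run_with_failover(task: Callable[[str], int], nodes: Sequence[str]) -> int:
    """Try the task on each node in turn and return the first successful result."""
    failed: List[str] = []
    for node in nodes:
        try:
            return task(node)
        except NodeFailure:
            # Record the failed node and fall through to the next candidate.
            failed.append(node)
    raise RuntimeError(f"task failed on all nodes: {failed}")


def flaky_task(node: str) -> int:
    """Hypothetical task: 'node-1' is down, every other node succeeds."""
    if node == "node-1":
        raise NodeFailure(node)
    return 42
```

Calling `run_with_failover(flaky_task, ["node-1", "node-2"])` returns 42 even though the first node fails, because the caller, not the hardware, absorbs the failure.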
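The project includes these modules:
- Hadoop Common: The common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
- Hadoop YARN: A framework for job scheduling and cluster resource management.
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.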
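Other Hadoop-related projects at Apache include:
- Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, with support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health (for example, heatmaps) and for viewing MapReduce, Pig and Hive applications visually, along with features for diagnosing their performance characteristics in a user-friendly manner.
- Avro: A data serialization system.
- Cassandra: A scalable multi-master database with no single point of failure.
- Chukwa: A data collection system for managing large distributed systems.
- HBase: A scalable, distributed database that supports structured data storage for large tables.
- Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
- Mahout: A scalable machine learning and data mining library.
- Pig: A high-level data-flow language and execution framework for parallel computation.
- Spark: A fast, general-purpose compute engine for Hadoop data. It provides a simple programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
- Tez: A generalized data-flow programming framework, built on Hadoop YARN, that provides a powerful and flexible engine for executing an arbitrary DAG of tasks to process data in both batch and interactive use cases.
- ZooKeeper: A high-performance coordination service for distributed applications.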
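To give a feel for the "simple programming model" that Hadoop MapReduce exposes, here is a minimal single-machine sketch of the map, shuffle, and reduce steps for a word count. It does not use the Hadoop API; the function names are assumptions made for illustration, and a real job would distribute each phase across the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as a framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially on one machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

For example, `word_count(["big data", "big cluster"])` returns `{"big": 2, "data": 1, "cluster": 1}`. The author of such a job writes only the map and reduce functions; grouping by key and spreading the work over machines is left to the framework.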
The original English text is reproduced below for reference:
What Is Apache Hadoop?
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
The project includes these modules:
- Hadoop Common: The common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
- Hadoop YARN: A framework for job scheduling and cluster resource management.
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
Other Hadoop-related projects at Apache include:
- Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heatmaps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner.
- Avro: A data serialization system.
- Cassandra: A scalable multi-master database with no single points of failure.
- Chukwa: A data collection system for managing large distributed systems.
- HBase: A scalable, distributed database that supports structured data storage for large tables.
- Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
- Mahout: A scalable machine learning and data mining library.
- Pig: A high-level data-flow language and execution framework for parallel computation.
- Spark: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
- Tez: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use-cases. Tez is being adopted by Hive, Pig and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop MapReduce as the underlying execution engine.
- ZooKeeper: A high-performance coordination service for distributed applications.
Source: ITPUB Blog, link: http://blog.itpub.net/23628945/viewspace-1081101/. If you repost this article, please credit the source; otherwise legal liability will be pursued.