NoSQL (MongoDB, Riak, CouchDB, Redis)

Posted by pxbibm on 2014-04-24

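This article introduces the most popular NoSQL databases today and describes the strengths, weaknesses, and suitable use cases of each.
It is based on a technical article by Kristof Kovacs, a software architect from Germany.
Kristof Kovacs appears to be a software architect working in a machinery-related field.
I have been reading a lot of English material lately and translated this along the way. Some passages may be translated inaccurately, so corrections are welcome.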
While SQL databases are insanely useful tools, their monopoly in the last decades is coming to an end. And it's just time: I can't even count the things that were forced into relational databases, but never really fitted them. (That being said, relational databases will always be the best for the stuff that has relations.)
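Relational (SQL) databases are very useful tools. They have held a monopoly for more than a decade, but that is about to end, and it is only a matter of time: not every requirement fits a relational database.
That said, relational databases will always be the best choice for data that really is relational.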

But, the differences between NoSQL databases are much bigger than there ever was between one SQL database and another. This means that it is a bigger responsibility on software architects to choose the appropriate one for a project right at the beginning.

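However, NoSQL databases differ from one another far more than one relational (SQL) database differs from another. This means software architects carry more responsibility for choosing a suitable NoSQL database at the very start of a project.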
In this light, here is a comparison of MongoDB, Riak, CouchDB, Redis, HBase, Cassandra, Neo4j, Hypertable, Accumulo, and others:
Given this situation, the databases named above are compared one by one below:

The most popular ones

MongoDB (2.2)

  • Written in: C++
  • Main point: Retains some friendly properties of SQL. (Query, index)
  • License: AGPL (Drivers: Apache)
  • Protocol: Custom, binary (BSON)
  • Master/slave replication (auto failover with replica sets)
  • Sharding built-in
  • Queries are JavaScript expressions
  • Run arbitrary JavaScript functions server-side
  • Better update-in-place than CouchDB
  • Uses memory mapped files for data storage
  • Performance over features
  • Journaling (with --journal) is best turned on
  • On 32-bit systems, limited to ~2.5 GB
  • An empty database takes up 192 MB
  • GridFS to store big data + metadata (not actually an FS)
  • Has geospatial indexing
  • Data center aware

Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.

For example: For most things that you would do with MySQL or PostgreSQL, but having predefined columns really holds you back.
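
To make "dynamic queries" over documents without predefined columns a bit more concrete, here is a minimal pure-Python sketch. It is not the MongoDB driver API: `matches` and `find` are hypothetical helpers made up for illustration, and only the `$gt`, `$lt`, and `$in` operators (named after MongoDB's query operators) are handled; anything else is silently ignored.

```python
# Hypothetical, in-memory illustration of ad-hoc queries over schemaless
# documents. This is NOT the MongoDB driver API; it only mimics the idea.

def matches(doc, query):
    """Return True if `doc` satisfies every condition in `query`."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):  # operator form, e.g. {"$gt": 30}
            for op, arg in cond.items():
                if op == "$gt" and not (value is not None and value > arg):
                    return False
                if op == "$lt" and not (value is not None and value < arg):
                    return False
                if op == "$in" and value not in arg:
                    return False
        elif value != cond:  # plain equality
            return False
    return True


def find(collection, query):
    """Return all documents in `collection` that match `query`."""
    return [doc for doc in collection if matches(doc, query)]


# Documents need not share the same fields (no predefined columns).
people = [
    {"name": "Ann", "age": 34, "city": "Berlin"},
    {"name": "Bob", "age": 27},
    {"name": "Cid", "age": 41, "tags": ["admin"]},
]

over_30 = find(people, {"age": {"$gt": 30}})
print([p["name"] for p in over_30])  # ['Ann', 'Cid']
```

The query is plain data built at run time, which is roughly what "dynamic" means here: nothing about it has to be declared in advance, unlike a predefined map/reduce view.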


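  • Written in: C++
  • Main point: Retains some friendly properties of SQL (queries, indexes).
  • License: AGPL (drivers: Apache)
  • Protocol: Custom, binary (BSON) (translator's note: I have not used this protocol)
  • Master/slave replication (with automatic failover and recovery)
  • Built-in sharding
  • Queries are JavaScript expressions
  • Can run arbitrary JavaScript functions on the server
  • Better update-in-place than CouchDB
  • Uses memory-mapped files for data storage
  • Favors performance over features
  • Journaling is best turned on (via the journal option)
  • On 32-bit systems, database size is limited to about 2.5 GB
  • An empty database takes up about 192 MB
  • Uses GridFS to store big data and metadata (not actually a file system)
  • Has geospatial indexing
  • Data center aware
Best used: when you need dynamic queries; when you prefer defining indexes to writing map/reduce functions; when you need good performance on a big database; or when you wanted CouchDB but your data changes so often that it fills up the disk.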

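For example: most things you would do with MySQL or PostgreSQL, in cases where predefined columns really hold you back.

Riak (V1.2)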

  • Written in: Erlang & C, some JavaScript
  • Main point: Fault tolerance
  • License: Apache
  • Protocol: HTTP/REST or custom binary
  • Stores blobs
  • Tunable trade-offs for distribution and replication
  • Pre- and post-commit hooks in JavaScript or Erlang, for validation and security.
  • Map/reduce in JavaScript or Erlang
  • Links & link walking: use it as a graph database
  • Secondary indices: but only one at once
  • Large object support (Luwak)
  • Comes in "open source" and "enterprise" editions
  • Full-text search, indexing, querying with Riak Search
  • In the process of migrating the storing backend from "Bitcask" to Google's "LevelDB"
  • Masterless multi-site replication and SNMP monitoring are commercially licensed

Best used: If you want Dynamo-like data storage, but there's no way you're gonna deal with the bloat and complexity. If you need very good single-site scalability, availability and fault-tolerance, but you're ready to pay for multi-site replication.

For example: Point-of-sales data collection. Factory control systems. Places where even seconds of downtime hurt. Could be used as a well-update-able web server.

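  • Written in: Erlang and C, plus some JavaScript
  • Main point: Fault tolerance
  • License: Apache
  • Protocol: HTTP/REST or custom binary
  • Stores blobs
  • Tunable trade-offs for distribution and replication
  • Pre- and post-commit hooks in JavaScript or Erlang, for validation and security
  • Map/reduce in JavaScript or Erlang
  • Links and link walking: can be used as a graph database
  • Secondary indices: but only one at once
  • Large object support
  • Comes in open source and enterprise editions
  • Full-text search, indexing, and querying with Riak Search
  • The storage backend is being migrated from "Bitcask" to Google's "LevelDB"
  • Masterless multi-site replication and SNMP monitoring are commercially licensed
Best used: when you want Dynamo-like data storage but do not want to deal with the bloat and complexity; when you need very good single-site scalability, availability, and fault tolerance, and are prepared to pay for multi-site replication.
For example: point-of-sale data collection; factory control systems; anywhere even a few seconds of downtime hurts. It can also serve as an easily updatable web server.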
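
As a rough illustration of what "Dynamo-like" distribution means, the sketch below places nodes on a hash ring and picks the next N nodes clockwise from a key as its replicas. This is a simplified consistent-hashing toy, not Riak's actual ring (which, as far as I know, divides the ring into fixed partitions); the `Ring` class, the node names, and the key are all made up for the example.

```python
import hashlib
from bisect import bisect_right


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hashing ring: one point per node (an assumption)."""

    def __init__(self, nodes):
        self._points = sorted((_hash(n), n) for n in nodes)
        self._hashes = [h for h, _ in self._points]

    def preference_list(self, key, n):
        """Return up to n distinct nodes, walking clockwise from the key."""
        start = bisect_right(self._hashes, _hash(key))
        count = min(n, len(self._points))
        total = len(self._points)
        return [self._points[(start + i) % total][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
replicas = ring.preference_list("user:42", n=3)
print(replicas)  # three distinct nodes; which ones depends on the hashes
```

Because every node can compute the same list from the key alone, no master is needed to decide where data lives, which is the property the "fault tolerance" bullet leans on.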

 

 

CouchDB (V1.2)

  • Written in: Erlang
  • Main point: DB consistency, ease of use
  • License: Apache
  • Protocol: HTTP/REST
  • Bi-directional (!) replication,
  • continuous or ad-hoc,
  • with conflict detection,
  • thus, master-master replication. (!)
  • MVCC - write operations do not block reads
  • Previous versions of documents are available
  • Crash-only (reliable) design
  • Needs compacting from time to time
  • Views: embedded map/reduce
  • Formatting views: lists & shows
  • Server-side document validation possible
  • Authentication possible
  • Real-time updates via '_changes' (!)
  • Attachment handling
  • thus, CouchApps (standalone js apps)

Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important.

For example: CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.

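  • Written in: Erlang
  • Main point: DB consistency, ease of use
  • License: Apache
  • Protocol: HTTP/REST
  • Bi-directional replication,
  • continuous or ad-hoc,
  • with conflict detection,
  • thus, master-master replication
  • MVCC: write operations do not block reads
  • Previous versions of documents are available
  • Crash-only (reliable) design
  • Needs compacting from time to time
  • Views: embedded map/reduce
  • Formatting views: lists and shows
  • Server-side document validation is supported
  • Authentication is supported
  • Real-time updates are supported
  • Attachment handling is supported
  • thus, CouchApps (standalone JS apps)

Best used: for data that accumulates and changes only occasionally, on which predefined queries are run; and wherever versioning matters.

For example: CRM and CMS systems. Master-master replication makes multi-site deployments very easy.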
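
The conflict detection mentioned in the list above can be sketched as a revision check on every write: an update is accepted only if it names the revision it was based on. This is a simplified in-memory model, not CouchDB's HTTP API; the `DocStore` class is hypothetical, and the integer revisions are an assumption made for brevity (real CouchDB revisions are strings).

```python
class Conflict(Exception):
    """Raised when a write is based on a stale revision."""


class DocStore:
    """Toy document store with optimistic, revision-based writes."""

    def __init__(self):
        self._docs = {}  # doc_id -> (rev, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:  # writer did not see the latest version
            raise Conflict(f"{doc_id}: expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = DocStore()
rev1 = store.put("customer-1", {"name": "Ann"})
rev2 = store.put("customer-1", {"name": "Ann", "vip": True}, rev=rev1)
try:
    store.put("customer-1", {"name": "Anne"}, rev=rev1)  # stale revision
    conflict_detected = False
except Conflict:
    conflict_detected = True
print(rev1, rev2, conflict_detected)  # 1 2 True
```

Writers never lock the document; a stale write is simply rejected and has to be retried against the newer revision.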

Redis (V2.8)

  • Written in: C
  • Main point: Blazing fast
  • License: BSD
  • Protocol: Telnet-like, binary safe
  • Disk-backed in-memory database,
  • Dataset size limited to computer RAM (but can span multiple machines' RAM with clustering)
  • Master-slave replication, automatic failover
  • Simple values or data structures by keys
  • but complex operations like ZREVRANGEBYSCORE.
  • INCR & co (good for rate limiting or statistics)
  • Bit operations (for example to implement bloom filters)
  • Has sets (also union/diff/inter)
  • Has lists (also a queue; blocking pop)
  • Has hashes (objects of multiple fields)
  • Sorted sets (high score table, good for range queries)
  • Lua scripting capabilities (!)
  • Has transactions (!)
  • Values can be set to expire (as in a cache)
  • Pub/Sub lets one implement messaging

Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory).

For example: Stock prices. Analytics. Real-time data collection. Real-time communication. And wherever you used memcached before.

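  • Written in: C
  • Main point: Blazing fast
  • License: BSD
  • Protocol: Telnet-like, binary safe
  • Disk-backed in-memory database
  • Dataset size is limited to the computer's RAM (but can span the RAM of several machines with clustering)
  • Master-slave replication, automatic failover
  • Simple values or data structures by key
  • but also complex operations such as ZREVRANGEBYSCORE
  • INCR & co (good for rate limiting or statistics)
  • Bit operations
  • Sets (including union/diff/inter)
  • Lists (also usable as a queue; blocking pop)
  • Hashes (objects with multiple fields)
  • Sorted sets
  • Transactions
  • Values can be set to expire
  • Pub/Sub lets you implement messaging

Best used: for rapidly changing data with a fairly small database (one that fits mostly in memory).

For example: stock prices, analytics, real-time data collection, real-time communication.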
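
The list above mentions that bit operations can be used to implement bloom filters. Here is a minimal sketch of that idea in plain Python, with a local `bytearray` standing in for the bit string that Redis would hold under a key (where SETBIT and GETBIT would do the bit work). The `BloomFilter` class, its sizes, and the hashing scheme are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter over a local bit array (stand-in for a Redis key)."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)  # like setting one bit to 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "probably present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"))  # True: added items are never missed
```

A lookup for an item that was never added usually returns False, but it can return True by coincidence; that false-positive rate is the price paid for the small, fixed memory footprint.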


 

Clones of Google's Bigtable

HBase (V0.92.0)

  • Written in: Java
  • Main point: Billions of rows X millions of columns
  • License: Apache
  • Protocol: HTTP/REST (also Thrift)
  • Modeled after Google's BigTable
  • Uses Hadoop's HDFS as storage
  • Map/reduce with Hadoop
  • Query predicate push down via server side scan and get filters
  • Optimizations for real time queries
  • A high performance Thrift gateway
  • HTTP supports XML, Protobuf, and binary
  • Jruby-based (JIRB) shell
  • Rolling restart for configuration changes and minor upgrades
  • Random access performance is like MySQL
  • A cluster consists of several different types of nodes

Best used: Hadoop is probably still the best way to run Map/Reduce jobs on huge datasets. Best if you use the Hadoop/HDFS stack already.

For example: Search engines. Analysing log data. Any place where scanning huge, two-dimensional join-less tables is a requirement.


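  • Written in: Java
  • Main point: Billions of rows by millions of columns
  • License: Apache
  • Protocol: HTTP/REST
  • Modeled after Google's BigTable
  • Uses Hadoop's HDFS as storage
  • Map/reduce with Hadoop
  • Query predicates are pushed down to server-side scan and get filters
  • Optimizations for real-time queries
  • HTTP supports XML, Protobuf, and binary
  • JRuby-based (JIRB) shell
  • Rolling restarts for configuration changes and upgrades
  • Random access performance is like MySQL
  • A cluster consists of several different types of nodes

Best used: for very large tables that need real-time access.

For example: search engines; analysing log data; anywhere huge two-dimensional tables need to be scanned.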
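
The "huge, two-dimensional join-less table" model can be pictured as rows kept sorted by key, each holding only the columns it actually has, with reads expressed as key-range scans plus a server-side filter. The sketch below is an in-memory toy under those assumptions; `SparseTable`, the row-key layout, and the column names are invented for the example and are not HBase's client API.

```python
from bisect import bisect_left, insort


class SparseTable:
    """Toy sorted, sparse table: row key -> {column: value}."""

    def __init__(self):
        self._keys = []  # row keys kept in sorted order
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def scan(self, start, stop, column_filter=None):
        """Yield (row_key, columns) for start <= row_key < stop."""
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            cols = self._rows[key]
            if column_filter is None or column_filter in cols:
                yield key, cols


table = SparseTable()
table.put("2014-04-24#web1", "status", 200)
table.put("2014-04-24#web2", "status", 500)
table.put("2014-04-24#web2", "error", "timeout")
table.put("2014-04-25#web1", "status", 200)

day = list(table.scan("2014-04-24", "2014-04-25"))
errors = list(table.scan("2014-04-24", "2014-04-25", column_filter="error"))
print([key for key, _ in day])     # ['2014-04-24#web1', '2014-04-24#web2']
print([key for key, _ in errors])  # ['2014-04-24#web2']
```

Putting the date at the front of the row key is what makes "all log lines for one day" a single contiguous scan, which is why row-key design matters so much in this family of databases.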

Cassandra (1.2)

  • Written in: Java
  • Main point: Best of BigTable and Dynamo
  • License: Apache
  • Protocol: Thrift & custom binary CQL3
  • Tunable trade-offs for distribution and replication (N, R, W)
  • Querying by column, range of keys (Requires indices on anything that you want to search on)
  • BigTable-like features: columns, column families
  • Can be used as a distributed hash-table, with an "SQL-like" language, CQL (but no JOIN!)
  • Data can have expiration (set on INSERT)
  • Writes can be much faster than reads (when reads are disk-bound)
  • Map/reduce possible with Apache Hadoop
  • All nodes are similar, as opposed to Hadoop/HBase
  • Very good and reliable cross-datacenter replication

Best used: When you write more than you read (logging). If every component of the system must be in Java. ("No one gets fired for choosing Apache's stuff.")

For example: Banking, financial industry (though not necessarily for financial transactions, but these industries are much bigger than that.) Writes are faster than reads, so one natural niche is data analysis.

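  • Written in: Java
  • Main point: Best of BigTable and Dynamo
  • License: Apache
  • Protocol: Thrift & custom binary CQL3
  • Tunable trade-offs for distribution and replication (N, R, W)
  • Querying by column or by range of keys
  • BigTable-like features: columns, column families
  • Can be used as a distributed hash-table, with an "SQL-like" language, CQL (but no JOIN!)
  • Data can have an expiration time
  • Writes can be faster than reads
  • All nodes are similar, unlike Hadoop/HBase
  • Very good and reliable cross-datacenter replication

Best used: when you write more than you read, or when every component of the system must be written in Java.
For example: banking and the financial industry (not necessarily for financial transactions themselves; these industries are much bigger than that). Writes are faster than reads.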
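
The (N, R, W) trade-off in the list above comes down to simple arithmetic: with N replicas per key, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The helper names below are hypothetical and this is only the arithmetic, not Cassandra's API.

```python
# Illustrative arithmetic only; this is not Cassandra's API.

def quorum(n):
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def read_sees_latest_write(n, r, w):
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


N = 3  # replicas per key
for r, w in [(1, 1), (1, 3), (2, 2)]:
    print(f"N={N} R={r} W={w} overlap={read_sees_latest_write(N, r, w)}")
# N=3 R=1 W=1 overlap=False
# N=3 R=1 W=3 overlap=True
# N=3 R=2 W=2 overlap=True
```

Small R and W make operations fast but allow stale reads; raising either one buys consistency at the cost of latency and availability, and quorum on both sides (2 and 2 for N = 3) is the usual middle ground.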

Neo4j (V1.5M02)

  • Written in: Java
  • Main point: Graph database - connected data
  • License: GPL, some features AGPL/commercial
  • Protocol: HTTP/REST (or embedding in Java)
  • Standalone, or embeddable into Java applications
  • Full ACID conformity (including durable data)
  • Both nodes and relationships can have metadata
  • Integrated pattern-matching-based query language ("Cypher")
  • Also the "Gremlin" graph traversal language can be used
  • Indexing of nodes and relationships
  • Nice self-contained web admin
  • Advanced path-finding with multiple algorithms
  • Indexing of keys and relationships
  • Optimized for reads
  • Has transactions (in the Java API)
  • Scriptable in Groovy
  • Online backup, advanced monitoring and High Availability are AGPL/commercial licensed

Best used: For graph-style, rich or complex, interconnected data. Neo4j is quite different from the others in this sense.

For example: For searching routes in social relations, public transport links, road maps, or network topologies.

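  • Written in: Java
  • Main point: Graph database for connected data
  • License: GPL, some features AGPL/commercial
  • Protocol: HTTP/REST (or embedding in Java)
  • Standalone, or embeddable into Java applications
  • Both nodes and relationships (edges) can have metadata
  • Nice built-in web admin
  • Path-finding with multiple algorithms
  • Indexing of keys and relationships
  • Optimized for reads
  • Has transactions (in the Java API)
  • The Gremlin graph traversal language can be used
  • Scriptable in Groovy
  • Online backup, advanced monitoring, and high availability are AGPL/commercial licensed

Best used: for graph-style data. This is the most notable difference between Neo4j and the other NoSQL databases.

For example: social relations, public transport networks, road maps, and network topologies.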
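
"Searching routes in social relations" is, at its simplest, a shortest-path search over a graph. The sketch below uses a breadth-first search over a plain adjacency dictionary to show the kind of question a graph database is built to answer; it is not Cypher, Gremlin, or any Neo4j API, and the `friends` data is made up.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list or None if unreachable."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                if neighbour == goal:
                    # Walk back from the goal to rebuild the path.
                    path = [goal]
                    while previous[path[-1]] is not None:
                        path.append(previous[path[-1]])
                    return path[::-1]
                queue.append(neighbour)
    return None


friends = {
    "ann": ["bob", "cid"],
    "bob": ["ann", "dee"],
    "cid": ["ann"],
    "dee": ["bob", "eve"],
    "eve": ["dee"],
}
print(shortest_path(friends, "ann", "eve"))  # ['ann', 'bob', 'dee', 'eve']
```

In a relational schema the same question needs one self-join per hop, which is why interconnected data is singled out as the case where a graph database differs from the rest.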


 

Hypertable (0.9.6.5)

  • Written in: C++
  • Main point: A faster, smaller HBase
  • License: GPL 2.0
  • Protocol: Thrift, C++ library, or HQL shell
  • Implements Google's BigTable design
  • Runs on Hadoop's HDFS
  • Uses its own, "SQL-like" language, HQL
  • Can search by key, by cell, or for values in column families.
  • Search can be limited to key/column ranges.
  • Sponsored by Baidu
  • Retains the last N historical values
  • Tables are in namespaces
  • Map/reduce with Hadoop

Best used: If you need a better HBase.

For example: Same as HBase, since it's basically a replacement: Search engines. Analysing log data. Any place where scanning huge, two-dimensional join-less tables is a requirement.
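
The "retains the last N historical values" bullet can be pictured as each cell keeping a short, bounded history instead of a single value. The sketch below models one such cell with a fixed-length deque; `VersionedCell` is a hypothetical name, this is not Hypertable's API, and it assumes values arrive in timestamp order.

```python
from collections import deque


class VersionedCell:
    """Toy cell that keeps only the newest `max_versions` values."""

    def __init__(self, max_versions=3):
        # A bounded deque silently drops the oldest entry when full.
        self._versions = deque(maxlen=max_versions)

    def put(self, timestamp, value):
        self._versions.append((timestamp, value))

    def latest(self):
        return self._versions[-1][1]

    def history(self):
        """Return (timestamp, value) pairs, newest first."""
        return list(reversed(self._versions))


cell = VersionedCell(max_versions=3)
for ts, price in [(1, 10.0), (2, 10.5), (3, 10.2), (4, 10.8)]:
    cell.put(ts, price)

print(cell.latest())   # 10.8
print(cell.history())  # [(4, 10.8), (3, 10.2), (2, 10.5)]
```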

  • Written in: C++
  • Main point: Small and very fast
  • License: GPL 2.0
  • Protocol: Thrift, C++ library, or HQL shell
  • Implements Google's BigTable design
  • Runs on Hadoop's HDFS

Source: ITPUB blog, link: http://blog.itpub.net/12798004/viewspace-1148914/. If you repost this article, please credit the source; otherwise legal action may be taken.
