Daily job tips (Cassandra)

Posted by oxoxooxx on 2011-08-29

1. Multi-data-center deployment: questions to think through
Token selection
Choice of partitioner strategy
Data balancing within a single data center
Capacity expansion across multiple data centers
Problems caused by node failures
2. Cluster maintenance
System log aggregation (todo)
Scheduled backup jobs (todo)
Choosing a scheme for maintaining the cluster's configuration files

Reference documents (Cassandra wiki)

Vocabulary:
consult: to consider, to take into account
vital: of vital importance; a matter of life and death
corollary: a corollary; an inevitable consequence
disproportionately: out of proportion; unevenly
consequence: result, outcome
wipe: to wipe; to wipe clean

hypothetical: hypothetical, assumed; yet to be verified
depiction: depiction; description
rigid: rigid; stiff; inflexible
symposium: a symposium, a seminar
Once data is placed on the cluster, the partitioner may not be changed without wiping and starting over.

1. Cloud computing: tips on Cassandra hardware selection
Cloud
Several heavy users of Cassandra deploy in the cloud, e.g. CloudKick on Rackspace Cloud Servers and SimpleGeo on Amazon EC2.
On EC2, the best practice is to use L or XL instances with local storage. I/O performance is proportionately much worse on S and M sizes, and EBS is a bad fit for several reasons (see Erik Onnen's excellent explanation). Put the Cassandra commitlog on the root volume, and the data directory on the raid0'd ephemeral disks.
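That layout can be expressed in cassandra.yaml; the directory paths below are hypothetical examples, assuming the ephemeral disks have been assembled into a RAID 0 device mounted at /raid0:

```yaml
# Hypothetical cassandra.yaml fragment matching the advice above:
# commitlog on the root volume, data files on the raid0'd ephemeral disks.
commitlog_directory: /var/lib/cassandra/commitlog   # root volume
data_file_directories:
    - /raid0/cassandra/data                         # ephemeral disks in RAID 0
```

Separating the sequential commitlog writes from the random data-file I/O is the point of the split.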

2. Notes on doubling a Cassandra cluster's capacity across data centers
The corollary to this is, if you want to start with a single DC and add another later, when you add the second DC you should add as many nodes as you have in the first rather than adding a node or two at a time gradually.

3. On the frequency of repair
Frequency of nodetool repair
Unless your application performs no deletes, it is vital that production clusters run nodetool repair periodically on all nodes in the cluster. The hard requirement for repair frequency is the value used for GCGraceSeconds (see DistributedDeletes). Running nodetool repair often enough to guarantee that all nodes have performed a repair in a given period GCGraceSeconds long, ensures that deletes are not "forgotten" in the cluster.
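A rough scheduling sketch of the rule above (the helper below is hypothetical, not part of Cassandra): given gc_grace_seconds (whose default is 864000 seconds, i.e. 10 days) and the cluster size, it computes how often a staggered rolling `nodetool repair` must start so that every node completes a repair within one GCGraceSeconds window:

```python
def max_repair_interval(gc_grace_seconds: int, num_nodes: int,
                        safety_margin: float = 0.8) -> float:
    """Seconds between successive per-node repair starts in a rolling
    schedule, so all nodes repair within one GCGraceSeconds window.
    safety_margin leaves headroom for repairs that overrun."""
    return gc_grace_seconds * safety_margin / num_nodes

# With the default gc_grace_seconds (10 days) and 12 nodes, start one
# per-node repair roughly every 16 hours.
interval = max_repair_interval(864_000, 12)
print(interval / 3600)  # hours between repair starts
```

Repairing more often than this bound is harmless; repairing less often risks deleted data reappearing.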

4. // Be clear that endpoint_snitch exists so that Cassandra knows enough about the network topology to route data read requests efficiently.
// Distinguish this option from the partitioner strategies (ByteOrderedPartitioner, OrderPreservingPartitioner, CollatingOrderPreservingPartitioner) and from the replication strategies (SimpleStrategy, OldNetworkTopologyStrategy, NetworkTopologyStrategy).
# endpoint_snitch -- Set this to a class that implements
# IEndpointSnitch, which will let Cassandra know enough
# about your network topology to route requests efficiently.

RackUnawareStrategy (renamed SimpleStrategy): replicas are always placed on the next (in increasing Token order) N-1 nodes along the ring
RackAwareStrategy (renamed OldNetworkTopologyStrategy): replica 2 is placed on the first node along the ring that belongs to another data center than the first; the remaining N-2 replicas, if any, are placed on the first nodes along the ring in the same rack as the first
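The two placement rules can be illustrated on a toy ring (a sketch only, not Cassandra's implementation; the node and data-center names are made up, and rack is approximated by data center for brevity):

```python
def place_replicas_simple(ring, start, n):
    """RackUnawareStrategy-style: replicas are simply the next N nodes
    walking the ring in token order from the primary node."""
    i = ring.index(start)
    return [ring[(i + k) % len(ring)] for k in range(n)]

def place_replicas_rack_aware(ring, dc, start, n):
    """RackAwareStrategy-style sketch: replica 2 goes to the first node
    along the ring in a *different* data center; remaining replicas go
    to the next nodes along the ring in the primary node's data center."""
    i = ring.index(start)
    order = [ring[(i + k) % len(ring)] for k in range(len(ring))]
    replicas = [start]
    for node in order[1:]:                 # second replica: other DC
        if dc[node] != dc[start]:
            replicas.append(node)
            break
    for node in order[1:]:                 # remaining replicas: same DC
        if len(replicas) == n:
            break
        if dc[node] == dc[start] and node not in replicas:
            replicas.append(node)
    return replicas

ring = ["a1", "b1", "a2", "b2"]            # nodes in token order
dc = {"a1": "DC1", "a2": "DC1", "b1": "DC2", "b2": "DC2"}
print(place_replicas_simple(ring, "a1", 3))       # ['a1', 'b1', 'a2']
print(place_replicas_rack_aware(ring, dc, "a1", 3))
```

The simple rule ignores topology entirely, which is why an unlucky token layout can put all replicas in one data center; the rack-aware rule forces at least one copy elsewhere.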

5. On choosing the read consistency level R and when read repair runs
So we would like to make two changes:
only send read requests to the closest R live nodes
if read repair is enabled, also compare results from the other nodes in the background
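The two desired changes can be sketched as follows (hypothetical function and names, not Cassandra's actual read path): only the closest R live replicas are queried in the foreground, and the remaining replicas are compared asynchronously only when read repair is enabled:

```python
def plan_read(live_nodes, closeness, r, read_repair=False):
    """Split replicas for a read: wait on the closest R live nodes;
    if read repair is on, the rest are compared in the background."""
    by_distance = sorted(live_nodes, key=closeness)
    foreground = by_distance[:r]                         # requests we wait on
    background = by_distance[r:] if read_repair else []  # compared async
    return foreground, background

nodes = ["n1", "n2", "n3"]
dist = {"n1": 1, "n2": 3, "n3": 2}.get   # snitch-style proximity scores
fg, bg = plan_read(nodes, dist, r=2, read_repair=True)
print(fg, bg)  # ['n1', 'n3'] ['n2']
```

With read repair off, the background list is empty and the farther replicas are never touched by the read.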


From the ITPUB blog, link: http://blog.itpub.net/23937368/viewspace-1054533/. If reprinting, please cite the source; otherwise legal responsibility will be pursued.