What’s new in Cassandra 1.0: Compression
Cassandra 1.0 introduces support for data compression on a per-ColumnFamily basis, one of the most-requested features since the project started. Compression maximizes the storage capacity of your Cassandra nodes by reducing the volume of data on disk. In addition to saving space, compression also reduces disk I/O, particularly for read-dominated workloads.
Compression benefits
Besides reducing data size, compression typically improves both read and write performance. Cassandra can quickly find the location of a row through the SSTable index, and it decompresses only the relevant row chunks. As a result, compression improves read performance not only by allowing a larger data set to fit in memory, but also for workloads where the hot data set does not fit in memory, because less data has to be read from disk.
Unlike in traditional databases, write performance is not negatively impacted by compression in Cassandra. Writes on compressed tables can in fact show up to a 10 percent performance improvement. In traditional relational databases, writes require overwrites to existing data files on disk. This means that the database has to locate the relevant pages on disk, decompress them, overwrite the relevant data, and then compress them again – an expensive operation in both CPU cycles and disk I/O.
Because Cassandra SSTable data files are immutable (they are not written to again after they have been flushed to disk), there is no recompression cycle necessary in order to process writes. SSTables are only compressed once, when they are written to disk.
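The difference between the two write paths can be sketched with a toy model. This is only an illustration of the reasoning above, not Cassandra's storage engine: the class names `ImmutableSegmentStore` and `InPlacePageStore` are hypothetical, rows are serialized as JSON, and Python's `zlib` stands in for the compressor.

```python
import json
import zlib


def pack(rows):
    """Serialize a dict of rows and compress it; returns (compressed, raw_size)."""
    raw = json.dumps(rows, sort_keys=True).encode("utf-8")
    return zlib.compress(raw), len(raw)


class ImmutableSegmentStore:
    """Toy write-once store: each flush compresses only the new rows, once."""

    def __init__(self):
        self.segments = []          # compressed blobs, never modified after flush
        self.bytes_compressed = 0   # total uncompressed bytes fed to the compressor

    def flush(self, rows):
        blob, raw_size = pack(rows)
        self.segments.append(blob)
        self.bytes_compressed += raw_size

    def read(self, key):
        # The newest segment wins, so an update is simply a later flush.
        for blob in reversed(self.segments):
            rows = json.loads(zlib.decompress(blob))
            if key in rows:
                return rows[key]
        return None


class InPlacePageStore:
    """Toy update-in-place store: one compressed page, rewritten on every update."""

    def __init__(self, rows):
        self.bytes_compressed = 0
        self.write_decompress_calls = 0
        self._store(rows)

    def _store(self, rows):
        self.page, raw_size = pack(rows)
        self.bytes_compressed += raw_size

    def update(self, key, value):
        # Decompress the existing page, modify it, then recompress all of it.
        rows = json.loads(zlib.decompress(self.page))
        self.write_decompress_calls += 1
        rows[key] = value
        self._store(rows)

    def read(self, key):
        return json.loads(zlib.decompress(self.page)).get(key)


initial = {"alice": "a@example.com", "bob": "b@example.com"}

log_store = ImmutableSegmentStore()
log_store.flush(initial)
log_store.flush({"alice": "alice@example.org"})   # update = new segment

page_store = InPlacePageStore(initial)
page_store.update("alice", "alice@example.org")   # update = rewrite the page
```

In this model the write-once store never decompresses anything on the write path and only compresses the rows being written, while the in-place store has to decompress and recompress existing data for every update.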
Overall, we are seeing the following results from enabling compression, depending on the data characteristics:
2x-4x reduction in data size
25-35% performance improvement on reads
5-10% performance improvement on writes
When to use compression
Compression is best suited for ColumnFamilies where there are many rows, with each row having the same columns, or at least many columns in common. For example, a ColumnFamily containing user data such as username, email, etc., would be a good candidate for compression. The more similar the data across rows, the greater the compression ratio will be, and the larger the gain in read performance.
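As a rough illustration of how row similarity affects the compression ratio, the sketch below compresses two synthetic data sets with Python's `zlib` (a Deflate implementation, used here only as a stand-in for the compressors Cassandra ships). The row layouts, the JSON serialization, and the helper names are assumptions made for the example; real ratios depend on your data and on the SSTable format.

```python
import json
import random
import zlib


def compression_ratio(rows):
    """Return uncompressed_size / compressed_size for rows serialized as JSON lines."""
    raw = "\n".join(json.dumps(row, sort_keys=True) for row in rows).encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


rng = random.Random(42)  # fixed seed so repeated runs use the same data


def random_token(length=12):
    """Random lowercase/digit string, used for unpredictable names and values."""
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
    return "".join(rng.choice(alphabet) for _ in range(length))


# Static rows: every row has the same columns, and values repeat across rows.
static_rows = [
    {
        "name": f"user{i}",
        "email": f"user{i}@example.com",
        "state": rng.choice(["TX", "CA", "NY"]),
        "gender": rng.choice(["f", "m"]),
        "birth_year": rng.randint(1950, 2000),
    }
    for i in range(2000)
]

# Dynamic rows: each row has its own column names and unrelated values.
dynamic_rows = [
    {random_token(): random_token() for _ in range(5)}
    for _ in range(2000)
]

static_ratio = compression_ratio(static_rows)
dynamic_ratio = compression_ratio(dynamic_rows)
print(f"static rows:  {static_ratio:.1f}x")
print(f"dynamic rows: {dynamic_ratio:.1f}x")
```

The rows that share column names and have repetitive values compress noticeably better than the rows whose names and values are all different.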
Compression is not as good a fit for ColumnFamilies where each row has a different set of columns, or where there are just a few very wide rows. Dynamic column families such as these will not yield good compression ratios.
Configuring compression on a ColumnFamily
When you create or update a column family, you can choose to make it a compressed column family by specifying the following storage properties:
compression_options: this is a container property for setting compression options on a column family. The compression_options property contains the following options:
sstable_compression: specifies the compression algorithm to use when compressing SSTable files. Cassandra supports two built-in compression classes: SnappyCompressor (Snappy compression library) and DeflateCompressor (Java zip implementation). Snappy compression offers faster compression/decompression, while the Java zip compression offers better compression ratios. Choosing the right one depends on your requirements for space savings over read performance. For read-heavy workloads, Snappy compression is recommended. Developers can also implement custom compression classes using the org.apache.cassandra.io.compress.ICompressor interface.
chunk_length_kb: sets the compression chunk size in kilobytes. The default value (64) is a good middle ground for compressing column families with either wide rows or skinny rows. With wide rows, it allows reading a 64 KB slice of column data without decompressing the entire row. For skinny rows, although you may still end up decompressing more data than requested, it is a good trade-off between maximizing the compression ratio and minimizing the overhead of decompressing more data than is needed to access a requested row. The compression chunk size can be adjusted to account for read/write access patterns (how much data is typically requested at once) and the average size of rows in the column family.
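The effect of the chunk size on reads can be sketched in a few lines. The example below is a simplified model, not Cassandra's on-disk format: it compresses a byte string as independent fixed-size chunks with Python's `zlib` and then serves a read by decompressing only the chunks that overlap the requested range. The function names `compress_in_chunks` and `read_range` are hypothetical.

```python
import zlib

CHUNK_LENGTH_KB = 64                  # mirrors the default chunk_length_kb
CHUNK_SIZE = CHUNK_LENGTH_KB * 1024


def compress_in_chunks(data, chunk_size=CHUNK_SIZE):
    """Compress data as a list of independently compressed fixed-size chunks."""
    return [
        zlib.compress(data[start:start + chunk_size])
        for start in range(0, len(data), chunk_size)
    ]


def read_range(chunks, offset, length, chunk_size=CHUNK_SIZE):
    """Return (data[offset:offset + length], number_of_chunks_decompressed).

    Only the chunks overlapping the requested byte range are decompressed.
    """
    if length <= 0:
        return b"", 0
    first = offset // chunk_size
    last = min((offset + length - 1) // chunk_size, len(chunks) - 1)
    indices = range(first, last + 1)
    buffer = b"".join(zlib.decompress(chunks[i]) for i in indices)
    start = offset - first * chunk_size
    return buffer[start:start + length], len(indices)


# Synthetic "data file": 20,000 small rows of 46 bytes each.
data = b"".join(
    b"row-%06d:name,email,state,gender,birth_year\n" % i for i in range(20000)
)
chunks = compress_in_chunks(data)

# A small read inside one chunk decompresses a single 64 KB chunk.
piece, touched = read_range(chunks, offset=70_000, length=100)

# A read that straddles a chunk boundary decompresses two chunks.
edge_piece, edge_touched = read_range(chunks, offset=65_500, length=100)
```

A smaller chunk size means less unneeded data is decompressed per read, while a larger one gives the compressor more context and usually a better ratio, which is the trade-off described above.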
You can enable compression when you create a new column family, or update an existing column family to add compression later on. When you add compression to an existing column family, existing SSTables on disk are not compressed immediately. Any new SSTables that are created will be compressed, and existing SSTables will be compressed as they are rewritten during the normal Cassandra compaction process. (If necessary, you can force existing SSTables to be rewritten and compressed by running nodetool scrub.)
For example, to create a new column family with compression enabled using the Cassandra CLI, you would do the following:
[default@demo] CREATE COLUMN FAMILY users
WITH key_validation_class=UTF8Type
AND column_metadata = [
{column_name: name, validation_class: UTF8Type}
{column_name: email, validation_class: UTF8Type}
{column_name: state, validation_class: UTF8Type}
{column_name: gender, validation_class: UTF8Type}
{column_name: birth_year, validation_class: LongType}
]
AND compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64};
Conclusion
Compression in Cassandra 1.0 is an easy way to reduce storage volume requirements while increasing performance. Compression can be easily added to existing ColumnFamilies after an upgrade, and the implementation allows power users to tweak chunk sizes for maximum benefit.