netty: using a dictionary to improve compression of short text

By 何德海, published 2020-07-24

Original URL: https://www.cnblogs.com/dehai/p/13261205.html

Netty

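1 The problem

Terminology: compression ratio is the compressed size divided by the original size. The smaller the ratio, the better the compression.

While compressing with netty's JdkZlibEncoder I ran into a problem: it compresses short text (under 2K) poorly. The compression ratio lands between 80% and 120%, and the shorter the text, the worse it gets. The output can even be larger than the uncompressed input.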
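To make the ratio defined above concrete, here is a minimal sketch that measures it with the JDK's java.util.zip.Deflater directly instead of going through a netty pipeline. The class name ShortTextRatio and its helper methods are made up for illustration, and the exact figures depend on the input and on the zlib build in use.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class ShortTextRatio {

    /** Compresses data in zlib format at the given level and returns the compressed size in bytes. */
    static int compressedSize(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] buffer = new byte[256];
            int total = 0;
            // Drain the deflater; only the byte count matters here, so the buffer is reused.
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end();
        }
    }

    /** Compression ratio in percent: compressed size / original size (integer division). */
    static int ratioPercent(String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        return compressedSize(data, 9) * 100 / data.length;
    }

    public static void main(String[] args) {
        // A very short message: the zlib header and checksum alone add several bytes.
        String shortText = "<ext id=\"1005\"/>";
        System.out.printf("ratio for short text: %d%%%n", ratioPercent(shortText));
    }
}
```

A ratio above 100% means the "compressed" message is larger than the original, which is the situation described above for very short inputs.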

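After some research I found that a preset dictionary improves the compression. The rest of this post explains how to set one up.

2 Extracting the dictionary

The text we need to transmit looks like this: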

<?xml version="1.0" encoding="utf-8" ?>
<Event attribute="TRANSIENT">
  <outer id="11" from="1005" to="915880056212" trunk="83057387" callid="24587"/>
  <ext id="1005"/>
</Event>

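The principle for extracting a dictionary: add the strings that occur repeatedly across messages.

For the message above, the following dictionary can be extracted: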

String[] dictionary = {
        "<?xml version=\"1.0\" encoding=\"utf-8\" ?>",
        "Event", "TRANSIENT", "attribute", "outer", "from", "trunk",
        "callid", "id", "to", "ext"
};
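The extraction principle above can be roughly mechanised. The following is only a sketch under assumptions, not the method used to build the dictionary in this post: it splits a few sample messages into alphabetic tokens, keeps the ones that occur at least a given number of times, and orders them with the most frequent last, which is what the zlib manual recommends for deflateSetDictionary. The class DictionaryCandidates and the method repeatedTokens are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DictionaryCandidates {

    /** Returns the tokens that occur at least minCount times in the samples, least frequent first. */
    static List<String> repeatedTokens(List<String> samples, int minCount) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String sample : samples) {
            // Split on anything that is not a letter; numeric values vary per message and are dropped.
            for (String token : sample.split("[^A-Za-z]+")) {
                if (token.length() >= 2) { // skip empty strings and single letters
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minCount) {
                result.add(entry.getKey());
            }
        }
        // Stable sort by ascending count, so the most frequent tokens end up at the end.
        result.sort((a, b) -> Integer.compare(counts.get(a), counts.get(b)));
        return result;
    }

    public static void main(String[] args) {
        List<String> samples = new ArrayList<>();
        samples.add("<Event attribute=\"TRANSIENT\"><ext id=\"1005\"/></Event>");
        samples.add("<Event attribute=\"IDLE\"><ext id=\"1006\"/></Event>");
        System.out.println(repeatedTokens(samples, 2));
    }
}
```

Fixed multi-token fragments such as the XML declaration are still worth adding by hand, since a token-level count like this one cannot see them as a unit.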

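3 Test case

The test case is built with the EmbeddedChannel API. EmbeddedChannel can simulate inbound and outbound data flow, which makes it very useful for testing a ChannelHandler.

JdkZlibEncoder has a constructor that accepts a dictionary in addition to the compression level. The test code in this section calls it as new JdkZlibEncoder(9, dictionaryBytes), and JdkZlibDecoder takes the same dictionary bytes.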
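As far as I know, JdkZlibEncoder and JdkZlibDecoder build on the JDK's java.util.zip.Deflater and Inflater, so the dictionary mechanism can also be sketched with the JDK alone. The example below is an illustration under that assumption and not netty's actual implementation; the class PresetDictionaryDemo and its methods are hypothetical names. It shows the two key points: the compressor gets the dictionary before any input, and the decompressor supplies the same bytes once the stream asks for them.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class PresetDictionaryDemo {

    /** Compresses data in zlib format; dictionary may be null for plain compression. */
    static byte[] deflate(byte[] data, byte[] dictionary) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            if (dictionary != null) {
                deflater.setDictionary(dictionary); // must happen before any input is compressed
            }
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[256];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    /** Decompresses zlib data, supplying the dictionary when the stream requests one. */
    static byte[] inflate(byte[] compressed, byte[] dictionary) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[256];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                out.write(chunk, 0, n);
                if (n == 0 && !inflater.finished()) {
                    if (inflater.needsDictionary() && dictionary != null) {
                        inflater.setDictionary(dictionary); // same bytes the compressor used
                    } else {
                        throw new DataFormatException("missing dictionary or incomplete stream");
                    }
                }
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] dictionary = "<?xml version=\"1.0\" encoding=\"utf-8\" ?>EventTRANSIENTattribute"
                .getBytes(StandardCharsets.UTF_8);
        byte[] data = "<?xml version=\"1.0\" encoding=\"utf-8\" ?><Event attribute=\"TRANSIENT\"></Event>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.printf("original: %d, plain: %d, with dictionary: %d%n",
                data.length, deflate(data, null).length, deflate(data, dictionary).length);
        System.out.println(new String(inflate(deflate(data, dictionary), dictionary), StandardCharsets.UTF_8));
    }
}
```

Both sides must agree on the dictionary ahead of time. A receiver that does not have it cannot decode the stream, which is why the test in this section passes the same bytes to the encoder and the decoder.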

Here is the test code:

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelPipeline;
import io.netty.channel.embedded.EmbeddedChannel;
import io.netty.handler.codec.compression.JdkZlibDecoder;
import io.netty.handler.codec.compression.JdkZlibEncoder;
import io.netty.handler.codec.string.StringDecoder;
import io.netty.handler.codec.string.StringEncoder;
import org.junit.Test; // assumes JUnit 4

import java.nio.charset.StandardCharsets;

public class GzipTest {

    private String xml = "<?xml version=\"1.0\" encoding=\"utf-8\" ?>" +
            "<Event attribute=\"TRANSIENT\">" +
            "<outer id=\"11\" from=\"1005\" to=\"915880056212\" trunk=\"83057387\" callid=\"24587\"  />" +
            "<ext id=\"1005\" />" +
            "</Event>";

    private String[] dictionary = {
            "<?xml version=\"1.0\" encoding=\"utf-8\" ?>",
            "Event", "TRANSIENT", "attribute", "outer", "from", "trunk",
            "callid", "id", "to", "ext"
    };

    /**
     * Compression without a dictionary
     */
    @Test
    public void test1() {
        EmbeddedChannel embeddedChannel = new EmbeddedChannel();
        ChannelPipeline pipeline = embeddedChannel.pipeline();
        // Outbound data passes through the StringEncoder first, then the JdkZlibEncoder
        pipeline.addLast("gzipDecoder", new JdkZlibDecoder());
        pipeline.addLast("gzipEncoder", new JdkZlibEncoder(9));
        pipeline.addLast("decoder", new StringDecoder());
        pipeline.addLast("encoder", new StringEncoder());

        System.out.println("*******Compression without a dictionary*******");
        int compressBefore = xml.getBytes(StandardCharsets.UTF_8).length;
        System.out.printf("Size before compression: %d \n", compressBefore);
        // Simulate an outbound write
        embeddedChannel.writeOutbound(xml);
        ByteBuf outboundBuf = embeddedChannel.readOutbound();
        int compressAfter = outboundBuf.readableBytes();
        System.out.printf("Size after compression: %d, compression ratio: %d%% \n", compressAfter,
                compressAfter * 100 / compressBefore);
    }

    /**
     * Compression with a dictionary
     */
    @Test
    public void test2() {
        EmbeddedChannel embeddedChannel = new EmbeddedChannel();
        ChannelPipeline pipeline = embeddedChannel.pipeline();
        // The dictionary: all entries concatenated into one byte array
        byte[] dictionaryBytes = String.join("", dictionary)
                .getBytes(StandardCharsets.UTF_8);
        // Encoder and decoder must be given the same dictionary
        pipeline.addLast("gzipDecoder", new JdkZlibDecoder(dictionaryBytes));
        pipeline.addLast("gzipEncoder", new JdkZlibEncoder(9, dictionaryBytes));
        pipeline.addLast("decoder", new StringDecoder());
        pipeline.addLast("encoder", new StringEncoder());

        System.out.println("*******Compression with a dictionary*******");
        int compressBefore = xml.getBytes(StandardCharsets.UTF_8).length;
        System.out.printf("Size before compression: %d \n", compressBefore);
        // Simulate an outbound write
        embeddedChannel.writeOutbound(xml);
        ByteBuf outboundBuf = embeddedChannel.readOutbound();
        int compressAfter = outboundBuf.readableBytes();
        System.out.printf("Size after compression: %d, compression ratio: %d%% \n", compressAfter,
                compressAfter * 100 / compressBefore);
    }
}

Output:

*******Compression without a dictionary*******
Size before compression: 173
Size after compression: 150, compression ratio: 86%
*******Compression with a dictionary*******
Size before compression: 173
Size after compression: 95, compression ratio: 54%

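The output shows the compression ratio improving from 86% to 54% (lower is better): the same 173-byte message shrinks to 95 bytes instead of 150.

4 Going further

If extracting a dictionary by hand feels too tedious, zstd is worth a try. zstd is a compression library from Facebook, and it ships a tool that builds a dictionary automatically from sample files. The command is:

zstd --train ./dictionary/* -o ./dict.bin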

5 References

zstd on GitHub

Comparison and selection of text compression algorithms
