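Preface
Our project recently needed to keep a large amount of string data in memory for a certain period of time, so compressing the source strings before storing them was the natural idea. That prompted a survey of some well-regarded compression algorithms.
String compression usually has three requirements: a high compression ratio, fast compression, and fast decompression. A high compression ratio and high compression speed pull against each other (you cannot have both the fish and the bear's paw, as the saying goes), so good algorithms generally settle on a trade-off between ratio and speed. Judged on compression ratio, compression speed, and decompression speed, zstd and lz4 both perform well, so these two were chosen for investigation.
zstd is a fast compression algorithm with a high compression ratio, open-sourced by Facebook (see https://github.com/facebook/zstd). This article looks at how it actually performs at compression and decompression.
1. zstd compression and decompression
ZSTD_compress belongs to zstd's Simple API; the compression level is the only thing it lets you set.
The prototype of ZSTD_compress is:
size_t ZSTD_compress(void* dst, size_t dstCapacity, const void* src, size_t srcSize, int compressionLevel);
The prototype of ZSTD_decompress is:
size_t ZSTD_decompress(void* dst, size_t dstCapacity, const void* src, size_t compressedSize);
Both functions return the number of bytes written to dst, or an error code that can be tested with ZSTD_isError.
Let's first look at a zstd compression and decompression example.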
#include <stdio.h>
#include <stdlib.h>   // malloc, free
#include <string.h>   // strlen, memcmp
#include <zstd.h>
#include <iostream>
using namespace std;

int main()
{
    // compress
    size_t com_space_size;
    size_t peppa_pig_text_size;
    char *com_ptr = NULL;
char peppa_pig_buf[2048] = "Narrator: It is raining today. So, Peppa and George cannot play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles.Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy.Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, George. There's a really big puddle.Narrator: George wants to jump into the big puddle first.Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles.Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play in the garden.Narrator: Peppa and George are wearing their boots. Mummy and Daddy are wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: It's only mud.";
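    peppa_pig_text_size = strlen(peppa_pig_buf);
    // Worst-case compressed size for an input of this length
    com_space_size = ZSTD_compressBound(peppa_pig_text_size);
    com_ptr = (char *)malloc(com_space_size);
    if (NULL == com_ptr) {
        cout << "compress malloc failed" << endl;
        return -1;
    }
    // Level 1; the level argument is an int, not a ZSTD_strategy value such as ZSTD_fast
    size_t com_size = ZSTD_compress(com_ptr, com_space_size, peppa_pig_buf, peppa_pig_text_size, 1);
    if (ZSTD_isError(com_size)) {
        cout << "compress failed: " << ZSTD_getErrorName(com_size) << endl;
        free(com_ptr);
        return -1;
    }
    cout << "peppa pig text size:" << peppa_pig_text_size << endl;
    cout << "compress text size:" << com_size << endl;
    cout << "compress ratio:" << (float)peppa_pig_text_size / (float)com_size << endl << endl;

    // decompress
    // The frame header records the original size; reject unknown or invalid frames
    unsigned long long decom_buf_size = ZSTD_getFrameContentSize(com_ptr, com_size);
    if (decom_buf_size == ZSTD_CONTENTSIZE_ERROR || decom_buf_size == ZSTD_CONTENTSIZE_UNKNOWN) {
        cout << "cannot determine decompressed size" << endl;
        free(com_ptr);
        return -1;
    }
    char *decom_ptr = (char *)malloc((size_t)decom_buf_size);
    if (NULL == decom_ptr) {
        cout << "decompress malloc failed" << endl;
        free(com_ptr);
        return -1;
    }
    size_t decom_size = ZSTD_decompress(decom_ptr, (size_t)decom_buf_size, com_ptr, com_size);
    if (ZSTD_isError(decom_size)) {
        cout << "decompress failed: " << ZSTD_getErrorName(decom_size) << endl;
        free(com_ptr);
        free(decom_ptr);
        return -1;
    }
    cout << "decompress text size:" << decom_size << endl;
    // Compare raw bytes: the decompressed buffer is not NUL-terminated
    if (decom_size != peppa_pig_text_size || memcmp(peppa_pig_buf, decom_ptr, peppa_pig_text_size) != 0) {
        cout << "decompress text is not equal peppa pig text" << endl;
    }
    free(com_ptr);
    free(decom_ptr);
    return 0;
}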
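Result:
The Peppa Pig text is 1827 bytes long before compression and 759 bytes after, a compression ratio of about 2.4, and the decompressed length equals the original length.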
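The ratio quoted above is simple arithmetic on the two sizes. The sketch below spells it out; `compression_ratio` and `space_saving` are hypothetical helper names made up for this illustration and are not part of zstd.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only (not part of zstd).

// Ratio = original size / compressed size; larger means better compression.
double compression_ratio(std::size_t original, std::size_t compressed) {
    return static_cast<double>(original) / static_cast<double>(compressed);
}

// Fraction of the original size that compression removed.
double space_saving(std::size_t original, std::size_t compressed) {
    return 1.0 - static_cast<double>(compressed) / static_cast<double>(original);
}
```

With the sizes reported above, `compression_ratio(1827, 759)` is about 2.41 and `space_saving(1827, 759)` is about 0.58, so the compressed text takes roughly 42% of the original space.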
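As mentioned above, the compression level passed to ZSTD_compress can be adjusted. The default level is ZSTD_CLEVEL_DEFAULT = 3 and the maximum is ZSTD_MAX_CLEVEL = 22; a level of 0 selects the default, and negative levels (down to ZSTD_minCLevel()) give up ratio for extra speed. zstd also defines compression strategies, for example ZSTD_fast, ZSTD_greedy, ZSTD_lazy, ZSTD_lazy2, and ZSTD_btlazy2; these are separate from the level and are set through the advanced API parameter ZSTD_c_strategy. In general, the higher the level, the better the compression ratio and the slower the compression.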
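2. Exploring zstd compression and decompression performance
The previous section covered the basic compression and decompression calls; this one looks at how fast they are.
The test compresses the same text repeatedly with ZSTD_compress for 10 seconds and then reports the average number of compressions per second. The code for the compression test is: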
#include <stdio.h>
#include <stdlib.h>     // malloc, free
#include <string.h>     // strlen
#include <sys/time.h>   // gettimeofday
#include <zstd.h>
#include <iostream>
using namespace std;

int main()
{
    int cnt = 0;
    size_t com_size;
    size_t com_space_size;
    size_t peppa_pig_text_size;
    char *com_ptr = NULL;
char peppa_pig_buf[2048] = "Narrator: It is raining today. So, Peppa and George cannot play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles.Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy.Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, George. There's a really big puddle.Narrator: George wants to jump into the big puddle first.Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles.Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play in the garden.Narrator: Peppa and George are wearing their boots. Mummy and Daddy are wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: It's only mud.";
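    timeval st, et;
    double elapsed = 0.0;
    peppa_pig_text_size = strlen(peppa_pig_buf);
    com_space_size = ZSTD_compressBound(peppa_pig_text_size);
    gettimeofday(&st, NULL);
    while (1) {
        // malloc/free stay inside the loop, so their cost is part of the measurement
        com_ptr = (char *)malloc(com_space_size);
        if (NULL == com_ptr) {
            cout << "compress malloc failed" << endl;
            return -1;
        }
        // Level 1; the level argument is an int, not a ZSTD_strategy value such as ZSTD_fast
        com_size = ZSTD_compress(com_ptr, com_space_size, peppa_pig_buf, peppa_pig_text_size, 1);
        free(com_ptr);
        if (ZSTD_isError(com_size)) {
            cout << "compress error: " << ZSTD_getErrorName(com_size) << endl;
            return -1;
        }
        cnt++;
        gettimeofday(&et, NULL);
        // Elapsed time with microsecond resolution rather than whole seconds
        elapsed = (et.tv_sec - st.tv_sec) + (et.tv_usec - st.tv_usec) / 1e6;
        if (elapsed >= 10.0) {
            break;
        }
    }
    cout << "compress per second:" << (long)(cnt / elapsed) << " times" << endl;
    return 0;
}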
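Result:
The result shows zstd compressing this text about 60,000 to 70,000 times per second (at 1827 bytes per call, roughly 110 to 130 MB/s), which is not particularly impressive. Note that compression performance also depends on the length and content of the text being compressed.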
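For measurements like this, a monotonic clock is a common choice, because wall-clock time from gettimeofday can jump if the system time is adjusted during the run. The sketch below shows one way to package the timing loop using std::chrono::steady_clock; `calls_per_second` is a hypothetical helper name, and it was not used to produce the figures in this article.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: call `op` repeatedly for roughly `duration` (which
// should be positive) and return the average number of calls per second.
template <typename Op>
double calls_per_second(Op op, std::chrono::milliseconds duration) {
    using clock = std::chrono::steady_clock;  // monotonic, unaffected by clock changes
    const auto start = clock::now();
    std::uint64_t calls = 0;
    std::chrono::duration<double> elapsed{0.0};
    do {
        op();
        ++calls;
        elapsed = clock::now() - start;  // elapsed seconds as a double
    } while (elapsed < duration);
    // Divide by the time actually spent, not by the requested duration.
    return static_cast<double>(calls) / elapsed.count();
}
```

Passing a lambda that wraps the ZSTD_compress call together with `std::chrono::seconds(10)` would correspond to the 10-second test above.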
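Next, decompression performance. The method is similar: compress the text once, then decompress the same compressed data repeatedly for 10 seconds and report the average number of decompressions per second. The code for the decompression test is: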
#include <stdio.h>
#include <stdlib.h>     // malloc, free
#include <string.h>     // strlen
#include <sys/time.h>   // gettimeofday
#include <zstd.h>
#include <iostream>
using namespace std;

int main()
{
    int cnt = 0;
    size_t com_size;
    size_t com_space_size;
    size_t peppa_pig_text_size;
    timeval st, et;
    char *com_ptr = NULL;
char peppa_pig_buf[2048] = "Narrator: It is raining today. So, Peppa and George cannot play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles.Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy.Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, George. There's a really big puddle.Narrator: George wants to jump into the big puddle first.Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles.Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play in the garden.Narrator: Peppa and George are wearing their boots. Mummy and Daddy are wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: It's only mud.";
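    size_t decom_size;
    char *decom_ptr = NULL;
    unsigned long long decom_buf_size;
    double elapsed = 0.0;
    peppa_pig_text_size = strlen(peppa_pig_buf);
    com_space_size = ZSTD_compressBound(peppa_pig_text_size);
    com_ptr = (char *)malloc(com_space_size);
    if (NULL == com_ptr) {
        cout << "compress malloc failed" << endl;
        return -1;
    }
    com_size = ZSTD_compress(com_ptr, com_space_size, peppa_pig_buf, peppa_pig_text_size, 1);
    if (ZSTD_isError(com_size)) {
        cout << "compress error: " << ZSTD_getErrorName(com_size) << endl;
        free(com_ptr);
        return -1;
    }
    // Also rejects ZSTD_CONTENTSIZE_UNKNOWN and ZSTD_CONTENTSIZE_ERROR
    decom_buf_size = ZSTD_getFrameContentSize(com_ptr, com_size);
    if (decom_buf_size != peppa_pig_text_size) {
        cout << "unexpected frame content size" << endl;
        free(com_ptr);
        return -1;
    }
    gettimeofday(&st, NULL);
    while (1) {
        // malloc/free stay inside the loop, so their cost is part of the measurement
        decom_ptr = (char *)malloc((size_t)decom_buf_size);
        if (NULL == decom_ptr) {
            cout << "decompress malloc failed" << endl;
            break;
        }
        decom_size = ZSTD_decompress(decom_ptr, (size_t)decom_buf_size, com_ptr, com_size);
        free(decom_ptr);
        // An error code is a very large size_t, so this check catches it too
        if (decom_size != peppa_pig_text_size) {
            cout << "decompress error" << endl;
            break;
        }
        cnt++;
        gettimeofday(&et, NULL);
        // Elapsed time with microsecond resolution rather than whole seconds
        elapsed = (et.tv_sec - st.tv_sec) + (et.tv_usec - st.tv_usec) / 1e6;
        if (elapsed >= 10.0) {
            break;
        }
    }
    if (elapsed > 0.0) {
        cout << "decompress per second:" << (long)(cnt / elapsed) << " times" << endl;
    }
    free(com_ptr);
    return 0;
}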
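Result:
The result shows zstd decompressing about 120,000 times per second (roughly 220 MB/s of decompressed text), so decompression is faster than compression.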
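3. Advanced zstd usage
zstd ships a compression and decompression tool named PZSTD. PZSTD (parallel zstd) is a command-line tool that splits the input into chunks and compresses them in parallel on multiple threads.
Newer versions of zstd (v1.4.0 and later) also expose API functions for multithreaded compression, which is the advanced API usage this section introduces. Let's look at how to use zstd's multithreaded compression.
Multithreaded compression relies on two key API functions: one sets parameters and the other compresses.
The prototype of the parameter-setting function is:
size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, int value);
The prototype of the compression function is:
size_t ZSTD_compress2(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize);
The demo below compresses in parallel: ZSTD_CCtx_setParameter sets the number of worker threads to 3 through the ZSTD_c_nbWorkers parameter, and ZSTD_compress2 compresses the text. To make the multithreading observable, the demo first reads a very large file as the input, so that zstd runs for a relatively long time.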
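#include <stdio.h>
#include <stdlib.h>   // malloc, free
#include <zstd.h>
#include <iostream>
using namespace std;

int main()
{
    size_t com_size;
    size_t com_space_size;
    FILE *fp = NULL;
    long file_len;
    char *com_ptr = NULL;
    char *file_text_ptr = NULL;

    fp = fopen("xxxxxx", "rb");   // binary mode, so the byte count matches ftell()
    if (NULL == fp) {
        cout << "file open failed" << endl;
        return -1;
    }
    fseek(fp, 0, SEEK_END);
    file_len = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    if (file_len <= 0) {
        cout << "file is empty or its length is unavailable" << endl;
        fclose(fp);
        return -1;
    }
    cout << "file length:" << file_len << endl;

    // malloc space for file content
    file_text_ptr = (char *)malloc((size_t)file_len);
    if (NULL == file_text_ptr) {
        cout << "malloc failed" << endl;
        fclose(fp);
        return -1;
    }
    // malloc space for compress space
    com_space_size = ZSTD_compressBound((size_t)file_len);
    com_ptr = (char *)malloc(com_space_size);
    if (NULL == com_ptr) {
        cout << "malloc failed" << endl;
        free(file_text_ptr);
        fclose(fp);
        return -1;
    }
    // read text from source file
    size_t read_len = fread(file_text_ptr, 1, (size_t)file_len, fp);
    fclose(fp);
    if (read_len != (size_t)file_len) {
        cout << "file read failed" << endl;
        free(com_ptr);
        free(file_text_ptr);
        return -1;
    }

    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    if (NULL == cctx) {
        cout << "create cctx failed" << endl;
        free(com_ptr);
        free(file_text_ptr);
        return -1;
    }
    // set multi-thread parameter; an error is returned if the library
    // was built without multithreading support
    size_t ret = ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 3);
    if (ZSTD_isError(ret)) {
        cout << "nbWorkers not applied: " << ZSTD_getErrorName(ret) << endl;
    }
    // ZSTD_c_compressionLevel takes an integer level; a strategy such as
    // ZSTD_btlazy2 is selected through ZSTD_c_strategy instead
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 6);
    com_size = ZSTD_compress2(cctx, com_ptr, com_space_size, file_text_ptr, (size_t)file_len);
    if (ZSTD_isError(com_size)) {
        cout << "compress failed: " << ZSTD_getErrorName(com_size) << endl;
    } else {
        cout << "compressed length:" << com_size << endl;
    }
    ZSTD_freeCCtx(cctx);
    free(com_ptr);
    free(file_text_ptr);
    return 0;
}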
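Running the demo shows that zstd does start 3 threads to compress the text in parallel. Configuring more threads shortened the compression time in this test; the details are not shown here, and readers can experiment for themselves.
Note that, at the time of writing, zstd builds a single-threaded library by default. To call the multithreaded API, the library has to be built with the ZSTD_MULTITHREAD macro defined when running make.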
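zstd also supports a thread pool. The prototype for creating one is:
POOL_ctx* ZSTD_createThreadPool(size_t numThreads);
(In zstd.h the return type is spelled ZSTD_threadPool*, and the declaration sits in the experimental section that requires ZSTD_STATIC_LINKING_ONLY.)
When many compressions run back to back, a thread pool avoids the unnecessary overhead of repeatedly creating and destroying threads, so that CPU time goes mainly into compressing the text.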
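4. Summary
This article covered the basic use of zstd compression and decompression, took a first measurement of compression and decompression performance, and finally explored zstd's multithreaded compression.
In the compression test, zstd's compression ratio was already quite good: the compressed text took less than half the space of the original. The ratio does, of course, depend on the content being compressed.
In the performance tests, zstd's compression and decompression speed was only passable. In my view zstd leans toward the bear's paw (compression ratio) over the fish (performance), but it suits scenarios that need a high compression ratio and are not too demanding on speed.
For large texts that must be compressed many times in succession, multithreaded compression combined with a thread pool can noticeably improve compression throughput.
Original link: https://www.cnblogs.com/t-bar/p/15956868.html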