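Data backup: using the reed-solomon library
1. What is Reed-Solomon?
If part of a disk is damaged, or part of an optical disc is smudged, is the information on it lost?
Not necessarily. There is an error-correcting algorithm that can still recover the data even when a visible portion of it is missing: the Reed-Solomon algorithm. It does require redundant data to be computed in advance; as long as the amount of lost data stays within a certain limit, the original file can be recovered.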
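2. How Reed-Solomon encoding works
Treat the input data as a vector D = (D1, D2, ..., Dn) and the encoded data as the vector (D1, D2, ..., Dn, C1, C2, ..., Cm). RS encoding can then be viewed as a matrix operation: B * D = (D1, D2, ..., Dn, C1, C2, ..., Cm), where B is the (n+m) x n encoding matrix.
Note: the encoding matrix B must have the property that any n of its rows form an invertible n x n matrix.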
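The matrix operation described above can be written out as follows. This is a sketch under the assumption that B is in systematic form: an n x n identity block on top, so the data passes through unchanged, followed by m parity rows. The entries b_{ij} are placeholder symbols, not values from the library, and all arithmetic is done in a finite field rather than with ordinary integers.

```latex
\underbrace{\begin{pmatrix}
1      & 0      & \cdots & 0      \\
0      & 1      & \cdots & 0      \\
\vdots & \vdots & \ddots & \vdots \\
0      & 0      & \cdots & 1      \\
b_{11} & b_{12} & \cdots & b_{1n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{m1} & b_{m2} & \cdots & b_{mn}
\end{pmatrix}}_{B}
\begin{pmatrix} D_1 \\ D_2 \\ \vdots \\ D_n \end{pmatrix}
=
\begin{pmatrix} D_1 \\ \vdots \\ D_n \\ C_1 \\ \vdots \\ C_m \end{pmatrix},
\qquad
C_i = \sum_{j=1}^{n} b_{ij} D_j .
```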
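3. How Reed-Solomon decoding works
RS can tolerate the loss of at most m blocks in total, counting both data blocks and parity blocks. Recovery works as follows:
(1) Suppose D1, D4, and C2 are lost. Delete the rows of the encoding matrix that correspond to the lost data/parity blocks. The remaining rows form a matrix B' (n x n when exactly m blocks are lost), and B' * D = S, where S is the vector of surviving blocks.
(2) Since B' is invertible, denote its inverse by (B'^-1), so that (B'^-1) * B' = I, the identity matrix. Left-multiply both sides of B' * D = S by (B'^-1).
(3) This yields the formula for the original data: D = (B'^-1) * S.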
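The three recovery steps can be tried out with a small standalone program. This is only a sketch under stated assumptions, not how the library is implemented: it works modulo the prime 257 instead of in GF(2^8), it builds the parity rows from a Cauchy matrix so that any n rows are invertible, and the names inv, encodingMatrix, mulVec, invert, and recoverData, as well as the sample data values, are made up for illustration. It uses n = 4 and m = 3 and loses D1, D4, and C2, as in the example above.

```go
package main

import "fmt"

// p is a small prime modulus (an assumption of this sketch; real libraries use GF(2^8)).
const p = 257

// inv returns the inverse of a modulo p, computed as a^(p-2); a must be in 1..p-1.
func inv(a int) int {
	r := 1
	for i := 0; i < p-2; i++ {
		r = r * a % p
	}
	return r
}

// encodingMatrix builds the (n+m) x n matrix B: identity rows on top and
// Cauchy rows 1/(r-c) below. It requires n+m <= p.
func encodingMatrix(n, m int) [][]int {
	b := make([][]int, n+m)
	for r := range b {
		b[r] = make([]int, n)
		for c := range b[r] {
			if r >= n {
				b[r][c] = inv(r - c) // parity row: Cauchy entry
			} else if r == c {
				b[r][c] = 1 // data row: identity entry
			}
		}
	}
	return b
}

// mulVec returns rows * v modulo p.
func mulVec(rows [][]int, v []int) []int {
	out := make([]int, len(rows))
	for r, row := range rows {
		for c, x := range row {
			out[r] = (out[r] + x*v[c]) % p
		}
	}
	return out
}

// invert returns the inverse of a modulo p using Gauss-Jordan elimination.
// It assumes a is invertible and panics (index out of range) otherwise.
func invert(a [][]int) [][]int {
	n := len(a)
	w := make([][]int, n) // augmented matrix [a | I]
	for i := range w {
		w[i] = make([]int, 2*n)
		copy(w[i], a[i])
		w[i][n+i] = 1
	}
	for col := 0; col < n; col++ {
		piv := col
		for w[piv][col] == 0 { // find a row with a non-zero pivot
			piv++
		}
		w[col], w[piv] = w[piv], w[col]
		s := inv(w[col][col])
		for j := range w[col] { // scale the pivot row so the pivot becomes 1
			w[col][j] = w[col][j] * s % p
		}
		for r := range w { // clear this column in every other row
			if f := w[r][col]; r != col && f != 0 {
				for j := range w[r] {
					w[r][j] = ((w[r][j]-f*w[col][j])%p + p) % p
				}
			}
		}
	}
	for i := range w {
		w[i] = w[i][n:] // the right half now holds the inverse
	}
	return w
}

// recoverData computes D = B'^-1 * S. idx lists the surviving row indices of b
// (exactly n of them) and vals holds the surviving symbols in the same order.
func recoverData(b [][]int, idx, vals []int) []int {
	sub := make([][]int, len(idx))
	for i, k := range idx {
		sub[i] = b[k] // step (1): keep only the rows of the surviving blocks
	}
	return mulVec(invert(sub), vals) // steps (2) and (3)
}

func main() {
	d := []int{82, 83, 33, 7} // D1..D4, arbitrary sample values below p
	b := encodingMatrix(4, 3)
	coded := mulVec(b, d) // (D1..D4, C1..C3)
	// D1, D4 and C2 (indices 0, 3, 5) are lost; D2, D3, C1, C3 survive.
	idx := []int{1, 2, 4, 6}
	vals := []int{coded[1], coded[2], coded[4], coded[6]}
	fmt.Println(recoverData(b, idx, vals)) // prints [82 83 33 7]
}
```

A prime modulus keeps the arithmetic readable, at the cost that an encoded symbol can take the value 256 and so does not fit in a byte; that is one reason byte-oriented libraries use GF(2^8) instead.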
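4. Code implementation and testing
GitHub repository: "github.com/klauspost/reedsolomon"
1. reed-solomon encoding test
Prepare an input file named in.txt, then run go run main.go.
The program writes 30 shard files into the out directory.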
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"

	"github.com/klauspost/reedsolomon"
)

const (
	dataShards = 10      // number of data shards
	parShards  = 20      // number of parity shards
	outDir     = "./out" // directory that receives the shard files
)

func main() {
	fname := "in.txt"

	// 1. Create the encoding matrix.
	enc, err := reedsolomon.NewStream(dataShards, parShards)
	checkErr2(err)

	fmt.Println("Opening", fname)
	f, err := os.Open(fname)
	checkErr2(err)
	defer f.Close()
	instat, err := f.Stat()
	checkErr2(err)

	// 2. Create the 30 output shard files.
	out := make([]*os.File, dataShards+parShards)
	file := filepath.Base(fname)
	checkErr2(os.MkdirAll(outDir, 0o755)) // os.Create fails if the directory is missing
	for i := range out {
		outfn := fmt.Sprintf("%s.%d", file, i)
		fmt.Println("Creating", outfn)
		out[i], err = os.Create(filepath.Join(outDir, outfn))
		checkErr2(err)
	}

	// 3. Split the original file across the data shards.
	data := make([]io.Writer, dataShards)
	for i := range data {
		data[i] = out[i]
	}
	checkErr2(enc.Split(f, data, instat.Size()))

	// Close the data shards and re-open them for reading.
	input := make([]io.Reader, dataShards)
	for i := range data {
		checkErr2(out[i].Close())
		in, err := os.Open(out[i].Name())
		checkErr2(err)
		input[i] = in
		defer in.Close()
	}

	// 4. Collect the writers for the parity shards.
	parity := make([]io.Writer, parShards)
	for i := range parity {
		parity[i] = out[dataShards+i]
		defer out[dataShards+i].Close()
	}

	// 5. Encode: compute the parity shards from the data shards.
	checkErr2(enc.Encode(input, parity))
	fmt.Printf("File split into %d data + %d parity shards.\n", dataShards, parShards)
}

// checkErr2 prints the error and exits if err is non-nil.
func checkErr2(err error) {
	if err != nil {
		fmt.Fprintln(os.Stderr, "Error:", err)
		os.Exit(2)
	}
}
$ go run main.go
Opening in.txt
Creating in.txt.0
Creating in.txt.1
Creating in.txt.2
Creating in.txt.3
Creating in.txt.4
Creating in.txt.5
Creating in.txt.6
Creating in.txt.7
Creating in.txt.8
Creating in.txt.9
Creating in.txt.10
Creating in.txt.11
Creating in.txt.12
Creating in.txt.13
Creating in.txt.14
Creating in.txt.15
Creating in.txt.16
Creating in.txt.17
Creating in.txt.18
Creating in.txt.19
Creating in.txt.20
Creating in.txt.21
Creating in.txt.22
Creating in.txt.23
Creating in.txt.24
Creating in.txt.25
Creating in.txt.26
Creating in.txt.27
Creating in.txt.28
Creating in.txt.29
File split into 10 data + 20 parity shards.
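To put numbers on what a 10 data + 20 parity layout costs, here is a minimal sketch of the storage arithmetic. It assumes the file is divided evenly across the data shards with the tail zero-padded, and that every parity shard has the same size as a data shard. The helper names shardSize and plan and the 1005-byte file size are hypothetical, chosen only for illustration.

```go
package main

import "fmt"

// shardSize returns the per-shard size when fileSize bytes are divided evenly
// across dataShards shards, rounding up (assumption: the tail is zero-padded).
func shardSize(fileSize, dataShards int64) int64 {
	return (fileSize + dataShards - 1) / dataShards
}

// plan returns the total bytes stored across all shards and the number of
// padding bytes added to the data shards.
func plan(fileSize, dataShards, parShards int64) (total, padding int64) {
	per := shardSize(fileSize, dataShards)
	return per * (dataShards + parShards), per*dataShards - fileSize
}

func main() {
	// Hypothetical 1005-byte input with the 10 + 20 layout used in this post.
	const fileSize, data, parity = 1005, 10, 20
	total, padding := plan(fileSize, data, parity)
	fmt.Println("bytes per shard:", shardSize(fileSize, data)) // 101
	fmt.Println("total bytes stored:", total)                  // 3030
	fmt.Println("padding bytes:", padding)                     // 5
	fmt.Println("shards that may be lost:", parity)            // 20
}
```

With 20 parity shards for 10 data shards, the stored size is roughly three times the input, in exchange for surviving the loss of any 20 of the 30 shards.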
2. reed-solomon recovery test
Building on the previous step, delete 20 of the files in the out directory (shards 5-24), leaving 10, then run go run recover_main.go.
package main

import (
	"fmt"
	"io"
	"os"

	"github.com/klauspost/reedsolomon"
)

const (
	dataShards = 10         // number of data shards
	parShards  = 20         // number of parity shards
	outFile    = "out2.txt" // recovered file; if empty, fname is used instead
)

func main() {
	fname := "out/in.txt" // shards are read from fname.0 ... fname.29

	// 1. Create the matrix.
	enc, err := reedsolomon.NewStream(dataShards, parShards)
	checkErr(err)

	// 2. Open the inputs.
	shards, size, err := openInput(dataShards, parShards, fname)
	checkErr(err)

	// 3. Verify the shards.
	ok, err := enc.Verify(shards)
	if ok {
		fmt.Println("No reconstruction needed")
	} else {
		fmt.Println("Verification failed. Reconstructing data")
		shards, size, err = openInput(dataShards, parShards, fname)
		checkErr(err)

		// 3.1 Re-create the deleted files.
		out := make([]io.Writer, len(shards))
		for i := range out {
			if shards[i] == nil {
				outfn := fmt.Sprintf("%s.%d", fname, i)
				fmt.Println("Creating", outfn)
				out[i], err = os.Create(outfn)
				checkErr(err)
			}
		}

		// 3.2 Rebuild the missing shards so that all 30 exist again.
		err = enc.Reconstruct(shards, out)
		if err != nil {
			fmt.Println("Reconstruct failed -", err)
			os.Exit(1)
		}

		// Close output.
		for i := range out {
			if out[i] != nil {
				checkErr(out[i].(*os.File).Close())
			}
		}

		// Re-open everything and verify again.
		shards, size, err = openInput(dataShards, parShards, fname)
		checkErr(err)
		ok, err = enc.Verify(shards)
		if !ok {
			fmt.Println("Verification failed after reconstruction, data likely corrupted:", err)
			os.Exit(1)
		}
		checkErr(err)
	}

	// 4. Join the shards and write them.
	outfn := outFile
	if outfn == "" {
		outfn = fname
	}
	fmt.Println("Writing data to", outfn)
	f, err := os.Create(outfn)
	checkErr(err)

	shards, size, err = openInput(dataShards, parShards, fname)
	checkErr(err)

	// Join restores the original file, but we don't know the exact file size,
	// so dataShards*size bytes are written and the output may end in zero padding.
	err = enc.Join(f, shards, int64(dataShards)*size)
	checkErr(err)
	checkErr(f.Close())
}

// openInput opens every shard file fname.i. Missing or empty shards are left
// as nil; size is the size of a non-empty shard.
func openInput(dataShards, parShards int, fname string) (r []io.Reader, size int64, err error) {
	// Create shards and load the data.
	shards := make([]io.Reader, dataShards+parShards)
	for i := range shards {
		infn := fmt.Sprintf("%s.%d", fname, i)
		fmt.Println("Opening", infn)
		f, err := os.Open(infn)
		if err != nil {
			fmt.Println("Error reading file", err)
			shards[i] = nil
			continue
		}
		shards[i] = f
		stat, err := f.Stat()
		checkErr(err)
		if stat.Size() > 0 {
			size = stat.Size()
		} else {
			shards[i] = nil
		}
	}
	return shards, size, nil
}

// checkErr prints the error and exits if err is non-nil.
func checkErr(err error) {
	if err != nil {
		fmt.Fprintln(os.Stderr, "Error:", err)
		os.Exit(2)
	}
}
With shards 5-24 deleted (the files recreated in the log below), the run looks like this:
$ go run recover_main.go
Verification failed. Reconstructing data
Opening out/in.txt.0
Opening out/in.txt.1
Opening out/in.txt.2
Opening out/in.txt.3
Opening out/in.txt.4
Opening out/in.txt.5
Opening out/in.txt.25
Opening out/in.txt.26
Opening out/in.txt.27
Opening out/in.txt.28
Opening out/in.txt.29
Creating out/in.txt.5
Creating out/in.txt.6
Creating out/in.txt.7
Creating out/in.txt.8
Creating out/in.txt.9
Creating out/in.txt.10
Creating out/in.txt.11
Creating out/in.txt.12
Creating out/in.txt.13
Creating out/in.txt.14
Creating out/in.txt.15
Creating out/in.txt.16
Creating out/in.txt.17
Creating out/in.txt.18
Creating out/in.txt.19
Creating out/in.txt.20
Creating out/in.txt.21
Creating out/in.txt.22
Creating out/in.txt.23
Creating out/in.txt.24
reconstruct ...
Writing data to out2.txt
In the end, the log shows that the missing shards (5-24) have been recreated, and the original file has been restored as out2.txt.
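Because the recovery program calls Join with dataShards*size rather than the exact file size, out2.txt may be slightly longer than in.txt, with zero bytes at the end. A plain byte-for-byte comparison could then report a mismatch even though the recovery succeeded. The following is a minimal sketch of a check that allows for this; the function name sameUpToPadding is made up for illustration, and the sketch assumes the padding consists only of trailing zero bytes.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// sameUpToPadding reports whether recovered starts with orig and every extra
// trailing byte is zero (the padding assumed in this sketch).
func sameUpToPadding(orig, recovered []byte) bool {
	if !bytes.HasPrefix(recovered, orig) {
		return false
	}
	for _, b := range recovered[len(orig):] {
		if b != 0 {
			return false
		}
	}
	return true
}

func main() {
	orig, errOrig := os.ReadFile("in.txt")
	rec, errRec := os.ReadFile("out2.txt")
	if errOrig != nil || errRec != nil {
		fmt.Println("could not read files:", errOrig, errRec)
		return
	}
	fmt.Println("recovered matches original:", sameUpToPadding(orig, rec))
}
```

If the exact length matters, one option is to record the original file size next to the shards at encoding time and pass that value to Join instead.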
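Reed-Solomon codes are a particular class of error-correcting codes. With redundancy computed in advance, the original file can be restored by running the recovery procedure, as long as the amount of lost data stays within the tolerated range. Some large distributed file storage systems use them to keep files highly available. The library is also very easy to use, so try it out yourself. I hope you enjoy it!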
