btrfs usage guide, part 1: concepts, creation, block device management, performance tuning

Posted by 德哥 on 2015-12-08
1. btrfs concepts
btrfs stores three types of data: data, metadata and system. They mean:
       DATA
           store data blocks and nothing else.

       METADATA
           store internal metadata in b-trees, can store file data if they fit into the inline limit.
       This is btrfs's internal metadata kept in b-trees, for example file inode information, file size and modification time.

       SYSTEM
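           store structures that describe the mapping between the physical devices and the linear logical space representing the filesystem.
       This is the mapping between the block devices and the filesystem's linear logical address space, similar to an address translation table. It also records the RAID relationship (profile).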

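block group and chunk: the two terms can be used interchangeably. They mean:
           a logical range of space of a given profile, stores data, metadata or both; sometimes the terms are used interchangeably.
       A block group (or chunk) is a logical range of space for one of the types above (data, metadata, system), and is the smallest unit of space allocated at one time (presumably to keep data reasonably contiguous).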

           A typical size of metadata block group is 256MiB (filesystem smaller than 50GiB) and 1GiB (larger than 50GiB), for data it’s 1GiB. The system block group size is a few megabytes.
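       For example, metadata is allocated 256MiB at a time when the filesystem is smaller than 50GiB, or 1GiB at a time when it is larger than 50GiB.
       Data is allocated 1GiB at a time.
       System block groups are only a few megabytes each.
       You can observe these allocations with btrfs filesystem show (per device) and btrfs filesystem df (per type).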

       RAID
           a block group profile type that utilizes RAID-like features on multiple devices: striping, mirroring, parity
       RAID describes a kind of profile: striping (raid0, raid10), mirroring (raid1, raid10) and parity (raid5, raid6).

       profile
           when used in connection with block groups refers to the allocation strategy and constraints, see the section PROFILES for more details
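       A profile, applied to block groups, describes how data is allocated and under what constraints. For example:
       single keeps one copy of the data, so each block group is unique.
       DUP keeps two copies on one block device, so each block group has an identical copy on the same block device.
       RAID0 is striping: a single block group may span several block devices.
       RAID10 is mirroring plus striping: a single block group may span several block devices, and each part is mirrored on two block devices.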

PROFILES
       There are the following block group types available:

       ┌────────┬─────────────────────┬────────────┬─────────────────┐
       │Profile │ Redundancy          │ Striping   │ Min/max devices │
       ├────────┼─────────────────────┼────────────┼─────────────────┤
       │        │                     │            │                 │
       │single  │ 1 copy              │ n/a        │ 1/any           │
       ├────────┼─────────────────────┼────────────┼─────────────────┤
       │        │                     │            │                 │
       │DUP     │ 2 copies / 1 device │ n/a        │ 1/1             │
       ├────────┼─────────────────────┼────────────┼─────────────────┤
       │        │                     │            │                 │
       │RAID0   │ n/a                 │ 1 to N     │ 2/any           │
       ├────────┼─────────────────────┼────────────┼─────────────────┤
       │        │                     │            │                 │
       │RAID1   │ 2 copies            │ n/a        │ 2/any           │
       ├────────┼─────────────────────┼────────────┼─────────────────┤
       │        │                     │            │                 │
       │RAID10  │ 2 copies            │ 1 to N     │ 4/any           │
       ├────────┼─────────────────────┼────────────┼─────────────────┤
       │        │                     │            │                 │
       │RAID5   │ 2 copies            │ 3 to N - 1 │ 2/any           │
       ├────────┼─────────────────────┼────────────┼─────────────────┤
       │        │                     │            │                 │
       │RAID6   │ 3 copies            │ 3 to N - 2 │ 3/any           │
       └────────┴─────────────────────┴────────────┴─────────────────┘
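
The minimum device counts in the table matter later, when devices are removed. As a rough sketch (the function names min_devices and can_remove_one are made up for illustration and are not part of btrfs-progs), the lookup could be written like this. The raid1 value of 2 is taken from the device-removal section of this article; the other values come from the table above.

```shell
#!/bin/sh
# Minimum number of devices per profile, per the table above.
# raid1 = 2 is taken from the device-removal section of this article.
min_devices() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        single|dup) echo 1 ;;
        raid0|raid1|raid5) echo 2 ;;
        raid6) echo 3 ;;
        raid10) echo 4 ;;
        *) echo "unknown profile: $1" >&2; return 1 ;;
    esac
}

# Succeeds when removing one device still leaves enough for the profile.
# $1 = profile, $2 = current number of devices
can_remove_one() {
    min=$(min_devices "$1") || return 2
    [ $(( $2 - 1 )) -ge "$min" ]
}

# Example: with 4 devices, raid1 allows a removal but raid10 does not.
# can_remove_one raid1 4  && echo "ok to remove"
# can_remove_one raid10 4 || echo "would go below the minimum"
```

This only mirrors the documented minimums. The kernel makes the real decision when btrfs device delete runs.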
2. Creating a btrfs filesystem
man mkfs.btrfs
       -d|--data <profile>
           Specify the profile for the data block groups. Valid values are raid0, raid1, raid5, raid6, raid10 or single, (case does not matter).
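       Sets the profile for data block groups. Choose it according to the underlying block devices: if they provide no redundancy of their own, a redundant profile is recommended here; otherwise a single copy (single) is enough.
       With multiple block devices you can also choose whether to stripe. Striping spreads the load across the devices.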
       -m|--metadata <profile>
           Specify the profile for the metadata block groups. Valid values are raid0, raid1, raid5, raid6, raid10, single or dup, (case does not matter).

           A single device filesystem will default to DUP, unless a SSD is detected. Then it will default to single. The detection is based on the value of /sys/block/DEV/queue/rotational, where DEV is the short name of the device.
           This is because SSDs can remap the blocks internally to a single copy thus deduplicating them which negates the purpose of increased metadata redundancy and just wastes space.

           Note that the rotational status can be arbitrarily set by the underlying block device driver and may not reflect the true status (network block device, memory-backed SCSI devices etc). Use the options --data/--metadata
           to avoid confusion.
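       Sets the profile for metadata block groups. Choose it according to the underlying block devices: if they provide no redundancy of their own, a redundant profile is recommended here; otherwise a single copy (single) is enough.
       With multiple block devices you can also choose whether to stripe. Striping spreads the load across the devices.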
       -n|--nodesize <size>
           Specify the nodesize, the tree block size in which btrfs stores metadata. The default value is 16KiB (16384) or the page size, whichever is bigger. Must be a multiple of the sectorsize, but not larger than 64KiB (65536).
           Leafsize always equals nodesize and the options are aliases.

           Smaller node size increases fragmentation but lead to higher b-trees which in turn leads to lower locking contention. Higher node sizes give better packing and less fragmentation at the cost of more expensive memory
           operations while updating the metadata blocks.

               Note
               versions up to 3.11 set the nodesize to 4k.
       For database workloads, a 4K nodesize is suggested to reduce lock contention.
       -f|--force
           Forcibly overwrite the block devices when an existing filesystem is detected. By default, mkfs.btrfs will utilize libblkid to check for any known filesystem on the devices. Alternatively you can use the wipefs utility to
           clear the devices.

With multiple block devices, you can pass all of them to mkfs.btrfs at once.
You can also give metadata and data different profiles.
For example:
[root@digoal ~]# mkfs.btrfs -m raid10 -d raid10 -n 4096 -f /dev/sdb /dev/sdc /dev/sdd /dev/sde
btrfs-progs v4.3.1
See http://btrfs.wiki.kernel.org for more information.

Label:              (null)
UUID:               00036b8e-7914-41a9-831a-d35c97202eeb
Node size:          4096
Sector size:        4096
Filesystem size:    80.00GiB
Block group profiles:  (shows how much space was allocated to block groups of each of the three types)
  Data:             RAID10            2.01GiB
  Metadata:         RAID10            2.01GiB
  System:           RAID10           20.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  4
Devices:
   ID        SIZE  PATH
    1    20.00GiB  /dev/sdb
    2    20.00GiB  /dev/sdc
    3    20.00GiB  /dev/sdd
    4    20.00GiB  /dev/sde
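In the next example metadata uses raid1, without striping, while data uses raid10, with striping. Note that system follows metadata and uses raid1 as well.
Even so, it is advisable to give metadata and data the same profile.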
[root@digoal ~]# mkfs.btrfs -m raid1 -d raid10 -n 4096 -f /dev/sdb /dev/sdc /dev/sdd /dev/sde
btrfs-progs v4.3.1
See http://btrfs.wiki.kernel.org for more information.

Label:              (null)
UUID:               4eef7b0c-73a3-430c-bb61-028b37d1872b
Node size:          4096
Sector size:        4096
Filesystem size:    80.00GiB
Block group profiles:
  Data:             RAID10            2.01GiB
  Metadata:         RAID1             1.01GiB
  System:           RAID1            12.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  4
Devices:
   ID        SIZE  PATH
    1    20.00GiB  /dev/sdb
    2    20.00GiB  /dev/sdc
    3    20.00GiB  /dev/sdd
    4    20.00GiB  /dev/sde

[root@digoal ~]# btrfs filesystem show /dev/sdb
Label: none  uuid: 4eef7b0c-73a3-430c-bb61-028b37d1872b
        Total devices 4 FS bytes used 28.00KiB
        devid    1 size 20.00GiB used 2.00GiB path /dev/sdb
        devid    2 size 20.00GiB used 2.00GiB path /dev/sdc
        devid    3 size 20.00GiB used 1.01GiB path /dev/sdd
        devid    4 size 20.00GiB used 1.01GiB path /dev/sde

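3. Mounting a btrfs filesystem
If btrfs manages multiple block devices, there are two ways to mount it: name every block device explicitly, or scan first and then mount. The scan is needed because on some systems the devices must be registered again after a reboot or after the btrfs module is reloaded.
For example: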
[root@digoal ~]# btrfs device scan
Scanning for Btrfs filesystems
[root@digoal ~]# mount /dev/sdb /data01
[root@digoal ~]# btrfs filesystem show /data01
Label: none  uuid: 00036b8e-7914-41a9-831a-d35c97202eeb
        Total devices 4 FS bytes used 1.03MiB
        devid    1 size 20.00GiB used 2.01GiB path /dev/sdb
        devid    2 size 20.00GiB used 2.01GiB path /dev/sdc
        devid    3 size 20.00GiB used 2.01GiB path /dev/sdd
        devid    4 size 20.00GiB used 2.01GiB path /dev/sde
Or
[root@digoal ~]# mount -o device=/dev/sdb,device=/dev/sdc,device=/dev/sdd,device=/dev/sde /dev/sdb /data01
[root@digoal ~]# btrfs filesystem show /data01
Label: none  uuid: 00036b8e-7914-41a9-831a-d35c97202eeb
        Total devices 4 FS bytes used 1.03MiB
        devid    1 size 20.00GiB used 2.01GiB path /dev/sdb
        devid    2 size 20.00GiB used 2.01GiB path /dev/sdc
        devid    3 size 20.00GiB used 2.01GiB path /dev/sdd
        devid    4 size 20.00GiB used 2.01GiB path /dev/sde
Or
# vi /etc/fstab

UUID=00036b8e-7914-41a9-831a-d35c97202eeb /data01 btrfs ssd,ssd_spread,discard,noatime,nodiratime,compress=no,space_cache,recovery,defaults 0 0
Or
UUID=00036b8e-7914-41a9-831a-d35c97202eeb /data01 btrfs device=/dev/sdb,device=/dev/sdc,device=/dev/sdd,device=/dev/sde,ssd,ssd_spread,discard,noatime,nodiratime,compress=no,space_cache,recovery,defaults 0 0
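
The device= list used in the mount command and the fstab entry above is repetitive to type by hand. A small helper can assemble it from a list of devices. This is only a sketch, and the name device_opts is made up for illustration.

```shell
#!/bin/sh
# Build a "device=...,device=..." mount option string from a device list.
device_opts() {
    opts=""
    for dev in "$@"; do
        # Prepend a comma only when opts already holds something.
        opts="${opts:+$opts,}device=$dev"
    done
    printf '%s\n' "$opts"
}

# Example (needs root and a real multi-device btrfs, so it is left commented out):
# mount -o "$(device_opts /dev/sdb /dev/sdc /dev/sdd /dev/sde)" /dev/sdb /data01
```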
4. Recommended mount options
https://btrfs.wiki.kernel.org/index.php/Mount_options
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/btrfs-mount.html
4.1 SSD-related options
discard,ssd,ssd_spread

discard
    Use this option to enable discard/TRIM on freed blocks.
ssd
    Turn on some of the SSD optimized behaviour within btrfs. This is enabled automatically by checking /sys/block/sdX/queue/rotational to be zero. This does not enable discard/TRIM!

ssd_spread
    Mount -o ssd_spread is more strict about finding a large unused region of the disk for new allocations, which tends to fragment the free space more over time. It is often faster on the less expensive SSD devices. Enabling ssd_spread is suggested for low-cost SSDs.

nossd
The ssd mount option only enables the ssd option. Use the nossd option to disable it.
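
Since the ssd option is switched on automatically from the rotational flag in sysfs, it can help to check that flag before deciding on mount options. The sketch below reads the same file. The function name disk_kind and its optional second argument (an alternative sysfs root, useful for trying the function on a fake directory tree) are assumptions made for illustration.

```shell
#!/bin/sh
# Report whether a block device looks rotational (HDD) or not (SSD).
# $1 = short device name such as sdb
# $2 = sysfs root, defaults to /sys (illustrative parameter for dry runs)
disk_kind() {
    f="${2:-/sys}/block/$1/queue/rotational"
    [ -r "$f" ] || { echo "unknown"; return 1; }
    case "$(cat "$f")" in
        0) echo "ssd" ;;
        1) echo "rotational" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Example:
# disk_kind sdb
```

As the man page excerpt earlier notes, the flag is set by the block device driver and may not reflect the real hardware, so treat the result as a hint.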

4.2 Performance-related options
noatime,nodiratime,space_cache

noatime,nodiratime
    as discussed in the mailing list noatime mount option might speed up your file system, especially in case you have lots of snapshots. Each read access to a file is supposed to update its unix access time. COW will happen and will make even more writes. Default is now relatime which updates access times less often.
space_cache
    Btrfs stores the free space data on-disk to make the caching of a block group much quicker. It's a persistent change and is safe to boot into old kernels.

4.3 Other recommended options
defaults,compress=no,recovery

compress=no
recovery
    Enable autorecovery upon mount; currently it scans list of several previous tree roots and tries to use the first readable. The information about the tree root backups is stored by kernels starting with 3.2, older kernels do not and thus no recovery can be done.
thread_pool=number 
    The number of worker threads to allocate.

4.4 Recommended Linux block device I/O scheduler
    deadline
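
The scheduler is selected per device through sysfs, where the active one is shown in square brackets. The sketch below reads it back. The function name current_scheduler and its optional sysfs-root argument are illustrative assumptions. On newer multi-queue kernels the equivalent scheduler is listed as mq-deadline rather than deadline.

```shell
#!/bin/sh
# Print the active I/O scheduler of a device (the entry in square brackets).
# $1 = short device name such as sdb
# $2 = sysfs root, defaults to /sys (illustrative parameter for dry runs)
current_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "${2:-/sys}/block/$1/queue/scheduler"
}

# To switch the scheduler (as root; not persistent across reboots):
# echo deadline > /sys/block/sdb/queue/scheduler
# current_scheduler sdb
```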

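5. Resizing a btrfs filesystem
btrfs has block device management built in. As described earlier, btrfs stores three types of data: data, metadata and system. Whenever one of them needs space, btrfs allocates a block group for that type, and the space comes from the block devices that btrfs manages.
Resizing btrfs therefore means resizing the space btrfs may use on a block device. For a single-device btrfs, resizing the mounted filesystem and resizing the block device's usable space have the same effect.

5.1 Growing
Sizes accept the unit suffixes k, m and g.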
# btrfs filesystem resize amount /mount-point
# btrfs filesystem show /mount-point
# btrfs filesystem resize devid:amount /mount-point
# btrfs filesystem resize devid:max /mount-point

For a single-device btrfs, no device ID is needed
# btrfs filesystem resize +200M /btrfssingle
Resize '/btrfssingle' of '+200M'

For a multi-device btrfs, specify the device ID
[root@digoal ~]# btrfs filesystem show /data01
Label: none  uuid: 00036b8e-7914-41a9-831a-d35c97202eeb
        Total devices 4 FS bytes used 2.12GiB
        devid    1 size 19.00GiB used 4.01GiB path /dev/sdb
        devid    2 size 20.00GiB used 4.01GiB path /dev/sdc
        devid    3 size 20.00GiB used 4.01GiB path /dev/sdd
        devid    4 size 20.00GiB used 4.01GiB path /dev/sde
[root@digoal ~]# btrfs filesystem resize '1:+1G' /data01
Resize '/data01' of '1:+1G'
[root@digoal ~]# btrfs filesystem show /data01
Label: none  uuid: 00036b8e-7914-41a9-831a-d35c97202eeb
        Total devices 4 FS bytes used 2.12GiB
        devid    1 size 20.00GiB used 4.01GiB path /dev/sdb
        devid    2 size 20.00GiB used 4.01GiB path /dev/sdc
        devid    3 size 20.00GiB used 4.01GiB path /dev/sdd
        devid    4 size 20.00GiB used 4.01GiB path /dev/sde
You can specify max to use the full capacity of the block device.
[root@digoal ~]# btrfs filesystem resize '1:max' /data01
Resize '/data01' of '1:max'

5.2 Shrinking
# btrfs filesystem resize amount /mount-point
# btrfs filesystem show /mount-point
# btrfs filesystem resize devid:amount /mount-point
Similarly:
# btrfs filesystem resize -200M /btrfssingle
Resize '/btrfssingle' of '-200M'

5.3 Setting a fixed size
# btrfs filesystem resize amount /mount-point
# btrfs filesystem resize 700M /btrfssingle
Resize '/btrfssingle' of '700M'

# btrfs filesystem show /mount-point
# btrfs filesystem resize devid:amount /mount-point

max is supported here as well:
[root@digoal ~]# btrfs filesystem resize 'max' /data01
Resize '/data01' of 'max'
[root@digoal ~]# btrfs filesystem resize '2:max' /data01
Resize '/data01' of '2:max'
[root@digoal ~]# btrfs filesystem resize '3:max' /data01
Resize '/data01' of '3:max'
[root@digoal ~]# btrfs filesystem resize '4:max' /data01
Resize '/data01' of '4:max'
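
Running resize once per device ID, as above, can be scripted by pulling the IDs out of the btrfs filesystem show output. The sketch below assumes the output layout shown in the transcripts of this article. The helper name list_devids is made up for illustration.

```shell
#!/bin/sh
# Extract device IDs from `btrfs filesystem show` output read on stdin.
# Relies on lines of the form "devid <id> size ... path ...".
list_devids() {
    awk '$1 == "devid" { print $2 }'
}

# Example (needs root and a mounted btrfs, so it is left commented out):
# btrfs filesystem show /data01 | list_devids | while read -r id; do
#     btrfs filesystem resize "$id:max" /data01
# done
```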

6. btrfs volume management
How btrfs manages multiple block devices
MULTIPLE DEVICES
       Before mounting a multiple device filesystem, the kernel module must know the association of the block devices that are attached to the filesystem UUID.

       There is typically no action needed from the user. On a system that utilizes a udev-like daemon (devices are detected automatically and no manual scan is needed; CentOS 7 works this way), any new block device is automatically registered. The rules call btrfs device scan.

       The same command can be used to trigger the device scanning if the btrfs kernel module is reloaded (naturally all previous information about the device registration is lost).

       Another possibility is to use the mount options device to specify the list of devices to scan at the time of mount.

           # mount -o device=/dev/sdb,device=/dev/sdc /dev/sda /mnt

           Note
           that this means only scanning, if the devices do not exist in the system, mount will fail anyway. This can happen on systems without initramfs/initrd and root partition created with RAID1/10/5/6 profiles. The mount
           action can happen before all block devices are discovered. The waiting is usually done on the initramfs/initrd systems.
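Otherwise, after an operating system reboot or a reload of the btrfs module, you must run a scan before you can mount a btrfs that uses multiple block devices.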

7. Load balancing
With raid0, raid10, raid5 or raid6, striping is in effect: one block group spans several block devices, which spreads the load.

8. Converting from one device to many
If btrfs started out on a single block device, how do you convert it to raid1?
[root@digoal ~]# mkfs.btrfs -m single -d single -n 4096 -f /dev/sdb
btrfs-progs v4.3.1
See http://btrfs.wiki.kernel.org for more information.
Label:              (null)
UUID:               165f59f6-77b5-4421-b3d8-90884d3c0b40
Node size:          4096
Sector size:        4096
Filesystem size:    20.00GiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         single            8.00MiB
  System:           single            4.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1    20.00GiB  /dev/sdb

[root@digoal ~]# mount -o ssd,ssd_spread,discard,noatime,nodiratime,compress=no,space_cache,recovery,defaults /dev/sdb /data01
Add a block device
[root@digoal ~]# btrfs device add /dev/sdc /data01 -f
Convert online with balance; -m selects metadata and -d selects data
[root@digoal ~]# btrfs balance start -dconvert=raid1 -mconvert=raid1 /data01
Done, had to relocate 3 out of 3 chunks
The chunks mentioned here are block groups.

[root@digoal ~]# btrfs filesystem show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 2 FS bytes used 360.00KiB
        devid    1 size 20.00GiB used 1.28GiB path /dev/sdb
        devid    2 size 20.00GiB used 1.28GiB path /dev/sdc

Check whether the balance has finished
[root@digoal ~]# btrfs balance status -v /data01
No balance found on '/data01'
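
When a balance was started from another session, a script may want to wait until the status command reports that nothing is running. The sketch below matches the "No balance found" phrase from the transcript above. That wording may differ in other btrfs-progs versions, and the function names and the 10-second polling interval are arbitrary choices for illustration.

```shell
#!/bin/sh
# Succeeds when the status text on stdin reports no running balance.
# The matched phrase is the one printed in the transcript above.
balance_idle() {
    grep -q 'No balance found'
}

# Poll until no balance is reported on the given mount point.
# $1 = mount point; defined here but not called automatically.
wait_for_balance() {
    until btrfs balance status "$1" 2>&1 | balance_idle; do
        sleep 10
    done
}

# Example (needs root and a mounted btrfs):
# wait_for_balance /data01 && echo "balance finished"
```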

You can keep converting. For example, to switch data to raid0:
[root@digoal ~]# btrfs balance start -dconvert=raid0 /data01
Done, had to relocate 1 out of 3 chunks
The chunks mentioned here are block groups.

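9. Adding block devices and redistributing data
This is much like the conversion above, except that the -d and -m profiles are left unchanged.
[root@digoal ~]# btrfs device add /dev/sdd /data01 -f
[root@digoal ~]# btrfs device add /dev/sde /data01 -f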

[root@digoal ~]# btrfs filesystem show /dev/sdb
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 4 FS bytes used 616.00KiB
        devid    1 size 20.00GiB used 1.28GiB path /dev/sdb
        devid    2 size 20.00GiB used 1.28GiB path /dev/sdc
        devid    3 size 20.00GiB used 0.00B path /dev/sdd
        devid    4 size 20.00GiB used 0.00B path /dev/sde
Redistribute the data
[root@digoal ~]# btrfs balance start /data01
Done, had to relocate 3 out of 3 chunks
[root@digoal ~]# btrfs filesystem show /dev/sdb
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 4 FS bytes used 1.29MiB
        devid    1 size 20.00GiB used 1.03GiB path /dev/sdb
        devid    2 size 20.00GiB used 1.03GiB path /dev/sdc
        devid    3 size 20.00GiB used 2.00GiB path /dev/sdd
        devid    4 size 20.00GiB used 2.00GiB path /dev/sde
Convert metadata to raid10 and redistribute.
[root@digoal ~]# btrfs balance start -mconvert=raid10 /data01
Done, had to relocate 2 out of 3 chunks
[root@digoal ~]# btrfs filesystem show /dev/sdb
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 4 FS bytes used 1.54MiB
        devid    1 size 20.00GiB used 1.53GiB path /dev/sdb
        devid    2 size 20.00GiB used 1.53GiB path /dev/sdc
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    4 size 20.00GiB used 1.53GiB path /dev/sde
Check the usage of the three types after redistribution.
[root@digoal ~]# btrfs filesystem df /data01
Data, RAID0: total=4.00GiB, used=1.25MiB
System, RAID10: total=64.00MiB, used=4.00KiB
Metadata, RAID10: total=1.00GiB, used=36.00KiB
GlobalReserve, single: total=4.00MiB, used=0.00B
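
The btrfs filesystem df output above is easy to post-process, for example to confirm which profile each type ended up with after a conversion. The following sketch parses the line format shown in the transcript. The helper names df_summary and profile_of are made up for illustration.

```shell
#!/bin/sh
# Print "<type> <profile> <total>" for each line of
# `btrfs filesystem df` output read on stdin.
df_summary() {
    sed -n 's/^\([A-Za-z]*\), \([A-Za-z0-9]*\): total=\([^,]*\),.*/\1 \2 \3/p'
}

# Print only the profile of one type.
# $1 = Data | Metadata | System | GlobalReserve ; df output on stdin
profile_of() {
    df_summary | awk -v t="$1" '$1 == t { print $2 }'
}

# Example (needs a mounted btrfs):
# btrfs filesystem df /data01 | profile_of Metadata
```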

10. Removing block devices (the filesystem must keep at least the minimum number of devices its profile requires)
[root@digoal ~]# btrfs filesystem df /data01
Data, RAID10: total=2.00GiB, used=1.00GiB
System, RAID10: total=64.00MiB, used=4.00KiB
Metadata, RAID10: total=1.00GiB, used=1.18MiB
GlobalReserve, single: total=4.00MiB, used=0.00B
[root@digoal ~]# btrfs filesystem show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 4 FS bytes used 1.00GiB
        devid    1 size 20.00GiB used 1.53GiB path /dev/sdb
        devid    2 size 20.00GiB used 1.53GiB path /dev/sdc
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    4 size 20.00GiB used 1.53GiB path /dev/sde
raid10 needs at least 4 block devices, so the removal fails
[root@digoal ~]# btrfs device delete /dev/sdb /data01
ERROR: error removing device '/dev/sdb': unable to go below four devices on raid10

Convert to raid1 first, then try again
[root@digoal ~]# btrfs balance start -mconvert=raid1 -dconvert=raid1 /data01
Done, had to relocate 3 out of 3 chunks
[root@digoal ~]# btrfs filesystem df /data01
Data, RAID1: total=2.00GiB, used=1.00GiB
System, RAID1: total=32.00MiB, used=4.00KiB
Metadata, RAID1: total=1.00GiB, used=1.11MiB
GlobalReserve, single: total=4.00MiB, used=0.00B
[root@digoal ~]# btrfs filesystem show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 4 FS bytes used 1.00GiB
        devid    1 size 20.00GiB used 1.03GiB path /dev/sdb
        devid    2 size 20.00GiB used 2.00GiB path /dev/sdc
        devid    3 size 20.00GiB used 2.00GiB path /dev/sdd
        devid    4 size 20.00GiB used 1.03GiB path /dev/sde
raid1 needs only 2 block devices, so two can be removed.
[root@digoal ~]# btrfs device delete /dev/sdb /data01
[root@digoal ~]# btrfs device delete /dev/sdc /data01
[root@digoal ~]# btrfs filesystem df /data01
Data, RAID1: total=2.00GiB, used=1.00GiB
System, RAID1: total=32.00MiB, used=4.00KiB
Metadata, RAID1: total=256.00MiB, used=1.12MiB
GlobalReserve, single: total=4.00MiB, used=0.00B
[root@digoal ~]# btrfs filesystem show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 2 FS bytes used 1.00GiB
        devid    3 size 20.00GiB used 2.28GiB path /dev/sdd
        devid    4 size 20.00GiB used 2.28GiB path /dev/sde
Removing one more fails
[root@digoal ~]# btrfs device delete /dev/sdd /data01
ERROR: error removing device '/dev/sdd': unable to go below two devices on raid1
Add them back
[root@digoal ~]# btrfs device add /dev/sdb /data01
[root@digoal ~]# btrfs device add /dev/sdc /data01
[root@digoal ~]# btrfs balance start /data01
Done, had to relocate 4 out of 4 chunks
Convert to raid5
[root@digoal ~]# btrfs balance start -mconvert=raid5 -dconvert=raid5 /data01
Done, had to relocate 4 out of 4 chunks
One device can be removed, since three devices still remain for raid5
[root@digoal ~]# btrfs device delete /dev/sde /data01

[root@digoal ~]# btrfs filesystem df /data01
Data, RAID5: total=2.00GiB, used=1.00GiB
System, RAID5: total=64.00MiB, used=4.00KiB
Metadata, RAID5: total=1.00GiB, used=1.12MiB
GlobalReserve, single: total=4.00MiB, used=0.00B
[root@digoal ~]# btrfs filesystem show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 3 FS bytes used 1.00GiB
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    5 size 20.00GiB used 1.53GiB path /dev/sdb
        devid    6 size 20.00GiB used 1.53GiB path /dev/sdc

11. Handling a failed block device
Assume btrfs currently manages 3 block devices, with data profile=raid5, metadata profile=raid5 and system profile=raid1
Set up that state:
[root@digoal ~]# btrfs balance start -sconvert=raid1 -f /data01
Done, had to relocate 1 out of 3 chunks

[root@digoal ~]# btrfs fi show
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 3 FS bytes used 1.00GiB
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    5 size 20.00GiB used 1.50GiB path /dev/sdb
        devid    6 size 20.00GiB used 1.53GiB path /dev/sdc

[root@digoal ~]# btrfs fi df /data01
Data, RAID5: total=2.00GiB, used=1.00GiB
System, RAID1: total=32.00MiB, used=4.00KiB
Metadata, RAID5: total=1.00GiB, used=1.12MiB
GlobalReserve, single: total=4.00MiB, used=0.00B

Delete a block device node to simulate a failed device
[root@digoal ~]# rm -f /dev/sdb

[root@digoal ~]# btrfs fi df /data01
Data, RAID5: total=2.00GiB, used=1.00GiB
System, RAID1: total=32.00MiB, used=4.00KiB
Metadata, RAID5: total=1.00GiB, used=1.12MiB
GlobalReserve, single: total=4.00MiB, used=0.00B

btrfs now reports that some devices are missing.
[root@digoal ~]# btrfs fi show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 3 FS bytes used 1.00GiB
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    6 size 20.00GiB used 1.53GiB path /dev/sdc
        *** Some devices missing

After umount, the filesystem can no longer be mounted normally. It must be mounted in degraded mode.
[root@digoal ~]# umount /data01

[root@digoal ~]# mount /dev/sdc /data01
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

dmesg|tail -n 5
[ 1311.617838] BTRFS: open /dev/sdb failed
[ 1311.618763] BTRFS info (device sdc): disk space caching is enabled
[ 1311.618767] BTRFS: has skinny extents
[ 1311.623540] BTRFS: failed to read chunk tree on sdc
[ 1311.648198] BTRFS: open_ctree failed

You can see that the superblocks on sdc and sdd are good.
[root@digoal ~]# btrfs rescue super-recover -v /dev/sdc
All Devices:
        Device: id = 3, name = /dev/sdd
        Device: id = 6, name = /dev/sdc

Before Recovering:
        [All good supers]:
                device name = /dev/sdd
                superblock bytenr = 65536

                device name = /dev/sdd
                superblock bytenr = 67108864

                device name = /dev/sdc
                superblock bytenr = 65536

                device name = /dev/sdc
                superblock bytenr = 67108864

        [All bad supers]:

All supers are valid, no need to recover
So it can be mounted with the degraded option.
[root@digoal ~]# mount -t btrfs -o degraded /dev/sdc /data01

[root@digoal ~]# btrfs fi show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 3 FS bytes used 1.00GiB
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    6 size 20.00GiB used 1.53GiB path /dev/sdc
        *** Some devices missing

[root@digoal ~]# btrfs fi df /data01
Data, RAID5: total=2.00GiB, used=1.00GiB
System, RAID1: total=32.00MiB, used=4.00KiB
Metadata, RAID5: total=1.00GiB, used=1.12MiB
GlobalReserve, single: total=4.00MiB, used=0.00B

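Removing the missing device is subject to the same rule: the filesystem must keep the minimum number of devices its profile requires. With raid5 in use and only two devices still present, the removal fails.
[root@digoal ~]# btrfs device delete missing /data01
ERROR: error removing device 'missing': unable to go below two devices on raid5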

Add a new block device first, then remove the missing one.
[root@digoal ~]# btrfs device add /dev/sde /data01

[root@digoal ~]# btrfs fi show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 4 FS bytes used 1.00GiB
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    6 size 20.00GiB used 1.53GiB path /dev/sdc
        devid    7 size 20.00GiB used 0.00B path /dev/sde
        *** Some devices missing

[root@digoal ~]# btrfs device delete missing /data01

[root@digoal ~]# btrfs fi show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 3 FS bytes used 1.00GiB
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    6 size 20.00GiB used 1.53GiB path /dev/sdc
        devid    7 size 20.00GiB used 1.50GiB path /dev/sde

Rebalance.
[root@digoal ~]# btrfs balance start /data01
Done, had to relocate 3 out of 3 chunks

[root@digoal ~]# btrfs fi show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 3 FS bytes used 1.00GiB
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    6 size 20.00GiB used 1.50GiB path /dev/sdc
        devid    7 size 20.00GiB used 1.53GiB path /dev/sde
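
The recovery steps walked through above can be collected into one place. The sketch below only prints the commands in the order used in this article. It does not run them, because each step should be checked by hand on a degraded filesystem. The function name replace_missing_plan is made up for illustration.

```shell
#!/bin/sh
# Print (not run) the commands used above to replace a missing device.
# $1 = a surviving member device, $2 = the new device, $3 = mount point
replace_missing_plan() {
    cat <<EOF
mount -t btrfs -o degraded $1 $3
btrfs device add $2 $3
btrfs device delete missing $3
btrfs balance start $3
EOF
}

# Example: print the plan for the scenario in this section.
# replace_missing_plan /dev/sdc /dev/sde /data01
```

Printing the plan keeps the order visible (add the new device before deleting the missing one) without the risk of an unattended script acting on the wrong device.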

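[Summary]
1. Suggested mkfs options
With multiple block devices:
-n 4096 -m raid10 -d raid10
or
-n 4096 -m raid10 -d raid5
...
Single block device (non-SSD):
-n 4096 -m DUP -d single
Single block device (SSD):
-n 4096 -m single -d single

2. Suggested mount options
discard,ssd,ssd_spread,noatime,nodiratime,space_cache,defaults,compress=no,recovery

3. Suggested I/O scheduler
deadline

4. btrfs architecture
Make sure these concepts are clear:
1. block group, chunk
2. profile
3. the three data types
4. block device

5. After adding a block device, remember to run a balance to redistribute the data.

6. Read man btrfs and all of its subcommands thoroughly.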

[References]
1. man mkfs.btrfs
2. man btrfs
3. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/ch-btrfs.html
4. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/index.html
5. https://www.suse.com/events/susecon/sessions/presentations/SUSECon-2012-TT1301.pdf
6. https://www.suse.com/documentation/
7. https://wiki.gentoo.org/wiki/Btrfs
8. https://wiki.gentoo.org/wiki/ZFS

