[I/O scheduler] Linux Disk Scheduling Policies

Posted by ballontt on 2013-08-10

Linux

There are many disk scheduling algorithms, such as First Come, First Served (FCFS), Shortest Seek Time First (SSTF), and the SCAN algorithm.

This post introduces the four disk scheduling algorithms that Linux supports:

The Schedulers

There are currently 4 available:

Noop Scheduler
Anticipatory IO Scheduler ("as scheduler")
Deadline Scheduler
Completely Fair Queuing Scheduler ("cfq scheduler")

Noop Scheduler

This scheduler only implements request merging.

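In Linux 2.4 and earlier kernels, there was only a single I/O scheduling algorithm.
NOOP stands for No Operation. It implements the simplest possible FIFO queue, so I/O requests are served roughly in arrival order. "Roughly", because NOOP also merges adjacent I/O requests on top of the FIFO, so requests are not served in strict first-in, first-out order. NOOP assumes that the driver or the device itself optimizes or reorders I/O requests (as an intelligent controller would). In some SAN environments this may be the best choice. NOOP does not worry much about I/O: it handles every request through the FIFO queue and assumes by default that I/O will not be a performance problem, which also keeps CPU overhead low. For more complex workloads, of course, the burden of worrying about I/O shifts to the user.
NOOP is generally the best choice for flash devices, RAM disks, and embedded systems.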
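
The "FIFO plus merging" behaviour described above can be sketched in a few lines of Python. This is only a toy model, not the kernel's code: the `Request` and `NoopQueue` names are made up for illustration, and the sketch assumes that only the tail of the queue is checked for a merge.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical block request: `count` sectors starting at `sector`."""
    sector: int
    count: int
    op: str  # "R" for read, "W" for write


class NoopQueue:
    """Toy FIFO queue that merges a new request into the tail when contiguous."""

    def __init__(self):
        self.fifo = deque()

    def add(self, req: Request) -> None:
        if self.fifo:
            tail = self.fifo[-1]
            # Back merge: same direction, and the new request starts
            # exactly where the tail request ends.
            if tail.op == req.op and tail.sector + tail.count == req.sector:
                tail.count += req.count
                return
        # Store a copy so the caller's object is never modified by a merge.
        self.fifo.append(Request(req.sector, req.count, req.op))

    def dispatch(self):
        """Hand the oldest request to the device, or None if the queue is empty."""
        return self.fifo.popleft() if self.fifo else None


if __name__ == "__main__":
    q = NoopQueue()
    for r in (Request(100, 8, "R"), Request(108, 8, "R"), Request(5, 8, "W")):
        q.add(r)
    print([(r.sector, r.count, r.op) for r in q.fifo])
    # prints [(100, 16, 'R'), (5, 8, 'W')]
```

The two adjacent reads collapse into one request, while the write at sector 5 keeps its place behind them even though it sits at a lower address. No sorting takes place.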

Anticipatory IO Scheduler ("as scheduler")

The anticipatory scheduler is the default scheduler in older 2.6 kernels: if you have not specified one, this is the one that will be loaded. It implements request merging, a one-way elevator, read and write request batching, and attempts some anticipatory reads by holding off a bit after a read batch if it thinks a user is going to ask for more data. It tries to optimise for physical disks by avoiding head movements where possible. One downside is that it will probably give highly erratic performance on database or storage systems.

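CFQ and DEADLINE focus on serving scattered I/O requests and do nothing special for contiguous I/O such as sequential reads. To handle workloads that mix random and sequential I/O, Linux also supports the ANTICIPATORY scheduling algorithm. ANTICIPATORY builds on DEADLINE and gives every read a 6 ms wait window. If the OS receives a read request for an adjacent location within those 6 ms, it can serve that request immediately.

The anticipatory scheduler (as) was at one time the default I/O scheduler of the Linux 2.6 kernel. "Anticipatory" means "expected" or "foreseen", which captures the character of the algorithm. Put simply, when an I/O has just been issued and another process then requests I/O, the scheduler inserts a default 6 ms guessing period in which it tries to predict what the next request will be. This adds considerable latency to random reads, which is bad for database workloads, whereas web servers and similar workloads do well. The algorithm can also be understood as targeting slow disks, because the real purpose of the "guess" is to reduce head movement time.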
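
The wait-window idea can be illustrated with a small decision function. This is a hedged sketch rather than the kernel's logic: `choose_next`, the `Request` type, and the `NEAR_SECTORS` threshold for what counts as an "adjacent" location are assumptions made for the example. Only the 6 ms window comes from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

ANTIC_WINDOW_MS = 6   # wait window quoted in the text
NEAR_SECTORS = 64     # hypothetical threshold for "adjacent"


@dataclass
class Request:
    sector: int
    op: str  # "R" for read, "W" for write


def choose_next(last_read_sector: int, last_read_done_ms: float,
                pending: List[Request], now_ms: float) -> Optional[Request]:
    """Return the request to dispatch now, or None to keep the disk idle."""
    # A read close to the previous one has arrived: serve it immediately.
    for req in pending:
        if req.op == "R" and abs(req.sector - last_read_sector) <= NEAR_SECTORS:
            pending.remove(req)
            return req
    # Still inside the window: keep waiting in the hope of a nearby read.
    if now_ms - last_read_done_ms < ANTIC_WINDOW_MS:
        return None
    # The guess failed: fall back to the oldest pending request.
    return pending.pop(0) if pending else None
```

With a far-away write pending, the function returns `None` until either a nearby read shows up or 6 ms have passed. This is exactly the delay that hurts random-read workloads.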

Deadline Scheduler

The deadline scheduler implements request merging, a one-way elevator, and imposes a deadline on all operations to prevent resource starvation. Because writes return instantly within Linux, with the actual data being held in cache, the deadline scheduler will also prefer readers, as long as the deadline for a write request hasn't passed. The kernel docs suggest this is the preferred scheduler for database systems, especially if you have TCQ-aware disks, or any system with high disk performance.

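DEADLINE builds on CFQ and fixes the extreme case in which an I/O request starves. Besides the address-sorted I/O queue that CFQ itself has, DEADLINE maintains separate FIFO queues for reads and for writes. A request waits at most 500 ms in the read FIFO queue and at most 5 s in the write FIFO queue. Requests in the FIFO queues have higher priority than those in the CFQ queue, and the read FIFO queue has higher priority than the write FIFO queue.

The priority order can be written as:
FIFO(Read) > FIFO(Write) > CFQ

The deadline algorithm bounds the latency of any given I/O request, which suggests it should suit DSS (decision support system) workloads well.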
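
A minimal sketch of this priority order follows, assuming that a FIFO request only jumps the queue once its deadline has expired and that the sorted queue (labelled CFQ in the formula above) is simply served lowest sector first. The class and field names are illustrative, not kernel identifiers.

```python
from collections import deque
from dataclasses import dataclass

READ_EXPIRE_MS = 500    # read deadline from the text
WRITE_EXPIRE_MS = 5000  # write deadline from the text


@dataclass(eq=False)    # identity comparison, so remove() targets this exact object
class Request:
    sector: int
    op: str             # "R" for read, "W" for write
    deadline_ms: int


class DeadlineQueue:
    """Toy model: one sorted list plus a read FIFO and a write FIFO."""

    def __init__(self):
        self.fifo = {"R": deque(), "W": deque()}
        self.by_sector = []

    def add(self, sector: int, op: str, now_ms: int) -> Request:
        expire = READ_EXPIRE_MS if op == "R" else WRITE_EXPIRE_MS
        req = Request(sector, op, now_ms + expire)
        self.fifo[op].append(req)
        self.by_sector.append(req)
        self.by_sector.sort(key=lambda r: r.sector)
        return req

    def dispatch(self, now_ms: int):
        # Expired reads first, then expired writes.
        for op in ("R", "W"):
            queue = self.fifo[op]
            if queue and queue[0].deadline_ms <= now_ms:
                req = queue.popleft()
                self.by_sector.remove(req)
                return req
        # Nothing has expired: serve in address order.
        if not self.by_sector:
            return None
        req = self.by_sector.pop(0)
        self.fifo[req.op].remove(req)
        return req
```

As long as no deadline has passed, requests leave in sector order. Once the clock passes a request's deadline, that request is served next regardless of its address, reads before writes.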

Completely Fair Queuing Scheduler ("cfq scheduler")

The Completely Fair Queuing scheduler implements both request merging and the elevator, and attempts to give all users of a particular device the same number of IO requests over a particular time interval. This should make it more efficient for multiuser systems. It seems that Novell SLES sets cfq as the scheduler by default, as does the latest Ubuntu release. As of the 2.6.18 kernel, this is the default scheduler in kernel.org releases.

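CFQ stands for Completely Fair Queuing. Its distinguishing feature is that it sorts I/O requests by address rather than answering them in arrival order.
On traditional SAS disks, seeking accounts for most of the I/O response time. CFQ's starting point is to sort I/O by address so that as many requests as possible are served with as few disk rotations as possible. Under CFQ, SAS disk throughput improves considerably. Its drawback compared with NOOP is that an earlier request is not necessarily served first, so starvation can occur.
Completely Fair Queuing (cfq) replaced the anticipatory scheduler as the Linux kernel's default I/O scheduler in 2.6.18. cfq maintains one I/O queue per process and serves the requests from the different processes in round-robin fashion, so every I/O request is treated fairly. This makes cfq well suited to workloads with scattered reads (e.g. OLTP databases). Among the enterprise Linux distributions I know of, SuSE Linux seems to have been the first to use cfq by default.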
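
The per-process queues and round-robin service can be sketched as follows. This is a simplified illustration, not the real cfq implementation: the `CfqSketch` class and its `quantum` parameter (how many requests each process may issue per round) are assumptions introduced for the example.

```python
from collections import deque


class CfqSketch:
    """Toy model: one FIFO per process, served round-robin."""

    def __init__(self, quantum: int = 1):
        self.quantum = quantum
        self.queues = {}  # pid -> deque of requests, in first-seen order

    def add(self, pid: int, request) -> None:
        self.queues.setdefault(pid, deque()).append(request)

    def dispatch_round(self):
        """One pass over all processes, up to `quantum` requests from each."""
        served = []
        for pid in list(self.queues):
            queue = self.queues[pid]
            for _ in range(min(self.quantum, len(queue))):
                served.append((pid, queue.popleft()))
            if not queue:
                del self.queues[pid]  # drop queues that ran empty
        return served


if __name__ == "__main__":
    cfq = CfqSketch()
    for req in ("a", "b", "c"):
        cfq.add(1, req)
    cfq.add(2, "x")
    print(cfq.dispatch_round())  # prints [(1, 'a'), (2, 'x')]
    print(cfq.dispatch_round())  # prints [(1, 'b')]
```

Process 2 gets its single request served in the first round even though process 1 queued three requests ahead of it, which is the fairness property the paragraph describes.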

Changing Schedulers

The most reliable way to change schedulers is to set the kernel option ‘elevator’ at boot time. You can set it to one of "as", "cfq", "deadline" or "noop", to set the appropriate scheduler.

It seems under more recent 2.6 kernels (2.6.11, possibly earlier), you can change the scheduler at runtime by echoing the name of the scheduler into /sys/block/<devicename>/queue/scheduler, where devicename is the base name of the block device, e.g. sda for /dev/sda.
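
The same sysfs file can be read and written from a script. The sketch below assumes the usual format of that file, a space-separated list with the active scheduler in square brackets (for example `noop anticipatory deadline [cfq]`). The helper names and the `sysfs_root` parameter are hypothetical, and writing to the real file requires root.

```python
from pathlib import Path


def parse_schedulers(text: str):
    """Split 'noop anticipatory deadline [cfq]' into (all names, active name)."""
    names, active = [], None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        names.append(token)
    return names, active


def scheduler_path(device: str, sysfs_root: str = "/sys/block") -> Path:
    return Path(sysfs_root) / device / "queue" / "scheduler"


def get_scheduler(device: str, sysfs_root: str = "/sys/block"):
    """Return (available schedulers, active scheduler) for a block device."""
    return parse_schedulers(scheduler_path(device, sysfs_root).read_text())


def set_scheduler(device: str, name: str, sysfs_root: str = "/sys/block") -> None:
    """Select a scheduler at runtime; needs root on a real system."""
    available, _ = get_scheduler(device, sysfs_root)
    if name not in available:
        raise ValueError(f"{name!r} not offered for {device}: {available}")
    scheduler_path(device, sysfs_root).write_text(name)


if __name__ == "__main__":
    # "sda" is only an example device name.
    if scheduler_path("sda").exists():
        print(get_scheduler("sda"))
```

Checking the requested name against the list the kernel reports avoids writing a scheduler that is not compiled in or loaded. A change made this way does not survive a reboot, which is why the `elevator` boot option is described as the more reliable route.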

Which one should I use?

I’ve not personally done any testing on this, so I can’t speak from experience yet. The anticipatory scheduler is the default for a reason, however: it is optimised for the common case. If you’ve only got single disk systems (i.e. no RAID, hardware or software) then this scheduler is probably the right one for you. If it’s a multiuser system, you will probably find cfq or deadline providing better performance, and the numbers seem to back deadline giving the best performance for database systems.

Tuning the IO schedulers

The schedulers may have parameters that can be tuned at runtime. Read the Linux documentation on the schedulers listed in the References section below.
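
On kernels that expose them, the tunables of the active scheduler appear as files under /sys/block/<devicename>/queue/iosched/. The following sketch lists them. The `read_tunables` helper and its `sysfs_root` parameter are hypothetical names, and which files exist depends on the scheduler in use.

```python
from pathlib import Path


def read_tunables(device: str, sysfs_root: str = "/sys/block") -> dict:
    """Return {tunable name: current value} for the device's active scheduler."""
    iosched = Path(sysfs_root) / device / "queue" / "iosched"
    tunables = {}
    if not iosched.is_dir():
        return tunables  # the scheduler exposes no tunables
    for entry in sorted(iosched.iterdir()):
        if entry.is_file():
            try:
                tunables[entry.name] = entry.read_text().strip()
            except OSError:
                tunables[entry.name] = None  # unreadable entry
    return tunables


if __name__ == "__main__":
    # "sda" is only an example device name.
    for name, value in read_tunables("sda").items():
        print(f"{name} = {value}")
```

With the deadline scheduler active, the result would be expected to include entries such as `read_expire` and `write_expire`. Changing a value means writing a new number to the corresponding file as root.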

More information

Read the documents mentioned in the References section below, especially the Linux kernel documentation on the anticipatory and deadline schedulers.


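If you repost this article, please credit the source and include a link. Thank you!

From the ITPUB blog, link: http://blog.itpub.net/27425054/viewspace-768224/. If you repost this article, please credit the source; otherwise legal action will be pursued.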
