System Diagnostics Tips (12): How to Determine Whether a Thread Is Fluctuating Because of CPU Resources

Posted by 寧希波若 on 2018-09-01

Introduction

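A thread's performance can fluctuate because CPU resources are insufficient, or for other reasons such as waiting on network data. In monitoring, either one shows up as jitter in the business metrics. Establishing that CPU is the cause, however, is not easy.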

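The first difficulty is capturing the scene. When the CPU is saturated or the load is very high, the problem has been reproduced, but the thread that collects the data may never get a chance to run. We discussed how to solve this in another tip, so we skip it here.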

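The second difficulty is deciding which data to use to establish that a thread fluctuated because of CPU resources. We expand on this below.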

vruntime

Linux 2.6.23 introduced the CFS scheduler, and task_struct gained a sched_entity structure as a result. One field of sched_entity is of interest to us: vruntime

struct sched_entity {
    /* For load-balancing: */
    struct load_weight        load;
    unsigned long            runnable_weight;
    struct rb_node            run_node;
    struct list_head        group_node;
    unsigned int            on_rq;

    u64                exec_start;
    u64                sum_exec_runtime;
    u64                vruntime; // the field we will use
    u64                prev_sum_exec_runtime;

    u64                nr_migrations;

    struct sched_statistics        statistics;

#ifdef CONFIG_FAIR_GROUP_SCHED
    int                depth;
    struct sched_entity        *parent;
    /* rq on which this entity is (to be) queued: */
    struct cfs_rq            *cfs_rq;
    /* rq "owned" by this entity/group: */
    struct cfs_rq            *my_q;
#endif

#ifdef CONFIG_SMP
    /*
     * Per entity load average tracking.
     *
     * Put into separate cache line so it does not
     * collide with read-mostly values above.
     */
    struct sched_avg        avg;
#endif
};

What does vruntime represent? The kernel documentation puts it this way:

In CFS the virtual runtime is expressed and tracked via the per-task
p->se.vruntime (nanosec-unit) value. This way, it's possible to accurately
timestamp and measure the "expected CPU time" a task should have gotten.

[ small detail: on "ideal" hardware, at any time all tasks would have the same
p->se.vruntime value --- i.e., tasks would execute simultaneously and no task
would ever get "out of balance" from the "ideal" share of CPU time. ]

CFS's task picking logic is based on this p->se.vruntime value and it is thus
very simple: it always tries to run the task with the smallest p->se.vruntime
value (i.e., the task which executed least so far). CFS always tries to split
up CPU time between runnable tasks as close to "ideal multitasking hardware" as
possible.

Most of the rest of CFS's design just falls out of this really simple concept,
with a few add-on embellishments like nice levels, multiprocessing and various
algorithm variants to recognize sleepers.

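Put simply, vruntime represents the processor time a thread has already consumed, scaled by its scheduling weight. On ideal hardware, all threads would have the same vruntime.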

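This is the basis for our method: how a thread's vruntime changes over time reflects how the scheduler has been treating it.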
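To make the "smallest vruntime runs next" rule concrete, here is a toy model, not the kernel's implementation. It assumes a single CPU, tasks that never sleep, a fixed 1 ms tick, and the convention that a nice-0 task has weight 1024. The function name simulate_cfs and its arguments are made up for illustration.

```shell
#!/bin/bash
# Toy model of CFS task picking (illustrative only, not kernel code).
# Usage: simulate_cfs <ticks> <weight> [<weight> ...]
# Assumptions: one CPU, tasks never sleep, each tick is 1 ms of real time,
# and weight 1024 corresponds to nice 0.
simulate_cfs() {
    local ticks=$1; shift
    awk -v ticks="$ticks" -v weights="$*" 'BEGIN {
        n = split(weights, w, " ")
        for (i = 1; i <= n; i++) { vr[i] = 0; ran[i] = 0 }
        for (t = 0; t < ticks; t++) {
            best = 1
            for (i = 2; i <= n; i++)        # pick the smallest vruntime
                if (vr[i] < vr[best]) best = i
            ran[best]++                     # real milliseconds received
            vr[best] += 1024 / w[best]      # virtual time charged, scaled by weight
        }
        for (i = 1; i <= n; i++)
            printf "task%d weight=%d ran=%dms vruntime=%.2f\n", i, w[i], ran[i], vr[i]
    }'
}
```

Running `simulate_cfs 300 1024 2048` gives the heavier task twice as much real CPU time, while both tasks end with the same vruntime. That is the "ideal hardware" picture from the quoted documentation.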

A Simple Experiment

Test script and stress tool

We have the test script print its own vruntime information directly. For the stress tool, we use perf.

The test script is as follows

#!/bin/bash

export LANG=C

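# Dump this shell's scheduler statistics once per second, ten times.
# $$ expands to the PID of the shell running the script.
for ((i=0;i<10;i++));do
    cat /proc/$$/sched
    sleep 1
done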
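The script dumps the whole sched file, although only one line matters here. As a small sketch, the helper below pulls out just the se.vruntime value. The name extract_vruntime is hypothetical, and it assumes the "name : value" layout seen in this article's output.

```shell
#!/bin/bash
# Print only the se.vruntime value from /proc/<pid>/sched-style text on stdin.
# Assumes lines of the form "se.vruntime   :   22075.635863".
extract_vruntime() {
    awk -F: '$1 ~ /^se\.vruntime[ \t]*$/ {
        gsub(/[ \t]/, "", $2)   # strip the padding around the number
        print $2
    }'
}

# Possible use inside the loop above, if the file exists on your kernel:
#   extract_vruntime < /proc/$$/sched
```

Reading from stdin keeps the helper usable both on the live file and on logs saved earlier.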

The stress tool is used as follows

root@pusf:~# perf bench sched messaging -l 10000

Putting it together, our test procedure is as follows

./demo > log/1.log; perf bench sched messaging -l 10000 & sleep 1;./demo > log/2.log

Analysis of results

Let's look at the results

nerd@pusf:/tmp$ egrep vruntime log/{1.log,2.log}
log/1.log:se.vruntime                                  :         22075.635863
log/1.log:se.vruntime                                  :         22076.476482
log/1.log:se.vruntime                                  :         22077.746821
log/1.log:se.vruntime                                  :         22080.537902
log/1.log:se.vruntime                                  :         22084.183713
log/1.log:se.vruntime                                  :         22087.243075
log/1.log:se.vruntime                                  :         22098.180655
log/1.log:se.vruntime                                  :         22099.594014
log/1.log:se.vruntime                                  :         22104.294012
log/1.log:se.vruntime                                  :         22108.701587
log/2.log:se.vruntime                                  :         82731.373434
log/2.log:se.vruntime                                  :         83382.975477
log/2.log:se.vruntime                                  :         78933.644191
log/2.log:se.vruntime                                  :         88235.425663
log/2.log:se.vruntime                                  :         93117.891657
log/2.log:se.vruntime                                  :        101234.834622
log/2.log:se.vruntime                                  :         95899.749367
log/2.log:se.vruntime                                  :        115403.719751
log/2.log:se.vruntime                                  :        124388.997744
log/2.log:se.vruntime                                  :        126752.972070
nerd@pusf:/tmp$

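As the output shows, the difference is significant. Over the ten samples, se.vruntime grows by about 33 (22075.64 to 22108.70) on the idle system, but by about 44,000 (82731.37 to 126752.97) under load, where it also drops twice between consecutive samples.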
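Rather than eyeballing the samples, the comparison can be reduced to a few numbers per log. The sketch below assumes each log is a concatenation of /proc/<pid>/sched dumps, as produced by the test script. The function name summarize_vruntime and the output format are made up for illustration.

```shell
#!/bin/bash
# Summarize the se.vruntime samples in one log read from stdin.
# Prints the sample count, last-minus-first growth, and how many times
# a sample was lower than the one before it.
summarize_vruntime() {
    awk -F: '
        $1 ~ /^se\.vruntime[ \t]*$/ {
            v = $2 + 0                    # numeric value of this sample
            if (n == 0) first = v
            else if (v < prev) drops++    # went backwards since last sample
            prev = v
            n++
        }
        END {
            if (n == 0) { print "no se.vruntime samples found"; exit 1 }
            printf "samples=%d growth=%.6f drops=%d\n", n, prev - first, drops
        }'
}
```

A typical call would be `summarize_vruntime < log/1.log`, followed by the same for log/2.log, comparing the growth and drops fields side by side.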

