Hardware platform: an ARM SoC
Software platform: Linux
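1 Introduction to Runtime PM
Before introducing Runtime PM, it helps to look at traditional power management. The traditional mechanism is called System PM (System Suspend & Resume): when the whole system is about to sleep, the suspend function of each driver is called in turn. It is coarse-grained power management, and its execution path is relatively simple.
Runtime PM is power management at run time. Each device (including on-chip blocks) handles its own power management: it enters a low-power state whenever it has no work to do and comes back up when it is needed. So even when the system as a whole is not asleep, a device can decide from its actual workload whether to enter a low-power state, saving as much power as possible.
In code, when the device is needed, call pm_runtime_get_sync to runtime-resume it; when the work is done, call pm_runtime_put to let it runtime-suspend. In pseudocode: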
    senddata()
    {
        pm_runtime_get_sync();
        /* do something ... */
        pm_runtime_put();
    }

    recvdata()
    {
        pm_runtime_get_sync();
        /* do something ... */
        pm_runtime_put();
    }
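pm_runtime_get_sync and pm_runtime_put maintain a usage (reference) count: pm_runtime_get_sync increments it and pm_runtime_put decrements it. Only when the count drops to 0 is the device actually allowed to enter its low-power state.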
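The counting rule can be sketched with a small user-space model. This is not kernel code: the struct and the model_* names are hypothetical, and the model suspends immediately when the count reaches 0, whereas the real pm_runtime_put only queues an asynchronous idle request.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of a runtime-PM usage counter. */
struct model_dev {
    int usage_count;    /* outstanding "get" calls */
    bool powered;       /* true while the modelled device is resumed */
    int resume_calls;   /* how often the device was powered up */
    int suspend_calls;  /* how often the device was powered down */
};

/* Models pm_runtime_get_sync: bump the count, resume on the 0 -> 1 edge. */
void model_get_sync(struct model_dev *dev)
{
    if (dev->usage_count++ == 0 && !dev->powered) {
        dev->powered = true;
        dev->resume_calls++;
    }
}

/* Models pm_runtime_put: drop the count, suspend on the 1 -> 0 edge. */
void model_put(struct model_dev *dev)
{
    assert(dev->usage_count > 0);   /* an unbalanced put is a driver bug */
    if (--dev->usage_count == 0) {
        dev->powered = false;
        dev->suspend_calls++;
    }
}
```

With two overlapping users, the modelled device is resumed once and stays powered until the second put, which is the behaviour the paragraph above describes.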
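The concept is intuitive: from a device's point of view, whoever needs me to work does a get on me, and otherwise does a put. The API has quite a few functions, though. Since they are not the focus of this article, I won't go through them one by one; the commonly used ones are: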
- pm_runtime_get_sync // request the device
- pm_runtime_put // release the device
- pm_runtime_use_autosuspend // enable autosuspend
- pm_runtime_set_autosuspend_delay // set how long to wait before autosuspend
- pm_runtime_put_autosuspend // release with autosuspend
- pm_runtime_mark_last_busy // restart the autosuspend delay by updating the last-busy time
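How the autosuspend calls interact can be illustrated with another user-space model. Everything here is an assumption made for illustration: the as_* names are hypothetical, time is passed in explicitly, and as_tick stands in for the kernel's autosuspend timer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of autosuspend; not the kernel API. */
struct as_dev {
    int usage_count;
    bool powered;
    long delay_ms;      /* plays the role of the autosuspend delay */
    long last_busy_ms;  /* timestamp recorded by as_mark_last_busy */
};

void as_get_sync(struct as_dev *dev)
{
    dev->usage_count++;
    dev->powered = true;
}

/* Models pm_runtime_mark_last_busy: remember when the device was last used. */
void as_mark_last_busy(struct as_dev *dev, long now_ms)
{
    dev->last_busy_ms = now_ms;
}

/* Models pm_runtime_put_autosuspend: drop the count but stay powered. */
void as_put_autosuspend(struct as_dev *dev)
{
    assert(dev->usage_count > 0);
    dev->usage_count--;
}

/* Models the timer: suspend only if idle for at least delay_ms. */
void as_tick(struct as_dev *dev, long now_ms)
{
    if (dev->powered && dev->usage_count == 0 &&
        now_ms - dev->last_busy_ms >= dev->delay_ms)
        dev->powered = false;
}
```

In this model, a burst of transfers that each mark the device busy keeps pushing the suspend point out, so the device is not powered down and up between back-to-back requests.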
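A device driver must choose carefully when to call Runtime PM; otherwise it may cause power-consumption problems or system faults.
- If the control is too fine-grained, for example wrapping a register read/write helper so that every single register access does a get followed by a put, the overhead is too high.
- If the control is too coarse-grained, for example doing a get when the driver comes up and a put only at system suspend, it is little different from traditional power management and Runtime PM loses its point.
- If get and put are not called in pairs, for example more gets than puts, the device can never enter its low-power state.
- If put is called at the wrong time, so that code still accesses the device after it has been powered down, a fault may occur.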
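The granularity trade-off in the list above can be sketched in a user-space model. The gr_* names are hypothetical, and gr_write returns false where real hardware would raise a fault on access to an unpowered block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; not kernel code. */
struct gr_dev {
    int usage_count;
    bool powered;
    int resume_calls;   /* counts power-up transitions */
    uint32_t reg;       /* a single modelled register */
};

void gr_get(struct gr_dev *d)
{
    if (d->usage_count++ == 0) {
        d->powered = true;
        d->resume_calls++;
    }
}

void gr_put(struct gr_dev *d)
{
    assert(d->usage_count > 0);
    if (--d->usage_count == 0)
        d->powered = false;
}

/* Register write that refuses to touch an unpowered device. */
bool gr_write(struct gr_dev *d, uint32_t val)
{
    if (!d->powered)
        return false;
    d->reg = val;
    return true;
}

/* Too fine: one power cycle per register access. */
void gr_transfer_fine(struct gr_dev *d, const uint32_t *words, int n)
{
    for (int i = 0; i < n; i++) {
        gr_get(d);
        gr_write(d, words[i]);
        gr_put(d);
    }
}

/* Per transfer: one power cycle covers all register accesses. */
void gr_transfer_coarse(struct gr_dev *d, const uint32_t *words, int n)
{
    gr_get(d);
    for (int i = 0; i < n; i++)
        gr_write(d, words[i]);
    gr_put(d);
}
```

For a four-word transfer the fine-grained variant powers the modelled device up four times and the per-transfer variant once; a write after the final put is rejected, which corresponds to the last bullet.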
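2 Problem Cases: Kernel Panic: external abort on non-linefetch
A common cause of "external abort on non-linefetch" is reading or writing a register of an on-chip block whose power and clock have not been turned on yet.
Case 1: an error is reported when testing SPI with the user-space spidev_test program.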
    [   86.901554] c2 Unhandled fault: external abort on non-linefetch (0x1008) at 0xe999a008
    [   86.909373] c2 pgd = 6fa82014
    [   86.912315] c2 [e999a008] *pgd=a80e1811, *pte=70a00653, *ppte=70a00453
    [   86.918813] c2 Internal error: : 1008 [#1] PREEMPT SMP ARM
    [   86.942798] c2 CPU: 2 PID: 2653 Comm: spidev_test Tainted: G O 4.14.133+ #10
    [   86.950923] c2 Hardware name: Generic DT based system
    [   86.955945] c2 task: 434d36b9 task.stack: a6b02495
    [   86.960713] c2 PC is at foo_spi_chipselect+0x8c/0xdc
    [   86.965721] c2 LR is at 0xe999a000
    [   86.969095] c2 pc : [<c0671184>] lr : [<e999a000>] psr: a00f0013
    [   86.975590] c2 sp : c3a07db8 ip : 00000000 fp : c3a07dd4
    [   86.981039] c2 r10: 00000036 r9 : 00000003 r8 : 00000196
    [   86.986489] c2 r7 : 00000196 r6 : e999a008 r5 : c3a07db8 r4 : c067113c
    [   86.993241] c2 r3 : c11adb28 r2 : e9898030 r1 : 00000030 r0 : e9898000
    [   86.999993] c2 Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
    [   87.007349] c2 Control: 10c5383d Table: 8531806a DAC: 00000051
    [   87.013319] c2 Process spidev_test (pid: 2653, stack limit = 0xd6c1587c)
    ...
    [   87.184082] c2 [<c0671184>] (foo_spi_chipselect) from [<c066b4c0>] (spi_set_cs+0x8c/0x90)
    [   87.192294] c2 [<c066b4c0>] (spi_set_cs) from [<c066df88>] (spi_setup+0x128/0x1dc)
    [   87.199814] c2 [<c066df88>] (spi_setup) from [<c066fa84>] (spidev_ioctl+0x26c/0x84c)
    [   87.207521] c2 [<c066fa84>] (spidev_ioctl) from [<c02cda60>] (vfs_ioctl+0x28/0x44)
    [   87.215048] c2 [<c02cda60>] (vfs_ioctl) from [<c02ce370>] (do_vfs_ioctl+0x7a8/0x900)
    [   87.222748] c2 [<c02ce370>] (do_vfs_ioctl) from [<c02ce524>] (SyS_ioctl+0x5c/0x84)
    [   87.230276] c2 [<c02ce524>] (SyS_ioctl) from [<c0108760>] (ret_fast_syscall+0x0/0x28)
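The call stack shows that the fault occurred in the SPI driver's foo_spi_chipselect function while spidev_test was talking to the SPI driver through an ioctl. The function is shown below: it reads and writes the SPI controller's registers, but does not call pm_runtime_get first to resume the controller.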
    static void foo_spi_chipselect(struct spi_device *sdev, bool cs)
    {
        struct spi_controller *sctlr = sdev->controller;
        struct foo_spi *ss = spi_controller_get_devdata(sctlr);
        u32 val;

        val = readl_relaxed(ss->base + FOO_SPI_CTL0);
        /* The SPI controller will pull down CS pin if cs is 0 */
        if (!cs) {
            val &= ~FOO_SPI_CS0_VALID;
            writel_relaxed(val, ss->base + FOO_SPI_CTL0);
        } else {
            val |= FOO_SPI_CSN_MASK;
            writel_relaxed(val, ss->base + FOO_SPI_CTL0);
        }
    }
The problem can be fixed by changing the code as follows.
     static void foo_spi_chipselect(struct spi_device *sdev, bool cs)
     {
         struct spi_controller *sctlr = sdev->controller;
         struct foo_spi *ss = spi_controller_get_devdata(sctlr);
         u32 val;

    +    pm_runtime_get_sync(ss->dev);
         val = readl_relaxed(ss->base + FOO_SPI_CTL0);
         /* The SPI controller will pull down CS pin if cs is 0 */
         if (!cs) {
             val &= ~FOO_SPI_CS0_VALID;
             writel_relaxed(val, ss->base + FOO_SPI_CTL0);
         } else {
             val |= FOO_SPI_CSN_MASK;
             writel_relaxed(val, ss->base + FOO_SPI_CTL0);
         }
    +    pm_runtime_mark_last_busy(ss->dev);
    +    pm_runtime_put_autosuspend(ss->dev);
     }
However, newer kernels (after October 2019) fix this in the SPI core itself, so the chip vendor's controller driver does not need to be changed.
    diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
    index f9502db..19007e0 100644
    --- a/drivers/spi/spi.c
    +++ b/drivers/spi/spi.c
    @@ -3091,7 +3091,20 @@ int spi_setup(struct spi_device *spi)
         if (spi->controller->setup)
             status = spi->controller->setup(spi);

    -    spi_set_cs(spi, false);
    +    if (spi->controller->auto_runtime_pm && spi->controller->set_cs) {
    +        status = pm_runtime_get_sync(spi->controller->dev.parent);
    +        if (status < 0) {
    +            pm_runtime_put_noidle(spi->controller->dev.parent);
    +            dev_err(&spi->controller->dev, "Failed to power device: %d\n",
    +                status);
    +            return status;
    +        }
    +        spi_set_cs(spi, false);
    +        pm_runtime_mark_last_busy(spi->controller->dev.parent);
    +        pm_runtime_put_autosuspend(spi->controller->dev.parent);
    +    } else {
    +        spi_set_cs(spi, false);
    +    }

         if (spi->rt && !spi->controller->rt) {
             spi->controller->rt = true;
For details, see https://lore.kernel.org/linux-arm-kernel/1572426234-30019-1-git-send-email-luhua.xu@mediatek.com/
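The SPI core patch calls pm_runtime_put_noidle on the error path because pm_runtime_get_sync increments the usage count even when the resume fails. The following user-space model sketches that balance; the eh_* names and the resume_fails flag are hypothetical, and the register access is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the error path; not kernel code. */
struct eh_dev {
    int usage_count;
    bool powered;
    bool resume_fails;  /* test knob: make the modelled resume fail */
};

/* Models pm_runtime_get_sync: the count is bumped even if resume fails. */
int eh_get_sync(struct eh_dev *d)
{
    d->usage_count++;
    if (d->powered)
        return 1;       /* already active */
    if (d->resume_fails)
        return -1;      /* resume error, count still incremented */
    d->powered = true;
    return 0;
}

/* Models pm_runtime_put_noidle: drop the count, nothing else. */
void eh_put_noidle(struct eh_dev *d)
{
    assert(d->usage_count > 0);
    d->usage_count--;
}

/* Models pm_runtime_put: drop the count and suspend at zero. */
void eh_put(struct eh_dev *d)
{
    assert(d->usage_count > 0);
    if (--d->usage_count == 0)
        d->powered = false;
}

/* Same shape as the patched spi_setup(): check, balance, then access. */
int eh_set_cs(struct eh_dev *d)
{
    int ret = eh_get_sync(d);

    if (ret < 0) {
        eh_put_noidle(d);   /* undo the increment from the failed get */
        return ret;
    }
    /* the register access would go here, with the device powered */
    eh_put(d);
    return 0;
}
```

Without the put_noidle step, every failed resume would leak one count in this model, and the device could never reach a usage count of 0 again.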
Case 2: a fault appears during repeated power-on/power-off testing with USB in host mode.
    [   11.616956] c0 Unhandled fault: external abort on non-linefetch (0x1008) at 0xd02d4001
    [   11.624774] c0 pgd = c0004000
    [   11.627708] [d02d4001] *pgd=8f69a811, *pte=20200653, *ppte=20200453
    [   11.633944] c0 Internal error: : 1008 [#1] PREEMPT SMP ARM
    [   11.639390] Modules linked in:
    [   11.642424] c0 CPU: 0 PID: 161 Comm: kworker/0:3 Not tainted 4.4.83 #1
    [   11.648909] c0 Hardware name: Generic DT based system
    [   11.653940] Workqueue: events musb_deassert_reset
    [   11.658601] c0 task: ce854780 task.stack: ce8c6000
    [   11.663364] c0 PC is at musb_default_readb+0x54/0x9c
The cause is similar to case 1: musb_deassert_reset accessed the USB controller's registers while the controller was shut down. A first attempt at a fix:
     static void musb_deassert_reset(struct work_struct *work)
     {
         struct musb *musb;
         unsigned long flags;

         musb = container_of(work, struct musb, deassert_reset_work.work);

    +    pm_runtime_get_sync(musb->controller);
         spin_lock_irqsave(&musb->lock, flags);

         if (musb->port1_status & USB_PORT_STAT_RESET)
             musb_port_reset(musb, false);

         spin_unlock_irqrestore(&musb->lock, flags);
    +    pm_runtime_put(musb->controller);
     }
With this change the error path above no longer reproduces, but a new one appears, which shows that the fix is incomplete.
    [   13.364606] c0 Unhandled fault: external abort on non-linefetch (0x1008) at 0xd02d4001
    [   13.372418] c0 pgd = c0004000
    [   13.375359] [d02d4001] *pgd=8f69a811, *pte=20200653, *ppte=20200453
    [   13.381595] c0 Internal error: : 1008 [#1] PREEMPT SMP ARM
    [   13.387042] Modules linked in:
    [   13.390075] c0 CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.4.83 #1
    [   13.396388] c0 Hardware name: Generic DT based system
    [   13.401417] Workqueue: usb_hub_wq hub_event
    [   13.405562] c0 task: cf471380 task.stack: cf490000
    [   13.410326] c0 PC is at musb_default_readb+0x54/0x9c
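Going through the musb code (drivers/usb/musb/*) shows that the musb_gadget code handles runtime PM, while the musb_host code does not. The final solution was to complete the runtime PM handling in the USB controller driver, without modifying the common musb code. The details depend on how that vendor's USB controller driver is implemented and are not generally applicable, so they are not shown here.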
-------------------------------------------------------
Author: bigfish99
Blog: https://www.cnblogs.com/bigfish0506/
WeChat official account: 大魚嵌入式