背景
在開發過程中,偶然發現了spinand驅動的一個bug,滿懷欣喜地往社群提補丁。這是怎麼樣的一個bug呢?
static int spinand_mtd_read(struct mtd_info *mtd, loff_t from,
struct mtd_oob_ops *ops)
{
......
nanddev_io_for_each_page(nand, from, ops, &iter) {
......
ret = spinand_read_page(spinand, &iter.req, enable_ecc);
if (ret < 0 && ret != -EBADMSG) /* 讀取資料出錯 */
break;
if (ret == -EBADMSG) {
/* -EBADMSG 返回表示壞塊 */
ecc_failed = true;
mtd->ecc_stats.failed++;
ret = 0;
} else {
/* 出現位翻轉或者讀取正常,則記錄歷史位翻轉最大值 */
mtd->ecc_stats.corrected += ret;
max_bitflips = max_t(unsigned int, max_bitflips, ret);
}
ops->retlen += iter.req.datalen;
ops->oobretlen += iter.req.ooblen;
}
if (ecc_failed && !ret)
ret = -EBADMSG;
return ret ? ret : max_bitflips;
}
程式碼邏輯如下:
- 遍歷讀取每一個page
- 如果讀出錯則直接返回
- 如果出現壞塊,則置位ecc_failed,在函式最後會檢查此標誌
- 如果出現位翻轉,則暫存最大位翻轉的bit位數量
- 全部讀取完後,如果有置位ecc_failed,則返回壞塊錯誤碼;如果出現位翻轉,則返回最大位翻轉;否則返回0,表示正常
問題出在於,如果剛好最後一次讀取出現位翻轉,此時ret != 0
就直接退出迴圈,此時會導致壞塊標識無效,且返回最後的位翻轉量而非歷史位翻轉最大值。這是程式碼不嚴謹的地方。
第一次提交
修改補丁如下,補丁邏輯不再解釋。
In function spinand_mtd_read, if the last page to read occurs bitflip,
this function will return error value because veriable ret not equal to 0.
Signed-off-by: liaoweixiong <liaoweixiong@allwinnertech.com>
---
drivers/mtd/nand/spi/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/spi/core.c b/drivers/mtd/nand/spi/core.c
index 556bfdb..6b9388d 100644
--- a/drivers/mtd/nand/spi/core.c
+++ b/drivers/mtd/nand/spi/core.c
@@ -511,12 +511,12 @@ static int spinand_mtd_read(struct mtd_info *mtd, loff_t from,
if (ret == -EBADMSG) {
ecc_failed = true;
mtd->ecc_stats.failed++;
- ret = 0;
} else {
mtd->ecc_stats.corrected += ret;
max_bitflips = max_t(unsigned int, max_bitflips, ret);
}
+ ret = 0;
ops->retlen += iter.req.datalen;
ops->oobretlen += iter.req.ooblen;
}
21:13分發出的郵件,21:45分陸續收到兩個回覆:
<maintainer A>:
Actually, that's exactly what the MTD core expects (see [1]), so you're
the one introducing a regression here.
<maintainer B>:
To me it looks like the patch description is somewhat incorrect, but the
fix itself looks okay, unless I'm getting it wrong.
In case of the last page containing bitflips (ret > 0),
spinand_mtd_read() will return that number of bitflips for the last
page. But to me it looks like it should instead return max_bitflips like
it does when the last page read returns with 0.
以及隔天回覆
<maintainer A>:
Oh, you're right. liaoweixiong, can you adjust the commit message
accordingly?
好吧,問題出在與我沒把問題描述清楚,改改再提交
第二次提交
只改了comment和補丁標題:
Subject: [PATCH v2] mtd: spinand: read return badly if the last page has bitflips
In case of the last page containing bitflips (ret > 0),
spinand_mtd_read() will return that number of bitflips for the last
page. But to me it looks like it should instead return max_bitflips like
it does when the last page read returns with 0.
然後嘩啦啦收到兩個Reviewed-by
,附帶一個建議:
Reviewed-by: <maintainer B>
This should probably be resent with the following tags:
Cc: stable@vger.kernel.org
Fixes: 7529df465248 ("mtd: nand: Add core infrastructure to support SPI
NANDs")
得,再提交一次吧
第三次提交
此時的我提交補丁到社群經驗並不多,Maintainer讓我resend,我就忐忑開始胡思亂想了:
版本號需要累加麼?該怎麼標記是重新傳送?有兩個maintainer已經"認可"了我的補丁(reviewed-by),我改怎麼體現到新的郵件中?
仔細想想內容並沒改,因此不需要累加版本號;查詢前人提交,在郵件標題可以加上RESEND字樣;搜尋含RESEND字樣的前人郵件,剛好找到一個在maintainer reviewed後resend為acked,寫在signed-off-by區。
OK,確定下來就重新發吧
Subject: [RESEND PATCH v2] mtd: spinand: read return badly if the last page has bitflips
......
Signed-off-by: liaoweixiong <liaoweixiong@allwinnertech.com>
Acked-by: <maintainer A>
Acked-by: <maintainer B>
Fixes: 7529df465248 ("mtd: nand: Add core infrastructure to support SPI NANDs")
很快,就挨批了...
第四次提交
晚上10點多,收到回覆:
<maintainer B>
Why did you change our Reviewed-by tags to Acked-by tags?
額...我也是看別人這麼做我才這麼做的,大佬生氣了!趕緊補救
......
Reviewed-by: <maintainer A>
Reviewed-by: <maintainer B>
Fixes: 7529df465248 ("mtd: nand: Add core infrastructure to support SPI NANDs")
第五次提交
埋下的坑終究是要踩的,很快,再次挨批了
<maintainer C>
This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.
<maintainer A>
FYI, you should not send the patch to stable@vger.kernel.org, but
instead, as I said in my other reply, add the tag "Cc:
stable@vger.kernel.org". See "Option 1" in the document Greg referred to.
小白趕緊狠補基礎操作規範...
第六次提交
......
Reviewed-by: <maintainer A>
Reviewed-by: <maintainer B>
Cc: stable@vger.kernel.org
Fixes: 7529df465248 ("mtd: nand: Add core infrastructure to support SPI NANDs")
總結
哎,我只是挪了一行程式碼的位置而已啊,Maintainer嚴審下,我竟然提交了6次!6次!突然感覺心好累。
累歸累,問題總結還是需要的
- 新手不具備提交程式碼的一些常識,包括 a) 提交中各個tag的含義,在什麼時候加這些tag,例如Reviewed-by和Acked-by的差別 b) 提交補丁到stable的注意事項
- 對補丁的問題描述不夠仔細清楚,導致
無法理解,幸好 幫我澄清了
解決方法:
- linux提交有規範文件的,抽時間擼一遍,並翻譯發部落格
- 在發補丁之前,讓身邊的人幫忙看一下補丁說明是否足夠清晰
希望我的經歷能幫助到正在或者準備向Linux核心開源社群的小夥伴