一次向linux開源社群提交補丁的經歷

廣漠飄羽發表於2019-06-25

背景

在開發過程中,偶然發現了spinand驅動的一個bug,滿懷欣喜地往社群提補丁。這是怎麼樣的一個bug呢?

static int spinand_mtd_read(struct mtd_info *mtd, loff_t from,
                            struct mtd_oob_ops *ops)
{
        ......
        nanddev_io_for_each_page(nand, from, ops, &iter) {
                ......
                ret = spinand_read_page(spinand, &iter.req, enable_ecc);
                if (ret < 0 && ret != -EBADMSG)     /* 讀取資料出錯 */
                        break;

                if (ret == -EBADMSG) {
                        /* -EBADMSG 返回表示壞塊 */
                        ecc_failed = true;
                        mtd->ecc_stats.failed++;
                        ret = 0;
                } else {
                        /* 出現位翻轉或者讀取正常,則記錄歷史位翻轉最大值 */
                        mtd->ecc_stats.corrected += ret;
                        max_bitflips = max_t(unsigned int, max_bitflips, ret);
                }

                ops->retlen += iter.req.datalen; 
                ops->oobretlen += iter.req.ooblen;
        }

        if (ecc_failed && !ret)
                ret = -EBADMSG;

        return ret ? ret : max_bitflips;
}

程式碼邏輯如下:

  1. 遍歷讀取每一個page
  2. 如果讀出錯則直接返回
  3. 如果出現壞塊,則置位ecc_failed,在函式最後會檢查此標誌
  4. 如果出現位翻轉,則暫存最大位翻轉的bit位數量
  5. 全部讀取完後,如果有置位ecc_failed,則返回壞塊錯誤碼;如果出現位翻轉,則返回最大位翻轉;否則返回0,表示正常

問題出在於,如果剛好最後一次讀取出現位翻轉,此時ret != 0就直接退出迴圈,此時會導致壞塊標識無效,且返回最後的位翻轉量而非歷史位翻轉最大值。這是程式碼不嚴謹的地方。

第一次提交

修改補丁如下,補丁邏輯不再解釋。

In function spinand_mtd_read, if the last page to read occurs bitflip,
this function will return error value because veriable ret not equal to 0.

Signed-off-by: liaoweixiong <liaoweixiong@allwinnertech.com>
---
 drivers/mtd/nand/spi/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mtd/nand/spi/core.c b/drivers/mtd/nand/spi/core.c
index 556bfdb..6b9388d 100644
--- a/drivers/mtd/nand/spi/core.c
+++ b/drivers/mtd/nand/spi/core.c
@@ -511,12 +511,12 @@ static int spinand_mtd_read(struct mtd_info *mtd, loff_t from,
        if (ret == -EBADMSG) {
            ecc_failed = true;
            mtd->ecc_stats.failed++;
-           ret = 0;
        } else {
            mtd->ecc_stats.corrected += ret;
            max_bitflips = max_t(unsigned int, max_bitflips, ret);
        }
 
+       ret = 0;
        ops->retlen += iter.req.datalen;
        ops->oobretlen += iter.req.ooblen;
    }

21:13分發出的郵件,21:45分陸續收到兩個回覆:

<maintainer A>:

Actually, that's exactly what the MTD core expects (see [1]), so you're
the one introducing a regression here.
<maintainer B>:

To me it looks like the patch description is somewhat incorrect, but the 
fix itself looks okay, unless I'm getting it wrong.

In case of the last page containing bitflips (ret > 0), 
spinand_mtd_read() will return that number of bitflips for the last 
page. But to me it looks like it should instead return max_bitflips like 
it does when the last page read returns with 0.

以及隔天回覆

<maintainer A>:

Oh, you're right. liaoweixiong, can you adjust the commit message
accordingly?

好吧,問題出在與我沒把問題描述清楚,改改再提交

第二次提交

只改了comment和補丁標題:

Subject: [PATCH v2] mtd: spinand: read return badly if the last page has bitflips

In case of the last page containing bitflips (ret > 0), 
spinand_mtd_read() will return that number of bitflips for the last 
page. But to me it looks like it should instead return max_bitflips like 
it does when the last page read returns with 0.

然後嘩啦啦收到兩個Reviewed-by,附帶一個建議:

Reviewed-by: <maintainer B>

This should probably be resent with the following tags:

Cc: stable@vger.kernel.org
Fixes: 7529df465248 ("mtd: nand: Add core infrastructure to support SPI 
NANDs")

得,再提交一次吧

第三次提交

此時的我提交補丁到社群經驗並不多,Maintainer讓我resend,我就忐忑開始胡思亂想了:

版本號需要累加麼?該怎麼標記是重新傳送?有兩個maintainer已經"認可"了我的補丁(reviewed-by),我改怎麼體現到新的郵件中?

仔細想想內容並沒改,因此不需要累加版本號;查詢前人提交,在郵件標題可以加上RESEND字樣;搜尋含RESEND字樣的前人郵件,剛好找到一個在maintainer reviewed後resend為acked,寫在signed-off-by區。

OK,確定下來就重新發吧

Subject: [RESEND PATCH v2] mtd: spinand: read return badly if the last page has bitflips

......
Signed-off-by: liaoweixiong <liaoweixiong@allwinnertech.com>
Acked-by: <maintainer A>
Acked-by: <maintainer B>
Fixes: 7529df465248 ("mtd: nand: Add core infrastructure to support SPI NANDs")

很快,就挨批了...

第四次提交

晚上10點多,收到回覆:

<maintainer B>

Why did you change our Reviewed-by tags to Acked-by tags?

額...我也是看別人這麼做我才這麼做的,大佬生氣了!趕緊補救

......
Reviewed-by: <maintainer A>
Reviewed-by: <maintainer B>
Fixes: 7529df465248 ("mtd: nand: Add core infrastructure to support SPI NANDs")

第五次提交

埋下的坑終究是要踩的,很快,再次挨批了

<maintainer C>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.
<maintainer A>

FYI, you should not send the patch to stable@vger.kernel.org, but 
instead, as I said in my other reply, add the tag "Cc: 
stable@vger.kernel.org". See "Option 1" in the document Greg referred to.

小白趕緊狠補基礎操作規範...

第六次提交

......
Reviewed-by: <maintainer A>
Reviewed-by: <maintainer B>
Cc: stable@vger.kernel.org
Fixes: 7529df465248 ("mtd: nand: Add core infrastructure to support SPI NANDs")

總結

哎,我只是挪了一行程式碼的位置而已啊,Maintainer嚴審下,我竟然提交了6次!6次!突然感覺心好累。

累歸累,問題總結還是需要的

  1. 新手不具備提交程式碼的一些常識,包括 a) 提交中各個tag的含義,在什麼時候加這些tag,例如Reviewed-by和Acked-by的差別 b) 提交補丁到stable的注意事項
  2. 對補丁的問題描述不夠仔細清楚,導致無法理解,幸好幫我澄清了

解決方法:

  1. linux提交有規範文件的,抽時間擼一遍,並翻譯發部落格
  2. 在發補丁之前,讓身邊的人幫忙看一下補丁說明是否足夠清晰

希望我的經歷能幫助到正在或者準備向Linux核心開源社群的小夥伴

相關文章