Reading TiDB Source Code with Questions in Mind: Power BI Desktop Errors When Connecting to TiDB via the MySQL Driver

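It is often said that reading source code is a rite of passage for every good developer, but for a system as complex as TiDB, reading the source is an enormous undertaking. For TiDB users, starting from problems they run into in daily work and reading the source code backward from there is a good entry point, which is why we planned the series "Reading the Source Code with Questions in Mind".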

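This is the second article in the series. Using a case where Power BI Desktop misbehaves on TiDB, it walks through the whole process, from discovering and locating the problem to resolving it through the open-source community by filing an issue and writing a PR, troubleshooting from the perspective of the code implementation. We hope it helps you better understand the TiDB source code.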

First, let's reproduce the failure (TiDB 5.1.1 on macOS) by creating a simple table with a single column:

CREATE TABLE test(name VARCHAR(1) PRIMARY KEY);

Loading this table in Power BI Desktop works on MySQL, but on TiDB it fails with:

DataSource.Error: An error happened while reading data from the provider: 'Failed to enable constraints. One or more rows contain values violating non-null, unique, or foreign-key constraints.'
Details:

DataSourceKind=MySql
DataSourcePath=localhost:4000;test

According to TiDB's general log, the last SQL statement executed is:

select COLUMN_NAME, ORDINAL_POSITION, IS_NULLABLE, DATA_TYPE,
       case when NUMERIC_PRECISION is null then null
            when DATA_TYPE in ('FLOAT', 'DOUBLE') then 2
            else 10 end AS NUMERIC_PRECISION_RADIX,
       NUMERIC_PRECISION, NUMERIC_SCALE, CHARACTER_MAXIMUM_LENGTH, COLUMN_DEFAULT,
       COLUMN_COMMENT AS DESCRIPTION, COLUMN_TYPE
from INFORMATION_SCHEMA.COLUMNS
where table_schema = 'test' and table_name = 'test';
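To see why NUMERIC_PRECISION_RADIX can legitimately be NULL, here is a minimal Go sketch of what the CASE expression computes for a single row. The helper names numericPrecisionRadix and show are hypothetical, a nil *int64 stands for SQL NULL, and comparing DATA_TYPE case-insensitively is an assumption (the real behaviour depends on the collation in effect).

```go
package main

import (
	"fmt"
	"strings"
)

// numericPrecisionRadix is a hypothetical helper mirroring the CASE expression
//
//	case when NUMERIC_PRECISION is null then null
//	     when DATA_TYPE in ('FLOAT', 'DOUBLE') then 2
//	     else 10 end
//
// A nil *int64 stands for SQL NULL, both for the input and for the result.
func numericPrecisionRadix(numericPrecision *int64, dataType string) *int64 {
	if numericPrecision == nil {
		return nil // non-numeric columns such as VARCHAR: the result is NULL
	}
	radix := int64(10)
	// Assumption: DATA_TYPE is compared case-insensitively.
	if strings.EqualFold(dataType, "FLOAT") || strings.EqualFold(dataType, "DOUBLE") {
		radix = 2
	}
	return &radix
}

// show formats a nullable result the way a SQL client would display it.
func show(v *int64) string {
	if v == nil {
		return "NULL"
	}
	return fmt.Sprint(*v)
}

func main() {
	// The only column of table test is name VARCHAR(1), whose NUMERIC_PRECISION is NULL.
	fmt.Println("name:", show(numericPrecisionRadix(nil, "varchar")))

	// For comparison, a hypothetical BIGINT column with precision 19.
	p := int64(19)
	fmt.Println("bigint column:", show(numericPrecisionRadix(&p, "bigint")))
}
```

For the table created above, every row of this result therefore has NULL in NUMERIC_PRECISION_RADIX, so the column's metadata must allow NULL.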

We start a TiDB cluster with tiup and run this statement with tiup client, which also reports an error:

error: mysql: sql: Scan error on column index 4, name "NUMERIC_PRECISION_RADIX": converting NULL to int64 is unsupported

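So we focus on this statement. First, what does the tiup client error mean? tiup client is built on the Go library xo/usql, but the error message cannot be found in xo/usql: grepping for the keyword converting returns only a few unrelated hits. xo/usql's MySQL driver in turn depends on go-sql-driver/mysql; downloading its code and grepping for converting returns just one line in the changelog, so the error most likely does not originate there either. Skimming go-sql-driver/mysql shows that it builds on database/sql, which is part of the Go standard library, so we need the Go source code. Grepping for converting under Go's database directory quickly turns up code matching the error message: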

go/src/database/sql/convert.go

case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
        if src == nil {
                return fmt.Errorf("converting NULL to %s is unsupported", dv.Kind())
        }
        s := asString(src)
        i64, err := strconv.ParseInt(s, 10, dv.Type().Bits())
        if err != nil {
                err = strconvErr(err)
                return fmt.Errorf("converting driver.Value type %T (%q) to a %s: %v", src, s, dv.Kind(), err)
        }
        dv.SetInt(i64)
        return nil

Tracing where the destination type in this snippet comes from eventually leads us back to go-sql-driver/mysql:

mysql/fields.go

        case fieldTypeLongLong:
                if mf.flags&flagNotNULL != 0 {
                        if mf.flags&flagUnsigned != 0 {
                                return scanTypeUint64
                        }
                        return scanTypeInt64
                }
                return scanTypeNullInt

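This code parses the column definitions in the query's response and maps each column to a Go type: a LONGLONG column gets int64 (or uint64 if unsigned) only when its NOT_NULL flag is set, and the nullable integer type scanTypeNullInt otherwise. To inspect the column metadata returned for the problematic SQL, connect with mysql --host 127.0.0.1 --port 4000 -u root --column-type-info and run the query (for the MySQL comparison, point the same command at the MySQL server's port):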

MySQL

Field 5: `NUMERIC_PRECISION_RADIX`
Catalog: `def`
Database: `` 
Table: ``
Org_table: ``
Type: LONGLONG
Collation: binary (63)
Length: 3
Max_length: 0
Decimals: 0
Flags: BINARY NUM

TiDB

Field 5: `NUMERIC_PRECISION_RADIX`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: LONGLONG
Collation: binary (63)
Length: 2
Max_length: 0
Decimals: 0
Flags: NOT_NULL BINARY NUM

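The problem is plain to see: for NUMERIC_PRECISION_RADIX, the field named in the tiup client error, TiDB marks the column definition as NOT_NULL. That is clearly wrong, because the field can be NULL (for the VARCHAR column name, NUMERIC_PRECISION is NULL, so the CASE expression yields NULL), and MySQL's response reflects this. Because of the NOT_NULL flag, go-sql-driver/mysql reports int64 as the column's scan type, and when the NULL value arrives, database/sql refuses to convert it to int64, which is the error xo/usql surfaces. The Power BI error about rows violating non-null constraints is consistent with the same cause. We now know why the client fails; next we need to find out why TiDB returns a wrong column definition.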

From the TiDB Dev Guide we know roughly how a DQL statement is executed in TiDB. Starting from the entry point server/conn.go#clientConn.Run, we follow the call chain:

server/conn.go#clientConn.Run
→ server/conn.go#clientConn.dispatch
→ server/conn.go#clientConn.handleQuery
→ server/conn.go#clientConn.handleStmt
→ server/driver_tidb.go#TiDBContext.ExecuteStmt
→ session/session.go#session.ExecuteStmt
→ executor/compiler.go#Compiler.Compile
→ planner/optimize.go#Optimize
→ planner/optimize.go#optimize
→ planner/core/planbuilder.go#PlanBuilder.Build
→ planner/core/logical_plan_builder.go#PlanBuilder.buildSelect

buildSelect shows the series of steps the TiDB planner applies to a query. From there we reach planner/core/expression_rewriter.go#PlanBuilder.rewriteWithPreprocess and planner/core/expression_rewriter.go#PlanBuilder.rewriteExprNode. rewriteExprNode resolves the problematic NUMERIC_PRECISION_RADIX field, and the CASE expression is finally handled in expression/builtin_control.go#caseWhenFunctionClass.getFunction. This is where the column definition returned by the CASE expression is computed (while walking the AST produced by the parser):

    for i := 1; i < l; i += 2 {       
        fieldTps = append(fieldTps, args[i].GetType())
        decimal = mathutil.Max(decimal, args[i].GetType().Decimal)
        if args[i].GetType().Flen == -1 {
            flen = -1
        } else if flen != -1 {
            flen = mathutil.Max(flen, args[i].GetType().Flen)
        }
        isBinaryStr = isBinaryStr || types.IsBinaryStr(args[i].GetType())
        isBinaryFlag = isBinaryFlag || !types.IsNonBinaryStr(args[i].GetType())
    }
    if l%2 == 1 {
        fieldTps = append(fieldTps, args[l-1].GetType())
        decimal = mathutil.Max(decimal, args[l-1].GetType().Decimal)
        if args[l-1].GetType().Flen == -1 {
            flen = -1
        } else if flen != -1 {
            flen = mathutil.Max(flen, args[l-1].GetType().Flen)
        }
        isBinaryStr = isBinaryStr || types.IsBinaryStr(args[l-1].GetType())
        isBinaryFlag = isBinaryFlag || !types.IsNonBinaryStr(args[l-1].GetType())
    }


    fieldTp := types.AggFieldType(fieldTps)
    // Here we turn off NotNullFlag. Because if all when-clauses are false,
    // the result of case-when expr is NULL.
    types.SetTypeFlag(&fieldTp.Flag, mysql.NotNullFlag, false)
    tp := fieldTp.EvalType()


    if tp == types.ETInt {
        decimal = 0
    }
    fieldTp.Decimal, fieldTp.Flen = decimal, flen
    if fieldTp.EvalType().IsStringKind() && !isBinaryStr {
        fieldTp.Charset, fieldTp.Collate = DeriveCollationFromExprs(ctx, args...)
        if fieldTp.Charset == charset.CharsetBin && fieldTp.Collate == charset.CollationBin {
            // When args are Json and Numerical type(eg. Int), the fieldTp is String.
            // Both their charset/collation is binary, but the String need a default charset/collation.
            fieldTp.Charset, fieldTp.Collate = charset.GetDefaultCharsetAndCollate()
        }
    } else {
        fieldTp.Charset, fieldTp.Collate = charset.CharsetBin, charset.CollationBin
    }
    if isBinaryFlag {
        fieldTp.Flag |= mysql.BinaryFlag
    }
    // Set retType to BINARY(0) if all arguments are of type NULL.
    if fieldTp.Tp == mysql.TypeNull {
        fieldTp.Flen, fieldTp.Decimal = 0, types.UnspecifiedLength
        types.SetBinChsClnFlag(fieldTp)
    }
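The loop at the top of this snippet can be exercised in isolation. Below is a simplified, hypothetical re-implementation (argType, maxInt and aggregateCaseTypes are not TiDB names) applied to the CASE in our query, whose arguments are laid out as [when1, then1, when2, then2, else]. The display widths of the literals (0 for NULL, 1 for 2, 2 for 10) are assumptions for illustration; with them the aggregated length comes out as 2, consistent with the Length: 2 TiDB reports, and the binary flag ends up set because none of the branches is a non-binary string.

```go
package main

import "fmt"

// argType keeps only the parts of a field type that the loop looks at.
type argType struct {
	Flen, Decimal  int
	IsNonBinaryStr bool
}

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// aggregateCaseTypes mimics the THEN/ELSE loop of caseWhenFunctionClass.getFunction:
// odd indices are THEN results, and a trailing element (odd length) is the ELSE result.
func aggregateCaseTypes(args []argType) (flen, decimal int, binaryFlag bool) {
	visit := func(t argType) {
		decimal = maxInt(decimal, t.Decimal)
		if t.Flen == -1 {
			flen = -1 // an unknown length stays unknown
		} else if flen != -1 {
			flen = maxInt(flen, t.Flen)
		}
		binaryFlag = binaryFlag || !t.IsNonBinaryStr
	}
	l := len(args)
	for i := 1; i < l; i += 2 {
		visit(args[i])
	}
	if l%2 == 1 {
		visit(args[l-1])
	}
	return flen, decimal, binaryFlag
}

func main() {
	cond := argType{Flen: 1} // WHEN conditions are skipped by the loop
	args := []argType{
		cond, {Flen: 0}, // when NUMERIC_PRECISION is null then null (assumed width 0)
		cond, {Flen: 1}, // when DATA_TYPE in (...) then 2
		{Flen: 2},       // else 10
	}
	flen, decimal, binaryFlag := aggregateCaseTypes(args)
	fmt.Printf("flen=%d decimal=%d binaryFlag=%v\n", flen, decimal, binaryFlag)
}
```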

Reading the code that computes the column definition flags, we see that whatever the CASE expression looks like, the NOT_NULL flag is always cleared, so the bug is not here. We therefore have to walk back along the code path above and check whether the column definition is modified later. In server/conn.go#clientConn.handleStmt we find a call to server/conn.go#clientConn.writeResultSet, which in turn calls server/conn.go#clientConn.writeChunks, server/conn.go#clientConn.writeColumnInfo, server/column.go#ColumnInfo.Dump and server/column.go#dumpFlag. In dumpFlag, the previously computed column definition flag is modified:

func dumpFlag(tp byte, flag uint16) uint16 {
    switch tp {
    case mysql.TypeSet:
        return flag | uint16(mysql.SetFlag)
    case mysql.TypeEnum:
        return flag | uint16(mysql.EnumFlag)
    default:
        if mysql.HasBinaryFlag(uint(flag)) {
            return flag | uint16(mysql.NotNullFlag)
        }
        return flag
    }
}
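To confirm that this is where NOT_NULL reappears, dumpFlag can be replayed with local constants: the MySQL protocol values NOT_NULL = 1, BINARY = 128, ENUM = 256, SET = 2048 and the type codes LONGLONG = 8, ENUM = 247, SET = 248, which TiDB's mysql package mirrors. The input models the flags getFunction produces for our CASE expression (BinaryFlag set, NotNullFlag cleared). This is an illustrative copy of the quoted logic, not TiDB's code.

```go
package main

import "fmt"

// Flag bits and type codes from the MySQL protocol.
const (
	notNullFlag uint16 = 1
	binaryFlag  uint16 = 128
	enumFlag    uint16 = 256
	setFlag     uint16 = 2048

	typeLonglong byte = 8
	typeEnum     byte = 247
	typeSet      byte = 248
)

// dumpFlag replays the TiDB 5.1 logic quoted above with local constants.
func dumpFlag(tp byte, flag uint16) uint16 {
	switch tp {
	case typeSet:
		return flag | setFlag
	case typeEnum:
		return flag | enumFlag
	default:
		if flag&binaryFlag != 0 {
			// Every binary-flagged column is reported as NOT NULL, nullable or not.
			return flag | notNullFlag
		}
		return flag
	}
}

func main() {
	// Flags computed by caseWhenFunctionClass.getFunction for our CASE expression.
	planned := binaryFlag
	sent := dumpFlag(typeLonglong, planned)
	fmt.Printf("planned NOT NULL: %v, sent NOT NULL: %v\n",
		planned&notNullFlag != 0, sent&notNullFlag != 0)
}
```

The planner's correct "nullable" result is thus overwritten at the last moment, when the column definition is serialized for the client.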

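At last we have found why TiDB returns a wrong column definition: the CASE branches in our query are integers rather than non-binary strings, so getFunction sets BinaryFlag, and dumpFlag then adds NotNullFlag to every column (other than SET and ENUM) that carries BinaryFlag. This bug has in fact already been fixed in TiDB 5.2.0, the latest release at the time of writing: *: fix some problems related to notNullFlag by wjhuang2016 · Pull Request #27697 · pingcap/tidb.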

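Finally, while reading the code above, it helps a lot to see what the AST produced by TiDB's parser looks like, so that we are not groping in the dark when following the AST traversal at the end. The TiDB Dev Guide has a parser chapter explaining how to debug the parser, and parser/quickstart.md at master · pingcap/parser also shows sample output of a generated AST, but simply printing the node is of little help. Instead, we can dump the node produced by the parser with davecgh/go-spew, which gives a tree a human can actually read: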

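package main

import (
        "fmt"

        "github.com/davecgh/go-spew/spew"
        "github.com/pingcap/parser"
        "github.com/pingcap/parser/ast"
        // Blank import, as in the pingcap/parser quickstart: it provides the
        // value-expression implementation the parser needs.
        _ "github.com/pingcap/parser/test_driver"
)

// parse parses sql and returns the first statement node.
func parse(sql string) (*ast.StmtNode, error) {
        p := parser.New()
        // Charset and collation are left empty, as in the quickstart example.
        stmtNodes, _, err := p.Parse(sql, "", "")
        if err != nil {
                return nil, err
        }
        if len(stmtNodes) == 0 {
                return nil, fmt.Errorf("no statement found in %q", sql)
        }
        return &stmtNodes[0], nil
}

func main() {
        // Indent nested fields with four spaces to keep the dump readable.
        spew.Config.Indent = "    "
        astNode, err := parse("SELECT a, b FROM t")
        if err != nil {
                fmt.Printf("parse error: %v\n", err)
                return
        }
        // spew prints every field of the node recursively, with types and pointers.
        fmt.Printf("%s\n", spew.Sdump(*astNode))
}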

Running it prints the following (pointer addresses will differ between runs):

(*ast.SelectStmt)(0x140001dac30)({
    dmlNode: (ast.dmlNode) {
        stmtNode: (ast.stmtNode) {
            node: (ast.node) {
                text: (string) (len=18) "SELECT a, b FROM t"
            }
        }
    },
    resultSetNode: (ast.resultSetNode) {
        resultFields: ([]*ast.ResultField) <nil>
    },
    SelectStmtOpts: (*ast.SelectStmtOpts)(0x14000115bc0)({
        Distinct: (bool) false,
        SQLBigResult: (bool) false,
        SQLBufferResult: (bool) false,
        SQLCache: (bool) true,
        SQLSmallResult: (bool) false,
        CalcFoundRows: (bool) false,
        StraightJoin: (bool) false,
        Priority: (mysql.PriorityEnum) 0,
        TableHints: ([]*ast.TableOptimizerHint) <nil>
    }),
    Distinct: (bool) false,
    From: (*ast.TableRefsClause)(0x140001223c0)({
        node: (ast.node) {
            text: (string) ""
        },
        TableRefs: (*ast.Join)(0x14000254100)({
            node: (ast.node) {
                text: (string) ""
            },
            resultSetNode: (ast.resultSetNode) {
                resultFields: ([]*ast.ResultField) <nil>
            },
            Left: (*ast.TableSource)(0x14000156480)({
                node: (ast.node) {
                    text: (string) ""
                },
                Source: (*ast.TableName)(0x1400013a370)({
                    node: (ast.node) {
                        text: (string) ""
                    },
                    resultSetNode: (ast.resultSetNode) {
                        resultFields: ([]*ast.ResultField) <nil>
                    },
                    Schema: (model.CIStr) ,
                    Name: (model.CIStr) t,
                    DBInfo: (*model.DBInfo)(<nil>),
                    TableInfo: (*model.TableInfo)(<nil>),
                    IndexHints: ([]*ast.IndexHint) <nil>,
                    PartitionNames: ([]model.CIStr) {
                    }
                }),
                AsName: (model.CIStr)
            }),
            Right: (ast.ResultSetNode) <nil>,
            Tp: (ast.JoinType) 0,
            On: (*ast.OnCondition)(<nil>),
            Using: ([]*ast.ColumnName) <nil>,
            NaturalJoin: (bool) false,
            StraightJoin: (bool) false
        })
    }),
    Where: (ast.ExprNode) <nil>,
    Fields: (*ast.FieldList)(0x14000115bf0)({
        node: (ast.node) {
            text: (string) ""
        },
        Fields: ([]*ast.SelectField) (len=2 cap=2) {
            (*ast.SelectField)(0x140001367e0)({
                node: (ast.node) {
                    text: (string) (len=1) "a"
                },
                Offset: (int) 7,
                WildCard: (*ast.WildCardField)(<nil>),
                Expr: (*ast.ColumnNameExpr)(0x14000254000)({
                    exprNode: (ast.exprNode) {
                        node: (ast.node) {
                            text: (string) ""
                        },
                        Type: (types.FieldType) unspecified,
                        flag: (uint64) 8
                    },
                    Name: (*ast.ColumnName)(0x1400017dc70)(a),
                    Refer: (*ast.ResultField)(<nil>)
                }),
                AsName: (model.CIStr) ,
                Auxiliary: (bool) false
            }),
            (*ast.SelectField)(0x14000136840)({
                node: (ast.node) {
                    text: (string) (len=1) "b"
                },
                Offset: (int) 10,
                WildCard: (*ast.WildCardField)(<nil>),
                Expr: (*ast.ColumnNameExpr)(0x14000254080)({
                    exprNode: (ast.exprNode) {
                        node: (ast.node) {
                            text: (string) ""
                        },
                        Type: (types.FieldType) unspecified,
                        flag: (uint64) 8
                    },
                    Name: (*ast.ColumnName)(0x1400017dce0)(b),
                    Refer: (*ast.ResultField)(<nil>)
                }),
                AsName: (model.CIStr) ,
                Auxiliary: (bool) false
            })
        }
    }),
    GroupBy: (*ast.GroupByClause)(<nil>),
    Having: (*ast.HavingClause)(<nil>),
    WindowSpecs: ([]ast.WindowSpec) <nil>,
    OrderBy: (*ast.OrderByClause)(<nil>),
    Limit: (*ast.Limit)(<nil>),
    LockTp: (ast.SelectLockType) none,
    TableHints: ([]*ast.TableOptimizerHint) <nil>,
    IsAfterUnionDistinct: (bool) false,
    IsInBraces: (bool) false,
    QueryBlockOffset: (int) 0,
    SelectIntoOpt: (*ast.SelectIntoOption)(<nil>)
})
