[Apache Doris] Apache Doris 後設資料設計及DDL操作原始碼閱讀

chaplinthink發表於2021-11-18

後設資料設計

image

如上圖,Doris 的後設資料主要儲存4類資料:

  • 使用者資料資訊。包括資料庫、表的 Schema、分片資訊等。
  • 各類作業資訊。如匯入作業,Clone 作業、SchemaChange 作業等。
  • 使用者及許可權資訊
  • 叢集及節點資訊

image

後設資料目錄

後設資料目錄通過 FE 的配置項 meta_dir 指定。
bdb/ 目錄下為 bdbje 的資料存放目錄。
image/ 目錄下為 image 檔案的存放目錄。
image.[logid] 是最新的 image 檔案。字尾 logid 表明 image 所包含的最後一條日誌的 id。

image.ckpt 是正在寫入的 image 檔案,如果寫入成功,會重新命名為 image.[logid],並替換掉舊的 image 檔案。

VERSION 檔案中記錄著 cluster_id。cluster_id 唯一標識一個 Doris 叢集。是在 leader 第一次啟動時隨機生成的一個 32 位整型。也可以通過 fe 配置項 cluster_id 來指定一個 cluster id。

ROLE 檔案中記錄的 FE 自身的角色。只有 FOLLOWER 和 OBSERVER 兩種。其中 FOLLOWER 表示 FE 為一個可選舉的節點。(注意:即使是 leader 節點,其角色也為 FOLLOWER)

DDL相關原始碼閱讀

啟動MySQL服務

org.apache.doris.qe.QeService

if (nioEnabled) {
    mysqlServer = new NMysqlServer(port, scheduler);
} else {
    mysqlServer = new MysqlServer(port, scheduler);
}

DDL程式碼呼叫過程

image

org.apache.doris.qe.ConnectProcessor#dispatch 命令識別

switch (command) {
    case COM_INIT_DB:
        handleInitDb();
        break;
    case COM_QUIT:
        handleQuit();
        break;
    case COM_QUERY:
        handleQuery();
        break;
    case COM_FIELD_LIST:
        handleFieldList();
        break;
    case COM_PING:
        handlePing();
        break;
    default:
        ctx.getState().setError("Unsupported command(" + command + ")");
        LOG.warn("Unsupported command(" + command + ")");
        break;

org.apache.doris.qe.ConnectProcessor#analyze 詞法語法解析

// Parse statement with parser generated by CUP&FLEX
SqlScanner input = new SqlScanner(new StringReader(originStmt), ctx.getSessionVariable().getSqlMode());
SqlParser parser = new SqlParser(input);

從連線中讀取原始語句字串
詞法解析檔案
• fe/fe-core/src/main/jflex/sql_scanner.flex
• 語法解析檔案
• fe/fe-core/src/main/cup/sql_parser.cup

所有語法實現類:

StatementBase     [vim org/apache/doris/analysis/StatementBase.java +33]
      ├── ExportStmt    [vim org/apache/doris/analysis/ExportStmt.java +48]
      ├── ImportColumnsStmt     [vim org/apache/doris/analysis/ImportColumnsStmt.java +21]
      ├── ImportDeleteOnStmt    [vim org/apache/doris/analysis/ImportDeleteOnStmt.java +19]
      ├── ImportSequenceStmt    [vim org/apache/doris/analysis/ImportSequenceStmt.java +19]
      ├── ImportWhereStmt       [vim org/apache/doris/analysis/ImportWhereStmt.java +19]
      ├── KillStmt      [vim org/apache/doris/analysis/KillStmt.java +19]
      ├── SetStmt       [vim org/apache/doris/analysis/SetStmt.java +24]
      ├── UseStmt       [vim org/apache/doris/analysis/UseStmt.java +33]
      ├── QueryStmt     [vim org/apache/doris/analysis/QueryStmt.java +38]
      │   ├── SelectStmt        [vim org/apache/doris/analysis/SelectStmt.java +65]
      │   └── SetOperationStmt  [vim org/apache/doris/analysis/SetOperationStmt.java +36]
      ├── ShowStmt      [vim org/apache/doris/analysis/ShowStmt.java +22]
      │   ├── AdminShowConfigStmt       [vim org/apache/doris/analysis/AdminShowConfigStmt.java +33]
      │   ├── AdminShowDataSkewStmt     [vim org/apache/doris/analysis/AdminShowDataSkewStmt.java +32]
      │   ├── AdminShowReplicaDistributionStmt  [vim org/apache/doris/analysis/AdminShowReplicaDistributionStmt.java +34]
      │   ├── AdminShowReplicaStatusStmt        [vim org/apache/doris/analysis/AdminShowReplicaStatusStmt.java +39]
      │   ├── DescribeStmt      [vim org/apache/doris/analysis/DescribeStmt.java +54]
      │   ├── HelpStmt  [vim org/apache/doris/analysis/HelpStmt.java +26]
      │   ├── ShowAlterStmt     [vim org/apache/doris/analysis/ShowAlterStmt.java +46]
      │   ├── ShowAuthorStmt    [vim org/apache/doris/analysis/ShowAuthorStmt.java +23]
      │   ├── ShowBackendsStmt  [vim org/apache/doris/analysis/ShowBackendsStmt.java +30]
      │   ├── ShowBackupStmt    [vim org/apache/doris/analysis/ShowBackupStmt.java +38]
      │   ├── ShowBrokerStmt    [vim org/apache/doris/analysis/ShowBrokerStmt.java +30]
      │   ├── ShowCharsetStmt   [vim org/apache/doris/analysis/ShowCharsetStmt.java +23]
      │   ├── ShowClusterStmt   [vim org/apache/doris/analysis/ShowClusterStmt.java +34]
      │   ├── ShowCollationStmt [vim org/apache/doris/analysis/ShowCollationStmt.java +24]
      │   ├── ShowColumnStatsStmt       [vim org/apache/doris/analysis/ShowColumnStatsStmt.java +28]
      │   ├── ShowColumnStmt    [vim org/apache/doris/analysis/ShowColumnStmt.java +28]
      │   ├── ShowCreateDbStmt  [vim org/apache/doris/analysis/ShowCreateDbStmt.java +36]
      │   ├── ShowCreateFunctionStmt    [vim org/apache/doris/analysis/ShowCreateFunctionStmt.java +32]
      │   ├── ShowCreateRoutineLoadStmt [vim org/apache/doris/analysis/ShowCreateRoutineLoadStmt.java +24]
      │   ├── ShowCreateTableStmt       [vim org/apache/doris/analysis/ShowCreateTableStmt.java +29]
      │   ├── ShowDataStmt      [vim org/apache/doris/analysis/ShowDataStmt.java +56]
      │   ├── ShowDbIdStmt      [vim org/apache/doris/analysis/ShowDbIdStmt.java +29]
      │   ├── ShowDbStmt        [vim org/apache/doris/analysis/ShowDbStmt.java +27]
      │   ├── ShowDeleteStmt    [vim org/apache/doris/analysis/ShowDeleteStmt.java +31]
      │   ├── ShowDynamicPartitionStmt  [vim org/apache/doris/analysis/ShowDynamicPartitionStmt.java +29]
      │   ├── ShowEncryptKeysStmt       [vim org/apache/doris/analysis/ShowEncryptKeysStmt.java +32]
      │   ├── ShowEnginesStmt   [vim org/apache/doris/analysis/ShowEnginesStmt.java +23]
      │   ├── ShowEventsStmt    [vim org/apache/doris/analysis/ShowEventsStmt.java +23]
      │   ├── ShowExportStmt    [vim org/apache/doris/analysis/ShowExportStmt.java +40]
      │   ├── ShowFrontendsStmt [vim org/apache/doris/analysis/ShowFrontendsStmt.java +30]
      │   ├── ShowFunctionsStmt [vim org/apache/doris/analysis/ShowFunctionsStmt.java +32]
      │   ├── ShowGrantsStmt    [vim org/apache/doris/analysis/ShowGrantsStmt.java +32]
      │   ├── ShowIndexStmt     [vim org/apache/doris/analysis/ShowIndexStmt.java +32]
      │   ├── ShowLoadProfileStmt       [vim org/apache/doris/analysis/ShowLoadProfileStmt.java +27]
      │   ├── ShowLoadStmt      [vim org/apache/doris/analysis/ShowLoadStmt.java +42]
      │   ├── ShowLoadWarningsStmt      [vim org/apache/doris/analysis/ShowLoadWarningsStmt.java +36]
      │   ├── ShowMigrationsStmt        [vim org/apache/doris/analysis/ShowMigrationsStmt.java +31]
      │   ├── ShowOpenTableStmt [vim org/apache/doris/analysis/ShowOpenTableStmt.java +23]
      │   ├── ShowPartitionIdStmt       [vim org/apache/doris/analysis/ShowPartitionIdStmt.java +29]
      │   ├── ShowPartitionsStmt        [vim org/apache/doris/analysis/ShowPartitionsStmt.java +49]
      │   ├── ShowPluginsStmt   [vim org/apache/doris/analysis/ShowPluginsStmt.java +23]
      │   ├── ShowProcStmt      [vim org/apache/doris/analysis/ShowProcStmt.java +32]
      │   ├── ShowProcedureStmt [vim org/apache/doris/analysis/ShowProcedureStmt.java +23]
      │   ├── ShowProcesslistStmt       [vim org/apache/doris/analysis/ShowProcesslistStmt.java +24]
      │   ├── ShowQueryProfileStmt      [vim org/apache/doris/analysis/ShowQueryProfileStmt.java +27]
      │   ├── ShowRepositoriesStmt      [vim org/apache/doris/analysis/ShowRepositoriesStmt.java +25]
      │   ├── ShowResourcesStmt [vim org/apache/doris/analysis/ShowResourcesStmt.java +37]
      │   ├── ShowRestoreStmt   [vim org/apache/doris/analysis/ShowRestoreStmt.java +38]
      │   ├── ShowRolesStmt     [vim org/apache/doris/analysis/ShowRolesStmt.java +29]
      │   ├── ShowRollupStmt    [vim org/apache/doris/analysis/ShowRollupStmt.java +28]
      │   ├── ShowRoutineLoadStmt       [vim org/apache/doris/analysis/ShowRoutineLoadStmt.java +34]
      │   ├── ShowRoutineLoadTaskStmt   [vim org/apache/doris/analysis/ShowRoutineLoadTaskStmt.java +32]
      │   ├── ShowSmallFilesStmt        [vim org/apache/doris/analysis/ShowSmallFilesStmt.java +32]
      │   ├── ShowSnapshotStmt  [vim org/apache/doris/analysis/ShowSnapshotStmt.java +29]
      │   ├── ShowSqlBlockRuleStmt      [vim org/apache/doris/analysis/ShowSqlBlockRuleStmt.java +31]
      │   ├── ShowStatusStmt    [vim org/apache/doris/analysis/ShowStatusStmt.java +23]
      │   ├── ShowStreamLoadStmt        [vim org/apache/doris/analysis/ShowStreamLoadStmt.java +39]
      │   ├── ShowSyncJobStmt   [vim org/apache/doris/analysis/ShowSyncJobStmt.java +33]
      │   ├── ShowTableIdStmt   [vim org/apache/doris/analysis/ShowTableIdStmt.java +30]
      │   ├── ShowTableStatsStmt        [vim org/apache/doris/analysis/ShowTableStatsStmt.java +32]
      │   ├── ShowTableStatusStmt       [vim org/apache/doris/analysis/ShowTableStatusStmt.java +35]
      │   ├── ShowTableStmt     [vim org/apache/doris/analysis/ShowTableStmt.java +34]
      │   ├── ShowTabletStmt    [vim org/apache/doris/analysis/ShowTabletStmt.java +39]
      │   ├── ShowTransactionStmt       [vim org/apache/doris/analysis/ShowTransactionStmt.java +35]
      │   ├── ShowTrashDiskStmt [vim org/apache/doris/analysis/ShowTrashDiskStmt.java +33]
      │   ├── ShowTrashStmt     [vim org/apache/doris/analysis/ShowTrashStmt.java +36]
      │   ├── ShowTriggersStmt  [vim org/apache/doris/analysis/ShowTriggersStmt.java +23]
      │   ├── ShowUserPropertyStmt      [vim org/apache/doris/analysis/ShowUserPropertyStmt.java +42]
      │   ├── ShowUserStmt      [vim org/apache/doris/analysis/ShowUserStmt.java +25]
      │   ├── ShowVariablesStmt [vim org/apache/doris/analysis/ShowVariablesStmt.java +29]
      │   ├── ShowViewStmt      [vim org/apache/doris/analysis/ShowViewStmt.java +39]
      │   ├── ShowWarningStmt   [vim org/apache/doris/analysis/ShowWarningStmt.java +23]
      │   └── ShowWhiteListStmt [vim org/apache/doris/analysis/ShowWhiteListStmt.java +23]
      ├── TransactionStmt       [vim org/apache/doris/analysis/TransactionStmt.java +22]
      │   ├── TransactionBeginStmt      [vim org/apache/doris/analysis/TransactionBeginStmt.java +24]
      │   ├── TransactionCommitStmt     [vim org/apache/doris/analysis/TransactionCommitStmt.java +19]
      │   └── TransactionRollbackStmt   [vim org/apache/doris/analysis/TransactionRollbackStmt.java +19]
      ├── UnsupportedStmt       [vim org/apache/doris/analysis/UnsupportedStmt.java +22]
      │   └── EmptyStmt [vim org/apache/doris/analysis/EmptyStmt.java +19]
      └── DdlStmt       [vim org/apache/doris/analysis/DdlStmt.java +19]
          ├── AdminCancelRepairTableStmt        [vim org/apache/doris/analysis/AdminCancelRepairTableStmt.java +33]
          ├── AdminCheckTabletsStmt     [vim org/apache/doris/analysis/AdminCheckTabletsStmt.java +33]
          ├── AdminCleanTrashStmt       [vim org/apache/doris/analysis/AdminCleanTrashStmt.java +34]
          ├── AdminRepairTableStmt      [vim org/apache/doris/analysis/AdminRepairTableStmt.java +33]
          ├── AdminSetConfigStmt        [vim org/apache/doris/analysis/AdminSetConfigStmt.java +32]
          ├── AdminSetReplicaStatusStmt [vim org/apache/doris/analysis/AdminSetReplicaStatusStmt.java +30]
          ├── AlterClusterStmt  [vim org/apache/doris/analysis/AlterClusterStmt.java +29]
          ├── AlterColumnStatsStmt      [vim org/apache/doris/analysis/AlterColumnStatsStmt.java +33]
          ├── AlterDatabasePropertyStmt [vim org/apache/doris/analysis/AlterDatabasePropertyStmt.java +24]
          ├── AlterDatabaseQuotaStmt    [vim org/apache/doris/analysis/AlterDatabaseQuotaStmt.java +30]
          ├── AlterDatabaseRename       [vim org/apache/doris/analysis/AlterDatabaseRename.java +34]
          ├── AlterRoutineLoadStmt      [vim org/apache/doris/analysis/AlterRoutineLoadStmt.java +34]
          ├── AlterSqlBlockRuleStmt     [vim org/apache/doris/analysis/AlterSqlBlockRuleStmt.java +31]
          ├── AlterSystemStmt   [vim org/apache/doris/analysis/AlterSystemStmt.java +28]
          ├── AlterTableStatsStmt       [vim org/apache/doris/analysis/AlterTableStatsStmt.java +33]
          ├── AlterTableStmt    [vim org/apache/doris/analysis/AlterTableStmt.java +37]
          ├── CancelLoadStmt    [vim org/apache/doris/analysis/CancelLoadStmt.java +26]
          ├── CreateClusterStmt [vim org/apache/doris/analysis/CreateClusterStmt.java +33]
          ├── CreateDataSyncJobStmt     [vim org/apache/doris/analysis/CreateDataSyncJobStmt.java +36]
          ├── CreateDbStmt      [vim org/apache/doris/analysis/CreateDbStmt.java +31]
          ├── CreateEncryptKeyStmt      [vim org/apache/doris/analysis/CreateEncryptKeyStmt.java +30]
          ├── CreateFileStmt    [vim org/apache/doris/analysis/CreateFileStmt.java +35]
          ├── CreateFunctionStmt        [vim org/apache/doris/analysis/CreateFunctionStmt.java +47]
          ├── CreateMaterializedViewStmt        [vim org/apache/doris/analysis/CreateMaterializedViewStmt.java +43]
          ├── CreateRepositoryStmt      [vim org/apache/doris/analysis/CreateRepositoryStmt.java +28]
          ├── CreateResourceStmt        [vim org/apache/doris/analysis/CreateResourceStmt.java +32]
          ├── CreateRoleStmt    [vim org/apache/doris/analysis/CreateRoleStmt.java +28]
          ├── CreateRoutineLoadStmt     [vim org/apache/doris/analysis/CreateRoutineLoadStmt.java +48]
          ├── CreateSqlBlockRuleStmt    [vim org/apache/doris/analysis/CreateSqlBlockRuleStmt.java +37]
          ├── CreateTableAsSelectStmt   [vim org/apache/doris/analysis/CreateTableAsSelectStmt.java +26]
          ├── CreateTableLikeStmt       [vim org/apache/doris/analysis/CreateTableLikeStmt.java +31]
          ├── CreateTableStmt   [vim org/apache/doris/analysis/CreateTableStmt.java +56]
          ├── CreateUserStmt    [vim org/apache/doris/analysis/CreateUserStmt.java +36]
          ├── DeleteStmt        [vim org/apache/doris/analysis/DeleteStmt.java +35]
          ├── DropClusterStmt   [vim org/apache/doris/analysis/DropClusterStmt.java +31]
          ├── DropDbStmt        [vim org/apache/doris/analysis/DropDbStmt.java +30]
          ├── DropEncryptKeyStmt        [vim org/apache/doris/analysis/DropEncryptKeyStmt.java +28]
          ├── DropFileStmt      [vim org/apache/doris/analysis/DropFileStmt.java +34]
          ├── DropFunctionStmt  [vim org/apache/doris/analysis/DropFunctionStmt.java +27]
          ├── DropMaterializedViewStmt  [vim org/apache/doris/analysis/DropMaterializedViewStmt.java +29]
          ├── DropRepositoryStmt        [vim org/apache/doris/analysis/DropRepositoryStmt.java +27]
          ├── DropResourceStmt  [vim org/apache/doris/analysis/DropResourceStmt.java +27]
          ├── DropRoleStmt      [vim org/apache/doris/analysis/DropRoleStmt.java +28]
          ├── DropSqlBlockRuleStmt      [vim org/apache/doris/analysis/DropSqlBlockRuleStmt.java +30]
          ├── DropTableStmt     [vim org/apache/doris/analysis/DropTableStmt.java +28]
          ├── DropUserStmt      [vim org/apache/doris/analysis/DropUserStmt.java +27]
          ├── EnterStmt [vim org/apache/doris/analysis/EnterStmt.java +25]
          ├── GrantStmt [vim org/apache/doris/analysis/GrantStmt.java +39]
          ├── InsertStmt        [vim org/apache/doris/analysis/InsertStmt.java +66]
          ├── InstallPluginStmt [vim org/apache/doris/analysis/InstallPluginStmt.java +31]
          ├── LinkDbStmt        [vim org/apache/doris/analysis/LinkDbStmt.java +31]
          ├── LoadStmt  [vim org/apache/doris/analysis/LoadStmt.java +45]
          ├── MigrateDbStmt     [vim org/apache/doris/analysis/MigrateDbStmt.java +29]
          ├── PauseRoutineLoadStmt      [vim org/apache/doris/analysis/PauseRoutineLoadStmt.java +26]
          ├── PauseSyncJobStmt  [vim org/apache/doris/analysis/PauseSyncJobStmt.java +22]
          ├── RecoverDbStmt     [vim org/apache/doris/analysis/RecoverDbStmt.java +33]
          ├── RecoverPartitionStmt      [vim org/apache/doris/analysis/RecoverPartitionStmt.java +32]
          ├── RecoverTableStmt  [vim org/apache/doris/analysis/RecoverTableStmt.java +32]
          ├── ResumeRoutineLoadStmt     [vim org/apache/doris/analysis/ResumeRoutineLoadStmt.java +26]
          ├── ResumeSyncJobStmt [vim org/apache/doris/analysis/ResumeSyncJobStmt.java +22]
          ├── RevokeStmt        [vim org/apache/doris/analysis/RevokeStmt.java +32]
          ├── SetUserPropertyStmt       [vim org/apache/doris/analysis/SetUserPropertyStmt.java +31]
          ├── StopRoutineLoadStmt       [vim org/apache/doris/analysis/StopRoutineLoadStmt.java +23]
          ├── StopSyncJobStmt   [vim org/apache/doris/analysis/StopSyncJobStmt.java +22]
          ├── SyncStmt  [vim org/apache/doris/analysis/SyncStmt.java +22]
          ├── TruncateTableStmt [vim org/apache/doris/analysis/TruncateTableStmt.java +27]
          ├── UninstallPluginStmt       [vim org/apache/doris/analysis/UninstallPluginStmt.java +28]
          ├── UpdateStmt        [vim org/apache/doris/analysis/UpdateStmt.java +35]
          ├── AbstractBackupStmt        [vim org/apache/doris/analysis/AbstractBackupStmt.java +36]
          │   ├── BackupStmt    [vim org/apache/doris/analysis/BackupStmt.java +29]
          │   └── RestoreStmt   [vim org/apache/doris/analysis/RestoreStmt.java +33]
          ├── BaseViewStmt      [vim org/apache/doris/analysis/BaseViewStmt.java +39]
          │   ├── AlterViewStmt [vim org/apache/doris/analysis/AlterViewStmt.java +31]
          │   └── CreateViewStmt        [vim org/apache/doris/analysis/CreateViewStmt.java +33]
          └── CancelStmt        [vim org/apache/doris/analysis/CancelStmt.java +19]
              ├── CancelAlterSystemStmt [vim org/apache/doris/analysis/CancelAlterSystemStmt.java +28]
              ├── CancelAlterTableStmt  [vim org/apache/doris/analysis/CancelAlterTableStmt.java +31]
              └── CancelBackupStmt      [vim org/apache/doris/analysis/CancelBackupStmt.java +30]

org.apache.doris.qe.StmtExecutor#execute(TUniqueId)

analyze(context.getSessionVariable().toThrift()); //語義解析
if (isForwardToMaster()) {
    forwardToMaster();  //轉發處理
    if (masterOpExecutor != null && masterOpExecutor.getQueryId() != null) {
        context.setQueryId(masterOpExecutor.getQueryId());
    }
    return;
} else {
    LOG.debug("no need to transfer to Master. stmt: {}", context.getStmtId());
}


//命令執行
else if (parsedStmt instanceof DdlStmt) {
    handleDdlStmt();
} else if (parsedStmt instanceof ShowStmt) {
    handleShow();
} else if (parsedStmt instanceof KillStmt) {
    handleKill();
}

語義解析

判斷含義的正確性

@Override
public void analyze(Analyzer analyzer) throws UserException {

    super.analyze(analyzer);
    tableName.analyze(analyzer);   
    
    checkTblPriv(ConnectContext.get(), tableName.getDb(),
        tableName.getTbl(), PrivPredicate.CREATE)
        
    analyzeEngineName();  
      
    keysDesc.analyze(columnDefs);
    
    for (ColumnDef columnDef : columnDefs) {
        columnDef.analyze(engineName.equals("olap"));
    }   
    
    partitionDesc.analyze(columnDefs, properties);
    
    distributionDesc.analyze(columnSet);
}

驗證名稱是否合法
許可權是否正確
分割槽是否合法
列型別是否合法

轉發處理

Master、Follower、Observer
只有Master有後設資料的修改能力
所有需要修改後設資料的操作,需要轉發到Master去執行
轉發型別:
FORWARD_NO_SYNC
FORWARD_WITH_SYNC
NO_FORWARD
DDL 採用 FORWARD_WITH_SYNC

命令執行

org.apache.doris.qe.DdlExecutor#execute()

//根據語句型別執行相應的函式

if (ddlStmt instanceof CreateTableStmt) {
    catalog.createTable((CreateTableStmt) ddlStmt);
} 

支援多種表型別, 除了olap 表, 其餘都為對映表

if (engineName.equals("olap")) {
    createOlapTable(db, stmt);
    return;
} else if (engineName.equals("odbc")) {
    createOdbcTable(db, stmt);
    return;
} else if (engineName.equals("mysql")) {
    createMysqlTable(db, stmt);
    return;
} else if (engineName.equals("broker")) {
    createBrokerTable(db, stmt);
    return;
} else if (engineName.equalsIgnoreCase("elasticsearch") || engineName.equalsIgnoreCase("es")) {
    createEsTable(db, stmt);
    return;
} else if (engineName.equalsIgnoreCase("hive")) {
    createHiveTable(db, stmt);
    return;
}

org.apache.doris.catalog.Catalog#createOlapTable

//將語法物件轉為後設資料物件
String tableName = stmt.getTableName();
LOG.debug("begin create olap table: {}", tableName);

// create columns
List<Column> baseSchema = stmt.getColumns();
validateColumns(baseSchema);

// create partition info
PartitionDesc partitionDesc = stmt.getPartitionDesc();
PartitionInfo partitionInfo = null;

//建立table物件
long tableId = Catalog.getCurrentCatalog().getNextId();
OlapTable olapTable = new OlapTable(tableId, tableName, baseSchema, keysType, partitionInfo,
        distributionInfo, indexes);
        
  // 建立Partition 物件
  if (partitionInfo.getType() == PartitionType.UNPARTITIONED) {
        // this is a 1-level partitioned table
        // use table name as partition name
        String partitionName = tableName;
        long partitionId = partitionNameToId.get(partitionName);
        // create partition
        Partition partition = createPartitionWithIndices()
        olapTable.addPartition(partition);
    }
    
    
//新增後設資料並進行持久化
Pair<Boolean, Boolean> result = db.createTableWithLock(olapTable, false, stmt.isSetIfNotExists());

org.apache.doris.catalog.Catalog#createPartitionWithIndices

分割槽是表的實體

image

// create base index first.
Preconditions.checkArgument(baseIndexId != -1);
MaterializedIndex baseIndex = new MaterializedIndex(baseIndexId, IndexState.NORMAL);

// create partition with base index
Partition partition = new Partition(partitionId, partitionName, baseIndex, distributionInfo);

// add to index map
Map<Long, MaterializedIndex> indexMap = new HashMap<>();
indexMap.put(baseIndexId, baseIndex);

// create rollup index if has
for (long indexId : indexIdToMeta.keySet()) {
    if (indexId == baseIndexId) {
        continue;
    }

    MaterializedIndex rollup = new MaterializedIndex(indexId, IndexState.NORMAL);
    indexMap.put(indexId, rollup);
}

for (Map.Entry<Long, MaterializedIndex> entry : indexMap.entrySet()) {
// create tablets
int schemaHash = indexMeta.getSchemaHash();
TabletMeta tabletMeta = new TabletMeta(dbId, tableId, partitionId, indexId, schemaHash, storageMedium);
createTablets(clusterName, index, ReplicaState.NORMAL, distributionInfo, version, versionHash,
        replicaAlloc, tabletMeta, tabletIdSet);
        
    
// add create replica task for olap
short shortKeyColumnCount = indexMeta.getShortKeyColumnCount();
TStorageType storageType = indexMeta.getStorageType();
List<Column> schema = indexMeta.getSchema();
KeysType keysType = indexMeta.getKeysType();
int totalTaskNum = index.getTablets().size() * totalReplicaNum;
MarkedCountDownLatch<Long, Long> countDownLatch = new MarkedCountDownLatch<Long, Long>(totalTaskNum);
AgentBatchTask batchTask = new AgentBatchTask();
for (Tablet tablet : index.getTablets()) {
    long tabletId = tablet.getId();
    for (Replica replica : tablet.getReplicas()) {
        long backendId = replica.getBackendId();
        countDownLatch.addMark(backendId, tabletId);
        CreateReplicaTask task = new CreateReplicaTask(backendId, dbId, tableId,
                partitionId, indexId, tabletId,
                shortKeyColumnCount, schemaHash,
                version, versionHash,
                keysType,
                storageType, storageMedium,
                schema, bfColumns, bfFpp,
                countDownLatch,
                indexes,
                isInMemory,
                tabletType);
        task.setStorageFormat(storageFormat);
        batchTask.addTask(task);
        // add to AgentTaskQueue for handling finish report.
        // not for resending task
        AgentTaskQueue.addTask(task);
    }
}
AgentTaskExecutor.submit(batchTask);    
       
}

整體流程:

  • 建立Partition 物件
  • 建立MaterializedIndex物件
  • 對於每個MaterializedIndex物件 建立建立Tablet
  • 建立replica並下發任務到BE

// estimate timeout
long timeout = Config.tablet_create_timeout_second * 1000L * totalTaskNum;
timeout = Math.min(timeout, Config.max_create_table_timeout_second * 1000);
try {
    ok = countDownLatch.await(timeout, TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
    LOG.warn("InterruptedException: ", e);
    ok = false;
}

等待BE執行任務完成

org.apache.doris.catalog.Database#createTableWithLock

idToTable.put(table.getId(), table);
nameToTable.put(table.getName(), table);
lowerCaseToTableName.put(tableName.toLowerCase(), tableName);

if (!isReplay) {
    // Write edit log
    CreateTableInfo info = new CreateTableInfo(fullQualifiedName, table);
    Catalog.getCurrentCatalog().getEditLog().logCreateTable(info);
}
if (table.getType() == TableType.ELASTICSEARCH) {
    Catalog.getCurrentCatalog().getEsRepository().registerTable((EsTable) table);
}

將table新增到DataBase物件裡
判斷是否replay
寫入後設資料日誌

流程總結:

image

FE與BE互動

image

FE與BE互動流程圖

FE 傳送任務
BE執行
BE彙報執行結果
FE彙總結果

AgentBatchTask batchTask = new AgentBatchTask();
for (Tablet tablet : index.getTablets()) {
    long tabletId = tablet.getId();
    for (Replica replica : tablet.getReplicas()) {
        long backendId = replica.getBackendId();
        countDownLatch.addMark(backendId, tabletId);
        CreateReplicaTask task = new CreateReplicaTask(backendId, dbId, tableId,
                partitionId, indexId, tabletId,
                shortKeyColumnCount, schemaHash,
                version, versionHash,
                keysType,
                storageType, storageMedium,
                schema, bfColumns, bfFpp,
                countDownLatch,
                indexes,
                isInMemory,
                tabletType);
        task.setStorageFormat(storageFormat);
        batchTask.addTask(task);
        // add to AgentTaskQueue for handling finish report.
        // not for resending task
        AgentTaskQueue.addTask(task);
    }
}
AgentTaskExecutor.submit(batchTask);

AgentBatchTask:
收集Task並按照Be分組

AgentTaskExecutor:
傳送AgentBatchTask

AgentTaskQueue:
處理任務完成的上報

BE任務接收
be/src/agent/agent_server.cpp
接收Task

// resend request when something is wrong(BE may need some logic to guarantee idempotence.
void AgentServer::submit_tasks(TAgentResult& agent_result,
                               const std::vector<TAgentTaskRequest>& tasks) {
    Status ret_st;
    // TODO check master_info here if it is the same with that of heartbeat rpc
    if (_master_info.network_address.hostname == "" || _master_info.network_address.port == 0) {
        Status ret_st = Status::Cancelled("Have not get FE Master heartbeat yet");
        ret_st.to_thrift(&agent_result.status);
        return;
    }

    for (auto task : tasks) {
        VLOG_RPC << "submit one task: " << apache::thrift::ThriftDebugString(task).c_str();
        TTaskType::type task_type = task.task_type;
        int64_t signature = task.signature;

#define HANDLE_TYPE(t_task_type, work_pool, req_member)                         \
    case t_task_type:                                                           \
        if (task.__isset.req_member) {                                          \
            work_pool->submit_task(task);                                       \
        } else {                                                                \
            ret_st = Status::InvalidArgument(strings::Substitute(               \
                    "task(signature=$0) has wrong request member", signature)); \
        }                                                                       \
        break;

       ...

    ret_st.to_thrift(&agent_result.status);
}

工作執行緒

while (_is_work) {
    TAgentTaskRequest agent_task_req;
    TCreateTabletReq create_tablet_req;
    {
        lock_guard<Mutex> worker_thread_lock(_worker_thread_lock);
        while (_is_work && _tasks.empty()) {
            _worker_thread_condition_variable.wait();
        }
        if (!_is_work) {
            return;
        }
        //從佇列中取出任務
        agent_task_req = _tasks.front();
        create_tablet_req = agent_task_req.create_tablet_req;
        _tasks.pop_front();
      
        //執行
        OLAPStatus create_status = _env->storage_engine()->create_tablet(create_tablet_req);
      
      
        TFinishTaskRequest finish_task_request;
        finish_task_request.__set_finish_tablet_infos(finish_tablet_infos);
        finish_task_request.__set_backend(_backend);
        finish_task_request.__set_report_version(_s_report_version);
        finish_task_request.__set_task_type(agent_task_req.task_type);
        finish_task_request.__set_signature(agent_task_req.signature);
        finish_task_request.__set_task_status(task_status);
        //彙報結果
        _finish_task(finish_task_request);
   }   

處理任務彙報

org.apache.doris.service.FrontendServiceImpl#finishTask
org.apache.doris.master.MasterImpl#finishTask
FE、BE通過Thrift協議通訊

錯誤處理

org.apache.doris.task.AgentTaskQueue 儲存正在執行的Task

image

org.apache.doris.master.ReportHandler#handleReport
org.apache.doris.master.ReportHandler#taskReport
BE: Report tasks/olap tablet/disk state to the master server
FE master 處理任務,超時會進行重試

private static void taskReport(long backendId, Map<TTaskType, Set<Long>> runningTasks) {
    ...
    
    // to escape sending duplicate agent task to be
    if (task.shouldResend(taskReportTime)) {
        batchTask.addTask(task);
    }
    ...
}

後設資料持久化

image

Edit類似WAL

BDBJE 分散式KV儲存

後設資料持久化:org.apache.doris.catalog.Database#createTableWithLock

public Pair<Boolean, Boolean> createTableWithLock(Table table, boolean isReplay, boolean setIfNotExist) {
   ...
   //更新記憶體
   nameToTable.put(table.getName(), table);
    // Write edit log
    //構建後設資料日誌
    CreateTableInfo info = new CreateTableInfo(fullQualifiedName, table);
    //寫入後設資料日誌
    Catalog.getCurrentCatalog().getEditLog().logCreateTable(info);  
   ...
    
}

後設資料回放

後設資料回放發生在FE leader 給 其他FE節點同步的時候

image

逐一回放後設資料
在記憶體中復原後設資料
org.apache.doris.catalog.Catalog#replayCreateTable

public void replayCreateTable(String dbName, Table table) {
    Database db = this.fullNameToDb.get(dbName);
    db.createTableWithLock(table, true, false);
    ...
}

如何實現一個新的語句

fe/fe-core/src/main/cup/sql_parser.cup 語法檔案

KW_CREATE opt_external:isExternal KW_TABLE opt_if_not_exists:ifNotExists table_name:name
        LPAREN column_definition_list:columns COMMA index_definition_list:indexes RPAREN opt_engine:engineName
        opt_keys:keys
        opt_comment:tableComment
        opt_partition:partition
        opt_distribution:distribution
        opt_rollup:index
        opt_properties:tblProperties
        opt_ext_properties:extProperties
{:
    RESULT = new CreateTableStmt(ifNotExists, isExternal, name, columns, indexes, engineName, keys, partition,
    distribution, tblProperties, extProperties, tableComment, index);
:}

fe/fe-core/src/main/jflex/sql_scanner.flex 詞法檔案

keywordMap.put("create", new Integer(SqlParserSymbols.KW_CREATE));
keywordMap.put("cross", new Integer(SqlParserSymbols.KW_CROSS));
keywordMap.put("cube", new Integer(SqlParserSymbols.KW_CUBE));
keywordMap.put("current", new Integer(SqlParserSymbols.KW_CURRENT));
keywordMap.put("current_user", new Integer(SqlParserSymbols.KW_CURRENT_USER));
keywordMap.put("data", new Integer(SqlParserSymbols.KW_DATA));
keywordMap.put("database", new Integer(SqlParserSymbols.KW_DATABASE));

詞法語法的程式碼生成:

cd fe/ && mvn clean install –DskipTests
• SqlScanner.java
• SqlParser.java
• SqlParserSymbols.java

實現新語句步驟總結:

  1. 定義詞法語法檔案
  2. 實現對應的語句類,比如CreateTableStmt
  3. 實現後設資料修改的方法,如Catalog.createTable()
  4. 定義對應操作的後設資料日誌類,如CreateTableInfo
  5. 實現後設資料日誌的寫入
  6. 實現對應的replay方法,如Catalog.replayCreateTable()

相關文章