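Metadata Design
As shown in the figure above, Doris metadata mainly stores four kinds of data:
- User data information, including databases, table schemas, and sharding information.
- Job information of various kinds, such as load jobs, Clone jobs, and SchemaChange jobs.
- User and privilege information.
- Cluster and node information.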
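Metadata Directory
The metadata directory is specified by the FE configuration item meta_dir.
The bdb/ directory holds the bdbje data.
The image/ directory holds the image files.
image.[logid] is the latest image file. The logid suffix is the id of the last log entry contained in the image.
image.ckpt is the image file currently being written. If the write succeeds, it is renamed to image.[logid] and replaces the old image file.
The VERSION file records the cluster_id. The cluster_id uniquely identifies a Doris cluster; it is a 32-bit integer generated randomly when the leader starts for the first time. It can also be set explicitly through the FE configuration item cluster_id.
The ROLE file records the FE's own role, which is either FOLLOWER or OBSERVER. FOLLOWER means the FE is an electable node. (Note: even the leader node has the role FOLLOWER.)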
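To make the image file naming rule concrete, here is a minimal sketch that picks the newest image.[logid] from a directory listing and skips image.ckpt. The class ImageFiles and its method are hypothetical helpers written for this note; they are not part of the Doris code base.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical helper, not part of Doris: finds the newest image.[logid] file name.
final class ImageFiles {
    private static final String PREFIX = "image.";

    // Returns the largest logid among names of the form image.<digits>.
    // Names such as image.ckpt (an image still being written) are skipped.
    static OptionalLong latestLogId(List<String> fileNames) {
        return fileNames.stream()
                .filter(n -> n.startsWith(PREFIX))
                .map(n -> n.substring(PREFIX.length()))
                .filter(s -> !s.isEmpty() && s.chars().allMatch(Character::isDigit))
                .mapToLong(Long::parseLong)
                .max();
    }

    public static void main(String[] args) {
        List<String> names = List.of("image.100", "image.ckpt", "VERSION", "ROLE", "image.250");
        System.out.println(latestLogId(names)); // prints OptionalLong[250]
    }
}
```

Because the suffix is the id of the last log entry in the image, a loader built this way would only need to replay log entries with a larger id.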
Reading the DDL-Related Source Code
Starting the MySQL Service
org.apache.doris.qe.QeService
if (nioEnabled) {
mysqlServer = new NMysqlServer(port, scheduler);
} else {
mysqlServer = new MysqlServer(port, scheduler);
}
DDL Code Call Flow
org.apache.doris.qe.ConnectProcessor#dispatch: command recognition
switch (command) {
case COM_INIT_DB:
handleInitDb();
break;
case COM_QUIT:
handleQuit();
break;
case COM_QUERY:
handleQuery();
break;
case COM_FIELD_LIST:
handleFieldList();
break;
case COM_PING:
handlePing();
break;
default:
ctx.getState().setError("Unsupported command(" + command + ")");
LOG.warn("Unsupported command(" + command + ")");
break;
}
org.apache.doris.qe.ConnectProcessor#analyze: lexical and syntactic parsing
// Parse statement with parser generated by CUP&FLEX
SqlScanner input = new SqlScanner(new StringReader(originStmt), ctx.getSessionVariable().getSqlMode());
SqlParser parser = new SqlParser(input);
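The original statement string is read from the connection.
Lexer definition file:
• fe/fe-core/src/main/jflex/sql_scanner.flex
Grammar definition file:
• fe/fe-core/src/main/cup/sql_parser.cup
All statement implementation classes: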
StatementBase [vim org/apache/doris/analysis/StatementBase.java +33]
├── ExportStmt [vim org/apache/doris/analysis/ExportStmt.java +48]
├── ImportColumnsStmt [vim org/apache/doris/analysis/ImportColumnsStmt.java +21]
├── ImportDeleteOnStmt [vim org/apache/doris/analysis/ImportDeleteOnStmt.java +19]
├── ImportSequenceStmt [vim org/apache/doris/analysis/ImportSequenceStmt.java +19]
├── ImportWhereStmt [vim org/apache/doris/analysis/ImportWhereStmt.java +19]
├── KillStmt [vim org/apache/doris/analysis/KillStmt.java +19]
├── SetStmt [vim org/apache/doris/analysis/SetStmt.java +24]
├── UseStmt [vim org/apache/doris/analysis/UseStmt.java +33]
├── QueryStmt [vim org/apache/doris/analysis/QueryStmt.java +38]
│ ├── SelectStmt [vim org/apache/doris/analysis/SelectStmt.java +65]
│ └── SetOperationStmt [vim org/apache/doris/analysis/SetOperationStmt.java +36]
├── ShowStmt [vim org/apache/doris/analysis/ShowStmt.java +22]
│ ├── AdminShowConfigStmt [vim org/apache/doris/analysis/AdminShowConfigStmt.java +33]
│ ├── AdminShowDataSkewStmt [vim org/apache/doris/analysis/AdminShowDataSkewStmt.java +32]
│ ├── AdminShowReplicaDistributionStmt [vim org/apache/doris/analysis/AdminShowReplicaDistributionStmt.java +34]
│ ├── AdminShowReplicaStatusStmt [vim org/apache/doris/analysis/AdminShowReplicaStatusStmt.java +39]
│ ├── DescribeStmt [vim org/apache/doris/analysis/DescribeStmt.java +54]
│ ├── HelpStmt [vim org/apache/doris/analysis/HelpStmt.java +26]
│ ├── ShowAlterStmt [vim org/apache/doris/analysis/ShowAlterStmt.java +46]
│ ├── ShowAuthorStmt [vim org/apache/doris/analysis/ShowAuthorStmt.java +23]
│ ├── ShowBackendsStmt [vim org/apache/doris/analysis/ShowBackendsStmt.java +30]
│ ├── ShowBackupStmt [vim org/apache/doris/analysis/ShowBackupStmt.java +38]
│ ├── ShowBrokerStmt [vim org/apache/doris/analysis/ShowBrokerStmt.java +30]
│ ├── ShowCharsetStmt [vim org/apache/doris/analysis/ShowCharsetStmt.java +23]
│ ├── ShowClusterStmt [vim org/apache/doris/analysis/ShowClusterStmt.java +34]
│ ├── ShowCollationStmt [vim org/apache/doris/analysis/ShowCollationStmt.java +24]
│ ├── ShowColumnStatsStmt [vim org/apache/doris/analysis/ShowColumnStatsStmt.java +28]
│ ├── ShowColumnStmt [vim org/apache/doris/analysis/ShowColumnStmt.java +28]
│ ├── ShowCreateDbStmt [vim org/apache/doris/analysis/ShowCreateDbStmt.java +36]
│ ├── ShowCreateFunctionStmt [vim org/apache/doris/analysis/ShowCreateFunctionStmt.java +32]
│ ├── ShowCreateRoutineLoadStmt [vim org/apache/doris/analysis/ShowCreateRoutineLoadStmt.java +24]
│ ├── ShowCreateTableStmt [vim org/apache/doris/analysis/ShowCreateTableStmt.java +29]
│ ├── ShowDataStmt [vim org/apache/doris/analysis/ShowDataStmt.java +56]
│ ├── ShowDbIdStmt [vim org/apache/doris/analysis/ShowDbIdStmt.java +29]
│ ├── ShowDbStmt [vim org/apache/doris/analysis/ShowDbStmt.java +27]
│ ├── ShowDeleteStmt [vim org/apache/doris/analysis/ShowDeleteStmt.java +31]
│ ├── ShowDynamicPartitionStmt [vim org/apache/doris/analysis/ShowDynamicPartitionStmt.java +29]
│ ├── ShowEncryptKeysStmt [vim org/apache/doris/analysis/ShowEncryptKeysStmt.java +32]
│ ├── ShowEnginesStmt [vim org/apache/doris/analysis/ShowEnginesStmt.java +23]
│ ├── ShowEventsStmt [vim org/apache/doris/analysis/ShowEventsStmt.java +23]
│ ├── ShowExportStmt [vim org/apache/doris/analysis/ShowExportStmt.java +40]
│ ├── ShowFrontendsStmt [vim org/apache/doris/analysis/ShowFrontendsStmt.java +30]
│ ├── ShowFunctionsStmt [vim org/apache/doris/analysis/ShowFunctionsStmt.java +32]
│ ├── ShowGrantsStmt [vim org/apache/doris/analysis/ShowGrantsStmt.java +32]
│ ├── ShowIndexStmt [vim org/apache/doris/analysis/ShowIndexStmt.java +32]
│ ├── ShowLoadProfileStmt [vim org/apache/doris/analysis/ShowLoadProfileStmt.java +27]
│ ├── ShowLoadStmt [vim org/apache/doris/analysis/ShowLoadStmt.java +42]
│ ├── ShowLoadWarningsStmt [vim org/apache/doris/analysis/ShowLoadWarningsStmt.java +36]
│ ├── ShowMigrationsStmt [vim org/apache/doris/analysis/ShowMigrationsStmt.java +31]
│ ├── ShowOpenTableStmt [vim org/apache/doris/analysis/ShowOpenTableStmt.java +23]
│ ├── ShowPartitionIdStmt [vim org/apache/doris/analysis/ShowPartitionIdStmt.java +29]
│ ├── ShowPartitionsStmt [vim org/apache/doris/analysis/ShowPartitionsStmt.java +49]
│ ├── ShowPluginsStmt [vim org/apache/doris/analysis/ShowPluginsStmt.java +23]
│ ├── ShowProcStmt [vim org/apache/doris/analysis/ShowProcStmt.java +32]
│ ├── ShowProcedureStmt [vim org/apache/doris/analysis/ShowProcedureStmt.java +23]
│ ├── ShowProcesslistStmt [vim org/apache/doris/analysis/ShowProcesslistStmt.java +24]
│ ├── ShowQueryProfileStmt [vim org/apache/doris/analysis/ShowQueryProfileStmt.java +27]
│ ├── ShowRepositoriesStmt [vim org/apache/doris/analysis/ShowRepositoriesStmt.java +25]
│ ├── ShowResourcesStmt [vim org/apache/doris/analysis/ShowResourcesStmt.java +37]
│ ├── ShowRestoreStmt [vim org/apache/doris/analysis/ShowRestoreStmt.java +38]
│ ├── ShowRolesStmt [vim org/apache/doris/analysis/ShowRolesStmt.java +29]
│ ├── ShowRollupStmt [vim org/apache/doris/analysis/ShowRollupStmt.java +28]
│ ├── ShowRoutineLoadStmt [vim org/apache/doris/analysis/ShowRoutineLoadStmt.java +34]
│ ├── ShowRoutineLoadTaskStmt [vim org/apache/doris/analysis/ShowRoutineLoadTaskStmt.java +32]
│ ├── ShowSmallFilesStmt [vim org/apache/doris/analysis/ShowSmallFilesStmt.java +32]
│ ├── ShowSnapshotStmt [vim org/apache/doris/analysis/ShowSnapshotStmt.java +29]
│ ├── ShowSqlBlockRuleStmt [vim org/apache/doris/analysis/ShowSqlBlockRuleStmt.java +31]
│ ├── ShowStatusStmt [vim org/apache/doris/analysis/ShowStatusStmt.java +23]
│ ├── ShowStreamLoadStmt [vim org/apache/doris/analysis/ShowStreamLoadStmt.java +39]
│ ├── ShowSyncJobStmt [vim org/apache/doris/analysis/ShowSyncJobStmt.java +33]
│ ├── ShowTableIdStmt [vim org/apache/doris/analysis/ShowTableIdStmt.java +30]
│ ├── ShowTableStatsStmt [vim org/apache/doris/analysis/ShowTableStatsStmt.java +32]
│ ├── ShowTableStatusStmt [vim org/apache/doris/analysis/ShowTableStatusStmt.java +35]
│ ├── ShowTableStmt [vim org/apache/doris/analysis/ShowTableStmt.java +34]
│ ├── ShowTabletStmt [vim org/apache/doris/analysis/ShowTabletStmt.java +39]
│ ├── ShowTransactionStmt [vim org/apache/doris/analysis/ShowTransactionStmt.java +35]
│ ├── ShowTrashDiskStmt [vim org/apache/doris/analysis/ShowTrashDiskStmt.java +33]
│ ├── ShowTrashStmt [vim org/apache/doris/analysis/ShowTrashStmt.java +36]
│ ├── ShowTriggersStmt [vim org/apache/doris/analysis/ShowTriggersStmt.java +23]
│ ├── ShowUserPropertyStmt [vim org/apache/doris/analysis/ShowUserPropertyStmt.java +42]
│ ├── ShowUserStmt [vim org/apache/doris/analysis/ShowUserStmt.java +25]
│ ├── ShowVariablesStmt [vim org/apache/doris/analysis/ShowVariablesStmt.java +29]
│ ├── ShowViewStmt [vim org/apache/doris/analysis/ShowViewStmt.java +39]
│ ├── ShowWarningStmt [vim org/apache/doris/analysis/ShowWarningStmt.java +23]
│ └── ShowWhiteListStmt [vim org/apache/doris/analysis/ShowWhiteListStmt.java +23]
├── TransactionStmt [vim org/apache/doris/analysis/TransactionStmt.java +22]
│ ├── TransactionBeginStmt [vim org/apache/doris/analysis/TransactionBeginStmt.java +24]
│ ├── TransactionCommitStmt [vim org/apache/doris/analysis/TransactionCommitStmt.java +19]
│ └── TransactionRollbackStmt [vim org/apache/doris/analysis/TransactionRollbackStmt.java +19]
├── UnsupportedStmt [vim org/apache/doris/analysis/UnsupportedStmt.java +22]
│ └── EmptyStmt [vim org/apache/doris/analysis/EmptyStmt.java +19]
└── DdlStmt [vim org/apache/doris/analysis/DdlStmt.java +19]
  ├── AdminCancelRepairTableStmt [vim org/apache/doris/analysis/AdminCancelRepairTableStmt.java +33]
  ├── AdminCheckTabletsStmt [vim org/apache/doris/analysis/AdminCheckTabletsStmt.java +33]
  ├── AdminCleanTrashStmt [vim org/apache/doris/analysis/AdminCleanTrashStmt.java +34]
  ├── AdminRepairTableStmt [vim org/apache/doris/analysis/AdminRepairTableStmt.java +33]
  ├── AdminSetConfigStmt [vim org/apache/doris/analysis/AdminSetConfigStmt.java +32]
  ├── AdminSetReplicaStatusStmt [vim org/apache/doris/analysis/AdminSetReplicaStatusStmt.java +30]
  ├── AlterClusterStmt [vim org/apache/doris/analysis/AlterClusterStmt.java +29]
  ├── AlterColumnStatsStmt [vim org/apache/doris/analysis/AlterColumnStatsStmt.java +33]
  ├── AlterDatabasePropertyStmt [vim org/apache/doris/analysis/AlterDatabasePropertyStmt.java +24]
  ├── AlterDatabaseQuotaStmt [vim org/apache/doris/analysis/AlterDatabaseQuotaStmt.java +30]
  ├── AlterDatabaseRename [vim org/apache/doris/analysis/AlterDatabaseRename.java +34]
  ├── AlterRoutineLoadStmt [vim org/apache/doris/analysis/AlterRoutineLoadStmt.java +34]
  ├── AlterSqlBlockRuleStmt [vim org/apache/doris/analysis/AlterSqlBlockRuleStmt.java +31]
  ├── AlterSystemStmt [vim org/apache/doris/analysis/AlterSystemStmt.java +28]
  ├── AlterTableStatsStmt [vim org/apache/doris/analysis/AlterTableStatsStmt.java +33]
  ├── AlterTableStmt [vim org/apache/doris/analysis/AlterTableStmt.java +37]
  ├── CancelLoadStmt [vim org/apache/doris/analysis/CancelLoadStmt.java +26]
  ├── CreateClusterStmt [vim org/apache/doris/analysis/CreateClusterStmt.java +33]
  ├── CreateDataSyncJobStmt [vim org/apache/doris/analysis/CreateDataSyncJobStmt.java +36]
  ├── CreateDbStmt [vim org/apache/doris/analysis/CreateDbStmt.java +31]
  ├── CreateEncryptKeyStmt [vim org/apache/doris/analysis/CreateEncryptKeyStmt.java +30]
  ├── CreateFileStmt [vim org/apache/doris/analysis/CreateFileStmt.java +35]
  ├── CreateFunctionStmt [vim org/apache/doris/analysis/CreateFunctionStmt.java +47]
  ├── CreateMaterializedViewStmt [vim org/apache/doris/analysis/CreateMaterializedViewStmt.java +43]
  ├── CreateRepositoryStmt [vim org/apache/doris/analysis/CreateRepositoryStmt.java +28]
  ├── CreateResourceStmt [vim org/apache/doris/analysis/CreateResourceStmt.java +32]
  ├── CreateRoleStmt [vim org/apache/doris/analysis/CreateRoleStmt.java +28]
  ├── CreateRoutineLoadStmt [vim org/apache/doris/analysis/CreateRoutineLoadStmt.java +48]
  ├── CreateSqlBlockRuleStmt [vim org/apache/doris/analysis/CreateSqlBlockRuleStmt.java +37]
  ├── CreateTableAsSelectStmt [vim org/apache/doris/analysis/CreateTableAsSelectStmt.java +26]
  ├── CreateTableLikeStmt [vim org/apache/doris/analysis/CreateTableLikeStmt.java +31]
  ├── CreateTableStmt [vim org/apache/doris/analysis/CreateTableStmt.java +56]
  ├── CreateUserStmt [vim org/apache/doris/analysis/CreateUserStmt.java +36]
  ├── DeleteStmt [vim org/apache/doris/analysis/DeleteStmt.java +35]
  ├── DropClusterStmt [vim org/apache/doris/analysis/DropClusterStmt.java +31]
  ├── DropDbStmt [vim org/apache/doris/analysis/DropDbStmt.java +30]
  ├── DropEncryptKeyStmt [vim org/apache/doris/analysis/DropEncryptKeyStmt.java +28]
  ├── DropFileStmt [vim org/apache/doris/analysis/DropFileStmt.java +34]
  ├── DropFunctionStmt [vim org/apache/doris/analysis/DropFunctionStmt.java +27]
  ├── DropMaterializedViewStmt [vim org/apache/doris/analysis/DropMaterializedViewStmt.java +29]
  ├── DropRepositoryStmt [vim org/apache/doris/analysis/DropRepositoryStmt.java +27]
  ├── DropResourceStmt [vim org/apache/doris/analysis/DropResourceStmt.java +27]
  ├── DropRoleStmt [vim org/apache/doris/analysis/DropRoleStmt.java +28]
  ├── DropSqlBlockRuleStmt [vim org/apache/doris/analysis/DropSqlBlockRuleStmt.java +30]
  ├── DropTableStmt [vim org/apache/doris/analysis/DropTableStmt.java +28]
  ├── DropUserStmt [vim org/apache/doris/analysis/DropUserStmt.java +27]
  ├── EnterStmt [vim org/apache/doris/analysis/EnterStmt.java +25]
  ├── GrantStmt [vim org/apache/doris/analysis/GrantStmt.java +39]
  ├── InsertStmt [vim org/apache/doris/analysis/InsertStmt.java +66]
  ├── InstallPluginStmt [vim org/apache/doris/analysis/InstallPluginStmt.java +31]
  ├── LinkDbStmt [vim org/apache/doris/analysis/LinkDbStmt.java +31]
  ├── LoadStmt [vim org/apache/doris/analysis/LoadStmt.java +45]
  ├── MigrateDbStmt [vim org/apache/doris/analysis/MigrateDbStmt.java +29]
  ├── PauseRoutineLoadStmt [vim org/apache/doris/analysis/PauseRoutineLoadStmt.java +26]
  ├── PauseSyncJobStmt [vim org/apache/doris/analysis/PauseSyncJobStmt.java +22]
  ├── RecoverDbStmt [vim org/apache/doris/analysis/RecoverDbStmt.java +33]
  ├── RecoverPartitionStmt [vim org/apache/doris/analysis/RecoverPartitionStmt.java +32]
  ├── RecoverTableStmt [vim org/apache/doris/analysis/RecoverTableStmt.java +32]
  ├── ResumeRoutineLoadStmt [vim org/apache/doris/analysis/ResumeRoutineLoadStmt.java +26]
  ├── ResumeSyncJobStmt [vim org/apache/doris/analysis/ResumeSyncJobStmt.java +22]
  ├── RevokeStmt [vim org/apache/doris/analysis/RevokeStmt.java +32]
  ├── SetUserPropertyStmt [vim org/apache/doris/analysis/SetUserPropertyStmt.java +31]
  ├── StopRoutineLoadStmt [vim org/apache/doris/analysis/StopRoutineLoadStmt.java +23]
  ├── StopSyncJobStmt [vim org/apache/doris/analysis/StopSyncJobStmt.java +22]
  ├── SyncStmt [vim org/apache/doris/analysis/SyncStmt.java +22]
  ├── TruncateTableStmt [vim org/apache/doris/analysis/TruncateTableStmt.java +27]
  ├── UninstallPluginStmt [vim org/apache/doris/analysis/UninstallPluginStmt.java +28]
  ├── UpdateStmt [vim org/apache/doris/analysis/UpdateStmt.java +35]
  ├── AbstractBackupStmt [vim org/apache/doris/analysis/AbstractBackupStmt.java +36]
  │ ├── BackupStmt [vim org/apache/doris/analysis/BackupStmt.java +29]
  │ └── RestoreStmt [vim org/apache/doris/analysis/RestoreStmt.java +33]
  ├── BaseViewStmt [vim org/apache/doris/analysis/BaseViewStmt.java +39]
  │ ├── AlterViewStmt [vim org/apache/doris/analysis/AlterViewStmt.java +31]
  │ └── CreateViewStmt [vim org/apache/doris/analysis/CreateViewStmt.java +33]
  └── CancelStmt [vim org/apache/doris/analysis/CancelStmt.java +19]
    ├── CancelAlterSystemStmt [vim org/apache/doris/analysis/CancelAlterSystemStmt.java +28]
    ├── CancelAlterTableStmt [vim org/apache/doris/analysis/CancelAlterTableStmt.java +31]
    └── CancelBackupStmt [vim org/apache/doris/analysis/CancelBackupStmt.java +30]
org.apache.doris.qe.StmtExecutor#execute(TUniqueId)
analyze(context.getSessionVariable().toThrift()); // semantic analysis
if (isForwardToMaster()) {
forwardToMaster(); // forward to the Master
if (masterOpExecutor != null && masterOpExecutor.getQueryId() != null) {
context.setQueryId(masterOpExecutor.getQueryId());
}
return;
} else {
LOG.debug("no need to transfer to Master. stmt: {}", context.getStmtId());
}
// command execution
else if (parsedStmt instanceof DdlStmt) {
handleDdlStmt();
} else if (parsedStmt instanceof ShowStmt) {
handleShow();
} else if (parsedStmt instanceof KillStmt) {
handleKill();
}
Semantic Analysis
Checks that the statement is semantically correct.
@Override
public void analyze(Analyzer analyzer) throws UserException {
super.analyze(analyzer);
tableName.analyze(analyzer);
checkTblPriv(ConnectContext.get(), tableName.getDb(),
tableName.getTbl(), PrivPredicate.CREATE);
analyzeEngineName();
keysDesc.analyze(columnDefs);
for (ColumnDef columnDef : columnDefs) {
columnDef.analyze(engineName.equals("olap"));
}
partitionDesc.analyze(columnDefs, properties);
distributionDesc.analyze(columnSet);
}
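Checks that names are valid
Checks that the privileges are sufficient
Checks that the partitions are valid
Checks that the column types are valid
Forwarding
FE roles: Master, Follower, Observer
Only the Master is able to modify metadata
Every operation that modifies metadata must be forwarded to the Master for execution
Forward types:
FORWARD_NO_SYNC
FORWARD_WITH_SYNC
NO_FORWARD
DDL uses FORWARD_WITH_SYNC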
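The forwarding rule can be sketched as a small decision function. The three names come from the list above; the two boolean fields, the enum ForwardType, and the class ForwardDecision are assumptions made for illustration, not the actual Doris definitions.

```java
// Sketch only: the constant names mirror the text; the boolean fields are assumed.
enum ForwardType {
    FORWARD_NO_SYNC(true, false),
    FORWARD_WITH_SYNC(true, true),
    NO_FORWARD(false, false);

    final boolean forwardToMaster;     // should a non-Master FE hand the statement over?
    final boolean waitForJournalSync;  // assumed: wait until the local FE has caught up

    ForwardType(boolean forwardToMaster, boolean waitForJournalSync) {
        this.forwardToMaster = forwardToMaster;
        this.waitForJournalSync = waitForJournalSync;
    }
}

final class ForwardDecision {
    // A statement is forwarded only when this FE is not the Master
    // and the statement's forward type asks for forwarding.
    static boolean shouldForward(boolean isMaster, ForwardType type) {
        return !isMaster && type.forwardToMaster;
    }

    public static void main(String[] args) {
        System.out.println(shouldForward(false, ForwardType.FORWARD_WITH_SYNC)); // true
        System.out.println(shouldForward(true, ForwardType.FORWARD_WITH_SYNC));  // false
        System.out.println(shouldForward(false, ForwardType.NO_FORWARD));        // false
    }
}
```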
Command Execution
org.apache.doris.qe.DdlExecutor#execute()
// Call the function that matches the statement type
if (ddlStmt instanceof CreateTableStmt) {
catalog.createTable((CreateTableStmt) ddlStmt);
}
Several table types are supported. Apart from OLAP tables, all of them are mapping tables.
if (engineName.equals("olap")) {
createOlapTable(db, stmt);
return;
} else if (engineName.equals("odbc")) {
createOdbcTable(db, stmt);
return;
} else if (engineName.equals("mysql")) {
createMysqlTable(db, stmt);
return;
} else if (engineName.equals("broker")) {
createBrokerTable(db, stmt);
return;
} else if (engineName.equalsIgnoreCase("elasticsearch") || engineName.equalsIgnoreCase("es")) {
createEsTable(db, stmt);
return;
} else if (engineName.equalsIgnoreCase("hive")) {
createHiveTable(db, stmt);
return;
}
org.apache.doris.catalog.Catalog#createOlapTable
// Convert the syntax object into metadata objects
String tableName = stmt.getTableName();
LOG.debug("begin create olap table: {}", tableName);
// create columns
List<Column> baseSchema = stmt.getColumns();
validateColumns(baseSchema);
// create partition info
PartitionDesc partitionDesc = stmt.getPartitionDesc();
PartitionInfo partitionInfo = null;
// Create the table object
long tableId = Catalog.getCurrentCatalog().getNextId();
OlapTable olapTable = new OlapTable(tableId, tableName, baseSchema, keysType, partitionInfo,
distributionInfo, indexes);
// Create the Partition object
if (partitionInfo.getType() == PartitionType.UNPARTITIONED) {
// this is a 1-level partitioned table
// use table name as partition name
String partitionName = tableName;
long partitionId = partitionNameToId.get(partitionName);
// create partition
Partition partition = createPartitionWithIndices(...);
olapTable.addPartition(partition);
}
// Add the metadata and persist it
Pair<Boolean, Boolean> result = db.createTableWithLock(olapTable, false, stmt.isSetIfNotExists());
org.apache.doris.catalog.Catalog#createPartitionWithIndices
A partition is the physical entity of a table.
// create base index first.
Preconditions.checkArgument(baseIndexId != -1);
MaterializedIndex baseIndex = new MaterializedIndex(baseIndexId, IndexState.NORMAL);
// create partition with base index
Partition partition = new Partition(partitionId, partitionName, baseIndex, distributionInfo);
// add to index map
Map<Long, MaterializedIndex> indexMap = new HashMap<>();
indexMap.put(baseIndexId, baseIndex);
// create rollup index if has
for (long indexId : indexIdToMeta.keySet()) {
if (indexId == baseIndexId) {
continue;
}
MaterializedIndex rollup = new MaterializedIndex(indexId, IndexState.NORMAL);
indexMap.put(indexId, rollup);
}
for (Map.Entry<Long, MaterializedIndex> entry : indexMap.entrySet()) {
// create tablets
int schemaHash = indexMeta.getSchemaHash();
TabletMeta tabletMeta = new TabletMeta(dbId, tableId, partitionId, indexId, schemaHash, storageMedium);
createTablets(clusterName, index, ReplicaState.NORMAL, distributionInfo, version, versionHash,
replicaAlloc, tabletMeta, tabletIdSet);
// add create replica task for olap
short shortKeyColumnCount = indexMeta.getShortKeyColumnCount();
TStorageType storageType = indexMeta.getStorageType();
List<Column> schema = indexMeta.getSchema();
KeysType keysType = indexMeta.getKeysType();
int totalTaskNum = index.getTablets().size() * totalReplicaNum;
MarkedCountDownLatch<Long, Long> countDownLatch = new MarkedCountDownLatch<Long, Long>(totalTaskNum);
AgentBatchTask batchTask = new AgentBatchTask();
for (Tablet tablet : index.getTablets()) {
long tabletId = tablet.getId();
for (Replica replica : tablet.getReplicas()) {
long backendId = replica.getBackendId();
countDownLatch.addMark(backendId, tabletId);
CreateReplicaTask task = new CreateReplicaTask(backendId, dbId, tableId,
partitionId, indexId, tabletId,
shortKeyColumnCount, schemaHash,
version, versionHash,
keysType,
storageType, storageMedium,
schema, bfColumns, bfFpp,
countDownLatch,
indexes,
isInMemory,
tabletType);
task.setStorageFormat(storageFormat);
batchTask.addTask(task);
// add to AgentTaskQueue for handling finish report.
// not for resending task
AgentTaskQueue.addTask(task);
}
}
AgentTaskExecutor.submit(batchTask);
}
Overall flow:
- Create the Partition object
- Create the MaterializedIndex objects
- Create the Tablets for each MaterializedIndex object
- Create the replicas and dispatch the tasks to the BEs
// estimate timeout
long timeout = Config.tablet_create_timeout_second * 1000L * totalTaskNum;
timeout = Math.min(timeout, Config.max_create_table_timeout_second * 1000);
try {
ok = countDownLatch.await(timeout, TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
LOG.warn("InterruptedException: ", e);
ok = false;
}
Wait for the BEs to finish executing the tasks.
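The timeout estimate and the wait can be tried out in isolation with a plain java.util.concurrent.CountDownLatch. This is a minimal sketch: the class CreateWait, its parameters, and the numbers used in main are illustrative assumptions, not Doris configuration defaults, and a thread per task merely stands in for a BE reporting completion.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class CreateWait {
    // Mirrors the estimate in the excerpt: a per-task budget multiplied by
    // the number of tasks, capped by an overall maximum.
    static long timeoutMillis(long perTaskSeconds, long maxSeconds, int totalTaskNum) {
        long timeout = perTaskSeconds * 1000L * totalTaskNum;
        return Math.min(timeout, maxSeconds * 1000L);
    }

    // Starts one thread per simulated replica task and waits for all of them.
    static boolean runAndWait(int totalTaskNum, long timeoutMs) {
        CountDownLatch latch = new CountDownLatch(totalTaskNum);
        for (int i = 0; i < totalTaskNum; i++) {
            new Thread(latch::countDown).start(); // stands in for a BE finishing its task
        }
        try {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(timeoutMillis(2, 30, 9));   // 18000: below the cap
        System.out.println(timeoutMillis(2, 30, 100)); // 30000: capped
        System.out.println(runAndWait(4, 5000));
    }
}
```

The cap matters for large tables: without it the wait would grow linearly with the number of tablets times replicas.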
org.apache.doris.catalog.Database#createTableWithLock
idToTable.put(table.getId(), table);
nameToTable.put(table.getName(), table);
lowerCaseToTableName.put(tableName.toLowerCase(), tableName);
if (!isReplay) {
// Write edit log
CreateTableInfo info = new CreateTableInfo(fullQualifiedName, table);
Catalog.getCurrentCatalog().getEditLog().logCreateTable(info);
}
if (table.getType() == TableType.ELASTICSEARCH) {
Catalog.getCurrentCatalog().getEsRepository().registerTable((EsTable) table);
}
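Add the table to the Database object
Check whether this is a replay
Write the metadata log
Flow summary:
FE-BE Interaction
FE sends the tasks
BE executes them
BE reports the execution results
FE aggregates the results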
AgentBatchTask batchTask = new AgentBatchTask();
for (Tablet tablet : index.getTablets()) {
long tabletId = tablet.getId();
for (Replica replica : tablet.getReplicas()) {
long backendId = replica.getBackendId();
countDownLatch.addMark(backendId, tabletId);
CreateReplicaTask task = new CreateReplicaTask(backendId, dbId, tableId,
partitionId, indexId, tabletId,
shortKeyColumnCount, schemaHash,
version, versionHash,
keysType,
storageType, storageMedium,
schema, bfColumns, bfFpp,
countDownLatch,
indexes,
isInMemory,
tabletType);
task.setStorageFormat(storageFormat);
batchTask.addTask(task);
// add to AgentTaskQueue for handling finish report.
// not for resending task
AgentTaskQueue.addTask(task);
}
}
AgentTaskExecutor.submit(batchTask);
AgentBatchTask:
Collects tasks and groups them by BE
AgentTaskExecutor:
Sends the AgentBatchTask
AgentTaskQueue:
Handles the reports of finished tasks
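Grouping tasks by BE, as described for AgentBatchTask, amounts to a map from backend id to a task list. The sketch below is a simplified stand-in: BatchSketch and its nested Task class are hypothetical and carry only the two ids needed for grouping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BatchSketch {
    // Hypothetical stand-in for a replica-creation task.
    static final class Task {
        final long backendId;
        final long tabletId;

        Task(long backendId, long tabletId) {
            this.backendId = backendId;
            this.tabletId = tabletId;
        }
    }

    // backendId -> tasks destined for that backend
    private final Map<Long, List<Task>> byBackend = new TreeMap<>();

    void addTask(Task task) {
        byBackend.computeIfAbsent(task.backendId, k -> new ArrayList<>()).add(task);
    }

    Map<Long, List<Task>> grouped() {
        return byBackend;
    }

    public static void main(String[] args) {
        BatchSketch batch = new BatchSketch();
        batch.addTask(new Task(1L, 10L));
        batch.addTask(new Task(2L, 10L));
        batch.addTask(new Task(1L, 11L));
        // One request per backend can now carry all of that backend's tasks.
        batch.grouped().forEach((be, tasks) ->
                System.out.println("backend " + be + " -> " + tasks.size() + " task(s)"));
    }
}
```

Grouping first means the sender can issue one RPC per BE instead of one per replica.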
BE Task Reception
be/src/agent/agent_server.cpp
Receiving tasks
// resend request when something is wrong(BE may need some logic to guarantee idempotence.
void AgentServer::submit_tasks(TAgentResult& agent_result,
const std::vector<TAgentTaskRequest>& tasks) {
Status ret_st;
// TODO check master_info here if it is the same with that of heartbeat rpc
if (_master_info.network_address.hostname == "" || _master_info.network_address.port == 0) {
Status ret_st = Status::Cancelled("Have not get FE Master heartbeat yet");
ret_st.to_thrift(&agent_result.status);
return;
}
for (auto task : tasks) {
VLOG_RPC << "submit one task: " << apache::thrift::ThriftDebugString(task).c_str();
TTaskType::type task_type = task.task_type;
int64_t signature = task.signature;
#define HANDLE_TYPE(t_task_type, work_pool, req_member) \
case t_task_type: \
if (task.__isset.req_member) { \
work_pool->submit_task(task); \
} else { \
ret_st = Status::InvalidArgument(strings::Substitute( \
"task(signature=$0) has wrong request member", signature)); \
} \
break;
...
ret_st.to_thrift(&agent_result.status);
}
Worker thread
while (_is_work) {
TAgentTaskRequest agent_task_req;
TCreateTabletReq create_tablet_req;
{
lock_guard<Mutex> worker_thread_lock(_worker_thread_lock);
while (_is_work && _tasks.empty()) {
_worker_thread_condition_variable.wait();
}
if (!_is_work) {
return;
}
// Take a task from the queue
agent_task_req = _tasks.front();
create_tablet_req = agent_task_req.create_tablet_req;
_tasks.pop_front();
// Execute it
OLAPStatus create_status = _env->storage_engine()->create_tablet(create_tablet_req);
TFinishTaskRequest finish_task_request;
finish_task_request.__set_finish_tablet_infos(finish_tablet_infos);
finish_task_request.__set_backend(_backend);
finish_task_request.__set_report_version(_s_report_version);
finish_task_request.__set_task_type(agent_task_req.task_type);
finish_task_request.__set_signature(agent_task_req.signature);
finish_task_request.__set_task_status(task_status);
// Report the result
_finish_task(finish_task_request);
}
Handling task reports
org.apache.doris.service.FrontendServiceImpl#finishTask
org.apache.doris.master.MasterImpl#finishTask
FE and BE communicate over the Thrift protocol.
Error Handling
org.apache.doris.task.AgentTaskQueue holds the tasks that are currently being executed
org.apache.doris.master.ReportHandler#handleReport
org.apache.doris.master.ReportHandler#taskReport
BE: Report tasks/olap tablet/disk state to the master server
The FE master processes the task report and resends tasks that have timed out.
private static void taskReport(long backendId, Map<TTaskType, Set<Long>> runningTasks) {
...
// to escape sending duplicate agent task to be
if (task.shouldResend(taskReportTime)) {
batchTask.addTask(task);
}
...
}
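One way to picture the resend check is the sketch below. The rule it applies (a task is resent only if the BE did not report it as running and it is older than a minimum age) is an assumption chosen for illustration; the actual condition inside shouldResend is not shown in the excerpt. ResendSketch, PendingTask, and minAgeMs are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class ResendSketch {
    // Hypothetical stand-in for a task tracked on the FE side.
    static final class PendingTask {
        final long signature;
        final long createTimeMs;

        PendingTask(long signature, long createTimeMs) {
            this.signature = signature;
            this.createTimeMs = createTimeMs;
        }

        // Assumed rule: resend only if the task is older than minAgeMs at report time,
        // which avoids sending a duplicate of a task that was dispatched moments ago.
        boolean shouldResend(long reportTimeMs, long minAgeMs) {
            return reportTimeMs - createTimeMs > minAgeMs;
        }
    }

    // Returns the signatures of tracked tasks that the BE did not report as
    // running and that are old enough to be resent.
    static List<Long> toResend(List<PendingTask> tracked, Set<Long> runningOnBe,
                               long reportTimeMs, long minAgeMs) {
        List<Long> result = new ArrayList<>();
        for (PendingTask task : tracked) {
            if (!runningOnBe.contains(task.signature)
                    && task.shouldResend(reportTimeMs, minAgeMs)) {
                result.add(task.signature);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<PendingTask> tracked = List.of(
                new PendingTask(1L, 0L), new PendingTask(2L, 0L), new PendingTask(3L, 9000L));
        System.out.println(toResend(tracked, Set.of(2L), 10000L, 5000L)); // [1]
    }
}
```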
Metadata Persistence
The edit log is similar to a WAL (write-ahead log)
BDBJE serves as the distributed KV store
Metadata persistence: org.apache.doris.catalog.Database#createTableWithLock
public Pair<Boolean, Boolean> createTableWithLock(Table table, boolean isReplay, boolean setIfNotExist) {
...
// Update the in-memory state
nameToTable.put(table.getName(), table);
// Write edit log
// Build the metadata log entry
CreateTableInfo info = new CreateTableInfo(fullQualifiedName, table);
// Write the metadata log entry
Catalog.getCurrentCatalog().getEditLog().logCreateTable(info);
...
}
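Metadata Replay
Metadata replay happens when the FE leader synchronizes metadata to the other FE nodes
Log entries are replayed one by one
The metadata is rebuilt in memory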
org.apache.doris.catalog.Catalog#replayCreateTable
public void replayCreateTable(String dbName, Table table) {
Database db = this.fullNameToDb.get(dbName);
db.createTableWithLock(table, true, false);
...
}
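The interplay between writing the edit log and replaying it can be reduced to a few lines. In this sketch a List of strings stands in for the BDBJE-backed journal, and EditLogSketch, Db, and the "name=id" entry format are assumptions made for illustration. The point it shows is the role of the isReplay flag: a replay updates memory but must not write the log again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EditLogSketch {
    // Hypothetical in-memory catalog: table name -> table id.
    static final class Db {
        final Map<String, Long> nameToTable = new HashMap<>();
        final List<String> editLog; // stands in for the persistent journal

        Db(List<String> editLog) {
            this.editLog = editLog;
        }

        // Updates memory; appends a log entry only when this is not a replay.
        void createTable(String name, long id, boolean isReplay) {
            nameToTable.put(name, id);
            if (!isReplay) {
                editLog.add(name + "=" + id);
            }
        }
    }

    // Rebuilds another node's in-memory state by replaying entries one by one.
    static Db replay(List<String> journal) {
        Db follower = new Db(new ArrayList<>());
        for (String entry : journal) {
            String[] kv = entry.split("=", 2);
            follower.createTable(kv[0], Long.parseLong(kv[1]), true);
        }
        return follower;
    }

    public static void main(String[] args) {
        Db leader = new Db(new ArrayList<>());
        leader.createTable("t1", 10001L, false);
        leader.createTable("t2", 10002L, false);

        Db follower = replay(leader.editLog);
        System.out.println(follower.nameToTable.equals(leader.nameToTable)); // true
        System.out.println(follower.editLog.isEmpty());                      // true
    }
}
```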
How to Implement a New Statement
fe/fe-core/src/main/cup/sql_parser.cup: grammar file
KW_CREATE opt_external:isExternal KW_TABLE opt_if_not_exists:ifNotExists table_name:name
LPAREN column_definition_list:columns COMMA index_definition_list:indexes RPAREN opt_engine:engineName
opt_keys:keys
opt_comment:tableComment
opt_partition:partition
opt_distribution:distribution
opt_rollup:index
opt_properties:tblProperties
opt_ext_properties:extProperties
{:
RESULT = new CreateTableStmt(ifNotExists, isExternal, name, columns, indexes, engineName, keys, partition,
distribution, tblProperties, extProperties, tableComment, index);
:}
fe/fe-core/src/main/jflex/sql_scanner.flex: lexer file
keywordMap.put("create", new Integer(SqlParserSymbols.KW_CREATE));
keywordMap.put("cross", new Integer(SqlParserSymbols.KW_CROSS));
keywordMap.put("cube", new Integer(SqlParserSymbols.KW_CUBE));
keywordMap.put("current", new Integer(SqlParserSymbols.KW_CURRENT));
keywordMap.put("current_user", new Integer(SqlParserSymbols.KW_CURRENT_USER));
keywordMap.put("data", new Integer(SqlParserSymbols.KW_DATA));
keywordMap.put("database", new Integer(SqlParserSymbols.KW_DATABASE));
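How such a keyword map is typically consulted can be sketched as follows. The token ids and the class KeywordLookup are hypothetical (the real ids are generated into SqlParserSymbols), and the case-insensitive lookup is an assumption about how the scanner uses the map.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class KeywordLookup {
    // Hypothetical token ids; the real ones are generated into SqlParserSymbols.
    static final int KW_CREATE = 1;
    static final int KW_DATABASE = 2;
    static final int IDENT = 99;

    private static final Map<String, Integer> KEYWORDS = new HashMap<>();

    static {
        KEYWORDS.put("create", KW_CREATE);
        KEYWORDS.put("database", KW_DATABASE);
    }

    // Assumption: keywords match case-insensitively; any other word is an identifier.
    static int classify(String word) {
        return KEYWORDS.getOrDefault(word.toLowerCase(Locale.ROOT), IDENT);
    }

    public static void main(String[] args) {
        System.out.println(classify("CREATE"));   // 1
        System.out.println(classify("my_table")); // 99
    }
}
```

Under this scheme, adding a keyword for a new statement means adding one map entry in the lexer file and a matching terminal in the grammar file.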
Generating the lexer and parser code:
cd fe/ && mvn clean install -DskipTests
• SqlScanner.java
• SqlParser.java
• SqlParserSymbols.java
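Summary of the steps for implementing a new statement:
- Define the lexer and grammar files
- Implement the corresponding statement class, for example CreateTableStmt
- Implement the method that modifies the metadata, for example Catalog.createTable()
- Define the metadata log class for the operation, for example CreateTableInfo
- Implement writing of the metadata log
- Implement the corresponding replay method, for example Catalog.replayCreateTable()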