Paimon Deletion Vector

Posted by Aitozi on 2024-11-18

Original URL: https://www.cnblogs.com/Aitozi/p/18551580

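A deletion vector uses a set of vectors to track the rows that have been deleted from a file; it can be thought of as a kind of index. With it, merging is done on write (Merge On Write) instead of on read (Merge On Read), trading write performance for read performance. It suits workloads that write rarely and read often, or that have stricter requirements on read performance.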

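Avoiding Merge On Read brings the following benefits:

Reads only need to read the file directly and skip the deleted rows according to the bitmap. Skipping the merge step directly improves read efficiency.
It integrates better with native engines: a C++ reader can read the file directly and apply the vector filter, avoiding a merge that goes through JNI.
Since merging is no longer needed, splits can be cut more finely, which yields more splits and higher read parallelism.
Filters on value fields can be pushed down, which gives better filtering.
- Under MOR, several key-value records for the same key may need to be merged, so a value filter cannot be pushed down before the merge.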
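
To make the last point concrete, here is a minimal sketch of why a value filter cannot be pushed below the merge under MOR. It is not Paimon code: the Row class, the deduplicate-by-sequence merge, and the "value > threshold" filter are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Paimon code: each record is (key, sequence, value).
class ValueFilterPushdownDemo {

    static final class Row {
        final String key;
        final long sequence;
        final int value;

        Row(String key, long sequence, int value) {
            this.key = key;
            this.sequence = sequence;
            this.value = value;
        }
    }

    // Merge on read with deduplicate semantics: keep the row with the highest sequence per key.
    static Map<String, Row> merge(List<Row> rows) {
        Map<String, Row> latest = new LinkedHashMap<>();
        for (Row row : rows) {
            Row current = latest.get(row.key);
            if (current == null || row.sequence > current.sequence) {
                latest.put(row.key, row);
            }
        }
        return latest;
    }

    // Keep only rows whose value is greater than the threshold.
    static List<Row> filter(Iterable<Row> rows, int threshold) {
        List<Row> out = new ArrayList<>();
        for (Row row : rows) {
            if (row.value > threshold) {
                out.add(row);
            }
        }
        return out;
    }

    // Correct order: merge first, then filter on the value.
    static List<Row> mergeThenFilter(List<Row> rows, int threshold) {
        return filter(merge(rows).values(), threshold);
    }

    // Unsafe order: a filter pushed below the merge can resurrect a stale version.
    static List<Row> filterThenMerge(List<Row> rows, int threshold) {
        return new ArrayList<>(merge(filter(rows, threshold)).values());
    }
}
```

With the rows ("k1", seq 1, value 100) and ("k1", seq 2, value 5) and a threshold of 10, merging first yields no result because the latest value is 5, while filtering first drops the latest version and returns the stale value 100. With deletion vectors the stale row is already masked out in the file, so the filter can run directly against the file.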

Write / compaction process

LookupChangelogMergeFunctionWrapper

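Deletion vectors are created during compaction. During compaction, a lookup queries the higher-level files and marks the deleted rows in those files.

Once deletion vectors are enabled, lookup is forced, so the ForceUpLevel0Compaction compaction strategy is used.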

org.apache.paimon.operation.KeyValueFileStoreWrite#createRewriter

// When needLookup is true, use the ForceUpLevel0 compaction mechanism
CompactStrategy compactStrategy =  
        options.needLookup()  
                ? new ForceUpLevel0Compaction(universalCompaction)  
                : universalCompaction;

processor =  
        lookupStrategy.deletionVector  
                ? new PositionedKeyValueProcessor(  
                        valueType,  
                        lookupStrategy.produceChangelog  
                                || mergeEngine != DEDUPLICATE
                                || !options.sequenceField().isEmpty())  
                : new KeyValueProcessor(valueType);

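The processor here determines how the value field is handled during lookup. First, in the deletion vector case, when a key is found we need to know its row number (position), so PositionedKeyValueProcessor is needed to record the row number of the matching KV pair.

Second, in the following three cases the lookup must also read the complete value:

lookupStrategy.produceChangelog: a changelog has to be produced, which requires the previous value, so the complete value must be read.
mergeEngine != DEDUPLICATE
!options.sequenceField().isEmpty(): same as the one above. In these cases, when the L0 key finds a value at a higher level, the higher-level row cannot simply be marked as deleted. A merge has to be executed instead, for example Partial-Update, or a deduplicate that first compares the sequence field, so these cases also need to read the complete value.

Outside these cases, for example deduplicate without a sequence field, the lookup only needs to read the key, which greatly reduces the IO cost of the lookup.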
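
The rule can be restated as a small predicate. This is only a sketch mirroring the boolean expression passed to PositionedKeyValueProcessor in the snippet above; the class name, the method name, and the local MergeEngine enum are stand-ins introduced here, not Paimon's own types.

```java
// Hypothetical restatement of the condition; names are illustrative stand-ins.
class LookupValueRequirement {

    // Local stand-in enum, declared here only to keep the sketch self-contained.
    enum MergeEngine { DEDUPLICATE, PARTIAL_UPDATE, AGGREGATE, FIRST_ROW }

    // Returns true when the lookup has to read the complete value rather than only the key.
    static boolean needsFullValue(
            boolean produceChangelog, MergeEngine engine, boolean hasSequenceField) {
        return produceChangelog              // the previous value is needed for the changelog
                || engine != MergeEngine.DEDUPLICATE  // a real merge must be executed
                || hasSequenceField;         // the winner is decided by comparing sequence fields
    }
}
```

Only plain deduplicate, with no changelog and no sequence field, ends up on the cheap key-only path.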

DeletionVectorsMaintainer

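During lookup, when a key is found at a higher level, the higher-level data can be marked as deleted. DeletionVectorsMaintainer maintains the mapping from file to DeletionVector, and a DeletionVector is usually implemented as a RoaringBitmap.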

if (lookupResult != null) {  
    if (lookupStrategy.deletionVector) {  
        PositionedKeyValue positionedKeyValue = (PositionedKeyValue) lookupResult;  
        highLevel = positionedKeyValue.keyValue();  
        deletionVectorsMaintainer.notifyNewDeletion(  
                positionedKeyValue.fileName(), positionedKeyValue.rowPosition());  
    } else {  
        highLevel = (KeyValue) lookupResult;  
    }  
}
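
The bookkeeping behind notifyNewDeletion can be pictured with the following simplified sketch. It is not the Paimon class: java.util.BitSet is swapped in for RoaringBitmap to keep the example dependency-free, positions are int rather than long, and the modified flag is an assumption about how a maintainer might know that something needs to be persisted.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the idea behind DeletionVectorsMaintainer; not the Paimon class.
class SimpleDeletionVectors {

    // data file name -> bitmap of deleted row positions
    private final Map<String, BitSet> vectors = new HashMap<>();

    // Assumed flag: set once any new deletion has been recorded.
    private boolean modified = false;

    // Mark one row of one data file as deleted.
    void notifyNewDeletion(String fileName, int rowPosition) {
        BitSet bits = vectors.computeIfAbsent(fileName, f -> new BitSet());
        if (!bits.get(rowPosition)) {
            bits.set(rowPosition);
            modified = true;
        }
    }

    // Check whether a row of a data file has been marked as deleted.
    boolean isDeleted(String fileName, int rowPosition) {
        BitSet bits = vectors.get(fileName);
        return bits != null && bits.get(rowPosition);
    }

    boolean isModified() {
        return modified;
    }
}
```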

As described in PIP-16, each bucket maintains one deletion vector file, which holds every data file that has deleted keys together with its bitmap.

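Synchronous generation: after compaction finishes, the in-memory Map<String, DeletionVector> is written to the corresponding index file. This means a deletion vector is kept in memory for every file that has deletions, and it is also restored from metadata at startup. With many files, the memory overhead may not be negligible. In addition, because everything is kept in one Map, the whole index file has to be rewritten as soon as a single file is updated.
Asynchronous generation: deletion vector generation can also run asynchronously, so it does not block the main write path.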
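
The "one index file per bucket" cost can be sketched as follows. The byte layout here is made up for illustration and is not the on-disk format defined by PIP-16; BitSet again stands in for the real bitmap type. The point is only that the serialized unit is the whole map, so a change to one entry means writing all entries again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a made-up layout, not Paimon's actual index file format.
class BucketIndexFileSketch {

    // Layout: entry count, then (file name, bitmap byte length, bitmap bytes) per entry.
    static byte[] write(Map<String, BitSet> vectors) {
        Map<String, BitSet> sorted = new TreeMap<>(vectors);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(sorted.size());
            for (Map.Entry<String, BitSet> entry : sorted.entrySet()) {
                byte[] bitmap = entry.getValue().toByteArray();
                out.writeUTF(entry.getKey());
                out.writeInt(bitmap.length);
                out.write(bitmap);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Restore the full map, as would happen when a writer starts up.
    static Map<String, BitSet> read(byte[] data) {
        Map<String, BitSet> vectors = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String fileName = in.readUTF();
                byte[] bitmap = new byte[in.readInt()];
                in.readFully(bitmap);
                vectors.put(fileName, BitSet.valueOf(bitmap));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return vectors;
    }
}
```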

Query / read

RawSplitRead

KeyValueTableRead

this.readProviders =
		Arrays.asList(
				new RawFileSplitReadProvider(batchRawReadSupplier, this::assignValues),
				new MergeFileSplitReadProvider(mergeReadSupplier, this::assignValues),
				new IncrementalChangelogReadProvider(mergeReadSupplier, this::assignValues),
				new IncrementalDiffReadProvider(mergeReadSupplier, this::assignValues));

KeyValueTableRead creates a list of SplitReadProviders, and the read goes through whichever one matches.

public boolean match(DataSplit split, boolean forceKeepDelete) {
	boolean matched = !forceKeepDelete && !split.isStreaming() && split.rawConvertible();
	if (matched) {
		// for legacy version, we are not sure if there are delete rows, but in order to be
		// compatible with the query acceleration of the OLAP engine, we have generated raw
		// files.
		// Here, for the sake of correctness, we still need to perform drop delete filtering.
		for (DataFileMeta file : split.dataFiles()) {
			if (!file.deleteRowCount().isPresent()) {
				return false;
			}
		}
	}
	return matched;
}

For a dv table, its splits are rawConvertible, meaning the corresponding reader can be converted into a raw reader.
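
The "whichever provider matches first" dispatch can be sketched like this. Only the raw condition follows the match method shown above, and even that is reduced to two flags; the Split and Provider classes and the conditions of the other two providers are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Illustrative first-match dispatch; not the Paimon SplitReadProvider API.
class FirstMatchDispatch {

    // Hypothetical, trimmed-down split descriptor.
    static final class Split {
        final boolean streaming;
        final boolean rawConvertible;

        Split(boolean streaming, boolean rawConvertible) {
            this.streaming = streaming;
            this.rawConvertible = rawConvertible;
        }
    }

    // A provider is a name plus a predicate deciding whether it can read a split.
    static final class Provider {
        final String name;
        final Predicate<Split> match;

        Provider(String name, Predicate<Split> match) {
            this.name = name;
            this.match = match;
        }
    }

    // Order matters: the cheapest read path is listed first.
    static final List<Provider> PROVIDERS = Arrays.asList(
            new Provider("raw", s -> !s.streaming && s.rawConvertible),
            new Provider("merge", s -> !s.streaming),      // assumed condition
            new Provider("incremental", s -> s.streaming)); // assumed condition

    // Return the name of the first provider whose predicate accepts the split.
    static String choose(Split split) {
        for (Provider provider : PROVIDERS) {
            if (provider.match.test(split)) {
                return provider.name;
            }
        }
        throw new IllegalStateException("no provider matched");
    }
}
```

Under these assumptions, a batch split that is rawConvertible takes the raw path, and a batch split that is not falls through to the merge path.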

ApplyDeletionFileRecordIterator

public InternalRow next() throws IOException {
	while (true) {
		InternalRow next = iterator.next();
		if (next == null) {
			return null;
		}
		if (!deletionVector.isDeleted(returnedPosition())) {
			return next;
		}
	}
}

The actual read simply filters rows by row number against the deletion vector that was loaded in advance.
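
A self-contained sketch of the same filtering idea is shown below. It is not ApplyDeletionFileRecordIterator: it uses java.util.Iterator and java.util.BitSet as stand-ins for Paimon's record iterator and DeletionVector, and the readAll helper is added only for demonstration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Illustrative only: BitSet stands in for Paimon's DeletionVector.
class DeletionFilteringIterator<T> implements Iterator<T> {

    private final Iterator<T> rows;   // rows of one data file, in file order
    private final BitSet deleted;     // bit i set means row i is deleted
    private int position = 0;         // position of the next row pulled from the file
    private T pending;                // next surviving row, once found
    private boolean hasPending = false;

    DeletionFilteringIterator(Iterator<T> rows, BitSet deleted) {
        this.rows = rows;
        this.deleted = deleted;
    }

    @Override
    public boolean hasNext() {
        // Advance until a row that is not marked as deleted is found.
        while (!hasPending && rows.hasNext()) {
            T row = rows.next();
            if (!deleted.get(position++)) {
                pending = row;
                hasPending = true;
            }
        }
        return hasPending;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        hasPending = false;
        T row = pending;
        pending = null;
        return row;
    }

    // Convenience helper: read a whole "file" and return the surviving rows.
    static <T> List<T> readAll(List<T> fileRows, BitSet deleted) {
        List<T> out = new ArrayList<>();
        Iterator<T> it = new DeletionFilteringIterator<>(fileRows.iterator(), deleted);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

For a file with rows a, b, c, d and positions 1 and 3 marked as deleted, readAll returns a and c. No key comparison or merge is involved, which is what makes this path cheap.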

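There are some other read-side changes, mostly related to filter pushdown. Once files can be raw read and no longer need merging, filters on non-primary-key fields can be pushed down safely.

For example, a table with dv enabled can apply additional value filters, and can therefore also make use of the index mechanism.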

Append Table DV support

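In addition, Paimon uses deletion vectors to support deletes on Append tables.

Deletes on append tables are comparable to Iceberg's implementation: deletion vectors are built from the input data, which implements the delete logic for append tables.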

  if (deletionVectorsEnabled) {
	// Step2: collect all the deletion vectors that marks the deleted rows.
	val deletionVectors = collectDeletionVectors(
	  candidateDataSplits,
	  dataFilePathToMeta,
	  condition,
	  relation,
	  sparkSession)

	deletionVectors.cache()
	try {
	  // Step3: write these updated data
	  val touchedDataSplits = deletionVectors.collect().map {
		SparkDeletionVectors.toDataSplit(_, root, pathFactory, dataFilePathToMeta)
	  }
	  val addCommitMessage = writeOnlyUpdatedData(sparkSession, touchedDataSplits)

	  // Step4: write these deletion vectors.
	  val indexCommitMsg = writer.persistDeletionVectors(deletionVectors)

	  addCommitMessage ++ indexCommitMsg
	} finally {
	  deletionVectors.unpersist()
	}
  } else {

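Apply the filter to obtain the splits that the update or delete statement may affect.
Build a reader to read them. The read plan adds two metadata columns, __paimon_file_path and __paimon_row_index, which the deletion vector construction above depends on.
Build the deletion vectors (indexCommitMsg) from the update input, and build addCommitMessage from the update output.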
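
The step that turns matched rows into deletion vectors can be sketched in plain Java as a group-by on the file path. This is not the Spark implementation above: MatchedRow and collectDeletionVectors are hypothetical names for this example, BitSet stands in for the real bitmap, and the row index is narrowed to int.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: groups (file path, row index) pairs into one bitmap per data file.
class AppendTableDeleteSketch {

    // One row that matched the DELETE or UPDATE condition.
    static final class MatchedRow {
        final String filePath; // value of the __paimon_file_path metadata column
        final int rowIndex;    // value of the __paimon_row_index metadata column

        MatchedRow(String filePath, int rowIndex) {
            this.filePath = filePath;
            this.rowIndex = rowIndex;
        }
    }

    // Build one deletion bitmap per touched data file.
    static Map<String, BitSet> collectDeletionVectors(List<MatchedRow> matched) {
        Map<String, BitSet> result = new TreeMap<>();
        for (MatchedRow row : matched) {
            result.computeIfAbsent(row.filePath, path -> new BitSet()).set(row.rowIndex);
        }
        return result;
    }
}
```

Files that contain no matching row never appear in the result, so only the touched files get a new or updated deletion vector.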
