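When leveldb inserts data, it always appends to the log file first and then puts the data into the cache (the in-memory memtable, mem_ in the code).
Before that, though, it does some preprocessing:
1 Wrap the data to be written in a Writer, append the Writer to the write queue, and wait for its turn to write.
2 Check whether the cache is full and whether a "checkpoint" is needed, that is, whether the cache has to be flushed to a file.
3 The leveldb cache has two states: current and read-only.
When the cache fills up and has to be written to a file, it is switched to the read-only state, and the file write and compaction run against it.
So before each such switch, the write must wait until the previous read-only cache has finished its job.
4 New data has to be written to a level 0 file, and the number of level 0 files is limited.
On reaching the soft limit, the write sleeps for 1 ms to yield CPU to the compaction in progress.
On reaching the hard limit, it goes straight into waiting.
5 Maintain the cache state, maintain the file number, create the new file, and so on.
6 Try to schedule one compaction.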
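
The checks in steps 2 to 5 can be read as one decision ladder. Below is a minimal sketch of that ordering, not leveldb's code: WriteState, WriteAction and NextAction are hypothetical names, the thresholds 8 and 12 are assumed to match the defaults of kL0_SlowdownWritesTrigger and kL0_StopWritesTrigger, and the forced-compaction and background-error cases are left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the checks a write goes through before it may proceed.
enum class WriteAction {
  kProceed,           // cache has room: append to the log, then insert
  kSleepOneMs,        // soft level 0 limit reached: yield once to compaction
  kWaitForImmutable,  // previous read-only cache is still being flushed
  kWaitForL0,         // hard level 0 limit reached
  kSwitchMemtable     // freeze the current cache and start a new log file
};

struct WriteState {
  std::size_t mem_usage;          // bytes used by the current cache
  std::size_t write_buffer_size;  // configured cache capacity in bytes
  bool has_immutable;             // a read-only cache still exists
  int l0_files;                   // number of level 0 files
  bool already_delayed;           // this write has already slept once
};

// Assumed defaults, see the lead-in.
constexpr int kSlowdownTrigger = 8;
constexpr int kStopTrigger = 12;

WriteAction NextAction(const WriteState& s) {
  assert(s.write_buffer_size > 0);
  if (!s.already_delayed && s.l0_files >= kSlowdownTrigger) {
    return WriteAction::kSleepOneMs;
  }
  if (s.mem_usage <= s.write_buffer_size) return WriteAction::kProceed;
  if (s.has_immutable) return WriteAction::kWaitForImmutable;
  if (s.l0_files >= kStopTrigger) return WriteAction::kWaitForL0;
  return WriteAction::kSwitchMemtable;
}
```

The order matters: the soft limit is checked before the "is there room" test, so even a write that fits into the cache is slowed down once when level 0 is getting crowded.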

The public Put function


Status DBImpl::Put(const WriteOptions& o, const Slice& key, const Slice& val) {
  return DB::Put(o, key, val);
}

Status DB::Put(const WriteOptions& opt, const Slice& key, const Slice& value) {
  WriteBatch batch;
  batch.Put(key, value);
  return Write(opt, &batch);
}
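
Put does nothing more than wrap one key/value pair in a WriteBatch, so it helps to picture what a batch holds. The sketch below is a simplified stand-in, not leveldb's class: MiniBatch is a hypothetical name, and the layout (an 8-byte sequence number, a 4-byte little-endian count, then one type byte plus a length-prefixed key and value per record) is an assumption that should be checked against write_batch.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Simplified, hypothetical stand-in for WriteBatch: everything lives in one
// string, rep_, already encoded in the form that goes to the log file.
class MiniBatch {
 public:
  MiniBatch() : rep_(kHeader, '\0') {}

  // Appends one "put" record and bumps the record count in the header.
  void Put(const std::string& key, const std::string& value) {
    SetCount(Count() + 1);
    rep_.push_back(static_cast<char>(kTypeValue));
    PutLengthPrefixed(key);
    PutLengthPrefixed(value);
  }

  // Reads the 4-byte little-endian count stored after the sequence number.
  uint32_t Count() const {
    uint32_t n = 0;
    for (int i = 0; i < 4; ++i) {
      n |= static_cast<uint32_t>(static_cast<unsigned char>(rep_[8 + i]))
           << (8 * i);
    }
    return n;
  }

  const std::string& rep() const { return rep_; }

 private:
  static constexpr std::size_t kHeader = 12;  // 8-byte sequence + 4-byte count
  static constexpr unsigned char kTypeValue = 1;

  void SetCount(uint32_t n) {
    for (int i = 0; i < 4; ++i) {
      rep_[8 + i] = static_cast<char>((n >> (8 * i)) & 0xff);
    }
  }

  // Writes the length as a varint (7 bits per byte), then the bytes.
  void PutLengthPrefixed(const std::string& s) {
    uint32_t len = static_cast<uint32_t>(s.size());
    while (len >= 0x80) {
      rep_.push_back(static_cast<char>((len & 0x7f) | 0x80));
      len >>= 7;
    }
    rep_.push_back(static_cast<char>(len));
    rep_.append(s);
  }

  std::string rep_;
};
```

Because the batch is already a flat byte string, writing it to the log and merging several batches into one both reduce to appending bytes.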

The function that does the real work


Status DBImpl::Write(const WriteOptions& options, WriteBatch* my_batch) {
  // Wrap the WriteBatch in a Writer and set a few options.
  Writer w(&mutex_);
  w.batch = my_batch;
  w.sync = options.sync;
  w.done = false;

  MutexLock l(&mutex_);
  /*
    Append the Writer to the write queue and wait until it reaches
    writers_.front(). If another writer has already written this batch
    as part of its group, w.done is set and we simply return.
  */
  writers_.push_back(&w);
  while (!w.done && &w != writers_.front()) {
    w.cv.Wait();
  }
  if (w.done) {
    return w.status;
  }

  // May temporarily unlock and wait.
  /*
    MakeRoomForWrite checks a series of conditions the write depends on, e.g.:
      whether the number of level 0 files exceeds its limits,
      whether the cache still has room, i.e. whether a file write is needed at all,
      whether the imm cache can be released,
      whether a compaction is needed.
    Once a file write is confirmed to be necessary, it:
      allocates a new file number, creates a new logfile, and turns the
      current cache into the imm cache.
    The argument my_batch == NULL means: an empty my_batch is taken as a
    request for MakeRoomForWrite to try a compaction.
  */
  Status status = MakeRoomForWrite(my_batch == NULL);
  // The sequence number advances once per key operation in the batch
  // (see Count(updates) below), not once per batch.
  uint64_t last_sequence = versions_->LastSequence();
  Writer* last_writer = &w;
  /*
    my_batch may be NULL: an empty batch is a way to make
    MakeRoomForWrite run a compaction manually.
  */
  if (status.ok() && my_batch != NULL) {  // NULL batch is for compactions
    /*
      A WriteBatch holds a string, rep_, with the data already encoded in
      its storage format.
      BuildBatchGroup looks for other WriteBatches in writers_ and
      concatenates their rep_ into a single WriteBatch,
      but the resulting rep_ must not grow beyond 1 << 20 bytes.
    */
    WriteBatch* updates = BuildBatchGroup(&last_writer);
    // WriteBatchInternal is a utility class made up of static functions.
    WriteBatchInternal::SetSequence(updates, last_sequence + 1);
    last_sequence += WriteBatchInternal::Count(updates);

    // Add to log and apply to memtable. We can release the lock
    // during this phase since &w is currently responsible for logging
    // and protects against concurrent loggers and concurrent writes
    // into mem_.
    {
      mutex_.Unlock();
      // Write the logfile.
      status = log_->AddRecord(WriteBatchInternal::Contents(updates));
      bool sync_error = false;
      if (status.ok() && options.sync) {
        status = logfile_->Sync();
        if (!status.ok()) {
          sync_error = true;
        }
      }
      if (status.ok()) {
        // Put the data into the cache.
        status = WriteBatchInternal::InsertInto(updates, mem_);
      }
      mutex_.Lock();
      if (sync_error) {
        // The state of the log file is indeterminate: the log record we
        // just added may or may not show up when the DB is re-opened.
        // So we force the DB into a mode where all future writes fail.
        RecordBackgroundError(status);
      }
    }
    if (updates == tmp_batch_) tmp_batch_->Clear();

    // Update the sequence number.
    versions_->SetLastSequence(last_sequence);
  }

  // Pop every writer that was part of this group and wake it up.
  while (true) {
    Writer* ready = writers_.front();
    writers_.pop_front();
    if (ready != &w) {
      ready->status = status;
      ready->done = true;
      ready->cv.Signal();
    }
    if (ready == last_writer) break;
  }

  // Notify new head of write queue
  if (!writers_.empty()) {
    writers_.front()->cv.Signal();
  }

  return status;
}
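
To make the grouping step concrete, here is a small model of how the writer at the head of the queue could pick up the batches queued behind it. It is a sketch under assumptions, not BuildBatchGroup itself: PendingWrite, Group, BuildGroup and AdvanceSequence are hypothetical names, the size cap defaults to the 1 << 20 mentioned above, and the rule that a non-sync leader does not absorb a sync write is assumed from the upstream source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for a queued Writer.
struct PendingWrite {
  std::size_t bytes;  // size of the batch's encoded rep_
  uint32_t count;     // number of key operations in the batch
  bool sync;          // whether the caller asked for a synced write
};

struct Group {
  std::size_t writers;  // queue entries merged, leader included
  std::size_t bytes;    // total encoded size of the merged batch
  uint32_t count;       // total key operations in the merged batch
};

// Merges entries from the front of the queue until the size cap would be
// exceeded or a sync write follows a non-sync leader.
Group BuildGroup(const std::deque<PendingWrite>& queue,
                 std::size_t max_bytes = 1 << 20) {
  assert(!queue.empty());
  const PendingWrite& first = queue.front();
  Group g{1, first.bytes, first.count};
  for (std::size_t i = 1; i < queue.size(); ++i) {
    const PendingWrite& w = queue[i];
    if (w.sync && !first.sync) break;           // do not weaken a sync write
    if (g.bytes + w.bytes > max_bytes) break;   // keep the group bounded
    g.bytes += w.bytes;
    g.count += w.count;
    ++g.writers;
  }
  return g;
}

// The last sequence number moves forward by the number of key operations.
uint64_t AdvanceSequence(uint64_t last_sequence, const Group& g) {
  return last_sequence + g.count;
}
```

With a queue of two non-sync writes followed by a sync write, the leader takes only the first two, and a group of three key operations moves the last sequence number from 41 to 44.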

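MakeRoom is needed because new data eventually has to be written to a level 0 data file, and the number of level 0 files is limited.
A compaction may be needed to bring the number of level 0 files down.
The current cache also has to become the imm cache, so it must be checked whether the previous imm cache is still occupying that slot.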


Status DBImpl::MakeRoomForWrite(bool force) {
  mutex_.AssertHeld();
  assert(!writers_.empty());
  // Decides whether this write may sleep to give system resources to compaction.
  bool allow_delay = !force;
  Status s;
  while (true) {
    if (!bg_error_.ok()) {
      // Yield previous error
      s = bg_error_;
      break;
    } else if (allow_delay &&
               versions_->NumLevelFiles(0) >= config::kL0_SlowdownWritesTrigger) {
      /*
        Delay is allowed and the number of level 0 files has reached 8:
        sleep for 1 ms to give the compaction thread some CPU.
        After sleeping once, allow_delay is set to false, so this write
        will not sleep again.
      */
      // We are getting close to hitting a hard limit on the number of
      // L0 files. Rather than delaying a single write by several
      // seconds when we hit the hard limit, start delaying each
      // individual write by 1ms to reduce latency variance. Also,
      // this delay hands over some CPU to the compaction thread in
      // case it is sharing the same core as the writer.
      mutex_.Unlock();
      env_->SleepForMicroseconds(1000);
      allow_delay = false;  // Do not delay a single write more than once
      mutex_.Lock();
    } else if (!force &&
               (mem_->ApproximateMemoryUsage() <= options_.write_buffer_size)) {
      // While the cache is not full, do not write a file yet.
      // There is room in current memtable
      break;
    } else if (imm_ != NULL) {
      /*
        leveldb has two caches. One is the current cache, the one new data
        is being written into.
        When it fills up and has to be written to a file, the current cache
        becomes the immutable cache, a read-only cache managed through the
        pointer imm_.
        The imm cache serves reads and is the input of the compaction.
        If an imm cache still exists, we must wait until its file has been
        written before the current cache can become the imm cache.
      */
      // We have filled up the current memtable, but the previous
      // one is still being compacted, so we wait.
      Log(options_.info_log, "Current memtable full; waiting...\n");
      bg_cv_.Wait();
    } else if (versions_->NumLevelFiles(0) >= config::kL0_StopWritesTrigger) {
      // The hard limit on level 0 files has been reached; no new file may be written.
      // There are too many level-0 files.
      Log(options_.info_log, "Too many L0 files; waiting...\n");
      bg_cv_.Wait();
    } else {
      // All checks passed; the real work starts here.
      // Attempt to switch to a new memtable and trigger compaction of old
      assert(versions_->PrevLogNumber() == 0);
      // Allocate a new logfile number.
      uint64_t new_log_number = versions_->NewFileNumber();
      WritableFile* lfile = NULL;
      // Create the new file.
      s = env_->NewWritableFile(LogFileName(dbname_, new_log_number), &lfile);
      if (!s.ok()) {
        // Avoid chewing through file number space in a tight loop.
        versions_->ReuseFileNumber(new_log_number);
        break;
      }
      delete log_;
      delete logfile_;
      // Point the logfile at the new file and record the new log number.
      logfile_ = lfile;
      logfile_number_ = new_log_number;
      log_ = new log::Writer(lfile);
      // Turn the current cache into the imm cache and create a new current cache.
      imm_ = mem_;
      has_imm_.Release_Store(imm_);
      mem_ = new MemTable(internal_comparator_);
      mem_->Ref();
      force = false;  // Do not force another compaction if have room
      /*
        Schedule a compaction if one is needed.
        This call does a few checks and then hands a callback to the
        background thread;
        the function that ends up doing the work is
        DBImpl::BackgroundCompaction().
      */
      MaybeScheduleCompaction();
    }
  }
  return s;
}
