Folly解讀（零） Fbstring—— 一個完美替代std::string的庫

張雅宸發表於2021-02-01

原文網址 : https://www.cnblogs.com/zhangyachen/p/14356815.html

在引入fbstring之前，我們首先再回顧一下 string 常見的三種實現方式。

string 常見的三種實現方式

string 中比較重要的 3 個欄位：

char *data. 指向存放字串的首地址（在 SSO 的某些實現方案中可能沒有此欄位）。
size_t size. 字串長度。
size_t capacity. 字串容量。capacity >= size. 在字串相加、reserve 等場景下會用到此欄位。

eager copy

這個是最簡單、最好理解的一種，在每次拷貝時將原 string 對應的記憶體以及所持有的動態資源完整地複製一份，即沒有任何特殊處理。

優點：

實現簡單。
每個物件互相獨立，不用考慮那麼多亂七八糟的場景。

缺點：

字串較大時，拷貝浪費空間。

COW

這個也算是計算機裡的基本思想了。不同於 eager copy 的每次拷貝都會複製，此種實現方式為寫時複製，即 copy-on-write。只有在某個 string 要對共享物件進行修改時，才會真正執行拷貝。

由於存在共享機制，所以需要一個std::atomic<size_t>，代表被多少物件共享。

寫時複製：

優點：

字串空間較大時，減少了分配、複製字串的時間。

缺點：

refcount 需要原子操作，效能有損耗。
某些情況下會帶來意外的開銷。比如非 const 成員使用[]，這會觸發 COW，因為無法知曉應用程式是否會對返回的字元做修改。典型的如Legality of COW std::string implementation in C++11中舉的例子：

std::string s("str");
const char* p = s.data();
{
    std::string s2(s);
    (void) s[0];         // 觸發COW
}
std::cout << *p << '\n';      // p指向的原有空間已經無效

其他更細節的缺點可以參考：std::string 的 Copy-on-Write：不如想象中美好

SSO

Small String Optimization. 基於字串大多數比較短的特點，利用 string 物件本身的棧空間來儲存短字串。而當字串長度大於某個臨界值時，則使用 eager copy 的方式。

SSO 下，string 的資料結構會稍微複雜點，使用 union 來區分短字串和長字串的場景：

class string {
  char *start;
  size_t size;
  static const int kLocalSize = 15;
  union{
    char buffer[kLocalSize+1];      // 滿足條件時，用來存放短字串
    size_t capacity;
  }data;
};

短字串，SSO：

長字串，eager copy：

這種資料結構的實現下，SSO 的閾值一般是 15 位元組。folly 的 fbstring 在 SSO 場景下，資料結構做了一些優化，可以儲存 23 個位元組，後面會提到。

優點：

短字串時，無動態記憶體分配。

缺點：

string 物件佔用空間比 eager copy 和 cow 要大。

Fbstring 介紹

fbstring 可以 100%相容 std::string。配合三種儲存策略和jemalloc，可以顯著的提高 string 的效能。

fbstring 支援 32-bit、64-bit、little-endian、big-endian.

Storage strategies

Small Strings (<= 23 chars) ，使用 SSO.
Medium strings (24 - 255 chars)，使用 eager copy.
Large strings (> 255 chars)，使用 COW.

Implementation highlights

與 std::string 100%相容。
COW 儲存時對於引用計數執行緒安全。
對 Jemalloc 友好。如果檢測到使用 jemalloc，那麼將使用 jemalloc 的一些非標準擴充套件介面來提高效能。
find()使用簡化版的Boyer-Moore algorithm。在查詢成功的情況下，相對於string::find()有 30 倍的效能提升。在查詢失敗的情況下也有 1.5 倍的效能提升。
可以與 std::string 互相轉換。

Benchmark

在FBStringBenchmark.cpp中。

主要類

::folly::fbstring str("abc")中的 fbstring 為 basic_fbstring的別名：typedef basic_fbstring<char> fbstring;
basicfbstring 在 fbstring_core 提供的介面之上，實現了 std::string 定義的所有介面。裡面有一個私有變數 store，預設值即為 fbstring_core。basic_fbstring 的定義如下，比 std::basic_string 只多了一個預設的模板引數 Storage：

template <
    typename E,
    class T = std::char_traits<E>,
    class A = std::allocator<E>,
    class Storage = fbstring_core<E>>
class basic_fbstring;

fbstring_core 負責字串的儲存及字串相關的操作，例如 init、copy、reserve、shrink 等等。

字串儲存資料結構

最重要的 3 個資料結構 union{Char small, MediumLarge ml}、MediumLarge、RefCounted，定義在 fbstring_core 中，基本上所有的字串操作都離不開這三個資料結構。


struct RefCounted {
    std::atomic<size_t> refCount_;
    Char data_[1];

    static RefCounted * create(size_t * size);       // 建立一個RefCounted
    static RefCounted * create(const Char * data, size_t * size);     // ditto
    static void incrementRefs(Char * p);     // 增加一個引用
    static void decrementRefs(Char * p);    // 減少一個引用

   // 其他函式定義
};

struct MediumLarge {
  Char* data_;
  size_t size_;
  size_t capacity_;

  size_t capacity() const {
    return kIsLittleEndian ? capacity_ & capacityExtractMask : capacity_ >> 2;
  }

  void setCapacity(size_t cap, Category cat) {
    capacity_ = kIsLittleEndian
        ? cap | (static_cast<size_t>(cat) << kCategoryShift)
        : (cap << 2) | static_cast<size_t>(cat);
  }
};

union {
    uint8_t bytes_[sizeof(MediumLarge)];          // For accessing the last byte.
    Char small_[sizeof(MediumLarge) / sizeof(Char)];
    MediumLarge ml_;
};

small strings（SSO）時，使用 union 中的 Char small_儲存字串，即物件本身的棧空間。
medium strings（eager copy）時，使用 union 中的MediumLarge ml_
- Char* data_ ：指向分配在堆上的字串。
- sizet size：字串長度。
- sizet capacity ：字串容量。
large strings（cow）時，使用MediumLarge ml_和 RefCounted：
- RefCounted.refCount_ ：共享字串的引用計數。
- RefCounted.data_[1] : flexible array. 存放字串。
- ml.data指向 RefCounted.data，ml.size與 ml.capacity_的含義不變。

但是這裡有一個問題是：SSO 情況下的 size 和 capacity 存在哪裡了？

capacity : 首先 SSO 的場景並不需要 capacity，因為此時利用的是棧空間，或者理解此種情況下的 capacity=maxSmallSize.
size : 利用 small_的一個位元組來儲存 size，而且具體儲存的不是 size，而是maxSmallSize - s（maxSmallSize=23，再轉成 char 型別），因為這樣可以 SSO 多儲存一個位元組，具體原因後面詳細講。

small strings :

medium strings :

large strings :

如何區分字串型別 category

字串的 small/medium/large 型別對外部透明，而且針對字串的各種操作例如 copy、shrink、reserve、賦值等等，三種型別的處理方式都不一樣，所以，我們需要在上面的資料結構中做些“手腳”，來區分不同的字串型別。

因為只有三種型別，所以只需要 2 個 bit 就能夠區分。相關的資料結構為：

typedef uint8_t category_type;

enum class Category : category_type {
    isSmall = 0,
    isMedium = kIsLittleEndian ? 0x80 : 0x2,       //  10000000 , 00000010
    isLarge = kIsLittleEndian ? 0x40 : 0x1,        //  01000000 , 00000001
};

kIsLittleEndian 為判斷當前平臺的大小端，大端和小端的儲存方式不同。

small strings

category 與 size 共同存放在 small_的最後一個位元組中（size 最大為 23，所以可以存下），考慮到大小端，所以有移位操作，這主要是為了讓 category()的判斷更簡單，後面再詳細分析。具體程式碼在 setSmallSize 中：

void setSmallSize(size_t s) {
  ......
  constexpr auto shift = kIsLittleEndian ? 0 : 2;
  small_[maxSmallSize] = char((maxSmallSize - s) << shift);
  ......
}

medium strings

可能有人注意到了，在 MediumLarge 結構體中定義了兩個方法，capacity()和setCapacity(size_t cap, Category cat)，其中 setCapacity 即同時設定 capacity 和 category :

constexpr static size_t kCategoryShift = (sizeof(size_t) - 1) * 8;

void setCapacity(size_t cap, Category cat) {
    capacity_ = kIsLittleEndian
        ? cap | (static_cast<size_t>(cat) << kCategoryShift)
        : (cap << 2) | static_cast<size_t>(cat);
}

小端時，將 category = isMedium = 0x80 向左移動(sizeof(size_t) - 1) * 8位，即移到最高位的位元組中，再與 capacity 做或運算。
大端時，將 category = isMedium = 0x2 與 cap << 2 做或運算即可，左移 2 位的目的是給 category 留空間。

舉個例子，假設 64 位機器，capacity = 100 :

large strings

同樣使用 MediumLarge 的 setCapacity，演算法相同，只是 category 的值不同。

假設 64 位機器，capacity = 1000 ：

category()

category()為最重要的函式之一，作用是獲取字串的型別：

constexpr static uint8_t categoryExtractMask = kIsLittleEndian ? 0xC0 : 0x3;    // 11000000 , 00000011
constexpr static size_t lastChar = sizeof(MediumLarge) - 1;

union {
    uint8_t bytes_[sizeof(MediumLarge)];          // For accessing the last byte.
    Char small_[sizeof(MediumLarge) / sizeof(Char)];
    MediumLarge ml_;
};

Category category() const {
  // works for both big-endian and little-endian
  return static_cast<Category>(bytes_[lastChar] & categoryExtractMask);
}

bytes_定義在 union 中，從註釋可以看出來，是為了配合 lastChar 更加方便的取該結構最後一個位元組。

配合上面三種型別字串的儲存，可以很容易理解這一行程式碼。

小端

大端

capacity()

獲取字串的 capaticy，因為 capacity 與 category 儲存都在一起，所以一起看比較好。

同樣分三種情況。

size_t capacity() const {
  switch (category()) {
    case Category::isSmall:
      return maxSmallSize;
    case Category::isLarge:
      // For large-sized strings, a multi-referenced chunk has no
      // available capacity. This is because any attempt to append
      // data would trigger a new allocation.
      if (RefCounted::refs(ml_.data_) > 1) {
        return ml_.size_;
      }
      break;
    case Category::isMedium:
    default:
      break;
  }
  return ml_.capacity();
}

small strings : 直接返回 maxSmallSize，前面有分析過。
medium strings : 返回 ml_.capacity()。
large strings :
- 當字串引用大於 1 時，直接返回 size。因為此時的 capacity 是沒有意義的，任何 append data 操作都會觸發一次 cow
- 否則，返回 ml_.capacity()。

看下 ml.capacity() :

constexpr static uint8_t categoryExtractMask = kIsLittleEndian ? 0xC0 : 0x3;
constexpr static size_t kCategoryShift = (sizeof(size_t) - 1) * 8;
constexpr static size_t capacityExtractMask = kIsLittleEndian
      ? ~(size_t(categoryExtractMask) << kCategoryShift)
      : 0x0 /* unused */;

size_t capacity() const {
      return kIsLittleEndian ? capacity_ & capacityExtractMask : capacity_ >> 2;
}

categoryExtractMask 和 kCategoryShift 之前遇到過，分別用來計算 category 和小端情況下將 category 左移 kCategoryShift 位。capacityExtractMask 的目的就是消掉 category，讓 capacity_中只有 capacity。

對著上面的每種情況下字串的儲存的圖，應該很好理解，這裡不細說了。

size()

size_t size() const {
  size_t ret = ml_.size_;
  if /* constexpr */ (kIsLittleEndian) {
    // We can save a couple instructions, because the category is
    // small iff the last char, as unsigned, is <= maxSmallSize.
    typedef typename std::make_unsigned<Char>::type UChar;
    auto maybeSmallSize = size_t(maxSmallSize) -
        size_t(static_cast<UChar>(small_[maxSmallSize]));
    // With this syntax, GCC and Clang generate a CMOV instead of a branch.
    ret = (static_cast<ssize_t>(maybeSmallSize) >= 0) ? maybeSmallSize : ret;
  } else {
    ret = (category() == Category::isSmall) ? smallSize() : ret;
  }
  return ret;
}

小端的情況下，medium strings 和 large strings 對應的 ml_的高位元組儲存的是 category（0x80、0x40），而 small strings 儲存的是 size，所以正如註釋說的，可以先判斷 kIsLittleEndian && maybeSmall，會快一些，不需要呼叫 smallSize()。而且現在絕大多數平臺都是小端。

如果是大端，那麼如果是 small，呼叫 smallSize()，否則返回 ml.size_;

size_t smallSize() const {
  assert(category() == Category::isSmall);
  constexpr auto shift = kIsLittleEndian ? 0 : 2;
  auto smallShifted = static_cast<size_t>(small_[maxSmallSize]) >> shift;
  assert(static_cast<size_t>(maxSmallSize) >= smallShifted);
  return static_cast<size_t>(maxSmallSize) - smallShifted;
}

比較簡單，不說了。

字串初始化

首先 fbstring_core 的建構函式中，根據字串的長度，呼叫 3 種不同型別的初始化函式：

fbstring_core(
    const Char* const data,
    const size_t size,
    bool disableSSO = FBSTRING_DISABLE_SSO) {
  if (!disableSSO && size <= maxSmallSize) {
    initSmall(data, size);
  } else if (size <= maxMediumSize) {
    initMedium(data, size);
  } else {
    initLarge(data, size);
  }
}

initSmall

template <class Char>
inline void fbstring_core<Char>::initSmall(
    const Char* const data,
    const size_t size) {

// If data is aligned, use fast word-wise copying. Otherwise,
// use conservative memcpy.
// The word-wise path reads bytes which are outside the range of
// the string, and makes ASan unhappy, so we disable it when
// compiling with ASan.
#ifndef FOLLY_SANITIZE_ADDRESS
  if ((reinterpret_cast<size_t>(data) & (sizeof(size_t) - 1)) == 0) {
    const size_t byteSize = size * sizeof(Char);
    constexpr size_t wordWidth = sizeof(size_t);
    switch ((byteSize + wordWidth - 1) / wordWidth) { // Number of words.
      case 3:
        ml_.capacity_ = reinterpret_cast<const size_t*>(data)[2];
        FOLLY_FALLTHROUGH;
      case 2:
        ml_.size_ = reinterpret_cast<const size_t*>(data)[1];
        FOLLY_FALLTHROUGH;
      case 1:
        ml_.data_ = *reinterpret_cast<Char**>(const_cast<Char*>(data));
        FOLLY_FALLTHROUGH;
      case 0:
        break;
    }
  } else
#endif
  {
    if (size != 0) {
      fbstring_detail::podCopy(data, data + size, small_);
    }
  }
  setSmallSize(size);
}

首先，如果傳入的字串地址是記憶體對齊的，則配合 reinterpret_cast 進行 word-wise copy，提高效率。
否則，呼叫 podCopy 進行 memcpy。
最後，通過 setSmallSize 設定 small string 的 size。

setSmallSize :

void setSmallSize(size_t s) {
  // Warning: this should work with uninitialized strings too,
  // so don't assume anything about the previous value of
  // small_[maxSmallSize].
  assert(s <= maxSmallSize);
  constexpr auto shift = kIsLittleEndian ? 0 : 2;
  small_[maxSmallSize] = char((maxSmallSize - s) << shift);
  small_[s] = '\0';
  assert(category() == Category::isSmall && size() == s);
}

之前提到過，small strings 存放的 size 不是真正的 size，是maxSmallSize - size，這樣做的原因是可以 small strings 可以多儲存一個位元組。因為假如儲存 size 的話，small中最後兩個位元組就得是\0 和 size，但是儲存maxSmallSize - size，當 size == maxSmallSize 時，small的最後一個位元組恰好也是\0。

initMedium

template <class Char>
FOLLY_NOINLINE inline void fbstring_core<Char>::initMedium(
    const Char* const data,
    const size_t size) {
  // Medium strings are allocated normally. Don't forget to
  // allocate one extra Char for the terminating null.
  auto const allocSize = goodMallocSize((1 + size) * sizeof(Char));
  ml_.data_ = static_cast<Char*>(checkedMalloc(allocSize));
  if (FOLLY_LIKELY(size > 0)) {
    fbstring_detail::podCopy(data, data + size, ml_.data_);
  }
  ml_.size_ = size;
  ml_.setCapacity(allocSize / sizeof(Char) - 1, Category::isMedium);
  ml_.data_[size] = '\0';
}

folly 會通過 canNallocx 函式檢測是否使用 jemalloc，如果是，會使用 jemalloc 來提高記憶體分配的效能。關於 jemalloc 我也不是很熟悉，感興趣的可以查查，有很多資料。

所有的動態記憶體分配都會呼叫 goodMallocSize，獲取一個對 jemalloc 友好的值。
再通過 checkedMalloc 真正申請記憶體，存放字串。
呼叫 podCopy 進行 memcpy，與 initSmall 的 podCopy 一樣。
最後再設定 size、capacity、category 和\0。

initLarge

template <class Char>
FOLLY_NOINLINE inline void fbstring_core<Char>::initLarge(
    const Char* const data,
    const size_t size) {
  // Large strings are allocated differently
  size_t effectiveCapacity = size;
  auto const newRC = RefCounted::create(data, &effectiveCapacity);
  ml_.data_ = newRC->data_;
  ml_.size_ = size;
  ml_.setCapacity(effectiveCapacity, Category::isLarge);
  ml_.data_[size] = '\0';
}

與 medium strings 最大的不同是會通過 RefCounted::create 建立 RefCounted 用於共享字串：

struct RefCounted {
    std::atomic<size_t> refCount_;
    Char data_[1];

    constexpr static size_t getDataOffset() {
      return offsetof(RefCounted, data_);
    }

    static RefCounted* create(size_t* size) {
      const size_t allocSize =
          goodMallocSize(getDataOffset() + (*size + 1) * sizeof(Char));
      auto result = static_cast<RefCounted*>(checkedMalloc(allocSize));
      result->refCount_.store(1, std::memory_order_release);
      *size = (allocSize - getDataOffset()) / sizeof(Char) - 1;
      return result;
    }

    static RefCounted* create(const Char* data, size_t* size) {
      const size_t effectiveSize = *size;
      auto result = create(size);
      if (FOLLY_LIKELY(effectiveSize > 0)) {
        fbstring_detail::podCopy(data, data + effectiveSize, result->data_);
      }
      return result;
    }
  };

需要注意的是：

ml.data指向的是 RefCounted.data_.
getDataOffset()用 offsetof 函式獲取 data*在 RefCounted 結構體內的偏移，Char data*[1]為 flexible array，存放字串。
注意對std::atomic<size_t> refCount_進行原子操作的 c++ memory model :
- store，設定引用數為 1 : std::memory_order_release
- load，獲取當前共享字串的引用數: std::memory_order_acquire
- add/sub。增加/減少一個引用 : std::memory_order_acq_rel

c++ memory model 是另外一個比較大的話題，可以參考：

特殊的建構函式 —— 不拷貝使用者傳入的字串

上面的三種構造，都是將應用程式傳入的字串，不管使用 word-wise copy 還是 memcpy，拷貝到 fbstring_core 中，且在 medium 和 large 的情況下，需要動態分配記憶體。

fbstring 提供了一個特殊的建構函式，讓 fbstring_core 接管應用程式自己分配的記憶體。

basic_fbstring 的建構函式，並呼叫 fbstring_core 相應的建構函式。注意這裡 AcquireMallocatedString 為 enum class，比使用 int 和 bool 更可讀。

/**
 * Defines a special acquisition method for constructing fbstring
 * objects. AcquireMallocatedString means that the user passes a
 * pointer to a malloc-allocated string that the fbstring object will
 * take into custody.
 */
enum class AcquireMallocatedString {};

// Nonstandard constructor
basic_fbstring(value_type *s, size_type n, size_type c,
                 AcquireMallocatedString a)
      : store_(s, n, c, a) {
}

basic_fbstring 呼叫相應的 fbstring_core 建構函式：

// Snatches a previously mallocated string. The parameter "size"
// is the size of the string, and the parameter "allocatedSize"
// is the size of the mallocated block.  The string must be
// \0-terminated, so allocatedSize >= size + 1 and data[size] == '\0'.
//
// So if you want a 2-character string, pass malloc(3) as "data",
// pass 2 as "size", and pass 3 as "allocatedSize".
fbstring_core(Char * const data,
              const size_t size,
              const size_t allocatedSize,
              AcquireMallocatedString) {
  if (size > 0) {
    FBSTRING_ASSERT(allocatedSize >= size + 1);
    FBSTRING_ASSERT(data[size] == '\0');
    // Use the medium string storage
    ml_.data_ = data;
    ml_.size_ = size;
    // Don't forget about null terminator
    ml_.setCapacity(allocatedSize - 1, Category::isMedium);
  } else {
    // No need for the memory
    free(data);
    reset();
  }
}

可以看出這裡沒有拷貝字串的過程，而是直接接管了上游傳遞過來的指標指向的記憶體。但是，正如註釋說的，這裡直接使用了 medium strings 的儲存方式。

比如 folly/io/IOBuf.cpp 中的呼叫：

// Ensure NUL terminated
*writableTail() = 0;
fbstring str(
      reinterpret_cast<char*>(writableData()),
      length(),
      capacity(),
      AcquireMallocatedString());

字串拷貝

同初始化，也是根據不同的字串型別，呼叫不同的函式：

  fbstring_core(const fbstring_core& rhs) {
    assert(&rhs != this);
    switch (rhs.category()) {
      case Category::isSmall:
        copySmall(rhs);
        break;
      case Category::isMedium:
        copyMedium(rhs);
        break;
      case Category::isLarge:
        copyLarge(rhs);
        break;
      default:
        folly::assume_unreachable();
    }
  }

copySmall

template <class Char>
inline void fbstring_core<Char>::copySmall(const fbstring_core& rhs) {
  // Just write the whole thing, don't look at details. In
  // particular we need to copy capacity anyway because we want
  // to set the size (don't forget that the last character,
  // which stores a short string's length, is shared with the
  // ml_.capacity field).

  ml_ = rhs.ml_;
}

正如註釋中所說，雖然 small strings 的情況下，字串儲存在 small中，但是我們只需要把 ml直接賦值即可，因為在一個 union 中。

copyMedium

template <class Char>
FOLLY_NOINLINE inline void fbstring_core<Char>::copyMedium(
    const fbstring_core& rhs) {
  // Medium strings are copied eagerly. Don't forget to allocate
  // one extra Char for the null terminator.
  auto const allocSize = goodMallocSize((1 + rhs.ml_.size_) * sizeof(Char));
  ml_.data_ = static_cast<Char*>(checkedMalloc(allocSize));
  // Also copies terminator.
  fbstring_detail::podCopy(
      rhs.ml_.data_, rhs.ml_.data_ + rhs.ml_.size_ + 1, ml_.data_);
  ml_.size_ = rhs.ml_.size_;
  ml_.setCapacity(allocSize / sizeof(Char) - 1, Category::isMedium);
}

medium strings 是 eager copy，所以就是"深拷貝"：

為字串分配空間、拷貝
賦值 size、capacity、category.

copyLarge

template <class Char>
FOLLY_NOINLINE inline void fbstring_core<Char>::copyLarge(
    const fbstring_core& rhs) {
  // Large strings are just refcounted
  ml_ = rhs.ml_;
  RefCounted::incrementRefs(ml_.data_);
}

large strings 的 copy 過程很直觀，因為是 COW 方式：

直接賦值 ml，內含指向共享字串的指標。
共享字串的引用計數加 1。

incrementRefs 和內部呼叫 fromData 這兩個個函式值得看一下：

static RefCounted* fromData(Char* p) {
      return static_cast<RefCounted*>(static_cast<void*>(
          static_cast<unsigned char*>(static_cast<void*>(p)) -
          getDataOffset()));
}

static void incrementRefs(Char* p) {
  fromData(p)->refCount_.fetch_add(1, std::memory_order_acq_rel);
}

因為 ml中指向的是 RefCounted 的 data[1]，所以我們需要通過 fromData 來找到 data_所屬的 RefCounted 的地址。我把 fromData 函式內的運算拆開：

static RefCounted * fromData(Char * p) {
      // 轉換data_[1]的地址
      void* voidDataAddr = static_cast<void*>(p);
      unsigned char* unsignedDataAddr = static_cast<unsigned char*>(voidDataAddr);

     // 獲取data_[1]在結構體的偏移量再相減，得到的就是所屬RefCounted的地址
      unsigned char* unsignedRefAddr = unsignedDataAddr - getDataOffset();

      void* voidRefAddr = static_cast<void*>(unsignedRefAddr);
      RefCounted* refCountAddr = static_cast<RefCounted*>(voidRefAddr);

      return refCountAddr;
}

值得關注的是如何轉換不同型別結構體的指標並做運算，這裡的做法是 ： Char* -> void* -> unsigned char* -> 與size_t做減法 -> void * -> RefCounted*

析構

~fbstring_core() noexcept {
    if (category() == Category::isSmall) {
      return;
    }
    destroyMediumLarge();
}

如果是 small 型別，直接返回，因為利用的是棧空間。
否則，針對 medium 和 large，呼叫 destroyMediumLarge。

FOLLY_MALLOC_NOINLINE void destroyMediumLarge() noexcept {
  auto const c = category();
  FBSTRING_ASSERT(c != Category::isSmall);
  if (c == Category::isMedium) {
    free(ml_.data_);
  } else {
    RefCounted::decrementRefs(ml_.data_);
  }
}

medium ： free 動態分配的字串記憶體即可。
large : 呼叫 decrementRefs，針對共享字串進行操作。

static void decrementRefs(Char * p) {
  auto const dis = fromData(p);
  size_t oldcnt = dis->refCount_.fetch_sub(1, std::memory_order_acq_rel);
  FBSTRING_ASSERT(oldcnt > 0);
  if (oldcnt == 1) {
    free(dis);
  }
}

邏輯也很清晰：先對引用計數減 1，如果本身就只有 1 個引用，那麼直接 free 掉整個 RefCounted。

COW

最重要的一點，也是 large strings 獨有的，就是 COW. 任何針對字串寫的操作，都會觸發 COW，包括前面舉過的[]操作，例如：

non-const at(size_n)
non-const operator[](size_type pos)
operator+
append
......

我們舉個例子，比如non-const operator[](size_type pos)

non-const operator[](size_type pos)

reference operator[](size_type pos) {
    return *(begin() + pos);
}

iterator begin() {
    return store_.mutableData();
}

來重點看下 mutableData() :

Char* mutableData() {
  switch (category()) {
  case Category::isSmall:
    return small_;
  case Category::isMedium:
    return ml_.data_;
  case Category::isLarge:
    return mutableDataLarge();
  }
  fbstring_detail::assume_unreachable();
}

template <class Char>
inline Char* fbstring_core<Char>::mutableDataLarge() {
  FBSTRING_ASSERT(category() == Category::isLarge);
  if (RefCounted::refs(ml_.data_) > 1) { // Ensure unique.
    unshare();
  }
  return ml_.data_;
}

同樣是分三種情況。small 和 medium 直接返回字串的地址，large 會呼叫 mutableDataLarge()，可以看出，如果引用數大於 1，會進行 unshare 操作：

void unshare(size_t minCapacity = 0);

template <class Char>
FOLLY_MALLOC_NOINLINE inline void fbstring_core<Char>::unshare(
    size_t minCapacity) {
  FBSTRING_ASSERT(category() == Category::isLarge);
  size_t effectiveCapacity = std::max(minCapacity, ml_.capacity());
  auto const newRC = RefCounted::create(&effectiveCapacity);
  // If this fails, someone placed the wrong capacity in an
  // fbstring.
  FBSTRING_ASSERT(effectiveCapacity >= ml_.capacity());
  // Also copies terminator.
  fbstring_detail::podCopy(ml_.data_, ml_.data_ + ml_.size_ + 1, newRC->data_);
  RefCounted::decrementRefs(ml_.data_);
  ml_.data_ = newRC->data_;
  ml_.setCapacity(effectiveCapacity, Category::isLarge);
  // size_ remains unchanged.
}

基本思路：

建立新的 RefCounted。
拷貝字串。
對原有的共享字串減少一個引用 decrementRefs，這個函式在上面的析構小節裡分析過。
設定 ml_的 data、capacity、category.
注意此時還不會設定 size，因為還不知道應用程式對字串進行什麼修改。

non-const 與 const

大家可能注意到了，上面的 at 和[]強調了 non-const，這是因為 const-qualifer 針對這兩個呼叫不會觸發 COW ，還以[]為例：

// C++11 21.4.5 element access:
const_reference operator[](size_type pos) const {
    return *(begin() + pos);
}

const_iterator begin() const {
    return store_.data();
}

// In C++11 data() and c_str() are 100% equivalent.
const Char* data() const { return c_str(); }

const Char* c_str() const {
    const Char* ptr = ml_.data_;
    // With this syntax, GCC and Clang generate a CMOV instead of a branch.
    ptr = (category() == Category::isSmall) ? small_ : ptr;
    return ptr;
}

可以看出區別，non-const 版本的 begin()中呼叫的是 mutableData()，而 const-qualifer 版本呼叫的是 data() -> c_str()，而 c_str()直接返回的字串地址。

所以，當字串用到[]、at 且不需要寫操作時，最好用 const-qualifer.

我們拿 folly 自帶的benchmark 工具測試一下：

#include "folly/Benchmark.h"
#include "folly/FBString.h"
#include "folly/container/Foreach.h"

using namespace std;
using namespace folly;

BENCHMARK(nonConstFbstringAt, n) {
  ::folly::fbstring str(
      "fbstring is a drop-in replacement for std::string. The main benefit of fbstring is significantly increased "
      "performance on virtually all important primitives. This is achieved by using a three-tiered storage strategy "
      "and by cooperating with the memory allocator. In particular, fbstring is designed to detect use of jemalloc and "
      "cooperate with it to achieve significant improvements in speed and memory usage.");
  FOR_EACH_RANGE(i, 0, n) {
    char &s = str[2];
    doNotOptimizeAway(s);
  }
}

BENCHMARK_DRAW_LINE();

BENCHMARK_RELATIVE(constFbstringAt, n) {
  const ::folly::fbstring str(
      "fbstring is a drop-in replacement for std::string. The main benefit of fbstring is significantly increased "
      "performance on virtually all important primitives. This is achieved by using a three-tiered storage strategy "
      "and by cooperating with the memory allocator. In particular, fbstring is designed to detect use of jemalloc and "
      "cooperate with it to achieve significant improvements in speed and memory usage.");
  FOR_EACH_RANGE(i, 0, n) {
    const char &s = str[2];
    doNotOptimizeAway(s);
  }
}
int main() { runBenchmarks(); }

結果是 constFbstringAt 比 nonConstFbstringAt 快了 175%

============================================================================
delve_folly/main.cc                             relative  time/iter  iters/s
============================================================================
nonConstFbstringAt                                          39.85ns   25.10M
----------------------------------------------------------------------------
constFbstringAt                                  175.57%    22.70ns   44.06M
============================================================================

Realloc

reserve、operator+等操作，可能會涉及到記憶體重新分配，最終呼叫的都是 memory/Malloc.h 中的 smartRealloc：

inline void* checkedRealloc(void* ptr, size_t size) {
  void* p = realloc(ptr, size);
  if (!p) {
    throw_exception<std::bad_alloc>();
  }
  return p;
}

/**
 * This function tries to reallocate a buffer of which only the first
 * currentSize bytes are used. The problem with using realloc is that
 * if currentSize is relatively small _and_ if realloc decides it
 * needs to move the memory chunk to a new buffer, then realloc ends
 * up copying data that is not used. It's generally not a win to try
 * to hook in to realloc() behavior to avoid copies - at least in
 * jemalloc, realloc() almost always ends up doing a copy, because
 * there is little fragmentation / slack space to take advantage of.
 */
FOLLY_MALLOC_CHECKED_MALLOC FOLLY_NOINLINE inline void* smartRealloc(
    void* p,
    const size_t currentSize,
    const size_t currentCapacity,
    const size_t newCapacity) {
  assert(p);
  assert(currentSize <= currentCapacity &&
         currentCapacity < newCapacity);

  auto const slack = currentCapacity - currentSize;
  if (slack * 2 > currentSize) {
    // Too much slack, malloc-copy-free cycle:
    auto const result = checkedMalloc(newCapacity);
    std::memcpy(result, p, currentSize);
    free(p);
    return result;
  }
  // If there's not too much slack, we realloc in hope of coalescing
  return checkedRealloc(p, newCapacity);
}

從註釋和程式碼看為什麼函式起名叫smartRealloc :

如果(the currentCapacity - currentSize) _ 2 > currentSize，即 currentSize < 2/3 _ capacity，說明當前分配的記憶體利用率較低，此時認為如果使用 realloc 並且 realloc 決定拷貝當前記憶體到新記憶體，成本會高於直接 malloc(newCapacity) + memcpy + free(old_memory)。
否則直接 realloc.

其他

__builtin_expect

給編譯器提供分支預測資訊。原型為：

long __builtin_expect (long exp, long c)

表示式的返回值為 exp 的值，跟 c 無關。 我們預期 exp 的值是 c。例如下面的例子，我們預期 x 的值是 0，所以這裡提示編譯器，只有很小的機率會呼叫到 foo()

if (__builtin_expect (x, 0))
  foo ();

再比如判斷指標是否為空：

if (__builtin_expect (ptr != NULL, 1))
  foo (*ptr);

在 fbstring 中也用到了builtin_expect，例如建立 RefCounted 的函式 (FOLLY_LIKELY 包裝了一下builtin_expect)：

#if __GNUC__
#define FOLLY_DETAIL_BUILTIN_EXPECT(b, t) (__builtin_expect(b, t))
#else
#define FOLLY_DETAIL_BUILTIN_EXPECT(b, t) b
#endif

//  Likeliness annotations
//
//  Useful when the author has better knowledge than the compiler of whether
//  the branch condition is overwhelmingly likely to take a specific value.
//
//  Useful when the author has better knowledge than the compiler of which code
//  paths are designed as the fast path and which are designed as the slow path,
//  and to force the compiler to optimize for the fast path, even when it is not
//  overwhelmingly likely.

#define FOLLY_LIKELY(x) FOLLY_DETAIL_BUILTIN_EXPECT((x), 1)
#define FOLLY_UNLIKELY(x) FOLLY_DETAIL_BUILTIN_EXPECT((x), 0)

static RefCounted* create(const Char* data, size_t* size) {
  const size_t effectiveSize = *size;
  auto result = create(size);
  if (FOLLY_LIKELY(effectiveSize > 0)) {       // __builtin_expect
    fbstring_detail::podCopy(data, data + effectiveSize, result->data_);
  }
  return result;
}

從彙編角度來說，會將可能性更大的彙編緊跟著前面的彙編，防止無效指令的載入。可以參考：

CMOV 指令

conditional move，條件傳送。類似於 MOV 指令，但是依賴於 RFLAGS 暫存器內的狀態。如果條件沒有滿足，該指令不會有任何效果。

CMOV 的優點是可以避免分支預測，避免分支預測錯誤對 CPU 流水線的影響。詳細可以看這篇文件：amd-cmovcc.pdf

fbstring 在一些場景會提示編譯器生成 CMOV 指令，例如：

const Char* c_str() const {
  const Char* ptr = ml_.data_;
  // With this syntax, GCC and Clang generate a CMOV instead of a branch.
  ptr = (category() == Category::isSmall) ? small_ : ptr;
  return ptr;
}

builtin_unreachable && assume(0)

如果程式執行到了builtin_unreachable 和assume(0) ，那麼會出現未定義的行為。例如**builtin_unreachable 出現在一個不會返回的函式後面，而且這個函式沒有宣告為**attribute\_\_((noreturn))。例如6.59 Other Built-in Functions Provided by GCC
給出的例子 :

void function_that_never_returns (void);

int g (int c)
{
  if (c)
    {
      return 1;
    }
  else
    {
      function_that_never_returns ();
      __builtin_unreachable ();
    }
}

如果不加__builtin_unreachable ();，會報error: control reaches end of non-void function [-Werror=return-type]

folly 將 builtin_unreachable 和assume(0) 封裝成了assume_unreachable：

[[noreturn]] FOLLY_ALWAYS_INLINE void assume_unreachable() {
  assume(false);
  // Do a bit more to get the compiler to understand
  // that this function really will never return.
#if defined(__GNUC__)
  __builtin_unreachable();
#elif defined(_MSC_VER)
  __assume(0);
#else
  // Well, it's better than nothing.
  std::abort();
#endif
}

在 fbstring 的一些特性場景，比如 switch 判斷 category 中用到。這是上面提到過的 mutableData() :

Char* mutableData() {
  switch (category()) {
    case Category::isSmall:
      return small_;
    case Category::isMedium:
      return ml_.data_;
    case Category::isLarge:
      return mutableDataLarge();
  }
  folly::assume_unreachable();
}

jemalloc

官網
API 文件
大致的演算法和原理可以參考 facebook 的這篇部落格： Scalable memory allocation using jemalloc

find 演算法

使用的簡化的 Boyer-Moore 演算法，文件說明是在查詢成功的情況下比 std::string 的 find 快 30 倍。benchmark 程式碼在FBStringBenchmark.cpp

我自己測試的情況貌似是搜尋長字串的情況會更好些。

判斷大小端

// It's MSVC, so we just have to guess ... and allow an override
#ifdef _MSC_VER
# ifdef FOLLY_ENDIAN_BE
  static constexpr auto kIsLittleEndian = false;
# else
  static constexpr auto kIsLittleEndian = true;
# endif
#else
  static constexpr auto kIsLittleEndian =
  __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__;
#endif

BYTE_ORDER為預定義巨集：，值是ORDER_LITTLE_ENDIAN、ORDER_BIG_ENDIAN、ORDER_PDP_ENDIAN中的一個。

一般會這麼使用：

/* Test for a little-endian machine */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__

c++20 引入了std::endian，判斷會更加方便。

（完）

朋友們可以關注下我的公眾號，獲得最及時的更新：

C++ folly庫解讀（二） small_vector —— 小資料集下的std::vector替代方案
2021-02-28
C++
C++ folly庫解讀（三）Synchronized —— 比std::lock_guard/std::unique_lock更易用、功能更強大的同步機制
2021-06-18
C++synchronized
std::string的find問題研究
2018-12-19
【C++】【原始碼解讀】std::is_same函式原始碼解讀
2022-02-09
C++原始碼函式
從 1 到完美，寫一個 js 庫、node 庫、前端元件庫
2019-02-16
JS前端元件
C++ 標準庫 std::set std::multiset swap()的使用
2020-12-17
C++
介紹一個MongoDB的替代方案
2023-09-18
MongoDB
實現一個簡單的 std::unique_ptr
2024-03-10
雖然包含string標頭檔案但未用std::
2024-04-04
補充一個替代 for 迴圈的新姿勢
2019-01-12
使用阿里雲PolarDB替代Oracle資料庫，申通完美扛過618業務高峰
2020-06-29
阿里Oracle資料庫
零程式碼的多方面解讀
2021-10-21
從零寫一個時間序列資料庫
2019-06-11
資料庫
解讀第一個C++程式
2021-02-12
C++
盲目跟風尋求完美零庫存的精益出產辦理方法
2022-05-11
杉巖海量物件儲存系統完美替代Documentum
2019-12-19
物件
八大遠端控制軟體，完美替代Teamviewer
2022-10-18
View
一個完美支援多程式的元件化方案
2019-03-03
元件化
零程式碼開發的價值解讀
2020-09-26
std::reserve和std::resize的區別
2020-10-01
從零到一教你基於vue開發一個元件庫
2020-03-09
Vue元件
Rust 標準庫中的 async/await (async-std)
2019-12-06
RustAI
iOS從0到1| OC 中如何檢查一個字串 string 是否包含另一個 string
2018-12-21
iOS字串
記一個關於std::unordered_map併發訪問的BUG
2021-02-22
一個替代 Supervisor 且更加友好的程式監控器
2019-09-03
解讀庫洛遊戲進階之道：“酷”就一個字，我不止說一次
2021-09-23
遊戲
每週一個 Python 模組 | string
2019-03-03
Python
去掉String屬性的最後一個逗號
2021-08-17
基於Web的6個完美3D圖形WebGL庫
2023-02-14
Web3D
一張圖完美解釋CAP定理
2021-01-07
`std::packaged_task`、`std::thread` 和 `std::async` 的區別與聯絡
2024-10-06
Packagethread
從零到一，元件庫的進化
2019-12-23
元件
零工經濟，一種完美的經濟形態?
2019-09-03
可以替代basemap的幾個工具
2018-10-25
優雅地封裝 Activity Result API，完美地替代 startActivityForResult()
2021-07-22
封裝API
【詳解】換一個角度看Socket的資料讀寫
2019-02-03
解讀並實現一個簡單的koa-router
2018-04-11
nicetool--替代hutool和fastjson的工具庫
2024-03-15
ASTJSON