Preface

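The C++ object model is a common and complex topic. Based on the Itanium C++ ABI, this article uses small test programs to examine the object model in several simple C++ inheritance scenarios, especially those involving virtual functions, and uses diagrams to show the memory layout directly.
The programs in this article were built on Ubuntu with glibc 2.24 and gcc 6.3.0. Because both clang and gcc follow the Itanium C++ ABI (see the gcc ABI policy for details), the object model described here largely applies to programs compiled with clang as well.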

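Introduction to the Virtual Function Table

Virtual Function Table Layout

For a class that contains virtual functions, the compiler generates a virtual function table (vtable) and adds a hidden virtual table pointer (vptr) to each object of that class.
The following program is used to examine the memory layout of such a class. It is very simple: it only declares a constructor, a virtual destructor, and one int member variable.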

// Derive.h
class Base_C
{
public:
    Base_C();
    virtual ~Base_C();

private:
    int baseC;
};

// Derive.cc
Base_C::Base_C()
{
}

Base_C::~Base_C()
{
}

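With gcc, the -fdump-class-hierarchy option dumps the class memory layout (in gcc 8 and later the option was renamed to -fdump-lang-class). It produces the following output: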

// g++ -O0 -std=c++11 -fdump-class-hierarchy Derive.h
Vtable for Base_C
Base_C::_ZTV6Base_C: 4u entries
0     (int (*)(...))0
8     (int (*)(...))(& _ZTI6Base_C)
16    (int (*)(...))Base_C::~Base_C
24    (int (*)(...))Base_C::~Base_C

Class Base_C
   size=16 align=8
   base size=12 base align=8
Base_C (0x0x7fb8e9185660) 0
    vptr=((& Base_C::_ZTV6Base_C) + 16u)

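Judging from the definition of Base_C, an object consists of a virtual table pointer (vptr, 8 bytes) and an int (4 bytes), 12 bytes in total, which matches base size=12. Because the class is aligned to 8 bytes, it is padded to 16 bytes.
Next, look at the virtual function table. It has 4 entries. Not every entry is a function pointer (the first two hold the offset to top and the typeinfo pointer), but every entry is pointer-sized, so on the test machine each one takes 8 bytes.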
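The size arithmetic above can be checked with a small sketch. It assumes an Itanium-ABI compiler such as gcc or clang; the inline Base_C, its value() accessor, and the helper names round_up, kBaseSize and check_base_c_layout are simplified stand-ins made up for this illustration, not part of the original test program.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for Base_C with inline definitions.
class Base_C {
public:
    Base_C() : baseC(0) {}
    virtual ~Base_C() {}
    int value() const { return baseC; }  // accessor added only for this sketch
private:
    int baseC;
};

// Rounds size up to the next multiple of alignment.
constexpr std::size_t round_up(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) / alignment * alignment;
}

// Unpadded payload: one vptr plus one int (12 bytes on a 64-bit target).
constexpr std::size_t kBaseSize = sizeof(void*) + sizeof(int);

// Checks that the class is pointer-aligned and that its size is the
// payload rounded up to that alignment (16 bytes on a 64-bit target).
void check_base_c_layout() {
    assert(alignof(Base_C) == alignof(void*));
    assert(sizeof(Base_C) == round_up(kBaseSize, alignof(Base_C)));
}
```

Calling check_base_c_layout() from main passes silently when the layout matches the dump; on a 64-bit target the two values are 8 and 16.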

Note that the virtual destructor appears twice in the table. This is required by the Itanium C++ ABI:

The entries for virtual destructors are actually pairs of entries. 
The first destructor, called the complete object destructor, performs the destruction without calling delete() on the object. 
The second destructor, called the deleting destructor, calls delete() after destroying the object. 
Both destroy any virtual bases; a separate, non-virtual function, called the base object destructor, 
performs destruction of the object but not its virtual base subobjects, and does not call delete().

In other words, a virtual destructor occupies two entries in the virtual function table: the complete object destructor and the deleting destructor.
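The difference between the two destructor variants can be observed indirectly. The sketch below is only an illustration: Tracked and its counters are hypothetical names, and the class-specific operator delete exists purely to count deallocations. Destroying an automatic object runs the destructor body without any deallocation, while delete runs the destructor and then releases the storage, which is the job the ABI assigns to the deleting destructor.

```cpp
#include <cassert>
#include <new>

// Hypothetical class that counts destructor runs and deallocations.
struct Tracked {
    static int destroyed;
    static int deallocated;

    virtual ~Tracked() { ++destroyed; }

    // Class-specific operator delete, used only to count deallocations.
    static void operator delete(void* p) {
        ++deallocated;
        ::operator delete(p);
    }
};

int Tracked::destroyed = 0;
int Tracked::deallocated = 0;

// Automatic storage: the object is destroyed, nothing is deallocated.
void destroy_automatic() {
    Tracked t;
    (void)t;
}

// Dynamic storage: delete destroys the object and then frees the memory.
void destroy_dynamic() {
    Tracked* p = new Tracked;
    delete p;
}

void demo_destructor_variants() {
    Tracked::destroyed = 0;
    Tracked::deallocated = 0;
    destroy_automatic();
    assert(Tracked::destroyed == 1 && Tracked::deallocated == 0);
    destroy_dynamic();
    assert(Tracked::destroyed == 2 && Tracked::deallocated == 1);
}
```

An optimizing compiler may devirtualize or inline these calls, so the sketch shows the observable behavior of the two variants rather than which vtable slot is actually used.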


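Besides the destructors, the virtual function table has two more entries. The one immediately before the destructors is the typeinfo pointer, which points to the typeinfo object used for run-time type identification (RTTI).

The first entry may look less familiar. It is the offset to top, which stores the displacement from the location of this virtual table pointer within the object to the top of the object. The ABI document describes both entries in detail:

// typeinfo pointer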
The typeinfo pointer points to the typeinfo object used for RTTI. It is always present. 
All entries in each of the virtual tables for a given class must point to the same typeinfo object.
A correct implementation of typeinfo equality is to check pointer equality, except for pointers (directly or indirectly) to incomplete types. 
The typeinfo pointer is a valid pointer for polymorphic classes, i.e. those with virtual functions, and is zero for non-polymorphic classes.
// offset to top
The offset to top holds the displacement to the top of the object from the location within the object of the virtual table pointer that addresses this virtual table, as a ptrdiff_t. It is always present. 
The offset provides a way to find the top of the object from any base subobject with a virtual table pointer. This is necessary for dynamic_cast<void*> in particular. 
In a complete object virtual table, and therefore in all of its primary base virtual tables, the value of this offset will be zero. 
For the secondary virtual tables of other non-virtual bases, and of many virtual bases, it will be negative. Only in some construction virtual tables will some virtual base virtual tables have positive offsets, 
due to a different ordering of the virtual bases in the full object than in the subobject's standalone layout.

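Also note vptr=((& Base_C::_ZTV6Base_C) + 16u): although the virtual function table has four entries, the vptr does not point to the start of the table. It points to the first virtual function entry, 16 bytes in, so the typeinfo pointer and the offset to top sit just before the address the vptr holds.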
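As a rough check of where the vptr points, the two header slots can be read back at negative indices from the vptr. This sketch depends on Itanium ABI layout details and is not portable C++; it assumes gcc or clang, and the helper names read_vptr, read_offset_to_top and read_typeinfo are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <typeinfo>

// Simplified stand-in for Base_C with inline definitions.
class Base_C {
public:
    virtual ~Base_C() {}
    int value() const { return baseC; }  // accessor added only for this sketch
private:
    int baseC = 0;
};

using Slot = const void*;

// Assumption (Itanium ABI): the vptr is stored at the start of the object.
const Slot* read_vptr(const void* object) {
    const Slot* vptr = nullptr;
    std::memcpy(&vptr, object, sizeof vptr);
    return vptr;
}

// The offset to top lives two slots before the address the vptr holds.
std::ptrdiff_t read_offset_to_top(const void* object) {
    std::ptrdiff_t offset = 0;
    std::memcpy(&offset, read_vptr(object) - 2, sizeof offset);
    return offset;
}

// The typeinfo pointer lives one slot before the address the vptr holds.
const std::type_info& read_typeinfo(const void* object) {
    return *static_cast<const std::type_info*>(read_vptr(object)[-1]);
}

void demo_vtable_header() {
    Base_C object;
    assert(read_offset_to_top(&object) == 0);
    assert(read_typeinfo(&object) == typeid(Base_C));
}
```

For a complete Base_C object the offset to top is zero, which matches the first entry in the dump, and the typeinfo slot compares equal to typeid(Base_C).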

The memory layout of Base_C is shown in the figure below:
Base_C Object Layout

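The C++ Object Model under Inheritance

The C++ Object Model under Single Inheritance

First, consider an example of single inheritance:

// Class implementations omitted here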
class Base_C
{
public:
    Base_C();
    virtual ~Base_C();

private:
    int baseC;
};

class Base_D : public Base_C
{
public:
    Base_D(int i);
    virtual ~Base_D();
    virtual void add(void) { cout << "Base_D::add()..." << endl; }
    virtual void print(void);

private:
    int baseD;
};

class Derive_single : public Base_D
{
public:
    Derive_single(int d);
    void print(void) override;
    virtual void Derive_single_print();

private:
    int Derive_singleValue;
};

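In single inheritance, a derived class has exactly one virtual table, which starts out as a copy of the base class's table. Virtual functions that the derived class overrides replace the original entries in the table, and virtual functions newly added by the derived class are appended to the end of the table.
In the overall memory layout, non-static member variables added by the derived class are likewise appended after the base class's member variables.
The dumped class layout is as follows: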

Vtable for Derive_single
Derive_single::_ZTV13Derive_single: 7u entries
0     (int (*)(...))0
8     (int (*)(...))(& _ZTI13Derive_single)
16    (int (*)(...))Derive_single::~Derive_single
24    (int (*)(...))Derive_single::~Derive_single
32    (int (*)(...))Base_D::add
40    (int (*)(...))Derive_single::print
48    (int (*)(...))Derive_single::Derive_single_print

Class Derive_single
   size=24 align=8
   base size=20 base align=8
Derive_single (0x0x7fb8e93fe8f0) 0
    vptr=((& Derive_single::_ZTV13Derive_single) + 16u)
  Base_D (0x0x7fb8e93fe958) 0
      primary-for Derive_single (0x0x7fb8e93fe8f0)
    Base_C (0x0x7fb8e91857e0) 0
        primary-for Base_D (0x0x7fb8e93fe958)

The memory layout is shown in the figure below and matches the description above:
Single Derive Object Layout
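The single-inheritance layout can be exercised with a short sketch. The classes below are simplified stand-ins for the ones above (inline bodies, protected members, string return values), and the helpers bases_share_address and kExpectedSize are made up for this illustration. It checks that every base subobject starts at the same address as the complete object, that the object carries one vptr plus three ints, and that a call through a base reference reaches the override.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

class Base_C {
public:
    virtual ~Base_C() {}
protected:
    int baseC = 0;
};

class Base_D : public Base_C {
public:
    virtual std::string add() { return "Base_D::add"; }
    virtual std::string print() { return "Base_D::print"; }
protected:
    int baseD = 0;
};

class Derive_single : public Base_D {
public:
    std::string print() override { return "Derive_single::print"; }
    virtual std::string Derive_single_print() { return "Derive_single::Derive_single_print"; }
protected:
    int Derive_singleValue = 0;
};

// With single inheritance every base subobject starts at offset 0.
bool bases_share_address(const Derive_single& d) {
    const void* top = &d;
    return static_cast<const Base_D*>(&d) == top &&
           static_cast<const Base_C*>(&d) == top;
}

// One vptr plus three ints, rounded up to pointer alignment
// (20 bytes padded to 24 on a 64-bit Itanium-ABI target).
constexpr std::size_t kExpectedSize =
    (sizeof(void*) + 3 * sizeof(int) + alignof(void*) - 1) /
    alignof(void*) * alignof(void*);

void demo_single_inheritance() {
    Derive_single d;
    Base_D& as_base = d;
    assert(bases_share_address(d));
    assert(sizeof(Derive_single) == kExpectedSize);
    assert(as_base.print() == "Derive_single::print");  // overridden slot
    assert(as_base.add() == "Base_D::add");             // inherited slot
}
```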

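The C++ Object Model under Multiple Inheritance (Non-Diamond)

Next, consider non-diamond multiple inheritance. Here the derived class gets a "copy" of each base class's virtual function table, and together they form a virtual table group. The tables are ordered according to the order in which the base classes are declared in the class definition.
The derived class's new virtual functions are placed in the table of the first declared base class. When the derived class overrides a base class function, the corresponding entry is replaced in every base class table that contains that function.

// Class implementations omitted here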
class Base_A
{
public:
    Base_A(int i);
    virtual ~Base_A();
    int getValue();
    static void countA();
    virtual void print(void);

private:
    int baseA;
    static int baseAS;
};

class Base_B
{
public:
    Base_B(int i);
    virtual ~Base_B();
    int getValue();
    virtual void add(void);
    static void countB();
    virtual void print(void);

private:
    int baseB;
    static int baseBS;
};

class Derive_multiBase : public Base_A, public Base_B
{
public:
    Derive_multiBase(int d);
    void add(void) override;
    void print(void) override;
    virtual void Derive_multiBase_print();

private:
    int Derive_multiBaseValue;
};

The dumped class layout is as follows:

Vtable for Derive_multiBase
Derive_multiBase::_ZTV16Derive_multiBase: 13u entries
0     (int (*)(...))0
8     (int (*)(...))(& _ZTI16Derive_multiBase)
16    (int (*)(...))Derive_multiBase::~Derive_multiBase
24    (int (*)(...))Derive_multiBase::~Derive_multiBase
32    (int (*)(...))Derive_multiBase::print
40    (int (*)(...))Derive_multiBase::add
48    (int (*)(...))Derive_multiBase::Derive_multiBase_print
56    (int (*)(...))-16
64    (int (*)(...))(& _ZTI16Derive_multiBase)
72    (int (*)(...))Derive_multiBase::_ZThn16_N16Derive_multiBaseD1Ev
80    (int (*)(...))Derive_multiBase::_ZThn16_N16Derive_multiBaseD0Ev
88    (int (*)(...))Derive_multiBase::_ZThn16_N16Derive_multiBase3addEv
96    (int (*)(...))Derive_multiBase::_ZThn16_N16Derive_multiBase5printEv

Class Derive_multiBase
   size=32 align=8
   base size=32 base align=8
Derive_multiBase (0x0x7fb8e910cd20) 0
    vptr=((& Derive_multiBase::_ZTV16Derive_multiBase) + 16u)
  Base_A (0x0x7fb8e91855a0) 0
      primary-for Derive_multiBase (0x0x7fb8e910cd20)
  Base_B (0x0x7fb8e9185600) 16
      vptr=((& Derive_multiBase::_ZTV16Derive_multiBase) + 72u)

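The layout shows two vptrs, each pointing into one of two virtual function tables. These correspond to the tables that Derive_multiBase copied from its two base classes, Base_A and Base_B.
All virtual functions added by the derived class Derive_multiBase are appended to the primary virtual table, that is, the table copied from Base_A.
The table copied from Base_B is called a secondary virtual table. The layout shows that its offset to top is -16, because this virtual table pointer sits 16 bytes past the start of the object.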
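The relationship between the Base_B subobject offset and the offset to top entry can be sketched as follows. The classes are simplified stand-ins with empty bodies, the helpers base_b_offset, read_offset_to_top and top_of are made up for this illustration, and reading the slot before the vptr relies on Itanium ABI layout details (gcc or clang assumed), so this is not portable C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class Base_A {
public:
    virtual ~Base_A() {}
    virtual void print() {}
protected:
    int baseA = 0;
};

class Base_B {
public:
    virtual ~Base_B() {}
    virtual void add() {}
    virtual void print() {}
protected:
    int baseB = 0;
};

class Derive_multiBase : public Base_A, public Base_B {
public:
    void add() override {}
    void print() override {}
protected:
    int Derive_multiBaseValue = 0;
};

// Byte distance from the start of the complete object to its Base_B part.
std::ptrdiff_t base_b_offset(const Derive_multiBase& d) {
    const Base_B* b = &d;  // the implicit conversion adjusts the address
    return reinterpret_cast<const char*>(b) -
           reinterpret_cast<const char*>(&d);
}

// Assumption (Itanium ABI): the offset to top is stored two slots before
// the address held in the subobject's vptr.
std::ptrdiff_t read_offset_to_top(const void* subobject) {
    const void* const* vptr = nullptr;
    std::memcpy(&vptr, subobject, sizeof vptr);
    std::ptrdiff_t offset = 0;
    std::memcpy(&offset, vptr - 2, sizeof offset);
    return offset;
}

// dynamic_cast<void*> yields the address of the most derived object.
const void* top_of(const Base_B* b) {
    return dynamic_cast<const void*>(b);
}

void demo_offset_to_top() {
    Derive_multiBase d;
    const Base_B* b = &d;
    assert(base_b_offset(d) > 0);                       // 16 on 64-bit targets
    assert(read_offset_to_top(&d) == 0);                // primary table
    assert(read_offset_to_top(b) == -base_b_offset(d)); // secondary table
    assert(top_of(b) == &d);
}
```

On the 64-bit test setup described in this article the subobject offset is 16, so the secondary table's offset to top reads -16, matching the dump.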

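Also note that the virtual function symbols in the secondary virtual table carry the _ZThn16_ prefix, which demangles to non-virtual thunk to .... This is related to how calls are dispatched: when an overridden function is called through a pointer to a different base class, the thunk adjusts the this pointer before jumping to the real function. See "C++: Exploring C++ Polymorphism in Depth (2): Inheritance Relationships" and "The C++ Object Model" in the references.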

// thunk
A segment of code associated (in this ABI) with a target function, which is called instead of the target function for the purpose of modifying parameters (e.g. this) or 
other parts of the environment before transferring control to the target function, 
and possibly making further modifications after its return. 
A thunk may contain as little as an instruction to be executed prior to falling through to an immediately following target function, 
or it may be a full function with its own stack frame that does a full call to the target function.

The memory layout is shown in the figure below:
Multi Derive Object Layout
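The effect of the this-adjusting thunk can be observed without looking at the generated code. In the sketch below, which uses simplified struct stand-ins and a hypothetical print() that returns the this pointer it receives, a call made through a Base_B pointer still arrives in the override with this pointing at the start of the complete object, even though the Base_B pointer itself holds a different address.

```cpp
#include <cassert>

struct Base_A {
    virtual ~Base_A() {}
    virtual const void* print() { return this; }
    int baseA = 0;
};

struct Base_B {
    virtual ~Base_B() {}
    virtual const void* print() { return this; }
    int baseB = 0;
};

struct Derive_multiBase : Base_A, Base_B {
    // Returns the this pointer as seen inside the override.
    const void* print() override { return this; }
    int Derive_multiBaseValue = 0;
};

// Virtual call through the second base; with gcc or clang this goes
// through the secondary virtual table.
const void* call_print_via_base_b(Base_B* b) {
    return b->print();
}

void demo_thunk_adjustment() {
    Derive_multiBase d;
    Base_B* b = &d;
    // The Base_B subobject does not start at the object's address...
    assert(static_cast<void*>(b) != static_cast<void*>(&d));
    // ...yet the override sees the address of the complete object.
    assert(call_print_via_base_b(b) == &d);
}
```

How the adjustment is carried out (a separate thunk, or an inlined adjustment after devirtualization) is up to the compiler; the sketch only shows the result the ABI guarantees.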

Discussion: How enable_shared_from_this Affects the Memory Layout

The enable_shared_from_this documentation contains the following description:

A common implementation for enable_shared_from_this is to hold a weak reference (such as std::weak_ptr) to *this. 
For the purpose of exposition, the weak reference is called weak-this and considered as a mutable std::weak_ptr member.

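A common implementation of enable_shared_from_this has the instance hold a weak reference, which can be thought of as a std::weak_ptr member variable of the instance.
This can be verified with the single-inheritance test code: make Derive_single additionally inherit from std::enable_shared_from_this<Derive_single> and leave everything else unchanged: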

class Derive_single : public Base_D, public std::enable_shared_from_this<Derive_single>
{
public:
    Derive_single(int d);
    void print(void) override;
    virtual void Derive_single_print();

private:
    int Derive_singleValue;
};

First, the dumped class layout is as follows:

Vtable for Derive_single
Derive_single::_ZTV13Derive_single: 7u entries
0     (int (*)(...))0
8     (int (*)(...))(& _ZTI13Derive_single)
16    (int (*)(...))Derive_single::~Derive_single
24    (int (*)(...))Derive_single::~Derive_single
32    (int (*)(...))Base_D::add
40    (int (*)(...))Derive_single::print
48    (int (*)(...))Derive_single::Derive_single_print

Class Derive_single
   size=40 align=8
   base size=36 base align=8
Derive_single (0x0x7fd5c76431c0) 0
    vptr=((& Derive_single::_ZTV13Derive_single) + 16u)
  Base_D (0x0x7fd5c7639750) 0
      primary-for Derive_single (0x0x7fd5c76431c0)
    Base_C (0x0x7fd5c7632780) 0
        primary-for Base_D (0x0x7fd5c7639750)
  std::enable_shared_from_this<Derive_single> (0x0x7fd5c76327e0) 16

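Compared with the earlier dump, the size of Derive_single grows from 24 bytes to 40 bytes, because inheriting from std::enable_shared_from_this<Derive_single> takes an extra 16 bytes. According to the std::weak_ptr documentation, a typical implementation of std::weak_ptr stores two pointers, which is consistent with the 16-byte growth here.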

// std::weak_ptr
Like std::shared_ptr, a typical implementation of weak_ptr stores two pointers:
-- a pointer to the control block; 
-- the stored pointer of the shared_ptr it was constructed from.

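Also note that the virtual function table of Derive_single is identical to the earlier one, so enable_shared_from_this does not affect the contents of the virtual function table.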
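Both observations can be checked with a minimal sketch. It assumes a standard library whose std::weak_ptr holds two pointers (typical for libstdc++ and libc++); Base_D is flattened into a single simplified class, and the names Plain, Shared, kGrowth and shared_from_this_works are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

// Simplified stand-in for the Base_C/Base_D chain: one vptr and two ints.
class Base_D {
public:
    virtual ~Base_D() {}
protected:
    int baseC = 0;
    int baseD = 0;
};

// Same members, without and with enable_shared_from_this.
class Plain : public Base_D {
protected:
    int value = 0;
};

class Shared : public Base_D, public std::enable_shared_from_this<Shared> {
protected:
    int value = 0;
};

// Extra bytes contributed by the enable_shared_from_this base.
constexpr std::size_t kGrowth = sizeof(Shared) - sizeof(Plain);

// shared_from_this() returns a second owner of the same object.
bool shared_from_this_works() {
    std::shared_ptr<Shared> owner = std::make_shared<Shared>();
    std::shared_ptr<Shared> again = owner->shared_from_this();
    return again == owner && owner.use_count() == 2;
}

void demo_enable_shared_from_this() {
    // Typical implementation: weak_ptr stores two pointers.
    assert(sizeof(std::weak_ptr<Shared>) == 2 * sizeof(void*));
    // The size growth equals one weak_ptr (16 bytes on 64-bit targets).
    assert(kGrowth == sizeof(std::weak_ptr<Shared>));
    // The base has no virtual functions, so it adds no vptr or vtable.
    assert(!std::is_polymorphic<std::enable_shared_from_this<Shared>>::value);
    assert(shared_from_this_works());
}
```

Because std::enable_shared_from_this is not polymorphic, it contributes only data to the object, which is consistent with the unchanged virtual function table in the dump.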

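References

Itanium C++ ABI
The C++ Object Model Illustrated: Object Memory Layout in Detail
The C++ Object Model
C++: Exploring C++ Polymorphism in Depth (2): Inheritance Relationships
Virtual Function Inheritance: A First Look at Thunks
C++: Virtual Function Memory Layout Explained (Using the clang Compiler as an Example)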