Netty Source Code Analysis -- Memory Pool and PoolArena

Posted by binecy on 2020-11-29

Original URL: https://www.cnblogs.com/binecy/p/14057712.html

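As we know, Netty uses direct memory to implement its zero-copy mechanism and improve performance.
However, creating and releasing direct memory may involve system calls and is relatively expensive. If every request created and released its own direct memory, performance would certainly not meet requirements.
This is where a memory pool comes in:
a large block of memory is requested from the system once, and the memory each request needs is then allocated from that block.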
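
The pooling idea can be illustrated with a minimal sketch. It is not how Netty does it: SimplePoolSketch and its bump-pointer strategy are hypothetical, and the sketch never reclaims memory. It only shows that one direct allocation can serve many requests as slices.

```java
import java.nio.ByteBuffer;

/** Hypothetical bump allocator: one large direct buffer, sliced per request. */
final class SimplePoolSketch {
    private final ByteBuffer block;  // the single large region obtained up front
    private int nextOffset;          // start of the not-yet-allocated remainder

    SimplePoolSketch(int blockSize) {
        this.block = ByteBuffer.allocateDirect(blockSize);
    }

    /** Returns a view over the next size bytes, or null if the block is exhausted. */
    ByteBuffer allocate(int size) {
        if (size < 0 || size > block.capacity() - nextOffset) {
            return null;
        }
        ByteBuffer view = block.duplicate();
        view.position(nextOffset);
        view.limit(nextOffset + size);
        nextOffset += size;
        return view.slice();  // shares memory with block; no new direct allocation
    }

    public static void main(String[] args) {
        SimplePoolSketch pool = new SimplePoolSketch(1024);
        ByteBuffer a = pool.allocate(100);
        ByteBuffer b = pool.allocate(200);
        System.out.println(a.capacity() + " " + b.capacity() + " " + a.isDirect());
    }
}
```

A real pool also has to take memory back and reuse it, which is what PoolArena, PoolChunk, and PoolSubpage are for.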

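The memory pool in Netty mainly involves PoolArena, PoolChunk, and PoolSubpage.
This article analyzes the role and implementation of PoolArena.
The source code analysis is based on Netty 4.1.52.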

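Interface relationships
ByteBufAllocator: the memory allocator, responsible for allocating memory for ByteBuf; thread-safe.
PooledByteBufAllocator: the pooled memory allocator and the default ByteBufAllocator. It requests a large block of memory from the operating system in advance and allocates memory for ByteBuf from that block, which improves performance and reduces memory fragmentation.
UnpooledByteBufAllocator: the unpooled memory allocator, which requests memory from the operating system on every allocation.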

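RecvByteBufAllocator: the receive-buffer allocator, which allocates a reasonably sized buffer for the IO data read from a Channel. Its actual functionality is defined by the inner interface Handle.
It mainly adds operations for the Channel read scenario, such as guess, incMessagesRead, and lastBytesRead.
ByteBuf: an allocated memory block that can be used directly.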

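The rest of this article focuses only on PooledByteBufAllocator. It is the default memory allocator in Netty and also the hardest part of understanding Netty's memory mechanism.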

Memory allocation

The earlier article "ChannelPipeline Mechanism and the Read/Write Process" analyzed the data read path:
NioByteUnsafe#read

public final void read() {
    ...
    final RecvByteBufAllocator.Handle allocHandle = recvBufAllocHandle();
    allocHandle.reset(config);

    ByteBuf byteBuf = null;

    ...
    byteBuf = allocHandle.allocate(allocator);
    allocHandle.lastBytesRead(doReadBytes(byteBuf));
    ...
}

The recvBufAllocHandle method returns an AdaptiveRecvByteBufAllocator.HandleImpl. (Both AdaptiveRecvByteBufAllocator and PooledByteBufAllocator are initialized in DefaultChannelConfig.)

AdaptiveRecvByteBufAllocator.HandleImpl#allocate -> AbstractByteBufAllocator#ioBuffer -> PooledByteBufAllocator#directBuffer -> PooledByteBufAllocator#newDirectBuffer

protected ByteBuf newDirectBuffer(int initialCapacity, int maxCapacity) {
    // #1
    PoolThreadCache cache = threadCache.get();
    PoolArena<ByteBuffer> directArena = cache.directArena;

    final ByteBuf buf;
    if (directArena != null) {
        // #2
        buf = directArena.allocate(cache, initialCapacity, maxCapacity);
    } else {
        // #3
        buf = PlatformDependent.hasUnsafe() ? UnsafeByteBufUtil.newUnsafeDirectByteBuf(this, initialCapacity, maxCapacity) : new UnpooledDirectByteBuf(this, initialCapacity, maxCapacity);
    }
    return toLeakAwareBuffer(buf);
}

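The AbstractByteBufAllocator#ioBuffer method checks whether the current system supports unsafe. If it does, direct memory is used; otherwise heap memory is used. Only the direct memory implementation is covered here.
#1 Get the corresponding memory pool (PoolArena) from the current thread's cache
#2 Allocate memory from the current thread's memory pool
#3 No memory pool exists, so fall back to unpooled allocation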

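PooledByteBufAllocator#threadCache is a PoolThreadLocalCache instance. PoolThreadLocalCache extends FastThreadLocal, which can be thought of here as an optimized ThreadLocal. It maintains one PoolThreadCache per thread, and each PoolThreadCache is associated with a memory pool.
When a thread has no PoolThreadCache in PoolThreadLocalCache yet, one is constructed by the initialValue method.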

PoolThreadLocalCache#initialValue

protected synchronized PoolThreadCache initialValue() {
    // #1
    final PoolArena<byte[]> heapArena = leastUsedArena(heapArenas);
    final PoolArena<ByteBuffer> directArena = leastUsedArena(directArenas);
    // #2
    final Thread current = Thread.currentThread();
    if (useCacheForAllThreads || current instanceof FastThreadLocalThread) {
        final PoolThreadCache cache = new PoolThreadCache(
                heapArena, directArena, smallCacheSize, normalCacheSize,
                DEFAULT_MAX_CACHED_BUFFER_CAPACITY, DEFAULT_CACHE_TRIM_INTERVAL);

        ...
    }
    // No caching so just use 0 as sizes.
    return new PoolThreadCache(heapArena, directArena, 0, 0, 0, 0);
}

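#1 Get the least-used PoolArena from the heapArenas and directArenas of PooledByteBufAllocator.
When a PooledByteBufAllocator is constructed, it initializes 8 PoolArenas for PooledByteBufAllocator#directArenas by default (the default count is derived from the number of CPU cores and the available memory, so it can differ between machines).
#2 Construct the PoolThreadCache.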

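A PoolArena can be understood as a memory pool that manages the memory blocks requested from the operating system.
PoolThreadCache associates each thread with one PoolArena (PoolThreadCache#directArena), and all of that thread's memory is allocated from that PoolArena.
Netty targets highly concurrent systems, where many threads may allocate memory at the same time. To reduce thread contention, it creates multiple PoolArenas so that locking is finer-grained, which improves concurrent throughput.
Note that one PoolArena may be shared by several threads, which is why PoolArena contains some synchronized operations.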
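
The binding of threads to arenas can be sketched as follows. This is a simplified illustration rather than Netty's code: ArenaBindingSketch, Arena, and bindLeastUsed are hypothetical names, a plain ThreadLocal stands in for FastThreadLocal, and "least used" is assumed to mean "fewest threads currently bound".

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ArenaBindingSketch {
    /** Stand-in for PoolArena; it only tracks how many thread caches point at it. */
    static final class Arena {
        final int id;
        final AtomicInteger numThreadCaches = new AtomicInteger();
        Arena(int id) { this.id = id; }
    }

    private final Arena[] arenas;

    // Each thread lazily binds to an arena the first time it asks for one.
    private final ThreadLocal<Arena> bound = ThreadLocal.withInitial(this::bindLeastUsed);

    ArenaBindingSketch(int arenaCount) {
        arenas = new Arena[arenaCount];
        for (int i = 0; i < arenaCount; i++) {
            arenas[i] = new Arena(i);
        }
    }

    // synchronized so that two threads cannot pick the same "least used" arena at once
    private synchronized Arena bindLeastUsed() {
        Arena min = arenas[0];
        for (Arena a : arenas) {
            if (a.numThreadCaches.get() < min.numThreadCaches.get()) {
                min = a;
            }
        }
        min.numThreadCaches.incrementAndGet();
        return min;
    }

    Arena arenaForCurrentThread() {
        return bound.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ArenaBindingSketch allocator = new ArenaBindingSketch(2);
        int[] ids = new int[3];
        for (int i = 0; i < 3; i++) {
            final int slot = i;
            Thread t = new Thread(() -> ids[slot] = allocator.arenaForCurrentThread().id);
            t.start();
            t.join();  // run the threads one after another so the result is deterministic
        }
        // Three threads over two arenas: the third thread shares arena 0 with the first.
        System.out.println(ids[0] + " " + ids[1] + " " + ids[2]);
    }
}
```

With more threads than arenas, some arena is necessarily shared, which is why allocation inside an arena still needs locking.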

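Memory levels

As described in the earlier article on SizeClasses, Netty divides the memory blocks in the pool into 3 levels by size.
Each level uses a different management algorithm. The default division is: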
small <= 28672(28K)
normal <= 16777216(16M)
huge > 16777216(16M)
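
The three default thresholds can be written as a small classifier. SizeLevelSketch and levelOf are hypothetical names used only for illustration; in Netty the decision is made on the size index returned by size2SizeIdx, not on the raw byte count.

```java
final class SizeLevelSketch {
    static final int SMALL_MAX = 28 * 1024;          // 28672, from the list above
    static final int NORMAL_MAX = 16 * 1024 * 1024;  // 16777216, from the list above

    enum Level { SMALL, NORMAL, HUGE }

    /** Maps a requested size in bytes to the level that would serve it. */
    static Level levelOf(int size) {
        if (size <= SMALL_MAX) {
            return Level.SMALL;
        }
        if (size <= NORMAL_MAX) {
            return Level.NORMAL;
        }
        return Level.HUGE;
    }

    public static void main(String[] args) {
        System.out.println(levelOf(512));               // SMALL
        System.out.println(levelOf(64 * 1024));         // NORMAL
        System.out.println(levelOf(32 * 1024 * 1024));  // HUGE
    }
}
```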

smallSubpagePools is an array of PoolSubpage that tracks memory blocks at the small level.
PoolChunk manages memory at the normal level, and a PoolChunkList manages a group of PoolChunks.
PoolArena keeps its PoolChunks in 6 PoolChunkLists according to memory usage (in percent):
qInit -> usage 0~25,
q000 -> usage 1~50,
q025 -> usage 25~75,
q050 -> usage 50~100,
q075 -> usage 75~100,
q100 -> usage 100.
Note: a PoolChunk is the memory block that Netty requests from the operating system each time.
A PoolSubpage is allocated from a PoolChunk, and small-level memory is in turn allocated from a PoolSubpage.

Now let's look at the allocation process.

private void allocate(PoolThreadCache cache, PooledByteBuf<T> buf, final int reqCapacity) {
    // #1
    final int sizeIdx = size2SizeIdx(reqCapacity);
    // #2
    if (sizeIdx <= smallMaxSizeIdx) {
        tcacheAllocateSmall(cache, buf, reqCapacity, sizeIdx);
    } else if (sizeIdx < nSizes) {
        // #3
        tcacheAllocateNormal(cache, buf, reqCapacity, sizeIdx);
    } else {
        // #4
        int normCapacity = directMemoryCacheAlignment > 0
                ? normalizeSize(reqCapacity) : reqCapacity;
        // Huge allocations are never served via the cache so just call allocateHuge
        allocateHuge(buf, normCapacity);
    }
}

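#1 size2SizeIdx is provided by the parent class SizeClasses. It uses a specific algorithm to round the requested size up to a normalized size, maps it to the matching size class, and returns that class's index. See "Memory Alignment Class SizeClasses".
#2 Allocate a small-level memory block
#3 Allocate a normal-level memory block
#4 Allocate a huge-level memory block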

private void tcacheAllocateSmall(PoolThreadCache cache, PooledByteBuf<T> buf, final int reqCapacity,
                                 final int sizeIdx) {
    // #1
    if (cache.allocateSmall(this, buf, reqCapacity, sizeIdx)) {
        return;
    }

    // #2
    final PoolSubpage<T> head = smallSubpagePools[sizeIdx];
    final boolean needsNormalAllocation;
    synchronized (head) {
        // #3
        final PoolSubpage<T> s = head.next;
        needsNormalAllocation = s == head;
        if (!needsNormalAllocation) {
            assert s.doNotDestroy && s.elemSize == sizeIdx2size(sizeIdx);
            long handle = s.allocate();
            assert handle >= 0;
            s.chunk.initBufWithSubpage(buf, null, handle, reqCapacity, cache);
        }
    }
    // #4
    if (needsNormalAllocation) {
        synchronized (this) {
            allocateNormal(buf, reqCapacity, sizeIdx, cache);
        }
    }

    incSmallAllocation();
}

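#1 First try to allocate from the thread cache.
Besides the PoolArena, PoolThreadCache#smallSubPageHeapCaches also maintains a per-thread cache of small-level memory.
#2 Use the index computed earlier by SizeClasses#size2SizeIdx to get the head of the corresponding PoolSubpage list.
#3 Note that head is a placeholder node that stores no data. s == head means that no usable PoolSubpage currently exists, because an exhausted PoolSubpage is removed from the linked list.
Otherwise memory is allocated from the PoolSubpage; a later article covers that process in detail.
Note that this step must run inside the synchronized block.
#4 No usable PoolSubpage exists, so a normal-level memory block has to be requested and the required memory is allocated from it.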
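
The head.next == head check relies on a circular doubly linked list with a placeholder head. A minimal sketch of that structure follows; SubpageListSketch and Node are hypothetical stand-ins, not Netty's PoolSubpage.

```java
final class SubpageListSketch {
    /** Stand-in for PoolSubpage: a node in a circular doubly linked list. */
    static final class Node {
        final String name;
        Node prev;
        Node next;
        Node(String name) {
            this.name = name;
            this.prev = this;  // a lone node points to itself in both directions
            this.next = this;
        }
    }

    final Node head = new Node("head");  // placeholder only; it never holds memory

    /** True when no usable node is linked, i.e. the head points back to itself. */
    boolean isEmpty() {
        return head.next == head;
    }

    void addAfterHead(Node n) {
        n.prev = head;
        n.next = head.next;
        head.next.prev = n;
        head.next = n;
    }

    /** Unlinks a node, as would happen once it is exhausted. */
    void remove(Node n) {
        n.prev.next = n.next;
        n.next.prev = n.prev;
        n.prev = null;
        n.next = null;
    }

    public static void main(String[] args) {
        SubpageListSketch list = new SubpageListSketch();
        System.out.println(list.isEmpty());   // true: nothing to allocate from
        Node page = new Node("subpage-1");
        list.addAfterHead(page);
        System.out.println(list.isEmpty());   // false: head.next is a usable node
        list.remove(page);
        System.out.println(list.isEmpty());   // true again
    }
}
```

Because the head always exists, it can also serve as the lock object for its list, which matches the synchronized (head) block in the excerpt above.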

Normal-level memory is also first requested from the thread cache; if that fails, the allocateNormal method is called.
PoolArena#allocate -> allocateNormal

private void allocateNormal(PooledByteBuf<T> buf, int reqCapacity, int sizeIdx, PoolThreadCache threadCache) {
    if (q050.allocate(buf, reqCapacity, sizeIdx, threadCache) ||
        q025.allocate(buf, reqCapacity, sizeIdx, threadCache) ||
        q000.allocate(buf, reqCapacity, sizeIdx, threadCache) ||
        qInit.allocate(buf, reqCapacity, sizeIdx, threadCache) ||
        q075.allocate(buf, reqCapacity, sizeIdx, threadCache)) {
        return;
    }

    // Add a new chunk.
    PoolChunk<T> c = newChunk(pageSize, nPSizes, pageShifts, chunkSize);
    boolean success = c.allocate(buf, reqCapacity, sizeIdx, threadCache);
    assert success;
    qInit.add(c);
}

#1 Request memory from q050, q025, q000, qInit, and q075, in that order.
Why this order?

The PoolChunkLists in a PoolArena also form a "doubly" linked list:

qInit ---> q000 <---> q025 <---> q050 <---> q075 <---> q100

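Each PoolChunkList also maintains minUsage and maxUsage: when a PoolChunk's usage rises above maxUsage, it is moved to the next PoolChunkList; when its usage falls below minUsage, it is moved to the previous PoolChunkList.
Note: q000 has no previous node and its minUsage is 1, so a PoolChunk on it is destroyed once its memory has been completely released.
The previous node of qInit is qInit itself, but its minUsage is Integer.MIN_VALUE, so a PoolChunk on it is not destroyed even after its memory has been completely released; it stays in memory.

q000 is not tried first precisely because a PoolChunk on q000 is destroyed once its memory is completely released; allocating from it would delay memory reclamation.
q075 is tried last because its usage is so high that allocations are much less likely to succeed.
Starting from q050 is therefore a good compromise: in most cases chunk usage stays at a relatively high level, which improves the memory utilization of the whole application.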

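When memory is requested from a PoolChunkList, the list iterates over its PoolChunk nodes until an allocation succeeds or the end of the list is reached.
After a PoolChunk serves an allocation, it is moved to the next PoolChunkList if its usage is now above maxUsage.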
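
The allocation order and the promotion between lists can be sketched with plain counters. ChunkListSketch is a hypothetical model, not Netty's PoolChunkList: chunks track only a byte count, and the boundary case (usage exactly equal to maxUsage) is treated as a move, which is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkListSketch {
    /** Stand-in for PoolChunk: it only tracks how many bytes are in use. */
    static final class Chunk {
        final int capacity;
        int used;
        ChunkList owner;
        Chunk(int capacity) { this.capacity = capacity; }
        int usage() { return (int) (used * 100L / capacity); }
    }

    /** Stand-in for PoolChunkList: a bucket of chunks with an upper usage bound. */
    static final class ChunkList {
        final String name;
        final int maxUsage;
        final ChunkList next;
        final List<Chunk> chunks = new ArrayList<>();

        ChunkList(String name, int maxUsage, ChunkList next) {
            this.name = name;
            this.maxUsage = maxUsage;
            this.next = next;
        }

        /** Places the chunk here, or forwards it while its usage reaches maxUsage. */
        void add(Chunk c) {
            if (c.usage() >= maxUsage && next != null) {
                next.add(c);
                return;
            }
            chunks.add(c);
            c.owner = this;
        }

        /** Walks the chunks until one can serve the request. */
        boolean allocate(int size) {
            for (int i = 0; i < chunks.size(); i++) {
                Chunk c = chunks.get(i);
                if (c.capacity - c.used >= size) {
                    c.used += size;
                    if (c.usage() >= maxUsage && next != null) {
                        chunks.remove(i);  // promote the chunk to a later list
                        next.add(c);
                    }
                    return true;
                }
            }
            return false;
        }
    }

    /** Builds qInit, q000, q025, q050, q075, q100 with the upper bounds from the text. */
    static ChunkList[] buildLists() {
        ChunkList q100 = new ChunkList("q100", 100, null);
        ChunkList q075 = new ChunkList("q075", 100, q100);
        ChunkList q050 = new ChunkList("q050", 100, q075);
        ChunkList q025 = new ChunkList("q025", 75, q050);
        ChunkList q000 = new ChunkList("q000", 50, q025);
        ChunkList qInit = new ChunkList("qInit", 25, q000);
        return new ChunkList[] {qInit, q000, q025, q050, q075, q100};
    }

    /** Tries the lists in the order q050, q025, q000, qInit, q075. */
    static boolean allocateNormal(ChunkList[] l, int size) {
        return l[3].allocate(size) || l[2].allocate(size) || l[1].allocate(size)
                || l[0].allocate(size) || l[4].allocate(size);
    }

    public static void main(String[] args) {
        ChunkList[] lists = buildLists();
        Chunk c = new Chunk(100);
        lists[0].add(c);                   // a fresh chunk starts in qInit
        allocateNormal(lists, 30);
        System.out.println(c.owner.name);  // usage 30: moved to q000
        allocateNormal(lists, 30);
        System.out.println(c.owner.name);  // usage 60: moved to q025
    }
}
```

In this model a chunk drifts toward the fuller lists as it serves requests, so the lists tried first tend to hold chunks that are already well used.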

The newChunk method constructs a PoolChunk; this is where the memory pool requests memory from the operating system.
DirectArena#newChunk

protected PoolChunk<ByteBuffer> newChunk(int pageSize, int maxPageIdx,
    int pageShifts, int chunkSize) {
    if (directMemoryCacheAlignment == 0) {
        return new PoolChunk<ByteBuffer>(this,
                allocateDirect(chunkSize), pageSize, pageShifts,
                chunkSize, maxPageIdx, 0);
    }
    final ByteBuffer memory = allocateDirect(chunkSize
            + directMemoryCacheAlignment);
    return new PoolChunk<ByteBuffer>(this, memory, pageSize,
            pageShifts, chunkSize, maxPageIdx,
            offsetCacheLine(memory));
}

The allocateDirect method requests memory from the operating system and obtains a (JVM) ByteBuffer.
PoolChunk#memory holds that ByteBuffer, and all memory of the PoolChunk is actually allocated from it.

Finally, the huge-level memory request:

private void allocateHuge(PooledByteBuf<T> buf, int reqCapacity) {
    PoolChunk<T> chunk = newUnpooledChunk(reqCapacity);
    activeBytesHuge.add(chunk.chunkSize());
    buf.initUnpooled(chunk, reqCapacity);
    allocationsHuge.increment();
}

This one is simple: it does not use the memory pool and requests memory directly from the operating system.

Memory release

void free(PoolChunk<T> chunk, ByteBuffer nioBuffer, long handle, int normCapacity, PoolThreadCache cache) {
    if (chunk.unpooled) {
        // #1
        int size = chunk.chunkSize();
        destroyChunk(chunk);
        activeBytesHuge.add(-size);
        deallocationsHuge.increment();
    } else {
        // #2
        SizeClass sizeClass = sizeClass(handle);
        if (cache != null && cache.add(this, chunk, nioBuffer, handle, normCapacity, sizeClass)) {
            // cached so not free it.
            return;
        }

        freeChunk(chunk, handle, normCapacity, sizeClass, nioBuffer, false);
    }
}

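#1 Unpooled memory: destroy the memory directly
#2 Pooled memory: first try to add it to the thread cache; if that succeeds, nothing else is needed. If it fails, call freeChunk.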
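
The "cache first, arena second" decision in step #2 can be sketched as a bounded per-thread queue. FreePathSketch and ThreadCache are hypothetical names; the fixed capacity and the integer handles are assumptions made to keep the example small, and the counter stands in for the call to freeChunk.

```java
import java.util.ArrayDeque;

final class FreePathSketch {
    /** Stand-in for one size bucket of a thread cache: holds at most maxEntries blocks. */
    static final class ThreadCache {
        private final ArrayDeque<Integer> handles = new ArrayDeque<>();
        private final int maxEntries;

        ThreadCache(int maxEntries) { this.maxEntries = maxEntries; }

        /** Returns false when the cache is full and cannot keep the block. */
        boolean add(int handle) {
            if (handles.size() >= maxEntries) {
                return false;
            }
            handles.addLast(handle);
            return true;
        }

        /** Hands back a cached block for reuse, or null if none is cached. */
        Integer poll() {
            return handles.pollFirst();
        }
    }

    int returnedToArena;  // counts blocks that bypassed the cache

    void free(ThreadCache cache, int handle) {
        if (cache != null && cache.add(handle)) {
            return;         // kept for reuse by this thread, so not really freed
        }
        returnedToArena++;  // stands in for freeChunk(...)
    }

    public static void main(String[] args) {
        FreePathSketch arena = new FreePathSketch();
        ThreadCache cache = new ThreadCache(1);
        arena.free(cache, 7);  // fits in the cache
        arena.free(cache, 8);  // cache is full, goes back to the arena
        System.out.println(arena.returnedToArena + " " + cache.poll());
    }
}
```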

void freeChunk(PoolChunk<T> chunk, long handle, int normCapacity, SizeClass sizeClass, ByteBuffer nioBuffer,
               boolean finalizer) {
    final boolean destroyChunk;
    synchronized (this) {
        ...
        destroyChunk = !chunk.parent.free(chunk, handle, normCapacity, nioBuffer);
    }
    if (destroyChunk) {
        // destroyChunk not need to be called while holding the synchronized lock.
        destroyChunk(chunk);
    }
}

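chunk.parent is the PoolChunkList. PoolChunkList#free calls PoolChunk to release the memory. Afterwards, if the usage is below minUsage, the chunk is moved to the previous PoolChunkList; if there is no previous PoolChunkList (q000), false is returned and the following step destroys the PoolChunk.
For more on memory destruction, see the earlier article on ByteBuf.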
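
The demotion on release, including the "no previous list means destroy" case, can be sketched as follows. ChunkFreeSketch is a hypothetical model that covers only qInit, q000, and q025 and tracks usage with a plain counter; the destroyed flag stands in for destroyChunk.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkFreeSketch {
    static final class Chunk {
        final int capacity;
        int used;
        ChunkList owner;
        boolean destroyed;
        Chunk(int capacity, int used) { this.capacity = capacity; this.used = used; }
        int usage() { return (int) (used * 100L / capacity); }
    }

    static final class ChunkList {
        final String name;
        final int minUsage;
        ChunkList prev;  // null for q000, the list itself for qInit
        final List<Chunk> chunks = new ArrayList<>();

        ChunkList(String name, int minUsage) { this.name = name; this.minUsage = minUsage; }

        void add(Chunk c) { chunks.add(c); c.owner = this; }

        /** Returns false when the chunk has left the chain and the caller must destroy it. */
        boolean free(Chunk c, int size) {
            c.used -= size;
            if (c.usage() < minUsage) {
                chunks.remove(c);
                c.owner = null;
                return moveToPrev(c);
            }
            return true;
        }

        private boolean moveToPrev(Chunk c) {
            if (prev == null) {
                return false;          // no previous list can hold this chunk
            }
            return prev.accept(c);
        }

        private boolean accept(Chunk c) {
            if (c.usage() < minUsage) {
                return moveToPrev(c);  // still too empty for this list, keep walking back
            }
            add(c);
            return true;
        }
    }

    /** Mirrors the shape of freeChunk: release via the owning list, destroy if told to. */
    static void freeChunk(Chunk c, int size) {
        boolean keep = c.owner.free(c, size);
        if (!keep) {
            c.destroyed = true;        // stands in for destroyChunk(chunk)
        }
    }

    public static void main(String[] args) {
        ChunkList qInit = new ChunkList("qInit", Integer.MIN_VALUE);
        ChunkList q000 = new ChunkList("q000", 1);
        ChunkList q025 = new ChunkList("q025", 25);
        qInit.prev = qInit;
        q025.prev = q000;

        Chunk a = new Chunk(100, 60);
        q025.add(a);
        freeChunk(a, 40);
        System.out.println(a.owner.name);  // usage 20: moved back to q000
        freeChunk(a, 20);
        System.out.println(a.destroyed);   // usage 0 and no list before q000: true

        Chunk b = new Chunk(100, 10);
        qInit.add(b);
        freeChunk(b, 10);
        System.out.println(b.destroyed + " " + b.owner.name);  // false qInit
    }
}
```

The last case shows why an empty chunk on qInit survives: its usage can never drop below Integer.MIN_VALUE, so it is never asked to move.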

