Linux Memory Management: vmalloc

Published 2015-09-23

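Earlier posts covered kmalloc, which allocates physically contiguous memory, along with how it works and the slab caches underneath it. The kernel has a second allocation interface, vmalloc, which returns a contiguous range of virtual addresses without guaranteeing that the backing physical pages are contiguous. User-space malloc comes to mind here: it is a library function provided by glibc, ultimately implemented on top of the brk and mmap system calls (its implementation will be analyzed in a later post). It too hands out contiguous virtual memory with no promise of physical contiguity, and user programs rarely care. Physical contiguity mostly matters to device drivers and to DMA. vmalloc is slower than kmalloc, however, and can cause TLB thrashing, so kernel code normally uses kmalloc and keeps vmalloc for special needs such as obtaining a large block of memory. A typical example is loading a .ko module into the kernel, which needs vmalloc.
The matching release function is vfree.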

Reference kernel: 3.8.13

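The discussion assumes a 32-bit processor, that is, at most 4 GB of virtual address space (64-bit is common by now; that will be covered in a later update). Translating virtual addresses to physical addresses efficiently needs hardware support, namely the MMU.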

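The OS must first set up the page tables (PT), of course. In the Linux kernel the 4 GB is not all available to user space: the top of the range, from 0xC0000000 (3 GB) upward, is reserved for the kernel. With the default x86 configuration, physical memory is split into 0-16 MB (ZONE_DMA), 16-896 MB (ZONE_NORMAL), and everything above 896 MB (ZONE_HIGHMEM); the last 128 MB of the kernel's 1 GB of virtual addresses, from 3 GB + 896 MB up to 4 GB, is the region used for mapping high memory. This split is configurable.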
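
The zone boundaries just described can be written down as a small user-space sketch. It is not kernel code: the names `zone_of` and `lowmem_virt` are made up for illustration, and the limits are simply the default 32-bit x86 numbers quoted above (other configurations differ).

```c
#include <assert.h>
#include <stdint.h>

#define MB(x)         ((uint64_t)(x) << 20)
#define DMA_LIMIT     MB(16)        /* end of ZONE_DMA in the default layout */
#define NORMAL_LIMIT  MB(896)       /* end of ZONE_NORMAL (low memory) */
#define PAGE_OFFSET   0xC0000000u   /* start of kernel virtual addresses (3 GB) */

enum zone { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM };

/* Zone that a physical address falls into under the default split. */
static enum zone zone_of(uint64_t phys)
{
    if (phys < DMA_LIMIT)
        return ZONE_DMA;
    if (phys < NORMAL_LIMIT)
        return ZONE_NORMAL;
    return ZONE_HIGHMEM;
}

/*
 * Kernel virtual address of a low-memory physical address, assuming the
 * usual fixed-offset direct mapping. Returns 0 for high memory, which has
 * no permanent kernel mapping and must be mapped on demand.
 */
static uint32_t lowmem_virt(uint64_t phys)
{
    assert(phys <= UINT32_MAX);
    if (phys >= NORMAL_LIMIT)
        return 0;
    return PAGE_OFFSET + (uint32_t)phys;
}
```

For example, `zone_of(MB(8))` is `ZONE_DMA`, and `lowmem_virt(MB(16))` is `0xC1000000`.
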
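kmalloc returns a virtual (linear) address. What makes kmalloc special is that the memory it allocates is physically contiguous, which matters a great deal for devices that do DMA. Memory from vmalloc is contiguous only in linear addresses; its physical addresses need not be contiguous, so it cannot be used directly for DMA. (Figure: kernel virtual address layout on 32-bit ARM.)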

Now let's look at the vmalloc function (mm/vmalloc.c):

/**
 *    vmalloc - allocate virtually contiguous memory
 *    @size:        allocation size
 *    Allocate enough pages to cover @size from the page level
 *    allocator and map them into contiguous kernel virtual space.
 *
 *    For tight control over page level allocator and protection flags
 *    use __vmalloc() instead.
 */
void *vmalloc(unsigned long size)
{
    return __vmalloc_node_flags(size, -1, GFP_KERNEL | __GFP_HIGHMEM);
}

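Only size matters to the caller here. vmalloc prefers to take its pages from high memory (__GFP_HIGHMEM), and it may sleep (GFP_KERNEL).
Continuing: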

static inline void *__vmalloc_node_flags(unsigned long size,
                    int node, gfp_t flags)
{
    return __vmalloc_node(size, 1, flags, PAGE_KERNEL,
                    node, __builtin_return_address(0));
}

The one to focus on is __vmalloc_node:

/**
 *    __vmalloc_node - allocate virtually contiguous memory
 *    @size:        allocation size
 *    @align:        desired alignment
 *    @gfp_mask:    flags for the page level allocator
 *    @prot:        protection mask for the allocated pages
 *    @node:        node to use for allocation or -1
 *    @caller:    caller's return address
 *
 *    Allocate enough pages to cover @size from the page level
 *    allocator with @gfp_mask flags. Map them into contiguous
 *    kernel virtual space, using a pagetable protection of @prot.
 */
static void *__vmalloc_node(unsigned long size, unsigned long align,
             gfp_t gfp_mask, pgprot_t prot,
             int node, const void *caller)
{
    return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
                gfp_mask, prot, node, caller);
}

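This brings in VMALLOC_START and VMALLOC_END. What are their values?
Here are the ARM32 and MIPS32 definitions (they differ with each architecture's virtual address layout; MIPS is somewhat unusual).
In arch/mips/include/asm/pgtable-32.h.
First, the MIPS virtual address map:

(Figure: MIPS32 virtual address map.) From the map, user space gets 2 GB (0x0-0x7fff ffff), DMA and normal memory are reached through kseg0 (512 MB) and kseg1, and the virtual addresses handed out by vmalloc lie in kseg2. There are other special mappings as well, such as I/O.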
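
As a rough illustration of that map, the sketch below classifies a 32-bit virtual address into the classic MIPS32 segments. The boundaries are the standard architectural ones; the function and enum names are hypothetical, and kseg2 here also covers what some documents split off as kseg3.

```c
#include <assert.h>
#include <stdint.h>

enum mips_seg { SEG_KUSEG, SEG_KSEG0, SEG_KSEG1, SEG_KSEG2 };

/* Segment of a MIPS32 virtual address. */
static enum mips_seg mips32_segment(uint32_t vaddr)
{
    if (vaddr < 0x80000000u)
        return SEG_KUSEG;   /* 2 GB of user space, translated by the TLB */
    if (vaddr < 0xA0000000u)
        return SEG_KSEG0;   /* 512 MB, fixed mapping, cached */
    if (vaddr < 0xC0000000u)
        return SEG_KSEG1;   /* 512 MB, fixed mapping, uncached */
    return SEG_KSEG2;       /* translated by the TLB: vmalloc lives here */
}
```

An address such as 0xc0002000 therefore falls in kseg2, which is consistent with vmalloc areas needing page table entries.
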

#define VMALLOC_START MAP_BASE

#define PKMAP_BASE        (0xfe000000UL)

#ifdef CONFIG_HIGHMEM
# define VMALLOC_END    (PKMAP_BASE-2*PAGE_SIZE)
#else
# define VMALLOC_END    (FIXADDR_START-2*PAGE_SIZE)
#endif

In arch/arm/include/asm/pgtable.h:

/*
 * Just any arbitrary offset to the start of the vmalloc VM area: the
 * current 8MB value just means that there will be a 8MB "hole" after the
 * physical memory until the kernel virtual memory starts. That means that
 * any out-of-bounds memory accesses will hopefully be caught.
 * The vmalloc() routines leaves a hole of 4kB between each vmalloced
 * area for the same reason. ;)
 */
#define VMALLOC_OFFSET        (8*1024*1024)
#define VMALLOC_START        (((unsigned long)high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1))
#define VMALLOC_END        0xff000000UL
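
To see what the VMALLOC_START expression evaluates to, here is a small user-space sketch of the same arithmetic on 32-bit values. The `high_memory` inputs are made-up sample values, not taken from any real board. One thing the arithmetic shows: the hole is exactly 8 MB only when high_memory is itself 8 MB aligned; otherwise the round-down makes it smaller.

```c
#include <assert.h>
#include <stdint.h>

#define VMALLOC_OFFSET (8u * 1024 * 1024)

/* Same arithmetic as the ARM VMALLOC_START macro, on a 32-bit value. */
static uint32_t vmalloc_start(uint32_t high_memory)
{
    return (high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET - 1);
}

/* Size of the unmapped hole between the end of low memory and vmalloc space. */
static uint32_t hole_size(uint32_t high_memory)
{
    return vmalloc_start(high_memory) - high_memory;
}
```

With high_memory at 0xD0000000 (256 MB of low memory above 0xC0000000) the vmalloc area starts at 0xD0800000 and the hole is 8 MB. With high_memory at 0xCFF00000 it starts at 0xD0000000 and the hole is 1 MB.
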

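Another figure:

Physical memory is divided, roughly, into three zones: ZONE_NORMAL, ZONE_DMA, and ZONE_HIGHMEM.
vmalloc, as we saw, allocates from ZONE_HIGHMEM by default, but kmalloc and vmalloc are alike in their virtual addresses: both return addresses in the kernel part of the 4 GB address space. With the figures above we have established where the virtual addresses are allocated from and where the corresponding physical pages come from.

Now for the core implementation of vmalloc: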

/**
 *    __vmalloc_node_range - allocate virtually contiguous memory
 *    @size:        allocation size
 *    @align:        desired alignment
 *    @start:        vm area range start
 *    @end:        vm area range end
 *    @gfp_mask:    flags for the page level allocator
 *    @prot:        protection mask for the allocated pages
 *    @node:        node to use for allocation or -1
 *    @caller:    caller's return address
 *
 *    Allocate enough pages to cover @size from the page level
 *    allocator with @gfp_mask flags. Map them into contiguous
 *    kernel virtual space, using a pagetable protection of @prot.
 */
void *__vmalloc_node_range(unsigned long size, unsigned long align,
            unsigned long start, unsigned long end, gfp_t gfp_mask,
            pgprot_t prot, int node, const void *caller)
{
    struct vm_struct *area;
    void *addr;
    unsigned long real_size = size;

    size = PAGE_ALIGN(size);
    if (!size || (size >> PAGE_SHIFT) > totalram_pages)
        goto fail;

    area = __get_vm_area_node(size, align, VM_ALLOC | VM_UNLIST,    // reserve virtual address space and link the vm_struct to its vmap_area (kept in a red-black tree)
                 start, end, node, gfp_mask, caller);
    if (!area)
        goto fail;

    addr = __vmalloc_area_node(area, gfp_mask, prot, node, caller);  // work out how many pages are needed, allocate them, then update the page tables to map them
    if (!addr)
        return NULL;

    /*
     * In this function, newly allocated vm_struct is not added
     * to vmlist at __get_vm_area_node(). so, it is added here.
     */
    insert_vmalloc_vmlist(area);     // add the vm_struct to the global vmlist

    /*
     * A ref_count = 3 is needed because the vm_struct and vmap_area
     * structures allocated in the __get_vm_area_node() function contain
     * references to the virtual address of the vmalloc'ed block.
     */
    kmemleak_alloc(addr, real_size, 3, gfp_mask);    // register the block with the memory leak tracker

    return addr;

fail:
    warn_alloc_failed(gfp_mask, 0,
             "vmalloc: allocation failure: %lu bytes\n",
             real_size);
    return NULL;
}
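
The first few lines of __vmalloc_node_range round the request up to whole pages and reject sizes that are zero or larger than all of RAM. A minimal user-space sketch of that check, assuming 4 KiB pages (the helper name `vmalloc_size_ok` is made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT    12                     /* assumption: 4 KiB pages */
#define PAGE_SIZE     (1UL << PAGE_SHIFT)
#define PAGE_MASK     (~(PAGE_SIZE - 1))
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & PAGE_MASK)

/*
 * Mirrors the sanity check at the top of __vmalloc_node_range: the request
 * is rounded up to a page multiple, then refused if it is empty or needs
 * more pages than the machine has.
 */
static bool vmalloc_size_ok(unsigned long size, unsigned long totalram_pages)
{
    size = PAGE_ALIGN(size);
    return size != 0 && (size >> PAGE_SHIFT) <= totalram_pages;
}
```

So a 1-byte request becomes one full page, and a 4097-byte request becomes two.
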

The basic idea is simple:
1. Allocate a range of virtual address space.
2. Set up page table mappings for that range.
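
The two steps can be pictured with a toy model: a contiguous run of virtual pages, each backed by whatever physical frame happened to be free. This is only an illustration of the idea, not how the kernel stores mappings (the real ones live in the page tables); `toy_area` and `toy_translate` are hypothetical names, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT 12                  /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

struct toy_area {
    unsigned long addr;        /* first virtual address of the area */
    size_t nr_pages;           /* mapped pages; the guard page is not counted */
    const unsigned long *pfn;  /* physical frame number behind each page */
};

/*
 * Translate a virtual address inside the area to a physical address.
 * Returns false for addresses outside the mapped pages, which includes
 * the unmapped guard page that follows the area.
 */
static bool toy_translate(const struct toy_area *a, unsigned long vaddr,
                          unsigned long *paddr)
{
    unsigned long idx;

    if (vaddr < a->addr)
        return false;
    idx = (vaddr - a->addr) >> PAGE_SHIFT;
    if (idx >= a->nr_pages)
        return false;
    *paddr = (a->pfn[idx] << PAGE_SHIFT) | (vaddr & (PAGE_SIZE - 1));
    return true;
}
```

With frames {0x1234, 0x42, 0x999} behind an area at 0xc0002000, two neighbouring virtual pages translate to physical addresses that are far apart, which is exactly why such memory cannot be handed to a DMA engine as one buffer.
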

Two structures are worth knowing well:
struct vmap_area:

struct vmap_area {
    unsigned long va_start;
    unsigned long va_end;
    unsigned long flags;
    struct rb_node rb_node;        /* address sorted rbtree */
    struct list_head list;        /* address sorted list */
    struct list_head purge_list;    /* "lazy purge" list */
    struct vm_struct *vm;
    struct rcu_head rcu_head;
};

struct vm_struct (the type of the area pointer):

struct vm_struct {
    struct vm_struct    *next;
    void            *addr;
    unsigned long        size;
    unsigned long        flags;
    struct page        **pages;
    unsigned int        nr_pages;
    phys_addr_t        phys_addr;
    const void        *caller;
};

A word on initialization: vmalloc_init is called from mm_init.

/*
 * Set up kernel memory allocators
 */
static void __init mm_init(void)
{
    /*
     * page_cgroup requires contiguous pages,
     * bigger than MAX_ORDER unless SPARSEMEM.
     */
    page_cgroup_init_flatmem();
    mem_init();
    kmem_cache_init();
    percpu_init_late();
    pgtable_cache_init();
    vmalloc_init();
}

This was already mentioned in the post on the slab mechanism.

void __init vmalloc_init(void)
{
    struct vmap_area *va;
    struct vm_struct *tmp;
    int i;

    for_each_possible_cpu(i) {
        struct vmap_block_queue *vbq;

        vbq = &per_cpu(vmap_block_queue, i);
        spin_lock_init(&vbq->lock);
        INIT_LIST_HEAD(&vbq->free);
    }

    /* Import existing vmlist entries. */
    for (tmp = vmlist; tmp; tmp = tmp->next) {                     // vmlist is normally empty this early in boot, unless the architecture registered areas beforehand
        va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
        va->flags = VM_VM_AREA;
        va->va_start = (unsigned long)tmp->addr;
        va->va_end = va->va_start + tmp->size;
        va->vm = tmp;
        __insert_vmap_area(va);
    }

    vmap_area_pcpu_hole = VMALLOC_END;

    vmap_initialized = true;
}

Next, the __get_vm_area_node function:

static struct vm_struct *__get_vm_area_node(unsigned long size,
        unsigned long align, unsigned long flags, unsigned long start,
        unsigned long end, int node, gfp_t gfp_mask, const void *caller)
{
    struct vmap_area *va;
    struct vm_struct *area;

    BUG_ON(in_interrupt());
    if (flags & VM_IOREMAP) { // ioremap flag: the area maps device memory
        int bit = fls(size);

        if (bit > IOREMAP_MAX_ORDER)
            bit = IOREMAP_MAX_ORDER;
        else if (bit < PAGE_SHIFT)
            bit = PAGE_SHIFT;

        align = 1ul << bit;
    }

    size = PAGE_ALIGN(size);
    if (unlikely(!size))
        return NULL;

    area = kzalloc_node(sizeof(*area), gfp_mask & GFP_RECLAIM_MASK, node);
    if (unlikely(!area))
        return NULL;

    /*
     * We always allocate a guard page.
     */
    size += PAGE_SIZE; // one extra page guards against overruns: it is never mapped, so touching it raises a fault

    va = alloc_vmap_area(size, align, start, end, node, gfp_mask);        // reserve a vmap_area, i.e. a range of kernel virtual addresses
    if (IS_ERR(va)) {
        kfree(area);
        return NULL;
    }

    /*
     * When this function is called from __vmalloc_node_range,
     * we do not add vm_struct to vmlist here to avoid
     * accessing uninitialized members of vm_struct such as
     * pages and nr_pages fields. They will be set later.
     * To distinguish it from others, we use a VM_UNLIST flag.
     */
    if (flags & VM_UNLIST)   // always taken on the vmalloc path
        setup_vmalloc_vm(area, va, flags, caller);  // link the vm_struct and the vmap_area
    else
        insert_vmalloc_vm(area, va, flags, caller);

    return area;
}
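
The VM_IOREMAP branch derives an alignment from the size: take the position of the highest set bit, clamp it to a range, and use that power of two. The sketch below reproduces the arithmetic in user space. `toy_fls` stands in for the kernel's fls(), and the value of IOREMAP_MAX_ORDER is assumed to be the generic default of 7 + PAGE_SHIFT; architectures may override it.

```c
#include <assert.h>

#define PAGE_SHIFT        12                 /* assumption: 4 KiB pages */
#define IOREMAP_MAX_ORDER (7 + PAGE_SHIFT)   /* assumed generic default: 128 pages */

/* Position of the most significant set bit, counting from 1; 0 when x is 0. */
static int toy_fls(unsigned long x)
{
    int bit = 0;

    while (x) {
        bit++;
        x >>= 1;
    }
    return bit;
}

/* Alignment chosen for an ioremap area of the given size. */
static unsigned long ioremap_align(unsigned long size)
{
    int bit = toy_fls(size);

    if (bit > IOREMAP_MAX_ORDER)
        bit = IOREMAP_MAX_ORDER;
    else if (bit < PAGE_SHIFT)
        bit = PAGE_SHIFT;

    return 1UL << bit;
}
```

Because fls() counts from 1, a 4096-byte mapping gets an 8192-byte alignment here, small sizes are aligned to one page, and anything of 512 KiB or more is capped at 512 KiB.
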

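The heart of this function is alloc_vmap_area, and it is the interesting part. We saw earlier the range of virtual addresses that vmalloc draws from, yet the caller passed in nothing but size; MIPS, x86, and ARM each end up with a different virtual range.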

/*
 * Allocate a region of KVA of the specified size and alignment, within the
 * vstart and vend.
 */
static struct vmap_area *alloc_vmap_area(unsigned long size,
                unsigned long align,
                unsigned long vstart, unsigned long vend,
                int node, gfp_t gfp_mask)
{
    struct vmap_area *va;
    struct rb_node *n;
    unsigned long addr;
    int purged = 0;
    struct vmap_area *first;

    BUG_ON(!size);
    BUG_ON(size & ~PAGE_MASK);
    BUG_ON(!is_power_of_2(align));

    va = kmalloc_node(sizeof(struct vmap_area),
            gfp_mask & GFP_RECLAIM_MASK, node);
    if (unlikely(!va))
        return ERR_PTR(-ENOMEM);

retry:
    spin_lock(&vmap_area_lock);
    /*
     * Invalidate cache if we have more permissive parameters.
     * cached_hole_size notes the largest hole noticed _below_
     * the vmap_area cached in free_vmap_cache: if size fits
     * into that hole, we want to scan from vstart to reuse
     * the hole instead of allocating above free_vmap_cache.
     * Note that __free_vmap_area may update free_vmap_cache
     * without updating cached_hole_size or cached_align.
     */
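    // free_vmap_cache is NULL on the first call. After that it is normally set,
    // by the assignment free_vmap_cache = &va->rb_node near the end of this function.
    // A request with align < cached_align is common and clears the cache here:
    // an earlier, larger alignment may have skipped over some address space,
    // so without the cache the search has to start over.
    if (!free_vmap_cache ||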
            size < cached_hole_size ||
            vstart < cached_vstart ||
            align < cached_align) {
nocache:
        cached_hole_size = 0;
        free_vmap_cache = NULL;
    }
    /* record if we encounter less permissive parameters */
    cached_vstart = vstart;
    cached_align = align;

    /* find starting point for our search */
    if (free_vmap_cache) {                                                  // NULL on first use; otherwise it holds the most recently allocated node, and addr starts at that node's va_end
        first = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
        addr = ALIGN(first->va_end, align);
        if (addr < vstart)
            goto nocache;
        if (addr + size - 1 < addr)
            goto overflow;

    } else {
        addr = ALIGN(vstart, align);
        if (addr + size - 1 < addr)
            goto overflow;

        n = vmap_area_root.rb_node;                               // the tree root is likewise NULL until the first area is inserted
        first = NULL;

        while (n) {                                               // not the first allocation and no cache: descend from the root to the lowest area with va_end >= addr, stopping early if an area contains addr
            struct vmap_area *tmp;
            tmp = rb_entry(n, struct vmap_area, rb_node);
     
            if (tmp->va_end >= addr) {
                first = tmp;
                if (tmp->va_start <= addr)
                    break;
                n = n->rb_left;
            } else
                n = n->rb_right;
        }

        if (!first)
            goto found;
    }

    /* from the starting point, walk areas until a suitable hole is found */
    while (addr + size > first->va_start && addr + size <= vend) {                // walk the address-sorted list from first until a hole big enough for size is found
        if (addr + cached_hole_size < first->va_start)
            cached_hole_size = first->va_start - addr;
        addr = ALIGN(first->va_end, align);
        if (addr + size - 1 < addr)
            goto overflow;
         
        if (list_is_last(&first->list, &vmap_area_list))     // first is the highest area: nothing lies above it, so addr is usable if it fits below vend
            goto found;

        first = list_entry(first->list.next,
                struct vmap_area, list);
    }

found:
    if (addr + size > vend)
        goto overflow;

    va->va_start = addr;
    va->va_end = addr + size;
    va->flags = 0;
    __insert_vmap_area(va);                           // insert into the red-black tree vmap_area_root
    free_vmap_cache = &va->rb_node;                  // remember this node; it affects where later allocations start searching
    spin_unlock(&vmap_area_lock);

    BUG_ON(va->va_start & (align-1));
    BUG_ON(va->va_start < vstart);
    BUG_ON(va->va_end > vend);

    return va;

overflow:
    spin_unlock(&vmap_area_lock);
    if (!purged) {
        purge_vmap_area_lazy();
        purged = 1;
        goto retry;
    }
    if (printk_ratelimit())
        printk(KERN_WARNING
            "vmap allocation for size %lu failed: "
            "use vmalloc= to increase size.\n", size);
    kfree(va);
    return ERR_PTR(-EBUSY);
}
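
Stripped of the cache, the red-black tree, and the retry after a lazy purge, the search in alloc_vmap_area is a first-fit scan over areas sorted by address. The following user-space sketch shows only that scan, over a plain sorted array; it is a simplification under those assumptions, not the kernel algorithm, and `struct area` and `find_hole` are hypothetical names. It returns 0 on failure where the kernel returns an ERR_PTR.

```c
#include <assert.h>
#include <stddef.h>

struct area {
    unsigned long start;   /* inclusive */
    unsigned long end;     /* exclusive */
};

#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/*
 * First-fit search for a free range of `size` bytes, aligned to `align`,
 * inside [vstart, vend). `areas` must be sorted by address and must not
 * overlap. Returns the start of the hole, or 0 if nothing fits.
 */
static unsigned long find_hole(const struct area *areas, size_t n,
                               unsigned long size, unsigned long align,
                               unsigned long vstart, unsigned long vend)
{
    unsigned long addr;
    size_t i;

    assert(size != 0);
    assert(align != 0 && (align & (align - 1)) == 0);   /* power of two */

    addr = ALIGN_UP(vstart, align);
    for (i = 0; i < n; i++) {
        if (areas[i].end <= addr)
            continue;                    /* area lies entirely below addr */
        if (addr + size <= areas[i].start)
            break;                       /* the hole before this area fits */
        addr = ALIGN_UP(areas[i].end, align);   /* skip past this area */
    }

    if (addr + size < addr || addr + size > vend)
        return 0;                        /* wrapped around or ran past vend */
    return addr;
}
```

Given busy areas 0xc0002000-0xc0045000, 0xc0051000-0xc0053000 and 0xc0053000-0xc0055000, a 0x3000-byte request lands in the gap at 0xc0045000, while a 0x10000-byte request does not fit there and is placed after the last area at 0xc0055000. The kernel's free_vmap_cache exists to avoid repeating this scan from vstart on every call.
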

With the virtual address range reserved, what remains is to map it, page by page, onto physical pages.
See the function __vmalloc_area_node:

static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
                 pgprot_t prot, int node, const void *caller)
{
    const int order = 0;
    struct page **pages;
    unsigned int nr_pages, array_size, i;
    gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;

    nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; // number of pages to allocate (area->size includes the guard page)
    array_size = (nr_pages * sizeof(struct page *));   // bytes needed for the array of page pointers

    area->nr_pages = nr_pages;
    /* Please note that the recursion is strictly bounded. */
    if (array_size > PAGE_SIZE) {                          // with 4 KiB pages and 4-byte pointers: taken for more than 1024 pages, i.e. a request above 4 MiB
        pages = __vmalloc_node(array_size, 1, nested_gfp|__GFP_HIGHMEM,
                PAGE_KERNEL, node, caller);
        area->flags |= VM_VPAGES;
    } else {
        pages = kmalloc_node(array_size, nested_gfp, node);    // the array fits in one page: take it straight from the slab allocator
    }
    area->pages = pages;
    area->caller = caller;
    if (!area->pages) {
        remove_vm_area(area->addr);
        kfree(area);
        return NULL;
    }

    for (i = 0; i < area->nr_pages; i++) {              // allocate the physical pages one at a time
        struct page *page;
        gfp_t tmp_mask = gfp_mask | __GFP_NOWARN;

        if (node < 0)
            page = alloc_page(tmp_mask);
        else
            page = alloc_pages_node(node, tmp_mask, order);

        if (unlikely(!page)) {
            /* Successfully allocated i pages, free them in __vunmap() */
            area->nr_pages = i;
            goto fail;
        }
        area->pages[i] = page;             // store the page pointer in the array
    }

    if (map_vm_area(area, prot, &pages)) // update the page tables, mapping page by page, and flush caches for coherency; worth reading if you are curious about how mappings are built
        goto fail;
    return area->addr;

fail:
    warn_alloc_failed(gfp_mask, order,
             "vmalloc: allocation failure, allocated %ld of %ld bytes\n",
             (area->nr_pages*PAGE_SIZE), area->size);
    vfree(area->addr);
    return NULL;
}
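
The bookkeeping at the top of __vmalloc_area_node is plain arithmetic: drop the guard page, count the pages, size the pointer array, and pick the allocator for that array. A user-space sketch follows, assuming 4 KiB pages. The pointer size is passed in explicitly so that the 32-bit case (4 bytes) can be tried on any host; `pages_plan` and `plan_pages` are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT 12                  /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

struct pages_plan {
    unsigned long nr_pages;     /* physical pages to allocate */
    unsigned long array_size;   /* bytes for the struct page * array */
    bool array_uses_vmalloc;    /* true: __vmalloc_node, false: kmalloc_node */
};

/*
 * area_size is vm_struct.size, which already includes the guard page.
 * ptr_size is sizeof(struct page *) on the target (4 on 32-bit).
 */
static struct pages_plan plan_pages(unsigned long area_size, size_t ptr_size)
{
    struct pages_plan p;

    assert(area_size >= 2 * PAGE_SIZE && area_size % PAGE_SIZE == 0);

    p.nr_pages = (area_size - PAGE_SIZE) >> PAGE_SHIFT;
    p.array_size = p.nr_pages * ptr_size;
    p.array_uses_vmalloc = p.array_size > PAGE_SIZE;
    return p;
}
```

An area of 274432 bytes gives 66 pages and a 264-byte array from kmalloc. The switch to vmalloc for the array happens at 1025 pages on a 32-bit target, since 1024 pointers still fit in exactly one page.
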

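insert_vmalloc_vmlist, as the name says, inserts the vm_struct into vmlist.
That completes the whole path. It is less complicated than one might expect, and working through it gives a better picture of kernel memory. One more point: a system with high memory generally fares better than one without, because vmalloc then takes its pages from high memory, which eases problems such as the TLB thrashing vmalloc can cause and disturbs the normal zone less.

Some information about vmalloc can be read through proc: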

cat /proc/vmallocinfo 
0xc0002000-0xc0045000 274432 jffs2_zlib_init+0x24/0xa4 pages=66 vmalloc
0xc0045000-0xc0051000 49152 jffs2_zlib_init+0x40/0xa4 pages=11 vmalloc
0xc0051000-0xc0053000 8192 brcmnand_create_cet+0x244/0x788 pages=1 vmalloc
0xc0053000-0xc0055000 8192 ebt_register_table+0x98/0x39c pages=1 vmalloc
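
The listing makes the guard page visible: each range is one page longer than its pages= count. The user-space sketch below parses lines of the shape shown above and checks that relation. It assumes 4 KiB pages and handles only entries that carry a pages= field, as in the sample; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */

struct vminfo {
    unsigned long start;   /* first address of the range */
    unsigned long end;     /* address just past the range */
    unsigned long size;    /* size column, in bytes */
    unsigned int pages;    /* value of the pages= field */
};

/* Parse "start-end size caller pages=N ..."; false if the line does not match. */
static bool parse_vmallocinfo(const char *line, struct vminfo *out)
{
    char caller[64];

    return sscanf(line, "%lx-%lx %lu %63s pages=%u",
                  &out->start, &out->end, &out->size,
                  caller, &out->pages) == 5;
}

/* True when the range is exactly the mapped pages plus one guard page. */
static bool has_one_guard_page(const struct vminfo *v)
{
    return v->end - v->start == v->size &&
           v->size / PAGE_SIZE == v->pages + 1UL;
}
```

For the first sample line, 0xc0045000 - 0xc0002000 is 274432 bytes, or 67 pages, against pages=66.
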

Also:

# cat /proc/vmstat
# cat /proc/meminfo
