A Brief Introduction to the ELF Format - eBPF Fundamentals

Posted by MarkZhu on 2023-03-05

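This article is excerpted from "A Brief Introduction to the ELF Format" in my 《Mark’s DevOps 雜碎》. If the images are unclear, please refer to the original.

Why study the ELF format? Because anyone who wants to learn eBPF in depth, and make full use of its features rather than just trade on its reputation, cannot skip the ELF format.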

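A brief introduction to the ELF format

Program source code is compiled and linked into an executable file that contains binary machine instructions. Executable files follow a format specification; on Linux, that specification is the Executable and Linkable Format (ELF). An ELF file contains binary machine instructions, static data, and metadata.

  • Static data - data that is hard-coded in the program, such as string constants.
  • Binary machine instructions - the instructions generated from the program's code logic. Each function in the source is compiled into a block of instructions, and the linker lays these blocks out contiguously in the .text section of the output ELF file.
  • Metadata, which falls into two categories:

    • Information that tells the operating system how to load and dynamically link the executable and initialize the process's memory.
    • Information that is not required at run time but helps with troubleshooting, such as the .symtab section.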

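Views

An ELF file provides two different views:

  • Linking view - corresponds to the section header table

    Tells the linker how the file's sections are organized so that they can be combined and relocated. It also provides metadata for debuggers.

  • Execution view - corresponds to the program header table (also called the segment header table)

    Tells the operating system how to load the executable and initialize the process's memory. An executable ELF file always has a program header table.

Viewed as a sequence of bytes, a typical ELF file looks like this: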

Figure: A typical ELF file layout (from [Computer Systems - A Programmer’s Perspective])

In the figure, the blocks with names like .xyz are sections.

A more detailed diagram:

Figure: Typical ELF file. The linker uses the Section Header Table, and the loader uses the Program Header Table. - from https://www.ics.uci.edu/~aburtsev/238P/hw/hw3-elf/hw3-elf.html

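Note: if the section header table has been stripped (is missing from the binary), that does not mean the sections are gone. It only means they can no longer be located through the section header table, so less information is available to debuggers and disassemblers.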

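In terms of the data structure definitions and their relationships, the format can be summarized as:

Figure: Relationships among the ELF data structures

The following sections briefly go through each data structure.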

Typical ELF file examples

A typical relocatable object file

Figure from [Computer Systems A Programmer's Perspective]

A typical executable object file

Figure from [Computer Systems A Programmer's Perspective]

File structure

ELF file header

Running readelf -h on an ELF file displays the ELF file header. The header starts at offset 0 of the file and serves as an index to the rest of it.

$ readelf -h ./envoy
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type: (ELF file type)              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x14dcac0
  Start of program headers:          64 (bytes into file) (points to the program header table)
  Start of section headers:          99364400 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         12
  Size of section headers:           64 (bytes)
  Number of section headers:         35
  Section header string table index: 33

The ELF(5) manual page on Linux (man elf) shows the structure of the ELF file header:

// ELF file header structure: ElfN_Ehdr
#define EI_NIDENT 16

typedef struct {
    unsigned char e_ident[EI_NIDENT];
    uint16_t      e_type;      // ELF file type
    uint16_t      e_machine;
    uint32_t      e_version;
    ElfN_Addr     e_entry;
    ElfN_Off      e_phoff;     // program header table offset -> points to the program header table
    ElfN_Off      e_shoff;     // section header table offset
    uint32_t      e_flags;
    uint16_t      e_ehsize;
    uint16_t      e_phentsize;
    uint16_t      e_phnum;
    uint16_t      e_shentsize;
    uint16_t      e_shnum;
    uint16_t      e_shstrndx;
} ElfN_Ehdr;
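
As a rough illustration of the structure above, the following Python sketch decodes the header with the standard struct module. It assumes a little-endian ELF64 file (Class ELF64, little endian, as in the envoy output); the names parse_elf64_header, ELF64_EHDR and FIELDS are made up for this example and are not part of any library.

```python
import struct

# Elf64_Ehdr layout: 16-byte e_ident followed by the fixed-size fields.
# "<" selects little-endian byte order with no padding.
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
FIELDS = ("e_ident", "e_type", "e_machine", "e_version", "e_entry",
          "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
          "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx")


def parse_elf64_header(data: bytes) -> dict:
    """Decode the ELF header found at offset 0 of a little-endian ELF64 image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file (bad magic)")
    # e_ident[4] is EI_CLASS (2 = ELF64), e_ident[5] is EI_DATA (1 = little endian).
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    return dict(zip(FIELDS, ELF64_EHDR.unpack_from(data, 0)))
```

For a real file, the first 64 bytes are enough, for example parse_elf64_header(open(path, "rb").read(64)), where path is whatever binary you want to inspect.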

ELF file types
Ref. [Learning Linux Binary Analysis]
  • ET_NONE : This is an unknown type. It indicates that the file type is unknown,
    or has not yet been defined.
  • ET_REL : This is a relocatable file. ELF type relocatable means that the file
    is marked as a relocatable piece of code or sometimes called an object file.
    Relocatable object files are generally pieces of Position independent code
    (PIC) that have not yet been linked into an executable. You will often see
    .o files in a compiled code base. These are the files that hold code and data
    suitable for creating an executable file.
  • ET_EXEC : This is an executable file. ELF type executable means that the file
    is marked as an executable file. These types of files are also called programs
    and are the entry point of how a process begins running.
  • ET_DYN : This is a shared object. ELF type dynamic means that the file is
    marked as a dynamically linkable object file, also known as shared libraries.
    These shared libraries are loaded and linked into a program's process image
    at runtime
  • ET_CORE : This is an ELF type core that marks a core file. A core file is a dump
    of a full process image during the time of a program crash or when the
    process has delivered an SIGSEGV signal (segmentation violation). GDB can
    read these files and aid in debugging to determine what caused the program
    to crash.
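
To see which of these types a file carries, it is enough to read the two-byte e_type field that follows e_ident. The sketch below is one way to do that in Python; it relies on e_type sitting at offset 16 in both the 32-bit and 64-bit headers, and the helper name elf_type is hypothetical.

```python
import struct

# Numeric values of the e_type constants listed above.
ET_NAMES = {0: "ET_NONE", 1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def elf_type(data: bytes) -> str:
    """Return the symbolic name of e_type for the start of an ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file (bad magic)")
    # e_ident[5] (EI_DATA): 1 = little endian, 2 = big endian.
    endian = "<" if data[5] == 1 else ">"
    (e_type,) = struct.unpack_from(endian + "H", data, 16)
    return ET_NAMES.get(e_type, f"unknown (0x{e_type:x})")
```

Applied to the envoy binary shown earlier, this would be expected to report ET_DYN, matching the Type line printed by readelf -h.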

Why some executables are ET_DYN rather than ET_EXEC

https://stackoverflow.com/questions/61567439/why-is-my-simple-main-programs-elf-header-say-its-a-dyn-shared-object-file
Executables that are compiled as "position independent executables" (with -pie/-fPIE) should be relocated to a random address at runtime. To achieve this, they use the DYN type.
Your version of g++ was configured with --enable-default-pie, so it sets -pie and -fPIE by default. You can disable this, and generate a normal executable, by linking with -no-pie.

ELF program header

ELF program headers are what describe segments within a binary and are necessary for program loading. Segments are understood by the kernel during load time and describe the memory layout of an executable on disk and how it should translate to memory. The program header table can be accessed by referencing the offset found in the initial ELF header member called e_phoff (program header table offset), as shown in the ElfN_Ehdr structure above.

// Program header structure (32-bit layout shown; Elf64_Phdr uses 64-bit
// offsets, addresses and sizes, and places p_flags right after p_type):

typedef struct {
    uint32_t   p_type;   // segment type, e.g. PT_LOAD, PT_DYNAMIC, ...
    Elf32_Off  p_offset; // segment offset in the file
    Elf32_Addr p_vaddr;  // segment virtual address
    Elf32_Addr p_paddr;  // segment physical address
    uint32_t   p_filesz; // size of segment in the file
    uint32_t   p_memsz;  // size of segment in memory
    uint32_t   p_flags;  // segment flags (execute|write|read): memory permissions of the segment
    uint32_t   p_align;  // segment alignment in memory
} Elf32_Phdr;
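
The program header table is simply e_phnum such records stored back to back, starting at e_phoff. As a sketch, the Python below walks that table for a little-endian ELF64 image; note that it uses the 64-bit field order (p_flags comes second), which differs from the 32-bit struct above. The function name parse_program_headers and the PT_NAMES table are illustrative only.

```python
import struct

# Elf64_Phdr: p_type, p_flags, then six 64-bit fields (56 bytes in total).
ELF64_PHDR = struct.Struct("<IIQQQQQQ")
PHDR64_FIELDS = ("p_type", "p_flags", "p_offset", "p_vaddr",
                 "p_paddr", "p_filesz", "p_memsz", "p_align")

# A few common segment types; others are left as raw numbers.
PT_NAMES = {0: "PT_NULL", 1: "PT_LOAD", 2: "PT_DYNAMIC", 3: "PT_INTERP",
            4: "PT_NOTE", 6: "PT_PHDR", 7: "PT_TLS"}


def parse_program_headers(data: bytes, e_phoff: int,
                          e_phentsize: int, e_phnum: int) -> list:
    """Decode e_phnum program headers located at e_phoff in the image."""
    headers = []
    for i in range(e_phnum):
        values = ELF64_PHDR.unpack_from(data, e_phoff + i * e_phentsize)
        headers.append(dict(zip(PHDR64_FIELDS, values)))
    return headers
```

The three arguments come straight from the ELF header: for envoy they would be 64, 56 and 12. As a sanity check, 12 entries of 56 bytes occupy 0x2a0 bytes, which is the FileSiz that readelf reports for the PHDR segment.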

The following sections cover each segment type in turn:

PT_LOAD

An executable will always have at least one PT_LOAD type segment. This type of
program header is describing a loadable segment, which means that the segment
is going to be loaded or mapped into memory.

For instance, an ELF executable with dynamic linking will generally contain the
following two loadable segments (of type PT_LOAD ):

  • The text segment for program code
  • And the data segment for global variables and dynamic linking information

PT_DYNAMIC - Phdr for the dynamic segment

The dynamic segment is specific to executables that are dynamically linked and
contains information necessary for the dynamic linker. This segment contains
tagged values and pointers, including but not limited to the following:

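In other words, it holds the information an executable ELF file uses to reference shared object (.so) files, which includes at least the following: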

  • List of shared libraries that are to be linked at runtime
  • The address/location of the Global offset table (GOT) discussed in the ELF
    Dynamic Linking section
  • Information about relocation entries

PT_NOTE

A segment of type PT_NOTE may contain auxiliary information that is pertinent
to a specific vendor or system.

PT_INTERP

Holds the path of the program interpreter (the dynamic linker) as a null-terminated string, for example /lib/ld-linux.so.2 on 32-bit x86.
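
One way to extract that string by hand is to scan the program header table for the PT_INTERP entry and read p_filesz bytes at p_offset. The Python sketch below does this for a little-endian ELF64 image; the function name find_interpreter is hypothetical, and the byte offsets are those of the Elf64_Ehdr and Elf64_Phdr layouts.

```python
import struct

PT_INTERP = 3  # segment type of the program interpreter entry


def find_interpreter(data: bytes):
    """Return the PT_INTERP path of a little-endian ELF64 image, or None."""
    # Elf64_Ehdr: e_phoff at byte 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", data, base)
        if p_type != PT_INTERP:
            continue
        # Elf64_Phdr: p_offset at +8, p_filesz at +32.
        (p_offset,) = struct.unpack_from("<Q", data, base + 8)
        (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
        raw = data[p_offset:p_offset + p_filesz]
        # The stored string includes its terminating NUL byte.
        return raw.split(b"\x00", 1)[0].decode()
    return None
```

Statically linked executables usually have no PT_INTERP entry, in which case this sketch returns None.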

PT_PHDR

This segment contains the location and size of the program header table itself. The
Phdr table contains all of the Phdr's describing the segments of the file (and in the
memory image)

Listing all segments

We can use the readelf -l <filename> command to view a file's Phdr table:

labile@worknode5:~$ readelf -l ./envoy

Elf file type is DYN (Shared object file)
Entry point 0x14dcac0
There are 12 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000002a0 0x00000000000002a0  R      0x8
  INTERP         0x00000000000002e0 0x00000000000002e0 0x00000000000002e0
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000014dba84 0x00000000014dba84  R      0x1000
  LOAD           0x00000000014dbac0 0x00000000014dcac0 0x00000000014dcac0
                 0x0000000002756c10 0x0000000002756c10  R E    0x1000  <-- executable instructions
  LOAD           0x0000000003c32700 0x0000000003c34700 0x0000000003c34700
                 0x000000000023c6d0 0x000000000023c6d0  RW     0x1000
  LOAD           0x0000000003e6edd0 0x0000000003e71dd0 0x0000000003e71dd0
                 0x00000000000336e0 0x000000000024ab70  RW     0x1000
  TLS            0x0000000003c32700 0x0000000003c34700 0x0000000003c34700
                 0x0000000000000088 0x0000000000014560  R      0x40
  DYNAMIC        0x0000000003e37c98 0x0000000003e39c98 0x0000000003e39c98
                 0x0000000000000200 0x0000000000000200  RW     0x8
  GNU_RELRO      0x0000000003c32700 0x0000000003c34700 0x0000000003c34700
                 0x000000000023c6d0 0x000000000023c900  R      0x1
  GNU_EH_FRAME   0x0000000000dc98d0 0x0000000000dc98d0 0x0000000000dc98d0
                 0x0000000000124b34 0x0000000000124b34  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x0
  NOTE           0x00000000000002fc 0x00000000000002fc 0x00000000000002fc
                 0x0000000000000020 0x0000000000000020  R      0x4

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.ABI-tag .dynsym .gnu.version .gnu.version_r .gnu.hash .dynstr .rela.dyn .rela.plt .rodata .gcc_except_table .eh_frame_hdr .eh_frame 
   03     .text .init .fini malloc_hook google_malloc .plt 
   04     .tdata .fini_array .init_array .data.rel.ro .dynamic .got .got.plt 
   05     .tm_clone_table .data .bss 
   06     .tdata .tbss 
   07     .dynamic 
   08     .tdata .fini_array .init_array .data.rel.ro .dynamic .got .got.plt 
   09     .eh_frame_hdr 
   10     
   11     .note.ABI-tag 

The output above lists 12 segments. Most of them map to one or more sections; segments 00 and 10 map to none.

In this output, the text segment is READ+EXECUTE and the data segments are READ+WRITE. All
LOAD segments have an alignment of 0x1000, or 4,096, which is the page size on x86 and
x86-64; this alignment is used when mapping the segments during program loading.
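
The Flags and Align columns can be checked by hand. The sketch below decodes p_flags into the letters readelf prints, and tests the usual rule that a loadable segment's file offset and virtual address must be congruent modulo p_align, which is what lets the loader map the file page by page. Both helper names are made up for this example.

```python
# Permission bits used in p_flags.
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def flags_to_str(p_flags: int) -> str:
    """Render p_flags in readelf's three-column R/W/E style."""
    columns = (("R", PF_R), ("W", PF_W), ("E", PF_X))
    return "".join(ch if p_flags & bit else " " for ch, bit in columns)


def alignment_ok(p_offset: int, p_vaddr: int, p_align: int) -> bool:
    """Check that p_offset and p_vaddr are congruent modulo p_align."""
    if p_align in (0, 1):  # 0 and 1 both mean "no alignment required"
        return True
    return p_offset % p_align == p_vaddr % p_align
```

For the R E segment above, offset 0x14dbac0 and address 0x14dcac0 both leave a remainder of 0xac0 when divided by 0x1000, so the check passes.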

ELF section header

Now that we've looked at what program headers are, it is time to look at section headers.

Sections are not segments

  • Segments are the elements required for program execution.
  • Each segment contains zero or more code or data sections.
  • The section header table is a directory of pointers to the sections. It is mainly used for linking and debugging. Section headers are not necessary for program execution,
    and a program will execute just fine without having a section header table. This is
    because the section header table doesn't describe the program memory layout. That
    is the responsibility of the program header table. Section headers are really just a complement to the program headers.

The SHT gives an overview on the sections contained in the ELF file. Of particular interest are REL sections (relocations), SYMTAB/DYNSYM (symbol tables), VERSYM/VERDEF/VERNEED sections (symbol versioning information).
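
To make the role of the section header table concrete, here is a minimal Python sketch that lists section names from a little-endian ELF64 image. It reads e_shoff, e_shentsize, e_shnum and e_shstrndx from the ELF header, then resolves each sh_name as an offset into the section name string table. The function name section_names is illustrative, and the sketch returns an empty list when the table has been stripped.

```python
import struct

# Elf64_Shdr: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size,
# sh_link, sh_info, sh_addralign, sh_entsize (64 bytes in total).
ELF64_SHDR = struct.Struct("<IIQQQQIIQQ")


def section_names(data: bytes) -> list:
    """Return the names of all sections, in section header table order."""
    # Elf64_Ehdr: e_shoff at byte 40; e_shentsize, e_shnum, e_shstrndx at 58.
    (e_shoff,) = struct.unpack_from("<Q", data, 40)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 58)
    if e_shoff == 0 or e_shnum == 0:
        return []  # no section header table (for example, stripped)
    shdrs = [ELF64_SHDR.unpack_from(data, e_shoff + i * e_shentsize)
             for i in range(e_shnum)]
    # The section at index e_shstrndx holds the NUL-terminated names.
    str_off, str_size = shdrs[e_shstrndx][4], shdrs[e_shstrndx][5]
    strtab = data[str_off:str_off + str_size]
    names = []
    for sh in shdrs:
        end = strtab.index(b"\x00", sh[0])
        names.append(strtab[sh[0]:end].decode())
    return names
```

This sketch ignores the extended numbering scheme used when a file has more sections than fit in the 16-bit e_shnum field.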

Listing sections

$ readelf -S  ./lib/x86_64-linux-gnu/libc.so.6

There are 73 section headers, starting at offset 0x1eeb10:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .note.gnu.build-i NOTE             0000000000000270  00000270
       0000000000000024  0000000000000000   A       0     0     4
  [ 2] .note.ABI-tag     NOTE             0000000000000294  00000294
       0000000000000020  0000000000000000   A       0     0     4
  [ 3] .gnu.hash         GNU_HASH         00000000000002b8  000002b8
       0000000000003c30  0000000000000000   A       4     0     8
  [ 4] .dynsym           DYNSYM           0000000000003ee8  00003ee8
       000000000000dae8  0000000000000018   A       5     1     8
  [ 5] .dynstr           STRTAB           00000000000119d0  000119d0
       0000000000005ede  0000000000000000   A       0     0     1
       
  [19] .eh_frame_hdr     PROGBITS         00000000001bdd0c  001bdd0c
       00000000000059e4  0000000000000000   A       0     0     4
  [20] .eh_frame         PROGBITS         00000000001c36f0  001c36f0
       000000000001fa70  0000000000000000   A       0     0     8
       
...       
  [31] .dynamic          DYNAMIC          00000000003eab80  001eab80
       00000000000001e0  0000000000000010  WA       5     0     8

The .text section

The .text section is a code section that contains program code instructions. In an
executable program where there are also Phdr's, this section would be within the
range of the text segment. Because it contains program code, it is of section type
SHT_PROGBITS .

The .rodata section

The .rodata section contains read-only data, such as the string literal in the following
line of C code:
printf("Hello World!\n");
This section is read-only and therefore must exist in a read-only segment of an
executable. So you will typically find .rodata within the range of the text segment (not the
data segment), or in a separate read-only LOAD segment, as in segment 02 of the envoy
output above. Because it holds data defined by the program, it is of type SHT_PROGBITS .

The .plt section

The procedure linkage table (PLT) contains code necessary for the dynamic linker to
call functions that are imported from shared libraries. It resides in the text segment
and contains code, so it is marked as type SHT_PROGBITS .

The .data section

The data section, not to be confused with the data segment, will exist within the data
segment and contain data such as initialized global variables. It contains program
variable data, so it is marked SHT_PROGBITS .

The .bss section

The .bss section contains uninitialized global data as part of the data segment and
therefore takes up no space on disk beyond its section header entry. The data is
initialized to zero at program load time and can be
assigned values during program execution. The .bss section is marked SHT_NOBITS
since it contains no actual data.
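
This is visible in the envoy readelf -l output earlier: the last LOAD segment, which holds .bss, has FileSiz 0x336e0 but MemSiz 0x24ab70. The difference is memory the loader zero-fills rather than reads from the file. A small worked example follows, with a hypothetical helper name.

```python
def zero_fill_size(p_filesz: int, p_memsz: int) -> int:
    """Bytes of a segment that exist only in memory and are zero-filled."""
    if p_memsz < p_filesz:
        raise ValueError("p_memsz must not be smaller than p_filesz")
    return p_memsz - p_filesz


# Values of the last LOAD segment in the envoy listing.
extra = zero_fill_size(0x336e0, 0x24ab70)
print(hex(extra))  # prints 0x217490
```

So roughly 2 MiB of that segment is occupied in memory without being stored in the file; presumably most of it belongs to .bss.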

The .got.plt section

The .got.plt section contains the part of the global offset table (GOT) that works
together with the PLT to provide access to imported shared library functions and is
modified by the dynamic linker at runtime. This section in particular is often abused
by attackers who gain a pointer-sized write primitive in heap or .bss exploits. This
section has to do with program execution and therefore is marked SHT_PROGBITS .

The .dynsym section

The dynsym section contains dynamic symbol information imported from shared
libraries. It is contained within the text segment and is marked as type SHT_DYNSYM .

The .dynstr section

The dynstr section contains the string table for dynamic symbols that has the name
of each symbol in a series of null terminated strings.
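
The link between .dynsym and .dynstr is the st_name field: each symbol entry stores a byte offset into the string table rather than the name itself. The sketch below shows this lookup for 64-bit symbol entries (24 bytes each, matching the EntSize of 0x18 in the libc listing above). The raw section contents are assumed to have been read already, and the function names are illustrative.

```python
import struct

# Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size.
ELF64_SYM = struct.Struct("<IBBHQQ")


def read_cstring(strtab: bytes, offset: int) -> str:
    """Return the NUL-terminated string starting at offset in strtab."""
    end = strtab.index(b"\x00", offset)
    return strtab[offset:end].decode()


def dynamic_symbol_names(dynsym: bytes, dynstr: bytes) -> list:
    """Resolve the name of every entry in a .dynsym section."""
    names = []
    for off in range(0, len(dynsym), ELF64_SYM.size):
        st_name = ELF64_SYM.unpack_from(dynsym, off)[0]
        names.append(read_cstring(dynstr, st_name))
    return names
```

Given the .dynsym size of 0xdae8 in that listing, the same arithmetic implies 0xdae8 / 0x18 = 2335 symbol entries, the first of which is the reserved null symbol.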

The .rel.* section

Relocation sections contain information about how parts of an ELF object or process
image need to be fixed up or modified at linking or runtime. Relocation sections
are marked as type SHT_REL , or SHT_RELA when each entry carries an explicit addend
(as in the .rela.* sections of the envoy output above), since they contain relocation data.
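
As an illustration of what such an entry looks like, the Python sketch below decodes 64-bit entries with an explicit addend (Elf64_Rela). Each entry holds the location to patch, a packed r_info value whose upper 32 bits are a symbol table index and whose lower 32 bits are the relocation type, and the addend. The function name parse_rela and the dictionary keys are made up for this example.

```python
import struct

# Elf64_Rela: r_offset (u64), r_info (u64), r_addend (signed 64-bit).
ELF64_RELA = struct.Struct("<QQq")


def parse_rela(section: bytes) -> list:
    """Decode the raw contents of a .rela.* section (little-endian ELF64)."""
    entries = []
    for off in range(0, len(section), ELF64_RELA.size):
        r_offset, r_info, r_addend = ELF64_RELA.unpack_from(section, off)
        entries.append({
            "offset": r_offset,           # address that gets patched
            "sym": r_info >> 32,          # index into the symbol table
            "type": r_info & 0xffffffff,  # processor-specific relocation type
            "addend": r_addend,
        })
    return entries
```

How a given type is applied depends on the processor ABI; this sketch only splits the fields and does not perform the relocation.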

The .hash section

The hash section, sometimes called .gnu.hash , contains a hash table for symbol
lookup.

The text segments will be as follows:
• [.text] : This is the program code
• [.rodata] : This is read-only data
• [.hash] : This is the symbol hash table
• [.dynsym ] : This is the shared object symbol data
• [.dynstr ] : This is the shared object symbol name
• [.plt] : This is the procedure linkage table
• [.rel.got] : This is the G.O.T relocation data
The data segments will be as follows:
• [.data] : These are the globally initialized variables
• [.dynamic] : These are the dynamic linking structures and objects
• [.got.plt] : This is the global offset table
• [.bss] : These are the globally uninitialized variables

References