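This article is excerpted from "A Brief Introduction to the ELF Format" (《ELF 格式簡述》) in my 《Mark’s DevOps 雜碎》. If the images are unclear, refer to the original.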
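Introduction
Why study the ELF format? Because I want to learn eBPF in more depth and make full use of its features rather than just trade on its reputation, and the ELF format cannot be skipped on that path.
A brief introduction to the ELF format
Program source code is compiled and linked into an executable file containing binary machine instructions. Executable files follow a format specification; on Linux, that specification is the Executable and Linkable Format (ELF). An ELF file contains binary machine instructions, static data, and metadata.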
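- Static data: the data hard-coded in the program, such as string constants.
- Binary machine instructions: the instructions generated from the program's code logic. Each function in the source is compiled into a block of instructions, and the linker lays these blocks out consecutively in the .text section of the output ELF file.
- Metadata, which falls into two categories:
  - Information that tells the operating system how to load and dynamically link the executable and initialize the process memory.
  - Information that is not required at run time but helps in diagnosing problems, such as the .symtab section.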
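Views
An ELF file provides two different views:
Linking view
- Corresponds to the section header table.
Used by the linker to combine and relocate code and data, and by debuggers as a source of metadata.
Execution view
- Corresponds to the program header table (also called the segment header table).
Tells the operating system how to load the executable and initialize the process memory. An executable ELF file always has a program header table.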
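Viewed as a sequence of bytes, a typical ELF file looks like this:
<p align = "center">Example of a typical ELF file layout (figure from [Computer Systems - A Programmer’s Perspective])</p>
Blocks with names of the form .xyz are sections.
A more detailed figure: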
Typical ELF file. The linker uses the Section Header Table, and the loader uses the Program Header Table. - from https://www.ics.uci.edu/~aburtsev/238P/hw/hw3-elf/hw3-elf.html
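Note: if the section header table has been stripped (removed from the binary), that does not mean the sections themselves are gone; it only means they can no longer be referenced through the section header table, and debuggers and disassemblers have less information to work with.
In terms of the data structure definitions and their relationships, this can be summarized as:
Relationships between the ELF data structures
The following sections briefly go through each data structure.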
Typical ELF file examples
A typical relocatable object file
[Computer Systems A Programmer's Perspective]
A typical executable object file
[Computer Systems A Programmer's Perspective]
File structure
ELF file header
Running readelf -h on an ELF file displays its ELF file header. The ELF file header starts at offset 0 of the file and serves as an index to the rest of it.
$ readelf -h ./envoy
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Shared object file) <-- ELF file type
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x14dcac0
Start of program headers: 64 (bytes into file) <-- points to the program header table
Start of section headers: 99364400 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 12
Size of section headers: 64 (bytes)
Number of section headers: 35
Section header string table index: 33
The ELF(5) manual page on Linux (man elf) shows the structure of the ELF file header:
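/* ELF file header structure: ElfN_Ehdr (N is 32 or 64) */
#define EI_NIDENT 16
typedef struct {
    unsigned char e_ident[EI_NIDENT]; /* magic number and file class/encoding */
    uint16_t      e_type;             /* ELF file type */
    uint16_t      e_machine;          /* target architecture */
    uint32_t      e_version;
    ElfN_Addr     e_entry;            /* entry point virtual address */
    ElfN_Off      e_phoff;            /* program header table offset */
    ElfN_Off      e_shoff;            /* section header table offset */
    uint32_t      e_flags;
    uint16_t      e_ehsize;           /* size of this header */
    uint16_t      e_phentsize;        /* size of one program header */
    uint16_t      e_phnum;            /* number of program headers */
    uint16_t      e_shentsize;        /* size of one section header */
    uint16_t      e_shnum;            /* number of section headers */
    uint16_t      e_shstrndx;         /* index of the section name string table */
} ElfN_Ehdr;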
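To make the field layout concrete, here is a minimal Python sketch that decodes the 64-byte ELF64 header with the standard struct module. The names EHDR64, FIELDS and parse_ehdr64 are illustrative and not part of any library, and the sample bytes are rebuilt from the readelf -h output above rather than read from the real ./envoy file. It only handles little-endian ELF64.

```python
import struct

# Elf64_Ehdr, little endian: 16-byte e_ident followed by the fields
# in the same order as the ElfN_Ehdr struct (with N = 64).
EHDR64 = struct.Struct("<16sHHIQQQIHHHHHH")
FIELDS = ("e_ident", "e_type", "e_machine", "e_version", "e_entry",
          "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
          "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx")

def parse_ehdr64(data: bytes) -> dict:
    """Decode the first 64 bytes of a little-endian ELF64 file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles ELF64, little endian")
    return dict(zip(FIELDS, EHDR64.unpack_from(data, 0)))

# Synthetic header rebuilt from the readelf -h values shown above.
ident = bytes.fromhex("7f454c46020101000000000000000000")
sample = EHDR64.pack(ident, 3, 62, 1, 0x14dcac0, 64, 99364400,
                     0, 64, 56, 12, 64, 35, 33)
hdr = parse_ehdr64(sample)
print(hdr["e_phoff"], hdr["e_phnum"], hex(hdr["e_entry"]))
```

For a real file, passing the first 64 bytes (for example open(path, "rb").read(64)) to parse_ehdr64 should give the same fields that readelf -h prints.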
ELF file types
Ref. [Learning Linux Binary Analysis]
- ET_NONE: This is an unknown type. It indicates that the file type is unknown, or has not yet been defined.
- ET_REL: This is a relocatable file. ELF type relocatable means that the file is marked as a relocatable piece of code or sometimes called an object file. Relocatable object files are generally pieces of Position independent code (PIC) that have not yet been linked into an executable. You will often see .o files in a compiled code base. These are the files that hold code and data suitable for creating an executable file.
- ET_EXEC: This is an executable file. ELF type executable means that the file is marked as an executable file. These types of files are also called programs and are the entry point of how a process begins running.
- ET_DYN: This is a shared object. ELF type dynamic means that the file is marked as a dynamically linkable object file, also known as shared libraries. These shared libraries are loaded and linked into a program's process image at runtime.
- ET_CORE: This is an ELF type core that marks a core file. A core file is a dump of a full process image during the time of a program crash or when the process has been delivered a SIGSEGV signal (segmentation violation). GDB can read these files and aid in debugging to determine what caused the program to crash.
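The type is stored in the 16-bit e_type field, which sits immediately after the 16-byte e_ident array. As a small sketch (the names ET_NAMES and e_type_of are illustrative, and little-endian encoding is assumed), the numeric values defined by the ELF specification can be mapped back to the names above:

```python
import struct

# e_type values as defined by the ELF specification.
ET_NAMES = {0: "ET_NONE", 1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}

def e_type_of(data: bytes) -> str:
    """Return the symbolic file type of a little-endian ELF image."""
    (e_type,) = struct.unpack_from("<H", data, 16)  # e_type follows e_ident
    return ET_NAMES.get(e_type, f"unknown (0x{e_type:x})")

# 16 bytes of e_ident followed by e_type = 3, as in the envoy header above.
sample = bytes.fromhex("7f454c46020101000000000000000000") + struct.pack("<H", 3)
print(e_type_of(sample))
```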
Why some executables are of type ET_DYN rather than ET_EXEC: https://stackoverflow.com/questions/61567439/why-is-my-simple-main-programs-elf-header-say-its-a-dyn-shared-object-file
Executables that are compiled as "position independent executables" (with -pie/-fPIE) should be relocated to a random address at runtime. To achieve this, they use the DYN type.
Your version of g++ was configured with --enable-default-pie, so it sets -pie and -fPIE by default. You can disable this, and generate a normal executable, by linking with -no-pie.
ELF program header
ELF program headers are what describe segments within a binary and are necessary for program loading. Segments are understood by the kernel during load time and describe the memory layout of an executable on disk and how it should translate to memory. The program header table can be accessed by referencing the offset found in the initial ELF header member called e_phoff (program header table offset), as shown in the ElfN_Ehdr structure above.
/* Program header structure (32-bit form; in Elf64_Phdr, p_flags follows
   p_type and the offset, address and size fields are 64 bits wide) */
typedef struct {
    uint32_t   p_type;   /* segment type, e.g. PT_LOAD, PT_DYNAMIC ... */
    Elf32_Off  p_offset; /* segment offset in the file */
    Elf32_Addr p_vaddr;  /* segment virtual address */
    Elf32_Addr p_paddr;  /* segment physical address */
    uint32_t   p_filesz; /* size of segment in the file */
    uint32_t   p_memsz;  /* size of segment in memory */
    uint32_t   p_flags;  /* segment permissions: execute|write|read */
    uint32_t   p_align;  /* segment alignment in memory */
} Elf32_Phdr;
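As a rough illustration of how one entry of the program header table can be decoded, the following Python sketch unpacks a 64-bit program header (Elf64_Phdr), since the envoy binary in this article is ELF64. The names PHDR64, PT_NAMES and parse_phdr64 are illustrative; the sample entry is rebuilt from the executable LOAD segment in this article's readelf -l output, not read from the real file.

```python
import struct

# Elf64_Phdr, little endian. Unlike the 32-bit struct, p_flags comes
# right after p_type. The entry is 56 bytes long.
PHDR64 = struct.Struct("<IIQQQQQQ")
PT_NAMES = {0: "NULL", 1: "LOAD", 2: "DYNAMIC", 3: "INTERP",
            4: "NOTE", 6: "PHDR", 7: "TLS"}
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4

def parse_phdr64(data: bytes, offset: int = 0) -> dict:
    """Decode one Elf64_Phdr entry starting at the given byte offset."""
    (p_type, p_flags, p_offset, p_vaddr,
     p_paddr, p_filesz, p_memsz, p_align) = PHDR64.unpack_from(data, offset)
    # Same three-column flag display as readelf: R, W, E.
    flags = "".join(ch if p_flags & bit else " "
                    for bit, ch in ((PF_R, "R"), (PF_W, "W"), (PF_X, "E")))
    return {"type": PT_NAMES.get(p_type, hex(p_type)), "flags": flags,
            "p_offset": p_offset, "p_vaddr": p_vaddr, "p_paddr": p_paddr,
            "p_filesz": p_filesz, "p_memsz": p_memsz, "p_align": p_align}

# Entry rebuilt from the "R E" LOAD segment of ./envoy.
sample = PHDR64.pack(1, PF_R | PF_X, 0x14dbac0, 0x14dcac0, 0x14dcac0,
                     0x2756c10, 0x2756c10, 0x1000)
seg = parse_phdr64(sample)
print(seg["type"], repr(seg["flags"]), hex(seg["p_vaddr"]))
```

In a real file, entry i of the table would start at e_phoff + i * e_phentsize.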
The segment types are described in turn below:
PT_LOAD
An executable will always have at least one PT_LOAD type segment. This type of program header is describing a loadable segment, which means that the segment is going to be loaded or mapped into memory.
For instance, an ELF executable with dynamic linking will generally contain the following two loadable segments (of type PT_LOAD):
- The text segment for program code
- And the data segment for global variables and dynamic linking information
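One detail of loadable segments is that p_memsz may be larger than p_filesz; the extra bytes are not stored in the file and are zero-filled when the segment is mapped, which is how .bss gets its memory. The sketch below imitates that behaviour on plain byte strings. The function name load_segment and the tiny sample file are made up for illustration; the two sizes at the end are those of the last LOAD segment in this article's readelf -l output.

```python
def load_segment(file_bytes: bytes, p_offset: int,
                 p_filesz: int, p_memsz: int) -> bytes:
    """Return the in-memory image of a PT_LOAD segment."""
    if p_memsz < p_filesz:
        raise ValueError("p_memsz must not be smaller than p_filesz")
    from_file = file_bytes[p_offset:p_offset + p_filesz]
    # Bytes beyond p_filesz exist only in memory and start out as zero.
    return from_file + bytes(p_memsz - p_filesz)

image = load_segment(b"HEADERdata", p_offset=6, p_filesz=4, p_memsz=8)
print(image)

# Last LOAD segment of ./envoy: FileSiz 0x336e0, MemSiz 0x24ab70.
zero_filled = 0x24ab70 - 0x336e0
print(hex(zero_filled))
```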
PT_DYNAMIC – Phdr for the dynamic segment
The dynamic segment is specific to executables that are dynamically linked and contains information necessary for the dynamic linker. This segment contains tagged values and pointers. In other words, it holds the information an executable ELF file uses to reference shared object (.so) files, including at least the following:
- List of shared libraries that are to be linked at runtime
- The address/location of the Global offset table (GOT), discussed in the ELF Dynamic Linking section of [Learning Linux Binary Analysis]
- Information about relocation entries
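The "tagged values and pointers" are stored as an array of (tag, value) pairs terminated by a DT_NULL entry. A minimal sketch of walking such an array in its 64-bit little-endian form follows. The name iter_dynamic is illustrative and every value in the sample is hypothetical; only the tag numbers come from the ELF specification.

```python
import struct

# Elf64_Dyn, little endian: signed 64-bit d_tag, then 64-bit d_val/d_ptr.
DYN64 = struct.Struct("<qQ")
DT_NULL, DT_NEEDED, DT_PLTGOT, DT_STRTAB, DT_RELA = 0, 1, 3, 5, 7

def iter_dynamic(data: bytes):
    """Yield (tag, value) pairs until the terminating DT_NULL entry."""
    for off in range(0, len(data) - DYN64.size + 1, DYN64.size):
        tag, val = DYN64.unpack_from(data, off)
        if tag == DT_NULL:
            return
        yield tag, val

# Hypothetical dynamic section contents.
sample = b"".join(DYN64.pack(tag, val) for tag, val in [
    (DT_NEEDED, 1),        # name of a needed library: offset into the string table
    (DT_STRTAB, 0x119d0),  # address of the dynamic string table
    (DT_PLTGOT, 0x3e9000), # address associated with the PLT/GOT
    (DT_RELA, 0x20000),    # address of the relocation table
    (DT_NULL, 0),          # terminator
    (DT_NEEDED, 99),       # never reached
])
entries = list(iter_dynamic(sample))
print(entries)
```

Note that a DT_NEEDED value is not the library name itself but an offset into the string table located through DT_STRTAB.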
PT_NOTE
A segment of type PT_NOTE may contain auxiliary information that is pertinent
to a specific vendor or system.
PT_INTERP
Holds the path of the program interpreter, stored as a null-terminated string such as /lib/ld-linux.so.2.
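A minimal sketch of how that string can be recovered: the PT_INTERP program header gives an offset and size in the file, and the bytes there are the null-terminated path. The name read_interp is illustrative, and the sample file image is synthetic, built from the INTERP row of this article's readelf -l output (offset 0x2e0, size 0x1c).

```python
def read_interp(file_bytes: bytes, p_offset: int, p_filesz: int) -> str:
    """Return the interpreter path stored in a PT_INTERP segment."""
    raw = file_bytes[p_offset:p_offset + p_filesz]
    # The stored size includes the terminating NUL byte.
    return raw.split(b"\0", 1)[0].decode("ascii")

# Synthetic file: padding up to offset 0x2e0, then the path and its NUL.
sample = bytes(0x2e0) + b"/lib64/ld-linux-x86-64.so.2\0"
path = read_interp(sample, p_offset=0x2e0, p_filesz=0x1c)
print(path)
```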
PT_PHDR
This segment contains the location and size of the program header table itself. The Phdr table contains all of the Phdr's describing the segments of the file (and in the memory image).
Listing all segments
We can use the readelf -l <filename> command to view a file's Phdr table:
labile@worknode5:~$ readelf -l ./envoy
Elf file type is DYN (Shared object file)
Entry point 0x14dcac0
There are 12 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x00000000000002a0 0x00000000000002a0 R 0x8
INTERP 0x00000000000002e0 0x00000000000002e0 0x00000000000002e0
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000014dba84 0x00000000014dba84 R 0x1000
LOAD 0x00000000014dbac0 0x00000000014dcac0 0x00000000014dcac0
0x0000000002756c10 0x0000000002756c10 R E 0x1000 <-- executable instructions
LOAD 0x0000000003c32700 0x0000000003c34700 0x0000000003c34700
0x000000000023c6d0 0x000000000023c6d0 RW 0x1000
LOAD 0x0000000003e6edd0 0x0000000003e71dd0 0x0000000003e71dd0
0x00000000000336e0 0x000000000024ab70 RW 0x1000
TLS 0x0000000003c32700 0x0000000003c34700 0x0000000003c34700
0x0000000000000088 0x0000000000014560 R 0x40
DYNAMIC 0x0000000003e37c98 0x0000000003e39c98 0x0000000003e39c98
0x0000000000000200 0x0000000000000200 RW 0x8
GNU_RELRO 0x0000000003c32700 0x0000000003c34700 0x0000000003c34700
0x000000000023c6d0 0x000000000023c900 R 0x1
GNU_EH_FRAME 0x0000000000dc98d0 0x0000000000dc98d0 0x0000000000dc98d0
0x0000000000124b34 0x0000000000124b34 R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x0
NOTE 0x00000000000002fc 0x00000000000002fc 0x00000000000002fc
0x0000000000000020 0x0000000000000020 R 0x4
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .dynsym .gnu.version .gnu.version_r .gnu.hash .dynstr .rela.dyn .rela.plt .rodata .gcc_except_table .eh_frame_hdr .eh_frame
03 .text .init .fini malloc_hook google_malloc .plt
04 .tdata .fini_array .init_array .data.rel.ro .dynamic .got .got.plt
05 .tm_clone_table .data .bss
06 .tdata .tbss
07 .dynamic
08 .tdata .fini_array .init_array .data.rel.ro .dynamic .got .got.plt
09 .eh_frame_hdr
10
11 .note.ABI-tag
There are 12 segments above. Most of them map to one or more sections; segments 00 (PHDR) and 10 (GNU_STACK) contain none.
The text segment is READ+EXECUTE and the data segment is READ+WRITE, and the LOAD segments have an alignment of 0x1000 (4,096 bytes), which is the page size on x86 and x86-64 Linux. This alignment is used when mapping the segments during program loading.
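To see what the alignment means in practice, the ELF specification requires that p_vaddr and p_offset be congruent modulo p_align for loadable segments, so that file pages can be mapped directly at page boundaries. The sketch below checks that rule and computes the page-aligned address range a segment would cover. The function names are illustrative, and the numbers are those of the "R E" LOAD segment in the output above.

```python
def is_congruent(p_offset: int, p_vaddr: int, p_align: int) -> bool:
    """Check p_vaddr == p_offset (mod p_align); 0 and 1 mean no constraint."""
    if p_align in (0, 1):
        return True
    return p_offset % p_align == p_vaddr % p_align

def page_span(p_vaddr: int, p_memsz: int, page: int = 0x1000):
    """Return the page-aligned [start, end) range covering the segment."""
    start = p_vaddr & ~(page - 1)
    end = (p_vaddr + p_memsz + page - 1) & ~(page - 1)
    return start, end

ok = is_congruent(0x14dbac0, 0x14dcac0, 0x1000)
span = page_span(0x14dcac0, 0x2756c10)
print(ok, hex(span[0]), hex(span[1]))
```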
ELF section header
Now that we've looked at what program headers are, it is time to look at section headers.
Sections differ from segments:
- Segments are the elements required for program execution.
- Each segment typically contains one or more sections of code or data.
The section header table stores a directory of pointers to the sections. Its main purposes are linking and debugging; program execution does not depend on it. Section headers are not necessary for program execution, and a program will execute just fine without having a section header table. This is because the section header table doesn't describe the program memory layout. That is the responsibility of the program header table. Section headers are really just complementary to the program headers.
The SHT gives an overview on the sections contained in the ELF file. Of particular interest are REL sections (relocations), SYMTAB/DYNSYM (symbol tables), VERSYM/VERDEF/VERNEED sections (symbol versioning information).
Listing sections
$ readelf -S ./lib/x86_64-linux-gnu/libc.so.6
There are 73 section headers, starting at offset 0x1eeb10:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .note.gnu.build-i NOTE 0000000000000270 00000270
0000000000000024 0000000000000000 A 0 0 4
[ 2] .note.ABI-tag NOTE 0000000000000294 00000294
0000000000000020 0000000000000000 A 0 0 4
[ 3] .gnu.hash GNU_HASH 00000000000002b8 000002b8
0000000000003c30 0000000000000000 A 4 0 8
[ 4] .dynsym DYNSYM 0000000000003ee8 00003ee8
000000000000dae8 0000000000000018 A 5 1 8
[ 5] .dynstr STRTAB 00000000000119d0 000119d0
0000000000005ede 0000000000000000 A 0 0 1
[19] .eh_frame_hdr PROGBITS 00000000001bdd0c 001bdd0c
00000000000059e4 0000000000000000 A 0 0 4
[20] .eh_frame PROGBITS 00000000001c36f0 001c36f0
000000000001fa70 0000000000000000 A 0 0 8
...
[31] .dynamic DYNAMIC 00000000003eab80 001eab80
00000000000001e0 0000000000000010 WA 5 0 8
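Each row of this listing corresponds to one 64-byte section header entry. As a minimal sketch (the names SHDR64, SHT_NAMES and parse_shdr64 are illustrative), the entry can be decoded as follows. The sample is rebuilt from the .dynsym row above; its sh_name value is set to 0 because the real offset into the section name string table is not shown in the listing.

```python
import struct

# Elf64_Shdr, little endian: 64 bytes per entry.
SHDR64 = struct.Struct("<IIQQQQIIQQ")
SHT_NAMES = {0: "NULL", 1: "PROGBITS", 2: "SYMTAB", 3: "STRTAB", 4: "RELA",
             5: "HASH", 6: "DYNAMIC", 7: "NOTE", 8: "NOBITS", 9: "REL",
             11: "DYNSYM"}
SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR = 0x1, 0x2, 0x4

def parse_shdr64(data: bytes, offset: int = 0) -> dict:
    """Decode one Elf64_Shdr entry starting at the given byte offset."""
    (sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size,
     sh_link, sh_info, sh_addralign, sh_entsize) = SHDR64.unpack_from(data, offset)
    flags = "".join(ch for bit, ch in ((SHF_WRITE, "W"), (SHF_ALLOC, "A"),
                                       (SHF_EXECINSTR, "X")) if sh_flags & bit)
    return {"sh_name": sh_name, "type": SHT_NAMES.get(sh_type, hex(sh_type)),
            "flags": flags, "sh_addr": sh_addr, "sh_offset": sh_offset,
            "sh_size": sh_size, "sh_link": sh_link, "sh_info": sh_info,
            "sh_addralign": sh_addralign, "sh_entsize": sh_entsize}

# .dynsym row of libc.so.6 from the listing (sh_name is a placeholder).
sample = SHDR64.pack(0, 11, SHF_ALLOC, 0x3ee8, 0x3ee8, 0xdae8, 5, 1, 8, 0x18)
sec = parse_shdr64(sample)
print(sec["type"], sec["flags"], sec["sh_size"] // sec["sh_entsize"])
```

For table-like sections, sh_size divided by sh_entsize gives the number of entries, and sh_link here points at section 5, the .dynstr string table that holds the symbol names.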
The .text section
The .text section is a code section that contains program code instructions. In an executable program where there are also Phdr's, this section would be within the range of the text segment. Because it contains program code, it is of section type SHT_PROGBITS.
The .rodata section
The .rodata section contains read-only data, such as the string literal in the following line of C code:
printf("Hello World!\n");
This section is read-only and therefore must exist in a read-only segment of an executable. So you will find .rodata within the range of the text segment (not the data segment), or, with linkers that separate read-only data from code, in its own read-only LOAD segment, as in segment 02 of the envoy output above. Because it holds program-defined data, it is of type SHT_PROGBITS.
The .plt section
The procedure linkage table (PLT), covered in depth in [Learning Linux Binary Analysis], contains code necessary for the dynamic linker to call functions that are imported from shared libraries. It resides in the text segment and contains code, so it is marked as type SHT_PROGBITS.
The .data section
The data section, not to be confused with the data segment, will exist within the data segment and contain data such as initialized global variables. It contains program variable data, so it is marked SHT_PROGBITS.
The .bss section
The bss section contains uninitialized global data as part of the data segment and therefore takes up no space on disk beyond its section header entry. The data is initialized to zero at program load time and the data can be assigned values during program execution. The bss section is marked SHT_NOBITS since it contains no actual data.
The .got.plt section
The Global offset table (GOT) section contains the global offset table. This works together with the PLT to provide access to imported shared library functions and is modified by the dynamic linker at runtime. This section in particular is often abused by attackers who gain a pointer-sized write primitive in heap or .bss exploits; it is discussed further in the ELF Dynamic Linking section of [Learning Linux Binary Analysis]. This section has to do with program execution and therefore is marked SHT_PROGBITS.
The .dynsym section
The dynsym section contains dynamic symbol information imported from shared libraries. It is contained within the text segment and is marked as type SHT_DYNSYM.
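Each dynamic symbol is a fixed-size record; in ELF64 it is 24 bytes (0x18), which matches the EntSize column of .dynsym in the readelf -S listing above. The sketch below decodes such records. The names SYM64 and parse_sym64 are illustrative and the sample table is hypothetical: a mandatory null symbol followed by one undefined global function.

```python
import struct

# Elf64_Sym, little endian: st_name, st_info, st_other, st_shndx,
# st_value, st_size. 24 bytes per entry.
SYM64 = struct.Struct("<IBBHQQ")
STB_NAMES = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
STT_NAMES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}

def parse_sym64(table: bytes, index: int) -> dict:
    """Decode symbol number `index` from a raw symbol table."""
    (st_name, st_info, st_other,
     st_shndx, st_value, st_size) = SYM64.unpack_from(table, index * SYM64.size)
    return {"st_name": st_name,                         # offset into .dynstr
            "bind": STB_NAMES.get(st_info >> 4, "?"),   # high 4 bits
            "type": STT_NAMES.get(st_info & 0xf, "?"),  # low 4 bits
            "st_shndx": st_shndx,                       # 0 means undefined
            "st_value": st_value, "st_size": st_size}

# Hypothetical table: null symbol, then an imported (undefined) function.
table = SYM64.pack(0, 0, 0, 0, 0, 0) + SYM64.pack(1, (1 << 4) | 2, 0, 0, 0, 0)
sym = parse_sym64(table, 1)
print(sym["bind"], sym["type"], sym["st_shndx"])
```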
The .dynstr section
The dynstr section contains the string table for dynamic symbols that has the name
of each symbol in a series of null terminated strings.
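A string table lookup is just "read from the given offset up to the next NUL byte". A minimal sketch, with an illustrative function name and a made-up table:

```python
def strtab_lookup(strtab: bytes, offset: int) -> str:
    """Return the null-terminated string starting at `offset`."""
    end = strtab.index(b"\0", offset)
    return strtab[offset:end].decode("ascii")

# Made-up string table; by convention the first byte is NUL,
# so offset 0 is the empty name.
strtab = b"\0printf\0malloc\0"
print(strtab_lookup(strtab, 1), strtab_lookup(strtab, 8))
```

Because names are referenced by offset, an offset may also point into the middle of a stored string, which lets strings share a common suffix.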
The .rel.* section
Relocation sections contain information about how parts of an ELF object or process image need to be fixed up or modified at linking or runtime. Relocations are discussed further in the ELF Relocations section of [Learning Linux Binary Analysis]. Relocation sections are marked as type SHT_REL since they contain relocation data, or as SHT_RELA when the entries carry explicit addends, as in the .rela.dyn and .rela.plt sections of the envoy output above.
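As an illustration of what one relocation record holds, the sketch below decodes a 64-bit entry with an explicit addend (Elf64_Rela). The r_info field packs the symbol index in its upper 32 bits and the relocation type in its lower 32 bits. The names RELA64 and parse_rela64 are illustrative, the sample values are hypothetical, and only a few x86-64 type numbers are listed.

```python
import struct

# Elf64_Rela, little endian: r_offset, r_info, signed r_addend.
RELA64 = struct.Struct("<QQq")
R_X86_64_NAMES = {6: "R_X86_64_GLOB_DAT", 7: "R_X86_64_JUMP_SLOT",
                  8: "R_X86_64_RELATIVE"}

def parse_rela64(table: bytes, index: int = 0) -> dict:
    """Decode relocation number `index` from a raw .rela.* section."""
    r_offset, r_info, r_addend = RELA64.unpack_from(table, index * RELA64.size)
    r_type = r_info & 0xffffffff
    return {"r_offset": r_offset,      # where to apply the fix-up
            "sym": r_info >> 32,       # index into the symbol table
            "type": R_X86_64_NAMES.get(r_type, str(r_type)),
            "r_addend": r_addend}

# Hypothetical entry: patch the GOT slot at 0x4018 for symbol number 3.
table = RELA64.pack(0x4018, (3 << 32) | 7, 0)
rel = parse_rela64(table)
print(rel)
```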
The .hash section
The hash section contains a hash table for symbol lookup. Many modern binaries carry the GNU-style variant, .gnu.hash, instead of or in addition to it, as in the listings above.
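The two table formats hash symbol names differently. The sketch below shows both hash functions as they are commonly documented: the classic System V ELF hash used by .hash, and the multiply-by-33 hash used by .gnu.hash. It covers only the hash functions, not the bucket and chain lookup, and the function names are illustrative.

```python
def elf_hash(name: bytes) -> int:
    """Classic System V ELF symbol hash (used by .hash)."""
    h = 0
    for c in name:
        h = ((h << 4) + c) & 0xffffffff
        g = h & 0xf0000000
        if g:
            h ^= g >> 24
        h &= 0x0fffffff  # clear the top four bits
    return h

def gnu_hash(name: bytes) -> int:
    """GNU symbol hash (used by .gnu.hash), truncated to 32 bits."""
    h = 5381
    for c in name:
        h = (h * 33 + c) & 0xffffffff
    return h

print(hex(elf_hash(b"printf")), hex(gnu_hash(b"printf")))
```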
The text segment will contain the following sections:
• [.text]: This is the program code
• [.rodata]: This is read-only data
• [.hash]: This is the symbol hash table
• [.dynsym]: This is the shared object symbol data
• [.dynstr]: This is the shared object symbol name
• [.plt]: This is the procedure linkage table
• [.rel.got]: This is the GOT relocation data
The data segment will contain the following sections:
• [.data]: These are the globally initialized variables
• [.dynamic]: These are the dynamic linking structures and objects
• [.got.plt]: This is the global offset table
• [.bss]: These are the globally uninitialized variables
References
- https://greek0.net/elf.html
- book - [Computer Systems - A Programmer’s Perspective]
- book - [Learning Linux Binary Analysis]
- https://www.ics.uci.edu/~aburtsev/238P/hw/hw3-elf/hw3-elf.html