用php讀取elf結構

xing393939發表於2020-12-18

前提知識

  1. UNIX系統的可執行檔案都採用ELF格式,型別分為目標檔案、可執行檔案和共享庫
  2. ELF格式探析之三:sections
  3. The ELF file format
  4. 本例基於64位的小端序linux機器
  5. php的unpack函式第一個引數的說明
    • v:格式化為unsigned short(小端序)
    • V:格式化為unsigned long(小端序)
    • P:格式化為unsigned long long(小端序)

生成目標檔案和可執行檔案

#include <stdio.h>
void say_hello(char *who) {
    printf("hello, %s!\n", who);
}
char *my_name = "wb";
int main() {
    say_hello(my_name);
    return 0;
}
// 生成目標檔案hello.o:gcc -c hello.c
// 生成可執行檔案hello:gcc -no-pie -o hello hello.c

目標檔案elf的結構

  1. ELF header,位於檔案的0~64位元組,儲存檔案的描述資訊,Section header table的起始位置
  2. N個Section,Section的種類有:
    1. SHT_STRTAB有3種:.shstrtab表示section name;.strtab表示symbol table條目的符號名稱;.dynstr表示dynamic symbol table條目的符號名稱
    2. SHT_SYMTAB只有.symtab,標識所有在elf檔案內的符號
    3. SHT_DYNSYM只有.dynsym,標識所有imported或者exported的符號
  3. Section header table,目標檔案需要,每個條目64位元組,對應一個Section的資訊。沒有strip過的可執行檔案都含有此資訊
  4. Program header table,可執行檔案需要
  5. 舉例分析目標檔案hello.o:
    1. ELF header佔用64位元組
    2. N個Section佔用6488-64-1472=4952位元組
    3. Section header table佔用23*64=1472位元組

ELF header、Section header、Program header、Symbols

// ELF header
typedef struct {
  unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
  Elf64_Half e_type;      /* Object file type */
  Elf64_Half e_machine;   /* Architecture */
  Elf64_Word e_version;   /* Object file version */
  Elf64_Addr e_entry;     /* Entry point virtual address */
  Elf64_Off e_phoff;      /* Program header table file offset */
  Elf64_Off e_shoff;      /* Section header table file offset */
  Elf64_Word e_flags;     /* Processor-specific flags */
  Elf64_Half e_ehsize;    /* ELF header size in bytes */
  Elf64_Half e_phentsize; /* Program header table entry size */
  Elf64_Half e_phnum;     /* Program header table entry count */
  Elf64_Half e_shentsize; /* Section header table entry size */
  Elf64_Half e_shnum;     /* Section header table entry count */
  Elf64_Half e_shstrndx;  /* Section header string table index */
} Elf64_Ehdr;

// Section header table
typedef struct {
  Elf64_Word sh_name;       /* Section name (string tbl index) */
  Elf64_Word sh_type;       /* Section type */
  Elf64_Xword sh_flags;     /* Section flags */
  Elf64_Addr sh_addr;       /* Section virtual addr at execution */
  Elf64_Off sh_offset;      /* Section file offset */
  Elf64_Xword sh_size;      /* Section size in bytes */
  Elf64_Word sh_link;       /* Link to another section */
  Elf64_Word sh_info;       /* Additional section information */
  Elf64_Xword sh_addralign; /* Section alignment */
  Elf64_Xword sh_entsize;   /* Entry size if section holds table */
} Elf64_Shdr;

// Program header table
typedef struct {
  Elf64_Word p_type;    /* Segment type */
  Elf64_Word p_flags;   /* Segment flags */
  Elf64_Off p_offset;   /* Segment file offset */
  Elf64_Addr p_vaddr;   /* Segment virtual address */
  Elf64_Addr p_paddr;   /* Segment physical address */
  Elf64_Xword p_filesz; /* Segment size in file */
  Elf64_Xword p_memsz;  /* Segment size in memory */
  Elf64_Xword p_align;  /* Segment alignment */
} Elf64_Phdr;

// Symbol table
typedef struct {
  Elf64_Word st_name;     /* Symbol name (string table index) */
  unsigned char st_info;  /* Symbol type and binding */
  unsigned char st_other; /* Symbol visibility */
  Elf64_Section st_shndx; /* Section header table index, except ABS/UNDEF/COMMON */
  Elf64_Addr st_value;    /* Symbol address(目標檔案是elf檔案的偏移位置;可執行檔案是absolute run-time address) */
  Elf64_Xword st_size;    /* Symbol size(單位是位元組) */
} Elf64_Sym;

$ readelf -h hello.o
ELF Header:
  Class:                             ELF64
  Data:                              2's complement, little endian
  OS/ABI:                            UNIX - System V
  Type:                              REL (Relocatable file)
  Machine:                           Advanced Micro Devices X86-64
  Start of program headers:          0 (bytes into file)
  Start of section headers:          5016 (bytes into file) //Section header table的起始位置
  Size of this header:               64 (bytes) //ELF header的佔用大小
  Size of program headers:           0 (bytes)  //hello.o沒有program header table
  Number of program headers:         0          //hello.o沒有program header table
  Size of section headers:           64 (bytes) //Section header table每個條目佔用大小
  Number of section headers:         23         //Section header table條目個數
  Section header string table index: 22         //.shstrtab Section位於Section header table第22個條目

$ readelf -S -W hello.o
Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 000041 00  AX  0   0  1
  [ 2] .rela.text        RELA            0000000000000000 0002c8 000060 18   I 12   1  8
  [ 3] .data             PROGBITS        0000000000000000 000081 000000 00  WA  0   0  1
  [ 4] .bss              NOBITS          0000000000000000 000081 000000 00  WA  0   0  1
  [ 5] .rodata           PROGBITS        0000000000000000 000081 00000f 00   A  0   0  1
  [ 6] .data.rel.local   PROGBITS        0000000000000000 000090 000008 00  WA  0   0  8
  [ 7] .rela.data.rel.local RELA         0000000000000000 000328 000018 18   I 12   6  8
  [ 8] .comment          PROGBITS        0000000000000000 000098 00002a 01  MS  0   0  1
  [ 9] .note.GNU-stack   PROGBITS        0000000000000000 0000c2 000000 00      0   0  1
  [10] .eh_frame         PROGBITS        0000000000000000 0000c8 000058 00   A  0   0  8
  [11] .rela.eh_frame    RELA            0000000000000000 000340 000030 18   I 12  10  8
  [12] .symtab           SYMTAB          0000000000000000 000120 000168 18     13  10  8
  [13] .strtab           STRTAB          0000000000000000 000288 00003d 00      0   0  1
  [14] .shstrtab         STRTAB          0000000000000000 000370 000076 00      0   0  1

$ readelf -l -W hello 或者 objdump -p hello
Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000000040 0x0000000000000040 0x0001f8 0x0001f8 R   0x8
  INTERP         0x000238 0x0000000000000238 0x0000000000000238 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x0008a0 0x0008a0 R E 0x200000
  LOAD           0x000db8 0x0000000000200db8 0x0000000000200db8 0x000260 0x000268 RW  0x200000
  DYNAMIC        0x000dc8 0x0000000000200dc8 0x0000000000200dc8 0x0001f0 0x0001f0 RW  0x8
  NOTE           0x000254 0x0000000000000254 0x0000000000000254 0x000044 0x000044 R   0x4
  GNU_EH_FRAME   0x000734 0x0000000000000734 0x0000000000000734 0x000044 0x000044 R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0x000db8 0x0000000000200db8 0x0000000000200db8 0x000248 0x000248 R   0x1

$ readelf --symbols -W hello.o
Symbol table '.symtab' contains 15 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS hello.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6 
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    9 
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT   10 
     9: 0000000000000000     0 SECTION LOCAL  DEFAULT    8 
    10: 0000000000000000    39 FUNC    GLOBAL DEFAULT    1 say_hello
    11: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _GLOBAL_OFFSET_TABLE_
    12: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
    13: 0000000000000000     8 OBJECT  GLOBAL DEFAULT    6 my_name
    14: 0000000000000027    26 FUNC    GLOBAL DEFAULT    1 main

用php讀取elf結構

<?php
$fp = fopen("hello.o", "rb");

fseek($fp, 32, SEEK_SET);
$ph_off = uint($fp, 8);
print_ln("program header offset in file: " . $ph_off);
$sh_off = uint($fp, 8);
print_ln("section header offset in file: " . $sh_off);

fseek($fp, 6, SEEK_CUR);
$ph_ent_size = uint($fp, 2);
print_ln("program header entry size: " . $ph_ent_size);
$ph_num = uint($fp, 2);
print_ln("program header number: " . $ph_num);
$sh_ent_size = uint($fp, 2);
print_ln("section header entry size: " . $sh_ent_size);
$sh_num = uint($fp, 2);
print_ln("section header number: " . $sh_num);
$sh_strtab_index = uint($fp, 2);
print_ln("section header string table index: " . $sh_strtab_index);

// 讀.shstrtab Section的offset和size
fseek($fp, $sh_off + $sh_strtab_index * $sh_ent_size, SEEK_SET);
fseek($fp, 24, SEEK_CUR); //sh_name(4) + sh_type(4) + sh_flags(8) + sh_addr(8) = 24
$str_table_off = uint($fp, 8);
print_ln("section name string table offset: " . $str_table_off);
$str_table_size = uint($fp, 8);
print_ln("section name string table size: " . $str_table_size);

// 讀取.shstrtab
fseek($fp, $str_table_off, SEEK_SET);
$shstrtab = fread($fp, $str_table_size);

// 讀取所有Section條目資訊
$sh_table = [];
for ($i = 0; $i < $sh_num; $i++) {
    fseek($fp, $sh_off + $i * $sh_ent_size, SEEK_SET);
    $sh_name = uint($fp, 4);
    fseek($fp, 20, SEEK_CUR); //sh_type(4) + sh_flags(8) + sh_addr(8) = 20
    $sh_offset = uint($fp, 8);
    $sh_size = uint($fp, 8);

    $name = get_string_name($shstrtab, $sh_name);
    $sh_table[$name] = [$sh_offset, $sh_size];
    printf("section: %2s name: %-24s offset: %12s size: %12s\n", $i, $name, $sh_offset, $sh_size);
}

// 讀取所有program條目資訊
for ($i = 0; $i < $ph_num; $i++) {
    fseek($fp, $ph_off + $i * $ph_ent_size, SEEK_SET);
    fseek($fp, 16, SEEK_CUR);
    $vAddr = uint($fp, 8);
    $pAddr = uint($fp, 8);
    $fileSize = uint($fp, 8);
    $memSize = uint($fp, 8);
    $align = uint($fp, 8);

    printf("program: %2s vAddr: %12s pAddr: %12s fileSize: %12s memSize: %12s align: %12s\n",
        $i, $vAddr, $pAddr, $fileSize, $memSize, $align);
}

// 讀取.strtab
fseek($fp, $sh_table['.strtab'][0], SEEK_SET);
$strtab = fread($fp, $sh_table['.strtab'][1]);

// 讀取.symtab所有條目
for ($i = 0; $i < $sh_table['.symtab'][1] / 24; $i++) {
    fseek($fp, $sh_table['.symtab'][0] + $i * 24, SEEK_SET);
    $sh_name = uint($fp, 4);
    fseek($fp, 2, SEEK_CUR);
    $ndx = uint($fp, 2);
    $value = uint($fp, 8);
    $size = uint($fp, 8);

    $ndxArr = [0 => 'UND', 65521 => 'ABS', 65522 => 'COM'];
    $ndx = isset($ndxArr[$ndx]) ? $ndxArr[$ndx] : $ndx;
    $name = get_string_name($strtab, $sh_name);
    printf("symtab: %2s value: %12s size: %12s ndx: %3s name: %-24s\n", $i, $value, $size, $ndx, $name);
}

// 讀取.dynstr
$dynstr = "";
if (isset($sh_table['.dynstr'])) {
    fseek($fp, $sh_table['.dynstr'][0], SEEK_SET);
    $dynstr = fread($fp, $sh_table['.dynstr'][1]);
}

// 讀取.dynsym所有條目
if (isset($sh_table['.dynsym'])) for ($i = 0; $i < $sh_table['.dynsym'][1] / 24; $i++) {
    fseek($fp, $sh_table['.dynsym'][0] + $i * 24, SEEK_SET);
    $sh_name = uint($fp, 4);
    fseek($fp, 2, SEEK_CUR);
    $ndx = uint($fp, 2);
    $value = uint($fp, 8);
    $size = uint($fp, 8);

    $ndxArr = [0 => 'UND', 65521 => 'ABS', 65522 => 'COM'];
    $ndx = isset($ndxArr[$ndx]) ? $ndxArr[$ndx] : $ndx;
    $name = get_string_name($dynstr, $sh_name);
    printf("dynsym: %2s value: %12s size: %12s ndx: %3s name: %-24s\n", $i, $value, $size, $ndx, $name);
}

function uint($fp, $size)
{
    $sizeFlags = [2 => 'v', 4 => 'V', 8 => 'P'];
    $value = fread($fp, $size);
    $value = unpack($sizeFlags[$size], $value);
    return $value[1];
}

function get_string_name($str, $start)
{
    $name = substr($str, $start);
    return strstr($name, "\x00", true);
}

function print_ln($msg)
{
    print_r($msg . "\n");
}
本作品採用《CC 協議》,轉載必須註明作者和本文連結

相關文章