Computer Architecture
Assembly
Verbs/Operations, Assembly Dialects, AT&T created a terrible assembly syntax
Data
bit: on or off
[A-Z]: 0x40 + offset (A = 0x41)
[a-z]: 0x60 + offset (a = 0x61)
digits: 0x30 + digit
lower than 0x20: control characters
0x09: tab, 0x0a: newline, 0x07: bell, 0x20: space
0x80 and above: UTF-8 multibyte sequences, extended characters
byte: 8 bits
word or half word: 16 bits
dword: 32 bits
qword: 64 bits
number overflow
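Negative numbers are represented in two's complement (invert every bit of the positive value, then add one):
unsigned: 0 - 1 = 255 = 0b11111111
signed: 0 - 1 = -1 = 0b11111111; inverting the bits of 1 (0b00000001) gives 0b11111110, and adding one gives 0b11111111, i.e. -1
The smallest signed byte therefore becomes -128: the pattern 0b10000000, which would be a redundant "negative zero" in sign-magnitude form, encodes -128 instead
most significant bits/bytes = leftmost bits/bytes = high bits/bytes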
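The two's-complement rules above can be checked with a small Python model. This is only a sketch: the helper names (to_unsigned, to_signed, negate) are made up for illustration and the width defaults to one byte.

```python
def to_unsigned(value: int, bits: int = 8) -> int:
    """Keep only the low `bits` bits, as a fixed-width register would."""
    return value & ((1 << bits) - 1)


def to_signed(value: int, bits: int = 8) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value = to_unsigned(value, bits)
    if value >> (bits - 1):          # sign bit set: the value is negative
        return value - (1 << bits)
    return value


def negate(value: int, bits: int = 8) -> int:
    """Two's-complement negation: invert every bit, then add one."""
    mask = (1 << bits) - 1
    return ((value ^ mask) + 1) & mask


if __name__ == "__main__":
    print(bin(to_unsigned(0 - 1)))   # the bit pattern of 0 - 1 in one byte
    print(to_signed(0b11111111))     # the same pattern read as signed
    print(to_signed(0b10000000))     # the smallest signed byte
```

The same bit pattern is produced for both interpretations; only the reading of the top bit differs.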
Registers
general registers: amd64: rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi, r8...r15
stack frame base (bottom of the current frame): EBP/RBP
stack pointer: ESP/RSP
next instruction register: eip(x86), rip(amd64), r15(arm)
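Others: registers for kernel only; registers for floating point computation, registers for crunching large data fast
Register size: mostly 8 bytes (64 bits) on current machines
Partial register access: the low 32 bits of rax are eax, the low 16 bits of eax are ax, the high 8 bits of ax are ah, and the low 8 bits are al
mov rax, 0x539
32-bit CAVEAT
Writing the low 32 bits of a register (e.g. eax) also sets the upper 32 bits to 0.
Note: writes to memory do not have this problem, and neither do writes to 16-bit registers.
mov rax, 0xffffffffffffffff
mov eax, 0x1
The result is rax = 0x1, not 0xffffffff00000001.
Likewise,
mov eax, -1
leaves rax = 0x00000000ffffffff (4294967295), not -1.
Use Move With Sign-Extension instead (the 32-to-64-bit form is spelled movsxd):
mov eax, -1
movsxd rax, eax
This copies the sign bit of eax into the upper 32 bits of rax, so rax becomes 0xffffffffffffffff (-1).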
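The 32-bit caveat can be modelled in Python to make the three cases easy to compare. This is a sketch under the assumption that a register is just a 64-bit integer; write_eax, write_ax and sign_extend_32_to_64 are hypothetical helper names, not real instructions or library calls.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def write_eax(rax: int, value: int) -> int:
    """Model a write to eax: the upper 32 bits of rax are cleared."""
    return value & MASK32


def write_ax(rax: int, value: int) -> int:
    """Model a write to ax: only the low 16 bits change, the rest is kept."""
    return (rax & MASK64 & ~0xFFFF) | (value & 0xFFFF)


def sign_extend_32_to_64(eax: int) -> int:
    """Model a 32-to-64-bit sign extension: copy bit 31 into bits 32..63."""
    eax &= MASK32
    if eax & 0x80000000:
        return eax | (MASK64 ^ MASK32)
    return eax


if __name__ == "__main__":
    rax = MASK64                                  # rax = 0xffffffffffffffff
    print(hex(write_eax(rax, 0x1)))               # upper half is zeroed
    print(hex(write_ax(rax, 0x1)))                # upper 48 bits survive
    print(hex(sign_extend_32_to_64(write_eax(0, -1))))
```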
instruction
add rax, rbx
sub ebx, ecx
imul rsi, rdi
inc rdx
dec rdx
neg rax
not rax
and rax, rbx
or rax, rbx
xor rcx, rdx
shl rax, 10
shr rax, 10
sar rax, 10
ror rax, 10
rol rax, 10
Debugging: https://github.com/yrp604/rappel
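The difference between the shift and rotate instructions listed above (shl, shr, sar, rol, ror) is easiest to see on concrete bit patterns. The Python functions below are an illustrative model for 64-bit operands only; they do not model the flags, and the function names simply mirror the mnemonics.

```python
MASK64 = (1 << 64) - 1


def shl(x: int, n: int) -> int:
    """Logical shift left: bits shifted past bit 63 are lost."""
    return (x << (n & 63)) & MASK64


def shr(x: int, n: int) -> int:
    """Logical shift right: zeros are shifted in from the left."""
    return (x & MASK64) >> (n & 63)


def sar(x: int, n: int) -> int:
    """Arithmetic shift right: copies of the sign bit are shifted in."""
    x &= MASK64
    if x >> 63:                  # reinterpret as a negative Python int
        x -= 1 << 64
    return (x >> (n & 63)) & MASK64


def rol(x: int, n: int) -> int:
    """Rotate left: bits leaving on the left re-enter on the right."""
    n &= 63
    x &= MASK64
    return ((x << n) | (x >> (64 - n))) & MASK64


def ror(x: int, n: int) -> int:
    """Rotate right: bits leaving on the right re-enter on the left."""
    n &= 63
    x &= MASK64
    return ((x >> n) | (x << (64 - n))) & MASK64


if __name__ == "__main__":
    minus_one = MASK64
    print(hex(shr(minus_one, 10)))   # top 10 bits become zero
    print(hex(sar(minus_one, 10)))   # still all ones, i.e. still -1
```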
Memory
virtual Memory
Relative addresses (low to high):
0x1000
Program Binary Code
Dynamic Allocated Memory(Managed by libraries)
Dynamic Mapped Memory(required by process)
Library Code
Process Stack
OS Helper Regions
0x7fffffffffff
the heap starts at lower addresses and grows upward
the stack starts at higher addresses and grows downward (push decreases rsp)
push rax
push 0xb0bacafe # even on 64-bit x86, you can only push 32-bit immediates; the value is sign-extended to 64 bits (here 0xffffffffb0bacafe)
pop rbx
pop rcx
mov rbx, [rax]
mov [rax], rbx
lea rbx, [rsp+rax*8]
mov DWORD PTR [rax], 0x1337
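The push/pop behaviour above (rsp moves down on push, up on pop, and a pushed 32-bit immediate is sign-extended) can be sketched with a toy stack in Python. The Stack class, its default addresses and sign_extend_imm32 are assumptions made purely for illustration.

```python
import struct

MASK64 = (1 << 64) - 1


def sign_extend_imm32(imm: int) -> int:
    """A 32-bit immediate is sign-extended to 64 bits before it is pushed."""
    imm &= 0xFFFFFFFF
    return (imm | 0xFFFFFFFF00000000) if imm & 0x80000000 else imm


class Stack:
    """Toy model of a downward-growing stack (addresses are arbitrary)."""

    def __init__(self, top: int = 0x7FFFFFFFF000, size: int = 0x100):
        self.base = top - size          # lowest address backed by memory
        self.mem = bytearray(size)
        self.rsp = top                  # empty stack: rsp points at the top

    def push(self, value: int) -> None:
        self.rsp -= 8                   # the stack grows toward lower addresses
        offset = self.rsp - self.base
        self.mem[offset:offset + 8] = struct.pack("<Q", value & MASK64)

    def pop(self) -> int:
        offset = self.rsp - self.base
        (value,) = struct.unpack_from("<Q", self.mem, offset)
        self.rsp += 8
        return value


if __name__ == "__main__":
    stack = Stack()
    stack.push(sign_extend_imm32(0xB0BACAFE))
    print(hex(stack.rsp), hex(stack.pop()))
```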
little endian:
Address: 0x1000 0x1001 0x1002 0x1003
Value: 0x78 0x56 0x34 0x12
big endian
Address: 0x1000 0x1001 0x1002 0x1003
Value: 0x12 0x34 0x56 0x78
Most current systems store multi-byte values in memory in little-endian order; note that this applies to memory only
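The two layouts above correspond to the 32-bit value 0x12345678 stored at address 0x1000. A short Python sketch reproduces them with int.to_bytes; memory_layout is a hypothetical helper and the base address is just the one used in the tables.

```python
def memory_layout(value: int, size: int, byteorder: str, base: int = 0x1000) -> dict:
    """Return {address: byte} for `value` stored at `base` in the given byte order."""
    raw = value.to_bytes(size, byteorder)
    return {base + i: byte for i, byte in enumerate(raw)}


if __name__ == "__main__":
    for order in ("little", "big"):
        layout = memory_layout(0x12345678, 4, order)
        cells = " ".join(f"{addr:#x}:{byte:#04x}" for addr, byte in layout.items())
        print(f"{order:>6}: {cells}")
```

Reading the bytes back with int.from_bytes and the same byte order returns the original value, which is why the order only matters once data is in memory.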
RIP-relative addressing
specify size
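mov DWORD PTR [rax], 0x1337
mov [rax], 0x1337 ; size not specified
If the size is not given, what happens depends on the assembler: some fall back on a default size for the architecture, others (NASM, for example) reject the instruction as ambiguous, so always state the size explicitly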
Control Flow
jump: in machine code a jump means "skip x bytes": jnz STAY_LEET ; encodes as 75 04, skip 4 bytes (eb 04 is the unconditional jmp short)
je, jne, jg, jl, jle, jge, ja, jb, jae, jbe, js, jns, jo, jno, jz, jnz
check condition in flags register: rflags
flags are updated by most arithmetic instructions and, for comparisons, by cmp (sub, but discards the result) and test (and, but discards the result)
main conditional flags:
carry flag: was the 65th bit 1 (did the unsigned result carry or borrow out of 64 bits)?
zero flag: was the result 0?
overflow flag: did the result wrap between positive and negative?
sign flag: was the sign bit of the result set?
e.g.:
mov rax, 0
LOOP_HEADER:
inc rax
cmp rax, 10
jb LOOP_HEADER
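To see how cmp sets the flags and how jb uses them, the loop above can be replayed in Python. This is a sketch only: cmp_flags models the four main flags for a subtraction, and count_loop is a hypothetical stand-in for the assembly loop.

```python
def cmp_flags(a: int, b: int, bits: int = 64) -> dict:
    """Model `cmp a, b`: compute a - b, keep only the flags."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "CF": a < b,                                  # unsigned borrow
        "ZF": result == 0,
        "SF": bool(result & sign),
        # signed overflow: operands differ in sign and the result's sign
        # differs from the first operand's sign
        "OF": bool((a ^ b) & (a ^ result) & sign),
    }


def count_loop(limit: int = 10) -> tuple:
    """Replay: mov rax, 0 / LOOP_HEADER: inc rax / cmp rax, limit / jb LOOP_HEADER."""
    rax = 0
    iterations = 0
    while True:
        rax += 1                                      # inc rax
        iterations += 1
        if not cmp_flags(rax, limit)["CF"]:           # jb jumps only while CF is set
            break
    return rax, iterations


if __name__ == "__main__":
    print(count_loop())
    print(cmp_flags(0, 1))
```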
call
caller and callee must agree on argument passing. Is the stack no longer used? On amd64 only for excess arguments, once the argument registers are used up.
Linux x86: push arguments in reverse order, then call, return value in eax
Linux amd64: rdi, rsi, rdx, rcx, r8, r9, return value in rax
rbx, rbp, r12, r13, r14, r15 are callee-saved: a function that uses them saves their original values to the stack at its beginning (prologue) and restores them before returning (epilogue).
Argument type                    Registers
Integer/pointer arguments 1-6    RDI, RSI, RDX, RCX, R8, R9
Floating point arguments 1-8     XMM0 - XMM7
Excess arguments                 Stack
Static chain pointer             R10
Argument register overview
From <https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI>
linux arm: r0, r1, r2, r3, return in r0
System Calls
syscall: system call number in rax, arguments (in order, not reversed) in rdi, rsi, rdx, r10, r8, r9, return value in rax
constant arguments: best looked up with C (for example by printing the constant from the system headers)
exit:
mov rdi, 42
mov rax, 60
syscall
Building Programs
# .intel_syntax tells the assembler that we are using Intel syntax
# noprefix tells it that we will not prefix all the register name with %
.intel_syntax noprefix
.global _start
_start:
mov rdi, 42
mov rax, 60
syscall
Compile
gcc -nostdlib -o quitter quitter.s
disassemble
objdump -M intel -d quitter
only binary code
objcopy --dump-section .text=quitter_binary_code quitter
debugging:
gdb, strace, rappel
https://github.com/zardus/ctf-tools
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf
https://www.felixcloutier.com/x86/
http://ref.x86asm.net/coder64.html
Further Reading
An awesome intro series that covers some of the fundamentals from LiveOverflow.
Ike: The Systems Hacking Handbook, an excellent guide to Computer Organization.
A comprehensive assembly tutorial for several architectures (amd64 is the relevant one here).
The course "Architecture 1001: x86-64 Assembly" from OpenSecurityTraining2.
A whole x86_64 assembly book to help you out!
A game to teach you x86 assembly and one to stress test your knowledge!
A flowchart of x86 prefix and escape opcodes.
An unofficial, but extremely detailed and useful x86 reference.
practice
as -o asm.o asm.S
objcopy -O binary --only-section=.text asm.o asm.bin
cat ./asm.bin | /challenge/run
used operation: mov, add, mul, imul, div, shr, shl, and, or, xor, push, pop
level4:
Note: there is an important difference between mul (unsigned multiply) and imul (signed multiply) in terms of which registers are used.
mul cannot take an immediate operand directly; load the value into a register first
level5:
Note: div is a special instruction that can divide a 128-bit dividend by a 64-bit divisor, while storing both the quotient and the remainder, using only one register as an operand.
How does this complex div instruction work and operate on a 128-bit dividend (which is twice as large as a register)?
For the instruction: div reg, the following happens:
rax = rdx:rax / reg
rdx = remainder
DIV r/m64 Unsigned divide RDX:RAX by r/m64, with result stored in RAX := Quotient, RDX := Remainder.
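The 128-bit division described above can be modelled in Python, whose integers are not limited to 64 bits. This is a sketch: div64 is a hypothetical name, and the two error cases stand in for the divide error the CPU raises.

```python
MASK64 = (1 << 64) - 1


def div64(rdx: int, rax: int, divisor: int) -> tuple:
    """Model `div reg`: divide rdx:rax by reg, return (new rax, new rdx)."""
    divisor &= MASK64
    if divisor == 0:
        raise ZeroDivisionError("division by zero raises a divide error")
    dividend = ((rdx & MASK64) << 64) | (rax & MASK64)   # rdx is the high half
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK64:
        raise OverflowError("quotient does not fit in rax (divide error)")
    return quotient, remainder


if __name__ == "__main__":
    # With rdx = 0 this is an ordinary 64-bit division.
    print(div64(0, 100, 7))
    # rdx = 1 makes the dividend 2**64, which no single register can hold.
    print(div64(1, 0, 2))
```

The overflow case is why rdx is normally zeroed before an unsigned divide of a 64-bit value: leftover bits in rdx silently become the high half of the dividend.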
level8:
We can use a math trick to optimize the modulo operator (%). Compilers use this trick a lot.
If we have "x % y", and y is a power of 2, such as 2^n, the result will be the lower n bits of x.
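The modulo trick can be verified exhaustively for small values in Python. mod_pow2 is a hypothetical helper name; the check assumes non-negative x.

```python
def mod_pow2(x: int, n: int) -> int:
    """x % 2**n for non-negative x: keep only the low n bits."""
    return x & ((1 << n) - 1)


if __name__ == "__main__":
    # Compare the mask against the real modulo operator.
    for n in range(0, 9):
        for x in range(0, 1024):
            assert mod_pow2(x, n) == x % (1 << n)
    # x % 256 is the low byte, which is what reading an 8-bit
    # sub-register such as al gives without any division.
    print(hex(mod_pow2(0x1234, 8)))
```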
In x86 assembly language, several registers have smaller sub-registers, which are accessible for more granular operations. Here's a list of common registers with their corresponding sub-registers:
AX, AH, AL:
AX is the 16-bit register, which can be accessed as a whole.
AH is the high 8 bits (upper byte) of AX.
AL is the low 8 bits (lower byte) of AX.
BX, BH, BL:
Similar to AX, BX is the 16-bit register, BH represents its high 8 bits, and BL represents its low 8 bits.
CX, CH, CL:
Similar to AX and BX, CX is the 16-bit register, CH represents its high 8 bits, and CL represents its low 8 bits.
DX, DH, DL:
Similar to AX, BX, and CX, DX is the 16-bit register, DH represents its high 8 bits, and DL represents its low 8 bits.
SI, DI:
SI and DI are 16-bit index registers typically used for string operations.
BP, SP:
BP and SP are 16-bit base and stack pointer registers, respectively.
EAX, EBX, ECX, EDX:
These are the 32-bit versions of AX, BX, CX, and DX. Their low 16 bits are accessible as AX, BX, CX, and DX, which in turn split into the 8-bit sub-registers (AH, AL, BH, BL, CH, CL, DH, DL); the upper 16 bits have no name of their own.
ESI, EDI, EBP, ESP:
These are the 32-bit versions of SI, DI, BP, and SP.
RAX, RBX, RCX, RDX:
These are the 64-bit versions of EAX, EBX, ECX, and EDX. Their low 32 bits are accessible as EAX, EBX, ECX, and EDX; the upper 32 bits have no name of their own.
RDI, RSI, RBP, RSP:
These are the 64-bit versions of EDI, ESI, EBP, and ESP.
level9:
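1. With mov rax, rdi; mov eax, 0
I forgot the 32-bit CAVEAT: this does not work, because writing eax also clears the upper 32 bits of rax
2. Keep left and right straight: shr makes the value smaller, shl makes it larger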
level11:
Don't forget to zero eax
level15:
1. The two operands of mov must have matching sizes
2. `byte ptr [0x1234]`
level17
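Although the documentation allows imm32, memory addressed through a register needs an explicit size: dword ptr [rdi]
Remember to count the length of the immediate: a qword (64-bit) immediate is only supported by mov reg, imm64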
level21:
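While writing this I kept thinking that the stack grows from high to low addresses, so I kept using [rsp-8]; I forgot that rsp already is the top of the stack, i.e. the lowest address in use
Although the documentation lists push r/m32, push dword ptr[rsp] -4
does not work (that form is not encodable in 64-bit mode), whereas push qword ptr[rsp]-4
does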
level23:
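1. The task is not only the relative jump; the logic after the jump must be written as well
2. jmp short $+0x53
jumps to the address of this instruction (the jmp itself) plus 0x53 bytes, whereas the task asks to jump that distance from the end of the jmp, so the size of the jmp itself (2 bytes) has to be added
3. mov rax, 0
is 7 bytes and mov eax, 0
is 5 bytes; the two do not differ by 4 bytes
4. 0x53 is not 53 (0x53 = 83 in decimal)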
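The offset bookkeeping in points 2 and 4 is simple arithmetic, sketched here in Python. The helper names are hypothetical, the 2-byte size applies to the short form of jmp (opcode eb followed by an 8-bit displacement), and the value 0x40 in the demo is an arbitrary example, not the challenge's number.

```python
def short_jmp_bytes(skip: int) -> bytes:
    """Encode a short jmp that skips `skip` bytes counted from the END of the jmp."""
    if not -128 <= skip <= 127:
        raise ValueError("a short jump only reaches -128..127 bytes")
    return bytes([0xEB, skip & 0xFF])


def offset_from_jmp_start(skip: int, jmp_size: int = 2) -> int:
    """Convert 'skip N bytes after the jmp' into an offset from the jmp's start."""
    return skip + jmp_size


if __name__ == "__main__":
    skip = 0x40                                   # example distance, hexadecimal
    print(short_jmp_bytes(skip).hex())            # raw encoding of the jump
    print(hex(offset_from_jmp_start(skip)))       # what "$ + N" would need to be
    print(0x53, 53)                               # hex and decimal literals differ
```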
level25:
Do not decide on your own to compute in 64-bit registers out of fear of overflow
level26
jump table
level29:
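Note that call has no absolute-immediate form, only r/m64 (absolute indirect). call [0x403000]
jumps to the address stored at [0x403000], not to 0x403000; call 0x403000
assembles to call rel32 and does not end up at 0x403000, so the address has to go into a register first: 'mov rax, 0x403000; call rax'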
level30:
A quick way to allocate stack space
most_common_byte:
push rbp
mov rbp, rsp
sub rsp, rsi ; rsi:the size
; [rbp-i]: the i-th byte
final:
mov rsp, rbp
pop rbp
ret