pwn.college Fundamentals Assembly Crash Course

Posted by 雪溯 on 2024-04-07

Computer Architecture

Assembly

Verbs/Operations, Assembly Dialects, AT&T created a terrible assembly syntax

Data

bit: on or off
[A-Z]: 0x40+offset
[a-z]: 0x60+offset
digits [0-9]: 0x30+digit
lower than 0x20: control characters
0x09: tab, 0x0a:newline, 0x07:bell, 0x20: space

0x80 and above: extended characters (e.g. UTF-8 multi-byte sequences)
byte: 8 bits
word (or half word): 16 bits
dword: 32 bits
qword: 64 bits
number overflow
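Negative numbers are represented in two's complement: to negate a value, invert all of its bits and add one.
unsigned: 0 - 1 = 255 = 0b11111111
signed: 0 - 1 = -1 = 0b11111111; here the ones' complement of 1 is 0b11111110 and the two's complement is 0b11111111, i.e. -1
The smallest signed byte therefore becomes -128, encoded as 0b10000000 (the pattern that would otherwise be a second, negative zero).
most significant bits/bytes = leftmost bits/bytes = high bits/bytes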
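
The wrap-around and two's-complement rules above can be checked with a small Python model. This is only a sketch: the helper names (`to_unsigned`, `to_signed`, `negate`) are made up for illustration, and an 8-bit register is assumed by default.

```python
def to_unsigned(value: int, bits: int = 8) -> int:
    """Wrap an integer into the unsigned range of a `bits`-wide register."""
    return value & ((1 << bits) - 1)


def to_signed(value: int, bits: int = 8) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value = to_unsigned(value, bits)
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def negate(value: int, bits: int = 8) -> int:
    """Two's-complement negation: invert every bit, then add one."""
    mask = (1 << bits) - 1
    return ((value ^ mask) + 1) & mask


byte = to_unsigned(0 - 1)                 # 0 - 1 wraps around in 8 bits
print(bin(byte), byte, to_signed(byte))   # 0b11111111 255 -1
print(bin(negate(1)))                     # 0b11111111
print(to_signed(0b10000000))              # -128
```

The same bit pattern 0b11111111 reads as 255 or as -1 depending only on whether it is interpreted as unsigned or signed.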

Registers

general registers: amd64: rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi, r8...r15
stack frame base (frame pointer): EBP/RBP
stack pointer: ESP/RSP
next instruction register: eip(x86), rip(amd64), r15(arm)
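others: registers for kernel use only; registers for floating point computation; registers for crunching large data fast
Register size: mostly 8 bytes (64 bits) on current systems
Partial register access: the low 32 bits of rax are eax, the low 16 bits of eax are ax, the high 8 bits of ax are ah, and the low 8 bits of ax are al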

mov rax, 0x539

32-bit CAVEAT
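Writing to the low 32 bits of a register (e.g. eax) also sets the upper 32 bits to 0.
Note: writes to memory do not have this effect, and neither do writes to 16-bit (or 8-bit) registers.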

mov rax, 0xffffffffffffffff
mov eax, 0x1

The result in rax is 0x1, not 0xffffffff00000001.

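mov eax, -1

This leaves rax as 0x00000000ffffffff (4294967295), not -1.
Use a move with sign-extension instead:

mov eax, -1
movsxd rax, eax # movsxd is the 32-to-64-bit form; movsx takes 8- and 16-bit sources

This copies the sign bit of eax into the upper 32 bits of rax, so rax becomes 0xffffffffffffffff (-1).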
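
A rough Python model of the caveat, as a sketch only: the function names are invented for illustration, and registers are modeled as plain integers masked to 64 bits.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def write_eax(rax: int, value: int) -> int:
    """Model `mov eax, value`: the old rax is ignored, upper 32 bits become 0."""
    return value & MASK32


def write_ax(rax: int, value: int) -> int:
    """Model `mov ax, value`: only the low 16 bits of rax change."""
    return (rax & ~0xFFFF & MASK64) | (value & 0xFFFF)


def sign_extend_32_to_64(value: int) -> int:
    """Model a 32-to-64-bit sign extension of the low 32 bits of `value`."""
    value &= MASK32
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value


rax = MASK64                            # mov rax, 0xffffffffffffffff
print(hex(write_eax(rax, 0x1)))         # 0x1
print(hex(write_ax(rax, 0x1)))          # 0xffffffffffff0001
rax = write_eax(rax, -1)                # mov eax, -1
print(hex(rax))                         # 0xffffffff
print(hex(sign_extend_32_to_64(rax)))   # 0xffffffffffffffff
```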

Instructions

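add rax, rbx    # rax = rax + rbx
sub ebx, ecx    # ebx = ebx - ecx
imul rsi, rdi   # rsi = rsi * rdi (signed, truncated to 64 bits)
inc rdx         # rdx = rdx + 1
dec rdx         # rdx = rdx - 1
neg rax         # rax = -rax (two's complement)
not rax         # rax = ~rax (bitwise inversion)
and rax, rbx    # rax = rax & rbx
or rax, rbx     # rax = rax | rbx
xor rcx, rdx    # rcx = rcx ^ rdx
shl rax, 10     # shift left, filling with 0
shr rax, 10     # logical shift right, filling with 0
sar rax, 10     # arithmetic shift right, filling with the sign bit
ror rax, 10     # rotate right
rol rax, 10     # rotate left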
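
The difference between shr, sar and the rotates is easiest to see on a value with the top bit set. The following Python sketch models them on 64-bit values; the function names mirror the mnemonics but are otherwise hypothetical, and the shift count is assumed to be between 0 and 63.

```python
BITS = 64
MASK = (1 << BITS) - 1


def shr(x: int, n: int) -> int:
    """Logical shift right: vacated high bits are filled with 0."""
    return (x & MASK) >> n


def sar(x: int, n: int) -> int:
    """Arithmetic shift right: vacated high bits copy the sign bit."""
    x &= MASK
    if x >> (BITS - 1):          # sign bit set: treat as a negative number
        x -= 1 << BITS
    return (x >> n) & MASK


def rol(x: int, n: int) -> int:
    """Rotate left: bits shifted out on the left re-enter on the right."""
    x &= MASK
    n %= BITS
    return ((x << n) | (x >> (BITS - n))) & MASK


def ror(x: int, n: int) -> int:
    """Rotate right: bits shifted out on the right re-enter on the left."""
    return rol(x, BITS - (n % BITS))


top = 0x8000000000000000
print(hex(shr(top, 4)))   # 0x800000000000000
print(hex(sar(top, 4)))   # 0xf800000000000000
print(hex(rol(top, 1)))   # 0x1
print(hex(ror(1, 1)))     # 0x8000000000000000
```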

Debugging: https://github.com/yrp604/rappel (an assembly REPL)

Memory

virtual Memory
Relative layout of the address space:

0x1000
Program Binary Code
Dynamic Allocated Memory(Managed by libraries)
Dynamic Mapped Memory(required by process)
Library Code
Process Stack
OS Helper Regions
0x7fffffffffff

The heap starts at lower addresses and grows upward.
The stack starts at higher addresses and grows downward (push decreases rsp).

push rax
push 0xb0bacafe # even on 64-bit x86 you can only push 32-bit immediates (sign-extended to 64 bits)
pop rbx
pop rcx
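
As a sketch of what push and pop do to rsp, here is a toy Python model. The `Stack` class, its starting rsp value and the sample register contents are all assumptions chosen for illustration; memory is just a dictionary keyed by address.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


class Stack:
    """Toy model of a 64-bit stack: rsp always points at the top (lowest) slot."""

    def __init__(self, rsp: int = 0x7FFFFFFFE000):   # arbitrary starting address
        self.rsp = rsp
        self.memory = {}

    def push(self, value: int) -> None:
        """push reg: decrement rsp by 8, then store the value at [rsp]."""
        self.rsp -= 8
        self.memory[self.rsp] = value & MASK64

    def push_imm32(self, imm: int) -> None:
        """push imm32: the 32-bit immediate is sign-extended to 64 bits."""
        imm &= 0xFFFFFFFF
        if imm & 0x80000000:
            imm |= 0xFFFFFFFF00000000
        self.push(imm)

    def pop(self) -> int:
        """pop reg: load the value at [rsp], then increment rsp by 8."""
        value = self.memory.pop(self.rsp)
        self.rsp += 8
        return value


stack = Stack()
start = stack.rsp
stack.push(0x539)               # push rax (assuming rax = 0x539)
stack.push_imm32(0xB0BACAFE)    # push 0xb0bacafe
print(hex(start - stack.rsp))   # 0x10
rbx = stack.pop()               # pop rbx
rcx = stack.pop()               # pop rcx
print(hex(rbx), hex(rcx))       # 0xffffffffb0bacafe 0x539
```

Values come back in the reverse order of the pushes, and rsp ends where it started.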

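mov rbx, [rax]              # load 8 bytes from the address in rax
mov [rax], rbx              # store rbx at the address in rax
lea rbx, [rsp+rax*8]        # compute the address only, no memory access
mov DWORD PTR [rax], 0x1337 # store a 32-bit immediate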

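little endian (32-bit value 0x12345678):
Address: 0x1000 0x1001 0x1002 0x1003
Value: 0x78 0x56 0x34 0x12

big endian (32-bit value 0x12345678):
Address: 0x1000 0x1001 0x1002 0x1003
Value: 0x12 0x34 0x56 0x78

Most current systems store multi-byte values in memory in little-endian order. Note that this applies to memory, not to registers.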
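
The two byte orders can be reproduced with Python's built-in `int.to_bytes`. This is a small sketch; the base address 0x1000 is only there to mirror the layout shown above.

```python
value = 0x12345678
base = 0x1000    # illustrative address, matching the layout above

little = value.to_bytes(4, "little")   # least significant byte first
big = value.to_bytes(4, "big")         # most significant byte first

for name, data in (("little", little), ("big", big)):
    cells = ", ".join(f"{base + i:#x}: {b:#04x}" for i, b in enumerate(data))
    print(f"{name:>6} endian -> {cells}")

# Reading the bytes back with the matching byte order recovers the value.
assert int.from_bytes(little, "little") == value
assert int.from_bytes(big, "big") == value
# Reading little-endian bytes as big-endian gives the byte-swapped value.
print(hex(int.from_bytes(little, "big")))   # 0x78563412
```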
RIP-relative addressing

specify size

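mov DWORD PTR [rax], 0x1337 # explicit 32-bit store
mov [rax], 0x1337 # size left unspecified

If no size is given, the default size of the current architecture or assembler is used (some assemblers reject the instruction as ambiguous instead), so it is safer to state the size explicitly.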

Control Flow

jump: in the binary encoding a jump means "skip x bytes", e.g. jnz STAY_LEET assembles to 75 04 (skip 4 bytes); an unconditional jmp short would be eb 04
je, jne, jg, jl, jle, jge, ja, jb, jae, jbe, js, jns, jo, jno, jz, jnz
check condition in flags register: rflags
most flags are updated by arithmetic instructions, by cmp (a sub that discards its result) and by test (an and that discards its result)

main conditional flags:
carry flag: was the 65th bit 1 (did the unsigned result carry out of 64 bits)?
zero flag: was the result 0?
overflow flag: did the result wrap between positive and negative?
sign flag: was the sign bit of the result set?
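
To make the four flags concrete, here is a Python sketch of what cmp computes. The function `cmp_flags` is a hypothetical model, not an API; for a subtraction the carry flag acts as a borrow, so it is set when the first operand is smaller as an unsigned number.

```python
def cmp_flags(a: int, b: int, bits: int = 64) -> dict:
    """Model `cmp a, b`: compute a - b, keep only the flags."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "CF": a < b,                                 # unsigned borrow
        "ZF": result == 0,                           # result is zero
        "SF": bool(result & sign),                   # sign bit of the result
        "OF": bool((a ^ b) & (a ^ result) & sign),   # signed wrap-around
    }


print(cmp_flags(5, 10))    # CF and SF set: 5 < 10, so jb / jl would jump
print(cmp_flags(10, 10))   # only ZF set: je / jz would jump
print(cmp_flags(0x8000000000000000, 1))   # only OF set: most negative value minus 1 wraps
```

In the model, jb corresponds to CF being set and je to ZF being set, which matches how the unsigned conditional jumps consume the result of cmp.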

e.g.:

mov rax, 0
LOOP_HEADER:
inc rax
cmp rax, 10
jb LOOP_HEADER

call
caller and callee must agree on how arguments are passed (on amd64 the stack is only used for arguments that do not fit in registers; see the table below)
Linux x86: push arguments in reverse order, then call, return value in eax
Linux amd64: rdi, rsi, rdx, rcx, r8, r9, return value in rax
rbx, rbp, r12, r13, r14, r15 are callee-saved (a function that uses them saves their original values to the stack in its prologue and restores them in its epilogue before returning)

Argument type	Registers
Integer/pointer arguments 1-6	RDI, RSI, RDX, RCX, R8, R9
Floating point arguments 1-8	XMM0 - XMM7
Excess arguments	Stack
Static chain pointer	R10
Argument register overview

From <https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI> 

linux arm: r0, r1, r2, r3, return in r0

System Calls

syscall: system call number in rax, arguments (in order, not reversed) in rdi, rsi, rdx, r10, r8, r9, return value in rax

constant arguments: best looked up from C (for example by printing the constant from the system headers)
exit:

mov rdi, 42
mov rax, 60
syscall

Building Programs

# .intel_syntax tells the assembler that we are using Intel syntax
# noprefix tells it that we will not prefix all the register name with %
.intel_syntax noprefix 
.global _start
_start:
mov rdi, 42
mov rax, 60
syscall

build

gcc -nostdlib -o quitter quitter.s

disassemble

objdump -M intel -d quitter

only binary code

objcopy --dump-section .text=quitter_binary_code quitter

debugging:
gdb, strace, rappel
https://github.com/zardus/ctf-tools
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf
https://www.felixcloutier.com/x86/
http://ref.x86asm.net/coder64.html

Further Reading

An awesome intro series that covers some of the fundamentals from LiveOverflow.
Ike: The Systems Hacking Handbook, an excellent guide to Computer Organization.
A comprehensive assembly tutorial for several architectures (amd64 is the relevant one here).
The course "Architecture 1001: x86-64 Assembly" from OpenSecurityTraining2.
A whole x86_64 assembly book to help you out!
A game to teach you x86 assembly and one to stress test your knowledge!
A flowchart of x86 prefix and escape opcodes.
An unofficial, but extremely detailed and useful x86 reference.

practice

as -o asm.o asm.S
objcopy -O binary --only-section=.text asm.o asm.bin
cat ./asm.bin | /challenge/run

operations used: mov, add, mul, imul, div, shr, shl, and, or, xor, push, pop
level4:
Note: there is an important difference between mul (unsigned multiply) and imul (signed multiply) in terms of which registers are used.
mul cannot take an immediate value as its operand.

level5:
Note: div is a special instruction that can divide a 128-bit dividend by a 64-bit divisor, while storing both the quotient and the remainder, using only one register as an operand.

How does this complex div instruction work and operate on a 128-bit dividend (which is twice as large as a register)?

For the instruction: div reg, the following happens:
rax = rdx:rax / reg
rdx = remainder

DIV r/m64 Unsigned divide RDX:RAX by r/m64, with result stored in RAX := Quotient, RDX := Remainder.
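
The behavior of div can be modeled in a few lines of Python. This is a sketch with a made-up helper name, `div64`; on real hardware a zero divisor or a quotient that does not fit in 64 bits raises a divide error, which the model represents with Python exceptions.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def div64(rdx: int, rax: int, divisor: int) -> tuple:
    """Model `div reg`: divide the 128-bit value rdx:rax by a 64-bit divisor.

    Returns (new_rax, new_rdx) = (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("divide error: divisor is zero")
    dividend = ((rdx & MASK64) << 64) | (rax & MASK64)
    quotient, remainder = divmod(dividend, divisor & MASK64)
    if quotient > MASK64:
        raise OverflowError("divide error: quotient does not fit in rax")
    return quotient, remainder


print(div64(0, 17, 5))          # (3, 2)
print(hex(div64(1, 0, 2)[0]))   # 0x8000000000000000
```

This is why rdx is normally cleared before an unsigned 64-bit division: leftover bits in rdx become the high half of the dividend.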

level8:
We can use a math trick to optimize the modulo operator (%). Compilers use this trick a lot.

If we have "x % y", and y is a power of 2, such as 2^n, the result will be the lower n bits of x.
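
A quick Python check of the trick, as a sketch (the helper name `mod_pow2` is illustrative). Keeping the lower n bits is the same as masking with 2^n - 1, which is why in assembly x % 256 is simply the low byte of the register.

```python
def mod_pow2(x: int, n: int) -> int:
    """Compute x % 2**n for non-negative x by keeping only the low n bits."""
    return x & ((1 << n) - 1)


# The mask gives the same answer as the modulo operator.
for x in (0, 1, 255, 256, 0x1234, 0xDEADBEEF):
    for n in (1, 4, 8, 16):
        assert mod_pow2(x, n) == x % (2 ** n)

print(hex(mod_pow2(0x1234, 8)))    # 0x34
print(hex(mod_pow2(0x1234, 16)))   # 0x1234
```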

In x86 assembly language, several registers have smaller sub-registers, which are accessible for more granular operations. Here's a list of common registers with their corresponding sub-registers:

AX, AH, AL:
AX is the 16-bit register, which can be accessed as a whole.
AH is the high 8 bits (upper byte) of AX.
AL is the low 8 bits (lower byte) of AX.

BX, BH, BL:
Similar to AX: BX is the 16-bit register, BH is its high 8 bits, and BL is its low 8 bits.

CX, CH, CL:
Similar to AX and BX: CX is the 16-bit register, CH is its high 8 bits, and CL is its low 8 bits.

DX, DH, DL:
Similar to AX, BX, and CX: DX is the 16-bit register, DH is its high 8 bits, and DL is its low 8 bits.

SI, DI:
SI and DI are 16-bit index registers typically used for string operations.

BP, SP:
BP and SP are the 16-bit base pointer and stack pointer registers, respectively.

EAX, EBX, ECX, EDX:
These are the 32-bit versions of AX, BX, CX, and DX. Each can be accessed as a whole, through its low 16 bits (AX, BX, CX, DX), or through the two bytes of that low half (AH/AL, BH/BL, CH/CL, DH/DL). The upper 16 bits have no name of their own.

ESI, EDI, EBP, ESP:
These are the 32-bit versions of SI, DI, BP, and SP.

RAX, RBX, RCX, RDX:
These are the 64-bit versions of EAX, EBX, ECX, and EDX. Each can be accessed as a whole or through its low 32 bits (EAX, EBX, ECX, EDX). The upper 32 bits have no name of their own.

RDI, RSI, RBP, RSP:
These are the 64-bit versions of EDI, ESI, EBP, and ESP.
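
The sub-register views of rax can be modeled as masks and shifts of one 64-bit integer. This Python sketch uses a hypothetical helper, `sub_registers`, and an arbitrary sample value.

```python
def sub_registers(rax: int) -> dict:
    """Return the named views of a 64-bit rax value."""
    rax &= 0xFFFFFFFFFFFFFFFF
    return {
        "rax": rax,
        "eax": rax & 0xFFFFFFFF,     # low 32 bits
        "ax": rax & 0xFFFF,          # low 16 bits
        "ah": (rax >> 8) & 0xFF,     # bits 8..15
        "al": rax & 0xFF,            # bits 0..7
    }


views = sub_registers(0x1122334455667788)
for name, value in views.items():
    print(f"{name}: {value:#x}")
# rax: 0x1122334455667788
# eax: 0x55667788
# ax: 0x7788
# ah: 0x77
# al: 0x88
```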

level9:
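1. I used mov rax, rdi; mov eax, 0 and forgot the 32-bit caveat. This does not work, because writing eax also zeroes the upper 32 bits of rax.
2. Keep the shift directions straight: shr makes the value smaller, shl makes it larger.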

level11:
Don't forget to zero eax.

level15:

1. The two operands of mov must have matching sizes.
2. Give memory operands an explicit size, e.g. `byte ptr [0x1234]`.

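level17:
Although the documentation allows an imm32 source, memory addressed through a register needs an explicit size, e.g. dword ptr [rdi].
Don't forget to count the length of the immediate: a qword (64-bit) immediate is only supported by mov into a register.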

level21:
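While writing this I kept thinking that the stack grows from high to low addresses, so I kept using [rsp-8], forgetting that rsp already points to the top of the stack, which is its lowest address.
Although the documentation lists push r/m32, push dword ptr [rsp]-4 does not work, while push qword ptr [rsp]-4 does (push r/m32 is not encodable in 64-bit mode).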

level23:
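1. The task is not only the relative jump; the logic at the jump target has to be written too.
2. jmp short $+0x53 jumps to the address of this instruction (the jmp itself) plus the offset, whereas the challenge counts the distance from the end of the jmp, so the jmp's own size (2 bytes) must be added.
3. mov rax, 0 is 7 bytes and mov eax, 0 is 5 bytes; the two do not differ by 4 bytes.
4. 0x53 is not 53 (0x53 is 83 in decimal).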
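
The off-by-two in point 2 can be worked through with a small Python sketch. The helper names and the sample address 0x400000 are hypothetical; the only facts relied on are that a short jmp is 2 bytes (opcode 0xEB plus one signed displacement byte) and that the displacement is counted from the end of the instruction.

```python
JMP_SHORT_SIZE = 2   # opcode 0xEB + one signed displacement byte


def jmp_short_target(jmp_address: int, rel8: int) -> int:
    """Target of a short jmp: address of the next instruction plus rel8."""
    if not -128 <= rel8 <= 127:
        raise ValueError("rel8 must fit in a signed byte")
    return jmp_address + JMP_SHORT_SIZE + rel8


def dollar_plus_to_rel8(n: int) -> int:
    """`jmp short $+n` targets jmp_address + n, so the encoded rel8 is n - 2."""
    return n - JMP_SHORT_SIZE


def encode_jmp_short(rel8: int) -> bytes:
    """Machine code of a short jmp with the given displacement."""
    return bytes([0xEB, rel8 & 0xFF])


skip = 0x53                         # bytes to skip after the end of the jmp
print(skip)                         # 83, not 53
print(encode_jmp_short(skip).hex(" "))          # eb 53
print(hex(skip + JMP_SHORT_SIZE))               # 0x55, i.e. write jmp short $+0x55
print(hex(dollar_plus_to_rel8(0x53)))           # 0x51: $+0x53 skips only 0x51 bytes
print(hex(jmp_short_target(0x400000, skip)))    # 0x400055
```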

level25:
Don't second-guess the task and compute in 64-bit registers out of fear of overflow.

level26:
jump table

level29:
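Note that call has no absolute immediate form, only r/m64 (absolute indirect). call [0x403000] jumps to the address stored at [0x403000], not to 0x403000 itself; call 0x403000 is assembled as call rel32 and did not end up at 0x403000. The address has to be placed in a register first: 'mov rax, 0x403000; call rax'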

level30:
A quick way to allocate stack space

most_common_byte:
	push rbp
	mov rbp, rsp
	sub rsp, rsi # rsi: the size to reserve
	# [rbp-i]: the i-th byte of the reserved area
final:
	mov rsp, rbp
	pop rbp
	ret
