CVE-2023-23504: XNU Heap Underwrite in dlil.c

This post describes the second vulnerability that I found in the XNU kernel (the first is described here). XNU is the kernel of the operating systems that run on a number of Apple products, including Macs, iPhones, iPads, Apple Watches, Apple TVs, and so on.

The vulnerability is a 19-year-old heap underwrite in XNU’s dlil.c (which handles network interfaces), caused by a uint16_t integer overflow in if.c. A root user can trigger it by creating 65536 total network interfaces.

Root Cause

When an interface is created in ifnet_attach:dlil.c, if_next_index:if.c is called to create an if_index for the ifnet_t ifp:

    int idx = if_next_index();

    if (idx == -1) {
        ifp->if_index = 0;
        ifnet_lock_done(ifp);
        ifnet_head_done();
        dlil_if_unlock();
        return ENOBUFS;
    }
    ifp->if_index = (uint16_t)idx; // Vulnerability

This index is cast to a uint16_t.

if_next_index creates one chunk of memory that it splits into two: ifnet_addrs and ifindex2ifnet, and the comments for if_next_index hint at the problem:

“ifnet_addrs[] is indexed by (if_index - 1), whereas ifindex2ifnet[] is indexed by ifp->if_index.”

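This means that when 65536 network interfaces are created, the last interface has an ifp->if_index of 0 (65536 truncated to a uint16_t). The subscript ifp->if_index - 1 then evaluates to -1, so ifnet_attach writes the allocated struct ifaddr * ifa out of the bounds of ifnet_addrs, into the slot immediately before the start of the array: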

VERIFY(ifnet_addrs[ifp->if_index - 1] == NULL);
ifnet_addrs[ifp->if_index - 1] = ifa;
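
To make the arithmetic concrete, here is a minimal user-space sketch of the two steps (the narrowing cast and the subscript computation). The function names narrow_if_index and ifnet_addrs_slot are made up for illustration and are not from XNU:

```c
#include <stdint.h>

/* Models the cast in ifnet_attach: the int returned by if_next_index
 * is narrowed to the 16-bit if_index field, keeping only the low 16 bits. */
uint16_t narrow_if_index(int idx)
{
    return (uint16_t)idx;
}

/* Models the subscript used for ifnet_addrs[]. The uint16_t operand is
 * promoted to int before the subtraction, so an if_index of 0 yields -1
 * rather than wrapping around to 65535. */
int ifnet_addrs_slot(uint16_t if_index)
{
    return if_index - 1;
}
```

Feeding 65536 through narrow_if_index and then ifnet_addrs_slot gives a slot of -1, which is the element just before the start of the array.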

My Proposed Fix

One fix for the vulnerability would be to limit the number of interfaces that can be created to 0xFFFF. This could be done in if_next_index, and would not impact an interface with the same name (e.g. feth0) that is created and destroyed repeatedly (the only likely scenario for this would be a utun device, which is created when a root or privileged process creates a PF_SYSTEM SYSPROTO_CONTROL socket).

The Real Fix

The real fix here is in if_next_index:

/*
 * Although we are returning an integer,
 * ifnet's if_index is a uint16_t which means
 * that's our upper bound.
 */
if (if_index >= UINT16_MAX) {
  return -1;
}

It seems that we agree on the correct fix (although it’s strange to keep the return value as an int if what you’re returning cannot ever be that large).
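
As a rough illustration of why that bound is enough, here is a small user-space model of a counter-based index allocator with the same check. It is only a sketch: model_if_next_index and its counter parameter are hypothetical, and the real if_next_index also grows the backing arrays.

```c
#include <stdint.h>

/* Hypothetical model of the patched allocator: hands out 1, 2, 3, ...
 * and refuses once the counter has reached UINT16_MAX, so every value
 * it returns still fits in a uint16_t and is never 0. */
int model_if_next_index(int *counter)
{
    if (*counter >= UINT16_MAX) {
        return -1; /* the caller (ifnet_attach) maps -1 to ENOBUFS */
    }
    return ++(*counter);
}
```

With this check the largest index handed out is 65535, so the later cast to uint16_t no longer changes the value.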

Affected Versions

Verified on MacOS 13.0 on an M1 Mac mini running build 22A380.

Also tested on iOS.

From what I can tell, it seems the vulnerable code was introduced in XNU 517.3.7 (Mac OS X 10.3.2), released on December 17th, 2003, making it a 19-year-old bug!

Exploitation Conditions

Creating (and destroying) a network interface normally requires root permissions.

POC

This was a super interesting POC to create.

The simplest POC for this fits in a tweet (NOTE: this might crash your machine):

C=$(sysctl -A | grep ifcount | cut -d':' -f2 | xargs)
for i in `seq 32767`
do
    sudo ifconfig "feth$i" create
    sudo ifconfig "feth$i" destroy
done
T=$((65536 - $C - 32767))
for i in `seq $T`
do
    sudo ifconfig "vlan$i" create
    sudo ifconfig "vlan$i" destroy
done

Learning about the destroy after the create was hard-fought knowledge: for the first ~month of trying to POC this bug I only used create. This triggers some exponential slowdown in the kernel, so creating enough interfaces took several hours (in my VM it would take >12 hours to trigger). Finally I realized that you could destroy the interface and this would fix the slowdown (I also had to learn that the interface info was reused/cached even after the interface was deleted, so you couldn’t just create and destroy the same interface type over and over).

However, it’s much faster to trigger the bug in C (by calling the correct ioctl to create and destroy the interfaces).

Here’s the POC that I wrote to trigger this bug, which creates enough interfaces to trigger the integer overflow and the heap underwrite.

If you’re on MacOS 12.6 (the last OS that I tested this on), then ~50% of the time your system will crash. In those cases there is no memory mapped before ifnet_addrs, so the write goes to an unmapped page.

Potential Physical Attack

I think it might be possible to trigger this bug through the lightning cable on an iPhone or perhaps USB-C cable on a MacOS machine.

However, Apple (in a great move) now requires you to unlock your device and approve the connection from USB. So this wouldn’t be possible on a locked, pre-first-boot device; however, it might be possible to create a malicious device that tricks the user into plugging it in and allowing the connection.

I did not pursue this approach (I really don’t have hardware experience), but the idea would be to create a USB device that pretends to be ~65536 NICs. One downside of this approach is that it takes on the order of hours to create all these interfaces (which is why the POC destroys each interface after it’s created).

I did test that this idea could work by using an old iPhone 6s running iOS 12.1, with the checkra1n beta 0.12.4 jailbreak.

On first boot, I plugged in a lightning to ethernet adapter, and it actually created three new interfaces: en3 (the ethernet device), EHC1, and OHC2 (no idea what those are).

So this might be possible, and I would love to know if anyone’s able to do this.

My Failed Exploit Attempt

While creating a POC on MacOS 12.5, I spent a ton of time trying to build one that could alter the struct ifaddr * ifa that was underwritten, by controlling something that was allocated before ifnet_addrs and then modifying that pointer.

The spoiler alert here is that I failed: I was very close (as I’ll try to lay out here) but was stuck on how to flip bits in the pointer without crashing or triggering an infinite loop. Then MacOS 12.6 dropped, which changed the behavior of the kernel’s memory allocator so that the POC crashed 50% of the time. I decided I had spent enough of my life on this bug (about two months of dedicated effort), so I sent the basic POC to Apple and here we are.

I hope that maybe you can learn something from my failed approach.

Anyway, it seems like this should be easy: use the standard trick of spraying a bunch of Out-Of-Line Mach Messages, trigger the underwrite, read the messages back to see which one was overwritten, then use that to alter the pointer.

O, dear reader, it was not so easy.

The first thing to understand is that if_next_index allocates ifnet_addrs as part of a single chunk of memory that it splits into two arrays, and this chunk is doubled every time the limit is hit:

    if (ifnet_addrs == NULL) {
        new_if_indexlim = INITIAL_IF_INDEXLIM;
    } else {
        new_if_indexlim = if_indexlim << 1;
    }

    /* allocate space for the larger arrays */
    n = (2 * new_if_indexlim + 1);
    new_ifnet_addrs = (caddr_t)kalloc_type(caddr_t, n, Z_WAITOK | Z_ZERO);

This means that n gets larger and larger, so we need to allocate 0x8000 interfaces first, and the next one will trigger the allocation of the final location of ifnet_addrs.
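
The sizes involved can be worked out with a short sketch. It assumes 8-byte pointers (caddr_t on an LP64 kernel) and that kalloc_type(caddr_t, n, ...) requests n elements of that size; the helper names are made up for illustration.

```c
#include <stddef.h>

#define MODEL_CADDR_SIZE 8u /* assumption: sizeof(caddr_t) on LP64 */

/* Element count requested for a given index limit: n = 2 * limit + 1. */
size_t addrs_alloc_count(size_t if_indexlim)
{
    return 2 * if_indexlim + 1;
}

/* Byte size of that request under the 8-byte pointer assumption. */
size_t addrs_alloc_bytes(size_t if_indexlim)
{
    return addrs_alloc_count(if_indexlim) * MODEL_CADDR_SIZE;
}
```

Under these assumptions a limit of 0x8000 needs 524296 bytes, while the final doubling to 0x10000 needs 1048584 bytes, which is 8 bytes over 1 MB. That number matters for the range selection described further down.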

For allocations larger than KHEAP_MAX_SIZE, kalloc_type will call into kalloc_large.

#if !defined(__LP64__)
#define KHEAP_MAX_SIZE          8 * 1024
#elif  __x86_64__
#define KHEAP_MAX_SIZE          16 * 1024
#else
#define KHEAP_MAX_SIZE          32 * 1024
#endif

So, we should be able to allocate any object into kalloc_large if it’s over, say, 0x8000 in size (this way it applies on all the platforms).

Oh if only things were that easy/simple.

Turns out that kalloc_large works by calling into kernel_memory_allocate to allocate pages of memory directly from the VM system, which means that this is essentially above the kernel heap allocation layer.

kernel_memory_allocate eventually calls vm_map_find_space, which then calls vm_map_locate_space.

vm_map_get_range then gets the range from a global variable called kmem_ranges based on flags that are passed all the way:

    kmem_range_id_t range_id = vmk_flags->vmkf_range_id;
    effective_range = kmem_ranges[range_id];

However, there’s also a check later in vm_map_get_range to see if the size is greater than KMEM_SMALLMAP_THRESHOLD (which is 1MB on 64-bit platforms):

        if (size >= KMEM_SMALLMAP_THRESHOLD) {
            effective_range = kmem_large_ranges[range_id];
        }
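
Putting the two size checks together, the path an allocation takes can be modeled roughly as below. This is a simplified sketch, not kernel code: the enum and function names are invented, the 32 KB value is the non-x86 64-bit KHEAP_MAX_SIZE from the snippet above, and the 1 MB threshold is the 64-bit value mentioned in the text.

```c
#include <stddef.h>

#define MODEL_KHEAP_MAX_SIZE     (32u * 1024u)   /* assumed arm64 value */
#define MODEL_SMALLMAP_THRESHOLD (1024u * 1024u) /* 1 MB on 64-bit */

enum alloc_path {
    PATH_KHEAP,             /* served by the regular kernel heap */
    PATH_KMEM_RANGES,       /* kalloc_large, placed via kmem_ranges */
    PATH_KMEM_LARGE_RANGES  /* kalloc_large, placed via kmem_large_ranges */
};

/* Classifies a request size the way the text describes the two checks. */
enum alloc_path classify_alloc(size_t size)
{
    if (size <= MODEL_KHEAP_MAX_SIZE) {
        return PATH_KHEAP;
    }
    if (size >= MODEL_SMALLMAP_THRESHOLD) {
        return PATH_KMEM_LARGE_RANGES;
    }
    return PATH_KMEM_RANGES;
}
```

In this model, a 524296-byte request lands in kmem_ranges, while a 1048584-byte request (the final ifnet_addrs size under an 8-byte pointer assumption) lands in kmem_large_ranges.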

These ranges are quite different, as shown from an lldb debug session that I had:

(lldb) print kmem_large_ranges
(kmem_range [4]) $4 = {
  [KMEM_RANGE_ID_NONE]  = (min_address = 0x0000000000000000, max_address = 0x0000000000000000)
  [KMEM_RANGE_ID_PTR_0] = (min_address = 0xfffffff35b14b000, max_address = 0xfffffffee9aa1000)
  [KMEM_RANGE_ID_PTR_1] = (min_address = 0xffffffe625d7b000, max_address = 0xfffffff1b46d1000)
  [KMEM_RANGE_ID_DATA]  = (min_address = 0xffffffa7100a6000, max_address = 0xffffffd54a5fe000)
}

(lldb) print kmem_ranges
(kmem_range [4]) $1 = {
  [KMEM_RANGE_ID_NONE]  = (min_address = 0x0000000000000000, max_address = 0x0000000000000000)
  [KMEM_RANGE_ID_PTR_0] = (min_address = 0xfffffff287c0e000, max_address = 0xffffffffbcfde000)
  [KMEM_RANGE_ID_PTR_1] = (min_address = 0xffffffe55283e000, max_address = 0xfffffff287c0e000)
  [KMEM_RANGE_ID_DATA]  = (min_address = 0xffffffa0756be000, max_address = 0xffffffd54a5fe000)
}

So, this explains why we couldn’t use OOL Mach messages (I tried them twice I think): due to some limit that I can’t find right now we can’t allocate an OOL Mach Message that’s > 1MB.

To make matters worse, we need our victim allocation to end up in KMEM_RANGE_ID_PTR_0 in kmem_large_ranges (which, empirically, is where ifnet_addrs ended up).

(I learned a lot about the importance of keeping notes while trying exploitation. I didn’t keep track of all of these limits, so I wasted lots of time trying different exploitation methods while eventually rediscovering them.)

I then did what any good hacker does: look at every single allocation site in the kernel (using IDA this time on a debug kernel) to see ones that were unbounded.

But now we need to define what our goal here is: we want this victim object to be allocated directly before the vulnerable object (ifnet_addrs), so that the underwrite from the vulnerable object changes the last 8 bytes of the victim object.

So, it needs:

An allocation that we can control/trigger from userspace.
An allocation that persists (no thank you race conditions, not today).
An allocation that is greater than 1MB (the fun KMEM_SMALLMAP_THRESHOLD).
An allocation that falls into KMEM_RANGE_ID_PTR_0.
An allocation where the object size (i.e. the space that we can use) is a multiple of the page size: We need to be able to read or write to the last 8 bytes of the allocation.

Note that I didn’t start with this list, but only ended up here after following multiple dead ends and false starts.

Finally, I found something promising in kern_descrip.c’s fdalloc:

    newofiles = kheap_alloc(KM_OFILETABL, numfiles * OFILESIZE,
        Z_WAITOK);

The SUPER weird thing here is that OFILESIZE is NINE (9) BYTES! Why why why, such a weird allocation pattern!

And it starts out at a strange initial size that’s difficult to determine, so I created this table (note that it shows allocation size, not number of objects) to see when it would be page divisible (I like to do this in an org-mode table where I keep my notes):

| Allocation (hex)  | Allocation (dec)  | Size / 0x1000 (pages)  |
|-------------------+-------------------+------------------------|
|            0x1518 |              5400 |              1.3183594 |
|            0x2a30 |             10800 |              2.6367188 |
|            0x5460 |             21600 |              5.2734375 |
|            0xa8c0 |             43200 |              10.546875 |
|           0x15180 |             86400 |               21.09375 |
|           0x2A300 |            172800 |                42.1875 |
|           0x54600 |            345600 |                 84.375 |
|           0xA8C00 |            691200 |                 168.75 |
|          0x151800 |           1382400 |                  337.5 |
|          0x2A3000 |           2764800 |                    675 |
|          0x546000 |           5529600 |                   1350 |
|          0xA8C000 |          11059200 |                   2700 |
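
The table can also be reproduced programmatically. The sketch below doubles a starting size until it is both at least 1 MB and a multiple of the 0x1000 page size used in the table, then converts the size to a number of fds using the 9-byte OFILESIZE. The function names and the 32-step cap are arbitrary choices for illustration.

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE 0x1000u
#define MODEL_OFILESIZE 9u
#define MODEL_ONE_MB    (1024u * 1024u)

/* Doubles `start` until the size is >= 1 MB and page divisible.
 * Returns 0 if no such size is found within 32 doublings. */
uint64_t first_usable_table_size(uint64_t start)
{
    uint64_t size = start;
    for (int i = 0; i < 32; i++) {
        if (size >= MODEL_ONE_MB && size % MODEL_PAGE_SIZE == 0) {
            return size;
        }
        size <<= 1;
    }
    return 0;
}

/* Number of fd slots that fit in a table of the given byte size. */
uint64_t numfiles_for_size(uint64_t size)
{
    return size / MODEL_OFILESIZE;
}
```

Starting from 0x1518, the first size that satisfies both conditions is 0x2A3000, which corresponds to 0x4B000 fds. The row before it, 0x151800, is already over 1 MB but is not page divisible.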

So we need there to be 0x2A3000 / 9 = 0x4B000 numfiles here. But what are those numfiles you ask?

Turns out that we’re looking at the kernel’s storage of a process' fds!

Oh no, can we create 0x4B000 fds in a process?

Yes, if we are root, there are two limits that control this and we can just bump them right up:

sudo ulimit -n unlimited
sudo sysctl -w kern.maxfilesperproc=614400

The cool thing is that we can actually use dup2 to specify a (large) wanted fd (second argument to dup2) and the kernel will allocate all this memory for us!
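
Here is a minimal user-space sketch of that dup2 trick. It is not the code from the POC: occupy_fd is a hypothetical helper, and using /dev/null as the source descriptor is just a convenient choice. It only tries to raise the soft RLIMIT_NOFILE, so very large targets such as 307199 would still need the root-only limit changes shown above.

```c
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Duplicates a descriptor onto target_fd so that the kernel has to size
 * this process's fd table to cover it. Returns target_fd or -1. */
int occupy_fd(int target_fd)
{
    struct rlimit rl;

    if (target_fd < 0 || getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        return -1;
    }
    /* Raise the soft limit if target_fd is not currently allowed. */
    if (rl.rlim_cur <= (rlim_t)target_fd) {
        rl.rlim_cur = (rlim_t)target_fd + 1;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
            return -1;
        }
    }

    int src = open("/dev/null", O_RDONLY);
    if (src < 0) {
        return -1;
    }
    int res = dup2(src, target_fd);
    if (src != target_fd) {
        close(src); /* only the duplicate at target_fd stays open */
    }
    return res;
}
```

A call such as occupy_fd(100) leaves exactly one new descriptor open, at the requested number, for as long as the process keeps it.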

Through trial, error, and debugging (dtrace ftw) I found an allocation pattern of fds that put things where we want:

Allocate three smaller fd tables first (so that they fill space up front rather than afterwards), using an fd of 153599 in each.
Allocate, in one proc, 0x2A3000 / 9 = 0x4B000 fds using dup2 with an fd of 307199. This will be the victim fd table.
Allocate one or more smaller fd tables in other procs, each of 0x151800 / 9 = 0x25800 fds. These are only needed as spacing, to take up room (use as many as needed) so that the target allocation will go after the victim. This is needed because of the “realloc” behavior that goes on when we allocate the victim.
Trigger underwrite.

At this point, we can allocate a victim object in the correct region, allocate the vulnerable object right after it, and then trigger the underwrite to write into the victim.

Success?

Oh no, now what do we control in this newofiles array?

Later on in fdalloc we find:

    newofileflags = (char *) &newofiles[numfiles];
    // ...
    (void) memcpy(newofiles, fdp->fd_ofiles,
        oldnfiles * sizeof(*fdp->fd_ofiles));
    // ...
    (void) memcpy(newofileflags, fdp->fd_ofileflags,
        oldnfiles * sizeof(*fdp->fd_ofileflags));

So now we can see why the allocation size here is 9 bytes: 8 bytes for a pointer (what fdp->fd_ofiles consists of) and 1 byte for fdp->fd_ofileflags, which is the (single byte) flag for the file.

And, to make matters worse, the flags go at the end of newofiles, which is where the underwrite happens (my kingdom for a pointer overlap).
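
To see which part of the table the underwrite lands in, the layout can be modeled as numfiles 8-byte pointers followed by numfiles 1-byte flags. The helpers below are illustrative only (the names are made up), and they assume 8-byte pointers.

```c
#include <stddef.h>

#define MODEL_PTR_SIZE 8u /* assumption: 64-bit pointers */

/* Byte offset where the flag bytes start inside the allocation. */
size_t flags_offset(size_t numfiles)
{
    return numfiles * MODEL_PTR_SIZE;
}

/* Maps a byte offset inside the allocation to the fd whose flag byte
 * lives there. Returns -1 if the offset is in the pointer region or
 * past the end of the allocation. */
long fd_for_flag_byte(size_t numfiles, size_t off)
{
    size_t total = numfiles * (MODEL_PTR_SIZE + 1);

    if (off < flags_offset(numfiles) || off >= total) {
        return -1;
    }
    return (long)(off - flags_offset(numfiles));
}
```

For a table of 0x4B000 fds (0x2A3000 bytes), the last 8 bytes of the allocation are the flag bytes of fds 307192 through 307199, so an 8-byte pointer written there overlaps eight one-byte flags rather than another pointer.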

Here are the flags that matter:

#define UF_RESERVED     0x04            /* open pending / in progress */
#define UF_CLOSING      0x08            /* close in progress */
#define UF_RESVWAIT     0x10            /* close in progress */
#define UF_INHERIT      0x20            /* "inherit-on-exec" */

I spent a ton of time trying to find a way to flip bits in the pointer using these flags.

Ultimately I gave up, sent what I had to Apple, and moved on to the next bug (but I did learn a lot in the process).

Then, I saw this awesome blog post by Jack Dates from Ret2 Systems talking about how to corrupt from kalloc_large and kernel_map.

Hope this was enlightening, or maybe you can empathize with my plight (it seems that us hackers rarely talk about our failures).

Adam Doupé

Associate Professor, Arizona State University
Director, Center for Cybersecurity and Trusted Foundations