Retme's Future Gadget Lab

The convergence of world lines cannot be resisted.

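This is a note from more than two months ago that I never got around to posting. By now our young friend from Japan has long since published his reverse-engineered code. But when I wrote this note, nothing was available except towelroot V1, so~~ I actually worked out the exploitation details pretty quickly back then. That was mainly because I never planned to reverse-engineer towelroot; I located the exploitation method directly by tracing.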


If you repost this, please credit http://retme.net/index.php/2014/09/19/cve-2014-3153.html

Respect to geohot~ Here are my notes from back then:


I. First, the patch

https://github.com/torvalds/linux/commit/e9c243a5a6de0be8e584c604d353412584b592f8

    if (requeue_pi) {
       /*
 +      * Requeue PI only works on two distinct uaddrs. This
 +      * check is only valid for private futexes. See below.
 +      */
 +     if (uaddr1 == uaddr2)
 +         return -EINVAL;
 +
 +     /*

The patch requires the two futex addresses to be different. What happens if they are the same?



II. Related data structures

Every futex that enters the kernel gets a key computed for it (get_futex_key) and is inserted into the hash table futex_queues, which is defined as follows:


static struct futex_hash_bucket futex_queues[1<<FUTEX_HASHBITS];

static struct futex_hash_bucket *hash_futex(union futex_key *key)
{
    u32 hash = jhash2((u32*)&key->both.word,
             (sizeof(key->both.word)+sizeof(key->both.ptr))/4,
             key->both.offset);
    return &futex_queues[hash & ((1 << FUTEX_HASHBITS)-1)];
}

futex_hash_bucket is one bucket of that hash table:

struct futex_hash_bucket {
    spinlock_t lock;
    struct plist_head chain;
};

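It holds a spinlock and a queue. chain is a priority-sorted list (plist): the higher a waiting thread's priority, the closer it sits to the front of the queue.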



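The entries on the plist_head list are futex_q structures, the in-kernel objects that represent futex waiters (one per waiting task, as the comment below says):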



/**
 * struct futex_q - The hashed futex queue entry, one per waiting task
 * @list:     priority-sorted list of tasks waiting on this futex
 * @task:     the task waiting on the futex
 * @lock_ptr:     the hash bucket lock
 * @key:      the key the futex is hashed on
 * @pi_state:     optional priority inheritance state
 * @rt_waiter:       rt_waiter storage for use with requeue_pi
 * @requeue_pi_key:  the requeue_pi target futex key
 * @bitset:       bitset for the optional bitmasked wakeup
 *
 * We use this hashed waitqueue, instead of a normal wait_queue_t, so
 * we can wake only the relevant ones (hashed queues may be shared).
 *
 * A futex_q has a woken state, just like tasks have TASK_RUNNING.
 * It is considered woken when plist_node_empty(&q->list) || q->lock_ptr == 0.
 * The order of wakeup is always to make the first condition true, then
 * the second.
 *
 * PI futexes are typically woken before they are removed from the hash list via
 * the rt_mutex code. See unqueue_me_pi().
 */
struct futex_q {
    struct plist_node list;

    struct task_struct *task;
    spinlock_t *lock_ptr;
    union futex_key key;
    struct futex_pi_state *pi_state;
    struct rt_mutex_waiter *rt_waiter;
    union futex_key *requeue_pi_key;
    u32 bitset;
};

It contains some PI-related members whose purpose is not clear yet; walking through a few functions below will explain them.



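For now it is enough to know that futexes come in PI (priority inheritance) and non-PI varieties. For a PI futex, futex_q uses a few extra members: q->pi_state->pi_mutex is an rt_mutex, and an rt_mutex_waiter is the structure that represents a task waiting on it. The waiter is usually allocated on the waiting thread's stack.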


III. Function execution flow
1. futex_lock_pi

It inserts an rt_mutex_waiter that lives on the stack into the wait list of futex_q.pi_state->pi_mutex, which is an rt_mutex.

Call chain: futex_lock_pi->rt_mutex_timed_lock->rt_mutex_timed_fastlock->rt_mutex_slowlock->task_blocks_on_rt_mutex

debug_rt_mutex_init_waiter(&waiter); here the waiter is a temporary structure that rt_mutex_slowlock allocates on its own stack.

Then, via futex_lock_pi->rt_mutex_timed_lock->rt_mutex_timed_fastlock->rt_mutex_slowlock->__rt_mutex_slowlock,

the thread waits indefinitely until it is woken up:


static int __sched
__rt_mutex_slowlock(struct rt_mutex *lock, int state,
           struct hrtimer_sleeper *timeout,
           struct rt_mutex_waiter *waiter)
{
    int ret = 0;

    for (;;) {
       /* Try to acquire the lock: */
       if (try_to_take_rt_mutex(lock, current, waiter))
           break;

       /*
        * TASK_INTERRUPTIBLE checks for signals and
        * timeout. Ignored otherwise.
        */
       if (unlikely(state == TASK_INTERRUPTIBLE)) {
           /* Signal pending? */
           if (signal_pending(current))
              ret = -EINTR;
           if (timeout && !timeout->task)
              ret = -ETIMEDOUT;
           if (ret)
              break;
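       }

       /* ... remainder of the loop omitted in this excerpt: drop
        * lock->wait_lock, sleep in schedule(), re-take the lock, retry */
    }

    return ret;
}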

2. futex_wait_requeue_pi



    /*
     * The waiter is allocated on our stack, manipulated by the requeue
     * code while we sleep on uaddr.
     */
    debug_rt_mutex_init_waiter(&rt_waiter);  // rt_waiter is a temporary on this stack frame, as in futex_lock_pi
    rt_waiter.task = NULL;

    ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, VERIFY_WRITE);
    if (unlikely(ret != 0))
       goto out;

    q.bitset = bitset;
    q.rt_waiter = &rt_waiter;   //for use with requeue_pi
    q.requeue_pi_key = &key2;  //requeue pi target key

    if(is_my_process){
       printk("[%d] futex_wait_requeue_pi:Prepare to wait on uaddr.\n",
           task_pid_vnr(current_task));

       futex_dump_futex_q(&q);
    }
    /*
     * Prepare to wait on uaddr. On success, increments q.key (key1) ref
     * count.
     */  // wait to be woken up on uaddr (futex1)
    ret = futex_wait_setup(uaddr, val, flags, &q, &hb);
    if (ret)
       goto out_key2;

    if(is_my_process){
       printk("[%d] futex_wait_requeue_pi:before Queue the futex_q.\n",
           task_pid_vnr(current_task));

       futex_dump_futex_q(&q);
    }

    /* Queue the futex_q, drop the hb lock, wait for wakeup. */
    futex_wait_queue_me(hb, &q, to);   // queue q (carrying &rt_waiter) on uaddr's bucket and sleep; a requeue later moves it to futex2

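3. futex_requeue_pi(futex1, futex2), i.e. futex_requeue() with requeue_pi set (FUTEX_CMP_REQUEUE_PI), wakes the waiter on futex1 and requeues it onto futex2.

If the two addresses are equal, waking the waiter on futex1 wakes the thread sleeping in futex_wait_queue_me, yet that waiter still gets inserted into futex2.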



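Because the thread in futex_wait_requeue_pi is woken and returns, futex2's rt_mutex wait list is left holding an rt_mutex_waiter whose stack frame no longer exists. That is a use-after-free of kernel stack memory.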


IV. How to exploit it?

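The UAF lives on the kernel stack of the thread that called futex_wait_requeue_pi, and that same thread can use sendmmsg to fill the stale stack region with data it controls.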

We target the rt_mutex_waiter structure. It contains two list nodes, and after the UAF both are under our control:

struct rt_mutex_waiter {
    struct plist_node list_entry;
    struct plist_node pi_list_entry;
    struct task_struct   *task;
    struct rt_mutex      *lock;
};

We then call futex_lock_pi, which reaches task_blocks_on_rt_mutex and performs a plist_add on the corrupted list. This leaks a kernel stack address and gives us one arbitrary-address write.



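We choose to write thread_info->addr_limit, which sits on the kernel stack. A kernel stack address gets written into addr_limit. Since that address lies far above the normal user-space limit, the kernel's user-pointer checks now accept kernel addresses up to it, which gives us a way to write kernel memory from user mode.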



This effectively recreates CVE-2013-6282: arbitrary read/write of kernel memory.



Note: with this method the process must never exit; otherwise freeing the exploited thread will crash the kernel.



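The exploitation technique for this vulnerability (the glibc bug in iconv/gconv_trans.c [2]) is Project Zero's latest masterpiece [1]. Unfortunately it has some limitations, so I did not set up an environment to debug it and only studied the approach.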

There may be mistakes or gaps in my understanding. These are only notes; I recommend reading the original [1].


First, the source code [2]:


      newp = (struct known_trans *) malloc (sizeof (struct known_trans)
                        + (__gconv_max_path_elem_len
                           + name_len + 3)
                        + name_len);
      if (newp != NULL)
    {
      char *cp;

      /* Clear the struct.  */
      memset (newp, '\0', sizeof (struct known_trans));

      /* Store a copy of the module name.  */
      newp->info.name = cp = (char *) (newp + 1);
      cp = __mempcpy (cp, trans->name, name_len);

      newp->fname = cp;

      /* Search in all the directories.  */
      for (runp = __gconv_path_elem; runp->name != NULL; ++runp)
        {
          cp = __mempcpy (__stpcpy ((char *) newp->fname, runp->name),
                  trans->name, name_len);
          if (need_so)
                //nul byte overflow
        memcpy (cp, ".so", sizeof (".so"));

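cp points into heap memory, so this copy may write the four bytes 0x6f732e00, i.e. ".so" (sizeof(".so") == 4, including the terminating NUL), past the end of the buffer.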


This corrupts memory. Proof:


$ CHARSET=//ABCDE pkexec 
*** Error in `pkexec': malloc(): memory corruption: 0x00007f15bc0732d0 ***
*** Error in `pkexec': malloc(): memory corruption: 0x00007f15bc0732d0 ***




Bypassing ASLR?

Reportedly, on 32-bit Fedora you can simply do this:

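  struct rlimit rlim;  /* <sys/resource.h> */
  rlim.rlim_cur = rlim.rlim_max = RLIM_INFINITY;
  setrlimit(RLIMIT_STACK, &rlim);  /* unlimited stack: legacy, non-randomized 32-bit mmap layout */
  rlim.rlim_cur = rlim.rlim_max = 1;
  setrlimit(RLIMIT_DATA, &rlim);   /* brk() fails, so malloc falls back to mmap() */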

Afterwards, the program is always loaded at a fixed base address:

40000000-40005000 r-xp 00000000 fd:01 9909        /usr/bin/pkexec
406b9000-407bb000 rw-p 00000000 00:00 0           /* mmap() heap */
bfce5000-bfd06000 rw-p 00000000 00:00 0           [stack]


What good is copying four fixed bytes past the end?

malloc chunks are laid out contiguously in the heap, like |m| blah1 |m| blah2 |m| blah3, where m is each chunk's metadata.

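Copying four bytes can overwrite the metadata of the next chunk. The metadata is the chunk's size, and its lowest bit is a flag: in glibc it is PREV_INUSE, where 0x1 means the chunk just before this header is in use and 0x0 means that chunk is free and may be merged when a neighbor is freed. The lowest-order byte of 0x6f732e00 is a NUL byte, so the overwrite clears the flag and marks the chunk in front of the overwritten header as free.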

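So if we can overflow blah2, overwrite the m in front of blah3, and then wait for blah3 to be freed, free() will follow the corrupted m to a location at an offset of m from &blah3 and try to unlink the free chunk it expects to find there. That unlink gives us one write to a chosen address.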


How do we find a suitable blah3?

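The target is pkexec, which is setuid root and can therefore be used for privilege escalation. When pkexec finds that the path passed to it does not exist, it prints an error message; the heap chunk holding that message is 508 bytes + 4 bytes of metadata.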

The error message buffer is allocated like this: first 100 bytes; if that is not enough, 100*2+100 = 300 bytes; if still not enough, 300*2+100 = 700 bytes.

In this case the allocation sequence is:

malloc(100), malloc(300), free(100), malloc(700), free(300), realloc(508)

The resulting memory layout:

| free space: 100 |m| free space: 300 |m| error message: 508 bytes |

Now set CHARSET=//AAAAA… to 236 bytes of A, and the allocation derived from it lands exactly in the 300-byte free space:

| blah |m| blah |m| charset derived value: 236 bytes |m: 0x00000201| error message: 508 bytes |


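m = 0x201 denotes a 512-byte chunk (0x200) with the PREV_INUSE bit set, i.e. the chunk in front of it is in use. This is the value the exploit later overwrites.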




What next? Heap spray.

| blah |m| blah |m| charset derived value: 236 bytes |m: 0x6f732e00| error message: 508 bytes |

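After the overwrite, the end of the memory described by m points to 0x406xxxxx + 0x6f732e00; after that addition the value already lies in kernel space and cannot be used.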

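If a heap spray can push heap memory up to 0x7xxxxxxx, then adding 0x6f732e00 ends up at an address of the form 0x5xxxxxxx, whose contents come from the spray and are therefore under our control.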


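Conveniently, pkexec has a command-line argument whose memory is never freed, which makes it usable for the heap spray. -u can be passed any number of times, and the exploit actually passes more than 15 million --user arguments...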

     else if (strcmp (argv[n], "--user") == 0 || strcmp (argv[n], "-u") == 0)
        {
          n++;
          if (n >= (guint) argc)
            {
              usage (argc, argv);
              goto out;
            }

          opt_user = g_strdup (argv[n]);  /* previous copy is never freed: each --user leaks one allocation */
        }



Finally:

Use the list-unlink operation to write one address, storing a value into tls_dtor_list / __exit_funcs to take control of the execution flow.










[1] http://googleprojectzero.blogspot.tw/2014/08/the-poisoned-nul-byte-2014-edition.html

[2] https://github.com/lattera/glibc/blob/a2f34833b1042d5d8eeb263b4cf4caaea138c4ad/iconv/gconv_trans.c

[3] https://code.google.com/p/google-security-research/issues/detail?id=96