Linux System Call By INT 0x80, Linux Kernel Debug Based On Source Code

Andrew.Hann, published 2014-07-28

Contents

1. Introduction to System Calls
2. Tracing and Debugging System Calls
3. Kernel Source Analysis of System Calls

 

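1. Introduction to System Calls

For the basic principles of system calls, see the other article linked below. The main goal of this article is to study, from the kernel source code, how system calls are implemented at the lowest level of the kernel.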

Relevant Link:

http://www.cnblogs.com/LittleHann/p/3850653.html
http://www.kerneltravel.net/journal/iv/syscall.htm
http://zjuedward.blog.51cto.com/1445231/465997
http://www.ibm.com/developerworks/cn/linux/l-system-calls/
http://www.cnblogs.com/LittleHann/p/3850655.html

 

2. Tracing and Debugging System Calls

With the help of a kernel debugging tool, we can trace the execution path dynamically.
code:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    /* getpid() is the libc wrapper around the getpid system call */
    long id = (long)getpid();

    printf("getpid()=%ld\n", id);
    return 0;
}

Compile:

gcc sys.c -o sys

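kdb debugging

1. Activate KDB (press the Pause key; the kernel must already have the KDB patch applied)
2. Set a kernel breakpoint: "bp sys_getpid"
3. Leave kdb: "go"
4. Run ./sys
5. The kernel enters the debugger and execution stops at the sys_getpid breakpoint.
6. At the KDB> prompt, run bt to inspect the stack. The backtrace shows the nested call path: sys_getpid is called from the kernel function system_call.
7. At the KDB> prompt, run rd to view the register values. eax holds the getpid call number: 0x00000014 (=20).
8. At the KDB> prompt, run ssb (or ss) to step through the kernel code path. After sys_getpid finishes, control returns to system_call and then moves on to the ret_from_sys_call routine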

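The program's execution flow can be summarized in the following steps

1. The program calls the libc wrapper function getpid. The wrapper loads the system call number __NR_getpid (number 20) into the EAX register
2. The wrapper executes the software interrupt int 0x80
3. The CPU looks up vector 0x80 in the IDT to find the handler address and switches to kernel mode
4. The kernel first runs system_call, which uses the system call number to look up the matching service routine, sys_getpid, in the call table
5. The sys_getpid service routine runs
6. When it finishes, control moves to the ret_from_sys_call routine and the system call returns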
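The steps above can be tried from user space. The following is a minimal sketch, assuming Linux with glibc and GCC or Clang; the helper names raw_getpid and int80_getpid are hypothetical and do not come from libc or the kernel. raw_getpid bypasses the getpid() wrapper through the generic syscall() function. int80_getpid issues int $0x80 with the call number in EAX, but only when compiled for 32-bit x86; on any other target it falls back to raw_getpid, since int $0x80 is not guaranteed to be available there.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid */
#include <sys/types.h>
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: ask the kernel for the PID without the getpid() wrapper. */
static long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* Hypothetical helper: issue the call through int $0x80 on 32-bit x86. */
static long int80_getpid(void)
{
#if defined(__i386__)
    long ret;

    /* EAX carries the call number on entry and the return value on exit. */
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "0"((long)SYS_getpid)
                      : "memory");
    return ret;
#else
    /* Not 32-bit x86: fall back to the generic syscall() entry point. */
    return raw_getpid();
#endif
}
```

Both helpers should return the same value as getpid(). Note that SYS_getpid expands to 20 only on 32-bit x86; other architectures number their system calls differently (x86_64 uses 39), which is why the sketch uses the symbolic name instead of a literal.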

Relevant Link:

ftp://oss.sgi.com/projects/kdb/download/v4.4/
https://www.kernel.org/
http://www.cnblogs.com/shineshqw/articles/2359114.html
http://www.drdobbs.com/open-source/linux-kernel-debugging/184406318

 

3. 系統呼叫核心原始碼分析

Taking the small getpid() program from section 2 as the example again, let us analyze how system calls work in Linux from the source code.

linux-2.6.32.63\arch\x86\kernel

/*
Entry point for all system calls. The number of the requested system call is passed in EAX.
*/
ENTRY(system_call)
    RING0_INT_FRAME            # can't unwind into user space anyway
    /*
    Save orig_eax; this value is the "system call number" that was passed in
    */
    pushl %eax
    CFI_ADJUST_CFA_OFFSET 4
    /*
    The SAVE_ALL macro is defined as shown below.
    It pushes the values of all registers onto the stack; before system_call returns, RESTORE_ALL pops them back off the stack. In between, system_call can use the registers as it needs to. Any C function it calls can find the arguments it expects on the stack, because SAVE_ALL has already pushed every register value there.

    .macro SAVE_ALL
        cld
        PUSH_GS
        pushl %fs
        CFI_ADJUST_CFA_OFFSET 4
        pushl %es
        CFI_ADJUST_CFA_OFFSET 4
        pushl %ds
        CFI_ADJUST_CFA_OFFSET 4
        pushl %eax
        CFI_ADJUST_CFA_OFFSET 4
        CFI_REL_OFFSET eax, 0
        pushl %ebp
        CFI_ADJUST_CFA_OFFSET 4
        CFI_REL_OFFSET ebp, 0
        pushl %edi
        CFI_ADJUST_CFA_OFFSET 4
        CFI_REL_OFFSET edi, 0
        pushl %esi
        CFI_ADJUST_CFA_OFFSET 4
        CFI_REL_OFFSET esi, 0
        pushl %edx
        CFI_ADJUST_CFA_OFFSET 4
        CFI_REL_OFFSET edx, 0
        pushl %ecx
        CFI_ADJUST_CFA_OFFSET 4
        CFI_REL_OFFSET ecx, 0
        pushl %ebx
        CFI_ADJUST_CFA_OFFSET 4
        CFI_REL_OFFSET ebx, 0
        movl $(__USER_DS), %edx
        movl %edx, %ds
        movl %edx, %es
        movl $(__KERNEL_PERCPU), %edx
        movl %edx, %fs
        SET_KERNEL_GS %edx
    .endm
    */
    SAVE_ALL
    GET_THREAD_INFO(%ebp)
    /*
    Check whether the system call is being traced
    */
    testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%ebp)
    jnz syscall_trace_entry
    cmpl $(nr_syscalls), %eax
    jae syscall_badsys
syscall_call:
    /*
    Call the system call function.
    sys_call_table is a table of function pointers to the kernel functions that implement the individual system calls. It is defined in:
    linux-2.6.32.63\arch\x86\kernel\syscall_table_32.S

    ENTRY(sys_call_table)
        .long sys_restart_syscall    /* 0 - old "setup()" system call, used for restarting */
        .long sys_exit
        .long ptregs_fork
        .long sys_read
        .long sys_write
        .long sys_open        /* 5 */
        .long sys_close
        .long sys_waitpid
        .long sys_creat
        .long sys_link
        .long sys_unlink    /* 10 */
        .long ptregs_execve
        .long sys_chdir
        .long sys_time
        .long sys_mknod
        .long sys_chmod        /* 15 */
        .long sys_lchown16
        .long sys_ni_syscall    /* old break syscall holder */
        .long sys_stat
        .long sys_lseek
        .long sys_getpid    /* 20 */
        .long sys_mount
        .long sys_oldumount
        .long sys_setuid16
        .long sys_getuid16
        .long sys_stime        /* 25 */
        .long sys_ptrace
        .long sys_alarm
        .long sys_fstat
        .long sys_pause
        .long sys_utime        /* 30 */
        .long sys_ni_syscall    /* old stty syscall holder */
        .long sys_ni_syscall    /* old gtty syscall holder */
        .long sys_access
        .long sys_nice
        .long sys_ni_syscall    /* 35 - old ftime syscall holder */
        .long sys_sync
        .long sys_kill
        .long sys_rename
        .long sys_mkdir
        .long sys_rmdir        /* 40 */
        .long sys_dup
        .long sys_pipe
        .long sys_times
        .long sys_ni_syscall    /* old prof syscall holder */
        .long sys_brk        /* 45 */
        .long sys_setgid16
        .long sys_getgid16
        .long sys_signal
        .long sys_geteuid16
        .long sys_getegid16    /* 50 */
        .long sys_acct
        .long sys_umount    /* recycled never used phys() */
        .long sys_ni_syscall    /* old lock syscall holder */
        .long sys_ioctl
        .long sys_fcntl        /* 55 */
        .long sys_ni_syscall    /* old mpx syscall holder */
        .long sys_setpgid
        .long sys_ni_syscall    /* old ulimit syscall holder */
        .long sys_olduname
        .long sys_umask        /* 60 */
        .long sys_chroot
        .long sys_ustat
        .long sys_dup2
        .long sys_getppid
        .long sys_getpgrp    /* 65 */
        .long sys_setsid
        .long sys_sigaction
        .long sys_sgetmask
        .long sys_ssetmask
        .long sys_setreuid16    /* 70 */
        .long sys_setregid16
        .long sys_sigsuspend
        .long sys_sigpending
        .long sys_sethostname
        .long sys_setrlimit    /* 75 */
        .long sys_old_getrlimit
        .long sys_getrusage
        .long sys_gettimeofday
        .long sys_settimeofday
        .long sys_getgroups16    /* 80 */
        .long sys_setgroups16
        .long old_select
        .long sys_symlink
        .long sys_lstat
        .long sys_readlink    /* 85 */
        .long sys_uselib
        .long sys_swapon
        .long sys_reboot
        .long sys_old_readdir
        .long old_mmap        /* 90 */
        .long sys_munmap
        .long sys_truncate
        .long sys_ftruncate
        .long sys_fchmod
        .long sys_fchown16    /* 95 */
        .long sys_getpriority
        .long sys_setpriority
        .long sys_ni_syscall    /* old profil syscall holder */
        .long sys_statfs
        .long sys_fstatfs    /* 100 */
        .long sys_ioperm
        .long sys_socketcall
        .long sys_syslog
        .long sys_setitimer
        .long sys_getitimer    /* 105 */
        .long sys_newstat
        .long sys_newlstat
        .long sys_newfstat
        .long sys_uname
        .long ptregs_iopl    /* 110 */
        .long sys_vhangup
        .long sys_ni_syscall    /* old "idle" system call */
        .long ptregs_vm86old
        .long sys_wait4
        .long sys_swapoff    /* 115 */
        .long sys_sysinfo
        .long sys_ipc
        .long sys_fsync
        .long ptregs_sigreturn
        .long ptregs_clone    /* 120 */
        .long sys_setdomainname
        .long sys_newuname
        .long sys_modify_ldt
        .long sys_adjtimex
        .long sys_mprotect    /* 125 */
        .long sys_sigprocmask
        .long sys_ni_syscall    /* old "create_module" */
        .long sys_init_module
        .long sys_delete_module
        .long sys_ni_syscall    /* 130: old "get_kernel_syms" */
        .long sys_quotactl
        .long sys_getpgid
        .long sys_fchdir
        .long sys_bdflush
        .long sys_sysfs        /* 135 */
        .long sys_personality
        .long sys_ni_syscall    /* reserved for afs_syscall */
        .long sys_setfsuid16
        .long sys_setfsgid16
        .long sys_llseek    /* 140 */
        .long sys_getdents
        .long sys_select
        .long sys_flock
        .long sys_msync
        .long sys_readv        /* 145 */
        .long sys_writev
        .long sys_getsid
        .long sys_fdatasync
        .long sys_sysctl
        .long sys_mlock        /* 150 */
        .long sys_munlock
        .long sys_mlockall
        .long sys_munlockall
        .long sys_sched_setparam
        .long sys_sched_getparam    /* 155 */
        .long sys_sched_setscheduler
        .long sys_sched_getscheduler
        .long sys_sched_yield
        .long sys_sched_get_priority_max
        .long sys_sched_get_priority_min    /* 160 */
        .long sys_sched_rr_get_interval
        .long sys_nanosleep
        .long sys_mremap
        .long sys_setresuid16
        .long sys_getresuid16    /* 165 */
        .long ptregs_vm86
        .long sys_ni_syscall    /* Old sys_query_module */
        .long sys_poll
        .long sys_nfsservctl
        .long sys_setresgid16    /* 170 */
        .long sys_getresgid16
        .long sys_prctl
        .long ptregs_rt_sigreturn
        .long sys_rt_sigaction
        .long sys_rt_sigprocmask    /* 175 */
        .long sys_rt_sigpending
        .long sys_rt_sigtimedwait
        .long sys_rt_sigqueueinfo
        .long sys_rt_sigsuspend
        .long sys_pread64    /* 180 */
        .long sys_pwrite64
        .long sys_chown16
        .long sys_getcwd
        .long sys_capget
        .long sys_capset    /* 185 */
        .long ptregs_sigaltstack
        .long sys_sendfile
        .long sys_ni_syscall    /* reserved for streams1 */
        .long sys_ni_syscall    /* reserved for streams2 */
        .long ptregs_vfork    /* 190 */
        .long sys_getrlimit
        .long sys_mmap_pgoff
        .long sys_truncate64
        .long sys_ftruncate64
        .long sys_stat64    /* 195 */
        .long sys_lstat64
        .long sys_fstat64
        .long sys_lchown
        .long sys_getuid
        .long sys_getgid    /* 200 */
        .long sys_geteuid
        .long sys_getegid
        .long sys_setreuid
        .long sys_setregid
        .long sys_getgroups    /* 205 */
        .long sys_setgroups
        .long sys_fchown
        .long sys_setresuid
        .long sys_getresuid
        .long sys_setresgid    /* 210 */
        .long sys_getresgid
        .long sys_chown
        .long sys_setuid
        .long sys_setgid
        .long sys_setfsuid    /* 215 */
        .long sys_setfsgid
        .long sys_pivot_root
        .long sys_mincore
        .long sys_madvise
        .long sys_getdents64    /* 220 */
        .long sys_fcntl64
        .long sys_ni_syscall    /* reserved for TUX */
        .long sys_ni_syscall
        .long sys_gettid
        .long sys_readahead    /* 225 */
        .long sys_setxattr
        .long sys_lsetxattr
        .long sys_fsetxattr
        .long sys_getxattr
        .long sys_lgetxattr    /* 230 */
        .long sys_fgetxattr
        .long sys_listxattr
        .long sys_llistxattr
        .long sys_flistxattr
        .long sys_removexattr    /* 235 */
        .long sys_lremovexattr
        .long sys_fremovexattr
        .long sys_tkill
        .long sys_sendfile64
        .long sys_futex        /* 240 */
        .long sys_sched_setaffinity
        .long sys_sched_getaffinity
        .long sys_set_thread_area
        .long sys_get_thread_area
        .long sys_io_setup    /* 245 */
        .long sys_io_destroy
        .long sys_io_getevents
        .long sys_io_submit
        .long sys_io_cancel
        .long sys_fadvise64    /* 250 */
        .long sys_ni_syscall
        .long sys_exit_group
        .long sys_lookup_dcookie
        .long sys_epoll_create
        .long sys_epoll_ctl    /* 255 */
        .long sys_epoll_wait
        .long sys_remap_file_pages
        .long sys_set_tid_address
        .long sys_timer_create
        .long sys_timer_settime        /* 260 */
        .long sys_timer_gettime
        .long sys_timer_getoverrun
        .long sys_timer_delete
        .long sys_clock_settime
        .long sys_clock_gettime        /* 265 */
        .long sys_clock_getres
        .long sys_clock_nanosleep
        .long sys_statfs64
        .long sys_fstatfs64
        .long sys_tgkill    /* 270 */
        .long sys_utimes
        .long sys_fadvise64_64
        .long sys_ni_syscall    /* sys_vserver */
        .long sys_mbind
        .long sys_get_mempolicy
        .long sys_set_mempolicy
        .long sys_mq_open
        .long sys_mq_unlink
        .long sys_mq_timedsend
        .long sys_mq_timedreceive    /* 280 */
        .long sys_mq_notify
        .long sys_mq_getsetattr
        .long sys_kexec_load
        .long sys_waitid
        .long sys_ni_syscall        /* 285 */ /* available */
        .long sys_add_key
        .long sys_request_key
        .long sys_keyctl
        .long sys_ioprio_set
        .long sys_ioprio_get        /* 290 */
        .long sys_inotify_init
        .long sys_inotify_add_watch
        .long sys_inotify_rm_watch
        .long sys_migrate_pages
        .long sys_openat        /* 295 */
        .long sys_mkdirat
        .long sys_mknodat
        .long sys_fchownat
        .long sys_futimesat
        .long sys_fstatat64        /* 300 */
        .long sys_unlinkat
        .long sys_renameat
        .long sys_linkat
        .long sys_symlinkat
        .long sys_readlinkat        /* 305 */
        .long sys_fchmodat
        .long sys_faccessat
        .long sys_pselect6
        .long sys_ppoll
        .long sys_unshare        /* 310 */
        .long sys_set_robust_list
        .long sys_get_robust_list
        .long sys_splice
        .long sys_sync_file_range
        .long sys_tee            /* 315 */
        .long sys_vmsplice
        .long sys_move_pages
        .long sys_getcpu
        .long sys_epoll_pwait
        .long sys_utimensat        /* 320 */
        .long sys_signalfd
        .long sys_timerfd_create
        .long sys_eventfd
        .long sys_fallocate
        .long sys_timerfd_settime    /* 325 */
        .long sys_timerfd_gettime
        .long sys_signalfd4
        .long sys_eventfd2
        .long sys_epoll_create1
        .long sys_dup3            /* 330 */
        .long sys_pipe2
        .long sys_inotify_init1
        .long sys_preadv
        .long sys_pwritev
        .long sys_rt_tgsigqueueinfo    /* 335 */
        .long sys_perf_event_open

    The operand sys_call_table(,%eax,4) has three parts:
    1) The base register. It is left empty here, so it counts as 0, but it is added to the displacement sys_call_table. Put simply, sys_call_table serves as the base address of the array
    2) The array index (the system call number, in EAX)
    3) The scale (the number of bytes in each array element)
    So this line of code can be read as:
    call sys_call_table[%eax]();
    */
    call *sys_call_table(,%eax,4)
    /*
    Return from the system call.
    The return value in the EAX register (which is also the return value of system_call) is saved. It is stored in the EAX slot on the stack so that RESTORE_ALL can quickly restore the real EAX register along with the other registers
    */
    movl %eax,PT_EAX(%esp)
syscall_exit:
    LOCKDEP_SYS_EXIT
    DISABLE_INTERRUPTS(CLBR_ANY)
    TRACE_IRQS_OFF
    movl TI_flags(%ebp), %ecx
    testl $_TIF_ALLWORK_MASK, %ecx
    /*
    Leave the system call
    */
    jne syscall_exit_work
    .....
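The dispatch logic in this listing (the bounds check, the indexed call through the table, and the store of the return value into the saved EAX slot) can be modeled in plain C. This is a user-space sketch under stated assumptions, not kernel code: every demo_ name, the reduced register frame and the made-up PID are hypothetical. Returning -ENOSYS for an out-of-range number reflects how syscall_badsys is understood to behave; that label's body is not part of the excerpt. Empty table slots are NULL here, whereas the real table fills them with sys_ni_syscall.

```c
#include <errno.h>    /* ENOSYS */
#include <stddef.h>   /* NULL, size_t */

/* Hypothetical, reduced stand-in for the register frame built by SAVE_ALL. */
struct demo_regs {
    long eax;        /* slot written by "movl %eax,PT_EAX(%esp)" */
    long orig_eax;   /* call number saved by the first "pushl %eax" */
};

typedef long (*demo_syscall_fn)(void);

/* Stand-in service routine; the PID value is made up. */
static long demo_getpid(void)
{
    return 1234;
}

/* Toy table: only slot 20 is filled, matching sys_getpid in the real table. */
#define DEMO_NR_SYSCALLS 21
static const demo_syscall_fn demo_call_table[DEMO_NR_SYSCALLS] = {
    [20] = demo_getpid,
};

/* Byte offset of entry nr: index times element size. The 32-bit kernel
 * uses a scale of 4; here the scale is sizeof(demo_syscall_fn). */
static size_t demo_entry_offset(size_t nr)
{
    return nr * sizeof(demo_syscall_fn);
}

/* Model of the path from syscall_call to the PT_EAX store. */
static void demo_system_call(struct demo_regs *regs)
{
    /* "cmpl $(nr_syscalls), %eax; jae syscall_badsys" is an unsigned
     * compare, so negative call numbers are rejected as well. */
    unsigned long nr = (unsigned long)regs->orig_eax;

    if (nr >= DEMO_NR_SYSCALLS || demo_call_table[nr] == NULL) {
        regs->eax = -ENOSYS;
        return;
    }

    /* "call *sys_call_table(,%eax,4)", then save the result in the frame. */
    regs->eax = demo_call_table[nr]();
}
```

Setting orig_eax to 20 and calling demo_system_call leaves the stand-in PID in the eax slot, while any other number in this toy table yields -ENOSYS. Keeping the result in the frame instead of returning it mirrors the listing, where RESTORE_ALL later reloads EAX from that slot.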

Relevant Link:

http://bbs.chinaunix.net/thread-2284015-1-1.html
http://wenku.baidu.com/view/f367a0ccda38376baf1fae0d.html
http://www.ibm.com/developerworks/cn/linux/l-system-calls/

 

Copyright (c) 2014 LittleHann All rights reserved

 
