容器內的Linux診斷工具0x.tools

扣釘日記發表於2022-05-07

原創:扣釘日記(微信公眾號ID:codelogs),歡迎分享,轉載請保留出處。

簡介

Linux上有大量的問題診斷工具,如perf、bcc等,但這些診斷工具,雖然功能強大,但卻需要很高的許可權才可以使用。

而0x.tools這個工具提供了一個很好的思路,通過取樣/proc目錄來診斷問題,對被測量程式幾乎無效能影響,且只要與目標程式擁有同等級的許可權,即可正常使用。

不要小看這個許可權區別,在網際網路大廠,開發同學一般只能獲取到一個受限於容器內的shell環境,想要獲取機器的root許可權幾乎是不可能的。

安裝

# 下載原始碼
$ git clone https://github.com/tanelpoder/0xtools.git

# 安裝編譯器
$ yum install -y make gcc

# 編譯並安裝程式
$ make && make install

實際上0x.tools裡的工具大多數是指令碼,如psn工具是python指令碼,因此直接將程式碼clone下來,然後執行bin/psn也是可以的。

psn工具

psn工具用來觀測系統中當前活躍的執行緒正在做什麼,如執行緒在做什麼系統呼叫、寫什麼檔案、阻塞在哪個核心函式下?

檢視活躍執行緒

[tanel@linux01 ~]$ psn

Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat for 5 seconds... finished.


=== Active Threads ================================================

 samples | avg_threads | comm             | state                  
-------------------------------------------------------------------
   10628 |     3542.67 | (kworker/*:*)    | Disk (Uninterruptible) 
      37 |       12.33 | (oracle_*_l)     | Running (ON CPU)       
      17 |        5.67 | (oracle_*_l)     | Disk (Uninterruptible) 
       2 |        0.67 | (xcapture)       | Running (ON CPU)       
       1 |        0.33 | (ora_lg*_xe)     | Disk (Uninterruptible) 
       1 |        0.33 | (ora_lgwr_lin*)  | Disk (Uninterruptible) 
       1 |        0.33 | (ora_lgwr_lin*c) | Disk (Uninterruptible) 


samples: 3 (expected: 100)
total processes: 10470, threads: 11530
runtime: 6.13, measure time: 6.03

如上,預設情況下,psn取樣/proc目錄下每個執行緒的/proc/$pid/stat檔案,取樣5秒鐘,將R(正在執行)或D(不可中斷休眠)狀態的執行緒的資料記錄下來,並做彙總。

由於R或D狀態的執行緒都是活躍執行緒,被取樣到的次數越多,則越說明這些執行緒執行得更慢或更頻繁。

檢視執行緒讀寫檔案

[tanel@linux01 ~]$ sudo psn -G syscall,filenamesum

Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/syscall, stat for 5 seconds... finished.


=== Active Threads =======================================================================================================

 samples | avg_threads | comm            | state                  | syscall         | filenamesum                         
--------------------------------------------------------------------------------------------------------------------------
    2027 |      506.75 | (kworker/*:*)   | Disk (Uninterruptible) | [kernel_thread] |                                     
    1963 |      490.75 | (oracle_*_l)    | Disk (Uninterruptible) | pread64         | /data/oracle/LIN*C/soe_bigfile.dbf
      87 |       21.75 | (oracle_*_l)    | Running (ON CPU)       | [running]       |                                     
      13 |        3.25 | (kworker/*:*)   | Running (ON CPU)       | [running]       |                                     
       4 |        1.00 | (oracle_*_l)    | Running (ON CPU)       | read            | socket:[*]                          
       2 |        0.50 | (collectl)      | Running (ON CPU)       | [running]       |                                     
       1 |        0.25 | (java)          | Running (ON CPU)       | futex           |                                     
       1 |        0.25 | (ora_ckpt_xe)   | Disk (Uninterruptible) | pread64         | /data/oracle/XE/control*.ctl        
       1 |        0.25 | (ora_m*_linprd) | Running (ON CPU)       | [running]       |                                     
       1 |        0.25 | (ora_m*_lintes) | Running (ON CPU)       | [running]       |                                     

通過-G可以指定需要檢視的列,syscall表示執行緒正在執行的系統呼叫,filenamesum表示正在讀寫的檔案,一般來說,執行緒處於D狀態時在做檔案io操作,如果D狀態執行緒頻繁出現,那麼我們肯定想知道執行緒正在讀寫哪個檔案。

檢視執行緒的核心棧

[tanel@linux01 ~]$ sudo psn -p -G syscall,wchan,kstack

Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/wchan, stack, syscall, stat for 5 seconds... finished.


=== Active Threads =======================================================================================================================================================================================================================================================================================================================================================================================

 samples | avg_threads | comm          | state                  | syscall         | wchan                        | kstack                                                                                                                                                                                                                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     281 |      140.50 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] | blkdev_issue_flush           | ret_from_fork_nospec_begin()->kthread()->worker_thread()->process_one_work()->dio_aio_complete_work()->dio_complete()->generic_write_sync()->xfs_file_fsync()->xfs_blkdev_issue_flush()->blkdev_issue_flush()                                                                          
     211 |      105.50 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] | call_rwsem_down_read_failed  | ret_from_fork_nospec_begin()->kthread()->worker_thread()->process_one_work()->dio_aio_complete_work()->dio_complete()->generic_write_sync()->xfs_file_fsync()->xfs_ilock()->call_rwsem_down_read_failed()                                                                            
     169 |       84.50 | (oracle_*_li) | Disk (Uninterruptible) | pread64         | call_rwsem_down_write_failed | system_call_fastpath()->SyS_pread64()->vfs_read()->do_sync_read()->xfs_file_aio_read()->xfs_file_dio_aio_read()->touch_atime()->update_time()->xfs_vn_update_time()->xfs_ilock()->call_rwsem_down_write_failed()                                                                       
      64 |       32.00 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] | xfs_log_force_lsn            | ret_from_fork_nospec_begin()->kthread()->worker_thread()->process_one_work()->dio_aio_complete_work()->dio_complete()->generic_write_sync()->xfs_file_fsync()->xfs_log_force_lsn()                                                                                                     
      24 |       12.00 | (oracle_*_li) | Disk (Uninterruptible) | pread64         | call_rwsem_down_read_failed  | system_call_fastpath()->SyS_pread64()->vfs_read()->do_sync_read()->xfs_file_aio_read()->xfs_file_dio_aio_read()->__blockdev_direct_IO()->do_blockdev_direct_IO()->xfs_get_blocks_direct()->__xfs_get_blocks()->xfs_ilock_data_map_shared()->xfs_ilock()->call_rwsem_down_read_failed() 
       5 |        2.50 | (oracle_*_li) | Disk (Uninterruptible) | pread64         | do_blockdev_direct_IO        | system_call_fastpath()->SyS_pread64()->vfs_read()->do_sync_read()->xfs_file_aio_read()->xfs_file_dio_aio_read()->__blockdev_direct_IO()->do_blockdev_direct_IO()                                                                                                                       
       3 |        1.50 | (oracle_*_li) | Running (ON CPU)       | [running]       | 0                            | system_call_fastpath()->SyS_pread64()->vfs_read()->do_sync_read()->xfs_file_aio_read()->xfs_file_dio_aio_read()->__blockdev_direct_IO()->do_blockdev_direct_IO()                                                                                                                       
       2 |        1.00 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] | call_rwsem_down_write_failed | ret_from_fork_nospec_begin()->kthread()->worker_thread()->process_one_work()->dio_aio_complete_work()->dio_complete()->xfs_end_io_direct_write()->xfs_iomap_write_unwritten()->xfs_ilock()->call_rwsem_down_write_failed()                                                             
       2 |        1.00 | (kworker/*:*) | Running (ON CPU)       | [running]       | 0                            | ret_from_fork_nospec_begin()->kthread()->worker_thread()->process_one_work()->dio_aio_complete_work()->dio_complete()->generic_write_sync()->xfs_file_fsync()->xfs_blkdev_issue_flush()->blkdev_issue_flush()                                                                          
       2 |        1.00 | (oracle_*_li) | Disk (Uninterruptible) | io_submit       | call_rwsem_down_write_failed | system_call_fastpath()->SyS_io_submit()->do_io_submit()->xfs_file_aio_read()->xfs_file_dio_aio_read()->touch_atime()->update_time()->xfs_vn_update_time()->xfs_ilock()->call_rwsem_down_write_failed()                                                                                 
       1 |        0.50 | (java)        | Running (ON CPU)       | futex           | futex_wait_queue_me          | system_call_fastpath()->SyS_futex()->do_futex()->futex_wait()->futex_wait_queue_me()                                                                                                                                                                                                   
       1 |        0.50 | (ksoftirqd/*) | Running (ON CPU)       | [running]       | 0                            | ret_from_fork_nospec_begin()->kthread()->smpboot_thread_fn()                                                                                                                                                                                                                           
       1 |        0.50 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] | worker_thread                | ret_from_fork_nospec_begin()->kthread()->worker_thread()                                                                                                                                                                                                                               
       1 |        0.50 | (kworker/*:*) | Disk (Uninterruptible) | [kernel_thread] | worker_thread                | ret_from_fork_nospec_begin()->kthread()->worker_thread()->process_one_work()->dio_aio_complete_work()->dio_complete()->generic_write_sync()->xfs_file_fsync()->xfs_blkdev_issue_flush()->blkdev_issue_flush()                                                                          
       1 |        0.50 | (ora_lg*_xe)  | Disk (Uninterruptible) | io_submit       | inode_dio_wait               | system_call_fastpath()->SyS_io_submit()->do_io_submit()->xfs_file_aio_write()->xfs_file_dio_aio_write()->inode_dio_wait()                                                                                                                                                              
       1 |        0.50 | (oracle_*_li) | Disk (Uninterruptible) | [running]       | 0                            | -                                                                                                                                                                                                                                                                                      

同理,通過wchan欄位可以檢視執行緒阻塞在什麼核心方法上,而kstack欄位則可以檢視執行緒阻塞時的核心呼叫棧是什麼。

psn的原理

其實psn和ps命令一樣,是通過遍歷/proc目錄來獲取執行緒資訊的,如下:
state:取自/proc/$pid/stat檔案。
syscall:取自/proc/$pid/syscall檔案。
wchan:取自/proc/$pid/wchan檔案。
kstack:取自/proc/$pid/stack檔案。
與perf、bcc等工具的區別是,讀取這些檔案只需要與程式同等級的許可權即可,不需要使用root賬號。

其它工具

除了psn外,0x.tools裡面還有一些其它工具,如xcapture、schedlat等,這裡就不一一介紹了,感興趣可以訪問 https://0x.tools/ 檢視。

另外,由於psn是通過遍歷/proc目錄實現的,因此我們也可自己編寫指令碼來實現同樣的功能,如下:

active_thread_kstack(){
  # 列印當前系統活躍java執行緒的核心棧
  ps h -Lo pid,tid,s,pcpu,comm,wchan:32,min_flt,maj_flt -C java|grep '[RD] '| awk '
    BEGIN{
        syscall_files["/usr/include/asm/unistd_64.h"]=1;
        syscall_files["/usr/include/x86_64-linux-gnu/asm/unistd_64.h"]=1;
        syscall_files["/usr/include/asm-x86_64/unistd.h"]=1;
        for(tfile in syscall_files){
            cmd="test -f "tfile
            if(system(cmd)==0){
                hfile=tfile;
                break;
            }
        }
        if(hfile){
            while (getline <hfile){
                if($0 ~ /__NR_/){
                    syscall_map[$3]=gensub(/__NR_/,"","g",$2)
                }
            }
        }else{
            syscall_str="0:read 1:write 2:open 3:close 4:stat 5:fstat 6:lstat 7:poll 8:lseek 9:mmap 10:mprotect 11:munmap 12:brk 13:rt_sigaction 14:rt_sigprocmask 15:rt_sigreturn 16:ioctl 17:pread64 18:pwrite64 19:readv 20:writev 21:access 22:pipe 23:select 24:sched_yield 25:mremap 26:msync 27:mincore 28:madvise 29:shmget 30:shmat 31:shmctl 32:dup 33:dup2 34:pause 35:nanosleep 36:getitimer 37:alarm 38:setitimer 39:getpid 40:sendfile 41:socket 42:connect 43:accept 44:sendto 45:recvfrom 46:sendmsg 47:recvmsg 48:shutdown 49:bind 50:listen 51:getsockname 52:getpeername 53:socketpair 54:setsockopt 55:getsockopt 56:clone 57:fork 58:vfork 59:execve 60:exit 61:wait4 62:kill 63:uname 64:semget 65:semop 66:semctl 67:shmdt 68:msgget 69:msgsnd 70:msgrcv 71:msgctl 72:fcntl 73:flock 74:fsync 75:fdatasync 76:truncate 77:ftruncate 78:getdents 79:getcwd 80:chdir 81:fchdir 82:rename 83:mkdir 84:rmdir 85:creat 86:link 87:unlink 88:symlink 89:readlink 90:chmod 91:fchmod 92:chown 93:fchown 94:lchown 95:umask 96:gettimeofday 97:getrlimit 98:getrusage 99:sysinfo 100:times 101:ptrace 102:getuid 103:syslog 104:getgid 105:setuid 106:setgid 107:geteuid 108:getegid 109:setpgid 110:getppid 111:getpgrp 112:setsid 113:setreuid 114:setregid 115:getgroups 116:setgroups 117:setresuid 118:getresuid 119:setresgid 120:getresgid 121:getpgid 122:setfsuid 123:setfsgid 124:getsid 125:capget 126:capset 127:rt_sigpending 128:rt_sigtimedwait 129:rt_sigqueueinfo 130:rt_sigsuspend 131:sigaltstack 132:utime 133:mknod 134:uselib 135:personality 136:ustat 137:statfs 138:fstatfs 139:sysfs 140:getpriority 141:setpriority 142:sched_setparam 143:sched_getparam 144:sched_setscheduler 145:sched_getscheduler 146:sched_get_priority_max 147:sched_get_priority_min 148:sched_rr_get_interval 149:mlock 150:munlock 151:mlockall 152:munlockall 153:vhangup 154:modify_ldt 155:pivot_root 156:_sysctl 157:prctl 158:arch_prctl 159:adjtimex 160:setrlimit 161:chroot 162:sync 163:acct 164:settimeofday 165:mount 166:umount2 167:swapon 168:swapoff 169:reboot 170:sethostname 171:setdomainname 172:iopl 173:ioperm 174:create_module 175:init_module 176:delete_module 177:get_kernel_syms 178:query_module 179:quotactl 180:nfsservctl 181:getpmsg 182:putpmsg 183:afs_syscall 184:tuxcall 185:security 186:gettid 187:readahead 188:setxattr 189:lsetxattr 190:fsetxattr 191:getxattr 192:lgetxattr 193:fgetxattr 194:listxattr 195:llistxattr 196:flistxattr 197:removexattr 198:lremovexattr 199:fremovexattr 200:tkill 201:time 202:futex 203:sched_setaffinity 204:sched_getaffinity 205:set_thread_area 206:io_setup 207:io_destroy 208:io_getevents 209:io_submit 210:io_cancel 211:get_thread_area 212:lookup_dcookie 213:epoll_create 214:epoll_ctl_old 215:epoll_wait_old 216:remap_file_pages 217:getdents64 218:set_tid_address 219:restart_syscall 220:semtimedop 221:fadvise64 222:timer_create 223:timer_settime 224:timer_gettime 225:timer_getoverrun 226:timer_delete 227:clock_settime 228:clock_gettime 229:clock_getres 230:clock_nanosleep 231:exit_group 232:epoll_wait 233:epoll_ctl 234:tgkill 235:utimes 236:vserver 237:mbind 238:set_mempolicy 239:get_mempolicy 240:mq_open 241:mq_unlink 242:mq_timedsend 243:mq_timedreceive 244:mq_notify 245:mq_getsetattr 246:kexec_load 247:waitid 248:add_key 249:request_key 250:keyctl 251:ioprio_set 252:ioprio_get 253:inotify_init 254:inotify_add_watch 255:inotify_rm_watch 256:migrate_pages 257:openat 258:mkdirat 259:mknodat 260:fchownat 261:futimesat 262:newfstatat 263:unlinkat 264:renameat 265:linkat 266:symlinkat 267:readlinkat 268:fchmodat 269:faccessat 270:pselect6 271:ppoll 272:unshare 273:set_robust_list 274:get_robust_list 275:splice 276:tee 277:sync_file_range 278:vmsplice 279:move_pages 280:utimensat 281:epoll_pwait 282:signalfd 283:timerfd_create 284:eventfd 285:fallocate 286:timerfd_settime 287:timerfd_gettime 288:accept4 289:signalfd4 290:eventfd2 291:epoll_create1 292:dup3 293:pipe2 294:inotify_init1 295:preadv 296:pwritev 297:rt_tgsigqueueinfo 298:perf_event_open 299:recvmmsg 300:fanotify_init 301:fanotify_mark 302:prlimit64 303:name_to_handle_at 304:open_by_handle_at 305:clock_adjtime 306:syncfs 307:sendmmsg 308:setns 309:getcpu 310:process_vm_readv 311:process_vm_writev 312:kcmp 313:finit_module 314:sched_setattr 315:sched_getattr 316:renameat2 317:seccomp 318:getrandom 319:memfd_create 320:kexec_file_load 321:bpf 322:execveat 323:userfaultfd 324:membarrier 325:mlock2 326:copy_file_range 327:preadv2 328:pwritev2 329:pkey_mprotect 330:pkey_alloc 331:pkey_free 332:statx 333:io_pgetevents 334:rseq 424:pidfd_send_signal 425:io_uring_setup 426:io_uring_enter 427:io_uring_register 428:open_tree 429:move_mount 430:fsopen 431:fsconfig 432:fsmount 433:fspick 434:pidfd_open 435:clone3"
            split(syscall_str, syscall_arr, " ")
            for(i in syscall_arr){
                split(syscall_arr[i], idname_arr, ":")
                syscall_map[idname_arr[1]]=idname_arr[2]
            }
        }

        syscall_with_fd_map["read"]=1;
        syscall_with_fd_map["write"]=1;
        syscall_with_fd_map["pread64"]=1;
        syscall_with_fd_map["pwrite64"]=1;
        syscall_with_fd_map["fsync"]=1;
        syscall_with_fd_map["fdatasync"]=1;
        syscall_with_fd_map["recvfrom"]=1;
        syscall_with_fd_map["sendto"]=1;
        syscall_with_fd_map["recvmsg"]=1;
        syscall_with_fd_map["sendmsg"]=1;
        syscall_with_fd_map["epoll_wait"]=1;
        syscall_with_fd_map["ioctl"]=1;
        syscall_with_fd_map["accept"]=1;
        syscall_with_fd_map["accept4"]=1;
        special_fd_map["0"]="(stdin)";
        special_fd_map["1"]="(stdout)";
        special_fd_map["2"]="(stderr)";
    }
    {
        RS="^$";
        getline wchan <("/proc/"$1"/task/"$2"/wchan");
        close("/proc/"$1"/task/"$2"/wchan");
        getline stack <("/proc/"$1"/task/"$2"/stack");
        close("/proc/"$1"/task/"$2"/stack");
        getline syscall <("/proc/"$1"/task/"$2"/syscall");
        close("/proc/"$1"/task/"$2"/syscall");
        split(syscall, syscall_arr, /\s+/);
        syscall_id=syscall_arr[1]
        syscall_name=syscall_map[syscall_id];
        if(syscall_name in syscall_with_fd_map){
            fd=strtonum(syscall_arr[2])
            cmd="readlink /proc/"$1"/fd/"fd;
            cmd|getline filename;
            close(cmd);
            if(fd in special_fd_map){
                filename=filename special_fd_map[syscall_id]
            }
        }
        printf "pid:%s,tid:%s,stat:%s,pcpu:%s,comm:%s,wchan:%s,min_flt:%s,maj_flt:%s,syscall:%s,filename:%s\n",$1,$2,$3,$4,$5,wchan,$7,$8,syscall_name,filename;
        print stack;
        RS="\n"
    }'
}

這樣,我們不用安裝0x.tools,就也能得到類似於psn命令的功能了!

往期內容

Linux命令拾遺-入門篇
Linux命令拾遺-文字處理篇
Linux命令拾遺-軟體資源觀測
mysql的timestamp會存在時區問題?
真正理解可重複讀事務隔離級別
字元編碼解惑

相關文章