從原始碼角度看traces.txt是如何生成的

CheapTalks發表於2019-03-04

簡介

原文請看我的部落格:從原始碼角度看traces.txt是如何生成的

traces.txt 位於安卓系統下/data/anr目錄下,當系統中有應用出現ANR時,framework會在彈出Dialog的同時dump出當前各執行緒的堆疊情況,方便開發者分析出ANR的root cause。

ANR是Application Not Responsing 的簡稱,簡而言之,就是安卓系統內建提示使用者應用介面沒有反應的機制,是用來避免應用介面一直卡頓,增加系統使用者友好度的一種方式。

造成ANR的原因很多,都是因為在主執行緒執行任務太久阻塞了介面更新導致的,主要有以下幾類:

  1. Broadcast Timeout: 前臺廣播執行超過10s, 後臺廣播執行超過60s (要注意的是,只有序列的廣播才有超時機制,並行的廣播並不會超時,也就是說,如果廣播是動態註冊的,直接呼叫sendBroadcast觸發,如果主執行緒Looper中排在後面的Message不會觸發超時機制,那麼即時這個廣播是前臺廣播,系統也永遠不會彈出框提示使用者超時了)
  2. Service Timeout: 前臺服務之星超過20s, 後臺服務之星超過200s
  3. Provider Timeout: 內容提供者,publish超過10s
  4. Input Timeout: 按鍵觸控事件分發超過5s

能夠造成ANR的前提是任務是在主執行緒上執行的,執行什麼樣的任務主要有以下幾點:

  1. 執行耗時任務過久,如檔案讀取或儲存,網路訪問獲取檔案太耗時
  2. 執行緒被阻塞過久,或者乾脆出現了死鎖
  3. 執行緒飢餓,如Binder執行緒總共16個,Binder主執行緒佔有一個,剩餘的15個工作執行緒都被佔滿
  4. CPU飢餓,負載值過大,雖然程式碼正常額但任務一直沒有來得及執行

那麼回到traces.txt檔案,它到底包含了什麼資訊可以幫助開發者找到ANR問題的根源呢。

在這篇文章,我先從一份traces.txt的日誌例項開始解析,然後再通過追蹤原始碼來解釋traces.txt是如何生成的。

我是基於Android 5.0原生程式碼進行分析的,最新的一些程式碼邏輯放到了AppErrors類中了,有些小差別,但是大體的邏輯和步驟沒有變化,不會影響閱讀。

日誌例項

// 觸發ANR的應用pid及觸發時間
----- pid 30307 at 2017-09-03 14:51:14 -----
Cmd line: com.example.androidyue.bitmapdemo

JNI: CheckJNI is off; workarounds are off; pins=0; globals=272

DALVIK THREADS:
(mutexes: tll=0 tsl=0 tscl=0 ghl=0)

// JAVA 態執行緒狀態

// 執行緒名稱; 優先順序; 執行緒id; 執行緒狀態
"main" prio=5 tid=1 TIMED_WAIT
  // 執行緒組; 執行緒suspend計數; 執行緒debug suspend計數; 同輩執行緒物件; 當前執行緒物件
  | group="main" sCount=1 dsCount=0 obj=0x416eaf18 self=0x416d8650
  // 執行緒id; 程式優先順序; 排程者優先順序; 排程者組名; 執行緒快取控制程式碼
  | sysTid=30307 nice=0 sched=0/0 cgrp=apps handle=1074565528
  // native執行緒狀態; 排程者狀態; 執行使用者程式碼的總時間; 執行系統程式碼的總時間; 任務cpu使用數量
  | state=S schedstat=( 0 0 0 ) utm=5 stm=4 core=3
  // 執行緒呼叫棧
  at java.lang.VMThread.sleep(Native Method)
  at java.lang.Thread.sleep(Thread.java:1044)
  at java.lang.Thread.sleep(Thread.java:1026)
  at com.example.androidyue.bitmapdemo.MainActivity$1.run(MainActivity.java:27)
  at android.app.Activity.runOnUiThread(Activity.java:4794)
  at com.example.androidyue.bitmapdemo.MainActivity.onResume(MainActivity.java:33)
  at android.app.Instrumentation.callActivityOnResume(Instrumentation.java:1282)
  at android.app.Activity.performResume(Activity.java:5405)

// Native態執行緒狀態

"Binder_5" prio=5 tid=32 NATIVE
  | group="main" sCount=1 dsCount=0 obj=0x41dd6570 self=0x678fa458
  | sysTid=29473 nice=0 sched=0/0 cgrp=apps handle=1732666432
  | state=S schedstat=( 578000 10600000 3 ) utm=0 stm=0 core=0
  #00  pc 00016fe4  /system/lib/libc.so (__ioctl+8)
  #01  pc 0002a97d  /system/lib/libc.so (ioctl+16)
  #02  pc 00016ba1  /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+132)
  #03  pc 00017363  /system/lib/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+154)
  #04  pc 0001b15d  /system/lib/libbinder.so
  #05  pc 00011267  /system/lib/libutils.so (android::Thread::_threadLoop(void*)+114)
  #06  pc 0004679f  /system/lib/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+66)
  #07  pc 00010dcd  /system/lib/libutils.so
  #08  pc 0000e3d8  /system/lib/libc.so (__thread_entry+72)
  #09  pc 0000dac4  /system/lib/libc.so (pthread_create+160)
  at dalvik.system.NativeStart.run(Native Method)複製程式碼

關鍵詞對照

關鍵詞 解釋
prio 優先順序
tid 執行緒id
group 組名
sCount 掛起次數
dsCount 除錯下 掛起計數
obj 同輩執行緒物件
self 當前執行緒物件
sysTid 執行緒id
nice 程式優先順序
sched 排程者優先順序
cgrp 排程者組名
handle 執行緒快取控制程式碼
state native執行緒狀態
schedstat 排程者狀態
utm 使用者態CPU時間
stm 核心態CPU時間
core 執行所在核

執行緒狀態對照

JAVA狀態 CPP狀態 解釋
TERMINATED ZOMBIE 執行緒死亡
RUNNABLE RUNNING/RUNNABLE 可執行或正在執行
TIMED_WAITING TIMED_WAIT 執行了帶有超時引數的wait, sleep或join函式
BLOCKED MONITOR 執行緒阻塞,等待獲取物件鎖
WAITING WAIT 執行了無超時引數的wait
NEW INITIALIZING 新建,正在初始化中
NEW STARTING 新建,正在啟動
RUNNABLE NATIVE 正在執行JNI本地函式
WAITING VMWAIT 正在等待VM資源
RUNNABLE SUSPENDED 執行緒暫停,通常由於GC或者DEBUG被掛起

原始碼分析 (虛擬機器)

我將從程式碼的最底層開始分析,再慢慢到觸發點的頂端,先從cpp開始

Thread::DumpState

thread.cc

void Thread::DumpState(std::ostream& os, const Thread* thread, pid_t tid) {
  // 組名
  std::string group_name;
  // 優先順序
  int priority;
  // 是否是daemon執行緒
  bool is_daemon = false;
  // 拿到當前執行緒物件指標
  Thread* self = Thread::Current();

  if (thread != nullptr) {
    ...
  }


  if (gAborting == 0 && self != nullptr && thread != nullptr && thread->tlsPtr_.opeer != nullptr) {
    ScopedObjectAccessUnchecked soa(self);
    priority = soa.DecodeField(WellKnownClasses::java_lang_Thread_priority)
        ->GetInt(thread->tlsPtr_.opeer);
    is_daemon = soa.DecodeField(WellKnownClasses::java_lang_Thread_daemon)
        ->GetBoolean(thread->tlsPtr_.opeer);

    ...

    if (thread_group != nullptr) {
      ...
      group_name = (group_name_string != nullptr) ? group_name_string->ToModifiedUtf8() : "<null>";
    }
  } else {
    priority = GetNativePriority();
  }

  std::string scheduler_group_name(GetSchedulerGroupName(tid));
  if (scheduler_group_name.empty()) {
    scheduler_group_name = "default";
  }

  // 這裡開始輸出執行緒狀態
  if (thread != nullptr) {
    // 執行緒名輸出
    os << `"` << *thread->tlsPtr_.name << `"`;
    if (is_daemon) {
      os << " daemon";
    }
    // 優先順序; ID; 狀態
    os << " prio=" << priority
       << " tid=" << thread->GetThreadId()
       << " " << thread->GetState();
    if (thread->IsStillStarting()) {
      os << " (still starting up)";
    }
    os << "
";
  } else {
    os << `"` << ::art::GetThreadName(tid) << `"`
       << " prio=" << priority
       << " (not attached)
";
  }

  // 執行緒組; 執行緒suspend計數; 執行緒debug suspend計數; 同輩執行緒物件; 當前執行緒物件
  if (thread != nullptr) {
    MutexLock mu(self, *Locks::thread_suspend_count_lock_);
    os << "  | group="" << group_name << """
       << " sCount=" << thread->tls32_.suspend_count
       << " dsCount=" << thread->tls32_.debug_suspend_count
       << " obj=" << reinterpret_cast<void*>(thread->tlsPtr_.opeer)
       << " self=" << reinterpret_cast<const void*>(thread) << "
";
  }

  // 執行緒id; 程式優先順序; 排程者組名
  os << "  | sysTid=" << tid
     << " nice=" << getpriority(PRIO_PROCESS, tid)
     << " cgrp=" << scheduler_group_name;

  // 排程者優先順序; 快取控制程式碼
  if (thread != nullptr) {
    int policy;
    sched_param sp;
    CHECK_PTHREAD_CALL(pthread_getschedparam, (thread->tlsPtr_.pthread_self, &policy, &sp),
                       __FUNCTION__);
    os << " sched=" << policy << "/" << sp.sched_priority
       << " handle=" << reinterpret_cast<void*>(thread->tlsPtr_.pthread_self);
  }
  os << "
";

  // Grab the scheduler stats for this thread.
  std::string scheduler_stats;
  if (ReadFileToString(StringPrintf("/proc/self/task/%d/schedstat", tid), &scheduler_stats)) {
    scheduler_stats.resize(scheduler_stats.size() - 1);  // Lose the trailing `
`.
  } else {
    scheduler_stats = "0 0 0";
  }

  char native_thread_state = `?`;
  int utime = 0;
  int stime = 0;
  int task_cpu = 0;
  GetTaskStats(tid, &native_thread_state, &utime, &stime, &task_cpu);

  // native執行緒狀態; 排程者狀態; 執行使用者程式碼的總時間; 執行系統程式碼的總時間; 任務cpu使用數量
  os << "  | state=" << native_thread_state
     << " schedstat=( " << scheduler_stats << " )"
     << " utm=" << utime
     << " stm=" << stime
     << " core=" << task_cpu
     << " HZ=" << sysconf(_SC_CLK_TCK) << "
";
  if (thread != nullptr) {
    // dump 執行緒當前棧
    os << "  | stack=" << reinterpret_cast<void*>(thread->tlsPtr_.stack_begin) << "-"
        << reinterpret_cast<void*>(thread->tlsPtr_.stack_end) << " stackSize="
        << PrettySize(thread->tlsPtr_.stack_size) << "
";
    // dump 執行緒鎖資訊
    os << "  | held mutexes=";
    for (size_t i = 0; i < kLockLevelCount; ++i) {
      if (i != kMonitorLock) {
        BaseMutex* mutex = thread->GetHeldMutex(static_cast<LockLevel>(i));
        if (mutex != nullptr) {
          os << " "" << mutex->GetName() << """;
          if (mutex->IsReaderWriterMutex()) {
            ReaderWriterMutex* rw_mutex = down_cast<ReaderWriterMutex*>(mutex);
            if (rw_mutex->GetExclusiveOwnerTid() == static_cast<uint64_t>(tid)) {
              os << "(exclusive held)";
            } else {
              os << "(shared held)";
            }
          }
        }
      }
    }
    os << "
";
  }
}複製程式碼

GetTaskStats

utils.cc

void GetTaskStats(pid_t tid, char* state, int* utime, int* stime, int* task_cpu) {
  *utime = *stime = *task_cpu = 0;
  std::string stats;
  // 從stat節點獲取資料
  if (!ReadFileToString(StringPrintf("/proc/self/task/%d/stat", tid), &stats)) {
    return;
  }
  // Skip the command, which may contain spaces.
  stats = stats.substr(stats.find(`)`) + 2);
  // Extract the three fields we care about.
  std::vector<std::string> fields;
  Split(stats, ` `, &fields);
  *state = fields[0][0];
  *utime = strtoull(fields[11].c_str(), nullptr, 10);
  *stime = strtoull(fields[12].c_str(), nullptr, 10);
  *task_cpu = strtoull(fields[36].c_str(), nullptr, 10);
}複製程式碼

DumpUnattachedThread

thread_list.cc

// 讀取該執行緒中的所有狀態資訊:JAVA態,Native態,Kernel態
static void DumpUnattachedThread(std::ostream& os, pid_t tid) NO_THREAD_SAFETY_ANALYSIS {
  // [接上]
  Thread::DumpState(os, nullptr, tid);
  DumpKernelStack(os, tid, "  kernel: ", false);

  if (false) {
    DumpNativeStack(os, tid, "  native: ");
  }
  os << "
";
}

void ThreadList::DumpUnattachedThreads(std::ostream& os) {
  DIR* d = opendir("/proc/self/task");
  if (!d) {
    return;
  }

  Thread* self = Thread::Current();
  dirent* e;
  // 獲取當前程式中所有執行緒,dump這些執行緒的資訊
  while ((e = readdir(d)) != nullptr) {
    char* end;
    pid_t tid = strtol(e->d_name, &end, 10);
    if (!*end) {
      bool contains;
      {
        MutexLock mu(self, *Locks::thread_list_lock_);
        contains = Contains(tid);
      }
      if (!contains) {
        // [接上]
        DumpUnattachedThread(os, tid);
      }
    }
  }
  closedir(d);
}

bool ThreadList::Contains(Thread* thread) {
  return find(list_.begin(), list_.end(), thread) != list_.end();
}複製程式碼

DumpForSigQuit

thread_list.cc

void ThreadList::DumpForSigQuit(std::ostream& os) {
  ...
  DumpUnattachedThreads(os);
}複製程式碼

runtime.cc

void Runtime::DumpForSigQuit(std::ostream& os) {
  GetClassLinker()->DumpForSigQuit(os);
  GetInternTable()->DumpForSigQuit(os);
  GetJavaVM()->DumpForSigQuit(os);
  GetHeap()->DumpForSigQuit(os);
  TrackedAllocators::Dump(os);
  os << "
";

  // [接上]
  // dump 執行緒列表
  thread_list_->DumpForSigQuit(os);
  BaseMutex::DumpAll(os);
}複製程式碼

HandleSigQuit

singal_catcher.cc

void SignalCatcher::HandleSigQuit() {
  Runtime* runtime = Runtime::Current();
  std::ostringstream os;
  // 輸出程式id以及當前dump時間
  os << "
"
      << "----- pid " << getpid() << " at " << GetIsoDate() << " -----
";

  DumpCmdLine(os);

  // 輸出系統資訊
  std::string fingerprint = runtime->GetFingerprint();
  os << "Build fingerprint: `" << (fingerprint.empty() ? "unknown" : fingerprint) << "`
";
  os << "ABI: `" << GetInstructionSetString(runtime->GetInstructionSet()) << "`
";

  os << "Build type: " << (kIsDebugBuild ? "debug" : "optimized") << "
";

  // [接上]
  runtime->DumpForSigQuit(os);

  ...
  os << "----- end " << getpid() << " -----
";
  Output(os.str());
}複製程式碼

SignalCatcher::Run

void* SignalCatcher::Run(void* arg) {
...
  // Set up mask with signals we want to handle.
  SignalSet signals;
  signals.Add(SIGQUIT);
  signals.Add(SIGUSR1);

  while (true) {
    // 等待訊號
    int signal_number = signal_catcher->WaitForSignal(self, signals);
...

    switch (signal_number) {
    case SIGQUIT:
      // [接上]
      signal_catcher->HandleSigQuit();
      break;
    case SIGUSR1:
      signal_catcher->HandleSigUsr1();
      break;
    default:
      LOG(ERROR) << "Unexpected signal %d" << signal_number;
      break;
    }
  }
}

int SignalCatcher::WaitForSignal(Thread* self, SignalSet& signals) {
...
  int signal_number = signals.Wait();
...
  return signal_number;
}複製程式碼

原始碼分析 (framework)

可見,android framework通過傳送訊號3來通知虛擬機器來進行dump到traces.txt檔案中的

dumpStackTraces

ActivityManagerService.java

private static void dumpStackTraces(String tracesPath, ArrayList<Integer> firstPids,
                                    ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) {
    // 當trace寫完後,監聽並觸發onEvent方法
    FileObserver observer = new FileObserver(tracesPath, FileObserver.CLOSE_WRITE) {
        @Override
        public synchronized void onEvent(int event, String path) {
            notify();
        }
    };

    try {
        // 開始監聽
        observer.startWatching();

        // 首先收集firstPids連結串列裡的程式資訊
        if (firstPids != null) {
            try {
                int num = firstPids.size();
                for (int i = 0; i < num; i++) {
                    synchronized (observer) {
                        // 傳送訊號3來觸發
                        Process.sendSignal(firstPids.get(i), Process.SIGNAL_QUIT);
                        observer.wait(200);  // Wait for write-close, give up after 200msec
                    }
                }
            } catch (InterruptedException e) {
                Slog.wtf(TAG, e);
            }
        }

        // 收集native程式資訊
        ...

        // 最後收集CPU資訊
        ...
    } finally {
        observer.stopWatching();
    }
}複製程式碼
public static File dumpStackTraces(boolean clearTraces, ArrayList<Integer> firstPids,
                                   ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) {
    // 獲取trace檔案路徑
    String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
...

    File tracesFile = new File(tracesPath);
    try {
...
        FileUtils.setPermissions(tracesDir.getPath(), 0775, -1, -1);  // drwxrwxr-x

        if (clearTraces && tracesFile.exists()) tracesFile.delete();
        tracesFile.createNewFile();
        FileUtils.setPermissions(tracesFile.getPath(), 0666, -1, -1); // -rw-rw-rw-
    } catch (IOException e) {
        Slog.w(TAG, "Unable to prepare ANR traces file: " + tracesPath, e);
        return null;
    }

    // [接上]
    dumpStackTraces(tracesPath, firstPids, processCpuTracker, lastPids, nativeProcs);
    return tracesFile;
}複製程式碼

appNotResponding

ActivityManagerService.java

final void appNotResponding(ProcessRecord app, ActivityRecord activity,
                            ActivityRecord parent, boolean aboveSystem, final String annotation) {
    ArrayList<Integer> firstPids = new ArrayList<Integer>(5);
    SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20);
...
    final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);

    // [接上]
    File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, lastPids,
            NATIVE_STACKS_OF_INTEREST);
...
    Slog.e(TAG, info.toString());
    if (tracesFile == null) {
        // There is no trace file, so dump (only) the alleged culprit`s threads to the log
        Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
    }

    addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
            cpuInfo, tracesFile, null);
...
}複製程式碼

觸發ANR的方式上面我已經列舉過了,這裡我以觸發Broadcast Timeout時進行分析

AppNotResponding.run

BroadcastQueue.java

private final class AppNotResponding implements Runnable {
    private final ProcessRecord mApp;
    private final String mAnnotation;

    public AppNotResponding(ProcessRecord app, String annotation) {
        mApp = app;
        mAnnotation = annotation;
    }

    @Override
    public void run() {
        // [接上]
        mService.appNotResponding(mApp, null, null, false, mAnnotation);
    }
}複製程式碼

broadcastTimeoutLocked

BroadcastQueue.java

final void broadcastTimeoutLocked(boolean fromMsg) {
...
    String anrMessage = null;
...
    if (app != null) {
        anrMessage = "Broadcast of " + r.intent.toString();
    }
...
    if (anrMessage != null) {
        // [接上]
        mHandler.post(new AppNotResponding(app, anrMessage));
    }
}複製程式碼

BroadcastHandler.handleMessage

private final class BroadcastHandler extends Handler {
    public BroadcastHandler(Looper looper) {
        super(looper, null, true);
    }

    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
...
            case BROADCAST_TIMEOUT_MSG: {
                synchronized (mService) {
                    // [接上]
                    broadcastTimeoutLocked(true);
                }
            } break;
...
        }
    }
};複製程式碼

setBroadcastTimeoutLocked

BroadcastQueue.java

final void setBroadcastTimeoutLocked(long timeoutTime) {
    if (! mPendingBroadcastTimeoutMessage) {
        Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this);
        mHandler.sendMessageAtTime(msg, timeoutTime);
        mPendingBroadcastTimeoutMessage = true;
    }
}

final void processNextBroadcast(boolean fromMsg) {
    if (! mPendingBroadcastTimeoutMessage) {
        long timeoutTime = r.receiverTime + mTimeoutPeriod;
        // [接上]
        setBroadcastTimeoutLocked(timeoutTime);
    }
}複製程式碼

總結

下面總結一下觸發廣播ANR後輸出traces.txt的過程:

  1. 有序廣播在註冊後傳送
  2. AMS設定廣播超時時間,通過Handler進行延遲設定
  3. 廣播超時觸發超時機制
  4. Framework層傳送訊號給虛擬機器
  5. 虛擬機器dump出特定程式所有執行緒資訊並儲存到traces.txt檔案中

相關文章