簡介
原文請看我的部落格:從原始碼角度看traces.txt是如何生成的
traces.txt 位於安卓系統下/data/anr目錄下,當系統中有應用出現ANR時,framework會在彈出Dialog的同時dump出當前各執行緒的堆疊情況,方便開發者分析出ANR的root cause。
ANR是Application Not Responsing 的簡稱,簡而言之,就是安卓系統內建提示使用者應用介面沒有反應的機制,是用來避免應用介面一直卡頓,增加系統使用者友好度的一種方式。
造成ANR的原因很多,都是因為在主執行緒執行任務太久阻塞了介面更新導致的,主要有以下幾類:
- Broadcast Timeout: 前臺廣播執行超過10s, 後臺廣播執行超過60s (要注意的是,只有序列的廣播才有超時機制,並行的廣播並不會超時,也就是說,如果廣播是動態註冊的,直接呼叫sendBroadcast觸發,如果主執行緒Looper中排在後面的Message不會觸發超時機制,那麼即時這個廣播是前臺廣播,系統也永遠不會彈出框提示使用者超時了)
- Service Timeout: 前臺服務之星超過20s, 後臺服務之星超過200s
- Provider Timeout: 內容提供者,publish超過10s
- Input Timeout: 按鍵觸控事件分發超過5s
能夠造成ANR的前提是任務是在主執行緒上執行的,執行什麼樣的任務主要有以下幾點:
- 執行耗時任務過久,如檔案讀取或儲存,網路訪問獲取檔案太耗時
- 執行緒被阻塞過久,或者乾脆出現了死鎖
- 執行緒飢餓,如Binder執行緒總共16個,Binder主執行緒佔有一個,剩餘的15個工作執行緒都被佔滿
- CPU飢餓,負載值過大,雖然程式碼正常額但任務一直沒有來得及執行
那麼回到traces.txt檔案,它到底包含了什麼資訊可以幫助開發者找到ANR問題的根源呢。
在這篇文章,我先從一份traces.txt的日誌例項開始解析,然後再通過追蹤原始碼來解釋traces.txt是如何生成的。
我是基於Android 5.0原生程式碼進行分析的,最新的一些程式碼邏輯放到了AppErrors類中了,有些小差別,但是大體的邏輯和步驟沒有變化,不會影響閱讀。
日誌例項
// 觸發ANR的應用pid及觸發時間
----- pid 30307 at 2017-09-03 14:51:14 -----
Cmd line: com.example.androidyue.bitmapdemo
JNI: CheckJNI is off; workarounds are off; pins=0; globals=272
DALVIK THREADS:
(mutexes: tll=0 tsl=0 tscl=0 ghl=0)
// JAVA 態執行緒狀態
// 執行緒名稱; 優先順序; 執行緒id; 執行緒狀態
"main" prio=5 tid=1 TIMED_WAIT
// 執行緒組; 執行緒suspend計數; 執行緒debug suspend計數; 同輩執行緒物件; 當前執行緒物件
| group="main" sCount=1 dsCount=0 obj=0x416eaf18 self=0x416d8650
// 執行緒id; 程式優先順序; 排程者優先順序; 排程者組名; 執行緒快取控制程式碼
| sysTid=30307 nice=0 sched=0/0 cgrp=apps handle=1074565528
// native執行緒狀態; 排程者狀態; 執行使用者程式碼的總時間; 執行系統程式碼的總時間; 任務cpu使用數量
| state=S schedstat=( 0 0 0 ) utm=5 stm=4 core=3
// 執行緒呼叫棧
at java.lang.VMThread.sleep(Native Method)
at java.lang.Thread.sleep(Thread.java:1044)
at java.lang.Thread.sleep(Thread.java:1026)
at com.example.androidyue.bitmapdemo.MainActivity$1.run(MainActivity.java:27)
at android.app.Activity.runOnUiThread(Activity.java:4794)
at com.example.androidyue.bitmapdemo.MainActivity.onResume(MainActivity.java:33)
at android.app.Instrumentation.callActivityOnResume(Instrumentation.java:1282)
at android.app.Activity.performResume(Activity.java:5405)
// Native態執行緒狀態
"Binder_5" prio=5 tid=32 NATIVE
| group="main" sCount=1 dsCount=0 obj=0x41dd6570 self=0x678fa458
| sysTid=29473 nice=0 sched=0/0 cgrp=apps handle=1732666432
| state=S schedstat=( 578000 10600000 3 ) utm=0 stm=0 core=0
#00 pc 00016fe4 /system/lib/libc.so (__ioctl+8)
#01 pc 0002a97d /system/lib/libc.so (ioctl+16)
#02 pc 00016ba1 /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+132)
#03 pc 00017363 /system/lib/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+154)
#04 pc 0001b15d /system/lib/libbinder.so
#05 pc 00011267 /system/lib/libutils.so (android::Thread::_threadLoop(void*)+114)
#06 pc 0004679f /system/lib/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+66)
#07 pc 00010dcd /system/lib/libutils.so
#08 pc 0000e3d8 /system/lib/libc.so (__thread_entry+72)
#09 pc 0000dac4 /system/lib/libc.so (pthread_create+160)
at dalvik.system.NativeStart.run(Native Method)複製程式碼
關鍵詞對照
關鍵詞 | 解釋 | |
---|---|---|
prio | 優先順序 | |
tid | 執行緒id | |
group | 組名 | |
sCount | 掛起次數 | |
dsCount | 除錯下 掛起計數 | |
obj | 同輩執行緒物件 | |
self | 當前執行緒物件 | |
sysTid | 執行緒id | |
nice | 程式優先順序 | |
sched | 排程者優先順序 | |
cgrp | 排程者組名 | |
handle | 執行緒快取控制程式碼 | |
state | native執行緒狀態 | |
schedstat | 排程者狀態 | |
utm | 使用者態CPU時間 | |
stm | 核心態CPU時間 | |
core | 執行所在核 |
執行緒狀態對照
JAVA狀態 | CPP狀態 | 解釋 |
---|---|---|
TERMINATED | ZOMBIE | 執行緒死亡 |
RUNNABLE | RUNNING/RUNNABLE | 可執行或正在執行 |
TIMED_WAITING | TIMED_WAIT | 執行了帶有超時引數的wait, sleep或join函式 |
BLOCKED | MONITOR | 執行緒阻塞,等待獲取物件鎖 |
WAITING | WAIT | 執行了無超時引數的wait |
NEW | INITIALIZING | 新建,正在初始化中 |
NEW | STARTING | 新建,正在啟動 |
RUNNABLE | NATIVE | 正在執行JNI本地函式 |
WAITING | VMWAIT | 正在等待VM資源 |
RUNNABLE | SUSPENDED | 執行緒暫停,通常由於GC或者DEBUG被掛起 |
原始碼分析 (虛擬機器)
我將從程式碼的最底層開始分析,再慢慢到觸發點的頂端,先從cpp開始
Thread::DumpState
thread.cc
void Thread::DumpState(std::ostream& os, const Thread* thread, pid_t tid) {
// 組名
std::string group_name;
// 優先順序
int priority;
// 是否是daemon執行緒
bool is_daemon = false;
// 拿到當前執行緒物件指標
Thread* self = Thread::Current();
if (thread != nullptr) {
...
}
if (gAborting == 0 && self != nullptr && thread != nullptr && thread->tlsPtr_.opeer != nullptr) {
ScopedObjectAccessUnchecked soa(self);
priority = soa.DecodeField(WellKnownClasses::java_lang_Thread_priority)
->GetInt(thread->tlsPtr_.opeer);
is_daemon = soa.DecodeField(WellKnownClasses::java_lang_Thread_daemon)
->GetBoolean(thread->tlsPtr_.opeer);
...
if (thread_group != nullptr) {
...
group_name = (group_name_string != nullptr) ? group_name_string->ToModifiedUtf8() : "<null>";
}
} else {
priority = GetNativePriority();
}
std::string scheduler_group_name(GetSchedulerGroupName(tid));
if (scheduler_group_name.empty()) {
scheduler_group_name = "default";
}
// 這裡開始輸出執行緒狀態
if (thread != nullptr) {
// 執行緒名輸出
os << `"` << *thread->tlsPtr_.name << `"`;
if (is_daemon) {
os << " daemon";
}
// 優先順序; ID; 狀態
os << " prio=" << priority
<< " tid=" << thread->GetThreadId()
<< " " << thread->GetState();
if (thread->IsStillStarting()) {
os << " (still starting up)";
}
os << "
";
} else {
os << `"` << ::art::GetThreadName(tid) << `"`
<< " prio=" << priority
<< " (not attached)
";
}
// 執行緒組; 執行緒suspend計數; 執行緒debug suspend計數; 同輩執行緒物件; 當前執行緒物件
if (thread != nullptr) {
MutexLock mu(self, *Locks::thread_suspend_count_lock_);
os << " | group="" << group_name << """
<< " sCount=" << thread->tls32_.suspend_count
<< " dsCount=" << thread->tls32_.debug_suspend_count
<< " obj=" << reinterpret_cast<void*>(thread->tlsPtr_.opeer)
<< " self=" << reinterpret_cast<const void*>(thread) << "
";
}
// 執行緒id; 程式優先順序; 排程者組名
os << " | sysTid=" << tid
<< " nice=" << getpriority(PRIO_PROCESS, tid)
<< " cgrp=" << scheduler_group_name;
// 排程者優先順序; 快取控制程式碼
if (thread != nullptr) {
int policy;
sched_param sp;
CHECK_PTHREAD_CALL(pthread_getschedparam, (thread->tlsPtr_.pthread_self, &policy, &sp),
__FUNCTION__);
os << " sched=" << policy << "/" << sp.sched_priority
<< " handle=" << reinterpret_cast<void*>(thread->tlsPtr_.pthread_self);
}
os << "
";
// Grab the scheduler stats for this thread.
std::string scheduler_stats;
if (ReadFileToString(StringPrintf("/proc/self/task/%d/schedstat", tid), &scheduler_stats)) {
scheduler_stats.resize(scheduler_stats.size() - 1); // Lose the trailing `
`.
} else {
scheduler_stats = "0 0 0";
}
char native_thread_state = `?`;
int utime = 0;
int stime = 0;
int task_cpu = 0;
GetTaskStats(tid, &native_thread_state, &utime, &stime, &task_cpu);
// native執行緒狀態; 排程者狀態; 執行使用者程式碼的總時間; 執行系統程式碼的總時間; 任務cpu使用數量
os << " | state=" << native_thread_state
<< " schedstat=( " << scheduler_stats << " )"
<< " utm=" << utime
<< " stm=" << stime
<< " core=" << task_cpu
<< " HZ=" << sysconf(_SC_CLK_TCK) << "
";
if (thread != nullptr) {
// dump 執行緒當前棧
os << " | stack=" << reinterpret_cast<void*>(thread->tlsPtr_.stack_begin) << "-"
<< reinterpret_cast<void*>(thread->tlsPtr_.stack_end) << " stackSize="
<< PrettySize(thread->tlsPtr_.stack_size) << "
";
// dump 執行緒鎖資訊
os << " | held mutexes=";
for (size_t i = 0; i < kLockLevelCount; ++i) {
if (i != kMonitorLock) {
BaseMutex* mutex = thread->GetHeldMutex(static_cast<LockLevel>(i));
if (mutex != nullptr) {
os << " "" << mutex->GetName() << """;
if (mutex->IsReaderWriterMutex()) {
ReaderWriterMutex* rw_mutex = down_cast<ReaderWriterMutex*>(mutex);
if (rw_mutex->GetExclusiveOwnerTid() == static_cast<uint64_t>(tid)) {
os << "(exclusive held)";
} else {
os << "(shared held)";
}
}
}
}
}
os << "
";
}
}複製程式碼
GetTaskStats
utils.cc
void GetTaskStats(pid_t tid, char* state, int* utime, int* stime, int* task_cpu) {
*utime = *stime = *task_cpu = 0;
std::string stats;
// 從stat節點獲取資料
if (!ReadFileToString(StringPrintf("/proc/self/task/%d/stat", tid), &stats)) {
return;
}
// Skip the command, which may contain spaces.
stats = stats.substr(stats.find(`)`) + 2);
// Extract the three fields we care about.
std::vector<std::string> fields;
Split(stats, ` `, &fields);
*state = fields[0][0];
*utime = strtoull(fields[11].c_str(), nullptr, 10);
*stime = strtoull(fields[12].c_str(), nullptr, 10);
*task_cpu = strtoull(fields[36].c_str(), nullptr, 10);
}複製程式碼
DumpUnattachedThread
thread_list.cc
// 讀取該執行緒中的所有狀態資訊:JAVA態,Native態,Kernel態
static void DumpUnattachedThread(std::ostream& os, pid_t tid) NO_THREAD_SAFETY_ANALYSIS {
// [接上]
Thread::DumpState(os, nullptr, tid);
DumpKernelStack(os, tid, " kernel: ", false);
if (false) {
DumpNativeStack(os, tid, " native: ");
}
os << "
";
}
void ThreadList::DumpUnattachedThreads(std::ostream& os) {
DIR* d = opendir("/proc/self/task");
if (!d) {
return;
}
Thread* self = Thread::Current();
dirent* e;
// 獲取當前程式中所有執行緒,dump這些執行緒的資訊
while ((e = readdir(d)) != nullptr) {
char* end;
pid_t tid = strtol(e->d_name, &end, 10);
if (!*end) {
bool contains;
{
MutexLock mu(self, *Locks::thread_list_lock_);
contains = Contains(tid);
}
if (!contains) {
// [接上]
DumpUnattachedThread(os, tid);
}
}
}
closedir(d);
}
bool ThreadList::Contains(Thread* thread) {
return find(list_.begin(), list_.end(), thread) != list_.end();
}複製程式碼
DumpForSigQuit
thread_list.cc
void ThreadList::DumpForSigQuit(std::ostream& os) {
...
DumpUnattachedThreads(os);
}複製程式碼
runtime.cc
void Runtime::DumpForSigQuit(std::ostream& os) {
GetClassLinker()->DumpForSigQuit(os);
GetInternTable()->DumpForSigQuit(os);
GetJavaVM()->DumpForSigQuit(os);
GetHeap()->DumpForSigQuit(os);
TrackedAllocators::Dump(os);
os << "
";
// [接上]
// dump 執行緒列表
thread_list_->DumpForSigQuit(os);
BaseMutex::DumpAll(os);
}複製程式碼
HandleSigQuit
singal_catcher.cc
void SignalCatcher::HandleSigQuit() {
Runtime* runtime = Runtime::Current();
std::ostringstream os;
// 輸出程式id以及當前dump時間
os << "
"
<< "----- pid " << getpid() << " at " << GetIsoDate() << " -----
";
DumpCmdLine(os);
// 輸出系統資訊
std::string fingerprint = runtime->GetFingerprint();
os << "Build fingerprint: `" << (fingerprint.empty() ? "unknown" : fingerprint) << "`
";
os << "ABI: `" << GetInstructionSetString(runtime->GetInstructionSet()) << "`
";
os << "Build type: " << (kIsDebugBuild ? "debug" : "optimized") << "
";
// [接上]
runtime->DumpForSigQuit(os);
...
os << "----- end " << getpid() << " -----
";
Output(os.str());
}複製程式碼
SignalCatcher::Run
void* SignalCatcher::Run(void* arg) {
...
// Set up mask with signals we want to handle.
SignalSet signals;
signals.Add(SIGQUIT);
signals.Add(SIGUSR1);
while (true) {
// 等待訊號
int signal_number = signal_catcher->WaitForSignal(self, signals);
...
switch (signal_number) {
case SIGQUIT:
// [接上]
signal_catcher->HandleSigQuit();
break;
case SIGUSR1:
signal_catcher->HandleSigUsr1();
break;
default:
LOG(ERROR) << "Unexpected signal %d" << signal_number;
break;
}
}
}
int SignalCatcher::WaitForSignal(Thread* self, SignalSet& signals) {
...
int signal_number = signals.Wait();
...
return signal_number;
}複製程式碼
原始碼分析 (framework)
可見,android framework通過傳送訊號3來通知虛擬機器來進行dump到traces.txt檔案中的
dumpStackTraces
ActivityManagerService.java
private static void dumpStackTraces(String tracesPath, ArrayList<Integer> firstPids,
ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) {
// 當trace寫完後,監聽並觸發onEvent方法
FileObserver observer = new FileObserver(tracesPath, FileObserver.CLOSE_WRITE) {
@Override
public synchronized void onEvent(int event, String path) {
notify();
}
};
try {
// 開始監聽
observer.startWatching();
// 首先收集firstPids連結串列裡的程式資訊
if (firstPids != null) {
try {
int num = firstPids.size();
for (int i = 0; i < num; i++) {
synchronized (observer) {
// 傳送訊號3來觸發
Process.sendSignal(firstPids.get(i), Process.SIGNAL_QUIT);
observer.wait(200); // Wait for write-close, give up after 200msec
}
}
} catch (InterruptedException e) {
Slog.wtf(TAG, e);
}
}
// 收集native程式資訊
...
// 最後收集CPU資訊
...
} finally {
observer.stopWatching();
}
}複製程式碼
public static File dumpStackTraces(boolean clearTraces, ArrayList<Integer> firstPids,
ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) {
// 獲取trace檔案路徑
String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
...
File tracesFile = new File(tracesPath);
try {
...
FileUtils.setPermissions(tracesDir.getPath(), 0775, -1, -1); // drwxrwxr-x
if (clearTraces && tracesFile.exists()) tracesFile.delete();
tracesFile.createNewFile();
FileUtils.setPermissions(tracesFile.getPath(), 0666, -1, -1); // -rw-rw-rw-
} catch (IOException e) {
Slog.w(TAG, "Unable to prepare ANR traces file: " + tracesPath, e);
return null;
}
// [接上]
dumpStackTraces(tracesPath, firstPids, processCpuTracker, lastPids, nativeProcs);
return tracesFile;
}複製程式碼
appNotResponding
ActivityManagerService.java
final void appNotResponding(ProcessRecord app, ActivityRecord activity,
ActivityRecord parent, boolean aboveSystem, final String annotation) {
ArrayList<Integer> firstPids = new ArrayList<Integer>(5);
SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20);
...
final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
// [接上]
File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, lastPids,
NATIVE_STACKS_OF_INTEREST);
...
Slog.e(TAG, info.toString());
if (tracesFile == null) {
// There is no trace file, so dump (only) the alleged culprit`s threads to the log
Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
}
addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
cpuInfo, tracesFile, null);
...
}複製程式碼
觸發ANR的方式上面我已經列舉過了,這裡我以觸發Broadcast Timeout時進行分析
AppNotResponding.run
BroadcastQueue.java
private final class AppNotResponding implements Runnable {
private final ProcessRecord mApp;
private final String mAnnotation;
public AppNotResponding(ProcessRecord app, String annotation) {
mApp = app;
mAnnotation = annotation;
}
@Override
public void run() {
// [接上]
mService.appNotResponding(mApp, null, null, false, mAnnotation);
}
}複製程式碼
broadcastTimeoutLocked
BroadcastQueue.java
final void broadcastTimeoutLocked(boolean fromMsg) {
...
String anrMessage = null;
...
if (app != null) {
anrMessage = "Broadcast of " + r.intent.toString();
}
...
if (anrMessage != null) {
// [接上]
mHandler.post(new AppNotResponding(app, anrMessage));
}
}複製程式碼
BroadcastHandler.handleMessage
private final class BroadcastHandler extends Handler {
public BroadcastHandler(Looper looper) {
super(looper, null, true);
}
@Override
public void handleMessage(Message msg) {
switch (msg.what) {
...
case BROADCAST_TIMEOUT_MSG: {
synchronized (mService) {
// [接上]
broadcastTimeoutLocked(true);
}
} break;
...
}
}
};複製程式碼
setBroadcastTimeoutLocked
BroadcastQueue.java
final void setBroadcastTimeoutLocked(long timeoutTime) {
if (! mPendingBroadcastTimeoutMessage) {
Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this);
mHandler.sendMessageAtTime(msg, timeoutTime);
mPendingBroadcastTimeoutMessage = true;
}
}
final void processNextBroadcast(boolean fromMsg) {
if (! mPendingBroadcastTimeoutMessage) {
long timeoutTime = r.receiverTime + mTimeoutPeriod;
// [接上]
setBroadcastTimeoutLocked(timeoutTime);
}
}複製程式碼
總結
下面總結一下觸發廣播ANR後輸出traces.txt的過程:
- 有序廣播在註冊後傳送
- AMS設定廣播超時時間,通過Handler進行延遲設定
- 廣播超時觸發超時機制
- Framework層傳送訊號給虛擬機器
- 虛擬機器dump出特定程式所有執行緒資訊並儲存到traces.txt檔案中