Troubleshooting a Black Screen Caused by Plugging and Unplugging a USB Drive on Android
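Symptom:
On the in-vehicle head unit's large display, plugging and unplugging a USB drive that contains music intermittently makes the system go black for a moment.
The logs show that vold unmounted two volumes, first the USB drive volume and then /storage/emulated/0: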
I vold : Start killProcesses: /mnt/media_rw/050F-4BB4
I vold : Start killProcesses: /storage/emulated/0
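Analysis:
/storage/emulated/0 is what /sdcard maps to, and in principle it should never be unmounted while the system is running normally. Looking further up in the logs, there is this entry: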
StorageUserConnection: Service: [ComponentInfo{com.android.providers.media.module/com.android.providers.media.fuse.ExternalStorageServiceImpl}] disconnected. User [0]
This log means that StorageManagerService noticed the com.android.providers.media.module process was gone. The corresponding code is:
// StorageUserConnection.java
public void onServiceDisconnected(ComponentName name) {
    // Service crashed or process was killed, #onServiceConnected will be called
    // Don't need to re-bind.
    Slog.i(TAG, "Service: [" + name + "] disconnected. User [" + mUserId + "]");
    handleDisconnection();
}

private void handleDisconnection() {
    // Clear all sessions because we will need a new device fd since
    // StorageManagerService will reset the device mount state and #startSession
    // will be called for any required mounts.
    // Notify StorageManagerService so it can restart all necessary sessions
    close();
    resetUserSessions();
}
Following resetUserSessions leads to this call chain:
StorageUserConnection::resetUserSessions
StorageManagerService::resetUser
--------------- binder call -------------------
StorageManagerService::resetIfBootedAndConnected
StorageSessionController::onReset
IVold::unmount
--------------- binder call -------------------
VolumeBase::unmount
EmulatedVolume::doUnmount
KillProcessesUsingPath
KillProcessesWithOpenFiles
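In other words, the com.android.providers.media.module process was killed, and StorageManagerService, a Binder service inside SystemServer, detected this through the mServiceConnection callback it passed to bindService. It then reset the user's storage sessions, and that reset is what unmounted /storage/emulated/0.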
Going further up in the logs to find out why com.android.providers.media.module was killed, these entries appear:
I vold : Start KillProcessesUsingPath: public:8,1 /mnt/media_rw/050F-4BB4
I vold : Start killProcesses: /mnt/media_rw/050F-4BB4
W vold : Found symlink /proc/2487/fd/93 referencing /mnt/media_rw/050F-4BB4
W vold : Sending Interrupt to pid 2487 (rs.media.module, /system/bin/app_process64)
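So when the USB drive was pulled out, vold kicked in, found that com.android.providers.media.module still had a file on the USB volume open, and killed it.
This is where things look wrong. com.android.providers.media.module is an important Android system process and should not be restarted just because a USB drive was unplugged. Reading the KillProcessesWithOpenFiles code shows that there is a check for exactly this case: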
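The "Found symlink /proc/2487/fd/93 referencing ..." line suggests how the detection works: each entry under /proc/<pid>/fd is a symlink to the file that descriptor has open, so checking the link targets against the mount path reveals which processes are still using the volume. Below is a minimal sketch of that idea in standard C++17. It is not vold's code; the function name FindLinksReferencing is made up for illustration, and the simple starts-with match is an assumption about how the prefix is compared.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: returns the entries in `fd_dir` (for example
// /proc/2487/fd) that are symlinks whose target starts with `prefix`.
std::vector<std::string> FindLinksReferencing(const fs::path& fd_dir,
                                              const std::string& prefix) {
    assert(!prefix.empty());  // an empty prefix would match every descriptor
    std::vector<std::string> hits;
    std::error_code ec;
    // Error-code overloads are used so a missing or unreadable directory
    // simply yields an empty result instead of throwing.
    for (fs::directory_iterator it(fd_dir, ec), end; !ec && it != end; it.increment(ec)) {
        std::error_code link_ec;
        const fs::path target = fs::read_symlink(it->path(), link_ec);
        if (link_ec) continue;  // not a symlink, or the fd was closed meanwhile
        if (target.string().compare(0, prefix.size(), prefix) == 0) {
            hits.push_back(it->path().string());
        }
    }
    return hits;
}
```

On a device, calling FindLinksReferencing("/proc/2487/fd", "/mnt/media_rw/050F-4BB4") with sufficient permissions would be expected to report the fd/93 entry from the log while the file is still open.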
int KillProcessesWithOpenFiles(const std::string& prefix, int signal, bool killFuseDaemon) {
    ...
    if (found) {
        if (!IsFuseDaemon(pid) || killFuseDaemon) {  // check whether the process is the FUSE daemon
            pids.insert(pid);
        } else {
            LOG(WARNING) << "Found FUSE daemon with open file. Skipping...";
        }
    }
    ...
}

// TODO: Use a better way to determine if it's media provider app.
bool IsFuseDaemon(const pid_t pid) {
    auto path = StringPrintf("/proc/%d/mounts", pid);
    char* tmp;
    if (lgetfilecon(path.c_str(), &tmp) < 0) {  // could this check fail nondeterministically?
        return false;
    }
    bool result = android::base::StartsWith(tmp, kMediaProviderAppCtx)
               || android::base::StartsWith(tmp, kMediaProviderCtx);
    freecon(tmp);
    return result;
}
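One detail in the code above is worth spelling out: when the label lookup fails, IsFuseDaemon returns false, so the process is treated like any other and ends up in the kill set. The sketch below models only that decision in plain C++17, with the SELinux lookup replaced by an optional label (std::nullopt standing for a failed lookup). The two prefix strings are assumptions about what kMediaProviderCtx and kMediaProviderAppCtx contain, and the names IsFuseDaemonFromLabel and ShouldKill are made up for illustration.

```cpp
#include <optional>
#include <string_view>

// Assumed values; the real constants are defined in vold and may differ.
constexpr std::string_view kMediaProviderCtx = "u:r:mediaprovider:";
constexpr std::string_view kMediaProviderAppCtx = "u:r:mediaprovider_app:";

constexpr bool StartsWith(std::string_view s, std::string_view prefix) {
    return s.substr(0, prefix.size()) == prefix;
}

// `label` is std::nullopt when the label lookup failed.
constexpr bool IsFuseDaemonFromLabel(std::optional<std::string_view> label) {
    if (!label.has_value()) return false;  // failure means "not a FUSE daemon"
    return StartsWith(*label, kMediaProviderAppCtx) || StartsWith(*label, kMediaProviderCtx);
}

// Mirrors the condition `!IsFuseDaemon(pid) || killFuseDaemon`.
constexpr bool ShouldKill(std::optional<std::string_view> label, bool killFuseDaemon) {
    return !IsFuseDaemonFromLabel(label) || killFuseDaemon;
}

// A recognized label is skipped, but a failed lookup falls through to kill.
static_assert(!ShouldKill("u:r:mediaprovider_app:s0", false));
static_assert(ShouldKill(std::nullopt, false));
```

If the lookup really can fail intermittently, logging the errno in the failure branch before returning false would be one way to confirm it on a device.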
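Because the problem is intermittent, the current suspicion is that the lgetfilecon call inside IsFuseDaemon is unreliable and occasionally fails to identify the MediaProvider process.
The next step is to replace the lgetfilecon-based check with one that determines the process name by reading the cmd file (presumably cmdline) under /proc/$pid.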
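A rough sketch of what such a name-based check might look like, in standard C++ rather than vold's own helpers. It assumes the file in question is /proc/<pid>/cmdline, that its first NUL-terminated field is the process name (which is generally the case for Android app processes), and that the name to match is the one seen in the logs above. All function names here are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Process name taken from the logs above; other builds may use a different one.
const std::string kMediaProviderProcess = "com.android.providers.media.module";

std::string CmdlinePath(int pid) {
    assert(pid > 0);
    return "/proc/" + std::to_string(pid) + "/cmdline";
}

// cmdline holds NUL-separated arguments; read only the first field.
// Returns an empty string if the file cannot be opened or is empty.
std::string ReadProcessName(const std::string& cmdline_path) {
    std::ifstream in(cmdline_path, std::ios::binary);
    std::string name;
    std::getline(in, name, '\0');
    return name;
}

bool IsMediaProviderName(const std::string& name) {
    return name == kMediaProviderProcess;
}
```

A caller would combine these as IsMediaProviderName(ReadProcessName(CmdlinePath(pid))). Two caveats apply: an unreadable cmdline yields an empty name, so this check also falls back to "not MediaProvider" on failure, and a process name is easier to spoof than an SELinux label, so the trade-off deserves some thought before replacing the original check outright.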