DescriptionFix signal propagation logic for Linux/Android exception handler. The current code is relying on info->si_pid to figure out whether the exception handler was triggered by a signal coming from the kernel (that will re-trigger until the cause that triggered the signal has been cleared) or from user-space e.g., kill -SIGNAL pid, which will NOT automatically re-trigger in the next signal handler in the chain. While the intentions are good (manually re-triggering user-space signals), the current implementation mistakenly looks at the si_pid field in siginfo_t, assuming that it is coming from the kernel if si_pid == 0. This is wrong. siginfo_t, in fact, is a union and si_pid is meaningful only for userspace signals. For signals originated by the kernel, instead, si_pid overlaps with si_addr (the faulting address). As a matter of facts, the current implementation is mistakenly re-triggering the signal using tgkill for most of the kernel-space signals (unless the fault address is exactly 0x0). This is not completelly correct for the case of SIGSEGV/SIGBUS. The next handler in the chain will stil see the signal, but the |siginfo| and the |context| arguments of the handler will be meaningless (retriggering a signal with tgkill doesn't preserve them). Therefore, if the next handler in the chain expects those arguments to be set, it will fail. Concretelly, this is causing problems to WebView. In some rare circumstances, the next handler in the chain is a user-space runtime which does SIGSEGV handling to implement speculative null pointer managed exceptions (see as an example http://www.mono-project.com/docs/advanced/runtime/docs/exception-handling/) The fix herein proposed consists in using the si_code (see SI_FROMUSER macros) to determine whether a signal is coming form the kernel (and therefore just re-establish the next signal handler) or from userspace (and use the tgkill logic). Repro case: This issue is visible in Chrome for Android with this simple repro case: - Add a non-null pointer dereference in the codebase: *((volatile int*)0xbeef) = 42 Without this change: the next handler (the libc trap) prints: F/libc ( 595): Fatal signal 11 (SIGSEGV), code 1, fault addr 0x487 where 0x487 is actually the PID of the process (which is wrong). With this change: the next handler prints: F/libc ( 595): Fatal signal 11 (SIGSEGV), code 1, fault addr 0xbeef which is the correct answer. BUG=chromium:481937 Patch Set 1 #Patch Set 2 : #
Total comments: 1
MessagesTotal messages: 5
|