Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(1)

Issue 394002: Breakpad fails to walk stacks which have FPO-optimized Windows system calls in the context frame (Closed)

Can't Edit
Can't Publish+Mail
Start Review
Created:
11 years, 11 months ago by ivanpe.google
Modified:
11 years, 10 months ago
CC:
google-breakpad-dev_googlegroups.com
Base URL:
http://google-breakpad.googlecode.com/svn/trunk/src/
Visibility:
Public.

Description

It appears that breakpad has trouble walking certain stacks on the x86 platform
where the function on the top of the stack (the context frame) represents a
Windows system call (stub) and it is compiled with FPO (frame-pointer-omission)
optimization.

The main reason that only certain CPU models are affected is that the OS call
dispatching mechanism differs from CPU to CPU.  For more details on why and how
it differs take a look at the following blog post:

http://www.dumpanalysis.org/blog/index.php/2008/01/10/what-is-kifastsystemcal...

Now, it is not surprising that stacks which contain FPO frames are not walkable
in the general case. However we are dealing with a very specific (and quite
common) case where the function on the top of the stack represents a Windows
system call stub (the thread is in kernel mode) where the stack configuration is
well understood.

First, let’s look at the general case. The following blog post makes a very good
case on why such stacks (__stdcall and FPO involved) are not walkable in the
general case even in the presence of full debugging information:
    http://www.virtualdub.org/blog/pivot/entry.php?id=148

To summarize, the ESP-to-return-address offset normally changes while executing
a function which touches ESP (contains push, pop, call, ... instructions).  And
unfortunately, on the x86 platform, it is not possible to know this offset for
each instruction because such information is not present in the debugging
information.  So the hope is basically that you won’t crash while preparing a
function call (ESP is modified).  It is indeed relatively rare for a function to
crash while preparing a call (after a push instruction and before the call is
executed).  And once a call is executed a new frame is established so we don’t
need to worry about it.  We only have this issue when crashing before the call
(and after at least one push).

  00000000: 8B 01    mov   eax,dword ptr [ecx]  ← crashing here is OK (offset is
known)
  00000002: 6A 02    push        2  ← changes ESP
  00000004: 6A 01    push        1  ← changes ESP
  00000006: FF D0    call        eax  ← crashing here is problematic because ESP
is changed 
  00000008: FF D0    call        eax
  0000000A: C3       ret      ← offset is known again 

Now let’s focus on our issue.  In our case, stack walking is affected for
threads which are suspended or running in kernel mode - not for the crashing
threads (although if an FPO optimized function crashes while preparing a
function call we’ll most likely fail to walk the stack as well).

The reason stack walking is affected is because ESP is changed due to a call
instruction, but no new frame is established (at least not on the user mode
stack) because this is a system call and for certain CPU models the system call
dispatch mechanism doesn’t cvreate a new frame.  Since our function on the top
of the stack (ntdll!NtWaitForSingleObject) has no locals, ESP is expected to
point exactly at the return address (should point to
KERNELBASE!WaitForSingleObjectEx).  However the call instruction has modified
ESP (stored a new address address on the top of the stack) without actually
establishing a new frame (the context record makes the fake impression that the
function call has just returned) and ESP points to a return address which equals
the current instruction pointer (EIP) stored in the user mode thread context:

ntdll!NtWaitForSingleObject:
771ef8ac b801000000      mov     eax,1
771ef8b1 b90d000000      mov     ecx,0Dh
771ef8b6 8d542404        lea     edx,[esp+4]
771ef8ba 64ff15c0000000  call    dword ptr fs:[0C0h]  ← ESP is decremented by 4
771ef8c1 83c404          add     esp,4   ← EIP in the context points here
                                           This is the instruction that will
get
                                           executed when the thread is resumed
771ef8c4 c20c00          ret     0Ch

0:000> r
eax=00000000 ebx=00000000 ecx=00000000 edx=00000000 esi=000017b0 edi=002ff2d8
eip=771ef8c1 esp=002ff290 ebp=002ff2fc iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00200246
ntdll!NtWaitForSingleObject+0x15:
771ef8c1 83c404          add     esp


0:000> dps @esp
002ff290  771ef8c1 ntdll!NtWaitForSingleObject+0x15
002ff294  75fa0a91 KERNELBASE!WaitForSingleObjectEx+0x98  ← return value for the
top frame
002ff298  000017b0
002ff29c  00000000
002ff2a0  002ff2d8  ← second scan.  Looking for the return value for frame 1
002ff2a4  88014a2e  ← 
002ff2a8  002ff364  ...
002ff2ac  000017b0
002ff2b0  00000000
002ff2b4  00000024
002ff2b8  00000001
002ff2bc  00000000
002ff2c0  00000000
002ff2c4  00000000
002ff2c8  00000000
002ff2cc  00000000
002ff2d0  00000000
002ff2d4  00000000
002ff2d8  9e3b9800
002ff2dc  fffffff7
002ff2e0  00000000
002ff2e4  002ff2a4
002ff2e8  64a07ff1
chrome_647f0000!base::internal::Invoker1<0,base::internal::InvokerStorage1<void
(__thiscall
TransportSecurityPersister::Loader::*)(void),base::internal::UnretainedWrapper<TransportSecurityPersister::Loader>
>,void (__thiscall TransportSecurityPersister::Loader::*)(void)>::DoInvoke
[d:\b\build\slave\chrome-official\build\src\base\bind_internal.h @ 594]     ←
garbage
002ff2ec  002ff8dc
002ff2f0  75fc6590 KERNELBASE!_except_handler4
002ff2f4  fdd2c6ea
002ff2f8  00000000
002ff2fc  002ff314 ← EBP chain
002ff300  75741194 kernel32!WaitForSingleObjectExImplementation+0x75  ← next
return value
002ff304  000017b0
002ff308  0036ee80
002ff30c  00000000

As a result, breakpad doesn’t see the real return value and it is confused that
the return value is in the same function (a recursive call).  Then, since it has
to deal with the same function again (in frame 1) which we already mentioned it
is compiled with FPO, breakpad fails to find the return value (high trust
option) and fails back to scanning the stack for a suitable return value (low
trust option) and detects random garbage.  Once a non-FPO frame is detected
(being it garbage or not) the EBP frame chain is followed, so the rest of the
stack looks reasonable (with the exception of the second frame which is always
lost).  Here is an example:

0x771ef8c1 [ntdll.dll] + 0x0001f8c1]    ZwWaitForSingleObject
0x771ef8c0 [ntdll.dll] + 0x0001f8c0]    ZwWaitForSingleObject
0x64a07ff0 [chrome.dll]	- ...]         
TransportSecurityPersister::SerializeData(std::basic_string<char,std::char_traits<char>,std::allocator<char>
> *)
0x75741193 [kernel32.dll] + 0x00011193] WaitForSingleObjectExImplementation
0x75741147 [kernel32.dll] + 0x00011147] WaitForSingleObject
0x6491ecdf [chrome.dll]	- ...]         
base::WaitableEvent::TimedWait(base::TimeDelta const &)
[..snip..]

Expected output if the following (from WinDBG):

ChildEBP RetAddr  
002ff290 75fa0a91 ntdll!NtWaitForSingleObject+0x15
002ff2fc 75741194 KERNELBASE!WaitForSingleObjectEx+0x98
002ff314 75741148 kernel32!WaitForSingleObjectExImplementation+0x75
002ff328 6491ece0 kernel32!WaitForSingleObject+0x12
002ff33c 64a2d039 chrome_647f0000!base::WaitableEvent::TimedWait+0x2d
[..snip..]

It appears that this is a very common pattern in our current crash reports.  The
pattern can be recognized as follows:
   - The top two frames look like a recursive call done by a system function
(very weird indeed)
   - The real return address from the system function (second frame) is missing
   - The third frame is very often random crap (execution residue from the
stack)
   - In most cases, the call stack below the fourth frame looks good.  In some
cases, we can’t recover and the stack is truncated

Here is how we can fix it:
1. We need to recognize frames created by the system call dispatch mechanism.
2. We need to correct ESP and get a proper return address.

I isolated a simple repro of the issue, created a new unit test for it, and
validated that my fix works.

Thanks,
-Ivan

Patch Set 1 #

Total comments: 2

Patch Set 2 : I'm breaking this change into three different changes #

Total comments: 15

Patch Set 3 : Incorporating changes from the code review #

Total comments: 4

Patch Set 4 : Getting rid of the changes to the serialization code #

Total comments: 12

Patch Set 5 : Incorporating changes from code review #

Total comments: 14

Patch Set 6 : Incorporating code review changes #

Total comments: 1

Patch Set 7 : Incorporating changes from code review. Encapsulating 'type_' inside WindowsFrameInfo #

Total comments: 1

Patch Set 8 : Incorporating code review changes - fixing indent #

Unified diffs Side-by-side diffs Delta from patch set Stats Patch
M processor/basic_source_line_resolver_unittest.cc View 1 2 3 4 5 6 4 chunks +4 lines, -0 lines 0 comments Download
M processor/fast_source_line_resolver.cc View 1 2 3 4 5 6 2 chunks +10 lines, -4 lines 0 comments Download
M processor/fast_source_line_resolver_unittest.cc View 1 2 3 4 5 6 3 chunks +3 lines, -0 lines 0 comments Download
M processor/module_comparer.cc View 4 5 6 1 chunk +1 line, -0 lines 0 comments Download
M processor/simple_serializer-inl.h View 1 2 3 4 5 6 2 chunks +3 lines, -0 lines 0 comments Download
M processor/stackwalker_x86.cc View 1 2 3 4 5 6 1 chunk +19 lines, -3 lines 0 comments Download
M processor/stackwalker_x86_unittest.cc View 1 2 3 4 5 6 7 14 chunks +495 lines, -283 lines 0 comments Download
M processor/windows_frame_info.h View 1 2 3 4 5 6 6 chunks +12 lines, -4 lines 0 comments Download

Messages

Total messages: 31
ivanpe.google
It appears that breakpad has trouble walking stacks on certain CPU models for the x86/Win32 ...
11 years, 11 months ago #1
Mark Mentovai
http://breakpad.appspot.com/394002/diff/1/common/test_assembler.cc File common/test_assembler.cc (right): http://breakpad.appspot.com/394002/diff/1/common/test_assembler.cc#newcode223 common/test_assembler.cc:223: back_insert_iterator<std::string>(contents_)); Is it possible to break the std::string change ...
11 years, 11 months ago #2
ivanpe.google
I'm breaking this change into three different changes. I'll send separate code reviews for the ...
11 years, 11 months ago #3
Ted Mielczarek
LGTM, just a few minor comments. http://breakpad.appspot.com/394002/diff/5001/processor/fast_source_line_resolver.cc File processor/fast_source_line_resolver.cc (right): http://breakpad.appspot.com/394002/diff/5001/processor/fast_source_line_resolver.cc#newcode110 processor/fast_source_line_resolver.cc:110: const int *type ...
11 years, 11 months ago #4
Mark Mentovai
http://breakpad.appspot.com/394002/diff/5001/processor/stackwalker_x86.cc File processor/stackwalker_x86.cc (right): http://breakpad.appspot.com/394002/diff/5001/processor/stackwalker_x86.cc#newcode217 processor/stackwalker_x86.cc:217: StackFrame::FRAME_TRUST_CONTEXT == last_frame->trust && Can you flip this and ...
11 years, 11 months ago #5
ivanpe.google
Incorporating changes from the code review. http://breakpad.appspot.com/394002/diff/5001/processor/fast_source_line_resolver.cc File processor/fast_source_line_resolver.cc (right): http://breakpad.appspot.com/394002/diff/5001/processor/fast_source_line_resolver.cc#newcode110 processor/fast_source_line_resolver.cc:110: const int *type ...
11 years, 11 months ago #6
Mark Mentovai
Please let me know if you’d like to commit to Breakpad as ivanpe@google.com or some ...
11 years, 11 months ago #7
ivanpe.google
I'm getting rid of the changes to the serialization code. I'll update the code review ...
11 years, 11 months ago #8
Ted Mielczarek
On 2012/06/11 18:25:41, ivanpe wrote: > I updated my code to not change the WFI ...
11 years, 11 months ago #9
ivanpe.google
On 2012/06/11 19:12:13, Ted Mielczarek wrote: > On 2012/06/11 18:25:41, ivanpe wrote: > > I ...
11 years, 11 months ago #10
ivanpe.google
Please, take another look. I got rid of the changes to the serialization code. Mark, ...
11 years, 11 months ago #11
Mark Mentovai
http://breakpad.appspot.com/394002/diff/12002/google_breakpad/processor/source_line_resolver_interface.h File google_breakpad/processor/source_line_resolver_interface.h (right): http://breakpad.appspot.com/394002/diff/12002/google_breakpad/processor/source_line_resolver_interface.h#newcode95 google_breakpad/processor/source_line_resolver_interface.h:95: // ownership of any returned WindowsFrameInfo object. Say something ...
11 years, 11 months ago #12
Mark Mentovai
I’ve set you up as ivan.penkov@gmail.com. Please continue to use this issue (and ivanpe@google.com) for ...
11 years, 11 months ago #13
Ted Mielczarek
LGTM if you address Mark's comments. http://breakpad.appspot.com/394002/diff/12002/processor/stackwalker_x86_unittest.cc File processor/stackwalker_x86_unittest.cc (right): http://breakpad.appspot.com/394002/diff/12002/processor/stackwalker_x86_unittest.cc#newcode890 processor/stackwalker_x86_unittest.cc:890: "src\\chrome\\browser\\transport_security_persister.cc\n" Just shorten ...
11 years, 11 months ago #14
ivanpe.google
The requested changes were incorporated in the latest iteration Thanks, -Ivan http://breakpad.appspot.com/394002/diff/12002/google_breakpad/processor/source_line_resolver_interface.h File google_breakpad/processor/source_line_resolver_interface.h (right): ...
11 years, 11 months ago #15
Mark Mentovai
http://breakpad.appspot.com/394002/diff/10003/google_breakpad/processor/source_line_resolver_interface.h File google_breakpad/processor/source_line_resolver_interface.h (right): http://breakpad.appspot.com/394002/diff/10003/google_breakpad/processor/source_line_resolver_interface.h#newcode96 google_breakpad/processor/source_line_resolver_interface.h:96: // 'type' is an optinal out parameter of type ...
11 years, 11 months ago #16
ivanpe.google
Done. http://breakpad.appspot.com/394002/diff/10003/google_breakpad/processor/source_line_resolver_interface.h File google_breakpad/processor/source_line_resolver_interface.h (right): http://breakpad.appspot.com/394002/diff/10003/google_breakpad/processor/source_line_resolver_interface.h#newcode96 google_breakpad/processor/source_line_resolver_interface.h:96: // 'type' is an optinal out parameter of ...
11 years, 11 months ago #17
Mark Mentovai
One last thing: Can you explain (to me, here in the code review) why you ...
11 years, 11 months ago #18
ivanpe.google
On 2012/06/12 00:43:51, Mark Mentovai wrote: > One last thing: > > Can you explain ...
11 years, 11 months ago #19
ivanpe.google
.
11 years, 11 months ago #20
Mark Mentovai
You shouldn’t (can’t) forward-declare an enum in pre-C++11 with any compiler. The compiler needs to ...
11 years, 11 months ago #21
ivanpe.google
I can certainly move WindowsFrameInfo, but more and more while I'm thinking about this I ...
11 years, 11 months ago #22
Mark Mentovai
I think that sounds like a good approach.
11 years, 11 months ago #23
ivanpe.google
Encapsulating 'type_' inside WindowsFrameInfo.
11 years, 11 months ago #24
Mark Mentovai
http://breakpad.appspot.com/394002/diff/12006/processor/stackwalker_x86_unittest.cc File processor/stackwalker_x86_unittest.cc (right): http://breakpad.appspot.com/394002/diff/12006/processor/stackwalker_x86_unittest.cc#newcode211 processor/stackwalker_x86_unittest.cc:211: { // To avoid reusing locals by mistake OK, ...
11 years, 11 months ago #25
ivanpe.google
On 2012/06/12 19:37:26, Mark Mentovai wrote: > http://breakpad.appspot.com/394002/diff/12006/processor/stackwalker_x86_unittest.cc > File processor/stackwalker_x86_unittest.cc (right): > > http://breakpad.appspot.com/394002/diff/12006/processor/stackwalker_x86_unittest.cc#newcode211 ...
11 years, 11 months ago #26
Mark Mentovai
That cast thing is a tiny bit scary, but LGTM.
11 years, 11 months ago #27
ivanpe.google
Mark, I'm getting the following error when trying to commit the changes: Do you have ...
11 years, 11 months ago #28
Mark Mentovai
You probably have a read only (http://) checkout. You need a read-write (https://) checkout. https://code.google.com/p/google-breakpad/source/checkout ...
11 years, 11 months ago #29
ivanpe.google
Never mind. I saw that you changed my e-mail to ivan.penkov@gmail.com. It worked fine with ...
11 years, 11 months ago #30
ivanpe.google
11 years, 11 months ago #31
Oops, I sent my password by mistake.  I just regenerated a fresh one from
the website.  I hope the old one doesn't work anymore.

Thanks,
-Ivan

On Tue, Jun 12, 2012 at 2:20 PM, Ivan Penkov <ivanpe@google.com> wrote:

> Never mind.  I saw that you changed my e-mail to ivan.penkov@gmail.com.
>  It worked fine with the new e-mail.
>
> ivanpe@ivanp1:~/src/svn/google-breakpad$ svn commit --username
> ivan.penkov@gmail.com --password Bg8Yz7Vs7rN5
> Sending        src/processor/basic_source_line_resolver_unittest.cc
> Sending        src/processor/fast_source_line_resolver.cc
> Sending        src/processor/fast_source_line_resolver_unittest.cc
> Sending        src/processor/module_comparer.cc
> Sending        src/processor/simple_serializer-inl.h
> Sending        src/processor/stackwalker_x86.cc
> Sending        src/processor/stackwalker_x86_unittest.cc
> Sending        src/processor/windows_frame_info.h
> Transmitting file data ........
> Committed revision 971.
>
> Thanks,
> -Ivan
>
> On Tue, Jun 12, 2012 at 2:15 PM, <mark@chromium.org> wrote:
>
>> You probably have a read only (http://) checkout. You need a read-write
>> (https://) checkout.
>>
https://code.google.com/p/**google-breakpad/source/**checkout<https://code.go...
>> instructions if you’re signed in with an account that has write access.
>>
>> You can try “svn switch --relocate” to move your working copy over.
>> That’d be something like
>>
>> svn switch
http://google-breakpad.**googlecode.com/svn/trunk/<http://google-breakpad.goo...>
>>
https://google-breakpad.**googlecode.com/svn/trunk/<https://google-breakpad.g...
>>
>> http://breakpad.appspot.com/**394002/<http://breakpad.appspot.com/394002/>
>>
>
>
>
> --
> Thanks,
> -Ivan
>
>


-- 
Thanks,
-Ivan
Sign in to reply to this message.

Powered by Google App Engine
RSS Feeds Recent Issues | This issue
This is Rietveld 1004:630ec63f810e-tainted