On Mon, 22 Dec 2025 16:46:53 +0100
Corinna Vinschen wrote:
> On Dec 22 23:37, Takashi Yano via Cygwin wrote:
> > Alignment issue?
> > 
> > This might be the right thing.
> > 
> > diff --git a/winsup/cygwin/thread.cc b/winsup/cygwin/thread.cc
> > index 86a00e76e..ec1e3c98c 100644
> > --- a/winsup/cygwin/thread.cc
> > +++ b/winsup/cygwin/thread.cc
> > @@ -630,6 +630,8 @@ pthread::cancel ()
> >        threadlist_t *tl_entry = cygheap->find_tls (cygtls);
> >        if (!cygtls->inside_kernel (&context))
> >     {
> > +     if ((context._CX_stackPtr & 8) == 0)
> > +       context._CX_stackPtr -= 8;
> 
> Does that really help?  Checking for 8 byte alignment is usually done
> with (X & 7) != 0, because this won't catch 16 byte aligned stacks.

This code does not aim for 8-byte alignment but for 16n + 8. I assume
context._CX_stackPtr & 7 is always 0, but I wonder whether that
assumption really holds. What if user code pushes a 16-bit register
such as AX? It might be necessary to mask off the lowest 3 bits first.

diff --git a/winsup/cygwin/thread.cc b/winsup/cygwin/thread.cc
index 86a00e76e..628aef16f 100644
--- a/winsup/cygwin/thread.cc
+++ b/winsup/cygwin/thread.cc
@@ -630,6 +630,9 @@ pthread::cancel ()
       threadlist_t *tl_entry = cygheap->find_tls (cygtls);
       if (!cygtls->inside_kernel (&context))
        {
+         context._CX_stackPtr &= 0xfffffffffffffff8UL;
+         if ((context._CX_stackPtr & 8) == 0)
+           context._CX_stackPtr -= 8;
          context._CX_instPtr = (ULONG_PTR) pthread::static_cancel_self;
          SetThreadContext (win32_obj_id, &context);
        }
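
For illustration, the adjustment in the patch above can be modelled as a
standalone function. This is only a sketch: align_to_16n_plus_8() and
check_align_examples() are hypothetical names, not Cygwin code, and a
plain 64-bit integer stands in for context._CX_stackPtr.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not in Cygwin): return the highest address that is
// <= sp and congruent to 8 modulo 16, mirroring the two steps of the patch.
constexpr std::uint64_t
align_to_16n_plus_8 (std::uint64_t sp)
{
  sp &= ~std::uint64_t (7);   // clear the low 3 bits: 8-byte alignment
  if ((sp & 8) == 0)          // sp is 16n here, so step down to 16(n-1) + 8
    sp -= 8;
  return sp;
}

// Check every starting value in two adjacent 16-byte windows.
inline void
check_align_examples ()
{
  for (std::uint64_t sp = 0x1000; sp < 0x1020; ++sp)
    {
      std::uint64_t a = align_to_16n_plus_8 (sp);
      assert (a % 16 == 8);             // result is always 16n + 8
      assert (a <= sp && sp - a < 16);  // stack only moves down, by < 16
    }
}
```

The stack pointer is only ever moved towards lower addresses, so nothing
the interrupted code already stored on its stack is overwritten.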

> But afaic the stack is always 8 byte aligned anyway.  However, there are
> some scenarios where 16 byte alignment is required, as for context
> itself when calling RtlCaptureContext.  Maybe that's the problem here?

I think so. The x86_64 ABI on Windows requires 16-byte alignment.
https://learn.microsoft.com/en-us/cpp/build/stack-usage?view=msvc-170
says:
    The stack will always be maintained 16-byte aligned, except
    within the prolog (for example, after the return address is pushed), 

Therefore, the stack pointer here must be aligned to 16n + 8 bytes,
because a 'call' instruction pushes the return address (8 bytes) onto
the stack, while the code
context._CX_instPtr = (ULONG_PTR) pthread::static_cancel_self;
does not do that.
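
As a rough model of that difference (the function names below are made
up for illustration and are not real APIs): a normal call from a 16-byte
aligned stack enters the callee with RSP at 16n + 8, while redirecting
the instruction pointer leaves RSP exactly where the interrupted code
had it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: 'call' pushes an 8-byte return address.
constexpr std::uint64_t
rsp_after_call (std::uint64_t rsp)
{
  return rsp - 8;
}

// Hypothetical model: rewriting the instruction pointer in a thread
// context leaves the stack pointer untouched.
constexpr std::uint64_t
rsp_after_redirect (std::uint64_t rsp)
{
  return rsp;
}

// What a function entered under the Windows x64 convention expects.
constexpr bool
is_valid_entry_rsp (std::uint64_t rsp)
{
  return rsp % 16 == 8;
}

// A caller with a 16-byte aligned stack yields a valid entry RSP via call,
static_assert (is_valid_entry_rsp (rsp_after_call (0x2000)), "call");
// but the same stack is misaligned if the callee is entered by redirect.
static_assert (!is_valid_entry_rsp (rsp_after_redirect (0x2000)), "redirect");
```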

> But the context Stackptr is the stackpointer of the current function the
> target thread is running in.  The instruction pointer is set to
> pthread::static_cancel_self(), which doesn't get any arguments and doesn't
> use any content from the stack.

Yeah, that was my question.

> It might be a good idea to make sure the stack is always 16 byte
> aligned, but I don't see why pthread::static_cancel_self() ->
> pthread::cancel_self() -> pthread::exit() would require other than 8
> byte alignment.

pthread::exit() calls _cygtls::remove(), which calls CloseHandle().
It appears that, from some point on, CloseHandle() stopped working
unless the stack was aligned to 16n + 8 bytes.

> Apparently something in pthread::exit() crashes?  But where?  Does
> adding debug_printf's help to figure that out?

It crashes in CloseHandle(). debug_printf() also crashes.

#0  0x00007ffa5bea998b in ntdll!SbSelectProcedure ()
   from /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll
#1  0x00007ffa594a1ee5 in KERNELBASE!CloseHandle ()
   from /cygdrive/c/WINDOWS/System32/KERNELBASE.dll
#2  0x00007ff9e68858ef in _cygtls::remove (this=0x7ffdfce00, wait=4294967295)
    at /usr/src/debug/cygwin-3.6.5-1/winsup/cygwin/cygtls.cc:121
#3  0x00007ff9e6885e88 in _cygtls::remove (this=<optimized out>,
    wait=<optimized out>)
    at /usr/src/debug/cygwin-3.6.5-1/winsup/cygwin/cygtls.cc:153
#4  0x00007ff9e68e3803 in pthread::exit (this=0xa00003750,
    value_ptr=0xffffffffffffffff)
    at /usr/src/debug/cygwin-3.6.5-1/winsup/cygwin/thread.cc:583
#5  0x00007ff9e68e38d4 in pthread::cancel_self (this=0x4)
    at /usr/src/debug/cygwin-3.6.5-1/winsup/cygwin/thread.cc:1061
#6  0x00007ff9e68e3939 in pthread::static_cancel_self ()
    at /usr/src/debug/cygwin-3.6.5-1/winsup/cygwin/thread.cc:986
#7  0x0000000000000000 in ?? ()

and crashes at:
Dump of assembler code for function ntdll!SbSelectProcedure:
   0x00007ffa5bea9820 <+0>:     mov    %rbx,0x8(%rsp)
   0x00007ffa5bea9825 <+5>:     mov    %rsi,0x10(%rsp)
   0x00007ffa5bea982a <+10>:    mov    %rdi,0x20(%rsp)
   0x00007ffa5bea982f <+15>:    push   %rbp
   0x00007ffa5bea9830 <+16>:    push   %r12
   0x00007ffa5bea9832 <+18>:    push   %r13
   0x00007ffa5bea9834 <+20>:    push   %r14
   0x00007ffa5bea9836 <+22>:    push   %r15
   0x00007ffa5bea9838 <+24>:    lea    -0x1b0(%rsp),%rbp
   0x00007ffa5bea9840 <+32>:    sub    $0x2b0,%rsp
....
=> 0x00007ffa5bea998b <+363>:   movaps %xmm0,0x170(%rbp)
   0x00007ffa5bea9992 <+370>:   movaps %xmm0,0x180(%rbp)
   0x00007ffa5bea9999 <+377>:   movaps %xmm0,0x190(%rbp)

This means that RBP is not 16-byte aligned. If RSP were aligned to
16n + 8 at the beginning of SbSelectProcedure(), then
RBP = RSP - 8*5 (rbp, r12, r13, r14, r15) - 0x1b0 would be 16-byte
aligned. It is not, so RSP was not aligned to 16n + 8 on entry.
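
The arithmetic can be checked with a small sketch. It models only the
prolog shown in the disassembly (five pushes, then lea -0x1b0(%rsp),%rbp)
and the first faulting movaps at 0x170(%rbp); the function names are
hypothetical and the addresses are arbitrary examples.

```cpp
#include <cassert>
#include <cstdint>

// RBP as computed by the prolog above: five 8-byte pushes
// (rbp, r12, r13, r14, r15), then lea -0x1b0(%rsp),%rbp.
constexpr std::uint64_t
rbp_in_prolog (std::uint64_t entry_rsp)
{
  return entry_rsp - 5 * 8 - 0x1b0;
}

// movaps requires its memory operand to be 16-byte aligned.
constexpr bool
movaps_operand_aligned (std::uint64_t entry_rsp)
{
  return (rbp_in_prolog (entry_rsp) + 0x170) % 16 == 0;
}

// Entry RSP = 16n + 8: the movaps operand is aligned.
static_assert (movaps_operand_aligned (0x10008), "16n + 8 works");
// Entry RSP = 16n: the operand is off by 8, so movaps faults.
static_assert (!movaps_operand_aligned (0x10000), "16n faults");
```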

-- 
Takashi Yano <[email protected]>
