On 15/11/2024 9:28 am, Jan Beulich wrote:
> On 14.11.2024 19:22, Andrew Cooper wrote:
>> It may have taken a while, but it occurs to me that the mentioned commit 
>> fixed
>> a second problem too.
>>
>> On entering trampoline_boot_cpu_entry(), %esp points at the trampoline stack,
>> but in a 32bit flat segment.  It happens to be page aligned.
>>
>> When dropping into 16bit mode, the stack segment operates on %sp, preserving
>> the upper bits.  Prior to 1ed76797439e, the top nibble of %sp would depend on
>> where the trampoline was placed in low memory, and only had a 1/16 chance of
>> being 0 and therefore operating on the intended stack.
>>
>> There was a 15/16 chance of using a different page in the trampoline as if it
>> were the stack.  Therefore, zeroing %esp was correct, but for more reasons
>> than realised at the time.
> I'm afraid I don't follow this analysis. Said commit replaced clearing of %sp
> by clearing of %esp.

Correct

> That made no difference for anything using the 16-bit
> register.

True, but Xen's 16bit code isn't very relevant to the analysis. 
Fujitsu's BIOS is.

> I don't see how the top nibble of %sp could have been non-zero
> prior to that change.

Oh, that's a typo.  It should have been the 5th nibble of %esp.

Said nibble depends entirely on where the trampoline is placed in low
memory.

We first enter the trampoline in 32bit flat mode with %esp being the
absolute address of the stack. i.e. it's 0x000yyyyy with a 15/16th's
chance of the 5th nibble being non-zero.

Then we drop down into Real mode (non flat, because the trampoline never
overlaps the IVT at 0).  At this point we used to zero %sp which
preserves %esp's 5th nibble.

And in the case that went wrong, INT $0x15 corrupted memory that
happened to be in the Xen image.


Anyway, when I was debugging 11 years ago, I noticed that %esp was
nonzero in its upper half and, despite deciding this was suspicious,
couldn't figure out why and zeroing it all fixed the memory corruption.

I also didn't appreciate that `xor %sp, %sp` was strongly dependent on
the trampoline being at exactly trampoline_start + 64k.


Anyway, given that everyone seemed to be confused, I guess I need to try
rewriting the commit message.

~Andrew

Reply via email to