On 10/18/25 04:07, David Hildenbrand wrote:
> On 17.10.25 17:20, Christian Borntraeger wrote:
>>
>>
>> On 17.10.25 at 17:07, David Hildenbrand wrote:
>>> On 17.10.25 17:01, Christian Borntraeger wrote:
>>>> On 17.10.25 at 16:54, David Hildenbrand wrote:
>>>>> On 17.10.25 16:49, Christian Borntraeger wrote:
>>>>>> This patch triggers a regression for s390x kvm as qemu guests can no 
>>>>>> longer start
>>>>>>
>>>>>> error: kvm run failed Cannot allocate memory
>>>>>> PSW=mask 0000000180000000 addr 000000007fd00600
>>>>>> R00=0000000000000000 R01=0000000000000000 R02=0000000000000000 R03=0000000000000000
>>>>>> R04=0000000000000000 R05=0000000000000000 R06=0000000000000000 R07=0000000000000000
>>>>>> R08=0000000000000000 R09=0000000000000000 R10=0000000000000000 R11=0000000000000000
>>>>>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
>>>>>> C00=00000000000000e0 C01=0000000000000000 C02=0000000000000000 C03=0000000000000000
>>>>>> C04=0000000000000000 C05=0000000000000000 C06=0000000000000000 C07=0000000000000000
>>>>>> C08=0000000000000000 C09=0000000000000000 C10=0000000000000000 C11=0000000000000000
>>>>>> C12=0000000000000000 C13=0000000000000000 C14=00000000c2000000 C15=0000000000000000
>>>>>>
>>>>>> KVM on s390x does not use THP so far, will investigate. Does anyone have 
>>>>>> a quick idea?
>>>>>
>>>>> Only when running KVM guests and apart from that everything else seems to 
>>>>> be fine?
>>>>
>>>> We have other weirdness in linux-next but in different areas. Could that 
>>>> somehow be related to us disabling THP for the kvm address space?
>>>
>>> Not sure ... it's a bit weird. I mean, when KVM disables THPs we 
>>> essentially just remap everything to be mapped by PTEs. So there shouldn't 
>>> be any PMDs in that whole process.
>>>
>>> Remapping a file THP (shmem) implies zapping the THP completely.
>>>
>>>
>>> I assume your kernel config has CONFIG_ZONE_DEVICE and 
>>> CONFIG_ARCH_ENABLE_THP_MIGRATION set, right?
>>
>> yes.
>>
>>>
>>> I'd rule out copy_huge_pmd(), zap_huge_pmd() as well.
>>>
>>>
>>> What happens if you revert the change in mm/pgtable-generic.c?
>>
>> That partial revert seems to fix the issue
>> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
>> index 0c847cdf4fd3..567e2d084071 100644
>> --- a/mm/pgtable-generic.c
>> +++ b/mm/pgtable-generic.c
>> @@ -290,7 +290,7 @@ pte_t *___pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
>>              if (pmdvalp)
>>                   *pmdvalp = pmdval;
>> -       if (unlikely(pmd_none(pmdval) || !pmd_present(pmdval)))
>> +       if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval)))
> 
> Okay, but that means that effectively we stumble over a PMD entry that is not 
> a migration entry but still non-present.
> 
> And I would expect that it's a page table, because otherwise the change
> wouldn't make a difference.
> 
> And the weird thing is that this only triggers sometimes, because if
> it would always trigger nothing would ever work.
> 
> Is there some weird scenario where s390x might set a leaf page table mapped 
> in a PMD to non-present?
> 

Good point

> Staring at the definition of pmd_present() on s390x it's really just
> 
>     return (pmd_val(pmd) & _SEGMENT_ENTRY_PRESENT) != 0;
> 
> 
> Maybe this is happening in the gmap code only and not actually in the core-mm 
> code?
> 


I am not an s390 expert, but just looking at the code:

So on s390 the check is effectively

segment_entry/present = false or segment_entry_empty/invalid = true

Given that the revert works, the check changes to

segment_entry/present = false or pmd_migration_entry (PAGE_INVALID | 
PAGE_PROTECT)

So it isn't the first check (segment_entry/present = false) that is
failing here.

Sounds like for s390 we would want __pte_offset_map to allow mappings with
segment_entry_empty/invalid entries?

Any chance we can get the stack trace and a dump of the PMD entry when the 
issue occurs?

In the meantime, does this fix/workaround work?

diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 0c847cdf4fd3..31c1754d5bd4 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -290,7 +290,7 @@ pte_t *___pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 
        if (pmdvalp)
                *pmdvalp = pmdval;
-       if (unlikely(pmd_none(pmdval) || !pmd_present(pmdval)))
+       if (unlikely(pmd_none(pmdval) || is_pmd_non_present_folio_entry(pmdval)))
                goto nomap;
        if (unlikely(pmd_trans_huge(pmdval)))
                goto nomap;
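
To make the difference between the three variants of the check easier to
reason about, here is a rough user-space model. Everything in it is
hypothetical: the model_* names and the bit layout are made up for
illustration and are not the s390 segment-table encoding. It also assumes
that is_pmd_non_present_folio_entry() only matches migration and
device-private entries, which this thread does not spell out. The point it
tries to show is that only the !pmd_present() variant bails out on an entry
that refers to a page table but has its present bit clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PMD encoding for illustration only; NOT the s390 layout. */
typedef uint64_t model_pmd_t;

#define MODEL_PMD_EMPTY       0ULL         /* nothing installed */
#define MODEL_PMD_PRESENT     (1ULL << 0)  /* entry is marked present */
#define MODEL_PMD_TABLE       (1ULL << 1)  /* entry points to a PTE table */
#define MODEL_PMD_MIGRATION   (1ULL << 2)  /* software migration entry */
#define MODEL_PMD_DEVPRIVATE  (1ULL << 3)  /* software device-private entry */

static inline bool model_pmd_none(model_pmd_t pmd)
{
	return pmd == MODEL_PMD_EMPTY;
}

static inline bool model_pmd_present(model_pmd_t pmd)
{
	return (pmd & MODEL_PMD_PRESENT) != 0;
}

static inline bool model_is_migration(model_pmd_t pmd)
{
	return !model_pmd_present(pmd) && (pmd & MODEL_PMD_MIGRATION) != 0;
}

/* Assumption: "non-present folio entry" means migration or device-private. */
static inline bool model_is_non_present_folio(model_pmd_t pmd)
{
	return !model_pmd_present(pmd) &&
	       (pmd & (MODEL_PMD_MIGRATION | MODEL_PMD_DEVPRIVATE)) != 0;
}

/* Check as in the patch under discussion: any non-present entry is skipped. */
static inline bool nomap_patched(model_pmd_t pmd)
{
	return model_pmd_none(pmd) || !model_pmd_present(pmd);
}

/* Check after the partial revert: only migration entries are skipped. */
static inline bool nomap_reverted(model_pmd_t pmd)
{
	return model_pmd_none(pmd) || model_is_migration(pmd);
}

/* Check in the proposed workaround. */
static inline bool nomap_workaround(model_pmd_t pmd)
{
	return model_pmd_none(pmd) || model_is_non_present_folio(pmd);
}

/* The case this thread is hunting: a page table with the present bit clear. */
static inline void model_self_check(void)
{
	model_pmd_t odd = MODEL_PMD_TABLE;

	assert(nomap_patched(odd));      /* walk is refused */
	assert(!nomap_reverted(odd));    /* walk proceeds */
	assert(!nomap_workaround(odd));  /* walk proceeds */
}
```

In this model, empty and migration entries are refused by all three
variants, so the variants only disagree on entries that are non-present
without being one of the known software entry types. A dump of the real PMD
value would tell us whether that is what s390 produces.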



Thanks David and Christian!

Balbir
