On 22. 9. 25. 08:52, Philippe Mathieu-Daudé wrote:
> CAUTION: This email originated from outside of the organization. Do 
> not click links or open attachments unless you recognize the sender 
> and know the content is safe.
>
>
> On 3/9/25 14:35, Djordje Todorovic wrote:
>>
>> On 1. 9. 25. 13:05, Philippe Mathieu-Daudé wrote:
>>> CAUTION: This email originated from outside of the organization. Do
>>> not click links or open attachments unless you recognize the sender
>>> and know the content is safe.
>>>
>>>
>>> On 1/9/25 10:17, Djordje Todorovic wrote:
>>>> On 8. 8. 25. 17:52, Philippe Mathieu-Daudé wrote:
>>>>
>>>>> CAUTION: This email originated from outside of the organization. Do
>>>>> not click links or open attachments unless you recognize the sender
>>>>> and know the content is safe.
>>>>>
>>>>>
>>>>> On 17/7/25 11:38, Djordje Todorovic wrote:
>>>>>> This is needed for riscv based CPUs by MIPS since those may have
>>>>>> sparse hart-ID layouts. ACLINT and APLIC still assume a dense
>>>>>> range, and if a hart is missing, this causes NULL derefs.
>>>>>>
>>>>>> Signed-off-by: Chao-ying Fu <[email protected]>
>>>>>> Signed-off-by: Djordje Todorovic <[email protected]>
>>>>>> ---
>>>>>>     hw/intc/riscv_aclint.c | 21 +++++++++++++++++++--
>>>>>>     hw/intc/riscv_aplic.c  | 11 ++++++++---
>>>>>>     2 files changed, 27 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/intc/riscv_aclint.c b/hw/intc/riscv_aclint.c
>>>>>> index b0139f03f5..22ac4133d5 100644
>>>>>> --- a/hw/intc/riscv_aclint.c
>>>>>> +++ b/hw/intc/riscv_aclint.c
>>>>>> @@ -292,7 +292,13 @@ static void
>>>>>> riscv_aclint_mtimer_realize(DeviceState *dev, Error **errp)
>>>>>>         s->timecmp = g_new0(uint64_t, s->num_harts);
>>>>>>         /* Claim timer interrupt bits */
>>>>>>         for (i = 0; i < s->num_harts; i++) {
>>>>>> -        RISCVCPU *cpu = RISCV_CPU(cpu_by_arch_id(s->hartid_base +
>>>>>> i));
>>>>>> +        CPUState *cpu_by_hartid = cpu_by_arch_id(s->hartid_base 
>>>>>> + i);
>>>>>> +        if (cpu_by_hartid == NULL) {
>>>>>> +            qemu_log_mask(LOG_GUEST_ERROR, "aclint-mtimer: invalid
>>>>>> hartid: %u",
>>>>>> +                          s->hartid_base + i);
>>>>>
>>>>> DeviceRealize() handlers are part of machine modelling, not guest 
>>>>> uses.
>>>>>
>>>>> IOW, triggering this is a programming mistake, so we should just
>>>>> abort() here.
>>>>
>>>> Well, if we do it that way, our Boston board target for P8700 cannot
>>>> run.
>>>
>>> So the problem is elsewhere :)
>>
>>
>> I see. Would something like this be acceptable:
>>
>> --- a/hw/intc/riscv_aclint.c
>> +++ b/hw/intc/riscv_aclint.c
>> @@ -279,7 +279,7 @@ static const Property
>> riscv_aclint_mtimer_properties[] = {
>>    static void riscv_aclint_mtimer_realize(DeviceState *dev, Error 
>> **errp)
>>    {
>>        RISCVAclintMTimerState *s = RISCV_ACLINT_MTIMER(dev);
>> -    int i;
>> +    CPUState *cpu;
>>
>>        memory_region_init_io(&s->mmio, OBJECT(dev), 
>> &riscv_aclint_mtimer_ops,
>>                              s, TYPE_RISCV_ACLINT_MTIMER, 
>> s->aperture_size);
>> @@ -291,18 +291,15 @@ static void
>> riscv_aclint_mtimer_realize(DeviceState *dev, Error **errp)
>>        s->timers = g_new0(QEMUTimer *, s->num_harts);
>>        s->timecmp = g_new0(uint64_t, s->num_harts);
>>        /* Claim timer interrupt bits */
>> -    for (i = 0; i < s->num_harts; i++) {
>> -        CPUState *cpu_by_hartid = cpu_by_arch_id(s->hartid_base + i);
>> -        if (cpu_by_hartid == NULL) {
>> -            qemu_log_mask(LOG_GUEST_ERROR, "aclint-mtimer: invalid
>> hartid: %u",
>> -                          s->hartid_base + i);
>> -            continue;
>> -        }
>> -        RISCVCPU *cpu = RISCV_CPU(cpu_by_hartid);
>> -        if (riscv_cpu_claim_interrupts(cpu, MIP_MTIP) < 0) {
>> -            error_report("MTIP already claimed");
>> -            exit(1);
>> -        }
>> +    CPU_FOREACH(cpu) {
>> +      if (cpu == NULL)
>> +        abort();
>
> Why do you end having a NULL vcpu in the global cpus_queue?
> (this is the 'elsewhere problem').
>
Well, it is true, for our case, we would never get into vcpu == NULL case.


After several attempts to come up with a better solution for this, I think

we are back to the existing one. I will try to elaborate why.

The sparse hart-ID layout in this case is not a programming mistake but

an intentional hardware design characteristic of the P8700. The P8700

RISC-V implementation has a sparse hart-ID layout where not all hart IDs

in a range are populated. This is explicitly supported by the RISC-V APLIC

specification. The current ACLINT/APLIC implementation assumes a dense

range of hart IDs (from hartid_base to hartid_base + num_harts - 1).

For the P8700 board:

   - We iterate through the theoretical hart ID range for a cluster

   - Some hart IDs legitimately don't have corresponding CPUs (sparse 
layout)

   - We need to skip these without failing

The CPU_FOREACH approach doesn't work here because:

   - The cpu==NULL will never happen

   - It iterates over all CPUs system-wide, not just those in the current

     cluster

   - The hartid_base parameter defines which cluster's interrupt controller

     we're initializing

   - We need to maintain the index-based register array mapping even for

     non-existent harts

I agree the LOG_GUEST_ERROR was inappropriate since this isn't a guest

error, so I will add a comment in v8 to explain why we are skipping.


Thanks a lot,

Djordje


>> +
>> +      RISCVCPU *cpu_riscv = RISCV_CPU(cpu);
>> +      if (riscv_cpu_claim_interrupts(cpu_riscv, MIP_MTIP) < 0) {
>> +        error_report("MTIP already claimed");
>> +        exit(1);
>> +      }
>>        }
>>    }
>>
>>
>> Thanks,
>>
>> Djordje
>>
>>

Reply via email to