On 22. 9. 25. 08:52, Philippe Mathieu-Daudé wrote:
> CAUTION: This email originated from outside of the organization. Do
> not click links or open attachments unless you recognize the sender
> and know the content is safe.
>
>
> On 3/9/25 14:35, Djordje Todorovic wrote:
>>
>> On 1. 9. 25. 13:05, Philippe Mathieu-Daudé wrote:
>>> CAUTION: This email originated from outside of the organization. Do
>>> not click links or open attachments unless you recognize the sender
>>> and know the content is safe.
>>>
>>>
>>> On 1/9/25 10:17, Djordje Todorovic wrote:
>>>> On 8. 8. 25. 17:52, Philippe Mathieu-Daudé wrote:
>>>>
>>>>> CAUTION: This email originated from outside of the organization. Do
>>>>> not click links or open attachments unless you recognize the sender
>>>>> and know the content is safe.
>>>>>
>>>>>
>>>>> On 17/7/25 11:38, Djordje Todorovic wrote:
>>>>>> This is needed for riscv based CPUs by MIPS since those may have
>>>>>> sparse hart-ID layouts. ACLINT and APLIC still assume a dense
>>>>>> range, and if a hart is missing, this causes NULL derefs.
>>>>>>
>>>>>> Signed-off-by: Chao-ying Fu <[email protected]>
>>>>>> Signed-off-by: Djordje Todorovic <[email protected]>
>>>>>> ---
>>>>>> hw/intc/riscv_aclint.c | 21 +++++++++++++++++++--
>>>>>> hw/intc/riscv_aplic.c | 11 ++++++++---
>>>>>> 2 files changed, 27 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/intc/riscv_aclint.c b/hw/intc/riscv_aclint.c
>>>>>> index b0139f03f5..22ac4133d5 100644
>>>>>> --- a/hw/intc/riscv_aclint.c
>>>>>> +++ b/hw/intc/riscv_aclint.c
>>>>>> @@ -292,7 +292,13 @@ static void
>>>>>> riscv_aclint_mtimer_realize(DeviceState *dev, Error **errp)
>>>>>> s->timecmp = g_new0(uint64_t, s->num_harts);
>>>>>> /* Claim timer interrupt bits */
>>>>>> for (i = 0; i < s->num_harts; i++) {
>>>>>> - RISCVCPU *cpu = RISCV_CPU(cpu_by_arch_id(s->hartid_base +
>>>>>> i));
>>>>>> + CPUState *cpu_by_hartid = cpu_by_arch_id(s->hartid_base
>>>>>> + i);
>>>>>> + if (cpu_by_hartid == NULL) {
>>>>>> + qemu_log_mask(LOG_GUEST_ERROR, "aclint-mtimer: invalid
>>>>>> hartid: %u",
>>>>>> + s->hartid_base + i);
>>>>>
>>>>> DeviceRealize() handlers are part of machine modelling, not guest
>>>>> uses.
>>>>>
>>>>> IOW, triggering this is a programming mistake, so we should just
>>>>> abort() here.
>>>>
>>>> Well, if we do it that way, our Boston board target for P8700 cannot
>>>> run.
>>>
>>> So the problem is elsewhere :)
>>
>>
>> I see. Would something like this be acceptable:
>>
>> --- a/hw/intc/riscv_aclint.c
>> +++ b/hw/intc/riscv_aclint.c
>> @@ -279,7 +279,7 @@ static const Property
>> riscv_aclint_mtimer_properties[] = {
>> static void riscv_aclint_mtimer_realize(DeviceState *dev, Error
>> **errp)
>> {
>> RISCVAclintMTimerState *s = RISCV_ACLINT_MTIMER(dev);
>> - int i;
>> + CPUState *cpu;
>>
>> memory_region_init_io(&s->mmio, OBJECT(dev),
>> &riscv_aclint_mtimer_ops,
>> s, TYPE_RISCV_ACLINT_MTIMER,
>> s->aperture_size);
>> @@ -291,18 +291,15 @@ static void
>> riscv_aclint_mtimer_realize(DeviceState *dev, Error **errp)
>> s->timers = g_new0(QEMUTimer *, s->num_harts);
>> s->timecmp = g_new0(uint64_t, s->num_harts);
>> /* Claim timer interrupt bits */
>> - for (i = 0; i < s->num_harts; i++) {
>> - CPUState *cpu_by_hartid = cpu_by_arch_id(s->hartid_base + i);
>> - if (cpu_by_hartid == NULL) {
>> - qemu_log_mask(LOG_GUEST_ERROR, "aclint-mtimer: invalid
>> hartid: %u",
>> - s->hartid_base + i);
>> - continue;
>> - }
>> - RISCVCPU *cpu = RISCV_CPU(cpu_by_hartid);
>> - if (riscv_cpu_claim_interrupts(cpu, MIP_MTIP) < 0) {
>> - error_report("MTIP already claimed");
>> - exit(1);
>> - }
>> + CPU_FOREACH(cpu) {
>> + if (cpu == NULL)
>> + abort();
>
> Why do you end having a NULL vcpu in the global cpus_queue?
> (this is the 'elsewhere problem').
>
Well, it is true, for our case, we would never get into vcpu == NULL case.
After several attempts to come up with a better solution for this, I think
we are back to the existing one. I will try to elaborate why.
The sparse hart-ID layout in this case is not a programming mistake but
an intentional hardware design characteristic of the P8700. The P8700
RISC-V implementation has a sparse hart-ID layout where not all hart IDs
in a range are populated. This is explicitly supported by the RISC-V APLIC
specification. The current ACLINT/APLIC implementation assumes a dense
range of hart IDs (from hartid_base to hartid_base + num_harts - 1).
For the P8700 board:
- We iterate through the theoretical hart ID range for a cluster
- Some hart IDs legitimately don't have corresponding CPUs (sparse
layout)
- We need to skip these without failing
The CPU_FOREACH approach doesn't work here because:
- The cpu==NULL will never happen
- It iterates over all CPUs system-wide, not just those in the current
cluster
- The hartid_base parameter defines which cluster's interrupt controller
we're initializing
- We need to maintain the index-based register array mapping even for
non-existent harts
I agree the LOG_GUEST_ERROR was inappropriate since this isn't a guest
error, so I will add a comment in v8 to explain why we are skipping.
Thanks a lot,
Djordje
>> +
>> + RISCVCPU *cpu_riscv = RISCV_CPU(cpu);
>> + if (riscv_cpu_claim_interrupts(cpu_riscv, MIP_MTIP) < 0) {
>> + error_report("MTIP already claimed");
>> + exit(1);
>> + }
>> }
>> }
>>
>>
>> Thanks,
>>
>> Djordje
>>
>>