Re: [PATCH for 4-21 v4] xen/riscv: identify specific ISA supported by cpu

Jan Beulich Thu, 06 Feb 2025 03:11:19 -0800

On 05.02.2025 20:00, Oleksii Kurochko wrote:
> On 2/4/25 12:47 PM, Jan Beulich wrote:
>> On 04.02.2025 08:20, Oleksii Kurochko wrote:
>>> +static int __init riscv_isa_parse_string(const char *isa,
>>> +                                         unsigned long *out_bitmap)
>>> +{
>>> +    if ( (isa[0] != 'r') && (isa[1] != 'v') )
>>> +        return -EINVAL;
>>> +
>>> +#if defined(CONFIG_RISCV_32)
>>> +    if ( isa[2] != '3' && isa[3] != '2' )
>>> +        return -EINVAL;
>>> +#elif defined(CONFIG_RISCV_64)
>>> +    if ( isa[2] != '6' && isa[3] != '4' )
>>> +        return -EINVAL;
>>> +#else
>>> +# error "unsupported RISC-V bitness"
>>> +#endif
>>> +
>>> +    isa += 4;
>>> +
>>> +    while ( *isa )
>>> +    {
>>> +        const char *ext = isa++;
>>> +        const char *ext_end = isa;
>>> +        bool ext_err = false;
>>> +
>>> +        switch ( *ext )
>>> +        {
>>> +        case 'x':
>>> +            printk_once("Vendor extensions are ignored in riscv,isa\n");
>>> +            /*
>>> +             * To skip an extension, we find its end.
>>> +             * As multi-letter extensions must be split from other 
>>> multi-letter
>>> +             * extensions with an "_", the end of a multi-letter extension 
>>> will
>>> +             * either be the null character or the "_" at the start of the 
>>> next
>>> +             * multi-letter extension.
>>> +             */
>>> +            for ( ; *isa && *isa != '_'; ++isa )
>>> +                ;
>>> +            ext_err = true;
>>> +            break;
>> I was wondering why ext_end isn't updated here. Looks like that's because
>> ext_err is set to true, and the checking below the switch looks at ext_err
>> first. That's technically fine, but - to me at least - a little confusing.
>> Could setting ext_end to NULL take the role of ext_err? For the code here
>> this would then immediately clarify why ext_end isn't otherwise maintained.
>> (The "err" in the name is also somewhat odd: The log message above says
>> "ignored", and that's what the code below also does. There's nothing
>> error-ish in here; in fact there may be nothing wrong about the specific
>> vendor extension we're ignoring.)
> 
> IIUC, your suggestion is to use instead of "ext_err = true" -> "ext_end = 
> NULL".


Yes.

>>> +        case 's':
>>> +            /*
>>> +             * Workaround for invalid single-letter 's' & 'u' (QEMU):
>>> +             *   Before QEMU 7.1 it was an issue with misa to ISA string
>>> +             *   conversion:
>>> +             
>>> *https://patchwork.kernel.org/project/qemu-devel/patch/dee09d708405075420b29115c1e9e87910b8da55.1648270894.git.research_tra...@irq.a4lg.com/#24792587
>>> +             *   Additional details of the workaround on Linux kernel side:
>>> +             
>>> *https://lore.kernel.org/linux-riscv/[email protected]/#t
>>> +             *
>>> +             * No need to set the bit in riscv_isa as 's' & 'u' are
>>> +             * not valid ISA extensions. It works unless the first
>>> +             * multi-letter extension in the ISA string begins with
>>> +             * "Su" and is not prefixed with an underscore.
>>> +             */
>>> +            if ( ext[-1] != '_' && ext[1] == 'u' )
>>> +            {
>>> +                ++isa;
>>> +                ext_err = true;
>>> +                break;
>>> +            }
>>> +            fallthrough;
>>> +        case 'z':
>>> +            /*
>>> +             * Before attempting to parse the extension itself, we find 
>>> its end.
>>> +             * As multi-letter extensions must be split from other 
>>> multi-letter
>>> +             * extensions with an "_", the end of a multi-letter extension 
>>> will
>>> +             * either be the null character or the "_" at the start of the 
>>> next
>>> +             * multi-letter extension.
>>> +             *
>>> +             * Next, as the extensions version is currently ignored, we
>>> +             * eliminate that portion. This is done by parsing backwards 
>>> from
>>> +             * the end of the extension, removing any numbers. This may be 
>>> a
>>> +             * major or minor number however, so the process is repeated 
>>> if a
>>> +             * minor number was found.
>>> +             *
>>> +             * ext_end is intended to represent the first character 
>>> *after* the
>>> +             * name portion of an extension, but will be decremented to 
>>> the last
>>> +             * character itself while eliminating the extensions version 
>>> number.
>>> +             * A simple re-increment solves this problem.
>>> +             */
>>> +            for ( ; *isa && *isa != '_'; ++isa )
>>> +                if ( unlikely(!isalnum(*isa)) )
>>> +                    ext_err = true;
>>> +
>>> +            ext_end = isa;
>>> +            if ( unlikely(ext_err) )
>>> +                break;
>> This, otoh, is an error. Which probably shouldn't go silently.
> 
> It is actually not an error, what it means that if version is mentioned or 
> not, so
> (1) rv32..._zbb_zbc
> (2) rv32..._zbb1p2_zbc
> 
> For (1), ext_err = true and it means that version isn't mentioned for _zbb 
> and after it
> immediately another extension name is going, so there is no any sense in 
> ignoring (or parsing) version
> numbers.
> For (2), ext_err = false (because it ends on number ) so we have to ignore 
> (or parse) version nubmers.

I don't follow. Why would ext_err be true for (1)? That's a well-formed
specifier, isn't it? rv32..._zbb.zbc (with the last dot meaning a literal
one, unlike the earlier ...) is an example of what would cause ext_err to
be true.

>>> +        default:
>>> +            /*
>>> +             * Things are a little easier for single-letter extensions, as 
>>> they
>>> +             * are parsed forwards.
>>> +             *
>>> +             * After checking that our starting position is valid, we need 
>>> to
>>> +             * ensure that, when isa was incremented at the start of the 
>>> loop,
>>> +             * that it arrived at the start of the next extension.
>>> +             *
>>> +             * If we are already on a non-digit, there is nothing to do. 
>>> Either
>>> +             * we have a multi-letter extension's _, or the start of an
>>> +             * extension.
>>> +             *
>>> +             * Otherwise we have found the current extension's major 
>>> version
>>> +             * number. Parse past it, and a subsequent p/minor version 
>>> number
>>> +             * if present. The `p` extension must not appear immediately 
>>> after
>>> +             * a number, so there is no fear of missing it.
>>> +             */
>>> +            if ( unlikely(!isalpha(*ext)) )
>>> +            {
>>> +                ext_err = true;
>>> +                break;
>>> +            }
>> Like above this also looks to be a situation that maybe better would be
>> left go entirely silently. I even wonder whether you can safely continue
>> the parsing in that case: How do you know whether what follows is indeed
>> the start of an extension name?
> 
> Lets consider examples:
> (1) riscv,isa=im
> (2) riscv,isa=i1p2m
> (3) riscv,isa=i1m
> 
> (4) riscv,isa=i1_m1
> 
> Note: Underscores "_" may be used to separate ISA extensions to improve 
> readability
> and to provide disambiguation, e.g., "RV32I2_M2_A2".
> 
> (1) isa="i" so ext = "m" => no minor and major version between "i" and "m" => 
> `ext` contains the name
>      of the next extension name, thereby we will break a loop and go for 
> parsing of the next extension
>      which is "m" in this example
> (2) isa="i" => ext="1" => algorithm will go further and will just skip minor 
> and major version for
>      this extension (1p2 => 1.2 version of extension "i")
> (3) just the same as (2) but will stop earlier as minor version isn't set.
> 
> Note: having "_" is legal from specification point of view, but it is illegal 
> to use it between single letter
>        extension from device tree binding point of view.
> (4) just the same as (2) and (3) but using "_" separator, so the if above 
> will just skip "_" and the next
>      after "_" is an extension name again.
> 
> Considering that "_" is illegal from device tree point of view between single 
> letter extensions we should have
> ASSERT(*ext != "_") and then only cases (1) - (3) will be possible to have.
> Probably it also will make a sense to have an array with legal single letter 
> extensions and check that *ext is
> in this array.
> 
> Interesting that device tree binding doesn't cover this case:
> ```
> Because the "P" extension for Packed SIMD can be confused for the decimal 
> point in a version number,
> it must be preceded by an underscore if it follows a number. For example, 
> "rv32i2p2" means version
> 2.2 of RV32I, whereas "rv32i2_p2" means version 2.0 of RV32I with version 2.0 
> of the P extension.
> ```
> if I correctly interpreted the 
> pattern:pattern:^rv(?:64|32)imaf?d?q?c?b?k?j?p?v?h?(?:[hsxz](?:[0-9a-z])+)?(?:_[hsxz](?:[0-9a-z])+)*$
> And it also doesn't support versions for single letter extensions.
> So "rv32i2_m2_a2" or with using "p" is impossible?
> Then riscv_isa_parse_string() could be simplified ( probably when it was 
> implemented in Linux kernel they don't have the binding for riscv,isa and they
> just tried to follow RISC-V specification ) for the case of single letter 
> extensions (or just keep it as is to be in sync with Linux kernel).

All fine, but what about e.g.

(5) riscv,isa=i?m1

>>> +static void __init riscv_fill_hwcap_from_isa_string(void)
>>> +{
>>> +    const struct dt_device_node *cpus = dt_find_node_by_path("/cpus");
>>> +    const struct dt_device_node *cpu;
>>> +
>>> +    if ( !cpus )
>>> +    {
>>> +        printk("Missing /cpus node in the device tree?\n");
>>> +        return;
>>> +    }
>>> +
>>> +    dt_for_each_child_node(cpus, cpu)
>>> +    {
>>> +        DECLARE_BITMAP(this_isa, RISCV_ISA_EXT_MAX);
>>> +        const char *isa;
>>> +        unsigned long cpuid;
>>> +
>>> +        if ( !dt_device_type_is_equal(cpu, "cpu") )
>>> +            continue;
>>> +
>>> +        if ( dt_get_cpuid_from_node(cpu, &cpuid) < 0 )
>>> +            continue;
>>> +
>>> +        if ( dt_property_read_string(cpu, "riscv,isa", &isa) )
>>> +        {
>>> +            printk("Unable to find \"riscv,isa\" devicetree entry "
>>> +                   "for DT's cpu%ld node\n", cpuid);
>>> +            continue;
>>> +        }
>>> +
>>> +        for ( unsigned int i = 0; (isa[i] != '\0'); i++ )
>>> +            if ( !isdigit(isa[i]) && (isa[i] != '_') && !islower(isa[i]) )
>>> +                panic("According to DT binding riscv,isa must be 
>>> lowercase\n");
>>> +
>>> +        riscv_isa_parse_string(isa, this_isa);
>>> +
>>> +        /*
>>> +         * In the unpriv. spec is mentioned:
>>> +         *   A RISC-V ISA is defined as a base integer ISA, which must be
>>> +         *   present in any implementation, plus optional extensions to
>>> +         *   the base ISA.
>>> +         * What means that isa should contain, at least, I or E thereby
>>> +         * this_isa can't be empty too.
>>> +         *
>>> +         * Ensure that this_isa is not empty, so riscv_isa won't be empty
>>> +         * during initialization. This prevents ending up with extensions
>>> +         * not supported by one of the CPUs.
>>> +         */
>>> +        ASSERT(!bitmap_empty(this_isa, RISCV_ISA_EXT_MAX));
>> This is again an assertion on input we consume. Afaict it would actually
>> trigger not only on all kinds of invalid inputs, but on something valid
>> like "rv32e".
> 
> In the case of "rv32e" I think it is fine that it will be asserted as in 
> riscv_isa_ext[] we don't
> have 'E' extension and thereby riscv_isa_ext[] should be updated properly.

Of course, but that still doesn't want to be by way of ASSERT()ing.

> Could you please explain me again what is wrong with having ASSERT() itself 
> for input we consume? If to change that
> to 'if ()' would it be better? Or it should be just moved to 
> riscv_isa_parse_string() where this bitmap is filled?

It's very simple: For internal state we maintain, use assertions to check
assumptions you make. For input we get, use other error checking (which may
be panic(), where no better option exists).

Jan

Re: [PATCH for 4-21 v4] xen/riscv: identify specific ISA supported by cpu

Reply via email to