RE: QEMU for Qualcomm Hexagon - KVM Forum talk and code available

Taylor Simpson Tue, 05 Nov 2019 08:35:48 -0800

Hi Aleksandar,

Thank you – We’re glad you enjoyed the talk.


One point of clarification on SIMD in Hexagon.  What we refer to as the 
“scalar” core does have some SIMD operations.  Register pairs are 8 bytes, and 
there are several SIMD instructions.  The example we showed in the talk 
included a VADDH instruction.  It treats the register pair as 4 half-words and 
does a vector add.  Then there are the Hexagon Vector eXtensions (HVX) 
instructions that operate on 128-byte vectors.  There is a wide variety of 
instructions in this set.  As you mentioned, some of them are pure SIMD and 
others are very complex.
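
To make the VADDH behaviour concrete, here is a plain C reference model of the lane-wise add.  This is only a sketch and is not code from the QEMU tree: the name vaddh_model is made up for illustration, and it assumes the plain (non-saturating) form of the instruction, where each half-word wraps around on overflow.

```c
#include <stdint.h>

/*
 * Illustrative model only (hypothetical name, not QEMU code).
 * Treats each 64-bit register pair as four 16-bit lanes and adds
 * them lane by lane.  Assumes non-saturating (wraparound) addition.
 */
uint64_t vaddh_model(uint64_t rss, uint64_t rtt)
{
    uint64_t rdd = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t a = (uint16_t)(rss >> (16 * i));
        uint16_t b = (uint16_t)(rtt >> (16 * i));
        /* Truncating to 16 bits keeps the carry from crossing lanes. */
        uint16_t sum = (uint16_t)(a + b);
        rdd |= (uint64_t)sum << (16 * i);
    }
    return rdd;
}
```

For example, adding 0x0001000200030004 and 0x0001000100010001 gives 0x0002000300040005, and a lane holding 0xFFFF plus 1 wraps to 0 without disturbing its neighbour.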

For the helper generator, the vast majority of these are implemented with 
helpers.  There are only 2 vector instructions in the scalar core that have a 
TCG override, and all of the HVX instructions are implemented with helpers.  If 
you are interested in a deeper dive, see below.

Alessandro and Niccolo can comment on the flex/bison implementation.

Thanks,
Taylor


Now for the deeper dive in case anyone is interested.  Look at the genptr.c 
file in target/hexagon.

The first vector instruction with an override is A6_vminub_RdP.  It 
computes the byte-wise minimum of two register pairs and sets one predicate 
bit per byte indicating whether the byte in the left operand is greater than 
the byte in the right operand.  Here is the TCG code.
#define fWRAP_A6_vminub_RdP(GENHLPR, SHORTCODE) \
{ \
    TCGv BYTE = tcg_temp_new(); \
    TCGv left = tcg_temp_new(); \
    TCGv right = tcg_temp_new(); \
    TCGv tmp = tcg_temp_new(); \
    int i; \
    tcg_gen_movi_tl(PeV, 0); \
    tcg_gen_movi_i64(RddV, 0); \
    for (i = 0; i < 8; i++) { \
        fGETUBYTE(i, RttV); \
        tcg_gen_mov_tl(left, BYTE); \
        fGETUBYTE(i, RssV); \
        tcg_gen_mov_tl(right, BYTE); \
        tcg_gen_setcond_tl(TCG_COND_GT, tmp, left, right); \
        fSETBIT(i, PeV, tmp); \
        fMIN(tmp, left, right); \
        fSETBYTE(i, RddV, tmp); \
    } \
    tcg_temp_free(BYTE); \
    tcg_temp_free(left); \
    tcg_temp_free(right); \
    tcg_temp_free(tmp); \
}
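
For readers who do not have the f* macros in their head, the following plain C model mirrors what the generated TCG computes.  It is a sketch derived from the macro above, not code from the tree: the struct and function names are made up, and it assumes fGETUBYTE zero-extends the selected byte, so the comparison is effectively unsigned.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch: destination pair + predicate. */
typedef struct {
    uint64_t rdd;   /* byte-wise minimum */
    uint8_t  pe;    /* bit i set when left byte i > right byte i */
} vminub_result;

/* Illustrative model only; 'rtt' plays the role of 'left' in the macro. */
vminub_result vminub_model(uint64_t rtt, uint64_t rss)
{
    vminub_result r = { 0, 0 };
    for (int i = 0; i < 8; i++) {
        uint8_t left  = (uint8_t)(rtt >> (8 * i));
        uint8_t right = (uint8_t)(rss >> (8 * i));
        if (left > right) {
            r.pe |= (uint8_t)(1u << i);          /* like fSETBIT(i, PeV, tmp) */
        }
        uint8_t min = left < right ? left : right;
        r.rdd |= (uint64_t)min << (8 * i);       /* like fSETBYTE(i, RddV, tmp) */
    }
    return r;
}
```

With rtt = 0x0102030405060708 and rss = 0x0807060504030201, the model returns rdd = 0x0102030404030201 and pe = 0x0F, since only the low four bytes of rtt are greater than their counterparts.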

The second instruction is S2_vsplatrb.  It takes the low byte of the source 
operand and replicates it 4 times into the destination register.  Here is the 
TCG code.
#define fWRAP_S2_vsplatrb(GENHLPR, SHORTCODE) \
{ \
    TCGv tmp = tcg_temp_new(); \
    int i; \
    tcg_gen_movi_tl(RdV, 0); \
    tcg_gen_andi_tl(tmp, RsV, 0xff); \
    for (i = 0; i < 4; i++) { \
        tcg_gen_shli_tl(RdV, RdV, 8); \
        tcg_gen_or_tl(RdV, RdV, tmp); \
    } \
    tcg_temp_free(tmp); \
}
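
The same splat can be written in plain C as a quick sanity check of the shift-and-or loop.  Again this is only a sketch with made-up function names, not code from the tree.  The second function shows an equivalent multiply-based formulation for comparison; it is not something the override uses.

```c
#include <stdint.h>

/* Illustrative model only: same shift-and-or loop as the TCG override. */
uint32_t vsplatrb_model(uint32_t rs)
{
    uint32_t byte = rs & 0xff;   /* only the low byte of the source is used */
    uint32_t rd = 0;
    for (int i = 0; i < 4; i++) {
        rd = (rd << 8) | byte;
    }
    return rd;
}

/* Equivalent formulation: multiplying by 0x01010101 copies the byte
 * into all four byte positions.  The product always fits in 32 bits. */
uint32_t vsplatrb_mul(uint32_t rs)
{
    return (rs & 0xff) * 0x01010101u;
}
```

For a source value of 0x123456AB both functions return 0xABABABAB.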


From: Aleksandar Markovic <[email protected]>
Sent: Monday, November 4, 2019 6:05 PM
To: Taylor Simpson <[email protected]>
Cc: [email protected]; Alessandro Di Federico <[email protected]>; [email protected]; 
Niccolò Izzo <[email protected]>
Subject: Re: QEMU for Qualcomm Hexagon - KVM Forum talk and code available




On Friday, October 25, 2019, Taylor Simpson 
<[email protected]> wrote:
We would like to inform you that we will be doing a talk at the KVM Forum next 
week on QEMU for Qualcomm Hexagon.  Alessandro Di Federico, Niccolo Izzo, and I 
have been working independently on implementations of the Hexagon target.  We 
plan to merge the implementations, have a community review, and ultimately have 
Hexagon be an official target in QEMU.  Our code is available at the links 
below.
https://github.com/revng/qemu-hexagon
https://github.com/quic/qemu
If anyone has any feedback on the code as it stands today or guidance on how 
best to prepare it for review, please let us know.


Hi, Taylor, Niccolo (and Alessandro too).

I haven't had a chance to look at either the code or the docs, but I 
did attend your presentation at KVM Forum, and I found it superb and engaging, 
one of the best at the conference, if not the very best.

I just have a couple of general questions:

- Regarding the code you plan to upstream, are all SIMD instructions 
implemented via the TCG API, or are some of them still implemented using 
helpers?

- Most SIMD instructions can be viewed simply as several parallel elementary 
operations. However, for a given SIMD instruction set, usually not all of them 
fit this pattern. Examples are "horizontal add" (adding data elements from 
the same SIMD register), various "pack/unpack/interleave/merge" operations, and 
more general "shuffle/permute" operations (I am not sure which of 
these are included in the Hexagon SIMD set, but there must be some). How did you 
deal with them?

- What were the most challenging Hexagon SIMD instructions you came across 
while developing your solution?

Sincerely,
Aleksandar




Thanks,
Taylor
