> -----Original Message-----
> From: Richard Henderson <[email protected]>
> Sent: Sunday, July 25, 2021 8:08 AM
> To: Taylor Simpson <[email protected]>; [email protected]
> Cc: [email protected]; [email protected]; Brian Cain <[email protected]>;
> [email protected]
> Subject: Re: [PATCH 02/20] Hexagon HVX (target/hexagon) add Hexagon
> Vector eXtensions (HVX) to core
>
> On 7/5/21 1:34 PM, Taylor Simpson wrote:
> > HVX is a set of wide vector instructions. Machine state includes
> > vector registers (VRegs)
> > vector predicate registers (QRegs)
> > temporary registers for intermediate values
> > store buffer (masked stores and scatter/gather)
> >
> > Signed-off-by: Taylor Simpson <[email protected]>
> > ---
> > target/hexagon/cpu.h | 35 ++++++++++++++++-
> > target/hexagon/hex_arch_types.h | 5 +++
> > target/hexagon/insn.h | 3 ++
> > target/hexagon/internal.h | 3 ++
> > target/hexagon/mmvec/mmvec.h | 83
> +++++++++++++++++++++++++++++++++++++++++
> > target/hexagon/cpu.c | 72
> ++++++++++++++++++++++++++++++++++-
> > 6 files changed, 198 insertions(+), 3 deletions(-)
> > create mode 100644 target/hexagon/mmvec/mmvec.h
> >
> > diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h index
> > 2855dd3..0b377c3 100644
> > --- a/target/hexagon/cpu.h
> > +++ b/target/hexagon/cpu.h
> > @@ -26,6 +26,7 @@ typedef struct CPUHexagonState CPUHexagonState;
> > #include "qemu-common.h"
> > #include "exec/cpu-defs.h"
> > #include "hex_regs.h"
> > +#include "mmvec/mmvec.h"
> >
> > #define NUM_PREGS 4
> > #define TOTAL_PER_THREAD_REGS 64
> > @@ -34,6 +35,7 @@ typedef struct CPUHexagonState CPUHexagonState;
> > #define STORES_MAX 2
> > #define REG_WRITES_MAX 32
> > #define PRED_WRITES_MAX 5 /* 4 insns + endloop */
> > +#define VSTORES_MAX 2
> >
> > #define TYPE_HEXAGON_CPU "hexagon-cpu"
> >
> > @@ -52,6 +54,13 @@ typedef struct {
> > uint64_t data64;
> > } MemLog;
> >
> > +typedef struct {
> > + target_ulong va;
> > + int size;
> > + MMVector mask QEMU_ALIGNED(16);
> > + MMVector data QEMU_ALIGNED(16);
> > +} VStoreLog;
>
> Do you really need a MMVector mask, or should this be a QRegMask?
You are correct. I'll change this.
>
> > - target_ulong gather_issued;
> > + bool gather_issued;
>
> Surely unrelated to adding state.
This was unintentionally included in the patch series for the scalar core.
Based on previous feedback, I know it should be a bool. However, this can
actually be removed altogether. So, in the next iteration of this series,
you'll see it removed.
>
> > + MMVector VRegs[NUM_VREGS] QEMU_ALIGNED(16);
> > + MMVector future_VRegs[NUM_VREGS] QEMU_ALIGNED(16);
> > + MMVector tmp_VRegs[NUM_VREGS] QEMU_ALIGNED(16);
>
> Ok, this is where I'm not keen on how you handle this for integer code, and
> for vector code has got to be past the realm of acceptable.
>
> You have exactly 4 slots in your vliw packet. You cannot possibly use 32
> future or tmp slots. For integers this wastage was at least small, but for
> vectors these waste just shy of 8k.
>
> All you need to do is to be smarter about mapping e.g. 5 to 8 temp slots
> during translation.
OK
>
> > + MMQReg QRegs[NUM_QREGS] QEMU_ALIGNED(16);
> > + MMQReg future_QRegs[NUM_QREGS] QEMU_ALIGNED(16);
>
> Likewise.
>
> > + /* Temporaries used within instructions */
> > + MMVector zero_vector QEMU_ALIGNED(16);
>
> You must be doing something wrong to need zero in memory.
The architecture specifies that if you use a .new in a packet where the vector
register isn't defined, it gets zero. So, the generator produces the
following for .new references
const intptr_t OsN_off =
test_bit(OsN_num, ctx->vregs_updated)
? offsetof(CPUHexagonState, future_VRegs[OsN_num])
: offsetof(CPUHexagonState, zero_vector);
>
> > +/*
> > + * The HVX register dump takes up a ton of space in the log
> > + * Don't print it unless it is needed */ #define DUMP_HVX 0 #if
> > +DUMP_HVX
> > + qemu_fprintf(f, "Vector Registers = {\n");
> > + for (int i = 0; i < NUM_VREGS; i++) {
> > + print_vreg(f, env, i);
> > + }
> > + for (int i = 0; i < NUM_QREGS; i++) {
> > + print_qreg(f, env, i);
> > + }
> > + qemu_fprintf(f, "}\n");
> > +#endif
>
> Use CPU_DUMP_FPU, controlled by -d fpu.
These aren't FP registers, but OK.
>
> > - cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS;
> > + cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS + NUM_VREGS
> +
> > + NUM_QREGS;
>
> They're not really core regs though are they?
> Surely gdb_register_coprocessor is a better representation.
I'll look into this.
Thanks,
Taylor