Below is what gdb shows actually generated for instruction 15 of
test_ld_mbuf1_prog (with minimal changes and comments for readability). I
suggest adding this to the comments or (if we don't feel like keeping it
updated) the commit message, it helps analyzing the code a bit.

(Also, stack drawings in the file do not include the buffer we use here.)

     0: 0x92800069      mov     x9, #-4         // mov x9, <imm>
     1: 0x8b150129      add     x9, x9, x21     // add x9, src_reg
     2: 0xd280050a      mov     x10, #40        // mov x9, <&::data_len>
     3: 0x786a6a6a      ldrh    w10, [x19, x10]
     4: 0xcb09014a      sub     x10, x10, x9
     5: 0xd280008b      mov     x11, #4         // mov x11, <sz>
     6: 0xeb0b014f      subs    x15, x10, x11
     7: 0x5400010b      b.lt    +8              // b.lt slow
     8: 0xd280020a      mov     x10, #16        // mov x10, <&::data_off>
     9: 0x786a6a6a      ldrh    w10, [x19, x10]
    10: 0xd2800007      mov     x7, #0          // mov x7, <&::buf_addr>
    11: 0xf8676a67      ldr     x7, [x19, x7]
    12: 0x8b0a00e7      add     x7, x7, x10
    13: 0x8b0900e7      add     x7, x7, x9
    14: 0x1400000c      b       +12             // b load
                        slow:
    15: 0x91000121      add     x1, x9, #0      // mov x1, x9
    16: 0x91000260      add     x0, x19, #0     // mov x0, x19
    17: 0x52800082      mov     w2, #4          // mov w2, <sz>
    18: 0xd1002323      sub     x3, x25, #8     // sub x3, x25, <stack_ofs>
    19: 0xd2a04d49      mov     x9, #0x26a0000  // mov x9,
    20: 0xf29d3409      movk    x9, #0xe9a0     //   __rte_pktmbuf_read
    21: 0xd63f0120      blr     x9
    22: 0x91000007      add     x7, x0, #0      // mov x7, x0
    23: 0xb5000067      cbnz    x7, +3          // cbnz load
    24: 0xd2800007      mov     x7, #0x0
    25: 0x17ffff88      b       -120            // b epilogue
                        load:
    26: 0xb87f68e7      ldr     w7, [x7, xzr]
    27: 0xdac008e7      rev32   x7, x7

Opcode variations:
* Instruction 1 is omitted for BPF_ABS.
* Instruction 26 varies depending on sz.
* Instruction 27 varies or is omitted depending on sz.

Some benign nits:
* Instruction 6 should probably be `subs xzr, x10, x11`, a slight 1-bit error in
  the existing code, though x15 is unused.
* Instructions 5 and 17 use different encoding for the same operation, would be
  nice to keep them consistent, though operand never exceeds INT32_MAX.
* Instruction 10 is redundant.

I see two problems:
* We never check that x9 is non-negative. We could either add one more check,
  or rearrange the code and use unsigned comparison at 7: (currently b.lt).
  (There was some discussion previously regarding the special meaning of
  negative BPF_ABS immediate, but I believe this is out of scope of this patch,
  here we should just fail on negative _effective_ offset regardless of opcode.)
* Second argument of __rte_pktmbuf_read is `uint32_t off`, and we are trying to
  pass 64-bit offset in x1. We need a check that it does not exceed UINT32_MAX.

Otherwise looks good to me.

Reply via email to