Below is what gdb shows actually generated for instruction 15 of
test_ld_mbuf1_prog (with minimal changes and comments for readability). I
suggest adding this to the comments or (if we don't feel like keeping it
updated) the commit message, it helps analyzing the code a bit.
(Also, stack drawings in the file do not include the buffer we use here.)
0: 0x92800069 mov x9, #-4 // mov x9, <imm>
1: 0x8b150129 add x9, x9, x21 // add x9, src_reg
2: 0xd280050a mov x10, #40 // mov x9, <&::data_len>
3: 0x786a6a6a ldrh w10, [x19, x10]
4: 0xcb09014a sub x10, x10, x9
5: 0xd280008b mov x11, #4 // mov x11, <sz>
6: 0xeb0b014f subs x15, x10, x11
7: 0x5400010b b.lt +8 // b.lt slow
8: 0xd280020a mov x10, #16 // mov x10, <&::data_off>
9: 0x786a6a6a ldrh w10, [x19, x10]
10: 0xd2800007 mov x7, #0 // mov x7, <&::buf_addr>
11: 0xf8676a67 ldr x7, [x19, x7]
12: 0x8b0a00e7 add x7, x7, x10
13: 0x8b0900e7 add x7, x7, x9
14: 0x1400000c b +12 // b load
slow:
15: 0x91000121 add x1, x9, #0 // mov x1, x9
16: 0x91000260 add x0, x19, #0 // mov x0, x19
17: 0x52800082 mov w2, #4 // mov w2, <sz>
18: 0xd1002323 sub x3, x25, #8 // sub x3, x25, <stack_ofs>
19: 0xd2a04d49 mov x9, #0x26a0000 // mov x9,
20: 0xf29d3409 movk x9, #0xe9a0 // __rte_pktmbuf_read
21: 0xd63f0120 blr x9
22: 0x91000007 add x7, x0, #0 // mov x7, x0
23: 0xb5000067 cbnz x7, +3 // cbnz load
24: 0xd2800007 mov x7, #0x0
25: 0x17ffff88 b -120 // b epilogue
load:
26: 0xb87f68e7 ldr w7, [x7, xzr]
27: 0xdac008e7 rev32 x7, x7
Opcode variations:
* Instruction 1 is omitted for BPF_ABS.
* Instruction 26 varies depending on sz.
* Instruction 27 varies or is omitted depending on sz.
Some benign nits:
* Instruction 6 should probably be `subs xzr, x10, x11`, a slight 1-bit error in
the existing code, though x15 is unused.
* Instructions 5 and 17 use different encoding for the same operation, would be
nice to keep them consistent, though operand never exceeds INT32_MAX.
* Instruction 10 is redundant.
I see two problems:
* We never check that x9 is non-negative. We could either add one more check,
or rearrange the code and use unsigned comparison at 7: (currently b.lt).
(There was some discussion previously regarding the special meaning of
negative BPF_ABS immediate, but I believe this is out of scope of this patch,
here we should just fail on negative _effective_ offset regardless of opcode.)
* Second argument of __rte_pktmbuf_read is `uint32_t off`, and we are trying to
pass 64-bit offset in x1. We need a check that it does not exceed UINT32_MAX.
Otherwise looks good to me.