Hi Phil,

On 2021-01-20 10:23 a.m., Phil Sutter wrote:
Hi Jamal,

On Wed, Jan 20, 2021 at 08:55:11AM -0500, Jamal Hadi Salim wrote:
On 2021-01-18 6:29 a.m., Phil Sutter wrote:
Hi!

Playing with the u32 filter's hash table, I noticed it is not possible to
use the 'sample' option with keys larger than 8 bits to calculate the hash
bucket.


I have mostly used something like: ht 2:: sample ip protocol 1 0xff
Hoping this is continuing to work.

This should read 'sample ip protocol 1 divisor 0xff', right?


0xff is a mask.
The table (256 buckets) is created earlier. Something like:
filter add dev XXX parent ffff: protocol ip prio 10 handle 2:: u32 divisor 256
This is from some scripts I have that worked. I can't see anything
that would say they will break today.


Reminder: you can only have 256 buckets (8 bit representation).
Could that be the contributing factor?

It is. Any key that fits in 8 bits (value below 256) is unaffected, as no
folding is done in either kernel or user space.


Ok. I have never used it in any scenario other than 8 bits
(maybe subconsciously because the 256 bucket limit was playing in
my head). I am not sure if Alexey at the time was thinking it is
useful for more than that.

Here's an example of something which is not 8 bit that I found in
an old script that should work (but I didn't test in current kernels).
ht 2:: sample u32 0x00000800 0x0000ff00 at 12
We are still going to extract only 8 bits for the bucket.

Yes. The resulting key is 8 bits wide, as the low zeroes of the mask are
automatically shifted away.
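A minimal sketch of that shift, under the assumption that the bucket is the masked key shifted right by the number of trailing zero bits in the mask and then limited to the table size. The function names are hypothetical and the values are taken in host order as written on the command line; this is an illustration of the arithmetic, not the kernel or iproute2 code itself.

```python
def trailing_zero_bits(mask):
    """Number of zero bits below the lowest set bit of mask (0 if mask is 0)."""
    if mask == 0:
        return 0
    # mask & -mask isolates the lowest set bit.
    return (mask & -mask).bit_length() - 1


def bucket_from_masked_key(key, mask, divisor=256):
    """Hypothetical helper: mask the key, drop the mask's low zeroes, keep 8 bits."""
    shifted = (key & mask) >> trailing_zero_bits(mask)
    return shifted & (divisor - 1)


# 'sample u32 0x00000800 0x0000ff00 at 12' from the old script
print(bucket_from_masked_key(0x00000800, 0x0000FF00))
```

With the value and mask from the old script this yields bucket 8, i.e. only the 8 bits selected by the mask take part in choosing the bucket.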


ok.

Can you provide an example of what wouldn't work?

Sure, sorry for not including it in the original email. Let's apply
actions to some packets based on source IP address. To efficiently
support arbitrary numbers, we use a hash table with 256 buckets:

# tc qd add dev test0 ingress
# tc filter add dev test0 parent ffff: prio 99 handle 1: u32 divisor 256
# tc filter add dev test0 parent ffff: prio 1 protocol ip u32 \
        hashkey mask 0xffffffff at 12 link 1: match u8 0 0

So with the above in place, the kernel uses the 32 bits at offset 12 as a
key to determine the bucket to jump to. This is done by just extracting the
lowest 8 bits in host byte order, i.e. the last octet of the packet's
source address.

Users don't know the above (and shouldn't need to), so they use sample
to have the bucket determined automatically:

# tc filter add dev test0 parent ffff: prio 99 u32 \
        match ip src 10.0.0.2 \
        ht 1: sample ip src 10.0.0.2 divisor 256 \
        action drop

iproute2 calculates bucket 8 (= 10 ^ 2), while the kernel will check
bucket 2. So the above filter will never match.
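The mismatch can be reproduced with a small model of both calculations. This is a sketch under assumptions: the kernel side is modelled as "mask, shift away the mask's low zeroes, keep the low bits in host order" as described above, and the user-space side as an XOR fold of the 32-bit key (hash ^= hash >> 16; hash ^= hash >> 8) followed by a modulo with the divisor, which is what produces 10 ^ 2. The function names are hypothetical and not taken from either code base.

```python
import ipaddress


def kernel_bucket(addr, hmask=0xFFFFFFFF, divisor=256):
    """Model of the kernel side: low bits of the masked key in host order."""
    key = int(ipaddress.IPv4Address(addr))  # host-order integer value
    shift = (hmask & -hmask).bit_length() - 1 if hmask else 0
    return ((key & hmask) >> shift) & (divisor - 1)


def sample_bucket(addr, mask=0xFFFFFFFF, divisor=256):
    """Model of the 'sample' side: XOR-fold the 32-bit key, then modulo."""
    h = int(ipaddress.IPv4Address(addr)) & mask
    h ^= h >> 16
    h ^= h >> 8
    return h % divisor


print(kernel_bucket("10.0.0.2"), sample_bucket("10.0.0.2"))
```

For 10.0.0.2 the model gives bucket 2 on the kernel side and bucket 8 on the sample side. In this model the fold's low byte is the XOR of all four octets, so with a full 32-bit mask the two results agree only when the first three octets XOR to zero.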


Ok, makes more sense.
Is this always true though for all scenarios where the key is wider than
8 bits? And is there a pattern that can be deduced?
My gut feel is user space is the right/easier spot to fix this
as long as it doesn't break the working setup of 8 bits.

cheers,
jamal
