On 01/13/2017 07:01 PM, Alexei Starovoitov wrote:
> On Thu, Jan 12, 2017 at 06:29:21PM +0100, Daniel Mack wrote:
>> This trie implements a longest prefix match algorithm that can be used
>> to match IP addresses to a stored set of ranges.
>>
>> Internally, data is stored in an unbalanced trie of nodes that has a
>> maximum height of n, where n is the prefixlen the trie was created
>> with.
>>
>> Tries may be created with prefix lengths that are multiples of 8, in
>> the range from 8 to 2048. The key used for lookup and update operations
>> is a struct bpf_lpm_trie_key, and the value is a uint64_t.
>>
>> The code carries more information about the internal implementation.
>>
>> Signed-off-by: Daniel Mack <dan...@zonque.org>
>> Reviewed-by: David Herrmann <dh.herrm...@gmail.com>
>> ---
>>  include/uapi/linux/bpf.h |   7 +
>>  kernel/bpf/Makefile      |   2 +-
>>  kernel/bpf/lpm_trie.c    | 499 +++++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 507 insertions(+), 1 deletion(-)
>>  create mode 100644 kernel/bpf/lpm_trie.c
... Thanks for spotting my typos! :)

>> +static struct lpm_trie_node *lpm_trie_node_alloc(const struct lpm_trie *trie,
>> +						 const void *value)
>> +{
>> +	struct lpm_trie_node *node;
>> +	gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;
>> +
>> +	node = kmalloc(sizeof(struct lpm_trie_node) + trie->data_size, gfp);
>> +	if (!node)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	if (value) {
>> +		node->value = kmemdup(value, trie->map.value_size, gfp);
>
> can you make value to be part of the node? similar to how hash map is done?
> that will help avoid the 2nd allocation, will speed up insertion and will
> help converting this code to user pre-allocated elements.
> I suspect the concern was that for many inner nodes that value is null?
> But in your use case the value_size will be == 0 eventually,
> so by embedding it we can save memory too, since the 'value' pointer will
> be replaced with a boolean present flag?
> So potentially less memory and fewer cache misses?

Yes, that's a good idea. Implemented that now.

> Overall the algorithm is indeed straightforward and simple, which is great,
> but I would still like to see some performance numbers.

I'm not sure yet how to implement such a test in a meaningful way, to be
honest. Given that the lookups have to be done one by one, I expect the
syscall overhead to be quite significant.

> Looks like in the best case, for a single 32-bit element, it needs 4 xors
> and compares, which is fine. For a mostly populated trie it's 4 xors *
> 32 depth, which is pretty good too, but cache misses on pointer walks may
> kill performance unless we're hitting the same path all the time.
> I think it's all acceptable due to the simplicity of the implementation,
> which we may improve later if it turns out to be a bottleneck for
> some use cases. We just need a baseline to have realistic expectations.

Yes, the maximum height of the trie is the number of bits in the prefix,
so for n bits, the iteration would at most take n steps to finish.
For each step, an xor and a compare over n/8 bytes are needed.

As you say, the implementation could be improved under the hood if
someone spots a bottleneck somewhere.

I'll post a v3 with your comments addressed for further discussion.


Thanks,
Daniel