labath wrote:

> > It also reads the memory in bulk (up to 1MB)
> 
> ```
> // Maximum number of bytes read (and buffered). We need to read at least
> // `size` bytes for a successful match.
> const size_t max_read_size = std::max<size_t>(size, 0x10000);
> ```
> 
> It seems the minimal chunk is 64KB and the maximal chunk may be very long 
> (more than 1MB).

Depends on how you look at it, I guess. The actual read size will be the 
minimum of this number and the "remainder of memory to be searched" 
(`std::min(mem.size(), high - s)`), so `max_read_size` is really a cap on the 
read size. It's true that the value can be bigger than 64K (or even 1MB). 
However that can only happen if the string being searched is larger than that 
value. This is necessary because the search algorithm matches from the end of 
the pattern.

I wanted to keep the algorithm simple and keep the entire (potential) match in 
memory at once, as trying to read in the middle of matching would get messy. I 
don't think anyone searches for strings this long, but if we do want to 
(better) support that use case, I think that the best solution would be to 
switch to an algorithm (e.g. 
[KMP](https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm))
 which scans the search space in a strictly linear fashion -- as that will make 
it easy to read the memory in arbitrary increments.

> 
> Something is wrong with the first memory block in tests
> 
> ```
> Got output:
> data found at location: 0x7f6bdf3eb000
> 0x7f6bdf3eb000: 6e 65 65 64 6c 65 00 00 00 00 00 00 00 00 00 00  
> needle..........
> 0x7f6bdf3eb010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> ................
> data found at location: 0x7f6bdf3eb800
> 0x7f6bdf3eb800: 6e 65 65 64 6c 65 00 00 00 00 00 00 00 00 00 00  
> needle..........
> 0x7f6bdf3eb810: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> ................
> data found at location: 0x7f6bdf3edff9
> 0x7f6bdf3edff9: 6e 65 65 64 6c 65 00 50 e0 3e df 6b 7f 00 00 6d  
> needle.P.>.k...m
> 0x7f6bdf3ee009: dd 41 df 6b 7f 00 00 00 00 00 00 00 00 00 00 40  
> .A.k...........@
> no more matches within the range.
> 
> Expecting sub string: "data found at location: 0x7f6bdf3e9000" (was not found)
> ```
> 

Interesting. It does work on my machine. I'll try to debug this.

> Note the capacity=0 is a special case here `llvm::SmallVector<uint8_t, 0> 
> mem;` I'd recommend
> 
> ```
> const size_t min_read_size = 0x10000;
> const size_t max_read_size = std::max<size_t>(size, min_read_size);
> llvm::SmallVector<uint8_t, min_read_size> mem;
> ```

That would allocate the 64k on the stack, which isn't good practice (I doubt it 
will make things faster, and it can cause us to blow the stack). 64k is  1000x 
larger than the [preffered small vector 
size](https://github.com/llvm/llvm-project/blob/16910a21ee0fabab2df291e4e5bc18289bd5762d/llvm/include/llvm/ADT/SmallVector.h#L1150C27-L1150C54).
 In practice there will only ever be one heap allocation here -- first `resize` 
call on the vector will allocate a ~64k heap block, and all subsequent resizes 
will just move the end pointer around.

https://github.com/llvm/llvm-project/pull/104193
_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

Reply via email to