zhiqiang-hhhh commented on PR #32333:
URL: https://github.com/apache/doris/pull/32333#issuecomment-2084526888

   > > @sjyango Hello, could you please explain the motivation for your 
optimization? I am also curious whether RE2 has a compile cache for the same 
pattern.
   > 
   > This optimization was completed under the guidance of @HappenLee, with the 
aim of optimizing `like` and `regexp` queries in special modes such as 
`start_with`, `ends_with`, and others. In these special modes, `like` does not 
go through RE2 (which involves a time-consuming regular-expression matching 
process), but instead goes directly through the built-in start_with or 
ends_with functions (no regular-expression matching, just a direct 
byte-by-byte comparison). Also, I'm sorry, I don't quite understand what you 
mean by `compile cache`. If it's convenient, could you please explain it?
   
   @sjyango 
   Before this optimization, for a table with N rows, a query like `select A like 
concat("%", B, "%") from tbl` (A and B are both columns of tbl, not constant 
literals) invokes `RE2::PartialMatch(A, "%B%")` N times. Each PartialMatch call 
has roughly two steps: 1. compile "%B%"; 2. match A against the compiled 
result.
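
A minimal sketch of the per-row cost described above. It uses `std::regex` as a stand-in for RE2 so that it is self-contained; `escape_regex`, `like_via_regex_rows`, and the `compile_count` counter are hypothetical names for illustration, not Doris or RE2 code.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: escape regex metacharacters so that the value of
// column B is matched literally.
std::string escape_regex(const std::string& s) {
    static const std::string meta = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char c : s) {
        if (meta.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// "Before" path (std::regex standing in for RE2): because the pattern comes
// from a column, every row pays for one compilation and one match.
std::vector<bool> like_via_regex_rows(const std::vector<std::string>& a,
                                      const std::vector<std::string>& b,
                                      std::size_t* compile_count) {
    std::vector<bool> out;
    out.reserve(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::regex re(escape_regex(b[i]));           // step 1: compile, once per row
        ++*compile_count;
        out.push_back(std::regex_search(a[i], re));  // step 2: unanchored match
    }
    return out;
}
```

For N rows the counter ends at N, which is the repeated compilation cost the paragraph above refers to.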
   
   With your optimization, the situation becomes:
   1. We call `RE2::FullMatch("%B%", LIKE_SUBSTRING_RE)` N times. 
`LIKE_SUBSTRING_RE` is a constant literal, so compiling the same string 
repeatedly is where a compilation cache could help.
   2. We call our hand-coded substring function, `substring(A, B)`, which is 
much faster than an RE match.
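
The two steps above can be sketched roughly as follows, assuming the constant classifier pattern is compiled once and reused for every row. `std::regex` again stands in for RE2, `kLikeSubstringRe` is a simplified, hypothetical version of `LIKE_SUBSTRING_RE` (it ignores escaped wildcards), and the function names are illustrative only.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified stand-in for LIKE_SUBSTRING_RE: one or more '%',
// a body without wildcards, then one or more '%'. Compiled once at startup.
static const std::regex kLikeSubstringRe("%+([^%_]+)%+");

// Step 1: classify the LIKE pattern with the precompiled constant regex and
// extract the literal needle when the pattern has the form "%needle%".
bool extract_substring_needle(const std::string& like_pattern, std::string* needle) {
    std::smatch m;
    if (!std::regex_match(like_pattern, m, kLikeSubstringRe)) return false;
    *needle = m[1].str();
    return true;
}

// Step 2: hand-coded substring check, a plain byte-wise search with no regex.
bool contains(std::string_view haystack, std::string_view needle) {
    return haystack.find(needle) != std::string_view::npos;
}

// Evaluates `A LIKE pattern` row by row for substring-shaped patterns.
// Rows whose pattern is not "%needle%" yield false in this sketch; a real
// implementation would fall back to general regex matching instead.
std::vector<bool> like_substring_rows(const std::vector<std::string>& a,
                                      const std::vector<std::string>& patterns) {
    std::vector<bool> out;
    out.reserve(a.size());
    std::string needle;
    for (std::size_t i = 0; i < a.size(); ++i) {
        out.push_back(extract_substring_needle(patterns[i], &needle) &&
                      contains(a[i], needle));
    }
    return out;
}
```

In this sketch the only per-row regex work is matching the short pattern string against an already compiled regex; nothing is compiled inside the loop.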
   
   So the question is: why is calling `RE2::FullMatch("%B%", 
LIKE_SUBSTRING_RE)` faster than calling `RE2::PartialMatch(A, "%B%")`? The 
only reason I can think of is that RE2 caches the compile result; otherwise I 
do not see how we could be faster than before.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

