zhiqiang-hhhh commented on PR #32333: URL: https://github.com/apache/doris/pull/32333#issuecomment-2084526888
> > @sjyango Hello, could you please introduce the motivation of your optimization? And I am curious whether there is a compile cache in RE2 for the same pattern?
>
> This optimization was completed under the guidance of @HappenLee, with the aim of optimizing like and regexp queries in special modes such as `start_with` or `ends_with` or more. In these special modes, like does not go through RE2 (which involves a time-consuming regular-expression matching process), but instead goes directly through the built-in start_with or ends_with functions (no regular-expression matching, just a direct byte-by-byte comparison). Moreover, I'm sorry, I don't quite understand what you mean by `compile cache`. If it's convenient, could you please explain it?

@sjyango Before this optimization, for a table with N rows, a query like `select A like concat("%", B, "%") from tbl` (A and B are both columns of tbl, not constant literals) calls `RE2::PartialMatch(A, "%B%")` N times. Each PartialMatch call has roughly two steps:

1. compile the pattern "%B%";
2. match A against the compiled result.

With your optimization, the situation would be:

1. We call `RE2::FullMatch("%B%", LIKE_SUBSTRING_RE)` N times. `LIKE_SUBSTRING_RE` is a constant literal, and compiling the same string repeatedly is where a compilation cache could help.
2. We call our hand-coded substring function, `substring(A, B)`, which is much faster than an RE match.

So the question is: why is calling `RE2::FullMatch("%B%", LIKE_SUBSTRING_RE)` faster than calling `RE2::PartialMatch(A, "%B%")`? The only reason I can think of is that RE2 caches the compiled result; otherwise there is no way we would be faster than before.

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
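
To make the question in the comment above concrete, here is a minimal sketch of the two paths being compared. It is not the Doris implementation. It uses `std::regex` as a stand-in for RE2, and `kLikeSubstringRe`, `like_to_regex`, `like_via_per_row_regex`, and `like_via_fast_path` are hypothetical names invented for illustration. The classifier expression is an assumption as well, not the real `LIKE_SUBSTRING_RE`.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for LIKE_SUBSTRING_RE: accepts "%<literal>%" where the
// literal has no LIKE wildcards or escapes. Constant, so it is compiled once.
static const std::regex kLikeSubstringRe(R"(%+([^%_\\]+)%+)");

// Translate a LIKE pattern into a regex string ('%' -> ".*", '_' -> ".").
std::string like_to_regex(const std::string& like) {
    static const std::string kMeta = "\\^$.|?*+()[]{}";
    std::string out;
    for (char c : like) {
        if (c == '%') {
            out += ".*";
        } else if (c == '_') {
            out += '.';
        } else {
            if (kMeta.find(c) != std::string::npos) out += '\\';  // escape metachar
            out += c;
        }
    }
    return out;
}

// Path 1: the pattern differs per row, so a regex is compiled for every row
// and then matched against the value.
bool like_via_per_row_regex(const std::string& value, const std::string& like_pattern) {
    const std::regex re(like_to_regex(like_pattern));  // per-row compilation
    return std::regex_match(value, re);
}

// Path 2: match the per-row pattern against the precompiled constant
// classifier, then do a plain substring search. No per-row compilation.
bool like_via_fast_path(const std::string& value, const std::string& like_pattern) {
    std::smatch m;
    if (std::regex_match(like_pattern, m, kLikeSubstringRe)) {
        return value.find(m[1].str()) != std::string::npos;
    }
    return like_via_per_row_regex(value, like_pattern);  // general fallback
}
```

In this sketch the fast path compiles only the constant classifier, once, and each row pays for one match of a short pattern string plus a substring search. The general path pays for a fresh compilation per row. Whether RE2 additionally caches compiled patterns when it is handed a string is a separate question that this sketch does not answer.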
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org