gortiz commented on PR #8818: URL: https://github.com/apache/pinot/pull/8818#issuecomment-1154813143
> Lastly, why do you say that indices don't help for single regexes? Unless I am misreading the benchmark result, even for a single regex case, native index is pretty fast?

My benchmark may not be the best and it only measures a few regexp cases, but I wouldn't say that native gives us a significant advantage:

```
-- this actually has a single, quite complex regex expression
BenchmarkFuseRegexp.decreasing9Fusing    LUCENE  avgt  5  24.198 ± 13.439  ms/op
BenchmarkFuseRegexp.decreasing9Fusing    NATIVE  avgt  5   0.263 ±  0.019  ms/op  <-- here native is faster by 1.8x
BenchmarkFuseRegexp.decreasing9Fusing      null  avgt  5   0.476 ±  0.040  ms/op

-- this actually has a single, quite complex regex expression
BenchmarkFuseRegexp.increasing10Fusing   LUCENE  avgt  5   ERROR
BenchmarkFuseRegexp.increasing10Fusing   NATIVE  avgt  5   0.326 ±  0.138  ms/op  <-- here native is slower by 1.4x, although the difference is smaller than the error margin
BenchmarkFuseRegexp.increasing10Fusing     null  avgt  5   0.226 ±  0.062  ms/op

-- this is just a `where regexp_like(DOMAIN_NAMES, 'domain\d')`
BenchmarkFuseRegexp.optimal10            LUCENE  avgt  5   0.169 ±  0.066  ms/op
BenchmarkFuseRegexp.optimal10            NATIVE  avgt  5   0.136 ±  0.011  ms/op  <-- here native is faster by 1.1x, although the difference is smaller than the error margin
BenchmarkFuseRegexp.optimal10              null  avgt  5   0.149 ±  0.082  ms/op
```

So we have one case where native is 1.8x faster and others where it is equivalent to the Java regex engine, which is known for its poor performance. I mean, it is not like the native index is atrocious, but I would expect more from an index than a situational 1.8x performance increase.

To be clear, as I already said, this is not a benchmark of the FST itself and it is very narrow, so I don't want to draw conclusions about whether or not we should use these indexes.
I think this benchmark shows that there are very specific cases where these indexes are not very useful, so it would be worth investing time in designing proper tests with different kinds of regexes to verify whether FST indexes are generally useful for evaluating regexes.
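
For anyone who wants to double-check the ratios quoted in the benchmark annotations, a minimal sketch follows. It assumes the `±` value in the JMH output can be treated as a symmetric margin around the score, and it uses a simple interval-overlap check as a rough stand-in for "the difference is smaller than the error margin". The `Result` class and the helper names are hypothetical and are not part of the benchmark code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One benchmark row: average time and its reported +/- margin, in ms/op."""
    score: float
    error: float

    @property
    def low(self) -> float:
        return self.score - self.error

    @property
    def high(self) -> float:
        return self.score + self.error


def speedup(baseline: Result, candidate: Result) -> float:
    """How many times faster the candidate is than the baseline (>1 means faster)."""
    return baseline.score / candidate.score


def intervals_overlap(a: Result, b: Result) -> bool:
    """True when the two score +/- error ranges share at least one point."""
    return a.low <= b.high and b.low <= a.high


# (no index, NATIVE index) pairs copied from the benchmark output above.
RESULTS = {
    "decreasing9Fusing": (Result(0.476, 0.040), Result(0.263, 0.019)),
    "increasing10Fusing": (Result(0.226, 0.062), Result(0.326, 0.138)),
    "optimal10": (Result(0.149, 0.082), Result(0.136, 0.011)),
}

if __name__ == "__main__":
    for name, (no_index, native) in RESULTS.items():
        ratio = speedup(no_index, native)
        within_noise = intervals_overlap(no_index, native)
        print(f"{name}: native speedup {ratio:.2f}x, within error margin: {within_noise}")
```

Under these assumptions, only `decreasing9Fusing` has non-overlapping ranges. The other two differences cannot be distinguished from noise with five iterations.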