richardstartin commented on PR #8818: URL: https://github.com/apache/pinot/pull/8818#issuecomment-1154343989
> > > > > Thanks @atris for taking a look. I also have a question around how Lucene and native FST handle the query, specifically for these results:
> > > > > ```
> > > > > SELECT INT_COL FROM MyTable WHERE regexp_like(DOMAIN_NAMES, '(?:^.*domain9.*$)|...|(?:^.*domain1.*$)')
> > > > > BenchmarkFuseRegexp.decreasing9Fusing   LUCENE  avgt  5  24.198 ± 13.439  ms/op
> > > > > BenchmarkFuseRegexp.decreasing9Fusing   NATIVE  avgt  5   0.263 ±  0.019  ms/op
> > > > > BenchmarkFuseRegexp.decreasing9Fusing   null    avgt  5   0.476 ±  0.040  ms/op
> > > > >
> > > > > SELECT INT_COL FROM MyTable WHERE regexp_like(DOMAIN_NAMES, '(?:^.*domain0.*$)|...|(?:^.*domain9.*$)')
> > > > > BenchmarkFuseRegexp.increasing10Fusing  LUCENE  avgt  5  ERROR
> > > > > BenchmarkFuseRegexp.increasing10Fusing  NATIVE  avgt  5   0.326 ±  0.138  ms/op
> > > > > BenchmarkFuseRegexp.increasing10Fusing  null    avgt  5   0.226 ±  0.062  ms/op
> > > > > ```
> > > > >
> > > > > We can see that Lucene FST is very slow in this case, but native is still performing okay. Trying to understand what is causing this difference.
> > > >
> > > > The difference will be due to determinization of the automaton, which the native implementation doesn’t do.
> > >
> > > I doubt determinization alone can cause such a massive difference.
> >
> > Well, it’s an exponential time algorithm and Lucene refuses to even attempt to determinize some of the automata in the benchmark because they are too large. For an apples to apples comparison, native determinization should be re-enabled (by temporarily reverting/reversing #8237)
>
> I am still failing to understand what the concern is.
>
> Even if its plain determinization that causes native to be significantly faster, is that not a good thing?

@Jackie-Jiang was asking about the root cause of the difference. I suggested that the root cause is likely to be determinization, but it needs to be investigated.
Where has concern been expressed?
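
For context on the "exponential time" point in the quoted thread, here is a minimal sketch of how determinization (subset construction) can blow up in the worst case. It is a textbook illustration only: it uses neither Lucene nor the Pinot native FST code, the automaton is the standard "n-th symbol from the end is `a`" example rather than the benchmark's regex, and the names `build_nfa` and `count_dfa_states` are hypothetical. Whether the benchmark patterns actually hit this kind of growth is the part that still needs to be investigated.

```python
from collections import deque


def build_nfa(n):
    """Transitions of an NFA over {a, b} accepting strings whose n-th
    symbol from the end is 'a'. It has n + 1 states (0..n); state n accepts."""
    # State 0 loops on every symbol and, on 'a', also guesses that this
    # 'a' is the n-th symbol from the end.
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    # States 1..n-1 just count the remaining symbols.
    for i in range(1, n):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta


def count_dfa_states(delta, start=0, alphabet="ab"):
    """Run the subset construction and return the number of reachable
    DFA states (each DFA state is a set of NFA states)."""
    start_set = frozenset({start})
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            target = frozenset(
                t for s in current for t in delta.get((s, symbol), ())
            )
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return len(seen)


if __name__ == "__main__":
    # NFA size grows linearly, reachable DFA size doubles with each step.
    for n in range(1, 11):
        print(f"NFA states: {n + 1:2d}  DFA states: {count_dfa_states(build_nfa(n))}")
```

An implementation that caps the work it will spend on determinization would give up on inputs like this once the cap is exceeded, which would be consistent with the `ERROR` row in the quoted results, though that link is an assumption until it is measured.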