[GitHub] [pinot] richardstartin commented on pull request #8818: regexp_like fusing

GitBox Mon, 13 Jun 2022 12:26:30 -0700


richardstartin commented on PR #8818:
URL: https://github.com/apache/pinot/pull/8818#issuecomment-1154343989


   > > > > > Thanks @atris for taking a look. I also have a question around how Lucene and native FST handle the query, specifically for these results:
   > > > > > ```
   > > > > > SELECT INT_COL FROM MyTable WHERE regexp_like(DOMAIN_NAMES, '(?:^.*domain9.*$)|...|(?:^.*domain1.*$)')
   > > > > > BenchmarkFuseRegexp.decreasing9Fusing       LUCENE  avgt    5    24.198 ±   13.439  ms/op
   > > > > > BenchmarkFuseRegexp.decreasing9Fusing       NATIVE  avgt    5     0.263 ±    0.019  ms/op
   > > > > > BenchmarkFuseRegexp.decreasing9Fusing         null  avgt    5     0.476 ±    0.040  ms/op
   > > > > > 
   > > > > > SELECT INT_COL FROM MyTable WHERE regexp_like(DOMAIN_NAMES, '(?:^.*domain0.*$)|...|(?:^.*domain9.*$)')
   > > > > > BenchmarkFuseRegexp.increasing10Fusing      LUCENE  avgt    5     ERROR
   > > > > > BenchmarkFuseRegexp.increasing10Fusing      NATIVE  avgt    5     0.326 ±    0.138  ms/op
   > > > > > BenchmarkFuseRegexp.increasing10Fusing        null  avgt    5     0.226 ±    0.062  ms/op
   > > > > > ```
   > > > > > 
   > > > > > We can see that Lucene FST is very slow in this case, but native is still performing okay. Trying to understand what is causing this difference.
   > > > > 
   > > > > 
   > > > > The difference will be due to determinization of the automaton, which the native implementation doesn’t do.
   > > > 
   > > > 
   > > > I doubt determinization alone can cause such a massive difference.
   > > 
   > > 
   > > Well, it’s an exponential time algorithm and Lucene refuses to even attempt to determinize some of the automata in the benchmark because they are too large. For an apples-to-apples comparison, native determinization should be re-enabled (by temporarily reverting/reversing #8237).
   > 
   > I am still failing to understand what the concern is.
   > 
   > Even if it's plain determinization that causes native to be significantly faster, is that not a good thing?
   
   @Jackie-Jiang was asking about the root cause of the difference. I suggested 
that the root cause is likely to be determinization, but it needs to be 
investigated. Where has concern been expressed?
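
To illustrate why determinization is worst-case exponential, here is a minimal sketch of the textbook subset construction. It is not Lucene's or Pinot's code, and the automaton is not one of the benchmark regexes: it is the standard example "the n-th symbol from the end is 'a'" over the alphabet {a, b}, chosen because its NFA has n + 1 states while the equivalent DFA needs 2^n. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; not part of Pinot or Lucene.
class SubsetConstructionDemo {

  // NFA for "(a|b)*a(a|b)^(n-1)" with states 0..n (state n is accepting):
  //   state 0 loops on 'a' and 'b', and also moves to state 1 on 'a';
  //   state i (1 <= i < n) moves to state i + 1 on either symbol;
  //   state n has no outgoing transitions.
  // A set of NFA states is encoded as a bitmask (bit i set = state i present),
  // so n must stay below 63.
  static long step(long states, char symbol, int n) {
    long next = 0L;
    for (int i = 0; i <= n; i++) {
      if ((states & (1L << i)) == 0) {
        continue;
      }
      if (i == 0) {
        next |= 1L; // self-loop on state 0
        if (symbol == 'a') {
          next |= 1L << 1; // nondeterministic guess: this 'a' is n-th from the end
        }
      } else if (i < n) {
        next |= 1L << (i + 1);
      }
    }
    return next;
  }

  // Subset construction: breadth-first search over reachable sets of NFA states.
  // Each distinct reachable set becomes one DFA state.
  static int countDfaStates(int n) {
    Set<Long> seen = new HashSet<>();
    Deque<Long> queue = new ArrayDeque<>();
    seen.add(1L); // start set {0}
    queue.add(1L);
    while (!queue.isEmpty()) {
      long current = queue.poll();
      for (char symbol : new char[] {'a', 'b'}) {
        long next = step(current, symbol, n);
        if (seen.add(next)) {
          queue.add(next);
        }
      }
    }
    return seen.size();
  }

  public static void main(String[] args) {
    for (int n = 1; n <= 12; n++) {
      System.out.printf("NFA states: %d, DFA states: %d%n", n + 1, countDfaStates(n));
    }
  }
}
```

Running `main` shows the DFA state count doubling each time one state is added to the NFA. Whether the fused regexes in the benchmark hit anything like this worst case is exactly what still needs to be investigated.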


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


