gortiz commented on PR #8818:
URL: https://github.com/apache/pinot/pull/8818#issuecomment-1154813143

   > Lastly, why do you say that indices don't help for single regexes? Unless I am misreading the benchmark result, even for a single regex case, native index is pretty fast?
   
   My benchmark may not be the best, and it only measures a few regexp cases, but I wouldn't say that native gives us a significant advantage:
   
   ```
   -- this actually has a single, quite complex regex expression
   BenchmarkFuseRegexp.decreasing9Fusing       LUCENE  avgt    5    24.198 ± 13.439  ms/op
   BenchmarkFuseRegexp.decreasing9Fusing       NATIVE  avgt    5     0.263 ±  0.019  ms/op  <-- here native is 1.8x faster than null
   BenchmarkFuseRegexp.decreasing9Fusing         null  avgt    5     0.476 ±  0.040  ms/op
   -- this actually has a single, quite complex regex expression
   BenchmarkFuseRegexp.increasing10Fusing      LUCENE  avgt    5     ERROR
   BenchmarkFuseRegexp.increasing10Fusing      NATIVE  avgt    5     0.326 ±  0.138  ms/op  <-- here native is 1.4x slower than null, although the difference is within the error margin
   BenchmarkFuseRegexp.increasing10Fusing        null  avgt    5     0.226 ±  0.062  ms/op
   -- this is just a `where regexp_like(DOMAIN_NAMES, 'domain\d')`
   BenchmarkFuseRegexp.optimal10               LUCENE  avgt    5     0.169 ±  0.066  ms/op
   BenchmarkFuseRegexp.optimal10               NATIVE  avgt    5     0.136 ±  0.011  ms/op  <-- here native is 1.1x faster than null, although the difference is within the error margin
   BenchmarkFuseRegexp.optimal10                 null  avgt    5     0.149 ±  0.082  ms/op
   ```
   
   So we have one case where native is 1.8x faster and others where it is equivalent to the Java regex engine, which is known for its poor performance. I mean, the native index is not atrocious, but I would expect more from an index than a situational 1.8x speedup.
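   The ratios and the "within the error margin" remarks can be checked directly from the JMH averages. This is a minimal sketch, not part of the benchmark code in this PR: it hard-codes the NATIVE and null results from the output, computes null/NATIVE (above 1 means NATIVE is faster) and treats two results as indistinguishable when their `avg ± error` intervals overlap. The names `SpeedupCheck`, `speedup` and `overlaps` are hypothetical, and interval overlap is only a rough heuristic, not a formal significance test.

   ```java
   import java.util.List;

   public class SpeedupCheck {
       // One JMH result: average time in ms/op and the reported error (half-width of the interval).
       record Score(double avg, double err) {
           double lo() { return avg - err; }
           double hi() { return avg + err; }
       }

       // baseline.avg / candidate.avg: a value above 1 means the candidate is faster than the baseline.
       static double speedup(Score baseline, Score candidate) {
           return baseline.avg() / candidate.avg();
       }

       // Rough heuristic: true when the two [avg - err, avg + err] intervals overlap.
       static boolean overlaps(Score a, Score b) {
           return a.lo() <= b.hi() && b.lo() <= a.hi();
       }

       public static void main(String[] args) {
           record Row(String name, Score nullIdx, Score nativeIdx) {}
           // Numbers copied from the benchmark output in this comment (null vs NATIVE).
           List<Row> rows = List.of(
               new Row("decreasing9Fusing", new Score(0.476, 0.040), new Score(0.263, 0.019)),
               new Row("increasing10Fusing", new Score(0.226, 0.062), new Score(0.326, 0.138)),
               new Row("optimal10", new Score(0.149, 0.082), new Score(0.136, 0.011)));
           for (Row r : rows) {
               System.out.printf("%-20s speedup=%.2fx overlap=%b%n",
                   r.name(), speedup(r.nullIdx(), r.nativeIdx()), overlaps(r.nullIdx(), r.nativeIdx()));
           }
       }
   }
   ```

   Running it prints 1.81x with no overlap for decreasing9Fusing, 0.69x (native roughly 1.4x slower) with overlapping intervals for increasing10Fusing, and 1.10x with overlapping intervals for optimal10.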
   
   To be clear, as I already said, this is not a benchmark of the FST itself and it is very narrow, so I don't want to draw conclusions about whether we should or should not use these indexes. I think this benchmark shows that there are very specific cases where these indexes are not very useful, and therefore it would be worth investing time in designing proper tests with different kinds of regexes to verify whether FST indexes are generally useful for evaluating regexes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

