richardstartin opened a new pull request #8097:
URL: https://github.com/apache/pinot/pull/8097


   I noticed that the FST Like benchmark results vary a lot:
   
   ```
   Benchmark                                                 Mode  Cnt  Score   Error  Units
   BenchmarkNativeAndLuceneBasedLike.testLuceneBasedFSTLike  avgt   25  3.127 ± 0.578   s/op
   BenchmarkNativeAndLuceneBasedLike.testNativeBasedFSTLike  avgt   25  3.440 ± 0.555   s/op
   ```
   
   When running these benchmarks with `-prof jfr` to figure out what they were actually measuring, I noticed a couple of things:
   
   1. Most of the time is spent parsing SQL and creating the query context
   <img width="1530" alt="Screenshot 2022-01-31 at 17 54 48" 
src="https://user-images.githubusercontent.com/16439049/151846946-0b3aa09b-a0ba-43bb-af13-a120406e311e.png";>
   2. Because the benchmark doesn't properly segregate the FST types or clean up after itself (it uses a TestNG `AfterClass` annotation, which JMH knows nothing about), `testNativeBasedFSTLike` sometimes measures the Lucene implementation; when it does measure the native implementation, the SQL parsing frames are much narrower relative to the construction of the filter operator.
   
   <img width="1546" alt="Screenshot 2022-01-31 at 17 58 54" 
src="https://user-images.githubusercontent.com/16439049/151847574-9073b823-d2ea-4f03-82c8-84e33aca94e2.png";>
   
   After a couple of changes to use the proper JMH lifecycle and to factor SQL parsing out into the setup, most of the benchmark time is spent constructing the filter operator:
   
   Lucene:
   <img width="1509" alt="Screenshot 2022-01-31 at 17 48 12" 
src="https://user-images.githubusercontent.com/16439049/151845914-9b5440f8-c643-43e1-b873-6fd2e031fcbd.png";>
   Native:
   <img width="1540" alt="Screenshot 2022-01-31 at 17 48 48" 
src="https://user-images.githubusercontent.com/16439049/151846014-a8d5cd66-d61d-419d-8f5e-d0c72807dc4d.png";>
   
   The results are more stable and tell a different story, which should help 
drive future improvement in this space:
   
   ```
   Benchmark                                    (_fstType)  (_intBaseValue)  (_numRows)                                                                  (_query)  Mode  Cnt    Score    Error  Units
   BenchmarkNativeAndLuceneBasedLike.query          LUCENE             1000     2500000  SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE '%domain%'  avgt   25   65.626 ±  1.454  us/op
   BenchmarkNativeAndLuceneBasedLike.query          NATIVE             1000     2500000  SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE '%domain%'  avgt   25  232.908 ± 17.302  us/op
   ```
   
   Future improvements may include iterating over more than one block.

