siddharthteotia opened a new pull request #6314:
URL: https://github.com/apache/incubator-pinot/pull/6314


   For very large queries (several hundred lines, 50KB+ per query), 
@mayankshriv noticed the overhead of `SqlNode.toString` when converting 
predicates to literal expressions in `getLiteralExpression`. See the flamegraph 
profile below. The ratio of time spent in Calcite internal code 
(`SqlParser.parseQuery`) to time spent in our code (`toExpression`) is 1 : 1.
   
   ![Screen Shot 2020-12-03 at 10 04 34 
AM](https://user-images.githubusercontent.com/2150694/101079851-e2025880-355c-11eb-9340-584ed986207f.png)
   
   In the PR https://github.com/apache/incubator-pinot/pull/6258/ by 
@fx19880617 , this code was optimized to avoid the overhead of 
`SqlNode.toString` and `replaceAll` by using 
`SqlNode.toValue().replace("''", "'")`. This helped significantly by shifting 
the overhead more towards Calcite internal code. The ratio changed to 4 : 1, as 
seen in the flamegraph below. But `String.replace()` was still taking 
noticeable time for our huge queries, where the replacement is usually not 
needed at all. Internally (in the JDK 8 implementation) it compiles a literal 
regex `Pattern` to find each occurrence of `''` in the string literal and 
replace it with `'`, even when there is nothing to replace.
   
   ![Screen Shot 2020-12-03 at 10 36 47 
AM](https://user-images.githubusercontent.com/2150694/101080646-e2e7ba00-355d-11eb-918a-3808b05344e9.png)
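
   To make the cost concrete, here is a minimal sketch of what the JDK 8 
`String.replace(CharSequence, CharSequence)` does under the hood. The class and 
method names (`RegexStyleReplace`, `regexStyleReplace`) are hypothetical and 
only illustrate the behaviour. The assumption is a JDK 8 runtime; later JDK 
releases are understood to have reimplemented `String.replace` without regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class RegexStyleReplace {
    private RegexStyleReplace() {}

    // Mirrors the JDK 8 behaviour of String.replace(target, replacement):
    // a Pattern is compiled and a Matcher is created on every call,
    // even when the target never occurs in the text.
    static String regexStyleReplace(String text, String target, String replacement) {
        return Pattern.compile(target, Pattern.LITERAL)
                .matcher(text)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Typical literal: no doubled quote, yet the regex machinery still runs.
        System.out.println(regexStyleReplace("plain literal", "''", "'"));
        // Literal containing an escaped quote.
        System.out.println(regexStyleReplace("it''s", "''", "'"));
    }
}
```

   For a 50KB+ query with many literals, this per-literal setup cost adds up 
even though almost none of the literals contain `''`.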
   
   In this PR, we use `StringUtils.replace`, since it first uses `indexOf` to 
check whether the search string (`''`) is present in the literal. If not, it 
returns the input immediately. If present, it builds the result string without 
regex. This shifts the overhead almost entirely towards Calcite internal code: 
the ratio changes to 10 : 1, as seen in the flamegraph, where the box for 
`SqlParser.parseQuery` widens.
   
   ![Screen Shot 2020-12-03 at 10 38 20 
AM](https://user-images.githubusercontent.com/2150694/101081838-74a3f700-355f-11eb-9267-0f71240e37ee.png)
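
   The `indexOf`-first strategy described above can be sketched as follows. 
This is a hand-written illustration, not the Apache Commons Lang source and not 
the code in this PR; the names `LiteralUnescape`, `replaceLiteral` and 
`unescapeQuotes` are hypothetical.

```java
public final class LiteralUnescape {
    private LiteralUnescape() {}

    // Replaces every non-overlapping occurrence of `search` in `text`
    // without regex, returning `text` itself when there is nothing to do.
    static String replaceLiteral(String text, String search, String replacement) {
        if (text == null || text.isEmpty()
                || search == null || search.isEmpty()
                || replacement == null) {
            return text;
        }
        int end = text.indexOf(search);
        if (end == -1) {
            // Fast path: the common case for huge queries, no allocation at all.
            return text;
        }
        StringBuilder sb = new StringBuilder(text.length());
        int start = 0;
        while (end != -1) {
            // Copy the chunk before the match, then the replacement.
            sb.append(text, start, end).append(replacement);
            start = end + search.length();
            end = text.indexOf(search, start);
        }
        // Copy whatever follows the last match.
        sb.append(text, start, text.length());
        return sb.toString();
    }

    // Collapses SQL-escaped single quotes ('') into a single quote (').
    static String unescapeQuotes(String literal) {
        return replaceLiteral(literal, "''", "'");
    }

    public static void main(String[] args) {
        System.out.println(unescapeQuotes("it''s"));
        System.out.println(unescapeQuotes("no quotes here"));
    }
}
```

   When the literal has no `''`, the same `String` instance is returned, so the 
only work done is a single scan of the literal.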

