jitendrakr88 commented on code in PR #15498:
URL: https://github.com/apache/pinot/pull/15498#discussion_r2047578237


##########
pinot-common/src/main/java/org/apache/pinot/sql/parsers/ParserUtils.java:
##########
@@ -39,6 +39,49 @@ public static void validateFunction(String canonicalName, List<Expression> opera
     }
   }
 
+  /**
+   * Sanitize the sql string for parsing by normalizing whitespace which can
+   * cause performance issues with regex based parsing.
+   * This method replaces multiple consecutive whitespace characters with a single space.
+   *
+   * @param sql The raw SQL string to sanitize. May be null.
+   * @return A sanitized SQL string with normalized whitespace and no trailing spaces,
+   *         or {@code null} if the input was {@code null}.
+   */
+  public static String sanitizeSqlForParsing(String sql) {
+
+    // 1. Remove excessive whitespace
+
+    int length = sql.length();
+    StringBuilder builder = new StringBuilder(length);
+    boolean inWhitespaceBlock = false;
+
+    for (int charIndex = 0; charIndex < length; charIndex++) {

Review Comment:
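   @Jackie-Jiang I think this approach has many edge cases that would eventually push us into complex parsing logic.

   Example (traversing backwards is likely to have no effect here, because the long whitespace run is followed by a comment rather than sitting at the end of the string):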
   ```
   "SELECT * FROM t " + " ".repeat(20000) + "  /* comment */ ";
   ```
   
   Given this, I think the earlier approach of parsing character by character 
might be more robust. It's also battle-tested with heavy production traffic and 
has been running reliably within Uber for a while.
   
   Let me know if that makes sense. I'm happy to revert to the original implementation, and you can take your time to review it at your convenience.
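
For reference, the character-by-character approach mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the class name `WhitespaceSketch` and method name `collapseWhitespace` are hypothetical, and the sketch assumes that dropping leading whitespace and collapsing whitespace inside string literals and comments are both acceptable, which may not hold for real queries.

```java
// Hypothetical sketch of a single-pass whitespace normalizer; not the PR's implementation.
final class WhitespaceSketch {
  private WhitespaceSketch() {
  }

  /**
   * Collapses every run of whitespace into a single space and drops
   * leading and trailing whitespace. Returns null for null input.
   * Assumption: whitespace inside literals and comments is collapsed too.
   */
  static String collapseWhitespace(String sql) {
    if (sql == null) {
      return null;
    }
    int length = sql.length();
    StringBuilder builder = new StringBuilder(length);
    boolean inWhitespaceBlock = false;

    for (int charIndex = 0; charIndex < length; charIndex++) {
      char c = sql.charAt(charIndex);
      if (Character.isWhitespace(c)) {
        // Remember that a run is in progress; emit nothing yet.
        inWhitespaceBlock = true;
        continue;
      }
      // Emit one space for the preceding run, but never at the very start.
      if (inWhitespaceBlock && builder.length() > 0) {
        builder.append(' ');
      }
      inWhitespaceBlock = false;
      builder.append(c);
    }
    // A trailing run is never flushed, so the result has no trailing space.
    return builder.toString();
  }
}
```

With this sketch, the example input above would reduce to `SELECT * FROM t /* comment */` in one forward pass, regardless of where the long whitespace run sits in the string.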



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

