Jackie-Jiang commented on code in PR #16147:
URL: https://github.com/apache/pinot/pull/16147#discussion_r2162298096


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/text/LuceneTextIndexReader.java:
##########
@@ -161,6 +161,15 @@ public ImmutableRoaringBitmap getDictIds(String 
searchQuery) {
 
   @Override
   public MutableRoaringBitmap getDocIds(String searchQuery) {
+    java.util.Map.Entry<String, java.util.Map<String, String>> result =

Review Comment:
   (minor) Import `java.util.Map`



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/text/LuceneTextIndexReader.java:
##########
@@ -187,6 +196,23 @@ public MutableRoaringBitmap getDocIds(String searchQuery) {
       throw new RuntimeException(msg, e);
     }
   }
+
+  private MutableRoaringBitmap getDocIdsWithOptions(String actualQuery, 
Map<String, String> options) {
+    Query query = 
LuceneTextIndexUtils.createQueryParserWithOptions(actualQuery, options, 
_column, _analyzer);
+
+    MutableRoaringBitmap docIds = new MutableRoaringBitmap();
+    Collector docIDCollector = new LuceneDocIdCollector(docIds, 
_docIdTranslator);

Review Comment:
   (minor)
   ```suggestion
       Collector docIdCollector = new LuceneDocIdCollector(docIds, 
_docIdTranslator);
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/text/MultiColumnLuceneTextIndexReader.java:
##########
@@ -351,6 +351,33 @@ public MutableRoaringBitmap getDocIds(String searchQuery) {
 
   @Override
   public MutableRoaringBitmap getDocIds(String column, String searchQuery) {
+    java.util.Map.Entry<String, java.util.Map<String, String>> result =

Review Comment:
   Same for this class



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/text/LuceneTextIndexReader.java:
##########
@@ -161,6 +161,15 @@ public ImmutableRoaringBitmap getDictIds(String 
searchQuery) {
 
   @Override
   public MutableRoaringBitmap getDocIds(String searchQuery) {
+    java.util.Map.Entry<String, java.util.Map<String, String>> result =
+        LuceneTextIndexUtils.parseOptionsFromSearchString(searchQuery);
+    if (result != null) {
+      return getDocIdsWithOptions(result.getKey(), result.getValue());
+    }
+    return getDocIdsWithoutOptions(searchQuery);
+  }
+
+  private MutableRoaringBitmap getDocIdsWithoutOptions(String searchQuery) {
     MutableRoaringBitmap docIds = new MutableRoaringBitmap();
     Collector docIDCollector = new LuceneDocIdCollector(docIds, 
_docIdTranslator);
     try {

Review Comment:
   Consider moving this logic also into the utils class



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/LuceneTextIndexUtils.java:
##########
@@ -72,4 +81,202 @@ public static Query convertToMultiTermSpanQuery(Query 
query) {
     LOGGER.debug("The phrase query {} is re-written as {}", query, 
spanNearQuery);
     return spanNearQuery;
   }
+
+  /**
+   * Parses search options from a search query string.
+   * The options can be specified anywhere in the query using __OPTIONS(...) 
syntax.
+   * For example: "term1 __OPTIONS(parser=CLASSIC) AND term2 
__OPTIONS(analyzer=STANDARD)"
+   * Options from later occurrences will override earlier ones.
+   *
+   * @param searchQuery The search query string
+   * @return A Map.Entry containing the cleaned search term and options map, 
or null if no options found
+   */
+  public static Map.Entry<String, Map<String, String>> 
parseOptionsFromSearchString(String searchQuery) {
+    try {
+      // Early check for __OPTIONS to avoid unnecessary processing
+      if (searchQuery == null || !searchQuery.contains("__OPTIONS")) {
+        return null;
+      }
+
+      // Pattern to match __OPTIONS(...) with word boundaries
+      // (?<!\\\\) - negative lookbehind to avoid escaped __OPTIONS
+      // \\b - word boundary to ensure __OPTIONS is standalone
+      // __OPTIONS\\((.*?)\\) - non-greedy match for content inside parentheses
+      Pattern pattern = Pattern.compile("(?<!\\\\)\\b__OPTIONS\\((.*?)\\)");

Review Comment:
   Is this regexp match expensive? We have run into similar performance issue 
before on query option. I'd also suggest keeping the options as a separate 
argument. You can find example of passing parameters as separate argument in 
some complex aggregate functions, such as 
`DistinctCountOffHeapAggregationFunction`



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/LuceneTextIndexUtils.java:
##########
@@ -72,4 +81,202 @@ public static Query convertToMultiTermSpanQuery(Query 
query) {
     LOGGER.debug("The phrase query {} is re-written as {}", query, 
spanNearQuery);
     return spanNearQuery;
   }
+
+  /**
+   * Parses search options from a search query string.
+   * The options can be specified anywhere in the query using __OPTIONS(...) 
syntax.
+   * For example: "term1 __OPTIONS(parser=CLASSIC) AND term2 
__OPTIONS(analyzer=STANDARD)"
+   * Options from later occurrences will override earlier ones.
+   *
+   * @param searchQuery The search query string
+   * @return A Map.Entry containing the cleaned search term and options map, 
or null if no options found
+   */
+  public static Map.Entry<String, Map<String, String>> 
parseOptionsFromSearchString(String searchQuery) {

Review Comment:
   (minor) Annotate it with `@Nullable`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to