chenboat opened a new pull request, #12680:
URL: https://github.com/apache/pinot/pull/12680

   Pinot's TEXT_MATCH filter does not currently support phrase search with 
wildcard and prefix matching directly (e.g., "*apache pino*" to match "Apache 
Pinot"). This kind of query is very common in use cases such as log search, 
where users need to find matches in long text. Today one has to use external 
means to work around this limitation (e.g., concatenating all the words in a 
paragraph into one long string and running a regex query on it), which usually 
incurs much higher query latency. 
   
   This PR adds support for phrase search with wildcard and prefix matching on 
Lucene-indexed tables. The feature is enabled through a config on the Lucene 
text-indexed column; it defaults to false (not enabled). Once it is enabled, 
users can write a text match function to perform the filter, e.g. 
text_match(col, '*apache pino*').
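
   To make the intended matching semantics concrete, here is a minimal sketch in plain Python. It is not the Lucene-based implementation in this PR: the function name `phrase_wildcard_match` is hypothetical, and lowercasing plus whitespace tokenization is an assumption standing in for whatever analyzer the text index actually uses. It only illustrates the idea that the terms of the phrase must match consecutive tokens in order, and that each term may carry `*` wildcards.

```python
from fnmatch import fnmatchcase


def phrase_wildcard_match(text: str, pattern: str) -> bool:
    """Return True if consecutive tokens of `text` match the terms of `pattern`.

    Hypothetical helper for illustration only. Each whitespace-separated
    term in `pattern` may contain `*` wildcards. Matching is made
    case-insensitive by lowercasing both sides (an assumption; a real
    text index relies on its configured analyzer instead).
    """
    tokens = text.lower().split()
    terms = pattern.lower().split()
    if not terms:
        return False
    # Slide a window of len(terms) over the tokens; every term must match
    # the token at the same offset, so phrase order is preserved.
    for start in range(len(tokens) - len(terms) + 1):
        window = tokens[start:start + len(terms)]
        if all(fnmatchcase(tok, term) for tok, term in zip(window, terms)):
            return True
    return False
```

   Under these assumptions, `phrase_wildcard_match("Welcome to Apache Pinot docs", "*apache pino*")` is true because "apache" matches `*apache` and the next token "pinot" matches `pino*`, while the same pattern does not match "Pinot Apache" since the terms appear in the wrong order.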
   
   We have tested this feature in our internal environment, where it can 
process 100+ GB of text data in 5 seconds on a single server. It is much more 
performant than the alternatives (up to 3x faster in phrase matching tests).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

