chenboat opened a new pull request, #12680: URL: https://github.com/apache/pinot/pull/12680
Pinot's TEXT_MATCH filter does not currently support phrase search combined with wildcard and prefix matching (e.g., "*apache pino*" to match "Apache Pinot"). This kind of query is very common in use cases such as log search, where users need to find matches in long text. Today one has to work around the limitation by external means (e.g., concatenating all words in a paragraph into one long string and running a regex query on it), which usually incurs much higher query latency.

This PR adds support for phrase search with wildcard and prefix matching on Lucene-indexed tables. The feature is enabled through a config on the Lucene text-indexed column; it defaults to false (disabled). Once it is enabled, users can filter with the text match function, e.g. text_match(col, '*apache pino*').

We have tested this feature in our internal environment, where it processed 100+ GB of text data in 5 seconds on a single server. It is much more performant than the alternatives (up to 3x faster in phrase matching tests).

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
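To make the intended matching semantics from the PR description concrete, here is a minimal, dependency-free sketch of how a pattern such as '*apache pino*' could match "Apache Pinot". It is an illustration only and does not reflect how the PR or Lucene implements the feature. The class name WildcardPhraseSketch and its methods are hypothetical, and the tokenization (lowercasing, splitting on non-alphanumeric characters) is an assumption about the analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of wildcard phrase matching; not Pinot or Lucene code.
public class WildcardPhraseSketch {

    // Assumed analyzer behaviour: lowercase and split on non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Match one token against one phrase term; '*' is honoured only at either end.
    static boolean termMatches(String token, String term) {
        boolean leading = term.startsWith("*");
        boolean trailing = term.length() > 1 && term.endsWith("*");
        String core = term.substring(leading ? 1 : 0, term.length() - (trailing ? 1 : 0));
        if (leading && trailing) {
            return token.contains(core);
        }
        if (leading) {
            return token.endsWith(core);   // "*apache" matches tokens ending in "apache"
        }
        if (trailing) {
            return token.startsWith(core); // "pino*" matches tokens starting with "pino"
        }
        return token.equals(core);
    }

    // True if some run of consecutive tokens matches every phrase term in order.
    static boolean matches(String text, String pattern) {
        String trimmed = pattern.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return false;
        }
        String[] terms = trimmed.split("\\s+");
        List<String> tokens = tokenize(text);
        for (int start = 0; start + terms.length <= tokens.size(); start++) {
            boolean all = true;
            for (int i = 0; i < terms.length && all; i++) {
                all = termMatches(tokens.get(start + i), terms[i]);
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("Apache Pinot", "*apache pino*"));    // true
        System.out.println(matches("Pinot by Apache", "*apache pino*")); // false
    }
}
```

The second call returns false because the phrase terms must match adjacent tokens in order, which is what distinguishes a phrase query from a plain conjunction of wildcard terms.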
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org