itschrispeck opened a new pull request, #12339:
URL: https://github.com/apache/pinot/pull/12339

   **Motivation:** 
   Query performance against the Lucene index suffers when multiple 
`text_match` predicates are chained together. Our users often generate their 
queries programmatically, which exacerbates the issue, because tens or hundreds 
of `text_match` predicates can end up in a single query. 
   
   Because of this, users currently need to understand Pinot's Lucene 
implementation details to compose an efficient query. To remove this 
requirement, this PR adds a `TextMatchFilterOptimizer` that performs the 
optimization automatically. 
   
   **Summary:**
   This functionality is best understood through the unit test cases. In short: 
   - Merge the `text_match` operands of AND and OR expressions when possible, 
without affecting query accuracy
   - Push NOT down into Lucene, unless all `text_match` filters are inverted; 
in that case, the NOT expression remains in Pinot
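
   The two rules above can be illustrated with a minimal Java sketch. It is not the code in this PR: `TextMatchMergeSketch`, `TextMatch`, `merge`, and the string-based filter model are hypothetical. It assumes that operands are merged only when they target the same column (each text index covers a single column) and that the merged query uses Lucene's classic query-parser syntax. When every operand on a column is inverted, the sketch applies De Morgan's law and keeps a single NOT in Pinot, because a Lucene boolean query made only of negative clauses matches no documents.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TextMatchMergeSketch {

  /** Hypothetical model of text_match(column, 'query'), optionally wrapped in NOT. */
  record TextMatch(String column, String query, boolean negated) {
    String render() {
      String tm = "text_match(" + column + ", '" + query + "')";
      return negated ? "NOT " + tm : tm;
    }
  }

  /** Parenthesizes each Lucene sub-query and joins them with the given operator. */
  static String join(List<TextMatch> tms, String op) {
    return tms.stream()
        .map(t -> "(" + t.query() + ")")
        .collect(Collectors.joining(" " + op + " "));
  }

  /**
   * Rewrites the text_match operands of one AND/OR node (op is "AND" or "OR").
   * Returns the rewritten operands; the caller re-joins them with op.
   */
  static List<String> merge(String op, List<TextMatch> operands) {
    // Assumption: only operands on the same column can share one Lucene query.
    Map<String, List<TextMatch>> byColumn = new LinkedHashMap<>();
    for (TextMatch t : operands) {
      byColumn.computeIfAbsent(t.column(), c -> new ArrayList<>()).add(t);
    }
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, List<TextMatch>> e : byColumn.entrySet()) {
      String col = e.getKey();
      List<TextMatch> pos = e.getValue().stream().filter(t -> !t.negated()).toList();
      List<TextMatch> neg = e.getValue().stream().filter(TextMatch::negated).toList();

      if (pos.isEmpty()) {
        // All operands inverted: NOT a AND NOT b == NOT (a OR b), and vice versa.
        // The single NOT stays in Pinot, since a purely negative Lucene query matches nothing.
        String dual = op.equals("AND") ? "OR" : "AND";
        out.add(neg.size() == 1
            ? neg.get(0).render()
            : "NOT text_match(" + col + ", '" + join(neg, dual) + "')");
        continue;
      }
      if (pos.size() == 1 && neg.isEmpty()) {
        out.add(pos.get(0).render()); // nothing to merge on this column
        continue;
      }

      StringBuilder merged = new StringBuilder(join(pos, op));
      List<String> keptInPinot = new ArrayList<>();
      for (TextMatch n : neg) {
        if (op.equals("AND")) {
          // Under AND, push the NOT down into Lucene: +pos -neg.
          merged.append(" AND NOT (").append(n.query()).append(')');
        } else {
          // Under OR, Lucene's "x OR NOT y" means "x and not y", so keep the NOT in Pinot.
          keptInPinot.add(n.render());
        }
      }
      out.add("text_match(" + col + ", '" + merged + "')");
      out.addAll(keptInPinot);
    }
    return out;
  }

  public static void main(String[] args) {
    List<TextMatch> operands = List.of(
        new TextMatch("body", "foo", false),
        new TextMatch("body", "bar", false),
        new TextMatch("body", "baz", true),
        new TextMatch("title", "qux", true),
        new TextMatch("title", "quux", true));
    System.out.println(String.join(" AND ", merge("AND", operands)));
  }
}
```

   Running `main` prints `text_match(body, '(foo) AND (bar) AND NOT (baz)') AND NOT text_match(title, '(qux) OR (quux)')`. In this sketch, negated operands under a mixed OR are deliberately left in Pinot, because pushing them into Lucene would change the result set.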
   
   **Open question:**
   There is one edge case (that I can think of) where this optimization can 
hurt performance: when many predicates are ORed together (`text_match OR 
text_match OR text_match ...`), terminating early once `limit` is reached may 
take longer, since the entire merged `text_match` query must now complete. For 
this reason, it might be prudent to put the optimization behind a query option 
(or add a query option to disable it). Alternatively, the `LuceneDocIdCollector` 
could terminate early, but it doesn't have the required context. 
   
   Testing: unit tests (query performance was verified separately by running 
the optimized and unoptimized queries)
   
   tags: `feature`, `performance` (?)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
