jasperjiaguo opened a new issue, #9666: URL: https://github.com/apache/pinot/issues/9666
In `LuceneTextIndexCreator` we currently hardcode the stop words for the Lucene text index:

```
Arrays.asList("a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "than", "there", "these", "they", "this", "to", "was", "will", "with", "those"),
```

These words are pruned from both the indexed text during text index generation and the query filter (in `StandardAnalyzer`). The problem is that in production we found users issuing queries like `SELECT ... FROM ignoreMe WHERE TEXT_MATCH(title, '"IT staff" OR "IT manager"')`, which actually returns the results matching `TEXT_MATCH(title, '"staff" OR "manager"')`. This can be easily reproduced in `TextSearchQueriesTest`.

Meanwhile, there is a TODO item for making `LUCENE_INDEX_MAX_BUFFER_SIZE_MB` configurable. These two changes can be evaluated/made together.

cc @Jackie-Jiang @walterddr @siddharthteotia @SabrinaZhaozyf
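To illustrate why the phrase `"IT staff"` collapses to `"staff"`, here is a minimal sketch in plain Java that only models the effect of stop-word pruning. It does not use Lucene or Pinot code: the class `StopWordSketch` and its `analyze` method are hypothetical names, and the tokenization (lowercase, split on non-alphanumeric characters) is a simplifying assumption rather than the actual `StandardAnalyzer` behavior. Only the stop-word list is taken from this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not part of Pinot or Lucene.
class StopWordSketch {
  // Stop words copied from the hardcoded list quoted in this issue.
  static final Set<String> DEFAULT_STOP_WORDS = new HashSet<>(Arrays.asList(
      "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
      "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
      "their", "then", "than", "there", "these", "they", "this", "to", "was",
      "will", "with", "those"));

  // Simplified analysis: lowercase, split on non-alphanumerics, drop stop words.
  static List<String> analyze(String text, Set<String> stopWords) {
    List<String> tokens = new ArrayList<>();
    for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
      if (!raw.isEmpty() && !stopWords.contains(raw)) {
        tokens.add(raw);
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    // With the hardcoded list, "IT" is lowercased to "it" and pruned.
    System.out.println(analyze("IT staff", DEFAULT_STOP_WORDS));      // [staff]
    // With an empty (configurable) stop-word set, "it" survives.
    System.out.println(analyze("IT staff", Collections.emptySet()));  // [it, staff]
  }
}
```

Because lowercasing happens before the stop-word check, the acronym "IT" is indistinguishable from the pronoun "it" in this model, which is why a configurable stop-word set would let such queries keep the term.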