jackluo923 opened a new issue, #10865: URL: https://github.com/apache/pinot/issues/10865
Lucene strips stop words and symbols before indexing, but Pinot does not seem to do the same when running queries against a text index. As a result, a query such as `SELECT * FROM table WHERE text_match("col", '"function not in list"')` will not return the expected results if the words `not` and `in` are stop words that were stripped out during ingestion. A temporary workaround is to exclude all stop words in the index config.

Real examples:

1. Many results when we run `select json_data from sawmill_query_for_pinot_dca_special WHERE "json_data" LIKE '%could not exec%'`
   <img width="1243" alt="image" src="https://github.com/apache/pinot/assets/8986643/3ec8878f-7339-48fc-83f7-e731a6a42b96">
2. No results when we run `select * from sawmill_query_for_pinot_dca_special WHERE text_match("json_data", '"could not exec"') limit 10`
   <img width="1242" alt="image" src="https://github.com/apache/pinot/assets/8986643/eb67c11c-e3f8-4f1f-907a-26c4eed9a7c7">
3. Many results once all stop words are excluded in the index config and we run `select * from sawmill_query_for_pinot_dca_special WHERE text_match("json_data", '"could not exec"') limit 10`
   <img width="1245" alt="image" src="https://github.com/apache/pinot/assets/8986643/b8f35ea3-f76e-4821-8d10-073609b74b99">
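
The mismatch reported here can be reproduced with a toy model in Python. This is only a sketch under stated assumptions, not Pinot's or Lucene's actual code: the stop-word set is a small illustrative subset, and `tokenize`, `analyze`, `build_index`, and `phrase_match` are hypothetical helpers written for this example. It assumes stop words are dropped at index time while the remaining terms keep their original positions.

```python
import re

# Illustrative subset only; the real list depends on the analyzer configuration.
STOP_WORDS = frozenset({"not", "in", "the", "a", "is"})


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and return (position, term) pairs."""
    return list(enumerate(re.findall(r"[a-z0-9]+", text.lower())))


def analyze(text, stop_words=STOP_WORDS):
    """Drop stop words but keep each surviving term's original position."""
    return [(pos, term) for pos, term in tokenize(text) if term not in stop_words]


def build_index(doc, stop_words=STOP_WORDS):
    """Map each indexed term to the set of positions where it occurs."""
    index = {}
    for pos, term in analyze(doc, stop_words):
        index.setdefault(term, set()).add(pos)
    return index


def phrase_match(index, query_terms):
    """Return True if every query term occurs at the same relative offset."""
    if not query_terms:
        return False
    first_pos, first_term = query_terms[0]
    for start in index.get(first_term, ()):
        if all(
            start + (pos - first_pos) in index.get(term, ())
            for pos, term in query_terms[1:]
        ):
            return True
    return False


doc = "Error: could not exec the process"
index = build_index(doc)

# Query phrase left unanalyzed: "not" is looked up but was never indexed.
unanalyzed_hit = phrase_match(index, tokenize("could not exec"))

# Query phrase run through the same stop-word filter as the document.
analyzed_hit = phrase_match(index, analyze("could not exec"))

# Workaround from the report: index with no stop words at all.
workaround_hit = phrase_match(
    build_index(doc, stop_words=frozenset()), tokenize("could not exec")
)

print(unanalyzed_hit, analyzed_hit, workaround_hit)
```

In this model the unanalyzed phrase fails because `not` has no postings, while the phrase matches either when the query goes through the same filter as the document or when nothing is filtered at index time, which mirrors examples 2 and 3 above.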