jackluo923 opened a new issue, #10865:
URL: https://github.com/apache/pinot/issues/10865

   Lucene strips stop words and symbols before indexing, but Pinot does not 
appear to do the same when running queries on a text index. As a 
result, a query like:
   `SELECT * FROM table WHERE text_match("col", '"function not in list"')`
   will not return the expected result if the words `not` and `in` are stop 
words that were stripped out during ingestion. A temporary workaround is to 
exclude all stop words in the index config, so that none are stripped during 
ingestion. 
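
To illustrate the suspected mismatch, here is a toy Python model. It is not 
Pinot or Lucene code: the tokenizer, the two-word stop list, and the function 
names `index_document` and `phrase_match` are all hypothetical and only mimic 
the behavior described above.

```python
# Toy model of the suspected behavior; NOT Pinot or Lucene code.
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"not", "in"}


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def index_document(text, stop_words):
    """Index-time analysis: map position -> token, dropping stop words.

    Dropped stop words leave gaps in the position sequence.
    """
    return {pos: tok for pos, tok in enumerate(tokenize(text))
            if tok not in stop_words}


def phrase_match(index, phrase, query_stop_words):
    """Return True if the phrase terms occur at consecutive positions.

    Terms in query_stop_words are skipped (treated as position gaps),
    which models applying the same stop-word filter at query time.
    """
    terms = tokenize(phrase)
    last_pos = max(index, default=-1)
    for start in range(last_pos + 1):
        if all(term in query_stop_words or index.get(start + offset) == term
               for offset, term in enumerate(terms)):
            return True
    return False


doc = "function not in list of values"
phrase = "function not in list"

# Stop words stripped at ingestion, but NOT at query time: no match.
stripped_index = index_document(doc, STOP_WORDS)
mismatch = phrase_match(stripped_index, phrase, set())

# Same filter applied on both sides: the phrase matches.
consistent = phrase_match(stripped_index, phrase, STOP_WORDS)

# Workaround: nothing is treated as a stop word at ingestion.
full_index = index_document(doc, set())
workaround = phrase_match(full_index, phrase, set())
```

In this model the phrase only fails to match when the index and the query 
disagree about stop words, which is consistent with the workaround helping.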
   
   Real examples:
   1. Lots of results when we perform `select json_data from 
sawmill_query_for_pinot_dca_special WHERE "json_data" LIKE '%could not exec%'`
   <img width="1243" alt="image" 
src="https://github.com/apache/pinot/assets/8986643/3ec8878f-7339-48fc-83f7-e731a6a42b96">
   
   2. No results match when we perform `select * from 
sawmill_query_for_pinot_dca_special WHERE text_match("json_data", '"could not 
exec"') limit 10`
   <img width="1242" alt="image" 
src="https://github.com/apache/pinot/assets/8986643/eb67c11c-e3f8-4f1f-907a-26c4eed9a7c7">
   
   3. Lots of results when all stop words are excluded and we then perform 
`select * from sawmill_query_for_pinot_dca_special WHERE text_match("json_data", 
'"could not exec"') limit 10`
   <img width="1245" alt="image" 
src="https://github.com/apache/pinot/assets/8986643/b8f35ea3-f76e-4821-8d10-073609b74b99">
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

