jackluo923 commented on issue #10865:
URL: https://github.com/apache/pinot/issues/10865#issuecomment-1591126575

   The correct behavior is:
   1. If stop words are specified during ingestion, remove them from the query at query time.
   2. If all stop words are excluded, do not remove any stop words from the query.
   3. If a customized list of stop words is excluded, remove only that customized list of stop words from the query.
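One way to read these rules is that the query side should drop exactly the stop words the ingestion side applied. The Python sketch below illustrates that reading. It assumes the query path can see the stop-word set used at ingestion. `strip_query_stop_words` and its parameters are hypothetical names, not Pinot code, and whitespace splitting stands in for a real analyzer.

```python
def strip_query_stop_words(phrase: str, ingestion_stop_words: set) -> str:
    """Remove from a query phrase exactly the words ingestion treated as stop words.

    `ingestion_stop_words` is a hypothetical stand-in for the stop-word set the
    text index used at ingestion time (an assumption of this sketch), given in
    lowercase. Matching is case-insensitive; whitespace splitting replaces a
    real analyzer.
    """
    kept = [word for word in phrase.split() if word.lower() not in ingestion_stop_words]
    return " ".join(kept)


# Rule 1: stop words were removed at ingestion, so remove them from the query too.
print(strip_query_stop_words("query engines for analytics", {"for", "and"}))  # query engines analytics
# Rule 2: all stop words excluded -> empty set -> nothing is removed.
print(strip_query_stop_words("building and deploying", set()))  # building and deploying
# Rule 3: only the customized list should be removed, so pass just that list.
print(strip_query_stop_words("building and deploying for", {"for"}))  # building and deploying
```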
   
   For a concrete example, take the input from Pinot's [documentation](https://docs.pinot.apache.org/basics/indexing/text-search-support#resume-text) with the default text-index ingestion configs:
   > Distributed systems, Java, C++, Go, distributed query engines for analytics and data warehouses, Machine learning, spark, Kubernetes, transaction processing, Java, Python, C++, Machine learning, building and deploying large scale... CUDA, GPU processing, Tensor flow ...
   
   With the above input, the following query would return a match: 
   ```
   SELECT SKILLS_COL 
   FROM MyTable 
   WHERE TEXT_MATCH(SKILLS_COL, '"Machine learning" AND "gpu processing"')
   ```
   
   However, the following query returns no match for the same input, because its phrases contain the stop words `for` and `and`, which were removed from the index at ingestion time but are not removed from the query:
   ```
   SELECT SKILLS_COL 
   FROM MyTable 
   WHERE TEXT_MATCH(SKILLS_COL, '"query engines for analytics" AND "building and deploying"')
   ```
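To see why the mismatch breaks the second query, here is a toy Python model. It is not Pinot's or Lucene's actual code. A hypothetical analyzer lowercases the text, splits it on non-alphanumeric characters, and drops `for` and `and` at ingestion. A phrase then matches only if it appears as a contiguous run of indexed terms. The model ignores Lucene's term positions, so treat it as an illustration of the mismatch, not of Pinot's internals.

```python
import re

# Input text from the documentation example above (ellipses kept as in the source).
DOC = ("Distributed systems, Java, C++, Go, distributed query engines for "
       "analytics and data warehouses, Machine learning, spark, Kubernetes, "
       "transaction processing, Java, Python, C++, Machine learning, building "
       "and deploying large scale... CUDA, GPU processing, Tensor flow ...")

# Illustrative subset of a default English stop-word list: only the two words
# named in this comment. A real analyzer's list is larger.
STOP_WORDS = {"and", "for"}


def tokenize(text: str) -> list:
    # Toy analyzer: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def index_tokens(text: str) -> list:
    # Ingestion side: stop words are dropped before terms reach the index.
    return [t for t in tokenize(text) if t not in STOP_WORDS]


def phrase_match(indexed: list, phrase: list) -> bool:
    # Toy phrase query: the phrase must appear as a contiguous run of indexed
    # terms. Unlike Lucene, this model does not track term positions or gaps.
    n = len(phrase)
    return any(indexed[i:i + n] == phrase for i in range(len(indexed) - n + 1))


indexed = index_tokens(DOC)
raw = tokenize("query engines for analytics")
stripped = [t for t in raw if t not in STOP_WORDS]
print(phrase_match(indexed, raw))       # False: "for" was never indexed
print(phrase_match(indexed, stripped))  # True once the query drops "for" too
```

Stripping the same stop words from the query phrase before matching restores the match, which is what rule 1 asks for.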
   

