jackluo923 commented on issue #10865: URL: https://github.com/apache/pinot/issues/10865#issuecomment-1591126575
The correct behavior is:

1. If stop words are specified during ingestion, remove them from the query at query time.
2. If all stop words are excluded, do not remove any stop words from the query.
3. If a customized list of stop words is excluded, remove only that customized list of stop words from the query.

To give you a concrete example, let's use the input example provided in Pinot's [documentation](https://docs.pinot.apache.org/basics/indexing/text-search-support#resume-text) with default text-index ingestion configs:

> Distributed systems, Java, C++, Go, distributed query engines for analytics and data warehouses, Machine learning, spark, Kubernetes, transaction processing, Java, Python, C++, Machine learning, building and deploying large scale... CUDA, GPU processing, Tensor flow ...

With the above input, the following query would return a match:

```
SELECT SKILLS_COL FROM MyTable WHERE TEXT_MATCH(SKILLS_COL, '"Machine learning" AND "gpu processing"')
```

However, the following query would not return any match for the same input because the query contains the stop words `for` and `and`:

```
SELECT SKILLS_COL FROM MyTable WHERE TEXT_MATCH(SKILLS_COL, '"query engines for analytics" AND "building and deploying"')
```
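
To make the three cases concrete, here is a minimal Python sketch of the query-side filtering. It is an illustration only, not Pinot's or Lucene's actual code: `strip_stop_words` and `DEFAULT_STOP_WORDS` are hypothetical names, the stop-word set is a small assumed subset rather than the real default list, and the sketch ignores tokenization and token-position details. It applies only to the text inside a quoted phrase, not to the boolean `AND` operator between phrases.

```python
# Hypothetical illustration of the expected behavior; not Pinot's implementation.

# Assumed, deliberately tiny subset of a default stop-word list.
DEFAULT_STOP_WORDS = frozenset({"a", "an", "and", "for", "of", "the", "to"})


def strip_stop_words(phrase, ingestion_stop_words):
    """Drop from a query phrase exactly the words that were dropped at ingestion.

    `ingestion_stop_words` is the set of stop words the index removed:
    the default set, an empty set (nothing removed), or a customized set.
    """
    return " ".join(
        token for token in phrase.split()
        if token.lower() not in ingestion_stop_words
    )


if __name__ == "__main__":
    # Case 1: stop words were removed at ingestion, so remove them from the query.
    print(strip_stop_words("query engines for analytics", DEFAULT_STOP_WORDS))
    # Case 2: no stop words were removed at ingestion, so leave the query alone.
    print(strip_stop_words("building and deploying", frozenset()))
    # Case 3: only a customized set was removed, so remove only that set.
    print(strip_stop_words("query engines for analytics", {"engines"}))
```

The point of the sketch is that the query-time filter takes the same stop-word set as the ingestion-time filter, so the phrase being searched for always lines up with what is stored in the index.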