boicehuang opened a new pull request, #13306: URL: https://github.com/apache/lucene/pull/13306
Elasticsearch (which is based on Lucene) can automatically infer field types for users through its dynamic mapping feature. When users index low-cardinality fields such as gender, age, or status, they often use numbers to represent the values, so ES infers these fields as long, and ES uses BKD as the index for long fields.

As #541 describes, when the data volume grows, building the result set for low-cardinality fields drives CPU usage and load very high, even when a boolean query with filter clauses is used for those fields. I found that one main reason is that access to LRUQueryCache is guarded by a node-level exclusive ReentrantLock. For low-cardinality fields, both the QPS and the cost of the queries are often very high, so attempts to acquire the lock when reading the cache frequently fail, which results in low concurrency when accessing the cache.

This PR therefore replaces the ReentrantLock with a ReentrantReadWriteLock. When IndexSearcher needs to get the cache entry for a query, it takes only the read lock. When the cache is updated or cleared, the write lock is still taken to ensure exclusive access.

To test this, I mocked a field that has 10,000,000 docs per value. To ensure that most segments could be cached, I ran a forcemerge to merge small segments into several large segments, then searched the field with a single-term PointInSetQuery. Requests per second increased from 500 to 5000.
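The locking change described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR and not Lucene's LRUQueryCache: the class name ReadWriteLockedCache and its methods are hypothetical, and it assumes a plain HashMap as storage. An access-ordered LinkedHashMap would be modified by get() and so could not be read safely under a shared read lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified cache used only to illustrate the locking pattern.
class ReadWriteLockedCache<K, V> {
    // Assumption: a plain HashMap, so lookups never modify the map.
    private final Map<K, V> entries = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Lookups share the read lock, so concurrent readers do not block each other.
    // If the lock is not immediately available (a writer holds it), treat it as a miss.
    V get(K key) {
        if (!lock.readLock().tryLock()) {
            return null;
        }
        try {
            return entries.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Updates take the write lock, which excludes readers and other writers.
    void put(K key, V value) {
        lock.writeLock().lock();
        try {
            entries.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Clearing the cache is also an exclusive operation.
    void clear() {
        lock.writeLock().lock();
        try {
            entries.clear();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With an exclusive ReentrantLock, two searcher threads calling get() at the same time would contend with each other; in this sketch they can both hold the read lock, and only put() and clear() force exclusive access.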