boicehuang opened a new pull request, #13306:
URL: https://github.com/apache/lucene/pull/13306

   Elasticsearch (which is based on Lucene) can automatically infer field types for users with its dynamic mapping feature. When users index low-cardinality fields such as gender, age, or status, they often use numbers to represent the values, so Elasticsearch infers these fields as long and indexes them with the BKD tree used for long fields.
   
   As #541 points out, when the data volume grows, building the result set for low-cardinality fields drives CPU usage and load very high, even if those fields are placed in the filter clauses of a boolean query.
   
   I found that one main reason is that access to the LRUQueryCache is guarded by a node-level exclusive ReentrantLock. Queries on low-cardinality fields tend to have both high QPS and high cost, so attempts to acquire the lock when reading the cache frequently fail, resulting in low concurrency when accessing the cache.
   
   So I replaced the ReentrantLock with a ReentrantReadWriteLock. When IndexSearcher needs to read the cache for a query, only the read lock is taken; when the cache is updated or cleared, the write lock is still taken to ensure exclusive access.
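   A minimal sketch of that read/write-lock split, again with hypothetical names rather than the actual patch: lookups share the read lock and can proceed concurrently, while mutations take the write lock for exclusive access.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical simplification of the proposed change. LRU eviction and other
// bookkeeping of the real cache are omitted here.
class ReadWriteLockCacheSketch<K, V> {
  private final Map<K, V> cache = new HashMap<>();
  private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

  V get(K key) {
    rwLock.readLock().lock();     // many searches may hold the read lock at once
    try {
      return cache.get(key);
    } finally {
      rwLock.readLock().unlock();
    }
  }

  void put(K key, V value) {
    rwLock.writeLock().lock();    // exclusive, like the old ReentrantLock
    try {
      cache.put(key, value);
    } finally {
      rwLock.writeLock().unlock();
    }
  }

  void clear() {
    rwLock.writeLock().lock();
    try {
      cache.clear();
    } finally {
      rwLock.writeLock().unlock();
    }
  }
}
```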
   
   I mocked a field with 10,000,000 docs per value and, to ensure that most segments can be cached, issued a forcemerge to merge the small segments into a few large ones before searching the field with a single-term PointInSetQuery. With this change, requests per second increased from 500 to 5000.
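   For reference, the kind of query used in this benchmark could be built roughly as follows. The field name "status", the value 1, and the merge target of 4 segments are made-up examples; the actual benchmark setup may differ.

```java
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;

class BenchmarkQuerySketch {
  static int countStatus(Directory dir, IndexWriter writer) throws Exception {
    // Merge small segments down so most segments are large enough to be cached.
    writer.forceMerge(4);
    writer.commit();

    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      // Single-term PointInSetQuery over a low-cardinality long field.
      Query q = LongPoint.newSetQuery("status", 1L);
      return searcher.count(q);
    }
  }
}
```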
   

