[ https://issues.apache.org/jira/browse/LUCENE-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551712#comment-17551712 ]
Adrien Grand commented on LUCENE-10602:
---------------------------------------

I work with Chris and I suggested that he open this issue, so I'll try to provide a bit more context.

It's not uncommon for us to have nodes that handle several TBs of data. With documents that need ~250 bytes each, which is also typical, this gives ~4.4B documents per TB of data. Caching a query for 4.4B documents requires ~520MB of memory, assuming one bit per document. So if we want to be able to cache, say, 4 queries across 1TB of data, then we need ~2GB of heap for the query cache.

We could give less memory to the cache, but this would increase the risk that every new entry in the cache evicts a hot entry from the cache. This is potentially an issue for the query cache, since computing cache entries has overhead: it requires evaluating all documents that match the query, while the query being cached might be used in a conjunction that only requires evaluating a subset of the matching docs.

But we're also seeing the opposite case, where the cache is oversized for the amount of data that a node handles. And because the cache only evicts when it's full or when segments get closed, the cache will often grow until it's completely full, even though most cache entries never get used. We don't know at node startup time how much data a node is eventually going to handle, which makes it impossible to size the query cache correctly.

So if Lucene's query cache could evict entries that appear to be very little used, this would help avoid spending large amounts of heap on useless cache entries.

> Dynamic Index Cache Sizing
> --------------------------
>
>                 Key: LUCENE-10602
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10602
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Chris Earle
>            Priority: Major
>
> Working with Lucene's filter cache, it has become apparent that it can be an
> enormous drain on the heap and therefore the JVM.
> After extensive usage of an index, it is not uncommon to tune performance by
> shrinking or altogether removing the filter cache.
> Lucene tracks hit/miss stats for the filter cache, but it does nothing with
> the data other than inform an interested user about the effectiveness of
> their index's caching.
> It would be interesting if Lucene could tune the index filter cache
> heuristically based on actual usage (age, frequency, and value).
> This could ultimately give GBs of heap back to an individual Lucene instance
> instead of burning it on cache storage that is not effectively used (or
> useful).

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
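The sizing figures in the comment above can be sanity-checked with a quick script. This is only a back-of-the-envelope reproduction of the arithmetic, assuming 1 TB = 2^40 bytes (which is what the ~4.4B figure implies) and one bit per document per cached query:

```python
# Reproduce the sizing arithmetic from the comment above.
# Assumption: 1 TB = 2**40 bytes, one bit per document per cached query.

TB = 2 ** 40
doc_size = 250                                     # ~250 bytes per document
docs_per_tb = TB / doc_size                        # ~4.4 billion documents
bytes_per_entry = docs_per_tb / 8                  # one cached query at 1 bit/doc
heap_for_4_queries = 4 * bytes_per_entry           # 4 cached queries across 1TB

print(f"docs per TB:     {docs_per_tb / 1e9:.1f}B")
print(f"one cache entry: {bytes_per_entry / 2**20:.0f} MiB")
print(f"4 cache entries: {heap_for_4_queries / 2**30:.1f} GiB")
```

This lands at roughly 4.4B documents per TB, ~520 MiB per cached query, and ~2 GiB for four entries, matching the numbers in the comment.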
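As a rough illustration of the kind of eviction being asked for, here is a minimal sketch of an LRU cache that additionally drops entries left idle too long, so heap is reclaimed even when the cache never fills. All names here are hypothetical; this is not Lucene's actual LRUQueryCache or caching-policy API:

```python
import time
from collections import OrderedDict

class UsageAwareCache:
    """Hypothetical sketch: LRU cache that also evicts entries unused for
    longer than max_idle seconds, even when the cache has room to spare."""

    def __init__(self, max_entries, max_idle, clock=time.monotonic):
        self.max_entries = max_entries
        self.max_idle = max_idle
        self.clock = clock                # injectable for testing
        self._entries = OrderedDict()     # key -> (value, last_used)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, _ = entry
        self._entries[key] = (value, self.clock())
        self._entries.move_to_end(key)    # mark as most recently used
        return value

    def put(self, key, value):
        self._entries[key] = (value, self.clock())
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)   # classic LRU eviction

    def evict_idle(self):
        """Drop entries idle longer than max_idle; return how many were dropped."""
        now = self.clock()
        stale = [k for k, (_, t) in self._entries.items()
                 if now - t > self.max_idle]
        for k in stale:
            del self._entries[k]
        return len(stale)
```

The point of `evict_idle` is exactly the comment's scenario: a cache sized for data the node never ends up holding would otherwise retain cold entries forever, since plain LRU only evicts on insertion pressure. A real implementation would likely weigh age, hit frequency, and entry cost rather than a single idle threshold.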