Hi, I’m involved in an open source project called Vufind, which uses Solr to search across library catalogue records [1].
The project uses what seem to be very high default cache settings in solrconfig.xml [2]:

- filterCache (size="300000" initialSize="300000" autowarmCount="50000")
- queryResultCache (size="100000" initialSize="100000" autowarmCount="50000")
- documentCache (size="50000" initialSize="50000")

These settings haven’t been reviewed since early in the project’s history (c. 2007), but they came up in a recent discussion around out-of-memory issues and garbage collection. Of course, decisions on cache configuration (along with JVM settings, sharding, etc.) vary depending on the instance (index size, queries/sec, etc.), but I wanted to run these values past this list as a sanity check for what you’d consider good default settings, given that most adopters of the software will not touch the defaults.

Some characteristics of library data and Vufind’s schema [3] which may have a bearing on the issue:

- quite a few facet fields and filters (~12 facets configured by default)
- a high number of unique facet values (e.g. several hundred thousand in a facet field for authors or subjects)
- most libraries would do only one or two incremental commits a day (which may justify high autowarming settings, since the next commit isn’t for 24 hours)
- sorting: relevance by default, but other sort options are also configured by default (title, author, callnumber, year, etc.)
- mostly small, sparse documents (MARC records containing title, author, description, etc. but no full-text content)
- quite a few stored fields, including a field which stores the full MARC record for additional parsing by the application
- the average number of documents for most adopters is probably somewhere between 500K and 2 million MARC records (Vufind has several adopters with up to 50m full-text docs, but these make considerable customisations to their Solr setup)
- queries/sec will vary from library to library, but shouldn't be anything too taxing for most adopters

Do the current cache settings make sense in this context, or should we consider dropping back to the much lower values given in the Solr example and wiki?

Many thanks
Eoghan

[1] vufind.org
[2] https://github.com/vufind-org/vufind/blob/master/solr/biblio/conf/solrconfig.xml
[3] https://github.com/vufind-org/vufind/blob/master/solr/biblio/conf/schema.xml
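P.S. As a rough sanity check on the filterCache figure (size="300000"), here is a back-of-envelope sketch in Python. It is only an upper bound under two assumptions: that every cached filter is held as a bitset with one bit per document in the index (my understanding is that Solr can use a smaller sorted-integer set for sparse filters, so real usage should be lower), and that the cache actually fills to its configured size. The function names are hypothetical, just for illustration.

```python
# Rough worst-case heap estimate for Solr's filterCache.
# ASSUMPTION: every cached filter is a full bitset (one bit per document).
# Sparse filters may be stored more compactly, so this is an upper bound only.


def filter_cache_worst_case_bytes(cache_size: int, max_doc: int) -> int:
    """Upper bound on filterCache memory if every entry is a full bitset."""
    bytes_per_entry = (max_doc + 7) // 8  # one bit per document, rounded up to whole bytes
    return cache_size * bytes_per_entry


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / 1024**3


if __name__ == "__main__":
    filter_cache_size = 300_000  # Vufind default: size="300000"
    # Typical adopter index sizes mentioned above (500K to 2 million records).
    for max_doc in (500_000, 2_000_000):
        worst = filter_cache_worst_case_bytes(filter_cache_size, max_doc)
        print(f"{max_doc:>9,} docs: up to {to_gib(worst):.1f} GiB")
```

On those assumptions the bound runs to tens of GiB for a 2-million-record index, which is part of why the size setting seemed worth questioning.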