One additional bit: the *.fdt files contain the stored values (i.e. fields with stored=true). This is a verbatim, compressed copy of the input for these fields. This data does not need to reside in memory. Say you have rows=10, and numFound is 10,000,000. The stored data is only accessed for the 10 returned docs. So it's really impossible to answer "for an index with on-disk size X, how much memory do I need?" I've seen the stored data be a very significant portion of the on-disk size.
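If you want to see how much of an index is stored data, a rough sketch like the following sums file sizes per extension in an index directory. The function names and the example path are made up for illustration, and it only counts *.fdt as stored data, so treat the result as an approximation. Segments written as compound files (*.cfs) will not show a separate *.fdt file at all.

```python
import os
from collections import defaultdict


def size_by_extension(index_dir):
    """Return {extension: total bytes} for the files directly in index_dir."""
    totals = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            # Files without an extension (e.g. segments_N) are keyed by name.
            ext = os.path.splitext(name)[1] or name
            totals[ext] += os.path.getsize(path)
    return dict(totals)


def stored_fraction(totals, stored_exts=(".fdt",)):
    """Fraction of the on-disk size taken by stored-field data files."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    stored = sum(totals.get(ext, 0) for ext in stored_exts)
    return stored / total


# Example usage (the path is hypothetical, adjust it to your core):
# totals = size_by_extension("/var/solr/data/mycore/data/index")
# print(f"stored data: {stored_fraction(totals):.1%} of the index")
```

A high fraction here means a large share of the on-disk size is data that is only read for the documents actually returned, which is one reason on-disk size alone says little about memory needs.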
Best,
Erick

On Thu, May 11, 2017 at 5:24 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 5/11/2017 4:59 PM, S G wrote:
>> How can 50GB index be handled by a 10GB heap?
>> I am a developer myself and would love to know as many details as possible.
>> So a long answer would be much appreciated.
>
> Lucene (which is what provides large pieces of Solr's functionality)
> does not read the entire index into heap memory. It only accesses the
> parts of the index that it needs for the current query, and builds
> certain structures in memory that it needs in order to process that
> query. Much of that gets thrown away as soon as the query is done, but
> both Lucene and Solr do keep some of it in caches.
>
> The precise details of what Lucene accesses and what memory structures
> it uses are not known to me. If you really want to know, the full
> source code is available.
>
> I have production servers running Solr that have well over 200GB of
> index data and are running with a 13GB heap. It is likely that I could
> reduce that heap and still have no problems.
>
> If there is free memory available, then large parts of your index will
> be loaded into the operating system's OS disk cache and will remain
> there, making Lucene fast. Having enough spare memory for this is
> essential for good performance with Lucene-based software like Solr.
>
> Here's some more reading. Disclaimer: I wrote the wiki page on the
> second link to make supporting Solr on this mailing list easier.
>
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> Thanks,
> Shawn
>