John Nielsen [j...@mcb.dk]:
> I never seriously looked at my fieldValueCache. It never seemed to get used:
> http://screencast.com/t/YtKw7UQfU

That was strange. As you are using a multi-valued field with the new setup, the facet fields should appear there. Can you find them in any of the other caches?

...I hope you are not calling the facets with facet.method=enum? Could you paste a typical facet-enabled search request?

> Yep. We still do a lot of sorting on dynamic field names, so the field cache
> has a lot of entries. (9.411 entries as we speak. This is considerably lower
> than before.). You mentioned in an earlier mail that faceting on a field
> shared between all facet queries would bring down the memory needed.
> Does the same thing go for sorting?

More or less. Sorting stores the raw string representations (UTF-8) in memory, so the number of unique values matters more than it does for faceting. Just as with faceting, a list of pointers from documents to values (1 value/document, as we are sorting) is maintained, so the overhead is something like

#documents*log2(#unique_terms*average_term_length) + #unique_terms*average_term_length

where average_term_length is in bits, which makes the result bits as well.

Caveat: This is with the index-wide sorting structure. I am fairly confident that this is what Solr uses, but I have not looked at it lately, so it is possible that some memory-saving segment-based trickery has been implemented.

> Does those 9411 entries duplicate data between them?

Sorry, I do not know. SOLR-1111 discusses the problems with the field cache and duplication of data, but I cannot tell whether it has been solved or not. I am not familiar with the stat breakdown of the fieldCache, but it _seems_ to me that there are 2 or 3 entries for each segment for each sort field. Guesstimating further, let's say you have 30 segments in your index. Going with the guesswork, that would bring the number of sort fields to 9411/3/30 ~= 100. Looks like you use a custom sort field for each client?
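The doc->value structure behind the overhead formula can be sketched in a few lines. This is a minimal illustration that assumes the index-wide layout from the caveat: the unique terms are concatenated as UTF-8 bytes in sorted order, and each document gets one pointer into that buffer. It is not Lucene's or Solr's actual implementation, and the names `build_sort_structure` and `overhead_bits` are hypothetical.

```python
import math

# Hypothetical sketch of an index-wide sort structure (not Lucene's actual code):
# the unique terms are concatenated as UTF-8 bytes in sorted order, and every
# document gets one pointer (a byte offset) to its single sort value.
def build_sort_structure(doc_values):
    unique_terms = sorted(set(doc_values))
    buffer = bytearray()
    offsets = {}
    for term in unique_terms:
        offsets[term] = len(buffer)
        buffer += term.encode("utf-8")
    pointers = [offsets[value] for value in doc_values]
    return bytes(buffer), pointers

def overhead_bits(doc_values):
    """Memory estimate in bits: #documents * pointer width + term data.
    The pointer width is log2 of the term data size in bits (as in the
    formula in the mail), rounded up because a pointer needs whole bits."""
    buffer, pointers = build_sort_structure(doc_values)
    term_bits = len(buffer) * 8
    pointer_bits = math.ceil(math.log2(term_bits))
    return len(pointers) * pointer_bits + term_bits

docs = ["beta", "alpha", "beta", "gamma"]
buffer, pointers = build_sort_structure(docs)
print(buffer, pointers)
# The terms are stored in sorted order, so ordering documents by pointer
# gives the same result as ordering them by value.
print(sorted(range(len(docs)), key=lambda d: pointers[d]))
print(overhead_bits(docs))
```

For the four toy documents, the buffer is `b"alphabetagamma"`, the pointers are `[5, 0, 5, 9]` and the estimate is 4*7 + 112 = 140 bits. With many documents and few unique terms, the per-document pointers make up most of the total.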
Extrapolating from 1.4M documents and 180 clients, let's say that there are 1.4M/180/5 (~1500) unique terms for each sort field and that their average length is 10 characters (80 bits). We thus have 1.4M*log2(1500*10*8) + 1500*10*8 bits ~= 23.7M bits, or about 3MB per sort field, or about 0.5GB for all the 180 fields. With this few unique values, the doc->value structure is by far the biggest, just as with facets. As opposed to the faceting structure, this is fairly close to the actual memory usage. Switching to a single sort field would reduce the memory usage from about 0.5GB to about 7MB.

> I do commit a bit more often than i should. I get these in my log file from
> time to time: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

So 1 active searcher and 2 warming searchers. Ignoring that one of the warming searchers is highly likely to finish well ahead of the other one, your heap must hold 3 times the structures for a single searcher. With the old heap size of 25GB, that left "only" 8GB for a full dataset. Subtract the sorting and faceting structures, plus everything else a searcher holds, and you have your OOM. Tweaking your ingest to avoid 3 overlapping searchers will lower your memory requirements by 1/3. Fixing the facet and sorting logic will bring them down to laptop size.

> The control panel says that the warm up time of the last searcher is 5574. Is
> that seconds or milliseconds?
> http://screencast.com/t/d9oIbGLCFQwl

Milliseconds, I am fairly sure. It is much faster than I anticipated. Are you warming all the sort and facet fields?

> Waiting for a full GC would take a long time.

Until you have fixed the core memory issue, you might consider doing an explicit GC every night to clean up, and hope that it does not occur automatically during the day (or whenever your clients use the system).

> Unfortunately I don't know of a way to provoke a full GC on command.

VisualVM (jvisualvm), which is delivered with the Oracle JDK (look in the bin folder), is your friend. Just start it on the server, click on the relevant process and press "Perform GC" on the Monitor tab.
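The sort-field estimate can be reproduced by evaluating the overhead formula. The following is only a sketch. The inputs are the guesses from the mail: 1.4M documents, 180 clients, and ~1500 unique terms of 10 characters per field. The single-field case additionally assumes that the shared field holds the union of all clients' terms, i.e. 180*1500 terms. `sort_overhead_bits` is a hypothetical helper name. The formula yields bits, so the results are divided by 8 before being shown as bytes.

```python
import math

def sort_overhead_bits(num_docs, unique_terms, avg_term_bits):
    """Evaluate the estimate
    #documents*log2(#unique_terms*average_term_length) + #unique_terms*average_term_length
    with average_term_length in bits. The result is in bits."""
    term_bits = unique_terms * avg_term_bits
    return num_docs * math.log2(term_bits) + term_bits

DOCS = 1_400_000
CLIENTS = 180
TERMS_PER_FIELD = 1500   # roughly 1.4M/180/5, as guessed in the mail
AVG_TERM_BITS = 10 * 8   # 10 characters of 8 bits each

per_field_bytes = sort_overhead_bits(DOCS, TERMS_PER_FIELD, AVG_TERM_BITS) / 8
all_fields_bytes = CLIENTS * per_field_bytes
# Assumption: one shared sort field holds all clients' terms.
single_field_bytes = sort_overhead_bits(DOCS, CLIENTS * TERMS_PER_FIELD, AVG_TERM_BITS) / 8

print(f"per field:    {per_field_bytes / 1e6:.1f} MB")
print(f"180 fields:   {all_fields_bytes / 1e9:.2f} GB")
print(f"single field: {single_field_bytes / 1e6:.1f} MB")
```

The ratio between the two set-ups, roughly 75 to 1, depends mostly on the per-document pointers. Each field allocates one pointer for every document in the index, even when that client owns only a small fraction of the documents.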
Regards,
Toke Eskildsen