John Nielsen [j...@mcb.dk]:
> I never seriously looked at my fieldValueCache. It never seemed to get used:

> http://screencast.com/t/YtKw7UQfU

That is strange. As you are using a multi-valued field with the new setup, 
the facet structures should appear there. Can you find the facet fields in any 
of the other caches?

...I hope you are not calling the facets with facet.method=enum? Could you 
paste a typical facet-enabled search request?
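For reference, a typical facet-enabled request with the default fc method looks 
something like the following (host, core name and field are placeholders; adjust 
to your setup):

```shell
# Hypothetical example request; facet.method=fc is the default and keeps the
# work in the fieldValueCache for multi-valued fields, unlike facet.method=enum.
curl 'http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true&facet.field=your_facet_field&facet.method=fc'
```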

> Yep. We still do a lot of sorting on dynamic field names, so the field cache
> has a lot of entries. (9.411 entries as we speak. This is considerably lower
> than before.). You mentioned in an earlier mail that faceting on a field
> shared between all facet queries would bring down the memory needed.
> Does the same thing go for sorting?

More or less. Sorting stores the raw string representations (UTF-8) in memory, 
so the number of unique values matters more than it does for faceting. Just 
as with faceting, a list of pointers from documents to values (1 value/document 
as we are sorting) is maintained, so the overhead is something like

#documents*log2(#unique_terms*average_term_length) + 
#unique_terms*average_term_length
(where average_term_length is in bits)
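As a sketch, that estimate can be written out like this (all quantities in bits; 
the function name and the sample numbers are illustrative only):

```python
import math

def sort_field_overhead_bits(num_docs, num_unique_terms, avg_term_length_bits):
    """Rough per-field sort memory estimate:
    a doc->value pointer array plus the raw term data itself."""
    doc_to_value = num_docs * math.log2(num_unique_terms * avg_term_length_bits)
    term_data = num_unique_terms * avg_term_length_bits
    return doc_to_value + term_data

# Example: 1.4M documents, ~1500 unique terms of ~10 characters (80 bits) each.
bits = sort_field_overhead_bits(1_400_000, 1500, 10 * 8)
```

With few unique terms the doc->value pointer array dominates the total, which 
is why sharing one sort field across clients helps so much.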

Caveat: This is with the index-wide sorting structure. I am fairly confident 
that this is what Solr uses, but I have not looked at it lately so it is 
possible that some memory-saving segment-based trickery has been implemented.

> Does those 9411 entries duplicate data between them?

Sorry, I do not know. SOLR-1111 discusses the problems with the field cache and 
duplication of data, but I cannot infer whether they have been solved. I am 
not familiar with the stat breakdown of the fieldCache, but it _seems_ to me 
that there are 2 or 3 entries for each segment for each sort field. 
Guesstimating further, let's say you have 30 segments in your index. Going with 
the guesswork, that would bring the number of sort fields to 9411/3/30 ~= 100. 
Looks like you use a custom sort field for each client?

Extrapolating from 1.4M documents and 180 clients, let's say that there are 
1.4M/180/5 ~= 1500 unique terms for each sort field and that their average 
length is 10 characters. We thus have
1.4M*log2(1500*10*8) + 1500*10*8 bit ~= 23MB 
per sort field or about 4GB for all the 180 fields.

With so few unique values, the doc->value structure is by far the biggest, 
just as with facets. As opposed to the faceting structure, this is fairly close 
to the actual memory usage. Switching to a single sort field would reduce the 
memory usage from 4GB to about 55MB.

> I do commit a bit more often than i should. I get these in my log file from
> time to time: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

So 1 active searcher and 2 warming searchers. Ignoring that one of the warming 
searchers is highly likely to finish well ahead of the other one, that means 
that your heap must hold 3 times the structures for a single searcher. With the 
old heap size of 25GB that left "only" 8GB for a full dataset. Subtract the 4GB 
for sorting and a similar amount for faceting and you have your OOM.

Tweaking your ingest to avoid 3 overlapping searchers will lower your memory 
requirements by 1/3. Fixing the facet & sorting logic will bring it down to 
laptop size.
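One place to attack the overlapping searchers is solrconfig.xml: cap the number 
of warming searchers and let hard commits skip opening a new searcher. A sketch, 
with illustrative values you would tune to your ingest rate:

```xml
<!-- solrconfig.xml (sketch) -->
<query>
  <!-- Fail fast instead of stacking up warming searchers -->
  <maxWarmingSearchers>1</maxWarmingSearchers>
</query>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>            <!-- hard commit every minute -->
    <openSearcher>false</openSearcher>  <!-- persist without triggering warming -->
  </autoCommit>
</updateHandler>
```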

> The control panel says that the warm up time of the last searcher is 5574. Is 
> that seconds or milliseconds?
> http://screencast.com/t/d9oIbGLCFQwl

Milliseconds, I am fairly sure. It is much faster than I anticipated. Are you 
warming all the sort- and facet-fields?
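Warming is typically done with a QuerySenderListener in solrconfig.xml, firing 
one query per sort/facet field before the new searcher goes live. A sketch with 
placeholder field names:

```xml
<!-- solrconfig.xml sketch: pre-populate sort and facet structures -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">your_sort_field asc</str>
      <str name="rows">0</str>
    </lst>
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">your_facet_field</str>
      <str name="rows">0</str>
    </lst>
  </arr>
</listener>
```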

> Waiting for a full GC would take a long time.

Until you have fixed the core memory issue, you might consider doing an 
explicit GC every night to clean up and hope that it does not occur 
automatically during the day (or whenever your clients use it).

> Unfortunately I don't know of a way to provoke a full GC on command.

VisualVM, which is delivered with the Oracle JDK (look somewhere in the bin 
folder), is your friend. Just start it on the server and click on the relevant 
process.
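If you would rather script it than click around in VisualVM: recent JDKs also 
ship jcmd (in the same bin folder), which can trigger a full GC on a running 
JVM. The pid lookup below assumes Solr was started via Jetty's start.jar; 
adjust to your setup:

```shell
# Find the Solr JVM's pid, then ask it to run a full GC.
pid=$(pgrep -f start.jar)   # assumption: Solr runs under Jetty's start.jar
jcmd "$pid" GC.run
```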

Regards,
Toke Eskildsen
