: is there a way (or formula) to determine the required amount of RAM memory,
: e.g. by number of documents, document size?

There are a lot of factors that come into play ... the number of documents 
and the size of documents aren't nearly as significant as the number of 
unique indexed terms.

: with 4,000,000 documents, searching the index is quite fast, but when I try
: to sort the results, I get the well-known OutOfMemory error. I'm aware of the

Sorting does have some pretty well defined memory requirements.  Sorting on 
a field builds up a "FieldCache" ... essentially an array with one slot per 
document of whatever type you are sorting on, so sorting an index of 
15 million docs on an int field takes ~60Megs.  String fields get more 
interesting: there the FieldCache maintains an int[] entry for each doc, 
plus a String[] holding each unique string value ... so sorting your 15M 
docs by a "category" string field where there are only 10,000 category 
names, each about 20 characters long, would still take only ~60Megs, but 
sorting on a "title" field where every doc has a unique title and the 
average title length is 20 characters would take ~60Megs + ~290Megs.
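
To make that arithmetic concrete, here's a rough back-of-the-envelope 
sketch (not Lucene code, just the estimate above; it assumes 4-byte int 
slots and roughly 1 byte per string character, which is what the ~290Megs 
figure implies ... real Java Strings store 2-byte chars plus per-object 
overhead, so actual usage will be higher):

    // Back-of-the-envelope FieldCache sizing, mirroring the numbers above.
    // Assumes 4-byte int slots and ~1 byte per string character; real Java
    // Strings use 2-byte chars plus object overhead, so expect more.
    public class FieldCacheEstimate {
        public static void main(String[] args) {
            long numDocs = 15_000_000L;
            long mb = 1L << 20;

            // int field: one 4-byte slot per document
            System.out.println("int field:      ~" + (numDocs * 4 / mb) + "Megs");

            // string field, few unique values: int[] of ords per doc,
            // plus the 10,000 unique ~20-char category names
            long categoryBytes = numDocs * 4 + 10_000L * 20;
            System.out.println("category field: ~" + (categoryBytes / mb) + "Megs");

            // string field, unique per doc: int[] of ords (~60Megs)
            // plus 15M unique ~20-char titles (~290Megs)
            long titleBytes = numDocs * 4 + numDocs * 20;
            System.out.println("title field:    ~" + (titleBytes / mb) + "Megs");
        }
    }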

If you plan on doing some static warming of your searchers using your sorts 
in newSearcher events (which is a good idea, so the first user to run a 
search after a commit doesn't have to wait a really long time for the 
FieldCache to be built), you'll need twice that: one FieldCache for the 
current searcher, and one FieldCache for the "on deck" searcher.
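
For reference, a static warming query with a sort can be registered as a 
newSearcher listener in solrconfig.xml along these lines (the query and 
the "title" sort field are just placeholders for whatever your real 
searches use):

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="sort">title asc</str>
        </lst>
      </arr>
    </listener>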

-Hoss
