On 10/4/2012 3:22 PM, jame vaalet wrote:
So imagine I have merged the 150 GB index into a single segment; this would make a single segment of 150 GB in memory. When new docs are indexed, it wouldn't alter this 150 GB index unless I update or delete the older docs, right? Will a 150 GB single segment have problems with memory swapping at the OS level?
The number of segments doesn't matter much at all. Solr uses memory the same way in either case. With one segment, the index on disk (and the memory required to track its segments) will be slightly smaller than with many segments, and searches will be slightly faster once everything is warmed up. Of course, it takes a lot of I/O and CPU cycles to get the index optimized, which can have a strong negative effect on searching while the optimize runs.
Deleting or indexing docs will never alter the existing segments on your disk. Once a segment is finalized, it never gets changed. Deleted documents are just marked as deleted in a separate file and still exist in the index, and new documents end up in new segments. This stays true until a merge or an optimize happens on those segments, at which point new segments are written without the deleted documents and the old ones are discarded.
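To make the segment behavior concrete, here is a toy model in Python. It is only a sketch: the `Segment` and `Index` classes and all of their methods are hypothetical names invented for illustration, not Lucene or Solr APIs, and real segments are files on disk rather than Python tuples.

```python
# Toy model only: Segment and Index are hypothetical, not Lucene/Solr classes.

class Segment:
    """A finalized segment: its documents are written once and never changed."""

    def __init__(self, docs):
        self._docs = tuple(docs)   # immutable after creation
        self.deleted = set()       # stands in for the separate deletions file

    def size(self):
        # Counts every stored document, including ones marked deleted.
        return len(self._docs)

    def mark_deleted(self, doc):
        # Only records the position; the stored documents are untouched.
        for position, stored in enumerate(self._docs):
            if stored == doc:
                self.deleted.add(position)

    def live_docs(self):
        return [d for i, d in enumerate(self._docs) if i not in self.deleted]


class Index:
    def __init__(self):
        self.segments = []

    def add_documents(self, docs):
        # New documents always land in a new segment.
        self.segments.append(Segment(docs))

    def delete(self, doc):
        for segment in self.segments:
            segment.mark_deleted(doc)

    def search_all(self):
        return [d for segment in self.segments for d in segment.live_docs()]

    def optimize(self):
        # Rewrite all live documents into one new segment; old ones are dropped.
        self.segments = [Segment(self.search_all())]
```

In this model, deleting a document leaves the size of its segment unchanged, and the space is only reclaimed when `optimize()` rewrites the live documents into a fresh segment.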
Your index is never actually loaded into application RAM. Later versions of Solr (by default starting in 3.1 on Windows and 3.3 on Linux) use an OS feature called memory mapping (MMapDirectory), which efficiently turns the data on disk into a large section of virtual memory. The application makes requests to this memory section and the OS automatically turns them into disk reads. This is not real memory, and it's not swap. Real memory is used by the operating system (not Solr) to *cache* this memory map, speeding up access. If you have more than 150GB of memory, your entire index can fit into the disk cache. Otherwise, the OS will determine which parts of the index get used the most and try to cache those. It is a good idea to have a large amount of memory that is not allocated to applications.
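If memory mapping is unfamiliar, the following Python sketch shows the general OS feature using the standard `mmap` module. It is not how Solr or MMapDirectory is implemented (that is Java code inside Lucene); the helper names `read_via_mmap` and `make_demo_file` and the 1 MB demo file are assumptions made up for this example.

```python
import mmap
import os
import tempfile


def make_demo_file(size_bytes=1 << 20):
    """Create a temporary file (hypothetical stand-in for an index file)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        # Filler bytes followed by a recognizable 5-byte marker at the end.
        f.write(b"x" * (size_bytes - 5) + b"hello")
    return path


def read_via_mmap(path, offset, length):
    """Read a byte range through a read-only memory map of the file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. Mapping reserves virtual address
        # space; it does not copy the file into the process's memory.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing touches only the pages that hold this range. The OS
            # serves them from its disk cache or reads them from disk.
            return mapped[offset:offset + length]
```

Calling `read_via_mmap(path, offset, length)` on a large file only causes the OS to fetch the pages covering that range, which is why the mapped size can far exceed physical memory.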
Your OS will only begin swapping if the actual amount of *real* memory used by your applications (including Solr, by virtue of the -Xmx parameter to Java) begins to exceed the amount of physical memory available.
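As a rough back-of-the-envelope check, the arithmetic can be sketched as below. The function name `memory_budget` and every number in the usage note are hypothetical, and the model is simplified: it treats the -Xmx value as the whole footprint of the Java process, while a real JVM uses some memory beyond its heap.

```python
def memory_budget(physical_gb, jvm_heap_gb, other_apps_gb, index_gb):
    """Estimate swap risk and how much of the index the OS could cache.

    Simplified model: memory claimed by applications is the JVM max heap
    plus whatever other programs use; the rest is left for the disk cache.
    """
    committed_gb = jvm_heap_gb + other_apps_gb
    cache_gb = max(physical_gb - committed_gb, 0)
    return {
        # Swapping becomes likely once applications want more than exists.
        "swap_risk": committed_gb > physical_gb,
        "disk_cache_gb": cache_gb,
        # Fraction of the index that could sit in the disk cache at once.
        "index_fraction_cacheable": min(cache_gb / index_gb, 1.0),
    }
```

For example, `memory_budget(64, 8, 4, 150)` describes a made-up 64GB machine with an 8GB heap and 4GB used by other programs: applications claim 12GB, so there is no swap risk, and the remaining 52GB could cache roughly a third of a 150GB index.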
Thanks, Shawn