On 4/18/2012 6:17 AM, Bram Rongen wrote:
I'm using Solr 3.5 on a dedicated Ubuntu 10.04 box with 3TB of disk space
and 16GB of memory. I've tried using the Sun JRE and OpenJDK, both
resulting in the same problem. Indexing works great until my .fdt file
reaches a size of 4.9GB (5217987319 bytes). At that point, when Solr starts
merging, it just keeps on merging, starting over and over again. Java uses
all the available memory even though Xmx is set to 8G. When I restart Solr,
everything looks fine until merging is triggered. Whenever it hangs, the
server load average is around 3, searching is possible but slow, and the
Solr admin interface is reachable, but sending new documents leads to a
time-out.

Solr 3.5 works a little differently than previous versions: it memory-maps (MMaps) all of the index files. If you look at the memory usage reported by the OS, it's going to look all wrong. I've got my max heap set to 8192M, but this is what top looks like:

Mem:  64937704k total, 58876376k used,  6061328k free,   379400k buffers
Swap:  8388600k total,    77844k used,  8310756k free, 47080172k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
22798 ncindex   20   0 75.6g  21g  12g S  1.0 34.3  14312:55 java

If you add up the 47GB it says is being used for the disk cache, the 6GB it says is free, and the 21GB it says Java has resident, you end up with considerably more than the 64GB of total RAM the machine has, even if you include the 77MB of swap that's in use. You can use the jstat command to get a better idea of how much RAM Java is really using:

jstat -gc -t <pid> 5000

Add up the S0C, S1C, EC, OC, and PC columns. The alignment is often wrong in this output, so you'll have to count the columns. If I do this for my system, I end up with 8462972 KB. Alternatively, if you have a GUI installed on the server or have set up remote JMX, you can use JConsole to get a correct number very easily.
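If you'd rather not count columns by hand, a one-liner like this can do the addition for you. This is just a sketch: the field positions assume the jstat output on a Sun/OpenJDK 6 or 7 JVM, where -t puts Timestamp first, so S0C, S1C, EC, OC, and PC land in fields 2, 3, 6, 8, and 10. Verify against your own header line before trusting it:

# sum the committed heap/permgen sizes (in KB) from one jstat sample
jstat -gc -t <pid> | awk 'NR == 2 {print $2 + $3 + $6 + $8 + $10, "KB"}'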

The extra memory reported by the OS is not really being used, it is a side effect of the memory mapping used by the Lucene indexes.
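If you want to see those mappings for yourself, pmap will show the index files mapped into the Java process. This is just a quick illustration; adjust the grep pattern to match wherever your index directory actually lives:

pmap -x <pid> | grep index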

I've tried several different settings for MergePolicy and started
reindexing a couple of times, but the behavior stays the same. My current
solrconf.xml can be found here: http://pastebin.com/NXDT0B8f. I'm unable to
find errors in the log, which makes it really difficult to debug. Could
anyone point me in the right direction?

A mergeFactor of 4 is extremely low and will result in very frequent merging. The default is 10. I use a value of 36, but that is unusually high.
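For reference, this is roughly what the setting looks like in the indexDefaults section of a 3.x solrconfig.xml. The value shown is the default, not a recommendation for your hardware:

<indexDefaults>
  <mergeFactor>10</mergeFactor>
</indexDefaults>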

Looking at one of my indexes on that machine, the largest .fdt file is 7657412 KB; the other three are tiny: 9880, 12160, and 28 KB. That index was recently optimized. The total index size is over 20GB, and I have three indexes that size running in different cores on that machine. You're definitely not running into any limits as far as Solr is concerned.

You might be running into I/O issues. Are you relying on autoCommit, or are you explicitly committing your updates and waiting for each commit to finish before sending more? When segments are being merged, commits can take a really long time. If you are using autoCommit, or you aren't waiting for manual commits to finish, it might get bad enough that one commit hasn't finished when another is ready to start. I don't know what that would actually do, but it wouldn't be a good situation.
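As a sketch of what I mean by waiting on commits (the URL and document fields here are placeholders, adjust them for your setup): send your updates, then issue an explicit commit that blocks until it completes, and don't send more documents until curl returns.

curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
  --data-binary '<add><doc><field name="id">doc1</field></doc></add>'
curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
  --data-binary '<commit waitFlush="true" waitSearcher="true"/>'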

How have you created your 3TB of disk space? If you are using RAID5 or RAID6, you can run into very serious and unavoidable performance problems with writes. If it is a single disk, it may not provide enough IOPS for good performance. My servers also have 3TB of disk space, using six 1TB SATA drives in RAID10. The worst-case scenario for your merges is equivalent to an optimize, and an optimize of one of my 20GB indexes takes 15 minutes even on RAID10. Because of that, I only optimize one large index per day, which means each large index gets optimized every six days.
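If it helps, an optimize can be triggered the same way as a commit, which makes it easy to script one index per day from cron. Again a sketch, with a placeholder core name:

curl 'http://localhost:8983/solr/core1/update' -H 'Content-Type: text/xml' \
  --data-binary '<optimize waitSearcher="true"/>'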

I hope this helps, but I'll be happy to try to offer more, within my skill set.

Thanks,
Shawn
