On 4/18/2012 6:17 AM, Bram Rongen wrote:
I'm using Solr 3.5 on a dedicated Ubuntu 10.04 box with 3TB of disk space
and 16GB of memory. I've tried using the Sun JRE and OpenJDK, both
resulting in the same problem. Indexing works great until my .fdt file
reaches a size of 4.9GB (5217987319 bytes). At that point, when Solr starts
merging, it just keeps on merging, starting over and over again. Java uses
all the available memory even though Xmx is set to 8G. When I restart Solr,
everything looks fine until merging is triggered. Whenever it hangs, the
server load average is around 3, searching is possible but slow, and the
Solr admin interface is reachable, but sending new documents leads to a
time-out.

Solr 3.5 works a little differently than previous versions: it memory-maps (MMaps) all of the index files. If you look at the memory usage reported by the OS, it's going to look all wrong. I've got my max heap set to 8192M, but this is what top looks like:

Mem:  64937704k total, 58876376k used,  6061328k free,   379400k buffers
Swap:  8388600k total,    77844k used,  8310756k free, 47080172k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
22798 ncindex   20   0 75.6g  21g  12g S  1.0 34.3  14312:55 java

If you add up the 47GB it says is being used for the disk cache, the 6GB it says is free, and the 21GB it says Java has resident, you end up with considerably more than the 64GB of total RAM the machine has, even if you include the 77MB of swap that's in use. You can use the jstat command to get a better idea of how much RAM Java is really using:

jstat -gc -t <pid> 5000

Add up the S0C, S1C, EC, OC, and PC columns. The alignment is often wrong in this output, so you'll have to count the columns. If I do this for my system, I end up with 8462972 KB. Alternatively, if you have a GUI installed on the server or have set up remote JMX, you can use JConsole to get a correct number very easily.
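If you'd rather not count columns by hand, a one-liner like this can do the addition for you. This is just a sketch: the field positions assume the jstat output on a Sun/OpenJDK 6 or 7 JVM, where -t puts Timestamp first, so S0C, S1C, EC, OC, and PC land in fields 2, 3, 6, 8, and 10. Verify against your own header line before trusting it:

# sum the committed heap/permgen sizes (in KB) from one jstat sample
jstat -gc -t <pid> | awk 'NR == 2 {print $2 + $3 + $6 + $8 + $10, "KB"}'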

The extra memory reported by the OS is not really being used, it is a side effect of the memory mapping used by the Lucene indexes.
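If you want to see those mappings for yourself, pmap will show the index files mapped into the Java process. This is just a quick illustration; adjust the grep pattern to match wherever your index directory actually lives:

pmap -x <pid> | grep index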

I've tried several different settings for MergePolicy and started
reindexing a couple of times, but the behavior stays the same. My current
solrconf.xml can be found here: http://pastebin.com/NXDT0B8f. I'm unable to
find errors in the log, which makes it really difficult to debug. Could
anyone point me in the right direction?

A mergeFactor of 4 is extremely low and will result in very frequent merging. The default is 10. I use a value of 36, but that is unusually high.
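For reference, this is roughly what the setting looks like in the indexDefaults section of a 3.x solrconfig.xml. The value shown is the default, not a recommendation for your hardware:

<indexDefaults>
  <mergeFactor>10</mergeFactor>
</indexDefaults>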

Looking at one of my indexes on that machine, the largest .fdt file is 7657412 KB; the other three are tiny: 9880, 12160, and 28 KB. That index was recently optimized. The total index size is over 20GB, and I have three indexes that size running in different cores on that machine. You're definitely not running into any limits as far as Solr is concerned.

You might be running into I/O issues. Are you relying on autoCommit, or are you explicitly committing your updates and waiting for each commit to finish before sending more? When segments are being merged, commits can take a really long time. If you are using autoCommit, or you aren't waiting for manual commits to finish, it might get bad enough that one commit hasn't finished when another is ready to start. I don't know what that would actually do, but it wouldn't be a good situation.
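As a sketch of what I mean by waiting on commits (the URL and document fields here are placeholders, adjust them for your setup): send your updates, then issue an explicit commit that blocks until it completes, and don't send more documents until curl returns.

curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
  --data-binary '<add><doc><field name="id">doc1</field></doc></add>'
curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
  --data-binary '<commit waitFlush="true" waitSearcher="true"/>'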

How have you created your 3TB of disk space? If you are using RAID5 or RAID6, you can run into very serious and unavoidable performance problems with writes. If it is a single disk, it may not provide enough IOPS for good performance. My servers also have 3TB of disk space, using six 1TB SATA drives in RAID10. The worst-case scenario for your merges is equivalent to an optimize, and an optimize of one of my 20GB indexes takes 15 minutes even on RAID10. Because of that, I only optimize one large index per day, which means each large index gets optimized every six days.
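If it helps, an optimize can be triggered the same way as a commit, which makes it easy to script one index per day from cron. Again a sketch, with a placeholder core name:

curl 'http://localhost:8983/solr/core1/update' -H 'Content-Type: text/xml' \
  --data-binary '<optimize waitSearcher="true"/>'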

I hope this helps, but I'll be happy to try to offer more, within my skill set.

Thanks,
Shawn
