On 4/18/2012 6:17 AM, Bram Rongen wrote:
> I'm using Solr 3.5 on a dedicated Ubuntu 10.04 box with 3TB of disk
> space and 16GB of memory. I've tried using the Sun JRE and OpenJDK,
> both resulting in the same problem. Indexing works great until my .fdt
> file reaches a size of 4.9GB (5,217,987,319 bytes). At that point,
> when Solr starts merging, it just keeps on merging, starting over and
> over. Java uses all the available memory even though Xmx is set at 8G.
> When I restart Solr, everything looks fine until merging is triggered.
> Whenever it hangs, the server load averages 3: searching is possible
> but slow, and the Solr admin interface is reachable, but sending new
> documents leads to a time-out.
Solr 3.5 works a little differently from previous versions (it
memory-maps all the index files), so if you look at the memory usage
reported by the OS, it's going to look all wrong. I've got my max heap
set to 8192M, but this is what top looks like:
Mem: 64937704k total, 58876376k used, 6061328k free, 379400k buffers
Swap: 8388600k total, 77844k used, 8310756k free, 47080172k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22798 ncindex 20 0 75.6g 21g 12g S 1.0 34.3 14312:55 java
If you add up the 47GB it says is used for the disk cache, the 6GB it
says is free, and the 21GB it says Java has resident, you end up with
considerably more than the 64GB of total RAM the machine has, even if
you include the 77MB of swap that's in use. You can use the jstat
command to get a better idea of how much RAM Java is really using:
jstat -gc -t <pid> 5000
Add up the S0C, S1C, EC, OC, and PC columns. The alignment is often
wrong in this output, so you'll have to count the columns. If I do this
for my system, I end up with 8462972 KB. Alternatively, if you have a
GUI installed on the server or have set up remote JMX, you can use
JConsole to get a correct number very easily.
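Those capacity columns can be summed mechanically. A minimal sketch,
assuming the plain JDK 6 'jstat -gc' layout, where S0C, S1C, EC, OC, and
PC are fields 1, 2, 5, 7, and 9 (note that adding -t prepends a Timestamp
column, which shifts every field right by one):

```shell
# Sum the committed-capacity columns (KB) from one 'jstat -gc' sample.
# Field positions assume the JDK 6 layout; check your own header line.
sum_gc_capacity() {
  awk 'NR==2 { printf "%.0f\n", $1 + $2 + $5 + $7 + $9 }'
}

# Live usage (replace <pid> with the Solr JVM's process id):
#   jstat -gc <pid> | sum_gc_capacity
# Demonstration with a captured sample, all values in KB:
sample='S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT
1024 1024 0 512 8192 4096 65536 30000 86016 50000 10 0.5 2 1.0 1.5'
printf '%s\n' "$sample" | sum_gc_capacity    # prints 161792
```

The sample numbers above are made up for illustration; pipe real jstat
output through the function to get your actual committed heap size.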
The extra memory reported by the OS is not really in use; it is a side
effect of the memory mapping used by the Lucene indexes.
> I've tried several different settings for MergePolicy and have started
> reindexing a couple of times, but the behavior stays the same. My
> current solrconfig.xml can be found here: http://pastebin.com/NXDT0B8f.
> I'm unable to find any errors in the log, which makes it really
> difficult to debug. Could anyone point me in the right direction?
A mergeFactor of 4 is extremely low and will result in very frequent
merging. The default is 10. I use a value of 36, but that is unusually
high.
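For reference, this setting lives in the indexDefaults section of
solrconfig.xml in Solr 3.x; a fragment like the following (a sketch,
not a tuned recommendation for your workload) restores the default:

```xml
<!-- Solr 3.x solrconfig.xml fragment (sketch): a higher mergeFactor
     means less frequent merging, at the cost of more segments to
     search until the next merge or optimize. -->
<indexDefaults>
  <mergeFactor>10</mergeFactor>
</indexDefaults>
```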
Looking at one of my indexes on that machine, the largest fdt file is
7657412 KB, the other three are tiny - 9880, 12160, and 28 KB. That
index was recently optimized. The total index size is over 20GB. I
have three indexes that size running in different cores on that
machine. You're definitely not running into any limits as far as Solr
is concerned.
You might be running into I/O issues. Are you relying on autocommit, or
explicitly committing your updates and waiting for the commit to finish
before doing more updates? When there is segment merging, commits can
take a really long time. If you are using autocommit or not waiting for
manual commits to finish, it might get bad enough that one commit has
not yet finished when another is ready to take place. I don't know what
this would actually do, but it would not be a good situation.
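One way to avoid that pile-up is to commit explicitly and block until
the commit completes. Assuming the standard /update handler in Solr 3.x
(e.g. POSTed with curl to http://localhost:8983/solr/update with
Content-Type text/xml), a message like this does not return until the
flush finishes and the new searcher is open:

```xml
<!-- Explicit blocking commit for Solr 3.x: the client waits here
     instead of stacking another commit on top of a merge in progress. -->
<commit waitFlush="true" waitSearcher="true"/>
```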
How have you created your 3TB of disk space? If you are using RAID5 or
RAID6, you can run into very serious and unavoidable performance
problems with writes. If it is a single disk, it may not provide enough
IOPS for good performance. My servers also have 3TB of disk space,
using six 1TB SATA drives in RAID10. The worst-case scenario for your
merges is equivalent to an optimize. An optimize of one of my 20GB
indexes takes 15 minutes even on RAID10, so I optimize only one large
index per day; each large index therefore gets optimized every six days.
I hope this helps, but I'll be happy to try and offer more, within my
skill set.
Thanks,
Shawn