On 4/12/2011 6:21 AM, stockii wrote:
Hello.

When I start an optimize (which takes more than 4 hours), no updates from
DIH are possible.
I thought Solr copies the whole index and then runs the optimize on the
copy, rather than locking the index and optimizing it in place ... =(

Is there any way to do both at the same time?

You can't index and optimize at the same time, and I'm pretty sure that there isn't any way to make it possible that wouldn't involve a major rewrite of Lucene, and possibly Solr. The devs would have to say differently if my understanding is wrong.

The optimize takes place at the Lucene level. I can't give you much in-depth information, but I can give you a high-level overview. What it's doing is equivalent to a merge, down to one segment. This is not the same as a straight file copy. It must read the entire Lucene data structure and build a new one from scratch. The process removes deleted documents and will also upgrade the version number of the index if it was written with an older version of Lucene. It's very likely that the reading side of the process is nearly as comprehensive as the CheckIndex program, but it also has to write out a new index segment.
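
To make the "merge down to one segment" idea concrete, here is a toy sketch in Python. It is not Lucene's code or data format: the Segment class and merge_to_one function are made-up names for illustration, and a real segment holds inverted-index structures rather than a plain dictionary. It only shows why the result is a full rewrite and not a file copy: every live document is read and written into a new segment, and deleted documents are dropped along the way.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Segment:
    """Hypothetical stand-in for an index segment (not Lucene's format)."""
    docs: Dict[int, str]                             # doc id -> stored content
    deleted: Set[int] = field(default_factory=set)   # ids marked as deleted


def merge_to_one(segments: List[Segment]) -> Segment:
    """Read every segment and write all live docs into one new segment."""
    merged: Dict[int, str] = {}
    for seg in segments:
        for doc_id, content in seg.docs.items():
            # Deleted documents are skipped, so they vanish from the result.
            if doc_id not in seg.deleted:
                merged[doc_id] = content
    return Segment(docs=merged)


segments = [
    Segment({1: "a", 2: "b"}, deleted={2}),
    Segment({3: "c"}),
    Segment({4: "d", 5: "e"}, deleted={4}),
]
optimized = merge_to_one(segments)
print(sorted(optimized.docs))  # [1, 3, 5]
```

Even in this toy form, the cost is proportional to the total size of all segments, which is why the real operation is so heavy on I/O.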

The net result -- the process gives your CPU and especially your I/O subsystem a workout, simultaneously. If you were to make your I/O subsystem faster, you would probably see a major improvement in your optimize times.

On my installation, it takes about 11 minutes to optimize one of my 16GB shards, each with 9 million docs. These live in virtual machines that are stored on a six-drive RAID10 array using 7200RPM SATA disks. One of my pie-in-the-sky upgrade dreams is to replace that with a four-drive RAID10 array using SSDs. The other two drives would be regular SATA -- a mirrored OS partition.
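
As a rough back-of-the-envelope check on those numbers, the snippet below works out the average rate at which the index is rewritten. The function name is made up for illustration, and it assumes 1 GB = 1024 MB and that the optimized index is about the same size as the original (it will be smaller if many deleted documents are removed).

```python
def optimize_throughput_mb_per_s(index_gb: float, minutes: float) -> float:
    """Average rewrite rate, assuming 1 GB = 1024 MB (an assumption)."""
    return index_gb * 1024 / (minutes * 60)


# 16GB shard optimized in about 11 minutes
rate = optimize_throughput_mb_per_s(16, 11)
print(f"{rate:.1f} MB/s")  # 24.8 MB/s
```

Keep in mind the disks are reading the old segments and writing the new one at the same time, so the combined I/O load is roughly double that figure.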

Thanks,
Shawn
