On 3/12/2013 4:17 PM, feroz_kh wrote:
> Do we really need to optimize in order to reformat?

The alternative would be to start with an empty index and just reindex your data. That is actually the best way to go, if that option is available to you.
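A minimal sketch of the delete-everything step, assuming a default install with a core named collection1 on port 8983 (adjust the URL for your setup; another common approach is to stop Solr and remove the core's data/index directory so a fresh index is created on startup):

  # delete every document and commit; only do this if you can rebuild from your source data
  curl 'http://localhost:8983/solr/collection1/update?commit=true' \
       -H 'Content-Type: text/xml' \
       --data-binary '<delete><query>*:*</query></delete>'

Once the index is empty, re-send your documents through whatever indexing process you normally use.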

> If yes, what is the best way of optimizing the index - online or offline?
> Can we do it online? If yes -
> 1. What is the HTTP request we can use to invoke optimization, and how long does it take?
> 2. What is the command-line command to invoke optimization, and how long does that one take?

The only way I know of to optimize an index that's offline is with Luke, but it is difficult to find versions of Luke that work with indexes after 4.0-ALPHA - the official Luke page doesn't have any newer versions, and I have no idea why. Online is better. Solr 4.2 was just released, so you may want to consider skipping 4.1 and going straight to 4.2.

There would be no major speed difference between doing it offline or online; whatever else the machine is doing might be a factor. I can only make guesses about how long it will take. You say your index in 3.5 is 14GB. I have experience with indexes that are 22GB in 3.5, which take about 11 minutes to optimize. The equivalent index in 4.2 is 14GB and takes 14 minutes, because of the extra compression/decompression step. This is on RAID10; volumes with no RAID or with other RAID levels would be slower. Also, if the structure of your index is significantly different from mine, yours might go faster or slower than the size alone would suggest.

The wiki has a curl command that optimizes the index:

http://wiki.apache.org/solr/UpdateXmlMessages#Passing_commit_and_commitWithin_parameters_as_part_of_the_URL

You would want to leave off the "maxSegments" option so it optimizes down to one segment. Whether to include waitFlush is up to you, but if you don't include it, you won't know exactly when it finishes unless you are looking at the index directory.
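For reference, a rough sketch of what that call might look like from the command line, again assuming a core named collection1 on the default port (adjust the URL for your setup):

  # optimize down to a single segment; with the default wait behavior the
  # request does not return until the optimize has finished
  curl 'http://localhost:8983/solr/collection1/update?optimize=true'

The same thing can be expressed as an XML message by posting <optimize/> to the /update handler, which the wiki page above covers in more detail.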

Thanks,
Shawn
