On 3/12/2013 4:17 PM, feroz_kh wrote:
> Do we really need to optimize in order to reformat?
The alternative would be to start with an empty index and just reindex
your data. That is actually the best way to go, if that option is
available to you.
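If you do reindex from scratch, clearing out the old index is a delete-all followed by a commit before you re-run your normal indexing process. A minimal sketch, assuming a Solr instance at localhost:8983 with the default core (adjust host, port, and core path for your setup):

```shell
# Delete every document in the index
# (assumes Solr at localhost:8983, default core)
curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
     --data-binary '<delete><query>*:*</query></delete>'

# Commit the deletes, then re-run your indexing process
curl 'http://localhost:8983/solr/update?commit=true'
```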
> If yes, what is the best way of optimizing the index - online or offline?
> Can we do it online? If yes -
> 1. What is the http request which we can use to invoke optimization? How
> long does it take?
> 2. What is the command line command to invoke optimization? How long does
> this one take?
The only way I know of to optimize an index that's offline is using
Luke, but it is difficult to find versions of Luke that work with
indexes after 4.0-ALPHA - the official Luke page doesn't have any newer
versions, and I have no idea why. Online is better. Solr 4.2 was just
released; you may want to skip 4.1 and go straight to 4.2.
There would be no major speed difference between doing it offline and
online, though whatever else the machine is doing might be a factor. I can
only make guesses about how long it will take. You say your index in
3.5 is 14GB. I have experience with 22GB indexes in 3.5; they take
about 11 minutes to optimize. The equivalent index in 4.2 is 14GB and
takes 14 minutes because of the extra compression/decompression step.
This is on RAID10; volumes with no RAID or with other RAID levels would
likely be slower. Also, if the structure of your index is significantly
different from mine, yours might go faster or slower than the size alone
would suggest.
There is a curl command in the wiki that optimizes the index:
http://wiki.apache.org/solr/UpdateXmlMessages#Passing_commit_and_commitWithin_parameters_as_part_of_the_URL
You would want to leave off the "maxSegments" option so it optimizes
down to one segment. Whether to include waitFlush is up to you, but if
you don't include it, you won't know exactly when it finishes unless you
are looking at the index directory.
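The wiki command boils down to a single HTTP request. A sketch, again assuming Solr on localhost:8983 with the default core (adjust host, port, and core path to match your installation):

```shell
# Optimize the index; leaving off maxSegments optimizes down to one segment.
# waitFlush=true makes curl block until the optimize finishes, so you know
# exactly when it is done; drop it if you would rather watch the index
# directory instead.
curl 'http://localhost:8983/solr/update?optimize=true&waitFlush=true'
```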
Thanks,
Shawn