We are having an issue with running out of space when trying to do a
full re-index.
We are indexing with autocommit at 30 minutes.
We have it set to only optimize at the end of an indexing cycle.
On 12/12/2016 02:43 PM, Erick Erickson wrote:
First off, optimize is actually rarely necessary. I wouldn't bother
unless you have measurements to prove that it's desirable.
I would _certainly_ not call optimize every 10M docs. If you must call
it at all, call it exactly once when indexing is complete. But see
above.
As far as the commit, I'd just set the autocommit settings in
solrconfig.xml to something "reasonable" and forget it. I usually use
time rather than doc count as it's a little more predictable. I often
use 60 seconds, but it can be longer. The longer it is, the bigger
your tlog will grow and if Solr shuts down forcefully the longer
replaying may take. Here's the whole writeup on this topic:
https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
Running out of space during indexing with about 30% utilization is
very odd. My guess is that you're trying to take too much control.
Having multiple optimizations going on at once would be a very good
way to run out of disk space.
And I'm assuming one replica's index per disk or you're reporting
aggregate index size per disk when you say 30%. Having three replicas
on the same disk each consuming 30% is A Bad Thing.
Best,
Erick
On Mon, Dec 12, 2016 at 8:36 AM, Michael Joyner <mich...@newsrx.com> wrote:
Halp!
I need to reindex over 43 million documents. When optimized, the collection
currently uses < 30% of disk space; we tried it over this weekend and it ran
out of space during the reindexing.
I'm thinking the best solution for what we are trying to do is to call
commit/optimize every 10,000,000 documents or so and then wait for the
optimize to complete.
How can I check optimize status via SolrJ for a particular collection?
Also, is there a way to check free space per shard by collection?
-Mike
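One way to answer the SolrJ question above (a sketch under an assumption, not an official "optimize status" API): the Luke request handler reports the index's segment count, and a fully optimized (force-merged) index has exactly one segment. The URL and collection name below are placeholders.

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;

public class OptimizeCheck {

    // A fully optimized (force-merged) index has exactly one segment.
    static boolean isOptimized(int segmentCount) {
        return segmentCount == 1;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL: point at one core/replica of the collection.
        String solrUrl = "http://localhost:8983/solr/mycollection";
        try (SolrClient client = new HttpSolrClient.Builder(solrUrl).build()) {
            LukeRequest luke = new LukeRequest();
            luke.setNumTerms(0); // skip per-field term stats; we only need index info
            LukeResponse rsp = luke.process(client);
            Number segments = (Number) rsp.getIndexInfo().get("segmentCount");
            System.out.println("segments=" + segments
                    + " optimized=" + isOptimized(segments.intValue()));
        }
    }
}
```

For the free-space question, newer Solr releases expose a Metrics API (/admin/metrics) with per-core index size and container filesystem stats, but that may postdate the version in use here.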