No, the deleted files do not get replicated. Instead, the slaves do the same 
thing as the master, holding on to the deleted files after the new files are 
copied over.

The optimize is obsoleting all of your index files, so maybe you should 
quit doing that. Without an optimize, the obsoleted files will be much 
smaller, and so will the replicated files. Once in a while, the automatic 
merging will rebuild the largest files, but that will happen less often.
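
If you use SolrJ, that just means dropping the optimize() call. A minimal 
sketch, assuming the 4.x client class (names vary by version, and the URL 
is hypothetical):

    SolrServer server = new HttpSolrServer("http://master:8983/solr");
    server.add(docs);      // docs: your Collection<SolrInputDocument>
    server.commit();       // background merging handles segment cleanup
    // server.optimize();  // skip this; it rewrites every segment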

You need free disk space equal to the index size anyway, to handle a full 
reindex or replicating a full reindex. So provide the free space and stop 
worrying about this.

Shawn is right that Unix does a more graceful job of handling file 
deletion, but it doesn't make a lot of difference here. Even if the files 
are unlinked, open files still use disk blocks.
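
You can see the platform difference from plain Java, nothing Solr-specific 
(sketch; needs Java 7's java.nio.file):

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class DeleteWhileOpen {
        public static void main(String[] args) throws IOException {
            Path p = Files.createTempFile("segment", ".bin");
            InputStream in = Files.newInputStream(p); // hold the file open
            try {
                // On *nix this succeeds and the blocks are freed when the
                // stream is closed. On Windows it throws an IOException.
                Files.delete(p);
                System.out.println("deleted while open");
            } catch (IOException e) {
                System.out.println("delete failed: " + e);
            } finally {
                in.close();
            }
        }
    }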

wunder
Search Guy, Chegg

On Mar 15, 2012, at 8:54 AM, Mike Austin wrote:

> The problem is that when replicating, the double-size index gets replicated
> to slaves.  I am now doing a dummy commit with the same document every
> time, and it works fine. After the optimize and dummy commit process I
> just end up with numDocs = x and maxDocs = x+1.  I don't get the nice
> green checkmark in the admin interface, but I can live with that.
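> 
> In SolrJ terms the workaround is roughly this (a sketch; "id" and the
> dummy value are just whatever your schema uses, and the URL is made up):
> 
>   SolrServer server = new HttpSolrServer("http://master:8983/solr");
>   server.optimize();                   // index doubles on disk here
>   SolrInputDocument dummy = new SolrInputDocument();
>   dummy.addField("id", "dummy-doc");   // always the same document
>   server.add(dummy);
>   server.commit();                     // obsolete segments get freed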
> 
> mike
> 
> On Thu, Mar 15, 2012 at 8:17 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> 
>> Or just ignore it if you have the disk space. The files will be cleaned up
>> eventually. I believe they'll magically disappear if you simply bounce the
>> server (but I work on *nix, so I can't personally guarantee it). And
>> replication won't replicate the stale files, so that's not a problem
>> either....
>> 
>> Best
>> Erick
>> 
>> On Wed, Mar 14, 2012 at 11:54 PM, Mike Austin <mike.aus...@juggle.com> wrote:
>>> Shawn,
>>> 
>>> Thanks for the detailed answer! I will play around with this information
>>> in hand.  Maybe a second optimize or just a dummy commit after the
>>> optimize will help get me past this.  Neither is the best option, but
>>> maybe it's a "do it because it's running on Windows" work-around. If it
>>> is indeed a file locking issue, I think I can probably work around this,
>>> since my indexing is scheduled at certain times and not "live", so I
>>> could try the optimize again soon after, or do a single commit, which
>>> seems to fix the issue also.  Or just not optimize..
>>> 
>>> Thanks,
>>> Mike
>>> 
>>> On Wed, Mar 14, 2012 at 6:34 PM, Shawn Heisey <s...@elyograg.org> wrote:
>>> 
>>>> On 3/14/2012 2:54 PM, Mike Austin wrote:
>>>> 
>>>>> The odd thing is that if I optimize the index it doubles in size.. If
>>>>> I then add one more document to the index, it goes back down to half
>>>>> size?
>>>>> 
>>>>> Is there a way to force this without needing to wait until another
>>>>> document is added? Or do you have more information on what you think
>>>>> is going on?
>>>>> I'm using a trunk version of solr4 from 9/12/2011 with a master and
>>>>> two slaves setup.  Everything besides this is working great!
>>>>> 
>>>> 
>>>> The not-very-helpful-but-true answer: Don't run on Windows.  I checked
>>>> your prior messages to the list to verify that this is your environment.
>>>> If you can control index updates so they don't happen at the same time
>>>> as your optimizes, you can also get around this problem by doing the
>>>> optimize twice.  You would have to be absolutely sure that no changes
>>>> are made to the index between the two optimizes, so the second one
>>>> basically doesn't do anything except take care of the deletes.
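>>>> 
>>>> In SolrJ terms the double optimize is just this (sketch; server is
>>>> your SolrJ client, and nothing may touch the index between the calls):
>>>> 
>>>>   server.optimize();  // rewrites all segments; Windows can't delete
>>>>                       // the old ones while the reader holds them
>>>>   server.optimize();  // nothing left to merge, but the first pass's
>>>>                       // leftovers, now closed, can be deleted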
>>>> 
>>>> Nuts and bolts of why this happens: Solr keeps the old files open so the
>>>> existing reader can continue to serve queries.  That reader will not be
>>>> closed until the last query completes, which may not happen until well
>>>> after the time the new reader is completely online and ready.  I assume
>>>> that the delete attempt occurs as soon as the new index segments are
>>>> completely online, before the old reader begins to close.  I've not read
>>>> the source code to find out.
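>>>> 
>>>> The usual pattern for this is a reference-counted reader, along the
>>>> lines of Lucene's IndexReader.incRef()/decRef() (a sketch of the
>>>> general pattern, not the actual Solr code):
>>>> 
>>>>   reader.incRef();       // each in-flight query takes a reference
>>>>   try {
>>>>       // serve the query from this point-in-time view of the index
>>>>   } finally {
>>>>       reader.decRef();   // the last release closes the files, which
>>>>                          // is when the disk blocks come back on *nix
>>>>   }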
>>>> 
>>>> On Linux and other UNIX-like environments, you can delete files while
>>>> they are open by a process.  They continue to exist as in-memory links
>>>> and take up space until those processes close them, at which point they
>>>> are truly gone.  On Windows, an attempt to delete an open file will
>>>> fail, even if it's open read-only.
>>>> 
>>>> There are probably a number of ways that this problem could be solved
>>>> for Windows platforms.  The simplest that I can think of, assuming it's
>>>> even possible, would be to wait until the old reader is closed before
>>>> attempting the segment deletion.  That may not be possible - the
>>>> information may not be available to the portion of code that does the
>>>> deletion.  There are a few things standing in the way of me fixing this
>>>> problem myself: 1) I'm a beginning Java programmer.  2) I'm not familiar
>>>> with the Solr code at all.  3) My interest level is low because I run on
>>>> Linux, not Windows.
>>>> 
>>>> Thanks,
>>>> Shawn
>>>> 
>>>> 
>> 