The problem is that when replicating, the double-size index gets replicated
to the slaves.  I am now doing a dummy commit (always re-adding the same
document) after the optimize, and it works fine.  After the optimize and
dummy commit I just end up with numDocs = x and maxDocs = x+1.  I don't get
the nice green checkmark in the admin interface, but I can live with that.
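
For reference, the whole workaround boils down to something like this with
SolrJ.  This is just a rough sketch of what I'm doing: the URL and the "id"
value are placeholders, and depending on the SolrJ version the client class
is HttpSolrServer or the older CommonsHttpSolrServer.

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class OptimizeThenDummyCommit {
    public static void main(String[] args)
            throws SolrServerException, IOException {
        // Placeholder core URL.
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

        // On Windows the old segment files stay behind after this.
        solr.optimize();

        // Always the same id, so the dummy doc just overwrites itself
        // (hence numDocs = x and maxDocs = x+1).
        SolrInputDocument dummy = new SolrInputDocument();
        dummy.addField("id", "dummy-doc");
        solr.add(dummy);

        // The commit opens a new searcher and the stale files get deleted.
        solr.commit();
    }
}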

mike

On Thu, Mar 15, 2012 at 8:17 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> Or just ignore it if you have the disk space. The files will be cleaned up
> eventually. I believe they'll magically disappear if you simply bounce the
> server (but I work on *nix, so I can't personally guarantee it). And
> replication won't replicate the stale files, so that's not a problem
> either....
>
> Best
> Erick
>
> On Wed, Mar 14, 2012 at 11:54 PM, Mike Austin <mike.aus...@juggle.com>
> wrote:
> > Shawn,
> >
> > Thanks for the detailed answer! I will play around with this information
> > in hand.  Maybe a second optimize or just a dummy commit after the
> > optimize will help get me past this.  Neither is the best option, but
> > maybe it's a "do it because it's running on Windows" workaround.  If it
> > is indeed a file locking issue, I think I can probably work around it,
> > since my indexing is scheduled at certain times and not "live": I could
> > try the optimize again soon after, or do a single commit, which also
> > seems to fix the issue.  Or just not optimize at all.
> >
> > Thanks,
> > Mike
> >
> > On Wed, Mar 14, 2012 at 6:34 PM, Shawn Heisey <s...@elyograg.org> wrote:
> >
> >> On 3/14/2012 2:54 PM, Mike Austin wrote:
> >>
> >>> The odd thing is that if I optimize the index, it doubles in size.  If
> >>> I then add one more document to the index, it goes back down to half
> >>> the size.
> >>>
> >>> Is there a way to force this without needing to wait until another
> >>> document is added?  Or do you have more information on what you think
> >>> is going on?  I'm using a trunk version of Solr 4 from 9/12/2011, with
> >>> a master and two slaves setup.  Everything besides this is working
> >>> great!
> >>>
> >>
> >> The not-very-helpful-but-true answer: Don't run on Windows.  I checked
> >> your prior messages to the list to verify that this is your environment.
> >> If you can control index updates so they don't happen at the same time
> >> as your optimizes, you can also get around this problem by doing the
> >> optimize twice.  You would have to be absolutely sure that no changes
> >> are made to the index between the two optimizes, so the second one
> >> basically doesn't do anything except take care of the deletes.
> >>
> >> Nuts and bolts of why this happens: Solr keeps the old files open so the
> >> existing reader can continue to serve queries.  That reader will not be
> >> closed until the last query completes, which may not happen until well
> >> after the time the new reader is completely online and ready.  I assume
> >> that the delete attempt occurs as soon as the new index segments are
> >> completely online, before the old reader begins to close.  I've not read
> >> the source code to find out.
> >>
> >> On Linux and other UNIX-like environments, you can delete files while
> >> they are open by a process.  They continue to exist as in-memory links
> >> and take up space until those processes close them, at which point they
> >> are truly gone.  On Windows, an attempt to delete an open file will
> >> fail, even if it's open read-only.
> >>
> >> There are probably a number of ways that this problem could be solved
> >> for Windows platforms.  The simplest that I can think of, assuming it's
> >> even possible, would be to wait until the old reader is closed before
> >> attempting the segment deletion.  That may not be possible - the
> >> information may not be available to the portion of code that does the
> >> deletion.  There are a few things standing in the way of me fixing this
> >> problem myself: 1) I'm a beginning Java programmer.  2) I'm not familiar
> >> with the Solr code at all.  3) My interest level is low because I run
> >> on Linux, not Windows.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>
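
P.S. Shawn's point about deleting open files is easy to see with a few lines
of plain Java (nothing Solr-specific; the temp file below just stands in for
a segment file):

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class DeleteWhileOpen {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("segment", ".tmp");
        FileInputStream in = new FileInputStream(f);  // hold the file open, read-only

        // Linux/*nix: true  -- the directory entry goes away, space is freed on close()
        // Windows:    false -- the delete is refused while the file is still open
        System.out.println("deleted while open: " + f.delete());

        in.close();
        f.delete();  // clean up in case the first delete was refused
    }
}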
