Oops, you're right, term listings and counts for deleted docs are
adjusted during merges. I had the impression that optimize had some
special powers here that merge does not.
Thank you for bringing expungeDeletes to my attention.
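
For anyone reading the archive later, here is a minimal sketch of the update message that asks for deletes to be expunged at commit time. It assumes Solr's XML update syntax, where <commit/> takes an expungeDeletes attribute (as in Solr 1.4). The helper name commit_message is made up for illustration, and posting the string to your own update handler is left out.

```python
import xml.etree.ElementTree as ET


def commit_message(expunge_deletes: bool = False) -> str:
    """Build a Solr XML <commit/> update message (hypothetical helper).

    With expunge_deletes=True the message asks Solr to merge away
    segments' deleted documents as part of the commit.
    """
    attrs = {"expungeDeletes": "true"} if expunge_deletes else {}
    return ET.tostring(ET.Element("commit", attrs), encoding="unicode")


if __name__ == "__main__":
    # Print the message you would POST to the update handler.
    print(commit_message(expunge_deletes=True))
```

This only builds the request body. Whether it reclaims as much space as a full optimize depends on the index, so measure on a copy first.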
On Sat, Nov 21, 2009 at 7:46 AM, Yonik Seeley wrote:
> On Sat, Nov
On Sat, Nov 21, 2009 at 12:33 AM, Lance Norskog wrote:
> And, terms whose documents have been deleted are not purged. So, you
> can merge all you like and the index will not shrink back completely.
Under what conditions? Certainly not all, since I just tried a simple
test and a merge removed the
And, terms whose documents have been deleted are not purged. So, you
can merge all you like and the index will not shrink back completely.
Only an optimize will remove the "orphan" terms.
This is important because the orphan terms affect relevance
calculations. So you really want to purge them wit
On Fri, Nov 20, 2009 at 2:32 PM, Michael wrote:
> On Fri, Nov 20, 2009 at 12:35 PM, Yonik Seeley wrote:
>> On Fri, Nov 20, 2009 at 12:24 PM, Michael wrote:
>>> So -- I thought I understood you to mean that if I frequently merge,
>>> it's basically the same as an optimize, and cruft will get pu
On Fri, Nov 20, 2009 at 12:35 PM, Yonik Seeley wrote:
> On Fri, Nov 20, 2009 at 12:24 PM, Michael wrote:
>> So -- I thought I understood you to mean that if I frequently merge,
>> it's basically the same as an optimize, and cruft will get purged. Am
>> I misunderstanding you?
>
> That only appli
On Fri, Nov 20, 2009 at 12:24 PM, Michael wrote:
> So -- I thought I understood you to mean that if I frequently merge,
> it's basically the same as an optimize, and cruft will get purged. Am
> I misunderstanding you?
That only applies to the segments involved in the merge. The deleted
document
Hoss,
Using Solr 1.4, I see constant index growth until an optimize. I
commit (hundreds of updates) every 5 minutes and have a mergeFactor of
10, but every 50 minutes I don't see the index collapse down to its
original size -- it's slightly larger.
Over the course of a week, the index grew from
On Tue, Nov 17, 2009 at 2:24 PM, Chris Hostetter wrote:
>
> : Basically, search entries are keyed to other documents. We have finite storage,
> : so we purge old documents. My understanding was that deleted documents still
> : take space until an optimize is done. Therefore, if I don't
: Basically, search entries are keyed to other documents. We have finite storage,
: so we purge old documents. My understanding was that deleted documents still
: take space until an optimize is done. Therefore, if I don't optimize, the index
: size on disk will grow without bound.
:
: A
- Original Message
> From: Jerome L Quinn
> To: solr-user@lucene.apache.org
> Sent: Mon, November 16, 2009 10:05:55 AM
> Subject: Re: Solr 1.3 query and index perf tank during optimize
>
>
>
> Otis Gospodnetic wrote on 11/13/2009 11:15:43 PM:
>
Otis Gospodnetic wrote on 11/13/2009 11:15:43 PM:
> Let's take a step back. Why do you need to optimize? You said: "As
> long as I'm not optimizing, search and indexing times are
> satisfactory." :)
>
> You don't need to optimize just because you are continuously adding
> and deleting documents
Good question!
The terms in the deleted documents are left behind, and so the
relevance behavior will be off. The other space used directly by
documents will be reabsorbed. (??)
On Sat, Nov 14, 2009 at 1:28 PM, Jerome L Quinn wrote:
>
>
> Lance Norskog wrote on 11/13/2009 11:18:42 PM:
>
>> The
Lance Norskog wrote on 11/13/2009 11:18:42 PM:
> The 'maxSegments' feature is new with 1.4. I'm not sure that it will
> cause any less disk I/O during optimize.
It could still be useful to manage the "too many open files" problem that
rears its ugly head on occasion.
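
A minimal sketch of the partial optimize being discussed, assuming your Solr version accepts the maxSegments attribute on the XML <optimize/> update message. The helper name optimize_message is hypothetical, and the segment count used in the demo is an arbitrary example, not a recommendation.

```python
import xml.etree.ElementTree as ET


def optimize_message(max_segments=None) -> str:
    """Build a Solr XML <optimize/> update message (hypothetical helper).

    max_segments=None leaves the attribute off, so Solr uses its default;
    an integer asks Solr to merge down to at most that many segments.
    """
    attrs = {} if max_segments is None else {"maxSegments": str(max_segments)}
    return ET.tostring(ET.Element("optimize", attrs), encoding="unicode")


if __name__ == "__main__":
    # Ask for at most 5 segments instead of a full single-segment optimize.
    print(optimize_message(max_segments=5))
```

Fewer segments means fewer open files per searcher, which is the "too many open files" angle above. It says nothing about how much disk I/O the merge itself costs.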
> The 'mergeFactor=2' id
The 'maxSegments' feature is new with 1.4. I'm not sure that it will
cause any less disk I/O during optimize.
The 'mergeFactor=2' idea is not what you think: in this case the index
is always "mostly optimized", so you never need to run optimize.
Indexing is always slower, because you amortize the
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
- Original Message
> From: Jerome L Quinn
> To: solr-user@lucene.apache.org
> Sent: Thu, November 12, 2009 6:30:42 PM
> Subject: Solr 1.3 query and index
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM:
> On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless wrote:
> > I think we sorely need a Directory impl that down-prioritizes IO
> > performed by merging.
>
> It's unclear if this case is caused by IO contention, or the OS cache
> of the hot p
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM:
>
> On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless wrote:
> > I think we sorely need a Directory impl that down-prioritizes IO
> > performed by merging.
>
> It's unclear if this case is caused by IO contention, or the OS cache
> of the hot
Mark Miller wrote on 11/12/2009 07:18:03 PM:
> Ah, the pains of optimization. It's kind of just how it is. One solution
> is to use two boxes and replication - optimize on the master, and then
> queries only hit the slave. Out of reach for some though, and adds many
> complications.
Yes, in my us
On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless wrote:
> I think we sorely need a Directory impl that down-prioritizes IO
> performed by merging.
It's unclear if this case is caused by IO contention, or the OS cache
of the hot parts of the index being lost by that extra IO activity.
Of course
On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless wrote:
> I think we sorely need a Directory impl that down-prioritizes IO
> performed by merging.
Presumably this "prioritizing Directory impl" could wrap/decorate any
existing Directory.
Mike
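
To make the wrap/decorate idea concrete, here is a rough sketch in Python rather than against Lucene's Java Directory API. ThrottledWriter and all of its parameters are hypothetical. It stands in for "IO priority" with a simple byte-rate limit on writes flagged as low priority (merges), which is an assumption of this sketch and not something proposed in the thread.

```python
import io
import time


class ThrottledWriter:
    """Hypothetical decorator around any writable file-like object.

    Writes marked low priority (e.g. merge output) are paced so they do
    not exceed max_bytes_per_sec; other writes pass straight through.
    """

    def __init__(self, inner, max_bytes_per_sec, low_priority=True,
                 clock=time.monotonic, sleep=time.sleep):
        self._inner = inner
        self._rate = float(max_bytes_per_sec)
        self._low_priority = low_priority
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()  # earliest time the next write may start

    def write(self, data):
        if self._low_priority:
            now = self._clock()
            start = max(now, self._next_allowed)
            if start > now:
                # Wait until the previous write's time budget is used up.
                self._sleep(start - now)
            # Reserve time proportional to the bytes about to be written.
            self._next_allowed = start + len(data) / self._rate
        return self._inner.write(data)

    def __getattr__(self, name):
        # Everything else (flush, close, ...) is delegated to the wrapped object.
        return getattr(self._inner, name)


if __name__ == "__main__":
    out = ThrottledWriter(io.BytesIO(), max_bytes_per_sec=1_000_000)
    for _ in range(4):
        out.write(b"\0" * 1024)  # paced as low-priority "merge" output
    print(len(out.getvalue()), "bytes written")
```

Because the wrapper only touches write(), it can sit on top of any existing output object. That is the point of decorating rather than subclassing. Whether pacing writes also protects the OS cache of the hot parts of the index is a separate question.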
Another thing to try is reducing the maxThreadCount for
ConcurrentMergeScheduler.
It defaults to 3, which I think is too high -- we should change this
default to 1 (I'll open a Lucene issue).
Mike
On Thu, Nov 12, 2009 at 6:30 PM, Jerome L Quinn wrote:
>
> Hi, everyone, this is a problem I've h
I think we sorely need a Directory impl that down-prioritizes IO
performed by merging.
It would be wonderful if from Java we could simply set a per-thread
"IO priority", but, it'll be a looong time until that's possible.
So I think for now we should make a Directory impl that emulates such
behavi
Jerome L Quinn wrote:
> Hi, everyone, this is a problem I've had for quite a while,
> and have basically avoided optimizing because of it. However,
> eventually we will get to the point where we must delete as
> well as add docs continuously.
>
> I have a Solr 1.3 index with ~4M docs at around 90G
Hi, everyone, this is a problem I've had for quite a while,
and have basically avoided optimizing because of it. However,
eventually we will get to the point where we must delete as
well as add docs continuously.
I have a Solr 1.3 index with ~4M docs at around 90G. This is a single
instance run