Re: Solr 1.3 query and index perf tank during optimize

2009-11-22 Thread Lance Norskog
Oops, you're right, term listings and counts for deleted docs are adjusted during merges. I had the impression that optimize had some special powers here that merge does not. Thank you for bringing expungeDeletes to my attention. On Sat, Nov 21, 2009 at 7:46 AM, Yonik Seeley wrote: > On Sat, Nov
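
For readers unfamiliar with expungeDeletes: as far as I know it is an attribute on the XML commit message, available from Solr 1.4. Below is a minimal sketch of building that message. The helper name `commit_message` is hypothetical, and actually posting the result to your core's update handler is left out, since the URL depends on your setup.

```python
from xml.etree import ElementTree as ET


def commit_message(expunge_deletes=False):
    """Build a Solr XML <commit/> update message as a string.

    Hypothetical helper for illustration; it only builds the message
    body and does not talk to a Solr server.
    """
    commit = ET.Element("commit")
    if expunge_deletes:
        # Ask Solr to merge away segments' deleted docs on this commit.
        commit.set("expungeDeletes", "true")
    return ET.tostring(commit, encoding="unicode")


if __name__ == "__main__":
    print(commit_message(expunge_deletes=True))
```

The resulting string would be sent as the body of an HTTP POST with a text/xml content type to the update handler.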

Re: Solr 1.3 query and index perf tank during optimize

2009-11-21 Thread Yonik Seeley
On Sat, Nov 21, 2009 at 12:33 AM, Lance Norskog wrote: > And, terms whose documents have been deleted are not purged. So, you > can merge all you like and the index will not shrink back completely. Under what conditions? Certainly not all, since I just tried a simple test and a merge removed the

Re: Solr 1.3 query and index perf tank during optimize

2009-11-20 Thread Lance Norskog
And, terms whose documents have been deleted are not purged. So, you can merge all you like and the index will not shrink back completely. Only an optimize will remove the "orphan" terms. This is important because the orphan terms affect relevance calculations. So you really want to purge them wit

Re: Solr 1.3 query and index perf tank during optimize

2009-11-20 Thread Yonik Seeley
On Fri, Nov 20, 2009 at 2:32 PM, Michael wrote: > On Fri, Nov 20, 2009 at 12:35 PM, Yonik Seeley > wrote: >> On Fri, Nov 20, 2009 at 12:24 PM, Michael wrote: >>> So -- I thought I understood you to mean that if I frequently merge, >>> it's basically the same as an optimize, and cruft will get pu

Re: Solr 1.3 query and index perf tank during optimize

2009-11-20 Thread Michael
On Fri, Nov 20, 2009 at 12:35 PM, Yonik Seeley wrote: > On Fri, Nov 20, 2009 at 12:24 PM, Michael wrote: >> So -- I thought I understood you to mean that if I frequently merge, >> it's basically the same as an optimize, and cruft will get purged.  Am >> I misunderstanding you? > > That only appli

Re: Solr 1.3 query and index perf tank during optimize

2009-11-20 Thread Yonik Seeley
On Fri, Nov 20, 2009 at 12:24 PM, Michael wrote: > So -- I thought I understood you to mean that if I frequently merge, > it's basically the same as an optimize, and cruft will get purged.  Am > I misunderstanding you? That only applies to the segments involved in the merge. The deleted document
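
A toy model of the point above, that a merge only purges deleted documents from the segments it actually touches. This is not Lucene's merge policy; the segment sizes and the `merge` helper are made up for illustration.

```python
def merge(segments, indexes):
    """Merge the segments at `indexes` into one new segment.

    Each segment is a dict with 'live' and 'deleted' doc counts. Only the
    merged segments drop their deleted docs; the others are left untouched.
    """
    chosen = set(indexes)
    merged_live = sum(segments[i]["live"] for i in chosen)
    untouched = [s for i, s in enumerate(segments) if i not in chosen]
    return untouched + [{"live": merged_live, "deleted": 0}]


def deleted_total(segments):
    """Count deleted docs still occupying space across all segments."""
    return sum(s["deleted"] for s in segments)


segments = [
    {"live": 900_000, "deleted": 100_000},  # large, old segment
    {"live": 900, "deleted": 100},          # small, recent segment
    {"live": 950, "deleted": 50},           # small, recent segment
]

# Merging only the two small segments leaves the big segment's deletes alone.
after = merge(segments, [1, 2])

if __name__ == "__main__":
    print(deleted_total(segments), deleted_total(after))
```

In this model the deletes in the large segment stay on disk until that segment itself takes part in a merge, which is one plausible reading of why an index can keep growing between optimizes.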

Re: Solr 1.3 query and index perf tank during optimize

2009-11-20 Thread Michael
Hoss, Using Solr 1.4, I see constant index growth until an optimize. I commit (hundreds of updates) every 5 minutes and have a mergeFactor of 10, but every 50 minutes I don't see the index collapse down to its original size -- it's slightly larger. Over the course of a week, the index grew from

Re: Solr 1.3 query and index perf tank during optimize

2009-11-17 Thread Israel Ekpo
On Tue, Nov 17, 2009 at 2:24 PM, Chris Hostetter wrote: > > : Basically, search entries are keyed to other documents. We have finite > : storage, > : so we purge old documents. My understanding was that deleted documents > : still > : take space until an optimize is done. Therefore, if I don't

Re: Solr 1.3 query and index perf tank during optimize

2009-11-17 Thread Chris Hostetter
: Basically, search entries are keyed to other documents. We have finite : storage, : so we purge old documents. My understanding was that deleted documents : still : take space until an optimize is done. Therefore, if I don't optimize, the : index : size on disk will grow without bound. : : A

Re: Solr 1.3 query and index perf tank during optimize

2009-11-16 Thread Otis Gospodnetic
Original Message > From: Jerome L Quinn > To: solr-user@lucene.apache.org > Sent: Mon, November 16, 2009 10:05:55 AM > Subject: Re: Solr 1.3 query and index perf tank during optimize > > > > Otis Gospodnetic wrote on 11/13/2009 11:15:43 > PM: >

Re: Solr 1.3 query and index perf tank during optimize

2009-11-16 Thread Jerome L Quinn
Otis Gospodnetic wrote on 11/13/2009 11:15:43 PM: > Let's take a step back. Why do you need to optimize? You said: "As > long as I'm not optimizing, search and indexing times are satisfactory." :) > > You don't need to optimize just because you are continuously adding > and deleting documents

Re: Solr 1.3 query and index perf tank during optimize

2009-11-14 Thread Lance Norskog
Good question! The terms in the deleted documents are left behind, and so the relevance behavior will be off. The other space used directly by documents will be reabsorbed. (??) On Sat, Nov 14, 2009 at 1:28 PM, Jerome L Quinn wrote: > > > Lance Norskog wrote on 11/13/2009 11:18:42 PM: > >> The

Re: Solr 1.3 query and index perf tank during optimize

2009-11-14 Thread Jerome L Quinn
Lance Norskog wrote on 11/13/2009 11:18:42 PM: > The 'maxSegments' feature is new with 1.4. I'm not sure that it will > cause any less disk I/O during optimize. It could still be useful to manage the "too many open files" problem that rears its ugly head on occasion. > The 'mergeFactor=2' id

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Lance Norskog
The 'maxSegments' feature is new with 1.4. I'm not sure that it will cause any less disk I/O during optimize. The 'mergeFactor=2' idea is not what you think: in this case the index is always "mostly optimized", so you never need to run optimize. Indexing is always slower, because you amortize the
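
For reference, a minimal sketch of the XML message for a partial optimize, assuming the `maxSegments` attribute on the optimize command mentioned above (Solr 1.4). The helper name `optimize_message` is hypothetical; it only builds the message and makes no claim about how much disk I/O the resulting optimize causes.

```python
from xml.etree import ElementTree as ET


def optimize_message(max_segments=None):
    """Build a Solr XML <optimize/> update message as a string.

    Hypothetical helper for illustration. With max_segments=None the
    message requests a full optimize; otherwise it asks Solr to merge
    down to at most that many segments.
    """
    if max_segments is not None and max_segments < 1:
        raise ValueError("max_segments must be at least 1")
    optimize = ET.Element("optimize")
    if max_segments is not None:
        optimize.set("maxSegments", str(max_segments))
    return ET.tostring(optimize, encoding="unicode")


if __name__ == "__main__":
    print(optimize_message(max_segments=4))
```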

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Otis Gospodnetic
Original Message > From: Jerome L Quinn > To: solr-user@lucene.apache.org > Sent: Thu, November 12, 2009 6:30:42 PM > Subject: Solr 1.3 query and index

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Jerome L Quinn
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM: > On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless > wrote: > > I think we sorely need a Directory impl that down-prioritizes IO > > performed by merging. > > It's unclear if this case is caused by IO contention, or the OS cache > of the hot p

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Jerome L Quinn
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM: > > On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless > wrote: > > I think we sorely need a Directory impl that down-prioritizes IO > > performed by merging. > > It's unclear if this case is caused by IO contention, or the OS cache > of the hot

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Jerome L Quinn
Mark Miller wrote on 11/12/2009 07:18:03 PM: > Ah, the pains of optimization. It's kind of just how it is. One solution > is to use two boxes and replication - optimize on the master, and then > queries only hit the slave. Out of reach for some though, and adds many > complications. Yes, in my us

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Yonik Seeley
On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless wrote: > I think we sorely need a Directory impl that down-prioritizes IO > performed by merging. It's unclear if this case is caused by IO contention, or the OS cache of the hot parts of the index being lost by that extra IO activity. Of course

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Michael McCandless
On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless wrote: > I think we sorely need a Directory impl that down-prioritizes IO > performed by merging. Presumably this "prioritizing Directory impl" could wrap/decorate any existing Directory. Mike
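
A rough sketch of the wrap/decorate idea in general terms. This is not Lucene's Directory API and not the proposed implementation; `ThrottledWriter` is a hypothetical class that wraps any binary file-like object and caps its average write rate, which is one simple way to emulate lower I/O priority for a background writer such as a merge.

```python
import io
import time


class ThrottledWriter:
    """Wrap a binary file-like object and cap its average write rate."""

    def __init__(self, raw, max_bytes_per_sec,
                 clock=time.monotonic, sleep=time.sleep):
        self._raw = raw
        self._rate = float(max_bytes_per_sec)
        self._clock = clock      # injectable so the wrapper can be tested
        self._sleep = sleep
        self._start = clock()
        self._written = 0

    def write(self, data):
        n = self._raw.write(data)
        self._written += n
        # Earliest time at which this many bytes stays within the budget.
        earliest = self._start + self._written / self._rate
        delay = earliest - self._clock()
        if delay > 0:
            self._sleep(delay)   # back off so foreground I/O gets a turn
        return n


if __name__ == "__main__":
    out = ThrottledWriter(io.BytesIO(), max_bytes_per_sec=1024 * 1024)
    out.write(b"x" * 4096)
```

Because the wrapper only depends on a `write` method, it can sit in front of any existing writer without changing it, which is the appeal of the decorator approach. A real version would also need to throttle reads done by the merge.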

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Michael McCandless
Another thing to try is reducing the maxThreadCount for ConcurrentMergeScheduler. It defaults to 3, which I think is too high -- we should change this default to 1 (I'll open a Lucene issue). Mike On Thu, Nov 12, 2009 at 6:30 PM, Jerome L Quinn wrote: > > Hi, everyone, this is a problem I've h

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Michael McCandless
I think we sorely need a Directory impl that down-prioritizes IO performed by merging. It would be wonderful if from Java we could simply set a per-thread "IO priority", but, it'll be a looong time until that's possible. So I think for now we should make a Directory impl that emulates such behavi

Re: Solr 1.3 query and index perf tank during optimize

2009-11-12 Thread Mark Miller
Jerome L Quinn wrote: > Hi, everyone, this is a problem I've had for quite a while, > and have basically avoided optimizing because of it. However, > eventually we will get to the point where we must delete as > well as add docs continuously. > > I have a Solr 1.3 index with ~4M docs at around 90G

Solr 1.3 query and index perf tank during optimize

2009-11-12 Thread Jerome L Quinn
Hi, everyone, this is a problem I've had for quite a while, and have basically avoided optimizing because of it. However, eventually we will get to the point where we must delete as well as add docs continuously. I have a Solr 1.3 index with ~4M docs at around 90G. This is a single instance run