RE: large scale indexing issues / single threaded bottleneck

2011-11-03 Thread sebastian.reese
Sent: Thursday, November 3, 2011 14:00 To: 'solr-user@lucene.apache.org' Subject: RE: large scale indexing issues / single threaded bottleneck Shishir, we have 35 million "documents", and should be doing about 5000-1 new "documents" a day, but with very small "

RE: large scale indexing issues / single threaded bottleneck

2011-11-03 Thread Jaeger, Jay - DOT
November 01, 2011 10:58 PM To: solr-user@lucene.apache.org Subject: RE: large scale indexing issues / single threaded bottleneck Roman, How frequently do you update your index? I have a need to do real time add/delete to SOLR documents at a rate of approximately 20/min. The total number of documents

RE: large scale indexing issues / single threaded bottleneck

2011-11-01 Thread Roman Alekseenkov
> The total number of documents is in the range of 4 million. Will there > be any performance issues? > > Thanks, > Shishir > -- View this message in context: http://lucene.472066.n3.nabble.com/large-scale-indexing-issues-single-threaded-bottleneck-tp3461815p3472901.html Sen

RE: large scale indexing issues / single threaded bottleneck

2011-11-01 Thread Awasthi, Shishir
Alekseenkov [mailto:ralekseen...@gmail.com] Sent: Sunday, October 30, 2011 6:11 PM To: solr-user@lucene.apache.org Subject: Re: large scale indexing issues / single threaded bottleneck Guys, thank you for all the replies. I think I have figured out a partial solution for the problem on Friday

Re: large scale indexing issues / single threaded bottleneck

2011-10-31 Thread Kiril Menshikov
Yonik, Adding overwrite=false doesn't help. XMLLoader doesn't check this HTTP parameter. Instead it checks the attribute with the same name in the XML tag. -Kiril -- View this message in context: http://lucene.472066.n3.nabble.com/large-scale-indexing-issues-single-threaded-bottleneck-tp34618
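
Kiril's point is that the flag has to travel inside the XML update message itself rather than in the request URL. A minimal sketch of building such a message follows, assuming the standard Solr XML update format with an overwrite attribute on the <add> element. The helper name build_add_message and the sample field names are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc_id, fields, overwrite=False):
    """Build a Solr XML <add> message carrying the overwrite flag as an attribute.

    build_add_message is a hypothetical helper, not part of Solr or any client library.
    """
    # The flag is set on the <add> tag itself, not as an HTTP parameter.
    add = ET.Element("add", {"overwrite": "true" if overwrite else "false"})
    doc = ET.SubElement(add, "doc")

    # Unique key field; "id" is the conventional name but depends on the schema.
    id_field = ET.SubElement(doc, "field", {"name": "id"})
    id_field.text = doc_id

    # Remaining fields are copied as plain <field name="..."> elements.
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value

    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_message("doc-1", {"title": "example"}))
```

The resulting string would be sent as the body of a POST to the update handler. Setting overwrite to false is only safe when the document ID is known not to exist in the index already, as noted elsewhere in the thread.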

Re: large scale indexing issues / single threaded bottleneck

2011-10-30 Thread Roman Alekseenkov
"overwrite=false" didn't help, but the hack did. Once again, thank you for the answers and recommendations Roman -- View this message in context: http://lucene.472066.n3.nabble.com/large-scale-indexing-issues-single-threaded-bottleneck-tp3461815p3466523.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: large scale indexing issues / single threaded bottleneck

2011-10-29 Thread Nagendra Nagarajayya
Roman: 2) what would be the best way to port these (and only these) changes to 3.4.0? I tried to dig into the branching and revisions, but got lost quickly. Tried something like "svn diff […]realtime_search@r953476 […]realtime_search@r1097767", but I'm not sure if it's even possible to merge th

Re: large scale indexing issues / single threaded bottleneck

2011-10-29 Thread Yonik Seeley
On Sat, Oct 29, 2011 at 6:35 AM, Michael McCandless wrote: > I saw a mention somewhere that you can tell Solr to use > IW.addDocument (not IW.updateDocument) when you add a document if you > are certain it's not replacing a previous document with the same ID Right - adding overwrite=false to

Re: large scale indexing issues / single threaded bottleneck

2011-10-29 Thread Michael McCandless
On Fri, Oct 28, 2011 at 3:27 PM, Simon Willnauer wrote: > one more thing, after somebody (thanks robert) pointed me at the > stacktrace it seems kind of obvious what the root cause of your > problem is. It's Solr :) Solr closes the IndexWriter on commit which is > very wasteful since you basically

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Jason Rutherglen
> abstract away the encoding of the index Robert, this is what you wrote. "Abstract away the encoding of the index" means pluggable, otherwise it's not abstract and / or it's a flawed design. Sounds like it's the latter.

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Robert Muir
On Fri, Oct 28, 2011 at 8:10 PM, Jason Rutherglen wrote: >> Otherwise we have "flexible indexing" where "flexible" means "slower >> if you do anything but the default". > > The other encodings should exist as modules since they are pluggable. > 4.0 can ship with the existing codec.  4.1 with addit

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Jason Rutherglen
> Otherwise we have "flexible indexing" where "flexible" means "slower > if you do anything but the default". The other encodings should exist as modules since they are pluggable. 4.0 can ship with the existing codec. 4.1 with additional codecs and the bulk postings at a later time. Otherwise it

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Robert Muir
On Fri, Oct 28, 2011 at 5:03 PM, Jason Rutherglen wrote: > +1 I suggested it should be backported a while back. Or that Lucene > 4.x should be released. I'm not sure what is holding up Lucene 4.x at > this point, bulk postings is only useful for PFOR. This is not true, most modern index

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Jason Rutherglen
> We should maybe try to fix this in 3.x too? +1 I suggested it should be backported a while back. Or that Lucene 4.x should be released. I'm not sure what is holding up Lucene 4.x at this point, bulk postings is only useful for PFOR. On Fri, Oct 28, 2011 at 3:27 PM, Simon Willnauer wro

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Simon Willnauer
On Fri, Oct 28, 2011 at 9:17 PM, Simon Willnauer wrote: > Hey Roman, > > On Fri, Oct 28, 2011 at 8:38 PM, Roman Alekseenkov > wrote: >> Hi everyone, >> >> I'm looking for some help with Solr indexing issues on a large scale. >> >> We are indexing a few terabytes/month on a sizeable Solr cluster (8

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Simon Willnauer
Hey Roman, On Fri, Oct 28, 2011 at 8:38 PM, Roman Alekseenkov wrote: > Hi everyone, > > I'm looking for some help with Solr indexing issues on a large scale. > > We are indexing a few terabytes/month on a sizeable Solr cluster (8 > masters / serving writes, 16 slaves / serving reads). After a certain

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Roman Alekseenkov
I'm wondering if this is relevant: https://issues.apache.org/jira/browse/LUCENE-2680 - Improve how IndexWriter flushes deletes against existing segments Roman On Fri, Oct 28, 2011 at 11:38 AM, Roman Alekseenkov wrote: > Hi everyone, > > I'm looking for some help with Solr indexing issues on a la

large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Roman Alekseenkov
Hi everyone, I'm looking for some help with Solr indexing issues on a large scale. We are indexing a few terabytes/month on a sizeable Solr cluster (8 masters / serving writes, 16 slaves / serving reads). After a certain amount of tuning we got to the point where a single Solr instance can handle ind