Re: mergeFactor / indexing speed

2009-08-09 Thread Avlesh Singh
> > And - indexing 160k documents now takes 5min instead of 1.5h! > Awesome! It works for all! (Now I can go relaxed on vacation. :-D ) > Take me along! Cheers Avlesh On Fri, Aug 7, 2009 at 3:58 PM, Chantal Ackermann < chantal.ackerm...@btelligent.de> wrote: > Juhu, great news, guys. I merged m

Re: mergeFactor / indexing speed

2009-08-07 Thread Chantal Ackermann
Thanks for the tip, Shalin. I'm happy with 6 indexes running in parallel and completing in less than 10min, right now, but I'll have a look anyway. Shalin Shekhar Mangar wrote: On Fri, Aug 7, 2009 at 3:58 PM, Chantal Ackermann < chantal.ackerm...@btelligent.de> wrote: Juhu, great news, guys.

Re: mergeFactor / indexing speed

2009-08-07 Thread Shalin Shekhar Mangar
On Fri, Aug 7, 2009 at 3:58 PM, Chantal Ackermann < chantal.ackerm...@btelligent.de> wrote: > Juhu, great news, guys. I merged my child entity into the root entity, and > changed the custom entityprocessor to handle the additional columns > correctly. > And - indexing 160k documents now takes 5min

Re: mergeFactor / indexing speed

2009-08-07 Thread Chantal Ackermann
Juhu, great news, guys. I merged my child entity into the root entity, and changed the custom entityprocessor to handle the additional columns correctly. And - indexing 160k documents now takes 5min instead of 1.5h! (Now I can go relaxed on vacation. :-D ) Conclusion: In my case performance w
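
A rough illustration of why merging the child entity into the root entity can help this much: a nested entity typically costs one extra query per root row, while a merged entity needs a single joined query whose rows are grouped into documents in code. The sketch below uses an in-memory SQLite database; the tables `item` and `tag` and both function names are hypothetical and are not taken from the setup discussed in this thread.

```python
import sqlite3


def build_db():
    # Hypothetical schema: 'item' stands in for the root entity,
    # 'tag' for the child entity.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tag (item_id INTEGER, name TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO item VALUES (?, ?)",
        [(i, f"title {i}") for i in range(1, 6)],
    )
    conn.executemany(
        "INSERT INTO tag VALUES (?, ?)",
        [(i, f"tag {i}-{j}") for i in range(1, 6) for j in range(2)],
    )
    return conn


def nested_fetch(conn):
    """Root query plus one child query per root row (N + 1 queries)."""
    queries = 1
    docs = {}
    roots = conn.execute("SELECT id, title FROM item ORDER BY id").fetchall()
    for item_id, title in roots:
        rows = conn.execute(
            "SELECT name FROM tag WHERE item_id = ? ORDER BY name", (item_id,)
        ).fetchall()
        queries += 1
        docs[item_id] = {"title": title, "tags": [r[0] for r in rows]}
    return docs, queries


def merged_fetch(conn):
    """Single joined query; consecutive rows are grouped into documents."""
    sql = """
        SELECT i.id, i.title, t.name
        FROM item i LEFT JOIN tag t ON t.item_id = i.id
        ORDER BY i.id, t.name
    """
    docs = {}
    for item_id, title, tag in conn.execute(sql):
        doc = docs.setdefault(item_id, {"title": title, "tags": []})
        if tag is not None:
            doc["tags"].append(tag)
    return docs, 1


conn = build_db()
nested_docs, nested_queries = nested_fetch(conn)
merged_docs, merged_queries = merged_fetch(conn)
print(nested_queries, merged_queries)
```

Both functions build the same documents; only the number of round trips to the database differs, and that difference grows with the number of root rows.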

Re: mergeFactor / indexing speed

2009-08-06 Thread Avlesh Singh
> > does DIH call commit periodically, or are things done in one big batch? > AFAIK, one big batch. Cheers Avlesh On Thu, Aug 6, 2009 at 11:23 PM, Yonik Seeley wrote: > On Mon, Aug 3, 2009 at 12:32 PM, Chantal > Ackermann wrote: > > avg-cpu: %user %nice %sys %iowait %idle > > 1

Re: mergeFactor / indexing speed

2009-08-06 Thread Yonik Seeley
On Mon, Aug 3, 2009 at 12:32 PM, Chantal Ackermann wrote: > avg-cpu:  %user   %nice    %sys %iowait   %idle >           1.23    0.00    0.03    0.03   98.71 > > Basically, it is doing very little? *scratch* How often is commit being called? (a Lucene commit syncs all of the index files so a cra

Re: mergeFactor / indexing speed

2009-08-06 Thread Avlesh Singh
> > Do you think it's possible to return (in the nested entity) rows > independent of the unique id, and let the processor decide when a document > is complete? > I don't think so. In my case, I had 9 (JDBC) entities for each document. Most of these entities returned a single column and limited nu

Re: mergeFactor / indexing speed

2009-08-06 Thread Chantal Ackermann
Hi all, to keep this thread up to date... ;-) d) jdbc batch size changed to 10. (Was default: 500, then 1000) The problem with my dih setup is that the root entity query returns a huge set (all ids that shall be indexed). A larger fetchsize would be good for that query. The nested entity, ho

Re: mergeFactor / indexing speed

2009-08-03 Thread Chantal Ackermann
Hi Avlesh, hi Otis, hi Grant, hi all, (enumerating to keep track of all the input) a) mergeFactor 1000 too high I'll change that back to 10. I thought it would make Lucene use more RAM before starting IO. b) ramBufferSize: OK, or maybe more. I'll keep that in mind. c) solrconfig.xml - defau
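
To get a feel for point a), here is a deliberately simplified model of level-based segment merging. It is an assumption made for illustration, not Lucene's actual merge policy: each flush adds one segment at level 0, and whenever `merge_factor` segments pile up at one level they are merged into a single segment at the next level. The function name `simulate_merges` is hypothetical.

```python
def simulate_merges(flushes, merge_factor):
    """Return (merges performed, segments left) in a simplified merge model."""
    levels = [0]  # levels[k] = number of segments currently at level k
    merges = 0
    for _ in range(flushes):
        levels[0] += 1  # a flush creates one new level-0 segment
        k = 0
        # Cascade: a full level is merged into one segment on the next level.
        while levels[k] >= merge_factor:
            levels[k] -= merge_factor
            if k + 1 == len(levels):
                levels.append(0)
            levels[k + 1] += 1
            merges += 1
            k += 1
    return merges, sum(levels)


for mf in (10, 100, 1000):
    print(mf, simulate_merges(100, mf))
```

In this model a very high merge factor postpones merging almost entirely and leaves many small segments (and therefore many files) behind, while a low one merges often and keeps the segment count small.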

Re: mergeFactor / indexing speed

2009-08-03 Thread Grant Ingersoll
How big are your documents? I haven't benchmarked DIH, so I am not sure what to expect, but it does seem like something isn't right. Can you fully describe how you are indexing? Have you done any profiling? On Aug 3, 2009, at 12:32 PM, Chantal Ackermann wrote: Hi all, I'm still struggli

Re: mergeFactor / indexing speed

2009-08-03 Thread Otis Gospodnetic
, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: Chantal Ackermann > To: "solr-user@lucene.apache.org" > Sent: Monday, August 3, 2009 12:32:12 PM > Subject: Re: mergeFactor / indexing speed > > Hi all, > > I'm still st

Re: mergeFactor / indexing speed

2009-08-03 Thread Avlesh Singh
> > avg-cpu: %user %nice %sys %iowait %idle > 1.23 0.00 0.03 0.03 98.71 > I agree, real bad statistics, actually. Currently, I've set mergeFactor to 1000 and ramBufferSize to 256MB. > To me the former appears to be too high and latter too low (for your machine configur

Re: mergeFactor / indexing speed

2009-08-03 Thread Chantal Ackermann
Hi all, I'm still struggling with the index performance. I've moved the indexer to a different machine, now, which is faster and less occupied. The new machine is a 64bit 8Gig-RAM RedHat. JDK1.6, Tomcat 6.0.18, running with those settings (and others): -server -Xms1G -Xmx7G Currently, I've set

Re: mergeFactor / indexing speed

2009-07-31 Thread Chantal Ackermann
Hi again! Thanks for the answer, Grant. > It could very well be the case that you aren't seeing any merges with > only 20K docs. Ultimately, if you really want to, you can look in > your data.dir and count the files. If you have indexed a lot and have > an MF of 100 and haven't done an optimiz
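
Counting the files in the data directory, as suggested in the quoted advice, can be scripted. This is a minimal sketch that assumes Lucene's usual naming, where per-segment files start with an underscore and the part before the first dot identifies the segment (for example `_0.fdt`). The function name and the demo file names are made up for illustration; point the function at the real index directory instead.

```python
import tempfile
from collections import Counter
from pathlib import Path


def summarize_index_dir(index_dir):
    """Count the files and the distinct segment prefixes in an index directory."""
    names = [p.name for p in Path(index_dir).iterdir() if p.is_file()]
    # Segment files look like "_0.fdt" or "_a.cfs"; files such as
    # "segments_2" do not start with "_" and are only counted in the total.
    per_segment = Counter(
        name.split(".", 1)[0] for name in names if name.startswith("_")
    )
    return {
        "files": len(names),
        "segments": len(per_segment),
        "per_segment": dict(per_segment),
    }


# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("_0.fdt", "_0.fdx", "_1.cfs", "segments_2", "segments.gen"):
        (Path(tmp) / name).touch()
    summary = summarize_index_dir(tmp)

print(summary)
```

Running it before and after an indexing run with different mergeFactor values shows whether the setting changed the number of segments at all.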

Re: mergeFactor / indexing speed

2009-07-31 Thread Grant Ingersoll
On Jul 31, 2009, at 8:04 AM, Chantal Ackermann wrote: Dear all, I want to find out which settings give the best full index performance for my setup. Therefore, I have been running a small index (less than 20k documents) with a mergeFactor of 10 and 100. In both cases, indexing took about

mergeFactor / indexing speed

2009-07-31 Thread Chantal Ackermann
Dear all, I want to find out which settings give the best full index performance for my setup. Therefore, I have been running a small index (less than 20k documents) with a mergeFactor of 10 and 100. In both cases, indexing took about 11.5 min: mergeFactor: 10 0:11:46.792 mergeFactor: 100 /ad