>
> And - indexing 160k documents now takes 5min instead of 1.5h!
>
Awesome! It works for all!
> (Now I can go relaxed on vacation. :-D )
>
Take me along!
Cheers
Avlesh
On Fri, Aug 7, 2009 at 3:58 PM, Chantal Ackermann <
chantal.ackerm...@btelligent.de> wrote:
Thanks for the tip, Shalin. I'm happy with 6 indexes running in parallel
and completing in less than 10 min right now, but I'll have a look anyway.
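The thread doesn't say how the documents were divided among the 6 parallel indexes. Below is a hypothetical sketch of one simple scheme: partition the unique ids by modulo so that each parallel import gets a disjoint slice of similar size. The helper name and the modulo rule are assumptions. The 160k figure is borrowed from elsewhere in the thread only as a sample size.

```python
def partition_ids(ids, num_shards):
    """Hypothetical helper: assign each unique id to shard id % num_shards."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in ids:
        shards[doc_id % num_shards].append(doc_id)
    return shards


if __name__ == "__main__":
    # 160k sample ids split six ways, one slice per parallel import.
    for shard_no, shard in enumerate(partition_ids(range(160_000), 6)):
        print(f"shard {shard_no}: {len(shard)} ids")
```

Each slice would then become the filter on its own import's root query (for instance `WHERE MOD(id, 6) = 0` on databases that provide `MOD`), so no document is indexed twice.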
Shalin Shekhar Mangar wrote:
On Fri, Aug 7, 2009 at 3:58 PM, Chantal Ackermann <
chantal.ackerm...@btelligent.de> wrote:
Yay, great news, guys. I merged my child entity into the root entity
and changed the custom EntityProcessor to handle the additional columns
correctly.
And - indexing 160k documents now takes 5min instead of 1.5h!
(Now I can go relaxed on vacation. :-D )
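The merge probably helped this much because DIH runs a nested entity's query once for every row of its parent entity. A single child entity therefore adds one SQL round trip per document, which comes to roughly 160k extra queries here. The sketch below contrasts that pattern with the merged approach described above: one joined query ordered by the unique id, with consecutive rows folded into one document. It runs against an in-memory SQLite database. The `item`/`tag` tables and the helper functions are hypothetical, and the fold only stands in for whatever the custom EntityProcessor actually does.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter


def make_db():
    """Hypothetical schema: 3 parent rows, each with 2 child rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE tag (item_id INTEGER, tag TEXT)")
    for i in range(1, 4):
        conn.execute("INSERT INTO item VALUES (?, ?)", (i, f"title {i}"))
        for prefix in ("a", "b"):
            conn.execute("INSERT INTO tag VALUES (?, ?)", (i, f"{prefix}{i}"))
    return conn


def nested_entities(conn):
    """Root query plus one child query per root row (the 1 + N pattern)."""
    queries = 1
    docs = []
    for item_id, title in conn.execute("SELECT id, title FROM item ORDER BY id").fetchall():
        tags = [row[0] for row in conn.execute(
            "SELECT tag FROM tag WHERE item_id = ? ORDER BY tag", (item_id,))]
        queries += 1
        docs.append({"id": item_id, "title": title, "tags": tags})
    return docs, queries


def merged_entity(conn):
    """One joined query ordered by id; consecutive rows with the same id form one document."""
    rows = conn.execute(
        "SELECT i.id, i.title, t.tag FROM item i "
        "LEFT JOIN tag t ON t.item_id = i.id ORDER BY i.id, t.tag")
    docs = []
    for item_id, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        docs.append({"id": item_id, "title": group[0][1],
                     # LEFT JOIN yields a NULL tag for items without children
                     "tags": [r[2] for r in group if r[2] is not None]})
    return docs, 1


if __name__ == "__main__":
    conn = make_db()
    nested_docs, nested_queries = nested_entities(conn)
    merged_docs, merged_queries = merged_entity(conn)
    print(nested_docs == merged_docs, nested_queries, merged_queries)
```

Both functions build the same documents. The nested version issues 1 + N queries and the merged one issues a single query. One caveat: joining several one-to-many children in the same SELECT multiplies rows per document (a cross product), so the folding code then has to de-duplicate values.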
Conclusion:
In my case performance w
>
> does DIH call commit periodically, or are things done in one big batch?
>
AFAIK, one big batch.
Cheers
Avlesh
On Thu, Aug 6, 2009 at 11:23 PM, Yonik Seeley wrote:
On Mon, Aug 3, 2009 at 12:32 PM, Chantal
Ackermann wrote:
> avg-cpu: %user %nice %sys %iowait %idle
> 1.23 0.00 0.03 0.03 98.71
>
> Basically, it is doing very little? *scratch*
How often is commit being called? (a Lucene commit syncs all of the
index files so a cra
>
> Do you think it's possible to return (in the nested entity) rows
> independent of the unique id, and let the processor decide when a document
> is complete?
>
I don't think so.
In my case, I had 9 (JDBC) entities for each document. Most of these
entities returned a single column and limited nu
Hi all,
to keep this thread up to date... ;-)
d) JDBC batch size
changed to 10 (was the default of 500, then 1000).
The problem with my DIH setup is that the root entity query returns a
huge result set (all ids that are to be indexed). A larger fetch size would be
good for that query.
The nested entity, ho
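As far as I know, DIH's JdbcDataSource hands its `batchSize` attribute to the JDBC driver as the statement fetch size. That size is how many rows come back per round trip while a result set is being iterated, and the default of 500 mentioned above fits this reading. The trade-off between one huge root query and many tiny nested ones can be sketched with Python's DB-API `fetchmany`, which plays the same role here. SQLite runs in-process, so nothing actually crosses a network. The sketch only counts batches, and `count_fetches` is a hypothetical helper.

```python
import math
import sqlite3


def count_fetches(conn, sql, fetch_size):
    """Hypothetical helper: read a result set fetch_size rows at a time, count the batches."""
    cur = conn.execute(sql)
    rows = fetches = 0
    while chunk := cur.fetchmany(fetch_size):
        fetches += 1
        rows += len(chunk)
    return rows, fetches


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO doc VALUES (?)", ((i,) for i in range(160_000)))
    for size in (10, 500, 1000):
        rows, fetches = count_fetches(conn, "SELECT id FROM doc", size)
        assert fetches == math.ceil(rows / size)
        print(f"fetch size {size:>4}: {fetches} batches for {rows} rows")
```

For a root query over 160k ids the batch count falls from 16,000 at size 10 to 320 at size 500. A nested query returning a handful of rows per document needs a single batch at any of these sizes, which is why one global setting suits the two kinds of query differently.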
Hi Avlesh,
hi Otis,
hi Grant,
hi all,
(enumerating to keep track of all the input)
a) mergeFactor 1000 too high
I'll change that back to 10. I thought it would make Lucene use more RAM
before starting IO.
b) ramBufferSize:
OK, or maybe more. I'll keep that in mind.
c) solrconfig.xml - defau
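Point a) rests on a common mix-up. In Lucene the RAM buffer (`ramBufferSizeMB` in solrconfig.xml) decides when buffered documents are flushed into a new segment. mergeFactor only decides how many similar-sized segments pile up before they are merged. The sketch below is a deliberately simplified model of a log-style merge policy: every flush produces one equal-sized level-0 segment, and mergeFactor segments on one level merge into one segment on the next level. The real LogMergePolicy sizes levels by bytes or document count and has more settings, and the 2048 MB total is made up.

```python
import math


def simulate(total_mb, ram_buffer_mb, merge_factor):
    """Simplified log-merge model; returns (flushes, merges, segments_left)."""
    flushes = math.ceil(total_mb / ram_buffer_mb)
    levels = []  # levels[k] = number of segments currently on level k
    merges = 0
    for _ in range(flushes):
        level = 0
        # The flushed segment lands on level 0; a full level merges into one segment one level up.
        while True:
            if level == len(levels):
                levels.append(0)
            levels[level] += 1
            if levels[level] < merge_factor:
                break
            levels[level] = 0
            merges += 1
            level += 1
    return flushes, merges, sum(levels)


if __name__ == "__main__":
    # Made-up total of 2048 MB of buffered documents.
    for ram_mb, mf in ((256, 10), (32, 10), (32, 1000)):
        flushes, merges, segments = simulate(2048, ram_mb, mf)
        print(f"ramBufferSizeMB={ram_mb:>3} mergeFactor={mf:>4}: "
              f"{flushes} flushes, {merges} merges, {segments} segments left")
```

In this model a higher mergeFactor never lets Lucene hold more documents in RAM. It only postpones merging and leaves more segments (and more open files) behind. The RAM buffer is the setting that reduces the number of flushes.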
How big are your documents? I haven't benchmarked DIH, so I am not
sure what to expect, but it does seem like something isn't right. Can
you fully describe how you are indexing? Have you done any profiling?
On Aug 3, 2009, at 12:32 PM, Chantal Ackermann wrote:
- Original Message
> From: Chantal Ackermann
> To: "solr-user@lucene.apache.org"
> Sent: Monday, August 3, 2009 12:32:12 PM
> Subject: Re: mergeFactor / indexing speed
>
> Hi all,
>
> I'm still struggling with the index performance.
>
> avg-cpu: %user %nice %sys %iowait %idle
> 1.23 0.00 0.03 0.03 98.71
>
I agree, real bad statistics, actually.
> Currently, I've set mergeFactor to 1000 and ramBufferSize to 256MB.
>
To me the former appears to be too high and the latter too low (for your machine
configur
Hi all,
I'm still struggling with the index performance. I've now moved the indexer
to a different machine, which is faster and less loaded.
The new machine is a 64-bit RedHat box with 8 GB RAM, running JDK 1.6 and
Tomcat 6.0.18 with these settings (among others):
-server -Xms1G -Xmx7G
Currently, I've set
Hi again!
Thanks for the answer, Grant.
> It could very well be the case that you aren't seeing any merges with
> only 20K docs. Ultimately, if you really want to, you can look in
> your data.dir and count the files. If you have indexed a lot and have
> an MF of 100 and haven't done an optimiz
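Grant's suggestion to look in the data dir and count the files can be scripted. Lucene gives all files of one segment a shared name prefix such as `_0` or `_1a` (for example `_0.cfs`, `_0.fdt`, or a deletions file like `_0_1.del`), next to the `segments_N` commit files. Counting distinct prefixes therefore gives a rough count of segments. Old files the index has not deleted yet (for example, ones still held open) can inflate it. The default path below is a placeholder assumption for a `<data.dir>/index` layout, and `count_segments` is a hypothetical helper.

```python
import sys
from pathlib import Path


def count_segments(index_dir):
    """Count distinct Lucene segment names among the files in index_dir.

    Files like _0.cfs, _0.fdt or _0_1.del all belong to segment "0"; segments_N and
    segments.gen are commit metadata and are skipped because they don't start with "_".
    """
    names = set()
    for path in Path(index_dir).iterdir():
        if path.is_file() and path.name.startswith("_"):
            stem = path.name.split(".", 1)[0]     # "_0_1.del" -> "_0_1"
            names.add(stem[1:].split("_", 1)[0])  # "_0_1" -> "0"
    return len(names)


if __name__ == "__main__":
    # Placeholder path; pass your own <data.dir>/index directory as the first argument.
    index_dir = Path(sys.argv[1] if len(sys.argv) > 1 else "solr/data/index")
    if index_dir.is_dir():
        print(f"{count_segments(index_dir)} segments in {index_dir}")
    else:
        print(f"no index directory at {index_dir}")
```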
On Jul 31, 2009, at 8:04 AM, Chantal Ackermann wrote:
Dear all,
I want to find out which settings give the best full index performance
for my setup.
Therefore, I have been indexing a small data set (fewer than 20k documents)
with a mergeFactor of 10 and of 100.
In both cases, indexing took about 11.5 min:
mergeFactor: 10
0:11:46.792
mergeFactor: 100
/ad