Hi,

It could be that you are simply seeing the effect of index segment merges that 
take longer as segments get bigger.  Or it could be that the JVM doesn't have 
enough memory and is running GC too often.  Do you see high CPU load or lots of 
disk IO or something else?
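
To check the GC theory, here is a minimal sketch (not something from your 
setup) that reports how much of a JVM's uptime has gone to garbage collection, 
using the standard java.lang.management beans.  The class name GcOverhead and 
its helper methods are made up for illustration, and it measures the JVM it 
runs in, so I am assuming you can run the same calculation inside the JVM that 
hosts Solr or read the same beans over JMX.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports the share of JVM uptime spent in GC.
class GcOverhead {

    // Fraction of elapsed time spent in GC; 0 when no time has elapsed.
    static double gcFraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / uptimeMillis;
    }

    // Accumulated collection time summed over all collectors in this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        System.out.printf("GC time: %d ms of %d ms uptime (%.1f%%)%n",
                gc, uptime, 100.0 * gcFraction(gc, uptime));
    }
}
```

If that percentage keeps climbing as the import progresses, the heap is 
probably too small.  If it stays low while disk IO is high, segment merges are 
the more likely cause.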

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Gustavo A. Lopes <galo...@mediacapital.pt>
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Sent: Saturday, April 18, 2009 10:00:18 PM
> Subject: Slow indexing with data import handler
> 
> I'm indexing around 1 million documents of one type, each of which requires 4 
> additional queries, plus 0.5 million documents that only require 1 query for 
> all of them.
> 
> I'm using the data import handler from contrib with SolrWriter modified with 
> allowDups = true (doesn't seem to have made any difference).
> 
> This doesn't seem to be that many documents; however, after 21 hours, I have 
> only ~700 k documents of the first type indexed. The size of the index is 
> currently 2.1 GB.
> 
> I'm noticing that the initial import rate is relatively high, such that all 
> the documents of the first type would be indexed in less than 6 hours if it 
> were maintained. As the number of documents already imported rises, the 
> import rate falls significantly.
> 
> Does anyone have any suggestions on how to speed up full imports? What is the 
> bottleneck? I will probably have to make some changes to the schema over the 
> next few days that will require new imports.
> 
> thanks
