Hi Erick, Dimitry and Mikhail,
Thank you all for your time. I tried all of the suggestions below and am happy
to report that indexing speeds have improved. There were several confounding
problems, including:
- a bank of (~20) regexes that were poorly optimized and compiled at each
indexing step
-
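As an illustration of the regex problem described above, here is a minimal sketch of compiling patterns once and reusing them for every document. The class name FieldCleaner and both patterns are hypothetical stand-ins; the original bank of ~20 regexes is not shown in this thread.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the real regexes from the indexing code are not known.
final class FieldCleaner {
    // Compiled once when the class loads. Pattern objects are immutable and
    // safe to share across threads, so there is no need to recompile per document.
    private static final Pattern NON_WORD = Pattern.compile("[^\\p{L}\\p{N}\\s]");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private FieldCleaner() {}

    // Replaces punctuation with spaces, then collapses runs of whitespace.
    static String clean(String raw) {
        String noPunct = NON_WORD.matcher(raw).replaceAll(" ");
        return WHITESPACE.matcher(noPunct).replaceAll(" ").trim();
    }
}
```

Calling String.replaceAll(regex, ...) directly compiles the pattern on every call, which is the cost this arrangement avoids.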
How have you determined that it's the solr add? By timing the call on the
SolrJ side or by looking at the machine where Solr is running? This is the
very first thing you have to answer. You can get a rough idea with any
simple profiler (say Activity Monitor on a Mac, Task Manager on a Windows
box).
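For the client-side half of that question, a rough sketch of timing a call from the SolrJ side could look like the following. AddTimer is a hypothetical helper, not part of SolrJ; it only measures wall-clock time around whatever call is passed in.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: accumulates wall-clock time spent in the wrapped calls.
final class AddTimer {
    private long totalNanos;
    private long calls;

    // Runs the call, records how long it took, and returns its result.
    // Checked exceptions are rethrown as RuntimeException to keep the sketch short.
    <T> T time(Callable<T> call) {
        long start = System.nanoTime();
        try {
            return call.call();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            totalNanos += System.nanoTime() - start;
            calls++;
        }
    }

    long calls() {
        return calls;
    }

    // Average duration per call in milliseconds, or 0.0 if nothing was timed.
    double averageMillis() {
        return calls == 0 ? 0.0 : totalNanos / 1e6 / calls;
    }
}
```

Assuming a SolrServer instance named server and a batch named docs (both hypothetical here), the add call would be wrapped as timer.time(() -> server.add(docs)). If the client-side time is large while the Solr machine sits mostly idle, the time is probably going to document preparation or the network rather than to Solr itself.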
Dmitry,
If you start to speak about logging, don't forget to say that JDK logging
performs poorly, but it is the default for 3.x. Logback is
much faster.
Peyman,
1. Shingles have performance implications; that is, they can be costly. Why
are term positions and phrase queries not enough f
One approach we have taken was to decrease the Solr logging level for
the posting session, described here (implemented for 1.4, but should
be easy to port to 3.x):
http://dmitrykan.blogspot.com/2011/01/solr-speed-up-batch-posting.html
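A minimal sketch of the general idea, assuming Solr is logging through java.util.logging and that the code runs in the same JVM as Solr (for example with an embedded server). This is not necessarily how the linked post implements it; QuietLogging is a hypothetical helper, and the logger name passed in is an assumption based on the Solr package prefix.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: raises a logger's threshold while a task runs.
final class QuietLogging {
    private QuietLogging() {}

    // Sets the named logger to WARNING for the duration of the task,
    // then restores whatever level it had before.
    static void runQuietly(String loggerName, Runnable task) {
        Logger logger = Logger.getLogger(loggerName);
        Level previous = logger.getLevel(); // null means "inherit from parent"
        logger.setLevel(Level.WARNING);
        try {
            task.run();
        } finally {
            logger.setLevel(previous); // setLevel(null) restores inheritance
        }
    }
}
```

It would be used as QuietLogging.runQuietly("org.apache.solr", postingTask), where postingTask is whatever Runnable performs the batch post. For a Solr instance running in a separate servlet container, the level has to be changed on the server side instead.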
On 3/11/12, Yandong Yao wrote:
> I have similar issues by usin
I have similar issues using DIH,
and org.apache.solr.update.DirectUpdateHandler2.addDoc(AddUpdateCommand)
consumes most of the time when indexing 10K rows (each row is about 70K):
- DIH nextRow takes about 10 seconds in total
- If the index uses the whitespace tokenizer and lowercase filter, th
Hi
I am trying to index 12MM docs faster than is currently happening in Solr
(using SolrJ). We have identified Solr's add method as the bottleneck (and not
commit, which is tuned OK through mergeFactor and maxRamBufferSize and JVM
RAM).
Adding 1000 docs is taking approximately 25 seconds. We