On 5-Apr-08, at 7:09 AM, Britske wrote:
Indexing of these documents takes a long time. Because of the size of
the documents (due to the indexed fields), I am currently batching 50
documents at once, which takes about 2 seconds. Without adding the
10000 indexed fields to the document, indexing flies at about 15 ms
for these 50 documents. Indexing is done using SolrJ.
This is on an Intel Core 2 6400 @ 2.13 GHz with 2 GB of RAM.
To speed this up I let 2 threads do the indexing in parallel. What
happens is that Solr just takes double the time (about 4 seconds) to
complete these two jobs of 50 docs each in parallel. I figured that,
because of the multi-core setup, indexing should improve, which it
doesn't.
Multiple processors really only help indexing speeds when there is
heavy analysis.
Does this perhaps indicate that the setup is IO-bound? What would be
your best guess (given that the schema has a large number of indexed
fields) as to what to try next to improve indexing performance?
Use Lucene 2.3 with Solr 1.2, or simply try out Solr trunk. The
indexing has been reworked to be considerably faster (it also makes
better use of multiple processors by spawning a background merging
thread).
-Mike
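
For anyone wanting to repeat the one-thread versus two-thread
comparison described above, here is a minimal timing sketch. It is
not code from this thread: the class name BatchIndexTimer and the
"sender" callback are hypothetical, and the sender stands in for
whatever SolrJ call you use to add a batch of documents (wrapped so
that checked exceptions are rethrown as unchecked ones).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper, not part of Solr or SolrJ.
final class BatchIndexTimer {

    /** Splits docs into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            int end = Math.min(i + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(i, end)));
        }
        return batches;
    }

    /**
     * Hands every batch to sender using a fixed pool of worker threads
     * and returns the elapsed wall-clock time in milliseconds.
     * sender is an assumption: plug in your own "add these docs" call.
     */
    static <T> long timeIndexing(List<List<T>> batches, int threads,
                                 Consumer<List<T>> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> batch : batches) {
                pending.add(pool.submit(() -> sender.accept(batch)));
            }
            // Wait for every batch so the timing covers all of the work.
            for (Future<?> f : pending) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Calling timeIndexing with the same batches of 50 and threads set to 1
and then 2 gives the two numbers to compare. If the two-thread run
takes about as long as the one-thread run, the extra thread is not
buying anything on that setup.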