Thanks, Lance. I already commit only at the end. I will take a look at the
DataImportHandler. Thanks again!
-- Bill
--------------------------------------------------
From: "Lance Norskog" <goks...@gmail.com>
Sent: Saturday, October 10, 2009 7:58 PM
To: <solr-user@lucene.apache.org>
Subject: Re: Tips on speeding up indexing needed...
A few things off the bat:
1) do not commit until the end.
2) use the DataImportHandler: it runs inside Solr and reads the
database directly, which cuts out the HTTP transfer and XML translation overhead.
3) examine your schema. Some of the text analyzers are quite slow.
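For the HTTP-posting setup discussed in this thread, tip 1 can be sketched on the client side roughly as follows. This is a minimal Python illustration, not SolrJ and not code from the original posts; the update URL `http://localhost:8983/solr/update` and the helper names (`build_add_xml`, `make_poster`, `index_all`) are hypothetical. Each batch is sent as a Solr XML `<add>` message, and a single `<commit/>` is sent only after the last batch.

```python
import urllib.request
from typing import Callable, Iterable, List
from xml.sax.saxutils import escape

# Hypothetical endpoint; adjust host, port and path for your deployment.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs: Iterable[dict]) -> str:
    """Serialize a batch of documents as one Solr XML <add> message."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # Escape &, < and > in values; also escape quotes in the attribute.
            field_name = escape(str(name), {'"': "&quot;"})
            parts.append('<field name="%s">%s</field>' % (field_name, escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def make_poster(update_url: str = UPDATE_URL) -> Callable[[str], None]:
    """Return a function that POSTs an XML body to the update handler."""
    def post(body: str) -> None:
        req = urllib.request.Request(
            update_url,
            data=body.encode("utf-8"),
            headers={"Content-Type": "text/xml; charset=utf-8"},
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()  # drain the response; urlopen raises on HTTP errors
    return post


def index_all(batches: Iterable[List[dict]], post: Callable[[str], None]) -> None:
    """Send every batch, then commit exactly once at the very end."""
    for batch in batches:
        post(build_add_xml(batch))  # no commit between batches
    post("<commit/>")
```

Usage would be `index_all(batches, make_poster())`. Passing `post` in as a function keeps the commit ordering easy to check without a running server.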
Solr tips:
http://wiki.apache.org/solr/SolrPerformanceFactors
Lucene tips:
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
And, what you don't want to hear: for jobs like this, Solr/Lucene is
disk-bound. The Windows NTFS file system is much slower than the file
systems available on Linux or the Mac, and the throughput numbers you
quoted are for those machines.
Good luck!
Lance Norskog
On Sat, Oct 10, 2009 at 5:57 PM, William Pierce <evalsi...@hotmail.com> wrote:
Oh, and one more thing... For historical reasons our apps run on Microsoft
technologies, so using SolrJ would be next to impossible at present.
Thanks in advance for your help!
-- Bill
--------------------------------------------------
From: "William Pierce" <evalsi...@hotmail.com>
Sent: Saturday, October 10, 2009 5:47 PM
To: <solr-user@lucene.apache.org>
Subject: Tips on speeding up indexing needed...
Folks:
I have a corpus of approximately 6M documents, each about 4 KB in size.
Currently, indexing works like this: I read documents from a database and
issue Solr POST requests in batches (the batches are sized so that Tomcat's
maxPostSize, which is set to 2 MB, is respected). This means each batch
writes approximately 600 documents to Solr. What I am seeing is that I can
push about 2500 docs per minute, or roughly 40 per second.
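The size-capped batching described above can be sketched like this, assuming each document has already been serialized to its `<doc>...</doc>` fragment. `batch_by_size` is a hypothetical helper; the 2 MB limit mirrors the maxPostSize setting mentioned above, and the overhead only accounts for the `<add></add>` wrapper, not HTTP headers.

```python
from typing import Iterable, Iterator, List

MAX_POST_BYTES = 2 * 1024 * 1024    # 2 MB, matching the maxPostSize above
WRAPPER_BYTES = len("<add></add>")  # fixed cost of the <add> envelope


def batch_by_size(serialized_docs: Iterable[str], max_bytes: int = MAX_POST_BYTES,
                  overhead: int = WRAPPER_BYTES) -> Iterator[List[str]]:
    """Yield lists of <doc> fragments whose UTF-8 size plus overhead fits max_bytes."""
    batch: List[str] = []
    size = overhead
    for doc in serialized_docs:
        doc_bytes = len(doc.encode("utf-8"))
        if overhead + doc_bytes > max_bytes:
            raise ValueError("a single document is larger than max_bytes")
        # Flush the current batch if adding this document would overflow it.
        if batch and size + doc_bytes > max_bytes:
            yield batch
            batch, size = [], overhead
        batch.append(doc)
        size += doc_bytes
    if batch:
        yield batch


# Example: each batch becomes one POST body.
# for batch in batch_by_size(docs):
#     body = "<add>" + "".join(batch) + "</add>"
```

Sizing by bytes rather than by a fixed document count keeps each request under the limit even when individual documents are larger than the ~4 KB average.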
I saw in Erik's talk on Friday that speeds of 250 docs/sec to 25000
docs/sec have been achieved. Needless to say, I am sure performance
numbers vary widely and depend on the domain, machine configuration,
etc.
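To put these rates in perspective, here is the back-of-the-envelope arithmetic for the 6M-document corpus, using only the figures from this message: at about 2500 docs/minute the full load takes roughly 40 hours, versus under 7 hours at 250 docs/sec and about 4 minutes at 25000 docs/sec. The function name is illustrative.

```python
def hours_to_index(num_docs: int, docs_per_sec: float) -> float:
    """Wall-clock hours to index num_docs at a constant rate."""
    return num_docs / docs_per_sec / 3600.0


CORPUS = 6_000_000
current_rate = 2500 / 60.0  # 2500 docs/minute is about 41.7 docs/sec

for label, rate in [("current", current_rate), ("250/sec", 250.0), ("25000/sec", 25000.0)]:
    print("%-10s %8.2f hours" % (label, hours_to_index(CORPUS, rate)))
```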
I am running on Windows Server 2003, with 4 GB RAM and a dual-core Xeon.
Any tips on what I can do to speed this up?
Thanks,
Bill