Re: Performance help for heavy indexing workload

Mike Klaas Tue, 12 Feb 2008 12:18:28 -0800

On 11-Feb-08, at 11:38 PM, James Brady wrote:

Hello,
I'm looking for some configuration guidance to help improve performance of my application, which tends to do a lot more indexing than searching.
At present, it needs to index around two documents / sec - a document being the stripped content of a webpage. However, performance was so poor that I've had to disable indexing of the webpage content as an emergency measure. In addition, some search queries take an inordinate length of time - regularly over 60 seconds.
This is running on a medium sized EC2 instance (2 x 2GHz Opterons and 8GB RAM), and there's not too much else going on on the box. In total, there are about 1.5m documents in the index.
I'm using a fairly standard configuration - the things I've tried changing so far have been parameters like maxMergeDocs, mergeFactor and the autoCommit options. I'm only using the StandardRequestHandler, no faceting. I have a scheduled task causing a database commit every 15 seconds.

By "database commit" do you mean "solr commit"? If so, that is far too frequent if you are sorting on big fields.
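
One way to space commits out is to put a small throttle in front of whatever issues them. The following is only a rough sketch: CommitThrottle and commit_fn are hypothetical names, commit_fn stands in for however your application sends the Solr commit, and the interval is a placeholder to tune, not a recommendation.

```python
import time


class CommitThrottle:
    """Run a commit callback at most once per min_interval_s seconds."""

    def __init__(self, min_interval_s, commit_fn, clock=time.monotonic):
        self._min_interval_s = min_interval_s
        self._commit_fn = commit_fn  # hypothetical: whatever sends the commit
        self._clock = clock          # injectable so the logic can be tested
        self._last = None            # time of the last commit, if any

    def maybe_commit(self):
        """Commit if enough time has passed; return True if a commit ran."""
        now = self._clock()
        if self._last is not None and now - self._last < self._min_interval_s:
            return False
        self._commit_fn()
        self._last = now
        return True
```

The scheduled task can keep firing every 15 seconds and call maybe_commit(); only the calls that fall outside the interval reach the index.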

I use Solr to serve queries for ~10m docs on a medium size EC2 instance. This is an optimized configuration where highlighting is broken off into a separate index, and load balanced into two subindices of 5m docs apiece. I do a good deal of faceting but no sorting. The only reason that this is possible is that the index is only updated every few days.

On another box we have a several hundred thousand document index which is updated relatively frequently (autocommit time: 20s). These are merged with the static-er index to create an illusion of real-time index updates.
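
To make the merging step concrete, here is a minimal sketch of combining hits from the two indexes at query time. It is an illustration under assumptions, not the code we run: merge_hits is a hypothetical helper, each hit is reduced to an (id, score) pair, and a document found in both indexes is assumed to be newer in the small, frequently updated one.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (document id, relevance score)


def merge_hits(static_hits: List[Hit], fresh_hits: List[Hit],
               rows: int = 10) -> List[Hit]:
    """Merge ranked hits from the large static index and the small fresh one.

    If a document id appears in both lists, the fresh index wins, on the
    assumption that it holds the newer version of that document.
    """
    merged: Dict[str, float] = dict(static_hits)
    merged.update(dict(fresh_hits))  # fresh copy replaces the stale one
    # Highest score first; ties are broken by id so the order is stable.
    ranked = sorted(merged.items(), key=lambda hit: (-hit[1], hit[0]))
    return ranked[:rows]
```

Note that scores computed against two separate indexes are not strictly comparable, since term statistics differ between them, so a real merge may need to normalize scores or rank on something else.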

When Lucene supports efficient, reopen()able fieldcache updates, this situation might improve, but the above architecture would still probably be better. Note that the second index can be on the same machine.


-Mike
