Performance help for heavy indexing workload

James Brady Mon, 11 Feb 2008 23:39:03 -0800

Hello,

I'm looking for some configuration guidance to help improve the performance of my application, which tends to do a lot more indexing than searching.

At present, it needs to index around two documents / sec - a document being the stripped content of a webpage. However, performance was so poor that I've had to disable indexing of the webpage content as an emergency measure. In addition, some search queries take an inordinate length of time - regularly over 60 seconds.

This is running on a medium sized EC2 instance (2 x 2GHz Opterons and 8GB RAM), and there's not too much else going on on the box. In total, there are about 1.5m documents in the index.

I'm using a fairly standard configuration - the things I've tried changing so far have been parameters like maxMergeDocs, mergeFactor and the autoCommit options. I'm only using the StandardRequestHandler, no faceting. I have a scheduled task causing a database commit every 15 seconds.
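
To put rough numbers on the figures above (about two documents / sec, a commit every 15 seconds, about 1.5m documents in the index), here is a back-of-the-envelope sketch. The function and its names are purely illustrative and are not part of Solr or any other API; it only does arithmetic on the stated figures and assumes a perfectly steady stream of documents.

```python
def commit_workload(docs_per_sec, commit_interval_sec, index_size):
    """Rough workload figures for a steady indexing stream.

    All three arguments are taken from the numbers quoted in this post;
    the function itself is a hypothetical helper, not a Solr API.
    """
    # Documents that accumulate between two scheduled commits
    docs_per_commit = docs_per_sec * commit_interval_sec
    # How often the commit fires over an hour
    commits_per_hour = 3600 / commit_interval_sec
    # Time needed to feed the whole index at the stated rate, in days
    days_to_reindex = index_size / docs_per_sec / 86400
    return {
        "docs_per_commit": docs_per_commit,
        "commits_per_hour": commits_per_hour,
        "days_to_reindex": days_to_reindex,
    }


if __name__ == "__main__":
    for name, value in commit_workload(2, 15, 1_500_000).items():
        print(f"{name}: {value:.2f}")
```

So each commit covers only about 30 new documents, and the commit runs 240 times an hour.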

Obviously, every workload varies, but could anyone comment on whether this sort of hardware should, with proper configuration, be able to manage this sort of workload?

I can't see signs of Solr being IO-bound, CPU-bound or memory-bound, although my scheduled commit operation, or perhaps GC, does spike up the CPU utilisation at intervals.


Any help appreciated!
James
