Hello,
I'm looking for some configuration guidance to help improve the
performance of my application, which does far more indexing than
searching.
At present, it needs to index around two documents/sec, where a
document is the stripped content of a webpage. However, performance
was so poor that I've had to disable indexing of the webpage content
as an emergency measure. In addition, some search queries take an
inordinate length of time, regularly over 60 seconds.
This is running on a medium-sized EC2 instance (2 x 2GHz Opterons and
8GB RAM), and not much else is running on the box. In total, there
are about 1.5 million documents in the index.
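For scale, here is a rough calculation of the daily indexing volume
from the figures above. It assumes the ~2 docs/sec rate is sustained
around the clock, which is an assumption; real traffic may be burstier
or quieter overnight.

```python
# Rough daily indexing volume.
# ASSUMPTION: the ~2 docs/sec rate is sustained 24 hours a day.
DOCS_PER_SEC = 2
INDEX_SIZE = 1_500_000          # approximate documents currently indexed
SECONDS_PER_DAY = 24 * 60 * 60

docs_per_day = DOCS_PER_SEC * SECONDS_PER_DAY
growth_pct = 100 * docs_per_day / INDEX_SIZE   # relative to current index

print(f"{docs_per_day} docs/day, about {growth_pct:.1f}% of the current index")
```

At that rate the incoming documents each day amount to roughly a tenth
of the existing index.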
I'm using a fairly standard configuration. The parameters I've tried
changing so far include maxMergeDocs, mergeFactor and the autoCommit
options. I'm only using the StandardRequestHandler, with no faceting.
A scheduled task triggers a database commit every 15 seconds.
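To put the commit schedule in numbers: with a commit every 15 seconds
and indexing at ~2 docs/sec, the sketch below works out how many
commits happen per day and how many documents each one covers. It
assumes a steady indexing rate, which is a simplification.

```python
# Commit load implied by the scheduled task.
# ASSUMPTION: documents arrive at a steady ~2 docs/sec.
DOCS_PER_SEC = 2
COMMIT_INTERVAL_S = 15          # interval of the scheduled commit task
SECONDS_PER_DAY = 24 * 60 * 60

commits_per_day = SECONDS_PER_DAY // COMMIT_INTERVAL_S
docs_per_commit = DOCS_PER_SEC * COMMIT_INTERVAL_S

print(f"{commits_per_day} commits/day, ~{docs_per_commit} docs per commit")
```

So each commit covers only a few dozen documents, while thousands of
commits run each day.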
Obviously every workload varies, but could anyone comment on whether
this sort of hardware should, with proper configuration, be able to
handle this kind of workload?
I can't see any sign of Solr being I/O-bound, CPU-bound or
memory-bound, although my scheduled commit operation, or perhaps GC,
does cause the CPU utilisation to spike at intervals.
Any help appreciated!
James