Hi all,

I wanted to share the issues we're having with Solr 1.4 and get some ideas
for short-term fixes that will buy us enough time to validate Solr 4 before
upgrading, without 1.4 burning to the ground before we get there.

We've been running Solr 1.4 in production for over 3 years now, but we're
starting to hit performance bottlenecks that are affecting our users. Here
are the details of our setup:

We're running two 4-CPU Solr servers. The data is on a 4-disk RAID 10 array,
and we're using block-level replication via DRBD over GigE to write to the
standby node. Only one server serves traffic at a time.

Some tuning information:
- Merge Factor: 25
- Auto Commit: 60s / 1000 docs
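
For reference, settings like these would normally sit in solrconfig.xml
roughly as sketched below. The surrounding elements follow the stock Solr 1.4
example config and are an assumption, not a copy of our actual file; only the
three values come from the list above. Note that maxTime is in milliseconds.

```xml
<!-- Sketch only: layout assumed from the stock Solr 1.4 example config -->
<mainIndex>
  <!-- Merge Factor: 25 -->
  <mergeFactor>25</mergeFactor>
</mainIndex>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Auto Commit: whichever limit is reached first triggers a commit -->
  <autoCommit>
    <maxDocs>1000</maxDocs>
    <maxTime>60000</maxTime> <!-- 60s, expressed in milliseconds -->
  </autoCommit>
</updateHandler>
```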

What we're seeing:
In roughly 14-hour cycles, CPU usage climbs from 100% to between 200% and
250%. At the end of each cycle, we get one long commit of roughly 500
seconds, which blocks all writes. Around the same time, queries become very
slow, often causing timeouts for connecting clients. This behavior is
cyclical and is getting progressively worse.

What is this, and what can we do about it?

I've attached relevant graphs. Apologies in advance for the obscenely large
image sizes.

Cheers,
Stephen

 
client-requests-2.png <https://docs.google.com/file/d/0B7_6ZI9PZjjUN1lhd1hfSE9Jc2M/edit?usp=drive_web>
cpu-usage.png <https://docs.google.com/file/d/0B7_6ZI9PZjjUSHpsY1B2T01iVGM/edit?usp=drive_web>
disk-ios-2.png <https://docs.google.com/file/d/0B7_6ZI9PZjjUNEpkMGRkR3dhYVk/edit?usp=drive_web>
mem-usage-2.png <https://docs.google.com/file/d/0B7_6ZI9PZjjUWnFVZlU3aUxYNXc/edit?usp=drive_web>
tcp-connections-2.png <https://docs.google.com/file/d/0B7_6ZI9PZjjUYmdvMmpDSlVvQUE/edit?usp=drive_web>
