On 5/2/2013 4:24 AM, Annette Newton wrote:
> Hi Shawn,
> 
> Thanks so much for your response.  We basically are very write intensive
> and write throughput is pretty essential to our product.  Reads are
> sporadic and actually is functioning really well.
> 
> We write on average (at the moment) 8-12 batches of 35 documents per
> minute.  But we really will be looking to write more in the future, so need
> to work out scaling of solr and how to cope with more volume.
> 
> Schema (I have changed the names) :
> 
> http://pastebin.com/x1ry7ieW
> 
> Config:
> 
> http://pastebin.com/pqjTCa7L

This is very clean.  There's probably more you could remove/comment, but
generally speaking I couldn't find any glaring issues.  In particular,
you have disabled autowarming, which is good, because autowarming is a
major contributor to commit speed problems.

The first thing I think I'd try is increasing zkClientTimeout to 30 or
60 seconds.  You can use the startup commandline or solr.xml; I would
probably use the latter.  Here's a solr.xml fragment that uses a system
property or a 15 second default:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores"
         zkClientTimeout="${zkClientTimeout:15000}"
         hostPort="${jetty.port:}"
         hostContext="solr">
    <!-- your existing <core> entries stay here, unchanged -->
  </cores>
</solr>

General thoughts (these changes might not help this particular issue):
You've got autoCommit with openSearcher=true.  This is a hard commit.
If it were me, I would set that up with openSearcher=false and either do
explicit soft commits from my application or set up autoSoftCommit with
a shorter timeframe than autoCommit.
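
A rough sketch of that commit setup, to be merged into the existing
updateHandler section of solrconfig.xml.  The times are placeholders I
made up for illustration, not recommendations for your load:

```xml
<!-- Sketch only: both maxTime values are assumed, tune them yourself -->
<autoCommit>
  <!-- hard commit: flushes to disk, but does not open a new searcher -->
  <maxTime>300000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <!-- soft commit: makes new documents visible, on a shorter interval -->
  <maxTime>15000</maxTime>
</autoSoftCommit>
```

With this arrangement the hard commit only handles durability, and
document visibility is controlled by the soft commit interval.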

This might simply be a scaling issue, where you'll need to spread the
load wider than four shards.  I know that there are financial
considerations with that, and they might not be small, so let's leave
that alone for now.

The memory problems might be a symptom/cause of the scaling issue I just
mentioned.  You said you're using facets, which can be a real memory hog
even with only a few of them.  Have you tried facet.method=enum to see
how it performs?  You'd need to switch to it exclusively and never fall
back to the default of fc.  You could put that in the defaults or
invariants section of your request handler(s).
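
As a sketch, forcing it through invariants might look like this.  The
handler name /select is an assumption; use whatever handler(s) your
queries actually hit, and keep the rest of the handler definition as it
is:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <!-- invariants cannot be overridden by request parameters -->
  <lst name="invariants">
    <str name="facet.method">enum</str>
  </lst>
</requestHandler>
```

Putting it in defaults instead would still let a client send
facet.method=fc on an individual request.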

Another way to reduce memory usage for facets is to use disk-based
docValues on version 4.2 or later for the facet fields, but this will
increase your index size, and your index is already quite large.
Depending on your index contents, the increase may be small or large.
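
For reference, a minimal schema.xml sketch of disk-based docValues.
The type and field names here are made up, and I believe the early 4.x
releases want a single-valued docValues field to be required or to have
a default, so check the documentation for your exact version.  You
would need to reindex after a change like this:

```xml
<!-- Sketch only: "string_dv_disk" and "category" are hypothetical names -->
<!-- Per-field docValuesFormat also needs this in solrconfig.xml:
     <codecFactory class="solr.SchemaCodecFactory"/> -->
<fieldType name="string_dv_disk" class="solr.StrField"
           docValuesFormat="Disk"/>

<field name="category" type="string_dv_disk" indexed="true"
       stored="false" docValues="true" default=""/>
```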

One more thing worth mentioning: It looks like your solrconfig.xml has
hard-coded absolute paths for dataDir and updateLog.  This is fine if
you'll only ever have one core/collection on each server, but it'll be a
disaster if you have multiples.  I could be wrong about how these get
interpreted in SolrCloud -- they might actually be relative despite
starting with a slash.
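
For comparison, the example solrconfig.xml that ships with Solr uses
property substitution with an empty default for both settings, so each
core falls back to a location under its own instance directory unless
you set the property.  A sketch of that form:

```xml
<!-- Empty default: data goes under the core's own instanceDir -->
<dataDir>${solr.data.dir:}</dataDir>

<!-- inside the existing <updateHandler> section -->
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>
```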

Thanks,
Shawn
