No hard numbers, but the general guidance is that you should set your hard commit interval to match your expectations for how quickly nodes should come up if they need to be restarted. Specifically, a hard commit ensures that all changes have been committed to disk and are ready for immediate access on restart, but any soft-commit changes since the last hard commit must be "replayed" (re-executed) on restart of a node.

How long does it take to replay the changes in the update log? No firm numbers, but treat it as if all of those uncommitted updates had to be resent and reprocessed by Solr. It's probably faster than that, but you get the picture.

I would suggest thinking in terms of minutes rather than seconds for hard commits: 5, 10, 15, 20, or 30 minutes.

Hard commits may kick off segment merges, so too rapid a rate of segment creation might cause problems, or at least be counterproductive.

So, instead of 15 seconds, try 15 minutes.
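In solrconfig.xml that might look something like the following (just a sketch with round numbers, not a tuned recommendation; keep your 1s soft commit for visibility and adjust the hard commit to your own tolerance for replay time on restart):

    <autoCommit>
      <maxTime>900000</maxTime>            <!-- hard commit every 15 minutes (in ms) -->
      <openSearcher>false</openSearcher>   <!-- don't open a new searcher on hard commit -->
    </autoCommit>

    <autoSoftCommit>
      <maxTime>1000</maxTime>              <!-- short soft commits keep new docs visible -->
    </autoSoftCommit>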

OTOH, if you really need to handle 4,000 updates a second... you are clearly in "uncharted territory" and should expect to do some heavy-duty trial-and-error tuning on your own.

-- Jack Krupansky

-----Original Message----- From: Tim Vaillancourt
Sent: Saturday, July 27, 2013 4:21 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud 4.3.1 - "Failure to open existing log file (non fatal)" errors under high load

Thanks for the reply Erick,

Hard Commit - 15000ms, openSearcher=false
Soft Commit - 1000ms, openSearcher=true
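
That's roughly this in solrconfig.xml (paraphrasing, not a copy-paste of my config):

    <autoCommit>
      <maxTime>15000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

    <autoSoftCommit>
      <maxTime>1000</maxTime>   <!-- soft commits always open a new searcher -->
    </autoSoftCommit>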

The 15sec hard commit was sort of a guess; I could try a smaller number.
When you say "getting too large", what limit do you think it would be
hitting: a ulimit (nofiles), disk space, number of changes, or a limit
in Solr itself?

By my math there would be 15 tlogs max per core, but I don't really know
how it all works; I'd appreciate it if someone could fill me in or point
me somewhere.

Cheers,

Tim

On 27/07/13 07:57 AM, Erick Erickson wrote:
What is your autocommit limit? Is it possible that your transaction
logs are simply getting too large? tlogs are truncated whenever you do
a hard commit (autocommit), with openSearcher either true or false; it
doesn't matter.
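For reference, the tlog is configured by the updateLog stanza in
solrconfig.xml, right next to the autocommit settings; something like the
stock example below (values are just illustrative):

    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>   <!-- where the tlog files live -->
    </updateLog>

    <autoCommit>
      <maxTime>15000</maxTime>                  <!-- hard commit; truncates the tlog -->
      <openSearcher>false</openSearcher>
    </autoCommit>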

FWIW,
Erick

On Fri, Jul 26, 2013 at 12:56 AM, Tim Vaillancourt<t...@elementspace.com> wrote:
Thanks Shawn and Yonik!

Yonik: I noticed this error appears to be fairly trivial, but it is not
appearing after a previous crash. Every time I run this high-volume test
that produced my stack trace, I zero out the logs, Solr data, and ZooKeeper
data, and start over from scratch with a brand-new collection.

The test is mostly high volume (2000-4000 updates/sec), and at the start the SolrCloud runs decently for a good 20-60 minutes with no errors in the logs at all. Then that stack trace occurs on all 3 nodes (staggered), I immediately get some replica-down messages, and then "cannot connect" errors to all the other cluster nodes, which have all crashed the same way. The tlog error could
be a symptom of the problem of running out of threads, perhaps.

Shawn: thanks so much for sharing those details! Yes, they seem to be nice
servers, for sure - I don't get to touch/see them but they're fast! I'll
look into firmware updates for sure and will try again after applying them. These Solr instances are not bare metal; they're actually KVM VMs, so that's another
layer to look into, although it is consistent between the two clusters.

I am not currently raising the 'nofiles' ulimit above the default like you are, but does Solr use 10,000+ file handles? It won't hurt to try it, I guess
:). To rule out Java 7, I'll probably also try Jetty 8 and Java 1.6 as an
experiment.

Thanks!

Tim


On 25/07/13 05:55 PM, Yonik Seeley wrote:
On Thu, Jul 25, 2013 at 7:44 PM, Tim Vaillancourt<t...@elementspace.com>
wrote:
"ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException]
Failure to open existing log file (non fatal)

That itself isn't necessarily a problem (and why it says "non fatal")
- it just means that most likely a transaction log file was
truncated by a previous crash.  It may be unrelated to the other
issues you are seeing.

-Yonik
http://lucidworks.com
