Shawn,

Thank you!
Just some quick responses:

On your overflow theory, why would this impact the client? Is it possible that a write attempt to Solr would block indefinitely while the Solr server is running wild or in a bad state due to the overflow?

We attempt to set the BinaryRequestWriter, but per this bug, v3.5 uses the default XML writer: https://issues.apache.org/jira/browse/SOLR-1565

On upgrading to 3.6.2 or 4.x, we have an organizational challenge that requires approval of the software/upgrade. I am promoting/supporting this idea but cannot execute it in the short term.

For the mass publish, we originally used the CommonsHttpSolrServer (what we use in live production updates), but we found the performance trade-off was quite large.

I really like your idea about KISS on threading. Since I'm already introducing complexity with all the multi-threading, why stress the older 3.x software? We may need to trade off time for this.

My first tactics will be to adjust the maxFieldLength and toggle the configuration to use CommonsHttpSolrServer. I will follow up with any discoveries.

Thanks again,
Justin

On Wed, May 22, 2013 at 11:46 AM, Shawn Heisey <s...@elyograg.org> wrote:

> On 5/22/2013 9:08 AM, Justin Babuscio wrote:
>
>> We periodically rebuild our Solr index from scratch. We have built a
>> custom publisher that horizontally scales to increase write throughput.
>> On a given rebuild, we will have ~60 JVMs running with 5 threads that
>> are actively publishing to all Solr masters.
>>
>> For each thread, we instantiate one
>> StreamingUpdateSolrServer( QueueSize:100, QueueThreadSize: 2 )
>> for each master = 20 servers/thread.
>
> Looking over all your details, you might want to try first reducing the
> maxFieldLength to slightly below Integer.MAX_VALUE. Try setting it to 2
> billion, or even something more modest, in the millions. It's
> theoretically possible that the other value might be leading to an
> overflow somewhere. I've been looking for evidence of this, nothing's
> turned up yet.
>
> There MIGHT be bugs in the Apache Commons libraries that SolrJ uses. The
> next thing I would try is upgrading those component jars in your
> application's classpath - httpclient, commons-io, commons-codec, etc.
>
> Upgrading to a newer SolrJ version is also a good idea. Your notes imply
> that you are using the default XML request writer in SolrJ. If that's
> true, you should be able to use a 4.3 SolrJ even with an older Solr
> version, which would give you a server object that's based on
> HttpComponents 4.x, where your current objects are based on HttpClient
> 3.x. You would need to make adjustments in your source code. If you're
> not using the default XML request writer, you can get a similar change
> by using SolrJ 3.6.2.
>
> IMHO you should switch to HttpSolrServer (CommonsHttpSolrServer in SolrJ
> 3.5 and earlier). StreamingUpdateSolrServer (and its replacement in 3.6
> and later, named ConcurrentUpdateSolrServer) has one glaring problem - it
> never informs the calling application about any errors that it encounters
> during indexing. It lies to you, and tells you that everything has
> succeeded even when it doesn't.
>
> The one advantage that SUSS/CUSS has over its Http sibling is that it is
> multi-threaded, so it can send updates concurrently. You seem to know
> enough about how it works, so I'll just say that you don't need
> additional complexity that is not under your control and refuses to throw
> exceptions when an error occurs. You already have a large-scale
> concurrent and multi-threaded indexing setup, so SolrJ's additional
> thread handling doesn't really buy you much.
>
> Thanks,
> Shawn

--
Justin Babuscio
571-210-0035
http://linchpinsoftware.com
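
A side note on the overflow theory quoted above: the sketch below is not Solr or Lucene code, and nothing in this thread confirms that such a code path exists. It only illustrates, under the assumption that some code adds a value to a limit configured at Integer.MAX_VALUE, how Java int arithmetic wraps silently to a negative number, and why a value such as 2 billion leaves some headroom. The class and method names (OverflowSketch, limitPlusMargin, limitPlusMarginChecked) are hypothetical.

```java
public class OverflowSketch {

    // Hypothetical helper: adds a margin to a configured limit with plain
    // int arithmetic. Java does not raise an error here; the result wraps.
    static int limitPlusMargin(int limit, int margin) {
        return limit + margin;
    }

    // Same computation, but Math.addExact (Java 8+) throws
    // ArithmeticException instead of wrapping.
    static int limitPlusMarginChecked(int limit, int margin) {
        return Math.addExact(limit, margin);
    }

    public static void main(String[] args) {
        // A limit set to Integer.MAX_VALUE wraps to a negative value
        // as soon as anything is added to it.
        System.out.println(limitPlusMargin(Integer.MAX_VALUE, 1));

        // A limit of 2 billion still has room for a small margin.
        System.out.println(limitPlusMargin(2_000_000_000, 1));

        // The checked variant surfaces the overflow as an exception.
        try {
            limitPlusMarginChecked(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

If a wrapped, negative limit were later used in a comparison or a loop bound, the behavior would depend entirely on the surrounding code, which is why this remains a theory until evidence turns up.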