Shawn,

Thank you!

Just some quick responses:

On your overflow theory, why would this impact the client?  Is it possible
that a write attempt to Solr would block indefinitely while the Solr server
is running wild or in a bad state due to the overflow?
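
To make the overflow theory concrete, here is a minimal sketch of how a
limit sitting at Integer.MAX_VALUE can defeat a naive bounds check in Java.
This is purely illustrative: OverflowSketch, naiveExceeds and safeExceeds
are hypothetical names, and nothing here is taken from Solr or Lucene
source. Nobody has confirmed that such a check exists in 3.5.

```java
public class OverflowSketch {
    // Hypothetical check: would adding one more token exceed the limit?
    // When count == Integer.MAX_VALUE, count + 1 silently wraps around to
    // Integer.MIN_VALUE, so the comparison below never fires.
    static boolean naiveExceeds(int count, int limit) {
        return count + 1 > limit;
    }

    // Same question asked without any addition, so nothing can wrap.
    static boolean safeExceeds(int count, int limit) {
        return count >= limit;
    }

    public static void main(String[] args) {
        int limit = Integer.MAX_VALUE;
        // Java int arithmetic wraps instead of throwing.
        System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE);
        System.out.println(naiveExceeds(Integer.MAX_VALUE, limit));
        System.out.println(safeExceeds(Integer.MAX_VALUE, limit));
    }
}
```

With a limit of 2 billion or less there is headroom before the wrap, which
is presumably why a slightly smaller value is worth trying first.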


We attempt to set the BinaryRequestWriter, but per this bug:
https://issues.apache.org/jira/browse/SOLR-1565, v3.5 still uses the default
XML writer.


On upgrading to 3.6.2 or 4.x, we have an organizational challenge: any
software upgrade requires approval.  I am promoting and supporting this
idea but cannot execute it in the short term.

For the mass publish, we originally used the CommonsHttpSolrServer (what we
use in live production updates) but we found the performance penalty was
quite large.  I really like your idea about KISS on threading.  Since
I'm already introducing complexity with all the multi-threading, why stress
the older 3.x software?  We may need to trade off time for this.
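
A rough sketch of the KISS approach, assuming the publisher keeps its own
thread pool and each worker makes a plain synchronous call. PublishSketch,
BatchSender and publish are hypothetical names for illustration only; in
the real publisher, send() would presumably wrap a call such as
CommonsHttpSolrServer.add(...). The point is that a failed batch surfaces
as an exception the application can count or retry, rather than being
swallowed by a background queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PublishSketch {
    /** Hypothetical stand-in for a synchronous, exception-throwing client call. */
    public interface BatchSender {
        void send(List<String> batch) throws Exception;
    }

    /** Sends every batch on our own pool and returns how many batches failed. */
    public static int publish(List<List<String>> batches, BatchSender sender, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Void>> results = new ArrayList<>();
        try {
            for (List<String> batch : batches) {
                Callable<Void> task = () -> {
                    sender.send(batch); // any exception is captured by the Future
                    return null;
                };
                results.add(pool.submit(task));
            }
            int failures = 0;
            for (Future<Void> result : results) {
                try {
                    result.get(); // rethrows the worker's exception, wrapped
                } catch (ExecutionException e) {
                    failures++; // a real publisher would log and retry here
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }
}
```

All concurrency stays in code we control, at the cost of each worker
waiting on its own HTTP round trip.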



My first steps will be to adjust the maxFieldLength and toggle the
configuration to use CommonsHttpSolrServer.  I will follow up with any
discoveries.

Thanks again,
Justin





On Wed, May 22, 2013 at 11:46 AM, Shawn Heisey <s...@elyograg.org> wrote:

> On 5/22/2013 9:08 AM, Justin Babuscio wrote:
>
>> We periodically rebuild our Solr index from scratch.  We have built a
>> custom publisher that horizontally scales to increase write throughput.
>> On a given rebuild, we will have ~60 JVMs running with 5 threads that are
>> actively publishing to all Solr masters.
>>
>> For each thread, we instantiate one StreamingUpdateSolrServer(
>> QueueSize:100, QueueThreadSize: 2 ) for each master = 20 servers/thread.
>>
>
> Looking over all your details, you might want to try first reducing the
> maxFieldLength to slightly below Integer.MAX_VALUE.  Try setting it to 2
> billion, or even something more modest, in the millions.  It's
> theoretically possible that the other value might be leading to an overflow
> somewhere.  I've been looking for evidence of this, nothing's turned up yet.
>
> There MIGHT be bugs in the Apache Commons libraries that SolrJ uses. The
> next thing I would try is upgrading those component jars in your
> application's classpath - httpclient, commons-io, commons-codec, etc.
>
> Upgrading to a newer SolrJ version is also a good idea.  Your notes imply
> that you are using the default XML request writer in SolrJ.  If that's
> true, you should be able to use a 4.3 SolrJ even with an older Solr
> version, which would give you a server object that's based on
> HttpComponents 4.x, where your current objects are based on HttpClient 3.x.
> You would need to make adjustments in your source code.  If you're not
> using the default XML request writer, you can get a similar change by using
> SolrJ 3.6.2.
>
> IMHO you should switch to HttpSolrServer (CommonsHttpSolrServer in SolrJ
> 3.5 and earlier).  StreamingUpdateSolrServer (and its replacement in 3.6
> and later, named ConcurrentUpdateSolrServer) has one glaring problem - it
> never informs the calling application about any errors that it encounters
> during indexing.  It lies to you, and tells you that everything has
> succeeded even when it hasn't.
>
> The one advantage that SUSS/CUSS has over its Http sibling is that it is
> multi-threaded, so it can send updates concurrently.  You seem to know
> enough about how it works, so I'll just say that you don't need additional
> complexity that is not under your control and refuses to throw exceptions
> when an error occurs.  You already have a large-scale concurrent and
> multi-threaded indexing setup, so SolrJ's additional thread handling
> doesn't really buy you much.
>
> Thanks,
> Shawn
>
>


-- 
Justin Babuscio
571-210-0035
http://linchpinsoftware.com
