Mike,

Days. I plan on making a 4.7.1 release candidate a week from today, and
assuming nobody finds any problems with the RC, it will be released roughly
four days thereafter (three days for voting + one day for release
propagation to the Apache mirrors): i.e., next Friday-ish.

Steve
On Mar 17, 2014, at 4:40 PM, Mike Hugo <m...@piragua.com> wrote:

> Thanks Steve,
>
> That certainly looks like it could be the culprit. Any word on a release
> date for 4.7.1? Days? Weeks? Months?
>
> Mike
>
>
> On Mon, Mar 17, 2014 at 3:31 PM, Steve Rowe <sar...@gmail.com> wrote:
>
>> Hi Mike,
>>
>> The OOM you're seeing is likely a result of the bug described in (and
>> fixed by a commit under) SOLR-5875:
>> <https://issues.apache.org/jira/browse/SOLR-5875>.
>>
>> If you can build from source, it would be great if you could confirm
>> the fix addresses the issue you're facing.
>>
>> This fix will be part of a to-be-released Solr 4.7.1.
>>
>> Steve
>>
>> On Mar 17, 2014, at 4:14 PM, Mike Hugo <m...@piragua.com> wrote:
>>
>>> Hello,
>>>
>>> We recently upgraded to Solr Cloud 4.7 (went from a single-node Solr
>>> 4.0 instance to a 3-node Solr 4.7 cluster).
>>>
>>> Part of our application does an automated traversal of all documents
>>> that match a specific query. It does this by iterating through results
>>> using the start and rows parameters: start=0 and rows=1000, then
>>> start=1000 and rows=1000, then start=2000 and rows=1000, etc.
>>>
>>> We do this in parallel with multiple workers on multiple nodes. It's
>>> easy to chunk up the work by figuring out how many total results there
>>> are, creating 'chunks' (0-1000, 1000-2000, 2000-3000, ...), and
>>> sending each chunk to a worker in a pool of multi-threaded workers.
>>>
>>> This worked well for us with a single server. However, upon upgrading
>>> to Solr Cloud, we've found that this quickly (within the first 4 or 5
>>> requests) causes an OutOfMemory error on the coordinating node that
>>> receives the query. I don't fully understand what's going on here, but
>>> it looks like the coordinating node receives the query and sends it to
>>> the shard requested. For example, given:
>>>
>>> shards=shard3&sort=id+asc&start=4000&q=*:*&rows=1000
>>>
>>> the coordinating node sends this query to shard3:
>>>
>>> NOW=1395086719189&shard.url=http://shard3_url_goes_here:8080/solr/collection1/&fl=id&sort=id+asc&start=0&q=*:*&distrib=false&wt=javabin&isShard=true&fsv=true&version=2&rows=5000
>>>
>>> Notice the rows parameter is 5000 (start + rows). If the coordinating
>>> node is able to process the result set (which works for the first few
>>> pages; after that it quickly runs out of memory), it eventually issues
>>> this request back to shard3:
>>>
>>> NOW=1395086719189&shard.url=http://10.128.215.226:8080/extera-search/gemindex/&start=4000&ids=a..bunch...(1000)..of..doc..ids..go..here&q=*:*&distrib=false&wt=javabin&isShard=true&version=2&rows=1000
>>>
>>> and then finally returns the response to the client.
>>>
>>> One possible workaround: we've found that if we issue non-distributed
>>> requests to specific shards, we get performance along the same lines
>>> as before, e.g. issuing a query with shards=shard3&distrib=false
>>> directly to the URL of the shard3 instance, rather than going through
>>> the CloudSolrServer SolrJ API.
>>>
>>> The other workaround is to adapt to use the new cursorMark
>>> functionality. I've manually tried a few requests and it is pretty
>>> efficient, and doesn't result in OOM errors on the coordinating node.
>>> However, I've only done this in a single-threaded manner.
>>> I'm wondering if there would be a way to get cursor marks for an
>>> entire result set at a given page interval, so that they could then be
>>> fed to the pool of parallel workers to fetch the results in parallel
>>> rather than single-threaded. Is there a way to do this so we could
>>> process the results in parallel?
>>>
>>> Any other possible solutions? Thanks in advance.
>>>
>>> Mike
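For reference, Mike's first workaround (sending chunked start/rows requests
straight to one shard with distrib=false) maps onto SolrJ roughly as
follows. This is a minimal sketch rather than code from the thread,
assuming SolrJ 4.7; the shard URL and the id sort field come from Mike's
examples, and the chunk bounds are illustrative. Because the request never
fans out through a coordinator, no node has to buffer start+rows documents:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class DirectShardChunk {
        public static void main(String[] args) throws Exception {
            // Point an HttpSolrServer directly at the shard's core,
            // bypassing the CloudSolrServer coordinator entirely.
            HttpSolrServer shard3 = new HttpSolrServer(
                "http://shard3_url_goes_here:8080/solr/collection1");

            SolrQuery q = new SolrQuery("*:*");
            q.setSort(SolrQuery.SortClause.asc("id"));
            q.set("distrib", "false"); // serve from this shard only
            q.setStart(4000);          // one worker's chunk: docs 4000-4999
            q.setRows(1000);

            QueryResponse rsp = shard3.query(q);
            for (SolrDocument doc : rsp.getResults()) {
                // process doc
            }
            shard3.shutdown();
        }
    }

In the multi-worker setup from the thread, each worker would run this with
its own start offset (0, 1000, 2000, ...) against its assigned shard.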
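The cursorMark approach (new in Solr 4.7) looks roughly like this in SolrJ,
again as a sketch under assumptions: the ZooKeeper address zkhost:2181 is
hypothetical, the collection name comes from the thread, and cursors
require a deterministic sort ending on the uniqueKey field. Note that each
page's mark comes from the previous response, so the loop is inherently
sequential; that is exactly the property that makes the parallel chunking
Mike asks about non-obvious:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorTraversal {
        public static void main(String[] args) throws Exception {
            // hypothetical ZooKeeper host
            CloudSolrServer solr = new CloudSolrServer("zkhost:2181");
            solr.setDefaultCollection("collection1");

            SolrQuery q = new SolrQuery("*:*");
            q.setRows(1000);
            // cursors need a total ordering: sort must end on the uniqueKey
            q.setSort(SolrQuery.SortClause.asc("id"));

            String cursorMark = CursorMarkParams.CURSOR_MARK_START; // "*"
            while (true) {
                q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
                QueryResponse rsp = solr.query(q);
                for (SolrDocument doc : rsp.getResults()) {
                    // process doc
                }
                String next = rsp.getNextCursorMark();
                if (cursorMark.equals(next)) {
                    break; // mark did not advance: result set exhausted
                }
                cursorMark = next; // each page stays cheap regardless of depth
            }
            solr.shutdown();
        }
    }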