Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Mike Hugo
Greg and I are talking about the same type of parallel. We do the same thing - if I know there are 10,000 results, we can chunk that up across multiple worker threads up front without having to page through the results. We know there are 10 chunks of 1,000, so we can have one thread process 0-100
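A minimal sketch of the up-front chunking Mike describes, assuming the total hit count is already known. `fetch_page` is a hypothetical callable standing in for whatever issues the Solr request with the given `start` and `rows`; it is not part of any Solr client API.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(total, chunk_size):
    """Return (start, rows) pairs covering `total` results in fixed-size chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(chunk_size, total - start))
            for start in range(0, total, chunk_size)]


def harvest_parallel(fetch_page, total, chunk_size, workers=4):
    """Call fetch_page(start, rows) for every chunk on a thread pool.

    fetch_page is a hypothetical helper supplied by the caller; it must
    return the list of documents for that window. Pages are concatenated
    in chunk order, regardless of which thread finishes first.
    """
    ranges = chunk_ranges(total, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = pool.map(lambda r: fetch_page(*r), ranges)
        return [doc for page in pages for doc in page]
```

With 10,000 results and a chunk size of 1,000, `chunk_ranges` yields 10 windows, which matches the "10 chunks of 1,000" in the message above. Each worker still pages by offset, so the later windows are the deep ones.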

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Greg Pendlebury
Sorry, I meant one thread requesting records 1 - 1000, whilst the next thread requests 1001 - 2000 from the same ordered result set. We've observed several of our customers trying to harvest our data with multi-threaded scripts that work like this. I thought it would not work using cursor marks...

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Yonik Seeley
On Mon, Mar 17, 2014 at 7:14 PM, Greg Pendlebury wrote:
> My suspicion is that it won't work in parallel
Deep paging with cursorMark does work with distributed search (assuming that's what you meant by "parallel"... querying sub-shards in parallel?).
-Yonik http://heliosearch.org - solve Solr GC
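For reference, a rough sketch of a cursorMark traversal. It relies on the documented behaviour that the first request sends `cursorMark=*`, the sort includes the uniqueKey field, and the walk ends when the returned `nextCursorMark` equals the one that was sent. The `search` callable and the `id` uniqueKey are assumptions for illustration; `search` stands in for whatever sends the request parameters to Solr and returns the parsed JSON response.

```python
def iterate_with_cursor(search, query, rows=1000, sort="id asc"):
    """Yield every matching document by following Solr cursor marks.

    `search` is a hypothetical callable: it takes a dict of request
    parameters and returns the parsed JSON response as a dict.
    `sort` must include the collection's uniqueKey field (assumed
    here to be `id`).
    """
    cursor = "*"  # the documented starting value for a new cursor
    while True:
        response = search({
            "q": query,
            "rows": rows,
            "sort": sort,
            "cursorMark": cursor,
        })
        for doc in response["response"]["docs"]:
            yield doc
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            # Solr returned the same mark: nothing left to fetch.
            break
        cursor = next_cursor
```

Because every request needs the mark returned by the previous one, a single cursor is sequential by construction, which seems to be the concern Greg raises about multi-threaded harvesting.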

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Greg Pendlebury
My suspicion is that it won't work in parallel, but we've only just asked the ops team to start our upgrade to look into it, so I don't have a server yet to test. The bug identified in SOLR-5875 has put them off though :( If things pan out as I think they will I suspect we are going to end up with

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Mike Hugo
Cursor mark definitely seems like the way to go. If I can get it to work in parallel then that's an additional bonus On Mon, Mar 17, 2014 at 5:41 PM, Greg Pendlebury wrote: > Shouldn't all deep pagination against a cluster use the new cursor mark > feature instead of 'start' and 'rows'? > > 4 or 5

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Greg Pendlebury
Shouldn't all deep pagination against a cluster use the new cursor mark feature instead of 'start' and 'rows'? 4 or 5 requests still seems a very low limit to be running into an OOM issue though, so perhaps it is both issues combined? Ta, Greg On 18 March 2014 07:49, Mike Hugo wrote: > Than

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Mike Hugo
Thanks! On Mon, Mar 17, 2014 at 3:47 PM, Steve Rowe wrote: > Mike, > > Days. I plan on making a 4.7.1 release candidate a week from today, and > assuming nobody finds any problems with the RC, it will be released roughly > four days thereafter (three days for voting + one day for release > pro

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Steve Rowe
Mike, Days. I plan on making a 4.7.1 release candidate a week from today, and assuming nobody finds any problems with the RC, it will be released roughly four days thereafter (three days for voting + one day for release propagation to the Apache mirrors): i.e., next Friday-ish. Steve On Mar

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Mike Hugo
Thanks Steve, That certainly looks like it could be the culprit. Any word on a release date for 4.7.1? Days? Weeks? Months? Mike On Mon, Mar 17, 2014 at 3:31 PM, Steve Rowe wrote: > Hi Mike, > > The OOM you're seeing is likely a result of the bug described in (and > fixed by a commit unde

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Steve Rowe
Hi Mike, The OOM you’re seeing is likely a result of the bug described in (and fixed by a commit under) SOLR-5875: . If you can build from source, it would be great if you could confirm the fix addresses the issue you’re facing. This fix will be

Re: Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Mike Hugo
I should add each node has 16 GB of RAM, 8 GB of which is allocated to the JVM. Each node has about 200k docs and happily uses only about 3 or 4 GB of RAM during normal operation. It's only during this deep pagination that we have seen OOM errors. On Mon, Mar 17, 2014 at 3:14 PM, Mike Hugo wrote:

Deep paging in parallel with solr cloud - OutOfMemory

2014-03-17 Thread Mike Hugo
Hello, We recently upgraded to Solr Cloud 4.7 (went from a single-node Solr 4.0 instance to a 3-node Solr 4.7 cluster). Part of our application does an automated traversal of all documents that match a specific query. It does this by iterating through results by setting the start and rows paramete
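The traversal described in this first message can be pictured roughly as below. This is only a sketch of offset-based paging in general, not Mike's actual code; `search` is again a hypothetical callable that takes request parameters and returns the parsed JSON response.

```python
def iterate_with_offsets(search, query, rows=1000):
    """Yield every matching document by advancing `start` page by page.

    `search` is a hypothetical callable: it takes a dict of request
    parameters and returns the parsed JSON response as a dict.
    """
    start = 0
    while True:
        response = search({"q": query, "start": start, "rows": rows})
        docs = response["response"]["docs"]
        if not docs:
            break  # empty page: nothing more to read
        yield from docs
        start += len(docs)
        if start >= response["response"]["numFound"]:
            break  # every reported hit has been consumed
```

The `start` value grows with every page, which is the deep-paging pattern the rest of the thread discusses.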