Thanks, everybody. This is a lot of good information. We should also add it to the documentation to help users make the right choice. I can take a stab at this if someone can point me to how to update the documentation.
Thanks
SG

On Tue, Mar 13, 2018 at 2:04 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>
> : > 3) Lastly, it is not clear the role of export handler. It seems that the
> : > export handler would also have to do exactly the same kind of thing as
> : > start=0 and rows=1000,000. And that again means bad performance.
>
> : <3> First, streaming requests can only return docValues="true"
> : fields. Second, most streaming operations require sorting on something
> : besides score. Within those constraints, streaming will be _much_
> : faster and more efficient than cursorMark. Without tuning I saw 200K
> : rows/second returned for streaming, the bottleneck will be the speed
> : that the client can read from the network. First of all you only
> : execute one query rather than one query per N rows. Second, in the
> : cursorMark case, to return a document you and assuming that any field
> : you return is docValues=false
>
> Just to clarify, there is a big difference between the /export handler
> and "streaming expressions".
>
> Unless something has changed drastically in the past few releases, the
> /export handler does *NOT* support exporting a full *collection* in
> SolrCloud -- it only operates on an individual core (aka: shard/replica).
>
> Streaming expressions is a feature that does work in Cloud mode, and can
> make calls to the /export handler on a replica of each shard in order to
> process the data of an entire collection -- but when doing so it has to
> aggregate *ALL* the results from every shard in memory on the
> coordinating node -- meaning that (in addition to the docValues caveat)
> streaming expressions requires you to "spend" a lot of RAM on one
> node as a trade-off for spending more time & multiple requests to get the
> same data from cursorMark...
>
> https://lucene.apache.org/solr/guide/exporting-result-sets.html
> https://lucene.apache.org/solr/guide/streaming-expressions.html
>
> An additional perk of cursorMark that may be relevant to the OP is that
> you can "stop" tailing a cursor at any time (i.e., if you're post-processing
> the results client side and decide you have "enough" results) but a similar
> feature isn't available (AFAICT) from streaming expressions...
>
> https://lucene.apache.org/solr/guide/pagination-of-results.html#tailing-a-cursor
>
>
> -Hoss
> http://www.lucidworks.com/
>
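
For the docs, the cursorMark loop with the "stop whenever you have enough" behaviour described above could be illustrated roughly like this. It is only a minimal sketch, not tested against a real cluster: the function names (`fetch_page`, `cursor_docs`), the uniqueKey field `id`, and the URL in the usage comment are my assumptions. The parts taken from the Solr guide are the `cursorMark=*` starting value, the `nextCursorMark` field in the response, the requirement that the sort include the uniqueKey field, and the rule that an unchanged cursor means there are no more results.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(base_url, params):
    """Issue one request to the given /select URL and decode the JSON body."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def cursor_docs(base_url, query, rows=1000, sort="id asc", fetch=fetch_page):
    """Yield documents one by one, following nextCursorMark between pages.

    `sort` must include the uniqueKey field (assumed here to be `id`).
    `fetch` is injectable so the loop can be exercised without a Solr server.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {"q": query, "rows": rows, "sort": sort,
                  "cursorMark": cursor, "wt": "json"}
        data = fetch(base_url, params)
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor: nothing left to read
            return
        cursor = next_cursor


# Usage (hypothetical collection URL); stopping early costs nothing,
# because the next page is only requested when the loop asks for it:
#
#   for i, doc in enumerate(cursor_docs(
#           "http://localhost:8983/solr/mycollection/select", "*:*")):
#       if i >= 5000:
#           break
```

Because this is a generator, the client decides when to stop; pages the loop never reaches are never requested.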