Maybe you can sort afterwards using Spark or similar. You don’t need a full-blown cluster for that; it also runs on localhost.
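To make "sort afterwards" a bit more concrete, here is a minimal sketch in plain Python rather than Spark (a swapped-in stand-in, not a recommendation over Spark). It assumes you have already pulled (id, score) pairs out of Solr in arbitrary order, that ids contain no tab characters, and that the function names are hypothetical. It sorts fixed-size chunks, spills each to a temporary file, and merges them, so the full list never has to sit in memory at once.

```python
import heapq
import itertools
import os
import tempfile


def _spill(sorted_chunk):
    """Write one already-sorted chunk of (doc_id, score) pairs to a temp file."""
    fd, path = tempfile.mkstemp(suffix=".tsv", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        for doc_id, score in sorted_chunk:
            out.write(f"{doc_id}\t{score!r}\n")
    return path


def _read(path):
    """Stream (doc_id, score) pairs back from a spilled chunk."""
    with open(path, encoding="utf-8") as src:
        for line in src:
            doc_id, score = line.rstrip("\n").split("\t")
            yield doc_id, float(score)


def sort_by_score(pairs, chunk_size=100_000):
    """Yield (doc_id, score) by descending score, ties broken by ascending id."""
    key = lambda pair: (-pair[1], pair[0])
    paths = []
    pairs = iter(pairs)
    try:
        while True:
            # Sort one bounded chunk in memory, then spill it to disk.
            chunk = sorted(itertools.islice(pairs, chunk_size), key=key)
            if not chunk:
                break
            paths.append(_spill(chunk))
        # Lazily merge the sorted chunk files into one sorted stream.
        yield from heapq.merge(*(_read(p) for p in paths), key=key)
    finally:
        for p in paths:
            os.remove(p)
```

The ids coming out of `sort_by_score` can then be used to look up the full records in your key-value store, in the order the user saw them.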
> On 03.10.2019 at 09:49, Edward Turner <eddtur...@gmail.com> wrote:
>
> Hi Erick,
>
> Many thanks for your detailed reply. It's really good information for us to
> know, and although not exactly what we wanted to hear (that /export wasn't
> designed to handle ranking), it's much better for us to definitively know
> one way or the other -- and this allows us to move forward. We'll
> experiment by going the cursorMark route. I'm hoping that the bottleneck
> then isn't Solr, but rather the fetching and writing of the full records
> (we use Solr as just a search engine, which gives us IDs of records of
> interest; and we use a separate key-value store to get the actual record
> data). Anyway, we'll see and fingers crossed :).
>
> Best wishes,
>
> Edd
>
>> On Tue, 1 Oct 2019 at 17:15, Erick Erickson <erickerick...@gmail.com> wrote:
>>
>> First, thanks for taking the time to ask a question with enough supporting
>> details that I can hope to be able to answer in one exchange ;). It’s a
>> pleasure to see.
>>
>> Second, NP with asking on Stack Overflow, they have some excellent answers
>> there. But you’re right, this list gets more Solr-centered eyeballs.
>>
>> On to your question. I think the best answer was that “/export wasn’t
>> designed to deal with scores”, which you’ll find disappointing.
>>
>> You could use the Streaming “search” expression (using qt=/select or just
>> leave qt out) but that’ll sort all of the docs you’re exporting into a huge
>> list, which may perform worse than CursorMark even if it doesn’t blow up
>> memory.
>>
>> The root of this problem is that export can sort in batches since the
>> values it’s sorting on are contained in each document, so it can iterate in
>> batches, send them out, then iterate again on the remaining documents.
>>
>> Score, since it’s dynamic, can’t do that.
>> Solr has to score _all_ the docs
>> to know where a doc lands in the final set relative to any other doc, so if
>> it were going to work it’d have to have enough memory to hold the scores of
>> all the docs in an ordered list, which is very expensive. Conceptually this
>> is an ordered list up to maxDoc long. Not only does there have to be enough
>> memory to hold the entire list, every doc has to be inserted individually
>> which can kill performance. This is the “deep paging” problem.
>>
>> In the usual case of returning, say, 20 docs, the sorted list only has to
>> be 20 long, higher scoring docs evict lower scoring docs.
>>
>> So I think CursorMark is your best bet.
>>
>> Best,
>> Erick
>>
>>> On Oct 1, 2019, at 3:59 AM, Edward Turner <eddtur...@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> As far as I understand, SolrCloud currently does not allow the use of
>>> sorting by the pseudofield score in the /export request handler (i.e., get
>>> the results in relevancy order). If we do attempt this, we get an
>>> exception, "org.apache.solr.search.SyntaxError: Scoring is not currently
>>> supported with xsort". We could use Solr's cursorMark, but this takes a
>>> very long time ...
>>>
>>> Exporting results does work, however, when exporting result sets by a
>>> specific document field that has docValues set to true.
>>>
>>> Question:
>>> Does anyone know if/when it will be possible to sort by score in the
>>> /export handler?
>>>
>>> Research on the problem:
>>> We've seen https://issues.apache.org/jira/browse/SOLR-5244 and
>>> https://issues.apache.org/jira/browse/SOLR-8664, which are related to this
>>> issue, but don't fix it. Maybe I've missed a more relevant issue?
>>>
>>> Our use-case:
>>> We are using SolrCloud in our team and it's added a huge
>>> amount of value to our users.
>>>
>>> We show a table of search results ordered by score (relevancy) that was
>>> obtained from sending a query to the standard /select handler.
>>> We're working in the life-sciences domain and it is common for our result
>>> sets to contain many millions of results (unfortunately). After users
>>> browse their results, they then may want to download the results that they
>>> see, to do some post-processing. However, to do this, such that the results
>>> appear in the order that the user originally saw them, we'd need to be able
>>> to export results based on score/relevancy.
>>>
>>> Any suggestions or advice on this would be greatly appreciated!
>>>
>>> Many thanks!
>>>
>>> Edd
>>>
>>> PS. apologies for posting also on Stack Overflow
>>> (https://stackoverflow.com/questions/58167152/solrcloud-export-all-results-sorted-by-score)
>>> -- I only discovered the Solr mailing-list afterwards and thought it
>>> probably better to reach out directly to Solr's people (I can share any
>>> answer from this forum on there retrospectively).
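
For the cursorMark route discussed above, a minimal sketch of the paging loop might look like the following. It assumes the collection's uniqueKey field is called `id` (cursorMark needs the uniqueKey in the sort as a tie-breaker), and the function names, the `select_url` argument and the page size are hypothetical. The loop stops when Solr returns the same `nextCursorMark` it was given.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(select_url, params):
    """GET one page from a Solr /select handler and decode the JSON body."""
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(f"{select_url}?{query}") as resp:
        return json.load(resp)


def iter_ids_by_score(select_url, query, rows=1000, fetch=fetch_page):
    """Yield (id, score) in relevancy order by walking cursorMark pages."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        page = fetch(select_url, {
            "q": query,
            "fl": "id,score",
            # Assumption: the uniqueKey field is named "id".
            "sort": "score desc, id asc",
            "rows": rows,
            "cursorMark": cursor,
            "wt": "json",
        })
        for doc in page["response"]["docs"]:
            yield doc["id"], doc["score"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:
            break  # an unchanged cursor means there are no more results
        cursor = next_cursor
```

Because this is a generator, the ids can be handed to the key-value store page by page while Solr is still being read, which should help show whether Solr or the record fetching is the bottleneck.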