Hi Erick,

Many thanks for your detailed reply. It's really good information for us to know, and although not exactly what we wanted to hear (that /export wasn't designed to handle ranking), it's much better for us to definitively know one way or the other -- and this allows us to move forward.

We'll experiment by going the cursorMark route. I'm hoping that the bottleneck then isn't Solr, but rather the fetching and writing of the full records (we use Solr as just a search engine, which gives us IDs of records of interest; and we use a separate key-value store to get the actual record data). Anyway, we'll see and fingers crossed :).

Best wishes,
Edd

On Tue, 1 Oct 2019 at 17:15, Erick Erickson <erickerick...@gmail.com> wrote:

> First, thanks for taking the time to ask a question with enough supporting details that I can hope to be able to answer in one exchange ;). It’s a pleasure to see.
>
> Second, NP with asking on Stack Overflow, they have some excellent answers there. But you’re right, this list gets more Solr-centered eyeballs.
>
> On to your question. I think the best answer was that “/export wasn’t designed to deal with scores”, which you’ll find disappointing.
>
> You could use the Streaming “search” expression (using qt=/select or just leave qt out) but that’ll sort all of the docs you’re exporting into a huge list, which may perform worse than CursorMark even if it doesn’t blow up memory.
>
> The root of this problem is that export can sort in batches since the values it’s sorting on are contained in each document, so it can iterate in batches, send them out, then iterate again on the remaining documents.
>
> Score, since it’s dynamic, can’t do that. Solr has to score _all_ the docs to know where a doc lands in the final set relative to any other doc, so if it were going to work it’d have to have enough memory to hold the scores of all the docs in an ordered list, which is very expensive. Conceptually this is an ordered list up to maxDoc long. Not only does there have to be enough memory to hold the entire list, every doc has to be inserted individually which can kill performance. This is the “deep paging” problem.
>
> In the usual case of returning, say, 20 docs, the sorted list only has to be 20 long, higher scoring docs evict lower scoring docs.
>
> So I think CursorMark is your best bet.
>
> Best,
> Erick
>
> > On Oct 1, 2019, at 3:59 AM, Edward Turner <eddtur...@gmail.com> wrote:
> >
> > Hi all,
> >
> > As far as I understand, SolrCloud currently does not allow the use of sorting by the pseudofield, score in the /export request handler (i.e., get the results in relevancy order). If we do attempt this, we get an exception, "org.apache.solr.search.SyntaxError: Scoring is not currently supported with xsort". We could use Solr's cursorMark, but this takes a very long time ...
> >
> > Exporting results does work, however, when exporting result sets by a specific document field that has docValues set to true.
> >
> > Question:
> > Does anyone know if/when it will be possible to sort by score in the /export handler?
> >
> > Research on the problem:
> > We've seen https://issues.apache.org/jira/browse/SOLR-5244 and https://issues.apache.org/jira/browse/SOLR-8664, which are related to this issue, but don't fix it. Maybe I've missed a more relevant issue?
> >
> > Our use-case:
> > We are using Solrcloud in our team and it's added a huge amount of value to our users.
> >
> > We show a table of search results ordered by score (relevancy) that was obtained from sending a query to the standard /select handler. We're working in the life-sciences domain and it is common for our result sets to contain many millions of results (unfortunately). After users browse their results, they then may want to download the results that they see, to do some post-processing. However, to do this, such that the results appear in the order that the user originally saw them, we'd need to be able to export results based on score/relevancy.
> >
> > Any suggestions or advice on this would be greatly appreciated!
> >
> > Many thanks!
> >
> > Edd
> >
> > PS. apologies for posting also on Stackoverflow (https://stackoverflow.com/questions/58167152/solrcloud-export-all-results-sorted-by-score) -- I only discovered the Solr mailing-list afterwards and thought it probably better to reach out directly to Solr's people (I can share any answer from this forum on there retrospectively).
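
Note for later readers of this thread: the cursorMark route discussed above can be sketched roughly as follows. This is only an illustration in Python, not code posted by Edd or Erick. It assumes the collection's uniqueKey field is called `id`, that the /select handler returns JSON containing `response.docs` and `nextCursorMark`, and that the index is not changing much while the export runs (otherwise scores, and so the order, may shift between pages). The helper names `http_fetch` and `iter_ids_by_score`, the example URL, and the example query are all made up for the sketch.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator


def http_fetch(select_url: str) -> Callable[[Dict[str, str]], dict]:
    """Build a function that sends params to a /select URL and parses the JSON reply."""
    def fetch(params: Dict[str, str]) -> dict:
        url = select_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)
    return fetch


def iter_ids_by_score(
    fetch: Callable[[Dict[str, str]], dict],
    query: str,
    rows: int = 1000,
    id_field: str = "id",  # assumption: the uniqueKey field is named "id"
) -> Iterator[str]:
    """Yield document IDs in relevancy order, one cursorMark page at a time."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        page = fetch({
            "q": query,
            "fl": id_field,  # only IDs; full records come from the key-value store
            "rows": str(rows),
            # cursorMark needs the uniqueKey in the sort as a tie-breaker
            "sort": f"score desc, {id_field} asc",
            "cursorMark": cursor,
            "wt": "json",
        })
        for doc in page["response"]["docs"]:
            yield doc[id_field]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # cursor did not advance: no more results
            return
        cursor = next_cursor


# Hypothetical usage (URL and query are placeholders):
# fetch = http_fetch("http://localhost:8983/solr/mycollection/select")
# for record_id in iter_ids_by_score(fetch, "protein kinase"):
#     ...look up record_id in the key-value store and write it out...
```

Because the function is a generator, the lookups against the key-value store can proceed page by page instead of waiting for the whole ID list, which should make it easier to see whether Solr or the record fetching is the bottleneck.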