Note that the {!terms} query parser is more efficient for long id lists. I'd try to group the ids by boost value and cache the long id lists. Something like:

q=filter({!terms f=id}1,3,5)^=100 filter({!terms f=id}2,4,6)^=-1

This lets the heavy terms lists be cached and reused between queries. Another idea: extract the boost scores into a separate core/index (strictly single shard in SolrCloud, so far) and use {!join score=sum} to bring the ranks into the main index. That lets you update the smaller core faster, although it might require some hacks to decouple updates from cache invalidation. Also, Solr has in-place updates, which could update the fields holding the boosts, so you could score by such a field.
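For illustration, here is a minimal Python sketch of building a main query in that shape: ids grouped by boost tier, each tier wrapped in filter() so the terms list hits the filter cache, and "^=" assigning the constant score. The helper name, field name "id", and the boost tiers are my own assumptions, not anything from the thread.

```python
def grouped_boost_query(tiers):
    """Build a Solr q parameter from (boost, ids) pairs.

    Each tier becomes a cached, constant-scored clause:
      filter({!terms f=id}<ids>)^=<boost>
    The {!terms} parser handles long id lists efficiently, filter()
    makes the clause cacheable, and ^= assigns a constant score.
    Field name "id" is an assumption for this sketch.
    """
    clauses = []
    for boost, ids in tiers:
        id_list = ",".join(str(i) for i in ids)
        clauses.append(f"filter({{!terms f=id}}{id_list})^={boost}")
    return " ".join(clauses)

# Two tiers: ids 1,3,5 boosted to 100; ids 2,4,6 demoted to -1.
q = grouped_boost_query([(100, [1, 3, 5]), (-1, [2, 4, 6])])
print(q)
# filter({!terms f=id}1,3,5)^=100 filter({!terms f=id}2,4,6)^=-1
```

Because the id lists are stable per tier, repeated queries with the same tiers reuse the cached filters rather than re-iterating the terms each time.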
On Tue, Feb 18, 2020 at 9:27 PM Ashwin Ramesh <ash...@canva.com.invalid> wrote:

> ping on this :)
>
> On Tue, Feb 18, 2020 at 11:50 AM Ashwin Ramesh <ash...@canva.com> wrote:
>
> > Hi,
> >
> > We are in the process of applying a scoring model to our search results.
> > In particular, we would like to add scores for documents per query and
> > user context.
> >
> > For example, we want to have a score from 500 to 1 for the top 500
> > documents for the query “dog” for users who speak US English.
> >
> > We believe it becomes infeasible to store these scores in Solr because we
> > want to update the scores regularly, and the number of scores increases
> > rapidly with increased user attributes.
> >
> > One solution we explored was to store these scores in a secondary data
> > store, and use this at Solr query time with a boost function such as:
> >
> > `bf=mul(termfreq(id,'ID-1'),500) mul(termfreq(id,'ID-2'),499) …
> > mul(termfreq(id,'ID-500'),1)`
> >
> > We have over a hundred thousand documents in one Solr collection, and
> > about fifty million in another Solr collection. We have some queries for
> > which roughly 80% of the results match, although this is an edge case. We
> > wanted to know the worst-case performance, so we tested with such a query.
> > For both of these collections we found a message similar to the
> > following in the Solr cloud logs (tested on a laptop):
> >
> > Elapsed time: 5020. Exceeded allowed search time: 5000 ms.
> >
> > We then tried using the following boost, which seemed simpler:
> >
> > `boost=if(query($qq), 10, 1)&qq=id:(ID-1 OR ID-2 OR … OR ID-500)`
> >
> > We then saw the following in the Solr cloud logs:
> >
> > `The request took too long to iterate over terms.`
> >
> > All responses above took over 5000 milliseconds to return.
> >
> > We are considering Solr's re-ranker, but I don't know how we would use
> > this without pushing all the query-context-document scores to Solr.
> >
> > The alternative solution that we are currently considering involves
> > invoking multiple Solr queries.
> >
> > This means we would make a request to Solr to fetch the top N results
> > (id, score) for the query. E.g. q=dog, fq=featureA:foo, fq=featureB:bar,
> > limit=N.
> >
> > Another request would be made using a filter query with a set of doc ids
> > that we know are high value for the user's query. E.g. q=*:*,
> > fq=featureA:foo, fq=featureB:bar, fq=id:(d1, d2, d3), limit=N.
> >
> > We would then do a reranking phase in our service layer.
> >
> > Do you have any suggestions for known patterns of how we can store and
> > retrieve scores per user context and query?
> >
> > Regards,
> > Ash & Spirit.

-- 
Sincerely yours
Mikhail Khludnev