ping on this :)

On Tue, Feb 18, 2020 at 11:50 AM Ashwin Ramesh <ash...@canva.com> wrote:
> Hi,
>
> We are in the process of applying a scoring model to our search results.
> In particular, we would like to add scores for documents per query and
> user context.
>
> For example, we want to have a score from 500 down to 1 for the top 500
> documents for the query "dog" for users who speak US English.
>
> We believe it is infeasible to store these scores in Solr because we want
> to update the scores regularly, and the number of scores grows rapidly as
> more user attributes are added.
>
> One solution we explored was to store these scores in a secondary data
> store and use them at Solr query time with a boost function such as:
>
> `bf=mul(termfreq(id,'ID-1'),500) mul(termfreq(id,'ID-2'),499) …
> mul(termfreq(id,'ID-500'),1)`
>
> We have over a hundred thousand documents in one Solr collection, and
> about fifty million in another. We have some queries for which roughly
> 80% of the documents match, although this is an edge case. We wanted to
> know the worst-case performance, so we tested with such a query. For both
> collections we found a message similar to the following in the SolrCloud
> logs (tested on a laptop):
>
> Elapsed time: 5020. Exceeded allowed search time: 5000 ms.
>
> We then tried the following boost, which seemed simpler:
>
> `boost=if(query($qq), 10, 1)&qq=id:(ID-1 OR ID-2 OR … OR ID-500)`
>
> We then saw the following in the SolrCloud logs:
>
> `The request took too long to iterate over terms.`
>
> All of the responses above took over 5000 milliseconds to return.
>
> We are considering Solr's re-ranker, but I don't know how we would use it
> without pushing all of the query-context-document scores to Solr.
>
> The alternative solution we are currently considering involves issuing
> multiple Solr queries.
>
> We would make one request to Solr to fetch the top N results (id, score)
> for the query, e.g. q=dog, fq=featureA:foo, fq=featureB:bar, limit=N.
>
> Another request would be made using a filter query with the set of doc
> ids that we know are high value for the user's query, e.g. q=*:*,
> fq=featureA:foo, fq=featureB:bar, fq=id:(d1 OR d2 OR d3), limit=N.
>
> We would then do a re-ranking phase in our service layer.
>
> Do you have any suggestions for known patterns of how we can store and
> retrieve scores per user context and query?
>
> Regards,
> Ash & Spirit.
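For concreteness, here is a rough, untested sketch of the two-request pattern
described above, written in Python with the `requests` library against Solr's
standard /select endpoint. The collection name, field names, doc ids, and the
score-combination rule are placeholders for illustration, not our actual setup:

```python
import requests

# Hypothetical SolrCloud select endpoint for one of our collections.
SOLR = "http://localhost:8983/solr/my_collection/select"

def fetch(params):
    """Run a select query and return the list of matching docs."""
    resp = requests.get(SOLR, params=params)
    resp.raise_for_status()
    return resp.json()["response"]["docs"]

# Request 1: top N organic results for the user's query.
organic = fetch({
    "q": "dog",
    "fq": ["featureA:foo", "featureB:bar"],
    "fl": "id,score",
    "rows": 100,
    "wt": "json",
})

# Request 2: the documents we already know are high value for this
# (query, user-context) pair; scores come from the secondary store.
external_scores = {"d1": 500, "d2": 499, "d3": 498}  # illustrative values
curated = fetch({
    "q": "*:*",
    "fq": ["featureA:foo", "featureB:bar",
           "id:(" + " OR ".join(external_scores) + ")"],
    "fl": "id,score",
    "rows": len(external_scores),
    "wt": "json",
})

# Re-rank in the service layer. Here we simply add the external score to
# the Solr relevance score; the actual combination rule is still open.
merged = {doc["id"]: doc["score"] for doc in organic}
for doc in curated:
    merged[doc["id"]] = merged.get(doc["id"], 0.0) + external_scores[doc["id"]]

ranked_ids = sorted(merged, key=merged.get, reverse=True)
```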