Hi,

We are in the process of applying a scoring model to our search results. In particular, we would like to add scores for documents per query and user context.
For example, we want to assign a score from 500 down to 1 to each of the top 500 documents for the query "dog" for users who speak US English. We believe it is infeasible to store these scores in Solr because we want to update them regularly, and the number of scores grows rapidly as user attributes are added.

One solution we explored was to store these scores in a secondary data store and apply them at Solr query time with a boost function such as:

`bf=mul(termfreq(id,'ID-1'),500) mul(termfreq(id,'ID-2'),499) … mul(termfreq(id,'ID-500'),1)`

We have over a hundred thousand documents in one Solr collection, and about fifty million in another. We have some queries for which roughly 80% of the documents match, although this is an edge case. To measure the worst-case performance, we tested with such a query. For both collections we found a message similar to the following in the SolrCloud logs (tested on a laptop):

`Elapsed time: 5020. Exceeded allowed search time: 5000 ms.`

We then tried the following boost, which seemed simpler:

`boost=if(query($qq), 10, 1)&qq=id:(ID-1 OR ID-2 OR … OR ID-500)`

This time we saw the following in the SolrCloud logs:

`The request took too long to iterate over terms.`

All of the responses above took over 5000 milliseconds to return.

We are considering Solr's re-ranker, but we don't know how we would use it without pushing all of the query-context-document scores to Solr.

The alternative we are currently considering is to issue multiple Solr queries. One request would fetch the top N results (id, score) for the user's query, e.g. q=dog, fq=featureA:foo, fq=featureB:bar, limit=N. A second request would use a filter query with the set of doc ids that we know are high value for the user's query, e.g. q=*:*, fq=featureA:foo, fq=featureB:bar, fq=id:(d1 OR d2 OR d3), limit=N. We would then do a reranking phase in our service layer (see the sketch after the sign-off).

Do you have any suggestions for known patterns for storing and retrieving scores per user context and query?

Regards,
Ash & Spirit
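P.S. In case it makes the alternative clearer, below is a rough sketch of the two-request flow plus the service-layer rerank we have in mind. The Solr URL, collection name, filter values, and the external_scores() lookup are all placeholders rather than our real setup.

```python
import requests

SOLR_URL = "http://localhost:8983/solr/docs/select"   # placeholder collection
N = 500

def solr_query(params):
    """Run a Solr select and return the matching docs as [{"id": ..., "score": ...}, ...]."""
    defaults = {"fl": "id,score", "rows": N, "wt": "json"}
    resp = requests.get(SOLR_URL, params={**defaults, **params})
    resp.raise_for_status()
    return resp.json()["response"]["docs"]

def external_scores(query, user_context):
    """Placeholder for the secondary store holding per-query, per-context
    scores, e.g. {"ID-1": 500, ..., "ID-500": 1} for q=dog / en-US users."""
    raise NotImplementedError

def search(query, user_context):
    scores = external_scores(query, user_context)          # doc id -> score

    # Request 1: the organic top N for the user's query.
    organic = solr_query({"q": query,
                          "fq": ["featureA:foo", "featureB:bar"]})

    # Request 2: only the known high-value docs, under the same filters,
    # so we keep just the ones that are still valid matches.
    id_filter = "id:(" + " OR ".join(scores.keys()) + ")"
    boosted = solr_query({"q": "*:*",
                          "fq": ["featureA:foo", "featureB:bar", id_filter]})

    # Rerank in the service layer: the external score dominates,
    # Solr's relevance score breaks ties and orders unscored docs.
    merged = {d["id"]: d for d in organic}
    merged.update({d["id"]: d for d in boosted})
    return sorted(merged.values(),
                  key=lambda d: (scores.get(d["id"], 0), d.get("score", 0)),
                  reverse=True)
```

The merge policy here (external score first, Solr score as a tie-break) is just one option; that part is easy to change once we know where the scores should live.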