ping on this :)

On Tue, Feb 18, 2020 at 11:50 AM Ashwin Ramesh <ash...@canva.com> wrote:

> Hi,
>
> We are in the process of applying a scoring model to our search results.
> In particular, we would like to add scores for documents per query and user
> context.
>
> For example, we want to have a score from 500 to 1 for the top 500
> documents for the query “dog” for users who speak US English.
>
> We believe it becomes infeasible to store these scores in Solr because we
> want to update the scores regularly, and the number of scores increases
> rapidly with increased user attributes.
>
> One solution we explored was to store these scores in a secondary data
> store, and use this at Solr query time with a boost function such as:
>
> `bf=mul(termfreq(id,'ID-1'),500) mul(termfreq(id,'ID-2'),499) …
> mul(termfreq(id,'ID-500'),1)`
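> A `bf` string like the one above could be assembled from the secondary
> store at request time. A minimal sketch (the `scores` dict stands in for
> whatever the secondary store returns; names are hypothetical):

```python
# Hypothetical example: build a Solr bf parameter from per-document scores
# fetched from a secondary store (the dict below stands in for that store).
def build_bf(scores):
    """Build a boost-function string like the one above from {doc_id: score}."""
    terms = [f"mul(termfreq(id,'{doc_id}'),{score})"
             for doc_id, score in scores.items()]
    return "bf=" + " ".join(terms)

scores = {"ID-1": 500, "ID-2": 499, "ID-500": 1}
print(build_bf(scores))
```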
>
> We have over a hundred thousand documents in one Solr collection, and
> about fifty million in another Solr collection. We have some queries for
> which roughly 80% of the results match, although this is an edge case. We
> wanted to know the worst case performance, so we tested with such a query.
> For both of these collections we found a message similar to the
> following in the SolrCloud logs (tested on a laptop):
>
> `Elapsed time: 5020. Exceeded allowed search time: 5000 ms.`
>
> We then tried using the following boost, which seemed simpler:
>
> `boost=if(query($qq), 10, 1)&qq=id:(ID-1 OR ID-2 OR … OR ID-500)`
>
> We then saw the following in the SolrCloud logs:
>
> `The request took too long to iterate over terms.`
>
> All responses above took over 5000 milliseconds to return.
>
> We are considering Solr's re-ranker, but we don't know how we would use
> it without pushing all the query-context-document scores to Solr.
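> For reference, Solr's re-ranker is invoked with the `rq` parameter; a
> sketch of what we'd try (parameter values are illustrative only):

```
q=dog&rq={!rerank reRankQuery=$rqq reRankDocs=500 reRankWeight=3}&rqq=id:(ID-1 OR ID-2)
```

> The difficulty is that `$rqq` would still have to carry the
> per-context document scores into the request.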
>
>
> The alternative solution that we are currently considering involves
> invoking multiple Solr queries.
>
> This means we would make a request to Solr to fetch the top N results (id,
> score) for the query. E.g. q=dog, fq=featureA:foo, fq=featureB:bar, limit=N.
>
> Another request would be made using a filter query with the set of doc ids
> that we know are high value for the user's query. E.g. q=*:*,
> fq=featureA:foo, fq=featureB:bar, fq=id:(d1 OR d2 OR d3), limit=N.
>
> We would then do a reranking phase in our service layer.
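> The service-layer re-rank step could look roughly like this (a minimal
> sketch; the two result lists stand in for the two Solr responses, and the
> boost weight is illustrative):

```python
# Minimal sketch of the service-layer re-rank step described above.
# `organic` and `boosted` stand in for the two Solr responses; each is a
# list of (doc_id, solr_score) pairs. The boost weight (10.0) is illustrative.
def rerank(organic, boosted, boost=10.0, limit=10):
    """Merge two result lists, boosting docs that appear in `boosted`."""
    boosted_ids = {doc_id for doc_id, _ in boosted}
    merged = {doc_id: score + (boost if doc_id in boosted_ids else 0.0)
              for doc_id, score in organic}
    # High-value docs missing from the organic top-N still get surfaced.
    for doc_id, score in boosted:
        merged.setdefault(doc_id, score + boost)
    # Highest combined score first.
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:limit]
```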
>
> Do you have any suggestions for known patterns of how we can store and
> retrieve scores per user context and query?
>
> Regards,
> Ash & Spirit.
>
