Note, a {!terms} query is more efficient for a long list of ids. I'd try to
group the ids by boost, and cache the long id lists. Something like:
q=filter({!terms f=id}1,3,5)^=100  filter({!terms f=id}2,4,6)^=-1
This lets the heavy terms lists be reused across queries.
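For instance, the whole request might look like this (the collection name,
the "dog" main term, and the ids are made up for illustration):

```
http://localhost:8983/solr/main/select
  ?q=dog filter({!terms f=id}1,3,5)^=100 filter({!terms f=id}2,4,6)^=-1
  &fl=id,score
```

filter() wraps each terms query in the filter cache, so later queries that
repeat the same id list hit the cache instead of re-reading the terms, and
^= gives each group a constant score instead of a tf-idf one.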
Another idea: extract the boost scores into a separate core/index (strictly
single-shard in SolrCloud so far), and use {!join score=sum} to bring the
ranks into the main index. This lets the smaller core be updated faster,
although it might require some hackery to decouple updates from cache
invalidation.
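A sketch of such a query, assuming a side core named boosts whose docs have
a doc_id field (matching id in the main index) and a numeric rank field —
all names here are illustrative:

```
http://localhost:8983/solr/main/select
  ?q=dog _query_:"{!join fromIndex=boosts from=doc_id to=id score=max v=$rankq}"
  &rankq={!func}rank
```

{!func}rank scores each boosts doc by its rank value, and score=max carries
that score through the join onto the matching main-index documents, so
updating a rank only touches the small side core.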
Also, Solr supports in-place updates, which could keep a docValues field of
boosts up to date; you can then score by that field.
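Roughly like this (the field name is illustrative; for the update to be
in-place the field must be single-valued, have docValues=true, and be
neither indexed nor stored):

```
curl 'http://localhost:8983/solr/main/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id": "ID-1", "popularity": {"set": 500}}]'
```

Then score by it with a function query, e.g. bf=field(popularity), which
reads the value straight from docValues.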

On Tue, Feb 18, 2020 at 9:27 PM Ashwin Ramesh <ash...@canva.com.invalid>
wrote:

> ping on this :)
>
> On Tue, Feb 18, 2020 at 11:50 AM Ashwin Ramesh <ash...@canva.com> wrote:
>
> > Hi,
> >
> > We are in the process of applying a scoring model to our search results.
> > In particular, we would like to add scores for documents per query and
> user
> > context.
> >
> > For example, we want to have a score from 500 to 1 for the top 500
> > documents for the query “dog” for users who speak US English.
> >
> > We believe it becomes infeasible to store these scores in Solr because we
> > want to update the scores regularly, and the number of scores increases
> > rapidly with increased user attributes.
> >
> > One solution we explored was to store these scores in a secondary data
> > store, and use this at Solr query time with a boost function such as:
> >
> > `bf=mul(termfreq(id,'ID-1'),500) mul(termfreq(id,'ID-2'),499) …
> > mul(termfreq(id,'ID-500'),1)`
> >
> > We have over a hundred thousand documents in one Solr collection, and
> > about fifty million in another Solr collection. We have some queries for
> > which roughly 80% of the results match, although this is an edge case. We
> > wanted to know the worst case performance, so we tested with such a
> query.
> > For both of these collections we found a message similar to the
> > following in the SolrCloud logs (tested on a laptop):
> >
> > Elapsed time: 5020. Exceeded allowed search time: 5000 ms.
> >
> > We then tried using the following boost, which seemed simpler:
> >
> > `boost=if(query($qq), 10, 1)&qq=id:(ID-1 OR ID-2 OR … OR ID-500)`
> >
> > We then saw the following in the Solr cloud logs:
> >
> > `The request took too long to iterate over terms.`
> >
> > All responses above took over 5000 milliseconds to return.
> >
> > We are considering Solr’s re-ranker, but I don’t know how we would use
> > this without pushing all the query-context-document scores to Solr.
> >
> >
> > The alternative solution that we are currently considering involves
> > invoking multiple solr queries.
> >
> > This means we would make a request to solr to fetch the top N results
> (id,
> score) for the query. E.g. q=dog, fq=featureA:foo, fq=featureB:bar,
> limit=N.
> >
> > Another request would be made using a filter query with a set of doc ids
> > that we know are high value for the user’s query. E.g. q=*:*,
> > fq=featureA:foo, fq=featureB:bar, fq=id:(d1, d2, d3), limit=N.
> >
> > We would then do a reranking phase in our service layer.
> >
> > Do you have any suggestions for known patterns of how we can store and
> > retrieve scores per user context and query?
> >
> > Regards,
> > Ash & Spirit.
> >
>

-- 
Sincerely yours
Mikhail Khludnev
