Re: SOLR Learning to Rank Questions

Michael Nilsson Tue, 22 Aug 2017 13:23:39 -0700

Hey João!

To also address your second question, the purpose of the normalizers is to
ensure that whatever manipulation you did to your feature values offline at
training time (say, to minimize floating-point roundoff) also gets
reflected online at query rerank time, since you will be passing those
online values into the model you trained.  If the values you feed the model
online are inconsistent with the values you used to train it offline, your
model will probably produce garbage results.
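
A minimal Python sketch of that idea, assuming a global min-max normalization of the form (value - min) / (max - min): the bounds are learned once from hypothetical offline training values and then reused unchanged at rerank time. The helper names and numbers are hypothetical, and the JSON dict only assumes the general shape of a normalizer entry (class name plus min/max params) in a Solr LTR model definition.

```python
import json


def fit_min_max(training_values):
    """Learn the normalization bounds once, from the offline training set."""
    return {"min": min(training_values), "max": max(training_values)}


def min_max_normalize(value, params):
    """Apply (value - min) / (max - min) using the stored training bounds."""
    delta = params["max"] - params["min"]
    if delta == 0:
        raise ValueError("min and max must differ")
    return (value - params["min"]) / delta


# Offline: hypothetical raw values of one feature across the training set.
training_hits = [3.0, 10.0, 42.0, 51.0, 7.0]
params = fit_min_max(training_hits)

# The same bounds travel with the model (assumed shape of a normalizer entry).
normalizer_json = json.dumps({
    "class": "org.apache.solr.ltr.norm.MinMaxNormalizer",
    "params": {"min": str(params["min"]), "max": str(params["max"])},
})

# Online: rerank-time values are mapped with the *training* bounds,
# so the model sees the same scale it was trained on.
online_values = [3.0, 27.0, 51.0]
print([min_max_normalize(v, params) for v in online_values])  # [0.0, 0.5, 1.0]
```

The sketch does not clip, so an online value outside the training range (e.g. 99.0) maps outside [0, 1].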


Say you wanted to query 2 different collections and merge their results
together into a single list.  With your per-query normalization approach,
there is no way to know for query "foo" whether the first result from
Collection A with score 1.0 should be placed above the first result from
Collection B with the same score of 1.0.  If you trained your offline model
against the data of both Collection A and B (as a global model, like you
mentioned), it would learn how to rank the documents across collections, and
their rerank scores would then in fact be comparable, allowing you to do a
simple merge and sort of both result lists into a single list.
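
A small sketch of the argument above, with made-up rerank scores for query "foo" from one global model: per-query min-max maps each collection's top hit to 1.0, so after merging, a1 and b1 tie even though the global model scored a1 higher. The helper names (`per_query_min_max`, `merge`) and all scores are hypothetical.

```python
def per_query_min_max(scores):
    """Rescale one result list so its best hit is 1.0 and its worst is 0.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def merge(results_by_collection, normalize=None):
    """Merge result lists into one list sorted by (optionally normalized) score."""
    merged = []
    for results in results_by_collection:
        ids = list(results)
        scores = [results[doc_id] for doc_id in ids]
        if normalize is not None:
            scores = normalize(scores)
        merged.extend(zip(ids, scores))
    # sorted() is stable, so ties keep their input order.
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


# Hypothetical global-model rerank scores for query "foo".
collection_a = {"a1": 0.875, "a2": 0.5, "a3": 0.125}
collection_b = {"b1": 0.625, "b2": 0.375, "b3": 0.25}

print(merge([collection_a, collection_b]))
print(merge([collection_a, collection_b], normalize=per_query_min_max))
```

Merging the raw global-model scores keeps a1 ahead of b1; the per-query version can only break the a1/b1 tie arbitrarily (here by input order).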

There are other approaches you can use besides a standard global Learning to
Rank model to handle targeted per-query rankings (click modeling, intent),
personalization (different models per cluster), extremely trendy topics
(online learning, bandits), etc., but a global model should be able to handle
most cases decently without tremendous specialization.

-Mike



On Wed, Aug 9, 2017 at 5:02 AM, Christine Poerschke (BLOOMBERG/ LONDON) <
cpoersc...@bloomberg.net> wrote:

> Hello Joao!
>
> re: your first question, at present there is no direct way to use document
> hits in an external SQL database in a feature. Having said that, Solr has a
> so-called "ExternalFileField" type and using that in combination e.g. with
> the "FieldValueFeature" feature class should work, I think.
>
> In essence you would periodically 'export' document hits from the database
> (and depending on your use case perhaps you might wish to consider only
> document hits that exceed a certain threshold or in other ways filter the
> raw hits data) into an external file for use by Solr.
>
> The Apache Solr Reference Guide has more information for the 6.6 version at
> http://lucene.apache.org/solr/guide/6_6/working-with-external-files-and-processes.html
>
> Hope that helps.
>
> Christine
>
> ----- Original Message -----
> From: solr-user@lucene.apache.org
> To: solr-user@lucene.apache.org
> At: 08/03/17 11:15:10
>
> 
> Dear all,
>
> First of all, I would like to thank you guys for the amazing job with SOLR.
> In particular, I highly appreciate the learning to rank plugin. It is
> fantastic work.
>
> I have two questions for the LTR people and I hope this mailing list is the
> right place for that.
>
> *1) This is a direct implementation question:*
>
> Let's say that I have the popularity of my documents (document hits) in an
> external SQL database instead of saving it in the index.
>
> Can I use this information as a feature? How?
>
>
> *2) This question is slightly more philosophical than practical:*
>
> Let's say I would like to normalize the score of my documents, for example,
> with MinMaxNormalizer. If I correctly understood it, I would have to
> calculate the min and the max values for the score seen in the training set
> and upload these values to my model.
> When using the model, MinMaxNormalizer will apply its normalization formula
> for each value retrieved based on the max and the min set in the model.
>
> Although this is a valid approach, I see it as a global approach, not a
> local (per query) one.
> Hope you understand what I am talking about here.
>
> I was expecting to have a MinMaxNormalizer without min and max set in
> advance. This would simply apply the min_max formula to all results for
> each query. Thus, when I use this new approach, the first document would
> have score 1.0 and the last document retrieved would have score 0.0.
>
> Would it be better to normalize per query instead of a global
> normalization?
>
>
> Thanks a lot in advance.
> Looking forward to hearing back from you soon.
>
> Best,
> --
> João Palotti
> Website: joaopalotti.com
> Twitter: @joaopalotti <https://twitter.com/joaopalotti>
> Me at Google Scholar
> <https://scholar.google.com/citations?user=ZEoF2A4AAAAJ&hl=en>
>
>
