Re:SOLR Learning to Rank Questions

Christine Poerschke (BLOOMBERG/ LONDON) Wed, 09 Aug 2017 02:02:47 -0700

Hello Joao!

re: your first question, at present there is no direct way to use document hits 
in an external SQL database in a feature. Having said that, Solr has a 
so-called "ExternalFileField" type and using that in combination e.g. with the 
"FieldValueFeature" feature class should work, I think.


In essence you would periodically 'export' document hits from the database (and 
depending on your use case perhaps you might wish to consider only document 
hits that exceed a certain threshold or in other ways filter the raw hits data) 
into an external file for use by Solr.

The Apache Solr Reference Guide (version 6.6) has more information at 
http://lucene.apache.org/solr/guide/6_6/working-with-external-files-and-processes.html
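
To illustrate the periodic export described above, here is a minimal Python sketch. It is only an illustration under stated assumptions: the table name "document_hits", its columns "doc_id" and "hits", the field name "popularity" and the use of SQLite are all hypothetical stand-ins for whatever your database actually looks like. The output follows the external file format from the Reference Guide, i.e. one key=value line per document in a file named external_<fieldname> in the index data directory.

```python
import os
import sqlite3
import tempfile


def export_hits(conn, out_path, min_hits=0):
    """Write one 'doc_id=hits' line per document to out_path.

    Assumes a hypothetical table document_hits(doc_id, hits).
    Rows below min_hits are skipped; lines are sorted by key,
    which is optional but helps lookup speed.
    Returns the number of lines written.
    """
    rows = conn.execute(
        "SELECT doc_id, hits FROM document_hits "
        "WHERE hits >= ? ORDER BY doc_id",
        (min_hits,),
    )
    # Write to a temporary file in the same directory and rename it,
    # so that a half-written file is never visible under out_path.
    directory = os.path.dirname(os.path.abspath(out_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    count = 0
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        for doc_id, hits in rows:
            out.write(f"{doc_id}={float(hits)}\n")
            count += 1
    os.replace(tmp_path, out_path)
    return count


if __name__ == "__main__":
    # Tiny in-memory database standing in for the real SQL source.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE document_hits (doc_id TEXT, hits INTEGER)")
    conn.executemany(
        "INSERT INTO document_hits VALUES (?, ?)",
        [("doc1", 10), ("doc2", 3), ("doc3", 42)],
    )
    # 'external_popularity' assumes the Solr field is called 'popularity'.
    written = export_hits(conn, "external_popularity", min_hits=5)
    print(f"wrote {written} lines")
```

In a real setup the output path would point into the Solr index data directory and the script would run on a schedule; Solr also has to be told to reload the file, as described in the Reference Guide page above.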

Hope that helps.

Christine

----- Original Message -----
From: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
At: 08/03/17 11:15:10


Dear all,

First of all, I would like to thank you guys for the amazing job with SOLR.
In particular, I highly appreciate the learning to rank plugin. It is
fantastic work.

I have two questions for the LTR people and I hope this mailing list is the
right place for that.

*1) This is a direct implementation question:*

Let's say that I have the popularity of my documents (document hits) in an
external SQL database instead of saving it in the index.

Can I use this information as a feature? How?


*2) This is more of a philosophical question than a practical one:*

Let's say I would like to normalize the score of my documents, for example,
with MinMaxNormalizer. If I understood it correctly, I would have to
calculate the min and the max values for the score seen in the training set
and upload these values in my model.
When using the model, MinMaxNormalizer will apply its normalization formula
to each value retrieved, based on the max and the min set in the model.

Although this is a valid approach, I see it as a global approach, not a
local (per query) one.
Hope you understand what I am talking about here.

I was expecting to have a MinMaxNormalizer without min and max set
beforehand. This would simply apply the min-max formula to all results for
each query. Thus, when I use this new approach, the first document would
have score 1.0 and the last document retrieved would have score 0.0.
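
The difference between the two approaches can be made concrete with a small Python sketch. This is only an illustration, not Solr code: the function names are hypothetical, and the formula (value - min) / (max - min) is assumed to be the one the normalizer applies. The global variant takes min and max fixed in advance (e.g. from the training set), while the per-query variant derives them from the values of the current result list.

```python
def min_max_global(values, min_value, max_value):
    """Normalize with min/max fixed in advance (the 'global' approach)."""
    span = max_value - min_value
    return [(v - min_value) / span for v in values]


def min_max_per_query(values):
    """Normalize with min/max taken from this result list only."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: when all values are equal, map everything to 0.0
        # to avoid a division by zero.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    query_values = [6.0, 4.0, 2.0]  # values for one query, best first
    print(min_max_global(query_values, 0.0, 8.0))
    print(min_max_per_query(query_values))
```

With the per-query variant the top value of every result list maps to 1.0 and the bottom one to 0.0, regardless of how the raw values compare across queries.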

Would it be better to normalize per query instead of a global normalization?


Thanks a lot in advance.
Looking forward to hearing back from you soon.

Best,
--
João Palotti
Website: joaopalotti.com
Twitter: @joaopalotti <https://twitter.com/joaopalotti>
Me at Google Scholar
<https://scholar.google.com/citations?user=ZEoF2A4AAAAJ&hl=en>
