I would like to use a pairwise ranking model trained with 
XGBoost<https://github.com/dmlc/xgboost> in Apache Solr (xgboost objective: 
rank:pairwise). I guess the XGBoost model should generally be handled by the 
MultipleAdditiveTreesModel<https://lucene.apache.org/solr/7_7_0/solr-ltr/org/apache/solr/ltr/model/MultipleAdditiveTreesModel.html>
 class from the Solr LTR plugin.

However, when mapping the XGBoost output to the JSON expected by the Solr LTR 
plugin, it is not clear how to handle the missing condition. Since XGBoost has 
non-trivial logic for routing missing values, they cannot simply always be sent 
to the left or always to the right branch of a tree.
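
To illustrate the problem, here is a minimal Python sketch of how one tree in 
the shape of XGBoost's JSON dump (as returned by 
Booster.get_dump(dump_format="json")) would be evaluated. Each split node 
carries its own "missing" field naming the child that receives missing values, 
so the default direction can differ from node to node. The tree literal and the 
score_tree helper are hypothetical and only for illustration.

```python
import math

# Hand-written tree in the shape of XGBoost's JSON dump (all values made up).
TREE = {
    "nodeid": 0, "split": "userTextTitleMatch", "split_condition": 0.5,
    "yes": 1, "no": 2, "missing": 1,          # missing goes to the "yes" child
    "children": [
        {"nodeid": 1, "leaf": -100.0},
        {"nodeid": 2, "split": "originalScore", "split_condition": 10.0,
         "yes": 3, "no": 4, "missing": 4,     # missing goes to the "no" child
         "children": [
             {"nodeid": 3, "leaf": 50.0},
             {"nodeid": 4, "leaf": 75.0},
         ]},
    ],
}


def score_tree(node, features):
    """Walk one tree; `features` maps feature name -> float, None or NaN."""
    while "leaf" not in node:
        value = features.get(node["split"])
        if value is None or math.isnan(value):
            target = node["missing"]           # per-node default direction
        elif value < node["split_condition"]:  # XGBoost: "yes" means value < threshold
            target = node["yes"]
        else:
            target = node["no"]
        node = next(c for c in node["children"] if c["nodeid"] == target)
    return node["leaf"]
```

In this sketch a missing userTextTitleMatch is routed to the "yes" child, while 
a missing originalScore is routed to the "no" child, which a single global 
default direction cannot reproduce.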

How should this issue be handled? Would it be the right path to extend the 
MultipleAdditiveTreesModel so that it can optionally handle missing values? 
The input JSON could then look as follows. Note that the missing values are 
routed to the left in one node and to the right in another node:

{
   "class" : "org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
   "name" : "multipleadditivetreesmodel",
   "features":[
       { "name" : "userTextTitleMatch"},
       { "name" : "originalScore"}
   ],
   "params" : {
       "trees" : [
           {
               "weight" : "1",
               "root": {
                   "feature" : "userTextTitleMatch",
                   "threshold" : "0.5",
                   "missing" : "left",
                   "left" : {
                       "value" : "-100"
                   },
                   "right" : {
                       "feature" : "originalScore",
                       "threshold" : "10.0",
                       "missing" : "right",
                       "left" : {
                           "value" : "50"
                       },
                       "right" : {
                           "value" : "75"
                       }
                   }
               }
           },
           {
               "weight" : "2",
               "root" : {
                   "value" : "-10"
               }
           }
       ]
   }
}
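
A minimal Python sketch of how such JSON could be generated from XGBoost's JSON 
dump, i.e. the list of strings returned by 
Booster.get_dump(dump_format="json"). The helper names to_solr_node and 
to_solr_model are hypothetical, the "missing" key is the extension proposed 
above and is not understood by the stock MultipleAdditiveTreesModel, and every 
tree is given weight "1" as an assumption.

```python
import json


def to_solr_node(node):
    """Convert one node of an XGBoost JSON dump into the proposed Solr LTR shape."""
    if "leaf" in node:
        return {"value": str(node["leaf"])}
    children = {child["nodeid"]: child for child in node["children"]}
    return {
        "feature": node["split"],
        "threshold": str(node["split_condition"]),
        # Hypothetical key: part of the proposal, not of the current plugin.
        "missing": "left" if node["missing"] == node["yes"] else "right",
        "left": to_solr_node(children[node["yes"]]),   # XGBoost "yes" branch
        "right": to_solr_node(children[node["no"]]),   # XGBoost "no" branch
    }


def to_solr_model(dumped_trees, feature_names, name="multipleadditivetreesmodel"):
    """Build the model dict from a list of per-tree JSON strings."""
    return {
        "class": "org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
        "name": name,
        "features": [{"name": feature} for feature in feature_names],
        "params": {
            "trees": [
                # Assumption: weight "1" for every tree.
                {"weight": "1", "root": to_solr_node(json.loads(tree))}
                for tree in dumped_trees
            ]
        },
    }


# Hand-written stand-in for a real dump; with a trained model one would pass
# booster.get_dump(dump_format="json") instead.
EXAMPLE_DUMP = [
    json.dumps({
        "nodeid": 0, "split": "userTextTitleMatch", "split_condition": 0.5,
        "yes": 1, "no": 2, "missing": 1,
        "children": [
            {"nodeid": 1, "leaf": -100.0},
            {"nodeid": 2, "split": "originalScore", "split_condition": 10.0,
             "yes": 3, "no": 4, "missing": 4,
             "children": [{"nodeid": 3, "leaf": 50.0},
                          {"nodeid": 4, "leaf": 75.0}]},
        ],
    }),
    json.dumps({"nodeid": 0, "leaf": -10.0}),
]

if __name__ == "__main__":
    model = to_solr_model(EXAMPLE_DUMP, ["userTextTitleMatch", "originalScore"])
    print(json.dumps(model, indent=3))
```

The sketch maps the XGBoost "yes" branch to "left" and the "no" branch to 
"right". One caveat it does not address: XGBoost takes the "yes" branch when the 
value is strictly below split_condition, whereas the Solr tree node appears to 
go left when the value is less than or equal to the threshold, so values exactly 
on a threshold may be routed differently.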

I would appreciate any hints on how to use XGBoost model parameters in the Solr 
LTR plugin.

Kind regards

Georgios
