I would like to use a pairwise ranking model trained with XGBoost<https://github.com/dmlc/xgboost> (objective: rank:pairwise) in Apache Solr. I assume the XGBoost model should be handled by the MultipleAdditiveTreesModel<https://lucene.apache.org/solr/7_7_0/solr-ltr/org/apache/solr/ltr/model/MultipleAdditiveTreesModel.html> class from the Solr LTR plugin.

However, when mapping the XGBoost output to the JSON expected by the Solr LTR plugin, it is not clear how to handle missing values. Since XGBoost has non-trivial logic for routing missing values, they cannot simply always be sent to the left or always to the right branch of a tree.

How should this issue be handled? Would it be the right path to extend the MultipleAdditiveTreesModel so that it can optionally handle missing values? The input JSON could then look as follows. Note that missing values are routed to the left in one node and to the right in another:

    {
      "class" : "org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
      "name" : "multipleadditivetreesmodel",
      "features" : [
        { "name" : "userTextTitleMatch" },
        { "name" : "originalScore" }
      ],
      "params" : {
        "trees" : [
          {
            "weight" : "1",
            "root" : {
              "feature" : "userTextTitleMatch",
              "threshold" : "0.5",
              "missing" : "left",
              "left" : { "value" : "-100" },
              "right" : {
                "feature" : "originalScore",
                "threshold" : "10.0",
                "missing" : "right",
                "left" : { "value" : "50" },
                "right" : { "value" : "75" }
              }
            }
          },
          {
            "weight" : "2",
            "root" : { "value" : "-10" }
          }
        ]
      }
    }

I would appreciate any hints on how to use XGBoost model parameters in the Solr LTR plugin.

Kind regards
Georgios
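
P.S. To make the intended mapping concrete, here is a rough Python sketch of how one tree could be converted. It assumes the node layout I believe XGBoost produces with get_dump(dump_format="json"), namely the keys "split", "split_condition", "yes", "no", "missing", "children" and "leaf". The function name xgb_node_to_solr is made up for illustration, and the "missing" key in the output is the proposed extension, not something the current Solr LTR plugin understands. The sketch also does not address any difference in how the two systems treat values exactly equal to the threshold.

```python
import json


def xgb_node_to_solr(node):
    """Convert one node of an XGBoost JSON dump into a Solr LTR tree node.

    Assumption: XGBoost's "yes" child maps to Solr's "left" branch and the
    "no" child maps to "right".
    """
    if "leaf" in node:
        # Leaf node: Solr LTR expects the score as a string "value".
        return {"value": str(node["leaf"])}

    # Index the children by node id so "yes"/"no"/"missing" can be resolved.
    children = {child["nodeid"]: child for child in node["children"]}

    return {
        "feature": node["split"],
        "threshold": str(node["split_condition"]),
        # Hypothetical field (the proposed extension): records the branch
        # that XGBoost takes when the feature value is missing.
        "missing": "left" if node["missing"] == node["yes"] else "right",
        "left": xgb_node_to_solr(children[node["yes"]]),
        "right": xgb_node_to_solr(children[node["no"]]),
    }


# Hand-written example in the assumed dump layout, mirroring the first
# tree of the JSON in this mail (not output from a real model).
example_dump = {
    "nodeid": 0, "split": "userTextTitleMatch", "split_condition": 0.5,
    "yes": 1, "no": 2, "missing": 1,
    "children": [
        {"nodeid": 1, "leaf": -100.0},
        {
            "nodeid": 2, "split": "originalScore", "split_condition": 10.0,
            "yes": 3, "no": 4, "missing": 4,
            "children": [
                {"nodeid": 3, "leaf": 50.0},
                {"nodeid": 4, "leaf": 75.0},
            ],
        },
    ],
}

solr_tree = {"weight": "1", "root": xgb_node_to_solr(example_dump)}
print(json.dumps(solr_tree, indent=2))
```

The resulting dictionary would go into the "trees" list under "params". Without support for the "missing" key on the Solr side, this information would simply be lost, which is the core of my question.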