I was observing a large degradation in performance when adding more features to my Solr LTR model, even though the model complexity (number of trees, tree depth) remained the same. I am using the MultipleAdditiveTreesModel.
Moreover, if model complexity increases while the number of features stays constant, performance degrades only slightly. This seemed odd, since evaluating a more complex model should be much more expensive than just looking up features, so I looked at the LTR code (Solr 7.7) to understand the cause. These are my findings.

Use case:
- The features for my model are highly dynamic and request-dependent.
- The features are mainly scoring features rather than filter/boolean features.

Findings:
- The assumption was that features are computed only for the top N docs that need to be reranked by LTR.
- The problem starts in LTRRescorer.scoreFeatures().
- This ends up calling SolrIndexSearcher.getProcessedFilter() for each top doc to be reranked and for each feature required.
- Each feature is an individual query to SolrIndexSearcher.getProcessedFilter(), and each query is looked up in / inserted into the filter cache in getPositiveDocSet().
- The bulk of the cost (>90%) of LTRRescorer.scoreFeatures() is in DefaultBulkScorer.scoreAll(), which actually builds the doc set for these queries.
- This ends up collecting all matching docs in the index for features that are scoring features rather than filtering features.
- Because the features are dynamic, there is very little reuse of the filter cache beyond the ongoing request, so the doc bit set collection happens on almost every request.

We probably need to change SolrFeature.scorer() to:
- only operate on the docs that need to be scored, and
- use a cache, where applicable, for features that can be reused across requests.

Please let me know if this seems appropriate and valid, and I will file a JIRA request.
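To make the first proposal concrete, here is a minimal, self-contained Java sketch of the intended access pattern: advance a feature's match iterator to each rerank candidate (the way a Lucene DocIdSetIterator.advance() would be used), rather than bulk-collecting every matching doc in the index. The class and method names (PerDocFeatureScoring, FeatureIterator, scoreCandidates) are hypothetical illustrations, not actual Solr/Lucene APIs.

```java
import java.util.Arrays;

// Hypothetical sketch: score a feature only for the top-N rerank
// candidates by advancing a sorted iterator, instead of collecting
// every doc the feature query matches (as DefaultBulkScorer.scoreAll
// effectively does when building a full doc set).
public class PerDocFeatureScoring {

    // Stand-in for a feature's match iterator: matchDocs holds the
    // sorted doc ids the feature query matches, with parallel scores.
    static class FeatureIterator {
        final int[] matchDocs;
        final float[] scores;
        int pos = 0;

        FeatureIterator(int[] matchDocs, float[] scores) {
            this.matchDocs = matchDocs;
            this.scores = scores;
        }

        // Advance to the first match >= target
        // (mirrors DocIdSetIterator.advance semantics).
        int advance(int target) {
            while (pos < matchDocs.length && matchDocs[pos] < target) {
                pos++;
            }
            return pos < matchDocs.length ? matchDocs[pos] : Integer.MAX_VALUE;
        }

        float score() {
            return scores[pos];
        }
    }

    // Compute feature values for exactly the rerank candidates
    // (sorted by doc id), defaulting to 0 when the feature does not
    // match a candidate. Docs outside the candidate set are skipped.
    static float[] scoreCandidates(int[] sortedCandidates, FeatureIterator it) {
        float[] out = new float[sortedCandidates.length];
        for (int i = 0; i < sortedCandidates.length; i++) {
            int doc = it.advance(sortedCandidates[i]);
            out[i] = (doc == sortedCandidates[i]) ? it.score() : 0f;
        }
        return out;
    }

    public static void main(String[] args) {
        // Feature matches docs 2, 5, 9; we rerank only docs 5 and 7.
        // Docs 0-4, 6 and 8+ are never materialized into a doc set.
        FeatureIterator it = new FeatureIterator(
            new int[]{2, 5, 9}, new float[]{0.3f, 1.2f, 0.7f});
        float[] values = scoreCandidates(new int[]{5, 7}, it);
        System.out.println(Arrays.toString(values)); // [1.2, 0.0]
    }
}
```

The cost of this pattern scales with the number of rerank candidates rather than with the number of docs the feature query matches, which is the behavior the findings above suggest is currently missing for scoring features.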