I was observing a large degradation in performance when adding more features to my Solr LTR model, even though the model complexity (number of trees, tree depth) remained the same. I am using the MultipleAdditiveTreesModel.
Moreover, if model complexity increases while the number of features stays constant, performance degrades only slightly. This seemed odd, since evaluating a more complex model should be much more expensive than just looking up features, so I looked at the LTR code (Solr 7.7) to understand the cause. These are my findings.

Use case:
- The features for my model are highly dynamic and request-dependent.
- The features are mainly scoring features rather than filter/boolean features.

Findings:
- The assumption was that features are computed only for the top N docs that need to be reranked by LTR.
- The problem starts in LTRRescorer.scoreFeatures().
- This ends up calling SolrIndexSearcher.getProcessedFilter() for each top doc to be reranked and for each feature required.
- Each feature is an individual query to SolrIndexSearcher.getProcessedFilter(), and each query is looked up in / inserted into the filter cache in getPositiveDocSet().
- The bulk of the cost (>90%) of LTRRescorer.scoreFeatures() is in DefaultBulkScorer.scoreAll(), which actually builds the doc set for these queries.
- This ends up collecting all matching docs in the index for features that are scoring features rather than filtering features.
- Because the features are dynamic, there is very little reuse of the filter cache beyond the ongoing request, so the doc bit set collection happens on almost every request.

We probably need to change SolrFeature.scorer() to:
- only operate on the docs that need to be scored, and
- use a cache, where applicable, for features that can be reused across requests.

Please let me know if this seems appropriate and valid, and I will file a JIRA request.
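To make the first proposal concrete, here is a minimal, self-contained Java sketch of the intended access pattern: advance a feature's match iterator to each rerank candidate (the way a Lucene DocIdSetIterator.advance() would be used), rather than bulk-collecting every matching doc in the index. The class and method names (PerDocFeatureScoring, FeatureIterator, scoreCandidates) are hypothetical illustrations, not actual Solr/Lucene APIs.

```java
import java.util.Arrays;

// Hypothetical sketch: score a feature only for the top-N rerank
// candidates by advancing a sorted iterator, instead of collecting
// every doc the feature query matches (as DefaultBulkScorer.scoreAll
// effectively does when building a full doc set).
public class PerDocFeatureScoring {

    // Stand-in for a feature's match iterator: matchDocs holds the
    // sorted doc ids the feature query matches, with parallel scores.
    static class FeatureIterator {
        final int[] matchDocs;
        final float[] scores;
        int pos = 0;

        FeatureIterator(int[] matchDocs, float[] scores) {
            this.matchDocs = matchDocs;
            this.scores = scores;
        }

        // Advance to the first match >= target
        // (mirrors DocIdSetIterator.advance semantics).
        int advance(int target) {
            while (pos < matchDocs.length && matchDocs[pos] < target) {
                pos++;
            }
            return pos < matchDocs.length ? matchDocs[pos] : Integer.MAX_VALUE;
        }

        float score() {
            return scores[pos];
        }
    }

    // Compute feature values for exactly the rerank candidates
    // (sorted by doc id), defaulting to 0 when the feature does not
    // match a candidate. Docs outside the candidate set are skipped.
    static float[] scoreCandidates(int[] sortedCandidates, FeatureIterator it) {
        float[] out = new float[sortedCandidates.length];
        for (int i = 0; i < sortedCandidates.length; i++) {
            int doc = it.advance(sortedCandidates[i]);
            out[i] = (doc == sortedCandidates[i]) ? it.score() : 0f;
        }
        return out;
    }

    public static void main(String[] args) {
        // Feature matches docs 2, 5, 9; we rerank only docs 5 and 7.
        // Docs 0-4, 6 and 8+ are never materialized into a doc set.
        FeatureIterator it = new FeatureIterator(
            new int[]{2, 5, 9}, new float[]{0.3f, 1.2f, 0.7f});
        float[] values = scoreCandidates(new int[]{5, 7}, it);
        System.out.println(Arrays.toString(values)); // [1.2, 0.0]
    }
}
```

The cost of this pattern scales with the number of rerank candidates rather than with the number of docs the feature query matches, which is the behavior the findings above suggest is currently missing for scoring features.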