Hi Michael,

Thank you for the very valuable feedback.

> You can pass in different params in the
> features.json config for each feature, even though they use the same
> feature class.

I used this idea to extract some of the features in this paper
(https://www.microsoft.com/en-us/research/wp-content/uploads/2016/08/letor3.pdf).
For example, features 1-15 in Table 2 are just <query, doc> term features
in various forms:

{
  "store" : "MyFeatureStore",
  "name" : "term_count_1",
  "class" : "com.apache.solr.ltr.feature.TermCountFeature",
  "params" : {
    "field" : "a_text",
    "terms" : "${user_terms}",
    "method" : "1"
  }
},
{
  "store" : "MyFeatureStore",
  "name" : "term_count_2",
  "class" : "com.apache.solr.ltr.feature.TermCountFeature",
  "params" : {
    "field" : "a_text",
    "terms" : "${user_terms}",
    "method" : "2"
  }
},

where the "method" id corresponds to one of features 1-15 in Table 2.

These features share the same class and differ only in minor ways. In a
production deployment, the overhead of defining one feature per value may
not be an issue: after feature selection, probably only a small number of
features will turn out to be useful.

Another use case: use a convolutional neural network or an LSTM to extract
embedded feature vectors for both the query and the document, where the
dimension of the embedded feature vectors would be 50-100. Those features
are then fed into learning-to-rank models.

> Your performance point about 100 features vs 1 feature is true,
> and pull requests to improve the plugin's performance and usability would
> be more than welcome!

I will run some performance benchmarks for a few use cases to determine
whether supporting multiple features from one feature class is worthwhile.
If it is, I will share the results and create a pull request.

Thanks,
Jianxiong

On 4/18/17, Michael Nilsson <mnilsson2...@gmail.com> wrote:
> Hi Jianxiong,
>
> What you say is true. If you want 100 different feature values extracted,
> you need to specify 100 different features in the features.json config so
> that there is a direct mapping of features in and features out. However,
> you more than likely need to only implement 1 feature class that you will
> use for those 100 feature values. You can pass in different params in the
> features.json config for each feature, even though they use the same
> feature class. In some cases you might be able to just have 1 feature
> output 1 value that changes per document, if you can collapse those
> features together.
> This 2nd option may or may not work for you depending on your data, what
> you are trying to bucket, and what algorithm you are trying to use,
> because not all algorithms can easily handle this case. To illustrate:
>
>
> *A) Multiple binary features using the same 1 class*
> {
>   "name" : "isProductCheap",
>   "class" : "org.apache.solr.ltr.feature.SolrFeature",
>   "params" : {
>     "fq": [ "price:[0 TO 100]" ]
>   }
> },{
>   "name" : "isProductExpensive",
>   "class" : "org.apache.solr.ltr.feature.SolrFeature",
>   "params" : {
>     "fq": [ "price:[101 TO 1000]" ]
>   }
> },{
>   "name" : "isProductCrazyExpensive",
>   "class" : "org.apache.solr.ltr.feature.SolrFeature",
>   "params" : {
>     "fq": [ "price:[1001 TO *]" ]
>   }
> }
>
>
> *B) 1 feature that outputs different values (some algorithms don't handle
> discrete features well)*
> {
>   "name" : "productPricePoint",
>   "class" : "org.apache.solr.ltr.feature.MyPricePointFeature",
>   "params" : {
>
>     // Either hard code price map in MyPricePointFeature.java, or
>     // pass it in through params for flexible customization,
>     // and return different values for cheap, expensive, and
>     // crazyExpensive
>
>   }
> }
>
> The 2 options above satisfy most use cases, which is what we were
> targeting. In my specific use case, I opted for option A, and wrote a
> simple script that generates the features.json so I wouldn't have to
> write 100 similar features by hand. You also mentioned that you want to
> extract features sparsely. You can change the configuration of the
> Feature Transformer
> <http://lucene.apache.org/solr/6_5_0/solr-ltr/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.html>
> to return features that actually triggered in a sparse format
> <https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank#LearningToRank-Advancedoptions>.
> Your performance point about 100 features vs 1 feature is true, and pull
> requests to improve the plugin's performance and usability would be more
> than welcome!
>
> -Michael
>
>
> On Fri, Apr 14, 2017 at 12:51 PM, Jianxiong Dong <jdongca2...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I found that Solr learning-to-rank (LTR) supports only ONE feature
>> for a given feature extractor.
>>
>> See interface:
>>
>> https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/Feature.java
>>
>> Lines 281-282 (in FeatureScorer):
>>
>>     @Override
>>     public abstract float score() throws IOException;
>>
>> I have a use case: given a <query, doc>, I would like to extract
>> multiple features (e.g. 100 features). In the current framework, I have
>> to define 100 features in features.json, and iterating over the scored
>> docs also costs more.
>>
>> I would like to have an interface:
>>
>>     public abstract Map<String, Float> score() throws IOException;
>>
>> It would help support sparse vector features.
>>
>> Can anybody provide some insight?
>>
>> Thanks
>>
>> Jianxiong
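
Michael's option A relies on a script that generates features.json rather
than writing each entry by hand. The thread does not show that script, so
the following is only a minimal Python sketch of what such a generator
might look like. It assumes the TermCountFeature class and its "method"
parameter from the reply at the top of the thread, and it assumes the
method ids simply run from 1 to N; the function name build_features is
hypothetical.

```python
import json


def build_features(store, feature_class, field, n_methods):
    """Return one feature definition per method id, all sharing one class."""
    features = []
    for method in range(1, n_methods + 1):
        features.append({
            "store": store,
            "name": f"term_count_{method}",
            "class": feature_class,
            "params": {
                "field": field,
                # Placeholder copied verbatim from the example in the thread.
                "terms": "${user_terms}",
                # Assumption: method ids are consecutive integers sent as strings.
                "method": str(method),
            },
        })
    return features


if __name__ == "__main__":
    features = build_features(
        store="MyFeatureStore",
        feature_class="com.apache.solr.ltr.feature.TermCountFeature",
        field="a_text",
        n_methods=15,
    )
    # Print a JSON array that can be saved as features.json.
    print(json.dumps(features, indent=2))
```

Running the script prints a JSON array of 15 feature definitions that
differ only in "name" and "method", which is the one-class, many-features
layout described in option A.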