Hi, Michael,
Thanks for the very valuable feedback.
> You can pass in different params in the
> features.json config for each feature, even though they use the same
> feature class.
I used this idea to extract some of the features described in this paper
(https://www.microsoft.com/en-us/research/wp-content/uploads/2016/08/letor3.pdf).
For example, features 1-15 in Table 2 are just <query, doc> term features
in various forms:
{
"store" : "MyFeatureStore",
"name" : "term_count_1",
"class" : "com.apache.solr.ltr.feature.TermCountFeature",
"params" : {
"field" : "a_text",
"terms" : "${user_terms}",
"method" : "1"
}
},
{
"store" : "MyFeatureStore",
"name" : "term_count_2",
"class" : "com.apache.solr.ltr.feature.TermCountFeature",
"params" : {
"field" : "a_text",
"terms" : "${user_terms}",
"method" : "2"
}
},
where the method id corresponds to features 1-15 in Table 2. Those
features share the same class and differ only in minor ways. In a
production deployment, the overhead of defining them separately may not
be an issue, because after feature selection probably only a small
number of features will be useful.
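
Since the entries above differ only in the method id, they could be generated rather than written by hand, much like the script you mentioned. Here is a minimal sketch in Java, assuming the store, class, and field names from my example (TermCountFeature is my own custom class, not one shipped with Solr, and the helper class name is made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds features.json entries that differ only in "method".
class TermCountFeatureConfig {

    // Values copied from the example entries above; adjust for a real setup.
    static final String STORE = "MyFeatureStore";
    static final String FEATURE_CLASS = "com.apache.solr.ltr.feature.TermCountFeature";
    static final String FIELD = "a_text";

    // Renders one feature entry for the given method id.
    static String entry(int method) {
        return "  {\n"
            + "    \"store\" : \"" + STORE + "\",\n"
            + "    \"name\" : \"term_count_" + method + "\",\n"
            + "    \"class\" : \"" + FEATURE_CLASS + "\",\n"
            + "    \"params\" : {\n"
            + "      \"field\" : \"" + FIELD + "\",\n"
            + "      \"terms\" : \"${user_terms}\",\n"
            + "      \"method\" : \"" + method + "\"\n"
            + "    }\n"
            + "  }";
    }

    // Renders a JSON array with one entry per method id in [first, last].
    static String featuresJson(int first, int last) {
        List<String> entries = new ArrayList<>();
        for (int m = first; m <= last; m++) {
            entries.add(entry(m));
        }
        return "[\n" + String.join(",\n", entries) + "\n]";
    }

    public static void main(String[] args) {
        // Prints 15 entries, one per term feature (1-15) in Table 2.
        System.out.println(featuresJson(1, 15));
    }
}
```

The output can be redirected to a file and uploaded as the feature store config. This only saves typing; each generated entry is still scored as a separate feature.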
Another use case:
use a convolutional neural network or an LSTM to extract an embedded
feature vector for both the query and the document, where the dimension
of each embedded feature vector would be 50-100. We then feed those
features into learning-to-rank models.
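
To make this use case concrete, here is a rough sketch of how the two embedding vectors could be flattened into named feature values, in the shape of the Map<String, Float> result I proposed earlier. The class, method names, and the "q_emb_"/"d_emb_" prefixes are hypothetical and not part of the Solr LTR API, and the embedding model itself is assumed to exist elsewhere:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the Solr LTR API.
class EmbeddingFeatures {

    // Copies each non-zero component of a vector into a named feature,
    // e.g. prefix "q_emb_" yields q_emb_0, q_emb_1, ...
    static void addVector(Map<String, Float> out, String prefix, float[] vector) {
        for (int i = 0; i < vector.length; i++) {
            if (vector[i] != 0.0f) {
                out.put(prefix + i, vector[i]);
            }
        }
    }

    // Combines query and document embeddings into one sparse feature map.
    static Map<String, Float> features(float[] queryVec, float[] docVec) {
        Map<String, Float> out = new LinkedHashMap<>();
        addVector(out, "q_emb_", queryVec);
        addVector(out, "d_emb_", docVec);
        return out;
    }
}
```

With the current single-value score(), each of these 100-200 components would need its own entry in features.json, which is the overhead I would like to measure.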
> Your performance point about 100 features vs 1 feature is true,
> and pull requests to improve the plugin's performance and usability would
> be more than welcome!
I will run some performance benchmarks on a few use cases to determine
whether supporting multiple feature values from one feature class is
worthwhile. If it is, I will share the results and create a pull request.
Thanks
Jianxiong
On 4/18/17, Michael Nilsson <[email protected]> wrote:
> Hi Jianxiong,
>
> What you say is true. If you want 100 different feature values extracted,
> you need to specify 100 different features in the
> features.json config so that there is a direct mapping of features in and
> features out. However, you more than likely need
> to only implement 1 feature class that you will use for those 100 feature
> values. You can pass in different params in the
> features.json config for each feature, even though they use the same
> feature class. In some cases you might be able to
> just have 1 feature output 1 value that changes per document, if you can
> collapse those features together. This 2nd option
> may or may not work for you depending on your data, what you are trying to
> bucket, and what algorithm you are trying to
> use because not all algorithms can easily handle this case. To illustrate:
>
>
> *A) Multiple binary features using the same 1 class*
> {
> "name" : "isProductCheap",
> "class" : "org.apache.solr.ltr.feature.SolrFeature",
> "params" : {
> "fq": [ "price:[0 TO 100]" ]
> }
> },{
> "name" : "isProductExpensive",
> "class" : "org.apache.solr.ltr.feature.SolrFeature",
> "params" : {
> "fq": [ "price:[101 TO 1000]" ]
> }
> },{
> "name" : "isProductCrazyExpensive",
> "class" : "org.apache.solr.ltr.feature.SolrFeature",
> "params" : {
> "fq": [ "price:[1001 TO *]" ]
> }
> }
>
>
> *B) 1 feature that outputs different values (some algorithms don't handle
> discrete features well)*
> {
> "name" : "productPricePoint",
> "class" : "org.apache.solr.ltr.feature.MyPricePointFeature",
> "params" : {
>
> // Either hard code price map in MyPricePointFeature.java, or
> // pass it in through params for flexible customization,
> // and return different values for cheap, expensive, and
> // crazyExpensive
>
> }
> }
>
> The 2 options above satisfy most use cases, which is what we were
> targeting.
> In my specific use case, I opted for option A,
> and wrote a simple script that generates the features.json so I wouldn't
> have to write 100 similar features by hand. You
> also mentioned that you want to extract features sparsely. You can change
> the configuration of the Feature Transformer
> <http://lucene.apache.org/solr/6_5_0/solr-ltr/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.html>
>
> to return features that actually triggered in a sparse format
> <https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank#LearningToRank-Advancedoptions>.
> Your performance point about 100 features vs 1 feature is true,
> and pull requests to improve the plugin's performance and usability would
> be more than welcome!
>
> -Michael
>
>
>
> On Fri, Apr 14, 2017 at 12:51 PM, Jianxiong Dong <[email protected]>
> wrote:
>
>> Hi,
>> I found that solr learning-to-rank (LTR) supports only ONE feature
>> for a given feature extractor.
>>
>> See interface:
>>
>> https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/Feature.java
>>
>> Line (281, 282) (in FeatureScorer)
>> @Override
>> public abstract float score() throws IOException;
>>
>> I have a use case: given a <query, doc>, I would like to extract multiple
>> features (e.g. 100 features). In the current framework, I have to
>> define 100 features in feature.json, which also adds cost for scored doc
>> iterations.
>>
>> I would like to have an interface:
>>
>> public abstract Map<String, Float> score() throws IOException;
>>
>> It would help support sparse feature vectors.
>>
>> Can anybody provide an insight?
>>
>> Thanks
>>
>> Jianxiong
>>
>