thongnt99 commented on issue #11799:
URL: https://github.com/apache/lucene/issues/11799#issuecomment-1254119695

   @jtibshirani The query side is the same as the document side: a dictionary 
of terms and weights. To make it compatible with Lucene, people just repeat each 
term as many times as its frequency. This is fine because queries are usually 
much shorter than documents. 
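
   As a rough illustration of the repetition trick, here is a minimal sketch in plain Java. It assumes the weights have already been quantized to non-negative integers; the class name `RepeatedTermQuery` and the method `expand` are hypothetical and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Lucene class.
final class RepeatedTermQuery {
    /**
     * Expands a {term -> integer weight} map into a whitespace-separated
     * string in which each term is repeated `weight` times.
     * Assumption: weights are already quantized to non-negative integers.
     */
    static String expand(Map<String, Integer> weights) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                tokens.add(entry.getKey());
            }
        }
        return String.join(" ", tokens);
    }
}
```

   The resulting string can be handed to an ordinary whitespace-based analyzer. Its length grows with the sum of the weights, which is why this is only acceptable for short queries.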
   Yes, FeatureField is similar, but we want a single field containing a 
list of key-value pairs or a JSON-formatted value. 
   @msokolov @rmuir @mocobeta: I found 
[this](https://github.com/apache/lucene/blob/475fbd0bdde31c6a2ae62c59505cf9e8becd50e4/lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/DelimitedTermFrequencyTokenFilter.java),
 which could achieve roughly what we want. But I think it is not very flexible: 
we would need to turn the JSON file into a token stream formatted as 
[<term><delimiter><frequency>......] ... I think this step is redundant. Can 
we just load the JSON file directly? For this, I think we might have to move 
away from the TokenStream pipeline. 
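
   To make the conversion step concrete, here is a minimal sketch of turning a term-to-frequency dictionary into the `<term><delimiter><frequency>` text form. It is plain Java and does not call Lucene. The class `DelimitedTermFreqFormatter` and its `format` method are hypothetical names, JSON parsing is left out (the input is assumed to be parsed into a `Map` already), and I am assuming tokens are separated by whitespace and that `|` is used as the delimiter.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not a Lucene class.
final class DelimitedTermFreqFormatter {
    /**
     * Renders each entry as term + delimiter + frequency and joins the
     * entries with single spaces, e.g. "lucene|3 sparse|1".
     * Assumption: terms contain neither whitespace nor the delimiter.
     */
    static String format(Map<String, Integer> termFreqs, char delimiter) {
        StringJoiner out = new StringJoiner(" ");
        for (Map.Entry<String, Integer> entry : termFreqs.entrySet()) {
            int freq = entry.getValue();
            if (freq <= 0) {
                // This sketch rejects non-positive frequencies outright.
                throw new IllegalArgumentException(
                    "frequency must be positive for term: " + entry.getKey());
            }
            out.add(entry.getKey() + delimiter + freq);
        }
        return out.toString();
    }
}
```

   The output would then have to be tokenized on whitespace before the filter sees it, which is exactly the extra round trip through text that feels redundant.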
   What do you think? Your thoughts are very much appreciated, as I am not 
very familiar with Lucene. 
   
   We can form a group to work on this if you are interested. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

