thongnt99 commented on issue #11799: URL: https://github.com/apache/lucene/issues/11799#issuecomment-1254119695
@jtibshirani The query side is the same as the document side: a dictionary of terms and weights. To make it compatible with Lucene, people just repeat each term according to its frequency. This is fine because queries are usually much shorter than documents. Yes, FeatureField is similar, but we want a single field containing a list of key-value pairs or a JSON-formatted object.

@msokolov @rmuir @mocobeta: I found [this](https://github.com/apache/lucene/blob/475fbd0bdde31c6a2ae62c59505cf9e8becd50e4/lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/DelimitedTermFrequencyTokenFilter.java), which could achieve what we want, but I think it is not very flexible: we would need to turn the JSON file into a token stream formatted as [<term><delimiter><frequency>......] ... I think this step is redundant. Can we just load the JSON file directly? For this, I think we might have to move away from the TokenStream pipeline. What do you think? Your thoughts are very much appreciated, as I am not very familiar with Lucene. We can form a group to work on this if you are interested.
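
To make the two encodings discussed above concrete, here is a minimal sketch in plain Java (no Lucene dependency). `SparseTermWeights` and its methods are hypothetical names used only for illustration and are not part of Lucene. It assumes the weights have already been parsed from JSON into a map and quantized to positive integers, and that terms contain neither whitespace nor the delimiter. The first method builds the `<term><delimiter><frequency>` text that would be fed through a whitespace tokenizer into DelimitedTermFrequencyTokenFilter; the second builds the repeated-term form described for queries.

```java
import java.util.Collections;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class SparseTermWeights {

    /** Document side: encodes each entry as term + delimiter + frequency, space-separated. */
    static String toDelimitedText(Map<String, Integer> weights, char delimiter) {
        StringJoiner out = new StringJoiner(" ");
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            requirePositive(e);
            out.add(e.getKey() + delimiter + e.getValue());
        }
        return out.toString();
    }

    /** Query side: repeats each term as many times as its frequency. */
    static String toRepeatedTerms(Map<String, Integer> weights) {
        StringJoiner out = new StringJoiner(" ");
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            requirePositive(e);
            out.add(String.join(" ", Collections.nCopies(e.getValue(), e.getKey())));
        }
        return out.toString();
    }

    /** Assumption: weights are already quantized to integers greater than zero. */
    private static void requirePositive(Map.Entry<String, Integer> e) {
        if (e.getValue() == null || e.getValue() <= 0) {
            throw new IllegalArgumentException("weight must be a positive integer: " + e.getKey());
        }
    }
}
```

With an insertion-ordered map holding `lucene=3` and `search=1`, and `'|'` as the delimiter, the document-side text is `lucene|3 search|1` and the query-side text is `lucene lucene lucene search`. This intermediate string is exactly the extra conversion step the comment calls redundant; loading the JSON directly would skip it.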