msokolov commented on PR #15584: URL: https://github.com/apache/lucene/pull/15584#issuecomment-4304936890
OK, I think this is ready. The feature works and is internally consistent. However, it might still be a little obscure for users. To use it today you must pass text in a special format and use `DelimitedTermFrequencyTokenFilter`, or else do something similar using `FeatureField`.

Today `FeatureField` encodes scores as term frequencies using a custom floating-point format. It does so to work around the same limitation that this change is designed to fix: summing large term frequencies may overflow various counters. So in the future we could switch `FeatureField` to use `DOCS_AND_CUSTOM_FREQS`. If we did that, we would no longer need to encode and decode scores using the custom floating-point encoding, but could instead use full 32-bit values, as @msfroh pointed out, and we could also support int scores that are more like term frequencies.

Finally, it might be nice to have some sugar enabling the user to index multiple scores at once, perhaps by providing a `Map<String, Integer>` or `Map<String, Float>`, so that checking for duplicates can be done higher up in the API rather than down in `IndexWriter`. But I think this can be tackled later.
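A rough illustration of the trade-off described above, in plain Java with no Lucene dependency. It assumes that the custom encoding keeps only the upper 16 bits of the float's IEEE 754 representation (the bits shifted right by 15), which is my understanding of what `FeatureField` does today; treat that as an assumption rather than a statement of the actual implementation. The class and method names are hypothetical.

```java
// Hypothetical sketch, not Lucene code. Assumes the score is stored as a
// term frequency by truncating the float to its upper 16 bits.
final class FeatureFreqSketch {

  // Encode a positive, finite, normal float as a small positive int.
  public static int encode(float value) {
    if (!(value >= Float.MIN_NORMAL) || Float.isInfinite(value)) {
      throw new IllegalArgumentException("value must be positive, normal and finite: " + value);
    }
    // Drops the low 15 mantissa bits, so the result fits in 16 bits.
    return Float.floatToIntBits(value) >>> 15;
  }

  // Decode the stored frequency back into an approximate float.
  public static float decode(int freq) {
    return Float.intBitsToFloat(freq << 15);
  }

  public static void main(String[] args) {
    float original = 1.3f;
    int freq = encode(original);
    // The round trip is lossy: only 8 explicit mantissa bits survive.
    System.out.println(original + " -> freq " + freq + " -> " + decode(freq));

    // Why large frequencies are risky: int sums wrap around silently,
    // whereas a wider accumulator does not.
    int a = Integer.MAX_VALUE - 10;
    int b = 100;
    System.out.println("int sum:  " + (a + b));
    System.out.println("long sum: " + ((long) a + b));
  }
}
```

Under this assumed encoding, `1.3f` round-trips to `1.296875f`, which is the precision loss that storing full 32-bit values would avoid.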
