msokolov commented on PR #15584:
URL: https://github.com/apache/lucene/pull/15584#issuecomment-4304936890

   OK, I think this is ready.  The feature works and is internally consistent. 
   
   However, it might still be a little obscure for users.  To use it today you
must pass text in a special format and use `DelimitedTermFrequencyTokenFilter`,
or else do something similar using `FeatureField`.  Today `FeatureField`
encodes scores as term frequencies using a custom floating-point format to work
around the same limitation (summing large term frequencies may overflow various
counters) that this change is designed to fix.  So in the future we could
switch `FeatureField` to use `DOCS_AND_CUSTOM_FREQS`; if we did that, we would
no longer need to encode and decode scores using the custom floating-point
encoding but could instead use full 32-bit values, as @msfroh pointed out, and
we could also support int scores that are more like term frequencies.  Finally,
it might be nice to have some sugar enabling the user to index multiple scores
at once, perhaps by providing a `Map<String, Integer>` or `Map<String, Float>`,
so that checking for duplicates can be done higher up in the API rather than
down in `IndexWriter`.  But I think this can be tackled later.
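
   To make the "special format" and the map-based sugar concrete, here is a
rough sketch, not code from this PR.  It assumes the default `|` delimiter of
`DelimitedTermFrequencyTokenFilter` and a whitespace tokenizer upstream of the
filter; the class `DelimitedFreqText` and its `encode` method are hypothetical
names made up for illustration.  Because the input is a `Map`, each term can
appear at most once, so no duplicate check is needed further down.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical helper (not part of Lucene or of this PR): renders a map of
// term -> integer score as whitespace-separated "term|freq" tokens.
final class DelimitedFreqText {
  // Assumption: the filter is configured with its default '|' delimiter.
  static final char DELIMITER = '|';

  static String encode(Map<String, Integer> scores) {
    StringJoiner text = new StringJoiner(" ");
    // Copy into a TreeMap so the output order is deterministic.
    for (Map.Entry<String, Integer> e : new TreeMap<>(scores).entrySet()) {
      String term = e.getKey();
      int freq = e.getValue();
      // Term frequencies must be positive.
      if (freq < 1) {
        throw new IllegalArgumentException("freq must be >= 1 for term: " + term);
      }
      // A term containing the delimiter or whitespace would be mis-tokenized.
      if (term.isEmpty()
          || term.indexOf(DELIMITER) >= 0
          || term.chars().anyMatch(Character::isWhitespace)) {
        throw new IllegalArgumentException("unsupported term: '" + term + "'");
      }
      text.add(term + DELIMITER + freq);
    }
    return text.toString();
  }

  public static void main(String[] args) {
    // prints "clicks|7 pagerank|42"
    System.out.println(encode(Map.of("pagerank", 42, "clicks", 7)));
  }
}
```

   The resulting string would then be indexed as the value of a text field
whose analysis chain ends with `DelimitedTermFrequencyTokenFilter`; a
`Map<String, Float>` variant would additionally need some float-to-int
encoding, which this sketch leaves out.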


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

