Re: Measuring the entropy of a field

2016-11-15 Thread Joel Bernstein
You may be interested in:
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/IGainTermsQParserPlugin.java
This iterates through a training set and scores terms in a text field using Information Gain. You'll see entropy calculations in the implementation. I
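
For readers unfamiliar with the measure, here is a minimal sketch of the textbook Information Gain calculation for a single term over a labeled training set. It is not taken from IGainTermsQParserPlugin and makes no claim about how that class is written; the names `binary_entropy` and `information_gain`, and the representation of documents as (set of terms, boolean label) pairs, are assumptions made for illustration only.

```python
import math


def binary_entropy(pos, neg):
    """Shannon entropy (in bits) of a two-class split given raw counts."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for n in (pos, neg):
        if n > 0:  # a zero count contributes nothing (0 * log 0 is taken as 0)
            p = n / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, term):
    """Information gain of `term` over `docs`.

    `docs` is a list of (terms, label) pairs, where `terms` is a set of
    strings and `label` is True for the positive class (hypothetical layout).
    """
    total = len(docs)
    if total == 0:
        return 0.0

    # Class counts for the whole set, and for the subset containing the term.
    pos = sum(1 for _, label in docs if label)
    neg = total - pos
    pos_with = sum(1 for terms, label in docs if label and term in terms)
    neg_with = sum(1 for terms, label in docs if not label and term in terms)
    pos_without = pos - pos_with
    neg_without = neg - neg_with

    n_with = pos_with + neg_with
    n_without = total - n_with

    # Entropy of the class label after splitting on presence of the term.
    conditional = (
        (n_with / total) * binary_entropy(pos_with, neg_with)
        + (n_without / total) * binary_entropy(pos_without, neg_without)
    )
    return binary_entropy(pos, neg) - conditional
```

A term that separates the two classes perfectly scores the full class entropy, and a term that appears in no document scores zero.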

Measuring the entropy of a field

2016-11-15 Thread Davis, Daniel (NIH/NLM) [C]
Does Lucene/Solr include any tools for measuring the entropy/information of a field? My intuition is that this would only work if the field were a single-value field and the analysis identified characters rather than tokens. Also, Unicode does throw a wrench in it - I suppose such a thing
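
As a rough illustration of the character-level measurement described in the question, the sketch below computes Shannon entropy over the characters of a collection of field values, outside of Lucene/Solr entirely. The function name `char_entropy` and the `normalize` parameter are hypothetical. It assumes the values have already been exported as plain strings, counts Unicode code points rather than bytes or UTF-16 units, and applies NFC normalization so that a precomposed character and its combining-sequence equivalent are counted as the same symbol. Whether code points are the right unit (as opposed to grapheme clusters) is a judgment call this sketch does not settle.

```python
import math
import unicodedata
from collections import Counter


def char_entropy(values, normalize="NFC"):
    """Shannon entropy, in bits per character, of the characters in `values`.

    `values` is an iterable of strings (one per document's field value).
    Characters are Unicode code points after optional normalization;
    pass normalize=None to skip normalization.
    """
    counts = Counter()
    for value in values:
        if normalize is not None:
            value = unicodedata.normalize(normalize, value)
        counts.update(value)  # iterating a str yields code points

    total = sum(counts.values())
    if total == 0:
        return 0.0

    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy
```

For example, `char_entropy(["aabb"])` gives 1 bit per character, since two symbols occur equally often, while a value made of one repeated character gives 0.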