You may be interested in:
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/IGainTermsQParserPlugin.java
This iterates through a training set and scores terms in a text field using
Information Gain. You'll see entropy calculations in the implementation.
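
For the character-level idea in the question, here is a minimal sketch in plain Java. It is not part of Lucene or Solr: the class name CodePointEntropy and the method entropyBits are hypothetical, and it assumes you already have the single field value as a String. It walks the text by Unicode code point rather than by Java char, so a character outside the Basic Multilingual Plane (stored as a surrogate pair) is counted as one symbol.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not a Lucene/Solr class.
class CodePointEntropy {

    /** Shannon entropy, in bits per code point, of the code points in text. */
    static double entropyBits(String text) {
        Map<Integer, Integer> counts = new HashMap<>();
        int total = 0;
        // Advance by code point so a surrogate pair counts as a single symbol.
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            counts.merge(cp, 1, Integer::sum);
            total++;
            i += Character.charCount(cp);
        }
        if (total == 0) {
            return 0.0; // Empty input: define the entropy as zero.
        }
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2)); // log base 2 gives bits
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(entropyBits("aabb"));
        System.out.println(entropyBits("hello world"));
    }
}
```

This does not deal with combining sequences: an accented letter written as a base character plus a combining mark is counted as two symbols. If that matters, one option is to normalize first with java.text.Normalizer.normalize(text, Normalizer.Form.NFC).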
Does Lucene/Solr include any tools for measuring the entropy/information of a
field? My intuition is that this would only work if the field were a
single-value field and the analysis identified characters rather than tokens.
Also, Unicode does throw a wrench into it - I suppose such a thing