Does Lucene/Solr include any tools for measuring the entropy/information of a 
field?   My intuition is that this would only work if the field were a 
single-valued field and the analysis identified characters rather than tokens.
Also, Unicode does throw a wrench in it - I suppose such a tool would also
need a set of expected symbols, since by entropy I mean entropy measured
against a fixed alphabet such as ASCII or Latin-1.
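
To make the question concrete, here is a minimal Python sketch of the
character-level measurement I have in mind. It is not anything from
Lucene/Solr; the function names and the default alphabet size of 256
(Latin-1) are just illustrative assumptions.

```python
import math
from collections import Counter


def char_entropy(text):
    """Shannon entropy of `text` in bits per character.

    Characters here are Unicode code points as Python iterates them, so
    combining marks count as separate symbols (an assumption of this sketch).
    """
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # H = -sum(p * log2(p)) over the observed symbol frequencies
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def normalized_entropy(text, alphabet_size=256):
    """Entropy as a fraction of the maximum for an expected alphabet.

    alphabet_size is hypothetical: 128 for ASCII, 256 for Latin-1.
    Values above 1.0 mean the text uses more distinct symbols than expected.
    """
    return char_entropy(text) / math.log2(alphabet_size)
```

Calling char_entropy on the stored value of a single-valued field gives
bits per character; normalized_entropy scales that against the expected
symbol set, which is where the ASCII vs. Latin-1 choice comes in.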

Just curious here - I have no particular problem to solve, and you guys are
experts at solving this sort of thing in Java, so if there are other libraries
or corners of OpenNLP that address this, let me know.  I know more at this
point about tackling this stuff from Python.

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH
