Hi Frances,

HTMLStripWhitespaceTokenizerFactory wraps a WhitespaceTokenizer around an
HTMLStripReader.

You could extend HTMLStripReader to not decode named character entities, e.g.
by overriding HTMLStripReader.read() so that it calls an alternative
readEntity(), which instead of converting entity references to characters
would just leave the entity references as-is, something like:
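
As a rough, standalone illustration of the idea (this is not Solr's
HTMLStripReader and does not extend it; the class name
EntityPreservingStripReader and the helper strip() are made up for this
sketch), a Reader that drops tags but passes entity references through
undecoded might look like this. It assumes simple markup with no '>' inside
attribute values, comments, or scripts:

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch: strips <...> tags, leaves "&name;" references as-is.
final class EntityPreservingStripReader extends FilterReader {

    EntityPreservingStripReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = in.read();
        if (c != '<') {
            // Everything outside a tag is returned unchanged. That includes
            // '&', so an entity reference such as "&amp;" is never decoded.
            return c;
        }
        // Inside a tag: discard characters up to and including the closing '>'.
        while (c != -1 && c != '>') {
            c = in.read();
        }
        // Replace the whole tag with one space so adjacent words stay separated.
        return c == -1 ? -1 : ' ';
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        // FilterReader forwards bulk reads straight to the wrapped reader,
        // so route them through the filtering read() above instead.
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    // Convenience helper for trying the reader on a String.
    static String strip(String html) throws IOException {
        StringBuilder out = new StringBuilder();
        try (Reader r = new EntityPreservingStripReader(new StringReader(html))) {
            int c;
            while ((c = r.read()) != -1) {
                out.append((char) c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, strip("<p>AT&amp;T</p>") keeps the text "AT&amp;T" intact,
with a single space where each tag was. In Solr itself the equivalent change
would live in your HTMLStripReader subclass, which you would then wrap in a
WhitespaceTokenizer from your own tokenizer factory.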