Hi Frances,
HTMLStripWhitespaceTokenizerFactory wraps a WhitespaceTokenizer around an
HTMLStripReader.
You could extend HTMLStripReader to not decode named character entities, e.g.
by overriding HTMLStripReader.read() so that it calls an alternative
readEntity(), which instead of converting entity references to characters
would just leave the entity references as-is.
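To illustrate the idea of stripping markup while leaving entity references untouched, here is a minimal standalone sketch. It is an assumption-laden stand-in, not Solr's code: the class EntityPreservingStripReader and its strip() helper are hypothetical names, it does not extend HTMLStripReader, and it only handles simple tags (no comments, scripts, or '>' inside attribute values).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/**
 * Hypothetical illustration only (not part of Solr): removes simple
 * <...> tags but never decodes entity references such as &eacute;.
 */
final class EntityPreservingStripReader extends Reader {
    private final Reader in;

    EntityPreservingStripReader(Reader in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        int ch = in.read();
        if (ch != '<') {
            // '&' is returned like any other character, so entity
            // references pass through as-is.
            return ch;
        }
        // Skip everything up to and including the closing '>' of the tag.
        while (ch != -1 && ch != '>') {
            ch = in.read();
        }
        // Emit one space in place of the tag so adjacent words stay separated.
        return ch == -1 ? -1 : ' ';
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int ch = read();
            if (ch == -1) {
                break;
            }
            cbuf[off + n++] = (char) ch;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Hypothetical convenience helper for trying the reader on a string. */
    static String strip(String html) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new EntityPreservingStripReader(new StringReader(html))) {
            int ch;
            while ((ch = r.read()) != -1) {
                sb.append((char) ch);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Each tag becomes a single space; &eacute; and &amp; are left alone.
        System.out.println(strip("<p>caf&eacute; &amp; bar</p>"));
    }
}
```

A whitespace-based tokenizer reading from such a reader would then see caf&eacute; as one token. Presumably a custom tokenizer factory could wrap a reader like this in a WhitespaceTokenizer, the same way the existing factory wraps HTMLStripReader.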
te but
again was not successful.
Do I need to create a custom tokenizer?
Thanks
Frances
--
View this message in context:
http://www.nabble.com/Tokenizing-and-searching-named-character-entity-references-tp18632403p18632403.html
Sent from the Solr - User mailing list archive at Nabble.com.