subject:"Tokenizing and searching named character entity references"

RE: Tokenizing and searching named character entity references

2008-07-28 Thread Chris Hostetter

: You could extend HTMLStripReader to not decode named character entities,
: e.g. by overriding HTMLStripReader.read() so that it calls an
: alternative readEntity(), which instead of converting entity references
: to characters would just leave the entity references as-is, something
: like:

RE: Tokenizing and searching named character entity references

2008-07-28 Thread Steven A Rowe

Hi Frances,

HTMLStripWhitespaceTokenizerFactory wraps a WhitespaceTokenizer around an HTMLStripReader. You could extend HTMLStripReader to not decode named character entities, e.g. by overriding HTMLStripReader.read() so that it calls an alternative readEntity(), which instead of converting en
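For illustration only, here is a rough standalone sketch of the general idea: a reader that drops markup but passes entity references through undecoded, followed by whitespace tokenization. It does not use or subclass Solr's HTMLStripReader, and it is not the code from the original message. The class name EntityPreservingStripReader and the tokenize() helper are hypothetical, and the tag handling is deliberately naive (no comments, CDATA, or '>' inside attribute values).

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in, NOT Solr's HTMLStripReader:
 * replaces each tag with a space and leaves entity references verbatim.
 */
class EntityPreservingStripReader extends FilterReader {

    EntityPreservingStripReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int ch = in.read();
        if (ch != '<') {
            // '&' falls through here, so "&eacute;" is never decoded.
            return ch;
        }
        // Skip everything up to and including the closing '>' of the tag.
        while (ch != -1 && ch != '>') {
            ch = in.read();
        }
        // A tag becomes a single space so adjacent words stay separated.
        return ch == -1 ? -1 : ' ';
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = 0;
        while (n < len) {
            int ch = read();
            if (ch == -1) {
                break;
            }
            buf[off + n++] = (char) ch;
        }
        return (n == 0 && len > 0) ? -1 : n;
    }

    /** Strips tags from the input, then splits the result on whitespace. */
    static List<String> tokenize(String html) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new EntityPreservingStripReader(new StringReader(html))) {
            int ch;
            while ((ch = r.read()) != -1) {
                sb.append((char) ch);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> tokens = new ArrayList<>();
        for (String t : sb.toString().trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }
}
```

With this sketch, EntityPreservingStripReader.tokenize("<p>caf&eacute; &amp; bar</p>") yields the tokens caf&eacute;, &amp; and bar, so the entity references survive as searchable terms. The same text would need to pass through an equivalent analyzer at query time for those terms to match.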

Tokenizing and searching named character entity references

2008-07-24 Thread F Knudson

te but again was not successful. Do I need to create a custom tokenizer?

Thanks
Frances

--
View this message in context: http://www.nabble.com/Tokenizing-and-searching-named-character-entity-references-tp18632403p18632403.html
Sent from the Solr - User mailing list archive at Nabble.com.