Hello.
Thanks for the hints. Still some trouble, though.
I added only the HTMLStripCharFilterFactory because, according to the
documentation, it should also replace HTML entities. It did, but still
left a space after the entity, so I got two tokens from "Günther".
That seems like a bug?
Adding Mappi
Hi Anders,
Sorry, I don't know whether this is a bug or a feature, but
I'd like to show an alternative approach, if you'd like.
In Solr trunk, HTMLStripWhitespaceTokenizerFactory is
marked as deprecated. Instead, using HTMLStripCharFilterFactory together
with an arbitrary TokenizerFactory is encouraged.
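
A minimal schema.xml sketch of that combination, for illustration only. The
field type name "text_html" is a made-up placeholder, and
WhitespaceTokenizerFactory is just one possible choice of tokenizer; whether
this setup avoids the split after the entity on a given nightly build is not
confirmed here.

```xml
<!-- Hypothetical field type; the name "text_html" is an assumption. -->
<fieldType name="text_html" class="solr.TextField">
  <analyzer>
    <!-- Char filters run on the raw text before the tokenizer,
         stripping HTML markup from the input. -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- Any TokenizerFactory can follow; whitespace splitting is used
         here to mirror the deprecated HTMLStripWhitespaceTokenizerFactory. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

The resulting token stream can be checked in analysis.jsp by selecting this
field type and entering the test string.
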
And I'd recomme
Hi.
When indexing the string "Günther" with
HTMLStripWhitespaceTokenizerFactory (in analysis.jsp), I get two tokens,
"Gü" and "nther".
Is this a bug, or am I doing something wrong?
(Using a Solr nightly from 2009-05-29)
Anders.