subject:"HTML decoder is splitting tokens"

Re: HTML decoder is splitting tokens

2009-08-27 Thread Anders Melchiorsen

Hello. Thanks for the hints. I still have some trouble, though. I added just the HTMLStripCharFilterFactory because, according to the documentation, it should also replace HTML entities. It did, but it still left a space after the entity, so I got two tokens from "Günther". That seems like a bug? Adding Mappi

Re: HTML decoder is splitting tokens

2009-08-26 Thread Koji Sekiguchi

Hi Anders, Sorry, I don't know whether this is a bug or a feature, but I'd like to show an alternative way if you'd like. In Solr trunk, HTMLStripWhitespaceTokenizerFactory is marked as deprecated. Instead, using HTMLStripCharFilterFactory together with an arbitrary TokenizerFactory is encouraged. And I'd recomme
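
For readers following along, the combination described above (HTMLStripCharFilterFactory plus a separate tokenizer) might look roughly like this in schema.xml. This is only a sketch: the field type name "text_html" is hypothetical, and WhitespaceTokenizerFactory is an assumed choice made to mirror the deprecated HTMLStripWhitespaceTokenizerFactory. The thread does not show the configuration that was actually used.

```xml
<!-- Sketch only: "text_html" is a hypothetical field type name. -->
<fieldType name="text_html" class="solr.TextField">
  <analyzer>
    <!-- Strips HTML markup (and, per the documentation cited in the
         thread, replaces HTML entities) before tokenization. -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- Assumed tokenizer; any TokenizerFactory could be used here. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

Note that, according to the follow-up in this thread, the char filter alone still left a space after the entity, so this setup by itself may not avoid the split.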

HTML decoder is splitting tokens

2009-08-26 Thread Anders Melchiorsen

Hi. When indexing the string "Günther" with HTMLStripWhitespaceTokenizerFactory (in analysis.jsp), I get two tokens, "Gü" and "nther". Is this a bug, or am I doing something wrong? (Using a Solr nightly from 2009-05-29) Anders.


3 matches
