I haven't used the German analyzer (either Snowball or the one we have in 
Lucene's contrib), but have you checked whether it does the trick of keeping words 
together?  Or maybe the compound tokenizer has this option? (Check Lucene JIRA; 
I'm not sure where the compound tokenizer ended up.)
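
To make the requested behaviour concrete, here is a minimal sketch in plain Python of what such a filter would emit: the whitespace tokens plus space-free concatenations of adjacent tokens. It is not Lucene or Solr code. The function name concat_shingles and the max_size parameter are made up for illustration, and the lowercasing is an assumption standing in for a lowercase filter earlier in the analysis chain.

```python
def concat_shingles(text, max_size=2):
    """Return whitespace tokens plus concatenations of adjacent tokens.

    Hypothetical helper for illustration only; not a Lucene/Solr API.
    """
    # Assumption: tokens are lowercased, as a lowercase filter would do.
    tokens = text.lower().split()
    out = list(tokens)
    # Join every run of 2..max_size adjacent tokens without a separator.
    for size in range(2, max_size + 1):
        for i in range(len(tokens) - size + 1):
            out.append("".join(tokens[i:i + size]))
    return out


print(concat_shingles("Freizeit Schuh"))
# ['freizeit', 'schuh', 'freizeitschuh']
```

With terms like these in the index, a query for "freizeitschuh" would match a document that only contains "Freizeit Schuh", and nothing extra has to be maintained by hand.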


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Batzenmann <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, September 30, 2008 7:28:53 AM
> Subject: Howto concatenate tokens at index time (without spaces)
> 
> 
> Hi,
> 
> I'm looking for a way to create a fieldtype which will, apart from the
> whitespace-tokenized tokens, also store concatenated versions of the tokens.
> 
> The ShingleFilter does something very similar but keeps spaces between words.
> In German, a shoe (Schuh) you wear in your 'spare time' (Freizeit) is actually
> a "Freizeitschuh" and not a "Freizeit Schuh".
> The WordDelimiterFilterFactory could be used for this as well if the space
> character could be configured as a delimiter character - but delimiters can't be
> configured at all, or am I wrong?
> 
> Synonyms are, in my opinion, not the solution here, as it is
> absolutely not necessary to persist any data for this requirement.
> 
> cheers, Axel
> -- 
> View this message in context: 
> http://www.nabble.com/Howto-concatenate-tokens-at-index-time-%28without-spaces%29-tp19740271p19740271.html
> Sent from the Solr - User mailing list archive at Nabble.com.
