On Thu, Nov 20, 2008 at 9:20 AM, Daniel Rosher <[EMAIL PROTECTED]> wrote:
> I'm trying to index some content that has things like 'java/J2EE' but with
> solr.WordDelimiterFilterFactory and parameters [generateWordParts="1"
> generateNumberParts="0" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="0"] this ends up tokenized as
> 'java','j','2',EE'
>
> Does anyone know a way of having this tokenized as 'java','j2ee'.
>
> Perhaps this filter need something like a protected list of tokens not to
> tokenize like EnglishPorterFilter ?

In addition to the other replies, you could use the SynonymFilter to
normalize certain terms before the WDF (assuming you want to keep the
WDF for other things).

Perhaps try the following synonym rules at both index and query time:

j2ee => javatwoee
java/j2ee => java javatwoee

-Yonik

Reply via email to