On Thu, Nov 20, 2008 at 9:20 AM, Daniel Rosher <[EMAIL PROTECTED]> wrote:
> I'm trying to index some content that has things like 'java/J2EE' but with
> solr.WordDelimiterFilterFactory and parameters [generateWordParts="1"
> generateNumberParts="0" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="0"] this ends up tokenized as
> 'java','j','2','EE'
>
> Does anyone know a way of having this tokenized as 'java','j2ee'?
>
> Perhaps this filter needs something like a protected list of tokens not to
> tokenize, like EnglishPorterFilter?

In addition to the other replies, you could use the SynonymFilter to
normalize certain terms before the WDF (assuming you want to keep the
WDF for other things).

Perhaps try the following synonym rules at both index and query time:

j2ee => javatwoee
java/j2ee => java javatwoee

-Yonik
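
For reference, here is a rough sketch of how the analyzer chain for this might look in schema.xml. It is only an illustration under a few assumptions: the field type name "text_j2ee" is made up, the two rules above are assumed to live in synonyms.txt, and a whitespace tokenizer is assumed so that 'java/J2EE' reaches the SynonymFilter as a single token. ignoreCase="true" is also an assumption, added so the lowercase rules match 'J2EE'. The WDF parameters are the ones from the original question.

```xml
<!-- Hypothetical field type; the name is illustrative only. -->
<fieldType name="text_j2ee" class="solr.TextField">
  <!-- A single <analyzer> with no type applies at both index and query time. -->
  <analyzer>
    <!-- Assumed tokenizer: splits on whitespace only, so 'java/J2EE' stays one token. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Runs before the WDF; synonyms.txt is assumed to hold the two rules above. -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <!-- WDF settings copied from the original question. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="0"
            catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0"/>
  </analyzer>
</fieldType>
```

Note that with these rules the indexed term is 'javatwoee' rather than 'j2ee', which is why the same rules have to be applied at query time as well. Since 'javatwoee' contains no digits or delimiters, the WDF should leave it alone.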