It would probably be simplest to use a PatternReplaceCharFilter to strip the ".jpg" extension, and then use either the StandardTokenizer, or the whitespace tokenizer followed by the Word Delimiter Filter.
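To make the idea concrete, here is a rough plain-Python sketch of what such an analysis chain does conceptually. It is not Solr code: the function names (strip_extension, analyze, phrase_matches) and the sample file name are hypothetical, and it models only the underscore split, whereas the real Word Delimiter Filter has more options. It assumes the same analysis is applied at index and query time, so that a query like firstname_lastname becomes two adjacent terms.

```python
import re


def strip_extension(text):
    # Stand-in for a PatternReplaceCharFilter that removes a trailing ".jpg".
    return re.sub(r"\.jpg$", "", text)


def analyze(text):
    # Whitespace tokenization, then a split on underscores.
    # This only approximates the underscore handling of a word delimiter filter.
    terms = []
    for word in strip_extension(text).split():
        terms.extend(part for part in word.split("_") if part)
    return terms


def phrase_matches(indexed_terms, query):
    # Analyze the query the same way, then require its terms to appear
    # as a consecutive run in the indexed terms (a phrase-style match).
    query_terms = analyze(query)
    n = len(query_terms)
    if n == 0:
        return False
    return any(
        indexed_terms[i:i + n] == query_terms
        for i in range(len(indexed_terms) - n + 1)
    )


# Hypothetical file name following firstname_lastname_employeenumber.jpg
indexed = analyze("john_doe_12345.jpg")
```

With this sketch, "john", "john_doe" and "doe_12345" all match the indexed terms, while "john_12345" does not, because its two parts are not adjacent. Whether Solr actually turns the split query terms into a phrase query depends on the query parser and field type settings, so treat that part as an assumption to verify.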

-- Jack Krupansky

-----Original Message-----
From: RL
Sent: Tuesday, October 30, 2012 3:57 AM
To: solr-user@lucene.apache.org
Subject: Tokenizer question

I could not find a solution to this in the documentation or on the mailing
list, so here's my question.

I have files following the pattern: firstname_lastname_employeenumber.jpg

I'm able to search for the single terms firstname, lastname, or
employeenumber using a solr.PatternTokenizerFactory that splits at
underscores and dots.

But now I also want to search for firstname_lastname or
lastname_employeenumber, which does not work because the underscore is
consumed during tokenization and is no longer part of the indexed tokens.


Any suggestions on how to do that?

Thanks in advance.

RL



--
View this message in context: http://lucene.472066.n3.nabble.com/Tokenizer-question-tp4016932.html
Sent from the Solr - User mailing list archive at Nabble.com.
