I think the answer to my question is in the wiki's discussion of the SynonymFilter: "The Lucene QueryParser tokenizes on white space before giving any text to the Analyzer". That would explain what I am seeing. Next question: can I avoid that behavior?
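Here is a minimal Java sketch of what the wiki statement implies for the query in the quoted message. It is a simplified model, not Lucene's actual QueryParser or PatternTokenizer code, and the class and method names are hypothetical. Unquoted query text is split on whitespace first, and each piece is then run through the comma analyzer, so `word 3` turns into the two terms `word` and `3`. A quoted phrase reaches the analyzer as one string and yields the single term `word 3`, which matches what was indexed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical simulation of the text_comma field's behavior; not Lucene code.
public class QueryTokenizationSketch {

    // Models the text_comma analyzer: split on "," then lowercase each token.
    // This sketch skips empty pieces.
    static List<String> commaAnalyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.split(",")) {
            if (!t.isEmpty()) {
                terms.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Simplified model of the query parser for unquoted text:
    // split on whitespace BEFORE the analyzer sees anything.
    static List<String> parseUnquoted(String queryText) {
        List<String> terms = new ArrayList<>();
        for (String piece : queryText.trim().split("\\s+")) {
            terms.addAll(commaAnalyze(piece));
        }
        return terms;
    }

    // A quoted phrase is handed to the analyzer as a single string.
    static List<String> parseQuoted(String phrase) {
        return commaAnalyze(phrase);
    }

    public static void main(String[] args) {
        System.out.println("indexed:      " + commaAnalyze("word1,WORD2,word 3"));
        System.out.println("(word 3):     " + parseUnquoted("word 3"));
        System.out.println("(\"word 3\"):   " + parseQuoted("word 3"));
    }
}
```

Running `main` shows the indexed terms `[word1, word2, word 3]`, the unquoted query as `[word, 3]` (no match), and the quoted query as `[word 3]` (a match), which mirrors the behavior described in the quoted message.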
j philoon wrote:
>
> I have defined a comma-delimited pattern tokenizer as follows:
>
> <fieldType name="text_comma" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.PatternTokenizerFactory" pattern=","/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> <field name="commafld" type="text_comma" indexed="true" stored="true"/>
>
> This appears to work fine when adding documents, since if I add a field
> commafld as "word1,WORD2,word 3" I see terms in the index as expected:
> "word1", "word2", and "word 3".
>
> When I query, I am expecting that the same tokenization would take place,
> so a query that has 'commafld:(word 3)' would match term "word 3".
> However, I find I have to submit the query as 'commafld:("word 3")'. That
> is, it seems as if whitespace tokenization is taking place, not the
> comma-delimited tokenization.
>
> Am I misunderstanding what should be happening or making some basic
> mistake? Thanks.

--
View this message in context: http://old.nabble.com/PatternTokenizer-question-tp26497675p26503324.html
Sent from the Solr - User mailing list archive at Nabble.com.