Re: PatternTokenizer question

j philoon Tue, 24 Nov 2009 13:11:01 -0800

I think the wiki answers my question in its discussion of the
SynonymFilter: "The Lucene QueryParser tokenizes on white space before
giving any text to the Analyzer".  That would explain what I am seeing.
Next question: can I avoid that behavior?
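
To make the effect concrete, here is a toy Python model of what seems to be
happening. It is not Solr or Lucene code: the names analyze and parse_query
are made up for illustration, and the model only assumes that unquoted query
text is split on whitespace before each piece reaches the field's analyzer,
while a quoted phrase is handed over whole.

```python
import re


def analyze(text, pattern=","):
    """Stand-in for the text_comma analyzer: split on the pattern, then lowercase."""
    return [tok.lower() for tok in re.split(pattern, text) if tok]


def parse_query(query):
    """Toy query parser (an assumption, not the real QueryParser).

    Quoted runs are kept whole; everything else is split on whitespace.
    Each resulting chunk is then analyzed on its own.
    """
    terms = []
    for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', query):
        terms.extend(analyze(quoted if quoted else bare))
    return terms


# Index time: the whole field value goes through the analyzer in one piece.
print(analyze("word1,WORD2,word 3"))   # ['word1', 'word2', 'word 3']

# Query time: the unquoted form is split before analysis, the quoted form is not.
print(parse_query('word 3'))           # ['word', '3']
print(parse_query('"word 3"'))         # ['word 3']
```

In this model the unquoted query yields the terms "word" and "3", neither of
which is in the index, while the quoted query yields "word 3", which is.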



j philoon wrote:
> 
> I have defined a comma-delimited pattern tokenizer as follows:
>     <fieldType name="text_comma" class="solr.TextField" positionIncrementGap="100">
>       <analyzer>
>         <tokenizer class="solr.PatternTokenizerFactory" pattern=","/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldType>
> 
> <field name="commafld" type="text_comma" indexed="true" stored="true"/>
> 
> This appears to work fine when adding documents, since if I add a field
> commafld as "word1,WORD2,word 3" I see terms in the index as expected:
> "word1", "word2", and "word 3".
> 
> When I query, I expect the same tokenization to take place, so that a
> query such as 'commafld:(word 3)' would match the term "word 3".
> However, I find I have to submit the query as 'commafld:("word 3")'.  That
> is, it seems that whitespace tokenization is taking place instead of the
> comma-delimited tokenization.
> 
> Am I misunderstanding what should be happening or making some basic
> mistake?  Thanks. 
> 

-- 
View this message in context: 
http://old.nabble.com/PatternTokenizer-question-tp26497675p26503324.html
Sent from the Solr - User mailing list archive at Nabble.com.
