No, the "query" analyzer's tokenizer is simply applied to each term or quoted string AFTER the query parser has already parsed the input. Any characters you escaped or quoted will then be seen by the analyzer's tokenizer.
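To make that order of operations concrete, here is a rough Python sketch (an editor's illustration, not Solr source code; the real query parser also handles operators, quotes, boosts, etc.): the parser first splits the raw query on unescaped whitespace, and only then runs each chunk through the field's query-time analyzer, here a "###" pattern tokenizer plus lowercasing as in the schema further down the thread.

```python
import re

# Illustrative sketch only (not Solr code): the query parser splits the raw
# query on unescaped whitespace BEFORE the field analyzer ever sees it.

def parser_split(query):
    # Split on whitespace that is not preceded by a backslash.
    chunks = re.split(r'(?<!\\)\s+', query)
    # Remove the escape characters before handing chunks to the analyzer.
    return [c.replace('\\ ', ' ') for c in chunks if c]

def analyze(chunk):
    # Query-time analyzer: tokenize on "###", then lowercase.
    return [t.lower() for t in chunk.split('###') if t]

def to_terms(query):
    return [term for chunk in parser_split(query) for term in analyze(chunk)]

print(to_terms('Gewinnung von Steinen'))      # three separate terms
print(to_terms('Gewinnung\\ von\\ Steinen'))  # one term: whitespace escaped
```

With unescaped whitespace the analyzer receives three separate chunks and can only ever produce three separate terms; escaping (or quoting) keeps the value intact so the "###" tokenizer actually gets a chance to apply.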

-- Jack Krupansky

-----Original Message----- From: Dirk Högemann
Sent: Monday, December 17, 2012 11:01 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

Ok, right, I changed that... Nevertheless, I thought I should always use the
same analyzers in the query and index sections to get consistent results.
Does this mean that the tokenizer in the query section will always be
ignored by the given query parsers?



2012/12/17 Jack Krupansky <j...@basetechnology.com>

The query parsers normally tokenize on white space and query operators,
but you can escape any white space with a backslash, or put the text in quotes; it will then be tokenized by the analyzer rather than the query parser.
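For example, the whole category value can be quoted so the query parser hands it to the analyzer as one string. A small sketch of building such a filter query; the helper name is hypothetical, not a Solr API:

```python
# Hypothetical helper (not a Solr API): quote a field value so the query
# parser passes it to the analyzer as a single string instead of splitting
# it on whitespace. Embedded backslashes and quotes are escaped first.

def quote_for_solr(value):
    escaped = value.replace('\\', '\\\\').replace('"', '\\"')
    return '"' + escaped + '"'

fq = ('{!q.op=AND df=cl2Categories_NACE}cl2Categories_NACE:'
      + quote_for_solr('08 Gewinnung von Steinen und Erden, sonstiger Bergbau'))
print(fq)
```

The resulting fq keeps the value as one chunk, so the field's "###" pattern tokenizer, rather than the query parser's whitespace splitting, decides the term boundaries.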

Also, you have:

<analyzer type="search">

Change "search" to "query", but that won't change your problem since Solr
defaults to using the "index" analyzer if it doesn't "see" a "query"
analyzer.
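In other words, the query-side analyzer block in the schema would become (same tokenizer and filter, only the type attribute changes):

```xml
<analyzer type="query">
  <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```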

-- Jack Krupansky

-----Original Message----- From: Dirk Högemann
Sent: Monday, December 17, 2012 5:59 AM
To: solr-user@lucene.apache.org
Subject: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at
whitespace?


Hi,

I am not sure if I am missing something, or maybe I do not exactly understand
the index/search analyzer definitions and how they are applied.

I have a field definition like this:


   <fieldType name="cl2tokenized_string" class="solr.TextField"
              sortMissingLast="true" omitNorms="true">
     <analyzer type="index">
       <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
     <analyzer type="search">
       <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>

Any field whose name starts with cl2 should be recognized as being of type
cl2tokenized_string:

<dynamicField name="cl2*" type="cl2tokenized_string" indexed="true"
              stored="true" />

When I search for such a token, the query is tokenized at whitespace:

<arr name="filter_queries"><str>{!q.op=AND
df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von Steinen und
Erden, sonstiger Bergbau</str></arr>
<arr name="parsed_filter_queries"><str>+cl2Categories_NACE:08
+cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
+cl2Categories_NACE:steinen +cl2Categories_NACE:und
+cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
+cl2Categories_NACE:bergbau</str></arr>

I expected the query parser to tokenize ONLY at the pattern ###,
instead of using a whitespace tokenizer here.
Is it possible to define a filter query, without using phrases, that achieves
the desired behavior?
Or are local parameters not the way to go here?

Best
Dirk
