No, the "query" analyzer's tokenizer is simply applied to each term or quoted string AFTER the query parser has already parsed the input. Any characters you escaped or quoted will then be seen by the analyzer's tokenizer.
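To make that order of operations concrete, here is a rough Python sketch (an editor's illustration, not Solr source code; the real query parser also handles operators, quotes, boosts, etc.): the parser first splits the raw query on unescaped whitespace, and only then runs each chunk through the field's query-time analyzer, here a "###" pattern tokenizer plus lowercasing as in the schema further down the thread.

```python
import re

# Illustrative sketch only (not Solr code): the query parser splits the raw
# query on unescaped whitespace BEFORE the field analyzer ever sees it.

def parser_split(query):
    # Split on whitespace that is not preceded by a backslash.
    chunks = re.split(r'(?<!\\)\s+', query)
    # Remove the escape characters before handing chunks to the analyzer.
    return [c.replace('\\ ', ' ') for c in chunks if c]

def analyze(chunk):
    # Query-time analyzer: tokenize on "###", then lowercase.
    return [t.lower() for t in chunk.split('###') if t]

def to_terms(query):
    return [term for chunk in parser_split(query) for term in analyze(chunk)]

print(to_terms('Gewinnung von Steinen'))      # three separate terms
print(to_terms('Gewinnung\\ von\\ Steinen'))  # one term: whitespace escaped
```

With unescaped whitespace the analyzer receives three separate chunks and can only ever produce three separate terms; escaping (or quoting) keeps the value intact so the "###" tokenizer actually gets a chance to apply.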

-- Jack Krupansky

-----Original Message----- From: Dirk Högemann
Sent: Monday, December 17, 2012 11:01 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

Ok, right, I changed that... Nevertheless, I thought I should always use the
same analyzers in the query and index sections to get consistent results.
Does this mean that the tokenizer in the query section will always be
ignored by the given query parsers?



2012/12/17 Jack Krupansky <j...@basetechnology.com>

The query parsers normally tokenize on white space and query operators,
but you can escape any white space with a backslash, or put the text in quotes; it will then be tokenized by the analyzer rather than the query parser.
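For example, the whole category value can be quoted so the query parser hands it to the analyzer as one string. A small sketch of building such a filter query; the helper name is hypothetical, not a Solr API:

```python
# Hypothetical helper (not a Solr API): quote a field value so the query
# parser passes it to the analyzer as a single string instead of splitting
# it on whitespace. Embedded backslashes and quotes are escaped first.

def quote_for_solr(value):
    escaped = value.replace('\\', '\\\\').replace('"', '\\"')
    return '"' + escaped + '"'

fq = ('{!q.op=AND df=cl2Categories_NACE}cl2Categories_NACE:'
      + quote_for_solr('08 Gewinnung von Steinen und Erden, sonstiger Bergbau'))
print(fq)
```

The resulting fq keeps the value as one chunk, so the field's "###" pattern tokenizer, rather than the query parser's whitespace splitting, decides the term boundaries.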

Also, you have:

<analyzer type="search">

Change "search" to "query", but that won't change your problem since Solr
defaults to using the "index" analyzer if it doesn't "see" a "query"
analyzer.
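In other words, the query-side analyzer block in the schema would become (same tokenizer and filter, only the type attribute changes):

```xml
<analyzer type="query">
  <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```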

-- Jack Krupansky

-----Original Message----- From: Dirk Högemann
Sent: Monday, December 17, 2012 5:59 AM
To: solr-user@lucene.apache.org
Subject: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at
whitespace?


Hi,

I am not sure if I am missing something, or maybe I do not exactly understand
the index/search analyzer definitions and how they are applied.

I have a field definition like this:


   <fieldType name="cl2tokenized_string" class="solr.TextField"
              sortMissingLast="true" omitNorms="true">
     <analyzer type="index">
       <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
     <analyzer type="search">
       <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>

Any field whose name starts with cl2 should be recognized as being of type
cl2tokenized_string:

<dynamicField name="cl2*" type="cl2tokenized_string" indexed="true"
              stored="true" />

When I search for such a token, the query is tokenized at whitespace:

<arr name="filter_queries"><str>{!q.op=AND
df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von Steinen und
Erden, sonstiger Bergbau</str></arr>
<arr name="parsed_filter_queries"><str>+cl2Categories_NACE:08
+cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
+cl2Categories_NACE:steinen +cl2Categories_NACE:und
+cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
+cl2Categories_NACE:bergbau</str></arr>

I expected the query parser to tokenize ONLY at the pattern ###,
instead of using a whitespace tokenizer here.
Is it possible to define a filter query, without using phrases, that achieves
the desired behavior?
Or are local parameters not the way to go here?

Best
Dirk
