Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

Dirk Högemann Mon, 17 Dec 2012 10:37:27 -0800

Ah - now I got it. My solution to this was to use phrase queries - now I
know why: Thanks!
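
For the archive, the two workarounds discussed in this thread (escaping the whitespace, or sending the value as a phrase) can be sketched roughly as below. This is only a minimal illustration in Python, not part of any Solr client library: the helper names escape_term and as_phrase are made up, and the set of escaped characters is an assumption based on the usual Lucene/Solr query syntax characters, so check it against your parser and version.

```python
# Characters escaped by this sketch. Assumption: this mirrors the usual
# Lucene/Solr query-syntax characters; adjust for your parser and version.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_term(value):
    """Backslash-escape query syntax characters and whitespace so the
    query parser hands the whole value to the field's analyzer as one term."""
    out = []
    for ch in value:
        if ch in SPECIAL_CHARS or ch.isspace():
            out.append("\\")
        out.append(ch)
    return "".join(out)


def as_phrase(value):
    """Wrap the value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


# Hypothetical usage with the field and value from the original question.
value = "08 Gewinnung von Steinen und Erden, sonstiger Bergbau"
escaped_fq = "cl2Categories_NACE:" + escape_term(value)
phrase_fq = "cl2Categories_NACE:" + as_phrase(value)
print(escaped_fq)
print(phrase_fq)
```

Either string would be sent as the fq value. In both cases the query parser no longer splits at whitespace, so the PatternTokenizer and the lowercase filter should see the complete value.
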
2012/12/17 Jack Krupansky <j...@basetechnology.com>


> No, the "query" analyzer tokenizer will simply be applied to each term or
> quoted string AFTER the query parser has already parsed it. You may have
> escaped or quoted characters which will then be seen by the analyzer
> tokenizer.
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Dirk Högemann
> Sent: Monday, December 17, 2012 11:01 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always
> at whitespace?
>
>
> OK, right, I changed that. Nevertheless, I thought I should always use the
> same analyzers for the query and the index section to have consistent
> results.
> Does this mean that the tokenizer in the query section will always be
> ignored by the given query parsers?
>
>
>
> 2012/12/17 Jack Krupansky <j...@basetechnology.com>
>
>> The query parsers normally tokenize on white space and query operators,
>> but you can escape any white space with a backslash or put the text in
>> quotes, and then it will be tokenized by the analyzer rather than the
>> query parser.
>>
>> Also, you have:
>>
>> <analyzer type="search">
>>
>> Change "search" to "query", but that won't change your problem since Solr
>> defaults to using the "index" analyzer if it doesn't "see" a "query"
>> analyzer.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Dirk Högemann
>> Sent: Monday, December 17, 2012 5:59 AM
>> To: solr-user@lucene.apache.org
>> Subject: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at
>> whitespace?
>>
>>
>> Hi,
>>
>> I am not sure if I am missing something, or maybe I do not exactly
>> understand the index/search analyzer definition and their execution.
>>
>> I have a field definition like this:
>>
>>
>>    <fieldType name="cl2tokenized_string" class="solr.TextField"
>>               sortMissingLast="true" omitNorms="true">
>>      <analyzer type="index">
>>        <tokenizer class="solr.PatternTokenizerFactory" pattern="###"
>>                   group="-1"/>
>>        <filter class="solr.LowerCaseFilterFactory"/>
>>      </analyzer>
>>      <analyzer type="search">
>>        <tokenizer class="solr.PatternTokenizerFactory" pattern="###"
>>                   group="-1"/>
>>        <filter class="solr.LowerCaseFilterFactory"/>
>>      </analyzer>
>>    </fieldType>
>>
>> Any field starting with cl2 should be recognized as being of type
>> cl2tokenized_string:
>> <dynamicField name="cl2*" type="cl2tokenized_string" indexed="true"
>> stored="true" />
>>
>> When I search for such a token, the query is tokenized at whitespace:
>>
>> <arr name="filter_queries"><str>{!q.op=AND
>> df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von Steinen und
>> Erden, sonstiger Bergbau</str></arr>
>> <arr name="parsed_filter_queries"><str>+cl2Categories_NACE:08
>> +cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
>> +cl2Categories_NACE:steinen +cl2Categories_NACE:und
>> +cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
>> +cl2Categories_NACE:bergbau</str></arr>
>>
>>
>> I expected that the query parser would also tokenize ONLY at the pattern
>> ###, instead of using a whitespace tokenizer here.
>> Is it possible to define a filter query, without using phrases, to achieve
>> the desired behavior?
>> Maybe local parameters are not the way to go here?
>>
>> Best
>> Dirk
>>
>>
>
