Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

Dirk Högemann Mon, 17 Dec 2012 08:02:48 -0800

OK, right, I changed that. Nevertheless, I thought I should always use the
same analyzers in the query and index sections to get consistent
results.
Does this mean that the tokenizer in the query section will always be
ignored by the given query parsers?
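
A minimal sketch of what the changed definition might look like, assuming
only the type attribute of the second analyzer is renamed from "search" to
"query" as suggested in the quoted reply below. Everything else is copied
from the original definition; this has not been verified against a running
Solr 3.5 instance.

```xml
<!-- Sketch only: the field type from the original post with type="query". -->
<fieldType name="cl2tokenized_string" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <!-- Applied when documents are indexed: split only at "###". -->
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Applied to each chunk of text the query parser hands to the analyzer. -->
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Since both analyzers are identical here, a single <analyzer> element without
a type attribute should behave the same way.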




2012/12/17 Jack Krupansky <[email protected]>

> The query parsers normally tokenize on white space and query operators,
> but you can escape any white space with backslash or put the text in quotes
> and then it will be tokenized by the analyzer rather than the query parser.
>
> Also, you have:
>
> <analyzer type="search">
>
> Change "search" to "query", but that won't change your problem since Solr
> defaults to using the "index" analyzer if it doesn't "see" a "query"
> analyzer.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Dirk Högemann
> Sent: Monday, December 17, 2012 5:59 AM
> To: [email protected]
> Subject: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at
> whitespace?
>
>
> Hi,
>
> I am not sure if I am missing something, or maybe I do not exactly understand
> the index/search analyzer definitions and their execution.
>
> I have a field definition like this:
>
>
>    <fieldType name="cl2tokenized_string" class="solr.TextField"
>               sortMissingLast="true" omitNorms="true">
>      <analyzer type="index">
>        <tokenizer class="solr.PatternTokenizerFactory" pattern="###"
>                   group="-1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>      </analyzer>
>      <analyzer type="search">
>        <tokenizer class="solr.PatternTokenizerFactory" pattern="###"
>                   group="-1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>      </analyzer>
>    </fieldType>
>
> Any field starting with cl2 should be recognized as being of type
> cl2tokenized_string:
> <dynamicField name="cl2*" type="cl2tokenized_string" indexed="true"
>               stored="true" />
>
> When I search for a token in that sense, the query is tokenized at
> whitespace:
>
> <arr name="filter_queries"><str>{!q.op=AND
> df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von Steinen und
> Erden, sonstiger Bergbau</str></arr>
> <arr name="parsed_filter_queries"><str>+cl2Categories_NACE:08
> +cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
> +cl2Categories_NACE:steinen +cl2Categories_NACE:und
> +cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
> +cl2Categories_NACE:bergbau</str></arr>
>
> I expected the query parser to tokenize ONLY at the pattern ### as well,
> instead of using a whitespace tokenizer here.
> Is it possible to define a filter query, without using phrases, to achieve
> the desired behavior?
> Maybe local parameters are not the way to go here?
>
> Best
> Dirk
>
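
To illustrate the two options named in the reply above (escaping whitespace
with a backslash, or putting the text in quotes), the filter query from the
original post could be rewritten roughly as follows. This is a sketch in the
same debug-echo format as the original post, not output captured from a Solr
instance, and the resulting parsed query is deliberately not shown.

```xml
<arr name="filter_queries">
  <!-- Option 1: backslash-escape every space so the query parser does not
       split the value and passes it to the field's analyzer as one chunk. -->
  <str>{!q.op=AND df=cl2Categories_NACE}cl2Categories_NACE:08\ Gewinnung\ von\ Steinen\ und\ Erden,\ sonstiger\ Bergbau</str>
  <!-- Option 2: put the value in double quotes; the quoted text is also
       tokenized by the analyzer rather than by the query parser. -->
  <str>{!q.op=AND df=cl2Categories_NACE}cl2Categories_NACE:"08 Gewinnung von Steinen und Erden, sonstiger Bergbau"</str>
</arr>
```

In both cases the PatternTokenizer only splits at "###", so a value without
that pattern should reach the index lookup as a single lowercased token.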
