Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

Dirk Högemann Mon, 17 Dec 2012 02:59:49 -0800

Hi,

I am not sure if am missing something, or maybe I do not exactly understand
the index/search analyzer definition and their execution.


I have a field definition like this:


    <fieldType name="cl2tokenized_string" class="solr.TextField"
sortMissingLast="true" omitNorms="true">
      <analyzer type="index">
        <tokenizer class="solr.PatternTokenizerFactory" pattern="###"
group="-1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="search">
        <tokenizer class="solr.PatternTokenizerFactory" pattern="###"
group="-1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

Any field starting with cl2 should be recognized as being of type
cl2Tokenized_string:
<dynamicField name="cl2*" type="cl2tokenized_string" indexed="true"
stored="true" />

When I try to search for a token in that sense the query is tokenized at
whitespaces:

<arr name="filter_queries"><str>{!q.op=AND
df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von Steinen und
Erden, sonstiger Bergbau</str></arr><arr
name="parsed_filter_queries"><str>+cl2Categories_NACE:08
+cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
+cl2Categories_NACE:steinen +cl2Categories_NACE:und
+cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
+cl2Categories_NACE:bergbau</str></arr>

I expected the query parser would also tokenize ONLY at the pattern ###,
instead of using a white space tokenizer here?
Is is possible to define a filter query, without using phrases, to achieve
the desired behavior?
Maybe local parameters are not the way to go here?

Best
Dirk

Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

Reply via email to