Ah, now I get it. My solution to this was to use phrase queries, and now I know why that works. Thanks!

2012/12/17 Jack Krupansky <j...@basetechnology.com>

> No, the "query" analyzer tokenizer will simply be applied to each term or
> quoted string AFTER the query parser has already parsed it. You may have
> escaped or quoted characters which will then be seen by the analyzer
> tokenizer.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Dirk Högemann
> Sent: Monday, December 17, 2012 11:01 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always
> at whitespace?
>
> Ok, right, changed that... Nevertheless I thought I should always use the
> same analyzers for the query and the index section to have consistent
> results.
> Does this mean that the tokenizer in the query section will always be
> ignored by the given query parsers?
>
> 2012/12/17 Jack Krupansky <j...@basetechnology.com>
>
>> The query parsers normally tokenize on white space and query operators,
>> but you can escape any white space with backslash or put the text in
>> quotes and then it will be tokenized by the analyzer rather than the
>> query parser.
>>
>> Also, you have:
>>
>> <analyzer type="search">
>>
>> Change "search" to "query", but that won't change your problem since Solr
>> defaults to using the "index" analyzer if it doesn't "see" a "query"
>> analyzer.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Dirk Högemann
>> Sent: Monday, December 17, 2012 5:59 AM
>> To: solr-user@lucene.apache.org
>> Subject: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at
>> whitespace?
>>
>> Hi,
>>
>> I am not sure if I am missing something, or maybe I do not exactly
>> understand the index/search analyzer definition and their execution.
>>
>> I have a field definition like this:
>>
>> <fieldType name="cl2tokenized_string" class="solr.TextField"
>>     sortMissingLast="true" omitNorms="true">
>>   <analyzer type="index">
>>     <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="search">
>>     <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> Any field starting with cl2 should be recognized as being of type
>> cl2tokenized_string:
>>
>> <dynamicField name="cl2*" type="cl2tokenized_string" indexed="true"
>>     stored="true" />
>>
>> When I try to search for a token in that sense, the query is tokenized at
>> whitespace:
>>
>> <arr name="filter_queries"><str>{!q.op=AND
>> df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von Steinen und
>> Erden, sonstiger Bergbau</str></arr>
>> <arr name="parsed_filter_queries"><str>+cl2Categories_NACE:08
>> +cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
>> +cl2Categories_NACE:steinen +cl2Categories_NACE:und
>> +cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
>> +cl2Categories_NACE:bergbau</str></arr>
>>
>> I expected the query parser would also tokenize ONLY at the pattern ###,
>> instead of using a white space tokenizer here?
>> Is it possible to define a filter query, without using phrases, to achieve
>> the desired behavior?
>> Maybe local parameters are not the way to go here?
>>
>> Best
>> Dirk
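
For anyone finding this thread later: the two workarounds Jack describes (backslash-escaping the white space, or quoting the whole value) can be applied on the client side before the filter query is sent. Below is a minimal sketch in Python. The helper names `escape_term`, `term_filter` and `phrase_filter` are made up for illustration and are not part of Solr or any client library, and the set of escaped characters is an assumption modelled on the usual query-parser syntax characters, so check it against the parser you actually use.

```python
import re

# Assumption: these are the characters the standard query parser treats as
# syntax, plus white space. Escaping a character that is not special is
# harmless, because a backslash simply makes the next character literal.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\&|\s])')


def escape_term(value: str) -> str:
    """Backslash-escape syntax characters and white space in a raw value."""
    return _SPECIAL.sub(r'\\\1', value)


def term_filter(field: str, value: str) -> str:
    """Build field:value so the parser hands the whole value to the analyzer."""
    return f"{field}:{escape_term(value)}"


def phrase_filter(field: str, value: str) -> str:
    """Build field:"value"; only backslashes and quotes need escaping here."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


if __name__ == "__main__":
    category = "08 Gewinnung von Steinen und Erden, sonstiger Bergbau"
    print(term_filter("cl2Categories_NACE", category))
    print(phrase_filter("cl2Categories_NACE", category))
```

With the escaped form the query parser no longer splits at the blanks, so the field's analyzer receives the complete value and the PatternTokenizer only splits it at ###. The phrase form is the approach Dirk ended up with; which of the two reads better in a filter query is mostly a matter of taste.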