Thanks Jack! It's too bad I can't have catenateWords and generateWordParts both set to "1" at query time. If I set catenateWords to "0", then I miss the case where "wifi" is indexed but "wi-fi" is queried. If I set generateWordParts to "0", then I miss the case where "wi fi" is queried but "wi-fi" is indexed. I guess I'll just have to pick one!
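
For anyone who finds this thread later, here is a rough sketch of the split setup Jack describes, as I understand it: catenation stays on in the index analyzer and is switched off in the query analyzer. The field type name "testType" and the remaining attribute values are carried over from my original message below. I have not tested this exact config, so treat it as an assumption rather than a verified fix.

```xml
<fieldType name="testType" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <!-- Index time: keep the catenated term, so "wi-fi" is indexed as wi, fi and wifi. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
  </analyzer>
  <!-- Query time: catenate options set to "0", so the query "wi-fi" yields only wi and fi. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="0"
        catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
  </analyzer>
</fieldType>
```

With this, a query for "wi-fi" should turn into the plain phrase "wi fi", which matches documents containing "wi-fi" or "wi fi" but not documents that only contain "wifi". That is the tradeoff I mentioned above.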

Chung

On Mon, May 14, 2012 at 4:50 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> The extra terms are okay at index time - they simply overlap the base
> words and make composite terms more searchable, but you need to have a
> separate query analyzer that sets the various catenate options to "0" since
> the query generator doesn't know what to do with the extra terms. Synonyms
> are a little more tricky - the simplest thing is to disable them in the
> index analyzer and do them only in the query analyzer - and multi-term
> synonyms don't work well, except for replacement synonyms at index time.
>
> See the "text_en_splitting" field type in the example schema.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Chung Wu
> Sent: Monday, May 14, 2012 7:01 PM
> To: solr-user@lucene.apache.org
> Subject: Unexpected query rewrite from WordDelimiterFilterFactory and
> SynonymFilterFactory
>
> Hi all!
>
> I'm using Solr 3.6, and I'm seeing unexpected query rewriting when either
> using WordDelimiterFilterFactory with catenateWords="1", or with
> SynonymFilterFactory with multi-word synonyms.
>
> For example, in this type where a WordDelimiterFilterFactory is used for
> the query analyzer, with catenateWords="1":
>
> <fieldType name="testType" class="solr.TextField"
>     positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>         generateWordParts="1" generateNumberParts="1" catenateWords="1"
>         catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
>   </analyzer>
> </fieldType>
>
> For the query "wi-fi", the term positions after the
> WordDelimiterFilterFactory look like this:
>
> term text  position  startOffset  endOffset  type
> wi         1         0            2          word
> fi         2         3            5          word
> wifi       2         0            5          word
>
> And looking at debug output, the parsed query looks like this, which is
> surprising:
>
> <str name="rawquerystring">test1:"wi-fi"</str>
> <str name="querystring">test1:"wi-fi"</str>
> <str name="parsedquery">MultiPhraseQuery(test1:"wi (fi wifi)")</str>
> <str name="parsedquery_toString">test1:"wi (fi wifi)"</str>
>
> I see similar things happening if I use SynonymFilterFactory with
> multi-word synonyms (maybe related to this bug:
> https://issues.apache.org/jira/browse/SOLR-3390;
> I originally asked about it here:
> http://stackoverflow.com/questions/10218224/in-solr-expanding-multi-word-synonyms-and-term-positions
> )
>
> Any ideas on what I'm supposed to do to make this work as expected?
>
> Thanks!
>
> Chung