If it is important enough for you, you could expand multi-word and compound word synonyms as a preprocessing step and generate an "OR" expression in the query.
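For illustration only, here is what that might look like on the wire. Assume the application keeps its own hypothetical table mapping the user input "wi-fi" to the variants "wifi" and "wi fi". The expanded query could then appear in the echoed request parameters (for example with echoParams=explicit) as follows. The mapping and the variant list are assumptions, not something Solr does for you:

```xml
<lst name="params">
  <!-- Hypothetical client-side expansion of the user input "wi-fi":
       each variant becomes its own OR clause, so no two variants have to
       share a term position inside a single phrase query. -->
  <str name="q">test1:(wifi OR "wi fi")</str>
</lst>
```

Because each variant is a separate clause, the query parser never sees stacked terms from the filter chain for these variants, which avoids the MultiPhraseQuery rewrite discussed below.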

-- Jack Krupansky

-----Original Message-----
From: Chung Wu
Sent: Monday, May 14, 2012 8:25 PM
To: solr-user@lucene.apache.org
Subject: Re: Unexpected query rewrite from WordDelimiterFilterFactory and SynonymFilterFactory

Thanks Jack!  It's too bad I can't have catenateWords and generateWordParts
both set to "1" at query time.  If I set catenateWords to "0", then I miss
the case where "wifi" is indexed but "wi-fi" is queried.  If I set
generateWordParts to "0", then I miss the case where "wi fi" is queried but
"wi-fi" is indexed.  I guess I'll just have to pick one!

Chung

On Mon, May 14, 2012 at 4:50 PM, Jack Krupansky <j...@basetechnology.com> wrote:

The extra terms are okay at index time: they simply overlap the base
words and make composite terms more searchable. But you need a separate
query analyzer that sets the various catenate options to "0", since the
query generator doesn't know what to do with the extra terms. Synonyms
are a little trickier. The simplest approach is to disable them in the
index analyzer and apply them only in the query analyzer. Multi-term
synonyms don't work well, except for replacement synonyms at index time.

See the "text_en_splitting" field type in the example schema.
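Here is a minimal sketch of that advice, adapted from Chung's testType below rather than copied from the example schema. It uses separate index and query analyzers, sets the catenate options to "0" at query time, and applies SynonymFilterFactory only at query time. The synonyms file name (synonyms.txt) and the filter order are assumptions. Compare them against text_en_splitting in your own Solr 3.6 example schema, which also adds stop-word, lowercase and stemming filters.

```xml
<fieldType name="testType" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <!-- Index time: emit both the parts (wi, fi) and the catenated term
       (wifi); the extra terms just overlap the base words. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="0"/>
  </analyzer>
  <!-- Query time: synonyms are expanded here only (synonyms.txt is an
       assumed file name), and catenation is off so that "wi-fi" yields
       just the parts wi and fi, not a stacked "wifi" term. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="0"/>
  </analyzer>
</fieldType>
```

With this setup, the query "wi-fi" becomes the phrase "wi fi", which matches documents indexed with "wi-fi". As Chung notes in his reply, the trade-off is that a document indexed only as "wifi" will no longer match a query for "wi-fi".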

-- Jack Krupansky

-----Original Message-----
From: Chung Wu
Sent: Monday, May 14, 2012 7:01 PM
To: solr-user@lucene.apache.org
Subject: Unexpected query rewrite from WordDelimiterFilterFactory and SynonymFilterFactory


Hi all!

I'm using Solr 3.6, and I'm seeing unexpected query rewriting when using
either WordDelimiterFilterFactory with catenateWords="1" or
SynonymFilterFactory with multi-word synonyms.

For example, take this field type. Its single analyzer has no type
attribute, so it is used at both index and query time, and it includes
WordDelimiterFilterFactory with catenateWords="1":

  <fieldType name="testType" class="solr.TextField"
             positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1" catenateWords="1"
              catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    </analyzer>
  </fieldType>

For the query "wi-fi", the tokens after WordDelimiterFilterFactory look
like this (fi and wifi share position 2):

  term text    wi     fi     wifi
  position     1      2      2
  startOffset  0      3      0
  endOffset    2      5      5
  type         word   word   word


The debug output shows this parsed query, which is surprising:

<str name="rawquerystring">test1:"wi-fi"</str>
<str name="querystring">test1:"wi-fi"</str>
<str name="parsedquery">MultiPhraseQuery(test1:"wi (fi wifi)")</str>
<str name="parsedquery_toString">test1:"wi (fi wifi)"</str>


I see similar things happening if I use SynonymFilterFactory with
multi-word synonyms (maybe related to this bug:
https://issues.apache.org/jira/browse/SOLR-3390; I originally asked about
it here:
http://stackoverflow.com/questions/10218224/in-solr-expanding-multi-word-synonyms-and-term-positions)

Any ideas on what I'm supposed to do to make this work as expected?

Thanks!

Chung

