If it is important enough for you, you could expand multi-word and compound
word synonyms as a preprocessing step and generate an "OR" expression in the
query.
-- Jack Krupansky
-----Original Message-----
From: Chung Wu
Sent: Monday, May 14, 2012 8:25 PM
To: solr-user@lucene.apache.org
Subject: Re: Unexpected query rewrite from WordDelimiterFilterFactory and
SynonymFilterFactory
Thanks Jack! It's too bad I can't have catenate and generateParts both set
to "1" at query time. If I set catenate to "0", then I miss the case where
"wifi" is indexed but "wi-fi" is queried. If I set generateParts to "0",
then I miss the case where "wi fi" is queried but "wi-fi" is indexed. I
guess I'll just have to pick one!
Chung
On Mon, May 14, 2012 at 4:50 PM, Jack Krupansky
<j...@basetechnology.com> wrote:
The extra terms are okay at index time - they simply overlap the base
words and make composite terms more searchable, but you need to have a
separate query analyzer that sets the various catenate options to "0", since
the query generator doesn't know what to do with the extra terms. Synonyms
are a little more tricky - the simplest thing is to disable them in the
index analyzer and do them only in the query analyzer - and multi-term
synonyms don't work well, except for replacement synonyms at index time.
See the "text_en_splitting" field type in the example schema.
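
A minimal sketch of the split-analyzer setup described above, adapted from the "testType" field type quoted later in this thread rather than copied from the example schema. It assumes the same tokenizer and filter as that field type; the index analyzer keeps the catenate options at "1", while the query analyzer sets them to "0" so the query parser only sees the split parts:

```xml
<fieldType name="testType" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <!-- Index time: emit the parts and the catenated form ("wi", "fi", "wifi") -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
  </analyzer>
  <!-- Query time: emit only the parts, so no overlapping terms reach the query parser -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
  </analyzer>
</fieldType>
```

With this arrangement, a query for "wi-fi" should parse as the plain phrase "wi fi", and a query for "wifi" can still match a document containing "wi-fi" through the catenated index term. A document containing only "wifi" would not match the query "wi-fi", since the query side no longer produces the catenated form.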
-- Jack Krupansky
-----Original Message-----
From: Chung Wu
Sent: Monday, May 14, 2012 7:01 PM
To: solr-user@lucene.apache.org
Subject: Unexpected query rewrite from WordDelimiterFilterFactory and
SynonymFilterFactory
Hi all!
I'm using Solr 3.6, and I'm seeing unexpected query rewriting when using
either WordDelimiterFilterFactory with catenateWords="1" or
SynonymFilterFactory with multi-word synonyms.
For example, in this type where a WordDelimiterFilterFactory is used for
the query analyzer, with catenateWords="1":
<fieldType name="testType" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
  </analyzer>
</fieldType>
For the query "wi-fi", the term positions after the
WordDelimiterFilterFactory look like this:

term text   position   startOffset   endOffset   type
wi          1          0             2           word
fi          2          3             5           word
wifi        2          0             5           word

And looking at debug output, the parsed query looks like this, which is
surprising:
<str name="rawquerystring">test1:"wi-fi"</str>
<str name="querystring">test1:"wi-fi"</str>
<str name="parsedquery">MultiPhraseQuery(test1:"wi (fi wifi)")</str>
<str name="parsedquery_toString">test1:"wi (fi wifi)"</str>
I see similar things happening if I use SynonymFilterFactory with
multi-word synonyms (maybe related to this bug:
https://issues.apache.org/jira/browse/SOLR-3390). I originally asked about
it here:
http://stackoverflow.com/questions/10218224/in-solr-expanding-multi-word-synonyms-and-term-positions
Any ideas on what I'm supposed to do to make this work as expected?
Thanks!
Chung