From: Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
Subject: Re: query synonym expansion howto?
To: solr-user@lucene.apache.org
Date: Thursday, October 6, 2011, 4:41 PM
OK, I have changed my synonyms_test.txt to:
philosophie, philosophy, filosofia
So there are no multi-word synonyms any more, but it is still not working.
Also, even with qs=0 the parsed query still shows a slop.
search for "philosophie" --> 13 hits
search for "philosophy" --> 21 hits
search for "filosofia" --> 51 hits
search for "philosophy" with synonym expansion --> 0 hits.
<str name="q">textth:philosophy</str>
</lst>
</lst>
<result name="response" numFound="0" start="0" maxScore="0.0"/>
<lst name="debug">
<str name="rawquerystring">textth:philosophy</str>
<str name="querystring">textth:philosophy</str>
<str name="parsedquery">
+((textth:philosophie textth:philosophy textth:filosofia)~3)
</str>
<str name="parsedquery_toString">
+((textth:philosophie textth:philosophy textth:filosofia)~3)
</str>
<lst name="explain"/>
<str name="QParser">ExtendedDismaxQParser</str>
org.apache.solr.analysis.SynonymFilterFactory {tokenizerFactory=solr.WhitespaceTokenizerFactory, synonyms=synonyms_test.txt, expand=true, format=solr, ignoreCase=true, luceneMatchVersion=LUCENE_35}

position     1
term text    philosophie   philosophy   filosofia
type         SYNONYM       SYNONYM      SYNONYM
startOffset  0             0            0
endOffset    10            10           10
Very strange.
Anything else to try?
Regards
Bernd
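
A possible reading of the ~3 in the parsed query above: as far as I know, when Lucene prints a BooleanQuery, a ~N after a parenthesized group of optional clauses is the minimum number of those clauses that must match, whereas a phrase slop is printed after a quoted phrase. If that applies here, all three synonyms would have to occur in the same document. The Python sketch below is only a toy model of that behaviour, not Solr code; the three documents and the function names are invented for illustration.

```python
# Toy model of optional (SHOULD) clauses with a minimum-should-match
# threshold. This is not Solr code; the documents below are invented.

def matches(doc_terms, clauses, min_should_match):
    """Return True if at least min_should_match clauses occur in the document."""
    hits = sum(1 for term in clauses if term in doc_terms)
    return hits >= min_should_match


def search(docs, clauses, min_should_match):
    """Return the ids of all documents that satisfy the threshold."""
    return [doc_id for doc_id, terms in docs.items()
            if matches(terms, clauses, min_should_match)]


# Hypothetical mini-index: each document uses only one language variant.
DOCS = {
    1: {"philosophie", "der", "antike"},
    2: {"philosophy", "of", "science"},
    3: {"filosofia", "moderna"},
}
SYNONYMS = ["philosophie", "philosophy", "filosofia"]

# Behaves like (a b c)~3: every synonym must be present in one document.
all_required = search(DOCS, SYNONYMS, min_should_match=3)
# Behaves like a plain OR: one synonym is enough.
any_required = search(DOCS, SYNONYMS, min_should_match=1)

print(all_required)
print(any_required)
```

If this is what is happening, the dismax mm (minimum should match) parameter would be the thing to experiment with, rather than qs.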
On 06.10.2011 13:58, Ahmet Arslan wrote:
Query-time synonym expansion has problems with multi-word synonyms.
The query parser splits the query string on whitespace before it reaches the analysis chain.
This is a known limitation, explained here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
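
To make the limitation concrete, here is a small Python sketch. It is a toy model and not how Solr is implemented: the synonym table is taken from the synonyms_test.txt quoted in this thread, and the two function names are made up. It only shows that a lookup done per whitespace-separated token can never see the two-word key.

```python
# Toy illustration, not Solr internals: a synonym lookup applied to each
# whitespace-separated token versus to the whole query string.

SYNONYMS = {
    "adult education": [
        "erwachsenenbildung",
        "educación de adultos",
        "éducation des adultes",
    ],
}


def expand_per_token(query):
    """Split on whitespace first, then look up each token on its own."""
    return [[token] + SYNONYMS.get(token, []) for token in query.split()]


def expand_whole(query):
    """Look up the complete query string as one unit."""
    return [query] + SYNONYMS.get(query, [])


per_token = expand_per_token("adult education")
whole = expand_whole("adult education")

print(per_token)  # neither "adult" nor "education" is a key, so nothing is added
print(whole)      # the two-word key is found and its synonyms are added
```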
But I think using synonyms at index time has its problems as well. E.g., you need to re-index whenever you add, remove, or edit entries in the synonym list. For some systems re-indexing takes a lot of time.
I am wondering whether a "query expansion module" that injects synonyms into the initial query string (before the analysis chain) would make sense.
E.g., if the query string contains 'adult education', it would add the phrase "educación de adultos" as an injected optional clause.
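
A rough Python sketch of what such a module could look like, run on the client side before the query is sent. Everything here is hypothetical: the function name, the synonym groups, and the choice to join the injected phrases with OR are assumptions, and the substring test is deliberately naive (it would also match inside longer words).

```python
# Hypothetical pre-analysis query expansion: append known synonyms as
# quoted, optional (OR) clauses. A sketch, not an existing Solr component.

SYNONYM_GROUPS = [
    ["erwachsenenbildung", "adult education",
     "educación de adultos", "éducation des adultes"],
]


def inject_synonyms(query, synonym_groups):
    """Return the query with the other members of any matching group OR-ed on."""
    lowered = query.lower()
    extra = []
    for group in synonym_groups:
        # Naive substring test; a real module would need proper tokenization.
        present = [phrase for phrase in group if phrase in lowered]
        if present:
            extra.extend('"%s"' % phrase for phrase in group
                         if phrase not in present)
    if not extra:
        return query
    return "(%s) OR %s" % (query, " OR ".join(extra))


expanded = inject_synonyms("adult education", SYNONYM_GROUPS)
print(expanded)
```

Because each injected synonym is a quoted phrase, the query parser would receive it as one unit instead of splitting it on whitespace.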
About the query slop: since you are using the (e)dismax query parser, it is controlled via the qs parameter.
http://wiki.apache.org/solr/DisMaxQParserPlugin#qs_.28Query_Phrase_Slop.29
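
For reference, a minimal Python sketch of building such a request with qs set explicitly and debug output switched on. The host, port, and handler path are placeholders that I am assuming, and the helper name is made up; q, defType, qs, and debugQuery are the request parameters being set.

```python
from urllib.parse import urlencode


def build_solr_url(base_url, query, phrase_slop=0):
    """Build a select URL for the edismax parser with an explicit qs value."""
    params = {
        "q": query,
        "defType": "edismax",
        "qs": phrase_slop,     # query phrase slop, as discussed above
        "debugQuery": "on",    # include the parsed query in the response
    }
    return base_url + "?" + urlencode(params)


# Placeholder base URL; adjust host, port, and path to the actual instance.
url = build_solr_url("http://localhost:8983/solr/select",
                     'textth:"adult education"')
print(url)
```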
Has anyone managed to get query-time synonym expansion working?
Synonym expansion itself is working, but I get no search results.
synonyms_test.txt
erwachsenenbildung, adult education, educación de adultos, éducation des adultes
search for "erwachsenenbildung" --> 8 hits
search for "adult education" --> 13 hits
search for "educación de adultos" --> 3 hits
search for "adult education" with synonym expansion --> 0 hits.
RESULT:
-------
<str name="q">textth:"adult education"</str>
<str name="q.op">OR</str>
<result name="response" numFound="0" start="0" maxScore="0.0"/>
<lst name="debug">
<str name="rawquerystring">textth:"adult education"</str>
<str name="querystring">textth:"adult education"</str>
<str name="parsedquery">
+((textth:erwachsenenbildung textth:adult education textth:educación de adultos textth:éducation des adultes)~4)
</str>
<str name="parsedquery_toString">
+((textth:erwachsenenbildung textth:adult education textth:educación de adultos textth:éducation des adultes)~4)
</str>
<lst name="explain"/>
<str name="QParser">ExtendedDismaxQParser</str>
Can it be that the "q.op=OR" parameter is ignored?
Why is a slop of ~4 added to the parsedquery?
Regards,
Bernd
--
*************************************************************
Bernd Fehling                     Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)                Universitätsstr. 25
Tel. +49 521 106-4060             Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de    33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************