Re: Edismax query using different strings for different fields

David Zimmermann Sun, 07 Jun 2020 06:03:19 -0700

Thanks for the support, Erick. Not using the “qf” parameter at all seems to give
me valid query results now. The query debug information:

"debug":{
  "rawquerystring":"claims_en:(An English sentence) description_en:(An English sentence) claims_de:(Ein Deutscher Satz) description_de:(Ein Deutscher Satz)",
  "querystring":"claims_en:(An English sentence) description_en:(An English sentence) claims_de:(Ein Deutscher Satz) description_de:(Ein Deutscher Satz)",
  "parsedquery":"+((claims_en:english claims_en:sentenc) (description_en:english description_en:sentenc) (claims_de:deutsch claims_de:satz) (description_de:deutsch description_de:satz))",
  "parsedquery_toString":"+((claims_en:english claims_en:sentenc) (description_en:english description_en:sentenc) (claims_de:deutsch claims_de:satz) (description_de:deutsch description_de:satz))"

But this way it seems the “tie” parameter no longer has any impact. Wanting
something between a sum and a max query was the original reason I intended to
use an edismax query. Also, since my queries are full sentences, I thought it
would be a good idea to use the phrase query feature at a later stage.
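
For reference, the "between a sum and a max" behaviour that “tie” controls can be sketched in a few lines of Python. This only illustrates the disjunction-max formula (best field score plus tie times the remaining field scores); it is not Solr code, and the function name and sample scores are made up. The parsed query above appears to be a plain boolean query over fielded clauses with no disjunction-max clause in it, which would explain why “tie” has nothing to act on.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores the way a disjunction-max query does.

    tie=0.0 gives a pure max, tie=1.0 gives a pure sum,
    and values in between blend the two.
    """
    if not field_scores:
        return 0.0
    best = max(field_scores)
    # The best field counts fully; every other field is scaled by tie.
    return best + tie * (sum(field_scores) - best)


# Hypothetical scores for one document: part1 scored 3.0, part2 scored 1.0.
scores = [3.0, 1.0]
print(dismax_score(scores, tie=0.0))  # pure max
print(dismax_score(scores, tie=0.5))  # in between
print(dismax_score(scores, tie=1.0))  # pure sum
```

With a tie greater than 0, a document matching in both parts scores higher than one matching in only one part, which is the ranking behaviour asked for in the original question.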

If the edismax query is not the way to achieve my goal, do you see a proper way
to do this? The only alternative I see is running two separate edismax queries,
one for the English fields and one for the German fields, and then recombining
the results. But that way I don’t know whether the resulting scores are
comparable. Can I assume a score of 15 from the English edismax is better than
a score of 13 from the German edismax?
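
One possible way to keep both languages in a single request, instead of running two queries and merging them by hand, is to nest one edismax sub-query per language inside a standard (lucene) query, using Solr's `_query_` nested-query syntax with local params and parameter dereferencing. The sketch below only builds the request parameters; the helper name, the boosts of 2, and the tie value of 0.3 are assumptions chosen for illustration and have not been tested against this index. As far as I understand, the two sub-query scores are then added at the top level, and whether an English sub-score is comparable to a German one still depends on the analyzers and per-field statistics.

```python
def nested_edismax_params(q_en, q_de, tie=0.3):
    """Build Solr request parameters with one edismax sub-query per language.

    Each sub-query gets its own qf and its own query string, so the English
    text is only searched in English fields and the German text in German ones.
    """
    return {
        # The outer parser is the standard one; edismax runs inside each clause.
        "defType": "lucene",
        "q": (
            f'_query_:"{{!edismax qf=$qf_en tie={tie} v=$q_en}}" '
            f'_query_:"{{!edismax qf=$qf_de tie={tie} v=$q_de}}"'
        ),
        # Hypothetical boosts: claims weighted twice as much as description.
        "qf_en": "claims_en^2 description_en",
        "qf_de": "claims_de^2 description_de",
        # The raw sentences are passed as separate parameters and dereferenced.
        "q_en": q_en,
        "q_de": q_de,
    }


params = nested_edismax_params("An English sentence", "Ein Deutscher Satz")
print(params["q"])
```

Adding debug=query to such a request should show whether each sub-query was parsed into a disjunction-max query over its own fields.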

Best regards
David

On 5 Jun 2020, at 19:39, Erick Erickson <erickerick...@gmail.com> wrote:

Let’s see the results of adding &debug=query to the query, in particular the
parsed version.
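
A request with debug=query added can be built as in the minimal sketch below. The host, port, and the core name "patents" are placeholders, not values from this thread; only the `q`, `defType`, and `debug` parameters come from the discussion.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, q, extra=None):
    """Return a Solr /select URL that asks for the parsed-query debug output."""
    params = {"q": q, "defType": "edismax", "debug": "query"}
    if extra:
        params.update(extra)
    # urlencode takes care of escaping colons, parentheses, and spaces.
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core URL; replace with the real host and core name.
url = build_debug_url(
    "http://localhost:8983/solr/patents",
    "claims_en:(An English sentence)",
)
print(url)
```

The "parsedquery" entry of the response's debug section is the part to look at.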

Because what you’re reporting doesn’t really make sense. edismax should be
totally ignoring the “qf” parameter since you’re specifically qualifying all
the clauses with a field. Unless you’re not really enclosing the search text in
parentheses (or quotes if they should be phrases).

Also, if you’re willing to form separate clauses like this, there's no reason
to even use edismax, since its purpose is to automatically distribute search
terms over multiple fields and you’re explicitly specifying the fields.

Best,
Erick

On Jun 5, 2020, at 10:10 AM, David Zimmermann <david.zimmerm...@usi.ch> wrote:

I could use some advice on how to handle a particular cross-language search
with Solr. I posted it on Stack Overflow two months ago, but could not find a
solution.

I have documents in 3 languages (English, German, French). For simplicity let's
assume it's just two languages (English and German). The documents are
standardised in the sense that they contain the same parts (text_part1 and
text_part2); only the language they are written in differs. The language
of each document is known. In my index schema I use one core with different
fields for each language.

For a German document the index will look something like this:

* text_part1_en: empty
* text_part2_en: empty
* text_part1_de: German text
* text_part2_de: Another German text

For an English document it will be the other way around.

What I want to achieve: A user entering a query in English should receive both
English and German documents that are relevant to his search. Further
conditions are:

* I want results with hits in text_part1 and text_part2 to be higher ranked
than results with hits only in one field (tie value > 0).
* The queries will not be single words, but full sentences (stop word removal
needed and partial hits [only a few words out of the sentences] must be valid).
* English and German documents must end up in one ranking. I need to be
able to compare the relevance of an English document to the relevance of a
German document.
* The text parts need to stay separate, because I want to boost the importance
of one part (let's say part1) over the other.

My general approach so far has been to get a German translation of the user's
query by sending it to a translation API. Then I want to use an edismax query,
since it seems to fulfill all of my requirements. The problem is that I cannot
manage to search for the German query in the German fields and the English
query in the English fields only. The Solr edismax
documentation<https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html>
states that it supports the full Lucene query parser syntax, but I can't find
a way to address different fields with different inputs. I tried:

q=text_part1_en: (A sentence in English) text_part1_de: (Ein Satz auf Deutsch)
text_part2_en: (A sentence in English) text_part2_de: (Ein Satz auf Deutsch)
qf=text_part1_en text_part2_en text_part1_de text_part2_de

This syntax should be in line with what MatsLindh wrote in this
thread<https://stackoverflow.com/questions/53371028/different-search-term-on-different-fields-using-edismax-query-parser-in-solr>.
I tried different versions of writing this q, but whatever I do Solr always
searches for the full q string in all four fields given by qf, which totally
messes up the result. Am I just making mistakes in the query syntax, or is it
not possible at all to do what I'm trying to do using edismax?
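
For comparison with the q attempted above, the sketch below assembles the same kind of query string with each clause written as field:(text), with no space between the colon and the opening parenthesis, which is the form used in the query reported as working at the top of this thread. Whether that space was the actual cause of the problem is not established here. The helper names are hypothetical, and the escaping covers the usual Lucene query syntax characters as an assumption about what user-entered sentences might contain.

```python
import re

# Characters and operators that have a meaning in the Lucene query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_query_text(text):
    """Backslash-escape query syntax characters in user-entered text."""
    return _SPECIAL.sub(r"\\\1", text)


def build_cross_language_q(queries, parts):
    """Build one fielded clause per (language, part) pair.

    queries maps a language suffix (e.g. "en") to the sentence in that
    language; parts lists the field name prefixes (e.g. "text_part1").
    """
    clauses = []
    for lang, text in queries.items():
        for part in parts:
            clauses.append(f"{part}_{lang}:({escape_query_text(text)})")
    return " ".join(clauses)


q = build_cross_language_q(
    {"en": "A sentence in English", "de": "Ein Satz auf Deutsch"},
    ["text_part1", "text_part2"],
)
print(q)
```

Running the result with debug=query shows whether each sentence ends up only in the fields of its own language.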

Any help would be highly appreciated.
