I admit to not fully understanding the examples, but the Complex Phrase
Query Parser looks like something worth at least reviewing:

https://lucene.apache.org/solr/guide/8_8/other-parsers.html#complex-phrase-query-parser

Also, I did not see any mention of trying copyField to process the same
content in different ways. If the copyField target is not stored, the
overhead is not as large.
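
To make the copyField idea a bit more concrete, here is a minimal sketch in
Python that only builds a Schema API payload and prints it; nothing is sent
to Solr. The field type names `text_synonyms` and `text_ngram` and the
`_syn` / `_sub` suffixes are hypothetical placeholders, and `title` is
borrowed from the `qf` in the quoted message.

```python
import json


def copy_field_commands(source, variants):
    """Build Schema API commands that index `source` again, once per variant.

    `variants` maps a field-name suffix to a field type name. Both are
    assumptions for illustration; use the types defined in your own schema.
    """
    add_fields = []
    copy_dests = []
    for suffix, field_type in variants.items():
        dest = f"{source}_{suffix}"
        add_fields.append({
            "name": dest,
            "type": field_type,
            "indexed": True,
            "stored": False,  # searchable copy only, so no stored-field overhead
        })
        copy_dests.append(dest)
    return {
        "add-field": add_fields,
        "add-copy-field": {"source": source, "dest": copy_dests},
    }


# Hypothetical field types: one analysed for synonyms, one for substrings.
payload = copy_field_commands(
    "title", {"syn": "text_synonyms", "sub": "text_ngram"}
)
print(json.dumps(payload, indent=2))
```

The resulting JSON would be POSTed to the collection's `/schema` endpoint,
and each copy could then be listed in `qf` with its own boost.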

Regards,
    Alex



On Tue., Mar. 2, 2021, 7:08 a.m. Martin Graney, <martin.gra...@sooqr.com>
wrote:

> Hi All
>
> I have been trying to implement multi word synonyms using `sow=false` into
> a pre-existing system that applied pre-processing to the phrase to apply
> wildcards around the terms, i.e. `bread stick` => `*bread* *stick*`.
>
> I got the synonyms expansion working perfectly, after discovering the
> `preserveOriginal` filter param, but then I needed to re-implement the
> existing wildcard behaviour.
> I tried using the edge-ngram filter, but found that when searching for the
> phrase `bread stick` on a field containing the word `breadstick` and
> `q.op=AND` it returns no results, as the content `breadstick` does not
> _start with_ `stick`. The previous wildcard behaviour would return all
> documents that contain the substrings `bread` AND `stick`, which is the
> desired behaviour.
> I tried using the ngram filter, but this does not support
> `preserveOriginal`, and so loses a lot of relevance for exact matches. It
> also results in matches that are far too broad, creating 21 tokens from
> `breadstick` for `minGramSize=3` and `maxGramSize=5`, which in practice
> match essentially all of the documents. This means that boosts applied
> to other fields, such as 'in stock', push irrelevant documents to the top.
>
> Finally, I tried to strip out ngrams entirely and use nested queries with
> local params, a Solr feature that is not very well documented.
> I created something like `q={!edismax sow=true v=$wildcards} OR {!edismax
> sow=false v=$plain}` to effectively create a union of results, one with
> multi word synonyms support and one with wildcard support.
> But then I had to implement the other edismax params and immediately
> stumbled.
> Each query in production normally has a slew of `bf` and `bq` params, and I
> cannot see a way to pass these into the nested query using local variables.
> If I have 3 different `bf` params how can I pass them into the local param
> subqueries?
>
> Also, as the search in production is across multiple fields I found passing
> `qf` to both subqueries using dereferencing failed, as the parser saw it as
> a single field and threw a 'number format exception'.
> i.e.
> q={!edismax sow=true v=$tw qf=$tqf} OR {!edismax sow=false v=$tp qf=$tqf}
> $tw=*bread* *stick*
> $tp=bread stick
> $tqf=title^2 description^0.5
>
> As you can guess, I have spent quite some time going down this rabbit hole
> in my attempt to reproduce the existing desired functionality alongside
> multiterm synonyms.
> Is there a way to get multiterm synonyms working with substring matching
> effectively?
> I am sure there is a much simpler way that I am missing than all of my
> attempts so far.
>
> Solr: 8.3
>
> Thanks
> Martin Graney

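For anyone following the ngram discussion in the quoted message, the
behaviour described there can be reproduced with a small Python sketch. It
only mimics what the edge-ngram and ngram filters emit and is not Lucene's
implementation; the helper names and the comparison word `cream` are made up
for illustration.

```python
def ngrams(term, min_size, max_size):
    """All substrings of `term` with length between min_size and max_size."""
    return [
        term[i:i + n]
        for n in range(min_size, max_size + 1)
        for i in range(len(term) - n + 1)
    ]


def edge_ngrams(term, min_size, max_size):
    """Only the prefixes of `term` with length between min_size and max_size."""
    return [term[:n] for n in range(min_size, min(max_size, len(term)) + 1)]


def wildcard_match(term, words):
    """Mimic `*bread* *stick*` with q.op=AND: every word must be a substring."""
    return all(word in term for word in words)


grams = ngrams("breadstick", 3, 5)
edges = edge_ngrams("breadstick", 3, 10)

print(len(grams))          # 8 + 7 + 6 = 21 tokens, as reported
print("stick" in edges)    # False: `breadstick` does not start with `stick`
print("stick" in grams)    # True: plain ngrams do cover it
print(wildcard_match("breadstick", ["bread", "stick"]))  # True

# Short grams are shared with unrelated words, which is where the overly
# broad matches come from.
print(set(grams) & set(ngrams("cream", 3, 5)))  # {'rea'}
```

The last line shows why a small `minGramSize` matches so widely: a single
shared 3-gram is enough for an unrelated term to count as a hit.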