Hi Alex

Thanks for the reply.
We are not using the 'copyField bucket' approach as it is inflexible. Our
textual fields are all multivalued dynamic fields, which allows us to craft
a list of `pf` (phrase fields) with associated weighting boosts that are
meant to be used in the search on a *per-collection* basis. This allows us
to have all of the textual fields indexed independently and then simply
change the query when we want to include/exclude a field from the search
without the need to reindex the entire collection. e/dismax makes this more
flexible approach possible.
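
To make the per-collection idea concrete, here is a minimal sketch of how
such a parameter set could be assembled at query time. The collection names,
field names, weights, and the `build_edismax_params` helper are all
hypothetical; only `defType`, `q`, `qf`, `pf`, and `sow` are actual Solr
parameters, and our real setup differs in the details.

```python
# Hypothetical per-collection field weights; names are illustrative only.
FIELD_WEIGHTS = {
    "products_nl": {"title_txt": 2.0, "description_txt": 0.5},
    "products_en": {"title_txt": 3.0, "brand_txt": 1.5, "description_txt": 0.5},
}


def boosted_fields(weights):
    """Render {"title_txt": 2.0} as a space-separated 'field^boost' list."""
    return " ".join(f"{field}^{boost:g}" for field, boost in weights.items())


def build_edismax_params(collection, user_query):
    """Assemble edismax params for one collection without touching the index."""
    fields = boosted_fields(FIELD_WEIGHTS[collection])
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": fields,    # fields searched term by term
        "pf": fields,    # same fields, reused here for phrase boosting
        "sow": "false",  # as used for multi word synonyms in my original mail
    }


params = build_edismax_params("products_nl", "bread stick")
print(params["qf"])
```

Dropping a field from the search is then a change to the weights mapping
rather than a reindex.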

I'll take a look at the ComplexQueryParser and see if it is a good fit.
We use a lot of the e/dismax params though, such as `bf` (boost functions),
`bq` (boost queries), and `pf` (phrase fields), to influence the relevance
score.
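
For context, a typical top-level request repeats `bf` and `bq` several times,
and that is what any alternative parser would need to support. A minimal
sketch of how such a request could be encoded, assuming made-up boost values
(`popularity`, `created_at`, and `in_stock` are hypothetical field names):

```python
from urllib.parse import urlencode, parse_qs

# A list of pairs (rather than a dict) keeps repeated keys such as bf and bq.
request_params = [
    ("defType", "edismax"),
    ("q", "bread stick"),
    ("qf", "title^2 description^0.5"),
    ("pf", "title^2 description^0.5"),
    ("bf", "log(sum(popularity,1))"),                   # hypothetical field
    ("bf", "recip(ms(NOW,created_at),3.16e-11,1,1)"),   # hypothetical field
    ("bq", "in_stock:true^5"),                          # hypothetical field
]

# Encode for the request URL, then decode again to confirm nothing was lost.
query_string = urlencode(request_params)
decoded = parse_qs(query_string)
print(len(decoded["bf"]), len(decoded["bq"]))
```

The open question from my original mail still stands: this works on the main
query, but I do not see how to hand several `bf` values to a nested
`{!edismax ...}` subquery.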

FYI: We are using Solr 8.3.
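
As a sanity check on the ngram numbers in my original mail (quoted below),
the token count can be reproduced with a few lines of plain Python. This only
mimics character n-gram generation over a single term; it is not the Solr
filter itself, and `char_ngrams` is a hypothetical helper.

```python
def char_ngrams(term, min_gram, max_gram):
    """Return every character n-gram of term with min_gram <= length <= max_gram."""
    return [
        term[start:start + size]
        for size in range(min_gram, max_gram + 1)
        for start in range(len(term) - size + 1)
    ]


grams = char_ngrams("breadstick", 3, 5)
print(len(grams))        # 21: 8 three-grams + 7 four-grams + 6 five-grams
print("stick" in grams)  # True: the substring match we want
print("ead" in grams)    # True: a short fragment that also matches unrelated words
```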

On Tue, 2 Mar 2021 at 13:38, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> I admit to not fully understanding the examples, but ComplexQueryParser
> looks like something worth at least reviewing:
>
>
> https://lucene.apache.org/solr/guide/8_8/other-parsers.html#complex-phrase-query-parser
>
> Also I did not see any references to trying to copyField and process same
> content in different ways. If copyField is not stored, the overhead is not
> as large.
>
> Regards,
>     Alex
>
>
>
> On Tue., Mar. 2, 2021, 7:08 a.m. Martin Graney, <martin.gra...@sooqr.com>
> wrote:
>
> > Hi All
> >
> > I have been trying to implement multi word synonyms using `sow=false`
> > into a pre-existing system that applied pre-processing to the phrase to
> > apply wildcards around the terms, i.e. `bread stick` => `*bread* *stick*`.
> >
> > I got the synonyms expansion working perfectly, after discovering the
> > `preserveOriginal` filter param, but then I needed to re-implement the
> > existing wildcard behaviour.
> > I tried using the edge-ngram filter, but found that when searching for
> > the phrase `bread stick` on a field containing the word `breadstick` and
> > `q.op=AND` it returns no results, as the content `breadstick` does not
> > _start with_ `stick`. The previous wildcard behaviour would return all
> > documents that contain the substrings `bread` AND `stick`, which is the
> > desired behaviour.
> > I tried using the ngram filter, but this does not support the
> > `preserveOriginal`, and so loses a lot of relevance for exact matches,
> > but it also results in matches that are far too broad, creating 21
> > tokens from `breadstick` for `minGramSize=3` and `maxGramSize=5` that in
> > practice essentially matches all of the documents. Which means that
> > boosts applied to other fields, such as 'in stock', push irrelevant
> > documents to the top.
> >
> > Finally, I tried to strip out ngrams entirely and use subquery/LocalParam
> > syntax, a Solr feature that is not very well documented.
> > I created something like `q={!edismax sow=true v=$wildcards} OR {!edismax
> > sow=false v=$plain}` to effectively create a union of results, one with
> > multi word synonyms support and one with wildcard support.
> > But then I had to implement the other edismax params and immediately
> > stumbled.
> > Each query in production normally has a slew of `bf` and `bq` params,
> > and I cannot see a way to pass these into the nested query using local
> > variables.
> > If I have 3 different `bf` params how can I pass them into the local
> > param subqueries?
> >
> > Also, as the search in production is across multiple fields I found
> > passing `qf` to both subqueries using dereferencing failed, as the parser
> > saw it as a single field and threw a 'number format exception'.
> > i.e.
> > q={!edismax sow=true v=$tw tf=$tqf} OR {!edismax sow=false v=$tp tf=$tqf}
> > $tw=*bread* *stick*
> > $tp=bread stick
> > $tqf=title^2 description^0.5
> >
> > As you can guess, I have spent quite some time going down this rabbit
> > hole in my attempt to reproduce the existing desired functionality
> > alongside multiterm synonyms.
> > Is there a way to get multiterm synonyms working with substring matching
> > effectively?
> > I am sure there is a much simpler way that I am missing than all of my
> > attempts so far.
> >
> > Solr: 8.3
> >
> > Thanks
> > Martin Graney
> >
> > --
> >  <https://www.linkedin.com/company/sooqr-com/>
> >
>


-- 
Martin Graney
Lead Developer

http://sooqr.com <http://www.sooqr.com/>
http://twitter.com/sooqrcom

Office: +31 (0) 88 766 7700
Mobile: +31 (0) 64 660 8543

