Hi Alex,

Thanks for the reply. We are not using the 'copyField bucket' approach as it is inflexible. Our textual fields are all multivalued dynamic fields, which allows us to craft a list of `pf` (phrase fields) with associated weighting boosts that are meant to be used in the search on a *per-collection* basis. This allows us to have all of the textual fields indexed independently and then simply change the query when we want to include/exclude a field from the search without the need to reindex the entire collection. e/dismax makes this more flexible approach possible.
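To make the per-collection idea concrete, here is a minimal sketch of how such a field list might be turned into e/dismax parameters at query time. The collection names, the `_txt` field names, the boosts, and the helper functions are all hypothetical, and sending the same list as both `qf` and `pf` is an assumption made purely for illustration.

```python
# Hypothetical per-collection configuration: field name -> boost.
# None of these names come from a real schema.
COLLECTION_FIELDS = {
    "products": {"title_txt": 2.0, "description_txt": 0.5},
    "articles": {"title_txt": 3.0, "body_txt": 1.0},
}


def boosted_fields(fields):
    """Render {"title_txt": 2.0} in the 'title_txt^2.0' form used by qf/pf."""
    return " ".join(f"{name}^{boost}" for name, boost in fields.items())


def edismax_params(collection, user_query, exclude=()):
    """Build request params for one collection, optionally dropping fields.

    Excluding a field only changes the query string; nothing is reindexed.
    """
    fields = {
        name: boost
        for name, boost in COLLECTION_FIELDS[collection].items()
        if name not in exclude
    }
    field_list = boosted_fields(fields)
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": field_list,  # assumption: same list reused for qf and pf
        "pf": field_list,
    }


all_fields = edismax_params("products", "bread stick")
title_only = edismax_params("products", "bread stick", exclude=("description_txt",))
```

Dropping `description_txt` from the search is then just a different call, which is the flexibility a single copyField target would not give.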
I'll take a look at the ComplexQueryParser and see if it is a good fit. We use a lot of the e/dismax params though, such as `bf` (boost functions), `bq` (boost queries), and `pf` (phrase fields), to influence the relevance score.

FYI: We are using Solr 8.3.

On Tue, 2 Mar 2021 at 13:38, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> I admit to not fully understanding the examples, but ComplexQueryParser looks like something worth at least reviewing:
>
> https://lucene.apache.org/solr/guide/8_8/other-parsers.html#complex-phrase-query-parser
>
> Also I did not see any references to trying to copyField and process same content in different ways. If copyField is not stored, the overhead is not as large.
>
> Regards,
> Alex
>
> On Tue., Mar. 2, 2021, 7:08 a.m. Martin Graney, <martin.gra...@sooqr.com> wrote:
>
> > Hi All
> >
> > I have been trying to implement multi word synonyms using `sow=false` into a pre-existing system that applied pre-processing to the phrase to apply wildcards around the terms, i.e. `bread stick` => `*bread* *stick*`.
> >
> > I got the synonyms expansion working perfectly, after discovering the `preserveOriginal` filter param, but then I needed to re-implement the existing wildcard behaviour.
> > I tried using the edge-ngram filter, but found that when searching for the phrase `bread stick` on a field containing the word `breadstick` and `q.op=AND` it returns no results, as the content `breadstick` does not _start with_ `stick`. The previous wildcard behaviour would return all documents that contain the substrings `bread` AND `stick`, which is the desired behaviour.
> > I tried using the ngram filter, but this does not support the `preserveOriginal`, and so loses a lot of relevance for exact matches, but it also results in matches that are far too broad, creating 21 tokens from `breadstick` for `minGramSize=3` and `maxGramSize=5` that in practice essentially matches all of the documents. Which means that boosts applied to other fields, such as 'in stock', push irrelevant documents to the top.
> >
> > Finally, I tried to strip out ngrams entirely and use subquery/LocalParam syntax and local params, a Solr feature that is not very well documented.
> > I created something like `q={!edismax sow=true v=$wildcards} OR {!edismax sow=false v=$plain}` to effectively create a union of results, one with multi word synonyms support and one with wildcard support.
> > But then I had to implement the other edismax params and immediately stumbled.
> > Each query in production normally has a slew of `bf` and `bq` params, and I cannot see a way to pass these into the nested query using local variables.
> > If I have 3 different `bf` params how can I pass them into the local param subqueries?
> >
> > Also, as the search in production is across multiple fields I found passing `qf` to both subqueries using dereferencing failed, as the parser saw it as a single field and threw a 'number format exception'.
> > i.e.
> > q={!edismax sow=true v=$tw tf=$tqf} OR {!edismax sow=false v=$tp tf=$tqf}
> > $tw=*bread* *stick*
> > $tp=bread stick
> > $tqf=title^2 description^0.5
> >
> > As you can guess, I have spent quite some time going down this rabbit hole in my attempt to reproduce the existing desired functionality alongside multiterm synonyms.
> > Is there a way to get multiterm synonyms working with substring matching effectively?
> > I am sure there is a much simpler way that I am missing than all of my attempts so far.
> >
> > Solr: 8.3
> >
> > Thanks
> > Martin Graney
> >
> > --
> > <https://www.linkedin.com/company/sooqr-com/>

--
Martin Graney
Lead Developer
http://sooqr.com
http://twitter.com/sooqrcom
Office: +31 (0) 88 766 7700
Mobile: +31 (0) 64 660 8543
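
For reference, the token arithmetic in the quoted message can be checked with a short sketch. This is a plain-Python model of character n-grams and edge n-grams, not Solr's filter implementation; the function names are hypothetical, and the edge n-gram sizes (3 to 10) are assumed for illustration.

```python
def ngrams(term, min_gram, max_gram):
    """All character n-grams of term with lengths min_gram..max_gram."""
    return [
        term[start:start + size]
        for size in range(min_gram, max_gram + 1)
        for start in range(len(term) - size + 1)
    ]


def edge_ngrams(term, min_gram, max_gram):
    """Only the n-grams anchored at the start of term (prefixes)."""
    longest = min(max_gram, len(term))
    return [term[:size] for size in range(min_gram, longest + 1)]


grams = ngrams("breadstick", 3, 5)
edges = edge_ngrams("breadstick", 3, 10)

print(len(grams))        # 8 + 7 + 6 = 21 tokens
print("stick" in grams)  # True: plain n-grams cover inner substrings
print("bread" in edges)  # True: `bread` is a prefix of `breadstick`
print("stick" in edges)  # False: `stick` is not a prefix
```

The last two lines illustrate why `bread stick` with `q.op=AND` finds nothing on an edge-ngram field containing `breadstick`: only `bread` has a matching token.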