Hi All,
still struggling with payloads. To better understand my problem, I've created a minimal reproducible example. Basically, I have a multivalued field with payloads and this schema configuration:

    <fieldType name="payloads" stored="true" indexed="true" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float" delimiter=":"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <field name="multipayload" type="payloads" indexed="true" stored="true" multiValued="true" />

The field is populated with data like this:

    <doc>
      <field name="id">1</field>
      <field name="multipayload">A:1 B:2 C:3 D:4</field>
      <field name="multipayload">A:0.1 B:0.2 E:5 F:6</field>
      <field name="multipayload">E:0.5 F:0.6</field>
    </doc>

I want to be able to query the multipayload field with an arbitrary number of tokens, in any order, and get as the score the SUM of the payload values of those tokens, counting only the rows (values) of the multipayload field that contain all the tokens of the query (basically an AND condition applied to each row). For example:

1. If I run the query with B F A as clauses, I expect a match on the second row of the doc with id=1, and so a score of 0.2 + 6 + 0.1 = 6.3
2. If I run the query with F E as clauses, I expect a match on the second and third rows of the doc with id=1, and thus a score of (6 + 5) + (0.6 + 0.5) = 12.1
3. If I run the query with A F as clauses, I expect a match on the second row of the doc with id=1 (A:0.1, F:6), and thus a score of 0.1 + 6 = 6.1

I tried a query like this:

    http://localhost:8983/solr/test/select?debugQuery=true&q={!payload_score f=multipayload v=$pl func=sum includeSpanScore=false operator=phrase}&pl=__MY_CLAUSES__

The results I obtain are:

1. B F A: no results
2. F E: 6.5 (from the match of F:6 in row #2 and E:0.5 in row #3), presumably because the phrase span matches across the boundary between the end of row #2 and the start of row #3
3. E F: 12.1 (as expected, but only because "by chance" the sequence matches as a phrase in rows #2 and #3)
4. A F: no results (even though row #2 contains both A and F, they are not adjacent there)

Looking into the Solr payloads code ( https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/solr/core/src/java/org/apache/solr/util/PayloadUtils.java#L139 ), I see that:

- There are only two options, OR and phrase, while I think my case needs an AND operator.
- The phrase option has a hard-coded distance (slop) of 0 for the span query, and also requires the terms to appear in order (the final true argument):
    query = new SpanNearQuery(terms.toArray(new SpanTermQuery[terms.size()]), 0, true);

I think that a phrase query with a huge distance (e.g. 100) and no in-order constraint could behave like an AND query, but I'm just guessing. Anyway, to suit my case I think that in general I'd need an AND option, or the possibility to define the span behaviour of the phrase query in a more flexible way.

Even if my case is quite specific, I think the current implementation of the phrase option is also not well suited to a more general case: weights associated with part-of-speech classes, which is in my opinion a more classic use of payloads. For example, I may want to deboost adjectives relative to nouns, as in:

- a *race horse* is a *horse* that runs in races
- a *horse race* is a *race* for horses

In general, it seems to me that the absence of an AND option and the phrase slop hard-coded to 0 are quite limiting.

Thanks in advance for your time,
Vincenzo

--
Vincenzo D'Amore
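To make the intended "AND per row, SUM of payloads" scoring concrete, here is a small plain-Java model of it. This is only a sketch of the expected behaviour, not Solr or Lucene code: the class and method names (PayloadAndSketch, parseRow, score) are hypothetical. It assumes that each value of the multivalued field is one row and that every token has the form token:payload, using the ':' delimiter from the schema.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical plain-Java model of the desired scoring (not Solr/Lucene code):
 * a row contributes only if it contains ALL query tokens, and its contribution
 * is the sum of those tokens' payloads. The document score is the sum over all
 * contributing rows, or 0.0 if no row qualifies.
 */
final class PayloadAndSketch {

    /** Parses a row such as "A:1 B:2" into token -> payload; assumes every token is "token:payload". */
    static Map<String, Double> parseRow(String row) {
        Map<String, Double> payloads = new HashMap<>();
        for (String pair : row.trim().split("\\s+")) {
            int sep = pair.lastIndexOf(':');
            payloads.put(pair.substring(0, sep), Double.parseDouble(pair.substring(sep + 1)));
        }
        return payloads;
    }

    /** Sums payloads of the query tokens, counting only rows that contain every query token. */
    static double score(List<String> rows, List<String> queryTokens) {
        double total = 0.0;
        for (String row : rows) {
            Map<String, Double> payloads = parseRow(row);
            if (!payloads.keySet().containsAll(queryTokens)) {
                continue; // row lacks at least one query token: AND fails, row contributes nothing
            }
            for (String token : queryTokens) {
                total += payloads.get(token);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // The three values of "multipayload" for the doc with id=1
        List<String> doc1 = List.of("A:1 B:2 C:3 D:4", "A:0.1 B:0.2 E:5 F:6", "E:0.5 F:0.6");
        System.out.printf(Locale.ROOT, "B F A -> %.1f%n", score(doc1, List.of("B", "F", "A"))); // prints "B F A -> 6.3"
        System.out.printf(Locale.ROOT, "F E -> %.1f%n", score(doc1, List.of("F", "E")));        // prints "F E -> 12.1"
    }
}
```

The containsAll check on each row before summing is what makes this an AND per row, independent of token order and of token distance within the row.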