[ https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205764#comment-17205764 ]
Gus Heck commented on SOLR-14787:
---------------------------------

So after spending some more time with this I have the following thoughts:

# The threshold parameter is redundant with the payloads parameter. The code should choose the operator in one uniform way, with "equals" as the default operator, rather than having two distinct code paths. I think {{"\{!payload_check f=vals_dpf payloads='0.75' op='gt'}one"}} makes more sense. This also opens up the possibility of testing against multiple payload values, just as in the equals case. Accepting a different operator per payload value can be a future enhancement if anyone wants it.
# There is a Lucene class change here, so there definitely should be Lucene-level tests, and we should have a Lucene ticket too.
# As you mentioned in a separate channel, this doesn't work with integers, i.e. {{"\{!payload_check f=vals_dpi payloads='1' op='gt' threshold='0.75'}A"}} won't work. This is because the integer payload (from the index, not the query) gets decoded as a float and winds up being some very, very small value (saw it in debug, forgot to copy it down, but something like ten to the minus 14 IIRC). So this deceptively gives wrong answers without throwing errors, which is bad. I think this needs to be addressed by communicating the payload type to the query at the Lucene layer (where folks are responsible for knowing the type information of their own fields) and deriving it from the schema at the Solr level, where folks expect stuff to just work because they declared a schema. Additionally, by analogy with range queries, strings should probably work via lexical order, but that could possibly be left for a future enhancement, since users are less likely to expect strings to work in the same fashion as floats.
# I'm still trying to explain why I get different results in the IDE vs. the build here, but the build and the running application are the important things.
# Needs docs, of course.

> Inequality support in Payload Check query parser
> ------------------------------------------------
>
>                 Key: SOLR-14787
>                 URL: https://issues.apache.org/jira/browse/SOLR-14787
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Kevin Watters
>            Assignee: Gus Heck
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The goal of this ticket/pull request is to support a richer set of matching
> and filtering based on term payloads. This patch extends the
> PayloadCheckQueryParser to add a new local param for "op".
> The value of "op" can be one of the following:
> * gt - greater than
> * gte - greater than or equal
> * lt - less than
> * lte - less than or equal
> If "op" is not specified, it defaults to the current behavior of equals.
> In addition to the operation, you can specify a threshold local parameter.
> This will provide the ability to search for the term "cat" so long as the
> payload has a value of greater than 0.75.
> One use case is to classify a document into various categories with an
> associated confidence or probability that the classification is correct.
> That can be indexed into a delimited payload field. The searches can find
> and match documents that were tagged with the "cat" category with a
> confidence of greater than 0.5.
> Example Document
> {code:java}
> {
>   "id":"doc_1",
>   "classifications_payload":["cat|0.75 dog|2.0"]
> }
> {code}
> Example Syntax
> {code:java}
> {!payload_check f=classifications_payload payloads='1' op='gt' threshold='0.5'}cat
> {code}
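
The integer problem from the third point of the comment can be illustrated without Lucene or Solr on the classpath. The following is only a minimal sketch, assuming the payload is stored as a 4-byte big-endian value; the class {{PayloadDecodeSketch}}, its {{Op}} enum, and the helper methods are hypothetical names made up for this illustration and are not part of the patch or of any Lucene/Solr API. It shows that reading the bytes of the integer 1 as a float yields a tiny positive value, so a "gt 0.75" check quietly fails instead of raising an error.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not the patch's code and not a Lucene/Solr API.
final class PayloadDecodeSketch {

    /** Operators named in the ticket; EQ stands in for the default "equals" behaviour. */
    enum Op {
        EQ, GT, GTE, LT, LTE;

        /** Applies this operator to a decoded payload and a reference value. */
        boolean test(float payload, float reference) {
            switch (this) {
                case GT:  return payload > reference;
                case GTE: return payload >= reference;
                case LT:  return payload < reference;
                case LTE: return payload <= reference;
                default:  return payload == reference;
            }
        }
    }

    // Assumption: payloads are 4 bytes, big-endian (ByteBuffer's default order).
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    static byte[] encodeFloat(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    static int decodeInt(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt();
    }

    /** Reinterprets the 4 payload bytes as an IEEE 754 float, whatever they were encoded as. */
    static float decodeFloat(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    public static void main(String[] args) {
        byte[] intPayload = encodeInt(1);

        // Wrong: integer bytes read as a float give a tiny denormal, so the check fails silently.
        float misread = decodeFloat(intPayload);
        System.out.println("misread value: " + misread);
        System.out.println("misread gt 0.75: " + Op.GT.test(misread, 0.75f));

        // Type-aware: decode as an int first, then compare.
        float typed = decodeInt(intPayload);
        System.out.println("typed gt 0.75: " + Op.GT.test(typed, 0.75f));

        // Float payloads behave as expected, e.g. the "dog|2.0" payload vs. 0.5.
        System.out.println("float gt 0.5: " + Op.GT.test(decodeFloat(encodeFloat(2.0f)), 0.5f));
    }
}
```

In this sketch the only difference between the silent wrong answer and the correct one is which decoder is applied, which is why the comment suggests telling the query the payload type (explicitly at the Lucene layer, from the schema in Solr) rather than always decoding as a float.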