Required operator (+) is being ignored when using default conjunction operator AND

2020-04-01 Thread Eran Buchnick
Using Solr 8.3.0, it seems that the required operator (+) isn't functioning properly
when the default conjunction operator is AND.


Steps to reproduce:

20 docs

all have a text field (_text_)

17 have the value A (abc)

13 have the value B (123)

10 have both A and B (the intersection)

===the data===
[
{
"id": "0",
"_text_": [
"abc",
"123",
"xyz"
]
},
{
"id": "1",
"_text_": [
"abc",
"123",
"xyz"
]
},
{
"id": "2",
"_text_": [
"abc",
"123",
"xyz"
]
},
{
"id": "3",
"_text_": [
"abc",
"123"
]
},
{
"id": "4",
"_text_": [
"abc",
"123"
]
},
{
"id": "5",
"_text_": [
"abc",
"123",
"xyz"
]
},
{
"id": "6",
"_text_": [
"abc",
"123"
]
},
{
"id": "7",
"_text_": [
"abc",
"123"
]
},
{
"id": "8",
"_text_": [
"abc",
"123",
"xyz"
]
},
{
"id": "9",
"_text_": [
"abc",
"123"
]
},
{
"id": "10",
"_text_": [
"abc",
"xyz"
]
},
{
"id": "11",
"_text_": [
"abc"
]
},
{
"id": "12",
"_text_": [
"abc",
"xyz"
]
},
{
"id": "13",
"_text_": [
"abc",
"xyz"
]
},
{
"id": "14",
"_text_": [
"abc",
"xyz"
]
},
{
"id": "15",
"_text_": [
"abc",
"xyz"
]
},
{
"id": "16",
"_text_": [
"abc",
"xyz"
]
},
{
"id": "17",
"_text_": [
"xyz",
"123"
]
},
{
"id": "18",
"_text_": [
"def",
"123",
"xyz"
]
},
{
"id": "19",
"_text_": [
"def",
"123"
]
}
]
==
 default operator is set to AND
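
The counts in the steps above can be checked against the data with a few lines of Python. This is only a sketch: it assumes A is the term abc and B is the term 123, and the dictionary is a hand-made compact transcription of the 20 documents listed above, not something exported from Solr.

```python
# Compact transcription of the 20 documents above: id -> set of _text_ values.
docs = {}
for i in (0, 1, 2, 5, 8):
    docs[i] = {"abc", "123", "xyz"}
for i in (3, 4, 6, 7, 9):
    docs[i] = {"abc", "123"}
for i in (10, 12, 13, 14, 15, 16):
    docs[i] = {"abc", "xyz"}
docs[11] = {"abc"}
docs[17] = {"xyz", "123"}
docs[18] = {"def", "123", "xyz"}
docs[19] = {"def", "123"}

has_a = {i for i, terms in docs.items() if "abc" in terms}  # assumed "A"
has_b = {i for i, terms in docs.items() if "123" in terms}  # assumed "B"

# total, A, B, A and B, A or B
print(len(docs), len(has_a), len(has_b), len(has_a & has_b), len(has_a | has_b))
# prints: 20 17 13 10 20
```

If the + were honored (abc required, 123 optional), the query should match only the 17 documents in has_a. The numFound of 20 in the response equals the size of the union, which is what a plain abc OR 123 would return.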


my query is:
http://localhost:8983/solr/new_core/select?debug.explain.structured=true&debugQuery=on&q=%7B!q.op%3DAND%7D%20%2Babc%20OR%20123&rows=20
the response:
{
"responseHeader":{
  "status":0,
  "QTime":7,
  "params":{
"q":"{!q.op=AND} +abc OR 123",
"rows":"20",
"debug.explain.structured":"true",
"debugQuery":"on"}},
"response":{"numFound":20,"start":0,"docs":[
{
  "id":"3",
  "_version_":1662786291343818752},
{
  "id":"4",
  "_version_":1662786291343818753},
{
  "id":"6",
  "_version_":1662786291344867329},
{
  "id":"7",
  "_version_":1662786291345915904},
{
  "id":"9",
  "_version_":1662786291346964480},
{
  "id":"0",
  "_version_":1662786291339624448},
{
  "id":"1",
  "_version_":1662786291342770176},
{
  "id":"2",
  "_version_":1662786291342770177},
{
  "id":"5",
  "_version_":1662786291344867328},
{
  "id":"8",
  "_version_":1662786291345915905},
{
  "id":"17",
  "_version_":1662786291350110209},
{
  "id":"19",
  "_version_":1662786291351158784},
{
  "id":"18",
  "_version_":1662786291350110210},
{
  "id":"11",
  "_version_":1662786291348013056},
{
  "id":"10",
  "_version_":1662786291346964481},
{
  "id":"12",
  "_version_":1662786291348013057},
{
  "id":"13",
  "_version_":1662786291348013058},
{
  "id":"14",
  "_version_":1662786291349061632},
{
  "id":"15",
  "_version_":1662786291349061633},
{
  "id":"16",
  "_version_":1662786291350110208}]
},
"debug":{
  "rawquerystring":"{!q.op=AND} +abc OR 123",
  "querystring":"{!q.op=AND} +abc OR 123",
  "parsedquery":"_text_:abc _text_:123",
  "parsedquery_toString":"_text_:abc _text_:123",
  "explain":{
"3":{
  "match":true,
  "value":0.29721633,
  "description":"sum of:",
  "details":[{
  "match":true,
  "value":0.08681979,
  "description":"weight(_text_:abc in 3) [SchemaSimilarity],
result of:",
  "details":[{
  "match":true,
  "value":0.08681979,
  "description":"score(freq=1.0), computed as boost * idf *
tf from:"

Re: Required operator (+) is being ignored when using default conjunction operator AND

2020-04-05 Thread Eran Buchnick
Hoss, thanks a lot for the response.
OK, so it seems I have wandered into the "uncanny valley" of the search
operators :/
I read the blog post you linked (and more), but the penny still hasn't dropped
about what causes the operator clash when the default operator is AND.
I read that when q.op=AND, OR will change the Occur of the left clause (if not
MUST_NOT) and of the right clause to SHOULD. Does that mean that the "order of
operations" in this case gives the infix operator the mandate to
control the prefix operator?
A little background: I am trying to implement a Google-like search
service and want to support the required and prohibited
operators while still using intersection as the default
operator. How can I achieve this with this limitation?


On Wed, Apr 1, 2020, 20:08 Chris Hostetter  wrote:

>
> : Using solr 8.3.0 it seems like required operator isn't functioning properly
> : when default conjunction operator is AND.
>
> You're mixing the "prefix operators" with the "infix operators" which is
> always a recipe for disaster.




>
> The use of q.op=AND vs q.op=OR in these examples only
> complicates the issue because q.op isn't really overriding any sort of
> implicit "infix operator" when clauses exist w/o an infix operator between
> them, it is overriding the implicit MUST/SHOULD/MUST_NOT given to each clause
> as parsed ... but in general setting q.op=AND really only makes sense when
> you expect/intend to only be using "infix operators"
>
> This write up i did several years ago is still very accurate -- the bottom
> line is you REALLY don't want to mix infix and prefix operators..
>
> https://lucidworks.com/post/why-not-and-or-and-not/
>
> ...because the results of mixing them really only "make sense" given the
> context that the parser goes left to right (ie: no precedence) and has
> no explicit "prefix" operator syntax for "SHOULD"
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Required operator (+) is being ignored when using default conjunction operator AND

2020-04-11 Thread Eran Buchnick
Hoss, thanks a lot for the informative response. I now understand my
misunderstanding of infix and prefix operators. I need to rethink the
term occurrence support in my search service.

Cheers!

On Mon, Apr 6, 2020, 20:43 Chris Hostetter  wrote:

>
> : I read your attached blog post (and more) but still the penny hasn't dropped
> : yet about what causes the operator clash when the default operator is AND.
> : I read that when q.op=AND, OR will change the left (if not MUST_NOT) and
> : right clause Occurs to SHOULD - what that means is that the "order of
> : operations" in this case is giving the infix operator the mandate to
> : control the prefix operator?
>
> Not quite anything that complex... sorry, but the blog post was focused on
> describing *what* happens when parsing, to explain why mixing prefix/infix is
> bad ... i avoided getting bogged down into *why* it happens exactly the
> way it does.
>
>
> To get to the "why" you have to circle back to the higher level concept
> that the "prefix" operators very closely align to the underlying concepts
> of the BooleanQuery/BooleanClause data structures: that each clause has an
> "Occur" property which is either: MUST/SHOULD/MUST_NOT (or FILTER, but
> setting aside scoring that's functionally equivalent to MUST).
>
> The 'infix' operators just manipulate the Occur property of the clauses on
> either side of them.
>
> 'q.op=AND' and 'q.op=OR' are functionally really about setting the
> "Default Occur Value For All Clauses That Do Not Have An Explicit Occur
> Value" (ie: q.op=Occur.MUST and q.op=Occur.SHOULD) ... where the explicit
> Occur value for each clause would be specified by its prefix (+=MUST,
> -=MUST_NOT ... there is no supported prefix for SHOULD, which is why
> q.op=SHOULD is the default and changing it complicates the parser logic)
>
> In essence: After the q.op/default.occur is applied to all clauses (that
> don't already have a prefix), then there is a left to right parsing that
> lets the infix operators modify the "Occur" values of the clauses on
> either side of them -- if those Occur values match the "default" for this
> parser.
>
> So let's imagine 2 requests...
>
> 1)  {!q.op=AND}a +b OR c +d AND e
> 2)  {!q.op=OR} x +y OR z +r AND s
>
> Here's what those wind up looking like internally with the default
> applied...
>
> 1) q.op=MUST:    MUST(a)   MUST(b) OR MUST(c)   MUST(d) AND MUST(e)
> 2) q.op=SHOULD:  SHOULD(x) MUST(y) OR SHOULD(z) MUST(r) AND SHOULD(s)
>
> And here's how the infix operators change things as it parses left to
> right building up the clauses...
>
> 1) q.op=MUST:    MUST(a)   SHOULD(b) SHOULD(c) MUST(d)  MUST(e)
> 2) q.op=SHOULD:  SHOULD(x) MUST(y)   SHOULD(z) MUST(r)  MUST(s)
>
> It's not actually done in "two passes" -- it's just that as the parsing
> is done left to right, the default Occur is used unless/until set by a
> prefix operator, and infix operators not only set the Occur value
> for the "next" clause, but also reach back to override the prior
> Occur value if it matches the Default: because there is no "history" kept
> to indicate that it was explicitly set, or how.  The left to right parsing
> just does the best it can with the context it's got.
>
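
The left-to-right handling described above can be sketched as a toy Python model. This is one reading of that description, not Solr's or Lucene's actual parser code: the function name assign_occurs is made up for illustration, and it only understands bare terms, the +/- prefixes and the AND/OR infix operators in a flat, whitespace-separated query.

```python
def assign_occurs(query, default_op):
    """Toy model: return [(term, occur)] for a flat query string.

    default_op is the q.op value, "AND" or "OR".
    """
    clauses = []  # [term, occur] pairs, in query order
    conj = None   # infix operator seen just before the current term, if any
    for token in query.split():
        if token in ("AND", "OR"):
            conj = token
            continue

        prefix = token[0] if token[0] in "+-" else None
        term = token[1:] if prefix else token

        # Reach back: an infix operator may rewrite the previous clause,
        # unless that clause is prohibited. No history is kept about
        # whether the previous Occur came from a prefix or the default.
        if clauses and clauses[-1][1] != "MUST_NOT":
            if conj == "AND":
                clauses[-1][1] = "MUST"
            elif conj == "OR" and default_op == "AND":
                clauses[-1][1] = "SHOULD"

        # Occur for the current clause.
        if prefix == "-":
            occur = "MUST_NOT"
        elif default_op == "AND":
            # "+" adds nothing here: MUST is already the default, and
            # only a preceding OR relaxes the clause to SHOULD.
            occur = "SHOULD" if conj == "OR" else "MUST"
        else:
            occur = "MUST" if prefix == "+" or conj == "AND" else "SHOULD"

        clauses.append([term, occur])
        conj = None
    return [tuple(c) for c in clauses]


for q_op, query in [("AND", "a +b OR c +d AND e"),
                    ("OR", "x +y OR z +r AND s"),
                    ("AND", "+abc OR 123")]:
    print(q_op, assign_occurs(query, q_op))
```

Under this model the first two calls give the same Occur values as the two worked examples above, and the last one gives SHOULD for both abc and 123, which is consistent with the parsedquery "_text_:abc _text_:123" in the original report.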
> :  A little background - I am trying to implement a google search like
> : service and want to have the ability to have required and prohibit
> : operators while still allowing default intersection operation as default
> : operator. How can I achieve this with this limitation?
>
> If you want "intersection" to be the default, i'm not sure why you care
> about having a "required" operator? (you didn't mention anything about an
> "optional" operator even though your original example explicitly used
> "OR" ... so not really sure if that was just a contrived example or if you
> actually care about supporting it?)
>
> If you're not hung up on using a specific syntax, you might want to
> consider the "simple" QParser -- it unfortunately re-uses the 'q.op=AND'
> param syntax to indicate what the default Occur should be for clauses, but
> the overall syntax is much simpler: there is a prefix negation operator,
> but otherwise the infix "+" and "|" operators support boolean AND and OR
> -- there are no prefix operators for MUST/SHOULD.  You can also turn off
> individual operators you don't like...
>
>
> https://lucene.apache.org/solr/guide/8_5/other-parsers.html#OtherParsers-SimpleQueryParser
>
>
> -Hoss
> http://www.lucidworks.com/
>
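
For the "simple" QParser suggestion above, a request could be built roughly like this. It is a sketch under assumptions: the host and core name are copied from the original report, the helper name simple_parser_url is made up, and the q.operators value (meant to leave only the AND, OR and NOT operators enabled) should be checked against the reference guide page linked above before relying on it.

```python
from urllib.parse import urlencode

# Base URL taken from the original report; adjust host and core as needed.
BASE = "http://localhost:8983/solr/new_core/select"


def simple_parser_url(user_query):
    """Build a select URL that sends user_query to the Simple query parser."""
    params = {
        "defType": "simple",          # select the Simple query parser
        "q.op": "AND",                # clauses are required by default
        "q.operators": "AND,OR,NOT",  # assumption: keep only +, | and -
        "q": user_query,
    }
    # urlencode percent-encodes "|" and "," and turns spaces into "+",
    # so the operators in user_query reach Solr unchanged.
    return BASE + "?" + urlencode(params)


print(simple_parser_url("abc | 123"))  # explicit OR between two terms
print(simple_parser_url("abc -xyz"))   # abc required, xyz prohibited
```

With this parser there is no prefix "required" operator to clash with q.op=AND: intersection is the default, "|" asks for a union, and "-" excludes a term.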


Can solr index replacement character

2020-11-30 Thread Eran Buchnick
Hi community,
During integration tests with a new data source I noticed a weird scenario
where the replacement character can't be searched, though it seems to be stored.
Honestly, I don't want that irrelevant data stored in my index, but
I wondered whether Solr can index the replacement character (U+FFFD �) as a string and, if
so, how to search for it.
And in general, is there any built-in character filtering?

Thanks