Enhancement of documentation

2019-06-06 Thread Stefan Kärst
Hi list,

I think it's a good idea to add the ZooKeeper chroot folder at the end of:
https://lucene.apache.org/solr/guide/7_0/setting-up-an-external-zookeeper-ensemble.html

"Once these servers are running, you can reference them from Solr just
as you did before:"

"bin/solr start -e cloud -z
localhost:2181,localhost:2182,localhost:2183/solrchroot -noprompt"


with /solrchroot as an example.

It took me a while to figure this out by trial and error. It looks like other
users face the same problem (according to Google).

The error message in the Solr logs isn't really helpful in that case.

Besides this: why would someone create a (ZooKeeper) cluster while
running all servers on the same machine?

"bin/solr start -e cloud -z
localhost:2181,zknode1:2181,zknode2:2181/solrchroot -noprompt"

makes more sense, IMHO.

Cheers!
Stefan K.


Re: Unexpected behaviour when Solr 6 Admin UI pages are cached and server is Solr 8?

2019-06-06 Thread Colvin Cowie
I've raised https://issues.apache.org/jira/browse/SOLR-13522 - feel free to
update the description as you like

Cheers

On Wed, 5 Jun 2019 at 21:48, Shawn Heisey  wrote:

> On 6/5/2019 2:40 PM, Gus Heck wrote:
> > Experiences that force the user to think about the browser cache are
> > sub-par :). Anything that changes the URL will interrupt caching so just
> > adding a query parameter &_v=8.1.1 (or whatever) to every request would
> > probably do the trick, there's no need to mess with file names or file
> > locations IF the UI can easily do such a thing. One could write
> javascript
> > to find all the src/href etc on the page and append... as for whether
> > that's easy in our actual UI, I don't know. haven't tried to work with it
> > yet.
>
> I agree, we definitely don't want it to become necessary for the user to
> complete a complex action like clearing their cache.
>
> A parameter like what you describe is already on some of the requests
> ... but not all.  Here are two requests that I see from the browser:
>
> http://localhost:8983/solr/css/angular/segments.css?_=8.1.0
> http://localhost:8983/solr/libs/angular.js
>
> So if we can figure out how to get that parameter applied to more
> requests, it would hopefully solve this.
>
> Thanks,
> Shawn
>
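Gus's idea of appending a version parameter to every src/href could be sketched roughly as below. This is a hypothetical sketch, not the actual Admin UI code; the `_v` parameter name and the helper names are assumptions.

```javascript
// Hypothetical sketch: append a version query parameter to a static-resource
// URL so that upgrading Solr invalidates any cached copy of the resource.
function withVersionParam(url, version) {
  if (url.indexOf('_v=') !== -1) return url; // already versioned, leave as-is
  var sep = url.indexOf('?') === -1 ? '?' : '&';
  return url + sep + '_v=' + encodeURIComponent(version);
}

// Applying it to every src/href on the page, as suggested above (needs a DOM):
function versionAllResources(doc, version) {
  var nodes = doc.querySelectorAll('[src], [href]');
  for (var i = 0; i < nodes.length; i++) {
    var attr = nodes[i].hasAttribute('src') ? 'src' : 'href';
    nodes[i].setAttribute(attr, withVersionParam(nodes[i].getAttribute(attr), version));
  }
}
```

With this, `libs/angular.js` would become `libs/angular.js?_v=8.1.1`, matching the already-versioned `segments.css?_=8.1.0` request above.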


Streaming expression function which can give parent document along with its child documents ?

2019-06-06 Thread Jai Jamba
Hi,
I was wondering if there is any way in streaming expressions to return a
parent document along with its child documents, just like the child
document transformer, i.e. *[child limit=-1]*, which returns the child
documents of the parent document as well.

-
Jai



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Exclude update & read from authentication.

2019-06-06 Thread Mannar mannan
Hi Team,

I am setting up basic authentication and authorization in Solr 7.7.1. I need
to exclude read & update (predefined permissions) from authentication.

In Solr 7.7.1 I created one user to access the console and another user for
dataimport purposes, with the BasicAuth plugin. I have to access (update, read)
the index without authentication. Kindly check my security.json file.

{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c=",
      "testuser": "Z9MhKDHQvGqugG57RRpLdvguQrDXHKKJUnsyyw1909k= LvXPDZMr4ePt6QWKK+MFoQyxrAM3EXSUWosSseqyFhU=",
      "testuser1": "qx3kJ+XdXdMVEf1Kn9lw1ZU8VpSkf2jc7KZWrYyqLbc= vmr3o5L+zVprh8G+9+6/vFpd08z6VpPoOVgsMPMHEAQ="
    },
    "": {"v": 42}
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [
      {"name": "security-edit", "role": "admin", "index": 1},
      {"name": "schema-read", "role": ["admin", "prod", "guest"], "index": 2},
      {"name": "config-read", "role": ["admin", "prod", "guest"], "index": 3},
      {"name": "collection-admin-edit", "role": ["admin", "prod"], "index": 4},
      {"name": "system-coll-import", "collection": "*", "path": "/dataimport/*", "role": ["admin", "prod"], "index": 5}
    ],
    "user-role": {"admin": "admin", "testuser": "guest", "testuser1": "prod"},
    "": {"v": 46}
  }
}

I have to update the index and select from the index without authentication.
Kindly let me know a possible way.
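One possible direction, shown as a hedged sketch (not verified against 7.7.1 — please check it against the Ref Guide for that version): with "blockUnknown" set to false, requests without credentials are not rejected outright, and a predefined permission whose "role" is null is open to everyone. Something along these lines might leave read and update open while keeping the admin paths protected (credentials elided):

```json
{
  "authentication": {
    "blockUnknown": false,
    "class": "solr.BasicAuthPlugin",
    "credentials": {"...": "..."}
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [
      {"name": "read", "role": null},
      {"name": "update", "role": null},
      {"name": "security-edit", "role": "admin"},
      {"name": "collection-admin-edit", "role": ["admin", "prod"]}
    ],
    "user-role": {"admin": "admin", "testuser": "guest", "testuser1": "prod"}
  }
}
```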


Re: Query takes a long time Solr 6.1.0

2019-06-06 Thread vishal patel
Thanks for your reply.

> How much index data is on one server with 256GB of memory?  What is the
> max heap size on the Solr instance?  Is there only one Solr instance?

One server (256GB RAM) hosts the two Solr instances below, plus other applications:
1) shards1 (80GB heap, 790GB storage, 449GB indexed data)
2) replica of shard2 (80GB heap, 895GB storage, 337GB indexed data)

The second server (256GB RAM and 1TB storage) hosts the two Solr instances below,
plus other applications:
1) shards2 (80GB heap, 790GB storage, 338GB indexed data)
2) replica of shard1 (80GB heap, 895GB storage, 448GB indexed data)

Both server memory and disk usage:
https://drive.google.com/drive/folders/11GoZy8C0i-qUGH-ranPD8PCoPWCxeS-5

Note: On average, about 40GB of heap is used in each Solr instance. When a replica 
goes down, disk I/O is high and GC pause times exceed 15 seconds. We cannot identify 
from the logs the exact cause of the replica recovering or going down. Is it due to 
the GC pauses? High disk I/O? A time-consuming query? Heavy indexing?

Regards,
Vishal

From: Shawn Heisey 
Sent: Wednesday, June 5, 2019 7:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Query takes a long time Solr 6.1.0

On 6/5/2019 7:08 AM, vishal patel wrote:
> I have attached RAR file but not attached properly. Again attached txt file.
>
> For 2 shards and 2 replicas, we have 2 servers and each has 256 GB ram
> and 1 TB storage. One shard and another shard replica in one server.

You got lucky.  Even text files usually don't make it to the list --
yours did this time.  Use a file sharing website in the future.

That is a massive query.  The primary reason that Lucene defaults to a
maxBooleanClauses value of 1024, which you are definitely exceeding
here, is that queries with that many clauses tend to be slow and consume
massive levels of resources.  It might not be possible to improve the
query speed very much here if you cannot reduce the size of the query.

Your query doesn't look like it is simple enough to replace with the
terms query parser, which has better performance than a boolean query
with thousands of "OR" clauses.
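For reference, the terms query parser mentioned above takes a comma-separated list of values for a single field using local-params syntax, e.g. `{!terms f=id}doc1,doc2,doc3`. It only applies when the large OR list is over one field with no per-clause boosts. A hypothetical helper for building such a query string:

```javascript
// Sketch (assuming the standard {!terms} local-params syntax): collapse a
// large single-field OR list into one terms query string.
function termsQuery(field, values) {
  return '{!terms f=' + field + '}' + values.join(',');
}
```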

How much index data is on one server with 256GB of memory?  What is the
max heap size on the Solr instance?  Is there only one Solr instance?

The screenshot mentioned here will most likely relay all the info I am
looking for.  Be sure the sort is correct:

https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue

You will not be able to successfully attach the screenshot to a message.
  That will require a file sharing website.

Thanks,
Shawn


Suggest highlight is not working with context filter query

2019-06-06 Thread Ritesh Kumar
Hello Team,

I am not able to get highlighted terms from the Suggest component when
using a context filter query. My definition of the suggest search component
looks as follows.



  mySuggester
  BlendedInfixLookupFactory
  position_linear
  DocumentDictionaryFactory
  fieldName
  contextFieldName
  
  textSuggest
  lowercase
  suggest
  false
  false
  true



The BlendedInfixLookupFactory does support context filtering. However,
after a bit of digging, it appears this issue is a bug. Please refer
to this ticket.

Is there any way to get both context filtering and a highlighted
response?

Thanks,
Ritesh Kumar


Issues with the handling of NULLs in Streaming Expressions

2019-06-06 Thread Oleksandr Chornyi
Hi guys!

I'm working on a generic query builder for Streaming Expressions which
allows building various requests containing row-level expressions (i.e.
evaluators), aggregations/metrics, sorts, etc. Along the way, I bumped into
many issues related to the handling of NULL values by the engine. Here are
the issues in descending order of severity (from my standpoint):

1. *There is no way to check if a value in a tuple is NULL* because the *eq*
function fails to accept *null* as an argument:

> *eq(1,null) *

fails with

> "Unable to check eq(...) because a null value was found"

even though the documentation says that "If any parameters are null and
there is at least one parameter that is not null then false will be returned."
This issue makes it impossible to evaluate the expression from the *if*
function documentation:

> if(eq(fieldB,null), null, div(fieldA,fieldB)) // if fieldB is null then
> null else fieldA / fieldB

I think that the root cause of the issue is coming from the fact that
*EqualToEvaluator* extends *RecursiveBooleanEvaluator* which checks that
none of the arguments is *null*, but I don't think that's what we want
here. *Can you confirm that what I see is a bug and I should file it?*

2. The fact that *FieldValueEvaluator returns the field name when a value is
null* breaks any evaluator/decorator which would otherwise handle *nulls*.
Consider these examples (I'm using *cartesianProduct* on an integer array to
get several tuples with integers because I couldn't find a way to do so
directly):

> cartesianProduct(
> tuple(a=array(1,null,3)),
> a
> )

returns values preserving *nulls: *

> "docs": [
>   {"a": 1},
>   {"a": null},
>   {"a": 3},
> ...]

If I just execute *add(1, null)*, it works as expected and returns *null*.
Now, if I try to apply any stream evaluator which should work fine with
*nulls* to this stream:

> select(
> cartesianProduct(
> tuple(a=array(1,null,3)),
> a
> ),
> add(a, 1) as a
> )

it fails to process the second record saying that:

> "docs": [
>   {"a": 2},
>   {
> "EXCEPTION": "Failed to evaluate expression add(a,val(1)) - Numeric
> value expected but found type java.lang.String for value a",
> ...
>   }
> ]

It looks even more confusing when running the following query:

> select(
> cartesianProduct(
> tuple(a=array(1,null,3)),
> a
> ),
> coalesce(a, 42) as a
> )

produces

> "docs": [
>   {"a": 1},
>   {"a": "a"},
>   {"a": 3},
> ...]

 instead of

> "docs": [
>   {"a": 1},
>   {"a": *42*},
>   {"a": 3},
> ...]

As I mentioned in the issue description, I think the issue lies in these
lines of *FieldValueEvaluator:*

> if(value == null) {
>return fieldName;
> }

I consider this to be very counterintuitive. *Can you confirm that this is
a bug, rather than a designed feature?*

3. *Most Boolean Stream Evaluators* state that they *don't work with
NULLs*. However, this is very inconvenient and there is no way to work
around it (see item #1). I'm talking about the following evaluators: *and,
eor, or, gt, lt, gteq, lteq*. At the moment these evaluators just throw
exceptions when an argument is *null*. *Have you considered making their
behavior more SQL-like?* For example, the behavior could be:

   - *gt, lt, gteq, lteq *evaluators return *null* if any of the arguments
   is *null*
   - *or(true, null)* returns *true*
   - *and(true, null)* returns *false*
   - *having* decorator treats *null* returned by *booleanEvaluator* as
   *false*
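The proposed semantics above could be sketched as follows. This mirrors the proposal, not current Solr behavior, and JavaScript is used purely for illustration; the function names mirror the evaluators.

```javascript
// Proposed semantics sketch: comparisons propagate null; or() is true if
// any operand is true; and() treats null as false, per the proposal above.
function gt(a, b) { return (a === null || b === null) ? null : a > b; }
function lt(a, b) { return (a === null || b === null) ? null : a < b; }
function or(a, b) {
  if (a === true || b === true) return true;   // or(true, null) -> true
  return (a === null || b === null) ? null : false;
}
function and(a, b) {
  if (a === false || b === false) return false;
  if (a === null || b === null) return false;  // and(true, null) -> false, per the proposal
  return true;
}
```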

4. Some *inconsistencies in evaluator behavior* and/or documentation:

   - *div(1, null)* fails while *mult(1, null)*, *add(1, null)*, and
   *sub(1, null)* return *null*. *Should I file a bug for div?*
   - documentation for *not* says that "The function will fail to execute
   if the parameter is non-boolean or null", however it returns *null* for
   *not(null)*. *Should I create a task to fix the doc?*

I know I mixed many questions into one thread, but for me they are all
interrelated. Thanks in advance for your help.

-- 
Best Regards,
Alex Chornyi


RE: query parsed in different ways in two identical solr instances

2019-06-06 Thread Danilo Tomasoni
Hello, and thank you for your answer.
Attached you will find the two logs for the working solr1 server, and the 
non-working solr-test server.


Danilo Tomasoni


Fondazione The Microsoft Research - University of Trento Centre for 
Computational and Systems Biology (COSBI)
Piazza Manifattura 1,  38068 Rovereto (TN), Italy
tomas...@cosbi.eu
http://www.cosbi.eu

As for the European General Data Protection Regulation 2016/679 on the 
protection of natural persons with regard to the processing of personal data, 
we inform you that all the data we possess are object of treatment in the 
respect of the normative provided for by the cited GDPR.
It is your right to be informed on which of your data are used and how; you may 
ask for their correction, cancellation or you may oppose to their use by 
written request sent by recorded delivery to The Microsoft Research – 
University of Trento Centre for Computational and Systems Biology Scarl, Piazza 
Manifattura 1, 38068 Rovereto (TN), Italy.
P Please don't print this e-mail unless you really need to


From: Shawn Heisey [apa...@elyograg.org]
Sent: 05 June 2019 17:52
To: solr-user@lucene.apache.org
Subject: Re: query parsed in different ways in two identical solr instances

On 6/5/2019 8:41 AM, Danilo Tomasoni wrote:
> Hello,
> I have two solr instances with exactly the same configuration.
> The only difference that i know is that the first (the working one, is solr 
> 7.3.0,
> while the one that's not working is solr 7.3.1)
>
> If I execute the same query (with debugQuery=on) it gets parsed in different 
> ways on the two systems and I don't understand why.

Look in solr.log.  The full query, including parameters that are used
but not on the URL, will be shown there.  Provide that whole line from
both versions.

An example of the kind of line you need to find, with a very simple
query, is below:

2019-06-05 15:50:23.691 INFO  (qtp1264413185-43) [   x:foo]
o.a.s.c.S.Request [foo]  webapp=/solr path=/select
params={q=*:*&_=1559749821933} hits=0 status=0 QTime=38

If your index has multiple shards, there can be multiple lines.  In that
situation, we need the last one, which should be the main query itself
rather than the subqueries.

Thanks,
Shawn
2019-06-06 11:27:39.303 INFO  (qtp2114889273-802) [   x:COSBIBioIndex] o.a.s.c.S.Request [COSBIBioIndex]  webapp=/solr path=/select params={f.f2.qf=medline_chemical_terms+medline_mesh_terms&df=abstract_results^2+body_unnamed^1+titles^3+sponsors^1+keywords^10+abstracts^2+body_subjects^1+medline_mesh_terms^8+abstract_objective^2+body_nonstandard^1+body_materials_and_methods^1+body_results^1+annotation_terms^7+body_conclusions^1+body_ethics^1+abstract_methods^2+body_introduction^1+body_discussion^1+abstract_background^2+body_supplementary_materials^1+medline_chemical_terms^8+abstract_conclusions^2+body_funding^1+annotation_canonical_terms^7&indent=on&fl=*,+score,freq_secondary_ids_PUBMEDPMID12159614:termfreq(secondary_ids,+'PUBMEDPMID12159614'),freq_pmid_PUBMEDPMID12159614:termfreq(pmid,+'PUBMEDPMID12159614'),freq_doi_PUBMEDPMID12159614:termfreq(doi,+'PUBMEDPMID12159614'),freq_pmc_PUBMEDPMID12159614:termfreq(pmc,+'PUBMEDPMID12159614'),freq_id_PUBMEDPMID12159614:termfreq(id,+'PUBMEDPMID12159614'),freq_source_id_PUBMEDPMID12159614:termfreq(source_id,+'PUBMEDPMID12159614'),freq_manuscript_id_PUBMEDPMID12159614:termfreq(manuscript_id,+'PUBMEDPMID12159614'),freq_other_id_PUBMEDPMID12159614:termfreq(other_id,+'PUBMEDPMID12159614'),freq_publication_id_PUBMEDPMID12159614:termfreq(publication_id,+'PUBMEDPMID12159614'),freq_medline_mesh_terms_Models:termfreq(medline_mesh_terms,+'Models'),freq_medline_chemical_terms_Models:termfreq(medline_chemical_terms,+'Models'),freq_medline_mesh_terms_Molecular:termfreq(medline_mesh_terms,+'Molecular'),freq_medline_chemical_terms_Molecular:termfreq(medline_chemical_terms,+'Molecular'),freq_medline_mesh_terms_Models:termfreq(medline_mesh_terms,+'Models'),freq_medline_chemical_terms_Models:termfreq(medline_chemical_terms,+'Models'),freq_medline_mesh_terms_Theoretical:termfreq(medline_mesh_terms,+'Theoretical'),freq_medline_chemical_terms_Theoretical:termfreq(medline_chemical_terms,+'Theoretical'),freq_medline_mesh_terms_Models:termfreq(medline_
mesh_terms,+'Models'),freq_medline_chemical_terms_Models:termfreq(medline_chemical_terms,+'Models'),freq_medline_mesh_terms_Statistical:termfreq(medline_mesh_terms,+'Statistical'),freq_medline_chemical_terms_Statistical:termfreq(medline_chemical_terms,+'Statistical'),freq_medline_mesh_terms_Models:termfreq(medline_mesh_terms,+'Models'),freq_medline_chemical_terms_Models:termfreq(medline_chemical_terms,+'Models'),freq_medline_mesh_terms_Immunological:termfreq(medline_mesh_terms,+'Immunological'),freq_medline_chemical_terms_Immunological:termfreq(medline_chemical_terms,+'Immunological'),freq_medline_mesh_terms_Molecular:termfreq(medline_mesh_terms,+'Molecular'),freq_medline_chemical_terms_Molecular:termfreq(medline_chemical_term

Re: query parsed in different ways in two identical solr instances

2019-06-06 Thread Alexandre Rafalovitch
Those two queries look the same after sorting the parameters, yet the
results are clearly different. That means the difference is deeper.

1) Have you checked that both collections have the same number of
documents (e.g. a mismatched final commit)? Does a basic "query=*:*"
return the same counts in the same initial order?
2) Are you absolutely sure you are comparing 7.3.0 with 7.3.1? There
was SOLR-11501 that may be relevant, but it was fixed in 7.2:
https://issues.apache.org/jira/browse/SOLR-11501

Regards,
   Alex.


On Thu, 6 Jun 2019 at 09:26, Danilo Tomasoni  wrote:
>
> Hello, and thank you for your answer.
> Attached you will find the two logs for the working solr1 server, and the 
> non-working solr-test server.
>
>
> Danilo Tomasoni
>
>
>
> 
> From: Shawn Heisey [apa...@elyograg.org]
> Sent: 05 June 2019 17:52
> To: solr-user@lucene.apache.org
> Subject: Re: query parsed in different ways in two identical solr instances
>
> On 6/5/2019 8:41 AM, Danilo Tomasoni wrote:
> > Hello,
> > I have two solr instances with exactly the same configuration.
> > The only difference that i know is that the first (the working one, is solr 
> > 7.3.0,
> > while the one that's not working is solr 7.3.1)
> >
> > If I execute the same query (with debugQuery=on) it gets parsed in 
> > different ways on the two systems and I don't understand why.
>
> Look in solr.log.  The full query, including parameters that are used
> but not on the URL, will be shown there.  Provide that whole line from
> both versions.
>
> An example of the kind of line you need to find, with a very simple
> query, is below:
>
> 2019-06-05 15:50:23.691 INFO  (qtp1264413185-43) [   x:foo]
> o.a.s.c.S.Request [foo]  webapp=/solr path=/select
> params={q=*:*&_=1559749821933} hits=0 status=0 QTime=38
>
> If your index has multiple shards, there can be multiple lines.  In that
> situation, we need the last one, which should be the main query itself
> rather than the subqueries.
>
> Thanks,
> Shawn


RE: query parsed in different ways in two identical solr instances

2019-06-06 Thread Danilo Tomasoni
The two collections are not identical: many documents overlap, but some field names 
differ (solr-test also has extra fields that solr1 didn't have).
Actually we have 42.000.000 docs in solr1 and 40.000.000 in solr-test, but I think 
this shouldn't be relevant, because the query is basically like

id=x AND mesh=list of phrase queries

where the second part of the AND is handled through a nested query (the _query_ 
magic keyword).

I expect that a query like this would return 1 document (x) or 0 documents.

The thing that puzzles me is that on solr1 the engine returns 1 document (x),
while on solr-test the engine returns 68.000 documents...
If you look at my first e-mail you will notice that in the correct engine the 
parsed query is like

+(+(...) +(...))

That is correct for an AND

while in the test engine the query is parsed like

+((...) (...))

which is more like an OR...


Danilo Tomasoni



From: Alexandre Rafalovitch [arafa...@gmail.com]
Sent: 06 June 2019 15:53
To: solr-user
Subject: Re: query parsed in different ways in two identical solr instances

Those two queries look same after sorting the parameters, yet the
results are clearly different. That means the difference is deeper.

1) Have you checked that both collections have the same amount of
documents (e.g. mismatched final commit). Does basic "query=*:*"
return the same counts in the same initial order?
2) Are you absolutely sure you are comparing 7.3.0 with 7.3.1? There
was SOLR-11501 that may be relevant, but it was fixed in 7.2:
https://issues.apache.org/jira/browse/SOLR-11501

Regards,
   Alex.

Are you absolutely sure that your instances are 7.3.0 and 7.3.1?

On Thu, 6 Jun 2019 at 09:26, Danilo Tomasoni  wrote:
>
> Hello, and thank you for your answer.
> Attached you will find the two logs for the working solr1 server, and the 
> non-working solr-test server.
>
>
> Danilo Tomasoni
>
>
>
> 
> From: Shawn Heisey [apa...@elyograg.org]
> Sent: 05 June 2019 17:52
> To: solr-user@lucene.apache.org
> Subject: Re: query parsed in different ways in two identical solr instances
>
> On 6/5/2019 8:41 AM, Danilo Tomasoni wrote:
> > Hello,
> > I have two solr instances with exactly the same configuration.
> > The only difference that i know is that the first (the working one, is solr 
> > 7.3.0,
> > while the one that's not working is solr 7.3.1)
> >
> > If I execute the same query (with debugQuery=on) it gets parsed in 
> > different ways on the two systems and I don't understand why.
>
> Look in solr.log.  The full query, including parameters that are used
> but not on the URL, will be shown there.  Provide that whole line from
> both versions.
>
> An example of the kind of line you need to find, with a very simple
> query, is below:
>
> 2019-06-05 15:50:23.691 INFO  (qtp1264413185-43) [   x:foo]
> o.a.s.c.S.Request [foo]  webapp=/solr path=/select
> params={q=*:*&_=1559749821933} hits=0 status=0 QTime=38
>
> If your index has multiple shards, there can be multiple lines.  In that
> situation, we need the last one, which should be the main query itself
> rather than the subqueries.

RE: query parsed in different ways in two identical solr instances

2019-06-06 Thread Danilo Tomasoni
Ah yes, I'm sure we are using Solr 7.3.1 as solr-test (non-working) and Solr 7.3.0
as solr1 (working):

7.3.0 98a6b3d642928b1ac9076c6c5a369472581f7633 - woody - 2018-03-28 14:37:45

vs 

7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 
09:30:57

Danilo Tomasoni



From: Alexandre Rafalovitch [arafa...@gmail.com]
Sent: 06 June 2019 15:53
To: solr-user
Subject: Re: query parsed in different ways in two identical solr instances

Those two queries look same after sorting the parameters, yet the
results are clearly different. That means the difference is deeper.

1) Have you checked that both collections have the same amount of
documents (e.g. mismatched final commit). Does basic "query=*:*"
return the same counts in the same initial order?
2) Are you absolutely sure you are comparing 7.3.0 with 7.3.1? There
was SOLR-11501 that may be relevant, but it was fixed in 7.2:
https://issues.apache.org/jira/browse/SOLR-11501

Regards,
   Alex.

Are you absolutely sure that your instances are 7.3.0 and 7.3.1?

On Thu, 6 Jun 2019 at 09:26, Danilo Tomasoni  wrote:
>
> Hello, and thank you for your answer.
> Attached you will find the two logs for the working solr1 server, and the 
> non-working solr-test server.
>
>
> Danilo Tomasoni
>
>
>
> 
> From: Shawn Heisey [apa...@elyograg.org]
> Sent: 05 June 2019 17:52
> To: solr-user@lucene.apache.org
> Subject: Re: query parsed in different ways in two identical solr instances
>
> On 6/5/2019 8:41 AM, Danilo Tomasoni wrote:
> > Hello,
> > I have two solr instances with exactly the same configuration.
> > The only difference that i know is that the first (the working one, is solr 
> > 7.3.0,
> > while the one that's not working is solr 7.3.1)
> >
> > If I execute the same query (with debugQuery=on) it gets parsed in 
> > different ways on the two systems and I don't understand why.
>
> Look in solr.log.  The full query, including parameters that are used
> but not on the URL, will be shown there.  Provide that whole line from
> both versions.
>
> An example of the kind of line you need to find, with a very simple
> query, is below:
>
> 2019-06-05 15:50:23.691 INFO  (qtp1264413185-43) [   x:foo]
> o.a.s.c.S.Request [foo]  webapp=/solr path=/select
> params={q=*:*&_=1559749821933} hits=0 status=0 QTime=38
>
> If your index has multiple shards, there can be multiple lines.  In that
> situation, we need the last one, which should be the main query itself
> rather than the subqueries.
>
> Thanks,
> Shawn


Re: SolrCloud indexing triggers merges and timeouts

2019-06-06 Thread Rahul Goswami
Thank you for your responses. Please find additional details about the
setup below:

We are using Solr 7.2.1

> I have a solrcloud setup on Windows server with below config:
> 3 nodes,
> 24 shards with replication factor 2
> Each node hosts 16 cores.

16 CPU cores, or 16 Solr cores?  The info may not be all that useful
either way, but just in case, it should be clarified.

*OP Reply:* 16 Solr cores (i.e. replicas)

> Index size is 1.4 TB per node
> Xms 8 GB , Xmx 24 GB
> Directory factory used is SimpleFSDirectoryFactory

How much total memory in the server?  Is there other software using
significant levels of memory?

*OP Reply:* Total 48GB per node... I couldn't see any other software using
a lot of memory.
I am honestly not sure about the reason for the change of directory factory to
SimpleFSDirectoryFactory, but I was told that with mmap, at one point we
started to see the shared memory usage on Windows go up significantly,
intermittently freezing the system.
Could the choice of DirectoryFactory here be a factor in the long
updates/frequent merges?

> How many total documents (maxDoc, not numDoc) are in that 1.4 TB of
space?
*OP Reply:* Also, there are nearly 12.8 million total docs (maxDoc, NOT
numDoc) in that 1.4 TB space

> Can you share the GC log that Solr writes?
*OP Reply:*  Please find the GC logs and thread dumps at this location
https://drive.google.com/open?id=1slsYkAcsH7OH-7Pma91k6t5T72-tIPlw

Another observation is that the CPU usage reaches around 70% (through
manual monitoring) when the indexing starts and the merges are observed. It
is well below 50% otherwise.

Also, should something be altered in the mergeScheduler settings?
"mergeScheduler": {
  "class": "org.apache.lucene.index.ConcurrentMergeScheduler",
  "maxMergeCount": 2,
  "maxThreadCount": 2
}
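For comparison, a starting point sometimes suggested for spinning disks (a hedged sketch, not a verified recommendation for this particular setup) is to raise maxMergeCount so incoming indexing is not stalled while merges queue up, while keeping a single merge thread:

```json
"mergeScheduler": {
  "class": "org.apache.lucene.index.ConcurrentMergeScheduler",
  "maxMergeCount": 6,
  "maxThreadCount": 1
}
```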

Thanks,
Rahul


On Wed, Jun 5, 2019 at 4:24 PM Shawn Heisey  wrote:

> On 6/5/2019 9:39 AM, Rahul Goswami wrote:
> > I have a solrcloud setup on Windows server with below config:
> > 3 nodes,
> > 24 shards with replication factor 2
> > Each node hosts 16 cores.
>
> 16 CPU cores, or 16 Solr cores?  The info may not be all that useful
> either way, but just in case, it should be clarified.
>
> > Index size is 1.4 TB per node
> > Xms 8 GB , Xmx 24 GB
> > Directory factory used is SimpleFSDirectoryFactory
>
> How much total memory in the server?  Is there other software using
> significant levels of memory?
>
> Why did you opt to change the DirectoryFactory away from Solr's default?
>   The default is chosen with care ... any other choice will probably
> result in lower performance.  The default in recent versions of Solr is
> NRTCachingDirectoryFactory, which uses MMap for file access.
>
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> The screenshot described here might become useful for more in-depth
> troubleshooting:
>
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Process_listing_on_Windows
>
> How many total documents (maxDoc, not numDoc) are in that 1.4 TB of space?
>
> > The cloud is all nice and green for the most part. Only when we start
> > indexing, within a few seconds, I start seeing Read timeouts and socket
> > write errors and replica recoveries thereafter. We are indexing in 2
> > parallel threads in batches of 50 docs per update request. After
> examining
> > the thread dump, I see segment merges happening. My understanding is that
> > this is the cause, and the timeouts and recoveries are the symptoms. Is
> my
> > understanding correct? If yes, what steps could I take to help the
> > situation. I do see that the difference between "Num Docs" and "Max Docs"
> > is about 20%.
>
> Segment merges are a completely normal part of Lucene's internal
> operation.  They should never cause problems like you have described.
>
> My best guess is that a 24GB heap is too small.  Or possibly WAY too
> large, although with the index size you have mentioned, that seems
> unlikely.
>
> Can you share the GC log that Solr writes?  The problem should occur
> during the timeframe covered by the log, and the log should be as large
> as possible.  You'll need to use a file sharing site -- attaching it to
> an email is not going to work.
>
> What version of Solr?
>
> Thanks,
> Shawn
>


Re: Loading pre created index files into MiniSolrCloudCluster of test framework

2019-06-06 Thread Pratik Patel
Thanks for the reply Alexandre. The only special thing about JSON/XML is that
in order to export the data in that form, I need to have "docValues"
enabled for all the fields which are to be retrieved. I need to retrieve
all the fields and I cannot enable docValues on all fields.
If there were a way to export data in JSON format without having to change
the schema and the index, then I would have no issues with JSON.
I cannot use the "select" handler as it does not include parent/child
relationships.

The options I have are the following, I guess. I am not sure if they are real
possibilities, though.

1. Find a way to load pre-created index files either through
SolrCloudClient or directly to ZK
2. Find a way to export the data in JSON format without having to make all
fields docValues enabled.
3. Use the Merge Index tool with an empty index and a real index. I don't
know if it is possible to do this through SolrJ though.

Please let me know if there is a better way available; it would really help.
Just so you know, I am trying to do this for unit tests related to Solr
queries. Ultimately I want to load some pre-created data into a
MiniSolrCloudCluster.

Thanks a lot,
Pratik


On Wed, Jun 5, 2019 at 6:56 PM Alexandre Rafalovitch 
wrote:

> Is there something special about parent/child blocks you cannot do through
> JSON? Or XML?
>
> Both Solr XML and Solr JSON support it.
>
> New-style parent/child mapping is also supported in the latest Solr, but I think
> it is done differently.
>
> Regards,
> Alex
>
> On Wed, Jun 5, 2019, 6:29 PM Pratik Patel,  wrote:
>
> > Hello Everyone,
> >
> > I am trying to write some unit tests for solr queries which requires some
> > data in specific state. There is a way to load this data through json
> files
> > but the problem is that the required data needs to have parent-child
> blocks
> > to be present.
> > Because of this, I would prefer if there is a way to load pre-created
> index
> > files into the cluster.
> > I checked the solr test framework and related examples but couldn't find
> > any example of index files being loaded in cloud mode.
> >
> > Is there a way to load index files into solr running in cloud mode?
> >
> > Thanks!
> > Pratik
> >
>


Re: Solr test framework not able to upload configuration to zk and fails with KeeperException

2019-06-06 Thread Pratik Patel
Thanks guys, I found that the issue I had was because of some binary files
(NLP models) in my configuration. Once I fixed that, I was able to set up a
cluster. These exceptions are still logged but they are logged as INFO and
were not the real issue.

Thanks Again
Pratik

On Tue, Jun 4, 2019 at 4:15 PM Angie Rabelero 
wrote:

> As far as I know, the configuration files need to already be in the
> test/resource directory before running. I copy them to the directory using the
> maven-antrun-plugin in the generate-test-sources phase. And the
> framework can "create a collection” without the config files, but it will
> obviously fail when you try to use it.
>
>
> On the surface, this znode already exists:
>
> /solr/configs/collection2
>
> So it looks like somehow you're
>
> > On Jun 4, 2019, at 12:29 PM, Pratik Patel <pra...@semandex.net> wrote:
> >
> > /solr/configs/collection2
>
> > On Jun 4, 2019, at 14:29, Pratik Patel  wrote:
> >
> > Hello Everyone,
> >
> > I am trying to run a simple unit test using solr test framework. At this
> > point, all I am trying to achieve is to be able to upload some
> > configuration and create a collection using solr test framework.
> >
> > Following is the simple code which I am trying to run.
> >
> > private static final String COLLECTION = "collection2" ;
> >
> > private static final int numShards = 1;
> > private static final int numReplicas = 1;
> > private static final int maxShardsPerNode = 1;
> > private static final int nodeCount = (numShards*numReplicas +
> > (maxShardsPerNode-1))/maxShardsPerNode;
> >
> > private static final String id = "id";
> > private static final String CONFIG_DIR =
> > "src/test/resources/testdata/solr/collection2";
> >
> > @BeforeClass
> > public static void setupCluster() throws Exception {
> >
> >// create and configure cluster
> >configureCluster(nodeCount)
> >.addConfig("collection2", getFile(CONFIG_DIR).toPath())
> >.configure();
> >
> >// create an empty collection
> >CollectionAdminRequest.createCollection(COLLECTION, "collection2",
> > numShards, numReplicas)
> >.setMaxShardsPerNode(maxShardsPerNode)
> >.process(cluster.getSolrClient());
> >
> >// add further document(s) here
> >// TODO
> > }
> >
> >
> > However, I see that solr fails to upload the configuration to zk.
> > Following method of ZooKeeper class fails with the "KeeperException"
> >
> > public String create(final String path, byte data[], List<ACL> acl,
> >CreateMode createMode)
> >throws KeeperException, InterruptedException
> > {
> >final String clientPath = path;
> >PathUtils.validatePath(clientPath, createMode.isSequential());
> >
> >final String serverPath = prependChroot(clientPath);
> >
> >RequestHeader h = new RequestHeader();
> >h.setType(ZooDefs.OpCode.create);
> >CreateRequest request = new CreateRequest();
> >CreateResponse response = new CreateResponse();
> >request.setData(data);
> >request.setFlags(createMode.toFlag());
> >request.setPath(serverPath);
> >if (acl != null && acl.size() == 0) {
> >throw new KeeperException.InvalidACLException();
> >}
> >request.setAcl(acl);
> >ReplyHeader r = cnxn.submitRequest(h, request, response, null);
> >if (r.getErr() != 0) {
> >throw KeeperException.create(KeeperException.Code.get(r.getErr()),
> >clientPath);
> >}
> >if (cnxn.chrootPath == null) {
> >return response.getPath();
> >} else {
> >return response.getPath().substring(cnxn.chrootPath.length());
> >}
> > }
> >
> >
> > And following are the Keeper exceptions thrown for each file of the
> > configuration.
> >
> > Basically, it says
> > Got user-level KeeperException when processing sessionid: Error
> > Path:/solr/configs Error:KeeperErrorCode = NodeExists for /solr/configs
> >
> >
> **
> > 2019-06-04T15:07:01,157 [ProcessThread(sid:0 cport:50192):] INFO
> > org.apache.zookeeper.server.PrepRequestProcessor - Got user-level
> > KeeperException when processing sessionid:0x1003ec815f30007 type:create
> > cxid:0xe zxid:0x40 txntype:-1 reqpath:n/a Error Path:/solr/configs
> > Error:KeeperErrorCode = NodeExists for /solr/configs
> > 2019-06-04T15:07:01,158 [ProcessThread(sid:0 cport:50192):] INFO
> > org.apache.zookeeper.server.PrepRequestProcessor - Got user-level
> > KeeperException when processing sessionid:0x1003ec815f30007 type:create
> > cxid:0xf zxid:0x41 txntype:-1 reqpath:n/a Error
> > Path:/solr/configs/collection2 Error:KeeperErrorCode = NodeExists for
> > /solr/configs/collection2
> > 2019-06-04T15:07:01,158 [ProcessThread(sid:0 cport:50192):] INFO
> > org.apache.zookeeper.server.PrepRequestProcessor - Got user-level
> > KeeperException when processing sessionid:0x1003ec815f30007 type:create
> > cxid:0x10 zxid:0x42 txntype:-1 reqpath:n/a Error
> > Path:

Re: Solr Migration to The AWS Cloud

2019-06-06 Thread Joe Lerner
Ooohh...interesting. Then, presumably there is some way to have what was the
cross-data-center replica become the new "primary"? 

It's getting too easy!

Joe



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: enhancement of documentation

2019-06-06 Thread Erick Erickson
> besides this. why would someone create a (zookeeper) cluster while
> running all server on the same machine??

People wouldn’t in production; the “-e cloud” option without a ZK ensemble is
just a convenience so you can try out Solr without having to set up a ZK
ensemble first.

> On Jun 6, 2019, at 12:46 AM, Stefan Kärst  wrote:
> 
> Hi list,
> 
> I think it's a good idea to add the zookeeper chroot folder at the end of:
> https://lucene.apache.org/solr/guide/7_0/setting-up-an-external-zookeeper-ensemble.html
> 
> "Once these servers are running, you can reference them from Solr just
> as you did before:"
> 
> bin/solr start -e cloud -z
> localhost:2181,localhost:2182,localhost:2183/solrchroot -noprompt"
> 
> 
> /solrchroot as an example.
> 
> it took me a while to find out by blindly trying. looks like other users
> face the same problem (according to google).
> 
> the error message in the solr logs isn't really helpful in that case.
> 
> besides this. why would someone create a (zookeeper) cluster while
> running all server on the same machine??
> 
> "bin/solr start -e cloud -z
> localhost:2181,zknode1:2181,zknode2:2181/solrchroot -noprompt"
> 
> makes more sense. IMHO
> 
> Cheers!
> Stefan K.



Odd error with Solr 8 log / ingestion

2019-06-06 Thread Erie Data Systems
Hello everyone,

I recently set up Solr 8 in SolrCloud mode; previously I was using
standalone mode and was able to easily push 10,000 records per HTTP call
with autocommit. Ingestion occurs when server A pushes an (HTTPS) payload to
server B (SolrCloud) on the LAN.

However, once converted to SolrCloud (1 node, 3 shards, 1 replica) I am
seeing the following error :

ConcurrentUpdateHttp2SolrClient
Error consuming and closing http response stream.

I'm wondering what the possible causes could be; I'm not seeing much
documentation online specific to Solr.

Thanks in advance for any assistance,
Craig


Re: Odd error with Solr 8 log / ingestion

2019-06-06 Thread Erick Erickson
Probably your packet size is too big for the Solr<->Solr default settings.
A quick test would be to try sending 10 docs per packet, then 100, then 1,000, etc.

There’s not much to be gained efficiency-wise once you get past 100 docs/shard, 
see: https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/

Second, you’ll get improved throughput if you use SolrJ rather than a straight 
HTTP connection, but your setup may not be amenable to that alternative.
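To illustrate the batching, here is a minimal plain-Java sketch of splitting a
large upload into ~100-doc requests; the actual update call (e.g. a SolrJ
client's add()) is only hinted at in a comment, since the client setup depends
on your environment and is an assumption here:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    // Split a large document list into fixed-size batches, so each update
    // request carries ~100 docs instead of one huge payload.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(docs.subList(i, Math.min(i + batchSize, docs.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) docs.add(i);

        // Each batch would be sent as one update request, e.g. with SolrJ:
        //   client.add(collection, batch);   // hypothetical client setup
        List<List<Integer>> batches = partition(docs, 100);
        System.out.println(batches.size());         // 100 batches
        System.out.println(batches.get(99).size()); // last batch holds 100 docs
    }
}
```

Keeping requests this small also stays well under typical inter-node
packet-size limits.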

Best,
Erick

> On Jun 6, 2019, at 11:23 AM, Erie Data Systems  wrote:
> 
> Hello everyone,
> 
> I recently set up Solr 8 in SolrCloud mode; previously I was using
> standalone mode and was able to easily push 10,000 records per HTTP call
> with autocommit. Ingestion occurs when server A pushes an (HTTPS) payload to
> server B (SolrCloud) on the LAN.
> 
> However, once converted to SolrCloud (1 node, 3 shards, 1 replica) I am
> seeing the following error :
> 
> ConcurrentUpdateHttp2SolrClient
> Error consuming and closing http response stream.
> 
> I'm wondering what the possible causes could be; I'm not seeing much
> documentation online specific to Solr.
> 
> Thanks in advance for any assistance,
> Craig



Re: Odd error with Solr 8 log / ingestion

2019-06-06 Thread Kevin Risden
Do you see a message about idle timeout? There is a jetty bug with HTTP/2
and idle timeout that causes some stream closing. The jira below says test
error, but I'm pretty sure it could come up in real usage.

* https://issues.apache.org/jira/browse/SOLR-13413
* https://github.com/eclipse/jetty.project/issues/3605

Kevin Risden


On Thu, Jun 6, 2019 at 2:38 PM Erick Erickson 
wrote:

> Probably your packet size is too big for the Solr<->Solr default settings.
> Quick test would be to try sending 10 docs per packet, then 100, then 1,000
> etc.
>
> There’s not much to be gained efficiency-wise once you get past 100
> docs/shard, see:
> https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/
>
> Second, you’ll get improved throughput if you use SolrJ rather than a
> straight HTTP connection, but your setup may not be amenable to that
> alternative.
>
> Best,
> Erick
>
> > On Jun 6, 2019, at 11:23 AM, Erie Data Systems 
> wrote:
> >
> > Hello everyone,
> >
> > I recently set up Solr 8 in SolrCloud mode; previously I was using
> > standalone mode and was able to easily push 10,000 records per HTTP call
> > with autocommit. Ingestion occurs when server A pushes an (HTTPS) payload to
> > server B (SolrCloud) on the LAN.
> >
> > However, once converted to SolrCloud (1 node, 3 shards, 1 replica) I am
> > seeing the following error :
> >
> > ConcurrentUpdateHttp2SolrClient
> > Error consuming and closing http response stream.
> >
> > I'm wondering what the possible causes could be; I'm not seeing much
> > documentation online specific to Solr.
> >
> > Thanks in advance for any assistance,
> > Craig
>
>


strange behavior

2019-06-06 Thread Wendy2


Hi,

Why doesn't "AND" work anymore?

I use Solr 7.3.1 and edismax parser.
Could someone explain to me why the following query doesn't work any more?  
What could be the cause? Thanks! 

q=audit_author.name:Burley,%20S.K.%20AND%20entity.type:polymer

It worked previously but now returns a much lower number of documents.
I had to use "fq" to make it work correctly:

q=audit_author.name:Burley,%20S.K.&fq=entity.type:polymer&rows=1







--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: strange behavior

2019-06-06 Thread David Hastings
audit_author.name:Burley,%20S.K.

translates to
audit_author.name:Burley, DEFAULT_OPERATOR DEFAULT_FIELD:S.K.




On Thu, Jun 6, 2019 at 2:46 PM Wendy2  wrote:

>
> Hi,
>
> Why doesn't "AND" work anymore?
>
> I use Solr 7.3.1 and edismax parser.
> Could someone explain to me why the following query doesn't work any
> more?
> What could be the cause? Thanks!
>
> q=audit_author.name:Burley,%20S.K.%20AND%20entity.type:polymer
>
> It worked previously but now returns a much lower number of documents.
> I had to use "fq" to make it work correctly:
>
> q=audit_author.name:Burley,%20S.K.&fq=entity.type:polymer&rows=1
>
>
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: strange behavior

2019-06-06 Thread Shawn Heisey

On 6/6/2019 12:46 PM, Wendy2 wrote:

Why doesn't "AND" work anymore?

I use Solr 7.3.1 and edismax parser.
Could someone explain to me why the following query doesn't work any more?
What could be the cause? Thanks!

q=audit_author.name:Burley,%20S.K.%20AND%20entity.type:polymer

It worked previously but now returns a much lower number of documents.
I had to use "fq" to make it work correctly:

q=audit_author.name:Burley,%20S.K.&fq=entity.type:polymer&rows=1


That should work without a problem with edismax.  It would not, however, work
properly with dismax, and it would be easy to mix up the two query parsers.


The way you have written your query is somewhat ambiguous, because of 
the space after the comma.  That ambiguity exists in both of the queries 
mentioned, even the one with the fq.


Thanks,
Shawn


Re: strange behavior

2019-06-06 Thread Wendy2
Hi Shawn,

I see. 

I added () and it works now. Thank you very much for your help!

q=audit_author.name:(Burley,%20S.K.)%20AND%20entity.type:polymer&rows=1





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: strange behavior

2019-06-06 Thread Wendy2
Hi David,

I see. It is fixed now by adding the ().  Thank you so much!
q=audit_author.name:(Burley,%20S.K.)%20AND%20entity.type:polymer



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Group by and filtering on child documents.

2019-06-06 Thread Mikhail Khludnev
On Wed, Jun 5, 2019 at 12:23 PM Jai Jamba 
wrote:

> Can you help me with the subquery way? I tried that a while back but it was
> giving me some exception (can't remember what).
>

Well... me either.


>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Issues with the handling of NULLs in Streaming Expressions

2019-06-06 Thread Joel Bernstein
Interesting questions. I suspect we need to beef up our test cases that
deal with nulls and make sure they behave in a consistent manner.

One of the things that likely needs to be looked at more carefully is how
string literals are handled as opposed to nulls. In some cases I believe if
null is encountered it's treated as a string literal and doesn't preserve
the null. So I think it's worth creating a ticket outlining your findings
and we can think about solutions.

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Jun 6, 2019 at 9:22 AM Oleksandr Chornyi 
wrote:

> Hi guys!
>
> I'm working on a generic query builder for Streaming Expressions which
> allows building various requests containing row level expressions (i.e.
> evaluators), aggregations/metrics, sorts, etc. On this way, I bumped into
> many issues related to the handling of NULL values by the engine. Here are
> the issues in the descending order of their severity (from my standpoint):
>
> 1. *There is no way to check if a value in a tuple is NULL* because
> *eq* function
> fails to accept *null *as an argument:
>
> > *eq(1,null) *
>
> fails with
>
> > "Unable to check eq(...) because a null value was found"
>
> even though the documentation says
> <
> https://lucene.apache.org/solr/guide/7_7/stream-evaluator-reference.html#eq
> >
> that "If any parameters are null and there is at least one parameter
> that is not null then false will be returned."
> This issue makes it impossible to evaluate an expression from the *if*
> function
> documentation
> <
> https://lucene.apache.org/solr/guide/7_7/stream-evaluator-reference.html#if
> >
> :
>
> > if(eq(fieldB,null), null, div(fieldA,fieldB)) // if fieldB is null then
> > null else fieldA / fieldB
>
> I think that the root cause of the issue is coming from the fact that
> *EqualToEvaluator* extends *RecursiveBooleanEvaluator* which checks that
> none of the arguments is *null*, but I don't think that's what we want
> here. *Can you confirm that what I see is a bug and I should file it?*
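For reference, the contract the documentation describes could be sketched in
plain Java as follows (an illustrative sketch only, not Solr's actual
EqualToEvaluator; it treats an all-null argument list as equal, which the
quoted sentence implies):

```java
import java.util.Arrays;
import java.util.List;

public class NullSafeEq {
    // eq() per the documented contract quoted above: if some (but not all)
    // parameters are null, return false instead of throwing; otherwise
    // compare the values normally.
    static boolean eq(Object... params) {
        List<Object> p = Arrays.asList(params);
        boolean anyNull = p.stream().anyMatch(x -> x == null);
        boolean allNull = p.stream().allMatch(x -> x == null);
        if (anyNull) {
            return allNull;  // all-null compares equal; mixed null/non-null is false
        }
        Object first = p.get(0);
        return p.stream().allMatch(first::equals);
    }

    public static void main(String[] args) {
        System.out.println(eq(1, null));     // false, not an exception
        System.out.println(eq(null, null));  // true
        System.out.println(eq(1, 1));        // true
    }
}
```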
>
> 2. The fact that *FieldValueEvaluator returns a field name when a value is
> null* breaks any evaluator/decorator which otherwise would handle
> *nulls*. Consider
> these examples (I'm using *cartesianProduct *on an integer array to get
> several tuples with integers because I couldn't find a way to do so
> directly):
>
> > cartesianProduct(
> > tuple(a=array(1,null,3)),
> > a
> > )
>
> returns values preserving *nulls: *
>
> > "docs": [
> >   {"a": 1},
> >   {"a": null},
> >   {"a": 3},
> > ...]
>
> If I just execute *add(1, null) *it works as expected and returns *null.*
> Now,
> if I'm trying to apply any stream evaluator which should work fine with
> *nulls* to this stream:
>
> > select(
> > cartesianProduct(
> > tuple(a=array(1,null,3)),
> > a
> > ),
> > add(a, 1) as a
> > )
>
> it fails to process the second record saying that:
>
> > "docs": [
> >   {"a": 2},
> >   {
> > "EXCEPTION": "Failed to evaluate expression add(a,val(1)) - Numeric
> > value expected but found type java.lang.String for value a",
> > ...
> >   }
> > ]
>
> It looks even more confusing when running the following query:
>
> > select(
> > cartesianProduct(
> > tuple(a=array(1,null,3)),
> > a
> > ),
> > coalesce(a, 42) as a
> > )
>
> produces
>
> > "docs": [
> >   {"a": 1},
> >   {"a": "a"},
> >   {"a": 3},
> > ...]
>
>  instead of
>
> > "docs": [
> >   {"a": 1},
> >   {"a": *42*},
> >   {"a": 3},
> > ...]
>
> As I mentioned in the issue description, I think the issue lies in these
> lines of *FieldValueEvaluator:*
>
> > if(value == null) {
> >return fieldName;
> > }
>
> I consider this to be very counterintuitive. *Can you confirm that this is
> a bug, rather than a designed feature?*
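A plain-Java sketch of the behavior item 2 argues for: a field lookup that
preserves null instead of falling back to the field name, so that coalesce()
can supply the default. This is illustrative only, not the real
FieldValueEvaluator:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CoalesceSketch {
    // Field lookup that keeps null as null, rather than returning the
    // field name as the snippet above does.
    static Object fieldValue(Map<String, Object> tuple, String fieldName) {
        return tuple.get(fieldName);  // null stays null
    }

    // First non-null argument, or null if all are null.
    static Object coalesce(Object... values) {
        for (Object v : values) {
            if (v != null) return v;
        }
        return null;
    }

    public static void main(String[] args) {
        List<Object> a = Arrays.asList(1, null, 3);
        for (Object v : a) {
            Map<String, Object> tuple = new HashMap<>();
            tuple.put("a", v);
            // With null preserved, the middle tuple gets the default 42.
            System.out.println(coalesce(fieldValue(tuple, "a"), 42));
        }
    }
}
```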
>
> 3. *Most Boolean Stream Evaluators* state that they *don't work with
> NULLs.* However,
> it's very inconvenient and there is no other way to work around it (see
> item #1)*. *I'm talking about the following evaluators: *and, eor, or, gt,
> lt, gteq, lteq. *At the moment these evaluators just throw exceptions when
> an argument is *null. **Have you considered making their behavior more
> SQL-like?* When the behavior is like this:
>
>- *gt, lt, gteq, lteq *evaluators return *null* if any of the arguments
>is *null*
>- *or(true, null)* returns *true*
>- *and(true, null)* returns *false*
>- *having* decorator treats *null* returned by *booleanEvaluator* as
>*false*
>
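The rules listed above could be sketched in plain Java as follows
(illustrative only; note that and(true, null) returning false follows this
proposal rather than SQL's UNKNOWN):

```java
public class ThreeValuedSketch {
    // gt propagates null, per the first rule (gteq/lt/lteq would be analogous).
    static Boolean gt(Integer a, Integer b) {
        return (a == null || b == null) ? null : a > b;
    }

    // or(true, null) -> true; any non-true argument counts as false.
    static Boolean or(Boolean a, Boolean b) {
        return Boolean.TRUE.equals(a) || Boolean.TRUE.equals(b);
    }

    // and(true, null) -> false, per the proposal above.
    static Boolean and(Boolean a, Boolean b) {
        return Boolean.TRUE.equals(a) && Boolean.TRUE.equals(b);
    }

    // having() treats a null predicate result as false.
    static boolean having(Boolean predicate) {
        return Boolean.TRUE.equals(predicate);
    }

    public static void main(String[] args) {
        System.out.println(gt(3, null));         // null
        System.out.println(or(true, null));      // true
        System.out.println(and(true, null));     // false
        System.out.println(having(gt(3, null))); // false
    }
}
```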
> 4. Some *inconsistencies in evaluators behavior* and/or documentation:
>
>- *div(1, null)* fails while *mult(1, null), add(1, null), sub(1, null)*
>return *null*. *Should I file a bug for div?*
>    - documentation for *not* says that "The function will fail to execute
>    if the parameter is non-boolean or null"; however, it returns *null* for
>    *not(null)*. *Should I create a task to fix the doc?*
>
> I know I mixed many questions into one thread, however for me they are all

Re: Solr Migration to The AWS Cloud

2019-06-06 Thread Jörn Franke
I guess you can do this by switching off the source data center, but you would
need to look more into your architecture, and especially the applications that
use Solr, to verify this.

It may look easy, but I would test it beforehand.

> Am 06.06.2019 um 17:24 schrieb Joe Lerner :
> 
> Ooohh...interesting. Then, presumably there is some way to have what was the
> cross-data-center replica become the new "primary"? 
> 
> It's getting too easy!
> 
> Joe
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Custom cache for Solr Cloud mode

2019-06-06 Thread abhishek


Thanks for the response.

Erick,
Are you suggesting downloading this file from ZooKeeper and uploading it after
changing it?

Mikhail,
Thanks. I will try the solrCore.SolrConfig.userCacheConfigs option.
Any idea why CoreContainer.getCores() would be returning an empty list for me?

(CoreAdminRequest.setAction(CoreAdminAction.STATUS);
CoreAdminRequest.process(solrClient); -> gives me the list of cores correctly)

-Abhishek




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html