Re: Multiple "df" fields

2020-08-12 Thread Edward Turner
Many thanks for your suggestions.

We do use edismax and bq fields to help with our result ranking, but we'd
never thought about using it for this purpose (we were stuck on the
copyfield pattern + df pattern). This is a good suggestion though, thank you.

We're now exploring the use of the pf field (thanks to Alexandre R. for
this) to automatically search on multiple fields, rather than relying on df.
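
For reference, an edismax request that searches multiple fields without relying on df can be sketched as below; the field names are placeholders, not a real schema, and each field keeps its own analyzer:

```python
from urllib.parse import urlencode

# Hypothetical field names; with edismax, qf lists the fields the bare
# query terms are matched against, so no df is needed.
params = {
    "defType": "edismax",
    "q": "value",            # no field prefix in the query itself
    "qf": "field1 field2",   # terms are matched against both fields
}
query_string = urlencode(params)
print(query_string)  # defType=edismax&q=value&qf=field1+field2
```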

Kind regards,

Edd

Edward Turner


On Tue, 11 Aug 2020 at 15:44, Erick Erickson 
wrote:

> Have you explored edismax?
>
> > On Aug 11, 2020, at 10:34 AM, Alexandre Rafalovitch 
> wrote:
> >
> > I can't remember if field aliasing works with df but it may be worth a
> try:
> >
> >
> https://lucene.apache.org/solr/guide/8_1/the-extended-dismax-query-parser.html#field-aliasing-using-per-field-qf-overrides
> >
> > Another example:
> >
> https://github.com/arafalov/solr-indexing-book/blob/master/published/languages/conf/solrconfig.xml
> >
> > Regards,
> >Alex
> >
> > On Tue., Aug. 11, 2020, 9:59 a.m. Edward Turner, 
> > wrote:
> >
> >> Hi all,
> >>
> >> Is it possible to have multiple "df" fields? (We think the answer is no
> >> because our experiments did not work when adding multiple "df" values to
> >> solrconfig.xml -- but we just wanted to double check with those who know
> >> better.) The reason we would like to do this is that we have two main
> field
> >> types (with different analyzers) and we'd like queries without a field
> to
> >> be searched over both of them. We could also use copyfields, but this
> would
> >> require us to have a common analyzer, which isn't exactly what we want.
> >>
> >> An alternative solution is to pre-process the query prior to sending it
> to
> >> Solr, so that queries with no field are changed as follows:
> >>
> >> q=value -> q=(field1:value OR field2:value)
> >>
> >> ... however, we feel a bit uncomfortable doing this though via String
> >> manipulation.
> >>
> >> Is there an obvious way we should tackle this problem that we are
> missing
> >> (e.g., which would be cleaner/safer and perhaps works at the Query
> object
> >> level)?
> >>
> >> Many thanks and best wishes,
> >>
> >> Edd
> >>
>
>


[Subquery] Transform Documents across Collections

2020-08-12 Thread Norbert Kutasi
Hello,

We have been using [subquery] to come up with arbitrarily complex hierarchies
in our document responses.

It works well as long as the documents are in the same collection. However,
based on the reference guide, we infer it should also be able to bring in
documents from a different collection, yet it throws an error:
https://lucene.apache.org/solr/guide/8_2/transforming-result-documents.html#subquery


We are on Solr 8.2, and in this sandbox we have a 2-node SolrCloud cluster,
where both collections have 1 shard and 2 NRT replicas, so that each node has
a core from each collection.
Basic authentication is enabled.

Simple steps to reproduce this issue in this 2 node environment:
./solr create -c Collection1 -s 1 -rf 2
./solr create -c Collection2 -s 1 -rf 2

Note: these collections are schemaless, but we observed the same behaviour
with collections that have explicit schemas.

Collection 1:
<add>
  <doc>
    <field name="id">1</field>
    <field name="first_name">John</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="first_name">Peter</field>
  </doc>
</add>

Collection 2:
<add>
  <doc>
    <field name="id">3</field>
    <field name="first_name">Thomas</field>
    <field name="reporting_to">2</field>
  </doc>
  <doc>
    <field name="id">4</field>
    <field name="first_name">Charles</field>
    <field name="reporting_to">1</field>
  </doc>
  <doc>
    <field name="id">5</field>
    <field name="first_name">Susan</field>
    <field name="reporting_to">3</field>
  </doc>
</add>


http://localhost:8983/solr/Collection1/query
{
  params: {
q: "*",
fq: "*",
rows: 5,
fl:"*,subordinate:[subquery fromIndex=Collection2]",
subordinate.fl:"*",
subordinate.q:"{!field f=reporting v=$row.id}",
subordinate.fq:"*",
subordinate.rows:"5"
  }
}

{
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
"msg":"while invoking subordinate:[subqueryfromIndex=Collection2] on
doc=SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS,
first_name=[stored,index",
"code":400}}


Where are we making a mistake?

Thank you in advance,
Norbert


Re: [Subquery] Transform Documents across Collections

2020-08-12 Thread Dominique Bejean
Hi Norbert,

The field name in collection2 is  "reporting_to" not "reporting".

Dominique





Re: [Subquery] Transform Documents across Collections

2020-08-12 Thread Norbert Kutasi
Hi Dominique,

Sorry, I was in a hurry to create a simple case similar to the one we face
internally.

reporting_to is indeed the right field, but the same error persists;
something seems to be wrong when invoking the *subquery* with *fromIndex*.

{
  params: {
q: "*",
fq: "*",
rows: 5,
fl:"*,subordinate:[subquery fromIndex=Collection2]",
subordinate.fl:"*",
subordinate.q:"{!field f=reporting_to v=$row.id}",
subordinate.fq:"*",
subordinate.rows:"5",
  }
}

{
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
"msg":"while invoking subordinate:[subqueryfromIndex=Collection2] on
doc=SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS,
first_name=[stored,index",
"code":400}}

Any help much appreciated, hopefully it's an error with the syntax I've
been using.

Regards,
Norbert

On Wed, 12 Aug 2020 at 12:49, Dominique Bejean 
wrote:

> Hi Norbert,
>
> The field name in collection2 is  "reporting_to" not "reporting".
>
> Dominique


Solr 8.3.1 - NullPointer during Autoscaling

2020-08-12 Thread Anton Pfennig
Hi guys,

in my Solr setup as SolrCloud (8.3.1) I’m using 5 nodes for one collection (say
“collection1”), one replica on each node.
Now I would like to add a new collection (say “collection2”) on the same Solr
cluster, but it should be replicated only on nodes with
sysprop.channel=mysysprop.

The whole setup runs on GKE (Kubernetes).

So if I scale out and add 5 more nodes with sysprop.channel correctly set, then
whenever the cluster gets a new node, autoscaling dies with a
NullPointerException while trying to fetch metrics from the new node (see logs
attached).

This is not an I/O issue, because the nodes and ZooKeeper can talk to each
other. And if I call /metrics on these new nodes, I also see the right
properties.

Also, I see no violations from the suggester.

Please help 😊

THX
Anton

Attached files:

  *   autoscaling.json
  *   Logs



{
  "policies":{"my-channel-policy":[{
"replica":"<100",
"shard":"#EACH",
"collection":"collection2",
"nodeset":{"sysprop.channel":"mychannel"}
}]},
  "cluster-preferences":[
{
  "minimize":"cores",
  "precision":1},
{"maximize":"freedisk"}],
  "triggers":{
".auto_add_replicas":{
  "name":".auto_add_replicas",
  "event":"nodeLost",
  "waitFor":120,
  "enabled":true,
  "actions":[
{
  "name":"auto_add_replicas_plan",
  "class":"solr.AutoAddReplicasPlanAction"},
{
  "name":"execute_plan",
  "class":"solr.ExecutePlanAction"}]},
".scheduled_maintenance":{
  "name":".scheduled_maintenance",
  "event":"scheduled",
  "startTime":"NOW",
  "every":"+1DAY",
  "enabled":true,
  "actions":[
{
  "name":"inactive_shard_plan",
  "class":"solr.InactiveShardPlanAction"},
{
  "name":"inactive_markers_plan",
  "class":"solr.InactiveMarkersPlanAction"},
{
  "name":"execute_plan",
  "class":"solr.ExecutePlanAction"}]},
"node_added_trigger":{
  "event":"nodeAdded",
  "waitFor":20,
  "enabled":true,
  "actions":[
{
  "name":"compute_plan",
  "class":"solr.ComputePlanAction"},
{
  "name":"execute_plan",
  "class":"solr.ExecutePlanAction"}]}},
  "listeners":{
".auto_add_replicas.system":{
  "beforeAction":[],
  "afterAction":[],
  "stage":[
"STARTED",
"ABORTED",
"SUCCEEDED",
"FAILED",
"BEFORE_ACTION",
"AFTER_ACTION",
"IGNORED"],
  "trigger":".auto_add_replicas",
  "class":"org.apache.solr.cloud.autoscaling.SystemLogListener"},
".scheduled_maintenance.system":{
  "beforeAction":[],
  "afterAction":[],
  "stage":[
"STARTED",
"ABORTED",
"SUCCEEDED",
"FAILED",
"BEFORE_ACTION",
"AFTER_ACTION",
"IGNORED"],
  "trigger":".scheduled_maintenance",
  "class":"org.apache.solr.cloud.autoscaling.SystemLogListener"},
"node_added_trigger.system":{
  "beforeAction":[],
  "afterAction":[],
  "stage":[
"STARTED",
"ABORTED",
"SUCCEEDED",
"FAILED",
"BEFORE_ACTION",
"AFTER_ACTION",
"IGNORED"],
  "trigger":"node_added_trigger",
  "class":"org.apache.solr.cloud.autoscaling.SystemLogListener"}},
  "properties":{}}



2020-07-29 14:54:52.372 WARN  (AutoscalingActionExecutor-8-thread-1) [   ] 
o.a.s.c.a.ScheduledTriggers Exception executing actions => 
org.apache.solr.cloud.autoscaling.TriggerActionException: Error processing 
action for trigger event: {
  "id":"ca5c53772b355Teaxi61tbu5rj7uqysz5vrcgll",
org.apache.solr.cloud.autoscaling.TriggerActionException: Error processing 
action for trigger event: {
  "id":"ca5c53772b355Teaxi61tbu5rj7uqysz5vrcgll",
  "source":"node_added_trigger",
  "eventTime":3559966177932117,
  "eventType":"NODEADDED",
  "properties":{
"eventTimes":[3559966177932117],
"preferredOperation":"movereplica",
"_enqueue_time_":3559967181279816,

"nodeNames":["solr-sede-v3-1.solr-headless-v3.search-preprod-europe-west4-b.svc.cluster.local:8983_solr"],
"replicaType":"NRT"}}
at 
org.apache.solr.cloud.autoscaling.ScheduledTriggers.lambda$null$3(ScheduledTriggers.java:327)
 ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
 ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: org.apache.solr.common.SolrException: Unexpected exception while 
processing event: {
  "id":"ca5c53772b355Teaxi61tbu5rj7uqysz5v

Re: Solr 8.3.1 - NullPointer during Autoscaling

2020-08-12 Thread Erick Erickson
Autoscaling may be overkill. Is this a one-time thing or something you need
automated? For a one-time event, it’s simpler/faster just to specify
createNodeSet with the CREATE command, listing the new machines you want the
collection to be placed on.

Note that there’s the special value EMPTY; if you use that, you can then
ADDREPLICA to build out your shards with the “node” parameter, placing each one
exactly where you want.
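
As a sketch of that approach (the node names below are hypothetical), the Collections API calls could be assembled like this:

```python
from urllib.parse import urlencode

# CREATE with createNodeSet pins the new collection to specific nodes
# (node names here are hypothetical placeholders).
create = {
    "action": "CREATE",
    "name": "collection2",
    "numShards": 1,
    "replicationFactor": 2,
    "createNodeSet": "node6:8983_solr,node7:8983_solr",
}
create_url = "/solr/admin/collections?" + urlencode(create)

# With createNodeSet=EMPTY, the collection starts with no replicas and
# ADDREPLICA's "node" parameter places each replica explicitly.
add_replica = {
    "action": "ADDREPLICA",
    "collection": "collection2",
    "shard": "shard1",
    "node": "node6:8983_solr",
}
add_replica_url = "/solr/admin/collections?" + urlencode(add_replica)
print(create_url)
print(add_replica_url)
```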

Best,
Erick


Re: Solr 8.3.1 - NullPointer during Autoscaling

2020-08-12 Thread Anton Pfennig
Hi Erick,

thx!
The idea behind it is to have a dedicated Kubernetes deployment for each
collection. So e.g. if I need more Solr nodes for a particular collection, I
would just scale the Kubernetes deployment and Solr should automatically add
new replicas on the new nodes.

Does that make sense?

Br,
Anton

On 12.08.20, 14:05, "Erick Erickson"  wrote:

Autoscaling may be overkill, Is this a one-time thing or something you need 
automated?
Because for a one-time event, it’s simpler/faster just to specify 
createNodeSet with the CREATE command that lists the new machines you want 
the
collection to be placed on.

Note that there’s the special value EMPTY, if you use that then you can 
ADDREPLICA
to build out your shards with the “node” parameter to place each one exactly
where you want.

Best,
Erick


Re: Solr 8.3.1 - NullPointer during Autoscaling

2020-08-12 Thread Erick Erickson
Yeah, unfortunately I don’t have much to offer when it comes to autoscaling….


Re: Multiple "df" fields

2020-08-12 Thread Erick Erickson
Probably a typo, but I think you mean qf rather than pf?

They’re both actually valid, but pf is “phrase field” which will give different 
results….
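
A minimal illustration of the difference (field names here are hypothetical): qf controls where individual query terms are matched, while pf only boosts documents where the terms appear together as a phrase.

```python
from urllib.parse import urlencode

# Hypothetical fields. qf: the terms "solr" and "cloud" are each matched
# in title and body. pf: documents whose title contains the whole phrase
# "solr cloud" get an extra boost, without changing which docs match.
params = {
    "defType": "edismax",
    "q": "solr cloud",
    "qf": "title body",
    "pf": "title",
}
query_string = urlencode(params)
print(query_string)
```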

Best,
Erick




Re: Managing leaders when recycling a cluster

2020-08-12 Thread Erick Erickson
There’s no particular need to do this unless you have a very large
number of leaders on a single node. That functionality was added
for a special case where there were 100s of leaders on the same
node.

The fact that a leader happens to be on a node that’s going away
shouldn’t matter at all; as soon as the node goes away a new
leader will be elected for that shard. I’d recommend that you shut
the Solr node that’s going away gracefully (i.e. something like
bin/solr stop) to make the election as smooth as possible, but even
that’s not required.
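
For completeness, the preferredLeader/REBALANCELEADERS route described in the quoted message can be sketched as two Collections API calls (collection, shard, and replica names below are hypothetical):

```python
from urllib.parse import urlencode

# Step 1: mark a specific replica as the preferred leader
# (collection/shard/replica names are hypothetical placeholders).
set_pref_url = "/solr/admin/collections?" + urlencode({
    "action": "ADDREPLICAPROP",
    "collection": "collection1",
    "shard": "shard1",
    "replica": "core_node4",
    "property": "preferredLeader",
    "property.value": "true",
})

# Step 2: ask Solr to move leaderships onto the preferred replicas.
rebalance_url = "/solr/admin/collections?" + urlencode({
    "action": "REBALANCELEADERS",
    "collection": "collection1",
})
print(set_pref_url)
print(rebalance_url)
```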

Best,
Erick

> On Aug 11, 2020, at 11:44 PM, Adam Woods  wrote:
> 
> Hi,
> 
> We've just recently gone through the process of upgrading Solr to 8.6 and
> have implemented an automated rolling update mechanism to allow us to more
> easily make changes to our cluster in the future.
> 
> Our process for this looks like this:
> 1. Cluster has 3 nodes.
> 2. Scale out to 6 nodes.
> 3. Protect the cluster overseer from scale in.
> 4. Scale in to 5 nodes.
> 5. Scale in to 4 nodes.
> 6. Expose the cluster overseer to scale in.
> 7. Scale in to 3 nodes.
> 
> When scaling in, the nodes are removed by the oldest first. Whenever we
> scale in or out, we ensure that the cluster reaches a state where it has
> the required number of active nodes, and each node contains an active
> replica for each collection.
> 
> It appears to work quite well. We were scaling down more than one node at a
> time previously, but we ran into this bug:
> https://issues.apache.org/jira/browse/SOLR-11208. Scaling down one at a
> time works around this for now.
> 
> We were wondering if we should be taking more care around managing the
> leaders of our collections during this process. Should we move the
> collection leaders across to the new nodes that were created as part of
> step 2 before we start removing the old nodes?
> 
> It looks like it's possible as Solr provides the ability to be able to do
> this by calling the REBALANCELEADERS api after setting preferredLeader=true
> on the replicas. Using this we could shift the leaders to the new nodes.
> 
> A thought I had while looking at the APIs available to set the
> preferredLeader property was that the BALANCESHARDUNIQUE api would be
> perfect for this scenario if it had the ability to limit the nodes to a
> specific set. Otherwise our option is to do this balancing logic ourselves
> and call the ADDREPLICAPROP api.
> 
> https://lucene.apache.org/solr/guide/8_6/cluster-node-management.html#balanceshardunique
> 
> Cheers,
> Adam



Re: [Subquery] Transform Documents across Collections

2020-08-12 Thread Erick Erickson
This works from a browser:
http://localhost:8981/solr/Collection1/query?q=*&fl=*,subordinate:[subquery]&subordinate.q=*:*&subordinate.fl=*&subordinate.collection=Collection2

One problem you’re having is that “fromIndex” is a _core_, not a collection. See:
https://lucene.apache.org/solr/guide/8_2/transforming-result-documents.html

It’s vaguely possible you could make it work by specifying something like
fromIndex=Collection2_shard1_replica_n1
if it was colocated on the node you’re querying, but you don’t want to go there…
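
Building that working request programmatically (using subordinate.collection rather than fromIndex; field names follow the sandbox in the thread) might look like this:

```python
from urllib.parse import urlencode

# Cross-collection [subquery]: pass the target via subordinate.collection
# (a collection name), not fromIndex (which names a core).
params = {
    "q": "*:*",
    "fl": "*,subordinate:[subquery]",
    "subordinate.collection": "Collection2",
    "subordinate.q": "{!field f=reporting_to v=$row.id}",
    "subordinate.fl": "*",
    "subordinate.rows": 5,
}
url = "http://localhost:8983/solr/Collection1/query?" + urlencode(params)
print(url)
```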

Best,
Erick

> On Aug 12, 2020, at 7:17 AM, Norbert Kutasi  wrote:
> 
> Hi Dominique,
> 
> Sorry, I was in a hurry to create a simple enough yet similar case that we
> face with internally.
> 
> reporting_to is indeed the right field, but the same error still persists;
> something is seemingly wrong when invoking the *subquery* with *fromIndex*
> 
> {
>  params: {
>q: "*",
>fq: "*",
>rows: 5,
> fl:"*,subordinate:[subquery fromIndex=Collection2]",
>subordinate.fl:"*",
>subordinate.q:"{!field f=reporting_to v=$row.id}",
>subordinate.fq:"*",
>subordinate.rows:"5",
>  }
> }
> 
> {
>  "error":{
>"metadata":[
>  "error-class","org.apache.solr.common.SolrException",
>  "root-error-class","org.apache.solr.common.SolrException"],
>"msg":"while invoking subordinate:[subqueryfromIndex=Collection2] on
> doc=SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS,
> first_name=[stored,index",
>"code":400}}
> 
> Any help much appreciated, hopefully it's an error with the syntax I've
> been using.
> 
> Regards,
> Norbert
> 
> On Wed, 12 Aug 2020 at 12:49, Dominique Bejean 
> wrote:
> 
>> Hi Norbert,
>> 
>> The field name in collection2 is  "reporting_to" not "reporting".
>> 
>> Dominique
>> 
>> 
>> 
>> Le mer. 12 août 2020 à 11:59, Norbert Kutasi  a
>> écrit :
>> 
>>> Hello,
>>> 
>>> We have been using [subquery] to come up with arbitrary complex
>> hierarchies
>>> in our document responses.
>>> 
>>> It works well as long as the documents are in the same collection; however,
>>> based on the reference guide I infer it can also bring in documents from
>>> different collections, except it throws an error.
>>> 
>>> 
>> https://lucene.apache.org/solr/guide/8_2/transforming-result-documents.html#subquery
>>> 
>>> 
>>> We are on SOLR 8.2 and in this sandbox we have a 2 node SOLRCloud
>> cluster,
>>> where both collections have 1 shard and 2 NRT replicas to ensure nodes
>> have
>>> a core from each collection.
>>> Basic Authorization enabled.
>>> 
>>> Simple steps to reproduce this issue in this 2 node environment:
>>> ./solr create -c Collection1 -s 1 -rf 2
>>> ./solr create -c Collection2 -s 1 -rf 2
>>> 
>>> Note: these collections are schemaless; however, we observed the same
>>> behavior with collections that have schemas.
>>> 
>>> Collection 1:
>>> 
>>>   
>>>  1
>>>  John
>>>   
>>>   
>>>  2
>>>  Peter
>>>   
>>> 
>>> 
>>> Collection 2:
>>> 
>>>   
>>>  3
>>>  Thomas
>>> 2
>>>   
>>>   
>>>  4
>>>  Charles
>>>  1
>>>   
>>>   
>>>  5
>>>  Susan
>>> 3
>>>   
>>> 
>>> 
>>> 
>>> http://localhost:8983/solr/Collection1/query
>>> {
>>>  params: {
>>>q: "*",
>>>fq: "*",
>>>rows: 5,
>>> fl:"*,subordinate:[subquery fromIndex=Collection2]",
>>>subordinate.fl:"*",
>>>subordinate.q:"{!field f=reporting v=$row.id}",
>>>subordinate.fq:"*",
>>>subordinate.rows:"5"
>>>  }
>>> }
>>> 
>>> {
>>>  "error":{
>>>"metadata":[
>>>  "error-class","org.apache.solr.common.SolrException",
>>>  "root-error-class","org.apache.solr.common.SolrException"],
>>>"msg":"while invoking subordinate:[subqueryfromIndex=Collection2] on
>>> 
>>> 
>> doc=SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS,
>>> first_name=[stored,index",
>>>"code":400}}
>>> 
>>> 
>>> Where do we make a mistake?
>>> 
>>> Thank you in advance,
>>> Norbert
>>> 
>> 



Re: [Subquery] Transform Documents across Collections

2020-08-12 Thread Norbert Kutasi
I see what you mean; however, the request results in a Cartesian product,
because of subordinate.q=*:* :
http://localhost:8981/solr/Collection1/query?q=*&fl=*,subordinate:[subquery]&subordinate.q=*:*&subordinate.fl=*&subordinate.collection=Collection2

{
  "responseHeader":{
"zkConnected":true,
"status":0,
"QTime":0,
"params":{
  "q":"*",
  "fl":"*,subordinate:[subquery]",
  "subordinate.fl":"*",
  "subordinate.collection":"Collection2",
  "subordinate.q":"*:*"}},
  "response":{"numFound":2,"start":0,"docs":[
  {
"id":"1",
"first_name":["John"],
"_version_":1674811207656144896,
"subordinate":{"numFound":3,"start":0,"docs":[
{
  "id":"3",
  "reporting_to":[2],
  "first_name":["Thomas"],
  "_version_":1674811297814806528},
{
  "id":"4",
  "reporting_to":[1],
  "first_name":["Charles"],
  "_version_":1674811297816903680},
{
  "id":"5",
  "reporting_to":[3],
  "first_name":["Susan"],
  "_version_":1674811297816903681}]
}},
  {
"id":"2",
"first_name":["Peter"],
"_version_":1674811207659290624,
"subordinate":{"numFound":3,"start":0,"docs":[
{
  "id":"3",
  "reporting_to":[2],
  "first_name":["Thomas"],
  "_version_":1674811297814806528},
{
  "id":"4",
  "reporting_to":[1],
  "first_name":["Charles"],
  "_version_":1674811297816903680},
{
  "id":"5",
  "reporting_to":[3],
  "first_name":["Susan"],
  "_version_":1674811297816903681}]
}}]
  }}


Once I add back the "join" criteria q={!fields f=reporting_to v=$row.id},
the error comes back...

http://localhost:8983/solr/Collection1/query?q=*&fl=*,subordinate:[subquery]&subordinate.fl=*&subordinate.collection=Collection2&subordinate.q={!fields
f=reporting_to v=$row.id}

{
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
"msg":"while invoking subordinate:[subquery] on
doc=SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS,
first_name=[stored,index",
"code":400}}


While I was writing an extensive response, just came across what seems to
be the solution:

http://localhost:8983/solr/Collection1/query?q=*&fl=*,subordinate:[subquery]&subordinate.fl=*&subordinate.collection=Collection2&subordinate.q={!term
f=reporting_to v=$row.id}

{
  "responseHeader":{
"zkConnected":true,
"status":0,
"QTime":1,
"params":{
  "json":"{\r\n  params: {\r\nq: \"*\",\r\nfq: \"*\",\r\n
 rows: 5,\r\n\tfl:\"*,subordinate:[subquery]\",\r\n
 subordinate.fl:\"*\",\r\nsubordinate.q:\"{!term f=reporting_to v=$
row.id}\",\r\nsubordinate.fq:\"*\",\r\nsubordinate.rows:\"5\",\r\n
   subordinate.collection:\"Collection2\"\r\n  }\r\n}\r\n\r\n\r\n\r\n"}},
  "response":{"numFound":2,"start":0,"docs":[
  {
"id":"1",
"first_name":["John"],
"_version_":1674811207656144896,
"subordinate":{"numFound":1,"start":0,"docs":[
{
  "id":"4",
  "reporting_to":[1],
  "first_name":["Charles"],
  "_version_":1674811297816903680}]
}},
  {
"id":"2",
"first_name":["Peter"],
"_version_":1674811207659290624,
"subordinate":{"numFound":1,"start":0,"docs":[
{
  "id":"3",
  "reporting_to":[2],
  "first_name":["Thomas"],
  "_version_":1674811297814806528}]
}}]
  }}

I don't remember when I changed it to !fields; the documentation had it
with !terms... which doesn't seem to work either:
q=name:john&fl=name,id,depts:[subquery]&depts.q={!terms
f=id *v=$row.dept_id*}&depts.rows=10

Erick, thanks for the suggestion of adding:
&subordinate.collection=Collection2

The solution is
http://torvmlnx03.temenosgroup.com:8983/solr/Collection1/query?q=*&fl=*,subordinate:[subquery]&subordinate.fl=*&subordinate.collection=Collection2&subordinate.q={!term%20f=reporting_to%20v=$row.id}
Regards,
Norbert


On Wed, 12 Aug 2020 at 14:41, Erick Erickson 
wrote:

> This works from a browser:
>
> http://localhost:8981/solr/Collection1/query?q=*&fl=*,subordinate:[subquery]&subordinate.q=*:*&subordinate.fl=*&subordinate.collection=Collection2
>
> One problem you’re having is that “fromIndex” is a _core_ not a
> collection. See:
> https://lucene.apache.org/solr/guide/8_2/transforming-result-documents.html
>
> It’s vaguely possible you could make it work by specifying something like
> fromIndex=Collection2_shard1_replica_n1
> if it was colocated on the node you’re querying, but you don’t want to go
> there…
>
> Be

Number of times in document

2020-08-12 Thread David Hastings
Is there any way to do a query for the minimum number of times a phrase or
string exists in a document? This has been a request from some users, as
other search services (names not to be mentioned) have such functionality.
I've been using Solr since 1.4, and I think I've tried finding this ability
before, but I'm pretty sure it's completely against the standard ranking
functionality. Still, I figured I would send out a feeler to ask whether this
is something that can be done.
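One possibility worth noting (not verified against the poster's setup, and limited to single terms rather than phrases): Solr's termfreq() function query combined with an {!frange} filter can select documents where a term occurs at least N times. A sketch of building such a filter (the helper function is hypothetical):

```python
# Sketch: select docs where a single term occurs at least `min_tf` times,
# using Solr's termfreq() function query inside an {!frange} filter.
# termfreq() takes a single indexed term, so phrases need other approaches.

def min_occurrence_fq(field, term, min_tf):
    """Build an fq keeping docs where termfreq(field, term) >= min_tf."""
    return "{!frange l=%d}termfreq(%s,'%s')" % (min_tf, field, term)

fq = min_occurrence_fq("text", "solr", 3)
print(fq)  # -> {!frange l=3}termfreq(text,'solr')
```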


Re: [Subquery] Transform Documents across Collections

2020-08-12 Thread Norbert Kutasi
The version it's working on is 8.5!


On Wed, 12 Aug 2020 at 17:16, Norbert Kutasi 
wrote:

> I see what you mean, however the request results in cartesian products ,
> because of subordinate.q=*:* :
>
> http://localhost:8981/solr/Collection1/query?q=*&fl=*,subordinate:[subquery]&subordinate.q=*:*&subordinate.fl=*&subordinate.collection=Collection2
>
> {
>   "responseHeader":{
> "zkConnected":true,
> "status":0,
> "QTime":0,
> "params":{
>   "q":"*",
>   "fl":"*,subordinate:[subquery]",
>   "subordinate.fl":"*",
>   "subordinate.collection":"Collection2",
>   "subordinate.q":"*:*"}},
>   "response":{"numFound":2,"start":0,"docs":[
>   {
> "id":"1",
> "first_name":["John"],
> "_version_":1674811207656144896,
> "subordinate":{"numFound":3,"start":0,"docs":[
> {
>   "id":"3",
>   "reporting_to":[2],
>   "first_name":["Thomas"],
>   "_version_":1674811297814806528},
> {
>   "id":"4",
>   "reporting_to":[1],
>   "first_name":["Charles"],
>   "_version_":1674811297816903680},
> {
>   "id":"5",
>   "reporting_to":[3],
>   "first_name":["Susan"],
>   "_version_":1674811297816903681}]
> }},
>   {
> "id":"2",
> "first_name":["Peter"],
> "_version_":1674811207659290624,
> "subordinate":{"numFound":3,"start":0,"docs":[
> {
>   "id":"3",
>   "reporting_to":[2],
>   "first_name":["Thomas"],
>   "_version_":1674811297814806528},
> {
>   "id":"4",
>   "reporting_to":[1],
>   "first_name":["Charles"],
>   "_version_":1674811297816903680},
> {
>   "id":"5",
>   "reporting_to":[3],
>   "first_name":["Susan"],
>   "_version_":1674811297816903681}]
> }}]
>   }}
>
>
> Once I add back the "join"criteria q={!fields f=reporting_to v=$row.id},
> the error comes back...
>
>
> http://localhost:8983/solr/Collection1/query?q=*&fl=*,subordinate:[subquery]&subordinate.fl=*&subordinate.collection=Collection2&subordinate.q={!fields
> f=reporting_to v=$row.id}
>
> {
>   "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"while invoking subordinate:[subquery] on 
> doc=SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS,
>  first_name=[stored,index",
> "code":400}}
>
>
> While I was writing an extensive response, just came across what seems to
> be the solution:
>
>
> http://localhost:8983/solr/Collection1/query?q=*&fl=*,subordinate:[subquery]&subordinate.fl=*&subordinate.collection=Collection2&subordinate.q={!term
> f=reporting_to v=$row.id}
>
> {
>   "responseHeader":{
> "zkConnected":true,
> "status":0,
> "QTime":1,
> "params":{
>   "json":"{\r\n  params: {\r\nq: \"*\",\r\nfq: \"*\",\r\n
>  rows: 5,\r\n\tfl:\"*,subordinate:[subquery]\",\r\n
>  subordinate.fl:\"*\",\r\nsubordinate.q:\"{!term f=reporting_to v=$
> row.id}\",\r\nsubordinate.fq:\"*\",\r\n
>  subordinate.rows:\"5\",\r\nsubordinate.collection:\"Collection2\"\r\n
>  }\r\n}\r\n\r\n\r\n\r\n"}},
>   "response":{"numFound":2,"start":0,"docs":[
>   {
> "id":"1",
> "first_name":["John"],
> "_version_":1674811207656144896,
> "subordinate":{"numFound":1,"start":0,"docs":[
> {
>   "id":"4",
>   "reporting_to":[1],
>   "first_name":["Charles"],
>   "_version_":1674811297816903680}]
> }},
>   {
> "id":"2",
> "first_name":["Peter"],
> "_version_":1674811207659290624,
> "subordinate":{"numFound":1,"start":0,"docs":[
> {
>   "id":"3",
>   "reporting_to":[2],
>   "first_name":["Thomas"],
>   "_version_":1674811297814806528}]
> }}]
>   }}
>
> I don't remember when did I change it to !fields, the documentation had it
> with !terms... which seems to be not working ether 
> q=name:john&fl=name,id,depts:[subquery]&depts.q={!terms
> f=id *v=$row.dept_id*}&depts.rows=10
>
> Erick, thanks for the suggestion of adding:
> &subordinate.collection=Collection2
>
> The solution is
> http://localhost:8983/solr/Collection1/query?q=*&fl=*,subordinate:[subquery]&subordinate.fl=*&subordinate.collection=Collection2&subordinate.q={!term%20f=reporting_to%20v=$row.id}
> Regards,
> Norbert
>
>
> On Wed, 12 Aug 2020 at 14:41, Erick Erickson 
> wrote:
>
>> This works from a browser:
>>
>> http://localhost:8981/solr/Collection1/query?q=*&fl=*,subordinate:[subquery]&subordinate.q=*:*&subordinate.fl=*&subordinate.collection=Collection2
>>
>> One problem you’re having

Managing leaders when recycling a cluster

2020-08-12 Thread Adam Woods
Hi,

We've just recently gone through the process of upgrading Solr to 8.6 and
have implemented an automated rolling update mechanism to allow us to more
easily make changes to our cluster in the future.

Our process for this looks like this:
1. Cluster has 3 nodes.
2. Scale out to 6 nodes.
3. Protect the cluster overseer from scale in.
4. Scale in to 5 nodes.
5. Scale in to 4 nodes.
6. Expose the cluster overseer to scale in.
7. Scale in to 3 nodes.

When scaling in, the nodes are removed by the oldest first. Whenever we
scale in or out, we ensure that the cluster reaches a state where it has
the required number of active nodes, and each node contains an active
replica for each collection.

It appears to work quite well. We were scaling down more than one node at a
time previously, but we ran into this bug:
https://issues.apache.org/jira/browse/SOLR-11208. Scaling down one at a
time works around this for now.

We were wondering if we should be taking more care around managing the
leaders of our collections during this process. Should we move the
collection leaders across to the new nodes that were created as part of
step 2 before we start removing the old nodes?

It looks like it's possible, as Solr provides the ability to do this by
calling the REBALANCELEADERS api after setting preferredLeader=true
on the replicas. Using this we could shift the leaders to the new nodes.

A thought I had while looking at the APIs available to set the
preferredLeader property was that the BALANCESHARDUNIQUE api would be
perfect for this scenario if it had the ability to limit the nodes to a
specific set. Otherwise our option is to do this balancing logic ourselves
and call the ADDREPLICAPROP api.

https://lucene.apache.org/solr/guide/8_6/cluster-node-management.html#balanceshardunique
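If we went the manual route, the two Collections API calls could be scripted roughly like this (host, collection, and replica names are illustrative; the helper functions are hypothetical):

```python
# Sketch: shift leadership onto chosen nodes by marking one replica per
# shard as preferredLeader (ADDREPLICAPROP), then asking Solr to
# rebalance leaders to honor that property (REBALANCELEADERS).

BASE = "http://localhost:8983/solr/admin/collections"

def add_preferred_leader_url(collection, shard, replica):
    """ADDREPLICAPROP call marking a replica as the preferred leader."""
    return ("%s?action=ADDREPLICAPROP&collection=%s&shard=%s"
            "&replica=%s&property=preferredLeader&property.value=true"
            % (BASE, collection, shard, replica))

def rebalance_leaders_url(collection):
    """REBALANCELEADERS call that tries to honor preferredLeader."""
    return "%s?action=REBALANCELEADERS&collection=%s" % (BASE, collection)

print(add_preferred_leader_url("collection1", "shard1", "core_node33"))
```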

Cheers,
Adam


Multiple Collections in a Alias.

2020-08-12 Thread Jae Joo
I have 10 collections in a single alias and get different result sets
every time with the same query.

Is it as designed or do I miss something?

The configuration and schema for all 10 collections are identical.
Thanks,

Jae


Re: Multiple Collections in a Alias.

2020-08-12 Thread Aroop Ganguly
Most likely you have 1 or more collections behind the alias that have replicas 
out of sync :) 

Try querying each collection to find the one out of sync.

> On Aug 12, 2020, at 10:47 AM, Jae Joo  wrote:
> 
> I have 10 collections in single alias and having different result sets for
> every time with the same query.
> 
> Is it as designed or do I miss something?
> 
> The configuration and schema for all 10 collections are identical.
> Thanks,
> 
> Jae



Re: Multiple Collections in a Alias.

2020-08-12 Thread Jae Joo
The replicas are all in sync, and there were no updates while I was
testing.


On Wed, Aug 12, 2020 at 1:49 PM Aroop Ganguly
 wrote:

> Most likely you have 1 or more collections behind the alias that have
> replicas out of sync :)
>
> Try querying each collection to find the one out of sync.
>
> > On Aug 12, 2020, at 10:47 AM, Jae Joo  wrote:
> >
> > I have 10 collections in single alias and having different result sets
> for
> > every time with the same query.
> >
> > Is it as designed or do I miss something?
> >
> > The configuration and schema for all 10 collections are identical.
> > Thanks,
> >
> > Jae
>
>


Re: Multiple Collections in a Alias.

2020-08-12 Thread Walter Underwood
Are the scores the same for the documents that are ordered differently?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Aug 12, 2020, at 10:55 AM, Jae Joo  wrote:
> 
> The replications are all synched and there are no updates while I was
> testing.
> 
> 
> On Wed, Aug 12, 2020 at 1:49 PM Aroop Ganguly
>  wrote:
> 
>> Most likely you have 1 or more collections behind the alias that have
>> replicas out of sync :)
>> 
>> Try querying each collection to find the one out of sync.
>> 
>>> On Aug 12, 2020, at 10:47 AM, Jae Joo  wrote:
>>> 
>>> I have 10 collections in single alias and having different result sets
>> for
>>> every time with the same query.
>>> 
>>> Is it as designed or do I miss something?
>>> 
>>> The configuration and schema for all 10 collections are identical.
>>> Thanks,
>>> 
>>> Jae
>> 
>> 



Re: Multiple Collections in a Alias.

2020-08-12 Thread Aroop Ganguly
Try a simple test of querying each collection 5 times in a row; if the numFound
differs for a single collection within those 5 calls, then you have it.
Please try it; what you may think is sync’d may actually not be. How do you
validate correct sync?

> On Aug 12, 2020, at 10:55 AM, Jae Joo  wrote:
> 
> The replications are all synched and there are no updates while I was
> testing.
> 
> 
> On Wed, Aug 12, 2020 at 1:49 PM Aroop Ganguly
>  wrote:
> 
>> Most likely you have 1 or more collections behind the alias that have
>> replicas out of sync :)
>> 
>> Try querying each collection to find the one out of sync.
>> 
>>> On Aug 12, 2020, at 10:47 AM, Jae Joo  wrote:
>>> 
>>> I have 10 collections in single alias and having different result sets
>> for
>>> every time with the same query.
>>> 
>>> Is it as designed or do I miss something?
>>> 
>>> The configuration and schema for all 10 collections are identical.
>>> Thanks,
>>> 
>>> Jae
>> 
>> 



Re: Multiple Collections in a Alias.

2020-08-12 Thread Jae Joo
numFound is the same, but the scores differ.

On Wed, Aug 12, 2020 at 6:01 PM Aroop Ganguly
 wrote:

> Try a simple test of querying each collection 5 times in a row, if the
> numFound are different for a single collection within tase 5 calls then u
> have it.
> Please try it, what you may think is sync’d may actually not be. How do
> you validate correct sync ?
>
> > On Aug 12, 2020, at 10:55 AM, Jae Joo  wrote:
> >
> > The replications are all synched and there are no updates while I was
> > testing.
> >
> >
> > On Wed, Aug 12, 2020 at 1:49 PM Aroop Ganguly
> >  wrote:
> >
> >> Most likely you have 1 or more collections behind the alias that have
> >> replicas out of sync :)
> >>
> >> Try querying each collection to find the one out of sync.
> >>
> >>> On Aug 12, 2020, at 10:47 AM, Jae Joo  wrote:
> >>>
> >>> I have 10 collections in single alias and having different result sets
> >> for
> >>> every time with the same query.
> >>>
> >>> Is it as designed or do I miss something?
> >>>
> >>> The configuration and schema for all 10 collections are identical.
> >>> Thanks,
> >>>
> >>> Jae
> >>
> >>
>
>


Re: Multiple Collections in a Alias.

2020-08-12 Thread Jae Joo
Good question. How can I validate if the replicas are all synched?


On Wed, Aug 12, 2020 at 7:28 PM Jae Joo  wrote:

> numFound  is same but different score.
> 
> 
> 
> 
> 
> 
> 
>
> On Wed, Aug 12, 2020 at 6:01 PM Aroop Ganguly
>  wrote:
>
>> Try a simple test of querying each collection 5 times in a row, if the
>> numFound are different for a single collection within tase 5 calls then u
>> have it.
>> Please try it, what you may think is sync’d may actually not be. How do
>> you validate correct sync ?
>>
>> > On Aug 12, 2020, at 10:55 AM, Jae Joo  wrote:
>> >
>> > The replications are all synched and there are no updates while I was
>> > testing.
>> >
>> >
>> > On Wed, Aug 12, 2020 at 1:49 PM Aroop Ganguly
>> >  wrote:
>> >
>> >> Most likely you have 1 or more collections behind the alias that have
>> >> replicas out of sync :)
>> >>
>> >> Try querying each collection to find the one out of sync.
>> >>
>> >>> On Aug 12, 2020, at 10:47 AM, Jae Joo  wrote:
>> >>>
>> >>> I have 10 collections in single alias and having different result sets
>> >> for
>> >>> every time with the same query.
>> >>>
>> >>> Is it as designed or do I miss something?
>> >>>
>> >>> The configuration and schema for all 10 collections are identical.
>> >>> Thanks,
>> >>>
>> >>> Jae
>> >>
>> >>
>>
>>


Re: Multiple Collections in a Alias.

2020-08-12 Thread Walter Underwood
Different absolute scores from different collections are OK, because
the exact values depend on the number of deleted documents.

For the set of documents that are in different orders from different
collections, are the scores of that set identical? If they are, then it
is normal to have a different order from different collections.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Aug 12, 2020, at 4:29 PM, Jae Joo  wrote:
> 
> Good question. How can I validate if the replicas are all synched?
> 
> 
> On Wed, Aug 12, 2020 at 7:28 PM Jae Joo  wrote:
> 
>> numFound  is same but different score.
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Wed, Aug 12, 2020 at 6:01 PM Aroop Ganguly
>>  wrote:
>> 
>>> Try a simple test of querying each collection 5 times in a row, if the
>>> numFound are different for a single collection within tase 5 calls then u
>>> have it.
>>> Please try it, what you may think is sync’d may actually not be. How do
>>> you validate correct sync ?
>>> 
 On Aug 12, 2020, at 10:55 AM, Jae Joo  wrote:
 
 The replications are all synched and there are no updates while I was
 testing.
 
 
 On Wed, Aug 12, 2020 at 1:49 PM Aroop Ganguly
  wrote:
 
> Most likely you have 1 or more collections behind the alias that have
> replicas out of sync :)
> 
> Try querying each collection to find the one out of sync.
> 
>> On Aug 12, 2020, at 10:47 AM, Jae Joo  wrote:
>> 
>> I have 10 collections in single alias and having different result sets
> for
>> every time with the same query.
>> 
>> Is it as designed or do I miss something?
>> 
>> The configuration and schema for all 10 collections are identical.
>> Thanks,
>> 
>> Jae
> 
> 
>>> 
>>> 



Re: Multiple Collections in a Alias.

2020-08-12 Thread Jae Joo
I found the root cause. I have 3 collections assigned to an alias, and one
of them is NOT synched.
By the alias:











Collection 1

Collection 2

Collection 3

On Wed, Aug 12, 2020 at 7:29 PM Jae Joo  wrote:

> Good question. How can I validate if the replicas are all synched?
>
>
> On Wed, Aug 12, 2020 at 7:28 PM Jae Joo  wrote:
>
>> numFound  is same but different score.
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>>
>> On Wed, Aug 12, 2020 at 6:01 PM Aroop Ganguly
>>  wrote:
>>
>>> Try a simple test of querying each collection 5 times in a row, if the
>>> numFound are different for a single collection within tase 5 calls then u
>>> have it.
>>> Please try it, what you may think is sync’d may actually not be. How do
>>> you validate correct sync ?
>>>
>>> > On Aug 12, 2020, at 10:55 AM, Jae Joo  wrote:
>>> >
>>> > The replications are all synched and there are no updates while I was
>>> > testing.
>>> >
>>> >
>>> > On Wed, Aug 12, 2020 at 1:49 PM Aroop Ganguly
>>> >  wrote:
>>> >
>>> >> Most likely you have 1 or more collections behind the alias that have
>>> >> replicas out of sync :)
>>> >>
>>> >> Try querying each collection to find the one out of sync.
>>> >>
>>> >>> On Aug 12, 2020, at 10:47 AM, Jae Joo  wrote:
>>> >>>
>>> >>> I have 10 collections in single alias and having different result
>>> sets
>>> >> for
>>> >>> every time with the same query.
>>> >>>
>>> >>> Is it as designed or do I miss something?
>>> >>>
>>> >>> The configuration and schema for all 10 collections are identical.
>>> >>> Thanks,
>>> >>>
>>> >>> Jae
>>> >>
>>> >>
>>>
>>>


Re: Multiple Collections in a Alias.

2020-08-12 Thread Aroop Ganguly
Glad you nailed the out-of-sync one :)

> On Aug 12, 2020, at 4:38 PM, Jae Joo  wrote:
> 
> I found it the root cause. I have 3 collections assigned to a alias and one
> of them are NOT synched.
> By the alias.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Collection 1
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Collection 2
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Collection 3
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Wed, Aug 12, 2020 at 7:29 PM Jae Joo  wrote:
> 
>> Good question. How can I validate if the replicas are all synched?
>> 
>> 
>> On Wed, Aug 12, 2020 at 7:28 PM Jae Joo  wrote:
>> 
>>> numFound  is same but different score.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Wed, Aug 12, 2020 at 6:01 PM Aroop Ganguly
>>>  wrote:
>>> 
 Try a simple test of querying each collection 5 times in a row, if the
 numFound are different for a single collection within tase 5 calls then u
 have it.
 Please try it, what you may think is sync’d may actually not be. How do
 you validate correct sync ?
 
> On Aug 12, 2020, at 10:55 AM, Jae Joo  wrote:
> 
> The replications are all synched and there are no updates while I was
> testing.
> 
> 
> On Wed, Aug 12, 2020 at 1:49 PM Aroop Ganguly
>  wrote:
> 
>> Most likely you have 1 or more collections behind the alias that have
>> replicas out of sync :)
>> 
>> Try querying each collection to find the one out of sync.
>> 
>>> On Aug 12, 2020, at 10:47 AM, Jae Joo  wrote:
>>> 
>>> I have 10 collections in single alias and having different result
 sets
>> for
>>> every time with the same query.
>>> 
>>> Is it as designed or do I miss something?
>>> 
>>> The configuration and schema for all 10 collections are identical.
>>> Thanks,
>>> 
>>> Jae
>> 
>> 
 
 



Re: Multiple Collections in a Alias.

2020-08-12 Thread Aroop Ganguly
There may be other ways; the easiest is to write a script that gets the cluster
status, and for each replica of each collection you will have these details:

"collections":{
  "collection1":{
    "pullReplicas":"0",
    "replicationFactor":"1",
    "shards":{
      "shard1":{
        "range":"8000-8ccb",
        "state":"active",
        "replicas":{"core_node33":{
            "core":"collection1_shard1_replica_n30",
            "base_url":"http://host:port/solr",
            "node_name":"host:port",
            "state":"active",
            "type":"NRT",
            "force_set_state":"false",
            "leader":"true"}}},

For each replica of each shard, make a localized call for the record count:
base_url/core/select?q=*:*&shard=shardX&distrib=false&rows=0
If you have replicas that disagree on the number of records per shard, then
you have an issue with replicas not being in sync for a collection.
This is what I meant when I said “replicas out of sync”.
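The comparison step of that check could be sketched like this (the fetch of per-replica counts is left abstract; the function name and data shape are hypothetical):

```python
# Sketch of the per-replica consistency check described above: query each
# replica core directly with distrib=false&rows=0, collect numFound per
# shard, and flag shards whose replicas disagree on the document count.

def out_of_sync_shards(counts_by_shard):
    """counts_by_shard: {shard: {replica_core: numFound}}.
    Return the shards whose replicas disagree on document count."""
    return [shard for shard, counts in counts_by_shard.items()
            if len(set(counts.values())) > 1]

counts = {
    "shard1": {"replica_n1": 1000, "replica_n2": 1000},
    "shard2": {"replica_n1": 500, "replica_n2": 498},  # replicas disagree
}
print(out_of_sync_shards(counts))  # -> ['shard2']
```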


Your situation was actually very simple :) one of your collections has less data.
You seem to have a sync requirement between collections, which is interesting,
but that's beyond Solr.
Your inter-collection sync script most likely needs some debugging :)




> On Aug 12, 2020, at 4:29 PM, Jae Joo  wrote:
> 
> Good question. How can I validate if the replicas are all synched?
> 
> 
> On Wed, Aug 12, 2020 at 7:28 PM Jae Joo  wrote:
> 
>> numFound  is same but different score.
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Wed, Aug 12, 2020 at 6:01 PM Aroop Ganguly
>>  wrote:
>> 
>>> Try a simple test of querying each collection 5 times in a row, if the
>>> numFound are different for a single collection within tase 5 calls then u
>>> have it.
>>> Please try it, what you may think is sync’d may actually not be. How do
>>> you validate correct sync ?
>>> 
 On Aug 12, 2020, at 10:55 AM, Jae Joo  wrote:
 
 The replications are all synched and there are no updates while I was
 testing.
 
 
 On Wed, Aug 12, 2020 at 1:49 PM Aroop Ganguly
  wrote:
 
> Most likely you have 1 or more collections behind the alias that have
> replicas out of sync :)
> 
> Try querying each collection to find the one out of sync.
> 
>> On Aug 12, 2020, at 10:47 AM, Jae Joo  wrote:
>> 
>> I have 10 collections in single alias and having different result sets
> for
>> every time with the same query.
>> 
>> Is it as designed or do I miss something?
>> 
>> The configuration and schema for all 10 collections are identical.
>> Thanks,
>> 
>> Jae
> 
> 
>>> 
>>> 



SPLITSHARD failed after running for hours

2020-08-12 Thread sanjay dutt
Hello Solr community,
We tried to split a shard of one collection which contains 80M documents. After
running for a few hours, it failed with the exception
org.apache.solr.common.SolrException.
Upon further investigation, I found the exception below:
Caused by: java.util.concurrent.RejectedExecutionException: Task 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1@4c995faa 
rejected from 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@4cea9dd8[Terminated,
 pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 
115769]: 

We just ended up with two more sub-shards, but with ZERO documents in them. Can
someone please explain what exactly may have happened here? And if we have to
try again, what should we keep in mind so that it will execute successfully?
Thanks,
Sanjay
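If retrying, one approach worth considering is running the split asynchronously and polling it, so the real failure is reported by REQUESTSTATUS rather than lost to an HTTP timeout. A sketch (host, collection, and request id are illustrative; the helpers are hypothetical):

```python
# Sketch: issue SPLITSHARD with async=<id> so the long-running split is
# tracked server-side, then poll REQUESTSTATUS for completion or the
# underlying failure message.

BASE = "http://localhost:8983/solr/admin/collections"

def splitshard_url(collection, shard, async_id):
    """Asynchronous SPLITSHARD call for one shard of a collection."""
    return ("%s?action=SPLITSHARD&collection=%s&shard=%s&async=%s"
            % (BASE, collection, shard, async_id))

def requeststatus_url(async_id):
    """Poll the status of a previously submitted async request."""
    return "%s?action=REQUESTSTATUS&requestid=%s" % (BASE, async_id)

print(splitshard_url("mycollection", "shard1", "split-1"))
```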



Solr and commits

2020-08-12 Thread Jayadevan Maymala
Hi all,

A few doubts about commits.

1) If no commit parameters are passed from a client (Solarium) update, will
the autoSoftCommit values automatically apply?
2) When we are not committing from the client, when will the data actually
be flushed to disk?

Regards,
Jayadevan
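For context, the behavior in question hinges on the autoCommit/autoSoftCommit settings in solrconfig.xml. A typical fragment (values are illustrative, not a recommendation) looks like this; broadly, autoCommit with openSearcher=false governs durability (hard commit, flushing index data to disk), autoSoftCommit governs visibility of new documents, and both fire on their timers regardless of whether the client sends commit parameters:

```xml
<!-- Illustrative values: hard commit (durability) every 60s without
     opening a new searcher; soft commit (visibility) every 5s. -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>
</autoSoftCommit>
```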