RE: CDCR performance issues

2018-03-09 Thread Davis, Daniel (NIH/NLM) [C]
These are general guidelines; I've done loads of networking, but may be less 
familiar with SolrCloud and CDCR architecture.  However, I know it's all TCP 
sockets, so general guidelines do apply.

Check the round-trip time between the data centers using ping or TCP ping.   
Throughput tests may be high, but if Solr has to wait for a response to a 
request before sending the next action, then just like any network protocol 
that does that, it will get slow.

I'm pretty sure CDCR uses HTTP/HTTPS rather than raw TCP, so also check 
whether some proxy/load balancer between data centers is forcing a 
single connection per operation.   That will *kill* performance.   Some proxies 
default to HTTP/1.0 (open, send request, server sends response, close), and that 
will hurt.

Why you should listen to me even without SolrCloud knowledge - check out the 
paper "Latency Performance of SOAP Implementations".   Same distribution of 
skills - I knew TCP well, but Apache Axis 1.1 not so well.   I still improved 
the response time of Apache Axis 1.1 by 250 ms per call with one line of code.

-Original Message-
From: Tom Peters [mailto:tpet...@synacor.com] 
Sent: Wednesday, March 7, 2018 6:19 PM
To: solr-user@lucene.apache.org
Subject: CDCR performance issues

I'm having issues with the target collection staying up-to-date with indexing 
from the source collection using CDCR.
 
This is what I'm getting back in terms of OPS:

curl -s 'solr2-a:8080/solr/mycollection/cdcr?action=OPS' | jq .
{
  "responseHeader": {
    "status": 0,
    "QTime": 0
  },
  "operationsPerSecond": [
    "zook01,zook02,zook03/solr",
    [
      "mycollection",
      [
        "all",
        49.10140553500938,
        "adds",
        10.27612635309587,
        "deletes",
        38.82527896994054
      ]
    ]
  ]
}
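
As an aside, the `operationsPerSecond` payload above is a flattened Solr NamedList: a JSON array of alternating names and values rather than an object. A quick sketch of folding it into nested dicts (plain Python, no Solr required; the helper name is illustrative, not a Solr API):

```python
import json

# CDCR's ?action=OPS response serializes Solr NamedLists as flat
# [name, value, name, value, ...] JSON arrays. Fold them into dicts.
def named_list_to_dict(nl):
    if isinstance(nl, list):
        return {nl[i]: named_list_to_dict(nl[i + 1]) for i in range(0, len(nl), 2)}
    return nl

raw = """
{
  "responseHeader": {"status": 0, "QTime": 0},
  "operationsPerSecond": [
    "zook01,zook02,zook03/solr",
    ["mycollection",
     ["all", 49.10140553500938,
      "adds", 10.27612635309587,
      "deletes", 38.82527896994054]]
  ]
}
"""

ops = named_list_to_dict(json.loads(raw)["operationsPerSecond"])
print(ops["zook01,zook02,zook03/solr"]["mycollection"])
```

Note that the rates above are internally consistent: adds + deletes comes out to roughly the "all" figure of ~49.1 ops/s.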

The source and target collections are in separate data centers.

Doing a network test between the leader node in the source data center and the 
ZooKeeper nodes in the target data center show decent enough network 
performance: ~181 Mbit/s

I've tried playing around with the "batchSize" value (128, 512, 728, 1000, 
2000, 2500) and they haven't made much of a difference.

Any suggestions on potential settings to tune to improve the performance?

Thanks

--

Here are some relevant log lines from the source data center's leader:

2018-03-07 23:16:11.984 INFO  
(cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
2018-03-07 23:16:23.062 INFO  
(cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 510 updates to target mycollection
2018-03-07 23:16:32.063 INFO  
(cdcr-replicator-207-thread-5-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
2018-03-07 23:16:36.209 INFO  
(cdcr-replicator-207-thread-1-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
2018-03-07 23:16:42.091 INFO  
(cdcr-replicator-207-thread-2-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
2018-03-07 23:16:46.790 INFO  
(cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
2018-03-07 23:16:50.004 INFO  
(cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr 
x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
[c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection


And what the log looks like in the target:

2018-03-07 23:18:46.475 INFO  (qtp1595212853-26) [c:mycollection s:shard1 
r:core_node2 x:mycollection_shard1_replica_n1] o.a.s.c.S.Request 
[mycollection_shard1_replica_n1]  webapp=/solr path=/update 
params={_stateVer_=mycollection:30&_version_=-1594317067896487950&cdcr.update=&wt=javabin&version=2}
 status=0 QTime=0
2018-03-07 23:18:46.500 IN

Setting Up Solr Authentication/Authorization

2018-03-09 Thread Terry Steichen
I'm trying to set up basic authentication/authorization with solr 6.6.0.

The documentation says to create a security.json file and describes the
content as:

{
"authentication":{
   "class":"solr.BasicAuthPlugin",
   "credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= 
Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
},
"authorization":{
   "class":"solr.RuleBasedAuthorizationPlugin",
   "permissions":[{"name":"security-edit",
  "role":"admin"}],
   "user-role":{"solr":"admin"}
}}

Does that mean to literally use exactly the above as the security.json content, 
or customize it (in some fashion)?

The documentation also mentions that the initial admin user is named "solr" 
with the password "SolrRocks". What's unclear is whether that's the password 
from which the hash (in security.json) was created, or something else.

What I can't figure out is whether the password hash is fixed, or whether it 
should be generated, and if so, how?
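
(For reference: the scheme commonly described for solr.BasicAuthPlugin credentials is a salted double SHA-256, where the stored value is base64(sha256(sha256(salt + password))) followed by a space and base64(salt), and the hash shipped in the documented example corresponds to the password "SolrRocks". Treat the exact details as an assumption to verify against the Solr 6.6 docs; a sketch for generating a fresh entry:)

```python
import base64
import hashlib
import os

def solr_basicauth_credential(password, salt=None):
    # Assumed scheme (verify against your Solr version's docs):
    #   hash = SHA-256(SHA-256(salt || utf8(password)))
    # stored as "base64(hash) base64(salt)".
    salt = salt if salt is not None else os.urandom(32)
    h = hashlib.sha256(salt + password.encode("utf-8")).digest()
    h = hashlib.sha256(h).digest()
    return base64.b64encode(h).decode() + " " + base64.b64encode(salt).decode()

# Generate a credential entry for a user, then paste it into
# security.json under "credentials" in place of the default:
print(solr_basicauth_credential("my-new-password"))
```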

Also, some people on the web recommend altering the Jetty XML files to do this 
- is that necessary too?

I'm certain this is fairly simple once I can get started - but I'm having 
trouble getting past step 1, and any help would be appreciated.

Terry



Re: HDInsight with Solr 4.9.0 Create Collection

2018-03-09 Thread Abhi Basu
Thanks for the reply, this really helped me.

For Solr 4.9, what is the actual zkcli command to upload config?

java -classpath example/solr-webapp/WEB-INF/lib/*
 org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983
 -confdir example/solr/collection1/conf -confname conf1 -solrhome
example/solr

OR

./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:9983 -cmd
upconfig -confname my_new_config -confdir
server/solr/configsets/basic_configs/conf

I don't know why HDP/HDInsight doesn't provide something like the solrctl
commands to make life easier for all!




On Thu, Mar 8, 2018 at 5:43 PM, Shawn Heisey  wrote:

> On 3/8/2018 1:26 PM, Abhi Basu wrote:
> > I'm in a bind. Added Solr 4.9.0 to HDInsight cluster and find no Solrctl
> > commands installed. So, I am doing the following to create a collection.
>
> This 'solrctl' command is NOT part of Solr.  Google tells me it's part
> of software from Cloudera.
>
> You need to talk to Cloudera for support on that software.
>
> > I have my collection schema in a location:
> >
> > /home/sshuser/abhi/ems-collection/conf
> >
> > Using this command to create a collection:
> >
> > http://headnode1:8983/solr/admin/cores?action=CREATE&name=ems-collection&instanceDir=/home/sshuser/abhi/ems-collection/conf
>
> You're using the term "collection".  And later you mention ZooKeeper. So
> you're almost certainly running in SolrCloud mode.  If your Solr is
> running in SolrCloud mode, do not try to use the CoreAdmin API
> (/solr/admin/cores).  Use the Collections API instead.  But before that,
> you need to get the configuration into ZooKeeper.  For standard Solr
> without Cloudera's tools, you would typically use the "zkcli" script
> (either zkcli.sh or zkcli.bat).  See page 376 of the reference guide for
> that specific version of Solr for help with the "upconfig" command for
> that script:
>
> http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.9.pdf
>
> > I guess i need to register my config name with Zk. How do I register the
> > collection schema with Zookeeper?
> >
> > Is there way to bypass the registration with zk and build the collection
> > directly from my schema files at that folder location, like I was able to
> > do in Solr 4.10 in CDH 5.14:
> >
> > solrctl --zk hadoop-dn6.eso.local:2181/solr instancedir --create
> > ems-collection /home/sshuser/abhi/ems-collection/
> >
> > solrctl --zk hadoop-dn6.eso.local:2181/solr collection --create
> > ems-collection -s 3 -r 2
>
> The solrctl command is not something we can help you with on this
> mailing list.  Cloudera customizes Solr to the point where only they are
> able to really provide support for their version.  Your best bet will be
> to talk to Cloudera.
>
> When Solr is running with ZooKeeper, it's in SolrCloud mode.  In
> SolrCloud mode, you cannot create cores in the same way that you can in
> standalone mode -- you MUST create collections, and all configuration
> will be in zookeeper, not on the disk.
>
> Thanks,
> Shawn
>
>


-- 
Abhi Basu


Re: HDInsight with Solr 4.9.0 Create Collection

2018-03-09 Thread Abhi Basu
Ok, so I tried the following:

/usr/hdp/current/solr/example/scripts/cloud-scripts/zkcli.sh -cmd upconfig
-zkhost zk0-esohad.mzwz3dh4pb1evcdwc1lcsddrbe.jx.internal.cloudapp.net:2181
-confdir /home/sshuser/abhi/ems-collection/conf -confname ems-collection

And got this exception:
java.lang.IllegalArgumentException: Illegal directory:
/home/sshuser/abhi/ems-collection/conf


On Fri, Mar 9, 2018 at 10:43 AM, Abhi Basu <9000r...@gmail.com> wrote:

> Thanks for the reply, this really helped me.
>
> For Solr 4.9, what is the actual zkcli command to upload config?
>
> java -classpath example/solr-webapp/WEB-INF/lib/*
>  org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983
>  -confdir example/solr/collection1/conf -confname conf1 -solrhome
> example/solr
>
> OR
>
> ./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:9983 -cmd
> upconfig -confname my_new_config -confdir server/solr/configsets/basic_
> configs/conf
>
> I dont know why HDP/HDInsight does not provide something like solrctl
> commands to make life easier for all!
>
>
>
>
> On Thu, Mar 8, 2018 at 5:43 PM, Shawn Heisey  wrote:
>
>> On 3/8/2018 1:26 PM, Abhi Basu wrote:
>> > I'm in a bind. Added Solr 4.9.0 to HDInsight cluster and find no Solrctl
>> > commands installed. So, I am doing the following to create a collection.
>>
>> This 'solrctl' command is NOT part of Solr.  Google tells me it's part
>> of software from Cloudera.
>>
>> You need to talk to Cloudera for support on that software.
>>
>> > I have my collection schema in a location:
>> >
>> > /home/sshuser/abhi/ems-collection/conf
>> >
>> > Using this command to create a collection:
>> >
>> > http://headnode1:8983/solr/admin/cores?action=CREATE&name=ems-collection&instanceDir=/home/sshuser/abhi/ems-collection/conf
>>
>> You're using the term "collection".  And later you mention ZooKeeper. So
>> you're almost certainly running in SolrCloud mode.  If your Solr is
>> running in SolrCloud mode, do not try to use the CoreAdmin API
>> (/solr/admin/cores).  Use the Collections API instead.  But before that,
>> you need to get the configuration into ZooKeeper.  For standard Solr
>> without Cloudera's tools, you would typically use the "zkcli" script
>> (either zkcli.sh or zkcli.bat).  See page 376 of the reference guide for
>> that specific version of Solr for help with the "upconfig" command for
>> that script:
>>
>> http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.9.pdf
>>
>> > I guess i need to register my config name with Zk. How do I register the
>> > collection schema with Zookeeper?
>> >
>> > Is there way to bypass the registration with zk and build the collection
>> > directly from my schema files at that folder location, like I was able
>> to
>> > do in Solr 4.10 in CDH 5.14:
>> >
>> > solrctl --zk hadoop-dn6.eso.local:2181/solr instancedir --create
>> > ems-collection /home/sshuser/abhi/ems-collection/
>> >
>> > solrctl --zk hadoop-dn6.eso.local:2181/solr collection --create
>> > ems-collection -s 3 -r 2
>>
>> The solrctl command is not something we can help you with on this
>> mailing list.  Cloudera customizes Solr to the point where only they are
>> able to really provide support for their version.  Your best bet will be
>> to talk to Cloudera.
>>
>> When Solr is running with ZooKeeper, it's in SolrCloud mode.  In
>> SolrCloud mode, you cannot create cores in the same way that you can in
>> standalone mode -- you MUST create collections, and all configuration
>> will be in zookeeper, not on the disk.
>>
>> Thanks,
>> Shawn
>>
>>
>
>
> --
> Abhi Basu
>



-- 
Abhi Basu


Re: Matching Queries with Wildcards and Numbers

2018-03-09 Thread rakeshaspl
Hi,
Did you find any solution for the above issue?
Br,
Rakesh



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Matching Queries with Wildcards and Numbers

2018-03-09 Thread rakeshaspl
Did you find any solution for the above issue?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: HDInsight with Solr 4.9.0 Create Collection

2018-03-09 Thread Abhi Basu
That was due to a folder not being present. Is this something to do with
version?

http://hn0-esohad.mzwz3dh4pb1evcdwc1lcsddrbe.jx.internal.cloudapp.net:8983/solr/admin/collections?action=CREATE&name=ems-collection2&numShards=2&replicationFactor=2&maxShardsPerNode=1


org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
CREATEing SolrCore 'ems-collection2_shard2_replica2': Unable to create
core: ems-collection2_shard2_replica2 Caused by: No enum constant
org.apache.lucene.util.Version.4.10.3

On Fri, Mar 9, 2018 at 11:11 AM, Abhi Basu <9000r...@gmail.com> wrote:

> Ok, so I tried the following:
>
> /usr/hdp/current/solr/example/scripts/cloud-scripts/zkcli.sh -cmd
> upconfig -zkhost zk0-esohad.mzwz3dh4pb1evcdwc1lcsddrbe.jx.
> internal.cloudapp.net:2181 -confdir /home/sshuser/abhi/ems-collection/conf
> -confname ems-collection
>
> And got this exception:
> java.lang.IllegalArgumentException: Illegal directory:
> /home/sshuser/abhi/ems-collection/conf
>
>
> On Fri, Mar 9, 2018 at 10:43 AM, Abhi Basu <9000r...@gmail.com> wrote:
>
>> Thanks for the reply, this really helped me.
>>
>> For Solr 4.9, what is the actual zkcli command to upload config?
>>
>> java -classpath example/solr-webapp/WEB-INF/lib/*
>>  org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983
>>  -confdir example/solr/collection1/conf -confname conf1 -solrhome
>> example/solr
>>
>> OR
>>
>> ./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:9983 -cmd
>> upconfig -confname my_new_config -confdir server/solr/configsets/basic_c
>> onfigs/conf
>>
>> I dont know why HDP/HDInsight does not provide something like solrctl
>> commands to make life easier for all!
>>
>>
>>
>>
>> On Thu, Mar 8, 2018 at 5:43 PM, Shawn Heisey  wrote:
>>
>>> On 3/8/2018 1:26 PM, Abhi Basu wrote:
>>> > I'm in a bind. Added Solr 4.9.0 to HDInsight cluster and find no
>>> Solrctl
>>> > commands installed. So, I am doing the following to create a
>>> collection.
>>>
>>> This 'solrctl' command is NOT part of Solr.  Google tells me it's part
>>> of software from Cloudera.
>>>
>>> You need to talk to Cloudera for support on that software.
>>>
>>> > I have my collection schema in a location:
>>> >
>>> > /home/sshuser/abhi/ems-collection/conf
>>> >
>>> > Using this command to create a collection:
>>> >
>>> > http://headnode1:8983/solr/admin/cores?action=CREATE&name=ems-collection&instanceDir=/home/sshuser/abhi/ems-collection/conf
>>>
>>> You're using the term "collection".  And later you mention ZooKeeper. So
>>> you're almost certainly running in SolrCloud mode.  If your Solr is
>>> running in SolrCloud mode, do not try to use the CoreAdmin API
>>> (/solr/admin/cores).  Use the Collections API instead.  But before that,
>>> you need to get the configuration into ZooKeeper.  For standard Solr
>>> without Cloudera's tools, you would typically use the "zkcli" script
>>> (either zkcli.sh or zkcli.bat).  See page 376 of the reference guide for
>>> that specific version of Solr for help with the "upconfig" command for
>>> that script:
>>>
>>> http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.9.pdf
>>>
>>> > I guess i need to register my config name with Zk. How do I register
>>> the
>>> > collection schema with Zookeeper?
>>> >
>>> > Is there way to bypass the registration with zk and build the
>>> collection
>>> > directly from my schema files at that folder location, like I was able
>>> to
>>> > do in Solr 4.10 in CDH 5.14:
>>> >
>>> > solrctl --zk hadoop-dn6.eso.local:2181/solr instancedir --create
>>> > ems-collection /home/sshuser/abhi/ems-collection/
>>> >
>>> > solrctl --zk hadoop-dn6.eso.local:2181/solr collection --create
>>> > ems-collection -s 3 -r 2
>>>
>>> The solrctl command is not something we can help you with on this
>>> mailing list.  Cloudera customizes Solr to the point where only they are
>>> able to really provide support for their version.  Your best bet will be
>>> to talk to Cloudera.
>>>
>>> When Solr is running with ZooKeeper, it's in SolrCloud mode.  In
>>> SolrCloud mode, you cannot create cores in the same way that you can in
>>> standalone mode -- you MUST create collections, and all configuration
>>> will be in zookeeper, not on the disk.
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
>>
>>
>> --
>> Abhi Basu
>>
>
>
>
> --
> Abhi Basu
>



-- 
Abhi Basu


Re: Matching Queries with Wildcards and Numbers

2018-03-09 Thread tapan1707
Hello Rakesh, 
As pointed out by Erick, changing *catenateAll* from 0 to 1 should work.
What this means is that generateWordParts="1" splits a token into its word
parts; e.g., i-pad yields i and pad, with the catenation options adding the
joined token ipad. Likewise, generateNumberParts="1" splits number parts;
e.g., 88-77 yields 88 and 77, with 8877 added by catenation.
So when using catenateAll="1", Solr would generate the token Sidem2 (the query
asked about in the original post).
Also, as Erick already pointed out, you have to reindex the documents so that
the changed tokenization is reflected in the index.
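
As a rough illustration only (this is not Solr's WordDelimiterFilter code, just a sketch of the splitting and catenation rules described above):

```python
import re

def word_delimiter_tokens(token, catenate_all=False):
    # Split on non-alphanumeric delimiters and on letter<->digit
    # transitions, roughly as generateWordParts/generateNumberParts do;
    # optionally append the fully catenated token, roughly catenateAll="1".
    parts = [p for p in re.split(
        r"[^A-Za-z0-9]+|(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])",
        token) if p]
    if catenate_all:
        joined = "".join(parts)
        if joined not in parts:
            parts.append(joined)
    return parts

print(word_delimiter_tokens("i-pad", catenate_all=True))   # ['i', 'pad', 'ipad']
print(word_delimiter_tokens("Sidem2", catenate_all=True))  # ['Sidem', '2', 'Sidem2']
```

With catenate_all=False, the joined tokens ipad and Sidem2 would not be produced, which is why a query for the whole term fails until the setting is changed and the documents are reindexed.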



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Altering the query if query contains all stopwods

2018-03-09 Thread tapan1707
Hello Ryan,
Solr has a filter called solr.SuggestStopFilterFactory, which works much like
solr.StopFilterFactory with one modification: if all of the words are present
in stopwords.txt, it won't remove the last one. 
I am not sure about wildcard search, but if all of the query tokens are in
stopwords.txt, then at the very least it won't return zero results (assuming
that search results for the last word exist).  
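
A toy sketch of the behavior described (illustrative only, not Solr's implementation): drop stopwords everywhere except the final position, so an all-stopword query still has one term left to match:

```python
def suggest_stop_filter(tokens, stopwords):
    # Remove stopwords from all but the final position; the last token
    # is kept even if it is a stopword, mirroring the behavior described
    # above (the user may still be typing it).
    if not tokens:
        return []
    head = [t for t in tokens[:-1] if t.lower() not in stopwords]
    return head + tokens[-1:]

stop = {"the", "a", "of"}
print(suggest_stop_filter(["the", "lord", "of", "the"], stop))  # ['lord', 'the']
print(suggest_stop_filter(["the", "a"], stop))                  # ['a']
```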



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


LTR Model size

2018-03-09 Thread Roopa Rao
What is the way to configure the model size for LTR? We have a model of about
3 MB, and Solr is not holding it as a ManagedResource.

How can this be configured?

Thanks,
Roopa


Re: LTR Model size

2018-03-09 Thread tapan1707
Could you elaborate a little more? Otherwise, I think you might be
experiencing the same issue reported in
https://issues.apache.org/jira/browse/SOLR-11049 .
The default ZooKeeper znode size limit is about 1 MB, so it might not be able
to handle the size of your model.
Correct me if I misunderstood anything.
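
In other words, the default cap is easy to check against: jute.maxbuffer defaults to just under 1 MiB, so a ~3 MB serialized model won't fit in a single znode unless the limit is raised on both the ZooKeeper servers and clients (the exact default and framing overhead are treated as assumptions here):

```python
# ZooKeeper's documented default for jute.maxbuffer (0xfffff bytes,
# just under 1 MiB); treat the exact value as an assumption to verify.
DEFAULT_JUTE_MAXBUFFER = 0xfffff

def fits_in_default_znode(payload_bytes, limit=DEFAULT_JUTE_MAXBUFFER):
    # A serialized blob larger than jute.maxbuffer cannot be written to
    # (or read from) a single znode; request framing overhead ignored.
    return payload_bytes < limit

print(fits_in_default_znode(3 * 1024 * 1024))  # -> False: a ~3 MB LTR model
print(fits_in_default_znode(200 * 1024))       # -> True: a small model is fine
```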



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Large number of HTTP requests to Solr-5.2.1 throwing errors

2018-03-09 Thread Shawn Heisey
On 3/8/2018 1:21 PM, Deeksha Sharma wrote:
> Version of Solr : Solr-5.2.1
> I am sending large number of HTTP GET requests to Solr server for querying 
> indexes. These requests to Solr are generated via a Node.js service.
>
> When the number of requests to Solr are ~250, I am intermittently facing 
> these kinds of issues:
>
>   *   Some times Socket hang up
>   *   Sometimes Solr-server doesn’t respond and send back HTTP 500.
>   *   At times I have received error of ECONNRESET -> indicating that TCP 
> connection on the other side closed abruptly.
>
> I wanted to know if there is a way to indicate to Solr that it needs to keep 
> the connective alive or there is a way to send solr server large number of 
> query requests.

We need to know exactly what is happening from Solr's point of view. 
Can you please share your solr.log file, making sure that it covers the
time of the errors?  Chances are very good that if you try to attach
that file to an email reply that the mailing list will eat the
attachment, so it's better if you use a paste website or a file sharing
website and give us a URL for accessing it.  Dropbox and Gist tend to be
good choices.

Whenever Solr sends back an error response, like a 500, there should be
something in the solr.log that provides more detail about what went wrong.

There is not typically anything in the log file that's particularly
sensitive, but if you do find something you don't want to share, feel
free to redact it.  But take care to ensure that the redaction is as
small as possible, and that it is possible for us to tell one bit of
redacted information apart from others.

Thanks,
Shawn



Re: LTR Model size

2018-03-09 Thread Roopa Rao
Thank you, this is exactly what I am facing.

There is a mention of increasing jute.maxbuffer in the Jira I will try that
option.

Thanks,
Roopa

On Fri, Mar 9, 2018 at 1:30 PM, tapan1707  wrote:

> Could you elaborate a little bit more? Otherwise, I think that you might be
> experiencing the same issue reported in
> https://issues.apache.org/jira/browse/SOLR-11049 .
> Default zookeeper znode limit is 1mb, so I think it might not be able to
> handle the size of your model.
> Correct me if I misunderstood anything.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: LTR Model size

2018-03-09 Thread spoonerk
Please unsubscribe me.  I have tried and tried but still get emails

On Mar 9, 2018 10:19 AM, "Roopa Rao"  wrote:

> what is the way to configure the model size for LTR? We have about 3MB
> model and Solr is not holding this model as a ManagedResource.
>
> How can this be configured?
>
> Thanks,
> Roopa
>


Re: CDCR performance issues

2018-03-09 Thread Tom Peters
Thanks. This was helpful. I did some tcpdumps and I'm noticing that the 
requests to the target data center are not batched in any way. Each update 
comes in as an independent update. Some follow-up questions:

1. Is it accurate that updates are not actually batched in transit from the 
source to the target and instead each document is posted separately?

2. Are they done synchronously? I assume yes (since you wouldn't want 
operations applied out of order)

3. If they are done synchronously, and are not batched in any way, does that 
mean that the best performance I can expect would be roughly how long it takes 
to round-trip a single document? I.e., if my average ping is 25ms, then I can 
expect a peak performance of roughly 40 ops/s.
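
For reference, the back-of-the-envelope ceiling in question 3 is just batch_size / RTT for a strictly synchronous request/response loop:

```python
def max_ops_per_sec(rtt_ms, batch_size=1):
    # A synchronous request/response loop pays at least one round trip
    # per batch, so throughput is capped at batch_size / RTT.
    return batch_size / (rtt_ms / 1000.0)

print(max_ops_per_sec(25))       # ~40 ops/s: one doc per round trip at 25 ms
print(max_ops_per_sec(25, 512))  # ~20k ops/s if 512 updates share a round trip
```

This is why per-document synchronous forwarding, rather than raw bandwidth, would explain the observed rates.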

Thanks



> On Mar 9, 2018, at 11:21 AM, Davis, Daniel (NIH/NLM) [C] 
>  wrote:
> 
> These are general guidelines, I've done loads of networking, but may be less 
> familiar with SolrCloud  and CDCR architecture.  However, I know it's all TCP 
> sockets, so general guidelines do apply.
> 
> Check the round-trip time between the data centers using ping or TCP ping.   
> Throughput tests may be high, but if Solr has to wait for a response to a 
> request before sending the next action, then just like any network protocol 
> that does that, it will get slow.
> 
> I'm pretty sure CDCR uses HTTP/HTTPS rather than just TCP, so also check 
> whether some proxy/load balancer between data centers is causing it to be a 
> single connection per operation.   That will *kill* performance.   Some 
> proxies default to HTTP/1.0 (open, send request, server send response, 
> close), and that will hurt.
> 
> Why you should listen to me even without SolrCloud knowledge - checkout paper 
> "Latency performance of SOAP Implementations".   Same distribution of skills 
> - I knew TCP well, but Apache Axis 1.1 not so well.   I still improved 
> response time of Apache Axis 1.1 by 250ms per call with 1-line of code.
> 
> -Original Message-
> From: Tom Peters [mailto:tpet...@synacor.com] 
> Sent: Wednesday, March 7, 2018 6:19 PM
> To: solr-user@lucene.apache.org
> Subject: CDCR performance issues
> 
> I'm having issues with the target collection staying up-to-date with indexing 
> from the source collection using CDCR.
> 
> This is what I'm getting back in terms of OPS:
> 
>curl -s 'solr2-a:8080/solr/mycollection/cdcr?action=OPS' | jq .
>{
>  "responseHeader": {
>"status": 0,
>"QTime": 0
>  },
>  "operationsPerSecond": [
>"zook01,zook02,zook03/solr",
>[
>  "mycollection",
>  [
>"all",
>49.10140553500938,
>"adds",
>10.27612635309587,
>"deletes",
>38.82527896994054
>  ]
>]
>  ]
>}
> 
> The source and target collections are in separate data centers.
> 
> Doing a network test between the leader node in the source data center and 
> the ZooKeeper nodes in the target data center show decent enough network 
> performance: ~181 Mbit/s
> 
> I've tried playing around with the "batchSize" value (128, 512, 728, 1000, 
> 2000, 2500) and they've haven't made much of a difference.
> 
> Any suggestions on potential settings to tune to improve the performance?
> 
> Thanks
> 
> --
> 
> Here's some relevant log lines from the source data center's leader:
> 
>2018-03-07 23:16:11.984 INFO  
> (cdcr-replicator-207-thread-3-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>2018-03-07 23:16:23.062 INFO  
> (cdcr-replicator-207-thread-4-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 510 updates to target mycollection
>2018-03-07 23:16:32.063 INFO  
> (cdcr-replicator-207-thread-5-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 511 updates to target mycollection
>2018-03-07 23:16:36.209 INFO  
> (cdcr-replicator-207-thread-1-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
>2018-03-07 23:16:42.091 INFO  
> (cdcr-replicator-207-thread-2-processing-n:solr2-a:8080_solr 
> x:mycollection_shard1_replica_n6 s:shard1 c:mycollection r:core_node9) 
> [c:mycollection s:shard1 r:core_node9 x:mycollection_shard1_replica_n6] 
> o.a.s.h.CdcrReplicator Forwarded 512 updates to target mycollection
>2018-03-07 23:16:46.790 

Re: CDCR performance issues

2018-03-09 Thread john spooner

Please unsubscribe; I have tried to manually unsubscribe.



Re: LTR Model size

2018-03-09 Thread Roopa Rao
Is there a way to patch Solr 6.6 with the fix so that an external model can be used?

Thanks,
Roopa

On Fri, Mar 9, 2018 at 1:30 PM, tapan1707  wrote:

> Could you elaborate a little bit more? Otherwise, I think that you might be
> experiencing the same issue reported in
> https://issues.apache.org/jira/browse/SOLR-11049 .
> The default ZooKeeper znode size limit is 1 MB, so it might not be able to
> handle the size of your model.
> Correct me if I misunderstood anything.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
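A side note on SOLR-11049: the 1 MB ceiling mentioned above comes from ZooKeeper's jute.maxbuffer setting. Raising it is sometimes used as a workaround for large models, but it must be raised consistently on every ZooKeeper server and on the Solr JVMs (the client side), or reads will still fail. A sketch with an illustrative 10 MB value:

```shell
# ZooKeeper servers (e.g. conf/java.env or the service's JVM flags):
SERVER_JVMFLAGS="-Djute.maxbuffer=10485760"

# Solr nodes (e.g. solr.in.sh), so the client side accepts the same size:
SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=10485760"
```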


Re: HDInsight with Solr 4.9.0 Create Collection

2018-03-09 Thread Abhi Basu
This has been resolved!

Turned out to be schema and config file version diff between 4.10 and 4.9.

Thanks,

Abhi
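For anyone who hits the same "No enum constant org.apache.lucene.util.Version.4.10.3" error: it generally means solrconfig.xml declares a Lucene version newer than the running Solr understands. The likely fix is a one-line change in solrconfig.xml (the exact value depends on your install):

```xml
<!-- Was 4.10.3, which a Solr 4.9 node cannot parse -->
<luceneMatchVersion>4.9</luceneMatchVersion>
```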

On Fri, Mar 9, 2018 at 11:41 AM, Abhi Basu <9000r...@gmail.com> wrote:

> That was due to a folder not being present. Is this something to do with
> version?
>
> http://hn0-esohad.mzwz3dh4pb1evcdwc1lcsddrbe.jx.
> internal.cloudapp.net:8983/solr/admin/collections?action=
> CREATE&name=ems-collection2&numShards=2&replicationFactor=
> 2&maxShardsPerNode=1
>
>
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
> CREATEing SolrCore 'ems-collection2_shard2_replica2': Unable to create
> core: ems-collection2_shard2_replica2 Caused by: No enum constant
> org.apache.lucene.util.Version.4.10.3
>
> On Fri, Mar 9, 2018 at 11:11 AM, Abhi Basu <9000r...@gmail.com> wrote:
>
>> Ok, so I tried the following:
>>
>> /usr/hdp/current/solr/example/scripts/cloud-scripts/zkcli.sh -cmd
>> upconfig -zkhost zk0-esohad.mzwz3dh4pb1evcdwc1l
>> csddrbe.jx.internal.cloudapp.net:2181 -confdir
>> /home/sshuser/abhi/ems-collection/conf -confname ems-collection
>>
>> And got this exception:
>> java.lang.IllegalArgumentException: Illegal directory:
>> /home/sshuser/abhi/ems-collection/conf
>>
>>
>> On Fri, Mar 9, 2018 at 10:43 AM, Abhi Basu <9000r...@gmail.com> wrote:
>>
>>> Thanks for the reply, this really helped me.
>>>
>>> For Solr 4.9, what is the actual zkcli command to upload config?
>>>
>>> java -classpath example/solr-webapp/WEB-INF/lib/*
>>>  org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983
>>>  -confdir example/solr/collection1/conf -confname conf1 -solrhome
>>> example/solr
>>>
>>> OR
>>>
>>> ./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:9983 -cmd
>>> upconfig -confname my_new_config -confdir server/solr/configsets/basic_c
>>> onfigs/conf
>>>
>>> I don't know why HDP/HDInsight does not provide something like solrctl
>>> commands to make life easier for everyone!
>>>
>>>
>>>
>>>
>>> On Thu, Mar 8, 2018 at 5:43 PM, Shawn Heisey 
>>> wrote:
>>>
 On 3/8/2018 1:26 PM, Abhi Basu wrote:
 > I'm in a bind. Added Solr 4.9.0 to HDInsight cluster and find no
 Solrctl
 > commands installed. So, I am doing the following to create a
 collection.

 This 'solrctl' command is NOT part of Solr.  Google tells me it's part
 of software from Cloudera.

 You need to talk to Cloudera for support on that software.

 > I have my collection schema in a location:
 >
 > /home/sshuser/abhi/ems-collection/conf
 >
 > Using this command to create a collection:
 >
 > http://headnode1:8983/solr/admin/cores?action=CREATE&name=em
 s-collection&instanceDir=/home/sshuser/abhi/ems-collection/conf
 > 
 > /

 You're using the term "collection".  And later you mention ZooKeeper. So
 you're almost certainly running in SolrCloud mode.  If your Solr is
 running in SolrCloud mode, do not try to use the CoreAdmin API
 (/solr/admin/cores).  Use the Collections API instead.  But before that,
 you need to get the configuration into ZooKeeper.  For standard Solr
 without Cloudera's tools, you would typically use the "zkcli" script
 (either zkcli.sh or zkcli.bat).  See page 376 of the reference guide for
 that specific version of Solr for help with the "upconfig" command for
 that script:

 http://archive.apache.org/dist/lucene/solr/ref-guide/apache-
 solr-ref-guide-4.9.pdf

 > I guess i need to register my config name with Zk. How do I register
 the
 > collection schema with Zookeeper?
 >
 > Is there way to bypass the registration with zk and build the
 collection
 > directly from my schema files at that folder location, like I was
 able to
 > do in Solr 4.10 in CDH 5.14:
 >
 > solrctl --zk hadoop-dn6.eso.local:2181/solr instancedir --create
 > ems-collection /home/sshuser/abhi/ems-collection/
 >
 > solrctl --zk hadoop-dn6.eso.local:2181/solr collection --create
 > ems-collection -s 3 -r 2

 The solrctl command is not something we can help you with on this
 mailing list.  Cloudera customizes Solr to the point where only they are
 able to really provide support for their version.  Your best bet will be
 to talk to Cloudera.

 When Solr is running with ZooKeeper, it's in SolrCloud mode.  In
 SolrCloud mode, you cannot create cores in the same way that you can in
 standalone mode -- you MUST create collections, and all configuration
 will be in zookeeper, not on the disk.

 Thanks,
 Shawn
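Putting Shawn's two steps together for this thread's setup, a sketch might look like the following. The paths and hostnames are the ones already used in this thread, ZKHOST is a placeholder for the ZooKeeper ensemble address, and this assumes a live SolrCloud cluster:

```shell
# 1. Upload the config directory to ZooKeeper (Solr 4.x ships zkcli
#    under example/scripts/cloud-scripts; ZKHOST is a placeholder):
example/scripts/cloud-scripts/zkcli.sh -zkhost ZKHOST:2181 \
  -cmd upconfig \
  -confdir /home/sshuser/abhi/ems-collection/conf \
  -confname ems-collection

# 2. Create the collection via the Collections API, referencing the
#    uploaded config by name:
curl "http://headnode1:8983/solr/admin/collections?action=CREATE&name=ems-collection&numShards=2&replicationFactor=2&collection.configName=ems-collection"
```

If the uploaded confname matches the collection name, the collection.configName parameter can usually be omitted.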


>>>
>>>
>>> --
>>> Abhi Basu
>>>
>>
>>
>>
>> --
>> Abhi Basu
>>
>
>
>
> --
> Abhi Basu
>



-- 
Abhi Basu


Re: HDInsight with Solr 4.9.0 Create Collection

2018-03-09 Thread john spooner

would be nice to not get this email.


On 3/9/2018 1:23 PM, Abhi Basu wrote:

This has been resolved!

Turned out to be schema and config file version diff between 4.10 and 4.9.

Thanks,

Abhi



Re: LTR Model size

2018-03-09 Thread Erick Erickson
Spoonerk:

Please follow the instructions here:
http://lucene.apache.org/solr/community.html#mailing-lists-irc

. You must use the _exact_ same e-mail as you used to subscribe.

If the initial try doesn't work and following the suggestions at the
"problems" link doesn't work for you, let us know. But note you need
to show us the _entire_ return header to allow anyone to diagnose the
problem.


Best,
Erick

On Fri, Mar 9, 2018 at 12:15 PM, spoonerk  wrote:
> Please unsubscribe me.  I have tried and tried but still get emails
>
> On Mar 9, 2018 10:19 AM, "Roopa Rao"  wrote:
>
>> what is the way to configure the model size for LTR? We have about 3MB
>> model and Solr is not holding this model as a ManagedResource.
>>
>> How can this be configured?
>>
>> Thanks,
>> Roopa
>>


Re: CDCR performance issues

2018-03-09 Thread Erick Erickson
John:

_What_ did you try and how did it fail?

Please follow the instructions here:
http://lucene.apache.org/solr/community.html#mailing-lists-irc

. You must use the _exact_ same e-mail as you used to subscribe.


If the initial try doesn't work and following the suggestions at the
"problems" link doesn't work for you, let us know. But note you need
to show us the _entire_ return header to allow anyone to diagnose the
problem.


Best,

Erick

On Fri, Mar 9, 2018 at 1:00 PM, john spooner  wrote:
> Please unsubscribe. I tried to manually unsubscribe.
>
>
>
> On 3/9/2018 12:59 PM, Tom Peters wrote:
>>
>> Thanks. This was helpful. I did some tcpdumps and I'm noticing that the
>> requests to the target data center are not batched in any way. Each update
>> comes in as an independent update. Some follow-up questions:
>>
>> 1. Is it accurate that updates are not actually batched in transit from
>> the source to the target and instead each document is posted separately?
>>
>> 2. Are they done synchronously? I assume yes (since you wouldn't want
>> operations applied out of order)
>>
>> 3. If they are done synchronously, and are not batched in any way, does
>> that mean that the best performance I can expect would be roughly how long
>> it takes to round-trip a single document? I.e., if my average ping is 25ms,
>> then I can expect a peak performance of roughly 40 ops/s.
>>
>> Thanks

Re: Altering the query if query contains all stopwods

2018-03-09 Thread Rick Leir
Tav, Ryan
Now you have me wondering: should it be returning *:* or some general landing
page?

Suppose you had typeahead or autocomplete, it should ignore any stopwords list.

By the way, people on this list have had good reasons why we should stop using 
stopwords.
Cheers -- Rick

On March 9, 2018 1:13:22 PM EST, tapan1707  wrote:
>Hello Ryan,
>Solr has a Filter class called solr.SuggestStopFilterFactory, which
>basically works similarly to solr.StopFilterFactory, with the slight
>modification that if all of the words are present in stopwords.txt then
>it won't remove the last one.
>I am not sure about wildcard search, but if all of the query tokens are in
>stopwords.txt then at the very least it won't be returning zero
>results (assuming that search results for the last word exist).
>
>
>
>--
>Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 
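For illustration, a hypothetical query-analyzer sketch using the filter tapan mentions. The fieldType name is invented, and the filter's options should be checked against your Solr version:

```xml
<!-- SuggestStopFilterFactory keeps the last token even when every
     token in the query is a stopword. -->
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SuggestStopFilterFactory" ignoreCase="true"
            words="stopwords.txt" format="wordset"/>
  </analyzer>
</fieldType>
```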

Re: Setting Up Solr Authentication/Authorization

2018-03-09 Thread Shawn Heisey
On 3/9/2018 9:27 AM, Terry Steichen wrote:
> I'm trying to set up basic authentication/authorization with solr 6.6.0.
>
> The documentation says to create a security.json file and describes the
> content as:
>
> {
> "authentication":{
>"class":"solr.BasicAuthPlugin",
>"credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= 
> Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
> },
> "authorization":{
>"class":"solr.RuleBasedAuthorizationPlugin",
>"permissions":[{"name":"security-edit",
>   "role":"admin"}],
>"user-role":{"solr":"admin"}
> }}
>
> Does that mean to literally use exactly the above as the security.json 
> content, or customize it (in some fashion)?

Initial disclaimer: I have never used the authentication plugins
myself.  But I have seen what people on this mailing list get told when
they ask about it.

If you can figure out how to customize that file from the documentation
to do something that you need, then feel free to customize it.  But see
info below about passwords.

> The documentation  also mentions that the initial admin person is a user 
> named "solr" with a password: "SolrRocks"  What's unclear is whether that's 
> the password on which the hash (in security.json) was created or what?
>
> What I can't figure out is whether the password hash is fixed, or whether it 
> should be generated, and if so, how?

Last I checked, the Solr documentation does NOT explain how to create a
hash in security.json from a password.  It does list the *type* of hash,
which is sha256, password+salt.

With a little bit of research and a lot of trial and error, it is
possible to figure out how to create a valid hash with a tool like openssl.
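As an illustration of that research, here is a sketch of generating a hash/salt pair for security.json. It assumes Solr's scheme is base64(sha256(sha256(salt + password))) plus a base64-encoded random 32-byte salt, which is my reading of Solr's Sha256AuthenticationProvider; verify against the source for your version before relying on it:

```python
import base64
import hashlib
import os

def solr_credentials(password: str) -> str:
    """Produce a 'hash salt' pair in the format security.json expects.
    Assumption: Solr hashes sha256(sha256(salt + password)) and stores
    both parts base64-encoded, separated by a space."""
    salt = os.urandom(32)
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    digest = hashlib.sha256(digest).digest()
    return "%s %s" % (
        base64.b64encode(digest).decode("ascii"),
        base64.b64encode(salt).decode("ascii"),
    )

print(solr_credentials("SolrRocks"))
```

The output can be pasted as the value in the "credentials" map for a user.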

What some people have done to customize user/password is use that
'solr/SolrRocks' login to *create* another login using the
authentication API, then once they're sure everything's working, access
the API again with the new user to delete the well-documented user.

http://lucene.apache.org/solr/guide/7_2/basic-authentication-plugin.html#editing-authentication-plugin-configuration

> Also, some people on the web recommend altering the jetty xml files to do 
> this - is it necessary too?

The servlet container (almost always Jetty if you're running version 5.0
or later) is capable of doing authentication, completely independently
of whatever software is running inside it.  But configuring that
authentication involves customization of software that is completely
separate from Solr.  The security.json method is a configuration for
Solr, which then programmatically configures the vanilla Jetty install
to do authentication.

Thanks,
Shawn



Re: HDInsight with Solr 4.9.0 Create Collection

2018-03-09 Thread Shawn Heisey
On 3/9/2018 10:41 AM, Abhi Basu wrote:
> That was due to a folder not being present. Is this something to do with
> version?
>
> http://hn0-esohad.mzwz3dh4pb1evcdwc1lcsddrbe.jx.internal.cloudapp.net:8983/solr/admin/collections?action=CREATE&name=ems-collection2&numShards=2&replicationFactor=2&maxShardsPerNode=1
>
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
> CREATEing SolrCore 'ems-collection2_shard2_replica2': Unable to create
> core: ems-collection2_shard2_replica2 Caused by: No enum constant
> org.apache.lucene.util.Version.4.10.3

I see that you've resolved both problems.

If you have additional issues in the future with errors you don't
understand:  We need far more detail from your log.  It is very likely
that the full error message is several lines long, possibly even dozens
of lines long, and may include *multiple* java stacktraces.  Try looking
at solr.log instead of the admin UI.  For 4.x versions, we usually have
no way of knowing where the logfile ends up, because there are MANY
different ways 4.x versions can be installed and started.

Thanks,
Shawn



The Impact of the Number of Collections on Indexing Performance in Solr 6.0

2018-03-09 Thread 苗海泉
Hello. We have found a problem: in Solr 6.0, indexing speed is affected by
the number of collections. Indexing speed is normal until a certain limit on
the number of collections is reached; beyond that limit, indexing speed
drops by a factor of about 50.

In our environment there are 49 Solr nodes. With 25 shards per collection,
we can maintain high-speed indexing until the total number of collections
reaches about 900; reducing the number of collections back below that limit
restores the speed.
With 49 shards per collection, the total number of collections can only
reach about 700; exceeding that value causes indexing throughput to drop
dramatically.
Note that these are single-replica collections; multiple replicas cause
serious stability problems in a large Solr cluster environment.

At first I suspected too many concurrent update-thread submissions, but
that explanation has problems, so now I lean toward the searcherExecutor
thread pool as the cause. This is just my guess; I would like to know the
real reason. Can someone help?

Also, I noticed that searcherExecutor threads basically correspond
one-to-one with each collection's shards. How can I reduce the number of
these threads, or even shut them down? Although there are many collections
in our environment, there are few queries, so it is not necessary to keep
the threads open to serve queries. This is too wasteful.

Thank you.