Re: How to maintain fast query speed during heavy indexing?

2018-05-22 Thread Erick Erickson
There are two issues:

1> autowarming on the replicas

2> Until https://issues.apache.org/jira/browse/SOLR-11982 (Solr 7.4,
unreleased), requests would go to the leaders along with the PULL and
TLOG replicas. Since the leaders were busily indexing, the entire
query would suffer speed-wise.

So what I'd do is see if you can apply the patch there and adjust your
autowarming. Solr 7.4 will be out in the not-too-distant future,
perhaps over the summer. No real schedule has been agreed on, though.

Best,
Erick
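
For reference on the autowarming side, a hedged solrconfig.xml sketch (cache classes and counts are illustrative, not a recommendation -- tune against your own query mix):

```xml
<!-- A higher autowarmCount pre-fills the new searcher's caches from the
     old searcher after each commit, smoothing post-commit query latency
     at the cost of longer warm-up time. -->
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="64"/>
```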

On Mon, May 21, 2018 at 9:23 PM, Nguyen Nguyen
 wrote:
> Hello everyone,
>
> I'm running a SolrCloud cluster of 5 nodes with 5 shards and 3 replicas per
> shard.  I usually see spikes in query performance during high-indexing
> periods, and I would like to have stable query response times even then.  I
> recently upgraded to Solr 7.3 and am running with 2 TLOG
> replicas and 1 PULL replica.  Using a small maxWriteMBPerSec for
> replication and querying only PULL replicas during indexing periods, I'm
> still seeing long query times for some queries (although not as often as
> before the change).
>
> My first question is: is it possible to control replication on a non-leader
> as in a master/slave configuration (e.g. disablepoll, fetchindex)?  This
> way, I can disable replication on the followers until committing is
> completed on the leaders while sending query requests to the followers (or
> just PULL replica) only.  Then when data is committed on leaders, I would
> send query requests back to only leaders and tell the followers to start to
> fetch the newly updated index.
>
> If manual replication control isn't possible, I'm planning to have
> duplicate collections and use an alias to switch between the two collection
> at different times.  For example: while the 'collection1' collection is
> being indexed, the alias 'search' would point to the 'collection2'
> collection to serve query requests.  Once indexing is completed on
> 'collection1', the 'search' alias would now point to 'collection1', and
> 'collection2' would be updated to be in sync with 'collection1'.  The
> cycle repeats for the next indexing
> cycle.  My question for this method would be if there is any existing
> method to sync one collection to another so that I don't have to send the
> same update requests to the two collections.
>
> Also wondering if there are other better methods everyone is using?
>
> Thanks much!
>
> Cheers,
>
> -Nguyen
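
The alias-swap approach described above can be sketched with the standard Collections API CREATEALIAS action -- a hedged sketch; the host is a placeholder and the collection names are the hypothetical ones from the thread. This only builds the request URLs; it does not issue them:

```python
import urllib.parse

SOLR = "http://localhost:8983/solr"  # assumed local node

def createalias_url(alias: str, collection: str) -> str:
    """Build the Collections API request that repoints `alias` at `collection`."""
    params = urllib.parse.urlencode(
        {"action": "CREATEALIAS", "name": alias, "collections": collection}
    )
    return f"{SOLR}/admin/collections?{params}"

# While 'collection1' is being indexed, queries hit 'collection2':
url = createalias_url("search", "collection2")
# After indexing and commit complete on 'collection1', swap the alias:
url = createalias_url("search", "collection1")
```

Clients that always query the 'search' alias never need to know which concrete collection is live.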


Solr Dates TimeZone

2018-05-22 Thread LOPEZ-CORTES Mariano-ext
Hi

Is it possible to configure Solr with a time zone other than GMT?
Is it possible to configure the Solr Admin UI to display dates in a time
zone other than GMT?
What is the best way to store a birth date in Solr? We use the TrieDate type.

Thanks!


Zookeeper 3.4.12 with Solr 6.6.2?

2018-05-22 Thread Walter Underwood
Is anybody running Zookeeper 3.4.12 with Solr 6.6.2? Is that a recommended 
combination? Not recommended?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Atomic update error with JSON handler

2018-05-22 Thread Nándor Mátravölgyi
Hi,
Firstly, I had already tried the request body enclosed in [...] without
success. It turned out that was not the only issue: the path was also wrong
for the atomic updates:

On the v2 API:
localhost:8983/v2/c/testnode/update/json/commands?commit=true Succeeds
localhost:8983/v2/c/testnode/update/json?commit=true Fails
localhost:8983/v2/c/testnode/update?commit=true Fails

On the old API:
localhost:8983/solr/testnode/update/json?commit=true Succeeds
localhost:8983/solr/testnode/update/json/docs?commit=true Fails

Some insight about what caused my confusion:
In the (
https://lucene.apache.org/solr/guide/7_3/updating-parts-of-documents.html#example-updating-part-of-a-document
) page of the Solr Guide, it is not emphasized that for an atomic JSON
update to work you must use the command endpoint. Furthermore, the example
JSONs in that section do not include the actual command level
(add/delete/commit) the way they are shown on an earlier page. (
https://lucene.apache.org/solr/guide/7_3/uploading-data-with-index-handlers.html#sending-json-update-commands
)
Either stating plainly that atomic updates are commands, or showing
complete JSON requests as examples, would have been much clearer.
It is also surprising to me that the command endpoint accepts the "list of
documents" format, which the guide does not mention (at the second
link provided).

Thank you for pointing me in the right direction!
Nandor
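
The two working JSON shapes from this thread, as a hedged sketch; only the request bodies are built here, using the example document from the thread:

```python
import json

# The same atomic update in the two JSON shapes discussed in this thread.
doc = {"id": "test1", "title": {"set": "Solr Rocks"}}

# "List of documents" form -- accepted by /solr/<collection>/update/json
# and by the v2 .../update/json/commands endpoint:
list_body = json.dumps([doc])

# Explicit-command form, wrapping the same document in an "add" command:
command_body = json.dumps({"add": {"doc": doc}})

# Either body is POSTed with Content-Type: application/json, e.g. via
# python-requests as shown further down in the thread.
```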

On Tue, May 22, 2018 at 8:14 AM, Yasufumi Mizoguchi 
wrote:

> Hi,
>
> At least, it is better to enclose your json body with '[ ]', I think.
>
> Following is the result I tried using curl.
>
> $ curl -XPOST "localhost:8983/solr/test_core/update/json?commit=true"
> --data-binary '{"id":"test1","title":{"set":"Solr Rocks"}}'
> {
>   "responseHeader":{
> "status":400,
> "QTime":18},
>   "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"Unknown command 'id' at [5]",
> "code":400}}
> $ curl -XPOST "localhost:8983/solr/test_core/update/json?commit=true"
> --data-binary '[{"id":"test1","title":{"set":"Solr Rocks"}}]'
> {
>   "responseHeader":{
> "status":0,
> "QTime":250}}
>
> Thanks,
> Yasufumi
>
>
> 2018年5月22日(火) 1:26 Nándor Mátravölgyi :
>
> > Hi,
> >
> > I'm trying to build a simple document search core with SolrCloud. I've run
> > into an issue when trying to partially update documents (aka atomic
> > updates). It appears to be a bug, because the semantically same request
> > succeeds in XML format, while it fails as JSON.
> >
> > The body of the XML request:
> > <add><doc><field name="id">test1</field><field name="title" update="set">Solr Rocks</field></doc></add>
> >
> > The body of the JSON request:
> > {"id":"test1","title":{"set":"Solr Rocks"}}
> >
> > I'm using the requests library in Python3 to send the update request.
> > Sending the XML request with the following code works as expected:
> > r = requests.post('
> > http://localhost:8983/v2/c/testnode/update/xml?commit=true',
> > headers={'Content-type': 'application/xml'}, data=xml)
> >
> > Sending the JSON request with the following codes return with a
> > SolrException:
> > r = requests.post('
> > http://localhost:8983/v2/c/testnode/update/json?commit=true',
> > headers={'Content-type': 'application/json'}, data=json)
> > r = requests.post('
> > http://localhost:8983/solr/testnode/update/json/docs?commit=true',
> > headers={'Content-type': 'application/json'}, data=json)
> >
> > Using the same lines of code to send a JSON request that is not an atomic
> > update works as expected. Such JSON request body is like:
> > {"id":"test1","title":"Solr Rocks"}
> >
> > The error message in the response is: ERROR: [doc=test1] unknown field
> > 'title.set'
> > Here is the log of the exception: https://pastebin.com/raw/VJe5hR25
> >
> > Depending on which API I send the request to, the logs are identical
> > except on lines 27 and 28:
> > This is with v2:
> >   at org.apache.solr.handler.UpdateRequestHandlerApi$1.call(UpdateRequestHandlerApi.java:48)
> >   at org.apache.solr.api.V2HttpCall.execute(V2HttpCall.java:325)
> > and this is with the other:
> >   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
> >   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:711)
> >
> > I'm using Solr 7.3.1 and I believe I do everything according to the
> > documentation. (
> >
> > https://lucene.apache.org/solr/guide/7_3/updating-parts-of-documents.html#atomic-updates
> > )
> > The solrconfig.xml and managed-schema files are fairly simple, they have
> > code snippets from the examples mostly: https://pastebin.com/199JJkp0
> > https://pastebin.com/Dp1YK46k
> >
> > This could be a bug, or I can't fathom what I'm missing. Can anyone help
> > me out?
> > Thanks,
> > Nandor
> >
>


Re: Zookeeper 3.4.12 with Solr 6.6.2?

2018-05-22 Thread Tim Casey
We have ZooKeeper 3.4.10 and have *tested* it with Solr 6.6.2 at a
functional level.  So far it works. We have not done any stress/load
testing, but would have to do that prior to release.



deletebyQuery vs deletebyId

2018-05-22 Thread Jay Potharaju
Hi,
I have a quick question about deleteByQuery vs deleteById. When using
deleteByQuery, if the query is id:123, is that the same as deleteById in
terms of performance?


Thanks
Jay


Re: deletebyQuery vs deletebyId

2018-05-22 Thread Shawn Heisey

On 5/22/2018 6:35 PM, Jay Potharaju wrote:

I have a quick question about deleteByQuery vs deleteById. When using
deleteByQuery, if the query is id:123, is that the same as deleteById in
terms of performance?


If there is absolutely nothing else happening to update the index, the 
difference between the two would probably be outside normal human 
perception of time -- I think you'd only be able to see the difference 
by measuring it with software, and you might need something that can 
show time units below one millisecond.  On a query that matches a lot of 
documents, the difference might be more pronounced, but likely still 
pretty small.


The issue with DBQ, which I already explained to you on another mailing 
list thread, is that DBQ can interact badly with other operations, 
segment merges in particular.  The delete itself won't take very long, 
but the simple fact that DBQ was used might result in a noticeable pause 
in your indexing operations.


http://lucene.472066.n3.nabble.com/Async-exceptions-during-distributed-update-td4388725.html#a4388787

As mentioned there, the pauses don't happen with id-based delete.

Thanks,
Shawn
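
For reference, the two request bodies being compared -- a hedged sketch of the standard JSON update-command shapes; this only builds the bodies and does not send them:

```python
import json

# Both would be POSTed to /solr/<collection>/update?commit=true with
# Content-Type: application/json.
delete_by_id = json.dumps({"delete": {"id": "123"}})       # id-based delete
delete_by_query = json.dumps({"delete": {"query": "id:123"}})  # delete-by-query
```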



Is it possible to index documents without storing their content?

2018-05-22 Thread Thomas Lustig
dear community,

Is it possible to index documents (e.g. PDF, Word, ...) for full-text
search without storing their content (payload) inside the Solr server?

Thanking you in advance for your help

BR

Tom
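
The usual approach is a schema field that is indexed but not stored -- a minimal managed-schema sketch, with illustrative field and type names:

```xml
<!-- The extracted text is searchable (indexed) but is not kept in the
     index as a stored value, so it cannot be returned in results. -->
<field name="content" type="text_general" indexed="true" stored="false"/>
```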


Re: Question regarding TLS version for solr

2018-05-22 Thread Anchal Sharma2
Hi Christopher / Shawn,

Thank you for replying. But I checked the Java version Solr is using, and it
is already version 1.8.

@Christopher, can you let me know what steps you followed for TLS
authentication on Solr version 7.3.0?

Thanks & Regards,
-
Anchal Sharma
e-Pricer Development
ES Team
Mobile: +9871290248

-Christopher Schultz  wrote: -
To: solr-user@lucene.apache.org
From: Christopher Schultz 
Date: 05/17/2018 06:29PM
Subject: Re: Question regarding TLS version for solr


Shawn,

On 5/17/18 4:23 AM, Shawn Heisey wrote:
> On 5/17/2018 1:53 AM, Anchal Sharma2 wrote:
>> We are using solr version 5.3.0 and  have been  trying to enable 
>> security on our solr .We followed steps mentioned on site 
>> -https://lucene.apache.org/solr/guide/6_6/enabling-ssl.html .But
>> by default it picks ,TLS version  1.0,which is causing an issue
>> as our application uses TLSv 1.2.We tried using online resources
>> ,but could not find anything regarding TLS enablement for solr .
>> 
>> It will be a huge help if anyone can provide some suggestions as
>> to how we can enable TLS v 1.2 for solr.
> 
> The choice of ciphers and encryption protocols is mostly made by
> Java. The servlet container might influence it as well. The only
> servlet container that is supported since Solr 5.0 is the Jetty
> that is bundled in the Solr download.
> 
> TLS 1.2 was added in Java 7, and it became default in Java 8. If
> you can install the latest version of Java 8 and make sure that it
> has the policy files for unlimited crypto strength installed,
> support for TLS 1.2 might happen automatically.

There is no "default" TLS version for either the client or the server:
the two endpoints always negotiate the highest mutual version they
both support. The key agreement, authentication, and cipher suites are
the items that are negotiated during the handshake.

> Solr 5.3.0 is running a fairly old version of Jetty -- 9.2.11. 
> Information for 9.2.x versions is hard to find, so although I think
> it probably CAN do TLS 1.2 if the Java version supports it, I can't
> be absolutely sure.  You'll need to upgrade Solr to get an upgraded
> Jetty.

I would be shocked if Jetty ships with its own crypto libraries; it
should be using JSSE.

Anchal,

Java 1.7 or later is an absolute requirement if you want to use
TLSv1.2 (and you SHOULD want to use it).

I have recently spent a lot of time getting Solr 7.3.0 running with
TLS mutual-authentication, but I haven't worked with the 5.3.x line. I
can tell you how I've done things for my version, but they may need
some adjustments for yours.

- -chris
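
On forcing TLSv1.2 specifically: one concrete knob (a hedged sketch, not from the thread) is Jetty's SslContextFactory protocol exclusion. Inside the existing sslContextFactory definition in Solr's server/etc/jetty-ssl.xml, something like the following excludes pre-1.2 protocols; verify element names against the Jetty documentation for your version:

```xml
<!-- Exclude older protocols so only TLSv1.2 (or later) is negotiated. -->
<Call name="addExcludeProtocols">
  <Arg>
    <Array type="String">
      <Item>SSLv3</Item>
      <Item>TLSv1</Item>
      <Item>TLSv1.1</Item>
    </Array>
  </Arg>
</Call>
```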




Re: How to maintain fast query speed during heavy indexing?

2018-05-22 Thread Nguyen Nguyen
Great info!  Thanks, Erick!

Cheers,
Nguyen

>