[jira] [Commented] (SOLR-14575) Solr restore is failing when basic authentication is enabled
[ https://issues.apache.org/jira/browse/SOLR-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139151#comment-17139151 ] Jan Høydahl commented on SOLR-14575: Can you please attach the related log lines from the solr.log file? Include enough context to see everything that is going on. If you have more than one node, include logs from all nodes involved in the restore operation. How did you install and configure Solr? How many nodes? What does your security.json look like?

> Solr restore is failing when basic authentication is enabled
>
> Key: SOLR-14575
> URL: https://issues.apache.org/jira/browse/SOLR-14575
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: Backup/Restore
> Affects Versions: 8.2
> Reporter: Yaswanth
> Priority: Blocker
>
> Hi Team,
> I was testing backup/restore for SolrCloud and it's failing exactly when I am trying to restore a successfully backed-up collection.
> I am using Solr 8.2 with basic authentication enabled and then creating a 2-replica collection. When I run a backup like
> curl -u xxx:xxx -k 'https://x.x.x.x:8080/solr/admin/collections?action=BACKUP&name=test&collection=test&location=/solrdatabkup'
> it worked fine, and I do see a folder created with the collection name under /solrdatabackup.
> But when I deleted the existing collection and then tried running the restore API like
> curl -u xxx:xxx -k 'https://x.x.x.x:8080/solr/admin/collections?action=RESTORE&name=test&collection=test&location=/solrdatabkup'
> it throws an error like
> {
> "responseHeader":{
> "status":500,
> "QTime":457},
> "Operation restore caused exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: ADDREPLICA failed to create replica",
> "exception":{
> "msg":"ADDREPLICA failed to create replica",
> "rspCode":500},
> "error":{
> "metadata":[
> "error-class","org.apache.solr.common.SolrException",
> "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"ADDREPLICA failed to create replica",
> "trace":"org.apache.solr.common.SolrException: ADDREPLICA failed to create replica
> at org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)
> at org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:280)
> at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:252)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
> at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:820)
> at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:786)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:546)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:423)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
> at org.eclipse.jetty.server.
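For reference, the same restore can also be driven through SolrJ, which attaches basic-auth credentials per request. This is a minimal sketch only; the host, credentials, and paths are placeholders mirroring the report, and SSL trust-store setup (the report uses curl -k) is omitted:

{code:java}
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class RestoreWithBasicAuth {
  public static void main(String[] args) throws Exception {
    // Placeholder base URL; configure SSL trust separately if the node uses HTTPS.
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("https://x.x.x.x:8080/solr").build()) {
      // Restore the backup named "test" into the collection "test".
      CollectionAdminRequest.Restore restore =
          CollectionAdminRequest.restoreCollection("test", "test")
              .setLocation("/solrdatabkup");
      // Attach basic-auth credentials to this request (placeholders).
      restore.setBasicAuthCredentials("xxx", "xxx");
      System.out.println(restore.process(client));
    }
  }
}
{code}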
[jira] [Created] (LUCENE-9409) TestAllFilesDetectTruncation failures
Adrien Grand created LUCENE-9409:
Summary: TestAllFilesDetectTruncation failures
Key: LUCENE-9409
URL: https://issues.apache.org/jira/browse/LUCENE-9409
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand

The Elastic CI found a seed that reproducibly fails TestAllFilesDetectTruncation. https://elasticsearch-ci.elastic.co/job/apache+lucene-solr+nightly+branch_8x/85/console

This is a consequence of LUCENE-9396: we now check for truncation after creating slices, so in some cases you would get an IndexOutOfBoundsException rather than CorruptIndexException/EOFException if out-of-bounds slices get created.
[GitHub] [lucene-solr] jpountz opened a new pull request #1593: LUCENE-9409: Check file lengths before creating slices.
jpountz opened a new pull request #1593: URL: https://github.com/apache/lucene-solr/pull/1593

This changes terms and points to check the length of the index/data files before creating slices in these files. A side effect of this is that we can no longer verify checksums of the meta file before checking the length of other files, but this shouldn't be a problem. On the other hand, it helps ensure that we return a clear exception in case of truncation, instead of a confusing OutOfBoundsException that doesn't make clear whether the cause is index corruption or a bug in Lucene.
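The idea, roughly: validate recorded offsets/lengths against the file's actual length up front, and raise a corruption error instead of letting an out-of-bounds slice surface later as an IndexOutOfBoundsException. A hypothetical sketch of such a check — the method and message are illustrative, not the actual patch:

```java
import java.io.IOException;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.store.IndexInput;

final class SliceBounds {
  // Fail fast with CorruptIndexException if a slice would fall outside the file,
  // which typically indicates a truncated file.
  static void checkSlice(IndexInput file, long offset, long length) throws IOException {
    if (offset < 0 || length < 0 || offset + length > file.length()) {
      throw new CorruptIndexException(
          "slice [" + offset + ", " + (offset + length) + ") exceeds file length "
              + file.length() + ", file is likely truncated", file);
    }
  }
}
```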
[GitHub] [lucene-solr] s1monw commented on a change in pull request #1552: LUCENE-8962: merge small segments on commit
s1monw commented on a change in pull request #1552: URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r442023300

## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java

@@ -3255,7 +3302,16 @@ private long prepareCommitInternal() throws IOException {
      } finally {
        maybeCloseOnTragicEvent();
      }
-
+
+      if (onCommitMerges != null) {
+        mergeScheduler.merge(mergeSource, MergeTrigger.COMMIT);

Review comment: The last thing that I am afraid about is: what if we have a MergeScheduler configured that blocks on this call, like SerialMergeScheduler? I think there are multiple options, like documentation, skipping `COMMIT` merge triggers in SMS, or adding a mergeAsync method to MS that has no impl in SMS... I think we should make sure that this is not trappy.
[jira] [Commented] (SOLR-14210) Add replica state option for HealthCheckHandler
[ https://issues.apache.org/jira/browse/SOLR-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139190#comment-17139190 ] Nazerke Seidan commented on SOLR-14210: --- Wondering if ZK is down/unavailable, can we still get the status of the cores?

> Add replica state option for HealthCheckHandler
> ---
>
> Key: SOLR-14210
> URL: https://issues.apache.org/jira/browse/SOLR-14210
> Project: Solr
> Issue Type: Improvement
> Affects Versions: 8.5
> Reporter: Houston Putman
> Assignee: Jan Høydahl
> Priority: Major
> Fix For: 8.6
>
> Attachments: docs.patch
>
> Time Spent: 4.5h
> Remaining Estimate: 0h
>
> h2. Background
> As was brought up in SOLR-13055, in order to run Solr in a more cloud-native
> way, we need some additional features around node-level healthchecks.
> {quote}Like in Kubernetes we need 'liveness' and 'readiness' probes, explained in
> [https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/],
> to determine if a node is live and ready to serve live traffic.
> {quote}
> However, there are issues around Kubernetes managing its own rolling
> restarts. With the current healthcheck setup, it's easy to envision a
> scenario in which Solr reports itself as "healthy" when all of its replicas
> are actually recovering. Therefore Kubernetes, seeing a healthy pod, would
> then go and restart the next Solr node. This can happen until all replicas
> are "recovering" and none are healthy. (Maybe the last one restarted will be
> "down", but still there are no "active" replicas.)
> h2. Proposal
> I propose we make an additional healthcheck handler that returns whether all
> replicas hosted by that Solr node are healthy and "active". That way we will
> be able to use the [default kubernetes rolling restart
> logic|https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#update-strategies]
> with Solr.
> To add on to [Jan's point
> here|https://issues.apache.org/jira/browse/SOLR-13055?focusedCommentId=16716559&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16716559],
> this handler should be more friendly for other Content-Types and should use
> better HTTP response statuses.
[jira] [Comment Edited] (SOLR-14210) Add replica state option for HealthCheckHandler
[ https://issues.apache.org/jira/browse/SOLR-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139190#comment-17139190 ] Nazerke Seidan edited comment on SOLR-14210 at 6/18/20, 8:03 AM: - If ZK is down/unavailable, we can't get the status of the cores. I think this should be configurable; without ZK we should still be able to ping the Solr cores.

was (Author: seidan): Wondering if ZK is down/unavailable, can we still get the status of the cores?
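The proposal quoted earlier boils down to checking that every replica hosted on the node reports ACTIVE. A hedged sketch of that check using SolrCloud's public cluster-state types — illustrative only, not the handler that was actually committed for this issue, and the wiring into a HealthCheckHandler (including behavior when ZK is unreachable) is omitted:

{code:java}
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Replica;

final class ReplicaStateCheck {
  // Returns false if any replica hosted on this node is not ACTIVE.
  static boolean allLocalReplicasActive(ClusterState clusterState, String nodeName) {
    for (DocCollection collection : clusterState.getCollectionsMap().values()) {
      for (Replica replica : collection.getReplicas()) {
        if (nodeName.equals(replica.getNodeName())
            && replica.getState() != Replica.State.ACTIVE) {
          return false; // a local replica is still recovering or down
        }
      }
    }
    return true;
  }
}
{code}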
[GitHub] [lucene-solr] msfroh commented on a change in pull request #1552: LUCENE-8962: merge small segments on commit
msfroh commented on a change in pull request #1552: URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r442043040

## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java

@@ -3255,7 +3302,16 @@ private long prepareCommitInternal() throws IOException {
      } finally {
        maybeCloseOnTragicEvent();
      }
-
+
+      if (onCommitMerges != null) {
+        mergeScheduler.merge(mergeSource, MergeTrigger.COMMIT);

Review comment: Would it be sufficient to document the behavior in the Javadoc for `findFullFlushMerges`? I was assuming that any implementation of `findFullFlushMerges` would try to return merges that are very likely to complete within whatever timeout someone would reasonably set (e.g. a few seconds). The timeout was intended just as an extra safeguard in case a merge takes longer. Given that lots of IndexWriter operations can have pauses with `SerialMergeScheduler` (judging by the number of calls to `maybeMerge`, especially the one from `processEvents`, in IndexWriter), blocking on this particular `merge` call doesn't feel like it introduces more risk (especially since it needs to be used in conjunction with a `MergePolicy` that implements `findFullFlushMerges`).
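For context, a merge policy opts into merge-on-commit by overriding `findFullFlushMerges`. The following is a hedged sketch of what such a policy might look like, assuming the method signature proposed in this PR; the size threshold and selection logic are illustrative only, not the PR's actual policy:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.FilterMergePolicy;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;

// Sketch: on full flush (e.g. commit), propose one merge over all "small" segments.
public class SmallSegmentCommitMergePolicy extends FilterMergePolicy {
  private final long maxSegmentBytes;

  public SmallSegmentCommitMergePolicy(MergePolicy in, long maxSegmentBytes) {
    super(in);
    this.maxSegmentBytes = maxSegmentBytes;
  }

  @Override
  public MergeSpecification findFullFlushMerges(MergeTrigger mergeTrigger,
                                                SegmentInfos segmentInfos,
                                                MergeContext mergeContext) throws IOException {
    List<SegmentCommitInfo> small = new ArrayList<>();
    for (SegmentCommitInfo info : segmentInfos) {
      // Skip segments that are already merging, and anything above the threshold.
      if (info.sizeInBytes() <= maxSegmentBytes
          && mergeContext.getMergingSegments().contains(info) == false) {
        small.add(info);
      }
    }
    if (small.size() < 2) {
      return null; // nothing worth merging before the commit
    }
    MergeSpecification spec = new MergeSpecification();
    spec.add(new OneMerge(small));
    return spec;
  }
}
```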
[jira] [Created] (SOLR-14581) Document the way auto commits work in SolrCloud
Bram Van Dam created SOLR-14581:
---
Summary: Document the way auto commits work in SolrCloud
Key: SOLR-14581
URL: https://issues.apache.org/jira/browse/SOLR-14581
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: documentation, SolrCloud
Reporter: Bram Van Dam

The documentation is unclear about how auto commits actually work in SolrCloud. A mailing list reply by Erick Erickson proved to be enlightening. Erick's reply verbatim:

{quote}Each node has its own timer that starts when it receives an update. So in your situation, 60 seconds after any given replica gets its first update, all documents that have been received in the interval will be committed.

But note several things:

1> commits will tend to cluster for a given shard. By that I mean they’ll tend to happen within a few milliseconds of each other ‘cause it doesn’t take that long for an update to get from the leader to all the followers.

2> this is per replica. So if you host replicas from multiple collections on some node, their commits have no relation to each other. And say for some reason you transmit exactly one document that lands on shard1. Further, say nodeA contains replicas for shard1 and shard2. Only the replica for shard1 would commit.

3> Solr promises eventual consistency. In this case, due to all the timing variables it is not guaranteed that every replica of a single shard has the same document available for search at any given time. Say doc1 hits the leader at time T and a follower at time T+10ms. Say doc2 hits the leader and gets indexed 5ms before the commit is triggered, but for some reason it takes 15ms for it to get to the follower. The leader will be able to search doc2, but the follower won’t until 60 seconds later.{quote}

Perhaps the subject deserves a section of its own, but I'll attach a patch which includes the gist of Erick's reply as a Tip in the "indexing in SolrCloud" section.
[jira] [Updated] (SOLR-14581) Document the way auto commits work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bram Van Dam updated SOLR-14581: Affects Version/s: master (9.0)
[jira] [Updated] (SOLR-14581) Document the way auto commits work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bram Van Dam updated SOLR-14581: Attachment: SOLR-14581.patch Status: Open (was: Open)
[jira] [Updated] (SOLR-14581) Document the way auto commits work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bram Van Dam updated SOLR-14581: Status: Patch Available (was: Open)
[jira] [Commented] (SOLR-9060) Spellcheck sort by frequency in solrcloud
[ https://issues.apache.org/jira/browse/SOLR-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139269#comment-17139269 ] Thomas Corthals commented on SOLR-9060: --- I ran into this issue with Solr 7 and 8. The behaviour has changed a little since this was first reported. The sorting seems to be correct now with {{spellcheck.extendedResults=true}}, but not with {{spellcheck.extendedResults=false}}. Tested on Solr 7.7.3 and Solr 8.5.2 in cloud mode with 2 nodes, 1 collection with numShards = 2 & replicationFactor = 1, techproducts configset and example data:

{code:java}
$ curl 'http://localhost:8983/solr/techproducts/spell?q=power%20cort&spellcheck.extendedResults=false'
"suggestion":["cord",
"corp",
"card"]}],

$ curl 'http://localhost:8983/solr/techproducts/spell?q=power%20cort&spellcheck.extendedResults=true'
"suggestion":[{
"word":"corp",
"freq":2},
{
"word":"cord",
"freq":1},
{
"word":"card",
"freq":4}]}],
{code}

I made a full comparison for standalone and cloud, on both the {{/spell}} and {{/browse}} request handlers in techproducts (out-of-order suggestions marked):

|| ||Standalone /spell||Standalone /browse||Cloud /spell||Cloud /browse||
|spellcheck.extendedResults=false|"corp", "cord", "card"|"corp", "cord", "card"|"cord", "corp", "card" (out of order)|"cord", "corp", "card" (out of order)|
|spellcheck.extendedResults=true|"corp" (freq 2), "cord" (freq 1), "card" (freq 4)|"corp" (freq 2), "cord" (freq 1), "card" (freq 4)|"corp" (freq 2), "cord" (freq 1), "card" (freq 4)|"corp" (freq 2), "cord" (freq 1), "card" (freq 4)|

> Spellcheck sort by frequency in solrcloud
> -
>
> Key: SOLR-9060
> URL: https://issues.apache.org/jira/browse/SOLR-9060
> Project: Solr
> Issue Type: Bug
> Affects Versions: 5.3
> Reporter: Gitanjali Palwe
> Priority: Major
> Attachments: spellcheck-sort-frequency.png
>
> The sorting by frequency for spellchecker doesn't work in solrcloud mode.
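The comparison above can also be scripted with SolrJ instead of curl. A minimal sketch under the same assumptions (local techproducts collection, /spell handler as configured in the example configset); output values are whatever your Solr version returns, not verified here:

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;

public class SpellcheckOrderCheck {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/techproducts").build()) {
      for (boolean extended : new boolean[] {false, true}) {
        SolrQuery q = new SolrQuery("power cort");
        q.setRequestHandler("/spell");
        q.set("spellcheck", true);
        q.set("spellcheck.extendedResults", extended);
        QueryResponse rsp = client.query(q);
        SpellCheckResponse spell = rsp.getSpellCheckResponse();
        System.out.println("extendedResults=" + extended);
        // Print suggestions in the order Solr returned them, to spot reordering.
        spell.getSuggestions().forEach(
            s -> System.out.println("  " + s.getToken() + " -> " + s.getAlternatives()));
      }
    }
  }
}
{code}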
[jira] [Commented] (SOLR-14210) Add replica state option for HealthCheckHandler
[ https://issues.apache.org/jira/browse/SOLR-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139290#comment-17139290 ] Jan Høydahl commented on SOLR-14210: Please ask such questions on the solr-user mailing list.
[GitHub] [lucene-solr] s1monw commented on a change in pull request #1552: LUCENE-8962: merge small segments on commit
s1monw commented on a change in pull request #1552: URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r442167608

## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java

@@ -3255,7 +3302,16 @@ private long prepareCommitInternal() throws IOException {
      } finally {
        maybeCloseOnTragicEvent();
      }
-
+
+      if (onCommitMerges != null) {
+        mergeScheduler.merge(mergeSource, MergeTrigger.COMMIT);

Review comment: Yeah, I mean we don't have to do that, and I think it's rather a rare combination. My problem is that this entire max-wait-time configuration is meaningless if SerialMS is used, since we block until it has merged them all (and potentially a bunch of other merges), so a commit/refresh could take quite a long time. On the other hand, as you stated, we will call maybeMerge anyway in the commit, so it's not really making any difference, and the same is true for getReader. So I think we are fine as it is.
[jira] [Commented] (SOLR-14575) Solr restore is failing when basic authentication is enabled
[ https://issues.apache.org/jira/browse/SOLR-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139384#comment-17139384 ] Yaswanth commented on SOLR-14575: -

Collection: test operation: restore failed:org.apache.solr.common.SolrException: ADDREPLICA failed to create replica
at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler$ShardRequestTracker.processResponses(OverseerCollectionMessageHandler.java:1030)
at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler$ShardRequestTracker.processResponses(OverseerCollectionMessageHandler.java:1013)
at org.apache.solr.cloud.api.collections.AddReplicaCmd.lambda$addReplica$1(AddReplicaCmd.java:177)
at org.apache.solr.cloud.api.collections.AddReplicaCmd$$Lambda$798/.run(Unknown Source)
at org.apache.solr.cloud.api.collections.AddReplicaCmd.addReplica(AddReplicaCmd.java:199)
at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.addReplica(OverseerCollectionMessageHandler.java:708)
at org.apache.solr.cloud.api.collections.RestoreCmd.call(RestoreCmd.java:286)
at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264)
at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$142/.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.solr.common.SolrException: javax.crypto.BadPaddingException: RSA private key operation failed
at org.apache.solr.util.CryptoKeys$RSAKeyPair.encrypt(CryptoKeys.java:325)
at org.apache.solr.security.PKIAuthenticationPlugin.generateToken(PKIAuthenticationPlugin.java:305)
at org.apache.solr.security.PKIAuthenticationPlugin.access$200(PKIAuthenticationPlugin.java:61)
at org.apache.solr.security.PKIAuthenticationPlugin$2.onQueued(PKIAuthenticationPlugin.java:239)
at org.apache.solr.client.solrj.impl.Http2SolrClient.decorateRequest(Http2SolrClient.java:468)
at org.apache.solr.client.solrj.impl.Http2SolrClient.makeRequest(Http2SolrClient.java:455)
at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:364)
at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:746)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1274)
at org.apache.solr.handler.component.HttpShardHandler.request(HttpShardHandler.java:238)
at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:199)
at org.apache.solr.handler.component.HttpShardHandler$$Lambda$512/.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
... 5 more
Caused by: javax.crypto.BadPaddingException: RSA private key operation failed
at java.base/sun.security.rsa.NativeRSACore.crtCrypt_Native(NativeRSACore.java:149)
at java.base/sun.security.rsa.NativeRSACore.rsa(NativeRSACore.java:91)
at java.base/sun.security.rsa.RSACore.rsa(RSACore.java:149)
at java.base/com.sun.crypto.provider.RSACipher.doFinal(RSACipher.java:355)
at java.base/com.sun.crypto.provider.RSACipher.engineDoFinal(RSACipher.java:392)
at java.base/javax.crypto.Cipher.doFinal(Cipher.java:2260)
at org.apache.solr.util.CryptoKeys$RSAKeyPair.encrypt(CryptoKeys.java:323)
... 20 more

That's the error stack trace I am seeing. As soon as I call the restore API, I see the collection "test" with a single core on the cloud, but it's in the down state.

Number of nodes configured with SolrCloud: 2. Testing on a single collection with 2 replicas.

Here is what my security.json looks like:

{
"authentication":{
"class":"solr.BasicAuthPlugin",
"credentials":{
"admin":"",
"dev":""},
"":{"v":11},
"blockUnknown":true,
"forwardCredentials":true},
"authorization":{
"class":"solr.RuleBasedAuthorizationPlugin",
"user-role":{
"solradmin":["admin","dev"],
"dev":["read"]},
"":{"v":9},
"permissions":[
{"name":"read","role":"*","index":1},
{"name":"security-read","role":"admin","index":2},
{"name":"security-edit","role":"admin","index":3
[jira] [Commented] (SOLR-14581) Document the way auto commits work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139416#comment-17139416 ] Lucene/Solr QA commented on SOLR-14581: ---

| (/) *{color:green}+1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
|| || {color:brown} Prechecks {color} || || ||
|| || {color:brown} master Compile Tests {color} || || ||
|| || {color:brown} Patch Compile Tests {color} || || ||
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 0m 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 0m 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate ref guide {color} | {color:green} 0m 4s{color} | {color:green} the patch passed {color} |
|| || {color:brown} Other Tests {color} || || ||
| {color:black}{color} | {color:black} {color} | {color:black} 2m 21s{color} | {color:black} {color} |

|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-14581 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13005940/SOLR-14581.patch |
| Optional Tests | ratsources validatesourcepatterns validaterefguide |
| uname | Linux lucene2-us-west.apache.org 4.4.0-170-generic #199-Ubuntu SMP Thu Nov 14 01:45:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 0ea0358 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| modules | C: solr/solr-ref-guide U: solr/solr-ref-guide |
| Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/766/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically generated.
[GitHub] [lucene-solr] s1monw opened a new pull request #1594: Replace DWPT.DocState with simple method parameters
s1monw opened a new pull request #1594: URL: https://github.com/apache/lucene-solr/pull/1594

DWPT.DocState had some historical value, but in today's somewhat cleaned-up DWPT and IndexingChain there is little to no value in having this class. It also requires explicit cleanup, which is not necessary anymore.
[GitHub] [lucene-solr] s1monw commented on pull request #1552: LUCENE-8962: merge small segments on commit
s1monw commented on pull request #1552: URL: https://github.com/apache/lucene-solr/pull/1552#issuecomment-646029945

@msokolov you are more than welcome. I think it's a great example of how OSS works, or should work. Thanks for being so patient with me :)
[jira] [Commented] (LUCENE-8574) ExpressionFunctionValues should cache per-hit value
[ https://issues.apache.org/jira/browse/LUCENE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139437#comment-17139437 ] Michael McCandless commented on LUCENE-8574:

{quote}please, lets keep the boolean and not bring NaN into this.
{quote}
+1

{quote}And it seems the patch attached to this issue could not handle it as well (since DoubleValues generated for the same LeafReaderContext is not the same, we still get tons of DoubleValues created).
{quote}
Hmm, good catch! So we somehow need to ensure that we use the same {{DoubleValues}} instance per-segment per-binding? But how can we safely do that, i.e. we can't know that this current caller will consume the same {{DoubleValues}} in the same {{docid}} progression?

> ExpressionFunctionValues should cache per-hit value
> ---
>
> Key: LUCENE-8574
> URL: https://issues.apache.org/jira/browse/LUCENE-8574
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: 7.5, 8.0
> Reporter: Michael McCandless
> Assignee: Robert Muir
> Priority: Major
> Attachments: LUCENE-8574.patch, unit_test.patch
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The original version of {{ExpressionFunctionValues}} had a simple per-hit
> cache, so that nested expressions that reference the same common variable
> would compute the value for that variable the first time it was referenced
> and then use that cached value for all subsequent invocations, within one
> hit. I think it was accidentally removed in LUCENE-7609?
> This is quite important if you have non-trivial expressions that reference
> the same variable multiple times.
> E.g. if I have these expressions:
> {noformat}
> x = c + d
> c = b + 2
> d = b * 2{noformat}
> Then evaluating x should only cause b's value to be computed once (for a
> given hit), but today it's computed twice. The problem is combinatoric if b
> then references another variable multiple times, etc.
> I think to fix this we just need to restore the per-hit cache?
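The per-hit cache under discussion can be pictured as a wrapper that computes the underlying value at most once per document. A hedged sketch — illustrative, not the actual Lucene implementation, and it deliberately sidesteps the shared-instance-per-binding question raised above:

{code:java}
import java.io.IOException;
import org.apache.lucene.search.DoubleValues;

final class CachingDoubleValues extends DoubleValues {
  private final DoubleValues in;
  private boolean computed;
  private double value;

  CachingDoubleValues(DoubleValues in) {
    this.in = in;
  }

  @Override
  public boolean advanceExact(int doc) throws IOException {
    computed = false; // invalidate the cached value for the new hit
    return in.advanceExact(doc);
  }

  @Override
  public double doubleValue() throws IOException {
    if (computed == false) {
      value = in.doubleValue(); // compute once per hit
      computed = true;
    }
    return value;
  }
}
{code}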
[GitHub] [lucene-solr] s1monw commented on pull request #1594: Replace DWPT.DocState with simple method parameters
s1monw commented on pull request #1594: URL: https://github.com/apache/lucene-solr/pull/1594#issuecomment-646030388

@dweiss maybe you have a moment to look at this
[GitHub] [lucene-solr] janhoy merged pull request #1572: SOLR-14561 CoreAdminAPI's parameters instanceDir and dataDir are now validated
janhoy merged pull request #1572: URL: https://github.com/apache/lucene-solr/pull/1572
[jira] [Updated] (SOLR-14561) Validate parameters to CoreAdminAPI
[ https://issues.apache.org/jira/browse/SOLR-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-14561: --- Fix Version/s: 8.6

> Validate parameters to CoreAdminAPI
> ---
>
> Key: SOLR-14561
> URL: https://issues.apache.org/jira/browse/SOLR-14561
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Jan Høydahl
> Assignee: Jan Høydahl
> Priority: Major
> Fix For: 8.6
>
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> CoreAdminAPI does not validate parameter input. We should limit what users
> can specify for at least the {{instanceDir}} and {{dataDir}} params, perhaps restrict
> them to be relative to SOLR_HOME or SOLR_DATA_HOME.
[jira] [Commented] (SOLR-14561) Validate parameters to CoreAdminAPI
[ https://issues.apache.org/jira/browse/SOLR-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139453#comment-17139453 ] Jan Høydahl commented on SOLR-14561: Committed to master. Will let Jenkins work on it and then backport. > Validate parameters to CoreAdminAPI > --- > > Key: SOLR-14561 > URL: https://issues.apache.org/jira/browse/SOLR-14561 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Fix For: 8.6 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > CoreAdminAPI does not validate parameter input. We should limit what users > can specify for at least {{instanceDir and dataDir}} params, perhaps restrict > them to be relative to SOLR_HOME or SOLR_DATA_HOME. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
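The restriction described here boils down to a path containment check; a minimal illustration (not the committed code; PathValidation is a made-up name):

{code:java}
import java.nio.file.Path;

public final class PathValidation {
  // Reject instanceDir/dataDir values that resolve outside the allowed root,
  // e.g. SOLR_HOME or SOLR_DATA_HOME.
  public static void assertContained(Path allowedRoot, String userSuppliedDir) {
    Path root = allowedRoot.toAbsolutePath().normalize();
    Path resolved = root.resolve(userSuppliedDir).normalize();
    if (resolved.startsWith(root) == false) {
      throw new IllegalArgumentException(
          "Path " + userSuppliedDir + " must resolve under " + root);
    }
  }
}
{code}

Normalizing before the startsWith check is what defeats ../ traversal in the user-supplied value.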
[jira] [Created] (LUCENE-9410) German/French stemmers fail for common forms maux, gegrüßt, grüßend, schlummert
Ben Kazez created LUCENE-9410: - Summary: German/French stemmers fail for common forms maux, gegrüßt, grüßend, schlummert Key: LUCENE-9410 URL: https://issues.apache.org/jira/browse/LUCENE-9410 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 8.5 Environment: Elasticsearch 7.7.1 running on cloud.elastic.co Reporter: Ben Kazez I'm using Lucene via Elasticsearch 7.7.1 and have run into an issue where German and French stemming (either via the Snowball analyzer, or the "light" or "heavy" stemming analyzers) fails to identify some common forms: - French: - "maux" should match "mal" ("maux" is plural of "mal") but instead "maux" is unchanged - German: - "schlummert" should match "schlummern" (infinitive) but instead is unchanged - "grüßend" should match "grüßen" (infinitive) but instead yields "grussend" - "gegrüßt" should match "grüßen" (infinitive) but instead yields "gegrusst" The folks from Elasticsearch said I should file a bug with Lucene: https://discuss.elastic.co/t/better-french-and-german-stemming/236283 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9410) German/French stemmers fail for common forms maux, gegrüßt, grüßend, schlummert
[ https://issues.apache.org/jira/browse/LUCENE-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Kazez updated LUCENE-9410: -- Description: I'm using Lucene via Elasticsearch 7.7.1. German and French stemmers (either via the Snowball analyzer, or the "light" or "heavy" stemming analyzers) are failing to understand some common forms: - French: - "maux" should match "mal" ("maux" is plural of "mal") but instead "maux" is unchanged - German: - "schlummert" should match "schlummern" (infinitive) but instead is unchanged - "grüßend" should match "grüßen" (infinitive) but instead yields "grussend" - "gegrüßt" should match "grüßen" (infinitive) but instead yields "gegrusst" The Elasticsearch folks [said|https://discuss.elastic.co/t/better-french-and-german-stemming/236283] I should file a bug with Lucene. was: I'm using Lucene via Elasticsearch 7.7.1 and have run into an issue where German and French stemming (either via the Snowball analyzer, or the "light" or "heavy" stemming analyzers) fails to identify some common forms: - French: - "maux" should match "mal" ("maux" is plural of "mal") but instead "maux" is unchanged - German: - "schlummert" should match "schlummern" (infinitive) but instead is unchanged - "grüßend" should match "grüßen" (infinitive) but instead yields "grussend" - "gegrüßt" should match "grüßen" (infinitive) but instead yields "gegrusst" The folks from Elasticsearch said I should file a bug with Lucene: https://discuss.elastic.co/t/better-french-and-german-stemming/236283 > German/French stemmers fail for common forms maux, gegrüßt, grüßend, > schlummert > --- > > Key: LUCENE-9410 > URL: https://issues.apache.org/jira/browse/LUCENE-9410 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.5 > Environment: Elasticsearch 7.7.1 running on cloud.elastic.co >Reporter: Ben Kazez >Priority: Major > Labels: french, german, stemmer, stemming > > I'm using Lucene via Elasticsearch 7.7.1. German and French stemmers (either > via the Snowball analyzer, or the "light" or "heavy" stemming analyzers) are > failing to understand some common forms: > - French: > - "maux" should match "mal" ("maux" is plural of "mal") but instead "maux" > is unchanged > - German: > - "schlummert" should match "schlummern" (infinitive) but instead is > unchanged > - "grüßend" should match "grüßen" (infinitive) but instead yields "grussend" > - "gegrüßt" should match "grüßen" (infinitive) but instead yields > "gegrusst" > The Elasticsearch folks > [said|https://discuss.elastic.co/t/better-french-and-german-stemming/236283] > I should file a bug with Lucene. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9410) German/French stemmers fail for common forms maux, gegrüßt, grüßend, schlummert
[ https://issues.apache.org/jira/browse/LUCENE-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Kazez updated LUCENE-9410: -- Description: I'm using Lucene via Elasticsearch 7.7.1. German and French stemmers (either via the Snowball analyzer, or the "light" or "heavy" stemming analyzers) are failing to understand some common forms: French: - "maux" (plural) should match "mal" (singular) but instead "maux" is unchanged German: - "schlummert" should match "schlummern" (infinitive) but instead is unchanged - "grüßend" should match "grüßen" (infinitive) but instead yields "grussend" - "gegrüßt" should match "grüßen" (infinitive) but instead yields "gegrusst" The Elasticsearch folks [said|https://discuss.elastic.co/t/better-french-and-german-stemming/236283] I should file a bug with Lucene. was: I'm using Lucene via Elasticsearch 7.7.1. German and French stemmers (either via the Snowball analyzer, or the "light" or "heavy" stemming analyzers) are failing to understand some common forms: - French: - "maux" should match "mal" ("maux" is plural of "mal") but instead "maux" is unchanged - German: - "schlummert" should match "schlummern" (infinitive) but instead is unchanged - "grüßend" should match "grüßen" (infinitive) but instead yields "grussend" - "gegrüßt" should match "grüßen" (infinitive) but instead yields "gegrusst" The Elasticsearch folks [said|https://discuss.elastic.co/t/better-french-and-german-stemming/236283] I should file a bug with Lucene. > German/French stemmers fail for common forms maux, gegrüßt, grüßend, > schlummert > --- > > Key: LUCENE-9410 > URL: https://issues.apache.org/jira/browse/LUCENE-9410 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.5 > Environment: Elasticsearch 7.7.1 running on cloud.elastic.co >Reporter: Ben Kazez >Priority: Major > Labels: french, german, stemmer, stemming > > I'm using Lucene via Elasticsearch 7.7.1. German and French stemmers (either > via the Snowball analyzer, or the "light" or "heavy" stemming analyzers) are > failing to understand some common forms: > French: > - "maux" (plural) should match "mal" (singular) but instead "maux" is > unchanged > German: > - "schlummert" should match "schlummern" (infinitive) but instead is > unchanged > - "grüßend" should match "grüßen" (infinitive) but instead yields "grussend" > - "gegrüßt" should match "grüßen" (infinitive) but instead yields > "gegrusst" > The Elasticsearch folks > [said|https://discuss.elastic.co/t/better-french-and-german-stemming/236283] > I should file a bug with Lucene. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
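The report is easy to reproduce against Lucene's analysis classes directly, without Elasticsearch (a small illustrative harness; "German2" names the Snowball German stemmer, and StemCheck is a made-up class):

{code:java}
import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class StemCheck {
  public static void main(String[] args) throws IOException {
    // Inflected forms and the infinitives they should reduce toward:
    String[] words = {"gegrüßt", "grüßend", "grüßen", "schlummert", "schlummern"};
    for (String word : words) {
      Tokenizer tokenizer = new StandardTokenizer();
      tokenizer.setReader(new StringReader(word));
      try (TokenStream ts = new SnowballFilter(tokenizer, "German2")) {
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
          System.out.println(word + " -> " + term);
        }
        ts.end();
      }
    }
  }
}
{code}

If the stems printed for "gegrüßt" and "grüßend" differ from the stem of "grüßen", the forms cannot match at query time, which is the mismatch reported here.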
[jira] [Resolved] (SOLR-14574) Fix or suppress warnings in solr/core/src/test
[ https://issues.apache.org/jira/browse/SOLR-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-14574. --- Fix Version/s: 8.6 Resolution: Fixed Hmmm, not sure where the commit message went for part 2, maybe it'll just take a while. Here are the shas anyway 936b9d770e7..84729edbba0 master -> master 2113597970b..9ed037074c1 branch_8x -> branch_8x > Fix or suppress warnings in solr/core/src/test > -- > > Key: SOLR-14574 > URL: https://issues.apache.org/jira/browse/SOLR-14574 > Project: Solr > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Fix For: 8.6 > > > Just when I thought I was done I ran testClasses > I'm going to do this a little differently. Rather than do a directory at a > time, I'll just fix a bunch, push, fix a bunch more, push all on this Jira > until I'm done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] msfroh commented on a change in pull request #1552: LUCENE-8962: merge small segments on commit
msfroh commented on a change in pull request #1552: URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r442326229 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -3226,15 +3235,53 @@ private long prepareCommitInternal() throws IOException { // sneak into the commit point: toCommit = segmentInfos.clone(); + if (anyChanges && maxCommitMergeWaitSeconds > 0) { +SegmentInfos committingSegmentInfos = toCommit; +onCommitMerges = updatePendingMerges(new OneMergeWrappingMergePolicy(config.getMergePolicy(), toWrap -> +new MergePolicy.OneMerge(toWrap.segments) { + @Override + public void mergeFinished(boolean committed) throws IOException { Review comment: Oh -- I guess one minor complaint about moving this into `prepareCommitInternal` is that we won't be able to reuse it (without moving it) if we decide to apply the same logic to `IndexWriter.getReader()`. That said, moving it if/when someone gets around to applying the logic there isn't a big deal. (I think the real work there is reconciling logic from StandardDirectoryReader.open() with logic in IndexWriter.prepareCommitInternal(), since the functionality is kind of duplicated.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-10059) In SolrCloud, every fq added via <appends> is computed twice.
[ https://issues.apache.org/jira/browse/SOLR-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Torsten Bøgh Köster updated SOLR-10059: --- Attachment: SOLR-10059_7x.patch > In SolrCloud, every fq added via <appends> is computed twice. > > > Key: SOLR-10059 > URL: https://issues.apache.org/jira/browse/SOLR-10059 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 6.4 >Reporter: Marc Morissette >Priority: Major > Labels: performance > Attachments: SOLR-10059_7x.patch > > > While researching another issue, I noticed that parameters appended to a > query via SearchHandler's <appends> are added to the query twice > in SolrCloud: once on the aggregator and again on the shard. > The FacetComponent corrects this automatically by removing duplicates. Field > queries added in this fashion are however computed twice and that hinders > performance on filter queries that aren't simple bitsets such as those > produced by the CollapsingQueryParser. > To reproduce the issue, simply test this handler on a large enough > collection, then replace "appends" with "defaults". You'll notice significant > performance improvements. > {code} > <requestHandler ...> > <lst name="appends"> > <str name="fq">{!collapse field=routingKey hint=top_fc}</str> > </lst> > </requestHandler> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14566) Record "NOW" on "coordinator" log messages
[ https://issues.apache.org/jira/browse/SOLR-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139557#comment-17139557 ] Jason Gerlowski commented on SOLR-14566: I ended up using the DebugComponent as Tomas suggested. It uses a format similar to what Robert pointed to as well. So 2 birds with one stone. In terms of implementation I waffled a bit between putting the logic in its own SearchComponent impl that fires all the time, and just bundling it in to SearchHandler. The former is simpler and easier to test, but I'm not sure that such a trivial Component impl really fits what that abstraction is intended for. I implemented both methods since they were both small changes. The Component-based approach is on a branch in my personal fork here: https://github.com/gerlowskija/lucene-solr-1/tree/SOLR_14566_move_rid_into_separate_component. I've updated the existing Github PR to use the SearchHandler impl, since I was leaning slightly in that direction: https://github.com/apache/lucene-solr/pull/1574 Once I choose an approach I still plan on adding a feature flag to disable it, and some tests (easier said than done for SearchHandler, but maybe I just need to sleep on it.) Again, appreciate any feedback on the approach if people prefer one over the other. A part of me still likes the simplicity of the {{NOW}} based impl, but oh well. > Record "NOW" on "coordinator" log messages > -- > > Key: SOLR-14566 > URL: https://issues.apache.org/jira/browse/SOLR-14566 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > Currently, in SolrCore.java we log each search request that comes through > each core as it is finishing. This includes the path, query-params, QTime, > and status. In the case of a distributed search both the "coordinator" node > and each of the per-shard requests produce a log message. > When Solr is fielding many identical queries, such as those created by a > healthcheck or dashboard, it can be hard when examining logs to link the > per-shard requests with the "cooordinator" request that came in upstream. > One thing that would make this easier is if the {{NOW}} param added to > per-shard requests is also included in the log message from the > "coordinator". While {{NOW}} isn't unique strictly speaking, it often is in > practice, and along with the query-params would allow debuggers to associate > shard requests with coordinator requests a large majority of the time. > An alternative approach would be to create a {{qid}} or {{query-uuid}} when > the coordinator starts its work that can be logged everywhere. This provides > a stronger expectation around uniqueness, but would require UUID generation > on the coordinator, which may be non-negligible work at high QPS (maybe? I > have no idea). It also loses the neatness of reusing data already present on > the shard requests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] tflobbe merged pull request #1567: LUCENE-9402: Let MultiCollector handle minCompetitiveScore
tflobbe merged pull request #1567: URL: https://github.com/apache/lucene-solr/pull/1567 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-9402) Let MultiCollector Scorer handle minCompetitiveScore calls
[ https://issues.apache.org/jira/browse/LUCENE-9402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomas Eduardo Fernandez Lobbe reassigned LUCENE-9402: - Assignee: Tomas Eduardo Fernandez Lobbe > Let MultiCollector Scorer handle minCompetitiveScore calls > -- > > Key: LUCENE-9402 > URL: https://issues.apache.org/jira/browse/LUCENE-9402 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Tomas Eduardo Fernandez Lobbe >Assignee: Tomas Eduardo Fernandez Lobbe >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > See SOLR-14554. MultiCollector creates a scorer that explicitly prevents > setting the {{minCompetitiveScore}}: > {code:java} > @Override > public void setScorer(Scorable scorer) throws IOException { > if (cacheScores) { > scorer = new ScoreCachingWrappingScorer(scorer); > } > scorer = new FilterScorable(scorer) { > @Override > public void setMinCompetitiveScore(float minScore) throws IOException > { > // Ignore calls to setMinCompetitiveScore so that if we wrap two > // collectors and one of them wants to skip low-scoring hits, then > // the other collector still sees all hits. We could try to > reconcile > // min scores and take the maximum min score across collectors, but > // this is very unlikely to be helpful in practice. > } > }; > for (int i = 0; i < numCollectors; ++i) { > final LeafCollector c = collectors[i]; > c.setScorer(scorer); > } > } > {code} > Solr uses MultiCollector when scores are requested (to collect the max > score), which means it wouldn't be able to use WAND algorithm. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
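The direction this issue takes is to replace that no-op with reconciliation: each wrapped collector records the minimum competitive score it asked for, and only the smallest value across all collectors is forwarded to the underlying scorer, so no sibling collector misses hits it still considers competitive. A rough sketch of that idea (illustrative, not the committed change):

{code:java}
import java.io.IOException;
import org.apache.lucene.search.FilterScorable;
import org.apache.lucene.search.Scorable;

final class MinScoreReconcilingScorable extends FilterScorable {
  private final float[] minScores; // one slot per wrapped collector, starts at 0
  private final int idx;           // slot of the collector this view belongs to

  MinScoreReconcilingScorable(Scorable in, float[] minScores, int idx) {
    super(in);
    this.minScores = minScores;
    this.idx = idx;
  }

  @Override
  public void setMinCompetitiveScore(float minScore) throws IOException {
    minScores[idx] = minScore;
    // Forward only the smallest requirement across all collectors; anything
    // larger could skip hits that another collector still wants to see.
    float min = minScores[0];
    for (float s : minScores) {
      min = Math.min(min, s);
    }
    in.setMinCompetitiveScore(min);
  }
}
{code}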
[GitHub] [lucene-solr] s1monw merged pull request #1594: Replace DWPT.DocState with simple method parameters
s1monw merged pull request #1594: URL: https://github.com/apache/lucene-solr/pull/1594 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mikemccand commented on pull request #1594: Replace DWPT.DocState with simple method parameters
mikemccand commented on pull request #1594: URL: https://github.com/apache/lucene-solr/pull/1594#issuecomment-646224325 +1, thanks for cleaning things up @s1monw. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] s1monw commented on a change in pull request #1552: LUCENE-8962: merge small segments on commit
s1monw commented on a change in pull request #1552: URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r442422185 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -3226,15 +3235,53 @@ private long prepareCommitInternal() throws IOException { // sneak into the commit point: toCommit = segmentInfos.clone(); + if (anyChanges && maxCommitMergeWaitSeconds > 0) { +SegmentInfos committingSegmentInfos = toCommit; +onCommitMerges = updatePendingMerges(new OneMergeWrappingMergePolicy(config.getMergePolicy(), toWrap -> +new MergePolicy.OneMerge(toWrap.segments) { + @Override + public void mergeFinished(boolean committed) throws IOException { Review comment: I like to move stuff once it's necessary. I think we need to adjust it there anyway, so we can move it in a follow-up. OK? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
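For readers skimming the diff, the wrapping pattern boils down to something like this (a condensed sketch using the public OneMergeWrappingMergePolicy API; the class name and latch are illustrative, and the PR itself extends the hook to mergeFinished(boolean committed)):

{code:java}
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.OneMergeWrappingMergePolicy;

public final class CommitMergeWrapping {
  // Wrap every merge the base policy selects so a committing thread can be
  // notified when the merge completes (released 8.x only has mergeFinished()
  // without the 'committed' flag the PR adds).
  public static MergePolicy wrap(MergePolicy base, CountDownLatch mergesDone) {
    return new OneMergeWrappingMergePolicy(base, toWrap ->
        new MergePolicy.OneMerge(toWrap.segments) {
          @Override
          public void mergeFinished() throws IOException {
            super.mergeFinished();
            mergesDone.countDown(); // let the waiting commit thread proceed
          }
        });
  }
}
{code}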
[jira] [Commented] (SOLR-10059) In SolrCloud, every fq added via <appends> is computed twice.
[ https://issues.apache.org/jira/browse/SOLR-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139917#comment-17139917 ] Torsten Bøgh Köster commented on SOLR-10059: I attached a patch for this issue, which still exists in 7.x and 8.x. In a distributed request, pre-configured query params in the "appends"-section get re-appended on the shards. If those parameters in turn reference other parameters (like $qq), those references do not get dereferenced. In our case, this broke the collapse component. The patch skips re-appending on the shards (_isShard=true_) if the parameter _shards.handler.skipAppends=true_. The latter defaults to _false_. > In SolrCloud, every fq added via <appends> is computed twice. > > > Key: SOLR-10059 > URL: https://issues.apache.org/jira/browse/SOLR-10059 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 6.4 >Reporter: Marc Morissette >Priority: Major > Labels: performance > Attachments: SOLR-10059_7x.patch > > > While researching another issue, I noticed that parameters appended to a > query via SearchHandler's <appends> are added to the query twice > in SolrCloud: once on the aggregator and again on the shard. > The FacetComponent corrects this automatically by removing duplicates. Field > queries added in this fashion are however computed twice and that hinders > performance on filter queries that aren't simple bitsets such as those > produced by the CollapsingQueryParser. > To reproduce the issue, simply test this handler on a large enough > collection, then replace "appends" with "defaults". You'll notice significant > performance improvements. > {code} > <requestHandler ...> > <lst name="appends"> > <str name="fq">{!collapse field=routingKey hint=top_fc}</str> > </lst> > </requestHandler> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-10059) In SolrCloud, every fq added via <appends> is computed twice.
[ https://issues.apache.org/jira/browse/SOLR-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Torsten Bøgh Köster updated SOLR-10059: --- Attachment: (was: SOLR-10059_7x.patch) > In SolrCloud, every fq added via <appends> is computed twice. > > > Key: SOLR-10059 > URL: https://issues.apache.org/jira/browse/SOLR-10059 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 6.4 >Reporter: Marc Morissette >Priority: Major > Labels: performance > Attachments: SOLR-10059_7x.patch > > > While researching another issue, I noticed that parameters appended to a > query via SearchHandler's <appends> are added to the query twice > in SolrCloud: once on the aggregator and again on the shard. > The FacetComponent corrects this automatically by removing duplicates. Field > queries added in this fashion are however computed twice and that hinders > performance on filter queries that aren't simple bitsets such as those > produced by the CollapsingQueryParser. > To reproduce the issue, simply test this handler on a large enough > collection, then replace "appends" with "defaults". You'll notice significant > performance improvements. > {code} > <requestHandler ...> > <lst name="appends"> > <str name="fq">{!collapse field=routingKey hint=top_fc}</str> > </lst> > </requestHandler> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-10059) In SolrCloud, every fq added via <appends> is computed twice.
[ https://issues.apache.org/jira/browse/SOLR-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Torsten Bøgh Köster updated SOLR-10059: --- Attachment: SOLR-10059_7x.patch > In SolrCloud, every fq added via <appends> is computed twice. > > > Key: SOLR-10059 > URL: https://issues.apache.org/jira/browse/SOLR-10059 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 6.4 >Reporter: Marc Morissette >Priority: Major > Labels: performance > Attachments: SOLR-10059_7x.patch > > > While researching another issue, I noticed that parameters appended to a > query via SearchHandler's <appends> are added to the query twice > in SolrCloud: once on the aggregator and again on the shard. > The FacetComponent corrects this automatically by removing duplicates. Field > queries added in this fashion are however computed twice and that hinders > performance on filter queries that aren't simple bitsets such as those > produced by the CollapsingQueryParser. > To reproduce the issue, simply test this handler on a large enough > collection, then replace "appends" with "defaults". You'll notice significant > performance improvements. > {code} > <requestHandler ...> > <lst name="appends"> > <str name="fq">{!collapse field=routingKey hint=top_fc}</str> > </lst> > </requestHandler> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9411) Fail compilation on warnings
Erick Erickson created LUCENE-9411: -- Summary: Fail compilation on warnings Key: LUCENE-9411 URL: https://issues.apache.org/jira/browse/LUCENE-9411 Project: Lucene - Core Issue Type: Improvement Components: general/build Reporter: Erick Erickson Assignee: Erick Erickson Moving this over here from SOLR-11973 since it's part of the build system and affects Lucene as well as Solr. You might want to see the discussion there. We have a clean compile for both Solr and Lucene, no rawtypes, unchecked, try, etc. warnings. There are some peculiar warnings (things like SuppressFBWarnings, i.e. FindBugs) that I'm not sure about at all, but let's assume those are not a problem. Now I'd like to start failing the compilation if people write new code that generates warnings. From what I can tell, just adding the flag is easy in both the Gradle and Ant builds. I still have to prove out that adding -Werror does what I expect, i.e. succeeds now and fails when I introduce warnings. But let's assume that works. Are there objections to this idea generally? I hope to have some data by next Monday. FWIW, the Lucene code base had far fewer issues than Solr, but common-build.xml is in Lucene. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
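For the record, the flag itself is a small change in the Gradle build; a minimal sketch of the idea (not the final build change) would be:

{code}
// Promote javac warnings to hard errors for every Java compile task.
allprojects {
  tasks.withType(JavaCompile) {
    options.compilerArgs += ['-Xlint:all', '-Werror']
  }
}
{code}

The hard part, as discussed above and in SOLR-11973, is getting the warning count to zero first so the switch doesn't break the build on day one.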
[jira] [Updated] (SOLR-10059) In SolrCloud, every fq added via <appends> is computed twice.
[ https://issues.apache.org/jira/browse/SOLR-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Torsten Bøgh Köster updated SOLR-10059: --- Attachment: (was: SOLR-10059_7x.patch) > In SolrCloud, every fq added via <appends> is computed twice. > > > Key: SOLR-10059 > URL: https://issues.apache.org/jira/browse/SOLR-10059 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 6.4 >Reporter: Marc Morissette >Priority: Major > Labels: performance > > While researching another issue, I noticed that parameters appended to a > query via SearchHandler's <appends> are added to the query twice > in SolrCloud: once on the aggregator and again on the shard. > The FacetComponent corrects this automatically by removing duplicates. Field > queries added in this fashion are however computed twice and that hinders > performance on filter queries that aren't simple bitsets such as those > produced by the CollapsingQueryParser. > To reproduce the issue, simply test this handler on a large enough > collection, then replace "appends" with "defaults". You'll notice significant > performance improvements. > {code} > <requestHandler ...> > <lst name="appends"> > <str name="fq">{!collapse field=routingKey hint=top_fc}</str> > </lst> > </requestHandler> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-10059) In SolrCloud, every fq added via <appends> is computed twice.
[ https://issues.apache.org/jira/browse/SOLR-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Torsten Bøgh Köster updated SOLR-10059: --- Attachment: SOLR-10059_7x.patch > In SolrCloud, every fq added via <appends> is computed twice. > > > Key: SOLR-10059 > URL: https://issues.apache.org/jira/browse/SOLR-10059 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 6.4 >Reporter: Marc Morissette >Priority: Major > Labels: performance > Attachments: SOLR-10059_7x.patch > > > While researching another issue, I noticed that parameters appended to a > query via SearchHandler's <appends> are added to the query twice > in SolrCloud: once on the aggregator and again on the shard. > The FacetComponent corrects this automatically by removing duplicates. Field > queries added in this fashion are however computed twice and that hinders > performance on filter queries that aren't simple bitsets such as those > produced by the CollapsingQueryParser. > To reproduce the issue, simply test this handler on a large enough > collection, then replace "appends" with "defaults". You'll notice significant > performance improvements. > {code} > <requestHandler ...> > <lst name="appends"> > <str name="fq">{!collapse field=routingKey hint=top_fc}</str> > </lst> > </requestHandler> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1593: LUCENE-9409: Check file lengths before creating slices.
mikemccand commented on a change in pull request #1593: URL: https://github.com/apache/lucene-solr/pull/1593#discussion_r442474019 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene86/Lucene86PointsWriter.java ## @@ -68,11 +71,9 @@ public Lucene86PointsWriter(SegmentWriteState writeState, int maxPointsInLeafNod writeState.segmentInfo.getId(), writeState.segmentSuffix); - String metaFileName = IndexFileNames.segmentFileName(writeState.segmentInfo.name, - writeState.segmentSuffix, - Lucene86PointsFormat.META_EXTENSION); - metaOut = writeState.directory.createOutput(metaFileName, writeState.context); - CodecUtil.writeIndexHeader(metaOut, + tempMetaOut = writeState.directory.createTempOutput( Review comment: Why are we switching to a temp file and copying to the real file after closing? Maybe add a comment explaining? ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene86/Lucene86PointsReader.java ## @@ -93,18 +97,12 @@ public Lucene86PointsReader(SegmentReadState readState) throws IOException { BKDReader reader = new BKDReader(metaIn, indexIn, dataIn); readers.put(fieldNumber, reader); } - indexLength = metaIn.readLong(); - dataLength = metaIn.readLong(); } catch (Throwable t) { priorE = t; } finally { CodecUtil.checkFooter(metaIn, priorE); } } - // At this point, checksums of the meta file have been validated so we Review comment: Hmm are we losing this safety? Oh, actually, maybe not, because in the `finally` clause above, where we check meta's footer, if the checksum is bad we will throw an exception, adding it as suppressed exception if the `indexLength` or `dataLength` was wrong. So I think we do not lose any safety with this change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1593: LUCENE-9409: Check file lengths before creating slices.
jpountz commented on a change in pull request #1593: URL: https://github.com/apache/lucene-solr/pull/1593#discussion_r442477973 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene86/Lucene86PointsReader.java ## @@ -93,18 +97,12 @@ public Lucene86PointsReader(SegmentReadState readState) throws IOException { BKDReader reader = new BKDReader(metaIn, indexIn, dataIn); readers.put(fieldNumber, reader); } - indexLength = metaIn.readLong(); - dataLength = metaIn.readLong(); } catch (Throwable t) { priorE = t; } finally { CodecUtil.checkFooter(metaIn, priorE); } } - // At this point, checksums of the meta file have been validated so we Review comment: we don't lose safety, but in case of a corrupt meta file, it might be slightly more confusing in the sense that the suppressed exception will complain about a truncated index/data file This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1593: LUCENE-9409: Check file lengths before creating slices.
jpountz commented on a change in pull request #1593: URL: https://github.com/apache/lucene-solr/pull/1593#discussion_r442479067 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene86/Lucene86PointsWriter.java ## @@ -68,11 +71,9 @@ public Lucene86PointsWriter(SegmentWriteState writeState, int maxPointsInLeafNod writeState.segmentInfo.getId(), writeState.segmentSuffix); - String metaFileName = IndexFileNames.segmentFileName(writeState.segmentInfo.name, - writeState.segmentSuffix, - Lucene86PointsFormat.META_EXTENSION); - metaOut = writeState.directory.createOutput(metaFileName, writeState.context); - CodecUtil.writeIndexHeader(metaOut, + tempMetaOut = writeState.directory.createTempOutput( Review comment: This is because we need to write file lengths of the index/data files before any offsets/lengths of slices into these files. But since these index/data files have not been written yet, we don't know the length yet. So I wrote into a temp file, and only then write the final metadata file that includes first the lengths of the index/data files and then metadata about the KD trees that includes offsets into these index/data files. I'll add a comment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1593: LUCENE-9409: Check file lengths before creating slices.
jpountz commented on a change in pull request #1593: URL: https://github.com/apache/lucene-solr/pull/1593#discussion_r442482722 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene86/Lucene86PointsWriter.java ## @@ -68,11 +71,9 @@ public Lucene86PointsWriter(SegmentWriteState writeState, int maxPointsInLeafNod writeState.segmentInfo.getId(), writeState.segmentSuffix); - String metaFileName = IndexFileNames.segmentFileName(writeState.segmentInfo.name, - writeState.segmentSuffix, - Lucene86PointsFormat.META_EXTENSION); - metaOut = writeState.directory.createOutput(metaFileName, writeState.context); - CodecUtil.writeIndexHeader(metaOut, + tempMetaOut = writeState.directory.createTempOutput( Review comment: As an alternative, I could buffer the metadata in memory like we do for terms. It will require changing some APIs to replace IndexOutput with DataOutputs but other than that it shouldn't be too hard. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
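That in-memory alternative could look roughly like this (an illustrative sketch, assuming a ByteBuffersDataOutput holds the buffered per-field metadata; BufferedMetaWriter is a made-up name):

{code:java}
import java.io.IOException;
import org.apache.lucene.store.ByteBuffersDataOutput;
import org.apache.lucene.store.IndexOutput;

public final class BufferedMetaWriter {
  // Write index/data file lengths first, so readers can validate them before
  // trusting any slice offsets, then append the buffered per-field metadata.
  public static void writeMeta(IndexOutput metaOut,
                               IndexOutput indexOut,
                               IndexOutput dataOut,
                               ByteBuffersDataOutput bufferedFieldMeta) throws IOException {
    metaOut.writeLong(indexOut.getFilePointer());
    metaOut.writeLong(dataOut.getFilePointer());
    bufferedFieldMeta.copyTo(metaOut);
  }
}
{code}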
[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139977#comment-17139977 ] Adrien Grand commented on LUCENE-9378: -- [~alexklibisz] Thanks for the details, what is the order of magnitude of the slowdown that you are observing? > Configurable compression for BinaryDocValues > > > Key: LUCENE-9378 > URL: https://issues.apache.org/jira/browse/LUCENE-9378 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Viral Gandhi >Priority: Minor > Attachments: image-2020-06-12-22-17-30-339.png, > image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, > image-2020-06-12-22-18-48-919.png > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Lucene 8.5.1 includes a change to always [compress > BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This > caused (~30%) reduction in our red-line QPS (throughput). > We think users should be given some way to opt-in for this compression > feature instead of always being enabled which can have a substantial query > time cost as we saw during our upgrade. [~mikemccand] suggested one possible > approach by introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED and > UNCOMPRESSED) and allowing users to create a custom Codec subclassing the > default Codec and pick the format they want. > Idea is similar to Lucene50StoredFieldsFormat which has two modes, > Mode.BEST_SPEED and Mode.BEST_COMPRESSION. > Here's related issues for adding benchmark covering BINARY doc values > query-time performance - [https://github.com/mikemccand/luceneutil/issues/61] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
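For context, the opt-in approach suggested in the description would plug in through a custom codec along these lines (a sketch only: the UNCOMPRESSED mode is the proposal, not existing API, so the constructor comment marks the hypothetical part):

{code:java}
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat;

public final class UncompressedDocValuesCodec extends FilterCodec {
  // Proposed: new Lucene80DocValuesFormat(Mode.UNCOMPRESSED). Today only the
  // no-arg (compressing) constructor exists.
  private final DocValuesFormat dvFormat = new Lucene80DocValuesFormat();

  public UncompressedDocValuesCodec() {
    super("UncompressedDocValuesCodec", Codec.getDefault());
  }

  @Override
  public DocValuesFormat docValuesFormat() {
    return dvFormat;
  }
}
{code}

Such a codec would also need SPI registration (META-INF/services/org.apache.lucene.codecs.Codec) so indexes written with it can be opened later.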
[GitHub] [lucene-solr] msokolov merged pull request #1552: LUCENE-8962: merge small segments on commit
msokolov merged pull request #1552: URL: https://github.com/apache/lucene-solr/pull/1552 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9402) Let MultiCollector Scorer handle minCompetitiveScore calls
[ https://issues.apache.org/jira/browse/LUCENE-9402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomas Eduardo Fernandez Lobbe resolved LUCENE-9402. --- Fix Version/s: 8.6 master (9.0) Resolution: Fixed Git tagging doesn’t seem to be working. Merged this. Master https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4db1e3895fec7cd50b0ad266af5db0757bb5780a 8x: https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8d20e31b22244f4fa68d5d8d82da5d07c4b6a351 > Let MultiCollector Scorer handle minCompetitiveScore calls > -- > > Key: LUCENE-9402 > URL: https://issues.apache.org/jira/browse/LUCENE-9402 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Tomas Eduardo Fernandez Lobbe >Assignee: Tomas Eduardo Fernandez Lobbe >Priority: Major > Fix For: master (9.0), 8.6 > > Time Spent: 1.5h > Remaining Estimate: 0h > > See SOLR-14554. MultiCollector creates a scorer that explicitly prevents > setting the {{minCompetitiveScore}}: > {code:java} > @Override > public void setScorer(Scorable scorer) throws IOException { > if (cacheScores) { > scorer = new ScoreCachingWrappingScorer(scorer); > } > scorer = new FilterScorable(scorer) { > @Override > public void setMinCompetitiveScore(float minScore) throws IOException > { > // Ignore calls to setMinCompetitiveScore so that if we wrap two > // collectors and one of them wants to skip low-scoring hits, then > // the other collector still sees all hits. We could try to > reconcile > // min scores and take the maximum min score across collectors, but > // this is very unlikely to be helpful in practice. > } > }; > for (int i = 0; i < numCollectors; ++i) { > final LeafCollector c = collectors[i]; > c.setScorer(scorer); > } > } > {code} > Solr uses MultiCollector when scores are requested (to collect the max > score), which means it wouldn't be able to use WAND algorithm. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14575) Solr restore is failing when basic authentication is enabled
[ https://issues.apache.org/jira/browse/SOLR-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140021#comment-17140021 ] Jan Høydahl commented on SOLR-14575: The interesting part is {code:java} java.base/java.lang.Thread.run(Thread.java:834)Caused by: org.apache.solr.common.SolrException: javax.crypto.BadPaddingException: RSA private key operation failed at org.apache.solr.util.CryptoKeys$RSAKeyPair.encrypt(CryptoKeys.java:325) at org.apache.solr.security.PKIAuthenticationPlugin.generateToken(PKIAuthenticationPlugin.java:305) at {code} Somehow PKI plugin is trying to agree on PKI auth between the two nodes, but fail. However, you have explicitly enabled {{forwardCredentials=true}}, so PKI should not have been used here, instead the basic auth header should have been sent to the other node. My guess is that there is a bug when using Http2SolrClient with forwardCredentials?? [~ichattopadhyaya]? > Solr restore is failing when basic authentication is enabled > > > Key: SOLR-14575 > URL: https://issues.apache.org/jira/browse/SOLR-14575 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Backup/Restore >Affects Versions: 8.2 >Reporter: Yaswanth >Priority: Blocker > > Hi Team, > I was testing backup / restore for solrcloud and its failing exactly when I > am trying to restore a successfully backed up collection. > I am using solr 8.2 with basic authentication enabled and then creating a 2 > replica collection. When I run the backup like > curl -u xxx:xxx -k > '[https://x.x.x.x:8080/solr/admin/collections?action=BACKUP&name=test&collection=test&location=/solrdatabkup'|https://x.x.x.x:8080/solr/admin/collections?action=BACKUP&name=test&collection=test&location=/solrdatabkup%27] > it worked fine and I do see a folder was created with the collection name > under /solrdatabackup > But now when I deleted the existing collection and then try running restore > api like > curl -u xxx:xxx -k > '[https://x.x.x.x:8080/solr/admin/collections?action=RESTORE&name=test&collection=test&location=/solrdatabkup'|https://x.x.x.x:8080/solr/admin/collections?action=BACKUP&name=test&collection=test&location=/solrdatabkup%27] > its throwing an error like > { > "responseHeader":{ > "status":500, > "QTime":457}, > "Operation restore caused > *exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: > ADDREPLICA failed to create replica",* > "exception":{ > "msg":"ADDREPLICA failed to create replica", > "rspCode":500}, > "error":{ > "metadata":[ > "error-class","org.apache.solr.common.SolrException", > "root-error-class","org.apache.solr.common.SolrException"], > "msg":"ADDREPLICA failed to create replica", > "trace":"org.apache.solr.common.SolrException: ADDREPLICA failed to create > replica\n\tat > org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat > > org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:280)\n\tat > > org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:252)\n\tat > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat > > org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:820)\n\tat > org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:786)\n\tat > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:546)\n\tat > 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:423)\n\tat > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350)\n\tat > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\n\tat > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat > > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\n\tat > > org.eclipse.jetty.server.session.SessionHandler.doScope
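For reference, the {{forwardCredentials}} option mentioned above lives in the authentication section of security.json; a minimal example, using the ref guide's demo credentials hash (user solr / password SolrRocks):

{code}
{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "forwardCredentials": true,
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  }
}
{code}

With forwardCredentials=true, node-to-node requests carry the original Basic Auth header instead of PKI, which is why PKI showing up in this trace is suspicious.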
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140039#comment-17140039 ] Michael Sokolov commented on LUCENE-8962: - pushed [https://github.com/apache/lucene-solr/pull/1552] to master, and cherry-picked to branch_8x, resolving > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.6 > > Attachments: LUCENE-8962_demo.png, failed-tests.patch > > Time Spent: 18h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Sokolov resolved LUCENE-8962. - Resolution: Fixed > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.6 > > Attachments: LUCENE-8962_demo.png, failed-tests.patch > > Time Spent: 18h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] tflobbe commented on a change in pull request #1574: SOLR-14566: Add request-ID to all distrib-search requests
tflobbe commented on a change in pull request #1574: URL: https://github.com/apache/lucene-solr/pull/1574#discussion_r442599895 ## File path: solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java ## @@ -500,6 +509,29 @@ public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throw } } + private void tagRequestWithRequestId(ResponseBuilder rb) { +String rid = getRequestId(rb.req); +if (StringUtils.isBlank(rb.req.getParams().get(CommonParams.REQUEST_ID))) { + ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams()); + params.add(CommonParams.REQUEST_ID, rid);//add rid to the request so that shards see it + rb.req.setParams(params); +} +if (rb.isDistrib) { + rb.rsp.addToLog(CommonParams.REQUEST_ID, rid); //to see it in the logs of the landing core Review comment: Do we now want it also in the coordinator node? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14566) Record "NOW" on "coordinator" log messages
[ https://issues.apache.org/jira/browse/SOLR-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140136#comment-17140136 ] Tomas Eduardo Fernandez Lobbe commented on SOLR-14566: -- Agree with you, an isolated SearchComponent just for this sounds like too much. +1 for using SearchHandler. > Record "NOW" on "coordinator" log messages > -- > > Key: SOLR-14566 > URL: https://issues.apache.org/jira/browse/SOLR-14566 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently, in SolrCore.java we log each search request that comes through > each core as it is finishing. This includes the path, query-params, QTime, > and status. In the case of a distributed search both the "coordinator" node > and each of the per-shard requests produce a log message. > When Solr is fielding many identical queries, such as those created by a > healthcheck or dashboard, it can be hard when examining logs to link the > per-shard requests with the "cooordinator" request that came in upstream. > One thing that would make this easier is if the {{NOW}} param added to > per-shard requests is also included in the log message from the > "coordinator". While {{NOW}} isn't unique strictly speaking, it often is in > practice, and along with the query-params would allow debuggers to associate > shard requests with coordinator requests a large majority of the time. > An alternative approach would be to create a {{qid}} or {{query-uuid}} when > the coordinator starts its work that can be logged everywhere. This provides > a stronger expectation around uniqueness, but would require UUID generation > on the coordinator, which may be non-negligible work at high QPS (maybe? I > have no idea). It also loses the neatness of reusing data already present on > the shard requests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9411) Fail compilation on warnings
[ https://issues.apache.org/jira/browse/LUCENE-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated LUCENE-9411: --- Attachment: LUCENE-9411.patch Status: Open (was: Open) The first step here is to get a successful compile with -Werror, and I'm starting with the gradle build. Actually, the very first step is getting a clean compile. I've been ignoring a couple of things but now I have to deal with them. The attached patch gets rid of a couple of warnings, apparently from dependencies. Specifically: {code:java} /Users/Erick/.gradle/caches/modules-2/files-2.1/org.apache.zookeeper/zookeeper/3.5.7/12bdf55ba8be7fc891996319d37f35eaad7e63ea/zookeeper-3.5.7.jar(/org/apache/zookeeper/ZooDefs$Ids.class): warning: Cannot find annotation method 'value()' in type 'SuppressFBWarnings' {code} and {code:java} /Users/Erick/.gradle/caches/modules-2/files-2.1/com.google.guava/guava/25.1-jre/6c57e4b22b44e89e548b5c9f70f0c45fe10fb0b4/guava-25.1-jre.jar(/com/google/common/collect/Multimap.class): warning: Cannot find annotation method 'value()' in type 'CompatibleWith' {code} I can put these either in the bulid.gradle or solr/build.gradle, solr/build.gradle seems best since they aren't part of Lucene. The patch shows them (it's a small patch, don't be scared). But I still get 6 warnings: {code:java} > Task :solr:solrj:compileJava warning: [rawtypes] found raw type: Map missing type arguments for generic class Map where K,V are type-variables: K extends Object declared in interface Map V extends Object declared in interface Map {code} Problem is that I have no idea at all where they come from. For all the other 8,000 warnings, the warnings were clearly identified with the file and line. Cranking the Gradle logging up to debug doesn't shed any light on the problem. Any clue how to find out what generates these would be appreciated. A secondary question is why, after a build, I have references in .gradle/caches to: {code:java} ./modules-2/files-2.1/com.google.code.findbugs ./modules-2/files-2.1/com.google.code.findbugs/jsr305 ./modules-2/files-2.1/com.google.code.findbugs/jsr305/1.3.9 ./modules-2/files-2.1/com.google.code.findbugs/jsr305/1.3.9/67ea333a3244bc20a17d6f0c29498071dfa409fc ./modules-2/files-2.1/com.google.code.findbugs/jsr305/1.3.9/67ea333a3244bc20a17d6f0c29498071dfa409fc/jsr305-1.3.9.pom ./modules-2/files-2.1/com.google.code.findbugs/jsr305/3.0.2 ./modules-2/files-2.1/com.google.code.findbugs/jsr305/3.0.2/25ea2e8b0c338a877313bd4672d3fe056ea78f0d ./modules-2/files-2.1/com.google.code.findbugs/jsr305/3.0.2/25ea2e8b0c338a877313bd4672d3fe056ea78f0d/jsr305-3.0.2.jar ./modules-2/files-2.1/com.google.code.findbugs/jsr305/3.0.2/8d93cdf4d84d7e1de736df607945c6df0730a10f ./modules-2/files-2.1/com.google.code.findbugs/jsr305/3.0.2/8d93cdf4d84d7e1de736df607945c6df0730a10f/jsr305-3.0.2.pom {code} but gradlew dependencies only lists 3.0.2. I can live without knowing, but if anyone knows off the top of their heads Similarly I have error_prone_annotations 2.1.3 and 2.3.4. But the attached patch gets rid of all the warnings so I'm not inclined to pursue that very far. 
> Fail compilation on warnings > --- > > Key: LUCENE-9411 > URL: https://issues.apache.org/jira/browse/LUCENE-9411 > Project: Lucene - Core > Issue Type: Improvement > Components: general/build >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Labels: build > Attachments: LUCENE-9411.patch > > > Moving this over here from SOLR-11973 since it's part of the build system and > affects Lucene as well as Solr. You might want to see the discussion there. > We have a clean compile for both Solr and Lucene, no rawtypes, unchecked, > try, etc. warnings. There are some peculiar warnings (things like > SuppressFBWarnings, i.e. FindBugs) that I'm not sure about at all, but let's > assume those are not a problem. Now I'd like to start failing the compilation > if people write new code that generates warnings. > From what I can tell, just adding the flag is easy in both the Gradle and Ant > builds. I still have to prove out that adding -Werror does what I expect, > i.e. succeeds now and fails when I introduce warnings. > But let's assume that works. Are there objections to this idea generally? I > hope to have some data by next Monday. > FWIW, the Lucene code base had far fewer issues than Solr, but > common-build.xml is in Lucene. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
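For readers following along, wiring -Werror into a Gradle build generally looks something like the following. This is a minimal sketch of the mechanism under discussion, not the attached LUCENE-9411.patch:

{code:groovy}
// build.gradle -- minimal sketch of failing compilation on warnings.
// Illustrative only; the actual change is in the attached LUCENE-9411.patch.
allprojects {
  tasks.withType(JavaCompile).configureEach {
    options.compilerArgs += [
        '-Xlint:all', // surface the warnings in the first place
        '-Werror'     // then fail the compile on any warning
    ]
  }
}
{code}

With that in place, the build succeeds on a warning-free tree and fails as soon as new code introduces a warning, which is exactly the behavior being proved out above.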
[jira] [Updated] (LUCENE-9378) Configurable compression for BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Klibisz updated LUCENE-9378: - Attachment: snapshots-v76x.nps hotspots-v76x.png hotspots-v77x.png snapshot-v77x.nps > Configurable compression for BinaryDocValues > > > Key: LUCENE-9378 > URL: https://issues.apache.org/jira/browse/LUCENE-9378 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Viral Gandhi >Priority: Minor > Attachments: hotspots-v76x.png, hotspots-v77x.png, > image-2020-06-12-22-17-30-339.png, image-2020-06-12-22-17-53-961.png, > image-2020-06-12-22-18-24-527.png, image-2020-06-12-22-18-48-919.png, > snapshot-v77x.nps, snapshots-v76x.nps > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Lucene 8.5.1 includes a change to always [compress > BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This > caused a ~30% reduction in our red-line QPS (throughput). > We think users should be given some way to opt in to this compression > feature instead of it always being enabled, since it can have a substantial query-time > cost, as we saw during our upgrade. [~mikemccand] suggested one possible > approach: introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED and > UNCOMPRESSED) and allowing users to create a custom Codec subclassing the > default Codec and picking the format they want (a sketch follows this message). > The idea is similar to Lucene50StoredFieldsFormat, which has two modes, > Mode.BEST_SPEED and Mode.BEST_COMPRESSION. > Here's a related issue for adding a benchmark covering BINARY doc values > query-time performance - [https://github.com/mikemccand/luceneutil/issues/61] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
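To make that proposal concrete, here is a rough sketch of the opt-out a user would write. Note that the Mode-taking constructor is the proposed API, not one that exists in Lucene 8.5.1; the FilterCodec plumbing around it is real:

{code:java}
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat;
import org.apache.lucene.codecs.lucene84.Lucene84Codec;

// Sketch of the proposed opt-out: delegate everything to the default codec
// except the doc-values format.
public final class UncompressedDocValuesCodec extends FilterCodec {

  private final DocValuesFormat dvFormat =
      // Hypothetical constructor -- the Mode enum is what this issue
      // proposes, not an API that exists in 8.5.1.
      new Lucene80DocValuesFormat(Lucene80DocValuesFormat.Mode.UNCOMPRESSED);

  public UncompressedDocValuesCodec() {
    super("UncompressedDocValuesCodec", new Lucene84Codec());
  }

  @Override
  public DocValuesFormat docValuesFormat() {
    return dvFormat;
  }
}
{code}

Indexing would then pass this codec via IndexWriterConfig.setCodec(...); reading relies on the codec name being resolvable through SPI, so the class also needs a META-INF/services registration for org.apache.lucene.codecs.Codec.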
[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140163#comment-17140163 ] Alex Klibisz commented on LUCENE-9378: -- [~jpountz] It's about 2x slower. I re-ran a benchmark to be sure. Here is the setup: * Storing a corpus of 18K binary vectors in a single shard. * Each vector contains ~500 ints denoting the positive indices. So each one is storing a byte array of 500 * 4 = 2000 bytes in the binary doc values. * Running 2000 serial searches against these vectors. Each search reads, deserializes, and computes the Jaccard similarity against every vector in the corpus (a sketch of this read path follows this message). So a total of 18K * 2K reads from the shard. * The read order is defined by Elasticsearch. Internally I'm using a FunctionScoreQuery, code here: [https://github.com/alexklibisz/elastiknn/blob/5246a26f76791362482a98066e31071cb03e0a74/plugin/src/main/scala/com/klibisz/elastiknn/query/ExactQuery.scala#L22-L29] * Ubuntu 20 on an Intel i7-8750H 2.20GHz x 12 cores * Running Oracle JDK 14: {code} $ java -version java version "14" 2020-03-17 Java(TM) SE Runtime Environment (build 14+36-1461) Java HotSpot(TM) 64-Bit Server VM (build 14+36-1461, mixed mode, sharing) {code} * Running all 2000 searches once, then again, and reporting the time from the second run (JVM warmup, etc.). Results: * Using Elasticsearch 7.6.2 w/ Lucene 8.4.0: ** 212 seconds for 2000 searches ** Search threads spend 95.5% of time computing similarities, 0.2% in the LZ4.decompress() method. * Using Elasticsearch 7.7.1 w/ Lucene 8.5.1: ** 445 seconds for 2000 searches ** Search threads spend 56% of total time computing similarities, 42% in the decompress method. VisualVM screenshot for 7.6.x: !hotspots-v76x.png! VisualVM screenshot for 7.7.x: !hotspots-v77x.png! Attaching snapshots from VisualVM: [^snapshots-v76x.nps] [^snapshot-v77x.nps] Thank you all for your help! :) > Configurable compression for BinaryDocValues > > > Key: LUCENE-9378 > URL: https://issues.apache.org/jira/browse/LUCENE-9378 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Viral Gandhi >Priority: Minor > Attachments: hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, > hotspots-v76x.png, hotspots-v76x.png, hotspots-v77x.png, hotspots-v77x.png, > hotspots-v77x.png, hotspots-v77x.png, image-2020-06-12-22-17-30-339.png, > image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, > image-2020-06-12-22-18-48-919.png, snapshot-v77x.nps, snapshot-v77x.nps, > snapshot-v77x.nps, snapshots-v76x.nps, snapshots-v76x.nps, snapshots-v76x.nps > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Lucene 8.5.1 includes a change to always [compress > BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This > caused a ~30% reduction in our red-line QPS (throughput). > We think users should be given some way to opt in to this compression > feature instead of it always being enabled, since it can have a substantial query-time > cost, as we saw during our upgrade. [~mikemccand] suggested one possible > approach: introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED and > UNCOMPRESSED) and allowing users to create a custom Codec subclassing the > default Codec and picking the format they want. > The idea is similar to Lucene50StoredFieldsFormat, which has two modes, > Mode.BEST_SPEED and Mode.BEST_COMPRESSION. 
> Here's a related issue for adding a benchmark covering BINARY doc values > query-time performance - [https://github.com/mikemccand/luceneutil/issues/61] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
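To make the measured hot path concrete, here is a rough sketch of what each of those 18K-per-search reads does. It assumes the stored bytes are big-endian ints of sorted positive indices and a field name of "vec" (both assumptions; the real encoding lives in the linked ExactQuery.scala), but the Lucene calls shown are the ones that appear in the profiles above:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.util.BytesRef;

public final class JaccardReadSketch {

  // Reads one stored vector and scores it against the query's sorted indices.
  // In real code the BinaryDocValues iterator would be created once per
  // segment and advanced in docId order, not fetched per document.
  static double score(LeafReader reader, int docId, int[] queryIndices) throws IOException {
    BinaryDocValues dv = DocValues.getBinary(reader, "vec"); // field name assumed
    if (!dv.advanceExact(docId)) {
      return 0d;
    }
    // binaryValue() is where LZ4 decompression shows up in Lucene 8.5.x profiles.
    BytesRef ref = dv.binaryValue();
    ByteBuffer buf = ByteBuffer.wrap(ref.bytes, ref.offset, ref.length);
    int[] docIndices = new int[ref.length / Integer.BYTES]; // ~500 ints = 2000 bytes
    for (int i = 0; i < docIndices.length; i++) {
      docIndices[i] = buf.getInt();
    }
    // Jaccard = |intersection| / |union| over two sorted index lists.
    int i = 0, j = 0, intersection = 0;
    while (i < docIndices.length && j < queryIndices.length) {
      if (docIndices[i] == queryIndices[j]) { intersection++; i++; j++; }
      else if (docIndices[i] < queryIndices[j]) { i++; }
      else { j++; }
    }
    int union = docIndices.length + queryIndices.length - intersection;
    return union == 0 ? 0d : (double) intersection / union;
  }
}
{code}

Because every scored document pays the binaryValue() read, any per-read decompression cost multiplies across the full 18K * 2K reads, which is consistent with the 212s vs. 445s numbers reported above.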