[jira] [Commented] (SOLR-14575) Solr restore is failing when basic authentication is enabled
[ https://issues.apache.org/jira/browse/SOLR-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139151#comment-17139151 ] Jan Høydahl commented on SOLR-14575: Can you please attach the related log lines from the solr.log file? Include enough context to see everything that is going on. If you have more than one node, include logs from all nodes involved in the restore operation. How did you install and configure Solr? How many nodes? What does your security.json look like?

> Solr restore is failing when basic authentication is enabled
>
> Key: SOLR-14575
> URL: https://issues.apache.org/jira/browse/SOLR-14575
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: Backup/Restore
> Affects Versions: 8.2
> Reporter: Yaswanth
> Priority: Blocker
>
> Hi Team,
> I was testing backup/restore for SolrCloud and it's failing exactly when I am trying to restore a successfully backed-up collection.
> I am using Solr 8.2 with basic authentication enabled and then creating a 2-replica collection. When I run a backup like
> curl -u xxx:xxx -k 'https://x.x.x.x:8080/solr/admin/collections?action=BACKUP&name=test&collection=test&location=/solrdatabkup'
> it worked fine, and I do see a folder created with the collection name under /solrdatabackup.
> But when I deleted the existing collection and then tried running the restore API like
> curl -u xxx:xxx -k 'https://x.x.x.x:8080/solr/admin/collections?action=RESTORE&name=test&collection=test&location=/solrdatabkup'
> it throws an error like
> {
> "responseHeader":{
> "status":500,
> "QTime":457},
> "Operation restore caused exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: ADDREPLICA failed to create replica",
> "exception":{
> "msg":"ADDREPLICA failed to create replica",
> "rspCode":500},
> "error":{
> "metadata":[
> "error-class","org.apache.solr.common.SolrException",
> "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"ADDREPLICA failed to create replica",
> "trace":"org.apache.solr.common.SolrException: ADDREPLICA failed to create replica
> at org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)
> at org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:280)
> at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:252)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
> at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:820)
> at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:786)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:546)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:423)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
> at org.eclipse.jetty.server.
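For reference, the same restore can also be driven through SolrJ, which attaches basic-auth credentials per request. This is a minimal sketch only; the host, credentials, and paths are placeholders mirroring the report, and SSL trust-store setup (the report uses curl -k) is omitted:

{code:java}
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class RestoreWithBasicAuth {
  public static void main(String[] args) throws Exception {
    // Placeholder base URL; configure SSL trust separately if the node uses HTTPS.
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("https://x.x.x.x:8080/solr").build()) {
      // Restore the backup named "test" into the collection "test".
      CollectionAdminRequest.Restore restore =
          CollectionAdminRequest.restoreCollection("test", "test")
              .setLocation("/solrdatabkup");
      // Attach basic-auth credentials to this request (placeholders).
      restore.setBasicAuthCredentials("xxx", "xxx");
      System.out.println(restore.process(client));
    }
  }
}
{code}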
[jira] [Created] (LUCENE-9409) TestAllFilesDetectTruncation failures
Adrien Grand created LUCENE-9409:
Summary: TestAllFilesDetectTruncation failures
Key: LUCENE-9409
URL: https://issues.apache.org/jira/browse/LUCENE-9409
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand

The Elastic CI found a seed that reproducibly fails TestAllFilesDetectTruncation. https://elasticsearch-ci.elastic.co/job/apache+lucene-solr+nightly+branch_8x/85/console

This is a consequence of LUCENE-9396: we now check for truncation after creating slices, so in some cases you would get an IndexOutOfBoundsException rather than CorruptIndexException/EOFException if out-of-bounds slices get created.
[GitHub] [lucene-solr] jpountz opened a new pull request #1593: LUCENE-9409: Check file lengths before creating slices.
jpountz opened a new pull request #1593: URL: https://github.com/apache/lucene-solr/pull/1593

This changes terms and points to check the length of the index/data files before creating slices in these files. A side effect of this is that we can no longer verify checksums of the meta file before checking the length of other files, but this shouldn't be a problem. On the other hand, it helps ensure that we return a clear exception in case of truncation, instead of a confusing OutOfBoundsException that doesn't make clear whether the cause is index corruption or a bug in Lucene.
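The idea, roughly: validate recorded offsets/lengths against the file's actual length up front, and raise a corruption error instead of letting an out-of-bounds slice surface later as an IndexOutOfBoundsException. A hypothetical sketch of such a check — the method and message are illustrative, not the actual patch:

```java
import java.io.IOException;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.store.IndexInput;

final class SliceBounds {
  // Fail fast with CorruptIndexException if a slice would fall outside the file,
  // which typically indicates a truncated file.
  static void checkSlice(IndexInput file, long offset, long length) throws IOException {
    if (offset < 0 || length < 0 || offset + length > file.length()) {
      throw new CorruptIndexException(
          "slice [" + offset + ", " + (offset + length) + ") exceeds file length "
              + file.length() + ", file is likely truncated", file);
    }
  }
}
```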
[GitHub] [lucene-solr] s1monw commented on a change in pull request #1552: LUCENE-8962: merge small segments on commit
s1monw commented on a change in pull request #1552: URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r442023300

## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java

@@ -3255,7 +3302,16 @@ private long prepareCommitInternal() throws IOException {
      } finally {
        maybeCloseOnTragicEvent();
      }
-
+
+      if (onCommitMerges != null) {
+        mergeScheduler.merge(mergeSource, MergeTrigger.COMMIT);

Review comment: The last thing that I am afraid about is: what if we have a MergeScheduler configured that blocks on this call, like SerialMergeScheduler? I think there are multiple options, like documentation, skipping `COMMIT` merge triggers in SMS, or adding a mergeAsync method to MS that has no impl in SMS... I think we should make sure that this is not trappy.
[jira] [Commented] (SOLR-14210) Add replica state option for HealthCheckHandler
[ https://issues.apache.org/jira/browse/SOLR-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139190#comment-17139190 ] Nazerke Seidan commented on SOLR-14210: --- Wondering if ZK is down/unavailable, can we still get the status of the cores?

> Add replica state option for HealthCheckHandler
> ---
>
> Key: SOLR-14210
> URL: https://issues.apache.org/jira/browse/SOLR-14210
> Project: Solr
> Issue Type: Improvement
> Affects Versions: 8.5
> Reporter: Houston Putman
> Assignee: Jan Høydahl
> Priority: Major
> Fix For: 8.6
>
> Attachments: docs.patch
>
> Time Spent: 4.5h
> Remaining Estimate: 0h
>
> h2. Background
> As was brought up in SOLR-13055, in order to run Solr in a more cloud-native
> way, we need some additional features around node-level healthchecks.
> {quote}Like in Kubernetes we need 'liveness' and 'readiness' probes, explained in
> [https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/],
> to determine if a node is live and ready to serve live traffic.
> {quote}
> However, there are issues around Kubernetes managing its own rolling
> restarts. With the current healthcheck setup, it's easy to envision a
> scenario in which Solr reports itself as "healthy" when all of its replicas
> are actually recovering. Therefore Kubernetes, seeing a healthy pod, would
> then go and restart the next Solr node. This can happen until all replicas
> are "recovering" and none are healthy. (Maybe the last one restarted will be
> "down", but still there are no "active" replicas.)
> h2. Proposal
> I propose we make an additional healthcheck handler that returns whether all
> replicas hosted by that Solr node are healthy and "active". That way we will
> be able to use the [default kubernetes rolling restart
> logic|https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#update-strategies]
> with Solr.
> To add on to [Jan's point
> here|https://issues.apache.org/jira/browse/SOLR-13055?focusedCommentId=16716559&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16716559],
> this handler should be more friendly for other Content-Types and should use
> better HTTP response statuses.
[jira] [Comment Edited] (SOLR-14210) Add replica state option for HealthCheckHandler
[ https://issues.apache.org/jira/browse/SOLR-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139190#comment-17139190 ] Nazerke Seidan edited comment on SOLR-14210 at 6/18/20, 8:03 AM: - If ZK is down/unavailable, we can't get the status of the cores. I think this should be configurable; without ZK we should still be able to ping the Solr cores.

was (Author: seidan): Wondering if ZK is down/unavailable, can we still get the status of the cores?
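The proposal quoted earlier boils down to checking that every replica hosted on the node reports ACTIVE. A hedged sketch of that check using SolrCloud's public cluster-state types — illustrative only, not the handler that was actually committed for this issue, and the wiring into a HealthCheckHandler (including behavior when ZK is unreachable) is omitted:

{code:java}
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Replica;

final class ReplicaStateCheck {
  // Returns false if any replica hosted on this node is not ACTIVE.
  static boolean allLocalReplicasActive(ClusterState clusterState, String nodeName) {
    for (DocCollection collection : clusterState.getCollectionsMap().values()) {
      for (Replica replica : collection.getReplicas()) {
        if (nodeName.equals(replica.getNodeName())
            && replica.getState() != Replica.State.ACTIVE) {
          return false; // a local replica is still recovering or down
        }
      }
    }
    return true;
  }
}
{code}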
[GitHub] [lucene-solr] msfroh commented on a change in pull request #1552: LUCENE-8962: merge small segments on commit
msfroh commented on a change in pull request #1552: URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r442043040

## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java

@@ -3255,7 +3302,16 @@ private long prepareCommitInternal() throws IOException {
      } finally {
        maybeCloseOnTragicEvent();
      }
-
+
+      if (onCommitMerges != null) {
+        mergeScheduler.merge(mergeSource, MergeTrigger.COMMIT);

Review comment: Would it be sufficient to document the behavior in the Javadoc for `findFullFlushMerges`? I was assuming that any implementation of `findFullFlushMerges` would try to return merges that are very likely to complete within whatever timeout someone would reasonably set (e.g. a few seconds). The timeout was intended just as an extra safeguard in case a merge takes longer. Given that lots of IndexWriter operations can have pauses with `SerialMergeScheduler` (judging by the number of calls to `maybeMerge`, especially the one from `processEvents`, in IndexWriter), blocking on this particular `merge` call doesn't feel like it introduces more risk (especially since it needs to be used in conjunction with a `MergePolicy` that implements `findFullFlushMerges`).
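For context, a merge policy opts into merge-on-commit by overriding `findFullFlushMerges`. The following is a hedged sketch of what such a policy might look like, assuming the method signature proposed in this PR; the size threshold and selection logic are illustrative only, not the PR's actual policy:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.FilterMergePolicy;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;

// Sketch: on full flush (e.g. commit), propose one merge over all "small" segments.
public class SmallSegmentCommitMergePolicy extends FilterMergePolicy {
  private final long maxSegmentBytes;

  public SmallSegmentCommitMergePolicy(MergePolicy in, long maxSegmentBytes) {
    super(in);
    this.maxSegmentBytes = maxSegmentBytes;
  }

  @Override
  public MergeSpecification findFullFlushMerges(MergeTrigger mergeTrigger,
                                                SegmentInfos segmentInfos,
                                                MergeContext mergeContext) throws IOException {
    List<SegmentCommitInfo> small = new ArrayList<>();
    for (SegmentCommitInfo info : segmentInfos) {
      // Skip segments that are already merging, and anything above the threshold.
      if (info.sizeInBytes() <= maxSegmentBytes
          && mergeContext.getMergingSegments().contains(info) == false) {
        small.add(info);
      }
    }
    if (small.size() < 2) {
      return null; // nothing worth merging before the commit
    }
    MergeSpecification spec = new MergeSpecification();
    spec.add(new OneMerge(small));
    return spec;
  }
}
```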
[jira] [Created] (SOLR-14581) Document the way auto commits work in SolrCloud
Bram Van Dam created SOLR-14581:
---
Summary: Document the way auto commits work in SolrCloud
Key: SOLR-14581
URL: https://issues.apache.org/jira/browse/SOLR-14581
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: documentation, SolrCloud
Reporter: Bram Van Dam

The documentation is unclear about how auto commits actually work in SolrCloud. A mailing list reply by Erick Erickson proved to be enlightening. Erick's reply verbatim:

{quote}Each node has its own timer that starts when it receives an update. So in your situation, 60 seconds after any given replica gets its first update, all documents that have been received in the interval will be committed.

But note several things:

1> commits will tend to cluster for a given shard. By that I mean they’ll tend to happen within a few milliseconds of each other ‘cause it doesn’t take that long for an update to get from the leader to all the followers.

2> this is per replica. So if you host replicas from multiple collections on some node, their commits have no relation to each other. And say for some reason you transmit exactly one document that lands on shard1. Further, say nodeA contains replicas for shard1 and shard2. Only the replica for shard1 would commit.

3> Solr promises eventual consistency. In this case, due to all the timing variables it is not guaranteed that every replica of a single shard has the same document available for search at any given time. Say doc1 hits the leader at time T and a follower at time T+10ms. Say doc2 hits the leader and gets indexed 5ms before the commit is triggered, but for some reason it takes 15ms for it to get to the follower. The leader will be able to search doc2, but the follower won’t until 60 seconds later.{quote}

Perhaps the subject deserves a section of its own, but I'll attach a patch which includes the gist of Erick's reply as a Tip in the "indexing in SolrCloud" section.
[jira] [Updated] (SOLR-14581) Document the way auto commits work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bram Van Dam updated SOLR-14581: Affects Version/s: master (9.0)
[jira] [Updated] (SOLR-14581) Document the way auto commits work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bram Van Dam updated SOLR-14581: Attachment: SOLR-14581.patch Status: Open (was: Open)
[jira] [Updated] (SOLR-14581) Document the way auto commits work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bram Van Dam updated SOLR-14581: Status: Patch Available (was: Open)
[jira] [Commented] (SOLR-9060) Spellcheck sort by frequency in solrcloud
[ https://issues.apache.org/jira/browse/SOLR-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139269#comment-17139269 ] Thomas Corthals commented on SOLR-9060: --- I ran into this issue with Solr 7 and 8. The behaviour has changed a little since this was first reported. The sorting seems to be correct now with {{spellcheck.extendedResults=true}}, but not with {{spellcheck.extendedResults=false}}. Tested on Solr 7.7.3 and Solr 8.5.2 in cloud mode with 2 nodes, 1 collection with numShards = 2 & replicationFactor = 1, techproducts configset and example data:

{code:java}
$ curl 'http://localhost:8983/solr/techproducts/spell?q=power%20cort&spellcheck.extendedResults=false'
"suggestion":["cord",
"corp",
"card"]}],

$ curl 'http://localhost:8983/solr/techproducts/spell?q=power%20cort&spellcheck.extendedResults=true'
"suggestion":[{
"word":"corp",
"freq":2},
{
"word":"cord",
"freq":1},
{
"word":"card",
"freq":4}]}],
{code}

I made a full comparison for standalone and cloud, on both the {{/spell}} and {{/browse}} request handlers in techproducts (out-of-order suggestions marked):

|| ||Standalone /spell||Standalone /browse||Cloud /spell||Cloud /browse||
|spellcheck.extendedResults=false|"corp", "cord", "card"|"corp", "cord", "card"|"cord", "corp", "card" (out of order)|"cord", "corp", "card" (out of order)|
|spellcheck.extendedResults=true|"corp" (freq 2), "cord" (freq 1), "card" (freq 4)|"corp" (freq 2), "cord" (freq 1), "card" (freq 4)|"corp" (freq 2), "cord" (freq 1), "card" (freq 4)|"corp" (freq 2), "cord" (freq 1), "card" (freq 4)|

> Spellcheck sort by frequency in solrcloud
> -
>
> Key: SOLR-9060
> URL: https://issues.apache.org/jira/browse/SOLR-9060
> Project: Solr
> Issue Type: Bug
> Affects Versions: 5.3
> Reporter: Gitanjali Palwe
> Priority: Major
> Attachments: spellcheck-sort-frequency.png
>
> The sorting by frequency for spellchecker doesn't work in solrcloud mode.
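The comparison above can also be scripted with SolrJ instead of curl. A minimal sketch under the same assumptions (local techproducts collection, /spell handler as configured in the example configset); output values are whatever your Solr version returns, not verified here:

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;

public class SpellcheckOrderCheck {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/techproducts").build()) {
      for (boolean extended : new boolean[] {false, true}) {
        SolrQuery q = new SolrQuery("power cort");
        q.setRequestHandler("/spell");
        q.set("spellcheck", true);
        q.set("spellcheck.extendedResults", extended);
        QueryResponse rsp = client.query(q);
        SpellCheckResponse spell = rsp.getSpellCheckResponse();
        System.out.println("extendedResults=" + extended);
        // Print suggestions in the order Solr returned them, to spot reordering.
        spell.getSuggestions().forEach(
            s -> System.out.println("  " + s.getToken() + " -> " + s.getAlternatives()));
      }
    }
  }
}
{code}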
[jira] [Commented] (SOLR-14210) Add replica state option for HealthCheckHandler
[ https://issues.apache.org/jira/browse/SOLR-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139290#comment-17139290 ] Jan Høydahl commented on SOLR-14210: Please ask such questions on the solr-user mailing list.
[GitHub] [lucene-solr] s1monw commented on a change in pull request #1552: LUCENE-8962: merge small segments on commit
s1monw commented on a change in pull request #1552: URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r442167608

## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java

@@ -3255,7 +3302,16 @@ private long prepareCommitInternal() throws IOException {
      } finally {
        maybeCloseOnTragicEvent();
      }
-
+
+      if (onCommitMerges != null) {
+        mergeScheduler.merge(mergeSource, MergeTrigger.COMMIT);

Review comment: Yeah, I mean we don't have to do that, and I think it's rather a rare combination. My problem is that this entire max-wait-time configuration is meaningless if SerialMS is used, since we block until it has merged them all (and potentially a bunch of other merges), so a commit/refresh could take quite a long time. On the other hand, as you stated, we will call maybeMerge anyway in the commit, so it's not really making any difference, and the same is true for getReader. So I think we are fine as it is.
[jira] [Commented] (SOLR-14575) Solr restore is failing when basic authentication is enabled
[ https://issues.apache.org/jira/browse/SOLR-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139384#comment-17139384 ] Yaswanth commented on SOLR-14575: -

Collection: test operation: restore failed:org.apache.solr.common.SolrException: ADDREPLICA failed to create replica
at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler$ShardRequestTracker.processResponses(OverseerCollectionMessageHandler.java:1030)
at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler$ShardRequestTracker.processResponses(OverseerCollectionMessageHandler.java:1013)
at org.apache.solr.cloud.api.collections.AddReplicaCmd.lambda$addReplica$1(AddReplicaCmd.java:177)
at org.apache.solr.cloud.api.collections.AddReplicaCmd$$Lambda$798/.run(Unknown Source)
at org.apache.solr.cloud.api.collections.AddReplicaCmd.addReplica(AddReplicaCmd.java:199)
at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.addReplica(OverseerCollectionMessageHandler.java:708)
at org.apache.solr.cloud.api.collections.RestoreCmd.call(RestoreCmd.java:286)
at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264)
at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$142/.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.solr.common.SolrException: javax.crypto.BadPaddingException: RSA private key operation failed
at org.apache.solr.util.CryptoKeys$RSAKeyPair.encrypt(CryptoKeys.java:325)
at org.apache.solr.security.PKIAuthenticationPlugin.generateToken(PKIAuthenticationPlugin.java:305)
at org.apache.solr.security.PKIAuthenticationPlugin.access$200(PKIAuthenticationPlugin.java:61)
at org.apache.solr.security.PKIAuthenticationPlugin$2.onQueued(PKIAuthenticationPlugin.java:239)
at org.apache.solr.client.solrj.impl.Http2SolrClient.decorateRequest(Http2SolrClient.java:468)
at org.apache.solr.client.solrj.impl.Http2SolrClient.makeRequest(Http2SolrClient.java:455)
at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:364)
at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:746)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1274)
at org.apache.solr.handler.component.HttpShardHandler.request(HttpShardHandler.java:238)
at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:199)
at org.apache.solr.handler.component.HttpShardHandler$$Lambda$512/.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
... 5 more
Caused by: javax.crypto.BadPaddingException: RSA private key operation failed
at java.base/sun.security.rsa.NativeRSACore.crtCrypt_Native(NativeRSACore.java:149)
at java.base/sun.security.rsa.NativeRSACore.rsa(NativeRSACore.java:91)
at java.base/sun.security.rsa.RSACore.rsa(RSACore.java:149)
at java.base/com.sun.crypto.provider.RSACipher.doFinal(RSACipher.java:355)
at java.base/com.sun.crypto.provider.RSACipher.engineDoFinal(RSACipher.java:392)
at java.base/javax.crypto.Cipher.doFinal(Cipher.java:2260)
at org.apache.solr.util.CryptoKeys$RSAKeyPair.encrypt(CryptoKeys.java:323)
... 20 more

That's the error stack trace I am seeing. As soon as I call the restore API, I see the collection "test" with a single core on the cloud, but it's in the down state.

Number of nodes configured with SolrCloud: 2. Testing on a single collection with 2 replicas.

Here is what my security.json looks like:

{
"authentication":{
"class":"solr.BasicAuthPlugin",
"credentials":{
"admin":"",
"dev":""},
"":{"v":11},
"blockUnknown":true,
"forwardCredentials":true},
"authorization":{
"class":"solr.RuleBasedAuthorizationPlugin",
"user-role":{
"solradmin":["admin","dev"],
"dev":["read"]},
"":{"v":9},
"permissions":[
{"name":"read","role":"*","index":1},
{"name":"security-read","role":"admin","index":2},
{"name":"security-edit","role":"admin","index":3
[jira] [Commented] (SOLR-14581) Document the way auto commits work in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-14581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139416#comment-17139416 ] Lucene/Solr QA commented on SOLR-14581: ---

| (/) *{color:green}+1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
|| || {color:brown} Prechecks {color} || || ||
|| || {color:brown} master Compile Tests {color} || || ||
|| || {color:brown} Patch Compile Tests {color} || || ||
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 0m 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 0m 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate ref guide {color} | {color:green} 0m 4s{color} | {color:green} the patch passed {color} |
|| || {color:brown} Other Tests {color} || || ||
| {color:black}{color} | {color:black} {color} | {color:black} 2m 21s{color} | {color:black} {color} |

|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-14581 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13005940/SOLR-14581.patch |
| Optional Tests | ratsources validatesourcepatterns validaterefguide |
| uname | Linux lucene2-us-west.apache.org 4.4.0-170-generic #199-Ubuntu SMP Thu Nov 14 01:45:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 0ea0358 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| modules | C: solr/solr-ref-guide U: solr/solr-ref-guide |
| Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/766/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically generated.
[GitHub] [lucene-solr] s1monw opened a new pull request #1594: Replace DWPT.DocState with simple method parameters
s1monw opened a new pull request #1594: URL: https://github.com/apache/lucene-solr/pull/1594

DWPT.DocState had some historical value, but in today's somewhat cleaned-up DWPT and IndexingChain there is little to no value in having this class. It also requires explicit cleanup, which is not necessary anymore.
[GitHub] [lucene-solr] s1monw commented on pull request #1552: LUCENE-8962: merge small segments on commit
s1monw commented on pull request #1552: URL: https://github.com/apache/lucene-solr/pull/1552#issuecomment-646029945

@msokolov you are more than welcome. I think it's a great example of how OSS works, or should work. Thanks for being so patient with me :)
[jira] [Commented] (LUCENE-8574) ExpressionFunctionValues should cache per-hit value
[ https://issues.apache.org/jira/browse/LUCENE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139437#comment-17139437 ] Michael McCandless commented on LUCENE-8574:

{quote}please, lets keep the boolean and not bring NaN into this.
{quote}
+1

{quote}And it seems the patch attached to this issue could not handle it as well (since DoubleValues generated for the same LeafReaderContext is not the same, we still get tons of DoubleValues created).
{quote}
Hmm, good catch! So we somehow need to ensure that we use the same {{DoubleValues}} instance per-segment per-binding? But how can we safely do that, i.e. we can't know that this current caller will consume the same {{DoubleValues}} in the same {{docid}} progression?

> ExpressionFunctionValues should cache per-hit value
> ---
>
> Key: LUCENE-8574
> URL: https://issues.apache.org/jira/browse/LUCENE-8574
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: 7.5, 8.0
> Reporter: Michael McCandless
> Assignee: Robert Muir
> Priority: Major
> Attachments: LUCENE-8574.patch, unit_test.patch
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The original version of {{ExpressionFunctionValues}} had a simple per-hit
> cache, so that nested expressions that reference the same common variable
> would compute the value for that variable the first time it was referenced
> and then use that cached value for all subsequent invocations, within one
> hit. I think it was accidentally removed in LUCENE-7609?
> This is quite important if you have non-trivial expressions that reference
> the same variable multiple times.
> E.g. if I have these expressions:
> {noformat}
> x = c + d
> c = b + 2
> d = b * 2{noformat}
> Then evaluating x should only cause b's value to be computed once (for a
> given hit), but today it's computed twice. The problem is combinatoric if b
> then references another variable multiple times, etc.
> I think to fix this we just need to restore the per-hit cache?
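The per-hit cache under discussion can be pictured as a wrapper that computes the underlying value at most once per document. A hedged sketch — illustrative, not the actual Lucene implementation, and it deliberately sidesteps the shared-instance-per-binding question raised above:

{code:java}
import java.io.IOException;
import org.apache.lucene.search.DoubleValues;

final class CachingDoubleValues extends DoubleValues {
  private final DoubleValues in;
  private boolean computed;
  private double value;

  CachingDoubleValues(DoubleValues in) {
    this.in = in;
  }

  @Override
  public boolean advanceExact(int doc) throws IOException {
    computed = false; // invalidate the cached value for the new hit
    return in.advanceExact(doc);
  }

  @Override
  public double doubleValue() throws IOException {
    if (computed == false) {
      value = in.doubleValue(); // compute once per hit
      computed = true;
    }
    return value;
  }
}
{code}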
[GitHub] [lucene-solr] s1monw commented on pull request #1594: Replace DWPT.DocState with simple method parameters
s1monw commented on pull request #1594: URL: https://github.com/apache/lucene-solr/pull/1594#issuecomment-646030388

@dweiss maybe you have a moment to look at this
[GitHub] [lucene-solr] janhoy merged pull request #1572: SOLR-14561 CoreAdminAPI's parameters instanceDir and dataDir are now validated
janhoy merged pull request #1572: URL: https://github.com/apache/lucene-solr/pull/1572
[jira] [Updated] (SOLR-14561) Validate parameters to CoreAdminAPI
[ https://issues.apache.org/jira/browse/SOLR-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-14561: --- Fix Version/s: 8.6

> Validate parameters to CoreAdminAPI
> ---
>
> Key: SOLR-14561
> URL: https://issues.apache.org/jira/browse/SOLR-14561
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Jan Høydahl
> Assignee: Jan Høydahl
> Priority: Major
> Fix For: 8.6
>
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> CoreAdminAPI does not validate parameter input. We should limit what users
> can specify for at least the {{instanceDir}} and {{dataDir}} params, perhaps restrict
> them to be relative to SOLR_HOME or SOLR_DATA_HOME.
[jira] [Commented] (SOLR-14561) Validate parameters to CoreAdminAPI
[ https://issues.apache.org/jira/browse/SOLR-14561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139453#comment-17139453 ] Jan Høydahl commented on SOLR-14561: Committed to master. Will let Jenkins work on it and then backport. > Validate parameters to CoreAdminAPI > --- > > Key: SOLR-14561 > URL: https://issues.apache.org/jira/browse/SOLR-14561 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Fix For: 8.6 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > CoreAdminAPI does not validate parameter input. We should limit what users > can specify for at least {{instanceDir and dataDir}} params, perhaps restrict > them to be relative to SOLR_HOME or SOLR_DATA_HOME. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
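The restriction described here boils down to a path containment check; a minimal illustration (not the committed code; PathValidation is a made-up name):

{code:java}
import java.nio.file.Path;

public final class PathValidation {
  // Reject instanceDir/dataDir values that resolve outside the allowed root,
  // e.g. SOLR_HOME or SOLR_DATA_HOME.
  public static void assertContained(Path allowedRoot, String userSuppliedDir) {
    Path root = allowedRoot.toAbsolutePath().normalize();
    Path resolved = root.resolve(userSuppliedDir).normalize();
    if (resolved.startsWith(root) == false) {
      throw new IllegalArgumentException(
          "Path " + userSuppliedDir + " must resolve under " + root);
    }
  }
}
{code}

Normalizing before the startsWith check is what defeats ../ traversal in the user-supplied value.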
[jira] [Created] (LUCENE-9410) German/French stemmers fail for common forms maux, gegrüßt, grüßend, schlummert
Ben Kazez created LUCENE-9410: - Summary: German/French stemmers fail for common forms maux, gegrüßt, grüßend, schlummert Key: LUCENE-9410 URL: https://issues.apache.org/jira/browse/LUCENE-9410 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 8.5 Environment: Elasticsearch 7.7.1 running on cloud.elastic.co Reporter: Ben Kazez I'm using Lucene via Elasticsearch 7.7.1 and have run into an issue where German and French stemming (either via the Snowball analyzer, or the "light" or "heavy" stemming analyzers) fails to identify some common forms: - French: - "maux" should match "mal" ("maux" is plural of "mal") but instead "maux" is unchanged - German: - "schlummert" should match "schlummern" (infinitive) but instead is unchanged - "grüßend" should match "grüßen" (infinitive) but instead yields "grussend" - "gegrüßt" should match "grüßen" (infinitive) but instead yields "gegrusst" The folks from Elasticsearch said I should file a bug with Lucene: https://discuss.elastic.co/t/better-french-and-german-stemming/236283 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9410) German/French stemmers fail for common forms maux, gegrüßt, grüßend, schlummert
[ https://issues.apache.org/jira/browse/LUCENE-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Kazez updated LUCENE-9410: -- Description: I'm using Lucene via Elasticsearch 7.7.1. German and French stemmers (either via the Snowball analyzer, or the "light" or "heavy" stemming analyzers) are failing to understand some common forms: - French: - "maux" should match "mal" ("maux" is plural of "mal") but instead "maux" is unchanged - German: - "schlummert" should match "schlummern" (infinitive) but instead is unchanged - "grüßend" should match "grüßen" (infinitive) but instead yields "grussend" - "gegrüßt" should match "grüßen" (infinitive) but instead yields "gegrusst" The Elasticsearch folks [said|https://discuss.elastic.co/t/better-french-and-german-stemming/236283] I should file a bug with Lucene. was: I'm using Lucene via Elasticsearch 7.7.1 and have run into an issue where German and French stemming (either via the Snowball analyzer, or the "light" or "heavy" stemming analyzers) fails to identify some common forms: - French: - "maux" should match "mal" ("maux" is plural of "mal") but instead "maux" is unchanged - German: - "schlummert" should match "schlummern" (infinitive) but instead is unchanged - "grüßend" should match "grüßen" (infinitive) but instead yields "grussend" - "gegrüßt" should match "grüßen" (infinitive) but instead yields "gegrusst" The folks from Elasticsearch said I should file a bug with Lucene: https://discuss.elastic.co/t/better-french-and-german-stemming/236283 > German/French stemmers fail for common forms maux, gegrüßt, grüßend, > schlummert > --- > > Key: LUCENE-9410 > URL: https://issues.apache.org/jira/browse/LUCENE-9410 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.5 > Environment: Elasticsearch 7.7.1 running on cloud.elastic.co >Reporter: Ben Kazez >Priority: Major > Labels: french, german, stemmer, stemming > > I'm using Lucene via Elasticsearch 7.7.1. German and French stemmers (either > via the Snowball analyzer, or the "light" or "heavy" stemming analyzers) are > failing to understand some common forms: > - French: > - "maux" should match "mal" ("maux" is plural of "mal") but instead "maux" > is unchanged > - German: > - "schlummert" should match "schlummern" (infinitive) but instead is > unchanged > - "grüßend" should match "grüßen" (infinitive) but instead yields "grussend" > - "gegrüßt" should match "grüßen" (infinitive) but instead yields > "gegrusst" > The Elasticsearch folks > [said|https://discuss.elastic.co/t/better-french-and-german-stemming/236283] > I should file a bug with Lucene. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9410) German/French stemmers fail for common forms maux, gegrüßt, grüßend, schlummert
[ https://issues.apache.org/jira/browse/LUCENE-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Kazez updated LUCENE-9410: -- Description: I'm using Lucene via Elasticsearch 7.7.1. German and French stemmers (either via the Snowball analyzer, or the "light" or "heavy" stemming analyzers) are failing to understand some common forms: French: - "maux" (plural) should match "mal" (singular) but instead "maux" is unchanged German: - "schlummert" should match "schlummern" (infinitive) but instead is unchanged - "grüßend" should match "grüßen" (infinitive) but instead yields "grussend" - "gegrüßt" should match "grüßen" (infinitive) but instead yields "gegrusst" The Elasticsearch folks [said|https://discuss.elastic.co/t/better-french-and-german-stemming/236283] I should file a bug with Lucene. was: I'm using Lucene via Elasticsearch 7.7.1. German and French stemmers (either via the Snowball analyzer, or the "light" or "heavy" stemming analyzers) are failing to understand some common forms: - French: - "maux" should match "mal" ("maux" is plural of "mal") but instead "maux" is unchanged - German: - "schlummert" should match "schlummern" (infinitive) but instead is unchanged - "grüßend" should match "grüßen" (infinitive) but instead yields "grussend" - "gegrüßt" should match "grüßen" (infinitive) but instead yields "gegrusst" The Elasticsearch folks [said|https://discuss.elastic.co/t/better-french-and-german-stemming/236283] I should file a bug with Lucene. > German/French stemmers fail for common forms maux, gegrüßt, grüßend, > schlummert > --- > > Key: LUCENE-9410 > URL: https://issues.apache.org/jira/browse/LUCENE-9410 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.5 > Environment: Elasticsearch 7.7.1 running on cloud.elastic.co >Reporter: Ben Kazez >Priority: Major > Labels: french, german, stemmer, stemming > > I'm using Lucene via Elasticsearch 7.7.1. German and French stemmers (either > via the Snowball analyzer, or the "light" or "heavy" stemming analyzers) are > failing to understand some common forms: > French: > - "maux" (plural) should match "mal" (singular) but instead "maux" is > unchanged > German: > - "schlummert" should match "schlummern" (infinitive) but instead is > unchanged > - "grüßend" should match "grüßen" (infinitive) but instead yields "grussend" > - "gegrüßt" should match "grüßen" (infinitive) but instead yields > "gegrusst" > The Elasticsearch folks > [said|https://discuss.elastic.co/t/better-french-and-german-stemming/236283] > I should file a bug with Lucene. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
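The report is easy to reproduce against Lucene's analysis classes directly, without Elasticsearch (a small illustrative harness; "German2" names the Snowball German stemmer, and StemCheck is a made-up class):

{code:java}
import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class StemCheck {
  public static void main(String[] args) throws IOException {
    // Inflected forms and the infinitives they should reduce toward:
    String[] words = {"gegrüßt", "grüßend", "grüßen", "schlummert", "schlummern"};
    for (String word : words) {
      Tokenizer tokenizer = new StandardTokenizer();
      tokenizer.setReader(new StringReader(word));
      try (TokenStream ts = new SnowballFilter(tokenizer, "German2")) {
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
          System.out.println(word + " -> " + term);
        }
        ts.end();
      }
    }
  }
}
{code}

If the stems printed for "gegrüßt" and "grüßend" differ from the stem of "grüßen", the forms cannot match at query time, which is the mismatch reported here.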
[jira] [Resolved] (SOLR-14574) Fix or suppress warnings in solr/core/src/test
[ https://issues.apache.org/jira/browse/SOLR-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-14574. --- Fix Version/s: 8.6 Resolution: Fixed Hmmm, not sure where the commit message went for part 2, maybe it'll just take a while. Here are the shas anyway 936b9d770e7..84729edbba0 master -> master 2113597970b..9ed037074c1 branch_8x -> branch_8x > Fix or suppress warnings in solr/core/src/test > -- > > Key: SOLR-14574 > URL: https://issues.apache.org/jira/browse/SOLR-14574 > Project: Solr > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Fix For: 8.6 > > > Just when I thought I was done I ran testClasses > I'm going to do this a little differently. Rather than do a directory at a > time, I'll just fix a bunch, push, fix a bunch more, push all on this Jira > until I'm done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] msfroh commented on a change in pull request #1552: LUCENE-8962: merge small segments on commit
msfroh commented on a change in pull request #1552: URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r442326229 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -3226,15 +3235,53 @@ private long prepareCommitInternal() throws IOException { // sneak into the commit point: toCommit = segmentInfos.clone(); + if (anyChanges && maxCommitMergeWaitSeconds > 0) { +SegmentInfos committingSegmentInfos = toCommit; +onCommitMerges = updatePendingMerges(new OneMergeWrappingMergePolicy(config.getMergePolicy(), toWrap -> +new MergePolicy.OneMerge(toWrap.segments) { + @Override + public void mergeFinished(boolean committed) throws IOException { Review comment: Oh -- I guess one minor complaint about moving this into `prepareCommitInternal` is that we won't be able to reuse it (without moving it) if we decide to apply the same logic to `IndexWriter.getReader()`. That said, moving it if/when someone gets around to applying the logic there isn't a big deal. (I think the real work there is reconciling logic from StandardDirectoryReader.open() with logic in IndexWriter.prepareCommitInternal(), since the functionality is kind of duplicated.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-10059) In SolrCloud, every fq added via <appends> is computed twice.
[ https://issues.apache.org/jira/browse/SOLR-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Torsten Bøgh Köster updated SOLR-10059: --- Attachment: SOLR-10059_7x.patch > In SolrCloud, every fq added via <appends> is computed twice. > > > Key: SOLR-10059 > URL: https://issues.apache.org/jira/browse/SOLR-10059 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 6.4 >Reporter: Marc Morissette >Priority: Major > Labels: performance > Attachments: SOLR-10059_7x.patch > > > While researching another issue, I noticed that parameters appended to a > query via SearchHandler's <appends> are added to the query twice > in SolrCloud: once on the aggregator and again on the shard. > The FacetComponent corrects this automatically by removing duplicates. Field > queries added in this fashion are however computed twice and that hinders > performance on filter queries that aren't simple bitsets such as those > produced by the CollapsingQueryParser. > To reproduce the issue, simply test this handler on a large enough > collection, then replace "appends" with "defaults". You'll notice significant > performance improvements. > {code} > <requestHandler ...> > <lst name="appends"> > <str name="fq">{!collapse field=routingKey hint=top_fc}</str> > </lst> > </requestHandler> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14566) Record "NOW" on "coordinator" log messages
[ https://issues.apache.org/jira/browse/SOLR-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139557#comment-17139557 ] Jason Gerlowski commented on SOLR-14566: I ended up using the DebugComponent as Tomas suggested. It uses a format similar to what Robert pointed to as well. So 2 birds with one stone. In terms of implementation I waffled a bit between putting the logic in its own SearchComponent impl that fires all the time, and just bundling it in to SearchHandler. The former is simpler and easier to test, but I'm not sure that such a trivial Component impl really fits what that abstraction is intended for. I implemented both methods since they were both small changes. The Component-based approach is on a branch in my personal fork here: https://github.com/gerlowskija/lucene-solr-1/tree/SOLR_14566_move_rid_into_separate_component. I've updated the existing Github PR to use the SearchHandler impl, since I was leaning slightly in that direction: https://github.com/apache/lucene-solr/pull/1574 Once I choose an approach I still plan on adding a feature flag to disable it, and some tests (easier said than done for SearchHandler, but maybe I just need to sleep on it.) Again, appreciate any feedback on the approach if people prefer one over the other. A part of me still likes the simplicity of the {{NOW}} based impl, but oh well. > Record "NOW" on "coordinator" log messages > -- > > Key: SOLR-14566 > URL: https://issues.apache.org/jira/browse/SOLR-14566 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > Currently, in SolrCore.java we log each search request that comes through > each core as it is finishing. This includes the path, query-params, QTime, > and status. In the case of a distributed search both the "coordinator" node > and each of the per-shard requests produce a log message. > When Solr is fielding many identical queries, such as those created by a > healthcheck or dashboard, it can be hard when examining logs to link the > per-shard requests with the "cooordinator" request that came in upstream. > One thing that would make this easier is if the {{NOW}} param added to > per-shard requests is also included in the log message from the > "coordinator". While {{NOW}} isn't unique strictly speaking, it often is in > practice, and along with the query-params would allow debuggers to associate > shard requests with coordinator requests a large majority of the time. > An alternative approach would be to create a {{qid}} or {{query-uuid}} when > the coordinator starts its work that can be logged everywhere. This provides > a stronger expectation around uniqueness, but would require UUID generation > on the coordinator, which may be non-negligible work at high QPS (maybe? I > have no idea). It also loses the neatness of reusing data already present on > the shard requests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] tflobbe merged pull request #1567: LUCENE-9402: Let MultiCollector handle minCompetitiveScore
tflobbe merged pull request #1567: URL: https://github.com/apache/lucene-solr/pull/1567 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-9402) Let MultiCollector Scorer handle minCompetitiveScore calls
[ https://issues.apache.org/jira/browse/LUCENE-9402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomas Eduardo Fernandez Lobbe reassigned LUCENE-9402: - Assignee: Tomas Eduardo Fernandez Lobbe > Let MultiCollector Scorer handle minCompetitiveScore calls > -- > > Key: LUCENE-9402 > URL: https://issues.apache.org/jira/browse/LUCENE-9402 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Tomas Eduardo Fernandez Lobbe >Assignee: Tomas Eduardo Fernandez Lobbe >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > See SOLR-14554. MultiCollector creates a scorer that explicitly prevents > setting the {{minCompetitiveScore}}: > {code:java} > @Override > public void setScorer(Scorable scorer) throws IOException { > if (cacheScores) { > scorer = new ScoreCachingWrappingScorer(scorer); > } > scorer = new FilterScorable(scorer) { > @Override > public void setMinCompetitiveScore(float minScore) throws IOException > { > // Ignore calls to setMinCompetitiveScore so that if we wrap two > // collectors and one of them wants to skip low-scoring hits, then > // the other collector still sees all hits. We could try to > reconcile > // min scores and take the maximum min score across collectors, but > // this is very unlikely to be helpful in practice. > } > }; > for (int i = 0; i < numCollectors; ++i) { > final LeafCollector c = collectors[i]; > c.setScorer(scorer); > } > } > {code} > Solr uses MultiCollector when scores are requested (to collect the max > score), which means it wouldn't be able to use WAND algorithm. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
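The direction this issue takes is to replace that no-op with reconciliation: each wrapped collector records the minimum competitive score it asked for, and only the smallest value across all collectors is forwarded to the underlying scorer, so no sibling collector misses hits it still considers competitive. A rough sketch of that idea (illustrative, not the committed change):

{code:java}
import java.io.IOException;
import org.apache.lucene.search.FilterScorable;
import org.apache.lucene.search.Scorable;

final class MinScoreReconcilingScorable extends FilterScorable {
  private final float[] minScores; // one slot per wrapped collector, starts at 0
  private final int idx;           // slot of the collector this view belongs to

  MinScoreReconcilingScorable(Scorable in, float[] minScores, int idx) {
    super(in);
    this.minScores = minScores;
    this.idx = idx;
  }

  @Override
  public void setMinCompetitiveScore(float minScore) throws IOException {
    minScores[idx] = minScore;
    // Forward only the smallest requirement across all collectors; anything
    // larger could skip hits that another collector still wants to see.
    float min = minScores[0];
    for (float s : minScores) {
      min = Math.min(min, s);
    }
    in.setMinCompetitiveScore(min);
  }
}
{code}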
[GitHub] [lucene-solr] s1monw merged pull request #1594: Replace DWPT.DocState with simple method parameters
s1monw merged pull request #1594: URL: https://github.com/apache/lucene-solr/pull/1594 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mikemccand commented on pull request #1594: Replace DWPT.DocState with simple method parameters
mikemccand commented on pull request #1594: URL: https://github.com/apache/lucene-solr/pull/1594#issuecomment-646224325 +1, thanks for cleaning things up @s1monw. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] s1monw commented on a change in pull request #1552: LUCENE-8962: merge small segments on commit
s1monw commented on a change in pull request #1552: URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r442422185 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -3226,15 +3235,53 @@ private long prepareCommitInternal() throws IOException { // sneak into the commit point: toCommit = segmentInfos.clone(); + if (anyChanges && maxCommitMergeWaitSeconds > 0) { +SegmentInfos committingSegmentInfos = toCommit; +onCommitMerges = updatePendingMerges(new OneMergeWrappingMergePolicy(config.getMergePolicy(), toWrap -> +new MergePolicy.OneMerge(toWrap.segments) { + @Override + public void mergeFinished(boolean committed) throws IOException { Review comment: I like to move stuff once it's necessary. I think we need to adjust it there anyway, so we can move it in a follow-up. OK? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
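For readers skimming the diff, the wrapping pattern boils down to something like this (a condensed sketch using the public OneMergeWrappingMergePolicy API; the class name and latch are illustrative, and the PR itself extends the hook to mergeFinished(boolean committed)):

{code:java}
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.OneMergeWrappingMergePolicy;

public final class CommitMergeWrapping {
  // Wrap every merge the base policy selects so a committing thread can be
  // notified when the merge completes (released 8.x only has mergeFinished()
  // without the 'committed' flag the PR adds).
  public static MergePolicy wrap(MergePolicy base, CountDownLatch mergesDone) {
    return new OneMergeWrappingMergePolicy(base, toWrap ->
        new MergePolicy.OneMerge(toWrap.segments) {
          @Override
          public void mergeFinished() throws IOException {
            super.mergeFinished();
            mergesDone.countDown(); // let the waiting commit thread proceed
          }
        });
  }
}
{code}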
[jira] [Commented] (SOLR-10059) In SolrCloud, every fq added via <appends> is computed twice.
[ https://issues.apache.org/jira/browse/SOLR-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139917#comment-17139917 ] Torsten Bøgh Köster commented on SOLR-10059: I attached a patch for this issue, which still exists in 7.x and 8.x. In a distributed request, pre-configured query params in the "appends"-section get re-appended on the shards. If those parameters in turn reference other parameters (like $qq), those references do not get dereferenced. In our case, this broke the collapse component. The patch skips re-appending on the shards (_isShard=true_) if the parameter _shards.handler.skipAppends=true_. The latter defaults to _false_. > In SolrCloud, every fq added via <appends> is computed twice. > > > Key: SOLR-10059 > URL: https://issues.apache.org/jira/browse/SOLR-10059 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 6.4 >Reporter: Marc Morissette >Priority: Major > Labels: performance > Attachments: SOLR-10059_7x.patch > > > While researching another issue, I noticed that parameters appended to a > query via SearchHandler's <appends> are added to the query twice > in SolrCloud: once on the aggregator and again on the shard. > The FacetComponent corrects this automatically by removing duplicates. Field > queries added in this fashion are however computed twice and that hinders > performance on filter queries that aren't simple bitsets such as those > produced by the CollapsingQueryParser. > To reproduce the issue, simply test this handler on a large enough > collection, then replace "appends" with "defaults". You'll notice significant > performance improvements. > {code} > <requestHandler ...> > <lst name="appends"> > <str name="fq">{!collapse field=routingKey hint=top_fc}</str> > </lst> > </requestHandler> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-10059) In SolrCloud, every fq added via <appends> is computed twice.
[ https://issues.apache.org/jira/browse/SOLR-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Torsten Bøgh Köster updated SOLR-10059: --- Attachment: (was: SOLR-10059_7x.patch) > In SolrCloud, every fq added via <appends> is computed twice. > > > Key: SOLR-10059 > URL: https://issues.apache.org/jira/browse/SOLR-10059 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 6.4 >Reporter: Marc Morissette >Priority: Major > Labels: performance > Attachments: SOLR-10059_7x.patch > > > While researching another issue, I noticed that parameters appended to a > query via SearchHandler's <appends> are added to the query twice > in SolrCloud: once on the aggregator and again on the shard. > The FacetComponent corrects this automatically by removing duplicates. Field > queries added in this fashion are however computed twice and that hinders > performance on filter queries that aren't simple bitsets such as those > produced by the CollapsingQueryParser. > To reproduce the issue, simply test this handler on a large enough > collection, then replace "appends" with "defaults". You'll notice significant > performance improvements. > {code} > <requestHandler ...> > <lst name="appends"> > <str name="fq">{!collapse field=routingKey hint=top_fc}</str> > </lst> > </requestHandler> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-10059) In SolrCloud, every fq added via <appends> is computed twice.
[ https://issues.apache.org/jira/browse/SOLR-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Torsten Bøgh Köster updated SOLR-10059: --- Attachment: SOLR-10059_7x.patch > In SolrCloud, every fq added via <appends> is computed twice. > > > Key: SOLR-10059 > URL: https://issues.apache.org/jira/browse/SOLR-10059 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 6.4 >Reporter: Marc Morissette >Priority: Major > Labels: performance > Attachments: SOLR-10059_7x.patch > > > While researching another issue, I noticed that parameters appended to a > query via SearchHandler's <appends> are added to the query twice > in SolrCloud: once on the aggregator and again on the shard. > The FacetComponent corrects this automatically by removing duplicates. Field > queries added in this fashion are however computed twice and that hinders > performance on filter queries that aren't simple bitsets such as those > produced by the CollapsingQueryParser. > To reproduce the issue, simply test this handler on a large enough > collection, then replace "appends" with "defaults". You'll notice significant > performance improvements. > {code} > <requestHandler ...> > <lst name="appends"> > <str name="fq">{!collapse field=routingKey hint=top_fc}</str> > </lst> > </requestHandler> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9411) Fail compilation on warnings
Erick Erickson created LUCENE-9411: -- Summary: Fail compilation on warnings Key: LUCENE-9411 URL: https://issues.apache.org/jira/browse/LUCENE-9411 Project: Lucene - Core Issue Type: Improvement Components: general/build Reporter: Erick Erickson Assignee: Erick Erickson Moving this over here from SOLR-11973 since it's part of the build system and affects Lucene as well as Solr. You might want to see the discussion there. We have a clean compile for both Solr and Lucene, no rawtypes, unchecked, try, etc. warnings. There are some peculiar warnings (things like SuppressFBWarnings, i.e. FindBugs) that I'm not sure about at all, but let's assume those are not a problem. Now I'd like to start failing the compilation if people write new code that generates warnings. From what I can tell, just adding the flag is easy in both the Gradle and Ant builds. I still have to prove out that adding -Werror does what I expect, i.e. succeeds now and fails when I introduce warnings. But let's assume that works. Are there objections to this idea generally? I hope to have some data by next Monday. FWIW, the Lucene code base had far fewer issues than Solr, but common-build.xml is in Lucene. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
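For the record, the flag itself is a small change in the Gradle build; a minimal sketch of the idea (not the final build change) would be:

{code}
// Promote javac warnings to hard errors for every Java compile task.
allprojects {
  tasks.withType(JavaCompile) {
    options.compilerArgs += ['-Xlint:all', '-Werror']
  }
}
{code}

The hard part, as discussed above and in SOLR-11973, is getting the warning count to zero first so the switch doesn't break the build on day one.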
[jira] [Updated] (SOLR-10059) In SolrCloud, every fq added via <appends> is computed twice.
[ https://issues.apache.org/jira/browse/SOLR-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Torsten Bøgh Köster updated SOLR-10059: --- Attachment: (was: SOLR-10059_7x.patch) > In SolrCloud, every fq added via <appends> is computed twice. > > > Key: SOLR-10059 > URL: https://issues.apache.org/jira/browse/SOLR-10059 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 6.4 >Reporter: Marc Morissette >Priority: Major > Labels: performance > > While researching another issue, I noticed that parameters appended to a > query via SearchHandler's <appends> are added to the query twice > in SolrCloud: once on the aggregator and again on the shard. > The FacetComponent corrects this automatically by removing duplicates. Field > queries added in this fashion are however computed twice and that hinders > performance on filter queries that aren't simple bitsets such as those > produced by the CollapsingQueryParser. > To reproduce the issue, simply test this handler on a large enough > collection, then replace "appends" with "defaults". You'll notice significant > performance improvements. > {code} > <requestHandler ...> > <lst name="appends"> > <str name="fq">{!collapse field=routingKey hint=top_fc}</str> > </lst> > </requestHandler> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-10059) In SolrCloud, every fq added via <appends> is computed twice.
[ https://issues.apache.org/jira/browse/SOLR-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Torsten Bøgh Köster updated SOLR-10059: --- Attachment: SOLR-10059_7x.patch > In SolrCloud, every fq added via <appends> is computed twice. > > > Key: SOLR-10059 > URL: https://issues.apache.org/jira/browse/SOLR-10059 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 6.4 >Reporter: Marc Morissette >Priority: Major > Labels: performance > Attachments: SOLR-10059_7x.patch > > > While researching another issue, I noticed that parameters appended to a > query via SearchHandler's <appends> are added to the query twice > in SolrCloud: once on the aggregator and again on the shard. > The FacetComponent corrects this automatically by removing duplicates. Field > queries added in this fashion are however computed twice and that hinders > performance on filter queries that aren't simple bitsets such as those > produced by the CollapsingQueryParser. > To reproduce the issue, simply test this handler on a large enough > collection, then replace "appends" with "defaults". You'll notice significant > performance improvements. > {code} > <requestHandler ...> > <lst name="appends"> > <str name="fq">{!collapse field=routingKey hint=top_fc}</str> > </lst> > </requestHandler> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1593: LUCENE-9409: Check file lengths before creating slices.
mikemccand commented on a change in pull request #1593: URL: https://github.com/apache/lucene-solr/pull/1593#discussion_r442474019 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene86/Lucene86PointsWriter.java ## @@ -68,11 +71,9 @@ public Lucene86PointsWriter(SegmentWriteState writeState, int maxPointsInLeafNod writeState.segmentInfo.getId(), writeState.segmentSuffix); - String metaFileName = IndexFileNames.segmentFileName(writeState.segmentInfo.name, - writeState.segmentSuffix, - Lucene86PointsFormat.META_EXTENSION); - metaOut = writeState.directory.createOutput(metaFileName, writeState.context); - CodecUtil.writeIndexHeader(metaOut, + tempMetaOut = writeState.directory.createTempOutput( Review comment: Why are we switching to a temp file and copying to the real file after closing? Maybe add a comment explaining? ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene86/Lucene86PointsReader.java ## @@ -93,18 +97,12 @@ public Lucene86PointsReader(SegmentReadState readState) throws IOException { BKDReader reader = new BKDReader(metaIn, indexIn, dataIn); readers.put(fieldNumber, reader); } - indexLength = metaIn.readLong(); - dataLength = metaIn.readLong(); } catch (Throwable t) { priorE = t; } finally { CodecUtil.checkFooter(metaIn, priorE); } } - // At this point, checksums of the meta file have been validated so we Review comment: Hmm are we losing this safety? Oh, actually, maybe not, because in the `finally` clause above, where we check meta's footer, if the checksum is bad we will throw an exception, adding it as suppressed exception if the `indexLength` or `dataLength` was wrong. So I think we do not lose any safety with this change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1593: LUCENE-9409: Check file lengths before creating slices.
jpountz commented on a change in pull request #1593: URL: https://github.com/apache/lucene-solr/pull/1593#discussion_r442477973 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene86/Lucene86PointsReader.java ## @@ -93,18 +97,12 @@ public Lucene86PointsReader(SegmentReadState readState) throws IOException { BKDReader reader = new BKDReader(metaIn, indexIn, dataIn); readers.put(fieldNumber, reader); } - indexLength = metaIn.readLong(); - dataLength = metaIn.readLong(); } catch (Throwable t) { priorE = t; } finally { CodecUtil.checkFooter(metaIn, priorE); } } - // At this point, checksums of the meta file have been validated so we Review comment: we don't lose safety, but in case of a corrupt meta file, it might be slightly more confusing in the sense that the suppressed exception will complain about a truncated index/data file This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1593: LUCENE-9409: Check file lengths before creating slices.
jpountz commented on a change in pull request #1593: URL: https://github.com/apache/lucene-solr/pull/1593#discussion_r442479067 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene86/Lucene86PointsWriter.java ## @@ -68,11 +71,9 @@ public Lucene86PointsWriter(SegmentWriteState writeState, int maxPointsInLeafNod writeState.segmentInfo.getId(), writeState.segmentSuffix); - String metaFileName = IndexFileNames.segmentFileName(writeState.segmentInfo.name, - writeState.segmentSuffix, - Lucene86PointsFormat.META_EXTENSION); - metaOut = writeState.directory.createOutput(metaFileName, writeState.context); - CodecUtil.writeIndexHeader(metaOut, + tempMetaOut = writeState.directory.createTempOutput( Review comment: This is because we need to write file lengths of the index/data files before any offsets/lengths of slices into these files. But since these index/data files have not been written yet, we don't know the length yet. So I wrote into a temp file, and only then write the final metadata file that includes first the lengths of the index/data files and then metadata about the KD trees that includes offsets into these index/data files. I'll add a comment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1593: LUCENE-9409: Check file lengths before creating slices.
jpountz commented on a change in pull request #1593: URL: https://github.com/apache/lucene-solr/pull/1593#discussion_r442482722 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene86/Lucene86PointsWriter.java ## @@ -68,11 +71,9 @@ public Lucene86PointsWriter(SegmentWriteState writeState, int maxPointsInLeafNod writeState.segmentInfo.getId(), writeState.segmentSuffix); - String metaFileName = IndexFileNames.segmentFileName(writeState.segmentInfo.name, - writeState.segmentSuffix, - Lucene86PointsFormat.META_EXTENSION); - metaOut = writeState.directory.createOutput(metaFileName, writeState.context); - CodecUtil.writeIndexHeader(metaOut, + tempMetaOut = writeState.directory.createTempOutput( Review comment: As an alternative, I could buffer the metadata in memory like we do for terms. It will require changing some APIs to replace IndexOutput with DataOutputs but other than that it shouldn't be too hard. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
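That in-memory alternative could look roughly like this (an illustrative sketch, assuming a ByteBuffersDataOutput holds the buffered per-field metadata; BufferedMetaWriter is a made-up name):

{code:java}
import java.io.IOException;
import org.apache.lucene.store.ByteBuffersDataOutput;
import org.apache.lucene.store.IndexOutput;

public final class BufferedMetaWriter {
  // Write index/data file lengths first, so readers can validate them before
  // trusting any slice offsets, then append the buffered per-field metadata.
  public static void writeMeta(IndexOutput metaOut,
                               IndexOutput indexOut,
                               IndexOutput dataOut,
                               ByteBuffersDataOutput bufferedFieldMeta) throws IOException {
    metaOut.writeLong(indexOut.getFilePointer());
    metaOut.writeLong(dataOut.getFilePointer());
    bufferedFieldMeta.copyTo(metaOut);
  }
}
{code}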
[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139977#comment-17139977 ] Adrien Grand commented on LUCENE-9378: -- [~alexklibisz] Thanks for the details, what is the order of magnitude of the slowdown that you are observing? > Configurable compression for BinaryDocValues > > > Key: LUCENE-9378 > URL: https://issues.apache.org/jira/browse/LUCENE-9378 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Viral Gandhi >Priority: Minor > Attachments: image-2020-06-12-22-17-30-339.png, > image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, > image-2020-06-12-22-18-48-919.png > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Lucene 8.5.1 includes a change to always [compress > BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This > caused (~30%) reduction in our red-line QPS (throughput). > We think users should be given some way to opt-in for this compression > feature instead of always being enabled which can have a substantial query > time cost as we saw during our upgrade. [~mikemccand] suggested one possible > approach by introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED and > UNCOMPRESSED) and allowing users to create a custom Codec subclassing the > default Codec and pick the format they want. > Idea is similar to Lucene50StoredFieldsFormat which has two modes, > Mode.BEST_SPEED and Mode.BEST_COMPRESSION. > Here's related issues for adding benchmark covering BINARY doc values > query-time performance - [https://github.com/mikemccand/luceneutil/issues/61] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
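For context, the opt-in approach suggested in the description would plug in through a custom codec along these lines (a sketch only: the UNCOMPRESSED mode is the proposal, not existing API, so the constructor comment marks the hypothetical part):

{code:java}
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat;

public final class UncompressedDocValuesCodec extends FilterCodec {
  // Proposed: new Lucene80DocValuesFormat(Mode.UNCOMPRESSED). Today only the
  // no-arg (compressing) constructor exists.
  private final DocValuesFormat dvFormat = new Lucene80DocValuesFormat();

  public UncompressedDocValuesCodec() {
    super("UncompressedDocValuesCodec", Codec.getDefault());
  }

  @Override
  public DocValuesFormat docValuesFormat() {
    return dvFormat;
  }
}
{code}

Such a codec would also need SPI registration (META-INF/services/org.apache.lucene.codecs.Codec) so indexes written with it can be opened later.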
[GitHub] [lucene-solr] msokolov merged pull request #1552: LUCENE-8962: merge small segments on commit
msokolov merged pull request #1552: URL: https://github.com/apache/lucene-solr/pull/1552 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9402) Let MultiCollector Scorer handle minCompetitiveScore calls
[ https://issues.apache.org/jira/browse/LUCENE-9402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomas Eduardo Fernandez Lobbe resolved LUCENE-9402. --- Fix Version/s: 8.6 master (9.0) Resolution: Fixed Git tagging doesn’t seem to be working. Merged this. Master https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4db1e3895fec7cd50b0ad266af5db0757bb5780a 8x: https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8d20e31b22244f4fa68d5d8d82da5d07c4b6a351 > Let MultiCollector Scorer handle minCompetitiveScore calls > -- > > Key: LUCENE-9402 > URL: https://issues.apache.org/jira/browse/LUCENE-9402 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Tomas Eduardo Fernandez Lobbe >Assignee: Tomas Eduardo Fernandez Lobbe >Priority: Major > Fix For: master (9.0), 8.6 > > Time Spent: 1.5h > Remaining Estimate: 0h > > See SOLR-14554. MultiCollector creates a scorer that explicitly prevents > setting the {{minCompetitiveScore}}: > {code:java} > @Override > public void setScorer(Scorable scorer) throws IOException { > if (cacheScores) { > scorer = new ScoreCachingWrappingScorer(scorer); > } > scorer = new FilterScorable(scorer) { > @Override > public void setMinCompetitiveScore(float minScore) throws IOException > { > // Ignore calls to setMinCompetitiveScore so that if we wrap two > // collectors and one of them wants to skip low-scoring hits, then > // the other collector still sees all hits. We could try to > reconcile > // min scores and take the maximum min score across collectors, but > // this is very unlikely to be helpful in practice. > } > }; > for (int i = 0; i < numCollectors; ++i) { > final LeafCollector c = collectors[i]; > c.setScorer(scorer); > } > } > {code} > Solr uses MultiCollector when scores are requested (to collect the max > score), which means it wouldn't be able to use WAND algorithm. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14575) Solr restore is failing when basic authentication is enabled
[ https://issues.apache.org/jira/browse/SOLR-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140021#comment-17140021 ] Jan Høydahl commented on SOLR-14575: The interesting part is {code:java} java.base/java.lang.Thread.run(Thread.java:834)Caused by: org.apache.solr.common.SolrException: javax.crypto.BadPaddingException: RSA private key operation failed at org.apache.solr.util.CryptoKeys$RSAKeyPair.encrypt(CryptoKeys.java:325) at org.apache.solr.security.PKIAuthenticationPlugin.generateToken(PKIAuthenticationPlugin.java:305) at {code} Somehow PKI plugin is trying to agree on PKI auth between the two nodes, but fail. However, you have explicitly enabled {{forwardCredentials=true}}, so PKI should not have been used here, instead the basic auth header should have been sent to the other node. My guess is that there is a bug when using Http2SolrClient with forwardCredentials?? [~ichattopadhyaya]? > Solr restore is failing when basic authentication is enabled > > > Key: SOLR-14575 > URL: https://issues.apache.org/jira/browse/SOLR-14575 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Backup/Restore >Affects Versions: 8.2 >Reporter: Yaswanth >Priority: Blocker > > Hi Team, > I was testing backup / restore for solrcloud and its failing exactly when I > am trying to restore a successfully backed up collection. > I am using solr 8.2 with basic authentication enabled and then creating a 2 > replica collection. When I run the backup like > curl -u xxx:xxx -k > '[https://x.x.x.x:8080/solr/admin/collections?action=BACKUP&name=test&collection=test&location=/solrdatabkup'|https://x.x.x.x:8080/solr/admin/collections?action=BACKUP&name=test&collection=test&location=/solrdatabkup%27] > it worked fine and I do see a folder was created with the collection name > under /solrdatabackup > But now when I deleted the existing collection and then try running restore > api like > curl -u xxx:xxx -k > '[https://x.x.x.x:8080/solr/admin/collections?action=RESTORE&name=test&collection=test&location=/solrdatabkup'|https://x.x.x.x:8080/solr/admin/collections?action=BACKUP&name=test&collection=test&location=/solrdatabkup%27] > its throwing an error like > { > "responseHeader":{ > "status":500, > "QTime":457}, > "Operation restore caused > *exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: > ADDREPLICA failed to create replica",* > "exception":{ > "msg":"ADDREPLICA failed to create replica", > "rspCode":500}, > "error":{ > "metadata":[ > "error-class","org.apache.solr.common.SolrException", > "root-error-class","org.apache.solr.common.SolrException"], > "msg":"ADDREPLICA failed to create replica", > "trace":"org.apache.solr.common.SolrException: ADDREPLICA failed to create > replica\n\tat > org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat > > org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:280)\n\tat > > org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:252)\n\tat > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat > > org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:820)\n\tat > org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:786)\n\tat > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:546)\n\tat > 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:423)\n\tat > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:350)\n\tat > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\n\tat > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)\n\tat > > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat > > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\n\tat > > org.eclipse.jetty.server.session.SessionHandler.doScope
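For reference, the {{forwardCredentials}} option mentioned above lives in the authentication section of security.json; a minimal example, using the ref guide's demo credentials hash (user solr / password SolrRocks):

{code}
{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "forwardCredentials": true,
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  }
}
{code}

With forwardCredentials=true, node-to-node requests carry the original Basic Auth header instead of PKI, which is why PKI showing up in this trace is suspicious.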
[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140039#comment-17140039 ] Michael Sokolov commented on LUCENE-8962: - pushed [https://github.com/apache/lucene-solr/pull/1552] to master, and cherry-picked to branch_8x, resolving > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.6 > > Attachments: LUCENE-8962_demo.png, failed-tests.patch > > Time Spent: 18h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?
[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Sokolov resolved LUCENE-8962. - Resolution: Fixed > Can we merge small segments during refresh, for faster searching? > - > > Key: LUCENE-8962 > URL: https://issues.apache.org/jira/browse/LUCENE-8962 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > Fix For: 8.6 > > Attachments: LUCENE-8962_demo.png, failed-tests.patch > > Time Spent: 18h 20m > Remaining Estimate: 0h > > With near-real-time search we ask {{IndexWriter}} to write all in-memory > segments to disk and open an {{IndexReader}} to search them, and this is > typically a quick operation. > However, when you use many threads for concurrent indexing, {{IndexWriter}} > will accumulate write many small segments during {{refresh}} and this then > adds search-time cost as searching must visit all of these tiny segments. > The merge policy would normally quickly coalesce these small segments if > given a little time ... so, could we somehow improve {{IndexWriter'}}s > refresh to optionally kick off merge policy to merge segments below some > threshold before opening the near-real-time reader? It'd be a bit tricky > because while we are waiting for merges, indexing may continue, and new > segments may be flushed, but those new segments shouldn't be included in the > point-in-time segments returned by refresh ... > One could almost do this on top of Lucene today, with a custom merge policy, > and some hackity logic to have the merge policy target small segments just > written by refresh, but it's tricky to then open a near-real-time reader, > excluding newly flushed but including newly merged segments since the refresh > originally finished ... > I'm not yet sure how best to solve this, so I wanted to open an issue for > discussion! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] tflobbe commented on a change in pull request #1574: SOLR-14566: Add request-ID to all distrib-search requests
tflobbe commented on a change in pull request #1574: URL: https://github.com/apache/lucene-solr/pull/1574#discussion_r442599895 ## File path: solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java ## @@ -500,6 +509,29 @@ public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throw } } + private void tagRequestWithRequestId(ResponseBuilder rb) { +String rid = getRequestId(rb.req); +if (StringUtils.isBlank(rb.req.getParams().get(CommonParams.REQUEST_ID))) { + ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams()); + params.add(CommonParams.REQUEST_ID, rid);//add rid to the request so that shards see it + rb.req.setParams(params); +} +if (rb.isDistrib) { + rb.rsp.addToLog(CommonParams.REQUEST_ID, rid); //to see it in the logs of the landing core Review comment: Do we now want it also in the coordinator node? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14566) Record "NOW" on "coordinator" log messages
[ https://issues.apache.org/jira/browse/SOLR-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140136#comment-17140136 ] Tomas Eduardo Fernandez Lobbe commented on SOLR-14566: -- Agree with you, an isolated SearchComponent just for this sounds like too much. +1 for using SearchHandler. > Record "NOW" on "coordinator" log messages > -- > > Key: SOLR-14566 > URL: https://issues.apache.org/jira/browse/SOLR-14566 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently, in SolrCore.java we log each search request that comes through > each core as it is finishing. This includes the path, query-params, QTime, > and status. In the case of a distributed search both the "coordinator" node > and each of the per-shard requests produce a log message. > When Solr is fielding many identical queries, such as those created by a > healthcheck or dashboard, it can be hard when examining logs to link the > per-shard requests with the "cooordinator" request that came in upstream. > One thing that would make this easier is if the {{NOW}} param added to > per-shard requests is also included in the log message from the > "coordinator". While {{NOW}} isn't unique strictly speaking, it often is in > practice, and along with the query-params would allow debuggers to associate > shard requests with coordinator requests a large majority of the time. > An alternative approach would be to create a {{qid}} or {{query-uuid}} when > the coordinator starts its work that can be logged everywhere. This provides > a stronger expectation around uniqueness, but would require UUID generation > on the coordinator, which may be non-negligible work at high QPS (maybe? I > have no idea). It also loses the neatness of reusing data already present on > the shard requests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9411) Fail compilation on warnings
[ https://issues.apache.org/jira/browse/LUCENE-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated LUCENE-9411: --- Attachment: LUCENE-9411.patch Status: Open (was: Open) The first step here is to get a successful compile with -Werror, and I'm starting with the gradle build. Actually, the very first step is getting a clean compile. I've been ignoring a couple of things but now I have to deal with them. The attached patch gets rid of a couple of warnings, apparently from dependencies. Specifically: {code:java} /Users/Erick/.gradle/caches/modules-2/files-2.1/org.apache.zookeeper/zookeeper/3.5.7/12bdf55ba8be7fc891996319d37f35eaad7e63ea/zookeeper-3.5.7.jar(/org/apache/zookeeper/ZooDefs$Ids.class): warning: Cannot find annotation method 'value()' in type 'SuppressFBWarnings' {code} and {code:java} /Users/Erick/.gradle/caches/modules-2/files-2.1/com.google.guava/guava/25.1-jre/6c57e4b22b44e89e548b5c9f70f0c45fe10fb0b4/guava-25.1-jre.jar(/com/google/common/collect/Multimap.class): warning: Cannot find annotation method 'value()' in type 'CompatibleWith' {code} I can put these either in the bulid.gradle or solr/build.gradle, solr/build.gradle seems best since they aren't part of Lucene. The patch shows them (it's a small patch, don't be scared). But I still get 6 warnings: {code:java} > Task :solr:solrj:compileJava warning: [rawtypes] found raw type: Map missing type arguments for generic class Map where K,V are type-variables: K extends Object declared in interface Map V extends Object declared in interface Map {code} Problem is that I have no idea at all where they come from. For all the other 8,000 warnings, the warnings were clearly identified with the file and line. Cranking the Gradle logging up to debug doesn't shed any light on the problem. Any clue how to find out what generates these would be appreciated. A secondary question is why, after a build, I have references in .gradle/caches to: {code:java} ./modules-2/files-2.1/com.google.code.findbugs ./modules-2/files-2.1/com.google.code.findbugs/jsr305 ./modules-2/files-2.1/com.google.code.findbugs/jsr305/1.3.9 ./modules-2/files-2.1/com.google.code.findbugs/jsr305/1.3.9/67ea333a3244bc20a17d6f0c29498071dfa409fc ./modules-2/files-2.1/com.google.code.findbugs/jsr305/1.3.9/67ea333a3244bc20a17d6f0c29498071dfa409fc/jsr305-1.3.9.pom ./modules-2/files-2.1/com.google.code.findbugs/jsr305/3.0.2 ./modules-2/files-2.1/com.google.code.findbugs/jsr305/3.0.2/25ea2e8b0c338a877313bd4672d3fe056ea78f0d ./modules-2/files-2.1/com.google.code.findbugs/jsr305/3.0.2/25ea2e8b0c338a877313bd4672d3fe056ea78f0d/jsr305-3.0.2.jar ./modules-2/files-2.1/com.google.code.findbugs/jsr305/3.0.2/8d93cdf4d84d7e1de736df607945c6df0730a10f ./modules-2/files-2.1/com.google.code.findbugs/jsr305/3.0.2/8d93cdf4d84d7e1de736df607945c6df0730a10f/jsr305-3.0.2.pom {code} but gradlew dependencies only lists 3.0.2. I can live without knowing, but if anyone knows off the top of their heads Similarly I have error_prone_annotations 2.1.3 and 2.3.4. But the attached patch gets rid of all the warnings so I'm not inclined to pursue that very far. 
> Fail compilation on warnings > --- > > Key: LUCENE-9411 > URL: https://issues.apache.org/jira/browse/LUCENE-9411 > Project: Lucene - Core > Issue Type: Improvement > Components: general/build >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Labels: build > Attachments: LUCENE-9411.patch > > > Moving this over here from SOLR-11973 since it's part of the build system and > affects Lucene as well as Solr. You might want to see the discussion there. > We have a clean compile for both Solr and Lucene, no rawtypes, unchecked, > try, etc. warnings. There are some peculiar warnings (things like > SuppressFBWarnings, i.e. FindBugs) that I'm not sure about at all, but let's > assume those are not a problem. Now I'd like to start failing the compilation > if people write new code that generates warnings. > From what I can tell, just adding the flag is easy in both the Gradle and Ant > builds. I still have to prove out that adding -Werror does what I expect, > i.e. succeeds now and fails when I introduce warnings. > But let's assume that works. Are there objections to this idea generally? I > hope to have some data by next Monday. > FWIW, the Lucene code base had far fewer issues than Solr, but > common-build.xml is in Lucene. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
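For readers following along, wiring -Werror into a Gradle build generally looks something like the following. This is a minimal sketch of the mechanism under discussion, not the attached LUCENE-9411.patch:

{code:groovy}
// build.gradle -- minimal sketch of failing compilation on warnings.
// Illustrative only; the actual change is in the attached LUCENE-9411.patch.
allprojects {
  tasks.withType(JavaCompile).configureEach {
    options.compilerArgs += [
        '-Xlint:all', // surface the warnings in the first place
        '-Werror'     // then fail the compile on any warning
    ]
  }
}
{code}

With that in place, the build succeeds on a warning-free tree and fails as soon as new code introduces a warning, which is exactly the behavior being proved out above.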
[jira] [Updated] (LUCENE-9378) Configurable compression for BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Klibisz updated LUCENE-9378: - Attachment: snapshots-v76x.nps hotspots-v76x.png hotspots-v77x.png snapshot-v77x.nps > Configurable compression for BinaryDocValues > > > Key: LUCENE-9378 > URL: https://issues.apache.org/jira/browse/LUCENE-9378 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Viral Gandhi >Priority: Minor > Attachments: hotspots-v76x.png, hotspots-v77x.png, > image-2020-06-12-22-17-30-339.png, image-2020-06-12-22-17-53-961.png, > image-2020-06-12-22-18-24-527.png, image-2020-06-12-22-18-48-919.png, > snapshot-v77x.nps, snapshots-v76x.nps > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Lucene 8.5.1 includes a change to always [compress > BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This > caused a ~30% reduction in our red-line QPS (throughput). > We think users should be given some way to opt in to this compression > feature instead of it always being enabled, since it can have a substantial query-time > cost, as we saw during our upgrade. [~mikemccand] suggested one possible > approach: introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED and > UNCOMPRESSED) and allowing users to create a custom Codec subclassing the > default Codec and picking the format they want (a sketch follows this message). > The idea is similar to Lucene50StoredFieldsFormat, which has two modes, > Mode.BEST_SPEED and Mode.BEST_COMPRESSION. > Here's a related issue for adding a benchmark covering BINARY doc values > query-time performance - [https://github.com/mikemccand/luceneutil/issues/61] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
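To make that proposal concrete, here is a rough sketch of the opt-out a user would write. Note that the Mode-taking constructor is the proposed API, not one that exists in Lucene 8.5.1; the FilterCodec plumbing around it is real:

{code:java}
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat;
import org.apache.lucene.codecs.lucene84.Lucene84Codec;

// Sketch of the proposed opt-out: delegate everything to the default codec
// except the doc-values format.
public final class UncompressedDocValuesCodec extends FilterCodec {

  private final DocValuesFormat dvFormat =
      // Hypothetical constructor -- the Mode enum is what this issue
      // proposes, not an API that exists in 8.5.1.
      new Lucene80DocValuesFormat(Lucene80DocValuesFormat.Mode.UNCOMPRESSED);

  public UncompressedDocValuesCodec() {
    super("UncompressedDocValuesCodec", new Lucene84Codec());
  }

  @Override
  public DocValuesFormat docValuesFormat() {
    return dvFormat;
  }
}
{code}

Indexing would then pass this codec via IndexWriterConfig.setCodec(...); reading relies on the codec name being resolvable through SPI, so the class also needs a META-INF/services registration for org.apache.lucene.codecs.Codec.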
[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140163#comment-17140163 ] Alex Klibisz commented on LUCENE-9378: -- [~jpountz] It's about 2x slower. I re-ran a benchmark to be sure. Here is the setup: * Storing a corpus of 18K binary vectors in a single shard. * Each vector contains ~500 ints denoting the positive indices. So each one is storing a byte array of 500 * 4 = 2000 bytes in the binary doc values. * Running 2000 serial searches against these vectors. Each search reads, deserializes, and computes the Jaccard similarity against every vector in the corpus (a sketch of this read path follows this message). So a total of 18K * 2K reads from the shard. * The read order is defined by Elasticsearch. Internally I'm using a FunctionScoreQuery, code here: [https://github.com/alexklibisz/elastiknn/blob/5246a26f76791362482a98066e31071cb03e0a74/plugin/src/main/scala/com/klibisz/elastiknn/query/ExactQuery.scala#L22-L29] * Ubuntu 20 on an Intel i7-8750H 2.20GHz x 12 cores * Running Oracle JDK 14: {code} $ java -version java version "14" 2020-03-17 Java(TM) SE Runtime Environment (build 14+36-1461) Java HotSpot(TM) 64-Bit Server VM (build 14+36-1461, mixed mode, sharing) {code} * Running all 2000 searches once, then again, and reporting the time from the second run (JVM warmup, etc.). Results: * Using Elasticsearch 7.6.2 w/ Lucene 8.4.0: ** 212 seconds for 2000 searches ** Search threads spend 95.5% of time computing similarities, 0.2% in the LZ4.decompress() method. * Using Elasticsearch 7.7.1 w/ Lucene 8.5.1: ** 445 seconds for 2000 searches ** Search threads spend 56% of total time computing similarities, 42% in the decompress method. VisualVM screenshot for 7.6.x: !hotspots-v76x.png! VisualVM screenshot for 7.7.x: !hotspots-v77x.png! Attaching snapshots from VisualVM: [^snapshots-v76x.nps] [^snapshot-v77x.nps] Thank you all for your help! :) > Configurable compression for BinaryDocValues > > > Key: LUCENE-9378 > URL: https://issues.apache.org/jira/browse/LUCENE-9378 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Viral Gandhi >Priority: Minor > Attachments: hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, > hotspots-v76x.png, hotspots-v76x.png, hotspots-v77x.png, hotspots-v77x.png, > hotspots-v77x.png, hotspots-v77x.png, image-2020-06-12-22-17-30-339.png, > image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, > image-2020-06-12-22-18-48-919.png, snapshot-v77x.nps, snapshot-v77x.nps, > snapshot-v77x.nps, snapshots-v76x.nps, snapshots-v76x.nps, snapshots-v76x.nps > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Lucene 8.5.1 includes a change to always [compress > BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This > caused a ~30% reduction in our red-line QPS (throughput). > We think users should be given some way to opt in to this compression > feature instead of it always being enabled, since it can have a substantial query-time > cost, as we saw during our upgrade. [~mikemccand] suggested one possible > approach: introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED and > UNCOMPRESSED) and allowing users to create a custom Codec subclassing the > default Codec and picking the format they want. > The idea is similar to Lucene50StoredFieldsFormat, which has two modes, > Mode.BEST_SPEED and Mode.BEST_COMPRESSION. 
> Here's a related issue for adding a benchmark covering BINARY doc values > query-time performance - [https://github.com/mikemccand/luceneutil/issues/61] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
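To make the measured hot path concrete, here is a rough sketch of what each of those 18K-per-search reads does. It assumes the stored bytes are big-endian ints of sorted positive indices and a field name of "vec" (both assumptions; the real encoding lives in the linked ExactQuery.scala), but the Lucene calls shown are the ones that appear in the profiles above:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.util.BytesRef;

public final class JaccardReadSketch {

  // Reads one stored vector and scores it against the query's sorted indices.
  // In real code the BinaryDocValues iterator would be created once per
  // segment and advanced in docId order, not fetched per document.
  static double score(LeafReader reader, int docId, int[] queryIndices) throws IOException {
    BinaryDocValues dv = DocValues.getBinary(reader, "vec"); // field name assumed
    if (!dv.advanceExact(docId)) {
      return 0d;
    }
    // binaryValue() is where LZ4 decompression shows up in Lucene 8.5.x profiles.
    BytesRef ref = dv.binaryValue();
    ByteBuffer buf = ByteBuffer.wrap(ref.bytes, ref.offset, ref.length);
    int[] docIndices = new int[ref.length / Integer.BYTES]; // ~500 ints = 2000 bytes
    for (int i = 0; i < docIndices.length; i++) {
      docIndices[i] = buf.getInt();
    }
    // Jaccard = |intersection| / |union| over two sorted index lists.
    int i = 0, j = 0, intersection = 0;
    while (i < docIndices.length && j < queryIndices.length) {
      if (docIndices[i] == queryIndices[j]) { intersection++; i++; j++; }
      else if (docIndices[i] < queryIndices[j]) { i++; }
      else { j++; }
    }
    int union = docIndices.length + queryIndices.length - intersection;
    return union == 0 ? 0d : (double) intersection / union;
  }
}
{code}

Because every scored document pays the binaryValue() read, any per-read decompression cost multiplies across the full 18K * 2K reads, which is consistent with the 212s vs. 445s numbers reported above.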