[GitHub] [lucene-solr] dweiss commented on a change in pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
dweiss commented on a change in pull request #2068:
URL: https://github.com/apache/lucene-solr/pull/2068#discussion_r519618160

## File path: lucene/native/build.gradle

@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This gets separated out from misc module into a native module due to incompatibility between cpp-library and java-library plugins.
+ * For details, please see https://github.com/gradle/gradle-native/issues/352#issuecomment-461724948
+ */
+apply plugin: 'cpp-library'
+
+description = 'Module for native code'
+
+library {
+  baseName = 'NativePosixUtil'
+
+  privateHeaders.from file(System.getProperty('java.home') + '/include')
+  privateHeaders.from file(System.getProperty('java.home') + '/include/darwin')
+  privateHeaders.from file(System.getProperty('java.home') + '/../include/solaris')

Review comment:

These paths are wrong, I think. I don't know where they came from, but they should correspond to the layout of newer JDK distributions (openjdk). I'm not sure Solaris is still needed - I can't test it on Solaris.
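For context on the layout point: under a modern (JDK 9+) distribution there is no jre/ subdirectory and no ../include indirection, so the JNI headers sit directly under java.home. A small Java sketch of that layout (the darwin/win32/linux directory names are the usual OpenJDK ones; treat the mapping below as an assumption, not part of the patch):

import java.nio.file.Path;
import java.util.Locale;

public class JniHeaderPaths {
  public static void main(String[] args) {
    // JDK 9+: java.home itself contains include/ (jni.h lives there)
    Path include = Path.of(System.getProperty("java.home")).resolve("include");

    // Platform-specific jni_md.h sits in an OS-named subdirectory (assumed mapping)
    String os = System.getProperty("os.name").toLowerCase(Locale.ROOT);
    String platform = os.contains("mac") ? "darwin"
        : os.contains("win") ? "win32"
        : "linux";

    System.out.println("JNI headers:      " + include);
    System.out.println("Platform headers: " + include.resolve(platform));
  }
}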
[GitHub] [lucene-solr] dweiss commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
dweiss commented on pull request #2068:
URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-723842876

bq. The windows equivalent WindowsDirectory.cpp still sits in the misc module and hasn't been moved over yet.

Ah... I didn't realize this was the case - sorry. Give me a day or two, I'll try to make those files compile under Windows and maybe we can do a clean patch that moves everything.
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
bruno-roustant commented on a change in pull request #2066:
URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r519684814

## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java

@@ -490,21 +492,19 @@ public CoreDescriptor getCoreDescriptor(String coreName) {
   }

   /**
-   * Get the CoreDescriptors for every SolrCore managed here
-   * @return a List of CoreDescriptors
+   * Get the CoreDescriptors for every {@link SolrCore} managed here (permanent and transient, loaded and unloaded).
+   *
+   * @return An unordered list copy. This list can be modified by the caller (e.g. sorted).

Review comment:

I'll use ArrayList.
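A minimal sketch of the copy semantics being settled on here - returning a fresh ArrayList so callers can sort or otherwise modify the result without touching internal state (the descriptors map below is an assumed stand-in for the actual field):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CoreDescriptorsSketch {
  private final Map<String, Object> descriptors = new HashMap<>(); // assumed stand-in

  // Return an unordered, modifiable copy: callers may sort or filter it
  // without affecting the map this class synchronizes on.
  synchronized List<Object> getCoreDescriptors() {
    return new ArrayList<>(descriptors.values());
  }
}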
[jira] [Commented] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values
[ https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228512#comment-17228512 ]

Andrzej Bialecki commented on SOLR-14683:
-----------------------------------------

{quote}Solr's JSON Response writer already has long standing support to output {{Float.NaN}} as a quoted string {{"NaN"}}
{quote}
Therein lies the problem ;) since there is no standard way to do it, Solr decided to use a STRING (quoted) value of {{"NaN"}}... but some other libraries (and the popular extended spec [http://json5.org]) use unquoted values for {{NaN, Infinity, -Infinity}}. Some other parsers use {{nan, inf, -inf}} - that's the beauty of standards, there are so many of them to choose from... /s

Taking all this into account, returning {{null}} for NaN or undefined seems like the safest option.

> Review the metrics API to ensure consistent placeholders for missing values
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-14683
>                 URL: https://issues.apache.org/jira/browse/SOLR-14683
>             Project: Solr
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Major
>
> Spin-off from SOLR-14657. Some gauges can legitimately be missing or in an unknown state at some points in time, e.g. during SolrCore startup or shutdown.
> Currently the API returns placeholders with either impossible values for numeric gauges (such as index size -1) or empty maps / strings for other non-numeric gauges.
> [~hossman] noticed that the values for these placeholders may be misleading, depending on how the user treats them - if the client has no special logic to treat them as "missing values" it may erroneously treat them as valid data. E.g. numeric values of -1 or 0 may severely skew averages and produce misleading peaks / valleys in metrics histories.
> On the other hand, returning a literal {{null}} value instead of the expected number may also cause unexpected client issues - although in this case it's clearer that there's actually no data available, so long-term this may be a better strategy than returning impossible values, even if it means that the client should learn to handle {{null}} values appropriately.
[jira] [Comment Edited] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values
[ https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228512#comment-17228512 ]

Andrzej Bialecki edited comment on SOLR-14683 at 11/9/20, 10:58 AM:
--------------------------------------------------------------------

{quote}Solr's JSON Response writer already has long standing support to output {{Float.NaN}} as a quoted string {{"NaN"}}
{quote}
Therein lies the problem ;) since there is no standard way to do it, Solr decided to use a STRING (quoted) value of {{"NaN"}}... but some other libraries (and the popular extended spec [http://json5.org]) use unquoted values for {{NaN, Infinity, -Infinity}}. Some other parsers use {{nan, inf, -inf}} - that's the beauty of standards, there are so many of them to choose from... /s

Taking all this into account, serializing NaN as {{null}} seems like the safest option, unless we add this configurability to our JSONWriter.

Also, from the point of view of metrics it conveys the same message whether it returns NaN or null when the value is unknown - so for simplicity and easier compatibility we could always return {{null}} as a metric value, regardless of how it's serialized.
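To make the suggestion concrete: a minimal sketch (an assumed helper, not Solr's actual JSONWriter code) of serializing non-finite values as JSON {{null}}:

{noformat}
// Sketch: emit JSON null for any non-finite double - the one representation
// that strict JSON, json5, and the various lenient parsers all accept.
static String jsonNumber(double v) {
  return (Double.isNaN(v) || Double.isInfinite(v)) ? "null" : Double.toString(v);
}
{noformat}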
[jira] [Commented] (SOLR-14969) Prevent creating multiple cores with the same name which leads to instabilities (race condition)
[ https://issues.apache.org/jira/browse/SOLR-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228529#comment-17228529 ]

Andreas Hubold commented on SOLR-14969:
---------------------------------------

I've tested concurrent core creation for branch_8x, and can confirm that the bug is fixed :) Thanks a lot for fixing this issue, [~erickerickson].

However, the logging is quite ugly. Maybe it would have been better to throw a SolrException with ErrorCode.CONFLICT in case of concurrent core creation, instead of the currently used ErrorCode.SERVER_ERROR. If I understand logging in RequestHandlerBase correctly, this would avoid the ERROR message with stack trace that appears in addition to the WARN message. FYI, maybe this is still worth changing.

{noformat}
2020-11-09 11:10:03.104 WARN  (qtp1033348658-69) [   x:test-0.5753008886962022-71] o.a.s.c.CoreContainer Already creating a core with name 'test-0.5753008886962022-71', call aborted '
2020-11-09 11:10:03.104 ERROR (qtp1033348658-126) [   x:test-0.5753008886962022-71] o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Already creating a core with name 'test-0.5753008886962022-71', call aborted '
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1284)
        at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:95)
        at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)
        at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:397)
        at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
        ...
{noformat}

> Prevent creating multiple cores with the same name which leads to instabilities (race condition)
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-14969
>                 URL: https://issues.apache.org/jira/browse/SOLR-14969
>             Project: Solr
>          Issue Type: Bug
>   Security Level: Public (Default Security Level. Issues are Public)
>          Components: multicore
>    Affects Versions: 8.6, 8.6.3
>            Reporter: Andreas Hubold
>            Assignee: Erick Erickson
>            Priority: Major
>             Fix For: 8.8
>
>      Attachments: CmCoreAdminHandler.java
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> CoreContainer#create does not correctly handle concurrent requests to create the same core. There's a race condition (see also the existing TODO comment in the code), and CoreContainer#createFromDescriptor may be called subsequently for the same core name.
> The _second call_ then fails to create an IndexWriter, and exception handling causes an inconsistent CoreContainer state.
> {noformat}
> 2020-10-27 00:29:25.350 ERROR (qtp2029754983-24) [   ] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error CREATEing SolrCore 'blueprint_acgqqafsogyc_comments': Unable to create core [blueprint_acgqqafsogyc_comments] Caused by: Lock held by this virtual machine: /var/solr/data/blueprint_acgqqafsogyc_comments/data/index/write.lock
>         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1312)
>         at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:95)
>         at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)
>         ...
> Caused by: org.apache.solr.common.SolrException: Unable to create core [blueprint_acgqqafsogyc_comments]
>         at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1408)
>         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1273)
>         ... 47 more
> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1071)
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:906)
>         at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1387)
>         ... 48 more
> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
>         at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2184)
>         at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2308)
>         at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1130)
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1012)
>         ... 50 more
> Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine: /var/solr/data/blueprint_acgqqafsogyc_comments/data/index/write.lock
>         at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:139)
>         at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41)
>         at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45)
>         at org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:105)
>         at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:785)
>         at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:126)
>         at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:100)
>         at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:261)
>         at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:135)
>         at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2145)
> {noformat}
> CoreContainer#createFromDescriptor removes the CoreDescriptor when handling this exception. The SolrCore created for the first successful call is still registered in SolrCores.cores, but now there's no corresponding CoreDescriptor for that name anymore.
> This inconsistency leads to subsequent NullPointerExceptions, for example when using CoreAdmin STATUS with the core name: CoreAdminOperation#getCoreStatus first gets the non-null SolrCore (cores.getCore(cname)) but core.getInstancePath() throws an NPE, because the CoreDescriptor is not registered anymore:
> {noformat}
> 2020-10-27 00:29:25.353 INFO  (qtp2029754983-19) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores para
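For illustration, a minimal sketch of the suggested change (SolrException and ErrorCode.CONFLICT are existing Solr classes; the wrapping class and method are hypothetical):

{noformat}
import org.apache.solr.common.SolrException;

class ConcurrentCreateGuard {
  // CONFLICT (HTTP 409) marks the abort as a client-visible conflict; per
  // the comment above, this should avoid the ERROR-level stack trace that
  // SERVER_ERROR (500) produces in RequestHandlerBase, leaving only the WARN.
  static void abortDuplicateCreate(String coreName) {
    throw new SolrException(SolrException.ErrorCode.CONFLICT,
        "Already creating a core with name '" + coreName + "', call aborted");
  }
}
{noformat}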
[GitHub] [lucene-solr] bruno-roustant commented on pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
bruno-roustant commented on pull request #2066:
URL: https://github.com/apache/lucene-solr/pull/2066#issuecomment-723960636

Ok, now I think I integrated all comments.
[GitHub] [lucene-solr] ErickErickson commented on pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
ErickErickson commented on pull request #2066:
URL: https://github.com/apache/lucene-solr/pull/2066#issuecomment-723990436

+1

> On Nov 8, 2020, at 3:30 PM, David Smiley wrote:
>
> > So @dsmiley @ErickErickson do we all agree that we throw an exception in TransientSolrCoreCacheFactory.newInstance() line 60 and we never return null?
>
> +1 definitely
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
dsmiley commented on a change in pull request #2066:
URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r519792336

## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java

@@ -121,8 +101,8 @@ protected void close() {
       // make a copy of the cores then clear the map so the core isn't handed out to a request again
       coreList.addAll(cores.values());
       cores.clear();
-      if (transientSolrCoreCache != null) {
-        coreList.addAll(transientSolrCoreCache.prepareForShutdown());
+      if (transientSolrCoreCacheFactory != null) {

Review comment:

Do we still need a null check here? And why the factory vs the cache itself?
[jira] [Commented] (SOLR-14969) Prevent creating multiple cores with the same name which leads to instabilities (race condition)
[ https://issues.apache.org/jira/browse/SOLR-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228572#comment-17228572 ]

ASF subversion and git services commented on SOLR-14969:
--------------------------------------------------------

Commit be19432b750b94c4703ee7b19ef681ebf771a95a in lucene-solr's branch refs/heads/master from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=be19432 ]

SOLR-14969: Prevent creating multiple cores with the same name which leads to instabilities (race condition)

changed error code
[jira] [Commented] (SOLR-14969) Prevent creating multiple cores with the same name which leads to instabilities (race condition)
[ https://issues.apache.org/jira/browse/SOLR-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228583#comment-17228583 ]

ASF subversion and git services commented on SOLR-14969:
--------------------------------------------------------

Commit 91ef1c0fe8854db04e42b9095437b3186ff8038e in lucene-solr's branch refs/heads/branch_8x from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=91ef1c0 ]

SOLR-14969: Prevent creating multiple cores with the same name which leads to instabilities (race condition)

changed error code

(cherry picked from commit be19432b750b94c4703ee7b19ef681ebf771a95a)
[jira] [Commented] (SOLR-14969) Prevent creating multiple cores with the same name which leads to instabilities (race condition)
[ https://issues.apache.org/jira/browse/SOLR-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228593#comment-17228593 ]

Andreas Hubold commented on SOLR-14969:
---------------------------------------

Works, no logged ERROR messages anymore. Thank you.
[jira] [Created] (SOLR-14991) tag and remove obsolete branches
Erick Erickson created SOLR-14991:
-------------------------------------

             Summary: tag and remove obsolete branches
                 Key: SOLR-14991
                 URL: https://issues.apache.org/jira/browse/SOLR-14991
             Project: Solr
          Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Erick Erickson
            Assignee: Erick Erickson

I'm going to gradually work through the branches, tagging and removing
1> anything with a Jira name that's fixed
2> anything that I'm certain will never be fixed (e.g. the various gradle build branches)

So the changes will still be available, they just won't pollute the branch list. I'll list the branches here; all the tags will be history/branches/lucene-solr/

This specifically will _not_ include
1> any release, e.g. branch_8_4
2> anything I'm unsure about. People who've created branches should expect some pings about this.
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory
mikemccand commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r519868079

## File path: lucene/misc/src/java/org/apache/lucene/store/DirectIODirectory.java

@@ -66,45 +66,32 @@
  *
  * @lucene.experimental
  */
-public class NativeUnixDirectory extends FSDirectory {
+public class DirectIODirectory extends FSDirectory {

   // TODO: this is OS dependent, but likely 512 is the LCD
   private final static long ALIGN = 512;
   private final static long ALIGN_NOT_MASK = ~(ALIGN-1);
-
-  /** Default buffer size before writing to disk (256 KB);
-   * larger means less IO load but more RAM and direct
-   * buffer storage space consumed during merging. */
-  public final static int DEFAULT_MERGE_BUFFER_SIZE = 262144;

   /** Default min expected merge size before direct IO is
    * used (10 MB): */
   public final static long DEFAULT_MIN_BYTES_DIRECT = 10*1024*1024;

-  private final int mergeBufferSize;
   private final long minBytesDirect;
   private final Directory delegate;

   /** Create a new NIOFSDirectory for the named location.
    *
    * @param path the path of the directory
-   * @param lockFactory to use
-   * @param mergeBufferSize Size of buffer to use for
-   *    merging. See {@link #DEFAULT_MERGE_BUFFER_SIZE}.
    * @param minBytesDirect Merges, or files to be opened for
    *    reading, smaller than this will
    *    not use direct IO. See {@link
    *    #DEFAULT_MIN_BYTES_DIRECT}
+   * @param lockFactory to use
    * @param delegate fallback Directory for non-merges
    * @throws IOException If there is a low-level I/O error
    */
-  public NativeUnixDirectory(Path path, int mergeBufferSize, long minBytesDirect, LockFactory lockFactory, Directory delegate) throws IOException {
+  public DirectIODirectory(Path path, long minBytesDirect, LockFactory lockFactory, Directory delegate) throws IOException {
     super(path, lockFactory);
-    if ((mergeBufferSize & ALIGN) != 0) {
-      throw new IllegalArgumentException("mergeBufferSize must be 0 mod " + ALIGN + " (got: " + mergeBufferSize + ")");
-    }
-    this.mergeBufferSize = mergeBufferSize;

Review comment:

Hmm, but previously it was a 256 KB buffer, by default, and the caller could change that if they wanted. But with this change, it's now hardwired to something much smaller (512 bytes, or 1 or 4 KB; I'm not sure what "typical" filesystem block sizes are now?).

This buffering, and its size, is really important when using direct IO because every write will go straight to the device, so a larger buffer amortizes the cost of such writes.

I think we need to keep the option for the caller to set this buffer size, and leave it at the 256 KB default? Or at least, let's not try to change that behavior here, and keep this change 100% focused on moving to the pure java implementation?
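To make the buffering concern concrete, a sketch using the constants from the patch (the class and helper names are illustrative assumptions, not Lucene API):

final class DirectIOBufferSizing {
  // TODO: this is OS dependent, but likely 512 is the LCD
  private static final long ALIGN = 512;
  static final int DEFAULT_MERGE_BUFFER_SIZE = 262144; // 256 KB amortizes device writes

  // Direct IO bypasses the OS page cache, so every buffer flush hits the
  // device; the buffer must therefore be a whole number of ALIGN-sized blocks,
  // and larger buffers mean fewer (cheaper, amortized) device writes.
  static int checkedMergeBufferSize(int mergeBufferSize) {
    if ((mergeBufferSize % ALIGN) != 0) {
      throw new IllegalArgumentException(
          "mergeBufferSize must be 0 mod " + ALIGN + " (got: " + mergeBufferSize + ")");
    }
    return mergeBufferSize;
  }

  public static void main(String[] args) {
    System.out.println(checkedMergeBufferSize(DEFAULT_MERGE_BUFFER_SIZE)); // prints 262144
  }
}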
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2022: LUCENE-9004: KNN vector search using NSW graphs
mikemccand commented on a change in pull request #2022:
URL: https://github.com/apache/lucene-solr/pull/2022#discussion_r519873455

## File path: lucene/core/src/java/org/apache/lucene/index/VectorValues.java

@@ -74,6 +74,18 @@ public BytesRef binaryValue() throws IOException {
     throw new UnsupportedOperationException();
   }

+  /**
+   * Return the k nearest neighbor documents as determined by comparison of their vector values
+   * for this field, to the given vector, by the field's search strategy. If the search strategy is
+   * reversed, lower values indicate nearer vectors, otherwise higher scores indicate nearer
+   * vectors. Unlike relevance scores, vector scores may be negative.
+   * @param target the vector-valued query
+   * @param k the number of docs to return
+   * @param fanout control the accuracy/speed tradeoff - larger values give better recall at higher cost

Review comment:

> Don't Codecs get created automatically using no-args constructors and service autodiscovery?

They do at read (search) time! But at write time, you can pass parameters that alter how the Codec does its work, as long as the resulting index is then readable at search time with no-args constructors. I vaguely remember talking about having ways for the Codec at read-time to also take options, but I'm not sure that was ever fully designed / pushed ... @s1monw may remember?

> But I'm reluctant to expose hnsw-specific hyperparameters in VectorField, since we want to support other algorithms as well. Might be a good use case for generic IndexedField.attributes?

Yeah, maybe? I agree it is not obvious where the API should live and how it then finds its way into the ANN data structure construction when writing each segment.
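For orientation, a hedged usage sketch of the method under review (the parameter values are arbitrary examples, not recommendations; the signature comes from this PR and may change before merge):

import java.io.IOException;
import org.apache.lucene.index.VectorValues;
import org.apache.lucene.search.TopDocs;

class KnnQuery {
  // fanout widens the candidate pool during graph traversal: higher recall,
  // more work. k is the number of documents actually returned.
  static TopDocs nearest(VectorValues vectors, float[] query) throws IOException {
    return vectors.search(query, 10 /* k */, 50 /* fanout */);
  }
}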
[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228642#comment-17228642 ]

Michael McCandless commented on LUCENE-9378:
--------------------------------------------

{quote}IMO this is so definitely not a "Minor" matter so I changed it to our default of Major.
{quote}
+1, thanks [~dsmiley]. This is a major issue for us at Amazon - we are now running the catalog search using a custom Codec that forces (reverts) the whole {{BinaryDocValues}} writing/reading back to before LUCENE-9211, which is not really a comfortable long-term solution.

I am hoping that the [ideas being discussed in the PR|https://github.com/apache/lucene-solr/pull/1543] lead to an acceptable solution.

I think whether or not {{BinaryDocValues}} fields should be compressed will be very application dependent. Some applications care greatly about the size of the index, and can accept a small hit to search-time performance, but for others (like Amazon's!) it is the opposite.

> Configurable compression for BinaryDocValues
> ---------------------------------------------
>
>                 Key: LUCENE-9378
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9378
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Viral Gandhi
>            Priority: Major
>      Attachments: hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, hotspots-v77x.png, hotspots-v77x.png, hotspots-v77x.png, hotspots-v77x.png, image-2020-06-12-22-17-30-339.png, image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, image-2020-06-12-22-18-48-919.png, snapshot-v77x.nps, snapshot-v77x.nps, snapshot-v77x.nps, snapshots-v76x.nps, snapshots-v76x.nps, snapshots-v76x.nps
>
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This caused a (~30%) reduction in our red-line QPS (throughput).
> We think users should be given some way to opt in to this compression feature instead of it always being enabled, since it can have a substantial query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible approach: introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED and UNCOMPRESSED) and allowing users to create a custom Codec subclassing the default Codec and pick the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here's a related issue for adding a benchmark covering BINARY doc values query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]
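A rough sketch of the custom-Codec opt-out pattern described above ({{FilterCodec}} and {{Codec.getDefault()}} are real Lucene API; {{LegacyUncompressedDocValuesFormat}} is a hypothetical stand-in for a format with the pre-LUCENE-9211 behavior):

{noformat}
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.FilterCodec;

// Wrap the default Codec and swap in a doc-values format that skips
// BinaryDocValues compression; everything else delegates unchanged.
public final class UncompressedBdvCodec extends FilterCodec {
  private final DocValuesFormat dvFormat =
      new LegacyUncompressedDocValuesFormat(); // hypothetical format

  public UncompressedBdvCodec() {
    super("UncompressedBdvCodec", Codec.getDefault());
  }

  @Override
  public DocValuesFormat docValuesFormat() {
    return dvFormat;
  }
}
{noformat}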
[GitHub] [lucene-solr] mikemccand commented on pull request #1543: LUCENE-9378: Disable compression on binary values whose length is less than 32.
mikemccand commented on pull request #1543:
URL: https://github.com/apache/lucene-solr/pull/1543#issuecomment-724069358

> > But then I wonder why not just add a boolean compress option to Lucene80DocValuesFormat? This is similar to the compression Mode we pass to stored fields and term vectors format at write time, and it'd allow users who would like to disable BINARY doc values compression to keep backwards compatibility.
>
> I wanted to look into whether we could avoid this as it would boil down to maintaining two doc-value formats, but this might be the best way forward as it looks like the heuristics we tried out above don't work well to disable compression for use-cases when it hurts more than it helps.

+1. I'm afraid whether compression is a good idea for BDV or not is a very application specific tradeoff.
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2022: LUCENE-9004: KNN vector search using NSW graphs
mikemccand commented on a change in pull request #2022:
URL: https://github.com/apache/lucene-solr/pull/2022#discussion_r519881970

## File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java

@@ -0,0 +1,235 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.hnsw;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Random;
+import java.util.Set;
+import java.util.TreeSet;
+
+import org.apache.lucene.index.KnnGraphValues;
+import org.apache.lucene.index.VectorValues;
+
+import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
+import static org.apache.lucene.util.VectorUtil.dotProduct;
+import static org.apache.lucene.util.VectorUtil.squareDistance;
+
+/**
+ * Navigable Small-world graph. Provides efficient approximate nearest neighbor
+ * search for high dimensional vectors. See <a href="https://doi.org/10.1016/j.is.2013.10.006">Approximate nearest
+ * neighbor algorithm based on navigable small world graphs [2014]</a> and
+ * <a href="https://arxiv.org/abs/1603.09320">this paper [2018]</a> for details.
+ *
+ * This implementation is actually more like the one in the same authors' earlier 2014 paper in that
+ * there is no hierarchy (just one layer), and no fanout restriction on the graph: nodes are allowed to accumulate
+ * an unbounded number of outbound links, but it does incorporate some of the innovations of the later paper, like
+ * using a priority queue to perform a beam search while traversing the graph. The nomenclature is a bit different
+ * here from what's used in those papers:
+ *
+ * Hyperparameters
+ * <ul>
+ *   <li>numSeed is the equivalent of m in the 2014 paper; it controls the number of random entry points to sample.</li>
+ *   <li>beamWidth in {@link HnswGraphBuilder} has the same meaning as efConst in the 2018 paper. It is the number of
+ *   nearest neighbor candidates to track while searching the graph for each newly inserted node.</li>
+ *   <li>maxConn has the same meaning as M in the later paper; it controls how many of the efConst neighbors are
+ *   connected to the new node.</li>
+ *   <li>fanout - the fanout parameter of {@link VectorValues#search(float[], int, int)}
+ *   is used to control the values of numSeed and topK that are passed to this API.
+ *   Thus fanout is like a combination of ef (search beam width) from the 2018 paper and m from the 2014 paper.</li>
+ * </ul>
+ *
+ * Note: The graph may be searched by multiple threads concurrently, but updates are not thread-safe. Also note: there is no notion of
+ * deletions. Document searching built on top of this must do its own deletion-filtering.
+ */
+public final class HnswGraph {
+
+  // each entry lists the neighbors of a node, in node order
+  private final List<List<Integer>> graph;
+
+  HnswGraph() {

Review comment:

Yeah, +1 for fast follow!
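To illustrate the "priority queue to perform a beam search" idea from the javadoc, a deliberately simplified, self-contained sketch; the real implementation additionally bounds the candidate queue and stops expanding once candidates score worse than the current results:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.function.ToDoubleFunction;

class BeamSearchSketch {
  // Best-first traversal: always expand the highest-scoring unvisited node,
  // collecting up to beamWidth results.
  static List<Integer> search(Map<Integer, List<Integer>> graph,
                              ToDoubleFunction<Integer> score,
                              int entryPoint, int beamWidth) {
    PriorityQueue<Integer> candidates =
        new PriorityQueue<>(Comparator.comparingDouble(score).reversed());
    Set<Integer> visited = new HashSet<>();
    List<Integer> results = new ArrayList<>();
    candidates.add(entryPoint);
    visited.add(entryPoint);
    while (!candidates.isEmpty() && results.size() < beamWidth) {
      int node = candidates.poll();
      results.add(node);
      for (int neighbor : graph.getOrDefault(node, List.of())) {
        if (visited.add(neighbor)) { // true only the first time we see it
          candidates.add(neighbor);
        }
      }
    }
    return results;
  }
}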
[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228665#comment-17228665 ]

Adrien Grand commented on LUCENE-9378:
--------------------------------------

I'll be looking into it for 8.8.
[GitHub] [lucene-solr] msokolov merged pull request #2037: LUCENE-9583: extract separate RandomAccessVectorValues interface
msokolov merged pull request #2037:
URL: https://github.com/apache/lucene-solr/pull/2037
[jira] [Commented] (LUCENE-9583) How should we expose VectorValues.RandomAccess?
[ https://issues.apache.org/jira/browse/LUCENE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228677#comment-17228677 ]

ASF subversion and git services commented on LUCENE-9583:
---------------------------------------------------------

Commit 8be0cea5442c2edab260d0598b920ba832506f80 in lucene-solr's branch refs/heads/master from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8be0cea ]

LUCENE-9583: extract separate RandomAccessVectorValues interface (#2037)

> How should we expose VectorValues.RandomAccess?
> ------------------------------------------------
>
>                 Key: LUCENE-9583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9583
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael Sokolov
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the newly-added {{VectorValues}} API, we have a {{RandomAccess}} sub-interface. [~jtibshirani] pointed out this is not needed by some vector-indexing strategies which can operate solely using a forward iterator (it is needed by HNSW), and so in the interest of simplifying the public API we should not expose this internal detail (which, by the way, surfaces internal ordinals that are somewhat uninteresting outside the random access API).
> I looked into how to move this inside the HNSW-specific code and remembered that we do also currently make use of the RA API when merging vector fields over sorted indexes. Without it, we would need to load all vectors into RAM while flushing/merging, as we currently do in {{BinaryDocValuesWriter.BinaryDVs}}. I wonder if it's worth paying this cost for the simpler API.
> Another thing I noticed while reviewing this is that I moved the KNN {{search(float[] target, int topK, int fanout)}} method from {{VectorValues}} to {{VectorValues.RandomAccess}}. This I think we could move back, and handle the HNSW requirements for search elsewhere. I wonder if that would alleviate the major concern here?
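For readers following the discussion, a rough sketch of what a random-access view over per-ordinal vectors looks like (the method set here is an illustrative assumption; see the commit for the actual interface):

{noformat}
import java.io.IOException;

// Random access by dense ordinal, as HNSW graph traversal requires;
// forward-only indexing strategies can ignore this view entirely.
interface RandomAccessVectorValuesSketch {
  int size();                                      // number of vectors
  int dimension();                                 // dimension of each vector
  float[] vectorValue(int ord) throws IOException; // vector for ordinal ord
}
{noformat}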
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2022: LUCENE-9004: KNN vector search using NSW graphs
mikemccand commented on a change in pull request #2022: URL: https://github.com/apache/lucene-solr/pull/2022#discussion_r519895947 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90VectorReader.java ## @@ -22,39 +22,60 @@ import java.nio.FloatBuffer; import java.util.HashMap; import java.util.Map; +import java.util.Random; import org.apache.lucene.codecs.CodecUtil; import org.apache.lucene.codecs.VectorReader; import org.apache.lucene.index.CorruptIndexException; import org.apache.lucene.index.FieldInfo; import org.apache.lucene.index.FieldInfos; import org.apache.lucene.index.IndexFileNames; +import org.apache.lucene.index.KnnGraphValues; +import org.apache.lucene.index.RandomAccessVectorValues; +import org.apache.lucene.index.RandomAccessVectorValuesProducer; import org.apache.lucene.index.SegmentReadState; import org.apache.lucene.index.VectorValues; +import org.apache.lucene.search.ScoreDoc; import org.apache.lucene.search.TopDocs; +import org.apache.lucene.search.TotalHits; import org.apache.lucene.store.ChecksumIndexInput; +import org.apache.lucene.store.DataInput; import org.apache.lucene.store.IndexInput; import org.apache.lucene.util.BytesRef; import org.apache.lucene.util.IOUtils; import org.apache.lucene.util.RamUsageEstimator; +import org.apache.lucene.util.hnsw.HnswGraph; +import org.apache.lucene.util.hnsw.Neighbor; +import org.apache.lucene.util.hnsw.Neighbors; + +import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS; /** - * Reads vectors from the index segments. + * Reads vectors from the index segments along with index data structures supporting KNN search. * @lucene.experimental */ public final class Lucene90VectorReader extends VectorReader { private final FieldInfos fieldInfos; private final Map fields = new HashMap<>(); private final IndexInput vectorData; - private final int maxDoc; + private final IndexInput vectorIndex; + private final long checksumSeed; Lucene90VectorReader(SegmentReadState state) throws IOException { this.fieldInfos = state.fieldInfos; -this.maxDoc = state.segmentInfo.maxDoc(); -String metaFileName = IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, Lucene90VectorFormat.META_EXTENSION); +int versionMeta = readMetadata(state, Lucene90VectorFormat.META_EXTENSION); +long[] checksumRef = new long[1]; +vectorData = openDataInput(state, versionMeta, Lucene90VectorFormat.VECTOR_DATA_EXTENSION, Lucene90VectorFormat.VECTOR_DATA_CODEC_NAME, checksumRef); +vectorIndex = openDataInput(state, versionMeta, Lucene90VectorFormat.VECTOR_INDEX_EXTENSION, Lucene90VectorFormat.VECTOR_INDEX_CODEC_NAME, checksumRef); +checksumSeed = checksumRef[0]; + } + + private int readMetadata(SegmentReadState state, String fileExtension) throws IOException { +String metaFileName = IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, fileExtension); int versionMeta = -1; +long checksum = -1; Review comment: Hmm is this unused? ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90VectorReader.java ## @@ -277,24 +351,46 @@ public long cost() { } @Override -public RandomAccess randomAccess() { +public RandomAccessVectorValues randomAccess() { return new OffHeapRandomAccess(dataIn.clone()); } +@Override +public TopDocs search(float[] vector, int topK, int fanout) throws IOException { + // use a seed that is fixed for the index so we get reproducible results for the same query + final Random random = new Random(checksumSeed); Review comment: Clever seed! 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
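The fixed-seed idea praised above is worth a quick illustration: deriving the Random seed from a per-index checksum makes the graph search's random choices reproducible for a given index while still varying across indices. A minimal sketch of the idea (class and method names are illustrative, not the actual Lucene API):

```java
import java.util.Random;

// Illustrative sketch: seed search-time randomness from a per-index checksum
// so that repeated runs of the same query on the same index are reproducible.
class ReproducibleEntryPoints {
  private final long checksumSeed; // e.g. derived from a codec file checksum

  ReproducibleEntryPoints(long checksumSeed) {
    this.checksumSeed = checksumSeed;
  }

  int[] pickEntryPoints(int graphSize, int count) {
    Random random = new Random(checksumSeed); // same index -> same sequence
    int[] entryPoints = new int[count];
    for (int i = 0; i < count; i++) {
      entryPoints[i] = random.nextInt(graphSize);
    }
    return entryPoints;
  }
}
```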
[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228681#comment-17228681 ] Michael McCandless commented on LUCENE-9378: {quote}I'll be looking into it for 8.8. {quote} +1, thank you [~jpountz]! > Configurable compression for BinaryDocValues > > > Key: LUCENE-9378 > URL: https://issues.apache.org/jira/browse/LUCENE-9378 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Viral Gandhi >Priority: Major > Attachments: hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, > hotspots-v76x.png, hotspots-v76x.png, hotspots-v77x.png, hotspots-v77x.png, > hotspots-v77x.png, hotspots-v77x.png, image-2020-06-12-22-17-30-339.png, > image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, > image-2020-06-12-22-18-48-919.png, snapshot-v77x.nps, snapshot-v77x.nps, > snapshot-v77x.nps, snapshots-v76x.nps, snapshots-v76x.nps, snapshots-v76x.nps > > Time Spent: 4h 20m > Remaining Estimate: 0h > > Lucene 8.5.1 includes a change to always [compress > BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This > caused a (~30%) reduction in our red-line QPS (throughput). > We think users should be given some way to opt in to this compression > feature instead of it always being enabled, which can have a substantial query > time cost as we saw during our upgrade. [~mikemccand] suggested one possible > approach: introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED and > UNCOMPRESSED) and allowing users to create a custom Codec that subclasses the > default Codec and picks the format they want. > The idea is similar to Lucene50StoredFieldsFormat, which has two modes, > Mode.BEST_SPEED and Mode.BEST_COMPRESSION. > Here's a related issue for adding a benchmark covering BINARY doc values > query-time performance - [https://github.com/mikemccand/luceneutil/issues/61] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
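The suggested opt-in could look roughly like the sketch below: a custom codec that swaps in an uncompressed doc values format. This assumes the `Lucene80DocValuesFormat.Mode` enum proposed in the issue (a hypothetical constructor at this point); SPI registration of the codec, required for the index to be readable, is omitted here.

```java
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat;

// Hypothetical sketch: a codec that opts out of BinaryDocValues compression,
// assuming the proposed Mode enum exists on Lucene80DocValuesFormat.
public class UncompressedDocValuesCodec extends FilterCodec {
  private final DocValuesFormat dvFormat =
      new Lucene80DocValuesFormat(Lucene80DocValuesFormat.Mode.UNCOMPRESSED);

  public UncompressedDocValuesCodec() {
    super("UncompressedDocValuesCodec", Codec.getDefault());
  }

  @Override
  public DocValuesFormat docValuesFormat() {
    return dvFormat;
  }
}
```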
[GitHub] [lucene-solr] asfgit closed pull request #972: SOLR-13452: Update the lucene-solr build from Ivy+Ant+Maven (shadow build) to Gradle.
asfgit closed pull request #972: URL: https://github.com/apache/lucene-solr/pull/972 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14991) tag and remove obsolete branches
[ https://issues.apache.org/jira/browse/SOLR-14991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228687#comment-17228687 ] Erick Erickson commented on SOLR-14991: --- I just did the gradle branches, with the exception of *reference_impl_gradle_updates*. I'll wait until tomorrow to do any more to see if anyone sees any problems so far. > tag and remove obsolete branches > > > Key: SOLR-14991 > URL: https://issues.apache.org/jira/browse/SOLR-14991 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > > I'm going to gradually work through the branches, tagging and removing > 1> anything with a Jira name that's fixed > 2> anything that I'm certain will never be fixed (e.g. the various gradle > build branches) > So the changes will still be available, they just won't pollute the branch list. > I'll list the branches here, all the tags will be > history/branches/lucene-solr/ > > This specifically will _not_ include > 1> any release, e.g. branch_8_4 > 2> anything I'm unsure about. People who've created branches should expect > some pings about this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
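For reference, the tag-then-delete flow described above typically looks like this (branch name illustrative):

```
git tag history/branches/lucene-solr/SOLR-XXXX origin/SOLR-XXXX
git push origin history/branches/lucene-solr/SOLR-XXXX
git push origin --delete SOLR-XXXX
```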
[GitHub] [lucene-solr] bruno-roustant commented on pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
bruno-roustant commented on pull request #2066: URL: https://github.com/apache/lucene-solr/pull/2066#issuecomment-724113057 @dsmiley I pushed a new commit that adds a no-op TransientSolrCoreCacheFactory used before SolrCores.load() is called. I'd like your opinion on whether to keep it or remove it. If SolrCores is used before SolrCores.load() is called, it results in a SolrException in getTransientCacheHandler(). This may affect current users who previously didn't have to call load() right after SolrCores creation. Do you think it could be a backward-incompatible behavior change? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values
[ https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated SOLR-14683: Attachment: SOLR-14683.patch > Review the metrics API to ensure consistent placeholders for missing values > --- > > Key: SOLR-14683 > URL: https://issues.apache.org/jira/browse/SOLR-14683 > Project: Solr > Issue Type: Improvement > Components: metrics >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Attachments: SOLR-14683.patch > > > Spin-off from SOLR-14657. Some gauges can legitimately be missing or in an > unknown state at some points in time, eg. during SolrCore startup or shutdown. > Currently the API returns placeholders with either impossible values for > numeric gauges (such as index size -1) or empty maps / strings for other > non-numeric gauges. > [~hossman] noticed that the values for these placeholders may be misleading, > depending on how the user treats them - if the client has no special logic to > treat them as "missing values" it may erroneously treat them as valid data. > E.g. numeric values of -1 or 0 may severely skew averages and produce > misleading peaks / valleys in metrics histories. > On the other hand returning a literal {{null}} value instead of the expected > number may also cause unexpected client issues - although in this case it's > clearer that there's actually no data available, so long-term this may be a > better strategy than returning impossible values, even if it means that the > client should learn to handle {{null}} values appropriately. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14951) Upgrade Angular JS 1.7.9 to 1.8.0
[ https://issues.apache.org/jira/browse/SOLR-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228695#comment-17228695 ] Kevin Risden commented on SOLR-14951: - Sigh I missed merging it. I just saw the review and will get it merged soon. > Upgrade Angular JS 1.7.9 to 1.8.0 > - > > Key: SOLR-14951 > URL: https://issues.apache.org/jira/browse/SOLR-14951 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Reporter: Kevin Risden >Assignee: Kevin Risden >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Angular JS released 1.8.0 to fix some security vulnerabilities. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14951) Upgrade Angular JS 1.7.9 to 1.8.0
[ https://issues.apache.org/jira/browse/SOLR-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Risden updated SOLR-14951: Fix Version/s: 8.8 > Upgrade Angular JS 1.7.9 to 1.8.0 > - > > Key: SOLR-14951 > URL: https://issues.apache.org/jira/browse/SOLR-14951 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Reporter: Kevin Risden >Assignee: Kevin Risden >Priority: Major > Fix For: 8.8 > > Time Spent: 10m > Remaining Estimate: 0h > > Angular JS released 1.8.0 to fix some security vulnerabilities. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r519944462 ## File path: solr/solr-ref-guide/src/learning-to-rank.adoc ## @@ -247,6 +254,81 @@ The output XML will include feature values as a comma-separated list, resembling }} +=== Running a Rerank Query Interleaving Two Models + +To rerank the results of a query, interleaving two models (myModelA, myModelB) add the `rq` parameter to your search, passing two models in input, for example: + +[source,text] +http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB reRankDocs=100}&fl=id,score + +To obtain the model that interleaving picked for a search result, computed during reranking, add `[interleaving]` to the `fl` parameter, for example: Review comment: question: if myModelA had `[ doc1, doc2, doc3 ]` document order and myModelB had `[ doc1, doc3, doc2 ]` document order i.e. there was agreement between the models re: the first document, will `[interleaving]` return (1) randomly `myModelA` or `myModelB` depending on how the picking actually happened or will it return (2) something else e.g. `myModelA,myModelB` (if myModelA actually picked and myModelB agreed) or `myModelB,myModelA` (if myModelB actually picked and myModelA agreed) or will it return (3) neither since in a way neither of them picked the document since they both agreed on it? answer-ish: from recalling the implementation the answer is (1) i think though from a user's perspective perhaps it might be nice to clarify that here somehow? a subtle aspect being (if i understand things right) that `[features]` and `[interleaving]` could both be requested in the `fl` and whilst myModelA and myModelB might have agreed that `doc1` should be the first document they might have used very different features to arrive at that conclusion and their `score` value could also differ. ## File path: solr/solr-ref-guide/src/learning-to-rank.adoc ## @@ -247,6 +254,81 @@ The output XML will include feature values as a comma-separated list, resembling }} +=== Running a Rerank Query Interleaving Two Models + +To rerank the results of a query, interleaving two models (myModelA, myModelB) add the `rq` parameter to your search, passing two models in input, for example: + +[source,text] +http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB reRankDocs=100}&fl=id,score + +To obtain the model that interleaving picked for a search result, computed during reranking, add `[interleaving]` to the `fl` parameter, for example: + +[source,text] +http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB reRankDocs=100}&fl=id,score,[interleaving] + +The output XML will include the model picked for each search result, resembling the output shown here: + +[source,json] + +{ + "responseHeader":{ +"status":0, +"QTime":0, +"params":{ + "q":"test", + "fl":"id,score,[interleaving]", + "rq":"{!ltr model=myModelA model=myModelB reRankDocs=100}"}}, + "response":{"numFound":2,"start":0,"maxScore":1.0005897,"docs":[ + { +"id":"GB18030TEST", +"score":1.0005897, +"[interleaving]":"myModelB"}, + { +"id":"UTF8TEST", +"score":0.79656565, +"[interleaving]":"myModelA"}] + }} + + +=== Running a Rerank Query Interleaving a model with the original ranking +When approaching Search Quality Evaluation with interleaving it may be useful to compare a model with the original ranking.
+To rerank the results of a query, interleaving a model with the original ranking, add the `rq` parameter to your search, with a model in input and activating the original ranking interleaving, for example: + + +[source,text] +http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModel model=_OriginalRanking_ reRankDocs=100}&fl=id,score Review comment: subjective: might `model=_OriginalRanking_ model=myModel` be more intuitive i.e. the 'from' baseline model on the left and the 'to' alternative model on the right? (i recall that the code had an "original ranking last" assumption before but if that's gone there's a possibility here to swap the order) ## File path: solr/solr-ref-guide/src/learning-to-rank.adoc ## @@ -418,6 +500,14 @@ Learning-To-Rank is a contrib module and therefore its plugins must be configure +* Declaration of the `[interleaving]` transformer. ++ +[source,xml] + + + Review comment: minor/subjective: could shorten since there's no parameters ``` ``` ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java ## @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r519967315 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java ## @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.ltr.interleaving; + +import java.util.ArrayList; +import java.util.HashSet; +import java.util.LinkedHashSet; +import java.util.Random; +import java.util.Set; + +import org.apache.lucene.search.ScoreDoc; + +/** + * Interleaving was introduced the first time by Joachims in [1, 2]. + * Team Draft Interleaving is among the most successful and used interleaving approaches[3]. + * Here the authors implement a method similar to the way in which captains select their players in team-matches. + * Team Draft Interleaving produces a fair distribution of ranking models’ elements in the final interleaved list. + * It has also proved to overcome an issue of the previous implemented approach, Balanced interleaving, in determining the winning model[4]. Review comment: ```suggestion * "Team draft interleaving" has also proved to overcome an issue of the "Balanced interleaving" approach, in determining the winning model[4]. ``` Suggest to avoid the "previous implemented approach" wording since it could be misinterpreted to mean that Solr previously had a `BalancedInterleaving` class. ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java ## @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.ltr.interleaving; + +import java.util.ArrayList; +import java.util.HashSet; +import java.util.LinkedHashSet; +import java.util.Random; +import java.util.Set; + +import org.apache.lucene.search.ScoreDoc; + +/** + * Interleaving was introduced the first time by Joachims in [1, 2]. + * Team Draft Interleaving is among the most successful and used interleaving approaches[3]. 
+ * Here the authors implement a method similar to the way in which captains select their players in team-matches. Review comment: ```suggestion * Team Draft Interleaving implements a method similar to the way in which captains select their players in team-matches. ``` ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java ## @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.ltr.interleaving; + +import java.util.ArrayList; +import java.util.HashSet; +import java.util.LinkedHashSet; +import java.util.Random; +imp
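The class under review implements team-draft interleaving; a compact, self-contained sketch of the algorithm itself (illustrative, not the Solr class) may help make the review comments concrete:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Sketch of team-draft interleaving: the two rankings act as team captains
// taking turns; ties in picks-so-far are broken by a coin flip, and each
// captain drafts its highest-ranked document not yet drafted.
final class TeamDraftSketch {
  static <T> List<T> interleave(List<T> rankingA, List<T> rankingB, Random random) {
    List<T> interleaved = new ArrayList<>();
    Set<T> drafted = new HashSet<>();
    Iterator<T> itA = rankingA.iterator();
    Iterator<T> itB = rankingB.iterator();
    int picksA = 0, picksB = 0;
    while (itA.hasNext() || itB.hasNext()) {
      boolean aPicks;
      if (!itA.hasNext()) {
        aPicks = false;
      } else if (!itB.hasNext()) {
        aPicks = true;
      } else {
        aPicks = picksA < picksB || (picksA == picksB && random.nextBoolean());
      }
      Iterator<T> it = aPicks ? itA : itB;
      while (it.hasNext()) { // skip documents the other captain already drafted
        T doc = it.next();
        if (drafted.add(doc)) {
          interleaved.add(doc);
          if (aPicks) picksA++; else picksB++;
          break;
        }
      }
    }
    return interleaved;
  }
}
```

Because ties in picks-so-far are broken by a coin flip, a document both models rank first is credited to a random captain, which is exactly the behavior questioned in the earlier `[interleaving]` review comment.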
[jira] [Created] (SOLR-14992) TestPullReplicaErrorHandling.testCantConnectToPullReplica Failures
Tomas Eduardo Fernandez Lobbe created SOLR-14992: Summary: TestPullReplicaErrorHandling.testCantConnectToPullReplica Failures Key: SOLR-14992 URL: https://issues.apache.org/jira/browse/SOLR-14992 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Reporter: Tomas Eduardo Fernandez Lobbe I've noticed this test started failing very frequently with an error like: {noformat} Error Message: Error from server at http://127.0.0.1:39037/solr: Cannot create collection pull_replica_error_handling_test_cant_connect_to_pull_replica. Value of maxShardsPerNode is 1, and the number of nodes currently live or live and part of your createNodeSet is 3. This allows a maximum of 3 to be created. Value of numShards is 2, value of nrtReplicas is 1, value of tlogReplicas is 0 and value of pullReplicas is 1. This requires 4 shards to be created (higher than the allowed number) Stack Trace: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:39037/solr: Cannot create collection pull_replica_error_handling_test_cant_connect_to_pull_replica. Value of maxShardsPerNode is 1, and the number of nodes currently live or live and part of your createNodeSet is 3. This allows a maximum of 3 to be created. Value of numShards is 2, value of nrtReplicas is 1, value of tlogReplicas is 0 and value of pullReplicas is 1. This requires 4 shards to be created (higher than the allowed number) at __randomizedtesting.SeedInfo.seed([3D670DC4BEABD958:3550EB0C6505ADD6]:0) at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:681) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248) at org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:369) at org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:297) at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1173) at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:934) at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:866) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:231) at org.apache.solr.cloud.TestPullReplicaErrorHandling.testCantConnectToPullReplica(TestPullReplicaErrorHandling.java:149) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938) at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974) at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832) at com.carrots
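The arithmetic behind the failure: 2 shards x (1 NRT replica + 1 PULL replica) = 4 cores, while maxShardsPerNode=1 across 3 live nodes caps the collection at 3. A hedged SolrJ sketch of a create request that would satisfy the constraint (collection and config names are illustrative, not the test's actual values):

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

class CreateCollectionSketch {
  // 2 shards x (1 NRT + 1 PULL) = 4 cores; with 3 live nodes,
  // maxShardsPerNode must be at least ceil(4 / 3) = 2.
  static void createPullReplicaCollection(SolrClient client) throws Exception {
    CollectionAdminRequest
        .createCollection("pull_replica_test", "conf", 2, 1, 0, 1)
        .setMaxShardsPerNode(2)
        .process(client);
  }
}
```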
[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
muse-dev[bot] commented on a change in pull request #2066: URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r519994863 ## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java ## @@ -62,55 +51,44 @@ // to essentially queue them up to be handled via pendingCoreOps. private static final List pendingCloses = new ArrayList<>(); - private TransientSolrCoreCacheFactory transientCoreCache; + private TransientSolrCoreCacheFactory transientSolrCoreCacheFactory = TransientSolrCoreCacheFactory.NO_OP; - private TransientSolrCoreCache transientSolrCoreCache = null; - SolrCores(CoreContainer container) { this.container = container; } protected void addCoreDescriptor(CoreDescriptor p) { synchronized (modifyLock) { if (p.isTransient()) { -if (getTransientCacheHandler() != null) { - getTransientCacheHandler().addTransientDescriptor(p.getName(), p); -} else { - log.warn("We encountered a core marked as transient, but there is no transient handler defined. This core will be inaccessible"); -} +getTransientCacheHandler().addTransientDescriptor(p.getName(), p); } else { -residentDesciptors.put(p.getName(), p); +residentDescriptors.put(p.getName(), p); } } } protected void removeCoreDescriptor(CoreDescriptor p) { synchronized (modifyLock) { if (p.isTransient()) { -if (getTransientCacheHandler() != null) { - getTransientCacheHandler().removeTransientDescriptor(p.getName()); -} +getTransientCacheHandler().removeTransientDescriptor(p.getName()); } else { -residentDesciptors.remove(p.getName()); +residentDescriptors.remove(p.getName()); } } } public void load(SolrResourceLoader loader) { -transientCoreCache = TransientSolrCoreCacheFactory.newInstance(loader, container); +transientSolrCoreCacheFactory = TransientSolrCoreCacheFactory.newInstance(loader, container); } + // We are shutting down. You can't hold the lock on the various lists of cores while they shut down, so we need to // make a temporary copy of the names and shut them down outside the lock. protected void close() { waitForLoadingCoresToFinish(30*1000); Collection coreList = new ArrayList<>(); - -TransientSolrCoreCache transientSolrCoreCache = getTransientCacheHandler(); -// Release observer -if (transientSolrCoreCache != null) { - transientSolrCoreCache.close(); -} +// Release transient core cache. +getTransientCacheHandler().close(); Review comment: *THREAD_SAFETY_VIOLATION:* Read/Write race. Non-private method `SolrCores.close()` indirectly reads without synchronization from `this.transientSolrCoreCacheFactory`. Potentially races with write in method `SolrCores.load(...)`. Reporting because this access may occur on a background thread. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9322) Discussing a unified vectors format API
[ https://issues.apache.org/jira/browse/LUCENE-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228742#comment-17228742 ] ASF subversion and git services commented on LUCENE-9322: - Commit ec9a659845973a0dd0ee7c04e0075db818ed118d in lucene-solr's branch refs/heads/master from Michael McCandless [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ec9a659 ] LUCENE-9322: fix minor cosmetic refactoring error in logging string in IndexWriter's infoStream logging. It was always printing 'vector values' for all merging times instead of the other parts of Lucene index ('doc values', 'stored fields', etc.) > Discussing a unified vectors format API > --- > > Key: LUCENE-9322 > URL: https://issues.apache.org/jira/browse/LUCENE-9322 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Julie Tibshirani >Priority: Major > Fix For: master (9.0) > > Time Spent: 6h 40m > Remaining Estimate: 0h > > Two different approximate nearest neighbor approaches are currently being > developed, one based on HNSW (LUCENE-9004) and another based on coarse > quantization ([#LUCENE-9136]). Each prototype proposes to add a new format to > handle vectors. In LUCENE-9136 we discussed the possibility of a unified API > that could support both approaches. The two ANN strategies give different > trade-offs in terms of speed, memory, and complexity, and it’s likely that > we’ll want to support both. Vector search is also an active research area, > and it would be great to be able to prototype and incorporate new approaches > without introducing more formats. > To me it seems like a good time to begin discussing a unified API. The > prototype for coarse quantization > ([https://github.com/apache/lucene-solr/pull/1314]) could be ready to commit > soon (this depends on everyone's feedback of course). The approach is simple > and shows solid search performance, as seen > [here|https://github.com/apache/lucene-solr/pull/1314#issuecomment-608645326]. > I think this API discussion is an important step in moving that > implementation forward. > The goals of the API would be > # Support for storing and retrieving individual float vectors. > # Support for approximate nearest neighbor search -- given a query vector, > return the indexed vectors that are closest to it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
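The two goals listed in the issue can be read as a minimal API surface. A hypothetical sketch, not a committed Lucene interface:

```java
import java.io.IOException;

// Hypothetical shape of the unified API under discussion: random access to
// stored float vectors plus an approximate nearest-neighbor entry point.
interface VectorsSketch {
  float[] vectorValue(int doc) throws IOException;                  // goal 1: retrieval
  int[] searchNearest(float[] query, int topK) throws IOException;  // goal 2: ANN search
}
```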
[GitHub] [lucene-solr] jpountz opened a new pull request #2069: LUCENE-9378: Make it possible to configure how to trade speed for compression on doc values.
jpountz opened a new pull request #2069: URL: https://github.com/apache/lucene-solr/pull/2069 This adds a switch to `Lucene80DocValuesFormat` which allows configuring whether to prioritize retrieval speed over compression ratio or the other way around. When prioritizing retrieval speed, binary doc values are written using the exact same format as before the more aggressive compression was introduced. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
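Usage would presumably stay a one-liner at the codec level. A sketch assuming the 8.x package location of the codec (on master it lives under backward_codecs, as the diffs below show):

```java
import org.apache.lucene.codecs.lucene87.Lucene87Codec;
import org.apache.lucene.index.IndexWriterConfig;

// Sketch: opting in to the compression-friendly mode; BEST_SPEED stays the default.
class CompressionModeExample {
  static IndexWriterConfig bestCompressionConfig() {
    return new IndexWriterConfig()
        .setCodec(new Lucene87Codec(Lucene87Codec.Mode.BEST_COMPRESSION));
  }
}
```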
[jira] [Created] (LUCENE-9602) TestBackwardsCompatibility should test BEST_COMPRESSION
Adrien Grand created LUCENE-9602: Summary: TestBackwardsCompatibility should test BEST_COMPRESSION Key: LUCENE-9602 URL: https://issues.apache.org/jira/browse/LUCENE-9602 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand Currently we only test for backward compatibility indices created with BEST_SPEED. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2069: LUCENE-9378: Make it possible to configure how to trade speed for compression on doc values.
mikemccand commented on a change in pull request #2069: URL: https://github.com/apache/lucene-solr/pull/2069#discussion_r520024634 ## File path: lucene/core/src/test/org/apache/lucene/codecs/lucene80/BaseLucene80DocValuesFormatTestCase.java ## @@ -286,7 +278,7 @@ private void doTestTermsEnumRandom(int numDocs, Supplier valuesProducer) conf.setMergeScheduler(new SerialMergeScheduler()); // set to duel against a codec which has ordinals: final PostingsFormat pf = TestUtil.getPostingsFormatWithOrds(random()); -final DocValuesFormat dv = new Lucene80DocValuesFormat(); +final DocValuesFormat dv = getCodec().docValuesFormat(); Review comment: Will this randomize between the different `Mode` tradeoffs? ## File path: lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene87/Lucene87Codec.java ## @@ -56,6 +57,23 @@ * @lucene.experimental */ public class Lucene87Codec extends Codec { + + /** Configuration option for the codec. */ + public static enum Mode { +/** Trade compression ratio for retrieval speed. */ +BEST_SPEED(Lucene87StoredFieldsFormat.Mode.BEST_SPEED, Lucene80DocValuesFormat.Mode.BEST_SPEED), +/** Trade retrieval speed for compression ratio. */ +BEST_COMPRESSION(Lucene87StoredFieldsFormat.Mode.BEST_COMPRESSION, Lucene80DocValuesFormat.Mode.BEST_COMPRESSION); + +private final Lucene87StoredFieldsFormat.Mode storedMode; +private final Lucene80DocValuesFormat.Mode dvMode; + +private Mode(Lucene87StoredFieldsFormat.Mode storedMode, Lucene80DocValuesFormat.Mode dvMode) { Review comment: Nice! So we roll up the tradeoffs to Codec level which will then tell each format how to tradeoff. ## File path: lucene/core/src/test/org/apache/lucene/codecs/lucene80/TestBestSpeedLucene80DocValuesFormat.java ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.codecs.lucene80; + +import org.apache.lucene.codecs.Codec; +import org.apache.lucene.util.TestUtil; + +/** + * Tests Lucene80DocValuesFormat + */ +public class TestBestSpeedLucene80DocValuesFormat extends BaseLucene80DocValuesFormatTestCase { Review comment: Do we also have a dedicated `TestBestCompressedLucene80DocValuesFormat`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
dsmiley commented on a change in pull request #2066: URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r520029778 ## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java ## @@ -51,7 +51,7 @@ // to essentially queue them up to be handled via pendingCoreOps. private static final List pendingCloses = new ArrayList<>(); - private TransientSolrCoreCacheFactory transientSolrCoreCacheFactory; + private TransientSolrCoreCacheFactory transientSolrCoreCacheFactory = TransientSolrCoreCacheFactory.NO_OP; Review comment: Under what circumstance do we need this no-op impl to prevent an NPE? ## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java ## @@ -62,55 +51,44 @@ // to essentially queue them up to be handled via pendingCoreOps. private static final List pendingCloses = new ArrayList<>(); - private TransientSolrCoreCacheFactory transientCoreCache; + private TransientSolrCoreCacheFactory transientSolrCoreCacheFactory = TransientSolrCoreCacheFactory.NO_OP; - private TransientSolrCoreCache transientSolrCoreCache = null; - SolrCores(CoreContainer container) { this.container = container; } protected void addCoreDescriptor(CoreDescriptor p) { synchronized (modifyLock) { if (p.isTransient()) { -if (getTransientCacheHandler() != null) { - getTransientCacheHandler().addTransientDescriptor(p.getName(), p); -} else { - log.warn("We encountered a core marked as transient, but there is no transient handler defined. This core will be inaccessible"); -} +getTransientCacheHandler().addTransientDescriptor(p.getName(), p); } else { -residentDesciptors.put(p.getName(), p); +residentDescriptors.put(p.getName(), p); } } } protected void removeCoreDescriptor(CoreDescriptor p) { synchronized (modifyLock) { if (p.isTransient()) { -if (getTransientCacheHandler() != null) { - getTransientCacheHandler().removeTransientDescriptor(p.getName()); -} +getTransientCacheHandler().removeTransientDescriptor(p.getName()); } else { -residentDesciptors.remove(p.getName()); +residentDescriptors.remove(p.getName()); } } } public void load(SolrResourceLoader loader) { -transientCoreCache = TransientSolrCoreCacheFactory.newInstance(loader, container); +transientSolrCoreCacheFactory = TransientSolrCoreCacheFactory.newInstance(loader, container); } + // We are shutting down. You can't hold the lock on the various lists of cores while they shut down, so we need to // make a temporary copy of the names and shut them down outside the lock. protected void close() { waitForLoadingCoresToFinish(30*1000); Collection coreList = new ArrayList<>(); - -TransientSolrCoreCache transientSolrCoreCache = getTransientCacheHandler(); -// Release observer -if (transientSolrCoreCache != null) { - transientSolrCoreCache.close(); -} +// Release transient core cache. +getTransientCacheHandler().close(); Review comment: @bruno-roustant the muse bot makes a good point; there should be a synchronized(modifyLock) around grabbing getTransientCacheHandler and calling close on it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
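One way to address the reported race, following the suggestion above (field and method names taken from the PR diff; a sketch, not the committed fix):

```java
// Inside SolrCores.close(): read the factory-backed handler under modifyLock,
// then close it outside the lock, as is already done for the cores themselves.
protected void close() {
  waitForLoadingCoresToFinish(30 * 1000);
  TransientSolrCoreCache transientCache;
  synchronized (modifyLock) {
    transientCache = getTransientCacheHandler(); // guarded read of the factory
  }
  transientCache.close();
  // ... proceed with closing the remaining cores outside the lock ...
}
```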
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2069: LUCENE-9378: Make it possible to configure how to trade speed for compression on doc values.
mikemccand commented on a change in pull request #2069: URL: https://github.com/apache/lucene-solr/pull/2069#discussion_r520031264 ## File path: lucene/core/src/test/org/apache/lucene/codecs/lucene80/TestBestSpeedLucene80DocValuesFormat.java ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.codecs.lucene80; + +import org.apache.lucene.codecs.Codec; +import org.apache.lucene.util.TestUtil; + +/** + * Tests Lucene80DocValuesFormat + */ +public class TestBestSpeedLucene80DocValuesFormat extends BaseLucene80DocValuesFormatTestCase { Review comment: Oh nevermind I see you opened followon issue for this: https://issues.apache.org/jira/browse/LUCENE-9602 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #2069: LUCENE-9378: Make it possible to configure how to trade speed for compression on doc values.
jpountz commented on a change in pull request #2069: URL: https://github.com/apache/lucene-solr/pull/2069#discussion_r520033652 ## File path: lucene/core/src/test/org/apache/lucene/codecs/lucene80/TestBestSpeedLucene80DocValuesFormat.java ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.codecs.lucene80; + +import org.apache.lucene.codecs.Codec; +import org.apache.lucene.util.TestUtil; + +/** + * Tests Lucene80DocValuesFormat + */ +public class TestBestSpeedLucene80DocValuesFormat extends BaseLucene80DocValuesFormatTestCase { Review comment: You should see a `TestBestCompressedLucene80DocValuesFormat` file as well in this PR. I opened LUCENE-9602 specifically for backward compatibility: to make sure we check indices created with BEST_COMPRESSION into our source tree after every release, so that we have good bw compatibility coverage. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #2069: LUCENE-9378: Make it possible to configure how to trade speed for compression on doc values.
jpountz commented on a change in pull request #2069: URL: https://github.com/apache/lucene-solr/pull/2069#discussion_r520034757 ## File path: lucene/core/src/test/org/apache/lucene/codecs/lucene80/BaseLucene80DocValuesFormatTestCase.java ## @@ -286,7 +278,7 @@ private void doTestTermsEnumRandom(int numDocs, Supplier valuesProducer) conf.setMergeScheduler(new SerialMergeScheduler()); // set to duel against a codec which has ordinals: final PostingsFormat pf = TestUtil.getPostingsFormatWithOrds(random()); -final DocValuesFormat dv = new Lucene80DocValuesFormat(); +final DocValuesFormat dv = getCodec().docValuesFormat(); Review comment: It's not randomizing; we are testing both modes explicitly, via TestBestSpeedLucene80DocValuesFormat on one hand and TestBestCompressionLucene80DocValuesFormat on the other hand. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #2069: LUCENE-9378: Make it possible to configure how to trade speed for compression on doc values.
jpountz commented on a change in pull request #2069: URL: https://github.com/apache/lucene-solr/pull/2069#discussion_r520035432 ## File path: lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene87/Lucene87Codec.java ## @@ -56,6 +57,23 @@ * @lucene.experimental */ public class Lucene87Codec extends Codec { + + /** Configuration option for the codec. */ + public static enum Mode { +/** Trade compression ratio for retrieval speed. */ +BEST_SPEED(Lucene87StoredFieldsFormat.Mode.BEST_SPEED, Lucene80DocValuesFormat.Mode.BEST_SPEED), +/** Trade retrieval speed for compression ratio. */ +BEST_COMPRESSION(Lucene87StoredFieldsFormat.Mode.BEST_COMPRESSION, Lucene80DocValuesFormat.Mode.BEST_COMPRESSION); + +private final Lucene87StoredFieldsFormat.Mode storedMode; +private final Lucene80DocValuesFormat.Mode dvMode; + +private Mode(Lucene87StoredFieldsFormat.Mode storedMode, Lucene80DocValuesFormat.Mode dvMode) { Review comment: Right. It's still possible to make different choices for stored fields and doc values given that we allow configuration of doc values on a per-field basis, but this should at least keep simple use simple, with one switch that configures stored fields and doc values at the same time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
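The per-field escape hatch mentioned here could look roughly like the following sketch (it assumes this PR's Lucene80DocValuesFormat.Mode constructor and the 8.x package location; the field name is illustrative):

```java
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat;
import org.apache.lucene.codecs.lucene87.Lucene87Codec;

// Sketch: compress doc values by default, but keep one latency-critical
// field on the fast path by overriding the per-field hook.
class PerFieldModeCodec extends Lucene87Codec {
  private final DocValuesFormat fast =
      new Lucene80DocValuesFormat(Lucene80DocValuesFormat.Mode.BEST_SPEED);
  private final DocValuesFormat compact =
      new Lucene80DocValuesFormat(Lucene80DocValuesFormat.Mode.BEST_COMPRESSION);

  @Override
  public DocValuesFormat getDocValuesFormatForField(String field) {
    return "hot_field".equals(field) ? fast : compact;
  }
}
```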
[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
alessandrobenedetti commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r520038324 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ## @@ -210,50 +216,59 @@ public void setContext(ResultContext context) { } // Setup LTRScoringQuery - scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req); - docsWereNotReranked = (scoringQuery == null); - String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - if (docsWereNotReranked || (featureStoreName != null && (!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName() { -// if store is set in the transformer we should overwrite the logger - -final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); - -final FeatureStore store = fr.getFeatureStore(featureStoreName); -featureStoreName = store.getName(); // if featureStoreName was null before this gets actual name - -try { - final LoggingModel lm = new LoggingModel(loggingModelName, - featureStoreName, store.getFeatures()); - - scoringQuery = new LTRScoringQuery(lm, - LTRQParserPlugin.extractEFIParams(localparams), - true, - threadManager); // request feature weights to be created for all features - -}catch (final Exception e) { - throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, - "retrieving the feature store "+featureStoreName, e); -} - } + rerankingQueries = SolrQueryRequestContextUtils.getScoringQueries(req); - if (scoringQuery.getOriginalQuery() == null) { -scoringQuery.setOriginalQuery(context.getQuery()); + docsWereNotReranked = (rerankingQueries == null || rerankingQueries.length == 0); + if (docsWereNotReranked) { +rerankingQueries = new LTRScoringQuery[]{null}; } - if (scoringQuery.getFeatureLogger() == null){ -scoringQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); - } - scoringQuery.setRequest(req); - - featureLogger = scoringQuery.getFeatureLogger(); + modelWeights = new LTRScoringQuery.ModelWeight[rerankingQueries.length]; + String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); + for (int i = 0; i < rerankingQueries.length; i++) { +LTRScoringQuery scoringQuery = rerankingQueries[i]; +if ((scoringQuery == null || !(scoringQuery instanceof OriginalRankingLTRScoringQuery)) && (docsWereNotReranked || (featureStoreName != null && !featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName() { Review comment: So I just committed my changes on this, I spent quite a while thinking about the various scenarios, adjusting the code and adding the related tests. From your observations: - if both models are for the requested feature store then that's great and each document would have been picked by one of the models and so we use the feature vector already previously calculated by whatever model had picked the document. [OK] - if neither model is for the requested feature store then we need to create a logging model, is one logging model sufficient or do we need two? intuitively to me one would seem to be sufficient but that's based on partial analysis only so far. 
[One is sufficient, and my latest changes do that. Anyway, I just realized that the loggingModel is not heavy to create, and when getting the featureVector the cache is accessed; the key for that cache is based not on the instance but on the content of the classes, so two identical logging models would have matched in the feature vector cache. The change was probably not vital, but not harmful either.] - in the third scenario we still need the logging model, because when specifying a featureStore in the featureVector transformer we aim to extract all the features of that store, from the efi passed to the transformer; so when the store is explicitly mentioned, we need a logging model for both models again (also for the one whose store aligns) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
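The cache observation above, restated: the feature vector cache keys by value rather than by identity, so two separately constructed but identical logging models collide on the same entries. A toy illustration of the distinction (hypothetical key class, not Solr's actual cache key):

```java
import java.util.Objects;

// Toy key: equals/hashCode are defined over content, so two separately
// constructed but identical keys hit the same cache entry.
final class ModelCacheKey {
  final String modelName;
  final String featureStoreName;

  ModelCacheKey(String modelName, String featureStoreName) {
    this.modelName = modelName;
    this.featureStoreName = featureStoreName;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ModelCacheKey)) return false;
    ModelCacheKey other = (ModelCacheKey) o;
    return modelName.equals(other.modelName)
        && featureStoreName.equals(other.featureStoreName);
  }

  @Override
  public int hashCode() {
    return Objects.hash(modelName, featureStoreName);
  }
}
```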
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
bruno-roustant commented on a change in pull request #2066: URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r520040508 ## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java ## @@ -51,7 +51,7 @@ // to essentially queue them up to be handled via pendingCoreOps. private static final List pendingCloses = new ArrayList<>(); - private TransientSolrCoreCacheFactory transientSolrCoreCacheFactory; + private TransientSolrCoreCacheFactory transientSolrCoreCacheFactory = TransientSolrCoreCacheFactory.NO_OP; Review comment: When a SolrCores method that accesses getTransientCacheHandler() is called before load(): for example, an asynchronous periodic thread that is started with CoreContainer and calls getLoadedCoreNames(). This worked before even if SolrCores.load() was called afterwards (the first calls would return without counting transient cores, and subsequent calls after load() would count them). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
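The NO_OP default is the classic null-object pattern. A tiny sketch of the idea (hypothetical interface, not the actual Solr types):

```java
import java.util.Collections;
import java.util.Set;

// Null-object sketch: callers that race ahead of load() see an empty cache
// instead of an exception; load() later swaps in the real implementation.
interface TransientCoreCacheSketch {
  Set<String> getLoadedCoreNames();

  TransientCoreCacheSketch NO_OP = Collections::emptySet;
}
```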
[jira] [Updated] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values
[ https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated SOLR-14683: Attachment: SOLR-14683.patch > Review the metrics API to ensure consistent placeholders for missing values > --- > > Key: SOLR-14683 > URL: https://issues.apache.org/jira/browse/SOLR-14683 > Project: Solr > Issue Type: Improvement > Components: metrics >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Attachments: SOLR-14683.patch, SOLR-14683.patch > > > Spin-off from SOLR-14657. Some gauges can legitimately be missing or in an > unknown state at some points in time, eg. during SolrCore startup or shutdown. > Currently the API returns placeholders with either impossible values for > numeric gauges (such as index size -1) or empty maps / strings for other > non-numeric gauges. > [~hossman] noticed that the values for these placeholders may be misleading, > depending on how the user treats them - if the client has no special logic to > treat them as "missing values" it may erroneously treat them as valid data. > E.g. numeric values of -1 or 0 may severely skew averages and produce > misleading peaks / valleys in metrics histories. > On the other hand returning a literal {{null}} value instead of the expected > number may also cause unexpected client issues - although in this case it's > clearer that there's actually no data available, so long-term this may be a > better strategy than returning impossible values, even if it means that the > client should learn to handle {{null}} values appropriately. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values
[ https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228768#comment-17228768 ] Andrzej Bialecki commented on SOLR-14683: - This patch adds configurable placeholders for missing values of different types, all returning {{null}} by default. They are configured in {{solr.xml:solr/metrics/missingValues}} section, per Ref Guide doc (see example there). If there are no objections I'll commit this shortly. > Review the metrics API to ensure consistent placeholders for missing values > --- > > Key: SOLR-14683 > URL: https://issues.apache.org/jira/browse/SOLR-14683 > Project: Solr > Issue Type: Improvement > Components: metrics >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Attachments: SOLR-14683.patch, SOLR-14683.patch > > > Spin-off from SOLR-14657. Some gauges can legitimately be missing or in an > unknown state at some points in time, eg. during SolrCore startup or shutdown. > Currently the API returns placeholders with either impossible values for > numeric gauges (such as index size -1) or empty maps / strings for other > non-numeric gauges. > [~hossman] noticed that the values for these placeholders may be misleading, > depending on how the user treats them - if the client has no special logic to > treat them as "missing values" it may erroneously treat them as valid data. > E.g. numeric values of -1 or 0 may severely skew averages and produce > misleading peaks / valleys in metrics histories. > On the other hand returning a literal {{null}} value instead of the expected > number may also cause unexpected client issues - although in this case it's > clearer that there's actually no data available, so long-term this may be a > better strategy than returning impossible values, even if it means that the > client should learn to handle {{null}} values appropriately. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
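For readers without the patch handy, the configuration shape is roughly as follows; the property names come from the patch discussion, but treat the exact element syntax as an assumption and defer to the Ref Guide example in the patch:

```xml
<metrics>
  <missingValues>
    <null name="nullNumber"/>
    <int name="notANumber">-1</int>
    <str name="nullString"></str>
    <str name="nullObject">missing</str>
  </missingValues>
</metrics>
```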
[jira] [Commented] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other
[ https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228769#comment-17228769 ] Kevin Risden commented on SOLR-14973: - FYI [~tallison] - not sure who updated Tika libraries last :D I can help look at this I think. > Solr 8.6 is shipping libraries that are incompatible with each other > > > Key: SOLR-14973 > URL: https://issues.apache.org/jira/browse/SOLR-14973 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - Solr Cell (Tika extraction) >Affects Versions: 8.6 >Reporter: Samir Huremovic >Priority: Major > Labels: tika-parsers > > Hi, > since Solr 8.6 the version of {{tika-parsers}} was updated to {{1.24}}. This > version of {{tika-parsers}} needs the {{poi}} library in version {{4.1.2}} > (see https://issues.apache.org/jira/browse/TIKA-3047) > Solr has version {{4.1.1}} of poi included. > This creates (at least) a problem for parsing {{.xls}} files. The following > exception gets thrown by trying to post an {{.xls}} file in the techproducts > example: > {{java.lang.NoSuchMethodError: > org.apache.poi.hssf.record.common.UnicodeString.getExtendedRst()Lorg/apache/poi/hssf/record/common/ExtRst;}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
bruno-roustant commented on a change in pull request #2066: URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r520049002 ## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java ## @@ -62,55 +51,44 @@ // to essentially queue them up to be handled via pendingCoreOps. private static final List pendingCloses = new ArrayList<>(); - private TransientSolrCoreCacheFactory transientCoreCache; + private TransientSolrCoreCacheFactory transientSolrCoreCacheFactory = TransientSolrCoreCacheFactory.NO_OP; - private TransientSolrCoreCache transientSolrCoreCache = null; - SolrCores(CoreContainer container) { this.container = container; } protected void addCoreDescriptor(CoreDescriptor p) { synchronized (modifyLock) { if (p.isTransient()) { -if (getTransientCacheHandler() != null) { - getTransientCacheHandler().addTransientDescriptor(p.getName(), p); -} else { - log.warn("We encountered a core marked as transient, but there is no transient handler defined. This core will be inaccessible"); -} +getTransientCacheHandler().addTransientDescriptor(p.getName(), p); } else { -residentDesciptors.put(p.getName(), p); +residentDescriptors.put(p.getName(), p); } } } protected void removeCoreDescriptor(CoreDescriptor p) { synchronized (modifyLock) { if (p.isTransient()) { -if (getTransientCacheHandler() != null) { - getTransientCacheHandler().removeTransientDescriptor(p.getName()); -} +getTransientCacheHandler().removeTransientDescriptor(p.getName()); } else { -residentDesciptors.remove(p.getName()); +residentDescriptors.remove(p.getName()); } } } public void load(SolrResourceLoader loader) { -transientCoreCache = TransientSolrCoreCacheFactory.newInstance(loader, container); +transientSolrCoreCacheFactory = TransientSolrCoreCacheFactory.newInstance(loader, container); } + // We are shutting down. You can't hold the lock on the various lists of cores while they shut down, so we need to // make a temporary copy of the names and shut them down outside the lock. protected void close() { waitForLoadingCoresToFinish(30*1000); Collection coreList = new ArrayList<>(); - -TransientSolrCoreCache transientSolrCoreCache = getTransientCacheHandler(); -// Release observer -if (transientSolrCoreCache != null) { - transientSolrCoreCache.close(); -} +// Release transient core cache. +getTransientCacheHandler().close(); Review comment: +1 thanks muse! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
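The {{TransientSolrCoreCacheFactory.NO_OP}} default seen in this diff is the null-object pattern: the field is never null, so the warn-and-skip branches can be deleted. A minimal self-contained sketch of the idiom; the two interfaces are simplified stand-ins, not the real Solr classes.

```java
interface TransientCoreCache {
  void addTransientDescriptor(String name);
  void close();
}

interface TransientCoreCacheFactory {
  TransientCoreCache getTransientCacheHandler();

  // Null object: safe to call before load() has run, so call sites
  // no longer need "!= null" guards or warning branches.
  TransientCoreCacheFactory NO_OP = () -> new TransientCoreCache() {
    @Override public void addTransientDescriptor(String name) { /* intentionally empty */ }
    @Override public void close() { /* nothing to release */ }
  };
}
```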
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2069: LUCENE-9378: Make it possible to configure how to trade speed for compression on doc values.
mikemccand commented on a change in pull request #2069: URL: https://github.com/apache/lucene-solr/pull/2069#discussion_r520055341 ## File path: lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene87/Lucene87Codec.java ## @@ -56,6 +57,23 @@ * @lucene.experimental */ public class Lucene87Codec extends Codec { + + /** Configuration option for the codec. */ + public static enum Mode { +/** Trade compression ratio for retrieval speed. */ +BEST_SPEED(Lucene87StoredFieldsFormat.Mode.BEST_SPEED, Lucene80DocValuesFormat.Mode.BEST_SPEED), +/** Trade retrieval speed for compression ratio. */ +BEST_COMPRESSION(Lucene87StoredFieldsFormat.Mode.BEST_COMPRESSION, Lucene80DocValuesFormat.Mode.BEST_COMPRESSION); + +private final Lucene87StoredFieldsFormat.Mode storedMode; +private final Lucene80DocValuesFormat.Mode dvMode; + +private Mode(Lucene87StoredFieldsFormat.Mode storedMode, Lucene80DocValuesFormat.Mode dvMode) { Review comment: Great! Simple for common use cases ("I want best compression" or "I want fastest search"), and complex for complex use cases (I want separate control for each part of the index). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
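A short usage sketch of the mode switch under discussion, assuming the Lucene 8.x package location of the codec (on master the class shown in this diff lives under backward_codecs):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.codecs.lucene87.Lucene87Codec;
import org.apache.lucene.index.IndexWriterConfig;

public class CodecModeExample {
  public static void main(String[] args) {
    IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
    // A single Mode choice now covers both stored fields and doc values.
    config.setCodec(new Lucene87Codec(Lucene87Codec.Mode.BEST_COMPRESSION));
  }
}
```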
[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
alessandrobenedetti commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r520055346 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ## @@ -210,50 +216,59 @@ public void setContext(ResultContext context) { } // Setup LTRScoringQuery - scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req); - docsWereNotReranked = (scoringQuery == null); - String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - if (docsWereNotReranked || (featureStoreName != null && (!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName() { -// if store is set in the transformer we should overwrite the logger - -final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); - -final FeatureStore store = fr.getFeatureStore(featureStoreName); -featureStoreName = store.getName(); // if featureStoreName was null before this gets actual name - -try { - final LoggingModel lm = new LoggingModel(loggingModelName, - featureStoreName, store.getFeatures()); - - scoringQuery = new LTRScoringQuery(lm, - LTRQParserPlugin.extractEFIParams(localparams), - true, - threadManager); // request feature weights to be created for all features - -}catch (final Exception e) { - throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, - "retrieving the feature store "+featureStoreName, e); -} - } + rerankingQueries = SolrQueryRequestContextUtils.getScoringQueries(req); - if (scoringQuery.getOriginalQuery() == null) { -scoringQuery.setOriginalQuery(context.getQuery()); + docsWereNotReranked = (rerankingQueries == null || rerankingQueries.length == 0); + if (docsWereNotReranked) { +rerankingQueries = new LTRScoringQuery[]{null}; } - if (scoringQuery.getFeatureLogger() == null){ -scoringQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); - } - scoringQuery.setRequest(req); - - featureLogger = scoringQuery.getFeatureLogger(); + modelWeights = new LTRScoringQuery.ModelWeight[rerankingQueries.length]; + String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); + for (int i = 0; i < rerankingQueries.length; i++) { +LTRScoringQuery scoringQuery = rerankingQueries[i]; +if ((scoringQuery == null || !(scoringQuery instanceof OriginalRankingLTRScoringQuery)) && (docsWereNotReranked || (featureStoreName != null && !featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName() { Review comment: Actually taking a deeper look to the third point, the original implementation was not extracting all the features, but if the explicit featureStore was matching the model featureStore, it was using the model one (no logger). So I agree with you, in our case, we want to use the model already existent and no logger at all. I am going to clean up that bit and do a new commit tomorrow This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
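To make the third point concrete, a hedged sketch of the branch being described; the method and store names are simplified stand-ins for the LTR classes, not the actual implementation.

```java
public class FeatureStoreChoice {
  /**
   * A dedicated LoggingModel is only needed when an explicitly requested
   * feature store differs from the store the reranking model already uses;
   * otherwise the model's own features (and no logger) are reused.
   */
  static boolean needsLoggingModel(String requestedStore, String modelStore) {
    return requestedStore != null && !requestedStore.equals(modelStore);
  }

  public static void main(String[] args) {
    System.out.println(needsLoggingModel(null, "storeA"));     // false: reuse the model
    System.out.println(needsLoggingModel("storeA", "storeA")); // false: same store
    System.out.println(needsLoggingModel("storeB", "storeA")); // true: different store
  }
}
```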
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
dsmiley commented on a change in pull request #2066: URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r520057072 ## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java ## @@ -51,7 +51,7 @@ // to essentially queue them up to be handled via pendingCoreOps. private static final List pendingCloses = new ArrayList<>(); - private TransientSolrCoreCacheFactory transientSolrCoreCacheFactory; + private TransientSolrCoreCacheFactory transientSolrCoreCacheFactory = TransientSolrCoreCacheFactory.NO_OP; Review comment: then SolrCores.load could be called much sooner, basically right after the resourceLoader is ready. Also, maybe this other thread ought to wait to start till some later time. Perhaps ideally there would be an event publishing mechanism, which doesn't exist currently, I know. Or alternatively just have some CountDownLatch signal, like signaling when Solr will begin loading cores. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
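A minimal sketch of the CountDownLatch signal mentioned above, under the assumption that the container counts the latch down just before core loading begins; all names are illustrative.

```java
import java.util.concurrent.CountDownLatch;

public class CoreLoadingSignal {
  private final CountDownLatch coresWillLoad = new CountDownLatch(1);

  /** Called by the container right before it starts loading cores. */
  void signalCoreLoading() {
    coresWillLoad.countDown();
  }

  /** Called by the background thread that must not start too early. */
  void awaitCoreLoading() throws InterruptedException {
    coresWillLoad.await();
  }
}
```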
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
dsmiley commented on a change in pull request #2066: URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r520058820 ## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java ## @@ -536,7 +538,9 @@ public void queueCoreToClose(SolrCore coreToClose) { * @return the cache holding the transient cores; never null. */ public TransientSolrCoreCache getTransientCacheHandler() { Review comment: I think it should be a prerequisite that the caller acquire the lock a priori. For example, close() seems to need to keep holding this lock to call close() on it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other
[ https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228776#comment-17228776 ] Tim Allison commented on SOLR-14973: Thank you [~krisden] for the ping. > Solr 8.6 is shipping libraries that are incompatible with each other > > > Key: SOLR-14973 > URL: https://issues.apache.org/jira/browse/SOLR-14973 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - Solr Cell (Tika extraction) >Affects Versions: 8.6 >Reporter: Samir Huremovic >Priority: Major > Labels: tika-parsers > > Hi, > since Solr 8.6 the version of {{tika-parsers}} was updated to {{1.24}}. This > version of {{tika-parsers}} needs the {{poi}} library in version {{4.1.2}} > (see https://issues.apache.org/jira/browse/TIKA-3047) > Solr has version {{4.1.1}} of poi included. > This creates (at least) a problem for parsing {{.xls}} files. The following > exception gets thrown by trying to post an {{.xls}} file in the techproducts > example: > {{java.lang.NoSuchMethodError: > org.apache.poi.hssf.record.common.UnicodeString.getExtendedRst()Lorg/apache/poi/hssf/record/common/ExtRst;}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
bruno-roustant commented on a change in pull request #2066: URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r520089739 ## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java ## @@ -536,7 +538,9 @@ public void queueCoreToClose(SolrCore coreToClose) { * @return the cache holding the transient cores; never null. */ public TransientSolrCoreCache getTransientCacheHandler() { Review comment: I don't think so. Let's say transientSolrCoreCacheFactory = A. If there is a race and load() is called between getTransientCacheHandler() and TransientSolrCoreCache.close(), then getTransientCacheHandler() returns A, load() sets B, and A.close() is called. That is the same result as executing { getTransientCacheHandler().close() } atomically on A and then calling load() with B. But actually it doesn't matter. So I can add a synchronized (modifyLock) block around it if you prefer. I'll keep the synchronized inside getTransientCacheHandler() anyway, because it is public. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
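For reference, a compact sketch of the locking variant settled on here: holding modifyLock across both the lookup and the close() call makes the pair atomic with respect to load(). The field and lock names mirror the diff, but the types are simplified stand-ins.

```java
class SolrCoresSketch {
  interface Handler { void close(); }

  private final Object modifyLock = new Object();
  private Handler transientCacheHandler = () -> { /* NO_OP default */ };

  void close() {
    synchronized (modifyLock) {
      // Lookup and close cannot interleave with a concurrent load().
      transientCacheHandler.close();
    }
  }

  void load(Handler fresh) {
    synchronized (modifyLock) {
      transientCacheHandler = fresh;
    }
  }
}
```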
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
dsmiley commented on a change in pull request #2066: URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r520092229 ## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java ## @@ -536,7 +538,9 @@ public void queueCoreToClose(SolrCore coreToClose) { * @return the cache holding the transient cores; never null. */ public TransientSolrCoreCache getTransientCacheHandler() { Review comment: Okay; I'm fine with the lock to be safe. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on pull request #2020: SOLR-14949: Ability to customize Solr Docker build
dweiss commented on pull request #2020: URL: https://github.com/apache/lucene-solr/pull/2020#issuecomment-724260281 Hmm... sorry, missed your mention/request somehow. Yeah - these functions are intended to read variables from multiple locations so it looks ok to me. I didn't test it (or use docker much for that matter). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] HoustonPutman commented on pull request #2020: SOLR-14949: Ability to customize Solr Docker build
HoustonPutman commented on pull request #2020: URL: https://github.com/apache/lucene-solr/pull/2020#issuecomment-724264582 Thanks for the sanity check! I've tested pretty thoroughly, and the PR test does some checks on its own too. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14993) Unable to download zookeeper files of 1byte in size
Allen Sooredoo created SOLR-14993: - Summary: Unable to download zookeeper files of 1byte in size Key: SOLR-14993 URL: https://issues.apache.org/jira/browse/SOLR-14993 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud, SolrJ Affects Versions: 8.5.1 Reporter: Allen Sooredoo When downloading a file from Zookeeper using the Solrj client, files of size 1 byte are ignored. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
dweiss commented on pull request #2068: URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-724283052 Zach I've cleaned up the native build a bit - moved it under lucene/misc, added Windows build (it does build the native library for me). I didn't check whether it works on a Mac but I suspect it should. I also left the native project included by default in settings (removed the "optional" flag). Gradle's cpp plugin ignores the project on platforms not explicitly mentioned in the targetMachines - I am curious whether we'll blow up something or if it's just going to work. While I don't particularly like having native code in Lucene, I think it's better than it used to be (mixed cpp code with java code, etc.). I allowed myself to commit directly to your fork, hope you don't mind (please test it out!). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
dweiss commented on a change in pull request #2068: URL: https://github.com/apache/lucene-solr/pull/2068#discussion_r520125532 ## File path: lucene/misc/native/build.gradle ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * This gets separated out from misc module into a native module due to incompatibility between cpp-library and java-library plugins. + * For details, please see https://github.com/gradle/gradle-native/issues/352#issuecomment-461724948 + */ +import org.apache.tools.ant.taskdefs.condition.Os + +description = 'Module for native code' + +apply plugin: 'cpp-library' + +library { + baseName = 'NativePosixUtil' Review comment: I wonder if we should rename the resulting native library something more specific... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jtibshirani opened a new pull request #2070: LUCENE-9536: Correct the OrdinalMap optimization.
jtibshirani opened a new pull request #2070: URL: https://github.com/apache/lucene-solr/pull/2070 Previously we only checked that the first segment's ordinal deltas were all zero. This didn't account for some rare cases where some of the segment's ordinals are filtered out, so the ordinals aren't contiguous. In these cases we fill in dummy values for the missing ordinal deltas. So a segment's ordinals can appear to match the global ordinals perfectly, but not actually contain all the terms. Such a case can arise when using a FilteredTermsEnum, for example when merging a segment with deletions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
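A hedged sketch of the stricter condition this fix implies: all-zero deltas alone do not prove the segment covers the whole global ordinal space. The names below are illustrative, not the actual OrdinalMap internals.

```java
public class OrdinalCheck {
  static boolean segmentMatchesGlobalOrds(long[] ordDeltas,
                                          long segmentValueCount,
                                          long globalValueCount) {
    for (long delta : ordDeltas) {
      if (delta != 0) {
        return false;
      }
    }
    // All-zero deltas are not enough: a FilteredTermsEnum (e.g. when merging
    // a segment with deletions) can skip terms while the recorded deltas for
    // the surviving ordinals still read as zero, so also require that the
    // segment actually contains every global term.
    return segmentValueCount == globalValueCount;
  }
}
```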
[GitHub] [lucene-solr] dweiss commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
dweiss commented on pull request #2068: URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-724287441 Btw. if we do have to add that explicit 'build.native' option, then I'd implement it as a task (graph) exclusion rather than a project exclusion. Windows users in particular may complain, as the plugin requires Visual Studio... https://docs.gradle.org/current/userguide/building_cpp_projects.html#sec:cpp_supported_tool_chain So something like this on all of the native project's tasks (conditionally): https://discuss.gradle.org/t/removing-tasks-from-taskgraph-remove-a-task-dependency/394 or filter them out entirely. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jtibshirani commented on pull request #2070: LUCENE-9536: Correct the OrdinalMap optimization.
jtibshirani commented on pull request #2070: URL: https://github.com/apache/lucene-solr/pull/2070#issuecomment-724288495 This should fix the failures we're seeing like `TestLucene70DocValuesFormat#testSparseSortedVariableLengthVsStoredFields` and `TestSimpleTextDocValuesFormat#testSparseSortedFixedLengthVsStoredFields`. Note to self: run the whole test suite a bunch of times when changing subtle logic!! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14994) Bring in Solr Operator into the Lucene project
Anshum Gupta created SOLR-14994: --- Summary: Bring in Solr Operator into the Lucene project Key: SOLR-14994 URL: https://issues.apache.org/jira/browse/SOLR-14994 Project: Solr Issue Type: Task Security Level: Public (Default Security Level. Issues are Public) Reporter: Anshum Gupta Assignee: Anshum Gupta Solr Operator project codebase is currently in the process of being donated to the Apache Lucene project. This is an umbrella JIRA to track the progress and tasks associated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14788) Solr: The Next Big Thing
[ https://issues.apache.org/jira/browse/SOLR-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228852#comment-17228852 ] Mark Robert Miller commented on SOLR-14788: --- This Overseer work (release it from its birth into heavy historic tech debt), on top of the general state of everything else near this state, requires that I really work step by step through the system and what it does - the first time I saw things from that state, that perspective, I realized we don’t have adequate developer / user logging, it’s really not sufficient at all, and so you have to start adding info and debug logging, and that is very, very useful. I didn’t really just come to understand this wide area, but having to work through so much to “re-master” it, the logging I need becomes evident as I learned I needed it. So this time I’m not doing a great job. I’m adding here and there, over logging where I have to clean up, blah, blah. The takeaway really is that our system is actually fairly simple, but only if you axe the decade old baggage and realign some implementations. Once the foundation is stable, there is high value in nailing the logging. It’s the key to letting more real help in, it’s the key for efficient test and user and support debugging. We log so much data, we over log data and this thing and that thing. We should not over log data by default and we should log system flow really well and it will be a really big deal. > Solr: The Next Big Thing > > > Key: SOLR-14788 > URL: https://issues.apache.org/jira/browse/SOLR-14788 > Project: Solr > Issue Type: Task >Reporter: Mark Robert Miller >Assignee: Mark Robert Miller >Priority: Critical > > h3. > [!https://www.unicode.org/consortium/aacimg/1F46E.png!|https://www.unicode.org/consortium/adopted-characters.html#b1F46E]{color:#00875a}*The > Policeman is on duty!*{color} > {quote}_{color:#de350b}*When The Policeman is on duty, sit back, relax, and > have some fun. Try to make some progress. Don't stress too much about the > impact of your changes or maintaining stability and performance and > correctness so much. Until the end of phase 1, I've got your back. I have a > variety of tools and contraptions I have been building over the years and I > will continue training them on this branch. I will review your changes and > peer out across the land and course correct where needed. As Mike D will be > thinking, "Sounds like a bottleneck Mark." And indeed it will be to some > extent. Which is why once stage one is completed, I will flip The Policeman > to off duty. When off duty, I'm always* {color:#de350b}*occasionally*{color} > *down for some vigilante justice, but I won't be walking the beat, all that > stuff about sit back and relax goes out the window.*{color}_ > {quote} > > I have stolen this title from Ishan or Noble and Ishan. > This issue is meant to capture the work of a small team that is forming to > push Solr and SolrCloud to the next phase. > I have kicked off the work with an effort to create a very fast and solid > base. That work is not 100% done, but it's ready to join the fight. > Tim Potter has started giving me a tremendous hand in finishing up. Ishan and > Noble have already contributed support and testing and have plans for > additional work to shore up some of our current shortcomings. > Others have expressed an interest in helping and hopefully they will pop up > here as well. > Let's organize and discuss our efforts here and in various sub issues.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-14788) Solr: The Next Big Thing
[ https://issues.apache.org/jira/browse/SOLR-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228852#comment-17228852 ] Mark Robert Miller edited comment on SOLR-14788 at 11/9/20, 11:16 PM: -- This Overseer work (released from its birth into heavy historic tech debt), on top of the general state of everything else near this state, requires that I really work step by step through the system and what it does - the first time I saw things from that state, that perspective, I realized we don’t have adequate developer / user logging, it’s really not sufficient at all, and so you have to start adding info and debug logging, and that is very, very useful. I didn’t really just come to understand this wide area, but having to work through so much to “re-master” it, the logging I need becomes evident as I learned I needed it and we could just be 100x more helpful than we are. So this time I’m not doing a great job. I’m adding here and there, over logging where I have to clean up, favoring finishing over a little paint outside the lines, blah, blah. The takeaway really is that our system is actually fairly simple, but only if you axe the decade old baggage and realign some implementations. Once the foundation is stable, there is high value in nailing the logging. It’s the key to letting more real help in, it’s the key for efficient test and user and support debugging. We log so much data, we over log data and this thing and that thing. We should not over log data by default and we should log system flow really well and it will be a really big deal. was (Author: markrmiller): This Overseer work (release it from its birth into heavy historic tech debt), on top of the general state of everything else near this state, requires that i really work step by step through the system and what it does - the first time I saw things from that state, that perspective, I realized we don’t have adequate developer / user log, it’s really not sufficient at all, and so you have to start adding info and debug logging, and that is very, very useful. I didn’t really just come to understand this wide area, but having to work through so much to “re-master” it, the logging i need becomes evident as I learned I needed it. So this time I’m not doing a great job. I’m adding here and there, over logging whee I have to clean up, blah, blah. The takeaway really is that our system is actually fairly simple, but only if you axe the decade old baggage and realign some implementations. Once the foundation is stable, there is high value in nailing the logging. It’s the key to letting more real help in, it’s the key for efficient test and user and support debugging. We log so much data, we over log data and this thing and that thing. We should not over log data by default and we should log system flow really well and it will be a really big deal. > Solr: The Next Big Thing > > > Key: SOLR-14788 > URL: https://issues.apache.org/jira/browse/SOLR-14788 > Project: Solr > Issue Type: Task >Reporter: Mark Robert Miller >Assignee: Mark Robert Miller >Priority: Critical > > h3. > [!https://www.unicode.org/consortium/aacimg/1F46E.png!|https://www.unicode.org/consortium/adopted-characters.html#b1F46E]{color:#00875a}*The > Policeman is on duty!*{color} > {quote}_{color:#de350b}*When The Policeman is on duty, sit back, relax, and > have some fun. Try to make some progress. Don't stress too much about the > impact of your changes or maintaining stability and performance and > correctness so much. Until the end of phase 1, I've got your back. I have a > variety of tools and contraptions I have been building over the years and I > will continue training them on this branch. I will review your changes and > peer out across the land and course correct where needed. As Mike D will be > thinking, "Sounds like a bottleneck Mark." And indeed it will be to some > extent. Which is why once stage one is completed, I will flip The Policeman > to off duty. When off duty, I'm always* {color:#de350b}*occasionally*{color} > *down for some vigilante justice, but I won't be walking the beat, all that > stuff about sit back and relax goes out the window.*{color}_ > {quote} > > I have stolen this title from Ishan or Noble and Ishan. > This issue is meant to capture the work of a small team that is forming to > push Solr and SolrCloud to the next phase. > I have kicked off the work with an effort to create a very fast and solid > base. That work is not 100% done, but it's ready to join the fight. > Tim Potter has started giving me a tremendous hand in finishing up. Ishan and > Noble have already contributed support and testing and have plans for > additional work to shore up some of our current shortcomings. > Others have expressed an interest in helping and hopefully they will pop up > here as well. > Let's organize and discuss our efforts here and in various sub issues.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #2067: SOLR-14987: Reuse HttpSolrClient per node vs. one per Solr core when using CloudSolrStream
madrob commented on a change in pull request #2067: URL: https://github.com/apache/lucene-solr/pull/2067#discussion_r520198395 ## File path: solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/CloudSolrStream.java ## @@ -334,11 +334,6 @@ private StreamComparator parseComp(String sort, String fl) throws IOException { public static Slice[] getSlices(String collectionName, ZkStateReader zkStateReader, boolean checkAlias) throws IOException { ClusterState clusterState = zkStateReader.getClusterState(); -Map collectionsMap = clusterState.getCollectionsMap(); Review comment: Related: can we update the javadoc on clusterState.getCollectionsMap to be more explicit that it _will_ make a call to zk, instead of the current _may_? ## File path: solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/CloudSolrStream.java ## @@ -334,11 +334,6 @@ private StreamComparator parseComp(String sort, String fl) throws IOException { public static Slice[] getSlices(String collectionName, ZkStateReader zkStateReader, boolean checkAlias) throws IOException { ClusterState clusterState = zkStateReader.getClusterState(); -Map collectionsMap = clusterState.getCollectionsMap(); - -//TODO we should probably split collection by comma to query more than one -// which is something already supported in other parts of Solr - // check for alias or collection Review comment: Should we cache the value of `zkStateReader.getAliases` below to avoid volatile reads? ## File path: solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/SolrStream.java ## @@ -126,6 +135,17 @@ public void open() throws IOException { } } + private String getNodeUrl() { Review comment: Can we precompute this in the constructor? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
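Two of these suggestions boil down to the same pattern: compute once, reuse. A hedged sketch with illustrative names; the real SolrJ classes differ.

```java
class SolrStreamSketch {
  private final String nodeUrl; // derived once in the constructor, not on every open()

  SolrStreamSketch(String baseUrl, String coreName) {
    this.nodeUrl = baseUrl + "/" + coreName; // hypothetical derivation
  }

  String getNodeUrl() {
    return nodeUrl;
  }

  // Cache the volatile-backed getAliases() result in a local and reuse it,
  // instead of re-reading the field for every lookup.
  static void resolveAll(ZkStateReaderSketch reader, String... names) {
    AliasesSketch aliases = reader.getAliases(); // single volatile read
    for (String name : names) {
      aliases.resolve(name);
    }
  }

  interface ZkStateReaderSketch { AliasesSketch getAliases(); }
  interface AliasesSketch { String resolve(String collectionName); }
}
```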
[GitHub] [lucene-solr] jtibshirani opened a new pull request #2071: LUCENE-9322: Some fixes to SimpleTextVectorFormat.
jtibshirani opened a new pull request #2071: URL: https://github.com/apache/lucene-solr/pull/2071 * Make sure the files are unique by renaming the term vectors extension to `tvc`. * Fix a bug where reading a vector would drop the leading digit of the first element. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jtibshirani commented on pull request #2071: LUCENE-9322: Some fixes to SimpleTextVectorFormat.
jtibshirani commented on pull request #2071: URL: https://github.com/apache/lucene-solr/pull/2071#issuecomment-724386681 I found these issues while fixing the following failing test: ``` ./gradlew test --tests TestSortingCodecReader.testSortOnAddIndicesRandom -Dtests.seed=B38EBA45728D5FB1 ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] msokolov commented on a change in pull request #2071: LUCENE-9322: Some fixes to SimpleTextVectorFormat.
msokolov commented on a change in pull request #2071: URL: https://github.com/apache/lucene-solr/pull/2071#discussion_r520253512 ## File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextVectorReader.java ## @@ -245,8 +245,8 @@ private void readAllVectors() throws IOException { private void readVector(float[] value) throws IOException { SimpleTextUtil.readLine(in, scratch); - // skip leading " [" and strip trailing "]" - String s = new BytesRef(scratch.bytes(), 2, scratch.length() - 3).utf8ToString(); + // skip leading "[" and strip trailing "]" + String s = new BytesRef(scratch.bytes(), 1, scratch.length() - 2).utf8ToString(); Review comment: Wow, how did this ever work; we must never have tested it. grr. Thank you for cleaning up! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
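The off-by-one is easy to see in isolation. A small self-contained check, assuming only lucene-core on the classpath for BytesRef:

```java
import java.nio.charset.StandardCharsets;
import org.apache.lucene.util.BytesRef;

public class VectorParseCheck {
  public static void main(String[] args) {
    byte[] line = "[0.5, 1.5]".getBytes(StandardCharsets.UTF_8);

    // Buggy: skipping two leading chars (" [") drops the first digit.
    String buggy = new BytesRef(line, 2, line.length - 3).utf8ToString();
    System.out.println(buggy); // ".5, 1.5"

    // Fixed: skip only "[" and strip the trailing "]".
    String fixed = new BytesRef(line, 1, line.length - 2).utf8ToString();
    System.out.println(fixed); // "0.5, 1.5"
  }
}
```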
[GitHub] [lucene-solr] msokolov merged pull request #2071: LUCENE-9322: Some fixes to SimpleTextVectorFormat.
msokolov merged pull request #2071: URL: https://github.com/apache/lucene-solr/pull/2071 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9322) Discussing a unified vectors format API
[ https://issues.apache.org/jira/browse/LUCENE-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228945#comment-17228945 ] ASF subversion and git services commented on LUCENE-9322: - Commit 42c5206cea5c85d486813d42f7d52e44a5a695ba in lucene-solr's branch refs/heads/master from Julie Tibshirani [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=42c5206 ] LUCENE-9322: Some fixes to SimpleTextVectorFormat. (#2071) * Make sure the file extensions are unique. * Fix bug in vector reading. > Discussing a unified vectors format API > --- > > Key: LUCENE-9322 > URL: https://issues.apache.org/jira/browse/LUCENE-9322 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Julie Tibshirani >Priority: Major > Fix For: master (9.0) > > Time Spent: 7h 20m > Remaining Estimate: 0h > > Two different approximate nearest neighbor approaches are currently being > developed, one based on HNSW (LUCENE-9004) and another based on coarse > quantization ([#LUCENE-9136]). Each prototype proposes to add a new format to > handle vectors. In LUCENE-9136 we discussed the possibility of a unified API > that could support both approaches. The two ANN strategies give different > trade-offs in terms of speed, memory, and complexity, and it’s likely that > we’ll want to support both. Vector search is also an active research area, > and it would be great to be able to prototype and incorporate new approaches > without introducing more formats. > To me it seems like a good time to begin discussing a unified API. The > prototype for coarse quantization > ([https://github.com/apache/lucene-solr/pull/1314]) could be ready to commit > soon (this depends on everyone's feedback of course). The approach is simple > and shows solid search performance, as seen > [here|https://github.com/apache/lucene-solr/pull/1314#issuecomment-608645326]. > I think this API discussion is an important step in moving that > implementation forward. > The goals of the API would be > # Support for storing and retrieving individual float vectors. > # Support for approximate nearest neighbor search -- given a query vector, > return the indexed vectors that are closest to it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] zacharymorn commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory
zacharymorn commented on a change in pull request #2052: URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r520268743 ## File path: lucene/misc/src/java/org/apache/lucene/store/DirectIODirectory.java ## @@ -66,45 +66,32 @@ * * @lucene.experimental */ -public class NativeUnixDirectory extends FSDirectory { +public class DirectIODirectory extends FSDirectory { // TODO: this is OS dependent, but likely 512 is the LCD private final static long ALIGN = 512; private final static long ALIGN_NOT_MASK = ~(ALIGN-1); - - /** Default buffer size before writing to disk (256 KB); - * larger means less IO load but more RAM and direct - * buffer storage space consumed during merging. */ - - public final static int DEFAULT_MERGE_BUFFER_SIZE = 262144; /** Default min expected merge size before direct IO is * used (10 MB): */ public final static long DEFAULT_MIN_BYTES_DIRECT = 10*1024*1024; - private final int mergeBufferSize; private final long minBytesDirect; private final Directory delegate; /** Create a new NIOFSDirectory for the named location. * * @param path the path of the directory - * @param lockFactory to use - * @param mergeBufferSize Size of buffer to use for - *merging. See {@link #DEFAULT_MERGE_BUFFER_SIZE}. * @param minBytesDirect Merges, or files to be opened for * reading, smaller than this will * not use direct IO. See {@link * #DEFAULT_MIN_BYTES_DIRECT} + * @param lockFactory to use * @param delegate fallback Directory for non-merges * @throws IOException If there is a low-level I/O error */ - public NativeUnixDirectory(Path path, int mergeBufferSize, long minBytesDirect, LockFactory lockFactory, Directory delegate) throws IOException { + public DirectIODirectory(Path path, long minBytesDirect, LockFactory lockFactory, Directory delegate) throws IOException { super(path, lockFactory); -if ((mergeBufferSize & ALIGN) != 0) { - throw new IllegalArgumentException("mergeBufferSize must be 0 mod " + ALIGN + " (got: " + mergeBufferSize + ")"); -} -this.mergeBufferSize = mergeBufferSize; Review comment: I see it makes sense. I've reverted the relevant section of code in the latest commits to keep it focused on moving to pure java implementation. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
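Since the ALIGN constants survive this cleanup, a short worked example of the mask arithmetic they implement; the values here are for illustration only.

```java
public class AlignExample {
  private static final long ALIGN = 512;
  private static final long ALIGN_NOT_MASK = ~(ALIGN - 1);

  public static void main(String[] args) {
    long offset = 1300;
    // Clearing the low 9 bits rounds down to the previous 512-byte boundary,
    // as direct IO requires; adding ALIGN - 1 first rounds up instead.
    long alignedDown = offset & ALIGN_NOT_MASK;              // 1024
    long alignedUp = (offset + ALIGN - 1) & ALIGN_NOT_MASK;  // 1536
    System.out.println(alignedDown + " " + alignedUp);
  }
}
```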
[GitHub] [lucene-solr] zacharymorn commented on a change in pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
zacharymorn commented on a change in pull request #2068: URL: https://github.com/apache/lucene-solr/pull/2068#discussion_r520270563 ## File path: lucene/misc/native/build.gradle ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * This gets separated out from misc module into a native module due to incompatibility between cpp-library and java-library plugins. + * For details, please see https://github.com/gradle/gradle-native/issues/352#issuecomment-461724948 + */ +import org.apache.tools.ant.taskdefs.condition.Os + +description = 'Module for native code' + +apply plugin: 'cpp-library' + +library { + baseName = 'NativePosixUtil' Review comment: Given this now also includes the Windows one, and the cpp code focuses on file IO, I'm guessing something like `NativeIOUtil` might work? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] zacharymorn commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
zacharymorn commented on pull request #2068: URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-724434703 > Zach I've cleaned up the native build a bit - moved it under lucene/misc, added Windows build (it does build the native library for me). I didn't check whether it works on a Mac but I suspect it should. > > I also left the native project included by default in settings (removed the "optional" flag). Gradle's cpp plugin ignores the project on platforms not explicitly mentioned in the targetMachines - I am curious whether we'll blow up something or if it's just going to work. > > While I don't particularly like having native code in Lucene, I think it's better than it used to be (mixed cpp code with java code, etc.). > > I allowed myself to commit directly to your fork, hope you don't mind (please test it out!). Thanks Dawid for the changes! I tested it out on my Mac and it built fine as well. I originally separated this out into an independent native module thinking that it could host future native code as well, but I guess that was probably just premature optimization, as it hasn't been the case for the last few years. > I am curious whether we’ll blow up something or if it’s just going to work. Just curious, are there any cross-platform tests in the pipeline that can confirm this? How do we verify this other than by running local builds? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] zacharymorn commented on a change in pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
zacharymorn commented on a change in pull request #2068: URL: https://github.com/apache/lucene-solr/pull/2068#discussion_r520273368 ## File path: lucene/misc/native/build.gradle ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * This gets separated out from misc module into a native module due to incompatibility between cpp-library and java-library plugins. + * For details, please see https://github.com/gradle/gradle-native/issues/352#issuecomment-461724948 + */ +import org.apache.tools.ant.taskdefs.condition.Os + +description = 'Module for native code' + +apply plugin: 'cpp-library' + +library { + baseName = 'NativePosixUtil' + + // Native build for Windows platform will be added in later stage + targetMachines = [ + machines.linux.x86_64, + machines.macOS.x86_64, + machines.windows.x86_64 + ] + + // Point at platform-specific sources. Other platforms will be ignored + // (plugin won't find the toolchain). + if (Os.isFamily(Os.FAMILY_WINDOWS)) { +source.from file("${projectDir}/src/main/windows") + } else if (Os.isFamily(Os.FAMILY_UNIX) || Os.isFamily(Os.FAMILY_MAC)) { +source.from file("${projectDir}/src/main/posix") + } +} + +tasks.withType(CppCompile).configureEach { + def javaHome = rootProject.ext.runtimeJava.getInstallationDirectory().getAsFile().getPath() + + // Assume standard openjdk layout. This means only one architecture-specific include folder + // is present. + systemIncludes.from file("${javaHome}/include") + + for (def path : [ + file("${javaHome}/include/win32"), + file("${javaHome}/include/darwin"), + file("${javaHome}/include/linux"), + file("${javaHome}/include/solaris")]) { +if (path.exists()) { + systemIncludes.from path +} + } + + compilerArgs.add '-fPIC' Review comment: Just curious, shall we also modify the compiler args when it’s on Windows to match what was used before? https://github.com/apache/lucene-solr/blob/ec9a659845973a0dd0ee7c04e0075db818ed118d/lucene/misc/src/java/org/apache/lucene/store/WindowsDirectory.java#L31-L35 A quick search shows that some of these flags might be specific to the MinGW compiler though, so I'm not sure if they are still relevant. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9590) Add javadoc for Lucene86PointsFormat class
[ https://issues.apache.org/jira/browse/LUCENE-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228980#comment-17228980 ] Lu Xugang commented on LUCENE-9590: --- Here is the link: https://www.amazingkoala.com.cn/Lucene_Document/IndexFile/2020/1104/175.html > Add javadoc for Lucene86PointsFormat class > --- > > Key: LUCENE-9590 > URL: https://issues.apache.org/jira/browse/LUCENE-9590 > Project: Lucene - Core > Issue Type: Wish > Components: core/codecs >Reporter: Lu Xugang >Priority: Minor > Attachments: 1.png > > > I would like to add javadoc for the Lucene86PointsFormat class; it is really > helpful for source readers to understand the data structures behind point values. > Is anyone doing this, or planning to? > The attachment lists part of the data structure (cells filled with color have a > sub data structure) > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org