[GitHub] [lucene-solr] dweiss commented on a change in pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
dweiss commented on a change in pull request #2068:
URL: https://github.com/apache/lucene-solr/pull/2068#discussion_r519618160

## File path: lucene/native/build.gradle

@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This gets separated out from misc module into a native module due to incompatibility between cpp-library and java-library plugins.
+ * For details, please see https://github.com/gradle/gradle-native/issues/352#issuecomment-461724948
+ */
+apply plugin: 'cpp-library'
+
+description = 'Module for native code'
+
+library {
+  baseName = 'NativePosixUtil'
+
+  privateHeaders.from file(System.getProperty('java.home') + '/include')
+  privateHeaders.from file(System.getProperty('java.home') + '/include/darwin')
+  privateHeaders.from file(System.getProperty('java.home') + '/../include/solaris')

Review comment:

These paths are wrong, I think. I don't know where they came from, but they should correspond to the layout of newer JDK distributions (openjdk). I'm not sure Solaris is still needed - I can't test it on Solaris.
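For context on the layout point: under a modern (JDK 9+) distribution there is no jre/ subdirectory and no ../include indirection, so the JNI headers sit directly under java.home. A small Java sketch of that layout (the darwin/win32/linux directory names are the usual OpenJDK ones; treat the mapping below as an assumption, not part of the patch):

import java.nio.file.Path;
import java.util.Locale;

public class JniHeaderPaths {
  public static void main(String[] args) {
    // JDK 9+: java.home itself contains include/ (jni.h lives there)
    Path include = Path.of(System.getProperty("java.home")).resolve("include");

    // Platform-specific jni_md.h sits in an OS-named subdirectory (assumed mapping)
    String os = System.getProperty("os.name").toLowerCase(Locale.ROOT);
    String platform = os.contains("mac") ? "darwin"
        : os.contains("win") ? "win32"
        : "linux";

    System.out.println("JNI headers:      " + include);
    System.out.println("Platform headers: " + include.resolve(platform));
  }
}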
[GitHub] [lucene-solr] dweiss commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
dweiss commented on pull request #2068:
URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-723842876

bq. The windows equivalent WindowsDirectory.cpp still sits in the misc module and hasn't been moved over yet.

Ah... I didn't realize this was the case - sorry. Give me a day or two, I'll try to make those files compile under Windows and maybe we can do a clean patch that moves everything.
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
bruno-roustant commented on a change in pull request #2066:
URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r519684814

## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java

@@ -490,21 +492,19 @@ public CoreDescriptor getCoreDescriptor(String coreName) {
   }

   /**
-   * Get the CoreDescriptors for every SolrCore managed here
-   * @return a List of CoreDescriptors
+   * Get the CoreDescriptors for every {@link SolrCore} managed here (permanent and transient, loaded and unloaded).
+   *
+   * @return An unordered list copy. This list can be modified by the caller (e.g. sorted).

Review comment:

I'll use ArrayList.
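A minimal sketch of the copy semantics being settled on here - returning a fresh ArrayList so callers can sort or otherwise modify the result without touching internal state (the descriptors map below is an assumed stand-in for the actual field):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CoreDescriptorsSketch {
  private final Map<String, Object> descriptors = new HashMap<>(); // assumed stand-in

  // Return an unordered, modifiable copy: callers may sort or filter it
  // without affecting the map this class synchronizes on.
  synchronized List<Object> getCoreDescriptors() {
    return new ArrayList<>(descriptors.values());
  }
}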
[jira] [Commented] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values
[ https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228512#comment-17228512 ]

Andrzej Bialecki commented on SOLR-14683:
-----------------------------------------

{quote}Solr's JSON Response writer already has long standing support to output {{Float.NaN}} as a quoted string {{"NaN"}}
{quote}
Therein lies the problem ;) since there is no standard way to do it, Solr decided to use a STRING (quoted) value of {{"NaN"}}... but some other libraries (and the popular extended spec [http://json5.org]) use unquoted values for {{NaN, Infinity, -Infinity}}. Some other parsers use {{nan, inf, -inf}} - that's the beauty of standards, there are so many of them to choose from... /s

Taking all this into account, returning {{null}} for NaN or undefined seems like the safest option.

> Review the metrics API to ensure consistent placeholders for missing values
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-14683
>                 URL: https://issues.apache.org/jira/browse/SOLR-14683
>             Project: Solr
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Major
>
> Spin-off from SOLR-14657. Some gauges can legitimately be missing or in an unknown state at some points in time, e.g. during SolrCore startup or shutdown.
> Currently the API returns placeholders with either impossible values for numeric gauges (such as index size -1) or empty maps / strings for other non-numeric gauges.
> [~hossman] noticed that the values for these placeholders may be misleading, depending on how the user treats them - if the client has no special logic to treat them as "missing values" it may erroneously treat them as valid data. E.g. numeric values of -1 or 0 may severely skew averages and produce misleading peaks / valleys in metrics histories.
> On the other hand, returning a literal {{null}} value instead of the expected number may also cause unexpected client issues - although in this case it's clearer that there's actually no data available, so long-term this may be a better strategy than returning impossible values, even if it means that the client should learn to handle {{null}} values appropriately.
[jira] [Comment Edited] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values
[ https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228512#comment-17228512 ]

Andrzej Bialecki edited comment on SOLR-14683 at 11/9/20, 10:58 AM:
--------------------------------------------------------------------

{quote}Solr's JSON Response writer already has long standing support to output {{Float.NaN}} as a quoted string {{"NaN"}}
{quote}
Therein lies the problem ;) since there is no standard way to do it, Solr decided to use a STRING (quoted) value of {{"NaN"}}... but some other libraries (and the popular extended spec [http://json5.org]) use unquoted values for {{NaN, Infinity, -Infinity}}. Some other parsers use {{nan, inf, -inf}} - that's the beauty of standards, there are so many of them to choose from... /s

Taking all this into account, serializing NaN as {{null}} seems like the safest option, unless we add this configurability to our JSONWriter.

Also, from the point of view of metrics it conveys the same message whether it returns NaN or null when the value is unknown - so for simplicity and easier compatibility we could always return {{null}} as a metric value, regardless of how it's serialized.
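To make the suggestion concrete: a minimal sketch (an assumed helper, not Solr's actual JSONWriter code) of serializing non-finite values as JSON {{null}}:

{noformat}
// Sketch: emit JSON null for any non-finite double - the one representation
// that strict JSON, json5, and the various lenient parsers all accept.
static String jsonNumber(double v) {
  return (Double.isNaN(v) || Double.isInfinite(v)) ? "null" : Double.toString(v);
}
{noformat}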
[jira] [Commented] (SOLR-14969) Prevent creating multiple cores with the same name which leads to instabilities (race condition)
[ https://issues.apache.org/jira/browse/SOLR-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228529#comment-17228529 ]

Andreas Hubold commented on SOLR-14969:
---------------------------------------

I've tested concurrent core creation for branch_8x, and can confirm that the bug is fixed :) Thanks a lot for fixing this issue, [~erickerickson].

However, the logging is quite ugly. Maybe it would have been better to throw a SolrException with ErrorCode.CONFLICT in case of concurrent core creation, instead of the currently used ErrorCode.SERVER_ERROR. If I understand logging in RequestHandlerBase correctly, this would avoid the ERROR message with stack trace that appears in addition to the WARN message. FYI, maybe this is still worth changing.

{noformat}
2020-11-09 11:10:03.104 WARN  (qtp1033348658-69) [   x:test-0.5753008886962022-71] o.a.s.c.CoreContainer Already creating a core with name 'test-0.5753008886962022-71', call aborted '
2020-11-09 11:10:03.104 ERROR (qtp1033348658-126) [   x:test-0.5753008886962022-71] o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Already creating a core with name 'test-0.5753008886962022-71', call aborted '
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1284)
        at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:95)
        at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)
        at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:397)
        at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
        ...
{noformat}

> Prevent creating multiple cores with the same name which leads to instabilities (race condition)
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-14969
>                 URL: https://issues.apache.org/jira/browse/SOLR-14969
>             Project: Solr
>          Issue Type: Bug
>   Security Level: Public (Default Security Level. Issues are Public)
>          Components: multicore
>    Affects Versions: 8.6, 8.6.3
>            Reporter: Andreas Hubold
>            Assignee: Erick Erickson
>            Priority: Major
>             Fix For: 8.8
>
>      Attachments: CmCoreAdminHandler.java
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> CoreContainer#create does not correctly handle concurrent requests to create the same core. There's a race condition (see also the existing TODO comment in the code), and CoreContainer#createFromDescriptor may be called subsequently for the same core name.
> The _second call_ then fails to create an IndexWriter, and exception handling causes an inconsistent CoreContainer state.
> {noformat}
> 2020-10-27 00:29:25.350 ERROR (qtp2029754983-24) [   ] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error CREATEing SolrCore 'blueprint_acgqqafsogyc_comments': Unable to create core [blueprint_acgqqafsogyc_comments] Caused by: Lock held by this virtual machine: /var/solr/data/blueprint_acgqqafsogyc_comments/data/index/write.lock
>         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1312)
>         at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:95)
>         at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)
>         ...
> Caused by: org.apache.solr.common.SolrException: Unable to create core [blueprint_acgqqafsogyc_comments]
>         at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1408)
>         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1273)
>         ... 47 more
> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1071)
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:906)
>         at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1387)
>         ... 48 more
> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
>         at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2184)
>         at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2308)
>         at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1130)
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1012)
>         ... 50 more
> Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine: /var/solr/data/blueprint_acgqqafsogyc_comments/data/index/write.lock
>         at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:139)
>         at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41)
>         at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45)
>         at org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:105)
>         at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:785)
>         at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:126)
>         at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:100)
>         at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:261)
>         at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:135)
>         at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2145)
> {noformat}
> CoreContainer#createFromDescriptor removes the CoreDescriptor when handling this exception. The SolrCore created for the first successful call is still registered in SolrCores.cores, but now there's no corresponding CoreDescriptor for that name anymore.
> This inconsistency leads to subsequent NullPointerExceptions, for example when using CoreAdmin STATUS with the core name: CoreAdminOperation#getCoreStatus first gets the non-null SolrCore (cores.getCore(cname)) but core.getInstancePath() throws an NPE, because the CoreDescriptor is not registered anymore:
> {noformat}
> 2020-10-27 00:29:25.353 INFO  (qtp2029754983-19) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores para
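For illustration, a minimal sketch of the suggested change (SolrException and ErrorCode.CONFLICT are existing Solr classes; the wrapping class and method are hypothetical):

{noformat}
import org.apache.solr.common.SolrException;

class ConcurrentCreateGuard {
  // CONFLICT (HTTP 409) marks the abort as a client-visible conflict; per
  // the comment above, this should avoid the ERROR-level stack trace that
  // SERVER_ERROR (500) produces in RequestHandlerBase, leaving only the WARN.
  static void abortDuplicateCreate(String coreName) {
    throw new SolrException(SolrException.ErrorCode.CONFLICT,
        "Already creating a core with name '" + coreName + "', call aborted");
  }
}
{noformat}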
[GitHub] [lucene-solr] bruno-roustant commented on pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
bruno-roustant commented on pull request #2066:
URL: https://github.com/apache/lucene-solr/pull/2066#issuecomment-723960636

Ok, now I think I integrated all comments.
[GitHub] [lucene-solr] ErickErickson commented on pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
ErickErickson commented on pull request #2066:
URL: https://github.com/apache/lucene-solr/pull/2066#issuecomment-723990436

+1

> On Nov 8, 2020, at 3:30 PM, David Smiley wrote:
>
> > So @dsmiley @ErickErickson do we all agree that we throw an exception in TransientSolrCoreCacheFactory.newInstance() line 60 and we never return null?
>
> +1 definitely
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
dsmiley commented on a change in pull request #2066:
URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r519792336

## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java

@@ -121,8 +101,8 @@ protected void close() {
       // make a copy of the cores then clear the map so the core isn't handed out to a request again
       coreList.addAll(cores.values());
       cores.clear();
-      if (transientSolrCoreCache != null) {
-        coreList.addAll(transientSolrCoreCache.prepareForShutdown());
+      if (transientSolrCoreCacheFactory != null) {

Review comment:

Do we still need a null check here? And why the factory vs the cache itself?
[jira] [Commented] (SOLR-14969) Prevent creating multiple cores with the same name which leads to instabilities (race condition)
[ https://issues.apache.org/jira/browse/SOLR-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228572#comment-17228572 ]

ASF subversion and git services commented on SOLR-14969:
--------------------------------------------------------

Commit be19432b750b94c4703ee7b19ef681ebf771a95a in lucene-solr's branch refs/heads/master from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=be19432 ]

SOLR-14969: Prevent creating multiple cores with the same name which leads to instabilities (race condition)

changed error code
[jira] [Commented] (SOLR-14969) Prevent creating multiple cores with the same name which leads to instabilities (race condition)
[ https://issues.apache.org/jira/browse/SOLR-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228583#comment-17228583 ]

ASF subversion and git services commented on SOLR-14969:
--------------------------------------------------------

Commit 91ef1c0fe8854db04e42b9095437b3186ff8038e in lucene-solr's branch refs/heads/branch_8x from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=91ef1c0 ]

SOLR-14969: Prevent creating multiple cores with the same name which leads to instabilities (race condition)

changed error code

(cherry picked from commit be19432b750b94c4703ee7b19ef681ebf771a95a)
[jira] [Commented] (SOLR-14969) Prevent creating multiple cores with the same name which leads to instabilities (race condition)
[ https://issues.apache.org/jira/browse/SOLR-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228593#comment-17228593 ]

Andreas Hubold commented on SOLR-14969:
---------------------------------------

Works, no logged ERROR messages anymore. Thank you.
[jira] [Created] (SOLR-14991) tag and remove obsolete branches
Erick Erickson created SOLR-14991:
-------------------------------------

             Summary: tag and remove obsolete branches
                 Key: SOLR-14991
                 URL: https://issues.apache.org/jira/browse/SOLR-14991
             Project: Solr
          Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Erick Erickson
            Assignee: Erick Erickson

I'm going to gradually work through the branches, tagging and removing
1> anything with a Jira name that's fixed
2> anything that I'm certain will never be fixed (e.g. the various gradle build branches)

So the changes will still be available, they just won't pollute the branch list. I'll list the branches here; all the tags will be history/branches/lucene-solr/

This specifically will _not_ include
1> any release, e.g. branch_8_4
2> anything I'm unsure about. People who've created branches should expect some pings about this.
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory
mikemccand commented on a change in pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r519868079

## File path: lucene/misc/src/java/org/apache/lucene/store/DirectIODirectory.java

@@ -66,45 +66,32 @@
  *
  * @lucene.experimental
  */
-public class NativeUnixDirectory extends FSDirectory {
+public class DirectIODirectory extends FSDirectory {

   // TODO: this is OS dependent, but likely 512 is the LCD
   private final static long ALIGN = 512;
   private final static long ALIGN_NOT_MASK = ~(ALIGN-1);
-
-  /** Default buffer size before writing to disk (256 KB);
-   * larger means less IO load but more RAM and direct
-   * buffer storage space consumed during merging. */
-  public final static int DEFAULT_MERGE_BUFFER_SIZE = 262144;

   /** Default min expected merge size before direct IO is
    * used (10 MB): */
   public final static long DEFAULT_MIN_BYTES_DIRECT = 10*1024*1024;

-  private final int mergeBufferSize;
   private final long minBytesDirect;
   private final Directory delegate;

   /** Create a new NIOFSDirectory for the named location.
    *
    * @param path the path of the directory
-   * @param lockFactory to use
-   * @param mergeBufferSize Size of buffer to use for
-   *    merging. See {@link #DEFAULT_MERGE_BUFFER_SIZE}.
    * @param minBytesDirect Merges, or files to be opened for
    *    reading, smaller than this will
    *    not use direct IO. See {@link
    *    #DEFAULT_MIN_BYTES_DIRECT}
+   * @param lockFactory to use
    * @param delegate fallback Directory for non-merges
    * @throws IOException If there is a low-level I/O error
    */
-  public NativeUnixDirectory(Path path, int mergeBufferSize, long minBytesDirect, LockFactory lockFactory, Directory delegate) throws IOException {
+  public DirectIODirectory(Path path, long minBytesDirect, LockFactory lockFactory, Directory delegate) throws IOException {
     super(path, lockFactory);
-    if ((mergeBufferSize & ALIGN) != 0) {
-      throw new IllegalArgumentException("mergeBufferSize must be 0 mod " + ALIGN + " (got: " + mergeBufferSize + ")");
-    }
-    this.mergeBufferSize = mergeBufferSize;

Review comment:

Hmm, but previously it was a 256 KB buffer, by default, and the caller could change that if they wanted. But with this change, it's now hardwired to something much smaller (512 bytes, or 1 or 4 KB; I'm not sure what "typical" filesystem block sizes are now?).

This buffering, and its size, is really important when using direct IO because every write will go straight to the device, so a larger buffer amortizes the cost of such writes.

I think we need to keep the option for the caller to set this buffer size, and leave it at the 256 KB default? Or at least, let's not try to change that behavior here, and keep this change 100% focused on moving to the pure java implementation?
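To make the buffering concern concrete, a sketch using the constants from the patch (the class and helper names are illustrative assumptions, not Lucene API):

final class DirectIOBufferSizing {
  // TODO: this is OS dependent, but likely 512 is the LCD
  private static final long ALIGN = 512;
  static final int DEFAULT_MERGE_BUFFER_SIZE = 262144; // 256 KB amortizes device writes

  // Direct IO bypasses the OS page cache, so every buffer flush hits the
  // device; the buffer must therefore be a whole number of ALIGN-sized blocks,
  // and larger buffers mean fewer (cheaper, amortized) device writes.
  static int checkedMergeBufferSize(int mergeBufferSize) {
    if ((mergeBufferSize % ALIGN) != 0) {
      throw new IllegalArgumentException(
          "mergeBufferSize must be 0 mod " + ALIGN + " (got: " + mergeBufferSize + ")");
    }
    return mergeBufferSize;
  }

  public static void main(String[] args) {
    System.out.println(checkedMergeBufferSize(DEFAULT_MERGE_BUFFER_SIZE)); // prints 262144
  }
}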
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2022: LUCENE-9004: KNN vector search using NSW graphs
mikemccand commented on a change in pull request #2022:
URL: https://github.com/apache/lucene-solr/pull/2022#discussion_r519873455

## File path: lucene/core/src/java/org/apache/lucene/index/VectorValues.java

@@ -74,6 +74,18 @@ public BytesRef binaryValue() throws IOException {
     throw new UnsupportedOperationException();
   }

+  /**
+   * Return the k nearest neighbor documents as determined by comparison of their vector values
+   * for this field, to the given vector, by the field's search strategy. If the search strategy is
+   * reversed, lower values indicate nearer vectors, otherwise higher scores indicate nearer
+   * vectors. Unlike relevance scores, vector scores may be negative.
+   * @param target the vector-valued query
+   * @param k the number of docs to return
+   * @param fanout control the accuracy/speed tradeoff - larger values give better recall at higher cost

Review comment:

> Don't Codecs get created automatically using no-args constructors and service autodiscovery?

They do at read (search) time! But at write time, you can pass parameters that alter how the Codec does its work, as long as the resulting index is then readable at search time with no-args constructors. I vaguely remember talking about having ways for the Codec at read-time to also take options, but I'm not sure that was ever fully designed / pushed ... @s1monw may remember?

> But I'm reluctant to expose hnsw-specific hyperparameters in VectorField, since we want to support other algorithms as well. Might be a good use case for generic IndexedField.attributes?

Yeah, maybe? I agree it is not obvious where the API should live and how it then finds its way into the ANN data structure construction when writing each segment.
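For orientation, a hedged usage sketch of the method under review (the parameter values are arbitrary examples, not recommendations; the signature comes from this PR and may change before merge):

import java.io.IOException;
import org.apache.lucene.index.VectorValues;
import org.apache.lucene.search.TopDocs;

class KnnQuery {
  // fanout widens the candidate pool during graph traversal: higher recall,
  // more work. k is the number of documents actually returned.
  static TopDocs nearest(VectorValues vectors, float[] query) throws IOException {
    return vectors.search(query, 10 /* k */, 50 /* fanout */);
  }
}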
[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228642#comment-17228642 ]

Michael McCandless commented on LUCENE-9378:
--------------------------------------------

{quote}IMO this is so definitely not a "Minor" matter so I changed it to our default of Major.
{quote}
+1, thanks [~dsmiley]. This is a major issue for us at Amazon - we are now running the catalog search using a custom Codec that forces (reverts) the whole {{BinaryDocValues}} writing/reading back to before LUCENE-9211, which is not really a comfortable long-term solution.

I am hoping that the [ideas being discussed in the PR|https://github.com/apache/lucene-solr/pull/1543] lead to an acceptable solution.

I think whether or not {{BinaryDocValues}} fields should be compressed will be very application dependent. Some applications care greatly about the size of the index, and can accept a small hit to search-time performance, but for others (like Amazon's!) it is the opposite.

> Configurable compression for BinaryDocValues
> ---------------------------------------------
>
>                 Key: LUCENE-9378
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9378
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Viral Gandhi
>            Priority: Major
>      Attachments: hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, hotspots-v77x.png, hotspots-v77x.png, hotspots-v77x.png, hotspots-v77x.png, image-2020-06-12-22-17-30-339.png, image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, image-2020-06-12-22-18-48-919.png, snapshot-v77x.nps, snapshot-v77x.nps, snapshot-v77x.nps, snapshots-v76x.nps, snapshots-v76x.nps, snapshots-v76x.nps
>
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This caused a (~30%) reduction in our red-line QPS (throughput).
> We think users should be given some way to opt in to this compression feature instead of it always being enabled, since it can have a substantial query-time cost, as we saw during our upgrade. [~mikemccand] suggested one possible approach: introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED and UNCOMPRESSED) and allowing users to create a custom Codec subclassing the default Codec and pick the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here's a related issue for adding a benchmark covering BINARY doc values query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]
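A rough sketch of the custom-Codec opt-out pattern described above ({{FilterCodec}} and {{Codec.getDefault()}} are real Lucene API; {{LegacyUncompressedDocValuesFormat}} is a hypothetical stand-in for a format with the pre-LUCENE-9211 behavior):

{noformat}
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.FilterCodec;

// Wrap the default Codec and swap in a doc-values format that skips
// BinaryDocValues compression; everything else delegates unchanged.
public final class UncompressedBdvCodec extends FilterCodec {
  private final DocValuesFormat dvFormat =
      new LegacyUncompressedDocValuesFormat(); // hypothetical format

  public UncompressedBdvCodec() {
    super("UncompressedBdvCodec", Codec.getDefault());
  }

  @Override
  public DocValuesFormat docValuesFormat() {
    return dvFormat;
  }
}
{noformat}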
[GitHub] [lucene-solr] mikemccand commented on pull request #1543: LUCENE-9378: Disable compression on binary values whose length is less than 32.
mikemccand commented on pull request #1543:
URL: https://github.com/apache/lucene-solr/pull/1543#issuecomment-724069358

> > But then I wonder why not just add a boolean compress option to Lucene80DocValuesFormat? This is similar to the compression Mode we pass to stored fields and term vectors format at write time, and it'd allow users who would like to disable BINARY doc values compression to keep backwards compatibility.
>
> I wanted to look into whether we could avoid this as it would boil down to maintaining two doc-value formats, but this might be the best way forward as it looks like the heuristics we tried out above don't work well to disable compression for use-cases when it hurts more than it helps.

+1. I'm afraid whether compression is a good idea for BDV or not is a very application specific tradeoff.
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2022: LUCENE-9004: KNN vector search using NSW graphs
mikemccand commented on a change in pull request #2022:
URL: https://github.com/apache/lucene-solr/pull/2022#discussion_r519881970

## File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java

@@ -0,0 +1,235 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.hnsw;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Random;
+import java.util.Set;
+import java.util.TreeSet;
+
+import org.apache.lucene.index.KnnGraphValues;
+import org.apache.lucene.index.VectorValues;
+
+import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
+import static org.apache.lucene.util.VectorUtil.dotProduct;
+import static org.apache.lucene.util.VectorUtil.squareDistance;
+
+/**
+ * Navigable Small-world graph. Provides efficient approximate nearest neighbor
+ * search for high dimensional vectors. See <a href="https://doi.org/10.1016/j.is.2013.10.006">Approximate nearest
+ * neighbor algorithm based on navigable small world graphs [2014]</a> and
+ * <a href="https://arxiv.org/abs/1603.09320">this paper [2018]</a> for details.
+ *
+ * This implementation is actually more like the one in the same authors' earlier 2014 paper in that
+ * there is no hierarchy (just one layer), and no fanout restriction on the graph: nodes are allowed to accumulate
+ * an unbounded number of outbound links, but it does incorporate some of the innovations of the later paper, like
+ * using a priority queue to perform a beam search while traversing the graph. The nomenclature is a bit different
+ * here from what's used in those papers:
+ *
+ * Hyperparameters
+ * <ul>
+ *   <li>numSeed is the equivalent of m in the 2014 paper; it controls the number of random entry points to sample.</li>
+ *   <li>beamWidth in {@link HnswGraphBuilder} has the same meaning as efConst in the 2018 paper. It is the number of
+ *   nearest neighbor candidates to track while searching the graph for each newly inserted node.</li>
+ *   <li>maxConn has the same meaning as M in the later paper; it controls how many of the efConst neighbors are
+ *   connected to the new node.</li>
+ *   <li>fanout - the fanout parameter of {@link VectorValues#search(float[], int, int)}
+ *   is used to control the values of numSeed and topK that are passed to this API.
+ *   Thus fanout is like a combination of ef (search beam width) from the 2018 paper and m from the 2014 paper.</li>
+ * </ul>
+ *
+ * Note: The graph may be searched by multiple threads concurrently, but updates are not thread-safe. Also note: there is no notion of
+ * deletions. Document searching built on top of this must do its own deletion-filtering.
+ */
+public final class HnswGraph {
+
+  // each entry lists the neighbors of a node, in node order
+  private final List<List<Integer>> graph;
+
+  HnswGraph() {

Review comment:

Yeah, +1 for fast follow!
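To illustrate the "priority queue to perform a beam search" idea from the javadoc, a deliberately simplified, self-contained sketch; the real implementation additionally bounds the candidate queue and stops expanding once candidates score worse than the current results:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.function.ToDoubleFunction;

class BeamSearchSketch {
  // Best-first traversal: always expand the highest-scoring unvisited node,
  // collecting up to beamWidth results.
  static List<Integer> search(Map<Integer, List<Integer>> graph,
                              ToDoubleFunction<Integer> score,
                              int entryPoint, int beamWidth) {
    PriorityQueue<Integer> candidates =
        new PriorityQueue<>(Comparator.comparingDouble(score).reversed());
    Set<Integer> visited = new HashSet<>();
    List<Integer> results = new ArrayList<>();
    candidates.add(entryPoint);
    visited.add(entryPoint);
    while (!candidates.isEmpty() && results.size() < beamWidth) {
      int node = candidates.poll();
      results.add(node);
      for (int neighbor : graph.getOrDefault(node, List.of())) {
        if (visited.add(neighbor)) { // true only the first time we see it
          candidates.add(neighbor);
        }
      }
    }
    return results;
  }
}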
[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228665#comment-17228665 ]

Adrien Grand commented on LUCENE-9378:
--------------------------------------

I'll be looking into it for 8.8.
[GitHub] [lucene-solr] msokolov merged pull request #2037: LUCENE-9583: extract separate RandomAccessVectorValues interface
msokolov merged pull request #2037:
URL: https://github.com/apache/lucene-solr/pull/2037
[jira] [Commented] (LUCENE-9583) How should we expose VectorValues.RandomAccess?
[ https://issues.apache.org/jira/browse/LUCENE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228677#comment-17228677 ]

ASF subversion and git services commented on LUCENE-9583:
---------------------------------------------------------

Commit 8be0cea5442c2edab260d0598b920ba832506f80 in lucene-solr's branch refs/heads/master from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8be0cea ]

LUCENE-9583: extract separate RandomAccessVectorValues interface (#2037)

> How should we expose VectorValues.RandomAccess?
> ------------------------------------------------
>
>                 Key: LUCENE-9583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9583
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael Sokolov
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the newly-added {{VectorValues}} API, we have a {{RandomAccess}} sub-interface. [~jtibshirani] pointed out this is not needed by some vector-indexing strategies which can operate solely using a forward iterator (it is needed by HNSW), and so in the interest of simplifying the public API we should not expose this internal detail (which, by the way, surfaces internal ordinals that are somewhat uninteresting outside the random access API).
> I looked into how to move this inside the HNSW-specific code and remembered that we do also currently make use of the RA API when merging vector fields over sorted indexes. Without it, we would need to load all vectors into RAM while flushing/merging, as we currently do in {{BinaryDocValuesWriter.BinaryDVs}}. I wonder if it's worth paying this cost for the simpler API.
> Another thing I noticed while reviewing this is that I moved the KNN {{search(float[] target, int topK, int fanout)}} method from {{VectorValues}} to {{VectorValues.RandomAccess}}. This I think we could move back, and handle the HNSW requirements for search elsewhere. I wonder if that would alleviate the major concern here?
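For readers following the discussion, a rough sketch of what a random-access view over per-ordinal vectors looks like (the method set here is an illustrative assumption; see the commit for the actual interface):

{noformat}
import java.io.IOException;

// Random access by dense ordinal, as HNSW graph traversal requires;
// forward-only indexing strategies can ignore this view entirely.
interface RandomAccessVectorValuesSketch {
  int size();                                      // number of vectors
  int dimension();                                 // dimension of each vector
  float[] vectorValue(int ord) throws IOException; // vector for ordinal ord
}
{noformat}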
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2022: LUCENE-9004: KNN vector search using NSW graphs
mikemccand commented on a change in pull request #2022: URL: https://github.com/apache/lucene-solr/pull/2022#discussion_r519895947 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90VectorReader.java ## @@ -22,39 +22,60 @@ import java.nio.FloatBuffer; import java.util.HashMap; import java.util.Map; +import java.util.Random; import org.apache.lucene.codecs.CodecUtil; import org.apache.lucene.codecs.VectorReader; import org.apache.lucene.index.CorruptIndexException; import org.apache.lucene.index.FieldInfo; import org.apache.lucene.index.FieldInfos; import org.apache.lucene.index.IndexFileNames; +import org.apache.lucene.index.KnnGraphValues; +import org.apache.lucene.index.RandomAccessVectorValues; +import org.apache.lucene.index.RandomAccessVectorValuesProducer; import org.apache.lucene.index.SegmentReadState; import org.apache.lucene.index.VectorValues; +import org.apache.lucene.search.ScoreDoc; import org.apache.lucene.search.TopDocs; +import org.apache.lucene.search.TotalHits; import org.apache.lucene.store.ChecksumIndexInput; +import org.apache.lucene.store.DataInput; import org.apache.lucene.store.IndexInput; import org.apache.lucene.util.BytesRef; import org.apache.lucene.util.IOUtils; import org.apache.lucene.util.RamUsageEstimator; +import org.apache.lucene.util.hnsw.HnswGraph; +import org.apache.lucene.util.hnsw.Neighbor; +import org.apache.lucene.util.hnsw.Neighbors; + +import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS; /** - * Reads vectors from the index segments. + * Reads vectors from the index segments along with index data structures supporting KNN search. * @lucene.experimental */ public final class Lucene90VectorReader extends VectorReader { private final FieldInfos fieldInfos; private final Map fields = new HashMap<>(); private final IndexInput vectorData; - private final int maxDoc; + private final IndexInput vectorIndex; + private final long checksumSeed; Lucene90VectorReader(SegmentReadState state) throws IOException { this.fieldInfos = state.fieldInfos; -this.maxDoc = state.segmentInfo.maxDoc(); -String metaFileName = IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, Lucene90VectorFormat.META_EXTENSION); +int versionMeta = readMetadata(state, Lucene90VectorFormat.META_EXTENSION); +long[] checksumRef = new long[1]; +vectorData = openDataInput(state, versionMeta, Lucene90VectorFormat.VECTOR_DATA_EXTENSION, Lucene90VectorFormat.VECTOR_DATA_CODEC_NAME, checksumRef); +vectorIndex = openDataInput(state, versionMeta, Lucene90VectorFormat.VECTOR_INDEX_EXTENSION, Lucene90VectorFormat.VECTOR_INDEX_CODEC_NAME, checksumRef); +checksumSeed = checksumRef[0]; + } + + private int readMetadata(SegmentReadState state, String fileExtension) throws IOException { +String metaFileName = IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, fileExtension); int versionMeta = -1; +long checksum = -1; Review comment: Hmm is this unused? ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90VectorReader.java ## @@ -277,24 +351,46 @@ public long cost() { } @Override -public RandomAccess randomAccess() { +public RandomAccessVectorValues randomAccess() { return new OffHeapRandomAccess(dataIn.clone()); } +@Override +public TopDocs search(float[] vector, int topK, int fanout) throws IOException { + // use a seed that is fixed for the index so we get reproducible results for the same query + final Random random = new Random(checksumSeed); Review comment: Clever seed! 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
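The fixed-seed idea praised above is worth a quick illustration: deriving the Random seed from a per-index checksum makes the graph search's random choices reproducible for a given index while still varying across indices. A minimal sketch of the idea (class and method names are illustrative, not the actual Lucene API):

```java
import java.util.Random;

// Illustrative sketch: seed search-time randomness from a per-index checksum
// so that repeated runs of the same query on the same index are reproducible.
class ReproducibleEntryPoints {
  private final long checksumSeed; // e.g. derived from a codec file checksum

  ReproducibleEntryPoints(long checksumSeed) {
    this.checksumSeed = checksumSeed;
  }

  int[] pickEntryPoints(int graphSize, int count) {
    Random random = new Random(checksumSeed); // same index -> same sequence
    int[] entryPoints = new int[count];
    for (int i = 0; i < count; i++) {
      entryPoints[i] = random.nextInt(graphSize);
    }
    return entryPoints;
  }
}
```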
[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues
[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228681#comment-17228681 ] Michael McCandless commented on LUCENE-9378: {quote}I'll be looking into it for 8.8. {quote} +1, thank you [~jpountz]! > Configurable compression for BinaryDocValues > > > Key: LUCENE-9378 > URL: https://issues.apache.org/jira/browse/LUCENE-9378 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Viral Gandhi >Priority: Major > Attachments: hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png, > hotspots-v76x.png, hotspots-v76x.png, hotspots-v77x.png, hotspots-v77x.png, > hotspots-v77x.png, hotspots-v77x.png, image-2020-06-12-22-17-30-339.png, > image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png, > image-2020-06-12-22-18-48-919.png, snapshot-v77x.nps, snapshot-v77x.nps, > snapshot-v77x.nps, snapshots-v76x.nps, snapshots-v76x.nps, snapshots-v76x.nps > > Time Spent: 4h 20m > Remaining Estimate: 0h > > Lucene 8.5.1 includes a change to always [compress > BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This > caused a (~30%) reduction in our red-line QPS (throughput). > We think users should be given some way to opt in to this compression > feature instead of it always being enabled, which can have a substantial query > time cost as we saw during our upgrade. [~mikemccand] suggested one possible > approach: introducing a *mode* in Lucene80DocValuesFormat (COMPRESSED and > UNCOMPRESSED) and allowing users to create a custom Codec that subclasses the > default Codec and picks the format they want. > The idea is similar to Lucene50StoredFieldsFormat, which has two modes, > Mode.BEST_SPEED and Mode.BEST_COMPRESSION. > Here's a related issue for adding a benchmark covering BINARY doc values > query-time performance - [https://github.com/mikemccand/luceneutil/issues/61] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
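The suggested opt-in could look roughly like the sketch below: a custom codec that swaps in an uncompressed doc values format. This assumes the `Lucene80DocValuesFormat.Mode` enum proposed in the issue (a hypothetical constructor at this point); SPI registration of the codec, required for the index to be readable, is omitted here.

```java
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat;

// Hypothetical sketch: a codec that opts out of BinaryDocValues compression,
// assuming the proposed Mode enum exists on Lucene80DocValuesFormat.
public class UncompressedDocValuesCodec extends FilterCodec {
  private final DocValuesFormat dvFormat =
      new Lucene80DocValuesFormat(Lucene80DocValuesFormat.Mode.UNCOMPRESSED);

  public UncompressedDocValuesCodec() {
    super("UncompressedDocValuesCodec", Codec.getDefault());
  }

  @Override
  public DocValuesFormat docValuesFormat() {
    return dvFormat;
  }
}
```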
[GitHub] [lucene-solr] asfgit closed pull request #972: SOLR-13452: Update the lucene-solr build from Ivy+Ant+Maven (shadow build) to Gradle.
asfgit closed pull request #972: URL: https://github.com/apache/lucene-solr/pull/972 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14991) tag and remove obsolete branches
[ https://issues.apache.org/jira/browse/SOLR-14991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228687#comment-17228687 ] Erick Erickson commented on SOLR-14991: --- I just did the gradle branches, with the exception of *reference_impl_gradle_updates*. I'll wait until tomorrow to do any more to see if anyone sees any problems so far. > tag and remove obsolete branches > > > Key: SOLR-14991 > URL: https://issues.apache.org/jira/browse/SOLR-14991 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > > I'm going to gradually work through the branches, tagging and removing > 1> anything with a Jira name that's fixed > 2> anything that I'm certain will never be fixed (e.g. the various gradle > build branches) > So the changes will still be available, they just won't pollute the branch list. > I'll list the branches here, all the tags will be > history/branches/lucene-solr/ > > This specifically will _not_ include > 1> any release, e.g. branch_8_4 > 2> anything I'm unsure about. People who've created branches should expect > some pings about this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
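For reference, the tag-then-delete flow described above typically looks like this (branch name illustrative):

```
git tag history/branches/lucene-solr/SOLR-XXXX origin/SOLR-XXXX
git push origin history/branches/lucene-solr/SOLR-XXXX
git push origin --delete SOLR-XXXX
```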
[GitHub] [lucene-solr] bruno-roustant commented on pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
bruno-roustant commented on pull request #2066: URL: https://github.com/apache/lucene-solr/pull/2066#issuecomment-724113057 @dsmiley I pushed a new commit that adds a no-op TransientSolrCoreCacheFactory used before SolrCores.load() is called. I'd like your opinion on whether to keep it or remove it. If SolrCores is used before SolrCores.load() is called, it results in a SolrException in getTransientCacheHandler(). This may affect current users who previously didn't have to call load() right after SolrCores creation. Do you think it could be a backward-incompatible behavior change? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values
[ https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated SOLR-14683: Attachment: SOLR-14683.patch > Review the metrics API to ensure consistent placeholders for missing values > --- > > Key: SOLR-14683 > URL: https://issues.apache.org/jira/browse/SOLR-14683 > Project: Solr > Issue Type: Improvement > Components: metrics >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Attachments: SOLR-14683.patch > > > Spin-off from SOLR-14657. Some gauges can legitimately be missing or in an > unknown state at some points in time, eg. during SolrCore startup or shutdown. > Currently the API returns placeholders with either impossible values for > numeric gauges (such as index size -1) or empty maps / strings for other > non-numeric gauges. > [~hossman] noticed that the values for these placeholders may be misleading, > depending on how the user treats them - if the client has no special logic to > treat them as "missing values" it may erroneously treat them as valid data. > E.g. numeric values of -1 or 0 may severely skew averages and produce > misleading peaks / valleys in metrics histories. > On the other hand returning a literal {{null}} value instead of the expected > number may also cause unexpected client issues - although in this case it's > clearer that there's actually no data available, so long-term this may be a > better strategy than returning impossible values, even if it means that the > client should learn to handle {{null}} values appropriately. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14951) Upgrade Angular JS 1.7.9 to 1.8.0
[ https://issues.apache.org/jira/browse/SOLR-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228695#comment-17228695 ] Kevin Risden commented on SOLR-14951: - Sigh I missed merging it. I just saw the review and will get it merged soon. > Upgrade Angular JS 1.7.9 to 1.8.0 > - > > Key: SOLR-14951 > URL: https://issues.apache.org/jira/browse/SOLR-14951 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Reporter: Kevin Risden >Assignee: Kevin Risden >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Angular JS released 1.8.0 to fix some security vulnerabilities. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14951) Upgrade Angular JS 1.7.9 to 1.8.0
[ https://issues.apache.org/jira/browse/SOLR-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Risden updated SOLR-14951: Fix Version/s: 8.8 > Upgrade Angular JS 1.7.9 to 1.8.0 > - > > Key: SOLR-14951 > URL: https://issues.apache.org/jira/browse/SOLR-14951 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Reporter: Kevin Risden >Assignee: Kevin Risden >Priority: Major > Fix For: 8.8 > > Time Spent: 10m > Remaining Estimate: 0h > > Angular JS released 1.8.0 to fix some security vulnerabilities. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r519944462 ## File path: solr/solr-ref-guide/src/learning-to-rank.adoc ## @@ -247,6 +254,81 @@ The output XML will include feature values as a comma-separated list, resembling }} +=== Running a Rerank Query Interleaving Two Models + +To rerank the results of a query, interleaving two models (myModelA, myModelB) add the `rq` parameter to your search, passing two models in input, for example: + +[source,text] +http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB reRankDocs=100}&fl=id,score + +To obtain the model that interleaving picked for a search result, computed during reranking, add `[interleaving]` to the `fl` parameter, for example: Review comment: question: if myModelA had `[ doc1, doc2, doc3 ]` document order and myModelB had `[ doc1, doc3, doc2 ]` document order i.e. there was agreement between the models re: the first document, will `[interleaving]` return (1) randomly `myModelA` or `myModelB` depending on how the picking actually happened or will it return (2) something else e.g. `myModelA,myModelB` (if myModelA actually picked and myModelB agreed) or `myModelB,myModelA` (if myModelB actually picked and myModelA agreed) or will it return (3) neither since in a way neither of them picked the document since they both agreed on it? answer-ish: from recalling the implementation the answer is (1) i think though from a user's perspective perhaps it might be nice to clarify that here somehow? a subtle aspect being (if i understand things right) that `[features]` and `[interleaving]` could both be requested in the `fl` and whilst myModelA and myModelB might have agreed that `doc1` should be the first document they might have used very different features to arrive at that conclusion and their `score` value could also differ. ## File path: solr/solr-ref-guide/src/learning-to-rank.adoc ## @@ -247,6 +254,81 @@ The output XML will include feature values as a comma-separated list, resembling }} +=== Running a Rerank Query Interleaving Two Models + +To rerank the results of a query, interleaving two models (myModelA, myModelB) add the `rq` parameter to your search, passing two models in input, for example: + +[source,text] +http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB reRankDocs=100}&fl=id,score + +To obtain the model that interleaving picked for a search result, computed during reranking, add `[interleaving]` to the `fl` parameter, for example: + +[source,text] +http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB reRankDocs=100}&fl=id,score,[interleaving] + +The output XML will include the model picked for each search result, resembling the output shown here: + +[source,json] + +{ + "responseHeader":{ +"status":0, +"QTime":0, +"params":{ + "q":"test", + "fl":"id,score,[interleaving]", + "rq":"{!ltr model=myModelA model=myModelB reRankDocs=100}"}}, + "response":{"numFound":2,"start":0,"maxScore":1.0005897,"docs":[ + { +"id":"GB18030TEST", +"score":1.0005897, +"[interleaving]":"myModelB"}, + { +"id":"UTF8TEST", +"score":0.79656565, +"[interleaving]":"myModelA"}] + }} + + +=== Running a Rerank Query Interleaving a model with the original ranking +When approaching Search Quality Evaluation with interleaving it may be useful to compare a model with the original ranking.
+To rerank the results of a query, interleaving a model with the original ranking, add the `rq` parameter to your search, with a model in input and activating the original ranking interleaving, for example: + + +[source,text] +http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModel model=_OriginalRanking_ reRankDocs=100}&fl=id,score Review comment: subjective: might `model=_OriginalRanking_ model=myModel` be more intuitive i.e. the 'from' baseline model on the left and the 'to' alternative model on the right? (i recall that the code had an "original ranking last" assumption before but if that's gone there's a possibility here to swap the order) ## File path: solr/solr-ref-guide/src/learning-to-rank.adoc ## @@ -418,6 +500,14 @@ Learning-To-Rank is a contrib module and therefore its plugins must be configure +* Declaration of the `[interleaving]` transformer. ++ +[source,xml] + + + Review comment: minor/subjective: could shorten since there's no parameters ``` ``` ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java ## @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
cpoerschke commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r519967315 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java ## @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.ltr.interleaving; + +import java.util.ArrayList; +import java.util.HashSet; +import java.util.LinkedHashSet; +import java.util.Random; +import java.util.Set; + +import org.apache.lucene.search.ScoreDoc; + +/** + * Interleaving was introduced the first time by Joachims in [1, 2]. + * Team Draft Interleaving is among the most successful and used interleaving approaches[3]. + * Here the authors implement a method similar to the way in which captains select their players in team-matches. + * Team Draft Interleaving produces a fair distribution of ranking models’ elements in the final interleaved list. + * It has also proved to overcome an issue of the previous implemented approach, Balanced interleaving, in determining the winning model[4]. Review comment: ```suggestion * "Team draft interleaving" has also proved to overcome an issue of the "Balanced interleaving" approach, in determining the winning model[4]. ``` Suggest to avoid the "previous implemented approach" wording since it could be misinterpreted to mean that Solr previously had a `BalancedInterleaving` class. ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java ## @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.ltr.interleaving; + +import java.util.ArrayList; +import java.util.HashSet; +import java.util.LinkedHashSet; +import java.util.Random; +import java.util.Set; + +import org.apache.lucene.search.ScoreDoc; + +/** + * Interleaving was introduced the first time by Joachims in [1, 2]. + * Team Draft Interleaving is among the most successful and used interleaving approaches[3]. 
+ * Here the authors implement a method similar to the way in which captains select their players in team-matches. Review comment: ```suggestion * Team Draft Interleaving implements a method similar to the way in which captains select their players in team-matches. ``` ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java ## @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.ltr.interleaving; + +import java.util.ArrayList; +import java.util.HashSet; +import java.util.LinkedHashSet; +import java.util.Random; +imp
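The class under review implements team-draft interleaving; a compact, self-contained sketch of the algorithm itself (illustrative, not the Solr class) may help make the review comments concrete:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Sketch of team-draft interleaving: the two rankings act as team captains
// taking turns; ties in picks-so-far are broken by a coin flip, and each
// captain drafts its highest-ranked document not yet drafted.
final class TeamDraftSketch {
  static <T> List<T> interleave(List<T> rankingA, List<T> rankingB, Random random) {
    List<T> interleaved = new ArrayList<>();
    Set<T> drafted = new HashSet<>();
    Iterator<T> itA = rankingA.iterator();
    Iterator<T> itB = rankingB.iterator();
    int picksA = 0, picksB = 0;
    while (itA.hasNext() || itB.hasNext()) {
      boolean aPicks;
      if (!itA.hasNext()) {
        aPicks = false;
      } else if (!itB.hasNext()) {
        aPicks = true;
      } else {
        aPicks = picksA < picksB || (picksA == picksB && random.nextBoolean());
      }
      Iterator<T> it = aPicks ? itA : itB;
      while (it.hasNext()) { // skip documents the other captain already drafted
        T doc = it.next();
        if (drafted.add(doc)) {
          interleaved.add(doc);
          if (aPicks) picksA++; else picksB++;
          break;
        }
      }
    }
    return interleaved;
  }
}
```

Because ties in picks-so-far are broken by a coin flip, a document both models rank first is credited to a random captain, which is exactly the behavior questioned in the earlier `[interleaving]` review comment.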
[jira] [Created] (SOLR-14992) TestPullReplicaErrorHandling.testCantConnectToPullReplica Failures
Tomas Eduardo Fernandez Lobbe created SOLR-14992: Summary: TestPullReplicaErrorHandling.testCantConnectToPullReplica Failures Key: SOLR-14992 URL: https://issues.apache.org/jira/browse/SOLR-14992 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Reporter: Tomas Eduardo Fernandez Lobbe I've noticed this test started failing very frequently with an error like: {noformat} Error Message: Error from server at http://127.0.0.1:39037/solr: Cannot create collection pull_replica_error_handling_test_cant_connect_to_pull_replica. Value of maxShardsPerNode is 1, and the number of nodes currently live or live and part of your createNodeSet is 3. This allows a maximum of 3 to be created. Value of numShards is 2, value of nrtReplicas is 1, value of tlogReplicas is 0 and value of pullReplicas is 1. This requires 4 shards to be created (higher than the allowed number) Stack Trace: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:39037/solr: Cannot create collection pull_replica_error_handling_test_cant_connect_to_pull_replica. Value of maxShardsPerNode is 1, and the number of nodes currently live or live and part of your createNodeSet is 3. This allows a maximum of 3 to be created. Value of numShards is 2, value of nrtReplicas is 1, value of tlogReplicas is 0 and value of pullReplicas is 1. This requires 4 shards to be created (higher than the allowed number) at __randomizedtesting.SeedInfo.seed([3D670DC4BEABD958:3550EB0C6505ADD6]:0) at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:681) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248) at org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:369) at org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:297) at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1173) at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:934) at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:866) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:231) at org.apache.solr.cloud.TestPullReplicaErrorHandling.testCantConnectToPullReplica(TestPullReplicaErrorHandling.java:149) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938) at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974) at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832) at com.carrots
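The arithmetic behind the failure: 2 shards x (1 NRT replica + 1 PULL replica) = 4 cores, while maxShardsPerNode=1 across 3 live nodes caps the collection at 3. A hedged SolrJ sketch of a create request that would satisfy the constraint (collection and config names are illustrative, not the test's actual values):

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

class CreateCollectionSketch {
  // 2 shards x (1 NRT + 1 PULL) = 4 cores; with 3 live nodes,
  // maxShardsPerNode must be at least ceil(4 / 3) = 2.
  static void createPullReplicaCollection(SolrClient client) throws Exception {
    CollectionAdminRequest
        .createCollection("pull_replica_test", "conf", 2, 1, 0, 1)
        .setMaxShardsPerNode(2)
        .process(client);
  }
}
```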
[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
muse-dev[bot] commented on a change in pull request #2066: URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r519994863 ## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java ## @@ -62,55 +51,44 @@ // to essentially queue them up to be handled via pendingCoreOps. private static final List pendingCloses = new ArrayList<>(); - private TransientSolrCoreCacheFactory transientCoreCache; + private TransientSolrCoreCacheFactory transientSolrCoreCacheFactory = TransientSolrCoreCacheFactory.NO_OP; - private TransientSolrCoreCache transientSolrCoreCache = null; - SolrCores(CoreContainer container) { this.container = container; } protected void addCoreDescriptor(CoreDescriptor p) { synchronized (modifyLock) { if (p.isTransient()) { -if (getTransientCacheHandler() != null) { - getTransientCacheHandler().addTransientDescriptor(p.getName(), p); -} else { - log.warn("We encountered a core marked as transient, but there is no transient handler defined. This core will be inaccessible"); -} +getTransientCacheHandler().addTransientDescriptor(p.getName(), p); } else { -residentDesciptors.put(p.getName(), p); +residentDescriptors.put(p.getName(), p); } } } protected void removeCoreDescriptor(CoreDescriptor p) { synchronized (modifyLock) { if (p.isTransient()) { -if (getTransientCacheHandler() != null) { - getTransientCacheHandler().removeTransientDescriptor(p.getName()); -} +getTransientCacheHandler().removeTransientDescriptor(p.getName()); } else { -residentDesciptors.remove(p.getName()); +residentDescriptors.remove(p.getName()); } } } public void load(SolrResourceLoader loader) { -transientCoreCache = TransientSolrCoreCacheFactory.newInstance(loader, container); +transientSolrCoreCacheFactory = TransientSolrCoreCacheFactory.newInstance(loader, container); } + // We are shutting down. You can't hold the lock on the various lists of cores while they shut down, so we need to // make a temporary copy of the names and shut them down outside the lock. protected void close() { waitForLoadingCoresToFinish(30*1000); Collection coreList = new ArrayList<>(); - -TransientSolrCoreCache transientSolrCoreCache = getTransientCacheHandler(); -// Release observer -if (transientSolrCoreCache != null) { - transientSolrCoreCache.close(); -} +// Release transient core cache. +getTransientCacheHandler().close(); Review comment: *THREAD_SAFETY_VIOLATION:* Read/Write race. Non-private method `SolrCores.close()` indirectly reads without synchronization from `this.transientSolrCoreCacheFactory`. Potentially races with write in method `SolrCores.load(...)`. Reporting because this access may occur on a background thread. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9322) Discussing a unified vectors format API
[ https://issues.apache.org/jira/browse/LUCENE-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228742#comment-17228742 ] ASF subversion and git services commented on LUCENE-9322: - Commit ec9a659845973a0dd0ee7c04e0075db818ed118d in lucene-solr's branch refs/heads/master from Michael McCandless [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ec9a659 ] LUCENE-9322: fix minor cosmetic refactoring error in logging string in IndexWriter's infoStream logging. It was always printing 'vector values' for all merging times instead of the other parts of Lucene index ('doc values', 'stored fields', etc.) > Discussing a unified vectors format API > --- > > Key: LUCENE-9322 > URL: https://issues.apache.org/jira/browse/LUCENE-9322 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Julie Tibshirani >Priority: Major > Fix For: master (9.0) > > Time Spent: 6h 40m > Remaining Estimate: 0h > > Two different approximate nearest neighbor approaches are currently being > developed, one based on HNSW (LUCENE-9004) and another based on coarse > quantization ([#LUCENE-9136]). Each prototype proposes to add a new format to > handle vectors. In LUCENE-9136 we discussed the possibility of a unified API > that could support both approaches. The two ANN strategies give different > trade-offs in terms of speed, memory, and complexity, and it’s likely that > we’ll want to support both. Vector search is also an active research area, > and it would be great to be able to prototype and incorporate new approaches > without introducing more formats. > To me it seems like a good time to begin discussing a unified API. The > prototype for coarse quantization > ([https://github.com/apache/lucene-solr/pull/1314]) could be ready to commit > soon (this depends on everyone's feedback of course). The approach is simple > and shows solid search performance, as seen > [here|https://github.com/apache/lucene-solr/pull/1314#issuecomment-608645326]. > I think this API discussion is an important step in moving that > implementation forward. > The goals of the API would be > # Support for storing and retrieving individual float vectors. > # Support for approximate nearest neighbor search -- given a query vector, > return the indexed vectors that are closest to it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
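The two goals listed in the issue can be read as a minimal API surface. A hypothetical sketch, not a committed Lucene interface:

```java
import java.io.IOException;

// Hypothetical shape of the unified API under discussion: random access to
// stored float vectors plus an approximate nearest-neighbor entry point.
interface VectorsSketch {
  float[] vectorValue(int doc) throws IOException;                  // goal 1: retrieval
  int[] searchNearest(float[] query, int topK) throws IOException;  // goal 2: ANN search
}
```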
[GitHub] [lucene-solr] jpountz opened a new pull request #2069: LUCENE-9378: Make it possible to configure how to trade speed for compression on doc values.
jpountz opened a new pull request #2069: URL: https://github.com/apache/lucene-solr/pull/2069 This adds a switch to `Lucene80DocValuesFormat` which allows configuring whether to prioritize retrieval speed over compression ratio or the other way around. When prioritizing retrieval speed, binary doc values are written using the exact same format as before the more aggressive compression was introduced. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
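Usage would presumably stay a one-liner at the codec level. A sketch assuming the 8.x package location of the codec (on master it lives under backward_codecs, as the diffs below show):

```java
import org.apache.lucene.codecs.lucene87.Lucene87Codec;
import org.apache.lucene.index.IndexWriterConfig;

// Sketch: opting in to the compression-friendly mode; BEST_SPEED stays the default.
class CompressionModeExample {
  static IndexWriterConfig bestCompressionConfig() {
    return new IndexWriterConfig()
        .setCodec(new Lucene87Codec(Lucene87Codec.Mode.BEST_COMPRESSION));
  }
}
```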
[jira] [Created] (LUCENE-9602) TestBackwardsCompatibility should test BEST_COMPRESSION
Adrien Grand created LUCENE-9602: Summary: TestBackwardsCompatibility should test BEST_COMPRESSION Key: LUCENE-9602 URL: https://issues.apache.org/jira/browse/LUCENE-9602 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand Currently we only test for backward compatibility indices created with BEST_SPEED. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2069: LUCENE-9378: Make it possible to configure how to trade speed for compression on doc values.
mikemccand commented on a change in pull request #2069: URL: https://github.com/apache/lucene-solr/pull/2069#discussion_r520024634 ## File path: lucene/core/src/test/org/apache/lucene/codecs/lucene80/BaseLucene80DocValuesFormatTestCase.java ## @@ -286,7 +278,7 @@ private void doTestTermsEnumRandom(int numDocs, Supplier valuesProducer) conf.setMergeScheduler(new SerialMergeScheduler()); // set to duel against a codec which has ordinals: final PostingsFormat pf = TestUtil.getPostingsFormatWithOrds(random()); -final DocValuesFormat dv = new Lucene80DocValuesFormat(); +final DocValuesFormat dv = getCodec().docValuesFormat(); Review comment: Will this randomize between the different `Mode` tradeoffs? ## File path: lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene87/Lucene87Codec.java ## @@ -56,6 +57,23 @@ * @lucene.experimental */ public class Lucene87Codec extends Codec { + + /** Configuration option for the codec. */ + public static enum Mode { +/** Trade compression ratio for retrieval speed. */ +BEST_SPEED(Lucene87StoredFieldsFormat.Mode.BEST_SPEED, Lucene80DocValuesFormat.Mode.BEST_SPEED), +/** Trade retrieval speed for compression ratio. */ +BEST_COMPRESSION(Lucene87StoredFieldsFormat.Mode.BEST_COMPRESSION, Lucene80DocValuesFormat.Mode.BEST_COMPRESSION); + +private final Lucene87StoredFieldsFormat.Mode storedMode; +private final Lucene80DocValuesFormat.Mode dvMode; + +private Mode(Lucene87StoredFieldsFormat.Mode storedMode, Lucene80DocValuesFormat.Mode dvMode) { Review comment: Nice! So we roll up the tradeoffs to Codec level which will then tell each format how to tradeoff. ## File path: lucene/core/src/test/org/apache/lucene/codecs/lucene80/TestBestSpeedLucene80DocValuesFormat.java ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.codecs.lucene80; + +import org.apache.lucene.codecs.Codec; +import org.apache.lucene.util.TestUtil; + +/** + * Tests Lucene80DocValuesFormat + */ +public class TestBestSpeedLucene80DocValuesFormat extends BaseLucene80DocValuesFormatTestCase { Review comment: Do we also have a dedicated `TestBestCompressedLucene80DocValuesFormat`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
dsmiley commented on a change in pull request #2066: URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r520029778 ## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java ## @@ -51,7 +51,7 @@ // to essentially queue them up to be handled via pendingCoreOps. private static final List pendingCloses = new ArrayList<>(); - private TransientSolrCoreCacheFactory transientSolrCoreCacheFactory; + private TransientSolrCoreCacheFactory transientSolrCoreCacheFactory = TransientSolrCoreCacheFactory.NO_OP; Review comment: Under what circumstance do we need this no-op impl to prevent an NPE? ## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java ## @@ -62,55 +51,44 @@ // to essentially queue them up to be handled via pendingCoreOps. private static final List pendingCloses = new ArrayList<>(); - private TransientSolrCoreCacheFactory transientCoreCache; + private TransientSolrCoreCacheFactory transientSolrCoreCacheFactory = TransientSolrCoreCacheFactory.NO_OP; - private TransientSolrCoreCache transientSolrCoreCache = null; - SolrCores(CoreContainer container) { this.container = container; } protected void addCoreDescriptor(CoreDescriptor p) { synchronized (modifyLock) { if (p.isTransient()) { -if (getTransientCacheHandler() != null) { - getTransientCacheHandler().addTransientDescriptor(p.getName(), p); -} else { - log.warn("We encountered a core marked as transient, but there is no transient handler defined. This core will be inaccessible"); -} +getTransientCacheHandler().addTransientDescriptor(p.getName(), p); } else { -residentDesciptors.put(p.getName(), p); +residentDescriptors.put(p.getName(), p); } } } protected void removeCoreDescriptor(CoreDescriptor p) { synchronized (modifyLock) { if (p.isTransient()) { -if (getTransientCacheHandler() != null) { - getTransientCacheHandler().removeTransientDescriptor(p.getName()); -} +getTransientCacheHandler().removeTransientDescriptor(p.getName()); } else { -residentDesciptors.remove(p.getName()); +residentDescriptors.remove(p.getName()); } } } public void load(SolrResourceLoader loader) { -transientCoreCache = TransientSolrCoreCacheFactory.newInstance(loader, container); +transientSolrCoreCacheFactory = TransientSolrCoreCacheFactory.newInstance(loader, container); } + // We are shutting down. You can't hold the lock on the various lists of cores while they shut down, so we need to // make a temporary copy of the names and shut them down outside the lock. protected void close() { waitForLoadingCoresToFinish(30*1000); Collection coreList = new ArrayList<>(); - -TransientSolrCoreCache transientSolrCoreCache = getTransientCacheHandler(); -// Release observer -if (transientSolrCoreCache != null) { - transientSolrCoreCache.close(); -} +// Release transient core cache. +getTransientCacheHandler().close(); Review comment: @bruno-roustant the muse bot makes a good point; there should be a synchronized(modifyLock) around grabbing getTransientCacheHandler and calling close on it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
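One way to address the reported race, following the suggestion above (field and method names taken from the PR diff; a sketch, not the committed fix):

```java
// Inside SolrCores.close(): read the factory-backed handler under modifyLock,
// then close it outside the lock, as is already done for the cores themselves.
protected void close() {
  waitForLoadingCoresToFinish(30 * 1000);
  TransientSolrCoreCache transientCache;
  synchronized (modifyLock) {
    transientCache = getTransientCacheHandler(); // guarded read of the factory
  }
  transientCache.close();
  // ... proceed with closing the remaining cores outside the lock ...
}
```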
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2069: LUCENE-9378: Make it possible to configure how to trade speed for compression on doc values.
mikemccand commented on a change in pull request #2069: URL: https://github.com/apache/lucene-solr/pull/2069#discussion_r520031264 ## File path: lucene/core/src/test/org/apache/lucene/codecs/lucene80/TestBestSpeedLucene80DocValuesFormat.java ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.codecs.lucene80; + +import org.apache.lucene.codecs.Codec; +import org.apache.lucene.util.TestUtil; + +/** + * Tests Lucene80DocValuesFormat + */ +public class TestBestSpeedLucene80DocValuesFormat extends BaseLucene80DocValuesFormatTestCase { Review comment: Oh nevermind I see you opened followon issue for this: https://issues.apache.org/jira/browse/LUCENE-9602 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #2069: LUCENE-9378: Make it possible to configure how to trade speed for compression on doc values.
jpountz commented on a change in pull request #2069: URL: https://github.com/apache/lucene-solr/pull/2069#discussion_r520033652 ## File path: lucene/core/src/test/org/apache/lucene/codecs/lucene80/TestBestSpeedLucene80DocValuesFormat.java ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.codecs.lucene80; + +import org.apache.lucene.codecs.Codec; +import org.apache.lucene.util.TestUtil; + +/** + * Tests Lucene80DocValuesFormat + */ +public class TestBestSpeedLucene80DocValuesFormat extends BaseLucene80DocValuesFormatTestCase { Review comment: You should see a `TestBestCompressedLucene80DocValuesFormat` file as well in this PR. I opened LUCENE-9602 specifically for backward compatibility: to make sure we check indices created with BEST_COMPRESSION into our source tree after every release, so that we have good bw compatibility coverage. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #2069: LUCENE-9378: Make it possible to configure how to trade speed for compression on doc values.
jpountz commented on a change in pull request #2069: URL: https://github.com/apache/lucene-solr/pull/2069#discussion_r520034757 ## File path: lucene/core/src/test/org/apache/lucene/codecs/lucene80/BaseLucene80DocValuesFormatTestCase.java ## @@ -286,7 +278,7 @@ private void doTestTermsEnumRandom(int numDocs, Supplier valuesProducer) conf.setMergeScheduler(new SerialMergeScheduler()); // set to duel against a codec which has ordinals: final PostingsFormat pf = TestUtil.getPostingsFormatWithOrds(random()); -final DocValuesFormat dv = new Lucene80DocValuesFormat(); +final DocValuesFormat dv = getCodec().docValuesFormat(); Review comment: It's not randomizing; we are testing both modes explicitly, via TestBestSpeedLucene80DocValuesFormat on one hand and TestBestCompressionLucene80DocValuesFormat on the other hand. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #2069: LUCENE-9378: Make it possible to configure how to trade speed for compression on doc values.
jpountz commented on a change in pull request #2069: URL: https://github.com/apache/lucene-solr/pull/2069#discussion_r520035432 ## File path: lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene87/Lucene87Codec.java ## @@ -56,6 +57,23 @@ * @lucene.experimental */ public class Lucene87Codec extends Codec { + + /** Configuration option for the codec. */ + public static enum Mode { +/** Trade compression ratio for retrieval speed. */ +BEST_SPEED(Lucene87StoredFieldsFormat.Mode.BEST_SPEED, Lucene80DocValuesFormat.Mode.BEST_SPEED), +/** Trade retrieval speed for compression ratio. */ +BEST_COMPRESSION(Lucene87StoredFieldsFormat.Mode.BEST_COMPRESSION, Lucene80DocValuesFormat.Mode.BEST_COMPRESSION); + +private final Lucene87StoredFieldsFormat.Mode storedMode; +private final Lucene80DocValuesFormat.Mode dvMode; + +private Mode(Lucene87StoredFieldsFormat.Mode storedMode, Lucene80DocValuesFormat.Mode dvMode) { Review comment: Right. It's still possible to make different choices for stored fields and doc values given that we allow configuration of doc values on a per-field basis, but this should at least keep simple use simple, with one switch that configures stored fields and doc values at the same time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
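The per-field escape hatch mentioned here could look roughly like the following sketch (it assumes this PR's Lucene80DocValuesFormat.Mode constructor and the 8.x package location; the field name is illustrative):

```java
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat;
import org.apache.lucene.codecs.lucene87.Lucene87Codec;

// Sketch: compress doc values by default, but keep one latency-critical
// field on the fast path by overriding the per-field hook.
class PerFieldModeCodec extends Lucene87Codec {
  private final DocValuesFormat fast =
      new Lucene80DocValuesFormat(Lucene80DocValuesFormat.Mode.BEST_SPEED);
  private final DocValuesFormat compact =
      new Lucene80DocValuesFormat(Lucene80DocValuesFormat.Mode.BEST_COMPRESSION);

  @Override
  public DocValuesFormat getDocValuesFormatForField(String field) {
    return "hot_field".equals(field) ? fast : compact;
  }
}
```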
[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
alessandrobenedetti commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r520038324 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ## @@ -210,50 +216,59 @@ public void setContext(ResultContext context) { } // Setup LTRScoringQuery - scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req); - docsWereNotReranked = (scoringQuery == null); - String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - if (docsWereNotReranked || (featureStoreName != null && (!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName() { -// if store is set in the transformer we should overwrite the logger - -final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); - -final FeatureStore store = fr.getFeatureStore(featureStoreName); -featureStoreName = store.getName(); // if featureStoreName was null before this gets actual name - -try { - final LoggingModel lm = new LoggingModel(loggingModelName, - featureStoreName, store.getFeatures()); - - scoringQuery = new LTRScoringQuery(lm, - LTRQParserPlugin.extractEFIParams(localparams), - true, - threadManager); // request feature weights to be created for all features - -}catch (final Exception e) { - throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, - "retrieving the feature store "+featureStoreName, e); -} - } + rerankingQueries = SolrQueryRequestContextUtils.getScoringQueries(req); - if (scoringQuery.getOriginalQuery() == null) { -scoringQuery.setOriginalQuery(context.getQuery()); + docsWereNotReranked = (rerankingQueries == null || rerankingQueries.length == 0); + if (docsWereNotReranked) { +rerankingQueries = new LTRScoringQuery[]{null}; } - if (scoringQuery.getFeatureLogger() == null){ -scoringQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); - } - scoringQuery.setRequest(req); - - featureLogger = scoringQuery.getFeatureLogger(); + modelWeights = new LTRScoringQuery.ModelWeight[rerankingQueries.length]; + String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); + for (int i = 0; i < rerankingQueries.length; i++) { +LTRScoringQuery scoringQuery = rerankingQueries[i]; +if ((scoringQuery == null || !(scoringQuery instanceof OriginalRankingLTRScoringQuery)) && (docsWereNotReranked || (featureStoreName != null && !featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName() { Review comment: So I just committed my changes on this, I spent quite a while thinking about the various scenarios, adjusting the code and adding the related tests. From your observations: - if both models are for the requested feature store then that's great and each document would have been picked by one of the models and so we use the feature vector already previously calculated by whatever model had picked the document. [OK] - if neither model is for the requested feature store then we need to create a logging model, is one logging model sufficient or do we need two? intuitively to me one would seem to be sufficient but that's based on partial analysis only so far. 
[One is sufficient, and my latest changes do that. Anyway, I just realized that the loggingModel is not heavy to create, and when getting the featureVector the cache is accessed; the key for that cache is based not on the instance but on the content of the classes, so two identical logging models would have matched in the feature vector cache. The change was probably not vital, but not harmful either.] - in the third scenario we still need the logging model, because when specifying a featureStore in the featureVector transformer we aim to extract all the features of that store, from the efi passed to the transformer; so when the store is explicitly mentioned, we need a logging model for both models again (also for the one whose store aligns) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
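The cache observation above, restated: the feature vector cache keys by value rather than by identity, so two separately constructed but identical logging models collide on the same entries. A toy illustration of the distinction (hypothetical key class, not Solr's actual cache key):

```java
import java.util.Objects;

// Toy key: equals/hashCode are defined over content, so two separately
// constructed but identical keys hit the same cache entry.
final class ModelCacheKey {
  final String modelName;
  final String featureStoreName;

  ModelCacheKey(String modelName, String featureStoreName) {
    this.modelName = modelName;
    this.featureStoreName = featureStoreName;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ModelCacheKey)) return false;
    ModelCacheKey other = (ModelCacheKey) o;
    return modelName.equals(other.modelName)
        && featureStoreName.equals(other.featureStoreName);
  }

  @Override
  public int hashCode() {
    return Objects.hash(modelName, featureStoreName);
  }
}
```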
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
bruno-roustant commented on a change in pull request #2066: URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r520040508 ## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java ## @@ -51,7 +51,7 @@ // to essentially queue them up to be handled via pendingCoreOps. private static final List pendingCloses = new ArrayList<>(); - private TransientSolrCoreCacheFactory transientSolrCoreCacheFactory; + private TransientSolrCoreCacheFactory transientSolrCoreCacheFactory = TransientSolrCoreCacheFactory.NO_OP; Review comment: When a SolrCores method that accesses getTransientCacheHandler() is called before load(): for example, an asynchronous periodic thread that is started with CoreContainer and calls getLoadedCoreNames(). This worked before even if SolrCores.load() was called afterwards (the first calls would return without counting transient cores, and subsequent calls after load() would count them). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
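The NO_OP default is the classic null-object pattern. A tiny sketch of the idea (hypothetical interface, not the actual Solr types):

```java
import java.util.Collections;
import java.util.Set;

// Null-object sketch: callers that race ahead of load() see an empty cache
// instead of an exception; load() later swaps in the real implementation.
interface TransientCoreCacheSketch {
  Set<String> getLoadedCoreNames();

  TransientCoreCacheSketch NO_OP = Collections::emptySet;
}
```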
[jira] [Updated] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values
[ https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated SOLR-14683: Attachment: SOLR-14683.patch > Review the metrics API to ensure consistent placeholders for missing values > --- > > Key: SOLR-14683 > URL: https://issues.apache.org/jira/browse/SOLR-14683 > Project: Solr > Issue Type: Improvement > Components: metrics >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Attachments: SOLR-14683.patch, SOLR-14683.patch > > > Spin-off from SOLR-14657. Some gauges can legitimately be missing or in an > unknown state at some points in time, eg. during SolrCore startup or shutdown. > Currently the API returns placeholders with either impossible values for > numeric gauges (such as index size -1) or empty maps / strings for other > non-numeric gauges. > [~hossman] noticed that the values for these placeholders may be misleading, > depending on how the user treats them - if the client has no special logic to > treat them as "missing values" it may erroneously treat them as valid data. > E.g. numeric values of -1 or 0 may severely skew averages and produce > misleading peaks / valleys in metrics histories. > On the other hand returning a literal {{null}} value instead of the expected > number may also cause unexpected client issues - although in this case it's > clearer that there's actually no data available, so long-term this may be a > better strategy than returning impossible values, even if it means that the > client should learn to handle {{null}} values appropriately. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values
[ https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228768#comment-17228768 ] Andrzej Bialecki commented on SOLR-14683: - This patch adds configurable placeholders for missing values of different types, all returning {{null}} by default. They are configured in {{solr.xml:solr/metrics/missingValues}} section, per Ref Guide doc (see example there). If there are no objections I'll commit this shortly. > Review the metrics API to ensure consistent placeholders for missing values > --- > > Key: SOLR-14683 > URL: https://issues.apache.org/jira/browse/SOLR-14683 > Project: Solr > Issue Type: Improvement > Components: metrics >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Attachments: SOLR-14683.patch, SOLR-14683.patch > > > Spin-off from SOLR-14657. Some gauges can legitimately be missing or in an > unknown state at some points in time, eg. during SolrCore startup or shutdown. > Currently the API returns placeholders with either impossible values for > numeric gauges (such as index size -1) or empty maps / strings for other > non-numeric gauges. > [~hossman] noticed that the values for these placeholders may be misleading, > depending on how the user treats them - if the client has no special logic to > treat them as "missing values" it may erroneously treat them as valid data. > E.g. numeric values of -1 or 0 may severely skew averages and produce > misleading peaks / valleys in metrics histories. > On the other hand returning a literal {{null}} value instead of the expected > number may also cause unexpected client issues - although in this case it's > clearer that there's actually no data available, so long-term this may be a > better strategy than returning impossible values, even if it means that the > client should learn to handle {{null}} values appropriately. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
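For readers without the patch handy, the configuration shape is roughly as follows; the property names come from the patch discussion, but treat the exact element syntax as an assumption and defer to the Ref Guide example in the patch:

```xml
<metrics>
  <missingValues>
    <null name="nullNumber"/>
    <int name="notANumber">-1</int>
    <str name="nullString"></str>
    <str name="nullObject">missing</str>
  </missingValues>
</metrics>
```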
[jira] [Commented] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other
[ https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228769#comment-17228769 ] Kevin Risden commented on SOLR-14973: - FYI [~tallison] - not sure who updated Tika libraries last :D I can help look at this I think. > Solr 8.6 is shipping libraries that are incompatible with each other > > > Key: SOLR-14973 > URL: https://issues.apache.org/jira/browse/SOLR-14973 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - Solr Cell (Tika extraction) >Affects Versions: 8.6 >Reporter: Samir Huremovic >Priority: Major > Labels: tika-parsers > > Hi, > since Solr 8.6 the version of {{tika-parsers}} was updated to {{1.24}}. This > version of {{tika-parsers}} needs the {{poi}} library in version {{4.1.2}} > (see https://issues.apache.org/jira/browse/TIKA-3047) > Solr has version {{4.1.1}} of poi included. > This creates (at least) a problem for parsing {{.xls}} files. The following > exception gets thrown by trying to post an {{.xls}} file in the techproducts > example: > {{java.lang.NoSuchMethodError: > org.apache.poi.hssf.record.common.UnicodeString.getExtendedRst()Lorg/apache/poi/hssf/record/common/ExtRst;}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
bruno-roustant commented on a change in pull request #2066: URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r520049002 ## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java ## @@ -62,55 +51,44 @@ // to essentially queue them up to be handled via pendingCoreOps. private static final List pendingCloses = new ArrayList<>(); - private TransientSolrCoreCacheFactory transientCoreCache; + private TransientSolrCoreCacheFactory transientSolrCoreCacheFactory = TransientSolrCoreCacheFactory.NO_OP; - private TransientSolrCoreCache transientSolrCoreCache = null; - SolrCores(CoreContainer container) { this.container = container; } protected void addCoreDescriptor(CoreDescriptor p) { synchronized (modifyLock) { if (p.isTransient()) { -if (getTransientCacheHandler() != null) { - getTransientCacheHandler().addTransientDescriptor(p.getName(), p); -} else { - log.warn("We encountered a core marked as transient, but there is no transient handler defined. This core will be inaccessible"); -} +getTransientCacheHandler().addTransientDescriptor(p.getName(), p); } else { -residentDesciptors.put(p.getName(), p); +residentDescriptors.put(p.getName(), p); } } } protected void removeCoreDescriptor(CoreDescriptor p) { synchronized (modifyLock) { if (p.isTransient()) { -if (getTransientCacheHandler() != null) { - getTransientCacheHandler().removeTransientDescriptor(p.getName()); -} +getTransientCacheHandler().removeTransientDescriptor(p.getName()); } else { -residentDesciptors.remove(p.getName()); +residentDescriptors.remove(p.getName()); } } } public void load(SolrResourceLoader loader) { -transientCoreCache = TransientSolrCoreCacheFactory.newInstance(loader, container); +transientSolrCoreCacheFactory = TransientSolrCoreCacheFactory.newInstance(loader, container); } + // We are shutting down. You can't hold the lock on the various lists of cores while they shut down, so we need to // make a temporary copy of the names and shut them down outside the lock. protected void close() { waitForLoadingCoresToFinish(30*1000); Collection coreList = new ArrayList<>(); - -TransientSolrCoreCache transientSolrCoreCache = getTransientCacheHandler(); -// Release observer -if (transientSolrCoreCache != null) { - transientSolrCoreCache.close(); -} +// Release transient core cache. +getTransientCacheHandler().close(); Review comment: +1 thanks muse! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
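The {{TransientSolrCoreCacheFactory.NO_OP}} default seen in this diff is the null-object pattern: the field is never null, so the warn-and-skip branches can be deleted. A minimal self-contained sketch of the idiom; the two interfaces are simplified stand-ins, not the real Solr classes.

```java
interface TransientCoreCache {
  void addTransientDescriptor(String name);
  void close();
}

interface TransientCoreCacheFactory {
  TransientCoreCache getTransientCacheHandler();

  // Null object: safe to call before load() has run, so call sites
  // no longer need "!= null" guards or warning branches.
  TransientCoreCacheFactory NO_OP = () -> new TransientCoreCache() {
    @Override public void addTransientDescriptor(String name) { /* intentionally empty */ }
    @Override public void close() { /* nothing to release */ }
  };
}
```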
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2069: LUCENE-9378: Make it possible to configure how to trade speed for compression on doc values.
mikemccand commented on a change in pull request #2069: URL: https://github.com/apache/lucene-solr/pull/2069#discussion_r520055341 ## File path: lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene87/Lucene87Codec.java ## @@ -56,6 +57,23 @@ * @lucene.experimental */ public class Lucene87Codec extends Codec { + + /** Configuration option for the codec. */ + public static enum Mode { +/** Trade compression ratio for retrieval speed. */ +BEST_SPEED(Lucene87StoredFieldsFormat.Mode.BEST_SPEED, Lucene80DocValuesFormat.Mode.BEST_SPEED), +/** Trade retrieval speed for compression ratio. */ +BEST_COMPRESSION(Lucene87StoredFieldsFormat.Mode.BEST_COMPRESSION, Lucene80DocValuesFormat.Mode.BEST_COMPRESSION); + +private final Lucene87StoredFieldsFormat.Mode storedMode; +private final Lucene80DocValuesFormat.Mode dvMode; + +private Mode(Lucene87StoredFieldsFormat.Mode storedMode, Lucene80DocValuesFormat.Mode dvMode) { Review comment: Great! Simple for common use cases ("I want best compression" or "I want fastest search"), and complex for complex use cases (I want separate control for each part of the index). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
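A short usage sketch of the mode switch under discussion, assuming the Lucene 8.x package location of the codec (on master the class shown in this diff lives under backward_codecs):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.codecs.lucene87.Lucene87Codec;
import org.apache.lucene.index.IndexWriterConfig;

public class CodecModeExample {
  public static void main(String[] args) {
    IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
    // A single Mode choice now covers both stored fields and doc values.
    config.setCodec(new Lucene87Codec(Lucene87Codec.Mode.BEST_COMPRESSION));
  }
}
```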
[GitHub] [lucene-solr] alessandrobenedetti commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank
alessandrobenedetti commented on a change in pull request #1571: URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r520055346 ## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java ## @@ -210,50 +216,59 @@ public void setContext(ResultContext context) { } // Setup LTRScoringQuery - scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req); - docsWereNotReranked = (scoringQuery == null); - String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); - if (docsWereNotReranked || (featureStoreName != null && (!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName() { -// if store is set in the transformer we should overwrite the logger - -final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore()); - -final FeatureStore store = fr.getFeatureStore(featureStoreName); -featureStoreName = store.getName(); // if featureStoreName was null before this gets actual name - -try { - final LoggingModel lm = new LoggingModel(loggingModelName, - featureStoreName, store.getFeatures()); - - scoringQuery = new LTRScoringQuery(lm, - LTRQParserPlugin.extractEFIParams(localparams), - true, - threadManager); // request feature weights to be created for all features - -}catch (final Exception e) { - throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, - "retrieving the feature store "+featureStoreName, e); -} - } + rerankingQueries = SolrQueryRequestContextUtils.getScoringQueries(req); - if (scoringQuery.getOriginalQuery() == null) { -scoringQuery.setOriginalQuery(context.getQuery()); + docsWereNotReranked = (rerankingQueries == null || rerankingQueries.length == 0); + if (docsWereNotReranked) { +rerankingQueries = new LTRScoringQuery[]{null}; } - if (scoringQuery.getFeatureLogger() == null){ -scoringQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) ); - } - scoringQuery.setRequest(req); - - featureLogger = scoringQuery.getFeatureLogger(); + modelWeights = new LTRScoringQuery.ModelWeight[rerankingQueries.length]; + String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req); + for (int i = 0; i < rerankingQueries.length; i++) { +LTRScoringQuery scoringQuery = rerankingQueries[i]; +if ((scoringQuery == null || !(scoringQuery instanceof OriginalRankingLTRScoringQuery)) && (docsWereNotReranked || (featureStoreName != null && !featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName() { Review comment: Actually taking a deeper look to the third point, the original implementation was not extracting all the features, but if the explicit featureStore was matching the model featureStore, it was using the model one (no logger). So I agree with you, in our case, we want to use the model already existent and no logger at all. I am going to clean up that bit and do a new commit tomorrow This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
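To make the third point concrete, a hedged sketch of the branch being described; the method and store names are simplified stand-ins for the LTR classes, not the actual implementation.

```java
public class FeatureStoreChoice {
  /**
   * A dedicated LoggingModel is only needed when an explicitly requested
   * feature store differs from the store the reranking model already uses;
   * otherwise the model's own features (and no logger) are reused.
   */
  static boolean needsLoggingModel(String requestedStore, String modelStore) {
    return requestedStore != null && !requestedStore.equals(modelStore);
  }

  public static void main(String[] args) {
    System.out.println(needsLoggingModel(null, "storeA"));     // false: reuse the model
    System.out.println(needsLoggingModel("storeA", "storeA")); // false: same store
    System.out.println(needsLoggingModel("storeB", "storeA")); // true: different store
  }
}
```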
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
dsmiley commented on a change in pull request #2066: URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r520057072 ## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java ## @@ -51,7 +51,7 @@ // to essentially queue them up to be handled via pendingCoreOps. private static final List pendingCloses = new ArrayList<>(); - private TransientSolrCoreCacheFactory transientSolrCoreCacheFactory; + private TransientSolrCoreCacheFactory transientSolrCoreCacheFactory = TransientSolrCoreCacheFactory.NO_OP; Review comment: then SolrCores.load could be called much sooner, basically right after the resourceLoader is ready. Also, maybe this other thread ought to wait to start till some later time. Perhaps ideally there would be an event publishing mechanism, which doesn't exist currently, I know. Or alternatively just have some CountDownLatch signal, like signaling when Solr will begin loading cores. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
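A minimal sketch of the CountDownLatch signal mentioned above, under the assumption that the container counts the latch down just before core loading begins; all names are illustrative.

```java
import java.util.concurrent.CountDownLatch;

public class CoreLoadingSignal {
  private final CountDownLatch coresWillLoad = new CountDownLatch(1);

  /** Called by the container right before it starts loading cores. */
  void signalCoreLoading() {
    coresWillLoad.countDown();
  }

  /** Called by the background thread that must not start too early. */
  void awaitCoreLoading() throws InterruptedException {
    coresWillLoad.await();
  }
}
```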
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
dsmiley commented on a change in pull request #2066: URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r520058820 ## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java ## @@ -536,7 +538,9 @@ public void queueCoreToClose(SolrCore coreToClose) { * @return the cache holding the transient cores; never null. */ public TransientSolrCoreCache getTransientCacheHandler() { Review comment: I think it should be a prerequisite that the caller acquire the lock a priori. For example, close() seems to need to keep holding this lock to call close() on it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14973) Solr 8.6 is shipping libraries that are incompatible with each other
[ https://issues.apache.org/jira/browse/SOLR-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228776#comment-17228776 ] Tim Allison commented on SOLR-14973: Thank you [~krisden] for the ping. > Solr 8.6 is shipping libraries that are incompatible with each other > > > Key: SOLR-14973 > URL: https://issues.apache.org/jira/browse/SOLR-14973 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - Solr Cell (Tika extraction) >Affects Versions: 8.6 >Reporter: Samir Huremovic >Priority: Major > Labels: tika-parsers > > Hi, > since Solr 8.6 the version of {{tika-parsers}} was updated to {{1.24}}. This > version of {{tika-parsers}} needs the {{poi}} library in version {{4.1.2}} > (see https://issues.apache.org/jira/browse/TIKA-3047) > Solr has version {{4.1.1}} of poi included. > This creates (at least) a problem for parsing {{.xls}} files. The following > exception gets thrown by trying to post an {{.xls}} file in the techproducts > example: > {{java.lang.NoSuchMethodError: > org.apache.poi.hssf.record.common.UnicodeString.getExtendedRst()Lorg/apache/poi/hssf/record/common/ExtRst;}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
bruno-roustant commented on a change in pull request #2066: URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r520089739 ## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java ## @@ -536,7 +538,9 @@ public void queueCoreToClose(SolrCore coreToClose) { * @return the cache holding the transient cores; never null. */ public TransientSolrCoreCache getTransientCacheHandler() { Review comment: I don't think so. Let's say transientSolrCoreCacheFactory = A. If there is a race and load() is called between getTransientCacheHandler() and TransientSolrCoreCache.close(), then getTransientCacheHandler() returns A, load() sets B, and A.close() is called. That is the same result as executing { getTransientCacheHandler().close() } atomically on A and then calling load() with B. But actually it doesn't matter. So I can add a synchronized (modifyLock) block around it if you prefer. I'll keep the synchronized inside getTransientCacheHandler() anyway, because it is public. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
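For reference, a compact sketch of the locking variant settled on here: holding modifyLock across both the lookup and the close() call makes the pair atomic with respect to load(). The field and lock names mirror the diff, but the types are simplified stand-ins.

```java
class SolrCoresSketch {
  interface Handler { void close(); }

  private final Object modifyLock = new Object();
  private Handler transientCacheHandler = () -> { /* NO_OP default */ };

  void close() {
    synchronized (modifyLock) {
      // Lookup and close cannot interleave with a concurrent load().
      transientCacheHandler.close();
    }
  }

  void load(Handler fresh) {
    synchronized (modifyLock) {
      transientCacheHandler = fresh;
    }
  }
}
```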
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2066: SOLR-14975: Optimize CoreContainer.getAllCoreNames and getLoadedCoreNames.
dsmiley commented on a change in pull request #2066: URL: https://github.com/apache/lucene-solr/pull/2066#discussion_r520092229 ## File path: solr/core/src/java/org/apache/solr/core/SolrCores.java ## @@ -536,7 +538,9 @@ public void queueCoreToClose(SolrCore coreToClose) { * @return the cache holding the transient cores; never null. */ public TransientSolrCoreCache getTransientCacheHandler() { Review comment: Okay; I'm fine with the lock to be safe. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on pull request #2020: SOLR-14949: Ability to customize Solr Docker build
dweiss commented on pull request #2020: URL: https://github.com/apache/lucene-solr/pull/2020#issuecomment-724260281 Hmm... sorry, missed your mention/request somehow. Yeah - these functions are intended to read variables from multiple locations so it looks ok to me. I didn't test it (or use docker much for that matter). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] HoustonPutman commented on pull request #2020: SOLR-14949: Ability to customize Solr Docker build
HoustonPutman commented on pull request #2020: URL: https://github.com/apache/lucene-solr/pull/2020#issuecomment-724264582 Thanks for the sanity check! I've tested pretty thoroughly, and the PR test does some checks on its own too. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14993) Unable to download zookeeper files of 1byte in size
Allen Sooredoo created SOLR-14993: - Summary: Unable to download zookeeper files of 1byte in size Key: SOLR-14993 URL: https://issues.apache.org/jira/browse/SOLR-14993 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud, SolrJ Affects Versions: 8.5.1 Reporter: Allen Sooredoo When downloading a file from Zookeeper using the Solrj client, files of size 1 byte are ignored. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
dweiss commented on pull request #2068: URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-724283052 Zach I've cleaned up the native build a bit - moved it under lucene/misc, added Windows build (it does build the native library for me). I didn't check whether it works on a Mac but I suspect it should. I also left the native project included by default in settings (removed the "optional" flag). Gradle's cpp plugin ignores the project on platforms not explicitly mentioned in the targetMachines - I am curious whether we'll blow up something or if it's just going to work. While I don't particularly like having native code in Lucene, I think it's better than it used to be (mixed cpp code with java code, etc.). I allowed myself to commit directly to your fork, hope you don't mind (please test it out!). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
dweiss commented on a change in pull request #2068: URL: https://github.com/apache/lucene-solr/pull/2068#discussion_r520125532 ## File path: lucene/misc/native/build.gradle ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * This gets separated out from misc module into a native module due to incompatibility between cpp-library and java-library plugins. + * For details, please see https://github.com/gradle/gradle-native/issues/352#issuecomment-461724948 + */ +import org.apache.tools.ant.taskdefs.condition.Os + +description = 'Module for native code' + +apply plugin: 'cpp-library' + +library { + baseName = 'NativePosixUtil' Review comment: I wonder if we should rename the resulting native library something more specific... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jtibshirani opened a new pull request #2070: LUCENE-9536: Correct the OrdinalMap optimization.
jtibshirani opened a new pull request #2070: URL: https://github.com/apache/lucene-solr/pull/2070 Previously we only checked that the first segment's ordinal deltas were all zero. This didn't account for some rare cases where some of the segment's ordinals are filtered out, so the ordinals aren't contiguous. In these cases we fill in dummy values for the missing ordinal deltas. So a segment's ordinals can appear to match the global ordinals perfectly, but not actually contain all the terms. Such a case can arise when using a FilteredTermsEnum, for example when merging a segment with deletions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
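A hedged sketch of the stricter condition this fix implies: all-zero deltas alone do not prove the segment covers the whole global ordinal space. The names below are illustrative, not the actual OrdinalMap internals.

```java
public class OrdinalCheck {
  static boolean segmentMatchesGlobalOrds(long[] ordDeltas,
                                          long segmentValueCount,
                                          long globalValueCount) {
    for (long delta : ordDeltas) {
      if (delta != 0) {
        return false;
      }
    }
    // All-zero deltas are not enough: a FilteredTermsEnum (e.g. when merging
    // a segment with deletions) can skip terms while the recorded deltas for
    // the surviving ordinals still read as zero, so also require that the
    // segment actually contains every global term.
    return segmentValueCount == globalValueCount;
  }
}
```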
[GitHub] [lucene-solr] dweiss commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
dweiss commented on pull request #2068: URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-724287441 Btw. if we do have to add that explicit 'build.native' option, then I'd implement it as a task (graph) exclusion rather than a project exclusion. Windows users in particular may complain, as the plugin requires Visual Studio... https://docs.gradle.org/current/userguide/building_cpp_projects.html#sec:cpp_supported_tool_chain So something like this on all of the native project's tasks (conditionally): https://discuss.gradle.org/t/removing-tasks-from-taskgraph-remove-a-task-dependency/394 or filter them out entirely. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jtibshirani commented on pull request #2070: LUCENE-9536: Correct the OrdinalMap optimization.
jtibshirani commented on pull request #2070: URL: https://github.com/apache/lucene-solr/pull/2070#issuecomment-724288495 This should fix the failures we're seeing like `TestLucene70DocValuesFormat#testSparseSortedVariableLengthVsStoredFields` and `TestSimpleTextDocValuesFormat#testSparseSortedFixedLengthVsStoredFields`. Note to self: run the whole test suite a bunch of times when changing subtle logic!! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14994) Bring in Solr Operator into the Lucene project
Anshum Gupta created SOLR-14994: --- Summary: Bring in Solr Operator into the Lucene project Key: SOLR-14994 URL: https://issues.apache.org/jira/browse/SOLR-14994 Project: Solr Issue Type: Task Security Level: Public (Default Security Level. Issues are Public) Reporter: Anshum Gupta Assignee: Anshum Gupta Solr Operator project codebase is currently in the process of being donated to the Apache Lucene project. This is an umbrella JIRA to track the progress and tasks associated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14788) Solr: The Next Big Thing
[ https://issues.apache.org/jira/browse/SOLR-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228852#comment-17228852 ] Mark Robert Miller commented on SOLR-14788: --- This Overseer work (release it from its birth into heavy historic tech debt), on top of the general state of everything else near this state, requires that I really work step by step through the system and what it does - the first time I saw things from that state, that perspective, I realized we don’t have adequate developer / user logging, it’s really not sufficient at all, and so you have to start adding info and debug logging, and that is very, very useful. I didn’t really just come to understand this wide area, but having to work through so much to “re-master” it, the logging I need becomes evident as I learned I needed it. So this time I’m not doing a great job. I’m adding here and there, over logging where I have to clean up, blah, blah. The takeaway really is that our system is actually fairly simple, but only if you axe the decade old baggage and realign some implementations. Once the foundation is stable, there is high value in nailing the logging. It’s the key to letting more real help in, it’s the key for efficient test and user and support debugging. We log so much data, we over log data and this thing and that thing. We should not over log data by default and we should log system flow really well and it will be a really big deal. > Solr: The Next Big Thing > > > Key: SOLR-14788 > URL: https://issues.apache.org/jira/browse/SOLR-14788 > Project: Solr > Issue Type: Task >Reporter: Mark Robert Miller >Assignee: Mark Robert Miller >Priority: Critical > > h3. > [!https://www.unicode.org/consortium/aacimg/1F46E.png!|https://www.unicode.org/consortium/adopted-characters.html#b1F46E]{color:#00875a}*The > Policeman is on duty!*{color} > {quote}_{color:#de350b}*When The Policeman is on duty, sit back, relax, and > have some fun. Try to make some progress. Don't stress too much about the > impact of your changes or maintaining stability and performance and > correctness so much. Until the end of phase 1, I've got your back. I have a > variety of tools and contraptions I have been building over the years and I > will continue training them on this branch. I will review your changes and > peer out across the land and course correct where needed. As Mike D will be > thinking, "Sounds like a bottleneck Mark." And indeed it will be to some > extent. Which is why once stage one is completed, I will flip The Policeman > to off duty. When off duty, I'm always* {color:#de350b}*occasionally*{color} > *down for some vigilante justice, but I won't be walking the beat, all that > stuff about sit back and relax goes out the window.*{color}_ > {quote} > > I have stolen this title from Ishan or Noble and Ishan. > This issue is meant to capture the work of a small team that is forming to > push Solr and SolrCloud to the next phase. > I have kicked off the work with an effort to create a very fast and solid > base. That work is not 100% done, but it's ready to join the fight. > Tim Potter has started giving me a tremendous hand in finishing up. Ishan and > Noble have already contributed support and testing and have plans for > additional work to shore up some of our current shortcomings. > Others have expressed an interest in helping and hopefully they will pop up > here as well. > Let's organize and discuss our efforts here and in various sub issues.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-14788) Solr: The Next Big Thing
[ https://issues.apache.org/jira/browse/SOLR-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228852#comment-17228852 ] Mark Robert Miller edited comment on SOLR-14788 at 11/9/20, 11:16 PM: -- This Overseer work (released from its birth into heavy historic tech debt), on top of the general state of everything else near this state, requires that I really work step by step through the system and what it does - the first time I saw things from that state, that perspective, I realized we don’t have adequate developer / user logging, it’s really not sufficient at all, and so you have to start adding info and debug logging, and that is very, very useful. I didn’t really just come to understand this wide area, but having to work through so much to “re-master” it, the logging I need becomes evident as I learned I needed it and we could just be 100x more helpful than we are. So this time I’m not doing a great job. I’m adding here and there, over logging where I have to clean up, favoring finishing over a little paint outside the lines, blah, blah. The takeaway really is that our system is actually fairly simple, but only if you axe the decade old baggage and realign some implementations. Once the foundation is stable, there is high value in nailing the logging. It’s the key to letting more real help in, it’s the key for efficient test and user and support debugging. We log so much data, we over log data and this thing and that thing. We should not over log data by default and we should log system flow really well and it will be a really big deal. was (Author: markrmiller): This Overseer work (release it from its birth into heavy historic tech debt), on top of the general state of everything else near this state, requires that i really work step by step through the system and what it does - the first time I saw things from that state, that perspective, I realized we don’t have adequate developer / user log, it’s really not sufficient at all, and so you have to start adding info and debug logging, and that is very, very useful. I didn’t really just come to understand this wide area, but having to work through so much to “re-master” it, the logging i need becomes evident as I learned I needed it. So this time I’m not doing a great job. I’m adding here and there, over logging whee I have to clean up, blah, blah. The takeaway really is that our system is actually fairly simple, but only if you axe the decade old baggage and realign some implementations. Once the foundation is stable, there is high value in nailing the logging. It’s the key to letting more real help in, it’s the key for efficient test and user and support debugging. We log so much data, we over log data and this thing and that thing. We should not over log data by default and we should log system flow really well and it will be a really big deal. > Solr: The Next Big Thing > > > Key: SOLR-14788 > URL: https://issues.apache.org/jira/browse/SOLR-14788 > Project: Solr > Issue Type: Task >Reporter: Mark Robert Miller >Assignee: Mark Robert Miller >Priority: Critical > > h3. > [!https://www.unicode.org/consortium/aacimg/1F46E.png!|https://www.unicode.org/consortium/adopted-characters.html#b1F46E]{color:#00875a}*The > Policeman is on duty!*{color} > {quote}_{color:#de350b}*When The Policeman is on duty, sit back, relax, and > have some fun. Try to make some progress. Don't stress too much about the > impact of your changes or maintaining stability and performance and > correctness so much. Until the end of phase 1, I've got your back. I have a > variety of tools and contraptions I have been building over the years and I > will continue training them on this branch. I will review your changes and > peer out across the land and course correct where needed. As Mike D will be > thinking, "Sounds like a bottleneck Mark." And indeed it will be to some > extent. Which is why once stage one is completed, I will flip The Policeman > to off duty. When off duty, I'm always* {color:#de350b}*occasionally*{color} > *down for some vigilante justice, but I won't be walking the beat, all that > stuff about sit back and relax goes out the window.*{color}_ > {quote} > > I have stolen this title from Ishan or Noble and Ishan. > This issue is meant to capture the work of a small team that is forming to > push Solr and SolrCloud to the next phase. > I have kicked off the work with an effort to create a very fast and solid > base. That work is not 100% done, but it's ready to join the fight. > Tim Potter has started giving me a tremendous hand in finishing up. Ishan and > Noble have already contributed support and testing and have plans for > additional work to shore up some of our current shortcomings. > Others have expressed an interest in helping and hopefully they will pop up > here as well. > Let's organize and discuss our efforts here and in various sub issues.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #2067: SOLR-14987: Reuse HttpSolrClient per node vs. one per Solr core when using CloudSolrStream
madrob commented on a change in pull request #2067: URL: https://github.com/apache/lucene-solr/pull/2067#discussion_r520198395 ## File path: solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/CloudSolrStream.java ## @@ -334,11 +334,6 @@ private StreamComparator parseComp(String sort, String fl) throws IOException { public static Slice[] getSlices(String collectionName, ZkStateReader zkStateReader, boolean checkAlias) throws IOException { ClusterState clusterState = zkStateReader.getClusterState(); -Map collectionsMap = clusterState.getCollectionsMap(); Review comment: Related: can we update the javadoc on clusterState.getCollectionsMap to be more explicit that it _will_ make a call to zk, instead of the current _may_? ## File path: solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/CloudSolrStream.java ## @@ -334,11 +334,6 @@ private StreamComparator parseComp(String sort, String fl) throws IOException { public static Slice[] getSlices(String collectionName, ZkStateReader zkStateReader, boolean checkAlias) throws IOException { ClusterState clusterState = zkStateReader.getClusterState(); -Map collectionsMap = clusterState.getCollectionsMap(); - -//TODO we should probably split collection by comma to query more than one -// which is something already supported in other parts of Solr - // check for alias or collection Review comment: Should we cache the value of `zkStateReader.getAliases` below to avoid volatile reads? ## File path: solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/SolrStream.java ## @@ -126,6 +135,17 @@ public void open() throws IOException { } } + private String getNodeUrl() { Review comment: Can we precompute this in the constructor? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
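Two of these suggestions boil down to the same pattern: compute once, reuse. A hedged sketch with illustrative names; the real SolrJ classes differ.

```java
class SolrStreamSketch {
  private final String nodeUrl; // derived once in the constructor, not on every open()

  SolrStreamSketch(String baseUrl, String coreName) {
    this.nodeUrl = baseUrl + "/" + coreName; // hypothetical derivation
  }

  String getNodeUrl() {
    return nodeUrl;
  }

  // Cache the volatile-backed getAliases() result in a local and reuse it,
  // instead of re-reading the field for every lookup.
  static void resolveAll(ZkStateReaderSketch reader, String... names) {
    AliasesSketch aliases = reader.getAliases(); // single volatile read
    for (String name : names) {
      aliases.resolve(name);
    }
  }

  interface ZkStateReaderSketch { AliasesSketch getAliases(); }
  interface AliasesSketch { String resolve(String collectionName); }
}
```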
[GitHub] [lucene-solr] jtibshirani opened a new pull request #2071: LUCENE-9322: Some fixes to SimpleTextVectorFormat.
jtibshirani opened a new pull request #2071: URL: https://github.com/apache/lucene-solr/pull/2071 * Make sure the files are unique by renaming the term vectors extension to `tvc`. * Fix a bug where reading a vector would drop the leading digit of the first element. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jtibshirani commented on pull request #2071: LUCENE-9322: Some fixes to SimpleTextVectorFormat.
jtibshirani commented on pull request #2071: URL: https://github.com/apache/lucene-solr/pull/2071#issuecomment-724386681 I found these issues while fixing the following failing test: ``` ./gradlew test --tests TestSortingCodecReader.testSortOnAddIndicesRandom -Dtests.seed=B38EBA45728D5FB1 ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] msokolov commented on a change in pull request #2071: LUCENE-9322: Some fixes to SimpleTextVectorFormat.
msokolov commented on a change in pull request #2071: URL: https://github.com/apache/lucene-solr/pull/2071#discussion_r520253512 ## File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextVectorReader.java ## @@ -245,8 +245,8 @@ private void readAllVectors() throws IOException { private void readVector(float[] value) throws IOException { SimpleTextUtil.readLine(in, scratch); - // skip leading " [" and strip trailing "]" - String s = new BytesRef(scratch.bytes(), 2, scratch.length() - 3).utf8ToString(); + // skip leading "[" and strip trailing "]" + String s = new BytesRef(scratch.bytes(), 1, scratch.length() - 2).utf8ToString(); Review comment: Wow, how did this ever work; we must never have tested it. grr. Thank you for cleaning up! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
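The off-by-one is easy to see in isolation. A small self-contained check, assuming only lucene-core on the classpath for BytesRef:

```java
import java.nio.charset.StandardCharsets;
import org.apache.lucene.util.BytesRef;

public class VectorParseCheck {
  public static void main(String[] args) {
    byte[] line = "[0.5, 1.5]".getBytes(StandardCharsets.UTF_8);

    // Buggy: skipping two leading chars (" [") drops the first digit.
    String buggy = new BytesRef(line, 2, line.length - 3).utf8ToString();
    System.out.println(buggy); // ".5, 1.5"

    // Fixed: skip only "[" and strip the trailing "]".
    String fixed = new BytesRef(line, 1, line.length - 2).utf8ToString();
    System.out.println(fixed); // "0.5, 1.5"
  }
}
```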
[GitHub] [lucene-solr] msokolov merged pull request #2071: LUCENE-9322: Some fixes to SimpleTextVectorFormat.
msokolov merged pull request #2071: URL: https://github.com/apache/lucene-solr/pull/2071 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9322) Discussing a unified vectors format API
[ https://issues.apache.org/jira/browse/LUCENE-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228945#comment-17228945 ] ASF subversion and git services commented on LUCENE-9322: - Commit 42c5206cea5c85d486813d42f7d52e44a5a695ba in lucene-solr's branch refs/heads/master from Julie Tibshirani [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=42c5206 ] LUCENE-9322: Some fixes to SimpleTextVectorFormat. (#2071) * Make sure the file extensions are unique. * Fix bug in vector reading. > Discussing a unified vectors format API > --- > > Key: LUCENE-9322 > URL: https://issues.apache.org/jira/browse/LUCENE-9322 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Julie Tibshirani >Priority: Major > Fix For: master (9.0) > > Time Spent: 7h 20m > Remaining Estimate: 0h > > Two different approximate nearest neighbor approaches are currently being > developed, one based on HNSW (LUCENE-9004) and another based on coarse > quantization ([#LUCENE-9136]). Each prototype proposes to add a new format to > handle vectors. In LUCENE-9136 we discussed the possibility of a unified API > that could support both approaches. The two ANN strategies give different > trade-offs in terms of speed, memory, and complexity, and it’s likely that > we’ll want to support both. Vector search is also an active research area, > and it would be great to be able to prototype and incorporate new approaches > without introducing more formats. > To me it seems like a good time to begin discussing a unified API. The > prototype for coarse quantization > ([https://github.com/apache/lucene-solr/pull/1314]) could be ready to commit > soon (this depends on everyone's feedback of course). The approach is simple > and shows solid search performance, as seen > [here|https://github.com/apache/lucene-solr/pull/1314#issuecomment-608645326]. > I think this API discussion is an important step in moving that > implementation forward. > The goals of the API would be > # Support for storing and retrieving individual float vectors. > # Support for approximate nearest neighbor search -- given a query vector, > return the indexed vectors that are closest to it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] zacharymorn commented on a change in pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory
zacharymorn commented on a change in pull request #2052: URL: https://github.com/apache/lucene-solr/pull/2052#discussion_r520268743 ## File path: lucene/misc/src/java/org/apache/lucene/store/DirectIODirectory.java ## @@ -66,45 +66,32 @@ * * @lucene.experimental */ -public class NativeUnixDirectory extends FSDirectory { +public class DirectIODirectory extends FSDirectory { // TODO: this is OS dependent, but likely 512 is the LCD private final static long ALIGN = 512; private final static long ALIGN_NOT_MASK = ~(ALIGN-1); - - /** Default buffer size before writing to disk (256 KB); - * larger means less IO load but more RAM and direct - * buffer storage space consumed during merging. */ - - public final static int DEFAULT_MERGE_BUFFER_SIZE = 262144; /** Default min expected merge size before direct IO is * used (10 MB): */ public final static long DEFAULT_MIN_BYTES_DIRECT = 10*1024*1024; - private final int mergeBufferSize; private final long minBytesDirect; private final Directory delegate; /** Create a new NIOFSDirectory for the named location. * * @param path the path of the directory - * @param lockFactory to use - * @param mergeBufferSize Size of buffer to use for - *merging. See {@link #DEFAULT_MERGE_BUFFER_SIZE}. * @param minBytesDirect Merges, or files to be opened for * reading, smaller than this will * not use direct IO. See {@link * #DEFAULT_MIN_BYTES_DIRECT} + * @param lockFactory to use * @param delegate fallback Directory for non-merges * @throws IOException If there is a low-level I/O error */ - public NativeUnixDirectory(Path path, int mergeBufferSize, long minBytesDirect, LockFactory lockFactory, Directory delegate) throws IOException { + public DirectIODirectory(Path path, long minBytesDirect, LockFactory lockFactory, Directory delegate) throws IOException { super(path, lockFactory); -if ((mergeBufferSize & ALIGN) != 0) { - throw new IllegalArgumentException("mergeBufferSize must be 0 mod " + ALIGN + " (got: " + mergeBufferSize + ")"); -} -this.mergeBufferSize = mergeBufferSize; Review comment: I see it makes sense. I've reverted the relevant section of code in the latest commits to keep it focused on moving to pure java implementation. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
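Since the ALIGN constants survive this cleanup, a short worked example of the mask arithmetic they implement; the values here are for illustration only.

```java
public class AlignExample {
  private static final long ALIGN = 512;
  private static final long ALIGN_NOT_MASK = ~(ALIGN - 1);

  public static void main(String[] args) {
    long offset = 1300;
    // Clearing the low 9 bits rounds down to the previous 512-byte boundary,
    // as direct IO requires; adding ALIGN - 1 first rounds up instead.
    long alignedDown = offset & ALIGN_NOT_MASK;              // 1024
    long alignedUp = (offset + ALIGN - 1) & ALIGN_NOT_MASK;  // 1536
    System.out.println(alignedDown + " " + alignedUp);
  }
}
```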
[GitHub] [lucene-solr] zacharymorn commented on a change in pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
zacharymorn commented on a change in pull request #2068: URL: https://github.com/apache/lucene-solr/pull/2068#discussion_r520270563 ## File path: lucene/misc/native/build.gradle ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * This gets separated out from misc module into a native module due to incompatibility between cpp-library and java-library plugins. + * For details, please see https://github.com/gradle/gradle-native/issues/352#issuecomment-461724948 + */ +import org.apache.tools.ant.taskdefs.condition.Os + +description = 'Module for native code' + +apply plugin: 'cpp-library' + +library { + baseName = 'NativePosixUtil' Review comment: Given this now also includes the Windows one, and the cpp code focuses on file IO, I'm guessing something like `NativeIOUtil` might work? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] zacharymorn commented on pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
zacharymorn commented on pull request #2068: URL: https://github.com/apache/lucene-solr/pull/2068#issuecomment-724434703 > Zach I've cleaned up the native build a bit - moved it under lucene/misc, added Windows build (it does build the native library for me). I didn't check whether it works on a Mac but I suspect it should. > > I also left the native project included by default in settings (removed the "optional" flag). Gradle's cpp plugin ignores the project on platforms not explicitly mentioned in the targetMachines - I am curious whether we'll blow up something or if it's just going to work. > > While I don't particularly like having native code in Lucene, I think it's better than it used to be (mixed cpp code with java code, etc.). > > I allowed myself to commit directly to your fork, hope you don't mind (please test it out!). Thanks Dawid for the changes! I tested it out on my Mac and it built fine as well. I originally separated this out into an independent native module thinking that it could host future native code as well, but I guess that was probably just premature optimization, as it hasn't been the case for the last few years. > I am curious whether we’ll blow up something or if it’s just going to work. Just curious, are there any cross-platform tests in the pipeline that can confirm this? How do we verify this other than by running local builds? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] zacharymorn commented on a change in pull request #2068: LUCENE-8982: Separate out native code to another module to allow cpp build with gradle
zacharymorn commented on a change in pull request #2068: URL: https://github.com/apache/lucene-solr/pull/2068#discussion_r520273368 ## File path: lucene/misc/native/build.gradle ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * This gets separated out from misc module into a native module due to incompatibility between cpp-library and java-library plugins. + * For details, please see https://github.com/gradle/gradle-native/issues/352#issuecomment-461724948 + */ +import org.apache.tools.ant.taskdefs.condition.Os + +description = 'Module for native code' + +apply plugin: 'cpp-library' + +library { + baseName = 'NativePosixUtil' + + // Native build for Windows platform will be added in later stage + targetMachines = [ + machines.linux.x86_64, + machines.macOS.x86_64, + machines.windows.x86_64 + ] + + // Point at platform-specific sources. Other platforms will be ignored + // (plugin won't find the toolchain). + if (Os.isFamily(Os.FAMILY_WINDOWS)) { +source.from file("${projectDir}/src/main/windows") + } else if (Os.isFamily(Os.FAMILY_UNIX) || Os.isFamily(Os.FAMILY_MAC)) { +source.from file("${projectDir}/src/main/posix") + } +} + +tasks.withType(CppCompile).configureEach { + def javaHome = rootProject.ext.runtimeJava.getInstallationDirectory().getAsFile().getPath() + + // Assume standard openjdk layout. This means only one architecture-specific include folder + // is present. + systemIncludes.from file("${javaHome}/include") + + for (def path : [ + file("${javaHome}/include/win32"), + file("${javaHome}/include/darwin"), + file("${javaHome}/include/linux"), + file("${javaHome}/include/solaris")]) { +if (path.exists()) { + systemIncludes.from path +} + } + + compilerArgs.add '-fPIC' Review comment: Just curious, shall we also modify the compiler args when it’s on Windows to match what was used before? https://github.com/apache/lucene-solr/blob/ec9a659845973a0dd0ee7c04e0075db818ed118d/lucene/misc/src/java/org/apache/lucene/store/WindowsDirectory.java#L31-L35 A quick search shows that some of these flags might be specific to the MinGW compiler though, so I'm not sure if they are still relevant. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9590) Add javadoc for Lucene86PointsFormat class
[ https://issues.apache.org/jira/browse/LUCENE-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228980#comment-17228980 ] Lu Xugang commented on LUCENE-9590: --- Here is the link: https://www.amazingkoala.com.cn/Lucene_Document/IndexFile/2020/1104/175.html > Add javadoc for Lucene86PointsFormat class > --- > > Key: LUCENE-9590 > URL: https://issues.apache.org/jira/browse/LUCENE-9590 > Project: Lucene - Core > Issue Type: Wish > Components: core/codecs >Reporter: Lu Xugang >Priority: Minor > Attachments: 1.png > > > I would like to add javadoc for the Lucene86PointsFormat class; it is really > helpful for source readers to understand the data structures behind point values. > Is anyone doing this, or planning to? > The attachment lists part of the data structure (cells filled with color have a > sub data structure) > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org