[GitHub] [lucene] shubhamvishu commented on pull request #11832: Added static factory method for loading VectorValues

2022-10-19 Thread GitBox


shubhamvishu commented on PR #11832:
URL: https://github.com/apache/lucene/pull/11832#issuecomment-1283725512

   No problem at all. I get it; it makes sense not to do this right now. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] iverase commented on pull request #11861: Fix Lucene94HnswVectorsFormat validation on large segments

2022-10-19 Thread GitBox


iverase commented on PR #11861:
URL: https://github.com/apache/lucene/pull/11861#issuecomment-1283787674

   > Do the monster tests get run regularly (perhaps during nightly builds)?
   
   I thought they were running weekly or monthly but checked Apache and 
Policeman CI and they don't seem to be running.





[GitHub] [lucene] rmuir commented on pull request #11847: Add a method allowing canonical strings to be returned from DataInput

2022-10-19 Thread GitBox


rmuir commented on PR #11847:
URL: https://github.com/apache/lucene/pull/11847#issuecomment-1283793590

   Sorry dsmiley, clearly you want this change, but I don't have to justify my 
hate for memory leaks. 





[GitHub] [lucene] rmuir commented on issue #11853: Make CJKAnalyzer that use Trigram instead of Bigram

2022-10-19 Thread GitBox


rmuir commented on issue #11853:
URL: https://github.com/apache/lucene/issues/11853#issuecomment-1283800342

   If you want to do things like trigrams, just use n-gram tokenizer instead...
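
As a rough illustration of what trigram tokenization produces, here is a plain-Java sketch. It is not Lucene's n-gram tokenizer, and the `CharNGrams` class and `ngrams` method are hypothetical names used only for this example. It works on code points so that characters outside the Basic Multilingual Plane are not split.

```java
import java.util.ArrayList;
import java.util.List;

class CharNGrams {
  // Returns every run of n consecutive code points in text, in order.
  static List<String> ngrams(String text, int n) {
    List<String> out = new ArrayList<>();
    int[] codePoints = text.codePoints().toArray();
    for (int i = 0; i + n <= codePoints.length; i++) {
      out.add(new String(codePoints, i, n));
    }
    return out;
  }

  public static void main(String[] args) {
    // n = 3 gives trigrams, n = 2 gives bigrams.
    System.out.println(ngrams("abcde", 3));
  }
}
```

With n = 3 the input `abcde` yields `abc`, `bcd`, `cde`; passing n = 2 instead gives the bigram behaviour.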





[GitHub] [lucene] rmuir commented on a diff in pull request #11856: Fix nanos to millis conversion for tests

2022-10-19 Thread GitBox


rmuir commented on code in PR #11856:
URL: https://github.com/apache/lucene/pull/11856#discussion_r999273020


##
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##
@@ -4190,7 +4190,7 @@ private static Status.SoftDeletsStatus checkSoftDeletes(
   }
 
   private static double nsToSec(long ns) {
-return ns / 10.0;
+return ns / (double) TimeUnit.SECONDS.toNanos(1);

Review Comment:
   Nothing calls this gazillions of times; it's only called once per "message" 
printed from CheckIndex, after it does a ton of work.
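
For reference, a self-contained sketch of the conversion in the `+` line of the diff. The `NanosToSeconds` class and the `NANOS_PER_SECOND` constant are hypothetical; only `TimeUnit.SECONDS.toNanos(1)` comes from the diff.

```java
import java.util.concurrent.TimeUnit;

class NanosToSeconds {
  // Nanoseconds per second, derived from TimeUnit instead of a hand-typed literal.
  private static final double NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1);

  static double nsToSec(long ns) {
    return ns / NANOS_PER_SECOND;
  }

  public static void main(String[] args) {
    // One and a half billion nanoseconds is 1.5 seconds.
    System.out.println(nsToSec(1_500_000_000L));
  }
}
```

Hoisting the divisor into a constant is optional here, since the method is called so rarely.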






[GitHub] [lucene] rmuir commented on a diff in pull request #11861: Fix Lucene94HnswVectorsFormat validation on large segments

2022-10-19 Thread GitBox


rmuir commented on code in PR #11861:
URL: https://github.com/apache/lucene/pull/11861#discussion_r999278417


##
lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94HnswVectorsReader.java:
##
@@ -175,7 +175,7 @@ private void validateFieldEntry(FieldInfo info, FieldEntry 
fieldEntry) {
   case BYTE -> Byte.BYTES;
   case FLOAT32 -> Float.BYTES;
 };
-int numBytes = fieldEntry.size * dimension * byteSize;
+long numBytes = (long) fieldEntry.size * dimension * byteSize;

Review Comment:
   So we got bitten by overflow once here; let's learn from the lesson and use 
Math.multiplyExact. It's just validation code and not performance sensitive.
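
To make the failure mode concrete, a small sketch contrasting plain `int` multiplication with `Math.multiplyExact`. The `OverflowDemo` class, its method names, and the input values are illustrative assumptions, not taken from the PR.

```java
class OverflowDemo {
  // Plain int arithmetic wraps around silently when the product overflows.
  static int wrapping(int size, int dimension, int byteSize) {
    return size * dimension * byteSize;
  }

  // Math.multiplyExact throws ArithmeticException instead of wrapping.
  static int checked(int size, int dimension, int byteSize) {
    return Math.multiplyExact(Math.multiplyExact(size, dimension), byteSize);
  }

  public static void main(String[] args) {
    // Illustrative values only: 3M vectors, 768 dimensions, 4 bytes per value.
    // The true product (9,216,000,000) does not fit in an int.
    System.out.println("wrapped: " + wrapping(3_000_000, 768, 4));
    try {
      checked(3_000_000, 768, 4);
    } catch (ArithmeticException e) {
      System.out.println("overflow detected: " + e.getMessage());
    }
  }
}
```

The wrapped result looks like a plausible size, which is why a silent overflow in validation code is hard to spot.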






[GitHub] [lucene] rmuir commented on pull request #11861: Fix Lucene94HnswVectorsFormat validation on large segments

2022-10-19 Thread GitBox


rmuir commented on PR #11861:
URL: https://github.com/apache/lucene/pull/11861#issuecomment-1283842393

   Without a test, I can't tell that the PR fixes the issue. There might be 
more problems lurking in other vectors code. We shouldn't play whack-a-mole 
with releases. I think we should add a Test2BXXX or similar and confirm that 
"large segments" work correctly.





[GitHub] [lucene] donnerpeter merged pull request #11859: hunspell: speed up GeneratingSuggester by not deserializing non-suggestible roots

2022-10-19 Thread GitBox


donnerpeter merged PR #11859:
URL: https://github.com/apache/lucene/pull/11859





[GitHub] [lucene-solr] cuckooM closed pull request #639: Solve the problem of highlighting Chinese inaccurately.

2022-10-19 Thread GitBox


cuckooM closed pull request #639: Solve the problem of highlighting Chinese 
inaccurately.
URL: https://github.com/apache/lucene-solr/pull/639





[GitHub] [lucene] uschindler commented on a diff in pull request #11861: Fix Lucene94HnswVectorsFormat validation on large segments

2022-10-19 Thread GitBox


uschindler commented on code in PR #11861:
URL: https://github.com/apache/lucene/pull/11861#discussion_r999447512


##
lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94HnswVectorsReader.java:
##
@@ -175,7 +175,7 @@ private void validateFieldEntry(FieldInfo info, FieldEntry 
fieldEntry) {
   case BYTE -> Byte.BYTES;
   case FLOAT32 -> Float.BYTES;
 };
-int numBytes = fieldEntry.size * dimension * byteSize;
+long numBytes = (long) fieldEntry.size * dimension * byteSize;

Review Comment:
   Yes please always use Math.multiplyExact(). It is a method call, but it is 
optimized away as it is an intrinsic.






[GitHub] [lucene] jtibshirani commented on a diff in pull request #11861: Fix Lucene94HnswVectorsFormat validation on large segments

2022-10-19 Thread GitBox


jtibshirani commented on code in PR #11861:
URL: https://github.com/apache/lucene/pull/11861#discussion_r999637833


##
lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94HnswVectorsReader.java:
##
@@ -175,7 +175,7 @@ private void validateFieldEntry(FieldInfo info, FieldEntry 
fieldEntry) {
   case BYTE -> Byte.BYTES;
   case FLOAT32 -> Float.BYTES;
 };
-int numBytes = fieldEntry.size * dimension * byteSize;
+long numBytes = (long) fieldEntry.size * dimension * byteSize;

Review Comment:
   👍 










[GitHub] [lucene] jtibshirani commented on a diff in pull request #11861: Fix Lucene94HnswVectorsFormat validation on large segments

2022-10-19 Thread GitBox


jtibshirani commented on code in PR #11861:
URL: https://github.com/apache/lucene/pull/11861#discussion_r999709822


##
lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94HnswVectorsReader.java:
##
@@ -175,7 +175,7 @@ private void validateFieldEntry(FieldInfo info, FieldEntry 
fieldEntry) {
   case BYTE -> Byte.BYTES;
   case FLOAT32 -> Float.BYTES;
 };
-int numBytes = fieldEntry.size * dimension * byteSize;
+long numBytes = (long) fieldEntry.size * dimension * byteSize;

Review Comment:
   I realized I don't totally understand this suggestion. We have three `ints` 
and we want to compute the product as a `long`. It's totally fine if the 
calculated data size exceeds `Integer.MAX_VALUE`.
   
   Are you suggesting something like this?
   
   ```java
   long numBytes = Math.multiplyExact(Math.multiplyExact((long) 
fieldEntry.size, dimension), byteSize);
   ```






[GitHub] [lucene] zhaih commented on pull request #11840: GITHUB-11838 Add api to allow concurrent query rewrite

2022-10-19 Thread GitBox


zhaih commented on PR #11840:
URL: https://github.com/apache/lucene/pull/11840#issuecomment-1284300487

   > About the backport to 9.x I will help soon, at moment I am not well due to 
COVID after a conference last week.
   
   Thanks Uwe! No hurry, and take care!





[GitHub] [lucene] zhaih merged pull request #11840: GITHUB-11838 Add api to allow concurrent query rewrite

2022-10-19 Thread GitBox


zhaih merged PR #11840:
URL: https://github.com/apache/lucene/pull/11840





[GitHub] [lucene] zhaih commented on issue #11838: Adding concurrency to query rewrite?

2022-10-19 Thread GitBox


zhaih commented on issue #11838:
URL: https://github.com/apache/lucene/issues/11838#issuecomment-1284303658

   The change to the main branch is merged; keeping this open until the backport finishes.





[GitHub] [lucene] rmuir commented on a diff in pull request #11861: Fix Lucene94HnswVectorsFormat validation on large segments

2022-10-19 Thread GitBox


rmuir commented on code in PR #11861:
URL: https://github.com/apache/lucene/pull/11861#discussion_r999716253


##
lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94HnswVectorsReader.java:
##
@@ -175,7 +175,7 @@ private void validateFieldEntry(FieldInfo info, FieldEntry 
fieldEntry) {
   case BYTE -> Byte.BYTES;
   case FLOAT32 -> Float.BYTES;
 };
-int numBytes = fieldEntry.size * dimension * byteSize;
+long numBytes = (long) fieldEntry.size * dimension * byteSize;

Review Comment:
   yeah. Maybe on two separate lines so that if it overflows, the stacktrace 
lets you know exactly which part, for easier debugging?
   
   it may sound like paranoia, but multiplication is rather dangerous with 
overflow so I think it would be worth it for code like this.
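
Split over two lines, the idea might look like the sketch below. The `TwoStepMultiply` class and the `numBytes` helper with its parameters are stand-ins for the fields in the diff, not the code that was merged.

```java
class TwoStepMultiply {
  static long numBytes(int size, int dimension, int byteSize) {
    // Each product sits on its own line, so the line number in an
    // ArithmeticException stack trace identifies the step that overflowed.
    long numValues = Math.multiplyExact((long) size, dimension);
    return Math.multiplyExact(numValues, byteSize);
  }

  public static void main(String[] args) {
    // Illustrative values only; the result exceeds Integer.MAX_VALUE but fits in a long.
    System.out.println(numBytes(3_000_000, 768, 4));
  }
}
```

Because the first operand is widened to `long`, results above `Integer.MAX_VALUE` are fine; only a product that overflows `long` throws.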






[GitHub] [lucene-solr] risdenk commented on pull request #1676: SOLR-13973: Depricate Tika support in 8.7

2022-10-19 Thread GitBox


risdenk commented on PR #1676:
URL: https://github.com/apache/lucene-solr/pull/1676#issuecomment-1284317351

   https://issues.apache.org/jira/browse/SOLR-13973 decided not to move forward 
with this. Also would need to apply to apache/solr repo instead.





[GitHub] [lucene-solr] risdenk closed pull request #1676: SOLR-13973: Depricate Tika support in 8.7

2022-10-19 Thread GitBox


risdenk closed pull request #1676: SOLR-13973: Depricate Tika support in 8.7
URL: https://github.com/apache/lucene-solr/pull/1676





[GitHub] [lucene-solr] risdenk commented on pull request #1383: SOLR-14367: Updated Tika version to 1.24

2022-10-19 Thread GitBox


risdenk commented on PR #1383:
URL: https://github.com/apache/lucene-solr/pull/1383#issuecomment-1284319127

   This has been upgraded elsewhere along the way.





[GitHub] [lucene-solr] risdenk closed pull request #1383: SOLR-14367: Updated Tika version to 1.24

2022-10-19 Thread GitBox


risdenk closed pull request #1383: SOLR-14367: Updated Tika version to 1.24
URL: https://github.com/apache/lucene-solr/pull/1383





[GitHub] [lucene-solr] risdenk closed pull request #1622: SOLR-14603: update Restlet version

2022-10-19 Thread GitBox


risdenk closed pull request #1622: SOLR-14603: update Restlet version 
URL: https://github.com/apache/lucene-solr/pull/1622





[GitHub] [lucene-solr] risdenk commented on pull request #1622: SOLR-14603: update Restlet version

2022-10-19 Thread GitBox


risdenk commented on PR #1622:
URL: https://github.com/apache/lucene-solr/pull/1622#issuecomment-1284321402

   Closing since it says this was merged. Restlet was also removed down the line 
anyway: https://issues.apache.org/jira/browse/SOLR-14659





[GitHub] [lucene-solr] risdenk commented on pull request #1858: LUCENE-6744: equals methods should compare classes directly, not use instanceof

2022-10-19 Thread GitBox


risdenk commented on PR #1858:
URL: https://github.com/apache/lucene-solr/pull/1858#issuecomment-1284330471

   So interestingly errorprone for Solr ends up going the other direction - 
https://errorprone.info/bugpattern/EqualsGetClass
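
To show what the two directions mean in practice, a minimal sketch with two hypothetical value classes (`InstanceofPoint` and `GetClassPoint` are invented for this example). The only difference between them is the type check inside `equals`.

```java
import java.util.Objects;

class InstanceofPoint {
  final int x, y;
  InstanceofPoint(int x, int y) { this.x = x; this.y = y; }

  @Override public boolean equals(Object o) {
    if (this == o) return true;
    // Accepts subclasses: any InstanceofPoint with the same fields is equal.
    if (!(o instanceof InstanceofPoint)) return false;
    InstanceofPoint p = (InstanceofPoint) o;
    return x == p.x && y == p.y;
  }

  @Override public int hashCode() { return Objects.hash(x, y); }
}

class GetClassPoint {
  final int x, y;
  GetClassPoint(int x, int y) { this.x = x; this.y = y; }

  @Override public boolean equals(Object o) {
    if (this == o) return true;
    // Rejects subclasses: the runtime classes must match exactly.
    if (o == null || getClass() != o.getClass()) return false;
    GetClassPoint p = (GetClassPoint) o;
    return x == p.x && y == p.y;
  }

  @Override public int hashCode() { return Objects.hash(x, y); }
}
```

An anonymous subclass such as `new InstanceofPoint(1, 2) {}` still equals `new InstanceofPoint(1, 2)`, while the same construction with `GetClassPoint` does not.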





[GitHub] [lucene-solr] risdenk closed pull request #1858: LUCENE-6744: equals methods should compare classes directly, not use instanceof

2022-10-19 Thread GitBox


risdenk closed pull request #1858: LUCENE-6744: equals methods should compare 
classes directly, not use instanceof
URL: https://github.com/apache/lucene-solr/pull/1858





[GitHub] [lucene] risdenk commented on issue #7802: equals methods should compare classes directly, not use instanceof [LUCENE-6744]

2022-10-19 Thread GitBox


risdenk commented on issue #7802:
URL: https://github.com/apache/lucene/issues/7802#issuecomment-1284331456

   So at least for Solr, errorprone suggests going the other direction: 
https://errorprone.info/bugpattern/EqualsGetClass





[GitHub] [lucene] jtibshirani commented on a diff in pull request #11861: Fix Lucene94HnswVectorsFormat validation on large segments

2022-10-19 Thread GitBox


jtibshirani commented on code in PR #11861:
URL: https://github.com/apache/lucene/pull/11861#discussion_r999743022


##
lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94HnswVectorsReader.java:
##
@@ -175,7 +175,7 @@ private void validateFieldEntry(FieldInfo info, FieldEntry 
fieldEntry) {
   case BYTE -> Byte.BYTES;
   case FLOAT32 -> Float.BYTES;
 };
-int numBytes = fieldEntry.size * dimension * byteSize;
+long numBytes = (long) fieldEntry.size * dimension * byteSize;

Review Comment:
   Thanks! I'm all about paranoia now after tracking down this issue 😅 






[GitHub] [lucene] rmuir commented on a diff in pull request #11861: Fix Lucene94HnswVectorsFormat validation on large segments

2022-10-19 Thread GitBox


rmuir commented on code in PR #11861:
URL: https://github.com/apache/lucene/pull/11861#discussion_r999752419


##
lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94HnswVectorsReader.java:
##
@@ -175,7 +175,7 @@ private void validateFieldEntry(FieldInfo info, FieldEntry 
fieldEntry) {
   case BYTE -> Byte.BYTES;
   case FLOAT32 -> Float.BYTES;
 };
-int numBytes = fieldEntry.size * dimension * byteSize;
+long numBytes = (long) fieldEntry.size * dimension * byteSize;

Review Comment:
   thank you!






[GitHub] [lucene] jtibshirani commented on pull request #11861: Fix Lucene94HnswVectorsFormat validation on large segments

2022-10-19 Thread GitBox


jtibshirani commented on PR #11861:
URL: https://github.com/apache/lucene/pull/11861#issuecomment-1284348388

   I'm working on a monster test `TestManyKnnVectors` that indexes a bunch of 
vectors and force merges. I didn't mention this explicitly, but I also did 
extensive testing using the Elasticsearch benchmarks that caught the issue, and 
validated everything works given this fix. Definitely not whacking any moles 
this release :)





[GitHub] [lucene] rmuir commented on pull request #11861: Fix Lucene94HnswVectorsFormat validation on large segments

2022-10-19 Thread GitBox


rmuir commented on PR #11861:
URL: https://github.com/apache/lucene/pull/11861#issuecomment-1284358987

   Thank you. There are still many benefits to testing (e.g. -ea flag) vs a 
benchmark which is generally less picky. Also if there are different supported 
modes (e.g. 4-byte vs 1-byte) for vectors, in a way that might have different 
logic/bugs, maybe we can be more exhaustive in the test with it.





[GitHub] [lucene] zhaih opened a new issue, #11862: Concurrent rewrite for KnnVectorQuery

2022-10-19 Thread GitBox


zhaih opened a new issue, #11862:
URL: https://github.com/apache/lucene/issues/11862

   ### Description
   
   #11840 allows query rewrite to be parallelized. Should we try to have the 
KNN query implementation make use of that?





[GitHub] [lucene] jtibshirani opened a new issue, #11863: Add large-scale test for kNN vectors

2022-10-19 Thread GitBox


jtibshirani opened a new issue, #11863:
URL: https://github.com/apache/lucene/issues/11863

   We recently had a regression where the kNN vectors format validation could 
fail on large segments. We didn't catch this in testing or nightly performance 
benchmarks because they didn't produce large enough segments to trigger the bug.
   
   We should add a "monster" test similar to `Test4BBKDPoints` that indexes a 
large dataset and maybe does a force merge. When tests are labelled `@Monster` 
they won't run as part of the regular build and are allowed to take much longer.





[GitHub] [lucene] jtibshirani commented on pull request #11861: Fix Lucene94HnswVectorsFormat validation on large segments

2022-10-19 Thread GitBox


jtibshirani commented on PR #11861:
URL: https://github.com/apache/lucene/pull/11861#issuecomment-1284401741

   Thanks for the reviews! I started to write a monster test, but it will take 
some time since the iteration cycle is long (each run can take 2+ hours). I'd 
like to merge this now and get started on a release vote, then follow up with a 
test. I filed https://github.com/apache/lucene/issues/11863 so we don't drop 
it. As I mentioned, I did a thorough local test to confirm the fix, and that 
there aren't other lurking bugs.





[GitHub] [lucene] matriv commented on a diff in pull request #11856: Fix nanos to millis conversion for tests

2022-10-19 Thread GitBox


matriv commented on code in PR #11856:
URL: https://github.com/apache/lucene/pull/11856#discussion_r999887367


##
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##
@@ -4190,7 +4190,7 @@ private static Status.SoftDeletsStatus checkSoftDeletes(
   }
 
   private static double nsToSec(long ns) {
-return ns / 10.0;
+return ns / (double) TimeUnit.SECONDS.toNanos(1);

Review Comment:
   Thanks. Could you please check the PR?
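
   For readers following the diff above, here is a minimal standalone sketch of the proposed conversion. The class name `NsToSecDemo` and the constant `NANOS_PER_SECOND` are hypothetical and are not part of `CheckIndex`; only `TimeUnit.SECONDS.toNanos(1)` comes from the diff itself. The idea is to derive the divisor from `TimeUnit` so that the literal cannot be mistyped.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; CheckIndex keeps nsToSec as a private static helper.
final class NsToSecDemo {
  // Nanoseconds per second, derived from TimeUnit instead of a hand-typed literal.
  private static final double NANOS_PER_SECOND = (double) TimeUnit.SECONDS.toNanos(1);

  // Converts a nanosecond duration to fractional seconds.
  public static double nsToSec(long ns) {
    return ns / NANOS_PER_SECOND;
  }

  public static void main(String[] args) {
    long elapsedNs = 1_500_000_000L; // one and a half seconds, in nanoseconds
    System.out.println(nsToSec(elapsedNs));
  }
}
```

   Casting the result of `toNanos(1)` to `double` before dividing keeps the division in floating point, so sub-second durations are not truncated to zero.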






[GitHub] [lucene] jtibshirani closed issue #11858: Lucene94HnswVectorsFormat validation fails with large datasets

2022-10-19 Thread GitBox


jtibshirani closed issue #11858: Lucene94HnswVectorsFormat validation fails 
with large datasets
URL: https://github.com/apache/lucene/issues/11858





[GitHub] [lucene] jtibshirani merged pull request #11861: Fix Lucene94HnswVectorsFormat validation on large segments

2022-10-19 Thread GitBox


jtibshirani merged PR #11861:
URL: https://github.com/apache/lucene/pull/11861





[GitHub] [lucene] stevenschlansker commented on pull request #11822: PrimaryNode: add configurable timeout to waitForAllRemotesToClose

2022-10-19 Thread GitBox


stevenschlansker commented on PR #11822:
URL: https://github.com/apache/lucene/pull/11822#issuecomment-1284708755

   OK - I did run `./gradlew check` so I don't think I broke anything, but 
please let me know if it does end up being related!





[GitHub] [lucene] zhaih commented on pull request #11822: PrimaryNode: add configurable timeout to waitForAllRemotesToClose

2022-10-19 Thread GitBox


zhaih commented on PR #11822:
URL: https://github.com/apache/lucene/pull/11822#issuecomment-1284755158

   I tried to reproduce the issue but couldn't, so it is likely a transient or
   extremely rare test failure and should not be related to the PR.
   
   On Wed, Oct 19, 2022, 16:44 Steven Schlansker wrote:
   
   > OK - I did run ./gradlew check so I don't think I broke anything, but
   > please let me know if it does end up being related!
   

