Re: [PR] Use growNoCopy in some places [lucene]

2024-01-19 Thread via GitHub


easyice commented on code in PR #12951:
URL: https://github.com/apache/lucene/pull/12951#discussion_r1458564693


##
lucene/core/src/java/org/apache/lucene/util/fst/Util.java:
##


Review Comment:
   Nice find! Thank you @epotyom !



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] What if we pick up segments in segment size's ascending order in TieredMergePolicy.doFindMerges? [lucene]

2024-01-19 Thread via GitHub


vsop-479 commented on issue #13022:
URL: https://github.com/apache/lucene/issues/13022#issuecomment-1900060071

   Out of curiosity, I measured the method's performance with both pick-up 
orders by adding a test case to `TestTieredMergePolicy` (without assertions):
   
   seg count | baseline(desc pick) | candidate(asc pick) | speedup
   -- | -- | -- | --
   20 | 2422458 | 2215500 | 8.5%
   50 | 2790250 | 2466834 | 11.5%
   120 | 5624833 | 3950459 | 29.7%
   200 | 11594958 | 7412708 | 36.0%
   
   The speedup grows with the segment count and becomes substantial once the 
count exceeds 100 (which is unusual).
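
The speedup column above appears to be the time saved relative to the baseline, `(baseline - candidate) / baseline`. The sketch below recomputes it from the posted numbers. The class and method names are hypothetical, the unit of the timings is not stated in the comment, and the posted percentages look truncated rather than rounded to one decimal, so the printed values may differ in the last digit.

```java
final class MergeSpeedup {
  /** Time saved by the candidate, as a percentage of the baseline time. */
  static double speedupPercent(long baseline, long candidate) {
    return 100.0 * (baseline - candidate) / baseline;
  }

  public static void main(String[] args) {
    // Rows copied from the table: segment count, baseline (desc pick), candidate (asc pick).
    long[][] rows = {
      {20, 2422458L, 2215500L},
      {50, 2790250L, 2466834L},
      {120, 5624833L, 3950459L},
      {200, 11594958L, 7412708L},
    };
    for (long[] row : rows) {
      System.out.printf("%d segments: %.2f%%%n", row[0], speedupPercent(row[1], row[2]));
    }
  }
}
```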



Re: [PR] Use growNoCopy in some places [lucene]

2024-01-19 Thread via GitHub


easyice commented on PR #12951:
URL: https://github.com/apache/lucene/pull/12951#issuecomment-1900247470

   Thank you for reviewing!



Re: [PR] Enable CheckIndex to exorcise segments with missing segment infos (.si) (#7820) [lucene]

2024-01-19 Thread via GitHub


gokaai commented on code in PR #12872:
URL: https://github.com/apache/lucene/pull/12872#discussion_r1459040185


##
lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java:
##
@@ -389,13 +386,25 @@ private static void parseSegmentInfos(
 }
 
 long totalDocs = 0;
+SegmentInfo info;
 for (int seg = 0; seg < numSegments; seg++) {
   String segName = input.readString();
   byte[] segmentID = new byte[StringHelper.ID_LENGTH];
   input.readBytes(segmentID, 0, segmentID.length);
   Codec codec = readCodec(input);
-  SegmentInfo info =
-  codec.segmentInfoFormat().read(directory, segName, segmentID, IOContext.READ);
+  try {
+info = codec.segmentInfoFormat().read(directory, segName, segmentID, IOContext.READ);
+  } catch (ThreadInterruptedException e) {
+throw e;
+  } catch (Exception e) {

Review Comment:
   Catching a general-purpose exception here was tricky!
   - `TestIndexWriter#testThreadInterruptDeadlock` expected a 
`ThreadInterruptedException` to be thrown as is
   - `TestTransactions#testTransactions` expected an 'on-purpose' `IOException` 
to be thrown as is
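
The catch ordering described above can be sketched without Lucene on the classpath. Everything here is hypothetical: `InterruptedStandIn` stands in for an unchecked interrupt exception, `CorruptInfoStandIn` for the exception named in the PR title, and `InfoReader` for the segment info read call. Treating only a missing file as corruption is an assumption made for illustration; the diff is cut off before the body of `catch (Exception e)`, so this is not the PR's actual code.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.NoSuchFileException;

final class SegmentInfoReadSketch {
  /** Hypothetical stand-in for an unchecked interrupt exception. */
  static final class InterruptedStandIn extends RuntimeException {}

  /** Hypothetical stand-in for the exception named in the PR title. */
  static final class CorruptInfoStandIn extends IOException {
    CorruptInfoStandIn(String segName, Throwable cause) {
      super("cannot read segment info for " + segName, cause);
    }
  }

  /** Hypothetical stand-in for the segment info read call. */
  interface InfoReader {
    String read(String segName) throws IOException;
  }

  static String readInfo(InfoReader reader, String segName) throws IOException {
    try {
      return reader.read(segName);
    } catch (InterruptedStandIn e) {
      throw e; // interrupts must reach the caller unchanged
    } catch (NoSuchFileException | FileNotFoundException e) {
      // Assumption: a missing file is the case reported as corruption.
      throw new CorruptInfoStandIn(segName, e);
    } catch (IOException e) {
      throw e; // any other IOException, e.g. one injected by a test, passes through
    } catch (Exception e) {
      throw new CorruptInfoStandIn(segName, e); // remaining unchecked failures are wrapped
    }
  }
}
```

The specific handlers come before the general one, so the two exceptions the tests rely on are never wrapped by the catch-all.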



Re: [PR] Throw CorruptSegmentInfoException on encountering missing segment info (_N.si) file in CheckIndex [lucene]

2024-01-19 Thread via GitHub


gokaai commented on PR #12872:
URL: https://github.com/apache/lucene/pull/12872#issuecomment-1900475365

   I edited the PR description to represent more accurately what is being 
done. We aren't dealing with the exorcism issue yet; this PR only throws the 
right exception when a segment info file is missing. I will tackle the final 
step (actually making exorcism possible) in a subsequent PR.



[PR] Add Stefan Vodita as committer [lucene-site]

2024-01-19 Thread via GitHub


stefanvodita opened a new pull request, #74:
URL: https://github.com/apache/lucene-site/pull/74

   (no comment)



Re: [PR] Add Stefan Vodita as committer [lucene-site]

2024-01-19 Thread via GitHub


stefanvodita merged PR #74:
URL: https://github.com/apache/lucene-site/pull/74



Re: [PR] Run BWC indices generation code together with unittest [lucene]

2024-01-19 Thread via GitHub


s1monw merged PR #13023:
URL: https://github.com/apache/lucene/pull/13023



Re: [PR] Split taxonomy arrays across chunks [lucene]

2024-01-19 Thread via GitHub


stefanvodita commented on code in PR #12995:
URL: https://github.com/apache/lucene/pull/12995#discussion_r1459792589


##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java:
##
@@ -68,25 +94,66 @@ public TaxonomyIndexArrays(IndexReader reader, TaxonomyIndexArrays copyFrom) thr
 // it may be caused if e.g. the taxonomy segments were merged, and so an updated
 // NRT reader was obtained, even though nothing was changed. this is not very likely
 // to happen.
-int[] copyParents = copyFrom.parents();
-this.parents = new int[reader.maxDoc()];
-System.arraycopy(copyParents, 0, parents, 0, copyParents.length);
-initParents(reader, copyParents.length);
-
+int[][] parentArray = allocateChunkedArray(reader.maxDoc(), copyFrom.parents.values.length - 1);
+if (parentArray.length > 0) {
+  copyChunkedArray(copyFrom.parents.values, parentArray);
+  initParents(parentArray, reader, copyFrom.parents.length());
+}
+parents = new ChunkedIntArray(parentArray);
 if (copyFrom.initializedChildren) {
   initChildrenSiblings(copyFrom);
 }
   }
 
+  private static int[][] allocateChunkedArray(int size, int startFrom) {
+if (size == 0) {
+  return new int[0][];
+}
+int chunkCount = size >> CHUNK_SIZE_BITS;
+int fullChunkCount;
+int lastChunkSize = size & CHUNK_MASK;
+if (lastChunkSize == 0) {

Review Comment:
   Thank you for persisting while we're iterating over this method.
   
   Since `fullChunkCount` is assigned `chunkCount` on both branches, why not do 
this:
   ```java
   int fullChunkCount = chunkCount;
   if (lastChunkSize != 0) {
     chunkCount++;
   }
   ```
   
   On a higher level, I think I still wasn't specific enough in my previous 
comment. I didn't mind that we would sometimes have an empty array at the end 
if `size` was a multiple of `CHUNK_SIZE`, but we had if-statements that didn't 
seem to me to add anything. In this case, I prefer spending those extra bytes 
if it makes the code simpler. If you think the implementation we already have 
is better, we can keep it, but here is my preferred solution written out if 
you want to consider it:
   
   ```java
   private static int[][] allocateChunkedArray(int size, int startFrom) {
     int chunkCount = (size >> CHUNK_SIZE_BITS) + 1;
     int[][] array = new int[chunkCount][];
     for (int i = startFrom; i < chunkCount - 1; i++) {
       array[i] = new int[CHUNK_SIZE];
     }
     array[chunkCount - 1] = new int[size & CHUNK_MASK];
     return array;
   }
   ```
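
To make the trade-off concrete, here is a self-contained sketch of the proposed allocation. The value `CHUNK_SIZE_BITS = 4` is assumed only to keep the numbers small, since the thread does not state the real constant. The class name and the `get` helper are hypothetical, and `get` assumes the usual shift-and-mask addressing implied by `CHUNK_SIZE_BITS` and `CHUNK_MASK`.

```java
final class ChunkedAllocationSketch {
  // Assumed value for illustration only; the real constant is not given in the thread.
  static final int CHUNK_SIZE_BITS = 4;
  static final int CHUNK_SIZE = 1 << CHUNK_SIZE_BITS;
  static final int CHUNK_MASK = CHUNK_SIZE - 1;

  /** Same logic as the proposal: chunks before startFrom stay null for the caller to fill. */
  static int[][] allocateChunkedArray(int size, int startFrom) {
    int chunkCount = (size >> CHUNK_SIZE_BITS) + 1;
    int[][] array = new int[chunkCount][];
    for (int i = startFrom; i < chunkCount - 1; i++) {
      array[i] = new int[CHUNK_SIZE];
    }
    // The last chunk holds the remainder; it is empty when size is a multiple of CHUNK_SIZE.
    array[chunkCount - 1] = new int[size & CHUNK_MASK];
    return array;
  }

  /** Hypothetical accessor: the high bits select the chunk, the low bits the slot. */
  static int get(int[][] chunks, int index) {
    return chunks[index >> CHUNK_SIZE_BITS][index & CHUNK_MASK];
  }
}
```

With these assumed constants, `allocateChunkedArray(32, 0)` returns three chunks of lengths 16, 16 and 0. That trailing empty array is the extra allocation the simpler code accepts.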


