Re: [PR] Use growNoCopy in some places [lucene]
easyice commented on code in PR #12951: URL: https://github.com/apache/lucene/pull/12951#discussion_r1458564693 ## lucene/core/src/java/org/apache/lucene/util/fst/Util.java: ## Review Comment: Nice find! Thank you @epotyom ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] What if we pick up segments in segment size's ascending order in TieredMergePolicy.doFindMerges? [lucene]
vsop-479 commented on issue #13022: URL: https://github.com/apache/lucene/issues/13022#issuecomment-1900060071

Out of curiosity, I measured the method's performance with both pick-up orders by adding a test case to `TestTieredMergePolicy` (without asserts):

seg count | baseline (desc pick) | candidate (asc pick) | speedup
-- | -- | -- | --
20 | 2422458 | 2215500 | 8.5%
50 | 2790250 | 2466834 | 11.5%
120 | 5624833 | 3950459 | 29.7%
200 | 11594958 | 7412708 | 36.0%

When the segment count is larger than 100 (which is unusual), there is a stable speedup.
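For readers checking the speedup column: the figures are consistent with `(baseline - candidate) / baseline`, truncated to one decimal place. The sketch below recomputes them under that assumption. The class and method names are hypothetical, and the comment does not state the unit of the measurements, so the code only assumes that lower is better.

```java
// Hypothetical helper, not part of Lucene: recomputes the speedup column
// from the baseline and candidate measurements quoted in the comment.
class SpeedupSketch {

  // Relative improvement of candidate over baseline, in percent.
  // Assumes both values are costs (lower is better).
  static double speedupPercent(long baseline, long candidate) {
    return 100.0 * (baseline - candidate) / baseline;
  }

  // Speedup in tenths of a percent, truncated (85 means 8.5%).
  static long truncatedTenths(long baseline, long candidate) {
    return (long) Math.floor(speedupPercent(baseline, candidate) * 10);
  }

  public static void main(String[] args) {
    long[][] rows = {
      {20, 2422458L, 2215500L},
      {50, 2790250L, 2466834L},
      {120, 5624833L, 3950459L},
      {200, 11594958L, 7412708L},
    };
    for (long[] row : rows) {
      long tenths = truncatedTenths(row[1], row[2]);
      System.out.println(row[0] + " segments: " + (tenths / 10) + "." + (tenths % 10) + "%");
    }
  }
}
```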
Re: [PR] Use growNoCopy in some places [lucene]
easyice commented on PR #12951: URL: https://github.com/apache/lucene/pull/12951#issuecomment-1900247470 Thank you for reviewing!
Re: [PR] Enable CheckIndex to exorcise segments with missing segment infos (.si) (#7820) [lucene]
gokaai commented on code in PR #12872: URL: https://github.com/apache/lucene/pull/12872#discussion_r1459040185

## lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java:

## @@ -389,13 +386,25 @@ private static void parseSegmentInfos(
     }

     long totalDocs = 0;
+    SegmentInfo info;
     for (int seg = 0; seg < numSegments; seg++) {
       String segName = input.readString();
       byte[] segmentID = new byte[StringHelper.ID_LENGTH];
       input.readBytes(segmentID, 0, segmentID.length);
       Codec codec = readCodec(input);
-      SegmentInfo info =
-          codec.segmentInfoFormat().read(directory, segName, segmentID, IOContext.READ);
+      try {
+        info = codec.segmentInfoFormat().read(directory, segName, segmentID, IOContext.READ);
+      } catch (ThreadInterruptedException e) {
+        throw e;
+      } catch (Exception e) {

Review Comment: Catching a general-purpose exception here was tricky!

- `TestIndexWriter#testThreadInterruptDeadlock` expected a ThreadInterruptedException to be thrown as is
- `TestTransactions#testTransactions` expected an 'on-purpose' IOException to be thrown as is
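The constraint described in this review comment (let certain exceptions through untouched, wrap everything else) can be illustrated in isolation. The following is a minimal sketch, not the code from the PR: `InterruptedSignal` and `CorruptInfoException` are hypothetical stand-ins for Lucene's `ThreadInterruptedException` and for the exception named in the PR title, and `readOrWrap` and `outcome` are invented for the example. Whether the PR rethrows `IOException` in exactly this way is not visible in the quoted hunk.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical, self-contained illustration of "rethrow some, wrap the rest".
class RethrowSketch {

  // Stand-in for an unchecked interruption exception (assumption).
  static class InterruptedSignal extends RuntimeException {}

  // Stand-in for a "segment info is corrupt or missing" exception (assumption).
  static class CorruptInfoException extends Exception {
    CorruptInfoException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Runs the reader. Interruptions and IOExceptions propagate unchanged so
  // callers that expect them still see them; anything else is wrapped.
  static <T> T readOrWrap(Callable<T> reader) throws IOException, CorruptInfoException {
    try {
      return reader.call();
    } catch (InterruptedSignal | IOException e) {
      throw e; // rethrown as is
    } catch (Exception e) {
      throw new CorruptInfoException("could not read segment info", e);
    }
  }

  // Reports which exception type escapes readOrWrap, or "ok" if none does.
  static String outcome(Callable<?> reader) {
    try {
      readOrWrap(reader);
      return "ok";
    } catch (Exception e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println(outcome(() -> "segment"));
    System.out.println(outcome(() -> { throw new InterruptedSignal(); }));
    System.out.println(outcome(() -> { throw new IOException("on purpose"); }));
    System.out.println(outcome(() -> { throw new IllegalStateException("bad file"); }));
  }
}
```

The order of the catch clauses matters: the specific types must come before the general `catch (Exception e)`, otherwise they would be wrapped along with everything else.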
Re: [PR] Throw CorruptSegmentInfoException on encountering missing segment info (_N.si) file in CheckIndex [lucene]
gokaai commented on PR #12872: URL: https://github.com/apache/lucene/pull/12872#issuecomment-1900475365 I edited the PR description to more accurately represent what is being done. We aren't dealing with the exorcism issue yet, but throwing the right exception on encountering missing segment info. I will tackle the final step (actually making exorcism possible) in a subsequent PR
[PR] Add Stefan Vodita as committer [lucene-site]
stefanvodita opened a new pull request, #74: URL: https://github.com/apache/lucene-site/pull/74 (no comment)
Re: [PR] Add Stefan Vodita as committer [lucene-site]
stefanvodita merged PR #74: URL: https://github.com/apache/lucene-site/pull/74
Re: [PR] Run BWC indices generation code together with unittest [lucene]
s1monw merged PR #13023: URL: https://github.com/apache/lucene/pull/13023
Re: [PR] Split taxonomy arrays across chunks [lucene]
stefanvodita commented on code in PR #12995: URL: https://github.com/apache/lucene/pull/12995#discussion_r1459792589

## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java:

## @@ -68,25 +94,66 @@ public TaxonomyIndexArrays(IndexReader reader, TaxonomyIndexArrays copyFrom) thr
       // it may be caused if e.g. the taxonomy segments were merged, and so an updated
       // NRT reader was obtained, even though nothing was changed. this is not very likely
       // to happen.
-      int[] copyParents = copyFrom.parents();
-      this.parents = new int[reader.maxDoc()];
-      System.arraycopy(copyParents, 0, parents, 0, copyParents.length);
-      initParents(reader, copyParents.length);
-
+      int[][] parentArray = allocateChunkedArray(reader.maxDoc(), copyFrom.parents.values.length - 1);
+      if (parentArray.length > 0) {
+        copyChunkedArray(copyFrom.parents.values, parentArray);
+        initParents(parentArray, reader, copyFrom.parents.length());
+      }
+      parents = new ChunkedIntArray(parentArray);
       if (copyFrom.initializedChildren) {
         initChildrenSiblings(copyFrom);
       }
     }

+  private static int[][] allocateChunkedArray(int size, int startFrom) {
+    if (size == 0) {
+      return new int[0][];
+    }
+    int chunkCount = size >> CHUNK_SIZE_BITS;
+    int fullChunkCount;
+    int lastChunkSize = size & CHUNK_MASK;
+    if (lastChunkSize == 0) {

Review Comment: Thank you for persisting while we're iterating over this method. Since `fullChunkCount` is assigned `chunkCount` on both branches, why not do this:

```java
int fullChunkCount = chunkCount;
if (lastChunkSize != 0) {
  chunkCount++;
}
```

On a higher level, I think I still wasn't specific enough in my previous comment. I didn't mind that we would sometimes have an empty array at the end if `size` was a multiple of `CHUNK_SIZE`, but we had if-statements that didn't seem to me like they were adding something. In this case, I prefer spending those extra bytes if we can make the code simpler.

If you think the implementation we already have is better, we can keep it, but here is my preferred solution written out if you want to consider it:

```java
private static int[][] allocateChunkedArray(int size, int startFrom) {
  int chunkCount = (size >> CHUNK_SIZE_BITS) + 1;
  int[][] array = new int[chunkCount][];
  for (int i = startFrom; i < chunkCount - 1; i++) {
    array[i] = new int[CHUNK_SIZE];
  }
  array[chunkCount - 1] = new int[size & CHUNK_MASK];
  return array;
}
```
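To make the trade-off in this comment concrete, here is a small standalone sketch of a chunked int array that always allocates a final, possibly empty, chunk. It is an illustration under stated assumptions rather than the PR's `ChunkedIntArray`: the class name, the `get`/`set`/`length` methods and the chunk size of 16 are invented for the example, and the `startFrom` parameter is dropped so that every chunk is allocated from index 0.

```java
// Hypothetical sketch of a chunked int array; not the Lucene implementation.
class ChunkedIntArraySketch {
  // Assumed chunk size for the demo (16 entries per chunk).
  static final int CHUNK_SIZE_BITS = 4;
  static final int CHUNK_SIZE = 1 << CHUNK_SIZE_BITS;
  static final int CHUNK_MASK = CHUNK_SIZE - 1;

  final int[][] values;

  ChunkedIntArraySketch(int size) {
    this.values = allocate(size);
  }

  // Allocates full chunks plus one trailing chunk holding the remainder.
  // When size is a multiple of CHUNK_SIZE the trailing chunk is empty,
  // which costs a few bytes but avoids special-casing.
  static int[][] allocate(int size) {
    int chunkCount = (size >> CHUNK_SIZE_BITS) + 1;
    int[][] array = new int[chunkCount][];
    for (int i = 0; i < chunkCount - 1; i++) {
      array[i] = new int[CHUNK_SIZE];
    }
    array[chunkCount - 1] = new int[size & CHUNK_MASK];
    return array;
  }

  // The high bits select the chunk, the low bits select the slot within it.
  int get(int index) {
    return values[index >> CHUNK_SIZE_BITS][index & CHUNK_MASK];
  }

  void set(int index, int value) {
    values[index >> CHUNK_SIZE_BITS][index & CHUNK_MASK] = value;
  }

  // Total number of addressable entries across all chunks.
  int length() {
    return ((values.length - 1) << CHUNK_SIZE_BITS) + values[values.length - 1].length;
  }
}
```

With this layout a size of 0 yields a single empty chunk and a size of 16 yields one full chunk followed by an empty one, which is the "extra bytes for simpler code" choice the comment argues for.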