Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1712610080

## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/misc/LongValueFacetCutter.java:

@@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.facet.misc;

Review Comment:

For other reviewers, resolving this thread - as per the discussion at https://github.com/apache/lucene/pull/13568#discussion_r1692249664, we've deprecated the `FacetRollup` interface.
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1712617910

## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/cutters/ranges/LongRangeFacetCutter.java:

@@ -0,0 +1,413 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.facet.cutters.ranges;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Objects;
+import org.apache.lucene.facet.MultiLongValues;
+import org.apache.lucene.facet.MultiLongValuesSource;
+import org.apache.lucene.facet.range.LongRange;
+import org.apache.lucene.sandbox.facet.cutters.FacetCutter;
+import org.apache.lucene.sandbox.facet.cutters.LeafFacetCutter;
+import org.apache.lucene.search.LongValues;
+import org.apache.lucene.search.LongValuesSource;
+
+/** {@link RangeFacetCutter} for ranges of long values. It's based on LongRangeCounter class. */
+public abstract class LongRangeFacetCutter extends RangeFacetCutter {
+
+  MultiLongValuesSource valuesSource;
+  LongValuesSource singleValues; // TODO: refactor - weird that we have both multi and single here.
+  LongRangeAndPos[] sortedRanges;
+
+  int requestedRangeCount;
+
+  List elementaryIntervals;
+
+  /** elementary interval boundaries used for efficient counting (bsearch to find interval) */
+  long[] boundaries;
+
+  int[] pos;
+
+  // Default interval position, when elementary interval is mapped to this interval
+  // it is skipped.
+  static final int SKIP_INTERVAL_POSITION = -1;
+
+  /** Create {@link FacetCutter} for provided value source and long ranges. */
+  public static LongRangeFacetCutter create(
+      String field,
+      MultiLongValuesSource longValuesSource,
+      LongValuesSource singleLongValuesSource,
+      LongRange[] longRanges) {
+    if (areOverlappingRanges(longRanges)) {
+      return new OverlappingLongRangeFacetCutter(
+          field, longValuesSource, singleLongValuesSource, longRanges);
+    }
+    return new NonOverlappingLongRangeFacetCutter(
+        field, longValuesSource, singleLongValuesSource, longRanges);
+  }
+
+  public static LongRangeFacetCutter create(
+      String field, MultiLongValuesSource longValuesSource, LongRange[] longRanges) {
+    return create(field, longValuesSource, null, longRanges);
+  }
+
+  // caller handles conversion of Doubles and DoubleRange to Long and LongRange
+  // ranges need not be sorted
+  LongRangeFacetCutter(
+      String field,
+      MultiLongValuesSource longValuesSource,
+      LongValuesSource singleLongValuesSource,
+      LongRange[] longRanges) {
+    super(field);
+    valuesSource = longValuesSource;
+    if (singleLongValuesSource != null) {
+      singleValues = singleLongValuesSource;
+    } else {
+      singleValues = MultiLongValuesSource.unwrapSingleton(valuesSource);
+    }
+
+    sortedRanges = new LongRangeAndPos[longRanges.length];
+    requestedRangeCount = longRanges.length;
+
+    for (int i = 0; i < longRanges.length; i++) {
+      sortedRanges[i] = new LongRangeAndPos(longRanges[i], i);
+    }
+
+    Arrays.sort(this.sortedRanges, Comparator.comparingLong(r -> r.range.min));
+    elementaryIntervals = buildElementaryIntervals();
+
+    // Keep track of elementary interval boundary ends (for binary search) along with the requested
+    // range they map back to (and -1 when they map to a "gap" range in case of ExclusiveRanges):
+    boundaries = new long[elementaryIntervals.size()];
+    pos = new int[elementaryIntervals.size()];
+    Arrays.fill(pos, SKIP_INTERVAL_POSITION);
+    int currRange = 0;
+    for (int i = 0; i < boundaries.length; i++) {
+      boundaries[i] = elementaryIntervals.get(i).end;
+      if (currRange < sortedRanges.length) {
+        LongRangeAndPos curr = sortedRanges[currRange];
+        if (boundaries[i] == curr.range.max) {
+          pos[i] = curr.pos;
+          currRange++;
+        }
+      }
+    }
+  }
+
+  abstract List buildElementaryIntervals();
+
+  private static boolean areOverlappingRanges(LongRange[] ranges) {
+    if (ranges.length == 0) {
+      return false;
+    }
+
+    // Copy before sorting so we don't mess with the caller's o
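For context on the counting scheme in the constructor above: once `boundaries` holds the sorted elementary-interval end points and `pos` maps each interval back to a requested range (or -1 for a gap), per-value counting reduces to a lower-bound binary search. A minimal sketch of that lookup for the non-overlapping case, with illustrative names rather than the PR's actual leaf-cutter code:

```java
class RangeLookupSketch {
  static final int SKIP_INTERVAL_POSITION = -1; // value falls in a gap between ranges

  // Lower-bound binary search: find the first boundary >= value, i.e. the
  // elementary interval containing the value, then map it back to the
  // requested range position (or -1 if it maps to a gap). Assumes the
  // elementary intervals cover the full long range, so the last boundary
  // is an upper bound for any value.
  static int rangePosForValue(long value, long[] boundaries, int[] pos) {
    int lo = 0;
    int hi = boundaries.length - 1;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (boundaries[mid] < value) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return pos[lo];
  }
}
```

A leaf collector would then bump a per-range counter whenever this returns something other than `SKIP_INTERVAL_POSITION`.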
Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]
uschindler commented on code in PR #13636:
URL: https://github.com/apache/lucene/pull/13636#discussion_r1712624281

## lucene/core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentPostingDecodingUtil.java:

@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.internal.vectorization;
+
+import java.io.IOException;
+import java.lang.foreign.MemorySegment;
+import java.nio.ByteOrder;
+import jdk.incubator.vector.LongVector;
+import jdk.incubator.vector.VectorOperators;
+import jdk.incubator.vector.VectorSpecies;
+import org.apache.lucene.store.IndexInput;
+
+final class MemorySegmentPostingDecodingUtil extends PostingDecodingUtil {
+
+  private static final VectorSpecies LONG_SPECIES =
+      PanamaVectorConstants.PRERERRED_LONG_SPECIES;
+
+  private final IndexInput in;
+  private final MemorySegment memorySegment;
+
+  MemorySegmentPostingDecodingUtil(IndexInput in, MemorySegment memorySegment) {
+    this.in = in;
+    this.memorySegment = memorySegment;
+  }
+
+  @Override
+  public void readLongs(int count, long[] b) throws IOException {
+    if (count < LONG_SPECIES.length()) {
+      in.readLongs(b, 0, count);
+    } else {
+      long offset = in.getFilePointer();
+      long endOffset = offset + count * Long.BYTES;
+      int loopBound = LONG_SPECIES.loopBound(count - 1);
+      for (int i = 0;
+          i < loopBound;
+          i += LONG_SPECIES.length(), offset += LONG_SPECIES.length() * Long.BYTES) {
+        LongVector.fromMemorySegment(LONG_SPECIES, memorySegment, offset, ByteOrder.LITTLE_ENDIAN)

Review Comment:

Hi, just a guess: `copyMemory()` has some overhead for very short arrays. The good old ByteBuffers (MappedByteBuffer) had an optimization in `ByteBuffer.getLongs()` and other methods to copy a few items with a for-loop instead of copyMemory, see https://github.com/openjdk/jdk/blob/358d77dafbe0e35d5b20340fccddc0fb8f3db82a/src/java.base/share/classes/java/nio/X-Buffer.java.template#L954

With MemorySegment there's no such optimization anymore. We discussed this in the early days of MemorySegment usage in Lucene, and Maurizio (from OpenJDK) suggested at the time trying a simple loop instead of copyMemory for very small arrays `<= Bits.JNI_COPY_TO_ARRAY_THRESHOLD`. The value is 6 array entries: https://github.com/openjdk/jdk/blob/358d77dafbe0e35d5b20340fccddc0fb8f3db82a/src/java.base/share/classes/java/nio/Bits.java#L232-L236 (I did this, but with Mike's benchmarks I did not see a large difference, only < stddev.)

It could be that the above code uses a loop and not copyMemory behind the scenes and therefore has a small speedup for long arrays < 6 items. We could, separately from this issue, implement a "short array scalar optimization" in `MemorySegmentIndexInput#getBytes/getLongs/getFloats` like in the `XBuffer` classes.

I'd revert this additional complexity here and instead look into optimizing `MemorySegmentIndexInput`. Doing it there would speed up other usages in Lucene, too! It is quite easy to add a simple if/else here: https://github.com/apache/lucene/blob/5aa1aa98ea2cdb37d7d8d48657703a5eb7fbaefa/lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java#L222-L233

Basically, as a quick try, add `if (length <= 6) { super.readLongs(); return; }` here and measure again.
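To make the suggestion concrete: a minimal sketch of the short-array scalar fallback, written against the public `java.lang.foreign` API with hypothetical names - not the actual `MemorySegmentIndexInput` internals:

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Sketch of the proposed short-array scalar optimization: below the JNI copy
// threshold (6 elements, per Bits.JNI_COPY_TO_ARRAY_THRESHOLD), a plain scalar
// loop avoids the fixed setup cost of the bulk copyMemory path.
// (Byte-order handling omitted for brevity; JAVA_LONG_UNALIGNED uses native
// order, while real index code would configure the layout's order explicitly.)
final class ShortArrayReadSketch {
  private static final int JNI_COPY_THRESHOLD = 6;

  static void readLongs(MemorySegment segment, long offset, long[] dst, int dstOff, int len) {
    if (len <= JNI_COPY_THRESHOLD) {
      // Scalar loop for tiny reads: no copyMemory setup overhead.
      for (int i = 0; i < len; i++) {
        dst[dstOff + i] =
            segment.get(ValueLayout.JAVA_LONG_UNALIGNED, offset + (long) i * Long.BYTES);
      }
    } else {
      // Bulk copy for larger reads.
      MemorySegment.copy(segment, ValueLayout.JAVA_LONG_UNALIGNED, offset, dst, dstOff, len);
    }
  }
}
```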
Re: [I] Try applying bipartite graph reordering to KNN graph node ids [lucene]
msokolov commented on issue #13565:
URL: https://github.com/apache/lucene/issues/13565#issuecomment-2282182975

Thinking about the implementation a bit, I realized that when we reorder the vector storage for the benefit of HNSW, we will still need a way to iterate over vector values in docid order, and we need to map from vector ord to docid when searching. None of the existing vector formats handles this: they are optimized for vectors that are stored in docid order.

To make some progress, I'd start with an initial implementation that stores these mappings in a naive way, e.g. as fully-populated arrays, and we can use that to measure how much improvement we see in graph storage size and search performance. Then we could revisit and use a more efficient data structure for the ord/doc mapping. Since the ordinals would no longer be increasing with docid, we can't use DirectMonotonicReader/Writer any more; it needs to be something more like what SortedNumericDocValues does. I'm not super familiar with what we have - I wonder if there is some reusable code that would help.
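A minimal sketch of the naive starting point described above - two fully-populated parallel arrays, one per direction - assuming at most one vector ordinal per document (the class and method names are illustrative, not an existing Lucene API):

```java
import java.util.Arrays;

// Sketch: naive ord <-> docid mapping for reordered vector storage.
// ordToDoc serves search time (graph ordinal -> docid); docToOrd supports
// iterating vectors in docid order even though storage order differs.
final class NaiveOrdDocMap {
  private final int[] ordToDoc; // index: vector ordinal, value: docid
  private final int[] docToOrd; // index: docid, value: ordinal, or -1 if no vector

  NaiveOrdDocMap(int[] ordToDoc, int maxDoc) {
    this.ordToDoc = ordToDoc;
    this.docToOrd = new int[maxDoc];
    Arrays.fill(docToOrd, -1);
    for (int ord = 0; ord < ordToDoc.length; ord++) {
      docToOrd[ordToDoc[ord]] = ord;
    }
  }

  int docFor(int ord) {
    return ordToDoc[ord];
  }

  int ordFor(int docId) {
    return docToOrd[docId]; // -1 when the doc has no vector
  }
}
```

Since `ordToDoc` is no longer monotonic after reordering, this is exactly the shape DirectMonotonicReader/Writer cannot encode, which is why a compact version would need something closer to the SortedNumericDocValues addressing mentioned above.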
Re: [I] Lucene99FlatVectorsReader.getFloatVectorValues(): NPE: Cannot read field "vectorEncoding" because "fieldEntry" is null [lucene]
msokolov commented on issue #13626:
URL: https://github.com/apache/lucene/issues/13626#issuecomment-2282191793

It makes sense to me to add null checks. I don't think PerFieldCodec should be required? The usage seems legit to me - basically it defines a set of params to use with all knn fields.
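For illustration, the shape of the null check being discussed - returning early instead of dereferencing a missing field entry. This is a hypothetical sketch, not the actual `Lucene99FlatVectorsReader` code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a reader that tolerates fields it never wrote,
// instead of the reported NPE ("Cannot read field 'vectorEncoding'
// because 'fieldEntry' is null").
final class FieldEntryNullCheckSketch {
  record FieldEntry(String vectorEncoding) {}

  private final Map<String, FieldEntry> fields = new HashMap<>();

  String vectorEncoding(String field) {
    FieldEntry fieldEntry = fields.get(field);
    if (fieldEntry == null) {
      // Treat an unknown field as absent (could also return empty
      // vector values or throw IllegalArgumentException).
      return null;
    }
    return fieldEntry.vectorEncoding();
  }
}
```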
Re: [PR] Unify how missing field entries are handled in knn formats [lucene]
msokolov commented on PR #13641:
URL: https://github.com/apache/lucene/pull/13641#issuecomment-2282193063

> It is possible to inappropriately use the knn formats and attempt to merge segments with mismatched field names.

I don't understand this - what's inappropriate about it? I guess we have something like segment 1 has field A and segment 2 has field B. Shouldn't the merged segment have both fields?
Re: [I] testMergeStability failing for Knn formats [lucene]
msokolov commented on issue #13640:
URL: https://github.com/apache/lucene/issues/13640#issuecomment-2282193339

Hmm, thanks - I'll take a look soon.
Re: [I] testMergeStability failing for Knn formats [lucene]
msokolov commented on issue #13640:
URL: https://github.com/apache/lucene/issues/13640#issuecomment-2282198464

I didn't know about this constraint until now. Basically what happens is that during merge we check for disconnected components and attempt to *add* connections to connect them. So it makes sense that we might be adding some bytes to the vex file. Maybe a way to avoid this is to skip recreating the HNSW graph when merging a single segment. Honestly, I don't know why we would be doing that. I'll dig a bit more.
[PR] Two fixes for recently-added HnswGraphBuilder.connectComponents: [lucene]
msokolov opened a new pull request, #13642:
URL: https://github.com/apache/lucene/pull/13642

1. properly set the frozen flag to avoid redundant work
2. don't try to join a node to itself
Re: [PR] Two fixes for recently-added HnswGraphBuilder.connectComponents: [lucene]
msokolov commented on PR #13642:
URL: https://github.com/apache/lucene/pull/13642#issuecomment-2282219044

Addresses #13640. With this fix, the failing test seed on that issue succeeded. I re-ran the test with tests.iters=1000 and didn't get any failures.
Re: [I] testMergeStability failing for Knn formats [lucene]
msokolov commented on issue #13640:
URL: https://github.com/apache/lucene/issues/13640#issuecomment-2282219274

OK, I guess we would need to actually build a graph when merging a single segment in case there are deletions. In any case, it would be nice if the graph reconnection were stable. This test exposes some interesting problems! Yay for our tests.
Re: [PR] Unify how missing field entries are handled in knn formats [lucene]
benwtrent commented on PR #13641:
URL: https://github.com/apache/lucene/pull/13641#issuecomment-2282231690

@msokolov maybe? Looking at the code, this has never worked this way. Field entry existence was just assumed to be handled by the per-field codec.

I tried digging into the other formats that have a "per field" version (e.g. doc values), and when I tested, they also threw an NPE in a scenario like this. So, I opted to keep what seems like the status quo. But I am happy to change it if I am wrong about the intended behavior.
Re: [PR] Two fixes for recently-added HnswGraphBuilder.connectComponents: [lucene]
msokolov merged PR #13642:
URL: https://github.com/apache/lucene/pull/13642