Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]
uschindler commented on PR #13636: URL: https://github.com/apache/lucene/pull/13636#issuecomment-2277393866 I cleaned up the constants a bit more; there are still some duplicates in VectorUtilSupport, but I'd change this in the follow-up PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] `gradlew eclipse` no longer works [lucene]
uschindler commented on issue #13638: URL: https://github.com/apache/lucene/issues/13638#issuecomment-2277400477 There's also a minor issue: since this commit, whenever I commit something to the repo, Git complains about the line endings of `versions.toml`: > warning: in the working copy of 'versions.toml', CRLF will be replaced by LF the next time Git touches it This happens every time I execute Gradle.
Re: [PR] Knn(Float-->Byte)VectorField javadocs update in KnnByteVectorQuery [lucene]
cpoerschke merged PR #13637: URL: https://github.com/apache/lucene/pull/13637
Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]
jpountz commented on code in PR #13636: URL: https://github.com/apache/lucene/pull/13636#discussion_r1711141270 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentPostingDecodingUtil.java: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.internal.vectorization; + +import java.io.IOException; +import java.lang.foreign.MemorySegment; +import java.nio.ByteOrder; +import jdk.incubator.vector.LongVector; +import jdk.incubator.vector.VectorOperators; +import jdk.incubator.vector.VectorSpecies; +import org.apache.lucene.store.IndexInput; + +final class MemorySegmentPostingDecodingUtil extends PostingDecodingUtil { + + private static final VectorSpecies LONG_SPECIES = + PanamaVectorConstants.PRERERRED_LONG_SPECIES; + + private final IndexInput in; + private final MemorySegment memorySegment; + + MemorySegmentPostingDecodingUtil(IndexInput in, MemorySegment memorySegment) { +this.in = in; +this.memorySegment = memorySegment; + } + + @Override + public void readLongs(int count, long[] b) throws IOException { +if (count < LONG_SPECIES.length()) { + in.readLongs(b, 0, count); +} else { + long offset = in.getFilePointer(); + long endOffset = offset + count * Long.BYTES; + int loopBound = LONG_SPECIES.loopBound(count - 1); + for (int i = 0; + i < loopBound; + i += LONG_SPECIES.length(), offset += LONG_SPECIES.length() * Long.BYTES) { +LongVector.fromMemorySegment(LONG_SPECIES, memorySegment, offset, ByteOrder.LITTLE_ENDIAN) +.intoArray(b, i); + } + + // Handle the tail by reading a vector that is aligned with with `count` on the right side. + int i = count - LONG_SPECIES.length(); + offset = endOffset - LONG_SPECIES.length() * Long.BYTES; + LongVector.fromMemorySegment(LONG_SPECIES, memorySegment, offset, ByteOrder.LITTLE_ENDIAN) + .intoArray(b, i); + + in.seek(endOffset); +} + } + + @Override + public void splitLongs(int count, long[] b, int bShift, long bMask, long[] c, long cMask) + throws IOException { +if (count < LONG_SPECIES.length()) { + // Not enough data to vectorize without going out-of-bounds. In practice, this branch is never + // used if the bit width is 256, and is used for 2 and 3 bits per value if the bit width is + // 512. 
+ in.readLongs(c, 0, count); + for (int i = 0; i < count; ++i) { +b[i] = (c[i] >>> bShift) & bMask; +c[i] &= cMask; + } +} else { + long offset = in.getFilePointer(); + long endOffset = offset + count * Long.BYTES; + int loopBound = LONG_SPECIES.loopBound(count - 1); Review Comment: This is to make sure that the loop doesn't handle the rightmost vector. We could do `loopBound(N)` but then we'd need to protect the code after the loop with `if (loopBound < N)` to avoid doing redundant work.
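The tail-handling strategy discussed above can be illustrated with a scalar sketch (hypothetical names: `W` stands in for `LONG_SPECIES.length()`, and `System.arraycopy` stands in for a `W`-wide vector load). Using `loopBound(count - 1)` keeps the main loop away from the rightmost `W` elements, which are then read with one final load aligned to the right edge; the overlap may re-copy a few already-written longs, which is harmless and avoids a branch after the loop:

```java
public class TailCopy {
    static final int W = 4; // hypothetical vector width in longs

    // Assumes count >= W, mirroring the scalar fallback the real code takes for small counts.
    static void readLongs(long[] src, int srcOff, long[] dst, int count) {
        // Largest multiple of W that is <= count - 1, so the main loop
        // never covers the rightmost W elements:
        int loopBound = ((count - 1) / W) * W;
        for (int i = 0; i < loopBound; i += W) {
            System.arraycopy(src, srcOff + i, dst, i, W); // stands in for a W-wide vector load
        }
        // Tail: one final W-wide copy aligned with `count` on the right side;
        // it may overlap elements already copied, which is harmless.
        System.arraycopy(src, srcOff + count - W, dst, count - W, W);
    }
}
```

Note that with `loopBound(count)` and `count` a multiple of `W`, the loop would already cover everything and the tail copy would be pure redundant work, which is exactly what the `count - 1` bound avoids.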
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711163320 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/LongRangeFacetCutter.java: ## @@ -0,0 +1,431 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.ranges; + +import java.io.IOException; +import java.util.Arrays; +import java.util.Comparator; +import java.util.List; +import java.util.Objects; +import org.apache.lucene.facet.MultiLongValues; +import org.apache.lucene.facet.MultiLongValuesSource; +import org.apache.lucene.facet.range.LongRange; +import org.apache.lucene.sandbox.facet.abstracts.FacetLeafCutter; +import org.apache.lucene.search.LongValues; +import org.apache.lucene.search.LongValuesSource; + +/** {@link RangeFacetCutter} for ranges of long values. * */ +public abstract class LongRangeFacetCutter extends RangeFacetCutter { + + MultiLongValuesSource valuesSource; + LongValuesSource singleValues; // TODO: refactor - weird that we have both multi and single here. 
+ LongRangeAndPos[] sortedRanges; + + int requestedRangeCount; + + List elementaryIntervals; + + long[] boundaries; + int[] pos; + + // Default interval position, when elementary interval is mapped to this interval + // it is skipped. + static final int SKIP_INTERVAL_POSITION = -1; + + // Temporary callers should ensure that passed in single values sources are wrapped + /** add doc * */ + public static LongRangeFacetCutter create( + String field, + MultiLongValuesSource longValuesSource, + LongValuesSource singleLongValuesSource, + LongRange[] longRanges) { +if (areOverlappingRanges(longRanges)) { + return new OverlappingLongRangeFacetCutter( + field, longValuesSource, singleLongValuesSource, longRanges); +} +return new ExclusiveLongRangeFacetCutter( +field, longValuesSource, singleLongValuesSource, longRanges); + } + + public static LongRangeFacetCutter create( + String field, MultiLongValuesSource longValuesSource, LongRange[] longRanges) { +return create(field, longValuesSource, null, longRanges); + } + + // caller handles conversion of Doubles and DoubleRange to Long and LongRange + // ranges need not be sorted + LongRangeFacetCutter( + String field, + MultiLongValuesSource longValuesSource, + LongValuesSource singleLongValuesSource, + LongRange[] longRanges) { +super(field); +valuesSource = longValuesSource; +if (singleLongValuesSource != null) { + singleValues = singleLongValuesSource; +} else { + singleValues = MultiLongValuesSource.unwrapSingleton(valuesSource); +} + +sortedRanges = new LongRangeAndPos[longRanges.length]; +requestedRangeCount = longRanges.length; + +for (int i = 0; i < longRanges.length; i++) { + sortedRanges[i] = new LongRangeAndPos(longRanges[i], i); +} + +Arrays.sort(this.sortedRanges, Comparator.comparingLong(r -> r.range.min)); +elementaryIntervals = buildElementaryIntervals(); Review Comment: I've added javadoc to `buildElementaryIntervals` method -- This is an automated message from the Apache Git Service. 
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711174092 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/OverlappingLongRangeFacetCutter.java: ## @@ -0,0 +1,272 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.ranges; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.lucene.facet.MultiLongValues; +import org.apache.lucene.facet.MultiLongValuesSource; +import org.apache.lucene.facet.range.LongRange; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.sandbox.facet.abstracts.FacetLeafCutter; +import org.apache.lucene.search.LongValues; +import org.apache.lucene.search.LongValuesSource; + +/** + * {@link RangeFacetCutter} for ranges of long value that overlap. 
Uses segment tree optimisation to + find all matching ranges for a given value: https://blog.mikemccandless.com/2013/12/fast-range-faceting-using-segment-trees.html + */ +class OverlappingLongRangeFacetCutter extends LongRangeFacetCutter { + private final LongRangeNode root; + + OverlappingLongRangeFacetCutter( + String field, + MultiLongValuesSource longValuesSource, + LongValuesSource singleLongValuesSource, + LongRange[] longRanges) { +super(field, longValuesSource, singleLongValuesSource, longRanges); + +// Build binary tree on top of intervals: +root = split(0, elementaryIntervals.size(), elementaryIntervals); + +// Set outputs, so we know which range to output for each node in the tree: +for (int i = 0; i < sortedRanges.length; i++) { + root.addOutputs(i, sortedRanges[i]); +} + } + + @Override + List buildElementaryIntervals() { +// Maps all range inclusive endpoints to int flags; 1 +// = start of interval, 2 = end of interval. We need to +// track the start vs end case separately because if a +// given point is both, then it must be its own +// elementary interval: +Map endsMap = new HashMap<>(); + +endsMap.put(Long.MIN_VALUE, 1); +endsMap.put(Long.MAX_VALUE, 2); + +for (LongRangeAndPos rangeAndPos : sortedRanges) { Review Comment: I've added a TODO to dedup with the original method `OverlappingLongRangeCounter#buildElementaryIntervals` - we can consider refactoring as part of this work. WDYT?
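The endpoint-flag bookkeeping described in the quoted comment can be sketched standalone (a simplified sketch using plain `long[]` min/max pairs instead of Lucene's `LongRange`; the class and method names are hypothetical). Each inclusive endpoint gets a flag (1 = start of a range, 2 = end, 3 = both), and a point carrying both flags becomes its own single-point elementary interval:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EndpointFlags {
    /** Builds elementary intervals (a 1D Venn diagram) for possibly-overlapping ranges. */
    static List<long[]> elementaryIntervals(long[][] ranges) { // each entry is {min, max}, inclusive
        // Map endpoints to flags: 1 = start, 2 = end, 3 = both.
        // A TreeMap keeps the endpoints sorted for the sweep below.
        Map<Long, Integer> ends = new TreeMap<>();
        ends.put(Long.MIN_VALUE, 1);
        ends.put(Long.MAX_VALUE, 2);
        for (long[] r : ranges) {
            ends.merge(r[0], 1, (a, b) -> a | b);
            ends.merge(r[1], 2, (a, b) -> a | b);
        }
        List<long[]> out = new ArrayList<>();
        long prev = Long.MIN_VALUE;
        boolean first = true;
        for (Map.Entry<Long, Integer> e : ends.entrySet()) {
            long v = e.getKey();
            int flags = e.getValue();
            if (first) { // leftmost endpoint just opens the first interval
                if (flags == 3) { out.add(new long[] {v, v}); prev = v + 1; }
                first = false;
                continue;
            }
            if (flags == 3) {
                // Both a start and an end: must be its own elementary interval.
                if (v > prev) out.add(new long[] {prev, v - 1});
                out.add(new long[] {v, v});
                prev = v + 1;
            } else if (flags == 1) {
                // Start only: close the preceding interval, open a new one at v.
                if (v > prev) out.add(new long[] {prev, v - 1});
                prev = v;
            } else {
                // End only: the interval ending at v includes v.
                out.add(new long[] {prev, v});
                prev = v + 1;
            }
        }
        return out;
    }
}
```

For example, the overlapping ranges [0,10] and [5,20] yield the five elementary intervals [MIN,-1], [0,4], [5,10], [11,20], [21,MAX], which the segment tree is then built on top of.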
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711194467 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/ExclusiveLongRangeFacetCutter.java: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.ranges; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import org.apache.lucene.facet.MultiLongValues; +import org.apache.lucene.facet.MultiLongValuesSource; +import org.apache.lucene.facet.range.LongRange; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.sandbox.facet.abstracts.FacetLeafCutter; +import org.apache.lucene.search.LongValues; +import org.apache.lucene.search.LongValuesSource; + +/** {@link RangeFacetCutter} for ranges of long value that don't overlap. 
* */ +class ExclusiveLongRangeFacetCutter extends LongRangeFacetCutter { + ExclusiveLongRangeFacetCutter( + String field, + MultiLongValuesSource longValuesSource, + LongValuesSource singleLongValuesSource, + LongRange[] longRanges) { +super(field, longValuesSource, singleLongValuesSource, longRanges); + } + + @Override + List buildElementaryIntervals() { +List elementaryIntervals = new ArrayList<>(); +long prev = Long.MIN_VALUE; +for (LongRangeAndPos range : sortedRanges) { + if (range.range().min > prev) { +// add a "gap" range preceding requested range if necessary: +elementaryIntervals.add(new InclusiveRange(prev, range.range().min - 1)); + } + // add the requested range: + elementaryIntervals.add(new InclusiveRange(range.range().min, range.range().max)); + prev = range.range().max + 1; +} +if (elementaryIntervals.isEmpty() == false) { + long lastEnd = elementaryIntervals.get(elementaryIntervals.size() - 1).end(); + if (lastEnd < Long.MAX_VALUE) { +elementaryIntervals.add(new InclusiveRange(lastEnd + 1, Long.MAX_VALUE)); + } +} else { + // If no ranges were requested, create a single entry from MIN_VALUE to MAX_VALUE: + elementaryIntervals.add(new InclusiveRange(Long.MIN_VALUE, Long.MAX_VALUE)); +} + +return elementaryIntervals; + } + + @Override + public FacetLeafCutter createLeafCutter(LeafReaderContext context) throws IOException { +if (singleValues != null) { + LongValues values = singleValues.getValues(context, null); + return new ExclusiveLongRangeSinglevalueFacetLeafCutter( + values, boundaries, pos, requestedRangeCount); +} else { + MultiLongValues values = valuesSource.getValues(context); + return new ExclusiveLongRangeMultivalueFacetLeafCutter( + values, boundaries, pos, requestedRangeCount); +} + } + + /** + * TODO: dedup ExclusiveLongRangeMultivalueFacetLeafCutter and Review Comment: I believe we can, but I'd suggest we do it as a follow up task to contain size of the PR - these are package-private static classes, so we are not going to change any 
API. WDYT?
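For reference, the exclusive-range interval construction quoted above reduces to this standalone sketch (a simplified sketch with plain `long[]` min/max pairs instead of Lucene's `LongRange`; sorted, non-overlapping input is assumed, and the class name is hypothetical). Gap intervals are inserted around the requested ranges so the result covers the whole long domain:

```java
import java.util.ArrayList;
import java.util.List;

public class ExclusiveIntervals {
    /** Covers [Long.MIN_VALUE, Long.MAX_VALUE] with requested ranges plus "gap" intervals. */
    static List<long[]> build(long[][] sortedRanges) { // each {min, max}, sorted, non-overlapping
        List<long[]> out = new ArrayList<>();
        long prev = Long.MIN_VALUE;
        for (long[] r : sortedRanges) {
            if (r[0] > prev) {
                out.add(new long[] {prev, r[0] - 1}); // gap preceding the requested range
            }
            out.add(new long[] {r[0], r[1]}); // the requested range itself
            prev = r[1] + 1;
        }
        if (!out.isEmpty()) {
            long lastEnd = out.get(out.size() - 1)[1];
            if (lastEnd < Long.MAX_VALUE) {
                out.add(new long[] {lastEnd + 1, Long.MAX_VALUE}); // trailing gap
            }
        } else {
            out.add(new long[] {Long.MIN_VALUE, Long.MAX_VALUE}); // no ranges requested
        }
        return out;
    }
}
```

For example, requested ranges [0,9] and [20,29] produce [MIN,-1], [0,9], [10,19], [20,29], [30,MAX]; the gap intervals are later marked with the skip position so they are counted but never reported.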
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711199728 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/LongRangeFacetCutter.java: ## @@ -0,0 +1,431 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.ranges; + +import java.io.IOException; +import java.util.Arrays; +import java.util.Comparator; +import java.util.List; +import java.util.Objects; +import org.apache.lucene.facet.MultiLongValues; +import org.apache.lucene.facet.MultiLongValuesSource; +import org.apache.lucene.facet.range.LongRange; +import org.apache.lucene.sandbox.facet.abstracts.FacetLeafCutter; +import org.apache.lucene.search.LongValues; +import org.apache.lucene.search.LongValuesSource; + +/** {@link RangeFacetCutter} for ranges of long values. * */ +public abstract class LongRangeFacetCutter extends RangeFacetCutter { + + MultiLongValuesSource valuesSource; + LongValuesSource singleValues; // TODO: refactor - weird that we have both multi and single here. 
+ LongRangeAndPos[] sortedRanges; + + int requestedRangeCount; + + List elementaryIntervals; + + long[] boundaries; + int[] pos; + + // Default interval position, when elementary interval is mapped to this interval + // it is skipped. + static final int SKIP_INTERVAL_POSITION = -1; + + // Temporary callers should ensure that passed in single values sources are wrapped + /** add doc * */ + public static LongRangeFacetCutter create( + String field, + MultiLongValuesSource longValuesSource, + LongValuesSource singleLongValuesSource, + LongRange[] longRanges) { +if (areOverlappingRanges(longRanges)) { + return new OverlappingLongRangeFacetCutter( + field, longValuesSource, singleLongValuesSource, longRanges); +} +return new ExclusiveLongRangeFacetCutter( +field, longValuesSource, singleLongValuesSource, longRanges); + } + + public static LongRangeFacetCutter create( + String field, MultiLongValuesSource longValuesSource, LongRange[] longRanges) { +return create(field, longValuesSource, null, longRanges); + } + + // caller handles conversion of Doubles and DoubleRange to Long and LongRange + // ranges need not be sorted + LongRangeFacetCutter( + String field, + MultiLongValuesSource longValuesSource, + LongValuesSource singleLongValuesSource, + LongRange[] longRanges) { +super(field); +valuesSource = longValuesSource; +if (singleLongValuesSource != null) { + singleValues = singleLongValuesSource; +} else { + singleValues = MultiLongValuesSource.unwrapSingleton(valuesSource); +} + +sortedRanges = new LongRangeAndPos[longRanges.length]; +requestedRangeCount = longRanges.length; + +for (int i = 0; i < longRanges.length; i++) { + sortedRanges[i] = new LongRangeAndPos(longRanges[i], i); +} + +Arrays.sort(this.sortedRanges, Comparator.comparingLong(r -> r.range.min)); +elementaryIntervals = buildElementaryIntervals(); + +// Keep track of elementary interval boundary ends (for binary search) along with the requested +// range they map back to (and -1 when they map to a "gap" 
range in case of ExclusiveRanges): +boundaries = new long[elementaryIntervals.size()]; +pos = new int[elementaryIntervals.size()]; +Arrays.fill(pos, SKIP_INTERVAL_POSITION); +int currRange = 0; +for (int i = 0; i < boundaries.length; i++) { + boundaries[i] = elementaryIntervals.get(i).end; + if (currRange < sortedRanges.length) { +LongRangeAndPos curr = sortedRanges[currRange]; +if (boundaries[i] == curr.range.max) { + pos[i] = curr.pos; + currRange++; +} + } +} + } + + abstract List buildElementaryIntervals(); + + private static boolean areOverlappingRanges(LongRange[] ranges) { +if (ranges.length == 0) { + return false; +} + +// Copy before sorting so we don't mess with the caller's original ranges: +LongRange[] sortedRanges = new LongRange[ranges.length]; +System.arraycopy(ranges, 0, sortedRanges, 0, ranges.length); +Arrays.sort(sortedRanges, Comparator.co
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711205288 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/RangeOrdLabelBiMap.java: ## @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.ranges; + +import java.io.IOException; +import java.util.List; +import org.apache.lucene.facet.range.Range; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.sandbox.facet.abstracts.OrdLabelBiMap; + +/** {@link OrdLabelBiMap} for ranges. */ +public class RangeOrdLabelBiMap implements OrdLabelBiMap { + + Range[] ranges; + + /** Constructor that takes array of Range objects as input * */ + public RangeOrdLabelBiMap(Range[] inputRanges) { +ranges = inputRanges; + } + + /** Constructor that takes List of Range objects as input * */ + public RangeOrdLabelBiMap(List inputRanges) { Review Comment: Oh I think I misunderstood the suggestion, you suggest to remove the second constructor, not the class. +1, removing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711213814 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/RangeOrdLabelBiMap.java: ## @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.ranges; + +import java.io.IOException; +import java.util.List; +import org.apache.lucene.facet.range.Range; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.sandbox.facet.abstracts.OrdLabelBiMap; + +/** {@link OrdLabelBiMap} for ranges. 
*/ +public class RangeOrdLabelBiMap implements OrdLabelBiMap { + + Range[] ranges; + + /** Constructor that takes array of Range objects as input * */ + public RangeOrdLabelBiMap(Range[] inputRanges) { +ranges = inputRanges; + } + + /** Constructor that takes List of Range objects as input * */ + public RangeOrdLabelBiMap(List inputRanges) { +ranges = inputRanges.toArray(new Range[0]); + } + + @Override + public FacetLabel getLabel(int ordinal) throws IOException { +if (ordinal >= 0 && ordinal < ranges.length) { + return new FacetLabel(ranges[ordinal].label); +} +return null; + } + + @Override + public FacetLabel[] getLabels(int[] ordinals) throws IOException { +FacetLabel[] facetLabels = new FacetLabel[ordinals.length]; +for (int i = 0; i < ordinals.length; i++) { + facetLabels[i] = new FacetLabel(ranges[ordinals[i]].label); +} +return facetLabels; + } + + @Override + public int getOrd(FacetLabel label) { +throw new UnsupportedOperationException("Not yet supported for ranges"); Review Comment: As Greg suggested, we can't; instead we split OrdLabelBiMap into OrdToLabel and LabelToOrd interfaces.
Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]
uschindler commented on code in PR #13636: URL: https://github.com/apache/lucene/pull/13636#discussion_r1711262142 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentPostingDecodingUtil.java: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.internal.vectorization; + +import java.io.IOException; +import java.lang.foreign.MemorySegment; +import java.nio.ByteOrder; +import jdk.incubator.vector.LongVector; +import jdk.incubator.vector.VectorOperators; +import jdk.incubator.vector.VectorSpecies; +import org.apache.lucene.store.IndexInput; + +final class MemorySegmentPostingDecodingUtil extends PostingDecodingUtil { + + private static final VectorSpecies LONG_SPECIES = + PanamaVectorConstants.PRERERRED_LONG_SPECIES; + + private final IndexInput in; + private final MemorySegment memorySegment; + + MemorySegmentPostingDecodingUtil(IndexInput in, MemorySegment memorySegment) { +this.in = in; +this.memorySegment = memorySegment; + } + + @Override + public void readLongs(int count, long[] b) throws IOException { +if (count < LONG_SPECIES.length()) { + in.readLongs(b, 0, count); +} else { + long offset = in.getFilePointer(); + long endOffset = offset + count * Long.BYTES; + int loopBound = LONG_SPECIES.loopBound(count - 1); + for (int i = 0; + i < loopBound; + i += LONG_SPECIES.length(), offset += LONG_SPECIES.length() * Long.BYTES) { +LongVector.fromMemorySegment(LONG_SPECIES, memorySegment, offset, ByteOrder.LITTLE_ENDIAN) Review Comment: Why is this better/faster than a simple copyMemory as `in.readLongs()` is doing? At the end it is copied to an on-heap array anyways. I was expecting the whole PR to do the splitLongs directly on the MemorySegment and not on on-heap arrays. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
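The loop quoted in the diff above follows the usual Panama shape: vector loads over `LONG_SPECIES.loopBound(...)` whole lanes, then a scalar fallback (`in.readLongs`) for the remaining tail. A rough plain-Java illustration of that split — `LANE_COUNT` stands in for `LONG_SPECIES.length()`, `System.arraycopy` stands in for `LongVector.fromMemorySegment`, and none of this is the actual Panama Vector API code:

```java
import java.util.Arrays;

public class Main {
    static final int LANE_COUNT = 4; // e.g. a 256-bit species holds 4 longs

    // Largest multiple of LANE_COUNT that is <= n, like VectorSpecies.loopBound(n).
    static int loopBound(int n) {
        return n - (n % LANE_COUNT);
    }

    // Copy count longs, chunked the way the vectorized readLongs walks memory.
    static void copyLongs(long[] src, long[] dst, int count) {
        int i = 0;
        for (; i < loopBound(count); i += LANE_COUNT) {
            // one full-lane "vector load + store" per iteration
            System.arraycopy(src, i, dst, i, LANE_COUNT);
        }
        for (; i < count; i++) { // scalar tail, like the in.readLongs fallback
            dst[i] = src[i];
        }
    }

    public static void main(String[] args) {
        long[] src = {1, 2, 3, 4, 5, 6, 7};
        long[] dst = new long[src.length];
        copyLongs(src, dst, src.length);
        System.out.println(Arrays.toString(dst)); // [1, 2, 3, 4, 5, 6, 7]
    }
}
```

Note the real code uses `loopBound(count - 1)` so the final full chunk can be treated specially; the sketch only shows the main-loop/tail split that uschindler's question is about.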
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711338640 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/recorders/LongAggregationsFacetRecorder.java: ## @@ -0,0 +1,166 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.recorders; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collections; +import java.util.Iterator; +import java.util.List; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.internal.hppc.IntCursor; +import org.apache.lucene.internal.hppc.IntObjectHashMap; +import org.apache.lucene.sandbox.facet.abstracts.FacetLeafRecorder; +import org.apache.lucene.sandbox.facet.abstracts.FacetRecorder; +import org.apache.lucene.sandbox.facet.abstracts.FacetRollup; +import org.apache.lucene.sandbox.facet.abstracts.OrdinalIterator; +import org.apache.lucene.sandbox.facet.abstracts.Reducer; +import org.apache.lucene.search.LongValues; +import org.apache.lucene.search.LongValuesSource; + +/** + * {@link FacetRecorder} that computes multiple long aggregations per facet. 
TODO: [premature + * optimization idea] if instead of one array we keep aggregations in two LongVector (one for MAX + * aggregation and one for SUM) we can benefit from SIMD? + */ +public class LongAggregationsFacetRecorder implements FacetRecorder { + + private IntObjectHashMap perOrdinalValues; + private List> leafValues; + + private final LongValuesSource[] longValuesSources; + private final Reducer[] reducers; + + /** Constructor. */ + public LongAggregationsFacetRecorder(LongValuesSource[] longValuesSources, Reducer[] reducers) { +assert longValuesSources.length == reducers.length; +this.longValuesSources = longValuesSources; +this.reducers = reducers; +perOrdinalValues = new IntObjectHashMap<>(); +leafValues = Collections.synchronizedList(new ArrayList<>()); + } + + @Override + public FacetLeafRecorder getLeafRecorder(LeafReaderContext context) throws IOException { +LongValues[] longValues = new LongValues[longValuesSources.length]; +for (int i = 0; i < longValuesSources.length; i++) { + longValues[i] = longValuesSources[i].getValues(context, null); +} +IntObjectHashMap valuesRecorder = new IntObjectHashMap<>(); +leafValues.add(valuesRecorder); +return new LongAggregationsFacetLeafRecorder(longValues, reducers, valuesRecorder); + } + + @Override + public OrdinalIterator recordedOrds() { +if (perOrdinalValues.isEmpty()) { + return null; +} +Iterator ordIterator = perOrdinalValues.keys().iterator(); +return new OrdinalIterator() { + @Override + public int nextOrd() throws IOException { +if (ordIterator.hasNext()) { + return ordIterator.next().value; +} else { + return NO_MORE_ORDS; +} + } +}; + } + + @Override + public boolean isEmpty() { +return perOrdinalValues.isEmpty(); + } + + @Override + public void reduce(FacetRollup facetRollup) throws IOException { +boolean firstElement = true; +for (IntObjectHashMap leafValue : leafValues) { + if (firstElement) { +perOrdinalValues = leafValue; +firstElement = false; + } else { +for (IntObjectHashMap.IntObjectCursor 
elem : leafValue) { + long[] vals = perOrdinalValues.get(elem.key); + if (vals == null) { +perOrdinalValues.put(elem.key, elem.value); + } else { +for (int i = 0; i < longValuesSources.length; i++) { + vals[i] = reducers[i].reduce(vals[i], elem.value[i]); +} + } +} + } +} +if (firstElement) { + // TODO: do we need empty map by default? + perOrdinalValues = new IntObjectHashMap<>(); +} +// TODO: implement rollup +if (facetRollup != null +&& facetRollup.getDimOrdsToRollup().nextOrd() != OrdinalIterator.NO_MORE_ORDS) { + throw new UnsupportedOperationException("Rollup is required, but not implemented"); +} + } + + public long getRecordedValue(int ord, int valuesId) { +if (valuesId < 0 || valuesId >= longValu
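The `reduce` method quoted above folds the per-leaf maps of `ordinal -> long[]` into a single map, combining each aggregation slot with its `Reducer`. A minimal sketch of that merge, with `java.util.HashMap` and `LongBinaryOperator` standing in for hppc's `IntObjectHashMap` and Lucene's `Reducer` (this `reduce` helper is illustrative, not the actual API, and it copies the first leaf's arrays rather than reusing them):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongBinaryOperator;

public class Main {
    static Map<Integer, long[]> reduce(List<Map<Integer, long[]>> leafValues,
                                       LongBinaryOperator[] reducers) {
        Map<Integer, long[]> merged = new HashMap<>();
        for (Map<Integer, long[]> leaf : leafValues) {
            for (Map.Entry<Integer, long[]> e : leaf.entrySet()) {
                long[] vals = merged.get(e.getKey());
                if (vals == null) {
                    // first time we see this ordinal: take a copy of its slots
                    merged.put(e.getKey(), e.getValue().clone());
                } else {
                    // combine slot-by-slot with the matching reducer
                    for (int i = 0; i < reducers.length; i++) {
                        vals[i] = reducers[i].applyAsLong(vals[i], e.getValue()[i]);
                    }
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // slot 0 is a SUM aggregation, slot 1 a MAX aggregation
        Map<Integer, long[]> merged = reduce(
            List.of(Map.of(7, new long[] {3, 10}), Map.of(7, new long[] {4, 2})),
            new LongBinaryOperator[] {Long::sum, Math::max});
        System.out.println(Arrays.toString(merged.get(7))); // [7, 10]
    }
}
```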
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711348674 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ordinal_iterators/CandidateSetOrdinalIterator.java: ## @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.ordinal_iterators; + +import java.io.IOException; +import org.apache.lucene.internal.hppc.IntHashSet; +import org.apache.lucene.sandbox.facet.abstracts.OrdinalIterator; + +/** + * {@link OrdinalIterator} that filters out ordinals from delegate if they are not in the candidate + * set. + * + * Can be handy to get results only for specific facets. + */ +public class CandidateSetOrdinalIterator implements OrdinalIterator { Review Comment: I've added some unit tests for this class. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
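The contract of `CandidateSetOrdinalIterator` described above — pass through only delegate ordinals that appear in the candidate set — can be sketched without any Lucene types. Here a tiny `OrdinalIterator` interface and `java.util.Set` stand in for Lucene's `OrdinalIterator` and hppc's `IntHashSet`, and `NO_MORE_ORDS = -1` is an assumption of this sketch:

```java
import java.util.Set;

public class Main {
    static final int NO_MORE_ORDS = -1; // sentinel, assumed for this sketch

    interface OrdinalIterator {
        int nextOrd();
    }

    // Skip delegate ordinals that are not in the candidate set.
    static OrdinalIterator filter(OrdinalIterator delegate, Set<Integer> candidates) {
        return () -> {
            for (int ord = delegate.nextOrd(); ord != NO_MORE_ORDS; ord = delegate.nextOrd()) {
                if (candidates.contains(ord)) {
                    return ord;
                }
            }
            return NO_MORE_ORDS;
        };
    }

    // Helper to turn a fixed array into an OrdinalIterator for demo purposes.
    static OrdinalIterator fromArray(int[] ords) {
        int[] pos = {0};
        return () -> pos[0] < ords.length ? ords[pos[0]++] : NO_MORE_ORDS;
    }

    public static void main(String[] args) {
        OrdinalIterator it = filter(fromArray(new int[] {3, 5, 8, 9}), Set.of(5, 9));
        for (int ord = it.nextOrd(); ord != NO_MORE_ORDS; ord = it.nextOrd()) {
            System.out.println(ord); // prints 5, then 9
        }
    }
}
```

This is why the class is "handy to get results only for specific facets": the caller builds the candidate set from the facet labels it cares about and wraps the recorder's ordinal stream.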
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711347682 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/misc/LongValueFacetCutter.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.misc; + +import java.io.IOException; +import java.util.Iterator; +import java.util.concurrent.atomic.AtomicInteger; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.index.DocValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.index.SortedNumericDocValues; +import org.apache.lucene.internal.hppc.IntLongHashMap; +import org.apache.lucene.internal.hppc.LongCursor; +import org.apache.lucene.internal.hppc.LongHashSet; +import org.apache.lucene.internal.hppc.LongIntHashMap; +import org.apache.lucene.sandbox.facet.abstracts.FacetCutter; +import org.apache.lucene.sandbox.facet.abstracts.FacetLeafCutter; +import org.apache.lucene.sandbox.facet.abstracts.OrdLabelBiMap; + +/** + * {@link FacetCutter} and {@link OrdLabelBiMap} for distinct long values. + * + * TODO: This class is quite inefficient. Will optimise later. 
TODO: add support for other value Review Comment: I've added some unit tests for this class. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
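`LongValueFacetCutter` quoted above imports `LongIntHashMap`, `IntLongHashMap`, and `AtomicInteger`, which suggests an ord<->value bimap: each distinct long value gets a dense ordinal on first sight, readable in both directions (cutter side and `OrdLabelBiMap` side). A hedged sketch of such a bimap — the method names `ordForValue`/`valueForOrd` are made up for illustration, and plain `HashMap`s stand in for the hppc maps:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class Main {
    private final Map<Long, Integer> valueToOrd = new HashMap<>();
    private final Map<Integer, Long> ordToValue = new HashMap<>();
    private final AtomicInteger nextOrd = new AtomicInteger();

    // Assign a dense ordinal the first time a value is seen; reuse it afterwards.
    int ordForValue(long value) {
        return valueToOrd.computeIfAbsent(value, v -> {
            int ord = nextOrd.getAndIncrement();
            ordToValue.put(ord, v); // keep the reverse mapping in sync
            return ord;
        });
    }

    long valueForOrd(int ord) {
        return ordToValue.get(ord); // assumes the ord was assigned earlier
    }

    public static void main(String[] args) {
        Main bimap = new Main();
        System.out.println(bimap.ordForValue(42)); // 0: first distinct value
        System.out.println(bimap.ordForValue(7));  // 1
        System.out.println(bimap.ordForValue(42)); // 0 again: already assigned
        System.out.println(bimap.valueForOrd(1));  // 7
    }
}
```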
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711353146 ## lucene/sandbox/src/test/org/apache/lucene/sandbox/facet/SandboxFacetTestCase.java: ## @@ -0,0 +1,407 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.sandbox.facet; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.Comparator; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.lucene.facet.FacetResult; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.FacetsCollector.MatchingDocs; +import org.apache.lucene.facet.FacetsConfig; +import org.apache.lucene.facet.LabelAndValue; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.facet.taxonomy.TaxonomyFacetLabels; +import org.apache.lucene.facet.taxonomy.TaxonomyFacetLabels.FacetLabelReader; +import org.apache.lucene.facet.taxonomy.TaxonomyReader; +import org.apache.lucene.sandbox.facet.abstracts.OrdLabelBiMap; +import org.apache.lucene.sandbox.facet.abstracts.OrdToComparable; +import org.apache.lucene.sandbox.facet.abstracts.OrdinalIterator; +import org.apache.lucene.sandbox.facet.ordinal_iterators.TopnOrdinalIterator; +import org.apache.lucene.sandbox.facet.recorders.CountFacetRecorder; +import org.apache.lucene.sandbox.facet.taxonomy.TaxonomyChildrenOrdinalIterator; +import org.apache.lucene.sandbox.facet.taxonomy.TaxonomyOrdLabelBiMap; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.tests.util.LuceneTestCase; +import org.apache.lucene.tests.util.TestUtil; +import org.apache.lucene.util.BytesRef; + +public abstract class SandboxFacetTestCase extends LuceneTestCase { + // we don't have access to overall count for all facets from count recorder, + // and we can't compute it as a SUM of values for each facet ID because we need to respect cases + // where + // the same doc belongs to multiple facets (e.g. overlapping ranges and + // multi value fields). We can add an extra range that includes everything, + // or consider supporting overall count in CountFacetRecorder. 
But it is not exactly the value + // we can get now, as this value wouldn't respect top-n cutoff. Is this value a must have facets + // feature? + static final int VALUE_CANT_BE_COMPUTED = -5; + + /** + * Utility method that uses {@link FacetLabelReader} to get facet labels for each hit in {@link + * MatchingDocs}. The method returns {@code List>} where outer list has one entry + * per document and inner list has all {@link FacetLabel} entries that belong to a document. The + * inner list may be empty if no {@link FacetLabel} are found for a hit. + * + * @param taxoReader {@link TaxonomyReader} used to read taxonomy during search. This instance is + * expected to be open for reading. + * @param fc {@link FacetsCollector} A collector with matching hits. + * @param dimension facet dimension for which labels are requested. A null value fetches labels + * for all dimensions. + * @return {@code List} where outer list has one non-null entry per document. and + * inner list contain all {@link FacetLabel} entries that belong to a document. + * @throws IOException when a low-level IO issue occurs. + */ + public List> getAllTaxonomyFacetLabels( + String dimension, TaxonomyReader taxoReader, FacetsCollector fc) throws IOException { +List> actualLabels = new ArrayList<>(); +TaxonomyFacetLabels taxoLabels = +new TaxonomyFacetLabels(taxoReader, FacetsConfig.DEFAULT_INDEX_FIELD_NAME); +for (MatchingDocs m : fc.getMatchingDocs()) { + FacetLabelReader facetLabelReader = taxoLabels.getFacetLabelReader(m.context); + DocIdSetIterator disi = m.bits.iterator(); + while (disi.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) { +actualLabels.add(allFacetLabels(disi.docID(), dimension, facetLabelReader)); + } +} +return actualLabels; + } + + /** + * Utility method to get all facet labels for an input docId and dimension using the supplied + * {@link FacetLabelReader}. +
Re: [PR] Compute facets while collecting [lucene]
stefanvodita commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711375034 ## lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java: ## @@ -45,58 +45,26 @@ class DrillSidewaysQuery extends Query { final Query baseQuery; - final FacetsCollectorManager drillDownCollectorManager; - final FacetsCollectorManager[] drillSidewaysCollectorManagers; - final List managedDrillDownCollectors; - final List managedDrillSidewaysCollectors; + final CollectorOwner drillDownCollectorOwner; + final List> drillSidewaysCollectorOwners; final Query[] drillDownQueries; final boolean scoreSubDocsAtOnce; /** * Construct a new {@code DrillSidewaysQuery} that will create new {@link FacetsCollector}s for - * each {@link LeafReaderContext} using the provided {@link FacetsCollectorManager}s. The caller - * can access the created {@link FacetsCollector}s through {@link #managedDrillDownCollectors} and - * {@link #managedDrillSidewaysCollectors}. + * each {@link LeafReaderContext} using the provided {@link FacetsCollectorManager}s. */ DrillSidewaysQuery( Query baseQuery, - FacetsCollectorManager drillDownCollectorManager, - FacetsCollectorManager[] drillSidewaysCollectorManagers, - Query[] drillDownQueries, - boolean scoreSubDocsAtOnce) { -// Note that the "managed" facet collector lists are synchronized here since bulkScorer() -// can be invoked concurrently and needs to remain thread-safe. We're OK with synchronizing -// on the whole list as contention is expected to remain very low: -this( -baseQuery, -drillDownCollectorManager, -drillSidewaysCollectorManagers, -Collections.synchronizedList(new ArrayList<>()), -Collections.synchronizedList(new ArrayList<>()), -drillDownQueries, -scoreSubDocsAtOnce); - } - - /** - * Needed for {@link Query#rewrite(IndexSearcher)}. Ensures the same "managed" lists get used - * since {@link DrillSideways} accesses references to these through the original {@code - * DrillSidewaysQuery}. 
- */ - private DrillSidewaysQuery( - Query baseQuery, - FacetsCollectorManager drillDownCollectorManager, - FacetsCollectorManager[] drillSidewaysCollectorManagers, - List managedDrillDownCollectors, - List managedDrillSidewaysCollectors, + CollectorOwner drillDownCollectorOwner, Review Comment: Oh, I see, sorry! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Compute facets while collecting [lucene]
stefanvodita commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711379107 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/recorders/CountFacetRecorder.java: ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.recorders; + +import static org.apache.lucene.sandbox.facet.abstracts.OrdinalIterator.NO_MORE_ORDS; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collections; +import java.util.Iterator; +import java.util.List; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.internal.hppc.IntCursor; +import org.apache.lucene.internal.hppc.IntIntHashMap; +import org.apache.lucene.sandbox.facet.abstracts.FacetLeafRecorder; +import org.apache.lucene.sandbox.facet.abstracts.FacetRecorder; +import org.apache.lucene.sandbox.facet.abstracts.FacetRollup; +import org.apache.lucene.sandbox.facet.abstracts.OrdinalIterator; + +/** + * {@link FacetRecorder} to count facets. TODO: add an option to keep counts in an array, to improve + * performance for facets with small number of ordinals e.g. range facets. 
Options: - {@link + * org.apache.lucene.sandbox.facet.abstracts.FacetLeafCutter} can inform {@link FacetLeafRecorder} + * about expected number of facet ordinals ({@link + * org.apache.lucene.sandbox.facet.FacetFieldCollector} can orchestrate that). If expected facet ord + * number is below some threshold - use array instead of a map? - first 100/1k counts in array, the + * rest - in a map; the limit can also be provided in a constructor? It is similar to what + * LongValuesFacetCounts does today. + */ +public class CountFacetRecorder implements FacetRecorder { + + // TODO: deprecate - it is cheaper to merge during reduce than to lock threads during collection. + // TODO: alternatively, we can consider collecting 2 (3, 4, ..., can be parametrized) slices to a + // single sync map + // which can reduce thread contention compared to single sync map for all slices; at the + // same time there will + // be less work for reduce method. So far reduce wasn't a bottleneck for us, but it is + // definitely not free. + private final boolean useSyncMap; + + /** Create */ + public CountFacetRecorder() { +this(false); + } + + IntIntHashMap values; + List perLeafValues; + + /** + * Create. + * + * @param useSyncMap if true, use single sync map for all leaves. + */ + public CountFacetRecorder(boolean useSyncMap) { +super(); +if (useSyncMap) { + values = new SafeIntIntHashMap(); +} else { + // Has to be synchronizedList as we have one recorder per all slices. + perLeafValues = Collections.synchronizedList(new ArrayList<>()); +} +this.useSyncMap = useSyncMap; + } + + /** Get count for provided ordinal. */ + public int getCount(int ord) { +// TODO: allow or don't allow missing values?
+return values.get(ord); + } + + private static final class SafeIntIntHashMap extends IntIntHashMap { +@Override +public synchronized int addTo(int key, int incrementValue) { + return super.addTo(key, incrementValue); +} + } + + @Override + public FacetLeafRecorder getLeafRecorder(LeafReaderContext context) { +if (useSyncMap) { + return new CountLeafRecorder(values); +} else { + IntIntHashMap leafValues = new IntIntHashMap(); + perLeafValues.add(leafValues); + return new CountLeafRecorder(leafValues); +} + } + + @Override + public OrdinalIterator recordedOrds() { +// TODO: is that performant enough? +// TODO: even if this is called before collection started, we want it to use results from the +// time when nextOrd +// is first called. Does ordIterator work like that? I've run some tests that confirmed +// expected behavior, +// but I'm not sure IntIntMap guarantees that. We should at least add a unit test to make sure +// it always work +// that way. +Iterator ordIterator = values.keys().iterator(); +return new OrdinalIterator() { + @Override + public int nextOrd() throws IOException { +if (ordIterator.hasNext(
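The `CountFacetRecorder` diff above weighs two strategies: a single shared map whose `addTo` is synchronized (the `SafeIntIntHashMap` path), versus one private map per leaf merged once in `reduce()`, so collection itself pays no locking cost. Both can be sketched with `HashMap` standing in for hppc's `IntIntHashMap`:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Main {
    // Strategy 1: every recorded hit pays for synchronization on the shared map.
    static void addToSync(Map<Integer, Integer> shared, int ord) {
        synchronized (shared) {
            shared.merge(ord, 1, Integer::sum);
        }
    }

    // Strategy 2: lock-free per-leaf counting; a single merge pass at the end.
    static Map<Integer, Integer> reduce(List<Map<Integer, Integer>> perLeaf) {
        Map<Integer, Integer> total = new HashMap<>();
        for (Map<Integer, Integer> leaf : perLeaf) {
            leaf.forEach((ord, count) -> total.merge(ord, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Two leaves counted independently, then merged.
        Map<Integer, Integer> leaf1 = Map.of(1, 3, 2, 1);
        Map<Integer, Integer> leaf2 = Map.of(1, 2);
        System.out.println(reduce(List.of(leaf1, leaf2)));
    }
}
```

This mirrors the TODO in the diff: merging during reduce is cheaper than locking during collection, because the merge cost is paid once per leaf rather than once per hit.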
Re: [PR] Compute facets while collecting [lucene]
stefanvodita commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711380093 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/OverlappingLongRangeFacetCutter.java: ## @@ -0,0 +1,272 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.ranges; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.lucene.facet.MultiLongValues; +import org.apache.lucene.facet.MultiLongValuesSource; +import org.apache.lucene.facet.range.LongRange; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.sandbox.facet.abstracts.FacetLeafCutter; +import org.apache.lucene.search.LongValues; +import org.apache.lucene.search.LongValuesSource; + +/** + * {@link RangeFacetCutter} for ranges of long value that overlap. 
Uses segment tree optimisation to + * find all matching ranges for a given value https://blog.mikemccandless.com/2013/12/fast-range-faceting-using-segment-trees.html";>fast-range-faceting- + * using-segment-trees.html + */ +class OverlappingLongRangeFacetCutter extends LongRangeFacetCutter { + private final LongRangeNode root; + + OverlappingLongRangeFacetCutter( + String field, + MultiLongValuesSource longValuesSource, + LongValuesSource singleLongValuesSource, + LongRange[] longRanges) { +super(field, longValuesSource, singleLongValuesSource, longRanges); + +// Build binary tree on top of intervals: +root = split(0, elementaryIntervals.size(), elementaryIntervals); + +// Set outputs, so we know which range to output for each node in the tree: +for (int i = 0; i < sortedRanges.length; i++) { + root.addOutputs(i, sortedRanges[i]); +} + } + + @Override + List buildElementaryIntervals() { +// Maps all range inclusive endpoints to int flags; 1 +// = start of interval, 2 = end of interval. We need to +// track the start vs end case separately because if a +// given point is both, then it must be its own +// elementary interval: +Map endsMap = new HashMap<>(); + +endsMap.put(Long.MIN_VALUE, 1); +endsMap.put(Long.MAX_VALUE, 2); + +for (LongRangeAndPos rangeAndPos : sortedRanges) { Review Comment: Sure, seems reasonable. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
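The `buildElementaryIntervals` method quoted above flags every range endpoint as a start (1), an end (2), or both, because a point that is both must become its own elementary interval. A simplified sketch of that endpoint-flag construction — ranges are modelled here as inclusive `[min, max]` pairs, and the bitwise-OR merge (start=1, end=2, both=3) is this sketch's encoding, not necessarily the exact one in the PR:

```java
import java.util.List;
import java.util.TreeMap;

public class Main {
    record LongRange(long min, long max) {}

    // Map each endpoint to flags: 1 = start of some interval, 2 = end, 3 = both.
    static TreeMap<Long, Integer> endpointFlags(List<LongRange> ranges) {
        TreeMap<Long, Integer> endsMap = new TreeMap<>();
        endsMap.put(Long.MIN_VALUE, 1); // global start sentinel
        endsMap.put(Long.MAX_VALUE, 2); // global end sentinel
        for (LongRange r : ranges) {
            endsMap.merge(r.min(), 1, (a, b) -> a | b);
            endsMap.merge(r.max(), 2, (a, b) -> a | b);
        }
        return endsMap;
    }

    public static void main(String[] args) {
        // Overlapping ranges [0,10] and [5,20]: 5 is only a start, 10 only an end.
        TreeMap<Long, Integer> flags =
            endpointFlags(List.of(new LongRange(0, 10), new LongRange(5, 20)));
        System.out.println(flags);
    }
}
```

The sorted endpoints then cut the long line into elementary intervals, and the segment tree built on top of them lets a single value find all matching overlapping ranges in logarithmic time, as in the linked McCandless post.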
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711382272 ## lucene/core/src/java/org/apache/lucene/search/CollectorOwner.java: ## @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.search; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collection; +import java.util.Collections; +import java.util.List; + +/** + * Like {@link CollectorManager}, but it owns the collectors its manager creates. It is convenient + * that clients of the class don't have to worry about keeping the list of collectors, as well as + * about making the collectors type (C) compatible when reduce is called. Instance of this class + * also caches results of {@link CollectorManager#reduce(Collection)}. + * + * Note that instance of this class ignores any {@link Collector} created by {@link + * CollectorManager#newCollector()} directly, not through {@link #newCollector()} + * + * @lucene.experimental + */ +public final class CollectorOwner { Review Comment: > Ahhh OK thank you for the explanation. I like the new `CollectorOwner` ... 
I just wish we could somehow iterate `CollectorManager` towards that, or deprecate / phase out `CollectorManager` somehow, or ... I'm not sure how to do that TBH. Changing `CollectorManager` to keep its collectors means that we either have to make it an abstract class (which might not be convenient for users) or make sure each implementation keeps a list of its collectors, which adds some code duplication? Alternatively we could rename it to something like `CollectorSource` (similarly to DoubleValues**Source**) or `CollectorSupplier`? Although the `reduce` method doesn't seem to fit the Source/Supplier category well. > But it seems worth at least creating an actual issue to track this TODO and see if we can converge at some point? Good point, I'll create an issue when/if this PR is merged, just in case we come up with a better solution as part of this PR later. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
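The `CollectorOwner` idea under discussion — the owner creates collectors through the manager, remembers them, and caches the `reduce()` result, so callers never juggle the collector list or its generic type — can be sketched with a toy manager interface. The tiny `CollectorManager` here is a stand-in, not Lucene's real interface:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Main {
    interface CollectorManager<C, T> {
        C newCollector();
        T reduce(List<C> collectors);
    }

    static final class CollectorOwner<C, T> {
        private final CollectorManager<C, T> manager;
        // synchronizedList: newCollector() may be called from several slices
        private final List<C> collectors = Collections.synchronizedList(new ArrayList<>());
        private T result;

        CollectorOwner(CollectorManager<C, T> manager) {
            this.manager = manager;
        }

        C newCollector() {
            C c = manager.newCollector(); // only collectors created here are "owned"
            collectors.add(c);
            return c;
        }

        T getResult() {
            if (result == null) {
                result = manager.reduce(collectors); // cached after the first call
            }
            return result;
        }
    }

    // Toy demo: "collectors" are int[] counters that reduce() sums up.
    static int demo() {
        CollectorManager<int[], Integer> manager = new CollectorManager<>() {
            @Override public int[] newCollector() { return new int[1]; }
            @Override public Integer reduce(List<int[]> cs) {
                int sum = 0;
                for (int[] c : cs) sum += c[0];
                return sum;
            }
        };
        CollectorOwner<int[], Integer> owner = new CollectorOwner<>(manager);
        owner.newCollector()[0] = 2;
        owner.newCollector()[0] = 3;
        return owner.getResult();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 5
    }
}
```

The sketch also shows why the javadoc warns that collectors created via `CollectorManager#newCollector()` directly are ignored: only instances obtained through the owner end up in the list that `reduce` sees.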
Re: [PR] Compute facets while collecting [lucene]
stefanvodita commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711388569

```
## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/recorders/LongAggregationsFacetRecorder.java:
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.facet.recorders;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.List;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.internal.hppc.IntCursor;
+import org.apache.lucene.internal.hppc.IntObjectHashMap;
+import org.apache.lucene.sandbox.facet.abstracts.FacetLeafRecorder;
+import org.apache.lucene.sandbox.facet.abstracts.FacetRecorder;
+import org.apache.lucene.sandbox.facet.abstracts.FacetRollup;
+import org.apache.lucene.sandbox.facet.abstracts.OrdinalIterator;
+import org.apache.lucene.sandbox.facet.abstracts.Reducer;
+import org.apache.lucene.search.LongValues;
+import org.apache.lucene.search.LongValuesSource;
+
+/**
+ * {@link FacetRecorder} that computes multiple long aggregations per facet.
+ *
+ * TODO: [premature optimization idea] if instead of one array we keep aggregations in two
+ * LongVector (one for MAX aggregation and one for SUM) we can benefit from SIMD?
+ */
+public class LongAggregationsFacetRecorder implements FacetRecorder {
+
+  private IntObjectHashMap<long[]> perOrdinalValues;
+  private List<IntObjectHashMap<long[]>> leafValues;
+
+  private final LongValuesSource[] longValuesSources;
+  private final Reducer[] reducers;
+
+  /** Constructor. */
+  public LongAggregationsFacetRecorder(LongValuesSource[] longValuesSources, Reducer[] reducers) {
+    assert longValuesSources.length == reducers.length;
+    this.longValuesSources = longValuesSources;
+    this.reducers = reducers;
+    perOrdinalValues = new IntObjectHashMap<>();
+    leafValues = Collections.synchronizedList(new ArrayList<>());
+  }
+
+  @Override
+  public FacetLeafRecorder getLeafRecorder(LeafReaderContext context) throws IOException {
+    LongValues[] longValues = new LongValues[longValuesSources.length];
+    for (int i = 0; i < longValuesSources.length; i++) {
+      longValues[i] = longValuesSources[i].getValues(context, null);
+    }
+    IntObjectHashMap<long[]> valuesRecorder = new IntObjectHashMap<>();
+    leafValues.add(valuesRecorder);
+    return new LongAggregationsFacetLeafRecorder(longValues, reducers, valuesRecorder);
+  }
+
+  @Override
+  public OrdinalIterator recordedOrds() {
+    if (perOrdinalValues.isEmpty()) {
+      return null;
+    }
+    Iterator<IntCursor> ordIterator = perOrdinalValues.keys().iterator();
+    return new OrdinalIterator() {
+      @Override
+      public int nextOrd() throws IOException {
+        if (ordIterator.hasNext()) {
+          return ordIterator.next().value;
+        } else {
+          return NO_MORE_ORDS;
+        }
+      }
+    };
+  }
+
+  @Override
+  public boolean isEmpty() {
+    return perOrdinalValues.isEmpty();
+  }
+
+  @Override
+  public void reduce(FacetRollup facetRollup) throws IOException {
+    boolean firstElement = true;
+    for (IntObjectHashMap<long[]> leafValue : leafValues) {
+      if (firstElement) {
+        perOrdinalValues = leafValue;
+        firstElement = false;
+      } else {
+        for (IntObjectHashMap.IntObjectCursor<long[]> elem : leafValue) {
+          long[] vals = perOrdinalValues.get(elem.key);
+          if (vals == null) {
+            perOrdinalValues.put(elem.key, elem.value);
+          } else {
+            for (int i = 0; i < longValuesSources.length; i++) {
+              vals[i] = reducers[i].reduce(vals[i], elem.value[i]);
+            }
+          }
+        }
+      }
+    }
+    if (firstElement) {
+      // TODO: do we need empty map by default?
+      perOrdinalValues = new IntObjectHashMap<>();
+    }
+    // TODO: implement rollup
+    if (facetRollup != null
+        && facetRollup.getDimOrdsToRollup().nextOrd() != OrdinalIterator.NO_MORE_ORDS) {
+      throw new UnsupportedOperationException("Rollup is required, but not implemented");
+    }
+  }
+
+  public long getRecordedValue(int ord, int valuesId) {
+    if (valuesId < 0 || valuesId >= lon
```
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711547954

```
## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ordinals/OrdinalGetter.java:
@@ -0,0 +1,24 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.facet.ordinals;
+
+/** Interface to return an ordinal. */
+public interface OrdinalGetter {
```

Review Comment: I've made the change in this commit, please let me know what you think! https://github.com/apache/lucene/pull/13568/commits/9d158e225346d31cebb0e4eea32d5c425c155ca3
Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]
gsmiller commented on code in PR #13636: URL: https://github.com/apache/lucene/pull/13636#discussion_r1711646318

```
## lucene/core/src/java/org/apache/lucene/internal/vectorization/DefaultPostingDecodingUtil.java:
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.internal.vectorization;
+
+import java.io.IOException;
+import org.apache.lucene.store.IndexInput;
+
+final class DefaultPostingDecodingUtil extends PostingDecodingUtil {
+
+  protected final IndexInput in;
+
+  public DefaultPostingDecodingUtil(IndexInput in) {
+    this.in = in;
+  }
+
+  @Override
+  public void splitLongs(int count, long[] b, int bShift, long bMask, long[] c, long cMask)
+      throws IOException {
+    assert count <= 64;
+    in.readLongs(c, 0, count);
+    // The below loop is auto-vectorized
+    for (int i = 0; i < count; ++i) {
+      b[i] = (c[i] >>> bShift) & bMask;
+      c[i] &= cMask;
```

Review Comment: Thanks @jpountz I think the version you landed on makes sense.
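For readers following along, here is a small self-contained sketch of what `splitLongs` computes (the class name and the in-memory `src` array are stand-ins for the real `IndexInput`-backed implementation): each input long is split into a high part, shifted and masked into `b`, and a low part left masked in `c`:

```java
public class SplitLongsDemo {
  // Mirrors the scalar loop quoted above; src stands in for longs read
  // from an IndexInput via readLongs.
  static void splitLongs(
      long[] src, int count, long[] b, int bShift, long bMask, long[] c, long cMask) {
    System.arraycopy(src, 0, c, 0, count);
    // Simple shift-and-mask per element; a loop like this is a good
    // candidate for HotSpot auto-vectorization.
    for (int i = 0; i < count; ++i) {
      b[i] = (c[i] >>> bShift) & bMask;
      c[i] &= cMask;
    }
  }

  public static void main(String[] args) {
    long[] src = {0xABL, 0x5CL};
    long[] b = new long[2];
    long[] c = new long[2];
    splitLongs(src, 2, b, 4, 0xFL, c, 0xFL);
    // 0xAB splits into high nibble 0xA and low nibble 0xB.
    if (b[0] != 0xAL || c[0] != 0xBL) throw new AssertionError();
    if (b[1] != 0x5L || c[1] != 0xCL) throw new AssertionError();
    System.out.println("ok");
  }
}
```

With `bShift = 4` and both masks `0xF`, each long is cut into two 4-bit values; the postings format applies the same idea to its packed integer blocks.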
Re: [PR] These attributes are better for the final state(#13628) [lucene]
gsmiller commented on code in PR #13630: URL: https://github.com/apache/lucene/pull/13630#discussion_r1711652689

```
## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene99/Lucene99SkipWriter.java:
@@ -46,8 +46,8 @@
  * uptos(position, payload). 4. start offset.
  */
 public final class Lucene99SkipWriter extends MultiLevelSkipListWriter {
-  private int[] lastSkipDoc;
-  private long[] lastSkipDocPointer;
+  private final int[] lastSkipDoc;
+  private final long[] lastSkipDocPointer;
   private long[] lastSkipPosPointer;
   private long[] lastSkipPayPointer;
```

Review Comment: @mrhbj were you planning to push an update to this PR that also marks these fields `final`? Thanks!
Re: [PR] These attributes are better for the final state(#13628) [lucene]
mrhbj commented on PR #13630: URL: https://github.com/apache/lucene/pull/13630#issuecomment-2278184332 OK, I will do this.
[PR] These attributes are better for the final state(#13628)(#13630) [lucene]
mrhbj opened a new pull request, #13639: URL: https://github.com/apache/lucene/pull/13639 ### Description
Re: [PR] These attributes are better for the final state(#13628)(#13630) [lucene]
mrhbj commented on PR #13639: URL: https://github.com/apache/lucene/pull/13639#issuecomment-2278225033 @gsmiller thanks for your advice. I think you are right, so I've made the change. Could you please review it?
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711693361

```
## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/FacetFieldCollectorManager.java:
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.facet;
+
+import java.io.IOException;
+import java.util.Collection;
+import org.apache.lucene.sandbox.facet.cutters.FacetCutter;
+import org.apache.lucene.sandbox.facet.misc.FacetRollup;
+import org.apache.lucene.sandbox.facet.recorders.FacetRecorder;
+import org.apache.lucene.search.CollectorManager;
+
+/**
+ * Collector manager for {@link FacetFieldCollector}. Returns the same extension of {@link
+ * FacetRecorder} that was used to collect results.
+ */
+public final class FacetFieldCollectorManager<V extends FacetRecorder>
+    implements CollectorManager<FacetFieldCollector, V> {
+
+  private final FacetCutter facetCutter;
+  private final V facetRecorder;
+  private final FacetRollup facetRollup;
+
+  /** Create collector for a cutter + recorder pair */
+  public FacetFieldCollectorManager(
```

Review Comment: I've pushed a commit https://github.com/apache/lucene/pull/13568/commits/e9af4f538a7c58f1dc6712d14449147d913fb829 to move `FacetRollup`'s methods to `FacetCutter` and add an expert-mode constructor to `TaxonomyFacetsCutter` to force-disable rollup when users want it. Please let me know what you think!
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on PR #13568: URL: https://github.com/apache/lucene/pull/13568#issuecomment-2278257636 @stefanvodita , @gsmiller , @mikemccand just wanted to let you know that I think I addressed all existing comments, and I marked as resolved the ones that don't seem to need follow-ups. Thank you again for reviewing the PR!
Re: [PR] Get better cost estimate on MultiTermQuery over few terms [lucene]
gsmiller merged PR #13201: URL: https://github.com/apache/lucene/pull/13201
Re: [I] Improve AbstractMultiTermQueryConstantScoreWrapper#RewritingWeight ScorerSupplier cost estimation [lucene]
gsmiller closed issue #13029: Improve AbstractMultiTermQueryConstantScoreWrapper#RewritingWeight ScorerSupplier cost estimation URL: https://github.com/apache/lucene/issues/13029
Re: [PR] Get better cost estimate on MultiTermQuery over few terms [lucene]
gsmiller commented on PR #13201: URL: https://github.com/apache/lucene/pull/13201#issuecomment-2278379745 @msfroh hope you don't mind, but since my PR feedback was pretty minor, I went ahead and made those small changes on your branch and merged. I'll work on getting this backported shortly (a few conflicts to resolve). If you spot anything in the change you disagree with, please let me know and we can work through that. Thanks again!
Re: [I] Deprecate `COSINE` before Lucene 10 release [lucene]
benwtrent closed issue #13281: Deprecate `COSINE` before Lucene 10 release URL: https://github.com/apache/lucene/issues/13281
Re: [PR] Get better cost estimate on MultiTermQuery over few terms [lucene]
msfroh commented on PR #13201: URL: https://github.com/apache/lucene/pull/13201#issuecomment-2278441860 Thanks, Greg! I can create a follow-up PR with your suggestion on delegating to the rewritten `BooleanQuery` for the cost estimate. I think if we go down that path (basically doing the rewrite early), we might be able to clean up the code a bit. I think `WeightOrDocIdSetIterator` might become unnecessary (since we'll know ahead of time which one we'll have).
Re: [I] Lucene99FlatVectorsReader.getFloatVectorValues(): NPE: Cannot read field "vectorEncoding" because "fieldEntry" is null [lucene]
benwtrent commented on issue #13626: URL: https://github.com/apache/lucene/issues/13626#issuecomment-2278487437 OK, I was able to replicate with the following test:

```
public void testTryToThrowNPE() throws Exception {
  try (var dir = newDirectory()) {
    IndexWriterConfig iwc = new IndexWriterConfig();
    iwc.setMergePolicy(new ForceMergePolicy(iwc.getMergePolicy()))
        .setCodec(new Lucene912VectorCodec());
    try (var writer = new IndexWriter(dir, iwc)) {
      for (int i = 0; i < 10; i++) {
        var doc = new Document();
        if (random().nextBoolean()) {
          doc.add(new KnnFloatVectorField("field", new float[] {1, 2, 3}));
        }
        writer.addDocument(doc);
        if (random().nextBoolean()) {
          writer.commit();
        }
      }
      for (int i = 0; i < 10; i++) {
        var doc = new Document();
        if (random().nextBoolean()) {
          // add a vector but a different field
          doc.add(new KnnFloatVectorField("otherVector", new float[] {1, 2, 3}));
        }
        writer.addDocument(doc);
        if (random().nextBoolean()) {
          writer.commit();
        }
      }
      writer.forceMerge(1);
    }
  }
}
```

The error goes away if you use the perField codec like so:

```
public static class Lucene99VectorCodec extends FilterCodec {
  public Lucene99VectorCodec() {
    super("Lucene99VectorCodec", new Lucene99Codec());
  }

  @Override
  public KnnVectorsFormat knnVectorsFormat() {
    return new PerFieldKnnVectorsFormat() {
      @Override
      public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
        return new Lucene99HnswVectorsFormat(16, 250);
      }
    };
  }
}
```

I am not sure it's valid to have more than one kNN field and not use the per-field format. The logic seems to make that assumption. @msokolov ^ What do you think of this bug? I am not sure what our behavior should be. We can easily check for a `null` field and return `null`.
Re: [I] Lucene99FlatVectorsReader.getFloatVectorValues(): NPE: Cannot read field "vectorEncoding" because "fieldEntry" is null [lucene]
benwtrent commented on issue #13626: URL: https://github.com/apache/lucene/issues/13626#issuecomment-2278489924 Looking at what PointValues does:

```
FieldInfo fieldInfo = readState.fieldInfos.fieldInfo(fieldName);
if (fieldInfo == null) {
  throw new IllegalArgumentException("field=\"" + fieldName + "\" is unrecognized");
}
```

Maybe kNN should do the same thing. At least it's a better error than an NPE.
[I] testMergeStability failing for Knn formats [lucene]
benwtrent opened a new issue, #13640: URL: https://github.com/apache/lucene/issues/13640

### Description

All KNN formats are periodically failing `testMergeStability`. I have verified it's due to https://github.com/apache/lucene/pull/13566

The stability failure is due to a different size in the `vex` (e.g. vector graph connections).

### Gradle command to reproduce

```
./gradlew test --tests TestLucene99HnswScalarQuantizedVectorsFormat.testMergeStability -Dtests.seed=84298DFFA7C134B7 -Dtests.locale=kn -Dtests.timezone=Hongko -Dtests.asserts=true -Dtests.file.encoding=UTF-8
```
Re: [I] testMergeStability failing for Knn formats [lucene]
benwtrent commented on issue #13640: URL: https://github.com/apache/lucene/issues/13640#issuecomment-2278701946 @msokolov ^ I haven't been able to look into fixing it yet. Just now noticed it.
[PR] Unify how missing field entries are handle in knn formats [lucene]
benwtrent opened a new pull request, #13641: URL: https://github.com/apache/lucene/pull/13641 It is possible to inappropriately use the knn formats and attempt to merge segments with mismatched field names. None of the formats actually check for `null`, they just all assume that the field entry exists. I opted to throw an IllegalArgumentException here to keep with the current behavior, but prevent an unexpected NPE. Additionally, this unifies the scalar quantized formats, these checked for null values and returned null. But we should behave uniformly across all formats. closes: https://github.com/apache/lucene/issues/13626
Re: [PR] Delegating the matches in PointRangeQuery weight to relate method [lucene]
gsmiller commented on PR #13599: URL: https://github.com/apache/lucene/pull/13599#issuecomment-2278737360 @jainankitk thanks for the iterations! I'm fine with making this change as you currently have it. I'll get it merged. Thanks!
Re: [PR] Add AbstractKnnVectorQuery.seed for seeded HNSW [lucene]
benwtrent commented on code in PR #13635: URL: https://github.com/apache/lucene/pull/13635#discussion_r1712183885

```
## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java:
@@ -70,6 +72,43 @@ public static void search(
     search(scorer, knnCollector, graph, graphSearcher, acceptOrds);
   }

+  /**
+   * Searches the HNSW graph for the nearest neighbors of a query vector, starting from the
+   * provided entry points.
+   *
+   * @param scorer the scorer to compare the query with the nodes
+   * @param knnCollector a collector of top knn results to be returned
+   * @param graph the graph values. May represent the entire graph, or a level in a hierarchical
+   *     graph.
+   * @param acceptOrds {@link Bits} that represents the allowed document ordinals to match, or
+   *     {@code null} if they are all allowed to match.
+   * @param entryPointOrds the entry points for search.
+   */
+  public static void search(
+      RandomVectorScorer scorer,
+      KnnCollector knnCollector,
+      HnswGraph graph,
+      Bits acceptOrds,
+      DocIdSetIterator entryPointOrds)
```

Review Comment: @seanmacavaney looking at the code, I think you can do what you want if you get access to the underlying `SparseOffHeapVectorValues*` implementation. Then you can use the `IndexedDISI` to advance to each matched doc & use `disi.index()` to get the corresponding ordinal. Maybe `FlatVectorsReader` can have a method called `DocIdSetIterator getVectorOrdinals(String field, DocIdSetIterator docIdIterator)`? Or `RandomAccessVectorValues` can have such a method. Both of these interfaces are experimental, so changing them is OK.
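To illustrate the `disi.index()` idea from the comment above with a toy model (this is not Lucene code — the sorted array stands in for the doc list that `IndexedDISI` iterates): for a sparse vector field, a vector's ordinal is simply the position of its document in the sorted set of documents that have vectors:

```java
import java.util.Arrays;

public class OrdinalLookupDemo {
  // docsWithVector is the ascending list of docIDs that have a vector.
  // Returns the vector ordinal for docId, or -1 if the doc has no vector —
  // the same doc-to-ordinal mapping IndexedDISI#index() exposes after
  // advancing to a doc.
  static int ordinalFor(int[] docsWithVector, int docId) {
    int idx = Arrays.binarySearch(docsWithVector, docId);
    return idx >= 0 ? idx : -1;
  }

  public static void main(String[] args) {
    int[] docsWithVector = {3, 7, 42, 100};
    // doc 42 is the third doc with a vector, so its ordinal is 2.
    if (ordinalFor(docsWithVector, 42) != 2) throw new AssertionError();
    // doc 5 has no vector.
    if (ordinalFor(docsWithVector, 5) != -1) throw new AssertionError();
    System.out.println("ok");
  }
}
```

The real implementation advances the DISI through the matched docs sequentially rather than binary-searching per doc, but the doc-to-ordinal relationship is the same.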
Re: [PR] Delegating the matches in PointRangeQuery weight to relate method [lucene]
gsmiller merged PR #13599: URL: https://github.com/apache/lucene/pull/13599
Re: [I] Remove redundant code in PointRangeQuery Weight [lucene]
gsmiller closed issue #13598: Remove redundant code in PointRangeQuery Weight URL: https://github.com/apache/lucene/issues/13598
Re: [I] TermsQuery as MultiTermQuery can dramatically overestimate its cost [lucene]
gsmiller commented on issue #12483: URL: https://github.com/apache/lucene/issues/12483#issuecomment-2278767613 Hey @romseygeek - there's been a recent improvement to `AbstractMultiTermQueryConstantScoreWrapper` that may help the use-case you've described here (#13201). I'm curious if this solves the performance regression you're seeing, or if you still have issues. This change will only help when there are 16 or fewer terms in the terms query, so if you have more than that, I suspect you won't see any improvements. The change was just merged onto `main` and `branch_9x` today, so you'd have to cherry-pick it down (or wait for 9.12).
Re: [PR] Compute facets while collecting [lucene]
gsmiller commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1712226750

```
## lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java:
@@ -45,58 +45,26 @@ class DrillSidewaysQuery extends Query {

   final Query baseQuery;

-  final FacetsCollectorManager drillDownCollectorManager;
-  final FacetsCollectorManager[] drillSidewaysCollectorManagers;
-  final List<FacetsCollector> managedDrillDownCollectors;
-  final List<FacetsCollector> managedDrillSidewaysCollectors;
+  final CollectorOwner drillDownCollectorOwner;
+  final List<CollectorOwner> drillSidewaysCollectorOwners;

   final Query[] drillDownQueries;

   final boolean scoreSubDocsAtOnce;

   /**
    * Construct a new {@code DrillSidewaysQuery} that will create new {@link FacetsCollector}s for
-   * each {@link LeafReaderContext} using the provided {@link FacetsCollectorManager}s. The caller
-   * can access the created {@link FacetsCollector}s through {@link #managedDrillDownCollectors} and
-   * {@link #managedDrillSidewaysCollectors}.
+   * each {@link LeafReaderContext} using the provided {@link FacetsCollectorManager}s.
    */
   DrillSidewaysQuery(
       Query baseQuery,
-      FacetsCollectorManager drillDownCollectorManager,
-      FacetsCollectorManager[] drillSidewaysCollectorManagers,
-      Query[] drillDownQueries,
-      boolean scoreSubDocsAtOnce) {
-    // Note that the "managed" facet collector lists are synchronized here since bulkScorer()
-    // can be invoked concurrently and needs to remain thread-safe. We're OK with synchronizing
-    // on the whole list as contention is expected to remain very low:
-    this(
-        baseQuery,
-        drillDownCollectorManager,
-        drillSidewaysCollectorManagers,
-        Collections.synchronizedList(new ArrayList<>()),
-        Collections.synchronizedList(new ArrayList<>()),
-        drillDownQueries,
-        scoreSubDocsAtOnce);
-  }
-
-  /**
-   * Needed for {@link Query#rewrite(IndexSearcher)}. Ensures the same "managed" lists get used
-   * since {@link DrillSideways} accesses references to these through the original {@code
-   * DrillSidewaysQuery}.
-   */
-  private DrillSidewaysQuery(
-      Query baseQuery,
-      FacetsCollectorManager drillDownCollectorManager,
-      FacetsCollectorManager[] drillSidewaysCollectorManagers,
-      List<FacetsCollector> managedDrillDownCollectors,
-      List<FacetsCollector> managedDrillSidewaysCollectors,
+      CollectorOwner drillDownCollectorOwner,
```

Review Comment: +1 I don't think we should be concerned with breaking the API for `DrillSidewaysQuery` since it's pkg-private. Does that sound right @mikemccand ?
Re: [PR] Compute facets while collecting [lucene]
gsmiller commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1712245084 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/cutters/ranges/LongRangeFacetCutter.java: ## @@ -0,0 +1,413 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.cutters.ranges; + +import java.io.IOException; +import java.util.Arrays; +import java.util.Comparator; +import java.util.List; +import java.util.Objects; +import org.apache.lucene.facet.MultiLongValues; +import org.apache.lucene.facet.MultiLongValuesSource; +import org.apache.lucene.facet.range.LongRange; +import org.apache.lucene.sandbox.facet.cutters.FacetCutter; +import org.apache.lucene.sandbox.facet.cutters.LeafFacetCutter; +import org.apache.lucene.search.LongValues; +import org.apache.lucene.search.LongValuesSource; + +/** {@link RangeFacetCutter} for ranges of long values. It's based on LongRangeCounter class. */ +public abstract class LongRangeFacetCutter extends RangeFacetCutter { + + MultiLongValuesSource valuesSource; + LongValuesSource singleValues; // TODO: refactor - weird that we have both multi and single here. 
+ LongRangeAndPos[] sortedRanges; + + int requestedRangeCount; + + List elementaryIntervals; + + /** elementary interval boundaries used for efficient counting (bsearch to find interval) */ + long[] boundaries; + + int[] pos; + + // Default interval position, when elementary interval is mapped to this interval + // it is skipped. + static final int SKIP_INTERVAL_POSITION = -1; + + /** Create {@link FacetCutter} for provided value source and long ranges. */ + public static LongRangeFacetCutter create( + String field, + MultiLongValuesSource longValuesSource, + LongValuesSource singleLongValuesSource, + LongRange[] longRanges) { +if (areOverlappingRanges(longRanges)) { + return new OverlappingLongRangeFacetCutter( + field, longValuesSource, singleLongValuesSource, longRanges); +} +return new NonOverlappingLongRangeFacetCutter( +field, longValuesSource, singleLongValuesSource, longRanges); + } + + public static LongRangeFacetCutter create( + String field, MultiLongValuesSource longValuesSource, LongRange[] longRanges) { +return create(field, longValuesSource, null, longRanges); + } + + // caller handles conversion of Doubles and DoubleRange to Long and LongRange + // ranges need not be sorted + LongRangeFacetCutter( + String field, + MultiLongValuesSource longValuesSource, + LongValuesSource singleLongValuesSource, + LongRange[] longRanges) { +super(field); +valuesSource = longValuesSource; +if (singleLongValuesSource != null) { + singleValues = singleLongValuesSource; +} else { + singleValues = MultiLongValuesSource.unwrapSingleton(valuesSource); +} + +sortedRanges = new LongRangeAndPos[longRanges.length]; +requestedRangeCount = longRanges.length; + +for (int i = 0; i < longRanges.length; i++) { + sortedRanges[i] = new LongRangeAndPos(longRanges[i], i); +} + +Arrays.sort(this.sortedRanges, Comparator.comparingLong(r -> r.range.min)); +elementaryIntervals = buildElementaryIntervals(); + +// Keep track of elementary interval boundary ends (for binary search) along with 
the requested +// range they map back to (and -1 when they map to a "gap" range in case of ExclusiveRanges): +boundaries = new long[elementaryIntervals.size()]; +pos = new int[elementaryIntervals.size()]; +Arrays.fill(pos, SKIP_INTERVAL_POSITION); +int currRange = 0; +for (int i = 0; i < boundaries.length; i++) { + boundaries[i] = elementaryIntervals.get(i).end; + if (currRange < sortedRanges.length) { +LongRangeAndPos curr = sortedRanges[currRange]; +if (boundaries[i] == curr.range.max) { + pos[i] = curr.pos; + currRange++; +} + } +} + } + + abstract List buildElementaryIntervals(); + + private static boolean areOverlappingRanges(LongRange[] ranges) { +if (ranges.length == 0) { + return false; +} + +// Copy before sorting so we don't mess with the caller's
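The `boundaries`/`pos` bookkeeping quoted above can be illustrated with a small stand-alone sketch (hypothetical class and constant values, not the Lucene API): for non-overlapping ranges, each elementary interval's upper end goes into a sorted `boundaries` array, and a parallel `pos` array maps that interval back to the requested range it belongs to (or `SKIP_INTERVAL_POSITION` = -1 for "gap" intervals), so classifying a value is a single binary search:

```java
import java.util.Arrays;

public class ElementaryIntervalDemo {
    // Sorted upper bounds of the elementary intervals. For requested ranges
    // [0..10] and [20..30] the intervals are:
    //   (-inf..-1], [0..10], [11..19], [20..30], [31..+inf)
    static final long[] BOUNDARIES = {-1L, 10L, 19L, 30L, Long.MAX_VALUE};
    // Requested-range index per interval; -1 marks a "gap" interval.
    static final int[] POS = {-1, 0, -1, 1, -1};

    /** Returns the requested range index a value falls into, or -1 if none. */
    static int rangeFor(long value) {
        // Find the first boundary >= value: the lowest elementary interval
        // whose end does not lie below the value.
        int idx = Arrays.binarySearch(BOUNDARIES, value);
        if (idx < 0) {
            idx = -idx - 1; // binarySearch returns -(insertion point) - 1
        }
        return POS[idx];
    }

    public static void main(String[] args) {
        if (rangeFor(5L) != 0 || rangeFor(15L) != -1
                || rangeFor(20L) != 1 || rangeFor(100L) != -1) {
            throw new AssertionError("unexpected range mapping");
        }
        System.out.println("ok");
    }
}
```

With overlapping ranges (the `OverlappingLongRangeFacetCutter` case) a single `pos` entry per interval is no longer enough, which is why the PR splits the two cases into separate cutters.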
Re: [PR] Compute facets while collecting [lucene]
gsmiller commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1712245609 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/cutters/ranges/DoubleRangeFacetCutter.java: ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.cutters.ranges; + +import java.io.IOException; +import org.apache.lucene.facet.MultiDoubleValuesSource; +import org.apache.lucene.facet.MultiLongValuesSource; +import org.apache.lucene.facet.range.DoubleRange; +import org.apache.lucene.facet.range.LongRange; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.sandbox.facet.cutters.LeafFacetCutter; +import org.apache.lucene.search.DoubleValuesSource; +import org.apache.lucene.search.LongValuesSource; +import org.apache.lucene.util.NumericUtils; + +/** + * {@link RangeFacetCutter} for ranges of double values. + * + * Based on {@link org.apache.lucene.facet.range.DoubleRangeFacetCounts}, this class translates + * double ranges to long ranges using {@link NumericUtils#doubleToSortableLong} and delegates + * faceting work to a {@link LongRangeFacetCutter}. 
+ */ +public class DoubleRangeFacetCutter extends RangeFacetCutter { + + LongRangeFacetCutter longRangeFacetCutter; + + MultiDoubleValuesSource multiDoubleValuesSource; + DoubleValuesSource singleDoubleValuesSource; + DoubleRange[] doubleRanges; + + MultiLongValuesSource multiLongValuesSource; + LongValuesSource singleLongValuesSource; + + LongRange[] longRanges; + + /** Constructor. */ + public DoubleRangeFacetCutter( + String field, MultiDoubleValuesSource valuesSource, DoubleRange[] doubleRanges) { +super(field); +this.multiDoubleValuesSource = valuesSource; +this.singleDoubleValuesSource = MultiDoubleValuesSource.unwrapSingleton(valuesSource); +this.doubleRanges = doubleRanges; +if (singleDoubleValuesSource != null) { // TODO: ugly! + this.singleLongValuesSource = singleDoubleValuesSource.toSortableLongDoubleValuesSource(); +} else { + this.multiLongValuesSource = multiDoubleValuesSource.toSortableMultiLongValuesSource(); +} +this.longRanges = mapDoubleRangesToSortableLong(doubleRanges); +this.longRangeFacetCutter = +LongRangeFacetCutter.create( Review Comment: I'm good with that. Thanks!
Re: [PR] Compute facets while collecting [lucene]
gsmiller commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1712247433 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/FacetFieldCollectorManager.java: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet; + +import java.io.IOException; +import java.util.Collection; +import org.apache.lucene.sandbox.facet.cutters.FacetCutter; +import org.apache.lucene.sandbox.facet.misc.FacetRollup; +import org.apache.lucene.sandbox.facet.recorders.FacetRecorder; +import org.apache.lucene.search.CollectorManager; + +/** + * Collector manager for {@link FacetFieldCollector}. Returns the same extension of {@link + * FacetRecorder} that was used to collect results. + */ +public final class FacetFieldCollectorManager<V extends FacetRecorder> +implements CollectorManager<FacetFieldCollector, V> { + + private final FacetCutter facetCutter; + private final V facetRecorder; + private final FacetRollup facetRollup; + + /** Create collector for a cutter + recorder pair */ + public FacetFieldCollectorManager( Review Comment: This looks good. Thanks!
Re: [PR] Compute facets while collecting [lucene]
gsmiller commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1712250405 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ordinals/CandidateSetOrdinalIterator.java: ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.ordinals; + +import java.io.IOException; +import org.apache.lucene.internal.hppc.IntHashSet; + +/** + * {@link OrdinalIterator} that filters out ordinals from delegate if they are not in the candidate Review Comment: Looks good now. Thanks! And thanks for leaving the TODO to explore leading the iteration with the recorder ordinals in certain cases (but agreed it's probably not the common case, so I think you've optimized in the right direction for now).
Re: [PR] Compute facets while collecting [lucene]
gsmiller commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1712251401 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/recorders/FacetRecorder.java: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.recorders; + +import java.io.IOException; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.sandbox.facet.cutters.LeafFacetCutter; +import org.apache.lucene.sandbox.facet.misc.FacetRollup; +import org.apache.lucene.sandbox.facet.ordinals.OrdinalIterator; + +/** + * Record data for each facet of each doc. + * + * TODO: In the next iteration we can add an extra layer between FacetRecorder and + * LeafFacetRecorder, e.g. SliceFacetRecorder. The new layer will be created per {@link + * org.apache.lucene.search.Collector}, which means that collecting of multiple leafs (segments) + * within a slice is sequential and can be done to a single non-sync map to improve performance and + * reduce memory consumption. We already tried that, but didn't see any performance improvement. 
+ * Given that it also makes lazy leaf recorder init in {@link + * org.apache.lucene.sandbox.facet.FacetFieldCollector} trickier, it was decided to roll back the + * initial attempt and try again later, in the next iteration. + */ +public interface FacetRecorder { + /** Get leaf recorder. */ + LeafFacetRecorder getLeafRecorder(LeafReaderContext context) throws IOException; + + /** Return next collected ordinal, or {@link LeafFacetCutter#NO_MORE_ORDS} */ + OrdinalIterator recordedOrds(); + + /** True if there are no records */ + boolean isEmpty(); Review Comment: I like it. Thanks!
Re: [PR] Compute facets while collecting [lucene]
gsmiller commented on PR #13568: URL: https://github.com/apache/lucene/pull/13568#issuecomment-2278854633 @epotyom I looked over the recent changes and went through the unresolved comments. I don't think there's anything blocking at this point from my point of view. @mikemccand do you have any blocking concerns on this? If not, I will start working on the merge Monday morning. Thanks again!
Re: [PR] expand TestSegmentToThreadMapping coverage w.r.t. (excess) documents per slice [lucene]
github-actions[bot] commented on PR #13508: URL: https://github.com/apache/lucene/pull/13508#issuecomment-2278903000 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!
Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]
jpountz commented on code in PR #13636: URL: https://github.com/apache/lucene/pull/13636#discussion_r1712554332 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentPostingDecodingUtil.java: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.internal.vectorization; + +import java.io.IOException; +import java.lang.foreign.MemorySegment; +import java.nio.ByteOrder; +import jdk.incubator.vector.LongVector; +import jdk.incubator.vector.VectorOperators; +import jdk.incubator.vector.VectorSpecies; +import org.apache.lucene.store.IndexInput; + +final class MemorySegmentPostingDecodingUtil extends PostingDecodingUtil { + + private static final VectorSpecies<Long> LONG_SPECIES = + PanamaVectorConstants.PRERERRED_LONG_SPECIES; + + private final IndexInput in; + private final MemorySegment memorySegment; + + MemorySegmentPostingDecodingUtil(IndexInput in, MemorySegment memorySegment) { +this.in = in; +this.memorySegment = memorySegment; + } + + @Override + public void readLongs(int count, long[] b) throws IOException { +if (count < LONG_SPECIES.length()) { + in.readLongs(b, 0, count); +} else { + long offset = in.getFilePointer(); + long endOffset = offset + count * Long.BYTES; + int loopBound = LONG_SPECIES.loopBound(count - 1); + for (int i = 0; + i < loopBound; + i += LONG_SPECIES.length(), offset += LONG_SPECIES.length() * Long.BYTES) { +LongVector.fromMemorySegment(LONG_SPECIES, memorySegment, offset, ByteOrder.LITTLE_ENDIAN) Review Comment: This method is only used when decoding 8 or 16 bits per value. I am not sure why, but the micro benchmark sees a small but consistent speedup by copying memory this way rather than using readLongs. If you don't like it, I can remove it.
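One detail in the quoted `readLongs` is easy to miss: the loop bound is computed from `count - 1` rather than `count`, which rounds down to a multiple of the species length in a way that guarantees the scalar tail after the vector loop always handles at least one element. The structure can be mimicked in plain Java (a sketch with hypothetical names: `SPECIES_LENGTH` stands in for `LONG_SPECIES.length()`, and `System.arraycopy` stands in for the `LongVector.fromMemorySegment(...).intoArray(...)` step):

```java
public class LoopBoundDemo {
    // Stand-in for LONG_SPECIES.length(); species lengths are powers of two.
    static final int SPECIES_LENGTH = 4;

    // Mirrors VectorSpecies#loopBound: the largest multiple of the
    // species length that is <= n.
    static int loopBound(int n) {
        return n & -SPECIES_LENGTH;
    }

    /** Copies count longs from src into a fresh array, vector-loop style. */
    static long[] readLongs(long[] src, int count) {
        long[] b = new long[count];
        // Rounding count - 1 (not count) down guarantees the tail below
        // always copies at least one element, mirroring the PR's readLongs.
        int bound = loopBound(count - 1);
        int i = 0;
        for (; i < bound; i += SPECIES_LENGTH) {
            // Stand-in for LongVector.fromMemorySegment(...).intoArray(b, i):
            System.arraycopy(src, i, b, i, SPECIES_LENGTH);
        }
        // Scalar tail: between 1 and SPECIES_LENGTH elements remain.
        System.arraycopy(src, i, b, i, count - i);
        return b;
    }

    public static void main(String[] args) {
        long[] src = {1, 2, 3, 4, 5, 6, 7};
        long[] out = readLongs(src, 7);
        for (int i = 0; i < 7; i++) {
            if (out[i] != src[i]) throw new AssertionError("mismatch at " + i);
        }
        System.out.println("ok");
    }
}
```

For example, with `count = 7` and a species length of 4, `loopBound(6)` is 4, so the vector loop handles elements 0..3 and the scalar tail handles 4..6; with `count = 8`, `loopBound(7)` is still 4, leaving a full-width tail instead of an empty one.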