Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]
uschindler commented on PR #13636: URL: https://github.com/apache/lucene/pull/13636#issuecomment-2277393866 I cleaned up the constants a bit more; there are still some duplicates in VectorUtilSupport, but I'd change this in the follow-up PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] `gradlew eclipse` no longer works [lucene]
uschindler commented on issue #13638: URL: https://github.com/apache/lucene/issues/13638#issuecomment-2277400477 There's also a minor issue: since this commit, whenever I commit something to the repo, Git complains about the line endings of `versions.toml`: > warning: in the working copy of 'versions.toml', CRLF will be replaced by LF the next time Git touches it This happens every time I execute Gradle.
Re: [PR] Knn(Float-->Byte)VectorField javadocs update in KnnByteVectorQuery [lucene]
cpoerschke merged PR #13637: URL: https://github.com/apache/lucene/pull/13637
Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]
jpountz commented on code in PR #13636: URL: https://github.com/apache/lucene/pull/13636#discussion_r1711141270 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentPostingDecodingUtil.java: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.internal.vectorization; + +import java.io.IOException; +import java.lang.foreign.MemorySegment; +import java.nio.ByteOrder; +import jdk.incubator.vector.LongVector; +import jdk.incubator.vector.VectorOperators; +import jdk.incubator.vector.VectorSpecies; +import org.apache.lucene.store.IndexInput; + +final class MemorySegmentPostingDecodingUtil extends PostingDecodingUtil { + + private static final VectorSpecies LONG_SPECIES = + PanamaVectorConstants.PRERERRED_LONG_SPECIES; + + private final IndexInput in; + private final MemorySegment memorySegment; + + MemorySegmentPostingDecodingUtil(IndexInput in, MemorySegment memorySegment) { +this.in = in; +this.memorySegment = memorySegment; + } + + @Override + public void readLongs(int count, long[] b) throws IOException { +if (count < LONG_SPECIES.length()) { + in.readLongs(b, 0, count); +} else { + long offset = in.getFilePointer(); + long endOffset = offset + count * Long.BYTES; + int loopBound = LONG_SPECIES.loopBound(count - 1); + for (int i = 0; + i < loopBound; + i += LONG_SPECIES.length(), offset += LONG_SPECIES.length() * Long.BYTES) { +LongVector.fromMemorySegment(LONG_SPECIES, memorySegment, offset, ByteOrder.LITTLE_ENDIAN) +.intoArray(b, i); + } + + // Handle the tail by reading a vector that is aligned with with `count` on the right side. + int i = count - LONG_SPECIES.length(); + offset = endOffset - LONG_SPECIES.length() * Long.BYTES; + LongVector.fromMemorySegment(LONG_SPECIES, memorySegment, offset, ByteOrder.LITTLE_ENDIAN) + .intoArray(b, i); + + in.seek(endOffset); +} + } + + @Override + public void splitLongs(int count, long[] b, int bShift, long bMask, long[] c, long cMask) + throws IOException { +if (count < LONG_SPECIES.length()) { + // Not enough data to vectorize without going out-of-bounds. In practice, this branch is never + // used if the bit width is 256, and is used for 2 and 3 bits per value if the bit width is + // 512. 
+ in.readLongs(c, 0, count); + for (int i = 0; i < count; ++i) { +b[i] = (c[i] >>> bShift) & bMask; +c[i] &= cMask; + } +} else { + long offset = in.getFilePointer(); + long endOffset = offset + count * Long.BYTES; + int loopBound = LONG_SPECIES.loopBound(count - 1); Review Comment: This is to make sure that the loop doesn't handle the rightmost vector. We could do `loopBound(N)` but then we'd need to protect the code after the loop with `if (loopBound < N)` to avoid doing redundant work.
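The tail-handling strategy discussed above can be illustrated with a scalar sketch (hypothetical names: `W` stands in for `LONG_SPECIES.length()`, and `System.arraycopy` stands in for a `W`-wide vector load). Using `loopBound(count - 1)` keeps the main loop away from the rightmost `W` elements, which are then read with one final load aligned to the right edge; the overlap may re-copy a few already-written longs, which is harmless and avoids a branch after the loop:

```java
public class TailCopy {
    static final int W = 4; // hypothetical vector width in longs

    // Assumes count >= W, mirroring the scalar fallback the real code takes for small counts.
    static void readLongs(long[] src, int srcOff, long[] dst, int count) {
        // Largest multiple of W that is <= count - 1, so the main loop
        // never covers the rightmost W elements:
        int loopBound = ((count - 1) / W) * W;
        for (int i = 0; i < loopBound; i += W) {
            System.arraycopy(src, srcOff + i, dst, i, W); // stands in for a W-wide vector load
        }
        // Tail: one final W-wide copy aligned with `count` on the right side;
        // it may overlap elements already copied, which is harmless.
        System.arraycopy(src, srcOff + count - W, dst, count - W, W);
    }
}
```

Note that with `loopBound(count)` and `count` a multiple of `W`, the loop would already cover everything and the tail copy would be pure redundant work, which is exactly what the `count - 1` bound avoids.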
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711163320 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/LongRangeFacetCutter.java: ## @@ -0,0 +1,431 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.ranges; + +import java.io.IOException; +import java.util.Arrays; +import java.util.Comparator; +import java.util.List; +import java.util.Objects; +import org.apache.lucene.facet.MultiLongValues; +import org.apache.lucene.facet.MultiLongValuesSource; +import org.apache.lucene.facet.range.LongRange; +import org.apache.lucene.sandbox.facet.abstracts.FacetLeafCutter; +import org.apache.lucene.search.LongValues; +import org.apache.lucene.search.LongValuesSource; + +/** {@link RangeFacetCutter} for ranges of long values. * */ +public abstract class LongRangeFacetCutter extends RangeFacetCutter { + + MultiLongValuesSource valuesSource; + LongValuesSource singleValues; // TODO: refactor - weird that we have both multi and single here. 
+ LongRangeAndPos[] sortedRanges; + + int requestedRangeCount; + + List elementaryIntervals; + + long[] boundaries; + int[] pos; + + // Default interval position, when elementary interval is mapped to this interval + // it is skipped. + static final int SKIP_INTERVAL_POSITION = -1; + + // Temporary callers should ensure that passed in single values sources are wrapped + /** add doc * */ + public static LongRangeFacetCutter create( + String field, + MultiLongValuesSource longValuesSource, + LongValuesSource singleLongValuesSource, + LongRange[] longRanges) { +if (areOverlappingRanges(longRanges)) { + return new OverlappingLongRangeFacetCutter( + field, longValuesSource, singleLongValuesSource, longRanges); +} +return new ExclusiveLongRangeFacetCutter( +field, longValuesSource, singleLongValuesSource, longRanges); + } + + public static LongRangeFacetCutter create( + String field, MultiLongValuesSource longValuesSource, LongRange[] longRanges) { +return create(field, longValuesSource, null, longRanges); + } + + // caller handles conversion of Doubles and DoubleRange to Long and LongRange + // ranges need not be sorted + LongRangeFacetCutter( + String field, + MultiLongValuesSource longValuesSource, + LongValuesSource singleLongValuesSource, + LongRange[] longRanges) { +super(field); +valuesSource = longValuesSource; +if (singleLongValuesSource != null) { + singleValues = singleLongValuesSource; +} else { + singleValues = MultiLongValuesSource.unwrapSingleton(valuesSource); +} + +sortedRanges = new LongRangeAndPos[longRanges.length]; +requestedRangeCount = longRanges.length; + +for (int i = 0; i < longRanges.length; i++) { + sortedRanges[i] = new LongRangeAndPos(longRanges[i], i); +} + +Arrays.sort(this.sortedRanges, Comparator.comparingLong(r -> r.range.min)); +elementaryIntervals = buildElementaryIntervals(); Review Comment: I've added javadoc to `buildElementaryIntervals` method -- This is an automated message from the Apache Git Service. 
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711174092 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/OverlappingLongRangeFacetCutter.java: ## @@ -0,0 +1,272 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.ranges; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.lucene.facet.MultiLongValues; +import org.apache.lucene.facet.MultiLongValuesSource; +import org.apache.lucene.facet.range.LongRange; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.sandbox.facet.abstracts.FacetLeafCutter; +import org.apache.lucene.search.LongValues; +import org.apache.lucene.search.LongValuesSource; + +/** + * {@link RangeFacetCutter} for ranges of long value that overlap. 
Uses segment tree optimisation to + find all matching ranges for a given value: https://blog.mikemccandless.com/2013/12/fast-range-faceting-using-segment-trees.html + */ +class OverlappingLongRangeFacetCutter extends LongRangeFacetCutter { + private final LongRangeNode root; + + OverlappingLongRangeFacetCutter( + String field, + MultiLongValuesSource longValuesSource, + LongValuesSource singleLongValuesSource, + LongRange[] longRanges) { +super(field, longValuesSource, singleLongValuesSource, longRanges); + +// Build binary tree on top of intervals: +root = split(0, elementaryIntervals.size(), elementaryIntervals); + +// Set outputs, so we know which range to output for each node in the tree: +for (int i = 0; i < sortedRanges.length; i++) { + root.addOutputs(i, sortedRanges[i]); +} + } + + @Override + List buildElementaryIntervals() { +// Maps all range inclusive endpoints to int flags; 1 +// = start of interval, 2 = end of interval. We need to +// track the start vs end case separately because if a +// given point is both, then it must be its own +// elementary interval: +Map endsMap = new HashMap<>(); + +endsMap.put(Long.MIN_VALUE, 1); +endsMap.put(Long.MAX_VALUE, 2); + +for (LongRangeAndPos rangeAndPos : sortedRanges) { Review Comment: I've added a TODO to dedup with the original method `OverlappingLongRangeCounter#buildElementaryIntervals` - we can consider refactoring as part of this work. WDYT?
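The endpoint-flag bookkeeping described in the quoted comment can be sketched standalone (a simplified sketch using plain `long[]` min/max pairs instead of Lucene's `LongRange`; the class and method names are hypothetical). Each inclusive endpoint gets a flag (1 = start of a range, 2 = end, 3 = both), and a point carrying both flags becomes its own single-point elementary interval:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EndpointFlags {
    /** Builds elementary intervals (a 1D Venn diagram) for possibly-overlapping ranges. */
    static List<long[]> elementaryIntervals(long[][] ranges) { // each entry is {min, max}, inclusive
        // Map endpoints to flags: 1 = start, 2 = end, 3 = both.
        // A TreeMap keeps the endpoints sorted for the sweep below.
        Map<Long, Integer> ends = new TreeMap<>();
        ends.put(Long.MIN_VALUE, 1);
        ends.put(Long.MAX_VALUE, 2);
        for (long[] r : ranges) {
            ends.merge(r[0], 1, (a, b) -> a | b);
            ends.merge(r[1], 2, (a, b) -> a | b);
        }
        List<long[]> out = new ArrayList<>();
        long prev = Long.MIN_VALUE;
        boolean first = true;
        for (Map.Entry<Long, Integer> e : ends.entrySet()) {
            long v = e.getKey();
            int flags = e.getValue();
            if (first) { // leftmost endpoint just opens the first interval
                if (flags == 3) { out.add(new long[] {v, v}); prev = v + 1; }
                first = false;
                continue;
            }
            if (flags == 3) {
                // Both a start and an end: must be its own elementary interval.
                if (v > prev) out.add(new long[] {prev, v - 1});
                out.add(new long[] {v, v});
                prev = v + 1;
            } else if (flags == 1) {
                // Start only: close the preceding interval, open a new one at v.
                if (v > prev) out.add(new long[] {prev, v - 1});
                prev = v;
            } else {
                // End only: the interval ending at v includes v.
                out.add(new long[] {prev, v});
                prev = v + 1;
            }
        }
        return out;
    }
}
```

For example, the overlapping ranges [0,10] and [5,20] yield the five elementary intervals [MIN,-1], [0,4], [5,10], [11,20], [21,MAX], which the segment tree is then built on top of.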
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711194467 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/ExclusiveLongRangeFacetCutter.java: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.ranges; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import org.apache.lucene.facet.MultiLongValues; +import org.apache.lucene.facet.MultiLongValuesSource; +import org.apache.lucene.facet.range.LongRange; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.sandbox.facet.abstracts.FacetLeafCutter; +import org.apache.lucene.search.LongValues; +import org.apache.lucene.search.LongValuesSource; + +/** {@link RangeFacetCutter} for ranges of long value that don't overlap. 
* */ +class ExclusiveLongRangeFacetCutter extends LongRangeFacetCutter { + ExclusiveLongRangeFacetCutter( + String field, + MultiLongValuesSource longValuesSource, + LongValuesSource singleLongValuesSource, + LongRange[] longRanges) { +super(field, longValuesSource, singleLongValuesSource, longRanges); + } + + @Override + List buildElementaryIntervals() { +List elementaryIntervals = new ArrayList<>(); +long prev = Long.MIN_VALUE; +for (LongRangeAndPos range : sortedRanges) { + if (range.range().min > prev) { +// add a "gap" range preceding requested range if necessary: +elementaryIntervals.add(new InclusiveRange(prev, range.range().min - 1)); + } + // add the requested range: + elementaryIntervals.add(new InclusiveRange(range.range().min, range.range().max)); + prev = range.range().max + 1; +} +if (elementaryIntervals.isEmpty() == false) { + long lastEnd = elementaryIntervals.get(elementaryIntervals.size() - 1).end(); + if (lastEnd < Long.MAX_VALUE) { +elementaryIntervals.add(new InclusiveRange(lastEnd + 1, Long.MAX_VALUE)); + } +} else { + // If no ranges were requested, create a single entry from MIN_VALUE to MAX_VALUE: + elementaryIntervals.add(new InclusiveRange(Long.MIN_VALUE, Long.MAX_VALUE)); +} + +return elementaryIntervals; + } + + @Override + public FacetLeafCutter createLeafCutter(LeafReaderContext context) throws IOException { +if (singleValues != null) { + LongValues values = singleValues.getValues(context, null); + return new ExclusiveLongRangeSinglevalueFacetLeafCutter( + values, boundaries, pos, requestedRangeCount); +} else { + MultiLongValues values = valuesSource.getValues(context); + return new ExclusiveLongRangeMultivalueFacetLeafCutter( + values, boundaries, pos, requestedRangeCount); +} + } + + /** + * TODO: dedup ExclusiveLongRangeMultivalueFacetLeafCutter and Review Comment: I believe we can, but I'd suggest we do it as a follow up task to contain size of the PR - these are package-private static classes, so we are not going to change any 
API. WDYT?
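For reference, the exclusive-range interval construction quoted above reduces to this standalone sketch (a simplified sketch with plain `long[]` min/max pairs instead of Lucene's `LongRange`; sorted, non-overlapping input is assumed, and the class name is hypothetical). Gap intervals are inserted around the requested ranges so the result covers the whole long domain:

```java
import java.util.ArrayList;
import java.util.List;

public class ExclusiveIntervals {
    /** Covers [Long.MIN_VALUE, Long.MAX_VALUE] with requested ranges plus "gap" intervals. */
    static List<long[]> build(long[][] sortedRanges) { // each {min, max}, sorted, non-overlapping
        List<long[]> out = new ArrayList<>();
        long prev = Long.MIN_VALUE;
        for (long[] r : sortedRanges) {
            if (r[0] > prev) {
                out.add(new long[] {prev, r[0] - 1}); // gap preceding the requested range
            }
            out.add(new long[] {r[0], r[1]}); // the requested range itself
            prev = r[1] + 1;
        }
        if (!out.isEmpty()) {
            long lastEnd = out.get(out.size() - 1)[1];
            if (lastEnd < Long.MAX_VALUE) {
                out.add(new long[] {lastEnd + 1, Long.MAX_VALUE}); // trailing gap
            }
        } else {
            out.add(new long[] {Long.MIN_VALUE, Long.MAX_VALUE}); // no ranges requested
        }
        return out;
    }
}
```

For example, requested ranges [0,9] and [20,29] produce [MIN,-1], [0,9], [10,19], [20,29], [30,MAX]; the gap intervals are later marked with the skip position so they are counted but never reported.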
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711199728 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/LongRangeFacetCutter.java: ## @@ -0,0 +1,431 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.ranges; + +import java.io.IOException; +import java.util.Arrays; +import java.util.Comparator; +import java.util.List; +import java.util.Objects; +import org.apache.lucene.facet.MultiLongValues; +import org.apache.lucene.facet.MultiLongValuesSource; +import org.apache.lucene.facet.range.LongRange; +import org.apache.lucene.sandbox.facet.abstracts.FacetLeafCutter; +import org.apache.lucene.search.LongValues; +import org.apache.lucene.search.LongValuesSource; + +/** {@link RangeFacetCutter} for ranges of long values. * */ +public abstract class LongRangeFacetCutter extends RangeFacetCutter { + + MultiLongValuesSource valuesSource; + LongValuesSource singleValues; // TODO: refactor - weird that we have both multi and single here. 
+ LongRangeAndPos[] sortedRanges; + + int requestedRangeCount; + + List elementaryIntervals; + + long[] boundaries; + int[] pos; + + // Default interval position, when elementary interval is mapped to this interval + // it is skipped. + static final int SKIP_INTERVAL_POSITION = -1; + + // Temporary callers should ensure that passed in single values sources are wrapped + /** add doc * */ + public static LongRangeFacetCutter create( + String field, + MultiLongValuesSource longValuesSource, + LongValuesSource singleLongValuesSource, + LongRange[] longRanges) { +if (areOverlappingRanges(longRanges)) { + return new OverlappingLongRangeFacetCutter( + field, longValuesSource, singleLongValuesSource, longRanges); +} +return new ExclusiveLongRangeFacetCutter( +field, longValuesSource, singleLongValuesSource, longRanges); + } + + public static LongRangeFacetCutter create( + String field, MultiLongValuesSource longValuesSource, LongRange[] longRanges) { +return create(field, longValuesSource, null, longRanges); + } + + // caller handles conversion of Doubles and DoubleRange to Long and LongRange + // ranges need not be sorted + LongRangeFacetCutter( + String field, + MultiLongValuesSource longValuesSource, + LongValuesSource singleLongValuesSource, + LongRange[] longRanges) { +super(field); +valuesSource = longValuesSource; +if (singleLongValuesSource != null) { + singleValues = singleLongValuesSource; +} else { + singleValues = MultiLongValuesSource.unwrapSingleton(valuesSource); +} + +sortedRanges = new LongRangeAndPos[longRanges.length]; +requestedRangeCount = longRanges.length; + +for (int i = 0; i < longRanges.length; i++) { + sortedRanges[i] = new LongRangeAndPos(longRanges[i], i); +} + +Arrays.sort(this.sortedRanges, Comparator.comparingLong(r -> r.range.min)); +elementaryIntervals = buildElementaryIntervals(); + +// Keep track of elementary interval boundary ends (for binary search) along with the requested +// range they map back to (and -1 when they map to a "gap" 
range in case of ExclusiveRanges): +boundaries = new long[elementaryIntervals.size()]; +pos = new int[elementaryIntervals.size()]; +Arrays.fill(pos, SKIP_INTERVAL_POSITION); +int currRange = 0; +for (int i = 0; i < boundaries.length; i++) { + boundaries[i] = elementaryIntervals.get(i).end; + if (currRange < sortedRanges.length) { +LongRangeAndPos curr = sortedRanges[currRange]; +if (boundaries[i] == curr.range.max) { + pos[i] = curr.pos; + currRange++; +} + } +} + } + + abstract List buildElementaryIntervals(); + + private static boolean areOverlappingRanges(LongRange[] ranges) { +if (ranges.length == 0) { + return false; +} + +// Copy before sorting so we don't mess with the caller's original ranges: +LongRange[] sortedRanges = new LongRange[ranges.length]; +System.arraycopy(ranges, 0, sortedRanges, 0, ranges.length); +Arrays.sort(sortedRanges, Comparator.co
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711205288 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/RangeOrdLabelBiMap.java: ## @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.ranges; + +import java.io.IOException; +import java.util.List; +import org.apache.lucene.facet.range.Range; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.sandbox.facet.abstracts.OrdLabelBiMap; + +/** {@link OrdLabelBiMap} for ranges. */ +public class RangeOrdLabelBiMap implements OrdLabelBiMap { + + Range[] ranges; + + /** Constructor that takes array of Range objects as input * */ + public RangeOrdLabelBiMap(Range[] inputRanges) { +ranges = inputRanges; + } + + /** Constructor that takes List of Range objects as input * */ + public RangeOrdLabelBiMap(List inputRanges) { Review Comment: Oh I think I misunderstood the suggestion, you suggest to remove the second constructor, not the class. +1, removing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711213814 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/RangeOrdLabelBiMap.java: ## @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.ranges; + +import java.io.IOException; +import java.util.List; +import org.apache.lucene.facet.range.Range; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.sandbox.facet.abstracts.OrdLabelBiMap; + +/** {@link OrdLabelBiMap} for ranges. 
*/ +public class RangeOrdLabelBiMap implements OrdLabelBiMap { + + Range[] ranges; + + /** Constructor that takes array of Range objects as input * */ + public RangeOrdLabelBiMap(Range[] inputRanges) { +ranges = inputRanges; + } + + /** Constructor that takes List of Range objects as input * */ + public RangeOrdLabelBiMap(List inputRanges) { +ranges = inputRanges.toArray(new Range[0]); + } + + @Override + public FacetLabel getLabel(int ordinal) throws IOException { +if (ordinal >= 0 && ordinal < ranges.length) { + return new FacetLabel(ranges[ordinal].label); +} +return null; + } + + @Override + public FacetLabel[] getLabels(int[] ordinals) throws IOException { +FacetLabel[] facetLabels = new FacetLabel[ordinals.length]; +for (int i = 0; i < ordinals.length; i++) { + facetLabels[i] = new FacetLabel(ranges[ordinals[i]].label); +} +return facetLabels; + } + + @Override + public int getOrd(FacetLabel label) { +throw new UnsupportedOperationException("Not yet supported for ranges"); Review Comment: As Greg suggested, we can't; instead we split OrdLabelBiMap into OrdToLabel and LabelToOrd interfaces.
Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]
uschindler commented on code in PR #13636: URL: https://github.com/apache/lucene/pull/13636#discussion_r1711262142 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentPostingDecodingUtil.java: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.internal.vectorization; + +import java.io.IOException; +import java.lang.foreign.MemorySegment; +import java.nio.ByteOrder; +import jdk.incubator.vector.LongVector; +import jdk.incubator.vector.VectorOperators; +import jdk.incubator.vector.VectorSpecies; +import org.apache.lucene.store.IndexInput; + +final class MemorySegmentPostingDecodingUtil extends PostingDecodingUtil { + + private static final VectorSpecies LONG_SPECIES = + PanamaVectorConstants.PRERERRED_LONG_SPECIES; + + private final IndexInput in; + private final MemorySegment memorySegment; + + MemorySegmentPostingDecodingUtil(IndexInput in, MemorySegment memorySegment) { +this.in = in; +this.memorySegment = memorySegment; + } + + @Override + public void readLongs(int count, long[] b) throws IOException { +if (count < LONG_SPECIES.length()) { + in.readLongs(b, 0, count); +} else { + long offset = in.getFilePointer(); + long endOffset = offset + count * Long.BYTES; + int loopBound = LONG_SPECIES.loopBound(count - 1); + for (int i = 0; + i < loopBound; + i += LONG_SPECIES.length(), offset += LONG_SPECIES.length() * Long.BYTES) { +LongVector.fromMemorySegment(LONG_SPECIES, memorySegment, offset, ByteOrder.LITTLE_ENDIAN) Review Comment: Why is this better/faster than a simple copyMemory as `in.readLongs()` is doing? At the end it is copied to an on-heap array anyways. I was expecting the whole PR to do the splitLongs directly on the MemorySegment and not on on-heap arrays. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
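The loop quoted in the diff above follows the usual Panama shape: vector loads over `LONG_SPECIES.loopBound(...)` whole lanes, then a scalar fallback (`in.readLongs`) for the remaining tail. A rough plain-Java illustration of that split — `LANE_COUNT` stands in for `LONG_SPECIES.length()`, `System.arraycopy` stands in for `LongVector.fromMemorySegment`, and none of this is the actual Panama Vector API code:

```java
import java.util.Arrays;

public class Main {
    static final int LANE_COUNT = 4; // e.g. a 256-bit species holds 4 longs

    // Largest multiple of LANE_COUNT that is <= n, like VectorSpecies.loopBound(n).
    static int loopBound(int n) {
        return n - (n % LANE_COUNT);
    }

    // Copy count longs, chunked the way the vectorized readLongs walks memory.
    static void copyLongs(long[] src, long[] dst, int count) {
        int i = 0;
        for (; i < loopBound(count); i += LANE_COUNT) {
            // one full-lane "vector load + store" per iteration
            System.arraycopy(src, i, dst, i, LANE_COUNT);
        }
        for (; i < count; i++) { // scalar tail, like the in.readLongs fallback
            dst[i] = src[i];
        }
    }

    public static void main(String[] args) {
        long[] src = {1, 2, 3, 4, 5, 6, 7};
        long[] dst = new long[src.length];
        copyLongs(src, dst, src.length);
        System.out.println(Arrays.toString(dst)); // [1, 2, 3, 4, 5, 6, 7]
    }
}
```

Note the real code uses `loopBound(count - 1)` so the final full chunk can be treated specially; the sketch only shows the main-loop/tail split that uschindler's question is about.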
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711338640 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/recorders/LongAggregationsFacetRecorder.java: ## @@ -0,0 +1,166 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.recorders; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collections; +import java.util.Iterator; +import java.util.List; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.internal.hppc.IntCursor; +import org.apache.lucene.internal.hppc.IntObjectHashMap; +import org.apache.lucene.sandbox.facet.abstracts.FacetLeafRecorder; +import org.apache.lucene.sandbox.facet.abstracts.FacetRecorder; +import org.apache.lucene.sandbox.facet.abstracts.FacetRollup; +import org.apache.lucene.sandbox.facet.abstracts.OrdinalIterator; +import org.apache.lucene.sandbox.facet.abstracts.Reducer; +import org.apache.lucene.search.LongValues; +import org.apache.lucene.search.LongValuesSource; + +/** + * {@link FacetRecorder} that computes multiple long aggregations per facet. 
TODO: [premature + * optimization idea] if instead of one array we keep aggregations in two LongVector (one for MAX + * aggregation and one for SUM) we can benefit from SIMD? + */ +public class LongAggregationsFacetRecorder implements FacetRecorder { + + private IntObjectHashMap perOrdinalValues; + private List> leafValues; + + private final LongValuesSource[] longValuesSources; + private final Reducer[] reducers; + + /** Constructor. */ + public LongAggregationsFacetRecorder(LongValuesSource[] longValuesSources, Reducer[] reducers) { +assert longValuesSources.length == reducers.length; +this.longValuesSources = longValuesSources; +this.reducers = reducers; +perOrdinalValues = new IntObjectHashMap<>(); +leafValues = Collections.synchronizedList(new ArrayList<>()); + } + + @Override + public FacetLeafRecorder getLeafRecorder(LeafReaderContext context) throws IOException { +LongValues[] longValues = new LongValues[longValuesSources.length]; +for (int i = 0; i < longValuesSources.length; i++) { + longValues[i] = longValuesSources[i].getValues(context, null); +} +IntObjectHashMap valuesRecorder = new IntObjectHashMap<>(); +leafValues.add(valuesRecorder); +return new LongAggregationsFacetLeafRecorder(longValues, reducers, valuesRecorder); + } + + @Override + public OrdinalIterator recordedOrds() { +if (perOrdinalValues.isEmpty()) { + return null; +} +Iterator ordIterator = perOrdinalValues.keys().iterator(); +return new OrdinalIterator() { + @Override + public int nextOrd() throws IOException { +if (ordIterator.hasNext()) { + return ordIterator.next().value; +} else { + return NO_MORE_ORDS; +} + } +}; + } + + @Override + public boolean isEmpty() { +return perOrdinalValues.isEmpty(); + } + + @Override + public void reduce(FacetRollup facetRollup) throws IOException { +boolean firstElement = true; +for (IntObjectHashMap leafValue : leafValues) { + if (firstElement) { +perOrdinalValues = leafValue; +firstElement = false; + } else { +for (IntObjectHashMap.IntObjectCursor 
elem : leafValue) { + long[] vals = perOrdinalValues.get(elem.key); + if (vals == null) { +perOrdinalValues.put(elem.key, elem.value); + } else { +for (int i = 0; i < longValuesSources.length; i++) { + vals[i] = reducers[i].reduce(vals[i], elem.value[i]); +} + } +} + } +} +if (firstElement) { + // TODO: do we need empty map by default? + perOrdinalValues = new IntObjectHashMap<>(); +} +// TODO: implement rollup +if (facetRollup != null +&& facetRollup.getDimOrdsToRollup().nextOrd() != OrdinalIterator.NO_MORE_ORDS) { + throw new UnsupportedOperationException("Rollup is required, but not implemented"); +} + } + + public long getRecordedValue(int ord, int valuesId) { +if (valuesId < 0 || valuesId >= longValu
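The `reduce` method quoted above folds the per-leaf maps of `ordinal -> long[]` into a single map, combining each aggregation slot with its `Reducer`. A minimal sketch of that merge, with `java.util.HashMap` and `LongBinaryOperator` standing in for hppc's `IntObjectHashMap` and Lucene's `Reducer` (this `reduce` helper is illustrative, not the actual API, and it copies the first leaf's arrays rather than reusing them):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongBinaryOperator;

public class Main {
    static Map<Integer, long[]> reduce(List<Map<Integer, long[]>> leafValues,
                                       LongBinaryOperator[] reducers) {
        Map<Integer, long[]> merged = new HashMap<>();
        for (Map<Integer, long[]> leaf : leafValues) {
            for (Map.Entry<Integer, long[]> e : leaf.entrySet()) {
                long[] vals = merged.get(e.getKey());
                if (vals == null) {
                    // first time we see this ordinal: take a copy of its slots
                    merged.put(e.getKey(), e.getValue().clone());
                } else {
                    // combine slot-by-slot with the matching reducer
                    for (int i = 0; i < reducers.length; i++) {
                        vals[i] = reducers[i].applyAsLong(vals[i], e.getValue()[i]);
                    }
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // slot 0 is a SUM aggregation, slot 1 a MAX aggregation
        Map<Integer, long[]> merged = reduce(
            List.of(Map.of(7, new long[] {3, 10}), Map.of(7, new long[] {4, 2})),
            new LongBinaryOperator[] {Long::sum, Math::max});
        System.out.println(Arrays.toString(merged.get(7))); // [7, 10]
    }
}
```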
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711348674 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ordinal_iterators/CandidateSetOrdinalIterator.java: ## @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.ordinal_iterators; + +import java.io.IOException; +import org.apache.lucene.internal.hppc.IntHashSet; +import org.apache.lucene.sandbox.facet.abstracts.OrdinalIterator; + +/** + * {@link OrdinalIterator} that filters out ordinals from delegate if they are not in the candidate + * set. + * + * Can be handy to get results only for specific facets. + */ +public class CandidateSetOrdinalIterator implements OrdinalIterator { Review Comment: I've added some unit tests for this class. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
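The contract of `CandidateSetOrdinalIterator` described above — pass through only delegate ordinals that appear in the candidate set — can be sketched without any Lucene types. Here a tiny `OrdinalIterator` interface and `java.util.Set` stand in for Lucene's `OrdinalIterator` and hppc's `IntHashSet`, and `NO_MORE_ORDS = -1` is an assumption of this sketch:

```java
import java.util.Set;

public class Main {
    static final int NO_MORE_ORDS = -1; // sentinel, assumed for this sketch

    interface OrdinalIterator {
        int nextOrd();
    }

    // Skip delegate ordinals that are not in the candidate set.
    static OrdinalIterator filter(OrdinalIterator delegate, Set<Integer> candidates) {
        return () -> {
            for (int ord = delegate.nextOrd(); ord != NO_MORE_ORDS; ord = delegate.nextOrd()) {
                if (candidates.contains(ord)) {
                    return ord;
                }
            }
            return NO_MORE_ORDS;
        };
    }

    // Helper to turn a fixed array into an OrdinalIterator for demo purposes.
    static OrdinalIterator fromArray(int[] ords) {
        int[] pos = {0};
        return () -> pos[0] < ords.length ? ords[pos[0]++] : NO_MORE_ORDS;
    }

    public static void main(String[] args) {
        OrdinalIterator it = filter(fromArray(new int[] {3, 5, 8, 9}), Set.of(5, 9));
        for (int ord = it.nextOrd(); ord != NO_MORE_ORDS; ord = it.nextOrd()) {
            System.out.println(ord); // prints 5, then 9
        }
    }
}
```

This is why the class is "handy to get results only for specific facets": the caller builds the candidate set from the facet labels it cares about and wraps the recorder's ordinal stream.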
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711347682 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/misc/LongValueFacetCutter.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.misc; + +import java.io.IOException; +import java.util.Iterator; +import java.util.concurrent.atomic.AtomicInteger; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.index.DocValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.index.SortedNumericDocValues; +import org.apache.lucene.internal.hppc.IntLongHashMap; +import org.apache.lucene.internal.hppc.LongCursor; +import org.apache.lucene.internal.hppc.LongHashSet; +import org.apache.lucene.internal.hppc.LongIntHashMap; +import org.apache.lucene.sandbox.facet.abstracts.FacetCutter; +import org.apache.lucene.sandbox.facet.abstracts.FacetLeafCutter; +import org.apache.lucene.sandbox.facet.abstracts.OrdLabelBiMap; + +/** + * {@link FacetCutter} and {@link OrdLabelBiMap} for distinct long values. + * + * TODO: This class is quite inefficient. Will optimise later. 
TODO: add support for other value Review Comment: I've added some unit tests for this class. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
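`LongValueFacetCutter` quoted above imports `LongIntHashMap`, `IntLongHashMap`, and `AtomicInteger`, which suggests an ord<->value bimap: each distinct long value gets a dense ordinal on first sight, readable in both directions (cutter side and `OrdLabelBiMap` side). A hedged sketch of such a bimap — the method names `ordForValue`/`valueForOrd` are made up for illustration, and plain `HashMap`s stand in for the hppc maps:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class Main {
    private final Map<Long, Integer> valueToOrd = new HashMap<>();
    private final Map<Integer, Long> ordToValue = new HashMap<>();
    private final AtomicInteger nextOrd = new AtomicInteger();

    // Assign a dense ordinal the first time a value is seen; reuse it afterwards.
    int ordForValue(long value) {
        return valueToOrd.computeIfAbsent(value, v -> {
            int ord = nextOrd.getAndIncrement();
            ordToValue.put(ord, v); // keep the reverse mapping in sync
            return ord;
        });
    }

    long valueForOrd(int ord) {
        return ordToValue.get(ord); // assumes the ord was assigned earlier
    }

    public static void main(String[] args) {
        Main bimap = new Main();
        System.out.println(bimap.ordForValue(42)); // 0: first distinct value
        System.out.println(bimap.ordForValue(7));  // 1
        System.out.println(bimap.ordForValue(42)); // 0 again: already assigned
        System.out.println(bimap.valueForOrd(1));  // 7
    }
}
```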
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711353146 ## lucene/sandbox/src/test/org/apache/lucene/sandbox/facet/SandboxFacetTestCase.java: ## @@ -0,0 +1,407 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.sandbox.facet; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.Comparator; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.lucene.facet.FacetResult; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.FacetsCollector.MatchingDocs; +import org.apache.lucene.facet.FacetsConfig; +import org.apache.lucene.facet.LabelAndValue; +import org.apache.lucene.facet.taxonomy.FacetLabel; +import org.apache.lucene.facet.taxonomy.TaxonomyFacetLabels; +import org.apache.lucene.facet.taxonomy.TaxonomyFacetLabels.FacetLabelReader; +import org.apache.lucene.facet.taxonomy.TaxonomyReader; +import org.apache.lucene.sandbox.facet.abstracts.OrdLabelBiMap; +import org.apache.lucene.sandbox.facet.abstracts.OrdToComparable; +import org.apache.lucene.sandbox.facet.abstracts.OrdinalIterator; +import org.apache.lucene.sandbox.facet.ordinal_iterators.TopnOrdinalIterator; +import org.apache.lucene.sandbox.facet.recorders.CountFacetRecorder; +import org.apache.lucene.sandbox.facet.taxonomy.TaxonomyChildrenOrdinalIterator; +import org.apache.lucene.sandbox.facet.taxonomy.TaxonomyOrdLabelBiMap; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.tests.util.LuceneTestCase; +import org.apache.lucene.tests.util.TestUtil; +import org.apache.lucene.util.BytesRef; + +public abstract class SandboxFacetTestCase extends LuceneTestCase { + // we don't have access to overall count for all facets from count recorder, + // and we can't compute it as a SUM of values for each facet ID because we need to respect cases + // where + // the same doc belongs to multiple facets (e.g. overlapping ranges and + // multi value fields). We can add an extra range that includes everything, + // or consider supporting overall count in CountFacetRecorder. 
But it is not exactly the value + // we can get now, as this value wouldn't respect top-n cutoff. Is this value a must have facets + // feature? + static final int VALUE_CANT_BE_COMPUTED = -5; + + /** + * Utility method that uses {@link FacetLabelReader} to get facet labels for each hit in {@link + * MatchingDocs}. The method returns {@code List>} where outer list has one entry + * per document and inner list has all {@link FacetLabel} entries that belong to a document. The + * inner list may be empty if no {@link FacetLabel} are found for a hit. + * + * @param taxoReader {@link TaxonomyReader} used to read taxonomy during search. This instance is + * expected to be open for reading. + * @param fc {@link FacetsCollector} A collector with matching hits. + * @param dimension facet dimension for which labels are requested. A null value fetches labels + * for all dimensions. + * @return {@code List} where outer list has one non-null entry per document. and + * inner list contain all {@link FacetLabel} entries that belong to a document. + * @throws IOException when a low-level IO issue occurs. + */ + public List> getAllTaxonomyFacetLabels( + String dimension, TaxonomyReader taxoReader, FacetsCollector fc) throws IOException { +List> actualLabels = new ArrayList<>(); +TaxonomyFacetLabels taxoLabels = +new TaxonomyFacetLabels(taxoReader, FacetsConfig.DEFAULT_INDEX_FIELD_NAME); +for (MatchingDocs m : fc.getMatchingDocs()) { + FacetLabelReader facetLabelReader = taxoLabels.getFacetLabelReader(m.context); + DocIdSetIterator disi = m.bits.iterator(); + while (disi.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) { +actualLabels.add(allFacetLabels(disi.docID(), dimension, facetLabelReader)); + } +} +return actualLabels; + } + + /** + * Utility method to get all facet labels for an input docId and dimension using the supplied + * {@link FacetLabelReader}. +
Re: [PR] Compute facets while collecting [lucene]
stefanvodita commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711375034 ## lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java: ## @@ -45,58 +45,26 @@ class DrillSidewaysQuery extends Query { final Query baseQuery; - final FacetsCollectorManager drillDownCollectorManager; - final FacetsCollectorManager[] drillSidewaysCollectorManagers; - final List managedDrillDownCollectors; - final List managedDrillSidewaysCollectors; + final CollectorOwner drillDownCollectorOwner; + final List> drillSidewaysCollectorOwners; final Query[] drillDownQueries; final boolean scoreSubDocsAtOnce; /** * Construct a new {@code DrillSidewaysQuery} that will create new {@link FacetsCollector}s for - * each {@link LeafReaderContext} using the provided {@link FacetsCollectorManager}s. The caller - * can access the created {@link FacetsCollector}s through {@link #managedDrillDownCollectors} and - * {@link #managedDrillSidewaysCollectors}. + * each {@link LeafReaderContext} using the provided {@link FacetsCollectorManager}s. */ DrillSidewaysQuery( Query baseQuery, - FacetsCollectorManager drillDownCollectorManager, - FacetsCollectorManager[] drillSidewaysCollectorManagers, - Query[] drillDownQueries, - boolean scoreSubDocsAtOnce) { -// Note that the "managed" facet collector lists are synchronized here since bulkScorer() -// can be invoked concurrently and needs to remain thread-safe. We're OK with synchronizing -// on the whole list as contention is expected to remain very low: -this( -baseQuery, -drillDownCollectorManager, -drillSidewaysCollectorManagers, -Collections.synchronizedList(new ArrayList<>()), -Collections.synchronizedList(new ArrayList<>()), -drillDownQueries, -scoreSubDocsAtOnce); - } - - /** - * Needed for {@link Query#rewrite(IndexSearcher)}. Ensures the same "managed" lists get used - * since {@link DrillSideways} accesses references to these through the original {@code - * DrillSidewaysQuery}. 
- */ - private DrillSidewaysQuery( - Query baseQuery, - FacetsCollectorManager drillDownCollectorManager, - FacetsCollectorManager[] drillSidewaysCollectorManagers, - List managedDrillDownCollectors, - List managedDrillSidewaysCollectors, + CollectorOwner drillDownCollectorOwner, Review Comment: Oh, I see, sorry! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Compute facets while collecting [lucene]
stefanvodita commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711379107 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/recorders/CountFacetRecorder.java: ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.recorders; + +import static org.apache.lucene.sandbox.facet.abstracts.OrdinalIterator.NO_MORE_ORDS; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collections; +import java.util.Iterator; +import java.util.List; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.internal.hppc.IntCursor; +import org.apache.lucene.internal.hppc.IntIntHashMap; +import org.apache.lucene.sandbox.facet.abstracts.FacetLeafRecorder; +import org.apache.lucene.sandbox.facet.abstracts.FacetRecorder; +import org.apache.lucene.sandbox.facet.abstracts.FacetRollup; +import org.apache.lucene.sandbox.facet.abstracts.OrdinalIterator; + +/** + * {@link FacetRecorder} to count facets. TODO: add an option to keep counts in an array, to improve + * performance for facets with small number of ordinals e.g. range facets. 
Options: - {@link + * org.apache.lucene.sandbox.facet.abstracts.FacetLeafCutter} can inform {@link FacetLeafRecorder} + * about expected number of facet ordinals ({@link + * org.apache.lucene.sandbox.facet.FacetFieldCollector} can orchestrate that). If expected facet ord + * number is below some threshold - use array instead of a map? - first 100/1k counts in array, the + * rest - in a map; the limit can also be provided in a constructor? It is similar to what + * LongValuesFacetCounts does today. + */ +public class CountFacetRecorder implements FacetRecorder { + + // TODO: deprecate - it is cheaper to merge during reduce than to lock threads during collection. + // TODO: alternatively, we can consider collecting 2 (3, 4, ..., can be parametrized) slices to a + // single sync map + // which can reduce thread contention compared to single sync map for all slices; at the + // same time there will + // be less work for reduce method. So far reduce wasn't a bottleneck for us, but it is + // definitely not free. + private final boolean useSyncMap; + + /** Create */ + public CountFacetRecorder() { +this(false); + } + + IntIntHashMap values; + List perLeafValues; + + /** + * Create. + * + * @param useSyncMap if true, use single sync map for all leaves. + */ + public CountFacetRecorder(boolean useSyncMap) { +super(); +if (useSyncMap) { + values = new SafeIntIntHashMap(); +} else { + // Has to be synchronizedList as we have one recorder per all slices. + perLeafValues = Collections.synchronizedList(new ArrayList<>()); +} +this.useSyncMap = useSyncMap; + } + + /** Get count for provided ordinal. */ + public int getCount(int ord) { +// TODO: allow or don't allow missing values?
+return values.get(ord); + } + + private static final class SafeIntIntHashMap extends IntIntHashMap { +@Override +public synchronized int addTo(int key, int incrementValue) { + return super.addTo(key, incrementValue); +} + } + + @Override + public FacetLeafRecorder getLeafRecorder(LeafReaderContext context) { +if (useSyncMap) { + return new CountLeafRecorder(values); +} else { + IntIntHashMap leafValues = new IntIntHashMap(); + perLeafValues.add(leafValues); + return new CountLeafRecorder(leafValues); +} + } + + @Override + public OrdinalIterator recordedOrds() { +// TODO: is that performant enough? +// TODO: even if this is called before collection started, we want it to use results from the +// time when nextOrd +// is first called. Does ordIterator work like that? I've run some tests that confirmed +// expected behavior, +// but I'm not sure IntIntMap guarantees that. We should at least add a unit test to make sure +// it always work +// that way. +Iterator ordIterator = values.keys().iterator(); +return new OrdinalIterator() { + @Override + public int nextOrd() throws IOException { +if (ordIterator.hasNext(
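The `CountFacetRecorder` diff above weighs two strategies: a single shared map whose `addTo` is synchronized (the `SafeIntIntHashMap` path), versus one private map per leaf merged once in `reduce()`, so collection itself pays no locking cost. Both can be sketched with `HashMap` standing in for hppc's `IntIntHashMap`:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Main {
    // Strategy 1: every recorded hit pays for synchronization on the shared map.
    static void addToSync(Map<Integer, Integer> shared, int ord) {
        synchronized (shared) {
            shared.merge(ord, 1, Integer::sum);
        }
    }

    // Strategy 2: lock-free per-leaf counting; a single merge pass at the end.
    static Map<Integer, Integer> reduce(List<Map<Integer, Integer>> perLeaf) {
        Map<Integer, Integer> total = new HashMap<>();
        for (Map<Integer, Integer> leaf : perLeaf) {
            leaf.forEach((ord, count) -> total.merge(ord, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Two leaves counted independently, then merged.
        Map<Integer, Integer> leaf1 = Map.of(1, 3, 2, 1);
        Map<Integer, Integer> leaf2 = Map.of(1, 2);
        System.out.println(reduce(List.of(leaf1, leaf2)));
    }
}
```

This mirrors the TODO in the diff: merging during reduce is cheaper than locking during collection, because the merge cost is paid once per leaf rather than once per hit.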
Re: [PR] Compute facets while collecting [lucene]
stefanvodita commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711380093 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ranges/OverlappingLongRangeFacetCutter.java: ## @@ -0,0 +1,272 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.ranges; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import org.apache.lucene.facet.MultiLongValues; +import org.apache.lucene.facet.MultiLongValuesSource; +import org.apache.lucene.facet.range.LongRange; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.sandbox.facet.abstracts.FacetLeafCutter; +import org.apache.lucene.search.LongValues; +import org.apache.lucene.search.LongValuesSource; + +/** + * {@link RangeFacetCutter} for ranges of long value that overlap. 
Uses segment tree optimisation to + * find all matching ranges for a given value https://blog.mikemccandless.com/2013/12/fast-range-faceting-using-segment-trees.html";>fast-range-faceting- + * using-segment-trees.html + */ +class OverlappingLongRangeFacetCutter extends LongRangeFacetCutter { + private final LongRangeNode root; + + OverlappingLongRangeFacetCutter( + String field, + MultiLongValuesSource longValuesSource, + LongValuesSource singleLongValuesSource, + LongRange[] longRanges) { +super(field, longValuesSource, singleLongValuesSource, longRanges); + +// Build binary tree on top of intervals: +root = split(0, elementaryIntervals.size(), elementaryIntervals); + +// Set outputs, so we know which range to output for each node in the tree: +for (int i = 0; i < sortedRanges.length; i++) { + root.addOutputs(i, sortedRanges[i]); +} + } + + @Override + List buildElementaryIntervals() { +// Maps all range inclusive endpoints to int flags; 1 +// = start of interval, 2 = end of interval. We need to +// track the start vs end case separately because if a +// given point is both, then it must be its own +// elementary interval: +Map endsMap = new HashMap<>(); + +endsMap.put(Long.MIN_VALUE, 1); +endsMap.put(Long.MAX_VALUE, 2); + +for (LongRangeAndPos rangeAndPos : sortedRanges) { Review Comment: Sure, seems reasonable. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
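The `buildElementaryIntervals` method quoted above flags every range endpoint as a start (1), an end (2), or both, because a point that is both must become its own elementary interval. A simplified sketch of that endpoint-flag construction — ranges are modelled here as inclusive `[min, max]` pairs, and the bitwise-OR merge (start=1, end=2, both=3) is this sketch's encoding, not necessarily the exact one in the PR:

```java
import java.util.List;
import java.util.TreeMap;

public class Main {
    record LongRange(long min, long max) {}

    // Map each endpoint to flags: 1 = start of some interval, 2 = end, 3 = both.
    static TreeMap<Long, Integer> endpointFlags(List<LongRange> ranges) {
        TreeMap<Long, Integer> endsMap = new TreeMap<>();
        endsMap.put(Long.MIN_VALUE, 1); // global start sentinel
        endsMap.put(Long.MAX_VALUE, 2); // global end sentinel
        for (LongRange r : ranges) {
            endsMap.merge(r.min(), 1, (a, b) -> a | b);
            endsMap.merge(r.max(), 2, (a, b) -> a | b);
        }
        return endsMap;
    }

    public static void main(String[] args) {
        // Overlapping ranges [0,10] and [5,20]: 5 is only a start, 10 only an end.
        TreeMap<Long, Integer> flags =
            endpointFlags(List.of(new LongRange(0, 10), new LongRange(5, 20)));
        System.out.println(flags);
    }
}
```

The sorted endpoints then cut the long line into elementary intervals, and the segment tree built on top of them lets a single value find all matching overlapping ranges in logarithmic time, as in the linked McCandless post.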
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711382272 ## lucene/core/src/java/org/apache/lucene/search/CollectorOwner.java: ## @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.search; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collection; +import java.util.Collections; +import java.util.List; + +/** + * Like {@link CollectorManager}, but it owns the collectors its manager creates. It is convenient + * that clients of the class don't have to worry about keeping the list of collectors, as well as + * about making the collectors type (C) compatible when reduce is called. Instance of this class + * also caches results of {@link CollectorManager#reduce(Collection)}. + * + * Note that instance of this class ignores any {@link Collector} created by {@link + * CollectorManager#newCollector()} directly, not through {@link #newCollector()} + * + * @lucene.experimental + */ +public final class CollectorOwner { Review Comment: > Ahhh OK thank you for the explanation. I like the new `CollectorOwner` ... 
I just wish we could somehow iterate `CollectorManager` towards that, or deprecate / phase out `CollectorManager` somehow, or ... I'm not sure how to do that TBH. Changing `CollectorManager` to keep its collectors means that we either have to make it an abstract class (which might not be convenient for users) or make sure each implementation keeps a list of its collectors, which adds some code duplication? Alternatively we could rename it to something like `CollectorSource` (similarly to DoubleValues**Source**) or `CollectorSupplier`? Although the `reduce` method doesn't seem to fit the Source/Supplier category well. > But it seems worth at least creating an actual issue to track this TODO and see if we can converge at some point? Good point, I'll create an issue when/if this PR is merged, just in case we come up with a better solution as part of this PR later. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
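The `CollectorOwner` idea under discussion — the owner creates collectors through the manager, remembers them, and caches the `reduce()` result, so callers never juggle the collector list or its generic type — can be sketched with a toy manager interface. The tiny `CollectorManager` here is a stand-in, not Lucene's real interface:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Main {
    interface CollectorManager<C, T> {
        C newCollector();
        T reduce(List<C> collectors);
    }

    static final class CollectorOwner<C, T> {
        private final CollectorManager<C, T> manager;
        // synchronizedList: newCollector() may be called from several slices
        private final List<C> collectors = Collections.synchronizedList(new ArrayList<>());
        private T result;

        CollectorOwner(CollectorManager<C, T> manager) {
            this.manager = manager;
        }

        C newCollector() {
            C c = manager.newCollector(); // only collectors created here are "owned"
            collectors.add(c);
            return c;
        }

        T getResult() {
            if (result == null) {
                result = manager.reduce(collectors); // cached after the first call
            }
            return result;
        }
    }

    // Toy demo: "collectors" are int[] counters that reduce() sums up.
    static int demo() {
        CollectorManager<int[], Integer> manager = new CollectorManager<>() {
            @Override public int[] newCollector() { return new int[1]; }
            @Override public Integer reduce(List<int[]> cs) {
                int sum = 0;
                for (int[] c : cs) sum += c[0];
                return sum;
            }
        };
        CollectorOwner<int[], Integer> owner = new CollectorOwner<>(manager);
        owner.newCollector()[0] = 2;
        owner.newCollector()[0] = 3;
        return owner.getResult();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 5
    }
}
```

The sketch also shows why the javadoc warns that collectors created via `CollectorManager#newCollector()` directly are ignored: only instances obtained through the owner end up in the list that `reduce` sees.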
Re: [PR] Compute facets while collecting [lucene]
stefanvodita commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711388569

```
## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/recorders/LongAggregationsFacetRecorder.java:
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.facet.recorders;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.List;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.internal.hppc.IntCursor;
+import org.apache.lucene.internal.hppc.IntObjectHashMap;
+import org.apache.lucene.sandbox.facet.abstracts.FacetLeafRecorder;
+import org.apache.lucene.sandbox.facet.abstracts.FacetRecorder;
+import org.apache.lucene.sandbox.facet.abstracts.FacetRollup;
+import org.apache.lucene.sandbox.facet.abstracts.OrdinalIterator;
+import org.apache.lucene.sandbox.facet.abstracts.Reducer;
+import org.apache.lucene.search.LongValues;
+import org.apache.lucene.search.LongValuesSource;
+
+/**
+ * {@link FacetRecorder} that computes multiple long aggregations per facet.
+ *
+ * TODO: [premature optimization idea] if instead of one array we keep aggregations in two
+ * LongVector (one for MAX aggregation and one for SUM) we can benefit from SIMD?
+ */
+public class LongAggregationsFacetRecorder implements FacetRecorder {
+
+  private IntObjectHashMap<long[]> perOrdinalValues;
+  private List<IntObjectHashMap<long[]>> leafValues;
+
+  private final LongValuesSource[] longValuesSources;
+  private final Reducer[] reducers;
+
+  /** Constructor. */
+  public LongAggregationsFacetRecorder(LongValuesSource[] longValuesSources, Reducer[] reducers) {
+    assert longValuesSources.length == reducers.length;
+    this.longValuesSources = longValuesSources;
+    this.reducers = reducers;
+    perOrdinalValues = new IntObjectHashMap<>();
+    leafValues = Collections.synchronizedList(new ArrayList<>());
+  }
+
+  @Override
+  public FacetLeafRecorder getLeafRecorder(LeafReaderContext context) throws IOException {
+    LongValues[] longValues = new LongValues[longValuesSources.length];
+    for (int i = 0; i < longValuesSources.length; i++) {
+      longValues[i] = longValuesSources[i].getValues(context, null);
+    }
+    IntObjectHashMap<long[]> valuesRecorder = new IntObjectHashMap<>();
+    leafValues.add(valuesRecorder);
+    return new LongAggregationsFacetLeafRecorder(longValues, reducers, valuesRecorder);
+  }
+
+  @Override
+  public OrdinalIterator recordedOrds() {
+    if (perOrdinalValues.isEmpty()) {
+      return null;
+    }
+    Iterator<IntCursor> ordIterator = perOrdinalValues.keys().iterator();
+    return new OrdinalIterator() {
+      @Override
+      public int nextOrd() throws IOException {
+        if (ordIterator.hasNext()) {
+          return ordIterator.next().value;
+        } else {
+          return NO_MORE_ORDS;
+        }
+      }
+    };
+  }
+
+  @Override
+  public boolean isEmpty() {
+    return perOrdinalValues.isEmpty();
+  }
+
+  @Override
+  public void reduce(FacetRollup facetRollup) throws IOException {
+    boolean firstElement = true;
+    for (IntObjectHashMap<long[]> leafValue : leafValues) {
+      if (firstElement) {
+        perOrdinalValues = leafValue;
+        firstElement = false;
+      } else {
+        for (IntObjectHashMap.IntObjectCursor<long[]> elem : leafValue) {
+          long[] vals = perOrdinalValues.get(elem.key);
+          if (vals == null) {
+            perOrdinalValues.put(elem.key, elem.value);
+          } else {
+            for (int i = 0; i < longValuesSources.length; i++) {
+              vals[i] = reducers[i].reduce(vals[i], elem.value[i]);
+            }
+          }
+        }
+      }
+    }
+    if (firstElement) {
+      // TODO: do we need empty map by default?
+      perOrdinalValues = new IntObjectHashMap<>();
+    }
+    // TODO: implement rollup
+    if (facetRollup != null
+        && facetRollup.getDimOrdsToRollup().nextOrd() != OrdinalIterator.NO_MORE_ORDS) {
+      throw new UnsupportedOperationException("Rollup is required, but not implemented");
+    }
+  }
+
+  public long getRecordedValue(int ord, int valuesId) {
+    if (valuesId < 0 || valuesId >= lon
```
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711547954

```
## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ordinals/OrdinalGetter.java:
@@ -0,0 +1,24 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.facet.ordinals;
+
+/** Interface to return an ordinal. */
+public interface OrdinalGetter {
```

Review Comment: I've made the change in this commit, please let me know what you think! https://github.com/apache/lucene/pull/13568/commits/9d158e225346d31cebb0e4eea32d5c425c155ca3
Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]
gsmiller commented on code in PR #13636: URL: https://github.com/apache/lucene/pull/13636#discussion_r1711646318

```
## lucene/core/src/java/org/apache/lucene/internal/vectorization/DefaultPostingDecodingUtil.java:
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.internal.vectorization;
+
+import java.io.IOException;
+import org.apache.lucene.store.IndexInput;
+
+final class DefaultPostingDecodingUtil extends PostingDecodingUtil {
+
+  protected final IndexInput in;
+
+  public DefaultPostingDecodingUtil(IndexInput in) {
+    this.in = in;
+  }
+
+  @Override
+  public void splitLongs(int count, long[] b, int bShift, long bMask, long[] c, long cMask)
+      throws IOException {
+    assert count <= 64;
+    in.readLongs(c, 0, count);
+    // The below loop is auto-vectorized
+    for (int i = 0; i < count; ++i) {
+      b[i] = (c[i] >>> bShift) & bMask;
+      c[i] &= cMask;
```

Review Comment: Thanks @jpountz I think the version you landed on makes sense.
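For readers following along, here is a small self-contained sketch of what `splitLongs` computes (the class name and the in-memory `src` array are stand-ins for the real `IndexInput`-backed implementation): each input long is split into a high part, shifted and masked into `b`, and a low part left masked in `c`:

```java
public class SplitLongsDemo {
  // Mirrors the scalar loop quoted above; src stands in for longs read
  // from an IndexInput via readLongs.
  static void splitLongs(
      long[] src, int count, long[] b, int bShift, long bMask, long[] c, long cMask) {
    System.arraycopy(src, 0, c, 0, count);
    // Simple shift-and-mask per element; a loop like this is a good
    // candidate for HotSpot auto-vectorization.
    for (int i = 0; i < count; ++i) {
      b[i] = (c[i] >>> bShift) & bMask;
      c[i] &= cMask;
    }
  }

  public static void main(String[] args) {
    long[] src = {0xABL, 0x5CL};
    long[] b = new long[2];
    long[] c = new long[2];
    splitLongs(src, 2, b, 4, 0xFL, c, 0xFL);
    // 0xAB splits into high nibble 0xA and low nibble 0xB.
    if (b[0] != 0xAL || c[0] != 0xBL) throw new AssertionError();
    if (b[1] != 0x5L || c[1] != 0xCL) throw new AssertionError();
    System.out.println("ok");
  }
}
```

With `bShift = 4` and both masks `0xF`, each long is cut into two 4-bit values; the postings format applies the same idea to its packed integer blocks.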
Re: [PR] These attributes are better for the final state(#13628) [lucene]
gsmiller commented on code in PR #13630: URL: https://github.com/apache/lucene/pull/13630#discussion_r1711652689

```
## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene99/Lucene99SkipWriter.java:
@@ -46,8 +46,8 @@
  * uptos(position, payload). 4. start offset.
  */
 public final class Lucene99SkipWriter extends MultiLevelSkipListWriter {
-  private int[] lastSkipDoc;
-  private long[] lastSkipDocPointer;
+  private final int[] lastSkipDoc;
+  private final long[] lastSkipDocPointer;
   private long[] lastSkipPosPointer;
   private long[] lastSkipPayPointer;
```

Review Comment: @mrhbj were you planning to push an update to this PR that also marks these fields `final`? Thanks!
Re: [PR] These attributes are better for the final state(#13628) [lucene]
mrhbj commented on PR #13630: URL: https://github.com/apache/lucene/pull/13630#issuecomment-2278184332 OK, I will do this.
[PR] These attributes are better for the final state(#13628)(#13630) [lucene]
mrhbj opened a new pull request, #13639: URL: https://github.com/apache/lucene/pull/13639 ### Description
Re: [PR] These attributes are better for the final state(#13628)(#13630) [lucene]
mrhbj commented on PR #13639: URL: https://github.com/apache/lucene/pull/13639#issuecomment-2278225033 @gsmiller thanks for your advice. I think you are right, so I've made the change. Could you please review it?
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1711693361

```
## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/FacetFieldCollectorManager.java:
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.facet;
+
+import java.io.IOException;
+import java.util.Collection;
+import org.apache.lucene.sandbox.facet.cutters.FacetCutter;
+import org.apache.lucene.sandbox.facet.misc.FacetRollup;
+import org.apache.lucene.sandbox.facet.recorders.FacetRecorder;
+import org.apache.lucene.search.CollectorManager;
+
+/**
+ * Collector manager for {@link FacetFieldCollector}. Returns the same extension of {@link
+ * FacetRecorder} that was used to collect results.
+ */
+public final class FacetFieldCollectorManager<V extends FacetRecorder>
+    implements CollectorManager<FacetFieldCollector, V> {
+
+  private final FacetCutter facetCutter;
+  private final V facetRecorder;
+  private final FacetRollup facetRollup;
+
+  /** Create collector for a cutter + recorder pair */
+  public FacetFieldCollectorManager(
```

Review Comment: I've pushed a commit https://github.com/apache/lucene/pull/13568/commits/e9af4f538a7c58f1dc6712d14449147d913fb829 to move `FacetRollup`'s methods to `FacetCutter` and add an expert-mode constructor to `TaxonomyFacetsCutter` to force-disable rollup when users want it. Please let me know what you think!
Re: [PR] Compute facets while collecting [lucene]
epotyom commented on PR #13568: URL: https://github.com/apache/lucene/pull/13568#issuecomment-2278257636 @stefanvodita , @gsmiller , @mikemccand just wanted to let you know that I think I addressed all existing comments, and I marked as resolved the ones that don't seem to need follow-ups. Thank you again for reviewing the PR!
Re: [PR] Get better cost estimate on MultiTermQuery over few terms [lucene]
gsmiller merged PR #13201: URL: https://github.com/apache/lucene/pull/13201
Re: [I] Improve AbstractMultiTermQueryConstantScoreWrapper#RewritingWeight ScorerSupplier cost estimation [lucene]
gsmiller closed issue #13029: Improve AbstractMultiTermQueryConstantScoreWrapper#RewritingWeight ScorerSupplier cost estimation URL: https://github.com/apache/lucene/issues/13029
Re: [PR] Get better cost estimate on MultiTermQuery over few terms [lucene]
gsmiller commented on PR #13201: URL: https://github.com/apache/lucene/pull/13201#issuecomment-2278379745 @msfroh hope you don't mind, but since my PR feedback was pretty minor, I went ahead and made those small changes on your branch and merged. I'll work on getting this backported shortly (a few conflicts to resolve). If you spot anything in the change you disagree with, please let me know and we can work through that. Thanks again!
Re: [I] Deprecate `COSINE` before Lucene 10 release [lucene]
benwtrent closed issue #13281: Deprecate `COSINE` before Lucene 10 release URL: https://github.com/apache/lucene/issues/13281
Re: [PR] Get better cost estimate on MultiTermQuery over few terms [lucene]
msfroh commented on PR #13201: URL: https://github.com/apache/lucene/pull/13201#issuecomment-2278441860 Thanks, Greg! I can create a follow-up PR with your suggestion on delegating to the rewritten `BooleanQuery` for the cost estimate. I think if we go down that path (basically doing the rewrite early), we might be able to clean up the code a bit. I think `WeightOrDocIdSetIterator` might become unnecessary (since we'll know ahead of time which one we'll have).
Re: [I] Lucene99FlatVectorsReader.getFloatVectorValues(): NPE: Cannot read field "vectorEncoding" because "fieldEntry" is null [lucene]
benwtrent commented on issue #13626: URL: https://github.com/apache/lucene/issues/13626#issuecomment-2278487437 OK, I was able to replicate with the following test:

```
public void testTryToThrowNPE() throws Exception {
  try (var dir = newDirectory()) {
    IndexWriterConfig iwc = new IndexWriterConfig();
    iwc.setMergePolicy(new ForceMergePolicy(iwc.getMergePolicy()))
        .setCodec(new Lucene912VectorCodec());
    try (var writer = new IndexWriter(dir, iwc)) {
      for (int i = 0; i < 10; i++) {
        var doc = new Document();
        if (random().nextBoolean()) {
          doc.add(new KnnFloatVectorField("field", new float[] {1, 2, 3}));
        }
        writer.addDocument(doc);
        if (random().nextBoolean()) {
          writer.commit();
        }
      }
      for (int i = 0; i < 10; i++) {
        var doc = new Document();
        if (random().nextBoolean()) {
          // add a vector but a different field
          doc.add(new KnnFloatVectorField("otherVector", new float[] {1, 2, 3}));
        }
        writer.addDocument(doc);
        if (random().nextBoolean()) {
          writer.commit();
        }
      }
      writer.forceMerge(1);
    }
  }
}
```

The error goes away if you use the perField codec like so:

```
public static class Lucene99VectorCodec extends FilterCodec {
  public Lucene99VectorCodec() {
    super("Lucene99VectorCodec", new Lucene99Codec());
  }

  @Override
  public KnnVectorsFormat knnVectorsFormat() {
    return new PerFieldKnnVectorsFormat() {
      @Override
      public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
        return new Lucene99HnswVectorsFormat(16, 250);
      }
    };
  }
}
```

I am not sure it's valid to have more than one kNN field and not use the per-field format. The logic seems to make that assumption. @msokolov ^ What do you think of this bug? I am not sure what our behavior should be. We can easily check for a `null` field and return `null`.
Re: [I] Lucene99FlatVectorsReader.getFloatVectorValues(): NPE: Cannot read field "vectorEncoding" because "fieldEntry" is null [lucene]
benwtrent commented on issue #13626: URL: https://github.com/apache/lucene/issues/13626#issuecomment-2278489924 Looking at what PointValues does:

```
FieldInfo fieldInfo = readState.fieldInfos.fieldInfo(fieldName);
if (fieldInfo == null) {
  throw new IllegalArgumentException("field=\"" + fieldName + "\" is unrecognized");
}
```

Maybe kNN should do the same thing. At least it's a better error than an NPE.
[I] testMergeStability failing for Knn formats [lucene]
benwtrent opened a new issue, #13640: URL: https://github.com/apache/lucene/issues/13640

### Description

All KNN formats are periodically failing `testMergeStability`. I have verified it's due to https://github.com/apache/lucene/pull/13566

The stability failure is due to a different size in the `vex` (e.g. vector graph connections).

### Gradle command to reproduce

```
./gradlew test --tests TestLucene99HnswScalarQuantizedVectorsFormat.testMergeStability -Dtests.seed=84298DFFA7C134B7 -Dtests.locale=kn -Dtests.timezone=Hongko -Dtests.asserts=true -Dtests.file.encoding=UTF-8
```
Re: [I] testMergeStability failing for Knn formats [lucene]
benwtrent commented on issue #13640: URL: https://github.com/apache/lucene/issues/13640#issuecomment-2278701946 @msokolov ^ I haven't been able to look into fixing it yet. Just now noticed it.
[PR] Unify how missing field entries are handle in knn formats [lucene]
benwtrent opened a new pull request, #13641: URL: https://github.com/apache/lucene/pull/13641 It is possible to inappropriately use the knn formats and attempt to merge segments with mismatched field names. None of the formats actually check for `null`, they just all assume that the field entry exists. I opted to throw an IllegalArgumentException here to keep with the current behavior, but prevent an unexpected NPE. Additionally, this unifies the scalar quantized formats, these checked for null values and returned null. But we should behave uniformly across all formats. closes: https://github.com/apache/lucene/issues/13626
Re: [PR] Delegating the matches in PointRangeQuery weight to relate method [lucene]
gsmiller commented on PR #13599: URL: https://github.com/apache/lucene/pull/13599#issuecomment-2278737360 @jainankitk thanks for the iterations! I'm fine with making this change as you currently have it. I'll get it merged. Thanks!
Re: [PR] Add AbstractKnnVectorQuery.seed for seeded HNSW [lucene]
benwtrent commented on code in PR #13635: URL: https://github.com/apache/lucene/pull/13635#discussion_r1712183885

```
## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java:
@@ -70,6 +72,43 @@ public static void search(
     search(scorer, knnCollector, graph, graphSearcher, acceptOrds);
   }

+  /**
+   * Searches the HNSW graph for the nearest neighbors of a query vector, starting from the
+   * provided entry points.
+   *
+   * @param scorer the scorer to compare the query with the nodes
+   * @param knnCollector a collector of top knn results to be returned
+   * @param graph the graph values. May represent the entire graph, or a level in a hierarchical
+   *     graph.
+   * @param acceptOrds {@link Bits} that represents the allowed document ordinals to match, or
+   *     {@code null} if they are all allowed to match.
+   * @param entryPointOrds the entry points for search.
+   */
+  public static void search(
+      RandomVectorScorer scorer,
+      KnnCollector knnCollector,
+      HnswGraph graph,
+      Bits acceptOrds,
+      DocIdSetIterator entryPointOrds)
```

Review Comment: @seanmacavaney looking at the code, I think you can do what you want if you get access to the underlying `SparseOffHeapVectorValues*` implementation. Then you can use the `IndexedDISI` to advance to each matched doc & use `disi.index()` to get the corresponding ordinal. Maybe `FlatVectorsReader` can have a method called `DocIdSetIterator getVectorOrdinals(String field, DocIdSetIterator docIdIterator)`? Or `RandomAccessVectorValues` can have such a method. Both of these interfaces are experimental, so changing them is OK.
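To illustrate the `disi.index()` idea from the comment above with a toy model (this is not Lucene code — the sorted array stands in for the doc list that `IndexedDISI` iterates): for a sparse vector field, a vector's ordinal is simply the position of its document in the sorted set of documents that have vectors:

```java
import java.util.Arrays;

public class OrdinalLookupDemo {
  // docsWithVector is the ascending list of docIDs that have a vector.
  // Returns the vector ordinal for docId, or -1 if the doc has no vector —
  // the same doc-to-ordinal mapping IndexedDISI#index() exposes after
  // advancing to a doc.
  static int ordinalFor(int[] docsWithVector, int docId) {
    int idx = Arrays.binarySearch(docsWithVector, docId);
    return idx >= 0 ? idx : -1;
  }

  public static void main(String[] args) {
    int[] docsWithVector = {3, 7, 42, 100};
    // doc 42 is the third doc with a vector, so its ordinal is 2.
    if (ordinalFor(docsWithVector, 42) != 2) throw new AssertionError();
    // doc 5 has no vector.
    if (ordinalFor(docsWithVector, 5) != -1) throw new AssertionError();
    System.out.println("ok");
  }
}
```

The real implementation advances the DISI through the matched docs sequentially rather than binary-searching per doc, but the doc-to-ordinal relationship is the same.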
Re: [PR] Delegating the matches in PointRangeQuery weight to relate method [lucene]
gsmiller merged PR #13599: URL: https://github.com/apache/lucene/pull/13599
Re: [I] Remove redundant code in PointRangeQuery Weight [lucene]
gsmiller closed issue #13598: Remove redundant code in PointRangeQuery Weight URL: https://github.com/apache/lucene/issues/13598
Re: [I] TermsQuery as MultiTermQuery can dramatically overestimate its cost [lucene]
gsmiller commented on issue #12483: URL: https://github.com/apache/lucene/issues/12483#issuecomment-2278767613 Hey @romseygeek - there's been a recent improvement to `AbstractMultiTermQueryConstantScoreWrapper` that may help the use-case you've described here (#13201). I'm curious if this solves the performance regression you're seeing, or if you still have issues. This change will only help when there are 16 or fewer terms in the terms query, so if you have more than that, I suspect you won't see any improvements. The change was just merged onto `main` and `branch_9x` today, so you'd have to cherry-pick it down (or wait for 9.12).
Re: [PR] Compute facets while collecting [lucene]
gsmiller commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1712226750

```
## lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java:
@@ -45,58 +45,26 @@ class DrillSidewaysQuery extends Query {

   final Query baseQuery;

-  final FacetsCollectorManager drillDownCollectorManager;
-  final FacetsCollectorManager[] drillSidewaysCollectorManagers;
-  final List<FacetsCollector> managedDrillDownCollectors;
-  final List<FacetsCollector> managedDrillSidewaysCollectors;
+  final CollectorOwner drillDownCollectorOwner;
+  final List<CollectorOwner> drillSidewaysCollectorOwners;

   final Query[] drillDownQueries;

   final boolean scoreSubDocsAtOnce;

   /**
    * Construct a new {@code DrillSidewaysQuery} that will create new {@link FacetsCollector}s for
-   * each {@link LeafReaderContext} using the provided {@link FacetsCollectorManager}s. The caller
-   * can access the created {@link FacetsCollector}s through {@link #managedDrillDownCollectors} and
-   * {@link #managedDrillSidewaysCollectors}.
+   * each {@link LeafReaderContext} using the provided {@link FacetsCollectorManager}s.
    */
   DrillSidewaysQuery(
       Query baseQuery,
-      FacetsCollectorManager drillDownCollectorManager,
-      FacetsCollectorManager[] drillSidewaysCollectorManagers,
-      Query[] drillDownQueries,
-      boolean scoreSubDocsAtOnce) {
-    // Note that the "managed" facet collector lists are synchronized here since bulkScorer()
-    // can be invoked concurrently and needs to remain thread-safe. We're OK with synchronizing
-    // on the whole list as contention is expected to remain very low:
-    this(
-        baseQuery,
-        drillDownCollectorManager,
-        drillSidewaysCollectorManagers,
-        Collections.synchronizedList(new ArrayList<>()),
-        Collections.synchronizedList(new ArrayList<>()),
-        drillDownQueries,
-        scoreSubDocsAtOnce);
-  }
-
-  /**
-   * Needed for {@link Query#rewrite(IndexSearcher)}. Ensures the same "managed" lists get used
-   * since {@link DrillSideways} accesses references to these through the original {@code
-   * DrillSidewaysQuery}.
-   */
-  private DrillSidewaysQuery(
-      Query baseQuery,
-      FacetsCollectorManager drillDownCollectorManager,
-      FacetsCollectorManager[] drillSidewaysCollectorManagers,
-      List<FacetsCollector> managedDrillDownCollectors,
-      List<FacetsCollector> managedDrillSidewaysCollectors,
+      CollectorOwner drillDownCollectorOwner,
```

Review Comment: +1 I don't think we should be concerned with breaking the API for `DrillSidewaysQuery` since it's pkg-private. Does that sound right @mikemccand ?
Re: [PR] Compute facets while collecting [lucene]
gsmiller commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1712245084 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/cutters/ranges/LongRangeFacetCutter.java: ## @@ -0,0 +1,413 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.cutters.ranges; + +import java.io.IOException; +import java.util.Arrays; +import java.util.Comparator; +import java.util.List; +import java.util.Objects; +import org.apache.lucene.facet.MultiLongValues; +import org.apache.lucene.facet.MultiLongValuesSource; +import org.apache.lucene.facet.range.LongRange; +import org.apache.lucene.sandbox.facet.cutters.FacetCutter; +import org.apache.lucene.sandbox.facet.cutters.LeafFacetCutter; +import org.apache.lucene.search.LongValues; +import org.apache.lucene.search.LongValuesSource; + +/** {@link RangeFacetCutter} for ranges of long values. It's based on LongRangeCounter class. */ +public abstract class LongRangeFacetCutter extends RangeFacetCutter { + + MultiLongValuesSource valuesSource; + LongValuesSource singleValues; // TODO: refactor - weird that we have both multi and single here. 
+ LongRangeAndPos[] sortedRanges; + + int requestedRangeCount; + + List elementaryIntervals; + + /** elementary interval boundaries used for efficient counting (bsearch to find interval) */ + long[] boundaries; + + int[] pos; + + // Default interval position, when elementary interval is mapped to this interval + // it is skipped. + static final int SKIP_INTERVAL_POSITION = -1; + + /** Create {@link FacetCutter} for provided value source and long ranges. */ + public static LongRangeFacetCutter create( + String field, + MultiLongValuesSource longValuesSource, + LongValuesSource singleLongValuesSource, + LongRange[] longRanges) { +if (areOverlappingRanges(longRanges)) { + return new OverlappingLongRangeFacetCutter( + field, longValuesSource, singleLongValuesSource, longRanges); +} +return new NonOverlappingLongRangeFacetCutter( +field, longValuesSource, singleLongValuesSource, longRanges); + } + + public static LongRangeFacetCutter create( + String field, MultiLongValuesSource longValuesSource, LongRange[] longRanges) { +return create(field, longValuesSource, null, longRanges); + } + + // caller handles conversion of Doubles and DoubleRange to Long and LongRange + // ranges need not be sorted + LongRangeFacetCutter( + String field, + MultiLongValuesSource longValuesSource, + LongValuesSource singleLongValuesSource, + LongRange[] longRanges) { +super(field); +valuesSource = longValuesSource; +if (singleLongValuesSource != null) { + singleValues = singleLongValuesSource; +} else { + singleValues = MultiLongValuesSource.unwrapSingleton(valuesSource); +} + +sortedRanges = new LongRangeAndPos[longRanges.length]; +requestedRangeCount = longRanges.length; + +for (int i = 0; i < longRanges.length; i++) { + sortedRanges[i] = new LongRangeAndPos(longRanges[i], i); +} + +Arrays.sort(this.sortedRanges, Comparator.comparingLong(r -> r.range.min)); +elementaryIntervals = buildElementaryIntervals(); + +// Keep track of elementary interval boundary ends (for binary search) along with 
the requested +// range they map back to (and -1 when they map to a "gap" range in case of ExclusiveRanges): +boundaries = new long[elementaryIntervals.size()]; +pos = new int[elementaryIntervals.size()]; +Arrays.fill(pos, SKIP_INTERVAL_POSITION); +int currRange = 0; +for (int i = 0; i < boundaries.length; i++) { + boundaries[i] = elementaryIntervals.get(i).end; + if (currRange < sortedRanges.length) { +LongRangeAndPos curr = sortedRanges[currRange]; +if (boundaries[i] == curr.range.max) { + pos[i] = curr.pos; + currRange++; +} + } +} + } + + abstract List buildElementaryIntervals(); + + private static boolean areOverlappingRanges(LongRange[] ranges) { +if (ranges.length == 0) { + return false; +} + +// Copy before sorting so we don't mess with the caller's
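The `boundaries`/`pos` bookkeeping quoted above can be illustrated with a small stand-alone sketch (hypothetical class and constant values, not the Lucene API): for non-overlapping ranges, each elementary interval's upper end goes into a sorted `boundaries` array, and a parallel `pos` array maps that interval back to the requested range it belongs to (or `SKIP_INTERVAL_POSITION` = -1 for "gap" intervals), so classifying a value is a single binary search:

```java
import java.util.Arrays;

public class ElementaryIntervalDemo {
    // Sorted upper bounds of the elementary intervals. For requested ranges
    // [0..10] and [20..30] the intervals are:
    //   (-inf..-1], [0..10], [11..19], [20..30], [31..+inf)
    static final long[] BOUNDARIES = {-1L, 10L, 19L, 30L, Long.MAX_VALUE};
    // Requested-range index per interval; -1 marks a "gap" interval.
    static final int[] POS = {-1, 0, -1, 1, -1};

    /** Returns the requested range index a value falls into, or -1 if none. */
    static int rangeFor(long value) {
        // Find the first boundary >= value: the lowest elementary interval
        // whose end does not lie below the value.
        int idx = Arrays.binarySearch(BOUNDARIES, value);
        if (idx < 0) {
            idx = -idx - 1; // binarySearch returns -(insertion point) - 1
        }
        return POS[idx];
    }

    public static void main(String[] args) {
        if (rangeFor(5L) != 0 || rangeFor(15L) != -1
                || rangeFor(20L) != 1 || rangeFor(100L) != -1) {
            throw new AssertionError("unexpected range mapping");
        }
        System.out.println("ok");
    }
}
```

With overlapping ranges (the `OverlappingLongRangeFacetCutter` case) a single `pos` entry per interval is no longer enough, which is why the PR splits the two cases into separate cutters.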
Re: [PR] Compute facets while collecting [lucene]
gsmiller commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1712245609 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/cutters/ranges/DoubleRangeFacetCutter.java: ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.cutters.ranges; + +import java.io.IOException; +import org.apache.lucene.facet.MultiDoubleValuesSource; +import org.apache.lucene.facet.MultiLongValuesSource; +import org.apache.lucene.facet.range.DoubleRange; +import org.apache.lucene.facet.range.LongRange; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.sandbox.facet.cutters.LeafFacetCutter; +import org.apache.lucene.search.DoubleValuesSource; +import org.apache.lucene.search.LongValuesSource; +import org.apache.lucene.util.NumericUtils; + +/** + * {@link RangeFacetCutter} for ranges of double values. + * + * Based on {@link org.apache.lucene.facet.range.DoubleRangeFacetCounts}, this class translates + * double ranges to long ranges using {@link NumericUtils#doubleToSortableLong} and delegates + * faceting work to a {@link LongRangeFacetCutter}. 
+ */ +public class DoubleRangeFacetCutter extends RangeFacetCutter { + + LongRangeFacetCutter longRangeFacetCutter; + + MultiDoubleValuesSource multiDoubleValuesSource; + DoubleValuesSource singleDoubleValuesSource; + DoubleRange[] doubleRanges; + + MultiLongValuesSource multiLongValuesSource; + LongValuesSource singleLongValuesSource; + + LongRange[] longRanges; + + /** Constructor. */ + public DoubleRangeFacetCutter( + String field, MultiDoubleValuesSource valuesSource, DoubleRange[] doubleRanges) { +super(field); +this.multiDoubleValuesSource = valuesSource; +this.singleDoubleValuesSource = MultiDoubleValuesSource.unwrapSingleton(valuesSource); +this.doubleRanges = doubleRanges; +if (singleDoubleValuesSource != null) { // TODO: ugly! + this.singleLongValuesSource = singleDoubleValuesSource.toSortableLongDoubleValuesSource(); +} else { + this.multiLongValuesSource = multiDoubleValuesSource.toSortableMultiLongValuesSource(); +} +this.longRanges = mapDoubleRangesToSortableLong(doubleRanges); +this.longRangeFacetCutter = +LongRangeFacetCutter.create( Review Comment: I'm good with that. Thanks!
Re: [PR] Compute facets while collecting [lucene]
gsmiller commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1712247433 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/FacetFieldCollectorManager.java: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet; + +import java.io.IOException; +import java.util.Collection; +import org.apache.lucene.sandbox.facet.cutters.FacetCutter; +import org.apache.lucene.sandbox.facet.misc.FacetRollup; +import org.apache.lucene.sandbox.facet.recorders.FacetRecorder; +import org.apache.lucene.search.CollectorManager; + +/** + * Collector manager for {@link FacetFieldCollector}. Returns the same extension of {@link + * FacetRecorder} that was used to collect results. + */ +public final class FacetFieldCollectorManager<V extends FacetRecorder> +implements CollectorManager<FacetFieldCollector, V> { + + private final FacetCutter facetCutter; + private final V facetRecorder; + private final FacetRollup facetRollup; + + /** Create collector for a cutter + recorder pair */ + public FacetFieldCollectorManager( Review Comment: This looks good. Thanks!
Re: [PR] Compute facets while collecting [lucene]
gsmiller commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1712250405 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/ordinals/CandidateSetOrdinalIterator.java: ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.ordinals; + +import java.io.IOException; +import org.apache.lucene.internal.hppc.IntHashSet; + +/** + * {@link OrdinalIterator} that filters out ordinals from delegate if they are not in the candidate Review Comment: Looks good now. Thanks! And thanks for leaving the TODO to explore leading the iteration with the recorder ordinals in certain cases (but agreed it's probably not the common case, so I think you've optimized in the right direction for now).
Re: [PR] Compute facets while collecting [lucene]
gsmiller commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1712251401 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/recorders/FacetRecorder.java: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.recorders; + +import java.io.IOException; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.sandbox.facet.cutters.LeafFacetCutter; +import org.apache.lucene.sandbox.facet.misc.FacetRollup; +import org.apache.lucene.sandbox.facet.ordinals.OrdinalIterator; + +/** + * Record data for each facet of each doc. + * + * TODO: In the next iteration we can add an extra layer between FacetRecorder and + * LeafFacetRecorder, e.g. SliceFacetRecorder. The new layer will be created per {@link + * org.apache.lucene.search.Collector}, which means that collecting of multiple leafs (segments) + * within a slice is sequential and can be done to a single non-sync map to improve performance and + * reduce memory consumption. We already tried that, but didn't see any performance improvement. 
+ * Given that it also makes lazy leaf recorder init in {@link + * org.apache.lucene.sandbox.facet.FacetFieldCollector} trickier, it was decided to roll back the + * initial attempt and try again later, in the next iteration. + */ +public interface FacetRecorder { + /** Get leaf recorder. */ + LeafFacetRecorder getLeafRecorder(LeafReaderContext context) throws IOException; + + /** Return next collected ordinal, or {@link LeafFacetCutter#NO_MORE_ORDS} */ + OrdinalIterator recordedOrds(); + + /** True if there are no records */ + boolean isEmpty(); Review Comment: I like it. Thanks!
Re: [PR] Compute facets while collecting [lucene]
gsmiller commented on PR #13568: URL: https://github.com/apache/lucene/pull/13568#issuecomment-2278854633 @epotyom I looked over the recent changes and went through the unresolved comments. I don't think there's anything blocking at this point from my point of view. @mikemccand do you have any blocking concerns on this? If not, I will start working on the merge Monday morning. Thanks again!
Re: [PR] expand TestSegmentToThreadMapping coverage w.r.t. (excess) documents per slice [lucene]
github-actions[bot] commented on PR #13508: URL: https://github.com/apache/lucene/pull/13508#issuecomment-2278903000 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!
Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]
jpountz commented on code in PR #13636: URL: https://github.com/apache/lucene/pull/13636#discussion_r1712554332 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentPostingDecodingUtil.java: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.internal.vectorization; + +import java.io.IOException; +import java.lang.foreign.MemorySegment; +import java.nio.ByteOrder; +import jdk.incubator.vector.LongVector; +import jdk.incubator.vector.VectorOperators; +import jdk.incubator.vector.VectorSpecies; +import org.apache.lucene.store.IndexInput; + +final class MemorySegmentPostingDecodingUtil extends PostingDecodingUtil { + + private static final VectorSpecies<Long> LONG_SPECIES = + PanamaVectorConstants.PRERERRED_LONG_SPECIES; + + private final IndexInput in; + private final MemorySegment memorySegment; + + MemorySegmentPostingDecodingUtil(IndexInput in, MemorySegment memorySegment) { +this.in = in; +this.memorySegment = memorySegment; + } + + @Override + public void readLongs(int count, long[] b) throws IOException { +if (count < LONG_SPECIES.length()) { + in.readLongs(b, 0, count); +} else { + long offset = in.getFilePointer(); + long endOffset = offset + count * Long.BYTES; + int loopBound = LONG_SPECIES.loopBound(count - 1); + for (int i = 0; + i < loopBound; + i += LONG_SPECIES.length(), offset += LONG_SPECIES.length() * Long.BYTES) { +LongVector.fromMemorySegment(LONG_SPECIES, memorySegment, offset, ByteOrder.LITTLE_ENDIAN) Review Comment: This method is only used when decoding 8 or 16 bits per value. I am not sure why, but the micro benchmark sees a small but consistent speedup by copying memory this way rather than using readLongs. If you don't like it, I can remove it.
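One detail in the quoted `readLongs` is easy to miss: the loop bound is computed from `count - 1` rather than `count`, which rounds down to a multiple of the species length in a way that guarantees the scalar tail after the vector loop always handles at least one element. The structure can be mimicked in plain Java (a sketch with hypothetical names: `SPECIES_LENGTH` stands in for `LONG_SPECIES.length()`, and `System.arraycopy` stands in for the `LongVector.fromMemorySegment(...).intoArray(...)` step):

```java
public class LoopBoundDemo {
    // Stand-in for LONG_SPECIES.length(); species lengths are powers of two.
    static final int SPECIES_LENGTH = 4;

    // Mirrors VectorSpecies#loopBound: the largest multiple of the
    // species length that is <= n.
    static int loopBound(int n) {
        return n & -SPECIES_LENGTH;
    }

    /** Copies count longs from src into a fresh array, vector-loop style. */
    static long[] readLongs(long[] src, int count) {
        long[] b = new long[count];
        // Rounding count - 1 (not count) down guarantees the tail below
        // always copies at least one element, mirroring the PR's readLongs.
        int bound = loopBound(count - 1);
        int i = 0;
        for (; i < bound; i += SPECIES_LENGTH) {
            // Stand-in for LongVector.fromMemorySegment(...).intoArray(b, i):
            System.arraycopy(src, i, b, i, SPECIES_LENGTH);
        }
        // Scalar tail: between 1 and SPECIES_LENGTH elements remain.
        System.arraycopy(src, i, b, i, count - i);
        return b;
    }

    public static void main(String[] args) {
        long[] src = {1, 2, 3, 4, 5, 6, 7};
        long[] out = readLongs(src, 7);
        for (int i = 0; i < 7; i++) {
            if (out[i] != src[i]) throw new AssertionError("mismatch at " + i);
        }
        System.out.println("ok");
    }
}
```

For example, with `count = 7` and a species length of 4, `loopBound(6)` is 4, so the vector loop handles elements 0..3 and the scalar tail handles 4..6; with `count = 8`, `loopBound(7)` is still 4, leaving a full-width tail instead of an empty one.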