Re: [PR] Add target search concurrency to TieredMergePolicy [lucene]

2024-07-17 Thread via GitHub


jpountz commented on code in PR #13430:
URL: https://github.com/apache/lucene/pull/13430#discussion_r1680544831


##
lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java:
##
@@ -1095,6 +1095,12 @@ public static TieredMergePolicy 
newTieredMergePolicy(Random r) {
 } else {
   tmp.setSegmentsPerTier(TestUtil.nextInt(r, 10, 50));
 }
+if (rarely(r)) {
+  tmp.setTargetSearchConcurrency(TestUtil.nextInt(r, 2, 20));
+} else {
+  tmp.setTargetSearchConcurrency(TestUtil.nextInt(r, 10, 50));

Review Comment:
   I would have expected the opposite: that we commonly set a reasonable target 
search concurrency, and rarely set a high value?
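
   A minimal sketch of the suggested inversion, assuming the same `rarely`/`TestUtil.nextInt` helpers; the exact bounds are illustrative, not taken from the PR:

   ```java
   if (rarely(r)) {
     // rare: stress an unusually high target search concurrency
     tmp.setTargetSearchConcurrency(TestUtil.nextInt(r, 10, 50));
   } else {
     // common: a reasonable target search concurrency
     tmp.setTargetSearchConcurrency(TestUtil.nextInt(r, 2, 8));
   }
   ```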



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Ensure to use IOContext.READONCE when reading segment files [lucene]

2024-07-17 Thread via GitHub


ChrisHegarty merged PR #13574:
URL: https://github.com/apache/lucene/pull/13574





Re: [PR] Compute facets while collecting [lucene]

2024-07-17 Thread via GitHub


epotyom commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1680705815


##
lucene/sandbox/src/test/org/apache/lucene/sandbox/facet/TestRangeFacet.java:
##
@@ -0,0 +1,1654 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.facet;
+
+import static org.apache.lucene.facet.FacetsConfig.DEFAULT_INDEX_FIELD_NAME;
+
+import com.carrotsearch.randomizedtesting.generators.RandomNumbers;
+import java.io.IOException;
+import java.util.List;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.DoubleDocValuesField;
+import org.apache.lucene.document.DoublePoint;
+import org.apache.lucene.document.LongPoint;
+import org.apache.lucene.document.NumericDocValuesField;
+import org.apache.lucene.document.SortedNumericDocValuesField;
+import org.apache.lucene.facet.DrillDownQuery;
+import org.apache.lucene.facet.DrillSideways;
+import org.apache.lucene.facet.FacetField;
+import org.apache.lucene.facet.FacetResult;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.facet.LabelAndValue;
+import org.apache.lucene.facet.MultiDoubleValuesSource;
+import org.apache.lucene.facet.MultiLongValuesSource;
+import org.apache.lucene.facet.range.DoubleRange;
+import org.apache.lucene.facet.range.LongRange;
+import org.apache.lucene.facet.range.Range;
+import org.apache.lucene.facet.taxonomy.TaxonomyReader;
+import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader;
+import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
+import org.apache.lucene.index.IndexReader;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.sandbox.facet.abstracts.OrdLabelBiMap;
+import org.apache.lucene.sandbox.facet.ranges.DoubleRangeFacetCutter;
+import org.apache.lucene.sandbox.facet.ranges.LongRangeFacetCutter;
+import org.apache.lucene.sandbox.facet.ranges.RangeOrdLabelBiMap;
+import org.apache.lucene.sandbox.facet.recorders.CountFacetRecorder;
+import org.apache.lucene.sandbox.facet.taxonomy.TaxonomyFacetsCutter;
+import org.apache.lucene.search.CollectorOwner;
+import org.apache.lucene.search.DoubleValues;
+import org.apache.lucene.search.DoubleValuesSource;
+import org.apache.lucene.search.Explanation;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.LongValuesSource;
+import org.apache.lucene.search.MatchAllDocsQuery;
+import org.apache.lucene.search.MultiCollectorManager;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.tests.index.RandomIndexWriter;
+import org.apache.lucene.tests.search.DummyTotalHitCountCollector;
+import org.apache.lucene.tests.util.TestUtil;
+import org.apache.lucene.util.IOUtils;
+import org.apache.lucene.util.NumericUtils;
+
+/**
+ * Test sandbox facet ranges. Mostly test cases from LongRangeFacetCounts 
adopted for sandbox
+ * faceting.
+ */
+public class TestRangeFacet extends SandboxFacetTestCase {
+
+  public void testBasicLong() throws Exception {
+Directory d = newDirectory();
+RandomIndexWriter w = new RandomIndexWriter(random(), d);
+Document doc = new Document();
+NumericDocValuesField field = new NumericDocValuesField("field", 0L);
+doc.add(field);
+for (long l = 0; l < 100; l++) {
+  field.setLongValue(l);
+  w.addDocument(doc);
+}
+
+// Also add Long.MAX_VALUE
+field.setLongValue(Long.MAX_VALUE);
+w.addDocument(doc);
+
+IndexReader r = w.getReader();
+w.close();
+
+IndexSearcher s = newSearcher(r);
+LongRange[] inputRanges =
+new LongRange[] {
+  new LongRange("less than 10", 0L, true, 10L, false),
+  new LongRange("less than or equal to 10", 0L, true, 10L, true),
+  new LongRange("over 90", 90L, false, 100L, false),
+  new LongRange("90 or above", 90L, true, 100L, false),
+  new LongRange("over 1000", 1000L, false, Long.MAX_VALUE, true),
+};
+
+MultiLongValuesSource valuesSource = 
MultiLongValuesSource.fromLongField("field");
+LongRangeFacetCutter longRangeFacetCutter =
+LongRangeFacetCutter.create("field", valuesSource, inputR

Re: [PR] [9.x] Ensure to use IOContext.READONCE when reading segment files [lucene]

2024-07-17 Thread via GitHub


uschindler commented on code in PR #13578:
URL: https://github.com/apache/lucene/pull/13578#discussion_r1680744068


##
lucene/test-framework/src/java/org/apache/lucene/tests/store/MockDirectoryWrapper.java:
##
@@ -814,6 +814,14 @@ public synchronized IndexInput openInput(String name, 
IOContext context) throws
 
 context = LuceneTestCase.newIOContext(randomState, context);
 final boolean confined = context == IOContext.READONCE;

Review Comment:
   I think in 9.x this must be `context.readOnce`.






Re: [PR] [9.x] Ensure to use IOContext.READONCE when reading segment files [lucene]

2024-07-17 Thread via GitHub


uschindler commented on code in PR #13578:
URL: https://github.com/apache/lucene/pull/13578#discussion_r1680744068


##
lucene/test-framework/src/java/org/apache/lucene/tests/store/MockDirectoryWrapper.java:
##
@@ -814,6 +814,14 @@ public synchronized IndexInput openInput(String name, 
IOContext context) throws
 
 context = LuceneTestCase.newIOContext(randomState, context);
 final boolean confined = context == IOContext.READONCE;

Review Comment:
   I think in 9.x this must be `final boolean confined = context.readOnce`.
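
   For reference, a hedged sketch of the check with that change applied; the surrounding lines are assumed context, not quoted from the PR:

   ```java
   context = LuceneTestCase.newIOContext(randomState, context);
   // 9.x: IOContext.READONCE is not a singleton, so test the flag rather than
   // comparing instance identity
   final boolean confined = context.readOnce;
   ```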






Re: [PR] Add levels to DocValues skipper index [lucene]

2024-07-17 Thread via GitHub


iverase commented on code in PR #13563:
URL: https://github.com/apache/lucene/pull/13563#discussion_r1680751732


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer.java:
##
@@ -1792,61 +1794,88 @@ public DocValuesSkipper getSkipper(FieldInfo field) 
throws IOException {
 if (input.length() > 0) {
   input.prefetch(0, 1);
 }
+// TODO: should we write to disk the actual max level for this segment?
 return new DocValuesSkipper() {
-  int minDocID = -1;
-  int maxDocID = -1;
-  long minValue, maxValue;
-  int docCount;
+  final int[] minDocID = new int[SKIP_INDEX_MAX_LEVEL];
+  final int[] maxDocID = new int[SKIP_INDEX_MAX_LEVEL];
+
+  {
+for (int i = 0; i < SKIP_INDEX_MAX_LEVEL; i++) {
+  minDocID[i] = maxDocID[i] = -1;
+}
+  }
+
+  final long[] minValue = new long[SKIP_INDEX_MAX_LEVEL];
+  final long[] maxValue = new long[SKIP_INDEX_MAX_LEVEL];
+  final int[] docCount = new int[SKIP_INDEX_MAX_LEVEL];
+  int levels;
 
   @Override
   public void advance(int target) throws IOException {
 if (target > entry.maxDocId) {
-  minDocID = DocIdSetIterator.NO_MORE_DOCS;
-  maxDocID = DocIdSetIterator.NO_MORE_DOCS;
+  // skipper is exhausted
+  for (int i = 0; i < SKIP_INDEX_MAX_LEVEL; i++) {
+minDocID[i] = maxDocID[i] = DocIdSetIterator.NO_MORE_DOCS;
+  }
 } else {
+  // find next interval
+  assert target > maxDocID[0] : "target must be bigger that current 
interval";
   while (true) {
-maxDocID = input.readInt();
-if (maxDocID >= target) {
-  minDocID = input.readInt();
-  maxValue = input.readLong();
-  minValue = input.readLong();
-  docCount = input.readInt();
+levels = input.readByte();

Review Comment:
   I modified the implementation so we will have access to higher levels at any 
point. Let me know what you think.
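
   For illustration, a hedged sketch of how a consumer could take advantage of the always-available higher levels, using the `DocValuesSkipper` level accessors from this diff; `queryMin`/`queryMax` are hypothetical query bounds:

   ```java
   skipper.advance(target);
   // start from the highest level: ruling out a wider interval lets us jump
   // further ahead in one step
   for (int level = skipper.numLevels() - 1; level >= 0; level--) {
     if (skipper.maxValue(level) < queryMin || skipper.minValue(level) > queryMax) {
       target = skipper.maxDocID(level) + 1; // nothing in this interval matches
       break;
     }
   }
   ```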






Re: [PR] Compute facets while collecting [lucene]

2024-07-17 Thread via GitHub


stefanvodita commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1680778354


##
lucene/demo/src/java/org/apache/lucene/demo/facet/SandboxFacetsExample.java:
##
@@ -0,0 +1,714 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.demo.facet;
+
+import static org.apache.lucene.facet.FacetsConfig.DEFAULT_INDEX_FIELD_NAME;
+import static 
org.apache.lucene.sandbox.facet.ComparableUtils.rankCountOrdToComparable;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.DoubleDocValuesField;
+import org.apache.lucene.document.NumericDocValuesField;
+import org.apache.lucene.facet.DrillDownQuery;
+import org.apache.lucene.facet.DrillSideways;
+import org.apache.lucene.facet.FacetField;
+import org.apache.lucene.facet.FacetResult;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.facet.LabelAndValue;
+import org.apache.lucene.facet.MultiLongValuesSource;
+import org.apache.lucene.facet.range.LongRange;
+import org.apache.lucene.facet.taxonomy.FacetLabel;
+import org.apache.lucene.facet.taxonomy.TaxonomyReader;
+import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader;
+import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexWriter;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.index.IndexWriterConfig.OpenMode;
+import org.apache.lucene.sandbox.facet.ComparableUtils;
+import org.apache.lucene.sandbox.facet.FacetFieldCollector;
+import org.apache.lucene.sandbox.facet.FacetFieldCollectorManager;
+import org.apache.lucene.sandbox.facet.abstracts.OrdLabelBiMap;
+import org.apache.lucene.sandbox.facet.abstracts.OrdToComparable;
+import org.apache.lucene.sandbox.facet.abstracts.OrdinalIterator;
+import org.apache.lucene.sandbox.facet.abstracts.Reducer;
+import org.apache.lucene.sandbox.facet.ordinal_iterators.TopnOrdinalIterator;
+import org.apache.lucene.sandbox.facet.ranges.LongRangeFacetCutter;
+import org.apache.lucene.sandbox.facet.ranges.RangeOrdLabelBiMap;
+import org.apache.lucene.sandbox.facet.recorders.CountFacetRecorder;
+import org.apache.lucene.sandbox.facet.recorders.LongAggregationsFacetRecorder;
+import org.apache.lucene.sandbox.facet.recorders.MultiFacetsRecorder;
+import 
org.apache.lucene.sandbox.facet.taxonomy.TaxonomyChildrenOrdinalIterator;
+import org.apache.lucene.sandbox.facet.taxonomy.TaxonomyFacetsCutter;
+import org.apache.lucene.sandbox.facet.taxonomy.TaxonomyOrdLabelBiMap;
+import org.apache.lucene.search.CollectorOwner;
+import org.apache.lucene.search.DoubleValuesSource;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.LongValuesSource;
+import org.apache.lucene.search.MatchAllDocsQuery;
+import org.apache.lucene.search.MultiCollectorManager;
+import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.search.TopScoreDocCollectorManager;
+import org.apache.lucene.store.ByteBuffersDirectory;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.IOUtils;
+
+/** Demo for sandbox faceting. */
+public class SandboxFacetsExample {

Review Comment:
   There are other places as well where we use the `SandboxFacet` name. For 
something descriptive, how about `MatchTimeFacet`? For something cute, how 
about thinking around precious stones? I like 
[`Lustre`](https://en.wikipedia.org/wiki/Lustre_(mineralogy)). If you want 
something in between cute and technical, 
[`Facetron`](https://facetron.com/thefacetron/) is a faceting machine.




Re: [PR] Add target search concurrency to TieredMergePolicy [lucene]

2024-07-17 Thread via GitHub


carlosdelest commented on code in PR #13430:
URL: https://github.com/apache/lucene/pull/13430#discussion_r1680829569


##
lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java:
##
@@ -1095,6 +1095,12 @@ public static TieredMergePolicy 
newTieredMergePolicy(Random r) {
 } else {
   tmp.setSegmentsPerTier(TestUtil.nextInt(r, 10, 50));
 }
+if (rarely(r)) {
+  tmp.setTargetSearchConcurrency(TestUtil.nextInt(r, 2, 20));
+} else {
+  tmp.setTargetSearchConcurrency(TestUtil.nextInt(r, 10, 50));

Review Comment:
   I see - I was mimicking the segments-per-tier approach here, but I guess that makes sense. Changing it.






Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-07-17 Thread via GitHub


rmuir commented on code in PR #13572:
URL: https://github.com/apache/lucene/pull/13572#discussion_r1680841649


##
lucene/core/build.gradle:
##
@@ -14,10 +14,43 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
+plugins {
+  id "c"
+}
 
 apply plugin: 'java-library'
+apply plugin: 'c'
 
 description = 'Lucene core library'
+model {
+  toolChains {
+gcc(Gcc) {
+  target("linux_aarch64"){
+cppCompiler.withArguments { args ->
+  args << "-O3 --shared"

Review Comment:
   @goankur at a glance it seems you are ignoring the return value. I think it should be `accum = vdotq_s32(accum, v1, v2)`.






Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

2024-07-17 Thread via GitHub


benwtrent commented on issue #13519:
URL: https://github.com/apache/lucene/issues/13519#issuecomment-2233036266

   > In any case, you have worked with the code far longer than I have, so if 
you are confident about it please go ahead and commit :)
   
   @naveentatikonda could you test my branch as well? I want to make sure my 
numbers were not flukes on my end. 
   
   We should also test with different data sets & different similarity metrics. 
We should verify that euclidean is also fixed, etc.





Re: [PR] [9.x] Ensure to use IOContext.READONCE when reading segment files [lucene]

2024-07-17 Thread via GitHub


ChrisHegarty commented on code in PR #13578:
URL: https://github.com/apache/lucene/pull/13578#discussion_r1680862628


##
lucene/test-framework/src/java/org/apache/lucene/tests/store/MockDirectoryWrapper.java:
##
@@ -814,6 +814,14 @@ public synchronized IndexInput openInput(String name, 
IOContext context) throws
 
 context = LuceneTestCase.newIOContext(randomState, context);
 final boolean confined = context == IOContext.READONCE;

Review Comment:
   The failure is because of a use of `READ` rather than `READONCE` when checksumming a segments file in `SegmentInfos::readCommit`. So, the new check in MockDirectoryWrapper is working as expected - finding a real problem. Orthogonally, we could expand the check as you suggested, but it is not strictly necessary here, unless you think it should be done anyway?
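
   For readers following along, a hedged sketch of the fix direction; the exact call site in `SegmentInfos::readCommit` may differ:

   ```java
   // open the segments file for checksumming with READONCE instead of READ
   try (ChecksumIndexInput input =
       directory.openChecksumInput(segmentFileName, IOContext.READONCE)) {
     // read and verify the commit ...
   }
   ```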






Re: [PR] [9.x] Ensure to use IOContext.READONCE when reading segment files [lucene]

2024-07-17 Thread via GitHub


uschindler commented on code in PR #13578:
URL: https://github.com/apache/lucene/pull/13578#discussion_r1680884653


##
lucene/test-framework/src/java/org/apache/lucene/tests/store/MockDirectoryWrapper.java:
##
@@ -814,6 +814,14 @@ public synchronized IndexInput openInput(String name, 
IOContext context) throws
 
 context = LuceneTestCase.newIOContext(randomState, context);
 final boolean confined = context == IOContext.READONCE;

Review Comment:
   Ah nice. We have removed the IOContext from the checksum IndexInput in the main branch. 👍
   
   I think the suggested change here is not needed. It's legacy code anyway.






Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-17 Thread via GitHub


ChrisHegarty commented on PR #13570:
URL: https://github.com/apache/lucene/pull/13570#issuecomment-2233089190

   > Can someone take a stab at summarizing this for CHANGES.txt, thus avoiding 
details a reader is unlikely to know about? Like me :-). How will a user of 
Lucene benefit?
   
   ```
   * GITHUB#13570, GITHUB#13574, GITHUB#13535: Avoid performance degradation 
with closing shared Arenas. Index
   files are mmapped one-to-one with the JDK's foreign shared Arena. The JVM 
deoptimizes the top few frames of
   all threads when closing a shared Arena (see JDK-8335480). We mitigate this 
situation by 1) using a confined
   Arena where appropriate, and 2) grouping files from the same segment to a 
single shared Arena.
   ```
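
   For anyone unfamiliar with the terms in that entry, a minimal illustration of the two `java.lang.foreign.Arena` flavors involved (JDK 21+; not code from this PR):

   ```java
   // confined: single-threaded access, cheap to close - fits read-once
   // metadata files
   try (Arena confined = Arena.ofConfined()) {
     // map the file, read it, done
   }

   // shared: readable from many search threads, but closing it triggers the
   // cross-thread deoptimization described in JDK-8335480 - hence grouping all
   // files of a segment onto one shared Arena, so fewer closes happen overall
   Arena shared = Arena.ofShared();
   // ... map several files of the same segment against this arena ...
   shared.close();
   ```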





Re: [PR] Add target search concurrency to TieredMergePolicy [lucene]

2024-07-17 Thread via GitHub


jpountz merged PR #13430:
URL: https://github.com/apache/lucene/pull/13430





Re: [I] Add a MergePolicy wrapper that preserves search concurrency? [lucene]

2024-07-17 Thread via GitHub


jpountz closed issue #12877: Add a MergePolicy wrapper that preserves search 
concurrency?
URL: https://github.com/apache/lucene/issues/12877





Re: [PR] Compute facets while collecting [lucene]

2024-07-17 Thread via GitHub


mikemccand commented on PR #13568:
URL: https://github.com/apache/lucene/pull/13568#issuecomment-2233193543

   > Computing facets during collection is more expensive, because we need to 
collect in each searcher slice, and then merge results in 
`CollectorManager#reduce`. At the same time, it reduces latency, because 
initial collection is done in parallel and there is an opportunity to reuse 
values computed for a document and use for the doc for all the different 
collectors (drillsideways).
   
   +1, this is a nice side effect of this approach.  It would resolve 
https://github.com/apache/lucene/issues/13503.
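
   For readers new to this pattern, a minimal self-contained sketch with plain hit counting as a stand-in for facet counting: each searcher slice collects independently (in parallel), and `reduce` pays the merge cost once at the end:

   ```java
   import java.util.Collection;
   import org.apache.lucene.search.CollectorManager;
   import org.apache.lucene.search.ScoreMode;
   import org.apache.lucene.search.SimpleCollector;

   final class CountManager
       implements CollectorManager<CountManager.CountCollector, Integer> {
     static final class CountCollector extends SimpleCollector {
       int count;

       @Override
       public void collect(int doc) {
         count++; // per-slice work, runs in parallel across slices
       }

       @Override
       public ScoreMode scoreMode() {
         return ScoreMode.COMPLETE_NO_SCORES;
       }
     }

     @Override
     public CountCollector newCollector() {
       return new CountCollector(); // one collector per searcher slice
     }

     @Override
     public Integer reduce(Collection<CountCollector> collectors) {
       int total = 0;
       for (CountCollector c : collectors) {
         total += c.count; // the extra merge cost mentioned above
       }
       return total;
     }
   }
   ```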





Re: [PR] [9.x] Ensure to use IOContext.READONCE when reading segment files [lucene]

2024-07-17 Thread via GitHub


uschindler commented on code in PR #13578:
URL: https://github.com/apache/lucene/pull/13578#discussion_r1681009450


##
lucene/test-framework/src/java/org/apache/lucene/tests/store/MockDirectoryWrapper.java:
##
@@ -814,6 +814,14 @@ public synchronized IndexInput openInput(String name, 
IOContext context) throws
 
 context = LuceneTestCase.newIOContext(randomState, context);
 final boolean confined = context == IOContext.READONCE;

Review Comment:
   I think we should maybe still make the check like this. This would cover more problems.






Re: [PR] Add levels to DocValues skipper index [lucene]

2024-07-17 Thread via GitHub


jpountz commented on code in PR #13563:
URL: https://github.com/apache/lucene/pull/13563#discussion_r1680897686


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java:
##
@@ -207,65 +210,133 @@ void accumulate(long value) {
   maxValue = Math.max(maxValue, value);
 }
 
+void accumulate(SkipAccumulator other) {
+  minDocID = Math.min(minDocID, other.minDocID);
+  maxDocID = Math.max(maxDocID, other.maxDocID);
+  minValue = Math.min(minValue, other.minValue);
+  maxValue = Math.max(maxValue, other.maxValue);
+  docCount += other.docCount;
+}
+
 void nextDoc(int docID) {
   maxDocID = docID;
   ++docCount;
 }
 
-void writeTo(DataOutput output) throws IOException {
-  output.writeInt(maxDocID);
-  output.writeInt(minDocID);
-  output.writeLong(maxValue);
-  output.writeLong(minValue);
-  output.writeInt(docCount);
+public static SkipAccumulator merge(List<SkipAccumulator> list, int index, int length) {
+  SkipAccumulator acc = new SkipAccumulator(list.get(index).minDocID);
+  for (int i = 0; i < length; i++) {
+acc.accumulate(list.get(index + i));
+  }
+  return acc;
 }
   }
 
   private void writeSkipIndex(FieldInfo field, DocValuesProducer 
valuesProducer)
   throws IOException {
 assert field.hasDocValuesSkipIndex();
-// TODO: This disk compression once we introduce levels
-long start = data.getFilePointer();
-SortedNumericDocValues values = valuesProducer.getSortedNumeric(field);
+final long start = data.getFilePointer();
+final SortedNumericDocValues values = 
valuesProducer.getSortedNumeric(field);
 long globalMaxValue = Long.MIN_VALUE;
 long globalMinValue = Long.MAX_VALUE;
 int globalDocCount = 0;
 int maxDocId = -1;
+final List<List<SkipAccumulator>> accumulators = new ArrayList<>(SKIP_INDEX_MAX_LEVEL);
+for (int i = 0; i < SKIP_INDEX_MAX_LEVEL; i++) {
+  accumulators.add(new ArrayList<>());
+}
 SkipAccumulator accumulator = null;
-int counter = 0;
+final int maxAccumulators = 1 << (SKIP_INDEX_LEVEL_SHIFT * 
(SKIP_INDEX_MAX_LEVEL - 1));
 for (int doc = values.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc 
= values.nextDoc()) {
-  if (counter == 0) {
+  if (accumulator == null) {
 accumulator = new SkipAccumulator(doc);
+accumulators.get(0).add(accumulator);
   }
   accumulator.nextDoc(doc);
   for (int i = 0, end = values.docValueCount(); i < end; ++i) {
 accumulator.accumulate(values.nextValue());
   }
-  if (++counter == skipIndexIntervalSize) {
+  if (accumulator.docCount == skipIndexIntervalSize) {
 globalMaxValue = Math.max(globalMaxValue, accumulator.maxValue);
 globalMinValue = Math.min(globalMinValue, accumulator.minValue);
 globalDocCount += accumulator.docCount;
 maxDocId = accumulator.maxDocID;
-accumulator.writeTo(data);
-counter = 0;
+accumulator = null;
+if (accumulators.size() == maxAccumulators) {
+  writeLevels(accumulators);
+for (List<SkipAccumulator> accumulatorList : accumulators) {
+accumulatorList.clear();
+  }
+}
   }
 }
 
-if (counter > 0) {
-  globalMaxValue = Math.max(globalMaxValue, accumulator.maxValue);
-  globalMinValue = Math.min(globalMinValue, accumulator.minValue);
-  globalDocCount += accumulator.docCount;
-  maxDocId = accumulator.maxDocID;
-  accumulator.writeTo(data);
+if (accumulators.isEmpty() == false) {
+  if (accumulator != null) {
+globalMaxValue = Math.max(globalMaxValue, accumulator.maxValue);
+globalMinValue = Math.min(globalMinValue, accumulator.minValue);
+globalDocCount += accumulator.docCount;
+maxDocId = accumulator.maxDocID;
+  }
+  writeLevels(accumulators);
 }
 meta.writeLong(start); // record the start in meta
 meta.writeLong(data.getFilePointer() - start); // record the length
+assert globalDocCount == 0 || globalMaxValue >= globalMinValue;
 meta.writeLong(globalMaxValue);
 meta.writeLong(globalMinValue);
+assert globalDocCount <= maxDocId + 1;
 meta.writeInt(globalDocCount);
 meta.writeInt(maxDocId);
   }
 
+  private void writeLevels(List<List<SkipAccumulator>> accumulators) throws IOException {
+for (int i = 1; i < accumulators.size(); i++) {
+  buildLevel(accumulators.get(i), accumulators.get(i - 1));
+}
+int totalAccumulators = accumulators.get(0).size();
+for (int index = 0; index < totalAccumulators; index++) {
+  // compute how many levels we need to write for the current accumulator
+  final int levels = getLevels(index, totalAccumulators);
+  // build the levels
+  final SkipAccumulator[] accLevels = new SkipAccumulator[levels];
+  for (int level = 0; level < levels; level++) {
+accLevels[level] =
+accumulators.get(level).get(index 

Re: [PR] [9.x] Ensure to use IOContext.READONCE when reading segment files [lucene]

2024-07-17 Thread via GitHub


ChrisHegarty commented on code in PR #13578:
URL: https://github.com/apache/lucene/pull/13578#discussion_r1681020288


##
lucene/test-framework/src/java/org/apache/lucene/tests/store/MockDirectoryWrapper.java:
##
@@ -814,6 +814,14 @@ public synchronized IndexInput openInput(String name, 
IOContext context) throws
 
 context = LuceneTestCase.newIOContext(randomState, context);
 final boolean confined = context == IOContext.READONCE;

Review Comment:
   yes, I agree. Done.






Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-17 Thread via GitHub


ChrisHegarty commented on code in PR #13570:
URL: https://github.com/apache/lucene/pull/13570#discussion_r1681055574


##
lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInputProvider.java:
##
@@ -125,4 +134,31 @@ private final MemorySegment[] map(
 }
 return segments;
   }
+
+  public ConcurrentHashMap attachment() {
+return new ConcurrentHashMap<>();
+  }
+
+  /**
+   * Gets an arena for the given path, potentially aggregating files from the same segment into a
+   * single ref counted shared arena. A ref counted shared arena, if created, will be added to the
+   * given arenas map.
+   */
+  static Arena getSharedArena(Path p, ConcurrentHashMap arenas) {
+String filename = p.getFileName().toString();
+String segmentName = IndexFileNames.parseSegmentName(filename);
+if (filename.length() == segmentName.length()) {
+  // no segment found; just use a shared segment
+  return Arena.ofShared();

Review Comment:
   Ok, https://github.com/apache/lucene/pull/13574 is merged, and we have enforcement in the tests to disallow opening `segment_*` with a context other than `READONCE`. Unless I'm mistaken, this concern has been addressed.






Re: [PR] [9.x] Ensure to use IOContext.READONCE when reading segment files [lucene]

2024-07-17 Thread via GitHub


ChrisHegarty merged PR #13578:
URL: https://github.com/apache/lucene/pull/13578





Re: [PR] Add levels to DocValues skipper index [lucene]

2024-07-17 Thread via GitHub


iverase commented on code in PR #13563:
URL: https://github.com/apache/lucene/pull/13563#discussion_r1681076840


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java:
##
@@ -207,65 +210,133 @@ void accumulate(long value) {
   maxValue = Math.max(maxValue, value);
 }
 
+void accumulate(SkipAccumulator other) {
+  minDocID = Math.min(minDocID, other.minDocID);
+  maxDocID = Math.max(maxDocID, other.maxDocID);
+  minValue = Math.min(minValue, other.minValue);
+  maxValue = Math.max(maxValue, other.maxValue);
+  docCount += other.docCount;
+}
+
 void nextDoc(int docID) {
   maxDocID = docID;
   ++docCount;
 }
 
-void writeTo(DataOutput output) throws IOException {
-  output.writeInt(maxDocID);
-  output.writeInt(minDocID);
-  output.writeLong(maxValue);
-  output.writeLong(minValue);
-  output.writeInt(docCount);
+public static SkipAccumulator merge(List<SkipAccumulator> list, int index, int length) {
+  SkipAccumulator acc = new SkipAccumulator(list.get(index).minDocID);
+  for (int i = 0; i < length; i++) {
+acc.accumulate(list.get(index + i));
+  }
+  return acc;
 }
   }
 
   private void writeSkipIndex(FieldInfo field, DocValuesProducer 
valuesProducer)
   throws IOException {
 assert field.hasDocValuesSkipIndex();
-// TODO: This disk compression once we introduce levels
-long start = data.getFilePointer();
-SortedNumericDocValues values = valuesProducer.getSortedNumeric(field);
+final long start = data.getFilePointer();
+final SortedNumericDocValues values = 
valuesProducer.getSortedNumeric(field);
 long globalMaxValue = Long.MIN_VALUE;
 long globalMinValue = Long.MAX_VALUE;
 int globalDocCount = 0;
 int maxDocId = -1;
+final List<List<SkipAccumulator>> accumulators = new ArrayList<>(SKIP_INDEX_MAX_LEVEL);
+for (int i = 0; i < SKIP_INDEX_MAX_LEVEL; i++) {
+  accumulators.add(new ArrayList<>());
+}
 SkipAccumulator accumulator = null;
-int counter = 0;
+final int maxAccumulators = 1 << (SKIP_INDEX_LEVEL_SHIFT * 
(SKIP_INDEX_MAX_LEVEL - 1));
 for (int doc = values.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc 
= values.nextDoc()) {
-  if (counter == 0) {
+  if (accumulator == null) {
 accumulator = new SkipAccumulator(doc);
+accumulators.get(0).add(accumulator);
   }
   accumulator.nextDoc(doc);
   for (int i = 0, end = values.docValueCount(); i < end; ++i) {
 accumulator.accumulate(values.nextValue());
   }
-  if (++counter == skipIndexIntervalSize) {
+  if (accumulator.docCount == skipIndexIntervalSize) {
 globalMaxValue = Math.max(globalMaxValue, accumulator.maxValue);
 globalMinValue = Math.min(globalMinValue, accumulator.minValue);
 globalDocCount += accumulator.docCount;
 maxDocId = accumulator.maxDocID;
-accumulator.writeTo(data);
-counter = 0;
+accumulator = null;
+if (accumulators.size() == maxAccumulators) {

Review Comment:
   I removed the List of Lists, so this is not an issue any more.






Re: [PR] Add levels to DocValues skipper index [lucene]

2024-07-17 Thread via GitHub


iverase commented on code in PR #13563:
URL: https://github.com/apache/lucene/pull/13563#discussion_r1681079162


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java:
##
@@ -207,65 +210,133 @@ void accumulate(long value) {
   maxValue = Math.max(maxValue, value);
 }
 
+void accumulate(SkipAccumulator other) {
+  minDocID = Math.min(minDocID, other.minDocID);
+  maxDocID = Math.max(maxDocID, other.maxDocID);

Review Comment:
   Done. The assert needs to be `minDocID <= other.minDocID && maxDocID < other.maxDocID` because we initialize the accumulator with the minDocID.






Re: [PR] Add levels to DocValues skipper index [lucene]

2024-07-17 Thread via GitHub


iverase commented on code in PR #13563:
URL: https://github.com/apache/lucene/pull/13563#discussion_r1681080290


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java:
##
@@ -207,65 +210,133 @@ void accumulate(long value) {
   maxValue = Math.max(maxValue, value);
 }
 
+void accumulate(SkipAccumulator other) {
+  minDocID = Math.min(minDocID, other.minDocID);
+  maxDocID = Math.max(maxDocID, other.maxDocID);
+  minValue = Math.min(minValue, other.minValue);
+  maxValue = Math.max(maxValue, other.maxValue);
+  docCount += other.docCount;
+}
+
 void nextDoc(int docID) {
   maxDocID = docID;
   ++docCount;
 }
 
-void writeTo(DataOutput output) throws IOException {
-  output.writeInt(maxDocID);
-  output.writeInt(minDocID);
-  output.writeLong(maxValue);
-  output.writeLong(minValue);
-  output.writeInt(docCount);
+public static SkipAccumulator merge(List<SkipAccumulator> list, int index, int length) {
+  SkipAccumulator acc = new SkipAccumulator(list.get(index).minDocID);
+  for (int i = 0; i < length; i++) {
+acc.accumulate(list.get(index + i));
+  }
+  return acc;
 }
   }
 
   private void writeSkipIndex(FieldInfo field, DocValuesProducer 
valuesProducer)
   throws IOException {
 assert field.hasDocValuesSkipIndex();
-// TODO: This disk compression once we introduce levels
-long start = data.getFilePointer();
-SortedNumericDocValues values = valuesProducer.getSortedNumeric(field);
+final long start = data.getFilePointer();
+final SortedNumericDocValues values = 
valuesProducer.getSortedNumeric(field);
 long globalMaxValue = Long.MIN_VALUE;
 long globalMinValue = Long.MAX_VALUE;
 int globalDocCount = 0;
 int maxDocId = -1;
+final List<List<SkipAccumulator>> accumulators = new ArrayList<>(SKIP_INDEX_MAX_LEVEL);

Review Comment:
   Yes, this list of lists is awful. I was thinking of reusing it, but it creates more issues than it helps.






Re: [PR] Add levels to DocValues skipper index [lucene]

2024-07-17 Thread via GitHub


iverase commented on code in PR #13563:
URL: https://github.com/apache/lucene/pull/13563#discussion_r1681080747


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java:
##
@@ -207,65 +210,133 @@ void accumulate(long value) {
   maxValue = Math.max(maxValue, value);
 }
 
+void accumulate(SkipAccumulator other) {
+  minDocID = Math.min(minDocID, other.minDocID);
+  maxDocID = Math.max(maxDocID, other.maxDocID);
+  minValue = Math.min(minValue, other.minValue);
+  maxValue = Math.max(maxValue, other.maxValue);
+  docCount += other.docCount;
+}
+
 void nextDoc(int docID) {
   maxDocID = docID;
   ++docCount;
 }
 
-void writeTo(DataOutput output) throws IOException {
-  output.writeInt(maxDocID);
-  output.writeInt(minDocID);
-  output.writeLong(maxValue);
-  output.writeLong(minValue);
-  output.writeInt(docCount);
+public static SkipAccumulator merge(List<SkipAccumulator> list, int index, int length) {
+  SkipAccumulator acc = new SkipAccumulator(list.get(index).minDocID);
+  for (int i = 0; i < length; i++) {
+acc.accumulate(list.get(index + i));
+  }
+  return acc;
 }
   }
 
   private void writeSkipIndex(FieldInfo field, DocValuesProducer 
valuesProducer)
   throws IOException {
 assert field.hasDocValuesSkipIndex();
-// TODO: This disk compression once we introduce levels
-long start = data.getFilePointer();
-SortedNumericDocValues values = valuesProducer.getSortedNumeric(field);
+final long start = data.getFilePointer();
+final SortedNumericDocValues values = 
valuesProducer.getSortedNumeric(field);
 long globalMaxValue = Long.MIN_VALUE;
 long globalMinValue = Long.MAX_VALUE;
 int globalDocCount = 0;
 int maxDocId = -1;
+final List<List<SkipAccumulator>> accumulators = new ArrayList<>(SKIP_INDEX_MAX_LEVEL);
+for (int i = 0; i < SKIP_INDEX_MAX_LEVEL; i++) {
+  accumulators.add(new ArrayList<>());
+}
 SkipAccumulator accumulator = null;
-int counter = 0;
+final int maxAccumulators = 1 << (SKIP_INDEX_LEVEL_SHIFT * 
(SKIP_INDEX_MAX_LEVEL - 1));
 for (int doc = values.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc 
= values.nextDoc()) {
-  if (counter == 0) {
+  if (accumulator == null) {
 accumulator = new SkipAccumulator(doc);
+accumulators.get(0).add(accumulator);
   }
   accumulator.nextDoc(doc);
   for (int i = 0, end = values.docValueCount(); i < end; ++i) {
 accumulator.accumulate(values.nextValue());
   }
-  if (++counter == skipIndexIntervalSize) {
+  if (accumulator.docCount == skipIndexIntervalSize) {
 globalMaxValue = Math.max(globalMaxValue, accumulator.maxValue);
 globalMinValue = Math.min(globalMinValue, accumulator.minValue);
 globalDocCount += accumulator.docCount;
 maxDocId = accumulator.maxDocID;
-accumulator.writeTo(data);
-counter = 0;
+accumulator = null;
+if (accumulators.size() == maxAccumulators) {
+  writeLevels(accumulators);
+for (List<SkipAccumulator> accumulatorList : accumulators) {
+accumulatorList.clear();
+  }
+}
   }
 }
 
-if (counter > 0) {
-  globalMaxValue = Math.max(globalMaxValue, accumulator.maxValue);
-  globalMinValue = Math.min(globalMinValue, accumulator.minValue);
-  globalDocCount += accumulator.docCount;
-  maxDocId = accumulator.maxDocID;
-  accumulator.writeTo(data);
+if (accumulators.isEmpty() == false) {
+  if (accumulator != null) {
+globalMaxValue = Math.max(globalMaxValue, accumulator.maxValue);
+globalMinValue = Math.min(globalMinValue, accumulator.minValue);
+globalDocCount += accumulator.docCount;
+maxDocId = accumulator.maxDocID;
+  }
+  writeLevels(accumulators);
 }
 meta.writeLong(start); // record the start in meta
 meta.writeLong(data.getFilePointer() - start); // record the length
+assert globalDocCount == 0 || globalMaxValue >= globalMinValue;
 meta.writeLong(globalMaxValue);
 meta.writeLong(globalMinValue);
+assert globalDocCount <= maxDocId + 1;
 meta.writeInt(globalDocCount);
 meta.writeInt(maxDocId);
   }
 
+  private void writeLevels(List<List<SkipAccumulator>> accumulators) throws IOException {
+for (int i = 1; i < accumulators.size(); i++) {
+  buildLevel(accumulators.get(i), accumulators.get(i - 1));
+}
+int totalAccumulators = accumulators.get(0).size();
+for (int index = 0; index < totalAccumulators; index++) {
+  // compute how many levels we need to write for the current accumulator
+  final int levels = getLevels(index, totalAccumulators);
+  // build the levels
+  final SkipAccumulator[] accLevels = new SkipAccumulator[levels];
+  for (int level = 0; level < levels; level++) {
+accLevels[level] =
+accumulators.get(level).get(index 

Re: [PR] Add levels to DocValues skipper index [lucene]

2024-07-17 Thread via GitHub


iverase commented on code in PR #13563:
URL: https://github.com/apache/lucene/pull/13563#discussion_r1681081567


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java:
##
@@ -207,65 +210,133 @@ void accumulate(long value) {
   maxValue = Math.max(maxValue, value);
 }
 
+void accumulate(SkipAccumulator other) {
+  minDocID = Math.min(minDocID, other.minDocID);
+  maxDocID = Math.max(maxDocID, other.maxDocID);
+  minValue = Math.min(minValue, other.minValue);
+  maxValue = Math.max(maxValue, other.maxValue);
+  docCount += other.docCount;
+}
+
 void nextDoc(int docID) {
   maxDocID = docID;
   ++docCount;
 }
 
-void writeTo(DataOutput output) throws IOException {
-  output.writeInt(maxDocID);
-  output.writeInt(minDocID);
-  output.writeLong(maxValue);
-  output.writeLong(minValue);
-  output.writeInt(docCount);
+public static SkipAccumulator merge(List<SkipAccumulator> list, int index, int length) {
+  SkipAccumulator acc = new SkipAccumulator(list.get(index).minDocID);
+  for (int i = 0; i < length; i++) {
+acc.accumulate(list.get(index + i));
+  }
+  return acc;
 }
   }
 
   private void writeSkipIndex(FieldInfo field, DocValuesProducer 
valuesProducer)
   throws IOException {
 assert field.hasDocValuesSkipIndex();
-// TODO: This disk compression once we introduce levels
-long start = data.getFilePointer();
-SortedNumericDocValues values = valuesProducer.getSortedNumeric(field);
+final long start = data.getFilePointer();
+final SortedNumericDocValues values = 
valuesProducer.getSortedNumeric(field);
 long globalMaxValue = Long.MIN_VALUE;
 long globalMinValue = Long.MAX_VALUE;
 int globalDocCount = 0;
 int maxDocId = -1;
+final List<List<SkipAccumulator>> accumulators = new ArrayList<>(SKIP_INDEX_MAX_LEVEL);
+for (int i = 0; i < SKIP_INDEX_MAX_LEVEL; i++) {
+  accumulators.add(new ArrayList<>());
+}
 SkipAccumulator accumulator = null;
-int counter = 0;
+final int maxAccumulators = 1 << (SKIP_INDEX_LEVEL_SHIFT * 
(SKIP_INDEX_MAX_LEVEL - 1));
 for (int doc = values.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc 
= values.nextDoc()) {
-  if (counter == 0) {
+  if (accumulator == null) {
 accumulator = new SkipAccumulator(doc);
+accumulators.get(0).add(accumulator);
   }
   accumulator.nextDoc(doc);
   for (int i = 0, end = values.docValueCount(); i < end; ++i) {
 accumulator.accumulate(values.nextValue());
   }
-  if (++counter == skipIndexIntervalSize) {
+  if (accumulator.docCount == skipIndexIntervalSize) {
 globalMaxValue = Math.max(globalMaxValue, accumulator.maxValue);
 globalMinValue = Math.min(globalMinValue, accumulator.minValue);
 globalDocCount += accumulator.docCount;
 maxDocId = accumulator.maxDocID;
-accumulator.writeTo(data);
-counter = 0;
+accumulator = null;
+if (accumulators.size() == maxAccumulators) {
+  writeLevels(accumulators);
+for (List<SkipAccumulator> accumulatorList : accumulators) {
+accumulatorList.clear();
+  }
+}
   }
 }
 
-if (counter > 0) {
-  globalMaxValue = Math.max(globalMaxValue, accumulator.maxValue);
-  globalMinValue = Math.min(globalMinValue, accumulator.minValue);
-  globalDocCount += accumulator.docCount;
-  maxDocId = accumulator.maxDocID;
-  accumulator.writeTo(data);
+if (accumulators.isEmpty() == false) {
+  if (accumulator != null) {
+globalMaxValue = Math.max(globalMaxValue, accumulator.maxValue);
+globalMinValue = Math.min(globalMinValue, accumulator.minValue);
+globalDocCount += accumulator.docCount;
+maxDocId = accumulator.maxDocID;
+  }
+  writeLevels(accumulators);
 }
 meta.writeLong(start); // record the start in meta
 meta.writeLong(data.getFilePointer() - start); // record the length
+assert globalDocCount == 0 || globalMaxValue >= globalMinValue;
 meta.writeLong(globalMaxValue);
 meta.writeLong(globalMinValue);
+assert globalDocCount <= maxDocId + 1;
 meta.writeInt(globalDocCount);
 meta.writeInt(maxDocId);
   }
 
+  private void writeLevels(List<List<SkipAccumulator>> accumulators) throws IOException {
+for (int i = 1; i < accumulators.size(); i++) {
+  buildLevel(accumulators.get(i), accumulators.get(i - 1));
+}
+int totalAccumulators = accumulators.get(0).size();
+for (int index = 0; index < totalAccumulators; index++) {
+  // compute how many levels we need to write for the current accumulator
+  final int levels = getLevels(index, totalAccumulators);
+  // build the levels
+  final SkipAccumulator[] accLevels = new SkipAccumulator[levels];

Review Comment:
   Yes, I did not realize that when I modified the way we were writing the levels.




Re: [PR] Add levels to DocValues skipper index [lucene]

2024-07-17 Thread via GitHub


iverase commented on code in PR #13563:
URL: https://github.com/apache/lucene/pull/13563#discussion_r1681082271


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer.java:
##
@@ -1792,61 +1794,91 @@ public DocValuesSkipper getSkipper(FieldInfo field) 
throws IOException {
 if (input.length() > 0) {
   input.prefetch(0, 1);
 }
+// TODO: should we write to disk the actual max level for this segment?
 return new DocValuesSkipper() {
-  int minDocID = -1;
-  int maxDocID = -1;
-  long minValue, maxValue;
-  int docCount;
+  final int[] minDocID = new int[SKIP_INDEX_MAX_LEVEL];
+  final int[] maxDocID = new int[SKIP_INDEX_MAX_LEVEL];
+
+  {
+for (int i = 0; i < SKIP_INDEX_MAX_LEVEL; i++) {
+  minDocID[i] = maxDocID[i] = -1;
+}
+  }
+
+  final long[] minValue = new long[SKIP_INDEX_MAX_LEVEL];
+  final long[] maxValue = new long[SKIP_INDEX_MAX_LEVEL];
+  final int[] docCount = new int[SKIP_INDEX_MAX_LEVEL];
+  int levels;
 
   @Override
   public void advance(int target) throws IOException {
 if (target > entry.maxDocId) {
-  minDocID = DocIdSetIterator.NO_MORE_DOCS;
-  maxDocID = DocIdSetIterator.NO_MORE_DOCS;
+  // skipper is exhausted
+  for (int i = 0; i < SKIP_INDEX_MAX_LEVEL; i++) {
+minDocID[i] = maxDocID[i] = DocIdSetIterator.NO_MORE_DOCS;
+  }
 } else {
+  // find next interval
+  assert target > maxDocID[0] : "target must be bigger that current 
interval";
   while (true) {
-maxDocID = input.readInt();
-if (maxDocID >= target) {
-  minDocID = input.readInt();
-  maxValue = input.readLong();
-  minValue = input.readLong();
-  docCount = input.readInt();
+levels = input.readByte();
+assert levels <= SKIP_INDEX_MAX_LEVEL && levels > 0
+: "level out of range [" + levels + "]";
+boolean competitive = true;
+// check if current interval is competitive or we can jump to the 
next position
+for (int level = levels - 1; level >= 0; level--) {
+  if ((maxDocID[level] = input.readInt()) < target) {
+input.skipBytes(SKIP_INDEX_JUMP_LENGTH_PER_LEVEL[level]); // 
the jump for the level
+competitive = false;
+break;
+  }
+  minDocID[level] = input.readInt();
+  maxValue[level] = input.readLong();
+  minValue[level] = input.readLong();
+  docCount[level] = input.readInt();
+}
+if (competitive) {
+  // adjust levels
+  while (levels < SKIP_INDEX_MAX_LEVEL) {
+if (maxDocID[levels] == -1 || maxDocID[levels] < target) {

Review Comment:
   done






Re: [PR] Add levels to DocValues skipper index [lucene]

2024-07-17 Thread via GitHub


iverase commented on code in PR #13563:
URL: https://github.com/apache/lucene/pull/13563#discussion_r1681083206


##
lucene/test-framework/src/java/org/apache/lucene/tests/index/AssertingLeafReader.java:
##
@@ -1194,24 +1194,27 @@ public int numLevels() {
 @Override
 public int minDocID(int level) {
   assertThread("Doc values skipper", creationThread);
-  Objects.checkIndex(level, numLevels());
   int minDocID = in.minDocID(level);
   assert minDocID <= in.maxDocID(level);
-  if (level > 0) {
-assert minDocID <= in.minDocID(level - 1);
+  if (minDocID != -1 && minDocID != DocIdSetIterator.NO_MORE_DOCS) {
+Objects.checkIndex(level, numLevels());
+  }
+  for (int i = 0; i < level; i++) {
+assert minDocID >= in.minDocID(i);

Review Comment:
   exactly






Re: [PR] Add levels to DocValues skipper index [lucene]

2024-07-17 Thread via GitHub


iverase commented on code in PR #13563:
URL: https://github.com/apache/lucene/pull/13563#discussion_r1681096883


##
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##
@@ -3301,17 +3301,17 @@ private static void checkDocValueSkipper(FieldInfo fi, 
DocValuesSkipper skipper)
   if (skipper.maxDocID(0) == NO_MORE_DOCS) {
 break;
   }
+  if (skipper.minDocID(0) < doc) {
+throw new CheckIndexException(
+"skipper dv iterator for field: "
++ fieldName
++ " reports wrong minDocID, got "
++ skipper.minDocID(0)
++ " < "
++ doc);
+  }
   int levels = skipper.numLevels();
   for (int level = 0; level < levels; level++) {
-if (skipper.minDocID(level) < doc) {
-  throw new CheckIndexException(
-  "skipper dv iterator for field: "
-  + fieldName
-  + " reports wrong minDocID, got "
-  + skipper.minDocID(level)
-  + " < "
-  + doc);
-}

Review Comment:
   The current condition is `skipper.minDocID(level) < doc`, where the doc is the one provided on advance. So the provided doc needs to be lower than or equal to the minDocID not to trigger the error.
   
   It checks that we have advanced to the closest interval forward. This check is strange, as it is only valid for implementations with exact bounds, and it is clearly not valid for levels above 0.
   






Re: [PR] Add levels to DocValues skipper index [lucene]

2024-07-17 Thread via GitHub


jpountz commented on code in PR #13563:
URL: https://github.com/apache/lucene/pull/13563#discussion_r1681097434


##
lucene/test-framework/src/java/org/apache/lucene/tests/index/AssertingLeafReader.java:
##
@@ -1194,24 +1194,27 @@ public int numLevels() {
 @Override
 public int minDocID(int level) {
   assertThread("Doc values skipper", creationThread);
-  Objects.checkIndex(level, numLevels());
   int minDocID = in.minDocID(level);
   assert minDocID <= in.maxDocID(level);
-  if (level > 0) {
-assert minDocID <= in.minDocID(level - 1);
+  if (minDocID != -1 && minDocID != DocIdSetIterator.NO_MORE_DOCS) {
+Objects.checkIndex(level, numLevels());
+  }
+  for (int i = 0; i < level; i++) {
+assert minDocID >= in.minDocID(i);

Review Comment:
   Could we make `numLevels` return 1 to avoid this issue?






Re: [PR] Add levels to DocValues skipper index [lucene]

2024-07-17 Thread via GitHub


iverase commented on code in PR #13563:
URL: https://github.com/apache/lucene/pull/13563#discussion_r1681112760


##
lucene/test-framework/src/java/org/apache/lucene/tests/index/AssertingLeafReader.java:
##
@@ -1194,24 +1194,27 @@ public int numLevels() {
 @Override
 public int minDocID(int level) {
   assertThread("Doc values skipper", creationThread);
-  Objects.checkIndex(level, numLevels());
   int minDocID = in.minDocID(level);
   assert minDocID <= in.maxDocID(level);
-  if (level > 0) {
-assert minDocID <= in.minDocID(level - 1);
+  if (minDocID != -1 && minDocID != DocIdSetIterator.NO_MORE_DOCS) {
+Objects.checkIndex(level, numLevels());
+  }
+  for (int i = 0; i < level; i++) {
+assert minDocID >= in.minDocID(i);

Review Comment:
   yes, done, and reverted the changes here






[I] Are we properly accounting for `NeighborArray.rwlock`? [lucene]

2024-07-17 Thread via GitHub


msokolov opened a new issue, #13580:
URL: https://github.com/apache/lucene/issues/13580

   ### Description
   
   Really three issues:
   1. `OnHeapHnswGraph.ramBytesUsed` has a complicated job - it's hard to tell whether it takes into account the probably quite significant RAM usage of the `ReentrantReadWriteLock` that gets created for every node in the graph.
   2. We create these locks even in the case where we are flushing (via `HnswGraphBuilder`) and don't ever use them. We only need locks available when merging.
   3. We could hash the node ids and share a smaller pool of locks among them and thus save a bunch of allocations (a minimal lock-striping sketch follows below)
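
   A hedged sketch of idea 3 (hypothetical class and method names, not the actual Lucene implementation; assumes the pool size is a power of two):
   ```java
   import java.util.concurrent.locks.ReadWriteLock;
   import java.util.concurrent.locks.ReentrantReadWriteLock;

   // Share a fixed pool of read/write locks across all graph nodes by hashing
   // the node id, instead of allocating one ReentrantReadWriteLock per node.
   final class StripedNodeLocks {
     private final ReadWriteLock[] locks;

     StripedNodeLocks(int stripes) { // stripes must be a power of two
       locks = new ReadWriteLock[stripes];
       for (int i = 0; i < stripes; i++) {
         locks[i] = new ReentrantReadWriteLock();
       }
     }

     ReadWriteLock lockFor(int node) {
       // mix the bits before masking so consecutive node ids spread out
       int h = node * 0x9E3779B9;
       return locks[h & (locks.length - 1)];
     }
   }
   ```
   Nodes that hash to the same stripe simply share a lock, which stays correct (just slightly coarser), while the number of allocated lock objects stays constant.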





Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-17 Thread via GitHub


magibney commented on PR #13570:
URL: https://github.com/apache/lucene/pull/13570#issuecomment-2233474047

   This is looking good. I think making it possible for MMapDirectory to 
customize/bypass grouping is still important -- something along the lines of 
what Uwe was thinking in [these](#issuecomment-2228827755) 
[comments](#issuecomment-2228845198):
   >>Maybe the file pattern should really be a Predicate to separate the implementation. Maybe add a method to return protected Optional<Long> getArenaGroupingKey(String filename) (or as Function<String, Optional<Long>>), which returns empty optional if no grouping allowed or the segment number otherwise as key.
   >
   >This could be similarly implemented like the setPreload(BiPredicate) method.
   
   >It could even be simpler by passing the grouping key to the provider method 
directly in addition to filename. Then the code that figures out which group is 
used can reside in MMapDirectory only.
   
   





Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-17 Thread via GitHub


uschindler commented on PR #13570:
URL: https://github.com/apache/lucene/pull/13570#issuecomment-2233498077

   I am fine with everything - BUT: we should still check how the grouping works when, for an existing segment, an additional `*.del` file is updated (same for softdeletes and docvalues updates). If it is treated as the same segment, at least for del files it would not cause too big issues (because those are quite small), but if we have in-place atomic updates, we could have files already closed and deleted, but the arena still there.
   
   I will do some checks on Windows which is very sensitive with deleting still 
open files.





Re: [I] Are we properly accounting for `NeighborArray.rwlock`? [lucene]

2024-07-17 Thread via GitHub


msokolov commented on issue #13580:
URL: https://github.com/apache/lucene/issues/13580#issuecomment-2233498918

   I also found we are acquiring these locks for writing in 
`HnswGraphBuilder.addDiverseNeighbors` even in the single-threaded case where 
it is not needed.





Re: [PR] Compute facets while collecting [lucene]

2024-07-17 Thread via GitHub


epotyom commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1681197072


##
lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java:
##
@@ -252,9 +213,9 @@ public int hashCode() {
 final int prime = 31;

Review Comment:
   I'm not sure what the benefits of doing a deep hashCode would be in this case? But I believe we are not changing the behavior with this change: neither FacetsCollectorManager nor CollectorOwner overrides the default hashCode method.






Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-17 Thread via GitHub


uschindler commented on PR #13570:
URL: https://github.com/apache/lucene/pull/13570#issuecomment-2233534218

   On Windows I got the following test error - actually what I expected (Windows is not allowed to delete files which are still open or mmapped):
   
   ```
   TestIndexWriter > testDeleteUnusedFiles FAILED
   java.lang.AssertionError
   at 
__randomizedtesting.SeedInfo.seed([6C3A14A35BBB8424:D30C462C33E329CB]:0)
   at org.junit.Assert.fail(Assert.java:87)
   at org.junit.Assert.assertTrue(Assert.java:42)
   at org.junit.Assert.assertTrue(Assert.java:53)
   at 
org.apache.lucene.index.TestIndexWriter.testDeleteUnusedFiles(TestIndexWriter.java:1303)
   ```





Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-17 Thread via GitHub


ChrisHegarty commented on PR #13570:
URL: https://github.com/apache/lucene/pull/13570#issuecomment-2233544810

   > I am fine with everything - BUT: we should still check how the grouping works when, for an existing segment, an additional `*.del` file is updated (same for softdeletes and docvalues updates). If it is treated as the same segment, at least for del files it would not cause too big issues (because those are quite small), but if we have in-place atomic updates, we could have files already closed and deleted, but the arena still there.
   > 
   > I will do some checks on Windows which is very sensitive with deleting 
still open files.
   
   Same, I'm checking too. Additionally, I'm still sketching out the few options around API support for a `groupingFunction`, etc., to allow for an opt-out.





Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-17 Thread via GitHub


uschindler commented on PR #13570:
URL: https://github.com/apache/lucene/pull/13570#issuecomment-2233547375

   It fails reproducibly. Let me look into it :-) Not sure what the issue is (whether it is deletes, softdeletes or docvalues updates).





Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-17 Thread via GitHub


ChrisHegarty commented on PR #13570:
URL: https://github.com/apache/lucene/pull/13570#issuecomment-2233549425

   > On Windows I got the following test error - actually what I expected 
(Windows is not allowed to delete files which are still open or mmapped
   
   Ha! you're too fast. I'm still spinning up a VM to test Windows.
   
   > It fails reproducibly. Let me look into it :-) Not sure what the issue is (whether it is deletes, softdeletes or docvalues updates).
   
   Thanks. It's with you. Appreciate it.





Re: [I] Expose flat vectors in "user space" [lucene]

2024-07-17 Thread via GitHub


benwtrent commented on issue #13468:
URL: https://github.com/apache/lucene/issues/13468#issuecomment-2233567273

   Maybe this is done?





Re: [I] Are we properly accounting for `NeighborArray.rwlock`? [lucene]

2024-07-17 Thread via GitHub


benwtrent commented on issue #13580:
URL: https://github.com/apache/lucene/issues/13580#issuecomment-2233568004

   @msokolov related: https://github.com/apache/lucene/issues/12732





Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-17 Thread via GitHub


uschindler commented on PR #13570:
URL: https://github.com/apache/lucene/pull/13570#issuecomment-2233584894

   > It fails reproducibly. Let me look into it :-) Not sure what the issue is (whether it is deletes, softdeletes or docvalues updates).
   
   This test failure is "interesting". This test enforces MMapDirectory and it actually tests that the file can't be deleted. It is a CFS file:
   
   ```
 // NOTE: here we rely on "Windows" behavior, i.e. even though IW 
wanted to delete _0.cfs
 // since it was merged away, because we have a reader open against 
this file,
 // it should still be here:
 assertTrue(Files.exists(indexPath.resolve("_0.cfs")));
   ```





Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-17 Thread via GitHub


uschindler commented on PR #13570:
URL: https://github.com/apache/lucene/pull/13570#issuecomment-2233610476

   The test also fails in the main branch, so this is not new. It must have something to do with our previous changes. Interestingly, it does not fail on [Policeman Jenkins](https://jenkins.thetaphi.de/job/Lucene-main-Windows/). I need more beers.





Re: [I] TestSnapshotDeletionPolicy#testMultiThreadedSnapshotting assertion failure [lucene]

2024-07-17 Thread via GitHub


aoli-al commented on issue #13571:
URL: https://github.com/apache/lucene/issues/13571#issuecomment-2233656919

   Hi @jpountz and @benwtrent, I saw your previous discussion about the test 
failure. Do you think this patch will help you replay the failure? Also, let me know if you have any questions.





Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-17 Thread via GitHub


uschindler commented on PR #13570:
URL: https://github.com/apache/lucene/pull/13570#issuecomment-2233693234

   Hi, this failure is unrelated. The difference is: Locally I use Windows 11, 
Jenkins uses Windows 10. Actually it looks like on Windows 11 it is in fact 
possible to delete memory mapped files.
   
   I have to confirm this (something rings a bell in my head).





Re: [PR] Compute facets while collecting [lucene]

2024-07-17 Thread via GitHub


mikemccand commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1681230468


##
lucene/core/src/java/org/apache/lucene/search/CollectorOwner.java:
##
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+
+/**
+ * Like {@link CollectorManager}, but it owns the collectors its manager 
creates. It is convenient
+ * that clients of the class don't have to worry about keeping the list of 
collectors, as well as
+ * about making the collectors type (C) compatible when reduce is called. 
Instance of this class
+ * also caches results of {@link CollectorManager#reduce(Collection)}.

Review Comment:
   `also caches` -> `also cache`?



##
lucene/core/src/java/org/apache/lucene/search/CollectorOwner.java:
##
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+
+/**
+ * Like {@link CollectorManager}, but it owns the collectors its manager 
creates. It is convenient
+ * that clients of the class don't have to worry about keeping the list of 
collectors, as well as
+ * about making the collectors type (C) compatible when reduce is called. 
Instance of this class

Review Comment:
   `collectors` -> `collector's`?
   
   `instance` -> `instances`?



##
lucene/core/src/java/org/apache/lucene/search/CollectorOwner.java:
##
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+
+/**
+ * Like {@link CollectorManager}, but it owns the collectors its manager 
creates. It is convenient

Review Comment:
   Maybe also explain that it wraps an existing (already created) 
`CollectorManager`?



##
lucene/core/src/java/org/apache/lucene/search/CollectorOwner.java:
##
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the Li

Re: [I] Are we properly accounting for `NeighborArray.rwlock`? [lucene]

2024-07-17 Thread via GitHub


msokolov commented on issue #13580:
URL: https://github.com/apache/lucene/issues/13580#issuecomment-2233756360

   Ooh, thanks @benwtrent - I had forgotten about that. I'm working up a new version based on (3) above that I hope will reduce this usage and only require it for concurrent merges.





Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-17 Thread via GitHub


uschindler commented on PR #13570:
URL: https://github.com/apache/lucene/pull/13570#issuecomment-2233938475

   So somebody who has more knowledge about how the filenames of deletion and/or softdeletes/docvalues instant update files are generated should help us.
   
   I think deletion files should not be a problem, as those are READONCE, but for the docvalues updates we may possibly need to have them on the exclusion list for grouping.
   
   I have no time to look into this now; I will check how to fix the test with Windows 11 and possibly set up a Windows 11 node on Policeman Jenkins.





Re: [I] Expose flat vectors in "user space" [lucene]

2024-07-17 Thread via GitHub


msokolov commented on issue #13468:
URL: https://github.com/apache/lucene/issues/13468#issuecomment-2233960860

   Yes, thanks - I'll resolve





Re: [I] Expose flat vectors in "user space" [lucene]

2024-07-17 Thread via GitHub


msokolov closed issue #13468: Expose flat vectors in "user space"
URL: https://github.com/apache/lucene/issues/13468





Re: [I] TestSnapshotDeletionPolicy#testMultiThreadedSnapshotting assertion failure [lucene]

2024-07-17 Thread via GitHub


benwtrent commented on issue #13571:
URL: https://github.com/apache/lucene/issues/13571#issuecomment-2233982449

   This is indeed interesting, @aoli-al - I will have to dig deeper to see. This part of the codebase is... opaque, to say the least :/





Re: [PR] Gradle build: cleanup of dependency resolution and consolidation of dependency versions [lucene]

2024-07-17 Thread via GitHub


dsmiley commented on code in PR #13484:
URL: https://github.com/apache/lucene/pull/13484#discussion_r1681635416


##
versions.lock:
##


Review Comment:
   Overall: thanks for doing this @dweiss!
   
   What script generates versions.lock? The "because" sections look mysterious.






Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-17 Thread via GitHub


ChrisHegarty commented on PR #13570:
URL: https://github.com/apache/lucene/pull/13570#issuecomment-2234125875

   I added a grouping function to the API - see 
[4591f7f](https://github.com/apache/lucene/pull/13570/commits/4591f7f904f0f76d4ef3fa5b4accee184183ddf1)





Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-17 Thread via GitHub


ChrisHegarty commented on PR #13570:
URL: https://github.com/apache/lucene/pull/13570#issuecomment-2234130516

   > So somebody who has more knowledge about how the filenames of deletion and/or softdeletes/docvalues instant update files are generated should help us.
   > 
   > I think deletion files should not be a problem, as those are READONCE, but for the docvalues updates we may possibly need to have them on the exclusion list for grouping.
   
   I'll try to look into this, and also get someone to help, if needed.





Re: [PR] Compute facets while collecting [lucene]

2024-07-17 Thread via GitHub


epotyom commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1681693916


##
lucene/sandbox/src/test/org/apache/lucene/sandbox/facet/SandboxFacetTestCase.java:
##
@@ -0,0 +1,407 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.facet;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.lucene.facet.FacetResult;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.facet.FacetsCollector.MatchingDocs;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.facet.LabelAndValue;
+import org.apache.lucene.facet.taxonomy.FacetLabel;
+import org.apache.lucene.facet.taxonomy.TaxonomyFacetLabels;
+import org.apache.lucene.facet.taxonomy.TaxonomyFacetLabels.FacetLabelReader;
+import org.apache.lucene.facet.taxonomy.TaxonomyReader;
+import org.apache.lucene.sandbox.facet.abstracts.OrdLabelBiMap;
+import org.apache.lucene.sandbox.facet.abstracts.OrdToComparable;
+import org.apache.lucene.sandbox.facet.abstracts.OrdinalIterator;
+import org.apache.lucene.sandbox.facet.ordinal_iterators.TopnOrdinalIterator;
+import org.apache.lucene.sandbox.facet.recorders.CountFacetRecorder;
+import 
org.apache.lucene.sandbox.facet.taxonomy.TaxonomyChildrenOrdinalIterator;
+import org.apache.lucene.sandbox.facet.taxonomy.TaxonomyOrdLabelBiMap;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.tests.util.LuceneTestCase;
+import org.apache.lucene.tests.util.TestUtil;
+import org.apache.lucene.util.BytesRef;
+
+public abstract class SandboxFacetTestCase extends LuceneTestCase {
+  // we don't have access to overall count for all facets from count recorder,
+  // and we can't compute it as a SUM of values for each facet ID because we 
need to respect cases
+  // where
+  // the same doc belongs to multiple facets (e.g. overlapping ranges and
+  // multi value fields). We can add an extra range that includes everything,
+  // or consider supporting overall count in CountFacetRecorder. But it is not 
exactly the value
+  // we can get now, as this value wouldn't respect top-n cutoff. Is this 
value a must have facets
+  // feature?
+  static final int VALUE_CANT_BE_COMPUTED = -5;

Review Comment:
   > Also, what is our plan to address this permanently?
   
   I've changed the comment back to TODO and added some details, WDYT?






Re: [PR] Compute facets while collecting [lucene]

2024-07-17 Thread via GitHub


epotyom commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1681713454


##
lucene/sandbox/src/test/org/apache/lucene/sandbox/facet/SandboxFacetTestCase.java:
##
@@ -0,0 +1,407 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.facet;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.lucene.facet.FacetResult;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.facet.FacetsCollector.MatchingDocs;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.facet.LabelAndValue;
+import org.apache.lucene.facet.taxonomy.FacetLabel;
+import org.apache.lucene.facet.taxonomy.TaxonomyFacetLabels;
+import org.apache.lucene.facet.taxonomy.TaxonomyFacetLabels.FacetLabelReader;
+import org.apache.lucene.facet.taxonomy.TaxonomyReader;
+import org.apache.lucene.sandbox.facet.abstracts.OrdLabelBiMap;
+import org.apache.lucene.sandbox.facet.abstracts.OrdToComparable;
+import org.apache.lucene.sandbox.facet.abstracts.OrdinalIterator;
+import org.apache.lucene.sandbox.facet.ordinal_iterators.TopnOrdinalIterator;
+import org.apache.lucene.sandbox.facet.recorders.CountFacetRecorder;
+import 
org.apache.lucene.sandbox.facet.taxonomy.TaxonomyChildrenOrdinalIterator;
+import org.apache.lucene.sandbox.facet.taxonomy.TaxonomyOrdLabelBiMap;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.tests.util.LuceneTestCase;
+import org.apache.lucene.tests.util.TestUtil;
+import org.apache.lucene.util.BytesRef;
+
+public abstract class SandboxFacetTestCase extends LuceneTestCase {
+  // we don't have access to overall count for all facets from count recorder,
+  // and we can't compute it as a SUM of values for each facet ID because we 
need to respect cases
+  // where
+  // the same doc belongs to multiple facets (e.g. overlapping ranges and
+  // multi value fields). We can add an extra range that includes everything,
+  // or consider supporting overall count in CountFacetRecorder. But it is not 
exactly the value
+  // we can get now, as this value wouldn't respect top-n cutoff. Is this 
value a must have facets
+  // feature?
+  static final int VALUE_CANT_BE_COMPUTED = -5;
+
+  /**
+   * Utility method that uses {@link FacetLabelReader} to get facet labels for 
each hit in {@link
+   * MatchingDocs}. The method returns {@code List<List<FacetLabel>>} where outer list has one entry
+   * per document and inner list has all {@link FacetLabel} entries that 
belong to a document. The
+   * inner list may be empty if no {@link FacetLabel} are found for a hit.
+   *
+   * @param taxoReader {@link TaxonomyReader} used to read taxonomy during 
search. This instance is
+   * expected to be open for reading.
+   * @param fc {@link FacetsCollector} A collector with matching hits.
+   * @param dimension facet dimension for which labels are requested. A null 
value fetches labels
+   * for all dimensions.
+   * @return {@code List<List<FacetLabel>>} where outer list has one non-null entry per document, and
+   * inner list contain all {@link FacetLabel} entries that belong to a 
document.
+   * @throws IOException when a low-level IO issue occurs.
+   */
+  public List<List<FacetLabel>> getAllTaxonomyFacetLabels(
+      String dimension, TaxonomyReader taxoReader, FacetsCollector fc) throws IOException {
+    List<List<FacetLabel>> actualLabels = new ArrayList<>();
+    TaxonomyFacetLabels taxoLabels =
+        new TaxonomyFacetLabels(taxoReader, FacetsConfig.DEFAULT_INDEX_FIELD_NAME);
+    for (MatchingDocs m : fc.getMatchingDocs()) {
+      FacetLabelReader facetLabelReader = taxoLabels.getFacetLabelReader(m.context);
+      DocIdSetIterator disi = m.bits.iterator();
+      while (disi.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
+        actualLabels.add(allFacetLabels(disi.docID(), dimension, facetLabelReader));
+      }
+    }
+    return actualLabels;
+  }
+
+  /**
+   * Utility method to get all facet labels for an input docId and dimension 
using the supplied
+   * {@link FacetLabelReader}.
+   

Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-17 Thread via GitHub


magibney commented on code in PR #13570:
URL: https://github.com/apache/lucene/pull/13570#discussion_r1681711345


##
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##
@@ -83,6 +86,19 @@ public class MMapDirectory extends FSDirectory {
*/
  public static final BiPredicate<String, IOContext> NO_FILES = (filename, context) -> false;
 
+  /** Argument for {@link #setGroupingFunction(Function)} that configures no grouping. */
+  public static final Function<String, Optional<String>> NO_GROUPING = filename -> Optional.empty();
+
+  /** Argument for {@link #setGroupingFunction(Function)} that configures grouping by segment. */
+  public static final Function<String, Optional<String>> GROUP_BY_SEGMENT =
+      filename -> {
+        String segmentName = IndexFileNames.parseSegmentName(filename);
+        if (filename.length() == segmentName.length()) {
+          return Optional.empty();
+        }
+        return Optional.of(segmentName);
+      };

Review Comment:
   I think Uwe had floated the idea of doing extra validation that we're 
actually getting a segment name -- e.g. something like:
   ```java
   String segmentPrefix = IndexFileNames.parseSegmentName(filename);
   int segmentPrefixLength = segmentPrefix.length();
   if (segmentPrefixLength > 1 && segmentPrefixLength != filename.length()
       && segmentPrefix.charAt(0) == '_') {
     String segmentName = segmentPrefix.substring(1);
 try {
   // try to parse a proper segment identifier
   Long.parseLong(segmentName, Character.MAX_RADIX);
   return Optional.of(segmentName);
 } catch (NumberFormatException ex) {
   // unable to parse as a segment identifier
 }
   }
   return Optional.empty();
   ```
   
   If we can account for all explicit exceptions (where we don't expect to be 
able to parse a segment name) we could even throw `IllegalArgumentException` 
here instead of returning `Optional.empty()`? (this would prevent grouping from 
being used in cases where it should not be, by failing hard -- and could be a 
good complement to ensuring `READ_ONCE` is used where appropriate).
   
   






[PR] HnswLock: access locks via hash and only use for concurrent indexing [lucene]

2024-07-17 Thread via GitHub


msokolov opened a new pull request, #13581:
URL: https://github.com/apache/lucene/pull/13581

   Addresses https://github.com/apache/lucene/issues/13580 by adding a locking 
wrapper for OnHeapHnswGraph's NeighborArrays, and supplying this when running 
concurrent merges. With this:
   
   1. We no longer guard access to rows with locks when graph build is done in 
a single thread.
   2. We only create a fixed number of locks rather than a lock per node






Re: [PR] HnswLock: access locks via hash and only use for concurrent indexing [lucene]

2024-07-17 Thread via GitHub


msokolov commented on PR #13581:
URL: https://github.com/apache/lucene/pull/13581#issuecomment-2234263814

   Tested with 1M 256-dim docs ... at least it doesn't seem to make anything 
worse?
   
   # M=16,width=50
   Tests with 1M 256-d vectors, M=16, beam-width-index=50
   | condition | merge threads | recall | latency | index time |
   |---|---||-||
   | candidate |  12   | 0.927  | 2.37| 177206 |
   | candidate |  12   | 0.928  | 2.57| 174283 |
   | baseline  |  12   | 0.927  | 2.49| 178528 |
   | baseline  |  12   | 0.927  | 2.40| 180748 |
   | candidate |   1   | 0.925  | 2.37| 176221 |
   | candidate |   1   | 0.925  | 2.36| 186181 |
   | baseline  |   1   | 0.926  | 2.46| 176939 |
   | baseline  |   1   | 0.926  | 2.55| 185635 |
   
   # M=32,width=100
   Tests with 1M 256-d vectors, M=32, beam-width-index=100
   | condition | merge threads | recall | latency | index time |
   |---|---||-||
   | candidate |  12   | 0.969  | 2.54| 390794 |
   | candidate |  12   | 0.969  | 2.51| 434838 |
   | baseline  |  12   | 0.969  | 2.55| 393600 |
   | baseline  |  12   | 0.967  | 2.82| 441376 |
   | candidate |   1   | 0.969  | 2.48| 428682 |
   | candidate |   1   | 0.970  | 2.49| 392494 |
   | baseline  |   1   | 0.970  | 2.72| 403489 |
   | baseline  |   1   | 0.968  | 2.44| 409693 |
   





[PR] Stop bounding outer window. [lucene]

2024-07-17 Thread via GitHub


jpountz opened a new pull request, #13582:
URL: https://github.com/apache/lucene/pull/13582

   Currently `MaxScoreBulkScorer` requires its "outer" window to be at least `WINDOW_SIZE`. The intuition there was to make sure we use the whole range of the bit set that we are using to collect matches. The downside is that it may force us to use an upper level in the skip list that has worse upper bounds for the scores.
   
   luceneutil suggests that this is not a good trade-off: removing this 
requirement makes some queries a bit slower, but `OrHighMin` and `OrHighRare` 
much faster:
   
   ```
   TaskQPS baseline  StdDevQPS 
my_modified_version  StdDevPct diff p-value
CountOrHighHigh   56.30 (36.8%)   52.86 
(28.4%)   -6.1% ( -52% -   93%) 0.623
 Or2Terms2StopWords  162.62  (2.7%)  153.87  
(4.2%)   -5.4% ( -11% -1%) 0.000
 CountOrHighMed  103.04 (24.6%)   99.36 
(19.0%)   -3.6% ( -37% -   53%) 0.667
 IntNRQ  344.56  (9.9%)  333.25 
(11.0%)   -3.3% ( -22% -   19%) 0.407
OrStopWords   32.05  (4.6%)   31.15  
(6.8%)   -2.8% ( -13% -9%) 0.201
LowTerm  899.50  (4.9%)  877.72  
(5.7%)   -2.4% ( -12% -8%) 0.227
MedTerm  506.67  (6.2%)  496.16  
(7.1%)   -2.1% ( -14% -   12%) 0.412
   HighTerm  444.00  (6.1%)  435.85  
(7.3%)   -1.8% ( -14% -   12%) 0.467
  CountTerm 9416.88  (3.5%) 9260.39  
(5.2%)   -1.7% (  -9% -7%) 0.319
   OrHighNotLow  367.96  (7.0%)  362.09  
(6.0%)   -1.6% ( -13% -   12%) 0.515
  HighTermDayOfYearSort  857.10  (3.8%)  849.83  
(3.8%)   -0.8% (  -8% -7%) 0.556
   OrHighNotMed  328.60  (7.0%)  325.94  
(6.1%)   -0.8% ( -12% -   13%) 0.743
Prefix3  287.21  (2.4%)  285.09  
(1.7%)   -0.7% (  -4% -3%) 0.349
   Or3Terms  156.94  (3.1%)  155.80  
(4.0%)   -0.7% (  -7% -6%) 0.589
 Fuzzy2   87.16  (1.3%)   86.59  
(1.1%)   -0.7% (  -3% -1%) 0.156
 OrHighHigh   75.93  (2.3%)   75.62  
(2.1%)   -0.4% (  -4% -4%) 0.611
   HighTermTitleBDVSort   13.53  (3.3%)   13.48  
(6.8%)   -0.4% ( -10% -   10%) 0.859
CountAndHighMed  120.03  (2.9%)  119.60  
(1.9%)   -0.4% (  -5% -4%) 0.705
 Fuzzy1   89.56  (1.1%)   89.25  
(1.2%)   -0.3% (  -2% -1%) 0.422
   AndStopWords   29.15  (3.5%)   29.05  
(3.6%)   -0.3% (  -7% -7%) 0.811
  And3Terms  155.12  (2.2%)  154.91  
(2.5%)   -0.1% (  -4% -4%) 0.883
And2Terms2StopWords  149.66  (2.5%)  149.48  
(2.3%)   -0.1% (  -4% -4%) 0.897
Respell   49.39  (1.4%)   49.34  
(1.3%)   -0.1% (  -2% -2%) 0.826
   Wildcard   77.63  (3.2%)   77.56  
(3.0%)   -0.1% (  -6% -6%) 0.935
   OrNotHighLow  964.64  (2.9%)  964.50  
(2.2%)   -0.0% (  -4% -5%) 0.987
  OrHighNotHigh  233.73  (7.3%)  233.73  
(6.5%)0.0% ( -12% -   14%) 1.000
   CountAndHighHigh   41.12  (2.4%)   41.15  
(2.2%)0.1% (  -4% -4%) 0.937
  HighTermMonthSort 3589.46  (1.3%) 3594.78  
(2.7%)0.1% (  -3% -4%) 0.853
 TermDTSort  360.82  (8.1%)  362.55  
(5.2%)0.5% ( -11% -   14%) 0.852
   PKLookup  285.81  (1.8%)  287.35  
(1.7%)0.5% (  -2% -4%) 0.418
  OrNotHighHigh  265.21  (7.0%)  266.65  
(6.3%)0.5% ( -11% -   14%) 0.830
AndHighHigh   68.85  (2.6%)   69.25  
(2.2%)0.6% (  -4% -5%) 0.527
 Phrase   11.48  (3.1%)   11.56  
(4.1%)0.6% (  -6% -8%) 0.644
 AndHighMed  147.82  (2.3%)  148.76  
(2.0%)0.6% (  -3% -5%) 0.438
 AndHighLow  759.97  (3.8%)  765.97  
(2.2%)0.8% (  -5% -7%) 0.502
  HighTermTitleSort  141.20  (2.9%)  142.32  
(3.1%)0.8% (  -5% -6%) 0.479
   OrNotHighMed  377.50  (6.6%)  380.75  
(5.8%)0.9% ( -10% -   14%)

Re: [PR] Compute facets while collecting [lucene]

2024-07-17 Thread via GitHub


epotyom commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1681768843


##
lucene/core/src/java/org/apache/lucene/search/CollectorOwner.java:
##
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+
+/**
+ * Like {@link CollectorManager}, but it owns the collectors its manager 
creates. It is convenient
+ * that clients of the class don't have to worry about keeping the list of 
collectors, as well as
+ * about making the collectors type (C) compatible when reduce is called. 
Instance of this class
+ * also caches results of {@link CollectorManager#reduce(Collection)}.
+ *
+ * Note that instance of this class ignores any {@link Collector} created 
by {@link
+ * CollectorManager#newCollector()} directly, not through {@link 
#newCollector()}
+ *
+ * @lucene.experimental
+ */
+public final class CollectorOwner {

Review Comment:
   The problem with using `CollectorManager` is that the reduce method takes `Collection<C> collectors` as an argument. It means that users not only have to manage the list of collectors themselves, but also have to make sure that their `CollectorManager` and collection of `Collector`s are of the same type `C`. It becomes a problem for DrillSideways, where we want to use different `CollectorManager`s for different dimensions. In theory we can overcome that by keeping a list of types for dimensions, and casting the Collectors' collection items to the right type before calling `#reduce`. But creating the `CollectorOwner` class looks like a better solution to me - it allows DrillSideways and IndexSearcher to be ignorant of the collectors' and results' types. As a bonus, it manages the list of Collectors for both `DrillSideways` and `IndexSearcher`; see the sketch below.
   
   Please let me know if you think it's not worth it - we can revisit the idea of keeping a list of types in DrillSideways?
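
   For illustration, a minimal sketch of the idea (hedged: it uses the real `CollectorManager` interface, but the class name and details here are hypothetical, not the exact sandbox implementation):
   ```java
   import java.io.IOException;
   import java.util.ArrayList;
   import java.util.List;
   import org.apache.lucene.search.Collector;
   import org.apache.lucene.search.CollectorManager;

   // Owns every collector its manager creates, so callers never handle type C.
   final class CollectorOwnerSketch<C extends Collector, T> {
     private final CollectorManager<C, T> manager;
     private final List<C> collectors = new ArrayList<>();

     CollectorOwnerSketch(CollectorManager<C, T> manager) {
       this.manager = manager;
     }

     Collector newCollector() throws IOException {
       C collector = manager.newCollector(); // remember each collector we hand out
       collectors.add(collector);
       return collector;
     }

     T reduce() throws IOException {
       // C never leaks to the caller, so reduce is always well-typed
       return manager.reduce(collectors);
     }
   }
   ```
   This way DrillSideways can keep a heterogeneous list of owners and call `reduce()` on each without any casts.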






Re: [PR] Compute facets while collecting [lucene]

2024-07-17 Thread via GitHub


epotyom commented on PR #13568:
URL: https://github.com/apache/lucene/pull/13568#issuecomment-2234352412

   Thank you for reviewing @mikemccand ! I've made the changes and updated the 
branch.





Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-17 Thread via GitHub


uschindler commented on code in PR #13570:
URL: https://github.com/apache/lucene/pull/13570#discussion_r1681841607


##
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##
@@ -167,6 +188,7 @@ public MMapDirectory(Path path, LockFactory lockFactory, 
long maxChunkSize) thro
   throw new IllegalArgumentException("Maximum chunk size for mmap must be 
>0");
 }
 this.chunkSizePower = Long.SIZE - 1 - 
Long.numberOfLeadingZeros(maxChunkSize);
+this.groupingFunction = GROUP_BY_SEGMENT;

Review Comment:
   See above.



##
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##
@@ -102,6 +118,11 @@ public class MMapDirectory extends FSDirectory {
*/
   public static final long DEFAULT_MAX_CHUNK_SIZE;
 
+  /** A provider specific context object or null, that will be passed to 
openInput. */
+  final Object attachment = PROVIDER.attachment();
+
+  Function<String, Optional<String>> groupingFunction;

Review Comment:
   Maybe initialize it here (like for preload). Can also be private.



##
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##
@@ -184,6 +206,21 @@ public void setPreload(BiPredicate<String, IOContext> preload) {
 this.preload = preload;
   }
 
+  /**

Review Comment:
   Maybe also add a short description to the class definition (like for preload).



##
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##
@@ -83,6 +86,19 @@ public class MMapDirectory extends FSDirectory {
*/
  public static final BiPredicate<String, IOContext> NO_FILES = (filename, context) -> false;
 
+  /** Argument for {@link #setGroupingFunction(Function)} that configures no grouping. */
+  public static final Function<String, Optional<String>> NO_GROUPING = filename -> Optional.empty();
+
+  /** Argument for {@link #setGroupingFunction(Function)} that configures grouping by segment. */
+  public static final Function<String, Optional<String>> GROUP_BY_SEGMENT =
+      filename -> {
+        String segmentName = IndexFileNames.parseSegmentName(filename);
+        if (filename.length() == segmentName.length()) {
+          return Optional.empty();
+        }
+        return Optional.of(segmentName);
+      };

Review Comment:
   Do we also have a function that parses the Long without reimplementing it 
here?






Re: [PR] Aggregate files from the same segment into a single Arena [lucene]

2024-07-17 Thread via GitHub


uschindler commented on code in PR #13570:
URL: https://github.com/apache/lucene/pull/13570#discussion_r1681847294


##
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##
@@ -83,6 +86,19 @@ public class MMapDirectory extends FSDirectory {
*/
  public static final BiPredicate<String, IOContext> NO_FILES = (filename, context) -> false;
 
+  /** Argument for {@link #setGroupingFunction(Function)} that configures no grouping. */
+  public static final Function<String, Optional<String>> NO_GROUPING = filename -> Optional.empty();
+
+  /** Argument for {@link #setGroupingFunction(Function)} that configures grouping by segment. */
+  public static final Function<String, Optional<String>> GROUP_BY_SEGMENT =
+      filename -> {
+        String segmentName = IndexFileNames.parseSegmentName(filename);
+        if (filename.length() == segmentName.length()) {
+          return Optional.empty();
+        }
+        return Optional.of(segmentName);
+      };

Review Comment:
   Maybe add a method to validate and extract a segment name to the `IndexFileNames` class. We possibly also have one in the test framework that could be moved...






Re: [PR] HnswLock: access locks via hash and only use for concurrent indexing [lucene]

2024-07-17 Thread via GitHub


benwtrent commented on PR #13581:
URL: https://github.com/apache/lucene/pull/13581#issuecomment-2234506747

   It's weird that the merge threads have no impact on index build time. 
   
   Did you create a bunch of segments and then force merge to exercise this 
code path?
   
   Or was the buffer size too large to even allow merges?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-07-17 Thread via GitHub


goankur commented on code in PR #13572:
URL: https://github.com/apache/lucene/pull/13572#discussion_r1682009005


##
lucene/core/build.gradle:
##
@@ -14,10 +14,43 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
+plugins {
+  id "c"
+}
 
 apply plugin: 'java-library'
+apply plugin: 'c'
 
 description = 'Lucene core library'
+model {
+  toolChains {
+    gcc(Gcc) {
+      target("linux_aarch64") {
+        cppCompiler.withArguments { args ->
+          args << "-O3 --shared"

Review Comment:
   That was it, just needed a fresh pair of eyes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-07-17 Thread via GitHub


goankur commented on PR #13572:
URL: https://github.com/apache/lucene/pull/13572#issuecomment-2235177271

   > Do we even need to use intrinsics? The function is so simple that the compiler seems to do the right thing, e.g. use the `SDOT` dot product instruction, given the correct flags:
   > 
   > https://godbolt.org/z/KG1dPnrqn
   > 
   > 
https://developer.arm.com/documentation/102651/a/What-are-dot-product-intructions-
   
   
   
   > I haven't benchmarked; it just seems `SDOT` is the one to optimize for, and GCC can both recognize the code shape and autovectorize to it without hassle.
   > 
   > My cheap 2021 phone has the `asimddp` feature in /proc/cpuinfo; dot product support seems widespread.
   > 
   > You can use it directly via an intrinsic, too; no need to use add/multiply intrinsics: https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#dot-product
   > 
   > But unless it is really faster than what GCC does with simple C, no need.
   
   With the updated compile flags, the performance of the auto-vectorized code is slightly better than that of the explicitly vectorized code (see results). An interesting thing to note is that both C-based implementations have `10X` better throughput than the Panama-API-based Java implementation (unless I am not doing an apples-to-apples comparison).
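   For anyone wanting to reproduce this kind of comparison, a minimal JMH harness might look like the sketch below (illustrative only, not this PR's actual benchmark; `VectorUtil.dotProduct(byte[], byte[])` is Lucene's existing Java entry point, and the native `vdot8s` binding added in this PR is omitted):

```java
import java.util.Random;
import org.apache.lucene.util.VectorUtil;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
public class DotProductBenchmark {
  @Param({"768"}) // a typical embedding dimension; illustrative
  int size;

  byte[] a, b;

  @Setup(Level.Trial)
  public void setup() {
    Random r = new Random(42);
    a = new byte[size];
    b = new byte[size];
    r.nextBytes(a);
    r.nextBytes(b);
  }

  @Benchmark
  public int javaDotProduct() {
    // dispatches to the Panama Vector API implementation when it is enabled
    return VectorUtil.dotProduct(a, b);
  }
}
```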


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-07-17 Thread via GitHub


rmuir commented on PR #13572:
URL: https://github.com/apache/lucene/pull/13572#issuecomment-2235209531

   > With the updated compile flags, the performance of the auto-vectorized code is slightly better than that of the explicitly vectorized code (see results). An interesting thing to note is that both C-based implementations have `10X` better throughput than the Panama-API-based Java implementation (unless I am not doing an apples-to-apples comparison).
   
   This seems correct to me. The Java Vector API is not performant for the integer case. Hotspot doesn't understand ARM well, and it definitely doesn't have instructions such as `SDOT` in its vocabulary.
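   For reference, the Java side of an int8 dot product with the Panama Vector API looks roughly like the sketch below (simplified; this is not Lucene's actual `PanamaVectorUtilSupport` code, and it requires `--add-modules jdk.incubator.vector`):

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public final class Int8Dot {
  private static final VectorSpecies<Byte> BYTES = ByteVector.SPECIES_64;
  private static final VectorSpecies<Integer> INTS = IntVector.SPECIES_256;

  /** Widens each byte lane to int before multiplying, because no byte-level
   *  dot-product primitive (i.e. no SDOT equivalent) is exposed to Java. */
  public static int dot8(byte[] a, byte[] b) {
    IntVector acc = IntVector.zero(INTS);
    int i = 0;
    int bound = BYTES.loopBound(a.length);
    for (; i < bound; i += BYTES.length()) {
      ByteVector va = ByteVector.fromArray(BYTES, a, i);
      ByteVector vb = ByteVector.fromArray(BYTES, b, i);
      // widen 8 byte lanes to 8 int lanes, then multiply-accumulate
      IntVector ia = (IntVector) va.convertShape(VectorOperators.B2I, INTS, 0);
      IntVector ib = (IntVector) vb.convertShape(VectorOperators.B2I, INTS, 0);
      acc = acc.add(ia.mul(ib));
    }
    int sum = acc.reduceLanes(VectorOperators.ADD);
    for (; i < a.length; i++) { // scalar tail
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```

   The explicit widening is where the cost goes: each 8-byte chunk expands to full 32-bit lanes before the multiply, whereas `SDOT` performs the widening multiply-accumulate in a single instruction.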


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Early terminate visit BKD leaf when current value greater than upper point in sorted dim. [lucene]

2024-07-17 Thread via GitHub


vsop-479 commented on PR #12528:
URL: https://github.com/apache/lucene/pull/12528#issuecomment-2235220539

   > if there are slowdowns due to the extra check for each visited point.
   
   Maybe this extra check can be mitigated by the [branch predictor](https://en.wikipedia.org/wiki/Branch_predictor).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-07-17 Thread via GitHub


rmuir commented on code in PR #13572:
URL: https://github.com/apache/lucene/pull/13572#discussion_r1682041365


##
lucene/core/build.gradle:
##
@@ -14,12 +14,59 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
+plugins {
+  id "c"
+}
 
 apply plugin: 'java-library'
+apply plugin: 'c'
 
 description = 'Lucene core library'
+model {
+  toolChains {
+    gcc(Gcc) {
+      target("linux_aarch64") {
+        path '/usr/bin/'
+        cCompiler.executable 'gcc10-cc'
+        cCompiler.withArguments { args ->
+          args << "--shared"
+               << "-O3"
+               << "-march=armv8.2-a+dotprod"

Review Comment:
   For the simple case of benchmarking, I would just try `-march=native` to compile the best it can for the specific CPU/microarchitecture. Then you could test your `dot8s` on ARM 256-bit SVE (Graviton3, Graviton4), which might be interesting...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org


