Re: [PR] Initial upgrade to Gradle 8.14 (rc2) [lucene]

2025-04-23 Thread via GitHub


dweiss commented on PR #14533:
URL: https://github.com/apache/lucene/pull/14533#issuecomment-2823279050

   > I don't understand this one: their language server works fine with java 24 
and i'm able to navigate the new JDK24 classfile api, etc. I guess I haven't 
used it extensively, due to the gradle issue, and they've got some bugfixes 
since the last release.
   
   I don't fully understand it either - the latest artifact for the batch ecj 
compiler (what we use) seems to be from march:
   https://repo1.maven.org/maven2/org/eclipse/jdt/ecj/3.41.0/
   
   this doesn't support Java 24. I've just downloaded the latest Eclipse and it 
seems to contain the same batch compiler. Their website says this:
   
   
![image](https://github.com/user-attachments/assets/e291fcde-0d4e-4330-8e65-2a9fbf08edbd)
   
   There are some marketplace extensions to support Java 24 but these seem to 
be external - so... I don't think Eclipse supports Java 24 yet?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] deps(java): bump org.apache.commons:commons-compress from 1.19 to 1.27.1 [lucene]

2025-04-23 Thread via GitHub


dweiss commented on PR #14540:
URL: https://github.com/apache/lucene/pull/14540#issuecomment-2823322410

   Ok, so here's how to work this out. First, show what's bringing in the 
conflicting dependencies, like the message shows:
   
   ```
   ./gradlew :lucene:benchmark:dependencyInsight --dependency 
"commons-codec:commons-codec" --configuration "compileClasspath"
   ...
   commons-codec:commons-codec:1.17.1
   \--- org.apache.commons:commons-compress:1.27.1
\--- compileClasspath
   
   ./gradlew :lucene:analysis:phonetic:dependencyInsight --dependency 
"commons-codec:commons-codec" --configuration "compileClasspath"
   ...
   commons-codec:commons-codec:1.17.2
   \--- compileClasspath
   ```
   
   The above means :lucene:benchmark pulls commons-codec as a transitive 
dependency and phonetic imports it directly. Previously, palantir's 
consistent-versions would try to make all dependencies "consistent" globally - 
unfortunately this isn't really working well with larger projects with more 
complicated configurations. This "dependency-locking" mechanism that I wrote 
and that we use in Lucene attempts to detect such inconsistencies but doesn't 
try to fix them. There are a few ways to fix this:
   
   * Add an explicit dependency on commons-codec or on 
:lucene:analysis:phonetic to :lucene:benchmark - then gradle will align the 
transitive dependency to the requested version. Here is what the dependency 
insight looks like after adding a dependency on phonetic:
   ```
   ./gradlew :lucene:benchmark:dependencyInsight --dependency 
"commons-codec:commons-codec" --configuration "compileClasspath"
   ...
   \--- project :lucene:analysis:phonetic
\--- compileClasspath
   
   commons-codec:commons-codec:1.17.1 -> 1.17.2
   \--- org.apache.commons:commons-compress:1.27.1
\--- compileClasspath
   ```
   note the auto-aligned dependency.
   
   * Add gradle's dependency constraint on the version of commons-codec to 
benchmarks.
   
   * remove :lucene:benchmark from the set of configurations included in the 
dependency lock checks 
   (this would mean two different versions of commons-codec are used in 
different configurations).
   
   I went with the first option because this seems like the simplest one. Note 
that the latest version of commons-compress
   brings additional dependencies (commons-io and commons-lang3).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Tone down TestIndexWriterDelete.testDeleteAllRepeated (OOMs sometimes) [lucene]

2025-04-23 Thread via GitHub


dweiss closed issue #14508: Tone down 
TestIndexWriterDelete.testDeleteAllRepeated (OOMs sometimes)
URL: https://github.com/apache/lucene/issues/14508


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make TestIndexWriterDelete.testDeleteAllRepeated a monster test and force FSDirectory to prevent OOM. [lucene]

2025-04-23 Thread via GitHub


dweiss merged PR #14526:
URL: https://github.com/apache/lucene/pull/14526


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


jpountz commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2056836360


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {
+Map, List> 
hintClasses =
+
context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass));
+
+// there should only be one of FileType, FileData, DataAccess
+List fileTypes =
+hintClasses.getOrDefault(FileTypeHint.class, List.of());
+if (fileTypes.size() > 1) {
+  throw new IllegalArgumentException("Multiple file type hints specified: 
" + fileTypes);
+}
+List fileData = 
hintClasses.getOrDefault(FileDataHint.class, List.of());
+if (fileData.size() > 1) {
+  throw new IllegalArgumentException("Multiple file data hints specified: 
" + fileData);
+}
+List dataAccess =
+hintClasses.getOrDefault(DataAccessHint.class, List.of());
+if (dataAccess.size() > 1) {
+  throw new IllegalArgumentException("Multiple data access hints 
specified: " + dataAccess);
+}
+  }
+
+  protected ReadAdvice toReadAdvice(IOContext context) {

Review Comment:
   How would you customize read advices on a custom directory in the end? I 
would have assumed something like that if it was on IOContext:
   
   ```java
   Directory dir = new FilterDirectory(mmapDirectory) {
 @Override
 public IndexInput openInput(String name, IOContext context) {
   if (someCondition()) {
 // enforce specific read advice instead of relying on hints
 context = context.withReadAdvice(ReadAdvice.NORMAL);
   }
   return in.openInput(name, context);
 }
   }
   ```
   
   I'm not sure how it can work with a method on `Directory` that would never 
be called by the wrapped directory?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Reduce the number of comparisons when lowerPoint is equal to upperPoint [lucene]

2025-04-23 Thread via GitHub


github-actions[bot] commented on PR #14267:
URL: https://github.com/apache/lucene/pull/14267#issuecomment-2825838739

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Integrating GPU based Vector Search using cuVS [lucene]

2025-04-23 Thread via GitHub


github-actions[bot] commented on PR #14131:
URL: https://github.com/apache/lucene/pull/14131#issuecomment-2825838798

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Run a workflow on dependabot's PRs (gradlew updateLicenses writeLocks) [lucene]

2025-04-23 Thread via GitHub


rmuir commented on issue #14506:
URL: https://github.com/apache/lucene/issues/14506#issuecomment-2824924310

   thanks for looking into this @dweiss ! 
   
   I've got a PR to hopefully reduce the number of dependabot PRs for python 
and actions dependencies, since some of them release very often (multiple times 
a week): https://github.com/apache/lucene/pull/14544
   
   I think it might help reduce the maintenance costs in a different way. I 
think most of the java dependencies don't update as frequently but we can 
revisit that one when we are "caught up" 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Impl intoBitset for IndexedDISI and Docvalues [lucene]

2025-04-23 Thread via GitHub


gf2121 commented on PR #14529:
URL: https://github.com/apache/lucene/pull/14529#issuecomment-2824954402

   I personally prefer not to eagerly load docs for advance/advanceExact, or 
maybe leave it to another issue/PR. 
   
   I update code to one of the version during my local iteration, which reduces 
the complexity by using a single `orRange` instead of multiple round of 
`orRange`. I wonder if this will make sense to you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Logic for collecting Histogram efficiently using Point Trees [lucene]

2025-04-23 Thread via GitHub


jpountz commented on code in PR #14439:
URL: https://github.com/apache/lucene/pull/14439#discussion_r2056783050


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/plain/histograms/HistogramCollector.java:
##
@@ -53,11 +63,31 @@ public LeafCollector getLeafCollector(LeafReaderContext 
context) throws IOExcept
   // The segment has no values, nothing to do.
   throw new CollectionTerminatedException();
 }
+
+// We can use multi range traversal logic to collect the histogram on 
numeric
+// field indexed as point for MATCH_ALL cases. In future, this can be 
extended
+// for Point Range Query cases as well
+if (context.reader().hasDeletions() == false
+&& weight.count(context) == context.reader().maxDoc()) {

Review Comment:
   I would check weight against `null`, applications may have collectors that 
don't set a weight.



##
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/plain/histograms/HistogramCollector.java:
##
@@ -53,11 +63,31 @@ public LeafCollector getLeafCollector(LeafReaderContext 
context) throws IOExcept
   // The segment has no values, nothing to do.
   throw new CollectionTerminatedException();
 }
+
+// We can use multi range traversal logic to collect the histogram on 
numeric
+// field indexed as point for MATCH_ALL cases. In future, this can be 
extended
+// for Point Range Query cases as well
+if (context.reader().hasDeletions() == false

Review Comment:
   You don't need this check, Weight#count already accounts for deletions.



##
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/plain/histograms/PointTreeBulkCollector.java:
##
@@ -0,0 +1,227 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.facet.plain.histograms;
+
+import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
+
+import java.io.IOException;
+import java.util.function.Function;
+import org.apache.lucene.index.PointValues;
+import org.apache.lucene.internal.hppc.LongIntHashMap;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.util.NumericUtils;
+
+class PointTreeBulkCollector {
+  private static Function bytesToLong(int numBytes) {
+if (numBytes == Long.BYTES) {
+  // Used by LongPoint, DoublePoint
+  return a -> NumericUtils.sortableBytesToLong(a, 0);
+} else if (numBytes == Integer.BYTES) {
+  // Used by IntPoint, FloatPoint, LatLonPoint, LatLonShape
+  return a -> (long) NumericUtils.sortableBytesToInt(a, 0);
+}
+
+return null;
+  }
+
+  static boolean canCollectEfficiently(final PointValues pointValues, final 
long bucketWidth)
+  throws IOException {
+// TODO: Do we really need pointValues.getDocCount() == pointValues.size() 
for this optimization
+if (pointValues == null
+|| pointValues.getNumDimensions() != 1
+|| pointValues.getDocCount() != pointValues.size()) {
+  return false;
+}
+
+final Function byteToLong = 
bytesToLong(pointValues.getBytesPerDimension());
+if (byteToLong == null) {
+  return false;
+}
+
+long leafMinBucket =
+Math.floorDiv(byteToLong.apply(pointValues.getMinPackedValue()), 
bucketWidth);
+long leafMaxBucket =
+Math.floorDiv(byteToLong.apply(pointValues.getMaxPackedValue()), 
bucketWidth);
+
+// We want that # leaf nodes is more than # buckets so that we can 
completely skip over
+// some of the leaf nodes. Higher this ratio, more efficient it is than 
naive approach!
+if ((pointValues.size() / 512) < (leafMaxBucket - leafMinBucket)) {

Review Comment:
   Sounds good to me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Move sloppySin into SloppyMath from GeoUtils [lucene]

2025-04-23 Thread via GitHub


jainankitk commented on code in PR #14516:
URL: https://github.com/apache/lucene/pull/14516#discussion_r2056713597


##
lucene/core/src/java/org/apache/lucene/util/SloppyMath.java:
##
@@ -178,6 +178,30 @@ public static double asin(double a) {
 }
   }
 
+  // some sloppyish stuff, do we really need this to be done in a sloppy way?
+  // unless it is performance sensitive, we should try to remove.
+  // This is performance sensitive, check SloppySinBenchmark and 
RectangleBenchmark
+  private static final double PIO2 = Math.PI / 2D;
+
+  /**
+   * Returns the trigonometric sine of an angle converted as a cos operation.
+   *
+   * Note that this is not quite right... e.g. sin(0) != 0
+   *
+   * Special cases:
+   *
+   * 
+   *   If the argument is {@code NaN} or an infinity, then the result is 
{@code NaN}.
+   * 
+   *
+   * @param a an angle, in radians.
+   * @return the sine of the argument.
+   * @see Math#sin(double)
+   */
+  public static double sin(double a) {

Review Comment:
   > I'm not familiar with what the Rectangle code is doing here: I don't tend 
to think about sin() being associated with Rectanges...
   
   `sin()` is used for calculating and adjusting the longitude range of a 
circle's bounding Rectangle based on the latitude



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Logic for collecting Histogram efficiently using Point Trees [lucene]

2025-04-23 Thread via GitHub


jainankitk commented on code in PR #14439:
URL: https://github.com/apache/lucene/pull/14439#discussion_r2056686537


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/plain/histograms/HistogramCollector.java:
##
@@ -53,11 +65,30 @@ public LeafCollector getLeafCollector(LeafReaderContext 
context) throws IOExcept
   // The segment has no values, nothing to do.
   throw new CollectionTerminatedException();
 }
+
+// We can use multi range traversal logic to collect the histogram on 
numeric
+// field indexed as point for MATCH_ALL cases. In future, this can be 
extended
+// for Point Range Query cases as well
+if (query instanceof MatchAllDocsQuery && context.reader().hasDeletions() 
== false) {

Review Comment:
   Oh yeah, that's a good point! Hadn't thought about queries on other fields. 
Changed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Logic for collecting Histogram efficiently using Point Trees [lucene]

2025-04-23 Thread via GitHub


jainankitk commented on code in PR #14439:
URL: https://github.com/apache/lucene/pull/14439#discussion_r2056834941


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/plain/histograms/HistogramCollector.java:
##
@@ -53,11 +63,31 @@ public LeafCollector getLeafCollector(LeafReaderContext 
context) throws IOExcept
   // The segment has no values, nothing to do.
   throw new CollectionTerminatedException();
 }
+
+// We can use multi range traversal logic to collect the histogram on 
numeric
+// field indexed as point for MATCH_ALL cases. In future, this can be 
extended
+// for Point Range Query cases as well
+if (context.reader().hasDeletions() == false
+&& weight.count(context) == context.reader().maxDoc()) {

Review Comment:
   Added `null` check



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


thecoop commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2055497713


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {
+Map, List> 
hintClasses =
+
context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass));
+
+// there should only be one of FileType, FileData, DataAccess
+List fileTypes =
+hintClasses.getOrDefault(FileTypeHint.class, List.of());
+if (fileTypes.size() > 1) {
+  throw new IllegalArgumentException("Multiple file type hints specified: 
" + fileTypes);
+}
+List fileData = 
hintClasses.getOrDefault(FileDataHint.class, List.of());
+if (fileData.size() > 1) {
+  throw new IllegalArgumentException("Multiple file data hints specified: 
" + fileData);
+}
+List dataAccess =
+hintClasses.getOrDefault(DataAccessHint.class, List.of());
+if (dataAccess.size() > 1) {
+  throw new IllegalArgumentException("Multiple data access hints 
specified: " + dataAccess);
+}
+  }
+
+  protected ReadAdvice toReadAdvice(IOContext context) {

Review Comment:
   This becomes a lot more complicated in [subsequent 
PRs](https://github.com/apache/lucene/pull/14510/files#diff-a320c032354752cddf187b571d2832ca7a0dcfee5a1f3bda19578e2559338f8b)



##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {
+Map, List> 
hintClasses =
+
context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass));
+
+// there should only be one of FileType, FileData, DataAccess
+List fileTypes =
+hintClasses.getOrDefault(FileTypeHint.class, List.of());
+if (fileTypes.size() > 1) {
+  throw new IllegalArgumentException("Multiple file type hints specified: 
" + fileTypes);
+}
+List fileData = 
hintClasses.getOrDefault(FileDataHint.class, List.of());
+if (fileData.size() > 1) {
+  throw new IllegalArgumentException("Multiple file data hints specified: 
" + fileData);
+}
+List dataAccess =
+hintClasses.getOrDefault(DataAccessHint.class, List.of());
+if (dataAccess.size() > 1) {
+  throw new IllegalArgumentException("Multiple data access hints 
specified: " + dataAccess);
+}
+  }
+
+  protected ReadAdvice toReadAdvice(IOContext context) {

Review Comment:
   This becomes a lot more complicated in [subsequent 
PRs](https://github.com/apache/lucene/pull/14510/files#diff-a320c032354752cddf187b571d2832ca7a0dcfee5a1f3bda19578e2559338f8b).
 I'll add javadocs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] deps(java): bump com.github.luben:zstd-jni from 1.5.5-11 to 1.5.7-2 [lucene]

2025-04-23 Thread via GitHub


dweiss merged PR #14505:
URL: https://github.com/apache/lucene/pull/14505


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


thecoop commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2055502471


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {
+Map, List> 
hintClasses =
+
context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass));
+
+// there should only be one of FileType, FileData, DataAccess

Review Comment:
   Because some uses can only specify some of the hint types - eg 
[here](https://github.com/apache/lucene/pull/14510/files#diff-6b2f0b03b74eed609dc835a669a38299f6bd24a3c2431cf7d58de27a869fc96e).
 I also want the ability to specify other hint types if necessary, eg 
[here](https://github.com/apache/lucene/pull/14509/commits/62fffb269edc6b7e981396a6a99f44b98d005c37)
 and 
[here](https://github.com/apache/lucene/pull/14510/files#diff-498e50abd4c94dc980f7bb096e63b64051a2ac77053f0a943643206699962bbc).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] deps(java): bump de.undercouch.download from 5.2.0 to 5.6.0 [lucene]

2025-04-23 Thread via GitHub


dweiss merged PR #14503:
URL: https://github.com/apache/lucene/pull/14503


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use a non-deprecated assertThat, and change several test assertions to use assertThat [lucene]

2025-04-23 Thread via GitHub


thecoop commented on code in PR #14518:
URL: https://github.com/apache/lucene/pull/14518#discussion_r2055485408


##
lucene/core/src/test/org/apache/lucene/document/TestDocument.java:
##
@@ -312,21 +324,21 @@ public void testFieldSetValue() throws Exception {
 
 // ensure that queries return expected results without DateFilter first
 ScoreDoc[] hits = searcher.search(query, 1000).scoreDocs;
-assertEquals(3, hits.length);
-int result = 0;
+assertThat(hits, arrayWithSize(3));
+Set seen = new HashSet<>();
 StoredFields storedFields = searcher.storedFields();
 for (int i = 0; i < 3; i++) {
   Document doc2 = storedFields.document(hits[i].doc);
   Field f = (Field) doc2.getField("id");
-  if (f.stringValue().equals("id1")) result |= 1;
-  else if (f.stringValue().equals("id2")) result |= 2;
-  else if (f.stringValue().equals("id3")) result |= 4;
-  else fail("unexpected id field");
+  switch (f.stringValue()) {
+case "id1", "id2", "id3" -> seen.add(f.stringValue());
+default -> fail("unexpected id field");
+  }
 }
 writer.close();
 reader.close();
 dir.close();
-assertEquals("did not see all IDs", 7, result);
+assertThat("did not see all IDs", seen, containsInAnyOrder("id1", "id2", 
"id3"));

Review Comment:
   `containsInAnyOrder` fails if there are any extra items in the collection.
   
   More generally, `assertEquals(..., Set.of())` checks if the value is a 
`Set`, which is not actually important here. The collection could be a `List` 
and the test would still work. So checking for `Set` isn't required here for 
the test to pass, whereas `containsInAnyOrder` doesn't care about the type of 
collection used.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


thecoop commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2055496607


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {

Review Comment:
   A more useful implementation can be found 
[here](https://github.com/apache/lucene/pull/14510/files#diff-a320c032354752cddf187b571d2832ca7a0dcfee5a1f3bda19578e2559338f8b).
 There are a couple of reasons for putting it here:
   
   - I wanted it on `Directory` so that `IOContext` is just a data transfer 
object, it is not opinionated on what it holds
   - Overrides of `Directory` can override this to specify their own validation 
logic for any new hints they use themselves



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] ci: bump actions/stale from 5.2.0 to 9.1.0 [lucene]

2025-04-23 Thread via GitHub


stefanvodita commented on PR #14498:
URL: https://github.com/apache/lucene/pull/14498#issuecomment-2823469274

   Thanks David! I've enabled debug mode. I'll push and have a look at the 
logs, then disable debug mode again if nothing seems out of place.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] ci: bump actions/stale from 5.2.0 to 9.1.0 [lucene]

2025-04-23 Thread via GitHub


stefanvodita merged PR #14498:
URL: https://github.com/apache/lucene/pull/14498


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


rmuir commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2055620022


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {

Review Comment:
   I still feel my question is unanswered. I think adding a new method to 
Directory is a Very Serious Thing and I dont understand why it absolutely must 
be here, as to me this doesn't make sense.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


thecoop commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2055636449


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {

Review Comment:
   I put it in `Directory` so it is alongside the `toReadAdvice` method, to 
keep `IOContext` as dumb as possible, and so it can be overridden by subclasses 
to add additional validation where needed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use a non-deprecated assertThat, and change several test assertions to use assertThat [lucene]

2025-04-23 Thread via GitHub


rmuir commented on PR #14518:
URL: https://github.com/apache/lucene/pull/14518#issuecomment-2823693691

   my concern wasn't about verbosity: if you are concerned about this, steer 
clear of java!
   
   instead just keeping "barrier to entry" to understanding the tests low. This 
includes requiring developer to go and understand a third-party library in 
order to debug it. But if this is really more natural to most developers, then 
I'm ok.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


ChrisHegarty commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2055674573


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {

Review Comment:
   Adding the invariant that an IOContext can have only instance of a concrete 
hint type seems ok, but I wonder if we need the validation at all. Maybe the 
validation can rather be enforced in a test-specific AssertingXXX wrapper? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Move sloppySin into SloppyMath from GeoUtils [lucene]

2025-04-23 Thread via GitHub


rmuir commented on code in PR #14516:
URL: https://github.com/apache/lucene/pull/14516#discussion_r2055723991


##
lucene/core/src/java/org/apache/lucene/util/SloppyMath.java:
##
@@ -178,6 +178,30 @@ public static double asin(double a) {
 }
   }
 
+  // some sloppyish stuff, do we really need this to be done in a sloppy way?
+  // unless it is performance sensitive, we should try to remove.
+  // This is performance sensitive, check SloppySinBenchmark and 
RectangleBenchmark
+  private static final double PIO2 = Math.PI / 2D;
+
+  /**
+   * Returns the trigonometric sine of an angle converted as a cos operation.
+   *
+   * Note that this is not quite right... e.g. sin(0) != 0
+   *
+   * Special cases:
+   *
+   * 
+   *   If the argument is {@code NaN} or an infinity, then the result is 
{@code NaN}.
+   * 
+   *
+   * @param a an angle, in radians.
+   * @return the sine of the argument.
+   * @see Math#sin(double)
+   */
+  public static double sin(double a) {

Review Comment:
   I agree it is better to have the benchmark. I'm not familiar with what the 
Rectangle code is doing here: I don't tend to think about sin() being 
associated with Rectanges...
   
   But the background is that the `cos()` and `asin()` are needed for this 
use-case: 
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/document/LatLonPointDistanceComparator.java#L37-L43
   
   We try hard to avoid calling these functions at all, but in some cases such 
as the sorting, many calls are needed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] TestForTooMuchCloning: too many calls to IndexInput.clone during TermRangeQuery: 7 [lucene]

2025-04-23 Thread via GitHub


rmuir commented on issue #14546:
URL: https://github.com/apache/lucene/issues/14546#issuecomment-2823753882

   This test is old, i think. The background IIRC is that for some Directory 
impls (e.g. NIOFSDirectory or others that use "buffered input"), `clone()` may 
cause at minimum a 1KB read/buffer refill. For MMapDirectory the clone() may be 
cheaper and so excessive cloning may go unnoticed, yet cause a performance 
issue for some.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Run a workflow on dependabot's PRs (gradlew updateLicenses writeLocks) [lucene]

2025-04-23 Thread via GitHub


dweiss commented on issue #14506:
URL: https://github.com/apache/lucene/issues/14506#issuecomment-2823987442

   So this would require some more extensive restructuring of workflows.  A 
commit from within a workflow won't trigger the same set of checks (this is 
done to avoid endless loops in distributed jobs on gh side). So we'd need to 
extract any checks as a reusable action, then configure dependabot PRs to use a 
special workflow that updates locks, then checks if all other checks pass, then 
push the commit if they do. 
   
   It is kind-of-doable but also a bit fragile. Let's live with manual update 
for now and see how pressing the issue is, then revisit. I attach the workflow 
I used for testing.
   
   
[dependabot-update-shas.txt](https://github.com/user-attachments/files/19866430/dependabot-update-shas.txt)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Run a workflow on dependabot's PRs (gradlew updateLicenses writeLocks) [lucene]

2025-04-23 Thread via GitHub


dweiss closed issue #14506: Run a workflow on dependabot's PRs (gradlew 
updateLicenses writeLocks)
URL: https://github.com/apache/lucene/issues/14506


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


thecoop commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2056374094


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {
+Map, List> 
hintClasses =
+
context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass));
+
+// there should only be one of FileType, FileData, DataAccess
+List fileTypes =
+hintClasses.getOrDefault(FileTypeHint.class, List.of());
+if (fileTypes.size() > 1) {
+  throw new IllegalArgumentException("Multiple file type hints specified: 
" + fileTypes);
+}
+List fileData = 
hintClasses.getOrDefault(FileDataHint.class, List.of());
+if (fileData.size() > 1) {
+  throw new IllegalArgumentException("Multiple file data hints specified: 
" + fileData);
+}
+List dataAccess =
+hintClasses.getOrDefault(DataAccessHint.class, List.of());
+if (dataAccess.size() > 1) {
+  throw new IllegalArgumentException("Multiple data access hints 
specified: " + dataAccess);
+}
+  }
+
+  protected ReadAdvice toReadAdvice(IOContext context) {

Review Comment:
   Does the logic 
[here](https://github.com/apache/lucene/pull/14510/files#diff-a320c032354752cddf187b571d2832ca7a0dcfee5a1f3bda19578e2559338f8bR107)
 really belong in `IOContext`? It feels like `ReadAdvice` is now more an 
implementation detail of `Directory`, so the mapping from hints -> ReadAdvice 
belongs in `Directory`. IOContext is more a POJO that has little opinions on 
what hints it passes thorough.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


thecoop commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2056382795


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {
+Map, List> 
hintClasses =
+
context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass));
+
+// there should only be one of FileType, FileData, DataAccess
+List fileTypes =
+hintClasses.getOrDefault(FileTypeHint.class, List.of());
+if (fileTypes.size() > 1) {
+  throw new IllegalArgumentException("Multiple file type hints specified: 
" + fileTypes);
+}
+List fileData = 
hintClasses.getOrDefault(FileDataHint.class, List.of());
+if (fileData.size() > 1) {
+  throw new IllegalArgumentException("Multiple file data hints specified: 
" + fileData);
+}
+List dataAccess =
+hintClasses.getOrDefault(DataAccessHint.class, List.of());
+if (dataAccess.size() > 1) {
+  throw new IllegalArgumentException("Multiple data access hints 
specified: " + dataAccess);
+}
+  }
+
+  protected ReadAdvice toReadAdvice(IOContext context) {

Review Comment:
   There's various options here allowing different ReadAdvice for compound 
directories - one is a subinterface of `IOContext` with an extra method to get 
file-specific hints that `CompoundDirectory` implementations would know how to 
access. This PR isn't the final implementation, there's going to be a few 
changes as the use of hints is expanded.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Compute the doc range more efficiently when flushing doc block [lucene]

2025-04-23 Thread via GitHub


stefanvodita commented on PR #14447:
URL: https://github.com/apache/lucene/pull/14447#issuecomment-2823807598

   Sorry, I don't know if I was clear enough. I was asking if we can also 
revert the change to `Lucene101PostingsWriter` and only change 
`Lucene103PostingsWriter` this time.
   
   Thanks for adding the changelog entry!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Compute the doc range more efficiently when flushing doc block [lucene]

2025-04-23 Thread via GitHub


bugmakerr commented on PR #14447:
URL: https://github.com/apache/lucene/pull/14447#issuecomment-2823829947

   > Sorry, I don't know if I was clear enough. I was asking if we can also 
revert the change to `Lucene101PostingsWriter` and only change 
`Lucene103PostingsWriter` this time.
   > 
   > Thanks for adding the changelog entry!
   
   My fault, I forgot to revert.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-04-23 Thread via GitHub


gf2121 commented on PR #14333:
URL: https://github.com/apache/lucene/pull/14333#issuecomment-2824644842

   Hi @stefanvodita 
   
   We plan to allow CI chew on this change for a couple of weeks and backport 
it if everything goes well. See 
https://github.com/apache/lucene/pull/14333#issuecomment-2789601238.
   
   When backporting, i'll try to catch all changes made to `Lucene103Codec`, 
including https://github.com/apache/lucene/pull/14447. So for now, feel free to 
just merge it into main :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-04-23 Thread via GitHub


stefanvodita commented on PR #14333:
URL: https://github.com/apache/lucene/pull/14333#issuecomment-2824654704

   You anticipated my concern with #14447 😄 
   Thank you for handling this!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Fix TestForTooMuchCloning [lucene]

2025-04-23 Thread via GitHub


gf2121 merged PR #14547:
URL: https://github.com/apache/lucene/pull/14547


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


jpountz commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2055940755


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {

Review Comment:
   Could this be a compile-time invariant instead of a runtime invariant, e.g. 
~~by tracking the 3 hints separately, or~~ by using something similar to 
Guava's ClassToInstanceMap instead of a Set?
   
   EDIT: lated read the explanation why it can't be 3 different hints



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


jpountz commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2055940755


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {

Review Comment:
   Could this be a compile-time invariant instead of a runtime invariant, e.g. 
by tracking the 3 hints separately, or by using something similar to Guava's 
ClassToInstanceMap instead of a Set?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] TestForTooMuchCloning: too many calls to IndexInput.clone during TermRangeQuery: 7 [lucene]

2025-04-23 Thread via GitHub


mikemccand commented on issue #14546:
URL: https://github.com/apache/lucene/issues/14546#issuecomment-2824159527

   +1 to bump the clone limit to 7 so the test passes again -- clone inflation!!
   
   `clone` is likely super cheap for `MMapDirectory`?  Hmm it is the JDK 
version specific `MemorySegmentIndexInput` (with `SingleSegmentImpl` subclass 
to optimize the common case ) that is actually cloned, and it's making a new 
class instance, setting a few members, making a shallow copy of the actual 
mapped segments array, and (hmm) calling `.toString()` on the to-be-cloned 
`IndexInput` heh (this just returns the `resourceDescription` member).  So yeah 
it looks quite cheap.
   
   Also, this clone cost (fixed, one-time cost per segment) is amortized away 
as the query + segment matches more and more hits.  So crazy fast queries 
matching very few hits are a wee bit slower and still crazy fast, and slow 
queries matching many hits are not affected in any meaningful way ...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


thecoop commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2055991981


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {

Review Comment:
   OK, after looking at it a bit, I've just moved the hint checking into 
`IOContext` in 
   ae1bc87. It does mean that no hint at all can have more than one instance as 
a hint though - but that might be ok?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Logic for collecting Histogram efficiently using Point Trees [lucene]

2025-04-23 Thread via GitHub


mikemccand commented on PR #14439:
URL: https://github.com/apache/lucene/pull/14439#issuecomment-2824226140

   This is a nice optimization, using points (if the user indexed them) to 
carefully optimize counting of ranges.
   
   > @stefanvodita - Thanks for a prompt review. Addressed most of the review 
comments. Adding JMH benchmark instead of the not so useful performance test 
added earlier. The benchark results demonstrate significant increase in 
throughput with increasing # documents and bucket width (lesser buckets mean 
less low level traversal in point range tree and more documents collected in 
bulk):
   
   Oooh those JMH benchy results are nice!  Though, it's dangerous testing only 
on random data -- you can draw random conclusions/results.  But it's better 
than no benchmark!  Maybe we should add histogram faceting benchy to Lucene's 
nightly benchmarks?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Logic for collecting Histogram efficiently using Point Trees [lucene]

2025-04-23 Thread via GitHub


jpountz commented on code in PR #14439:
URL: https://github.com/apache/lucene/pull/14439#discussion_r2056150570


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/plain/histograms/HistogramCollector.java:
##
@@ -53,11 +65,30 @@ public LeafCollector getLeafCollector(LeafReaderContext 
context) throws IOExcept
   // The segment has no values, nothing to do.
   throw new CollectionTerminatedException();
 }
+
+// We can use multi range traversal logic to collect the histogram on 
numeric
+// field indexed as point for MATCH_ALL cases. In future, this can be 
extended
+// for Point Range Query cases as well
+if (query instanceof MatchAllDocsQuery && context.reader().hasDeletions() 
== false) {

Review Comment:
   I'ml not sure if we're talking about the same case. You seem to be talking 
about the case when the histogram is computed on the same field as the range 
query. I was referring to the case when a query is not a `MatchAllDocsQuery` 
but still fully matches a segment, e.g. a `TermQuery` on a stop word like `a` 
or a wide `PointRangeQuery` (not necessarily on the same field as the 
histogram).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


thecoop commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2056382795


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {
+Map, List> 
hintClasses =
+
context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass));
+
+// there should only be one of FileType, FileData, DataAccess
+List fileTypes =
+hintClasses.getOrDefault(FileTypeHint.class, List.of());
+if (fileTypes.size() > 1) {
+  throw new IllegalArgumentException("Multiple file type hints specified: 
" + fileTypes);
+}
+List fileData = 
hintClasses.getOrDefault(FileDataHint.class, List.of());
+if (fileData.size() > 1) {
+  throw new IllegalArgumentException("Multiple file data hints specified: 
" + fileData);
+}
+List dataAccess =
+hintClasses.getOrDefault(DataAccessHint.class, List.of());
+if (dataAccess.size() > 1) {
+  throw new IllegalArgumentException("Multiple data access hints 
specified: " + dataAccess);
+}
+  }
+
+  protected ReadAdvice toReadAdvice(IOContext context) {

Review Comment:
   There's various options here allowing different ReadAdvice for compound 
directories - one is a subinterface of `IOContext` with an extra method to get 
file-specific hints that `CompoundDirectory` implementations would know how to 
access. Other options are available of course.
   
   This PR isn't the final implementation, there's going to be a few changes as 
the use of hints is expanded.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] TestForTooMuchCloning: too many calls to IndexInput.clone during TermRangeQuery: 7 [lucene]

2025-04-23 Thread via GitHub


gf2121 commented on issue #14546:
URL: https://github.com/apache/lucene/issues/14546#issuecomment-2823836350

   Thanks @ChrisHegarty for git bisect and @rmuir for the context.
   
   (I allowed myself to edit your comment to point to the correct PR.)
   
   I knew the new trie will clone the indexInput twice instead of FST only once 
so I changed the limitation from 5 to 6 per segment, while i did not expect it 
will clone 7 times so much. I printed out all the clone traces:
   
   1. at 
org.apache.lucene.codecs.lucene103.blocktree.FieldReader.newReader(FieldReader.java:92)
   2. at 
org.apache.lucene.codecs.lucene103.blocktree.TrieReader.(TrieReader.java:79)
   3. 
org.apache.lucene.codecs.lucene103.blocktree.IntersectTermsEnum.(IntersectTermsEnum.java:85)
   4. 
org.apache.lucene.codecs.lucene103.blocktree.FieldReader.newReader(FieldReader.java:92)
   5. 
org.apache.lucene.codecs.lucene103.blocktree.TrieReader.(TrieReader.java:79)
   6. at 
org.apache.lucene.codecs.lucene103.blocktree.SegmentTermsEnum.initIndexInput(SegmentTermsEnum.java:83)
   7. 
org.apache.lucene.codecs.lucene103.Lucene103PostingsReader$BlockPostingsEnum.reset(Lucene103PostingsReader.java:517)
   
   So the `RangeQuery` intersect with `IntersectTermsEnum` (3 times), then 
`seeExact` with `SegmentTermsEnum` (3 times), and get postings (1 times). The 
TrieReader is constructed twice so we cloned twice more.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] TestForTooMuchCloning: too many calls to IndexInput.clone during TermRangeQuery: 7 [lucene]

2025-04-23 Thread via GitHub


gf2121 commented on issue #14546:
URL: https://github.com/apache/lucene/issues/14546#issuecomment-2824072414

   As the the added clones are cloning index file `tip` of term dictionary, 
which is usually opened with mmap. I raised a fix 
https://github.com/apache/lucene/pull/14547 to simply increase the limit of 
clones per segment from 6 to 7. Feel free to beat me if we think it is too many.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


thecoop commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2055991981


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {

Review Comment:
   OK, after looking at it a bit, I've just moved the hint checking into 
`IOContext`. It does mean that no hint at all can have more than one instance 
as a hint though - but that might be ok?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add AnytimeRankingSearcher for SLA-Aware Early Termination with Bin-Based Score Boosting [lucene]

2025-04-23 Thread via GitHub


atris commented on PR #14525:
URL: https://github.com/apache/lucene/pull/14525#issuecomment-2824857619

   @jpountz Thanks! Here is the paper: https://arxiv.org/abs/2104.08976
   
   Note that the core inspiration of this PR's approach comes from the paper, 
but the implementation diverges in certain ways:
   
   The paper talks about using bins mainly for hard cutoffs and filtering. The 
PR, instead, uses bins to compute adaptive score boosts, and wire that directly 
using the new Collector.
   
   The PR also adds:
•   index-time graph-based binning (exact + approximate). This adds 
minimal indexing latency but gives significant improvements in search time.
•   bin-level boosting at segment level
   
   So while the high-level idea overlaps, the implementation is more ambitious 
and also opens doors for implementations like bin skipping and multiple fields 
support and then use graph intersection or fusion to identify documents that 
are strongly connected across multiple semantic dimensions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


thecoop commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2055991981


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {

Review Comment:
   OK, after looking at it a bit, I've just moved the hint checking into 
`IOContext` in 
   ae1bc87 in a type-independent way. It does mean that no hint at all can have 
more than one instance as a hint though - but that might be ok?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Compute the doc range more efficiently when flushing doc block [lucene]

2025-04-23 Thread via GitHub


stefanvodita merged PR #14447:
URL: https://github.com/apache/lucene/pull/14447


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


thecoop commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2055706972


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {

Review Comment:
   Or maybe add the validation to `BaseDirectoryWrapper`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Initial upgrade to Gradle 8.14 (rc2) [lucene]

2025-04-23 Thread via GitHub


dweiss commented on PR #14533:
URL: https://github.com/apache/lucene/pull/14533#issuecomment-2823770803

   I've no idea either. All I know is that if I download eclipse-jdt, it 
doesn't have support for Java 24 yet. There is a milestone version which 
supposedly has it - but not an official stable version, as far as I can see. 
And the maven repo definitely doesn't have the batch compiler supporting 
java24. That eclipse-jdls has a fork of jdt... which is also weird. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


jpountz commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2055959591


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {
+Map, List> 
hintClasses =
+
context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass));
+
+// there should only be one of FileType, FileData, DataAccess
+List fileTypes =
+hintClasses.getOrDefault(FileTypeHint.class, List.of());
+if (fileTypes.size() > 1) {
+  throw new IllegalArgumentException("Multiple file type hints specified: 
" + fileTypes);
+}
+List fileData = 
hintClasses.getOrDefault(FileDataHint.class, List.of());
+if (fileData.size() > 1) {
+  throw new IllegalArgumentException("Multiple file data hints specified: 
" + fileData);
+}
+List dataAccess =
+hintClasses.getOrDefault(DataAccessHint.class, List.of());
+if (dataAccess.size() > 1) {
+  throw new IllegalArgumentException("Multiple data access hints 
specified: " + dataAccess);
+}
+  }
+
+  protected ReadAdvice toReadAdvice(IOContext context) {

Review Comment:
   This may also help with making things work on compound files, it looks like 
it is not possible to customize the read advice of sub-files of a compound file 
with the current approach?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


jpountz commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2055961207


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {
+Map, List> 
hintClasses =
+
context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass));
+
+// there should only be one of FileType, FileData, DataAccess

Review Comment:
   Thanks, that makes sense.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] TestForTooMuchCloning: too many calls to IndexInput.clone during TermRangeQuery: 7 [lucene]

2025-04-23 Thread via GitHub


gf2121 closed issue #14546: TestForTooMuchCloning: too many calls to 
IndexInput.clone during TermRangeQuery: 7
URL: https://github.com/apache/lucene/issues/14546


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Initial upgrade to Gradle 8.14 (rc2) [lucene]

2025-04-23 Thread via GitHub


rmuir commented on PR #14533:
URL: https://github.com/apache/lucene/pull/14533#issuecomment-2823807465

   Yeah I think "big old eclipse" lags behind, whereas the jdtls is "the 
latest" and sees more usage (e.g. all VSCode users and other editors).
   
   There's even a plugin to allow use of jdtls in "big old eclipse": 
https://github.com/redhat-developer/eclipseide-jdtls/
   
   very confusing ecosystem, but that's my read on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Initial upgrade to Gradle 8.14 (rc2) [lucene]

2025-04-23 Thread via GitHub


dweiss commented on PR #14533:
URL: https://github.com/apache/lucene/pull/14533#issuecomment-2823791230

   Eclipse 4.36M1 seems to have Java 24 support in the batch compiler. This is 
marked as "stable" but artifacts for this release are not on maven central yet.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Initial upgrade to Gradle 8.14 (rc2) [lucene]

2025-04-23 Thread via GitHub


dweiss commented on PR #14533:
URL: https://github.com/apache/lucene/pull/14533#issuecomment-2823856413

   I believe they use p2 repositories for interim releases and only do a maven 
push once in a quarter for stable final releases. 
   
   If this becomes a problem, we can pull ecj directly from p2 drops, like here:
   
https://www.eclipse.org/downloads/download.php?file=/eclipse/downloads/drops4/S-4.36M1-202504031800/ecj-4.36M1.jar&r=1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Fix TestForTooMuchCloning [lucene]

2025-04-23 Thread via GitHub


gf2121 opened a new pull request, #14547:
URL: https://github.com/apache/lucene/pull/14547

   closes #14546


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


jpountz commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2055947695


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {
+Map, List> 
hintClasses =
+
context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass));
+
+// there should only be one of FileType, FileData, DataAccess
+List fileTypes =
+hintClasses.getOrDefault(FileTypeHint.class, List.of());
+if (fileTypes.size() > 1) {
+  throw new IllegalArgumentException("Multiple file type hints specified: 
" + fileTypes);
+}
+List fileData = 
hintClasses.getOrDefault(FileDataHint.class, List.of());
+if (fileData.size() > 1) {
+  throw new IllegalArgumentException("Multiple file data hints specified: 
" + fileData);
+}
+List dataAccess =
+hintClasses.getOrDefault(DataAccessHint.class, List.of());
+if (dataAccess.size() > 1) {
+  throw new IllegalArgumentException("Multiple data access hints 
specified: " + dataAccess);
+}
+  }
+
+  protected ReadAdvice toReadAdvice(IOContext context) {

Review Comment:
   Then can we move it to IOContext instead of Directory? `FilterDirectory` 
could customize the read advice by wrapping the IOContext that is passed to 
createOutput/openInput.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


thecoop commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2055950140


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {

Review Comment:
   I like that idea - so theres no actual logic involved, but you're only 
allowed one instance of each type in `IOContext`, whilst not being restrictive 
on how many hints can be specified.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Fix leadCost calculation in BooleanScorerSupplier.requiredBulkScorer [lucene]

2025-04-23 Thread via GitHub


benwtrent commented on PR #14543:
URL: https://github.com/apache/lucene/pull/14543#issuecomment-2824317132

   🤔 seems like this is another candidate for a 10.2.1 bugfix release. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-04-23 Thread via GitHub


stefanvodita commented on PR #14333:
URL: https://github.com/apache/lucene/pull/14333#issuecomment-2824596025

   If we think the issue last week was with the tests, should we go ahead and 
back-port this change?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Impl intoBitset for IndexedDISI and Docvalues [lucene]

2025-04-23 Thread via GitHub


gf2121 commented on code in PR #14529:
URL: https://github.com/apache/lucene/pull/14529#discussion_r2057617346


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java:
##
@@ -491,6 +492,14 @@ public int advance(int target) throws IOException {
 return doc;
   }
 
+  @Override
+  public void intoBitSet(int upTo, FixedBitSet bitSet, int offset) throws 
IOException {
+assert doc >= offset;

Review Comment:
   > (or maybe a better place for this would be asserting doc values
   
   +1
   
   > It looks like some implementations of intoBitsetWithinBlock try to cover 
the case when the current doc doesn't exist, maybe they can be simplified?
   
   Done :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] ci: bump actions/stale from 5.2.0 to 9.1.0 [lucene]

2025-04-23 Thread via GitHub


dweiss commented on PR #14498:
URL: https://github.com/apache/lucene/pull/14498#issuecomment-2823456560

   @stefanvodita - sorry for missing your comment, please do!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] deps(java): bump flexmark from 0.61.24 to 0.64.8 [lucene]

2025-04-23 Thread via GitHub


dweiss commented on PR #14499:
URL: https://github.com/apache/lucene/pull/14499#issuecomment-2823452992

   Documentation markdown-to-html dependency. Tested by comparing the output 
manually (no difference).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] deps(java): bump flexmark from 0.61.24 to 0.64.8 [lucene]

2025-04-23 Thread via GitHub


dweiss merged PR #14499:
URL: https://github.com/apache/lucene/pull/14499


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] deps(java): bump de.undercouch.download from 5.2.0 to 5.6.0 [lucene]

2025-04-23 Thread via GitHub


dweiss commented on PR #14503:
URL: https://github.com/apache/lucene/pull/14503#issuecomment-2823442570

   build script/ dataset download dependency.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[I] TestForTooMuchCloning: too many calls to IndexInput.clone during TermRangeQuery: 7 [lucene]

2025-04-23 Thread via GitHub


ChrisHegarty opened a new issue, #14546:
URL: https://github.com/apache/lucene/issues/14546

   ### Description
   
   ```
   TestForTooMuchCloning > test FAILED
   java.lang.AssertionError: too many calls to IndexInput.clone during 
TermRangeQuery: 7
   at 
__randomizedtesting.SeedInfo.seed([3B8D209D5C693E94:B3D91F47F295536C]:0)
   at org.junit.Assert.fail(Assert.java:89)
   at org.junit.Assert.assertTrue(Assert.java:42)
   at 
org.apache.lucene.index.TestForTooMuchCloning.test(TestForTooMuchCloning.java:88)
   at 
java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
   at java.base/java.lang.reflect.Method.invoke(Method.java:580)
   at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1763)
   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
   at 
org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
   at 
org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   at 
org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
   at 
org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   at 
org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
   at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
   at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
   at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
   at 
org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at 
org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
   at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at 
org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
   at 
org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   at 
org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   at 
org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   at 
org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
   at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
   at java.base/java.lang.Thread.run(Thread.java:1575)
   ```
   
   ### Gradle command to reproduce
   
   ./gradlew test --tests TestForTooMuchCloning.test 
-Dtests.seed=3B8D209D5C693E94 -Dtests.locale=kk-KZ 
-Dtests.timezone=America/Indiana/Marengo -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe

Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-23 Thread via GitHub


thecoop commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2055496607


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {

Review Comment:
   A more useful implementation can be found 
[here](https://github.com/apache/lucene/pull/14510/files#diff-a320c032354752cddf187b571d2832ca7a0dcfee5a1f3bda19578e2559338f8b).
 There are a couple of reasons for putting it here:
   
   - I wanted it on `Directory` so that `IOContext` is just a data transfer 
object, it is not opinionated on what it holds - `Directory` and subclasses 
contains all the logic for validating and using hints.
   - Overrides of `Directory` can override this to specify their own validation 
logic for any new hints they use themselves



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] ci: bump actions/stale from 5.2.0 to 9.1.0 [lucene]

2025-04-23 Thread via GitHub


stefanvodita commented on PR #14498:
URL: https://github.com/apache/lucene/pull/14498#issuecomment-2823576204

   Compare the [last run with 
v5](https://github.com/apache/lucene/actions/runs/14607185573/job/40978452884) 
with the [first run with 
v9](https://github.com/apache/lucene/actions/runs/14614033999/job/40998064219). 
The more recent run didn't encounter any new stale PRs, which is likely 
correct, and it is significantly more efficient than the older version, using 
less operations per PR and being able to get through all the PRs with budget to 
spare, while the older action could not.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] TestForTooMuchCloning: too many calls to IndexInput.clone during TermRangeQuery: 7 [lucene]

2025-04-23 Thread via GitHub


ChrisHegarty commented on issue #14546:
URL: https://github.com/apache/lucene/issues/14546#issuecomment-2823593983

   git bisect tells me that this commit is the culprit #14380. /cc @gf2121 
   
   ```
   878e23f77390058b6787e0a0537de9087d1019b3 is the first bad commit
   commit 878e23f77390058b6787e0a0537de9087d1019b3
   Author: Guo Feng <52390227+gf2...@users.noreply.github.com>
   Date:   Mon Apr 14 23:19:04 2025 +0800
   
   A specialized Trie for Block Tree Index (#14380)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Initial upgrade to Gradle 8.14 (rc2) [lucene]

2025-04-23 Thread via GitHub


rmuir commented on PR #14533:
URL: https://github.com/apache/lucene/pull/14533#issuecomment-2823604993

   I'm using https://github.com/eclipse-jdtls/eclipse.jdt.ls which, is just 
this same eclipse stuff packaged in a different way?
   
   ![Screen_Shot_2025-04-23_at_05 09 
47](https://github.com/user-attachments/assets/0519eaed-982f-46ae-813a-6ba31c3c7d19)
   
   It had the java 24 support like a week after the java 24 release. Sorry, I 
don't understand eclipse ecosystem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Impl intoBitset for IndexedDISI and Docvalues [lucene]

2025-04-23 Thread via GitHub


gf2121 commented on code in PR #14529:
URL: https://github.com/apache/lucene/pull/14529#discussion_r2057599583


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java:
##
@@ -625,8 +635,36 @@ boolean advanceExactWithinBlock(IndexedDISI disi, int 
target) throws IOException
 disi.exists = false;
 return false;
   }
+
+  @Override
+  boolean intoBitsetWithinBlock(IndexedDISI disi, int upTo, FixedBitSet 
bitSet, int offset)
+  throws IOException {
+if (disi.exists) {
+  if (disi.doc >= upTo) {
+return true;
+  }
+  bitSet.set(disi.doc - offset);
+}
+
+for (int i = disi.index, to = disi.nextBlockIndex; i < to; i++) {

Review Comment:
   I changed this to make it clearer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Impl intoBitset for IndexedDISI and Docvalues [lucene]

2025-04-23 Thread via GitHub


gf2121 commented on code in PR #14529:
URL: https://github.com/apache/lucene/pull/14529#discussion_r2057680596


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java:
##
@@ -625,6 +634,29 @@ boolean advanceExactWithinBlock(IndexedDISI disi, int 
target) throws IOException
 disi.exists = false;
 return false;
   }
+
+  @Override
+  boolean intoBitsetWithinBlock(IndexedDISI disi, int upTo, FixedBitSet 
bitSet, int offset)
+  throws IOException {
+if (disi.doc >= upTo) {
+  return true;
+}
+bitSet.set(disi.doc - offset);
+
+for (int i = disi.index + 1, to = disi.nextBlockIndex; i <= to; i++) {

Review Comment:
   In `advanceWithinBlock`, `disi.index++` is executed in the loop body before 
`if (doc >= targetInBlock)` so it is not actually excluded. 
`advanceWithinBlock` can return true when `disi.index` equals 
`disi.nextBlockIndex`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Impl intoBitset for IndexedDISI and Docvalues [lucene]

2025-04-23 Thread via GitHub


jpountz commented on code in PR #14529:
URL: https://github.com/apache/lucene/pull/14529#discussion_r2057662695


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java:
##
@@ -491,6 +492,14 @@ public int advance(int target) throws IOException {
 return doc;
   }
 
+  @Override
+  public void intoBitSet(int upTo, FixedBitSet bitSet, int offset) throws 
IOException {
+assert doc >= offset;
+while (method.intoBitsetWithinBlock(this, upTo, bitSet, offset) == false) {
+  readBlockHeader();

Review Comment:
   I wonder if we should call something like below after reading the block 
header (similar to what `advance(target)` does):
   
   ```
   boolean found = method.advanceWithinBlock(this, block);
   assert found;
   ```
   This would guarantee that `disi.doc` is always a doc ID of the block when 
`intoBitsetWithinBlock` is called, which could help simplify some 
implementations of `intoBitsetWithinBlock`? (I'm looking at `DENSE` in 
particular)



##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java:
##
@@ -625,6 +634,29 @@ boolean advanceExactWithinBlock(IndexedDISI disi, int 
target) throws IOException
 disi.exists = false;
 return false;
   }
+
+  @Override
+  boolean intoBitsetWithinBlock(IndexedDISI disi, int upTo, FixedBitSet 
bitSet, int offset)
+  throws IOException {
+if (disi.doc >= upTo) {
+  return true;
+}
+bitSet.set(disi.doc - offset);
+
+for (int i = disi.index + 1, to = disi.nextBlockIndex; i <= to; i++) {

Review Comment:
   I'm still a bit confused, why is `disi.nextBlockIndex` included in the loop 
when it's excluded in `advanceWithinBlock`?



##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java:
##
@@ -720,6 +805,15 @@ boolean advanceExactWithinBlock(IndexedDISI disi, int 
target) {
  * return whether this document exists.
  */
 abstract boolean advanceExactWithinBlock(IndexedDISI disi, int target) 
throws IOException;
+
+/**
+ * Similar to {@link DocIdSetIterator#intoBitSet}, load docs in this block 
into a bitset. This
+ * mehtod return true if there are remaining docs (gte upTo) in the block, 
otherwise false. When
+ * false return, fp of disi#slice is at blockEnd and disi#index is correct 
but other status vars
+ * are undefined. Caller should decode the header of next block by {@link 
#readBlockHeader()}.
+ */
+abstract boolean intoBitsetWithinBlock(

Review Comment:
   Nit: for consistency with how it's called on `DocIdSetIterator`.
   
   ```suggestion
   abstract boolean intoBitSetWithinBlock(
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use a non-deprecated assertThat, and change several test assertions to use assertThat [lucene]

2025-04-23 Thread via GitHub


dweiss merged PR #14518:
URL: https://github.com/apache/lucene/pull/14518


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Impl intoBitset for IndexedDISI and Docvalues [lucene]

2025-04-23 Thread via GitHub


gf2121 commented on code in PR #14529:
URL: https://github.com/apache/lucene/pull/14529#discussion_r2057597668


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java:
##
@@ -535,6 +544,7 @@ private void readBlockHeader() throws IOException {
   method = Method.SPARSE;
   blockEnd = slice.getFilePointer() + (numValues << 1);
   nextExistDocInBlock = -1;
+  exists = false;

Review Comment:
   This is to make `intoBitset` work with `advanceExact`, which is not really 
needed. I removed this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Logic for collecting Histogram efficiently using Point Trees [lucene]

2025-04-23 Thread via GitHub


jainankitk commented on PR #14439:
URL: https://github.com/apache/lucene/pull/14439#issuecomment-2825127094

> Oooh those JMH benchy results are nice! Though, it's dangerous testing 
only on random data -- you can draw random conclusions/results. But it's better 
than no benchmark! Maybe we should add histogram faceting benchy to Lucene's 
nightly benchmarks?
   
   Thanks @mikemccand for the feedback. Having histogram faceting benchmark in 
Lucene's nightly will be great! Created issue 
https://github.com/mikemccand/luceneutil/issues/375 for following up on this.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Avoid reload block when seeking backward in SegmentTermsEnum. [lucene]

2025-04-23 Thread via GitHub


github-actions[bot] commented on PR #13253:
URL: https://github.com/apache/lucene/pull/13253#issuecomment-2825839442

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Impl intoBitset for IndexedDISI and Docvalues [lucene]

2025-04-23 Thread via GitHub


jpountz commented on code in PR #14529:
URL: https://github.com/apache/lucene/pull/14529#discussion_r2056179949


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java:
##
@@ -535,6 +544,7 @@ private void readBlockHeader() throws IOException {
   method = Method.SPARSE;
   blockEnd = slice.getFilePointer() + (numValues << 1);
   nextExistDocInBlock = -1;
+  exists = false;

Review Comment:
   Is this a pre-existing bug?



##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java:
##
@@ -625,8 +635,36 @@ boolean advanceExactWithinBlock(IndexedDISI disi, int 
target) throws IOException
 disi.exists = false;
 return false;
   }
+
+  @Override
+  boolean intoBitsetWithinBlock(IndexedDISI disi, int upTo, FixedBitSet 
bitSet, int offset)
+  throws IOException {
+if (disi.exists) {
+  if (disi.doc >= upTo) {
+return true;
+  }
+  bitSet.set(disi.doc - offset);
+}
+
+for (int i = disi.index, to = disi.nextBlockIndex; i < to; i++) {

Review Comment:
   I am a bit surprised that this loop starts at `disi.index` instead of 
`disi.index + 1` since the doc at `disi.index` was already handled above?



##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java:
##
@@ -491,6 +492,14 @@ public int advance(int target) throws IOException {
 return doc;
   }
 
+  @Override
+  public void intoBitSet(int upTo, FixedBitSet bitSet, int offset) throws 
IOException {
+assert doc >= offset;

Review Comment:
   Can you also assert something like `doc == NOT_MORE_DOCS || 
advanceExact(doc)`? It shouldn't be legal to call intoBitSet after advanceExact 
returns false (or maybe a better place for this would be asserting doc values).
   
   It looks like some implementations of `intoBitsetWithinBlock` try to cover 
the case when the current doc doesn't exist, maybe they can be simplified?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Run a workflow on dependabot's PRs (gradlew updateLicenses writeLocks) [lucene]

2025-04-23 Thread via GitHub


dweiss commented on issue #14506:
URL: https://github.com/apache/lucene/issues/14506#issuecomment-2825119807

   Yes, there is no rush with this. I think the workflows do need some love but 
it's not really a high priority. I'm chipping at cleaning up gradle build to 
speed it up. It's a slow process.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Logic for collecting Histogram efficiently using Point Trees [lucene]

2025-04-23 Thread via GitHub


jainankitk commented on code in PR #14439:
URL: https://github.com/apache/lucene/pull/14439#discussion_r2056830312


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/plain/histograms/HistogramCollector.java:
##
@@ -53,11 +63,31 @@ public LeafCollector getLeafCollector(LeafReaderContext 
context) throws IOExcept
   // The segment has no values, nothing to do.
   throw new CollectionTerminatedException();
 }
+
+// We can use multi range traversal logic to collect the histogram on 
numeric
+// field indexed as point for MATCH_ALL cases. In future, this can be 
extended
+// for Point Range Query cases as well
+if (context.reader().hasDeletions() == false

Review Comment:
   Ah, nice!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org