[I] java.lang.AssertionError in backward compat tests ("failed to parse ... as date") [lucene]

2024-02-05 Thread via GitHub


dweiss opened a new issue, #13073:
URL: https://github.com/apache/lucene/issues/13073

   ### Description
   
   These failures do reproduce but you need the linedocsfile. I'll take a look.
   
   ### Version and environment details
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] java.lang.AssertionError in backward compat tests ("failed to parse ... as date") [lucene]

2024-02-05 Thread via GitHub


dweiss commented on issue #13073:
URL: https://github.com/apache/lucene/issues/13073#issuecomment-1926587718

   The embedded "linedocsfile" (europarl) has a different date field format 
compared to the "large" enwiki used on jenkins. 
   ```
   (1)  2004-03-30  Istituzioni europee proteggerlo in tutti i campi.   
[...]
   ```
   ```
   APL (codepage)   11-JAN-2010 16:54:37.000 languages.  From a 
user's standpoint
   ```
   
   After #13046, TestIndexSortBackwardsCompatibility tries to parse the date 
and fails because it only recognizes "-MM-dd". 
   
   I wonder what the best fix here should be - make the test recognize both 
date format or correct the test-resource-creation script and make dates there 
more consistent?
   
   I think correcting the test will be simpler... 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]

2024-02-05 Thread via GitHub


vsop-479 commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-1926589435

   @jpountz 
   Can we push on this change by checking whether our test case has covered all 
the status, that `TermsEnum.seekExact` or  `TermsEnum.seekCeil` may emit?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make date parsing more flexible for linedocsfile (europarl, enwiki) [lucene]

2024-02-05 Thread via GitHub


dweiss commented on PR #13075:
URL: https://github.com/apache/lucene/pull/13075#issuecomment-1926668465

   This makes date parsing accept both europarl and enwiki. The tests still 
assume the docs come from europarl though and fail on the large 
enwiki.random.lines.txt:
   ```
   gradlew -p lucene\backward-codecs test 
-Dtests.linedocsfile=/../enwiki.random.lines.txt
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] java.lang.AssertionError in backward compat tests ("failed to parse ... as date") [lucene]

2024-02-05 Thread via GitHub


dweiss commented on issue #13073:
URL: https://github.com/apache/lucene/issues/13073#issuecomment-1926670843

   @s1monw - some of those assertions added in #13046 only hold for the 
built-in europarl. I fixed date parsing but I'm not sure how to deal with the 
problem that the line resource can be different. Thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] java.lang.AssertionError in backward compat tests ("failed to parse ... as date") [lucene]

2024-02-05 Thread via GitHub


dweiss commented on issue #13073:
URL: https://github.com/apache/lucene/issues/13073#issuecomment-1926679038

   There was actually just a single assertion that caused the problems. I've 
just remove it since it seems to be duplicated anyway with a term (body:the) 
that exists in both data sets. There is a marginal possibility of these tests 
failing in the future if the test resources change. Don't know if it's worth 
considering.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make date parsing more flexible for linedocsfile (europarl, enwiki) [lucene]

2024-02-05 Thread via GitHub


uschindler commented on code in PR #13075:
URL: https://github.com/apache/lucene/pull/13075#discussion_r1478028235


##
lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestIndexSortBackwardsCompatibility.java:
##
@@ -147,6 +150,36 @@ public void testSortedIndex() throws Exception {
 }
   }
 
+  /**
+   * Converts date formats for europarl ("2023-02-23") and enwiki 
("12-JAN-2010 12:32:45.000") into
+   * {@link LocalDateTime}.
+   */
+  static Function DATE_STRING_TO_LOCALDATETIME =
+  new Function<>() {
+final DateTimeFormatter euroParl =
+new DateTimeFormatterBuilder()
+.parseStrict()
+.parseCaseInsensitive()
+.appendPattern("-MM-dd")
+.toFormatter(Locale.ROOT);
+
+final DateTimeFormatter enwiki =
+new DateTimeFormatterBuilder()
+.parseStrict()
+.parseCaseInsensitive()
+.appendPattern("dd-MMM- HH:mm:ss['.'SSS]")
+.toFormatter(Locale.ROOT);
+
+@Override
+public LocalDateTime apply(String s) {
+  if (s.matches("^[0-9]{4}-[0-9]{2}-[0-9]{2}$")) {
+return euroParl.parse(s, 
LocalDate::from).atTime(LocalTime.MIDNIGHT);

Review Comment:
   `.atStartOfDay()` would be shorter.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make date parsing more flexible for linedocsfile (europarl, enwiki) [lucene]

2024-02-05 Thread via GitHub


uschindler commented on PR #13075:
URL: https://github.com/apache/lucene/pull/13075#issuecomment-1926758404

   P.S.: Maybe add a quick test for the parsing logic to check that both 
formats are accepted.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make date parsing more flexible for linedocsfile (europarl, enwiki) [lucene]

2024-02-05 Thread via GitHub


dweiss commented on PR #13075:
URL: https://github.com/apache/lucene/pull/13075#issuecomment-1926938848

   > Looks fine. Why did you create a `Function` for 
parsing instead of a simple static method? I think you did this to hide the 
formatter instances?
   
   Yes, correct.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Adding binary Hamming distance as similarity option for byte vectors [lucene]

2024-02-05 Thread via GitHub


pmpailis opened a new pull request, #13076:
URL: https://github.com/apache/lucene/pull/13076

   
   This PR adds support for binary Hamming distance as a similarity metric for 
byte vectors. The drive behind this is that there is an increasing interest in 
applying hashing techniques for embeddings (both in text & image based search 
applications e.g. [Binary Passage Retriever](https://arxiv.org/abs/2106.00882)) 
due to their much reduced size & performance gains that one might get. A 
natural way to compare binary vectors is to use Hamming distance by computing 
the population count of the XOR result of two embeddings, i.e. count how many 
different bits exist in the two binary vectors. 
   
   In Lucene, we can leverage the existing `byte[]` support, and use that to 
store the binary vectors. The size of the byte vectors would be `d / 8` or `(d 
/ 8) + 1` if `d % 8 > 0`, where d is the dimension of a binary vector. So for 
example, a binary vector of 64 bits, could use a `KnnByteVectorField` with 
`vectorDimension=8`. However, this transformation is currently outside of the 
scope of this PR. 
   
   To compute the Hamming distance, since we have a bounded pool of values 
ranging from `Byte.MIN_VALUE` to `Byte.MAX_VALUE`, this PR makes use of a look 
up table to retrieve the appropriate population count. Similarly, for the 
Panama implementation, we rely on the approach discussed 
[here](https://arxiv.org/abs/1611.07612) to compute the population count of low 
& high bits of the vectors' XOR result using a look-up table as well. 
   
   To convert the computed distance to a similarity score we finaly normalize 
through `1 / (1 + hamming_distance)`
   
   
   Benchmarks for the scalar & vectorized implementations running on my M2 Pro 
(Neon) dev machine: 
   ```
   Benchmark(size)   Mode  CntScore 
  Error   Units
   VectorUtilBenchmark.binaryHammingDistanceScalar   1  thrpt   15  514.613 
± 6.496  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar   8  thrpt   15  216.716 
± 1.682  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar  16  thrpt   15  135.528 
± 1.606  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar  32  thrpt   15   76.745 
± 0.654  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar  50  thrpt   15   52.226 
± 0.444  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar  64  thrpt   15   41.246 
± 0.139  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar 100  thrpt   15   29.119 
± 0.086  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar 128  thrpt   15   22.639 
± 0.138  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar 207  thrpt   15   14.382 
± 0.058  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar 256  thrpt   15   11.813 
± 0.060  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar 300  thrpt   15   10.253 
± 0.257  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar 512  thrpt   156.145 
± 0.015  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar 702  thrpt   154.461 
± 0.133  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar1024  thrpt   153.091 
± 0.003  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector   1  thrpt   15  499.861 
± 0.476  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector   8  thrpt   15  191.430 
± 3.243  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector  16  thrpt   15  298.697 
± 4.448  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector  32  thrpt   15  222.700 
± 5.461  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector  50  thrpt   15  129.853 
± 0.325  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector  64  thrpt   15  156.657 
± 4.337  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector 100  thrpt   15   83.879 
± 1.864  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector 128  thrpt   15   88.028 
± 2.156  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector 207  thrpt   15   43.573 
± 1.085  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector 256  thrpt   15   49.415 
± 0.865  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector 300  thrpt   15   34.535 
± 0.408  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector 512  thrpt   15   25.953 
± 0.197  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector 702  thrpt   15   17.483 
± 0.033  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector1024  thrpt   15   13.305 
± 0.085  ops/us
   ```
   
   `BINARY_HAMMING_DISTANCE` is currently only supported for `byte[]` vectors 
so I had to slightly refactor some tests to distinguish between float and byte 
versions of knn-search. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mai

Re: [PR] clean up smoketester GPG leaks [lucene]

2024-02-05 Thread via GitHub


hurutoriya commented on PR #12882:
URL: https://github.com/apache/lucene/pull/12882#issuecomment-1926941192

   @janhoy Thank you for suggestion. I've never do the QA. 
   
   > If not, we need to QA that the script is not broken by this change.
   
   OK, let me try the QA 🙏 .
   How should I do for QA? (I will try understand a concrete step)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make date parsing more flexible for linedocsfile (europarl, enwiki) [lucene]

2024-02-05 Thread via GitHub


dweiss commented on PR #13075:
URL: https://github.com/apache/lucene/pull/13075#issuecomment-1926946487

   > P.S.: Maybe add a quick test for the parsing logic to check that both 
formats are accepted.
   
   Maybe it'd be better to move this logic into LineFileDocs so that the date 
field's value is normalized somehow? Then we can add TestLineFileDocs and test 
it there. Adding a test method to TestIndexSortBackwardsCompatibility will 
multiply it over all the parameters, which doesn't make sense.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-02-05 Thread via GitHub


benwtrent commented on code in PR #12962:
URL: https://github.com/apache/lucene/pull/12962#discussion_r1478196742


##
lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenByteKnnVectorQuery.java:
##
@@ -24,15 +24,8 @@
 import org.apache.lucene.index.LeafReaderContext;
 import org.apache.lucene.index.VectorEncoding;
 import org.apache.lucene.index.VectorSimilarityFunction;
-import org.apache.lucene.search.DocIdSetIterator;
-import org.apache.lucene.search.HitQueue;
-import org.apache.lucene.search.KnnByteVectorQuery;
-import org.apache.lucene.search.KnnCollector;
-import org.apache.lucene.search.Query;
-import org.apache.lucene.search.ScoreDoc;
-import org.apache.lucene.search.TopDocs;
-import org.apache.lucene.search.TopDocsCollector;
-import org.apache.lucene.search.TotalHits;
+import org.apache.lucene.search.*;

Review Comment:
   Don't think we want `*` here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make date parsing more flexible for linedocsfile (europarl, enwiki) [lucene]

2024-02-05 Thread via GitHub


dweiss commented on PR #13075:
URL: https://github.com/apache/lucene/pull/13075#issuecomment-1927005366

   I've moved that utility function to LineFileDocs and added a basic test case 
to TestLineFileDocs (good idea). I feel tempted to normalize the date field's 
value in LineFileDocs but this may break other stuff so better to leave it for 
a follow-up issue. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Enable parent field in sorted bwc tests [lucene]

2024-02-05 Thread via GitHub


jpountz commented on code in PR #13067:
URL: https://github.com/apache/lucene/pull/13067#discussion_r1478307290


##
lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestIndexSortBackwardsCompatibility.java:
##
@@ -82,14 +84,20 @@ public void testSortedIndexAddDocBlocks() throws Exception {
 .setOpenMode(IndexWriterConfig.OpenMode.APPEND)
 .setIndexSort(sort)
 .setMergePolicy(newLogMergePolicy());
+if (this.version.onOrAfter(FIRST_PARENT_DOC_VERSION)) {
+  indexWriterConfig.setParentField(PARENT_FIELD_NAME);
+}
 // open writer
 try (IndexWriter writer = new IndexWriter(directory, indexWriterConfig)) {
   // add 10 docs
   for (int i = 0; i < 10; i++) {
 Document child = new Document();
 child.add(new StringField("relation", "child", Field.Store.NO));
 child.add(new StringField("bid", "" + i, Field.Store.NO));
-child.add(new NumericDocValuesField("dateDV", i));
+if (version.onOrAfter(FIRST_PARENT_DOC_VERSION)
+== false) { // only add this to earlier versions
+  child.add(new NumericDocValuesField("dateDV", i));
+}

Review Comment:
   Why do we need this change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Adding binary Hamming distance as similarity option for byte vectors [lucene]

2024-02-05 Thread via GitHub


benwtrent commented on code in PR #13076:
URL: https://github.com/apache/lucene/pull/13076#discussion_r1478274441


##
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##
@@ -214,4 +214,11 @@ public static float[] checkFinite(float[] v) {
 }
 return v;
   }
+
+  public static float binaryHammingDistance(byte[] a, byte[] b) {
+if (a.length != b.length) {
+  throw new IllegalArgumentException("vector dimensions differ: " + 
a.length + "!=" + b.length);
+}
+return 1f / (1 + IMPL.binaryHammingDistance(a, b));

Review Comment:
   This should return `IMPL.binaryHammingDistance(a, b)`. 
   
   The user's of this function will have to transform the distance to a score 
separately.



##
lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java:
##
@@ -116,4 +143,14 @@ public float compare(byte[] v1, byte[] v2) {
* @return the value of the similarity function applied to the two vectors
*/
   public abstract float compare(byte[] v1, byte[] v2);
+
+  /**
+   * Defines which encodings are supported by the similarity function - used 
in tests to control
+   * randomization
+   *
+   * @return a list of all supported VectorEncodings for the given similarity
+   */
+  public Set supportedVectorEncodings() {
+return EnumSet.of(VectorEncoding.BYTE, VectorEncoding.FLOAT32);
+  }

Review Comment:
   Do we do anything with the set except check membership? I think a simple 
`bool supportsVectorEncoding(VectorEncoding)` function would be preferred.



##
lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java:
##
@@ -94,6 +98,29 @@ public float compare(float[] v1, float[] v2) {
 public float compare(byte[] v1, byte[] v2) {
   return scaleMaxInnerProductScore(dotProduct(v1, v2));
 }
+  },
+  /**
+   * Binary Hamming distance; Computes how many bits are different in two 
bytes.
+   *
+   * Only supported for bytes. To convert the distance to a similarity 
score we normalize using 1
+   * / (1 + hammingDistance)
+   */
+  BINARY_HAMMING_DISTANCE {
+@Override
+public float compare(float[] v1, float[] v2) {
+  throw new UnsupportedOperationException(
+  BINARY_HAMMING_DISTANCE.name() + " is only supported for byte 
vectors");
+}
+
+@Override
+public float compare(byte[] v1, byte[] v2) {
+  return binaryHammingDistance(v1, v2);

Review Comment:
   The score transformation for the distance should be here. Note 
`squareDistance`



##
lucene/core/src/test/org/apache/lucene/index/KnnGraphTestCase.java:
##
@@ -54,35 +55,65 @@
 import org.apache.lucene.util.Bits;
 import org.apache.lucene.util.BytesRef;
 import org.apache.lucene.util.IOUtils;
-import org.apache.lucene.util.VectorUtil;
 import org.apache.lucene.util.hnsw.HnswGraph;
-import org.apache.lucene.util.hnsw.HnswGraph.NodesIterator;
 import org.apache.lucene.util.hnsw.HnswGraphBuilder;
 import org.junit.After;
 import org.junit.Before;
 
 /** Tests indexing of a knn-graph */
-public class TestKnnGraph extends LuceneTestCase {

Review Comment:
   This is a fairly large refactor adding test coverage, but I think it 
detracts from this PR. Could you change all this back and restrict the 
similarity function when using floats?
   
   There are so many abstract functions, etc. it seems like this entire test 
case should be rewritten from the ground up if we were to 100% test byte vs. 
float for it.



##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizedVectorSimilarity.java:
##
@@ -40,6 +40,8 @@ static ScalarQuantizedVectorSimilarity fromVectorSimilarity(
   case EUCLIDEAN -> new Euclidean(constMultiplier);
   case COSINE, DOT_PRODUCT -> new DotProduct(constMultiplier);
   case MAXIMUM_INNER_PRODUCT -> new MaximumInnerProduct(constMultiplier);
+  case BINARY_HAMMING_DISTANCE -> throw new IllegalArgumentException(
+  "Cannot use Hamming distance with scalar quantization");
 };

Review Comment:
   It would be good if the error message had the literal 
`BINARY_HAMMING_DISTANCE` string to indicate the similarity enumeration we 
failed at.



##
lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java:
##
@@ -94,6 +98,29 @@ public float compare(float[] v1, float[] v2) {
 public float compare(byte[] v1, byte[] v2) {
   return scaleMaxInnerProductScore(dotProduct(v1, v2));
 }
+  },
+  /**
+   * Binary Hamming distance; Computes how many bits are different in two 
bytes.
+   *
+   * Only supported for bytes. To convert the distance to a similarity 
score we normalize using 1
+   * / (1 + hammingDistance)
+   */
+  BINARY_HAMMING_DISTANCE {
+@Override
+public float compare(float[] v1, float[] v2) {
+  throw new UnsupportedOperationException(
+  BINARY_HAMMING_DISTANCE.name() + " is only supported for byte 
vectors");
+}
+
+@Override
+public float compare(byte[] v1, byte[

Re: [PR] Adding binary Hamming distance as similarity option for byte vectors [lucene]

2024-02-05 Thread via GitHub


benwtrent commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1927115056

   @pmpailis could you also push a `CHANGES.txt` update? It is would be under 
`New Features` for `Lucene 9.10.0`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Bump release to Java 21 [lucene]

2024-02-05 Thread via GitHub


mikemccand commented on PR #12753:
URL: https://github.com/apache/lucene/pull/12753#issuecomment-1927152977

   Is this ready to go?  Thank you for all the hard work here @ChrisHegarty and 
@rmuir!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Adding binary Hamming distance as similarity option for byte vectors [lucene]

2024-02-05 Thread via GitHub


rmuir commented on code in PR #13076:
URL: https://github.com/apache/lucene/pull/13076#discussion_r1478382295


##
lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##
@@ -576,4 +578,114 @@ private int squareDistanceBody128(byte[] a, byte[] b, int 
limit) {
 // reduce
 return acc1.add(acc2).reduceLanes(ADD);
   }
+
+  private static final ByteVector lookup128 =
+  ByteVector.fromArray(
+  ByteVector.SPECIES_128,
+  new byte[] {
+0, 1, 1, 2, 1, 2, 2, 3,
+1, 2, 2, 3, 2, 3, 3, 4,
+  },
+  0);
+
+  private static final ByteVector lookup256 =
+  (ByteVector) lookup128.castShape(ByteVector.SPECIES_256, 0);
+  private static final ByteVector lookup512 =
+  (ByteVector) lookup128.castShape(ByteVector.SPECIES_512, 0);
+
+  @Override
+  public int binaryHammingDistance(byte[] a, byte[] b) {
+int res = 0;
+int i = 0;
+if (a.length >= 16) {
+  if (VECTOR_BITSIZE >= 512) {
+i += ByteVector.SPECIES_512.loopBound(a.length);
+res += binaryHammingDistanceBody512(a, b, i);
+  } else if (VECTOR_BITSIZE == 256) {
+i += ByteVector.SPECIES_256.loopBound(a.length);
+res += binaryHammingDistanceBody256(a, b, i);
+  } else {
+i += ByteVector.SPECIES_128.loopBound(a.length);
+res += binaryHammingDistanceBody128(a, b, i);
+  }
+}
+
+// scalar tail
+for (; i < a.length; i++) {
+  res += HAMMING_DISTANCE_LOOKUP_TABLE[(a[i] ^ b[i]) & 0xFF];
+}
+return res;
+  }
+
+  private int binaryHammingDistanceBody512(byte[] a, byte[] b, int limit) {
+int res = 0;
+for (int i = 0; i < limit; i += ByteVector.SPECIES_512.length()) {
+  ByteVector bva64 = ByteVector.fromArray(ByteVector.SPECIES_512, a, i);
+  ByteVector bvb64 = ByteVector.fromArray(ByteVector.SPECIES_512, b, i);
+  ByteVector xor64 = bva64.lanewise(XOR, bvb64);
+
+  ByteVector low_mask = ByteVector.broadcast(ByteVector.SPECIES_512, 0x0f);
+  ByteVector low_bits = xor64.and(low_mask);
+  ByteVector high_bits = xor64.lanewise(LSHR, 4).and(low_mask);
+
+  var popcnt1 = lookup512.rearrange(low_bits.toShuffle());
+  var popcnt2 = lookup512.rearrange(high_bits.toShuffle());
+
+  var total = popcnt1.add(popcnt2);
+
+  // Need to break up the total ByteVector as the result might not
+  // fit in a byte
+  var acc1 = total.castShape(ShortVector.SPECIES_512, 0);
+  var acc2 = total.castShape(ShortVector.SPECIES_512, 1);

Review Comment:
   Vector castShape() with part number > 0 really needs to be avoided. It is 
incredibly slow. Have you benchmarked non-mac machines with 256 or 512-bit 
vectors?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Compute multiple float aggregations in one go [lucene]

2024-02-05 Thread via GitHub


mikemccand commented on code in PR #12547:
URL: https://github.com/apache/lucene/pull/12547#discussion_r1478386039


##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FloatTaxonomyFacets.java:
##
@@ -37,33 +37,43 @@ abstract class FloatTaxonomyFacets extends TaxonomyFacets {
 
   // TODO: also use native hash map for sparse collection, like 
IntTaxonomyFacets
 
-  /** Aggregation function used for combining values. */
-  final AssociationAggregationFunction aggregationFunction;
+  /** Aggregation functions used for combining values. */
+  final List aggregationFunctions;
 
   /** Per-ordinal value. */
-  float[] values;
+  float[][] values;
+
+  @Override
+  boolean hasValues() {
+return values != null;
+  }
+
+  void initializeValueCounters() {
+if (values == null) {
+  values = new float[aggregationFunctions.size()][taxoReader.getSize()];

Review Comment:
   Maybe open a spinoff to implement sparse values for `FloatTaxonomyFacets`?  
But let's not block this otherwise great change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Adding binary Hamming distance as similarity option for byte vectors [lucene]

2024-02-05 Thread via GitHub


rmuir commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1927213508

   I'm confused about the use of lookup table. naively, i'd try to just xor + 
popcnt:
   
   
https://docs.oracle.com/en/java/javase/21/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorOperators.html#XOR
   
https://docs.oracle.com/en/java/javase/21/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorOperators.html#BIT_COUNT
   
   I'm curious if any explicit vector code is needed actually at all. 
Integer.bitCount() has autovectorization support in hotspot.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Adding binary Hamming distance as similarity option for byte vectors [lucene]

2024-02-05 Thread via GitHub


rmuir commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1927228480

   even if it doesn't autovectorize, i suspect just gathering e.g. 4/8 bytes at 
a time with BitUtil varhandle and using single int/long xor + popcount would 
perform very well as a baseline.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Bump release to Java 21 [lucene]

2024-02-05 Thread via GitHub


rmuir commented on PR #12753:
URL: https://github.com/apache/lucene/pull/12753#issuecomment-1927294356

   @mikemccand I think it is heading in the right direction. There are a few 
more tasks to do here I think though, e.g. we still need to update 
`releaseWizard.py` and `smokeTestRelease.py`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make FSTCompiler.compile() to only return the FSTMetadata [lucene]

2024-02-05 Thread via GitHub


mikemccand commented on PR #12831:
URL: https://github.com/apache/lucene/pull/12831#issuecomment-1927303409

   This is technically an API break, but `FSTCompiler` is an experimental API 
and effectively an internal Lucene datastructure, so I think we can safely 
backport to 9.x without deprecation path.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make FSTCompiler.compile() to only return the FSTMetadata [lucene]

2024-02-05 Thread via GitHub


mikemccand merged PR #12831:
URL: https://github.com/apache/lucene/pull/12831


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Adding binary Hamming distance as similarity option for byte vectors [lucene]

2024-02-05 Thread via GitHub


uschindler commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1927329824

   Hi,
   I don't want to discuss about sense/nonsense of this disatance, but the 
implementation could been made very simple and then we may not even need to 
have a Panama Vector variant:
   
   - This is just `Long#bitCount(a ^ b)`
   - a, b should be longs to work effective, so the best would be to replace 
the loop by using 
https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/util/BitUtil.html#VH_LE_LONG
 and iterate using the varhandle. We have another PR already that provides 
native byte order VarHandles (see #888). As the bitcount of longs does not 
depend on byte order we can use native byte order.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

2024-02-05 Thread via GitHub


uschindler commented on PR #888:
URL: https://github.com/apache/lucene/pull/888#issuecomment-1927340132

   I forgot about this PR, we should really apply it. #13076 is another 
candidate that could make use of this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Forbidden Thread.sleep API [lucene]

2024-02-05 Thread via GitHub


shubhamvishu commented on PR #13001:
URL: https://github.com/apache/lucene/pull/13001#issuecomment-1927343001

   @uschindler I have removed the added file. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Forbidden Thread.sleep API [lucene]

2024-02-05 Thread via GitHub


uschindler merged PR #13001:
URL: https://github.com/apache/lucene/pull/13001


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Can we ban `Thread.sleep`? [lucene]

2024-02-05 Thread via GitHub


uschindler closed issue #12946: Can we ban `Thread.sleep`?
URL: https://github.com/apache/lucene/issues/12946


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Forbidden Thread.sleep API [lucene]

2024-02-05 Thread via GitHub


uschindler commented on PR #13001:
URL: https://github.com/apache/lucene/pull/13001#issuecomment-1927410583

   I moved the changes entry to Lucene 10.0, as to me it makes no sense to 
apply this to Lucene 9.x which is not used for active development. New code 
enters main first and I had trouble in merging this (and 9.x has many more 
sleeps).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] LUCENE-10641: IndexSearcher#setTimeout should also abort query rewrites, point ranges and vector searches [lucene]

2024-02-05 Thread via GitHub


mikemccand commented on code in PR #12345:
URL: https://github.com/apache/lucene/pull/12345#discussion_r1478503121


##
lucene/core/src/java/org/apache/lucene/index/ExitableIndexReader.java:
##
@@ -0,0 +1,539 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.lucene.util.Bits;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.automaton.CompiledAutomaton;
+
+/**
+ * The {@link ExitableIndexReader} is used to timeout I/O operation which is 
done during query
+ * rewrite. After this time is exceeded, the search thread is stopped by 
throwing a {@link
+ * ExitableIndexReader.TimeExceededException}
+ */
+public final class ExitableIndexReader extends IndexReader {
+  private final IndexReader indexReader;
+  private final QueryTimeout queryTimeout;
+
+  /**
+   * Create a ExitableIndexReader wrapper over another {@link IndexReader} 
with a specified timeout.
+   *
+   * @param indexReader the wrapped {@link IndexReader}
+   * @param queryTimeout max time allowed for collecting hits after which 
{@link
+   * ExitableIndexReader.TimeExceededException} is thrown
+   */
+  public ExitableIndexReader(IndexReader indexReader, QueryTimeout 
queryTimeout)
+  throws IOException {
+this.indexReader = new ExitableIndexReaderWrapper((DirectoryReader) 
indexReader, queryTimeout);
+this.queryTimeout = queryTimeout;
+  }
+
+  /** Returns queryTimeout instance. */
+  public QueryTimeout getQueryTimeout() {
+return queryTimeout;
+  }
+
+  /** Thrown when elapsed search time exceeds allowed search time. */
+  @SuppressWarnings("serial")
+  static class TimeExceededException extends RuntimeException {
+private TimeExceededException() {
+  super("TimeLimit Exceeded");
+}
+
+private TimeExceededException(Exception e) {
+  super(e);
+}
+  }
+
+  @Override
+  public TermVectors termVectors() throws IOException {
+if (queryTimeout.shouldExit()) {
+  throw new ExitableIndexReader.TimeExceededException();
+}
+return indexReader.termVectors();
+  }
+
+  @Override
+  public int numDocs() {
+if (queryTimeout.shouldExit()) {
+  throw new ExitableIndexReader.TimeExceededException();
+}
+return indexReader.numDocs();
+  }
+
+  @Override
+  public int maxDoc() {
+if (queryTimeout.shouldExit()) {
+  throw new ExitableIndexReader.TimeExceededException();
+}
+return indexReader.maxDoc();
+  }
+
+  @Override
+  public StoredFields storedFields() throws IOException {
+if (queryTimeout.shouldExit()) {

Review Comment:
   Can we factor out a `private` helper method that does this `if` and the 
`throw`?  Just like `ExitableDirectoryReader.checkAndThrow`.



##
lucene/core/src/java/org/apache/lucene/index/ExitableIndexReader.java:
##
@@ -0,0 +1,539 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.index;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.lucene.util.Bits;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.automaton.CompiledAuto

Re: [PR] LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) [lucene]

2024-02-05 Thread via GitHub


uschindler merged PR #888:
URL: https://github.com/apache/lucene/pull/888


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Adding binary Hamming distance as similarity option for byte vectors [lucene]

2024-02-05 Thread via GitHub


uschindler commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1927511826

   The native order PR was merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Adding binary Hamming distance as similarity option for byte vectors [lucene]

2024-02-05 Thread via GitHub


uschindler commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1927626207

   Hi,
   I modified the scalar variant like that:
   ```java
 @Override
 public int binaryHammingDistance(byte[] a, byte[] b) {
   int distance = 0, i = 0;
   for (; i < a.length + 1 - Long.BYTES; i += Long.BYTES) {
 distance += Long.bitCount(((long) BitUtil.VH_NATIVE_LONG.get(a, i) ^ 
(long) BitUtil.VH_NATIVE_LONG.get(b, i)) & 0xL);
   }
   // int tail
   for (; i < a.length + 1 - Integer.BYTES; i += Integer.BYTES) {
 distance += Integer.bitCount(((int) BitUtil.VH_NATIVE_INT.get(a, i) ^ 
(int) BitUtil.VH_NATIVE_INT.get(b, i)) & 0x);
   }
   // byte tail
   for (; i < a.length; i++) {
 distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
   }
   return distance;
 }
   ```
   
   This one only uses popcnt CPU instruction. I then ran your benchmakrk to 
compare the panama-vectorized one with my new implementation as above:
   
   ```
   Benchmark(size)   Mode  CntScore 
   Error   Units
   VectorUtilBenchmark.binaryHammingDistanceScalar   1  thrpt   15  258,511 
± 17,969  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar 128  thrpt   15   62,364 
±  0,723  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar 207  thrpt   15   40,302 
±  0,703  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar 256  thrpt   15   42,025 
±  0,891  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar 300  thrpt   15   35,065 
±  3,125  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar 512  thrpt   15   24,391 
±  1,987  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar 702  thrpt   15   17,505 
±  0,684  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar1024  thrpt   15   13,806 
±  0,102  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector   1  thrpt   15  231,651 
±  9,975  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector 128  thrpt   15   16,760 
±  0,251  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector 207  thrpt   15   10,317 
±  0,251  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector 256  thrpt   158,887 
±  0,559  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector 300  thrpt   157,466 
±  0,345  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector 512  thrpt   154,706 
±  0,080  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector 702  thrpt   153,062 
±  0,566  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector1024  thrpt   152,404 
±  0,024  ops/us
    
   
   Please not the SCALAR one here is the above variant. As this one four (!) 
times faster than the Panama variant, there is no need to have a 
panama-vectorized one and it looks like it is not working at all. I think the 
lookup table is a bad idea.
   
   To make it short:
   - Remove the impl and the table from VectorUtilSupport (scalar and 
vectorized)
   - Implement my above methoid directly in VectorUtil class
   
   This will be a short PR. Make sure to add more tests (there's only one for 
the VectorUtil method).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make date parsing more flexible for linedocsfile (europarl, enwiki) [lucene]

2024-02-05 Thread via GitHub


dweiss merged PR #13075:
URL: https://github.com/apache/lucene/pull/13075


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make date parsing more flexible for linedocsfile (europarl, enwiki) [lucene]

2024-02-05 Thread via GitHub


dweiss commented on PR #13075:
URL: https://github.com/apache/lucene/pull/13075#issuecomment-1927650998

   I'll merge this in so that we can avoid jenkins failures. If there has to be 
a follow-up, I'll open another issue.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] java.lang.AssertionError in backward compat tests ("failed to parse ... as date") [lucene]

2024-02-05 Thread via GitHub


dweiss closed issue #13073: java.lang.AssertionError in backward compat tests 
("failed to parse ... as date")
URL: https://github.com/apache/lucene/issues/13073


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Throw CorruptSegmentInfoException on encountering missing segment info (_N.si) file in CheckIndex [lucene]

2024-02-05 Thread via GitHub


mikemccand commented on code in PR #12872:
URL: https://github.com/apache/lucene/pull/12872#discussion_r1478678480


##
lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java:
##
@@ -389,13 +386,25 @@ private static void parseSegmentInfos(
 }
 
 long totalDocs = 0;
+SegmentInfo info;
 for (int seg = 0; seg < numSegments; seg++) {
   String segName = input.readString();
   byte[] segmentID = new byte[StringHelper.ID_LENGTH];
   input.readBytes(segmentID, 0, segmentID.length);
   Codec codec = readCodec(input);
-  SegmentInfo info =
-  codec.segmentInfoFormat().read(directory, segName, segmentID, 
IOContext.READ);
+  try {
+info = codec.segmentInfoFormat().read(directory, segName, segmentID, 
IOContext.READ);
+  } catch (ThreadInterruptedException e) {
+throw e;
+  } catch (Exception e) {

Review Comment:
   Thanks @gokaai!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Adding binary Hamming distance as similarity option for byte vectors [lucene]

2024-02-05 Thread via GitHub


uschindler commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1927678421

   I am not sure if we really need the Integer tail. Mabye only implement the 
Long variant and the tail.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Throw CorruptSegmentInfoException on encountering missing segment info (_N.si) file in CheckIndex [lucene]

2024-02-05 Thread via GitHub


mikemccand commented on PR #12872:
URL: https://github.com/apache/lucene/pull/12872#issuecomment-1927677782

   Thanks @gokaai -- I'll try to review soon!
   
   If possible please try not to force-push: it removes the history of the past 
commits and makes it harder to see what changed on this iteration :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Adding binary Hamming distance as similarity option for byte vectors [lucene]

2024-02-05 Thread via GitHub


rmuir commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1927739974

   Seems to autovectorize just fine, i took uwe's branch and dumped assembly on 
my AVX2 machine and see e.g. 256-bit xor and population count logic. I checked 
the logic in openjdk and it will use `vpopcntdq` on AVX-512 if available, etc. 
So this solution is much better than some explicit vector stuff because it will 
do the right thing depending on CPU.
   
   ```
   ...
  0.35%0x7fffe0141fa3:   vmovdqu 0x10(%rax,%r8,1),%ymm9
   0x7fffe0141faa:   vpxor  0x10(%rdx,%r8,1),%ymm9,%ymm9
  0.04%0x7fffe0141fb1:   movabs $0xf0f0f0f,%r8
  0.35%0x7fffe0141fbb:   vmovq  %r8,%xmm10
   0x7fffe0141fc0:   vpbroadcastd %xmm10,%ymm10
   0x7fffe0141fc5:   vpsrlw $0x4,%ymm9,%ymm11
   0x7fffe0141fcb:   vpand  %ymm10,%ymm11,%ymm11
  0.40%0x7fffe0141fd0:   vpand  %ymm10,%ymm9,%ymm10
   0x7fffe0141fd5:   vmovdqu -0x59829d(%rip),%ymm12 
   # Stub::popcount_lut

 ;   {external_word}
   0x7fffe0141fdd:   vpshufb %ymm10,%ymm12,%ymm10
  0.02%0x7fffe0141fe2:   vpshufb %ymm11,%ymm12,%ymm11
  0.48%0x7fffe0141fe7:   vpaddb %ymm10,%ymm11,%ymm11
   0x7fffe0141fec:   vpxor  %ymm12,%ymm12,%ymm12
   0x7fffe0141ff1:   vpsadbw %ymm12,%ymm11,%ymm10
  0.07%0x7fffe0141ff6:   vpermilps $0x8,%ymm10,%ymm9
  0.35%0x7fffe0141ffc:   vpermpd $0x8,%ymm9,%ymm9
   0x7fffe0142002:   vpaddd %xmm9,%xmm1,%xmm1 
   ...
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] LUCENE-10393: Unify binary dictionary and dictionary writer in kuromoji and nori [lucene]

2024-02-05 Thread via GitHub


mikemccand commented on PR #740:
URL: https://github.com/apache/lucene/pull/740#issuecomment-1927759464

   @mocobeta just checking: it looks like this was never backported to 9.x (I 
hit unexpected merge conflicts while backporting an FST change) -- was that 
intentional?  Were there API breaks that prevented backport maybe?  Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Adding binary Hamming distance as similarity option for byte vectors [lucene]

2024-02-05 Thread via GitHub


uschindler commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1927790053

   I removed the integer tail and have see no difference (especially looked 
also at the non-aligned sizes):
   
   ```java
 @Override
 public int binaryHammingDistance(byte[] a, byte[] b) {
   int distance = 0, i = 0;
   for (int upperBound = a.length & ~(Long.BYTES - 1); i < upperBound; i += 
Long.BYTES) {
 distance += Long.bitCount(((long) BitUtil.VH_NATIVE_LONG.get(a, i) ^ 
(long) BitUtil.VH_NATIVE_LONG.get(b, i)) & 0xL);
   }
   // tail:
   for (; i < a.length; i++) {
 distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
   }
   return distance;
 }
   ```
   
   I think this code is simpliest and most effective. It may not be the best 
for vector `(length % 8) == 5...7`, but I think we can live with that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Adding binary Hamming distance as similarity option for byte vectors [lucene]

2024-02-05 Thread via GitHub


uschindler commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1927824571

   Here's my branch: 
https://github.com/apache/lucene/compare/main...uschindler:lucene:binary_hamming_distance
   
   I can merge this into this branch, but the code cleanup and removal of 
useless vectorization and those (public) lookup tables needs to be done 
after the merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] LUCENE-10393: Unify binary dictionary and dictionary writer in kuromoji and nori [lucene]

2024-02-05 Thread via GitHub


uschindler commented on PR #740:
URL: https://github.com/apache/lucene/pull/740#issuecomment-1927857893

   Yes this was intentional. It breaks API.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Adding binary Hamming distance as similarity option for byte vectors [lucene]

2024-02-05 Thread via GitHub


rmuir commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1927946636

   Thanks @uschindler , this is the way to go: compiler does a good job. java 
already has all the necessary logic here to autovectorize and use e.g. 
`vpopcntdq` or AVX2 lookup-table counting algorithm depending on the cpu 
features detected.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Adding binary Hamming distance as similarity option for byte vectors [lucene]

2024-02-05 Thread via GitHub


uschindler commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1927995019

   I figured that the `& 0x` is useless. You only need it when widening 
into int. Will update my branch and paste code here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Adding binary Hamming distance as similarity option for byte vectors [lucene]

2024-02-05 Thread via GitHub


uschindler commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1928027541

   This my final code:
   
   ```java
 @Override
 public int binaryHammingDistance(byte[] a, byte[] b) {
   int distance = 0, i = 0;
   for (final int upperBound = a.length & ~(Long.BYTES - 1); i < 
upperBound; i += Long.BYTES) {
 distance += Long.bitCount((long) BitUtil.VH_NATIVE_LONG.get(a, i) ^ 
(long) BitUtil.VH_NATIVE_LONG.get(b, i));
   }
   // tail:
   for (; i < a.length; i++) {
 distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
   }
   return distance;
 }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Add getter for SynonymQuery#field [lucene]

2024-02-05 Thread via GitHub


AndreyBozhko opened a new pull request, #13077:
URL: https://github.com/apache/lucene/pull/13077

   ### Description
   
   
   Since all the query terms must have the same field, the value is exposed 
anyway via
   ```java
   synonymQuery.getTerms().get(0).field()
   ```
   but it's cleaner if one could instead do
   ```java
   synonymQuery.getField()
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Backport SOLR-14765 to branch_8_11 [lucene-solr]

2024-02-05 Thread via GitHub


risdenk commented on PR #2682:
URL: https://github.com/apache/lucene-solr/pull/2682#issuecomment-1928651515

   Sorry for delayed response @HoustonPutman no need to wait for me


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make FSTCompiler.compile() to only return the FSTMetadata [lucene]

2024-02-05 Thread via GitHub


dungba88 commented on PR #12831:
URL: https://github.com/apache/lucene/pull/12831#issuecomment-1928655191

   Thank you for merging @mikemccand !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add getter for SynonymQuery#field [lucene]

2024-02-05 Thread via GitHub


AndreyBozhko commented on PR #13077:
URL: https://github.com/apache/lucene/pull/13077#issuecomment-1928773466

   Thanks for the review @dungba88 - I added the javadoc as well (tried to 
match the style of other javadocs in the file).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[I] No static method version() crash in android project [lucene]

2024-02-05 Thread via GitHub


bjhexn opened a new issue, #13078:
URL: https://github.com/apache/lucene/issues/13078

   ### Description
   
   
   private fun createIndex(): IndexWriter {
   val path = Path(getDir("lunece", 0).path)
   val fsDirectory = FSDirectory.open(path) 
   val analyzer = StandardAnalyzer()
   val config = IndexWriterConfig(analyzer)
return IndexWriter(fsDirectory, config)
   }
   
   
java.lang.NoSuchMethodError: No static method 
version()Ljava/lang/Runtime$Version; in class Ljava/lang/Runtime; or its super 
classes (declaration of 'java.lang.Runtime' appears in 
/apex/com.android.art/javalib/core-oj.jar)

at 
org.apache.lucene.util.Constants.(Constants.java:40)

at 
org.apache.lucene.store.FSDirectory.open(FSDirectory.java:161)

at 
org.apache.lucene.store.FSDirectory.open(FSDirectory.java:156)

at 
com.myapp.newest.MainActivity.createIndex(MainActivity.kt:46)
   
   
   ### Version and environment details
   
   version:
implementation("org.apache.lucene:lucene-core:9.9.2")
implementation("org.apache.lucene:lucene-queryparser:9.9.2")
implementation("org.apache.lucene:lucene-analysis-common:9.9.2") 
   environment:
Android project
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] No static method version() crash in android project [lucene]

2024-02-05 Thread via GitHub


bjhexn commented on issue #13078:
URL: https://github.com/apache/lucene/issues/13078#issuecomment-1928929465

   Android gradle
   
   compileOptions {
   sourceCompatibility = JavaVersion.VERSION_17
   targetCompatibility = JavaVersion.VERSION_17
   }
   kotlinOptions {
   jvmTarget = "17"
   }


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[I] [Suggestion] Short circuit check for queued flushes in preUpdate() when checkPendingFlushOnUpdate is disabled [lucene]

2024-02-05 Thread via GitHub


CaptainDredge opened a new issue, #13079:
URL: https://github.com/apache/lucene/issues/13079

   ### Description
   
   `numQueuedFlushes()` is a blocking function which gets called in 
`DocumentWriter#preUpdate()` to check if there are any queued flushes. If 
`checkPendingFlushOnUpdate` is disabled i.e. its false then the overall 
condition `flushControl.numQueuedFlushes() > 0 && 
config.checkPendingFlushOnUpdate` could be short circuited if we reverse the 
check to `config.checkPendingFlushOnUpdate && flushControl.numQueuedFlushes() > 
0`. This will prevent a blocking call.
   Code pointer: 
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/DocumentsWriter.java#L387
   
   
   Currently while benchmarking I can see this particular issue significantly 
contributing to locking of write threads which is increasing indexing latency.
   
   https://github.com/apache/lucene/assets/20185657/05e6a52d-0811-443b-8637-43e47c67dc6c";>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] [Suggestion] Short circuit check for queued flushes in preUpdate() when checkPendingFlushOnUpdate is disabled [lucene]

2024-02-05 Thread via GitHub


CaptainDredge commented on issue #13079:
URL: https://github.com/apache/lucene/issues/13079#issuecomment-1928949795

   cc: @mgodwan, @backslasht


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org