[PR] Do not use mock merge policy for TestFuzzyQuery#testFuzziness [lucene]

2024-02-04 Thread via GitHub


easyice opened a new pull request, #13070:
URL: https://github.com/apache/lucene/pull/13070

   git bisect shows this commit as the perpetrator: f7cab1645017d863331b42900581b67d3591e2da
   
   ```
  > org.junit.ComparisonFailure: expected: but was:
  > at __randomizedtesting.SeedInfo.seed([7DF2C3FF35FEFFC6:36C7D4343E606C7C]:0)
  > at org.junit.Assert.assertEquals(Assert.java:117)
  > at org.junit.Assert.assertEquals(Assert.java:146)
  > at org.apache.lucene.search.TestFuzzyQuery.testFuzziness(TestFuzzyQuery.java:156)
   ```
   
   Reproduce command:
   ```
   ./gradlew test --tests TestFuzzyQuery.testFuzziness \
     -Dtests.seed=7DF2C3FF35FEFFC6 -Dtests.nightly=true -Dtests.locale=ru-KG \
     -Dtests.timezone=America/Virgin -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use growNoCopy in some places [lucene]

2024-02-04 Thread via GitHub


easyice commented on PR #12951:
URL: https://github.com/apache/lucene/pull/12951#issuecomment-1925637932

   @epotyom @dweiss Hi, could you help merge this if it looks okay?





[I] Reproducible failure in TestParentBlockJoinFloatKnnVectorQuery.testSkewedIndex [lucene]

2024-02-04 Thread via GitHub


easyice opened a new issue, #13071:
URL: https://github.com/apache/lucene/issues/13071

   ### Description
   
   ```
  > org.junit.ComparisonFailure: expected:<[6]> but was:<[14]>
  > at __randomizedtesting.SeedInfo.seed([3332C40C31EA01FF:3CFF33AC05C3913B]:0)
  > at junit@4.13.1/org.junit.Assert.assertEquals(Assert.java:117)
  > at junit@4.13.1/org.junit.Assert.assertEquals(Assert.java:146)
  > at org.apache.lucene.search.join.ParentBlockJoinKnnVectorQueryTestCase.assertIdMatches(ParentBlockJoinKnnVectorQueryTestCase.java:325)
  > at org.apache.lucene.search.join.ParentBlockJoinKnnVectorQueryTestCase.testSkewedIndex(ParentBlockJoinKnnVectorQueryTestCase.java:277)
  > at org.apache.lucene.search.join.TestParentBlockJoinFloatKnnVectorQuery.testSkewedIndex(TestParentBlockJoinFloatKnnVectorQuery.java:37)
   ```
   
   ### Gradle command to reproduce
   
   ```
   ./gradlew test --tests TestParentBlockJoinFloatKnnVectorQuery.testSkewedIndex \
     -Dtests.seed=3332C40C31EA01FF -Dtests.nightly=true -Dtests.locale=fr-MU \
     -Dtests.timezone=Asia/Choibalsan -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   ```





Re: [I] Contributing a deep-learning, BERT-based analyzer [lucene]

2024-02-04 Thread via GitHub


lmessinger commented on issue #13065:
URL: https://github.com/apache/lucene/issues/13065#issuecomment-1925739977

   I mean: create just the tokens (the lemmas/wordpieces).





Re: [PR] Use growNoCopy in some places [lucene]

2024-02-04 Thread via GitHub


dweiss merged PR #12951:
URL: https://github.com/apache/lucene/pull/12951





Re: [PR] Use growNoCopy in some places [lucene]

2024-02-04 Thread via GitHub


dweiss commented on PR #12951:
URL: https://github.com/apache/lucene/pull/12951#issuecomment-1925823962

   I've backported to branch_9x as well.





Re: [PR] Move `brToString(BytesRef)` to `ToStringUtils` [lucene]

2024-02-04 Thread via GitHub


uschindler commented on code in PR #13068:
URL: https://github.com/apache/lucene/pull/13068#discussion_r1477431195


##
lucene/core/src/java/org/apache/lucene/util/ToStringUtils.java:
##
@@ -32,11 +32,37 @@ public static void byteArray(StringBuilder buffer, byte[] bytes) {
 
   private static final char[] HEX = "0123456789abcdef".toCharArray();
 
+  /**
+   * Unlike {@link Long#toHexString(long)} returns a String with a "0x" prefix and all the leading
+   * zeros.
+   */
   public static String longHex(long x) {
     char[] asHex = new char[16];
     for (int i = 16; --i >= 0; x >>>= 4) {
       asHex[i] = HEX[(int) x & 0x0F];
     }
     return "0x" + new String(asHex);
   }
+
+  @SuppressWarnings("unused")
+  public static String brToString(BytesRef b) {
+    if (b == null) {
+      return "null";
+    }
+    try {
+      return b.utf8ToString() + " " + b;
+    } catch (Throwable t) {

Review Comment:
   This method should not catch `Throwable`. Can't we make it a multi-catch that explicitly catches:
   
   - `AssertionError` (if we hit an assertion)
   - `RuntimeException` (if anything else is wrong: wrong offsets, a bytes array that is too short, or incomplete surrogates)
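A minimal, self-contained sketch of the suggested multi-catch follows. It does not use Lucene: `BytesRef` is replaced by a plain `byte[]`, and `utf8ToString`/`hex` are hypothetical stand-ins for `BytesRef#utf8ToString()` and `BytesRef#toString()`. Strict `CharsetDecoder` decoding is assumed so that malformed UTF-8 surfaces as an unchecked exception. A failed Java `assert` throws `java.lang.AssertionError`, so that is the type caught alongside `RuntimeException`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class BytesToStringSketch {
  private static final char[] HEX = "0123456789abcdef".toCharArray();

  // Hypothetical stand-in for BytesRef#utf8ToString(): strict UTF-8 decoding
  // that reports malformed input as an unchecked IllegalArgumentException.
  static String utf8ToString(byte[] bytes) {
    try {
      return StandardCharsets.UTF_8
          .newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(bytes))
          .toString();
    } catch (CharacterCodingException e) {
      throw new IllegalArgumentException("not valid UTF-8", e);
    }
  }

  // Hypothetical stand-in for BytesRef#toString(): hex bytes such as "[61 62]".
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < bytes.length; i++) {
      if (i > 0) {
        sb.append(' ');
      }
      sb.append(HEX[(bytes[i] >> 4) & 0x0F]).append(HEX[bytes[i] & 0x0F]);
    }
    return sb.append(']').toString();
  }

  static String bytesToString(byte[] bytes) {
    if (bytes == null) {
      return "null";
    }
    try {
      return utf8ToString(bytes) + " " + hex(bytes);
    } catch (AssertionError | RuntimeException e) {
      // Undecodable input (or a tripped assertion) falls back to hex only;
      // any other Error, e.g. OutOfMemoryError, still propagates.
      return hex(bytes);
    }
  }

  public static void main(String[] args) {
    System.out.println(bytesToString("ab".getBytes(StandardCharsets.UTF_8))); // ab [61 62]
    System.out.println(bytesToString(new byte[] {(byte) 0xC3})); // [c3]
  }
}
```

Catching only these two types keeps the hex fallback for undecodable bytes, while errors such as `OutOfMemoryError` or `StackOverflowError` propagate instead of being swallowed, which is the point of not catching `Throwable`.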






Re: [PR] Move `brToString(BytesRef)` to `ToStringUtils` [lucene]

2024-02-04 Thread via GitHub


sabi0 commented on code in PR #13068:
URL: https://github.com/apache/lucene/pull/13068#discussion_r1477434373


##
lucene/core/src/java/org/apache/lucene/util/ToStringUtils.java:
##

Review Comment:
   Nice catch. Thank you.






Re: [I] Optimize counts on 2-clauses disjunctions [lucene]

2024-02-04 Thread via GitHub


jpountz closed issue #12644: Optimize counts on 2-clauses disjunctions
URL: https://github.com/apache/lucene/issues/12644





Re: [I] Optimize counts on 2-clauses disjunctions [lucene]

2024-02-04 Thread via GitHub


jpountz commented on issue #12644:
URL: https://github.com/apache/lucene/issues/12644#issuecomment-1926402796

   Addressed via https://github.com/apache/lucene/pull/13036.
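The archive does not say how PR #13036 implements this. One identity that can make a two-clause disjunction count cheap is inclusion-exclusion, |A ∪ B| = |A| + |B| - |A ∩ B|: if each clause's count is already known (for instance from index statistics, an assumption here), only the conjunction has to be counted. The sketch below is a plain-Java illustration over sorted, duplicate-free doc-ID arrays, not Lucene's code; the class and method names are hypothetical.

```java
final class DisjunctionCountSketch {
  // Size of the intersection of two sorted, duplicate-free doc-ID arrays,
  // computed with a single merge-style walk.
  static int intersectionCount(int[] a, int[] b) {
    int i = 0;
    int j = 0;
    int count = 0;
    while (i < a.length && j < b.length) {
      if (a[i] == b[j]) {
        count++;
        i++;
        j++;
      } else if (a[i] < b[j]) {
        i++;
      } else {
        j++;
      }
    }
    return count;
  }

  // Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|.
  static int disjunctionCount(int[] a, int[] b) {
    return a.length + b.length - intersectionCount(a, b);
  }

  public static void main(String[] args) {
    int[] a = {1, 3, 5, 7};
    int[] b = {3, 4, 7, 9, 10};
    System.out.println(disjunctionCount(a, b)); // 7
  }
}
```

Here the union {1, 3, 4, 5, 7, 9, 10} is counted without being materialized: 4 + 5 - 2 = 7.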

