[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

2022-09-26 Thread GitBox


uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1257975327

   OpenJDK 19 was released by Eclipse Temurin: 
https://adoptium.net/de/temurin/releases?version=19
   
   Thanks @gdams and @adoptium.
   
   @msokolov: I will merge this to main, 9.x and 9.4 after adding changes.txt. 
When building the release you may need to remove your gradle.properties file in 
Lucene's directory. I will post a message.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on a diff in pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

2022-09-26 Thread GitBox


rmuir commented on code in PR #912:
URL: https://github.com/apache/lucene/pull/912#discussion_r979984394


##
gradle/generation/local-settings.gradle:
##
@@ -98,12 +98,12 @@ org.gradle.workers.max=${maxWorkers}
 # Maximum number of test JVMs forked per test task.
 tests.jvms=${testsJvms}
 
-# Disable auto JVM provisioning (we don't use toolchains yet but want no 
surprises).
-org.gradle.java.installations.auto-download=false
+# Enable auto JVM provisioning.
+org.gradle.java.installations.auto-download=true
 
 # Set these to enable automatic JVM location discovery.
-org.gradle.java.installations.fromEnv=JDK11,JDK12,JDK13,JDK14,JDK15,JDK16,JDK17
-org.gradle.java.installations.paths=(custom paths)
+org.gradle.java.installations.fromEnv=JDK17,JDK19,JAVA19_HOME

Review Comment:
   can we give these a consistent name, e.g. `JAVA17_HOME,JAVA19_HOME`? I 
assume I can set these so that gradle won't download stuff, i'd prefer to use 
my package manager.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

2022-09-26 Thread GitBox


uschindler commented on code in PR #912:
URL: https://github.com/apache/lucene/pull/912#discussion_r979984774


##
gradle/generation/local-settings.gradle:
##
@@ -98,12 +98,12 @@ org.gradle.workers.max=${maxWorkers}
 # Maximum number of test JVMs forked per test task.
 tests.jvms=${testsJvms}
 
-# Disable auto JVM provisioning (we don't use toolchains yet but want no 
surprises).
-org.gradle.java.installations.auto-download=false
+# Enable auto JVM provisioning.
+org.gradle.java.installations.auto-download=true
 
 # Set these to enable automatic JVM location discovery.
-org.gradle.java.installations.fromEnv=JDK11,JDK12,JDK13,JDK14,JDK15,JDK16,JDK17
-org.gradle.java.installations.paths=(custom paths)
+org.gradle.java.installations.fromEnv=JDK17,JDK19,JAVA19_HOME

Review Comment:
   yes, I wanted to change this, too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

2022-09-26 Thread GitBox


uschindler commented on code in PR #912:
URL: https://github.com/apache/lucene/pull/912#discussion_r979985242


##
gradle/generation/local-settings.gradle:
##
@@ -98,12 +98,12 @@ org.gradle.workers.max=${maxWorkers}
 # Maximum number of test JVMs forked per test task.
 tests.jvms=${testsJvms}
 
-# Disable auto JVM provisioning (we don't use toolchains yet but want no 
surprises).
-org.gradle.java.installations.auto-download=false
+# Enable auto JVM provisioning.
+org.gradle.java.installations.auto-download=true
 
 # Set these to enable automatic JVM location discovery.
-org.gradle.java.installations.fromEnv=JDK11,JDK12,JDK13,JDK14,JDK15,JDK16,JDK17
-org.gradle.java.installations.paths=(custom paths)
+org.gradle.java.installations.fromEnv=JDK17,JDK19,JAVA19_HOME

Review Comment:
   Exactly that what you can do! When you set those environments, Gradle won't 
download anything.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

2022-09-26 Thread GitBox


uschindler commented on code in PR #912:
URL: https://github.com/apache/lucene/pull/912#discussion_r979990105


##
gradle/generation/local-settings.gradle:
##
@@ -98,12 +98,12 @@ org.gradle.workers.max=${maxWorkers}
 # Maximum number of test JVMs forked per test task.
 tests.jvms=${testsJvms}
 
-# Disable auto JVM provisioning (we don't use toolchains yet but want no 
surprises).
-org.gradle.java.installations.auto-download=false
+# Enable auto JVM provisioning.
+org.gradle.java.installations.auto-download=true
 
 # Set these to enable automatic JVM location discovery.
-org.gradle.java.installations.fromEnv=JDK11,JDK12,JDK13,JDK14,JDK15,JDK16,JDK17
-org.gradle.java.installations.paths=(custom paths)
+org.gradle.java.installations.fromEnv=JDK17,JDK19,JAVA19_HOME

Review Comment:
   Done. BTW, Policeman Jenkins was doing exactly this before the 
auto-provisioning worked. The JAVA19_HOME env var is currently the only one 
used. JAVA17_HOME was just added for consistency.
   
   Could you still test this. It will automatically download like any other JAR 
file from Maven Central. To get rid of it and for testing do `rm -rf 
~/.gradle/jdks/*`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

2022-09-26 Thread GitBox


uschindler commented on code in PR #912:
URL: https://github.com/apache/lucene/pull/912#discussion_r979991502


##
gradle/generation/local-settings.gradle:
##
@@ -98,12 +98,12 @@ org.gradle.workers.max=${maxWorkers}
 # Maximum number of test JVMs forked per test task.
 tests.jvms=${testsJvms}
 
-# Disable auto JVM provisioning (we don't use toolchains yet but want no 
surprises).
-org.gradle.java.installations.auto-download=false
+# Enable auto JVM provisioning.
+org.gradle.java.installations.auto-download=true
 
 # Set these to enable automatic JVM location discovery.
-org.gradle.java.installations.fromEnv=JDK11,JDK12,JDK13,JDK14,JDK15,JDK16,JDK17
-org.gradle.java.installations.paths=(custom paths)
+org.gradle.java.installations.fromEnv=JDK17,JDK19,JAVA19_HOME

Review Comment:
   FYI, to actually run all tests, you still need to set RUNTIME_JAVA_HOME as 
before. The autoprovisioned JDK is only used for compiling the code and 
packaging the MR-JAR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz merged pull request #11818: Fix codec name in index header for Lucene94FieldInfosFormat.

2022-09-26 Thread GitBox


jpountz merged PR #11818:
URL: https://github.com/apache/lucene/pull/11818


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

2022-09-26 Thread GitBox


uschindler commented on code in PR #912:
URL: https://github.com/apache/lucene/pull/912#discussion_r979990105


##
gradle/generation/local-settings.gradle:
##
@@ -98,12 +98,12 @@ org.gradle.workers.max=${maxWorkers}
 # Maximum number of test JVMs forked per test task.
 tests.jvms=${testsJvms}
 
-# Disable auto JVM provisioning (we don't use toolchains yet but want no 
surprises).
-org.gradle.java.installations.auto-download=false
+# Enable auto JVM provisioning.
+org.gradle.java.installations.auto-download=true
 
 # Set these to enable automatic JVM location discovery.
-org.gradle.java.installations.fromEnv=JDK11,JDK12,JDK13,JDK14,JDK15,JDK16,JDK17
-org.gradle.java.installations.paths=(custom paths)
+org.gradle.java.installations.fromEnv=JDK17,JDK19,JAVA19_HOME

Review Comment:
   Done. BTW, Policeman Jenkins was doing exactly this before the 
auto-provisioning worked. The JAVA19_HOME env var is currently the only one 
used. JAVA17_HOME was just added for consistency.
   
   Could you still test this on your Mac? Github jobs already did this and all 
worked. It will automatically download like any other JAR file from Maven 
Central. To get rid of it and for testing do `rm -rf ~/.gradle/jdks/*`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] vsop-479 commented on pull request #11722: Binary search the entries when all suffixes have the same length in a leaf block.

2022-09-26 Thread GitBox


vsop-479 commented on PR #11722:
URL: https://github.com/apache/lucene/pull/11722#issuecomment-1258003910

   @jpountz 
   I added a test to BasePostingsFormatTestCase. Please have a review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler merged pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

2022-09-26 Thread GitBox


uschindler merged PR #912:
URL: https://github.com/apache/lucene/pull/912


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] yugushihuang closed pull request #1057: LUCENE-10670: Add a codec class to track merge time of each index part

2022-09-26 Thread GitBox


yugushihuang closed pull request #1057: LUCENE-10670: Add a codec class to 
track merge time of each index part
URL: https://github.com/apache/lucene/pull/1057


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] yugushihuang commented on pull request #1057: LUCENE-10670: Add a codec class to track merge time of each index part

2022-09-26 Thread GitBox


yugushihuang commented on PR #1057:
URL: https://github.com/apache/lucene/pull/1057#issuecomment-1258240704

   @rmuir Thanks for pointing this out. I will close this PR and investigate 
other way. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler opened a new issue, #11819: Fix Java 19 MR-JAR compilation in IDEs

2022-09-26 Thread GitBox


uschindler opened a new issue, #11819:
URL: https://github.com/apache/lucene/issues/11819

   ### Description
   
   After merge of #912 when creting the Eclipse workspace files, it also adds 
the Java 19 part, which fails to compile with JDK 11 or 17. I think the same 
will be happing with Intellij and others.
   
   For exclipse we can exclude the source set, but this would mean you can't 
edit the source files with syntax highlighting and compilation checks (but you 
still see them). For me this would be fine, when I need to touch the Java 19 
imple, I can read the source folder to build path, so maybe excluding is the 
best option. On the other hand, the failing compilation does not hur 
functionality of Eclipse. If you don't like it just right-click on the 
"core/java19" folder and select "Build path -> Remove from build path".
   
   I have no idea about Intellij - @dweiss ?
   
   What do other think? Just hide the source sets for Java 19 (and the ones for 
20 and 21 coming later?).
   
   ### Version and environment details
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir opened a new issue, #11820: smoketester needs to run 'gradlew localSettings' first

2022-09-26 Thread GitBox


rmuir opened a new issue, #11820:
URL: https://github.com/apache/lucene/issues/11820

   ### Description
   
   smoketester runs gradle which generates a `gradle.properties` but it isn't 
picked up, which causes it to fail. I think if we just ran `gradlew 
localSettings` first, then it would actually use the `gradle.properties` that 
it generates?
   
   ```
   > Task :localSettings
   
   IMPORTANT. This is the first time you ran the build. I wrote some sane 
defaults (for this machine) to 'gradle.properties', they will be picked up on 
consecutive gradle invocations (not this one).
   
   Run gradlew :helpLocalSettings for more information.
   
   > Task :missing-doclet:compileJava
   > Task :missing-doclet:processResources NO-SOURCE
   > Task :missing-doclet:classes
   > Task :missing-doclet:jar
   > Task :gitStatus
   
   > Task :errorProneSkipped
   WARNING: errorprone disabled (skipped on builds not running inside CI 
environments, pass -Pvalidation.errorprone=true to enable)
   
   > Task :lucene:core:compileJava
   
   > Task :lucene:core:processResources
   > Task :lucene:core:classes
   > Task :lucene:core:compileMain19Java FAILED
   ```
   
   ### Version and environment details
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller merged pull request #11804: FacetsCollector#collect is no longer final to allow extension

2022-09-26 Thread GitBox


gsmiller merged PR #11804:
URL: https://github.com/apache/lucene/pull/11804


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on issue #11820: smoketester needs to run 'gradlew localSettings' first

2022-09-26 Thread GitBox


rmuir commented on issue #11820:
URL: https://github.com/apache/lucene/issues/11820#issuecomment-1258366985

   To be clear, i set `JAVA19_HOME` env var, but gradle doesn't pick it up 
because it hasn't "pulled in" the `gradle.properties`, then it fails because it 
can't find java 19. I have `org.gradle.java.installations.auto-download=false` 
set in my `~/.gradle/gradle.properties` because i'm not allowing gradle to 
download JVMs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on issue #11820: smoketester needs to run 'gradlew localSettings' first

2022-09-26 Thread GitBox


uschindler commented on issue #11820:
URL: https://github.com/apache/lucene/issues/11820#issuecomment-1258367644

   Yes, on Github CI jobs we do exactly this: 
https://github.com/apache/lucene/blob/ac12cd9f176bf50bc80f1d7528f4c35d5ad22adf/.github/workflows/distribution.yml#L37-L44
   
   I agree this should be fixed, but that's not a showstopper. With default 
settings of gradle all works, it just breaks if auto-provisioning was disabled 
globally like on your computer.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on issue #11820: smoketester needs to run 'gradlew localSettings' first

2022-09-26 Thread GitBox


uschindler commented on issue #11820:
URL: https://github.com/apache/lucene/issues/11820#issuecomment-1258369276

   Let's just add this additional exec before it executes `gradlew` for the 
first time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on issue #11820: smoketester needs to run 'gradlew localSettings' first

2022-09-26 Thread GitBox


uschindler commented on issue #11820:
URL: https://github.com/apache/lucene/issues/11820#issuecomment-1258377651

   Something like this would be fine:
   ```python
   validateCmd = './gradlew --no-daemon localSettings'
   print('run "%s"' % validateCmd)
   java.run_java17(validateCmd, '%s/validate.log' % unpackPath)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on issue #11820: smoketester needs to run 'gradlew localSettings' first

2022-09-26 Thread GitBox


uschindler commented on issue #11820:
URL: https://github.com/apache/lucene/issues/11820#issuecomment-1258378657

   Anyways we should fix gradlew somehow to make sure the settings file is 
there and restart itsself. Maybe it should also check for updates of settings. 
Because once you wrote the gradle.properties file it won't be updated anymore.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller opened a new pull request, #11821: FacetsCollector#collect is no longer final to allow extension (#11804)

2022-09-26 Thread GitBox


gsmiller opened a new pull request, #11821:
URL: https://github.com/apache/lucene/pull/11821

   Just backporting. No PR requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller merged pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

2022-09-26 Thread GitBox


gsmiller merged PR #11738:
URL: https://github.com/apache/lucene/pull/11738


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] stefanvodita commented on a diff in pull request #11815: Support deletions in rearrange (#11814)

2022-09-26 Thread GitBox


stefanvodita commented on code in PR #11815:
URL: https://github.com/apache/lucene/pull/11815#discussion_r980472199


##
lucene/misc/src/java/org/apache/lucene/misc/index/IndexRearranger.java:
##
@@ -175,5 +202,7 @@ public CacheHelper getReaderCacheHelper() {
   /** Select document within a CodecReader */
   public interface DocumentSelector {
 BitSet getFilteredLiveDocs(CodecReader reader) throws IOException;
+
+boolean isDeleted(LeafReader reader, int idx) throws IOException;

Review Comment:
   1. Agreed, that can be confusing, but better naming and documenting can 
alleviate the problem. Alternatively, we could define a separate interface to 
act on the rearranged index and mark deletes. What do you think?
   2. That’s true, whoever implements a document selector would have to 
identify deletes. I think you’re right - there’s no good reason to handle 
deletions as we’re rearranging. It can be a separate step, after the rearranged 
index has been produced.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] stefanvodita commented on pull request #11815: Support deletions in rearrange (#11814)

2022-09-26 Thread GitBox


stefanvodita commented on PR #11815:
URL: https://github.com/apache/lucene/pull/11815#issuecomment-1258596125

   Thank you for the review @zhaih ! Your feedback is much appreciated. I left 
a detailed response to your comment on the code. Let me know what you think.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] stefanvodita commented on pull request #11780: GH#11601: Add ability to compute reader states after refresh

2022-09-26 Thread GitBox


stefanvodita commented on PR #11780:
URL: https://github.com/apache/lucene/pull/11780#issuecomment-1258609162

   > How is it different from making a subclass/alternative of ReaderManager 
and just put the state calculation inside `refreshIfNeed`?
   
   I think there’s two things that would be different. Please correct me if I’m 
wrong.
   1. If we simply override `refreshIfNeeded`, we don’t have a way to return 
the new reader states. We would still need to create some new method that does 
the refresh and returns the reader states.
   2. Doing this in the `ReferenceManager` makes it work for both 
`ReaderManager` and `SearcherManager` with the right `StateCalculator` for 
each. I don’t know how valuable that is though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zhaih commented on a diff in pull request #11803: DrillSideways optimizations

2022-09-26 Thread GitBox


zhaih commented on code in PR #11803:
URL: https://github.com/apache/lucene/pull/11803#discussion_r980469938


##
lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java:
##
@@ -149,63 +162,147 @@ public int score(LeafCollector collector, Bits 
acceptDocs, int min, int maxDoc)
*/
   private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, 
DocsAndCost[] dims)
   throws IOException {
-// if (DEBUG) {
-//  System.out.println("  doQueryFirstScoring");
-// }
 setScorer(collector, ScoreCachingWrappingScorer.wrap(baseScorer));
 
-int docID = baseScorer.docID();
+List allDims = Arrays.asList(dims);
+CollectionUtil.timSort(allDims, Comparator.comparingLong(e -> 
e.approximation.cost()));
+
+List twoPhaseDims = null;
+for (DocsAndCost dim : dims) {
+  if (dim.twoPhase != null) {
+if (twoPhaseDims == null) {
+  twoPhaseDims = new ArrayList<>(dims.length);
+}
+twoPhaseDims.add(dim);
+  }
+}
+if (twoPhaseDims != null) {
+  CollectionUtil.timSort(
+  twoPhaseDims,
+  new Comparator() {
+@Override
+public int compare(DocsAndCost o1, DocsAndCost o2) {
+  return Float.compare(o1.twoPhase.matchCost(), 
o2.twoPhase.matchCost());
+}
+  });
+}
+
+int docID = baseApproximation.docID();
 
 nextDoc:
 while (docID != PostingsEnum.NO_MORE_DOCS) {
+  assert docID == baseApproximation.docID();
+
   if (acceptDocs != null && acceptDocs.get(docID) == false) {
-docID = baseIterator.nextDoc();
+docID = baseApproximation.nextDoc();
 continue;
   }
-  LeafCollector failedCollector = null;
-  for (DocsAndCost dim : dims) {
-// TODO: should we sort this 2nd dimension of
-// docsEnums from most frequent to least?
+
+  DocsAndCost failedDim = null;
+  for (DocsAndCost dim : allDims) {
+final int dimDocID;
 if (dim.approximation.docID() < docID) {
-  dim.approximation.advance(docID);
+  dimDocID = dim.approximation.advance(docID);
+} else {
+  dimDocID = dim.approximation.docID();
 }
-
-boolean matches = false;
-if (dim.approximation.docID() == docID) {
-  if (dim.twoPhase == null) {
-matches = true;
+if (dimDocID != docID) {
+  if (failedDim != null) {
+int next = Math.min(dimDocID, failedDim.approximation.docID());
+docID = baseApproximation.advance(next);
+continue nextDoc;
   } else {
-matches = dim.twoPhase.matches();
+failedDim = dim;
   }
 }
+  }
 
-if (matches == false) {
-  if (failedCollector != null) {
-// More than one dim fails on this document, so
-// it's neither a hit nor a near-miss; move to
-// next doc:
-docID = baseIterator.nextDoc();
-continue nextDoc;
-  } else {
-failedCollector = dim.sidewaysLeafCollector;
+  if (baseTwoPhase != null && baseTwoPhase.matches() == false) {
+docID = baseApproximation.nextDoc();
+continue;
+  }
+
+  if (twoPhaseDims != null) {
+if (failedDim == null) {
+  for (DocsAndCost dim : twoPhaseDims) {
+assert dim.approximation.docID() == docID;
+if (dim.twoPhase.matches() == false) {
+  if (failedDim != null) {
+assert dim.approximation.docID() + 1 <= 
failedDim.approximation.docID();

Review Comment:
   Why is this assertion true? Shouldn't the two docID be the same and both 
equal to `docID` according to assertion in L228?



##
lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java:
##
@@ -149,63 +162,147 @@ public int score(LeafCollector collector, Bits 
acceptDocs, int min, int maxDoc)
*/
   private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, 
DocsAndCost[] dims)
   throws IOException {
-// if (DEBUG) {
-//  System.out.println("  doQueryFirstScoring");
-// }
 setScorer(collector, ScoreCachingWrappingScorer.wrap(baseScorer));
 
-int docID = baseScorer.docID();
+List allDims = Arrays.asList(dims);
+CollectionUtil.timSort(allDims, Comparator.comparingLong(e -> 
e.approximation.cost()));
+
+List twoPhaseDims = null;
+for (DocsAndCost dim : dims) {
+  if (dim.twoPhase != null) {
+if (twoPhaseDims == null) {
+  twoPhaseDims = new ArrayList<>(dims.length);
+}
+twoPhaseDims.add(dim);
+  }
+}
+if (twoPhaseDims != null) {
+  CollectionUtil.timSort(
+  twoPhaseDims,
+  new Comparator() {
+@Override
+public int compare(DocsAndCost o1, DocsAndCost o2) {
+  return Float.compare(o1.twoPhase.matchCost(), 
o2.twoPhase.matchCost());
+

[GitHub] [lucene] gsmiller merged pull request #11821: FacetsCollector#collect is no longer final to allow extension (#11804)

2022-09-26 Thread GitBox


gsmiller merged PR #11821:
URL: https://github.com/apache/lucene/pull/11821


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zhaih commented on pull request #11780: GH#11601: Add ability to compute reader states after refresh

2022-09-26 Thread GitBox


zhaih commented on PR #11780:
URL: https://github.com/apache/lucene/pull/11780#issuecomment-1258623624

   For 1. I think we can always create a ReferenceManager of 
`IndexReaderAndState` so that those two are always refreshed together and you 
can acquire them together.
   For 2. I think people are discussing about deprecating SearcherManager right 
now (because we're having a timeout per indexSearcher which makes it not 
reusable) so I think there's not much value in it, but I could be wrong..


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] stevenschlansker opened a new pull request, #11822: PrimaryNode: add configurable timeout to waitForAllRemotesToClose

2022-09-26 Thread GitBox


stevenschlansker opened a new pull request, #11822:
URL: https://github.com/apache/lucene/pull/11822

   Adds a configurable timeout for PrimaryNode waiting for remotes to close.
   The default matches existing behavior. In the long run, maybe this wait loop 
goes away entirely, but I elected to make the most compatible change first.
   
   Fixes #11674
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler opened a new pull request, #11823: GH-11819: Fix the Eclipse part to support deveopment of the MR-JAR:

2022-09-26 Thread GitBox


uschindler opened a new pull request, #11823:
URL: https://github.com/apache/lucene/pull/11823

   This fixes the eclipse part of #11819 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on issue #11819: Fix Java 19 MR-JAR compilation in IDEs

2022-09-26 Thread GitBox


uschindler commented on issue #11819:
URL: https://github.com/apache/lucene/issues/11819#issuecomment-1258639937

   Here is the fix for Eclipse: #11823


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zhaih commented on pull request #11815: Support deletions in rearrange (#11814)

2022-09-26 Thread GitBox


zhaih commented on PR #11815:
URL: https://github.com/apache/lucene/pull/11815#issuecomment-1258661752

   I think the overall process in my mind would be:
   1. Identify all deletes and save them
   2. Create a document selector which ignores all deletes in the original index
   3. Rearrange
   4. Apply deletes
   
   We can create a separate tool which do the step 1 and 4, or, which I think 
probably is more elegant, we can pass in another single DocumentSelector, which 
basically select which documents should be deleted afterwards. And we apply the 
deletions to the whole index after rearrange?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] stefanvodita commented on pull request #11815: Support deletions in rearrange (#11814)

2022-09-26 Thread GitBox


stefanvodita commented on PR #11815:
URL: https://github.com/apache/lucene/pull/11815#issuecomment-1258665033

   > we can pass in another single DocumentSelector, which basically select 
which documents should be deleted afterwards. And we apply the deletions to the 
whole index after rearrange?
   
   I like the idea. I'll build a new revision following that line of thinking.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-26 Thread GitBox


mdmarshmallow commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r980528010


##
lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java:
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+
+/** An {@link IndexOutput} that wraps another instance and tracks the number 
of bytes written */
+public class ByteTrackingIndexOutput extends IndexOutput {
+
+  private final IndexOutput output;
+  private final AtomicLong byteTracker;
+
+  protected ByteTrackingIndexOutput(IndexOutput output, AtomicLong 
byteTracker) {
+super(
+"Byte tracking wrapper for: " + output.getName(),
+"ByteTrackingIndexOutput{" + output.getName() + "}");
+this.output = output;
+this.byteTracker = byteTracker;
+  }
+
+  @Override
+  public void writeByte(byte b) throws IOException {
+byteTracker.incrementAndGet();
+output.writeByte(b);
+  }
+
+  @Override
+  public void writeBytes(byte[] b, int offset, int length) throws IOException {
+byteTracker.addAndGet(length);
+output.writeBytes(b, offset, length);
+  }
+
+  @Override
+  public void close() throws IOException {
+output.close();
+  }
+
+  @Override
+  public long getFilePointer() {
+return output.getFilePointer();
+  }
+
+  @Override
+  public long getChecksum() throws IOException {
+return output.getChecksum();
+  }
+
+  public String getWrappedName() {
+return output.getName();
+  }
+
+  public String getWrappedToString() {
+return output.toString();
+  }
+}

Review Comment:
   Oh I see what you're saying, I got a bit confused here. I'll change this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller commented on a diff in pull request #11768: Fix tie-break bug in various Facets implementations

2022-09-26 Thread GitBox


gsmiller commented on code in PR #11768:
URL: https://github.com/apache/lucene/pull/11768#discussion_r980530138


##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FloatTaxonomyFacets.java:
##
@@ -189,10 +190,11 @@ private TopChildrenForPath 
getTopChildrenForPath(DimConfig dimConfig, int pathOr
 
 TopOrdAndFloatQueue.OrdAndValue reuse = null;
 while (ord != TaxonomyReader.INVALID_ORDINAL) {
-  if (values[ord] > 0) {
+  float value = values[ord];
+  if (value > 0) {
 aggregatedValue = aggregationFunction.aggregate(aggregatedValue, 
values[ord]);

Review Comment:
   Thanks for the catch. I intended to do this everyone but looks like I missed 
a spot. I'll go back through and re-check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller commented on a diff in pull request #11768: Fix tie-break bug in various Facets implementations

2022-09-26 Thread GitBox


gsmiller commented on code in PR #11768:
URL: https://github.com/apache/lucene/pull/11768#discussion_r980530440


##
lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java:
##
@@ -626,7 +626,7 @@ public void testBasicWithCollectorManager() throws 
Exception {
 List topNDimsResult = r.facets.getTopDims(1, 2);
 assertEquals(1, topNDimsResult.size());
 assertEquals(
-"dim=Author path=[] value=5 childCount=4\n  Lisa (2)\n  Susan (1)\n",

Review Comment:
   Yeah, hindsight is 20/20 I suppose. We'll eventually converge on something 
that's correct :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller merged pull request #11768: Fix tie-break bug in various Facets implementations

2022-09-26 Thread GitBox


gsmiller merged PR #11768:
URL: https://github.com/apache/lucene/pull/11768


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller commented on a diff in pull request #11803: DrillSideways optimizations

2022-09-26 Thread GitBox


gsmiller commented on code in PR #11803:
URL: https://github.com/apache/lucene/pull/11803#discussion_r980557793


##
lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java:
##
@@ -149,63 +162,147 @@ public int score(LeafCollector collector, Bits 
acceptDocs, int min, int maxDoc)
*/
   private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, 
DocsAndCost[] dims)
   throws IOException {
-// if (DEBUG) {
-//  System.out.println("  doQueryFirstScoring");
-// }
 setScorer(collector, ScoreCachingWrappingScorer.wrap(baseScorer));
 
-int docID = baseScorer.docID();
+List allDims = Arrays.asList(dims);
+CollectionUtil.timSort(allDims, Comparator.comparingLong(e -> 
e.approximation.cost()));
+
+List twoPhaseDims = null;
+for (DocsAndCost dim : dims) {
+  if (dim.twoPhase != null) {
+if (twoPhaseDims == null) {
+  twoPhaseDims = new ArrayList<>(dims.length);
+}
+twoPhaseDims.add(dim);
+  }
+}
+if (twoPhaseDims != null) {
+  CollectionUtil.timSort(
+  twoPhaseDims,
+  new Comparator() {
+@Override
+public int compare(DocsAndCost o1, DocsAndCost o2) {
+  return Float.compare(o1.twoPhase.matchCost(), 
o2.twoPhase.matchCost());
+}
+  });
+}
+
+int docID = baseApproximation.docID();
 
 nextDoc:
 while (docID != PostingsEnum.NO_MORE_DOCS) {
+  assert docID == baseApproximation.docID();
+
   if (acceptDocs != null && acceptDocs.get(docID) == false) {
-docID = baseIterator.nextDoc();
+docID = baseApproximation.nextDoc();
 continue;
   }
-  LeafCollector failedCollector = null;
-  for (DocsAndCost dim : dims) {
-// TODO: should we sort this 2nd dimension of
-// docsEnums from most frequent to least?
+
+  DocsAndCost failedDim = null;
+  for (DocsAndCost dim : allDims) {
+final int dimDocID;
 if (dim.approximation.docID() < docID) {
-  dim.approximation.advance(docID);
+  dimDocID = dim.approximation.advance(docID);
+} else {
+  dimDocID = dim.approximation.docID();
 }
-
-boolean matches = false;
-if (dim.approximation.docID() == docID) {
-  if (dim.twoPhase == null) {
-matches = true;
+if (dimDocID != docID) {
+  if (failedDim != null) {
+int next = Math.min(dimDocID, failedDim.approximation.docID());
+docID = baseApproximation.advance(next);
+continue nextDoc;
   } else {
-matches = dim.twoPhase.matches();
+failedDim = dim;
   }
 }
+  }
 
-if (matches == false) {
-  if (failedCollector != null) {
-// More than one dim fails on this document, so
-// it's neither a hit nor a near-miss; move to
-// next doc:
-docID = baseIterator.nextDoc();
-continue nextDoc;
-  } else {
-failedCollector = dim.sidewaysLeafCollector;
+  if (baseTwoPhase != null && baseTwoPhase.matches() == false) {
+docID = baseApproximation.nextDoc();
+continue;
+  }
+
+  if (twoPhaseDims != null) {
+if (failedDim == null) {
+  for (DocsAndCost dim : twoPhaseDims) {
+assert dim.approximation.docID() == docID;
+if (dim.twoPhase.matches() == false) {
+  if (failedDim != null) {
+assert dim.approximation.docID() + 1 <= 
failedDim.approximation.docID();

Review Comment:
   Great catch, thanks! I think this must be a silly copy/paste on my part. I 
can't image what I was thinking otherwise. Also, we can be more aggressive with 
setting the next docID here. I'll update the PR with these changes. I've 
actually been working on another change that builds on this (on a separate 
fork) which allows sideways facets collectors to terminate early and then fold 
their conditions into hard requirements on the base query. That's a whole 
separate change to propose another day, but what slipped through here is an 
unfortunate side-effect of trying to keep two versions of a partial change in 
sync. Anyway, thanks again for the catch!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller commented on a diff in pull request #11803: DrillSideways optimizations

2022-09-26 Thread GitBox


gsmiller commented on code in PR #11803:
URL: https://github.com/apache/lucene/pull/11803#discussion_r980560040


##
lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java:
##
@@ -149,63 +162,147 @@ public int score(LeafCollector collector, Bits 
acceptDocs, int min, int maxDoc)
*/
   private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, 
DocsAndCost[] dims)
   throws IOException {
-// if (DEBUG) {
-//  System.out.println("  doQueryFirstScoring");
-// }
 setScorer(collector, ScoreCachingWrappingScorer.wrap(baseScorer));
 
-int docID = baseScorer.docID();
+List allDims = Arrays.asList(dims);
+CollectionUtil.timSort(allDims, Comparator.comparingLong(e -> 
e.approximation.cost()));
+
+List twoPhaseDims = null;
+for (DocsAndCost dim : dims) {
+  if (dim.twoPhase != null) {
+if (twoPhaseDims == null) {
+  twoPhaseDims = new ArrayList<>(dims.length);
+}
+twoPhaseDims.add(dim);
+  }
+}
+if (twoPhaseDims != null) {
+  CollectionUtil.timSort(
+  twoPhaseDims,
+  new Comparator() {
+@Override
+public int compare(DocsAndCost o1, DocsAndCost o2) {
+  return Float.compare(o1.twoPhase.matchCost(), 
o2.twoPhase.matchCost());
+}
+  });
+}
+
+int docID = baseApproximation.docID();
 
 nextDoc:
 while (docID != PostingsEnum.NO_MORE_DOCS) {
+  assert docID == baseApproximation.docID();
+
   if (acceptDocs != null && acceptDocs.get(docID) == false) {
-docID = baseIterator.nextDoc();
+docID = baseApproximation.nextDoc();
 continue;
   }
-  LeafCollector failedCollector = null;
-  for (DocsAndCost dim : dims) {
-// TODO: should we sort this 2nd dimension of
-// docsEnums from most frequent to least?
+
+  DocsAndCost failedDim = null;
+  for (DocsAndCost dim : allDims) {
+final int dimDocID;
 if (dim.approximation.docID() < docID) {
-  dim.approximation.advance(docID);
+  dimDocID = dim.approximation.advance(docID);
+} else {
+  dimDocID = dim.approximation.docID();
 }
-
-boolean matches = false;
-if (dim.approximation.docID() == docID) {
-  if (dim.twoPhase == null) {
-matches = true;
+if (dimDocID != docID) {
+  if (failedDim != null) {
+int next = Math.min(dimDocID, failedDim.approximation.docID());
+docID = baseApproximation.advance(next);
+continue nextDoc;
   } else {
-matches = dim.twoPhase.matches();
+failedDim = dim;
   }
 }
+  }
 
-if (matches == false) {
-  if (failedCollector != null) {
-// More than one dim fails on this document, so
-// it's neither a hit nor a near-miss; move to
-// next doc:
-docID = baseIterator.nextDoc();
-continue nextDoc;
-  } else {
-failedCollector = dim.sidewaysLeafCollector;
+  if (baseTwoPhase != null && baseTwoPhase.matches() == false) {
+docID = baseApproximation.nextDoc();
+continue;
+  }
+
+  if (twoPhaseDims != null) {
+if (failedDim == null) {
+  for (DocsAndCost dim : twoPhaseDims) {
+assert dim.approximation.docID() == docID;

Review Comment:
   Sure, thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] stevenschlansker commented on issue #11674: PrimaryNode close waits for replicas to close, but there is no guarantee they ever will [LUCENE-10638]

2022-09-26 Thread GitBox


stevenschlansker commented on issue #11674:
URL: https://github.com/apache/lucene/issues/11674#issuecomment-1258739382

   Not hearing any other opinions, I opened a PR with a proposed solution. 
Hopefully that will spur some discussion :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mdmarshmallow commented on pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

2022-09-26 Thread GitBox


mdmarshmallow commented on PR #11796:
URL: https://github.com/apache/lucene/pull/11796#issuecomment-1258901123

   Sorry for the delayed response, I haven't had internet for a few days. I 
added the option to include temp output in tracking, as well as the a method 
for real time byte tracking as well so we can let users get exactly what they 
need.
   
   Also @dsmiley that's an interesting suggestion! I'm not as familiar with 
Lucene as some of the other people commenting here but I would be open to 
adding this to metadata if other's agreed it would be useful.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dsmiley commented on pull request #11807: No need to rewrite queries in unified highlighter

2022-09-26 Thread GitBox


dsmiley commented on PR #11807:
URL: https://github.com/apache/lucene/pull/11807#issuecomment-1258920877

   > I thought the changes you made for unrecognized queries fixed the issues 
with the surround query parser? 
   
   I forget LOL but...
   
   > If not it would be good to implement query visitors for the queries that 
it produces, or at the very least have a test in the highlighting module for 
them.
   
   Indeed; let's test & see


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] hakanai commented on issue #6967: Different behaviour of JapaneseAnalyzer at indexing time vs. at search time results in no matches for some words. [LUCENE-5905]

2022-09-26 Thread GitBox


hakanai commented on issue #6967:
URL: https://github.com/apache/lucene/issues/6967#issuecomment-1258972455

   This issue is still alive and well in v8.5.1.
   
   The example discovered this time (in this case mode is set to SEARCH):
   
   1. input: 非公開
   tokens:
   - 非  (接頭詞)
   - 公開   (名詞)
   
   2. input: を非公開
   tokens:
   - を   (助詞)
   - 非公開(名詞)
   
   Somehow を preceding the word makes it happen, while having は preceding the 
word instead does not.
   
   This is also reproducible by typing the same examples into the text field on 
[Atilika's site](https://www.atilika.org/).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zhaih commented on pull request #11803: DrillSideways optimizations

2022-09-26 Thread GitBox


zhaih commented on PR #11803:
URL: https://github.com/apache/lucene/pull/11803#issuecomment-1258979907

   Changes LGTM, do we need to add some unit tests?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zhaih commented on a diff in pull request #11822: PrimaryNode: add configurable timeout to waitForAllRemotesToClose

2022-09-26 Thread GitBox


zhaih commented on code in PR #11822:
URL: https://github.com/apache/lucene/pull/11822#discussion_r980742864


##
lucene/replicator/src/java/org/apache/lucene/replicator/nrt/PrimaryNode.java:
##
@@ -61,6 +61,7 @@ public abstract class PrimaryNode extends Node {
   private CopyState copyState;
 
   protected final long primaryGen;
+  private int remoteCloseTimeout = -1;

Review Comment:
   It'd better to name it `remoteCloseTimeoutMs`? And also change the getter 
and setter correspondingly?



##
lucene/replicator/src/java/org/apache/lucene/replicator/nrt/PrimaryNode.java:
##
@@ -196,6 +197,21 @@ public synchronized long getLastCommitVersion() {
 throw new AssertionError("missing VERSION_KEY");
   }
 
+  /**
+   * @return the number of milliseconds to wait during shutdown for remote 
replicas to close
+   */
+  public int getRemoteCloseTimeout() {
+return remoteCloseTimeout;
+  }
+
+  /**
+   * Set the number of milliseconds to wait during shutdown for remote 
replicas to close. {@code -1}
+   * (the default) means forever, and {@code 0} means don't wait at all.
+   */
+  public void setRemoteCloseTimeout(int remoteCloseTimeout) {

Review Comment:
   Can we add some unit tests for at least `remoteCloseTimeout = 0` case? Like 
produce some error or just interrupt the replicaNode and then close the 
primaryNode?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org