[GitHub] [lucene] uschindler commented on pull request #592: LUCENE-10366: Reduce the number of valid checks for ByteBufferIndexInput#readVInt

2022-01-12 Thread GitBox
uschindler commented on pull request #592: URL: https://github.com/apache/lucene/pull/592#issuecomment-1010776138 Hi @gf2121, thanksfor taking the time to test more. It looks really like the guar checks are not affecting the whole thing, but as always we don't really know where the opti

[GitHub] [lucene] uschindler edited a comment on pull request #592: LUCENE-10366: Reduce the number of valid checks for ByteBufferIndexInput#readVInt

2022-01-12 Thread GitBox
uschindler edited a comment on pull request #592: URL: https://github.com/apache/lucene/pull/592#issuecomment-1010776138 Hi @gf2121, thanks for taking the time to test more. It looks really like the guard checks are not affecting the whole thing, but as always we don't really know where

[GitHub] [lucene] uschindler commented on pull request #592: LUCENE-10366: Reduce the number of valid checks for ByteBufferIndexInput#readVInt

2022-01-12 Thread GitBox
uschindler commented on pull request #592: URL: https://github.com/apache/lucene/pull/592#issuecomment-1010781533 I think about this: ```java @Override public final int readVInt() throws IOException { try { // using #position instead of #mark here as #position

[GitHub] [lucene] jpountz commented on pull request #592: LUCENE-10366: Reduce the number of valid checks for ByteBufferIndexInput#readVInt

2022-01-12 Thread GitBox
jpountz commented on pull request #592: URL: https://github.com/apache/lucene/pull/592#issuecomment-1010892395 I wonder if one reason why this helps is because this method is so large that it is prevented from inlining sub function calls. The JVM bug that affected vint/vlong reads no longe

[GitHub] [lucene] jpountz edited a comment on pull request #592: LUCENE-10366: Reduce the number of valid checks for ByteBufferIndexInput#readVInt

2022-01-12 Thread GitBox
jpountz edited a comment on pull request #592: URL: https://github.com/apache/lucene/pull/592#issuecomment-1010892395 I wonder if one reason why this helps is because this method is so large that it is prevented from inlining sub function calls. The JVM bug that affected vint/vlong reads n

[GitHub] [lucene] rmuir commented on pull request #597: LUCENE-10372: Performance of TaxoFacets in Nightly benchmark decreased, revert LUCENE-10350

2022-01-12 Thread GitBox
rmuir commented on pull request #597: URL: https://github.com/apache/lucene/pull/597#issuecomment-1010975017 > Note that we also [moved to JDK 17](https://github.com/apache/lucene/commit/2ebc57a465fc584cf2242765898290a6c2f0f420) yesterday as well. While changes specifically to taxo-facetin

[GitHub] [lucene] rmuir commented on pull request #597: LUCENE-10372: Performance of TaxoFacets in Nightly benchmark decreased, revert LUCENE-10350

2022-01-12 Thread GitBox
rmuir commented on pull request #597: URL: https://github.com/apache/lucene/pull/597#issuecomment-1010978360 Here is the list of changes: https://github.com/apache/lucene/compare/42fe2d5620acf62e6d1e1cefd22817ded8a70c11...82703757fe05a1d3b9efe28d766d4bc3df8a060f I suspect it is one o

[GitHub] [lucene] gf2121 commented on pull request #597: LUCENE-10372: Performance of TaxoFacets in Nightly benchmark decreased, revert LUCENE-10350

2022-01-12 Thread GitBox
gf2121 commented on pull request #597: URL: https://github.com/apache/lucene/pull/597#issuecomment-1011007555 Thanks @gsmiller @rmuir for helping dig! I'm still thinking this degression comes from LUCENE-10350 because: * LUCENE-10250 was working around SSDV not taxo facets.

[GitHub] [lucene] uschindler commented on pull request #592: LUCENE-10366: Reduce the number of valid checks for ByteBufferIndexInput#readVInt

2022-01-12 Thread GitBox
uschindler commented on pull request #592: URL: https://github.com/apache/lucene/pull/592#issuecomment-103861 > I wonder if one reason why this helps is because this method is so large that it is prevented from inlining sub function calls. The JVM bug that affected vint/vlong reads no

[GitHub] [lucene] uschindler commented on pull request #592: LUCENE-10366: Reduce the number of valid checks for ByteBufferIndexInput#readVInt

2022-01-12 Thread GitBox
uschindler commented on pull request #592: URL: https://github.com/apache/lucene/pull/592#issuecomment-106528 We should maybe convert all of them back to loops in DataInput base class and remove the specializations and do more tests. I think the change here is then completely obsolete.

[GitHub] [lucene] mayya-sharipova commented on a change in pull request #416: LUCENE-10054 Make HnswGraph hierarchical

2022-01-12 Thread GitBox
mayya-sharipova commented on a change in pull request #416: URL: https://github.com/apache/lucene/pull/416#discussion_r783193397 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsFormat.java ## @@ -35,14 +35,23 @@ * This file stores all

[GitHub] [lucene] mayya-sharipova commented on a change in pull request #416: LUCENE-10054 Make HnswGraph hierarchical

2022-01-12 Thread GitBox
mayya-sharipova commented on a change in pull request #416: URL: https://github.com/apache/lucene/pull/416#discussion_r783194063 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsFormat.java ## @@ -54,13 +63,19 @@ * [int32] vector simi

[GitHub] [lucene] mayya-sharipova commented on a change in pull request #416: LUCENE-10054 Make HnswGraph hierarchical

2022-01-12 Thread GitBox
mayya-sharipova commented on a change in pull request #416: URL: https://github.com/apache/lucene/pull/416#discussion_r783194796 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java ## @@ -147,7 +137,6 @@ private static IndexInput

[GitHub] [lucene] mayya-sharipova commented on a change in pull request #416: LUCENE-10054 Make HnswGraph hierarchical

2022-01-12 Thread GitBox
mayya-sharipova commented on a change in pull request #416: URL: https://github.com/apache/lucene/pull/416#discussion_r783195370 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java ## @@ -464,30 +477,45 @@ private void readValue(

[GitHub] [lucene] mayya-sharipova commented on a change in pull request #416: LUCENE-10054 Make HnswGraph hierarchical

2022-01-12 Thread GitBox
mayya-sharipova commented on a change in pull request #416: URL: https://github.com/apache/lucene/pull/416#discussion_r783195632 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java ## @@ -464,30 +477,45 @@ private void readValue(

[GitHub] [lucene] gsmiller commented on pull request #597: LUCENE-10372: Performance of TaxoFacets in Nightly benchmark decreased, revert LUCENE-10350

2022-01-12 Thread GitBox
gsmiller commented on pull request #597: URL: https://github.com/apache/lucene/pull/597#issuecomment-1011212031 >The version of java didn't change. Mike was using java 17 well before yesterday. Click on a day and you can see the JVM version that ran the benchmark. It has been 17 for a whil

[GitHub] [lucene] gsmiller edited a comment on pull request #597: LUCENE-10372: Performance of TaxoFacets in Nightly benchmark decreased, revert LUCENE-10350

2022-01-12 Thread GitBox
gsmiller edited a comment on pull request #597: URL: https://github.com/apache/lucene/pull/597#issuecomment-1011212031 >The version of java didn't change. Mike was using java 17 well before yesterday. Click on a day and you can see the JVM version that ran the benchmark. It has been 17 for

[GitHub] [lucene] gsmiller commented on pull request #597: LUCENE-10372: Performance of TaxoFacets in Nightly benchmark decreased, revert LUCENE-10350

2022-01-12 Thread GitBox
gsmiller commented on pull request #597: URL: https://github.com/apache/lucene/pull/597#issuecomment-1011231123 I'm going to run some more local benchmarks with some of these other facet-related changes reverted. While I agree that none of them seem like they should impact the `#countAll`

[GitHub] [lucene] gsmiller commented on pull request #597: LUCENE-10372: Performance of TaxoFacets in Nightly benchmark decreased, revert LUCENE-10350

2022-01-12 Thread GitBox
gsmiller commented on pull request #597: URL: https://github.com/apache/lucene/pull/597#issuecomment-1011241519 Benchmark results with all three other faceting changes (omitting this one) reverted (using `wikimedium10m`). None of the results are significant but there is a trend with "brows

[GitHub] [lucene] alessandrobenedetti commented on pull request #380: LUCENE-10171 - Fix dictionary-based OpenNLPLemmatizerFilterFactory caching issue

2022-01-12 Thread GitBox
alessandrobenedetti commented on pull request #380: URL: https://github.com/apache/lucene/pull/380#issuecomment-1011243486 I agree, I am a supporter of the https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it philosophy, so if we are not breaking any test in Lucene I suggest we go with

[GitHub] [lucene] gsmiller opened a new pull request #599: LUCENE-10368: Mark IntTaxonomyFacets as deprecated

2022-01-12 Thread GitBox
gsmiller opened a new pull request #599: URL: https://github.com/apache/lucene/pull/599 # Description IntTaxonomyFacets is intended as an internal implementation detail but has public visibility, so could also be serving as an extension point for user-created faceting implementation

[GitHub] [lucene] gsmiller commented on pull request #597: LUCENE-10372: Performance of TaxoFacets in Nightly benchmark decreased, revert LUCENE-10350

2022-01-12 Thread GitBox
gsmiller commented on pull request #597: URL: https://github.com/apache/lucene/pull/597#issuecomment-1011267849 Here are results of just reverting LUCENE-10245. This doesn't look like the cause of the regression to me. None of the results are significant, and if anything, the tasks associa

[GitHub] [lucene] gsmiller opened a new pull request #600: LUCENE-10368: Make IntTaxonomyFacets pkg-private

2022-01-12 Thread GitBox
gsmiller opened a new pull request #600: URL: https://github.com/apache/lucene/pull/600 # Description IntTaxonomyFacets is intended as an internal implementation detail but has public visibility, so could also be serving as an extension point for user-created faceting implementation

[jira] [Commented] (LUCENE-10368) Reduce visibility of IntTaxonomyFacets

2022-01-12 Thread Greg Miller (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17474747#comment-17474747 ] Greg Miller commented on LUCENE-10368: -- There are two separate PRs for this that s

[jira] [Assigned] (LUCENE-10368) Reduce visibility of IntTaxonomyFacets

2022-01-12 Thread Greg Miller (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Miller reassigned LUCENE-10368: Assignee: Greg Miller > Reduce visibility of IntTaxonomyFacets > ---

[GitHub] [lucene] gsmiller commented on a change in pull request #600: LUCENE-10368: Make IntTaxonomyFacets pkg-private

2022-01-12 Thread GitBox
gsmiller commented on a change in pull request #600: URL: https://github.com/apache/lucene/pull/600#discussion_r783300374 ## File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java ## @@ -62,8 +52,82 @@ protected IntTaxonomyFacets( } }

[GitHub] [lucene] rmuir commented on pull request #597: LUCENE-10372: Performance of TaxoFacets in Nightly benchmark decreased, revert LUCENE-10350

2022-01-12 Thread GitBox
rmuir commented on pull request #597: URL: https://github.com/apache/lucene/pull/597#issuecomment-1011304253 maybe the issue only reproduces with beefy AMD cpus similar to mike's. We have seen this before, mike likes to buy "crazy CPUs" -- This is an automated message from the Apache Git

[GitHub] [lucene] gsmiller commented on pull request #597: LUCENE-10372: Performance of TaxoFacets in Nightly benchmark decreased, revert LUCENE-10350

2022-01-12 Thread GitBox
gsmiller commented on pull request #597: URL: https://github.com/apache/lucene/pull/597#issuecomment-1011307147 @rmuir that's an interesting thought. I'll round out my local benchmarks here, but assuming I don't find anything interesting, I think we'll need to actually start reverting chan

[GitHub] [lucene] gsmiller commented on pull request #597: LUCENE-10372: Performance of TaxoFacets in Nightly benchmark decreased, revert LUCENE-10350

2022-01-12 Thread GitBox
gsmiller commented on pull request #597: URL: https://github.com/apache/lucene/pull/597#issuecomment-1011307529 Here's what I get when reverting LUCENE-10356 only. I don't see anything here: ``` TaskQPS baseline StdDevQPS candidate StdDev

[GitHub] [lucene] jpountz merged pull request #531: addBackcompatIndexes.py should use Gradle, not Ant.

2022-01-12 Thread GitBox
jpountz merged pull request #531: URL: https://github.com/apache/lucene/pull/531 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr..

[GitHub] [lucene] gsmiller commented on pull request #600: LUCENE-10368: Make IntTaxonomyFacets pkg-private

2022-01-12 Thread GitBox
gsmiller commented on pull request #600: URL: https://github.com/apache/lucene/pull/600#issuecomment-1011313093 Hmm, OK not sure what those failing tests are about (or how I missed them locally), but I'll dig into it. -- This is an automated message from the Apache Git Service. To respon

[GitHub] [lucene] gsmiller commented on pull request #597: LUCENE-10372: Performance of TaxoFacets in Nightly benchmark decreased, revert LUCENE-10350

2022-01-12 Thread GitBox
gsmiller commented on pull request #597: URL: https://github.com/apache/lucene/pull/597#issuecomment-1011326157 Last one. Here are the results of reverting LUCENE-10250. I don't see anything here either. ``` TaskQPS baseline StdDevQPS candidate

[jira] [Created] (LUCENE-10374) Track down the "browse" taxonomy faceting qps regression

2022-01-12 Thread Greg Miller (Jira)
Greg Miller created LUCENE-10374: Summary: Track down the "browse" taxonomy faceting qps regression Key: LUCENE-10374 URL: https://issues.apache.org/jira/browse/LUCENE-10374 Project: Lucene - Core

[GitHub] [lucene] gsmiller commented on pull request #597: LUCENE-10372: Performance of TaxoFacets in Nightly benchmark decreased, revert LUCENE-10350

2022-01-12 Thread GitBox
gsmiller commented on pull request #597: URL: https://github.com/apache/lucene/pull/597#issuecomment-1011329266 OK, since I can't reproduce this locally, let's go ahead and try reverting LUCENE-10372 (this PR) to see the impact on the nightly benchmarks. I'll merge this PR. I also created

[GitHub] [lucene] gsmiller merged pull request #597: LUCENE-10372: Performance of TaxoFacets in Nightly benchmark decreased, revert LUCENE-10350

2022-01-12 Thread GitBox
gsmiller merged pull request #597: URL: https://github.com/apache/lucene/pull/597 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr.

[jira] [Commented] (LUCENE-10374) Track down the "browse" taxonomy faceting qps regression

2022-01-12 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17474790#comment-17474790 ] ASF subversion and git services commented on LUCENE-10374: -- Co

[jira] [Commented] (LUCENE-10355) Remove EMPTY LongValues in favor of LongValues#ZERO

2022-01-12 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17474789#comment-17474789 ] ASF subversion and git services commented on LUCENE-10355: -- Co

[GitHub] [lucene] gsmiller commented on pull request #600: LUCENE-10368: Make IntTaxonomyFacets pkg-private

2022-01-12 Thread GitBox
gsmiller commented on pull request #600: URL: https://github.com/apache/lucene/pull/600#issuecomment-1011331548 Found the bug (doh!) and added a fix. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [lucene] gsmiller commented on pull request #600: LUCENE-10368: Make IntTaxonomyFacets pkg-private

2022-01-12 Thread GitBox
gsmiller commented on pull request #600: URL: https://github.com/apache/lucene/pull/600#issuecomment-1011332440 Ah, and now there are merge conflicts because of the efforts to find the taxonomy faceting performance regression (see #597). Ok, let's hold off on this change until those perfor

[jira] [Commented] (LUCENE-10368) Reduce visibility of IntTaxonomyFacets

2022-01-12 Thread Greg Miller (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17474868#comment-17474868 ] Greg Miller commented on LUCENE-10368: -- Let's hold off on this change actually unt

[GitHub] [lucene] rmuir commented on pull request #592: LUCENE-10366: Reduce the number of valid checks for ByteBufferIndexInput#readVInt

2022-01-12 Thread GitBox
rmuir commented on pull request #592: URL: https://github.com/apache/lucene/pull/592#issuecomment-1011356896 +1 let's make a separate change to look at re-rolling the loop back. the JVM bug in question doesn't affect this version of lucene (or probably the last one either?), it was from an

[GitHub] [lucene] jtibshirani commented on a change in pull request #416: LUCENE-10054 Make HnswGraph hierarchical

2022-01-12 Thread GitBox
jtibshirani commented on a change in pull request #416: URL: https://github.com/apache/lucene/pull/416#discussion_r783342986 ## File path: lucene/core/src/test/org/apache/lucene/util/hnsw/TestHnswGraph.java ## @@ -116,33 +116,59 @@ public KnnVectorsFormat getKnnVectorsFormatFo

[GitHub] [lucene] mayya-sharipova commented on pull request #575: Small edits for KnnGraphTester

2022-01-12 Thread GitBox
mayya-sharipova commented on pull request #575: URL: https://github.com/apache/lucene/pull/575#issuecomment-1011505100 @jtibshirani Thanks for the review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [lucene] mayya-sharipova merged pull request #575: Small edits for KnnGraphTester

2022-01-12 Thread GitBox
mayya-sharipova merged pull request #575: URL: https://github.com/apache/lucene/pull/575 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-un

[GitHub] [lucene] mayya-sharipova commented on a change in pull request #416: LUCENE-10054 Make HnswGraph hierarchical

2022-01-12 Thread GitBox
mayya-sharipova commented on a change in pull request #416: URL: https://github.com/apache/lucene/pull/416#discussion_r783592663 ## File path: lucene/core/src/test/org/apache/lucene/index/TestKnnGraph.java ## @@ -279,6 +303,77 @@ public void testSearch() throws Exception {

[GitHub] [lucene] gf2121 commented on pull request #592: LUCENE-10366: Reduce the number of valid checks for ByteBufferIndexInput#readVInt

2022-01-12 Thread GitBox
gf2121 commented on pull request #592: URL: https://github.com/apache/lucene/pull/592#issuecomment-1011782903 Thanks all! I've finished benchmarks (20 JVM * 200 repeat): **Read byte from guard** > Here is the [code](https://github.com/apache/lucene/pull/592/commits/5fae0b2f4ffc6de

[GitHub] [lucene] gf2121 edited a comment on pull request #592: LUCENE-10366: Reduce the number of valid checks for ByteBufferIndexInput#readVInt

2022-01-12 Thread GitBox
gf2121 edited a comment on pull request #592: URL: https://github.com/apache/lucene/pull/592#issuecomment-1011782903 Thanks all! I've finished benchmarks (20 JVM * 200 repeat): **Read byte from guard** > Here is the [code](https://github.com/apache/lucene/pull/592/commits/5fae0b2f

[GitHub] [lucene] gf2121 edited a comment on pull request #592: LUCENE-10366: Reduce the number of valid checks for ByteBufferIndexInput#readVInt

2022-01-12 Thread GitBox
gf2121 edited a comment on pull request #592: URL: https://github.com/apache/lucene/pull/592#issuecomment-1011782903 Thanks all! I've finished benchmarks (20 JVM * 200 repeat): **Read byte from guard** > Here is the [code](https://github.com/apache/lucene/pull/592/commits/5fae0b2f

[jira] [Created] (LUCENE-10375) Speed up HNSW merge by writing combined vector data

2022-01-12 Thread Julie Tibshirani (Jira)
Julie Tibshirani created LUCENE-10375: - Summary: Speed up HNSW merge by writing combined vector data Key: LUCENE-10375 URL: https://issues.apache.org/jira/browse/LUCENE-10375 Project: Lucene - Core

[GitHub] [lucene] jtibshirani opened a new pull request #601: LUCENE-10375: Write merged vectors to file before building graph

2022-01-12 Thread GitBox
jtibshirani opened a new pull request #601: URL: https://github.com/apache/lucene/pull/601 When merging segments together, the `KnnVectorsWriter` creates a `VectorValues` instance with a merged view of all the segments' vectors. This merged instance is used when constructing the new

[jira] [Comment Edited] (LUCENE-10375) Speed up HNSW merge by writing combined vector data

2022-01-12 Thread Julie Tibshirani (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475122#comment-17475122 ] Julie Tibshirani edited comment on LUCENE-10375 at 1/13/22, 6:13 AM:

[jira] [Commented] (LUCENE-10375) Speed up HNSW merge by writing combined vector data

2022-01-12 Thread Julie Tibshirani (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475122#comment-17475122 ] Julie Tibshirani commented on LUCENE-10375: --- I tried out the idea in this dra

[GitHub] [lucene] jtibshirani commented on a change in pull request #601: LUCENE-10375: Write merged vectors to file before building graph

2022-01-12 Thread GitBox
jtibshirani commented on a change in pull request #601: URL: https://github.com/apache/lucene/pull/601#discussion_r783657144 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsWriter.java ## @@ -145,6 +139,64 @@ public void writeField(Field