[GitHub] [lucene] thecoop opened a new pull request, #11847: Add a method allowing canonical strings to be returned from DataInput

2022-10-13 Thread GitBox
thecoop opened a new pull request, #11847: URL: https://github.com/apache/lucene/pull/11847 Use a shared buffer for decoding short strings -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

[GitHub] [lucene] rmuir commented on pull request #11847: Add a method allowing canonical strings to be returned from DataInput

2022-10-13 Thread GitBox
rmuir commented on PR #11847: URL: https://github.com/apache/lucene/pull/11847#issuecomment-1277440222 I don't know what this string interning is here, but I am strongly opposed to it

[GitHub] [lucene] rmuir merged pull request #11844: Mark TestLongBitSet.testHugeCapacity @Monster as it requires a lot of memory

2022-10-13 Thread GitBox
rmuir merged PR #11844: URL: https://github.com/apache/lucene/pull/11844

[GitHub] [lucene] rmuir closed issue #11842: TestLongBitSet.testHugeCapacity OOM

2022-10-13 Thread GitBox
rmuir closed issue #11842: TestLongBitSet.testHugeCapacity OOM URL: https://github.com/apache/lucene/issues/11842

[GitHub] [lucene] rmuir merged pull request #11846: WrapperDownloader: add retries for network blips around connect(), too

2022-10-13 Thread GitBox
rmuir merged PR #11846: URL: https://github.com/apache/lucene/pull/11846

[GitHub] [lucene] rmuir closed issue #11845: WrapperDownloader should retry on Layer3/Layer4 network errors

2022-10-13 Thread GitBox
rmuir closed issue #11845: WrapperDownloader should retry on Layer3/Layer4 network errors URL: https://github.com/apache/lucene/issues/11845

[GitHub] [lucene] jpountz commented on pull request #11843: Remove cancellation check on every vector

2022-10-13 Thread GitBox
jpountz commented on PR #11843: URL: https://github.com/apache/lucene/pull/11843#issuecomment-1277592908 > I wonder if we are running benchmarks with the cancellation/timeout checker? We recently introduced support for benchmarking the impact of timeouts in the benchmark suite, but i

[GitHub] [lucene] jpountz merged pull request #11841: GITHUB-11761 (part 2): Fix unit tests to cleany work with new TierMer…

2022-10-13 Thread GitBox
jpountz merged PR #11841: URL: https://github.com/apache/lucene/pull/11841

[GitHub] [lucene] jpountz commented on issue #11761: Expand TieredMergePolicy deletePctAllowed limits

2022-10-13 Thread GitBox
jpountz commented on issue #11761: URL: https://github.com/apache/lucene/issues/11761#issuecomment-1277704569 Closing: https://github.com/apache/lucene/pull/11831 has been merged.

[GitHub] [lucene] jpountz closed issue #11761: Expand TieredMergePolicy deletePctAllowed limits

2022-10-13 Thread GitBox
jpountz closed issue #11761: Expand TieredMergePolicy deletePctAllowed limits URL: https://github.com/apache/lucene/issues/11761

[GitHub] [lucene] rmuir opened a new issue, #11848: Fix ExitableDirectoryReader sampling constants to be power-of-2

2022-10-13 Thread GitBox
rmuir opened a new issue, #11848: URL: https://github.com/apache/lucene/issues/11848
### Description
When looking at #11843, I noticed code like the following in several places in ExitableDirectoryReader:
```
if (calls++ % MAX_CALLS_XXX == 0) {
  checkAndThrow();
}
```

[GitHub] [lucene] benwtrent opened a new pull request, #11849: Fix failure to load larger data sets in KnnGraphTest

2022-10-13 Thread GitBox
benwtrent opened a new pull request, #11849: URL: https://github.com/apache/lucene/pull/11849 When running the `reindex` task with KnnGraphTest, exceptionally large datasets can be used. Since mmap is used to read the data, we need to know the buffer size. This size is limited to Integer.MA
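For context on the size limit mentioned above: `FileChannel.map` rejects a mapping larger than `Integer.MAX_VALUE` bytes, so a file beyond that size cannot be mapped as a single buffer. One common workaround, sketched below, is to map the file in bounded chunks. This is only an illustration under stated assumptions, not the approach taken in the PR; the class `ChunkedMmap`, the methods `sumBytes` and `demo`, and the tiny chunk size are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Lucene or KnnGraphTest.
public class ChunkedMmap {

  /** Sums every byte of the file, mapping at most maxChunkBytes at a time. */
  static long sumBytes(Path file, int maxChunkBytes) throws IOException {
    long sum = 0;
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      long size = channel.size();
      for (long pos = 0; pos < size; pos += maxChunkBytes) {
        // Each mapping stays within the int-sized limit even if the file does not.
        long len = Math.min((long) maxChunkBytes, size - pos);
        MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, pos, len);
        while (buf.hasRemaining()) {
          sum += buf.get();
        }
      }
    }
    return sum;
  }

  /** Writes the bytes 1..10 to a temp file and sums them through 4-byte mappings. */
  static long demo() {
    try {
      Path tmp = Files.createTempFile("vectors", ".bin");
      try {
        byte[] data = new byte[10];
        for (int i = 0; i < data.length; i++) {
          data[i] = (byte) (i + 1);
        }
        Files.write(tmp, data);
        return sumBytes(tmp, 4); // chunk size is deliberately tiny for the demo
      } finally {
        Files.deleteIfExists(tmp);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

In a real vector file the chunk size would presumably be chosen as a multiple of the per-vector byte length so that no vector straddles two mappings.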

[GitHub] [lucene] jtibshirani merged pull request #11843: Remove cancellation check on every vector

2022-10-13 Thread GitBox
jtibshirani merged PR #11843: URL: https://github.com/apache/lucene/pull/11843

[GitHub] [lucene] benwtrent closed pull request #11849: Fix failure to load larger data sets in KnnGraphTest

2022-10-13 Thread GitBox
benwtrent closed pull request #11849: Fix failure to load larger data sets in KnnGraphTest URL: https://github.com/apache/lucene/pull/11849

[GitHub] [lucene] benwtrent commented on pull request #11849: Fix failure to load larger data sets in KnnGraphTest

2022-10-13 Thread GitBox
benwtrent commented on PR #11849: URL: https://github.com/apache/lucene/pull/11849#issuecomment-1277932553 @jtibshirani or @msokolov care to review? The bug was introduced back in https://github.com/apache/lucene/pull/1054

[GitHub] [lucene] jtibshirani commented on pull request #11849: Fix failure to load larger data sets in KnnGraphTest

2022-10-13 Thread GitBox
jtibshirani commented on PR #11849: URL: https://github.com/apache/lucene/pull/11849#issuecomment-1278034553 Thanks for fixing this @benwtrent ! I wonder if we could take the simpler approach of just opening the file, and iterating through the vectors one by one. I don't think there's a cle

[GitHub] [lucene] benwtrent commented on pull request #11849: Fix failure to load larger data sets in KnnGraphTest

2022-10-13 Thread GitBox
benwtrent commented on PR #11849: URL: https://github.com/apache/lucene/pull/11849#issuecomment-1278060063 @jtibshirani My goal here was to fix the bug while keeping as much of the original design as possible. I didn't want to spend a bunch of time refactoring this code. I am open to simply

[GitHub] [lucene] rmuir opened a new pull request, #11850: Fix ExitableDirectoryReader sampling constants to be power-of-2

2022-10-13 Thread GitBox
rmuir opened a new pull request, #11850: URL: https://github.com/apache/lucene/pull/11850 If it's performance sensitive enough that we should do sampling, then we should avoid integer division too. Closes #11848
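The idea of avoiding integer division can be sketched as follows: when the sampling interval is a power of two, the remainder test can be replaced by a bit mask, because `x % N == 0` and `(x & (N - 1)) == 0` agree for every `int` when `N` is a power of two. The class `SampledCheck`, the constant `MAX_CALLS_BEFORE_CHECK`, and its value of 16 are assumptions made for illustration; they are not taken from ExitableDirectoryReader or from the PR.

```java
// Hypothetical illustration, not Lucene code.
public class SampledCheck {

  // Assumed sampling interval; the mask trick below requires a power of two.
  static final int MAX_CALLS_BEFORE_CHECK = 16;

  private int calls = 0;

  /** Returns true on every MAX_CALLS_BEFORE_CHECK-th call, starting with the first. */
  boolean shouldCheck() {
    // Same result as (calls++ % MAX_CALLS_BEFORE_CHECK == 0), without a division.
    return (calls++ & (MAX_CALLS_BEFORE_CHECK - 1)) == 0;
  }

  /** Counts how many of totalCalls invocations would trigger the expensive check. */
  static int countChecks(int totalCalls) {
    SampledCheck sampler = new SampledCheck();
    int checks = 0;
    for (int i = 0; i < totalCalls; i++) {
      if (sampler.shouldCheck()) {
        checks++;
      }
    }
    return checks;
  }

  public static void main(String[] args) {
    System.out.println(countChecks(64)); // prints 4 (calls 0, 16, 32 and 48)
  }
}
```

The mask form also stays correct after `calls` overflows to negative values, since the low bits of a two's-complement `int` are zero exactly when the value is a multiple of the power of two.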

[GitHub] [lucene] msokolov opened a new issue, #11851: Luke web interface

2022-10-13 Thread GitBox
msokolov opened a new issue, #11851: URL: https://github.com/apache/lucene/issues/11851 ### Description I threw together a demo for ApacheCon to show off vector search and I wanted a scrappy UI I could hack on. Luke seemed like a good place to start since it is already in the Lucene

[GitHub] [lucene] msokolov opened a new pull request, #11852: Luke Webapp

2022-10-13 Thread GitBox
msokolov opened a new pull request, #11852: URL: https://github.com/apache/lucene/pull/11852 See #11851 for an overview of what this is

[GitHub] [lucene] msokolov commented on pull request #11852: Luke Webapp

2022-10-13 Thread GitBox
msokolov commented on PR #11852: URL: https://github.com/apache/lucene/pull/11852#issuecomment-1278129891 So -- this is just a scrappy start I wanted to post to get an idea if people think this is worth including. The initial "overview" page is functionally equivalent to the Luke overview s

[GitHub] [lucene] rmuir commented on a diff in pull request #11852: Luke Webapp

2022-10-13 Thread GitBox
rmuir commented on code in PR #11852: URL: https://github.com/apache/lucene/pull/11852#discussion_r995094470 ## gradle/testing/randomization/policies/luke-tests.policy: ## @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributo

[GitHub] [lucene] rmuir commented on a diff in pull request #11852: Luke Webapp

2022-10-13 Thread GitBox
rmuir commented on code in PR #11852: URL: https://github.com/apache/lucene/pull/11852#discussion_r995097222 ## gradle/testing/randomization/policies/luke-tests.policy: ## @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributo

[GitHub] [lucene] msokolov commented on a diff in pull request #11852: Luke Webapp

2022-10-13 Thread GitBox
msokolov commented on code in PR #11852: URL: https://github.com/apache/lucene/pull/11852#discussion_r995097975 ## gradle/testing/randomization/policies/luke-tests.policy: ## @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contrib

[GitHub] [lucene] dsmiley commented on issue #11851: Luke web interface

2022-10-13 Thread GitBox
dsmiley commented on issue #11851: URL: https://github.com/apache/lucene/issues/11851#issuecomment-1278229377 I believe @romseygeek worked on a HTTP based Luke-like thing and invested a lot of time into it.

[GitHub] [lucene] dsmiley commented on pull request #11847: Add a method allowing canonical strings to be returned from DataInput

2022-10-13 Thread GitBox
dsmiley commented on PR #11847: URL: https://github.com/apache/lucene/pull/11847#issuecomment-1278243705 This PR does not use `String.intern` which was the previous concern. So what's wrong here?

[GitHub] [lucene] dsmiley commented on pull request #1069: [LUCENE-2587] Highlighter fragment bug

2022-10-13 Thread GitBox
dsmiley commented on PR #1069: URL: https://github.com/apache/lucene/pull/1069#issuecomment-1278245086 If only we renamed "Highlighter" to "OriginalHighlighter", maybe folks wouldn't continue using this thing. Is the UnifiedHighlighter not satisfying you, and if so, why not?

[GitHub] [lucene] rmuir commented on pull request #11847: Add a method allowing canonical strings to be returned from DataInput

2022-10-13 Thread GitBox
rmuir commented on PR #11847: URL: https://github.com/apache/lucene/pull/11847#issuecomment-1278380008 It does essentially the same thing. Leaking memory on purpose into static finals.

[GitHub] [lucene] dsmiley commented on pull request #11847: Add a method allowing canonical strings to be returned from DataInput

2022-10-13 Thread GitBox
dsmiley commented on PR #11847: URL: https://github.com/apache/lucene/pull/11847#issuecomment-1278448167 The map isn't static. Even if there was a static map, *if* it was expressly used for known static strings, then it wouldn't be a leak but just re-use of constants.

[GitHub] [lucene] stefanvodita commented on pull request #11815: Support deletions in rearrange (#11814)

2022-10-13 Thread GitBox
stefanvodita commented on PR #11815: URL: https://github.com/apache/lucene/pull/11815#issuecomment-1278524902 The second revision comes with a lot more changes to support selecting deletes in the same fashion as segment content. I’ve reworked the tests to be more thorough, especially about

[GitHub] [lucene] zhaih commented on a diff in pull request #11840: GITHUB-11838 Add api to allow concurrent query rewrite

2022-10-13 Thread GitBox
zhaih commented on code in PR #11840: URL: https://github.com/apache/lucene/pull/11840#discussion_r995382563 ## lucene/classification/src/java/org/apache/lucene/classification/utils/NearestFuzzyQuery.java: ## @@ -31,13 +31,7 @@ import org.apache.lucene.index.TermStates; import

[GitHub] [lucene] zhaih commented on a diff in pull request #11840: GITHUB-11838 Add api to allow concurrent query rewrite

2022-10-13 Thread GitBox
zhaih commented on code in PR #11840: URL: https://github.com/apache/lucene/pull/11840#discussion_r995390126 ## lucene/core/src/java/org/apache/lucene/document/FeatureQuery.java: ## @@ -50,12 +49,12 @@ final class FeatureQuery extends Query { } @Override - public Query