[GitHub] [lucene] zhaih commented on pull request #12555: Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167)

2023-09-14 Thread via GitHub
zhaih commented on PR #12555: URL: https://github.com/apache/lucene/pull/12555#issuecomment-1720704189 Actually I just tried it myself and this will always reproduce the error: ``` actual.seekExact(0); actual.seekCeil(new BytesRef("")); for (int i = 0; i <

[GitHub] [lucene] iverase commented on pull request #12460: Allow reading binary doc values as a DataInput

2023-09-15 Thread via GitHub
iverase commented on PR #12460: URL: https://github.com/apache/lucene/pull/12460#issuecomment-1720945832 Thanks @jpountz and @uschindler for the input. I had a look into `RandomAccessInput` and I don't think this is what we need. We need a DataInput that is position-aware so it supports seek

[GitHub] [lucene] epotyom commented on a diff in pull request #12555: Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167)

2023-09-15 Thread via GitHub
epotyom commented on code in PR #12555: URL: https://github.com/apache/lucene/pull/12555#discussion_r1327195378 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer.java: ## @@ -1205,7 +1205,15 @@ public SeekStatus seekCeil(BytesRef text) throws I

[GitHub] [lucene] easyice opened a new pull request, #12557: Improve refresh speed with softdelete enable

2023-09-15 Thread via GitHub
easyice opened a new pull request, #12557: URL: https://github.com/apache/lucene/pull/12557 In a flame graph from my production environment, I found that the DocValuesConsumer for the `___soft_deletes` field accounted for a large proportion ![image](https://github.com/apache/lucene/assets/23521001

[GitHub] [lucene] Shradha26 opened a new pull request, #12559: Choose sparse values in IntTaxonomyFacets when FacetsCollector has em…

2023-09-15 Thread via GitHub
Shradha26 opened a new pull request, #12559: URL: https://github.com/apache/lucene/pull/12559 …pty MatchingDocs ### Description -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [lucene] gokaai commented on a diff in pull request #12530: Fix CheckIndex to detect major corruption with old (not the latest) commit point

2023-09-15 Thread via GitHub
gokaai commented on code in PR #12530: URL: https://github.com/apache/lucene/pull/12530#discussion_r1327351474 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -610,6 +610,31 @@ public Status checkIndex(List onlySegments, ExecutorService executorServ

[GitHub] [lucene] mikemccand commented on a diff in pull request #12530: Fix CheckIndex to detect major corruption with old (not the latest) commit point

2023-09-15 Thread via GitHub
mikemccand commented on code in PR #12530: URL: https://github.com/apache/lucene/pull/12530#discussion_r1327380574 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -610,6 +610,39 @@ public Status checkIndex(List onlySegments, ExecutorService executorServ

[GitHub] [lucene] gsmiller opened a new pull request, #12560: Defer #advanceExact on expression dependencies until their values are needed

2023-09-15 Thread via GitHub
gsmiller opened a new pull request, #12560: URL: https://github.com/apache/lucene/pull/12560 ### Description This extends the idea in GH#11878 to avoid advancing dependencies that are never referenced because of expression branching (i.e., ternary expressions). I think we should be

[GitHub] [lucene] zhaih commented on a diff in pull request #12555: Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167)

2023-09-15 Thread via GitHub
zhaih commented on code in PR #12555: URL: https://github.com/apache/lucene/pull/12555#discussion_r1327591088 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer.java: ## @@ -1205,7 +1205,15 @@ public SeekStatus seekCeil(BytesRef text) throws IOE

[GitHub] [lucene] zhaih commented on a diff in pull request #12555: Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167)

2023-09-15 Thread via GitHub
zhaih commented on code in PR #12555: URL: https://github.com/apache/lucene/pull/12555#discussion_r1327593126 ## lucene/core/src/test/org/apache/lucene/codecs/lucene90/TestLucene90DocValuesFormat.java: ## @@ -958,4 +971,61 @@ public void testTermsEnumDictionary() throws IOExcept

[GitHub] [lucene] zhaih commented on pull request #12555: Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167)

2023-09-15 Thread via GitHub
zhaih commented on PR #12555: URL: https://github.com/apache/lucene/pull/12555#issuecomment-1721616521 Also pls add an entry to CHANGES.txt :)

[GitHub] [lucene] zhaih commented on a diff in pull request #12560: Defer #advanceExact on expression dependencies until their values are needed

2023-09-15 Thread via GitHub
zhaih commented on code in PR #12560: URL: https://github.com/apache/lucene/pull/12560#discussion_r1327603749 ## lucene/expressions/src/java/org/apache/lucene/expressions/ExpressionFunctionValues.java: ## @@ -39,21 +39,21 @@ class ExpressionFunctionValues extends DoubleValues {

[GitHub] [lucene] gsmiller commented on a diff in pull request #12560: Defer #advanceExact on expression dependencies until their values are needed

2023-09-15 Thread via GitHub
gsmiller commented on code in PR #12560: URL: https://github.com/apache/lucene/pull/12560#discussion_r1327624072 ## lucene/expressions/src/java/org/apache/lucene/expressions/ExpressionFunctionValues.java: ## @@ -39,21 +39,21 @@ class ExpressionFunctionValues extends DoubleValues

[GitHub] [lucene] gsmiller commented on issue #12558: IntTaxonomyFacets chooses dense values array when FacetsCollector has no MatchingDocs

2023-09-15 Thread via GitHub
gsmiller commented on issue #12558: URL: https://github.com/apache/lucene/issues/12558#issuecomment-1721652878 Thanks for opening this issue @Shradha26! Do we have a test case that reproduces this? I'm still a little confused about how we can actually arrive in this state.

[GitHub] [lucene] gsmiller commented on pull request #12559: Choose sparse values in IntTaxonomyFacets when FacetsCollector has em…

2023-09-15 Thread via GitHub
gsmiller commented on PR #12559: URL: https://github.com/apache/lucene/pull/12559#issuecomment-1721725750 It's still a bit unclear to me how we can get in a state where `maxDoc` is zero. Do we understand how this is happening? Seems like possibly a bigger/different issue that we should fix

[GitHub] [lucene] zhaih commented on a diff in pull request #12560: Defer #advanceExact on expression dependencies until their values are needed

2023-09-15 Thread via GitHub
zhaih commented on code in PR #12560: URL: https://github.com/apache/lucene/pull/12560#discussion_r1327707056 ## lucene/expressions/src/java/org/apache/lucene/expressions/ExpressionFunctionValues.java: ## @@ -39,21 +39,21 @@ class ExpressionFunctionValues extends DoubleValues {

[GitHub] [lucene] gsmiller commented on a diff in pull request #12560: Defer #advanceExact on expression dependencies until their values are needed

2023-09-15 Thread via GitHub
gsmiller commented on code in PR #12560: URL: https://github.com/apache/lucene/pull/12560#discussion_r1327725540 ## lucene/expressions/src/java/org/apache/lucene/expressions/ExpressionFunctionValues.java: ## @@ -39,21 +39,21 @@ class ExpressionFunctionValues extends DoubleValues

[GitHub] [lucene] zhaih commented on a diff in pull request #12560: Defer #advanceExact on expression dependencies until their values are needed

2023-09-15 Thread via GitHub
zhaih commented on code in PR #12560: URL: https://github.com/apache/lucene/pull/12560#discussion_r1327778019 ## lucene/expressions/src/java/org/apache/lucene/expressions/ExpressionFunctionValues.java: ## @@ -39,21 +39,21 @@ class ExpressionFunctionValues extends DoubleValues {

[GitHub] [lucene] msokolov commented on a diff in pull request #12560: Defer #advanceExact on expression dependencies until their values are needed

2023-09-15 Thread via GitHub
msokolov commented on code in PR #12560: URL: https://github.com/apache/lucene/pull/12560#discussion_r1327797753 ## lucene/expressions/src/java/org/apache/lucene/expressions/ExpressionFunctionValues.java: ## @@ -39,21 +39,21 @@ class ExpressionFunctionValues extends DoubleValues

[GitHub] [lucene] gsmiller commented on a diff in pull request #12560: Defer #advanceExact on expression dependencies until their values are needed

2023-09-15 Thread via GitHub
gsmiller commented on code in PR #12560: URL: https://github.com/apache/lucene/pull/12560#discussion_r1327828413 ## lucene/expressions/src/java/org/apache/lucene/expressions/ExpressionFunctionValues.java: ## @@ -39,21 +39,21 @@ class ExpressionFunctionValues extends DoubleValues

[GitHub] [lucene] gsmiller commented on a diff in pull request #12560: Defer #advanceExact on expression dependencies until their values are needed

2023-09-15 Thread via GitHub
gsmiller commented on code in PR #12560: URL: https://github.com/apache/lucene/pull/12560#discussion_r1327828846 ## lucene/expressions/src/java/org/apache/lucene/expressions/ExpressionFunctionValues.java: ## @@ -39,21 +39,21 @@ class ExpressionFunctionValues extends DoubleValues

[GitHub] [lucene] gsmiller commented on a diff in pull request #12560: Defer #advanceExact on expression dependencies until their values are needed

2023-09-15 Thread via GitHub
gsmiller commented on code in PR #12560: URL: https://github.com/apache/lucene/pull/12560#discussion_r1327829161 ## lucene/expressions/src/java/org/apache/lucene/expressions/ExpressionFunctionValues.java: ## @@ -39,21 +39,21 @@ class ExpressionFunctionValues extends DoubleValues

[GitHub] [lucene] gsmiller commented on a diff in pull request #12560: Defer #advanceExact on expression dependencies until their values are needed

2023-09-15 Thread via GitHub
gsmiller commented on code in PR #12560: URL: https://github.com/apache/lucene/pull/12560#discussion_r1327831100 ## lucene/expressions/src/java/org/apache/lucene/expressions/ExpressionValueSource.java: ## @@ -90,16 +90,24 @@ public DoubleValues getValues(LeafReaderContext reade

[GitHub] [lucene] eraneverlaw opened a new issue, #12561: UAX29URLEmailTokenizerImpl.jflex matches emails with commas and invalid periods in the local part

2023-09-15 Thread via GitHub
eraneverlaw opened a new issue, #12561: URL: https://github.com/apache/lucene/issues/12561 ### Description The `UAX29URLEmailTokenizerImpl.jflex` code matches commas as part of email local part, as well as invalid leading, trailing, or consecutive periods. Examples of bad matches: `f

[GitHub] [lucene] epotyom commented on a diff in pull request #12555: Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167)

2023-09-15 Thread via GitHub
epotyom commented on code in PR #12555: URL: https://github.com/apache/lucene/pull/12555#discussion_r1327853406 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer.java: ## @@ -1205,7 +1205,15 @@ public SeekStatus seekCeil(BytesRef text) throws I

[GitHub] [lucene] gsmiller commented on pull request #12560: Defer #advanceExact on expression dependencies until their values are needed

2023-09-15 Thread via GitHub
gsmiller commented on PR #12560: URL: https://github.com/apache/lucene/pull/12560#issuecomment-1722012958 Thanks @zhaih / @msokolov ! Merging.

[GitHub] [lucene] gsmiller merged pull request #12560: Defer #advanceExact on expression dependencies until their values are needed

2023-09-15 Thread via GitHub
gsmiller merged PR #12560: URL: https://github.com/apache/lucene/pull/12560

[GitHub] [lucene] msokolov commented on a diff in pull request #12560: Defer #advanceExact on expression dependencies until their values are needed

2023-09-15 Thread via GitHub
msokolov commented on code in PR #12560: URL: https://github.com/apache/lucene/pull/12560#discussion_r1327858524 ## lucene/expressions/src/java/org/apache/lucene/expressions/ExpressionFunctionValues.java: ## @@ -39,21 +39,21 @@ class ExpressionFunctionValues extends DoubleValues

[GitHub] [lucene] zhaih merged pull request #12555: Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167)

2023-09-16 Thread via GitHub
zhaih merged PR #12555: URL: https://github.com/apache/lucene/pull/12555

[GitHub] [lucene] zhaih commented on pull request #12555: Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167)

2023-09-16 Thread via GitHub
zhaih commented on PR #12555: URL: https://github.com/apache/lucene/pull/12555#issuecomment-1722343674 Merged and backported

[GitHub] [lucene] zhaih commented on issue #12167: org.apache.lucene.search.grouping.TestGroupFacetCollector.testRandom fails reproducibly

2023-09-16 Thread via GitHub
zhaih commented on issue #12167: URL: https://github.com/apache/lucene/issues/12167#issuecomment-1722343849 Fixed by #12555, thanks @epotyom !

[GitHub] [lucene] zhaih closed issue #12167: org.apache.lucene.search.grouping.TestGroupFacetCollector.testRandom fails reproducibly

2023-09-16 Thread via GitHub
zhaih closed issue #12167: org.apache.lucene.search.grouping.TestGroupFacetCollector.testRandom fails reproducibly URL: https://github.com/apache/lucene/issues/12167

[GitHub] [lucene] zhaih opened a new issue, #12562: search.matchhighlight.TestPassageSelector.randomizedSanityCheck reproducible error

2023-09-16 Thread via GitHub
zhaih opened a new issue, #12562: URL: https://github.com/apache/lucene/issues/12562 ### Description Build: https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-main/10242/ 1 tests failed. FAILED: org.apache.lucene.search.matchhighlight.TestPassageSelector.randomizedSan

[GitHub] [lucene] epotyom commented on pull request #12555: Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position bytes correctly (#12167)

2023-09-16 Thread via GitHub
epotyom commented on PR #12555: URL: https://github.com/apache/lucene/pull/12555#issuecomment-1722392891 @zhaih thank you for reviewing and merging!

[GitHub] [lucene] dweiss commented on issue #12562: search.matchhighlight.TestPassageSelector.randomizedSanityCheck reproducible error

2023-09-16 Thread via GitHub
dweiss commented on issue #12562: URL: https://github.com/apache/lucene/issues/12562#issuecomment-1722400116 Thanks @zhaih - this looks like something I wrote... I'll be taking a look on Monday, please feel free to leave it to me.

[GitHub] [lucene] dweiss commented on issue #12562: search.matchhighlight.TestPassageSelector.randomizedSanityCheck reproducible error

2023-09-16 Thread via GitHub
dweiss commented on issue #12562: URL: https://github.com/apache/lucene/issues/12562#issuecomment-1722400642 Also, I'm not sure why the build indeed resulted in "build success" - will look into that as well. ``` /home/jenkins/jenkins-slave/workspace/Lucene/Lucene-Check-main/gradlew -

[GitHub] [lucene] Deepika0510 commented on pull request #12345: LUCENE-10641: IndexSearcher#setTimeout should also abort query rewrites, point ranges and vector searches

2023-09-17 Thread via GitHub
Deepika0510 commented on PR #12345: URL: https://github.com/apache/lucene/pull/12345#issuecomment-1722444781 @jpountz To wrap all the leaves, we would need to wrap ReaderContext classes along with LeafReader classes as well, right? Since we generally access the leaves through the ReaderCont

[GitHub] [lucene] Shradha26 opened a new pull request, #12563: Branch 9 7

2023-09-17 Thread via GitHub
Shradha26 opened a new pull request, #12563: URL: https://github.com/apache/lucene/pull/12563 [Draft] Test case to replicate maxDoc == 0 when using IntTaxonomyFacets

[GitHub] [lucene] Shradha26 commented on issue #12558: IntTaxonomyFacets chooses dense values array when FacetsCollector has no MatchingDocs

2023-09-17 Thread via GitHub
Shradha26 commented on issue #12558: URL: https://github.com/apache/lucene/issues/12558#issuecomment-1722494326 Hey @gsmiller, thanks for looking at the issue. Here's a draft PR for a test case that replicates this state: https://github.com/apache/lucene/pull/12563. The test ``testMaxDocIsN

[GitHub] [lucene] jpountz opened a new pull request, #12564: Window-at-a-time scoring for conjunctions.

2023-09-18 Thread via GitHub
jpountz opened a new pull request, #12564: URL: https://github.com/apache/lucene/pull/12564 This adds a bulk scorer for conjunctions that follows a similar idea as BS1: it is probably possible to make evaluation faster by evaluating windows of documents at a time instead of a single documen

[GitHub] [lucene] jpountz commented on pull request #12564: Window-at-a-time scoring for conjunctions.

2023-09-18 Thread via GitHub
jpountz commented on PR #12564: URL: https://github.com/apache/lucene/pull/12564#issuecomment-1722938506 This gives a major speedup in wikibigall: ``` Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value

[GitHub] [lucene] dweiss commented on issue #12562: search.matchhighlight.TestPassageSelector.randomizedSanityCheck reproducible error

2023-09-18 Thread via GitHub
dweiss commented on issue #12562: URL: https://github.com/apache/lucene/issues/12562#issuecomment-1722957920 The "build success" here is caused by an explicit change made by Uwe, here: https://github.com/apache/lucene/issues/10513#issuecomment-1224072458 In his own words: The t

[GitHub] [lucene] dweiss opened a new issue, #12565: Omit -Ptests.haltonfailure=false in failed test repro line

2023-09-18 Thread via GitHub
dweiss opened a new issue, #12565: URL: https://github.com/apache/lucene/issues/12565 ### Description When a test fails in Jenkins, it reports the Gradle build step as "build success" in the log file, even though some tests have failed. It's done so that all of the build runs until

[GitHub] [lucene] dweiss opened a new pull request, #12566: Omit -Ptests.haltonfailure=false in failed test repro line #12565

2023-09-18 Thread via GitHub
dweiss opened a new pull request, #12566: URL: https://github.com/apache/lucene/pull/12566 (no comment)

[GitHub] [lucene] dweiss opened a new pull request, #12567: TestPassageSelector.randomizedSanityCheck can fail when the random input clashes with an assertion

2023-09-18 Thread via GitHub
dweiss opened a new pull request, #12567: URL: https://github.com/apache/lucene/pull/12567 This is a test-assumption error only. As per issue: https://github.com/apache/lucene/issues/12562

[GitHub] [lucene] dweiss commented on issue #12562: search.matchhighlight.TestPassageSelector.randomizedSanityCheck reproducible error

2023-09-18 Thread via GitHub
dweiss commented on issue #12562: URL: https://github.com/apache/lucene/issues/12562#issuecomment-1723003992 The issue is caused by random input clashing with a test assertion; it is not a real problem. I've filed PRs for both problems.

[GitHub] [lucene] dweiss merged pull request #12567: TestPassageSelector.randomizedSanityCheck can fail when the random input clashes with an assertion

2023-09-18 Thread via GitHub
dweiss merged PR #12567: URL: https://github.com/apache/lucene/pull/12567

[GitHub] [lucene] dweiss closed issue #12562: search.matchhighlight.TestPassageSelector.randomizedSanityCheck reproducible error

2023-09-18 Thread via GitHub
dweiss closed issue #12562: search.matchhighlight.TestPassageSelector.randomizedSanityCheck reproducible error URL: https://github.com/apache/lucene/issues/12562

[GitHub] [lucene] dweiss merged pull request #12566: Omit -Ptests.haltonfailure=false in failed test repro line #12565

2023-09-18 Thread via GitHub
dweiss merged PR #12566: URL: https://github.com/apache/lucene/pull/12566

[GitHub] [lucene] dweiss closed issue #12565: Omit -Ptests.haltonfailure=false in failed test repro line

2023-09-18 Thread via GitHub
dweiss closed issue #12565: Omit -Ptests.haltonfailure=false in failed test repro line URL: https://github.com/apache/lucene/issues/12565

[GitHub] [lucene] uschindler commented on a diff in pull request #12489: Add support for recursive graph bisection.

2023-09-18 Thread via GitHub
uschindler commented on code in PR #12489: URL: https://github.com/apache/lucene/pull/12489#discussion_r1328974164 ## lucene/misc/src/test/org/apache/lucene/misc/index/TestBPIndexReorderer.java: ## @@ -0,0 +1,252 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [lucene] jpountz opened a new pull request, #12568: Fix issues with BP tests and the security manager.

2023-09-18 Thread via GitHub
jpountz opened a new pull request, #12568: URL: https://github.com/apache/lucene/pull/12568 The default ForkJoinPool implementation uses a thread factory that removes all permissions on threads, so we need to create our own to avoid tests failing with FS-based directories.

[GitHub] [lucene] gsmiller commented on pull request #12445: Expression: add a set of duplicate variables

2023-09-18 Thread via GitHub
gsmiller commented on PR #12445: URL: https://github.com/apache/lucene/pull/12445#issuecomment-1724108149 Thanks for the idea @epotyom! I wonder if this is relevant after #12560 was merged? With that change, we now only ever evaluate any given expression argument once (within an expression)

[GitHub] [lucene] javanna opened a new pull request, #12569: Prevent concurrent tasks from parallelizing further

2023-09-18 Thread via GitHub
javanna opened a new pull request, #12569: URL: https://github.com/apache/lucene/pull/12569 Concurrent search is currently applied once per search call, either when search is called, or when concurrent query rewrite happens. They generally don't happen within one another. There are situatio

[GitHub] [lucene] javanna commented on a diff in pull request #12183: Make TermStates#build concurrent

2023-09-18 Thread via GitHub
javanna commented on code in PR #12183: URL: https://github.com/apache/lucene/pull/12183#discussion_r1328932010 ## lucene/core/src/java/org/apache/lucene/index/TermStates.java: ## @@ -86,19 +93,40 @@ public TermStates( * @param needsStats if {@code true} then all leaf contex

[GitHub] [lucene] javanna commented on a diff in pull request #12183: Make TermStates#build concurrent

2023-09-18 Thread via GitHub
javanna commented on code in PR #12183: URL: https://github.com/apache/lucene/pull/12183#discussion_r1328936157 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -64,4 +68,46 @@ final List invokeAll(Collection> tasks) throws IOExcept } return

[GitHub] [lucene] benwtrent opened a new issue, #12570: Reading after Segment Merge fails for HNSW

2023-09-18 Thread via GitHub
benwtrent opened a new issue, #12570: URL: https://github.com/apache/lucene/issues/12570 ### Description While testing with Lucene Util, I ran 50 vectors through current main. To test merging, I decreased the rambuffersize to 128MB, indexed, and then force-merged. When mer

[GitHub] [lucene] benwtrent commented on issue #12570: Reading after Segment Merge fails for HNSW

2023-09-18 Thread via GitHub
benwtrent commented on issue #12570: URL: https://github.com/apache/lucene/issues/12570#issuecomment-1724341985 OK, this bug has been around since 9.6. I tried to replicate it in 9.5 and couldn't. So, I am guessing something around using an existing graph to merge ends up creating mor

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-18 Thread via GitHub
shubhamvishu commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1329276879 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,28 @@ import java.util.Collection; import java.util.List; import java.util

[GitHub] [lucene] benwtrent commented on issue #12570: Reading after Segment Merge fails for HNSW

2023-09-18 Thread via GitHub
benwtrent commented on issue #12570: URL: https://github.com/apache/lucene/issues/12570#issuecomment-1724413959 OK, debugging some more, looks like merging multiple segments and then force merging with Lucene Util 100% replicates this in 9.6. The connection count on force merge increa

[GitHub] [lucene] benwtrent opened a new pull request, #12571: Fix HNSW graph reading with excessive connections

2023-09-18 Thread via GitHub
benwtrent opened a new pull request, #12571: URL: https://github.com/apache/lucene/pull/12571 When re-using the HNSW graph during segment merges, it is possible that more than the configured `M*2` connections could be made per vector. In those instances, we should allow the graph to s

[GitHub] [lucene] benwtrent commented on pull request #12571: Fix HNSW graph reading with excessive connections

2023-09-18 Thread via GitHub
benwtrent commented on PR #12571: URL: https://github.com/apache/lucene/pull/12571#issuecomment-1724430058 I think this is a two part fix. One, to make sure if users hit this that their graph is readable (this PR) and two, get the graph builder to output reasonable connection numbers again.

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-18 Thread via GitHub
shubhamvishu commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1329299492 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -48,10 +58,17 @@ class TaskExecutor { * @return a list containing the results from t

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12183: Make TermStates#build concurrent

2023-09-18 Thread via GitHub
shubhamvishu commented on code in PR #12183: URL: https://github.com/apache/lucene/pull/12183#discussion_r1329299924 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -26,17 +26,21 @@ import java.util.concurrent.Executor; import java.util.concurrent.Fut

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12183: Make TermStates#build concurrent

2023-09-18 Thread via GitHub
shubhamvishu commented on code in PR #12183: URL: https://github.com/apache/lucene/pull/12183#discussion_r1329306960 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -64,4 +68,46 @@ final List invokeAll(Collection> tasks) throws IOExcept } re

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-18 Thread via GitHub
shubhamvishu commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1329311171 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,28 @@ import java.util.Collection; import java.util.List; import java.util

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-18 Thread via GitHub
shubhamvishu commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1329314180 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## Review Comment: Does it make sense to move `TaskExecutor` under `org.apache.lucene.uti

[GitHub] [lucene] Tony-X commented on issue #12513: Try out a tantivy's term dictionary format

2023-09-18 Thread via GitHub
Tony-X commented on issue #12513: URL: https://github.com/apache/lucene/issues/12513#issuecomment-1724547262 I've been designing how to possibly account for the optional states that each term may end up with. Namely how to deal with the following: * if a term has singleton docid * if

[GitHub] [lucene] jmazanec15 commented on issue #12570: Reading after Segment Merge fails for HNSW

2023-09-18 Thread via GitHub
jmazanec15 commented on issue #12570: URL: https://github.com/apache/lucene/issues/12570#issuecomment-1724649688 I'll take a look too @benwtrent. 192 seems to be a lot. From a high level scan of the HNSWGraphBuilder, I do not see anything that obviously causes neighbors to go above M/2M. The

[GitHub] [lucene] benwtrent commented on issue #12570: Reading after Segment Merge fails for HNSW

2023-09-18 Thread via GitHub
benwtrent commented on issue #12570: URL: https://github.com/apache/lucene/issues/12570#issuecomment-1724671063 > What did you mean by merging and then force merging? Do you have the luceneutil params you ran with for replication? I indexed 50 cohere 768 vectors with a buffer of 1

[GitHub] [lucene] benwtrent commented on issue #12570: Reading after Segment Merge fails for HNSW

2023-09-18 Thread via GitHub
benwtrent commented on issue #12570: URL: https://github.com/apache/lucene/issues/12570#issuecomment-1724675517 @jmazanec15 I just noticed something in my testing as I triple-checked everything. `'maxConn': (96,),` was left over from a previous test. So, a max of 192 is perfectly fi

[GitHub] [lucene] epotyom commented on pull request #12445: Expression: add a set of duplicate variables

2023-09-18 Thread via GitHub
epotyom commented on PR #12445: URL: https://github.com/apache/lucene/pull/12445#issuecomment-1724861051 @gsmiller, very nice improvement! Yes, this change is obsolete now as we only need to worry about variables reused across expressions, but not within one expression. I'll cancel this PR

[GitHub] [lucene] epotyom closed pull request #12445: Expression: add a set of duplicate variables

2023-09-18 Thread via GitHub
epotyom closed pull request #12445: Expression: add a set of duplicate variables URL: https://github.com/apache/lucene/pull/12445 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [lucene] jpountz commented on pull request #12568: Fix issues with BP tests and the security manager.

2023-09-18 Thread via GitHub
jpountz commented on PR #12568: URL: https://github.com/apache/lucene/pull/12568#issuecomment-1724930942 > Did you test with -Dtests.directory=MMapDirectory? I did.

[GitHub] [lucene] jpountz merged pull request #12568: Fix issues with BP tests and the security manager.

2023-09-18 Thread via GitHub
jpountz merged PR #12568: URL: https://github.com/apache/lucene/pull/12568

[GitHub] [lucene] javanna commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-19 Thread via GitHub
javanna commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1329665865 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## Review Comment: I am not entirely sure: one aspect is that I'd like to make sure that there

[GitHub] [lucene] javanna commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-19 Thread via GitHub
javanna commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1329672558 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,28 @@ import java.util.Collection; import java.util.List; import java.util.Obje

[GitHub] [lucene] javanna commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-19 Thread via GitHub
javanna commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1329673052 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,28 @@ import java.util.Collection; import java.util.List; import java.util.Obje

[GitHub] [lucene] javanna commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-19 Thread via GitHub
javanna commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1329674640 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -48,10 +58,17 @@ class TaskExecutor { * @return a list containing the results from the ta

[GitHub] [lucene] uschindler commented on pull request #12568: Fix issues with BP tests and the security manager.

2023-09-19 Thread via GitHub
uschindler commented on PR #12568: URL: https://github.com/apache/lucene/pull/12568#issuecomment-1724994726 Thanks @jpountz for merging. I did not notice that you assigned the issue to me so I should merge it. I was about to check out the repo and test it intensively. But as you have run the

[GitHub] [lucene] jpountz commented on pull request #12568: Fix issues with BP tests and the security manager.

2023-09-19 Thread via GitHub
jpountz commented on PR #12568: URL: https://github.com/apache/lucene/pull/12568#issuecomment-1725228619 Oops, I think I confused the "Reviewers" and "Assignees" section, I didn't mean to put it on your plate!

[GitHub] [lucene] uschindler commented on pull request #12568: Fix issues with BP tests and the security manager.

2023-09-19 Thread via GitHub
uschindler commented on PR #12568: URL: https://github.com/apache/lucene/pull/12568#issuecomment-1725333605 MMap Jenkins tests were happy, all tests passed: - https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Linux/ - https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Windows/

[GitHub] [lucene] uschindler commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-19 Thread via GitHub
uschindler commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1330062036 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,28 @@ import java.util.Collection; import java.util.List; import java.util.O

[GitHub] [lucene] msokolov commented on a diff in pull request #12547: Compute multiple float aggregations in one go

2023-09-19 Thread via GitHub
msokolov commented on code in PR #12547: URL: https://github.com/apache/lucene/pull/12547#discussion_r1330093278 ## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FloatTaxonomyFacets.java: ## @@ -37,33 +37,43 @@ abstract class FloatTaxonomyFacets extends TaxonomyFacets {

[GitHub] [lucene] msokolov commented on issue #12553: [DISCUSS] Identifying Gaps in Lucene’s Faceting

2023-09-19 Thread via GitHub
msokolov commented on issue #12553: URL: https://github.com/apache/lucene/issues/12553#issuecomment-1725489583 So many ideas here! It's clear we have some room to grow this API. I wonder if we could organize them into a plan with dependencies and priorities. Also some of the ideas I'm not

[GitHub] [lucene] rmuir commented on issue #12561: UAX29URLEmailTokenizerImpl.jflex matches emails with commas and invalid periods in the local part

2023-09-19 Thread via GitHub
rmuir commented on issue #12561: URL: https://github.com/apache/lucene/issues/12561#issuecomment-1725653759 awesome find, wow, embedded sneaky range in the grammar :) Locally I modified the grammar per your suggestion and ran `gradle regenerate`, tests seemed happy. i want to add a si

[GitHub] [lucene] javanna commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-19 Thread via GitHub
javanna commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1330210656 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,28 @@ import java.util.Collection; import java.util.List; import java.util.Obje

[GitHub] [lucene] javanna commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-19 Thread via GitHub
javanna commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1330212822 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## Review Comment: which public methods require changing? As far as I understand visibility of

[GitHub] [lucene] javanna commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-19 Thread via GitHub
javanna commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1330215231 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## Review Comment: > why not move to work-stealing fork/join here? There were concerns

[GitHub] [lucene] javanna commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-19 Thread via GitHub
javanna commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1330223982 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,28 @@ import java.util.Collection; import java.util.List; import java.util.Obje

[GitHub] [lucene] javanna commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-19 Thread via GitHub
javanna commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1330265734 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,28 @@ import java.util.Collection; import java.util.List; import java.util.Obje

[GitHub] [lucene] uschindler commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-19 Thread via GitHub
uschindler commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1330330240 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,28 @@ import java.util.Collection; import java.util.List; import java.util.O

[GitHub] [lucene] uschindler commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-19 Thread via GitHub
uschindler commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1330347382 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,28 @@ import java.util.Collection; import java.util.List; import java.util.O

[GitHub] [lucene] jpountz commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-19 Thread via GitHub
jpountz commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1330338668 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -64,4 +82,26 @@ final List invokeAll(Collection> tasks) throws IOExcept } return

[GitHub] [lucene] jpountz commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-19 Thread via GitHub
jpountz commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1330366074 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,28 @@ import java.util.Collection; import java.util.List; import java.util.Obje

[GitHub] [lucene] uschindler commented on a diff in pull request #12569: Prevent concurrent tasks from parallelizing further

2023-09-19 Thread via GitHub
uschindler commented on code in PR #12569: URL: https://github.com/apache/lucene/pull/12569#discussion_r1330398850 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -22,18 +22,28 @@ import java.util.Collection; import java.util.List; import java.util.O

[GitHub] [lucene] jmazanec15 commented on issue #12570: Reading after Segment Merge fails for HNSW

2023-09-19 Thread via GitHub
jmazanec15 commented on issue #12570: URL: https://github.com/apache/lucene/issues/12570#issuecomment-1726067867 Oh right, I actually raised a PR some time back around this but forgot about it: https://github.com/apache/lucene/pull/12002. > The question then becomes, is this a valid us
