Re: [I] TestTrie OOMs [lucene]

2025-04-14 Thread via GitHub
gf2121 closed issue #14487: TestTrie OOMs URL: https://github.com/apache/lucene/issues/14487 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsu

Re: [I] update PRs for java dependencies [lucene]

2025-04-14 Thread via GitHub
rmuir commented on issue #14490: URL: https://github.com/apache/lucene/issues/14490#issuecomment-2803967384 I'm not actively working this yet, it is just for discussion. But if we can coerce the thing to work well with our project, I think it is beneficial. Better to have a failing PR, well

Re: [PR] ci: pin github actions versions in use [lucene]

2025-04-14 Thread via GitHub
dweiss commented on PR #14492: URL: https://github.com/apache/lucene/pull/14492#issuecomment-2803973302 Fair enough. Whenever I've used dependabot I gave up, eventually - I have to deal with javascript and it's a nightmare there.

Re: [I] update PRs for java dependencies [lucene]

2025-04-14 Thread via GitHub
dweiss commented on issue #14490: URL: https://github.com/apache/lucene/issues/14490#issuecomment-2803968903 I think the "gradle convention" is to keep libs.versions.toml under gradle/ and I think it's what dependabot understands - and I personally hate this, without any specific objective r

Re: [PR] ci: pin github actions versions in use [lucene]

2025-04-14 Thread via GitHub
dweiss commented on PR #14492: URL: https://github.com/apache/lucene/pull/14492#issuecomment-2803938283 It would be lovely if we could store these versions in one place somewhere - updating would be easier. In general, I've never had a problem with the official gh actions and keep them at m

Re: [I] TestTrie OOMs [lucene]

2025-04-14 Thread via GitHub
dweiss commented on issue #14487: URL: https://github.com/apache/lucene/issues/14487#issuecomment-2803941886 Thank you.

Re: [PR] ci: pin github actions versions in use [lucene]

2025-04-14 Thread via GitHub
rmuir commented on PR #14492: URL: https://github.com/apache/lucene/pull/14492#issuecomment-2803947960 @dweiss I agree this is overkill for the "official" actions/ stuff. It is solid as I mentioned on the issue. Instead this is just about giving ourselves certain guarantees, regardle

[PR] ci: pin github actions versions in use [lucene]

2025-04-14 Thread via GitHub
rmuir opened a new pull request, #14492: URL: https://github.com/apache/lucene/pull/14492 Currently the github actions are pinned to major versions only, which means the code changes out from under us, without notice. Instead pin the versions exactly, and have dependabot send us PR up

[I] update PRs for java dependencies [lucene]

2025-04-14 Thread via GitHub
rmuir opened a new issue, #14490: URL: https://github.com/apache/lucene/issues/14490 ### Description I added dependabot.yml in https://github.com/apache/lucene/pull/14462 Currently it sends us pull requests for: - github actions - pip dependencies in `dev-tools/` Bu

[I] better pin github actions versions [lucene]

2025-04-14 Thread via GitHub
rmuir opened a new issue, #14491: URL: https://github.com/apache/lucene/issues/14491 ### Description I added dependabot.yml in https://github.com/apache/lucene/pull/14462 Currently it sends us pull requests for: - github actions - pip dependencies in dev-tools/ But

Re: [PR] Fix OOM of TestTrie [lucene]

2025-04-14 Thread via GitHub
gf2121 merged PR #14488: URL: https://github.com/apache/lucene/pull/14488

Re: [PR] Fix OOM of TestTrie [lucene]

2025-04-14 Thread via GitHub
gf2121 commented on PR #14488: URL: https://github.com/apache/lucene/pull/14488#issuecomment-2803838248 Thank you @rmuir !

Re: [I] TestTrie OOMs [lucene]

2025-04-14 Thread via GitHub
gf2121 commented on issue #14487: URL: https://github.com/apache/lucene/issues/14487#issuecomment-2803714454 Thank you @dweiss , the dump is great! Sorry for introducing this monster test. I did not realize we were using 512M heap for nightly tests. I don't think the `TrieBuilder` ca

Re: [I] TestIndexWriterMergePolicy opens too many file handles at night [lucene]

2025-04-14 Thread via GitHub
rmuir commented on issue #14483: URL: https://github.com/apache/lucene/issues/14483#issuecomment-2803747512 The problematic helper code (used by both test methods implicated here) is `stressUpdateSameDocumentWithMergeOnX`. This one looks tricky as it indexes a non-obvious amount of do

Re: [I] Add a timeout for forceMergeDeletes in IndexWriter [lucene]

2025-04-14 Thread via GitHub
houserjohn commented on issue #14431: URL: https://github.com/apache/lucene/issues/14431#issuecomment-2803254782 After looking into the suggestions you mentioned, I still believe there is a valid need for a timeout for `forceMergeDeletes`. In the first suggestion, you recommended using two

[PR] Fix OOM of TestTrie [lucene]

2025-04-14 Thread via GitHub
gf2121 opened a new pull request, #14488: URL: https://github.com/apache/lucene/pull/14488 Closes https://github.com/apache/lucene/issues/14487

Re: [I] Test TestIndexWriterWithThreads#testIOExceptionDuringWriteSegmentWithThreadsOnlyOnce Failed [lucene]

2025-04-14 Thread via GitHub
aoli-al commented on issue #13552: URL: https://github.com/apache/lucene/issues/13552#issuecomment-2803201150 Fixed #14424

[I] TestIndexWriterMergePolicy opens too many file handles at night [lucene]

2025-04-14 Thread via GitHub
rmuir opened a new issue, #14483: URL: https://github.com/apache/lucene/issues/14483 ### Description Haven't dug in yet: this test probably just needs tweaking to use fewer files; there are some helper methods we can use to do it. ``` Build: https://jenkins.thetaphi.de/job/Lu

Re: [PR] cache preset dict for LZ4WithPresetDictDecompressor [lucene]

2025-04-14 Thread via GitHub
github-actions[bot] commented on PR #14397: URL: https://github.com/apache/lucene/pull/14397#issuecomment-2803410041 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-04-14 Thread via GitHub
dweiss commented on code in PR #14333: URL: https://github.com/apache/lucene/pull/14333#discussion_r2042933311 ## lucene/core/src/java/org/apache/lucene/codecs/Codec.java: ## @@ -56,7 +56,7 @@ static NamedSPILoader getLoader() { } @SuppressWarnings("NonFinalStaticFie

Re: [I] Test TestIndexWriterWithThreads#testIOExceptionDuringWriteSegmentWithThreadsOnlyOnce Failed [lucene]

2025-04-14 Thread via GitHub
aoli-al closed issue #13552: Test TestIndexWriterWithThreads#testIOExceptionDuringWriteSegmentWithThreadsOnlyOnce Failed URL: https://github.com/apache/lucene/issues/13552

Re: [I] TestTrie OOMs [lucene]

2025-04-14 Thread via GitHub
jainankitk commented on issue #14487: URL: https://github.com/apache/lucene/issues/14487#issuecomment-2802977682 Having these dumps on OOM is so useful. Thank you @dweiss for https://github.com/apache/lucene/issues/14481!

Re: [I] Dump hprof on OOM from tests [lucene]

2025-04-14 Thread via GitHub
uschindler commented on issue #14481: URL: https://github.com/apache/lucene/issues/14481#issuecomment-2802644911 I replaced config like this: ``` jenkins@serv1:~/jobs$ fgrep '' */config.xml Forbidden-APIs/config.xml: dist/** Lucene-10.x-Linux/config.xml: **/build*/*

Re: [I] TestTrie OOMs [lucene]

2025-04-14 Thread via GitHub
dweiss commented on issue #14487: URL: https://github.com/apache/lucene/issues/14487#issuecomment-2802994408 I don't know how to fix this one - not familiar with this area at all. Seems like limiting the test to smaller instances isn't the right way to do it though.

[PR] Reference managed MultiReader [lucene]

2025-04-14 Thread via GitHub
vigyasharma opened a new pull request, #14486: URL: https://github.com/apache/lucene/pull/14486 Going by #13976 and this [dated stack overflow thread](https://stackoverflow.com/questions/49817453/searchermanager-and-multireader-in-lucene), it seems like there is desire for a reference manag

Re: [I] TestTrie OOMs [lucene]

2025-04-14 Thread via GitHub
dweiss commented on issue #14487: URL: https://github.com/apache/lucene/issues/14487#issuecomment-2802983241 So this seed hits the trie builder with 40k strings and requires _a lot_ of memory to process them. There is a comment in trie builder: ``` TODO make this trie builder a more me

Re: [PR] Multireader Support in Searcher Manager [lucene]

2025-04-14 Thread via GitHub
vigyasharma commented on PR #13976: URL: https://github.com/apache/lucene/pull/13976#issuecomment-2802966023 I hacked together a prototype impl. for a reference managed `MultiReader`. I can add some tests and clean it up if this meets our requirements. – https://github.com/apache/lucene/pu

[I] TestTrie OOMs [lucene]

2025-04-14 Thread via GitHub
dweiss opened a new issue, #14487: URL: https://github.com/apache/lucene/issues/14487 ### Description TestTrie OOMs on nightly. Repro: ``` ./gradlew :lucene:core:test --tests "org.apache.lucene.codecs.lucene103.blocktree.TestTrie" -Ptests.jvms=6 "-Ptests.jvmargs=-XX:-UseCompres

Re: [I] Dynamic threshold for DocIdSetBuilder [lucene]

2025-04-14 Thread via GitHub
jainankitk commented on issue #14485: URL: https://github.com/apache/lucene/issues/14485#issuecomment-2802944756 Thinking a bit more, this selectivity approach won't work, as the docIds arrive out of order and are only sorted during `DocIdSetIterator#build`. Although, with the introduction of `Leaf

Re: [I] Dump hprof on OOM from tests [lucene]

2025-04-14 Thread via GitHub
uschindler commented on issue #14481: URL: https://github.com/apache/lucene/issues/14481#issuecomment-2802646795 Tell me if it does or doesn't work! Not sure how to provoke an OOM.

[I] TestIndexWriterWithThreads can sometimes use too many open files [lucene]

2025-04-14 Thread via GitHub
rmuir opened a new issue, #14484: URL: https://github.com/apache/lucene/issues/14484 ### Description This is distinct from #14483, didn't happen in a nightly build but instead an s390 one. NOTE: reproduce with: gradlew test --tests TestIndexWriterWithThreads.testCloseWithThrea

Re: [I] TestIndexWriterMergePolicy opens too many file handles at night [lucene]

2025-04-14 Thread via GitHub
rmuir commented on issue #14483: URL: https://github.com/apache/lucene/issues/14483#issuecomment-2802644523 It also happens with a sister test method `testStressUpdateSameDocumentWithMergeOnGetReader`, likely sharing same code and needing similar adjustments: ``` 2 tests failed.

Re: [PR] Add support for determining off-heap memory requirements for KnnVectorsReader [lucene]

2025-04-14 Thread via GitHub
benwtrent commented on code in PR #14426: URL: https://github.com/apache/lucene/pull/14426#discussion_r2042590532 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java: ## @@ -130,4 +134,48 @@ public KnnVectorsReader getMergeInstance() { * The default implem

Re: [I] Dump hprof on OOM from tests [lucene]

2025-04-14 Thread via GitHub
uschindler commented on issue #14481: URL: https://github.com/apache/lucene/issues/14481#issuecomment-2802599793 I raised the number of builds to keep to 100 already. I also ask it to keep 5 days minimum.

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-04-14 Thread via GitHub
msokolov commented on PR #14178: URL: https://github.com/apache/lucene/pull/14178#issuecomment-2801573375 I think this whole `conda` Python ecosystem is kind of alien to folks used to working in Java world primarily, so it is probably a stumbling block for this PR. I'm struggling a bit with

Re: [I] Dump hprof on OOM from tests [lucene]

2025-04-14 Thread via GitHub
uschindler commented on issue #14481: URL: https://github.com/apache/lucene/issues/14481#issuecomment-2802505790 What pattern would you suggest? I am not sure if the `*.events` is still relevant. That was the ANT output to connect test runner with ant.

Re: [I] Dump hprof on OOM from tests [lucene]

2025-04-14 Thread via GitHub
uschindler commented on issue #14481: URL: https://github.com/apache/lucene/issues/14481#issuecomment-2802466381 I will add the following pattern for archiving. Then it would archive the hprof files as artifacts: `**/tests-cwd/*.hprof`
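For readers who want to see what a pattern like `**/tests-cwd/*.hprof` selects, here is a minimal Java sketch using the standard `java.nio.file.PathMatcher` glob support. It is only an approximation: the Jenkins archiver uses Ant-style patterns, whose `**/` can also match zero directories, whereas the `java.nio` glob needs at least one leading path segment. The class name `HprofGlob` and the sample paths are hypothetical and not taken from the Lucene build.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;

public class HprofGlob {
    // Returns the candidate paths that match the given java.nio glob pattern.
    public static List<Path> matching(String glob, List<Path> candidates) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return candidates.stream().filter(matcher::matches).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical workspace-relative paths, for illustration only.
        List<Path> candidates = List.of(
            Path.of("lucene/core/build/tmp/tests-cwd/java_pid1234.hprof"),
            Path.of("lucene/core/build/tmp/tests-cwd/output.txt"),
            Path.of("lucene/core/build/java_pid1234.hprof"));
        // Keeps only .hprof files that sit directly inside a tests-cwd directory.
        matching("**/tests-cwd/*.hprof", candidates).forEach(System.out::println);
    }
}
```

Compared with the broader `**/*.hprof` mentioned elsewhere in the thread, the narrower pattern ignores heap dumps written outside a `tests-cwd` directory.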

Re: [PR] Add support for determining off-heap memory requirements for KnnVectorsReader [lucene]

2025-04-14 Thread via GitHub
navneet1v commented on PR #14426: URL: https://github.com/apache/lucene/pull/14426#issuecomment-2802385473 @ChrisHegarty this PR provides the info on how much off heap space is needed but this doesn't provide info on how much is loaded into memory correct? and do we have any plans to expose

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-04-14 Thread via GitHub
gf2121 merged PR #14333: URL: https://github.com/apache/lucene/pull/14333

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-04-14 Thread via GitHub
gf2121 commented on PR #14333: URL: https://github.com/apache/lucene/pull/14333#issuecomment-2802090562 Thank you @mikemccand and @jpountz for the patient review and all these great suggestions! I raised https://github.com/mikemccand/luceneutil/pull/369 to switch codec for luceneutil

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-04-14 Thread via GitHub
mikemccand commented on code in PR #14333: URL: https://github.com/apache/lucene/pull/14333#discussion_r2042247332 ## lucene/core/src/java/org/apache/lucene/codecs/Codec.java: ## @@ -56,7 +56,7 @@ static NamedSPILoader getLoader() { } @SuppressWarnings("NonFinalStati

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-04-14 Thread via GitHub
msokolov commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2801486476 Sorry I've been out to lunch on this - yes, I agree we should move forward here. I think we might have ended up in some analysis paralysis. We have a good candidate let's push it to mai

Re: [I] Dump hprof on OOM from tests [lucene]

2025-04-14 Thread via GitHub
uschindler commented on issue #14481: URL: https://github.com/apache/lucene/issues/14481#issuecomment-2801441402 where are the dumps currently saved? search anywhere or use `**/*.hprof`? In Ant, we used `-XX:HeapDumpPath=path` and used the heapdumps path above. Should we keep it in bu

Re: [I] Dump hprof on OOM from tests [lucene]

2025-04-14 Thread via GitHub
uschindler commented on issue #14481: URL: https://github.com/apache/lucene/issues/14481#issuecomment-2801351717 Will check that!

[PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-14 Thread via GitHub
thecoop opened a new pull request, #14482: URL: https://github.com/apache/lucene/pull/14482 Refactor `IOContext` and create `FileOpenHint` to specify how files are likely to be accessed once opened. Relates #14422

Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-14 Thread via GitHub
thecoop commented on code in PR #14482: URL: https://github.com/apache/lucene/pull/14482#discussion_r2041871342 ## lucene/core/src/java/org/apache/lucene/store/IOContext.java: ## @@ -44,71 +37,125 @@ public enum Context { DEFAULT }; + /** Implemented by classes that c

Re: [I] Dump hprof on OOM from tests [lucene]

2025-04-14 Thread via GitHub
dweiss commented on issue #14481: URL: https://github.com/apache/lucene/issues/14481#issuecomment-2801240542 @uschindler , you'd need to modify artifact capturing globs to include **/*.hprof - currently they're using ant's old paths: ``` No artifacts found that match the file pattern

Re: [I] Dump hprof on OOM from tests [lucene]

2025-04-14 Thread via GitHub
dweiss commented on issue #14481: URL: https://github.com/apache/lucene/issues/14481#issuecomment-2801235997 Applied to main and branch_10x.

Re: [I] [Bug] Stored fields force merge regression between Lucene 9.12 and Lucene 10.0 [lucene]

2025-04-14 Thread via GitHub
bharath-techie commented on issue #14463: URL: https://github.com/apache/lucene/issues/14463#issuecomment-2800468376 Looks like `compound` is true and `FDT` being within the compound file matters here and it ended up being the root cause. `fieldsStream = d.openInput(fieldsStreamFN

Re: [I] TestFloatVectorSimilarityQuery.testTimeout fails intermittently [lucene]

2025-04-14 Thread via GitHub
dweiss commented on issue #14480: URL: https://github.com/apache/lucene/issues/14480#issuecomment-2800738236 Thank you for trying to figure this out, @jainankitk !

[I] TestFloatVectorSimilarityQuery.testTimeout fails intermittently [lucene]

2025-04-14 Thread via GitHub
jainankitk opened a new issue, #14480: URL: https://github.com/apache/lucene/issues/14480 ### Description TestFloatVectorSimilarityQuery.testTimeout fails non-deterministically (10 out of 50) on main in Linux: ``` 2> NOTE: reproduce with: gradlew test --