[PR] IndexWriter: Treat java.lang.Error as tragedy [lucene]

2024-04-05 Thread via GitHub
rmuir opened a new pull request, #13277: URL: https://github.com/apache/lucene/pull/13277 Background: Historically, IndexWriter treated OutOfMemoryError specially, for defensive reasons. That handling was later expanded to VirtualMachineError, to play it safe in similarly disastrous circumstances.

Re: [I] Strange Segment State after encoutering JDK22 bug [lucene]

2024-04-05 Thread via GitHub
rmuir commented on issue #13275: URL: https://github.com/apache/lucene/issues/13275#issuecomment-2040800816 Please, see PR: #13277 I think we should be more defensive here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

Re: [PR] Use IOContext#RANDOM when appropriate. [lucene]

2024-04-05 Thread via GitHub
jpountz merged PR #13267: URL: https://github.com/apache/lucene/pull/13267

Re: [PR] Add new pluggable vector similarity to field info [lucene]

2024-04-05 Thread via GitHub
benwtrent commented on PR #13200: URL: https://github.com/apache/lucene/pull/13200#issuecomment-2040503529 @jimczi I match your wall of text with my own :). > I am a bit concerned about the generalization here. The whole similarity is currently modeled around the

Re: [PR] Propagate the flush IOContext to stored fields / term vectors writers when index sorting is enabled. [lucene]

2024-04-05 Thread via GitHub
jpountz merged PR #13265: URL: https://github.com/apache/lucene/pull/13265

Re: [I] Strange Segment State after encoutering JDK22 bug [lucene]

2024-04-05 Thread via GitHub
rmuir commented on issue #13275: URL: https://github.com/apache/lucene/issues/13275#issuecomment-2040348204 and really, that safety mechanism is probably the easiest way to defend against this one. Problem is that only `VirtualMachineError` is considered a tragedy, and `IncompatibleClassCha

[PR] Add 'passageSortComparator' option in FieldHighlighter [lucene]

2024-04-05 Thread via GitHub
Seunghan-Jung opened a new pull request, #13276: URL: https://github.com/apache/lucene/pull/13276 ### Description `FieldHighlighter` always sorts the final selected passages based on startOffset, but this may not align with the user's intentions. For example, in the case of Sol

Re: [I] Strange Segment State after encoutering JDK22 bug [lucene]

2024-04-05 Thread via GitHub
rmuir commented on issue #13275: URL: https://github.com/apache/lucene/issues/13275#issuecomment-2040296052 This is why IndexWriter closes itself on OOM.

Re: [I] Strange Segment State after encoutering JDK22 bug [lucene]

2024-04-05 Thread via GitHub
DaveCTurner commented on issue #13275: URL: https://github.com/apache/lucene/issues/13275#issuecomment-2040291818 You could get an exception in these places even if there were no JVM bug, for instance an OOME can be thrown basically anywhere. The JVM still carries on running `finally` block

Re: [I] Strange Segment State after encoutering JDK22 bug [lucene]

2024-04-05 Thread via GitHub
benwtrent commented on issue #13275: URL: https://github.com/apache/lucene/issues/13275#issuecomment-2040291826 @rmuir for sure, I am not suggesting one solution or the other. Just that the only way I could reproduce was abusing the code in horrific ways. Which is a testament to the general

Re: [I] Strange Segment State after encoutering JDK22 bug [lucene]

2024-04-05 Thread via GitHub
rmuir commented on issue #13275: URL: https://github.com/apache/lucene/issues/13275#issuecomment-2040281982 Only thing lucene could do is to not do this reference counting with java code in indexwriter. Can't make java code more complicated to deal with issues like this, if the jvm is broke

Re: [PR] Revert version files to not include unreleased version [lucene]

2024-04-05 Thread via GitHub
benwtrent merged PR #13274: URL: https://github.com/apache/lucene/pull/13274

Re: [PR] update commons-compress from 1.19 to 1.21 [lucene]

2024-04-05 Thread via GitHub
rmuir commented on PR #13270: URL: https://github.com/apache/lucene/pull/13270#issuecomment-2040158640 Anyway, if there's no objection, I'd like to bump this to 1.21, which is an easy win and doesn't modify the dependency graph.

Re: [I] Strange Segment State after encoutering JDK22 bug [lucene]

2024-04-05 Thread via GitHub
rmuir commented on issue #13275: URL: https://github.com/apache/lucene/issues/13275#issuecomment-2040140953 If the bug is in G1, just use a different collector to work around it?

Re: [I] Strange Segment State after encoutering JDK22 bug [lucene]

2024-04-05 Thread via GitHub
benwtrent commented on issue #13275: URL: https://github.com/apache/lucene/issues/13275#issuecomment-2040132270 I was able to replicate the weird behavior by randomly failing an `incRef` for a given file. The JDK bug could occur during a simple iteration, meaning, we could increment some bu

Re: [I] Strange Segment State after encoutering JDK22 bug [lucene]

2024-04-05 Thread via GitHub
uschindler commented on issue #13275: URL: https://github.com/apache/lucene/issues/13275#issuecomment-2040094344 Now it makes sense: The broken segments look like a side-effect of the JVM bug because it brings IndexWriter into some broken state so it deletes the files without the commit ful

Re: [I] Strange Segment State after encoutering JDK22 bug [lucene]

2024-04-05 Thread via GitHub
DaveCTurner commented on issue #13275: URL: https://github.com/apache/lucene/issues/13275#issuecomment-2039959127 In particular I'm a little suspicious about the way we call `org.apache.lucene.index.IndexFileDeleter#incRef(org.apache.lucene.index.SegmentInfos, boolean)` _after_ doing the re

Re: [I] Strange Segment State after encoutering JDK22 bug [lucene]

2024-04-05 Thread via GitHub
DaveCTurner commented on issue #13275: URL: https://github.com/apache/lucene/issues/13275#issuecomment-2039906668 All we have right now is a correlation: we've seen quite a few nodes fall over with that JVM bug, and some subset of them have come back up and discovered a missing `.si` (and `

Re: [I] Strange Segment State after encoutering JDK22 bug [lucene]

2024-04-05 Thread via GitHub
uschindler commented on issue #13275: URL: https://github.com/apache/lucene/issues/13275#issuecomment-2039883343 Are we sure this is all the same bug in JDK 22? The above bugs are about FillerArray and the other one about strange segment states. How does this fit together?

Re: [I] Demo indexing custom ordinal data in the taxonomy [lucene]

2024-04-05 Thread via GitHub
stefanvodita commented on issue #13166: URL: https://github.com/apache/lucene/issues/13166#issuecomment-2039818948 The demo would facet over docs in the main index and aggregate over values stored in the taxonomy, e.g. using `IntTaxonomyFacets`. The missing piece is reading values from the

[I] Strange Segment State after encoutering JDK22 bug [lucene]

2024-04-05 Thread via GitHub
benwtrent opened a new issue, #13275: URL: https://github.com/apache/lucene/issues/13275 ### Description There is a nasty bug in JDK 22: Elasticsearch issue: https://github.com/elastic/elasticsearch/issues/106987 JDK issue: https://bugs.openjdk.org/browse/JDK-8329528 W

[PR] Revert version files to not include unreleased version [lucene]

2024-04-05 Thread via GitHub
benwtrent opened a new pull request, #13274: URL: https://github.com/apache/lucene/pull/13274 I was a little too eager. Removing unreleased version so that smoke tests are happy again.

Re: [PR] Fix TestTaxonomyFacetValueSource.testRandom [lucene]

2024-04-05 Thread via GitHub
stefanvodita commented on PR #13198: URL: https://github.com/apache/lucene/pull/13198#issuecomment-2039599158 @iamsanjay - thank you for working on this! I merged #12966, which should mean the original test failure is fixed. Do you want to verify that all is working as expected now?

Re: [PR] Reduce duplication in taxonomy facets; always do counts [lucene]

2024-04-05 Thread via GitHub
stefanvodita merged PR #12966: URL: https://github.com/apache/lucene/pull/12966

Re: [I] Is it correct for facets to assume positive aggregation values? [lucene]

2024-04-05 Thread via GitHub
stefanvodita closed issue #12585: Is it correct for facets to assume positive aggregation values? URL: https://github.com/apache/lucene/issues/12585

Re: [PR] Reduce duplication in taxonomy facets; always do counts [lucene]

2024-04-05 Thread via GitHub
stefanvodita commented on PR #12966: URL: https://github.com/apache/lucene/pull/12966#issuecomment-2039548228 I did another benchmark run after the rebase just to make sure I haven't broken anything when integrating the split taxo arrays change. I see no significant changes. `python3

Re: [PR] update commons-compress from 1.19 to 1.21 [lucene]

2024-04-05 Thread via GitHub
rmuir commented on PR #13270: URL: https://github.com/apache/lucene/pull/13270#issuecomment-2039464814 @dweiss I think the problem is more keeping up, e.g. I'm unable to get things working with their latest versions. We've got dependencies on jars like nekohtml unmaintained for over a de

Re: [PR] update commons-compress from 1.19 to 1.21 [lucene]

2024-04-05 Thread via GitHub
dweiss commented on PR #13270: URL: https://github.com/apache/lucene/pull/13270#issuecomment-2039122529 If it's used by the benchmark module then I don't think it's that problematic? Similar to test framework dependencies? Your suggestion to use a pipe works too but will hurt people living

Re: [I] TestIndexFileDeleter.testExcInDecRef test failure [LUCENE-9839] [lucene]

2024-04-05 Thread via GitHub
dweiss commented on issue #10878: URL: https://github.com/apache/lucene/issues/10878#issuecomment-2039114558 It still does pop up from time to time. No idea what's happening there.