[GitHub] [lucene] jpountz commented on pull request #12051: Fix wrong assertion in TestBooleanQuery.testQueryMatchesCount
jpountz commented on PR #12051: URL: https://github.com/apache/lucene/pull/12051#issuecomment-1368388218 Thanks for catching this. Would it also work if we fixed indexing to sometimes index other values, e.g. replacing `if (random().nextBoolean()) {` with `if (i != 3 && random().nextBoolean()) {` and force-merged before opening a reader? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz opened a new pull request, #12053: Allow reusing indexed binary fields.
jpountz opened a new pull request, #12053: URL: https://github.com/apache/lucene/pull/12053 Today Lucene allows creating indexed binary fields, e.g. via `StringField(String, BytesRef, Field.Store)`, but not reusing them: calling `setBytesValue` on a `StringField` throws. This commit removes the check that prevents reusing fields with binary values. I considered an alternative that consisted of failing when `setBytesValue` is called on a field that is indexed and tokenized, but we currently don't have such checks e.g. on numeric values, so it did not feel consistent. This change would help improve the [nightly benchmarks for the NYC taxis dataset](http://people.apache.org/~mikemccand/lucenebench/sparseResults.html) by doing the String -> UTF-8 conversion only once for keywords, instead of once for the `StringField` and once for the `SortedDocValuesField`, while still reusing fields.
[GitHub] [lucene] jpountz opened a new pull request, #12054: Introduce a new `KeywordField`.
jpountz opened a new pull request, #12054: URL: https://github.com/apache/lucene/pull/12054 `KeywordField` is a combination of `StringField` and `SortedSetDocValuesField`, similarly to how `LongField` is a combination of `LongPoint` and `SortedNumericDocValuesField`. This makes it easier for users to create fields that can be used for filtering, sorting and faceting.
[GitHub] [lucene] jpountz opened a new pull request, #12055: Better skipping for multi-term queries with a FILTER rewrite.
jpountz opened a new pull request, #12055: URL: https://github.com/apache/lucene/pull/12055 Currently multi-term queries with a filter rewrite internally rewrite to a disjunction if 16 terms or fewer match the query. Otherwise postings lists of matching terms are collected into a `DocIdSetBuilder`. This change replaces the latter with a mixed approach where a disjunction is created between the 16 terms that have the highest document frequency and an iterator produced from the `DocIdSetBuilder` that collects all other terms. On fields that have a zipfian distribution, it's quite likely that no high-frequency terms make it to the `DocIdSetBuilder`. This provides two main benefits:
- Queries are less likely to allocate a FixedBitSet of size `maxDoc`.
- Queries are better at skipping or early terminating.
On the other hand, queries that need to consume most or all matching documents may get a slowdown. The slowdown is unfortunate, but my gut feeling is that this change still has more pros than cons.
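The mixed approach described above can be pictured with a small, JDK-only sketch. It is not the PR's code: `TermPostings`, `MixedRewriteSketch` and `split` are hypothetical names, postings are modelled as plain `int[]` arrays, and `java.util.BitSet` stands in for whatever the `DocIdSetBuilder` produces. The only detail taken from the description is the idea of keeping the most frequent terms as individual clauses and folding everything else into one set.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for one term's postings list (doc IDs). */
final class TermPostings {
  final String term;
  final int[] docs;

  TermPostings(String term, int... docs) {
    this.term = term;
    this.docs = docs;
  }

  int docFreq() {
    return docs.length;
  }
}

final class MixedRewriteSketch {
  /** Frequent terms kept as separate clauses, all other terms merged into one bit set. */
  static final class Split {
    final List<TermPostings> highFrequency = new ArrayList<>();
    final BitSet otherDocs = new BitSet();
  }

  static Split split(List<TermPostings> terms, int maxClauses) {
    // Min-heap on docFreq: the head is the least frequent of the terms kept so far.
    PriorityQueue<TermPostings> top =
        new PriorityQueue<>(Comparator.comparingInt(TermPostings::docFreq));
    Split result = new Split();
    for (TermPostings t : terms) {
      top.add(t);
      if (top.size() > maxClauses) {
        // Evict the least frequent term and fold its postings into the bit set.
        for (int doc : top.poll().docs) {
          result.otherDocs.set(doc);
        }
      }
    }
    result.highFrequency.addAll(top);
    return result;
  }
}
```

Calling `MixedRewriteSketch.split(terms, 16)` would mirror the limit of 16 mentioned in the description; with a zipfian distribution most of the matching documents would then sit in the clauses rather than in the bit set.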
[GitHub] [lucene] jpountz commented on pull request #12055: Better skipping for multi-term queries with a FILTER rewrite.
jpountz commented on PR #12055: URL: https://github.com/apache/lucene/pull/12055#issuecomment-1368422269 Here is what luceneutil gives on wikimedium10m:
```
Task                          QPS baseline    StdDev  QPS my_modified_version    StdDev                  Pct diff  p-value
BrowseDateTaxoFacets                 43.81    (3.9%)                    42.63   (12.5%)     -2.7% ( -18% -   14%)    0.359
OrHighNotLow                        397.64    (9.3%)                   387.31    (7.0%)     -2.6% ( -17% -   15%)    0.320
BrowseDayOfYearTaxoFacets            44.21    (4.4%)                    43.10   (12.8%)     -2.5% ( -18% -   15%)    0.406
OrHighNotMed                        439.76    (8.8%)                   431.26    (6.6%)     -1.9% ( -15% -   14%)    0.432
OrHighNotHigh                       349.59    (8.0%)                   342.97    (5.8%)     -1.9% ( -14% -   12%)    0.391
BrowseMonthTaxoFacets                29.26    (8.8%)                    28.75   (12.3%)     -1.7% ( -21% -   21%)    0.609
OrNotHighHigh                       359.69    (6.8%)                   353.47    (5.4%)     -1.7% ( -13% -   11%)    0.374
MedTerm                             741.89    (6.7%)                   729.74    (6.8%)     -1.6% ( -14% -   12%)    0.442
AndHighHigh                         104.97    (5.6%)                   103.30    (5.7%)     -1.6% ( -12% -   10%)    0.373
HighTerm                            509.66    (7.1%)                   501.79    (7.3%)     -1.5% ( -14% -   13%)    0.498
OrHighHigh                           46.45    (4.3%)                    45.79    (3.3%)     -1.4% (  -8% -    6%)    0.240
LowTerm                             972.50    (7.7%)                   959.89    (6.7%)     -1.3% ( -14% -   14%)    0.570
HighTermTitleSort                   174.75    (6.6%)                   172.71    (5.5%)     -1.2% ( -12% -   11%)    0.544
AndHighLow                         1288.70    (2.7%)                  1274.94    (3.1%)     -1.1% (  -6% -    4%)    0.247
OrNotHighMed                        456.13    (4.5%)                   452.07    (3.9%)     -0.9% (  -8% -    7%)    0.504
HighTermMonthSort                  3799.69    (6.2%)                  3765.99    (4.8%)     -0.9% ( -11% -   10%)    0.613
BrowseMonthSSDVFacets                21.87    (9.8%)                    21.67   (10.7%)     -0.9% ( -19% -   21%)    0.786
HighPhrase                           93.80    (7.4%)                    92.97    (6.1%)     -0.9% ( -13% -   13%)    0.680
LowSloppyPhrase                      59.38    (3.5%)                    58.90    (4.2%)     -0.8% (  -8% -    7%)    0.513
Fuzzy2                               49.64    (1.6%)                    49.26    (2.7%)     -0.8% (  -4% -    3%)    0.268
Fuzzy1                              108.72    (1.5%)                   107.94    (1.6%)     -0.7% (  -3% -    2%)    0.148
LowSpanNear                         157.46    (4.0%)                   156.35    (4.0%)     -0.7% (  -8% -    7%)    0.577
BrowseRandomLabelSSDVFacets          14.99    (5.9%)                    14.88    (5.9%)     -0.7% ( -11% -   11%)    0.712
AndHighHighDayTaxoFacets              6.15    (6.0%)                     6.11    (5.6%)     -0.6% ( -11% -   11%)    0.743
AndHighMed                          206.86    (5.1%)                   205.71    (5.7%)     -0.6% ( -10% -   10%)    0.745
OrHighMed                           178.33    (3.7%)                   177.55    (3.7%)     -0.4% (  -7% -    7%)    0.709
MedSpanNear                          55.68    (3.1%)                    55.48    (3.3%)     -0.4% (  -6% -    6%)    0.713
HighSpanNear                         14.27    (3.5%)                    14.23    (3.1%)     -0.3% (  -6% -    6%)    0.780
Respell                             106.00    (1.8%)                   105.77    (1.6%)     -0.2% (  -3% -    3%)    0.695
HighTermTitleBDVSort                 18.32    (3.7%)                    18.30    (5.7%)     -0.1% (  -9% -    9%)    0.927
PKLookup                            235.19    (3.3%)                   234.99    (3.8%)     -0.1% (  -6% -    7%)    0.939
MedSloppyPhrase                      19.51    (3.8%)                    19.49    (4.0%)     -0.1% (  -7% -    8%)    0.957
MedIntervalsOrdered                  72.80    (5.6%)                    72.76    (5.1%)     -0.1% ( -10% -   11%)    0.971
OrHighLow                           711.52    (2.3%)                   712.77    (2.6%)      0.2% (  -4% -    5%)    0.823
LowPhrase                            37.03    (5.3%)                    37.10    (4.7%)      0.2% (  -9% -   10%)    0.908
MedPhrase                           147.57    (5.0%)                   147.86    (4.2%)      0.2% (  -8% -    9%)    0.893
BrowseRandomLabelTaxoFacets          35.14   (10.5%)                    35.21   (12.7%)      0.2% ( -20% -   26%)    0.955
HighTermDayOfYearSort               394.78    (5.3%)                   395.67    (2.7%)      0.2% (  -7% -    8%)    0.865
TermDTSort                          129.27    (3.6%)                   129.82    (3.7%)      0.4% (  -6% -    7%)    0.711
AndHighMedDayTaxoFacets             157.89    (2.1%)                   158.59    (1.8%)      0.4% (  -3% -    4%)    0.475
OrNotHighLow                       1014.83    (4.5%)                  1020.03    (4.1%)      0.5% (  -7% -
```
[GitHub] [lucene] jpountz commented on pull request #12055: Better skipping for multi-term queries with a FILTER rewrite.
jpountz commented on PR #12055: URL: https://github.com/apache/lucene/pull/12055#issuecomment-1368423502 For the record, the reason why we're seeing a speedup here is that prefix and wildcard queries produce constant scores, so the query can early terminate once 1,000 hits have been collected. Before the change, we would always create a bitset of all matches, and that would force evaluating the query against the entire doc ID space up-front. Evaluation is lazier now, with only low-frequency postings being evaluated up-front and high-frequency postings being evaluated lazily.
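To make the eager-versus-lazy point concrete, here is a hedged, JDK-only sketch that counts how many postings entries each strategy touches before a fixed number of hits has been collected. It simplifies the comment's description: every list is merged lazily here, whereas the PR only keeps the high-frequency postings lazy. `LazyDisjunctionSketch`, `Cursor`, `eagerReads` and `lazyReads` are illustrative names, not Lucene APIs.

```java
import java.util.BitSet;
import java.util.Comparator;
import java.util.PriorityQueue;

final class LazyDisjunctionSketch {
  /** Cursor over one sorted postings list. */
  static final class Cursor {
    final int[] docs;
    int pos = 0;

    Cursor(int[] docs) {
      this.docs = docs;
    }

    int current() {
      return docs[pos];
    }

    boolean advance() {
      return ++pos < docs.length;
    }
  }

  /** Eager: every entry of every list is read into a bit set before any hit is returned. */
  static int eagerReads(int[][] postings) {
    BitSet all = new BitSet();
    int reads = 0;
    for (int[] list : postings) {
      for (int doc : list) {
        all.set(doc);
        reads++;
      }
    }
    return reads;
  }

  /** Lazy: merge list heads through a heap and stop after hitsWanted distinct docs. */
  static int lazyReads(int[][] postings, int hitsWanted) {
    PriorityQueue<Cursor> heap = new PriorityQueue<>(Comparator.comparingInt(Cursor::current));
    int reads = 0;
    for (int[] list : postings) {
      if (list.length > 0) {
        heap.add(new Cursor(list));
        reads++; // reading the head of each list
      }
    }
    int hits = 0;
    int lastDoc = -1;
    while (hits < hitsWanted && !heap.isEmpty()) {
      Cursor top = heap.poll();
      int doc = top.current();
      if (doc != lastDoc) {
        hits++;
        lastDoc = doc;
      }
      // Re-insert only after advancing, so the heap never holds a mutated element.
      if (top.advance()) {
        heap.add(top);
        reads++;
      }
    }
    return reads;
  }
}
```

With two lists of 10 and 5 entries and three hits wanted, the eager path reads all 15 entries while the lazy path stops after a handful; the gap grows with the size of the lists.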
[GitHub] [lucene] msokolov merged pull request #12047: fix typo analysis-kuromoji
msokolov merged PR #12047: URL: https://github.com/apache/lucene/pull/12047
[GitHub] [lucene] twosom commented on pull request #12047: fix typo analysis-kuromoji
twosom commented on PR #12047: URL: https://github.com/apache/lucene/pull/12047#issuecomment-1368474848 @msokolov Thanks~! and Happy New Year!👻
[GitHub] [lucene] msokolov commented on a diff in pull request #12029: introduce support in KnnVectorQuery for getters/setters
msokolov commented on code in PR #12029: URL: https://github.com/apache/lucene/pull/12029#discussion_r1059769467
## lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java: ##
@@ -33,6 +33,7 @@
 import org.apache.lucene.store.Directory;
 import org.apache.lucene.util.TestVectorUtil;
 import org.apache.lucene.util.VectorUtil;
+import org.junit.Assert;
Review Comment: OK, tiny nit here - but LuceneTestCase inherits from Assert so we don't need to import and can just use the assertions directly without qualification.
[GitHub] [lucene] msokolov commented on pull request #12048: Move HNSW parameters to the HnswGraphBuilder class
msokolov commented on PR #12048: URL: https://github.com/apache/lucene/pull/12048#issuecomment-1368477965 Sorry, I don't see this being any better than the current situation; aside from tests, the parameters are only used in HnswVectorsFormat where they are currently defined, so I think we should leave them there.
[GitHub] [lucene] msokolov commented on issue #11354: Reuse HNSW graphs when merging segments? [LUCENE-10318]
msokolov commented on issue #11354: URL: https://github.com/apache/lucene/issues/11354#issuecomment-1368479497 Hi Jack, thanks for persisting and returning to this. I haven't had a chance to review the PR yet; just looking at the results here, I have a few questions. First, it looks to me as if we see some very nice improvement for the larger graphs, preserve the same recall, and changes to QPS are probably noise. I guess the assumption is we are producing similar results with less work? Just so we can understand these results a little better, could you document how you arrived at them? What dataset did you use? How did you measure the times and recall (was it using KnnGraphTester? luceneutil? some other benchmarking tool?). I'd also be curious to see the numbers and sizes of the segments in the results: I assume they would be unchanged from Control to Test, but it would be nice to be able to verify. Thanks again!
[GitHub] [lucene] rmuir commented on pull request #12053: Allow reusing indexed binary fields.
rmuir commented on PR #12053: URL: https://github.com/apache/lucene/pull/12053#issuecomment-1368481511
> I considered an alternative that consisted of failing if calling `setBytesValue` on a field that is indexed and tokenized
Can we just do this instead? I think an important point here is that you shouldn't be calling setBytesValue if it is tokenized (TokenStream in use). You need Reader/String.
[GitHub] [lucene] rmuir commented on pull request #12053: Allow reusing indexed binary fields.
rmuir commented on PR #12053: URL: https://github.com/apache/lucene/pull/12053#issuecomment-1368481827 And yeah, you don't have such checks on numeric values, but numeric values don't have TokenStream tokenization. Being consistent with them makes no sense, that isn't what this is about. Otherwise, if we can't agree here, let's just keep the restriction.
[GitHub] [lucene] rmuir commented on pull request #12053: Allow reusing indexed binary fields.
rmuir commented on PR #12053: URL: https://github.com/apache/lucene/pull/12053#issuecomment-1368482600 The fact that the tests pass with this change is really upsetting too. We should at least add checks for the type of luser moments we want to prevent, e.g. calling setBytesValue on a fucking TextField, etc. If we don't add these checks then users are going to invoke these methods and... nothing will happen at all... or something that isn't what they want.
[GitHub] [lucene] zhaih commented on pull request #12051: Fix wrong assertion in TestBooleanQuery.testQueryMatchesCount
zhaih commented on PR #12051: URL: https://github.com/apache/lucene/pull/12051#issuecomment-1368503412 Yeah, it should work unless we later come up with some way to quickly pull out the count in that situation as well. But I think the assertion here may not be necessary, because I see you have already added a specific test covering more comprehensive situations where boolean weights should or should not return -1. The assertion here seems to have been introduced when the `Weight#count` API was first added and should be removed IMO, since we now have a non-default impl?
[GitHub] [lucene] rmuir opened a new pull request, #12056: Update to error-prone 2.17
rmuir opened a new pull request, #12056: URL: https://github.com/apache/lucene/pull/12056 I investigated each of the new checks, nothing really interesting except an incorrect javadoc link (discovered manually) linking to Object.finalize()
[GitHub] [lucene] rmuir opened a new issue, #12057: Forbidden-apis "built-in" signatures don't appear to be working?
rmuir opened a new issue, #12057: URL: https://github.com/apache/lucene/issues/12057
### Description
I was looking at new error-prone checks in #12056 and one fails on Object.finalize. Because the method is in the built-in JDK deprecated list (e.g. https://github.com/policeman-tools/forbidden-apis/blob/main/src/main/resources/de/thetaphi/forbiddenapis/signatures/jdk-deprecated-11.txt#L195), I would expect the check to fail if I override finalize. If I give `lucene/core/src/test/org/apache/lucene/TestDemo.java` a finalizer method, nothing fails. It makes me worried the built-in signatures lists aren't being applied somehow? Maybe the gradle task matching logic in the forbidden-apis config is buggy? Not sure what is going on. cc: @uschindler
### Version and environment details
_No response_
[GitHub] [lucene] rmuir commented on issue #12057: Forbidden-apis "built-in" signatures don't appear to be working?
rmuir commented on issue #12057: URL: https://github.com/apache/lucene/issues/12057#issuecomment-1368524586 Here's how to reproduce: apply this patch, then run `gradlew check -x test`. I would expect the build to fail, because we added a deprecated finalizer. Maybe forbidden doesn't fail because we don't actually call Object.finalize()? This method is a little special in that overriding it is enough to be bad. Maybe we should try to fix javac or ECJ to fail on deprecated usages instead?
```
diff --git a/lucene/core/src/test/org/apache/lucene/TestDemo.java b/lucene/core/src/test/org/apache/lucene/TestDemo.java
index 6c608e1d0b1..8bcbdc813ee 100644
--- a/lucene/core/src/test/org/apache/lucene/TestDemo.java
+++ b/lucene/core/src/test/org/apache/lucene/TestDemo.java
@@ -46,6 +46,11 @@ import org.apache.lucene.util.IOUtils;
  */
 public class TestDemo extends LuceneTestCase {
 
+  @Override
+  protected void finalize() {
+    System.out.println("YOLO");
+  }
+
   public void testDemo() throws IOException {
     String longTerm =
         "longtermlongtermlongtermlongtermlongtermlongtermlongtermlong"
```
[GitHub] [lucene] rmuir commented on issue #12057: Forbidden-apis "built-in" signatures don't appear to be working?
rmuir commented on issue #12057: URL: https://github.com/apache/lucene/issues/12057#issuecomment-1368527335 Confirmed that's the issue: if I add a `super.finalize()` call to my finalizer, then forbidden fails. I will edit the issue. So we may need to use a different tool (javac, ecj) to ban finalizers. Worst-case we just enable the error-prone check for them, but I try to avoid using error-prone if something simpler will do it.
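The distinction confirmed above (a call to `super.finalize()` is flagged, a bare override is not) can be pictured with a small reflection sketch. This is not something proposed in the thread, which discusses javac, ECJ and error-prone; it is only a hypothetical illustration that an override is visible as a declared method even when no call site references `Object.finalize()`. `FinalizerCheckSketch` and `declaresFinalizer` are made-up names.

```java
import java.lang.reflect.Method;

final class FinalizerCheckSketch {
  /** True if the class itself declares a no-arg finalize(), i.e. overrides Object.finalize(). */
  static boolean declaresFinalizer(Class<?> clazz) {
    for (Method m : clazz.getDeclaredMethods()) {
      if (m.getName().equals("finalize") && m.getParameterCount() == 0) {
        return true;
      }
    }
    return false;
  }

  static class Clean {}

  static class WithFinalizer {
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() {
      // No super.finalize() call here, so no call site references the deprecated signature.
    }
  }
}
```

A check based on call sites sees nothing to report in `WithFinalizer`, while `declaresFinalizer(WithFinalizer.class)` returns true.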
[GitHub] [lucene] rmuir commented on issue #12057: ban finalizers in the build somehow (worst-case: use error-prone)
rmuir commented on issue #12057: URL: https://github.com/apache/lucene/issues/12057#issuecomment-1368539424 Currently there is no good way with ECJ/javac, unless we fail on all deprecations, which is very noisy at the moment. We can probably do it better with ECJ if we enable all their deprecation options, and clean up the codebase (e.g. ensure tests calling deprecated stuff are also themselves deprecated):
* org.eclipse.jdt.core.compiler.problem.deprecation
* org.eclipse.jdt.core.compiler.problem.deprecationInDeprecatedCode
* org.eclipse.jdt.core.compiler.problem.deprecationWhenOverridingDeprecatedMethod
But looking at the code, that's gonna require quite a bit of work. I just wanted to make some progress here and prevent finalizers from slipping in. Unfortunately the error-prone checker doesn't fail on this case either yet: https://github.com/google/error-prone/pull/3652
[GitHub] [lucene] uschindler opened a new pull request, #12058: Fix detection of Hotspot in TestRamUsageEstimator so it works with OpenJ9 that has the bean, but without properties
uschindler opened a new pull request, #12058: URL: https://github.com/apache/lucene/pull/12058 This improves the test, which fails with OpenJ9 VMs, due to the following problem:
- OpenJ9 returns the HotspotMXBean, but it is empty and has no properties. So we can't detect compressed pointers. But the test requires it
- The assumption now uses the compilation bean to detect this in tests
[GitHub] [lucene] uschindler commented on pull request #12058: Fix detection of Hotspot in TestRamUsageEstimator so it works with OpenJ9 that has the bean, but without properties
uschindler commented on PR #12058: URL: https://github.com/apache/lucene/pull/12058#issuecomment-1368556378 Thanks Robert!
[GitHub] [lucene] uschindler merged pull request #12058: Fix detection of Hotspot in TestRamUsageEstimator so it works with OpenJ9 that has the bean, but without properties
uschindler merged PR #12058: URL: https://github.com/apache/lucene/pull/12058
[GitHub] [lucene] rmuir commented on a diff in pull request #12055: Better skipping for multi-term queries with a FILTER rewrite.
rmuir commented on code in PR #12055: URL: https://github.com/apache/lucene/pull/12055#discussion_r1059804843
## lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java: ##
@@ -183,23 +182,31 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
       }
       Query q = new ConstantScoreQuery(bq.build());
       final Weight weight = searcher.rewrite(q).createWeight(searcher, scoreMode, score());
-      return new WeightOrDocIdSet(weight);
+      return new WeightOrDocIdSetIterator(weight);
     }
     // Too many terms: go back to the terms we already collected and start building the bit set
-    DocIdSetBuilder builder = new DocIdSetBuilder(context.reader().maxDoc(), terms);
+    PriorityQueue highFrequencyTerms =
+        new PriorityQueue(collectedTerms.size()) {
+          @Override
+          protected boolean lessThan(PostingsEnum a, PostingsEnum b) {
+            return a.cost() < b.cost();
Review Comment: `pq.insertWithOverflow` uses `!lessThan()` in its code. So I'm worried about this PQ behaving stupidly on ties with the same `docFreq`. Is there a simple tiebreaker we can use (even synthetic such as `int termId`) so that such ties don't enter the PQ? I'm just concerned about the "collect remaining terms" piece for cases where there are jazillions of terms. It should also allow the IO to be a bit more sequential in such cases, rather than constantly replacing top of PQ with more ties?
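The tie concern can be sketched with `java.util.PriorityQueue` standing in for Lucene's own `PriorityQueue`. The replace-the-top rule below mimics the `!lessThan()` behaviour the review comment describes; `TieBreakSketch`, `Entry` and `replacements` are hypothetical names, and `termId` is the synthetic tiebreaker suggested in the comment, assumed here to be the term's position in enumeration order.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

final class TieBreakSketch {
  /** Hypothetical queue entry: enumeration-order term ID plus the cost of its postings. */
  static final class Entry {
    final int termId;
    final long cost;

    Entry(int termId, long cost) {
      this.termId = termId;
      this.cost = cost;
    }
  }

  /** Counts how often the top of a bounded min-heap gets replaced while inserting all terms. */
  static int replacements(long[] costs, int maxSize, boolean tieBreak) {
    Comparator<Entry> byCost = Comparator.comparingLong(e -> e.cost);
    // With the tiebreaker, a later termId sorts as "less" among equal costs,
    // so a newly seen term never displaces an earlier one of the same cost.
    Comparator<Entry> order =
        tieBreak
            ? byCost.thenComparing(Comparator.comparingInt((Entry e) -> e.termId).reversed())
            : byCost;
    PriorityQueue<Entry> heap = new PriorityQueue<>(order);
    int replaced = 0;
    for (int termId = 0; termId < costs.length; termId++) {
      Entry e = new Entry(termId, costs[termId]);
      if (heap.size() < maxSize) {
        heap.add(e);
      } else if (order.compare(e, heap.peek()) >= 0) { // the "!lessThan(e, top)" rule
        heap.poll();
        heap.add(e);
        replaced++;
      }
    }
    return replaced;
  }
}
```

With 100 terms that all share the same cost and a queue of 16, the version without a tiebreaker replaces the top on every one of the remaining 84 insertions, while the version with the tiebreaker never does.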
[GitHub] [lucene] rmuir commented on a diff in pull request #12055: Better skipping for multi-term queries with a FILTER rewrite.
rmuir commented on code in PR #12055: URL: https://github.com/apache/lucene/pull/12055#discussion_r1059807197
## lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java: ##
@@ -183,23 +182,31 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
       }
       Query q = new ConstantScoreQuery(bq.build());
       final Weight weight = searcher.rewrite(q).createWeight(searcher, scoreMode, score());
-      return new WeightOrDocIdSet(weight);
+      return new WeightOrDocIdSetIterator(weight);
     }
     // Too many terms: go back to the terms we already collected and start building the bit set
-    DocIdSetBuilder builder = new DocIdSetBuilder(context.reader().maxDoc(), terms);
+    PriorityQueue highFrequencyTerms =
+        new PriorityQueue(collectedTerms.size()) {
+          @Override
+          protected boolean lessThan(PostingsEnum a, PostingsEnum b) {
+            return a.cost() < b.cost();
+          }
+        };
+    DocIdSetBuilder otherTerms = new DocIdSetBuilder(context.reader().maxDoc(), terms);
     if (collectedTerms.isEmpty() == false) {
       TermsEnum termsEnum2 = terms.iterator();
       for (TermAndState t : collectedTerms) {
         termsEnum2.seekExact(t.term, t.state);
-        docs = termsEnum2.postings(docs, PostingsEnum.NONE);
-        builder.add(docs);
+        PostingsEnum postings = termsEnum2.postings(null, PostingsEnum.NONE);
+        highFrequencyTerms.add(postings);
Review Comment: Rather than just blindly add terms to the PQ, should we just have a constant minimum `cost` threshold (e.g. 256, 1024, whatever) to even consider it? Otherwise go directly to `otherTerms`. The skipping stuff isn't going to be useful for the long tail of low-cost terms (the majority, if we are thinking zipf). Ideally we wouldn't waste our time unless it has skipdata? And we want to be careful about the performance of these queries when there are jazillions of jazillions of matching low-frequency terms.
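A minimal sketch of the threshold idea from the review comment, using only JDK types: lists below a minimum cost are folded straight into the shared set and never become queue candidates. `CostThresholdSketch` and `route` are hypothetical names, postings are plain `int[]` arrays, list length stands in for `cost()`, and the threshold value is left to the caller because the comment only floats 256 or 1024 as examples.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class CostThresholdSketch {
  /**
   * Routes each postings list: lists shorter than minCost go directly into otherDocs,
   * the rest are returned as candidates for the priority queue.
   */
  static List<int[]> route(List<int[]> postingsLists, BitSet otherDocs, int minCost) {
    List<int[]> candidates = new ArrayList<>();
    for (int[] docs : postingsLists) {
      if (docs.length < minCost) {
        // Long tail of cheap terms: skipping would not pay off, so merge them up-front.
        for (int doc : docs) {
          otherDocs.set(doc);
        }
      } else {
        candidates.add(docs);
      }
    }
    return candidates;
  }
}
```

Only the returned candidates would then compete for the bounded queue, which keeps queue traffic proportional to the number of frequent terms rather than to the total number of matching terms.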
[GitHub] [lucene] rmuir commented on a diff in pull request #12055: Better skipping for multi-term queries with a FILTER rewrite.
rmuir commented on code in PR #12055: URL: https://github.com/apache/lucene/pull/12055#discussion_r1059807649
## lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java: ##
@@ -183,23 +182,31 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
         }
         Query q = new ConstantScoreQuery(bq.build());
         final Weight weight = searcher.rewrite(q).createWeight(searcher, scoreMode, score());
-        return new WeightOrDocIdSet(weight);
+        return new WeightOrDocIdSetIterator(weight);
       }
       // Too many terms: go back to the terms we already collected and start building the bit set
-      DocIdSetBuilder builder = new DocIdSetBuilder(context.reader().maxDoc(), terms);
+      PriorityQueue<PostingsEnum> highFrequencyTerms =
+          new PriorityQueue<PostingsEnum>(collectedTerms.size()) {
+            @Override
+            protected boolean lessThan(PostingsEnum a, PostingsEnum b) {
+              return a.cost() < b.cost();
+            }
+          };
+      DocIdSetBuilder otherTerms = new DocIdSetBuilder(context.reader().maxDoc(), terms);
       if (collectedTerms.isEmpty() == false) {
         TermsEnum termsEnum2 = terms.iterator();
         for (TermAndState t : collectedTerms) {
           termsEnum2.seekExact(t.term, t.state);
-          docs = termsEnum2.postings(docs, PostingsEnum.NONE);
-          builder.add(docs);
+          PostingsEnum postings = termsEnum2.postings(null, PostingsEnum.NONE);
+          highFrequencyTerms.add(postings);
         }
       }
-      // Then keep filling the bit set with remaining terms
+      // Then collect remaining terms
+      PostingsEnum postings = null;
       do {
-        docs = termsEnum.postings(docs, PostingsEnum.NONE);
+        postings = termsEnum.postings(postings, PostingsEnum.NONE);
Review Comment: I don't understand how this is safe at all: we are reusing PostingsEnum instances yet also stuffing them into a priority queue.
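The concern in the review comment above is an aliasing hazard: `TermsEnum.postings(reuse, flags)` is allowed to reset and return the instance passed as `reuse`, so a reference stored earlier can end up pointing at a later term's postings. The following is only a minimal sketch of that hazard in plain Java. `ReuseHazard`, `Cursor`, `open`, `collectWithReuse`, and `collectFresh` are hypothetical stand-ins, not Lucene classes or methods, and the sketch assumes the reused instance is also the one being retained, which the truncated hunk does not confirm either way.

```java
import java.util.ArrayList;
import java.util.List;

class ReuseHazard {
  /** Hypothetical stand-in for a reusable postings cursor; not Lucene's PostingsEnum. */
  static final class Cursor {
    long cost;
  }

  /** Mimics a postings(reuse, flags)-style call: may reset and return the instance passed in. */
  static Cursor open(Cursor reuse, long cost) {
    Cursor cursor = (reuse != null) ? reuse : new Cursor();
    cursor.cost = cost;
    return cursor;
  }

  /** Reuses one cursor while also retaining it, so every stored reference aliases one object. */
  static List<Cursor> collectWithReuse(long[] costs) {
    List<Cursor> kept = new ArrayList<>();
    Cursor cursor = null;
    for (long cost : costs) {
      cursor = open(cursor, cost); // overwrites the state of the previously stored cursor
      kept.add(cursor);
    }
    return kept;
  }

  /** Passes null for reuse, so each retained cursor is an independent instance. */
  static List<Cursor> collectFresh(long[] costs) {
    List<Cursor> kept = new ArrayList<>();
    for (long cost : costs) {
      kept.add(open(null, cost));
    }
    return kept;
  }

  public static void main(String[] args) {
    long[] costs = {5, 1, 9};
    List<Cursor> reused = collectWithReuse(costs);
    List<Cursor> fresh = collectFresh(costs);
    System.out.println(reused.get(0) == reused.get(2)); // true: same object stored three times
    System.out.println(reused.get(0).cost); // 9: state of the last call, not the first
    System.out.println(fresh.get(0).cost); // 5
  }
}
```

Under these assumptions, any ordering computed from `cost` over the reused references would be meaningless, which is why retaining an enum and passing it back as `reuse` do not mix.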
[GitHub] [lucene] uschindler closed issue #8485: TestIndexWriterOnError.testCheckpoint fails on IBM J9 [LUCENE-7432]
uschindler closed issue #8485: TestIndexWriterOnError.testCheckpoint fails on IBM J9 [LUCENE-7432] URL: https://github.com/apache/lucene/issues/8485
[GitHub] [lucene] uschindler commented on issue #7580: Reproducible fieldcache AIOOBE only on J9 [LUCENE-6522]
uschindler commented on issue #7580: URL: https://github.com/apache/lucene/issues/7580#issuecomment-1368575244 This seems fixed now; the test no longer fails.
[GitHub] [lucene] uschindler closed issue #7580: Reproducible fieldcache AIOOBE only on J9 [LUCENE-6522]
uschindler closed issue #7580: Reproducible fieldcache AIOOBE only on J9 [LUCENE-6522] URL: https://github.com/apache/lucene/issues/7580
[GitHub] [lucene] uschindler closed issue #7579: org.apache.xerces.util is a protected pkg on IBM J9 [LUCENE-6521]
uschindler closed issue #7579: org.apache.xerces.util is a protected pkg on IBM J9 [LUCENE-6521] URL: https://github.com/apache/lucene/issues/7579
[GitHub] [lucene] uschindler commented on issue #7579: org.apache.xerces.util is a protected pkg on IBM J9 [LUCENE-6521]
uschindler commented on issue #7579: URL: https://github.com/apache/lucene/issues/7579#issuecomment-1368575466 This is fixed in J9, as it now uses the OpenJDK class library.
[GitHub] [lucene] uschindler closed issue #7575: mockfilesystem tests fail with IBM jdk [LUCENE-6517]
uschindler closed issue #7575: mockfilesystem tests fail with IBM jdk [LUCENE-6517] URL: https://github.com/apache/lucene/issues/7575
[GitHub] [lucene] uschindler commented on issue #7575: mockfilesystem tests fail with IBM jdk [LUCENE-6517]
uschindler commented on issue #7575: URL: https://github.com/apache/lucene/issues/7575#issuecomment-1368575612 This should no longer be an issue, as OpenJ9 uses the OpenJDK class library now.
[GitHub] [lucene] uschindler commented on issue #7614: TestQueryTemplateManager always fails on J9 [LUCENE-6556]
uschindler commented on issue #7614: URL: https://github.com/apache/lucene/issues/7614#issuecomment-1368575794 This is no longer an issue: all tests pass because OpenJ9 now uses the OpenJDK class library instead of Harmony.
[GitHub] [lucene] uschindler closed issue #7614: TestQueryTemplateManager always fails on J9 [LUCENE-6556]
uschindler closed issue #7614: TestQueryTemplateManager always fails on J9 [LUCENE-6556] URL: https://github.com/apache/lucene/issues/7614
[GitHub] [lucene] uschindler commented on issue #7580: Reproducible fieldcache AIOOBE only on J9 [LUCENE-6522]
uschindler commented on issue #7580: URL: https://github.com/apache/lucene/issues/7580#issuecomment-1368576005 In addition, Lucene has no fieldcache anymore.
[GitHub] [lucene] uschindler closed issue #5001: TestNRTManager hangs with IBM JRE [LUCENE-3928]
uschindler closed issue #5001: TestNRTManager hangs with IBM JRE [LUCENE-3928] URL: https://github.com/apache/lucene/issues/5001
[GitHub] [lucene] uschindler commented on issue #5001: TestNRTManager hangs with IBM JRE [LUCENE-3928]
uschindler commented on issue #5001: URL: https://github.com/apache/lucene/issues/5001#issuecomment-1368577762 This test now passes with IBM Semeru / OpenJ9.
[GitHub] [lucene] jasirkt commented on issue #11701: Deadlock in AnalysisSPILoader [LUCENE-10665]
jasirkt commented on issue #11701: URL: https://github.com/apache/lucene/issues/11701#issuecomment-1368693920
> In which version did you see this?
9.1.0
Thanks for fixing. It works now!