[GitHub] [lucene] matriv commented on issue #11459: Remove uses of wall-clock time in codebase [LUCENE-10423]
matriv commented on issue #11459: URL: https://github.com/apache/lucene/issues/11459#issuecomment-1237784068 @rmuir When you have time, please take a look at the PR and let me know. Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on a diff in pull request #11749: Remove usages of System.currentTimeMillis() from tests
rmuir commented on code in PR #11749: URL: https://github.com/apache/lucene/pull/11749#discussion_r963782724
## lucene/classification/src/test/org/apache/lucene/classification/Test20NewsgroupsClassification.java:
@@ -123,13 +124,13 @@ public void test20Newsgroups() throws Exception {
 System.out.println("Indexing 20 Newsgroups...");
-long startIndex = System.currentTimeMillis();
+long startIndex = Clock.systemDefaultZone().millis();
Review Comment: Can we fix this test to use `nanoTime` too (monotonic clock), rather than the system clock? Currently, the `precommit` checks are angry about use of the default time zone, but I think we can avoid dealing with time zones completely here.
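The review asks for the monotonic clock instead of the system clock. `Clock.systemDefaultZone().millis()` still reads the wall clock, just through a zone-aware API, whereas `System.nanoTime()` has no relation to wall-clock time or time zones: only the difference between two readings in the same JVM is meaningful. A minimal sketch of that pattern, assuming a hypothetical `doWork()` standing in for the indexing loop (this is not the code committed in the PR):

```java
import java.util.concurrent.TimeUnit;

final class ElapsedTimeExample {
  // Hypothetical stand-in for the indexing work being timed.
  static long doWork() {
    long sum = 0;
    for (int i = 0; i < 1_000_000; i++) {
      sum += i;
    }
    return sum;
  }

  // Measures elapsed time of doWork() with the monotonic clock. No absolute
  // timestamp is kept, so no time zone (or Clock instance) is involved.
  static long timeWorkMillis() {
    long start = System.nanoTime();
    doWork();
    long elapsedNanos = System.nanoTime() - start;
    return TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
  }

  public static void main(String[] args) {
    System.out.println("Indexed in " + timeWorkMillis() + " ms");
  }
}
```

Because only an interval is computed, the absolute `startIndex` timestamp becomes unnecessary, which is what lets the test sidestep the default-time-zone precommit check.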
[GitHub] [lucene] mayya-sharipova commented on pull request #11743: LUCENE-10592 Better estimate memory for HNSW graph
mayya-sharipova commented on PR #11743: URL: https://github.com/apache/lucene/pull/11743#issuecomment-1238235494 @msokolov
> Although ... the RAM needed for the graph was always required, even when building the graph during flush, it just wasn't accounted for I think. I suppose a possible way to improve the buffering situation would be to buffer the vectors in RAM and then on merge, write them out, freeing the on-heap copy, and while building the graph, access the vectors from disk

Indeed, that's how we do it in Lucene 9.3, [using off-heap vector values to build the graph](https://github.com/apache/lucene/blob/branch_9_3/lucene/core/src/java/org/apache/lucene/codecs/lucene92/Lucene92HnswVectorsWriter.java#L148-L154). The problem with this is that building the graph on flush takes a lot of time, which makes searches that need to see recent changes unpredictably slow.
[GitHub] [lucene] rmuir commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests
rmuir commented on PR #11749: URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238236097 Thanks for doing this work, this change looks great! Avoiding wall-clock time should make tests more reproducible when they fail. Also, this change fixes tests that were configured to run for `3 seconds` or similar so that they instead use a fixed number of iterations. This should REALLY help reproducibility, especially when one CPU is faster than another. I added a small comment; I think we can fix that one classification test to make the build happy here.
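The distinction rmuir draws can be shown side by side. A minimal sketch with hypothetical method names: the first loop's iteration count depends on how fast the machine is, while the second runs the same seeded sequence of operations everywhere, so a failing seed reproduces on a slower or faster CPU.

```java
import java.util.Random;

final class IterationBoundedLoop {
  // Anti-pattern, shown for contrast: the number of iterations depends on CPU
  // speed and load, so a failing seed may not reproduce on another machine.
  static int runForDuration(long millis, Random random) {
    long deadline = System.nanoTime() + millis * 1_000_000L;
    int iterations = 0;
    while (System.nanoTime() - deadline < 0) {
      random.nextInt();
      iterations++;
    }
    return iterations;
  }

  // Preferred: a fixed iteration count, so the same seed drives the same
  // sequence of random operations on every machine.
  static long runFixedIterations(int iterations, Random random) {
    long checksum = 0;
    for (int i = 0; i < iterations; i++) {
      checksum += random.nextInt(100);
    }
    return checksum;
  }

  public static void main(String[] args) {
    long a = runFixedIterations(1000, new Random(42));
    long b = runFixedIterations(1000, new Random(42));
    System.out.println(a == b); // prints "true": same seed, same work
  }
}
```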
[GitHub] [lucene] matriv commented on a diff in pull request #11749: Remove usages of System.currentTimeMillis() from tests
matriv commented on code in PR #11749: URL: https://github.com/apache/lucene/pull/11749#discussion_r963785882
## lucene/classification/src/test/org/apache/lucene/classification/Test20NewsgroupsClassification.java:
@@ -123,13 +124,13 @@ public void test20Newsgroups() throws Exception {
 System.out.println("Indexing 20 Newsgroups...");
-long startIndex = System.currentTimeMillis();
+long startIndex = Clock.systemDefaultZone().millis();
Review Comment: Oops, of course. That was a leftover from some experimentation that I messed up when preparing the PR. Thanks!
[GitHub] [lucene] matriv commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests
matriv commented on PR #11749: URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238246924 Thanks @rmuir! Pushed the fix. I also forgot to mention that for every replacement of a time-based loop with a counter-based one, I added a counter, ran the test on my machine multiple times, and used the average value.
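One way to do the calibration described above is to count how many iterations the old time-bounded loop completed, repeat that a few times, and average. A hedged sketch, assuming a hypothetical `Runnable` that performs one iteration of the test body; the 3-second budget in `main` mirrors the `3 seconds` mentioned earlier and is only illustrative:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical one-off calibration helper (not part of Lucene's test
// framework): measures how many iterations fit into the old time budget so
// that a fixed iteration count can be hard-coded in the test.
final class IterationCalibration {
  // Counts how many times `op` completes before `budgetMillis` elapses,
  // using the monotonic clock. At least one iteration always runs.
  static long countIterations(Runnable op, long budgetMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(budgetMillis);
    long count = 0;
    do {
      op.run();
      count++;
    } while (System.nanoTime() - deadline < 0); // overflow-safe comparison
    return count;
  }

  // Repeats the measurement `runs` times and returns the mean count.
  static long averageIterations(Runnable op, long budgetMillis, int runs) {
    long total = 0;
    for (int r = 0; r < runs; r++) {
      total += countIterations(op, budgetMillis);
    }
    return total / runs;
  }

  public static void main(String[] args) {
    // Stand-in for one iteration of the test body being converted.
    Runnable op = () -> Integer.toString(42).hashCode();
    System.out.println("suggested iterations: " + averageIterations(op, 3000, 3));
  }
}
```

The result is inherently machine-dependent, so a count calibrated on a fast workstation can still be too high for CI, which is what the slow-test reports later in this thread surface.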
[GitHub] [lucene] mayya-sharipova commented on pull request #11743: LUCENE-10592 Better estimate memory for HNSW graph
mayya-sharipova commented on PR #11743: URL: https://github.com/apache/lucene/pull/11743#issuecomment-1238251543 @jpountz
> could we buffer vectors on disk with the approach of building the graph during indexing?

We explored this, but could not find a way to do it. To build a graph, we need access to all vector values indexed so far. If vectors are buffered in memory, this works. But if we buffer vectors on disk, we can't write vectors to a file and at the same time read vectors from it during graph construction, as reading from unclosed index outputs is not possible.
[GitHub] [lucene] rmuir commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests
rmuir commented on PR #11749: URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238256051 Thanks @matriv for tuning the iterations. Much better than a random guess, which could lead to timeouts on CI servers, etc. I restarted the tests here.
[GitHub] [lucene] matriv commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests
matriv commented on PR #11749: URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238258581 I'm also mentioning this so that we can keep an eye on those tests: if some turn out to be rather slow, the max iterations may need further tuning.
[GitHub] [lucene] rmuir opened a new issue, #11754: TestBoolean2.testRandomQueries fails in CI due to eating up heap space
rmuir opened a new issue, #11754: URL: https://github.com/apache/lucene/issues/11754
### Description
Jenkins failure (sorry, I don't have the full build log, but the failure is reproducible; see below):
```
[JENKINS] Lucene-main-Linux (64bit/jdk-17.0.3) - Build # 36812 - Unstable!
Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/36812/
Java: 64bit/jdk-17.0.3 -XX:-UseCompressedOops -XX:+UseParallelGC
1 tests failed.
FAILED: org.apache.lucene.search.TestBoolean2.testRandomQueries
Error Message:
java.lang.OutOfMemoryError: Java heap space
Stack Trace:
java.lang.OutOfMemoryError: Java heap space
    at __randomizedtesting.SeedInfo.seed([A0883CC08C1C22AB:FEA38C2CB4C60F35]:0)
    at org.apache.lucene.util.PriorityQueue.<init>(PriorityQueue.java:96)
    at org.apache.lucene.util.PriorityQueue.<init>(PriorityQueue.java:43)
    at org.apache.lucene.search.FieldValueHitQueue.<init>(FieldValueHitQueue.java:123)
    at org.apache.lucene.search.FieldValueHitQueue$OneComparatorFieldValueHitQueue.<init>(FieldValueHitQueue.java:59)
    at org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:159)
    at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:454)
    at org.apache.lucene.search.TopFieldCollector$1.newCollector(TopFieldCollector.java:501)
    at org.apache.lucene.search.TopFieldCollector$1.newCollector(TopFieldCollector.java:493)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:669)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:656)
    at org.apache.lucene.search.TestBoolean2.testRandomQueries(TestBoolean2.java:406)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
    at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
```
### Gradle command to reproduce
Reproduce with: `./gradlew -p lucene/core -Ptests.seed=A0883CC08C1C22AB -Ptests.heapsize=256m -Ptests.minheapsize=256m -Ptests.multiplier=3 test --tests TestBoolean2.testRandomQueries`

The test is well-behaved without `multiplier=3`, so the multiplier may be related to the issue.
[GitHub] [lucene] rmuir commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests
rmuir commented on PR #11749: URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238285805
> Mentioned, also, so that we can keep an eye on those tests, in case, some turn out to be rather slow, maybe some further tuning of the max iterations is needed.

I see two suspicious ones in the build log after the tests ran: `TestIDVersionPostingsFormat.testGlobalVersions` and `TestNeverDelete.testIndexing`. Looks like we may want to tone these down; 20+ seconds is way too much, since we have thousands of tests.
```
The slowest tests (exceeding 500 ms) during this run:
  35.84s TestIDVersionPostingsFormat.testGlobalVersions (:lucene:sandbox)
  23.68s TestNeverDelete.testIndexing (:lucene:core)
  10.95s TestScripts.testLukeCanBeLaunched (:lucene:distribution.tests)
  6.94s TestSortedSetDocValuesFacets.testRandomHierarchicalFlatMix (:lucene:facet)
  6.52s TestTessellator.testComplexPolygon47 (:lucene:core)
  4.93s TestExitableDirectoryReader.testExitableTermsEnumSampleTimeoutCheck (:lucene:core)
  3.48s TestNRTThreads.testNRTThreads (:lucene:core)
  3.36s TestIndexAndTaxonomyReplicationClient.testConsistencyOnExceptions (:lucene:replicator)
  3.27s TestStressIndexing.testStressIndexAndSearching (:lucene:core)
  3.20s TestIndexWriterMergePolicy.testStressUpdateSameDocumentWithMergeOnGetReader (:lucene:core)
The slowest suites (exceeding 1s) during this run:
  36.11s TestIDVersionPostingsFormat (:lucene:sandbox)
  23.73s TestNeverDelete (:lucene:core)
  10.96s TestScripts (:lucene:distribution.tests)
  10.45s TestLucene90DocValuesFormatMergeInstance (:lucene:core)
  10.38s TestBackwardsCompatibility (:lucene:backward-codecs)
  9.90s TestLucene90DocValuesFormat (:lucene:core)
  8.62s TestSimpleTextDocValuesFormat (:lucene:codecs)
  7.90s TestTessellator (:lucene:core)
  7.53s TestSortedSetDocValuesFacets (:lucene:facet)
  7.27s TestIndexWriterExceptions (:lucene:core)
```
[GitHub] [lucene] matriv commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests
matriv commented on PR #11749: URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238296959 Thanks! Pushed a toned-down version, let's see. (My machine is quite powerful, and these tests didn't show up among the slow tests.)
[GitHub] [lucene] rmuir commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests
rmuir commented on PR #11749: URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238304029 Thanks @matriv! We can inspect the log again for the next build. Some of these tests use threads, so they will run a lot slower in CI or on small computers like my 2-core machine. I can also trigger a "nightly" run on my computer with your branch before merging, to help prevent noise on the CI servers. Some of these tests turn monstrous in "nightly" mode; they have always been monsters, we just don't want them to run for hours or time out.
[GitHub] [lucene] msokolov commented on pull request #11743: LUCENE-10592 Better estimate memory for HNSW graph
msokolov commented on PR #11743: URL: https://github.com/apache/lucene/pull/11743#issuecomment-1238330802 OK, thanks for the reminder of the arguments for moving the graph creation to index time.
> May be, we can come up with some sophisticated solution, writing vector values in batches to several files, but not sure if this complexity worth it.

Right, we could buffer up to some percentage of the IndexWriter buffer size in RAM, and then write to one or more temporary files, freeing RAM and from then on accumulating new writes in RAM. Kind of like a pre-flush flush? Reading would require a wrapper that presents all of this as a single VectorValues. It is more complex, but it seems worthwhile, since it would reduce the pressure on the IndexWriter to flush "prematurely", and HNSW is sensitive to fragmentation. The current situation is not terrible; eventually, merging should improve the index geometry. I don't think we have a release blocker. At any rate, for the typical use cases I have in mind, the index size is still dominated by other types of fields, so this is unlikely to be a problem. Although it looks worse for a vectors-only index, I think that exaggerates the typical impact? Not sure how it looks from other perspectives, though.
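The "pre-flush flush" idea can be sketched in plain Java. Everything below is hypothetical and is not a Lucene API (no `VectorValues` or `IndexOutput` is used): vectors accumulate on heap until a byte budget is hit, are written to a temporary file that is closed immediately, and `get(ord)` then resolves an ordinal across all spilled files plus the in-RAM tail as one logical sequence. Because each spilled file is closed before anything reads it, the sketch also respects the constraint that unclosed outputs cannot be read.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT a Lucene API. Vectors are buffered on heap until a
// byte budget is reached, then written to a temp file and dropped from heap.
// get(ord) treats all spilled files plus the in-RAM tail as one sequence.
final class SpillingVectorBuffer implements AutoCloseable {
  private final int dim;
  private final long ramBudgetBytes;
  private final List<Path> spillFiles = new ArrayList<>();
  private final List<Integer> spillCounts = new ArrayList<>();
  private final List<float[]> inRam = new ArrayList<>();

  SpillingVectorBuffer(int dim, long ramBudgetBytes) {
    this.dim = dim;
    this.ramBudgetBytes = ramBudgetBytes;
  }

  void add(float[] vector) {
    if (vector.length != dim) {
      throw new IllegalArgumentException("expected dimension " + dim);
    }
    inRam.add(vector.clone());
    // Only the raw float payload is counted against the budget.
    if ((long) inRam.size() * dim * Float.BYTES >= ramBudgetBytes) {
      spill();
    }
  }

  int size() {
    return inRam.size() + spillCounts.stream().mapToInt(Integer::intValue).sum();
  }

  float[] get(int ord) {
    for (int i = 0; i < spillFiles.size(); i++) {
      int count = spillCounts.get(i);
      if (ord < count) {
        return readFromFile(spillFiles.get(i), ord);
      }
      ord -= count; // skip past this file's vectors
    }
    return inRam.get(ord).clone();
  }

  private void spill() {
    try {
      Path file = Files.createTempFile("vectors", ".bin");
      // Fully written and closed before anything reads from it.
      try (DataOutputStream out =
          new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(file)))) {
        for (float[] v : inRam) {
          for (float f : v) {
            out.writeFloat(f);
          }
        }
      }
      spillFiles.add(file);
      spillCounts.add(inRam.size());
      inRam.clear(); // release the on-heap copies
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private float[] readFromFile(Path file, int ordInFile) {
    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
      raf.seek((long) ordInFile * dim * Float.BYTES);
      float[] v = new float[dim];
      for (int j = 0; j < dim; j++) {
        v[j] = raf.readFloat(); // big-endian, matching DataOutputStream
      }
      return v;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override
  public void close() {
    for (Path p : spillFiles) {
      try {
        Files.deleteIfExists(p);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }
}
```

A real implementation would keep read handles open (or memory-map the files) rather than reopening a file on every `get`, and would charge object overhead, not just the float payload, against the budget.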
[GitHub] [lucene] rmuir commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests
rmuir commented on PR #11749: URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238331021 This build looks good:
```
The slowest tests (exceeding 500 ms) during this run:
  6.93s TestStressIndexing.testStressIndexAndSearching (:lucene:core)
  6.84s TestTessellator.testComplexPolygon47 (:lucene:core)
  6.52s TestNRTThreads.testNRTThreads (:lucene:core)
  5.74s TestScripts.testLukeCanBeLaunched (:lucene:distribution.tests)
  4.92s TestExitableDirectoryReader.testExitableTermsEnumSampleTimeoutCheck (:lucene:core)
  4.74s TestIDVersionPostingsFormat.testGlobalVersions (:lucene:sandbox)
  4.51s TestTransactions.testTransactions (:lucene:core)
  3.41s TestTieredMergePolicy.testSimulateAppendOnly (:lucene:core)
  3.39s TestBackwardsCompatibility.testUnsupportedOldIndexes (:lucene:backward-codecs)
  3.13s TestIndexWriterExceptions.testRandomExceptions (:lucene:core)
The slowest suites (exceeding 1s) during this run:
  12.66s TestBackwardsCompatibility (:lucene:backward-codecs)
  10.81s TestLucene90DocValuesFormat (:lucene:core)
  10.32s TestLucene90DocValuesFormatMergeInstance (:lucene:core)
  9.56s TestSimpleTextDocValuesFormat (:lucene:codecs)
  8.13s TestTessellator (:lucene:core)
  7.98s TestIndexWriterExceptions (:lucene:core)
  7.52s TestPerFieldDocValuesFormat (:lucene:core)
  7.10s TestIndexWriter (:lucene:core)
  6.95s TestStressIndexing (:lucene:core)
  6.66s TestNRTThreads (:lucene:core)
```
I started a nightly run on my computer with `./gradlew -Dtests.nightly=true test`. It may take a while, but I'll post the slowest-tests report when it's done.
[GitHub] [lucene] jpountz commented on issue #11696: precompute the max level in LogMergePolicy [LUCENE-10660]
jpountz commented on issue #11696: URL: https://github.com/apache/lucene/issues/11696#issuecomment-1238341620 I just checked, it's here: https://github.com/apache/lucene/commit/3f6dbe8b55eb48066a800f706c371fd6f56afab0
[GitHub] [lucene] gsmiller commented on issue #11547: IntersectIterators is not necessary under matchAll case in Facet [LUCENE-10511]
gsmiller commented on issue #11547: URL: https://github.com/apache/lucene/issues/11547#issuecomment-1238537126 Interesting observation, @LuXugang. We could only "know" ahead of time that multiple iterators are "the same" when they effectively cover "all" docs in a segment, right? I can't think of a way to generalize beyond that, but maybe I'm overlooking something? I'd be +1 to seeing if we can add some intelligence to `ConjunctionUtils` to handle the case where one of the iterators covers all docs in a segment, though. Is that what you had in mind?
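The idea of skipping a clause that covers every doc in a segment can be illustrated with plain arrays. This is a hypothetical sketch, not Lucene's `ConjunctionUtils`: each "iterator" is a sorted array of distinct doc IDs in `[0, maxDoc)`, so an array with exactly `maxDoc` entries matches everything and cannot filter the conjunction. In Lucene itself, `DocIdSetIterator.cost()` is only an estimate, so a real change would need a reliable "matches all docs" signal rather than a count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, NOT Lucene's ConjunctionUtils: each "iterator" is a
// sorted array of distinct doc IDs in [0, maxDoc) for a single segment.
final class MatchAllAwareConjunction {
  // With distinct IDs in [0, maxDoc), exactly maxDoc entries means the
  // iterator matches every document in the segment.
  static boolean matchesAll(int[] docs, int maxDoc) {
    return docs.length == maxDoc;
  }

  // Intersects the iterators, first dropping any that match all docs,
  // since they cannot remove anything from the conjunction.
  static int[] intersect(List<int[]> iterators, int maxDoc) {
    List<int[]> required = new ArrayList<>();
    for (int[] docs : iterators) {
      if (!matchesAll(docs, maxDoc)) {
        required.add(docs);
      }
    }
    if (required.isEmpty()) {
      // Every clause matches all docs, so the conjunction does too.
      int[] all = new int[maxDoc];
      Arrays.setAll(all, i -> i);
      return all;
    }
    int[] result = required.get(0);
    for (int k = 1; k < required.size(); k++) {
      result = intersectTwo(result, required.get(k));
    }
    return result;
  }

  // Linear merge of two sorted arrays, keeping IDs present in both.
  static int[] intersectTwo(int[] a, int[] b) {
    int[] out = new int[Math.min(a.length, b.length)];
    int i = 0, j = 0, n = 0;
    while (i < a.length && j < b.length) {
      if (a[i] == b[j]) {
        out[n++] = a[i];
        i++;
        j++;
      } else if (a[i] < b[j]) {
        i++;
      } else {
        j++;
      }
    }
    return Arrays.copyOf(out, n);
  }
}
```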
[GitHub] [lucene] rmuir commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests
rmuir commented on PR #11749: URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238540078 The `-Dtests.nightly` run has some tests that time out (there's a 7200-second hard limit per test class) or take hours. At a glance, most of the trouble seems to come from subclasses of `ThreadedIndexingAndSearchingTestCase`:
* TestNRTThreads
* TestControlledRealTimeReopenThread
* TestSearcherManager

I also see a ton of time spent in:
* TestIDVersionPostingsFormat

This list may not be exhaustive, due to timeouts/failures... the build is still running after many hours. Let's tweak the nightly parameters for these tests? I think it's OK if they do at most something like 10x the normal iterations. But we don't want them taking hours in CI, or when we're smoke-testing a release (the release smoke tester uses `-Dtests.nightly`).
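A hedged sketch of the "at most 10x for nightly" policy, using plain Java and a hypothetical helper rather than Lucene's test framework (which has its own knobs, such as `LuceneTestCase.TEST_NIGHTLY` and `atLeast(int)`, that real tests should prefer). It assumes the `-Dtests.nightly=true` flag is visible to the test JVM as a system property:

```java
// Hypothetical helper (not Lucene's test framework API): scales a test's
// base iteration count for nightly runs, capped at 10x as suggested in review.
final class NightlyIterations {
  static final int MAX_NIGHTLY_FACTOR = 10;

  // True when the JVM was started with -Dtests.nightly=true.
  static boolean isNightly() {
    return Boolean.getBoolean("tests.nightly");
  }

  // multiplyExact fails loudly instead of silently overflowing.
  static int iterations(int base, boolean nightly) {
    return nightly ? Math.multiplyExact(base, MAX_NIGHTLY_FACTOR) : base;
  }

  public static void main(String[] args) {
    System.out.println("iterations: " + iterations(200, isNightly()));
  }
}
```

Keeping the factor as a single named constant makes it easy to tune in one place if nightly runs still approach the per-class time limit.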
[GitHub] [lucene] matriv commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests
matriv commented on PR #11749: URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238583388 Thank you. I'm also running it locally with `-Dtests.nightly=true`, and I'm going to significantly decrease those limits, e.g. 3 -> 1000 or even less. Once I have something that looks OK locally, I'll push the changes and ask you to check as well.
[GitHub] [lucene] matriv commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests
matriv commented on PR #11749: URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238611322 With the last commit, locally I get:
```
:lucene:core:test (SUCCESS): 5584 test(s), 60 skipped
The slowest tests (exceeding 500 ms) during this run:
  865.31s Test2BPostings.test (:lucene:core)
  340.06s TestIndexWriterExceptions.testTooManyTokens (:lucene:core)
  219.59s TestStressNRTReplication.test (:lucene:replicator)
  127.04s TestSearcherTaxonomyManager.testDirectory (:lucene:facet)
  120.10s TestIndexWriterOnDiskFull.testAddIndexOnDiskFull (:lucene:core)
  118.26s TestSimpleTextPostingsFormat.testDocsAndFreqsAndPositionsAndOffsetsAndPayloads (:lucene:codecs)
  117.13s TestStringValueFacetCounts.testRandom (:lucene:facet)
  103.57s TestBestSpeedLucene80DocValuesFormat.testNumericFieldJumpTables (:lucene:backward-codecs)
  102.00s TestNRTThreads.testNRTThreads (:lucene:core)
  96.18s TestBestCompressionLucene80DocValuesFormat.testNumericFieldJumpTables (:lucene:backward-codecs)
The slowest suites (exceeding 1s) during this run:
  865.34s Test2BPostings (:lucene:core)
  380.84s TestIndexWriterExceptions (:lucene:core)
  242.11s TestSimpleTextPostingsFormat (:lucene:codecs)
  219.85s TestStressNRTReplication (:lucene:replicator)
  207.11s TestBestSpeedLucene80DocValuesFormat (:lucene:backward-codecs)
  198.10s TestBestCompressionLucene80DocValuesFormat (:lucene:backward-codecs)
  183.90s TestLucene90DocValuesFormatMergeInstance (:lucene:core)
  163.38s TestLucene90DocValuesFormat (:lucene:core)
  140.44s TestSearcherTaxonomyManager (:lucene:facet)
  120.36s TestIndexWriterOnDiskFull (:lucene:core)
```
[GitHub] [lucene] zhaih commented on pull request #1068: LUCENE-10674: Update subiterators when BitSetConjDISI exhausts
zhaih commented on PR #1068: URL: https://github.com/apache/lucene/pull/1068#issuecomment-1238647657 Thanks, looks reasonable to me, please add an entry to CHANGES.txt!
[GitHub] [lucene] rmuir commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests
rmuir commented on PR #11749: URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238691660 Here are my numbers for comparison. I think we are OK here for nightly:
```
> Task :lucene:core:wipeTaskTemp
The slowest tests (exceeding 500 ms) during this run:
  2560.31s Test2BPostings.test (:lucene:core)
  1023.83s TestIndexWriterExceptions.testTooManyTokens (:lucene:core)
  764.84s TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom (:lucene:core)
  597.04s TestNRTThreads.testNRTThreads (:lucene:core)
  426.40s TestSimpleTextPostingsFormat.testDocsAndFreqsAndPositionsAndOffsetsAndPayloads (:lucene:codecs)
  379.06s TestSearcherTaxonomyManager.testDirectory (:lucene:facet)
  264.05s TestLucene90DocValuesFormat.testNumericFieldJumpTables (:lucene:core)
  237.30s TestIndexWriterOnDiskFull.testAddIndexOnDiskFull (:lucene:core)
  234.66s TestBestCompressionLucene80DocValuesFormat.testNumericFieldJumpTables (:lucene:backward-codecs)
  233.04s TestBestSpeedLucene80DocValuesFormat.testNumericFieldJumpTables (:lucene:backward-codecs)
The slowest suites (exceeding 1s) during this run:
  2560.41s Test2BPostings (:lucene:core)
  1049.11s TestIndexWriterExceptions (:lucene:core)
  952.28s TestSimpleTextPostingsFormat (:lucene:codecs)
  786.17s TestIndexWriterThreadsToSegments (:lucene:core)
  616.61s TestLucene90DocValuesFormat (:lucene:core)
  597.08s TestNRTThreads (:lucene:core)
  531.11s TestBestCompressionLucene80DocValuesFormat (:lucene:backward-codecs)
  514.22s TestBestSpeedLucene80DocValuesFormat (:lucene:backward-codecs)
  485.99s TestLucene90DocValuesFormatMergeInstance (:lucene:core)
  409.27s TestSearcherTaxonomyManager (:lucene:facet)
BUILD SUCCESSFUL in 1h 10m 52s
```
[GitHub] [lucene] rmuir closed issue #11459: Remove uses of wall-clock time in codebase [LUCENE-10423]
rmuir closed issue #11459: Remove uses of wall-clock time in codebase [LUCENE-10423] URL: https://github.com/apache/lucene/issues/11459
[GitHub] [lucene] rmuir merged pull request #11749: Remove usages of System.currentTimeMillis() from tests
rmuir merged PR #11749: URL: https://github.com/apache/lucene/pull/11749
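The review on this PR suggested using `nanoTime` (a monotonic clock) instead of the system clock when a test measures how long a step takes. A minimal sketch of that pattern in plain Java follows; `ElapsedTimeExample`, `timeMillis` and `sumUpTo` are hypothetical names for illustration, not code from the PR.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical example of timing work with a monotonic clock (not code from the PR). */
public class ElapsedTimeExample {
  /** Runs the task and returns the elapsed time in milliseconds. */
  static long timeMillis(Runnable task) {
    // System.nanoTime() is only meaningful as a difference between two readings
    // in the same JVM; it is not tied to wall-clock time or to any time zone.
    long start = System.nanoTime();
    task.run();
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  // Hypothetical stand-in for the work a test would time (e.g. indexing documents).
  static long sumUpTo(int n) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
      sum += i;
    }
    return sum;
  }

  public static void main(String[] args) {
    long[] result = new long[1];
    long took = timeMillis(() -> result[0] = sumUpTo(1_000_000));
    System.out.println("result=" + result[0] + ", took " + took + " ms");
  }
}
```

Unlike `Clock.systemDefaultZone().millis()`, this needs no `Clock` or time zone at all, and the measured interval cannot jump if the system clock is adjusted mid-run.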
[GitHub] [lucene] rmuir commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests
rmuir commented on PR #11749: URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238699562 Thank you for the work here on these tests @matriv !
[GitHub] [lucene] rmuir opened a new issue, #11755: TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull failure
rmuir opened a new issue, #11755: URL: https://github.com/apache/lucene/issues/11755

### Description

From jenkins:
```
> Task :lucene:core:test

org.apache.lucene.index.TestIndexWriterOnDiskFull > testAddDocumentOnDiskFull FAILED
    java.lang.IllegalStateException: this writer hit an unrecoverable error; cannot commit
        at __randomizedtesting.SeedInfo.seed([8E19B38FADF450D3:2018BF050618617]:0)
        at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:5441)
        at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3716)
        at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:4044)
        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:4006)
        at org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull(TestIndexWriterOnDiskFull.java:80)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
        at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at com.carrotsearch.randomizedtest
```
[GitHub] [lucene] mayya-sharipova commented on pull request #11743: LUCENE-10592 Better estimate memory for HNSW graph
mayya-sharipova commented on PR #11743: URL: https://github.com/apache/lucene/pull/11743#issuecomment-1238777470 @msokolov Thanks for your feedback and good ideas; we can experiment with them. We've discussed this within our team (including @jpountz and @jtibshirani) and decided that we would still like to proceed with this change (building the graph on indexing) for the 9.4 release, as we see the benefits outweighing the extra memory used. For example, we see a significant improvement in refresh time in our Elasticsearch benchmarks, which makes searches on updated data much faster and more predictable (image: https://user-images.githubusercontent.com/5738841/188762407-0ecaaa9b-71df-4290-83db-630bfd01e5bd.png).

For follow-ups (beyond the 9.4 release), we can experiment with:
- pre-flush flush (flushing a portion of vectors)
- using data with less precision than integers and floats for storing neighbours' info in the graph
- some other ideas?
[GitHub] [lucene-solr] risdenk opened a new pull request, #2669: SOLR-16324: Upgrade commons-configuration2 to 2.8.0 and commons-text to 1.9
risdenk opened a new pull request, #2669: URL: https://github.com/apache/lucene-solr/pull/2669 Backport of https://issues.apache.org/jira/browse/SOLR-16324 to branch_8_11
* Upgrade commons-configuration2 to 2.8.0 due to CVE-2022-33980
* Upgrade commons-text to 1.9 since it gets upgraded with commons-configuration2
[GitHub] [lucene] jtibshirani opened a new pull request, #11756: LUCENE-10577: Remove LeafReader#searchNearestVectorsExhaustively
jtibshirani opened a new pull request, #11756: URL: https://github.com/apache/lucene/pull/11756 This PR removes the recently added function on LeafReader to exhaustively search through vectors, plus the helper function KnnVectorsReader#searchExhaustively. Instead it performs the exact search within KnnVectorQuery, using a new helper class called VectorScorer. Follow-up to #1054
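For context on what an exact (exhaustive) vector search does: every candidate vector is scored against the query and only the k best are kept. A minimal sketch follows, assuming in-memory `float[]` vectors and dot-product similarity; `ExactKnnSketch`, `Hit` and `topK` are hypothetical names, and this is not the code of Lucene's `KnnVectorQuery` or `VectorScorer`.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical brute-force k-NN over in-memory vectors; not Lucene's implementation. */
public class ExactKnnSketch {
  /** A scored candidate: document id plus its similarity to the query. */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  // Dot-product similarity (an assumption here; a real field may use another function).
  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Scores every vector against the query; returns the ids of the k best, best first. */
  static int[] topK(float[][] vectors, float[] query, int k) {
    if (k <= 0) {
      return new int[0];
    }
    // Min-heap on score: the head is the weakest of the current top k.
    PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
    for (int doc = 0; doc < vectors.length; doc++) {
      float score = dotProduct(vectors[doc], query);
      if (heap.size() < k) {
        heap.add(new Hit(doc, score));
      } else if (score > heap.peek().score) {
        heap.poll(); // evict the weakest hit
        heap.add(new Hit(doc, score));
      }
    }
    // Drain the min-heap from the back so the best hit ends up first.
    int[] result = new int[heap.size()];
    for (int i = result.length - 1; i >= 0; i--) {
      result[i] = heap.poll().doc;
    }
    return result;
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 0f}, {0f, 1f}, {1f, 1f}};
    System.out.println(Arrays.toString(topK(vectors, new float[] {1f, 0.5f}, 2))); // prints [2, 0]
  }
}
```

The cost is linear in the number of candidate vectors, which is why an exact search is typically reserved for cases where the candidate set is small.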
[GitHub] [lucene] jtibshirani commented on pull request #11756: LUCENE-10577: Remove LeafReader#searchNearestVectorsExhaustively
jtibshirani commented on PR #11756: URL: https://github.com/apache/lucene/pull/11756#issuecomment-1238843697 This cuts down on the API surface area for LeafReader and results in less logic. I think what we tried to do initially was to push the exhaustive search down to the codec level, so that KnnVectorQuery doesn't need to know about the encoding. However, the VectorEncoding is already public, so it doesn't seem too bad for KnnVectorQuery to consider it (especially since it's abstracted away into VectorScorer). If we want to limit knowledge of the encoding outside each codec, I think we should focus on removing VectorValues#binaryValue altogether, which would require a different refactoring approach.
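To illustrate what "abstracted away into" a scorer can mean, here is a hypothetical sketch in which a single factory method branches on the encoding and callers only ever see a per-document score. The `Encoding` enum, `Scorer` interface and `create`/`bestDoc` methods are assumptions for illustration and do not reproduce Lucene's `VectorEncoding` or `VectorScorer` APIs.

```java
/** Hypothetical sketch: the encoding is only inspected inside create(). Not Lucene's API. */
public class VectorScorerSketch {
  /** Hypothetical stand-in for a per-field vector encoding setting. */
  enum Encoding { BYTE, FLOAT32 }

  /** What query code sees: a score per document, with the encoding hidden. */
  interface Scorer {
    float score(int doc);
  }

  /** The only place that branches on the encoding (dot-product similarity assumed). */
  static Scorer create(Encoding encoding, float[] query, byte[][] byteVectors, float[][] floatVectors) {
    switch (encoding) {
      case BYTE: {
        // Assumes the query components already fit in a byte; they are narrowed with a cast.
        byte[] q = new byte[query.length];
        for (int i = 0; i < query.length; i++) {
          q[i] = (byte) query[i];
        }
        return doc -> {
          int sum = 0;
          for (int i = 0; i < q.length; i++) {
            sum += byteVectors[doc][i] * q[i];
          }
          return sum;
        };
      }
      case FLOAT32:
        return doc -> {
          float sum = 0f;
          for (int i = 0; i < query.length; i++) {
            sum += floatVectors[doc][i] * query[i];
          }
          return sum;
        };
      default:
        throw new IllegalArgumentException("unknown encoding: " + encoding);
    }
  }

  /** Caller code: finds the best-scoring doc without knowing how vectors are encoded. */
  static int bestDoc(Scorer scorer, int numDocs) {
    int best = -1;
    float bestScore = Float.NEGATIVE_INFINITY;
    for (int doc = 0; doc < numDocs; doc++) {
      float s = scorer.score(doc);
      if (s > bestScore) {
        bestScore = s;
        best = doc;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    float[] query = {1f, 2f};
    byte[][] bytes = {{1, 0}, {0, 3}};
    float[][] floats = {{3f, 0f}, {0f, 1f}};
    System.out.println("BYTE best=" + bestDoc(create(Encoding.BYTE, query, bytes, floats), 2));
    System.out.println("FLOAT32 best=" + bestDoc(create(Encoding.FLOAT32, query, bytes, floats), 2));
  }
}
```

With this shape, `bestDoc` stays unchanged if another encoding is added; only `create` has to learn about it.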
[GitHub] [lucene] matriv commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests
matriv commented on PR #11749: URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238934086 Thx for reviewing and helping tune the iterations @rmuir !