[GitHub] [lucene] matriv commented on issue #11459: Remove uses of wall-clock time in codebase [LUCENE-10423]

2022-09-06 Thread GitBox


matriv commented on issue #11459:
URL: https://github.com/apache/lucene/issues/11459#issuecomment-1237784068

   @rmuir When you have time, please take a look at the PR and let me know,
   Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on a diff in pull request #11749: Remove usages of System.currentTimeMillis() from tests

2022-09-06 Thread GitBox


rmuir commented on code in PR #11749:
URL: https://github.com/apache/lucene/pull/11749#discussion_r963782724


##
lucene/classification/src/test/org/apache/lucene/classification/Test20NewsgroupsClassification.java:
##
@@ -123,13 +124,13 @@ public void test20Newsgroups() throws Exception {
 
 System.out.println("Indexing 20 Newsgroups...");
 
-long startIndex = System.currentTimeMillis();
+long startIndex = Clock.systemDefaultZone().millis();

Review Comment:
   Can we fix this test to use `nanoTime` too (monotonic clock), rather than 
the system clock? Currently, the `precommit` checks are angry about use of the 
default time zone, but I think we can avoid dealing with time zones completely 
here.
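
A minimal sketch of the monotonic-clock approach, assuming plain Java. The class `ElapsedTimeSketch` and its helper are hypothetical and not taken from the PR. Two `System.nanoTime()` readings are subtracted first and only then converted to milliseconds, so neither `Clock` nor a time zone is involved:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of the Lucene code base.
final class ElapsedTimeSketch {
  /** Converts two System.nanoTime() readings into elapsed milliseconds. */
  static long elapsedMillis(long startNanos, long endNanos) {
    // nanoTime values are only meaningful as differences, so subtract before converting.
    return TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
  }

  public static void main(String[] args) {
    long startIndex = System.nanoTime();
    // ... the work being timed (e.g. indexing) would go here ...
    long endIndex = System.nanoTime();
    System.out.println("Indexing took " + elapsedMillis(startIndex, endIndex) + " ms");
  }
}
```

The result is suitable for reporting durations only; it says nothing about the time of day.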






[GitHub] [lucene] mayya-sharipova commented on pull request #11743: LUCENE-10592 Better estimate memory for HNSW graph

2022-09-06 Thread GitBox


mayya-sharipova commented on PR #11743:
URL: https://github.com/apache/lucene/pull/11743#issuecomment-1238235494

   @msokolov 
   > Although ... the RAM needed for the graph was always required, even when 
building the graph during flush, it just wasn't accounted for I think. I 
suppose a possible way to improve the buffering situation would be to buffer 
the vectors in RAM and then on merge, write them out, freeing the on-heap copy, 
and while building the graph, access the vectors from disk
   
   Indeed, that is how we do it in Lucene 9.3, [using off-heap vector values to 
build the 
graph](https://github.com/apache/lucene/blob/branch_9_3/lucene/core/src/java/org/apache/lucene/codecs/lucene92/Lucene92HnswVectorsWriter.java#L148-L154)
   
   The problem with this is that building the graph on flush takes a lot of 
time, which makes searches that need to see the latest changes unpredictably 
slow.





[GitHub] [lucene] rmuir commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests

2022-09-06 Thread GitBox


rmuir commented on PR #11749:
URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238236097

   Thanks for doing this work, this change looks great!
   Avoiding wall-clock time should make tests more reproducible when they fail.
   Also, this change fixes tests that were configured to run for `3 seconds` or 
similar, so that they use a fixed number of iterations instead. This should 
REALLY help reproducibility, especially when one CPU is faster than another.
   
   I added a small comment; I think we can fix that one classification test to 
make the build happy here.
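
To illustrate the reproducibility point, here is a small sketch in plain Java. The class `FixedIterationSketch`, the seed, and the iteration count are all hypothetical. With a fixed iteration count, the same seed always performs the same amount of work, whereas a loop bounded by elapsed time would do more rounds on a faster CPU:

```java
import java.util.Random;

// Hypothetical example; the real tests in the PR use their own workloads and limits.
final class FixedIterationSketch {
  /** Runs exactly `iterations` rounds of seeded pseudo-random work and returns a checksum. */
  static long run(long seed, int iterations) {
    Random random = new Random(seed);
    long checksum = 0;
    // The loop bound is a count, not a deadline, so the work done is independent of CPU speed.
    for (int i = 0; i < iterations; i++) {
      checksum += random.nextInt(1000);
    }
    return checksum;
  }

  public static void main(String[] args) {
    // Two runs with the same seed and count always produce the same checksum.
    System.out.println(run(42L, 500) == run(42L, 500));
  }
}
```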





[GitHub] [lucene] matriv commented on a diff in pull request #11749: Remove usages of System.currentTimeMillis() from tests

2022-09-06 Thread GitBox


matriv commented on code in PR #11749:
URL: https://github.com/apache/lucene/pull/11749#discussion_r963785882


##
lucene/classification/src/test/org/apache/lucene/classification/Test20NewsgroupsClassification.java:
##
@@ -123,13 +124,13 @@ public void test20Newsgroups() throws Exception {
 
 System.out.println("Indexing 20 Newsgroups...");
 
-long startIndex = System.currentTimeMillis();
+long startIndex = Clock.systemDefaultZone().millis();

Review Comment:
   Oops, of course. It was an experimental commit that I messed up when 
preparing the PR, thx






[GitHub] [lucene] matriv commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests

2022-09-06 Thread GitBox


matriv commented on PR #11749:
URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238246924

   Thx @rmuir! Pushed the fix.
   
   Also, I forgot to mention that for every time-based loop I replaced with a 
counter-based one, I added a counter, ran the test on my machine multiple times, 
and tried to use an average value.





[GitHub] [lucene] mayya-sharipova commented on pull request #11743: LUCENE-10592 Better estimate memory for HNSW graph

2022-09-06 Thread GitBox


mayya-sharipova commented on PR #11743:
URL: https://github.com/apache/lucene/pull/11743#issuecomment-1238251543

   @jpountz  
   > could we buffer vectors on disk with the approach of building the graph 
during indexing?
   
   We explored this, but could not find any way to do that.  To build a graph, 
we need access to all vector values indexed so far.  If vectors are buffered 
in memory, this works.  But if we are to buffer vectors on disk, we can’t at 
the same time write vectors to this file and read vectors from it during the 
graph construction, as reading from unclosed index outputs is not possible.





[GitHub] [lucene] rmuir commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests

2022-09-06 Thread GitBox


rmuir commented on PR #11749:
URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238256051

   thanks @matriv for tuning the iterations. Much better than a random guess, 
which could lead to timeouts in CI servers, etc. I restarted the tests here.





[GitHub] [lucene] matriv commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests

2022-09-06 Thread GitBox


matriv commented on PR #11749:
URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238258581

   Mentioned, also, so that we can keep an eye on those tests, in case some 
turn out to be rather slow, maybe some further tuning of the max iterations is 
needed.





[GitHub] [lucene] rmuir opened a new issue, #11754: TestBoolean2.testRandomQueries fails in CI due to eating up heap space

2022-09-06 Thread GitBox


rmuir opened a new issue, #11754:
URL: https://github.com/apache/lucene/issues/11754

   ### Description
   
   Jenkins failure (sorry I don't have full build log, but the failure is 
reproducible, see below):
   
   ```
   [JENKINS] Lucene-main-Linux (64bit/jdk-17.0.3) - Build # 36812 - Unstable!
   Build: https://jenkins.thetaphi.de/job/Lucene-main-Linux/36812/
   Java: 64bit/jdk-17.0.3 -XX:-UseCompressedOops -XX:+UseParallelGC
   
   1 tests failed.
   FAILED: org.apache.lucene.search.TestBoolean2.testRandomQueries
   
   Error Message:
   java.lang.OutOfMemoryError: Java heap space
   
   Stack Trace:
   java.lang.OutOfMemoryError: Java heap space
   at __randomizedtesting.SeedInfo.seed([A0883CC08C1C22AB:FEA38C2CB4C60F35]:0)
   at org.apache.lucene.util.PriorityQueue.<init>(PriorityQueue.java:96)
   at org.apache.lucene.util.PriorityQueue.<init>(PriorityQueue.java:43)
   at 
org.apache.lucene.search.FieldValueHitQueue.<init>(FieldValueHitQueue.java:123)
   at 
org.apache.lucene.search.FieldValueHitQueue$OneComparatorFieldValueHitQueue.<init>(FieldValueHitQueue.java:59)
   at 
org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:159)
   at 
org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:454)
   at 
org.apache.lucene.search.TopFieldCollector$1.newCollector(TopFieldCollector.java:501)
   at 
org.apache.lucene.search.TopFieldCollector$1.newCollector(TopFieldCollector.java:493)
   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:669)
   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:656)
   at 
org.apache.lucene.search.TestBoolean2.testRandomQueries(TestBoolean2.java:406)
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
   at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
   at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.base/java.lang.reflect.Method.invoke(Method.java:568)
   at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
   at 
org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
   at 
org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   at 
org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
   at 
org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   at 
org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
   at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
   at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
   at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
   at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
   ```
   
   ### Gradle command to reproduce
   
   Reproduce with:
   
   `./gradlew -p lucene/core -Ptests.seed=A0883CC08C1C22AB 
-Ptests.heapsize=256m -Ptests.minheapsize=256m -Ptests.multiplier=3 test 
--tests TestBoolean2.testRandomQueries`
   
   The test is well-behaved without the `multiplier=3` so this may be related 
to the issue.





[GitHub] [lucene] rmuir commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests

2022-09-06 Thread GitBox


rmuir commented on PR #11749:
URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238285805

   > Mentioned, also, so that we can keep an eye on those tests, in case, some 
turn out to be rather slow, maybe some further tuning of the max iterations is 
needed.
   
   I see two suspicious ones in the build log after tests are run: 
`TestIDVersionPostingsFormat.testGlobalVersions`, and 
`TestNeverDelete.testIndexing`. Looks like we may want to tone these down, 20+ 
seconds is way too much, since we have thousands of tests. 
   
   ```
   The slowest tests (exceeding 500 ms) during this run:
 35.84s TestIDVersionPostingsFormat.testGlobalVersions (:lucene:sandbox)
 23.68s TestNeverDelete.testIndexing (:lucene:core)
 10.95s TestScripts.testLukeCanBeLaunched (:lucene:distribution.tests)
  6.94s TestSortedSetDocValuesFacets.testRandomHierarchicalFlatMix 
(:lucene:facet)
  6.52s TestTessellator.testComplexPolygon47 (:lucene:core)
  4.93s TestExitableDirectoryReader.testExitableTermsEnumSampleTimeoutCheck 
(:lucene:core)
  3.48s TestNRTThreads.testNRTThreads (:lucene:core)
  3.36s TestIndexAndTaxonomyReplicationClient.testConsistencyOnExceptions 
(:lucene:replicator)
  3.27s TestStressIndexing.testStressIndexAndSearching (:lucene:core)
  3.20s 
TestIndexWriterMergePolicy.testStressUpdateSameDocumentWithMergeOnGetReader 
(:lucene:core)
   The slowest suites (exceeding 1s) during this run:
 36.11s TestIDVersionPostingsFormat (:lucene:sandbox)
 23.73s TestNeverDelete (:lucene:core)
 10.96s TestScripts (:lucene:distribution.tests)
 10.45s TestLucene90DocValuesFormatMergeInstance (:lucene:core)
 10.38s TestBackwardsCompatibility (:lucene:backward-codecs)
  9.90s TestLucene90DocValuesFormat (:lucene:core)
  8.62s TestSimpleTextDocValuesFormat (:lucene:codecs)
  7.90s TestTessellator (:lucene:core)
  7.53s TestSortedSetDocValuesFacets (:lucene:facet)
  7.27s TestIndexWriterExceptions (:lucene:core)
   ```





[GitHub] [lucene] matriv commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests

2022-09-06 Thread GitBox


matriv commented on PR #11749:
URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238296959

   thx! pushed a toned down version, let's see. (my machine is quite powerful 
and they didn't show up in the slow tests)





[GitHub] [lucene] rmuir commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests

2022-09-06 Thread GitBox


rmuir commented on PR #11749:
URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238304029

   Thanks @matriv!
   
   We can inspect the log again for the next build. 
   
   Some of these tests use threads, so they will run a lot slower in CI or 
small computers like my 2-core machine.
   
   I can also trigger a "nightly" run on my computer with your branch before 
merging, to help prevent noise on the CI servers. Some of these tests turn 
monstrous in "nightly" mode. They have always been monsters; we just don't want 
them to run for hours or time out.





[GitHub] [lucene] msokolov commented on pull request #11743: LUCENE-10592 Better estimate memory for HNSW graph

2022-09-06 Thread GitBox


msokolov commented on PR #11743:
URL: https://github.com/apache/lucene/pull/11743#issuecomment-1238330802

   OK, thanks for the reminder of the arguments for moving the graph creation 
to index time.
   
   > May be, we can come up with some sophisticated solution, writing vector 
values in batches to several files, but not sure if this complexity worth it.
   
   Right, we could buffer up to some % of indexwriter buffer size in RAM, and 
then write to a (list of) temporary file(s), freeing RAM and thenceforth 
accumulating new writes in RAM. Kind of like a pre-flush flush? Reading would 
require a wrapper that presents this all as a single VectorValues. It is more 
complex, but seems like it could be worthwhile since it will help reduce the 
pressure on the index writer to flush "prematurely," and this HNSW stuff is 
sensitive to being fragmented.
   
   The current situation is not terrible; eventually, merging should improve 
the index geometry. I don't think we have a blocker to release. At any rate for 
typical use cases I have in mind, the index size is still dominated by other 
types of fields and this is unlikely to be a problem. Although for a 
vectors-only index it looks worse, I think that exaggerates the typical impact? 
Not sure how it looks from other perspectives, though.
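
A rough sketch of the "pre-flush flush" idea in plain Java, under stated assumptions: the class `SpillingVectorBufferSketch` is hypothetical, it uses `java.nio` temp files rather than Lucene's `Directory`/`IndexOutput` APIs, it spills after a fixed number of vectors rather than a percentage of the IndexWriter buffer, and every vector is assumed to have exactly `dim` floats. Each batch file is fully written and closed before it is reopened for reading, which sidesteps the unclosed-output restriction, and `get` presents all batches plus the on-heap tail as one ordinal-addressed view:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical buffer, not a Lucene class: recent vectors stay on heap, full batches go to temp files. */
final class SpillingVectorBufferSketch implements AutoCloseable {
  private final int dim; // floats per vector (assumed fixed)
  private final int batchSize; // vectors held in RAM before a spill
  private final List<FileChannel> spilled = new ArrayList<>(); // one read-only channel per finished batch
  private final List<float[]> inRam = new ArrayList<>();

  SpillingVectorBufferSketch(int dim, int batchSize) {
    this.dim = dim;
    this.batchSize = batchSize;
  }

  void add(float[] vector) {
    inRam.add(vector.clone());
    if (inRam.size() == batchSize) {
      spill();
    }
  }

  /** Writes the in-RAM batch to a new temp file, closes it, and reopens it for reading only. */
  private void spill() {
    ByteBuffer bytes = ByteBuffer.allocate(batchSize * dim * Float.BYTES);
    for (float[] vector : inRam) {
      for (float value : vector) {
        bytes.putFloat(value);
      }
    }
    try {
      Path file = Files.createTempFile("vectors", ".tmp");
      Files.write(file, bytes.array()); // the file is complete and closed after this call
      spilled.add(
          FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.DELETE_ON_CLOSE));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    inRam.clear(); // drops the on-heap copies of the spilled batch
  }

  int size() {
    return spilled.size() * batchSize + inRam.size();
  }

  /** Single view over all batches: maps an ordinal to its vector, on disk or on heap. */
  float[] get(int ord) {
    int batch = ord / batchSize;
    int offset = ord % batchSize;
    if (batch == spilled.size()) {
      return inRam.get(offset).clone(); // still buffered on heap
    }
    ByteBuffer bytes = ByteBuffer.allocate(dim * Float.BYTES);
    long start = (long) offset * dim * Float.BYTES;
    try {
      FileChannel channel = spilled.get(batch);
      while (bytes.hasRemaining()) {
        if (channel.read(bytes, start + bytes.position()) < 0) {
          throw new IOException("unexpected end of spill file");
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    bytes.flip();
    float[] vector = new float[dim];
    bytes.asFloatBuffer().get(vector);
    return vector;
  }

  @Override
  public void close() {
    try {
      for (FileChannel channel : spilled) {
        channel.close(); // DELETE_ON_CLOSE removes the temp file
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The extra complexity is visible even at this size: every read has to decide which batch an ordinal belongs to, and the temp files need explicit cleanup.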





[GitHub] [lucene] rmuir commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests

2022-09-06 Thread GitBox


rmuir commented on PR #11749:
URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238331021

   This build looks good:
   ```
   The slowest tests (exceeding 500 ms) during this run:
  6.93s TestStressIndexing.testStressIndexAndSearching (:lucene:core)
  6.84s TestTessellator.testComplexPolygon47 (:lucene:core)
  6.52s TestNRTThreads.testNRTThreads (:lucene:core)
  5.74s TestScripts.testLukeCanBeLaunched (:lucene:distribution.tests)
  4.92s TestExitableDirectoryReader.testExitableTermsEnumSampleTimeoutCheck 
(:lucene:core)
  4.74s TestIDVersionPostingsFormat.testGlobalVersions (:lucene:sandbox)
  4.51s TestTransactions.testTransactions (:lucene:core)
  3.41s TestTieredMergePolicy.testSimulateAppendOnly (:lucene:core)
  3.39s TestBackwardsCompatibility.testUnsupportedOldIndexes 
(:lucene:backward-codecs)
  3.13s TestIndexWriterExceptions.testRandomExceptions (:lucene:core)
   The slowest suites (exceeding 1s) during this run:
 12.66s TestBackwardsCompatibility (:lucene:backward-codecs)
 10.81s TestLucene90DocValuesFormat (:lucene:core)
 10.32s TestLucene90DocValuesFormatMergeInstance (:lucene:core)
  9.56s TestSimpleTextDocValuesFormat (:lucene:codecs)
  8.13s TestTessellator (:lucene:core)
  7.98s TestIndexWriterExceptions (:lucene:core)
  7.52s TestPerFieldDocValuesFormat (:lucene:core)
  7.10s TestIndexWriter (:lucene:core)
  6.95s TestStressIndexing (:lucene:core)
  6.66s TestNRTThreads (:lucene:core)
   ```
   
   I started a nightly run on my computer with: `./gradlew -Dtests.nightly=true 
test`. It may take a bit, but I'll post the slowest-tests-report when it's done.





[GitHub] [lucene] jpountz commented on issue #11696: precompute the max level in LogMergePolicy [LUCENE-10660]

2022-09-06 Thread GitBox


jpountz commented on issue #11696:
URL: https://github.com/apache/lucene/issues/11696#issuecomment-1238341620

   I just checked, it's here: 
https://github.com/apache/lucene/commit/3f6dbe8b55eb48066a800f706c371fd6f56afab0





[GitHub] [lucene] gsmiller commented on issue #11547: IntersectIterators is not necessary under matchAll case in Facet [LUCENE-10511]

2022-09-06 Thread GitBox


gsmiller commented on issue #11547:
URL: https://github.com/apache/lucene/issues/11547#issuecomment-1238537126

   Interesting observation @LuXugang. We could only "know" ahead of time that 
multiple iterators are "the same" though when they're effectively "all" docs in 
a segment right? I can't really think of a way to generalize beyond that, but 
maybe I'm overlooking something? I'd be +1 to seeing if we can add some 
intelligence to `ConjunctionUtils` to handle the case where one of the 
iterators covers all docs in a segment though. Sounds like what you had in mind?
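
A simplified sketch of that idea in plain Java. It deliberately does not use Lucene's `DocIdSetIterator` or `ConjunctionUtils`; the class `MatchAllConjunctionSketch` is hypothetical, and each "iterator" is modelled as a sorted, duplicate-free array of doc IDs in `[0, maxDoc)`. Under that assumption, an array of length `maxDoc` necessarily contains every doc in the segment, so it can be dropped from the conjunction without changing the result:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; real Lucene iterators expose only an estimated cost, not an exact size.
final class MatchAllConjunctionSketch {
  /** Intersects sorted, duplicate-free doc ID arrays drawn from [0, maxDoc). */
  static int[] intersect(List<int[]> iterators, int maxDoc) {
    List<int[]> selective = new ArrayList<>();
    for (int[] docs : iterators) {
      // An array covering all maxDoc docs cannot narrow the result, so skip it.
      if (docs.length != maxDoc) {
        selective.add(docs);
      }
    }
    if (selective.isEmpty()) {
      // Every input matched all docs (or there were no inputs): the result is all docs.
      int[] all = new int[maxDoc];
      for (int doc = 0; doc < maxDoc; doc++) {
        all[doc] = doc;
      }
      return all;
    }
    int[] result = selective.get(0);
    for (int i = 1; i < selective.size(); i++) {
      result = intersectTwo(result, selective.get(i));
    }
    return result;
  }

  /** Classic two-pointer merge intersection of two sorted arrays. */
  private static int[] intersectTwo(int[] a, int[] b) {
    int[] out = new int[Math.min(a.length, b.length)];
    int i = 0, j = 0, n = 0;
    while (i < a.length && j < b.length) {
      if (a[i] == b[j]) {
        out[n++] = a[i];
        i++;
        j++;
      } else if (a[i] < b[j]) {
        i++;
      } else {
        j++;
      }
    }
    return Arrays.copyOf(out, n);
  }
}
```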





[GitHub] [lucene] rmuir commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests

2022-09-06 Thread GitBox


rmuir commented on PR #11749:
URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238540078

   The `-Dtests.nightly` run has some tests that time out (there's a 7200 second 
hard limit to a test class) or take hours. At a glance, most of the trouble 
seems to be subclasses of the `ThreadedIndexingAndSearchingTestCase`:
   * TestNRTThreads
   * TestControlledRealTimeReopenThread
   * TestSearcherManager
   Also see a ton of time spent in:
   * TestIDVersionPostingsFormat
   
   This list may not be exhaustive, due to timeouts/failures... the build is 
still running after many hours.
   
   Let's tweak the nightly parameters for these tests? I think it's OK if they 
do at most something like 10x the normal iterations. But we don't want them 
taking hours in CI, or when we're smoketesting a release (the release 
smoketester uses `-Dtests.nightly`)
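
One way to express the "at most 10x" cap, as a plain-Java sketch. The class `NightlyIterationsSketch` and the base count of 1000 are hypothetical; only the `tests.nightly` system property comes from the thread, and Lucene's own test framework has its own helpers for this:

```java
// Hypothetical helper for illustration; not the code merged in the PR.
final class NightlyIterationsSketch {
  /** Upper bound on how much more work a nightly run does than a normal run. */
  static final int MAX_NIGHTLY_FACTOR = 10;

  /** Returns the iteration count to use, scaling the base count only for nightly runs. */
  static int iterations(int base, boolean nightly) {
    // multiplyExact fails loudly instead of silently overflowing for very large base counts.
    return nightly ? Math.multiplyExact(base, MAX_NIGHTLY_FACTOR) : base;
  }

  public static void main(String[] args) {
    // Boolean.getBoolean is true only when -Dtests.nightly=true was passed to the JVM.
    boolean nightly = Boolean.getBoolean("tests.nightly");
    System.out.println("iterations = " + iterations(1000, nightly));
  }
}
```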
   





[GitHub] [lucene] matriv commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests

2022-09-06 Thread GitBox


matriv commented on PR #11749:
URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238583388

   Thank you, I'm also running it locally with `-Dtests.nightly=true` and I'm 
going to significantly decrease those limits.
   i.e. 3 -> 1000 or even less.
   Once I have something that looks OK locally, I will push the changes and ask 
you to check as well, please.





[GitHub] [lucene] matriv commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests

2022-09-06 Thread GitBox


matriv commented on PR #11749:
URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238611322

   With the last commit, locally I get:
   ```
   :lucene:core:test (SUCCESS): 5584 test(s), 60 skipped
   The slowest tests (exceeding 500 ms) during this run:
 865.31s Test2BPostings.test (:lucene:core)
 340.06s TestIndexWriterExceptions.testTooManyTokens (:lucene:core)
 219.59s TestStressNRTReplication.test (:lucene:replicator)
 127.04s TestSearcherTaxonomyManager.testDirectory (:lucene:facet)
 120.10s TestIndexWriterOnDiskFull.testAddIndexOnDiskFull (:lucene:core)
 118.26s 
TestSimpleTextPostingsFormat.testDocsAndFreqsAndPositionsAndOffsetsAndPayloads 
(:lucene:codecs)
 117.13s TestStringValueFacetCounts.testRandom (:lucene:facet)
 103.57s TestBestSpeedLucene80DocValuesFormat.testNumericFieldJumpTables 
(:lucene:backward-codecs)
 102.00s TestNRTThreads.testNRTThreads (:lucene:core)
 96.18s 
TestBestCompressionLucene80DocValuesFormat.testNumericFieldJumpTables 
(:lucene:backward-codecs)
   The slowest suites (exceeding 1s) during this run:
 865.34s Test2BPostings (:lucene:core)
 380.84s TestIndexWriterExceptions (:lucene:core)
 242.11s TestSimpleTextPostingsFormat (:lucene:codecs)
 219.85s TestStressNRTReplication (:lucene:replicator)
 207.11s TestBestSpeedLucene80DocValuesFormat (:lucene:backward-codecs)
 198.10s TestBestCompressionLucene80DocValuesFormat 
(:lucene:backward-codecs)
 183.90s TestLucene90DocValuesFormatMergeInstance (:lucene:core)
 163.38s TestLucene90DocValuesFormat (:lucene:core)
 140.44s TestSearcherTaxonomyManager (:lucene:facet)
 120.36s TestIndexWriterOnDiskFull (:lucene:core)
   ```





[GitHub] [lucene] zhaih commented on pull request #1068: LUCENE-10674: Update subiterators when BitSetConjDISI exhausts

2022-09-06 Thread GitBox


zhaih commented on PR #1068:
URL: https://github.com/apache/lucene/pull/1068#issuecomment-1238647657

   Thanks, looks reasonable to me, please add an entry to CHANGES.txt!





[GitHub] [lucene] rmuir commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests

2022-09-06 Thread GitBox


rmuir commented on PR #11749:
URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238691660

   Here are my numbers for comparison. I think we are OK here for nightly:
   ```
   > Task :lucene:core:wipeTaskTemp
   The slowest tests (exceeding 500 ms) during this run:
   2560.31s Test2BPostings.test (:lucene:core)
   1023.83s TestIndexWriterExceptions.testTooManyTokens (:lucene:core)
    764.84s TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom (:lucene:core)
    597.04s TestNRTThreads.testNRTThreads (:lucene:core)
    426.40s TestSimpleTextPostingsFormat.testDocsAndFreqsAndPositionsAndOffsetsAndPayloads (:lucene:codecs)
    379.06s TestSearcherTaxonomyManager.testDirectory (:lucene:facet)
    264.05s TestLucene90DocValuesFormat.testNumericFieldJumpTables (:lucene:core)
    237.30s TestIndexWriterOnDiskFull.testAddIndexOnDiskFull (:lucene:core)
    234.66s TestBestCompressionLucene80DocValuesFormat.testNumericFieldJumpTables (:lucene:backward-codecs)
    233.04s TestBestSpeedLucene80DocValuesFormat.testNumericFieldJumpTables (:lucene:backward-codecs)
   The slowest suites (exceeding 1s) during this run:
   2560.41s Test2BPostings (:lucene:core)
   1049.11s TestIndexWriterExceptions (:lucene:core)
    952.28s TestSimpleTextPostingsFormat (:lucene:codecs)
    786.17s TestIndexWriterThreadsToSegments (:lucene:core)
    616.61s TestLucene90DocValuesFormat (:lucene:core)
    597.08s TestNRTThreads (:lucene:core)
    531.11s TestBestCompressionLucene80DocValuesFormat (:lucene:backward-codecs)
    514.22s TestBestSpeedLucene80DocValuesFormat (:lucene:backward-codecs)
    485.99s TestLucene90DocValuesFormatMergeInstance (:lucene:core)
    409.27s TestSearcherTaxonomyManager (:lucene:facet)
   
   BUILD SUCCESSFUL in 1h 10m 52s
   ```





[GitHub] [lucene] rmuir closed issue #11459: Remove uses of wall-clock time in codebase [LUCENE-10423]

2022-09-06 Thread GitBox


rmuir closed issue #11459: Remove uses of wall-clock time in codebase [LUCENE-10423]
URL: https://github.com/apache/lucene/issues/11459





[GitHub] [lucene] rmuir merged pull request #11749: Remove usages of System.currentTimeMillis() from tests

2022-09-06 Thread GitBox


rmuir merged PR #11749:
URL: https://github.com/apache/lucene/pull/11749





[GitHub] [lucene] rmuir commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests

2022-09-06 Thread GitBox


rmuir commented on PR #11749:
URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238699562

   Thank you for the work here on these tests @matriv !





[GitHub] [lucene] rmuir opened a new issue, #11755: TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull failure

2022-09-06 Thread GitBox


rmuir opened a new issue, #11755:
URL: https://github.com/apache/lucene/issues/11755

   ### Description
   
   From jenkins:
   ```
   > Task :lucene:core:test
   
   org.apache.lucene.index.TestIndexWriterOnDiskFull > testAddDocumentOnDiskFull FAILED
   java.lang.IllegalStateException: this writer hit an unrecoverable error; cannot commit
   at __randomizedtesting.SeedInfo.seed([8E19B38FADF450D3:2018BF050618617]:0)
   at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:5441)
   at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3716)
   at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:4044)
   at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:4006)
   at org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull(TestIndexWriterOnDiskFull.java:80)
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.base/java.lang.reflect.Method.invoke(Method.java:568)
   at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
   at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
   at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
   at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
   at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
   at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
   at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
   at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
   at com.carrotsearch.randomizedtest

[GitHub] [lucene] mayya-sharipova commented on pull request #11743: LUCENE-10592 Better estimate memory for HNSW graph

2022-09-06 Thread GitBox


mayya-sharipova commented on PR #11743:
URL: https://github.com/apache/lucene/pull/11743#issuecomment-1238777470

   @msokolov Thanks for your feedback and the good ideas; we can experiment with them.
   
   We've discussed this within our team (including @jpountz and @jtibshirani) and decided that we would still like to proceed with this change (building the graph during indexing) for the 9.4 release, as we see the benefits outweighing the extra memory used. For example, we see a significant improvement in refresh time in our Elasticsearch benchmarks, which makes searches on updated data much faster and more predictable:
   
   https://user-images.githubusercontent.com/5738841/188762407-0ecaaa9b-71df-4290-83db-630bfd01e5bd.png
   
   For follow-ups (beyond the 9.4 release), we can experiment with:
   - pre-flush flush (flushing a portion of vectors)
   - using data with less precision than integers and floats for storing neighbours' info in the graph
   - some other ideas?
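
As a rough illustration of the second idea, the arithmetic below compares the space taken by neighbour ids at two widths. This is only a back-of-the-envelope sketch: the class and method names are hypothetical, the figures (one million nodes, 32 neighbours per node, a single graph level, 24-bit packed ids) are assumptions chosen for the example, and none of it reflects how Lucene actually accounts for HNSW memory.

```java
// Hypothetical estimate for illustration; not Lucene's memory accounting.
final class NeighbourMemorySketch {

  /** Bytes needed to store neighbour ids for a single-level graph (object overhead ignored). */
  static long neighbourBytes(long numNodes, int maxConn, int bytesPerId) {
    // multiplyExact throws ArithmeticException on overflow instead of wrapping silently.
    return Math.multiplyExact(Math.multiplyExact(numNodes, (long) maxConn), (long) bytesPerId);
  }

  public static void main(String[] args) {
    long nodes = 1_000_000L; // assumed number of vectors
    int maxConn = 32;        // assumed neighbours per node
    System.out.println("4-byte ids: " + neighbourBytes(nodes, maxConn, Integer.BYTES) + " bytes");
    // 3 bytes per id assumes ids are packed into 24 bits, enough for about 16.7M nodes.
    System.out.println("3-byte ids: " + neighbourBytes(nodes, maxConn, 3) + " bytes");
  }
}
```

Under these assumptions the narrower ids save a quarter of the neighbour storage; whether that is worth the packing and unpacking cost would need to be measured.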
   
   
   





[GitHub] [lucene-solr] risdenk opened a new pull request, #2669: SOLR-16324: Upgrade commons-configuration2 to 2.8.0 and commons-text to 1.9

2022-09-06 Thread GitBox


risdenk opened a new pull request, #2669:
URL: https://github.com/apache/lucene-solr/pull/2669

   Backport of https://issues.apache.org/jira/browse/SOLR-16324 to branch_8_11
   
   * Upgrade commons-configuration2 to 2.8.0 due to CVE-2022-33980
   * Upgrade commons-text to 1.9 since it gets upgraded with commons-configuration2





[GitHub] [lucene] jtibshirani opened a new pull request, #11756: LUCENE-10577: Remove LeafReader#searchNearestVectorsExhaustively

2022-09-06 Thread GitBox


jtibshirani opened a new pull request, #11756:
URL: https://github.com/apache/lucene/pull/11756

   This PR removes the recently added function on LeafReader to exhaustively
   search through vectors, plus the helper function
   KnnVectorsReader#searchExhaustively. Instead it performs the exact search
   within KnnVectorQuery, using a new helper class called VectorScorer.
   
   Follow-up to #1054
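
The exact-search approach described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `VectorScorer` interface here is a hypothetical stand-in with an assumed `score(int doc)` method, and `ScoredDoc`, `topK`, and `dot` are names invented for the example. It scores every candidate document and keeps the best `k` in a bounded min-heap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch; names and signatures are illustrative only.
final class ExactSearchSketch {

  /** Scores one candidate document against a fixed query vector (assumed shape). */
  interface VectorScorer {
    float score(int doc);
  }

  /** A document id paired with its score. */
  static final class ScoredDoc {
    final int doc;
    final float score;

    ScoredDoc(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Returns the k best-scoring candidates, highest score first. */
  static List<ScoredDoc> topK(int[] candidates, VectorScorer scorer, int k) {
    // Min-heap on score, so the weakest of the current top k is evicted first.
    PriorityQueue<ScoredDoc> heap =
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    for (int doc : candidates) {
      heap.offer(new ScoredDoc(doc, scorer.score(doc)));
      if (heap.size() > k) {
        heap.poll();
      }
    }
    List<ScoredDoc> result = new ArrayList<>(heap);
    result.sort((a, b) -> Float.compare(b.score, a.score));
    return result;
  }

  /** Dot product, used here as an example similarity. */
  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 0f}, {0f, 1f}, {0.5f, 0.5f}};
    float[] query = {1f, 0f};
    for (ScoredDoc d : topK(new int[] {0, 1, 2}, doc -> dot(vectors[doc], query), 2)) {
      System.out.println("doc=" + d.doc + " score=" + d.score);
    }
  }
}
```

Keeping the scoring behind a small interface is what lets the query code stay unaware of how the vectors are stored.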





[GitHub] [lucene] jtibshirani commented on pull request #11756: LUCENE-10577: Remove LeafReader#searchNearestVectorsExhaustively

2022-09-06 Thread GitBox


jtibshirani commented on PR #11756:
URL: https://github.com/apache/lucene/pull/11756#issuecomment-1238843697

   This cuts down on the API surface area for LeafReader and results in less logic. I think what we tried to do initially was to push the exhaustive search down to the codec level, so that KnnVectorQuery doesn't need to know about the encoding. However, the VectorEncoding is already public, so it doesn't seem too bad for KnnVectorQuery to consider it (especially since it's abstracted away into VectorScorer).
   
   If we want to limit knowledge of the encoding outside each codec, I think we should focus on removing VectorValues#binaryValue altogether, which would require a different refactoring approach.
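
To illustrate what "abstracted away into VectorScorer" could look like, here is a small sketch in which a factory inspects the encoding once and hands back a scorer, so the caller never branches on it again. Everything here is hypothetical: the `Encoding` enum is a stand-in for the public `VectorEncoding` (its two values are assumed), and `StoredVectors`, `Scorer`, and `create` are invented for the example rather than taken from the PR.

```java
// Hypothetical sketch; these types are stand-ins, not Lucene's classes.
final class EncodingScorerSketch {

  /** Stand-in for the vector encoding; the two values are assumptions. */
  enum Encoding { BYTE, FLOAT32 }

  /** The only type the calling query needs to see. */
  interface Scorer {
    float score(int doc);
  }

  /** Stand-in for per-field vector storage; exactly one array is non-null. */
  static final class StoredVectors {
    final Encoding encoding;
    final byte[][] bytes;
    final float[][] floats;

    StoredVectors(byte[][] bytes) {
      this.encoding = Encoding.BYTE;
      this.bytes = bytes;
      this.floats = null;
    }

    StoredVectors(float[][] floats) {
      this.encoding = Encoding.FLOAT32;
      this.bytes = null;
      this.floats = floats;
    }
  }

  /** Picks the scoring path once, based on the encoding. */
  static Scorer create(StoredVectors stored, float[] query) {
    switch (stored.encoding) {
      case BYTE:
        // Assumes the query components are already integral values in [-128, 127].
        byte[] q = new byte[query.length];
        for (int i = 0; i < q.length; i++) {
          q[i] = (byte) query[i];
        }
        return doc -> dot(stored.bytes[doc], q);
      case FLOAT32:
        return doc -> dot(stored.floats[doc], query);
      default:
        throw new IllegalStateException("unknown encoding: " + stored.encoding);
    }
  }

  static float dot(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```

The trade-off raised in the comment still applies: the encoding is visible to the factory, just not to the code that consumes the scorer.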





[GitHub] [lucene] matriv commented on pull request #11749: Remove usages of System.currentTimeMillis() from tests

2022-09-06 Thread GitBox


matriv commented on PR #11749:
URL: https://github.com/apache/lucene/pull/11749#issuecomment-1238934086

   Thanks for reviewing and helping to tune the iterations @rmuir !

