[ https://issues.apache.org/jira/browse/LUCENE-9817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302410#comment-17302410 ]
Simon Willnauer commented on LUCENE-9817:
-----------------------------------------

Thanks Rob for taking the time to do all this analysis. I do wonder whether some tests should be @nightly only for N-2 indices, or whether each of these tests could take a random list of versions to test, so that run times stay reliable even as more versions are released?

> pathological test fixes
> -----------------------
>
>                 Key: LUCENE-9817
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9817
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Priority: Major
>         Attachments: LUCENE-9817.patch, LUCENE-9817.patch, LUCENE-9817.patch
>
>
> There are now 13,000+ tests in Lucene, and if you don't have dozens of cores
> the situation is slow (around 7 minutes here, with everything tuned as fast
> as I can get it, running on tmpfs).
> It is tricky to keep the situation sustainable: so many tests usually take
> just a few seconds each, but they all add up. To put it in perspective,
> imagine if all 13000 tests took only 1s each: that's roughly 3.6 hours of
> CPU time.
> From my inspection, there are a few cases of inefficiency:
> * tests with bad random parameters: they might normally be semi-well-behaved,
> but "rarely" take 30 seconds. That's maybe like a 1% chance, but keep in mind
> 1% equates to 130 wild-west tests every run.
> * tests spinning up too many threads and indexing too many docs
> unnecessarily: there might literally be thousands of these, so that's a hard
> problem to fix... and developers love to use lots of threads and docs in
> tests.
> * tests just being inefficient: stuff like creating indexes in setup/teardown
> when they have many methods that may not even use them (hey, why did
> testEqualsHashcode take 30 seconds, what is it doing?)
> I only worked on the first case here; if I fixed anything involving the other
> two, it was just because I noticed them while I was there. I temporarily
> overrode methods like LuceneTestCase.rarely(), atLeast(), and so on to
> present more pathological/worst-case conditions and tried to address them all.
> So here's a patch to give ~80 seconds of CPU time in tests back. YMMV, maybe
> it helps you more if you are actually using hard disks and stuff!
> Fixing the other issues here will require some more creativity/work; I will
> follow up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
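
The "random list of versions" idea from the comment could be sketched roughly as below. This is only an illustration under stated assumptions: `VersionSampler` and `pick` are hypothetical names that do not exist in Lucene, the version strings are illustrative values, and the `nightly` flag stands in for however a test would detect a nightly run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for illustration; not part of Lucene's test framework. */
final class VersionSampler {
  private VersionSampler() {}

  /**
   * Returns all versions when {@code nightly} is true. Otherwise returns at most
   * {@code maxVersions} versions drawn at random without replacement, so the cost
   * of a normal run stays bounded as more versions are released.
   */
  static List<String> pick(
      List<String> allVersions, int maxVersions, boolean nightly, Random random) {
    if (maxVersions < 0) {
      throw new IllegalArgumentException("maxVersions must be >= 0");
    }
    if (nightly || allVersions.size() <= maxVersions) {
      return new ArrayList<>(allVersions);
    }
    // Shuffle a copy so the caller's list is left untouched.
    List<String> shuffled = new ArrayList<>(allVersions);
    Collections.shuffle(shuffled, random);
    return new ArrayList<>(shuffled.subList(0, maxVersions));
  }

  public static void main(String[] args) {
    // Illustrative version strings only.
    List<String> versions = Arrays.asList("8.0.0", "8.1.0", "8.2.0", "8.3.0", "8.4.0");
    Random random = new Random(42L); // fixed seed so a given run can be repeated

    System.out.println("normal run:  " + pick(versions, 2, false, random));
    System.out.println("nightly run: " + pick(versions, 2, true, random));
  }
}
```

In a real test the `Random` would presumably come from the test framework's seeded source rather than a fixed constant, so that a failure on a particular subset of versions can be reproduced from the reported seed.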