[ https://issues.apache.org/jira/browse/LUCENE-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027934#comment-17027934 ]
Robert Muir commented on LUCENE-9191:
-------------------------------------

[~krisden] the class tries to deliver simple documents, one per line (hence LineFileDocs). The constructor tries to start you at a random position in the documents. Today, to accomplish that, it opens the stream and does a lot of scanning (brute-force forward decompression), because the gzip file is not set up for seeking. So it ends up decompressing, at worst, 15MB of UTF-8 text (probably more in Java heap, depending on how you hold it) every single time a new LineFileDocs is instantiated, and it burns a fair amount of CPU to get there too.

This cost makes tests show up on Dawid's "shitlist" at the end of the gradle build, because they run for more than 1s for no good reason. So this is very inefficient and expensive for individual test methods to be doing. There are about 11K tests according to https://people.apache.org/~mikemccand/lucenebench/antcleantest.html . If each test takes even only 1s, you have an hours-long test suite (11,000 s is roughly three hours if run serially).

> Fix linefiledocs compression or replace in tests
> ------------------------------------------------
>
>                 Key: LUCENE-9191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9191
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Robert Muir
>            Priority: Major
>
> LineFileDocs(random) is very slow, even to open. It does a very slow "random
> skip" through a gzip compressed file.
> For the analyzers tests, in LUCENE-9186 I simply removed its usage, since
> TestUtil.randomAnalysisString is superior, and fast. But we should address
> other tests using it, since LineFileDocs(random) is slow!
> I think it is also the case that every lucene test has probably tested every
> LineFileDocs line many times now, whereas randomAnalysisString will invent
> new ones.
> Alternatively, we could "fix" LineFileDocs(random), e.g. special compression
> options (in blocks)... deflate supports such stuff. But it would make it even
> hairier than it is now.
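
To illustrate the "compression in blocks" alternative mentioned in the issue description, here is a minimal sketch using only java.util.zip. It is not the LineFileDocs implementation and not a proposed patch: the class BlockGzipSketch, its Blocks holder, and the linesPerBlock parameter are all hypothetical names invented for this example. The idea shown is that each block is written as an independent gzip member and its start offset is recorded, so a reader can jump to a random block and decompress only that block rather than everything before it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch of block-wise gzip with an offset index; not Lucene code. */
class BlockGzipSketch {

  /** Compressed bytes plus the start offset of every block (last entry = total length). */
  static final class Blocks {
    final byte[] data;
    final int[] offsets;

    Blocks(byte[] data, int[] offsets) {
      this.data = data;
      this.offsets = offsets;
    }
  }

  /** Compresses the lines in groups of linesPerBlock, one independent gzip member per group. */
  static Blocks compress(List<String> lines, int linesPerBlock) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    List<Integer> starts = new ArrayList<>();
    for (int i = 0; i < lines.size(); i += linesPerBlock) {
      starts.add(out.size()); // remember where this block begins
      int end = Math.min(i + linesPerBlock, lines.size());
      // Closing the gzip stream ends this member; closing a
      // ByteArrayOutputStream is a no-op, so 'out' stays usable.
      try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
        for (String line : lines.subList(i, end)) {
          gz.write((line + "\n").getBytes(StandardCharsets.UTF_8));
        }
      }
    }
    starts.add(out.size()); // sentinel: end of the last block
    int[] offsets = starts.stream().mapToInt(Integer::intValue).toArray();
    return new Blocks(out.toByteArray(), offsets);
  }

  /** Decompresses a single block; nothing before it is touched. */
  static List<String> readBlock(Blocks b, int block) throws IOException {
    int start = b.offsets[block];
    int length = b.offsets[block + 1] - start;
    try (GZIPInputStream in =
        new GZIPInputStream(new ByteArrayInputStream(b.data, start, length))) {
      String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
      return Arrays.asList(text.split("\n"));
    }
  }
}
```

With something like this, a random starting point would cost one block of decompression (pick a random index below offsets.length - 1 and call readBlock) instead of a forward scan through the stream. The trade-off is a slightly worse compression ratio, since each block starts with an empty dictionary, plus the extra index to maintain, which is the added "hairiness" the description warns about.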