[ https://issues.apache.org/jira/browse/LUCENE-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027600#comment-17027600 ]

Michael McCandless commented on LUCENE-9191:
--------------------------------------------

I think it's healthy to test on real world human created text – that is how people are using Lucene in the real world (sometimes).

But I agree the current impl is slow, and our randomized tests have likely exhausted the usefulness of the current text. And the fully randomly (and somewhat realistically) generated strings are an effectively infinite test space, though not that realistic.

Not sure how to make {{LineFileDocs}} faster :) Would deflate make it simple/fast to seek in the compressed bytes?

> Fix linefiledocs compression or replace in tests
> ------------------------------------------------
>
>                 Key: LUCENE-9191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9191
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Robert Muir
>            Priority: Major
>
> LineFileDocs(random) is very slow, even to open. It does a very slow "random
> skip" through a gzip compressed file.
> For the analyzers tests, in LUCENE-9186 I simply removed its usage, since
> TestUtil.randomAnalysisString is superior, and fast. But we should address
> other tests using it, since LineFileDocs(random) is slow!
> I think it is also the case that every lucene test has probably tested every
> LineFileDocs line many times now, whereas randomAnalysisString will invent
> new ones.
> Alternatively, we could "fix" LineFileDocs(random), e.g. special compression
> options (in blocks)... deflate supports such stuff. But it would make it even
> hairier than it is now.
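
For what it's worth, here is a rough sketch of what the "compression in blocks" idea from the issue description might look like. A single deflate stream cannot be entered at an arbitrary byte, but if each block of lines is deflated independently and the block start offsets are recorded, a random seek only needs to inflate one block. This is not how LineFileDocs works today: the class name `BlockDeflateSketch`, the `Packed` holder, and the methods `pack` and `readBlock` are all made up for illustration, and the sketch assumes lines contain no embedded newlines. It uses only `java.util.zip.Deflater` and `Inflater` from the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch of block-wise deflate with an offset index; not Lucene code. */
public class BlockDeflateSketch {

  /** Concatenated compressed blocks plus the index needed to seek to any block. */
  static final class Packed {
    final byte[] data;      // all compressed blocks, back to back
    final int[] offsets;    // offsets[i] = start of block i; offsets[blockCount] = data.length
    final int[] rawLengths; // uncompressed byte length of each block

    Packed(byte[] data, int[] offsets, int[] rawLengths) {
      this.data = data;
      this.offsets = offsets;
      this.rawLengths = rawLengths;
    }

    int blockCount() {
      return rawLengths.length;
    }
  }

  /** Deflates every group of linesPerBlock lines as its own independent stream. */
  static Packed pack(List<String> lines, int linesPerBlock) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    List<Integer> offsets = new ArrayList<>();
    List<Integer> rawLengths = new ArrayList<>();
    byte[] buf = new byte[8192];
    for (int start = 0; start < lines.size(); start += linesPerBlock) {
      int end = Math.min(start + linesPerBlock, lines.size());
      byte[] raw = String.join("\n", lines.subList(start, end)).getBytes(StandardCharsets.UTF_8);
      offsets.add(out.size());
      rawLengths.add(raw.length);
      Deflater deflater = new Deflater();
      try {
        deflater.setInput(raw);
        deflater.finish();
        while (!deflater.finished()) {
          int n = deflater.deflate(buf);
          out.write(buf, 0, n);
        }
      } finally {
        deflater.end(); // release native zlib memory
      }
    }
    offsets.add(out.size()); // sentinel so block i spans offsets[i]..offsets[i + 1]
    return new Packed(
        out.toByteArray(),
        offsets.stream().mapToInt(Integer::intValue).toArray(),
        rawLengths.stream().mapToInt(Integer::intValue).toArray());
  }

  /** Inflates only the requested block; no other block is touched. */
  static List<String> readBlock(Packed p, int block) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(p.data, p.offsets[block], p.offsets[block + 1] - p.offsets[block]);
      byte[] raw = new byte[p.rawLengths[block]];
      int read = 0;
      while (read < raw.length) {
        int n = inflater.inflate(raw, read, raw.length - read);
        if (n == 0) {
          throw new IllegalStateException("truncated or corrupt block " + block);
        }
        read += n;
      }
      return Arrays.asList(new String(raw, StandardCharsets.UTF_8).split("\n", -1));
    } catch (DataFormatException e) {
      throw new IllegalStateException("corrupt block " + block, e);
    } finally {
      inflater.end();
    }
  }

  public static void main(String[] args) {
    List<String> lines = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      lines.add("doc " + i + " some human created text");
    }
    Packed packed = pack(lines, 64);
    // The "random skip" becomes: pick a block, inflate just that block.
    Random random = new Random(42);
    int block = random.nextInt(packed.blockCount());
    System.out.println("block " + block + " starts with: " + readBlock(packed, block).get(0));
  }
}
```

The trade-off is the usual one: smaller blocks make the seek cheaper but compress worse, since each block starts with an empty dictionary. And as the description says, it would make the reader hairier than a plain gzip stream.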