[ https://issues.apache.org/jira/browse/LUCENE-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027923#comment-17027923 ]
Kevin Risden commented on LUCENE-9191:
--------------------------------------

So I don't know a ton about LineFileDocs, or the reason for trying to seek inside a gzip. However, just reading this, it seems like it might be possible to read it all upfront? Wouldn't it be possible to read the whole gzip into a byte array or something like that?

It would use more memory, but it would be quick, since it wouldn't have to do the slow skip through the compressed stream upfront. You would have to do some checking to make sure you don't read a file that is TOO big (could fall back to slow mode for a big file). If I understand correctly, though, this is a small file, so reading it into memory would make it quick to seek through.

If I am completely off my rocker here, feel free to just ignore me.

> Fix linefiledocs compression or replace in tests
> ------------------------------------------------
>
>                 Key: LUCENE-9191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9191
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Robert Muir
>            Priority: Major
>
> LineFileDocs(random) is very slow, even to open. It does a very slow "random
> skip" through a gzip compressed file.
> For the analyzers tests, in LUCENE-9186 I simply removed its usage, since
> TestUtil.randomAnalysisString is superior, and fast. But we should address
> other tests using it, since LineFileDocs(random) is slow!
> I think it is also the case that every lucene test has probably tested every
> LineFileDocs line many times now, whereas randomAnalysisString will invent
> new ones.
> Alternatively, we could "fix" LineFileDocs(random), e.g. special compression
> options (in blocks)... deflate supports such stuff. But it would make it even
> hairier than it is now.
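
To make the read-it-all-upfront suggestion from the comment concrete, here is a rough sketch in Java. It is not how LineFileDocs is implemented. It decompresses a gzip stream into a byte array, gives up if the decompressed data exceeds a cap (so the caller can fall back to the slow path), and then picks a line by jumping to a random offset in memory. The class name InMemoryLineFile, the methods tryLoad and randomLine, and the maxBytes parameter are all made up for illustration; only the java.io and java.util.zip calls are real JDK API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of Lucene: holds a fully decompressed line file.
class InMemoryLineFile {
  private final byte[] data; // decompressed contents, lines separated by '\n'

  private InMemoryLineFile(byte[] data) {
    this.data = data;
  }

  /**
   * Decompresses the whole stream into memory.
   * Returns null if the decompressed size would exceed maxBytes, so the
   * caller can fall back to a slower streaming approach (an assumption of
   * this sketch, following the "TOO big" check suggested in the comment).
   */
  static InMemoryLineFile tryLoad(InputStream gzipped, int maxBytes) throws IOException {
    try (GZIPInputStream in = new GZIPInputStream(gzipped)) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        if (out.size() + n > maxBytes) {
          return null; // too big to hold in memory
        }
        out.write(buf, 0, n);
      }
      return new InMemoryLineFile(out.toByteArray());
    }
  }

  /**
   * Jumps to a random byte offset, advances to the start of the next line
   * (wrapping to the first line if there is none), and returns that line.
   */
  String randomLine(Random random) {
    if (data.length == 0) {
      return "";
    }
    int pos = random.nextInt(data.length);
    // Skip the rest of whatever line we landed in.
    while (pos < data.length && data[pos] != '\n') {
      pos++;
    }
    // Step past the newline, or wrap around to the beginning of the file.
    pos = (pos + 1 >= data.length) ? 0 : pos + 1;
    int end = pos;
    while (end < data.length && data[end] != '\n') {
      end++;
    }
    return new String(data, pos, end - pos, StandardCharsets.UTF_8);
  }
}
```

A caller would treat a null return from tryLoad as the signal to use the existing slow path. Note that choosing by byte offset is not uniform over lines: a line that follows a long line is picked more often, which may or may not matter for randomized tests.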