[ https://issues.apache.org/jira/browse/LUCENE-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027921#comment-17027921 ]
Robert Muir commented on LUCENE-9191:
-------------------------------------

Sorry, I'm not sure there is a place for metadata about this. It may need a separate text file.

I think it is just that you sync the data with a "Full Flush" in zlib and flush the underlying output, and it all does what it needs with its deflate blocks and stuff. The next block after that will have been reset to a "clean state", so you can then safely random-access seek to that file pointer (fp) and start deflate-decoding again from there.

If you really want to get fancy, I think it is possible to use a "shared static dictionary", so that when you reset to a "clean state" you already have some common compression history in the 32k buffer, rather than zeros or whatever. That would make it so that these seek points do not hurt the compression much, and then you can have more of them, etc.

> Fix linefiledocs compression or replace in tests
> ------------------------------------------------
>
>                 Key: LUCENE-9191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9191
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Robert Muir
>            Priority: Major
>
> LineFileDocs(random) is very slow, even to open. It does a very slow "random
> skip" through a gzip compressed file.
> For the analyzers tests, in LUCENE-9186 I simply removed its usage, since
> TestUtil.randomAnalysisString is superior, and fast. But we should address
> other tests using it, since LineFileDocs(random) is slow!
> I think it is also the case that every lucene test has probably tested every
> LineFileDocs line many times now, whereas randomAnalysisString will invent
> new ones.
> Alternatively, we could "fix" LineFileDocs(random), e.g. special compression
> options (in blocks)... deflate supports such stuff. But it would make it even
> hairier than it is now.
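For illustration, the full-flush idea from the comment can be sketched roughly as follows. This is only a minimal sketch using Python's zlib module, not the LineFileDocs implementation: the function names `compress_with_seek_points` and `read_from`, the block size, and the choice of raw deflate (no zlib/gzip header, so a decoder can start mid-stream) are all assumptions made for the example. It does not attempt the shared-dictionary variant.

```python
import zlib


def compress_with_seek_points(lines, lines_per_block):
    """Raw-deflate `lines`, doing a full flush after each block of lines.

    Returns (compressed_bytes, offsets), where offsets[i] is the compressed
    file pointer at which block i starts. Hypothetical helper for this sketch.
    """
    # wbits=-15 selects raw deflate: no header or checksum, so decoding can
    # begin at any full-flush boundary.
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)
    out = bytearray()
    offsets = []
    for start in range(0, len(lines), lines_per_block):
        offsets.append(len(out))  # seek point: start of this block
        for line in lines[start:start + lines_per_block]:
            out += comp.compress(line)
        # Full flush: emit all pending output, align to a byte boundary and
        # reset the compression history to a clean state.
        out += comp.flush(zlib.Z_FULL_FLUSH)
    out += comp.flush()  # Z_FINISH terminates the deflate stream
    return bytes(out), offsets


def read_from(data, offset):
    """Start deflate-decoding at a seek point and return what follows."""
    decomp = zlib.decompressobj(-15)
    return decomp.decompress(data[offset:])
```

The offsets would have to be stored somewhere outside the compressed data, for example in the separate text file mentioned above. Decoding only the slice between two consecutive offsets should yield just that block, since each full flush leaves the output byte-aligned.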
-- This message was sent by Atlassian Jira (v8.3.4#803005)