[ 
https://issues.apache.org/jira/browse/LUCENE-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027921#comment-17027921
 ] 

Robert Muir commented on LUCENE-9191:
-------------------------------------

Sorry, I'm not sure there is a place for metadata about this. It may need a 
separate text file. I think it is just that you sync the data with a zlib 
"Full Flush" and flush the underlying output; zlib then closes out its current 
deflate blocks, and the next block after that starts from a "clean state". So 
you can safely random-access seek to that file pointer (fp) and start 
deflate-decoding again from there.
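
A minimal sketch of the full-flush idea, using Python's zlib module rather 
than Lucene's Java code. The function names and the in-memory offsets list 
are assumptions made for illustration; the offsets stand in for the metadata 
that might live in a separate text file. Raw deflate (negative wbits) is used 
so that there is no stream header to skip when decoding from the middle.

```python
import zlib


def compress_with_seek_points(lines, lines_per_block=2):
    """Raw-deflate `lines`, issuing a full flush every `lines_per_block` lines.

    Returns (compressed_bytes, offsets); offsets[i] is the byte position where
    block i starts. Both names are hypothetical, not part of any Lucene API.
    """
    # Negative wbits selects raw deflate: no zlib/gzip header or trailer.
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)
    out = bytearray()
    offsets = [0]
    for i, line in enumerate(lines, 1):
        out += comp.compress(line)
        if i % lines_per_block == 0 and i < len(lines):
            # Z_FULL_FLUSH byte-aligns the output and resets the compression
            # history, so the next block does not refer back to earlier data.
            out += comp.flush(zlib.Z_FULL_FLUSH)
            offsets.append(len(out))
    out += comp.flush()  # finish the stream
    return bytes(out), offsets


def read_from(data, offset):
    """Decode from a recorded seek point to the end of the stream."""
    return zlib.decompressobj(-15).decompress(data[offset:])


lines = [b"doc one\n", b"doc two\n", b"doc three\n", b"doc four\n", b"doc five\n"]
data, offsets = compress_with_seek_points(lines)
tail = read_from(data, offsets[1])  # starts decoding at the third line
```

Decoding only works from the recorded offsets, not from arbitrary byte 
positions, which is why the offsets have to be stored somewhere.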

If you really want to get fancy, I think it is possible to use a "shared static 
dictionary", so that when you reset to a "clean state" you already have some 
common compression history in the 32k buffer rather than starting from 
nothing. That way the seek points do not hurt compression much, and you can 
afford to have more of them.
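
A rough sketch of the shared-dictionary idea, again in Python's zlib and not 
taken from Lucene. Python only accepts a preset dictionary (zdict) when the 
compressor is created, so this sketch swaps in a slightly different 
technique: each block is compressed as its own raw deflate stream primed with 
the same dictionary, instead of one stream being reset in place. The helper 
names and the sample dictionary are made up for illustration.

```python
import zlib


def compress_block(block, shared_dict=None):
    """Compress one block as an independent raw deflate stream.

    If `shared_dict` is given, the compressor starts with that history
    already in its window instead of an empty one.
    """
    if shared_dict:
        comp = zlib.compressobj(9, zlib.DEFLATED, -15, zdict=shared_dict)
    else:
        comp = zlib.compressobj(9, zlib.DEFLATED, -15)
    return comp.compress(block) + comp.flush()


def decompress_block(data, shared_dict=None):
    """Decode one block; the dictionary must match the one used to compress."""
    if shared_dict:
        decomp = zlib.decompressobj(-15, zdict=shared_dict)
    else:
        decomp = zlib.decompressobj(-15)
    return decomp.decompress(data)


# Hypothetical dictionary: text that resembles what the blocks contain.
shared = b"the quick brown fox jumps over the lazy dog\n"
block = b"the quick brown fox jumps over the lazy cat\n"

plain = compress_block(block)
primed = compress_block(block, shared)
restored = decompress_block(primed, shared)
```

With a dictionary that resembles the data, a small block compresses to fewer 
bytes than it does from an empty window, which is what makes frequent seek 
points cheaper.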

> Fix linefiledocs compression or replace in tests
> ------------------------------------------------
>
>                 Key: LUCENE-9191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9191
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Robert Muir
>            Priority: Major
>
> LineFileDocs(random) is very slow, even to open. It does a very slow "random 
> skip" through a gzip compressed file.
> For the analyzers tests, in LUCENE-9186 I simply removed its usage, since 
> TestUtil.randomAnalysisString is superior, and fast. But we should address 
> other tests using it, since LineFileDocs(random) is slow!
> I think it is also the case that every lucene test has probably tested every 
> LineFileDocs line many times now, whereas randomAnalysisString will invent 
> new ones.
> Alternatively, we could "fix" LineFileDocs(random), e.g. special compression 
> options (in blocks)... deflate supports such stuff. But it would make it even 
> hairier than it is now.


