[ 
https://issues.apache.org/jira/browse/LUCENE-10019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373632#comment-17373632
 ] 

Uwe Schindler commented on LUCENE-10019:
----------------------------------------

I just figured out that Lucene90CompoundFileReader checks the file size and, of 
course, does not round individual file sizes up to the next alignment boundary.

Therefore I also have to change the reader to calculate the file size correctly. 
Because of this, *it is* a file format change (older readers can't read the file 
because the initialization check sees an unexpected file size), so Lucene 9.0 is 
the ideal time to change this.
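
To illustrate why the size check trips, here is a minimal sketch in plain Java. It is not the actual Lucene90CompoundFileReader code; the class name, method names, and the header/footer lengths passed in are hypothetical, and the only thing taken from the issue is the 8-byte alignment of file starts. It compares the size a reader would expect by summing the raw lengths with the size that results once every file start is rounded up first.

```java
import java.util.List;

// Hypothetical sketch, not Lucene code: compares the expected compound file
// size with and without 8-byte alignment of each contained file's start.
final class AlignedCfsSize {
    // Alignment taken from the issue title (8 bytes).
    static final long ALIGNMENT_BYTES = 8;

    /** Rounds offset up to the next multiple of ALIGNMENT_BYTES (a power of two). */
    static long alignOffset(long offset) {
        return (offset + ALIGNMENT_BYTES - 1) & -ALIGNMENT_BYTES;
    }

    /** Size computed by summing header, raw file lengths, and footer (no padding). */
    static long unpaddedSize(long headerLength, List<Long> fileLengths, long footerLength) {
        long size = headerLength;
        for (long length : fileLengths) {
            size += length;
        }
        return size + footerLength;
    }

    /** Size when the start of every file is aligned before its bytes are counted. */
    static long paddedSize(long headerLength, List<Long> fileLengths, long footerLength) {
        long position = headerLength;
        for (long length : fileLengths) {
            position = alignOffset(position) + length;
        }
        return position + footerLength;
    }

    public static void main(String[] args) {
        List<Long> lengths = List.of(5L, 3L);
        System.out.println("unpadded: " + unpaddedSize(10, lengths, 16));
        System.out.println("padded:   " + paddedSize(10, lengths, 16));
    }
}
```

Whenever the two values differ, a reader that still sums the raw lengths sees a file that is longer than expected, which is the incompatibility described above.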

> Align file starts in CFS files to have proper alignment (8 bytes)
> -----------------------------------------------------------------
>
>                 Key: LUCENE-10019
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10019
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs, core/store
>    Affects Versions: main (9.0)
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Major
>
> While discussing MMapDirectory and fast access to file contents through 
> MMap (https://github.com/apache/lucene/pull/177 and previous versions of this 
> draft, also), I figured out that for most Lucene files, the data inside is 
> not aligned at all.
> We can't fix this easily and it's also not always important, but some files 
> should really have a CPU-friendly alignment from the beginning! This is 
> especially important when we use slices.
> I got many tests with aligned VarHandles to pass, but they broke instantly if 
> the file was inside a compound (CFS) file.
> CompoundFormat.write() just appends all data to the IndexOutput and writes 
> the offset to the entries file. The fix to make at least file starts aligned 
> is to just write some zero bytes between the files, so that startOffset is 
> aligned to a multiple of 8 bytes.
> At a later stage we could also think of aligning to LBA 
> blocks/sectors/whatever to make OS paging work better. But for performance of 
> index access, slices of compound files when memory mapped should at least 
> align to 8 bytes.
> Fix is easy: Just add some modulo on startOffset and write some extra bytes 
> before the next file is serialized. The change is only 2 lines. It does not 
> even change index format!
> I'd like to get this in for 9.0 so we can at least say: our CFS files are 
> aligned. Aligning other files like docvalues to better help CPU is then 
> possible.
> I will provide a simple pull request for Lucene90CompoundFormat soon. If you 
> don't see any problems, this is a no-brainer.
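
The padding step described in the quoted text can be sketched as follows. This is a minimal illustration and not the Lucene90CompoundFormat change itself: a java.io.ByteArrayOutputStream stands in for the IndexOutput, and the class and method names are hypothetical. Only the idea (take the position modulo 8 and write zero bytes before the next file) comes from the description.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene code: appends files to one output and pads
// with zero bytes so that every file starts at a multiple of 8 bytes.
final class AlignedAppendSketch {
    static final int ALIGNMENT_BYTES = 8;

    /** Number of zero bytes needed to move position to the next multiple of 8. */
    static int paddingFor(long position) {
        int remainder = (int) (position % ALIGNMENT_BYTES);
        return remainder == 0 ? 0 : ALIGNMENT_BYTES - remainder;
    }

    /** Appends each file after padding and returns the aligned start offsets. */
    static List<Long> appendAligned(ByteArrayOutputStream out, List<byte[]> files) {
        List<Long> startOffsets = new ArrayList<>();
        for (byte[] file : files) {
            int padding = paddingFor(out.size());
            for (int i = 0; i < padding; i++) {
                out.write(0); // zero byte between files
            }
            startOffsets.add((long) out.size()); // what an entries table would record
            out.write(file, 0, file.length);
        }
        return startOffsets;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[3], 0, 3); // stand-in for a header of arbitrary length
        List<Long> offsets = appendAligned(out, List.of(new byte[5], new byte[8], new byte[1]));
        System.out.println("start offsets: " + offsets + ", total bytes: " + out.size());
    }
}
```

Because the padding bytes are part of the output, the total length no longer equals the sum of the individual file lengths, which is why the reader side has to account for them as well.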



