[ 
https://issues.apache.org/jira/browse/LUCENE-10019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378170#comment-17378170
 ] 

ASF subversion and git services commented on LUCENE-10019:
----------------------------------------------------------

Commit 69e85924b7456b052bc87bcaf4793a6925f38dc5 in lucene's branch 
refs/heads/main from Uwe Schindler
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=69e8592 ]

LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes) 
(#203)



> Align file starts in CFS files to have proper alignment (8 bytes)
> -----------------------------------------------------------------
>
>                 Key: LUCENE-10019
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10019
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs, core/store
>    Affects Versions: main (9.0)
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 9.0
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> While discussing MMapDirectory and fast access to file contents through MMap 
> (https://github.com/apache/lucene/pull/177 and previous versions of this 
> draft), I noticed that for most Lucene files, the data inside is not aligned 
> at all.
> We can't fix this easily and it's also not always important, but some files 
> should really have a CPU-friendly alignment from the beginning! This is 
> especially important when we use slices().
> I got many tests with aligned VarHandles to pass, but they broke instantly if 
> the file was inside a compound (CFS) file.
> CompoundFormat.write() just appends all data to the IndexOutput and writes 
> the offset to the entries file. The fix to make at least file starts aligned 
> is to just write some null-bytes between the files, so startOffset is aligned 
> to multiples of 8 bytes.
> At a later stage we could also think of aligning to LBA 
> blocks/sectors/whatever to make OS paging work better. But for performance of 
> index access, slices of compound files when memory mapped should at least 
> align to 8 bytes.
> Fix is easy: Just add some modulo on startOffset and write some extra bytes 
> before the next file is serialized. The change is only 2 lines. It does not 
> even change index format!
> I'd like to get this in for 9.0 so we can at least say: our CFS files are 
> aligned. Aligning other files like docvalues to better help CPU is then 
> possible.
> I will provide a simple pull request for Lucene90CompoundFormat soon. If you 
> don't see any problems, this is a no-brainer.
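
The padding idea described in the quoted issue can be sketched with plain JDK classes. This is only an illustration of "modulo on startOffset, then write null bytes", not the code from the commit: the class CfsAlignSketch and the methods alignOffset and appendAligned are hypothetical names, and a ByteArrayOutputStream stands in for Lucene's IndexOutput.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: not Lucene code, only JDK classes are used.
final class CfsAlignSketch {
    // Alignment named in the issue: file starts at multiples of 8 bytes.
    static final int ALIGNMENT_BYTES = 8;

    // Rounds offset up to the next multiple of alignment (no change if already aligned).
    static long alignOffset(long offset, int alignment) {
        long remainder = offset % alignment;
        return remainder == 0 ? offset : offset + (alignment - remainder);
    }

    // Pads the output with null bytes up to the next aligned position,
    // appends the file data, and returns the aligned start offset.
    static long appendAligned(ByteArrayOutputStream out, byte[] fileData) {
        long startOffset = alignOffset(out.size(), ALIGNMENT_BYTES);
        while (out.size() < startOffset) {
            out.write(0); // null byte between files
        }
        out.write(fileData, 0, fileData.length);
        return startOffset; // this value would go into the entries file
    }

    public static void main(String[] args) {
        ByteArrayOutputStream cfs = new ByteArrayOutputStream();
        long first = appendAligned(cfs, new byte[5]);
        long second = appendAligned(cfs, new byte[3]);
        System.out.println("start offsets: " + first + ", " + second);
    }
}
```

Because each start offset is still recorded explicitly, a reader that looks files up by offset and length does not need to know about the padding, which is consistent with the issue's remark that the index format does not change.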



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
