[ 
https://issues.apache.org/jira/browse/LUCENE-10019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-10019.
------------------------------------
    Resolution: Fixed

As a start, I merged this, including the new API. I will look into further
improvements with DirectPackedWriter, so that docvalues also get aligned if the
inner docvalues representation has a suitable bit size (16, 32, or 64).
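The bit-size condition above can be illustrated with a short Java sketch. It assumes a block of fixed-width values that starts at some byte offset in the file. The class and method names are hypothetical and are not part of Lucene, and DirectPackedWriter's actual layout is not described in this issue. Once the block start is 8-byte aligned, 16-, 32- and 64-bit values all land on offsets that are multiples of their own width. Widths such as 12 or 24 bits do not.

```java
/** Hypothetical helper: when do fixed-width values end up naturally aligned? */
final class DocValuesAlignmentSketch {

  /**
   * Returns true if every value of a fixed-width block starting at baseOffset sits at a byte
   * offset that is a multiple of its own width, i.e. it can be read with one aligned load.
   */
  static boolean valuesNaturallyAligned(long baseOffset, int bitsPerValue) {
    if (bitsPerValue <= 0 || bitsPerValue % 8 != 0) {
      return false; // e.g. 12 bits: most values do not even start on a byte boundary
    }
    int widthBytes = bitsPerValue / 8;
    if (Integer.bitCount(widthBytes) != 1) {
      return false; // e.g. 24 bits = 3 bytes: there is no CPU load of that width to align for
    }
    // Value i starts at baseOffset + i * widthBytes, which is a multiple of widthBytes
    // for every i exactly when baseOffset itself is a multiple of widthBytes.
    return baseOffset % widthBytes == 0;
  }

  public static void main(String[] args) {
    long base = 64; // an 8-byte-aligned block start (assumption for the demo)
    for (int bits : new int[] {12, 16, 24, 32, 64}) {
      System.out.println(bits + " bits -> " + valuesNaturallyAligned(base, bits));
    }
  }
}
```

Running `main` prints `true` for 16, 32 and 64 bits and `false` for 12 and 24 bits. With a base offset of 4, 32-bit values would still be aligned but 64-bit values would not. This is why aligning the block start to 8 bytes matters.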

This issue at least ensures that CFS files are no longer completely misaligned,
so wrapping a file inside a CFS no longer forces misalignment. It also added an
API to easily align the file pointer while writing to an IndexOutput.
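Here is a minimal sketch of such an alignment helper. It is written against a plain `java.io.OutputStream` rather than Lucene's `IndexOutput`. `PositionTrackingOutput`, `alignOffset` and `alignTo` are hypothetical names chosen for illustration. The exact name and signature of the API added in this issue are not stated here. The helper rounds the current file pointer up to the next multiple of a power-of-two alignment and writes zero bytes to fill the gap.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-in for an IndexOutput that tracks its own file pointer. */
final class PositionTrackingOutput {
  private final OutputStream out;
  private long filePointer;

  PositionTrackingOutput(OutputStream out) {
    this.out = out;
  }

  long getFilePointer() {
    return filePointer;
  }

  void writeBytes(byte[] b) throws IOException {
    out.write(b);
    filePointer += b.length;
  }

  /** Rounds offset up to the next multiple of alignment, which must be a power of two. */
  static long alignOffset(long offset, int alignment) {
    if (alignment <= 0 || Integer.bitCount(alignment) != 1) {
      throw new IllegalArgumentException("alignment must be a power of two: " + alignment);
    }
    // -alignment is a mask with the low log2(alignment) bits cleared.
    return (offset + alignment - 1) & -alignment;
  }

  /** Writes zero bytes until the file pointer is a multiple of alignment; returns the new pointer. */
  long alignTo(int alignment) throws IOException {
    long aligned = alignOffset(filePointer, alignment);
    while (filePointer < aligned) {
      out.write(0);
      filePointer++;
    }
    return aligned;
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    PositionTrackingOutput out = new PositionTrackingOutput(bytes);
    out.writeBytes(new byte[] {1, 2, 3, 4, 5}); // file pointer is now 5
    long aligned = out.alignTo(8); // writes 3 zero bytes
    System.out.println(aligned + " " + bytes.size()); // prints "8 8"
  }
}
```

The `(offset + alignment - 1) & -alignment` rounding only works for powers of two, which is why the sketch rejects other values. For an alignment of 8, an offset of 13 rounds up to 16, and an offset that is already aligned is left unchanged.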

> Align file starts in CFS files to have proper alignment (8 bytes)
> -----------------------------------------------------------------
>
>                 Key: LUCENE-10019
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10019
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs, core/store
>    Affects Versions: main (9.0)
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 9.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> While discussing MMapDirectory and fast access to file contents through
> MMap (https://github.com/apache/lucene/pull/177 and also previous versions of
> this draft), I figured out that for most Lucene files, the data inside is
> not aligned at all.
> We can't fix this easily, and it's also not always important, but some files
> should really have CPU-friendly alignment from the beginning! This is
> especially important when we use slices().
> I got many tests with aligned VarHandles to pass, but they broke instantly if
> the file was inside a compound (CFS) file.
> CompoundFormat.write() just appends all data to the IndexOutput and writes
> the offset to the entries file. The fix to make at least the file starts
> aligned is to write some null bytes between the files, so that startOffset
> is a multiple of 8 bytes.
> At a later stage we could also think of aligning to LBA
> blocks/sectors/whatever to make OS paging work better. But for index access
> performance, slices of memory-mapped compound files should at least be
> aligned to 8 bytes.
> The fix is easy: just apply a modulo to startOffset and write some extra
> padding bytes before the next file is serialized. The change is only 2 lines.
> It does not even change the index format!
> I'd like to get this in for 9.0 so we can at least say: our CFS files are
> aligned. Aligning other files, like docvalues, to better suit the CPU then
> becomes possible.
> I will provide a simple pull request for Lucene90CompoundFormat soon. If you 
> don't see any problems, this is a no-brainer.
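The padding scheme described in the issue can be sketched in Java. This is not Lucene90CompoundFormat's code. `AlignedCompoundSketch`, its `Entry` class, and the in-memory `ByteArrayOutputStream` standing in for the CFS data file are all hypothetical. Each sub-file is preceded by `(8 - size % 8) % 8` zero bytes, so its recorded start offset is a multiple of 8. A reader that seeks by the stored offset and length never looks at the padding.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: concatenate sub-files, padding so each one starts at a multiple of 8. */
final class AlignedCompoundSketch {
  static final int ALIGNMENT = 8;

  /** One hypothetical entries-table record: where a sub-file starts and how long it is. */
  static final class Entry {
    final long startOffset;
    final long length;

    Entry(long startOffset, long length) {
      this.startOffset = startOffset;
      this.length = length;
    }
  }

  /** Appends each sub-file to data, writing zero padding first so its start offset is aligned. */
  static List<Entry> write(List<byte[]> subFiles, ByteArrayOutputStream data) {
    List<Entry> entries = new ArrayList<>();
    for (byte[] file : subFiles) {
      // Padding needed to reach the next multiple of ALIGNMENT (0 if already aligned).
      int padding = (ALIGNMENT - data.size() % ALIGNMENT) % ALIGNMENT;
      for (int i = 0; i < padding; i++) {
        data.write(0);
      }
      long start = data.size();
      data.write(file, 0, file.length);
      entries.add(new Entry(start, file.length));
    }
    return entries;
  }

  /** A reader only follows the stored offset and length, so padding bytes are never read. */
  static byte[] read(byte[] compound, Entry e) {
    return Arrays.copyOfRange(compound, (int) e.startOffset, (int) (e.startOffset + e.length));
  }

  public static void main(String[] args) {
    ByteArrayOutputStream data = new ByteArrayOutputStream();
    data.write(new byte[] {9, 9, 9}, 0, 3); // hypothetical 3-byte header before the first sub-file
    List<byte[]> files = Arrays.asList(new byte[5], new byte[13], new byte[8]);
    for (Entry e : write(files, data)) {
      System.out.println("start=" + e.startOffset + " length=" + e.length);
    }
  }
}
```

In `main`, a 3-byte prefix stands in for whatever precedes the first sub-file. The three sub-files of 5, 13 and 8 bytes then start at offsets 8, 16 and 32. Readers only follow the stored offsets, so adding the padding leaves the entries format itself unchanged. This is consistent with the issue's statement that the fix does not change the index format.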



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
