[
https://issues.apache.org/jira/browse/LUCENE-10019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-10019:
-----------------------------------
Fix Version/s: 9.0
> Align file starts in CFS files to have proper alignment (8 bytes)
> -----------------------------------------------------------------
>
> Key: LUCENE-10019
> URL: https://issues.apache.org/jira/browse/LUCENE-10019
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs, core/store
> Affects Versions: main (9.0)
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Priority: Major
> Fix For: 9.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> While discussing MMapDirectory and fast access to file contents through
> MMap (https://github.com/apache/lucene/pull/177 and previous versions of this
> draft), I found that for most Lucene files, the data inside is
> not aligned at all.
> We can't fix this easily and it's also not always important, but some files
> should really have a CPU-friendly alignment from the beginning! This is
> especially important when we use slices().
> I got many tests with aligned VarHandles to pass, but they broke instantly if
> the file was inside a compound (CFS) file.
> CompoundFormat.write() just appends all data to the IndexOutput and writes
> the offset to the entries file. The fix to make at least file starts aligned
> is to just write some null bytes between the files, so that each startOffset
> is a multiple of 8 bytes.
> At a later stage we could also think of aligning to LBA
> blocks/sectors/whatever to make OS paging work better. But for performance of
> index access, slices of compound files when memory mapped should at least
> align to 8 bytes.
> The fix is easy: just take startOffset modulo 8 and write the missing padding
> bytes before the next file is serialized. The change is only 2 lines. It does
> not even change the index format!
> I'd like to get this in for 9.0 so we can at least say: our CFS files are
> aligned. Aligning other files like docvalues to better help CPU is then
> possible.
> I will provide a simple pull request for Lucene90CompoundFormat soon. If you
> don't see any problems, this is a no-brainer.
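
The padding step described above can be sketched in a few lines. This is a minimal, standalone illustration and not the actual Lucene90CompoundFormat change: the class CfsAlignmentSketch and the helpers paddingFor and writeAligned are hypothetical names, and a ByteArrayOutputStream stands in for Lucene's IndexOutput.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: pads each sub-file so it starts on an 8-byte boundary.
final class CfsAlignmentSketch {
    // Alignment requested in the issue (8 bytes, i.e. Long.BYTES).
    static final int ALIGNMENT = 8;

    // Number of zero bytes needed so that offset becomes a multiple of ALIGNMENT.
    static int paddingFor(long offset) {
        return (int) ((ALIGNMENT - (offset % ALIGNMENT)) % ALIGNMENT);
    }

    // Appends every file after zero padding and returns the start offsets,
    // which is what would be recorded in the entries file.
    static List<Long> writeAligned(ByteArrayOutputStream out, List<byte[]> files) {
        List<Long> startOffsets = new ArrayList<>();
        for (byte[] file : files) {
            int pad = paddingFor(out.size());
            out.write(new byte[pad], 0, pad);      // null bytes between files
            startOffsets.add((long) out.size());   // aligned start offset
            out.write(file, 0, file.length);
        }
        return startOffsets;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Long> offsets =
            writeAligned(out, List.of(new byte[5], new byte[8], new byte[3]));
        System.out.println(offsets + " total=" + out.size());
    }
}
```

Because readers locate each sub-file through the recorded start offset and length, the padding bytes are never interpreted as data, which is consistent with the claim that the index format does not need to change.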
--
This message was sent by Atlassian Jira
(v8.3.4#803005)