[ https://issues.apache.org/jira/browse/LUCENE-10019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Schindler updated LUCENE-10019:
-----------------------------------
    Fix Version/s: 9.0

> Align file starts in CFS files to have proper alignment (8 bytes)
> -----------------------------------------------------------------
>
>                 Key: LUCENE-10019
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10019
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs, core/store
>    Affects Versions: main (9.0)
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Major
>             Fix For: 9.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> While discussing MMapDirectory and fast access to file contents through
> MMap (https://github.com/apache/lucene/pull/177 and previous versions of this
> draft), I figured out that for most Lucene files, the data inside is not
> aligned at all.
> We can't fix this easily and it's also not always important, but some files
> should really have a CPU-friendly alignment from the beginning! This is
> especially important when we use slices().
> I got many tests with aligned VarHandles to pass, but they broke instantly if
> the file was inside a compound (CFS) file.
> CompoundFormat.write() just appends all data to the IndexOutput and writes
> the offset to the entries file. The fix to make at least file starts aligned
> is to just write some null bytes between the files, so startOffset is aligned
> to a multiple of 8 bytes.
> At a later stage we could also think of aligning to LBA
> blocks/sectors/whatever to make OS paging work better. But for performance of
> index access, slices of compound files, when memory mapped, should at least
> align to 8 bytes.
> The fix is easy: just add some modulo on startOffset and write some extra
> bytes before the next file is serialized. The change is only 2 lines. It does
> not even change the index format!
> I'd like to get this in for 9.0 so we can at least say: our CFS files are
> aligned. Aligning other files like docvalues to better help the CPU is then
> possible.
> I will provide a simple pull request for Lucene90CompoundFormat soon. If you
> don't see any problems, this is a no-brainer.
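
The "modulo on startOffset" padding described in the issue can be illustrated with a small, self-contained sketch. This is not the Lucene90CompoundFormat code and does not use any Lucene API: the class CfsAlignmentSketch, the methods alignOffset and writeAligned, and the in-memory ByteArrayOutputStream standing in for the IndexOutput are all hypothetical names chosen for illustration. It assumes only what the issue states: files are appended one after another, and null bytes are written before a file whenever its start offset is not a multiple of 8.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of 8-byte file-start alignment; not Lucene code. */
final class CfsAlignmentSketch {

  /** Alignment target named in the issue: file starts on multiples of 8 bytes. */
  static final int ALIGNMENT_BYTES = 8;

  /** Rounds offset up to the next multiple of alignment (alignment must be > 0). */
  static long alignOffset(long offset, int alignment) {
    long remainder = offset % alignment;
    return remainder == 0 ? offset : offset + (alignment - remainder);
  }

  /**
   * Appends each file to one in-memory container, writing null bytes before a
   * file whenever the current position is not aligned. Returns the start offset
   * recorded for each file (what an entries table would store).
   */
  static List<Long> writeAligned(ByteArrayOutputStream out, List<byte[]> files) {
    List<Long> startOffsets = new ArrayList<>();
    for (byte[] file : files) {
      long position = out.size();
      long start = alignOffset(position, ALIGNMENT_BYTES);
      for (long i = position; i < start; i++) {
        out.write(0); // null padding byte between files
      }
      startOffsets.add(start);
      out.write(file, 0, file.length);
    }
    return startOffsets;
  }

  public static void main(String[] args) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // Three dummy "files" of 5, 16 and 3 bytes.
    List<Long> starts =
        writeAligned(out, List.of(new byte[5], new byte[16], new byte[3]));
    System.out.println(starts + " total=" + out.size()); // [0, 8, 24] total=27
  }
}
```

In this sketch the 5-byte file is followed by 3 null bytes so that the second file starts at offset 8; the second file ends at offset 24, which is already aligned, so no padding is written before the third. Because the recorded start offsets simply point past the padding, a reader that looks up offsets from an entries table needs no change, which is consistent with the issue's remark that the index format stays the same.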