ChrisHegarty commented on issue #12482: URL: https://github.com/apache/lucene/issues/12482#issuecomment-1660728819
Thanks for engaging :-)

> * boundary conditions: if the file is ginormous, we don't just map it to a single memory segment but multiple segments i think, how can any new api prevent this complexity from working its way into the vector formulas (which are already complicated enough, this would be too much)

Yeah, as @uschindler said, depending on how this is structured we would likely fall back to a copy. I don't expect this to be too common, as the default maximum chunk size is 16GB. But of course it has to work correctly.

> * alignment issues: afaik, with heap arrays there is some amount of alignment by the jvm, but with mapped files this may not be the case unless we do extra work. do we care? what is the penalty (on common arm/x86 cpus)?

You are spot on, alignment does affect things. It already does today, when reading from an unaligned location in the segment into a float[]. I quickly updated the micro benchmark to read from an offset of one byte in the mapped file. The readFloats / memory segment copy hurts a lot, while the vector load fromMemorySegment hurts a lot less. This by itself deserves some investigation.

```
Benchmark                                             (size)   Mode  Cnt   Score   Error   Units
FloatDotProductBenchmark.dotProductCopyFromArray        1024  thrpt    5   0.697 ± 0.002  ops/us
FloatDotProductBenchmark.dotProductFromMemorySegment    1024  thrpt    5  15.864 ± 0.005  ops/us
```

> Please note: the idea for this issue was already discussed in the original mmapdir issues. I proposed this already there.

Ok. So it is not new, and not crazy. Cool!

> The problem with older panama vector apis (around java 17) was that there was no real byte/float buffer interop available. Implementations at that time still copied to heap.

Ah ok.

I think I'm still missing something. We don't want a ByteBuffer or MemorySegment per vector, right? This would be too much garbage. We just want to return a VectorXX loaded with the values from a given position and length.
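To make the "fall back to a copy" idea for multi-segment files concrete, here is a minimal sketch. It is not Lucene's implementation: `ChunkedFloatReader` and its methods are hypothetical names, and plain heap `ByteBuffer`s stand in for the mapped segments so the example runs without preview or incubator flags. It only shows the boundary check: a read that fits inside one chunk is served from that chunk directly, and a read that straddles two chunks is stitched together in a scratch buffer first.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical stand-in for a file mapped as several fixed-size chunks. */
final class ChunkedFloatReader {
    private final ByteBuffer[] chunks;  // each buffer models one mapped segment
    private final int chunkSizePower;   // chunk size is 1 << chunkSizePower bytes
    private final long chunkMask;

    ChunkedFloatReader(byte[] data, int chunkSizePower) {
        this.chunkSizePower = chunkSizePower;
        int chunkSize = 1 << chunkSizePower;
        this.chunkMask = chunkSize - 1L;
        int n = (data.length + chunkSize - 1) / chunkSize;
        chunks = new ByteBuffer[n];
        for (int i = 0; i < n; i++) {
            int start = i * chunkSize;
            int len = Math.min(chunkSize, data.length - start);
            chunks[i] = ByteBuffer.wrap(data, start, len).slice();
        }
    }

    /** True if [pos, pos + byteLen) lies inside a single chunk, so no copy is needed. */
    boolean fitsInOneChunk(long pos, int byteLen) {
        return (pos >>> chunkSizePower) == ((pos + byteLen - 1) >>> chunkSizePower);
    }

    /** Reads count little-endian floats starting at byte offset pos. */
    float[] readFloats(long pos, int count) {
        int byteLen = count * Float.BYTES;
        ByteBuffer src;
        if (fitsInOneChunk(pos, byteLen)) {
            // Fast path: read straight from the one chunk that holds the data.
            src = chunks[(int) (pos >>> chunkSizePower)].duplicate();
            src.position((int) (pos & chunkMask));
        } else {
            // Slow path: the range straddles chunks, so gather it into a scratch array.
            byte[] scratch = new byte[byteLen];
            int copied = 0;
            while (copied < byteLen) {
                long p = pos + copied;
                ByteBuffer chunk = chunks[(int) (p >>> chunkSizePower)].duplicate();
                chunk.position((int) (p & chunkMask));
                int n = Math.min(chunk.remaining(), byteLen - copied);
                chunk.get(scratch, copied, n);
                copied += n;
            }
            src = ByteBuffer.wrap(scratch);
        }
        src.order(ByteOrder.LITTLE_ENDIAN); // duplicate()/wrap() default to big-endian
        float[] out = new float[count];
        for (int i = 0; i < count; i++) {
            out[i] = src.getFloat();
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(32).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < 8; i++) bb.putFloat(i);
        // 16-byte chunks: floats 0..3 live in chunk 0, floats 4..7 in chunk 1.
        ChunkedFloatReader reader = new ChunkedFloatReader(bb.array(), 4);
        System.out.println(reader.fitsInOneChunk(8, 16));
        System.out.println(Arrays.toString(reader.readFloats(8, 4)));
    }
}
```

In a real implementation the fast path is where a vector could be loaded directly from the segment; the scratch-array path is the rare case, given how large the chunks normally are.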