[GitHub] [pinot] gortiz opened a new issue, #10783: Extensible PinotDataBuffer SPI

via GitHub Fri, 19 May 2023 02:45:15 -0700


gortiz opened a new issue, #10783:
URL: https://github.com/apache/pinot/issues/10783


   This is a PEP-Request. I propose to create a extensible SPI that can be used 
to change the PinotDataBuffer implementation used at runtime. Treat this issue 
as a clone of https://github.com/apache/pinot/issues/9162, but more formal.
   
   # What needs to be done?
   
   Right now PinotDataBuffer instances are obtained by calling static methods:
   - PinotDataBuffer allocateDirect(long size, ByteOrder byteOrder, String 
description)
   - PinotDataBuffer loadFile(File file, long offset, long size, ByteOrder 
byteOrder, String description)
   - PinotDataBuffer mapFile(File file, boolean readOnly, long offset, long 
size, ByteOrder byteOrder, String description)
   
   These methods have a static implementation that return a PinotDataBuffer 
that is backed by a ByteBuffer or by a LArray, depending on whether the 
requested size is greater than 2GBs or not. ByteBuffers are faster and more 
reliable than LArray, but they cannot be larger than Integer.MAX_INT (aka 2GB - 
1byte).
   
   Here I propose to change that implementation. Instead of having a static 
implementation, the algorithm used to instantiate the PinotDataBuffer will be 
delegated to an interface. The specific implementation will be set at Pinot 
startup time by reading the property `pinot.offheap.buffer.factory`, which will 
be an optional property whose value is a String. In case it exists, it should 
the qualified class name of a Class that implements `PinotDataBufferFactory`. 
There will be also another optional property called 
`pinot.offheap.prioritize.bytebuffer` which will be a boolean. If it is true, 
then the ByteBuffer implementation will be used when less than 2GB buffers are 
requested.
   
   # Why the feature is needed 
   
   The reason to be able to have new PinotDataBuffer implementations is that 
LArray is not compatible with Java >= 16, as detected in #8529. It also seems 
that our LArray implementation may have some bugs (see 
https://github.com/apache/pinot/pull/10774) and it doesn't seem that LArray 
will be updated (See https://github.com/xerial/larray/issues/75), so we need to 
move on.
   
   # Initial idea/proposal
   
   Adding a SPI is something we have done several times in the last 
months/years. Most of the times we use one or more configuration properties to 
decide which instance we need to use and set that instance in a static 
attribute. That system works, but it is problematic when we want to change the 
implementation in an isolated way (for example, when doing tests).
   
   That is why I propose to extend this system with a thread local on top of 
that. Something like:
   
   ```java
     /**
      * The default {@link PinotBufferFactory} used by all threads that do not 
define their own factory.
      */
     private static PinotBufferFactory _defaultFactory = createDefaultFactory();
     /**
      * A thread local variable that can be used to customize the {@link 
PinotBufferFactory} used on tests. This is mostly
      * useful in tests.
      */
     private static final ThreadLocal<PinotBufferFactory> _FACTORY = new 
ThreadLocal<>();
   
     /**
      * Creates the default factory depending on the JVM version
      */
     public static PinotBufferFactory createDefaultFactory() {...}
   
     /**
      * Changes the default factory
      */
     public static void setDefaultFactory(PinotBufferFactory) {...}
   
     /**
      * Change the {@link PinotBufferFactory} used by the current thread.
      *
      * @see #loadDefaultFactory(PinotConfiguration)
      */
     public static void useFactory(PinotBufferFactory factory) {
       _FACTORY.set(factory);
     }
   
     /**
      * Returns the factory the current thread should use.
      */
     public static PinotBufferFactory getFactory() {
       PinotBufferFactory pinotBufferFactory = _FACTORY.get();
       if (pinotBufferFactory == null) {
         pinotBufferFactory = _defaultFactory;
       }
       return pinotBufferFactory;
     }
   ```
   
   Then the static methods PinotDataBuffer.allocateDirect (and others) should 
delegate on PinotDataBuffer.getFactory(). By doing that, tests can call 
`PinotDataBuffer.useFactory()` in order to use, on that thread, the buffer 
library they want to test. Exploratory draft 
https://github.com/apache/pinot/pull/10528 implements all of these and modifies 
tests to use this API.
   
   # Actual buffer implementations
   
   This is an optional part of the PEP. What we request is to have the 
PinotBufferFactory SPI. This is what we would like to do with it, but it could 
be part of another PEP if necessary.
   
   Given that we cannot use LArray in modern versions of Java, we have three 
alternatives :
   - Use other third party libraries
   - Implement our own library on top of Unsafe
   - Use Foreign Memory API
   
   I tried to use Chronicle Bytes in the past with partial success, see 
https://github.com/apache/pinot/pull/9842. Given that I don't know more third 
party buffer libraries that use long offsets (Netty and Agrona use ints), I 
explored the last two alternatives in 
https://github.com/apache/pinot/pull/10528.
   
   Foreign Memory API is clearly the future, as it provides exactly what we 
need: A high performance buffer API that is maintained (given that will be 
included in the JVM) and uses longs as offsets and sizes. But it has two issues:
   - It is not included in Java 17
   - It is still in preview mode and it will continue like that in Java 21 (see 
[this post where one of the author explains the reasons and the changes since 
Java 
20](https://github.com/minborg/articles/tree/jep442/2023/March/22-jep442-FFM-Third-Preview)).
   
   Therefore we have three options:
   1. Wait until it is stable (possible in Java 22 or 23, which won't be LTS, 
so maybe we need to wait 2 years until Java 25).
   2. Use it as a preview in Java 21. This would imply that we could either run 
with Java 11+LArray or Java 21+Foreign, but do not support Java 17. Also, in 
Java 21 we would need to compile and start with --enable-preview.
   3. Create our own library.
   
   Previously mentioned https://github.com/apache/pinot/pull/10528 does create 
two buffer implementations: One on top of Unsafe and another on top of Foreign 
Memory API. The latter is trivial to implement, but it is very difficult to 
include in the current Apache Pinot CI. The problems I found are:
   - Some Maven plugins don't work in Java 21-ea (at least spotless doesn't 
work)
   - GitHub Actions cannot be configured to use Java 21-ea (see 
https://github.com/actions/setup-java/issues/492#issuecomment-1551508335).
   - In general, it is difficult to maintain in the same compilation unit code 
that has to be compiled with Java 11 and code that has to be compiled with Java 
21
   - Presto still requires to use Java 8, which makes it even more difficult.
   
   In case we decide to use my draft in the actual implementation of the PEP, I 
would suggest to remove the Foreign Memory API implementation from the branch 
and optionally move it to another github repo.
   
   I've also modified BenchmarkPinotDataBuffer in order to run with the Unsafe 
based implementation. The modifications are in the draft.
   I've run the benchmark in a M1 Pro and in a Ryzen 9 3900X with Ubuntu 22.10. 
The exact results of the benchmark can be found 
[here](https://docs.google.com/spreadsheets/d/1xgxR6_mMwq2zhe8ZkpR5X4VKLfvwyTqT6qFZsMkh49E/edit?usp=sharing)
 but the following chars should be good enough:
   
   Please note that LArray only runs in Java 11, so there are no data with 
LArray in Java 17 and 21.
   
   This one shows the cost of executing batch writes (aka call 
PinotDataBuffer.readFrom) and batch read (aka call PinotDataBuffer.copyTo) with 
a byte array of 1024 elements:
   
![Batch](https://github.com/apache/pinot/assets/1913993/2f5c22fd-a673-4410-ae5f-039f9147bdce)
   
   It is important to note that the implementation used in both LArray and 
Unsafe when dealing with batch reads and writes is to create a temporal direct 
byte buffer that points to the same address of the current buffer, so we can 
assume that the performance difference is due to the new instance creation.
   
   This one shows the cost of executing non batch writes (calling 
PinotDataBuffer.putByte()) and reads (calling PinotDataBuffer.getByte) in a 
loop with 1024 consecutive offsets starting from a random one:
   ![No 
batch](https://github.com/apache/pinot/assets/1913993/1fe73241-8ead-42f0-b5d8-b4fae426af75)
   
   In my opinion the latest is the most interesting, as it shows the 
improvements introduced in Java 17 and 21 JIT when dealing with loops.
   
   I've also tried to add the Foreign Memory API implementation in the 
benchmark, but I found several problems so I gave up.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

[GitHub] [pinot] gortiz opened a new issue, #10783: Extensible PinotDataBuffer SPI

Reply via email to