cbalci opened a new pull request, #10394:
URL: https://github.com/apache/pinot/pull/10394

   **Background**
   Apache Spark 
[changed](https://blog.madhukaraphatak.com/spark-3-datasource-v2-part-3) the 
Datasource interface significantly between Spark2 and Spark3, so the current 
pinot-spark-connector doesn't work with Spark3. In a previous PR (#10321) I 
refactored the spark-connector into two modules (`pinot-spark-common` and 
`pinot-spark-2-connector`) so that shared logic can be reused, which gives us a 
clean base to implement the new version.
   
   **Change**
   In this PR I'm implementing the DataSourceV2 interface as published by 
Spark3. Functionality is exactly the same as the Pinot Spark 2 Connector, and it 
supports all existing configuration options, such as:
   - Ability to read from REALTIME, OFFLINE or HYBRID Pinot tables
   - Ability to scan using HTTP or GRPC server endpoints
   - Column pruning and filter push down
   - etc. (see docs)
   
   
   It can be used as a drop-in replacement when migrating from Spark2 to 
Spark3. Spark3 also brings new features and improvements, such as 
'Aggregation push down', which can be taken advantage of in the future.
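
The read options listed above could be exercised roughly as follows. This is only a sketch: the `pinot` format name and the option keys (`table`, `tableType`, `controller`, `useGrpcServer`, `usePushDownFilters`) are assumptions based on the existing Spark 2 connector documentation and should be checked against the connector docs. The helper names `pinot_read_options` and `read_pinot_table` are hypothetical and not part of this PR.

```python
from typing import Dict, Optional, Sequence


def pinot_read_options(
    table: str,
    table_type: str,
    controller: str = "localhost:9000",
    use_grpc_server: bool = False,
    push_down_filters: bool = True,
) -> Dict[str, str]:
    """Build a Spark option map for a Pinot read.

    The option keys are assumptions taken from the Spark 2 connector docs.
    """
    normalized = table_type.lower()
    if normalized not in {"realtime", "offline", "hybrid"}:
        raise ValueError(f"unsupported table type: {table_type!r}")
    return {
        "table": table,
        "tableType": normalized,
        "controller": controller,
        # Scan through the GRPC server endpoint instead of HTTP.
        "useGrpcServer": str(use_grpc_server).lower(),
        # Let the connector push supported filters down to Pinot.
        "usePushDownFilters": str(push_down_filters).lower(),
    }


def read_pinot_table(
    spark,
    options: Dict[str, str],
    columns: Optional[Sequence[str]] = None,
    condition: Optional[str] = None,
):
    """Load a Pinot table as a DataFrame using an existing SparkSession.

    Applying `condition` and `columns` gives Spark the chance to use
    filter push down and column pruning.
    """
    df = spark.read.format("pinot").options(**options).load()
    if condition is not None:
        df = df.filter(condition)
    if columns:
        df = df.select(*columns)
    return df
```

With a running SparkSession and the connector jar on the classpath, a call might look like `read_pinot_table(spark, pinot_read_options("myTable", "hybrid"), columns=["colA"], condition="colB > 10")`, where the table and column names are placeholders.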
   
   **Testing**
   I added basic unit test coverage as well as a set of integration tests 
under `ExampleSparkPinotConnectorTest`, similar to the Spark2 Connector. 
   
   `feature`
   `release-notes` (Added Spark3 support for Pinot Spark Connector)
   


