cbalci opened a new pull request, #10394: URL: https://github.com/apache/pinot/pull/10394
**Background**

Apache Spark [changed](https://blog.madhukaraphatak.com/spark-3-datasource-v2-part-3) the Datasource interface significantly between Spark2 and Spark3, so the current pinot-spark-connector doesn't work with Spark3. In a previous PR (#10321) I refactored the spark-connector into two modules (`pinot-spark-common` and `pinot-spark-2-connector`) so that shared logic can be reused, which gives us a clean base for implementing the new version.

**Change**

In this PR I'm implementing the DataSourceV2 interface as published by Spark3. Functionality is exactly the same as the Pinot Spark 2 Connector, and it supports all existing configuration options, such as:

- Ability to read from REALTIME, OFFLINE or HYBRID Pinot tables
- Ability to scan using HTTP or GRPC server endpoints
- Column pruning and filter push down
- etc. (see docs)

It can be used as a drop-in replacement when migrating from Spark2 to Spark3. Spark3 also brings new features and improvements, such as 'Aggregation push down', which can be taken advantage of in the future.

**Testing**

I added basic unit test coverage as well as a set of integration tests under `ExampleSparkPinotConnectorTest`, similar to the Spark2 Connector.

`feature` `release-notes` (Added Spark3 support for Pinot Spark Connector)
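For reviewers who want to try the connector, a read could look roughly like the PySpark sketch below. It is only an illustration under a few assumptions: the Spark3 connector jar is on the Spark classpath and registers the `pinot` format name, and the option keys `table`, `tableType`, `useGrpcServer` and `usePushDownFilters` carry over unchanged from the Spark 2 connector. The helper names and the `airlineStats` table are hypothetical and not part of this PR.

```python
from typing import Dict

# Table types listed in the PR description (assumed to be accepted in lowercase).
_TABLE_TYPES = {"offline", "realtime", "hybrid"}


def pinot_read_options(
    table: str,
    table_type: str = "hybrid",
    use_grpc: bool = False,
    push_down_filters: bool = True,
) -> Dict[str, str]:
    """Build the option map for a Pinot read (hypothetical helper).

    The option keys are assumed to match the Spark 2 connector's documented ones.
    """
    normalized = table_type.lower()
    if normalized not in _TABLE_TYPES:
        raise ValueError(f"unsupported table type: {table_type!r}")
    return {
        "table": table,
        "tableType": normalized,
        # Spark data source options are passed as strings.
        "useGrpcServer": str(use_grpc).lower(),
        "usePushDownFilters": str(push_down_filters).lower(),
    }


def read_pinot_table(spark, options: Dict[str, str]):
    """Load a Pinot table as a DataFrame.

    `spark` is an existing SparkSession; the "pinot" format name is an
    assumption about how the connector registers itself.
    """
    return spark.read.format("pinot").options(**options).load()


if __name__ == "__main__":
    # Only prints the option map; an actual read needs a Spark session,
    # the connector jar, and a reachable Pinot cluster.
    print(pinot_read_options("airlineStats", "OFFLINE", use_grpc=True))
```

With a live session, `read_pinot_table(spark, opts).select("Carrier").filter("Distance > 500")` would exercise column pruning and filter push down, assuming those column names exist in the table.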