Copilot commented on code in PR #17449: URL: https://github.com/apache/pinot/pull/17449#discussion_r2656427801
########## AGENTS.md: ########## @@ -0,0 +1,107 @@ +# Apache Pinot - AGENTS Guide + +This file provides quick, practical guidance for coding agents working in this +repo. It is intentionally short and focused on day-to-day work. + +## Project overview +- Apache Pinot is a real-time distributed OLAP datastore for low-latency + analytics over streaming and batch data. +- Core runtime roles: broker (query routing), server (segment storage/execution), + controller (cluster metadata/management), minion (async tasks). + +## Repository layout (high level) +- pinot-broker: broker query planning and scatter-gather. +- pinot-controller: controller APIs, table/segment metadata, Helix management. +- pinot-server: server query execution, segment loading, indexing. +- pinot-minion: background tasks (segment conversion, purge, etc). +- pinot-common / pinot-spi: shared utils, config, and SPI interfaces. +- pinot-segment-local / pinot-segment-spi: segment generation, indexes, storage. +- pinot-query-planner / pinot-query-runtime: multi-stage query (MSQ) engine. +- pinot-connectors: external tooling to connect to Pinot +- pinot-plugins: all pinot plugins. +- pinot-tools: CLI and quickstart scripts. +- pinot-integration-tests: end-to-end validation suites. +- pinot-distribution: packaging artifacts. + +## pinot-plugins modules +- pinot-input-format: input format plugin family. + - pinot-arrow: Apache Arrow input format support. + - pinot-avro: Avro input format support. + - pinot-avro-base: shared Avro utilities and base classes. + - pinot-clp-log: CLP log input format support. + - pinot-confluent-avro: Confluent Schema Registry Avro input support. + - pinot-confluent-json: Confluent Schema Registry JSON input support. + - pinot-confluent-protobuf: Confluent Schema Registry Protobuf input support. + - pinot-orc: ORC input format support. + - pinot-json: JSON input format support. + - pinot-parquet: Parquet input format support. + - pinot-csv: CSV input format support. 
+ - pinot-thrift: Thrift input format support. + - pinot-protobuf: Protobuf input format support. +- pinot-file-system: filesystem plugin family. + - pinot-adls: Azure Data Lake Storage (ADLS) filesystem support. + - pinot-hdfs: Hadoop HDFS filesystem support. + - pinot-gcs: Google Cloud Storage filesystem support. + - pinot-s3: Amazon S3 filesystem support. +- pinot-batch-ingestion: batch ingestion plugin family. + - pinot-batch-ingestion-common: shared batch ingestion APIs and utilities. + - pinot-batch-ingestion-spark-base: shared Spark ingestion base classes. + - pinot-batch-ingestion-spark-2.4: Spark 2.4 ingestion implementation. + - pinot-batch-ingestion-spark-3: Spark 3 ingestion implementation. + - pinot-batch-ingestion-hadoop: Hadoop MapReduce ingestion implementation. + - pinot-batch-ingestion-standalone: standalone batch ingestion implementation. +- pinot-stream-ingestion: stream ingestion plugin family. + - pinot-kafka-base: shared Kafka ingestion base classes. + - pinot-kafka-2.0: Kafka 2.x ingestion implementation. + - pinot-kafka-3.0: Kafka 3.x ingestion implementation. + - pinot-kinesis: AWS Kinesis ingestion implementation. + - pinot-pulsar: Apache Pulsar ingestion implementation. +- pinot-minion-tasks: minion task plugin family. + - pinot-minion-builtin-tasks: built-in minion task implementations. +- pinot-metrics: metrics reporter plugin family. + - pinot-dropwizard: Dropwizard Metrics reporter implementation. + - pinot-yammer: Yammer Metrics reporter implementation. + - pinot-compound-metrics: compound metrics implementation. +- pinot-segment-writer: segment writer plugin family. + - pinot-segment-writer-file-based: file-based segment writer implementation. +- pinot-segment-uploader: segment uploader plugin family. + - pinot-segment-uploader-default: default segment uploader implementation. +- pinot-environment: environment provider plugin family. + - pinot-azure: Azure environment provider implementation. 
+- pinot-timeseries-lang: time series language plugin family. + - pinot-timeseries-m3ql: M3QL language plugin implementation. +- assembly-descriptor: Maven assembly descriptor for plugin packaging. + +## Build and test +- Default build: `./mvnw clean install` +- Faster dev build: `./mvnw verify -Ppinot-fastdev` +- Full binary/shaded build: + `./mvnw clean install -DskipTests -Pbin-dist -Pbuild-shaded-jar` +- Build a module with deps: `./mvnw -pl pinot-server -am test` +- Single test: `./mvnw -pl pinot-segment-local -Dtest=ClassName test` +- Quickstart (after build): `build/bin/quick-start-batch.sh` + +## Integration tests +- Single integration test: `./mvnw -pl pinot-integration-tests -Dtest=ClassName test -am -Dsurefire.failIfNoSpecifiedTests=false` + +## Coding conventions and hygiene +- Add class-level Javadoc for new classes; describe behavior and thread-safety. +- Use `/** ... */` Javadoc blocks; `///` is also supported on newer JDKs. Review Comment: This line should clarify that `///` syntax is supported in JDK 23+ as specified in JEP-467, not just 'newer JDKs'. This aligns with the custom coding guidelines which reference JEP-467 specifically. ########## AGENTS.md: ########## @@ -0,0 +1,107 @@ +# Apache Pinot - AGENTS Guide + +This file provides quick, practical guidance for coding agents working in this +repo. It is intentionally short and focused on day-to-day work. + +## Project overview +- Apache Pinot is a real-time distributed OLAP datastore for low-latency + analytics over streaming and batch data. +- Core runtime roles: broker (query routing), server (segment storage/execution), + controller (cluster metadata/management), minion (async tasks). + +## Repository layout (high level) +- pinot-broker: broker query planning and scatter-gather. +- pinot-controller: controller APIs, table/segment metadata, Helix management. +- pinot-server: server query execution, segment loading, indexing. +- pinot-minion: background tasks (segment conversion, purge, etc). 
+- pinot-common / pinot-spi: shared utils, config, and SPI interfaces. +- pinot-segment-local / pinot-segment-spi: segment generation, indexes, storage. +- pinot-query-planner / pinot-query-runtime: multi-stage query (MSQ) engine. +- pinot-connectors: external tooling to connect to Pinot +- pinot-plugins: all pinot plugins. +- pinot-tools: CLI and quickstart scripts. +- pinot-integration-tests: end-to-end validation suites. +- pinot-distribution: packaging artifacts. + +## pinot-plugins modules +- pinot-input-format: input format plugin family. + - pinot-arrow: Apache Arrow input format support. + - pinot-avro: Avro input format support. + - pinot-avro-base: shared Avro utilities and base classes. + - pinot-clp-log: CLP log input format support. + - pinot-confluent-avro: Confluent Schema Registry Avro input support. + - pinot-confluent-json: Confluent Schema Registry JSON input support. + - pinot-confluent-protobuf: Confluent Schema Registry Protobuf input support. + - pinot-orc: ORC input format support. + - pinot-json: JSON input format support. + - pinot-parquet: Parquet input format support. + - pinot-csv: CSV input format support. + - pinot-thrift: Thrift input format support. + - pinot-protobuf: Protobuf input format support. +- pinot-file-system: filesystem plugin family. + - pinot-adls: Azure Data Lake Storage (ADLS) filesystem support. + - pinot-hdfs: Hadoop HDFS filesystem support. + - pinot-gcs: Google Cloud Storage filesystem support. + - pinot-s3: Amazon S3 filesystem support. +- pinot-batch-ingestion: batch ingestion plugin family. + - pinot-batch-ingestion-common: shared batch ingestion APIs and utilities. + - pinot-batch-ingestion-spark-base: shared Spark ingestion base classes. + - pinot-batch-ingestion-spark-2.4: Spark 2.4 ingestion implementation. + - pinot-batch-ingestion-spark-3: Spark 3 ingestion implementation. + - pinot-batch-ingestion-hadoop: Hadoop MapReduce ingestion implementation. 
+ - pinot-batch-ingestion-standalone: standalone batch ingestion implementation. +- pinot-stream-ingestion: stream ingestion plugin family. + - pinot-kafka-base: shared Kafka ingestion base classes. + - pinot-kafka-2.0: Kafka 2.x ingestion implementation. + - pinot-kafka-3.0: Kafka 3.x ingestion implementation. + - pinot-kinesis: AWS Kinesis ingestion implementation. + - pinot-pulsar: Apache Pulsar ingestion implementation. +- pinot-minion-tasks: minion task plugin family. + - pinot-minion-builtin-tasks: built-in minion task implementations. +- pinot-metrics: metrics reporter plugin family. + - pinot-dropwizard: Dropwizard Metrics reporter implementation. + - pinot-yammer: Yammer Metrics reporter implementation. + - pinot-compound-metrics: compound metrics implementation. +- pinot-segment-writer: segment writer plugin family. + - pinot-segment-writer-file-based: file-based segment writer implementation. +- pinot-segment-uploader: segment uploader plugin family. + - pinot-segment-uploader-default: default segment uploader implementation. +- pinot-environment: environment provider plugin family. + - pinot-azure: Azure environment provider implementation. +- pinot-timeseries-lang: time series language plugin family. + - pinot-timeseries-m3ql: M3QL language plugin implementation. +- assembly-descriptor: Maven assembly descriptor for plugin packaging. + +## Build and test +- Default build: `./mvnw clean install` +- Faster dev build: `./mvnw verify -Ppinot-fastdev` +- Full binary/shaded build: + `./mvnw clean install -DskipTests -Pbin-dist -Pbuild-shaded-jar` +- Build a module with deps: `./mvnw -pl pinot-server -am test` +- Single test: `./mvnw -pl pinot-segment-local -Dtest=ClassName test` +- Quickstart (after build): `build/bin/quick-start-batch.sh` + +## Integration tests +- Single integration test: `./mvnw -pl pinot-integration-tests -Dtest=ClassName test -am -Dsurefire.failIfNoSpecifiedTests=false` Review Comment: The command uses a generic placeholder 'ClassName'. 
Consider providing a concrete example (e.g., 'RangeIndexTest') to make this more practical and immediately usable, which aligns with the stated goal of 'quick, practical guidance'. ```suggestion - Single test: `./mvnw -pl pinot-segment-local -Dtest=RangeIndexTest test` - Quickstart (after build): `build/bin/quick-start-batch.sh` ## Integration tests - Single integration test: `./mvnw -pl pinot-integration-tests -Dtest=OfflineClusterIntegrationTest test -am -Dsurefire.failIfNoSpecifiedTests=false` ``` ########## AGENTS.md: ########## @@ -0,0 +1,107 @@ +# Apache Pinot - AGENTS Guide + +This file provides quick, practical guidance for coding agents working in this +repo. It is intentionally short and focused on day-to-day work. + +## Project overview +- Apache Pinot is a real-time distributed OLAP datastore for low-latency + analytics over streaming and batch data. +- Core runtime roles: broker (query routing), server (segment storage/execution), + controller (cluster metadata/management), minion (async tasks). + +## Repository layout (high level) +- pinot-broker: broker query planning and scatter-gather. +- pinot-controller: controller APIs, table/segment metadata, Helix management. +- pinot-server: server query execution, segment loading, indexing. +- pinot-minion: background tasks (segment conversion, purge, etc). +- pinot-common / pinot-spi: shared utils, config, and SPI interfaces. +- pinot-segment-local / pinot-segment-spi: segment generation, indexes, storage. +- pinot-query-planner / pinot-query-runtime: multi-stage query (MSQ) engine. +- pinot-connectors: external tooling to connect to Pinot +- pinot-plugins: all pinot plugins. +- pinot-tools: CLI and quickstart scripts. +- pinot-integration-tests: end-to-end validation suites. +- pinot-distribution: packaging artifacts. + +## pinot-plugins modules +- pinot-input-format: input format plugin family. + - pinot-arrow: Apache Arrow input format support. + - pinot-avro: Avro input format support. 
+ - pinot-avro-base: shared Avro utilities and base classes. + - pinot-clp-log: CLP log input format support. + - pinot-confluent-avro: Confluent Schema Registry Avro input support. + - pinot-confluent-json: Confluent Schema Registry JSON input support. + - pinot-confluent-protobuf: Confluent Schema Registry Protobuf input support. + - pinot-orc: ORC input format support. + - pinot-json: JSON input format support. + - pinot-parquet: Parquet input format support. + - pinot-csv: CSV input format support. + - pinot-thrift: Thrift input format support. + - pinot-protobuf: Protobuf input format support. +- pinot-file-system: filesystem plugin family. + - pinot-adls: Azure Data Lake Storage (ADLS) filesystem support. + - pinot-hdfs: Hadoop HDFS filesystem support. + - pinot-gcs: Google Cloud Storage filesystem support. + - pinot-s3: Amazon S3 filesystem support. +- pinot-batch-ingestion: batch ingestion plugin family. + - pinot-batch-ingestion-common: shared batch ingestion APIs and utilities. + - pinot-batch-ingestion-spark-base: shared Spark ingestion base classes. + - pinot-batch-ingestion-spark-2.4: Spark 2.4 ingestion implementation. + - pinot-batch-ingestion-spark-3: Spark 3 ingestion implementation. + - pinot-batch-ingestion-hadoop: Hadoop MapReduce ingestion implementation. + - pinot-batch-ingestion-standalone: standalone batch ingestion implementation. +- pinot-stream-ingestion: stream ingestion plugin family. + - pinot-kafka-base: shared Kafka ingestion base classes. + - pinot-kafka-2.0: Kafka 2.x ingestion implementation. + - pinot-kafka-3.0: Kafka 3.x ingestion implementation. + - pinot-kinesis: AWS Kinesis ingestion implementation. + - pinot-pulsar: Apache Pulsar ingestion implementation. +- pinot-minion-tasks: minion task plugin family. + - pinot-minion-builtin-tasks: built-in minion task implementations. +- pinot-metrics: metrics reporter plugin family. + - pinot-dropwizard: Dropwizard Metrics reporter implementation. 
+ - pinot-yammer: Yammer Metrics reporter implementation. + - pinot-compound-metrics: compound metrics implementation. +- pinot-segment-writer: segment writer plugin family. + - pinot-segment-writer-file-based: file-based segment writer implementation. +- pinot-segment-uploader: segment uploader plugin family. + - pinot-segment-uploader-default: default segment uploader implementation. +- pinot-environment: environment provider plugin family. + - pinot-azure: Azure environment provider implementation. +- pinot-timeseries-lang: time series language plugin family. + - pinot-timeseries-m3ql: M3QL language plugin implementation. +- assembly-descriptor: Maven assembly descriptor for plugin packaging. + +## Build and test +- Default build: `./mvnw clean install` +- Faster dev build: `./mvnw verify -Ppinot-fastdev` +- Full binary/shaded build: + `./mvnw clean install -DskipTests -Pbin-dist -Pbuild-shaded-jar` +- Build a module with deps: `./mvnw -pl pinot-server -am test` +- Single test: `./mvnw -pl pinot-segment-local -Dtest=ClassName test` +- Quickstart (after build): `build/bin/quick-start-batch.sh` + +## Integration tests +- Single integration test: `./mvnw -pl pinot-integration-tests -Dtest=ClassName test -am -Dsurefire.failIfNoSpecifiedTests=false` Review Comment: Similar to the unit test command, using a concrete example instead of 'ClassName' (e.g., 'ClusterIntegrationTest') would provide more practical, immediately usable guidance. ```suggestion - Single integration test: `./mvnw -pl pinot-integration-tests -Dtest=ClusterIntegrationTest test -am -Dsurefire.failIfNoSpecifiedTests=false` ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
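For reference on the first review comment above (`/** ... */` Javadoc blocks versus the `///` form from JEP 467), here is a minimal sketch of the two styles side by side. The class `ExampleSegmentLabel` and its members are hypothetical and exist only for illustration; they are not part of the Pinot codebase. The file compiles on older JDKs as well, since `///` lines are ordinary line comments to the compiler. Only the `javadoc` tool on JDK 23+ treats them as documentation.

```java
/**
 * Classic Javadoc block comment, understood by the javadoc tool on every JDK.
 *
 * <p>Hypothetical example class (not from the Pinot codebase). Thread-safety:
 * instances are immutable and can be shared across threads.
 */
final class ExampleSegmentLabel {
  private final String _tableName;
  private final int _sequenceId;

  ExampleSegmentLabel(String tableName, int sequenceId) {
    _tableName = tableName;
    _sequenceId = sequenceId;
  }

  /// Markdown documentation comment (JEP 467).
  ///
  /// The `javadoc` tool renders this as documentation on JDK 23+; on older
  /// JDKs these lines are plain line comments and produce no API docs.
  String render() {
    return _tableName + "_" + _sequenceId;
  }

  public static void main(String[] args) {
    // Small usage example for the hypothetical class.
    System.out.println(new ExampleSegmentLabel("myTable", 3).render());
  }
}
```

If the build has to support JDKs older than 23, the `/** ... */` form is the safer default, because documentation written with `///` would be missing from generated Javadoc there.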
