Copilot commented on code in PR #17449:
URL: https://github.com/apache/pinot/pull/17449#discussion_r2656397435

##########
AGENTS.md:
##########
@@ -0,0 +1,106 @@
+# Apache Pinot - AGENTS Guide
+
+This file provides quick, practical guidance for coding agents working in this
+repo. It is intentionally short and focused on day-to-day work.
+
+## Project overview
+- Apache Pinot is a real-time distributed OLAP datastore for low-latency
+  analytics over streaming and batch data.
+- Core runtime roles: broker (query routing), server (segment storage/execution),
+  controller (cluster metadata/management), minion (async tasks).
+
+## Repository layout (high level)
+- pinot-broker: broker query planning and scatter-gather.
+- pinot-controller: controller APIs, table/segment metadata, Helix management.
+- pinot-server: server query execution, segment loading, indexing.
+- pinot-minion: background tasks (segment conversion, purge, etc).
+- pinot-common / pinot-spi: shared utils, config, and SPI interfaces.
+- pinot-segment-local / pinot-segment-spi: segment generation, indexes, storage.
+- pinot-query-planner / pinot-query-runtime: multi-stage query (MSQ) engine.
+- pinot-connectors: external tooling to connect to Pinot
+- pinot-plugins: all pinot plugins.
+- pinot-tools: CLI and quickstart scripts.
+- pinot-integration-tests: end-to-end validation suites.
+- pinot-distribution: packaging artifacts.
+
+## pinot-plugins modules
+- pinot-input-format: input format plugin family.
+  - pinot-arrow: Apache Arrow input format support.
+  - pinot-avro: Avro input format support.
+  - pinot-avro-base: shared Avro utilities and base classes.
+  - pinot-clp-log: CLP log input format support.
+  - pinot-confluent-avro: Confluent Schema Registry Avro input support.
+  - pinot-confluent-json: Confluent Schema Registry JSON input support.
+  - pinot-confluent-protobuf: Confluent Schema Registry Protobuf input support.
+  - pinot-orc: ORC input format support.
+  - pinot-json: JSON input format support.
+  - pinot-parquet: Parquet input format support.
+  - pinot-csv: CSV input format support.
+  - pinot-thrift: Thrift input format support.
+  - pinot-protobuf: Protobuf input format support.
+- pinot-file-system: filesystem plugin family.
+  - pinot-adls: Azure Data Lake Storage (ADLS) filesystem support.
+  - pinot-hdfs: Hadoop HDFS filesystem support.
+  - pinot-gcs: Google Cloud Storage filesystem support.
+  - pinot-s3: Amazon S3 filesystem support.
+- pinot-batch-ingestion: batch ingestion plugin family.
+  - pinot-batch-ingestion-common: shared batch ingestion APIs and utilities.
+  - pinot-batch-ingestion-spark-base: shared Spark ingestion base classes.
+  - pinot-batch-ingestion-spark-2.4: Spark 2.4 ingestion implementation.
+  - pinot-batch-ingestion-spark-3: Spark 3 ingestion implementation.
+  - pinot-batch-ingestion-hadoop: Hadoop MapReduce ingestion implementation.
+  - pinot-batch-ingestion-standalone: standalone batch ingestion implementation.
+- pinot-stream-ingestion: stream ingestion plugin family.
+  - pinot-kafka-base: shared Kafka ingestion base classes.
+  - pinot-kafka-2.0: Kafka 2.x ingestion implementation.
+  - pinot-kafka-3.0: Kafka 3.x ingestion implementation.
+  - pinot-kinesis: AWS Kinesis ingestion implementation.
+  - pinot-pulsar: Apache Pulsar ingestion implementation.
+- pinot-minion-tasks: minion task plugin family.
+  - pinot-minion-builtin-tasks: built-in minion task implementations.
+- pinot-metrics: metrics reporter plugin family.
+  - pinot-dropwizard: Dropwizard Metrics reporter implementation.
+  - pinot-yammer: Yammer Metrics reporter implementation.
+  - pinot-compound-metrics: compound metrics implementation.
+- pinot-segment-writer: segment writer plugin family.
+  - pinot-segment-writer-file-based: file-based segment writer implementation.
+- pinot-segment-uploader: segment uploader plugin family.
+  - pinot-segment-uploader-default: default segment uploader implementation.
+- pinot-environment: environment provider plugin family.
+  - pinot-azure: Azure environment provider implementation.
+- pinot-timeseries-lang: time series language plugin family.
+  - pinot-timeseries-m3ql: M3QL language plugin implementation.
+- assembly-descriptor: Maven assembly descriptor for plugin packaging.
+
+## Build and test
+- Default build: `./mvnw clean install`
+- Faster dev build: `./mvnw verify -Ppinot-fastdev`
+- Full binary/shaded build:
+  `./mvnw clean install -DskipTests -Pbin-dist -Pbuild-shaded-jar`
+- Build a module with deps: `./mvnw -pl pinot-server -am test`
+- Single test: `./mvnw -pl pinot-segment-local -Dtest=ClassName test`
+- Quickstart (after build): `build/bin/quick-start-batch.sh`
+
+## Integration tests
+- Single integration test: `./mvnw -pl pinot-integration-tests -Dtest=ClassName test -am -Dsurefire.failIfNoSpecifiedTests=false`
+
+## Coding conventions and hygiene
+- Add class-level Javadoc for new classes; describe behavior and thread-safety.
+- Keep license headers on all new source files.
+- Use `./mvnw license:format` to add headers to new files.
+- Preserve backward compatibility across mixed-version broker/server/controller.
+- Prefer targeted unit tests; use integration tests when behavior crosses roles.
+
+## Checkstyle config
+- Checkstyle rules and related config files live under `config/`.
+- Use `./mvnw spotless:apply checkstyle:check` to ensure check style passed.

Review Comment:
Corrected 'check style' to 'checkstyle' for consistency with the section heading and standard terminology.
```suggestion
- Use `./mvnw spotless:apply checkstyle:check` to ensure checkstyle passed.
```

##########
AGENTS.md:
##########
@@ -0,0 +1,106 @@
+# Apache Pinot - AGENTS Guide
+
+This file provides quick, practical guidance for coding agents working in this
+repo. It is intentionally short and focused on day-to-day work.
+
+## Project overview
+- Apache Pinot is a real-time distributed OLAP datastore for low-latency
+  analytics over streaming and batch data.
+- Core runtime roles: broker (query routing), server (segment storage/execution),
+  controller (cluster metadata/management), minion (async tasks).
+
+## Repository layout (high level)
+- pinot-broker: broker query planning and scatter-gather.
+- pinot-controller: controller APIs, table/segment metadata, Helix management.
+- pinot-server: server query execution, segment loading, indexing.
+- pinot-minion: background tasks (segment conversion, purge, etc).
+- pinot-common / pinot-spi: shared utils, config, and SPI interfaces.
+- pinot-segment-local / pinot-segment-spi: segment generation, indexes, storage.
+- pinot-query-planner / pinot-query-runtime: multi-stage query (MSQ) engine.
+- pinot-connectors: external tooling to connect to Pinot
+- pinot-plugins: all pinot plugins.
+- pinot-tools: CLI and quickstart scripts.
+- pinot-integration-tests: end-to-end validation suites.
+- pinot-distribution: packaging artifacts.
+
+## pinot-plugins modules
+- pinot-input-format: input format plugin family.
+  - pinot-arrow: Apache Arrow input format support.
+  - pinot-avro: Avro input format support.
+  - pinot-avro-base: shared Avro utilities and base classes.
+  - pinot-clp-log: CLP log input format support.
+  - pinot-confluent-avro: Confluent Schema Registry Avro input support.
+  - pinot-confluent-json: Confluent Schema Registry JSON input support.
+  - pinot-confluent-protobuf: Confluent Schema Registry Protobuf input support.
+  - pinot-orc: ORC input format support.
+  - pinot-json: JSON input format support.
+  - pinot-parquet: Parquet input format support.
+  - pinot-csv: CSV input format support.
+  - pinot-thrift: Thrift input format support.
+  - pinot-protobuf: Protobuf input format support.
+- pinot-file-system: filesystem plugin family.
+  - pinot-adls: Azure Data Lake Storage (ADLS) filesystem support.
+  - pinot-hdfs: Hadoop HDFS filesystem support.
+  - pinot-gcs: Google Cloud Storage filesystem support.
+  - pinot-s3: Amazon S3 filesystem support.
+- pinot-batch-ingestion: batch ingestion plugin family.
+  - pinot-batch-ingestion-common: shared batch ingestion APIs and utilities.
+  - pinot-batch-ingestion-spark-base: shared Spark ingestion base classes.
+  - pinot-batch-ingestion-spark-2.4: Spark 2.4 ingestion implementation.
+  - pinot-batch-ingestion-spark-3: Spark 3 ingestion implementation.
+  - pinot-batch-ingestion-hadoop: Hadoop MapReduce ingestion implementation.
+  - pinot-batch-ingestion-standalone: standalone batch ingestion implementation.
+- pinot-stream-ingestion: stream ingestion plugin family.
+  - pinot-kafka-base: shared Kafka ingestion base classes.
+  - pinot-kafka-2.0: Kafka 2.x ingestion implementation.
+  - pinot-kafka-3.0: Kafka 3.x ingestion implementation.
+  - pinot-kinesis: AWS Kinesis ingestion implementation.
+  - pinot-pulsar: Apache Pulsar ingestion implementation.
+- pinot-minion-tasks: minion task plugin family.
+  - pinot-minion-builtin-tasks: built-in minion task implementations.
+- pinot-metrics: metrics reporter plugin family.
+  - pinot-dropwizard: Dropwizard Metrics reporter implementation.
+  - pinot-yammer: Yammer Metrics reporter implementation.
+  - pinot-compound-metrics: compound metrics implementation.
+- pinot-segment-writer: segment writer plugin family.
+  - pinot-segment-writer-file-based: file-based segment writer implementation.
+- pinot-segment-uploader: segment uploader plugin family.
+  - pinot-segment-uploader-default: default segment uploader implementation.
+- pinot-environment: environment provider plugin family.
+  - pinot-azure: Azure environment provider implementation.
+- pinot-timeseries-lang: time series language plugin family.
+  - pinot-timeseries-m3ql: M3QL language plugin implementation.
+- assembly-descriptor: Maven assembly descriptor for plugin packaging.
+
+## Build and test
+- Default build: `./mvnw clean install`
+- Faster dev build: `./mvnw verify -Ppinot-fastdev`
+- Full binary/shaded build:
+  `./mvnw clean install -DskipTests -Pbin-dist -Pbuild-shaded-jar`
+- Build a module with deps: `./mvnw -pl pinot-server -am test`
+- Single test: `./mvnw -pl pinot-segment-local -Dtest=ClassName test`
+- Quickstart (after build): `build/bin/quick-start-batch.sh`
+
+## Integration tests
+- Single integration test: `./mvnw -pl pinot-integration-tests -Dtest=ClassName test -am -Dsurefire.failIfNoSpecifiedTests=false`
+
+## Coding conventions and hygiene
+- Add class-level Javadoc for new classes; describe behavior and thread-safety.
+- Keep license headers on all new source files.
+- Use `./mvnw license:format` to add headers to new files.
+- Preserve backward compatibility across mixed-version broker/server/controller.
+- Prefer targeted unit tests; use integration tests when behavior crosses roles.
+
+## Checkstyle config
+- Checkstyle rules and related config files live under `config/`.
+- Use `./mvnw spotless:apply checkstyle:check` to ensure check style passed.

Review Comment:
Changed 'passed' to 'passes' for grammatical correctness.
```suggestion
- Use `./mvnw spotless:apply checkstyle:check` to ensure check style passes.
```
