This is an automated email from the ASF dual-hosted git repository. mattisonchao pushed a commit to branch mattison/pip-298-authorization-metrics in repository https://gitbox.apache.org/repos/asf/pulsar.git
commit 0703dbd7af64098007e46d108f6077a427873c1f Author: mattisonchao <[email protected]> AuthorDate: Mon Apr 13 18:28:16 2026 +0800 Move authorization metrics proposal to PIP-466 --- pip/pip-298.md | 309 ++++++++++++++++++++--------------------- pip/{pip-298.md => pip-466.md} | 2 +- 2 files changed, 149 insertions(+), 162 deletions(-) diff --git a/pip/pip-298.md b/pip/pip-298.md index 7b4123f178b..ec154f228c8 100644 --- a/pip/pip-298.md +++ b/pip/pip-298.md @@ -1,212 +1,199 @@ -# PIP-298: Authorization Operation Metrics +# PIP-298: Support read transaction buffer snapshot segments from earliest -# Background knowledge +# Background -Pulsar brokers perform authorization checks before allowing clients, proxies, and administrative callers to access -topics, namespaces, tenants, brokers, and policy operations. These checks are enforced through the broker-side -`AuthorizationService`, which delegates the decision to the configured `AuthorizationProvider`. - -Pulsar already exposes several security-related metrics, especially around authentication. These metrics are useful for -detecting login failures, unhealthy client behavior, and changes in access patterns. However, Pulsar does not expose an -equivalent broker-level metric stream for authorization outcomes. In practice, authorization failures are primarily -visible through logs and request failures rather than through a dedicated metric. - -Pulsar also supports OpenTelemetry metrics. For operational consistency, new broker observability features should align -Prometheus-style metrics with OpenTelemetry counters rather than introducing instrumentation in only one pipeline. +In the implementation of the Pulsar Transaction, each topic is configured with a `Transaction Buffer` to prevent +consumers from reading uncommitted messages, which are invisible until the transaction is committed. Transaction Buffer +works with Position (maxReadPosition) and `TxnID` Set (aborts). The broker only dispatches messages, before the +maxReadPosition, to the consumers. When the broker dispatches the messages before maxReadPosition to the consumer, the +messages sent by aborted transactions will get filtered by the Transaction Buffer. # Motivation -Operators need a low-cardinality, broker-native signal that shows whether authorization checks are succeeding or -failing. This is needed for security alerting, baseline monitoring, and compliance reporting. - -Without a dedicated authorization metric, operators have to infer authorization denials from logs, HTTP status codes, -or client-side errors. That is brittle and does not support standard monitoring patterns such as: - -- Alerting on spikes in authorization failures. -- Comparing authorization failures against successful authorizations. -- Building dashboards that differentiate between authentication problems and authorization problems. -- Exporting equivalent signals through both Prometheus and OpenTelemetry. - -The lack of a generic metric also encourages overly narrow designs such as a failure-only counter. That limits -observability because operators often need both success and failure counts to understand whether a denial spike reflects -an attack, a rollout problem, or a normal traffic shift. - -# Goals +Currently, Pulsar transactions do not have configurable isolation levels. By introducing isolation level configuration +for consumers, we can enhance the flexibility of Pulsar transactions. + +Let's consider an example: + +**System**: Financial Transaction System + +**Operations**: Large volume of deposit and withdrawal operations, a +small number of transfer operations. + +**Roles**: + +- **Client A1** +- **Client A2** +- **User Account B1** +- **User Account B2** +- **Request Topic C** +- **Real-time Monitoring System D** +- **Business Processing System E** + +**Client Operations**: + +- **Withdrawal**: Client A1 decreases the deposit amount from User + Account B1 or B2. +- **Deposit**: Client A1 increases the deposit amount in User Account B1 or B2. +- **Transfer**: Client A2 decreases the deposit amount from User + Account B1 and increases it in User Account B2. Or vice versa. + +**Real-time Monitoring System D**: Obtains the latest data from +Request Topic C as quickly as possible to monitor transaction data and +changes in bank reserves in real-time. This is necessary for the +timely detection of anomalies and real-time decision-making. + +**Business Processing System E**: Reads data from Request Topic C, +then actually operates User Accounts B1, B2. + +**User Scenario**: Client A1 sends a large number of deposit and +withdrawal requests to Request Topic C. Client A2 writes a small +number of transfer requests to Request Topic C. + +In this case, Business Processing System E needs a read-committed +isolation level to ensure operation consistency and Exactly Once +semantics. The real-time monitoring system does not care if a small +number of transfer requests are incomplete (dirty data). What it +cannot tolerate is a situation where a large number of deposit and +withdrawal requests cannot be presented in real time due to a small +number of transfer requests (the current situation is that uncommitted +transaction messages can block the reading of committed transaction +messages). + +In this case, it is necessary to set different isolation levels for +different consumers/subscriptions. +The uncommitted transactions do not impact actual users' bank accounts. +Business Processing System E only reads committed transactional +messages and operates users' accounts. It needs Exactly-once semantic. +Real-time Monitoring System D reads uncommitted transactional +messages. It does not need Exactly-once semantic. + +They use different subscriptions and choose different isolation +levels. One needs transaction, one does not. +In general, multiple subscriptions of the same topic do not all +require transaction guarantees. +Some want low latency without the exact-once semantic guarantee, and +some must require the exactly-once guarantee. +We just provide a new option for different subscriptions. + +# Goal ## In Scope -- Add a low-cardinality broker authorization metric for operation outcomes. -- Record both successful and failed authorization decisions. -- Expose the metric through Prometheus-compatible broker metrics. -- Expose the same metric through OpenTelemetry. -- Centralize instrumentation in `AuthorizationService` so all broker authorization paths share the same metric model. +Implement Read Committed and Read Uncommitted isolation levels for Pulsar transactions. Allow consumers to configure +isolation levels during the building process. ## Out of Scope -- Per-role, per-topic, per-tenant, or per-principal labels. -- Audit-log payloads or structured security event streams. -- New authorization APIs or protocol changes. -- Alert rule definitions for downstream monitoring stacks. - +None. # High Level Design -Introduce a generic authorization operation counter that is incremented whenever the broker finishes an authorization -decision. - -The metric is recorded centrally in `AuthorizationService`, which already serves as the broker-side entry point for -authorization checks across topic, namespace, tenant, broker, cluster, and policy operations. Each authorization check -will emit one result with a small, fixed label set: - -- what kind of resource was checked -- what operation category was requested -- whether the result was a success or failure - -This metric will be exported in two equivalent forms: - -- a Prometheus counter for the existing broker metrics endpoint -- an OpenTelemetry counter for modern metrics pipelines - -Invalid original-principal combinations in proxied authorization flows will also be counted as authorization failures, -because they represent rejected authorization attempts from the broker’s perspective. +Add a configuration 'subscriptionIsolationLevel' in the consumer builder to allow users to choose different transaction +isolation levels. # Detailed Design -## Design & Implementation Details - -This proposal introduces a broker authorization metrics helper that owns: - -- a Prometheus counter for broker metrics scraping -- an OpenTelemetry `LongCounter` for broker metrics export - -The helper is instantiated by `AuthorizationService`. `AuthorizationService` records results after each completed -authorization decision. If the provider returns `true`, the helper records a success. If the provider returns `false`, -the helper records a failure. If `AuthorizationService` rejects a request before provider evaluation, such as an -invalid original-principal combination for proxied requests, the helper records a failure directly. - -The instrumentation is attached to the following authorization flows: - -- superuser checks -- tenant-admin checks -- tenant operations -- broker operations -- cluster operations -- cluster policy operations -- namespace operations -- namespace policy operations -- topic operations -- topic policy operations - -This proposal intentionally keeps the label space small. It does not include role names, topic names, tenant names, -client addresses, provider names, or error strings. - ## Public-facing Changes -### Public API - -No public API changes. - -### Binary protocol - -No binary protocol changes. +Update the PulsarConsumer builder process to include isolation level configurations for Read Committed and Read +Uncommitted. -### Configuration +### Before the Change -No new configuration is required. +The PulsarConsumer builder process currently does not include isolation level configurations. The consumer creation +process might look like this: -### CLI +``` +PulsarClient client = PulsarClient.builder().serviceUrl("pulsar://localhost:6650").build(); -No CLI changes. +Consumer<String> consumer = client.newConsumer(Schema.STRING) + .topic("persistent://my-tenant/my-namespace/my-topic") + .subscriptionName("my-subscription") + .subscriptionType(SubscriptionType.Shared) + .subscribe(); +``` -### Metrics +### After the Change -Prometheus metric: +Update the PulsarConsumer builder process to include isolation level configurations for Read Committed and Read +Uncommitted. Introduce a new method subscriptionIsolationLevel() in the consumer builder, which accepts an enumeration +value representing the isolation level: -- Full name: `pulsar_authorization_operations_total` -- Description: Total number of broker authorization operations. -- Attributes: - - `resource_type` - - `operation` - - `result` -- Unit: operations +``` +public enum SubscriptionIsolationLevel { + // Consumer can only consume all transactional messages which have been committed. + READ_COMMITTED, -OpenTelemetry metric: + // Consumer can consume all messages, even transactional messages which have been aborted. + READ_UNCOMMITTED; +} +``` -- Full name: `pulsar.authorization.operation.count` -- Description: Total number of broker authorization operations. -- Attributes: - - `pulsar.authorization.type` - - `pulsar.authorization.operation` - - `pulsar.authorization.result` -- Unit: `{operation}` +Then, modify the consumer creation process to include the new isolation level configuration: -Attribute values: +``` +PulsarClient client = PulsarClient.builder().serviceUrl("pulsar://localhost:6650").build(); -- `result`: `success` or `failure` -- `resource_type`: fixed categories such as `topic`, `namespace`, `tenant`, `broker`, `cluster`, `superuser`, - `tenant_admin`, `topic_policy`, `namespace_policy`, and `cluster_policy` -- `operation`: the normalized operation name for the authorization check +Consumer<String> consumer = client.newConsumer(Schema.STRING) + .topic("persistent://my-tenant/my-namespace/my-topic") + .subscriptionName("my-subscription") + .subscriptionType(SubscriptionType.Shared) + .subscriptionIsolationLevel(SubscriptionIsolationLevel.READ_COMMITTED) // Adding the isolation level configuration + .subscribe(); +``` +With this change, users can now choose between Read Committed and Read Uncommitted isolation levels when creating a new +consumer. If the isolationLevel() method is not called during the builder process, the default isolation level will be +Read Committed. +Note that this is a subscription dimension configuration, and all consumers under the same subscription need to be +configured with the same IsolationLevel. -# Monitoring - -Operators should monitor both absolute authorization failures and the relationship between failures and successes. -Recommended patterns include: - -- Alert on sustained increases in `result="failure"`. -- Build dashboards that show `success` and `failure` together by `resource_type`. -- Investigate rollout regressions by comparing the failure rate before and after authorization policy changes. -- Distinguish authentication incidents from authorization incidents by correlating authorization failures with existing - authentication metrics. - -This proposal intentionally enables ratio-based alerting, such as failure/success comparisons, by including both result -types in the same metric family. - -# Security Considerations +## Design & Implementation Details -This proposal improves security observability but does not change authorization semantics. +### Client Changes -Because authorization decisions can be high volume and can involve sensitive identifiers, the metric must avoid -high-cardinality or identity-bearing labels. This proposal therefore excludes role names, topic names, namespaces, -tenants, and client network information from metric attributes. That preserves operational usefulness without turning -the metric into a data-leak or cardinality risk. +Update the PulsarConsumer builder to accept isolation level configurations for Read Committed and Read Uncommitted levels. -Failed proxy original-principal validation is counted as an authorization failure because the broker rejects the -request during authorization handling. +In order to achieve the above goals, the following modifications need to be made: -# Backward & Forward Compatibility +- Added `IsolationLevel` related fields and methods in `ConsumerConfigurationData` and `ConsumerBuilderImpl` and `ConsumerImpl` -## Upgrade +- Modify PulsarApi.CommandSubscribe, add field -- IsolationLevel -No special upgrade action is required. The new metrics appear automatically after upgrading brokers that include this -feature. +``` +message CommandSubscribe { -## Downgrade / Rollback + enum IsolationLevel { + READ_COMMITTED = 0; + READ_UNCOMMITTED = 1; + } + optional IsolationLevel isolation_level = 20 [default = READ_COMMITTED]; +} +``` -Downgrading removes the metrics. Monitoring systems should tolerate missing-series behavior during rollback. +### Broker changes -## Pulsar Geo-Replication Upgrade & Downgrade/Rollback Considerations +Modify the transaction buffer and dispatching mechanisms to handle messages based on the chosen isolation level. -No geo-replication protocol or metadata compatibility changes are introduced. +In order to achieve the above goals, the following modifications need to be made: -# Alternatives +- Determine in the `readMoreEntries` method of Dispatchers such as `PersistentDispatcherSingleActiveConsumer` + and `PersistentDispatcherMultipleConsumers`: -- Failure-only counter: - Rejected because operators often need both success and failure counts to interpret changes correctly and to build - ratio-based alerts. + - If Subscription.isolationLevel == ReadCommitted, then MaxReadPosition = topic.getMaxReadPosition(), that is, + transactionBuffer.getMaxReadPosition() -- Add detailed identity labels such as role or topic: - Rejected due to cardinality and privacy concerns. + - If Subscription.isolationLevel == ReadUnCommitted, then MaxReadPosition = PositionImpl.LATEST -- Instrument each authorization call site independently: - Rejected because it would be error-prone and would likely produce inconsistent semantics across broker paths. +- Add a new metrics `subscriptionIsolationLevel` in `SubscriptionStatsImpl`. -# General Notes +# Monitoring -This proposal is intentionally limited to broker metrics. It does not attempt to replace audit logging or structured -security events. +After this PIP, Users can query the subscription stats of a topic through the admin tool, and observe the `subscriptionIsolationLevel` in the subscription stats to determine the isolation level of the subscription. # Links -* Mailing List discussion thread: -* Mailing List voting thread: +* Mailing List discussion thread: https://lists.apache.org/thread/8ny0qtp7m9qcdbvnfjdvpnkc4c5ssyld +* Mailing List voting thread: https://lists.apache.org/thread/4q1hrv466h8w9ccpf4moxt6jv1jxp1mr +* Document link: https://github.com/apache/pulsar-site/pull/712 diff --git a/pip/pip-298.md b/pip/pip-466.md similarity index 99% copy from pip/pip-298.md copy to pip/pip-466.md index 7b4123f178b..6e74a52bdef 100644 --- a/pip/pip-298.md +++ b/pip/pip-466.md @@ -1,4 +1,4 @@ -# PIP-298: Authorization Operation Metrics +# PIP-466: Authorization Operation Metrics # Background knowledge
