moomindani opened a new pull request, #16574:
URL: https://github.com/apache/iceberg/pull/16574

   Closes #16573.
   
   Adds an optional filtering layer above any `MetricsReporter` implementation 
that drops `ScanReport` and `CommitReport` instances whose `tableName()` does 
not pass the configured include / exclude regex. The filter applies uniformly 
to `LoggingMetricsReporter`, `RESTMetricsReporter`, and custom user-supplied 
reporters. The proposal surfaced in the dev@ DISCUSS thread for #16250 
(per-table cardinality of the OTel reporter) and is intentionally scoped as 
cross-reporter, not OTel-specific.
   
   ## Design
   
   `CatalogUtil.loadMetricsReporter` wraps the resolved reporter in a 
`FilteringMetricsReporter` when either of the new properties is set. When 
neither is set, the resolved reporter is returned unchanged: no wrapper is 
instantiated and the default path incurs no runtime overhead. `MetricsReport` 
subtypes that do not expose a table name (anything other than `ScanReport` / 
`CommitReport`) are forwarded without filtering.
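
For reviewers who want the shape of the wrapping without opening the diff, here is a minimal standalone sketch. It is not the code in this PR: `FilteringSketch`, `Report`, `Reporter`, `TableReport`, and `wrapIfConfigured` are hypothetical stand-ins for the Iceberg types, and full-string matching via `Matcher.matches()` is an assumption.

```java
import java.util.regex.Pattern;

// Standalone stand-ins for illustration only; these are NOT Iceberg's interfaces.
final class FilteringSketch {
  interface Report {}

  interface Reporter {
    void report(Report report);
  }

  // Stand-in for report types that carry a table name (ScanReport / CommitReport in the PR).
  static final class TableReport implements Report {
    final String tableName;

    TableReport(String tableName) {
      this.tableName = tableName;
    }
  }

  // Returns the delegate itself when neither pattern is configured, so the
  // default path allocates no wrapper. Otherwise returns a filtering wrapper.
  static Reporter wrapIfConfigured(Reporter delegate, String include, String exclude) {
    Pattern inc = isSet(include) ? Pattern.compile(include) : null;
    Pattern exc = isSet(exclude) ? Pattern.compile(exclude) : null;
    if (inc == null && exc == null) {
      return delegate;
    }
    return report -> {
      if (report instanceof TableReport) {
        String name = ((TableReport) report).tableName;
        if (exc != null && exc.matcher(name).matches()) {
          return; // excluded: drop
        }
        if (inc != null && !inc.matcher(name).matches()) {
          return; // include is set but does not match: drop
        }
      }
      // Reports without a table name are always forwarded.
      delegate.report(report);
    };
  }

  private static boolean isSet(String value) {
    return value != null && !value.isEmpty();
  }
}
```

The identity return on the unconfigured path is what keeps existing deployments unaffected in this sketch.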
   
   ## Configuration
   
   Two new catalog properties:
   
   ```
   metrics-reporter.table-name.include=prod_db\..*
   metrics-reporter.table-name.exclude=.*\.tmp_.*
   ```
   
   Values are Java regex patterns matched against the table name. When both are 
set, `exclude` takes precedence over `include`: a table name that matches both 
is dropped. Empty values are treated as not set, so that a misconfiguration 
cannot accidentally silence all metrics. Invalid regex values fail fast at 
catalog initialization with an error that names the offending property.
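
The empty-value and fail-fast rules above could be sketched roughly as follows. This is an illustration only: `RegexPropertySketch` and `compileProperty` are hypothetical names, and the use of `IllegalArgumentException` and the message wording are assumptions, not necessarily what the PR throws.

```java
import java.util.Map;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper for illustration; not the PR's actual code.
final class RegexPropertySketch {
  // Returns null when the property is missing or empty (treated as "not set").
  // Throws IllegalArgumentException naming the property when the regex is invalid.
  static Pattern compileProperty(Map<String, String> props, String key) {
    String value = props.get(key);
    if (value == null || value.isEmpty()) {
      return null;
    }
    try {
      return Pattern.compile(value);
    } catch (PatternSyntaxException e) {
      throw new IllegalArgumentException(
          String.format("Invalid regex for catalog property %s: %s", key, value), e);
    }
  }
}
```

Compiling once at initialization also means the pattern is not recompiled per report.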
   
   Behavior:
   
   - `include` only: forward reports whose table name matches; drop others.
   - `exclude` only: drop reports whose table name matches; forward others.
   - Both set: drop if `exclude` matches; otherwise forward only if `include` 
matches.
   - Neither set: forward everything (current behavior).
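
The four cases above reduce to a single predicate. The sketch below is a hedged illustration, not the PR's implementation: `TableNameFilter` and `shouldForward` are hypothetical names, and it assumes the pattern must match the whole table name (`Matcher.matches()`), which is consistent with the example values but not stated explicitly.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the PR's actual class.
final class TableNameFilter {
  private final Pattern include; // null when the include property is unset or empty
  private final Pattern exclude; // null when the exclude property is unset or empty

  TableNameFilter(String includeRegex, String excludeRegex) {
    this.include = compileOrNull(includeRegex);
    this.exclude = compileOrNull(excludeRegex);
  }

  private static Pattern compileOrNull(String regex) {
    return (regex == null || regex.isEmpty()) ? null : Pattern.compile(regex);
  }

  // Exclude is checked first so that it overrides include when both match.
  boolean shouldForward(String tableName) {
    if (exclude != null && exclude.matcher(tableName).matches()) {
      return false;
    }
    return include == null || include.matcher(tableName).matches();
  }
}
```

With the example values from the Configuration section, `prod_db.orders` would be forwarded, while `prod_db.tmp_orders` and `dev_db.orders` would be dropped.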
   
   This mirrors the existing `route-regex` pattern used in 
`iceberg-kafka-connect` (`IcebergSinkConfig`), where a user-supplied regex from 
configuration is compiled via `Pattern.compile()` and matched against incoming 
data. The trust model is the same: catalog properties are admin-controlled.
   
   ## Disclosure
   
   Per the project's [AI-assisted contribution 
guidelines](https://iceberg.apache.org/contribute/#guidelines-for-ai-assisted-contributions),
 I used Claude Code to help draft this work. I reviewed every change by hand 
and ran the full test/lint loop locally before opening this PR. The design and 
motivation discussion is in #16573.
   
   cc @ebyhr @jbonofre — happy to address any feedback.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

