moomindani opened a new issue, #16573:
URL: https://github.com/apache/iceberg/issues/16573

   ### Feature Request / Improvement
   
   Add a built-in mechanism that lets users restrict which tables' 
`ScanReport`s and `CommitReport`s are forwarded to a configured 
`MetricsReporter`, applied uniformly across all reporter implementations 
(`LoggingMetricsReporter`, `RESTMetricsReporter`, `OtelMetricsReporter`, and 
custom user-supplied ones).
   
   #### Motivation
   
   In deployments with many tables, users frequently want to emit metrics for 
only a subset:
   
   - Only tables in production databases (e.g., `prod.*`), not staging or 
sandbox
   - Only specific business-critical tables, excluding intermediate ones
   - Exclude noisy test or scratch tables (glob-style shorthand: `tmp.*`, 
`*.bench_*`)
   
   Existing per-reporter knobs only partially address this. The 
`iceberg.otel.metrics.attributes` allowlist added in #16250 controls *which 
attributes* an OTel metric carries. That helps bound cardinality, but it does 
not stop metrics from being emitted for tables the user doesn't care about. 
Cardinality-control mechanisms in time-series backends (OTel Views, Prometheus 
relabel rules, etc.) are specific to one reporter or backend and require 
host-side knowledge to configure.
   
   Table-level filtering is a cross-cutting concern that belongs above any 
single reporter. Putting it inside each reporter implementation would lead to 
repeated, slightly inconsistent flag sets per reporter. Putting it once in the 
framework layer means every existing and future `MetricsReporter` benefits 
without re-implementation.
   
   #### Proposal
   
   Introduce two catalog properties recognized by the catalog when constructing 
the reporter pipeline:
   
   ```
   metrics-reporter-impl=org.apache.iceberg.metrics.OtelMetricsReporter
   metrics-reporter.table-name.include=prod\..*
   metrics-reporter.table-name.exclude=.*\.tmp_.*
   ```
   
   Values are Java regex patterns matched against `ScanReport.tableName()` / 
`CommitReport.tableName()`. The catalog wraps the user's reporter in a 
filtering layer when either property is present. When both are present, 
`exclude` wins over `include` (an explicit deny overrides an include). When 
neither is set, behavior is identical to today (pass-through, with no runtime 
overhead).
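
   A minimal sketch of the wrapping step, under stated assumptions: `Report` 
and `Reporter` below are hypothetical stand-ins for Iceberg's report and 
`MetricsReporter` types so the example compiles without Iceberg on the 
classpath, and `wrap` is an illustrative name, not an existing API. A real 
implementation would presumably read `tableName()` from the concrete 
`ScanReport` / `CommitReport` instead.

```java
import java.util.Optional;
import java.util.function.Predicate;

public class FilteringReporterSketch {

  /** Hypothetical stand-in for a report that exposes a table name. */
  public interface Report {
    String tableName();
  }

  /** Hypothetical stand-in for a metrics reporter. */
  public interface Reporter {
    void report(Report report);
  }

  /**
   * Wraps the delegate only when a filter is configured. With no filter the
   * delegate itself is returned, so the unconfigured path adds no indirection.
   */
  public static Reporter wrap(Reporter delegate, Optional<Predicate<String>> tableFilter) {
    if (!tableFilter.isPresent()) {
      return delegate;
    }
    Predicate<String> filter = tableFilter.get();
    return report -> {
      // Forward only reports whose table name passes the filter; drop the rest.
      if (filter.test(report.tableName())) {
        delegate.report(report);
      }
    };
  }
}
```

   Returning the delegate unchanged when nothing is configured is one way to 
keep today's behavior byte-for-byte on the default path.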
   
   #### Behavior
   
   - `include` only set: forward reports whose table name matches; drop others.
   - `exclude` only set: drop reports whose table name matches; forward others.
   - Both set: drop if `exclude` matches; otherwise forward only if `include` 
matches.
   - Neither set: forward everything (current behavior).
   - An empty value (`metrics-reporter.table-name.include=`) is treated as 
"not set" rather than "match nothing", so that a misconfiguration cannot 
accidentally silence all metrics.
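
   The rules above can be sketched as a single predicate built from the 
catalog properties. This is an illustration, not the proposed implementation: 
the class and method names are hypothetical, and it assumes full-string 
matching (`Matcher.matches()`), which the proposal does not specify but which 
fits the example patterns `prod\..*` and `.*\.tmp_.*`.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class TableNameFilterSketch {
  public static final String INCLUDE = "metrics-reporter.table-name.include";
  public static final String EXCLUDE = "metrics-reporter.table-name.exclude";

  /** Returns empty when neither property is effectively set (pass-through). */
  public static Optional<Predicate<String>> fromProperties(Map<String, String> props) {
    Pattern include = compileOrNull(props.get(INCLUDE));
    Pattern exclude = compileOrNull(props.get(EXCLUDE));
    if (include == null && exclude == null) {
      return Optional.empty();
    }
    return Optional.of(tableName -> {
      // Exclude wins: an explicit deny overrides any include match.
      if (exclude != null && exclude.matcher(tableName).matches()) {
        return false;
      }
      // With no include pattern, everything not excluded is forwarded.
      return include == null || include.matcher(tableName).matches();
    });
  }

  /** Treats a missing or empty value as "not set" (assumed full-match regex otherwise). */
  private static Pattern compileOrNull(String value) {
    return (value == null || value.isEmpty()) ? null : Pattern.compile(value);
  }
}
```

   Compiling each pattern once at construction time keeps the per-report cost 
to at most two regex matches.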
   
   #### Relationship to existing work
   
   - #16169, #16250 — surfaced this concern during discussion of per-table 
cardinality of the OTel reporter. This proposal complements 
`iceberg.otel.metrics.attributes` (attribute pruning) by giving users a way to 
also drop entire reports for uninteresting tables.
   - dev@ DISCUSS for #16250: 
https://lists.apache.org/thread/vn4gglocg2g40p69mfrrh86qzkn1rr4b
   
   ### Query engine
   
   None — applies to all engines that consume `MetricsReporter`.
   
   ### Willingness to contribute
   
   - [X] I can contribute this improvement/feature independently
   - [ ] I would be willing to contribute this improvement/feature with 
guidance from the Iceberg community
   - [ ] I cannot contribute this improvement/feature at this time
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

