lhotari opened a new pull request, #25965: URL: https://github.com/apache/pulsar/pull/25965
### Motivation Pulsar still depends on `com.yahoo.datasketches:sketches-core:0.8.3`. The DataSketches project moved to the Apache Software Foundation and was renamed to `org.apache.datasketches:datasketches-java`; the old Yahoo coordinates have been unmaintained for years. This mirrors the motivation of apache/bookkeeper#4774. Pulsar `master` is still on BookKeeper 4.17.3 but will soon migrate to 4.18.0, which uses Apache DataSketches — and there is an intent to drop Yahoo DataSketches on the Pulsar side as well. Migrating now removes the unmaintained dependency and aligns Pulsar with BookKeeper. The Apache DataSketches maintainers recommend the **KLL sketch** over the classic quantiles sketch for new code ([datasketches-java#398](https://github.com/apache/datasketches-java/issues/398#issuecomment-1151490051)): it is faster and more memory efficient with comparable accuracy. This change therefore ports to `KllDoublesSketch` rather than a 1:1 replacement with the classic `DoublesSketch`. ### Modifications - **Version catalog** (`gradle/libs.versions.toml`): removed `com.yahoo.datasketches:sketches-core:0.8.3`; added `org.apache.datasketches:datasketches-java:7.0.1` and `datasketches-memory:4.1.0`. Build files now reference `libs.datasketches.java`. - **Sketch usages** ported to `org.apache.datasketches.kll.KllDoublesSketch` in `DataSketchesOpStatsLogger` (broker + bookkeeper-prometheus-metrics-provider), `DataSketchesSummaryLogger`, `ThreadLocalAccessor`, and the client `ProducerStatsRecorderImpl`: - On-heap throughout (`newHeapInstance()`), matching prior behavior. - KLL has no `DoublesUnion`; aggregation uses `merge()` into a freshly allocated heap sketch. - KLL has no `reset()`; per-thread sketches are reassigned to a fresh `newHeapInstance()` under the existing `StampedLock` write lock. - `KllDoublesSketch.getQuantile()/getQuantiles()` throw on an empty sketch (the classic sketch returned `NaN`), so callers now guard with `isEmpty()` to preserve the previous `NaN` return. - **Shading**: updated relocation/include rules in `pulsar.client-shade-conventions.gradle.kts` and `pulsar-functions/localrun-shaded` from `com.yahoo*` to `org.apache.datasketches`. - **LICENSE** (`distribution/server`, `distribution/shell`): updated the bundled jar entries to the Apache coordinates/versions. - **Tests**: `ThreadLocalAccessorTest` ported to `KllDoublesSketch`. Note: metric names and formats are unchanged; only the internal sketch algorithm changes, so approximate quantile values may differ slightly. ### Verifying this change This change is already covered by existing tests (e.g. `ThreadLocalAccessorTest` and the broker stats/metrics tests). Checkstyle and spotless pass locally. ### Does this pull request potentially affect one of the following parts: - [x] Dependencies (add or upgrade a dependency) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
