lhotari opened a new pull request, #25965:
URL: https://github.com/apache/pulsar/pull/25965

   ### Motivation
   
   Pulsar still depends on `com.yahoo.datasketches:sketches-core:0.8.3`. The 
DataSketches project moved to the Apache Software Foundation and was renamed to 
`org.apache.datasketches:datasketches-java`; the old Yahoo coordinates have 
been unmaintained for years.
   
   This mirrors the motivation of apache/bookkeeper#4774. Pulsar `master` is 
still on BookKeeper 4.17.3 but will soon migrate to 4.18.0, which uses Apache 
DataSketches — and there is an intent to drop Yahoo DataSketches on the Pulsar 
side as well. Migrating now removes the unmaintained dependency and aligns 
Pulsar with BookKeeper.
   
   The Apache DataSketches maintainers recommend the **KLL sketch** over the 
classic quantiles sketch for new code 
([datasketches-java#398](https://github.com/apache/datasketches-java/issues/398#issuecomment-1151490051)):
 it is faster and more memory efficient with comparable accuracy. This change 
therefore ports to `KllDoublesSketch` rather than a 1:1 replacement with the 
classic `DoublesSketch`.
   
   ### Modifications
   
   - **Version catalog** (`gradle/libs.versions.toml`): removed 
`com.yahoo.datasketches:sketches-core:0.8.3`; added 
`org.apache.datasketches:datasketches-java:7.0.1` and 
`datasketches-memory:4.1.0`. Build files now reference `libs.datasketches.java`.
   - **Sketch usages** ported to `org.apache.datasketches.kll.KllDoublesSketch` 
in `DataSketchesOpStatsLogger` (broker + 
bookkeeper-prometheus-metrics-provider), `DataSketchesSummaryLogger`, 
`ThreadLocalAccessor`, and the client `ProducerStatsRecorderImpl`:
     - On-heap throughout (`newHeapInstance()`), matching prior behavior.
     - KLL has no `DoublesUnion`; aggregation uses `merge()` into a freshly 
allocated heap sketch.
     - KLL has no `reset()`; per-thread sketches are reassigned to a fresh 
`newHeapInstance()` under the existing `StampedLock` write lock.
     - `KllDoublesSketch.getQuantile()/getQuantiles()` throw on an empty sketch 
(the classic sketch returned `NaN`), so callers now guard with `isEmpty()` to 
preserve the previous `NaN` return.
   - **Shading**: updated relocation/include rules in 
`pulsar.client-shade-conventions.gradle.kts` and 
`pulsar-functions/localrun-shaded` from `com.yahoo*` to 
`org.apache.datasketches`.
   - **LICENSE** (`distribution/server`, `distribution/shell`): updated the 
bundled jar entries to the Apache coordinates/versions.
   - **Tests**: `ThreadLocalAccessorTest` ported to `KllDoublesSketch`.
   
   Note: metric names and formats are unchanged; only the internal sketch 
algorithm changes, so approximate quantile values may differ slightly.
   
   ### Verifying this change
   
   This change is already covered by existing tests (e.g. 
`ThreadLocalAccessorTest` and the broker stats/metrics tests). Checkstyle and 
spotless pass locally.
   
   ### Does this pull request potentially affect one of the following parts:
   
   - [x] Dependencies (add or upgrade a dependency)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to