Re: [PR] Reduce duplication in taxonomy facets; always do counts [lucene]

2024-01-13 Thread via GitHub
stefanvodita commented on PR #12966: URL: https://github.com/apache/lucene/pull/12966#issuecomment-1890420916 I found a fun Heisenbug in one of the tests. When we iterate cursors from `IntFloatHashMap`, the order is not deterministic. Float summation is not associative, so the result we get
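
To illustrate why iteration order can change a float sum, here is a minimal sketch in Java. It is not code from the PR: the class name `FloatOrderDemo`, the method `sum`, and the input values are all hypothetical, chosen only so that the rounding is easy to follow by hand.

```java
// Hypothetical demo (not from the Lucene code base): the same three floats
// summed in two different orders give two different results.
final class FloatOrderDemo {

    // Sums the values left to right using a float accumulator.
    static float sum(float[] values) {
        float total = 0f;
        for (float v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Near 1e8 adjacent floats are 8 apart, so 1e8f + 1f rounds back to 1e8f
        // and the 1f is lost before the large values cancel.
        float lossy = sum(new float[] {1e8f, 1f, -1e8f});
        // Here the large values cancel first, so the 1f survives.
        float exact = sum(new float[] {1e8f, -1e8f, 1f});
        System.out.println(lossy + " vs " + exact);
    }
}
```

Any container whose iteration order varies between runs can feed the same values to the accumulator in a different order, which is enough to make a test with an exact expected value fail intermittently.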

Re: [PR] Reduce duplication in taxonomy facets; always do counts [lucene]

2024-01-13 Thread via GitHub
stefanvodita commented on PR #12966: URL: https://github.com/apache/lucene/pull/12966#issuecomment-1890420978 I've also run the benchmarks (`python3 src/python/localrun.py -source wikimediumall`). There is a measurable regression in the `BrowseRandomLabelTaxoFacets` task, but not in other tax

Re: [I] Use Kahan summation for float aggregations to reduce errors [lucene]

2024-01-13 Thread via GitHub
mikemccand commented on issue #13011: URL: https://github.com/apache/lucene/issues/13011#issuecomment-1890454889 Neat -- I had never heard of Kahan summation. Here is its [Wikipedia page](https://en.wikipedia.org/wiki/Kahan_summation_algorithm).
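
For readers who have not seen Kahan (compensated) summation either, the following is a minimal sketch in Java. It is an illustration of the textbook algorithm, not the implementation proposed in the issue; the class name `KahanSumDemo`, both method names, and the sample values are assumptions made for this example.

```java
// Hypothetical sketch of Kahan (compensated) summation next to a naive sum.
final class KahanSumDemo {

    // Plain left-to-right float accumulation.
    static float naiveSum(float[] values) {
        float sum = 0f;
        for (float v : values) {
            sum += v;
        }
        return sum;
    }

    // Kahan summation: carries a running compensation term that captures
    // the low-order bits lost by each addition.
    static float kahanSum(float[] values) {
        float sum = 0f;
        float compensation = 0f;
        for (float v : values) {
            float y = v - compensation;     // apply the correction from the previous step
            float t = sum + y;              // low-order bits of y may be lost here
            compensation = (t - sum) - y;   // recover (negated) what was lost
            sum = t;
        }
        return sum;
    }

    public static void main(String[] args) {
        // 16777216f is 2^24; above it, adjacent floats are 2 apart,
        // so adding 1f on its own is rounded away.
        float[] values = {16777216f, 1f, 1f};
        System.out.println(naiveSum(values) + " vs " + kahanSum(values));
    }
}
```

With these inputs the naive sum stays at 2^24 while the compensated sum reaches 2^24 + 2. Compensation reduces the accumulated rounding error; it does not by itself guarantee bit-identical results for every possible ordering of the inputs.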

Re: [PR] Reduce duplication in taxonomy facets; always do counts [lucene]

2024-01-13 Thread via GitHub
mikemccand commented on PR #12966: URL: https://github.com/apache/lucene/pull/12966#issuecomment-1890455264 > I found a fun HeisenBug in one of the tests. Oh the joys of floating point math. > For those who want to follow along, here are the exact numbers we are adding in the t

Re: [PR] Initial impl of MMapDirectory for Java 22 [lucene]

2024-01-13 Thread via GitHub
uschindler commented on PR #12706: URL: https://github.com/apache/lucene/pull/12706#issuecomment-1890455325 I also committed support for incubator SIMD vectorization in Java 22. According to Java's change logs there were no changes to the API at all, so the code runs as is. @ChrisHegarty jus

Re: [PR] Reduce duplication in taxonomy facets; always do counts [lucene]

2024-01-13 Thread via GitHub
mikemccand commented on PR #12966: URL: https://github.com/apache/lucene/pull/12966#issuecomment-1890455536 > The regression in the taxo task is explained in the profiler. Boxing is not cheap: > `11.24% 10402M java.lang.Integer#valueOf()` Hmm this is sort of spooky -- should we aim

Re: [PR] Split taxonomy arrays across chunks [lucene]

2024-01-13 Thread via GitHub
stefanvodita commented on code in PR #12995: URL: https://github.com/apache/lucene/pull/12995#discussion_r1451664416 ## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java: ## @@ -68,25 +90,49 @@ public TaxonomyIndexArrays(IndexReader reader

Re: [PR] Reduce duplication in taxonomy facets; always do counts [lucene]

2024-01-13 Thread via GitHub
stefanvodita commented on PR #12966: URL: https://github.com/apache/lucene/pull/12966#issuecomment-1890862937 What I've done is use boxing for genericity only when collecting results (`getTop...`), and not while performing the aggregations themselves. Most of the