HoustonPutman commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2294155051
Thanks for the correction, sorry for the noise!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL abo
jpountz commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2294115438
@HoustonPutman This is the same issue as reported above: the logic for
lazily decoding blocks of freqs was broken and would decompress whole blocks of
freqs on every doc ID. It is now fi
HoustonPutman commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2293832375
It looks like grouping queries are really affected by this change. The
throughput of each of them was halved:
[100
groups](https://home.apache.org/~mikemccand/lucenebench/TermG
mikemccand commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2263239458
Phew, thanks for catching the performance regression and tracking it down
@jpountz. GO BENCHMARKING!
jpountz commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2263216720
I found the problem with `CombinedHighHigh`, the logic for lazily decoding
frequencies was broken and we'd decode the whole block of frequencies on every
freq() call. It's now fixed so
jpountz commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2262842204
Hmm,
[`CombinedHighHigh`](https://home.apache.org/~mikemccand/lucenebench/CombinedHighHigh.html)
is angry. I had not benchmarked it while developing, I'll check it out.
Some spee
mikemccand commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2262812551
Nice pop in the nightly benchmarks from this!
[`OrHighMedium`](https://home.apache.org/~mikemccand/lucenebench/OrHighMed.html)
jumped. Even
[`Phrase`](https://home.apache.org/~mike
jpountz commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2262631387
Things got a bit better later on
(https://github.com/apache/lucene/pull/13585#issuecomment-2246112137), but your
reading is correct that some queries get slower. This seems to especially
dsmiley commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2261732510
I'm looking at the performance results in the description. The bottom end
(performance improvements) looks really nice, but the top end (performance
regressions) looks even worse. Am I r
jpountz merged PR #13585:
URL: https://github.com/apache/lucene/pull/13585
mikemccand commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2258788196
> Note that I removed `TestLazyProxSkipping`, which assumed separate skip
> data and postings.
YAY!
> I plan on merging soon, as this PR is now in a state where conflicts
jpountz commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2258120692
Thanks @mikemccand for taking a look at this large PR! I think I applied all
your suggestions. The format docs should be up-to-date wrt how skip data is
stored, and I did the codec dance
jpountz commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1696779655
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java:
##
@@ -0,0 +1,2028 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
jpountz commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1696780288
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java:
##
@@ -0,0 +1,2028 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
jpountz commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1696777219
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java:
##
@@ -0,0 +1,2028 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
jpountz commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1696776133
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java:
##
@@ -0,0 +1,2028 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
jpountz commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1696774526
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java:
##
@@ -0,0 +1,2028 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
mikemccand commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1691325098
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsWriter.java:
##
@@ -0,0 +1,597 @@
+/*
+ * Licensed to the Apache Software Foundation (AS
jpountz commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1689561156
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java:
##
@@ -0,0 +1,1998 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
mikemccand commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1689557959
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java:
##
@@ -0,0 +1,1998 @@
+/*
+ * Licensed to the Apache Software Foundation (A
jpountz commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1689502270
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java:
##
@@ -0,0 +1,1998 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
mikemccand commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1689493102
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java:
##
@@ -0,0 +1,1998 @@
+/*
+ * Licensed to the Apache Software Foundation (A
mikemccand commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1688761798
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsWriter.java:
##
@@ -0,0 +1,597 @@
+/*
+ * Licensed to the Apache Software Foundation (AS
mikemccand commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1688752555
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForUtil.java:
##
@@ -0,0 +1,1148 @@
+// This file has been automatically generated, DO NOT EDIT
+
+/*
+ *
mikemccand commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2246352717
Do you have any measure of how many bytes in a big posting are spent on skip
data vs doc/freq blocks?
The gains on the last benchy look awesome! It's surprising
`CountOrHighHig
mikemccand commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2246348961
> Also I noticed we would sometimes decode the same block of positions
> multiple times when it's shared by two doc blocks (because when moving to the
> next doc block we reset the positi
jpountz commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2246112137
Skip data at level 0 now stores pointers into pos/pay files instead of
incrementing posPendingCount by the total term freq of the block. This seems to
slow down term queries marginally a
jpountz commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1688593960
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForUtil.java:
##
@@ -0,0 +1,1148 @@
+// This file has been automatically generated, DO NOT EDIT
+
+/*
+ * Li
mikemccand commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1688514148
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForUtil.java:
##
@@ -0,0 +1,1148 @@
+// This file has been automatically generated, DO NOT EDIT
+
+/*
+ *
jpountz commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1685274565
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsWriter.java:
##
@@ -0,0 +1,563 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
jpountz commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2240948449
> I couldn't see where we also wrote the file pointer into pos/pay files
Indeed level 0 doesn't write pointers into pos/pay, it only records the
total term freq of the block to kno
mikemccand commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1684540391
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsWriter.java:
##
@@ -0,0 +1,563 @@
+/*
+ * Licensed to the Apache Software Foundation (AS
mikemccand commented on code in PR #13585:
URL: https://github.com/apache/lucene/pull/13585#discussion_r1684500686
##
lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsWriter.java:
##
@@ -0,0 +1,563 @@
+/*
+ * Licensed to the Apache Software Foundation (AS
mikemccand commented on PR #13585:
URL: https://github.com/apache/lucene/pull/13585#issuecomment-2239177486
SO EXCITING, thank you for tackling this @jpountz -- I'll have a closer look
soon.
#4036 is the long-ago issue where we started some discussion and linked to a
cool paper about
jpountz opened a new pull request, #13585:
URL: https://github.com/apache/lucene/pull/13585
This updates the postings format in order to inline skip data into postings.
This format is generally similar to the current `Lucene99PostingsFormat`, e.g.
it shares the same block encoding logic, bu