Re: [PR] Inline skip data into postings lists [lucene]

2024-08-16 Thread via GitHub
HoustonPutman commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2294155051 Thanks for the correction, sorry for the noise! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

Re: [PR] Inline skip data into postings lists [lucene]

2024-08-16 Thread via GitHub
jpountz commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2294115438 @HoustonPutman This is the same issue as reported above: the logic for lazily decoding blocks of freqs was broken and would decompress whole blocks of freqs on every doc ID. It is now fi

Re: [PR] Inline skip data into postings lists [lucene]

2024-08-16 Thread via GitHub
HoustonPutman commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2293832375 It looks like grouping queries are really affected by this change. The throughput of each of them was halved: [100 groups](https://home.apache.org/~mikemccand/lucenebench/TermG

Re: [PR] Inline skip data into postings lists [lucene]

2024-08-01 Thread via GitHub
mikemccand commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2263239458 Phew, thanks for catching the performance regression and tracking it down @jpountz. GO BENCHMARKING!

Re: [PR] Inline skip data into postings lists [lucene]

2024-08-01 Thread via GitHub
jpountz commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2263216720 I found the problem with `CombinedHighHigh`: the logic for lazily decoding frequencies was broken and we'd decode the whole block of frequencies on every `freq()` call. It's now fixed so
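
The lazy decoding described in this comment can be illustrated with a minimal sketch. This is not the code from `Lucene912PostingsReader`: the class `LazyFreqBlock`, its fields, and the toy "freq minus one" encoding are all hypothetical, chosen only to show how a block of frequencies might be decoded at most once and how forgetting to record that state could produce the "decode on every `freq()` call" symptom.

```java
/** Hypothetical sketch of lazily decoding a block of frequencies; not Lucene's actual reader. */
final class LazyFreqBlock {
  private final int[] encoded;   // stand-in for a compressed block (toy encoding: freq - 1)
  private final int[] freqs;     // decoded frequencies, filled on first use
  private boolean freqsDecoded;  // remembers whether this block was already decoded
  private int decodeCount;       // instrumentation: how many times the block was decoded
  private int posInBlock;        // index of the current doc within the block

  LazyFreqBlock(int[] encoded) {
    this.encoded = encoded;
    this.freqs = new int[encoded.length];
  }

  /** Moves to the next doc in the block without touching frequencies. */
  void nextDoc() {
    posInBlock++;
  }

  /** Returns the frequency of the current doc, decoding the block on first call only. */
  int freq() {
    if (!freqsDecoded) {
      decodeBlock();
      // Omitting this assignment would decode the whole block on every call,
      // which is one way the regression discussed above could arise (an assumption).
      freqsDecoded = true;
    }
    return freqs[posInBlock];
  }

  private void decodeBlock() {
    for (int i = 0; i < encoded.length; i++) {
      freqs[i] = encoded[i] + 1;
    }
    decodeCount++;
  }

  int decodeCount() {
    return decodeCount;
  }
}
```

Queries that never call `freq()` pay no decoding cost in this sketch, while scoring queries pay it once per block rather than once per doc.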

Re: [PR] Inline skip data into postings lists [lucene]

2024-08-01 Thread via GitHub
jpountz commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2262842204 Hmm, [`CombinedHighHigh`](https://home.apache.org/~mikemccand/lucenebench/CombinedHighHigh.html) is angry. I had not benchmarked it while developing, I'll check it out. Some spee

Re: [PR] Inline skip data into postings lists [lucene]

2024-08-01 Thread via GitHub
mikemccand commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2262812551 Nice pop in the nightly benchmarks from this! [`OrHighMedium`](https://home.apache.org/~mikemccand/lucenebench/OrHighMed.html) jumped. Even [`Phrase`](https://home.apache.org/~mike

Re: [PR] Inline skip data into postings lists [lucene]

2024-08-01 Thread via GitHub
jpountz commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2262631387 Things got a bit better later on (https://github.com/apache/lucene/pull/13585#issuecomment-2246112137), but your reading is correct that some queries get slower. This seems to especially

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-31 Thread via GitHub
dsmiley commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2261732510 I'm looking at the performance results in the description. The bottom end (performance improvements) looks really nice, but the top end (performance regressions) looks even worse. Am I r

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-31 Thread via GitHub
jpountz merged PR #13585: URL: https://github.com/apache/lucene/pull/13585

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-30 Thread via GitHub
mikemccand commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2258788196 > Note that I removed `TestLazyProxSkipping`, which assumed separate skip data and postings. YAY! > I plan on merging soon, as this PR is now in a state where conflicts

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-30 Thread via GitHub
jpountz commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2258120692 Thanks @mikemccand for taking a look at this large PR! I think I applied all your suggestions. The format docs should be up-to-date wrt how skip data is stored, and I did the codec dance

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-30 Thread via GitHub
jpountz commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1696779655 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -0,0 +1,2028 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-30 Thread via GitHub
jpountz commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1696780288 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -0,0 +1,2028 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-30 Thread via GitHub
jpountz commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1696777219 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -0,0 +1,2028 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-30 Thread via GitHub
jpountz commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1696776133 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -0,0 +1,2028 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-30 Thread via GitHub
jpountz commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1696774526 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -0,0 +1,2028 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-25 Thread via GitHub
mikemccand commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1691325098 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsWriter.java: ## @@ -0,0 +1,597 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-24 Thread via GitHub
jpountz commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1689561156 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -0,0 +1,1998 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-24 Thread via GitHub
mikemccand commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1689557959 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -0,0 +1,1998 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-24 Thread via GitHub
jpountz commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1689502270 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -0,0 +1,1998 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-24 Thread via GitHub
mikemccand commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1689493102 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -0,0 +1,1998 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-23 Thread via GitHub
mikemccand commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1688761798 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsWriter.java: ## @@ -0,0 +1,597 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-23 Thread via GitHub
mikemccand commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1688752555 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForUtil.java: ## @@ -0,0 +1,1148 @@ +// This file has been automatically generated, DO NOT EDIT + +/* + *

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-23 Thread via GitHub
mikemccand commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2246352717 Do you have any measure of how many bytes in a big posting are spent on skip data vs doc/freq blocks? The gains on the last benchy look awesome! It's surprising `CountOrHighHig

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-23 Thread via GitHub
mikemccand commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2246348961 > Also I noticed we would sometimes decode the same block of positions multiple times when it's shared by two doc blocks (because when moving to the next doc block we reset the positi

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-23 Thread via GitHub
jpountz commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2246112137 Skip data at level 0 now stores pointers into pos/pay files instead of incrementing posPendingCount by the total term freq of the block. This seems to slow down term queries marginally a

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-23 Thread via GitHub
jpountz commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1688593960 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForUtil.java: ## @@ -0,0 +1,1148 @@ +// This file has been automatically generated, DO NOT EDIT + +/* + * Li

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-23 Thread via GitHub
mikemccand commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1688514148 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForUtil.java: ## @@ -0,0 +1,1148 @@ +// This file has been automatically generated, DO NOT EDIT + +/* + *

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-19 Thread via GitHub
jpountz commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1685274565 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsWriter.java: ## @@ -0,0 +1,563 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-19 Thread via GitHub
jpountz commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2240948449 > I couldn't see where we also wrote the file pointer into pos/pay files Indeed level 0 doesn't write pointers into pos/pay, it only records the total term freq of the block to kno

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-19 Thread via GitHub
mikemccand commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1684540391 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsWriter.java: ## @@ -0,0 +1,563 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-19 Thread via GitHub
mikemccand commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1684500686 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsWriter.java: ## @@ -0,0 +1,563 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Inline skip data into postings lists [lucene]

2024-07-19 Thread via GitHub
mikemccand commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2239177486 SO EXCITING, thank you for tackling this @jpountz -- I'll have a closer look soon. #4036 is the long-ago issue where we started some discussion and linked to a cool paper about

[PR] Inline skip data into postings lists [lucene]

2024-07-18 Thread via GitHub
jpountz opened a new pull request, #13585: URL: https://github.com/apache/lucene/pull/13585 This updates the postings format in order to inline skip data into postings. This format is generally similar to the current `Lucene99PostingsFormat`, e.g. it shares the same block encoding logic, bu
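
As a rough illustration of what "inlining skip data into postings" can mean, here is a toy sketch. It does not reproduce the format this PR introduces: the class `InlineSkipSketch`, the `[lastDocInBlock, blockLength, docs...]` layout, and the uncompressed `int[]` storage are assumptions made for the example. The idea shown is only that each block carries its own skip entry right in front of it, so `advance` can jump over a block without reading its doc IDs and without consulting a separate skip structure.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy layout with skip entries stored inline, ahead of each block of doc IDs. */
final class InlineSkipSketch {

  /** Writes blocks as [lastDocInBlock, blockLength, doc, doc, ...], one after another. */
  static int[] write(int[] sortedDocs, int blockSize) {
    List<Integer> out = new ArrayList<>();
    for (int start = 0; start < sortedDocs.length; start += blockSize) {
      int end = Math.min(start + blockSize, sortedDocs.length);
      out.add(sortedDocs[end - 1]); // inline skip entry: last doc ID of the block
      out.add(end - start);         // inline skip entry: how many ints to jump over
      for (int i = start; i < end; i++) {
        out.add(sortedDocs[i]);
      }
    }
    int[] result = new int[out.size()];
    for (int i = 0; i < result.length; i++) {
      result[i] = out.get(i);
    }
    return result;
  }

  /** Returns the first doc ID >= target, or Integer.MAX_VALUE if there is none. */
  static int advance(int[] postings, int target) {
    int p = 0;
    while (p < postings.length) {
      int lastDoc = postings[p];
      int length = postings[p + 1];
      if (lastDoc >= target) {
        // The target falls inside this block, so scan its doc IDs.
        for (int i = p + 2; i < p + 2 + length; i++) {
          if (postings[i] >= target) {
            return postings[i];
          }
        }
      }
      // Otherwise skip the whole block using only its inline header.
      p += 2 + length;
    }
    return Integer.MAX_VALUE;
  }
}
```

Because the skip entry sits next to the block it describes, a sequential reader encounters it just before the data it may want to skip, which is the general property the PR title refers to.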