Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-11-06 Thread via GitHub
mikemccand closed issue #12696: Adding option to codec to disable patching in Lucene's PFOR encoding URL: https://github.com/apache/lucene/issues/12696

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-11-02 Thread via GitHub
slow-J commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1790638953 > Another exciting optimization such a "patch-less" encoding could implement is within-block skipping (I believe Tantivy does this). > > Today, our skipper is forced to align t

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-31 Thread via GitHub
slow-J commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1787521225 Thanks all for the feedback. Will proceed with removing patching only for doc blocks (reverting some of https://github.com/apache/lucene/pull/69) All the changes needed to crea

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-31 Thread via GitHub
jpountz commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1786969801 > Normally the IntNRQ (1D points numeric range query) is very noisy, but maybe this gain is real? p-value seems to think it could be close to real? I'm not sure how it could n

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-31 Thread via GitHub
mikemccand commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1786949852 Thanks for testing @jpountz. I think at some point we also enabled patching for the freq blocks inside the `.doc` file? Normally the `IntNRQ` (1D points numeric rang

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-27 Thread via GitHub
jpountz commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1782814872 FWIW I could reproduce the speedup from disabling patching locally on wikibigall: ``` Task QPS baseline StdDev QPS my_modified_version

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-25 Thread via GitHub
jpountz commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1779221543 For reference, Lucene used to use FOR for postings and PFOR for positions in 8.x. This was changed in 9.0 via #69 to use PFOR for both postings and positions. This PR says it made t

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-23 Thread via GitHub
Tony-X commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775993940 > would the goal here be to eliminate overhead of having to read the number of patches when decoding each block? Yes. This means we could know upfront at segment opening time w

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-23 Thread via GitHub
gsmiller commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775871779 > Maybe write something in the index header to indicate if patching is there (default to yes - in 9.x ). Then new indexes will write additional header to indicate there is not patc

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-23 Thread via GitHub
Tony-X commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775807115 > In 11.0, remove all patching logic which will, a) simplify the code a bit, and b) remove the (likely minor) overhead on read of looking up the number of patches in a block, which i

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-23 Thread via GitHub
gsmiller commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775725064 > +1. I recalled that @gsmiller was playing with some SIMD algos for decoding blocks of delta-encoded ints. Even if that is fruitful it'd be tricky to apply it because of the patch

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-23 Thread via GitHub
msokolov commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775716306 > Hmm, can you elaborate how it can be fully backwards-compatible with the indexes that have patching? I think the idea is that because we always maintain readers that can

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-23 Thread via GitHub
gsmiller commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775717147 I like the idea of removing the complexity associated with patching if we're convinced it's the right trade-off (and +1 to the pain of vectorizing with patching going away).

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-23 Thread via GitHub
Tony-X commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775698779 > It is a lot of complexity, especially to vectorize. +1. I recalled that @gsmiller was playing with some SIMD algos for decoding blocks of delta-encoded ints. Even if that is

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-23 Thread via GitHub
mikemccand commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775638986 > Are there any additional corpora that we should also test this with? Maybe the NYC taxis? This corpus is sparser, with tiny docs (vs dense and medium/large docs in `enwiki

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-23 Thread via GitHub
slow-J commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775353027 If we want to remove the patching entirely, which Lucene version (and which Codec) should we implement this in? Would this be a potential change for Lucene 9.9 or perhaps 10.0?

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-22 Thread via GitHub
mikemccand commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1774236604 > Should we just do more tests and start writing indexes without patching? Only a 4 percent disk savings? It is a lot of complexity, especially to vectorize. A runtime option is

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-21 Thread via GitHub
rmuir commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1773935712 Should we just do more tests and start writing indexes without patching? Only a 4 percent disk savings? It is a lot of complexity, especially to vectorize. A runtime option is more ex

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-19 Thread via GitHub
slow-J commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1771275256 > Did you turn off patching for all encoded `int[]` blocks (docs, freqs, positions)? Yes, I think so. All uses of `pforUtil` in the postingsReader and writer were replaced with t

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1770461719 Another exciting optimization such a "patch-less" encoding could implement is within-block skipping (I believe Tantivy does this). Today, our skipper is forced to align to

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1770457453 > Posting results below The results are impressive! Conjunctive (-like) queries see sizable gains. Did you turn off patching for all encoded `int[]` blocks (docs, fr

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1770452771 That's a neat idea (separate codec that trades off index size for faster search performance). Maybe it could also fold in the [fully in RAM FST term dictionary](https://github.c

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-18 Thread via GitHub
gsmiller commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1769095158 These results are really interesting! As another option, I wonder if it's worth thinking about this problem as a new codec (sandbox module to start?) that biases towards query spee

[I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-18 Thread via GitHub
slow-J opened a new issue, #12696: URL: https://github.com/apache/lucene/issues/12696 ### Description Background: In https://github.com/Tony-X/search-benchmark-game we were comparing performance of Tantivy and Lucene. "One difference between Lucene and Tantivy is Lucene uses the "pat
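
For readers new to the terms used throughout this thread, the sketch below contrasts plain frame-of-reference (FOR) packing with a "patched" variant (PFOR) on one block of integers. It is a simplified illustration and not Lucene's `PForUtil`: the function names, the limit of 7 exceptions, and the rule that an exception's high bits must fit in one byte are all assumptions made for the example. It is written in Python for brevity.

```python
from typing import List, Tuple


def bits_required(value: int) -> int:
    """Bits needed to store a non-negative int (at least 1)."""
    return max(1, value.bit_length())


def for_bits(block: List[int]) -> int:
    """Plain FOR: every value is stored with the width of the largest one."""
    return bits_required(max(block))


def pfor_encode(
    block: List[int], max_exceptions: int = 7
) -> Tuple[int, List[int], List[Tuple[int, int]]]:
    """Patched FOR (hypothetical layout): choose a narrower width and record
    the few values that overflow it as (index, high_bits) exceptions."""
    if len(block) <= max_exceptions:
        raise ValueError("block must be longer than max_exceptions")
    ordered = sorted(block)
    # Largest value that must fit without a patch.
    cutoff = ordered[-(max_exceptions + 1)]
    # Assumption: keep each exception's high bits within one byte.
    bits = max(bits_required(cutoff), bits_required(ordered[-1]) - 8)
    mask = (1 << bits) - 1
    low = [v & mask for v in block]
    exceptions = [(i, v >> bits) for i, v in enumerate(block) if v > mask]
    return bits, low, exceptions


def pfor_decode(
    bits: int, low: List[int], exceptions: List[Tuple[int, int]]
) -> List[int]:
    """Decode: take the packed low bits, then apply the patches.
    This data-dependent loop is the per-block read overhead discussed above."""
    out = list(low)
    for index, high in exceptions:
        out[index] |= high << bits
    return out


# A block of 128 small values with two outliers.
block = [3] * 126 + [200, 900]
bits, low, exceptions = pfor_encode(block)
restored = pfor_decode(bits, low, exceptions)
```

On this block, plain FOR needs 10 bits per value because of the outlier 900, while the patched sketch uses 2 bits per value plus two exceptions. That is the disk saving patching buys, and the patch loop in `pfor_decode` is the extra decode step that a patch-less encoding would remove.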