Re: [I] Try encoding very frequent terms using a dense bitmap [lucene]

2024-03-04 Thread via GitHub
mikemccand commented on issue #13147: URL: https://github.com/apache/lucene/issues/13147#issuecomment-1976830787 I really love this idea! And it's wild that it's net/net reducing `enwiki` index size even at higher than expected cutover to dense encoding criteria!

Re: [I] Try encoding very frequent terms using a dense bitmap [lucene]

2024-03-04 Thread via GitHub
msokolov commented on issue #13147: URL: https://github.com/apache/lucene/issues/13147#issuecomment-1976829058 Oh dear, you are correct - I was reading this table backwards! OK, so perhaps this "optimization" is not really helping very much at query time. Still, it might be possible to squeez

Re: [I] Try encoding very frequent terms using a dense bitmap [lucene]

2024-03-04 Thread via GitHub
mikemccand commented on issue #13147: URL: https://github.com/apache/lucene/issues/13147#issuecomment-1976825383 > also -- the slowdown for AndHighHighDayTaxoFacets counters the overall trend. I wonder what's going on there. Wait -- this task got faster, right? And some others got slo

Re: [I] Try encoding very frequent terms using a dense bitmap [lucene]

2024-03-04 Thread via GitHub
mikemccand commented on issue #13147: URL: https://github.com/apache/lucene/issues/13147#issuecomment-1976819984 > I am seeing CheckIndex gets a handle on an EverythingEnum (and other enums) over a test field indexed with no positions and no freqs. Hmm, does `CheckIndex` pull all the

Re: [I] Try encoding very frequent terms using a dense bitmap [lucene]

2024-03-04 Thread via GitHub
msokolov commented on issue #13147: URL: https://github.com/apache/lucene/issues/13147#issuecomment-1976556731 I tried increasing the usage of dense encoding by enabling it when it would consume up to 3/2 as many bits as packed bits encoding, rather than using it only when it would use up t
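
The cutover rule described in this comment can be sketched as a simple bit-cost comparison. This is only an illustration under stated assumptions, not the code from the actual change: the class `DenseCutover`, its method names, and the cost model (one bit per doc ID in the block's range for the bitmap, one fixed-width delta per doc for packed bits) are hypothetical. The stricter baseline threshold is not visible in the excerpt, so the 1/1 ratio below is a placeholder.

```java
public final class DenseCutover {
    /** Bits used by packed-bits encoding: one fixed-width delta per doc (assumed cost model). */
    public static long packedBits(int docsInBlock, int bitsPerDelta) {
        return (long) docsInBlock * bitsPerDelta;
    }

    /** Bits used by a bitmap spanning the block: one bit per doc ID in [firstDoc, lastDoc]. */
    public static long bitmapBits(int firstDoc, int lastDoc) {
        return (long) lastDoc - firstDoc + 1;
    }

    /** Choose dense encoding when the bitmap costs at most num/den times the packed cost. */
    public static boolean useDense(int firstDoc, int lastDoc, int docsInBlock,
                                   int bitsPerDelta, int num, int den) {
        // Cross-multiply to avoid floating point: bitmap / packed <= num / den.
        return bitmapBits(firstDoc, lastDoc) * den
                <= packedBits(docsInBlock, bitsPerDelta) * num;
    }

    public static void main(String[] args) {
        // 128 docs spread over 300 doc IDs, deltas fit in 2 bits:
        // packed = 256 bits, bitmap = 300 bits.
        System.out.println(useDense(1000, 1299, 128, 2, 1, 1)); // placeholder 1/1 ratio: false
        System.out.println(useDense(1000, 1299, 128, 2, 3, 2)); // relaxed 3/2 ratio: true
    }
}
```

In this example the bitmap is slightly larger than the packed form, so only the relaxed 3/2 ratio selects dense encoding. That is the kind of block the experiment in this comment would newly convert.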

Re: [I] Try encoding very frequent terms using a dense bitmap [lucene]

2024-03-03 Thread via GitHub
msokolov commented on issue #13147: URL: https://github.com/apache/lucene/issues/13147#issuecomment-1975387056 The results are kind of noisy -- on re-running, `AndHighHighDayTaxoFacets` didn't show much change, but there was a ~8.4% regression for `AndHighMedDayTaxoFacets`, and the cast of ch

Re: [I] Try encoding very frequent terms using a dense bitmap [lucene]

2024-03-03 Thread via GitHub
msokolov commented on issue #13147: URL: https://github.com/apache/lucene/issues/13147#issuecomment-1975204564 also -- the slowdown for `AndHighHighDayTaxoFacets` counters the overall trend. I wonder what's going on there.

Re: [I] Try encoding very frequent terms using a dense bitmap [lucene]

2024-03-03 Thread via GitHub
msokolov commented on issue #13147: URL: https://github.com/apache/lucene/issues/13147#issuecomment-1975201779 I ran luceneutil over wikimediumall. The index size was slightly reduced: ``` 65200 ../indices/baseline/facets 18923720 ../indices/baseline/index 18988924

Re: [I] Try encoding very frequent terms using a dense bitmap [lucene]

2024-03-01 Thread via GitHub
msokolov commented on issue #13147: URL: https://github.com/apache/lucene/issues/13147#issuecomment-1973428361 I have some initial implementation working in BlockDocsEnum, but one thing I'm unsure about is whether to provide it in all of the PostingsEnum/ImpactsEnum specializations. I feel

Re: [I] Try encoding very frequent terms using a dense bitmap [lucene]

2024-02-29 Thread via GitHub
msokolov commented on issue #13147: URL: https://github.com/apache/lucene/issues/13147#issuecomment-1971797658 One question I have is how to indicate the dense encoding of a block. I see our blocks start with a single byte that indicates the number of packed bits per doc, 0 means the block is t
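
One possible answer to the question in this comment is to reserve a header-byte value that can never be a valid packed width. The sketch below assumes that packed widths for int doc deltas lie in 0..32, which leaves the high bit of the byte free. The class `BlockHeader`, the `DENSE_MARKER` value, and the method names are hypothetical and are not taken from the Lucene codebase or from the change discussed here.

```java
public final class BlockHeader {
    /** Hypothetical marker: high bit set, which no packed width in 0..32 can produce. */
    public static final byte DENSE_MARKER = (byte) 0x80;

    /** Header byte for a packed-bits block: the bit width itself. */
    public static byte packedHeader(int bitsPerValue) {
        if (bitsPerValue < 0 || bitsPerValue > 32) {
            throw new IllegalArgumentException("bitsPerValue out of range: " + bitsPerValue);
        }
        return (byte) bitsPerValue;
    }

    /** True when the header marks a dense-bitmap block. */
    public static boolean isDense(byte header) {
        return (header & 0x80) != 0;
    }

    /** Bit width of a packed block; rejects dense headers. */
    public static int bitsPerValue(byte header) {
        if (isDense(header)) {
            throw new IllegalStateException("dense block has no packed bit width");
        }
        return header;
    }

    public static void main(String[] args) {
        byte packed = packedHeader(7);
        System.out.println(isDense(packed));       // false
        System.out.println(bitsPerValue(packed));  // 7
        System.out.println(isDense(DENSE_MARKER)); // true
    }
}
```

A reader would branch on `isDense` before choosing a decoder, so existing packed blocks keep their current header values.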