Re: [I] Compression cache of numeric docvalues [lucene]

2025-06-19 Thread via GitHub
easyice commented on issue #14803: URL: https://github.com/apache/lucene/issues/14803#issuecomment-2989753149 @rmuir You are right, it needs to be sorted on the timestamp field. In addition to enabling delta-compression on the timestamp field, index sorting brings another benefit: when sort
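
Why does index sorting matter for delta compression? The sketch below is a minimal illustration, not Lucene code. The helper names `bits_required` and `delta_bits` and the synthetic timestamps are hypothetical. It assumes plain frame-of-reference (FOR) bit-packing: subtract the block minimum, then store every value with the same bit width. It compares the width needed for the consecutive deltas of one block of timestamps in sorted order and in a reordered layout.

```python
def bits_required(values):
    """Bits per value needed to bit-pack `values` after subtracting their minimum (frame of reference)."""
    span = max(values) - min(values)
    return span.bit_length()


def delta_bits(timestamps):
    """Bits per value needed to bit-pack the consecutive deltas of `timestamps`."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return bits_required(deltas)


# Hypothetical block of 128 millisecond timestamps, roughly one per second with small jitter.
ts = [1_750_000_000_000 + i * 1000 + (i % 3) for i in range(128)]

# Deterministic reordering standing in for a segment that is NOT sorted on the timestamp field.
shuffled = ts[::2] + ts[1::2]

print("FOR on raw values:   ", bits_required(ts))
print("DELTA+FOR, sorted:   ", delta_bits(ts))
print("DELTA+FOR, unsorted: ", delta_bits(shuffled))
```

Under these assumptions the sorted block's deltas fit in 2 bits each. The reordered block needs 17 bits per value, no better than bit-packing the raw timestamps, because a single large backward jump widens the frame for the whole block.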

Re: [I] Compression cache of numeric docvalues [lucene]

2025-06-19 Thread via GitHub
rmuir commented on issue #14803: URL: https://github.com/apache/lucene/issues/14803#issuecomment-2989679301 @easyice Something like DELTA+FOR shouldn't require any cache, right? To me that is a different problem with other challenges: index would need to be e.g. sorted on timestamp field fo
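
The point that DELTA+FOR needs no cache comes from how it decodes. Reading a block back takes only a small header plus shifts and additions, so there is nothing expensive to keep around. The sketch below shows one block's encode/decode round trip. The `Block` layout and the `encode`/`decode` names are hypothetical. They do not reflect Lucene's or Elasticsearch's on-disk format, and a single Python int stands in for the packed byte array.

```python
from dataclasses import dataclass


@dataclass
class Block:
    first: int      # first value stored verbatim
    min_delta: int  # frame of reference subtracted from every delta
    bits: int       # bit width of each packed delta
    packed: int     # all packed deltas in one Python int (stand-in for a byte array)
    count: int      # number of values in the block


def encode(values):
    """DELTA+FOR encode a non-empty list of ints into one hypothetical Block."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    min_delta = min(deltas, default=0)
    offsets = [d - min_delta for d in deltas]  # all >= 0 after subtracting the reference
    bits = max(offsets, default=0).bit_length()
    packed = 0
    for i, off in enumerate(offsets):
        packed |= off << (i * bits)  # fixed-width slots, lowest slot first
    return Block(values[0], min_delta, bits, packed, len(values))


def decode(block):
    """Rebuild the values with a running prefix sum; no lookup table or cache is needed."""
    mask = (1 << block.bits) - 1
    out = [block.first]
    for i in range(block.count - 1):
        off = (block.packed >> (i * block.bits)) & mask
        out.append(out[-1] + off + block.min_delta)
    return out


b = encode([1000, 1005, 1011, 1020, 1020, 1031])
print(b.bits, decode(b))
```

Decoding a single value requires a prefix sum over the earlier deltas in its block. For that reason a scheme like this is typically decoded a whole block at a time rather than with per-document random access.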

Re: [I] Compression cache of numeric docvalues [lucene]

2025-06-19 Thread via GitHub
easyice commented on issue #14803: URL: https://github.com/apache/lucene/issues/14803#issuecomment-2989482091 Yeah, I’ve been thinking about this. Elasticsearch now supports a time_series index mode with DELTA + FOR encoding on doc values. In time series or logging scenarios, storage cost u

Re: [I] Compression cache of numeric docvalues [lucene]

2025-06-19 Thread via GitHub
gf2121 commented on issue #14803: URL: https://github.com/apache/lucene/issues/14803#issuecomment-2987035325 OLAP engines split the format into `codec` and `compression`, both configurable. For example, you can: * Use `ForUtil` codec and `LZ4` compression on a normal filesystem, cache manag
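
The `codec` / `compression` split can be pictured as two independent layers. A codec such as frame-of-reference turns numbers into a compact byte layout, and an optional general-purpose compressor is applied on top. The sketch below is an assumption-laden illustration. Its FOR codec is plain Python rather than Lucene's `ForUtil`, `zlib` stands in for `LZ4` (which is not in the Python standard library), and `store`/`load` are hypothetical names.

```python
import struct
import zlib

# Hypothetical block header: reference value (int64), bit width (uint8), value count (uint32).
HEADER = struct.Struct("<qBI")


def for_encode(values):
    """Codec layer: frame-of-reference encode a non-empty list of ints into bytes."""
    ref = min(values)
    bits = (max(values) - ref).bit_length()
    packed = 0
    for i, v in enumerate(values):
        packed |= (v - ref) << (i * bits)
    nbytes = (len(values) * bits + 7) // 8
    return HEADER.pack(ref, bits, len(values)) + packed.to_bytes(nbytes, "little")


def for_decode(data):
    """Codec layer: inverse of for_encode."""
    ref, bits, count = HEADER.unpack_from(data)
    packed = int.from_bytes(data[HEADER.size:], "little")
    mask = (1 << bits) - 1
    return [ref + ((packed >> (i * bits)) & mask) for i in range(count)]


def store(values, compress):
    """Compression layer: optionally run a general-purpose compressor over the codec output."""
    encoded = for_encode(values)
    return zlib.compress(encoded) if compress else encoded


def load(blob, compressed):
    return for_decode(zlib.decompress(blob) if compressed else blob)


vals = [100, 103, 101, 107, 100] * 50
print(len(store(vals, False)), len(store(vals, True)))
```

Because the layers are independent, the compressor can be switched off, or replaced by something outside the process, without touching the codec. A transparent-compression filesystem occupies that same outer layer.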

Re: [I] Compression cache of numeric docvalues [lucene]

2025-06-18 Thread via GitHub
rmuir commented on issue #14803: URL: https://github.com/apache/lucene/issues/14803#issuecomment-2985347971 The advantage of letting a filesystem such as zfs (which was designed to do exactly this) handle compression is that it is integrated in the correct place and operating system caches work as expected.

Re: [I] Compression cache of numeric docvalues [lucene]

2025-06-17 Thread via GitHub
gf2121 commented on issue #14803: URL: https://github.com/apache/lucene/issues/14803#issuecomment-2982794733 Thanks for the feedback! I agree that a transparent compression filesystem is pretty straightforward and helpful. But I suspect it is hard for users to know when Lucene can take c

Re: [I] Compression cache of numeric docvalues [lucene]

2025-06-17 Thread via GitHub
rmuir commented on issue #14803: URL: https://github.com/apache/lucene/issues/14803#issuecomment-2981632167 IMO: just use a filesystem with this feature such as zfs.

[I] Compression cache of numeric docvalues [lucene]

2025-06-17 Thread via GitHub
gf2121 opened a new issue, #14803: URL: https://github.com/apache/lucene/issues/14803 ### Description When benchmarking recently with some OLAP engines (no indexes, no stored fields, only column data), the results showed that they only occupy 50-70% of the storage of `NumericDocvalue