easyice commented on issue #14803:
URL: https://github.com/apache/lucene/issues/14803#issuecomment-2989753149
@rmuir You are right, it needs to be sorted on the timestamp field. In
addition to enabling delta-compression on the timestamp field, index sorting
brings another benefit: when sort
rmuir commented on issue #14803:
URL: https://github.com/apache/lucene/issues/14803#issuecomment-2989679301
@easyice Something like DELTA+FOR shouldn't require any cache, right? To me
that is a different problem with other challenges: index would need to be e.g.
sorted on timestamp field fo
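
For readers unfamiliar with the term, here is a minimal sketch of the DELTA+FOR idea mentioned above. It is an illustration only, not Lucene's or Elasticsearch's actual encoder; the function names and the sample timestamps are hypothetical. It reports the bit width a block would need rather than doing the bit-packing itself, and it shows why sort order on the timestamp field affects that width.

```python
from itertools import accumulate


def delta_for_encode(values):
    """Encode one block: deltas first, then frame-of-reference on the deltas.

    Returns (first, base, width, residuals), where `width` is the number of
    bits each residual would need if the block were bit-packed.
    """
    first = values[0]
    deltas = [b - a for a, b in zip(values, values[1:])]
    base = min(deltas, default=0)            # FOR reference value
    residuals = [d - base for d in deltas]   # all non-negative
    width = max((r.bit_length() for r in residuals), default=0)
    return first, base, width, residuals


def delta_for_decode(first, base, width, residuals):
    """Invert delta_for_encode; `width` is only needed for real unpacking."""
    return list(accumulate((r + base for r in residuals), initial=first))


# Hypothetical timestamps: the same values, sorted and unsorted.
sorted_ts = [1000, 1010, 1020, 1035, 1040]
unsorted_ts = [1020, 1000, 1040, 1010, 1035]

sorted_width = delta_for_encode(sorted_ts)[2]      # 4 bits per delta
unsorted_width = delta_for_encode(unsorted_ts)[2]  # 7 bits per delta
```

Decoding needs only the previous value and the block header, so no cache is involved; the sorted block packs tighter because its deltas fall in a narrow range.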
easyice commented on issue #14803:
URL: https://github.com/apache/lucene/issues/14803#issuecomment-2989482091
Yeah, I’ve been thinking about this. Elasticsearch now supports a
time_series index mode with DELTA + FOR encoding on doc values. In time series
or logging scenarios, storage cost u
gf2121 commented on issue #14803:
URL: https://github.com/apache/lucene/issues/14803#issuecomment-2987035325
OLAP engines split the format into `codec` and `compression`, both configurable.
For example, you can:
* Use `ForUtil` codec and `LZ4` compression in normal filesystem, cache
manag
rmuir commented on issue #14803:
URL: https://github.com/apache/lucene/issues/14803#issuecomment-2985347971
The advantage of letting a filesystem such as zfs (which was designed to do
exactly this) handle it is that it is integrated in the correct place and
operating system caches work as expected.
gf2121 commented on issue #14803:
URL: https://github.com/apache/lucene/issues/14803#issuecomment-2982794733
Thanks for the feedback!
I agree that a filesystem with transparent compression is pretty straightforward
and helpful. But I suspect it is hard for users to know when Lucene can take
c
rmuir commented on issue #14803:
URL: https://github.com/apache/lucene/issues/14803#issuecomment-2981632167
IMO: just use a filesystem with this feature such as zfs.
gf2121 opened a new issue, #14803:
URL: https://github.com/apache/lucene/issues/14803
### Description
When benchmarking recently with some OLAP engines (no indexes, no stored
fields, only column data), the results showed that they only occupy 50-70% of
the storage of `NumericDocvalue