[ https://issues.apache.org/jira/browse/LUCENE-9663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263831#comment-17263831 ]
Jaison.Bi commented on LUCENE-9663:
-----------------------------------

Thanks for the comment, [~sokolov]
{quote}if you are running luceneutil tests, could you please also report QPS changes?
{quote}
Sure, I will.
{quote}I'm not clear what the usage of this {{keywords}} field is exactly - is it used for aggregations?
{quote}
Yes, the "keyword" field is mostly used for aggregations.
{quote}It would be good to run a faceting test; luceneutil doesn't really have any tests of high-cardinality SSDV aggregations; I think day-of-year is the closest it gets. Maybe you could add one? It's important to test the impact on the query side.
{quote}
OK, I will learn how to change luceneutil. Meanwhile, I can run another benchmark using *esrally* as a supplement; it has some aggregation tests. Would that be all right?

Actually, aggregations use *global ordinal data* rather than the terms dict; terms dict compression will affect the performance of building the global ordinal data. Anyway, I will test the impact on the query side.

> Adding compression to terms dict from SortedSet/Sorted DocValues
> ----------------------------------------------------------------
>
>                 Key: LUCENE-9663
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9663
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Jaison.Bi
>            Priority: Trivial
>
> Elasticsearch keyword field uses SortedSet DocValues. In our applications,
> “keyword” is the most frequently used field type.
> LUCENE-7081 has done prefix-compression for docvalues terms dict. We can do
> better by replacing prefix-compression with LZ4. In one of our applications,
> the dvd files were ~41% smaller with this change (from 1.95 GB to 1.15 GB).
> I've done simple tests based on the real application data, comparing the
> write/merge time cost and the on-disk *.dvd file size (after merging into 1
> segment).
> || ||Before||After||
> |Write time cost(ms)|591972|618200|
> |Merge time cost(ms)|270661|294663|
> |*.dvd file size(GB)|1.95|1.15|
> This feature is only for the high-cardinality fields.
> I'm doing the benchmark test based on luceneutil. Will attach the report and
> patch after the test.
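
For reference, the relative changes implied by the before/after figures in the issue description can be worked out with a short Java sketch. Only the six numbers come from the description; the class name DvdTradeoff and the helper percentChange are illustrative and are not part of Lucene or of the patch.

```java
import java.util.Locale;

// Illustrative only: computes relative changes from the figures quoted in the
// issue description. Neither the class nor the method exists in Lucene.
class DvdTradeoff {

    /** Relative change from before to after, in percent (negative means a reduction). */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    /** Formats one row with an explicit sign and one decimal place. */
    static String row(String label, double before, double after) {
        return String.format(Locale.ROOT, "%s: %+.1f%%", label, percentChange(before, after));
    }

    public static void main(String[] args) {
        // Figures copied from the table in the issue description.
        System.out.println(row("Write time", 591972, 618200));   // Write time: +4.4%
        System.out.println(row("Merge time", 270661, 294663));   // Merge time: +8.9%
        System.out.println(row("*.dvd size", 1.95, 1.15));       // *.dvd size: -41.0%
    }
}
```

On these numbers, the roughly 41% reduction in *.dvd size comes with about 4% more write time and about 9% more merge time; the query-side cost is not covered by these figures.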