jpountz commented on PR #12489:
URL: https://github.com/apache/lucene/pull/12489#issuecomment-1712166542

   I did. My wikimedium file is sorted by title, which already gives some 
compression compared to random ordering. Disappointingly, recursive graph 
bisection only improved compression of postings (doc) by 1.5%. It significantly 
hurts stored fields though. I suspect this is because the `title` field is 
stored, and stored fields take advantage of splits of the same article being 
next to one another.
   
   | File | before (MB) | after (MB) |
   | - | - | - |
   | terms (tim) | 307 | 315 |
   | postings (doc) | 1706 | 1685 |
   | positions (pos) | 2563 | 2540 |
   | points (kdd) | 122 | 126 |
   | doc values (dvd) | 686 | 693 |
   | stored fields (fdt) | 255 | 364 |
   | norms (nvd) | 20 | 20 |
   | total | 5664 | 5747 |
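
   To make the per-file impact easier to compare, here is a small sketch that 
computes the relative size change for each row of the table. The names 
`sizes_mb` and `relative_change` are illustrative only, and the inputs are the 
rounded MB values from the table, so the percentages may differ slightly from 
figures computed on exact byte counts.

```python
# Sizes in MB, copied from the table above as (before, after).
sizes_mb = {
    "terms (tim)": (307, 315),
    "postings (doc)": (1706, 1685),
    "positions (pos)": (2563, 2540),
    "points (kdd)": (122, 126),
    "doc values (dvd)": (686, 693),
    "stored fields (fdt)": (255, 364),
    "norms (nvd)": (20, 20),
    "total": (5664, 5747),
}


def relative_change(before, after):
    """Return the size change as a percentage of the original size."""
    return (after - before) / before * 100.0


# Negative values mean the file got smaller after reordering.
changes = {name: relative_change(b, a) for name, (b, a) in sizes_mb.items()}

for name, pct in changes.items():
    print(f"{name:22s} {pct:+6.1f}%")
```

   Stored fields stand out as the only file with a large change in either 
direction.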
   
   At first this made me doubt whether the algorithm was implemented 
correctly, but the query speedups suggest it is not completely wrong.
   
   I should run on wikibigall too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org