[GitHub] [lucene] sherman commented on issue #12203: Scalable merge/compaction of big doc values segments.

via GitHub Fri, 14 Apr 2023 07:55:04 -0700


sherman commented on issue #12203:
URL: https://github.com/apache/lucene/issues/12203#issuecomment-1508705081


   Yes, I saw that code. However, I thought it was intended for merging 
multiple segments into one, field by field.
   
   Let me share a simple example to explain why I think so. Consider the 
following code:
   
   ```java
   var logger = LogManager.getLogger("lucene_index");
   var infoStream = new Log4j2InfoStream(logger, Level.toLevel("info"));
   
   var out = FSDirectory.open(Path.of("/home/sherman/experiments7/"));
   var cfg = new IndexWriterConfig();
   cfg.setUseCompoundFile(false);
   var indexWriter = new IndexWriter(out, cfg);
   for (var i = 0; i < 10000; i++) {
       var doc = new Document();
       doc.add(new SortedSetDocValuesField("field1", new BytesRef("term1_" + i)));
       if (i < 500) {
           doc.add(new SortedSetDocValuesField("field2", new BytesRef("term1_" + i)));
       }
       indexWriter.addDocument(doc);
   }
   indexWriter.close();
   ```
   
   In this example, each field has its own term dictionary, because I see two 
calls to the [addTermsDict() 
method](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java#L544).
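   
   To make the point concrete, here is a toy model of what "one term 
dictionary per field" means for the example above. It is plain Java and does 
not use Lucene at all; the class name `PerFieldTermDicts` and the method 
`build` are hypothetical, and the sorted sets are only a stand-in for whatever 
the codec actually writes.
   
   ```java
   import java.util.Map;
   import java.util.SortedSet;
   import java.util.TreeMap;
   import java.util.TreeSet;

   // Toy model only: NOT Lucene code. Each field gets its own sorted set of
   // unique terms, mimicking a separate term dictionary per field.
   public class PerFieldTermDicts {

       // Builds field name -> sorted unique terms for the same data as the
       // indexing example: every doc has field1, the first field2Limit docs
       // also have field2 with the same term.
       static Map<String, SortedSet<String>> build(int docCount, int field2Limit) {
           Map<String, SortedSet<String>> dicts = new TreeMap<>();
           for (int i = 0; i < docCount; i++) {
               String term = "term1_" + i;
               dicts.computeIfAbsent("field1", f -> new TreeSet<>()).add(term);
               if (i < field2Limit) {
                   dicts.computeIfAbsent("field2", f -> new TreeSet<>()).add(term);
               }
           }
           return dicts;
       }

       public static void main(String[] args) {
           Map<String, SortedSet<String>> dicts = build(10000, 500);
           // One entry per field, each with its own dictionary size.
           dicts.forEach((field, terms) ->
               System.out.println(field + ": " + terms.size() + " unique terms"));
       }
   }
   ```
   
   In this model the 500 terms of `field2` also appear in the dictionary of 
`field1`, so they are stored twice, once per field.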
   
   I think I need to investigate further before I can build a prototype that 
produces the same files.
   
   Maybe I missed something.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

