sherman commented on issue #12203:
URL: https://github.com/apache/lucene/issues/12203#issuecomment-1508705081

   Yes, I saw that code. However, I thought it was intended for merging multiple segments into one, field by field.
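
   To illustrate what "merging field by field" means for a sorted doc values field, here is a toy model in plain Java. It is not Lucene's code, it does not attempt to reproduce the code referred to above, and the class and method names are hypothetical. For one field, it takes the sorted dictionary of each segment and builds a single merged dictionary. It also builds a per-segment map from each old ordinal to the new one.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.List;
   import java.util.TreeSet;

   // Toy model (hypothetical names, NOT Lucene's implementation): merging ONE field's
   // term dictionaries from several segments. Each segment dictionary is assumed to be
   // sorted and de-duplicated already, so a value's index in it is its ordinal.
   class FieldDictMerge {

       /** Union of all segment dictionaries for one field, sorted and de-duplicated. */
       static List<String> mergeDictionaries(List<List<String>> segmentDicts) {
           TreeSet<String> merged = new TreeSet<>();
           for (List<String> dict : segmentDicts) {
               merged.addAll(dict);
           }
           return new ArrayList<>(merged);
       }

       /** For one segment, maps each old ordinal (array index) to its ordinal in the merged dictionary. */
       static int[] remapOrdinals(List<String> segmentDict, List<String> merged) {
           int[] newOrds = new int[segmentDict.size()];
           for (int oldOrd = 0; oldOrd < segmentDict.size(); oldOrd++) {
               newOrds[oldOrd] = Collections.binarySearch(merged, segmentDict.get(oldOrd));
           }
           return newOrds;
       }

       public static void main(String[] args) {
           List<String> seg0 = List.of("apple", "cherry");
           List<String> seg1 = List.of("banana", "cherry", "date");
           List<String> merged = mergeDictionaries(List.of(seg0, seg1));
           System.out.println(merged); // [apple, banana, cherry, date]
           System.out.println(Arrays.toString(remapOrdinals(seg0, merged))); // [0, 2]
           System.out.println(Arrays.toString(remapOrdinals(seg1, merged))); // [1, 2, 3]
       }
   }
   ```

   The unit of work here is a single field. A merge repeats this for every field, and each field still ends up with its own dictionary.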
   
   Let me share a simple example to explain why I think so:
   
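   ```java
   import java.nio.file.Path;
   import org.apache.logging.log4j.Level;
   import org.apache.logging.log4j.LogManager;
   import org.apache.lucene.document.Document;
   import org.apache.lucene.document.SortedSetDocValuesField;
   import org.apache.lucene.index.IndexWriter;
   import org.apache.lucene.index.IndexWriterConfig;
   import org.apache.lucene.store.FSDirectory;
   import org.apache.lucene.util.BytesRef;

   // Log4j2InfoStream is assumed to be a custom InfoStream adapter (it is not part of Lucene).
   var logger = LogManager.getLogger("lucene_index");
   var infoStream = new Log4j2InfoStream(logger, Level.toLevel("info"));

   var cfg = new IndexWriterConfig();
   cfg.setUseCompoundFile(false); // keep the doc values files (.dvd/.dvm) as separate files
   cfg.setInfoStream(infoStream); // route IndexWriter diagnostics to Log4j
   try (var out = FSDirectory.open(Path.of("/home/sherman/experiments7/"));
        var indexWriter = new IndexWriter(out, cfg)) {
       for (var i = 0; i < 10000; i++) {
           var doc = new Document();
           // every document has a value for field1
           doc.add(new SortedSetDocValuesField("field1", new BytesRef("term1_" + i)));
           // only the first 500 documents have a value for field2 (same values as field1)
           if (i < 500) {
               doc.add(new SortedSetDocValuesField("field2", new BytesRef("term1_" + i)));
           }
           indexWriter.addDocument(doc);
       }
   }
   ```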
   
   In this example, each field gets its own term dictionary: I see two calls of the [`addTermsDict()` method](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java#L544), one per field.
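
   To make the per-field layout concrete, here is a toy model in plain Java. It is not Lucene's implementation and does not model the on-disk format written by `Lucene90DocValuesConsumer`. The class and method names are hypothetical, and each document holds at most one value per field to keep it small. For each field, the model collects the distinct values and sorts them into that field's own dictionary. A value's ordinal is its position in that dictionary. With the data from the example above, field1 ends up with 10,000 entries and field2 with 500, even though field2's values are a subset of field1's. The same value `term1_2` also gets a different ordinal in each dictionary.

   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.TreeSet;

   // Toy model of per-field term dictionaries for sorted doc values.
   // Hypothetical names; this is NOT Lucene's implementation. Each document holds
   // at most one value per field to keep the model small.
   class PerFieldTermDicts {

       /** The documents from the example above: field1 on all 10,000, field2 on the first 500. */
       static List<Map<String, String>> exampleDocs() {
           List<Map<String, String>> docs = new ArrayList<>();
           for (int i = 0; i < 10000; i++) {
               Map<String, String> doc = new HashMap<>();
               doc.put("field1", "term1_" + i);
               if (i < 500) {
                   doc.put("field2", "term1_" + i);
               }
               docs.add(doc);
           }
           return docs;
       }

       /** Builds one sorted, de-duplicated dictionary per field; nothing is shared across fields. */
       static Map<String, List<String>> buildDictionaries(List<Map<String, String>> docs) {
           Map<String, TreeSet<String>> terms = new HashMap<>();
           for (Map<String, String> doc : docs) {
               doc.forEach((field, value) ->
                   terms.computeIfAbsent(field, f -> new TreeSet<>()).add(value));
           }
           Map<String, List<String>> dicts = new HashMap<>();
           terms.forEach((field, sorted) -> dicts.put(field, new ArrayList<>(sorted)));
           return dicts;
       }

       /** Ordinal of a value = its position in that field's sorted dictionary. */
       static int ordinal(Map<String, List<String>> dicts, String field, String value) {
           return Collections.binarySearch(dicts.get(field), value);
       }

       public static void main(String[] args) {
           Map<String, List<String>> dicts = buildDictionaries(exampleDocs());
           System.out.println("field1: " + dicts.get("field1").size() + " terms"); // 10000
           System.out.println("field2: " + dicts.get("field2").size() + " terms"); // 500
           // the same value gets a different ordinal in each field's dictionary
           System.out.println(ordinal(dicts, "field1", "term1_2")); // 1112
           System.out.println(ordinal(dicts, "field2", "term1_2")); // 112
       }
   }
   ```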
   
   I think I need to investigate further and build a prototype that produces the same files.
   
   Maybe I missed something.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

