sherman commented on issue #12203: URL: https://github.com/apache/lucene/issues/12203#issuecomment-1508705081
Yes, I saw that code. However, I thought it was intended for merging multiple segments into one, field by field. May I share a simple example to explain why I think so?

Let's say we have the following example:

```java
var logger = LogManager.getLogger("lucene_index");
var infoStream = new Log4j2InfoStream(logger, Level.toLevel("info"));
var out = FSDirectory.open(Path.of("/home/sherman/experiments7/"));
var cfg = new IndexWriterConfig();
cfg.setUseCompoundFile(false);
var indexWriter = new IndexWriter(out, cfg);

for (var i = 0; i < 10000; i++) {
    var doc = new Document();
    // Every document gets a value for field1.
    doc.add(new SortedSetDocValuesField("field1", new BytesRef("term1_" + i)));
    // Only the first 500 documents also get a value for field2.
    if (i < 500) {
        doc.add(new SortedSetDocValuesField("field2", new BytesRef("term1_" + i)));
    }
    indexWriter.addDocument(doc);
}
indexWriter.close();
```

In this example, each field has its own term dictionary, because I see two calls to the [addTermsDict() method](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java#L544).

I think I need to investigate further and build a prototype that can produce the same files as a result. Maybe I missed something.
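To make the observation concrete without depending on Lucene, here is a minimal sketch that models "one term dictionary per field" as one sorted set per field name. It is only an illustration under that assumption: `PerFieldTermsSketch` and `collectTerms` are hypothetical names, and nothing here reflects Lucene's actual on-disk format or the code in `Lucene90DocValuesConsumer`.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class PerFieldTermsSketch {

    // Hypothetical model: each field name maps to its own sorted,
    // de-duplicated set of terms (a stand-in for a per-field term dictionary).
    static Map<String, SortedSet<String>> collectTerms(int docCount, int field2Limit) {
        Map<String, SortedSet<String>> termsByField = new TreeMap<>();
        for (int i = 0; i < docCount; i++) {
            String term = "term1_" + i;
            // Every document contributes a term to field1.
            termsByField.computeIfAbsent("field1", k -> new TreeSet<>()).add(term);
            // Only the first field2Limit documents contribute to field2.
            if (i < field2Limit) {
                termsByField.computeIfAbsent("field2", k -> new TreeSet<>()).add(term);
            }
        }
        return termsByField;
    }

    public static void main(String[] args) {
        Map<String, SortedSet<String>> terms = collectTerms(10000, 500);
        // Print the size of each per-field set.
        terms.forEach((field, dict) ->
                System.out.println(field + ": " + dict.size() + " terms"));
    }
}
```

With the same loop bounds as the example above, this model holds 10000 terms for `field1` and 500 for `field2`, and the 500 `field2` terms are stored a second time in the `field1` set because the two sets are kept separately.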