JervenBolleman commented on issue #13373: URL: https://github.com/apache/lucene/issues/13373#issuecomment-2114206903
Hi @easyice, I am the original reporter on the mailing list. Because the code around indexing is somewhat abstracted, it might be hard to follow. What I do have is the index that failed merging; it is, however, 173 GB xz-compressed. I could use Luke or a similar tool to extract more information for the Lucene team.

The field type that we are indexing into is

```java
UNSTORED_POSITIONAL.setOmitNorms(true);
UNSTORED_POSITIONAL.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
UNSTORED_POSITIONAL.setStored(false);
UNSTORED_POSITIONAL.setTokenized(false);
UNSTORED_POSITIONAL.freeze();
```

Then we add fields like so

```java
doc.add(new Field("type", value.toLowerCase(Locale.US), UNSTORED_POSITIONAL));
```

This index has over 1,177,800,000 documents, all of which contain the term "positional" at least once. On average there are three fields of this type in each document.

So to create local sample data I would just do ;)

```java
for (int i = 0; i < 2_000_000_000; i++) {
    Document doc = new Document();
    // Every document gets the same "number" term ...
    doc.add(new Field("type", "number", UNSTORED_POSITIONAL));
    // ... plus one of two alternating terms.
    if (i % 2 == 0) {
        doc.add(new Field("type", "even", UNSTORED_POSITIONAL));
    } else {
        doc.add(new Field("type", "un-even", UNSTORED_POSITIONAL));
    }
    writer.addDocument(doc);
}
```
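
To put the figures above in perspective, here is a rough back-of-the-envelope sketch of the scale involved. It uses only the numbers quoted in the comment (document count, average fields per document, and the sample loop's bound). The class and method names are hypothetical, and the `MAX_DOCS` constant mirrors Lucene's documented `IndexWriter.MAX_DOCS` value rather than importing it. Whether any of these magnitudes is related to the merge failure is not established here.

```java
/** Rough scale estimate for the index described above; all names are hypothetical. */
public final class IndexScaleEstimate {
    /** Mirrors Lucene's documented IndexWriter.MAX_DOCS (Integer.MAX_VALUE - 128). */
    public static final long MAX_DOCS = Integer.MAX_VALUE - 128L;

    private IndexScaleEstimate() {}

    /** Field instances = documents * fields per document, computed in long arithmetic. */
    public static long fieldInstances(long docCount, long fieldsPerDoc) {
        return Math.multiplyExact(docCount, fieldsPerDoc);
    }

    /** True if the value cannot be represented as a Java int. */
    public static boolean exceedsIntRange(long value) {
        return value > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Reported index: over 1,177,800,000 documents, about three fields each.
        long reported = fieldInstances(1_177_800_000L, 3L);
        System.out.println("reported index, approx. field instances: " + reported);
        System.out.println("exceeds int range: " + exceedsIntRange(reported));

        // Sample loop: 2,000,000,000 documents, two "type" fields each.
        long sampleDocs = 2_000_000_000L;
        long sample = fieldInstances(sampleDocs, 2L);
        System.out.println("sample loop, field instances: " + sample);
        System.out.println("sample doc count within MAX_DOCS: " + (sampleDocs <= MAX_DOCS));
    }
}
```

The estimate treats "three fields on average" as exact, so the first figure is only an approximation of the real index.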