JervenBolleman commented on issue #13373:
URL: https://github.com/apache/lucene/issues/13373#issuecomment-2114206903

   Hi @easyice, I am the original reporter on the mailing list. 
   
   As the code around indexing is a bit abstracted, it might be hard to follow. 
What I do have is the index that failed merging; it is, however, 173 GB 
xz-compressed. I could use Luke or a similar tool to extract more information 
for the Lucene team.
   
   The fieldtype that we are indexing into is
   ```java
   UNSTORED_POSITIONAL.setOmitNorms(true);
   UNSTORED_POSITIONAL.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
   UNSTORED_POSITIONAL.setStored(false);
   UNSTORED_POSITIONAL.setTokenized(false);
   UNSTORED_POSITIONAL.freeze();
   ```
   Then we add fields like so
   
   ```java
   doc.add(new Field("type", value.toLowerCase(Locale.US), UNSTORED_POSITIONAL));
   ```
   
   There are over 1,177,800,000 documents in this index, and each of them 
contains the term "positional" at least once.
   On average there are three fields of this type in each document.
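   
   A rough sketch of the scale these numbers imply, assuming exactly three 
such fields per document (the real figure is only an average). The class and 
method names are hypothetical, and this is plain arithmetic rather than Lucene 
code. It only shows that the estimated number of field instances no longer 
fits in a signed 32-bit int; whether that matters for the merge failure is not 
established here.
   
   ```java
   // Hypothetical helper for illustration only; not part of Lucene or of our code.
   final class PostingsEstimate {

       // Estimated number of field instances: documents times fields per document.
       // Math.multiplyExact throws ArithmeticException if the long result overflows.
       static long estimateFieldInstances(long docs, long fieldsPerDoc) {
           return Math.multiplyExact(docs, fieldsPerDoc);
       }

       // True if the value cannot be represented as a signed 32-bit int.
       static boolean exceedsIntRange(long value) {
           return value > Integer.MAX_VALUE;
       }

       public static void main(String[] args) {
           long docs = 1_177_800_000L; // document count reported above
           long fieldsPerDoc = 3L;     // assumption: the stated average, taken as exact
           long total = estimateFieldInstances(docs, fieldsPerDoc);
           System.out.println("estimated field instances: " + total);
           System.out.println("exceeds int range: " + exceedsIntRange(total));
       }
   }
   ```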
   
   So to create local sample data I would just do ;)
   
   ```java
   for (int i = 0; i < 2_000_000_000; i++) {
       Document doc = new Document();
       // Every document gets the common term "number".
       doc.add(new Field("type", "number", UNSTORED_POSITIONAL));
       // Alternate a second term so that each one appears in half of the documents.
       if (i % 2 == 0) {
           doc.add(new Field("type", "even", UNSTORED_POSITIONAL));
       } else {
           doc.add(new Field("type", "un-even", UNSTORED_POSITIONAL));
       }
       writer.addDocument(doc);
   }
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
