[ https://issues.apache.org/jira/browse/LUCENE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358154#comment-17358154 ]

Nam-Quang Tran commented on LUCENE-8118:
----------------------------------------

Here's another stack trace, slightly different from the one in the original
post: my crash happens not with *addDocuments* but with *addDocument*. Also,
the suggested workaround of committing every 50k documents does not work for
me; it still crashes the same way. Committing every 5k documents does not work
either. Maximum heap size is 16 GB. Lucene version is 8.5.2.

{{org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
        at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:681)
        at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:695)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1591)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1213)
        at com.docfetcherpro.model.TreeFolderWrapper$.addDoc(TreeModelWrapper.scala:189)
        at com.docfetcherpro.model.TreeNodeWrapper.addDoc(TreeModelWrapper.scala:533)
        at com.docfetcherpro.model.TreeNodeWrapper.update(TreeModelWrapper.scala:315)
        at com.docfetcherpro.model.TreeUpdate$.updateNodePair(TreeUpdate.scala:333)
        at com.docfetcherpro.model.TreeUpdate$.$anonfun$update$6(TreeUpdate.scala:137)
        at com.docfetcherpro.model.TreeUpdate$.$anonfun$update$6$adapted(TreeUpdate.scala:133)
        at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at scala.collection.generic.TraversableForwarder.foreach(TraversableForwarder.scala:38)
        at scala.collection.generic.TraversableForwarder.foreach$(TraversableForwarder.scala:38)
        at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:47)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984)
        at com.docfetcherpro.model.TreeUpdate$.update(TreeUpdate.scala:133)
        at com.docfetcherpro.model.IndexActor.index1(IndexActor.scala:127)
        at com.docfetcherpro.model.IndexActor.$anonfun$index$1(IndexActor.scala:18)
        at com.docfetcherpro.util.MethodActor$$anon$3.run(MethodActor.scala:86)
        at com.docfetcherpro.util.MethodActor.com$docfetcherpro$util$MethodActor$$threadLoop(MethodActor.scala:185)
        at com.docfetcherpro.util.MethodActor$$anon$2.run(MethodActor.scala:67)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index -65536 out of bounds for length 71428
        at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:198)
        at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:221)
        at org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:80)
        at org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:121)
        at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:178)
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:862)
        at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:442)
        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:406)
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:250)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:495)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1213)
        at com.docfetcherpro.model.TreeFolderWrapper$.addDoc(TreeModelWrapper.scala:189)
        at com.docfetcherpro.model.TreeNodeWrapper.addDoc(TreeModelWrapper.scala:533)
        at com.docfetcherpro.model.TreeNodeWrapper.update(TreeModelWrapper.scala:310)
        ... 15 more}}
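One plausible reading of the negative index, offered as a hedged sketch and not a confirmed diagnosis: assuming Lucene's ByteBlockPool uses 32 KiB blocks (BYTE_BLOCK_SHIFT = 15) and tracks its global byte offset in an int, that offset wraps to Integer.MIN_VALUE once 65536 blocks (65536 * 32768 = 2^31 bytes) have been allocated, and shifting Integer.MIN_VALUE right by 15 yields exactly -65536. A buffers array of length 71428 would be consistent with a pool that had grown past that point. The class and method names below are hypothetical; only the block-size constant is meant to mirror Lucene.

```java
// Hypothetical illustration of the arithmetic behind "Index -65536".
// Assumption: the pool's block size is 1 << 15 bytes and offsets are plain ints.
public class NegativeBlockIndex {
    // Assumed to mirror ByteBlockPool.BYTE_BLOCK_SHIFT in Lucene 7.x/8.x.
    static final int BYTE_BLOCK_SHIFT = 15;
    static final int BYTE_BLOCK_SIZE = 1 << BYTE_BLOCK_SHIFT; // 32768 bytes

    /** Global byte offset where block number blockNumber starts, computed in int arithmetic. */
    static int blockStartOffset(int blockNumber) {
        // Wraps around silently once the product exceeds Integer.MAX_VALUE (2^31 - 1).
        return blockNumber * BYTE_BLOCK_SIZE;
    }

    /** Index into the buffers array for a global offset, via an arithmetic (sign-keeping) shift. */
    static int blockIndexFor(int offset) {
        return offset >> BYTE_BLOCK_SHIFT;
    }

    public static void main(String[] args) {
        int lastSafeBlock = 65535;      // start offset still fits in an int
        int firstOverflowBlock = 65536; // 65536 * 32768 == 2^31, which does not fit
        System.out.println(blockStartOffset(lastSafeBlock));                     // 2147450880
        System.out.println(blockStartOffset(firstOverflowBlock));                // -2147483648
        System.out.println(blockIndexFor(blockStartOffset(firstOverflowBlock))); // -65536
    }
}
```

Running main prints 2147450880, -2147483648 and -65536, the last matching the index in the exception above.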

> ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-8118
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8118
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 7.2
>         Environment: Debian/Stretch
> java version "1.8.0_144"
> Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>            Reporter: Laura Dietz
>            Priority: Major
>         Attachments: LUCENE-8118_test.patch
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Indexing a large collection of about 20 million paragraph-sized documents
> results in an ArrayIndexOutOfBoundsException in
> org.apache.lucene.index.TermsHashPerField.writeByte (full stack trace
> below).
> The bug is possibly related to the issues described
> [here|http://lucene.472066.n3.nabble.com/ArrayIndexOutOfBoundsException-65536-td3661945.html]
> and in [SOLR-10936|https://issues.apache.org/jira/browse/SOLR-10936], but I
> am not using Solr; I am using Lucene Core directly.
> The issue can be reproduced using code from [GitHub trec-car-tools-example|https://github.com/TREMA-UNH/trec-car-tools/tree/lucene-bug/trec-car-tools-example]
>  
> - compile with `mvn compile assembly:single`
> - run with `java -cp ./target/treccar-tools-example-0.1-jar-with-dependencies.jar edu.unh.cs.TrecCarBuildLuceneIndex paragraphs paragraphCorpus.cbor indexDir`
> where paragraphCorpus.cbor is contained in this
> [archive|http://trec-car.cs.unh.edu/datareleases/v2.0-snapshot/archive-paragraphCorpus.tar.xz]
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -65536
>         at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:198)
>         at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:224)
>         at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:159)
>         at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
>         at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:786)
>         at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
>         at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
>         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:281)
>         at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:451)
>         at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1532)
>         at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1508)
>         at edu.unh.cs.TrecCarBuildLuceneIndex.main(TrecCarBuildLuceneIndex.java:55)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)