P.S. By default Lucene indexes only the first 10,000 tokens of a field (the maxFieldLength setting), so you'll have to raise that limit for large files.
Erick

On Jan 16, 2008 11:04 AM, Erick Erickson <[EMAIL PROTECTED]> wrote:

> I don't think this is a StringBuilder limitation, but rather your Java
> JVM doesn't start with enough memory. i.e. -Xmx.
>
> In raw Lucene, I've indexed 240M files........
>
> Best
> Erick
>
> On Jan 16, 2008 10:12 AM, David Thibault <[EMAIL PROTECTED]> wrote:
>
> > All,
> > I just found a thread about this on the mailing list archives because I'm
> > troubleshooting the same problem. The kicker is that it doesn't take such
> > large files to kill the StringBuilder. I have discovered the following:
> >
> > By using a text file made up of 3,443,464 bytes or less, I get no error.
> >
> > AT 3,443,465 bytes:
> >
> > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> >     at java.lang.String.<init>(String.java:208)
> >     at java.lang.StringBuilder.toString(StringBuilder.java:431)
> >     at org.junit.Assert.format(Assert.java:321)
> >     at org.junit.ComparisonFailure$ComparisonCompactor.compact(ComparisonFailure.java:80)
> >     at org.junit.ComparisonFailure.getMessage(ComparisonFailure.java:37)
> >     at java.lang.Throwable.getLocalizedMessage(Throwable.java:267)
> >     at java.lang.Throwable.toString(Throwable.java:344)
> >     at java.lang.String.valueOf(String.java:2615)
> >     at java.io.PrintWriter.print(PrintWriter.java:546)
> >     at java.io.PrintWriter.println(PrintWriter.java:683)
> >     at java.lang.Throwable.printStackTrace(Throwable.java:510)
> >     at org.apache.tools.ant.util.StringUtils.getStackTrace(StringUtils.java:96)
> >     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.getFilteredTrace(JUnitTestRunner.java:856)
> >     at org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.formatError(XMLJUnitResultFormatter.java:280)
> >     at org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.addError(XMLJUnitResultFormatter.java:255)
> >     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner$4.addError(JUnitTestRunner.java:988)
> >     at junit.framework.TestResult.addError(TestResult.java:38)
> >     at junit.framework.JUnit4TestAdapterCache$1.testFailure(JUnit4TestAdapterCache.java:51)
> >     at org.junit.runner.notification.RunNotifier$4.notifyListener(RunNotifier.java:96)
> >     at org.junit.runner.notification.RunNotifier$SafeNotifier.run(RunNotifier.java:37)
> >     at org.junit.runner.notification.RunNotifier.fireTestFailure(RunNotifier.java:93)
> >     at org.junit.internal.runners.TestMethodRunner.addFailure(TestMethodRunner.java:104)
> >     at org.junit.internal.runners.TestMethodRunner.runUnprotected(TestMethodRunner.java:87)
> >     at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:34)
> >     at org.junit.internal.runners.TestMethodRunner.runMethod(TestMethodRunner.java:75)
> >     at org.junit.internal.runners.TestMethodRunner.run(TestMethodRunner.java:45)
> >     at org.junit.internal.runners.TestClassMethodsRunner.invokeTestMethod(TestClassMethodsRunner.java:71)
> >     at org.junit.internal.runners.TestClassMethodsRunner.run(TestClassMethodsRunner.java:35)
> >     at org.junit.internal.runners.TestClassRunner$1.runUnprotected(TestClassRunner.java:42)
> >     at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:34)
> >     at org.junit.internal.runners.TestClassRunner.run(TestClassRunner.java:52)
> >     at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:32)
> >
> > AT 3,443,466 bytes (or more):
> >
> > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> >     at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
> >     at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
> >     at java.lang.StringBuilder.append(StringBuilder.java:120)
> >     at org.junit.Assert.format(Assert.java:321)
> >     at org.junit.ComparisonFailure$ComparisonCompactor.compact(ComparisonFailure.java:80)
> >     at org.junit.ComparisonFailure.getMessage(ComparisonFailure.java:37)
> >     at java.lang.Throwable.getLocalizedMessage(Throwable.java:267)
> >     at java.lang.Throwable.toString(Throwable.java:344)
> >     at java.lang.String.valueOf(String.java:2615)
> >     at java.io.PrintWriter.print(PrintWriter.java:546)
> >     at java.io.PrintWriter.println(PrintWriter.java:683)
> >     at java.lang.Throwable.printStackTrace(Throwable.java:510)
> >     at org.apache.tools.ant.util.StringUtils.getStackTrace(StringUtils.java:96)
> >     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.getFilteredTrace(JUnitTestRunner.java:856)
> >     at org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.formatError(XMLJUnitResultFormatter.java:280)
> >     at org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.addError(XMLJUnitResultFormatter.java:255)
> >     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner$4.addError(JUnitTestRunner.java:988)
> >     at junit.framework.TestResult.addError(TestResult.java:38)
> >     at junit.framework.JUnit4TestAdapterCache$1.testFailure(JUnit4TestAdapterCache.java:51)
> >     at org.junit.runner.notification.RunNotifier$4.notifyListener(RunNotifier.java:96)
> >     at org.junit.runner.notification.RunNotifier$SafeNotifier.run(RunNotifier.java:37)
> >     at org.junit.runner.notification.RunNotifier.fireTestFailure(RunNotifier.java:93)
> >     at org.junit.internal.runners.TestMethodRunner.addFailure(TestMethodRunner.java:104)
> >     at org.junit.internal.runners.TestMethodRunner.runUnprotected(TestMethodRunner.java:87)
> >     at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:34)
> >     at org.junit.internal.runners.TestMethodRunner.runMethod(TestMethodRunner.java:75)
> >     at org.junit.internal.runners.TestMethodRunner.run(TestMethodRunner.java:45)
> >     at org.junit.internal.runners.TestClassMethodsRunner.invokeTestMethod(TestClassMethodsRunner.java:71)
> >     at org.junit.internal.runners.TestClassMethodsRunner.run(TestClassMethodsRunner.java:35)
> >     at org.junit.internal.runners.TestClassRunner$1.runUnprotected(TestClassRunner.java:42)
> >     at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:34)
> >     at org.junit.internal.runners.TestClassRunner.run(TestClassRunner.java:52)
> >
> > I am writing a filesystem crawler so I need to be able to crawl and index
> > any size file (within reason). A 3-4MB file is certainly within reason. I
> > rewrote my code to store the file contents in a file and read/write in one
> > line at a time. However, when I post the XML file to Solr using
> > SimplePostTool, I get another OutOfMemoryError about the java heap space
> > (thrown from org.xmlpull... again). In any case, does anyone have any ideas
> > about this? Has anyone posted documents with contents larger than 3.5MB to
> > Solr successfully? If so, how was it done? I'm using Solr v1.2.
> >
> > Best,
> >
> > Dave
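
For anyone following up on the -Xmx suggestion quoted above, here is a minimal sketch for checking how much heap the JVM was actually started with. The class name HeapCheck and its method are made up for illustration; Runtime.maxMemory() is the standard call.

```java
public class HeapCheck {

    /**
     * Returns the maximum heap the JVM will attempt to use, in whole megabytes.
     * This reflects the effective -Xmx value (or the JVM default if none was given).
     */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Print the limit so it can be compared against the size of the files being indexed.
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it as "java -Xmx512m HeapCheck" (512m is just an example value) should report a figure close to 512, which confirms the flag reached the JVM. The traces above come from Ant's JUnit runner, so if the tests are forked, the heap limit that matters is probably the one set on the forked JVM (the junit task has a maxmemory attribute for that) and not the one on your own command line.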